oreilly.com


Analyzing Web Logs with AWStats
Pages: 1, 2, 3, 4, 5

Report Foundations: Hits, Pages, Sessions, and Visitors

To put the generated reports in context, start with the raw log data and use it to define basic web analytics terminology.

Anatomy of a web server log file

With the log format configured earlier, each web log contains many lines of text, one per request, and each line holds nine fields of data. To understand the work AWStats has to perform, consider how a single record breaks down:

Table 1. Web server log record (line) example
# | Field | Example data | Explanation
1 | Host (user) IP | | The host appears as a name because a reverse DNS lookup has been performed. The web server can do the lookup itself, or you can do it later, if you do it at all. Judging from the user's host, there is a reasonable probability that the request came from Italy. (However, if the host were something like, the user might have been working for Alitalia in Boston!)
2 | RFC 1413 identity (username) of the client, as determined by identd | - | Rarely used, because PC clients do not usually run identd. A dash is a placeholder in the absence of a value.
3 | Authenticated user (login name) | - | The login name for a login required by the web server itself. This is not usually present: most web sites use application server logins, not web server logins.
4 | The date and time that the server finished processing the request | [08/Jun/2005:19:03:22 +0200] | The time includes the offset from UTC (Coordinated Universal Time); +0200 means two hours ahead of UTC.
5 | The user request | GET / HTTP/1.1 | The client requested the top-level default document, / (index.html), using the GET method of HTTP version 1.1.
6 | Response status sent to the client | 200 | 200 (OK) indicates success. Status codes fall into five classes: 1xx informational, 2xx successful, 3xx redirection, 4xx client error, 5xx server error.
7 | Bytes sent, excluding HTTP headers | 4544 | The size of the response body.
8 | Referer (sic) URL, if any | | The URL of the page from which the client made the request. The field is empty (logged as a dash) if the user types a URL directly, chooses a bookmark, or uses privacy software that blocks this information from being sent.
9 | User-Agent identification as reported by the user agent, usually including operating system and browser names and versions | Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4 | A Firefox 1.0.4 browser on a Fedora Linux system. Note that some browsers, such as Opera, let the user choose which identification to send, so a user can claim to use Microsoft Internet Explorer 6 even while using Opera. This impostor functionality is a response to poorly designed "Optimized for browser x" sites that refuse to work with other legitimate, standards-compliant browsers.
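To make the nine fields concrete, here is a minimal Python sketch that splits one line in Apache's combined log format (assumed here to be the format configured earlier) into the fields of Table 1. The sample line reuses the table's timestamp, request, status, byte count, and User-Agent; the host 192.0.2.10 (a documentation-only address) and the empty referer are hypothetical stand-ins. The regular expression is a simplification: it does not handle escaped quotes inside the quoted fields, and parsing the month name with `%b` assumes an English (C) locale.

```python
import re
from datetime import datetime, timezone

# Simplified pattern for Apache's "combined" format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one combined-format log line into its nine fields, or return None."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Field 4: keep the UTC offset so times from different servers compare correctly.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Field 5: "GET / HTTP/1.1" -> method, path, protocol (when well-formed).
    parts = rec["request"].split()
    if len(parts) == 3:
        rec["method"], rec["path"], rec["protocol"] = parts
    rec["status"] = int(rec["status"])
    # Field 7 is "-" when no body was sent (e.g., a 304 response).
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

# Hypothetical host and referer; the other values come from Table 1.
sample = ('192.0.2.10 - - [08/Jun/2005:19:03:22 +0200] "GET / HTTP/1.1" 200 4544 '
          '"-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 '
          'Fedora/1.0.4-1.3.1 Firefox/1.0.4"')
rec = parse_line(sample)
print(rec["method"], rec["path"], rec["status"], rec["bytes"])  # GET / 200 4544
print(rec["time"].astimezone(timezone.utc))  # 2005-06-08 17:03:22+00:00
```

Keeping the UTC offset from field 4 lets you normalize times from servers in different time zones; here 19:03:22 +0200 becomes 17:03:22 UTC.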

This one web server entry, a successful request for, represents what is commonly called a hit. An anonymous user navigated from the page

Hits everywhere

Now consider that the web site's home page in the example above is actually a group of files: one text file (index.html), one style sheet that specifies formatting (CSS), six image files (GIF, ICO, and PNG), and client-side logic (JavaScript) stored in two separate files on the web server. Loading the home page therefore triggers ten file requests to the web server, and thus ten hits:



1 HTML text file; for example, index.html
1 CSS formatting instructions file
6 GIF, ICO, and PNG image files
2 JavaScript (.js) client-side logic files
10 Total hits
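The arithmetic behind the ten hits can be sketched in Python by tallying the home page's requests by file type. The ten file names below are hypothetical; only the counts per type come from the example above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical file names for the ten objects that make up the home page.
home_page_requests = [
    "/index.html",
    "/style.css",
    "/logo.gif", "/banner.gif", "/favicon.ico",
    "/nav.png", "/photo.png", "/footer.png",
    "/menu.js", "/forms.js",
]

# Every successful request for one of these files counts as one hit.
by_type = Counter(PurePosixPath(p).suffix.lstrip(".") for p in home_page_requests)
images = by_type["gif"] + by_type["ico"] + by_type["png"]
print("Images:", images)                      # Images: 6
print("Total hits:", sum(by_type.values()))   # Total hits: 10
```

One page view, ten hits: the hit count depends as much on how the page is built as on how many people view it.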

Probably the most common web metric bandied about, "hits" is also the most meaningless.

A hit is a successful request for an object from a web server. Success usually means a status code of 200 (OK) or, for objects identical to the copy already in the user's cache, 304 (Not Modified).
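A minimal Python sketch of this definition counts responses with status 200 or 304 as hits and maps any status code to the classes listed in Table 1. The list of status codes is made up for illustration, and AWStats applies its own, more detailed rules, so treat this as an illustration of the definition rather than of AWStats internals.

```python
# Statuses treated as a successful request (a hit), per the definition above:
# 200 (OK) and 304 (Not Modified: the client's cached copy is still current).
HIT_STATUSES = {200, 304}

STATUS_CLASSES = {1: "informational", 2: "successful", 3: "redirection",
                  4: "client error", 5: "server error"}

def status_class(status: int) -> str:
    """Map a status code to its class, as listed in Table 1."""
    return STATUS_CLASSES.get(status // 100, "unknown")

def count_hits(statuses) -> int:
    """Count responses with status 200 or 304."""
    return sum(1 for s in statuses if s in HIT_STATUSES)

# Hypothetical status codes from eight log lines.
statuses = [200, 200, 304, 404, 200, 301, 500, 304]
print(count_hits(statuses))  # 5
print(status_class(304))     # redirection
```

Note that 304 belongs to the redirection class yet still counts as a hit, because the client received (from its cache) the object it asked for.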

Along with bandwidth consumption, hits can be a useful input for server sizing and capacity planning. Although people make much of hits to tout the success of a site, hits have no intrinsic business value. Claims to the contrary usually reflect a misunderstanding of how little hits reveal as a business measure.
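For capacity planning, hits and bytes are typically aggregated over time. The Python sketch below groups hypothetical, already-parsed records by hour and totals hits (again, 200 and 304 only) and bytes sent (all responses, since error pages consume bandwidth too). The records and the one-hour bucket size are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-parsed records: (timestamp, status, bytes sent).
records = [
    (datetime(2005, 6, 8, 19, 3, 22), 200, 4544),
    (datetime(2005, 6, 8, 19, 3, 23), 200, 1210),
    (datetime(2005, 6, 8, 19, 3, 23), 304, 0),
    (datetime(2005, 6, 8, 20, 15, 2), 200, 9800),
    (datetime(2005, 6, 8, 20, 15, 3), 404, 512),
]

def load_by_hour(records):
    """Return {hour: (hits, bytes)}, counting only 200/304 responses as hits."""
    hits = defaultdict(int)
    sent = defaultdict(int)
    for ts, status, nbytes in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        sent[hour] += nbytes          # every response consumes bandwidth
        if status in (200, 304):
            hits[hour] += 1
    return {h: (hits[h], sent[h]) for h in sorted(sent)}

for hour, (h, b) in load_by_hour(records).items():
    print(f"{hour:%H:00}  hits={h}  bytes={b}")
# 19:00  hits=3  bytes=5754
# 20:00  hits=1  bytes=10312
```

Peaks in these hourly totals, not the raw hit count, are what matter when sizing a server.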
