

Analyzing Web Logs with AWStats

Turning to Pages

As the internet has matured, attention has turned from hits to pages. Unfortunately, this opened a new can of worms: there is no standard definition of a page. A web server log file simply records the objects requested from the web server. It is up to the log file analysis software to give semantic meaning to those objects.

Generally a page is a content object that a user viewed, such as an HTML file, a word processing document, or an Adobe Acrobat PDF file.

AWStats defines a page by exclusion. By default, any object accessed by a user on your web server is a page unless it has a filename suffix of css, js, class, gif, jpg, jpeg, png, bmp, or ico. You must explicitly add any other objects you do not want to count as pages in AWStats reports. For example, to exclude archive files and Flash animation files, add their suffixes to the NotPageList directive in the AWStats configuration file:

NotPageList="css js class gif jpg jpeg png bmp ico swf zip tgz gz tar"

Then AWStats will count everything but the following as pages:

Table 3. Files not counted as pages

Suffix                    Description
css                       Cascading Style Sheet formatting instruction files
js                        JavaScript dynamic program logic
class                     Java program files
gif, jpg, jpeg, png, bmp  Various image/photo formats
ico                       An image icon file; many sites have a company logo saved as favicon.ico; many browsers use this in bookmarks (favorites) and tabs
swf                       Shockwave Flash animation
zip, tgz, gz, tar         Archive formats created by PKZip, WinZip, tar, gzip, or similar

One advantage of this approach is that if you use CGI programs to generate dynamic pages, you do not need any extra configuration for each CGI query to count as a page; this happens automatically.
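
The exclusion rule can be illustrated with a short Python sketch. This is not AWStats code (AWStats itself is written in Perl); the function name `is_page` and the decision to ignore the query string and letter case are assumptions made for illustration only.

```python
import posixpath
from urllib.parse import urlsplit

# Suffixes excluded from page counts, mirroring the NotPageList value above.
NOT_PAGE_SUFFIXES = frozenset(
    "css js class gif jpg jpeg png bmp ico swf zip tgz gz tar".split()
)


def is_page(request_url, not_page_suffixes=NOT_PAGE_SUFFIXES):
    """Return True unless the requested path ends in an excluded suffix.

    Hypothetical helper: it approximates the exclusion idea described
    in the text, not the exact matching rules AWStats applies.
    """
    path = urlsplit(request_url).path  # drop any query string
    suffix = posixpath.splitext(path)[1].lstrip(".").lower()
    return suffix not in not_page_suffixes


if __name__ == "__main__":
    for url in ("/index.html", "/cgi-bin/search.pl?q=awstats",
                "/images/logo.gif", "/downloads/src.tar.gz"):
        print(url, is_page(url))
```

Because the test is "anything not on the list", the CGI request counts as a page without being named anywhere, while the image and the archive do not.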

Counting tips

  • Various standards boards, such as the Internet Advertising Bureau, seem to be converging on common definitions after several years of work on the topic. The primary driver is the advertising market. If you are planning to make your data public, you should consider guidelines provided by these organizations when defining your page exclusion list. If you adhere to these standards, you could add a methodology note when publishing your data so your audience will understand the basis for your numbers.
  • Your servers may not see all page requests from your users. To understand why and what you can do about it, see the web caching references.
  • You may have pages you do not want to track because they serve an internal purpose, are repetitive parts of a frame set, are temporary redirections, or something similar. Use the AWStats SkipFiles directive to list the files to exclude.

Visitors and sessions

While the concept of a page is open to some interpretation, the concept of a visitor (and of a visit, also known as a session) is even more difficult to define. Log data neither defines nor tracks a visitor entity. Several heuristic approaches can be used to infer individual visitors from server log data, each adding a level of refinement.

Visitor
By convention, a visitor is at least the IP address (host) from which the web requests originate. Many commercial tools use cookies to increase the accuracy of this approach. AWStats does not yet use cookies to increase the accuracy of visitor recognition. This is an often-requested AWStats enhancement. Perhaps a Perl programmer reading this will take on the challenge.
Visit
A visit constitutes all activity occurring without a break of more than 30 minutes. Thus, if you request a page and then wait 29 minutes before requesting a new page, both page requests take place during the same visit (or session). However, if you request the subsequent page 30 minutes and 1 second later, that is a new visit. Note that AWStats currently treats a break of 60 minutes, not 30, as the end of a visit. Hopefully, this will be configurable in a future version.
Session
Synonym for visit.
Unique visitors
The count of visitors after removing duplicate visits.
Authenticated visitors
Users who have logged in with a username and password. This can be a web server-controlled login or an application server-level login. Web log analysis tools like AWStats track logins only at the web server level, although application-level logins are more common.
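
The visitor and visit definitions above can be sketched in a few lines of Python. This is an illustrative assumption about how such a heuristic might be coded, not the AWStats implementation: the function name `count_visits`, the input format of (host, timestamp) pairs, and the sample addresses are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_visits(requests, timeout=timedelta(minutes=30)):
    """Count visits per host from (host, datetime) pairs in any order.

    A host stands in for a visitor. A new visit starts whenever the gap
    since that host's previous request is longer than `timeout`.
    """
    by_host = defaultdict(list)
    for host, when in requests:
        by_host[host].append(when)

    visits = {}
    for host, times in by_host.items():
        times.sort()
        count = 1  # the first request always opens a visit
        for prev, cur in zip(times, times[1:]):
            if cur - prev > timeout:
                count += 1
        visits[host] = count
    return visits


if __name__ == "__main__":
    sample = [
        ("192.0.2.1", datetime(2004, 1, 1, 10, 0, 0)),
        ("192.0.2.1", datetime(2004, 1, 1, 10, 29, 0)),   # 29 min gap: same visit
        ("192.0.2.1", datetime(2004, 1, 1, 10, 59, 1)),   # 30 min 1 s gap: new visit
        ("192.0.2.2", datetime(2004, 1, 1, 10, 0, 0)),
    ]
    visits = count_visits(sample)
    print("unique visitors:", len(visits))
    print("visits (30 min):", sum(visits.values()))
    print("visits (60 min):", sum(
        count_visits(sample, timedelta(minutes=60)).values()))
```

Passing a 60-minute timeout merges the first host's requests into a single visit, which shows how much the choice of session break affects reported visit totals.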
