

Analyzing Web Logs with AWStats, Part 2

Search Engine Crawlers

Crawler traffic is highly beneficial: it represents the ongoing updating of your content in search engine indexes. Monitoring this traffic is an essential part of an overall search engine optimization (SEO) strategy. Many organizations invest in paid inclusion or keywords without first exploiting the greater benefits of organic, merit-based SEO. Monitor this traffic to ensure that Google and other bots are updating their indexes on a regular basis. The relevant AWStats report is Robots/Spiders visitors.

Off-line Download

Off-line downloading tools, such as Wget and HTTrack, will download content within a domain or a subdirectory of a domain, as specified by the human user who launches the tool. While your server logs these requests, you do not really know whether a user will ever look at all of the pages, or how many times the user will consult them off-line. From a business point of view, off-line downloading could represent monitoring by your competition. The relevant AWStats report is Browsers.
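Off-line downloaders typically announce themselves in the user-agent field of each log entry, which is how log analyzers like AWStats distinguish them. The following sketch shows the idea with a hypothetical helper; the substring list is illustrative, not AWStats' actual signature database.

```python
# Hypothetical helper: flag user-agent strings that belong to known
# off-line download tools. The substrings below are illustrative only.
OFFLINE_TOOLS = ("wget", "httrack")

def is_offline_downloader(user_agent: str) -> bool:
    """Return True if the user-agent field matches an off-line tool."""
    ua = user_agent.lower()
    return any(tool in ua for tool in OFFLINE_TOOLS)

# Sample user-agent fields as they might appear in an access log:
print(is_offline_downloader("Wget/1.21.3"))                    # True
print(is_offline_downloader("Mozilla/5.0 (HTTrack 3.0x)"))     # True
print(is_offline_downloader("Mozilla/5.0 (Windows NT 10.0)"))  # False
```

Note that nothing forces a tool to identify itself honestly; a downloader configured to spoof a browser user-agent will slip past this kind of check.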


Worms

Some site traffic consists of automated attempts to exploit weaknesses in web servers in order to hijack them. AWStats currently tracks five types of attacks on Microsoft IIS; if you don't use IIS, you can disable the report. The relevant AWStats report is Worm/Virus attacks.
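Worm detection of this kind works by matching request paths against known attack signatures. The sketch below illustrates the approach with two well-known IIS worm probes (Code Red requested /default.ida; Nimda probed for cmd.exe and root.exe); the signature list is illustrative, not AWStats' actual list.

```python
# A minimal sketch of signature-based worm detection on request paths.
# These signatures are illustrative examples of famous IIS worm probes,
# not the list AWStats actually uses.
WORM_SIGNATURES = ("default.ida", "cmd.exe", "root.exe")

def looks_like_worm(request_path: str) -> bool:
    """Return True if the requested path matches a known worm probe."""
    path = request_path.lower()
    return any(sig in path for sig in WORM_SIGNATURES)

print(looks_like_worm("/default.ida?NNNN"))        # True  (Code Red probe)
print(looks_like_worm("/scripts/root.exe?/c+dir")) # True  (Nimda probe)
print(looks_like_worm("/index.html"))              # False
```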

Monitoring Scripts

Many sites employ automated virtual transactions to monitor specific processes on their website. The usual practice is to filter this traffic out of your web statistics. To this end, AWStats provides two configuration directives: use SkipHosts if all of the traffic (and only that traffic) comes from a specific IP address, or SkipUserAgents if the "robot" performing the transaction identifies itself with a particular name.
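In your AWStats configuration file, the two directives might look like this (the IP address and agent name are placeholders; substitute the values your monitoring service actually uses):

```
# In awstats.mysite.conf -- exclude a monitoring host and a named robot.
# 192.168.1.50 and "mymonitor" below are placeholder values.
SkipHosts="192.168.1.50"
SkipUserAgents="mymonitor"
```

Prefer SkipUserAgents when the monitoring host also generates legitimate human traffic, since SkipHosts discards everything from that address.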

A Note on Measuring Non-Human Traffic and Page Tagging

One criticism leveled at web server log file analysis is that the presence of non-human traffic distorts the statistics. The primary alternative method, page tagging, works by embedding tags that call the counting server only when a normal browser, not a robot, visits the page. In theory, this excludes non-human traffic, and page tag vendors tout this as a benefit. Unfortunately, this approach misses information essential to the management of most sites; in particular, visibility into search engine crawler activity is an essential ingredient of an overall search engine strategy. AWStats offers the best of both worlds: it captures and reports on automated traffic, but keeps this data separate from the reports on interactive human visitors. Web log analysis can also report on objects that you cannot readily tag, such as images and binary document files.

Parting Words

These articles have only scratched the surface of what is possible with web analytics and AWStats. The following resources may help you integrate web log analysis with AWStats into your website management.

Final Tips

  • To make reports easier for business and technical users to interpret, maintain two separate AWStats configuration files, one enabling technical reports and the other business reports, and generate the two report sets separately.
  • If you decide to use the on-demand CGI interface:

    • Use at least version 6.4; earlier versions had security issues.
    • Consider limiting access to the CGI interface by limiting traffic to internal IPs or by password protecting it.
  • Sign up for notification of AWStats updates. New releases may include useful features and fix bugs or security issues.
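The two-configuration-file approach in the first tip comes down to toggling report sections per audience. A sketch of the relevant directives is below; the exact directive names and accepted values (such as the column-letter codes) vary by AWStats version, so check the comments in your awstats.model.conf:

```
# awstats.business.conf -- business-oriented report: hide bot/worm detail
ShowRobotsStats=0
ShowWormsStats=0

# awstats.technical.conf -- technical report: show that detail
# (HBL = hits, bandwidth, last visit; verify against awstats.model.conf)
ShowRobotsStats=HBL
ShowWormsStats=HBL
```

You would then run awstats.pl once per configuration (for example, with -config=business and -config=technical) to produce the two report sets.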

Additional Resources

Support Options

Measurement Guidelines

The following resources provide more exhaustive information on web analytics terminology and its usage.


