LinuxDevCenter.com

Cache-Friendly Web Pages

The Expires directives have two syntaxes. One is fairly unreadable; it expects you to calculate how many seconds until expiry. Fortunately, the module will also read a much more human syntax. This article describes the readable syntax.



The directives are:

ExpiresActive on|off
ExpiresDefault "<base> [plus] {<num> <type>}*"
ExpiresByType type/encoding "<base> [plus] {<num> <type>}*"

base is one of:

  • access
  • now (equivalent to "access")
  • modification

num is an integer value that applies to the type. type is one of:

  • years
  • months
  • weeks
  • days
  • hours
  • minutes
  • seconds

If you're using the Expires directives for a server, virtual host, or directory, edit the httpd.conf file and add the directives inside the corresponding section.

<Directory /whichever/directory/here>
    # Everything else you want to add to this section
    ExpiresActive on
    ExpiresByType image/gif "access plus 1 year"
    ExpiresByType text/html "modification plus 2 days"
    # ExpiresDefault "now plus 0 seconds"
    ExpiresDefault "now plus 1 month"
</Directory>
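
The {<num> <type>}* part of the syntax means that several number/unit pairs can follow the base, and their sum is the expiry interval. A minimal sketch, reusing the placeholder directory from the example above (the intervals themselves are arbitrary illustrations, not recommendations):

```apache
<Directory /whichever/directory/here>
    ExpiresActive on
    # Several <num> <type> pairs are added together:
    # 1 month + 15 days + 2 hours, counted from the client's access.
    ExpiresDefault "access plus 1 month 15 days 2 hours"
    # Counted from the file's last modification time instead.
    ExpiresByType image/gif "modification plus 5 hours 3 minutes"
</Directory>
```

Here the default works out to 1 month, 15 days, and 2 hours after the client's access, while GIF images expire 5 hours and 3 minutes after the file was last modified.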

If you're using the Expires directives in a .htaccess file, you will need to edit httpd.conf to set the AllowOverride directive for the relevant directory. Apache honors the Expires directives in .htaccess only in directories which have the "Indexes" override set.

# Allow the Indexes override for the directories using .htaccess.
<Directory /whichever/directory/here>
    # Everything else you want to add to this section
    AllowOverride Indexes 
</Directory>

Add the Expires directives to the .htaccess file in the relevant directory. The webmaster can edit the .htaccess file without needing access to httpd.conf.
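
As a sketch, the .htaccess file could carry the same directives as the httpd.conf example, minus the <Directory> wrapper, which is not allowed in .htaccess files. The intervals are the illustrative values used earlier, not recommendations:

```apache
# .htaccess, placed in a directory whose AllowOverride includes Indexes.
# The directives apply to this directory and everything below it.
ExpiresActive on
ExpiresByType image/gif "access plus 1 year"
ExpiresByType text/html "modification plus 2 days"
ExpiresDefault "now plus 1 month"
```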

The main problem with the ".htaccess" method is that the Indexes override and the .htaccess file give the webmaster more configuration options than just the Expires header. This may not be what the system administrator intends.

Alternative method: Cache-Control header

mod_cern_meta allows file-level control, and it also allows the use of Cache-Control headers (or any other header). The headers are put in a subdirectory of the origin directory, with a name based on the origin file's name.

Uncomment the cern_meta_module line and recompile, as for expires_module in the last section.

In the httpd.conf file, set MetaFiles on, MetaDirectory to the subdirectory name, and MetaSuffix to a suffix for the header files.

MetaFiles on
MetaDirectory .web
MetaSuffix .meta

Using these values, the file /var/www/www.example.org/index.html would have a meta file at /var/www/www.example.org/.web/index.html.meta.

Any valid HTTP headers can be put in these files. This provides another way to apply the Expires header, and it's a way to add the Cache-Control header. The relevant Cache-Control directives are:

Cache-Control: max-age=[delta-seconds]
Sets how long, in seconds, the object stays fresh, overriding the Expires header. Max-age implies Cache-Control: public unless a more restrictive directive is also present.

Cache-Control: public
Indicates that the object may be stored by any cache, including shared caches. This is the default for most responses.

Cache-Control: private
Cache-Control: private=[field-name]
Indicates that the object (or specified field) must not be stored in a shared cache and is intended for a single user. It may be stored in a private cache.

Cache-Control: no-cache
Cache-Control: no-cache=[field-name]
Indicates that the object (or specified field) may be cached, but may not be served to a client unless revalidated with the origin server.

Cache-Control: no-store
Indicates that the item must not be stored in nonvolatile storage, and should be removed as soon as possible from volatile storage.

Cache-Control: no-transform
Proxies may convert data from one format to another (for example, between image formats) to save space or bandwidth. This directive indicates that (most of) the response must not be transformed. (The RFC allows for transformation of some fields, even with this header present.)

Cache-Control: must-revalidate
Cache-Control: proxy-revalidate
Forces the cache to revalidate a stale page with the origin server even if the client will accept a stale response. Must-revalidate applies to every cache; proxy-revalidate applies only to shared caches. Read the RFC before using these directives, because there are restrictions on their use.
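
Putting the pieces together, the following sketch shows the mod_cern_meta settings from above alongside what a meta file for the index.html example might contain. The meta file is a plain file of HTTP header lines rather than Apache directives, so its contents appear here as comments. The one-day lifetime and the choice of must-revalidate are assumptions made for illustration only.

```apache
# httpd.conf: the mod_cern_meta settings used in this article.
MetaFiles on
MetaDirectory .web
MetaSuffix .meta

# Hypothetical contents of /var/www/www.example.org/.web/index.html.meta,
# one HTTP header per line (86400 seconds = 1 day, an arbitrary value):
#
#   Cache-Control: max-age=86400, must-revalidate
```

With a file like this in place, responses for index.html would carry the extra Cache-Control header, allowing caches to reuse the page for a day and requiring them to revalidate it once it goes stale.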

Caveats and gotchas

  • HTTP/1.0 has minimal cache control: it understands the Expires header and Pragma: no-cache, but not Cache-Control. Caches that only speak HTTP/1.0 will ignore the Cache-Control header.

  • None of the Cache-Control directives ensure privacy or security of data. The directives "private" and "no-store" assist in privacy and security, but they are not intended to substitute for authentication and encryption.

  • This article is not a substitute for the RFC. If you are implementing the Cache-Control headers, do read the RFC for a detailed description of what each header means and what the limits are.

Final words

Caching is a reality of the Internet and enables efficient usage of bandwidth. Your clients probably view your pages through a cache, and sometimes multiple caches. Applying cache headers to your pages protects the page content and allows your clients to save their bandwidth.

Jennifer Vesperman is the author of Essential CVS. She writes for the O'Reilly Network, the Linux Documentation Project, and occasionally Linux.Com.

