Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Installing and Configuring Squid

by Jennifer Vesperman
07/26/2001

What can Squid do for your site?

Put Squid between the users and the Internet to cache your web pages. Users surf faster, HTTP traffic uses less bandwidth, and you can save on bandwidth fees -- or use the saved bandwidth for other traffic.

Finding Squid

Squid source code is available from squid-cache.org. There is also a list of mirror sites. Installation instructions are available in the ReadMe file in the source tar file.

There is also an RPM package for Red Hat Linux, and packages for FreeBSD and NetBSD. Instructions on installing the packages are available from the package sites. All three package sets come with Squid source code and additional instructions on installing from the source.

The ./configure script has several options. A recommended set is ./configure --enable-heap-replacement --enable-cache-digests --enable-dlmalloc.

Configuring Squid

The Squid configuration file is $SQUID-HOME/etc/squid.conf. The file has extensive comments about each option. This article is a quick reference to the options you are most likely to want to change. Your local network may require configurations not mentioned in this article.

When you have Squid configured, run squid -z to create the cache directory structure. Then you can start Squid.

Basic configuration

Basic configuration ACLs

Access-control lists manage the access to your network. This basic example limits access to the proxy to the network 1.2.3.4/24. It matches successfully if a request comes from any of the addresses between 1.2.3.0 and 1.2.3.255 (inclusive).

acl our_network src 1.2.3.4/24
http_access allow our_network
http_access deny all

ACLs are checked from top to bottom, and the first matching line decides the outcome. Clients with IPs in our_network are permitted; anyone else falls through to the "deny all" and gets a failure message. The format for the class definition is acl listname src network/netmask.

ACLs have an implicit last line that reverses the rule of the previous line. This protects against forgetting to add the http_access deny all, but explicitly adding that line makes the ACL more readable and helps ensure that it's not missed when the ACL is changed.
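
The evaluation order described above can be sketched in Python. This is only an illustration of first-match checking and the implicit last line, not Squid's implementation. The names RULES and http_access_decision are hypothetical, and the standard ipaddress module stands in for Squid's own address matching.

```python
import ipaddress

# 1.2.3.4/24 has host bits set; strict=False normalizes it to 1.2.3.0/24.
OUR_NETWORK = ipaddress.ip_network("1.2.3.4/24", strict=False)

# Hypothetical model of the two http_access lines; None stands for "all".
RULES = [
    ("allow", OUR_NETWORK),
    ("deny", None),
]

def http_access_decision(client_ip, rules=RULES):
    """Return 'allow' or 'deny' for client_ip using first-match semantics."""
    addr = ipaddress.ip_address(client_ip)
    for action, network in rules:
        if network is None or addr in network:
            return action
    # No line matched: apply the opposite of the last line's action.
    return "deny" if rules[-1][0] == "allow" else "allow"

if __name__ == "__main__":
    print(OUR_NETWORK[0], OUR_NETWORK[-1])   # first and last address in the range
    print(http_access_decision("1.2.3.77"))  # inside our_network
    print(http_access_decision("5.6.7.8"))   # outside our_network
```

Passing a rule list that contains only the allow line shows the implicit last line at work: an outside address is still denied.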

miss_access

If a requested object is not in the cache, Squid fetches it from the origin server. If it is cached but no longer marked as fresh, Squid checks with the origin server to see if it is still current and requests a new copy if it isn't. This behavior serves local users well, but is undesirable if the requesting client is a neighboring proxy server. The following ACL lines allow the local network to be passed objects which aren't in the current cache, but deny this service to anyone outside the local network.

miss_access allow our_network
miss_access deny all

icp_access

Caches communicate with ICP messages to find out whether they have fresh content that satisfies a request. The icp_access ACL lines are used to control the caches Squid can communicate with.

Configuration for speed

To maximize speed, minimize the number of simultaneous requests Squid has to handle. The more requests Squid has to process in parallel, the longer each request takes. Every bit of latency you can remove improves the speed of the server.

Aim to have 20 or 30 DNS servers. DNS lookups can be slow -- some continental backbones can take a minute or more to resolve a DNS request.

Dos and Don'ts

Configuration for economy

Saving dollars is a large part of what proxy caches are all about. It's easy to waste your proxy -- and real money in bandwidth -- if you don't understand what's going on. It's also easy to save money with one if you know the issues. Most web servers and web content are operated or produced by people who don't really understand the HTTP protocol. As a cache administrator, you will find their errors landing on your shoulders.

Disk space and memory

A cache can always use more disk space, but as the size of your disk-cache grows, you will need more memory to index it. There's a straightforward rule for memory.

Divide the size of your disk cache by 13 Kbytes, and multiply that by 130 bytes. Add the size of cache_mem, and add about 2.5 Mbytes more for executable files, libraries, and other sundry overhead. For example: We have a 10-Gbyte drive, and a cache_mem of 8 Mbytes.

10 Gbytes/13 Kbytes = 769,230
769,230 x 130 bytes = 99,999,900 bytes (or 97,656 Kbytes)
97,656 Kbytes + 2,500 Kbytes (2.5 Mbytes) + 8,000 Kbytes (8 Mbytes) = 108,156 Kbytes, or about 108 Mbytes

The example server needs 108 Mbytes available to Squid to support 10 Gbytes of cache_dir.
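
The same rule of thumb can be written as a small Python function for trying other disk sizes. This is a rough sketch of the arithmetic above, not a tool that ships with Squid. The function and parameter names are hypothetical. It uses 1000-based units throughout, so for the example server it gives roughly 110 Mbytes, slightly above the hand-worked figure, which mixes 1000-based and 1024-based units.

```python
def squid_memory_estimate(cache_dir_bytes, cache_mem_bytes,
                          overhead_bytes=2_500_000,
                          avg_object_bytes=13_000,
                          index_bytes_per_object=130):
    """Estimate the RAM (in bytes) Squid needs, following the rule of thumb."""
    objects = cache_dir_bytes // avg_object_bytes    # objects the disk cache can hold
    index_bytes = objects * index_bytes_per_object   # in-memory index for those objects
    return index_bytes + cache_mem_bytes + overhead_bytes

if __name__ == "__main__":
    # Example server: 10-Gbyte cache_dir, 8-Mbyte cache_mem.
    total = squid_memory_estimate(10 * 1000**3, 8 * 1000**2)
    print(f"{total / 1000**2:.1f} Mbytes")
```

Whichever unit convention you prefer, treat the result as a lower bound and leave headroom for the rest of the system.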

Provide only as much disk space as you have RAM to support. Squid performs very badly when it starts to swap. Remember to set aside memory for anything else on the machine (DNS, cron, operating system, etc.).

Refresh patterns

Refresh patterns determine the lifetime of an object. Within that lifetime, Squid serves the object without sending an IMS ("If-Modified-Since") request. Once the lifetime is exceeded, Squid keeps the object but sends an IMS request to the origin server. If the object has been modified since it was cached, Squid requests the new copy. If not, it keeps the old copy. Either way, the object is marked as fresh again.

Here's our (default) basic refresh pattern:

refresh_pattern . 0 20% 4320

The dot (.) is the regular expression pattern, and matches anything. It uses POSIX regular expressions. (See man 7 regex).

The zero (0) is the minimum freshness time. If it's anything other than zero, it will override any expiration headers given with the object. If the content provider actually provided an expiration header, we should usually honor it.

The last term (4320) is the maximum freshness time. The object becomes stale after this many minutes in the cache.

The 20 percent is used for our default case, for when there's no information from the content provider about the lifetime of the object. Squid takes x percent (20 percent in this example) of the difference between the last-modified time of the object and the current time, and uses that as the object lifetime. If the object lifetime is less than the minimum set by the refresh_pattern, it is increased to at least that. If it's greater than the supplied maximum, it's reduced to that.
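
The percentage calculation and the clamping to the minimum and maximum can be sketched in Python. This follows only the description above for objects that carry no lifetime information; Squid's actual freshness logic covers more cases. The function name object_lifetime and its parameters are hypothetical, and all times are in minutes.

```python
def object_lifetime(minutes_since_modified, min_minutes=0, percent=20, max_minutes=4320):
    """Lifetime in minutes for an object with no expiration information."""
    # Take the configured percentage of the time since the object last changed.
    lifetime = minutes_since_modified * percent / 100
    # Raise to the minimum if too short, cut to the maximum if too long.
    return max(min_minutes, min(lifetime, max_minutes))

if __name__ == "__main__":
    ten_days = 10 * 24 * 60
    thirty_days = 30 * 24 * 60
    print(object_lifetime(ten_days))     # 20% of ten days, within the limits
    print(object_lifetime(thirty_days))  # 20% of thirty days, capped at the maximum
```

With the default pattern, an object last modified ten days ago is treated as fresh for two days, while one last modified thirty days ago hits the three-day (4320-minute) ceiling.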

Non-standard files

Some kinds of files can be kept much longer than others. Files ending in .zip, .tar.gz, .tgz, and .exe rarely change content without also changing name. Using regular expressions, we can create a set of refresh patterns like this:

refresh_pattern -i exe$ 0 50% 999999
refresh_pattern -i zip$ 0 50% 999999
refresh_pattern -i tar\.gz$ 0 50% 999999
refresh_pattern -i tgz$ 0 50% 999999
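
To check which URLs patterns like these would catch, you can try them in Python first. This is only an approximation: Python's re module is not the POSIX regex library Squid uses, although patterns this simple behave the same in both. The URLs and the matching_pattern helper are made up for illustration, and re.IGNORECASE plays the role of the -i flag.

```python
import re

# The four regular expressions from the refresh_pattern lines above.
PATTERNS = [r"exe$", r"zip$", r"tar\.gz$", r"tgz$"]

def matching_pattern(url):
    """Return the first pattern that matches url, or None if none do."""
    for pattern in PATTERNS:
        if re.search(pattern, url, re.IGNORECASE):
            return pattern
    return None

if __name__ == "__main__":
    for url in ("http://example.com/files/setup.EXE",
                "http://example.com/src/squid.tar.gz",
                "http://example.com/download.zip?mirror=2",
                "http://example.com/index.html"):
        print(url, "->", matching_pattern(url))
```

Note that the $ anchor ties each pattern to the very end of the URL, so a download link with a query string after the file name does not match.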

Refresh pattern options

Note that these options violate the HTTP standard. Do not use them lightly.

override-expire pretends there is no expiration header on the object and calculates purely based on last-modified times. This permits you to cache sites that abuse the use of expiration headers, but also inhibits updates of frequently changed content (such as news sites).

ignore-reload prevents the object being refreshed when the user presses the refresh button on their browser. This does not perform well when the object has no content length -- you may wind up with a broken object that the users cannot reload.

reload-into-ims transforms reloads into validations. Beware: Web servers may permit an object to be updated without the last-modified time being altered. The server may then insist that the object is still valid when it actually is not.

More Dos and Don'ts

Caveats and gotchas

Final words

Squid can improve browsing speed and reduce HTTP bandwidth. The squid.conf file gives great flexibility, but can be initially daunting. These settings let you get started -- but are just a start. Experiment!

Further reading

Jennifer Vesperman is the author of Essential CVS. She writes for the O'Reilly Network, the Linux Documentation Project, and occasionally Linux.Com.



Copyright © 2009 O'Reilly Media, Inc.