Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Transparent Proxying with Squid

by Jennifer Vesperman
10/25/2001

Transparent proxying frees you from the hassle of setting up individual browsers to work with proxies. If you have a hundred, or a thousand, users on your network, it's a pain to set up each browser to use a proxy -- or to try to convince users to go into their preferences and type in symbols they don't understand.

Using transparent proxying, you intercept their web requests and redirect them through the proxy. Nice and simple -- on the surface.

Why not to use a transparent proxy

Transparent proxying (more commonly known as TCP hijacking) is like Network Address Translation (NAT) in some respects: it is to be avoided at all costs, and used only if there is absolutely, positively, no other way.

Why? Because transparent proxying does not work well with certain web browsers. With most browsers you're fine, but if even a quarter of your users run badly behaved browsers, you can expect your help desk costs to exceed any benefit you might gain from transparent proxying. Unfortunately, these browsers are in wide use.

These badly behaved browsers act differently when they are aware of a proxy. Well-behaved browsers follow the standard: the only change they make for a proxy is to direct their requests to a different machine and port. Badly behaved browsers leave some HTTP headers out of their requests, and add them only when they know there's a proxy. Without those headers, user commands like "reload" don't work when there's a proxy between the user and the source.

Transparent proxying also introduces a layer of complexity, which can complicate otherwise simple transactions. For instance, a web-based application that requires an active server cannot test for the server by making a connection -- it will connect to the proxy, instead.

Transparent proxying theory

So how does transparent proxying work?

A firewall or other redirector catches TCP connections directed at specific ports on remote hosts (usually port 80), and directs them to the local proxy server. The proxy server uses the HTTP headers to determine where it is supposed to connect, and proxies the request.

System administrators are often asked to transparently proxy FTP and SSL as well, but neither can be transparently proxied. FTP is a more complex protocol than HTTP, and provides fewer hints as to the original destination of the request. SSL is encrypted and contains no useful data about destinations. Decoding SSL in order to transparently proxy it is precisely what the protocol is designed to prevent -- it would be indistinguishable from a "true" man-in-the-middle attack.

To perform transparent proxying, we need a server between the clients and the destinations. This server must have the necessary facilities to match and redirect traffic, such as netfilter and iptables. Any firewalling system capable of Network Address Translation and traffic redirection is suitable.

You will need to configure a rule to catch traffic destined for port 80 on external hosts, and redirect this traffic to the port of a proxy server on the intercepting machine.

You can have proxies which aren't on the intercepting machine, but these are more awkward. First, the source address of the request is no longer available to the proxy -- it's lost in the process of redirection. You can solve this by using destination NAT (Network Address Translation), but you then have to route the proxy traffic back through the translating server. If you attempt to have the proxy pass the HTTP response back directly, the client will be confused and (quite correctly) refuse to speak to the proxy. The proxy is not the machine the client thinks it's talking to -- the client thinks it's making the request of the destination web server. The proxy must route back through the interceptor, so it can translate the addresses back, and let the client continue to believe it's speaking directly to the web server.
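One way to arrange that return path (a sketch, assuming the intercepting machine is also the proxy's router, at the hypothetical address 10.0.3.254) is to make the interceptor the proxy host's default gateway, so replies to clients pass back through it and have their addresses translated:

# on the proxy machine (10.0.3.1): send replies back through the intercepting machine
ip route add default via 10.0.3.254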

HTTP/1.1 made life easier for transparent proxies by making the Host header mandatory. This header contains the name of the machine (as given in the URL), and it enables name-based virtual web hosting: the web server uses the Host header to determine which site's page to respond with.

For transparent proxies, it provides the proxy with the host name. Having received an intercepted port 80 connection, the proxy server needs to understand that it is not receiving a fully qualified absolute URI (Uniform Resource Identifier), but a relative URI. Normally, a proxy server receives http://host/path, but if the client thinks it's talking to the server, not a proxy, it just asks for /path.
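To illustrate, using the placeholder host www.example.com: a proxy-aware browser sends the absolute URI in its request line, while an intercepted browser, believing it is talking to the origin server, sends only the path and supplies the machine name in the Host header.

What a proxy normally receives

GET http://www.example.com/index.html HTTP/1.1
Host: www.example.com

What an intercepted connection delivers

GET /index.html HTTP/1.1
Host: www.example.com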

Related Reading: Web Caching, by Duane Wessels

The proxy server uses the Host header to reassemble the fully qualified URI, then checks its cache and does its usual proxying.

Squid is suitable for transparent proxying because it is also designed as a reverse proxy (also known as an "HTTP accelerator"), and can read these abbreviated request headers. In accelerator mode, it fronts for the actual web servers and receives requests as if it were the web server, so it was designed with the ability to reassemble relative URIs. To use it as a transparent proxy, we enable this web acceleration behavior.

When using Squid as an HTTP accelerator, you configure the host name and the port you want the proxy to accelerate. This prevents Squid from being used as an arbitrary HTTP relay. When using Squid in accelerator mode as a transparent proxy, set the host name to virtual and the port to whichever port you want to transparently proxy.

Configuring a transparent proxy

Traffic interception

Intercept the traffic and redirect it to the chosen proxy port. Having the proxy on the same machine as the interceptor is preferable. The examples below use iptables as the redirection mechanism, and port 8080 as the proxy's http_port.

To a different machine

# catch web traffic arriving on $INTERFACE (typically the internal, client-facing interface) and DNAT it to the proxy at 10.0.3.1
iptables -t nat -A PREROUTING -i $INTERFACE -p tcp --dport 80 -j DNAT --to 10.0.3.1:8080

To the same machine

# catch web traffic arriving on $INTERFACE and redirect it to the local proxy port
iptables -t nat -A PREROUTING -i $INTERFACE -p tcp --dport 80 -j REDIRECT --to-port 8080

Squid configuration

In the squid.conf file, configure these options:
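The exact directive names depend on your Squid release; for the Squid 2.x versions current at the time of writing, the accelerator settings described above look roughly like this (http_port matches the 8080 used in the redirect rules -- treat this as a sketch and check the squid.conf documentation for your version):

# listen where the firewall redirects intercepted connections
http_port 8080
# accelerator mode: take the host name from the Host header
httpd_accel_host virtual
# the port the intercepted requests were originally aimed at
httpd_accel_port 80
# keep handling ordinary, explicitly configured proxy requests as well
httpd_accel_with_proxy on
# rebuild the full URI from the Host header
httpd_accel_uses_host_header on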

Note that you cannot transparently proxy more than one port at a time. The HTTP headers do not contain port information, so Squid cannot tell which port the request was intended for once the request has been intercepted.

Caveats and gotchas

Related articles:

Peering Squid Caches

Authentication and Squid

Using Squid on Intermittent Connections

Installing and Configuring Squid

Final words

The most common reason to use transparent proxying is to reduce the setup load for web browsers. System administrators need to be aware of the common problems of transparent proxies, and determine whether they are appropriate in their environments. If the end users are using browsers that are known to behave well with transparent proxies, and the machine designated as the proxy is capable of handling the load, a transparent proxy can be an effective solution.

Further reading

None of these explicitly describe transparent proxying, but they are useful nonetheless.

Jennifer Vesperman is the author of Essential CVS. She writes for the O'Reilly Network, the Linux Documentation Project, and occasionally Linux.Com.

