Web proxies intercept traffic from your systems as it moves to other systems, analyze the packets, then send the data along. There are many reasons why you might want to intercept packets.
Originally the main use case for a proxy was as a caching server. In this use case, the first time a person in your network goes to a website, the static content (particularly graphic images) gets downloaded and cached. Then, because the content is local, the next person to hit that site will get a fast response.
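As a rough illustration of the caching idea, here is a minimal Python sketch. The names CachingProxy and fake_fetch are hypothetical and exist only for this example. A real caching proxy also honors HTTP cache headers and expiry times, which this sketch ignores.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy cache: keeps a copy of each response body, keyed by URL."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch            # hypothetical function that downloads a URL
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0       # how many times we went to the real site

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # First visitor: download from the origin and keep a local copy.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        # Later visitors are served from the local copy.
        return self._cache[url]


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real network download, so the sketch runs offline.
    return f"contents of {url}".encode()
```

If two people request the same image through one CachingProxy, only the first request reaches the origin site. The second is answered from the local copy.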
Internet links are faster now, and increasingly, people are using proxies that are also in the cloud. There’s still a caching speed benefit when loading content from very remote or very slow websites, but most of the heavily used sites on the internet are supported by a CDN (content delivery network). A CDN effectively does the same thing—it caches content and brings it closer to the user to reduce latency.
Why are web proxies used today?
Proxies are now commonly used for an entirely different purpose: security. Most web content is encrypted to protect your privacy. This is a good thing. But it makes it difficult to inspect that traffic for malicious content as it goes through the network. A proxy can solve this problem.
HTTPS uses a system of trusted “certificates,” which you can loosely think of as very long, complicated passwords. These certificates are hierarchical, with a top-level certificate held by a recognized certificate authority like DigiCert or GoDaddy. The top-level certificates are used to prove that the other certificates, the ones that actually protect your web traffic, are valid.
When you connect to a website through a proxy, your browser first connects to the proxy using an HTTPS session. The proxy server then establishes its own connection to the real destination website on your behalf, using the real certificates for that website. It uses the data in your original web session to build the packets to send to the real site. Then it takes the responses from the website, decrypts them as if it were your web browser, and stuffs them into a new packet in the original HTTPS session that you started. Once that’s done, it sends them to your browser.
This is essentially a “man-in-the-middle” attack against your web session, something the certificates were intended to prevent. But this particular man-in-the-middle is allowed because your computer has been set up to also trust the certificates the proxy server is using.
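To make the trust relationship concrete, here is a toy Python model of how a browser decides whether to accept a certificate. It only follows "issued by" links up to a trusted root and performs no real signature checks. The function is_trusted and all certificate names are hypothetical and used only for illustration.

```python
from typing import Dict, Set


def is_trusted(subject: str, issued_by: Dict[str, str], trusted_roots: Set[str]) -> bool:
    """Follow issuer links upward until a trusted root is found (toy model)."""
    seen = set()
    current = subject
    while current not in seen:
        if current in trusted_roots:
            return True
        seen.add(current)
        issuer = issued_by.get(current)
        if issuer is None:
            return False           # chain ends without reaching a trusted root
        current = issuer
    return False                   # issuer links loop back on themselves


# Chain the real website presents (hypothetical names).
public_chain = {
    "www.example.com": "Public Intermediate CA",
    "Public Intermediate CA": "Public Root CA",
}
# Chain the proxy presents after re-signing the session (hypothetical names).
proxy_chain = {"www.example.com": "Example Corp Proxy CA"}

default_roots = {"Public Root CA"}
managed_roots = {"Public Root CA", "Example Corp Proxy CA"}
```

With default_roots, the proxy's chain is rejected, which is the protection certificates were designed to give. With managed_roots, where the proxy's certificate authority has been added to the trusted set, the same chain is accepted.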
So why would attacking your own web session make you more secure? Well, that proxy server—located out on the internet, decrypting and looking at your web sessions—is inspecting that data for any signs of malicious content. And hopefully the very large proxy services will have enough customers to leverage their unique position in the middle of all of these sessions to identify new malicious content faster than you would on your own.
What proxy architectures are available?
There are a few different proxy architectures, each with its own advantages and disadvantages.
Some proxies use an app or a browser plug-in to redirect your web traffic through the proxy server. This has the advantage that all of the traffic from your web browser is automatically routed through the proxy, even if you aren’t in the office.
It has the disadvantage that some malicious software on your computer might be able to avoid it. For example, one of the use cases for a proxy is to identify “command and control” traffic. If there’s a piece of malicious software on your computer, it might try to “call home” for instructions or upload information stolen from your system. One way it might try to hide that traffic is in an encrypted HTTPS web session. So a proxy would be a useful tool in detecting that type of traffic.
But that malicious software is presumably running on your computer where it could, in theory, detect and bypass the app that forwards your packets through the proxy. In this type of architecture, it’s extremely important for the redirection software to be inaccessible with normal user credentials and impossible to bypass. Ideally, it’s integrated into your computer’s own network drivers.
The other problem is that this proxy architecture doesn’t help any systems that don’t have the special redirection software on them. So if there’s a highly vulnerable device on your network like a printer, and an attacker has managed to break into it and use it to connect back to their command and control server, a proxy won’t help you. It also won’t help if they’ve somehow attached a malicious device to your network.
An alternative proxy architecture that’s useful in these situations basically just connects the outside of your network’s firewall to the proxy server using a sort of tunnel or VPN (virtual private network). Then it routes all of your traffic through the proxy. This architecture has its own drawback, though—if you take a computer out of the network to use at home or elsewhere, it becomes unprotected.
Some proxy services have both types of architecture, and will allow you to use them together to protect both cases.
So far we’ve mostly been talking about web proxies, but there are other types out there. One interesting type is a DNS (domain name system) proxy. The DNS proxy doesn’t decrypt anything. It just processes your DNS requests and shows you a special message on a private web page if you ever try to access a website that it knows or suspects is bad.
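The filtering decision a DNS proxy makes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: make_filtering_resolver, BLOCK_PAGE_IP, and fake_upstream are hypothetical names, the addresses come from ranges reserved for documentation, and a real service works on actual DNS messages rather than plain strings.

```python
from typing import Callable, Set

# Hypothetical address of the warning page (reserved documentation range).
BLOCK_PAGE_IP = "192.0.2.1"


def make_filtering_resolver(
    blocklist: Set[str], upstream: Callable[[str], str]
) -> Callable[[str], str]:
    """Return a resolver that redirects blocked names to the warning page."""

    def resolve(hostname: str) -> str:
        name = hostname.rstrip(".").lower()
        labels = name.split(".")
        # Check the name and each parent domain, so "www.bad.example"
        # is caught by a blocklist entry for "bad.example".
        for i in range(len(labels)):
            if ".".join(labels[i:]) in blocklist:
                return BLOCK_PAGE_IP
        # Not on the list: hand the question to the normal resolver.
        return upstream(name)

    return resolve


def fake_upstream(name: str) -> str:
    # Stand-in for a real DNS lookup, so the sketch runs offline.
    return "203.0.113.10"
```

Because the check happens at lookup time, nothing is decrypted. The proxy only learns which names you ask for, not what you send to them.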
There are also SMTP (simple mail transfer protocol) proxies for secure scanning and forwarding of email. That is a separate huge topic all its own, though.
What are some problems with proxies?
I like proxies, and I recommend using them, but there are problems with them that you should be aware of.
The first problem is an ethical one. Is it OK for you to look at my private web browsing traffic? Maybe I’m doing my banking and I don’t want you to see my password and potentially have access to my accounts. Maybe I’m applying for a new job and I don’t want anybody to know in case it doesn’t work out. Or maybe it includes private health information I don’t want my employer to know about.
But maybe I’m stealing company secrets, and you really do want to know.
The usual solution to this ethical problem is to never let a human look at the content of web sessions unless the proxy flags some sort of serious problem with it. This is combined with specifying lists of sites that won’t be inspected—things like banking applications would be included in this list.
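A bypass list like this usually comes down to a simple check before the proxy decrypts anything. Here is a minimal Python sketch. The function should_decrypt and the domains in BYPASS are hypothetical, and real products typically match on site categories as well as individual names.

```python
from typing import Set

# Hypothetical list of domains whose traffic is passed through untouched.
BYPASS = {"bank.example", "health.example"}


def should_decrypt(hostname: str, bypass_domains: Set[str]) -> bool:
    """Return False for hosts on the bypass list or under one of its domains."""
    name = hostname.rstrip(".").lower()
    for domain in bypass_domains:
        # Match the domain itself or any subdomain, but not lookalikes
        # such as "notbank.example".
        if name == domain or name.endswith("." + domain):
            return False
    return True
```

For example, should_decrypt("online.bank.example", BYPASS) returns False, so that session is forwarded without inspection, while a host that is not on the list is decrypted and scanned as usual.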
The second problem with a proxy is that you really have to trust the service provider operating the proxy. What are they doing with my web browsing sessions? Are they really respecting the privacy of my banking? Are they internally secure, or is it possible somebody has hacked them and is able to snoop on private data?
There really isn’t a solid solution to this problem. You have to do your due diligence on any service provider that has access to your sensitive data.
And finally, there’s the problem of what proxies might miss. We’ve already discussed a few examples where some traffic might sneak past the proxy. But it’s good to be very clear about exactly what kind of traffic you’re trying to monitor.
A web proxy will be optimized to look at HTTPS traffic, but it won’t be able to inspect other protocols reliably, even if they use the same encryption system. In particular, a web proxy won’t be useful for inspecting things like file transfer protocols or torrents. And it won’t help at all against unsolicited inbound traffic from the internet. It’s just one tool in your security toolbox.