Network address translation (NAT) is very simple in concept. As packets pass through some network device—typically a firewall, router, or load balancer—either the source or destination IP address is changed. Then packets returning in the other direction are translated back to the original addresses. In some cases both are changed at once, which is called “twice NAT” in some documentation.
Port address translation (PAT) is a special case of NAT in which the source IP addresses for all packets going in one direction are translated to a common address. This is extremely common for internet edge firewalls, which translate all internal IP addresses to a single address, typically the firewall’s own address, on the internet side.
Why “port” address translation
In IPv4 networks, network address translation is nearly ubiquitous for internet edge devices. Because of the shortage of registered IPv4 addresses, most networks have to use private address ranges inside their networks. So nearly every home or enterprise network needs to use NAT at the internet edge.
Most of these implementations use PAT for outbound web browsing traffic. This way a single public address on the internet can be used for all of the devices inside the network. There’s a problem with this, though.
Consider what happens when my computer sends a packet out to a web site on the internet. The firewall replaces my computer’s IP address in the “source” field of the packet header with its own external address and forwards the packet to the destination, whose address it doesn’t change. The web server then needs to respond.
It sends a packet back to the only IP address it knows for my computer, which is actually the firewall’s external interface address. The internet routes that packet back to the firewall, but now the firewall needs to undo the translation, replacing the address in what is now the “destination” field of the packet header with my computer’s IP address. Then it forwards that packet on to me.
But what happens if there are two or more systems on the inside of the network? What if there are hundreds? What if I’m not the only one communicating with that particular web site? How does the firewall know how to undo the translation?
This is where the “port” in port address translation comes in. By far the most common protocols on the internet are built on either TCP or UDP, and both include the concept of a port. The port is not a physical thing; it’s just a number in the packet header. In fact, there are two of them: a source port and a destination port.
When my computer starts a new session to a web server, it sends the first packet to a particular predefined destination port as well as the destination IP address that we’ve already discussed.
For HTTP, the standard is port number 80. For HTTPS, it’s 443. The network uses the IP address to decide how to route the packet. But then, when it reaches the server, the operating system uses the port number to know which process to send that packet to, in this case, the web server process.
But I could have many web sessions open at the same time. So my computer also includes a “source” port number, which tells it which process, running locally on my computer, to send the response packets back to.
When the remote web server responds to my session, the packet header has its IP address as the source address and my PAT address as the destination address, and it also uses my original source port number as the new destination port.
Unlike the original destination port, which is something well-defined like 80 or 443, the source port number that my computer uses to identify the session for itself can in principle be anything between 1 and 65535, although operating systems usually pick it from a high-numbered “ephemeral” range.
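To make the role of the two port numbers concrete, here is a minimal Python sketch of how the addressing fields swap between a request and its response. The Packet type and the reply_to function are hypothetical names of my own, and the addresses come from the ranges reserved for documentation.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Only the addressing fields of a TCP or UDP packet header."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reply_to(request: Packet) -> Packet:
    """Build the header of a response: source and destination swap places."""
    return Packet(
        src_ip=request.dst_ip,
        src_port=request.dst_port,
        dst_ip=request.src_ip,
        dst_port=request.src_port,
    )


# My computer (seen at its PAT address) opens an HTTPS session.
request = Packet("203.0.113.1", 51515, "198.51.100.7", 443)
response = reply_to(request)
print(response)
```

The well-known port 443 ends up as the source port of the response, and the source port my computer picked comes back as the destination port, which is how the operating system finds the right local process.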
When the firewall receives a packet from my computer, it adds an entry to its translation table that includes my real IP address, the remote server’s IP address, and both the source and destination port numbers. So when it receives the response back from the server, it can use my source port, which now appears as the destination port of the response, to look up the “untranslate” values that it needs to use to get the packet back to me.
In reality it’s a little more involved than this because there’s a chance your computer and mine will both use the same source port number. So, while the translation algorithm is generally proprietary to the firewall vendor, many will make up a new source port for the translated packet header. Then it’s easier to make sure this port number is unique. But note that there’s still an upper limit of 65535 simultaneous sessions in this translation table for each shared external IP address.
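As a rough illustration of such a translation table, here is a simplified Python sketch. Real firewalls use their own, usually proprietary, algorithms; the PatTable class, its method names, and the choice to hand out external ports sequentially starting at 1024 are all assumptions of mine. The sketch also never expires sessions, which a real device would have to do.

```python
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class PatTable:
    """Toy PAT table: one external port per inside session (assumed design)."""

    def __init__(self, external_ip: str, first_port: int = 1024, last_port: int = 65535):
        self.external_ip = external_ip
        # Ports are handed out once and never reused in this sketch.
        self._free_ports = iter(range(first_port, last_port + 1))
        self._outbound = {}  # (inside ip, inside port, remote ip, remote port) -> external port
        self._inbound = {}   # external port -> (inside ip, inside port)

    def translate_outbound(self, pkt: Packet) -> Packet:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        ext_port = self._outbound.get(key)
        if ext_port is None:
            try:
                ext_port = next(self._free_ports)
            except StopIteration:
                raise RuntimeError("no free external ports") from None
            self._outbound[key] = ext_port
            self._inbound[ext_port] = (pkt.src_ip, pkt.src_port)
        # Rewrite the source address and port; the destination is untouched.
        return pkt._replace(src_ip=self.external_ip, src_port=ext_port)

    def translate_inbound(self, pkt: Packet) -> Optional[Packet]:
        inside = self._inbound.get(pkt.dst_port)
        if inside is None:
            return None  # no matching session, so the packet is dropped
        return pkt._replace(dst_ip=inside[0], dst_port=inside[1])


pat = PatTable("203.0.113.1")
# Two inside computers happen to choose the same source port.
mine = pat.translate_outbound(Packet("10.0.0.5", 50000, "198.51.100.7", 443))
yours = pat.translate_outbound(Packet("10.0.0.6", 50000, "198.51.100.7", 443))
# The server answers the second session; the firewall undoes the translation.
reply = Packet("198.51.100.7", 443, "203.0.113.1", yours.src_port)
back = pat.translate_inbound(reply)
print(mine, yours, back, sep="\n")
```

Because the firewall picks the translated source port itself, the two sessions stay distinct on the outside even though both computers chose port 50000.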
Remember that a workstation can easily have several sessions open at once, so in practical terms you could encounter resource exhaustion problems on your firewall if you try to use a single PAT address for more than a few thousand internal workstations.
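The arithmetic behind that estimate is easy to sketch in Python. The sessions-per-workstation figures below are illustrative assumptions, not measurements, and max_workstations is a hypothetical helper name.

```python
MAX_PORTS = 65535  # upper limit on translated source ports per external address


def max_workstations(sessions_per_workstation: int, usable_ports: int = MAX_PORTS) -> int:
    """How many workstations fit if each holds this many sessions at once."""
    return usable_ports // sessions_per_workstation


# Assumed averages of simultaneous sessions per workstation.
for sessions in (5, 20, 50):
    print(sessions, "sessions each:", max_workstations(sessions), "workstations")
```

With an assumed 20 simultaneous sessions per workstation, a single PAT address runs out of ports at a little over 3,000 workstations.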
The other problem with PAT is that the firewall needs to match an incoming packet to an existing session. That clearly doesn’t work if you need to allow external devices to connect to an internal server for things like web or email servers. If there are systems that need to accept inbound connections then you need to use a static NAT.
Static network address translation is exactly what it sounds like. Instead of outgoing connections dynamically creating entries in the translation table, you create translation rules that will always be present.
With static NAT, the firewall knows how to do the translation in both directions, even if it’s seeing the first packet in the session.
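In the simplest one-to-one case, a static rule can be pictured as a fixed lookup that works in either direction. This Python sketch uses made-up addresses and hypothetical function names.

```python
# Permanent one-to-one rules: public address -> private address.
STATIC_NAT = {
    "203.0.113.20": "10.0.0.20",
}
# The same rules read in the opposite direction.
REVERSE_NAT = {private: public for public, private in STATIC_NAT.items()}


def translate_inbound_destination(public_ip: str):
    """First packet from outside: rewrite the destination address."""
    return STATIC_NAT.get(public_ip)


def translate_outbound_source(private_ip: str):
    """Packet from the server going out: rewrite the source address."""
    return REVERSE_NAT.get(private_ip)


print(translate_inbound_destination("203.0.113.20"))
print(translate_outbound_source("10.0.0.20"))
```

No session has to exist before the lookup succeeds, which is exactly what an inbound connection needs.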
A variation on static NAT is policy NAT. Recall the earlier comments about how the server uses the port numbers to decide which application to send the packet to. The firewall could also use the same information to make decisions.
The most common way this is used is to share the same internet-side IP address among a few different internal servers. For example, the firewall could have a rule that redirects TCP ports 80 and 443 to the web server, ports 25 and 587 to an email server, and ports 20, 21 and 22 to an FTP/SFTP server.
This kind of configuration is helpful if you have a shortage of addresses and need to double up.
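The example rule set above might look something like this as a Python lookup table. The internal server addresses and the internal_server_for function are hypothetical, and a real firewall would express the same thing in its own configuration syntax.

```python
WEB_SERVER = "10.0.0.10"
MAIL_SERVER = "10.0.0.25"
FTP_SERVER = "10.0.0.21"

# (protocol, destination port on the shared public address) -> internal server
PORT_FORWARDS = {
    ("tcp", 80): WEB_SERVER,
    ("tcp", 443): WEB_SERVER,
    ("tcp", 25): MAIL_SERVER,
    ("tcp", 587): MAIL_SERVER,
    ("tcp", 20): FTP_SERVER,
    ("tcp", 21): FTP_SERVER,
    ("tcp", 22): FTP_SERVER,
}


def internal_server_for(protocol: str, dst_port: int):
    """Return the internal address for an inbound packet, or None if no rule matches."""
    return PORT_FORWARDS.get((protocol, dst_port))


print(internal_server_for("tcp", 443))
print(internal_server_for("tcp", 587))
print(internal_server_for("tcp", 8080))
```

A packet to a port with no rule gets no translation at all, so in this sketch it would simply be dropped.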
The role of load balancers in NAT
Load balancers are effectively NAT appliances. But they can have even more complicated rules. A load balancer will customarily have a single address for a pool of servers that perform the same function. The load balancer will replace the destination IP address in the packet with the real address of one of the servers in the pool, then forward the packet to that server.
This is commonly used for performance or resiliency reasons. The load balancer keeps track of the status of all of the servers in the pool. If any of the servers becomes unavailable for any reason, the load balancer can immediately remove it from the pool and stop sending new sessions to that failed server.
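Here is a minimal Python sketch of that idea. Round-robin selection is my assumption (load balancers offer many selection methods), and the ServerPool class and its method names are hypothetical. The health checks themselves are left out; the sketch only shows what happens once a server has been marked as down.

```python
class ServerPool:
    """One virtual address in front of several real servers (toy model)."""

    def __init__(self, virtual_ip, servers):
        self.virtual_ip = virtual_ip
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # round-robin position

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick_server(self):
        """Choose the real destination address for a new session."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next % len(self.servers)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers in pool")


pool = ServerPool("203.0.113.80", ["10.0.1.1", "10.0.1.2", "10.0.1.3"])
pool.mark_down("10.0.1.2")  # pretend a health check just failed
picks = [pool.pick_server() for _ in range(4)]
print(picks)
```

The address returned by pick_server is what the load balancer would write into the destination field of the packet in place of the virtual address.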
Another feature found in some load balancers is the ability to look at the content of the HTTP request and use that to select a server. For example, there could be a common IP address for all of the web servers in the example.com domain. The load balancer could look at the exact URL and forward https://example.com/application1 to the first server and https://example.com/application2 to the second server. For HTTPS, the load balancer has to terminate the encrypted session itself before it can read the URL. Again, this forwarding is done by translating the destination IP address in the packet header.
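The URL-based selection in that example could be sketched in Python like this. The server addresses, the ROUTES table, and the server_for_url function are hypothetical, and the sketch assumes the load balancer has already read the requested URL.

```python
from urllib.parse import urlsplit

# Path prefix -> real server address (assumed values).
ROUTES = [
    ("/application1", "10.0.2.1"),
    ("/application2", "10.0.2.2"),
]


def server_for_url(url: str):
    """Return the real server for a URL, or None if no rule matches."""
    path = urlsplit(url).path
    for prefix, server in ROUTES:
        # Match the prefix itself or anything underneath it.
        if path == prefix or path.startswith(prefix + "/"):
            return server
    return None


print(server_for_url("https://example.com/application1"))
print(server_for_url("https://example.com/application2/login"))
```

Matching on the prefix followed by a slash keeps a path like /application10 from being sent to the first server by accident.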
What’s next for NAT & PAT?
Back in 2015, I wrote about IPv6 and the implications it has for NAT. While the current necessity for NAT and PAT is driven in large part by the shortage of IPv4 addresses, the need for them isn’t going away any time soon for two reasons.
First, IPv6 uptake continues to be slow because features like NAT and PAT combined with private addressing have eliminated most of the pressure to convert to IPv6. The only people who really need new IPv4 addresses are service providers. And even in the distant future, it’s possible to operate the internet on IPv6, but have service providers translating and tunneling IPv4 addresses for their customers and continuing to hide the problems.
And second, one of the very useful features of NAT is that it can hide your true identity, which is important for privacy reasons. At one key point in its early history, IPv6 addressing was supposed to be free of NAT. But, at the same time, it was supposed to embed your computer’s unique 48-bit MAC address in your IPv6 address.
If we implemented the internet like this, then anybody intercepting your packets on the public parts of the internet would be able to uniquely connect your packets to your computer or mobile device, even if the payload were encrypted. This would have opened the door to a host of “big data” traffic analysis attacks against individuals.
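To see how directly the hardware address shows up, here is a Python sketch of the modified EUI-64 scheme that IPv6 defines for building the low 64 bits of an address from a MAC address: flip one bit in the first byte and insert the bytes ff:fe in the middle. The function names, the MAC address, and the documentation prefix 2001:db8::/64 are illustrative choices of mine.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Derive the 64-bit modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address like 00:1a:2b:3c:4d:5e")
    octets[0] ^= 0x02  # flip the universal/local bit
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 network prefix with the MAC-derived interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) + interface_id)


print(eui64_address("2001:db8::/64", "00:1a:2b:3c:4d:5e"))
# prints 2001:db8::21a:2bff:fe3c:4d5e
```

The last 64 bits stay the same no matter which network prefix the device moves to, which is what would make this kind of tracking possible.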
Traffic analysis can convey a lot of information. If I see a lot of pizzas being delivered to the same house, I know where the party is. I can even make a good guess at how many people are coming. I don’t need to know what the toppings are. Similarly, I can tell where your email server is, which bank you use, and even what social media services you interact with, just by looking at your packets. For an attacker, this is an excellent starting point.