What is asymmetric routing?
Asymmetric routing is a situation in which packets take one path to go from source to destination, but replies take a different path to return. Notice I called it a “situation” and not an “issue”? That’s because it’s not always a problem. It only becomes a problem where there’s something stateful in the path, like a NAT device or a firewall. Stateful devices expect the path that traffic takes between two devices to be the same in both directions (the “symmetry” part).
Quick review: what’s a “stateful” firewall?
Stateful firewalls, as the name suggests, do more than just filter packets according to rules. Put simply, they keep track of the state of every session that passes through them, so they can tell whether a packet belongs to a conversation they already know about and take action when something looks out of the ordinary.
In this case, that means that when a firewall sees the first packets in a new session, it creates an entry in an internal table specifically for that session. Assuming that all the packets pass the appropriate policy and inspection rules on the firewall, the firewall knows that it’s OK to forward these packets. When the session is done, the firewall removes the entry from this table.
This ensures that the destination can’t send arbitrary packets back to the source (a vector for malicious attacks). The only packets that are allowed through are ones that match the entry in the session table. It’s basically guaranteeing that you can connect to a website on the Internet, but that website can’t connect back to you unexpectedly.
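To make the session table idea concrete, here’s a minimal Python sketch. It’s a toy model, not how any particular firewall is implemented: the `Flow` and `SessionTable` names are made up for illustration, and server port 443 and the second client port are assumptions.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reversed(self) -> "Flow":
        # The same session seen from the other direction.
        return Flow(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


class SessionTable:
    """Toy session table: inside hosts open sessions, outside hosts may only reply."""

    def __init__(self) -> None:
        self._sessions: Set[Flow] = set()

    def outbound(self, flow: Flow) -> bool:
        # First packet of a new session from the inside: record it and forward.
        self._sessions.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        # Forward only if this is the return direction of a known session.
        return flow.reversed() in self._sessions

    def close(self, flow: Flow) -> None:
        # Session is done: remove the entry.
        self._sessions.discard(flow)


table = SessionTable()
request = Flow("192.168.100.10", 59574, "10.10.10.55", 443)  # port 443 is assumed
table.outbound(request)

reply_allowed = table.inbound(request.reversed())
unsolicited_allowed = table.inbound(Flow("10.10.10.55", 443, "192.168.100.10", 40000))
print(reply_allowed, unsolicited_allowed)
```

The reply matches the table entry and is forwarded. The unsolicited packet has no matching entry, so it’s dropped.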
What does asymmetric routing look like?
Consider the following network diagram.
The workstation at the bottom sends a packet to the server along the green path, and it goes through the firewall on the left. But, because of a routing issue in the network, the server’s response follows the red path back, taking it through the firewall on the right. And that, friends, is asymmetric routing.
This is actually a pretty common scenario. Suppose, for example, you have a pair of VPN firewalls connecting your office network to a cloud provider (AWS or Azure, for example) for redundancy. You might not even think of these firewalls as firewalls, because their primary function is terminating the VPNs.
So unless you’re very careful with your routing, you could easily develop an asymmetric routing issue. And in the case of a VPN to your cloud provider, all the IP addresses will be private, just like the example.
Also, if you’ve configured any kind of multipath load balancing between these networks, there are four possible scenarios:
- The up and down paths both go through the firewall on the left
- They both go through the firewall on the right
- The up could go through the left and down through the right
- The up could go through the right and the down through the left
So, half of your sessions will have up and down paths using different firewalls.
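If you want to check the arithmetic, this short Python sketch enumerates the combinations. It assumes each direction picks a firewall independently and with equal probability, which is a simplification of real load balancing.

```python
from itertools import product

firewalls = ["left", "right"]

# Every (up, down) pairing that multipath load balancing could pick.
combos = list(product(firewalls, repeat=2))
asymmetric = [(up, down) for up, down in combos if up != down]

for up, down in combos:
    kind = "symmetric" if up == down else "asymmetric"
    print(f"up={up:<5} down={down:<5} {kind}")

asymmetric_share = len(asymmetric) / len(combos)
print(f"asymmetric share: {asymmetric_share:.0%}")
```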
This is important because what you’ll see in practice is even more confusing than the asymmetric routing issue example that we’ll be digging into. That’s because half of the sessions will work, and the other half will fail.
How do we fix asymmetric routing issues?
Intermittent and inconsistent asymmetric routing problems are always hard to find. To get around that, we’re going to construct a Wireshark filter that isolates just the non-working session.
To start, let’s focus on just one session using a filter like this:
ip.addr==10.10.10.55 && tcp.port==59574
This filter selects all packets with the specified IP address (either source or destination) and the specified TCP port (again, either in the source or the destination).
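The “either source or destination” matching can be sketched in a few lines of Python. This only mimics the filter’s logic on packets represented as plain dictionaries. The dictionary keys, the server port 443, and the second client port 59580 are all assumptions for illustration.

```python
def matches_filter(pkt, ip="10.10.10.55", port=59574):
    """Mimic ip.addr==<ip> && tcp.port==<port> on a packet given as a dict."""
    ip_hit = ip in (pkt["src_ip"], pkt["dst_ip"])          # either end may match
    port_hit = port in (pkt["src_port"], pkt["dst_port"])  # either end may match
    return ip_hit and port_hit


client_to_server = {"src_ip": "192.168.100.10", "src_port": 59574,
                    "dst_ip": "10.10.10.55", "dst_port": 443}
server_to_client = {"src_ip": "10.10.10.55", "src_port": 443,
                    "dst_ip": "192.168.100.10", "dst_port": 59574}
other_session = {"src_ip": "192.168.100.10", "src_port": 59580,
                 "dst_ip": "10.10.10.55", "dst_port": 443}

results = [matches_filter(p) for p in (client_to_server, server_to_client, other_session)]
print(results)
```

Both directions of the session match, while a second session to the same server on a different client port doesn’t.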
Test #1: PING
Now let’s do a few experiments and look at the results in Wireshark. First, we’ll PING the server from the workstation.
This looks completely normal. The source device, 192.168.100.10, is sending PING packets to 10.10.10.55, and the destination device is responding. From this capture, it looks like we have good routing and good connectivity between the source and destination networks. Moving on.
Test #2: HTTPS
Next, let’s try to establish an HTTPS session.
That’s not working. Looks like we’ve found a problem.
Let’s get forensic
The first question: why does PING work? This is where things can get a little confusing because not every protocol is stateful. In particular, ICMP, the protocol that carries PING packets, is not stateful. There’s no handshake for the firewall to follow, just a request and a reply. It’s common to simply allow all PING requests and responses through the firewall, particularly for internal traffic.
Looking back at our network diagram, the PING request from 192.168.100.10 to 10.10.10.55 goes to the firewall on the left. The firewall creates a session table entry for this session and waits for the reply traffic. However, the reply packet comes back through the firewall on the right. The one on the right is configured to simply allow all ICMP packets, which is common for internal firewalls. The second firewall also creates a session table entry and forwards the packet back to the original source.
There are other stateless protocols that will behave similarly. For example, DNS and NTP will generally work perfectly well in this network, despite the asymmetric forwarding, because both protocols normally use UDP as their transport.
Now, look at the packet capture for the HTTPS session. The source device (the client) sends a TCP SYN packet (packet number 3). The firewall on the left creates a session table entry, forwards the packet, and waits for other packets that are part of the same session to come along. The SYN packet reaches the webserver at 10.10.10.55, and it responds with a TCP SYN ACK packet (packet number 5).
But remember, this packet is getting forwarded along the other path. In this case, the firewall on the right shouldn’t have created a session table entry and forwarded the packet because this packet has its ACK flag set, so it’s not actually the first packet of the session. But it’s not unusual for firewalls to just blindly forward anything with a SYN flag.
To summarize what we’ve seen so far:
- The SYN packet sent by the client reached the server
- The SYN ACK sent by the server in response reached the client
- The client sent the third part of the standard TCP 3-way handshake, an ACK. We see this ACK packet in the trace as packet number 6
The client device then tries to start the HTTPS TLS session on top of this TCP session, and it fails. We see the “client hello” in packet number 7. Then we see a lot of “PSH ACK” messages, which indicate that the client device is desperately trying to get this session started, but not seeing the responses that it expects.
At this point, many people looking at this trace will guess that there’s something wrong with the server’s HTTPS configuration. Maybe there’s a problem with the certificate, or maybe the webserver process isn’t communicating properly with the network stack. These are good guesses because the trace shows everything’s apparently working up until the point of establishing the TLS session.
But if we look at the trace a little more closely, we see packet number 20. The server is retransmitting the SYN ACK. Why would it retransmit this packet? It’s retransmitting because it never saw the third packet of the 3-way handshake (packet number 6).
Why didn’t it see packet number 6? Remember the firewall on the left is forwarding packets from the client to the server. This firewall is keeping track of the session. It wants to see a SYN from client to server, followed by a SYN ACK from server to client, and then an ACK from client to server. But the SYN ACK packet went along the other path. So, from the firewall’s point of view, the ACK sent from client to server was wrong. That’s not part of the session initiation. So the firewall dropped this packet.
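Here’s a rough Python sketch of that logic. It’s a toy model under stated assumptions, not the behavior of any specific firewall: `HandshakeTracker` is a made-up class, the left firewall is modeled as strict, and the right one is modeled as leniently forwarding anything with a SYN flag, as described earlier.

```python
from typing import Optional


class HandshakeTracker:
    """Toy model of one firewall following a single TCP 3-way handshake."""

    def __init__(self, name: str, lenient_syn: bool = False) -> None:
        self.name = name
        self.lenient_syn = lenient_syn      # forward any SYN-flagged packet
        self.state: Optional[str] = None    # None means no session entry yet

    def handle(self, direction: str, flags) -> bool:
        """Return True to forward the packet, False to drop it."""
        flags = frozenset(flags)
        if direction == "c2s" and flags == {"SYN"}:
            self.state = "SYN_SEEN"
            return True
        if direction == "s2c" and flags == {"SYN", "ACK"}:
            if self.state == "SYN_SEEN" or self.lenient_syn:
                self.state = "SYN_ACK_SEEN"
                return True
            return False
        if direction == "c2s" and flags == {"ACK"}:
            if self.state == "SYN_ACK_SEEN":
                self.state = "ESTABLISHED"
                return True
            return False
        return False


def run(up_fw: HandshakeTracker, down_fw: HandshakeTracker) -> dict:
    """Client packets go through up_fw, server packets through down_fw."""
    return {
        "SYN": up_fw.handle("c2s", {"SYN"}),
        "SYN-ACK": down_fw.handle("s2c", {"SYN", "ACK"}),
        "ACK": up_fw.handle("c2s", {"ACK"}),
    }


same = HandshakeTracker("left")
symmetric = run(same, same)
asymmetric = run(HandshakeTracker("left"),
                 HandshakeTracker("right", lenient_syn=True))
print("symmetric: ", symmetric)
print("asymmetric:", asymmetric)
```

On the symmetric path all three packets are forwarded. On the asymmetric path the left firewall never sees the SYN ACK, so it drops the final ACK, which matches what the trace shows.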
Finally, I want to show you something else from this same packet capture. Notice that, in the previous image, I included a filter:
ip.addr==10.10.10.55 && tcp.port==59574
I did this so we could just see one session. But I actually looked at 2 sessions at the same time.
Now you have to be careful because it’s easy to confuse the two sessions. And in a real network, there are probably dozens of sessions going on at the same time. In the noise of all these packets, it’s even easier to miss that tell-tale packet number 20.
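One way to avoid missing that clue is to look for repeated SYN ACKs explicitly. The Python sketch below does that on a hand-built list of packets. The packet numbers follow the trace described in this article, but the dictionary layout, the `pkt` helper, and the server port 443 are assumptions for illustration. In Wireshark itself, a display filter such as `tcp.flags.syn==1 && tcp.flags.ack==1` narrows the view to SYN ACK packets in a similar way.

```python
from collections import defaultdict


def pkt(no, src, sport, dst, dport, *flags):
    """Build a simplified packet record (hypothetical layout)."""
    return {"no": no, "src": src, "sport": sport,
            "dst": dst, "dport": dport, "flags": set(flags)}


def repeated_syn_acks(packets):
    """Map each flow to its SYN ACK packet numbers, keeping only repeats."""
    seen = defaultdict(list)
    for p in packets:
        if p["flags"] == {"SYN", "ACK"}:
            flow = (p["src"], p["sport"], p["dst"], p["dport"])
            seen[flow].append(p["no"])
    return {flow: nums for flow, nums in seen.items() if len(nums) > 1}


CLIENT, SERVER = "192.168.100.10", "10.10.10.55"
trace = [
    pkt(3, CLIENT, 59574, SERVER, 443, "SYN"),
    pkt(5, SERVER, 443, CLIENT, 59574, "SYN", "ACK"),
    pkt(6, CLIENT, 59574, SERVER, 443, "ACK"),
    pkt(7, CLIENT, 59574, SERVER, 443, "PSH", "ACK"),   # TLS client hello
    pkt(20, SERVER, 443, CLIENT, 59574, "SYN", "ACK"),  # retransmission
]

repeats = repeated_syn_acks(trace)
print(repeats)
```

A flow that shows up in the result sent its SYN ACK more than once, which suggests the server never received the final ACK of the handshake.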
There are a few good lessons in this. The first is that just because PING works doesn’t mean your network is routing traffic correctly.
The second lesson is that when you’re looking at a packet trace, don’t stop at the point where things appear to have broken. In this case, packet number 20, far past the point of breakdown, was the critical clue that packet number 6 was not delivered. You can’t assume that the packet was received just because the packet’s in your trace.
And the third lesson is to be careful about your filters. If you’re looking at a particular TCP session, make sure to use a filter that shows you just the session that you’re interested in.
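As a sketch of what a tighter filter might look like, this small Python helper builds a display filter string that pins down both endpoints and both ports. The `session_filter` name is made up, and the example assumes 59574 is the client’s port and 443 is the server’s port.

```python
def session_filter(client_ip: str, client_port: int,
                   server_ip: str, server_port: int) -> str:
    """Build a Wireshark display filter that matches a single TCP session."""
    return (f"ip.addr=={client_ip} && ip.addr=={server_ip} && "
            f"tcp.port=={client_port} && tcp.port=={server_port}")


# Assumed values: client port 59574, server port 443.
flt = session_filter("192.168.100.10", 59574, "10.10.10.55", 443)
print(flt)
```

Requiring both addresses and both ports leaves far less room for packets from another session to slip into the view.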
Being able to visualize your network connections in real-time is a great way to better understand how assets are connected, and spot possible routing issues as we’ve outlined here.