Software-defined Wide Area Networks (SD-WANs) are getting a lot of buzz these days, and many vendors have jumped on the bandwagon with new products trying to capitalize on the trend.

The basic idea of an SD-WAN is to reduce MPLS or other fixed internal WAN circuit costs by shunting some or all of your traffic over encrypted tunnels through the Internet. In some deployment models, the existing private WAN is completely replaced by one or two cheap Internet circuits at every site.

An SD-WAN requires a few key components.

  • First, a special device is required at every site to terminate the encrypted tunnels and decide which traffic flows should use which links.
  • Second, the devices need the ability to automatically and dynamically create the tunnels, particularly in cases where you need any-to-any communications between your sites.
  • Third, you need a central management system where you’ll define your traffic engineering rules.

The business case for SD-WAN: Cost-savings

Cost-savings are the driving reason behind SD-WAN. Internet circuits are cheaper than private MPLS WANs. Even though Internet circuits come with a host of security and congestion issues that are absent in private WANs, the cost difference is so great that it’s often cheaper to buy a great big Internet pipe and a firewall at every single site than to install even a modestly sized MPLS circuit at each site.

The cost difference is even better when you consider that, for most networks, the bulk of your branch traffic is actually destined for the Internet anyway. If I offload that traffic at the remote site and put it directly on the Internet instead of hauling it back across an expensive WAN, I save a lot of WAN bandwidth. I can also reduce the size of my central Internet circuit.

The main reason for back-hauling Internet-bound traffic is to inspect it on the big firewall (or IDS/IPS, sandboxes, etc.) at a central data center. Offloading that traffic at the branch bypasses the central inspection point, so I would never implement an SD-WAN without putting some sort of small next-generation firewall at every site as well.

To recap: The business case for an SD-WAN is replacing an expensive private WAN with cheap Internet circuits. The cost-savings are partially offset by the additional costs of the SD-WAN appliances and firewalls at every site.
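
The arithmetic behind that recap can be sketched in a few lines. This is a rough model, not a pricing tool: the function names, the parameters, and the idea of spreading hardware cost evenly over an assumed lifetime are all my own simplifications, and you would need to plug in real quotes from your carriers and vendors.

```python
def annual_wan_cost(sites, monthly_circuit_cost, circuits_per_site=1,
                    hardware_per_site=0.0, hardware_lifetime_years=1):
    """Annualized cost: recurring circuit charges plus amortized per-site hardware."""
    recurring = sites * circuits_per_site * monthly_circuit_cost * 12
    # Assumption: hardware cost is spread evenly over its lifetime.
    hardware = sites * hardware_per_site / hardware_lifetime_years
    return recurring + hardware


def sdwan_annual_saving(sites, mpls_monthly, internet_monthly,
                        internet_circuits_per_site, hardware_per_site,
                        hardware_lifetime_years):
    """Private WAN cost minus SD-WAN cost (Internet circuits + appliance + firewall)."""
    private_wan = annual_wan_cost(sites, mpls_monthly)
    sdwan = annual_wan_cost(sites, internet_monthly, internet_circuits_per_site,
                            hardware_per_site, hardware_lifetime_years)
    return private_wan - sdwan


# Purely hypothetical inputs, chosen only to exercise the formula.
saving = sdwan_annual_saving(sites=20, mpls_monthly=1000, internet_monthly=200,
                             internet_circuits_per_site=2, hardware_per_site=3000,
                             hardware_lifetime_years=5)
print(saving)
```

If the result is small or negative for your real numbers, that is the same signal as having no expensive private WAN to begin with.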

If there’s no expensive private WAN already in place, there’s likely no business case for SD-WAN.

[Image: piggy bank. Photo: Ryan Hyde on Flickr]

SD-WAN vendors to choose from

There are many different SD-WAN solutions on the market today with varying levels of maturity and cost, as well as different architectural models and features. If you’re looking at SD-WAN, it makes a lot of sense to compare the different vendors to see which is the best match to your organization.

Some of these vendors work primarily as cloud services, for which the main costs are operational (opex). Others expect you to buy the hardware and incur a primarily capital expense (capex).

Ongoing WAN charges generally appear as opex. So if your organization prefers—as many do—capex to opex, then your business case for SD-WAN is strengthened by replacing existing ongoing WAN charges with one-time fixed costs.

The biggest players in the SD-WAN market in 2017 appear to be Cisco iWAN, Talari, Silver Peak, and Riverbed. In addition, other companies such as Citrix have added SD-WAN capabilities to their product suites. And there are many smaller companies like Aerohive and Viptela that have mature products with good features.

In fact, I believe there are currently too many small vendors for the relatively new SD-WAN market. I expect to see many of the best small vendors bought up by the bigger players over the next couple of years as the market expands.

Physical vs virtual networks

Throughout this discussion, it’s important to distinguish between the physical and virtual networks. In most cases, the physical network is composed of one or more (likely two) Internet circuits.

There are other options. You could use a private network such as an MPLS WAN for at least one of your physical networks, but this starts to eat away at your SD-WAN cost-savings. So we’ll assume we’re talking about two Internet circuits per site.

The biggest reasons for wanting multiple circuits are redundancy and load sharing. Some SD-WAN implementations are able to detect congestion on one link and shift traffic to the other to compensate. If I’m only using one Internet circuit at each site, it’s a lot simpler and cheaper to put a small firewall at each site and nail up an IPsec VPN to the data center.

Layered on top of the physical Internet links will be a mesh of virtual links among your sites. The virtual links are just like any VPN, but they’re generally created dynamically, as we’ll discuss in a moment. I can build an arbitrary number of virtual networks on top of a physical network, but the best and most common models use only one or two virtual networks.

If I have one mesh of VPNs, then I can make the VPNs transparently share traffic between the physical links. If one link fails, I can move the traffic to another. But the full SD-WAN magic comes into play if I have two or more sets of VPNs at each site. Then I can do traffic engineering and dynamically route traffic based on the application, as I’ll explain a little later.

SD-WAN devices and circuits

SD-WAN solutions include a special device at every site. In Cisco’s iWAN solution, the special device is simply a router. In other solutions, it’s a separate special purpose appliance. In all cases, it’s where the VPNs terminate and where the routing decisions are made.

The SD-WAN appliance is generally controlled from a central server that defines the rules. For example: if you want to connect to internal IP address A over protocol X, first create a VPN to this public IP address using these credentials. That’s considerably more complicated than a routing table, but it’s largely static information, so it can be pushed down to the appliances at each site.

The rules could also include secondary path information that the appliance can use in case a link is not available. And in some solutions, there are additional rules for congestion avoidance: switch this traffic to the secondary path if the primary path fails to meet a set of pre-defined performance thresholds.
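
To make the idea of a pushed-down rule with a fallback path more concrete, here is a minimal sketch. It is not any vendor’s actual rule format: the `Rule` and `PathMetrics` classes, their field names, and the latency and loss thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PathMetrics:
    up: bool            # is the tunnel/link currently reachable?
    latency_ms: float   # measured latency
    loss_pct: float     # measured packet loss


@dataclass
class Rule:
    dest_prefix: str    # internal destination this rule applies to
    protocol: str       # protocol the rule matches
    primary: str        # name of the preferred path
    secondary: str      # name of the fallback path
    max_latency_ms: float
    max_loss_pct: float


def choose_path(rule: Rule, metrics: Dict[str, PathMetrics]) -> Optional[str]:
    """Prefer the primary path; fall back when it is down or misses its thresholds."""
    primary = metrics.get(rule.primary)
    secondary = metrics.get(rule.secondary)

    def healthy(m: Optional[PathMetrics]) -> bool:
        return (m is not None and m.up
                and m.latency_ms <= rule.max_latency_ms
                and m.loss_pct <= rule.max_loss_pct)

    if healthy(primary):
        return rule.primary
    if secondary is not None and secondary.up:
        return rule.secondary
    if primary is not None and primary.up:
        return rule.primary   # degraded, but the only path left
    return None               # no usable path


rule = Rule("10.1.0.0/16", "tcp", "isp_a", "isp_b",
            max_latency_ms=150, max_loss_pct=1.0)
print(choose_path(rule, {"isp_a": PathMetrics(True, 300, 0.1),
                         "isp_b": PathMetrics(True, 40, 0.0)}))
```

The rule itself stays static and can be distributed centrally; only the metrics change at run time, which is why the decision has to be made on the appliance at each site.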

Firewalls are necessary

The usual deployment models show the special SD-WAN device with “WAN” or “outside” legs connected directly to Internet circuits. I strongly advise against doing this. Please, put a firewall between the SD-WAN device and the Internet circuit.

[Image: firewall flames. Photo: Eric E Castro on Flickr]

Vendors will assure you their device is every bit as secure as a firewall. That’s untrue. When I deploy a firewall, I always direct its log messages to a central server so I can monitor the things attacking me from the outside, as well as anything that might have already compromised my internal security and is trying to reach back to a command-and-control network.

Firewalls give exceptionally detailed information about every session they allow through as well as everything they drop. At best, a router will tell you about the successful session. It won’t tell you the successful session happened after 50 failed attempts with different credentials. A successful connection looks like a good thing. A successful connection following 50 failed attempts means you’ve been breached.
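
As a small illustration of why the dropped and failed sessions matter, here is a sketch of the kind of check you could run against centrally collected logs. The event format (a source address plus a "fail" or "success" outcome, in time order) is an assumption made for the example, not any firewall’s real log schema, and the threshold of 50 simply mirrors the number used above.

```python
from collections import defaultdict


def suspicious_logins(events, threshold=50):
    """Flag sources whose successful session follows `threshold` or more failures.

    `events` is an iterable of (source_ip, outcome) tuples in time order,
    where outcome is "fail" or "success" (hypothetical format).
    """
    failures = defaultdict(int)
    flagged = []
    for source, outcome in events:
        if outcome == "fail":
            failures[source] += 1
        elif outcome == "success":
            if failures[source] >= threshold:
                flagged.append((source, failures[source]))
            failures[source] = 0   # reset the streak after any success
    return flagged


# Documentation-range addresses used as stand-ins.
events = [("203.0.113.9", "fail")] * 50
events += [("203.0.113.9", "success"), ("198.51.100.7", "success")]
print(suspicious_logins(events))
```

A device that only records the successful sessions would feed this function two "success" events and nothing else, and the check would have nothing to flag.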

If you’re also off-loading Internet bound traffic through the links, then you really want the ability to inspect that traffic and make sure internal user workstations aren’t connecting to command-and-control networks. Once again, a router won’t tell you that. An SD-WAN appliance won’t tell you that. Only a firewall gives you that visibility.

Creating a mesh

In Cisco’s SD-WAN model (iWAN), the tunnels are handled using the existing DMVPN (Dynamic Multi-point Virtual Private Network) protocol. Other vendors use other methods for finding the peer sites and dynamically creating encrypted VPN tunnels between them.

The dynamic nature is extremely important because a fully meshed network doesn’t scale well. In a network with N sites, a full mesh requires N(N-1)/2 separate links.

If you have even a modest number of sites (say, 10), you’d have to configure 45 IPsec VPNs. That’s already unmanageable. If you had hundreds or thousands of sites, every device would have to hold tunnel state for every other site just in case they might ever want to communicate, and it would be impossible to maintain the memory resources for that.

The other option, of course, is to bounce all site-to-site communications through a central hub. But that introduces a lot of unwanted latency. It also requires an unreasonably large amount of bandwidth on your central site’s Internet circuit.
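
The scaling difference between the two static designs is easy to check. The sketch below just evaluates N(N-1)/2 for a full mesh and, for comparison, counts one tunnel per remote site for a single-hub design; the function names are mine.

```python
def full_mesh_tunnels(sites: int) -> int:
    """Tunnels needed so every site connects directly to every other: N(N-1)/2."""
    if sites < 0:
        raise ValueError("sites must be non-negative")
    return sites * (sites - 1) // 2


def hub_and_spoke_tunnels(sites: int) -> int:
    """Tunnels needed when every non-hub site connects only to one hub."""
    if sites < 0:
        raise ValueError("sites must be non-negative")
    return max(sites - 1, 0)


for n in (10, 100, 1000):
    print(n, full_mesh_tunnels(n), hub_and_spoke_tunnels(n))
```

The full mesh grows with the square of the number of sites while the hub design grows linearly, which is the trade-off between tunnel count and hub latency described above.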

DMVPN uses a clever mechanism in which every remote spoke site is connected to a central hub site all the time. Whenever a spoke site needs to communicate with another spoke site, it starts by sending those packets through the hub. The hub forwards the packets, but it also sends information down to the spokes telling them to create a new spoke-to-spoke VPN. The spokes will continue to use the VPN until there’s no more traffic, at which point they tear down the spoke-to-spoke VPN to save resources.
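
The behavior described above can be modeled with a toy simulation. To be clear, this is not DMVPN or its signaling: the `Overlay` class, its method names, and the 60-second idle timeout are invented for illustration, and the model simply assumes the direct tunnel is ready by the time the second packet is sent.

```python
class Overlay:
    """Toy model: spokes always reach the hub; spoke-to-spoke tunnels are on demand."""

    def __init__(self, hub, spokes, idle_timeout=60.0):
        self.hub = hub
        self.spokes = set(spokes)
        self.idle_timeout = idle_timeout   # hypothetical teardown timer, in seconds
        self.direct = {}                   # frozenset({a, b}) -> time last used

    def expire(self, now):
        """Tear down spoke-to-spoke tunnels that have been idle too long."""
        idle = [pair for pair, last in self.direct.items()
                if now - last > self.idle_timeout]
        for pair in idle:
            del self.direct[pair]

    def send(self, src, dst, now):
        """Return the hops a packet takes from src to dst at time `now`."""
        if src == dst or src not in self.spokes or dst not in self.spokes:
            raise ValueError("src and dst must be two different spokes")
        self.expire(now)
        pair = frozenset((src, dst))
        if pair in self.direct:
            self.direct[pair] = now        # refresh the idle timer
            return [src, dst]
        # First packet goes through the hub, which also tells the spokes
        # to build a direct tunnel for the packets that follow.
        self.direct[pair] = now
        return [src, self.hub, dst]


net = Overlay("hub", ["site_a", "site_b", "site_c"])
print(net.send("site_a", "site_b", now=0))     # first packet: via the hub
print(net.send("site_a", "site_b", now=10))    # tunnel exists: direct
print(net.send("site_a", "site_b", now=100))   # idle too long: via the hub again
```

The point of the model is the size of `direct`: it only ever holds the pairs that are actively talking, rather than every possible pair.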

DMVPN has the virtue of being an existing well-defined protocol with mature and stable code. So it’s a natural way of creating a mesh with Cisco devices.

In fact, you could create a DMVPN network using standard Cisco routers without needing any special iWAN features. What the iWAN layer gives you is the ability to create multiple DMVPN meshes connecting the routers, and the ability to dynamically route traffic over the virtual overlay networks based on protocol information and performance metrics. That second capability is called traffic engineering.

[Image: traffic engineering. Photo: Jo on Flickr]

Traffic engineering is a big benefit

Traffic engineering is the process of defining how you want traffic routed based on information other than simple IP destination addresses. It implies there are multiple paths to a destination and that different traffic will take different paths based on some higher level information.

The routing decision might be based on a Quality of Service (QoS) marking or another parameter like a protocol number. It might even be based on a deep packet analysis of the protocol information in the data payload of the packet.

Often, traffic engineering also includes a performance aspect. For example, it could include the ability to send only a certain amount of traffic over its primary link. Excess traffic could be redirected to a secondary link or perhaps dropped. There could even be the ability to detect congestion on a link and automatically redirect traffic to avoid that congestion.
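
Here is a minimal sketch of those two ingredients together: classifying a packet on something other than its destination address, then capping how much traffic a link will take before the excess spills to a secondary link or gets dropped. The packet dictionary, the class names, the policy table, and the per-interval byte budgets are all hypothetical; the only real-world values are DSCP 46 (Expedited Forwarding) and IP protocol number 6 (TCP).

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    budget_bytes: int   # hypothetical cap on traffic per measurement interval
    used: int = 0


def classify(packet):
    """Pick a traffic class from the QoS marking or protocol fields."""
    if packet.get("dscp") == 46:                        # Expedited Forwarding
        return "voice"
    if packet.get("protocol") == 6 and packet.get("dst_port") == 443:
        return "web"                                    # TCP to port 443
    return "default"


def route(packet, policy, links):
    """Return the link name chosen for the packet, or None if it is dropped."""
    primary, secondary = policy[classify(packet)]
    for name in (primary, secondary):
        if name is None:
            continue
        link = links[name]
        if link.used + packet["size"] <= link.budget_bytes:
            link.used += packet["size"]
            return name
    return None   # over budget everywhere it is allowed to go


links = {"isp_a": Link("isp_a", 1000), "isp_b": Link("isp_b", 1000)}
policy = {
    "voice": ("isp_a", "isp_b"),    # overflow to the second circuit
    "web": ("isp_b", "isp_a"),
    "default": ("isp_b", None),     # no fallback: excess is dropped
}
print(route({"dscp": 46, "size": 600}, policy, links))
print(route({"dscp": 46, "size": 600}, policy, links))
print(route({"protocol": 17, "size": 600}, policy, links))
```

Real products measure rates continuously and may also react to detected congestion; a fixed budget per interval is just the simplest way to show the overflow behavior.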

Final SD-WAN considerations

SD-WAN has a lot of very attractive features. Since the business case is primarily cost-savings that result from eliminating or downsizing an existing MPLS or leased-line WAN (or perhaps from avoiding the implementation of a new one), your starting point should be a detailed understanding of relative costs. The SD-WAN offerings from different vendors have some fairly significant differences in both features and cost.

The other thing I haven’t really discussed but that should be self-evident is that SD-WAN implementations are inherently much more complex than more traditional WANs. Complexity should always be a serious consideration in network design, because complexity leads to human error in configuration and long delays in troubleshooting.

In particular, one of the coolest features of SD-WAN implementations is the traffic engineering capability. But traffic engineering leads to situations where standard troubleshooting tools like ping and traceroute send packets down a different path from the one taken by application traffic. This leads to situations where you have connectivity but the application still doesn’t work. Will the junior network analyst on pager duty at 3 a.m. be able to figure this out?

If your organization is new to SD-WAN, it might be smart to start with very simple and easily remembered policies. Reducing complexity is always a good idea.

Finally, it’s a good idea to compare SD-WAN costs and implementation complexity against a traditional spoke-to-hub VPN configuration, as well as against an expensive private MPLS WAN.

The traditional VPN model is likely to be even less expensive than SD-WAN. However, it will lack a lot of the redundancy, traffic engineering, and fault tolerance features of SD-WAN. The static VPN configurations will also generally be more difficult to maintain.

However, particularly in the case of Cisco iWAN, your point of comparison could easily be a straight DMVPN network. If the implementation model involves only a single Internet circuit at each remote site, there’s little to be gained in the way of fault tolerance.