A cynical network engineer might say, “configuration drift happens when you take the day off.” Someone changed something they shouldn’t have, and didn’t tell anyone. As a result, the network gets just a little less secure, and a little harder to troubleshoot.
And then it happens again. And again. And over time, all those little changes that people thought would never mean anything suddenly add up to a network that looks a lot different from what it’s supposed to be. This is just one of the many ways configuration drift creeps into networks.
To help keep networks secure, compliant, and easy to maintain, IT teams need to limit configuration drift as much as practical. While some level of configuration drift is almost unavoidable, effective communication and configuration management can help avoid and lessen the risks.
Let’s take a closer look at the different ways we run into configuration drift, and then dive into practical methods for limiting it on real-world networks.
What is configuration drift?
Configuration drift is what happens when a system’s actual state diverges from its original intended state. While configuration drift is typically associated with differences that build up between pre-production and production environments in software development, the same principles apply to network configuration.
As the name implies, it usually doesn’t happen overnight. Instead, it’s a gradual process where configurable items (like computers, but also security policies, network devices, etc.) change over time.
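To make the definition concrete, here is a minimal Python sketch that treats drift as the difference between an intended state and an actual state. The setting names and values are hypothetical, made up for illustration, and a real device exposes far more than a flat dictionary.

```python
ABSENT = "<absent>"  # placeholder for a setting that exists on only one side


def find_drift(intended: dict, actual: dict) -> dict:
    """Return {setting: (intended_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(intended.keys() | actual.keys()):
        want = intended.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical intended state for one switch, and what is actually running.
intended = {"firmware": "1.2.0", "ssh": "enabled", "telnet": "disabled"}
actual = {
    "firmware": "1.3.1",   # patched, but never documented
    "ssh": "enabled",
    "telnet": "enabled",   # re-enabled during troubleshooting
    "snmp_v1": "enabled",  # new service that was never in the design
}

for setting, (want, have) in find_drift(intended, actual).items():
    print(f"{setting}: intended={want!r}, actual={have!r}")
```

Each line it prints is one small change. None of them looks alarming alone, which is exactly why drift tends to go unnoticed.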
When a computer network is first built, engineers typically put a significant amount of time and effort into network documentation and design. A well-defined intended state is documented, secure, and compliant with relevant standards. Hardware, firmware, applications, and network topology—often down to the port level—are consistent with this “intended state” of the network. Troubleshooting, asset management, and day-to-day IT operations can be surprisingly smooth during that time.
Until one day…
You need to make a quick change to work around a software bug at a specific site. You should document it, but you have other high-priority tickets to get to, and there are only so many hours in a day. Anyway, it’s no big deal because it’s just that one site.
Then, you add a switch in the IDF.
And apply that patch to the access points…
And modify firewall rules so that a new app works…
And create an Active Directory account for the new junior network admin…
And decommission the old network devices that use SSH ciphers that fail a security audit…
And now your coworker has been putting out fires all month, making who-knows-what changes just to keep systems online…
Congratulations! You now have a network that has significant configuration drift!
That’s the slippery slope of configuration drift in a nutshell. After all, changing just one plank in the Ship of Theseus doesn’t make it a whole new ship, right? None of those things is a big deal on its own, but they add up. Now, your network documentation is less valuable, and IT has less visibility into the actual state of the network.
Common causes of configuration drift
- Software/firmware patches. Applying patches and updating network devices is an important part of maintaining a secure network. Unfortunately, they can also lead to configuration changes. For example, updating firmware on an IoT device may enable a new network service or alter a setting.
- Hardware upgrades. New devices are added to networks all the time, often under the pressure of a deadline. Follow-up tasks, like updating network documentation and validating that the devices match the intended state of the network, are frequently missed, leading to configuration drift.
- Troubleshooting and issue remediation. Let’s face it, a big part of IT is making something that’s broken work again. Whether that’s a VLAN config, an app on an end user’s PC, or a misconfigured DNS, the primary goal when troubleshooting is to get things back to a “working” state. This can mean ad-hoc configuration changes that work, but introduce configuration drift at the same time.
- Poor communication. There are two ways poor communication can contribute to configuration drift. The first is simple: someone makes a change, but doesn’t communicate it to the team or document it anywhere. Not only does this hurt knowledge transfer within the current team, but it also makes inheriting a network as a new admin a real challenge. The second is a bit more nuanced: the “intended state” of the system isn’t well-defined to begin with. This can lead to individual teams or admins making changes they think are correct, but that conflict with another stakeholder’s “intended state”.
- Changes made by end users or clients. This won’t be breaking news to anyone who has worked a help desk role, but end users tend to make unauthorized changes. Similarly, a common challenge for MSPs and clients in a co-managed IT scenario is that one party makes changes or adds devices to a network without following proper procedures.
- Documentation as an afterthought. In the day-to-day grind of network management, updating documentation tends to fall behind other items on the priority list. Most network pros acknowledge the importance of keeping documentation current, but once a system is online and working, it’s time to move on to the next urgent situation. Documentation becomes a thing to do later. Except, later never comes. Network documentation becomes stale and over time, IT stops referencing it at all, and instead depends on tribal knowledge of network configuration.
Risks of configuration drift
It’s pretty easy to understand why network configuration drift can make a network less secure, harder to manage, and slower. After all, the intended state was intended for a reason, and deviation from that probably isn’t ideal.
Additionally, many organizations follow standards and frameworks, like NIST SP 800-128, ISO 27001 A.12.1.2, and CIS Critical Security Control 4, that make the importance of configuration management clear. That’s because the risks that arise when configuration management isn’t given top priority can be profound:
- Network breaches and security threats. Security misconfigurations are in the OWASP Top 10 for a reason: One of the most common reasons for security breaches is insecure configurations. Even if you start with a hardened configuration, configuration drift can increase the risk of a network breach, and other threats, over time.
- Downtime. A misconfiguration that allows an attacker to exploit a DoS vulnerability or compromise a critical server can lead to significant downtime. But that’s not all. Suppose you change the configuration on a network device, and it causes performance issues. No problem, you have a “golden config” you can revert to, right? But if that configuration is also bad, it’s going to take a lot longer to restore service. Check out our article on change-induced downtime for a deeper dive on the topic.
- Falling out of compliance. Compliance with standards such as ISO 27001, PCI-DSS, and HIPAA requires tight security controls. Left unchecked, configuration drift can lead to you falling out of compliance.
- Degraded performance. The intended state of a configuration is, in most cases, also its most optimized state. Ad-hoc changes can hamstring network optimization efforts by introducing bottlenecks and conflicts.
- Wasted time. Troubleshooting a network you don’t know much about, or one that doesn’t match your network documentation, can be a major time sink. That means in addition to potentially causing downtime for users, configuration drift can leave IT troubleshooting issues that would have been simpler to address (or wouldn’t exist at all) if the network were in its intended state.
How to prevent configuration drift and mitigate risk
Alright, it’s clear that configuration drift is both real and risky. But what can you do to prevent it? Here are the methods that succeed on real-world networks.
- Automate, automate, automate. The common component of any good configuration management tool or process is automation. In fact, all the rest of our tips work best if you automate them. Manual management and updates are a recipe for configuration drift over time. Automation takes the human element out of it. For example, with automated backup, network devices are scanned regularly for configuration changes. If a change is detected in a running config, it’s backed up, and the old config is logged and stored for review. Within Auvik, you can drill down and see changes directly from within the UI as well. Compare that with how error-prone and time-consuming the same workflow would be if you had to do it manually.
- Maintain your device inventory. Maintaining an up-to-date device inventory is a key factor in reducing configuration drift. By automating the network discovery process, you can quickly compare the assets on the network to the assets that should be on the network and take corrective action if needed.
- Create baselines. To avoid questions about the “intended state” of your network, your up-to-date inventory of network assets should always include a repository of baseline configurations.
- Monitor continuously and audit regularly. Improving your network visibility improves your ability to detect and correct configuration drift. In most modern networks, it’s fair to say some amount of configuration drift will occur, even with good configuration management practices. New devices are connected to the network. Admins make ad-hoc changes. It’s the nature of the beast in network management. Continuous network monitoring and regular network audits help you become proactive about drift management.
- Have a formal change management process. Change management is a fundamental IT Service Management (ITSM) process. Ensure that your teams are disciplined about defining and adhering to formal change management procedures. Following the right steps the first time significantly reduces the ad-hoc changes that often contribute to configuration drift.
- Prioritize documentation and communication. When changes do occur, ensure they are clearly communicated to, and documented for, everyone who needs to know. Automatic network documentation and mapping tools do a lot of the heavy lifting towards providing a single source of truth, but there is a human element too. Some changes have to occur on the fly. When that happens, make sure your team members communicate what changed and update each other accordingly.
Make documenting changes part of the “definition of done” for your IT tickets.
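The automation, inventory, and baseline tips above boil down to two comparisons: what is on the network versus what should be, and what a device is running versus its baseline. Here is a rough Python sketch of both, using only the standard library. The device names and config lines are hypothetical, and this is an illustration of the idea rather than how Auvik or any other tool implements it.

```python
from __future__ import annotations

import difflib
import hashlib


def fingerprint(config_text: str) -> str:
    """Hash a config so two copies can be compared cheaply."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def config_diff(baseline: str, running: str) -> list[str]:
    """Return a unified diff of baseline vs. running, or [] if unchanged."""
    if fingerprint(baseline) == fingerprint(running):
        return []
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            running.splitlines(),
            fromfile="baseline",
            tofile="running",
            lineterm="",
        )
    )


def inventory_gaps(expected: set[str], discovered: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) device names."""
    return expected - discovered, discovered - expected


# Hypothetical baseline and running configs for one switch.
baseline = "hostname sw-01\nno ip http server\nip ssh version 2\n"
running = "hostname sw-01\nip http server\nip ssh version 2\n"
diff = config_diff(baseline, running)
print("\n".join(diff))

# Hypothetical inventory: what should exist vs. what discovery found.
expected = {"sw-01", "fw-01", "ap-01"}
discovered = {"sw-01", "fw-01", "sw-99"}
missing, unexpected = inventory_gaps(expected, discovered)
print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

In practice you would run checks like these on a schedule and attach the resulting diff to a ticket, which also makes the documentation step harder to skip.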
There’s no silver bullet when it comes to eliminating network configuration drift. Given the complexity of our modern networks, configuration drift is almost unavoidable. But by introducing automation into your network documentation, monitoring, and configuration backup processes, you can reduce your exposure to configuration drift and react quickly when it occurs.
With Auvik, you’ll gain the confidence of knowing that your network discovery and inventory management are dynamic, up-to-date, and always in one centralized location. Take our risk-free 14-day trial now, and see what happens when you let intelligent automation manage your configuration.