We all make mistakes. Hopefully we learn something from them. I’ve been at this networking racket for a while, so I’ve made a lot of mistakes. Here are some that I did manage to learn something from.
You’ll notice that a lot of the network management mistakes that bite you are actually made during the design phase.
Mistake #1: Not using logical patterns
I have a network design mantra: Anything that breaks will break at 3 a.m.
When we were still dating, the first gift my wife gave me was a blanket because she’d seen me creep into the living room in the middle of the cold night too many times. She’d come out to see what was happening and find me on the phone guiding somebody through troubleshooting a problem.
Usually the person on the other end of the phone was a computer operator whose role was running jobs on the mainframe and swapping tapes. Often I didn’t have spreadsheets with me, and the operator usually didn’t understand the questions I’d ask. So I had to figure out what was wrong based on whatever limited information he had about the problem.
It doesn’t take very many cold nights huddled in a blanket in the living room to make you question your life choices. Not being one to question my life, I questioned my network design methodology. And I came up with a rule.
Solution: Do everything in logical patterns
Do everything in obvious logical patterns. Allocate IP addresses and VLAN IDs in obvious logical patterns. Connect your WAN circuits in obvious logical patterns. Lay out your Internet DMZ in obvious logical patterns. Your LAN, your Wi-Fi networks, and everything else should follow obvious logical patterns.
The logic should be locally relevant. I can’t tell you ahead of time what it will look like. Maybe you’ll set the user VLAN on the fifth floor to be VLAN 5 with subnet 192.168.5.0/24, so the floor number, the VLAN ID, and the third octet of the address all match and you can automatically associate locally meaningful information with network parameters.
The important thing is that when the phone rings at 3 a.m. and the panicked operator screams that he can’t reach some IP address, I know reflexively which site he’s talking about, what VLAN he’s trying to access, and I might possibly even remember that there was maintenance scheduled for tonight.
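The fifth-floor example above can be sketched in a few lines of Python. This is only an illustration of one possible pattern, not a recommendation for your site: the function names `floor_plan` and `floor_for_address` are hypothetical, and the 192.168.0.0/16 range with one /24 per floor is an assumption carried over from the example.

```python
import ipaddress

# Assumption: all user VLANs live inside this range, one /24 per floor.
SITE_RANGE = ipaddress.ip_network("192.168.0.0/16")


def floor_plan(floor: int) -> dict:
    """Derive the user VLAN ID and subnet for a floor from the floor number."""
    if not 1 <= floor <= 254:
        raise ValueError("floor must be between 1 and 254")
    subnet = ipaddress.ip_network(f"192.168.{floor}.0/24")
    return {"floor": floor, "vlan_id": floor, "subnet": subnet}


def floor_for_address(address: str) -> int:
    """Reverse lookup: work out the floor (and VLAN) from an IP address."""
    ip = ipaddress.ip_address(address)
    if ip not in SITE_RANGE:
        raise ValueError(f"{address} is outside {SITE_RANGE}")
    floor = ip.packed[2]  # third octet carries the floor number
    if not 1 <= floor <= 254:
        raise ValueError(f"{address} does not map to a floor")
    return floor


if __name__ == "__main__":
    plan = floor_plan(5)
    print(plan["vlan_id"], plan["subnet"])
    print(floor_for_address("192.168.5.37"))
```

The point of the reverse lookup is that it is trivial: when the pattern is obvious, you can do the same calculation in your head at 3 a.m.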
Mistake #2: Making the subnet too small
Another important design decision that affects network management is subnet size. Making the subnet too small doesn’t seem like a big mistake, but it can be a real pain to deal with. Run out of addresses and you’ll need to readdress the segment, which means changing the addresses on every device.
For example, a /29 subnet gives you six usable addresses, which might be plenty today. But if you suddenly decide to implement a backup router with Hot Standby Router Protocol (HSRP) on that segment, the router pair needs three addresses (one for each router plus a shared virtual address), so you’ll burn through half of those addresses all at once.
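The /29 arithmetic above is easy to check with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `usable_hosts` is hypothetical, and the count of three HSRP addresses assumes one address per router interface plus one shared virtual address.

```python
import ipaddress

# Assumption: an HSRP pair occupies three addresses on the segment,
# one per router interface plus one shared virtual address.
HSRP_ADDRESSES = 3


def usable_hosts(prefix_length: int) -> int:
    """Usable host addresses in an IPv4 subnet (network and broadcast excluded)."""
    if not 1 <= prefix_length <= 30:
        raise ValueError("this sketch only handles prefixes from /1 to /30")
    # 0.0.0.0 is a placeholder base address; only the prefix length matters.
    network = ipaddress.ip_network(f"0.0.0.0/{prefix_length}")
    return network.num_addresses - 2


if __name__ == "__main__":
    for prefix in (29, 28, 27):
        total = usable_hosts(prefix)
        print(f"/{prefix}: {total} usable, {total - HSRP_ADDRESSES} left after HSRP")
```

For a /29 this reports six usable addresses and three left once the HSRP pair is in place, while a /28 still has eleven to spare.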
Solution: Think ahead when creating subnets
Make sure you have enough subnet capacity that you can easily add devices without having to either radically change your design or, worse, break the logical pattern just to fit in some new device.
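One way to build in that headroom is to size the subnet for more devices than you have today. The sketch below is an assumption-laden illustration, not a rule: the function name `prefix_for` and the default growth factor of 2 are made up for the example, so pick a factor that suits your own environment.

```python
import math


def prefix_for(devices: int, growth_factor: float = 2.0) -> int:
    """Longest IPv4 prefix that holds the devices plus room to grow.

    growth_factor is a hypothetical planning margin: 2.0 means
    "leave room for twice as many devices as I have today".
    """
    if devices < 1:
        raise ValueError("devices must be at least 1")
    if growth_factor < 1:
        raise ValueError("growth_factor must be at least 1")
    # Add two for the network and broadcast addresses.
    needed = math.ceil(devices * growth_factor) + 2
    # Smallest number of host bits whose address space covers `needed`.
    host_bits = max(2, (needed - 1).bit_length())
    if host_bits > 30:
        raise ValueError("too many devices for a single IPv4 subnet in this sketch")
    return 32 - host_bits


if __name__ == "__main__":
    for count in (3, 4, 100):
        print(f"{count} devices -> /{prefix_for(count)}")
```

With the default margin, three devices still fit in a /29, but a fourth pushes the recommendation to a /28, which is exactly the kind of decision that is cheaper to make before the segment is in production.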
Mistake #3: Locking yourself out of a device you’re working on
Some network management mistakes can’t be avoided with good design. For example, there are many entertaining ways to accidentally lock yourself out of a remote network device while working on it.
Everybody’s done this. You log into a remote router or a switch and you change the access control list (ACL) on an interface. Just as you’re hitting “return” you remember you’re logged into the device through this interface, and in that same instant, you completely lose all contact with the device.
If you’re truly creative about these things, like me, you can come up with other clever ways of locking yourself out of a device. Oh, there are obvious ones like changing routing tables so that the remote device no longer knows how to get back to you. And there are interesting ones, like changing a network address translation (NAT) rule.
But the coolest way of locking yourself out of a device actually happened to a friend of mine. He once changed the timeout value for VTY sessions to one second. I wasn’t there, but apparently once he realized what had happened, he had to log back in and type like crazy to change things back. It took several tries.
Solution: Schedule a delayed reload
If you ever need to change the configuration of a device, stop and consider whether there’s a reasonable danger that an error could cut off your access. In particular, if you’re changing an ACL or the configuration of the interface that you’re currently using to access the device, the alarm bells should be ringing. Of course, you’d never make such an error, but I might, so do this for me. If the danger exists, use a scheduled reload.
Router# reload in 10
The idea is to schedule the device to reboot itself automatically after some reasonable interval, like 10 minutes from now. If my change goes well, no problem: I just cancel the reboot.
Router# reload cancel
If my change blows up and I lose access, no problem: I just wait for the reboot. Because I was locked out before I could save the new configuration, the device reboots with the last saved version and I can try again.
Alternate solution: Give yourself out-of-band management
If I lose my primary network connection to some remote device, it’s awfully nice to be able to connect to the console over a modem or a terminal server. Then I can troubleshoot the problem without needing to dispatch an expensive technician.
Mistake #4: Failing to document your network
There was a time I was having trouble with the connection between two switches, but there were no descriptions on any of the interfaces. So I had to go to the computer room, find what I thought was the right uplink interface, and trace the cable. Naturally the cable I traced went to some other device, so I had to start over. I lost track of how many cables I traced that day.
Forgetting to update description fields can cause just as many problems as not using them at all. I once took down a large section of a WAN because I changed the configuration on the wrong interface. I believed the incorrect information that was showing in the description field.
Solution: Make extensive use of description and comment fields
Every switch and router interface should have a description saying exactly what it’s connected to. Preferably, the description includes all of the most pertinent information about the connected device: its name, the type of device including model number, what it’s for, who configured it, and when.
That way I can instantly tell what I’m looking for. Where’s my Internet firewall connected? Where’s the backup firewall? What’s this mystery port that’s configured but inactive? Can I use it or is it reserved for something?
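A quick script can at least find the interfaces that have no description at all. The following is a rough sketch that assumes an IOS-style configuration in which each `interface` line is followed by indented sub-commands. The function name, the sample configuration, and the device name in it are all hypothetical.

```python
from typing import List


def interfaces_missing_description(config_text: str) -> List[str]:
    """Return the names of interfaces that have no description line."""
    missing = []
    current = None          # interface block we are currently inside, if any
    has_description = False
    for line in config_text.splitlines():
        if line.startswith("interface "):
            # A new block starts; close out the previous one first.
            if current is not None and not has_description:
                missing.append(current)
            current = line.split(maxsplit=1)[1].strip()
            has_description = False
        elif current is not None:
            if line.startswith(" "):
                # Indented lines belong to the current interface.
                if line.strip().startswith("description "):
                    has_description = True
            else:
                # "!" or any other top-level line ends the block.
                if not has_description:
                    missing.append(current)
                current = None
    if current is not None and not has_description:
        missing.append(current)
    return missing


# Hypothetical sample configuration, for illustration only.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 description Uplink to core-sw-01 (hypothetical name)
 switchport mode trunk
!
interface GigabitEthernet0/2
 switchport mode access
!
interface GigabitEthernet0/3
 shutdown
"""

if __name__ == "__main__":
    for name in interfaces_missing_description(SAMPLE_CONFIG):
        print(f"no description: {name}")
```

A check like this only catches missing descriptions. It can't tell you whether an existing description is stale, so keeping them current still depends on discipline every time a cable moves.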
Over to you: What are some of the network management mistakes you’ve made and regretted?