“My computer’s not working!”
“I can’t connect to the internet!”
“My emails aren’t sending!”
You’re probably used to hearing common requests and complaints like these from end users. It’s our job to take these issues, troubleshoot them, trace them to a root cause, and get the user back up and running as quickly as possible. The challenge is that the problem a user describes is often so generic it’s tough to quickly interpret what’s happening.
So how do you even start to troubleshoot these types of issues?
First, it pays to be curious. Take what little information the user has shared with you and expand your scope of knowledge about the problem. Ask the user questions. An extra minute on the phone can save you hours of troubleshooting down the wrong path.
What should you be asking? Things like:
- Is just one user experiencing this connection issue or are others affected too?
- Is the user connected with a wired or wireless connection?
- Can the affected users reach the internet, internal resources, both, or neither?
- Is the issue consistent or intermittent? Is there a pattern or is it seemingly random?
Armed with this information, you can start diagnosing the issue at hand.
Am I dealing with an internal or external network issue?
If the user is trying to access a resource outside of the local network or if many users are affected by a network issue, you can quickly narrow down the issue to two likely causes: it’s either an issue with a core network element or it’s an issue outside of the network you’re responsible for managing.
Auvik’s internet connection check is the go-to feature to quickly determine whether the issue is internal or external.
Within the internet connection check, you can see if there’s a high degree of packet loss (greater than 2%). You can also see if the round trip time is significantly higher than normal (greater than 200 milliseconds) or is experiencing wild swings.
Read more: Auvik partners explain how the internet connection check helps them quickly troubleshoot internet connectivity issues.
High packet loss, a high round trip time, or wild swings in round trip time all point to an issue between Auvik and the external interface on the site’s firewall, which means the issue is likely external to your network. If packet loss and round trip time are normal, it’s likely an internal issue.
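If you want to apply the same rule of thumb to numbers you've collected yourself, here's a minimal sketch in Python. The 2% and 200 ms thresholds come from the guidance above. The function name, the input format, and the 50 ms cutoff for what counts as a "wild swing" are assumptions made for illustration, not anything Auvik exposes.

```python
from statistics import mean, pstdev

PACKET_LOSS_LIMIT_PCT = 2.0   # threshold from the article
RTT_LIMIT_MS = 200.0          # threshold from the article
RTT_SWING_LIMIT_MS = 50.0     # assumption: spread that counts as a "wild swing"


def classify_connection(packet_loss_pct, rtt_samples_ms):
    """Return 'external' if the check looks unhealthy, otherwise 'internal'."""
    if not rtt_samples_ms:
        raise ValueError("need at least one round trip time sample")
    high_loss = packet_loss_pct > PACKET_LOSS_LIMIT_PCT
    high_rtt = mean(rtt_samples_ms) > RTT_LIMIT_MS
    # Population standard deviation as a simple measure of RTT swings.
    unstable_rtt = pstdev(rtt_samples_ms) > RTT_SWING_LIMIT_MS
    if high_loss or high_rtt or unstable_rtt:
        return "external"
    return "internal"
```

Treat the result as a starting point for where to look first, not a verdict. A healthy-looking check only tells you the external path is probably fine.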
How do I troubleshoot an internal network connectivity issue?
Once you’ve ruled out an external connectivity issue, you can turn your sights to the internal network.
With many internal connectivity issues, your first glimpse at Auvik’s map will point you to the root cause right away. Take this map, for example:
I can quickly see there’s a critical Interface Down alert on a network interface, which is shown as a red line on the map.
Other times the issue may not be as obvious, but Auvik’s topology map is still the best place to start. Search for the user’s device in the map, and you can quickly see where the device is connected. If the device is showing offline in Auvik, you may have a physical connectivity issue, which can be one of the most challenging issues to troubleshoot remotely.
If the device is online, you can develop your hit list of devices this endpoint depends on for connectivity. In the example below, the highlighted device called “Auvik-core.local” is directly connected to a switch, which is also connected to a firewall that leads to the outside world. You’ll need to ensure these devices are in top shape and are performing the way they’re supposed to.
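The hit list idea can be sketched in a few lines of Python, assuming you've written down each device's uplink by hand. The `uplinks` mapping and the names "switch-1" and "firewall-1" are hypothetical. Only "Auvik-core.local" comes from the example above, and none of this reflects how Auvik stores its topology.

```python
def dependency_path(uplinks, start, max_hops=32):
    """Follow each device's uplink from `start` until there is no uplink left."""
    path = []
    seen = {start}
    current = uplinks.get(start)
    while current is not None and len(path) < max_hops:
        if current in seen:
            break  # stop on a loop in the recorded uplinks
        path.append(current)
        seen.add(current)
        current = uplinks.get(current)
    return path


# Hypothetical topology: endpoint -> switch -> firewall
uplinks = {"Auvik-core.local": "switch-1", "switch-1": "firewall-1"}
print(dependency_path(uplinks, "Auvik-core.local"))  # ['switch-1', 'firewall-1']
```

The returned list is the order in which you'd check devices, starting closest to the user.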
During your health check on these key network devices, you need to be looking for a number of things:
- What’s the throughput on the device? Is it near the max capacity? Is it much higher than normal?
- Is there one interface that’s placing the majority of the load on a switch? Could one misbehaving device or user be impacting the whole network?
- Is there more than normal broadcast traffic occurring on the switch?
- How are the key device utilization metrics compared to their norms? Has the CPU utilization increased? Is the memory near its limit?
While this may seem like a lot to check on, it only takes a matter of seconds when reviewing a device dashboard in Auvik. Clockwise from top left you can quickly see the top interfaces on a device, overall device bandwidth, the traffic mix (including how much broadcast traffic there is), and the CPU and memory utilization of the device.
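If you're working from exported numbers rather than a dashboard, the health check questions above could be expressed roughly like this. It's a sketch under stated assumptions: the `DeviceStats` fields and every cutoff (90% of capacity, 1.5x the baseline, more than half the load on one interface, 90% memory) are made up for illustration and should be tuned to your own network.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    throughput_mbps: float
    capacity_mbps: float
    broadcast_pct: float
    cpu_pct: float
    memory_pct: float
    interface_mbps: dict  # interface name -> current throughput in Mbps


def health_findings(current, baseline, near_capacity=0.9, growth=1.5,
                    dominant_share=0.5, memory_limit_pct=90.0):
    """Compare current stats to a baseline; all cutoffs are illustrative."""
    findings = []
    if current.throughput_mbps >= near_capacity * current.capacity_mbps:
        findings.append("throughput near max capacity")
    if current.throughput_mbps > growth * baseline.throughput_mbps:
        findings.append("throughput much higher than normal")
    total = sum(current.interface_mbps.values())
    if total > 0:
        name, load = max(current.interface_mbps.items(), key=lambda kv: kv[1])
        if load / total > dominant_share:
            findings.append(f"interface {name} carries most of the load")
    if current.broadcast_pct > growth * baseline.broadcast_pct:
        findings.append("more broadcast traffic than normal")
    if current.cpu_pct > growth * baseline.cpu_pct:
        findings.append("CPU utilization above its norm")
    if current.memory_pct >= memory_limit_pct:
        findings.append("memory near its limit")
    return findings
```

An empty list means nothing stood out against the baseline. It doesn't prove the device is healthy.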
If the issue recently started occurring, you can also check the network for any alerts and for changes. Has there recently been a network topology change, like a new access point or switch being added or a device going offline? A quick glance at the Severity and Alert Name columns, or a simple search through the listed alerts, will show you the most pressing alerts you need to address.
Or has a key network element recently had its configuration changed? If yes, you can use Auvik’s configuration compare feature to highlight the changes that were made and determine if this could have an impact on the user’s connectivity. You’ll quickly see any configuration elements that have been added (highlighted in green), edited (highlighted in grey), or removed (which would be highlighted in red).
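To see the idea behind a configuration compare, here's a minimal sketch using Python's standard `difflib` module on two config exports. It isn't how Auvik implements its feature, and it's simpler: an edited line shows up as one removed line plus one added line instead of being flagged as an edit. The function name is an assumption.

```python
import difflib


def config_changes(old_config, new_config):
    """Return (added, removed) lines between two configuration texts."""
    added, removed = [], []
    diff = difflib.ndiff(old_config.splitlines(), new_config.splitlines())
    for line in diff:
        if line.startswith("+ "):
            added.append(line[2:])    # present only in the new config
        elif line.startswith("- "):
            removed.append(line[2:])  # present only in the old config
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return added, removed
```

Running it on the last known-good backup and the current config gives you a short list of lines to review against the time the user's problem started.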
If there’s an issue with the configuration, you can restore the device to a previous config directly from Auvik, or you can export the config from Auvik and apply it directly to the device.
Read more: Auvik automates network configuration backups to help you manage network risk and minimize network downtime.
Everything in the network looks good—what can I do next?
If you’ve checked all these things and still haven’t found the smoking gun that’s causing your user’s connectivity issue, there’s likely an issue that’s specific to the endpoint or the application they’re trying to use. In that case, it’s time to turn your attention to the endpoint.
Network issues can be some of the nastiest to troubleshoot, especially without good network documentation. But thanks to Auvik, investigations that could previously take hours can now be done in seconds just by popping around the Auvik dashboard.
I find that the top utilization (especially CPU and Traffic) can be a big help in troubleshooting, especially for issues like broadcast storms, converged VLANs, etc. If you see one switch with a really high CPU and/or a very high port utilization, you can often simply follow the interface to determine that a switch likely has a problem, even if it’s several layers down. Even stranger might be an edge switch with a utilization rate close to that of a core switch. Often, the port utilization alone will show you the issue. To double-check, simply console in via Auvik and check the ARP cache and/or FDB. If you see a handful of MACs on a port that shouldn’t have them, you’ve found your problem.
Thanks for the tip, Eric. Love it.