The Graylog blog

Metrics For Investigating Network Performance Issues

When the world went remote in March 2020, cloud technologies made work possible. Rapid digital transformation changed everyone’s jobs, whether in-office, remote, or hybrid. Today, your business relies on network speed for everything from productivity to customer service. Keeping your company’s services running means making sure you have low-latency connectivity across data centers, users, and the cloud.

You need to investigate network performance issues quickly, resolve them as fast as possible, and prove that they are resolved.

Why do you experience slow network performance?

In traditional computing models, information flowed from a central data center. Users and devices connected from their locations, no matter where that was. With distributed users and devices, this model no longer provides adequate connectivity. As information flows into and out of a central data center, networks become overloaded, slowing them down. 

While network performance issues can come from data transfer volume alone, you usually need to investigate further to find the root cause and fix the underlying issue. Start by asking:

  • Are the impacted users internal or external?
  • Is the network environment a Local Area Network (LAN) or Wide Area Network (WAN)?
  • Is the network cable, fiber, or wireless?
  • What are the types of switches?
  • Are reports coming from the same network segment?

7 Common Network Performance Problems

When investigating network performance problems, understanding some of the most common types can help you resolve the issue faster. 

Central Processing Unit (CPU) Usage

Since the CPU is responsible for receiving and processing instructions, high CPU usage on a network device often indicates that it is handling too much traffic. For example, people may find that applications run slower or processes take longer than usual.

Bandwidth Usage

Bandwidth refers to a network’s capacity to transmit data between devices. The more bandwidth you have, the faster the data travels. However, bandwidth is a finite resource so if one device is transferring a larger-than-usual amount of data, other users and devices experience slower speeds.
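
As a rough illustration of spotting one device that is using a disproportionate share of bandwidth, the sketch below ranks devices by their share of total bytes transferred. The function name `top_talkers`, the input dictionary, and the 50% threshold are all assumptions made for this example. In practice, the byte counts would come from whatever flow or interface data you already collect.

```python
def top_talkers(bytes_by_device, share_threshold=0.5):
    """Return (device, share) pairs for devices at or above the threshold.

    bytes_by_device: hypothetical mapping of device name -> bytes transferred
    share_threshold: assumed cutoff, as a fraction of total traffic
    """
    total = sum(bytes_by_device.values())
    if total == 0:
        return []
    heavy = [
        (device, count / total)
        for device, count in bytes_by_device.items()
        if count / total >= share_threshold
    ]
    # Largest share first
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```

A sensible threshold depends on your environment, so treat the default here as a placeholder rather than a recommendation.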

Physical Connectivity Issues

When your network cables or connectors are defective, you often experience slower network speed. Even wireless networks start with cabling somewhere, so faulty connections or hardware problems could be the culprit. 

Disabled Device

In some cases, a network device can end up disabled. You need to make sure that the following devices are online:

  • Switches
  • Routers
  • Bridges
  • Gateways
  • Modems
  • Repeaters
  • Access points

Network Hardware Misconfigurations

Devices also need to be configured appropriately. Installing a device, updating its firmware, or reconfiguring it can all create performance issues.

Domain Name System (DNS) Problems

Websites and web-based applications use DNS to match domain names with IP addresses so that users can easily access them. If your network appears available to you but not to users, then you might have a DNS issue causing the network problems. 
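
One quick way to check whether name resolution is slow or failing is to time a lookup. The following is a minimal sketch that uses Python's standard `socket.getaddrinfo` call, which goes through the system resolver. The function name `time_dns_lookup` is an assumption for this example, and results can be affected by local caching.

```python
import socket
import time


def time_dns_lookup(hostname):
    """Return (elapsed_ms, addresses) for one lookup via the system resolver.

    Returns (None, []) if the name cannot be resolved.
    """
    start = time.perf_counter()
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None, []
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    addresses = sorted({info[4][0] for info in results})
    return elapsed_ms, addresses
```

Running the same lookup from an affected user's network segment and from your own can help show whether the two are getting different answers or different response times.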

Shadow IT

Depending on your network access controls, employees or visitors could be running devices on your network without you realizing it. These devices consume bandwidth, which could slow down your network.
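
A simple way to surface unexpected devices is to compare the hardware addresses seen on the network against an inventory of approved ones. The sketch below assumes you already have both lists. The function name `find_unknown_devices` and the sample addresses in the usage comment are hypothetical.

```python
def find_unknown_devices(observed_macs, inventory_macs):
    """Return observed MAC addresses that are not in the approved inventory.

    Comparison is case-insensitive; both inputs are assumed to use the
    same separator style (for example, colons).
    """
    known = {mac.lower() for mac in inventory_macs}
    observed = {mac.lower() for mac in observed_macs}
    return sorted(observed - known)


# Hypothetical usage:
# find_unknown_devices(["AA:BB:CC:00:00:01", "AA:BB:CC:00:00:09"],
#                      ["aa:bb:cc:00:00:01"])
```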

Network Metrics that Help Investigate Network Performance Root Cause Issues

You need to investigate network performance issues quickly to reduce business disruption. Further, you can use these same metrics to monitor network speed and ensure reliability. Continuously monitoring your network can reduce both outages and latency.

Latency

Latency is how long data packets take to travel across the network to their final destination. It is commonly measured as round trip time (RTT) in milliseconds: the time data takes to travel from the source to its destination and back, which gives you a measurement of network speed and reliability.

This metric provides visibility into delays caused by bandwidth congestion. 
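
As a rough way to sample RTT without special privileges, you can time how long a TCP connection takes to establish, since the handshake completes after about one round trip. This is a minimal sketch, not a replacement for a dedicated monitoring tool. The function name `tcp_connect_rtt_ms` is an assumption for this example, and the host and port you pass in are up to you.

```python
import socket
import time


def tcp_connect_rtt_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure.

    The connect call returns once the handshake completes, so the elapsed
    time approximates one round trip plus any name resolution time.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        # Covers refused connections, timeouts, and resolution errors
        return None
```

Passing an IP address rather than a hostname keeps DNS lookup time out of the measurement.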

Jitter

Also measured in milliseconds, jitter is the variation in latency from one packet to the next. High jitter can disrupt the normal data packet sequence. While latency gives you a single metric around speed, jitter provides more information about what’s causing slower travel times.

This metric provides visibility into issues like:

  • Network congestion
  • Route modifications
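
One common way to put a number on jitter is the average absolute difference between consecutive latency samples. The sketch below uses that definition. It is one of several in use, so check which one your tools report. The function name `mean_jitter_ms` and the sample values are assumptions for this example.

```python
def mean_jitter_ms(rtt_samples_ms):
    """Return the mean absolute difference between consecutive RTT samples."""
    if len(rtt_samples_ms) < 2:
        return 0.0
    deltas = [
        abs(later - earlier)
        for earlier, later in zip(rtt_samples_ms, rtt_samples_ms[1:])
    ]
    return sum(deltas) / len(deltas)


# Made-up samples: differences are 2, 1, and 9 ms, so the mean is 4.0 ms
example_jitter = mean_jitter_ms([20, 22, 21, 30])
```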

Packet loss

When you send and receive data across a network, it gets broken into smaller pieces called packets, then put back together when it gets to its final destination. Packet loss means that some of those packets never arrived. It is related to both jitter and latency because lost packets often have to be sent again, which adds delay. Packet loss information gives you greater insight into jitter, ultimately helping you understand network latency issues.

This metric provides visibility into issues like:

  • Network congestion
  • Device malfunctions
  • Software and firmware bugs
  • Overtaxed devices
  • Misconfigurations
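
Packet loss is usually reported as the percentage of sent packets that did not arrive. A minimal sketch of that arithmetic, assuming you already have sent and received counts from a probe or device counters (the function name `packet_loss_percent` is hypothetical):

```python
def packet_loss_percent(sent, received):
    """Return the percentage of sent packets that were not received."""
    if sent <= 0:
        raise ValueError("sent must be a positive packet count")
    # Duplicates can push received above sent; treat that as zero loss
    lost = max(sent - received, 0)
    return 100.0 * lost / sent


# Made-up counts: 6 of 200 packets missing is 3.0 percent loss
example_loss = packet_loss_percent(200, 194)
```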

Packet reordering

Once packets get to the recipient host, they need to be put back in the right order so that the information makes sense. Packet reordering is when they don’t get to the recipient in the right order. Also called packet out-of-order or packet out-of-sequence, this problem is related to jitter because having to put the packets back in the right order takes extra time. 

This metric provides visibility into issues like:

  • Routing problems
  • Application quality of service
  • Load balancer misconfigurations
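
If you have the sequence numbers of packets in the order they arrived, for example from a packet capture, you can count how many arrived after a packet with a higher sequence number. The sketch below shows one simple way to do that. It assumes sequence numbers that only increase at the sender and ignores wraparound, and the function name `count_reordered` is hypothetical.

```python
def count_reordered(sequence_numbers):
    """Count packets that arrived after a higher-numbered packet."""
    highest_seen = None
    reordered = 0
    for seq in sequence_numbers:
        if highest_seen is not None and seq < highest_seen:
            # This packet was overtaken by a later one
            reordered += 1
        else:
            highest_seen = seq
    return reordered


# Made-up arrival order: packets 3 and 6 each arrive late
example_reordered = count_reordered([1, 2, 4, 3, 5, 7, 6])
```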

Throughput

Throughput is the amount of data that your network actually transmits and processes in a given period. While bandwidth focuses on the maximum amount of data a network could handle, throughput gives you insight into how much data it actually handles so you can understand real-time performance. Even a high-performing network will have a throughput lower than its bandwidth. However, if you track trends over time, you can determine whether current throughput is the same as or lower than normal.

This metric provides visibility into issues like:

  • Network congestion
  • Device malfunctions
  • Application quality of service
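
To make the difference between throughput and bandwidth concrete, the sketch below derives throughput from two readings of a byte counter and compares it with the link's rated bandwidth. The function names and the sample figures are assumptions for this example. The counter readings would come from whatever interface statistics you collect.

```python
def throughput_mbps(bytes_start, bytes_end, interval_seconds):
    """Return throughput in megabits per second between two counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    bits = (bytes_end - bytes_start) * 8
    return bits / interval_seconds / 1_000_000


def utilization_percent(measured_mbps, bandwidth_mbps):
    """Return measured throughput as a percentage of rated bandwidth."""
    return 100.0 * measured_mbps / bandwidth_mbps


# Made-up reading: 75 MB in 10 seconds on a 100 Mbps link
example_throughput = throughput_mbps(0, 75_000_000, 10)
example_utilization = utilization_percent(example_throughput, 100)
```

Comparing values like these against a historical baseline is what shows whether current throughput is lower than normal.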

Packet duplication

If a recipient host doesn’t send an acknowledgement that a packet arrived, the sending host may resend that information because the missing acknowledgement suggests packet loss. Sometimes, duplicate packets are also created somewhere along the network. Packet duplication undermines the integrity of your network analytics and consumes your monitoring tools’ bandwidth, packet capture storage, and processing power.

This metric provides visibility into issues like:

  • Software or firmware bugs
  • Misconfigurations
  • Routing problems
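
Given a list of packet identifiers from a capture, counting the extra copies of each one is straightforward. This is a minimal sketch that assumes each packet can be identified by a single hashable value, such as a sequence number. The function name `duplicate_packets` is hypothetical.

```python
from collections import Counter


def duplicate_packets(packet_ids):
    """Return a mapping of packet id -> number of extra copies seen."""
    counts = Counter(packet_ids)
    return {pid: seen - 1 for pid, seen in counts.items() if seen > 1}


# Made-up capture: packet 2 has one extra copy, packet 3 has two
example_duplicates = duplicate_packets([1, 2, 2, 3, 3, 3])
```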

Graylog Operations: Faster Network Performance Investigations

With Graylog Operations, you get increased visibility across your IT environment by aggregating and correlating log data so that you can analyze issues more effectively and efficiently. You can view real-time data to streamline your network performance investigations using dynamic lookup tables and multi-threaded searches.

To proactively mitigate network issues, you can create customized dashboards and receive high-fidelity alerts via email, text, and Slack, so that your organization experiences as little business interruption as possible. 
