We heard loud and clear from the community that you need better visibility into the health of your infrastructure and networks. With SNMP and NetFlow support, Graylog can now provide that visibility across your network and compute infrastructure.
The goal behind both plugins is to provide better performance and network fault detection information, resulting in higher quality and faster root cause analysis.
I’m going to cover both plugins in detail, starting with the SNMP plugin, and then show how you can use them separately or together.
SNMP Input Plugin
So what can I do with the SNMP input plugin?
For folks who aren’t well versed in SNMP traps or why you should care, here is a basic overview of the SNMP protocol and specifically, SNMP traps.
SNMP is regarded as the standard protocol for network infrastructure monitoring. It’s been around for a long time and it isn’t going away anytime soon.
Our new SNMP plugin is passive and only receives SNMP traps. SNMP traps are sent from an SNMP agent, which can run on pretty much any device, from switches, routers and printers to a Raspberry Pi. SNMP agents are frequently deployed on Linux, Unix and even Windows servers but are most often seen on networking and peripheral devices like printers.
Depending on the SNMP agent, many different trap message types can be sent, but in general they fall into two basic use cases: performance monitoring and passive fault detection.
SNMP agents may be configured to send traps based on certain conditions, such as exceeding a performance threshold, or a less critical condition such as nearing a capacity threshold, e.g. running low on disk space. Metrics can be extracted from these types of trap messages for further analysis. Metrics may include trap severity (critical through to debug), metric values such as CPU % or even counters such as 1000 messages seen over the last hour.
Graylog can parse and analyze received trap metrics and even do basic alerting on a pre-defined threshold. Graylog can also forward those traps to other monitoring systems.
Many monitoring tools struggle with high trap volumes. For example, firewalls are notorious for generating high trap volumes due to DoS attacks or misconfiguration. Using Graylog as a buffering layer relieves the burden on your monitoring tools and lets you filter out noise at the same time, so you are less likely to miss anything important.
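To make the noise-filtering idea concrete, here is a minimal Python sketch of one possible approach: suppressing repeats of the same trap from the same device inside a time window. This is not how Graylog implements buffering or filtering. The class name `TrapDeduplicator`, its parameters and the 60-second default are assumptions made up for illustration.

```python
class TrapDeduplicator:
    """Forward a (source, trap OID) pair at most once per time window.

    Hypothetical helper for illustration only; not part of Graylog.
    """

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        # Maps (source, trap_oid) -> timestamp of the last forwarded trap.
        self._last_forwarded = {}

    def should_forward(self, source, trap_oid, now):
        """Return True if this trap should be passed downstream.

        `now` is a timestamp in seconds, passed in explicitly so the
        logic is easy to test.
        """
        key = (source, trap_oid)
        last = self._last_forwarded.get(key)
        if last is not None and now - last < self.window:
            # Same trap from the same device seen recently: treat as noise.
            return False
        self._last_forwarded[key] = now
        return True
```

A firewall that sends the same trap every few seconds during an attack would then produce one forwarded message per window instead of thousands, while traps from other devices pass through untouched.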
Passive fault detection
SNMP agents can be configured to send traps based on a failure, error or warning condition. These types of traps are generally explicit in nature and may not contain metric values, e.g. “Network Interface is down on Port 1”. That trap may be a critical error, or it may be only informational if the port has been set to maintenance mode.
If you want to be even more proactive, then you will need a polling agent and other methods such as ICMP to figure out if there is a problem or not.
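The “interface down” example above shows that the same trap can mean different things depending on context. A small Python sketch of that decision might look like the following. The function name, the severity labels and the idea of keeping a set of ports in maintenance mode are all assumptions for illustration, not Graylog features.

```python
def classify_link_trap(trap_name, port, maintenance_ports):
    """Assign a severity to a link trap, taking maintenance mode into account.

    `maintenance_ports` is a hypothetical set of port identifiers that an
    operator has flagged as being under maintenance.
    """
    if trap_name == "linkDown":
        # A port going down is expected during maintenance, critical otherwise.
        return "informational" if port in maintenance_ports else "critical"
    if trap_name == "linkUp":
        return "informational"
    # Anything else is outside the scope of this sketch.
    return "unknown"
```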
Can Graylog consume any kind of SNMP trap?
Yes, we can receive SNMP traps from any SNMP device. The caveat is that you will need an appropriate MIB file to decode the long numeric OID (object identifier) into something human readable and searchable.
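To illustrate what a MIB does for you, here is a minimal Python sketch that translates numeric OIDs into names using a longest-prefix lookup. The dictionary is a tiny hand-written stand-in for a real compiled MIB, and `resolve_oid` is a hypothetical helper, not the plugin's actual decoder. The entries shown are the standard generic traps from SNMPv2-MIB plus two columns from IF-MIB.

```python
# Hand-written stand-in for a compiled MIB: numeric OID -> symbolic name.
MIB = {
    "1.3.6.1.6.3.1.1.5.1": "coldStart",
    "1.3.6.1.6.3.1.1.5.2": "warmStart",
    "1.3.6.1.6.3.1.1.5.3": "linkDown",
    "1.3.6.1.6.3.1.1.5.4": "linkUp",
    "1.3.6.1.6.3.1.1.5.5": "authenticationFailure",
    "1.3.6.1.2.1.2.2.1.1": "ifIndex",
    "1.3.6.1.2.1.2.2.1.2": "ifDescr",
}


def resolve_oid(oid, mib=MIB):
    """Replace the longest known OID prefix with its name.

    Any trailing components (such as a table row index) are kept as a
    numeric suffix. Unknown OIDs are returned unchanged.
    """
    parts = oid.strip(".").split(".")
    # Try the full OID first, then progressively shorter prefixes.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in mib:
            return ".".join([mib[prefix]] + parts[end:])
    return oid
```

With this table, `1.3.6.1.2.1.2.2.1.2.1` resolves to `ifDescr.1`, which is far easier to search for than the raw number. An OID from a vendor whose MIB has not been loaded stays numeric.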
All you need to do is configure your SNMP agents to send traps to Graylog, load the appropriate MIB file and away you go. More information can be found on the Graylog Marketplace.
In summary, being able to receive traps natively can result in faster root cause analysis and a more timely resolution of a potential issue before it turns into a service-impacting disaster.
NetFlow Input Plugin
So what can I do with the NetFlow input plugin?
NetFlow was developed by Cisco to help network admins identify network traffic bottlenecks. It complements SNMP support by giving you a holistic view of the network (provided your edge, gateway and core network elements support NetFlow). It isn’t real time; instead, it focuses on network traffic flows. SNMP data, by contrast, can be collected in near real time (1s intervals) but is geared towards different use cases.
Graylog presently supports NetFlow v5 flows; newer versions will be supported in the future.
Here is the basic anatomy of a NetFlow v5 packet.
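In code terms, a NetFlow v5 export datagram is a 24-byte header followed by a series of 48-byte flow records. The Python sketch below decodes that layout with the standard `struct` module. It is an illustration of the wire format, not the plugin's own parser: the function name `parse_netflow_v5` and the dictionary keys are assumptions, and only a subset of the record fields is returned.

```python
import ipaddress
import struct

# Header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes).
HEADER = struct.Struct("!HHIIIIBBH")
# Record: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
# first, last, srcport, dstport, pad, tcp_flags, prot, tos,
# src_as, dst_as, src_mask, dst_mask, pad (48 bytes).
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_netflow_v5(datagram):
    """Decode one NetFlow v5 datagram into a dict with a list of flows."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError("not a NetFlow v5 datagram (version %d)" % version)
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("datagram truncated: header announces %d records" % count)

    flows = []
    for i in range(count):
        offset = HEADER.size + i * RECORD.size
        (src, dst, nexthop, in_if, out_if, packets, octets, first, last,
         src_port, dst_port, tcp_flags, proto, tos,
         src_as, dst_as, src_mask, dst_mask) = RECORD.unpack_from(datagram, offset)
        flows.append({
            "src_addr": str(ipaddress.IPv4Address(src)),
            "dst_addr": str(ipaddress.IPv4Address(dst)),
            "src_port": src_port,
            "dst_port": dst_port,
            "protocol": proto,      # IP protocol number, e.g. 6 for TCP
            "tcp_flags": tcp_flags,
            "packets": packets,
            "bytes": octets,
        })
    return {
        "version": version,
        "count": count,
        "unix_secs": unix_secs,
        "flow_sequence": flow_sequence,
        "flows": flows,
    }
```

Each flow record describes one direction of one conversation: who talked to whom, on which ports and protocol, and how many packets and bytes were exchanged.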
So now that I have all of this ‘Flow’ information, what do I do with it? Here are some common use cases:
Identify type and location of network traffic loads
Figure out who the top talkers are and use this information to optimize current capacity, e.g. by implementing a QoS policy that prioritizes important traffic over less important traffic. It can also be used to figure out future capacity needs, based on flows over time.
Understand where network congestion is occurring
Is my remote office having issues? Why? Is it because of the new high-definition video conferencing system?
Validate effectiveness of network configuration change
Measure QoS effectiveness by comparing NetFlow stats before and after. For example, you may prioritize VoIP traffic over FTP traffic. The result would be higher quality audio and slower FTP transfers, resulting in happier VoIP users and slightly less happy IT operations staff.
Identify potential security risks such as DDoS via unusual network protocol/traffic trends
Detect increased inbound traffic over port 80, directed at the company website. This may indicate a DDoS attack or a misconfigured application.
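Two of the use cases above, finding top talkers and spotting a surge of traffic to port 80, boil down to simple aggregations over flow records. Here is a rough Python sketch. It assumes each flow is a dictionary with the hypothetical keys `src_addr`, `dst_port` and `bytes`, and the "three times the baseline" rule is an arbitrary example threshold, not a recommendation. In practice you would run the equivalent searches and alerts inside Graylog.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes."""
    totals = Counter()
    for flow in flows:
        totals[flow["src_addr"]] += flow["bytes"]
    return totals.most_common(n)


def bytes_to_port(flows, port):
    """Sum the bytes of all flows addressed to the given destination port."""
    return sum(flow["bytes"] for flow in flows if flow["dst_port"] == port)


def is_surge(current_bytes, baseline_bytes, factor=3.0):
    """Flag traffic that exceeds the baseline by the given factor.

    With no baseline (zero bytes) there is nothing to compare against,
    so the sketch reports no surge.
    """
    return baseline_bytes > 0 and current_bytes > factor * baseline_bytes
```

Comparing `bytes_to_port(flows, 80)` for the current interval with the same figure from a normal interval gives a crude first signal. Whether that signal means an attack or a misconfigured application still takes a human, or more data, to decide.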
SNMP + NetFlow
Can I use these two plugins together?
Absolutely. You can use both individually or together depending on your infrastructure/network configuration and requirements.
Use the SNMP plugin to identify possible performance issues and/or network faults. Use the NetFlow plugin to keep track of possible bottleneck issues and triage possible root causes for network faults/performance issues.