Implementing Geolocation with Graylog Pipelines
Geolocation can be built into the Graylog platform automatically by using the "GeoIP Resolver" plugin with a MaxMind database. However, you can extract more meaningful data by leveraging pipelines and lookup tables, which allow you to do much more than the basic plugin.
Enhanced Geolocation Fields
The MaxMind database contains far more information than the GeoIP Resolver can extract. Besides the city, country code, and coordinates associated with an IP address, you can also find information such as the postal code, the time zone, or the name of the country in a different language.
Switching Processing Order
The GeoIP Resolver plugin runs after pipeline processing, so that the fields it works on have already been normalized. However, this means that you can’t write any pipeline rule that depends on geolocation data. Doing the geolocation in a pipeline rule after normalization has occurred, on the other hand, enables a broader range of interactions with the new fields. For example, a later pipeline rule could label inbound firewall logs from every non-US country as “foreign.”
The plugin is also inherently less efficient than a pipeline leveraging a lookup table. The GeoIP Resolver must read the entire message to search for potential IPs that fit the traditional pattern (*.*.*.*). With pipeline rules, conditional statements can select only certain messages and then specify the fields where the IP address may be found. In a nutshell, the whole process becomes faster and more efficient.
Setting the Lookup Table Up
First things first: we must create a geolocation lookup table that is ready for use. Here’s a step-by-step guide.
Storing the MaxMind Database
Just like with the plugin, you must first upload a MaxMind database to your Graylog server. To do so, follow the “Geolocation” section of the "Configure the database" article in the Graylog documentation.
Setting up the Data Adapter
Graylog comes equipped with a built-in Data Adapter Type for MaxMind Databases. Just set the File path to the place where your mmdb file is, and the Database type to City Database. Then, save everything with “Create Adapter”.
Create the Geolocation Cache
To avoid spending computing resources on repeatedly looking up common values, a cache can be used to “remember” value pairs. Lookup results are stored for some time so that they can be pulled quickly from memory without querying the data adapter again. Geolocation cache settings are standard, so you can just configure them as described by the lookup table documentation.
Create the Lookup Table
Creating the lookup table is a pretty self-explanatory process. Go to System > Lookup Tables and select “Create Lookup Table”. Fill out all the fields using the Data Adapter and Cache that you created in the previous steps, and then click on “Create Lookup Table” to save it for the pipelines. It should look like the screenshot below:
Setting Up the Pipelines
This section will explain how to create pipeline rules for looking up the relevant information inside our two normalized IP fields.
Note: Customize your field names with what you actually use in your own environment.
Our fields have been normalized to dst_ip and src_ip. In a separate pipeline, we also created two fields, dst_ip_is_internal and src_ip_is_internal, to identify RFC 1918 addresses. We exclude all "is_internal" addresses because we do not want to geolocate RFC 1918 addresses. To do so, we need to build the "when" statement (our conditions) to check whether the message has a src_ip_is_internal field and whether that field is false. Here we go:
has_field("src_ip_is_internal") && $message.src_ip_is_internal == false
This statement filters only the relevant messages. Now we can start performing our lookups.
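For context, the rule that sets the is_internal flags lives in a separate pipeline and is not shown in this post. A minimal sketch of what such a rule could look like follows, assuming src_ip is already normalized. The rule name is hypothetical, and the built-in to_ip and cidr_match functions are used here as one possible approach, not necessarily the author's.

```
// Sketch only: flags RFC 1918 source addresses; the rule name is hypothetical.
rule "flag internal src_ip"
when
  has_field("src_ip")
then
  let ip = to_ip($message.src_ip);
  // The three private IPv4 ranges defined by RFC 1918
  let internal = cidr_match("10.0.0.0/8", ip)
    || cidr_match("172.16.0.0/12", ip)
    || cidr_match("192.168.0.0/16", ip);
  set_field("src_ip_is_internal", internal);
end
```

Because the geolocation condition tests this flag, the pipeline containing a rule like this has to process the message before the geolocation rule runs.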
Our first function is a let that sets a variable "geo" to the result of a lookup function. This lookup function defines both the lookup table (geoip-lookup) and the field to perform the lookup against, the string representation of the src_ip field.
let geo = lookup("geoip-lookup", to_string($message.src_ip));
Since we have defined the geo variable, we can now interact with the values in the "multi-value" JSON object. We can create our geolocation information fields using the set_field function, and then map them to specific items inside the lookup table. Here are a few examples of the types of data that you can extract and how to format them.
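As a minimal sketch of such mappings, assume the lookup table is backed by a City database, whose result exposes keys such as coordinates, country, and city, as in the Graylog geolocation documentation. The target field names (src_ip_geo_*) are illustrative assumptions; pick whatever matches your own schema.

```
// Sketch only: the src_ip_geo_* field names are illustrative assumptions.
// Latitude/longitude of the address
set_field("src_ip_geo_location", geo["coordinates"]);
// ISO country code, for example "US"
set_field("src_ip_geo_country_code", geo["country"].iso_code);
// English-language country and city names
set_field("src_ip_geo_country_name", geo["country"].names.en);
set_field("src_ip_geo_city_name", geo["city"].names.en);
```

These statements belong in the "then" part of the rule, after the let that defines geo.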
Just put all those sections together to create a pipeline rule that will perform a geolookup and assign the values you are interested in knowing to their own fields.
Here is an example of the entire rule, this time performed on the dst_ip field.
rule "dst_ip geoip lookup"
when
  has_field("dst_ip_is_internal") && $message.dst_ip_is_internal == false
then
  let geo = lookup("geoip-lookup", to_string($message.dst_ip));
end
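With set_field calls included, a complete version of this rule could look like the sketch below. It assumes a City database behind the geoip-lookup table; the dst_ip_geo_* field names are illustrative assumptions rather than names taken from this post.

```
rule "dst_ip geoip lookup"
when
  // Only geolocate addresses that were flagged as non-internal
  has_field("dst_ip_is_internal") && $message.dst_ip_is_internal == false
then
  let geo = lookup("geoip-lookup", to_string($message.dst_ip));
  // Field names below are assumptions; adapt them to your own schema
  set_field("dst_ip_geo_location", geo["coordinates"]);
  set_field("dst_ip_geo_country_code", geo["country"].iso_code);
  set_field("dst_ip_geo_country_name", geo["country"].names.en);
  set_field("dst_ip_geo_city_name", geo["city"].names.en);
end
```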
The examples above are just a starting point. Feel free to customize these rules to run them against any IP-related fields or to extract additional geolocation data found in the MaxMind Database.
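One way to build on the geolocation fields is the "foreign" labeling idea mentioned earlier. The sketch below assumes that an earlier rule has written the country code to a field named src_ip_geo_country_code; that field name, the rule name, and the src_ip_origin output field are all hypothetical.

```
// Sketch only: all field names and the rule name are hypothetical.
rule "label foreign src_ip"
when
  has_field("src_ip_geo_country_code") &&
  to_string($message.src_ip_geo_country_code) != "US"
then
  set_field("src_ip_origin", "foreign");
end
```

A rule like this should sit in a later pipeline stage than the geolocation rule, so that the country code field already exists when the condition is evaluated.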
Our special thanks go to the amazing author of this post, Megan Roddie. Megan is a cyber threat researcher working for IBM X-Force IRIS. She holds an M.S. in Digital Forensics as well as several industry certifications.