Bolster OT Security with Graylog

Graylog can help secure industrial and operational assets

Anyone tracking the evolution of the IT industry is probably familiar with the concept of Industry 4.0. Essentially, it describes the process by which traditional industrial tasks become both digitized and continually managed in an IT-like fashion via modern technologies like cloud computing, digital twins, Internet of Things (IoT) sensorization, and artificial intelligence/machine learning.

An Industry 4.0 rollout, though, is likely to run into numerous management complications due to the various fundamental gaps between operational technology (OT) assets and IT assets. Management technologies and strategies that apply to a desktop, firewall, server, or network router may simply not apply naturally — or at all — to a turbine, filler, pump, or generator.

Fortunately, Graylog can play a key role in closing that gap. Thanks to its log aggregation and search capabilities, which typically apply even to OT assets, Graylog can support key information sharing and related management functions and thus help make an Industry 4.0 rollout faster, more cost-effective… and perhaps most importantly, more secure.

Failing to secure the OT infrastructure can be costly — sometimes catastrophic

Before we explore exactly how Graylog’s capabilities come into play in such a context, let’s consider, as a case study of what can happen to an insufficiently secure OT infrastructure, the May 2021 Colonial Pipeline ransomware hack.

This notorious breach of security, executed by a Russian criminal organization called DarkSide, led to the shutdown of America’s largest oil pipeline, which supplies fuel to much of the East Coast. Once the company’s OT infrastructure had been compromised and key databases had been encrypted, the organization had no choice but to pay an exorbitant ransom to DarkSide to restore operational control. It then analyzed its infrastructure for several business days to determine whether the root problem had been solved before reopening the pipeline. Meanwhile, lines formed at gas stations in the affected states as fuel swiftly ran out, and fuel prices escalated to extraordinary levels. In some cases, fuel simply wasn’t available at all. Several state governors declared a formal state of emergency; even the White House got involved.

And while the Colonial Pipeline scenario may seem unusual or new due to its broad media coverage, it’s neither of those things. Such attacks have been happening for years, though they haven’t always been discovered or reported in a timely fashion.

US intelligence officials announced in July 2021 that the Chinese government’s Ministry of State Security had in the past contracted with hackers to “conduct unsanctioned cyber operations globally,” including in cases involving OT assets. Intelligence agencies generally believe that the North Korean government, acting through the Lazarus criminal organization, was behind the similar 2017 global WannaCry ransomware attacks. And going back to 2013, the Bowman Avenue Dam just north of New York City was compromised by a team of Iranian hackers who gained control of the dam’s SCADA management system and were only prevented from opening a sluice gate because that gate didn’t work correctly. Attacks of this type are often made easier because OT assets are not as routinely secured and comprehensively managed as IT assets.

Matters get even more complicated when considering that OT assets often involve warranty and management terms that would seem alien to an IT manager. Specifically, in many cases, OT assets cannot be updated directly by the organizations that bought them and own them.

Instead, such assets can only be updated by the manufacturer. If the asset owners attempt to bypass this policy, for example by installing a new operating system or security patch on a Windows-based management server, the asset’s warranty will be invalidated.

This situation puts the organization in the vulnerable position of depending on an outside company, the asset manufacturer, to secure the OT infrastructure. And all too often, that manufacturer doesn’t attend to the process with adequate speed or comprehensiveness.

Because OT assets are typically deployed in a distributed way and aren’t managed as part of the traditional IT network, security patches and updates to key apps and operating systems aren’t provisioned very quickly or efficiently by manufacturer-approved agents. That, in turn, means the window of opportunity for hackers and criminal organizations stays open much longer than it usually would.

In a case like Colonial Pipeline, such a problem can mean the difference between business success and failure. If the entire business model relies on a secure OT infrastructure, that model is only as successful as the OT assets’ weakest security link. The company assets that are the most critical are thus, in an unfortunate irony, the very assets that are the furthest outside the company’s power to secure — at least, if the warranty is to be preserved. This puts the organization in the uncomfortable position of deciding whether it would rather have for its critical infrastructure (1) the most up-to-date security or (2) the most comprehensive warranty coverage.

Leveraging Graylog to secure the OT infrastructure

So what can Graylog do to solve this seemingly unsolvable puzzle? The answer, of course, is quite a lot. Let’s walk through some of the possibilities.

First, it’s essential to understand that OT assets are typically managed via servers running some version of Windows or Linux. These operating systems may or may not be out of date (sometimes far out of date), but they are still on the IP network, and the logs they generate can still be accessed, aggregated, and searched via Graylog. Furthermore, this process in no way invalidates the asset warranty.
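As a rough illustration of how such a management server might report basic facts about itself to Graylog, the sketch below builds a message in GELF, the JSON-based log format Graylog accepts. The host name, the custom field names (`os_name`, `os_build`, `site`), and the helper `build_gelf_message` are assumptions made up for this example, not part of any Graylog tooling.

```python
import json
import time


def build_gelf_message(host, short_message, extra_fields, timestamp=None):
    """Build a GELF 1.1 payload as a JSON string.

    GELF requires "version", "host", and "short_message"; custom fields
    must be prefixed with an underscore.
    """
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": 6,  # syslog severity "informational"
    }
    for key, value in extra_fields.items():
        # "_id" is reserved by GELF, so skip it rather than send an invalid field.
        if key == "id":
            continue
        message["_" + key] = value
    return json.dumps(message)


# Hypothetical management server for a pump; all field values are illustrative.
payload = build_gelf_message(
    host="pump-mgmt-01",
    short_message="OS inventory report",
    extra_fields={
        "os_name": "Windows Server 2012 R2",
        "os_build": "9600",
        "site": "plant-a",
    },
    timestamp=1700000000.0,
)
print(payload)
```

The resulting string could then be sent to whatever GELF input the Graylog deployment exposes (UDP port 12201 is a common choice, but that depends on how the input is configured). In many environments an existing log shipper would do this job instead of custom code.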

Graylog is thus an ideal platform to use in initially assessing the security status of OT assets (and, by extension, the services those assets provide to the organization). Using Graylog, it’s easy to establish key information such as the server OS type and version number, the management application type and version number, the full server and application history (including security patches installed), the frequency (or existence) of data backups involving that server, and many other essential points.

This key insight represents a starting point for the organization in creating a new strategy to secure OT assets and services. Once this level of transparency is established, the organization knows exactly which servers are the most out of date and presumably vulnerable, which are less so, and which are current and need no updates at all. It’s also possible to correlate different servers (and OT assets) against established business priorities, a necessary first step in creating a plan that deals with the biggest potential business vulnerabilities first.
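One simple way to combine "how out of date" with "how important" is sketched below. It assumes the relevant facts have already been pulled out of the aggregated logs into plain records; the field names (`last_patch`, `business_priority`) and the scoring rule (days unpatched multiplied by a 1 to 3 priority) are illustrative choices, not a Graylog feature or a recommended formula.

```python
from datetime import date

# Hypothetical inventory rows, as might be assembled from aggregated log data.
servers = [
    {"host": "turbine-mgmt-01", "last_patch": date(2019, 3, 1), "business_priority": 3},
    {"host": "filler-mgmt-02", "last_patch": date(2023, 11, 15), "business_priority": 1},
    {"host": "pump-mgmt-07", "last_patch": date(2021, 6, 30), "business_priority": 3},
]


def rank_by_exposure(rows, today):
    """Sort servers so the stalest, most business-critical ones come first."""

    def score(row):
        days_unpatched = (today - row["last_patch"]).days
        return days_unpatched * row["business_priority"]

    return sorted(rows, key=score, reverse=True)


ranked = rank_by_exposure(servers, today=date(2024, 1, 1))
for row in ranked:
    print(row["host"])
```

A real prioritization would likely weigh more factors (network exposure, known vulnerabilities, recovery time), but even a crude score like this makes the ordering explicit and repeatable.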

Having determined and prioritized potential security shortfalls, the organization can pinpoint cases where warranty conflicts apply. Not all OT assets such as turbines or pumps involve such conflicts, and if they don’t in any particular case, the next logical step would be to apply security updates posthaste.

In cases where a warranty conflict does apply, the organization also now has a much clearer picture of the extent of the problem. It knows the number and nature of out-of-date assets, where they fall in the infrastructure, which services they support, and which manufacturers are involved.

This provides the organization with new leverage to bring to bear in negotiating with the manufacturers involved. Instead of submitting a general update request, the organization can build a more specific argument: “We have three hundred twenty-two servers managing operational assets that need security updates at the following four sites. Of them, sixty-four servers, which are four years out of date from a security standpoint, support business-critical services. Our potential exposure is therefore considerable. We’re hoping these servers can all be updated quickly; otherwise, we may need to find a new manufacturer in the near future.”
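Figures like the ones in that argument can be produced by a straightforward roll-up of the inventory. The sketch below assumes each server has already been reduced to one record; the field names (`site`, `years_out_of_date`, `business_critical`) and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical per-server records derived from aggregated logs.
records = [
    {"host": "a1", "site": "plant-a", "years_out_of_date": 4, "business_critical": True},
    {"host": "a2", "site": "plant-a", "years_out_of_date": 2, "business_critical": False},
    {"host": "b1", "site": "plant-b", "years_out_of_date": 0, "business_critical": True},
    {"host": "b2", "site": "plant-b", "years_out_of_date": 5, "business_critical": True},
]


def summarize_outdated(rows, min_years=1):
    """Count outdated servers overall, per site, and among critical services."""
    outdated = [r for r in rows if r["years_out_of_date"] >= min_years]
    return {
        "total_outdated": len(outdated),
        "critical_outdated": sum(1 for r in outdated if r["business_critical"]),
        "per_site": dict(Counter(r["site"] for r in outdated)),
    }


summary = summarize_outdated(records)
print(summary)
```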

Additional value from Graylog can be achieved by taking into account granular backup data drawn from server logs and incorporating this into the security strategy. This can be a particularly effective approach against ransomware threats such as those made famous by the Colonial Pipeline breach, the WannaCry attacks, and countless others.

The idea is to leverage Graylog to discover and assess the business priority of key digital assets like core databases. Ransomware, as a strategy, works by encrypting such assets under the assumption that once they’re encrypted, they’re unusable, and the organization will have no choice but to pay the ransom (the size of which will be up to the attacker).

However, if the organization frequently backs up such assets, or even continuously mirrors them to another logical location, such an attack can be addressed far more efficiently. The organization can detect and resolve the root cause of the security failure, delete the encrypted asset, replace it from the backup, and carry on operations normally, all without ever paying a dime to the attacker. Graylog can provide the insight into OT server backup operations needed to make this strategy succeed.
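A backup strategy only helps if the backups are actually happening, and that is something logs can reveal. The sketch below assumes backup-completion events have been extracted from server logs into simple records; the field names (`host`, `finished`), the host names, and the one-day threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical "backup finished" events extracted from server logs.
backup_events = [
    {"host": "hist-db-01", "finished": datetime(2024, 1, 10, 2, 0)},
    {"host": "hist-db-01", "finished": datetime(2024, 1, 11, 2, 0)},
    {"host": "scada-db-02", "finished": datetime(2024, 1, 3, 2, 0)},
]


def stale_backups(events, now, max_age):
    """Return hosts whose most recent logged backup is older than max_age."""
    latest = {}
    for event in events:
        host = event["host"]
        if host not in latest or event["finished"] > latest[host]:
            latest[host] = event["finished"]
    return sorted(host for host, ts in latest.items() if now - ts > max_age)


stale = stale_backups(
    backup_events,
    now=datetime(2024, 1, 11, 12, 0),
    max_age=timedelta(days=1),
)
print(stale)
```

Note that a host with no backup events at all never appears in `latest`, so a fuller check would also compare the result against the known list of business-critical servers.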

Since encryption is processor-intensive, it is also possible to leverage Graylog to track server activity and correlate it against the average processor workload, thus detecting in real time the possibility that large business databases are being encrypted. This advance notice can be invaluable to the organization in dealing with the problem.
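The comparison against an average workload can be sketched as a simple baseline check. The example below flags a CPU reading that sits far above a server's recent history, measured in standard deviations; the sample values, the function name `is_cpu_anomalous`, and the threshold of 3.0 are assumptions chosen for illustration. High CPU alone does not prove that encryption is under way, so a signal like this would normally be combined with other indicators.

```python
from statistics import mean, stdev


def is_cpu_anomalous(baseline, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations above the mean of `baseline` (needs at least 2 samples)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: treat any increase as anomalous.
        return current > mu
    return (current - mu) / sigma > threshold


# Hypothetical CPU utilization percentages for one server's normal workload.
baseline_cpu = [22.0, 25.0, 24.0, 21.0, 23.0, 26.0, 24.0]

print(is_cpu_anomalous(baseline_cpu, 27.0))
print(is_cpu_anomalous(baseline_cpu, 95.0))
```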

However, making either strategy work does require clear insight into exactly which assets are business-critical, how and when they’re typically used, and under what circumstances that usage level changes — all of which Graylog can deliver.

Taking matters to the next level might involve deploying an endpoint security solution based on AI (artificial intelligence) specifically designed for OT assets. Such a solution can, via its sophisticated security capabilities, shore up the problematic aspects of an outdated OS or management application, even if they are many years out of date. And because no new updates of those types are installed, the warranty will not be invalidated.

Here, too, Graylog can play an essential role by establishing exactly which servers are best suited for such an AI product, whether it has been deployed yet in any given case, and then tracking that AI product’s update history and logged behavior over time.

And because Graylog can aggregate the logs created by that product, it can also help deliver the AI-driven insights those logs contain to other solutions in the overarching security architecture, even in cases where there is no direct interoperability. Graylog can, in this context, play an almost SOAR-like role in helping to orchestrate different security solutions from different vendors in the pursuit of the organization’s larger goals and strategies.

Thus, Graylog helps organizations handle the difficult problem of securing and managing OT assets by empowering them to:

Discover key asset information such as server OS/app versions
Establish the security posture of those assets against the organization’s goals
Establish the business priority of the associated services
Discover whether and to what extent warranty conflicts might come into play in the process of updating the assets
Bolster negotiations with manufacturers when conflicts do apply
Create and optimize sophisticated backup strategies designed to mitigate the potential impact of ransomware
Monitor and get the best use of AI-based security solutions designed for OT assets

Jeff Darrington

Jeff Darrington is Graylog's Director, Technical Marketing. He is a long-time Graylog OS user with extensive experience in IT operations and in deploying IT product solutions for firewalls, networking, VoIP, physical security controls, and many other areas.
