The process of log collection is sometimes a daunting task, especially if you are planning to collect massive amounts of data. But if you take a minute to answer some key questions before you begin, you can transform the log collection task from daunting to smooth sailing.
Here we go with the questions…
What Are You Trying to Accomplish with Centralized Log Management?
Knowing what it is you are trying to accomplish with centralized log management is the foundation for how you set up Graylog Enterprise in the most effective way for your business.
Your log files contain volumes of data that offer insight into everything happening on your network. With centralized log management, you’re aggregating high volumes of logs across your on-premises, cloud-based, and hybrid environments. The challenge is that every data source produces its own log messages, and the number of sources continues to expand.
For example, a system breach is a scenario that companies in every industry face, and the corresponding desired outcome is to prevent breaches before they happen. Another example is the need for maximum uptime and peak performance while keeping costs manageable. Preventing issues before they arise is also a focus of every IT team.
What Logs Do You Need to Collect/Monitor?
Knowing what to log is critical to achieving your log management goals. For example, say your company relies on a large, complex, and diverse network of hosts (hundreds of servers worldwide). To maintain uptime, performance, and security, you’ll need to know what logs are critical to these operational functions. One way to figure this out is to look at the logs from these different categories to determine if they provide enough data to produce meaningful results.
For example, IT Ops might monitor network or hardware performance, while DevOps focuses on real-time application layer monitoring or troubleshooting, and security monitors user logins to critical resources. In this context, you would want to log categories of data from operations, security, and possibly DevOps.
Planning your use cases in advance makes the implementation of your centralized log management solution easier.
What Are Your Retention Requirements? How Long Will You Keep the Logs/Data?
A key question when planning your log management system is log retention. Or to put it another way, how long do you need to keep the data? This depends on several different factors.
For example, some regulatory frameworks require retention of event log data for a prescribed period. In the absence of a clear requirement, the question becomes one of balancing the cost of retention (storage) versus the utility of having historical data.
There is no single answer, as each situation is different. The most important thing to remember when designing a retention policy is that it must be flexible enough to accommodate the different log sources.
Graylog provides two ways to retain event log data:
- Online: stored in Elasticsearch and searchable through the Graylog GUI
- Archived: stored in a compressed format, either on the Graylog server or on a network file share. It is still searchable, with grep for example, but must be reconstituted in Graylog to be searchable through the GUI again.
Most Graylog customers retain 30-90 days online (searchable in Elasticsearch) and 6-13 months of archives.
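As a rough illustration of the two retention tiers, the sketch below classifies a message by its age. The function name and the default windows (90 days online, about 13 months of archives counted from the event time) are assumptions chosen from the ranges above, not Graylog settings; each log source could be given its own values.

```python
def retention_tier(age_days: int, online_days: int = 90, archive_days: int = 395) -> str:
    """Return where a message of the given age would live under this policy.

    online_days and archive_days are illustrative defaults, both measured
    from the time the event was logged.
    """
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days <= online_days:
        return "online"      # still searchable in the GUI
    if age_days <= archive_days:
        return "archived"    # compressed, must be restored to search in the GUI
    return "expired"         # past both windows, eligible for deletion
```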
What Are Your Storage Requirements?
Like most data stores, Elasticsearch reacts badly when it consumes all available storage. To prevent this from happening, proper planning and monitoring are critical.
Many variables affect storage requirements, such as how much of each message is kept, whether the original message is retained once parsing is complete, and how much enrichment is done before storage.
A simple rule of thumb for planning storage is to take your average daily ingestion rate, multiply it by the number of days you need to retain the data online, and then multiply that number by 1.3 to account for metadata overhead (GB/day x retention days x 1.3 = storage requirement).
Elasticsearch makes extensive use of slack storage space in the course of its operations. Users are strongly encouraged to exceed the minimum storage required for their calculated ingestion rate. When at maximum retention, Elasticsearch storage should not exceed 75% of total space.
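The rule of thumb and the 75% ceiling can be combined into a quick sizing calculation. This is a minimal sketch: the function names are hypothetical, and the 1.3 overhead factor and 0.75 utilization limit are taken from the guidance above rather than from any Graylog or Elasticsearch setting.

```python
def online_storage_gb(daily_ingest_gb: float, retention_days: int,
                      overhead: float = 1.3) -> float:
    """Estimated data size at maximum retention: GB/day x days x overhead."""
    return daily_ingest_gb * retention_days * overhead


def provisioned_storage_gb(required_gb: float,
                           max_utilization: float = 0.75) -> float:
    """Total disk to provision so required_gb stays at or below max_utilization."""
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in the range (0, 1]")
    return required_gb / max_utilization


if __name__ == "__main__":
    # Illustrative inputs only: 50 GB/day kept online for 90 days.
    data_gb = online_storage_gb(50, 90)
    disk_gb = provisioned_storage_gb(data_gb)
    print(f"data at max retention: {data_gb:.0f} GB, provision at least {disk_gb:.0f} GB")
```

With the illustrative figures of 50 GB/day and 90 days, this works out to roughly 5,850 GB of data and about 7,800 GB of provisioned space.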
Who Is Using Graylog?
The number of users is important when it comes to designing your Graylog architecture. It’s also important to know how your teams will use Graylog. With all the data in a centralized location, your IT operations and security teams can work with a shared understanding of everything happening.
For example, if you have junior and/or less technical members troubleshooting user-related issues, Graylog makes it easy to empower them through pre-built content such as dashboards that provide data visualization.
Also, if you have a distributed IT Operations team querying log data simultaneously, you will want to consider this when designing your architecture. Another example is determining access control for different user groups.
Security analysts will require more access to the log management tool than your help desk. Management might have access to everything, whereas engineers only have access to test environments. Regardless, as in all questions of access control, the principle of least privilege should apply.
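One way to reason about least privilege during planning is to write down which log categories each group actually needs and deny everything else by default. The sketch below is a planning aid only: the role names, category names, and `can_read` helper are hypothetical and do not represent Graylog's permission model or API.

```python
# Hypothetical mapping of user groups to the log categories they need.
ROLE_ACCESS = {
    "help_desk": {"user-logins"},
    "security_analyst": {"user-logins", "firewall", "endpoint-security"},
    "engineer": {"test-environment"},
}


def can_read(role: str, category: str) -> bool:
    """Default deny: a role may read only the categories explicitly listed for it."""
    return category in ROLE_ACCESS.get(role, set())
```

Unknown roles and unlisted categories are refused, so access has to be granted deliberately rather than removed after the fact.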
What Logs Do You Want to Collect With Graylog?
Determining what logs you want to collect is the first step in getting your logs into Graylog. Some companies want to collect all of their logs, while others want to collect only specific logs. Regardless of which category you fit into, planning your log collection greatly reduces the time and resources required to get the most out of Graylog.
There are several decisions that you must make when planning your log collection. Some of these decisions include determining the event sources from which you must collect, how you will collect from these sources, how much of each event type to store, how events should be enriched, and how long to retain the data.
Choosing Event Log Sources
The selection of event sources should be driven by the use cases you have identified. For example, if the use case is monitoring of user logins to critical resources, the event sources selected should be only those of the critical resources in question. These might include the LDAP directory server, local servers, firewalls, network devices, and key applications.
There are many other potential event source categories, including:
- Operating systems
- Endpoint Security (EDR, AV, etc.)
- Web Proxies/Gateways
- LDAP/Active Directory
- Network Devices
- Packet Capture/Network Recorder
- Application Logs
- Load Balancer Logs
- Automation System Logs
- Business Logic
After a list of event sources has been determined, the next step is to decide the method of data collection for each source. It is critical to understand what method each event source uses and what resources may be required. For example, if a log shipper will be required to read logs from a local file on all servers, a log shipper must be selected and tested before deployment. In other cases, proprietary APIs or software tools must be employed and integrated. In some cases, changes to the event sources themselves (security devices, network hardware, or applications) may be required. Additional planning is often required to deploy and maintain these collection methods over time.
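The planning described above can be captured as a simple inventory that records, for each event source, its category, the collection method, the use cases it supports, and whether that method has been tested. This is a minimal sketch: the `EventSource` fields, the example entries, and the `unready_sources` helper are all hypothetical and exist only to illustrate the checklist.

```python
from dataclasses import dataclass, field


@dataclass
class EventSource:
    name: str
    category: str                 # e.g. "Network Devices", "Application Logs"
    method: str                   # e.g. "log shipper", "syslog", "vendor API"
    use_cases: list[str] = field(default_factory=list)
    method_tested: bool = False   # has the collection method been tested?


def unready_sources(sources: list[EventSource]) -> list[str]:
    """Names of sources with no driving use case or an untested collection method."""
    return [s.name for s in sources if not s.use_cases or not s.method_tested]


# Illustrative entries only.
PLAN = [
    EventSource("ldap-directory", "LDAP/Active Directory", "log shipper",
                ["user logins to critical resources"], method_tested=True),
    EventSource("edge-firewall", "Network Devices", "syslog",
                ["user logins to critical resources"]),
    EventSource("billing-app", "Application Logs", "vendor API",
                [], method_tested=True),
]
```

Calling `unready_sources(PLAN)` flags the firewall (method not yet tested) and the billing application (no use case attached), which are the two gaps worth closing before deployment.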
Now that the planning is out of the way, you’re ready to go. Getting your logs into Graylog is a key step in getting the most out of your Graylog implementation. For a walkthrough of how to get your logs into Graylog, check out this video.