Fast, highly available storage is expensive. That means setting long retention times in Graylog or any other log management system can trigger serious cost constraints.
The good news is that, for the majority of use cases, you only need to perform instant log searches over a relatively short period of time. Many of our users want to keep 365 days of log data, but usually search over only the last 30 days.
Graylog has a configuration setting that controls how long log data is kept. The standard behavior is simply to delete any index that contains log messages older than the configured retention period. The archiving functionality changes this: Graylog automatically writes all messages of an index to flat files on disk before deleting the index.
Simply configure the details in the Graylog console: where to write the archive files, how long to keep them before overwriting, how large the files are allowed to get, which compression type to use, and a few other options. You can even archive some streams while automatically deleting old data in others, minimizing the storage you need while still covering your different use cases.
If you need to take another look at archived data, you can temporarily re-import any archive for analysis in Graylog using the web interface. When you are done, you can delete the imported data again.
Compliance is one of the most common reasons our customers archive their data. Depending on the industry and the data being processed, retention requirements can range from just a few months to 3-5 years, or even near-permanent retention.
It’s not unusual to find that a security incident began months before it was discovered. Logs are critical for determining when the exposure first occurred and what damage was done.
Because the archive files are simple plain text files, you can store them wherever you want. Put them on tape, burn them to a DVD, move them to cheap storage, or upload them to a cloud server. You’ll be able to temporarily re-import the data back into Graylog whenever needed.
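Moving finished archives to cheap storage can be a one-line script. A minimal sketch, assuming a local archive directory and a mounted backup target (both paths and the `archive_offload` name are placeholders for your own setup):

```shell
# Copy finished archive files to cheaper storage.
# Both paths are expected to be adapted to your environment.
archive_offload() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  # -R copies the directory contents recursively, preserving the layout
  cp -R "$src/." "$dst/"
}

# Example usage (paths are placeholders):
# archive_offload /var/lib/graylog-server/archives /mnt/cheap-storage/archives
```

The same pattern works with `rsync` to a remote host or a cloud CLI upload command; the point is that the archives are ordinary files, so any file-transfer tool will do.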
The archive files are GZIP-compressed by default, but you can re-compress them with any other algorithm you prefer.
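Because the archives are compressed plain text, you can search them with standard command-line tools without re-importing anything into Graylog. A minimal sketch, assuming a directory of GZIP-compressed archive files (the `search_archives` helper and file layout are illustrative, not part of Graylog):

```shell
# Search GZIP-compressed archive files for a pattern without
# decompressing them to disk first.
search_archives() {
  pattern="$1"; dir="$2"
  # zgrep decompresses on the fly; -h suppresses filenames in the output
  zgrep -h "$pattern" "$dir"/*.gz
}

# Example usage (path is a placeholder):
# search_archives "login failed" /mnt/cheap-storage/graylog-archives
```

This kind of ad-hoc grep is often enough to answer a quick question and decide whether a full re-import into Graylog is worth it.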
You can also apply any encryption or signing mechanism your operating system offers; just run it automatically over the files after they are written.
The archiving process itself is not very I/O-intensive, so you should not expect any serious sizing challenges from it. It retrieves all messages using an efficient bulk-export method that applies no sorting, scoring, or other expensive operations.
However, be aware that importing large archives back into Graylog can of course stress your storage cluster. To sidestep this, you can import archives into a second, dedicated Graylog cluster; no special configuration or tricks are required.