The Graylog blog

Log Formats – a (Mostly) Complete Guide

Log management software works by receiving, storing, and analyzing log files in many different formats. A wide assortment of devices and systems generate logs in these formats.

As such, it is essential to understand how they operate and differ from one another so that you can use them the right way and avoid some common mistakes.

WHAT IS A LOG FORMAT?

The main problem with log files, and the reason a structured format is needed, is that they typically contain unstructured text, which makes it difficult to query them for useful information. A log format defines a structure that makes logs machine-readable and easy to parse. This is the power of using structured logs together with a log management system that supports them. The ability to translate raw data into something immediately comprehensible and easy to read is one of the must-have features of log management software.

USING A SYSLOG SERVER

Syslog provides a mechanism for network devices to send event messages to a logging server known as a Syslog server. You can use the Syslog protocol, which is supported by a wide range of devices, to log different events. For example, a firewall might send messages about systems that are trying to connect to a blocked port, while a web server might log access-denied events. Most network equipment, such as routers, switches, and firewalls, can send Syslog messages. Additionally, some printers and web servers such as Apache can send Syslog messages. Windows-based servers, however, don’t support Syslog natively, but a large number of third-party tools make it easy to collect Windows Event Logs and forward them to a Syslog server.

Syslog servers provide a way to consolidate logs from multiple sources into a single location. Typically, most Syslog servers have the following components:

• A Syslog Listener: A mechanism to receive the Syslog messages.

• A Database: Typically, network devices generate vast amounts of Syslog data. Usually, Syslog servers will use some type of database to store Syslog data for quick retrieval.

• Management and Filtering Software: Because so much data can be sent to the Syslog server, it can be challenging to find specific log entries. The solution is to use a Syslog server that makes it easy to filter and view important log messages. Syslog servers can typically generate alerts, notifications, and alarms in response to selected messages. Administrators receive notifications as soon as issues occur, making it easy to act quickly.

Please check out the article on how to use Graylog as a Syslog server.

There are a few downsides to Syslog, though. First, the Syslog protocol doesn’t define a standard format for message content, and there are endless ways to format a message. Syslog just provides a transport mechanism for the message. Additionally, Syslog has traditionally been carried over UDP, which does not guarantee delivery, so some log messages can be lost. Finally, there are security challenges. The main one is that Syslog messages are not authenticated, so messages can come from unknown or unauthorized sources.
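To make the "transport only" point concrete, here is a minimal sketch of how a classic BSD-style (RFC 3164) Syslog line is put together. Only the priority value, timestamp, and hostname follow a fixed layout; the text after the tag is free-form. The function names, host name, and message text are illustrative assumptions, not part of any Graylog API.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Facility and severity codes as defined by the Syslog RFCs.
FACILITY_AUTH = 4
SEVERITY_CRITICAL = 2


def syslog_pri(facility: int, severity: int) -> int:
    """Combine facility and severity into the PRI value: facility * 8 + severity."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def format_bsd_syslog(facility, severity, when, host, tag, text):
    """Build a BSD-style line: <PRI>Mmm dd hh:mm:ss host tag: text."""
    # Single-digit days are padded with a space rather than a zero.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{syslog_pri(facility, severity)}>{stamp} {host} {tag}: {text}"


line = format_bsd_syslog(
    FACILITY_AUTH, SEVERITY_CRITICAL,
    datetime(2020, 10, 11, 22, 14, 15),
    "mymachine", "su", "'su root' failed for lonvick on /dev/pts/8",
)
print(line)
```

Everything after `su:` is up to the sender, which is why two devices can report the same kind of event in completely different wording.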

JSON LOG FORMAT

JSON (JavaScript Object Notation) is a highly readable data-interchange format that has established itself as the standard format for structured logging. It is compact, lightweight, and simple for both humans and machines to read and write. It can be parsed by nearly all programming languages, even those that don’t have built-in JSON functionality. JSON is a universal format thanks to its Unicode encoding, so it doesn’t matter whether you’re using a PC or a Mac, or which server you’re running.

Logging to JSON is a staple for log management and monitoring. This format is usually preferred to plain text since it offers flexibility in creating field-rich databases for later searches. JSON logs are richer than most other log formats, and they’re widely used for structured logging since they can be easily enriched with extra context and metadata. The common use case of JSON-based filtering is to include a log level such as “ERROR” in the data so the logs containing this information can be parsed quickly for troubleshooting purposes.
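As a rough illustration of level-based filtering, the sketch below emits one JSON object per log event using Python's standard `logging` module and then keeps only the "ERROR" events. The `JsonFormatter` class, the `context` attribute, and the field names are assumptions made for this example rather than a prescribed schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional extra context becomes additional searchable fields.
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


def errors_only(lines):
    """Keep only the events whose level field is ERROR."""
    return [e for e in map(json.loads, lines) if e.get("level") == "ERROR"]


formatter = JsonFormatter()
records = [
    logging.LogRecord("shop", logging.INFO, "example.py", 0, "cart updated", None, None),
    logging.LogRecord("shop", logging.ERROR, "example.py", 0, "payment failed", None, None),
]
records[1].context = {"order_id": 1234}  # hypothetical enrichment field
lines = [formatter.format(r) for r in records]
print(errors_only(lines))
```

Because every event is a self-describing object, adding a field such as `order_id` does not break consumers that only look at `level`.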

You can generate a single log event by wrapping several log lines into one field. Although convenient, this can make log files grow very quickly, so adequate storage or log rotation is critical. If you’re logging to JSON, make full use of Graylog’s Archiving feature to save precious space. If you want more info on how to ship your JSON logs to Graylog and parse them into a clean and understandable format, you can have a look at our video guide here.

WINDOWS EVENT LOG

The Windows event log provides a detailed record of operating system, application, and security events that are captured and stored by the Windows operating system. System administrators typically use these events to diagnose existing issues and to prevent future problems. The operating system and applications use the event log to record important hardware and software actions, such as application installations, system setup operations, errors, and security issues.

The elements of a Windows event log include:

  • The date the event occurred.
  • The time the event occurred.
  • The username of the user logged onto the machine when the event occurred.
  • The name of the computer.
  • The Event ID, a Windows identification number that specifies the event type.
  • The Source, which is the program or component that caused the event.
  • The type of event: information, warning, error, security success audit, or security failure audit.

 

The Windows event log captures operating system, setup, security, application, and forwarded events.

  • System events are incidents on the Windows operating system and these incidents could include items such as device drivers or other OS component errors.
  • Setup events include events relating to the configuration settings of the operating system.
  • Security events utilize the Windows system’s audit policies, and these events include user login attempts and system resource access.
  • Application events are incidents with the software that is installed on the local operating system. If an installed application crashes, a log entry about the issue will be created by the Windows event log and will include the application name and what caused it to crash.
  • Forwarded events are events sent from other systems on the same network, which is useful when an administrator wants one computer to gather logs from several machines.

 

Microsoft also provides a command-line utility (wevtutil) that retrieves event logs, runs queries, and exports, archives, and clears logs. Graylog and other third-party utilities can also work with Windows event logs to provide additional log search, correlation, and event details.
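Assuming the command-line utility in question is `wevtutil`, a query for the most recent entries of a log could be assembled as sketched below. The helper name `wevtutil_query` is made up for this example; the `qe`, `/c`, `/rd`, and `/f` options are the ones documented for `wevtutil`, and the command is only executed when the script runs on Windows.

```python
import subprocess
import sys


def wevtutil_query(log_name: str, count: int = 5, newest_first: bool = True) -> list[str]:
    """Build a `wevtutil qe` (query-events) command line for one event log."""
    return [
        "wevtutil", "qe", log_name,
        f"/c:{count}",                                  # maximum number of events to read
        f"/rd:{'true' if newest_first else 'false'}",   # read direction
        "/f:text",                                      # plain text instead of XML
    ]


command = wevtutil_query("System", count=3)
print(" ".join(command))

# Only attempt to run the command on Windows, where wevtutil is available.
if sys.platform == "win32":
    print(subprocess.run(command, capture_output=True, text=True).stdout)
```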

 

CEF FORMAT

The Common Event Format (CEF) is an open logging and auditing format from ArcSight. It is a text-based, extensible format that presents event information in an easily readable way. CEF was created as a common event log standard so that you can easily share security information coming from different network devices, apps, and tools. It also improves interoperability and simplifies integration between security and non-security devices, typically with Syslog acting as the transport mechanism.

You can use CEF with both on-premise devices and cloud-based service providers by implementing the ArcSight Syslog SmartConnector. CEF uses UTF-8 Unicode encoding, so the entire message must be UTF-8 encoded. The Syslog CEF forwarder compiles each event in CEF according to a specific, reduced syntax that works with ESM normalization. The base CEF format comprises a standard header and a variable extension made up of several fields logged as key-value pairs. When CEF is sent over Syslog, each message starts with a common Syslog prefix containing the date and hostname, as in the example below:

Feb 23 12:54:06 host message

The CEF header itself follows this prefix and is composed of fields separated by bar (|) characters:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

The extension part of the CEF message is a placeholder for additional fields. The header fields uniquely identify information such as the version of the CEF format, the type of sending device, the type of event reported, and much more. For example, the Signature ID identifies a specific type of event so that a correlation engine can recognize it even when the activity is detected by different devices.
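The header-plus-extension layout can be sketched with a small parser. This is a simplified illustration under stated assumptions: it handles the `\|` escape in header fields and space-separated `key=value` pairs in the extension, but not every escaping corner case of the CEF specification. The function name and the sample event are hypothetical.

```python
import re

# Split on "|" unless it is escaped as "\|" inside a header field.
_HEADER_SPLIT = re.compile(r"(?<!\\)\|")
# A key=value pair runs until the next " key=" or the end of the line.
_EXTENSION_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)")

HEADER_FIELDS = (
    "version", "device_vendor", "device_product", "device_version",
    "signature_id", "name", "severity",
)


def parse_cef(line: str) -> dict:
    """Parse one CEF message, ignoring any Syslog prefix before 'CEF:'."""
    start = line.index("CEF:")
    parts = _HEADER_SPLIT.split(line[start + len("CEF:"):], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("CEF header needs seven pipe-separated fields")
    event = {
        key: value.replace("\\|", "|").replace("\\\\", "\\")
        for key, value in zip(HEADER_FIELDS, parts)
    }
    # Anything after the seventh bar is the extension.
    extension = parts[7] if len(parts) > 7 else ""
    event["extension"] = dict(_EXTENSION_PAIR.findall(extension))
    return event


sample = ("Feb 23 12:54:06 host CEF:0|Security|threatmanager|1.0|100|"
          "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232")
print(parse_cef(sample))
```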

GRAYLOG EXTENDED LOG FORMAT – GELF

GELF, short for Graylog Extended Log Format, is Graylog’s own log format. It was developed with the express aim of fixing the shortcomings of classic Syslog and taking full advantage of the many features and capabilities of Graylog.

By itself, classic Syslog limits a message to 1024 bytes, and UDP (User Datagram Protocol) datagrams are in practice often kept under 8192 bytes. That is why GELF supports chunking over UDP: a large message is split into pieces, and a short byte header is prepended to each piece so that the receiver can reassemble them. GELF messages can also be transported via TCP (Transmission Control Protocol) and sometimes via HTTP.

There is also the option to save network bandwidth at the cost of somewhat higher CPU usage: choose whether to send messages uncompressed, GZIP-compressed, or ZLIB-compressed, and Graylog will do the rest.
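The chunking and compression steps can be sketched as follows. The 12-byte chunk header used here (two magic bytes, an 8-byte message ID, a sequence number, and a sequence count) and the 128-chunk limit reflect our reading of the GELF documentation; treat the function name and the default chunk size as assumptions for illustration.

```python
import os
import zlib

GELF_MAGIC = b"\x1e\x0f"   # marks a datagram as one chunk of a larger message
MAX_CHUNKS = 128           # upper bound on chunks per message


def gelf_datagrams(payload: bytes, chunk_size: int = 8192, compress: bool = True) -> list[bytes]:
    """Return the UDP datagrams needed to carry one GELF payload."""
    data = zlib.compress(payload) if compress else payload
    if len(data) <= chunk_size:
        return [data]  # small enough to send as a single datagram

    body_size = chunk_size - 12  # 12 header bytes: magic + id + index + count
    pieces = [data[i:i + body_size] for i in range(0, len(data), body_size)]
    if len(pieces) > MAX_CHUNKS:
        raise ValueError("message too large to chunk")

    message_id = os.urandom(8)  # same 8-byte id on every chunk of this message
    return [
        GELF_MAGIC + message_id + bytes([index, len(pieces)]) + piece
        for index, piece in enumerate(pieces)
    ]
```

Each returned datagram could then be handed to a UDP socket's `sendto()`; the receiver uses the shared message ID and the sequence numbers to put the pieces back in order.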

Every GELF log message contains the following fields:

  • The host (the creator of the message)
  • The timestamp
  • The version
  • The long and short versions of the message
  • Several other custom fields you can freely configure to your own preferences

An example GELF message:

{
 "version": "1.1",
 "host": "example.org",
 "short_message": "A short message that helps you identify what is going on",
 "full_message": "Backtrace here\n\nmore stuff",
 "timestamp": 1385053862.3072,
 "level": 1,
 "_user_id": 9001,
 "_some_info": "foo",
 "_some_env_var": "bar"
}
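A message like the one above could be assembled in code roughly as follows. This is a minimal sketch: the `build_gelf` helper is hypothetical, and the rule that custom fields carry a leading underscore (with `_id` being disallowed) is taken from the GELF documentation as we understand it.

```python
import json
import socket
import time


def build_gelf(short_message: str, host: str = "", level: int = 6, **extra) -> dict:
    """Assemble a GELF 1.1 message; additional fields get a leading underscore."""
    message = {
        "version": "1.1",
        "host": host or socket.gethostname(),
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the epoch, fractions allowed
        "level": level,            # Syslog severity number (6 = informational)
    }
    for key, value in extra.items():
        if key == "id":
            raise ValueError("_id is not allowed as an additional field")
        message[f"_{key}"] = value
    return message


event = build_gelf("Login failed", host="example.org", level=4, user_id=9001)
print(json.dumps(event))
```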

 

COMMON LOG FORMAT – NCSA

The NCSA Common log format – also known as the Common Log Format – is a fixed (non-customizable) log format used by web servers when they generate server log files. It was named after NCSA HTTPd, an early, now discontinued web server, which served as the basis for the far more popular open-source, cross-platform Apache HTTP Server.

Every line in this log format is stored using this standardized syntax:

host ident authuser date request status bytes

To further illustrate, here is an example of a typical NCSA log line:

127.0.0.1 user-identifier john [20/Jan/2020:21:32:14 -0700] "GET /apache_pb.gif HTTP/1.0" 200 4782

Here is an explanation of what every part of this line means:

  • 127.0.0.1 – refers to the IP address of the client (the remote host) that made the request to the server.
  • user-identifier is the identity of the client as reported by the Ident protocol (also known as the Identification Protocol).
  • john is the userid (user identification) of the person requesting the document.
  • [20/Jan/2020:21:32:14 -0700] – is the date, time, and time zone of the request. By default, it is in the strftime format of %d/%b/%Y:%H:%M:%S %z.
  • "GET /apache_pb.gif HTTP/1.0" is the client’s request line. GET is the method, /apache_pb.gif is the resource that was requested, and HTTP/1.0 is the HTTP protocol version.
  • 200 is the HTTP status code that was returned to the client after the request. 2xx is a successful response, 3xx is a redirection, 4xx is a client error, and 5xx is a server error.
  • 4782 is the size of the object – measured in bytes – that was returned to the client in question.
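The field breakdown above translates fairly directly into a regular expression. The sketch below is one possible parser, assuming the log line uses plain ASCII double quotes around the request; the pattern and function names are illustrative.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line: str) -> dict:
    """Split one Common Log Format line into its seven fields."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    entry = match.groupdict()
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A hyphen means no body was returned, so treat it as zero bytes.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


entry = parse_clf('127.0.0.1 user-identifier john [20/Jan/2020:21:32:14 -0700] '
                  '"GET /apache_pb.gif HTTP/1.0" 200 4782')
print(entry["host"], entry["status"], entry["bytes"])
```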

MOST COMMON LOG FORMATS – ELF

ELF is short for Extended Log Format. It is very similar to the Common Log Format (NCSA), but ELF files are a bit more flexible, and they contain more information.

Here is an example of an ELF file:

#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html

The (#) sign indicates the start of a directive. The following directives are defined:

  • Version – the version of the Extended Log file format used.
  • Fields – which fields are recorded in the log.
  • Software – the software that generated the log.
  • Start-Date – the exact date and time when the log was started.
  • End-Date – the exact date and time when the log was finished.
  • Date – the exact date and time when the entry was added.
  • Remark – Comments. These are ignored by log management tools and similar log file analysis software.
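Because the Fields directive describes the layout of the entries that follow it, a reader can use it to label each value. The following is a minimal sketch with a hypothetical `parse_extended_log` function; it splits on whitespace only and does not handle quoted values that contain spaces.

```python
def parse_extended_log(lines):
    """Return (directives, entries) for an Extended Log Format file."""
    directives, fields, entries = {}, [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines look like "#Name: value".
            name, _, value = line[1:].partition(":")
            directives[name.strip()] = value.strip()
            if name.strip() == "Fields":
                fields = value.split()  # later entries follow this layout
        else:
            entries.append(dict(zip(fields, line.split())))
    return directives, entries


sample = """\
#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
"""
directives, entries = parse_extended_log(sample.splitlines())
print(directives["Version"], entries[0])
```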

 

MOST COMMON LOG FORMATS – W3C

The W3C Extended Log Format is a customizable format used by the Microsoft Internet Information Server (IIS) versions 4.0 and 5.0.

Since it is customizable, you can add or omit different fields according to your needs and preferences, increasing or decreasing the size of the file. Properly archiving log data is an integral part of log management and is essential for system administrators and cybersecurity experts, as well as for auditing procedures and compliance standards.

W3C EXTENDED LOGGING FIELDS:

 

A W3C log file example:

#Software: Microsoft Internet Information Services 4.0
#Version: 1.0
#Date: 2002-12-12 19:12:42
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
19:12:42 172.16.255.255 GET /default.htm 500 HTTP/1.0

  • #Software – the software that was involved in the generation of the log.
  • #Version – indicates that the W3C logging format 1.0 was used here.
  • #Date – the exact date and time when the entry was added.
  • #Fields – Time, Client IP Address, Method, URI Stem, HTTP Status, and the HTTP Version.
  • 19:12:42 172.16.255.255 GET /default.htm 500 HTTP/1.0 – at 19:12:42 UTC, the user with the IP address 172.16.255.255, using HTTP version 1.0, issued an HTTP GET command for the file default.htm, but the request failed with a 500 Internal Server Error.

MOST COMMON LOG FILES – IIS

The Microsoft IIS (Internet Information Server) format is another fixed log file format. It includes more information than the NCSA Common log format. While it records the usual data such as the user’s name, IP address, and the date and time when the request took place, it also has additional information, such as how long the request took to process (in milliseconds).

Here is how a Microsoft IIS log file looks when you open it in a text editor:

192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,
172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0, 0, PASS, /Intro.htm, -,

  • 192.168.114.201 – the user’s IP address
  • - – the user name; a hyphen here indicates that the user is anonymous
  • 03/20/01 – the date
  • 7:55:20 – the time
  • W3SVC2 – Service and Instance
  • SALES1 – the name of the computer
  • 172.21.13.45 – the IP address of the server
  • 4502 – Time taken in milliseconds
  • 163 – how many bytes were received
  • 3223 – how many bytes were sent back
  • 200 – Service Status Code
  • 0 – Windows NT/2000 Status Code
  • GET – request type
  • /DeptLogo.gif – the operation’s target

Note that a comma (,) separates the fields, and a hyphen (-) is used whenever a field doesn’t have a valid value available.
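Since the fields are comma-separated and always appear in the same order, a line can be mapped onto named fields with a CSV reader. In this sketch the field names are descriptive labels chosen for the example, and the final field (shown as a hyphen in the sample lines) is assumed to hold request parameters.

```python
import csv

# Field names are descriptive labels chosen for this example.
IIS_FIELDS = (
    "client_ip", "user", "date", "time", "service", "server_name",
    "server_ip", "time_taken_ms", "bytes_received", "bytes_sent",
    "service_status", "windows_status", "request_type", "target", "parameters",
)


def parse_iis_line(line: str) -> dict:
    """Map one comma-separated IIS log line onto named fields."""
    values = next(csv.reader([line], skipinitialspace=True))
    # Lines end with a trailing comma, which yields one empty value to drop.
    if values and values[-1] == "":
        values.pop()
    # A hyphen marks a field with no valid value.
    return {name: (None if value == "-" else value)
            for name, value in zip(IIS_FIELDS, values)}


row = parse_iis_line(
    "192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, "
    "4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,"
)
print(row["user"], row["time_taken_ms"], row["target"])
```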

MOST COMMON LOG FILES – ODBC

ODBC logging records a fixed set of data fields in a database that is compliant with Open Database Connectivity (ODBC), such as Microsoft Access or Microsoft SQL Server.

ODBC logging is a bit more complicated than most types of logging and requires some tinkering. You have to specify the database you want to be logged to, and you have to manually set up the database table to receive the log data.

IIS (Internet Information Server) includes a SQL template file that you must run in a SQL database. This file, named “Logtemp.sql”, is found by default in this location:

c:\winnt\system32\inetsrv\logtemp.sql

Running this file creates the table that will receive the log data.

 

After creating this table, you also have to create a DSN (Data Source Name) that ODBC will use to locate the database.

The final step is to provide IIS with the name of the database and of this table. If you protect the database with a username and password, you will also have to provide IIS with that username and password.

CONCLUSION

These are the most widely used log formats in the world. Each of them has its own specifications, strengths, and drawbacks. Which one you use depends on factors such as your software and hardware setup and the needs of your company, but Graylog excels at analyzing and archiving every one of them.

We hope this article will help you prevent issues that can arise between different log file formats and understand how they function and how they are created.

 
