Log Formats – a (Mostly) Complete Guide

Log management software receives, stores, and analyzes log files in many different formats. A handful of standardized log formats are commonly generated by a wide assortment of devices and systems.

As such, it is important to understand how they operate and differ from one another so that you can use them the right way, as well as avoid some common mistakes.

What Is a Log Format?

The main problem with log files, and the reason a structured format is needed, is that they typically contain unstructured text. This makes it very difficult to query the logs for any useful information. A log format is a structured format that allows logs to be machine-readable and easily parsed. This is the power of using structured logs and a log management system that supports them. The ability to translate raw data into something immediately comprehensible and easy to read is one of the must-have features of log management software.

Using a Syslog Server

Syslog provides a mechanism for network devices to send event messages to a logging server, typically known as a Syslog server. The Syslog protocol can be used to log different types of events and is supported by a wide range of devices. For example, a firewall might send messages about systems that are trying to connect to a blocked port, while a web server might log access-denied events. Most network equipment, such as routers, switches, and firewalls, can send Syslog messages. Additionally, some printers and web servers such as Apache can send Syslog messages. Windows-based servers, however, don’t support Syslog natively, but a large number of third-party tools make it easy to collect Windows Event Logs and forward them to a Syslog server.

Syslog servers provide a way to consolidate logs from multiple sources into a single location. Typically, most Syslog servers have the following components:

• A Syslog Listener: A mechanism to receive the Syslog messages.

• A Database: Typically, network devices generate huge amounts of Syslog data. Usually, Syslog servers will use some type of database to store Syslog data for quick retrieval.

• Management and Filtering Software: Due to the potential for large amounts of data to be sent to the Syslog server, it can be difficult to find specific log entries. The solution is to use a Syslog server that makes it easy to filter and view important log messages. Syslog servers typically have the ability to generate alerts, notifications, and alarms in response to select messages. This allows administrators to be notified as soon as an issue occurs, so they can act on the events quickly.

Please check out the article on how to use Graylog as a Syslog server.

There are a few downsides to Syslog, though. First, the Syslog protocol doesn’t define a standard format for message content, and there are endless ways to format a message. Syslog just provides a transport mechanism for the message. Additionally, Syslog is commonly transported over UDP, which does not guarantee delivery, so some log messages can be lost. Finally, there are security challenges as well. The main security issue is that Syslog messages are not authenticated, so messages can come from unknown or unauthorized sources.
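
To make the transport point concrete, here is a minimal Python sketch of how a device might build and send one Syslog message. It assumes the classic BSD-style layout (a priority value computed as facility * 8 + severity, then a timestamp, hostname, and tag); the function names, the host "fw01", and the server address are hypothetical and chosen only for illustration.

```python
import socket
from datetime import datetime

FACILITY_LOCAL0 = 16   # "local use 0" facility code
SEVERITY_WARNING = 4   # warning severity code


def format_syslog(facility, severity, hostname, tag, text, when=None):
    """Build a BSD-style Syslog line: <PRI>timestamp hostname tag: text."""
    when = when or datetime.now()
    pri = facility * 8 + severity
    # Classic Syslog timestamps pad single-digit days with a space.
    stamp = f"{when:%b} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {hostname} {tag}: {text}"


def send_syslog(message, server="127.0.0.1", port=514):
    """Fire-and-forget UDP send: nothing confirms that the server got it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (server, port))


line = format_syslog(FACILITY_LOCAL0, SEVERITY_WARNING, "fw01", "firewall",
                     "blocked connection to port 23",
                     when=datetime(2024, 2, 23, 12, 54, 6))
print(line)
```

Note that send_syslog never learns whether the message arrived, and nothing in the message proves who sent it. Those are the delivery and authentication gaps described above.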

JSON Log Format

JSON (JavaScript Object Notation) is a highly readable data-interchange format that has established itself as the standard format for structured logging. It is compact and lightweight, and simple to read and write for both humans and machines. It can be parsed in nearly all programming languages, even those that don’t have built-in JSON functionality. Because JSON is a text format with Unicode encoding, it works the same way whether you’re using a PC or a Mac, and regardless of the server you’re running.

Logging to JSON is a staple for log management and monitoring. This format is usually preferred to plain text since it offers a lot of flexibility in creating field-rich databases for later searches. JSON logs are richer than most other log formats, and they’re widely used for structured logging since they can be easily enriched with extra context and metadata. A common use case of JSON-based filtering is including a log level such as “ERROR” in the data so the logs containing this information can be found quickly for troubleshooting purposes.
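
The level-based filtering described above can be sketched in a few lines of Python. This is only an illustration: the field names ("level", "message", "user_id") and the helper functions are assumptions, not a schema that any particular tool requires.

```python
import json


def log_event(level, message, **context):
    """Serialize one log event as a single JSON line with extra context."""
    return json.dumps({"level": level, "message": message, **context})


def filter_by_level(lines, level):
    """Return the parsed events whose "level" field equals the given level."""
    matches = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(event, dict) and event.get("level") == level:
            matches.append(event)
    return matches


lines = [
    log_event("INFO", "service started"),
    log_event("ERROR", "payment failed", user_id=9001),
    "this line is plain text, not JSON",
]
errors = filter_by_level(lines, "ERROR")
print(errors)
```

Because every event is a set of named fields, the filter does not need a regular expression or any knowledge of the message wording.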

A single log event can also be generated by wrapping several log lines into a field. Albeit convenient, this can make log files grow quickly, so adequate storage or log rotation is critical. If you’re logging to JSON, make sure to make full use of Graylog’s Archiving feature to save your precious space. If you want more info on how to ship your JSON logs to Graylog and parse them into a clean and understandable format, you can have a look at our video guide here.

Windows Event Log

The Windows event log is a detailed record of operating system, application, and security notifications that are captured and stored by the Windows operating system. System administrators typically use these events to diagnose existing issues and to prevent future problems. The operating system and applications use the event logs to record important hardware and software actions that can help troubleshoot issues with the operating system and the installed applications. Windows creates log entries to track events such as application installations, system setup operations, errors, and security issues.

The elements of a Windows event log include:

o The date the event occurred.

o The time the event occurred.

o The username of the user logged onto the machine when the event occurred.

o The name of the computer.

o The Event ID is a Windows identification number that specifies the event type.

o The Source, which is the program or component that caused the event.

o The type of event, including information, warning, error, security success audit or security failure audit.

The Windows event log captures operating system, setup, security, application, and forwarded events.

• System events are incidents in the Windows operating system, such as errors in device drivers or other OS components.

• Setup events include events relating to the configuration settings of the operating system.

• Security events utilize the Windows system's audit policies, and these events include user login attempts and system resource access.

• Application events are incidents with the software that is installed on the local operating system. If an installed application crashes, a log entry about the issue will be created by the Windows event log and will include the application name and what caused it to crash.

• Forwarded events are events sent from other systems on the same network, used when an administrator wants one computer to gather logs from multiple machines.

Microsoft also provides a command-line utility (wevtutil) that retrieves event logs, runs queries, exports logs, archives logs, and clears logs. Graylog and other third-party utilities can also work with Windows event logs to provide additional log search, correlation, and event details.

CEF Format

The Common Event Format (CEF) is an open logging and auditing format from ArcSight. It is a text-based, extensible format that presents event information in an easily readable form. CEF was created as a common event log standard so that security information coming from different network devices, apps, and tools could be easily shared. It improves the interoperability of security-related information and simplifies integration between security and non-security devices, with Syslog typically serving as the transport mechanism.

CEF can be used with both on-premise devices and by cloud-based service providers by implementing the ArcSight Syslog SmartConnector. CEF uses the UTF-8 Unicode encoding method, so the entire message must be UTF-8 encoded. The Syslog CEF forwarder compiles each event in CEF according to a specific, reduced syntax that works with ESM normalization. The base CEF format comprises a standard header and a variable extension made up of several fields logged as key-value pairs. When Syslog is the transport, each message starts with a common prefix containing the date and hostname, as in the example below:

Feb 23 12:54:06 host message

After that prefix comes the CEF header itself, a set of fields separated by bar (|) characters and followed by the extension:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

The extension part of the CEF message is a placeholder for additional fields, logged as key-value pairs. The header fields are strings that uniquely identify information such as the version of the CEF format, the type of sending device, the type of event reported, and much more. For example, the Signature ID identifies a specific type of event so that a correlation engine can recognize it even when the activity is detected by different devices.
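
As a rough illustration of the layout above, the following Python sketch splits a CEF message into its seven header fields and its key-value extension. The sample message, the field names in HEADER_FIELDS, and the parse_cef helper are all hypothetical. The sketch also simplifies escaping: it treats a backslash-escaped bar as literal text but does not cover every escape rule a full parser would need.

```python
import re

HEADER_FIELDS = ["version", "device_vendor", "device_product",
                 "device_version", "signature_id", "name", "severity"]


def parse_cef(line):
    """Split a CEF message into header fields and an extension dict."""
    start = line.find("CEF:")
    if start == -1:
        raise ValueError("not a CEF message")
    # Split on bars that are not escaped with a backslash. At most 7 splits,
    # so any bar inside the extension is left untouched.
    parts = re.split(r"(?<!\\)\|", line[start + 4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = {name: value.replace("\\|", "|")
             for name, value in zip(HEADER_FIELDS, parts)}
    extension = parts[7] if len(parts) > 7 else ""
    # A value runs until the next " key=" or the end of the message,
    # so values may contain spaces.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)", extension)
    event["extension"] = dict(pairs)
    return event


sample = ("Feb 23 12:54:06 host CEF:0|Security|threatmanager|1.0|100|"
          "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232")
event = parse_cef(sample)
print(event["signature_id"], event["extension"])
```

Anything before "CEF:" (the Syslog date and hostname prefix) is ignored here; a real collector would normally keep it.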

Graylog Extended Log Format – GELF

GELF, short for Graylog Extended Log Format, is Graylog’s own log format. GELF was developed with the express aim of fixing the shortcomings of classic Syslog and taking full advantage of the many features and capabilities of the Graylog tool.

By itself, a Syslog message is limited to 1024 bytes in length, and UDP (User Datagram Protocol) datagrams can’t go over 8192 bytes. That is why GELF supports chunking. You can chunk a message by prepending a short binary header to each piece of the GELF message. GELF messages can be transported via UDP, TCP (Transmission Control Protocol), and sometimes via HTTP.
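
Chunking can be sketched as follows in Python. The sketch assumes the chunk header layout described in the GELF documentation: two magic bytes (0x1e 0x0f), an 8-byte message ID shared by all chunks, a 1-byte sequence number starting at 0, and a 1-byte sequence count, with at most 128 chunks per message. Check these values against the current specification before relying on them; the chunk_gelf helper and its default chunk size are hypothetical.

```python
import os
import struct

GELF_MAGIC = b"\x1e\x0f"   # marks a datagram as a GELF chunk
HEADER_SIZE = 12           # 2 magic + 8 message ID + 1 number + 1 count
MAX_CHUNKS = 128           # assumed per-message limit


def chunk_gelf(payload, chunk_size=8192):
    """Split an encoded GELF payload into datagrams of at most chunk_size."""
    if len(payload) <= chunk_size:
        return [payload]  # small enough to send as a single datagram
    body_size = chunk_size - HEADER_SIZE
    pieces = [payload[i:i + body_size]
              for i in range(0, len(payload), body_size)]
    if len(pieces) > MAX_CHUNKS:
        raise ValueError("message needs too many chunks")
    message_id = os.urandom(8)  # the same ID ties all chunks together
    return [
        GELF_MAGIC + message_id + struct.pack("BB", number, len(pieces)) + piece
        for number, piece in enumerate(pieces)
    ]


chunks = chunk_gelf(b"x" * 20000)
print(len(chunks), [len(c) for c in chunks])
```

The receiver can use the shared message ID and the sequence numbers to put the pieces back in order, which matters because UDP does not guarantee ordering.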

There is also the option to save on network bandwidth by somewhat increasing your CPU usage - select if you want to send messages in an uncompressed, GZIP’d or ZLIB’d format and Graylog will do the rest.

Every GELF log message contains the following fields:

● The host (the creator of the message)

● The timestamp

● The version

● The long and short versions of the message

● Several other custom fields you can freely configure to your own preferences

An example GELF message:

{
  "version": "1.1",
  "host": "example.org",
  "short_message": "A short message that helps you identify what is going on",
  "full_message": "Backtrace here\n\nmore stuff",
  "timestamp": 1385053862.3072,
  "level": 1,
  "_user_id": 9001,
  "_some_info": "foo",
  "_some_env_var": "bar"
}
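
A message like the one above can be built and compressed with the Python standard library alone. In this sketch the helper names (build_gelf, encode_gelf) are hypothetical, and the underscore prefix on custom fields follows the convention visible in the example; it shows the general shape rather than the behavior of an official client library.

```python
import gzip
import json
import time
import zlib


def build_gelf(host, short_message, timestamp=None, level=6, **custom):
    """Assemble a GELF message dict; custom fields get a "_" prefix."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": level,  # Syslog-style severity; 6 means informational
    }
    for key, value in custom.items():
        message["_" + key.lstrip("_")] = value
    return message


def encode_gelf(message, compression="gzip"):
    """Serialize to UTF-8 JSON and optionally compress the result."""
    raw = json.dumps(message).encode("utf-8")
    if compression == "gzip":
        return gzip.compress(raw)
    if compression == "zlib":
        return zlib.compress(raw)
    return raw  # uncompressed


msg = build_gelf("example.org", "A short message", timestamp=1385053862.3072,
                 level=1, user_id=9001, some_info="foo")
for mode in ("none", "gzip", "zlib"):
    print(mode, len(encode_gelf(msg, mode)), "bytes")
```

For a message this small, compression may not save anything; the bandwidth benefit shows up with longer messages such as stack traces.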

Common Log Format – NCSA

The NCSA Common Log Format, also known simply as the Common Log Format, is a fixed (non-customizable) log format that web servers use when they generate server log files. It is named after NCSA HTTPd, an early and now discontinued web server that served as the basis for the far more popular free, open-source, cross-platform Apache HTTP Server.

Every line in this log format is stored using this standardized syntax:

host ident authuser date request status bytes

To further illustrate, here is an example of a typical NCSA log entry:

127.0.0.1 user-identifier john [20/Jan/2020:21:32:14 -0700] "GET /apache_pb.gif HTTP/1.0" 200 4782

Here is an explanation of what every part of this entry means:

● 127.0.0.1 - refers to the IP address of the client (the remote host) that made the request to the server.

● user-identifier is the identity of the client as reported by the Ident protocol (also known as the Identification Protocol).

● john is the userid (user identification) of the person that is requesting the document.

● [20/Jan/2020:21:32:14 -0700] - is the date, time, and time zone that logs when the request was attempted. By default, it is in the strftime format of %d/%b/%Y:%H:%M:%S %z.

● "GET /apache_pb.gif HTTP/1.0" is the client’s request line. GET is the method, /apache_pb.gif is the resource that was requested, and HTTP/1.0 is the HTTP protocol version.

● 200 is the HTTP status code that was returned to the client after the request. 2xx is a successful response, 3xx is a redirection, 4xx is a client error, and 5xx is a server error.

● 4782 is the size of the object - measured in bytes - that was returned to the client in question.
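
The field breakdown above maps naturally onto a regular expression. Below is a minimal Python sketch that parses one Common Log Format line into named fields. The pattern and the parse_clf helper are illustrative rather than a reference implementation, and they assume the default timestamp format given above.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict of typed fields."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError("line is not in Common Log Format")
    entry = match.groupdict()
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A hyphen in the size field means no bytes were returned.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["resource"], entry["protocol"] = parts
    return entry


entry = parse_clf('127.0.0.1 user-identifier john [20/Jan/2020:21:32:14 -0700] '
                  '"GET /apache_pb.gif HTTP/1.0" 200 4782')
print(entry["host"], entry["method"], entry["resource"], entry["status"])
```

Once the line is split into typed fields, questions such as "which requests returned a 5xx status" become simple comparisons on entry["status"].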

Extended Log Format – ELF

ELF is short for Extended Log Format. It is very similar to the Common Log Format (NCSA), but ELF files are a bit more flexible and they contain more information.

This is an example of an ELF file:

#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html

The (#) sign indicates the start of a directive. The following directives are defined:

● Version - the version of the Extended Log file format used.

● Fields - which fields are recorded in the log.

● Software - the software that generated the log.

● Start-Date - the exact date and time when the log was started.

● End-Date - the exact date and time when the log was finished.

● Date - the exact date and time when the entry was added.

● Remark - Comments. These are ignored by log management tools and similar log file analysis software.
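
Because the Fields directive names the columns, a parser can read it first and then label every entry. The Python sketch below does this for the example file shown earlier; the parse_extended_log helper is hypothetical, and it assumes that field values contain no spaces (quoted strings are not handled).

```python
def parse_extended_log(lines):
    """Return (directives, entries) parsed from Extended Log Format lines."""
    directives = {}
    fields = []
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Split on the first colon only, so times in the value survive.
            name, _, value = line[1:].partition(":")
            name, value = name.strip(), value.strip()
            directives[name] = value
            if name == "Fields":
                fields = value.split()  # column names for later entries
            continue
        entries.append(dict(zip(fields, line.split())))
    return directives, entries


sample = """#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
"""
directives, entries = parse_extended_log(sample.splitlines())
print(directives["Fields"], entries[0])
```

This is what makes the format flexible: if the Fields directive changes, the same parser keeps working without code changes.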

W3C Log File

The W3C Extended Log Format is a customizable format used by the Microsoft Internet Information Server (IIS) versions 4.0 and 5.0.

Since it is customizable, you can add or omit different fields according to your needs and preferences, which can increase or decrease the size of the file. Properly archiving data logs is an important part of log management and is essential for system administrators and cybersecurity experts, as well as for auditing procedures and compliance standards.


W3C log file example

#Software: Microsoft Internet Information Services 4.0
#Version: 1.0
#Date: 2002-12-12 19:12:42
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
19:12:42 172.16.255.255 GET /default.htm 500 HTTP/1.0

● #Software - the software that was involved in the generation of the log.

● #Version - indicates that the W3C logging format 1.0 was used here.

● #Date - the exact date and time when the entry was added.

● #Fields - Time, Client IP Address, Method, URI Stem, HTTP Status, and the HTTP Version.

● 19:12:42 172.16.255.255 GET /default.htm 500 HTTP/1.0 - at 19:12:42 UTC, the client with the IP address 172.16.255.255 issued an HTTP GET command for the file /default.htm using HTTP version 1.0, and the request failed with a 500 Internal Server Error.

Microsoft IIS (Internet Information Server) Log File

The Microsoft IIS (Internet Information Server) log file format is another fixed format. It includes more information than the NCSA Common Log Format. While it records the usual data such as the user’s name, IP address, and the date and time when the request took place, it also has additional information, such as how long the request took to process (in milliseconds).

Here is how the Microsoft IIS log file looks when you open it in a text editor:

192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,

172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0, 0, PASS, /Intro.htm, -,

● 192.168.114.201 - the user’s IP address

● - indicates that the user is anonymous

● 03/20/01 - the date

● 7:55:20 - the time

● W3SVC2 - Service and Instance

● SALES1 - the name of the computer

● 172.21.13.45 - the IP address of the server

● 4502 - Time taken in milliseconds

● 163 - how many bytes were received

● 3223 - how many bytes were sent back

● 200 - Service Status Code

● 0 - Windows NT/2000 Status Code

● GET - request type

● /DeptLogo.gif - the operation’s target

Note that the fields are separated from each other with a comma (,) and that the hyphen (-) is used whenever a field doesn’t have a valid value available to it.
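
Since every field sits in a fixed position, parsing comes down to splitting on commas and attaching names. The Python sketch below does this for the first example line. The field names are hypothetical labels based on the list above, and the name given to the final field ("parameters") is an assumption, since the list does not describe it. The sketch also assumes that no field value itself contains a comma.

```python
IIS_FIELDS = [
    "client_ip", "username", "date", "time", "service_instance",
    "computer_name", "server_ip", "time_taken_ms", "bytes_received",
    "bytes_sent", "service_status", "windows_status", "request_type",
    "target", "parameters",  # last name is an assumption
]
NUMERIC_FIELDS = {"time_taken_ms", "bytes_received", "bytes_sent",
                  "service_status", "windows_status"}


def parse_iis_line(line):
    """Parse one IIS-format line; a hyphen becomes None (no valid value)."""
    values = [v.strip() for v in line.strip().rstrip(",").split(",")]
    if len(values) != len(IIS_FIELDS):
        raise ValueError("unexpected number of fields")
    record = {}
    for name, value in zip(IIS_FIELDS, values):
        if value == "-":
            record[name] = None
        elif name in NUMERIC_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


record = parse_iis_line(
    "192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, "
    "4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,"
)
print(record["client_ip"], record["time_taken_ms"], record["target"])
```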

Open Database Connectivity (ODBC) Log File

ODBC logging records a fixed set of data fields in a database that is compliant with Open Database Connectivity (ODBC), such as Microsoft Access or Microsoft SQL Server.

ODBC logging is a bit more complex than most types of logging, and requires some tinkering. You have to specify the database that you want to log to, and you have to manually set up the database table that will receive the log data.

There is a SQL template file included with IIS (Internet Information Server) that must be run in a SQL database. This file, named “Logtemp.sql”, is found in this location by default:

c:\winnt\system32\inetsrv\logtemp.sql

This file is then used to create the table that receives the log data.

After making this table, you also have to create a DSN (Data Source Name) that ODBC will then use to locate the database.

The final step is to provide IIS with the name of the database and this table. If the database is protected with a username and password, you will have to specify them in IIS as well.

Conclusion

These are the most widely used log formats across the world. Each of them has its own specifications, strengths, and drawbacks. Which one you will use depends on factors such as your software and hardware setup and the needs of your company, but Graylog excels at analyzing and archiving each and every one of them.

We hope that this article will help you to prevent issues that can arise between different log file formats, and understand how they function and how they are created.
