Log management software works by receiving, storing, and analyzing log files in a variety of formats. A handful of standardized log formats account for most of the logs generated by a wide assortment of devices and systems.
As such, it is important to understand how they operate and differ from one another so that you can use them the right way, as well as avoid some common mistakes.
What Is a Log Format?
A log format defines how the entries in a log file are structured, so that log management tools can read and compile data from it. Since most log files are plain ASCII (American Standard Code for Information Interchange) text, they can be opened in almost any text editor or word processor, if necessary after renaming their extension to .txt.
Of course, this will merely open the file so you can see the strings of data inside, and it won’t necessarily help you make sense of what all that information means. The ability to translate that raw data into something immediately comprehensible and easy to read is one of the must-have features of log management software.
Common Log Format – NCSA
The NCSA Common log format - also known as the Common Log Format - is a fixed (non-customizable) log format that web servers use when they generate server log files. It was named after NCSA HTTPd, an early, now discontinued web server, which served as the basis for the far more popular free, open-source, cross-platform web server - the Apache HTTP Server.
Every line in this log format is stored using this standardized syntax:
host ident authuser date request status bytes
To further illustrate, here is an example of a typical NCSA Common Log Format entry:
127.0.0.1 user-identifier john [20/Jan/2020:21:32:14 -0700] "GET /apache_pb.gif HTTP/1.0" 200 4782
Here is an explanation of what each part of this entry means:
● 127.0.0.1 - refers to the IP address of the client (the remote host) that made the request to the server.
● user-identifier is the RFC 1413 identity of the client, as reported by the Ident protocol (also known as the Identification Protocol).
● john is the userid (user identification) of the person that is requesting the document.
● [20/Jan/2020:21:32:14 -0700] is the date, time, and time zone of the request. By default, it is in the strftime format of %d/%b/%Y:%H:%M:%S %z.
● "GET /apache_pb.gif HTTP/1.0" is the client’s request line. GET is the method, /apache_pb.gif is the resource that was requested, and HTTP/1.0 is the protocol version.
● 200 is the HTTP status code that was returned to the client after the request. 2xx is a successful response, 3xx is a redirection, 4xx is a client error, and 5xx is a server error.
● 4782 is the size of the object - measured in bytes - that was returned to the client in question.
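To make the field breakdown above concrete, here is a minimal Python sketch that splits a Common Log Format line into named fields with a regular expression. The names CLF_PATTERN and parse_clf_line are illustrative choices rather than part of any standard library, and treating a hyphen in the bytes field as zero is an assumption.

```python
import re
from datetime import datetime

# One Common Log Format line:
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp using the strftime pattern quoted above.
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Assumption: a hyphen means no body was returned, so count zero bytes.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

sample = ('127.0.0.1 user-identifier john [20/Jan/2020:21:32:14 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 4782')
entry = parse_clf_line(sample)
print(entry["host"], entry["status"], entry["bytes"])
```

A line that does not follow the syntax returns None instead of raising an error, so malformed entries can be counted or skipped by the caller.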
Extended Log Format – ELF
ELF is short for Extended Log Format. It is very similar to the Common Log Format (NCSA), but ELF files are a bit more flexible and they contain more information.
This is an example of an ELF file:
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html
A line that begins with the # sign is a directive. The following directives are defined:
● Version - the version of the Extended Log file format used.
● Fields - which fields are recorded in the log.
● Software - the software that generated the log.
● Start-Date - the exact date and time when the log was started.
● End-Date - the exact date and time when the log was finished.
● Date - the exact date and time when the entry was added.
● Remark - Comments. These are ignored by log management tools and similar log file analysis software.
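As a rough sketch of how a reader for this format might work, the Python function below uses the Fields directive to name the columns of every record that follows it. The function name parse_extended_log is hypothetical, and the sketch assumes fields are separated by whitespace, so quoted values that contain spaces are not handled.

```python
def parse_extended_log(lines):
    """Yield one dict per record, keyed by the names in the #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives look like "#Name: value".
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue  # other directives are skipped in this sketch
        # Simplification: split on whitespace only.
        yield dict(zip(fields, line.split()))

sample = """#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
"""
records = list(parse_extended_log(sample.splitlines()))
print(records[0])
```

Because the column names come from the file itself, the same function keeps working when fields are added or removed.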
W3C Log File
The W3C Extended Log Format is a customizable format used by the Microsoft Internet Information Server (IIS) versions 4.0 and 5.0.
Since it is customizable, you can add or omit different fields according to your needs and preferences, which can increase or decrease the size of the file. Properly archiving data logs is an important part of log management and is essential for system administrators and cybersecurity experts, not to mention auditing procedures and compliance standards.
W3C Extended Logging Fields
W3C log file example
#Software: Microsoft Internet Information Services 4.0
#Version: 1.0
#Date: 2002-12-12 19:12:42
#Fields: time c-ip cs-method cs-uri-stem sc-status cs-version
19:12:42 172.16.255.255 GET /default.htm 500 HTTP/1.0
● #Software - the software that was involved in the generation of the log.
● #Version - indicates that the W3C logging format 1.0 was used here.
● #Date - the exact date and time when the entry was added.
● #Fields - the fields recorded in each entry: Time, Client IP Address, Method, URI Stem, HTTP Status, and HTTP Version.
● 19:12:42 172.16.255.255 GET /default.htm 500 HTTP/1.0 - at 19:12:42 UTC (Coordinated Universal Time), a client with the IP address 172.16.255.255, using HTTP version 1.0, issued an HTTP GET command for the file /default.htm, and the request failed with a 500 Internal Server Error.
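Since each entry carries only a time, a full timestamp has to be assembled from the date in the #Date directive. The Python sketch below shows one way to do that. It assumes the records following a #Date directive belong to that date and that times are in UTC, and the helper name entry_timestamp is hypothetical.

```python
from datetime import datetime, timezone

def entry_timestamp(date_directive, time_field):
    """Combine a #Date directive value with a record's time field (UTC)."""
    day = date_directive.split()[0]  # keep "YYYY-MM-DD", drop the time part
    stamp = datetime.strptime(f"{day} {time_field}", "%Y-%m-%d %H:%M:%S")
    return stamp.replace(tzinfo=timezone.utc)

ts = entry_timestamp("2002-12-12 19:12:42", "19:12:42")
print(ts.isoformat())  # 2002-12-12T19:12:42+00:00
```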
Microsoft IIS (Internet Information Server) Log File
The Microsoft IIS (Internet Information Server) log file format is another fixed log file format. It includes more information than the NCSA Common log format. While it records the usual data, such as the user’s name, IP address, and the date and time when the request took place, it also has additional information - like how long the request took to process (in milliseconds).
Here is how a Microsoft IIS log file looks when you open it in a text editor or word processing program:
192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,
172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0, 0, PASS, /Intro.htm, -,
● 192.168.114.201 - the user’s IP address
● - indicates that the user is anonymous
● 03/20/01 - the date
● 7:55:20 - the time
● W3SVC2 - Service and Instance
● SALES1 - the name of the computer
● 172.21.13.45 - the IP address of the server
● 4502 - Time taken in milliseconds
● 163 - how many bytes were received
● 3223 - how many bytes were sent back
● 200 - Service Status Code
● 0 - Windows NT/2000 Status Code
● GET - request type
● /DeptLogo.gif - the operation’s target
Note that the fields are separated from each other with a comma (,) and that the hyphen (-) is used whenever a field doesn’t have a valid value available to it.
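Because the fields are comma-separated, a line in this format can be read with an ordinary CSV reader. The Python sketch below is one way to do it. The field names in IIS_FIELDS are illustrative labels based on the list above, not official identifiers, and the label for the final field ("parameters") is an assumption, since the list above does not describe it.

```python
import csv

# Illustrative labels for the comma-separated fields, in order.
IIS_FIELDS = [
    "client_ip", "user_name", "date", "time", "service_instance",
    "server_name", "server_ip", "time_taken_ms", "bytes_received",
    "bytes_sent", "service_status", "windows_status", "request_type",
    "target", "parameters",
]

def parse_iis_line(line):
    """Split one IIS-format line into a dict; hyphens become None."""
    # skipinitialspace drops the blank that follows each comma.
    row = next(csv.reader([line], skipinitialspace=True))
    values = [value.strip() for value in row]
    # Each line ends with a trailing comma, which yields an empty last item.
    if values and values[-1] == "":
        values.pop()
    return {
        name: (None if value == "-" else value)
        for name, value in zip(IIS_FIELDS, values)
    }

sample = ("192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, "
          "172.21.13.45, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,")
entry = parse_iis_line(sample)
print(entry["client_ip"], entry["user_name"], entry["time_taken_ms"])
```

Mapping the hyphen to None keeps "no value" distinct from an empty string or a zero when the data is analyzed later.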
Open Database Connectivity (ODBC) Log File
ODBC logging records a fixed set of data fields in a database that is compliant with Open Database Connectivity (ODBC), like Microsoft Access or Microsoft SQL Server.
ODBC logging is a bit more complex than most types of logging, and requires some tinkering. You have to specify the database that you want to log to, and you have to manually set up the database table that will receive the log data.
There is a SQL template file included in the IIS (Internet Information Server) that must be run in a SQL database. This file, named “Logtemp.sql” is, by default, found in this location:
This file is then used to create the following table:
After making this table, you also have to create a DSN (Data Source Name) that ODBC will then use to locate the database.
The final step is to provide IIS with the name of the database and this table. If the database is protected with a username and password, you will have to specify them in IIS as well.
Graylog Extended Log Format – GELF
GELF, short for Graylog Extended Log Format, is Graylog’s own log format. GELF was developed with the express aim of fixing the shortcomings of classic Syslog and taking full advantage of the many features and capabilities of the Graylog tool.
By itself, a Syslog message is limited to 1024 bytes in length, and UDP (User Datagram Protocol) datagrams can’t go over 8192 bytes. That is why GELF supports chunking: a message that is too large for one datagram is split into pieces, each piece is prefixed with a short byte header, and the pieces are sent as separate UDP datagrams. Besides UDP, GELF messages can be transported via TCP (Transmission Control Protocol), and sometimes via HTTP.
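Chunking can be sketched in a few lines of Python. The header layout used here (two magic bytes, an 8-byte message ID, a sequence number, and a sequence count, with at most 128 chunks per message) follows Graylog's published GELF specification as best understood; check it against the version you deploy. The function name chunk_gelf and the default datagram size of 8192 bytes are assumptions made for illustration.

```python
import os
import struct

GELF_MAGIC = b"\x1e\x0f"  # marks a chunked GELF datagram
MAX_CHUNKS = 128          # upper limit on chunks per message
HEADER_SIZE = 12          # 2 magic + 8 message id + 1 index + 1 count

def chunk_gelf(payload, datagram_size=8192):
    """Split payload bytes into a list of GELF datagrams."""
    if len(payload) <= datagram_size:
        return [payload]  # small enough to send as a single datagram
    body_size = datagram_size - HEADER_SIZE
    pieces = [payload[i:i + body_size]
              for i in range(0, len(payload), body_size)]
    if len(pieces) > MAX_CHUNKS:
        raise ValueError("message too large to chunk")
    message_id = os.urandom(8)  # the same id is repeated on every chunk
    return [
        GELF_MAGIC + message_id + struct.pack("BB", index, len(pieces)) + piece
        for index, piece in enumerate(pieces)
    ]

chunks = chunk_gelf(b"x" * 20000)
print(len(chunks), len(chunks[0]))  # 3 8192
```

Each returned item would be sent as its own UDP datagram, and the receiver uses the shared message ID and the sequence numbers to put the pieces back together.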
There is also the option to save on network bandwidth at the cost of somewhat higher CPU usage - choose whether to send messages uncompressed, GZIP’d, or ZLIB’d, and Graylog will do the rest.
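The trade-off is easy to see with Python's standard library. This sketch compresses a JSON payload with both gzip and zlib and prints the sizes; the message content, including the host name example.org, is made up for illustration.

```python
import gzip
import json
import zlib

# A made-up message with a repetitive body, so compression has work to do.
message = {"version": "1.1", "host": "example.org",
           "short_message": "A short message " * 50}
raw = json.dumps(message).encode("utf-8")

gzipped = gzip.compress(raw)
deflated = zlib.compress(raw)
print(len(raw), len(gzipped), len(deflated))

# Both forms decompress back to the original JSON bytes.
restored_gzip = gzip.decompress(gzipped)
restored_zlib = zlib.decompress(deflated)
```

How much is saved depends on the payload: repetitive text such as stack traces shrinks a lot, while very short messages may not shrink at all.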
Every GELF log message contains the following fields:
● The host (the creator of the message)
● The timestamp
● The version
● The long and short versions of the message
● Several other custom fields you can freely configure to your own preferences
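As a minimal sketch, a message with these fields could be assembled in Python as shown below. The function name build_gelf is hypothetical, the version string "1.1" is an assumed value, and the underscore prefix on custom fields follows Graylog's GELF specification as best understood, so confirm it for your version.

```python
import json
import time

def build_gelf(host, short_message, full_message=None, **custom):
    """Return a GELF message as a JSON string."""
    message = {
        "version": "1.1",          # assumed GELF spec version
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the Unix epoch
    }
    if full_message is not None:
        message["full_message"] = full_message
    for key, value in custom.items():
        message["_" + key] = value  # custom fields carry an underscore prefix
    return json.dumps(message)

payload = build_gelf("example.org", "A short message",
                     "Backtrace here\n\nmore stuff", user_id=9001)
print(payload)
```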
An excerpt from an example GELF message:
"short_message": "A short message that helps you identify what is going on",
"full_message": "Backtrace here\n\nmore stuff",
These are the most widely used log formats in the world. Every one of them has its own specifications, good points, and drawbacks. Which one you will use depends on factors such as your software and hardware setup and the needs of your company, but Graylog excels at analyzing and archiving each and every one of them.
We hope that this article will help you to prevent issues that can arise between different log file formats, and understand how they function and how they are created.