Using the ELK Stack for log file analysis

The ELK stack is a combination of Elasticsearch, Logstash and Kibana for managing log files in a distributed environment. This post contains a simple example for getting started with this tool stack.

As part of this introduction I created a small image that contains all the code and examples used. The image can be found here.

Log files are often the basis for metrics like performance. In a distributed environment it becomes increasingly complex to get an overall picture of system performance, since the log files are distributed as well.

The ELK stack offers a solution for getting those metrics in the following way:

  • Logstash is used to collect and parse log files on the individual nodes of the system and to send the collected data to Elasticsearch.
  • Elasticsearch is the database for storing, searching and analyzing the data from the log files.
  • Finally, Kibana is a powerful and easy-to-use GUI for visualizing the data.

Basic setup

In this example I have a single-node system consisting of an Apache httpd and an Apache Tomcat web container. The Tomcat manager web GUI and host manager serve as the example web application. Apache httpd is configured to write the default access log and the forensic log:
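What follows is a minimal sketch of the relevant httpd directives; the file paths and the exact log format are assumptions rather than the original templates. Here the access log format is extended with the JSESSIONID cookie, one way to make the session id show up in the access log as well:

    # load mod_log_forensic for the forensic log
    LoadModule log_forensic_module modules/mod_log_forensic.so

    # combined access log, extended with the JSESSIONID cookie for correlation
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{JSESSIONID}C\"" withsession
    CustomLog logs/access_log withsession

    # forensic log: records all request headers (including Cookie) per request
    ForensicLog logs/forensic_log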

Tomcat uses a custom logging pattern for its access log, configured via an AccessLogValve in server.xml:
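A sketch of such a valve; the concrete pattern is an assumption. The important parts for the correlation are %S, which logs the session id, and %u, which logs the authenticated user name:

    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs" prefix="localhost_access_log" suffix=".txt"
           pattern="%h %l %u %t &quot;%r&quot; %s %b %S" />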

The goal is to correlate these log files using the session id, which is part of all three of them, and eventually to get the user name, which is logged only in Tomcat.

Installation

For the httpd installation I used this great cookbook with some custom templates to configure logging. There is also a very good Chef cookbook available for installing the ELK stack. Unfortunately I could not use it because of proxy issues with Elasticsearch, so I opted to write my own recipes, which wasn't so hard after all. They include a basic installation and configuration of the three tools.

Configuration

The first thing to do is to configure Logstash to collect and parse the log files; the template for the example configuration file can be found here.
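In outline such a configuration file looks roughly like the following sketch; all paths are assumptions, and the Tomcat filter and the output section are shown in more detail further below:

    input {
      # one file input per log file; the paths are assumptions
      file { path => "/var/log/httpd/access_log" }
      file { path => "/var/log/httpd/forensic_log" }
      file { path => "/opt/tomcat/logs/localhost_access_log*.txt" }
    }

    filter {
      # one block per log file type, e.g. for the httpd access log:
      if [path] =~ "httpd/access_log" {
        mutate { replace => { "type" => "apache" } }
        grok { match => [ "message", "%{COMBINEDAPACHELOG} \"%{NOTSPACE:sessionid}\"" ] }
      }
      # the tomcat block is shown in detail below; the forensic log
      # is handled along the same lines
    }

    output {
      stdout { codec => rubydebug }
      elasticsearch { host => "localhost" }
    }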

The input section of the file tells Logstash where to look for the log files; in this example these are the two Apache logs and the Tomcat access log.

The biggest part of the file is the filter section. Since three types of log files are parsed, it contains one block for each of them. Let's look at the Tomcat log for details:
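A sketch of that block; the path check, the field names and the user pattern are assumptions and have to match the AccessLogValve pattern configured earlier:

    if [path] =~ "tomcat" {
      # mark the event as a tomcat log entry
      mutate { replace => { "type" => "tomcat" } }
      # extract the individual fields from the log line
      grok {
        match => [ "message",
          "%{IPORHOST:clientip} %{USER:ident} (?<user>[a-zA-Z0-9._-]+|-) \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{NOTSPACE:sessionid}" ]
      }
    }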

The if condition is used to apply the following parsing only to Tomcat log entries. The mutate directive then tells Logstash to set the type of the log information to "tomcat", which is important for distinguishing the log files later when analyzing with Elasticsearch.

The following grok closures use regular expressions to extract data from the log line and assign it to named fields. Grok has predefined patterns, for example for the IP address; the user name is extracted using a custom pattern. Detailed information about grok and its predefined patterns can be found here. There is even an online debugger for testing your expressions.

The last section of the configuration file is the output section:
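Again a sketch, assuming Elasticsearch runs on the same host with its default settings (newer Logstash versions use hosts => [...] instead of host):

    output {
      # print every event to the console, handy while testing
      stdout { codec => rubydebug }
      # and ship it to the local elasticsearch instance
      elasticsearch { host => "localhost" }
    }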

In this case the output is sent to standard out as well as to the Elasticsearch instance on the local host.

Running

To start collecting log information, simply run Logstash with this configuration file.
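Assuming an installation under /opt/logstash and the configuration file from above in /etc/logstash, the command looks something like this (Logstash 1.x uses the agent subcommand, later versions drop it):

    /opt/logstash/bin/logstash agent -f /etc/logstash/logstash.conf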

Testing

To test the collection of log files, simply navigate to the Tomcat start page and log in with the user name "admin" and the password "admin".
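Since the manager application uses basic authentication, log entries can also be generated from the command line; host and port below are the Tomcat defaults and may differ in your setup:

    curl -u admin:admin http://localhost:8080/manager/html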

The output of Logstash in the console should look something like this:
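With the rubydebug codec every event is printed as a Ruby-style hash, roughly like the following mock-up (the values are made up, and the exact fields depend on your grok patterns):

    {
           "message" => "127.0.0.1 - admin [05/Feb/2015:10:15:12 +0100] \"GET /manager/html HTTP/1.1\" 200 12345 4FA8D2E1C3B7",
        "@timestamp" => "2015-02-05T09:15:12.000Z",
          "@version" => "1",
              "path" => "/opt/tomcat/logs/localhost_access_log.2015-02-05.txt",
              "type" => "tomcat",
              "user" => "admin",
         "sessionid" => "4FA8D2E1C3B7"
    }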

Now go to the Kibana start page here.

Go to the settings tab and configure an index pattern as shown below.

[Screenshot logstash1: configuring the index pattern in Kibana's settings tab]

This creates an index pattern for Kibana to search in; for a default Logstash setup the pattern is logstash-*. Important: select the right timestamp field, usually @timestamp.

Now go to the Discover tab and issue some queries against the log data. You should be able to see the session id and the user name correlated as displayed below:

[Screenshots logstash3, logstash2: session id and user name correlated in Kibana's Discover view]
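To pull up all entries belonging to one session, enter a Lucene query on the extracted fields in the search bar; the field names below are the ones assumed in the grok sketch earlier:

    sessionid:"4FA8D2E1C3B7"

or, to see everything the Tomcat user admin did:

    type:tomcat AND user:admin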

Conclusion

Although this example is very simple, it nevertheless shows how powerful the ELK stack is. The charm of this tool suite is that it is lightweight, easy to set up and configure, and works with REST/HTTP and JSON.

You have to be rigorous with the logging configuration though. Keep in mind that any change to the logging configuration of the systems involved might break the whole analysis. It is therefore important to keep the logging configuration and the Logstash configuration as close together as possible, since the two are tightly interrelated.
