Peter Marcely · 15.05.2021 · 30 minutes
In this tutorial, we are going to learn how to push application logs from a Django application to Elasticsearch and display them in a readable way in the Kibana web tool. The main aim of this article is to establish a connection between our Django server and the ELK stack (Elasticsearch, Logstash, Kibana) using another tool provided by Elastic - Filebeat. We will also briefly cover the preceding steps, such as the reasoning behind logging, configuring logging in Django, and installing the ELK stack.
Having reasonable logging messages in production has helped me discover several non-trivial bugs that would otherwise have gone unnoticed. It is also good practice to use logging messages in the local environment to speed up development; those messages can then stay in place for production use.
To start logging, add the following lines at the top of your file. The variable __name__ resolves to the name of the module, which will also appear in the final log messages.
```python
import logging

logger = logging.getLogger(__name__)
```
Another part of the log structure is the log level. Log level helps us identify the severity of the message and makes it easier to navigate in the log output. There are five levels that can be used for log messages. My usage of the logging levels is as follows:
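For reference, Python's logging module defines five standard levels with increasing numeric severity; the usage notes in the comments below are one common convention (a sketch, not a strict prescription):

```python
import logging

# The five standard levels and their numeric severities
# (higher number = more severe).
levels = {
    'DEBUG': logging.DEBUG,        # 10 - development detail
    'INFO': logging.INFO,          # 20 - normal operation events
    'WARNING': logging.WARNING,    # 30 - unexpected but recoverable
    'ERROR': logging.ERROR,        # 40 - an operation failed
    'CRITICAL': logging.CRITICAL,  # 50 - the application may not continue
}

# A logger only emits records at or above its configured level:
logger = logging.getLogger('level-demo')
logger.setLevel(logging.INFO)
print(logger.isEnabledFor(logging.DEBUG))  # False
print(logger.isEnabledFor(logging.ERROR))  # True
```

This filtering is what lets us keep verbose debug calls in the code while silencing them in production by raising the configured level.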
To create a logging message, we call a method of the logger object; the method names correspond to the logging levels. One addition to those is exception, which logs at the error level but also adds a traceback - very handy, especially with an integrated error monitoring tool like Sentry.
```python
# DEBUG
logger.debug('Email to user id={} sent'.format(user.id))

# INFO
logger.info('Payment transaction finished with status={}'.format(payment.status))

# WARNING
logger.warning('Referer {} is not in allow list'.format(ip))

# ERROR
logger.error(
    'Got status code={}, error message={} while sending request to external API'.format(
        response.status_code, response.text
    )
)

# EXCEPTION
try:
    print(b)
except Exception:
    logger.exception('Exception happened.')
```
Now, we need to set up the output of our log. The most common use cases are output on the screen (e.g. when you run manage.py runserver) or output to the log file. We want to go one step further and get the data into Elasticsearch storage in order to have a user-friendly web UI (Kibana) that allows easy search and filter options to access the logs.
When I first researched this, I found a couple of examples of integration with python-logstash and python-json-formatter, but they eventually did not work with Python 3. It also felt like a lot of hassle when there is the option to use Filebeat, a tool by Elastic. Filebeat monitors changes in the log file and sends all new records to Elasticsearch storage through a parser called Logstash. There is one condition, unfortunately: we need full control over the instance(s) used to serve our Django application. If we do not have it, we need to find a service that can stream logs to our ELK server, or fall back to python-logstash.
Since we are going to use Filebeat, all we need to do is store the data in a reasonable format in rotated log files. Besides that, I recommend using Sentry for managing errors; we will set up our Django logging config to send error and exception log messages to Sentry.
This is a sample setup of logging in settings.py. For more information, you can also refer to Django documentation.
```python
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '[%(asctime)s] %(levelname)s|%(name)s|%(message)s',
            'datefmt': '%Y-%m-%d %H:%M:%S',
        },
    },
    'handlers': {
        'applogfile': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': '/webapps/myproject/logs/django/myproject.log',
            'maxBytes': 1024 * 1024 * 15,  # 15MB
            'backupCount': 10,
            'formatter': 'simple',
        },
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple',
        },
    },
    'loggers': {
        'app1': {
            'handlers': ['applogfile', 'console'],
            'level': 'DEBUG',
        },
        'app2': {
            'handlers': ['applogfile', 'console'],
            'level': 'DEBUG',
        },
    },
}
```
The log format is composed to be easily parsable by Logstash. The following messages will start appearing if everything is set correctly:
```
[2019-11-01 02:59:02] DEBUG|users.forms|Captcha valid for user_id=443
[2019-11-01 09:53:48] ERROR|jobs.views|Error while redirecting the job=3232
[2019-11-01 10:20:31] INFO|newsletter.services.newsletter_service|Newsletter 55351 added keyword.
[2019-11-01 12:42:09] INFO|newsletter.services.newsletter_service|Keywords successfully loaded for newsletter 60473.
```
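Before wiring up Filebeat, you can verify locally that the 'simple' formatter really produces this pipe-delimited shape. A quick sanity check using an in-memory stream (a local sketch, not part of the production setup):

```python
import io
import logging
import re

# Reproduce the 'simple' formatter from settings.py and check that its
# output matches the pipe-delimited shape Logstash will parse.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    fmt='[%(asctime)s] %(levelname)s|%(name)s|%(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
))

logger = logging.getLogger('app1.utils')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.debug('Sample debug message')

line = stream.getvalue().strip()
# [timestamp] LEVEL|module|content
pattern = r'^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] \w+\|[\w.]+\|.+$'
assert re.match(pattern, line), line
print(line)
```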
I have already mentioned these terms a couple of times. What we need is the ELK infrastructure: Elasticsearch for storing data, Logstash for parsing and streaming data from our log files to Elasticsearch, and Kibana for data visualization. We can either choose a cloud-based solution (e.g. Elastic, logz.io, AWS Elasticsearch) or go with a self-hosted solution. It all depends on the use case: for a hobby project, or if you have DevOps capacity, I would go for a self-hosted solution; for a commercial project, a cloud-based solution is the better choice. Elastic is progressing really fast with its releases, so prepare yourself for a lot of maintenance work.
For the purposes of this article, I have prepared a self-hosted solution, which is more demanding to install but will give you more detail on how things work inside the ELK infrastructure. Furthermore, this ELK server is also used for our micro startup, Working Nomads, so the energy spent on it was not wasted.
This example uses a $20 droplet with 4 GB of memory, which is the minimum requirement for a server running all the ELK components. Your requirements may be much more demanding - you may need a whole cluster of instances, each running a single component - but here we will demonstrate the solution on this bare-minimum server.
I am not going to go into the details of the installation; I will just share the tutorials I used and give a couple of hints for overcoming the issues I struggled with.
Install ELK on Ubuntu 18.04 based on this example. I used Java 11, and ELK 7.4 instead of the 6.x version used in the tutorial. Make sure you install certificates for your Kibana instance. You should now be able to access the Kibana dashboard; we will configure it later.
Now it is time to feed our Elasticsearch with data. To achieve that, we need to configure Filebeat to stream logs to Logstash and Logstash to parse and store processed logs in JSON format in Elasticsearch. At this moment, we will keep the connection between Filebeat and Logstash unsecured to make the troubleshooting easier. Later in this article, we will secure the connection with SSL certificates.
We need to configure the port on which Logstash will listen for Filebeat. Furthermore, we need to configure a grok pattern to convert the log message into structured JSON. A new index will be created every day.
Sample configuration in /etc/logstash/conf.d/logstash.conf
```
input {
  beats {
    port => 5044
    ssl => false
  }
}

filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:loglevel}\|%{GREEDYDATA:module}\|%{GREEDYDATA:content}" }
  }
  date {
    locale => "en"
    match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
    target => "@timestamp"
    timezone => "America/New_York"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
```
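Outside of Logstash, the same field split can be approximated with a plain Python regex. This is a rough local stand-in for the grok pattern above (the TIMESTAMP_ISO8601 and GREEDYDATA definitions are simplified here; grok's own are more permissive):

```python
import re

# Approximation of the grok pattern:
# \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:loglevel}\|%{GREEDYDATA:module}\|%{GREEDYDATA:content}
GROK_LIKE = re.compile(
    r'\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] '
    r'(?P<loglevel>[A-Z]+)\|(?P<module>[^|]+)\|(?P<content>.*)'
)

m = GROK_LIKE.match(
    '[2019-11-01 02:59:02] DEBUG|users.forms|Captcha valid for user_id=443'
)
print(m.group('loglevel'))  # DEBUG
print(m.group('module'))    # users.forms
print(m.group('content'))   # Captcha valid for user_id=443
```

If a line fails to match here, it will very likely also produce a `_grokparsefailure` tag in Logstash.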
Now, restart Logstash and ensure the Filebeat input is working by checking the Logstash logs.
If you need to match a different pattern with grok regex, I recommend using Grok debugger to find out what you actually need.
Now, we need to log in to the instance that serves the Django application to install and configure Filebeat to monitor the log files and stream them to Logstash. Based on the previous part of this tutorial, there is a log being generated and rotated in /webapps/myproject/logs/django/. You will see files like these in the mentioned directory:
```
myproject.log
myproject.log.1
myproject.log.2
```
Filebeat can handle these rotated log files out of the box, so no extra work is needed.
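The rotation scheme Filebeat will see can be observed with a tiny standalone sketch (artificially small maxBytes just to force rollover; temporary paths, not the production ones):

```python
import logging
import logging.handlers
import os
import tempfile

# Once the file exceeds maxBytes, the handler renames myproject.log to
# myproject.log.1 (shifting older backups up) and starts a fresh file.
logdir = tempfile.mkdtemp()
logfile = os.path.join(logdir, 'myproject.log')

handler = logging.handlers.RotatingFileHandler(
    logfile, maxBytes=200, backupCount=2)
logger = logging.getLogger('rotation-demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

for i in range(50):
    logger.debug('message number %d', i)

# The directory now holds myproject.log plus up to two rotated backups.
print(sorted(os.listdir(logdir)))
```

The production config from settings.py behaves the same way, just with 15 MB files and ten backups.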
Configuration in /etc/filebeat/filebeat.yml
```
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /webapps/myproject/logs/django/*

output.logstash:
  hosts: ["ip_or_hostname_of_your_ELK:5044"]
```
You can now run Filebeat in debug mode with this command:
```
filebeat -e -d "*"
```
What we are looking for in the output is confirmation that Filebeat is sending the log messages to Logstash, such as:
```
Publish event: {
  "@timestamp": "2019-11-14T12:36:15.340Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.4.2"
  },
  "ecs": {
    "version": "1.1.0"
  },
  "host": {
    "name": "django-host"
  },
  "agent": {
    "hostname": "django-host",
    "id": "e992e850-bedd-48b2-a885-31c5a853e34e",
    "version": "7.4.2",
    "type": "filebeat",
    "ephemeral_id": "bd808730-a3a2-452b-a2ea-4ee6bbd55c7f"
  },
  "message": "[2019-11-14 17:36:11] DEBUG|app1.utils|Sample debug message",
  "log": {
    "offset": 2177175,
    "file": {
      "path": "/webapps/myproject/logs/django/myproject.log"
    }
  },
  "input": {
    "type": "log"
  }
}
```
If the push from Filebeat to Logstash is successful, we can stop the command and run Filebeat as a service instead. Make sure it starts automatically after the machine is rebooted.
Based on our use case, we should set the period for which logs are kept. Elastic provides another tool for this called Curator. Follow this tutorial to install it; for newer versions of Elasticsearch you need to install it via pip, otherwise Curator will not be compatible with Elasticsearch. Here is the configuration of the action file which deletes all indices older than 45 days. It works right away; you only need to add the configuration file to /home/user/.elasticsearch/ and change the disable_action flag to False.
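An action file for this retention policy follows Curator's standard format. The following is a sketch of such a file (index prefix and day count per this setup; disable_action is left as True until you flip it, as noted above):

```yaml
actions:
  1:
    action: delete_indices
    description: Delete filebeat indices older than 45 days, based on index name.
    options:
      ignore_empty_list: True
      disable_action: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: filebeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 45
```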
You can test your configuration with a dry run.
```
curator --dry-run curator_action.yml
```
And then run it daily with a cron job.
```
0 1 * * * user /usr/local/bin/curator /home/user/.curator/curator_action.yml >> /var/log/curator.log 2>&1
```
Now it is time to look into Kibana and see if the data is there. At this point, Kibana will probably offer you a way to configure your index pattern; if not, navigate to Settings > Kibana > Index Patterns and add the index pattern "filebeat-*".
In the second step, select @timestamp as the Time filter field.
Now go to the Discover area of Kibana and select an appropriate time range; you should finally see the logs coming in. Select the fields parsed by the grok pattern - loglevel, module, and content - and you will see the log feed.
This is the bare minimum configuration. Next, we are going to add some handy features and encrypt the communication between Filebeat and Logstash.
When an exception occurs, we get a multiline log message containing the traceback. Since our log messages always begin with a square bracket, Filebeat can be configured to recognize where each message starts. See the enhanced configuration file:
```
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /webapps/myproject/logs/django/*
  multiline.pattern: '^\['
  multiline.negate: true
  multiline.match: after

output.logstash:
  hosts: ["ip_or_hostname_of_your_ELK:5044"]
```
Multiline messages like these will start to pop up in your logs.
For more information about multiline examples, you can consult the official documentation.
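To see why '^\[' is the right pattern, note that only the first line of an exception record starts with a square bracket; the traceback lines that follow do not, which is what lets Filebeat fold them into the preceding event. A standalone sketch:

```python
import io
import logging

# Log an exception through the article's 'simple' format and inspect
# which lines of the record start with '['.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    fmt='[%(asctime)s] %(levelname)s|%(name)s|%(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
))
logger = logging.getLogger('multiline-demo')
logger.addHandler(handler)

try:
    1 / 0
except ZeroDivisionError:
    logger.exception('Exception happened.')

lines = stream.getvalue().splitlines()
print(lines[0].startswith('['))  # True  - start of a new event
print(lines[1].startswith('['))  # False - traceback continuation line
```

With multiline.negate: true and multiline.match: after, every line that does not match '^\[' is appended to the event that precedes it.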
Currently, the connection between Filebeat and Logstash is unsecured, which means logs are sent unencrypted. It is strongly recommended to set up SSL certificates to secure the connection and to ensure that Logstash only accepts data from trusted Filebeat instances. To set this up, follow these tutorials: #1 and #2. You can also consult the official article, but it does not explain how to generate the certificates.
The only pitfall I experienced following those tutorials was that the ownership of server.key must be set to the Logstash user.
These are the final configuration files for Filebeat and Logstash:
```
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/ca.crt"]
    ssl_certificate => "/etc/logstash/server.crt"
    ssl_key => "/etc/logstash/server.key"
    ssl_verify_mode => "force_peer"
  }
}

filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:loglevel}\|%{GREEDYDATA:module}\|%{GREEDYDATA:content}" }
  }
  date {
    locale => "en"
    match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
    target => "@timestamp"
    timezone => "America/New_York"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
```
```
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /webapps/yourproject/logs/django/*

  ### Multiline options
  multiline.pattern: '^\['
  multiline.negate: true
  multiline.match: after

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["ip_or_hostname_of_your_ELK:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  ssl.certificate_authorities: ["/etc/filebeat/ca.crt"]

  # Certificate for SSL client authentication
  ssl.certificate: "/etc/filebeat/client.crt"

  # Client Certificate Key
  ssl.key: "/etc/filebeat/client.key"
```
Kibana offers features that make the interface simple to use. First of all, we can create saved searches that are easily accessible. Imagine we have multiple instances reporting to our ELK server: we can configure a search that filters on host.name and loglevel and save it.
We can build up a list of such searches, adding more filtering parameters to each based on our preferences, and then quickly select the one we need.
We can also create visualizations based on our log messages. Since we have set up data retention, the log messages will not stay around forever, so this is not suitable for long-term analytics. Still, it can help us quickly spot a drop in a certain metric, indicating that some functionality has stopped working.
We can put multiple visualizations together and get simple monitoring of our systems.