Collecting logs into Azure DocumentDB using fluent-plugin-documentdb

In this article, I’d like to introduces a solution to collect logs and store them into Azure DocumentDB using fluentd and its plugin, fluent-plugin-documentdb.

Azure DocumentDB is a managed NoSQL database service provided by Microsoft Azure. It’s schemaless, natively support JSON, very easy-to-use, very fast, highly reliable, and enables rapid deployment, you name it. Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data. fluent-plugin-documentdb is fluentd output plugin that enables to store event collections into Azure DocumentDB.

This article shows how to

  • Collect Apache httpd logs across web servers
  • Ship the collected logs into the aggregator Fluentd in near real-time
  • Store the collected logs into DocumentDB
  • Utilize the collected log data stored on Document DB for advanced scenarios – like archiving the data to Azure Storage, doing Big data analysis, visualizing the data with PowerBI, and so forth
  • fluentd-azure-documentdb

    Pre-requisites

    • A basic understanding of fluentd – if you’re not familiar with fluentd, fluentd quickstart guide is good starting point
    • Azure subscription – you need to have Azure subscription that grants you access to Microsoft Azure services, and under which you can create DocumentDB cluster. If you don’t have yet click here to create it

    Setup: Azure DocumentDB

    To use Azure DocumentDB, you must create a DocumentDB database account using either the Azure portal, Azure Resource Manager templates, or Azure command-line interface (CLI). In addition, you must have a database and a collection to which fluent-plugin-documentdb writes event-stream out. Here are instructions:

    Setup: Fluentd Aggregator

    First of all, install Fluentd. The following shows how to install Fluentd using Ruby gem packger but if you are not using Ruby Gem for the installation, please refer to this installation guide where you can find many other ways to install Fluentd on many platforms.

    # install fluentd
    sudo gem install fluentd --no-ri --no-rdoc

    # create fluent.conf
    fluentd --setup <directory-path-to-fluent-conf>

    Also, install fluent-plugin-documentdb for fluentd aggregator to store collected logs data into Azure DocumentDB.

    sudo gem install fluent-plugin-documentdb

    Next, configure fluent.conf, a fluentd configuration file as follows. Please see this for fluent-plugin-documentdb configuration.

    # Receive events from 24224/tcp
    # This is used by log forwarding and the fluent-cat command
    <source>
        @type forward
        port 24224
    </source>

    # Store Data in DocumentDB
    <match apache.access>
        @type documentdb
        docdb_endpoint https://yoichikademo.documents.azure.com:443/
        docdb_account_key Tl1+ikQtnExxxUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==  (dummy)
        docdb_database LogDB
        docdb_collection Collection1
        auto_create_database true
        auto_create_collection true
        time_format %Y%m%d-%H:%M:%S
        localtime true
        add_time_field true
        time_field_name time
        add_tag_field true
        tag_field_name tag
    </match>

    Regarding he port number of the aggregator host above, the default is 24224. Note that both TCP packets (event stream) and UDP packets (heartbeat message) are sent to this port, which mean you need to open both TCP and UDP for this port if you have access controls between forwarders and aggregator. Please see the forward Output plugin article to understand more about forward plugin.

    Finally, run fluentd with specifiying fluent.conf that you configurea above.

    fluentd -c ./fluent.conf -vv &

    Setup: Fluentd Forwarders

    First, to set up Fluentd, run the following command to setup Fluentd. Again If you are not using Ruby Gem for the installation, please refer to the installation document.

    # install fluentd
    sudo gem install fluentd --no-ri --no-rdoc

    # create fluent.conf
    fluentd --setup <directory-path-to-fluent-conf>

    Then, give Fluentd a read access to servers’log files.

    sudo chmod og+rx /var/log/apache2
    sudo chmod og+r /var/log/apache2/*

    Next, configure fluent.conf, a fluentd configuration file as follows to tail apache access logs and forard event to aggregator

    # Apache Access Logs
    <source>
        @type tail
        path /var/log/apache2/access.log   # monitoring file
        pos_file /tmp/fluentd_pos_file     # position file
        format apache                      # format
        tag apache.access                  # tag
    </source>

    # Forward data to the aggregator
    <match apache.access>
        @type forward
        buffer_type memory
        buffer_chunk_limit 8m
        buffer_queue_limit 64
        flush_interval 1s
        <server>
            host  <Aggregator's hostname or IP>
            port 24224
        </server>
        <secondary>
            @type file
            path /var/log/fluentd/forward-failed
        </secondary>
    </match>

    Finally, start Fluentd with the configuration above to start log collections

    fluentd -c ./fluent.conf -vv &

    TEST

    Let’s check if logs will be forwarded from apache nodes to aggegator and ultimately stored in documentdb. First, create log events by sending test requests to web servers somehow (here using apache bench for example)

    ab -n 5 -c 2 http://<targetserver>/foo/bar/test.html

    If logs are collected successfully, you can see the logs stored in DocumentDB easily by using Document DB’s query explorer. Go to Azure Portal > Display your DocumentDB dashboard > Query Explorer.

    DocumentDB-QueryExplorer

    More log collections from external sources

    Fluentd’s Input plugins extend Fluentd to retrieve and pull event logs from external sources. An input plugin typically creates a thread socket and a listen socket. It can also periodically pull data from data sources. For examples, listening syslog events, tailing apache/nginx logs, and pulling data from well-known RDBMS/NoSQLs on-premices or on public cloud. See http://www.fluentd.org/plugins

    Setup for Advanced senarios

    Happy log collections with Azure DocumentDB and fluentd!!

    END

1 comment

Leave a Reply

Your email address will not be published. Required fields are marked *