fluent-plugin-documentdb's Introduction

Azure DocumentDB output plugin for Fluentd

fluent-plugin-documentdb is a Fluentd output plugin that writes records to Azure DocumentDB.

[NEWS] Since fluent-plugin-documentdb 0.2.0, the plugin supports partitioned collections in addition to single-partition collections (see "Partitioning and scaling in Azure DocumentDB" for the difference between partitioned and single-partition collections).

Requirements

fluent-plugin-documentdb    fluentd        ruby
>= 0.3.0                    >= v0.14.15    >= 2.1
< 0.3.0                     >= v0.12.0     >= 1.9

Installation

$ gem install fluent-plugin-documentdb

Configuration

DocumentDB

To use Microsoft Azure DocumentDB, you must create a DocumentDB database account using the Azure portal, Azure Resource Manager templates, or the Azure command-line interface (CLI). You also need a database and a collection for fluent-plugin-documentdb to write the event stream to.

Fluentd - fluent.conf

<match documentdb.*>
    @type documentdb
    @log_level info
    docdb_endpoint  DOCUMENTDB_ACCOUNT_ENDPOINT
    docdb_account_key DOCUMENTDB_ACCOUNT_KEY
    docdb_database  mydb
    docdb_collection mycollection
    auto_create_database true
    auto_create_collection true
    partitioned_collection true 
    partition_key PARTITION_KEY
    offer_throughput 10100
    time_format %s
    localtime false
    add_time_field true
    time_field_name time
    add_tag_field true
    tag_field_name tag
</match>
  • docdb_endpoint (required) - Azure DocumentDB Account endpoint URI
  • docdb_account_key (required) - Azure DocumentDB Account key (master key). You must NOT set a read-only key
  • docdb_database (required) - DocumentDB database name
  • docdb_collection (required) - DocumentDB collection name
  • auto_create_database (optional) - Default:true. By default, DocumentDB database named docdb_database will be automatically created if it does not exist
  • auto_create_collection (optional) - Default:true. By default, DocumentDB collection named docdb_collection will be automatically created if it does not exist
  • partitioned_collection (optional) - Default:false. Set true to create and/or store records in a partitioned collection. Set false for a single-partition collection
  • partition_key (optional) - Default:nil. A partition key must be specified for a partitioned collection (that is, when partitioned_collection is set to true)
  • offer_throughput (optional) - Default:10100. Throughput for the collection expressed in units of 100 request units per second. This takes effect only when a partitioned collection is newly created (i.e. both auto_create_collection and partitioned_collection are set to true)
  • localtime (optional) - Default:false. By default, the time field is written in UTC (Coordinated Universal Time). Set localtime true to use local time instead
  • time_format (optional) - Default:%s. Format of the time field to be inserted. The default, %s, is Unix epoch time. For a more human-readable value, set it to %Y%m%d-%H:%M:%S, for example
  • add_time_field (optional) - Default:true. Inserts a time field into each record
  • time_field_name (optional) - Default:time. Time field name to be inserted
  • add_tag_field (optional) - Default:true. Inserts a tag field into each record
  • tag_field_name (optional) - Default:tag. Tag field name to be inserted
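
The defaults and constraints in the parameter list above can be summarized in a short Ruby sketch. This is not taken from the plugin's source: the names REQUIRED_KEYS, DEFAULTS and resolve_config are hypothetical, and the sketch only checks the rules stated in the list (four required parameters, and partition_key being mandatory when partitioned_collection is true).

```ruby
# Hypothetical helper for illustration; not the plugin's actual code.
REQUIRED_KEYS = %w[docdb_endpoint docdb_account_key docdb_database docdb_collection].freeze

# Defaults as documented in the parameter list.
DEFAULTS = {
  'auto_create_database'   => true,
  'auto_create_collection' => true,
  'partitioned_collection' => false,
  'partition_key'          => nil,
  'offer_throughput'       => 10100,
  'localtime'              => false,
  'time_format'            => '%s',
  'add_time_field'         => true,
  'time_field_name'        => 'time',
  'add_tag_field'          => true,
  'tag_field_name'         => 'tag'
}.freeze

# Merge user settings over the defaults and check the documented constraints.
def resolve_config(settings)
  conf = DEFAULTS.merge(settings)

  missing = REQUIRED_KEYS.reject { |k| conf[k].is_a?(String) && !conf[k].empty? }
  unless missing.empty?
    raise ArgumentError, "missing required parameter(s): #{missing.join(', ')}"
  end

  # A partitioned collection needs a partition key.
  if conf['partitioned_collection'] && conf['partition_key'].to_s.empty?
    raise ArgumentError, 'partition_key must be set when partitioned_collection is true'
  end

  conf
end
```

Calling resolve_config with only the four required parameters returns a hash in which every optional parameter carries its documented default.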

[note] @log_level is a fluentd built-in parameter (optional) that controls verbosity of logging: fatal|error|warn|info|debug|trace (See also Logging of Fluentd)

Configuration examples

fluent-plugin-documentdb automatically adds an id attribute in UUID format to each record, alongside the record's own attributes. It also adds time and tag attributes if add_time_field and add_tag_field are true, respectively. The two example configurations below cover a single-partition collection and a partitioned collection. In both, the source that Fluentd reads is an Apache access log.
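
As a rough illustration of that behaviour, the Ruby sketch below builds a document from a record, a tag and an event time. It is a sketch under assumptions, not the plugin's implementation: decorate_record and its option hash are hypothetical names, and only the option semantics described in this README are modelled.

```ruby
require 'securerandom'

# Hypothetical helper: returns a new hash with the id, time and tag
# attributes described above. +epoch+ is the event time in Unix seconds.
def decorate_record(record, tag, epoch, opts = {})
  opts = {
    time_format: '%s', localtime: false,
    add_time_field: true, time_field_name: 'time',
    add_tag_field: true, tag_field_name: 'tag'
  }.merge(opts)

  # UUID-format id first, then the record's own attributes.
  doc = { 'id' => SecureRandom.uuid }.merge(record)

  if opts[:add_time_field]
    t = Time.at(epoch)
    t = t.utc unless opts[:localtime]   # UTC unless localtime is true
    doc[opts[:time_field_name]] = t.strftime(opts[:time_format])
  end

  doc[opts[:tag_field_name]] = tag if opts[:add_tag_field]
  doc
end

doc = decorate_record({ 'host' => '125.212.152.166' }, 'documentdb.access',
                      1453007005, time_format: '%Y%m%d-%H:%M:%S')
puts doc['time']  # 20160117-05:03:25 (UTC, since localtime is false)
```

With the default time_format of %s, the same call would store the time as the epoch string 1453007005.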

(1) Single-Partition Collection Case

fluent.conf

<source>
    @type tail                          # input plugin
    path /var/log/apache2/access.log   # monitoring file
    pos_file /tmp/fluentd_pos_file     # position file
    format apache                      # format
    tag documentdb.access              # tag
</source>

<match documentdb.*>
    @type documentdb
    docdb_endpoint https://yoichikademo.documents.azure.com:443/
    docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
    docdb_database mydb
    docdb_collection my-single-partition-collection
    auto_create_database true
    auto_create_collection true
    partitioned_collection false
    localtime true
    time_format %Y%m%d-%H:%M:%S
    add_time_field true
    time_field_name time
    add_tag_field true
    tag_field_name tag
</match>

(2) Partitioned Collection Case

fluent.conf

<source>
    @type tail                          # input plugin
    path /var/log/apache2/access.log   # monitoring file
    pos_file /tmp/fluentd_pos_file     # position file
    format apache                      # format
    tag documentdb.access              # tag
</source>

<match documentdb.*>
    @type documentdb
    docdb_endpoint https://yoichikademo.documents.azure.com:443/
    docdb_account_key Tl1xykQxnExUisJ+BXwbbaC8NtUqYVE9kUDXCNust5aYBduhui29Xtxz3DLP88PayjtgtnARc1PW+2wlA6jCJw==
    docdb_database mydb
    docdb_collection my-partitioned-collection
    auto_create_database true
    auto_create_collection true
    partitioned_collection true 
    partition_key host
    offer_throughput 10100
    localtime true
    time_format %Y%m%d-%H:%M:%S
    add_time_field true
    time_field_name time
    add_tag_field true
    tag_field_name tag
</match>

Sample inputs and expected records

For the sample input below, the expected output record looks like this:

Sample Input (apache access log)

125.212.152.166 - - [17/Jan/2016:05:03:25 +0000] "GET /foo/bar/test.html HTTP/1.1" 304 179 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"

Output Record

{
    "id": "d2b2ece8-b948-41ae-a894-0ed1266e242a",
    "host": "125.212.152.166",
    "user": "-",
    "method": "GET",
    "path": "/foo/bar/test.html",
    "code": "304",
    "size": "179",
    "referer": "-",
    "agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36",
    "time": "20160117-05:03:25",
    "tag": "documentdb.access"
}
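
To show where the fields of the output record come from, here is a small Ruby sketch that splits the sample log line into named fields. The regular expression is a simplified pattern written for this example (an assumption, not Fluentd's built-in apache parser, which handles more edge cases), and APACHE_COMBINED is a hypothetical name.

```ruby
require 'time'

# Simplified pattern for the Apache combined log format (illustration only).
APACHE_COMBINED = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\S+) "(?<referer>[^"]*)" "(?<agent>[^"]*)"\z/

line = '125.212.152.166 - - [17/Jan/2016:05:03:25 +0000] "GET /foo/bar/test.html HTTP/1.1" 304 179 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"'

m = APACHE_COMBINED.match(line)
fields = m.names.zip(m.captures).to_h   # field name => captured string

# The bracketed timestamp becomes the event time rather than a record field.
event_time = Time.strptime(fields.delete('time'), '%d/%b/%Y:%H:%M:%S %z')

puts fields['host']                                 # 125.212.152.166
puts event_time.utc.strftime('%Y%m%d-%H:%M:%S')     # 20160117-05:03:25
```

The remaining keys in fields (host, user, method, path, code, size, referer, agent) match the attributes of the output record; id, time and tag are the ones added on top.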

Tests

Running test code

$ git clone https://github.com/yokawasa/fluent-plugin-documentdb.git
$ cd fluent-plugin-documentdb

# edit CONFIG params of test/plugin/test_documentdb.rb 
$ vi test/plugin/test_documentdb.rb

# run test 
$ rake test

Creating package, running and testing locally

$ rake build
$ rake install:local
 
# running fluentd with your fluent.conf
$ fluentd -c fluent.conf -vv &
 
# send test apache requests for testing plugin ( only in the case that input source is apache access log )
$ ab -n 5 -c 2 http://localhost/foo/bar/test.html

TODOs

Change log

Links

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/fluent-plugin-documentdb.

Copyright

Copyright: Copyright (c) 2016- Yoichi Kawasaki
License: Apache License, Version 2.0

fluent-plugin-documentdb's People

Contributors

cosmo0920, okkez, yokawasa

fluent-plugin-documentdb's Issues

Fix: CVE-2020-8130 Moderate severity

CVE-2020-8130

moderate severity
Vulnerable versions: <= 12.3.2
Patched version: 12.3.3
There is an OS command injection vulnerability in Ruby Rake before 12.3.3 in Rake::FileList when supplying a filename that begins with the pipe character |.

How to dynamically retrieve match tag

Hello @yokawasa,

Thank you for maintaining this plugin.

Some advice on how to achieve the following use case would be appreciated.

What I am trying to achieve is to capture the part of the tag matched by "*" in the <match> pattern and use it as the value of docdb_collection, as in the following.

<match tenant.*> # <------ Want to access *
  @type documentdb
  @log_level info
  docdb_endpoint https://abc.documents.azure.com:443
  docdb_account_key abc==
  docdb_database tenants
  docdb_collection ${tag} # <------ To use here...
</match>

Above example did not work, and resulted in creating a container called ${tag} in my Cosmos DB instance.

I was able to achieve this by using fluent-plugin-forest, however, the author of the plugin mentions that:

NOTE: This plugin will not be updated: Use Fluentd v0.14 native API to handle tags.

Is there a way to achieve this in fluent-plugin-documentdb natively?
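
To make the requested behaviour concrete, here is a minimal Ruby sketch of the substitution being asked for. It is independent of the plugin: expand_tag_placeholder is a hypothetical helper, not part of fluent-plugin-documentdb or Fluentd, and it only mimics the ${tag} and ${tag[N]} placeholder style.

```ruby
# Hypothetical helper for illustration only.
# Expands ${tag} to the full tag and ${tag[N]} to its N-th dot-separated
# part (0-based). An out-of-range index expands to an empty string.
def expand_tag_placeholder(template, tag)
  parts = tag.split('.')
  template
    .gsub('${tag}') { tag }
    .gsub(/\$\{tag\[(\d+)\]\}/) { parts[Regexp.last_match(1).to_i].to_s }
end

puts expand_tag_placeholder('${tag[1]}', 'tenant.acme')  # acme
puts expand_tag_placeholder('${tag}', 'tenant.acme')     # tenant.acme
```

In the use case above, the collection name for an event tagged tenant.acme would then resolve to acme.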
