Aliyun ODPS Plugin for Fluentd

Getting Started


Introduction

  • ODPS (Open Data Processing Service) is a massive data processing platform designed by Alibaba.
  • DHS (ODPS DataHub Service) is a service in ODPS that provides real-time data upload and download for users.

Requirements

To get started using this plugin, you will need these things:

  1. Ruby 2.1.0 or later
  2. Gem 2.4.5 or later
  3. Fluentd-0.10.49 or later (Home Page)
  4. Protobuf-3.5.1 or later (Ruby protobuf)
  5. Ruby development headers (ruby-devel; ruby-dev on Debian-based systems)

Install the Plugin

Install the plugin from RubyGems:

$ gem install fluent-plugin-aliyun-odps

The ODPS Fluentd plugin is now available. The following is a simple example of an ODPS output configuration.

<source>
   type tail
   path /opt/log/in/in.log
   pos_file /opt/log/in/in.log.pos
   refresh_interval 5s
   tag in.log
   format /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "-" "(?<agent>[^\"]*)"$/
   time_format %Y%b%d %H:%M:%S %z
</source>
<match in.**>
  type aliyun_odps
  aliyun_access_id ************
  aliyun_access_key *********
  aliyun_odps_endpoint http://service.odps.aliyun.com/api
  aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
  buffer_chunk_limit 2m
  buffer_queue_limit 128
  flush_interval 5s
  project your_projectName
  enable_fast_crc true
  data_encoding UTF-8
  <table in.log>
    table your_tableName
    fields remote,method,path,code,size,agent
    partition ctime=${datetime.strftime('%Y%m%d')}
    time_format %d/%b/%Y:%H:%M:%S %z
    shard_number 1
    retry_time 3
    retry_interval 1
    abandon_mode true
  </table>
</match>
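
To see which record keys the format expression in the <source> block above produces, the regular expression can be tried directly in Ruby. This is only an illustrative sketch: the sample log line and the helper name parse_line are made up for the example, and neither Fluentd nor the plugin is involved.

```ruby
# The `format` regular expression from the <source> block above.
FORMAT = /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "-" "(?<agent>[^\"]*)"$/

# Hypothetical access-log line, invented for this example.
SAMPLE = '10.0.0.1 - - [29/Aug/2015:11:10:16 +0800] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.40.0"'

# Return the named captures as a Hash, or nil when the line does not match.
def parse_line(line)
  match = FORMAT.match(line)
  match && match.names.zip(match.captures).to_h
end

record = parse_line(SAMPLE)
puts record.inspect
```

The keys of the resulting hash (remote, datetime, method, path, code, size, agent) are the names that the fields and partition settings in the <table> block refer to.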

Parameters

  • type (Fixed): always aliyun_odps.
  • aliyun_access_id (Required): your Aliyun access id.
  • aliyun_access_key (Required): your Aliyun access key.
  • aliyun_odps_hub_endpoint (Required): if you are using ECS, set it to http://dh-ext.odps.aliyun-inc.com; otherwise use http://dh.odps.aliyun.com.
  • aliyun_odps_endpoint (Required): if you are using ECS, set it to http://odps-ext.aliyun-inc.com/api; otherwise use http://service.odps.aliyun.com/api.
  • buffer_chunk_limit (Optional): chunk size, with the suffix "k" (KB), "m" (MB), or "g" (GB). The default is 8 MB, the recommended value is 2 MB, and the maximum is 20 MB.
  • buffer_queue_limit (Optional): length of the chunk queue. For example, with buffer_chunk_limit 2m and buffer_queue_limit 128, the total buffer size is 2 MB * 128 = 256 MB.
  • flush_interval (Optional): interval at which the data buffer is flushed, default 60s.
  • project (Required): your project name.
  • table (Required): your table name.
  • fields (Required): must match the keys in the source.
  • partition (Optional): set this if your table is partitioned.
    • partition format:
      • fixed string: partition ctime=20150804
      • key word: partition ctime=${remote}
      • key word in time format: partition ctime=${datetime.strftime('%Y%m%d')}
  • time_format (Optional):
    • If you use a key word to set your partition and the key word is in a time format, set the time_format parameter. Example: if source[datetime] = "29/Aug/2015:11:10:16 +0800", then time_format is "%d/%b/%Y:%H:%M:%S %z".
  • shard_number (Optional): data is written to shards in the range [0, shard_number-1]. The value must be greater than 0 and less than the maximum shard number of your table.
  • enable_fast_crc (Optional): use the fast crc.so to calculate the CRC. This improves speed considerably, but it is not supported on some operating systems.
  • retry_time (Optional): number of retries for each pack when an exception occurs, default 3.
  • retry_interval (Optional): interval between retries, default 1s.
  • abandon_mode (Optional): default false. When set to true, a pack is dropped after retry_time failed retries. Otherwise an exception is raised to Fluentd and Fluentd's own retry is used, which may cause duplicated data.
  • data_encoding (Optional): by default the encoding of your source string (string.encoding) is used. If the actual encoding and string.encoding do not match, set this parameter to convert your source string. Supported types: "US-ASCII", "ASCII-8BIT", "UTF-8", "ISO-8859-1", "Shift_JIS", "EUC-JP", "Windows-31J", "BINARY", "CP932", "eucJP"
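
How a time-formatted partition value is derived from time_format and the strftime pattern can be sketched in plain Ruby. This is not the plugin's code; it only mirrors the documented behavior using the standard library, with the example value given for time_format. The helper name partition_value is hypothetical.

```ruby
require 'date'

# Hypothetical helper: parse the raw source field with time_format, then
# render it with the pattern written inside ${datetime.strftime(...)}.
def partition_value(raw, time_format, partition_pattern)
  DateTime.strptime(raw, time_format).strftime(partition_pattern)
end

raw = '29/Aug/2015:11:10:16 +0800'
puts "ctime=#{partition_value(raw, '%d/%b/%Y:%H:%M:%S %z', '%Y%m%d')}"
# prints "ctime=20150829"
```

If time_format does not match the raw value, DateTime.strptime raises an ArgumentError, which is why the two settings need to be kept in step with the log format.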

License


Licensed under the Apache License 2.0.

Contributors

hongbosoftware

aliyun-odps-fluentd-plugin's Issues

uninitialized constant OdpsDatahub::CrcCalculator::Error

2016-06-17 13:28:23 +0800 [error]: unexpected error error_class=RuntimeError error="uninitialized constant OdpsDatahub::CrcCalculator::Error\nDid you mean? KeyError\n IOError\n Errno"
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluent-plugin-aliyun-odps-0.1.6/lib/fluent/plugin/out_aliyun_odps.rb:328:in `rescue in start'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluent-plugin-aliyun-odps-0.1.6/lib/fluent/plugin/out_aliyun_odps.rb:325:in `start'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/compat/call_super_mixin.rb:42:in `start'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:138:in `block in start'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:127:in `block (2 levels) in lifecycle'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:126:in `each'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:126:in `block in lifecycle'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:113:in `each'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:113:in `lifecycle'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/root_agent.rb:137:in `start'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/engine.rb:211:in `start'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/engine.rb:175:in `run'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/supervisor.rb:580:in `run_engine'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/supervisor.rb:382:in `block in run_worker'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/supervisor.rb:509:in `main_process'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/supervisor.rb:378:in `run_worker'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/lib/fluent/command/fluentd.rb:266:in `<top (required)>'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/site_ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/fluentd-0.14.0/bin/fluentd:5:in `<top (required)>'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/bin/fluentd:22:in `load'
2016-06-17 13:28:23 +0800 [error]: /Users/guyifeng/.rbenv/versions/2.3.0/bin/fluentd:22:in `<main>'

fluentd.conf:

  @type tail
  path /Users/guyifeng/workspace/HugoCount/docs/user_visits.csv
  pos_file /Users/guyifeng/workspace/HugoCount/docs/user_visits.csv.pos
  refresh_interval 5s
  tag user_visits.csv
  format csv

  @type aliyun_odps
  aliyun_access_id ******
  aliyun_access_key *******
  aliyun_odps_endpoint http://service.odps.aliyun.com/api
  aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
  buffer_chunk_limit 2m
  buffer_queue_limit 128
  flush_interval 5s
  project devel_statistics
  enable_fast_crc true

    table user_visits
    fields created_at,title,url,ip,request_uri,origin,referer,user_agent
    shard_number 1
    retry_time 3
    retry_interval 1
    abandon_mode true

How should I handle this?

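How to parse a datetime type in the conf

Take an HTTP access log as an example.
How should the fluentd conf be defined so that the date is parsed as a datetime type and stored in the ODPS table?
In my attempts so far, I can only store the date in the table as a string.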

<source>
   type tail
   path /access.log
   pos_file /access.log.pos
   refresh_interval 5s
   tag acc.log
   format /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<useragent>[^\"]*)" -$/
   time_format %Y%b%d %H:%M:%S %z
</source>

<match acc.**>
  type aliyun_odps
  aliyun_access_id xxxxx
  aliyun_access_key xxxxx
  aliyun_odps_endpoint http://service.odps.aliyun.com/api
  aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
  buffer_chunk_limit 2m
  buffer_queue_limit 128
  #flush_interval 5s
  project 123456
  #enable_fast_crc true
  <table acc.log>
    table acc
    fields remote,method,datetime,path,code,size,referer,useragent
    #partition ctime=${datetime.strftime('%Y%m%d')}
    #time_format %d/%b/%Y:%H:%M:%S %z
    #shard_number 5
    retry_time 3
    retry_interval 1
    abandon_mode true
  </table>
</match>

If the datetime column in my ODPS table is set to the datetime type, the datetime value here cannot be stored.

can not gem install

$ gem install fluent-plugin-aliyun-odps
Fetching: uuidtools-2.1.5.gem (100%)
WARNING:  You don't have /home/lyman/.gem/ruby/2.1.0/bin in your PATH,
          gem executables will not run.
Successfully installed uuidtools-2.1.5
Fetching: strptime-0.1.5.gem (100%)
Building native extensions.  This could take a while...
ERROR:  Error installing fluent-plugin-aliyun-odps:
        ERROR: Failed to build gem native extension.

    /usr/bin/ruby2.1 extconf.rb
mkmf.rb can't find header files for ruby at /usr/lib/ruby/include/ruby.h

extconf failed, exit code 1

Gem files will remain installed in /home/lyman/.gem/ruby/2.1.0/gems/strptime-0.1.5 for inspection.
Results logged to /home/lyman/.gem/ruby/2.1.0/extensions/x86_64-linux/2.1.0/strptime-0.1.5/gem_make.out

The cause seems to be a missing ruby-dev package; it would be better to mention this in the install guide.
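
Whether the Ruby development headers are present can be checked before running gem install. The sketch below is a rough diagnostic and not part of the plugin: it looks for ruby.h under the directory that RbConfig reports as rubyhdrdir, which on some distributions may differ from the path mkmf.rb printed in the log above. The helper name ruby_headers_present? is hypothetical.

```ruby
require 'rbconfig'

# Return true when ruby.h exists in the given header directory.
def ruby_headers_present?(hdrdir)
  !hdrdir.nil? && File.exist?(File.join(hdrdir, 'ruby.h'))
end

hdrdir = RbConfig::CONFIG['rubyhdrdir']
if ruby_headers_present?(hdrdir)
  puts "ruby.h found in #{hdrdir}"
else
  puts "ruby.h not found in #{hdrdir.inspect}; install your distribution's " \
       "Ruby development package (for example ruby-dev or ruby-devel)"
end
```

If the check reports that ruby.h is missing, native extensions such as strptime are likely to fail to build in the same way as in the log above.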
