logstash-output-webhdfs's Introduction

Logstash Plugin

This is a plugin for Logstash.

It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.

Documentation

Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will first be converted into asciidoc and then into HTML. All plugin documentation is placed in one central location.

Need Help?

Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.

Developing

1. Plugin Development and Testing

Code

  • To get started, you'll need JRuby with the Bundler gem installed.

  • Create a new plugin or clone an existing one from the GitHub logstash-plugins organization. We also provide example plugins.

  • Install dependencies

bundle install

Test

  • Update your dependencies
bundle install
  • Run tests
bundle exec rspec

2. Running your unpublished Plugin in Logstash

2.1 Run in a local Logstash clone

  • Edit Logstash Gemfile and add the local plugin path, for example:
gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
  • Install plugin
# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify
  • Run Logstash with your plugin
bin/logstash -e 'filter {awesome {}}'

At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.

2.2 Run in an installed Logstash

You can use the same 2.1 method to run your plugin in an installed Logstash by editing its Gemfile and pointing the :path to your local plugin development directory, or you can build the gem and install it using:

  • Build your plugin gem
gem build logstash-filter-awesome.gemspec
  • Install the plugin from the Logstash home
# Logstash 2.3 and higher
bin/logstash-plugin install /your/local/plugin/logstash-filter-awesome.gem

# Prior to Logstash 2.3
bin/plugin install /your/local/plugin/logstash-filter-awesome.gem
  • Start Logstash and proceed to test the plugin

Contributing

All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.

Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.

It is more important to the community that you are able to contribute.

For more information about contributing, see the CONTRIBUTING file.

logstash-output-webhdfs's People

Contributors

andsel, colinsurprenant, dstore-dbap, ebuildy, jakelandis, jordansissel, jsvd, ph, purbon, robbavey, sherzberg, suyograo, talevy, yaauie, ycombinator

logstash-output-webhdfs's Issues

Why does standby_host need to be checked before creating the connection?

# Create and test standby client if configured.
if @standby_host
  @standby_client = prepare_client(@standby_host, @standby_port, @user)
  begin
    test_client(@standby_client)
  rescue => e
    logger.warn("Could not connect to standby namenode #{@standby_client.host}. Error: #{e.message}. Trying main webhdfs namenode.")
  end
end
@client = prepare_client(@host, @port, @user)
begin
  test_client(@client)
rescue => e
  # If no standby host is configured, we need to exit here.
  if not @standby_host
    raise
  else
    # If a standby host is configured, try this before giving up.
    logger.error("Could not connect to #{@client.host}:#{@client.port}. Error: #{e.message}")
    do_failover
  end
end

Add Azure Data Lake Storage (ADLS) compatibility

Good morning,

currently we have the use case to forward data from Logstash to an Azure ADLS endpoint.

This endpoint uses multi-step OAuth2 authentication and allows uploading files through the ADLS REST API as described here. It is a Microsoft-modified, WebHDFS-compatible REST interface for applications.

While there is, thanks to your work, a community-developed WebHDFS output plugin already available, the combination of OAuth2 authentication with short-lived bearer tokens (1h) and the ADLS protocol does not currently seem to work in Logstash. Logstash appears to have problems with multi-step authentication such as OAuth2 in general.

As far as I have understood the ADLS documentation, existing applications or services that use the WebHDFS API can easily integrate with ADLS, but according to the official Logstash WebHDFS documentation I do not see an option to configure the needed OAuth2 authentication.

On GitHub there is a Logstash ADLS output plugin available, but given that it is third-party and the last commit was in 2017, we would rather use an official or community-supported plugin than this one.

Are there any plans to support such use case in the future?

I have already created an internal elastic enhancement request, to tackle this from different fronts.

Best regards,
Dennis Hagel

logstash-output-webhdfs does not support HDFS HA mode

The host parameter of logstash-output-webhdfs can only be set to a single IP address, which is very inconvenient and unreasonable in HA mode.
If HDFS runs in HA mode and the configured namenode switches to standby, the import fails. Please make the host parameter support multiple IP addresses.
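For what it's worth, the plugin does expose standby_host and standby_port options (they appear in the failover code and in other configurations on this page), which cover the two-namenode case even though host itself still takes a single address. A minimal sketch with hypothetical hostnames:

output {
  webhdfs {
    host => "namenode1.example.com"           # active namenode (hypothetical hostname)
    port => 50070
    standby_host => "namenode2.example.com"   # second namenode, tried on failover (hypothetical)
    standby_port => 50070
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "hadoop"
  }
}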

Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError"

The full error log is
Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in request'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:349:in request'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in operate_requests'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in create'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:211:in write_data'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:195:in flush'", "org/jruby/RubyHash.java:1342:in each'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:183:in flush'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in buffer_flush'", "org/jruby/RubyHash.java:1342:in each'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in buffer_flush'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:159:in buffer_receive'", "/home/logstash/logstash-5.0.1/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:166:in receive'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/outputs/base.rb:92:in multi_receive'", "org/jruby/RubyArray.java:1613:in each'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/outputs/base.rb:92:in multi_receive'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:19:in multi_receive'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/output_delegator.rb:42:in multi_receive'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/pipeline.rb:297:in output_batch'", "org/jruby/RubyHash.java:1342:in each'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/pipeline.rb:296:in output_batch'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/pipeline.rb:252:in worker_loop'", "/home/logstash/logstash-5.0.1/logstash-core/lib/logstash/pipeline.rb:225:in start_workers'"]}`

  • logstash Version: 5.0.1
  • Operating System: Red hat 6.5
  • Config File
    output {
      #stdout { codec => rubydebug }
      webhdfs {
        host => "..."
        port => "50070"
        path => "/user/logstash/dt=%{+YYYY-MM-dd}/logs-%{+HH}.log"
        user => "www"
      }
    }
    www is the Hadoop management user

Unnecessary failovers for HDFS namenode

Hello,
I've noticed that in my cluster the webhdfs output plugin from time to time performs a failover between my HDFS namenodes, even though the namenodes themselves do not fail over. After a bit of investigation I've found that the actual exception causing the failover in the plugin is "Failed to connect to host hd4.local:50075, Net::ReadTimeout.", where hd4 is one of my datanodes, i.e. the plugin fails over even on a datanode connection error. This happens because the plugin just searches the error string for the pattern "Failed to connect". Maybe some more specific matching should be performed, e.g. also checking for the namenode host and port? Unnecessary failovers cause a lot of problems for me, as they sometimes result in HDFS lease problems.

Observed in logstash 6.6.0, HDFS 2.7.3; logstash and hadoop machines are running on CentOS 7.
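A minimal sketch of the stricter matching suggested above, checking that the host named in the error message is the configured namenode rather than an arbitrary datanode; this is an illustration against the error format quoted in the report, not the plugin's actual code:

# Hypothetical refinement of the failover trigger: only fail over when the
# connection error names the configured namenode (host:port), not a datanode
# that happened to time out.
def namenode_connection_error?(error_message, namenode_host, namenode_port)
  error_message.include?("Failed to connect to host #{namenode_host}:#{namenode_port}")
end

# Usage sketch: do_failover if namenode_connection_error?(e.message, @host, @port)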

WebHDFS::ServerError: Failed to connect to host, end of file reached

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version)
    logstash 8.10.2
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker)
    rpm
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes)
    systemd
  4. How was the Logstash Plugin installed
    shipped by default.

JVM (e.g. java -version):
bundled jdk:
openjdk version "17.0.8" 2023-07-18
OpenJDK Runtime Environment Temurin-17.0.8+7 (build 17.0.8+7)
OpenJDK 64-Bit Server VM Temurin-17.0.8+7 (build 17.0.8+7, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):
RHEL 8.4

Description of the problem including expected versus actual behavior:
Expected behavior:
The plugin should ingest events in hdfs using webhdfs or httpfs

Actual behavior:
The plugin crashes logstash with an "end of file reached" error using webhdfs or httpfs.

Steps to reproduce:

  1. Download and install logstash rpm.
  2. Change the shell of logstash to /bin/bash.
  3. Do "sudo su -l" and create a ticket with a valid keytab as the plugin in unable to do kinit by itself.
  4. Create a folder /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/gssapi-1.3.1/
  5. Download gssapi gem, and extract the data.tar contents at /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/gssapi-1.3.1/
  6. Copy the contents of /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/gssapi-1.3.1/lib/ to /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-output-webhdfs-3.0.6/lib
  7. Configure a pipeline with webhdfs output with the folling params:
    host => "host active namenode"
    port => "webhdfs port"
    path => "/tmp/test-%{[@metadata][thread_id]}.json"
    single_file_per_thread => true
    retry_times => 50
    use_kerberos_auth => true
    user => "username / principal (I tried both)"
  8. Start logstash.
  9. Tail the log until it crashes (few seconds).
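For reference, another report further down this page performs the same gssapi workaround with the bundled JRuby gem command instead of a manual download. A hedged shell sketch of steps 4-6, using the paths and versions assumed above (adjust to your installation; this is not an official procedure):

# Paths/versions are the ones assumed in the steps above.
export GEM_HOME=/usr/share/logstash/vendor/bundle/jruby/3.1.0
/usr/share/logstash/vendor/jruby/bin/gem install gssapi -v 1.3.1
cp -r /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/gssapi-1.3.1/lib/* \
      /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-output-webhdfs-3.0.6/lib/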

Provide logs (if relevant):

[2023-10-25T14:30:02,512][ERROR][logstash.javapipeline ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<WebHDFS::ServerError: Failed to connect to host node03.lab.local:9871, end of file reached>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/webhdfs-0.10.2/lib/webhdfs/client_v1.rb:401:in request'", "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/webhdfs-0.10.2/lib/webhdfs/client_v1.rb:325:in operate_requests'", "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/webhdfs-0.10.2/lib/webhdfs/client_v1.rb:168:in list'", "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs_helper.rb:49:in test_client'", "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:155:in register'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:69:in register'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:237:in block in register_plugins'", "org/jruby/RubyArray.java:1987:in each'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:236:in register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:610:in maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:249:in start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:194:in run'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:146:in `block in start'"], "pipeline.sources"=>["/etc/logstash/conf.d/bigdata.conf"], :thread=>"#<Thread:0x2d8b0d22 /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>

[logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"LogStash::Error"

I encountered this error when I tried to use the "logstash-output-webhdfs" plugin to output to WebHDFS; here is the configuration with which I hit the issue:

output {
  webhdfs{
    host => "*****"
    port => 50070
    path => "/user/username/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "username"
  }
}

When I use a file as the output target, Logstash works correctly and writes to the file:

output {
	file{
			path => "/log/output.log"
		  }
}

But when the target is changed to webhdfs, it doesn't work. I still haven't found out why; has anyone run into the same issue?

[2017-03-10T22:01:26,210][WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"LogStash::Error", :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:202:in `sprintf'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:178:in `flush'", "org/jruby/RubyArray.java:2409:in `collect'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:173:in `flush'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in `buffer_flush'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:159:in `buffer_receive'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:166:in `receive'", "/elk/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "org/jruby/RubyArray.java:1613:in `each'", "/elk/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/elk/logstash/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:19:in `multi_receive'", "/elk/logstash/logstash-core/lib/logstash/output_delegator.rb:42:in `multi_receive'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:331:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:330:in `output_batch'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `worker_loop'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:258:in `start_workers'"]}
[2017-03-10T22:01:27,212][WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"LogStash::Error", :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:202:in `sprintf'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:178:in `flush'", "org/jruby/RubyArray.java:2409:in `collect'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:173:in `flush'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in `buffer_flush'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:159:in `buffer_receive'", "/elk/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:166:in `receive'", "/elk/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "org/jruby/RubyArray.java:1613:in `each'", "/elk/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/elk/logstash/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:19:in `multi_receive'", "/elk/logstash/logstash-core/lib/logstash/output_delegator.rb:42:in `multi_receive'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:331:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:330:in `output_batch'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `worker_loop'", "/elk/logstash/logstash-core/lib/logstash/pipeline.rb:258:in `start_workers'"]}

Documentation error

The Usage section states you supply the following configuration.
"webhdfs {
server => "127.0.0.1:50070" # (required)
path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log" # (required)
user => "hue" # (required)
}"
But the synopsis states that you supply the following configuration.
"webhdfs {
host => ...
path => ...
user => ...
}"
I believe "host" is correct and "server" is incorrect and should be changed to read "host".

write hdfs failed after the first time successfully written

In elastic/logstash#9712, user @hero1122 reports an issue with the WebHDFS output plugin, indicating a problem with HDFS append support:

[2018-06-07T10:26:37,363][ERROR][logstash.outputs.webhdfs ] Max write retries reached. Events will be discarded. Exception: {"RemoteException":{"message":"Failed to APPEND_FILE \/logstash\/dt=2018-06-07\/logstash-02.log for DFSClient_NONMAPREDUCE_-692957599_23 on 192.168.0.3 because this file lease is currently owned by DFSClient_NONMAPREDUCE_382768740_23 on 192.168.0.3","exception":"RemoteException","javaClassName":"org.apache.hadoop.ipc.RemoteException"}}
[2018-06-07T10:26:40,009][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x415fbb96 sleep>"}

For all general issues, please provide the following details for fast resolution:

  • Version: 6.2.4
  • Operating System: centos 6.5
  • Config File (if you have sensitive info, please remove it):
    port => "14000"
    use_httpfs => "true"
    path => "/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}-%{+mm}-%{+ss}.log"
    user => "hadoop"
  • Sample Data:
  • Steps to Reproduce:
    The first write is successful, but the second write fails when appending to HDFS!

And since HDFS does not support appending to the file, we must call the file close method after writing data into the HDFS file!

@hero1122 can you please provide additional context to help us reproduce?

Unknown setting 'message_format' for webhdfs

  • Version: logstash-2.3.4
    • Operating System: linux
    • Config File:
      output {
        webhdfs {
          host => "sea2"
          codec => "json"
          standby_host => "sea3"
          path => "/cdnlog/%{+YYYYMMdd}/%{+HH}/%{+mm}/cnc_192.log"
          message_format => "%{message}"
          user => "hadoop"
        }
      }
  • Steps to Reproduce:
    input data :1473738019.430 "116.228.11.210" 375 "GET"
    output hdfs file : 2016-09-13T03:39:58.642Z %{host} 1473738019.430 "116.228.11.210" 375 "GET"

The output has a timestamp and %{host} prepended, but I don't need them.
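As far as I know, the message_format option was removed from Logstash outputs in the 2.x series; the usual replacement is a line codec with a format string, which would also drop the leading timestamp and %{host}. A minimal sketch (this swaps the json codec above for a line codec; it is a suggestion, not a documented option of this plugin itself):

output {
  webhdfs {
    host => "sea2"
    standby_host => "sea3"
    path => "/cdnlog/%{+YYYYMMdd}/%{+HH}/%{+mm}/cnc_192.log"
    user => "hadoop"
    codec => line { format => "%{message}" }   # writes only the raw message field
  }
}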

conf error

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.

For all general issues, please provide the following details for fast resolution:

  • Version:
  • Operating System:
  • Config File (if you have sensitive info, please remove it):
    input {
      ...
    }
    filter {
      ...
    }
    output {
      webhdfs {
        server => "127.0.0.1:50070" # (required)
        path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log" # (required)
        user => "hue" # (required)
      }
    }

This config produces an error (compare the "Documentation error" issue above, which notes that host should be used instead of server).

  • Sample Data:
  • Steps to Reproduce:

Ticket Expired : Logstash + WebHdfs + Kerberos

Hi,

We are trying to use this plugin, but we are facing a problem when Kerberos is enabled (when Kerberos is disabled, it works fine).

[2017-10-19T13:11:27,226][INFO ][logstash.outputs.elasticsearch] Installing elasticsearch template to _template/netent.template.json
[2017-10-19T13:11:27,249][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["https://std-elasticsearch-poc.nix.cydmodule.com:9200"]}
[2017-10-19T13:11:28,097][ERROR][logstash.outputs.webhdfs ] Webhdfs check request failed. (namenode: 10.72.19.17:50070, Exception: gss_init_sec_context did not return GSS_S_COMPLETE: Unspecified GSS failure. Minor code may provide more information
Ticket expired
)
[2017-10-19T13:11:28,109][ERROR][logstash.pipeline ] Error registering plugin {:plugin=>"#<LogStash::OutputDelegator:0x3dc3e8a @namespaced_metric=#<LogStash::Instrument::NamespacedMetric:0x74150eec @Metric=#<LogStash::Instrument::Metric:0x5bd75dae @collector=#<LogStash::Instrument::Collector:0x494171d5 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x2b0b2e19 @store=#<Concurrent::Map:0x000000000688bc entries=3 default_proc=nil>, @structured_lookup_mutex=#Mutex:0x77b537fa, @fast_lookup=#<Concurrent::Map:0x000000000688c0 entries=760 default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :outputs, :"37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177"]>, @Metric=#<LogStash::Instrument::NamespacedMetric:0x54632a4e @Metric=#<LogStash::Instrument::Metric:0x5bd75dae @collector=#<LogStash::Instrument::Collector:0x494171d5 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x2b0b2e19 @store=#<Concurrent::Map:0x000000000688bc entries=3 default_proc=nil>, @structured_lookup_mutex=#Mutex:0x77b537fa, @fast_lookup=#<Concurrent::Map:0x000000000688c0 entries=760 default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :outputs]>, @logger=#<LogStash::Logging::Logger:0x783e44fd @logger=#Java::OrgApacheLoggingLog4jCore::Logger:0x27c13500>, @out_counter=LogStash::Instrument::MetricType::Counter - namespaces: [:stats, :pipelines, :main, :plugins, :outputs, :"37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177", :events] key: out value: 0, @in_counter=LogStash::Instrument::MetricType::Counter - namespaces: [:stats, :pipelines, :main, :plugins, :outputs, :"37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177", :events] key: in value: 0, @strategy=#<LogStash::OutputDelegatorStrategies::Legacy:0x214f33f0 @worker_count=1, @Workers=[<LogStash::Outputs::WebHdfs codec=><LogStash::Codecs::JSON id=>"json_9fd7e005-e137-4589-b17d-2fcfc9d3ba14", enable_metric=>true, charset=>"UTF-8">, host=>"10.72.19.17", port=>50070, path=>"/user/logstash/%{type}-%{site}-%{+YYYY.MM.dd}-%{[@metadata][thread_id]}.log", user=>"hdfs", flush_size=>500, compression=>"snappy", idle_flush_time=>10, retry_interval=>10, workers=>1, retry_times=>10, single_file_per_thread=>true, use_kerberos_auth=>true, kerberos_keytab=>"/etc/logstash/conf.d/hdfs/spnego.service.keytab", id=>"37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177", enable_metric=>true, standby_host=>false, standby_port=>50070, open_timeout=>30, read_timeout=>30, use_httpfs=>false, retry_known_errors=>true, snappy_bufsize=>32768, snappy_format=>"stream", use_ssl_auth=>false>], @worker_queue=#SizedQueue:0x2ca172cf>, @id="37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177", @time_metric=LogStash::Instrument::MetricType::Counter - namespaces: [:stats, :pipelines, :main, :plugins, :outputs, :"37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177", :events] key: duration_in_millis value: 0, @metric_events=#<LogStash::Instrument::NamespacedMetric:0x5c93193 @Metric=#<LogStash::Instrument::Metric:0x5bd75dae @collector=#<LogStash::Instrument::Collector:0x494171d5 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x2b0b2e19 @store=#<Concurrent::Map:0x000000000688bc entries=3 default_proc=nil>, @structured_lookup_mutex=#Mutex:0x77b537fa, @fast_lookup=#<Concurrent::Map:0x000000000688c0 entries=760 default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :outputs, :"37f78fe6277c84bb2c35f6ec985bd6afbdfbb97c-177", :events]>, @output_class=LogStash::Outputs::WebHdfs>", :error=>"gss_init_sec_context did not return GSS_S_COMPLETE: Unspecified GSS 
failure. Minor code may provide more information\nTicket expired\n"}
[2017-10-19T13:11:30,184][ERROR][logstash.agent ] Pipeline aborted due to error {:exception=>#<WebHDFS::KerberosError: gss_init_sec_context did not return GSS_S_COMPLETE: Unspecified GSS failure. Minor code may provide more information
Ticket expired

output {
  webhdfs {
    codec => json
    host => "xxxxxxx"
    port => 50070
    path => "/user/logstash/%{type}-%{site}-%{+YYYY.MM.dd}-%{[@metadata][thread_id]}.log"
    user => "hdfs"
    flush_size => 500
    compression => "snappy"
    idle_flush_time => 10
    retry_interval => 10
    workers => 1
    retry_times => 10
    single_file_per_thread => true
    use_kerberos_auth => true
    kerberos_keytab => "/etc/logstash/conf.d/hdfs/hdfs.service.keytab"
  }
}

I have some questions regarding the ticket and the keytab. Do we need a Logstash account as a Kerberos principal? What is the recommended approach for setting this up with Kerberos?

OS: Red Hat Enterprise Linux Server release 7.4
Elastic Version: 5.6.1
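Other reports on this page note that the plugin cannot run kinit by itself and needs an existing, valid ticket cache; the kerberos_keytab setting alone does not appear to be enough once the ticket expires. A hedged sketch of keeping the ticket fresh from the same keytab (the principal name is hypothetical; run it from cron or a systemd timer under the Logstash user):

# Obtain/renew a ticket from the keytab referenced in the config above.
# The principal is a placeholder; use the one the keytab was created for.
kinit -kt /etc/logstash/conf.d/hdfs/hdfs.service.keytab hdfs/[email protected]
klist   # verify the ticket and its expiry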

Offline Installs - missing in plugin pack

logstash 2.1.1-1
centos 6
logstash-output-webhdfs 2.0.2

When doing a plugin pack, the pack does not contain the logstash-output-webhdfs plugin. It has the webhdfs dep, but not the actual plugin. I'm not completely sure how this ties together. I've also tried installing just the webhdfs dependency that seems to be packaged with the pack generated on the other host. "./plugin list" does not list it as being installed, nor does it list any other deps.

I believe this is an upstream issue with the plugin utility, but opening an issue here for tracking since offline installs are broken.

List of plugins from host that was used to create the pack

The plugins on this host were installed via the typical ./plugin install

# ./plugin list
logstash-codec-collectd
logstash-codec-dots
logstash-codec-edn
logstash-codec-edn_lines
logstash-codec-es_bulk
logstash-codec-fluent
logstash-codec-graphite
logstash-codec-json
logstash-codec-json_lines
logstash-codec-line
logstash-codec-msgpack
logstash-codec-multiline
logstash-codec-netflow
logstash-codec-oldlogstashjson
logstash-codec-plain
logstash-codec-rubydebug
logstash-filter-anonymize
logstash-filter-checksum
logstash-filter-clone
logstash-filter-csv
logstash-filter-date
logstash-filter-dns
logstash-filter-drop
logstash-filter-fingerprint
logstash-filter-geoip
logstash-filter-grok
logstash-filter-json
logstash-filter-kv
logstash-filter-metrics
logstash-filter-multiline
logstash-filter-mutate
logstash-filter-ruby
logstash-filter-sleep
logstash-filter-split
logstash-filter-syslog_pri
logstash-filter-throttle
logstash-filter-urldecode
logstash-filter-useragent
logstash-filter-uuid
logstash-filter-xml
logstash-input-beats
logstash-input-couchdb_changes
logstash-input-elasticsearch
logstash-input-eventlog
logstash-input-exec
logstash-input-file
logstash-input-ganglia
logstash-input-gelf
logstash-input-generator
logstash-input-graphite
logstash-input-heartbeat
logstash-input-http
logstash-input-imap
logstash-input-irc
logstash-input-jdbc
logstash-input-kafka
logstash-input-log4j
logstash-input-lumberjack
logstash-input-pipe
logstash-input-rabbitmq
logstash-input-redis
logstash-input-s3
logstash-input-snmptrap
logstash-input-sqs
logstash-input-stdin
logstash-input-syslog
logstash-input-tcp
logstash-input-twitter
logstash-input-udp
logstash-input-unix
logstash-input-xmpp
logstash-input-zeromq
logstash-output-cloudwatch
logstash-output-csv
logstash-output-elasticsearch
logstash-output-email
logstash-output-exec
logstash-output-file
logstash-output-ganglia
logstash-output-gelf
logstash-output-graphite
logstash-output-hipchat
logstash-output-http
logstash-output-irc
logstash-output-juggernaut
logstash-output-kafka
logstash-output-lumberjack
logstash-output-mongodb
logstash-output-nagios
logstash-output-nagios_nsca
logstash-output-null
logstash-output-opentsdb
logstash-output-pagerduty
logstash-output-pipe
logstash-output-rabbitmq
logstash-output-redis
logstash-output-s3
logstash-output-sns
logstash-output-sqs
logstash-output-statsd
logstash-output-stdout
logstash-output-tcp
logstash-output-udp
logstash-output-webhdfs
logstash-output-xmpp
logstash-output-zeromq
logstash-patterns-core
# ./plugin pack
Packaging plugins for offline usage
Generated at /opt/logstash/plugins_package.tar.gz

./plugin still tries to hit the interwebs even with --local and --no-verify

bin # ./plugin install --no-verify --local webhdfs
Installing webhdfs
Error Bundler::HTTPError, retrying 1/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 2/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 3/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 4/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 5/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 6/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 7/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 8/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 9/10
Could not fetch specs from https://rubygems.org/
Error Bundler::HTTPError, retrying 10/10
Could not fetch specs from https://rubygems.org/
Too many retries, aborting, caused by Bundler::HTTPError
ERROR: Installation Aborted, message: Could not fetch specs from https://rubygems.org/

List of gems from pack file in /opt/logstash/vendor/cache/

cache # ls -1
addressable-2.3.8.gem
arr-pm-0.0.10.gem
atomic-1.1.99-java.gem
avl_tree-1.2.1.gem
awesome_print-1.6.1.gem
aws-sdk-2.1.36.gem
aws-sdk-core-2.1.36.gem
aws-sdk-resources-2.1.36.gem
aws-sdk-v1-1.66.0.gem
backports-3.6.7.gem
bindata-2.1.0.gem
bson-3.2.6-java.gem
buftok-0.2.0.gem
cabin-0.7.2.gem
childprocess-0.5.8.gem
cinch-2.3.1.gem
clamp-0.6.5.gem
coderay-1.1.0.gem
concurrent-ruby-0.9.2-java.gem
domain_name-0.5.25.gem
edn-1.1.0.gem
elasticsearch-1.0.15.gem
elasticsearch-api-1.0.15.gem
elasticsearch-transport-1.0.15.gem
equalizer-0.0.10.gem
faraday-0.9.2.gem
ffi-1.9.10-java.gem
ffi-rzmq-2.0.4.gem
ffi-rzmq-core-1.0.4.gem
file-dependencies-0.1.6.gem
filesize-0.0.4.gem
filewatch-0.6.7.gem
fpm-1.3.3.gem
gelf-1.3.2.gem
gelfd-0.2.0.gem
gems-0.8.3.gem
geoip-1.6.1.gem
gmetric-0.1.3.gem
hipchat-1.5.2.gem
hitimes-1.2.3-java.gem
http-0.9.8.gem
httparty-0.13.7.gem
http-cookie-1.0.2.gem
http-form_data-1.0.1.gem
http_parser.rb-0.6.0-java.gem
i18n-0.6.9.gem
jar-dependencies-0.3.1.gem
jls-grok-0.11.2.gem
jls-lumberjack-0.0.26.gem
jmespath-1.1.3.gem
jrjackson-0.3.7.gem
jruby-kafka-1.4.0-java.gem
jruby-openssl-0.9.12-java.gem
json-1.8.3-java.gem
logstash-codec-collectd-2.0.2.gem
logstash-codec-dots-2.0.2.gem
logstash-codec-edn-2.0.2.gem
logstash-codec-edn_lines-2.0.2.gem
logstash-codec-es_bulk-2.0.2.gem
logstash-codec-fluent-2.0.2-java.gem
logstash-codec-graphite-2.0.2.gem
logstash-codec-json-2.0.4.gem
logstash-codec-json_lines-2.0.2.gem
logstash-codec-line-2.0.2.gem
logstash-codec-msgpack-2.0.2-java.gem
logstash-codec-multiline-2.0.4.gem
logstash-codec-netflow-2.0.2.gem
logstash-codec-oldlogstashjson-2.0.2.gem
logstash-codec-plain-2.0.2.gem
logstash-codec-rubydebug-2.0.4.gem
logstash-core-2.1.1-java.gem
logstash-filter-anonymize-2.0.2.gem
logstash-filter-checksum-2.0.2.gem
logstash-filter-clone-2.0.4.gem
logstash-filter-csv-2.1.0.gem
logstash-filter-date-2.0.2.gem
logstash-filter-dns-2.0.2.gem
logstash-filter-drop-2.0.2.gem
logstash-filter-fingerprint-2.0.2.gem
logstash-filter-geoip-2.0.4.gem
logstash-filter-grok-2.0.2.gem
logstash-filter-json-2.0.2.gem
logstash-filter-kv-2.0.2.gem
logstash-filter-metrics-3.0.0.gem
logstash-filter-multiline-2.0.3.gem
logstash-filter-mutate-2.0.2.gem
logstash-filter-ruby-2.0.2.gem
logstash-filter-sleep-2.0.2.gem
logstash-filter-split-2.0.2.gem
logstash-filter-syslog_pri-2.0.2.gem
logstash-filter-throttle-2.0.2.gem
logstash-filter-urldecode-2.0.2.gem
logstash-filter-useragent-2.0.3.gem
logstash-filter-uuid-2.0.3.gem
logstash-filter-xml-2.0.2.gem
logstash-input-beats-2.0.3.gem
logstash-input-couchdb_changes-2.0.2.gem
logstash-input-elasticsearch-2.0.2.gem
logstash-input-eventlog-3.0.1.gem
logstash-input-exec-2.0.4.gem
logstash-input-file-2.0.3.gem
logstash-input-ganglia-2.0.4.gem
logstash-input-gelf-2.0.2.gem
logstash-input-generator-2.0.2.gem
logstash-input-graphite-2.0.4.gem
logstash-input-heartbeat-2.0.2.gem
logstash-input-http-2.0.2.gem
logstash-input-imap-2.0.2.gem
logstash-input-irc-2.0.3.gem
logstash-input-jdbc-2.0.5.gem
logstash-input-kafka-2.0.2.gem
logstash-input-log4j-2.0.4-java.gem
logstash-input-lumberjack-2.0.5.gem
logstash-input-pipe-2.0.2.gem
logstash-input-rabbitmq-3.1.1.gem
logstash-input-redis-2.0.2.gem
logstash-input-s3-2.0.3.gem
logstash-input-snmptrap-2.0.2.gem
logstash-input-sqs-2.0.3.gem
logstash-input-stdin-2.0.2.gem
logstash-input-syslog-2.0.2.gem
logstash-input-tcp-3.0.0.gem
logstash-input-twitter-2.2.0.gem
logstash-input-udp-2.0.3.gem
logstash-input-unix-2.0.4.gem
logstash-input-xmpp-2.0.3.gem
logstash-input-zeromq-2.0.2.gem
logstash-mixin-aws-2.0.2.gem
logstash-mixin-http_client-2.0.3.gem
logstash-mixin-rabbitmq_connection-2.2.0-java.gem
logstash-output-cloudwatch-2.0.2.gem
logstash-output-csv-2.0.2.gem
logstash-output-elasticsearch-2.2.0-java.gem
logstash-output-email-3.0.2.gem
logstash-output-exec-2.0.2.gem
logstash-output-file-2.2.0.gem
logstash-output-ganglia-2.0.2.gem
logstash-output-gelf-2.0.2.gem
logstash-output-graphite-2.0.2.gem
logstash-output-hipchat-3.0.2.gem
logstash-output-http-2.0.5.gem
logstash-output-irc-2.0.2.gem
logstash-output-juggernaut-2.0.2.gem
logstash-output-kafka-2.0.1.gem
logstash-output-lumberjack-2.0.4.gem
logstash-output-nagios-2.0.2.gem
logstash-output-nagios_nsca-2.0.3.gem
logstash-output-null-2.0.2.gem
logstash-output-opentsdb-2.0.2.gem
logstash-output-pagerduty-2.0.2.gem
logstash-output-pipe-2.0.2.gem
logstash-output-rabbitmq-3.0.6-java.gem
logstash-output-redis-2.0.2.gem
logstash-output-sns-3.0.2.gem
logstash-output-sqs-2.0.2.gem
logstash-output-statsd-2.0.4.gem
logstash-output-stdout-2.0.3.gem
logstash-output-tcp-2.0.2.gem
logstash-output-udp-2.0.2.gem
logstash-output-xmpp-2.0.2.gem
logstash-output-zeromq-2.0.2.gem
logstash-patterns-core-2.0.2.gem
lru_redux-1.1.0.gem
mail-2.6.3.gem
manticore-0.4.4-java.gem
march_hare-2.11.0-java.gem
memoizable-0.4.2.gem
method_source-0.8.2.gem
metriks-0.9.9.7.gem
mimemagic-0.3.0.gem
mime-types-2.99.gem
minitar-0.5.4.gem
mongo-2.0.6.gem
msgpack-jruby-1.4.1-java.gem
multi_json-1.11.2.gem
multipart-post-2.0.0.gem
multi_xml-0.5.5.gem
murmurhash3-0.1.6-java.gem
naught-1.1.0.gem
nokogiri-1.6.7-java.gem
octokit-3.8.0.gem
polyglot-0.3.5.gem
pry-0.10.3-java.gem
puma-2.11.3-java.gem
rack-1.6.4.gem
redis-3.2.2.gem
ruby-maven-3.3.8.gem
ruby-maven-libs-3.3.3.gem
rubyzip-1.1.7.gem
rufus-scheduler-3.0.9.gem
sawyer-0.6.0.gem
sequel-4.29.0.gem
simple_oauth-0.3.1.gem
slop-3.6.0.gem
snappy-0.0.12-java.gem
snappy-jars-1.1.0.1.2-java.gem
snmp-1.2.0.gem
spoon-0.0.4.gem
statsd-ruby-1.2.0.gem
stud-0.0.22.gem
thread_safe-0.3.5-java.gem
treetop-1.4.15.gem
twitter-5.15.0.gem
tzinfo-1.2.2.gem
unf-0.1.4-java.gem
user_agent_parser-2.3.0.gem
webhdfs-0.7.4.gem
win32-eventlog-0.6.5.gem
xml-simple-1.1.5.gem
xmpp4r-0.5.gem

Write errors when running vs HDP 2.3 (==Hadoop 2.7)

I'm running a very boring configuration in Logstash 2.1.2 that reads from a local CSV file and writes to webhdfs (connecting to an 8 node HDFS cluster created by a vanilla HDP install)

input { file { ... } }
filter { csv { ... } }
output {
  if [sourceKey] == "<<hdfs_output_key>>" {
        webhdfs {
            path => '/app/aleph2/data/aleph2_testing/56bc4f59e4b026e56f0a3cc5/ttm/test/bucket1/managed_bucket/import/ready/ls_input_%{+yyyy.MM.dd.hh}.json'
            host => '<<webhdfs.server.name>>'
            port => '50075'
            user => 'tomcat'
            codec => 'json'
            flush_size => 33554432
            idle_flush_time => 10
        }
   }
}

On some ~10MB file with ~200K lines

I get a non-stop stream of the following errors

{:timestamp=>"2016-02-11T19:51:31.812000-0500", :message=>"webhdfs write caused an exception: Failed to connect to host <<webhdfs.server.name>>:50075, Connection reset by peer - Connection reset by peer. Maybe you should increase retry_interval or reduce number of workers. Retrying...", :level=>:warn}
...
{:timestamp=>"2016-02-11T19:51:33.808000-0500", :message=>"Max write retries reached. Events will be discarded. Exception: Failed to connect to host <<webhdfs.server.name>>:50075, Connection reset by peer - Connection reset by peer", :level=>:error}

A random number of the events are discarded each run, ~25-50%

Looking at the server logs, it's filled with things like the following:

2016-02-11 08:54:24,863 WARN  hdfs.StateChange (FSNamesystem.java:appendFileInternal(2690)) - DIR* NameSystem.append: Failed to APPEND_FILE /app/aleph2/data/aleph2_testing/56bc4f59e4b026e56f0a3cc5/ttm/test/bucket1/managed_bucket/import/ready/ls_input_2016.02.11.01.json for DFSClient_NONMAPREDUCE_154673148_72 on 10.12.16.101 because this file lease is currently owned by DFSClient_NONMAPREDUCE_1522645774_64 on 10.12.16.101

Which appears to be caused by multiple threads accessing the same file, according to Google. And there's a related but cryptic comment in the source code here (sic):

  def write_data(path, data)
    # Retry max_retry times. This can solve problems like leases being hold by another process. Sadly this is no
    # KNOWN_ERROR in rubys webhdfs client.

So:

  • Is this actually not an issue, i.e. am I just doing something stupid?
  • If it is a legit issue, is there a workaround? The error says to reduce the number of workers, but that presumably means the global number of worker threads, which would cripple the performance of the other Logstash configs.

Thanks in advance of any insight/help!
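One hedged workaround, based on the single_file_per_thread option that appears in other configurations on this page: give each worker thread its own file so that no two threads append to the same HDFS path and fight over the lease. A sketch only, not a confirmed fix for this report:

webhdfs {
    host => '<<webhdfs.server.name>>'
    port => '50075'
    user => 'tomcat'
    codec => 'json'
    single_file_per_thread => true
    # Same path as above, with the worker thread id appended so each thread owns its own file/lease.
    path => '/app/aleph2/data/aleph2_testing/56bc4f59e4b026e56f0a3cc5/ttm/test/bucket1/managed_bucket/import/ready/ls_input_%{+yyyy.MM.dd.hh}_%{[@metadata][thread_id]}.json'
}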

Error using WebHDFS output

I constantly receive the following error, with no indication of how to resolve the issue, and as a result no events are actually pushed to my HDFS. Can someone please provide some guidance?
Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/webhdfs-0.7.4/lib/webhdfs/client_v1.rb:351:in request'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/webhdfs-0.7.4/lib/webhdfs/client_v1.rb:349:in request'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/webhdfs-0.7.4/lib/webhdfs/client_v1.rb:270:in operate_requests'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/webhdfs-0.7.4/lib/webhdfs/client_v1.rb:73:in create'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-2.0.2/lib/logstash/outputs/webhdfs.rb:184:in write_data'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-2.0.2/lib/logstash/outputs/webhdfs.rb:179:in write_data'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-2.0.2/lib/logstash/outputs/webhdfs.rb:169:in flush'", "org/jruby/RubyHash.java:1342:in each'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-2.0.2/lib/logstash/outputs/webhdfs.rb:157:in flush'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in buffer_flush'", "org/jruby/RubyHash.java:1342:in each'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in buffer_flush'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:159:in buffer_receive'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-2.0.2/lib/logstash/outputs/webhdfs.rb:144:in receive'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.0-java/lib/logstash/outputs/base.rb:81:in handle'", "E:/elk2/logstash-2.1.0/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.0-java/lib/logstash/outputs/base.rb:71:in worker_setup'"], :level=>:warn}

Compatibility with logstash 2.1?

It seems the latest version available at https://rubygems.org/gems/logstash-output-hdfs requires

logstash-core < 2.0.0, >= 1.4.0

Hence, trying install this plugin with logstash 2.1 gives:

Bundler could not find compatible versions for gem "logstash":
  In Gemfile:
    logstash-output-hdfs (>= 0) java depends on
      logstash (< 2.0.0, >= 1.4.0) java
Could not find gem 'logstash (< 2.0.0, >= 1.4.0) java', which is required by gem 'logstash-output-hdfs (>= 0) java', in any of the sources.

Manual installation works fine:

cd /tmp && \
    git clone https://github.com/logstash-plugins/logstash-output-webhdfs.git && \
    cd logstash-output-webhdfs && \
    gem build logstash-output-webhdfs.gemspec && \
    /opt/logstash/bin/plugin install /tmp/logstash-output-webhdfs/logstash-output-webhdfs-2.0.2.gem

WebHDFS:IOError

Hi, I encountered an error while trying to write Logstash output into HDFS. The debug output looked like this:

Failed to flush outgoing items {:outgoing_count=>93, :exception=>"WebHDFS::IOError", :backtrace=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:401:in request'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in operate_requests'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in create'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:211:in write_data'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:195:in flush'", "org/jruby/RubyHash.java:1342:in each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.2/lib/logstash/outputs/webhdfs.rb:183:in flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:219:in buffer_flush'", "org/jruby/RubyHash.java:1342:in each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:216:in buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:193:in buffer_flush'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:112:in buffer_initialize'", "org/jruby/RubyKernel.java:1479:in loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/buffer.rb:110:in buffer_initialize'"], :level=>:warn}

And it was repeated continuously. My configuration is:

input {
  redis {
    data_type => "list"
    key => "filebeat"
  }
}

filter {
  if [type] == "conn_log" {
    csv {
      columns => ["ts","uid","id.orig_h","id.orig_p","id.resp_h","id.resp_p","proto","service","duration","orig_bytes","resp_bytes","conn_state","local_orig","local_resp","missed_bytes","history","orig_pkts","orig_ip_bytes","resp_pkts","resp_ip_bytes","tunnel_parents"]
      separator => "    "
    }
  } else if [type] == "weird_log" {
    csv {
      columns => ["ts","uid","id.orig_h","id.orig_p","id.resp_h","id.resp_p","name","addl","notice","peer"]
      separator => "    "
    }
  } else {
    csv {
      columns => ["ts","uid","id.orig_h","id.orig_p","id.resp_h","id.resp_p","fuid","file_mime_type","file_desc","proto","note","msg","sub","src","dst","p","n","peer_descr","actions","suppress_for","dropped","remote_location.country_code","remote_location.region","remote_location.city","remote_location.latitude","remote_location.longitude"]
      separator => "    "
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
  webhdfs {
    host => "127.0.0.1"
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log"
    user => "hduser"
  }
  file {
    path => "logs/redis-%{+YYYY-MM-dd}.txt"
  }
}

I use the latest (2.4) version of Logstash.

Idea/enhancement: support for HA namenodes

You could make a strong case that this could/should either be implemented in the underlying webhdfs module or not at all.

With HDP2.3 at least, HA HDFS is now enabled by default. There are 2 namenodes (active and standby) and they can switch states whenever they please.

If you hit the wrong one via WebHDFS you (are guaranteed to) get the following error (in a 403):

{\"RemoteException\":{\"exception\":\"StandbyException\",\"javaClassName\":\"org.apache.hadoop.ipc.StandbyException\",\"message\":\"Operation category READ is not supported in state standby\"}})", :level=>:error, :file=>"logstash/outputs/webhdfs.rb", :line=>"129", :method=>"register"}

So you could be a hero and accept two namenodes in the configuration and then just switch between them (at most once per record!) whenever you get the above error.

(Incidentally the alternative is apparently to use HttpFS ... I see that the webhdfs library you use has "experimental" HttpFS support - do you expose that?)
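For what it's worth, both halves of this request show up in configurations elsewhere on this page: standby_host/standby_port accept a second namenode for exactly this StandbyException failover case, and use_httpfs switches the plugin to an HttpFS gateway (used there with port 14000). A short sketch of the HttpFS variant, with a hypothetical hostname:

webhdfs {
  host => "httpfs-gateway.example.com"   # hypothetical HttpFS gateway host
  port => 14000                          # HttpFS port, as used elsewhere on this page
  use_httpfs => true
  path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
  user => "hadoop"
}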

Incomplete GZIP Block During Interrupted Writes Corrupts Files

Description of the problem including expected versus actual behavior:

Expected Behavior:

When writing to HDFS using the WebHDFS output plugin with GZIP compression, it should handle interrupted writes gracefully, ensuring data integrity and file consistency.

Actual Behavior:

Currently, the compress_gzip method in the WebHDFS output plugin does not account for interrupted writes or retries. When an interruption occurs during a write operation, the GZIP block being written becomes incomplete. GZIP's nature doesn't allow for appending more data to this incomplete block later on, effectively corrupting the file.
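As an illustration of the "complete block" point above (an assumption about a possible mitigation, not the plugin's current compress_gzip behavior): if each flush were compressed into a fully terminated gzip member before anything is sent, a failed append could be retried with exactly the same bytes, and concatenated members would still form a valid gzip file instead of a truncated block.

require "zlib"
require "stringio"

# Hypothetical helper: compress one flush worth of data into a complete,
# self-terminated gzip member (header + deflate stream + CRC/length trailer).
def compress_chunk(data)
  buffer = StringIO.new
  gz = Zlib::GzipWriter.new(buffer)
  gz.write(data)
  gz.close          # finalizes the trailer before any bytes are sent
  buffer.string
end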


Steps to Reproduce:

  1. Configure Logstash with the WebHDFS output plugin and enable GZIP compression.
  2. Start ingesting data to HDFS through Logstash.
  3. Simulate a WebHDFS failure or maintenance activity during a write operation (you can kill the process, for example).
  4. Observe the resulting HDFS file. It will contain an incomplete GZIP block, making it corrupted and unreadable.

Logstash Information:

  1. Logstash version: 7.17.1
  2. Logstash installation source: tar
  3. How is Logstash being run: systemd

OS Version: RHEL 7

403 GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.

For all general issues, please provide the following details for fast resolution:

  • Version:elk 5.6.3
  • Operating System:centos 7.3
  • Config File (if you have sensitive info, please remove it):
input {
    beats {
        port => 5044
  }
}

output {
    webhdfs {
        host => "nfjd-hadoop02-node169"
        port => 50070
        user => "dengsc"
        path => "/user/dengsc/audit/%{+YYYYMMdd}/logstash-%{+HH}.log"
        use_kerberos_auth => true
        kerberos_keytab => "/home/dengsc/dengsc.keytab"
        standby_host => "nfjd-hadoop02-node147"
        standby_port => 50070
    }
    # stdout { codec => rubydebug { metadata => true } } # for debug
}
  • Sample Data:
2017-04-20 10:47:09,207 INFO FSNamesystem.audit: allowed=true	ugi=hdfs (auth:SIMPLE)	ip=/****	cmd=getfileinfo	src=/tmp/.cloudera_health_monitoring_canary_files	dst=null	perm=null	proto=rpc
  • Steps to Reproduce:
# install gssapi
export PATH=/home/dengsc/logstash/vendor/jruby/bin:$PATH
export GEM_HOME=/home/dengsc/logstash/vendor/bundle/jruby/1.9/
cd /home/dengsc/logstash/vendor/jruby
bin/gem install gssapi
cp -r /home/dengsc/logstash/vendor/bundle/jruby/1.9/gems/gssapi-1.2.0/lib/* /home/dengsc/logstash/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/


# start logstash
bin/logstash -f test.conf
# exception
[2018-01-02T16:51:08,884][ERROR][logstash.outputs.webhdfs ] Webhdfs check request failed. (namenode: nfjd-hadoop02-node147:50070, Exception: <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/><title>Error 403 GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)</title></head><body><h2>HTTP ERROR 403</h2><p>Problem accessing /webhdfs/v1/. Reason:<pre>    GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)</pre></p><hr /><i><small>Powered by Jetty://</small></i></body></html>)


# when i use hdfs cli
[dengsc@nfjd-hadoop02-node177 logstash]$ hdfs dfs -ls webhdfs://nfjd-hadoop02-node169:50070/user/dengsc
Found 2 items
drwx------   - dengsc supergroup          0 2018-01-02 13:34 webhdfs://nfjd-hadoop02-node169:50070/user/dengsc/.Trash
drwx------   - dengsc supergroup          0 2017-08-10 21:10 webhdfs://nfjd-hadoop02-node169:50070/user/dengsc/.staging
[dengsc@nfjd-hadoop02-node177 logstash]$ klist
Ticket cache: FILE:/tmp/krb5cc_2190
Default principal: [email protected]

Valid starting       Expires              Service principal
01/02/2018 16:50:54  01/03/2018 16:50:54  krbtgt/[email protected]
	renew until 01/09/2018 16:50:54
01/02/2018 16:51:08  01/03/2018 16:50:54  HTTP/nfjd-hadoop02-node147.jpushoa.com@
	renew until 01/09/2018 16:50:54
01/02/2018 16:51:08  01/03/2018 16:50:54  HTTP/[email protected]
	renew until 01/09/2018 16:50:54
01/02/2018 16:51:08  01/03/2018 16:50:54  HTTP/nfjd-hadoop02-node169.jpushoa.com@
	renew until 01/09/2018 16:50:54
01/02/2018 16:51:08  01/03/2018 16:50:54  HTTP/[email protected]
	renew until 01/09/2018 16:50:54

kerberos fail

[2019-03-13T16:33:14,906][ERROR][logstash.outputs.webhdfs ] Webhdfs check request failed. (namenode: vm32102:25002, Exception: gss_init_sec_context did not return GSS_S_COMPLETE: Unspecified GSS failure. Minor code may provide more information
No Kerberos credentials available (default cache: FILE:/tmp/krb5cc_1000)
)
