
webhdfs's Introduction

webhdfs - A client library implementation for Hadoop WebHDFS, and HttpFs, for Ruby

The webhdfs gem provides access to Hadoop WebHDFS (EXPERIMENTAL: and HttpFs). WebHDFS::Client is the client class, and WebHDFS::FileUtils is a utility module modeled on Ruby's 'fileutils'.

Installation

gem install webhdfs
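
If the project's dependencies are managed with Bundler, the gem can also be declared in a Gemfile (a minimal sketch; add a version constraint if you need one):

# Gemfile
gem 'webhdfs'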

Usage

WebHDFS::Client

For client object interface:

require 'webhdfs'
client = WebHDFS::Client.new(hostname, port)
# or with pseudo username authentication
client = WebHDFS::Client.new(hostname, port, username)

To create/append/read files:

client.create('/path/to/file', data)
client.create('/path/to/file', data, :overwrite => false, :blocksize => 268435456, :replication => 5, :permission => '0666')

# Passing an IO handle instead of a string avoids holding the whole payload in memory; data is read and sent chunk by chunk (e.g. from a File)
client.create('/path/to/file', file_IO_handle, :overwrite => false, :permission => 0666)

client.append('/path/to/existing/file', data)

client.read('/path/to/target') #=> data
client.read('/path/to/target', :offset => 2048, :length => 1024) #=> data
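
Using :offset and :length, a large remote file can be pulled down in pieces instead of in one call. A minimal sketch, assuming stat reports the file size under the 'length' key (as in the FileStatus output shown in the issues below) and that read returns the raw bytes for each range; the chunk size and local path are arbitrary examples:

file_size  = client.stat('/path/to/target')['length']
chunk_size = 1024 * 1024   # 1 MB per request (arbitrary)
offset     = 0
File.open('/tmp/local_copy', 'wb') do |out|
  while offset < file_size
    chunk = client.read('/path/to/target', :offset => offset, :length => chunk_size)
    out.write(chunk)
    offset += chunk.bytesize
  end
end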

To mkdir/rename/delete directories or files:

client.mkdir('/hdfs/dirname')
client.mkdir('/hdfs/dirname', :permission => '0777')

client.rename(original_path, dst_path)

client.delete(path)
client.delete(dir_path, :recursive => true)

To get status or list of files and directories:

client.stat(file_path) #=> key-value pairs for file status
client.list(dir_path)  #=> list of key-value pairs for files in dir_path
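
Each entry in the list result follows the WebHDFS FileStatus format, so it can be inspected as an ordinary hash. A small sketch (key names such as 'type', 'length' and 'pathSuffix' are taken from the FileStatus JSON shown in the issue reports below):

client.list('/user/someone').each do |entry|
  printf("%-9s %12d  %s\n", entry['type'], entry['length'], entry['pathSuffix'])
end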

The 'content_summary', 'checksum', 'homedir', 'chmod', 'chown', 'replication' and 'touch' methods are also available.
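
Their exact signatures are not documented here, so the following is only a hedged sketch of typical usage; the argument forms are assumptions, not confirmed API:

client.chmod('/path/to/file', '0644')                               # permission as an octal string
client.chown('/path/to/file', :owner => 'alice', :group => 'users')
client.replication('/path/to/file', 3)                              # target replication factor
client.touch('/path/to/file')                                       # update modification time
client.checksum('/path/to/file')                                    #=> file checksum info
client.content_summary('/path/to/dir')                              #=> size and quota summary
client.homedir                                                      #=> home directory path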

Automatic retries are available for known errors. Set the retry_known_errors option to true.

# To retry automatically on LeaseExpiredException
client.retry_known_errors = true

# client.retry_interval = 1 # [sec], default: 1
# client.retry_times = 1 # [times], default: 1

WebHDFS::FileUtils

require 'webhdfs/fileutils'
WebHDFS::FileUtils.set_server(host, port)
# or
WebHDFS::FileUtils.set_server(host, port, username, doas)

WebHDFS::FileUtils.copy_from_local(localpath, hdfspath)
WebHDFS::FileUtils.copy_to_local(hdfspath, localpath)

WebHDFS::FileUtils.append(path, data)

For HttpFs

For HttpFs instead of WebHDFS:

client = WebHDFS::Client.new('hostname', 14000)
client.httpfs_mode = true

client.read(path) #=> data

# or with webhdfs/fileutils
WebHDFS::FileUtils.set_server('hostname', 14000)
WebHDFS::FileUtils.set_httpfs_mode
WebHDFS::FileUtils.copy_to_local(remote_path, local_path)

For HTTP Proxy servers

client = WebHDFS::Client.new('hostname', 14000, 'proxy.server.local', 8080)
client.proxy_user = 'jack'   # if needed
client.proxy_pass = 'secret' # if needed

For SSL

Note that the net/https and openssl libraries must be available:

client = WebHDFS::Client.new('hostname', 4443)
client.ssl = true
client.ssl_ca_file = "/path/to/ca_file.pem" # if needed
client.ssl_verify_mode = :peer # if needed (:none or :peer)
client.ssl_version = :TLSv1 # if needed

For Kerberos Authentication

Note that the gssapi library must be available:

client = WebHDFS::Client.new('hostname', 14000)
# or, to use a client delegation token renewed every 8 hours
client = WebHDFS::Client.new('hostname', 14000, username, nil, nil, nil, {}, 8)

client.kerberos = true
client.kerberos_keytab = "/path/to/project.keytab"

For SSL Client Authentication

Note that the openssl library must be available:

require 'openssl'

client = WebHDFS::Client.new(host, port)
client.ssl = true
client.ssl_key = OpenSSL::PKey::RSA.new(open('/path/to/key.pem'))
client.ssl_cert = OpenSSL::X509::Certificate.new(open('/path/to/cert.pem'))

AUTHORS

LICENSE

  • Copyright: Copyright (c) 2012- Fluentd Project
  • License: Apache License, Version 2.0

webhdfs's People

Contributors

alonelaval, infinite-monkeys, jihyunsong, kenhys, kovyrin, kzk, scauglog, skaji, soam-verticloud, tagomoris, tungel, uijin


webhdfs's Issues

writing to webHdfs in parquet/avro format

Hi,

We have a use case where we want to read from Elasticsearch and write into HDFS. For this we are using the webhdfs output plugin in Logstash. Below is our Logstash config for reference.

input {
  elasticsearch {
    hosts => "192.168.0.3"
    index => "test"
    query => '{"query": {"term": {"Name": "test"}}}'
    size => 500
    scroll => "5m"
  }
}
output {
  webhdfs {
    host => "192.168.0.2"
    port => 50070 # (required)
    path => "/user/logstash/test1" # (required)
    user => "hdfs" # (required)
    flush_size => 500
    idle_flush_time => 10
    retry_interval => 10
    codec => json
  }
}

This is working fine for us. Now we have a requirement to write the output to HDFS in Parquet/Avro format.

Is there any config parameter by which we can write data to HDFS in Avro or Parquet format?

Error 403 GSSException: No valid credentials provided

When I use webhdfs with a Kerberos-enabled Hadoop cluster:

[hdfs@nfjd-hadoop02-node27 ~]$ irb
irb(main):001:0> require 'webhdfs'
=> true
irb(main):002:0> require 'gssapi'
=> true
irb(main):003:0> client = WebHDFS::Client.new('nfjd-hadoop02-node169', 50070)
=> #<WebHDFS::Client:0xfd0e5b6 @ssl_ca_file=nil, @ssl_version=nil, @kerberos_keytab=nil, @doas=nil, @retry_times=1, @httpfs_mode=false, @proxy_port=nil, @ssl_verify_mode=nil, @kerberos=false, @ssl=false, @username=nil, @ssl_cert=nil, @http_headers={}, @proxy_address=nil, @retry_known_errors=false, @port=50070, @host="nfjd-hadoop02-node169", @retry_interval=1, @ssl_key=nil>
irb(main):004:0> client.kerberos = true
=> true
irb(main):005:0> client.kerberos_keytab = "/home/hdfs/keytab/hdfs.keytab"
=> "/home/hdfs/keytab/hdfs.keytab"
irb(main):006:0> 
irb(main):007:0* 
irb(main):008:0* client.list('/user/dengsc')
WebHDFS::IOError: <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/><title>Error 403 GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)</title></head><body><h2>HTTP ERROR 403</h2><p>Problem accessing /webhdfs/v1/user/dengsc. Reason:<pre>    GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                <br/>                                                </body></html>
	from /home/hdfs/logstash-5.6.3/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:401:in `request'
	from /home/hdfs/logstash-5.6.3/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:275:in `operate_requests'
	from /home/hdfs/logstash-5.6.3/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:138:in `list'
	from (irb):8:in `evaluate'
	from org/jruby/RubyKernel.java:1079:in `eval'
	from org/jruby/RubyKernel.java:1479:in `loop'
	from org/jruby/RubyKernel.java:1242:in `catch'
	from org/jruby/RubyKernel.java:1242:in `catch'
	from /home/hdfs/logstash/vendor/jruby/bin/irb:13:in `(root)'

json file => hdfs

I have a question:
Each line of run.log is a JSON object. My Logstash "input" is:

input{
  file {
    type => "run-log"
    path => "/data/log/run.log"
    codec => "json"
  }
}

that "output"

output {
  if [type] == "run-log" {
    webhdfs {
      host => "192.168.9.7"
      user => "hue"
      path => "/user/hue/run-log"
    }
  }
}

the hadoop "hadoop fs -tail /user/hue/run-log"
2016-05-06T07:06:43.332Z app-node4.nginxs.net %{message}

Is my configuration wrong? Please tell me, thanks.

using keytab file for kerberos auth

Hi,
First, thank you for a great ruby gem!

I am trying to use the webhdfs gem with Kerberos auth.
It works with a cached kerberos credential, but I am wondering whether it can use
a keytab file instead of a cached one.

Best wishes,
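
For reference, the README above documents a kerberos_keytab accessor. Whether it fully replaces a ticket cache depends on the gem and gssapi versions, but the configuration would look like this (a sketch based on the README example; the host and keytab path are illustrative):

require 'webhdfs'
client = WebHDFS::Client.new('namenode-host', 50070, 'hdfs')
client.kerberos = true
client.kerberos_keytab = '/etc/security/keytabs/hdfs.keytab'  # keytab file instead of a cached credential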

Failed to open TCP connection to d3.node.hadoop:1022

ENV

Ambari: 2.6.2.0
HDP: 2.6.5.0-292


HDFS : 2.7.3

Gssapi : 1.2.2

Hello, the following error occurred when I used your sample code to write data to HDFS with webhdfs. If you have time, I hope you can answer. Thank you.

require 'webhdfs'
client = WebHDFS::Client.new("192.168.10.1", 50070, "whg")
# or with pseudo username authentication
# # client = WebHDFS::Client.new(hostname, port, username)
client.kerberos = true
client.kerberos_keytab = "/root/keytab/whg.keytab"
a =  client.list("/user/whg/")
print "list:\t", a, "\n\n"
created = client.create('/user/whg/test/webhdfs.txt', 'webhdfs create file success')
print "created:\t", created, "\n\n"
appended = client.append('/user/whg/test/webhdfs.txt', 'webhdfs append success')
print "appended:\t", appended, "\n"
[root@localhost logstash-6.5.3]# ./vendor/jruby/bin/jruby a.rb 
list:   [{"accessTime"=>0, "blockSize"=>0, "childrenNum"=>0, "fileId"=>5243995, "group"=>"hadoop", "length"=>0, "modificationTime"=>1542304804276, "owner"=>"whg", "pathSuffix"=>".Trash", "permission"=>"700", "replication"=>0, "storagePolicy"=>0, "type"=>"DIRECTORY"}, {"accessTime"=>0, "blockSize"=>0, "childrenNum"=>2, "fileId"=>5243966, "group"=>"hadoop", "length"=>0, "modificationTime"=>1542850334598, "owner"=>"whg", "pathSuffix"=>".hiveJars", "permission"=>"755", "replication"=>0, "storagePolicy"=>0, "type"=>"DIRECTORY"}, {"accessTime"=>0, "blockSize"=>0, "childrenNum"=>1, "fileId"=>6883669, "group"=>"hadoop", "length"=>0, "modificationTime"=>1515032558152, "owner"=>"whg", "pathSuffix"=>".sparkStaging", "permission"=>"755", "replication"=>0, "storagePolicy"=>0, "type"=>"DIRECTORY"}, {"accessTime"=>0, "blockSize"=>0, "childrenNum"=>2, "fileId"=>5481118, "group"=>"hadoop", "length"=>0, "modificationTime"=>1514960910443, "owner"=>"whg", "pathSuffix"=>"Documents", "permission"=>"755", "replication"=>0, "storagePolicy"=>0, "type"=>"DIRECTORY"}, {"accessTime"=>0, "blockSize"=>0, "childrenNum"=>2, "fileId"=>5318473, "group"=>"hadoop", "length"=>0, "modificationTime"=>1545616632480, "owner"=>"whg", "pathSuffix"=>"data", "permission"=>"755", "replication"=>0, "storagePolicy"=>0, "type"=>"DIRECTORY"}, {"accessTime"=>0, "blockSize"=>0, "childrenNum"=>3, "fileId"=>6883203, "group"=>"hadoop", "length"=>0, "modificationTime"=>1515029729092, "owner"=>"whg", "pathSuffix"=>"test", "permission"=>"755", "replication"=>0, "storagePolicy"=>0, "type"=>"DIRECTORY"}]

WebHDFS::ServerError: Failed to connect to host d1.node.hadoop:1022, Failed to open TCP connection to d1.node.hadoop:1022 (initialize: name or service not known)
           request at /usr/local/logstash-6.5.3/vendor/jruby/lib/ruby/gems/shared/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351
  operate_requests at /usr/local/logstash-6.5.3/vendor/jruby/lib/ruby/gems/shared/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270
            create at /usr/local/logstash-6.5.3/vendor/jruby/lib/ruby/gems/shared/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73
            <main> at a.rb:9

Glob support?

Hi

Currently issuing something like client.get("/foo/*/bar") always results in {"RemoteException":{"exception":"FileNotFoundException"...}}

Compared to the CLI "hdfs dfs -ls '/foo/*/bar'" which lists the matches as expected.

Do you know whether the WebHDFS API has any support for globbing, or whether it is done purely client-side? If the latter, any chance it could be added to ruby-webhdfs?
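
The WebHDFS REST API does not appear to support glob patterns itself, so globbing would have to be emulated client-side. A rough sketch using the documented list call and Ruby's File.fnmatch (expand_glob is a hypothetical helper, not part of the gem, and it does not verify that non-glob segments actually exist):

require 'webhdfs'

# Expand a pattern such as '/foo/*/bar' by listing each directory level
# and matching entry names against the glob segment.
def expand_glob(client, pattern)
  parts = pattern.split('/').reject(&:empty?)
  paths = ['']
  parts.each do |part|
    paths = paths.flat_map do |base|
      if part.include?('*')
        client.list(base.empty? ? '/' : base)
              .select { |entry| File.fnmatch(part, entry['pathSuffix']) }
              .map { |entry| "#{base}/#{entry['pathSuffix']}" }
      else
        ["#{base}/#{part}"]
      end
    end
  end
  paths
end

client = WebHDFS::Client.new('namenode-host', 50070)
expand_glob(client, '/foo/*/bar')  #=> ["/foo/a/bar", "/foo/b/bar", ...]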

Append API does not throw WebHDFS::FileNotFoundError?

Hi,

It throws an Errno::EPIPE (Broken pipe) exception if I append data bigger than 3 MB to a path that does not exist.

2.2.1 :058 > @client.append("/tmp/webhdfs_append_test2", "ab" * 512 * 1024 * 3)
Errno::EPIPE: Broken pipe
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/protocol.rb:211:in `write'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/protocol.rb:211:in `write0'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/protocol.rb:185:in `block in write'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/protocol.rb:202:in `writing'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/protocol.rb:184:in `write'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http/generic_request.rb:187:in `send_request_with_body'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http/generic_request.rb:120:in `exec'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1412:in `block in transport_request'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1411:in `catch'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1411:in `transport_request'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1384:in `request'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1377:in `block in request'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:853:in `start'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1375:in `request'
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:1355:in `send_request'
    from /Users/zhangzhonglai/.rvm/gems/ruby-2.2.1@zzl/gems/webhdfs-0.7.0/lib/webhdfs/client_v1.rb:329:in `request'
    from /Users/zhangzhonglai/.rvm/gems/ruby-2.2.1@zzl/gems/webhdfs-0.7.0/lib/webhdfs/client_v1.rb:261:in `operate_requests'
    from /Users/zhangzhonglai/.rvm/gems/ruby-2.2.1@zzl/gems/webhdfs-0.7.0/lib/webhdfs/client_v1.rb:75:in `append'
    from (irb):58
    from /Users/zhangzhonglai/.rvm/rubies/ruby-2.2.1/bin/irb:11:in `<main>'2.2.1 :059 >

What is going wrong here?

Not enough information in request exception

My Hadoop setup consists of a few datanodes, and the /etc/hosts file is supposed to contain something like:

IP_ADDRESS_1 hdslave01 name1
IP_ADDRESS_2 hdslave01 name2
IP_ADDRESS_3 hdslave01 name3

Let's say that, for some reason, some entries in the hosts file don't exist.

When I try to read a file from HDFS with the below code snippet:

client = WebHDFS::Client.new(@web_host, @web_port, 'user')
file = client.read(@path);

I understand that Hadoop will perform a redirection to the actual datanode upon
receiving the above request. As a result I receive a not-so-useful error message like:

#<SocketError: getaddrinfo: Name or service not known>

where the actual problem may be that the name3 host is not defined in my hosts
file. Is there a way to construct a more meaningful error message when this exception is raised?

I see that there is no exception handling around the conn.request and conn.send_request calls in the request method in client_v1.rb. I can submit a pull request later if needed.

Thanks.
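
Until something like that is added to the gem, the effect can be approximated from calling code. A minimal sketch that wraps the read from the snippet above and re-raises with the namenode coordinates (variable names mirror the snippet; note that when the unresolvable host is the redirect target, only the gem-internal request sees its name):

begin
  file = client.read(@path)
rescue SocketError => e
  # Attach the request context so the failing lookup is easier to trace.
  raise SocketError, "#{e.message} (while reading #{@path} via #{@web_host}:#{@web_port})"
end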

HA Namenode support?

We have two namenodes for high availability and get a StandbyException when using the non-active namenode, which makes sense.

WebHDFS::IOError: {"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby"}}
    from /usr/local/lib/ruby/gems/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:401:in `request'
    from /usr/local/lib/ruby/gems/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:275:in `operate_requests'
    from /usr/local/lib/ruby/gems/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:138:in `list'
    from (irb):5
    from /usr/local/bin/irb:11:in `<main>'

However, is it up to the client to figure out which namenode is active and use it as the host in this library? Is there a way to specify multiple host addresses for this situation?
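
One workaround, given that the constructor documented above takes a single host, is to keep a client per namenode and fall back when the standby rejects a request. A rough sketch (host names are examples; detecting StandbyException by matching the WebHDFS::IOError message is a heuristic, not a documented interface):

require 'webhdfs'

NAMENODES = ['nn1.example.com', 'nn2.example.com']

def with_active_namenode
  last_error = nil
  NAMENODES.each do |host|
    client = WebHDFS::Client.new(host, 50070)
    begin
      return yield(client)
    rescue WebHDFS::IOError => e
      raise unless e.message.include?('StandbyException')
      last_error = e  # this namenode is in standby; try the next one
    end
  end
  raise last_error
end

with_active_namenode { |client| client.list('/user') }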

SocketError: getaddrinfo: Name or service not known

The network is OK, yet connecting with webhdfs produces an error:

[root@localhost ~]# nc -vv  192.168.1.122 50070
Connection to 192.168.1.122 50070 port [tcp/*] succeeded!
s
HTTP/1.1 400 Bad Request
Connection: close
Server: Jetty(6.1.26)
[root@localhost ~]# /usr/lib64/fluent/ruby/bin/irb
irb(main):001:0> require 'webhdfs'
=> true
irb(main):002:0> client = WebHDFS::Client.new('192.168.1.122','50070')
=> #<WebHDFS::Client:0x0000000127f7a8 @host="192.168.1.122", @port="50070", @username=nil, @doas=nil, @proxy_address=nil, @proxy_port=nil, @retry_known_errors=false, @retry_times=1, @retry_interval=1, @httpfs_mode=false>
irb(main):003:0> client.create('/test', 'data')
SocketError: getaddrinfo: Name or service not known
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:763:in `initialize'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:763:in `open'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:763:in `block in connect'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/timeout.rb:55:in `timeout'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/timeout.rb:100:in `timeout'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:763:in `connect'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:756:in `do_start'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:745:in `start'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:1285:in `request'
    from /usr/lib64/fluent/ruby/lib/ruby/1.9.1/net/http.rb:1265:in `send_request'
    from /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/webhdfs-0.5.5/lib/webhdfs/client_v1.rb:281:in `request'
    from /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/webhdfs-0.5.5/lib/webhdfs/client_v1.rb:242:in `operate_requests'
    from /usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems/webhdfs-0.5.5/lib/webhdfs/client_v1.rb:45:in `create'
    from (irb):3
    from /usr/lib64/fluent/ruby/bin/irb:12:in `<main>'

bad references to git in gemspec

webhdfs 0.7.4

When installing this as a dependency for logstash-output-webhdfs, the install fails due to a dependency on git in the gemspec.

bin # ./plugin install --local --no-verify /root/webhdfs-0.7.4.gem 
Installing webhdfs
Error Bundler::GemspecError, retrying 1/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 2/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 3/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 4/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 5/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 6/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 7/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 8/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 9/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Error Bundler::GemspecError, retrying 10/10
There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'
Too many retries, aborting, caused by Bundler::GemspecError
ERROR: Installation Aborted, message: There was a Errno::ENOENT while loading webhdfs.gemspec: 
No such file or directory - git from
  /opt/logstash/vendor/local_gems/b47f8aeb/webhdfs-0.7.4/webhdfs.gemspec:14:in `eval_gemspec'

https://github.com/kzk/webhdfs/blob/master/webhdfs.gemspec#L14-L16

Feature requests: SSL and Kerberos HTTP SPNEGO

In my environment, the httpfs server requires SSL and Kerberos HTTP SPNEGO.
So, for example, using curl I talk to the httpfs server like this:

$ kinit
$ curl --cacert /path/to/ca-cert.pem --negotiate -u : "https://httpfs-host:4443/webhdfs/v1/user/foo?op=liststatus"

Would it be possible for the webhdfs gem to support SSL and Kerberos HTTP SPNEGO?
