webhdfs's Introduction

node-webhdfs


Hadoop WebHDFS REST API (2.2.0) client library for node.js with an fs module-like (asynchronous) interface.

Examples

Writing to the remote file:

var fs = require('fs');
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');

localFileStream.pipe(remoteFileStream);

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});

Reading from the remote file:

var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient({
  user: 'webuser',
  host: 'localhost',
  port: 80,
  path: '/webhdfs/v1'
});

var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
});

remoteFileStream.on('finish', function onFinish () {
  // Download is done
});

TODO

  • Implement all calls (GETCONTENTSUMMARY, GETFILECHECKSUM, GETHOMEDIRECTORY, GETDELEGATIONTOKEN, RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN, SETREPLICATION, SETTIMES)
  • Improve documentation
  • More examples

Tests

Running tests:

npm test

License

MIT

webhdfs's People

Contributors

bernadinm, harrisiirak, hmalphettes, kirbysayshi, moshewe, pavelvanecek, xiao4


webhdfs's Issues

writeFile not returning errors to callback

The writeFile method is not returning errors to the client callback. Stepping through the code, the remoteStream.on('error') handler (line 414) is invoked after the remoteStream.on('finish') handler (line 418), so the finish handler calls the callback without the error.

Steps to reproduce:

1. Create a WebHDFS client object with the host option pointed at a non-existent HDFS instance.
2. Call hdfs.writeFile.
3. Observe that the error argument passed to the callback is null.

readFile calls callback twice

hdfs.readFile('/path/to/file', function(err, content) { console.log('RETURNED'); });

readFile seems to call the callback twice: once with a file-not-found error, and once with no error and a valid buffer.

It appears that the finish event is being fired more than once.
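
Both this and the previous writeFile issue come down to the callback not being guarded against repeated or out-of-order invocations. As a caller-side sketch (the once helper below is illustrative, not part of the library), the callback can be wrapped so it fires at most once and a late error can still win over an earlier success:

function once (callback) {
  var called = false;
  return function (err, data) {
    if (called) return;

    if (err) {
      called = true;
      return callback(err);
    }

    // Defer the success path one tick so that an 'error' emitted right
    // after 'finish' (as described above) can still take precedence.
    setImmediate(function () {
      if (called) return;
      called = true;
      callback(null, data);
    });
  };
}

hdfs.readFile('/path/to/file', once(function (err, content) {
  console.log('RETURNED', err || content);
}));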

upload to hdfs not working

When piping to stdout, the local file is read correctly; however, the file that ends up on HDFS is empty (and no errors are reported).

(also see http://stackoverflow.com/questions/25261469/uploading-a-file-to-hdfs-through-node-js-and-hdfs-module)

var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();
var fs = require('fs')

var localFilePath = "stupidfile.txt";
var remoteFilePath = "/user/cloudera/doesthiswork.txt";

var localFileStream = fs.createReadStream(localFilePath);
var remoteFileStream = hdfs.createWriteStream(remoteFilePath);

localFileStream.pipe(remoteFileStream);

console.log("opening stream to HDFS");

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
  console.log("it failed");
  console.log(err);
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
  console.log("it is done!");
});

The console output is:

[cloudera@quickstart Documents]$ node hdfs-upload.js
opening stream to HDFS
it is done!

exists not returning errors

The exists method is not returning errors to the client and defaults to false when an error is returned from HDFS. It should return the error so the client can decide how to handle it, rather than burying it.

I'll try and get a PR for these tickets.
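
As a caller-side workaround in the meantime, here is a minimal sketch that uses the library's stat call (backed by GETFILESTATUS) instead of exists, so that request failures surface as errors rather than being collapsed into false; the exact shape of the returned status follows the WebHDFS FileStatus payload:

hdfs.stat('/path/to/remote/file', function (err, stats) {
  if (err) {
    // Could be "file not found" or a transport/HDFS failure; inspect the
    // error (e.g. its RemoteException details) to tell them apart.
    console.log('stat failed', err);
    return;
  }

  console.log('path exists, type:', stats.type);
});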

Getting SSL connection options through to request

We're having trouble figuring out how to get SSL options (such as those described here: https://github.com/request/request#tlsssl-protocol) through to the request module. Basically, we have an SSL port on our WebHDFS service with a self-signed cert, and we'd like to be able to provide the CA certificate or turn off certificate verification (like NODE_TLS_REJECT_UNAUTHORIZED=0, but not globally).

I've tried adding the following to both the createClient call and as the third argument of the createWriteStream call, where ca is our root certificate:

ca: ca, rejectUnauthorized: false, agentOptions: { ca: ca, rejectUnauthorized: false }

But nothing seems to work. Is there something I'm missing?
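
One thing that might be worth trying, as an untested sketch: some versions of the library appear to accept a second requestParams argument to createClient that is merged into every underlying request call, which is where TLS options such as ca and rejectUnauthorized belong. The second argument, host, and port below are assumptions to verify against your installed version:

var fs = require('fs');
var WebHDFS = require('webhdfs');

// Hypothetical CA bundle for the self-signed certificate.
var ca = fs.readFileSync('/path/to/root-ca.pem');

// Assumption: createClient(opts, requestParams) forwards requestParams to
// the request module on every call. Check your version before relying on it.
var hdfs = WebHDFS.createClient({
  user: 'webuser',
  host: 'secure-webhdfs.example.com', // hypothetical host
  port: 50470,
  path: '/webhdfs/v1'
}, {
  ca: ca,
  rejectUnauthorized: false // or keep true once the CA is trusted
});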

Missing features

Hi,

On your TODO list you plan to implement all calls; do you happen to have an ETA for this? If not, I could give it a shot at implementing them.

I will be needing: GETFILECHECKSUM, GETHOMEDIRECTORY, GETDELEGATIONTOKEN, RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN, SETREPLICATION

Calling writeFile twice not working

I am trying to append to a file.

hdfs.writeFile('/loginput/log.txt', "XYZ", true, function(err) {
    if(err) {
        console.log("ERROR");
        console.log(err);
    }
});

hdfs.writeFile('/loginput/log.txt', "PQR", true, function(err) {
    if(err) {
        console.log("ERROR");
        console.log(err);
    }
});

I was expecting XYZPQRXYZPQR... but getting only XYZXYZ.

The issue seems to be caused by the pipe remaining open after the write is complete. I tried creating a new client to the same HDFS, but got the same result.
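
Since both writeFile calls are asynchronous, they race against each other. A minimal sketch that serializes the appends by issuing the second write only from the first write's callback (reusing the same signature as the code above):

hdfs.writeFile('/loginput/log.txt', "XYZ", true, function(err) {
    if(err) {
        console.log("ERROR");
        console.log(err);
        return;
    }

    // First append finished; it is now safe to issue the second one.
    hdfs.writeFile('/loginput/log.txt', "PQR", true, function(err) {
        if(err) {
            console.log("ERROR");
            console.log(err);
        }
    });
});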

Read/Write with Kerberos

Hi, I am trying to read and write in the webhdfs/v1/tmp/test directory, but I get an empty result from the read operation. Could you please tell me if I am doing anything wrong?
let hdfs = WebHDFS.createClient({
  user: "<>",
  host: "<<host/IP >>",
  port: 50070,
  path: "webhdfs/v1/tmp/test/"
});

let remoteFileStream = hdfs.createReadStream('test_hdfs.txt');

remoteFileStream.on("error", function onError(err) { // handles errors while reading
  // Do something with the error
  console.log("...error: ", err);
});

let dataStream = [];
remoteFileStream.on("data", function onChunk(chunk) { // on read success
  // Do something with the data chunk
  dataStream.push(chunk);
  console.log('..chunk..', chunk);
});

remoteFileStream.on("finish", function onFinish() { // on read finish
  console.log('..on finish..');
  console.log('..file data..', dataStream);
});

I followed this guide:
https://dzone.com/articles/accessing-bigdata-hadoop-hdfs-data-by-using-nodejs

WritableStream gets reset after pipe()

I think there was a change to the streamable behavior. Basically, the following occurs:

var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');
localFileStream.pipe(remoteFileStream);

Inside WebHDFS.prototype.createWriteStream(), we attach to the 'pipe' event. This unwinds the source with src.unpipe(req), then saves and pauses the stream. The problem is that in stream_readable.js there is something akin to the following:

  dest.emit('pipe', src);

  // start the flow if it hasn't been started already.
  if (!state.flowing) {
    debug('pipe resume');
    src.resume();
  }

The emit makes sense and is where we receive the 'pipe' event. The pause() sets state.flowing to false. Then, once the emitted event handler finishes, the second half of that block is invoked, which re-enables event flow within the source. Basically, this breaks the detach/reattach model.

The simple workaround is for the author to pause the stream immediately after calling pipe:

rstream.pipe(wstream);
rstream.pause();

unable to delete a folder which is not empty

Hi,
I am unable to delete a folder which is not empty, and I get the following error:

{"RemoteException":{"exception":"PathIsNotEmptyDirectoryException","javaClassName":"org.apache.hadoop.fs.PathIsNotEmptyDirectoryException","message":"`/aakash is non empty': Directory is not empty"}}

can't upload large size file

Hi, I've tried uploading a small file and it succeeded, but when I try to upload a 130 MB file it always fails. Are there any configuration options I must set first for this operation?

Sometimes I get this error message:

Error: BP-1312390296-192.168.1.228-1545800077608:blk_1073838522_97940 does not exist or is not under Constructionnull

I've tried increasing the buffer size as shown below, but it still fails:

var localFileStream = fs.createReadStream('/tmp/test.jar', { highWaterMark: 1024 * 1024 });
var remoteFileStream = hdfs.createWriteStream('/user/training/upload/test.jar', { highWaterMark: 1024 * 1024 });

Problem when piping an expressjs request into a writable stream

I'm facing a problem when I try to pipe an Express.js request into a webhdfs writable stream. Apparently the request gets stalled and never ends. The code I'm using is attached below:

post(req, res, next) {
        // perform a basic content negotiation check to avoid invoking
        // the backend unnecessarily
        if (!req.is("text/csv")) {
            res.status(415).end();
            return;
        }

        let destination = this.fs.createWriteStream(
            `/tmp/data.csv`
        );

        destination.on("error", (err) => { console.log(err.message); next(err); });
        destination.on("finish", () => res.status(201).end());

        req.pipe(destination);

        req.on("end", () => console.log("end"));
    }

On Finish callback calls each time

How can we tell whether the file was uploaded successfully? The finish callback fires every time, whether there was an error or the file was uploaded successfully.
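
As a sketch of one way to tell the two cases apart (the path is illustrative): record whether an 'error' event fired and only treat 'finish' as a successful upload when it did not.

var hadError = false;
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');

remoteFileStream.on('error', function onError (err) {
  hadError = true;
  console.log('upload failed', err);
});

remoteFileStream.on('finish', function onFinish () {
  if (hadError) return; // 'finish' after an error is not a successful upload
  console.log('upload succeeded');
});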

Module throws exception on `write`

The following code throws an exception:

var whdfs = require('webhdfs'),
      hdfs = null;

hdfs = whdfs.createClient(...);
var remotefd = hdfs.createWriteStream('/tmp/writetst.txt');
remotefd.on('error', err => { console.error(err); process.exit(); });
remotefd.on('finish', () => { console.log("Done with remote write."); process.exit(); });
remotefd.write('test');
remotefd.end();

The equivalent using pipe or writeFile works fine:

hdfs.writeFile('/tmp/writetst.txt', err => { console.log(err); process.exit(); });

Here's the exception:

/mnt/alpha/production/srcs/master-latest-git/3p/pkgs-linux/node/lib/node_modules/webhdfs/lib/webhdfs.js:551
      stream.pipe(upload);
            ^
TypeError: Cannot read property 'pipe' of null
    at Request._callback (/mnt/alpha/production/srcs/master-latest-git/3p/pkgs-linux/node/lib/node_modules/webhdfs/lib/webhdfs.js:551:13)

Does it support multiple namenodes?

We set the host to the NameNode, and in case of failover our NameNode changes. Is there any way to give multiple NameNodes so that the client connects to whichever one is live?
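
The library does not advertise built-in NameNode failover, so one workaround is to probe each NameNode and keep the first client that responds. A rough sketch (host names are hypothetical, and the readdir('/') probe is just one way to check liveness):

var WebHDFS = require('webhdfs');

var hosts = ['namenode1.example.com', 'namenode2.example.com']; // hypothetical

function connectToLiveNamenode (hosts, callback) {
  if (hosts.length === 0) {
    return callback(new Error('no live namenode found'));
  }

  var hdfs = WebHDFS.createClient({
    user: 'webuser',
    host: hosts[0],
    port: 50070,
    path: '/webhdfs/v1'
  });

  // A standby namenode typically rejects read operations, so a failed
  // probe means we should try the next host.
  hdfs.readdir('/', function (err) {
    if (err) {
      return connectToLiveNamenode(hosts.slice(1), callback);
    }
    callback(null, hdfs);
  });
}

connectToLiveNamenode(hosts, function (err, hdfs) {
  if (err) throw err;
  // use hdfs as usual
});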

faced "java.lang.UnsupportedOperationException","message":"Operation APPEND is not supported."}} (error 400)

Running the curl command below produced the error:

curl http://:50073/WebWasb/webhdfs/v1/user/hadoop/HiveTableDDL.sql.tmp?op=APPEND

WebHdfsException: 400 Client Error: Bad Request for url: http://10.151.252.24:50073/WebWasb/webhdfs/v1/user/hadoop/HiveTableDDL.sql.tmp?op=APPEND&user.name=hue&doas=hadoop
{"RemoteException":{"exception":"UnsupportedOperationException","javaClassName":"java.lang.UnsupportedOperationException","message":"Operation APPEND is not supported."}} (error 400)
[18/Nov/2022 10:00:38 +0800] access INFO 192.168.219.247 hadoop - "POST /filebrowser/upload/file HTTP/1.1" returned in 498ms 200 2584

{"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"No enum constant org.apache.hdinsight.operations.GetOperation.APPEND"}}

Hadoop:
3.1.1

Could you please tell me what the reason is and how to mitigate it?
