teraserver-hdfs

Teraserver plugin to allow downloading of files from HDFS

To use

node service.js -c /Path/to/config

API Endpoint

/api/v1/hdfs

URLs will follow this structure /api/v1/hdfs/endpoint/path/to/file.txt?token=TOKEN&ticket=DOWNLOAD_TICKET

This breaks down into several components.

  • endpoint - used to look up the HDFS configuration for retrieving files. This is set in config.
  • path - the path in HDFS relative to the directory defined in the configuration.
  • filename - the name of the file being retrieved.
  • token - a standard TeraServer API token.
  • ticket - a shared secret between the endpoint and the user of the API. It is set in configuration.

By default this will download the whole file if it is smaller than the byteInterval, which is located in server/api/hdjs.js. If the file size is greater, it will repeatedly send chunks of byteInterval bytes until the file is fully downloaded. Its value is currently set to 1 megabyte, but you are free to change it based on your network's capacity and latency. This plugin also supports partial downloads; just follow the HTTP spec for the Range header.
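The chunking described above can be sketched as follows (illustrative code, not the plugin's actual internals; only the byteInterval value mirrors the source):

```javascript
const byteInterval = 1024 * 1024; // 1 MB, tune for your network's capacity and latency

// Return the list of inclusive [start, end] byte ranges needed to cover a
// file: whole file in one range if it fits, otherwise consecutive chunks.
function chunkRanges(fileSize, interval = byteInterval) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += interval) {
    ranges.push([start, Math.min(start + interval, fileSize) - 1]);
  }
  return ranges;
}

console.log(chunkRanges(2500000));
// [ [ 0, 1048575 ], [ 1048576, 2097151 ], [ 2097152, 2499999 ] ]
```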

Example to download the first megabyte of a file and save it as bigFile.log in the current directory:

curl -r 0-1000000 'localhost:8000/api/v1/hdfs/ENDPOINT/FILE/?token=TOKEN&ticket=TICKET' -o bigFile.log

Example config of the hdfs plugin for teraserver:


 "terafoundation": {
    "environment": "development",
    "log_path": "log/path",
    "connectors": {
      "hdfs": {
        "default": {
          "user": "User",
          "namenode_port": 50070,
          "namenode_host": "localhost",
          "path_prefix": "/webhdfs/v1"      // this is standard http api for hadoop hdfs
        },
        "second_connection": {
          "user": "User",
          "namenode_port": 50070,
          "namenode_host": "someOtherHost",
          "path_prefix": "/webhdfs/v1"      // this is standard http api for hadoop hdfs
        }
      }
    }
  }
"teraserver-hdfs": {
    "endpoint": {
      "connection": "default_connection",
      "directory": "/some/dir",        // set path relative to root
      "ticket": "secretPassword1"    //set whatever password you prefer, it must pass a === check
    },
    "other_endpoint": {
      "connection": "second_connection",
      "directory": "/another/dir",              // set path relative to root
      "ticket": "secretPassword2"   //set whatever password you prefer, it must pass a === check
    }
  }
};

In teraserver-hdfs, you specify the endpoints that are available. Each endpoint must specify a ticket, which is essentially any password you would like to set; it must be able to pass a === check. Users must provide this ticket on each request to access the API. Setting a directory will restrict users to that directory and any subdirectory it contains; a request with ../ in the file path will be rejected. The connection key must match one of the namespaces set in terafoundation.connectors.hdfs as shown above. If not set, it will use the default connection.
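The per-request checks above can be sketched as follows (a minimal illustration; validateRequest and its return shape are hypothetical, not the plugin's actual code):

```javascript
// Hypothetical sketch of the checks described above: endpoint lookup,
// strict-equality (===) ticket check, and rejection of relative path
// navigation, with a fallback to the "default" connection.
function validateRequest(config, endpoint, ticket, filePath) {
  const epConfig = (config['teraserver-hdfs'] || {})[endpoint];
  if (!epConfig) return { ok: false, error: 'unknown endpoint' };
  if (epConfig.ticket !== ticket) return { ok: false, error: 'invalid ticket' };
  if (filePath.includes('../')) return { ok: false, error: 'relative paths not allowed' };
  const connection = epConfig.connection || 'default';
  return { ok: true, connection, path: `${epConfig.directory}/${filePath}` };
}
```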

Example of file upload

curl -XPOST -H 'Content-type: application/octet-stream' --data-binary @/path/to/file 'localhost:8000/api/v1/hdfs/ENDPOINT/FILENAME?token=TOKEN&ticket=TICKET'

Response: 'Upload Complete'

It is important to set the content type to application/octet-stream and to use the --data-binary flag. Otherwise curl may parse the file itself, corrupting binary data and even changing the formatting of regular text files.

Example of file deletion

curl -XDELETE 'localhost:8000/api/v1/hdfs/ENDPOINT/FILE?token=TOKEN&ticket=TICKET'

Response: 'Deletion successful'


teraserver-hdfs's Issues

Update documentation

Documentation is needed on uploads and delete and the config section should be updated based on latest terafoundation changes.

Invalid endpoint error handling

If an endpoint is provided that isn't in the configuration, you get a confusing error about tickets. The API should return a more accurate response.

{"error":"Cannot read property 'ticket' of undefined"}

Get connection from terafoundation

It's not quite possible to do this yet but terafoundation will be getting an HDFS connector. When that's available this should switch to using that instead of handling the connection directly.

Requirements

This will be a plugin for Teraserver that allows one or more directories stored in HDFS to be exposed through a REST API.

URLs should look something like /api/v1/hdfs/endpoint1/path/to/file.txt?token=TOKEN&ticket=DOWNLOAD_TICKET

This breaks down into several components.

  • endpoint - used to look up the HDFS configuration for retrieving files. Many endpoints can be defined through configuration.
  • path - the path in HDFS relative to the directory defined in the configuration.
  • filename - the name of the file being retrieved.
  • token - a standard TeraServer API token.
  • ticket - a shared secret between the endpoint and the user of the API. It is configured on the endpoint and is always required.

Configuration should be in Teraserver config.js and will look something like this.

config['teraserver-hdfs'] = {};
config['teraserver-hdfs'].endpoint1 = {
    namenode: 'namenode1.example.com',
    directory: '/path/in/hdfs1',
    ticket: 'SOMEVALUE'
};
config['teraserver-hdfs'].endpoint2 = {
    namenode: 'namenode2.example.com',
    directory: '/path/in/hdfs1',
    ticket: 'SOMEVALUE'
};

With that configuration a request to /api/v1/hdfs/endpoint1/deeper/path/file.txt will translate into:

  • A request on namenode namenode1.example.com for path /path/in/hdfs1/deeper/path/file.txt
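This translation can be sketched as follows (illustrative; the /webhdfs/v1 prefix and ?op=OPEN query follow the standard WebHDFS HTTP API, but the function name, port, and config shape are assumptions for this example):

```javascript
// Map an API request onto a WebHDFS read URL for the configured namenode.
// /webhdfs/v1 is the standard WebHDFS path prefix; op=OPEN reads a file.
function toWebhdfsUrl(epConfig, filePath) {
  return `http://${epConfig.namenode}:50070/webhdfs/v1${epConfig.directory}/${filePath}?op=OPEN`;
}

const endpoint1 = { namenode: 'namenode1.example.com', directory: '/path/in/hdfs1' };
console.log(toWebhdfsUrl(endpoint1, 'deeper/path/file.txt'));
// http://namenode1.example.com:50070/webhdfs/v1/path/in/hdfs1/deeper/path/file.txt?op=OPEN
```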

NOTE: It's very important that the path and file name components are sanitized so that relative path navigation cannot be used to access files outside the configured directory, e.g. /deeper/path/../../../some/other/path.
