NetApp-Hadoop-NFS-Connector

This repository is obsolete. Please use the NetApp FAS NFSConnector product from now.netapp.com instead.

Overview

The Hadoop NFS Connector allows Apache Hadoop (2.2+) and Apache Spark (1.2+) to use an NFSv3 storage server as a storage endpoint. The NFS Connector supports two modes: (1) secondary filesystem, where Hadoop/Spark runs with HDFS as its primary storage and uses NFS as a second storage endpoint; and (2) primary filesystem, where Hadoop/Spark runs entirely on an NFSv3 storage server.

The connector is written so that existing applications do not have to change. All you have to do is copy the connector jar into the lib/ directory of Hadoop/Spark and modify core-site.xml to provide the necessary details.

NOTE: The code is in beta. We would love for you to try it out and give us feedback.

This is the first release and it does the following:

  • Connects to an NFSv3 storage server using the AUTH_NONE or AUTH_SYS authentication method.
  • Works with vanilla Apache Hadoop 2.2 or newer and Hortonworks HDP 2.2 or newer.
  • Supports all operations defined by the Hadoop FileSystem interface.
  • Pipelines READ/WRITE requests to utilize the underlying network (works well with 1GbE and 10GbE networks).

We are planning to add these in the near future:

  • Ability to connect to multiple NFS endpoints (multiple IP addresses) for even more bandwidth.
  • Integration with Hadoop user authentication.

How to use

Once the NFS connector is configured, you can easily invoke it from the command-line using the Hadoop shell.

  console> bin/hadoop fs -ls nfs://<nfs-server-hostname>:2049/ (if using as secondary filesystem)
  console> bin/hadoop fs -ls / (if using as default/primary filesystem)

When submitting new jobs, you can provide an NFS path as the input path, the output path, or both:

  (assuming NFS is used as a secondary filesystem)
  console> bin/hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
  console> bin/hadoop jar <path-to-examples-jar> terasort /tera/in nfs://<nfs-server-hostname>:2049/tera/out
  console> bin/hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out
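
Because the connector implements the standard Hadoop FileSystem interface, applications can also read and write nfs:// paths programmatically. Below is a minimal sketch, assuming the connector jar is on the classpath and core-site.xml is set up as described in the Configuration section; the hostname and paths are placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NfsConnectorExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath; fs.nfs.impl must map the
        // nfs:// scheme to org.apache.hadoop.fs.nfs.NFSv3FileSystem (see Configuration).
        Configuration conf = new Configuration();
        FileSystem nfs = FileSystem.get(URI.create("nfs://nfs-server-hostname:2049/"), conf);

        // List the export root, like `hadoop fs -ls nfs://<nfs-server-hostname>:2049/`.
        for (FileStatus status : nfs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }

        // Write a small file through the connector and read it back.
        Path demo = new Path("/tmp/nfs-connector-demo.txt");
        try (FSDataOutputStream out = nfs.create(demo, true)) {
            out.writeUTF("hello from the NFS connector");
        }
        try (FSDataInputStream in = nfs.open(demo)) {
            System.out.println(in.readUTF());
        }
        nfs.close();
    }
}
```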

Configuration

  1. Compile the project:

     ```
     console> mvn clean package
     ```

  2. Copy the jar file into the shared common library directory of your Hadoop installation. For example, for hadoop-2.4.1:

     ```
     console> cp target/hadoop-connector-nfsv3-1.0.jar $HADOOP_HOME/share/hadoop/common/lib/
     ```

  3. Add the NFSv3 connector parameters to core-site.xml in the Hadoop configuration directory (e.g., for hadoop-2.4.1: $HADOOP_HOME/conf). Set fs.defaultFS to the nfs:// URI only if NFS is the primary filesystem:

     ```
     <property>
       <name>fs.defaultFS</name>
       <value>nfs://<nfs-server-hostname>:2049</value>
     </property>
     <property>
       <name>fs.nfs.configuration</name>
       <value>/nfs-mapping.json</value>
     </property>
     <property>
       <name>fs.nfs.impl</name>
       <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
     </property>
     <property>
       <name>fs.AbstractFileSystem.nfs.impl</name>
       <value>org.apache.hadoop.fs.nfs.NFSv3AbstractFilesystem</value>
     </property>
     ```

  4. Start Hadoop. NFS can now be used inside Hadoop.
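
Spark (1.2+) can use the connector in the same way: once the connector jar is on Spark's classpath and core-site.xml registers the nfs:// scheme, any API that accepts a Hadoop-compatible path will work. A minimal sketch, with a placeholder hostname and path:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class NfsSparkExample {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("nfs-connector-demo");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Any Hadoop-compatible URI works here; the nfs:// scheme is resolved
        // through fs.nfs.impl configured in core-site.xml (placeholder path).
        JavaRDD<String> lines = sc.textFile("nfs://nfs-server-hostname:2049/tera/in");
        System.out.println("Line count: " + lines.count());

        sc.stop();
    }
}
```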

Contributors

amarnath-rachapudi, ecki, gsoundar, jxfeng, nkarthik

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Issues

Connector fails on Unsupported verifier flavorAUTH_SYS

Hi,
We're trying to use the connector to connect to a normal Linux NFS share but receive the following exception:
[root@ip-172-31-11-139 ~]# hadoop fs -ls /
16/01/14 11:40:53 ERROR rpc.RpcClientHandler: RPC: Got an exception
java.lang.UnsupportedOperationException: Unsupported verifier flavorAUTH_SYS
at org.apache.hadoop.oncrpc.security.Verifier.readFlavorAndVerifier(Verifier.java:45)
at org.apache.hadoop.oncrpc.RpcDeniedReply.read(RpcDeniedReply.java:50)
at org.apache.hadoop.oncrpc.RpcReply.read(RpcReply.java:67)
at org.apache.hadoop.fs.nfs.rpc.RpcClientHandler.messageReceived(RpcClientHandler.java:62)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler.handleUpstream(IdleStateAwareChannelHandler.java:36)
at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

AUTH_SYS does not work with a standard Linux/Unix NFS server

This is really a problem in the Hadoop code, but it makes the connector unusable for most filers.
In CredentialsSys, the length is calculated like this:
mCredentialsLength = 20 + mHostName.getBytes().length;
However, in Linux/Unix (svc_auth_unix.c) the hostname's length is rounded up:
str_len = RNDUP(str_len);

This leads to the following error message in mountd when the hostname has 33 characters:
bad auth_len gid 0 str 36 auth 53

How does this work with the NetApp filers?
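
For context, XDR (RFC 4506) requires variable-length data such as the AUTH_SYS machine name to be padded to a 4-byte boundary, which is what RNDUP does on the Linux side. A hypothetical client-side fix would round the hostname length the same way before adding it to the fixed-size fields; the sketch below is illustrative only and does not claim to be the actual Hadoop patch:

```java
// Illustrative sketch of the length calculation discussed above; names are
// modeled on the CredentialsSys snippet quoted in the issue, not actual code.
public class AuthSysCredentialLength {

    // XDR (RFC 4506) pads variable-length data to a multiple of 4 bytes,
    // matching RNDUP() in the Linux server's svc_auth_unix.c.
    static int roundUpToXdrBoundary(int len) {
        return (len + 3) & ~3;
    }

    // 20 bytes of fixed fields (stamp, hostname length word, uid, gid, and
    // auxiliary-gid count) plus the XDR-padded hostname and any auxiliary gids.
    static int credentialsLength(String hostName, int numAuxGids) {
        return 20 + roundUpToXdrBoundary(hostName.getBytes().length) + 4 * numAuxGids;
    }

    public static void main(String[] args) {
        // A 33-character hostname pads to 36 bytes, which matches the
        // "str 36" that mountd expects in the error message quoted above.
        String host = "a-very-long-hostname-of-33-chars!";
        System.out.println(credentialsLength(host, 0)); // 56 = 20 + 36
    }
}
```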
