
Comments (7)

gmouchakis commented on June 27, 2024

I'm not sure I understand correctly what you are trying to do, but if you use LOAD DATA with a path from another container, it is expected to fail: the hive container and the other container do not share a filesystem.
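
As a sketch of the distinction (the table name here is illustrative): LOAD DATA LOCAL INPATH reads from the local filesystem of the container running HiveServer2, while LOAD DATA INPATH without LOCAL reads from HDFS, which is the only filesystem all the containers see in common.

```sql
-- Reads from the *local* filesystem of the hive-server container:
LOAD DATA LOCAL INPATH '/tmp/a.txt' OVERWRITE INTO TABLE example;

-- Reads from HDFS, which every container reaches through the namenode:
LOAD DATA INPATH '/tmp/a.txt' OVERWRITE INTO TABLE example;
```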

from docker-hive.

enanablancaynumeros commented on June 27, 2024

Hi @gmouchakis, that is right. What I am trying to do is push data into HDFS through the REST API of the HDFS system and then use it from Hive, much as I do in production with my company's cluster.
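
Pushing a file through the HDFS REST API (WebHDFS) is a two-step exchange: a CREATE request to the namenode answers with a 307 redirect naming a datanode, and the bytes are then PUT to that location. A minimal sketch with Python's standard library; the host name and file path are assumptions for illustration, and 50070 is the default namenode HTTP port on Hadoop 2.x (WebHDFS must be enabled):

```python
import urllib.error
import urllib.parse
import urllib.request

def webhdfs_create_url(host, port, path, user):
    """Build the namenode's WebHDFS CREATE URL (step 1 of the upload)."""
    query = urllib.parse.urlencode(
        {"op": "CREATE", "user.name": user, "overwrite": "true"})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

def webhdfs_put(host, port, path, user, data: bytes):
    """Two-step WebHDFS upload: the namenode answers the first PUT with a
    307 redirect to a datanode; the bytes are then PUT to that URL."""
    req = urllib.request.Request(
        webhdfs_create_url(host, port, path, user), method="PUT")
    try:
        urllib.request.urlopen(req)
        raise RuntimeError("expected a 307 redirect from the namenode")
    except urllib.error.HTTPError as err:
        # urllib does not auto-follow redirects for PUT, so the 307 surfaces here.
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    req = urllib.request.Request(datanode_url, data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status  # WebHDFS answers 201 Created on success
```

After the upload, a LOAD DATA INPATH (without LOCAL) on the same HDFS path should make the file visible to Hive, since both sides read the same filesystem.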

I also discovered that the /user/hive/warehouse path created by Hive does not appear on the HDFS filesystem defined by this docker-compose environment, which means the hive docker container is somehow pointing to a different HDFS, but I still can't find the root of the problem.

If I am right, Hive uses the namenode to find which datanode holds the data in the HDFS filesystem, but that does not seem to happen in this containerized environment. Everything should happen through HTTP interactions between the docker containers, shouldn't it?


earthquakesan commented on June 27, 2024

Hi @enanablancaynumeros! It was a mistake on my side: Hadoop inside the Hive container was not set up to work with the remote Hadoop. It is fixed now.

I've also updated hadoop to 2.8. If you still have this problem, feel free to reopen the issue.
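
The usual cause of that symptom is the Hive container's Hadoop configuration pointing fs.defaultFS at a local or embedded filesystem instead of the shared namenode, so Hive writes its warehouse somewhere the real HDFS never sees. A sketch of the relevant core-site.xml property (the host name matches the compose service shown below; port 8020 is the stock HDFS RPC port and is an assumption here):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode:8020</value>
</property>
```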


earthquakesan commented on June 27, 2024

I broke the jdbc connector now, give me a moment.


earthquakesan commented on June 27, 2024

First, start up the system as described in README.md.

Connect to hive:

➜  docker-hive git:(master) ✗ docker exec -it hive-server bash
root@d16fce776021:/opt# /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
17/04/24 14:52:53 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1 by Apache Hive
0: jdbc:hive2://localhost:10000> LOAD DATA LOCAL INPATH './../examples/files/kv1.txt' OVERWRITE INTO TABLE pks;
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:74 Table not found 'pks' (state=42S02,code=10001)
0: jdbc:hive2://localhost:10000> CREATE TABLE pks (foo INT, bar STRING);
No rows affected (0.171 seconds)
0: jdbc:hive2://localhost:10000> LOAD DATA LOCAL INPATH './../examples/files/kv1.txt' OVERWRITE INTO TABLE pks;
No rows affected (0.291 seconds)
0: jdbc:hive2://localhost:10000> SELECT * FROM pks;
+----------+----------+--+
| pks.foo  | pks.bar  |
+----------+----------+--+
| 238      | val_238  |
| 86       | val_86   |
| 311      | val_311  |
...

Connect to namenode and check that Hive is writing to the right HDFS:

➜  docker-hive git:(master) ✗ docker exec -it namenode bash
root@d15fdc2345a6:/# hadoop fs -ls /tmp/hive/root
Found 2 items
drwx------   - root supergroup          0 2017-04-24 14:46 /tmp/hive/root/50d7fa4d-e731-4a2b-8ed2-bd9b2c7dbfd7
drwx------   - root supergroup          0 2017-04-24 14:35 /tmp/hive/root/69eb874b-f76d-4446-acb9-babb80211e17

Now if you do the same thing from hive-server container you will see the same thing:

➜  docker-hive git:(master) ✗ docker exec -it hive-server bash
root@d16fce776021:/opt# hadoop fs -ls /tmp/hive/root
Found 2 items
drwx------   - root supergroup          0 2017-04-24 14:46 /tmp/hive/root/50d7fa4d-e731-4a2b-8ed2-bd9b2c7dbfd7
drwx------   - root supergroup          0 2017-04-24 14:35 /tmp/hive/root/69eb874b-f76d-4446-acb9-babb80211e17


enanablancaynumeros commented on June 27, 2024

Hi @earthquakesan, thanks for the answer! I got it working if I only use the new entrypoint script, but it does not work in other use cases you haven't described in your previous steps, when I try to use the new hadoop and jdbc versions.

For a reason I haven't had time to identify, it is possible to execute beeline inside the docker container, and it now points to the right HDFS; but if you try to hit port 10000 from outside or from other docker containers, the connection is refused, which affects plain curl calls and external jdbc drivers.
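
For port 10000 to be reachable from the host at all, the hive-server service has to publish it in docker-compose.yml, and other containers additionally need to share a network with hive-server. A sketch of the relevant compose fragment (the service name matches the container used above; the mapping itself is an assumption about this setup):

```yaml
hive-server:
  ports:
    - "10000:10000"   # publish HiveServer2's Thrift/JDBC port to the host
```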

Let me know if you cannot reproduce the problem. I ran docker-compose build --no-cache a couple of times after your changes and still got that error.


earthquakesan commented on June 27, 2024

@enanablancaynumeros I need a description of how you run the application, because I have no problems running the example app from here: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC

What I did to run it:

  1. Copy hadoop and hive from the docker container to my dev environment:
docker cp hive-server:/opt ./
➜  hive-example-app ls opt 
hadoop-2.8.0  hive
  2. Copy/paste code from the example to HiveJdbcClient.java and compile it:
javac HiveJdbcClient.java
  3. Download hadoop-core from mvn central:
# Copy it to opt/hadoop-2.8.0/
➜  hive-example-app ls opt/hadoop-2.8.0/hadoop-core-1.2.1.jar 
opt/hadoop-2.8.0/hadoop-core-1.2.1.jar
  4. Create a run.sh script (my current working dir is /home/ivan/Workspace/Apps/hive-example-app/):
#!/bin/bash
HADOOP_HOME=/home/ivan/Workspace/Apps/hive-example-app/opt/hadoop-2.8.0
HIVE_HOME=/home/ivan/Workspace/Apps/hive-example-app/opt/hive

echo -e '1\x01foo' > /tmp/a.txt
echo -e '2\x01bar' >> /tmp/a.txt

HADOOP_CORE=$(ls $HADOOP_HOME/hadoop-core*.jar)
CLASSPATH=.:$HIVE_HOME/conf:$(./opt/hadoop-2.8.0/bin/hadoop classpath)

for i in ${HIVE_HOME}/lib/*.jar ; do
    CLASSPATH=$CLASSPATH:$i
done

java -cp $CLASSPATH HiveJdbcClient
  5. Run run.sh:
chmod +x run.sh
./run.sh
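
The two echo -e lines in run.sh write rows separated by \x01 (Ctrl-A), which is Hive's default field delimiter for tables created without an explicit ROW FORMAT ... FIELDS TERMINATED BY clause. A minimal Python equivalent of those lines (same file path as the script):

```python
# \x01 (Ctrl-A) is Hive's default field delimiter when no ROW FORMAT is given.
rows = [(1, "foo"), (2, "bar")]

with open("/tmp/a.txt", "w") as f:
    for key, value in rows:
        f.write(f"{key}\x01{value}\n")
```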

It will fail to read a.txt, because the file is created on your host and not on the remote host. Therefore you first have to copy it there:

docker cp /tmp/a.txt hive-server:/tmp/

Otherwise, I can not see any problems with this sample application. Here is the output:

➜  hive-example-app ./run.sh 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ivan/Workspace/Apps/hive-example-app/opt/hadoop-2.8.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ivan/Workspace/Apps/hive-example-app/opt/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/04/26 10:55:53 INFO jdbc.Utils: Supplied authorities: localhost:10000
17/04/26 10:55:53 INFO jdbc.Utils: Resolved authority: localhost:10000
Running: show tables 'testHiveDriverTable'
testhivedrivertable
Running: describe testHiveDriverTable
key	int
value	string
Running: load data local inpath '/tmp/a.txt' into table testHiveDriverTable
Running: select * from testHiveDriverTable
1	foo
2	bar
Running: select count(1) from testHiveDriverTable
2

