
hiveka's Introduction

Hive Storage Handler for Kafka

HiveKa is a storage handler for Apache Hive that adds support for querying data from Apache Kafka. This lets Kafka users inspect data ingested by Kafka without writing complex Kafka consumers. Hive makes it possible to run complex analytical queries across various data sources, such as HDFS, Solr, and HBase. HiveKa extends this support to Kafka.

Visit our website: http://hiveka.weebly.com/

To create a Kafka table in Hive run:

create external table test_kafka (a int, b string)
stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
tblproperties('kafka.service.uri'='hivekafka-1.ent.cloudera.com:9092',
'kafka.whitelist.topics'='test4',
'kafka.avro.schema.file'='/tmp/test.avsc');
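
The table definition above points at an Avro schema file through 'kafka.avro.schema.file', but the file's contents are not shown here. As a rough sketch, a schema for the columns (a int, b string) could be generated as follows. The record name "TestRecord", the helper names, and the Hive-to-Avro type mapping are assumptions made for illustration, and whether HiveKa requires the schema fields to match the Hive columns exactly is not stated in this README.

```python
import json

# Assumed mapping from Hive column types to Avro primitive types.
# Only the types used in this page's examples are covered.
HIVE_TO_AVRO = {"int": "int", "bigint": "long", "string": "string"}


def build_schema(record_name, columns):
    """Build an Avro record schema (as a dict) from (name, hive_type) pairs."""
    fields = [{"name": name, "type": HIVE_TO_AVRO[hive_type]}
              for name, hive_type in columns]
    return {"type": "record", "name": record_name, "fields": fields}


def write_schema(path, schema):
    """Write the schema as JSON, e.g. to the path named in kafka.avro.schema.file."""
    with open(path, "w") as f:
        json.dump(schema, f, indent=2)


# Hypothetical record name; the columns come from the create table statement above.
test_schema = build_schema("TestRecord", [("a", "int"), ("b", "string")])
```

Calling write_schema("/tmp/test.avsc", test_schema) would then produce a file at the path used in the table properties.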

To generate Avro byte data into a topic, run our DemoProducer and pass the topic, the number of messages, and a Kafka broker as parameters. For example:

java -classpath "/opt/cloudera/parcels/CDH/lib/avro/*:hive-kafka-1.0-SNAPSHOT.jar:/usr/lib/hive/*:/usr/lib/hive/*" org.apache.hadoop.hive.kafka.demoproducer.DemoProducer test4 10 hivekafka-1:9092

hiveka's People

Contributors

gwenshap, szehon


hiveka's Issues

UnresolvedAddressException following the suggested examples

Hi,

I was trying out HiveKa by following the examples and ran into a strange issue in which the MapReduce job seems to be trying to connect to a null host. The stack trace below is from Hive. What else can I provide?

hive> select username,count(*) from tweets group by username;
Query ID = cloudera_20150727032323_7c0874f7-d063-4cd5-9ee1-af6cb43a6f9a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:127)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
        at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
        at kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
        at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:124)
        at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:79)
        at org.apache.hadoop.hive.kafka.KafkaInputFormat.fetchLatestOffsetAndCreateKafkaRequests(KafkaInputFormat.java:181)
        at org.apache.hadoop.hive.kafka.KafkaInputFormat.getSplits(KafkaInputFormat.java:332)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:361)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:428)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.nio.channels.UnresolvedAddressException(null)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
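
In Java NIO, UnresolvedAddressException is raised when a connection is attempted to a socket address whose hostname could not be resolved. The trace above shows it while KafkaInputFormat.getSplits fetches offsets through a Kafka SimpleConsumer, so one possible cause (not confirmed in this report) is that a broker hostname, either the one in 'kafka.service.uri' or one advertised by the broker, does not resolve on the machine running Hive. A minimal sketch of a check to run on that machine is shown below; the function names are hypothetical and the check only covers name resolution, not connectivity.

```python
import socket


def split_host_port(uri):
    """Split a 'host:port' string such as the value of kafka.service.uri."""
    host, sep, port = uri.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % uri)
    return host, int(port)


def can_resolve(uri):
    """Return True if the host part of 'host:port' resolves on this machine."""
    host, port = split_host_port(uri)
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False
```

For example, can_resolve("hivekafka-1:9092") returning False on the Hive client host would point toward a DNS or /etc/hosts problem rather than a problem in the query itself.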

create tweet table error.

Hello.
Thank you very much for your project.

I ran into a problem when creating the tweets table as described under 'How do we use HiveKa?' on your home page (http://hiveka.weebly.com/).

Below is the stack trace.

14/11/27 00:10:17 INFO ql.Driver:
14/11/27 00:10:17 INFO ql.Driver: </PERFLOG method=acquireReadWriteLocks start=1417014617045 end=1417014617046 duration=1>
14/11/27 00:10:17 INFO ql.Driver:
14/11/27 00:10:17 INFO ql.Driver: Starting command: create external table tweets (username string, text string, tstamp bigint)
stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
tblproperties('kafka.topic'='test',
'kafka.service.uri'='kafka01:9092',
'kafka.whitelist.topics'='tweet',
'kafka.avro.schema.file'='/tmp/tweet.avsc')
14/11/27 00:10:17 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1417014617025 end=1417014617047 duration=22>
14/11/27 00:10:17 INFO exec.DDLTask: Use StorageHandler-supplied org.apache.hadoop.hive.serde2.avro.AvroSerDe for table tweets
Failed with exception null
14/11/27 00:10:17 ERROR exec.Task: Failed with exception null
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3672)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:254)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

14/11/27 00:10:17 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

My Hive version is hive-service-0.10.0-cdh4.7.0 and my Kafka version is kafka-0.8.1.
Could you tell me what the problem is?
