Hive Storage Handler for Kafka

HiveKa is a storage handler for Apache Hive that lets Hive query data stored in Apache Kafka. It gives Kafka users a way to inspect data ingested by Kafka without writing complex Kafka consumers. Hive can already run complex analytical queries across various data sources, such as HDFS, Solr, and HBase; HiveKa extends this support to Kafka.

Visit our website.

To create a Kafka-backed table in Hive, run:

create external table test_kafka (a int, b string) stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler' tblproperties('kafka.service.uri'='hivekafka-1.ent.cloudera.com:9092', 'kafka.whitelist.topics'='test4', 'kafka.avro.schema.file'='/tmp/test.avsc');

To generate Avro-encoded data in a topic, run our DemoProducer, passing the topic, the number of messages, and a Kafka broker as arguments. For example:

java -classpath "/opt/cloudera/parcels/CDH/lib/avro/*:hive-kafka-1.0-SNAPSHOT.jar:/usr/lib/hive/*" org.apache.hadoop.hive.kafka.demoproducer.DemoProducer test4 10 hivekafka-1:9092
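For reference, here is what "Avro-encoded data" could look like for the test_kafka table above. This is a minimal sketch, not the DemoProducer's actual code: it assumes each Kafka message is the plain Avro binary encoding of one record with fields a (int) and b (string), with no container-file header or other prefix, and the class name AvroRecordEncoder is hypothetical. Per the Avro specification, an int is zig-zag encoded and written as a variable-length integer, a string is its UTF-8 byte length (encoded the same way, as a long) followed by those bytes, and a record is its fields concatenated in schema order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HiveKa. Hand-rolled Avro binary encoding for
// an assumed schema in /tmp/test.avsc (the record name is also hypothetical):
// {"type": "record", "name": "TestRecord",
//  "fields": [{"name": "a", "type": "int"}, {"name": "b", "type": "string"}]}
final class AvroRecordEncoder {

    // Zig-zag maps signed to unsigned so small negatives stay short:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Writes the value 7 bits at a time, lowest bits first; the high bit of
    // each byte is set when more bytes follow.
    static void writeVarLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes one record: field "a" then field "b", in schema order.
    static byte[] encode(int a, String b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Field "a": Avro int, zig-zag + varint.
        writeVarLong(out, zigZag(a));
        // Field "b": Avro string, byte length as a zig-zag varint long, then UTF-8 bytes.
        byte[] utf8 = b.getBytes(StandardCharsets.UTF_8);
        writeVarLong(out, zigZag(utf8.length));
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Print the encoded record (a = 1, b = "hi") as hex bytes.
        for (byte x : encode(1, "hi")) {
            System.out.printf("%02x ", x);
        }
        System.out.println();
    }
}
```

Running it prints 02 04 68 69: 0x02 is zig-zag(1), 0x04 is the zig-zag-encoded string length 2, and 0x68 0x69 are the UTF-8 bytes of "hi". In real code you would normally use Avro's own GenericDatumWriter instead of hand-rolling the encoding; the sketch only makes the byte layout visible.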

UnresolvedAddressException following the suggested examples

Hi,

I was trying out HiveKa by following the examples and ran into a strange issue in which the MapReduce job seems to be trying to connect to a null host. The stack trace below is from Hive. What other information can I provide?

hive> select username,count(*) from tweets group by username;
Query ID = cloudera_20150727032323_7c0874f7-d063-4cd5-9ee1-af6cb43a6f9a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:127)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
        at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
        at kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
        at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:124)
        at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:79)
        at org.apache.hadoop.hive.kafka.KafkaInputFormat.fetchLatestOffsetAndCreateKafkaRequests(KafkaInputFormat.java:181)
        at org.apache.hadoop.hive.kafka.KafkaInputFormat.getSplits(KafkaInputFormat.java:332)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:361)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:428)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.nio.channels.UnresolvedAddressException(null)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
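
A note on reading the trace above: SocketChannel.connect throws UnresolvedAddressException when it is handed an InetSocketAddress whose host name was never resolved, which is the sun.nio.ch.Net.checkAddress frame at the top of the trace. The exception carries no detail message, so the "(null)" in the final Hive error is most likely that empty message rather than a host literally named null. A plausible cause (an assumption, not confirmed in this thread) is that the broker host name the client ends up connecting to, for example a name a broker advertises in topic metadata, does not resolve on the machine running the Hive CLI. The minimal Java sketch below checks whether a host:port string, such as the kafka.service.uri value, resolves from the current machine, and reproduces the same exception without any network traffic. The class and method names are hypothetical and not part of HiveKa.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical diagnostic helper, not part of HiveKa.
final class BrokerAddressCheck {

    // Splits "host:port" (assumed to name a single broker). The
    // InetSocketAddress(String, int) constructor attempts a DNS lookup and
    // leaves the address marked unresolved if the lookup fails.
    static InetSocketAddress parse(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got: " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    // True if the host name in hostPort resolves from this machine.
    static boolean resolves(String hostPort) {
        return !parse(hostPort).isUnresolved();
    }

    // Reproduces the failure mode in the trace: NIO rejects an unresolved
    // address with UnresolvedAddressException before any connection attempt.
    static boolean connectRejectsUnresolved(String host, int port) {
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved(host, port);
        try (SocketChannel channel = SocketChannel.open()) {
            channel.connect(unresolved);
            return false; // not expected: createUnresolved never resolves
        } catch (UnresolvedAddressException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String hostPort : args) {
            System.out.println(hostPort + (resolves(hostPort) ? " resolves" : " does NOT resolve"));
        }
        System.out.println("unresolved connect rejected: "
                + connectRejectsUnresolved("broker.example.invalid", 9092));
    }
}
```

Run it on the Hive client host with the broker addresses used in the examples, for instance java BrokerAddressCheck hivekafka-1.ent.cloudera.com:9092 hivekafka-1:9092. If a name does not resolve there, adding it to DNS or /etc/hosts on that machine would be a reasonable first thing to try.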

Error when creating the tweet table

Below is the stack trace.

14/11/27 00:10:17 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
