Comments (10)
@xianzhen 0.9 is the version of the broker, but which version of the connector are you using? It looks like the assignment and set of TopicPartitionWriters somehow got out of sync, which should not be possible.
from kafka-connect-hdfs.
@ewencp Thanks. The Confluent version is 2.0.0. I checked the source; one possible reason is that WorkerSinkTask clears all TopicPartitionWriters before ConsumerSinkTask is closed. I am not sure.
@xianzhen This has been fixed as of the 3.0.0 release. I think the issue was that onPartitionsRevoked was removing the TopicPartitionWriters, but then close() was trying to use them. The newer version relies on the fact that the framework guarantees it will revoke the partitions before finally stopping the task.
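The race described above can be sketched in miniature. The class and method names below are simplified stand-ins for illustration, not the connector's actual code; the point is only that close() must tolerate writers that a rebalance callback has already removed:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-partition writer.
class TopicPartitionWriterSketch {
    boolean closed = false;
    void close() { closed = true; }
}

// Hypothetical stand-in for the sink task holding the writers.
class SinkTaskSketch {
    final Map<String, TopicPartitionWriterSketch> writers = new HashMap<>();

    void open(String partition) {
        writers.put(partition, new TopicPartitionWriterSketch());
    }

    // Pre-fix behavior: the rebalance callback closed and cleared the map...
    void onPartitionsRevoked() {
        writers.values().forEach(TopicPartitionWriterSketch::close);
        writers.clear();
    }

    // ...so a later close() must not assume a writer still exists for every
    // partition. The 3.0.0 fix relies on revoke-before-stop ordering; this
    // guard makes the same sequence safe either way.
    void close(String partition) {
        TopicPartitionWriterSketch w = writers.remove(partition);
        if (w != null) {
            w.close();
        }
    }
}
```

With the null check in place, revoking the partitions and then closing them again is a no-op instead of a failure.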
I see the same error popping up again, and we are using Confluent version 3.0.0!
This is mainly noticed in our staging setup, where we have 3 Kafka Connect instances. If I stop all 3 instances and then start them one by one, this error comes up. When one is running and I start the second, the first dies; then I start the first and the second dies. Eventually I go through that cycle a couple of times to make it stable.
I also think it happens at a particular stage: if I start all three together, this happens, but when I start them one by one really quickly, at times it is not noticed!
[2016-07-07 13:22:13,242] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:183)
java.lang.NullPointerException
at org.apache.kafka.connect.runtime.WorkerSinkTask.stop(WorkerSinkTask.java:119)
at org.apache.kafka.connect.runtime.Worker.stopTask(Worker.java:397)
at org.apache.kafka.connect.runtime.Worker.stopTasks(Worker.java:373)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$RebalanceListener.onRevoked(DistributedHerder.java:1064)
at org.apache.kafka.connect.runtime.distributed.WorkerCoordinator.onJoinPrepare(WorkerCoordinator.java:237)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:212)
at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.poll(WorkerGroupMember.java:147)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:286)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:176)
at java.lang.Thread.run(Thread.java:745)
[2016-07-07 13:22:13,247] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:68)
@raju-divakaran Are there any other error messages or stack traces earlier in the log? It looks like the null pointer is due to the consumer not being allocated yet. We protect against this in WorkerSinkTask.close() but not in WorkerSinkTask.stop(). As far as I can tell, this shouldn't be a problem: because of the order in which we create and initialize the WorkerSinkTask (the latter half of which creates the consumer) and only then add it to the collection of tasks, any call to Worker.stopTasks shouldn't see the task until the consumer is already allocated. The only path I can see where this wouldn't happen is an exception during WorkerSinkTask.initialize, in which case there should be an error message like "Task {} failed initialization and will not be started." and a corresponding stack trace.
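The missing guard can be sketched as follows. This is a simplified illustration with hypothetical names, not the real WorkerSinkTask; it just shows why stop() needs the same null check that close() already has when it can run before initialization finished:

```java
// Hypothetical sketch of the initialization/stop window described above.
class WorkerSinkTaskSketch {
    private Object consumer;   // stands in for the KafkaConsumer
    boolean stopped = false;

    // The latter half of initialization creates the consumer.
    void initialize() {
        consumer = new Object();
    }

    // Without the null check, calling stop() before initialize() completes
    // dereferences a null consumer, matching the NPE in the stack trace above.
    void stop() {
        if (consumer != null) {
            // consumer.wakeup() would go here in the real task
        }
        stopped = true;
    }
}
```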
I have a similar issue. Confluent version is 2.0.1.
- Start the worker in distributed mode, interactively (not as a daemon)
- Add a connector with this configuration:
  {
    "name": "hdfs-test-ign",
    "config": {
      "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
      "tasks.max": "6",
      "topics": "test_ign",
      "hdfs.url": "hdfs://hdfsd:8020",
      "hadoop.conf.dir": "/confluent/hadoop/conf",
      "flush.size": "100",
      "partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
      "locale": "us",
      "partition.duration.ms": "86400000",
      "path.format": "YYYY-MM-dd",
      "timezone": "US/Eastern",
      "rotate.interval.ms": "60000"
    }
  }
- Send some messages
- Stop the worker (Ctrl+C)
- Start the worker again
After some restarts I get the error, and the only way to make it work again is to delete the config and offset topics.
The worker config and the output log are attached:
connect-avro-distributed.properties.txt
distributed_consumer_fail_log.txt
I found out what causes this error. It turns out that the config topic had more than one partition. Once I recreated it with a single partition, the problem disappeared.
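For reference, the internal config topic can be created up front so the broker never auto-creates it with the wrong partition count. The topic name below is an assumption: use whatever config.storage.topic is set to in your worker properties ("connect-configs" is just a common choice), and these commands require a running broker:

```shell
# The Connect config topic must have exactly ONE partition, so every worker
# reads connector configs in the same total order.
# (Broker versions from this era use --zookeeper; newer ones use --bootstrap-server.)
kafka-topics --create \
  --zookeeper localhost:2181 \
  --topic connect-configs \
  --partitions 1 \
  --replication-factor 1

# Verify the partition count:
kafka-topics --describe --zookeeper localhost:2181 --topic connect-configs
```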
@Tseretyan Thanks for sharing. I believe your information helped with some trouble I had using connect-standalone.
I am getting the same error on Kafka Connect start, and I have created the config topic with a single partition.
I checked ZooKeeper and Kafka; I can produce and consume messages on the "test" topic.
My setup is a single server running ZooKeeper, Kafka, Schema Registry, and Kafka Connect.
I get the following exception in the log:
2016-10-20 04:42:15,915 - ERROR DistributedHerder - Uncaught exception in herder work thread, exiting:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2016-10-20 04:42:15,916 - INFO Thread-1 - Kafka Connect stopping
2016-10-20 04:42:15,916 - INFO Thread-1 - Stopping REST server
2016-10-20 04:42:15,917 - DEBUG Thread-2 - stopping org.eclipse.jetty.server.Server@347d46d4
@tnegi7519 Your issue looks a bit different from the one reported here. It looks more like your worker is not able to fetch topic metadata, which is different from a NullPointerException caused by TopicPartitionWriters getting out of sync. If you are still having trouble with a timeout fetching topic metadata, please open a new issue with more context (the surrounding log lines and the worker/connector configs) and we'll see if we can help with that issue.