Comments (10)
@xianzhen 0.9 is the version of the broker, but which version of the connector are you using? It looks like the assignment and set of TopicPartitionWriters somehow got out of sync, which should not be possible.
from kafka-connect-hdfs.
@ewencp Thanks. The Confluent version is 2.0.0. I checked the source; one possible reason is that WorkerSinkTask clears all TopicPartitionWriters before ConsumerSinkTask is closed. I am not sure.
@xianzhen This has been fixed as of the 3.0.0 release. I think the issue was that onPartitionsRevoked was removing the TopicPartitionWriters, but then close() was trying to use them. The newer version relies on the fact that the framework guarantees it will revoke the partitions before finally stopping the task.
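The race described above can be sketched in miniature. The class and method names below are simplified stand-ins for illustration, not the connector's actual code; the point is only that close() must tolerate writers that a rebalance callback has already removed:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-partition writer.
class TopicPartitionWriterSketch {
    boolean closed = false;
    void close() { closed = true; }
}

// Hypothetical stand-in for the sink task holding the writers.
class SinkTaskSketch {
    final Map<String, TopicPartitionWriterSketch> writers = new HashMap<>();

    void open(String partition) {
        writers.put(partition, new TopicPartitionWriterSketch());
    }

    // Pre-fix behavior: the rebalance callback closed and cleared the map...
    void onPartitionsRevoked() {
        writers.values().forEach(TopicPartitionWriterSketch::close);
        writers.clear();
    }

    // ...so a later close() must not assume a writer still exists for every
    // partition. The 3.0.0 fix relies on revoke-before-stop ordering; this
    // guard makes the same sequence safe either way.
    void close(String partition) {
        TopicPartitionWriterSketch w = writers.remove(partition);
        if (w != null) {
            w.close();
        }
    }
}
```

With the null check in place, revoking the partitions and then closing them again is a no-op instead of a failure.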
I see the same error popping up again, and we are using Confluent version 3.0.0!
This is mainly noticed in our staging setup, where we have 3 Kafka Connect instances. If I stop all 3 instances and then start them one by one, this error comes up. When one is running and I start the second, the first dies; then I start the first and the second dies. Eventually I go through that cycle a couple of times to make it stable.
I also think it happens at a particular stage: if I start all three together, this happens, but when I start them one by one really quickly, at times it is not noticed!
[2016-07-07 13:22:13,242] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:183)
java.lang.NullPointerException
at org.apache.kafka.connect.runtime.WorkerSinkTask.stop(WorkerSinkTask.java:119)
at org.apache.kafka.connect.runtime.Worker.stopTask(Worker.java:397)
at org.apache.kafka.connect.runtime.Worker.stopTasks(Worker.java:373)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$RebalanceListener.onRevoked(DistributedHerder.java:1064)
at org.apache.kafka.connect.runtime.distributed.WorkerCoordinator.onJoinPrepare(WorkerCoordinator.java:237)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:212)
at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.poll(WorkerGroupMember.java:147)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:286)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:176)
at java.lang.Thread.run(Thread.java:745)
[2016-07-07 13:22:13,247] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:68)
@raju-divakaran Are there any other error messages or stack traces earlier in the log? It looks like the null pointer is due to the consumer not being allocated yet. We protect against this in WorkerSinkTask.close() but not in WorkerSinkTask.stop(). As far as I can tell, this shouldn't be a problem: because of the order in which we create and initialize the WorkerSinkTask (the latter half of which creates the consumer) and only then add it to the collection of tasks, any call to Worker.stopTasks shouldn't see the task until the consumer is already allocated. The only path I can see where this wouldn't happen is an exception during WorkerSinkTask.initialize, in which case there should be an error message like "Task {} failed initialization and will not be started." and a corresponding stack trace.
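The missing guard can be sketched as follows. This is a simplified illustration with hypothetical names, not the real WorkerSinkTask; it just shows why stop() needs the same null check that close() already has when it can run before initialization finished:

```java
// Hypothetical sketch of the initialization/stop window described above.
class WorkerSinkTaskSketch {
    private Object consumer;   // stands in for the KafkaConsumer
    boolean stopped = false;

    // The latter half of initialization creates the consumer.
    void initialize() {
        consumer = new Object();
    }

    // Without the null check, calling stop() before initialize() completes
    // dereferences a null consumer, matching the NPE in the stack trace above.
    void stop() {
        if (consumer != null) {
            // consumer.wakeup() would go here in the real task
        }
        stopped = true;
    }
}
```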
I have a similar issue. Confluent version is 2.0.1.
- Start the worker in distributed mode, interactively (not as a daemon)
- Add a connector with this configuration:
  {
    "name": "hdfs-test-ign",
    "config": {
      "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
      "tasks.max": "6",
      "topics": "test_ign",
      "hdfs.url": "hdfs://hdfsd:8020",
      "hadoop.conf.dir": "/confluent/hadoop/conf",
      "flush.size": "100",
      "partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
      "locale": "us",
      "partition.duration.ms": "86400000",
      "path.format": "YYYY-MM-dd",
      "timezone": "US/Eastern",
      "rotate.interval.ms": "60000"
    }
  }
- Send some messages
- Stop the worker (Ctrl+C)
- Start the worker again
After some restarts I get the error, and the only way to make it work again is to delete the config and offset topics.
The worker config and the output log are attached:
connect-avro-distributed.properties.txt
distributed_consumer_fail_log.txt
I found out what causes this error. It turns out that the config topic had more than one partition. Once I recreated it with a single partition, the problem disappeared.
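For reference, the internal config topic can be created up front so the broker never auto-creates it with the wrong partition count. The topic name below is an assumption: use whatever config.storage.topic is set to in your worker properties ("connect-configs" is just a common choice), and these commands require a running broker:

```shell
# The Connect config topic must have exactly ONE partition, so every worker
# reads connector configs in the same total order.
# (Broker versions from this era use --zookeeper; newer ones use --bootstrap-server.)
kafka-topics --create \
  --zookeeper localhost:2181 \
  --topic connect-configs \
  --partitions 1 \
  --replication-factor 1

# Verify the partition count:
kafka-topics --describe --zookeeper localhost:2181 --topic connect-configs
```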
@Tseretyan Thanks for sharing. I believe your information helped with some trouble I had using connect-standalone.
I am getting the same error on Kafka Connect start, and I have created the config topic with a single partition.
I checked ZooKeeper and Kafka; I can produce and consume messages on the "test" topic.
My setup is a single server running ZooKeeper, Kafka, Schema Registry, and Kafka Connect.
I get the following exception in the log:
2016-10-20 04:42:15,915 - ERROR DistributedHerder - Uncaught exception in herder work thread, exiting:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2016-10-20 04:42:15,916 - INFO Thread-1 - Kafka Connect stopping
2016-10-20 04:42:15,916 - INFO Thread-1 - Stopping REST server
2016-10-20 04:42:15,917 - DEBUG Thread-2 - stopping org.eclipse.jetty.server.Server@347d46d4
@tnegi7519 Your issue looks a bit different from the one reported here. It looks more like your worker is not able to fetch topic metadata, which is different from a NullPointerException caused by TopicPartitionWriters getting out of sync. If you are still having trouble with a timeout fetching topic metadata, please open a new issue with more context (the surrounding log lines and the worker/connector configs) and we'll see if we can help with that issue.