Comments (7)

sundy-li commented on June 20, 2024

This is a good idea.

Automatically double the system's sleep time until it succeeds

We can't keep doubling it: the longer the recovery takes, the larger the doubled interval becomes, and once the cluster is OK again the sinker's write latency could be very high.

Suggest LoopWrite and dropping the retryTimes setting

The first version wrote to ClickHouse in an endless loop until it succeeded. But database failures are unavoidable, and with an endless write loop an unrecoverable failure forces a kill -9, so data is lost anyway. Besides, raising retryTimes to a large value is effectively the same as loop writing and should meet your current needs.

Since the sinker was open-sourced, its business-specific handling has not been polished very well; I suggest users adapt the base code to their own needs.

Our internal version has the following features:

  1. clickhouse_manager handles task configuration and task dispatch
  2. sinker periodically fetches task configurations from clickhouse_manager and manages the task lifecycle (start, stop)
  3. Kafka-to-ClickHouse consumption with no loss and no duplication (after a successful flush, manually commit the batch's largest offset to Kafka)
  4. Support for additional parsers, such as the pb protocol and internal reporting protocols
  5. Monitoring via exporters

If parts of this code turn out not to be tightly coupled to our business, we may continue to open-source them.

jsding commented on June 20, 2024

Looking forward to these being open-sourced:

  1. When the Kafka or ClickHouse connection is down, the system enters a sleep state and the sleep time doubles, with a configurable cap (e.g. 1 hour). After at most 1 hour the system wakes up and checks whether things are back to normal; if they are, it resumes working, otherwise it keeps sleeping.

  2. Kafka-to-ClickHouse consumption with no loss and no duplication (after a successful flush, manually commit the batch's largest offset to Kafka)

If 1 is implemented, you basically no longer need to watch this service once clickhouse_sinker is started: the clickhouse_sinker process can stay up on a machine permanently, and you don't have to worry about a flood of errors and retries while Kafka or ClickHouse is under maintenance. A rough sketch of such a capped backoff follows.
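
A minimal sketch, in Go (clickhouse_sinker's language), of the capped exponential backoff described in point 1. The waitUntilHealthy and checkHealth names are hypothetical illustrations, not part of clickhouse_sinker:

```go
package backoff

import (
	"log"
	"time"
)

// waitUntilHealthy blocks until checkHealth reports success, sleeping between
// attempts with an exponentially growing interval capped at maxSleep
// (e.g. one hour). Hypothetical sketch, not clickhouse_sinker's actual code.
func waitUntilHealthy(checkHealth func() error, maxSleep time.Duration) {
	sleep := time.Second
	for {
		err := checkHealth()
		if err == nil {
			return // Kafka/ClickHouse reachable again, resume normal work
		}
		log.Printf("dependencies unavailable (%v), sleeping %v", err, sleep)
		time.Sleep(sleep)
		sleep *= 2 // double the interval after each failed check ...
		if sleep > maxSleep {
			sleep = maxSleep // ... but never exceed the configured cap
		}
	}
}
```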

ns-gzhang commented on June 20, 2024

Kafka-to-ClickHouse consumption with no loss and no duplication (after a successful flush, manually commit the batch's largest offset to Kafka)

I'm curious how this can achieve exactly-once ingestion if there are multiple Kafka partitions, since a consumer in a group may get messages from multiple partitions, or even from changing partitions when Kafka rebalancing happens. Keeping track of batch high offsets would work for a single partition in a batch in case of clickhouse_sinker crashes. But if the ClickHouse server crashes before you get a positive response to the last insert, you won't be able to tell if the last batch is successful or not, right? To deal with that using ClickHouse's batch idempotency (exactly identical batches are deduplicated), we would need to resend exactly the same batches for any unacknowledged batches after a ClickHouse server crash, which means we need to keep track of each batch's low and high offsets (for a single partition, or for every partition involved in the batch). Right? And we cannot use a consumer group when rebalancing can happen? Thanks in advance for sharing your insights on this.

sundy-li commented on June 20, 2024

But if the ClickHouse server crashes before you get a positive response to the last insert, you won't be able to tell if the last batch is successful or not, right?

Yes, so we should ensure each insert succeeds; we use LoopWrite to retry failed inserts, and users can set the retry count to a large number or send alarm messages.
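
A rough sketch of what such a bounded retry could look like in Go; loopWrite, writeBatch, and sendAlarm are hypothetical names used for illustration, not clickhouse_sinker's actual API:

```go
package sink

import "time"

// loopWrite retries a failed batch insert up to retryTimes attempts and raises
// an alarm once every attempt has failed. Hypothetical sketch only.
func loopWrite(writeBatch func() error, retryTimes int, sendAlarm func(error)) error {
	var err error
	for i := 0; i < retryTimes; i++ {
		if err = writeBatch(); err == nil {
			return nil // insert succeeded; the caller may now commit offsets
		}
		time.Sleep(time.Second) // brief pause before the next attempt
	}
	sendAlarm(err) // retries exhausted; notify an operator instead of looping forever
	return err
}
```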

Keeping track of batch high offsets would work for a single partition in a batch in case of clickhouse_sinker crashes

For each batch insert we keep track of the largest offset of every partition involved; when the batch insert succeeds, we commit those partition offsets.
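
A sketch of that idea under simplified assumptions: Message, insert, and commit are illustrative placeholders rather than clickhouse_sinker's real types, and rebalancing is ignored:

```go
package sink

// Message is a minimal stand-in for a consumed Kafka record.
type Message struct {
	Partition int32
	Offset    int64
	Value     []byte
}

// flushBatch writes the whole batch to ClickHouse via insert and, only after
// that succeeds, commits the highest offset seen for each partition involved.
// If the insert fails nothing is committed, so the messages are re-consumed.
func flushBatch(batch []Message, insert func([]Message) error,
	commit func(partition int32, offset int64) error) error {
	// Track the largest offset per partition in this batch.
	maxOffsets := make(map[int32]int64)
	for _, m := range batch {
		if cur, ok := maxOffsets[m.Partition]; !ok || m.Offset > cur {
			maxOffsets[m.Partition] = m.Offset
		}
	}
	if err := insert(batch); err != nil {
		return err // nothing committed; the batch will be retried or re-consumed
	}
	for p, off := range maxOffsets {
		if err := commit(p, off); err != nil {
			return err
		}
	}
	return nil
}
```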

ns-gzhang commented on June 20, 2024

Thanks Sundy for sharing more insights.

Yes, so we should ensure each insert succeeds; we use LoopWrite to retry failed inserts, and users can set the retry count to a large number or send alarm messages.

So you are saying LoopWrite should never give up until it succeeds. What if the sinker or the server/pod it runs on also crashes during retry?

For each batch insert we keep track of the largest offset of every partition involved; when the batch insert succeeds, we commit those partition offsets.

That works only if you never need to fetch from Kafka again for a pending batch (i.e. batch insertions are always successful - the assumption above), right? If I ever need to re-assemble a batch, I have to be able to control the mix of data from all the partitions involved to generate exactly the same batch (to deal with the imaginary crash case above).

sundy-li commented on June 20, 2024

What if the sinker or the server/pod it runs on also crashes during retry?

If it crashes, the offsets will not be committed either, so it's OK.
Messages may still be duplicated, though: if the sinker crashes right after a successful insert into ClickHouse but before committing the offsets, those messages will be consumed and inserted again when the sinker restarts.

That works only if you never need to fetch from Kafka again for a pending batch (i.e. batch insertions are always successful - the assumption above), right?

Yes

ns-gzhang commented on June 20, 2024

Thanks again. That's what I'd like to confirm.
