Comments (7)

sundy-li commented on June 20, 2024

This is a good idea.

Automatically double the system's sleep time until it succeeds

We can't keep doubling it: the longer the recovery takes, the larger the doubled interval becomes, and once the cluster is OK again the sinker's write latency could be very high.

Suggest LoopWrite and dropping the retryTimes setting

The first version wrote to ClickHouse in an endless loop until it succeeded. But database failures are unavoidable, and with an endless write loop an unrecoverable failure forces a kill -9, so data is lost anyway. Besides, raising retryTimes to a large value is effectively the same as loop writing and should meet your current needs.

Since the sinker was open-sourced, its business-specific handling has not been polished very well; I suggest users adapt the base code to their own needs.

Our internal version has the following features:

  1. clickhouse_manager handles task configuration and task dispatch
  2. sinker periodically fetches task configurations from clickhouse_manager and manages the task lifecycle (start, stop)
  3. Kafka-to-ClickHouse consumption with no loss and no duplication (after a successful flush, manually commit the batch's largest offset to Kafka)
  4. Support for additional parsers, such as the pb protocol and internal reporting protocols
  5. Monitoring via exporters

If parts of this code turn out not to be tightly coupled to our business, we may continue to open-source them.

jsding commented on June 20, 2024

Looking forward to these being open-sourced:

  1. When the Kafka or ClickHouse connection is down, the system enters a sleep state and the sleep time doubles, with a configurable cap (e.g. 1 hour). After at most 1 hour the system wakes up and checks whether things are back to normal; if they are, it resumes working, otherwise it keeps sleeping.

  2. Kafka-to-ClickHouse consumption with no loss and no duplication (after a successful flush, manually commit the batch's largest offset to Kafka)

If 1 is implemented, you basically no longer need to watch this service once clickhouse_sinker is started: the clickhouse_sinker process can stay up on a machine permanently, and you don't have to worry about a flood of errors and retries while Kafka or ClickHouse is under maintenance. A rough sketch of such a capped backoff follows.
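
A minimal sketch, in Go (clickhouse_sinker's language), of the capped exponential backoff described in point 1. The waitUntilHealthy and checkHealth names are hypothetical illustrations, not part of clickhouse_sinker:

```go
package backoff

import (
	"log"
	"time"
)

// waitUntilHealthy blocks until checkHealth reports success, sleeping between
// attempts with an exponentially growing interval capped at maxSleep
// (e.g. one hour). Hypothetical sketch, not clickhouse_sinker's actual code.
func waitUntilHealthy(checkHealth func() error, maxSleep time.Duration) {
	sleep := time.Second
	for {
		err := checkHealth()
		if err == nil {
			return // Kafka/ClickHouse reachable again, resume normal work
		}
		log.Printf("dependencies unavailable (%v), sleeping %v", err, sleep)
		time.Sleep(sleep)
		sleep *= 2 // double the interval after each failed check ...
		if sleep > maxSleep {
			sleep = maxSleep // ... but never exceed the configured cap
		}
	}
}
```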

ns-gzhang commented on June 20, 2024

Kafka-to-ClickHouse consumption with no loss and no duplication (after a successful flush, manually commit the batch's largest offset to Kafka)

I'm curious how this can achieve exactly-once ingestion if there are multiple Kafka partitions, since a consumer in a group may get messages from multiple partitions, or even from changing partitions when Kafka rebalancing happens. Keeping track of batch high offsets would work for a single partition in a batch in case of clickhouse_sinker crashes. But if the ClickHouse server crashes before you get a positive response to the last insert, you won't be able to tell if the last batch is successful or not, right? To deal with that using ClickHouse's batch idempotency (exactly identical batches are deduplicated), we would need to resend exactly the same batches for any unacknowledged batches after a ClickHouse server crash, which means we need to keep track of each batch's low and high offsets (for a single partition, or for every partition involved in the batch). Right? And we cannot use a consumer group when rebalancing can happen? Thanks in advance for sharing your insights on this.

sundy-li commented on June 20, 2024

But if the ClickHouse server crashes before you get a positive response to the last insert, you won't be able to tell if the last batch is successful or not, right?

Yes, so we should ensure each insert succeeds; we use LoopWrite to retry failed inserts, and users can set the retry count to a large number or send alarm messages.
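
A rough sketch of what such a bounded retry could look like in Go; loopWrite, writeBatch, and sendAlarm are hypothetical names used for illustration, not clickhouse_sinker's actual API:

```go
package sink

import "time"

// loopWrite retries a failed batch insert up to retryTimes attempts and raises
// an alarm once every attempt has failed. Hypothetical sketch only.
func loopWrite(writeBatch func() error, retryTimes int, sendAlarm func(error)) error {
	var err error
	for i := 0; i < retryTimes; i++ {
		if err = writeBatch(); err == nil {
			return nil // insert succeeded; the caller may now commit offsets
		}
		time.Sleep(time.Second) // brief pause before the next attempt
	}
	sendAlarm(err) // retries exhausted; notify an operator instead of looping forever
	return err
}
```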

Keeping track of batch high offsets would work for a single partition in a batch in case of clickhouse_sinker crashes

For each batch insert we keep track of the largest offset of every partition involved; when the batch insert succeeds, we commit those partition offsets.
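
A sketch of that idea under simplified assumptions: Message, insert, and commit are illustrative placeholders rather than clickhouse_sinker's real types, and rebalancing is ignored:

```go
package sink

// Message is a minimal stand-in for a consumed Kafka record.
type Message struct {
	Partition int32
	Offset    int64
	Value     []byte
}

// flushBatch writes the whole batch to ClickHouse via insert and, only after
// that succeeds, commits the highest offset seen for each partition involved.
// If the insert fails nothing is committed, so the messages are re-consumed.
func flushBatch(batch []Message, insert func([]Message) error,
	commit func(partition int32, offset int64) error) error {
	// Track the largest offset per partition in this batch.
	maxOffsets := make(map[int32]int64)
	for _, m := range batch {
		if cur, ok := maxOffsets[m.Partition]; !ok || m.Offset > cur {
			maxOffsets[m.Partition] = m.Offset
		}
	}
	if err := insert(batch); err != nil {
		return err // nothing committed; the batch will be retried or re-consumed
	}
	for p, off := range maxOffsets {
		if err := commit(p, off); err != nil {
			return err
		}
	}
	return nil
}
```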

ns-gzhang commented on June 20, 2024

Thanks Sundy for sharing more insights.

Yes, so we should ensure each insert succeeds; we use LoopWrite to retry failed inserts, and users can set the retry count to a large number or send alarm messages.

So you are saying LoopWrite should never give up until it succeeds. What if the sinker or the server/pod it runs on also crashes during retry?

For each batch insert we keep track of the largest offset of every partition involved; when the batch insert succeeds, we commit those partition offsets.

That works only if you never need to fetch from Kafka again for a pending batch (i.e. batch insertions are always successful - the assumption above), right? If I ever need to re-assemble a batch, I have to be able to control the mix of data from all the partitions involved to generate exactly the same batch (to deal with the imaginary crash case above).

sundy-li commented on June 20, 2024

What if the sinker or the server/pod it runs on also crashes during retry?

If it crashes, the offsets will not be committed either, so it's OK.
Messages may still be duplicated, though: if the sinker crashes right after a successful insert into ClickHouse but before committing the offsets, those messages will be consumed and inserted again when the sinker restarts.

That works only if you never need to fetch from Kafka again for a pending batch (i.e. batch insertions are always successful - the assumption above), right?

Yes

ns-gzhang commented on June 20, 2024

Thanks again. That's what I'd like to confirm.
