Comments (4)
A possible solution is to change the code in https://github.com/vmware/vmware-go-kcl/blob/master/clientlibrary/worker/shard-consumer.go#L256 by adding input.Checkpointer.Checkpoint(nil):

    if getResp.NextShardIterator == nil {
        log.Infof("Shard %s closed", shard.ID)
        if err := input.Checkpointer.Checkpoint(nil); err != nil {
            return err
        }
        shutdownInput := &kcl.ShutdownInput{ShutdownReason: kcl.TERMINATE, Checkpointer: recordCheckpointer}
        sc.recordProcessor.Shutdown(shutdownInput)
        return nil
    }

The problem with this solution is that if ProcessRecords fails somehow, the checkpoint is set to SHARD_END regardless of that failure. I think ProcessRecords would be the right place to set the SHARD_END checkpoint (but it lacks the information needed to do so); alternatively, its return value must be checked (returning an error) before checking whether the shard is closed. If SHARD_END is written to the checkpointer but ProcessRecords failed, we are probably going to lose data.
from vmware-go-kcl.
The library has been well tested for the resharding scenario; it does not require a restart.
Also, make sure to checkpoint during shutdown. See this example:
https://github.com/vmware/vmware-go-kcl/blob/master/test/worker_test.go#L300
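
As a rough illustration of checkpointing during shutdown, here is a minimal sketch of a record processor whose Shutdown method checkpoints when the shutdown reason is TERMINATE (the shard is closed). To stay self-contained it does not import the library: shutdownReason, shutdownInput, checkpointer and memCheckpointer are local stand-ins that only loosely mirror the names used in the snippet quoted earlier in this thread, so treat them as assumptions rather than the real types.

```go
package main

import "fmt"

// shutdownReason and its values are local stand-ins for the library's
// shutdown reasons, defined here only to keep the sketch runnable.
type shutdownReason int

const (
	zombie    shutdownReason = iota // lease lost; another worker owns the shard
	terminate                       // shard closed, e.g. after a resharding
)

// checkpointer is a stand-in for the checkpointer handed to the processor.
type checkpointer interface {
	Checkpoint(sequenceNumber *string) error
}

// shutdownInput is a stand-in for the input passed to Shutdown.
type shutdownInput struct {
	Reason       shutdownReason
	Checkpointer checkpointer
}

// memCheckpointer counts checkpoint calls so the effect can be observed.
type memCheckpointer struct{ calls int }

func (m *memCheckpointer) Checkpoint(*string) error {
	m.calls++
	return nil
}

// processor is a hypothetical record processor.
type processor struct{}

// Shutdown checkpoints only on terminate, marking the closed shard as
// fully consumed; nil mirrors the Checkpoint(nil) call quoted in this thread.
func (p *processor) Shutdown(in *shutdownInput) {
	if in.Reason != terminate {
		return
	}
	if err := in.Checkpointer.Checkpoint(nil); err != nil {
		fmt.Println("checkpoint on shutdown failed:", err)
	}
}

func main() {
	cp := &memCheckpointer{}
	p := &processor{}

	p.Shutdown(&shutdownInput{Reason: zombie, Checkpointer: cp})
	fmt.Println("checkpoints after zombie shutdown:", cp.calls)

	p.Shutdown(&shutdownInput{Reason: terminate, Checkpointer: cp})
	fmt.Println("checkpoints after terminate shutdown:", cp.calls)
}
```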
from vmware-go-kcl.
Thanks for the hint on the shutdown, that's exactly what I was looking for! 👍
As for the resharding scenario, we decided to shut down instances after a resharding not because of the resharding itself, but to rebalance the shards between KCL instances.
Some details: we have long-lived KCL instances reading from the Kinesis streams. Those instances live on AWS EC2 spot servers, so they can sometimes be killed due to their spot nature. Without shard stealing, we found that longer-lived instances end up reading from more shards over time, leading to an unbalanced workload between servers.
So we decided to set MaxShards to (number of shards / number of instances) + 1, so that all KCL instances always read from roughly the same number of shards over time.
Since the number of shards changes during a resharding operation, we quit all instances when a resharding happens; then, at the next startup, the correct MaxShards value gets calculated and passed to KCL.
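
The MaxShards calculation described above can be sketched as follows. The function name maxShardsPerInstance and its error handling are illustrative assumptions; how the shard and instance counts are obtained and how the result is handed to KCL are left out.

```go
package main

import (
	"fmt"
	"log"
)

// maxShardsPerInstance returns (shards / instances) + 1 using integer
// division, so the combined capacity of all instances always covers
// every shard.
func maxShardsPerInstance(shards, instances int) (int, error) {
	if shards < 0 || instances <= 0 {
		return 0, fmt.Errorf("invalid input: shards=%d instances=%d", shards, instances)
	}
	return shards/instances + 1, nil
}

func main() {
	// Example: 10 shards spread over 3 instances.
	limit, err := maxShardsPerInstance(10, 3)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("MaxShards per instance:", limit) // prints "MaxShards per instance: 4"
}
```

With 10 shards and 3 instances each instance may hold at most 4 shards, so no single long-lived instance can accumulate much more than its share.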
from vmware-go-kcl.
I have often found automatic resharding very annoying, and using MaxShards is much easier. See #4 for more discussion.
Glad to know the issue has been resolved.
from vmware-go-kcl.