Comments (11)
For what it's worth, I think this is a valuable feature. Not everyone uses Kinesis under a constant-load scenario in which you are constantly reading data. In our application, we use Kinesis to marshal data through the system. Sometimes the throughput we see in Kinesis is very high, other times zero. Also, Kinesis does support auto-scaling shards now. In our use case where data throughput is highly variable, it doesn't make sense to pay for extra workers. We are deployed in Kubernetes, so the way I would like to utilize this functionality is:
- Set the maximum number of leases to the Kinesis maximum-shards-per-stream value.
- Run my deployment with 1 replica, which at idle times will hold leases to all shards on the stream.
- Use a HorizontalPodAutoscaler on CPU.
- See additional replicas added to the deployment under high load (this currently works, but since lease stealing is not supported, no load balancing across workers occurs and the additional pods do nothing).
If lease stealing were supported, this deployment strategy would allow us to easily scale up workers based on data throughput. Combined with the Kubernetes cluster autoscaler we would only pay for compute resources when they are necessary.
I'd kindly ask that you reconsider adding this feature. It's very useful for certain workloads.
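To make the request above concrete, here is a minimal Go sketch of the decision a freshly added replica would need to make: how many leases to take from other workers so that shards end up evenly spread. This is not code from vmware-go-kcl or gokini. The function `leasesToSteal` and all of its parameters are hypothetical and only illustrate the idea, with `maxSteal` standing in for a per-pass cap.

```go
package main

import "fmt"

// leasesToSteal is a hypothetical helper, not part of vmware-go-kcl or gokini.
// It returns how many leases a worker that currently holds `held` leases may
// take from other workers to move toward an even share of totalShards across
// `workers` workers, taking at most maxSteal leases per pass.
func leasesToSteal(totalShards, workers, held, maxSteal int) int {
	if totalShards <= 0 || workers <= 0 || maxSteal <= 0 {
		return 0
	}
	// Even share, rounded down so workers do not fight over the remainder.
	target := totalShards / workers
	need := target - held
	if need <= 0 {
		return 0
	}
	if need > maxSteal {
		return maxSteal
	}
	return need
}

func main() {
	// One replica holds all 8 leases; the autoscaler adds a second, empty replica.
	fmt.Println(leasesToSteal(8, 2, 0, 1)) // 1
	// Once the new replica holds its even share, it stops stealing.
	fmt.Println(leasesToSteal(8, 2, 4, 1)) // 0
}
```

Capping the number of leases taken per pass keeps the handover gradual, so a scale-up does not interrupt processing on every shard at once.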
from vmware-go-kcl.
In the future, I plan on supporting this in https://github.com/patrobinson/gokini, which this library was originally forked from.
from vmware-go-kcl.
This is now supported in gokini https://github.com/patrobinson/gokini/releases/tag/v0.0.6
from vmware-go-kcl.
In general, I don't like the idea of dynamic rebalancing. We'd like a partitioned workload (a set of shards) to always go to a fixed instance, which simplifies the design of KCL-enabled applications because the local cache will have enough information to process the data. If related data is scattered across multiple workers on different instances, an external cache or DB must be used to reconcile it.
I need to think more about a better solution.
from vmware-go-kcl.
I agree. We have a similar problem, and we use an external Redis to reconcile the data. The use case should at least be possible, even if it is not enabled by default. There is still the problem of a worker dying and leaving some shards unattended. How do you manage that currently?
from vmware-go-kcl.
No, it is not implemented. MaxLeasesToStealAtOneTime is used for shard rebalancing. Actually, this is the only missing feature compared to AWS's KCL.
Shard rebalancing is a nice idea, but it may not be that useful. In real life, shards don't split or merge often. Normally, we only split shards to handle increasing volume, and we have never merged shards in production. Also, each host has limited capacity, so splitting shards normally means scaling up the number of hosts. We'd rather set a maximum number of shards for each host to handle and adjust the auto-scaling group's minimum host count to match the required number of hosts.
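The sizing rule described above (a fixed cap on shards per host, with the minimum host count adjusted to match) comes down to a ceiling division. A small Go sketch, where `minHosts` is a hypothetical helper and the numbers are made up for illustration:

```go
package main

import "fmt"

// minHosts is a hypothetical helper. It returns the smallest number of hosts
// needed so that no host handles more than maxShardsPerHost shards,
// i.e. ceil(shards / maxShardsPerHost).
func minHosts(shards, maxShardsPerHost int) int {
	if shards <= 0 || maxShardsPerHost <= 0 {
		return 0
	}
	return (shards + maxShardsPerHost - 1) / maxShardsPerHost
}

func main() {
	// 10 shards with at most 4 per host.
	fmt.Println(minHosts(10, 4)) // 3
	// After splitting every shard once, the stream has 20 shards.
	fmt.Println(minHosts(20, 4)) // 5
}
```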
from vmware-go-kcl.
Our workload is a little bit different. We do not set a maximum number of shards, and we let auto scaling scale based on CPU%. On a typical day we see the hosts scale up during peak time and scale down afterwards :)
from vmware-go-kcl.
Any update from vmware on this?
from vmware-go-kcl.
This requires bringing over the change from
patrobinson/gokini@58a32d4
Since this library is used in production, extensive investigation and testing are needed. We will come up with a solution soon.
from vmware-go-kcl.
If a worker dies, its leases will expire, and the unattended shards will be consumed by other workers that have not yet reached their maximum number of leases.
https://github.com/vmware/vmware-go-kcl/blob/master/clientlibrary/worker/worker.go#L227
Kinesis doesn't support automatically scaling the shard count up or down. We know the number of shards and have configured enough instances in the ASG. Our newer production service uses Kubernetes, and configuring some idle Pods does not consume many resources.
from vmware-go-kcl.
One advantage of using Kinesis for big data processing (Map/Reduce) is that Kinesis does the Mapping part for free.
from vmware-go-kcl.