There is 'MaxLeasesToStealAtOneTime' option in the config but lease stealing logic is

Support lease stealing about vmware-go-kcl HOT 11 CLOSED

vmware commented on August 16, 2024 2

Support lease stealing

from vmware-go-kcl.

Comments (11)

mwillfox commented on August 16, 2024 4

For what it's worth, I think this is a valuable feature. Not everyone uses Kinesis under a constant-load scenario in which you are constantly reading data. In our application, we use Kinesis to marshal data through the system. Sometimes the throughput we see in Kinesis is very high, other times zero. Also, Kinesis does support auto-scaling shards now. In our use case where data throughput is highly variable, it doesn't make sense to pay for extra workers. We are deployed in Kubernetes, so the way I would like to utilize this functionality is:

1. Set the maximum number of leases to the Kinesis max-shards-per-stream value.
2. Run my deployment with 1 replica, which at idle times will hold leases to all shards on the stream.
3. Utilize a HorizontalPodAutoscaler on CPU.
4. See additional replicas added to the deployment under high load (this currently works, but since lease stealing is not supported, no load balancing across workers occurs and the additional pods do nothing).

If lease stealing were supported, this deployment strategy would allow us to easily scale up workers based on data throughput. Combined with the Kubernetes cluster autoscaler we would only pay for compute resources when they are necessary.
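To make the request concrete, here is a minimal sketch of the decision a newly started replica would have to make if lease stealing existed. It is not code from vmware-go-kcl or gokini: `leasesToSteal` and `busiestWorker` are hypothetical names, and `maxSteal` only plays the role of the `MaxLeasesToStealAtOneTime` option.

```go
package main

import (
	"fmt"
	"sort"
)

// leasesToSteal returns how many leases the worker `self` should try to take
// from other workers in one pass. Hypothetical helper, for illustration only.
//
// held maps worker ID to the number of leases it currently holds.
// maxSteal caps a single pass, in the spirit of MaxLeasesToStealAtOneTime.
func leasesToSteal(held map[string]int, self string, maxSteal int) int {
	total := 0
	for _, n := range held {
		total += n
	}
	workers := len(held)
	if _, ok := held[self]; !ok {
		workers++ // a fresh worker holding nothing still counts toward the split
	}
	// Round the fair share down so two workers never keep stealing the
	// same odd lease back and forth.
	target := total / workers
	need := target - held[self]
	if need <= 0 {
		return 0
	}
	if need > maxSteal {
		need = maxSteal
	}
	return need
}

// busiestWorker picks the worker to steal from: the one holding the most
// leases, with ties broken by ID so the result is deterministic.
func busiestWorker(held map[string]int, self string) string {
	ids := make([]string, 0, len(held))
	for id := range held {
		if id != self {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	best := ""
	for _, id := range ids {
		if best == "" || held[id] > held[best] {
			best = id
		}
	}
	return best
}

func main() {
	// One idle-time replica holds all 8 leases; the autoscaler adds pod-b.
	held := map[string]int{"pod-a": 8}
	n := leasesToSteal(held, "pod-b", 2)
	fmt.Printf("pod-b steals %d lease(s) from %s\n", n, busiestWorker(held, "pod-b"))
}
```

With a cap per pass, the new pod converges on its fair share over several passes instead of taking half the shards at once.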

I'd kindly ask that you reconsider adding this feature. It's very useful for certain workloads.

patrobinson commented on August 16, 2024 1

I plan on supporting this in the future in https://github.com/patrobinson/gokini, which this library was originally forked from.

patrobinson commented on August 16, 2024 1

This is now supported in gokini https://github.com/patrobinson/gokini/releases/tag/v0.0.6

taoj-action commented on August 16, 2024 1

In general, I don't like the idea of dynamic rebalancing. We'd like a partitioned workload (a set of shards) to always go to a fixed instance, which simplifies the design of KCL-enabled applications because the local cache will hold enough information to process the data. If related data is scattered across multiple workers on different instances, an external cache or database must be used to reconcile it.

I need to think more about a better solution.

Sytten commented on August 16, 2024 1

I agree. We have a similar problem, and we use an external Redis to reconcile the data. The use case should at least be possible, even if it is not enabled by default. There is still the problem of a worker dying and leaving some shards unattended. How do you manage that currently?

taoj-action commented on August 16, 2024

No, it is not implemented. MaxLeasesToStealAtOneTime is used for shard rebalancing. Actually, this is the only missing feature compared to AWS's KCL.

https://aws.amazon.com/blogs/big-data/process-large-dynamodb-streams-using-multiple-amazon-kinesis-client-library-kcl-workers/

Shard rebalancing is a nice idea, but it may not be that useful. In real life, shards don't split or merge often. Normally, we only split shards to handle increasing volume. We have never merged shards in production. Also, each host has limited capacity, so splitting shards normally means scaling up the number of hosts. We'd rather set a maximum number of shards for each host to handle and adjust the minimum host count of the auto-scaling group to match the required number of hosts.
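The sizing rule described here comes down to a ceiling division. A small sketch, assuming a hypothetical `minHosts` helper and made-up shard counts (neither comes from the library):

```go
package main

import "fmt"

// minHosts returns the smallest number of hosts that can cover every shard
// when each host takes at most maxShardsPerHost leases (ceiling division).
// Hypothetical helper, not part of vmware-go-kcl.
func minHosts(shards, maxShardsPerHost int) int {
	if shards <= 0 || maxShardsPerHost <= 0 {
		return 0
	}
	return (shards + maxShardsPerHost - 1) / maxShardsPerHost
}

func main() {
	// Illustrative numbers only: a stream before and after splitting
	// every shard once, with each host capped at 3 shards.
	for _, shards := range []int{8, 16} {
		fmt.Printf("%d shards, max 3 per host: at least %d hosts\n",
			shards, minHosts(shards, 3))
	}
}
```

The result would be the value to use as the minimum size of the auto-scaling group whenever the shard count changes.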

Jackyjjc commented on August 16, 2024

Our workload is a little bit different: we do not set a maximum number of shards, and we let auto scaling scale based on CPU%. On a typical day we see the hosts scale up during peak time and scale down afterwards :)

Sytten commented on August 16, 2024

Any update from vmware on this?

taoj-action commented on August 16, 2024

This requires bringing over the change from patrobinson/gokini@58a32d4.
Since this library has some production usage, extensive investigation and testing are needed. We will come up with a solution soon.

taoj-action commented on August 16, 2024

If a worker dies, its leases will expire, and the unattended shards will be picked up by other workers that have not yet reached their maximum number of leases.
https://github.com/vmware/vmware-go-kcl/blob/master/clientlibrary/worker/worker.go#L227
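As a rough illustration of that takeover rule (not the code at the link above), the sketch below uses a hypothetical `lease` record and `claimable` function: a worker may claim any lease whose timeout has passed, up to its own lease cap.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical, simplified view of one shard's lease record.
type lease struct {
	ShardID string
	Owner   string
	Expires time.Time
}

// demoNow is a fixed clock so the example is reproducible.
var demoNow = time.Date(2024, 8, 16, 12, 0, 0, 0, time.UTC)

// sampleLeases builds illustrative data: worker-1 is alive and holds one
// lease, worker-2 died and its three leases have timed out.
func sampleLeases(now time.Time) []lease {
	return []lease{
		{"shard-0", "worker-1", now.Add(10 * time.Second)},
		{"shard-1", "worker-2", now.Add(-5 * time.Second)},
		{"shard-2", "worker-2", now.Add(-5 * time.Second)},
		{"shard-3", "worker-2", now.Add(-5 * time.Second)},
	}
}

// claimable lists the shards `self` may pick up at time `now`: leases whose
// timeout has passed, limited by how many more leases `self` may hold.
func claimable(leases []lease, self string, maxLeases int, now time.Time) []string {
	held := 0
	for _, l := range leases {
		if l.Owner == self && now.Before(l.Expires) {
			held++
		}
	}
	var out []string
	for _, l := range leases {
		if held+len(out) >= maxLeases {
			break // the cap is reached, remaining shards stay unattended
		}
		if !now.Before(l.Expires) {
			out = append(out, l.ShardID)
		}
	}
	return out
}

func main() {
	got := claimable(sampleLeases(demoNow), "worker-1", 3, demoNow)
	fmt.Println("worker-1 may claim:", got)
}
```

In this sample a cap of 3 leaves one expired shard unclaimed until another worker with spare capacity comes along, which is why the cap and the number of instances have to be sized together.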

Kinesis doesn't support automatically scaling the shard count up or down. We know the number of shards and have configured enough instances in the ASG. Our newer production service uses Kubernetes, and keeping a few idle Pods around does not consume many resources.

taoj-action commented on August 16, 2024

One advantage of using Kinesis for big data processing (Map/Reduce) is that Kinesis does the Mapping part for free.
