Comments (11)

mwillfox commented on August 16, 2024

For what it's worth, I think this is a valuable feature. Not everyone uses Kinesis under a constant-load scenario in which you are constantly reading data. In our application, we use Kinesis to marshal data through the system. Sometimes the throughput we see in Kinesis is very high, other times zero. Also, Kinesis does support auto-scaling shards now. In our use case where data throughput is highly variable, it doesn't make sense to pay for extra workers. We are deployed in Kubernetes, so the way I would like to utilize this functionality is:

  • Set the max number of leases to the Kinesis max-shards-per-stream value.
  • Run my deployment with 1 replica, which at idle times will hold leases to all shards on the stream.
  • Utilize a HorizontalPodAutoscaler on CPU.
  • See additional replicas added to the deployment under high load. (This currently works, but since lease stealing is not supported, no load balancing across workers occurs and the additional pods do nothing.)

If lease stealing were supported, this deployment strategy would allow us to easily scale up workers based on data throughput. Combined with the Kubernetes cluster autoscaler we would only pay for compute resources when they are necessary.
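To make the idea concrete, here is a minimal sketch in Go of how a newly started worker could decide which leases to steal. It is an illustration only, not the vmware-go-kcl or gokini implementation: the function name `leasesToSteal`, the `owners` map (shard ID to worker ID), and the `maxSteal` cap are all hypothetical.

```go
package main

import "sort"

// leasesToSteal is a hypothetical helper, not part of any real library.
// owners maps shard ID -> worker ID currently holding the lease.
// It returns the shard IDs that worker `self` could take over so that no
// other worker holds two or more leases more than `self`, moving at most
// maxSteal leases in one pass.
func leasesToSteal(owners map[string]string, self string, maxSteal int) []string {
	// Group shard IDs by the worker that holds them.
	held := make(map[string][]string)
	for shard, worker := range owners {
		held[worker] = append(held[worker], shard)
	}
	// Sort each group so the result does not depend on map iteration order.
	for _, shards := range held {
		sort.Strings(shards)
	}

	mine := len(held[self])
	var stolen []string
	for len(stolen) < maxSteal {
		// Pick the busiest other worker; break ties by worker ID.
		victim := ""
		for worker, shards := range held {
			if worker == self {
				continue
			}
			if victim == "" || len(shards) > len(held[victim]) ||
				(len(shards) == len(held[victim]) && worker < victim) {
				victim = worker
			}
		}
		// Stop once the load is balanced to within one lease.
		if victim == "" || len(held[victim])-mine < 2 {
			break
		}
		last := len(held[victim]) - 1
		stolen = append(stolen, held[victim][last])
		held[victim] = held[victim][:last]
		mine++
	}
	return stolen
}
```

With five shards all held by one worker, a second worker calling this with a large `maxSteal` would take two of them, leaving a 3/2 split. A real implementation would also have to update the lease table atomically, which this sketch leaves out.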

I'd kindly ask that you reconsider adding this feature. It's very useful for certain workloads.

from vmware-go-kcl.

patrobinson commented on August 16, 2024

I plan on supporting this in the future in https://github.com/patrobinson/gokini, which this library was originally forked from.

patrobinson commented on August 16, 2024

This is now supported in gokini https://github.com/patrobinson/gokini/releases/tag/v0.0.6

taoj-action commented on August 16, 2024

In general, I don't like the idea of dynamic rebalancing. We'd like a partitioned workload (a set of shards) to always go to a fixed instance, which simplifies the design of KCL-enabled applications because the local cache will hold enough information to process the data. If related data is scattered across multiple workers on different instances, an external cache/DB must be used to reconcile it.

I need to think more on a better solution.

Sytten commented on August 16, 2024

I agree. We have a similar problem, and we use an external Redis to reconcile the data. The use case should at least be possible, if not enabled by default. There is still the problem of a worker dying and leaving some shards unattended. How do you manage that currently?

taoj-action commented on August 16, 2024

No, it is not implemented. MaxLeasesToStealAtOneTime is used for shard rebalancing. Actually, this is the only feature missing compared to AWS's KCL.

https://aws.amazon.com/blogs/big-data/process-large-dynamodb-streams-using-multiple-amazon-kinesis-client-library-kcl-workers/

Shard rebalancing is a nice idea, but it may not be that useful. In real life, shards don't split or merge often. Normally, we only split shards to handle increasing volume; we have never done a shard merge in production. Also, each host has limited capacity, so splitting shards normally means scaling up the number of hosts. We'd rather set a max number of shards for each host to handle and adjust the min host count of the auto-scaling group to match the required number of hosts.
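The sizing rule described here comes down to a ceiling division. A minimal sketch in Go, assuming every host is capped at the same number of leases (the function name `minHosts` and its parameters are hypothetical and used only for illustration):

```go
package main

// minHosts returns the smallest number of hosts needed so that every shard
// has an owner when each host holds at most maxLeasesPerHost leases.
// It returns 0 for non-positive inputs.
func minHosts(shardCount, maxLeasesPerHost int) int {
	if shardCount <= 0 || maxLeasesPerHost <= 0 {
		return 0
	}
	// Integer ceiling of shardCount / maxLeasesPerHost.
	return (shardCount + maxLeasesPerHost - 1) / maxLeasesPerHost
}
```

For example, 10 shards with at most 4 leases per host would need a minimum of 3 hosts in the auto-scaling group.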

Jackyjjc commented on August 16, 2024

Our workload is a little bit different: we do not set a max number of shards and let auto scaling scale based on CPU%. On a typical day we see the hosts scale up during peak time and scale down afterwards :)

Sytten commented on August 16, 2024

Any update from vmware on this?

taoj-action commented on August 16, 2024

This needs to bring over the change from
patrobinson/gokini@58a32d4
Since this library has some production usage, extensive investigation and testing are needed. We will come up with a solution soon.

taoj-action commented on August 16, 2024

If a worker dies, its leases will expire, and the unattended shards will be consumed by other workers that have not yet reached their max number of leases.
https://github.com/vmware/vmware-go-kcl/blob/master/clientlibrary/worker/worker.go#L227
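The takeover rule above can be sketched roughly as follows in Go. This is a simplified illustration, not the code at the linked line: the `lease` struct, its fields, and the `claimable` function are hypothetical, and timestamps are plain Unix seconds to keep the example small.

```go
package main

// lease is a hypothetical, simplified view of one shard's lease record.
type lease struct {
	ShardID     string
	Owner       string // empty when nobody holds the lease
	TimeoutUnix int64  // Unix seconds at which the current lease expires
}

// claimable returns the shards that worker `self` may pick up: shards with
// no owner, or whose lease held by another worker has expired. It stops as
// soon as the worker would reach maxLeases, counting the `held` leases it
// already owns.
func claimable(leases []lease, self string, held, maxLeases int, nowUnix int64) []string {
	var out []string
	for _, l := range leases {
		if held+len(out) >= maxLeases {
			break // this worker is already at its lease cap
		}
		if l.Owner == self {
			continue // already ours
		}
		if l.Owner == "" || nowUnix > l.TimeoutUnix {
			out = append(out, l.ShardID)
		}
	}
	return out
}
```

Note that a live worker's unexpired lease is never returned, which is why adding workers does not rebalance anything without lease stealing.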

Kinesis doesn't support automatically scaling the shard count up or down. We know the shard count and have configured enough instances in the ASG. Our newer production service uses Kubernetes, and configuring some idle Pods does not consume many resources.

taoj-action commented on August 16, 2024

One advantage of using Kinesis for big data processing (Map/Reduce) is that Kinesis does the Mapping part for free.
