
Comments (4)

Randgalt commented on September 22, 2024

Interesting - distribution is essentially random. Each worker takes tasks as it can. I'll check whether something other than randomness is causing this. If it is indeed random, though, should we introduce some kind of distribution mechanism that tries to distribute tasks evenly?

from workflow.

dtoledo67 commented on September 22, 2024

If it is really random then I don't think we need another mechanism: small
variations are fine. The problem here is that it doesn't look random. Here
is how it looks on our production system with about 1 million tasks
executed. In this case, we have about 15 different task types. The level of
concurrency configured for each type of task is different. The pattern
looks more or less the same for each task type. After running multiple
experiments I think the pattern is more visible when you execute a large
number of tasks.

[image: Inline image 2]



Randgalt commented on September 22, 2024

I believe I know what's happening. Internally, Workflow uses Curator's DistributedQueue to get tasks that need processing. Unfortunately, ZooKeeper only provides getChildren() to get the children under a node, i.e. it returns ALL children. Each task worker then tries to process all current tasks. There's code to prevent duplicates, but it means that the first process to call getChildren() will likely process more tasks than the other processes.

Fortunately, there is a workaround. Curator's PriorityQueue processes children a few at a time so that new, higher-priority items can be inserted. I did some testing with this and it seems to help. So, simply change your TaskType's mode from the default to TaskMode.PRIORITY. I ran your tests with the default and I get this:

workflow tester workflowManager-1 has executed 376 tasks
workflow tester workflowManager-3 has executed 336 tasks
workflow tester workflowManager-2 has executed 288 tasks

Using TaskMode.PRIORITY (the only change) I now get this:

workflow tester workflowManager-2 has executed 353 tasks
workflow tester workflowManager-3 has executed 313 tasks
workflow tester workflowManager-1 has executed 334 tasks

So, if possible, please test using TaskMode.PRIORITY. It shouldn't affect performance as all tasks will have the same priority by default.
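
To make the difference between the two claiming strategies concrete, here is a toy model. It is only a sketch under stated assumptions: it is not Curator's or Workflow's code, the class and method names (ClaimSimulation, simulate, spread) are hypothetical, and the worker count, burst size and seed are made up. It assumes tasks arrive in bursts, a random worker polls first, and each poll claims at most maxClaim pending tasks. An unbounded maxClaim stands in for "take everything getChildren() returned"; a small maxClaim stands in for processing a few children at a time.

```java
import java.util.Arrays;
import java.util.Random;

/** Toy model only: names and logic are illustrative, not Curator or Workflow code. */
class ClaimSimulation {

    /**
     * Simulates `bursts` bursts of `burstSize` tasks shared by `workers` workers.
     * In each burst a randomly chosen worker polls first and the others follow in
     * round-robin order. Each poll claims at most `maxClaim` pending tasks.
     * Returns the number of tasks executed per worker.
     */
    static int[] simulate(int workers, int bursts, int burstSize, int maxClaim, long seed) {
        if (workers < 1 || maxClaim < 1) {
            throw new IllegalArgumentException("workers and maxClaim must be >= 1");
        }
        Random random = new Random(seed);
        int[] executed = new int[workers];
        for (int b = 0; b < bursts; b++) {
            int pending = burstSize;
            int worker = random.nextInt(workers); // whoever happens to poll first
            while (pending > 0) {
                int claimed = Math.min(pending, maxClaim);
                executed[worker] += claimed;
                pending -= claimed;
                worker = (worker + 1) % workers; // next worker gets its turn
            }
        }
        return executed;
    }

    /** Difference between the busiest and the least busy worker. */
    static int spread(int[] executed) {
        int min = Arrays.stream(executed).min().orElse(0);
        int max = Arrays.stream(executed).max().orElse(0);
        return max - min;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 3 workers, 100 bursts of 9 tasks each.
        int[] greedy = simulate(3, 100, 9, Integer.MAX_VALUE, 42L);
        int[] bounded = simulate(3, 100, 9, 1, 42L);
        System.out.println("greedy:  " + Arrays.toString(greedy) + " spread=" + spread(greedy));
        System.out.println("bounded: " + Arrays.toString(bounded) + " spread=" + spread(bounded));
    }
}
```

In this model, a bounded claim of one task per poll splits every burst evenly whenever the burst size is a multiple of the worker count, while the greedy variant hands each whole burst to whichever worker polled first. The model exaggerates the effect: in the real system workers race over the same snapshot rather than one worker taking everything, which is consistent with the modest skew in the numbers above.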


Randgalt commented on September 22, 2024

I've rewritten the queue code to distribute tasks more evenly. Please test with: https://github.com/NirmataOSS/workflow/tree/simple-queue (i.e. branch "simple-queue")

