Comments (4)
Interesting - distribution is essentially random. Each worker will take tasks as it can. I'll have a look if there's something other than randomness causing this. If it is indeed random, though, should we introduce some kind of distribution mechanism that tries to evenly distribute tasks?
If it is really random then I don't think we need another mechanism: small
variations are fine. The problem here is that it doesn't look random. Here
is how it looks on our production system with about 1 million tasks
executed. In this case, we have about 15 different task types. The level of
concurrency configured for each type of task is different. The pattern
looks more or less the same for each task type. After running multiple
experiments I think the pattern is more visible when you execute a large
number of tasks.
[image: Inline image 2]
On Wed, Dec 23, 2015 at 9:35 AM, Jordan Zimmerman wrote:
> Interesting - distribution is essentially random. Each worker will take
> tasks as it can. I'll have a look if there's something other than
> randomness causing this. If it is indeed random, though, should we
> introduce some kind of distribution mechanism that tries to evenly
> distribute tasks?
I believe I know what's happening. Internally, Workflow uses Curator's DistributedQueue to get tasks that need processing. Unfortunately, ZooKeeper only provides getChildren() to get children under a node, i.e. it returns ALL children. Each Task worker then tries to process all current tasks. There's code to prevent duplicates, but it could mean that the first process to call getChildren() will likely process more than other processes.
Fortunately, there is a workaround. Curator's PriorityQueue processes children a few at a time to allow higher-priority new items to be inserted. I did some testing with this and it seems to help. So, simply change your TaskType's mode from the default to TaskMode.PRIORITY. I ran your tests with the default and I get this:
workflow tester workflowManager-1 has executed 376 tasks
workflow tester workflowManager-3 has executed 336 tasks
workflow tester workflowManager-2 has executed 288 tasks
Using TaskMode.PRIORITY (the only change) I now get this:
workflow tester workflowManager-2 has executed 353 tasks
workflow tester workflowManager-3 has executed 313 tasks
workflow tester workflowManager-1 has executed 334 tasks
So, if possible, please test using TaskMode.PRIORITY. It shouldn't affect performance as all tasks will have the same priority by default.
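To put a number on the two runs above, here is a minimal sketch that parses the tester output quoted in this thread and reports the max-min spread plus a chi-square statistic against a perfectly even split. The class and method names are made up for illustration and are not part of Workflow or Curator. The chi-square check assumes each task is assigned to a worker independently and uniformly, which is only an approximation of what the queue does.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Workflow or Curator.
final class TaskSpread {
    // Matches the tester output quoted in this thread, e.g. "... has executed 376 tasks".
    private static final Pattern COUNT = Pattern.compile("has executed (\\d+) tasks");

    // Extracts the per-worker task count from each output line.
    static int[] parseCounts(String... lines) {
        int[] counts = new int[lines.length];
        for (int i = 0; i < lines.length; i++) {
            Matcher m = COUNT.matcher(lines[i]);
            if (!m.find()) {
                throw new IllegalArgumentException("Unrecognized line: " + lines[i]);
            }
            counts[i] = Integer.parseInt(m.group(1));
        }
        return counts;
    }

    // Difference between the busiest and the least busy worker.
    static int spread(int[] counts) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    // Chi-square goodness-of-fit statistic against an even split across workers.
    // Assumption: tasks are assigned independently and uniformly at random.
    static double chiSquare(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double expected = (double) total / counts.length;
        double stat = 0.0;
        for (int c : counts) {
            double diff = c - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        int[] defaultRun = parseCounts(
            "workflow tester workflowManager-1 has executed 376 tasks",
            "workflow tester workflowManager-3 has executed 336 tasks",
            "workflow tester workflowManager-2 has executed 288 tasks");
        int[] priorityRun = parseCounts(
            "workflow tester workflowManager-2 has executed 353 tasks",
            "workflow tester workflowManager-3 has executed 313 tasks",
            "workflow tester workflowManager-1 has executed 334 tasks");
        System.out.printf(Locale.ROOT, "default:  spread=%d chi2=%.2f%n",
            spread(defaultRun), chiSquare(defaultRun));
        System.out.printf(Locale.ROOT, "priority: spread=%d chi2=%.2f%n",
            spread(priorityRun), chiSquare(priorityRun));
    }
}
```

For these numbers the spread drops from 88 tasks to 40 and the statistic from about 11.65 to about 2.40. The usual 5% cutoff for two degrees of freedom is 5.99, so under the uniform-assignment assumption only the default run looks non-random. Each mode was run once here, so treat this as indicative rather than conclusive.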
I've rewritten the queue code to distribute tasks more evenly. Please test with: https://github.com/NirmataOSS/workflow/tree/simple-queue (i.e. branch "simple-queue")