Comments (4)
You are right that a TaskScheduler is stateful and that there would be a single instance in a cluster. The size of the state information in Fenzo is proportional to the number of VMs (aka agents/hosts) and the number of tasks assigned. Other state information, such as that related to autoscaling and groups, is too small to be concerned about.
Although I have no specific data to quote, I used this test program to create 10,000 VMs (agents/hosts), each with 16 cores, filling the 160K cores with 45K tasks (1-, 8-, and 12-CPU tasks). I observed a resident set size of about 750MB.
While this is not meant to be a reference for figuring out memory at a given scale, the quick hack shows you a way to test your possible scale and measure the anticipated memory size as well as the performance to expect. Fenzo also makes it easy to test new plugins for constraints and fitness calculators. The LeaseProvider and TaskRequestProvider classes in the test package are useful for this, since they remove the need for actual agent hosts.
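As a rough illustration of the "build state, then measure" approach described above, here is a minimal, self-contained sketch. It does not use Fenzo: `ScaleProbe`, `buildHosts`, and the `int[]` host record are hypothetical stand-ins for whatever state your own test program creates. It also samples JVM heap usage, which is not the same thing as the resident set size quoted above, so treat the numbers as indicative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical measurement harness; not part of Fenzo. */
final class ScaleProbe {

    /** Holds the built state (so it stays reachable) plus the measurements. */
    static final class Result<T> {
        final T state;
        final long heapDeltaBytes;
        final long elapsedMillis;

        Result(T state, long heapDeltaBytes, long elapsedMillis) {
            this.state = state;
            this.heapDeltaBytes = heapDeltaBytes;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Heap currently in use by the JVM (an approximation, not RSS). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the builder and reports elapsed time and the change in used heap. */
    static <T> Result<T> measure(Supplier<T> builder) {
        System.gc(); // only a hint to the JVM; results are approximate
        long before = usedHeapBytes();
        long start = System.nanoTime();
        T state = builder.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.gc();
        long after = usedHeapBytes();
        return new Result<>(state, after - before, elapsedMillis);
    }

    /** Assumed stand-in for per-host state: {total cores, used cores}. */
    static List<int[]> buildHosts(int hosts, int coresPerHost) {
        List<int[]> state = new ArrayList<>(hosts);
        for (int i = 0; i < hosts; i++) {
            state.add(new int[] {coresPerHost, 0});
        }
        return state;
    }

    public static void main(String[] args) {
        Result<List<int[]>> r = measure(() -> buildHosts(10_000, 16));
        long totalCores = 0;
        for (int[] host : r.state) {
            totalCores += host[0];
        }
        System.out.println("hosts=" + r.state.size()
                + " cores=" + totalCores
                + " heapDeltaBytes=" + r.heapDeltaBytes
                + " elapsedMillis=" + r.elapsedMillis);
    }
}
```

In a real test you would replace `buildHosts` with code that feeds simulated leases and task requests into a TaskScheduler, and compare the heap figure with the process RSS reported by the operating system.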
from fenzo.
Thanks for the reply @spodila.
What are your thoughts toward resiliency? For example, if your process containing the TaskScheduler dies, then what action do you take?
Thanks again for the dialogue.
(closing as you've addressed my primary question)
Upon start of the process containing the TaskScheduler, we initialize Fenzo with the entire state by calling the TaskScheduler.getTaskAssigner().call(...) method for each task that is already known to be running. Specifically, since we run multiple instances of our framework with ZooKeeper-based leader election, we perform this initialization upon being elected as the leader.
This does bring up a concern about startup latency with a large number of running tasks. However, we haven't come to the point where that is the next big problem to solve. If that does concern you, I'd love to hear your thoughts on it and/or exchange ideas on solving it.
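The startup replay described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: `RunningTask`, `LeaderStateRestore`, and `restore` are hypothetical names, and the `BiConsumer` parameter stands in for the action returned by TaskScheduler.getTaskAssigner(), so the example compiles without Fenzo on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical sketch of replaying known state into a freshly started scheduler. */
final class LeaderStateRestore {

    /** Hypothetical record of a task the framework already knows is running. */
    static final class RunningTask {
        final String taskId;
        final String hostname;

        RunningTask(String taskId, String hostname) {
            this.taskId = taskId;
            this.hostname = hostname;
        }
    }

    /**
     * Replays every known running task into the scheduler, one call per task.
     * The assigner is an assumed stand-in for the task assigner action; with
     * Fenzo you would pass the real task request and the host it runs on.
     * Returns the number of tasks replayed.
     */
    static int restore(List<RunningTask> knownTasks,
                       BiConsumer<RunningTask, String> assigner) {
        int replayed = 0;
        for (RunningTask task : knownTasks) {
            assigner.accept(task, task.hostname);
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<RunningTask> known = new ArrayList<>();
        known.add(new RunningTask("task-1", "host-a"));
        known.add(new RunningTask("task-2", "host-b"));

        // In a real framework this would run in the leader-election callback.
        int count = restore(known, (task, host) ->
                System.out.println("assign " + task.taskId + " -> " + host));
        System.out.println("replayed " + count + " tasks");
    }
}
```

Because the replay is one call per running task, its cost grows linearly with the number of tasks, which is the source of the startup latency concern.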
Perhaps a pluggable mechanism that a) requests state from other scheduler instances, returning a CompletionStage (aka Scala's Future), and b) provides a callback so that the scheduler can be notified of new state asynchronously.
This then permits multiple schedulers to work together. We could then back the request and the callback with CRDTs (for example).
I think you should also consider reentrancy in your API to support such a mechanism and concurrency in general.
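To make the proposal above concrete, here is one possible shape for such a plugin, written as a minimal sketch. Nothing here exists in Fenzo: `StateExchange`, `InMemoryStateExchange`, and the task-to-host `Map` representation are all assumptions made for illustration. The in-memory implementation only demonstrates the contract; a distributed version could back the same two methods with CRDTs, as suggested.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical plugin contract; not an existing Fenzo interface. */
interface StateExchange {
    /** a) Asynchronously request the current task-to-host assignments. */
    CompletionStage<Map<String, String>> requestState();

    /** b) Register a callback invoked whenever new state arrives. */
    void onStateUpdate(Consumer<Map<String, String>> listener);
}

/** In-memory implementation, useful only for illustrating the contract. */
final class InMemoryStateExchange implements StateExchange {
    // Thread-safe collections, since callbacks may arrive concurrently.
    private final Map<String, String> assignments = new ConcurrentHashMap<>();
    private final List<Consumer<Map<String, String>>> listeners =
            new CopyOnWriteArrayList<>();

    @Override
    public CompletionStage<Map<String, String>> requestState() {
        // Return a snapshot copy so callers never see later mutations.
        return CompletableFuture.completedFuture(new HashMap<>(assignments));
    }

    @Override
    public void onStateUpdate(Consumer<Map<String, String>> listener) {
        listeners.add(listener);
    }

    /** Records an assignment and notifies listeners with the delta. */
    void publish(String taskId, String hostname) {
        assignments.put(taskId, hostname);
        Map<String, String> delta = Collections.singletonMap(taskId, hostname);
        for (Consumer<Map<String, String>> listener : listeners) {
            listener.accept(delta);
        }
    }
}
```

A scheduler instance would call `requestState()` once at startup and then rely on the callback for incremental updates, which is why the scheduler API would need to tolerate concurrent and reentrant calls.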