This code was developed as part of the PHYSICS project to enable meta-scheduling on a multi-cluster continuum using different algorithms.
Using docker:
docker run -ti -p 8080:8080 ryaxtech/global-continuum-placement:main
Install poetry, then run:
poetry install
poetry shell
./main.py
Create a platform file like this one in ./test-platform.json:
{
"platform": {
"cluster1": {
"type": "Edge",
"resources": {"nb_cpu": 4, "nb_gpu": 0, "memory_in_MB": 1024},
"architecture": "x86_64",
"objective_scores": {"Energy": 60, "Availability": 5, "Performance": 25}
},
"cluster2": {
"type": "Edge",
"resources": {"nb_cpu": 2, "nb_gpu": 1, "memory_in_MB": 4096},
"architecture": "arm64",
"objective_scores": {"Energy": 100, "Availability": 30, "Performance": 50}
},
"cluster3": {
"type": "HPC",
"resources": {"nb_cpu": 1000, "nb_gpu": 50, "memory_in_MB": 16e6},
"objective_scores": {"Energy": 10, "Availability": 80, "Performance": 100}
}
}
}
Initialize the scheduler with a platform description through the REST API:
curl -H "Content-Type: application/json" -d @test-platform.json http://127.0.0.1:8080/clusters
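The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_post_request` and `submit` are hypothetical and not part of this project, and the URL and file name are the ones used in the curl command above.

```python
import json
import urllib.request


def build_post_request(url: str, json_path: str) -> urllib.request.Request:
    """Build a JSON POST request from a file, similar to `curl -d @file`."""
    with open(json_path, "rb") as handle:
        payload = handle.read()
    json.loads(payload)  # fail early if the file is not valid JSON
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(url: str, json_path: str) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_post_request(url, json_path)) as response:
        return response.read().decode("utf-8")


# Example (requires the service to be running locally):
# print(submit("http://127.0.0.1:8080/clusters", "test-platform.json"))
```

Splitting the request construction from the network call keeps the payload check usable without a running server.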
Here is an example application from the workload-test.json file:
{
"id": "19fe4293742e0b2c",
"displayName": "Full example",
"type": "Flow",
"executorMode": "NativeSequence",
"native": true,
"objectives": {
"Energy": "high",
"Availability": "low"
},
"flows": [
{
"flowID": "flow1",
"functions": [
{
"id": "function1",
"sequence": 1,
"allocations": [
"cluster3"
]
},
{
"id": "function2",
"sequence": 2,
"annotations": {
"cores": "2"
}
},
{
"id": "function3",
"sequence": 3,
"annotations": {
"cores": "2"
}
},
{
"id": "function4",
"sequence": 4,
"annotations": {
"cores": "2",
"architecture": "arm64"
}
},
{
"id": "function5",
"sequence": 5,
"annotations": {
"cores": 2,
"memory": 1000
}
}
]
},
{
"flowID": "flow2",
"executorMode": "NoderedFunction",
"annotations": {"core": 1, "memory": 1000},
"functions": [
{
"id": "excluded-func",
"annotations": { }
}
]
}
]
}
Then, we can ask the scheduler to allocate our workflow functions on the clusters with different constraints:
curl -H "Content-Type: application/json" -d @test-workload.json http://127.0.0.1:8080/applications
The result should be:
[
{
"flowID": "flow1",
"allocations": [
{"cluster": "cluster3", "resource_id": "function1"},
{"cluster": "cluster1", "resource_id": "function2"},
{"cluster": "cluster1", "resource_id": "function3"},
{"cluster": "cluster2", "resource_id": "function4"},
{"cluster": "cluster3", "resource_id": "function5"}
]
},
{
"flowID": "flow2",
"allocations": [
{"cluster": "cluster1", "resource_id": "flow2"}]
}
]
Let's explain these decisions:
- function1 has an explicit cluster constraint for cluster3, so it is allocated there.
- function2 only requires 2 CPUs, and all clusters have at least 2 CPUs. The architecture constraint is not defined, but it defaults to x86_64, so only cluster1 and cluster3 fit the constraint. The scheduler then takes the objectives into account and favors Energy and Availability, so it chooses cluster1.
- function3 has the same constraints as function2, so it goes on the same cluster, cluster1, because it still has 2 CPUs available.
- function4 goes on cluster2 because it requires an arm64 architecture and only cluster2 provides it.
- function5 only has resource constraints and, based on the objectives, should go to cluster1, but that cluster no longer has enough resources. It is finally allocated to cluster3, the only cluster that fits the constraints and has available resources.
- flow2: because it is a NoderedFunction, this flow is scheduled at the flow level. It is scheduled on cluster1 because it has enough resources.
This also works at the flow level with the same annotations.
The scheduling is done function by function in the dependency order of the workflow.
For each function, we apply filters to remove clusters that do not fit the placement and architecture constraints. Then, we apply a scoring function based on the objective scores of the clusters and the objective levels of the workflow. Finally, we apply a first-fit policy to the list of clusters sorted by descending score: the function is allocated to the first cluster in the list that has enough resources.
If no cluster fits the constraints of a function, the function is not allocated. (This might be rejected with an error in the future.)
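The filter, score, and first-fit steps described above can be sketched as follows. This is an illustration under stated assumptions, not the component's actual code: the function names `score` and `place`, the level weights in `LEVEL_WEIGHTS`, and the weighted-sum scoring formula are all hypothetical. A cluster without an architecture field is assumed to be x86_64, like functions without an architecture annotation. With these assumptions, the sketch reproduces the allocations of flow1 from the example above.

```python
from typing import Dict, Optional

# Assumed weights for the objective levels; the real scoring may differ.
LEVEL_WEIGHTS = {"low": 1, "medium": 2, "high": 3}
DEFAULT_ARCH = "x86_64"  # default architecture mentioned in the walkthrough


def score(cluster: dict, objectives: Dict[str, str]) -> float:
    """Weighted sum of the cluster's objective scores (assumed formula)."""
    scores = cluster.get("objective_scores", {})
    return sum(LEVEL_WEIGHTS[level.lower()] * scores.get(name, 0)
               for name, level in objectives.items())


def place(function: dict, platform: Dict[str, dict],
          objectives: Dict[str, str]) -> Optional[str]:
    """Filter, score, then first fit. Deducts resources from `platform`."""
    ann = function.get("annotations", {})
    cores = int(ann.get("cores", 0))    # accepts 2 and "2"
    memory = int(ann.get("memory", 0))
    arch = ann.get("architecture", DEFAULT_ARCH)
    allowed = function.get("allocations")

    # 1. Filter on placement and architecture constraints.
    candidates = [
        name for name, cluster in platform.items()
        if (not allowed or name in allowed)
        and cluster.get("architecture", DEFAULT_ARCH) == arch
    ]
    # 2. Sort by descending score.
    candidates.sort(key=lambda name: score(platform[name], objectives),
                    reverse=True)
    # 3. First fit: take the first cluster with enough free resources.
    for name in candidates:
        res = platform[name]["resources"]
        if res["nb_cpu"] >= cores and res["memory_in_MB"] >= memory:
            res["nb_cpu"] -= cores
            res["memory_in_MB"] -= memory
            return name
    return None  # no cluster fits: the function stays unallocated


PLATFORM = {
    "cluster1": {"resources": {"nb_cpu": 4, "memory_in_MB": 1024},
                 "architecture": "x86_64",
                 "objective_scores": {"Energy": 60, "Availability": 5, "Performance": 25}},
    "cluster2": {"resources": {"nb_cpu": 2, "memory_in_MB": 4096},
                 "architecture": "arm64",
                 "objective_scores": {"Energy": 100, "Availability": 30, "Performance": 50}},
    "cluster3": {"resources": {"nb_cpu": 1000, "memory_in_MB": 16_000_000},
                 "objective_scores": {"Energy": 10, "Availability": 80, "Performance": 100}},
}
OBJECTIVES = {"Energy": "high", "Availability": "low"}
FUNCTIONS = [
    {"id": "function1", "allocations": ["cluster3"]},
    {"id": "function2", "annotations": {"cores": "2"}},
    {"id": "function3", "annotations": {"cores": "2"}},
    {"id": "function4", "annotations": {"cores": "2", "architecture": "arm64"}},
    {"id": "function5", "annotations": {"cores": 2, "memory": 1000}},
]

# Functions are placed one by one, in workflow order.
ALLOCATIONS = {f["id"]: place(f, PLATFORM, OBJECTIVES) for f in FUNCTIONS}
print(ALLOCATIONS)
```

Because resources are deducted after each placement, function5 finds cluster1 full and falls through to cluster3, as in the walkthrough.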
Foa-Energy is an algorithm based on a Linear Program that optimizes the placement of batches of serverless functions in terms of energy consumption, data transfers, execution time, and number of machines used. It uses a single-constrained optimization Linear Program that, with additional constraints, turns into a multi-objective optimization algorithm.
To make a scheduling decision, Foa-Energy needs access to the platform description, which provides the different clusters and the number of nodes (or machines) per cluster. As direct inputs, Foa-Energy needs to receive the following values per cluster (these inputs are provided in the workload JSON file):
"averageDuration": 5.36,
"averageDurationContainer": 1,
"averageEnergy": 1,
"averageEnergyContainer": 1,
Foa-Energy follows a few notations:
- p: Functions' execution time -> "averageDuration"
- p_tilde: Containers' execution time -> "averageDurationContainer"
- c: Functions' energy consumption -> "averageEnergy"
- c_tilde: Containers' energy consumption -> "averageEnergyContainer"
- N: The number of functions in the batch
- H: Number of clusters available
- K: Number of different containers used
- env: The association of container and function
- mc: The number of machines per cluster
- TMax: A constraint of makespan (maximum execution time)
- CMax: A constraint of amount of data downloaded for containers
The solution is given as two matrices, x and y, where x is the allocation of functions over the clusters and y is the allocation of containers over the clusters.
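To make the notation more concrete, the sketch below shows one possible encoding of x and y as 0/1 matrices and evaluates the energy of a given placement. It is not the Linear Program from the paper. It assumes that x[i][h] is 1 when function i runs on cluster h, that env[i] is the index of the container used by function i, that a container is needed on every cluster hosting at least one function that uses it, and that total energy is the sum of function and container energy. The helper names and the sample values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def containers_from_functions(x: List[List[int]], env: List[int], K: int) -> List[List[int]]:
    """Derive y (K x H) from x (N x H): y[k][h] = 1 if some function
    using container k is placed on cluster h (assumed rule)."""
    H = len(x[0])
    y = [[0] * H for _ in range(K)]
    for i, row in enumerate(x):
        for h, placed in enumerate(row):
            if placed:
                y[env[i]][h] = 1
    return y


def total_energy(x: List[List[int]], y: List[List[int]],
                 c: Matrix, c_tilde: Matrix) -> float:
    """Function energy plus container energy (assumed objective)."""
    functions = sum(x[i][h] * c[i][h]
                    for i in range(len(x)) for h in range(len(x[0])))
    containers = sum(y[k][h] * c_tilde[k][h]
                     for k in range(len(y)) for h in range(len(y[0])))
    return functions + containers


# N = 3 functions, H = 2 clusters, K = 2 containers (made-up values).
env = [0, 0, 1]                       # functions 0 and 1 share container 0
x = [[1, 0], [1, 0], [0, 1]]          # functions 0, 1 on cluster 0; function 2 on cluster 1
c = [[1.0, 2.0], [1.0, 2.0], [3.0, 1.5]]
c_tilde = [[0.5, 0.5], [1.0, 1.0]]

y = containers_from_functions(x, env, K=2)
print(y)                              # [[1, 0], [0, 1]]
print(total_energy(x, y, c, c_tilde)) # 5.0
```

In this example, co-locating the two functions that share container 0 means the container's energy is paid only once.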
For local-level scheduling, a separate algorithm is needed to take Foa-Energy's decisions and perform the local placement.
For more details and a full evaluation of Foa-Energy, please refer to Foa-Energy's paper repository.
For now, the component supports these annotations:
- cores: number of CPU cores
- memory: memory in MB
- locality: the cluster type defined for each cluster, such as "HPC", "Cloud", "Edge", and "On-premise"
- architecture: hardware architecture, one of "x86_64", "arm64"
- optimizationGoal: should be "Energy", "Performance", or "Availability"
- importance: the level associated with this goal: "Low", "Medium", "High"
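The annotations listed above can be checked before a workload is submitted. The following is a minimal sketch, not the component's own validation: the function name `normalize_annotations` is hypothetical, numeric values are accepted both as numbers and as strings because the example workload uses both forms, and treating the importance level as case-insensitive is an assumption.

```python
ARCHITECTURES = {"x86_64", "arm64"}
LOCALITIES = {"HPC", "Cloud", "Edge", "On-premise"}
GOALS = {"Energy", "Performance", "Availability"}
IMPORTANCE_LEVELS = {"low", "medium", "high"}


def normalize_annotations(raw: dict) -> dict:
    """Return a cleaned copy of the supported annotations or raise ValueError."""
    out = {}
    # Numeric annotations: accept 2 as well as "2".
    for key in ("cores", "memory"):
        if key in raw:
            value = int(raw[key])
            if value < 0:
                raise ValueError(f"{key} must not be negative: {value}")
            out[key] = value
    # Enumerated annotations: the value must be one of the listed choices.
    for key, allowed in (("architecture", ARCHITECTURES),
                         ("locality", LOCALITIES),
                         ("optimizationGoal", GOALS)):
        if key in raw:
            if raw[key] not in allowed:
                raise ValueError(f"unsupported {key}: {raw[key]!r}")
            out[key] = raw[key]
    # Importance level: compared case-insensitively (assumption).
    if "importance" in raw:
        level = str(raw["importance"]).lower()
        if level not in IMPORTANCE_LEVELS:
            raise ValueError(f"unsupported importance: {raw['importance']!r}")
        out["importance"] = level
    return out


print(normalize_annotations({"cores": "2", "architecture": "arm64"}))
```

Unknown keys are silently dropped in this sketch; a stricter variant could reject them instead.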
Set up the environment:
poetry install
poetry shell
In the same terminal, run the linter with:
./lint.sh -f
Run the tests:
./test.sh
- Write a FAILING test
- Run it
- Write the code to make the test pass
- Lint your code with ./lint.sh -f
- Run the tests with ./test.sh
- Create a merge request