Comments (10)
Amazing, thank you! I just started designing a proper rule and this is a great help!
from bb-remote-execution.
One of the things I'm always careful about when I add Prometheus metrics is that I make sure not to expose any redundant information. Not only does that blow up the size of /metrics, it also makes it harder to grasp how the values correlate.
Every operation known by bb-scheduler always goes through the following state transitions:
nonexistent -> queued -> executing -> completed -> deleted
We have metrics in place at every state transition:
- nonexistent -> queued: buildbarn_builder_in_memory_build_queue_operations_queued_total
- queued -> executing: buildbarn_builder_in_memory_build_queue_operations_queued_duration_seconds_count
- executing -> completed: buildbarn_builder_in_memory_build_queue_operations_executing_duration_seconds_count
- completed -> deleted: buildbarn_builder_in_memory_build_queue_operations_completed_duration_seconds_count
What do you do if you want to measure the total number of operations in certain states? You simply subtract counters of the adjacent state transitions. For example, if I want to measure the number of operations that are either queued or executing (which is likely what you want to base your autoscaling on, as that represents the total amount of work you could be doing):
buildbarn_builder_in_memory_build_queue_operations_queued_total
-
sum(buildbarn_builder_in_memory_build_queue_operations_executing_duration_seconds_count) without (result, grpc_code)
We need to use the aggregation, because buildbarn_builder_in_memory_build_queue_operations_executing_duration_seconds_count has two additional labels that buildbarn_builder_in_memory_build_queue_operations_queued_total does not have.
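To make the counter arithmetic concrete, here is a toy Python sketch with made-up sample values (on a real system these would come from scraping /metrics):

```python
# Made-up sample values; on a real system these come from /metrics.
# Both metrics are counters, so they only ever increase, and their
# difference is the number of operations currently between the two
# state transitions (i.e., queued or executing).
queued_total = 1500         # nonexistent -> queued transitions so far
executing_completed = 1420  # executing -> completed transitions so far,
                            # summed over the result/grpc_code labels

in_queue_or_executing = queued_total - executing_completed
print(in_queue_or_executing)  # 80 operations queued or executing right now
```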
from bb-remote-execution.
Thanks for a quick and detailed response!
Unfortunately this doesn't play well with the k8s HPA + custom metrics exporter combination, which can't execute arbitrary expressions and requires a named metric. I could work around that issue, though, by recording the provided expression under a name.
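One way around that limitation is a Prometheus recording rule that stores the result of the expression under a single metric name the adapter can reference. A sketch (the group and rule names here are made up; the expression itself is from the previous comment):

```yaml
groups:
  - name: buildbarn-autoscaling
    rules:
      # Hypothetical rule name; pick whatever your metrics adapter expects.
      - record: buildbarn:operations_queued_or_executing
        expr: |
          buildbarn_builder_in_memory_build_queue_operations_queued_total
          -
          sum(buildbarn_builder_in_memory_build_queue_operations_executing_duration_seconds_count) without (result, grpc_code)
```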
from bb-remote-execution.
By the way, here's a somewhat more complex expression that I'd like to invite people to try:
max_over_time(
  quantile_over_time(
    0.95,
    (
      buildbarn_builder_in_memory_build_queue_operations_queued_total
      -
      sum(buildbarn_builder_in_memory_build_queue_operations_executing_duration_seconds_count) without (result, grpc_code)
    )[4h:]
  )[1h:]
)
In short, it takes the 95th percentile of the amount of work that came in over the last four hours. We then do some post-processing by taking the max_over_time over one hour to remove any jitter that would otherwise cause us to unnecessarily kill/spawn/kill/spawn/... workers.
(Note that you may want to tune the constants in the expression above to suit your needs. Also be sure to place it in a recording rule, because it's a bit heavy.)
from bb-remote-execution.
Just want to add that the proposed query is not equivalent to my original idea: it also counts currently executing actions, whereas I was thinking about scaling based only on the actions waiting in the queue.
from bb-remote-execution.
You mean only scaling based on the size of the queue? But that means that for a sufficiently sized cluster (where the queue generally remains empty), such an autoscaler would scale the cluster down to a size of zero...?
from bb-remote-execution.
That's not really an issue if you set the minReplicas of your autoscaler to e.g. 2, right?
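For illustration, a hypothetical autoscaling/v2 HPA manifest along those lines. The Deployment name, metric name, and target value are placeholders that depend on your setup and on what your custom metrics adapter exposes:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bb-worker          # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bb-worker        # placeholder
  minReplicas: 2           # keeps a baseline even when the queue metric is 0
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: buildbarn_operations_in_queue   # placeholder metric name
        target:
          type: AverageValue
          averageValue: "1"
```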
from bb-remote-execution.
Sorry, maybe I wasn't clear. My point was that this:

> I'm always careful about when I add Prometheus metrics is that I make sure not to expose any redundant information.

is not strictly correct, as this information is not redundant: it can't be obtained any other way.

Depending on the type of autoscaler, you may indeed want to use "all actions in progress + in the queue" vs. "all actions in the queue". Right now I've settled on the former, as per the advice above in this thread, but I'd like to experiment with the latter as well. As mentioned, there are challenges, but I think it better matches the goal when your node pool lives in the cloud, is auto-scaled, nodes come in one by one, and you pay for each one spawned. So I'd like my HPA to spawn replicas one by one whenever the queue has been non-empty for some significant period of time.
Upd 1.
The "ideal" algorithm for me would be:
- Spawn an empty node pool with no active replicas (okay, maybe one, just to make builds warm up faster)
- Once the queue has been non-empty for, say, 2 minutes, spawn a second replica. It will not be schedulable due to the lack of hardware, so the cloud will spawn a new node in the pool for me
- If the queue is still non-empty after another 2 minutes, spawn a third one
- Et cetera
- If the queue has been empty for, say, 10 minutes, kill one replica. The node will become free and the cloud will remove it for me
- If the queue is still empty after another 10 minutes, kill another replica
- Et cetera

This way I think I can design the scaling process to minimize infrastructure expenses at night while still providing maximal build speed during the day, when many builds are happening.
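The stepwise up/down logic above can be sketched as a toy simulation (the queue trace and all timings are made up; one tick is one minute):

```python
# Toy simulation of the proposed step-scaling algorithm. One tick = one
# minute; `queue_sizes` is the queue length observed at each tick.
def simulate(queue_sizes, up_after=2, down_after=10, min_replicas=1):
    replicas = min_replicas
    busy = idle = 0
    history = []
    for q in queue_sizes:
        if q > 0:
            busy += 1
            idle = 0
            if busy >= up_after:       # queue non-empty long enough: add one
                replicas += 1
                busy = 0
        else:
            idle += 1
            busy = 0
            if idle >= down_after:     # queue empty long enough: remove one
                replicas = max(min_replicas, replicas - 1)
                idle = 0
        history.append(replicas)
    return history

# 6 minutes of backlog, then 20 minutes of empty queue: the replica count
# steps up to 4, then steps back down one replica per 10 idle minutes.
trace = [5] * 6 + [0] * 20
print(simulate(trace))
```

Note how the simulation exhibits exactly the behavior discussed below: scale-down takes 10 minutes per replica, so the time to drain a large cluster grows linearly with its size.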
from bb-remote-execution.
> Sorry, maybe I wasn't clear. My point was that this:
>
> > I'm always careful about when I add Prometheus metrics is that I make sure not to expose any redundant information.
>
> is not strictly correct, as this information is not redundant: it can't be obtained any other way.
>
> Depending on the type of autoscaler, you may indeed want to use "all actions in progress + in the queue" vs. "all actions in the queue". Right now I've settled on the former, as per the advice above in this thread, but I'd like to experiment with the latter as well.
Just because a concrete recording rule hasn't been provided to you doesn't mean it's not possible. What you are looking for (i.e., the number of actions currently in the queue) can be obtained by using this recording rule:
buildbarn_builder_in_memory_build_queue_operations_queued_total
-
buildbarn_builder_in_memory_build_queue_operations_queued_duration_seconds_count
> As mentioned, there are challenges, but I think it better matches the goal when your node pool lives in the cloud, is auto-scaled, nodes come in one by one, and you pay for each one spawned. So I'd like my HPA to spawn replicas one by one whenever the queue has been non-empty for some significant period of time.
>
> Upd 1.
>
> The "ideal" algorithm for me would be:
>
> - Spawn an empty node pool with no active replicas (okay, maybe one, just to make builds warm up faster)
> - Once the queue has been non-empty for, say, 2 minutes, spawn a second replica. It will not be schedulable due to the lack of hardware, so the cloud will spawn a new node in the pool for me
> - If the queue is still non-empty after another 2 minutes, spawn a third one
> - Et cetera
> - If the queue has been empty for, say, 10 minutes, kill one replica. The node will become free and the cloud will remove it for me
> - If the queue is still empty after another 10 minutes, kill another replica
> - Et cetera
>
> This way I think I can design the scaling process to minimize infrastructure expenses at night while still providing maximal build speed during the day, when many builds are happening.
The problem with such an algorithm is that the speed at which you scale up/down depends on the absolute size of your cluster. If you have a cluster with a hundred replicas and no load, it's going to take 100 * 10 = 1000 minutes = 16 hours, 40 minutes to scale all the way down. The recording rule that I presented, which scales the cluster based on a load quantile, works well regardless of the absolute size of the cluster: it is guaranteed to scale up/down to the desired capacity within a fixed amount of time.
Another problem with such an approach is that the startup time of a node is not factored in either. If all you need is one more server, but it takes 10 minutes to start one up, you will end up with five new servers.
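A quick back-of-the-envelope check of both failure modes; the 2-minute check interval is taken from the algorithm proposed above, the other numbers from this comment:

```python
# Failure mode 1: scale-down time grows linearly with cluster size.
replicas = 100
scale_down_step = 10                    # minutes per replica removed
minutes_to_zero = replicas * scale_down_step
print(minutes_to_zero)                  # 1000 minutes = 16 h 40 min

# Failure mode 2: overshoot while a new node is still booting. The queue
# stays non-empty during the whole startup, so every check adds a replica.
node_startup = 10                       # minutes until a new node is ready
check_interval = 2                      # minutes between scale-up decisions
extra_servers = node_startup // check_interval
print(extra_servers)                    # 5 servers requested while one boots
```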
from bb-remote-execution.
> What you are looking for can be obtained by using this recording rule:
Thanks! But yeah, I wasn't able to find this one myself.
> The problem with such an algorithm is that the speed at which you scale up/down depends on the absolute size of your cluster.
That's why it can be improved by also taking the length of the queue into account. Plus, some manual tuning of the parameters will certainly help. But I definitely see your point; there are challenges to this approach compared to classic scaling based on load.
from bb-remote-execution.