Comments (41)
@derekyyyan @talhof8 you may want to take a look at what we did here: https://docs.openfaas.com/openfaas-pro/retries/
from nats-queue-worker.
@alexellis The way I see it, the fact that you get a 429 at all when using async with concurrency limits is an exposure of implementation detail (which is kind of expected at the moment, because the existing concurrency limiter was clearly designed for sync). When using sync, it's part of the contract of sync functions, but when using async the contract is "handle this eventually, and let me know the results".
Getting a 429 on an async function doesn't make sense (unless it's an application error, such as a 429 on an HTTP request inside the function), because it's basically a statement of "I'm busy", to which the entity doing the async invocation would always respond, "yeah, that's why I said eventually."
Re-queueing an async invocation after a 429 due to concurrency limits would require:
- A bunch of boilerplate on every client that invokes a function async
- Always setting a callback URL
- Listening on said callback URL for a 429 from OF
- Keeping track of requests by ID and their original payloads
- Re-invoking the function every time, until it either fails or succeeds (non-429)
That's a lot of boilerplate and workflow accommodation that in my opinion would be better served being part of the OF framework itself, adding another feature to make sure that things "just work".
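To make that boilerplate concrete, here is a rough sketch of what every client would otherwise have to carry itself. Names like `invoke_async` and `wait_for_callback` are hypothetical placeholders standing in for "make the async call" and "listen on my callback URL", not OpenFaaS APIs.

```python
import time
import uuid

# request id -> original payload, kept so we can re-invoke the same call later
pending = {}

def invoke_with_requeue(invoke_async, wait_for_callback, payload,
                        max_attempts=10, base_delay=1.0):
    """Re-invoke until the callback reports anything other than a 429.

    `invoke_async` and `wait_for_callback` are placeholders the caller
    supplies; this is the per-client state tracking and retry loop the
    comment above argues should live inside the framework instead.
    """
    call_id = str(uuid.uuid4())
    pending[call_id] = payload
    for attempt in range(max_attempts):
        invoke_async(payload, callback_id=call_id)
        status = wait_for_callback(call_id)   # listen on our callback URL
        if status != 429:                     # success, or a real failure
            del pending[call_id]
            return status
        time.sleep(base_delay * 2 ** attempt)  # client-side exponential backoff
    return 429
```

Every client that invokes a function async would need some variant of this loop, plus the `pending` bookkeeping, which is the duplication the comment is objecting to.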
Feel free to try out this prototype in your cluster - it's valid until 7th March to play with.
# Deploy demo
kind create cluster
arkade install openfaas --max-inflight=5 --set queueWorker.image=alexellis2/queue-worker:retry2
# Deploy test function
faas-cli deploy --name sleep --image ghcr.io/openfaas/sleep:latest \
--env sleep_duration=5s \
--env write_timeout=10s \
--env max_inflight=1 \
--env combine_output=false \
--env read_timeout=10s \
--label com.openfaas.scale.min=1
# Monitor (Terminal 1)
kubectl logs -n openfaas deploy/queue-worker -f | grep -v Received
# Now test with (Terminal 2)
time curl -i http://127.0.0.1:8080/async-function/sleep -d "" && sleep 4 && time curl -i http://127.0.0.1:8080/async-function/sleep -d ""
# View the logs in Terminal 1
You'll see the first request goes in-flight, the second gets retried a few times before being processed.
# Now try again and cause congestion with hey, 10 requests, only 1 can run at once.
hey -c 1 -n 10 -m POST -d "" http://127.0.0.1:8080/async-function/sleep
# To tune the minimum number of replicas ready, add to the faas-cli deploy command:
--label com.openfaas.scale.min=5
# To tune the concurrent connections to each function, update faas-cli deploy:
--env max_inflight=2
Will test
The request was for asynchronous concurrency limiting, that is what the preview / demo does.
The ack time has to be set to: (the max wait time for the max retries) + the function timeout.
So if you have a maximum backoff of 32 seconds, and a maximum function timeout of 2 minutes, then you need to set your ACK time to 2m32s.
Each message is either executed to completion, or finds itself facing a retry, then waits once, before being re-queued.
This means that the ACK time is predictable and bounded.
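The bound above can be expressed as a one-line calculation, shown here with the 32-second backoff and 2-minute timeout example from the text:

```python
def required_ack_wait(max_backoff_s: float, function_timeout_s: float) -> float:
    # A message waits at most one backoff interval before being re-queued,
    # so the ACK deadline only has to cover one wait plus one full execution.
    return max_backoff_s + function_timeout_s

# 32s max backoff + 2m (120s) function timeout = 152s, i.e. 2m32s
ack_wait_s = required_ack_wait(32, 120)
```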
Retries are to be expected, that was the feature requested here. OpenFaaS does not track connections to pods, that's the job of a service mesh. Even with a service-mesh, you cannot guarantee that a retried message will hit a pod with open capacity, it can however help that request hit a pod with fewer connections.
There are two mitigations for any concern about "a queue worker being clogged up" as such:
- The max_inflight for a queue-worker can be set to a larger number like 100 (previously, this involved adding more Pods) - thanks to @matthiashanel for that change
- You can have a dedicated queue for slow or concurrency-limited functions
Remember that we are using a very contrived example here, of one function which can only process one request at once.
The 429 messages could be used to trigger an auto-scaling alert, I'm aware of one or two users who do this.
My assumption is that, if the work is completed and the concurrency is handled in a sane, bounded way, then it is irrelevant whether there is capacity in one of five pods and a retry still happens. Show us with real data why that matters? Show us what happens when you add a service mesh.
Whatever the workload, some tuning and experimentation will be required. This is a specialist case and I would like to see some real usage from the demo before considering further changes or features.
While I go do some load testing, a couple more clarifying questions:
> Each message is either executed to completion, or finds itself facing a retry, then waits once, before being re-queued.
What happens if a message has been retried for the max # of times? Are you saying that it gets put back into the queue? Or does the queue worker give up and sends a 429 as the official response to the callback URL?
> The max_inflight for a queue-worker can be set to a larger number like 100

If max_inflight is much larger than the number of available function pods, wouldn't that cause lots of retries? Say, if I have max_inflight=100 and 10 available function pods, wouldn't that mean at any given time, there are 90 messages being retried over and over again? (Assuming the queue has plenty of messages.)
Maybe that's fine? I'll load test that scenario too.
> This is a specialist case and I would like to see some real usage from the demo before considering further changes or features.
I agree that my use case is niche (and perhaps Kevin's too), but I definitely will be using this demo, as the original behavior where the queue worker discards messages when function pods are busy simply doesn't work for me. The extra requests from my previous comment are nice-to-have's, but I could live without those.
That actually sounds like a really elegant way to autoscale; I'll need to test that out.
One note on that: if 429s from the function itself are indistinguishable (in Prometheus) from 429s from the watchdog, it could theoretically create application-level bugs, because normally you'd back off if your functions start getting 429s, rather than increasing the number of pods.
Example:
- function/a: watchdog 429s, autoscale up, which handles additional requests as intended
- function/b: application 429s, autoscale up, which hammers an external API even though we weren't necessarily at capacity, and causes an increased number of invocations in our queue to call back with 429.
This can likely be mostly, if not entirely, mitigated by handling 429s in your application, but it might be something that users need to watch out for.
I've done some load testing, all with 100 requests and a sleep duration of 10s, and using Request Bin to record callbacks:
queue.max_inflight | # pods | time elapsed | expected time (# requests / # pods * sleep duration) | % increase from expected time | # Callbacks received
---|---|---|---|---|---
10 | 1 | 1014s | 1000s | 1.4% | 76
10 | 5 | 343s | 200s | 71.5% | 100
10 | 10 | 168s | 100s | 68% | 100
5 | 1 | 1326s | 1000s | 32.6% | 89
5 | 5 | 276s | 200s | 38% | 100
1 | 1 | 1022s | 1000s | 2.2% | 100
- When a queue worker reaches the max # of retries, it gives up without sending a 429 to the callback URL, as shown by the examples where # Callbacks received is less than 100. I checked the queue worker logs and the number of times it says "giving up" matches the number of missing callbacks. I think queue workers should return a 429 to the callback, so that users can at least detect when messages are being given up.
- It seems that the higher the queue.max_inflight, the less efficient the system becomes, as shown by comparing % increase from expected time against queue.max_inflight. The first row is an exception, but I suspect that only happened because the queue workers were giving up on messages. With max_inflight=10, it takes ~70% more time to handle all the requests, which translates to ~40% loss of efficiency. While I can live with this, it certainly isn't negligible.
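The derived columns in the table can be reproduced with a short calculation; this sketch just applies the formula given in the table header to two of the rows:

```python
def expected_time(requests: int, pods: int, sleep_s: float) -> float:
    # Ideal elapsed time if every pod stays busy:
    # requests spread evenly across pods, each taking sleep_s.
    return requests / pods * sleep_s

def pct_increase(actual_s: float, expected_s: float) -> float:
    return (actual_s - expected_s) / expected_s * 100

# Row "10 | 5 | 343s": 100 requests, 5 pods, 10s sleep -> 200s expected, 71.5% over
increase = pct_increase(343, expected_time(100, 5, 10))

# Row "10 | 10 | 168s": efficiency = expected / actual, so the loss is
# (1 - 100/168) * 100 ≈ 40%, matching the "~40% loss of efficiency" above.
loss = (1 - expected_time(100, 10, 10) / 168) * 100
```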
Hi,
My exposure to 429 is on the function side - not the watchdog side. When developing a function that uses Twilio, I had a bug that caused the Twilio API to respond with a 429 due to my misuse. Enter my first exposure to 429, and the concept of exponential backoff.
At the time, my function logged the 429 and returned a 500, instead of passing through the 429, which was intentional. I then fixed my bug and never thought about this again - until this discussion.
Now that we're here (I see this mentioned above), the use case I am thinking of is:
- Consumer calls function via async
- Function encounters a 429 from an external source
- How do we handle that?
For function retries, it seems like my options are:
- Log the 429 and return a 500, no retry will be done, like I am doing now. This seems crude.
- Implement retry logic in my function; for example, Twilio returns a Retry-After header! I don't want to do this for every function, I'm lazy.
- Pass the 429 back as the async function's response, and let a service mesh do retries somehow. I wonder if I'll be able to share the Retry-After with LinkerD? We'll see!
- Try the code in this branch, to see how OpenFaaS handles functions returning 429 responses.
I am currently leaning towards option 3 to do function retries, using LinkerD to do the retry for me. For example, if my function passes the 429 through using out-of-the-box OpenFaaS (so not the code in this issue), my expectation is that LinkerD will try it again. I can set up this policy to only apply to the async path to the function's service. More updates to follow.
Also, I'll try to test cold starts in a scale-from-zero setting. I'm not sure what to expect there... if the pod isn't alive, the watchdog won't be able to respond, right? So in that case, I am guessing the queue worker would return a 429... perhaps LinkerD can help there too? We'll see.
Just went through the demo more, and have a better understanding of how these settings tune behaviour. I didn't get a chance to do cold start testing, or to test with LinkerD, but will do so later today.
- with 1 replica, and max in-flight 1, can handle 10 incoming requests in roughly 1m54s, in 69 attempts
- with 5 replicas, and max in-flight 1, can handle 10 incoming requests in roughly 20 seconds, in 40 attempts
- with 5 replicas, and max in-flight 2, can handle 10 incoming requests in roughly 8 seconds, in 10 attempts
- with 5 replicas, and max in-flight 2, can handle 10 incoming requests in roughly 6 seconds, in 10 attempts
- with 5 replicas, and max in-flight 2, can handle 10 incoming requests in roughly 5 seconds, in 10 attempts
- with 5 replicas, and max in-flight 2, can handle 10 incoming requests in roughly 5 seconds, in 23 attempts
- with 5 replicas, and max in-flight 2, can handle 100 incoming requests in roughly 2m14s-2m15s, in 349-353 attempts
I'm guessing the variance is due to picking pods at random, instead of free ones? I'll see if the LinkerD TCP connection balancing makes that go away - like you mentioned, Alex.
Thank you for your interest in OpenFaaS. This project is maintained and made available for hobbyists and commercial users alike, so we need to balance our time across everyone's needs. Whilst we are excited in your interest in using OpenFaaS, we would also ask you to take a look at our contribution guide on Setting expectations, support and SLAs.
Commercial users can purchase support in order to get dedicated help from OpenFaaS Ltd, or they can book ad-hoc consulting hours to get an engineer to dedicate time to helping them.
If that is not a good fit for you at this time, please check out the OpenFaaS GitHub Sponsors options which are priced for practitioners like yourself. Organisations can also sponsor through their GitHub billing relationship.
When you become a sponsor as an individual, it will show on your issues and PRs, so that the community can see that you are supporting our work, and can prioritise your needs.
Thank you for supporting OpenFaaS.
Thanks for the suggestion @kevin-lindsay-1 - we will prioritise your request on the roadmap and have a look into it, in due course.
Hi, just wanted to chime in as I'm also running into this exact scenario.
One thing I noticed is that whatever is load balancing the requests to the function pods does not always send the request to an idle pod. For example, if I have 2 pods of a function with max_inflight=1, and then I invoke the function asynchronously twice, there is a chance that both requests go to the same pod, resulting in an HTTP 429 response for one of the requests.
It may be worthwhile to implement some kind of monitor for when a function pod is "at capacity", i.e. handling max_inflight number of requests concurrently, and the queue worker could simply use the monitor to wait for a pod to become available, instead of blindly retrying with backoff. (I don't know if this is realistically possible.)
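The chance of two concurrent requests colliding on the same pod under uniformly random endpoint selection can be sketched with a birthday-problem calculation (illustrative only, not OpenFaaS code):

```python
from math import prod

def p_no_collision(requests: int, pods: int) -> float:
    # Probability that `requests` simultaneous invocations all land on
    # distinct pods, when each endpoint is picked uniformly at random.
    if requests > pods:
        return 0.0
    return prod((pods - i) / pods for i in range(requests))

# Two requests over two pods: a 50% chance both hit the same pod,
# consistent with the behaviour described above.
p_same = 1 - p_no_collision(2, 2)
```

Even with 5 requests over 5 pods, the chance that every request finds its own pod is only 5!/5^5 ≈ 3.8%, so some retries under random balancing are expected rather than a bug.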
> One thing I noticed is that whatever is load balancing the requests to the function pods does not always send the request to an idle pod
Sounds like you may not have direct_functions set to false, which causes the gateway to use a Kubernetes service instead of the endpoint load balancing logic in faas-netes. See the guide on production use in the docs and upgrade to the latest version of the chart. @derekyyyan
I can set direct_functions explicitly to false, although the docs for the faas-netes helm chart suggest that it already defaults to false.
I'm also not using Linkerd or Istio.
I'll try replicating the problem when I get the chance, but here are my helm chart settings in case you can spot something I'm doing wrong. I'm on version 7.1.0, so it should be mostly the same as the latest (7.2.0).
values.yaml
functionNamespace: openfaas-fn
httpProbe: false
psp: true
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx-private
    cert-manager.io/cluster-issuer: letsencrypt-production
  tls:
    - hosts:
        - <redacted>
      secretName: openfaas-tls-cert
  hosts:
    - host: <redacted>
      serviceName: gateway
      servicePort: 8080
      path: /
faasnetes:
  httpProbe: false
  imagePullPolicy: Always
  setNonRootUser: true
gateway:
  replicas: 2
queueWorker:
  replicas: 10
  ackWait: 1m
faasIdler:
  create: false
@derekyyyan this is a different topic to the one that Kevin opened, so we are going off piste here and you should raise your own ticket.
However.
Can you show us how to reproduce your hypothesis that endpoint load-balancing isn't working for you?
> One thing I noticed is that whatever is load balancing the requests to the function pods does not always send the request to an idle pod
What makes you think this is the case?
We've done quite extensive testing and see load-balancing taking place aggressively. I'm including output to show it happening as expected. You are welcome to try the same commands and monitor the results.
faas-cli store deploy nodeinfo --label com.openfaas.scale.min=5 -g 192.168.0.26:31112
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-fjpf2
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-fjpf2
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-8nvll
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-5wt4c
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-47ghl
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-8nvll
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-47ghl
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-47ghl
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-5wt4c
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-5wt4c
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$
Async:
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/async-function/nodeinfo -d ""
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/async-function/nodeinfo -d ""
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/async-function/nodeinfo -d ""
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/async-function/nodeinfo -d ""
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/async-function/nodeinfo -d ""
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/async-function/nodeinfo -d ""
alex@alex-nuc8:~$ kail -n openfaas-fn
openfaas-fn/nodeinfo-6dbd4cb849-pbsjb[nodeinfo]: 2021/02/24 18:04:53 POST / - 200 OK - ContentLength: 106
openfaas-fn/nodeinfo-6dbd4cb849-z46dn[nodeinfo]: 2021/02/24 18:04:53 POST / - 200 OK - ContentLength: 106
openfaas-fn/nodeinfo-6dbd4cb849-cl2kg[nodeinfo]: 2021/02/24 18:04:53 POST / - 200 OK - ContentLength: 106
openfaas-fn/nodeinfo-6dbd4cb849-pbsjb[nodeinfo]: 2021/02/24 18:04:53 POST / - 200 OK - ContentLength: 106
openfaas-fn/nodeinfo-6dbd4cb849-dfds6[nodeinfo]: 2021/02/24 18:04:53 POST / - 200 OK - ContentLength: 106
@kevin-lindsay-1 to focus back on your original suggestion.
> However, the same behavior exists when using async functions; if you set queue.max_inflight=5 and of-watchdog.max_inflight=1, and the queue worker attempts to send that request to a pod that already has 1 invocation in progress, it receives the same 429 and forwards it off to the gateway as your official response. I propose that a queue worker would instead retry on 429s, or a similar retry mechanism (as a 429 is a completely valid response for functions at the moment).
A retry has been implemented by at least one user that I know of, and makes for a more natural fit for async functions than synchronous because:
- The body is small
- The request is already recorded, so doesn't need to be buffered
My concerns are:
- You may never be able to wait long enough to retry; what if the job takes 10 hours?
- You will be blocking other executions from running
- The retry and "busy wait" sound like they would be stateful, and that state has to be managed somewhere
- Would this open the gate to requests for retrying on other HTTP codes?
- What if the 429 is not from the watchdog, but from a Twitter API or another API that will eventually ban you if you simply do an exponential backoff and don't use its custom HTTP headers for retrying?
@alexellis I didn't mean to derail the original request - I'll create a separate issue if I get the chance to reproduce the behavior.
Although if we get the original feature implemented, I won't have to worry about queue workers sending requests to a "busy" pod as they can simply retry later.
Thanks for sharing your expectations Kevin. When you have some time, I'd appreciate a reply to the questions that I asked and the concerns on retrying.
@alexellis Ah, sorry, I wasn't sure if those concerns were around the OF implementation or the user implementation.
> You may never be able to wait long enough to retry, what if the job takes 10 hours?

Does ack_wait not already account for this?
> You will be blocking other executions from running
I use separate queues for my async functions already because the timing of specific jobs can get fairly difficult to configure with one global queue. A little too much mental overhead for something that's pretty easy to fix with multiple queues, considering how lightweight queue workers are.
> The retry and "busy wait" sounds like it would be stateful, and that state has to be managed somewhere
Agreed, however if I recall correctly, the ack_wait implementation handles retry nicely as it is (afaik it uses a memory-based state system), so you could theoretically avoid such additional state in storage or memory by not considering a function "in progress" if you get a 429 from the watchdog (or a similar contract), and therefore not releasing the request in memory until you get a 302 or the like from a pod, potentially avoiding a bunch of additional state code. In other words, if you put the operation of retrying for concurrency before you actually consider the message "queued", you may be able to avoid more expensive state.
If this wasn't implemented in OF I would have to do it myself, and it would likely require the same if not more expensive state, because I don't have the kind of contextual information that something internal to OF would have, so I would have to more-or-less keep track of "everything" in order to invoke the same function with the exact same headers and payload, as well as cleaning out such a stateful system.
> Would this open the gate to requests for retrying on other HTTP codes?
I would say that this is a contract between watchdog and the queue worker, and should therefore be only their collective concern, that is the scope of HTTP codes shouldn't increase unless the flow changes; application-level retry would maybe be a different feature request, but if that one day gets requested and implemented, a distinction between the watchdog and the function/handler itself may need to be made explicit in the HTTP request between the watchdog and queue-worker, if it's not already.
> What if the 429 is not from the watchdog, but from a Twitter API or another API that will eventually ban you, if you simply do an exponential backoff, and don't use its custom HTTP headers for retrying?
As far as I'm aware, with async the watchdog should return a 202, and if the function errors, its runtime pod should respond by sending an error response in the context, meaning that said 429 should be treated as the function response, rather than the queue response.
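The contract distinction being argued for here could be sketched as a small decision function in the queue-worker. This is hypothetical: the "X-Served-By: of-watchdog" marker header is an invented illustration of how a watchdog-level 429 might be distinguished, not an existing OpenFaaS header.

```python
def should_requeue(status_code: int, headers: dict) -> bool:
    """Decide whether a 429 is a capacity signal (retry) or an answer (don't).

    Assumes a hypothetical marker header identifying responses generated
    by the watchdog itself rather than by the function's handler.
    """
    if status_code != 429:
        return False  # only 429s are retry candidates in this scheme
    # Retry only if the watchdog itself rejected the call; an application
    # 429 (e.g. relayed from the Twilio API) is the function's final answer.
    return headers.get("X-Served-By") == "of-watchdog"
```

With something like this, application-level 429s would flow to the callback URL as normal responses, while watchdog-level 429s would stay inside the retry loop.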
@alexellis I have tested this, playing with the minimum number of pods, and it seems to work quite well.
The only thing that I'm wondering about is that when the queue's max_inflight and the number of pods are the same, I still see logs stating that functions were being retried. Shouldn't that not happen, assuming that the load balancing routes to a non-busy pod and the queue worker has a pod available at all times for each invocation?
Specifically, I set the flag --label com.openfaas.scale.min=5 and used hey -n 100.
Thanks for trying it with a basic scenario.
OpenFaaS doesn't know anything about individual pods, endpoint IPs, or whether a pod is receiving more or less traffic than has been configured for its watchdog. Endpoints are picked at random. You'll need a service mesh if you want "least connections" as the semantics. Even then, you will see some retries.
Perhaps you can try it with a staging workload now to see how well it processes your specific needs? The max retry limit is 10 attempts, with 100 or 500ms as the initial delay for the backoff.
I ran a quick test, this is what I got:
Max retries = 10
Backoff delay = 100ms, 200ms, 400ms, 800ms, 1600ms, 3200ms, 6400ms (etc)
Total async requests made = 20
Replicas of function = 5 (min)
Max concurrent in function = 1
Time for workload in function = 5s
Queue-worker processes = 5
Results:
Completed = 20
429s = 82
First message = 08:21:37
Last message = 08:22:36
Expected sync time - (20x5s = 100s)
Actual async time - 59s
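The backoff schedule listed above can be generated, and it bounds the worst-case added wait per message. This is a sketch assuming pure doubling from 100ms across all 10 retries, as in the delays shown:

```python
def backoff_schedule(initial_ms: int = 100, retries: int = 10):
    # Exponential doubling, matching the delays listed above:
    # 100ms, 200ms, 400ms, 800ms, 1600ms, 3200ms, 6400ms, ...
    return [initial_ms * 2 ** i for i in range(retries)]

delays = backoff_schedule()
# Total worst-case wait added across all 10 retries, in seconds:
# 100ms * (2^10 - 1) = 102300ms ≈ 102s
total_s = sum(delays) / 1000
```

This is why the ~102s worst-case backoff, together with the function timeout, is what bounds the ACK deadline discussed earlier in the thread.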
@alexellis, thank you for making a prototype so quickly.
I was able to replicate what @kevin-lindsay-1 reported: I set --label com.openfaas.scale.min=5 and queueWorker.maxInflight=5, then made 5 requests, and the queue worker logs showed that it had to retry one of the requests 3 times, even though there were enough pods to accommodate all 5 requests.
This behavior is what I was originally referring to, sorry if I failed to explain clearly.
> Endpoints are picked at random. You'll need a service mesh if you want "least connections" as the semantics.
I see. I think having a basic retry will inevitably lead to inefficient use of pods, but for my use case this is acceptable, and I can try using a service mesh to improve efficiency.
Couple more questions/requests:
- Ability to configure the max number of retries, potentially allowing unlimited retries.
- Would it make sense to cap the backoff delay at the value of ackWait? Since a function should never run longer than ackWait, it makes sense to me that the queue worker shouldn't wait any longer than that before retrying.
- I can imagine a situation where one request just keeps hitting a busy pod and increasing its backoff delay, while other requests are being handled, resulting in that one request being stuck forever. While I don't need the requests to be handled strictly in order, it would be great if queue workers could prioritize requests that have been pending for the longest amount of time.
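The oldest-first prioritization in the last point could be sketched with a min-heap keyed on the original enqueue time, so a message that keeps hitting busy pods is never starved by newer arrivals. This is illustrative only, not the queue-worker's implementation:

```python
import heapq
import itertools
import time

# Tie-breaker so messages with identical timestamps never get compared directly
_counter = itertools.count()

def push_retry(heap, message, enqueued_at=None):
    """Schedule a message for retry, keyed by when it was first enqueued."""
    if enqueued_at is None:
        enqueued_at = time.time()
    heapq.heappush(heap, (enqueued_at, next(_counter), message))

def pop_oldest(heap):
    """Take the message that has been pending the longest."""
    _enqueued_at, _, message = heapq.heappop(heap)
    return message
```

Because the key is the first enqueue time rather than the next retry time, a message's position in the queue only improves as it waits, which is exactly the anti-starvation property being asked for.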
> What happens if a message has been retried for the max # of times? Are you saying that it gets put back into the queue? Or does the queue worker give up and send a 429 as the official response to the callback URL?
I imagine the latter, but it would be nice to know this explicitly as part of what I assume will be in a follow-up PR to the documentation.
> If max_inflight is much larger than the number of available function pods, wouldn't that cause lots of retries? Say, if I have max_inflight=100 and 10 available function pods, wouldn't that mean at any given time, there are 90 messages being retried over and over again? (Assuming the queue has plenty of messages.)
> Maybe that's fine? I'll load test that scenario too.
I use a different queue for each async job because of fine tuning like this (mainly to remove ambiguity), but just looking at your description of the setup, I'd say unless you have a pool of other async jobs that are going to be queued at the same time, you probably shouldn't specifically tell the system to effectively overload the pods. You can likely remove this issue by autoscaling and adding additional pods to handle the workload.
> The 429 messages could be used to trigger an auto-scaling alert, I'm aware of one or two users who do this.

> That actually sounds like a really elegant way to autoscale; I'll need to test that out.

> This is a specialist case and I would like to see some real usage from the demo before considering further changes or features.
From the testing I've done, this is a major step and I'm happy enough with the current behavior; I'll need to evaluate it over time and see if I hit any pain points, but as it stands I don't have any tangible items that would make me think this solution isn't satisfactory. I can't think of any additional things that this feature needs right now.
As long as this feature exposes tuning parameters, I'm sure I can take it a long way before encountering any new needs.
> you probably shouldn't specifically tell the system to effectively overload the workload of the pods. You can likely remove this issue by autoscaling and adding additional pods to handle the workload.
I think the ideal solution would be to dynamically scale either queue.max_inflight or the number of queue worker pods along with the function pods. That way the concurrency capacity of the queue workers and the function would be in sync. (Unless I'm mistaken, this feature doesn't exist yet. Though this is probably worth a separate issue.)
@derekyyyan you are missing function.max_inflight from your list, and I think you need to test with different durations. Not everyone is going to have a function that always takes exactly 10s to complete, or that only has a fixed number of replicas.
Do you have a use-case for this request yet @derekyyyan, or is this mostly academic? I'm asking because you will need to do some testing and tuning depending on your specific use-cases, probably at a queue-worker level. Where you quote a "36% decrease in efficiency" - this is a specific example of where tuning would be beneficial.
One replica processing 100 requests with a concurrency level of 5 doesn't make a lot of sense.
The demo that I published doesn't have configuration for the backoff delay, or the maximum number of retries. It also doesn't support dead-letter behaviour, like you mentioned.
For scaling up, try something like:
- alert: APIHighRetryRate
  expr: sum(rate(gateway_function_invocation_total{code="429"}[10s])) BY (function_name) > 1
  for: 5s
  labels:
    service: gateway
    severity: major
  annotations:
    description: High retry rate on "{{$labels.function_name}}"
    summary: High retry rate on "{{$labels.function_name}}"
Run kubectl edit -n openfaas configmap/prometheus-config to add it to your config.
This is what it looks like, but beware that the number of 429s will change over time, starting quickly and then backing off as the exponential retry time goes up.
Thank you for sharing those results. There are quite a few variables here including the average duration of a function execution.
For Linkerd, just make sure that you have disabled endpoint load-balancing so that it can do its thing (direct_functions=true)
Wow, it's very balanced with --set gateway.directFunctions=true and when using LinkerD.
OpenFaaS is chewing through the queue and the demand is spread out evenly amongst the sleep function pods. Granted, I am using the sleep function, so there's no variance with each request. But still... cool.
Thanks for sharing a demo!
That sounds even better than expected. Do you have a similar number to share, like your previous test, to compare with?
What about some of those screenshots?
@alexellis as it pertains to the prometheus rule for 429s, if this behavior is consistent across async and sync functions, is it possible that this could be the default autoscaling rule?
Here are some arguments I might expect against such a change:
Concurrency limits are not required.
To me, it seems that concurrency limits are effectively enforced even if not explicitly provided, by pod CPU/mem limits. I would argue that anyone creating a function should be aware of their function's limits, and since limits have reasonable defaults for CPU and mem, I would argue that concurrency should probably also have a reasonable default, because eventually, a pod is likely to crash if it takes on too many requests, which is kind of just another form of concurrency limiting. Like a lot in OF, it's up to the person writing the function to know how CPU and memory intensive the function is, and as such should probably be expected to know how many invocations it can handle at once before hitting said limits.
I don't want 429s to be returned by watchdog under any circumstances; I don't want to use the concurrency limiting feature.
A potential issue with this approach, to me, is that if you don't limit concurrency, you could crash the pod, which I would argue is worse than having to retry, especially if your function is stateful or non-idempotent. Even though those issues would be caused by a lack of following best practices, I would argue that if OF can protect users from such a case, it probably should, because if a developer isn't willing to retry a function, who's to say they're willing to write stateless or idempotent functions?
Additionally, there are likely tricks that could be used to lower the likelihood of receiving a 429 in sync invocations, but I'm not sure that's necessary, as you can auto-retry with most HTTP tools with minimal effort.
As a side effect, returning 429s by default would improve visibility: identifying a 429 is easy, if not automatic, whereas tracing an error back to a pod crash is much more difficult to diagnose and troubleshoot.
Another likely beneficial side effect: having the user be explicit about their concurrency limits will probably also lead them to think about tuning their pod's resource limits, which might steer users toward many small pods rather than a few large ones when they might not otherwise consider it.
Any updates/plans?
I encountered the same scenario. Until then, I'll fork this repo and re-enqueue 429s.
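For anyone taking the same fork-and-patch route, the core decision can be sketched in Go like this. The `message` type, `publish` callback, and `maxRequeues` cap are stand-ins for illustration; the real queue-worker's NATS Streaming plumbing is omitted:

```go
package main

import (
	"fmt"
	"net/http"
)

// message stands in for a queued invocation; the real queue-worker
// also carries headers, the callback URL, and so on.
type message struct {
	Body     []byte
	Attempts int
}

const maxRequeues = 10 // illustrative cap to avoid re-queueing forever

// handle decides what to do with a function's response status.
// publish is a stand-in for re-publishing onto the NATS subject.
func handle(statusCode int, msg message, publish func(message) error) error {
	if statusCode != http.StatusTooManyRequests {
		return nil // success or a genuine error: ack as usual
	}
	if msg.Attempts >= maxRequeues {
		return fmt.Errorf("dropping message after %d attempts", msg.Attempts)
	}
	msg.Attempts++
	return publish(msg) // back on the queue for a later retry
}

func main() {
	err := handle(http.StatusTooManyRequests, message{}, func(m message) error {
		fmt.Println("re-enqueued, attempt", m.Attempts)
		return nil
	})
	if err != nil {
		fmt.Println(err)
	}
	// prints: re-enqueued, attempt 1
}
```

A delay before re-publishing (or a per-message backoff field) is worth adding in practice, otherwise a saturated function gets hammered with immediate retries.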
@talhof8 as far as I am aware, resolving this issue would probably require a slightly different set of features, and is only waiting to be prioritized (enough demand) and incubated.
For scaling up, try something like:
```yaml
- alert: APIHighRetryRate
  expr: sum(rate(gateway_function_invocation_total{code="429"}[10s])) BY (function_name) > 1
  for: 5s
  labels:
    service: gateway
    severity: major
  annotations:
    description: High retry rate on "{{$labels.function_name}}"
    summary: High retry rate on "{{$labels.function_name}}"
```
@alexellis I notice that this is in the helm chart, but it doesn't appear to be configurable. Do you think it would be reasonable to create an issue for this?
I'm thinking of making a PR for this, but I'd first like to discuss whether you think it would be a good idea to make other things in there configurable as well.
Perhaps as a separate issue, it might also be nice to be able to choose which alert is used for autoscaling, so that one function can use a certain rule and others a different one. That would have a number of advantages, such as safely testing scaling behavior.
I would be happy to see an issue. Thanks for commenting.
This thread can probably be closed due to the release of the pro queue-worker
openfaas/faas-netes#851
> This thread can probably be closed due to the release of the pro queue-worker openfaas/faas-netes#851
Agreed
@alexellis openfaas/faas-netes#852