Comments (6)
Reproducing @rdner's original concerns with having the PublishEvents call block:

1. For Example 2 it's the lack of observability. From the client's perspective, this will look like just a slow server. The client would not know about the reason why the request is so slow.
2. A lot of clients will compete for the queue to become empty and this solution does not really solve anything regarding this point. We could, of course, implement a very sophisticated ordering system that will do "first come – first served" for the queue publishing requests but it will complicate things and might make certain requests extremely slow.
from elastic-agent-shipper.
I personally prefer PublishEvents as a blocking call. I think it simplifies the client implementation: clients do not need to worry about retry strategies, they can simply call PublishEvents and wait for it to finish.

It avoids having to deal with issues where a client has chosen a retry strategy poorly, in the worst case pointlessly consuming CPU cycles busy-waiting on the queue or introducing unnecessary latency with an overly long retry loop.

It also provides a natural and obvious back-pressure signal: the input client should simply stop sending us new data.
The shipper currently integrates with the memory queue producer; see elastic-agent-shipper/queue/queue.go, lines 57 to 64 at commit cf6513d.
Looking at the Beats implementation of Publish for the memory and disk queues, it is actually blocking by default. If we wanted non-blocking behaviour we would need to switch to TryPublish in the queue wrapper instead; a sketch of both options follows the links below.
- Beats' memory queue publish implementation: https://github.com/elastic/beats/blob/e6db9a53f2fff60b350153b03407bc38b21bf0b1/libbeat/publisher/queue/memqueue/produce.go#L128-L136
- Beats' disk queue publish implementation: https://github.com/elastic/beats/blob/e6db9a53f2fff60b350153b03407bc38b21bf0b1/libbeat/publisher/queue/diskqueue/producer.go#L52-L62
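For illustration, here is a minimal sketch of what the two designs look like at the queue-wrapper level. The Producer interface below is a simplified stand-in for the Beats queue producer linked above (the real signatures differ), and ErrQueueIsFull is a hypothetical error value, not something the shipper defines today:

```go
package queue

import "errors"

// ErrQueueIsFull is a hypothetical error the shipper could return to
// clients under the non-blocking design.
var ErrQueueIsFull = errors.New("queue is full")

// Producer is a simplified stand-in for the Beats queue producer:
// Publish blocks while the queue is full, TryPublish never blocks.
// Both report whether the event was accepted.
type Producer interface {
	Publish(event interface{}) bool
	TryPublish(event interface{}) bool
}

// Queue wraps a producer and shows both designs side by side.
type Queue struct {
	producer Producer
}

// Publish is the blocking design: the gRPC handler simply waits, and
// back pressure propagates to the input naturally.
func (q *Queue) Publish(event interface{}) error {
	if !q.producer.Publish(event) {
		return errors.New("queue closed before the event was accepted")
	}
	return nil
}

// TryPublish is the non-blocking alternative: the caller gets
// ErrQueueIsFull and must pick its own retry policy.
func (q *Queue) TryPublish(event interface{}) error {
	if !q.producer.TryPublish(event) {
		return ErrQueueIsFull
	}
	return nil
}
```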
With regards to @rdner's concerns above:

> For Example 2 it's the lack of observability. From the client's perspective, this will look like just a slow server. The client would not know about the reason why the request is so slow.

When a client learns that the queue is full, it has only one choice: wait some period of time and retry. If we return a QueueIsFull error, we delegate the choice of retry policy to the client; if we block, we provide the optimal retry policy: the call succeeds immediately once space is free. The retry loop a non-blocking API pushes onto clients is sketched below.

I am not worried about an input's ability to observe the queue state, but about the agent user or operator's ability to observe it. That we can provide through monitoring of the shipper itself.
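To make the trade-off concrete, here is a hedged sketch of that client-side retry loop. It assumes the server would map a full queue to the gRPC ResourceExhausted status code; the function name and backoff constants are hypothetical:

```go
package main

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// publishWithRetry shows the polling loop a client must write if the
// shipper returns an error instead of blocking. publish stands in for
// the generated PublishEvents stub.
func publishWithRetry(ctx context.Context, publish func(context.Context) error) error {
	backoff := 50 * time.Millisecond // an initial guess the client must make
	for {
		err := publish(ctx)
		if status.Code(err) != codes.ResourceExhausted {
			return err // success, or an error that retrying won't fix
		}
		// The server knows exactly when space frees up; the client can
		// only guess. Too short a backoff wastes CPU, too long adds
		// avoidable latency. Blocking on the server removes this guess.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < 5*time.Second {
			backoff *= 2
		}
	}
}
```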
> A lot of clients will compete for the queue to become empty and this solution does not really solve anything regarding this point. We could, of course, implement a very sophisticated ordering system that will do "first come – first served" for the queue publishing requests but it will complicate things and might make certain requests extremely slow.

Correct, we can't do anything about this without a more complex design. A retry-with-backoff polling approach and a blocking approach behave the same way here: which client "wins" access to the queue is currently non-deterministic.
from elastic-agent-shipper.
@faec or @leehinman do you have any thoughts to add here?
from elastic-agent-shipper.
I think I prefer a fully blocking call if it's practical. Having RPCs routinely block for such a potentially long time makes me nervous abstractly, but since it's over local transport in our case I think it's ok. As far as concern (1), I agree this is possibly a false freedom for the client, since the knowledge that the queue is full shouldn't change the input's behavior (accumulate input data if it can, drop it if it must). For (2), when multiple clients are inserting into the queue in current implementations, having all of them block while waiting on progress should ensure eventual progress for each individual client (effectively they'll all be targeting the same channel, so progress should be as fair as the goroutine scheduler is).
One thing I'm uncertain of with blocking on everything is how we handle shutdown, since that's the only context where we'd practically need to know about partial progress in a long-running call. I believe our current expectation is that inputs will try to shut down before the shipper itself, so if it's technically feasible it would be nice to have some clean way to close off an active request and fetch the partial progress (this would be less relevant with non-blocking or blocking-on-first-event since that automatically gives more regular updates).
from elastic-agent-shipper.
I'm in favor of a blocking call, and the client should use a deadline on the gRPC call so they don't wait forever. If the client wants a more complex retry they can implement one based on the call terminating with DEADLINE_EXCEEDED, or they could just try again (see the sketch below).

In the server we should probably also check whether the context is cancelled, so we can stop using resources when the client has given up and we can't report back to them.
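A minimal sketch of both halves of that suggestion. The request/reply types and queue field are hypothetical stand-ins for the generated shipper gRPC types, but context.WithTimeout on the client and the ctx.Done() check on the server are the standard Go/gRPC patterns:

```go
package main

import (
	"context"
	"time"
)

// Hypothetical stand-ins for the generated shipper gRPC types.
type PublishRequest struct{ Events []interface{} }
type PublishReply struct{}

type ShipperClient interface {
	PublishEvents(ctx context.Context, req *PublishRequest) (*PublishReply, error)
}

// Client side: bound the blocking call with a deadline so we never
// wait forever. If it expires, gRPC terminates the call with
// DEADLINE_EXCEEDED and the client decides whether to call again.
func publishWithDeadline(client ShipperClient, req *PublishRequest) (*PublishReply, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	return client.PublishEvents(ctx, req) // blocks until queue space or deadline
}

// Server side: stop feeding the queue once the client has given up,
// since we can no longer report the result back to them. Ideally the
// underlying blocking Publish would also be context-aware.
type server struct {
	queue interface{ Publish(event interface{}) error }
}

func (s *server) PublishEvents(ctx context.Context, req *PublishRequest) (*PublishReply, error) {
	for _, ev := range req.Events {
		select {
		case <-ctx.Done():
			return nil, ctx.Err() // client cancelled or deadline passed
		default:
		}
		if err := s.queue.Publish(ev); err != nil {
			return nil, err
		}
	}
	return &PublishReply{}, nil
}
```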
from elastic-agent-shipper.
Closing in favor of the implementation issue: #84
from elastic-agent-shipper.