
Comments (6)

cmacknz commented on July 18, 2024

Reproducing @rdner's original concerns with having the PublishEvents call block:

  1. For Example 2 it's the lack of observability. From the client's perspective, this will just look like a slow server. The client would not know why the request is so slow.

  2. A lot of clients will compete for space once the queue has room, and this solution does not really solve anything on this point. We could, of course, implement a very sophisticated ordering system that does "first come – first served" for queue publishing requests, but it would complicate things and might make certain requests extremely slow.

from elastic-agent-shipper.

cmacknz commented on July 18, 2024

I personally prefer PublishEvents as a blocking call. I think it simplifies the client implementation in that clients do not need to worry about retry strategies; they can simply call PublishEvents and wait for it to finish.

It avoids having to deal with issues where a client has chosen a retry strategy poorly, in the worst case pointlessly consuming CPU cycles busy-waiting on the queue or introducing unnecessary latency with an overly long retry interval.

It also provides a natural and obvious back pressure signal where the input client should simply stop sending us new data.
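
As a rough sketch of how simple that makes an input client, a blocking PublishEvents call needs no retry logic at all (the pb import path, ProducerClient, and request type below are illustrative stand-ins, not the shipper's actual generated API):

package main

import (
    "context"

    pb "example.com/shipper/api" // illustrative stand-in for the generated gRPC client package
)

// publishBatch sends one batch and simply waits. While the queue is full the
// RPC blocks, and that is the back pressure signal: the input stops pulling
// new data until the call returns.
func publishBatch(ctx context.Context, client pb.ProducerClient, batch *pb.PublishRequest) error {
    _, err := client.PublishEvents(ctx, batch)
    return err
}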

The shipper currently integrates with the memory queue producer:

    // Tail of the queue constructor: create a producer for the wrapped Beats queue.
    producer := eventQueue.Producer(beatsqueue.ProducerConfig{})
    return &Queue{eventQueue: eventQueue, producer: producer}, nil
}

// Publish hands a single event to the queue producer and reports ErrQueueIsFull
// if the producer rejects it.
func (queue *Queue) Publish(event *messages.Event) (EntryID, error) {
    if !queue.producer.Publish(event) {
        return EntryID(0), ErrQueueIsFull
    }
    // remainder of the excerpt elided

Looking at Beats' implementation of Publish for the memory and disk queues, it is actually blocking by default. If we wanted non-blocking behaviour we would need to switch to TryPublish in the queue wrapper (see the sketch after the links below).

  1. Beats' memory queue publish implementation: https://github.com/elastic/beats/blob/e6db9a53f2fff60b350153b03407bc38b21bf0b1/libbeat/publisher/queue/memqueue/produce.go#L128-L136
  2. Beats' disk queue publish implementation: https://github.com/elastic/beats/blob/e6db9a53f2fff60b350153b03407bc38b21bf0b1/libbeat/publisher/queue/diskqueue/producer.go#L52-L62
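
For contrast, a non-blocking variant of the wrapper above only needs to swap the producer call. This is a sketch against the same types as the excerpt, with the entry-ID handling elided just as it is there:

// TryPublish rejects the event immediately when the queue has no room,
// instead of waiting for space the way the blocking Publish call does.
func (queue *Queue) TryPublish(event *messages.Event) (EntryID, error) {
    if !queue.producer.TryPublish(event) {
        return EntryID(0), ErrQueueIsFull
    }
    return EntryID(0), nil // entry ID generation elided, as in the excerpt above
}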

With regards to @rdner's concerns above:

For Example 2 it's the lack of observability. From the client's perspective, this will just look like a slow server. The client would not know why the request is so slow.

When a client learns that the queue is full, it only has one choice: wait some period of time and retry. If we return a QueueIsFull error we delegate the choice of retry policy to the client; if we block we provide the optimal retry policy: succeeding immediately once space is free.
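
For comparison, this is roughly the loop every client would otherwise have to write and tune itself, assuming a hypothetical queue-full condition surfaced as a gRPC ResourceExhausted status (the error mapping, backoff values, and pb types are all illustrative):

package main

import (
    "context"
    "time"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"

    pb "example.com/shipper/api" // illustrative stand-in for the generated gRPC package
)

// publishWithRetry is what each client ends up writing if PublishEvents
// returns a queue-full error instead of blocking: every client picks its own
// backoff, and a poor choice either busy-waits or adds needless latency.
func publishWithRetry(ctx context.Context, client pb.ProducerClient, batch *pb.PublishRequest) error {
    backoff := 100 * time.Millisecond
    for {
        _, err := client.PublishEvents(ctx, batch)
        if err == nil {
            return nil
        }
        if status.Code(err) != codes.ResourceExhausted {
            return err // not a queue-full condition, give up
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff):
        }
        if backoff < 5*time.Second {
            backoff *= 2 // crude exponential backoff, capped
        }
    }
}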

I am not worried about an input's ability to observe the queue state, but about the agent user or operator's ability to observe it, which we can provide through monitoring of the shipper itself.

A lot of clients will compete for space once the queue has room, and this solution does not really solve anything on this point. We could, of course, implement a very sophisticated ordering system that does "first come – first served" for queue publishing requests, but it would complicate things and might make certain requests extremely slow.

Correct, we can't do anything about this without a more complex design. Both a retry-with-backoff polling approach and a blocking approach have the same behaviour here: the client that "wins" access to the queue is non-deterministic right now.

from elastic-agent-shipper.

cmacknz commented on July 18, 2024

@faec or @leehinman do you have any thoughts to add here?

from elastic-agent-shipper.

faec commented on July 18, 2024

I think I prefer a fully blocking call if it's practical. Having RPCs routinely block for such a potentially long time makes me nervous in the abstract, but since it's over local transport in our case I think it's ok. As for concern (1), I agree this is possibly a false freedom for the client, since the knowledge that the queue is full shouldn't change the input's behavior (accumulate input data if it can, drop it if it must). For (2), when multiple clients are inserting into the queue in current implementations, having all of them block while waiting on progress should ensure eventual progress for each individual client (effectively they'll all be targeting the same channel, so progress should be as fair as the goroutine scheduler is).

One thing I'm uncertain of with blocking on everything is how we handle shutdown, since that's the only context where we'd practically need to know about partial progress in a long-running call. I believe our current expectation is that inputs will try to shut down before the shipper itself, so if it's technically feasible it would be nice to have some clean way to close off an active request and fetch the partial progress (this would be less relevant with non-blocking or blocking-on-first-event since that automatically gives more regular updates).

from elastic-agent-shipper.

leehinman commented on July 18, 2024

I'm in favor of a blocking call, and the client should use a deadline on the gRPC call so they don't wait forever. If the client wants a more complex retry strategy, they can build one around the call terminating with DEADLINE_EXCEEDED, or they could just try again.

On the server we should probably also check whether the context has been cancelled, so we stop using resources when the client has given up and we can no longer report back to them.
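
A minimal sketch of both halves of that suggestion, with the service, request, and reply types as illustrative stand-ins for the real generated API:

package main

import (
    "context"
    "time"

    pb "example.com/shipper/api" // illustrative stand-in for the generated gRPC/protobuf package
)

// shipperServer stands in for the real server type; its queue field models the
// Queue wrapper from the excerpt earlier in the thread, with a uint64 standing
// in for the EntryID return value.
type shipperServer struct {
    queue interface {
        Publish(event *pb.Event) (uint64, error)
    }
}

// Client side: a deadline bounds how long the blocking RPC may take; once it
// passes, the call fails with status code DEADLINE_EXCEEDED.
func publishWithDeadline(client pb.ProducerClient, batch *pb.PublishRequest) error {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    _, err := client.PublishEvents(ctx, batch)
    return err
}

// Server side: stop doing work once the client has given up. A Publish call
// already blocked on a full queue would additionally need its own way to
// observe the cancellation.
func (s *shipperServer) PublishEvents(ctx context.Context, req *pb.PublishRequest) (*pb.PublishReply, error) {
    for _, event := range req.Events {
        select {
        case <-ctx.Done():
            return nil, ctx.Err() // client cancelled or deadline exceeded
        default:
        }
        if _, err := s.queue.Publish(event); err != nil {
            return nil, err
        }
    }
    return &pb.PublishReply{}, nil
}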

from elastic-agent-shipper.

cmacknz commented on July 18, 2024

Closing in favor of the implementation issue: #84

from elastic-agent-shipper.
