
Comments (6)

cmacknz commented on July 18, 2024

Reproducing @rdner's original concerns with having the PublishEvents call block:

  1. For Example 2 it's the lack of observability. From the client's perspective, this will just look like a slow server. The client would not know why the request is so slow.

  2. A lot of clients will compete for space once the queue has room, and this solution does not really solve anything on this point. We could, of course, implement a very sophisticated ordering system that does "first come – first served" for queue publishing requests, but it would complicate things and might make certain requests extremely slow.

from elastic-agent-shipper.

cmacknz commented on July 18, 2024

I personally prefer PublishEvents as a blocking call. I think it simplifies the client implementation in that clients do not need to worry about retry strategies; they can simply call PublishEvents and wait for it to finish.

It avoids having to deal with issues where a client has chosen a retry strategy poorly, in the worst case pointlessly consuming CPU cycles busy-waiting on the queue or introducing unnecessary latency with an overly long retry interval.

It also provides a natural and obvious back pressure signal where the input client should simply stop sending us new data.
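
As a rough sketch of how simple that makes an input client, a blocking PublishEvents call needs no retry logic at all (the pb import path, ProducerClient, and request type below are illustrative stand-ins, not the shipper's actual generated API):

package main

import (
    "context"

    pb "example.com/shipper/api" // illustrative stand-in for the generated gRPC client package
)

// publishBatch sends one batch and simply waits. While the queue is full the
// RPC blocks, and that is the back pressure signal: the input stops pulling
// new data until the call returns.
func publishBatch(ctx context.Context, client pb.ProducerClient, batch *pb.PublishRequest) error {
    _, err := client.PublishEvents(ctx, batch)
    return err
}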

The shipper currently integrates with the memory queue producer:

    // Tail of the queue constructor: create a producer for the wrapped Beats queue.
    producer := eventQueue.Producer(beatsqueue.ProducerConfig{})
    return &Queue{eventQueue: eventQueue, producer: producer}, nil
}

// Publish hands a single event to the queue producer and reports ErrQueueIsFull
// if the producer rejects it.
func (queue *Queue) Publish(event *messages.Event) (EntryID, error) {
    if !queue.producer.Publish(event) {
        return EntryID(0), ErrQueueIsFull
    }
    // remainder of the excerpt elided

Looking at Beats' implementation of Publish for the memory and disk queues, it is actually blocking by default. If we wanted non-blocking behaviour we would need to switch to TryPublish in the queue wrapper (see the sketch after the links below).

  1. Beats' memory queue publish implementation: https://github.com/elastic/beats/blob/e6db9a53f2fff60b350153b03407bc38b21bf0b1/libbeat/publisher/queue/memqueue/produce.go#L128-L136
  2. Beats' disk queue publish implementation: https://github.com/elastic/beats/blob/e6db9a53f2fff60b350153b03407bc38b21bf0b1/libbeat/publisher/queue/diskqueue/producer.go#L52-L62
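
For contrast, a non-blocking variant of the wrapper above only needs to swap the producer call. This is a sketch against the same types as the excerpt, with the entry-ID handling elided just as it is there:

// TryPublish rejects the event immediately when the queue has no room,
// instead of waiting for space the way the blocking Publish call does.
func (queue *Queue) TryPublish(event *messages.Event) (EntryID, error) {
    if !queue.producer.TryPublish(event) {
        return EntryID(0), ErrQueueIsFull
    }
    return EntryID(0), nil // entry ID generation elided, as in the excerpt above
}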

With regards to @rdner's concerns above:

For Example 2 it's the lack of observability. From the client's perspective, this will just look like a slow server. The client would not know why the request is so slow.

When a client learns that the queue is full, it only has one choice: wait some period of time and retry. If we return a QueueIsFull error we delegate the choice of retry policy to the client; if we block we provide the optimal retry policy: succeeding immediately once space is free.
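
For comparison, this is roughly the loop every client would otherwise have to write and tune itself, assuming a hypothetical queue-full condition surfaced as a gRPC ResourceExhausted status (the error mapping, backoff values, and pb types are all illustrative):

package main

import (
    "context"
    "time"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"

    pb "example.com/shipper/api" // illustrative stand-in for the generated gRPC package
)

// publishWithRetry is what each client ends up writing if PublishEvents
// returns a queue-full error instead of blocking: every client picks its own
// backoff, and a poor choice either busy-waits or adds needless latency.
func publishWithRetry(ctx context.Context, client pb.ProducerClient, batch *pb.PublishRequest) error {
    backoff := 100 * time.Millisecond
    for {
        _, err := client.PublishEvents(ctx, batch)
        if err == nil {
            return nil
        }
        if status.Code(err) != codes.ResourceExhausted {
            return err // not a queue-full condition, give up
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff):
        }
        if backoff < 5*time.Second {
            backoff *= 2 // crude exponential backoff, capped
        }
    }
}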

I am not worried about an input's ability to observe the queue state, but about the agent user or operator's ability to observe it, which we can provide through monitoring of the shipper itself.

A lot of clients will compete for space once the queue has room, and this solution does not really solve anything on this point. We could, of course, implement a very sophisticated ordering system that does "first come – first served" for queue publishing requests, but it would complicate things and might make certain requests extremely slow.

Correct, we can't do anything about this without a more complex design. Both a retry-with-backoff polling approach and a blocking approach have the same behaviour here: the client that "wins" access to the queue is non-deterministic right now.

from elastic-agent-shipper.

cmacknz commented on July 18, 2024

@faec or @leehinman do you have any thoughts to add here?

from elastic-agent-shipper.

faec commented on July 18, 2024

I think I prefer a fully blocking call if it's practical. Having RPCs routinely block for such a potentially long time makes me nervous in the abstract, but since it's over local transport in our case I think it's ok. As for concern (1), I agree this is possibly a false freedom for the client, since the knowledge that the queue is full shouldn't change the input's behavior (accumulate input data if it can, drop it if it must). For (2), when multiple clients are inserting into the queue in current implementations, having all of them block while waiting on progress should ensure eventual progress for each individual client (effectively they'll all be targeting the same channel, so progress should be as fair as the goroutine scheduler is).

One thing I'm uncertain of with blocking on everything is how we handle shutdown, since that's the only context where we'd practically need to know about partial progress in a long-running call. I believe our current expectation is that inputs will try to shut down before the shipper itself, so if it's technically feasible it would be nice to have some clean way to close off an active request and fetch the partial progress (this would be less relevant with non-blocking or blocking-on-first-event since that automatically gives more regular updates).

from elastic-agent-shipper.

leehinman commented on July 18, 2024

I'm in favor of a blocking call, and the client should use a deadline on the gRPC call so they don't wait forever. If the client wants a more complex retry strategy, they can build one around the call terminating with DEADLINE_EXCEEDED, or they could just try again.

On the server we should probably also check whether the context has been cancelled, so we stop using resources when the client has given up and we can no longer report back to them.
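
A minimal sketch of both halves of that suggestion, with the service, request, and reply types as illustrative stand-ins for the real generated API:

package main

import (
    "context"
    "time"

    pb "example.com/shipper/api" // illustrative stand-in for the generated gRPC/protobuf package
)

// shipperServer stands in for the real server type; its queue field models the
// Queue wrapper from the excerpt earlier in the thread, with a uint64 standing
// in for the EntryID return value.
type shipperServer struct {
    queue interface {
        Publish(event *pb.Event) (uint64, error)
    }
}

// Client side: a deadline bounds how long the blocking RPC may take; once it
// passes, the call fails with status code DEADLINE_EXCEEDED.
func publishWithDeadline(client pb.ProducerClient, batch *pb.PublishRequest) error {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    _, err := client.PublishEvents(ctx, batch)
    return err
}

// Server side: stop doing work once the client has given up. A Publish call
// already blocked on a full queue would additionally need its own way to
// observe the cancellation.
func (s *shipperServer) PublishEvents(ctx context.Context, req *pb.PublishRequest) (*pb.PublishReply, error) {
    for _, event := range req.Events {
        select {
        case <-ctx.Done():
            return nil, ctx.Err() // client cancelled or deadline exceeded
        default:
        }
        if _, err := s.queue.Publish(event); err != nil {
            return nil, err
        }
    }
    return &pb.PublishReply{}, nil
}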

from elastic-agent-shipper.

cmacknz commented on July 18, 2024

Closing in favor of the implementation issue: #84

from elastic-agent-shipper.
