We're looking to solidify a 1.0 release and I wanted

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Let's separate consumers from producers... Consumers <p dir="a

working towards 1.0 about nsq HOT 6 CLOSED

nsqio commented on August 27, 2024

working towards 1.0

from nsq.

Comments (6)

dustismo commented on August 27, 2024

There is an issue with the current protocol design. It can be impossible to match a specific request with the appropriate response if many requests are made in short order.

example
client: INDENTIFY
client: SUB topic1 mychannel
server: E_INVALID

There is no way to tell if E_INVALID is for the INDENTIFY or the SUB request.

from nsq.

mreiferson commented on August 27, 2024

@dustismo first, just want to make sure we're on the same page that INDENTIFY (in your example) is spelled incorrectly and is the one throwing the E_INVALID error.

You point is fair though, and it's something we've debated internally for some time.

We've taken the approach that errors are generally fatal. I started going through the process of making changes on the nsqd side that actually enforced this (ie. nsqd would close the connection if it encountered some of these scenarios for a client). In your example I realize there are a few more that should be promoted to fatal as well (like IDENTIFY and SUB). I also think IDENTIFY and SUB should both always return a success response so that clients could be written like:

connect()
send_magic()
identify(short_id, long_id)
err = read_response()
if err {
   // bail
}

subscribe(topic, channel)
err = read_response()
if err {
   // bail
}

That would clear up this specific case you're talking about where you'd like to provide useful feedback to your applications about why they couldn't connect, authenticate, or subscribe successfully.

For the rest of the cases (after subscribing and messages are flowing) our reader libraries simply log these errors and continue processing... you're really only dealing with 3 commands at that point (RDY, FIN, REQ). These also happen to be the commands that are generally sent at high volume. The overhead of having another packet for a success response (perhaps with some sort of client generated identifier?) isn't actually worth the performance cost. For some of these cases perhaps just adding more context to the error response (E_FIN_FAILED ... why ...) would be enough to effectively provide that same utility to you without overburdening the client libraries with additional state tracking?

from nsq.

dustismo commented on August 27, 2024

Ok, makes sense. I will keep the 'errors are fatal' mentality in mind (though, of course, more context would be good). Though I would still argue for an OK response on the IDENTIFY action.

This brings me to another issue (I'm being noisy I know :) )

The client I am writing is more or less completely async, so this pattern isn't so easy:

make request
wait for response

It works more like:

while(true) {
   send any waiting requests;
}
/// separately
while(true) {
   read reponse;
   send to response handler;
}

so matching requests with response can be more difficult.

What would make life much cleaner for this would be to have OK responses include the action. something like:

OK SUB

This would be esp helpful for the OK PUB response. I am currently using a semaphore to count OK response wrt to PUB commands to make sure I don't send faster then they are received (yet still maintain high throughput). This feels very brittle considering I don't know with certainty what the OK is for. (It does work, and I don't get any errant OK messages, just feels wrong)

from nsq.

mreiferson commented on August 27, 2024

Let's separate consumers from producers...

Consumers

I agree that IDENTIFY and SUB should always return a response and be fatal in their error cases (and have already started working on this in #133). At runtime, I think FIN, REQ, and RDY should probably be improved to return contextual error messages (like E_FIN_FAILED FIN <message_id>). This should resolve the issues you mention without the need for clients to keep track and match commands to responses.

Producers

Remember, commands on a single nsqd connection are serial. I suspect you'll be able to accomplish the throughput and robustness requirements you need by restructuring the way you're handling connections. I suggest:

use the MPUB command to achieve pipelining and higher throughput for a single, serial, connection.
use more connections performing serialized commands rather than decoupling the IO of a single connection.

For example, take a look at https://github.com/bitly/nsq/blob/master/examples/bench_writer/bench_writer.go - it can publish 321,000 (200 byte) messages/sec over a single connection using serialized MPUB commands.

Serializing async IO would look something like (in python, unfortunately, since I dont have experience with netty):

all_messages = [...]

def pub_loop(conn):
    message = all_messages.pop()
    pub(conn, message)

def pub(conn, message)
    callback = functools.partial(done_pub, conn=conn)
    conn.send("pub", message_data, callback=callback)

def done_pub(frame_type, data, conn):
    if frame_type == nsq.FRAME_TYPE_ERROR:
       # log, bail, retry
       return
    callback = functools.partial(pub_loop, conn=conn)
    ioloop.add_callback(callback)

Essentially just passing around context via callbacks on the io loop.

from nsq.

mreiferson commented on August 27, 2024

NSQ Protocol / Client Library Notes

I thought it would be helpful to document some of the philosophy and intentions here and solicit feedback as we move closer to 1.0. Much of this is already in place but there are some subtle changes that will improve the ability to write robust and performant client libraries. This will probably form the basis for some new documentation along side the existing protocol document.

The protocol design goals are as follows:

human readable commands over the wire
fast machine parseable response framing
errors are fatal (server closes client connection)
versioning
bounded
detection of slow/unresponsive clients (i.e. application level heartbeats)

Heartbeats

Application level heartbeats are useful for detection of slow/unresponsive/disconnected clients when TCP would prefer to continue to retry for considerable amounts of time. Think of the case like pulling a network cable, you want to control and bound these scenarios.

Client libraries should be designed in such a way that they account for and respond to heartbeats automatically.

Even though data is pushed to clients from nsqd asynchronously, client libraries can rely on the strictly ordered nature of commands and responses for matching.

The general approach should be to keep a queue of commands (could be a simple linked list), push onto the back when sending a command and pop off the front whenever a FrameTypeResponse or FrameTypeError is received (thus matching command to response).

Whenever a heartbeat is encountered it should be handled automatically (possibly calling back to the client as a notification). NOTE: Heartbeats will be promoted to their own frame type, to make this easier.

Consumers

Consumers can easily determine success/failure of connection, authentication, and subscription phase by:

send 4-byte magic
send IDENTIFY
send SUB
start read loop (with deadline) which will timeout or receive an error response if it fails

At runtime it is possible for FIN and REQ to return errors due to nsqd side message timeouts (and subsequent re-delivery to another consumer). These should be considered non-fatal. For performance reasons, along with RDY, these commands do not have success responses (and client libraries should make no attempt to match these, instead they should simply log the error message).

All other errors are considered fatal and connections are forcibly closed on the nsqd side.

All reads should be deadlined to 2x the heartbeat interval.

Producers

Producers can follow the same procedures as consumers for robust connection and authentication
(without subscribing).

Producers only have two commands available, PUB and MPUB. Producers should be written to perform
commands in serial. MPUB should be used for high-throughput pipelining.

Producers should expect to receive heartbeats when those are added in #131.

All reads should be deadlined to 2x the heartbeat interval.

TODO

MERGED Issue #135 - make MPUB an atomic operation (i.e. all error conditions are validated up front and the appropriate locks are acquired to assure this before actually queueing messages). This addresses the implicit problem of determining (from an error response) which messages an MPUB command succeeded or failed.
MERGED Issue #133:
- simplified protocol version handling
- fatal error changes to SUB/IDENTIFY
- close conns on invalid magic
- better context for FIN/REQ errors (consumers won't need to match commands to responses)
MERGED Issue #131: producers do not currently have heartbeats. For consistency with the consumer side, nsqd will send heartbeats periodically.
promote heartbeats to their own frame type

from nsq.

mreiferson commented on August 27, 2024

closing this, I've updated the issues labeled with 1.0 to reflect what I think is necessary.

from nsq.

working towards 1.0 about nsq HOT 6 CLOSED

Comments (6)

Consumers

Producers

NSQ Protocol / Client Library Notes

Heartbeats

Consumers

Producers

TODO

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent