restatedev / service-protocol

License: MIT License

service-protocol's Introduction

Restate Service Protocol

This repo contains specification documents and Protobuf schemas of the Restate Service Protocol.

Service invocation protocol specification

Development

To format the spec document:

npx prettier -w service-invocation-protocol.md

service-protocol's Issues

Add `last_entry_name` to `FailMessage`

This is useful for observability.

Allow all completable journal entries to have a failure variant

In order to support cancellations, it would be great if all completable journal entries would have a failure variant that they support.

Remove unused messages

At the moment we have no plan to implement the SuspensionMessage anytime soon. Let's remove it until we need it.

Harden the callback specification

Describe that the syscall is implemented in two steps: side effect + callback entry
If the closure fails, should we create the failed callback entry anyway?
How to handle receiving future completions?
How to handle case where closure sent some message, then closure fails, and we receive the completion?

Allow any journal entries to require_ack

The REQUIRES_ACK flag is now restricted to custom journal entries, we should allow it for any entry.

With this mechanism in place, the SDK can decide to wait for acks of specific types of entries, to avoid losing side effects when dealing with the cancellation signal. See https://docs.google.com/document/d/14s7D6KP1IKNS1K3OrVbZNKGNei9TO_-9Sk0YC5mpht8/edit#heading=h.hooeo8f4d03z

I propose to keep this concept of "REQUIRES_ACK" general, so that we can later introduce more "cancellation points" without touching the implementation of this aspect in the runtime (and potentially in the SDKs as well).

Remove StartMessage#known_service_version

We don't need this field for service revisions, as the information won't be encoded in user code.

Introduce `StartMessage.state` and `PARTIAL_STATE` flag

Part of restatedev/restate#437

We need to introduce StartMessage.state and PARTIAL_STATE flag to carry eager state together with the beginning of the invocation.
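
As a rough illustration of how an SDK might consume eager state, the sketch below resolves a key locally when it can and falls back to the runtime only when the state is partial. The `StartState` type, its field names, and the `lookup` helper are hypothetical and are not taken from the Protobuf schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class StartState:
    """Hypothetical SDK-side view of the eager state carried by StartMessage."""
    entries: Dict[bytes, bytes] = field(default_factory=dict)
    partial: bool = False  # mirrors the PARTIAL_STATE flag (assumed meaning)


def lookup(state: StartState, key: bytes) -> Tuple[bool, Optional[bytes]]:
    """Return (resolved_locally, value).

    resolved_locally is False only when the key is missing and the eager
    state is partial, in which case the SDK would have to ask the runtime.
    """
    if key in state.entries:
        return True, state.entries[key]
    if state.partial:
        # The key may exist in the runtime; the eager state cannot tell.
        return False, None
    # Complete state and no entry: the key is known to be absent.
    return True, None
```

With `partial` set to false, a missing key can be answered without a round trip, which is the case the eager state is meant to cover.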

Add `BackgroundInvokeEntryMessage.delay`

This is required to support restatedev/restate#20

Describe side effects in the spec

We have no description of how side effects work. Even if this is purely an SDK-only feature, we should still describe how it works in the "optional features" section, so that users can implement it themselves.

Add `x-restate-server`

Add x-restate-server header when replying to invocations.

Support awakeable failure

Add a variant to CompleteAwakeableEntryMessage to allow failing an awakeable.

Define the new Deployment manifest

`BatchMessage`

In one of the initial designs of the protocol, we conceived the idea of BatchMessage, mostly to improve protocol efficiency, but after a while we dropped it, as it didn't seem particularly interesting as an optimization.

@StephanEwen recently suggested that a BatchMessage could in fact make sense, only in the response stream (from service endpoint to runtime), not for optimization purposes, but to provide a way to either commit all entries or no entries. This can be an interesting property for SDKs that want to provide a different programming model, e.g. actor-based models where there are no "await" points and where side effects don't need any wrapping within a ctx.sideEffect or similar.

An interesting aspect of this message is that it would be an optional contract of the protocol, as only SDKs that want to push this message to the runtime need to implement it.
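
The "commit all entries or no entries" property could look roughly like the sketch below on the receiving side. `BatchJournal`, `commit_batch`, and the `validate` callback are hypothetical names used only for illustration; nothing here reflects how the runtime actually stores entries.

```python
from typing import Callable, List


class BatchJournal:
    """Hypothetical journal that applies a batch atomically."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def commit_batch(self, batch: List[str], validate: Callable[[str], bool]) -> bool:
        # Validate every entry first, so a rejected entry leaves the
        # journal untouched instead of half-applied.
        for entry in batch:
            if not validate(entry):
                return False
        self.entries.extend(batch)
        return True
```

The point of the two-phase shape is that a failure anywhere in the batch is observed before any entry becomes visible.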

Some properties of the protocol to describe in the spec

ACKs returned only for side effects?
- Or better, whether to ack or not should be written in a flag. Important for unknown entries as well
Completions are received by service endpoint in any order
Service endpoint might even receive future completions?
When replaying, it's guaranteed no completions are sent by the runtime

Add a magic number to discern the header from other bytes

In order to harden the protocol, it would be good to add/prepend a magic number to the header so that one can distinguish a header value from a non-header value. That way, SDKs can guard against interpreting the wrong bytes as the header value.
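
A minimal sketch of the idea, assuming a made-up header layout: the magic value `0x5253`, the field widths, and the function names below are all hypothetical and do not describe the real header format.

```python
import struct

MAGIC = 0x5253  # hypothetical 16-bit magic value, not taken from the spec
HEADER_LEN = 8  # assumed layout: magic (u16), type (u16), length (u32)


def encode_header(message_type: int, length: int) -> bytes:
    # Big-endian, with the magic number prepended to the header fields.
    return struct.pack(">HHI", MAGIC, message_type, length)


def decode_header(buf: bytes):
    if len(buf) < HEADER_LEN:
        raise ValueError("not enough bytes for a header")
    magic, message_type, length = struct.unpack(">HHI", buf[:HEADER_LEN])
    if magic != MAGIC:
        # Guard against interpreting arbitrary bytes as a header.
        raise ValueError(f"bad magic number: {magic:#06x}")
    return message_type, length
```

An SDK that has lost track of message boundaries would then fail fast on the magic check rather than read a bogus length.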

Introduce `SuspensionMessage`

See restatedev/restate#97

Describe service discovery in protocol spec

Make suspensions an optional feature

When implementing the new TS SDK, the runtime-initiated suspension mechanism made the implementation harder. It would be simpler for a new SDK to be able to ignore this feature altogether if possible. One idea could be to let the SDK tell the runtime whether it supports runtime-initiated suspensions (without it, the SDK would probably not work on AWS Lambda, though).

Another simplification could be to allow the waiting_for_completed_entries field to be empty which denotes that the invocation will be resumed on any journal entry completion.
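
The proposed meaning of an empty waiting_for_completed_entries field can be sketched as follows. The function name is hypothetical, and the non-empty case assumes the invocation resumes as soon as at least one listed entry is completed.

```python
from typing import Iterable, Set


def should_resume(waiting_for: Iterable[int], completed: Set[int]) -> bool:
    """Decide whether a suspended invocation should be resumed."""
    waiting = set(waiting_for)
    if not waiting:
        # Proposed simplification: an empty list means "resume on any
        # journal entry completion".
        return bool(completed)
    # Assumption: resume when at least one awaited entry is completed.
    return bool(waiting & completed)
```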

More details are needed for why the suspension mechanism caused trouble (Stephan, Giselle).

Add license headers to proto files

Document service discovery protocol

While we have documented how the service protocol works, we have not documented how the discovery protocol works. To make the implementation of custom SDKs easier, this would probably help.

Pass partialStateFlag as part of StartMessage instead of in header

From the perspective of SDK development, it would be easier to have the partialStateFlag in the start message. This makes the header parsing simpler, and you could just use the protobuf deserializers to get the flags. The flags would then also simply be passed through the code base as part of the message.

Cancellations

This issue includes:

#50
~~Introduce cancelled/failed field for background invoke and resolve awakeable~~
~~Introduce failed completion variant for GetState~~
~~Introduce StartMessage.cancel field~~
~~Describe the SDK cancellation behavior in the protocol spec~~

Make protocol actions more explicit

For some SDKs, it could be helpful to make certain protocol actions such as suspending more explicit by sending an explicit message instead of only closing the request channel. It would have the benefit that a runtime failure which also closes the request channel can be easily distinguished from a suspension. Some languages/frameworks have difficulties detecting the difference right now (see https://restatedev.slack.com/archives/C04KZRLE1SM/p1687219253254139 for more details).

Design the service endpoint registration protocol

Modify the protocol to define a new format for the `AwakeableIdentifier` (perhaps getting rid of the `AwakeableIdentifier` message?), carry the `partition_key` in `StartMessage` and modify the `CompleteAwakeableEntryMessage` accordingly

Notifying non-recoverable errors back to the runtime

In the protocol we distinguish three types of errors:

User errors
Infra recoverable errors, that is, they can be retried
Infra unrecoverable errors, that is, retrying them will lead again to failure

For user errors we already have a strategy to handle them, as all the fallible operations have a failure variant where we can write the user failure and propagate it back. It's up to the SDK to correctly implement error handling to write user failures as OutputStreamEntry.failure. For recoverable and unrecoverable errors, we don't yet have a good strategy to distinguish them.

To recap, right now a bidi stream can end in 4 ways:

Suspending, by sending a SuspensionMessage
Completing the invocation, by sending an OutputStreamEntry
Failing, by simply closing the stream correctly, not aborting it
Badly failing, by either connection broken or aborting the stream

In the runtime, we already distinguish the cases "Failing" and "Badly failing" thanks to the HTTP/2 protocol frames: https://github.com/restatedev/restate/blob/main/src/invoker/src/invoker.rs#L822 and https://github.com/restatedev/restate/blob/main/src/invoker/src/invocation_task.rs#L453 are the relevant code. In other words, I can always distinguish when the SDK closed the stream, or when it was aborted, or when there was a connection failure.

All the considerations above lead to this question: Is there ever a case where the SDK produces recoverable errors on purpose?

If yes, we could simply say that when the SDK closes the stream correctly and does not send a SuspensionMessage or an OutputStreamEntry, we know it's an unrecoverable failure, so the runtime won't retry the invocation and will simply propagate the error to the caller.
If no, then we need some way to let the SDK signal whether the error is recoverable, for example by sending back something like an ErrorMessage. If we go down this road, we still need to define what to do when the stream is closed correctly without a SuspensionMessage, an OutputStreamEntry, or an ErrorMessage, which probably means that this approach still implies the other one.

For example, a non-complete list of errors the Java sdk produces is here: https://github.com/restatedev/sdk-java/blob/main/sdk-core-impl/src/main/java/dev/restate/sdk/core/impl/ProtocolException.java, all of those are unrecoverable.
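
The four ways a bidi stream can end, as recapped above, can be written down as a small classification. This is only a sketch: the enum, the function, and the use of message names as plain strings are hypothetical, and `closed_cleanly` stands in for whatever the HTTP/2 layer reports.

```python
from enum import Enum, auto
from typing import Optional


class StreamEnd(Enum):
    SUSPENDED = auto()     # SuspensionMessage was sent
    COMPLETED = auto()     # OutputStreamEntry was sent
    FAILED = auto()        # stream closed correctly, no terminal message
    BADLY_FAILED = auto()  # connection broken or stream aborted


def classify(last_message: Optional[str], closed_cleanly: bool) -> StreamEnd:
    if not closed_cleanly:
        return StreamEnd.BADLY_FAILED
    if last_message == "SuspensionMessage":
        return StreamEnd.SUSPENDED
    if last_message == "OutputStreamEntry":
        return StreamEnd.COMPLETED
    return StreamEnd.FAILED
```

Whether `FAILED` should be retried is exactly the open question of this issue; the sketch deliberately leaves that decision out.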

`PARTIAL_STATE` is always false

Quick question on the eager state implementation right now:

• Is Partial State always false, meaning do we get all state always?
• Is there a way we can keep the SDK simple, keep that flag out and just assume that the flag is always false?

Slack Message

Add `GetStateKeysEntryMessage`

Add `ClearAllEntryMessage`

Carry object key in `StartMessage` and `InvokeEntryMessage`/`BackgroundInvokeEntryMessage`

Allow end of stream without OutputStreamEntry

Right now to end a stream:

A message stream MUST start with StartMessage and MUST end with either:

One OutputStreamEntry
One SuspensionMessage
One ErrorMessage.
None of the above, which is equivalent to sending an empty ErrorMessage.

This assumes that a correct invocation always ends with a response, which is not the case for restatedev/restate#899. We should modify the end of stream operations to not assume OutputStreamEntry is the last message.

Replace `google.protobuf.Empty` with our `Empty` message

Update service protocol to define service protocol version and service discovery protocol version

Line up terminology between protocol and SDK

For example, a unidirectional call is called backgroundInvoke in the protocol but oneWayCall in the TS SDK.
This means that a user types oneWayCall in their code, but sees backgroundInvoke in the logs and traces.

Decide on the type for time

Both BackgroundInvoke and Sleep have a time field. Right now this type is i64, but perhaps we should change it to u64. We should also consider how Rust's time APIs behave with respect to time overflow.
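
To make the trade-off concrete, the sketch below computes a wake-up time and rejects values that do not fit the chosen field type. The millisecond unit, the function name, and the `signed` switch are assumptions for illustration only.

```python
I64_MAX = 2**63 - 1
U64_MAX = 2**64 - 1


def wake_up_time_millis(now_millis: int, delay_millis: int, signed: bool = True) -> int:
    """Return now + delay, checked against the range of i64 or u64."""
    if now_millis < 0 or delay_millis < 0:
        # A u64 field cannot represent these at all; with i64 they would
        # be representable but meaningless as a point in time.
        raise ValueError("negative time values are not meaningful here")
    result = now_millis + delay_millis
    limit = I64_MAX if signed else U64_MAX
    if result > limit:
        raise OverflowError("wake-up time does not fit the field type")
    return result
```

Switching to u64 doubles the representable range and rules out negative values by construction, but the overflow check on the addition is still needed either way.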

Simplify implementation of new SDKs

This is an exploration issue for collecting and evaluating ideas for the service protocol that could simplify the implementation of new SDKs. The goal should be that implementing a new SDK is possible within 2-3 weeks. Key to this are an easy-to-understand description of the protocol, clear concepts, optionality of more advanced features, and ease of use.

Simplify protocol by making PollInputStreamEntryMessage non-completable

Currently, all our SDKs assume that the PollInputStreamEntryMessage is always completed. We can simplify the protocol by actually enforcing it. Moreover, we could think about getting rid of the PollInputStreamEntryMessage altogether and send the input value via the StartMessage.

Describe W3C trace context support

Make the journal an immutable log by storing completions separately

Implementing new SDKs could become easier if the journal were an immutable log in which journal entries and completions are stored separately. This would allow us to remove the distinction between completable and non-completable journal entries. It could also help with implementing deterministic futures, since we would record the order in which the journal entries are completed.

One problem that might arise is that with this change, there will be two components that append to the journal: the runtime which appends completions and the SDK appending journal entries. Right now, only the SDK is allowed to append journal entries which makes it quite simple to keep the runtime and SDK view on the journal in sync.

In order to solve this problem, the SDK would probably need to be able to re-order the tail of its journal in case there were completions that were appended before the last journal entries.

A minor disadvantage is that the runtime will lose the cheap capability to check whether a journal entry was completed or not.
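
A minimal sketch of such a log, assuming completions are appended as their own records that refer back to an entry by index. The class and method names are hypothetical, and the sketch ignores the two-writer synchronization problem described above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Entry:
    name: str


@dataclass(frozen=True)
class Completion:
    entry_index: int  # index among entries, not position in the log
    value: bytes


class Journal:
    """Append-only log; entries are never mutated when they complete."""

    def __init__(self) -> None:
        self._log: List[Union[Entry, Completion]] = []
        self._entry_count = 0

    def append_entry(self, name: str) -> int:
        self._log.append(Entry(name))
        self._entry_count += 1
        return self._entry_count - 1

    def append_completion(self, entry_index: int, value: bytes) -> None:
        if not 0 <= entry_index < self._entry_count:
            raise IndexError("completion refers to an unknown entry")
        self._log.append(Completion(entry_index, value))

    def completion_order(self) -> List[int]:
        # The log itself records the order in which entries completed.
        return [r.entry_index for r in self._log if isinstance(r, Completion)]

    def is_completed(self, entry_index: int) -> bool:
        # Requires a scan: the cheap per-entry check is lost.
        return any(
            isinstance(r, Completion) and r.entry_index == entry_index
            for r in self._log
        )
```

`completion_order` shows the benefit for deterministic futures, while `is_completed` shows the cost mentioned above: without an index, the check has to scan the log.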

More details on how an immutable log can simplify the SDK implementation are needed (Stephan, Giselle).

Transport the opaque sid

Perhaps it makes sense to transport the opaque id rather than invocation_id and service_key in the StartMessage?

We could use this opaque id as part of the awakeable identifier as well.

Remove the `SideEffectEntryMessage`

I propose to remove SideEffectEntryMessage from the "core" protocol and let every sdk define its own SideEffectEntryMessage as Unknown entry type.

The reason for this design choice is to keep the service-protocol small and define only journal entry messages which the runtime must be able to read and process in some way, as done with GetStateEntry, SetStateEntry or InvokeEntry, where the runtime needs to parse the entry, apply the respective effects and eventually send back a completion. SideEffectEntryMessage does not fit in this category of messages, as the runtime never needs to parse it. It simply has to store it as a blob and ack back "I've stored it" to the SDK. This specific mechanism is already specified by Unknown entries, which the runtime will accept and store, but won't try to parse [1]. For an example usage of the Unknown entry mechanism, see CombinatorsEntryMessage.

Another nice consequence of this design choice is that we leave SDKs free to define SideEffectEntry as they want. For example, in Java we might be able to record an error's stack trace in a specific format, while in other languages we might need another message structure to record it.

[1] The spec still needs some clarification on this though, like defining when an Unknown entry should be acked or not. See #2.

Finish the invocation with the first OutputStreamEntry

One problem the TS SDK ran into was that a replay could happen after the first OutputStreamEntry was sent. Given that we don't support output streaming (yet and probably in the foreseeable future), we might consider changing the protocol such that an invocation terminates with the first and only OutputStreamEntry.

Agree on the format used to expose the awakeable identifier

Right now we have two approaches in the SDKs:

In the Java SDK we expose the code-generated protobuf object AwakeableIdentifier, from https://github.com/restatedev/proto/blob/main/dev/restate/core.proto
In the TS SDK we expose a JSON-encoded string representing a "custom" object containing the id, which is de facto the same as AwakeableIdentifier

We should agree on a single format we use in all the SDKs, and use the same format in restatedev/proto#20.

Get keys

See restatedev/sdk-java#8

Make SideEffect an explicit journal entry type

While implementing the TS SDK, it turned out that implementing the side effect journal entry as a CustomEntry complicated the code a bit. The reason was that for the side effect entry, one always had to handle a CustomEntry and check that it contains a side effect. Maybe we want to make the SideEffect journal entry a first-class citizen of the protocol to simplify this aspect, even though the runtime does not need to understand it.

More details on what exactly was more complicated to implement for the side effect entry are needed (Stephan, Giselle).
