cloudstateio / cloudstate

Distributed State Management for Serverless

Home Page: https://cloudstate.io

License: Apache License 2.0

JavaScript 10.92% Scala 68.03% Shell 1.22% Java 10.20% HTML 0.01% Dockerfile 0.14% Makefile 0.33% Go 9.16%

serverless knative kubernetes akka reactive cloud-native grpc javascript nodejs graalvm

cloudstate's Introduction

Cloudstate is no longer actively developed

Cloudstate was an open source protocol and reference implementation exploring ideas for stateful serverless, and was originally developed by Lightbend.

The project has been inactive since 2021. An open source alternative is Eigr.

A continuation of the ideas can be found in Lightbend's platform-as-a-service Akka Serverless.

cloudstate's People

Contributors

jroper ukinau tiagoooliveira ticofab naheedmk jdaggett janory rubendg tz70s eed3si9n thinkmorestupidless naveen-nalam anvad inspectordidi udhayamgit cdoru martinhatas mcgarrah shankarshastri rnd4222 akramtexas anniyanvr mgudmund shivaathreya eformat tuapuikia octonato skymysky leetoo michaelpnash dudleycarr sleipnir dhiemaz vrrs ihostage chbatey palutz marcoderama edwardcallahan ellanvanninca mn8-technology chen0031 jboner ignasi35 srikanthops ralphlaude animeshinvinci mkipcak romankagan wisdombrain laik raboof tringuyen-yw justinhj chomnoue zeromem qstar sean-walsh rasummer dwijnand olofwalker reid-spencer pvlugter rstento edydkim katsutoxin jackiedlh vkorenev leilasj baodingfengyun jtownley beritou amasser tezheng ryanhanks tristin coreyauger lliiun-z radovankavicky gapdata crmiguez dorucioclea jargente retgits marcellanz ennru umar-nawaz rorystokes biancl turbocoreio eigr-labs semanticbeeng servicefoundation bearerpipelinetest linecode

cloudstate's Issues

Authentication and Authorization

It would be interesting to have the CloudState Proxy handle gRPC and HTTP authentication and authorization transparently. This could extend to outbound calls made by functions, by routing those calls via the proxy too, so that security tokens are added and verified seamlessly.

Python support

User platform support for Python-based services

Initial demonstration of functionality

Create a well-documented example application

Golang support

Ruby support

User platform support for Ruby-based services

Observability story

Define what metrics can and should be exposed by the platform.

User function Snapshot and Event formats

We need to document somewhere what data format these should be in. Protobuf is not the ideal data storage format, as it is designed for protocols and not for data storage. Should we mandate a format or leave it up to the users?

Golang Support

Where is a good entry point for folks interested in Golang contributions on the client side?

Self-describing proxies?

It would be interesting to have the sidecar cache the description of the .proto provided by the user function and expose that description through a HTTP (or also a gRPC) endpoint—this would make it possible to obtain the descriptor from the outside.

Thoughts?

Optimized Akka serializer for com.google.protobuf.Any protobuf messages

Both event sourcing and CRDTs serialize Google protobuf Any values. Currently, this is done with the built-in Akka protobuf serializer, which stores com.google.protobuf.Any in the manifest and then the serialized Any as the message, which contains the type_url as well as the serialized value field. This is inefficient: the com.google.protobuf.Any manifest is repeated for every value unnecessarily. Instead, we should write an Akka serializer for com.google.protobuf.Any that stores the type_url in the manifest and writes the value field directly as the bytes.
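The proposed layout can be illustrated with a minimal sketch. It is not Akka code: AnyMessage is a hypothetical stand-in for com.google.protobuf.Any, and the two functions only mirror the idea of carrying the type_url in the manifest while writing the value field as the raw bytes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnyMessage:
    """Hypothetical stand-in for com.google.protobuf.Any."""
    type_url: str
    value: bytes


def to_manifest_and_bytes(message: AnyMessage) -> Tuple[str, bytes]:
    # The type_url travels in the manifest; only the payload is written as bytes.
    return message.type_url, message.value


def from_manifest_and_bytes(manifest: str, payload: bytes) -> AnyMessage:
    # Rebuild the Any from the manifest (type_url) and the raw value bytes.
    return AnyMessage(type_url=manifest, value=payload)


# The type URL and payload below are made-up example values.
event = AnyMessage("type.googleapis.com/example.ItemAdded", b"\x08\x2a")
manifest, payload = to_manifest_and_bytes(event)
restored = from_manifest_and_bytes(manifest, payload)
```

With this layout the stored bytes contain neither the constant com.google.protobuf.Any manifest nor a second copy of the type_url.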

Standalone scale to zero

Without the standalone version, we need to decide whether we will support scale to zero or not. It may suffice to say that if you want scale to zero, you need to use it in combination with Knative or something similar. It is also possible that in integrating with Knative, we'll need to implement our own activator proxy anyway (to avoid the tight undocumented coupling that Knative's activator proxy has with the rest of its infrastructure).

The design would essentially be the same as Knative - though I'm not 100% sure how it works in Knative, that is, I'm not sure how Knative ensures that requests are routed to the activator. Given that we don't have an independent autoscaler, our activator would need to also be responsible for scaling up from zero.

Graal support in the backend-platform

It would be extremely interesting to be able to make the sidecar application a Graal Native Image.

Make recommendation for gRPC client libs?

Should this project recommend gRPC client libraries on a per-user-platform basis?
This would make it easier for developers to figure out how to call services. It would also be worthwhile for platform testing.

PHP support

User platform support for PHP-based services

Schedule a weekly contributors-confcall

Current proposed time is 6am AEST Wednesday (this means Tuesday for US and Europe).

Make a handful of (minor) updates to README.md to enhance its flow and readability.

The CloudState README.md is just about the coolest guide that I've seen in a great long while; and I've pretty much seen them all.
The coherent narrative laid out in the README.md has done nothing less for me than give me a manifesto for building (and using) next generation serverless, the way it should be done. Kudos 👍
It has, in short order, nicely framed FaaS in the context it should be seen; a first (important) step toward stateful cloud computing, but a first step nonetheless 🌩
With that, and with utmost respect for the design captured by the author(s) of this terrific README.md, I suggest that we make a handful of (minor) updates to it so as to enhance its flow and readability even more than it currently stands.

Include database in health checks

We should not return ready until we have established communication with the database. This would only apply to the Cassandra backend currently.

Protocol for interoperability

Define, implement, verify and document a Protobuf protocol that will ensure that the backend platform can be used by any user platform language (which supports protobuf).

TODO requirements

Java support

Add Java support for creating stateful serverless functions

Create a project generator site

It would be nice to have something like a page (perhaps named https://new.cloudstate.io) where users could go, pick the storage management strategy, target language, service name, etc and it will generate a downloadable archive with a project structure, stub-files and some documentation links…

This would allow new users to really quickly be able to write their own stateful services without much pre-knowledge about how to structure their projects, what dependencies are needed etc.

Use Akka Persistence Typed

Instead of the original Akka Persistence

JavaScript support

Add support for defining stateful serverless functions in JavaScript

Old Revisions need to be scaled down to 0

This issue is specifically for when minScale > 0.
It's possible to "work around" this issue by setting the GC time limit to a low time and adjusting the number of old revisions to retain.

See: knative/serving#2720

Polyglot interoperability

Define, implement, verify and document that user code for the different platforms can interoperate.

Some thoughts on CQRS

Obviously a big thing we're missing right now is CQRS support - ie, read side processors for event sourced entities.

In general, there are two different types of read side processors: local and remote. With local, you consume the event log directly. With remote, you publish to a message queue (eg Kafka) and a different service consumes from Kafka.

The consumption of these two things should feel the same to the developer. There may be an extra step in publishing to Kafka where you translate your event log (possibly filter/transform etc) to a message queue stream. We could add support for publishing the log directly with no user code, though the only issue around that is whether it's a good idea to expose the persistence format to the outside world, or whether we should require an anti-corruption layer.

An event stream consumer could just go and talk to a database directly - and I think we have to support that. Though, it would be good to investigate whether other technologies (Lightbend pipelines, Knative events) would be better to do that.

It would also be good to investigate whether we could create some more CloudState-esque consumer protocols. So for example, could a key-value entity be a consumer of an event stream? Or an event sourced entity? We would have to have a way of translating event streams to virtual gRPC calls. Also, we would need a way of expressing entity keys. The original events may be associated with the entity that produced them, but read side views very often want to pivot that. For example, in a chat application, you might have room entities which have a list of which users are in the room, but then you want a view that shows which rooms a user is in, so you pivot from users by room to rooms by user. To do that, you need to change the entity key. One idea for this would be to use forwards: a non entity key based service receives the user joined room event, and then forwards it to the rooms by user entity, and that message would be keyed by user. But that might be too much overhead, both from a developer experience and from a performance perspective, so something more automated may be needed. Perhaps the consumer can take the producer's proto spec and add their own pivot keys to the messages, possibly allowing fan out etc if the key is embedded in a repeated field.
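The pivot in the chat example can be sketched as a fold over the event stream. This is only an illustration: RoomEvent and rooms_by_user are hypothetical names, and the sketch ignores how events would actually reach the view (forwards, pivot keys in the proto spec, or something else).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class RoomEvent:
    room_id: str   # key of the entity that produced the event
    user_id: str   # field that the read side view pivots on
    joined: bool   # True for "user joined room", False for "user left room"


def rooms_by_user(events: Iterable[RoomEvent]) -> Dict[str, Set[str]]:
    """Re-key events from the producing room to the user they mention."""
    view: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if event.joined:
            view[event.user_id].add(event.room_id)
        else:
            view[event.user_id].discard(event.room_id)
    return dict(view)


events = [
    RoomEvent("lobby", "alice", True),
    RoomEvent("dev", "alice", True),
    RoomEvent("lobby", "bob", True),
    RoomEvent("lobby", "alice", False),
]
view = rooms_by_user(events)
```

Here the events are keyed by room when produced, while the resulting view is keyed by user.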

Backend Platform upgrades

We need to define, implement, verify and document how the backend platform is updated and what the implications are for migrating to a new version.

Scala support

Add support for creating Scala-based stateful serverless functions

Support Evented services

Services should be able to emit events, to be consumed by other services/endpoints, as a part of command processing. It should then be possible to declare (or autogenerate) that an endpoint is able to receive certain kinds of events, either by transforming an event into a command and using the existing gRPC endpoints, or by implementing a specific endpoint signature for receiving events of that kind.

Implementing support for this will require a bit of R&D

create Gitter channel

I think it'd be useful to create a Gitter channel (or Discord, or whatever is convenient) for discussion, especially early on as the project is moving at a high pace.

How to do CRDTs

CRDT support is going to be interesting.

I see two general approaches. The first is to offer a very low level protocol, which essentially would look like the Akka ddata ReplicatedData and associated traits (eg delta handling traits). In this case, all CRDT types will need to be implemented in the language support libraries. The other approach would be to offer a high level protocol, reusing the CRDT types in Akka. This would mean the protocol itself would understand operations that can be made on for example a PNCounter or LWWMap.

I think the former approach is going to be necessary, as the latter will be really restrictive: you won't be able to implement custom CRDTs, and I'm not sure how you'd compose CRDTs. But the former pushes a lot more work into each implementation of the support library for each language.

A big challenge with all of these however is how to actually integrate with Akka ddata. All callbacks on Akka ddata types are synchronous, eg 'merge', 'mergeDelta' etc. Also, the 'modify' function on the 'Update' message is synchronous. This makes it impossible to implement them in the user function, since an invocation on anything from Akka to the user function is going to involve IO and so be asynchronous.

What we might need to do is fork an asynchronous version of the Replicator, that asynchronously invokes merge and friends. That could be equivalent to a rewrite.

How to do CRUD

Note that what is meant by CRUD is not SQL, or joins, but rather being able to get an Entity value, modify it, and have the modified version stored for the next command/request. So "destructive updates".

Could in theory be implemented on top of the EventSourcing support by either storing the new Entity value as an event, or by repeatedly generating new Snapshots for each new state.
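The first option, storing each new Entity value as an event, could look roughly like the following. This is a minimal in-memory sketch: CrudOnEventLog and its methods are hypothetical and do not correspond to an existing CloudState API.

```python
from typing import Any, Dict, List, Optional


class CrudOnEventLog:
    """Treats every update as an event that carries the full new entity value."""

    def __init__(self) -> None:
        # One append-only event list per entity id, standing in for the journal.
        self._journal: Dict[str, List[Any]] = {}

    def update(self, entity_id: str, new_value: Any) -> None:
        # A "destructive update" is persisted as one more event in the log.
        self._journal.setdefault(entity_id, []).append(new_value)

    def get(self, entity_id: str) -> Optional[Any]:
        # The current state is simply the value carried by the latest event.
        events = self._journal.get(entity_id)
        return events[-1] if events else None

    def revision(self, entity_id: str) -> int:
        # Number of events stored so far for this entity.
        return len(self._journal.get(entity_id, []))


store = CrudOnEventLog()
store.update("cart-1", {"items": 1})
store.update("cart-1", {"items": 2})
```

The user code only ever sees the latest value, while the log underneath still grows by one event per update.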

This also impacts the user-facing API as they would not have to deal with anything but the inbound commands (not events).

Create documentation describing how to add more language supports

Make a straightforward how-to for adding more language support.

Workflow: Developer continuous feedback loop

We need to define the desired developer feedback loop (code-test-package-deploy-inspect)

P2P messaging

We would like to add a P2P messaging pattern, that is, a protocol that user functions can use to do P2P messaging through Akka.

What is a peer?

It's important to define what we mean by peer. A peer is an abstract concept that, for each domain, is defined by the domain. It could be a human, or it could be a device - eg, an IoT device - or it could be an entity (eg, an event sourced entity that is pushing updates to pages in real time). If it is a human, they may be interacting through many devices. For example, I have Slack installed on multiple laptops and multiple mobile devices. When someone sends me a message, and I have Slack open on all my devices, I expect to receive that message in real time on all of my devices at once. Typically, a device will have a TCP connection (perhaps gRPC stream or WebSocket) to a serverless service from which it will receive P2P messages. That connection may be over an unreliable network, and when it fails, it will reconnect, but not necessarily back to the same node that it was originally connected to.

Example characteristics

While we probably can't address every possible use case, we want to come up with one or more solutions that cover a broad range of use cases. With that in mind, here are some different characteristics or requirements that some use cases might have.

The P2P messaging may in some cases be between more than 2 peers (eg, a chat room, or multiple IoT devices in a home), and there may be multiple publishers and multiple subscribers for a single topic. This may expand the traditional definition of P2P; perhaps we really are talking about addressed communication. Note that the address is not a machine or actor address; it is the abstract user/device as defined above.

Various use cases exist for a range of different delivery guarantees. At most once is useful when the current state is being sent, and new messages invalidate previous messages. For example, tracking the location of an IoT enabled vehicle. The other major useful guarantee is effectively once. In this case it's assumed that the device receiving updates can deduplicate (using a domain specific sequence number for example, or unique ids), but needs at least once delivery. Instant messaging is an example of this.
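Receiver-side deduplication on top of at least once delivery can be sketched as follows. The sketch assumes messages carry a per-sender sequence number that starts at 1 and that redeliveries never arrive ahead of newer messages; DeduplicatingReceiver is a hypothetical name.

```python
from typing import List


class DeduplicatingReceiver:
    """Turns at least once delivery into effectively once processing."""

    def __init__(self) -> None:
        self.last_seq = 0            # highest sequence number processed so far
        self.delivered: List[str] = []

    def receive(self, seq: int, payload: str) -> bool:
        if seq <= self.last_seq:
            # Already processed: this is a redelivery, so drop it.
            return False
        self.last_seq = seq
        self.delivered.append(payload)
        return True


receiver = DeduplicatingReceiver()
results = [
    receiver.receive(1, "hello"),
    receiver.receive(2, "world"),
    receiver.receive(2, "world"),   # redelivered duplicate
    receiver.receive(1, "hello"),   # late duplicate
    receiver.receive(3, "!"),
]
```

An at most once use case, such as vehicle location updates, would not need this state at all, since each new message invalidates the previous one.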

Delivery time guarantees for effectively once messaging vary too. The point of P2P messaging is to allow effectively instant delivery, ie the only latency comes from network, routing, and processing, and that should happen in the happy case. In failure scenarios however, some use cases require a maximum time that it takes for the message to be delivered, while in other cases it's ok for the dropped message to not be delivered until the next message is received.

Solutions

Currently, the only out-of-the-box solution that Akka provides to implement P2P messaging as described above is distributed pubsub. This can be combined with Akka persistence to achieve at least once delivery, by persisting messages first, then publishing them, and then using the sequence number to detect dropped messages, and the journal to recover.
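The persist-then-publish pattern can be sketched without Akka. The Journal, Subscriber and publish names below are hypothetical in-memory stand-ins for Akka persistence and distributed pubsub, intended only to show how a gap in sequence numbers triggers recovery from the journal.

```python
from typing import Callable, List, Tuple


class Journal:
    """In-memory stand-in for a persistent journal."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def persist(self, message: str) -> int:
        self._entries.append(message)
        return len(self._entries)  # sequence numbers start at 1

    def replay(self, from_seq: int, to_seq: int) -> List[Tuple[int, str]]:
        return [(seq, self._entries[seq - 1]) for seq in range(from_seq, to_seq + 1)]


class Subscriber:
    def __init__(self, journal: Journal) -> None:
        self.journal = journal
        self.next_seq = 1
        self.received: List[str] = []

    def on_message(self, seq: int, message: str) -> None:
        if seq < self.next_seq:
            return  # duplicate, already handled
        if seq > self.next_seq:
            # A gap means earlier messages were dropped: recover them from the journal.
            for _, missed in self.journal.replay(self.next_seq, seq - 1):
                self.received.append(missed)
        self.received.append(message)
        self.next_seq = seq + 1


def publish(journal: Journal, message: str, deliver: Callable[[int, str], None]) -> int:
    seq = journal.persist(message)  # persist first
    deliver(seq, message)           # then publish; this delivery may be lost
    return seq


journal = Journal()
subscriber = Subscriber(journal)
publish(journal, "a", subscriber.on_message)
publish(journal, "b", lambda seq, message: None)  # simulate a dropped message
publish(journal, "c", subscriber.on_message)
```

Note that in this sketch the dropped message "b" is only recovered once "c" arrives, which matches the case where late delivery until the next message is acceptable.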

Distributed pubsub however requires replicating the subscriber state to all nodes, and hence doesn't scale well when there are a very large number of topics being subscribed to.

Here are two other distributed P2P possibilities that we might want to consider. These ideas are very raw and not fully thought out, they may be terrible.

Sharded mediator. In this case, all messages go through a sharded mediator. Subscribers are required to register their subscription with the mediator, and they are required to maintain that subscription, including in cases when the mediator is rebalanced. The mediator may tell the subscriber when it's handing off to another node to assist with this, but the subscriber can't rely on that, and the subscriber should periodically resubscribe to the mediator - the mediator will also expire subscribers that haven't resubscribed for a while. Akka cluster sharding could be used, or a consistent hashing router could be used - if the latter, cluster membership events might trigger resubscribing. This solution has reduced availability because it introduces a third node that needs to be available for communication to succeed.
Gossip subscription state among publishers. In this case, a sharded contact point might be used to initially discover publishers and subscribers, once discovered, publishers gossip the subscription state between themselves and the contact point. Subscribers could either be part of the gossip cluster themselves, and use that to keep themselves active, or they could regularly tell the sharded contact point that they are active. Because there are potentially many gossip clusters, to keep communication down, gossip intervals would need to be long. Message activity could be used to trigger a temporary increase in gossip frequency (or an immediate gossip to all nodes), and publishers could keep messages that they publish for a time in case they learn of any new subscribers during this period of increased gossip frequency.
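The subscription bookkeeping of the sharded mediator idea (periodic resubscription plus expiry) can be sketched in a few lines. This is a single-process illustration with hypothetical names; it leaves out sharding, handoff and actual message delivery, and it takes the current time as an explicit argument so that the behaviour is deterministic.

```python
from typing import Dict, List


class Mediator:
    """Tracks subscribers per topic and expires those that stop resubscribing."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        # topic -> subscriber -> time of the most recent (re)subscription
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def subscribe(self, topic: str, subscriber: str, now: float) -> None:
        # Subscribing and resubscribing are the same operation: refresh the timestamp.
        self._last_seen.setdefault(topic, {})[subscriber] = now

    def active_subscribers(self, topic: str, now: float) -> List[str]:
        subscribers = self._last_seen.get(topic, {})
        expired = [name for name, seen in subscribers.items() if now - seen > self.ttl]
        for name in expired:
            del subscribers[name]
        return sorted(subscribers)


mediator = Mediator(ttl_seconds=30)
mediator.subscribe("room-1", "alice", now=0)
mediator.subscribe("room-1", "bob", now=0)
mediator.subscribe("room-1", "alice", now=25)  # alice resubscribes, bob does not
active = mediator.active_subscribers("room-1", now=40)
```

Because a resubscription is indistinguishable from a first subscription, a subscriber that lands on a freshly rebalanced mediator recovers simply by continuing its periodic resubscribe.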

Installation instructions

Document and validate the steps necessary to create a working installation

Start FAQ.md doc

Discussing things like:

Why CRUD is too unconstrained to work well
How Akka Cluster works under the hood
etc.

How to do KVS

Having support for Key-Value style state management would be interesting.
It could be implemented on top of CRUD #50 by the Entity being a Map.

This also impacts the user-facing API as they would not have to deal with anything but the inbound commands (not events), by adding/modifying/removing the values in the Map.

Support both Knative and standalone

We should support either working within Knative, or as standalone.

Currently, an operator that works with Knative revisions has been implemented, though it is probably in a non working state since we stopped work on integrating with Knative. We've created a patch for Knative that allows our operator to take over managing the deployment:

knative/serving#4152

That patch has been rejected due to undocumented tight coupling with the autoscaler, and we have found that the Knative autoscaler isn't suitable for scaling Akka clusters anyway, so work needs to be continued on Knative to disable the autoscaler for certain revisions too (and possibly the activator). But we will also need to make it possible for our proxy to work with either Knative or standalone. To work with Knative, we need the following:

The autoscaler needs to be multi deployment aware. Currently, it only works with one deployment, but every new version of a Knative service that is deployed results in a new deployment. To support this, the autoscaler probably needs to work out the proportion of traffic going to each deployment, and distribute scaling across the deployments accordingly. So, if 25% of traffic is going to deployment A, and 75% to deployment B, and the autoscaler decides that it needs 8 nodes to handle its load, then deployment A should be scaled to 2 nodes, and deployment B should be scaled to 6 nodes.
We need a way to be able to switch Knative vs standalone modes for the proxy.
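The traffic-proportional split described for the multi deployment aware autoscaler can be sketched as below. The function name is hypothetical, and the largest remainder rounding used when the shares do not divide evenly is an assumption of this sketch, not something the issue specifies.

```python
from typing import Dict


def distribute_nodes(total_nodes: int, traffic: Dict[str, float]) -> Dict[str, int]:
    """Split total_nodes across deployments in proportion to their traffic."""
    total_traffic = sum(traffic.values())
    if total_traffic <= 0:
        raise ValueError("traffic must contain at least one positive share")

    # Exact (possibly fractional) share of nodes per deployment.
    exact = {name: total_nodes * share / total_traffic for name, share in traffic.items()}
    # Start from the rounded-down allocation.
    allocation = {name: int(value) for name, value in exact.items()}

    # Hand out the nodes lost to rounding, largest fractional remainder first.
    leftover = total_nodes - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


split = distribute_nodes(8, {"A": 25, "B": 75})
```

For the numbers in the text, 25% and 75% of 8 nodes, this yields 2 nodes for deployment A and 6 for deployment B.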

Handoff between sidecar shutdown and user application needs to be orderly

See this improvement proposal which could address the issue: kubernetes/enhancements#753

TypeScript support

It would be interesting to target TypeScript

.NET Core support

It would be really cool to support .NET Core for user functions!

Node.js support

Create a formal specification for the CloudState protocol

Clean up the .protos and create a formal specification for the interactions, and then transfer those rules to the TCK for verification.

Automatic code formatting

Add automatic code formatting to the Reference Implementation build, and optionally, the respective frontend implementations.

Projection support

TODO devise a solution for being able to consume domain events to facilitate things like creating projections, or even consuming domain events from something like Alpakka (CloudEvents?)

Include server actor in health checks

We should not return ready until the server actor is running and bound (which implies that we have done the ready handshake with the user function).

Product CRDTs

We should provide support for product CRDTs - ie, CRDTs that are a product of multiple CRDTs. The underlying CRDT for this would be a Grow-only Map, which would be a map of keys (any type, but typically strings) to child CRDTs. Akka doesn't have support for this out of the box, though it's very similar to the ORMap, it just uses a GSet instead of an ORSet for maintaining the keys. Since for the product use case, we'd anticipate only a small number of keys, it would make sense for the delta to contain deltas of the member CRDTs (if supported), rather than full values (this isn't the case in ORMap, though is for ORMultiMap).
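A grow-only map of child CRDTs can be sketched as follows. This is a state-based illustration only (no deltas), it uses a simple grow-only counter as the child type, and GCounter, GMap and get_or_create are hypothetical names rather than Akka or CloudState APIs.

```python
from typing import Callable, Dict, Optional


class GCounter:
    """Grow-only counter: one monotonically increasing count per node."""

    def __init__(self, counts: Optional[Dict[str, int]] = None) -> None:
        self.counts = dict(counts or {})

    def increment(self, node: str, amount: int = 1) -> None:
        self.counts[node] = self.counts.get(node, 0) + amount

    @property
    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        nodes = set(self.counts) | set(other.counts)
        return GCounter({n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes})


class GMap:
    """Grow-only map: keys are never removed, values are child CRDTs."""

    def __init__(self, entries: Optional[Dict[str, GCounter]] = None) -> None:
        self.entries = dict(entries or {})

    def get_or_create(self, key: str, factory: Callable[[], GCounter] = GCounter) -> GCounter:
        if key not in self.entries:
            self.entries[key] = factory()
        return self.entries[key]

    def merge(self, other: "GMap") -> "GMap":
        merged: Dict[str, GCounter] = {}
        for key in set(self.entries) | set(other.entries):
            # Union of keys (the GSet part); children present on both sides are merged.
            left = self.entries.get(key) or other.entries[key]
            right = other.entries.get(key) or self.entries[key]
            # merge is idempotent, so a child present on one side only is just copied.
            merged[key] = left.merge(right)
        return GMap(merged)


replica_a = GMap()
replica_a.get_or_create("likes").increment("node-1", 2)
replica_b = GMap()
replica_b.get_or_create("likes").increment("node-2", 3)
replica_b.get_or_create("views").increment("node-2", 1)
merged = replica_a.merge(replica_b)
```

A product API could then expose keys such as "likes" and "views" as properties of a plain object.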

Once we have a GMap, we can expose that in the CRDT protocol, and then build user APIs that allow it to be used directly, as well as allow it to be used as a product (ie, essentially a plain old object whose properties are keys in the map). This would be done using a proxy in JavaScript, as is currently exposed for the ORMap. For Java, we might have some reflection based mechanism that would inject a POJO with the CRDT values.