wasmcloud / wadm

wasmCloud Application Deployment Manager (wadm): Declarative application deployments for wasmCloud applications.
Home Page: https://wasmcloud.com
License: Apache License 2.0
Looking for comments and thoughts on this one; it's a half-RFC.
When creating more complex wadm manifests, I noticed that I was often copying the same scaler config around for multiple resources. For example, if I wanted to have 3 actors all run on a host with a custom label, I'd have to add this for each actor:
traits:
  - type: spreadscaler
    properties:
      replicas: 1
      spread:
        - name: custom
          requirements:
            app: custom
It would be nice to be able to define this as a named custom configuration and then re-use that config by name later. Essentially, this would just let me define multiple components with the same traits in fewer lines. This could look like the following:
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: my-backend
  annotations:
    version: v0.0.1
    description: "My Backend"
spec:
  traits:
    - type: spreadscaler
      name: customspreadscaler
      properties:
        replicas: 1
        spread:
          - name: custom
            requirements:
              app: custom
  components:
    - name: myactor1
      type: actor
      properties:
        image: ghcr.io/myactor:0.1.0
      traits:
        - name: customspreadscaler
    - name: myactor2
      type: actor
      properties:
        image: ghcr.io/myactor2:0.1.0
      traits:
        - name: customspreadscaler
This is just a thought, and I'm happy to be challenged here. My worry is that jumping around to multiple places in a manifest to understand the spread configuration could make things too confusing.
Create a NATS API that exposes (at least) the following functionality:
This issue also includes implementing the underlying persistence mechanism for deployments. At implementation time, we'll need to decide if we're going to use global/distributed ETS or use JetStream.
To be complete, there should be a module in the library code that:
See for context: #40 (comment)
The definition of done is that the basic scaffolding is in place and working so we can make a decision whether or not to move forward with it. This scaffolding includes:
At a minimum, the following guides should exist:
There are going to be many times when a capability provider is "shared" between different applications (think something like the httpserver provider). Right now, if one is already running, every single reconcile will trigger a command that fails because some other application is running the provider.
To solve this, I think the provider scaler should NOT match on annotations and should only check whether a provider with the correct link name is running on hosts that match its spread requirements. It should also run a reconcile any time it sees a provider stopped event. This is correct behavior because all we are checking for is that a host has that capability; it doesn't matter what is running it. If for some reason that provider is deleted (by a user, or by another application that has been deleted), then the scaler should make sure it is running again for the application. A sketch of this check follows.
Please note that this isn't perfect. In the future I'd like to avoid the, albeit short, downtime of a provider stopping and then starting again, but that will require some sort of shared state between all of the different scalers.
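A minimal sketch of that membership check with stand-in types; it only verifies that some provider with the expected public key and link name is present on each host matching the spread requirements, ignoring annotations and whichever application started it:

use std::collections::HashMap;

pub struct HostInfo {
    pub labels: HashMap<String, String>,
    /// (provider public key, link name) pairs currently running on the host
    pub providers: Vec<(String, String)>,
}

/// True if the host matches the spread requirements and already runs a
/// provider with the expected key and link name (no annotation check)
fn host_satisfied(
    host: &HostInfo,
    requirements: &HashMap<String, String>,
    provider_key: &str,
    link_name: &str,
) -> bool {
    requirements
        .iter()
        .all(|(k, v)| host.labels.get(k) == Some(v))
        && host
            .providers
            .iter()
            .any(|(key, link)| key == provider_key && link == link_name)
}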
Capability providers support an optional JSON configuration, which we should be able to support via a config block in wadm. You can see the configuration variable in the interface: https://docs.rs/wasmcloud-interface-lattice-control/0.18.0/wasmcloud_interface_lattice_control/struct.StartProviderCommand.html.
This is most useful for defining configuration that a provider will use for every link, like a different default HTTP server port with the wasmCloud httpserver provider. This config block would be taken and sent along with the start provider command. It would be nicer to specify nested config as YAML or JSON instead of just an opaque string, but the provider itself may or may not support a specific format.
This would look something like this:
- name: httpserver
  type: capability
  properties:
    image: wasmcloud.azurecr.io/httpserver:0.17.0
    contract: wasmcloud:httpserver
    config:
      address: 0.0.0.0:8085
  traits:
    - type: spreadscaler
      properties:
        replicas: 1
It would be important here to note the difference between provider configuration and the linkdef trait, since they can be used for different things.
This only needs to be running the integration and unit tests. No building and deploying of artifacts required
We need a loop that watches the KV store for manifest changes. This should be an easy thing to add to the struct that stores manifests. It should take any changes and automatically trigger an update from the appropriate Scaler; a sketch of such a loop follows.
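A minimal sketch using the async-nats KV watch API; the bucket name and the handoff to scalers are assumptions for illustration:

use futures::StreamExt;

async fn watch_manifests(client: async_nats::Client) -> anyhow::Result<()> {
    let js = async_nats::jetstream::new(client);
    // Hypothetical bucket name for stored manifests
    let store = js.get_key_value("wadm_manifests").await?;
    // watch_all yields an entry for every put/delete on any key in the bucket
    let mut entries = store.watch_all().await?;
    while let Some(entry) = entries.next().await {
        let entry = entry?;
        // Here the changed manifest would be handed to the appropriate
        // Scaler so it can recompute commands for the affected app
        println!("manifest {} changed (revision {})", entry.key, entry.revision);
    }
    Ok(())
}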
Every flag and config file should be documented both in the CLI and in wadm documentation. The best option is that we add a new section to the wasmcloud docs site for wadm. This task does not include deployment documentation (it is a separate task)
Received manifests should be stored in a KV bucket inside of NATS:
ActorsStarted will lead to far fewer reads and writes, as we won't be updating the store for each and every actor that is started.
Can anybody help me?
I tried to compile and run wadm locally. I've only just started with Elixir, and I don't know how to solve this problem:
-> % mix do compile
=NOTICE REPORT==== 24-Aug-2022::18:49:31.264459 ===
TLS client: In state certify at ssl_handshake.erl:2100 generated CLIENT ALERT: Fatal - Handshake Failure
- {bad_cert,hostname_check_failed}
===> Failed to update package pc from repo hexpm
===> Errors loading plugin pc. Run rebar3 with DEBUG=1 set to see errors.
=NOTICE REPORT==== 24-Aug-2022::18:49:31.327068 ===
TLS client: In state certify at ssl_handshake.erl:2100 generated CLIENT ALERT: Fatal - Handshake Failure
- {bad_cert,hostname_check_failed}
===> Failed to update package pc from repo hexpm
===> Errors loading plugin pc. Run rebar3 with DEBUG=1 set to see errors.
===> Unable to run pre hooks for 'compile', command 'compile' in namespace 'pc' not found.
** (Mix) Could not compile dependency :snappyer, "/Users/duyong/.mix/rebar3 bare compile --paths /Users/duyong/Desktop/workspace/wadm/wadm/_build/dev/lib/*/ebin" command failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile snappyer", update it with "mix deps.update snappyer" or clean it with "mix deps.clean snappyer"
Right now we basically consume the raw NATS API everywhere we interact with wadm. It would be nice if we added a client library to the wadm crate so that we could have an experience like client.put_manifest(Manifest{}).await. This could become more helpful down the line too as we add more features.
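As a rough illustration, such a client could be a thin wrapper around a NATS request; the API subject and the YAML serialization below are assumptions, not a settled interface:

use async_nats::Client;
use serde::Serialize;

pub struct WadmClient {
    nats: Client,
    lattice: String,
}

impl WadmClient {
    pub fn new(nats: Client, lattice: impl Into<String>) -> Self {
        Self { nats, lattice: lattice.into() }
    }

    /// Serializes a manifest and sends it over the wadm API, returning the
    /// raw response payload for the caller to interpret
    pub async fn put_manifest(&self, manifest: &impl Serialize) -> anyhow::Result<Vec<u8>> {
        // Hypothetical subject layout; the real topic may differ
        let subject = format!("wadm.api.{}.model.put", self.lattice);
        let body = serde_yaml::to_string(manifest)?;
        let response = self.nats.request(subject, body.into()).await?;
        Ok(response.payload.to_vec())
    }
}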
In #119 we made any provider satisfy the requirements for a spread (see #106 for context). However, an Actual Solution™ would be to have some sort of generated list of all shared providers so that an undeploy doesn't delete one that another manifest is relying on. I don't quite have an idea of how we should implement this, so any ideas or PRs are welcome!
A scaler will need to take a config of some kind (likely a handle to some sort of state) and expose a couple of methods. Below is a general idea of what it could look like, but part of this spike is to have an integration test that can pass in an Event or a wadm manifest and output commands with a basic, not fully functional, spreadscaler.
use std::collections::HashSet;

/// A Scaler is used to manage responding to events
pub trait Scaler {
    /// Any type that has the necessary data to configure the scaler
    type Config: Send + Sync;

    /// Handles the event, returning any needed changes in response
    fn handle_event(&self, event: ScopedEvent) -> Result<HashSet<Command>>;

    /// Handles a new or updated manifest with its given config. This configuration should be
    /// stored by implementors in some form so that handle_event can produce the right commands
    fn handle_manifest(&self, config: Self::Config) -> Result<HashSet<Command>>;

    /// Removes a config from the scaler and emits the expected compensatory commands
    fn remove_manifest(&self, config: &Self::Config) -> Result<HashSet<Command>>;
}
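To make the shape concrete, here is a minimal, self-contained sketch of an implementor. SpreadConfig, ScopedEvent, and Command are stand-in types for illustration (Result is assumed to be anyhow::Result); one practical detail it surfaces is that, because the trait methods take &self, stored config needs interior mutability:

use std::collections::HashSet;
use std::sync::RwLock;

#[derive(Clone)]
pub struct SpreadConfig {
    pub replicas: usize,
}

pub enum ScopedEvent {
    HostStopped { host_id: String },
}

#[derive(Debug, Hash, PartialEq, Eq)]
pub enum Command {
    StartActor { host_id: String, count: usize },
}

#[derive(Default)]
pub struct SimpleSpreadScaler {
    // &self methods mean stored config needs interior mutability
    config: RwLock<Option<SpreadConfig>>,
}

impl Scaler for SimpleSpreadScaler {
    type Config = SpreadConfig;

    fn handle_event(&self, event: ScopedEvent) -> anyhow::Result<HashSet<Command>> {
        let guard = self.config.read().unwrap();
        let Some(cfg) = guard.as_ref() else {
            return Ok(HashSet::new());
        };
        match event {
            // If a host stops, the replicas it ran need a new home
            ScopedEvent::HostStopped { .. } => Ok(HashSet::from([Command::StartActor {
                host_id: "another-eligible-host".into(),
                count: cfg.replicas,
            }])),
        }
    }

    fn handle_manifest(&self, config: Self::Config) -> anyhow::Result<HashSet<Command>> {
        // Store the config so handle_event can produce the right commands later;
        // a real implementation would also diff desired vs. observed state here
        *self.config.write().unwrap() = Some(config);
        Ok(HashSet::new())
    }

    fn remove_manifest(&self, _config: &Self::Config) -> anyhow::Result<HashSet<Command>> {
        self.config.write().unwrap().take();
        // Compensatory commands (e.g. stopping actors) would be emitted here
        Ok(HashSet::new())
    }
}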
Being able to reference environment variables in the App Manifest for config and secrets would be really wonderful. Areas of the manifest where this may be relevant include link defs and config.
An example of using Environment Variables with AWS Secrets Manager can be found here:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/secrets-envvar.html
I wonder if HashiCorp Vault and others allow similar use.
@brooksmtownsend mentioned the potential of storing values in JetStream also.
We should update the wasmcloud-otp Helm chart to use wadm by default as another container in the pod
This issue simply serves as a list of things that we've found while writing wadm 0.4 that we wish would change in the OTP host, ranging from adding an additional field on an event to changing the structure of commands, etc.
Most of this is just a stream of thoughts, so these shouldn't be taken as must-dos, but after the release of wadm 0.4 we'll consider this list and what we can do to improve the efficiency of wadm and the host.
After talking with @brooksmtownsend we think that a manifest should always deploy by default when it is put, but this means we need a request body to indicate when we don't want it to deploy immediately. So we'll need to modify the put endpoint to expect a different body
Previously, you could specify a NATS JWT and seed in order to authenticate with a NATS server over the command line entirely, not needing files. In wadm v0.4.0-alpha.1, the JWT is a pathbuf, so you will have to provide a file path in order to use this authentication method, in which case you could just use a credsfile.
I would like to be able to supply a JWT and seed on the CLI to keep this simple, e.g.
WADM_NATS_JWT=eyJWTthings WADM_NATS_NKEY=SUASDASDASD wadm
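For illustration, a hedged sketch of accepting those values as plain strings and authenticating with async-nats plus the nkeys crate for nonce signing (env var names mirror the example above; the server URL is a placeholder):

use std::sync::Arc;

async fn connect() -> anyhow::Result<async_nats::Client> {
    let jwt = std::env::var("WADM_NATS_JWT")?;
    let seed = std::env::var("WADM_NATS_NKEY")?;
    let key_pair = Arc::new(nkeys::KeyPair::from_seed(&seed)?);

    let client = async_nats::ConnectOptions::with_jwt(jwt, move |nonce| {
        let key_pair = key_pair.clone();
        // The server sends a nonce; signing it with the nkey proves identity
        async move { key_pair.sign(&nonce).map_err(async_nats::AuthError::new) }
    })
    .connect("nats://127.0.0.1:4222")
    .await?;
    Ok(client)
}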
While implementing #75, I realized that we aren't using annotations to mark actors that are managed by wadm. In addition to being managed by wadm, we'll also want to note the specific Spread that the annotation is using so that we can manage conflicting actor spreads.
Until this is complete, the ActorSpreadScaler as implemented in #75 will not be able to properly handle multiple spreads that match to the same host and it's recommended to keep spread requirements as specific as possible.
The observed model should keep track of passed/failed health checks and be able to reflect that in the observed state
Using received events from the stream, the KV bucket should have a generated state for every observed lattice.
This should be very well tested with integration and/or e2e tests
These e2e tests should be run against a lattice with multiple hosts running to exercise the full functionality of the scalers. Testing a multitenant cluster (i.e. one with multiple lattice prefixes) is out of scope for this task
Heyho,
super excited to see wadm taking shape, so I tried to follow the install instructions on my M2. Unfortunately, there seems to be an issue with cargo's semver detection for v0.4.0-alpha.1.
Trying to run the command from the readme:
$ cargo install wadm --bin wadm --features cli --git https://github.com/wasmcloud/wadm --version v0.4.0-alpha.1 --force
error: the `--version` provided, `v0.4.0-alpha.1`, is not a valid semver version: cannot parse 'v0.4.0-alpha.1' as a semver
$ cargo version
cargo 1.68.2 (6feb7c9cf 2023-03-26)
For now I was able to work around it by leaving out the --version flag.
$ cargo install wadm --bin wadm --features cli --git https://github.com/wasmcloud/wadm --force
[...]
Installed package `wadm v0.4.0-alpha.1 (https://github.com/wasmcloud/wadm#a08761c6)` (executable `wadm`)
$ wadm --version
wadm 0.4.0-alpha.1
Thank you and keep up the great work!
All of our examples and getting started docs should use wadm by default and hide the current instructions behind an "advanced" tab or header
When I submit invalid YAML to wadm, I get timeout errors from the client, but in the wadm logs we have:
14:01:31.917 [error] Gnat.Server encountered an error while handling a request: %Protocol.UndefinedError{description: "", protocol: String.Chars, value: %YamlElixir.ParsingError{column: 9, line: 104, message: "Invalid sequence", type: :not_a_sequence}}
14:02:27.411 [error] Gnat.Server encountered an error while handling a request: %Protocol.UndefinedError{description: "", protocol: String.Chars, value: %YamlElixir.ParsingError{column: 9, line: 104, message: "Invalid sequence", type: :not_a_sequence}}
14:03:00.468 [error] Gnat.Server encountered an error while handling a request: %Protocol.UndefinedError{description: "", protocol: String.Chars, value: %YamlElixir.ParsingError{column: 19, line: 105, message: "Block mapping value not allowed here", type: :block_mapping_value_not_allowed}}
These happen quickly and should result in a YAML parsing error rather than a timeout. This could just be a matter of modifying the API server flow for model put (https://github.com/wasmCloud/wadm/blob/main/wadm/lib/wadm/api/api_server.ex#L121) so that requests that fail validation return an error instead of panicking.
As of the time of this issue writing, the observed model exists only in pure functional format. This issue is to track the integration of that functional model with actual NATS subscriptions to the lattice event stream to produce observed lattice state.
We need to make sure all scalers try to reconcile if a node "ages out" from the store. I think there are two different ways to do this:
1. A NodeReaped event that gets emitted to wadm.evt.{lattice-id}
2. A ForceReconcile event that gets emitted to wadm.evt.{lattice-id}
I'm more in favor of option 2 because it gives us more flexibility. It would also allow a user to force a reconciliation pass if they ever wanted to (by emitting the correct event to the right topic)
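As a sketch of option 2, a client (or wadm itself) could publish a CloudEvents-style envelope to the lattice's event topic, mirroring the envelope the host events already use; the event type and source strings here are hypothetical:

use serde_json::json;

async fn force_reconcile(client: &async_nats::Client, lattice_id: &str) -> anyhow::Result<()> {
    let subject = format!("wadm.evt.{lattice_id}");
    let event = json!({
        "specversion": "1.0",
        "type": "com.wadm.force_reconcile", // hypothetical event type
        "source": "operator-cli",           // hypothetical source
        "id": "00000000-0000-0000-0000-000000000001", // normally a fresh UUID
        "data": {}
    });
    client.publish(subject, serde_json::to_vec(&event)?.into()).await?;
    Ok(())
}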
As part of Stage 1, we are implementing a dummy work loop to handle commands. This task involves taking a list of commands and turning them into actual compensatory actions against the lattice control topics
https://mermaid.live
https://mermaid-js.github.io/mermaid/#/entityRelationshipDiagram
Looking through the code that has been written up, this is my mental picture:
erDiagram
    WASMCLOUD_HOST ||--|| State : has
    State }|..o{ Actor : has
    State }|..o{ Provider : has
    State }|..o{ Link_Definition : has
    LATTICE_OBSERVER }|--|| Lattice : has
    Lattice ||--|| State : contains
    WADM ||--o| Deployment_State : has
    Deployment_State }|..o{ Actor : has
    Deployment_State }|..o{ Provider : has
    Deployment_State }|..o{ Link_Definition : has
graph TD
    A[Updated OAM Spec] -->|Apply| B(HOST_CORE)
    B -->|Spin Up| C(Actor/Provider/Link)
    C -->|Reconcile| D{Diff}
    D -->|Yes/Apply/Backoff| B
    D -->|No| E[Done]
    F[WASHBOARD] -->|manual entry| B
How is the contention between the UI and the OAM spec going to be handled: who wins? Last in?
Incomplete thoughts: since Horde is involved, is WADM a singleton? Haven't processed that. Redis will maintain....
As part of #48 we are implementing the basics of a spreadscaler. This work is to make sure everything is logged, instrumented, feature complete, and tested. It should implement all of the same functionality currently supported in wadm 0.3
At the moment, the LinkScaler only ensures that a link exists between an actor and a provider but does not assert that the values specified are correct. See this TODO for where this code would go: https://github.com/wasmCloud/wadm/blob/main/src/scaler/spreadscaler/link.rs#L119
Essentially, we already have this information in the lattice data stream for linkdefs, so we should check whether the linkdef exists with the wrong values so that we can issue a delete and then a put command. A sketch of that check follows.
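A minimal sketch of such a check, with a stand-in LinkDefinition type carrying just the fields needed for the comparison:

use std::collections::HashMap;

pub struct LinkDefinition {
    pub actor_id: String,
    pub provider_id: String,
    pub link_name: String,
    pub values: HashMap<String, String>,
}

/// True if an existing linkdef matches the desired one on identity but
/// differs on values, meaning we should issue a delete and then a put
fn needs_replacement(existing: &LinkDefinition, desired: &LinkDefinition) -> bool {
    existing.actor_id == desired.actor_id
        && existing.provider_id == desired.provider_id
        && existing.link_name == desired.link_name
        && existing.values != desired.values
}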
Hi, I found that wadm publishes the JSON data of the OAM model to NATS. Who consumes this? It seems there is currently no introduction to it in the documentation. 😁
Hi, when I try to deploy simlpe2.yaml, I get error logs like the following:
Failed to perform reconciliation pass: Weighted target 'eastcoast' has insufficient candidate hosts.,
Weighted target 'haslights' has insufficient candidate hosts.
OAM JSON request:
{
  "apiVersion": "core.oam.dev/v1beta1",
  "kind": "Application",
  "metadata": {
    "name": "my-example-app",
    "annotations": {
      "version": "v0.0.1",
      "description": "This is my app revision 1"
    }
  },
  "spec": {
    "components": [
      {
        "name": "userinfo",
        "type": "actor",
        "properties": {
          "image": "wasmcloud.azurecr.io/fake:1"
        },
        "traits": [
          {
            "type": "spreadscaler",
            "properties": {
              "replicas": 4,
              "spread": [
                {
                  "name": "eastcoast",
                  "requirements": {
                    "zone": "us-east-1"
                  },
                  "weight": 80
                },
                {
                  "name": "westcoast",
                  "requirements": {
                    "zone": "us-west-1"
                  },
                  "weight": 20
                }
              ]
            }
          },
          {
            "type": "linkdef",
            "properties": {
              "target": "webcap",
              "values": {
                "port": 8080
              }
            }
          }
        ]
      },
      {
        "name": "webcap",
        "type": "capability",
        "properties": {
          "contract": "wasmcloud:httpserver",
          "image": "wasmcloud.azurecr.io/httpserver:0.13.1",
          "link_name": "default"
        }
      },
      {
        "name": "ledblinky",
        "type": "capability",
        "properties": {
          "image": "wasmcloud.azurecr.io/ledblinky:0.0.1",
          "contract": "wasmcloud:blinkenlights"
        },
        "traits": [
          {
            "type": "spreadscaler",
            "properties": {
              "replicas": 1,
              "spread": [
                {
                  "name": "haslights",
                  "requirements": {
                    "ledenabled": true
                  }
                }
              ]
            }
          }
        ]
      }
    ]
  }
}
What causes this error? T_T
The linkdef handler should be a Scaler implementation. This will be the second Scaler, so there will be some additional work required:
Create a supervisor hierarchy of OTP processes that manages the reconciliation process in a consistent way that is compatible with the functional reconciliation model and BEAM clustering.
wash up should now support pulling down wadm and starting it by default (just like it does for the host and NATS). As part of this, there should be an optional flag to not start it.
I tried to define a linkdef scaler with the following spec:
- type: linkdef
  properties:
    target: httpclient
It did not get put. But when adding example values, it works:
- type: linkdef
  properties:
    target: httpclient
    values:
      foo: bar
We should make these values optional, as sketched below.
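Assuming the manifest is deserialized with serde, the fix could be as small as defaulting the values map when it is omitted; the struct and field names here are illustrative:

use std::collections::HashMap;
use serde::Deserialize;

#[derive(Deserialize)]
pub struct LinkdefProperty {
    pub target: String,
    /// Defaults to an empty map when `values` is omitted from the manifest
    #[serde(default)]
    pub values: HashMap<String, String>,
}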
Right now we have the API object to request that a manifest be undeployed without deleting the managed actors and providers, but we haven't actually hooked up that logic. We'll need to add that to the ManifestUndeployed notification so that a scaler can just be deleted rather than spinning things down.
When using WADM with a private registry, the host can't be found to deploy providers and actors.
The same code works fine when using WASMCLOUD_VERSION=v0.59.0.
I have not tested without a private registry, so that may not be the issue, but I believe @jordan-rash could not recreate this with 0.60 without a private registry, so it is likely the cause.
Errors:
17:14:58.444 [error] Failed to perform reconciliation pass: Weighted target 'oauth2_pkce' has insufficient candidate hosts.,
Weighted target 'apigw_router' has insufficient candidate hosts.,
Weighted target 'jammin_messaging_provider_spread' has insufficient candidate hosts.,
Weighted target 'httpclient_spread' has insufficient candidate hosts.,
Weighted target 'httpserver_spread' has insufficient candidate hosts.,
Weighted target 'vault_spread' has insufficient candidate hosts.,
Weighted target 'redis_spread' has insufficient candidate hosts.
17:15:14.659 [error] GenServer {Wadm.HordeRegistry, "st_default"} terminating
** (FunctionClauseError) no function clause matching in List.foldl/3
(elixir 1.13.3) lib/list.ex:248: List.foldl(%{}, %LatticeObserver.Observed.Lattice{actors: %{}, claims: %{}, hosts: %{"NA6UITC5DLLNAKTDFYQJRYO3FPAU5RPX64FXKDP46Y7FB74SUTMGT5DX" => %LatticeObserver.Observed.Host{first_seen: ~U[2023-02-22 22:15:14.657009Z], friendly_name: "weathered-fire-6128", id: "NA6UITC5DLLNAKTDFYQJRYO3FPAU5RPX64FXKDP46Y7FB74SUTMGT5DX", labels: %{"app" => "oauth2", "hostcore.arch" => "x86_64", "hostcore.os" => "linux", "hostcore.osfamily" => "unix"}, last_seen: ~U[2023-02-22 22:15:14.657009Z], status: :healthy}}, id: "default", instance_tracking: %{}, invocation_log: %{}, linkdefs: [], parameters: %LatticeObserver.Observed.Lattice.Parameters{host_status_decay_rate_seconds: 35}, providers: %{}, refmap: %{}}, #Function<4.118258543/2 in LatticeObserver.Observed.EventProcessor.record_heartbeat/4>)
(lattice_observer 0.1.0) lib/lattice_observer/observed/event_processor.ex:198: LatticeObserver.Observed.EventProcessor.record_heartbeat/4
(wadm 0.2.0) lib/wadm/lattice_state_monitor.ex:36: Wadm.LatticeStateMonitor.handle_info/2
(stdlib 3.17.2) gen_server.erl:695: :gen_server.try_dispatch/4
(stdlib 3.17.2) gen_server.erl:771: :gen_server.handle_msg/6
(stdlib 3.17.2) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:cloud_event, %Cloudevents.Format.V_1_0.Event{data: %{"actors" => %{}, "friendly_name" => "weathered-fire-6128", "labels" => %{"app" => "oauth2", "hostcore.arch" => "x86_64", "hostcore.os" => "linux", "hostcore.osfamily" => "unix"}, "providers" => [], "uptime_human" => "32 seconds", "uptime_seconds" => 32, "version" => "0.60.0"}, datacontenttype: "application/json", dataschema: nil, extensions: %{}, id: "fbdaf2c2-9ae0-42cd-b678-f46c09904ad7", source: "NA6UITC5DLLNAKTDFYQJRYO3FPAU5RPX64FXKDP46Y7FB74SUTMGT5DX", specversion: "1.0", subject: nil, time: "2023-02-22T22:15:14.657009Z", type: "com.wasmcloud.lattice.host_heartbeat"}}
State: %LatticeObserver.Observed.Lattice{actors: %{}, claims: %{}, hosts: %{}, id: "default", instance_tracking: %{}, invocation_log: %{}, linkdefs: [], parameters: %LatticeObserver.Observed.Lattice.Parameters{host_status_decay_rate_seconds: 35}, providers: %{}, refmap: %{}}
Currently, we only check for the existence of actors and providers: for actors, a public key; for providers, a public key/contract ID/link name triple.
We do not take into account version information, which will be problematic when upgrading actors and providers to newer versions or using wadm as part of an inner dev loop. The Actor and Provider scalers should also look at the versions of the running assets and upgrade older versions if necessary as part of reconciling; a sketch of that comparison follows.
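A minimal sketch of the comparison, assuming the versions in play are semver strings (where they come from, such as claims data, is left open here):

use semver::Version;

/// True if the running version is older than the manifest's desired version,
/// meaning the scaler should issue commands to upgrade it
fn needs_upgrade(running: &str, desired: &str) -> anyhow::Result<bool> {
    Ok(Version::parse(running)? < Version::parse(desired)?)
}

fn main() -> anyhow::Result<()> {
    assert!(needs_upgrade("0.1.0", "0.2.0")?);
    assert!(!needs_upgrade("0.2.0", "0.2.0")?);
    Ok(())
}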
In #79 we added some things around a mirror stream for combining streams. In NATS 2.10, we will be able to have multiple filter subjects and won't need it any more. So once we update, we should remove the need for a mirror stream
We started Wadm a long time ago, but have mostly let it sit since then as we've worked on polishing the host. However, the time has come to get this all working and productionized. This document is a proposal for rewriting and releasing wadm as a fully supported and featured part of the wasmCloud ecosystem, complete with waterslide!
This section covers the goals and non-goals of this scope of work. Please note that some of the items in non-goals are possible future work for wadm, but are not in scope for this work
wash up
For this work, I propose we rewrite Wadm in Rust. This decision was made for the following reasons, in order of importance. As part of making this decision, two other languages (Elixir and Go) were considered; the reasons for rejecting them are described in the last two sections.
Schedulers and application lifecycle management are topics that many people in the cloud native space have deep knowledge of. If we are going to be writing something that does those things for wasmCloud, then we need as many eyes on it as possible. Based on current metrics of wasmCloud repos, we have very few contributors to our Elixir code and a lot more to our Rust repos. Other projects in, or consumed by, the wasm ecosystem are in Rust and also have higher numbers of contributors. Go would have also been an excellent choice here, but the other reasons listed here precluded it. We also have multiple contributors in the wasmCloud community right now who already know Rust.
The tl;dr is that we need contributors to be successful and the current language does not attract enough people.
One problem we've run into consistently in our Elixir projects is dynamic typing. Although this can be mitigated somewhat by tools like dialyzer, it requires programmer and maintainer discipline and still doesn't catch everything. Having a static type for each kind of event that drives a system like Wadm is critical for ensuring correct behavior and avoiding panics.
In addition to the need for static typing is the preference for generics. In my experience writing large applications for Kubernetes in both Rust and Go, a generic type system makes interacting with the API much easier. There is less generated code and less need to roll your own type system, as happens in many large Go projects. Go has added generics, but its system is nowhere near as strong as other statically typed languages such as Rust.
To support custom scalers, we will likely be supporting at least an actor based extension point and possibly a bare wasm module. Either way, most of the wasm libraries/runtimes out in the wild are written in Rust or have first-class bindings for Rust. Also, many of our wasmCloud libraries are written in Rust, which will allow for reuse.
This is the lowest priority reason why I am suggesting Rust, but it is still an important one. The current implementation requires bundling an Erlang VM along with the compiled Elixir. That means someone who runs Wadm as it currently stands will likely need to tune a VM. It is also larger, which leads to more disk space requirements and longer download times.
Rust (and Go even more so) has great support for static binaries, and both run lighter than a VM without much additional tuning (if any).
As with any tool choice, there are tradeoffs that occur. Below is a list of disadvantages I think will be most likely to cause friction
One of the biggest questions here is why not continue with Elixir. By far the biggest thing we are giving up is the code around the lattice observer. However, writing this in Rust gives us the advantage of creating something that we could eventually make bindings for in any language (this also helps enable the reusability described below), although that isn't a goal here.
With that said, the previous sections cover in depth the advantage of using Rust over Elixir in this case
In my comparisons, I was looking for languages that would fit the requirements above. Due to the overlap of languages used for wasm as well as languages familiar to those in the Cloud Native space, that whittled things down to Go and Rust. Go excels at many of these requirements. It is much more popular than Rust and Elixir (probably combined) and has great support for statically compiling binaries. Also, things like NATS are native to Go.
It came down to a few main concerns of why Rust would be better:
To be clear, there are other smaller reasons, but those could be considered nitpicky.
One of the items I thought about most when drafting this was whether or not we should implement wadm as a true state machine. Given the simplicity of what it is trying to do, I propose we focus on implementing an event-driven filtering approach. Essentially, a state machine approach is going to be overkill for this stage of the project and the near future.
Loosely, I am calling these "Scalers" (name subject to change). Every scaler takes a list of events (which may or may not be filtered) and returns a list of actions to take.
This does not rule out iterating into a state machine style in the future (if you are curious, you can see Krator for an example of how this could be done in code), nor does it mean a scaler implementation can't use a state machine. It only means that for this first approach, we'll filter events into actions.
I have purposefully not gone into high levels of detail about what this looks like in code, as it will probably be best just to try it and see how it looks as we begin to implement it. What we currently have in wadm is probably a good way of going about this (i.e. Scalers output commands).
One important point is that these "Scalers" should be commutative (i.e. if a+b=c then b+a=c; the order of operations doesn't matter). That means that when a manifest is sent to wadm, it can run through the list of supported Scalers in any order and it will return the same output, as the small illustration below shows.
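Since each scaler returns a set of commands and wadm would take the union of those sets, the commutativity requirement falls out naturally from set union being order-independent. A tiny illustration with stand-in string commands:

use std::collections::HashSet;

fn main() {
    // Pretend these are the command sets produced by two different scalers
    let a: HashSet<&str> = HashSet::from(["start 2 actor replicas", "put linkdef"]);
    let b: HashSet<&str> = HashSet::from(["start httpserver provider"]);

    // Union in either order yields the same overall set of commands
    let ab: HashSet<_> = a.union(&b).collect();
    let ba: HashSet<_> = b.union(&a).collect();
    assert_eq!(ab, ba); // a+b == b+a
}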
For this first production version, we will only be supporting a NATS API. This is because pretty much all wasmCloud tooling already uses NATS, mostly transparently to an everyday user. We can take advantage of that same tooling to keep things simple this time around. If we were to add an HTTP API right now, we'd have to figure out authn/z and how we want to handle issuing tokens. So to keep it simple, we'll focus only on NATS to start.
One very important note here is that we definitely do want an HTTP API in the future. We know that many people will want to integrate with or extend WADM and an HTTP API is the easiest way to do that. But not for this first go around (well, second, but you get my point)
This is fairly self explanatory, but we want to store everything in one place now that NATS has KV support so we don't need any additional databases. Only the manifest data is stored in NATS. Lattice state is built up by querying all hosts on startup and then responding to events
A key requirement is that wadm can be run in high availability scenarios, which at its most basic means that multiple copies can be running.
I propose that this be done with leader election. Only one wadm process will ever be performing actions. All processes can gather the state of the lattice for purposes of fast failover, but only one performs actions. This is the simplest way to gain basic HA support; one possible mechanism is sketched below.
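As a purely illustrative sketch (not necessarily how wadm will do it), leader election can be built on a NATS KV bucket: whoever creates the leader key first wins, and the bucket's max_age acts as a lease so a dead leader eventually ages out. The bucket and key names here are hypothetical:

use std::time::Duration;

async fn try_become_leader(
    js: &async_nats::jetstream::Context,
    instance_id: &str,
) -> anyhow::Result<bool> {
    let store = js
        .create_key_value(async_nats::jetstream::kv::Config {
            bucket: "wadm_leader".into(),
            // Lease: the key expires if the leader stops refreshing it
            max_age: Duration::from_secs(10),
            ..Default::default()
        })
        .await?;
    // create() only succeeds if the key doesn't already exist,
    // i.e. no other instance currently holds the leadership lease
    Ok(store
        .create("leader", instance_id.to_owned().into())
        .await
        .is_ok())
}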
This is purely here as a design note and is not required for completing the work, but based on experiences with tools like Kubernetes and Nomad, extending with a custom scheduler is a common ask for large deployments. In code, adding a scaler will be as simple as implementing a Scaler trait.
For most people, however, I propose that custom "Scalers" be added via a wadm manifest. The application provider must have an actor that implements a new wasmcloud:wadm-scaler interface, but it can be as arbitrarily complex as desired. This manifest will have 2 special requirements:
wasmcloud.dev/scaler: <scaler-name>
Once again, this is not going to be implemented here, and will likely be another, smaller, proposal than this one
One key point to stress here is that wadm is meant to be the canonical scheduler for wasmCloud. This means that it is the general purpose scheduler that most people use when running wasmCloud, but no one is forced to use it. You can choose not to use it at all, or to write your own entirely custom scheduler.
To that end, I propose we publish the key functionality as a Rust crate. Much of the functionality could be used in many other applications besides a scheduler, and it can also be used to build your own scheduler if so desired. Basically, we want to avoid some of the problems that occurred in Kubernetes, where everything must go through the built-in scheduler.
Whew, we made it to the actual work! As part of thinking through these ideas, I started a branch that has implemented some of the basic building blocks like streaming events and leader election. When we actually begin work, it will be against a new wadm_0.4 branch in the main repo until we have completed the work. Please note that this is a general roadmap; I didn't want to try and give minute details here. Below is the basic overview of the needed work.
All of this work can be worked on in parallel. This is a bit shorter because we are about 40-50% there with the branch I started work on
This is a bit more difficult, as these things must be worked on roughly in order. This work is more spike-like, as it is spiking out the design of the Scaler trait and implementing the spreadscaler type (at least the number-of-replicas functionality). Scalers will need to handle manifests and state changes given by events (such as a host stopping). We want to start with implementing so we can see what kind of info is needed.
This is the "tying a bow on it" stage of work.
wash up should start wadm by default, with an optional flag to not use it.
It would be wonderful to include or update the diagrams found here: #15 such that they reflect the new design patterns within WADM.
Once the below are in place, we should cut a 0.4.0-alpha.1 tag for all of these so we can start on the stage 4 work:
Publish the wadm crate to crates.io