eigr / spawn

Spawn - Actor Mesh

License: Apache License 2.0

Elixir 93.62% Shell 0.57% Makefile 1.28% HTML 1.75% Rust 2.77%

actor-model actorsystem cloud-native concurrency distributed-systems durable-computing-model event-driven kubernetes microservices serverless sidecar spawn stateful-actors virtual-actors

spawn's People

sleipnir zblanco kianmeng venkatesh-sp crt-fork mruoss adolfont h3nrique joeljuca satanson johan--

spawn's Issues

Better Protocol Flow documentation

no process: the process is not alive or there's no process currently associated with the given name

Describe the bug

The Activators appear to bypass the actor reactivation logic, which specifies that an actor missing from Horde.Registry is reactivated on the node that holds the host function's registration.

This did not occur in any test of direct invocation of an actor between the host function and the proxy, so the problem seems to be specific to Activators.

2022-09-10 17:49:30.537 [[email protected]]:[pid=<0.936.0> ]:[info]: Dispatching message to Actors [%{actor: "robert", command: "setLanguage"}]
2022-09-10 17:49:30.537 [[email protected]]:[pid=<0.936.0> ]:[debug]:Decoded event: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.functions.spawn.codebeambr.messages.Request", value: "\n\ahaskell"}
2022-09-10 17:49:30.538 [[email protected]]:[pid=<0.5994.0> ]:[debug]:Request for Activate Actor [robert] using command [setLanguage] with payload: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.functions.spawn.codebeambr.messages.Request", value: "\n\ahaskell"}
2022-09-10 17:49:30.538 [[email protected]]:[pid=<21279.2653.0> ]:[debug]:Actor robert reactivated. PID: #PID<0.2644.0>
2022-09-10 17:49:30.540 [[email protected]]:[pid=<0.5994.0> ]:[error]:GenServer #PID<0.5994.0> terminating
** (stop) exited in: GenServer.call({:via, Horde.Registry, {Spawn.Cluster.Node.Registry, {Actors.Actor.Entity, "robert"}}}, {:invocation_request, %Eigr.Functions.Protocol.InvocationRequest{__unknown_fields__: [], actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "robert", persistent: false, snapshot_strategy: nil, state: nil}, async: false, command_name: "setLanguage", system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: nil}, value: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.functions.spawn.codebeambr.messages.Request", value: "\n\ahaskell"}}}, 30000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.13.1) lib/gen_server.ex:1019: GenServer.call/3 
    (actors 0.1.0) lib/actors.ex:112: Actors.invoke/4
    (activator 0.1.0) lib/activator/dispatcher/default_dispatcher.ex:48: anonymous fn/3 in Activator.Dispatcher.DefaultDispatcher.do_dispatch/3
    (flow 1.2.0) lib/flow/materialize.ex:758: anonymous fn/4 in Flow.Materialize.mapper/2
    (flow 1.2.0) lib/flow/materialize.ex:647: Flow.Materialize."-build_reducer/2-lists^foldl/2-0-"/3
    (flow 1.2.0) lib/flow/materialize.ex:647: anonymous fn/5 in Flow.Materialize.build_reducer/2
    (flow 1.2.0) lib/flow/map_reducer.ex:59: Flow.MapReducer.handle_events/3
    (gen_stage 1.1.2) lib/gen_stage.ex:2471: GenStage.consumer_dispatch/6
Last message: {:"$gen_consumer", {#PID<0.5993.0>, #Reference<0.3707952406.3645636609.244954>}, [%{actor: "robert", command: "setLanguage"}]}
State: {%{#Reference<0.3707952406.3645636609.244954> => nil}, %{done?: false, producers: %{#Reference<0.3707952406.3645636609.244954> => #PID<0.5993.0>}, trigger: #Function<2.15704490/3 in Flow.Window.Global.materialize/5>}, {0, 8}, [], #Function<6.70741094/4 in Flow.Materialize.build_reducer/2>}

Add Leader Election to Operator

Use Bonny Leader Election
Reference coryodaniel/bonny@ffb2fcd

Data too long for column 'data'

I'm not sure whether this is a bug or a consequence of a poorly designed protobuf combined with misuse of the system, but I decided to open it as a bug so we can investigate further.

Describe the bug
When a protobuf that stores an actor's state contains a list attribute and that list is updated constantly, we get an error when the proxy tries to update the actor's state. The exact error is this:

2022-09-08 18:06:54.846 [[email protected]]:[pid=<0.19013.0> ]:[debug]:QUERY ERROR db=7.1ms idle=273.1ms
INSERT INTO `events` (`actor`,`data`,`data_type`,`revision`,`tags`,`inserted_at`,`updated_at`) VALUES (?,?,?,?,?,?,?) ON DUPLICATE KEY UPDATE `revision` = ?, `tags` = ?, `data_type` = ?, `data` = ? ["robert", <<1, 10, 65, 69, 83, 46, 71, 67, 77, 46, 86, 49, 160, 117, 116, 23, 230, 62, 89, 174, 43, 164, 220, 41, 38, 105, 107, 194, 18, 4, 31, 236, 24, 18, 72, 66, 131, 218, 74, 202, 66, 226, 255, 14, 4, 130, 229, 128, ...>>, "type.googleapis.com/io.eigr.functions.spawn.codebeambr.state.actors.RobertState", 0, %{}, ~N[2022-09-08 21:06:54], ~N[2022-09-08 21:06:54], 0, %{}, "type.googleapis.com/io.eigr.functions.spawn.codebeambr.state.actors.RobertState", <<1, 10, 65, 69, 83, 46, 71, 67, 77, 46, 86, 49, 192, 178, 76, 108, 247, 29, 131, 250, 9, 65, 174, 36, 195, 167, 190, 139, 193, 146, 226, 58, 30, 126, 102, 120, 139, 198, 136, ...>>]
2022-09-08 18:06:54.846 [[email protected]]:[pid=<0.19013.0> ]:[error]:Task #PID<0.19013.0> started from #PID<0.1154.0> terminating
** (MyXQL.Error) (1406) Data too long for column 'data' at row 1
    (ecto_sql 3.8.3) lib/ecto/adapters/myxql.ex:287: Ecto.Adapters.MyXQL.insert/6
    (ecto 3.8.4) lib/ecto/repo/schema.ex:744: Ecto.Repo.Schema.apply/4
    (ecto 3.8.4) lib/ecto/repo/schema.ex:367: anonymous fn/15 in Ecto.Repo.Schema.do_insert/4
    (ecto 3.8.4) lib/ecto/repo/schema.ex:269: Ecto.Repo.Schema.insert!/4
    (statestores 0.1.0) lib/statestores/adapters/mysql.ex:19: Statestores.Adapters.MySQL.save/1
    (elixir 1.13.1) lib/task/supervised.ex:89: Task.Supervised.invoke_mfa/2
    (elixir 1.13.1) lib/task/supervised.ex:34: Task.Supervised.reply/4
    (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Function: #Function<0.41325355/0 in Actors.Actor.StateManager.save_async/3>
    Args: []

To Reproduce
Steps to reproduce the behavior:

Clone the repository https://github.com/sleipnir/spawn-code-beam-br-java.git
Uncomment the commandLineRunner method in the Main class and run the application.

Don't forget to configure the proxy and mysql database locally.

Expected behavior
After a few iterations the proxy will start logging the error mentioned above.

Additional context
In my understanding, the user should not keep an excessively long list in the actor state. This is bad practice: it consumes excessive memory and causes this kind of inconsistency in the actor state storage. Still, I think we need to investigate the problem and mitigate it if possible.
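One possible mitigation is to check the size of the payload before attempting the write, so that an oversized state is reported clearly instead of failing inside the database driver. The sketch below is a hypothetical helper, not Spawn's code. The 65,535-byte default assumes the data column is a MySQL BLOB, which this issue does not confirm, and the names check_state_size and StateTooLargeError are invented for illustration.

```python
# Assumed limit: 65,535 bytes is the maximum size of a MySQL BLOB column.
# The real type of the `data` column is not stated in the issue.
ASSUMED_MAX_DATA_BYTES = 65_535


class StateTooLargeError(ValueError):
    """Raised when a payload would not fit in the assumed column size."""


def check_state_size(payload: bytes, limit: int = ASSUMED_MAX_DATA_BYTES) -> int:
    """Return the payload size in bytes, or raise if it exceeds the limit."""
    size = len(payload)
    if size > limit:
        raise StateTooLargeError(
            f"state is {size} bytes, above the {limit}-byte limit"
        )
    return size
```

A caller would run this check on the bytes that are about to be stored (after any encryption, since that is what reaches the column) and decide whether to reject the update or log a warning.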

k8s manifest doesn't work

Describe the bug
When I try to install the Operator manifest, Kubernetes reports errors and the Operator does not install correctly.

To Reproduce

kubectl -n eigr-functions apply -f apps/operator/manifest.yaml
deployment.apps/eigr-functions configured
clusterrole.rbac.authorization.k8s.io/eigr-functions unchanged
serviceaccount/eigr-functions unchanged
clusterrolebinding.rbac.authorization.k8s.io/eigr-functions unchanged
unable to recognize "apps/operator/manifest.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
unable to recognize "apps/operator/manifest.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
unable to recognize "apps/operator/manifest.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"

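This happens because the Bonny library does not generate a correct manifest for newer versions of Kubernetes, as described in coryodaniel/bonny#117. The apiextensions.k8s.io/v1beta1 CustomResourceDefinition API was removed in Kubernetes 1.22, so CRDs must be declared with apiextensions.k8s.io/v1.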

Add Google PubSub Activator

There is a dependency conflict that I'm not sure can be solved while the repository is structured as an Elixir Umbrella project. This needs further investigation.

Standalone Activators

I've been thinking about the design of the Activators proposal, and although it is functional, I think it may not be great from a performance and resilience point of view.
Currently the Activators are started together with the Proxy sidecar, which causes some inconveniences:

The proxy must contain all Activator dependencies (brokers, authentication libraries, etc...)
The size of the Proxy container will consequently get bigger.
There will be an increase in the usage of resources like cpu and memory.

With these points in mind, I think the best design would be to run the Activators as a separate Pod (Deployment or DaemonSet). Since an Activator's only responsibility is to react to an event by invoking an Actor, nothing stops us from separating the Activator from the Proxy code and running Activators as autonomous Pods with their own life cycles.

wdyt? @eliasdarruda @marcellanz

High-demand requests fail to update state

Describe the bug

022-09-08 17:41:40.035 [[email protected]]:[pid=<0.982.0> ]:[debug]:Terminating actor zezinho with reason {:function_clause, [{Actors.Actor.Entity, :update_state, [%Actors.Actor.Entity.EntityState{actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil}, state_hash: nil, system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: %Eigr.Functions.Protocol.Actors.Registry{__unknown_fields__: [], actors: %{"zezinho" => %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %Eigr.Functions.Protocol.Actors.ActorDeactivateStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 50000}}}, name: "zezinho", persistent: true, snapshot_strategy: %Eigr.Functions.Protocol.Actors.ActorSnapshotStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 10000}}}, state: %Eigr.Functions.Protocol.Actors.ActorState{__unknown_fields__: [], state: nil, tags: %{}}}}}}}, %Eigr.Functions.Protocol.Context{__unknown_fields__: [], state: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.spawn.example.MyState", value: <<8, 168, 44>>}}], [file: 'lib/actors/actor/entity.ex', line: 512]}, {Actors.Actor.Entity, :handle_call, 3, [file: 'lib/actors/actor/entity.ex', line: 232]}, {:gen_server, :try_handle_call, 4, [file: 'gen_server.erl', line: 721]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 750]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 226]}]}
2022-09-08 17:41:40.035 [[email protected]]:[pid=<0.982.0> ]:[error]:GenServer {Spawn.Cluster.Node.Registry, {Actors.Actor.Entity, "zezinho"}} terminating
** (FunctionClauseError) no function clause matching in Actors.Actor.Entity.update_state/2
    (actors 0.1.0) lib/actors/actor/entity.ex:512: Actors.Actor.Entity.update_state(%Actors.Actor.Entity.EntityState{actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil}, state_hash: nil, system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: %Eigr.Functions.Protocol.Actors.Registry{__unknown_fields__: [], actors: %{"zezinho" => %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %Eigr.Functions.Protocol.Actors.ActorDeactivateStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 50000}}}, name: "zezinho", persistent: true, snapshot_strategy: %Eigr.Functions.Protocol.Actors.ActorSnapshotStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 10000}}}, state: %Eigr.Functions.Protocol.Actors.ActorState{__unknown_fields__: [], state: nil, tags: %{}}}}}}}, %Eigr.Functions.Protocol.Context{__unknown_fields__: [], state: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.spawn.example.MyState", value: <<8, 168, 44>>}})
    (actors 0.1.0) lib/actors/actor/entity.ex:232: Actors.Actor.Entity.handle_call/3
    (stdlib 3.17) gen_server.erl:721: :gen_server.try_handle_call/4
    (stdlib 3.17) gen_server.erl:750: :gen_server.handle_msg/6
    (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.980.0>): {:invocation_request, %Eigr.Functions.Protocol.InvocationRequest{__unknown_fields__: [], actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil}, async: false, command_name: "sum", system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: %Eigr.Functions.Protocol.Actors.Registry{__unknown_fields__: [], actors: %{"zezinho" => %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %Eigr.Functions.Protocol.Actors.ActorDeactivateStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 50000}}}, name: "zezinho", persistent: true, snapshot_strategy: %Eigr.Functions.Protocol.Actors.ActorSnapshotStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 10000}}}, state: %Eigr.Functions.Protocol.Actors.ActorState{__unknown_fields__: [], state: nil, tags: %{}}}}}}, value: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.spawn.example.MyBusinessMessage", value: <<8, 168, 44>>}}}
State: %Actors.Actor.Entity.EntityState{actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil}, state_hash: nil, system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: %Eigr.Functions.Protocol.Actors.Registry{__unknown_fields__: [], actors: %{"zezinho" => %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %Eigr.Functions.Protocol.Actors.ActorDeactivateStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 50000}}}, name: "zezinho", persistent: true, snapshot_strategy: %Eigr.Functions.Protocol.Actors.ActorSnapshotStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 10000}}}, state: %Eigr.Functions.Protocol.Actors.ActorState{__unknown_fields__: [], state: nil, tags: %{}}}}}}}
Client #PID<0.980.0> is alive

    (stdlib 3.17) gen.erl:233: :gen.do_call/4
    (elixir 1.13.1) lib/gen_server.ex:1027: GenServer.call/3
    (actors 0.1.0) lib/actors.ex:112: Actors.invoke/4
    (proxy 0.1.0) lib/proxy/routes/api.ex:35: anonymous fn/2 in Proxy.Routes.API.do_match/4
    (proxy 0.1.0) lib/plug/router.ex:246: anonymous fn/4 in Proxy.Routes.API.dispatch/2
    (telemetry 1.1.0) /home/sleipnir/workspaces/eigr/spawn/deps/telemetry/src/telemetry.erl:320: :telemetry.span/3
    (proxy 0.1.0) lib/plug/router.ex:242: Proxy.Routes.API.dispatch/2
    (proxy 0.1.0) lib/proxy/routes/api.ex:1: Proxy.Routes.API.plug_builder_call/2
2022-09-08 17:41:40.040 [[email protected]]:[pid=<0.980.0> ]:[error]:GenServer #PID<0.980.0> terminating
** (stop) exited in: GenServer.call({:via, Horde.Registry, {Spawn.Cluster.Node.Registry, {Actors.Actor.Entity, "zezinho"}}}, {:invocation_request, %Eigr.Functions.Protocol.InvocationRequest{__unknown_fields__: [], actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil}, async: false, command_name: "sum", system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: %Eigr.Functions.Protocol.Actors.Registry{__unknown_fields__: [], actors: %{"zezinho" => %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %Eigr.Functions.Protocol.Actors.ActorDeactivateStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 50000}}}, name: "zezinho", persistent: true, snapshot_strategy: %Eigr.Functions.Protocol.Actors.ActorSnapshotStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 10000}}}, state: %Eigr.Functions.Protocol.Actors.ActorState{__unknown_fields__: [], state: nil, tags: %{}}}}}}, value: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.spawn.example.MyBusinessMessage", value: <<8, 168, 44>>}}}, 30000)
    ** (EXIT) an exception was raised:
        ** (FunctionClauseError) no function clause matching in Actors.Actor.Entity.update_state/2
            (actors 0.1.0) lib/actors/actor/entity.ex:512: Actors.Actor.Entity.update_state(%Actors.Actor.Entity.EntityState{actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil}, state_hash: nil, system: %Eigr.Functions.Protocol.Actors.ActorSystem{__unknown_fields__: [], name: "spawn-system", registry: %Eigr.Functions.Protocol.Actors.Registry{__unknown_fields__: [], actors: %{"zezinho" => %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %Eigr.Functions.Protocol.Actors.ActorDeactivateStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 50000}}}, name: "zezinho", persistent: true, snapshot_strategy: %Eigr.Functions.Protocol.Actors.ActorSnapshotStrategy{__unknown_fields__: [], strategy: {:timeout, %Eigr.Functions.Protocol.Actors.TimeoutStrategy{__unknown_fields__: [], timeout: 10000}}}, state: %Eigr.Functions.Protocol.Actors.ActorState{__unknown_fields__: [], state: nil, tags: %{}}}}}}}, %Eigr.Functions.Protocol.Context{__unknown_fields__: [], state: %Google.Protobuf.Any{__unknown_fields__: [], type_url: "type.googleapis.com/io.eigr.spawn.example.MyState", value: <<8, 168, 44>>}})
            (actors 0.1.0) lib/actors/actor/entity.ex:232: Actors.Actor.Entity.handle_call/3
            (stdlib 3.17) gen_server.erl:721: :gen_server.try_handle_call/4
            (stdlib 3.17) gen_server.erl:750: :gen_server.handle_msg/6
            (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
    (elixir 1.13.1) lib/gen_server.ex:1030: GenServer.call/3
    (actors 0.1.0) lib/actors.ex:112: Actors.invoke/4
    (proxy 0.1.0) lib/proxy/routes/api.ex:35: anonymous fn/2 in Proxy.Routes.API.do_match/4
    (proxy 0.1.0) lib/plug/router.ex:246: anonymous fn/4 in Proxy.Routes.API.dispatch/2
    (telemetry 1.1.0) /home/sleipnir/workspaces/eigr/spawn/deps/telemetry/src/telemetry.erl:320: :telemetry.span/3
    (proxy 0.1.0) lib/plug/router.ex:242: Proxy.Routes.API.dispatch/2
    (proxy 0.1.0) lib/proxy/routes/api.ex:1: Proxy.Routes.API.plug_builder_call/2
    (plug 1.13.6) lib/plug.ex:168: Plug.forward/4
Last message: {:continue, :handle_connection} 
State: {%ThousandIsland.Socket{acceptor_id: "C6D0FDA72891", connection_id: "CF66FAD33985", read_timeout: :infinity, socket: #Port<0.114>, transport_module: ThousandIsland.Transports.TCP}, %{handler_module: Bandit.InitialHandler, plug: {Proxy.Router, []}, read_timeout: 60000}}

Create index on column tags

We need an index on the tags column of the events table.

This is related to this other issue #118

Keep in mind that a typical query would probably also filter by actor and system.

Add HTTP Activator

We need to translate requests between JSON and protobuf. We will probably have to map the protobuf types and process the Protobuf Descriptors, just as we did in the Massa project. Only then, I think, will we be able to map the attributes of a JSON payload to a valid Protobuf message: we have to compile the Protobuf Descriptors into valid Elixir modules at runtime, and then we can parse the JSON to protobuf and vice versa.

Example of converting protobuf json to elixir (pseudo code):

# Build a message, encode it to a JSON string (needs the Jason dependency), then decode it back.
actor = %Eigr.Functions.Protocol.Actors.Actor{name: "Joe"}
{:ok, json} = Protobuf.JSON.encode(actor)
{:ok, decoded_actor} = Protobuf.JSON.decode(json, Eigr.Functions.Protocol.Actors.Actor)

Links:
https://github.com/eigr/massa/blob/379828098fbbb7298de5ebd8ee1525c749555bbd/apps/massa_proxy/lib/massa_proxy/server/grpc_server.ex#L18
https://github.com/eigr/massa/blob/379828098fbbb7298de5ebd8ee1525c749555bbd/apps/massa_proxy/lib/massa_proxy/server/grpc_server.ex#L32
https://github.com/eigr/massa/blob/379828098fbbb7298de5ebd8ee1525c749555bbd/apps/massa_proxy/lib/massa_proxy/util.ex#L155

MongoDB Support

Statestore vault double-decoding encryption key!?

Describe the bug

I tried to run an ActorHost using the dice game example, but the application fails with an error:

[...]
2023-01-20 07:40:40.475 [[email protected]]:[pid=<0.2241.0> ]:[notice]:Application proxy exited: exited in: Proxy.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (RuntimeError) Failed to start proxy. {:error, {:shutdown, {:failed_to_start_child, Proxy.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.ProcessSupervisor, {:shutdown, {:failed_to_start_child, Statestores.Supervisor, {:shutdown, {:failed_to_start_child, Statestores.Vault, {:EXIT, {{:badmatch, {:error, {%ArgumentError{message: "non-alphabet character found: \".\" (byte 46)"}, [{Base, :bad_character!, 1, [file: 'lib/base.ex', line: 137]}, {Base, :"-decode64base!/2-lbc$^0/2-0-", 2, [file: 'lib/base.ex', line: 648]}, {Base, :decode64base!, 2, [file: 'lib/base.ex', line: 640]}, {Statestores.Vault, :init, 1, [file: 'lib/statestores/vault/vault.ex', line: 9]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}}}, [{Statestores.Vault, :start_link, 1, [file: 'lib/cloak/vault.ex', line: 184]}, {:supervisor, :do_start_child_i, 3, [file: 'supervisor.erl', line: 414]}, {:supervisor, :do_start_child, 2, [file: 'supervisor.erl', line: 400]}, {:supervisor, :"-start_children/2-fun-0-", 3, [file: 'supervisor.erl', line: 384]}, {:supervisor, :children_map, 4, [file: 'supervisor.erl', line: 1250]}, {:supervisor, :init_children, 2, [file: 'supervisor.erl', line: 350]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]}]}}}}}}}}}}}}}
            (proxy 0.5.0-rc.12) lib/proxy/application.ex:28: Proxy.Application.start/2
            (kernel 8.5) application_master.erl:293: :application_master.start_it_old/4
[os_mon] memory supervisor port (memsup): Erlang has closed
2023-01-20 07:40:40.489 [[email protected]]:[pid=<0.2298.0> ]:[notice]:    :alarm_handler: {:clear, :system_memory_high_watermark}
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,proxy,{bad_return,{{'Elixir.Proxy.Application',start,[normal,[]]},{'EXIT',{#{'__exception__' => true,'__struct__' => 'Elixir.RuntimeError',message => <<\"Failed to start proxy. {:error, {:shutdown, {:failed_to_start_child, Proxy.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.ProcessSupervisor, {:shutdown, {:failed_to_start_child, Statestores.Supervisor, {:shutdown, {:failed_to_start_child, Statestores.Vault, {:EXIT, {{:badmatch, {:error, {%ArgumentError{message: \\"non-alphabet character found: \\\\".\\\\" (byte 46)\\"}, [{Base, :bad_character!, 1, [file: 'lib/base.ex', line: 137]}, {Base, :\\"-decode64base!/2-lbc$^0/2-0-\\", 2, [file: 'lib/base.ex', line: 648]}, {Base, :decode64base!, 2, [file: 'lib/base.ex', line: 640]}, {Statestores.Vault, :init, 1, [file: 'lib/statestores/vault/vault.ex', line: 9]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]}}}, [{Statestores.Vault, :start_link, 1, [file: 'lib/cloak/vault.ex', line: 184]}, {:supervisor, :do_start_child_i, 3, [file: 'supervisor.erl', line: 414]}, {:supervisor, :do_start_child, 2, [file: 'supervisor.erl', line: 400]}, {:supervisor, :\\"-start_children/2-fun-0-\\", 3, [file: 'supervisor.erl', line: 384]}, {:supervisor, :children_map, 4, [file: 'supervisor.erl', line: 1250]}, {:supervisor, :init_children, 2, [file: 'supervisor.erl', line: 350]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]}]}}}}}}}}}}}}}\">>},[{'Elixir.Proxy.Application',start,2,[{file,\"lib/proxy/application.ex\"},{line,28}]},{application_master,start_it_old,4,[{file,\"application_master.erl\"},{line,293}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,proxy,{bad_return,{{'Elixir.Proxy.Application',start,[normal,[]]},{'EXIT',{#{'__exception__' => true,'__struct__' => 'Elixir.RuntimeError',message => <<"Failed to start proxy. {:error, {:shutdown, {:failed_to_start_child, Proxy.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.ProcessSupervisor, {:shutdown, {:failed_to_start_child, Statestores.Supervisor, {:shutdown, {:failed_to_start_child, Statestores.Vault, {:EXIT, {{:badmatch, {:error, {%ArgumentError{message: \"non-alphabet character found: \\\".\\\" (byte 46)\"}, [{Base, :bad_character!, 1, [file: 'lib/base.ex', line: 137]}, {Base, :\"-decode64base!/2-lbc$^0/2-0-\", 2, [file: 'lib/base.ex', line: 648]}, {Base, :decode64base!, 2, [file: 'lib/base.ex', line: 640]}, {Statestores.Vault, :init, 1, [file: 'lib/statestores/vault/vault.ex', line: 9]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 851]}, {:gen_se

Crash dump is being written to: erl_crash.dump...Segmentation fault (core dumped)

The interesting part seems to be this:

non-alphabet character found: "." (byte 46)

When looking at my statestore secret, it contains a . (dot).

Here is the current implementation that is failing:

spawn/spawn_statestores/statestores/lib/statestores/vault/vault.ex

Lines 4 to 21 in ec7236c

@impl GenServer
def init(config) do
  config =
    Keyword.put(config, :ciphers,
      default:
        {Cloak.Ciphers.AES.GCM, tag: "AES.GCM.V1", key: decode_env!("SPAWN_STATESTORE_KEY")},
      secondary:
        {Cloak.Ciphers.AES.CTR, tag: "AES.CTR.V1", key: decode_env!("SPAWN_STATESTORE_KEY")}
    )

  {:ok, config}
end

defp decode_env!(var) do
  var
  |> System.get_env()
  |> Base.decode64!()
end

I would guess the application already gets the decoded value and there is no need to decode it again.
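To illustrate that guess, here is a minimal Python sketch (not the Elixir implementation) of a key loader that decodes the value only when it is valid base64 and otherwise uses the raw bytes. The function name decode_key is hypothetical, and the fallback behavior is an assumption about what a fix might do, not something confirmed by the maintainers.

```python
import base64
import binascii


def decode_key(raw: str) -> bytes:
    """Return key bytes, decoding base64 only when the value is valid base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # such as the "." reported in the error above.
        return base64.b64decode(raw, validate=True)
    except (binascii.Error, ValueError):
        # Assumption: the value was already decoded, so use it as-is.
        return raw.encode("utf-8")
```

The fallback is ambiguous for a raw secret that happens to be valid base64, because it would be decoded a second time. A cleaner fix may be to agree on a single encoding at the point where the secret is mounted.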

To Reproduce

apiVersion: spawn-eigr.io/v1
kind: ActorHost
metadata:
  name: dice
  labels:
    spawn-eigr.io.actor-system: yggdrasil
spec:
  host:
    image: eigr/dice-game-example:0.1.1
    ports:
      - name: http
        containerPort: 8080
        protocol: TCP

Expected behavior

No error 😅

Kubernetes:

Version v1.25.2
Provider: Bare metal

Spawn Operator:

Version 0.5.0-rc.12

Best candidate for primary key

Looking at the models, I have a suggestion.
In the actors table, if Ecto supports a primary key other than a sequential number, the best candidate is the actor field, which holds the actor's name. With this change, when the state store looks up an actor by name, the extra lookup step is eliminated.

Code generation

Although we rely on Protobuf for serializable data types and our API development experience is contract-first, we do little to generate code for the user. If we could generate code in different languages from protobuf declarations, including the rpc definitions, we would give our developers a better experience.

A positive side effect of this approach is that, by declaring service methods via protobuf's service rpc, we could generate the actor's methods in each supported language and guarantee that their names are consistent across languages. Currently we invoke actors by passing the method name as an argument, but developers name methods and functions differently in each language: a Python developer would use the underscore pattern, while a Java developer would use camel case.

If the methods were generated from the protobuf rpc source then the command names of the actors would be declared consistently between actors regardless of the way adopted to declare their method or function in the different sdks.
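A generator could derive each language's method name from the single rpc name declared in the protobuf service. The sketch below assumes rpc names are written in simple PascalCase (for example SetLanguage) and does not handle acronyms such as HTTP. Both function names are hypothetical.

```python
import re


def to_snake_case(rpc_name: str) -> str:
    """Convert a PascalCase rpc name to the snake_case style common in Python."""
    # Insert "_" before every uppercase letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", rpc_name).lower()


def to_camel_case(rpc_name: str) -> str:
    """Convert a PascalCase rpc name to the camelCase style common in Java."""
    return rpc_name[:1].lower() + rpc_name[1:]
```

With this scheme the command name sent over the wire stays the rpc name itself, and each SDK only changes how the generated method is spelled locally.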

Sqlite3 DB doesn't run migrations

When Spawn is started with sqlite3 configured as the statestore, it does not create the events table, yet the migrations are reported as already migrated and up.

To use sqlite3 today you need to create the events table manually. We need to fix this so that Spawn creates it on startup.

Add an initialization callback to actors

The idea is to allow users to be able to register functions that should be called when the Process initializes for the first time, allowing the user to execute some code and initialize the Actor state.

Initially I didn't consider this because there are some risks in this type of functionality, such as the user overwriting a previous valid state with an invalid or empty state after a POD reboot.

But this can be circumvented using two strategies:

Always send the current state of the Actor in the Proxy as a parameter to the user function in order to allow the user to perform some type of validation.
The function's return value can carry a flag with the state update policy, indicating whether the state should be overwritten or ignored. That is, if the Proxy already has a state saved in persistent storage and the flag indicates that the state should be overwritten, then the proxy should update the state value every time. Otherwise it just ignores the state value returned by the user function.

Below is an example of a user function that could be created for this functionality:

@ActorEntity(name = "Joe", stateType = SomeState.class)
public class JoeActor {

    // This method is called once after the actor is registered. 
    // Subsequent activations do not trigger this callback
    @Init
    public Value init(ActorContext<SomeState> context) {
        // some code here
        ....
        return Value.ActorValue.at()
                .state(SomeState.getDefaultInstance())
                // Clear the state every time the function restarts
                .updateStrategy(PersistenceStrategy.OVERRIDE) // or default PersistenceStrategy.IGNORE
                .reply();
    }

    @Command
    public Value someMethod(SomeType data, ActorContext<SomeState> context) {
        // some code here
        ....
        return Value.ActorValue.at()
                // ....
                .reply();
    }
}
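On the proxy side, the decision described by the two strategies could look roughly like the following sketch. It is written in Python for brevity and is not Spawn code. The names PersistenceStrategy and resolve_initial_state are illustrative, and using the returned state when nothing is persisted yet is an assumption that the proposal does not spell out.

```python
from enum import Enum
from typing import Optional


class PersistenceStrategy(Enum):
    OVERRIDE = "override"
    IGNORE = "ignore"


def resolve_initial_state(
    persisted: Optional[bytes],
    returned: bytes,
    strategy: PersistenceStrategy,
) -> bytes:
    """Pick the state the actor starts with after the init callback runs."""
    if persisted is None:
        # Assumption: with nothing stored yet, the init result is always used.
        return returned
    if strategy is PersistenceStrategy.OVERRIDE:
        # The user explicitly asked to replace the stored state.
        return returned
    # IGNORE: keep the stored state and discard the init result.
    return persisted
```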

What do you think of this feature @marcellanz?

Would it be better to rename defact to action in the Elixir SDK?

I am not sure the macro defact is self-explanatory enough. In Spawn, an Action is something the Actor executes when it receives an event or when it deals with its own state. It is analogous to a function, but closer to the terminology of the Actor Model.

Would it be worth making this more explicit by renaming it to action?

ping @eliasdarruda

Use CloudEvents protobuf format on Eventing Activators

Our protobuf-based implementation makes it difficult for generic components such as the activators and the proxy to decode a JSON format into a protobuf format efficiently. I suggest supporting the CloudEvents binary format for consuming events from queues.

I also think it is feasible to let the client choose between the CloudEvents format and the currently supported base64-based format (bytes of the protobuf type must be encoded as base64). This gives our activators some more flexibility.

https://github.com/alanconway/cloudevents-spec/blob/master/protobuf-format.md
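A minimal sketch of how an activator might let the client choose the payload format. PayloadFormat and PayloadDecoder are hypothetical names, not part of Spawn. The sketch also assumes that a CloudEvents protobuf body is already binary and can be handed to a protobuf parser as is, while the base64 format must be decoded first.

```java
import java.util.Base64;

// Hypothetical formats an activator could accept.
enum PayloadFormat { BASE64_PROTOBUF, CLOUDEVENTS_PROTOBUF }

// Hypothetical helper that normalizes an incoming body to raw protobuf bytes.
final class PayloadDecoder {

    static byte[] toProtobufBytes(byte[] body, PayloadFormat format) {
        switch (format) {
            case BASE64_PROTOBUF:
                // Currently supported format: protobuf bytes encoded as base64.
                return Base64.getDecoder().decode(body);
            case CLOUDEVENTS_PROTOBUF:
                // Assumption: the body is already a binary protobuf message.
                return body;
            default:
                throw new IllegalArgumentException("unsupported format: " + format);
        }
    }
}
```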

PurgeTimeout as an optional actor option

We need to add an option that purges all data related to the actor: after a configured time it cleans the actor's state and deletes its entry from the Statestore.

It can be called PurgeTimeout, DestroyTimeout or something like that, open to suggestions on the naming pattern.
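As a rough sketch of the option, under assumptions: PurgePolicy is a hypothetical name, the timeout is optional (absent means the actor is never purged), and the clock is assumed to start at the actor's last activity, which the proposal does not specify.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical policy object for the proposed PurgeTimeout option.
final class PurgePolicy {
    // Empty means the option is not set and the actor's data is never purged.
    private final Optional<Duration> purgeTimeout;

    PurgePolicy(Optional<Duration> purgeTimeout) {
        this.purgeTimeout = purgeTimeout;
    }

    // True once the configured time has elapsed since the last activity.
    boolean shouldPurge(Instant lastActivity, Instant now) {
        return purgeTimeout
                .map(timeout -> !now.isBefore(lastActivity.plus(timeout)))
                .orElse(false);
    }
}
```

When shouldPurge returns true, the proxy would clean the state and delete the Statestore entry. Whether it should also de-register the actor from all hosts is the open question below.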

Should it de-register from all hosts too? @sleipnir WDYT?

More Tests

We need to add more tests, mainly in the protocol layer and for consistency checks.

Add Kafka Activator

Remove unnecessary suffix in kubernetes resources.

See #123 for context.

Add Cassandra support

[BUG] Spawn fails when used with Phoenix and OTP 25

Describe the bug
When using Phoenix on OTP 25, the errors below appear in the logs:

16:24:19.842 [notice] Application tls_certificate_check exited: :tls_certificate_check_app.start(:normal, []) returned an error: shutdown: failed to start child: :tls_certificate_check_shared_state
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, :enoent}
            (public_key 1.13) pubkey_os_cacerts.erl:38: :pubkey_os_cacerts.get/0
            (tls_certificate_check 1.17.3) /app/deps/tls_certificate_check/src/tls_certificate_check_shared_state.erl:362: :tls_certificate_check_shared_state.maybe_load_authorities_trusted_by_otp/3
            (tls_certificate_check 1.17.3) /app/deps/tls_certificate_check/src/tls_certificate_check_shared_state.erl:335: :tls_certificate_check_shared_state.new_shared_state/2
            (tls_certificate_check 1.17.3) /app/deps/tls_certificate_check/src/tls_certificate_check_shared_state.erl:262: :tls_certificate_check_shared_state.handle_shared_state_initialization/2
            (stdlib 4.0.1) gen_server.erl:1120: :gen_server.try_dispatch/4
            (stdlib 4.0.1) gen_server.erl:1197: :gen_server.handle_msg/6
            (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

To Reproduce
Build docker image with args below:

ARG ELIXIR_VERSION=1.14.0
ARG OTP_VERSION=25.0.4
ARG DEBIAN_VERSION=bullseye-20220801-slim

Expected behavior
No Errors.

Kubernetes (please complete the following information):

Version 1.25.x
Provider: k3d

Spawn Operator (please complete the following information):

Version 0.5.x

Spawn Proxy (please complete the following information):

Version 0.5.x

Use alternative Distribution Protocol

To achieve our goal of reaching the edge with a local-first approach, we need to offer an alternative to the old Erlang Distribution Protocol. Environments that are truly distributed, over a fabric that deviates from the basic requirements of a datacenter or a public or private cloud, require a protocol that is easily routable over the Internet or over unreliable connections.

I'm inclined to test quic_dist as an alternative to Erlang Distribution, but if anyone has another proposal, feel free to present it.

For this to work, it is very important that the alternative protocol chosen is easily pluggable with Elixir and Libcluster.

Column updated_at not being updated

When the state is updated in the database, the updated_at column is not set to the current date and time.

Rust SDK Support

Spawn Elixir SDK call abstract actors by name

Today it is not possible to call abstract actors directly by name, because the spawn function only accepts the Actor module.

Improvements to the Spawn concurrency model

Spawn actors are created in advance and map directly to Elixir processes. These processes are implemented with OTP's GenServer abstraction and are started for the first time when a registration request from the User Function is received. Internally, the Actors' GenServer processes are started via a DynamicSupervisor.

After the user function and its actors are registered, these processes start a first Activation/Deactivation round: the process starts and waits for work requests; if nothing arrives within the configured deactivation period, the process is terminated, its current state is saved in persistent storage, and it stays dormant until a new invocation of this Actor/Process takes place.

Spawn processes are very efficient and fast; in fact, an invocation of a Spawn actor takes less than 1ms on average for the round trip. But the fact that Spawn actors are:

Started ahead of command invocation requests.
Named.
Keyed by name in persistent storage.

prevents users from creating their Actors dynamically, which is good or bad depending on the use case.
Good because it forces users to think about and design their programs carefully.
Bad because it prevents the creation of instances of the same actor with different individual states, which in turn reduces the scalability of certain types of systems.
That is, Spawn currently follows the Highlander philosophy, "There can be only one", for each type of Actor.

Technically, it would be possible to make the behavior of Actors more flexible so that one type of actor can be modeled with many instances. This could be achieved by adding a partitioning key to the Actor and to the persistent storage schema.

To give a clear example, let's do a mental exercise:

Suppose the user defines an Actor that represents an order delivery person. Let's call this fictional Actor CourierActor. Now suppose that our delivery system has a pool of Couriers and that, when there is a delivery request, the system picks which of these couriers should make the delivery, based on criteria that do not matter for this mental exercise. The courier must be able to accept or refuse the order, and also to report whether it is available to make deliveries at a given time. If we were to model this system with Spawn using the SDK for Spring Boot, the Actor code might look like this:

@ActorEntity(name = "Courier", stateType = Order.class)
public class CourierActor {

    @Command
    public Value shift(Shift shift, ActorContext<Order> context) {
        // some code here
        ....
        return Value.ActorValue.at()
                .value(onValue)
                .state(updateState(newState))
                .reply();
    }

    @Command
    public Value offer(Offer offer, ActorContext<Order> context) {
        // some code here
        Order order = offer.getOrder();
        // other validations
        ....
        // Then accept or reject the order
        return Value.ActorValue.at()
                .value(offerResult)
                .state(updateState(order))
                .reply();
    }

    .......
}

An invocation for this Actor could be done as follows:

public class DeliveryEngine {

  @Autowired
  SpawnSystem actorSystem;

  public void offer(Delivery delivery) {
    // Somehow get the courier pool,
    // decide on one of them to make the delivery,
    // assemble the delivery offer and make the call
    Offer offer = .....
    actorSystem.invoke("Courier", "offer", offer, OfferResult.class);
  }

}

The problem with modeling this type of system with Spawn today is that we can only invoke one actor that represents all Couriers, when the most natural and obviously most scalable approach would be to tell the Spawn system to create a CourierActor instance for every real Courier logged into the system.

Finally, what I propose is that the user can, when desired, change from this:

actorSystem.invoke("Courier", "offer", offer, OfferResult.class);

To this:

actorSystem.invoke("Courier", "Joe", "offer", offer, OfferResult.class);

In other words, it should be possible to use a control/partition key to create and identify a specific instance of a specific type of Actor.

To be able to do this we would have to change a fair amount of the Spawn code, which would be:

Add the control/partition key to the initial state of the ActorEntity.
Create a mechanism that indicates to the registration step whether the actor to be registered is of type single or multi instance.
Change the database schema to include this control key.
Change the state lookup query to account for this control key.
Change the registration code so that:
- If the registered actor allows multi-instance treatment, the registration step cannot start the GenServer process early, as is done when the Actor is a single instance. At that moment the value of the control key is not known, so an Actor started without it could not be found by the lookup step when it is invoked.
- Add a default key (I suggest the value none) for when it is desirable to keep the default behavior, which is the current behavior of Spawn, i.e. single instance.

This change can have unwanted side effects for the user. For example, if the user sends a null or incorrect value as the control/partition key during an invocation, the Actor will not be located. Or the user may try to invoke an Actor with a control key when that Actor has not been registered as a multi-instance type, in which case the proxy will return an error. These are just some examples I can imagine of problems that can occur once we create this functionality.
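The lookup by control/partition key and the failure cases above could be sketched like this. It is an illustration only: ActorKey and ActorRegistry are hypothetical names, the default key none comes from the suggestion above, and rejecting bad invocations with IllegalArgumentException is an assumption standing in for the proxy's error response.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lookup key: actor type plus control/partition key.
final class ActorKey {
    static final String DEFAULT_KEY = "none"; // default suggested in the proposal

    final String actorType;
    final String partitionKey;

    ActorKey(String actorType, String partitionKey) {
        this.actorType = Objects.requireNonNull(actorType);
        this.partitionKey = Objects.requireNonNull(partitionKey);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ActorKey)) return false;
        ActorKey other = (ActorKey) o;
        return actorType.equals(other.actorType) && partitionKey.equals(other.partitionKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(actorType, partitionKey);
    }
}

// Hypothetical registry that remembers whether a type is single or multi instance.
final class ActorRegistry {
    private final Map<String, Boolean> multiInstanceByType = new ConcurrentHashMap<>();

    void register(String actorType, boolean multiInstance) {
        multiInstanceByType.put(actorType, multiInstance);
    }

    // Resolves the key used to look up (or lazily start) the actor process.
    ActorKey resolve(String actorType, String partitionKey) {
        Boolean multi = multiInstanceByType.get(actorType);
        if (multi == null) {
            throw new IllegalArgumentException("actor type not registered: " + actorType);
        }
        if (!multi) {
            // Single instance: a real control key is an error, the default key is used.
            if (partitionKey != null && !ActorKey.DEFAULT_KEY.equals(partitionKey)) {
                throw new IllegalArgumentException(actorType + " is not a multi-instance actor");
            }
            return new ActorKey(actorType, ActorKey.DEFAULT_KEY);
        }
        // Multi instance: the control key is mandatory.
        if (partitionKey == null || partitionKey.isEmpty()) {
            throw new IllegalArgumentException("missing control key for " + actorType);
        }
        return new ActorKey(actorType, partitionKey);
    }
}
```

Because single-instance actors always resolve to the default key, existing invocations without a key would keep working unchanged.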

Could @marcellanz or @wesleimp imagine any more? Or rather, what do you think of all this?

Integration with external/remote Services

Discussed in #92

Originally posted by sleipnir October 26, 2022
One of the mainstays of what we do with Eigr, inherited from the Cloudstate days and discussed later on Massa (a discussion was opened but the ideas were never properly recorded here, and Marcel and I discussed it a few times), is the idea that we should abstract the infrastructure, and by infrastructure I don't just mean the state, away from the developer, so that they can think in terms of the Business Domain and not so much in terms of infrastructure code. We know, of course, that this is an almost utopian goal because, no matter what we do, the developer will always want to do more or do it another way. However, I think it is healthy to give them the tools to at least greatly reduce their need to touch the infrastructure.

This is why I present the idea of Actors acting as a proxy for remote services or APIs, so that the developer keeps using the current way of invoking actors to also access remote services.

We could call these special actors ProxyActors or BindingActors. They could be registered in the traditional way that actors are registered, but we would create a new category of Commands (today we have Commands and TimerCommands) to map remote services to Spawn actions. It would probably also be necessary to add attributes to the Actor's ActorSettings so that the proxy knows that this actor only acts as a proxy for a service and can perform the necessary request based on the configuration of the given commands.

I think each SDK could have good ways of declaring this new Actor type. In Java they would probably just be an annotated interface, in Elixir they could be behaviours, and so on.

One thing we must take into account is which formats we will support. At first I find it easy to support HTTP+JSON, since protobuf types can easily be transformed into JSON types. gRPC would be more complicated, as we would have to dynamically generate a gRPC client, and I think this is quite complicated to do at first. Once the user defines the types in the usual Protobuf-based way, I think we could transcode them to JSON types easily.

It would not be possible for actors of this type to be persistent, but we can certainly investigate that option as well. Maybe it's not smart to mix these concepts, maybe it is; the fact is that I don't know the best approach for this case.

Wdyt?

Add Mnesia native Statestore

Related with #88

More useful examples

Bakeware temp directory not found

Describe the bug

I wanted to have a look into spawn and deployed the manifest to my kubernetes cluster, but the pod fails to start:

$ kubectl logs -f -n eigr spawn-operator-679b5dfbc9-bbz9f
bakeware: Error creating directory /app/.cache/bakeware/.tmp/gGaFkn??: No such file or directory
bakeware: Unrecoverable validation error

$ kubectl describe pod -n eigr-functions spawn-operator-679b5dfbc9-bbz9f
[...]

Containers:
  spawn-operator:
    Container ID:   containerd://2071d4906138cde79a728a55e393ea1e852fb27261d7cf8145d80e5d8eaa0f73
    Image:          eigr/spawn-operator:0.5.0-rc.7
    Image ID:       docker.io/eigr/spawn-operator@sha256:4788f13dc1112bc6bf07d8dc156b833e67652dc7211c7d56b9d91ec168ed9741
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 16 Jan 2023 19:07:20 +0100
      Finished:     Mon, 16 Jan 2023 19:07:20 +0100
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:     200m
      memory:  200Mi
    Environment:
      MIX_ENV:                    prod
      BONNY_POD_NAME:             spawn-operator-679b5dfbc9-bbz9f (v1:metadata.name)
      BONNY_POD_NAMESPACE:        eigr-functions (v1:metadata.namespace)
      BONNY_POD_IP:                (v1:status.podIP)
      BONNY_POD_SERVICE_ACCOUNT:   (v1:spec.serviceAccountName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wslrx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-wslrx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  17m                   default-scheduler  Successfully assigned eigr-functions/spawn-operator-679b5dfbc9-bbz9f to nuccy
  Normal   Pulling    17m                   kubelet            Pulling image "eigr/spawn-operator:0.5.0-rc.7"
  Normal   Pulled     17m                   kubelet            Successfully pulled image "eigr/spawn-operator:0.5.0-rc.7" in 4.894598973s
  Normal   Created    16m (x5 over 17m)     kubelet            Created container spawn-operator
  Normal   Started    16m (x5 over 17m)     kubelet            Started container spawn-operator
  Normal   Pulled     16m (x4 over 17m)     kubelet            Container image "eigr/spawn-operator:0.5.0-rc.7" already present on machine
  Warning  BackOff    2m40s (x72 over 17m)  kubelet            Back-off restarting failed container

To Reproduce

Just run the default installation steps:

kubectl create ns eigr-functions && \
  curl -L https://github.com/eigr/spawn/releases/download/v0.5.0-rc.7/manifest.yaml | kubectl apply -f -

Wait a few seconds and check the logs of the spawn-operator pod.

Expected behavior

Pod starts without a problem.

Additional context

Kubernetes: v1.25.2
Spawn: v0.5.0-rc.7

Actors are started with empty original parameters

Describe the bug
Example:

%Actors.Actor.Entity.EntityState{
  actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: nil, name: "zezinho", persistent: false, snapshot_strategy: nil, state: nil},

This causes incorrect behavior in Actors and cascading errors in clients during invocations.

The correct state would be:

%Actors.Actor.Entity.EntityState{
  actor: %Eigr.Functions.Protocol.Actors.Actor{__unknown_fields__: [], deactivate_strategy: %SomeStrategyHere{}, name: "zezinho", persistent: true, snapshot_strategy: %SomeStrategyHere{}, state: nil}

Save Actor metadata via tags attribute

The Statestore allows actors to have user-defined metadata.
This metadata must be saved in the tags attribute of Statestores.Schemas.Event, but today we don't have a mechanism for filling this attribute.

This is important to allow aggregations and lookups by attributes other than the actor name.

Integration with ElectricSQL

Add Vaxine Statemanager support

Change message InvocationRequest using just the Actor name instead of the entire Actor entity

Currently, it is necessary to pass the entire Actor Entity in the invocation message, but only the Actor and ActorSystem names are needed to look up the Actor Process.
Removing the entity would save a few bytes and make the protocol less confusing.

Add gRPC Activator

It would be nice if elixir-grpc/grpc#254 was implemented before we start this item.

We need to implement a gRPC contract dynamically, just like we did in the Massa project. For the client, this means compiling the proto files at runtime via FileDescriptor.

Example of a round trip between a protobuf struct and JSON in Elixir (pseudo code):

actor = Eigr.Functions.Protocol.Actors.Actor.new(name: "Joe")
# encode/1 returns a JSON string (it needs the optional Jason dependency)
{:ok, json} = Protobuf.JSON.encode(actor)
# decode/2 parses the JSON string back into the given message module
{:ok, decoded_actor} = Protobuf.JSON.decode(json, Eigr.Functions.Protocol.Actors.Actor)

C# SDK Support

@joaomannes can you help with this?

Python SDK Support

Wrong service name for actor system!?

Describe the bug

I don't know whether this is a bug or I am doing something wrong, but at least it feels/looks wrong to me 🤔

When I create an actor system named example I get a service named system-spawn-system-svc:

apiVersion: v1
kind: Service
metadata:
  name: system-spawn-system-svc
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: epmd
    port: 4369
    protocol: TCP
    targetPort: epmd
  selector:
    actor-system: spawn-system
  sessionAffinity: None
  type: ClusterIP

To Reproduce

apiVersion: eigr-spawn.io/v1
kind: ActorSystem
metadata:
  name: example
spec:
  # ...

Expected behavior

Service is named system-example-svc and uses actor-system: example as selector.

apiVersion: v1
kind: Service
metadata:
  name: system-example-svc
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: epmd
    port: 4369
    protocol: TCP
    targetPort: epmd
  selector:
    actor-system: example
  sessionAffinity: None
  type: ClusterIP

Btw.: It feels redundant to name a service "*-svc"

Kubernetes:

Version v1.25.2
Provider: Bare metal 😉

Spawn Operator:

Version eigr/spawn-operator:0.5.0-rc.9

Kubernetes failed to start Operator

Describe the bug
When we try to start the Operator container in Kubernetes (minikube or kind), it fails to start because it cannot write the .erlang.cookie file.

=ERROR REPORT==== 28-Aug-2022::14:24:56.201865 ===
Failed to create cookie file '/.erlang.cookie': eacces
=SUPERVISOR REPORT==== 28-Aug-2022::14:24:56.202565 ===
    supervisor: {local,net_sup}
    errorContext: start_error
    reason: {"Failed to create cookie file '/.erlang.cookie': eacces",
             [{auth,init_no_setcookie,0,[{file,"auth.erl"},{line,293}]},
              {auth,init_setcookie,3,[{file,"auth.erl"},{line,345}]},
              {auth,init,1,[{file,"auth.erl"},{line,144}]},
              {gen_server,init_it,2,[{file,"gen_server.erl"},{line,423}]},
              {gen_server,init_it,6,[{file,"gen_server.erl"},{line,390}]},
              {proc_lib,init_p_do_apply,3,
                        [{file,"proc_lib.erl"},{line,226}]}]}
    offender: [{pid,undefined},
               {id,auth},
               {mfargs,{auth,start_link,[]}},
               {restart_type,permanent},
               {significant,false},
               {shutdown,2000},
               {child_type,worker}]

To reproduce:

On branch feat/new-k8s-crds

make create-minikube-cluster make build-operator-image create-k8s-namespace && kubectl apply -f apps/operator/manifest.yaml

Then see the POD logs

[BUG] Gossip Strategy fails

Describe the bug

When starting the Proxy with the variable PROXY_CLUSTER_STRATEGY=gossip the system fails to start with the error below:

Could not start application proxy: exited in: Proxy.Application.start(:normal, [])
     ** (EXIT) an exception was raised:
         ** (RuntimeError) Failed to start Proxy Application: {:error, {:shutdown, {:failed_to_start_child, Proxy.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.Supervisor, {:shutdown, {:failed_to_start_child, Sidecar.ProcessSupervisor , {:shutdown, {:failed_to_start_child, :"Spawn.Cluster", {:shutdown, {:failed_to_start_child, Cluster.Supervisor, {:shutdown, {:failed_to_start_child, :proxy, {{:badmatch, {:error, :eaddrinuse }}, [{Cluster.Strategy.Gossip, :init, 1, [file: 'lib/strategy/gossip.ex', line: 126]}, {:gen_server, :init_it, 2, [file: 'gen_server. erl', line: 851]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 814]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl' , line: 240]}]}}}}}}}}}}}}}}
             (proxy 0.0.0-local.dev) lib/proxy/application.ex:28: Proxy.Application.start/2
             (kernel 8.5) application_master.erl:293: :application_master.start_it_old/4

To Reproduce
Steps to reproduce the behavior:

Open terminal on spawn_proxy/proxy folder and use this command to init the proxy: MIX_ENV=dev PROXY_CLUSTER_STRATEGY=gossip PROXY_DATABASE_TYPE=mysql PROXY_DATABASE_POOL_SIZE=10 SPAWN_STATESTORE_KEY=3Jnb0hZiHIzHTOih7t2cTEPEpY98Tu1wvQkPfq/XwqE= iex --name [email protected] -S mix
Open another terminal on spawn_proxy/proxy folder and use this command to init the proxy: MIX_ENV=dev PROXY_CLUSTER_STRATEGY=gossip PROXY_DATABASE_TYPE=mysql PROXY_DATABASE_POOL_SIZE=10 SPAWN_STATESTORE_KEY=3Jnb0hZiHIzHTOih7t2cTEPEpY98Tu1wvQkPfq/XwqE= iex --name [email protected] -S mix
See error

Expected behavior
Proxy starts without errors when using the gossip strategy

Screenshots

Spawn Operator (please complete the following information):

Version 0.5.x

Spawn Proxy (please complete the following information):

Version 0.5.x

Additional context

I opened this issue for historical record purposes only, as I have already submitted a PR to the libcluster project that fixes it. I'm just waiting for the PR to be merged into the main branch and for a new release containing the fix, so that this bug can be closed in Spawn.

For more details see here and here

Default Actors Methods

I propose creating default methods for actors.

My idea is that all actors automatically get methods to read values from the actor's state. That way, users who only need methods to get and/or set the state value don't have to code them; they would already exist by default.
This feature could be optional, with such methods defined or not based on configuration, so that nothing needs to change in the SDKs.

The method name could be get_state, and if the user calls it as GetState or getState the system should accept those names as valid as well.
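The name matching could be sketched as below, as an illustration only. CommandNames is a hypothetical helper, and the sketch assumes the proxy normalizes incoming command names to snake_case before comparing them with get_state.

```java
// Hypothetical helper that treats get_state, getState and GetState as the same command.
final class CommandNames {

    // Converts camelCase or PascalCase to snake_case; snake_case input is unchanged.
    static String canonical(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word, except at the beginning or right after an underscore.
                if (i > 0 && name.charAt(i - 1) != '_') {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static boolean isDefaultGetter(String commandName) {
        return "get_state".equals(canonical(commandName));
    }
}
```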

I think it's dangerous to provide a method to set an actor's state, but this might be another option if the user really wants such a method.

Integration with external/remote Services

One of the mainstays of what we do with Eigr, inherited from the Cloudstate days and discussed later on Massa (a discussion was opened but the ideas were never properly recorded here, and Marcel and I discussed it a few times), is the idea that we should abstract the infrastructure, and by infrastructure I don't just mean the state, away from the developer, so that they can think in terms of the Business Domain and not so much in terms of infrastructure code. We know, of course, that this is an almost utopian goal because, no matter what we do, the developer will always want to do more or do it another way. However, I think it is healthy to give them the tools to at least greatly reduce their need to touch the infrastructure.

This is why I present the idea of Actors acting as a proxy for remote services or APIs, so that the developer keeps using the current way of invoking actors to also access remote services.

I think each SDK could have good ways of declaring this new Actor type. In Java they would probably just be an annotated interface, in Elixir they could be behaviours, and so on.

Wdyt?

CLI for common tasks

Create a CLI to bootstrap projects and perhaps help with the deployment of users' Spawn projects.

Error trying to create 5000 actors

When I try to create 5000 actors, the proxy is terminated with the error "Killed".
With the help of @sleipnir I found this error:
Out of memory: Killed process 21551 (beam.smp) total-vm:29151740kB, anon-rss:16088396kB, file-rss:0kB, shmem-rss:66344kB, UID:1000 pgtables:48292kB oom_score_adj:0

Before the error, the proxy had created around 1000 actors.

Hex.pm lists 0.1.0 as latest version of the SDK

Describe the bug

The latest version of spawn_sdk on hex.pm is 0.1.0, although newer versions have been published 🤔

To Reproduce

Add spawn_sdk to your mix.exs like the following and run mix deps.get:

{:spawn_sdk, ">= 0.0.0"}

Expected behavior

Latest version (currently: 0.5.0-rc.13) should be installed.

Additional context

Using {:spawn_statestores_postgres, ">= 0.0.0"} works fine. For that package, the latest version is 0.5.0-rc.13 on hex.pm.

@impl GenServer
def init(config) do
  config =
    Keyword.put(config, :ciphers,
      default:
        {Cloak.Ciphers.AES.GCM, tag: "AES.GCM.V1", key: decode_env!("SPAWN_STATESTORE_KEY")},
      secondary:
        {Cloak.Ciphers.AES.CTR, tag: "AES.CTR.V1", key: decode_env!("SPAWN_STATESTORE_KEY")}
    )

  {:ok, config}
end

defp decode_env!(var) do
  var
  |> System.get_env()
  |> Base.decode64!()
end
