Comments (8)
Additionally (although this may be better separated out as a new issue), calling `.stop()` on a `RemoteActor` kills the `RemoteActor`, not the actor that it represents. This may be the intended behaviour, but it is a bit of a footgun, so it should either be documented or have its behaviour changed. I got around this in the integration test by sending a message to the remote actor that then causes it to stop itself.
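For anyone hitting the same footgun, the workaround above can be sketched roughly as follows. This is a toy model built only on `std` threads and channels, not the ractor API: `WorkerMsg` and `spawn_worker` are made-up names, and the thread stands in for the actor on the remote node. The point is only that the stop request travels as an ordinary message and the actor ends its own loop.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message type for the toy actor (not part of ractor).
pub enum WorkerMsg {
    /// Regular work item.
    Work(u32),
    /// Asks the actor to stop itself.
    Shutdown,
}

/// Spawns a toy "actor" thread and returns its mailbox and a join handle.
/// The handle yields the sum of all work processed before shutdown.
pub fn spawn_worker() -> (Sender<WorkerMsg>, JoinHandle<u32>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        let mut total = 0;
        for msg in rx {
            match msg {
                WorkerMsg::Work(n) => total += n,
                // The actor decides to stop; nobody stops it from outside.
                WorkerMsg::Shutdown => break,
            }
        }
        total
    });
    (tx, handle)
}
```

Sending `WorkerMsg::Shutdown` through the returned `Sender` and then joining the handle gives a clean stop, which is the same shape as sending a custom "please stop" message through the `RemoteActor`.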
from ractor.
Yes, for this I don't have a good solution for remote supervision, which is why, for now, `stop()` doesn't propagate over the wire. I suppose it does make sense to allow a remote process to stop an actor, and we should propagate the stop signal (and kill, for that matter); however, I still think remote supervision is a large task to tackle here.
The problem seems to occur because `NodeSession` receives `ActorTerminated` before `PgLeave`. The handler for `ActorTerminated` removes the actor from `state.remote_actors`, and then, when the `PgLeave` handler calls `NodeSession::get_or_spawn_remote_actor` to get the `ActorCell` needed to remove the actor from the group, it inadvertently creates the remote actor again.
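The ordering problem described here can be illustrated with a small stand-alone model. This is only a sketch under assumptions: `SessionModel`, `get_or_spawn`, `on_terminated`, and `on_pg_leave_naive` are hypothetical names that mimic the shape of the handlers, not the actual `ractor_cluster` code, and a `String` stands in for the `ActorCell`.

```rust
use std::collections::HashMap;

/// Toy stand-in for a node session's table of local proxies for remote actors.
#[derive(Default)]
pub struct SessionModel {
    /// pid -> actor name (a placeholder for the real actor handle).
    pub remote_actors: HashMap<u64, String>,
    /// How many times a proxy has been spawned.
    pub spawn_count: usize,
}

impl SessionModel {
    /// Returns the proxy for `pid`, spawning one when it is missing.
    pub fn get_or_spawn(&mut self, pid: u64, name: &str) -> &String {
        if !self.remote_actors.contains_key(&pid) {
            self.spawn_count += 1;
            self.remote_actors.insert(pid, name.to_string());
        }
        &self.remote_actors[&pid]
    }

    /// Termination handler: forgets the proxy.
    pub fn on_terminated(&mut self, pid: u64) {
        self.remote_actors.remove(&pid);
    }

    /// Naive leave handler: looks the actor up through `get_or_spawn`,
    /// which resurrects an entry that termination already removed.
    pub fn on_pg_leave_naive(&mut self, pid: u64, name: &str) {
        let _proxy = self.get_or_spawn(pid, name);
        // Group removal would happen here using `_proxy`.
    }
}
```

Running terminate followed by the naive leave handler for the same pid leaves `spawn_count` at 2 and the pid back in `remote_actors`, which matches the re-registration seen in the logs.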
I believe we need to be able to remove terminated actors from the process groups without recreating them, but currently, unregistering from the process group uses the `ActorCell` to notify any supervisors with the `SupervisionEvent::ProcessGroupChanged` event. It looks like `ActorCell` gets around this in the local case by first removing all group event listeners for that actor and then leaving the group (see `ractor/ractor/src/actor/actor_cell/mod.rs`, lines 283 to 284 in 6e36b8b).
This approach wouldn't work for this situation, since we don't control the entire terminate+pgleave process.
I'm open to ideas for solving this. One approach could be that when the `NodeSession` receives a `Terminate` event, we manually unregister the actor from the process group (and, in the process, notify any supervisors that are monitoring it), and then in the `PgLeave` event handler only unregister actors from the process group if they still exist in `state.remote_actors`. Testing this approach does seem to fix the issue, but it may not be the cleanest solution.
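A rough sketch of that approach, again as a toy model rather than the real `NodeSession`: `FixedSession`, its fields, and its method names are all hypothetical, and a `Vec<String>` stands in for supervisor notifications.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of a session that tracks proxies and process-group membership.
#[derive(Default)]
pub struct FixedSession {
    pub remote_actors: HashMap<u64, String>,
    pub groups: HashMap<String, HashSet<u64>>,
    /// Stand-in for supervisor notifications.
    pub notifications: Vec<String>,
}

impl FixedSession {
    /// Registers a proxy and adds it to `group`.
    pub fn join(&mut self, group: &str, pid: u64, name: &str) {
        self.remote_actors.insert(pid, name.to_string());
        self.groups.entry(group.to_string()).or_default().insert(pid);
    }

    /// Terminate: drop the proxy and remove it from every group right away,
    /// notifying once per group it was actually a member of.
    pub fn on_terminate(&mut self, pid: u64) {
        if self.remote_actors.remove(&pid).is_none() {
            return;
        }
        for (group, members) in self.groups.iter_mut() {
            if members.remove(&pid) {
                self.notifications.push(format!("{pid} left {group}"));
            }
        }
    }

    /// PgLeave: only act on actors that are still known, never respawn.
    /// Returns true when the actor was removed from the group here.
    pub fn on_pg_leave(&mut self, group: &str, pid: u64) -> bool {
        if !self.remote_actors.contains_key(&pid) {
            return false;
        }
        let removed = self.groups.get_mut(group).map_or(false, |m| m.remove(&pid));
        if removed {
            self.notifications.push(format!("{pid} left {group}"));
        }
        removed
    }
}
```

With this shape, a `Terminate` followed by a `PgLeave` for the same actor produces exactly one notification and never recreates the proxy. Whether the group cleanup belongs in the terminate handler in the real code is a design question this sketch does not settle.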
I have a minimal reproducible example available here: https://gist.github.com/calebfletcher/057e9ca35acf3aaa181c1f0a5f679357
If you start `server.rs` first, then `client.rs`, and then look at the logs on the client, there should be a warning and an error: the same messages that are in the second comment of this thread.
Ran into one issue: when an actor is stopped on the remote node, it causes errors on the local node:
[2023-04-23T15:17:15Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:54596 <- 127.0.0.1:4697 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(Terminate(Terminate { ids: [2] })) })) }'
[2023-04-23T15:17:15Z DEBUG ractor_cluster::net::session] RECEIVE 127.0.0.1:54596 <- 127.0.0.1:4697 - 'NetworkMessage { message: Some(Control(ControlMessage { msg: Some(PgLeave(PgLeave { group: "test", actors: [Actor { pid: 2, name: Some("actorname") }] })) })) }'
[2023-04-23T15:17:15Z DEBUG ractor_cluster::node::node_session] Actor 2 on node 0 exited, terminating local `RemoteActor` 0.2
[2023-04-23T15:17:15Z DEBUG ractor::registry] registering actor with name actorname
[2023-04-23T15:17:15Z ERROR ractor_cluster::node::node_session] Failed to spawn remote actor with 'Actor 'actorname' is already registered in the actor registry'
[2023-04-23T15:17:15Z WARN ractor_cluster::node::node_session] NodeSession Some(NameMessage { name: "node_b@localhost", flags: Some(NodeFlags { version: 1 }), connection_string: "localhost:4697" }) received an unknown child actor exit event from 0.2 - 'Some("remote")'
The name of the remote actor is 'actorname', and `node_a` is the remote node.
I'm not sure why the termination of the actor is trying to re-register the actor; I will have to look into it.
Hmm, so the local remote actor isn't fully shut down, perhaps, and therefore couldn't self-leave the process group? Yeah, it might be something we need to look at. Let me see if I can get a repro scenario together, replicate it, and work out what might be good solutions for solving it. Thanks for reporting!
I need to re-run this after the merged PR, but I'm hoping this solved your issue. If yes, I'll add an integration test of the situation to make sure we don't regress on it.
I reran my test case with the upstream changes, and they seem to have fixed the issue, so thanks for that. I have opened PR #103 to implement the change and to add an integration test.