Several issues have been ticketed about a race condition between Connext and the user

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

It would be good to expose which subion was matched so that you could further as

The ticket proposes: wait_until_subion_available <l

I see <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

Nondeterministic startup behavior about rmw_connext HOT 11 CLOSED

ros2 commented on September 13, 2024

Nondeterministic startup behavior

from rmw_connext.

Comments (11)

jacquelinekay commented on September 13, 2024

Here's my idea for fixing this issue:

The DataReaderListener interface offers an on_subscription_match() callback, and the DataWriterListener interface similarly offers an on_publication_match() callback:

https://community.rti.com/static/documentation/connext-dds/5.2.0/doc/api/connext_dds/api_cpp/classDDSListener.html

We could expose a new function in Publisher, wait_until_subscription_available. (The parallel function in Client would be wait_until_service_available.) The overridden on_subscription_match callback would set a member boolean in CustomPublisherListener called is_subscription_matched to true, and it would also trigger a guard condition. The rmw_wait_until_subscription_available function would first check is_subscription_matched, return if true, and create a waitset and wait on the guard condition if false.

from rmw_connext.

esteve commented on September 13, 2024

@jacquelinekay sounds good to me

from rmw_connext.

wjwwood commented on September 13, 2024

It would be good to expose which subscription was matched so that you could further assert that the matched subscription or publication is the one you're expecting (to be more robust to external publishers and subscribers). There are already a few functions in rmw which aim at this: https://github.com/ros2/rmw/blob/master/rmw/include/rmw/rmw.h#L224-L232 But you may need to add more or just expose a gid from the established connection and the user could use that to compare with a publisher or subscription they have locally.

Otherwise I think the approach sounds good to me.

from rmw_connext.

dirk-thomas commented on September 13, 2024

The ticket proposes:

wait_until_subscription_available
boolean in CustomPublisherListener called is_subscription_matched

Both clearly do not work for N subscribers if the first one unblocks
the wait / toggles the boolean. Extending the proposal as you mentioned
(providing identifiers about the subscribers) would be necessary to work
for N > 1.

The functions you referenced (rmw_get_gid_for_publisher) are there to
support the intra process comm. Intra process comm is not anonymous pub/sub
since it requires detailed knowledge about all communicating entities. In
this term intra process comm is special.

I think our general pub/sub should not expose (or use internal) these
functionalities) since it would require the undelying rmw implementation to
be not anonymous. Any implementation which simply subscribes via UDP
without signaling the publisher would be unable to implement our interface.
That's why I think it would be the wrong direction. E.g. ROS 1 does
explicitly not expose these information to a publisher / subscriber. Only
the number of active connections is available.

from rmw_connext.

wjwwood commented on September 13, 2024

Just because rmw_get_gid_for_publisher is only used by intra process right now does not mean that it could not be used by other parts of the system. I also disagree that intra process is not anonymous pub/sub.

In my opinion the notion of anonymous pub/sub means that you pub and sub to an anonymous name which is data centric rather than host centric, i.e. pub and sub to /chatter rather than pub to /chatter and sub to publishing_node/chatter. I do not agree that it means information about the topology of the graph cannot be known or used. If you're using a connection-less system then you need to have a meta data system to collect the information, which is what the ROS master did in part in ROS 1 and it's what DDS does in discovery. I don't think it is the wrong direction to ask that any underlying middleware be able to answer questions like "who is connected to me?" or "to whom am I connected?" and "when did that happen?".

And I still don't see how you build a life cycle system which doesn't use this kind of meta data but can still know when all connections are established and it's safe to move to the next state.

from rmw_connext.

dirk-thomas commented on September 13, 2024

The definition of a publish / subscribe system is that one communicating entity is unaware of any of the other communicating entities.

See first paragraph of https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern:

In software architecture, publish–subscribe is a messaging pattern where senders of messages,
called publishers, do not program the messages to be sent directly to specific receivers, called
subscribers, but instead characterize published messages into classes without knowledge of which
subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and
only receive messages that are of interest, without knowledge of which publishers, if any, there are.

I am not arguing that the information doesn't need to exist for orchestration (like in the rosmaster). I argue against using this information within a publisher / subscriber directly.

from rmw_connext.

gerkey commented on September 13, 2024

I see @dirk-thomas's point about assuming anonymity for pub/sub in general, but there's no reason that we can't offer to expose information about participants for those cases in which it's useful. There's of course the risk of abuse of this kind of feature by poorly structured user code.

But there are high reliability or otherwise more stringently configured systems in which it may be a requirement to at least know and perhaps verify who is talking with whom, for reasons of ensuring proper system configuration and an audit trail (e.g., "who sent the message on /cmd_vel that made the robot crash into the wall?").

+1 to the proposal.

from rmw_connext.

dirk-thomas commented on September 13, 2024

The proposal is about using these information to fix the various non-deterministic startup behaviors. I would classify that as "abuse of this kind of feature" according to you rational.

from rmw_connext.

jacquelinekay commented on September 13, 2024

Dirk is arguing that the feature should not be offered by Publisher or Subscriber. But it could be offered by an alternative manager for orchestration. With the codebase in its current state, the IntraProcessManager could manage checking connections made within the process, which I think would be adequate for the use case I'm most concerned with (determinism in tests).

For example:

void IntraProcessManager::wait_until_connection_established(const Publisher::SharedPtr pub, const Subscription::SharedPtr sub, timeout=-1);

void IntraProcessManager::wait_until_connection_established(const Service::SharedPtr pub, const Client::SharedPtr sub, timeout=-1);

However, adding such a feature to IntraProcessManager now has the danger of crowding the API, since when component lifecycle is implemented, there will presumably be a different manager for orchestration.

from rmw_connext.

wjwwood commented on September 13, 2024

I think the definition that was linked to is neither at odds with the test Jackie is working on nor the way intra process is implemented. In the case of the test Jackie is trying to fix, the code is different from normal user code because it is trying to assert something about the state of the system (as opposed to trying to write a reusable and decoupled piece of software). The definition doesn't imply that the underlying communication has to be devoid of this information, only the code the user writes. And in the case of our intra process comms, the only information the user has to provide is the topic and whether or not they want to use intra process, which doesn't violate the anonymous pub/sub pattern in my opinion.

I am not arguing that the information doesn't need to exist for orchestration (like in the rosmaster). I argue against using this information within a publisher / subscriber directly.

It need not be in the subscription or publisher, but we do need a way to assert that a publisher/subscriber pair have been matched. I haven't heard another proposal for how to accomplish the technical aspects, only a suggestion to wait and use life cycle. But in my opinion waiting on life cycle doesn't help because what Jackie needs to make these tests reliable is the same primitive functionality that life cycle needs to be able to work (as far as I can see).

With the codebase in its current state, the IntraProcessManager could manage checking connections made within the process, which I think would be adequate for the use case I'm most concerned with (determinism in tests).

I don't think that this is specific only to intra process, because conceptually the same deterministic startup problems exist with inter process communications as well. And in order for intra process to know when connection within the process have been made then you'll need the same mechanism because at the core of that is the need to know when a rmw publisher and subscription pair have been matched and are ready to be used.

So I think Jackie's suggestion of the using the DDS listener internally and exposing that information some how is the correct approach at the lowest level. How best to expose that in rclcpp is not clear to me, but one option is that we add the necessary interfaces to rmw, and just use rmw functions directly in the tests. That way we can defer exposing it in rclcpp until we have lifecycle and it's clearer how that will interact with rclcpp. No matter what we do now we will have to add some functionality to rmw to make this happen, that is unless someone else knows how to do it with the current capabilities.

from rmw_connext.

dirk-thomas commented on September 13, 2024

Assuming that using the now available API to wait for a server to be available or check the matched subscriber count solves this use case we are going ahead and close this.

from rmw_connext.

Nondeterministic startup behavior about rmw_connext HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent