eboasson commented on June 26, 2024

Hi,

I had a look, and this is to be expected: you can get lucky and have all 70 subscribers receive it (I did, so I know it can happen), but there is no guarantee whatsoever. There are two reasons: first, the way the two programs work, and second, the chosen quality-of-service (QoS) settings.

First, the way they work: the publisher waits until the discovery data indicates there is a subscriber (really, just until the number of discovered subscribers changes), then writes one sample and terminates. Because all the processes discover each other independently, it may publish that sample when it has discovered just one subscriber, some of them, or all of them.

While that is entirely timing-dependent, in practice starting the publisher after the subscribers seems likely to give it a good chance of discovering many of them in time, simply because starting the publisher triggers discovery. Another factor could be that the first call to dds_get_status_changes probably happens before anything has been discovered (discovery requires multiple round trips), so the publisher will probably sleep for 20 ms between the call to dds_create_writer and the call to dds_write. Those 20 ms may well be enough to complete discovery in many cases.

Second, the QoS: the topic, writer and reader all use "volatile" durability, so the one published sample only goes to the readers discovered at the time of writing [1]. This type of problem is exactly what the "durability" QoS setting is for. If, instead of DDS_DURABILITY_VOLATILE (the default), you use DDS_DURABILITY_TRANSIENT_LOCAL (on both sides!), then the writer will keep the sample for any reader discovered later, and the readers will request it when they discover the writer.

There is one problem with this approach, though: the writer must remain in existence, so you can't terminate the process immediately. If you change the QoS this way and add a sleep (e.g., of 1 s) after the call to dds_write, I would expect it to reliably deliver the data to all 70 readers.

This is nice, of course, but you should not have to keep the process and the writer in existence (that would mean you could never stop it as long as another reader might show up ...), and that is why DDS also has a DDS_DURABILITY_TRANSIENT setting [2]. The idea behind "transient" is that the writer need not be kept around, because the DDS middleware stores the data independently of the application processes. For the subscribers not much changes [3]: they still get the historical data [4].

The real strength of DDS lies in this particular mode: "transient" data is the concept that really helps in building fault-tolerant, extensible systems where processes can come and go. "Transient-local" is nice, but it can't really help when components can fail or crash. However, it is also vastly simpler to implement than "transient". While full support for transient data is very much in sight for Cyclone, it is not yet supported. And so, though transient-local is but a poor alternative, it is the only option at the moment [5].

Does this clear up things or did I only make it more confusing? 🤔

[1] It may even be dropped by the reader if it hasn't yet discovered the writer; that's a grey area.
[2] It is specified and available in several implementations, but it is not required by the "minimal" profile in the specification, which is where Cyclone currently is.
[3] Nothing, really, for normal uses, but you can design experiments in which you can tell the difference in correct implementations.
[4] There is an obscure QoS "writer data lifecycle" that contains a setting called "autodispose_unregistered_instances". It defaults to true, but that means that the data written by a transient writer would be deleted from the system when the writer disappears, kinda defeating the purpose of setting "transient" durability ... It is much better to set it to false!
[5] It is partially supported, in that it can work with the transient data support in OpenSplice.

jwcesign commented on June 26, 2024

Thank you so much for the explanation!
Is there any way to make all the subscribers receive the message without setting the "durability" QoS to DDS_DURABILITY_TRANSIENT_LOCAL?

jwcesign commented on June 26, 2024

One more question: if I launch 70 subscribers and then one publisher, sometimes all of them receive the message, but if I launch 100 subscribers and then one publisher, only a few receive it. What's the reason?

eboasson commented on June 26, 2024

I guess the more subscribers there are, the longer discovery takes. It all runs in parallel, so it is conceivable that with 70 it can still make it, but that with 100 almost all of them are still halfway through discovery when the sample is written. 'Tis but a guess ... there is a hard-to-read tracing format that would tell you more, but I'm quite certain the overhead of tracing to a text file would significantly affect the timing.

As for making sure all subscribers receive the message while using "volatile" data, the only option is to wait longer between dds_create_writer and dds_write. If you know there will be n subscribers, you could wait until dds_get_publication_matched_status returns a current_count equal to n. Or, easier, if you know that you start the subscribers first, you could just wait for a little while.

P.S. All that is perfectly fine, but it does make for a much tighter coupling between the subscribers and the publisher than I personally would be happy with.

jwcesign commented on June 26, 2024

I think you're right; DDS shouldn't need such tight coupling. Thank you for your careful explanation!!!