Comments (7)

eboasson commented on June 30, 2024

Strange, I would certainly not have expected it to crash there. I can't recall ever having seen it call ddsrt_recvmsg with "bad" addresses, and the memory should also definitely not be freed before that thread shuts down.

Is there a simple way to reproduce this that you know of?

from cyclonedds.

daisukes commented on June 30, 2024

I could not reproduce it with simple code, but my integration test on Gazebo often gets a SIGSEGV at exit.

https://github.com/CMU-cabot/cabot-navigation/blob/main/cabot_navigation2/test/run_test.py

eboasson commented on June 30, 2024

> I could not reproduce with a simple code, but my code for an integration test on Gazebo often gets SIGSEGV at the exit.
>
> https://github.com/CMU-cabot/cabot-navigation/blob/main/cabot_navigation2/test/run_test.py

I'll give it a try. With a bit of luck, I can run that test locally and reproduce it that way.

daisukes commented on June 30, 2024

Hello @eboasson

I made some scripts to run the integration test. Please read the README in the zip file.
I can run it on a machine with Ubuntu 20.04, 32 GB RAM, and 12 cores.
The scripts run several Docker containers and require a display (a virtual display may also work).

cyclone-dds-bug-1937.zip

eboasson commented on June 30, 2024

Hi @daisukes, that's really nice! 🙏

I tried yesterday by following the information in the cabot README, but to no avail: Gazebo on aarch64 Ubuntu 22.04 doesn't even exist. Even if it did, it would have had to run in qemu anyway and would no doubt have run out of memory, disk space or CPU cycles. Now I know my old Intel box here at home won't do either.

That means the first real chance comes on Monday.

eboasson commented on June 30, 2024

Hi @daisukes, we (@PatrickM-ZS really) tried it out and can easily reproduce it, so that's great.

What we see is that Python's exit handling ends up closing the RMW implementation too early, on the main thread of the program. Cyclone DDS appears to still be active when the exit handlers call into librmw_implementation.so, which dlcloses librmw_cyclonedds_cpp.so. That results in the unmapping (via munmap) of the memory for that library and for the Cyclone library (libddsc.so). At that point, the memory containing the code the crashing thread is executing (and some of the data it uses) has been yanked from the address space, and the thread crashes.

I'm pretty much certain this is not a bug in Cyclone or the Cyclone RMW layer, but something in the Python (test) scripting that causes the ROS 2 nodes and other entities not to be cleaned up before these exit handlers kick in. I don't know Python well enough to know how you could get into that state, and I also haven't tried to find out yet. Perhaps you have an idea what to look for?

Note that it could be fairly trivially "solved" in the Cyclone RMW layer if it can run some code just before the library is unmapped: dds_delete(DDS_CYCLONEDDS_HANDLE) will immediately delete all entities, stop all threads and deinitialize the Cyclone library. However, I think that goes directly against the philosophy behind rclcpp, rcl and rmw, which seems to be that the application must delete all the entities it created, from the leaf entities up to the nodes/contexts.
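
To make the leaf-to-root deletion order concrete, here is a minimal sketch in plain Python. It does not use ROS 2 at all: the `Entity` class and the entity names are hypothetical stand-ins, and the only real API involved is `contextlib.ExitStack`, whose callbacks run in reverse registration order.

```python
from contextlib import ExitStack


class Entity:
    """Hypothetical stand-in for a ROS 2 entity (context, node, publisher)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def destroy(self):
        # Record the destruction so the order can be inspected afterwards.
        self.log.append(self.name)


def create_and_tear_down(names):
    """Create entities root-first and return the order they are destroyed in."""
    destroyed = []
    with ExitStack() as stack:
        for name in names:
            entity = Entity(name, destroyed)
            # ExitStack runs callbacks last-in, first-out when the block
            # exits, so entities created last (the leaves) go first.
            stack.callback(entity.destroy)
    return destroyed


print(create_and_tear_down(["context", "node", "publisher"]))
# ['publisher', 'node', 'context']
```

The point of the sketch is only the ordering: everything is torn down explicitly, leaves first, before control ever reaches the interpreter's exit handlers.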

daisukes commented on June 30, 2024

@eboasson

Thank you very much for your support!
I put rclpy's cleanup before exiting the test code, and it looks stable.
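
The exact change to the test script is not shown in this thread. As a rough sketch of what explicit cleanup before exit could look like, the helper below wraps a test body in `try`/`finally`. The helper name `run_test_with_cleanup` and its parameters are hypothetical; the `ros` argument is assumed to be the imported `rclpy` module (using its `init`, `create_node` and `shutdown` functions and the node's `destroy_node` method), passed in so the sketch also runs without a ROS 2 installation.

```python
def run_test_with_cleanup(ros, node_name, body):
    """Run body(node), then always tear down the node and the ROS 2 context.

    `ros` is assumed to be the `rclpy` module (hypothetical parameter, used
    so that the sketch can be exercised with a stand-in object).
    """
    ros.init()
    node = None
    try:
        node = ros.create_node(node_name)
        return body(node)
    finally:
        # Destroy the node first, then shut down the context, while the
        # interpreter and all shared libraries are still fully loaded.
        if node is not None:
            node.destroy_node()
        ros.shutdown()
```

In a real test this might be called as `run_test_with_cleanup(rclpy, "test_node", my_test_body)` after `import rclpy`, where `my_test_body` is whatever function drives the test. The cleanup then happens on both the success and the exception path, rather than being left to the exit handlers.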
