rmw_connext's Introduction

rmw_connext

Implementation of the ROS Middleware (rmw) Interface using RTI's Connext DDS.

DEPRECATION NOTICE: rmw_connextdds is a new RMW implementation for RTI Connext DDS, which supersedes the one contained in this repository (rmw_connext_cpp). This new implementation was developed by RTI in collaboration with the ROS 2 community, and it resolves several performance issues that are present in this implementation. rmw_connextdds is included in ROS 2 releases starting with Galactic. This rmw implementation will be supported until the end-of-life of the ROS distributions it is available in (ROS 2 Dashing and Foxy).

Working with rmw_connext

To use rmw_connext with ROS2 applications, set the environment variable RMW_IMPLEMENTATION=rmw_connext_cpp and run your ROS2 applications as usual:

Linux:
export RMW_IMPLEMENTATION=rmw_connext_cpp
or prepend it to the ROS2 command line, for example: RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run rviz2 rviz2

Windows:
set RMW_IMPLEMENTATION=rmw_connext_cpp

Binary Installation

Pre-built binaries of RTI Connext DDS are available for the x86_64 Linux platform (Debian/Ubuntu) under a non-commercial license. Install them using the steps outlined in the ROS2 installation wiki.
Other platforms must be built from source, using a separately-installed copy of RTI Connext DDS.

How to get RTI Connext DDS

This implementation of rmw_connext requires version 5.3.1 of RTI Connext DDS, which can be obtained through the RTI University Program, purchase, or as an evaluation. Note that the RTI website has Free Trial offers, but these are typically for the most-current version of RTI Connext DDS (6.0.1 as of this writing), which does not build with this implementation of rmw_connext.

Building

Refer to the Install DDS Implementations page for details on building rmw_connext for your platform.

Using Connext XML QoS settings

QoS profiles can be specified in XML according to the load order specified here. url_profile and string_profile cannot be used.

ROS will use the profile with the is_default_qos="true" attribute. The policies defined in the ROS QoS profile will override those in the default profile, except when rmw_qos_profile_system_default is used.

For example:

<?xml version="1.0"?>
<dds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://community.rti.com/schema/5.3.1/rti_dds_qos_profiles.xsd" version="5.3.1">
  <qos_library name="Ros2TestQosLibrary">
    <qos_profile name="Ros2TestDefaultQos" base_name="BuiltinQosLib::Baseline.5.3.0" is_default_qos="true">
      <participant_qos>
        <property>
          <value>
            <!-- 6.25 MB/sec (52 Mb/sec) flow controller -->
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.max_tokens</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.tokens_added_per_period</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.bytes_per_token</name>
              <value>8192</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.sec</name>
              <value>0</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.nanosec</name>
              <value>10000000</value>
            </element>
          </value>
        </property>
      </participant_qos>

      <datawriter_qos topic_filter="rt/my_large_data_topic">
        <reliability>
          <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
        <publish_mode>
          <flow_controller_name>dds.flow_controller.token_bucket.slow_flow</flow_controller_name>
        </publish_mode>
      </datawriter_qos>
    </qos_profile>
  </qos_library>
</dds>

That will force all publishers on the /my_large_data_topic topic to use the slow_flow flow controller. The reliability specified in the ROS QoS profile will still be used, unless its value is RMW_QOS_RELIABILITY_POLICY_SYSTEM_DEFAULT. See the ROS topic name mangling section to understand the rt/ prefix.
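The rate quoted in the XML comment of the example follows from the token-bucket properties. The check below is only a sketch of the arithmetic, assuming the sustained rate is tokens_added_per_period tokens of bytes_per_token bytes per period and that max_tokens does not limit it; the variable names are illustrative.

```python
# Token-bucket values copied from the slow_flow properties in the example.
tokens_added_per_period = 8
bytes_per_token = 8192
period_sec = 0
period_nanosec = 10_000_000  # 10 ms

# Length of one refill period in seconds.
period = period_sec + period_nanosec / 1e9

# Assumed sustained throughput: bytes released per period, divided by the period.
bytes_per_second = tokens_added_per_period * bytes_per_token / period

mib_per_second = bytes_per_second / 2**20         # binary megabytes per second
megabits_per_second = bytes_per_second * 8 / 1e6  # decimal megabits per second

print(f"{mib_per_second:.2f} MiB/s, {megabits_per_second:.1f} Mb/s")
```

Under these assumptions the result is 6.25 MiB/s, or roughly 52 Mb/s, which matches the comment in the profile.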

See RTI Connext docs to understand topic filters.

Overriding ROS specified QoS policies for a topic

To use this feature, you must first set the following environment variable:

:: Windows
set RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES=1
# Linux/MacOS
export RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES=1

If the environment variable is set and a profile name matches the DDS topic name, that profile will be used and the ROS specified profile will be ignored.

For example:

<?xml version="1.0"?>
<dds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://community.rti.com/schema/5.3.1/rti_dds_qos_profiles.xsd" version="5.3.1">
  <qos_library name="Ros2TestQosLibrary">
    <qos_profile name="Ros2TestDefaultQos" base_name="BuiltinQosLib::Baseline.5.3.0" is_default_qos="true">
      <participant_qos>
        <property>
          <value>
            <!-- 6.25 MB/sec (52 Mb/sec) flow controller -->
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.max_tokens</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.tokens_added_per_period</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.bytes_per_token</name>
              <value>8192</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.sec</name>
              <value>0</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.nanosec</name>
              <value>10000000</value>
            </element>
          </value>
        </property>
      </participant_qos>
    </qos_profile>
    <qos_profile name="rt/my_large_data_topic" base_name="BuiltinQosLib::Baseline.5.3.0">
      <datawriter_qos>  <!--Don't use topic filters here-->
        <publish_mode>
          <flow_controller_name>dds.flow_controller.token_bucket.slow_flow</flow_controller_name>
        </publish_mode>
        <reliability>
          <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
      </datawriter_qos>
    </qos_profile>
  </qos_library>
</dds>

In this case, all publishers in the topic /my_large_data_topic will use the specified slow flow controller and have a reliable reliability (regardless of the reliability specified in ROS code).

Caveats:

  • If you override the QoS profiles used for all publishers in a topic, the subscription profiles in the same topic will also be overridden. If you don't explicitly provide one, a default will be used.
  • RTI Connext will log an error each time it tries to find a profile that doesn't exist. You will see a lot of these logs in your terminal when using the RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES option.

Specifying a specific QoS library

If you provide only one QoS library to the process, that one will be used. Otherwise, the RMW_CONNEXT_QOS_PROFILE_LIBRARY environment variable must be set:

:: Windows
set RMW_CONNEXT_QOS_PROFILE_LIBRARY=Ros2TestQosLibrary
# Linux/MacOS
export RMW_CONNEXT_QOS_PROFILE_LIBRARY=Ros2TestQosLibrary

Specifying a different default QoS profile

You can use the RMW_CONNEXT_DEFAULT_QOS_PROFILE environment variable for this. It overrides the profile marked with is_default_qos="true" when set. The profile is looked up in the QoS profile library RMW connext is using.

Using user provided publish mode

ROS always overrides the QoS profile of datawriters to use ASYNCHRONOUS_PUBLISH_MODE_QOS. To prevent the publish mode from being overridden, set the following environment variable:

:: Windows
set RMW_CONNEXT_DO_NOT_OVERRIDE_PUBLICATION_MODE=1
# Linux/MacOS
export RMW_CONNEXT_DO_NOT_OVERRIDE_PUBLICATION_MODE=1

ROS topic name mangling

ROS uses the following mangled topics when the ROS QoS policy avoid_ros_namespace_conventions is false, which is the default:

  • Topics are prefixed with rt. e.g.: /my/fully/qualified/ros/topic is converted to rt/my/fully/qualified/ros/topic.
  • The service request topics are prefixed with rq and suffixed with Request. e.g.: /my/fully/qualified/ros/service request topic is rq/my/fully/qualified/ros/serviceRequest.
  • The service response topics are prefixed with rr and suffixed with Response. e.g.: /my/fully/qualified/ros/service response topic is rr/my/fully/qualified/ros/serviceResponse.
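The three rules above can be sketched as a small function. This is an illustration of the naming scheme only, not the rmw_connext implementation; the function name mangle_topic and the kind argument are hypothetical, and the sketch covers only the default case where avoid_ros_namespace_conventions is false.

```python
# Prefix and suffix per entity kind, taken from the list above.
_PREFIX = {"topic": "rt", "request": "rq", "response": "rr"}
_SUFFIX = {"topic": "", "request": "Request", "response": "Response"}


def mangle_topic(ros_name: str, kind: str = "topic") -> str:
    """Return the DDS topic name for a fully qualified ROS name.

    `kind` is one of "topic", "request" or "response" (hypothetical labels
    used only in this sketch).
    """
    if kind not in _PREFIX:
        raise ValueError(f"unknown kind: {kind!r}")
    if not ros_name.startswith("/"):
        raise ValueError("expected a fully qualified name starting with '/'")
    return _PREFIX[kind] + ros_name + _SUFFIX[kind]


print(mangle_topic("/my/fully/qualified/ros/topic"))
print(mangle_topic("/my/fully/qualified/ros/service", "request"))
print(mangle_topic("/my/fully/qualified/ros/service", "response"))
```

This is why the XML examples refer to the ROS topic /my_large_data_topic as rt/my_large_data_topic.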

Quality Declaration (per REP-2004)

See RTI Quality Declaration file, hosted on RTI Community website.

rmw_connext's People

Contributors

ahcorde, anamud, clalancette, cottsay, dhood, dirk-thomas, esteve, fujitatomoya, gerkey, hidmic, ivanpauno, jacobperron, jacquelinekay, jwillemsen, karsten1987, lobotuerk, mikaelarguedas, mjcarroll, mjeronimo, mm318, neil-rti, nuclearsandwich, ross-desmond, serge-nikulin, sloretz, sriramster, tfoote, thomas-moulard, vmayoral, wjwwood

rmw_connext's Issues

ros2 topic * commands can't always determine the type with Connext

Bug report

Required Info:

  • Operating System:
    • Ubuntu 16.04 AMD64
  • Installation type:
    • Source
  • Version or commit hash:
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • rclpy-ish

Steps to reproduce issue

Terminal 1:

RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker

Terminal 2:

RMW_IMPLEMENTATION=rmw_connext_cpp ros2 topic echo /chatter

Expected behavior

The data on the /chatter topic is printed.

Actual behavior

Around 20% of the time, the ros2 topic command will hang for 3-5 seconds, then print:

Could not determine the type for the passed topic

Additional information

Re-running the relevant command usually (but not always) makes it start working. Running it a number of times in a row always makes it start running.

Also, I may have seen this with other RMW implementations (particularly Fast-RTPS), but it seems much more common on Connext (though I haven't yet ruled out other factors).

get_datareader_qos should use DDSSubscriber and get_datawriter_qos should use DDSPublisher

Feature request

Feature description

The get_datawriter_qos and get_datareader_qos functions use an RTI Connext DDS API extension to retrieve the default datareader/datawriter QoS from the DDSDomainParticipant. According to the DDS specification, the datareader QoS should be retrieved from DDSSubscriber and the datawriter QoS should be retrieved from DDSPublisher.

Implementation considerations

By using the RTI extension this code can break in the future.

Code in wait function continues to possibly return RMW_RET_OK despite DDS return error

Bug report

Required Info:

  • Operating System:
    • Ubuntu 16.04
  • Installation type:

Affected code (the same call appears at four locations in the wait function):

RMW_SET_ERROR_MSG("Failed to get detach condition from waitset");

  • Version or commit hash:
    • see above
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • NA

Steps to reproduce issue

Quoting @dirk-thomas

After setting the error message the function should return `RMW_RET_ERROR`. The quick fix would be to just add the return statement in the cases it is missing.

The "correct" fix would be to update the structure of the code to store the return value in a value but still perform the other cleanup and only at the very end `return`.

I will provide a PR for the quick fix and have opened an issue for the correct fix.

Fixed guard conditions are not reset

@dirk-thomas and I identified an apparent bug: while the non-fixed guard conditions are reset to false after waiting, the fixed guard conditions that were added during waitset creation are never reset to false.

As a result, if you've ever done something to trigger a fixed guard condition (e.g., add a node or subscriber), then rmw_wait() on Connext should return immediately, every time you call it, without waiting. That behavior might explain many of our test failures.

The proposed fix is to add a block to the end of the wait call to reset the fixed guard conditions.

Accumulate DDS errors during cleanup and return error code from wait() function

Bug report

In #263 we provided a quick fix for the case where DDS returns errors, but it does not guarantee a complete cleanup. To quote @wjwwood:

You'd need to accumulate the errors during clean up and then return an error return code if there are any accumulated (and also find a way to communicate all of the issues encountered or else decide to just log errors that are being "overwritten" by new ones).
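The pattern described in the quote can be sketched in a few lines. This is a language-neutral illustration written in Python, not the C++ code of the wait function; run_cleanup, detach_ok and detach_fails are hypothetical names, and the two return codes are stand-ins for the rmw ones.

```python
# Stand-ins for the rmw return codes (assumed values, for illustration only).
RMW_RET_OK = 0
RMW_RET_ERROR = 1


def run_cleanup(steps):
    """Run every cleanup step and collect failures instead of returning early.

    `steps` is a list of (name, callable) pairs. Returns a (return code,
    list of error messages) tuple so that no error is overwritten.
    """
    errors = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # keep going so later steps still run
            errors.append(f"{name}: {exc}")
    return (RMW_RET_ERROR if errors else RMW_RET_OK), errors


def detach_ok():
    pass


def detach_fails():
    raise RuntimeError("Failed to get detach condition from waitset")


ret, errors = run_cleanup([
    ("subscriptions", detach_ok),
    ("guard_conditions", detach_fails),
    ("services", detach_ok),  # still executed after the failure above
])
print(ret, errors)
```

The point of the design is that a failing step neither stops the remaining cleanup nor gets lost: the caller receives one error code plus every message that was recorded.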

To recap, we are talking about the code in this function. The call graph for this function is as follows:

rmw_ret_t wait() is used in:

  • rmw_connext/rmw_connext_dynamic_cpp/src/functions.cpp => rmw_ret_t rmw_wait()
  • rmw_connext/rmw_connext_cpp/src/rmw_wait.cpp => rmw_ret_t rmw_wait()

rmw_ret_t rmw_wait() is used in:

  • rcl/rcl/src/rcl/wait.c => rcl_ret_t rcl_wait()

rcl_ret_t rcl_wait() is used in:

  • rclcpp/rclcpp/src/rclcpp/executor.cpp => wait_for_work()

wait_for_work() is used in:

  • AnyExecutable::SharedPtr Executor::get_next_executable(std::chrono::nanoseconds timeout)

get_next_executable() is used in:

  • spin_once()
  • spin_some()

spin_once() or spin_some() is used directly in the application

I would first need to understand what is meant by the complete clean up. Can someone maybe fill me in a bit?

Would we need to set all

rmw_subscriptions_t * subscriptions,
rmw_guard_conditions_t * guard_conditions,
rmw_services_t * services,
rmw_clients_t * clients,
rmw_waitset_t * waitset,

to null and then also catch and handle this in rcl/rcl/src/rcl/wait.c => rcl_ret_t rcl_wait()?

Required Info:

  • Operating System:
    • Ubuntu 16.04
  • Installation type:
    • source
  • Version or commit hash:
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • NA

[rmw_connext_dynamic] fix cleanup functions

In #45 various clean up functions have been introduced. In various tests (e.g. in test_rclcpp) these functions print error messages:

Error in destruction of rmw publisher handle: failed to delete contained entities for publisher, at ...
Error in destruction of rmw subscription handle: failed to delete subscriber, at ...

RTI shared memory issue

I am currently testing the namespace implementation on Connext.

On the build farm I get RTI-related error messages, which I cannot reproduce locally:

18:10:14 4: [test_executable_0] [D0108|ENABLE]RTIOsapiSharedMemoryMutex_create:OS semget() failure, error 0X1C: No space left on device
18:10:14 4: [test_executable_0] [D0108|ENABLE]NDDS_Transport_Shmem_create_recvresource_rrEA:failed to initialize shared memory resource mutex for key 0xb086aa

The RTI knowledge base recommends increasing the number of allowed semaphores and such, however I am a bit sceptical about it. Has one of you ever encountered similar behavior?

https://research.rti.com/kb/what-causes-rtiosapisharedmemorymutexcreateos-semget-failure-error-0x1c-error-message

Unused WIN32 compiler definition

Bug report

Required Info:

  • Operating System: Windows 10
  • Installation type: from source
  • Version or commit hash: 188147a
  • DDS implementation: RTI Connext
  • Client library (if applicable): N/A

Steps to reproduce issue


Expected behavior

rmw_connext_cpp/CMakeLists.txt, lines 116 and 122 should contain RMW_CONNEXT_CPP_BUILDING_DLL definition.

Actual behavior

The lines use ROSIDL_TYPESUPPORT_CONNEXT_CPP_BUILDING_DLL definition (copy paste error?)

Nondeterministic startup behavior

Several issues have been ticketed about a race condition between Connext and the user thread: Connext DataReaders and DataWriters are slow to establish a connection (probably due to multicast discovery). rclcpp spin_* functions appear not to work if they are called before these entities have finished initializing:

ros2/ros2#111
ros2/rclcpp#124
ros2/system_tests#68 (pull request)
#76

There should be an option in create_subscription/publisher/service/client to block until the underlying DataReader/Writer is finished initializing.

unbounded fields workaround does not work for non-primitive datatypes in srvs

I created rcl_interfaces with some basic nested service definitions: https://github.com/ros2/rcl_interfaces/tree/master/rcl_interfaces/srv

However it crashed on generation:

WARN com.rti.ndds.nddsgen.antlr.auto.IdlLexer ParamDescription_.idl line 33  preprocessor directive not supported. It will be ignored
WARN com.rti.ndds.nddsgen.antlr.auto.IdlLexer GetParamsRequest_.idl line 25  preprocessor directive not supported. It will be ignored
INFO com.rti.ndds.nddsgen.Main Done
Traceback (most recent call last):
  File "/home/tfoote/work/ros2/parameters/install/lib/rosidl_typesupport_connext_cpp/rosidl_typesupport_connext_cpp", line 75, in <module>
    sys.exit(main())
  File "/home/tfoote/work/ros2/parameters/install/lib/rosidl_typesupport_connext_cpp/rosidl_typesupport_connext_cpp", line 61, in main
    service_specs,
  File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 86, in generate_dds_connext_cpp
    _modify(plugin_cxx_filename, unbounded_fields, _step_2_1_and_2_3_and_2_4)
  File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 94, in _modify
    modified = callback(unbounded_fields, lines)
  File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 144, in _step_2_1_and_2_3_and_2_4
    modified |= _step_2_3(unbounded_fields, lines)
  File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 188, in _step_2_3
    dds_type = _get_dds_type(unbounded_fields, field_name)
  File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 250, in _get_dds_type
    idl_type = MSG_TYPE_TO_IDL[field.type.type]
KeyError: 'ParamDescription'

With a little debugging I localized it to this workaround for unbounded arrays: https://github.com/ros2/rmw_connext/blob/master/rosidl_typesupport_connext_cpp/rosidl_typesupport_connext_cpp/__init__.py#L76-L86

It is using a strict dictionary of primitive types. It needs to be robust to unbounded non-primitive types.

There is a more complete method msg_type_to_idl that would probably work, but the logic needs to be updated to support the arbitrary datatypes, not just the list of primitives.
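One way to make the lookup robust is to fall back to a scoped type name when the field type is not in the primitive table. The sketch below is an assumption-laden illustration, not the rosidl code: PRIMITIVE_TO_IDL is a small hypothetical stand-in for MSG_TYPE_TO_IDL, to_idl_type is a hypothetical helper, and the pkg::msg::dds_::Type_ naming for nested messages is assumed here rather than taken from the issue.

```python
# Hypothetical, abbreviated stand-in for the primitive-only lookup table.
PRIMITIVE_TO_IDL = {
    "bool": "boolean",
    "int32": "long",
    "float64": "double",
    "string": "string",
}


def to_idl_type(field_type: str, package: str) -> str:
    """Map a ROS field type to an IDL type name without raising KeyError.

    Primitive types come from the table. Any other type is treated as a
    message defined in `package` and mapped to an assumed scoped DDS name.
    """
    if field_type in PRIMITIVE_TO_IDL:
        return PRIMITIVE_TO_IDL[field_type]
    # Assumed naming scheme for nested (non-primitive) message types.
    return f"{package}::msg::dds_::{field_type}_"


print(to_idl_type("int32", "rcl_interfaces"))
print(to_idl_type("ParamDescription", "rcl_interfaces"))
```

With a fallback like this, an unbounded array of ParamDescription would resolve to a type name instead of hitting the KeyError shown in the traceback.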

Performance issues with large data

I used this tool https://github.com/ApexAI/performance_test to compare performance of rmw_connext and rmw_fastrtps.

While performance seems OK with small data, rmw_connext does not work properly with large data.
It fails at sending 50 4Mb samples per second while rmw_fastrtps can handle 500 samples per second.

This is not a problem with Connext Pro itself, as I verified proper performance using RTI's own tool (https://community.rti.com/downloads/rti-connext-dds-performance-test).

All tests were done using Bouncy following the instructions from here:
https://github.com/ros2/ros2/wiki/Linux-Development-Setup

Results for 4Mb PointCloud @ 50 Hz:
Fastrtps Best Effort:
log_PointCloud4m_19-09-2018_16-05-02.pdf
Connext Pro Best Effort:
log_PointCloud4m_19-09-2018_14-30-57.pdf
Connext Pro Reliable:
log_PointCloud4m_19-09-2018_14-34-17.pdf

Results for 4Mb PointCloud @ 500 Hz:
Fastrtps Best Effort:
log_PointCloud4m_19-09-2018_16-07-02.pdf
Connext Pro Best Effort:
log_PointCloud4m_19-09-2018_14-50-57.pdf

To run a full performance investigation which also reproduces the results here you can run
python src/performance_test/performance_test/helper_scripts/run_experiment.py as described here: https://github.com/ApexAI/performance_test

[connext_dynamic] finish support for wait_for_service

Currently local changes to publishers and subscriptions are reflected in Connext dynamic, but not local changes to service clients and service servers, and so rmw_service_server_is_available() is still disabled for Connext dynamic (but not for Connext static):

// TODO(wjwwood): remove this once local graph changes are detected.
RMW_SET_ERROR_MSG("not implemented");
return RMW_RET_ERROR;

This can be considered follow on work from: #168

Setup windows debug libraries in Connext_LIBRARIES (cmake module)

The Visual Studio build of RTI Connext generates both release and debug libraries (the debug ones use the same names with a "d" postfix). The issue is not critical as long as you don't need to debug the RTI Connext libraries. The workaround is to modify the _excpected_library_base_names in the FindConnext.cmake module, adding a d at the end.

I would say that the proper way of getting this defined in the Find cmake module would be to use:

set(Connext_LIBRARIES optimized nddsc debug nddscd
                      optimized ...   debug ...)

Something similar is implemented in the FindwxWindows.cmake module

When I tried a quick test for this layout ament is generating the error:

ament_export_libraries() package 'rmw_connext_cpp' passes the build configuration keyword 'debug' as the last exported library

Could it be that some parsing code in the ament package does not support this way of defining the libraries? I need to investigate it.

use a heuristic to determine whether or not to use asynchronous publishing

See: #183 (comment)

The proposal from the linked pull request is to use synchronous publishing:

  • if the reliability is BEST_EFFORT (type bounded or unbounded)
  • if the reliability is RELIABLE and the type is bounded and the maximum size is less than MAX_SYNC_PAYLOAD

Where MAX_SYNC_PAYLOAD is some maximum size that can be used without asynchronous publishing.

Use asynchronous publishing:

  • if the reliability is RELIABLE and the type is bounded and the maximum size is more than MAX_SYNC_PAYLOAD
  • if the reliability is RELIABLE and the type is unbounded (has no maximum size)

Something else to consider is whether or not messages with unbounded size can always be published with synchronous publishing, even with reliability as BEST_EFFORT.
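The proposed decision rule can be written down as a small predicate. This is a sketch of the proposal as stated above, not code from rmw_connext; the function name, the value chosen for MAX_SYNC_PAYLOAD, and the treatment of a size exactly equal to the threshold (synchronous here) are assumptions.

```python
from typing import Optional

# Hypothetical threshold in bytes; the proposal does not specify a value.
MAX_SYNC_PAYLOAD = 64 * 1024


def use_async_publishing(reliable: bool, max_size: Optional[int]) -> bool:
    """Return True if asynchronous publishing should be used.

    `reliable` is True for RELIABLE and False for BEST_EFFORT.
    `max_size` is the maximum serialized size in bytes, or None when the
    type is unbounded.
    """
    if not reliable:
        # BEST_EFFORT: synchronous, whether the type is bounded or not.
        return False
    if max_size is None:
        # RELIABLE and unbounded: asynchronous.
        return True
    # RELIABLE and bounded: asynchronous only above the threshold.
    return max_size > MAX_SYNC_PAYLOAD


print(use_async_publishing(reliable=False, max_size=None))
print(use_async_publishing(reliable=True, max_size=1024))
print(use_async_publishing(reliable=True, max_size=4 * 1024 * 1024))
print(use_async_publishing(reliable=True, max_size=None))
```

The open question in the last paragraph above corresponds to the first branch: if unbounded BEST_EFFORT messages cannot always be sent synchronously, that branch would also need to look at max_size.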

Connext is very slow to shutdown

Bug report

Required Info:

  • Operating System:
    • Ubuntu 16.04 AMD64
  • Installation type:
    • Source
  • Version or commit hash:
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • rclpy-ish

Steps to reproduce issue

RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker
(hit Ctrl-C here)

Expected behavior

Talker starts up, publishes some data, then quickly goes away when the user hits Ctrl-C.

Actual behavior

Talker starts up, publishes some data, then takes at least 3 seconds (sometimes longer) to go away after the user hits Ctrl-C.

Additional information

This seems to get worse with the number of nodes in the process. For the composition demos, for instance, it almost seems like it takes 3-5 seconds for each node loaded into the process.

[rmw_connext_dynamic] "heap" corruption of the connext dynamic test_subscriber after receiving a message

The pub-sub test for connext dynamic on Windows fails.

Running the test_publisher works, and so does running the test_subscriber. But when they are run together, the test_subscriber crashes (on shutdown, it looks like) after receiving the first message.

This is the error:

Unhandled exception at 0x00007FFF1ED30F20 (ntdll.dll) in test_subscriber__rmw_connext_dynamic_cpp.exe: 0xC0000374: A heap has been corrupted (parameters: 0x00007FFF1ED6DD40).

This is the back trace:

    nddscpp.dll!DDSGuardCondition::`vector deleting destructor'(unsigned int)   C++
>   rmw_connext_dynamic_cpp.dll!rmw_destroy_guard_condition(rmw_guard_condition_t * guard_condition) Line 755   C++
    test_subscriber__rmw_connext_dynamic_cpp.exe!rclcpp::executor::Executor::~Executor() Line 51    C++
    test_subscriber__rmw_connext_dynamic_cpp.exe!rclcpp::executors::single_threaded_executor::SingleThreadedExecutor::~SingleThreadedExecutor() Line 45 C++
    test_subscriber__rmw_connext_dynamic_cpp.exe!rclcpp::spin(std::shared_ptr<rclcpp::node::Node> & node_ptr) Line 77   C++
    test_subscriber__rmw_connext_dynamic_cpp.exe!main(int argc, char * * argv) Line 41  C++

This looks like a double free. I'm looking into a way to fix it.

Import of symbols is not working fine on 5.2 Community and VS2015

We need to use some flags to get proper visibility from headers in the 5.2 Community version of RTI Connext. The following patch implements the ones needed in rmw_connext and rmw_connext_dynamic.

diff --git a/connext_cmake_module/cmake/Modules/FindConnext.cmake b/connext_cmak
index 087a563..aac2436 100644
--- a/connext_cmake_module/cmake/Modules/FindConnext.cmake
+++ b/connext_cmake_module/cmake/Modules/FindConnext.cmake
@@ -183,7 +183,12 @@ if(NOT "${_NDDSHOME} " STREQUAL " ")
   set(Connext_LIBRARY_DIR "${Connext_LIBRARY_DIRS}")

   if(WIN32)
-    set(Connext_DEFINITIONS "RTI_WIN32" "NDDS_DLL_VARIABLE")
+    set(Connext_DEFINITIONS "RTI_WIN32"
+                            "NDDS_DLL_VARIABLE"
+                            "RTI_dds_c_DLL_VARIABLE"
+                            "RTI_dds_cpp_DLL_VARIABLE"
+                            "RTI_log_DLL_VARIABLE")
+
     # This will be a .bat file and it will be on the PATH.
     set(Connext_DDSGEN2 "rtiddsgen.bat")
   else()

dynamic service server segfaults on Windows

The test_server test in the examples package crashes on Windows with this:

Unhandled exception at 0x00007FFF11C84724 (rmw_connext_xtypes_dynamic_cpp.dll) in test_server__rmw_connext_xtypes_dynamic_cpp.exe: 0xC0000005: Access violation reading location 0x0000000000000000.

This null pointer access occurs at this line: https://github.com/ros2/rmw_connext/blob/master/rmw_connext_dynamic_cpp/src/functions.cpp#L976

Our guess as to why is that the structure being referenced here is a static const struct in a shared library, which is initialized by calling a static function in another library. This introduces an initialization race condition which causes an issue on Windows. Basically, since the static member is not yet initialized when the other static member is initialized, it produces an invalid structure.

So the fix seems to be to just move the initialization of the struct's members to the first access at run time.

I'll open a pr against rosidl_dds.

Which version of Connext DDS is used/supported in ROS 2

Due to problems with Opensplice I'm evaluating other DDS implementations.

I took a closer look at https://www.rti.com/products/ .
There are several versions of Connext DDS like Professional, Secure, Micro, Cert.
Which of them is supported with ROS 2?

As far as I can see, RTI Connext is available for Raspberry Pis. ( https://community.rti.com/content/forum-topic/howto-run-rti-connext-dds-raspberry-pi )
Do you know of someone who has already tried to run ROS 2 with RTI Connext on a Raspberry Pi?

wait_for_service not being woken by graph events

While investigating the appropriate timeout for ros2/system_tests#259, I noticed a correlation between the timeout used in wait_for_service calls (20s) and the time taken for tests to run successfully.

The tests that have two wait_for_service calls with timeouts of 20s each take one of 6, 26, or 46 seconds to run. Change each wait_for_service timeout to 30s and the tests take one of 6, 36, or 66 seconds to run. Replace each call with multiple 1s wait_for_service calls and the tests never take longer than 9s.

Note that the tests still pass, they just spend an unnecessary amount of time in the wait_for_service calls, presumably because the waitset is not triggered by any graph event of the service coming up.

Given that wait_for_service passes in the end, my money is on the graph event triggering before we wait on the waitset. Therefore we are waiting for something that has already occurred.

We have come across this in rmw_fastrtps_cpp before: what we need is an equivalent to ros2/rmw_fastrtps#147, which prevents guard conditions from being triggered between the time we check them to decide if we should wait, and the time we actually wait.
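The race can be illustrated with a generic condition-variable pattern. The sketch below is a Python analogy, not the rmw_connext or rmw_fastrtps code; the GuardCondition class is hypothetical. The idea is that the trigger sets a flag under the same lock the waiter uses, so a trigger that arrives before the wait is remembered rather than lost.

```python
import threading


class GuardCondition:
    """Hypothetical guard condition whose trigger is remembered until consumed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._triggered = False

    def trigger(self):
        # Set the flag and notify while holding the lock, so the waiter can
        # never check the flag and then miss the notification.
        with self._cond:
            self._triggered = True
            self._cond.notify_all()

    def wait(self, timeout: float) -> bool:
        """Return True if triggered before or during the wait, False on timeout."""
        with self._cond:
            # wait_for checks the predicate first, so an earlier trigger
            # makes this return immediately instead of sleeping.
            fired = self._cond.wait_for(lambda: self._triggered, timeout)
            self._triggered = False  # reset after the wait
            return fired


gc = GuardCondition()
gc.trigger()          # the graph event happens before anyone waits
print(gc.wait(20.0))  # returns at once, because the trigger was recorded
print(gc.wait(0.05))  # nothing pending now, so this one times out
```

In this analogy the 20s timeout is never spent when the event has already happened, which is the behavior the tests described above are missing.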

This seems related to #201 but distinct in that this is a race condition in services showing up as opposed to #201 being a race condition in services going away.

Testing for Beta2 coverage demo_nodes_cpp parameter tools are not working correctly on Windows

I am running the parameter nodes in demo_nodes_cpp and under connext they are not working correctly on Windows. They appear to be hanging. The events executables provide some output after the Ctrl-C but not the full equivalent of the fastrtps runs. I've waited 30+ seconds and it still responds quickly immediately after the Ctrl-C.

Output of events after Ctrl-C:
parameter_events_after_ctrlc_sync
Async version:
parameter_events_after_ctrlc
Simple set and get just hangs:
set_and_get_parameters_connext
For reference fastrtps running the same sample in the same workspace:
set_and_get_parameters_fastrtps

[rmw_connext_cpp] Cannot create secure nodes with current usage of partitions

Found that out when investigating ros2/sros2#32 (comment)

As all our nodes now start with parameters enabled, they create a set of service topics by default:

get_parametersReply
get_parametersRequest
get_parameter_typesReply
get_parameter_typesRequest
list_parametersReply
list_parametersRequest
describe_parametersReply
describe_parametersRequest
set_parametersReply
set_parametersRequest

These topics use partitions built from the prefix defined in the design doc and the node name (rq/<NODE_NAME> for requests and rr/<NODE_NAME> for replies).

If we define the access policies to match this, e.g.

          <partitions>
            <partition>rq/talker</partition>
          </partitions>
          <topics>
            <topic>get_parametersRequest</topic>
          </topics>

The node creation fails:

RTI_Security_AccessControl_check_create_datawriter:endpoint not allowed: no rule found; default DENY
DDS_DomainParticipantTrustPlugins_getLocalDataWriterSecurityState:!security function check_create_datawriter
DDS_DataWriter_create_presentation_writerI:ERROR: Failed to get local datawriter security state
DDS_DataWriter_createI:!create PRESPsWriter
DDS_Publisher_create_datawriter_disabledI:!create DataWriter
DDSDataWriter_impl::createI:!create writer
initialize:!create DataWriter
connext::details::EntityUntypedImpl::initialize:!failed (see previous errors)

>>> [rcutils|error_handling.c:155] rcutils_set_error_state()
This error state is being overwritten:

  'C++ exception during construction of Requester, at /home/mikael/work/ros2/current_ws/build_debug_isolated/rcl_interfaces/rosidl_typesupport_connext_cpp/rcl_interfaces/srv/dds_connext/get_parameters__type_support.cpp:98'

with this new error message:

  'failed to create requester, at /home/mikael/work/ros2/current_ws/src/ros2/rmw_connext/rmw_connext_cpp/src/rmw_client.cpp:139'

rcutils_reset_error() should be called after error handling to avoid this.
<<<
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
  what():  could not create client: failed to create requester, at /home/mikael/work/ros2/current_ws/src/ros2/rmw_connext/rmw_connext_cpp/src/rmw_client.cpp:139, at /home/mikael/work/ros2/current_ws/src/ros2/rcl/rcl/src/rcl/client.c:174

This is due to the fact that the partition is set after the requester is created, so the requester has an empty partition when the access rules are checked.
Requester creation here:

requester = callbacks->create_requester(
participant, service_str, &datareader_qos, &datawriter_qos,
reinterpret_cast<void **>(&response_datareader),
reinterpret_cast<void **>(&request_datawriter),
&rmw_allocate);

partition set here:

dds_publisher->set_qos(publisher_qos);
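
The ordering problem can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the RTI security plugin's actual matching algorithm: the `Rule` class and `is_allowed` function are hypothetical, rules match by exact string, and an endpoint without a partition is treated as living in the empty partition "".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One allow rule: a set of partitions and a set of topics (hypothetical model)."""
    partitions: frozenset
    topics: frozenset


def is_allowed(rules, partitions, topic):
    """Default-deny check: some rule must cover the topic and every partition.

    An endpoint with no partition is treated as being in the empty
    partition "", which is an assumption of this sketch.
    """
    effective = partitions or [""]
    return any(
        topic in rule.topics and all(p in rule.partitions for p in effective)
        for rule in rules
    )


rules = [Rule(frozenset({"rq/talker"}), frozenset({"get_parametersRequest"}))]

# Current order: the writer is created before set_qos() applies the partition.
at_creation = is_allowed(rules, [], "get_parametersRequest")

# Order the policy author expected: the partition is already in place.
with_partition = is_allowed(rules, ["rq/talker"], "get_parametersRequest")

print(at_creation, with_partition)
```

In this model the check at creation time is denied while the same endpoint with its partition applied would be allowed, which matches the "no rule found; default DENY" message in the log.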

This should be fixed as soon as we get rid of the use of partitions for topic namespacing. At that point we should remove the whitelist for empty partitions here:
https://github.com/ros2/sros2/blob/69ee5b691604cebc8af822db359bba7c67a9df7d/sros2/api/__init__.py#L348-L349

@Karsten1987 @rohitsalem @ruffsl FYI

spin_node_once() can deadlock

It looks like spin_node_once() can get lost and never return to the caller.

Reproduction:

cd build/test_rclcpp
while ./gtest_intra_process__rmw_connext_cpp; do true; done

Update: I'm not sure whether it matters, but I was also running stress -c 8 in parallel (on my 8-core Linux box).
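
The shell loop above only stops on a non-zero exit code, so a hang has to be noticed by hand. A minimal sketch of a loop that also flags hangs is shown here; the function name `run_until_failure` and the timeout values are assumptions for illustration, and the demo call uses the Python interpreter as a stand-in command so the sketch runs anywhere.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100, timeout_s=60.0):
    """Run cmd repeatedly; return (run_index, reason) on the first problem, else None."""
    for run in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child and raises if the timeout expires.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return run, "hang"
        if result.returncode != 0:
            return run, "exit code {}".format(result.returncode)
    return None


# Stand-in command; replace with ["./gtest_intra_process__rmw_connext_cpp"]
# when running from build/test_rclcpp.
print(run_until_failure([sys.executable, "-c", "pass"], max_runs=3))
```

A timeout well above the normal run time of the test keeps slow runs under load from being reported as hangs.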

Eventually the test hangs with this output:

Running main() from gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from test_intra_process_within_one_node
[ RUN      ] test_intra_process_within_one_node.nominal_usage
spin_node_once(nonblocking) - no callback expected
spin_node_some() - no callback expected
spin_node_once() - callback (1) expected - try 1/2

Stacktrace from attaching gdb to the deadlocked process:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fd97eb1b09b in RTIOsapiSemaphore_take () from /usr/lib/libnddscore.so.5
#2  0x00007fd97ed2b3c7 in PRESWaitSet_wait () from /usr/lib/libnddscore.so.5
#3  0x00007fd97f449e6d in DDS_WaitSet_waitI () from /usr/lib/libnddsc.so.5
#4  0x00007fd97e845ba3 in DDSWaitSet_impl::wait(DDSConditionSeq&, DDS_Duration_t const&) () from /usr/lib/libnddscpp.so.5
#5  0x00007fd98028bcd8 in wait<ConnextStaticSubscriberInfo, ConnextStaticServiceInfo, ConnextStaticClientInfo> (
    subscriptions=0x7ffe441bfb80, guard_conditions=0x7ffe441bfbb0, services=0x7ffe441bfb90, clients=0x7ffe441bfba0, 
    wait_timeout=0x0) at /home/gerkey/ros2_ws/install/include/rmw_connext_shared_cpp/shared_functions.hpp:259
#6  0x00007fd980288465 in rmw_wait (subscriptions=0x7ffe441bfb80, guard_conditions=0x7ffe441bfbb0, services=0x7ffe441bfb90, 
    clients=0x7ffe441bfba0, wait_timeout=0x0) at /home/gerkey/ros2_ws/src/ros2/rmw_connext/rmw_connext_cpp/src/functions.cpp:787
#7  0x00007fd9808fbb90 in rclcpp::executor::Executor::wait_for_work (this=0x7ffe441bffa0, timeout=...)
    at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:370
#8  0x00007fd9808fcd00 in rclcpp::executor::Executor::get_next_executable (this=0x7ffe441bffa0, timeout=...)
    at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:523
#9  0x00007fd9808fab1c in rclcpp::executor::Executor::spin_once (this=0x7ffe441bffa0, timeout=...)
    at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:129
#10 0x00007fd9808fa61f in rclcpp::executor::Executor::spin_node_once_nanoseconds (this=0x7ffe441bffa0, node=..., timeout=...)
    at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:97
#11 0x0000000000485649 in rclcpp::executor::Executor::spin_node_once<std::ratio<1l, 1000l> > (this=0x7ffe441bffa0, node=..., 
    timeout=...) at /home/gerkey/ros2_ws/install/include/rclcpp/executor.hpp:115
#12 0x0000000000480e3e in test_intra_process_within_one_node_nominal_usage_Test::TestBody (this=0xc35940)
    at /home/gerkey/ros2_ws/src/ros2/system_tests/test_rclcpp/test/test_intra_process.cpp:77
#13 0x00000000004b0daa in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0xc35940, 
    method=&virtual testing::Test::TestBody(), location=0x4bdb3b "the test body") at /usr/src/gtest/src/gtest.cc:2090
#14 0x00000000004ac40a in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0xc35940, 
    method=&virtual testing::Test::TestBody(), location=0x4bdb3b "the test body") at /usr/src/gtest/src/gtest.cc:2126
#15 0x0000000000499f2d in testing::Test::Run (this=0xc35940) at /usr/src/gtest/src/gtest.cc:2162
#16 0x000000000049a632 in testing::TestInfo::Run (this=0xc35180) at /usr/src/gtest/src/gtest.cc:2338
#17 0x000000000049ab8e in testing::TestCase::Run (this=0xc35610) at /usr/src/gtest/src/gtest.cc:2445
#18 0x000000000049f634 in testing::internal::UnitTestImpl::RunAllTests (this=0xc352b0) at /usr/src/gtest/src/gtest.cc:4243
#19 0x00000000004b1ceb in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    object=0xc352b0, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x49f3c6 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x4be678 "auxiliary test code (environments or event listeners)")
    at /usr/src/gtest/src/gtest.cc:2090
#20 0x00000000004ad2b2 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    object=0xc352b0, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x49f3c6 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x4be678 "auxiliary test code (environments or event listeners)")
    at /usr/src/gtest/src/gtest.cc:2126
#21 0x000000000049e59f in testing::UnitTest::Run (this=0x6e3c80 <testing::UnitTest::GetInstance()::instance>)
    at /usr/src/gtest/src/gtest.cc:3880
#22 0x00000000004bb543 in main (argc=1, argv=0x7ffe441c0438) at /usr/src/gtest/src/gtest_main.cc:38

"wrong type writer" error if topics with same token have different types

I have a node which subscribes to topic data and publishes reception rate data on topic reception_rate/data. Those topics have different types.

I get a TDataWriter::narrow:ERROR: Bad parameter: wrong type writer error when I try to publish, but only if it's publishing to */data. If it publishes to reception_rate/data_ there's no error. I suspect that the typesupport is being mixed up with the other topic's.

Here's how to reproduce it with two publishers in the same node:

import sys
from time import sleep

import rclpy

from std_msgs.msg import Int64, String


def main(args=None):
    if args is None:
        args = sys.argv

    rclpy.init(args=args)

    node = rclpy.create_node('talker')

    chatter_pub = node.create_publisher(String, 'chatter')
    chatter_pub2 = node.create_publisher(Int64, 'test/chatter')

    msg = String()
    msg2 = Int64()

    i = 1
    while True:
        msg.data = 'Hello World: {0}'.format(i)
        msg2.data = i
        i += 1
        print('Publishing: "{0}"'.format(msg.data))
        chatter_pub.publish(msg)
        chatter_pub2.publish(msg2)
        sleep(1)


if __name__ == '__main__':
    main()

Behaviour with fastrtps (matches expected behaviour):

$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run demo_nodes_py talker
Publishing: "Hello World: 1"
Publishing: "Hello World: 2"
Publishing: "Hello World: 3"
Publishing: "Hello World: 4"

$ ros2 topic list --show-types
/chatter [std_msgs/String]
/test/chatter [std_msgs/Int64]

Behaviour with connext (fresh daemon):

$ RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_py talker
RTI Data Distribution Service EVAL License issued to OSRF [email protected] For non-production use only.
Expires on 16-Jul-2017 See www.rti.com for more information.
Publishing: "Hello World: 1"
TDataWriter::narrow:ERROR: Bad parameter: wrong type writer

$ ros2 topic list --show-types
/chatter [std_msgs/String]
/test/chatter [std_msgs/String]

That last output showing /test/chatter having type std_msgs/String is suspicious.

CMake module is not working for MSVC 2015

The FindConnext.cmake module was not detecting my home-made installation of RTI Connext, which mimics the official RTI 5.1 and 5.2 directory layout. I propose the following patch:

diff --git a/connext_cmake_module/cmake/Modules/FindConnext.cmake b/connext_cmake_module/cmake/Modules/FindConnext.cmake
index 6180438..7fb3df5 100644
--- a/connext_cmake_module/cmake/Modules/FindConnext.cmake
+++ b/connext_cmake_module/cmake/Modules/FindConnext.cmake
@@ -166,27 +166,6 @@ if(NOT "${_NDDSHOME} " STREQUAL " ")
     endif()
   endwhile()

-  if(_matched_VS2015)
-    set(_i 0)
-    while(TRUE)
-      list(LENGTH _libs _length)
-      if(NOT ${_i} LESS ${_length})
-        break()
-      endif()
-      list(GET _libs ${_i} _lib)
-      set(_match TRUE)
-      string(FIND "${_lib}" "VS2015" _found)
-      if(NOT ${_found} EQUAL -1)
-        set(_match FALSE)
-      endif()
-      if(_match)
-        math(EXPR _i "${_i} + 1")
-      else()
-        list(REMOVE_AT _libs ${_i})
-      endif()
-    endwhile()
-  endif()
-

The patch removes an extra check for MSVC 2015, which does not seem necessary to me since the check is already done in earlier code, and which is particularly buggy in how it checks for the presence of the VS2015 string. Removing the check worked fine for me in my tests.
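
To make the behaviour of the removed loop easier to see, here is a line-by-line transliteration into Python. It is a sketch based only on the lines visible in the diff; the surrounding CMake logic is not shown there, and the library paths are hypothetical.

```python
def filter_libs_as_written(libs):
    """Mirror the removed CMake loop: walk the list, dropping entries in place."""
    libs = list(libs)
    i = 0
    while i < len(libs):
        match = True
        # string(FIND ...) returns -1 when the substring is absent.
        if libs[i].find("VS2015") != -1:
            match = False
        if match:
            i += 1          # keep this entry and move on
        else:
            del libs[i]     # list(REMOVE_AT _libs ${_i})
    return libs


libs = [
    "C:/rti/lib/x64Win64VS2015/nddscpp.lib",  # hypothetical path
    "C:/rti/lib/x64Win64VS2013/nddscpp.lib",  # hypothetical path
]
remaining = filter_libs_as_written(libs)
print(remaining)
```

Read this way, the loop drops every entry whose path contains VS2015 and keeps the rest, which looks like the opposite of what a VS2015-specific filter would be expected to do.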

catch error before nullptr access segfault

Bug report

Required Info:

  • Operating System: Windows 10
  • Installation type: from source
  • Version or commit hash: 188147a
  • DDS implementation: RTI Connext
  • Client library (if applicable): N/A

Steps to reproduce issue

Visual code inspection

Expected behavior

Return value of _DataReader::narrow call should be checked for nullptr

Actual behavior

rosidl_typesupport_connext_cpp/resource/msg__type_support.cpp.em(167): no check
rosidl_typesupport_connext_cpp/resource/msg__type_support.cpp.em(251): no check
rosidl_typesupport_connext_c/resource/msg__type_support_c.cpp.em(279): OK, see this commit
rosidl_typesupport_connext_c/resource/msg__type_support_c.cpp.em(439): no check

Additional information

At my previous company I saw RTI Connext fail on this call. I highly recommend testing all returned pointers for NULL.


Feature request

Feature description

Implementation considerations

Please repeat this fix for all narrow calls:
9ff3275

segfault in all connext request-response tests on Windows

All of the connext based request-response tests and examples crash in the std::string implementation with this error:

Unhandled exception at 0x00007FFF11C61332 (vcruntime140d.dll) in test_server__rmw_connext_cpp.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

The back trace (of the connext version of test_server):

>   userland_msgs__rosidl_typesupport_connext_cpp.dll!std::char_traits<char>::copy(char * _First1, const char * _First2, unsigned __int64 _Count) Line 529  C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right, unsigned __int64 _Roff, unsigned __int64 _Count) Line 1132  C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right) Line 1115   C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::operator=(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right) Line 1003    C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!connext::ReplierParams<userland_msgs::dds_::AddTwoIntsRequest_,userland_msgs::dds_::AddTwoIntsResponse_>::service_name(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name) Line 72 C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!userland_msgs::service_type_support::create_replier__AddTwoInts(void * untyped_participant, const char * service_name, void * * untyped_reader) Line 74   C++
    rmw_connext_cpp.dll!rmw_create_service(const rmw_node_t * node, const rosidl_service_type_support_t * type_support, const char * service_name) Line 640 C++
    test_server__rmw_connext_cpp.exe!rclcpp::node::Node::create_service<userland_msgs::AddTwoInts>(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name, std::function<void __cdecl(std::shared_ptr<rmw_request_id_t> const &,std::shared_ptr<userland_msgs::AddTwoIntsRequest_<std::allocator<void> > > const &,std::shared_ptr<userland_msgs::AddTwoIntsResponse_<std::allocator<void> > > &)> callback_with_header, std::shared_ptr<rclcpp::callback_group::CallbackGroup> group) Line 219  C++
    test_server__rmw_connext_cpp.exe!main(int argc, char * * argv) Line 36  C++

I believe this is related to ros2/ros2#31. While researching that issue I came across a discussion about how the libc++ developers intentionally cause a link-time error, even though they could have made linking work, in order to avoid a run-time error that would occur when a std::string is created in one library and passed to a library built with a different version. Here is the summary:

http://stackoverflow.com/questions/8454329/why-cant-clang-with-libc-in-c0x-mode-link-this-boostprogram-options-examp/8457799#8457799

In order to turn this run time crash into a link time error, libc++ uses a C++11 language feature called inline namespace to change the ABI of std::string without impacting the API of std::string. That is, to you std::string looks the same. But to the linker, std::string is being mangled as if it is in namespace std::__1. Thus the linker knows that std::basic_string and std::__1::basic_string are two different data structures (the former coming from gcc's libstdc++ and the latter coming from libc++).

I think that the VS2013 through VS2015 Preview headers do not use this trick but may still change the implementation of basic_string, which would cause this kind of issue. This might be expected, since mixing binaries built with one version of VC with binaries built with another is discouraged.

So it looks to me as though there is no bug in our code; we simply need newer, or built-from-source, versions of the RTI libraries to make this work.

detect participant-local graph changes

For reference:

Basically, any newly created DataWriters and DataReaders generate an entry on a "builtin" DDS topic. In OpenSplice you get all notifications, but Connext follows a section of the DDS spec that says locally created (created in the same participant) DataWriters and DataReaders don't generate entries (the specifics are in the above links).

So for us to get notifications of local changes, we'll need to maintain some state ourselves.
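
A minimal sketch of what "maintaining some state ourselves" could look like follows. The class and method names are hypothetical and do not come from rmw_connext; the idea is simply to count locally created endpoints separately and add them to whatever the builtin topics report.

```python
import threading
from collections import Counter


class GraphCache:
    """Combines locally created endpoints with those reported by discovery."""

    def __init__(self):
        self._lock = threading.Lock()
        self._local = Counter()   # (kind, topic) -> endpoints created in this participant
        self._remote = Counter()  # (kind, topic) -> endpoints seen on the builtin topics

    def local_created(self, kind, topic):
        with self._lock:
            self._local[(kind, topic)] += 1

    def local_deleted(self, kind, topic):
        with self._lock:
            if self._local[(kind, topic)] > 0:
                self._local[(kind, topic)] -= 1

    def remote_discovered(self, kind, topic):
        with self._lock:
            self._remote[(kind, topic)] += 1

    def remote_lost(self, kind, topic):
        with self._lock:
            if self._remote[(kind, topic)] > 0:
                self._remote[(kind, topic)] -= 1

    def count(self, kind, topic):
        """Total endpoints of this kind on the topic, local and remote."""
        with self._lock:
            return self._local[(kind, topic)] + self._remote[(kind, topic)]


cache = GraphCache()
cache.local_created("writer", "chatter")      # would be called from publisher creation
cache.remote_discovered("writer", "chatter")  # would be called from the builtin-topic listener
print(cache.count("writer", "chatter"))
```

In a real implementation the local hooks would also have to trigger the graph guard condition so that waiting callers are woken, which this sketch leaves out.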

Because of this I disabled the rcl_service_server_is_available function for Connext and Connext Dynamic:

I also disabled the related tests in rcl (which should be re-enabled after they're fixed):

Make use of ndds_namespace_cpp.h

Feature request

Feature description

Make use of ndds_namespace_cpp.h and the DDS namespace

Implementation considerations

Currently all code within this project uses DDS_ as the prefix for all DDS-defined types, but when using ndds_namespace_cpp.h instead of ndds_cpp.h, all DDS-defined types are in the DDS namespace. That would simplify porting rmw code between the various DDS vendors, because only RTI offers a DDS_ prefix as an alternative mapping for C++.

subscriptions, services and clients pointer arguments are not checked for NULL

Bug report

Required Info:

Steps to reproduce issue

I did not write a test case but I can if needed

Expected behavior

NA

Actual behavior

NA

Additional information

I will provide a PR.

Solution should be like this:

  // add a condition for each subscriber
  if (subscriptions) {
    for (size_t i = 0; i < subscriptions->subscriber_count; ++i) {
      OpenSpliceStaticSubscriberInfo *subscriber_info =
        static_cast<OpenSpliceStaticSubscriberInfo *>(subscriptions->subscribers[i]);
      if (!subscriber_info) {
        RMW_SET_ERROR_MSG("subscriber info handle is null");
        return RMW_RET_ERROR;
      }
      DDS::ReadCondition *read_condition = subscriber_info->read_condition;
      if (!read_condition) {
        RMW_SET_ERROR_MSG("read condition handle is null");
        return RMW_RET_ERROR;
      }
      rmw_ret_t status = check_attach_condition_error(
        dds_waitset->attach_condition(read_condition));
      if (status != RMW_RET_OK) {
        return status;
      }
    }
  }

@serge-nikulin fyi

master does not compile on Windows

See this latest job: http://54.183.26.131:8080/job/ros2_batch_ci_windows/199/console

This is the relevant error:

"C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\userland.sln" (default target) (1) ->
"C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\add_two_ints_server.vcxproj.metaproj" (default target) (7) ->
"C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\add_two_ints_server.vcxproj" (default target) (105) ->
(ClCompile target) -> 
 C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\src\ros2\examples\userland\src\add_two_ints_server.cpp(50): error C2668: 'rclcpp::node::Node::create_service': ambiguous call to overloaded function [C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\add_two_ints_server.vcxproj]

It is repeated a few times. I guess this is something to do with the most recent changes. At first I thought it was the changes I was testing, but it turned out to be a problem on master too.

The commits tested are:

==> vcs log -l1 src
..................
[ ... ]
=== src\ros2\examples (git) ===
commit c7b5e7780170549dee4e36df394697d455f6ec03
Merge: 2472ad3 2a61abf
Author: Esteve Fernandez <[email protected]>
Date:   Wed Apr 22 14:05:41 2015 -0700

    Merge pull request #15 from ros2/request-header

    Pass request header to callbacks
=== src\ros2\launch (git) ===
commit 47bb7510e0634acde3ac9048b41a525ae83cdc86
Author: Dirk Thomas <[email protected]>
Date:   Mon Apr 20 12:21:45 2015 -0700

    use waitpid when available
=== src\ros2\rcl (git) ===
commit a8962978a2f82059cef2097427e746e349a89d2a
Merge: 02db75d dd5f1de
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 7 17:04:52 2015 -0700

    Merge pull request #2 from ros2/code_style_uncrustify

    code style only
=== src\ros2\rcl_interfaces (git) ===
commit 645e40ebbfb8e857e46545b419965109989d4929
Author: Esteve Fernandez <[email protected]>
Date:   Wed Apr 22 17:24:18 2015 -0700

    Renamed recurse to recursive
=== src\ros2\rclc (git) ===
commit a2b2292eb34f06bd931e1a70f84368e6c651cd3d
Merge: 9615893 9609603
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 7 17:04:57 2015 -0700

    Merge pull request #2 from ros2/code_style_uncrustify

    code style only
=== src\ros2\rclcpp (git) ===
commit 8ad1f1f4c5b1ab5b152cc664c56f3c991eaaac4f
Merge: 1bf595d 6b6b94f
Author: Esteve Fernandez <[email protected]>
Date:   Tue Apr 28 15:09:22 2015 -0700

    Merge pull request #25 from ros2/spin-node-until-future-complete

    Added spin_node_until_future_complete
=== src\ros2\rmw (git) ===
commit 89900d73c6ed11c67b8b10e16a9eece4e2265629
Merge: 8524e33 889dce1
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 28 12:00:17 2015 -0700

    Merge pull request #11 from ros2/typesupport_for_rmw_impl

    export type support for rmw implementation
=== src\ros2\rmw_connext (git) ===
commit 1aece5657d69a34e8c808943105e3bb45a66c2ad
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 28 12:10:09 2015 -0700

    standardize target suffix
=== src\ros2\rmw_implementation (git) ===
commit d719244878056bc13b3c2381e97b1b618c4372d9
Author: Dirk Thomas <[email protected]>
Date:   Fri Apr 3 12:30:43 2015 -0700

    update license file to keep copyright template
=== src\ros2\rmw_opensplice (git) ===
commit 48e5f4f9dd32d49e2510b0ad99d5560f4fd2820a
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 28 12:10:02 2015 -0700

    standardize target suffix
=== src\ros2\rosidl (git) ===
commit 47cb8d2918c927956c556df959c4b46e89a3c57e
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 28 12:10:43 2015 -0700

    function to depend on include directories and libraries of generated interface target
=== src\ros2\rosidl_dds (git) ===
commit e749aecbcc49cc52d89ec0711b9a1af725a4b0eb
Author: Dirk Thomas <[email protected]>
Date:   Tue Apr 28 12:09:54 2015 -0700

    standardize target suffix

Pub/sub fails across different nodes in same process

Branch multiple_nodes in system_tests, package test_rclcpp, illustrates this bug:
https://github.com/ros2/system_tests/tree/multiple_nodes

Case 1 fails for Connext and passes for Opensplice:
node1 and node2 are both added to different executors.
node1 publishes "foo", node2 subscribes to "foo"
node2 publishes "bar", node2 subscribes to "bar"
Both publishers publish 5 times.
0/5 messages are received for both subscribers.

Case 2 fails for Connext and passes for Opensplice:
node1 and node2 are both added to the same executor.
node1 publishes "foo", node2 subscribes to "foo"
node2 publishes "bar", node2 subscribes to "bar"
Both publishers publish 5 times.
0/5 messages are received for both subscribers.

Case 3 passes:
one node publishes "foo", "bar", subscribes to "foo" and "bar"
Both publishers publish 5 times.
5/5 messages are received for both subscribers.

Case 4 passes:
node1 and node2 both added to one executor.
node1 publishes "foo", subscribes to "foo"
node2 publishes "bar", subscribes to "bar"
Both publishers publish 5 times.
5/5 messages are received for both subscribers.

race condition in graph changes and service is available

I noticed this when debugging the flaky test in rcl called test_rcl_service_server_is_available which is in the rcl/test/rcl/test_graph.cpp file:

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L477

The race seems to be between the graph guard condition being triggered (and waiting wait sets being woken up):

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L523

And the rcl_service_server_is_available function reporting that a service that was previously available is no longer available:

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L542

Normally the test only checks this when a change occurs in the graph, but this caused this test to fail with connext periodically. So I added a condition for connext where it will check on each loop regardless of whether or not a graph change was detected:

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L525-L538

The rcl_service_server_is_available function normally reported the right state on the next loop. This special case for connext should be removed after this is fixed.

This could be caused by graph changes getting combined through some sort of coalescing of events or it could be a delay introduced by connext, I'm not sure yet. I've decided to work around and document the issue rather than solve it now.
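
The workaround amounts to re-checking the service state on every loop iteration instead of only after a graph-change notification. The actual test is C++; the following is a Python model of the same idea, with `wait_for_state` and its parameters being hypothetical names chosen for this sketch.

```python
import threading
import time


def wait_for_state(is_available, expected, graph_changed, timeout_s=5.0, period_s=0.1):
    """Return True once is_available() == expected, re-checking on every wake-up."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Check unconditionally, whether or not a notification was seen.
        if is_available() == expected:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Wake on a graph-change notification or after period_s, whichever comes first.
        graph_changed.wait(min(period_s, remaining))
        graph_changed.clear()


state = {"available": False}
event = threading.Event()
event.set()  # the notification arrives first...
threading.Timer(0.2, lambda: state.update(available=True)).start()  # ...the state follows later
ok = wait_for_state(lambda: state["available"], True, event, timeout_s=2.0)
print(ok)
```

A loop that checked only once per notification would miss the late state update in this demo, because no second notification ever arrives.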

[connext_dynamic] currently broken

As a regression of #194 the dynamic rmw implementation doesn't function anymore. Even for simple pub/sub examples the executables crash in the new add_information calls.

Update the hook to call 5.2 VS2015

Simple fix to call the 5.2 environment script instead of the 5.1 one:

--- a/connext_cmake_module/env_hook/connext.bat.in
+++ b/connext_cmake_module/env_hook/connext.bat.in
@@ -1,7 +1,7 @@
 set "Connext_HOME=@Connext_HOME@"

 :: Call RTI's env setup script, piping stdout to nul, since they have echo on.
-call "%Connext_HOME:/=\%\..\rti_set_env_5.1.0.bat" 1> nul
+call "%Connext_HOME:/=\%\..\rti_set_env_5.2.0.bat" 1> nul

 :: Add the Connext_LIBRARY_DIR to the Path so .dll's can be found at runtime.
 set "Connext_LIBRARY_DIR=@Connext_LIBRARY_DIR@"

Do something about name length limits

While working on ros2/rclcpp#233, it was discovered that connext can throw an error like this when trying to create a parameter service client:

PRESContentFilteredTopic_createFilterProperty:!copy content filtered property "filter expression" field: reached maximum length for content filter property (current length: 262, max. length: 256). Please consider increasing contentfilter_property_max_length parameter under participant's resource limits.
PRESContentFilteredTopic_associateReader:!copy sequence for content filtered property data
DDS_Subscriber_create_datareader_disabledI:ERROR: Failed to associate reader and content filtered topic
DDSDataReader_impl::create_disabledI:!create reader
DDSDataReader_impl::createI:!create reader
DDSDomainParticipant_impl::create_datareader:ERROR: Failed to create datareader
initialize:!create DataReader
connext::details::EntityUntypedImpl::initialize:!failed (see previous errors)

In this case, the node name was test_parameters_local_synchronous_repeated and the error above was produced when trying to create the test_parameters_local_synchronous_repeated__get_parameter_types client during the construction of a SyncParametersClient, which in turn constructs an AsyncParametersClient, which creates various clients. The service name is only 63 characters long, so presumably there's something else going on that's creating a filter expression that's 262 characters long.

For now, I'm working around the problem in the test by shortening the node name (to test_parameters_local_synch_repeated).

I don't know much about the name handling in DDS generally or connext specifically, so I'm starting by flagging this as an issue. It could be that the fix is in system configuration, not code.
