Giter Club home page Giter Club logo

Comments (6)

PeterJohnson avatar PeterJohnson commented on July 18, 2024

The server is single threaded (it’s an event loop), so that can’t be the cause. Something else must be going on. Probably something isn’t being tracked right in the data structure maintenance, resulting in a null pointer deref or use after free at this code, and the actual cause is elsewhere. How rapidly are we talking about? Is this during unit testing? We of course want to fix the crash, but in general, creating and closing publishers should be done rarely.

from allwpilib.

brettle avatar brettle commented on July 18, 2024

Thanks for the rapid response. I'll continue to investigate and see if I can create a small reproducible test case.

For context, we have an NT client that is reporting kinematics info for objects in a simulation that is controlled by a JUnit5 test that is acting as the NT server. So, this is a system test, not a unit test. The server/test expresses interest in an object by publishing an empty (not null) double[] to a topic named for the object. The simulation/client listens for such a publication and responds by publishing the relevant info to the same topic. At the moment, both publications are created and closed each timestep and the timestep is currently effectively limited to 5ms by the hardcoded NT limits. I realize that isn't ideal and intend to improve it, but I was expecting performance issues not crashes to be the result. Also, the topic is published as noncached, and for all pubs and subs I'm using PubSubOption.periodic(Double.MIN_VALUE), PubSubOption.keepDuplicates(true), PubSubOption.sendAll(true).

from allwpilib.

brettle avatar brettle commented on July 18, 2024

The crashes also seem to be associated with seeing the warning from this line:

WARN("client {} duplicate publish of pubuid {}", m_id, pubuid);

I'm assuming that is occurring because the unpublish is delayed getting to the server (due to the limited wire flush rate) until after a subsequent publish has been made. I can reproduce the warnings in a simple test case. Let me know if that is of interest.

from allwpilib.

PeterJohnson avatar PeterJohnson commented on July 18, 2024

All of that should be done FIFO, so the unpublish should be processed first, then the publish. A test case for that would be great.

from allwpilib.

brettle avatar brettle commented on July 18, 2024

Test case that produces warnings can be found here.

from allwpilib.

brettle avatar brettle commented on July 18, 2024

I have a theory about the cause of this bug and would like to know if I'm barking up the wrong tree.

In ClientImpl::HandleLocal():

if (Unpublish(msg->pubHandle, msg->topicHandle)) {
m_outgoing.SendMessage(msg->pubHandle, std::move(elem));

Unpublish() calls:

m_outgoing.EraseHandle(pubHandle);

which "erases" the handle from m_handleMap. Then when HandleLocal() calls m_outgoing.SendMessage() it tries to retrieve the handle info from the map here:

m_queues[m_handleMap[handle].queueIndex].Append(handle,
std::forward<T>(msg));

Since, the handle is no longer in the map, it adds it with default handle info, specifically a queueIndex of 0, which refers to the default 100ms queue. If the publisher was using that default period, SendMessage() will add the unpublish message to the correct queue and everything will be fine. But if the publisher is using a non-default period, SendMessage() will add the unpublish message to the 100ms queue even though the publish message and value messages may be in a different queue. This can result in duplicate publications if the pubid is reused by the client for a new publication and the associated publish message is sent before the unpublish message that is in the default queue. Make sense?

Creating a simple test case that is guaranteed to reproduce the behavior using the API doesn't seem to be feasible since it is dependent on the the exact timing of the client thread which is not exposed through the API.

from allwpilib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.