Giter Club home page Giter Club logo

Comments (16)

dougbinks avatar dougbinks commented on June 1, 2024 2

Excellent. I'm merging this into dev then main at the moment.

from enkits.

dougbinks avatar dougbinks commented on June 1, 2024 1

I have a small example close to what I was doing in the main app I can share if it would be useful (though I, unfortunately, couldn't reproduce the problem with this).

It might be useful to see this, even if it can't reproduce the problem.

One thing to check is that you haven't deadlocked. There are a number of ways this can occur, but the most common is a WaitForTask running a task which calls WaitForTask on the parent. This can be avoided by using task priorities (in both the tasks and also in the WaitForTask call to prevent running lower priority tasks) or dependencies (dependencies are much safer and also better performance in general). You should normally be able to see this in a debugger by checking the thread callstacks.

Using std::memory_order_seq_cst might have fixed this by changing the performance in such a way that the deadlock doesn't happen. However I'll also go over the m_WaitingForTaskCount memory order semantics and check whether acquire/release isn't sufficient here.

from enkits.

dougbinks avatar dougbinks commented on June 1, 2024 1

Thanks - it does seem from the above that there isn't a deadlock in your code.

from enkits.

dougbinks avatar dougbinks commented on June 1, 2024 1

Many thanks for running this test!

I am a little puzzled as memory_order_seq_cst is only sequentially consistent with other memory_order_seq_cst operations on other threads - yet in this case there no other such operations on other threads except the threadState variable, which is unused by the TaskCompletion thread. So in this case memory_order_seq_cst and memory_order_acquire should be the same (since the operation is a load).

So I think memory_order_seq_cst fixing the problem likely as a side effect of the code the compiler creates for this.

I've pushed one further change which adds changes the fetch_sub on m_RunningCount read-modify-write operations to use memory_order_acq_rel. This should ensure that when m_WaitingForTaskCount is read by the thread completing the task, m_RunningCount has been decremented and is visible in the WaitForTaskCompletion thread in the correct order.

from enkits.

dougbinks avatar dougbinks commented on June 1, 2024

Thanks for this issue report. I expect this is likely due to an ARM specific issue - getting memory order correct is harder on this platform.

The acquire / release semantics should be all we need here, but I will investigate.

I wouldn't change DISPATCH_TIME_FOREVER to DISPATCH_TIME_NOW as this won't wait on the semaphore, resulting in the idle CPU threads spinning which wastes power and could lower clock rates.

from enkits.

pr0g avatar pr0g commented on June 1, 2024

Hi @dougbinks!

No problem at all and thank you for the speedy reply! 😅 Much appreciated 🙂

I see interesting! Thanks for letting me know, I'd be interested to hear what you find.

Sure thing, that was just a dumb experiment to see what would happen, I have reverted this now 😝

I have a small example close to what I was doing in the main app I can share if it would be useful (though I, unfortunately, couldn't reproduce the problem with this).

Thanks again!

from enkits.

pr0g avatar pr0g commented on June 1, 2024

Sure thing, please see attached files below.

This is the example I was working on (it's doing some extra stuff you can probably ignore in the reduction step but it hopefully might be useful).

Just rename the main.cpp.txt file to main.cpp (GitHub doesn't like the .cpp extension) and to build and run you can do cmake -B build and then cmake --build build.

If you're using a single-config generator add -DCMAKE_BUILD_TYPE=Release etc... in the first step or if you're using a multi-config generator pass --config Release (or some other build type like RelWithDebInfo) to the second command.

Just for some added context this code is based on a filtering algorithm for points in an area. Basically I have a bunch of control points and only some of them are visible (some can be hidden) but I don't want to actually remove them. The ones I draw are the 'filtered' points. I have a mapping from control_points_to_filtered_points and filtered_points_to_control points. The algorithm I was speeding up was the intersection test (which in the snippet here is just control_point_index % 2 == 0). Locally I'd tried playing around adding different sleep amounts (e.g. std::this_thread::sleep_for(...); to simulate the time it really took.

CMakeLists.txt
main.cpp.txt

It's from this little application - https://pollardinefarm.co.uk/farming/a-wood-pasture-tool/

Thanks!

from enkits.

pr0g avatar pr0g commented on June 1, 2024

I was able to go back to the earlier commit and the latest version of enkiTS without my change and I still can get the repro to hit, the call stack looks like this:

image

image

from enkits.

pr0g avatar pr0g commented on June 1, 2024

The other worker threads look like this:

image

Let me know if there's any more info I could share that might be useful. Thanks!

from enkits.

dougbinks avatar dougbinks commented on June 1, 2024

I have created an alternative potential fix for the issue on branch dev_issue_91:

https://github.com/dougbinks/enkiTS/tree/dev_issue_91

Since TaskComplete is called after every task finishes I would prefer not to add a potentially more expensive atomic operation there. WaitForTaskCompletion however is only called when there is a thread with no work to run waiting on another task to complete.

So I've changed the read-modify-write operations to in WaitForTaskCompletion to use memory_order_acq_rel. I don't have an ARM MacOS system to test on though. If you were able to check if this solves the problem that would be very helpful.

from enkits.

pr0g avatar pr0g commented on June 1, 2024

Hi @dougbinks, thanks for taking a look at this.

I just did a quick test and my repro unfortunately still occurs 😖

from enkits.

pr0g avatar pr0g commented on June 1, 2024

See the attached video for what happens

enkits.mp4

from enkits.

pr0g avatar pr0g commented on June 1, 2024

Thanks for the detailed description! Super interesting, I'll aim to test this later this evening and will get back to you as soon as I can 👍

from enkits.

pr0g avatar pr0g commented on June 1, 2024

@dougbinks that worked! 😄 🥳 I can confirm the lock-up no longer happens with commit d1752d0.

Thank you very much for looking into this and finding a fix, much appreciated!

from enkits.

dougbinks avatar dougbinks commented on June 1, 2024

Closing as fixed with d1752d0

Please re-open if you encounter this issue again.

from enkits.

pr0g avatar pr0g commented on June 1, 2024

Closing as fixed with d1752d0

Please re-open if you encounter this issue again.

Awesome! 🥳

Absolutely 👍 Thank you!

from enkits.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.