
loony's Introduction

"Don't let me block you" - Araq

"We didn't have a good story about migrating continuations between threads." - Disruptek

Have you ever asked yourself what a lock-free MPMC queue would look like in Nim?

What about a lock-free MPMC queue designed on an algorithm built for speed and memory safety?

What about that algorithm implemented by some loonatics?

Enter Loony

"C'mon man... 24,000 threads and 500,000,000 continuations... which are written in "normal" nim." - Disruptek

"OK, time to get my monies worth from all my cores" - saem

"My eyes are bleeding" - cabboose

About

A massive thank you to the author Oliver Giersch, who proposed this algorithm and was kind enough to look over and review our implementation! We wish nothing but the best for the soon-to-be Dr. Giersch.

Loony is a 100% Nim-lang implementation of the algorithm depicted by Giersch & Nolte in "Fast and Portable Concurrent FIFO Queues With Deterministic Memory Reclamation".

The algorithm was chosen to advance the concurrency story of CPS, for which Loony was purpose-built.

After the algorithm was adapted to Nim CPS, disruptek adapted the queue to carry any ref object and was instrumental in ironing out bugs and improving Loony's performance.

What can it do

The following is possible, but only by increasing the alignment of our 'node' pointers to 16, which will invariably affect performance:

  • Lock-free consumption by up to 32,255 threads
  • Lock-free production by up to 64,610 threads

You can use -d:loonyNodeAlignment=16 (or whatever alignment you wish) to adjust the contention capacity.

With the default 11-bit-aligned implementation we have:

  • Lock-free consumption up to 512 threads
  • Lock-free production up to 1,025 threads
  • Memory-leak free under ARC
  • Can pass ANY ref object between threads; however:
    • Is perfectly designed for passing Continuations between threads
  • 0 memory copies

Issues

Loony queue only works on ARC.

ORC is not supported (See Issue #4)

Spawn (Nim's threadpool module) is not supported. It won't ever be; don't use spawn unless you are a masochist in general.

Installation

Install with nimble install loony (CPS is a dependency for the tests only), or build directly from the source.

How to use

Simple.

First, ensure you compile with arc and threads (--gc:arc --threads:on)

Then:

import pkg/loony

type TestRef = ref object

let loonyQueue = newLoonyQueue[TestRef]()
# Loony takes reference pointers and sends them safely!

var aro = new TestRef

loonyQueue.push aro
# Enqueue objects onto the queue
# unsafePush is available, see Memory Safety & Cache Coherence below!

var el = loonyQueue.pop
# Dequeue objects from the queue
# unsafePop is available, see Memory Safety & Cache Coherence below!

There is now a preliminary sublibrary which provides state managers called wards. These can be imported with import loony/ward. The documentation is found below; tests and further documentation will follow when time allows.

Documentation

The full API documentation is kept up-to-date on GitHub.

The API documentation for the Ward submodule is found here.

Memory Safety & Cache Coherence

Loony's standard push and pop operations ensure cache coherence by using primitives such as atomic_thread_fence, which guarantee that a CPU's store buffer is committed on the push operation and read on the pop operation; this is a higher-cost primitive. You can use unsafePush and unsafePop to manipulate a LoonyQueue without regard for cache coherence, for ultimate performance.
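The trade-off can be seen side by side. This sketch assumes only the API shown in this README (newLoonyQueue, push/pop, unsafePush/unsafePop) and is illustrative rather than canonical:

```nim
import pkg/loony

type Job = ref object
  id: int

let q = newLoonyQueue[Job]()

# Fenced operations: the safe default when producers and consumers
# live on different threads.
q.push Job(id: 1)
let safe = q.pop          # pop returns nil when the queue is empty
assert safe.id == 1

# Unfenced variants skip the fences for raw speed; only reach for
# these when you understand the coherence trade-off.
q.unsafePush Job(id: 2)
let fast = q.unsafePop
assert fast.id == 2
```

On a single thread the fenced and unfenced variants behave the same; the difference only shows up under cross-thread contention.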

Debugging

Pass -d:loonyDebug at compilation or in a config nimscript to use debug procedures and templates.

Warning ⚠️ this switch causes node allocations and deallocations to write to an atomic counter, which does marginally affect performance!

echoDebugNodeCounter() # Echoes the current number of nodes in the queue

debugNodeCounter:
  # Put things onto the queue and off the queue here! At the
  # end of the template, if the number of nodes you started
  # with does not equal the number of nodes you finished
  # the block with, it will print information that is useful
  # to finding the source of the leak! This template will not
  # do anything if loonyDebug is off.
  discard

Compiler Flags

We recommend against changing these values unless you know what you are doing. The suggested maximum alignment is 16, which achieves drastically higher contention capacities. Compilation will fail if your alignment does not fit the slot-count index.

  • -d:loonyNodeAlignment=11 - Adjust node alignment to increase/decrease contention capacity
  • -d:loonySlotCount=1024 - Adjust the number of slots in each node
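If you prefer keeping flags out of the command line, the same switches can live in a NimScript config. This is a sketch; the values shown are just the documented defaults:

```nim
# config.nims -- sketch; adjust values to taste
switch("gc", "arc")        # Loony requires ARC
switch("threads", "on")
switch("define", "loonyNodeAlignment=11")
switch("define", "loonySlotCount=1024")
```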

What are Continuations?

If you've somehow missed the next big thing for Nim, see CPS.

loony's People

Contributors

disruptek, shayanhabibi


loony's Issues

introduce queue state?

I'd like to have some controls on the queue:

  • I want to be able to pause the queue for pop clients (so they receive nil).
  • I want to have a way to clear the entire queue, discarding the contents.
  • I want a backchannel for pop that works while the queue is paused so that I can, for example, tear down items in the queue with extra logic.

Now, maybe the right way to implement this is with a queue manager that sits outside the queue itself, but in any event, these are pretty minor features that we should be able to support for programmer comfort.
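Pending a built-in mechanism, the pause and clear behaviours can be approximated by a manager that sits outside the queue, as suggested above. Everything below (PausableQueue, pausedPop, drain) is hypothetical and not part of loony's API:

```nim
import std/atomics
import pkg/loony

type Task = ref object
  name: string

# Hypothetical external manager wrapping a LoonyQueue.
type PausableQueue = object
  q: LoonyQueue[Task]
  paused: Atomic[bool]

proc pausedPop(p: var PausableQueue): Task =
  # While paused, pop clients receive nil, as proposed above.
  if p.paused.load: nil
  else: p.q.pop

proc drain(p: var PausableQueue) =
  # Clear the queue by popping until empty, discarding the contents.
  while not p.q.pop.isNil:
    discard
```

A backchannel pop that bypasses the pause flag would just call p.q.pop directly.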

[MacOS/Darwin] Emulating Futex behaviour

Issue

Currently I use the Linux futex and Windows WaitOnAddress to manage backpressure when using Wards and Loony as a threadpool queue.

Mac/Darwin do not have primitives or APIs that are similar.

Solutions

At the moment, I can only think of using conditions and locks.
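As a sketch of that direction (not loony code), futex-style wait/wake can be emulated with std/locks; the names here are illustrative:

```nim
import std/locks

var
  lock: Lock
  cond: Cond
  value: int

initLock lock
initCond cond

proc waitWhile(expected: int) =
  ## Block while `value == expected`, like FUTEX_WAIT.
  acquire lock
  while value == expected:   # re-check: condvars can wake spuriously
    wait(cond, lock)
  release lock

proc wake(newValue: int) =
  ## Publish a new value and wake all waiters, like FUTEX_WAKE.
  acquire lock
  value = newValue
  broadcast cond
  release lock
```

The inner while loop is what makes this a faithful emulation: a futex wait re-checks the value on wakeup, and so must the condition-variable version.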

SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Using slightly modified example code with more consumer threads, my system throws a SIGSEGV error.

import std/[locks, os, atomics]
import loony

type
  Message = ref object
    value: string

let fifo = newLoonyQueue[Message]()
var terminate: Atomic[bool]


proc producer() {.thread.} =
  var n: int = 0
  for i in 1..1000000:
    n.inc()
    let msg = Message(value: "Message " & $i)
    if n > 10000:
      echo "Producing ", repr(msg)
      n = 0
    fifo.push msg
    #sleep(0)
  terminate.store(true)

proc consumer() {.thread.} =
  var n: int = 0
  while true:
    let item = fifo.pop
    if not item.isNil:
      n.inc()
      if n > 10000:
        echo "Consumed: ", repr(item)
        n = 0
    else:
      if terminate.load:
        break
    #sleep(0)

# Create worker threads
var producerThread: Thread[void]
var consumerThreads: array[512, Thread[void]]

const THREADS: int = 14

# Start worker threads
createThread(producerThread, producer)
for t in 0 ..< THREADS:
  echo "create thread ", t
  consumerThreads[t].createThread(consumer)

joinThread(producerThread)

for t in 0 ..< THREADS:
  joinThread(consumerThreads[t])

[ORC] Crash when GC attempts to collect element

Orc crashes when a Continuation (or any ref object) passed through the queue reaches the end of its life span and the final =destroy is run.

This can be demonstrated by incrementing the ref count of an object by 1 before the end of its lifespan and comparing the behaviour before and after.

ARC does not share this issue; the final =destroy can be called without issue and the memory is collected appropriately.

problem with strings, something I should know about?

Hi, I'm quite new to Nim and I work on it occasionally during weekends.

I'm testing loony for a simple double thread producer/consumer paradigm. Here's the code:

import std/[locks, os]
import loony

type
  Message = object
    value: string

let fifo:LoonyQueue[Message] = newLoonyQueue[Message]()
var L: Lock
var producing = true

proc producer() {.thread.} =
  for i in 1..10:
    let Message = Message(value: "Message " & $i)
    acquire(L)
    fifo.push Message
    release(L)
    echo "Produced: ", Message
    sleep(100)

proc consumer() {.thread.} =
  while producing:
    acquire(L)
    let item = fifo.pop
    if item.value != "":
      echo "Consumed: ", item
    release(L)
    sleep(10)

# Create worker threads
var producerThread, consumerThread: Thread[void]

initLock(L)

# Start worker threads
createThread(producerThread, producer)
createThread(consumerThread, consumer)

joinThread(producerThread)
producing = false
joinThread(consumerThread)

deinitLock(L)

It's a simple, documentation-like piece of software, but when I run it I get this output:

Produced: (value: "Message 1")
Produced: (value: "Message 2")
Produced: (value: "Message 3")
Consumed: (value: "��\x0Ew\x14�\x03@")
Produced: (value: "Message 4")
Produced: (value: "Message 5")
Consumed: (value: "��\x0Ew\x14�\x03@")
Produced: (value: "Message 6")
Produced: (value: "Message 7")
Consumed: (value: "��\x0Ew\x14�\x03@")
Produced: (value: "Message 8")
Produced: (value: "Message 9")
Consumed: (value: "��\x0Ew\x14�\x03@")
Produced: (value: "Message 10")
Consumed: (value: "��\x0Ew\x14�\x03@")

I would like to ask about the weird bytes I get from the popped Messages' values: is there something I don't know about strings that explains this?
I also tested the same code using a LoonyQueue[string] directly, and in that case the code didn't compile at all.

Using MacOS Catalina 10.15.7, nim 1.6.12, latest loony
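For what it's worth, the README above notes that Loony is designed to pass ref objects between threads, while the report declares Message as a plain object. A plausible fix (a sketch, untested against this exact report) is to make Message a ref object:

```nim
import pkg/loony

type
  Message = ref object   # ref object, not a plain object
    value: string

let fifo = newLoonyQueue[Message]()
fifo.push Message(value: "Message 1")
let item = fifo.pop
assert item.value == "Message 1"
```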

Example code errors

Hi, I tried to use the example code from the readme (https://github.com/nim-works/loony#how-to-use) but I got some errors

Hint: used config file '/home/---/.choosenim/toolchains/nim-#devel/config/nim.cfg' [Conf]
Hint: used config file '/home/---/.choosenim/toolchains/nim-#devel/config/config.nims' [Conf]
................................................................
/home/---/Desktop/Nim/test.nim(10, 11) template/generic instantiation of `push` from here
/home/---/.nimble/pkgs/loony-0.1.1/loony.nim(177, 27) template/generic instantiation of `fetchIncTail` from here
/home/---/.nimble/pkgs/loony-0.1.1/loony.nim(71, 26) Error: attempting to call undeclared routine: 'fetchAdd'

This happens on both 0.1.0 and 0.1.1.

SIGSEGV on q.isEmpty

Hi,

I see SIGSEGV on calls to queue.isEmpty. The queue is initialized using newLoonyQueue and has not received any items yet. I tried both if and while — it makes no difference. I tried std/atomics and threading/atomics. It feels a bit like a timing/init problem, though I never managed to make it work reliably. It goes like this: the whole app runs through and finishes; then on the next run it breaks early; on the following start it might run and finish, or break in between.
I use the main thread to fill the queue and one separate thread to read and process the items from the queue. A queue item contains a seq[int], an int, and a string, so nothing special here.

I'm on Nim 1.6.6 / macOS.

This is my thread proc:

proc warden[K, V]( map: Map[K, V] ) {.thread, nimcall.} =

  echo "maintanance:: started"

  while true:

    while map.workQ.isEmpty:
      sleep 500
      continue

    var elem = map.workQ.pop
    #if elem.isNil: break
    echo &"ward :: ", elem.task, " :: ", elem.parent, " - ",elem.slot

    if elem.slot == -1:
      echo "\nmaintanance:: shuts down..\n"
      return

Traceback:

nim r -d:b32 --gc:arc --threads:on -d:useMalloc -d:debug -d:nimOldCaseObjects Mapi.nim

/Users/asc/sw/nim/ctrie/champ/Mapi.nim(30) warden
/Users/asc/.nimble/pkgs/loony-0.1.12/loony.nim(68) isEmpty
/Users/asc/.choosenim/toolchains/nim-1.6.6/lib/pure/concurrency/atomics.nim(336) load
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: '/Users/asc/.cache/nim/Mapi_d/Mapi_558E24AF8FBA3F576C55E835091AA4F6A61AFFB3 '

[NIM] Thread completion triggers thread sanitizer warnings

The thread sanitizer determines that the deallocShared call performed by a thread on completion is a data race.

At the moment the only solution I can come up with involves changing, or providing a modified copy of, the std lib threads module to correct this behaviour. The safest methodology is having the pointer that deallocShared acts on be atomically stored and loaded before destruction? I haven't looked too deeply into it yet.

Related to #9 and #13

[CHORE] Refactor alias types

Refactor all these alias types like TagPtr to use a distinct type with converters, to avoid boilerplate spaghetti, as per disruptek.

thread sanitizer does not agree with loony

ico@workdoos:/tmp/loony(main)$ nim r  --passC:-fsanitize=thread --passL:-fsanitize=thread --threads:on tests/test.nim 
[...]
🟢 creation and initialization of the queue
FATAL: ThreadSanitizer CHECK failed: ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:223 "((IsLoadOrder(mo))) != (0)" (0x0, 0x0)
    #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/tsan/tsan_rtl_report.cpp:47 (libtsan.so.0+0x972d2)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78 (libtsan.so.0+0xb513a)
    #2 AtomicLoad<long long unsigned int> ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:223 (libtsan.so.0+0x762e4)
    #3 __tsan_atomic64_load ../../../../src/libsanitizer/tsan/tsan_interface_atomic.cpp:539 (libtsan.so.0+0x762e4)
    #4 load_OOZloonyZnode_116 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x6baa1)
    #5 advTail_test_334 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x72289)
    #6 push_test_306 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x731f4)
    #7 enqueue_test_303 <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x73378)
    #8 NimMainModule <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x819d5)
    #9 NimMainInner <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x797f3)
    #10 NimMain <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x79849)
    #11 main <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x798b9)
    #12 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x26d09)
    #13 _start <null> (test_EF21B045952947A3759AD00DE01B49BBBC3590C4+0x94d9)

Error: execution of an external program failed: '/home/ico/.cache/nim/test_d/test_EF21B045952947A3759AD00DE01B49BBBC3590C4 '

Debugging helpers

Using some kind of flag like -d:loonyDebug, I want to be able to count and relay the number of enqueues and dequeues performed on a queue; allocations and deallocations of nodes; index tracking/alerts when reaching overflow; etc. I don't mind if it affects performance.

Introduce balls

Make tests with balls.

  • Test single thread low/high volume push and pop ensuring exact retrieval and reallocation of nodes
  • Test threaded (2) independent high volume producer and consumer
  • Test threaded (2) mixed producer and consumer
  • Create suite testing incremental numbers of threads with mixed producer and consumer qualities

[LOONY] Memory Leak 0.1.3

As with all great things, somewhere along the way we screwed up. The node memory-reclamation algorithm is no longer correct after some changes made during refactoring.

Currently the leak has been reduced by roughly 90% (previous test node leaks of about 29 are now down to 2 ± 2). I'm on the last stretch of correcting the remainder.

You can see #15 for progress.
