API reference · (The API was redesigned in version 0.2.0. See API reference for version 0.1.8.)
kcas provides an implementation of atomic lock-free multi-word compare-and-swap (MCAS), which is a powerful tool for designing concurrent algorithms.
Features and properties:
-
Efficient: In the common uncontended case only k + 1 single-word CASes are required per k-CAS.
-
Lock-free: The underlying algorithm guarantees that at least one domain will be able to make progress.
-
Disjoint-access parallel: Unrelated operations progress independently, without interference, even if they occur at the same time.
-
Read-only compares: The algorithm supports obstruction-free read-only compare (CMP) operations that can be performed on overlapping locations in parallel without interference.
-
Composable: Independently developed transactions can be composed with ease.
kcas is published on opam and is distributed under the ISC license.
To use the library
# #require "kcas"
# open Kcas
one first creates shared memory locations:
# let a = Loc.make 0
and b = Loc.make 0
and x = Loc.make 0
val a : int Loc.t = <abstr>
val b : int Loc.t = <abstr>
val x : int Loc.t = <abstr>
One can then manipulate the locations individually:
# Loc.set a 6
- : unit = ()
# Loc.get a
- : int = 6
Attempt primitive operations over multiple locations:
# Op.atomically [
Op.make_cas a 6 10;
Op.make_cas b 0 52
]
- : bool = true
Perform transactions over them:
# Tx.(
commit (
let* a = get a
and* b = get b in
set x (b - a)
)
)
- : unit = ()
And get the answer:
# Loc.get x
- : int = 42
The API of kcas is divided into submodules. The main modules are
-
Loc
, providing an abstraction of shared memory locations, -
Op
, providing an interface for primitive operations over multiple shared memory locations, and -
Tx
, providing composable transactions over shared memory locations.
The following sections discuss each of the above in turn.
The Loc
module is essentially compatible with the Stdlib
Atomic
module, except that a number of
functions take an optional
backoff
as an argument.
In other words, an application that uses
Atomic
, but then needs to perform
atomic operations over multiple atomic locations, could theoretically just
rebind module Atomic = Loc
and then use the
Op
and/or
Tx
APIs
to perform operations over multiple locations. This should not be done
just-in-case, however, as, even though kcas is efficient, it does naturally
have higher overhead than the Stdlib
Atomic
.
The Op
module is probably most suitable when using kcas as a means to design and
implement new lock-free algorithms.
To program with primitive operations one simply makes a list of CAS operations
using
make_cas
and then attempts them using
atomically
.
Typically that needs to be done inside a loop of some kind as such an attempt
can naturally fail.
Let's first
make
two locations representing stacks:
# let stack_a = Loc.make [19]
and stack_b = Loc.make [76]
val stack_a : int list Loc.t = <abstr>
val stack_b : int list Loc.t = <abstr>
Here is a function that can atomically move an element from given source
stack
to the given target
stack:
# let rec move ?(backoff = Backoff.default)
source
target =
match Loc.get source with
| [] -> raise Exit
| (elem::rest) as old_source ->
let old_target = Loc.get target in
let ops = [
Op.make_cas source old_source rest;
Op.make_cas target old_target (elem::old_target)
] in
if not (Op.atomically ops) then
let backoff = Backoff.once backoff in
move ~backoff source target
val move : ?backoff:Backoff.t -> 'a list Loc.t -> 'a list Loc.t -> unit =
<fun>
Note that we also used the
Backoff
module provided by kcas above.
Now we can simply call move
:
# move stack_a stack_b
- : unit = ()
# Loc.get stack_a
- : int list = []
# Loc.get stack_b
- : int list = [19; 76]
As one can see, the API provided by
Op
is
quite low-level and is not intended for application level programming.
The Tx
module provides a higher-level API that is intended to be suitable for both
designing and implementing new lock-free algorithms and as an application level
programming interface for compositional use of such algorithms.
As our first example of using transactions, let's implement a lock-free stack. A stack can be just a shared memory location that holds a list of elements:
# type 'a stack = 'a list Loc.t
type 'a stack = 'a list Loc.t
To create a stack we just
make
a new location with an empty list:
# let stack () : _ stack = Loc.make []
val stack : unit -> 'a stack = <fun>
To push an element to a stack we
modify
the stack to cons the element onto the list:
# let push stack element =
Tx.modify stack @@ List.cons element
val push : 'a list Loc.t -> 'a -> unit Tx.t = <fun>
Popping an element from a stack is a little more complicated as we need to
handle the case of an empty stack. Let's go with a basic approach where we first
get
the content of the stack,
set
it if necessary, and
return
an optional element.
# let try_pop stack = Tx.(
let* content = get stack in
match content with
| [] -> return None
| element :: rest ->
let+ () = set stack rest in
Some element
)
val try_pop : 'a list Loc.t -> 'a option Tx.t = <fun>
Above we used the
let*
and
let+
binding operators to sequence
primitive transactions. We could also implement try_pop
more concisely using
the infix operators
>>=
and
>>.
:
# let try_pop stack = Tx.(
get stack >>= function
| [] -> return None
| element :: rest ->
set stack rest >>.
Some element
)
val try_pop : 'a list Loc.t -> 'a option Tx.t = <fun>
With a couple of useful list manipulation helper functions
# let hd_opt = function
| [] -> None
| element :: _ -> Some element
val hd_opt : 'a list -> 'a option = <fun>
# let tl_safe = function
| [] -> []
| _ :: rest -> rest
val tl_safe : 'a list -> 'a list = <fun>
an even more concise implementation is possible using
update_as
:
# let try_pop stack = Tx.update_as hd_opt stack tl_safe
val try_pop : 'a list Loc.t -> 'a option Tx.t = <fun>
Above,
update_as
is used as a shorthand to both compute the result and the new value for the
stack contents.
If the stack already contained an empty list, []
, all of the above variations
of try_pop
generate a read-only CMP operation in the
obstruction_free
mode. This means that multiple domains may run try_pop
on an empty stack in
parallel without interference. The variation using
update_as
also makes only a single access to the underlying transaction log and is likely
to be the fastest variation.
So, to use a stack, we first need to create it and then we may
commit
transactions to push
and try_pop
elements:
# let a_stack : int stack = stack ()
val a_stack : int stack = <abstr>
# Tx.commit @@ push a_stack 101
- : unit = ()
# Tx.commit @@ try_pop a_stack
- : int option = Some 101
# Tx.commit @@ try_pop a_stack
- : int option = None
As an astute reader you may wonder why push
and try_pop
return transactions
that we then need to separately
commit
to. We'll get to that soon!
Let's then implement a lock-free queue. To keep things simple we just use the traditional two-stack queue data structure:
# type 'a queue = {
front: 'a list Loc.t;
back: 'a list Loc.t
}
type 'a queue = { front : 'a list Loc.t; back : 'a list Loc.t; }
To create a queue we
make
the two locations:
# let queue () = {
front = Loc.make [];
back = Loc.make []
}
val queue : unit -> 'a queue = <fun>
To enqueue we just
modify
the back of the queue and cons
the element to the list:
# let enqueue queue element =
Tx.modify queue.back @@ List.cons element
val enqueue : 'a queue -> 'a -> unit Tx.t = <fun>
Dequeue is again more complicated. First we examine the front of the queue. If
there is an element, we update the front and return the element. If the front is
empty, we examine the back of the queue in rev
erse. If there is an element we
clear the back, move the rest of the elements to the front, and return the
element. Otherwise we return None
as the queue was empty.
# let try_dequeue queue = Tx.(
update queue.front tl_safe >>= function
| element :: _ -> return (Some element)
| [] ->
exchange_as List.rev queue.back [] >>= function
| [] -> return None
| element :: rest ->
set queue.front rest >>.
Some element
)
val try_dequeue : 'a queue -> 'a option Tx.t = <fun>
Above,
update
and
exchange_as
are used as convenient shorthands and to reduce the number of accesses to the
transaction log. If both the front and back locations already contained an empty
list, []
, the above generates read-only CMP operations in the
obstruction_free
mode allowing multiple domains to run try_dequeue
on an empty queue in
parallel without interference. Additionally, if the back contained only one
element, no write to the front is generated.
Question: When does a transaction generate a read-only compare against a particular location?
First of all, the transaction must be attempted in the
obstruction_free
mode, which is the default mode thatcommit
uses initially.Additionally, there must be no operation in the transaction that sets a new value to the location.
If an operation sets a location to a new value, the full original state of the location is forgotten, and the transaction will then later attempt a compare-and-set operation against that location even if a later operation inside the transaction sets the location to its original value.
The intention behind this approach is to strike a balance between adding overhead and also supporting convenient read-only updates.
So, to use a queue, we first need to create it and then we may
commit
transactions to enqueue
and try_dequeue
elements:
# let a_queue : int queue = queue ()
val a_queue : int queue = {front = <abstr>; back = <abstr>}
# Tx.commit @@ enqueue a_queue 76
- : unit = ()
# Tx.commit @@ try_dequeue a_queue
- : int option = Some 76
# Tx.commit @@ try_dequeue a_queue
- : int option = None
Beware: Using two stacks for a queue is easy to implement and performs well in many cases. Unfortunately it has one major weakness. The problem is that it may take a relatively long time to reverse the back of a queue. This can cause starvation as producers may then be able to always complete their transactions before consumers and the back of the queue might grow without bound.
The main benefit of the
Tx
API
over the
Op
API
is that transactions are composable. In fact, we already used
let*
to compose primitive transactions when implementing transactional stacks and
queues. Composition is not limited to primitive transactions.
For example, one can push multiple elements to a transactional stack atomically:
# Tx.(
commit (
push a_stack 3 >>
push a_stack 1 >>
push a_stack 4
)
)
- : unit = ()
Or transfer elements between different transactional data structures:
# Tx.(
commit (
try_pop a_stack >>= function
| Some element ->
enqueue a_queue element
| None ->
return ()
)
)
- : unit = ()
The ability to compose transactions allows algorithms and data-structures to be used for a wider variety of purposes.
The transaction mechanism provided by kcas is quite intentionally designed
to be very simple and efficient. This also means that it cannot provide certain
features, because adding such features would either add significant dependencies
or overheads to the otherwise simple and efficient implementation. In
particular, the transactions provided by kcas do not directly provide
blocking or the ability to wait for changes to shared memory locations before
retrying a transaction. The way
commit
works is that it simply retries the transaction in case it failed. To avoid
contention, a
backoff
mechanism is used, but otherwise
commit
will essentially perform a
busy-wait, which should usually be
avoided.
This project uses ocamlformat (for OCaml) and prettier (for Markdown).
- Update CHANGES.md.
- Run
dune-release tag VERSION
to create a tag for the newVERSION
. - Run
dune-release
to publish the newVERSION
. - Run
./update-gh-pages-for-tag VERSION
to update the online documentation.