ocaml-multicore / kcas Goto Github PK

View Code? Open in Web Editor NEW

100.0 8.0 10.0 1.49 MB

Software Transactional Memory for OCaml

Home Page: https://ocaml-multicore.github.io/kcas/

License: ISC License

OCaml 99.14% Makefile 0.03% Shell 0.65% Dockerfile 0.17%

lock-free ocaml software-transactional-memory

kcas's Introduction

📢 Note 📢

Multicore OCaml project has now been merged into OCaml 🎉. This repository is no longer developed or maintained. Please follow the updates at the OCaml Github repository.

Citation

If you are citing this work in an academic paper, please cite the ICFP 2020 paper "Retrofitting Parallelism onto OCaml": https://dl.acm.org/doi/10.1145/3408995

Branch `trunk`	Branch `4.14`	Branch `4.13`	Branch `4.12`

README

Overview

OCaml is a functional, statically-typed programming language from the ML family, offering a powerful module system extending that of Standard ML and a feature-rich, class-based object system.

OCaml comprises two compilers. One generates bytecode which is then interpreted by a C program. This compiler runs quickly, generates compact code with moderate memory requirements, and is portable to many 32 or 64 bit platforms. Performance of generated programs is quite good for a bytecoded implementation. This compiler can be used either as a standalone, batch-oriented compiler that produces standalone programs, or as an interactive REPL system.

The other compiler generates high-performance native code for a number of processors. Compilation takes longer and generates bigger code, but the generated programs deliver excellent performance, while retaining the moderate memory requirements of the bytecode compiler. The native-code compiler currently runs on the following platforms:

	Tier 1 (actively maintained)	Tier 2 (maintained when possible)
x86 64 bits	Linux, macOS, Windows, FreeBSD	NetBSD, OpenBSD
x86 32 bits	Linux, Windows	FreeBSD, NetBSD, OpenBSD
ARM 64 bits	Linux, macOS	FreeBSD
ARM 32 bits	Linux	FreeBSD, NetBSD, OpenBSD
Power 64 bits	Linux
Power 32 bits	Linux
RISC-V 64 bits	Linux
IBM Z (s390x)	Linux

Other operating systems for the processors above have not been tested, but the compiler may work under other operating systems with little work.

Copyright

All files marked "Copyright INRIA" in this distribution are Copyright © 1996-2021 Institut National de Recherche en Informatique et en Automatique (INRIA) and distributed under the conditions stated in file LICENSE.

Installation

See the file INSTALL.adoc for installation instructions on machines running Unix, Linux, macOS, WSL and Cygwin. For native Microsoft Windows, see README.win32.adoc.

Documentation

The OCaml manual is distributed in HTML, PDF, and Emacs Info files. It is available at

https://ocaml.org/releases/latest/manual.html

Availability

The complete OCaml distribution can be accessed at

https://ocaml.org/docs/install.html

Keeping in Touch with the Caml Community

There is an active and friendly discussion forum at

https://discuss.ocaml.org/

The OCaml mailing list is the longest-running forum for OCaml users. You can email it at

[email protected]

You can subscribe and access list archives via the Web interface at

https://sympa.inria.fr/sympa/subscribe/caml-list

An alternative archive of the mailing list is also available at

https://inbox.ocaml.org/

There also exist other mailing lists, chat channels, and various other forums around the internet for getting in touch with the OCaml and ML family language community. These can be accessed at

https://ocaml.org/community/

In particular, the IRC channel #ocaml on Libera has a long history and welcomes questions.

Bug Reports and User Feedback

Please report bugs using the issue tracker at https://github.com/ocaml/ocaml/issues

To be effective, bug reports should include a complete program (preferably small) that exhibits the unexpected behavior, and the configuration you are using (machine type, etc).

For information on contributing to OCaml, see HACKING.adoc and CONTRIBUTING.md.

Separately maintained components

Some libraries and tools which used to be part of the OCaml distribution are now maintained separately. Please use the issue trackers at their respective new homes:

The Graphics library (removed in OCaml 4.09)
The Num library (removed in OCaml 4.06)
The OCamlbuild tool (removed in OCaml 4.03)
The camlp4 tool (removed in OCaml 4.02)
The LablTk library (removed in OCaml 4.02)
The CamlDBM library (removed in OCaml 4.00)
The OCamlWinTop Windows toplevel (removed in OCaml 4.00)

kcas's People

Contributors

Stargazers

Watchers

Forkers

fondation451 sadiqj clayrat jmid shakthimaan tmcgilchrist davidalphafox nickbetteridge dashy-dolphin dangdennis

kcas's Issues

Extend the `Tx` mechanism to support non-busy wait or blocking

This issue is based on my comment on previous PR that introduced the Tx API. I will likely update this issue with some further notes.

The Tx mechanism is fundamentally limited as it does not support non-busy wait or blocking. Unfortunately, adding full support for blocking seems like it would be outside the scope of the underlying algorithms as it would add significant dependencies (ability to suspend fibers and/or domains) and/or significant overheads (having lists of waiters with each location, and accessing those lists during transactions).

However, there might be practical ways to extend the Tx API to allow it to support low overhead blocking transactions on top of the underlying transaction log mechanism.

To support blocking, one essentially needs a way to signal waiters. After mutating some locations the mutator signals waiters. For a scalable mechanism that signal needs to be selective and only wake up those waiters that are interested in the mutated locations.

To associate waiters with locations in a truly low-overhead fashion, one possibility would be to allow locations to be "tagged":

module Loc : sig
  type ('tag, 'a) tagged

  val make_tagged: 'tag -> 'a -> ('tag, 'a) tagged
  val get_tag : ('tag, 'a) tagged -> 'tag

  type 'a t = (unit, 'a) t
  
  (* ... *)

In a blocking transaction mechanism that 'tag could be a bag of the waiters of changes to the location.

Additionally, a scalable blocking mechanism also needs to be able to efficiently figure out which locations have been read and which have been written. A waiter needs to add itself to the read locations and a mutator needs to signal waiters of written locations.

module Tx : sig
  (* ... *)

  module Log : sig
    type t

    type 'r reducer = {
      one : 't 'a. ('t, 'a) Loc.tagged -> 'r;
      zero : 'r;
      plus : 'r -> 'r -> 'r;
    }

    val reduce : 'r reducer -> t -> 'r
    (** [reduce reducer] performs a fold over the transaction log. *)
  end

  exception Retry of unit t
  (** Exception raised by {!reset_and_retry}. *)

  val reset_and_retry : (Log.t -> unit t) -> 'a t
  (** [reset_and_retry on_read] returns a transaction that resets the current
      transaction such that it only reads from the accessed locations.  The
      [on_read] function is then called with the internal transaction log to
      construct a transaction that is then composed after the current
      transaction.  The composed transaction [tx] is then raised as a
      [Retry tx] exception. *)

  val written: (Log.t -> unit t) -> 'a t -> 'a t
  (** [written on_written tx] returns a transaction that executes as [tx] and
      then calls the given function with a view of the internal transaction log
      restricted to the written locations.  The returned transaction is then
      composed after the transaction [tx].

      The intended use case for [written] is to extend a transaction to signal
      waiters in a blocking transaction mechanism:

      {[
        let rec blocking_tx tx =
          let all_waiters = Loc.make [] in
          match
            tx
            |> written (fun log ->
                 (* remove all waiters of all written locations
                    and add them to the [all_waiters] list. *)
               )
            |> attempt
          with
          | result ->
            (* signal [all_waiters] *)
            result
          | exception Exit ->
            blocking_tx tx
          | exception Retry add_waiters_tx -> (
            match attempt add_waiters_tx with
            | () ->
              (* blocking wait *)
              blocking_tx tx
            | exception Exit ->
              (* Locations were already mutated before waiters could be added *)
              blocking_tx tx)
      ]} *)

The idea of (resetting and) extending transactions with the waiter operations is that this way the kcas mechanism itself checks whether the waiters should be added (as the read locations didn't change during the original transaction and the addition of waiters — if either fails then the transaction can be just retried without blocking) or signaled (as the mutations, including taking all the waiters, were completed successfully).

The above is only a preliminary idea. I have not yet fully implemented the above to verify it in practise.

Here is the blocking_tx example with proper highlighting:

        let rec blocking_tx tx =
          let all_waiters = Loc.make [] in
          match
            tx
            |> written (fun log ->
                 (* remove all waiters of all written locations
                    and add them to the [all_waiters] list. *)
               )
            |> attempt
          with
          | result ->
            (* signal [all_waiters] *)
            result
          | exception Exit ->
            blocking_tx tx
          | exception Retry add_waiters_tx -> (
            match attempt add_waiters_tx with
            | () ->
              (* blocking wait *)
              blocking_tx tx
            | exception Exit ->
              (* Locations were already mutated before waiters could be added *)
              blocking_tx tx)

Of course, a proper implementation would be a bit more complicated with things like backoff.

Consider timeouts

During a recent talk a question on the detection of problems such as starvation came up. One potential solution/helper for those could be just to add support for timeouts. It should already be possible to implement the equivalent of commit, which is basically a loop calling attempt, outside of the library and include a timeout mechanism, but perhaps it would make sense e.g. to add an optional argument to commit to specify a timeout. As the timeout would then likely only need to be examined after attempt failure, it could be (with a bit of care) implemented so that it adds almost no overhead (i.e. examine the optional timeout only after the first failure).

Check on CI that code examples in the generated docs compile and run correctly

PR #120 adds some code examples to the generated documentation, but it does not add a CI mechanism to ensure that the examples remain correct. We should add e.g. MDX based checking of the code snippets.

What is the purpose of the kcas_test package?

See question in ocaml/opam-repository#23634

Reconsider and document behavior of `Xt.post_commit`

Currently, if an action registered with Xt.post_commit raises an exception, then actions registered later will not be called. At the very least this and the order in which actions will be called should be documented, but it might make sense to reconsider the order in which the actions are called and whether it would make sense to call all of the actions even in case one or more actions raise exception.

provide an implementation of Stdlib.Semaphore.Counting and Stdlib.Semaphore.Binary

If you stick to the same interface, then people can easily switch between the two implementations.

https://v2.ocaml.org/api/Semaphore.Counting.html

https://v2.ocaml.org/api/Semaphore.Binary.html

You said you can be much more efficient, right?

[Question] Hashtable rehashing example

Thank for the excellently written README. I am trying to understand the material in it.

I have a question related to Hashtable with rehashing:

After switching to the rehashing state, all mutators will then cooperatively race to perform the rehash.

In what sense is the race cooperative? I agree that another mutator does not attempt to replace a (key, value) and thus cause a transaction failure for the mutator that is trying to do the more long running rehash but how you would then explain this as a cooperative race?

Also would this be an place where it might actually be a good idea to throw a Later exception if a rehash is pending? Why have multiple threads potentially all try to do a rehash? Why not let one thread do the rehash while other threads abort until the rehash is complete?

Switching to GKMZ algorithm?

It seems that the current implementation uses the Harris-Fraser RDCSS-based algorithm, however, there's a new Guerraoui-Kogan-Marathe-Zablotchi algorithm which claims to be more optimal by only requiring k+1 CASes instead of 3k+1.

First-impression notes on the API

I guess kcas is still experimental and we can refine the API. I took a quick reading of the API (module signature) and I think it could use some work.

Here are my (non-exhaustive) first-impression notes on the API:

Overall, on first impression, the API could perhaps use some reorganisation, either to subsections or to submodules, to more clearly separate operations on single refs, safe and unsafe operations on a single ref, and operations on multiple refs.
- For example, the spec type t is on line 5, then there are other specs, the first introduction form val mk_cas is on line 17. Then there are two unrelated specs, and the first elimination form val commit is on line 29. I would suggest grouping specs on a single abstraction together.
Why is there both Kcas.cas and Kcas.W1.cas?
- Answer: W1 could be dropped as it essentially duplicates Atomic.

The APIs of map and try_map could maybe use polymorphic variants:

val try_map : 'a ref -> ('a -> 'a option) -> [ `Failed | `Aborted | `Success of 'a ]
val map : 'a ref -> ('a -> 'a option) -> [ `Aborted | `Success of 'a ]

(This should not have major negative impact on performance.)

Backoff measured in milliseconds sounds excessive as modern CPUs can do incredible amount of work in a single millisecond.
I would drop the separate module type Backoff = sig ... end as one can say module type of Backoff.

New release to fix build of reagents?

Now that multicore is closer to becoming reality (ocaml/ocaml#10831) it'd be good to be able to easily install/use the rest of the libraries around it.
ocaml-multicore/reagents#25 says a new kcas is needed to build, and I confirm that is still the case with latest OCaml 4.12.0+domains+effects and the multicore opam repo.

opam remote -v
[NOTE] These are the repositories in use by the current switch. Use '--all' to see all configured repositories.

<><> Repository configuration for switch 4.12.0+domains+effects <><><><><><><><>
 1 multicore git+https://github.com/ocaml-multicore/multicore-opam.git
 2 default   https://opam.ocaml.org

By default you get a compile error like this (if you install the missing jbuilder dependency):

#=== ERROR while compiling reagents.0.3.0 =====================================#
# context     2.1.1 | linux/x86_64 | ocaml-variants.4.12.0+domains+effects | git+https://github.com/ocaml-multicore/multicore-opam.git
# path        /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/.opam-switch/build/reagents.0.3.0
# command     /var/tmp/edwin-caches/.opam/opam-init/hooks/sandbox.sh build jbuilder build -p reagents -j 8
# exit-code   1
# env-file    /var/tmp/edwin-caches/.opam/log/reagents-112796-f96fc6.env
# output-file /var/tmp/edwin-caches/.opam/log/reagents-112796-f96fc6.out
### output ###
# [...]
#       ocamlc reagents/.reagents.objs/reagents__Offer.{cmo,cmt} (exit 2)
# (cd _build/default && /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/bin/ocamlc.opt -w -40 -g -bin-annot -I reagents/.reagents.objs -I /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/lib/kcas -I /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/lib/lockfree -no-alias-deps -open Reagents__ -o reagents/.reagents.objs/reagents__Offer.cmo -c -impl reagents/offer.ml)
# File "reagents/offer.ml", line 42, characters 17-28:
# 42 |   let get_id r = Kcas.get_id r
#                       ^^^^^^^^^^^
# Error: Unbound value Kcas.get_id
#     ocamlopt reagents/.reagents.objs/reagents__Offer.{cmx,o} (exit 2)
# (cd _build/default && /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/bin/ocamlopt.opt -w -40 -g -I reagents/.reagents.objs -I /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/lib/kcas -I /var/tmp/edwin-caches/.opam/4.12.0+domains+effects/lib/lockfree -no-alias-deps -open Reagents__ -o reagents/.reagents.objs/reagents__Offer.cmx -c -impl reagents/offer.ml)
# File "reagents/offer.ml", line 42, characters 17-28:
# 42 |   let get_id r = Kcas.get_id r
#                       ^^^^^^^^^^^
# Error: Unbound value Kcas.get_id

The workaround mentioned in the above issue still works:

opam pin add kcas --dev-repo

Could you make a new release to allow these to be able to installed/built together?

Port Backoff module to use `Domain.Sync.cpu_relax`

lockfree library already does something similar: https://github.com/ocaml-multicore/lockfree/blob/6846bf4b16bcc61c60ecad43d2de3941ca56b1bf/src/backoff.ml

ocaml-multicore/ocaml-multicore#561 removes Domain.Sync.{critical_section, wait_for}, so this needs to be updated.

Suggestions for first-time contributors to k-CAS

The kcas library currently lacks benchmarks, tests, and cool examples and those can also serve as a good way to understand k-CAS itself. The kcas_data library provides a few data structures, but there is plenty of room for more.

Here are some potential ideas:

Using k-CAS, create examples of solutions to well known concurrency problems such as
- The dining philosophers problem — an example already exists here
Implement examples and benchmarks of lock-free data structures:
- Stacks — simple list based stacks already exists (e.g. Stack); what about something using an array?
- Queues — a few queue implementations already exist (e.g. Queue), but there are many ways to implement queues
- Deques
- Linked lists — a doubly linked list example already exists (e.g. Dllist); what about singly linked lists?
- Skip lists
- Binary search trees — there are many different ways to implement binary search trees
- Maps and Sets — possibly using some binary search tree or skip list
- Hash tables — a couple of hash table examples already exists (e.g. Hashtbl), but there is definitely room for more
- Bags
- Priority queues — there is an example of a leftist heap, but there are many other approaches to priority queues
Note that with k-CAS it is possible to straighforwardly translate an imperative data-structure to a lock-free data-structure by using transactions.
Use k-CAS to implement composable synchronization primitives (mutexes, condition variables, semaphores, barriers, ...) with the idea being that one can commit a transaction that e.g. simultaenously acquires multiple mutexes
Use k-CAS e.g. with domainslib to parallelize algorithms that e.g. manipulate non-trivial data structures and are difficult to parallelize otherwise.
Device tests / benchmarks / examples that perform particularly poorly, e.g.
- example that performs poorly with the default commit ~mode:Mode.obstruction_free and significantly better with commit ~mode:Mode.lock_free, or
- example that suffers from starvation.

Consider supporting `or_else` and proper nested transactions

The current transactional interface does not provide an or_else operation to modularly choose between alternative blocking transactions. To support such an operation requires being able to roll back changes made by a transaction involved in such a choice. It should be possible to implement such an operation on the transaction log. The challenge is to implement it as efficiently as possible.

Library causes linking errors when used in another project

When the library is used in a project, the following linking error occurs:

Error: No implementations provided for the following modules:
         Unix referenced from .opam/4.06.1+multicore/lib/kcas/kcas.cmxa(Kcas_backoff)
make: *** [bench] Error 1

This is due to Unix not being listed as a dependency in the kcas package metadata, so unix.cmxa is not linked before kcas.cmxa, despite it being required.

Consider interaction of `Xt.post_commit` and `Xt.rollback`

Post commit actions registered with Xt.post_commit aren't currently scoped by Xt.snapshot and Xt.rollback. What should actually be done with Xt.post_commit with respect to rollbacks requires some careful thinking.

[Suggestion] Add some warnings related ref, mutable, IO to README

Add some warnings on avoiding impure code, mutation operations within transactions to the README.md.

Here is some excerpted text written by @polytypic already that is a good start:

(From https://discuss.ocaml.org/t/ann-kcas-and-kcas-data-0-3-0-software-transactional-memory/12085/16)

Yes, when using kcas transactions, one should be aware of the fact that the transaction function may be called many times and transactional shared memory location accesses within the transaction may choose to raise exceptions to force the transaction to be retried later.

To be safe, kcas transactions should typically be pure (no use of refs or mutable, no IO, …).
...

Consider automatic transaction validation

Currently when running a transaction function the transaction log is not validated until the transaction function returns. A long running transaction function may potentially waste a lot of resources (CPU time, allocate memory causing GC cycles) before it returns and is then rejected during validation. It would likely make sense to add a simple automatical validation scheme. For example, the transaction log could keep a count of the number of accesses to the transaction log. After some number of accesses the whole log would be validated. The next validation would be performed after double the number of transaction log accesses. And so on.

Retarget the documentation

kcas aims to be usable both

for experts implementing correct and performant lock-free data structures, and
for everyone gluing together programs using such data structures.

The second bullet is really the reason why a composable approach makes sense as it allows people to reuse carefully designed and implemented data structures and other kinds of communication and synchronization mechanisms to solve their ad hoc concurrent programming problems. The current documentation, however, mostly reflects the first bullet. While it is important that one can implement efficient queues and hash tables using kcas, it is also very unlikely to help most people see how they could use kcas to solve higher-level concurrent programming problems, such as those described in the two messages below:

So, write new documentation to primarily address the second bullet and relegate the discussion of the advanced data structure implementations techniques to secondary status.