jepsen.tarantool's Introduction

Tarantool Jepsen Test

This is a test suite for Tarantool, written using the Jepsen distributed systems testing library. It provides a number of workloads, which use Elle and Knossos to find transactional anomalies up to strict serializability.

We include a wide variety of faults, including network partitions, process crashes, pauses, clock skew, and membership changes.

How to use

Prerequisites

You'll need a Jepsen cluster running Ubuntu, which you can either build yourself or run in AWS via Cloudformation.

The control node needs:

  • A JVM with version 1.8 or higher.
  • JNA, so the JVM can talk to your SSH.
  • (optional) Gnuplot, which helps Jepsen render performance plots.
  • (optional) Graphviz, which helps Jepsen render transactional anomalies.

On Ubuntu, you can install these dependencies with:

sudo apt install -y openjdk-8-jdk graphviz gnuplot

Jepsen automatically installs dependencies (e.g. git, build tools, various support libraries) as well as Tarantool itself on all DB nodes participating in the test.

Usage

The tests are distributed as a JAR file that runs on the JVM. Every release publishes an archive containing the JAR file, a shell script that runs it, CHANGELOG.md, and README.md. Before you start, download the archive for the latest release and unpack it.

To see all options and their default values, try

./run-jepsen test --help

To run the register test against Tarantool 2.8 ten times, with each run limited to 600 seconds, try:

./run-jepsen test --username root --nodes-file nodes --workload register \
                  --version 2.8 --time-limit 600 --test-count 10

To run the set test with the vinyl engine against Tarantool built from the source code in the master branch, for 100 seconds with 20 threads, try:

./run-jepsen test --nodes-file node --engine vinyl --workload set \
                  --concurrency 20 --time-limit 100

To focus on a particular set of faults, use --nemesis:

./run-jepsen test --nemesis partition,kill

Options

  • --concurrency - how many workers should we run? Must be an integer, optionally followed by n (e.g. 3n) to multiply by the number of nodes.
  • --engine - what Tarantool data engine should we use? Available values are memtx and vinyl. Learn more about DB engines in the Tarantool documentation.
  • --leave-db-running - leave the database running at the end of the test, so you can inspect it. Useful for debugging.
  • --logging-json - use JSON structured output in the Jepsen log.
  • --mvcc - enable the MVCC engine; learn more about it in the Tarantool documentation.
  • --nemesis - a comma-separated list of nemesis faults or groups of faults to enable. The groups are: none (no nemeses), standard (partition and clock), and all (every nemesis listed below). Available nemeses are:
    • clock generates a nemesis which manipulates clocks.
    • pause pauses and resumes a DB's processes using SIGSTOP and SIGCONT signals.
    • kill kills a DB's processes using the SIGKILL signal.
    • partition splits network connectivity between nodes in a cluster and then recovers it.
  • --nemesis-interval - how long to wait between nemesis faults.
  • --node - node(s) to run the test on. The flag may be given multiple times, with one node per flag.
  • --nodes - comma-separated list of node hostnames.
  • --nodes-file - file containing node hostnames, one per line.
  • --username - username for login to remote server via SSH.
  • --password - password for sudo access on remote server.
  • --strict-host-key-checking - whether to check host keys.
  • --ssh-private-key - path to an SSH identity file.
  • --test-count - how many times should we repeat a test?
  • --time-limit - excluding setup and teardown, how long should a test run for, in seconds?
  • --version - what Tarantool version should we test? The option accepts two kinds of versions: a branch version (for example 2.2) to use the latest package version from that branch, or a Git commit hash to use a version built from that commit.
  • --workload - test workload to run. Available workloads are:
    • bank simulates transfers between bank accounts. Uses SQL to access bank accounts.
    • bank-multitable simulates transfers between bank accounts when each account is in a separate space (table). Uses SQL to access bank accounts.
    • bank-lua simulates transfers between bank accounts. Uses Lua functions to access bank accounts.
    • bank-multitable-lua simulates transfers between bank accounts when each account is in a separate space (table). Uses Lua functions to access bank accounts.
    • counter-inc increments a counter.
    • register models a register with read, write and CAS (Compare-And-Set) operations.
    • set inserts a series of unique numbers as separate instances, one per transaction, and attempts to read them back through an index.
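
The --concurrency option above accepts either an absolute worker count or a count followed by n that is multiplied by the number of nodes. A minimal sketch of that interpretation in plain Clojure is shown below; parse-concurrency is a hypothetical helper name used for illustration, not necessarily the function Jepsen uses internally.

```clojure
(defn parse-concurrency
  "Turns a --concurrency string such as \"20\" or \"3n\" into a worker count.
  node-count is the number of nodes in the cluster. Hypothetical helper."
  [s node-count]
  (let [[_ digits suffix] (re-matches #"(\d+)(n?)" s)]
    (when-not digits
      (throw (IllegalArgumentException.
               (str "Expected an integer, optionally followed by n: " s))))
    (let [n (Long/parseLong digits)]
      (if (= "n" suffix)
        (* n node-count)   ; per-node multiplier
        n))))              ; absolute worker count

(parse-concurrency "3n" 5) ; three workers per node on a five-node cluster
(parse-concurrency "20" 5) ; absolute count; the node count is ignored
```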

How to build

To build the Jepsen tests locally, you need to set up the Leiningen build system and Clojure.

To build the tests, try:

lein deps
lein compile

To run the tests, try:

lein run test --nodes-file nodes --workload register --version 2.8 --time-limit 100

License

Copyright © 2020-2021 VK Company Limited

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at https://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.

jepsen.tarantool's People

Contributors

ligurio, totktonada

jepsen.tarantool's Issues

Remove dead code

  • remove unused code
  • remove commented out code
  • remove unused imports

Add Pages test

Pages (Verifies the transactional isolation of pagination by inserting groups of elements together like [1, 5, -15, 23], then concurrently performing reads of every element in the collection. We expect to find that for every element of a group, all the other elements exist).
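
The property described above can be checked with a small pure function. This is only a sketch under assumptions: broken-groups is a hypothetical name, groups are assumed to be known to the checker, and each read is assumed to return the full collection of elements it observed.

```clojure
(require '[clojure.set :as set])

(defn broken-groups
  "Given the groups that were inserted (each a collection of elements written
  together) and the elements one read observed, returns the groups that the
  read saw only partially."
  [groups observed]
  (let [observed (set observed)]
    (filter (fn [group]
              (let [seen (set/intersection (set group) observed)]
                ;; A group is broken if some, but not all, of it is visible.
                (and (seq seen)
                     (not= seen (set group)))))
            groups)))

;; The read saw 1 and 5 but not -15 and 23, so the first group is reported.
(broken-groups [[1 5 -15 23] [2 4]] #{1 5 2 4})
```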

[2pt] Add Set test

Set (test inserts a series of unique numbers as separate instances, one per transaction, and attempts to read them back through an index), serializability.
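
A sketch of how the results of such a test could be classified is shown below. It assumes the checker knows which inserts were attempted, which were acknowledged, and what the final read returned; check-set and the result keys are hypothetical names chosen for illustration.

```clojure
(require '[clojure.set :as set])

(defn check-set
  "Compares attempted and acknowledged inserts with the final read."
  [attempted acknowledged final-read]
  (let [attempted  (set attempted)
        acked      (set acknowledged)
        read       (set final-read)
        lost       (set/difference acked read)       ; acknowledged but missing
        unexpected (set/difference read attempted)]  ; read but never inserted
    {:valid?     (and (empty? lost) (empty? unexpected))
     :lost       lost
     :unexpected unexpected
     ;; Inserts that were not acknowledged but turned out to be applied.
     :recovered  (set/difference (set/intersection read attempted) acked)}))

(check-set [1 2 3 4] [1 2 3] [1 3 4])
```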

Add Sequential test

Sequential (looks for violations of sequential consistency across multiple keys, where transaction order is inconsistent with client order).

Nemesis partition failed with parameter :primaries

2020-10-21 12:00:22,337{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-partition        :primaries
2020-10-21 12:00:22,347{GMT}    WARN    [jepsen nemesis] jepsen.core: Process :nemesis crashed
java.lang.IllegalArgumentException: No implementation of method: :primaries of protocol: #'jepsen.db/Primary found for class: tarantool.db$db
        at clojure.core$_cache_protocol_fn.invokeStatic(core_deftype.clj:583) ~[clojure-1.10.1.jar:na]
        at clojure.core$_cache_protocol_fn.invoke(core_deftype.clj:575) ~[clojure-1.10.1.jar:na]
        at jepsen.db$fn__3265$G__3258__3272.invoke(db.clj:30) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis.combined$grudge.invokeStatic(combined.clj:162) ~[na:na]
        at jepsen.nemesis.combined$grudge.invoke(combined.clj:149) ~[na:na]
        at jepsen.nemesis.combined$partition_nemesis$reify__2377.invoke_BANG_(combined.clj:194) ~[na:na]
        at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:46) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:42) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$compose$reify__5683.invoke_BANG_(nemesis.clj:259) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:46) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:42) ~[jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_$fn__5888.invoke(core.clj:259) ~[jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_.invokeStatic(core.clj:259) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_.invoke(core.clj:254) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_apply_op_BANG_.invokeStatic(core.clj:294) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_apply_op_BANG_.invoke(core.clj:286) [jepsen-0.1.18.jar:na]
        at jepsen.core.NemesisWorker.run_worker_BANG_(core.clj:410) [jepsen-0.1.18.jar:na]
        at jepsen.core$run_workers_BANG_$run__5872.invoke(core.clj:206) [jepsen-0.1.18.jar:na]
        at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) [jepsen-0.1.18.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) [clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) [clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:425) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:156) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:132) [clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) [clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.10.1.jar:na]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_262]
2020-10-21 12:00:22,348{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-partition        :primaries      indeterminate: No implementation of method: :primaries of protocol: #'jepsen.db/Primary found for class: tarantool.db$db
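
The exception says that the DB object does not implement the primaries method of the jepsen.db/Primary protocol, which the partition nemesis calls when asked to target :primaries. A minimal sketch of one way to satisfy it follows. To keep the example runnable without Jepsen on the classpath, it declares a stand-in protocol with the same two methods; leader-fn is a hypothetical function that asks one node who it believes the leader is.

```clojure
;; Stand-in for jepsen.db/Primary so the sketch runs without Jepsen;
;; in the test suite, implement the real protocol instead.
(defprotocol Primary
  (primaries [db test] "Returns the nodes currently believed to be primaries.")
  (setup-primary! [db test node] "Performs one-time setup on a single node."))

(defn db
  "Builds a DB object. leader-fn (hypothetical) takes a node name and returns
  the node it thinks is the leader, or nil if it does not know."
  [leader-fn]
  (reify Primary
    (primaries [_ test]
      ;; Ask every node, drop unknown answers, and de-duplicate the rest.
      (->> (:nodes test)
           (keep leader-fn)
           distinct
           vec))
    (setup-primary! [_ test node]
      nil)))

(primaries (db (fn [node] (when (not= node "n3") "n1")))
           {:nodes ["n1" "n2" "n3"]})
```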

Catch exceptions in operations

When Tarantool is not available or something goes wrong, exceptions like the one below spam the log:

2020-10-21 09:56:36,372{GMT}    INFO    [jepsen worker 2] jepsen.util: 2        :invoke :add    1
2020-10-21 09:56:36,373{GMT}    INFO    [jepsen worker 3] jepsen.util: 3        :invoke :add    1
2020-10-21 09:56:36,401{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-partition        nil
2020-10-21 09:56:36,419{GMT}    WARN    [jepsen nemesis] jepsen.core: Process :nemesis crashed
java.lang.IllegalArgumentException: Expected op {:type :info, :f :start, :value nil, :process :nemesis, :time 27663624311} to have a grudge for a :value, but none given.
        at jepsen.nemesis$partitioner$reify__5651.invoke_BANG_(nemesis.clj:135) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis.combined$partition_nemesis$reify__2377.invoke_BANG_(combined.clj:195) ~[na:na]
        at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:46) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:42) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$compose$reify__5683.invoke_BANG_(nemesis.clj:259) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invokeStatic(nemesis.clj:46) ~[jepsen-0.1.18.jar:na]
        at jepsen.nemesis$invoke_compat_BANG_.invoke(nemesis.clj:42) ~[jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_$fn__5888.invoke(core.clj:259) ~[jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_.invokeStatic(core.clj:259) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_invoke_op_BANG_.invoke(core.clj:254) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_apply_op_BANG_.invokeStatic(core.clj:294) [jepsen-0.1.18.jar:na]
        at jepsen.core$nemesis_apply_op_BANG_.invoke(core.clj:286) [jepsen-0.1.18.jar:na]
        at jepsen.core.NemesisWorker.run_worker_BANG_(core.clj:410) [jepsen-0.1.18.jar:na]
        at jepsen.core$run_workers_BANG_$run__5872.invoke(core.clj:206) [jepsen-0.1.18.jar:na]
        at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) [jepsen-0.1.18.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) [clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) [clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:425) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:156) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:132) [clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) [clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) [clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) [clojure-1.10.1.jar:na]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.10.1.jar:na]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_262]
2020-10-21 09:56:36,419{GMT}    INFO    [jepsen nemesis] jepsen.util: :nemesis  :info   :start-partition        nil     indeterminate: Expected op {:type :info, :f :start, :value nil, :process :nemesis, :time 27663624311} to have a grudge for a :value, but none given.
2020-10-21 09:56:36,487{GMT}    INFO    [jepsen worker 0] jepsen.util: 0        :ok     :add    1

It would be nice to catch all possible exceptions and convert them into :info records in the history, with the exception given as the reason.

       (invoke! [this test op]
         (try
           (let [id     (key (:value op))
                 value  (val (:value op))]
             (case (:f op)
               :read   (let [proc (if (:strong-reads? opts)
                                    "SRegisterStrongRead"
                                    "REGISTERS.select")
                             v (-> conn
                                   (voltdb/call! proc id)
                                   first
                                   :rows
                                   first
                                   :VALUE)]
                         (assoc op
                                :type :ok
                                :value (independent/tuple id v)))
               :write (do (voltdb/call! conn "REGISTERS.upsert" id value)
                          (assoc op :type :ok))
               :cas   (let [[v v'] value
                            res (-> conn
                                    (voltdb/call! "registers_cas" v' id v)
                                    first
                                    :rows
                                    first
                                    :modified_tuples)]
                        (assert (#{0 1} res))
                        (assoc op :type (if (zero? res) :fail :ok)))))
           (catch org.voltdb.client.NoConnectionsException e
             (assoc op :type :fail, :error :no-conns))
           (catch org.voltdb.client.ProcCallException e
             (let [type (if (= :read (:f op)) :fail :info)]
               (condp re-find (.getMessage e)
                 #"^No response received in the allotted time"
                 (assoc op :type type, :error :timeout)

                 #"^Connection to database host .+ was lost before a response"
                 (assoc op :type type, :error :conn-lost)

                 #"^Transaction dropped due to change in mastership"
                 (assoc op :type type, :error :mastership-change)

                 (throw e))))))

Like this - https://github.com/jepsen-io/voltdb/blob/2799ede72fd06ef0dd7879bb52d94e6fffe9b71d/src/jepsen/voltdb/single_register.clj#L61
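
A simplified sketch of the same idea for this test suite is shown below. It is an assumption-laden illustration rather than the final design: with-errors is a hypothetical helper, and it catches every Exception instead of matching specific Tarantool or JDBC error types. As in the VoltDB example, a failed read is recorded as :fail because it cannot have changed anything, while any other failed operation is recorded as :info because it may or may not have been applied.

```clojure
(defn with-errors
  "Calls (f) and returns its result. If (f) throws, returns op completed as
  :fail for reads or :info for everything else, with the exception message
  stored under :error. Hypothetical helper."
  [op f]
  (try
    (f)
    (catch Exception e
      (assoc op
             :type  (if (= :read (:f op)) :fail :info)
             :error (.getMessage e)))))

;; Usage: wrap the body of invoke! so that a failure becomes a history record.
(with-errors {:f :write, :value 3}
  (fn [] (throw (java.io.IOException. "connection reset"))))
```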

Add a focus test for TXM

TXM was implemented as part of "Transaction engine for memtx engine" (#4897).

It implements MVCC and bank would be a good test for it because bank is a canonical test for MVCC: "For instance, when making a wire transfer between two bank accounts if a reader reads the balance at the bank when the money has been withdrawn from the original account and before it was deposited in the destination account, it would seem that money has disappeared from the bank." via

Add G2 test

G2 (checks for a type of phantom anomaly prevented by serializability: anti-dependency cycles involving predicate reads).

[2pt] Add Comments test

Comments (checks for a specific type of strict serializability violation, where transactions on disjoint records are visible out of order).

Add partition nemesis

partition - introduce network partitions into the cluster that isolates a randomly chosen node from the rest of the nodes in the cluster.
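
Jepsen's partitioner expects a grudge: a map from each node to the nodes whose traffic it should drop. A minimal sketch of building such a map for a single isolated node follows; the function names are hypothetical, and wiring the result into a nemesis is left out.

```clojure
(defn isolate-node-grudge
  "Returns a grudge in which victim is cut off from every other node and all
  other nodes can still talk to each other."
  [nodes victim]
  (let [others (remove #{victim} nodes)]
    (into {victim (set others)}
          (map (fn [node] [node #{victim}]) others))))

(defn random-isolation-grudge
  "Isolates one randomly chosen node from the rest of the cluster."
  [nodes]
  (isolate-node-grudge nodes (rand-nth (vec nodes))))

(isolate-node-grudge ["n1" "n2" "n3"] "n1")
```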

Bump Jepsen version to 0.2.3

https://github.com/jepsen-io/jepsen/releases/tag/0.2.1

Useful changes for Tarantool tests:

  • 0.2.1: nemesis.membership: an experimental namespace which supports writing membership-changing nemeses and generators. Users provide an implementation of the nemesis.membership.state/State protocol: a mostly-pure structure which defines how to observe the state of the cluster on a specific node, merging those node views, generating operations, applying those operations to the cluster, and (since clusters often resolve membership changes asynchronously) deciding when those operations have been completed. Given this object, the membership system handles spawning threads to observe the cluster state, evolves the given state machine towards a fixed state over time, and provides a stateful nemesis and generator that work together to perform membership changes. The resulting package can be combined with other faults through nemesis.combined.
  • 0.2.1: Jepsen now logs the GIT hash and command line used at the start of each test, which makes it easier to reproduce results.
  • 0.2.0 jepsen.cli now takes a --no-ssh option, which is helpful when running Jepsen against local systems, existing databases, or external APIs.
  • 0.1.19 jepsen.generator.pure is basically stable for writing production tests at this point. See the namespace docs for details.

Access to Tarantool via SQL and using connector

There are two interfaces for accessing data in Tarantool: SQL and a connector. I suppose we should make two flavors of each Jepsen test so that both interfaces are used. This approach is used, for example, in the tests for Yugabyte, which has ycql and ysql (https://github.com/jepsen-io/jepsen/tree/master/yugabyte/src/yugabyte)

Tarantool clients on Clojure:

The initial commit includes methods from a native Clojure connector, but these methods were never used:
1b76c83#diff-9833d33163e019f95895e621c02b2bb5f1c883552d6bf83409de649b98ee2f3f

Rewrite operations of a client in register workload to SQL

register was our first experience with Jepsen testing. It contains two operations (write and read) written in Lua,
although both can be implemented in SQL.

https://github.com/tarantool/jepsen.tarantool/blob/master/src/tarantool/register.clj

  (invoke! [this test op]
     (case (:f op)
       :read (assoc op
                    :type  :ok
                    :value (cl/read-v-by-k conn 1))
       :write (do (let [con (cl/open (jepsen/primary test) test)]
                   (cl/write-v-by-k con 1 (:value op)))
                   (assoc op :type :ok))
       :cas (let [[old new] (:value op)
                  con (cl/open (jepsen/primary test) test)]
                  (assoc op :type (if (cl/compare-and-set con 1 old new)
                                   :ok
                                   :fail)))))

  (teardown! [this test])
      ;(j/execute! conn ["DROP TABLE jepsen"]))

https://github.com/tarantool/jepsen.tarantool/blob/master/src/tarantool/client.clj

(defn read-v-by-k
  "Reads the current value of a key."
  [conn k]
  (first (vals (first (j/execute! conn ["SELECT _READ(?, 'JEPSEN')" k])))))

(defn write-v-by-k
  "Writes the current value of a key."
  [conn k v]
  (j/execute! conn ["SELECT _WRITE(?, ?, 'JEPSEN')"
                    k v]))

(defn compare-and-set
  [conn id old new]
  (first (vals (first (j/execute! conn ["SELECT _CAS(?, ?, ?, 'JEPSEN')"
                                        id old new])))))

https://github.com/tarantool/jepsen.tarantool/blob/master/resources/tarantool/jepsen.lua

--[[ Implements a WRITE operation, which takes a key and a value,
sets the key to the value if the key already exists, and
inserts the value if the key is absent.
Example: SELECT _WRITE(1, 3, 'JEPSEN')
]]
box.schema.func.create('_WRITE',
   {language = 'LUA',
    returns = 'integer',
    body = [[function (id, value, table)
             box.space[table]:upsert({id, value}, {{'=', 1, 1}, {'=', 2, value}})
             return value
             end]],
    is_sandboxed = false,
    param_list = {'integer', 'integer', 'string'},
    exports = {'LUA', 'SQL'},
    is_deterministic = true})

--[[ Implements a READ operation, which takes a key and returns its
value.
Example: SELECT _READ(1, 'JEPSEN')
]]
box.schema.func.create('_READ',
   {language = 'LUA',
    returns = 'integer',
    body = [[function (id, table)
             box.begin()
             local tuple = box.space[table]:get{id}
             if tuple then
                 return tuple[2]
             end
             box.commit()
             return nil
             end]],
    is_sandboxed = false,
    param_list = {'integer', "string"},
    exports = {'LUA', 'SQL'},
    is_deterministic = true})

Add clock-skew nemesis

clock-skew - covers a range of issues related to clock synchronization across different machines. A number of different clock skews are introduced – small (~100 ms), medium (~250 ms), large (~500 ms) and xlarge (~1 s). libfaketime is used to make some node clocks, both CLOCK_REALTIME and CLOCK_MONOTONIC, run up to 5x faster than others.

Consider nemeses used in FoundationDB

FoundationDB nemeses:

  • Laggy communication between nodes
  • Network routing errors
  • Swizzle clogging*
  • Simulated software bugs**
  • Upgrades from old server/file versions
  • Incomplete writes to disk (see #75)
  • Corrupted writes to disk (see #75)
  • Disk drive runs out of space (see #75)
  • Single node in a cluster terminates (done in #16)
  • Network partitions between nodes (done in #17)
  • Processes freeze for random durations (done in #21)
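
Swizzle clogging, listed above, picks a random subset of nodes, clogs their network connections one by one, and then unclogs them in a random order. The sketch below only generates that order of steps; it is an illustration under assumptions, not FoundationDB's implementation. swizzle-schedule is a hypothetical name, the random source is passed in so that runs can be reproduced, and the timing between steps is left to the caller.

```clojure
(defn swizzle-schedule
  "Returns a vector of [:clog node] steps for a random non-empty subset of
  nodes, followed by [:unclog node] steps for the same nodes in a random
  order. nodes must be non-empty."
  [nodes ^java.util.Random rng]
  (let [shuffled (java.util.ArrayList. ^java.util.Collection (vec nodes))
        _        (java.util.Collections/shuffle shuffled rng)
        k        (inc (.nextInt rng (count shuffled)))  ; subset size in 1..n
        subset   (vec (take k shuffled))
        unclog   (java.util.ArrayList. ^java.util.Collection subset)
        _        (java.util.Collections/shuffle unclog rng)]
    (into (mapv (fn [node] [:clog node]) subset)
          (map (fn [node] [:unclog node]) unclog))))

(swizzle-schedule ["n1" "n2" "n3" "n4" "n5"] (java.util.Random. 42))
```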

"For a while, there was an informal competition within the engineering team to design failures that found the toughest bugs and issues the most easily. After a period of one-upsmanship, the reigning champion is called “swizzle-clogging”. To swizzle-clog, you first pick a random subset of nodes in the cluster. Then, you “clog” (stop) each of their network connections one by one over a few seconds. Finally, you unclog them in a random order, again one by one, until they are all up. This pattern seems to be particularly good at finding deep issues that only happen in the rarest real-world cases." https://apple.github.io/foundationdb/testing.html

Add partition-ring nemesis

partition-ring - introducing network partitions dividing the cluster in two, or into overlapping rings of 3/5 nodes each, such that every node observed a majority, but no two nodes agreed on what that majority was.
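
A sketch of the overlapping-rings case for a five-node cluster is shown below. Each node keeps contact with itself and its immediate ring neighbours, so every node sees a majority of three, but no two nodes see the same three. The function names are hypothetical, and the result is a grudge map from each node to the nodes it should drop. The links are symmetric only when the majority size is odd (as with five nodes), which is an assumption of this sketch.

```clojure
(require '[clojure.set :as set])

(defn majority
  "Smallest number of nodes that forms a majority of n."
  [n]
  (inc (quot n 2)))

(defn majorities-ring-grudge
  "Arranges nodes in a ring and lets each node see only a majority-sized
  window around itself. Returns a map of node -> set of nodes to drop."
  [nodes]
  (let [nodes (vec nodes)
        n     (count nodes)
        m     (majority n)
        half  (quot m 2)
        all   (set nodes)]
    (into {}
          (for [i (range n)]
            (let [visible (set (for [d (range (- half) (- m half))]
                                 (nodes (mod (+ i d) n))))]
              [(nodes i) (set/difference all visible)])))))

(majorities-ring-grudge ["n1" "n2" "n3" "n4" "n5"])
```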

Consider nemeses used in PingCAP

https://github.com/pingcap/tipocket#nemesis

  • random_kill, all_kill, minor_kill, major_kill, kill_tikv_1node_5min, kill_pd_leader_5min: As their names imply, these nemeses make nodes unavailable for a specified period of time.
  • short_kill_tikv_1node, short_kill_pd_leader: Kill the selected container; used to inject a short period of unavailability.
  • partition_one: Isolate a single node.
  • scaling: Scale TiDB/PD/TiKV nodes up or down randomly.
  • shuffle-leader-scheduler/shuffle-region-scheduler/random-merge-scheduler: Just as their names imply.
  • delay_tikv, delay_pd, errno_tikv, errno_pd, mixed_tikv, mixed_pd: Inject IO-related fault.
  • small_skews, subcritical_skews, critical_skews, big_skews, huge_skews: Clock skew, small_skews ~100ms, subcritical_skews ~200ms, critical_skews ~250ms, big_skews ~500ms and huge_skews ~5s.

Add Monotonic test

Monotonic (verifies that internal transaction timestamps are consistent with logical transaction order), serializability.

Add test with checks for anomalies described in Adya's PhD

Adya’s formalization of transactional isolation levels provides a more thorough summary of the preventative interpretation of the ANSI levels, defining serializability as the absence of four phenomena. Serializability prohibits:

P0 (Dirty Write): w1(x) … w2(x)
P1 (Dirty Read): w1(x) … r2(x)
P2 (Fuzzy Read): r1(x) … w2(x)
P3 (Phantom): r1(P) … w2(y in P)

Here w denotes a write, r denotes a read, and subscripts indicate the transaction which executed that operation. The notation “…” indicates a series of micro-operations except for a commit or abort. P indicates a predicate.

As Adya notes, the preventative interpretation of the ANSI specification is overly restrictive: it rules out some histories which are legally serializable.
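
The first three patterns can be searched for mechanically in a recorded sequence of micro-operations. The sketch below is a simplified illustration, not Adya's formalism or an Elle checker: the history format ({:f :w, :txn 1, :key :x} plus :commit and :abort records) and the function name are assumptions, and P3 is omitted because it needs predicates.

```clojure
(def phenomena
  {:P0 [:w :w]    ; dirty write
   :P1 [:w :r]    ; dirty read
   :P2 [:r :w]})  ; fuzzy read

(defn find-phenomenon
  "Returns [a b] pairs of micro-ops from history where a by T1 is followed by
  b by a different transaction on the same key, before T1 commits or aborts."
  [history p]
  (let [history (vec history)
        [f1 f2] (phenomena p)]
    (for [[i a] (map-indexed vector history)
          :when (= f1 (:f a))
          ;; Only look at ops up to T1's commit or abort.
          b     (take-while (fn [op]
                              (not (and (= (:txn a) (:txn op))
                                        (#{:commit :abort} (:f op)))))
                            (subvec history (inc i)))
          :when (and (= f2 (:f b))
                     (= (:key a) (:key b))
                     (not= (:txn a) (:txn b)))]
      [a b])))

(find-phenomenon [{:f :w :txn 1 :key :x}
                  {:f :r :txn 2 :key :x}
                  {:f :commit :txn 1}
                  {:f :r :txn 3 :key :x}]
                 :P1)
```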

Make Tarantool spaces synchronous again

As described in the documentation, synchronous replication can be enabled per space using the is_sync option. This option was lost after the tests switched from Lua to SQL.

(was lost in 3214905)

It is not possible to change is_sync using SET SESSION in SQL:

unix/:/var/run/tarantool/jepsen.control>  box.space._session_settings:select()
---
- - ['error_marshaling_enabled', false]
  - ['sql_default_engine', 'memtx']
  - ['sql_defer_foreign_keys', false]
  - ['sql_full_column_names', false]
  - ['sql_full_metadata', false]
  - ['sql_parser_debug', false]
  - ['sql_recursive_triggers', true]
  - ['sql_reverse_unordered_selects', false]
  - ['sql_select_debug', false]
  - ['sql_vdbe_debug', false]
...

But it is possible to create a Lua function in SQL and call box.space.j:alter{is_sync = true} in it.

Add test with ERR_INJs for reproducing dirty reads

tarantool/tarantool#5208 (comment)

Try fault injection for dirty reads in the Jepsen tests.

  1. A Debug build of Tarantool is required.
  2. In a separate fiber, set ERRINJ_WAL_DELAY and then write to a space; this fiber hangs after the record is added.
  3. In the original fiber, read what was inserted in step 2.
  4. In the original fiber, (sequentially) set ERRINJ_WAL_WRITE_DISK and clear ERRINJ_WAL_DELAY; this causes a rollback, and the read in step 3 becomes a dirty read.

Example (sorry for the terseness):

box.cfg{} box.schema.create_space('t'):create_index('p')
f1=require('fiber').new(function() box.error.injection.set('ERRINJ_WAL_DELAY', true) box.space.t:insert{3} end)
box.space.t:select{}
---
- - [3]
...
box.error.injection.set('ERRINJ_WAL_WRITE_DISK', true)
box.error.injection.set('ERRINJ_WAL_DELAY', false)
box.space.t:select{}
---
- []
...

Close connections to DBMS

Right now, connections in a client stay open, and we should close them.
Unfortunately, there is no dedicated method in next.jdbc to close a connection.

See discussion: https://clojureverse.org/t/how-to-manage-database-connection-in-clojure/5067/8

Possible solution in https://www.github.com/jepsen-io/redis/tree/master/src%2Fjepsen%2Fredis%2Fclient.clj

(defn close!
  "Closes a connection to a node."
  [^java.io.Closeable conn]
  (.close (:pool conn)))

Another approach is to use with-open, which cleans up the connection automatically - https://cljdoc.org/d/seancorfield/next.jdbc/1.1.610/doc/getting-started/prepared-statements
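
The sketch below shows the with-open behaviour on its own, without a database. open-conn is a hypothetical stand-in that returns a java.io.Closeable and records events in an atom; with a real connection, the object bound in with-open would be whatever the connection library returns, provided it has a close method.

```clojure
(defn open-conn
  "Hypothetical stand-in for opening a connection. Records :closed in the
  log atom when the connection is closed."
  [log]
  (reify java.io.Closeable
    (close [_] (swap! log conj :closed))))

(defn run-op
  "Opens a connection, uses it, and relies on with-open to close it, even if
  the body throws."
  [log]
  (with-open [conn (open-conn log)]
    (swap! log conj :used)
    :ok))

(let [log (atom [])]
  (run-op log)
  @log)
```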

It is possible to log connect/disconnect operations in Tarantool with triggers - https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_session/#box-session-on-connect:

function log_connect ()
  local log = require('log')
  local m = 'Connection. user=' .. box.session.user() .. ' id=' .. box.session.id()
  log.info(m)
end

function log_disconnect ()
  local log = require('log')
  local m = 'Disconnection. user=' .. box.session.user() .. ' id=' .. box.session.id()
  log.info(m)
end

box.session.on_connect(log_connect)
box.session.on_disconnect(log_disconnect)

Logging shows that Jepsen performs about 500 connects in test Counter, i.e. one connection per operation.

Remove option `--single-mode`

The first commit introduced the --single-mode option. It is needed at least in the instance file, to decide whether to use cluster-specific options. But we can easily detect the mode automatically, because we have a vector with the IP addresses of the nodes.
This makes the command line for running Jepsen tests more universal.
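
A minimal sketch of that detection, assuming the node addresses are available as a collection; single-mode? is a hypothetical helper name.

```clojure
(defn single-mode?
  "True when the test targets exactly one node, so cluster-specific options
  can be skipped. Hypothetical helper."
  [nodes]
  (= 1 (count nodes)))

(single-mode? ["192.168.0.1"])
(single-mode? ["192.168.0.1" "192.168.0.2" "192.168.0.3"])
```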

Add Kill nemesis

kill (SIGKILL) - kill and restart a randomly chosen database process that is a part of the cluster.

Add Bank test

Bank (verifies that DBMS conserves the total sum of values in a table, while transferring units between various rows).
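
The invariant can be checked with a small pure function. This is a sketch under assumptions: each read is taken to be a map from account to balance, and check-bank and its result keys are hypothetical names rather than the checker shipped with Jepsen.

```clojure
(defn check-bank
  "Returns {:valid? ..., :bad-reads [...]} where bad reads are those whose
  balances do not sum to expected-total."
  [expected-total reads]
  (let [bad (filter (fn [balances]
                      (not= expected-total (reduce + 0 (vals balances))))
                    reads)]
    {:valid?    (empty? bad)
     :bad-reads (vec bad)}))

;; The second read is missing 10 units, so it is reported.
(check-bank 100 [{0 50, 1 50} {0 40, 1 50}])
```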
