Giter Club home page Giter Club logo

hitchhiker-tree's Introduction

Hitchhiker Tree

Hitchhiker trees are a newly invented (by @dgrnbrg) datastructure, synthesizing fractal trees and functional data structures, to create fast, snapshottable, massively scalable databases.

Watch the talk from Strange Loop to learn more, especially about the concept!

What's in this Repository?

The hitchhiker namespaces contain a complete implementation of a persistent, serializable, lazily-loaded hitchhiker tree. This is a sorted key-value datastructure, like a scalable sorted-map. It can incrementally persist and automatically lazily load itself from any backing store which implements a simple protocol.

Outboard is a sample application for the hitchhiker tree. It includes an implementation of the IO subsystem backed by Redis, and it manages all of the incremental serialization and flushing.

The hitchhiker tree is designed very similarly to how Datomic's backing trees must work--I would love to see integration with DataScript for a fully open source Datomic.

Outboard

Outboard is a simple API for your Clojure applications that enables you to make use of tens of gigabytes of local memory, far beyond what the JVM can manage. Outboard also allows you to restart your application and reuse all of that in-memory data, which dramatic reduces startup times due to data loading.

Outboard has a simple API, which may be familiar if you've ever used Datomic. Unlike Datomic, however, Outboard trees can be "forked" like git repositories, not just transacted upon. Once you've created a tree, you can open a connection to it. The connection mediates all interactions with the outboard data: it can accept transactions, provide snapshots for querying, and be cloned.

API Usage Example

(require '[hitchhiker.outboard :as ob])

;; First, we'll create a connection to a new outboard
(def my-outboard (ob/create "first-outboard-tree"))

;; We'll get a snapshot of the outboard's current state, which is empty for now
;; Note that snapshots are only valid for 5 seconds, but making a new snapshot is free
;; It would be easy to write an "extend-life" function for snapshots
(def first-snapshot (ob/snapshot my-outboard))

;; This will insert the pair "hello" "world" only into the snapshot
(-> first-snapshot
    (ob/insert "hello" "world")
    (ob/lookup "hello"))
;;=> "world"

;; Inserts must be done in a transaction to persist
(-> (ob/snapshot my-outboard)
    (ob/lookup "hello"))
;;=> nil

;; We can insert some data into it via a transaction
;; The update! function is atomic, just like swap! for atoms
;; update! will pass its transaction function a snapshot of the outboard
(ob/update! my-outboard (fn [snapshot] (ob/insert snapshot "goodbye" "moon")))

;; Since the insert was transacted, it persists
(-> (ob/snapshot my-outboard)
    (ob/lookup "goodbye"))
;;=> "moon"

;; If you'd like, you can "fork" an outboard. Let's fork our outboard.
;; To fork, you just save a snapshot under a new name
(def forked-outboard (ob/save-as (ob/snapshot my-outboard) "forked-outboard"))

;; Now, we can transact into the snapshot, which will not affect other forks
(ob/update! forked-outboard (fn [snapshot] (ob/insert snapshot "goodbye" "sun")))

;; As we can see:
(-> (ob/snapshot my-outboard)
    (ob/lookup "goodbye"))
;;=> "moon"
(-> (ob/snapshot forked-outboard)
    (ob/lookup "goodbye"))
;;=> "sun"

You should check out the docstrings/usage of these functions, too:

  • close will gracefully shut down an outboard connection
  • open will reopen an outboard (you can only create outboards which don't exist)
  • destroy will delete all data related to the closed, named outboard
  • lookup and lookup-fwd-iter provide single and ordered sequence access to snapshots

Background

Outboard is an off-heap functionally persistent sorted map. This map allows your applications to retain huge data structures in memory across process restarts.

Outboard is the first library to make use of hitchhiker trees. Hitchhiker trees are a functionally persistent, serializable, off-heap fractal B tree. They can be extended to contain a mechanism to make statistical analytics blazingly fast, and to support column-store facilities.

Details about hitchhiker trees, including related work, can be found in docs/hitchhiker.adoc.

Testing

You'l need a local Redis instance running to run the tests. Once you have it, just run

lein test

Benchmarking

This library includes a detailed, instrumented benchmarking suite. It's built to enable comparative benchmarks between different parameters or code changes, so that improvements to the structure can be correctly categorized as such, and bottlenecks can be reproduced and fixed.

To try it, just run

lein bench

The benchmark tool supports testing with different parameters, such as:

  • The tree's branching factor
  • Whether to enable fractal tree features, just use the B-tree features, or compare to a vanilla Clojure sorted map
  • Reordering of delete operations (to stress certain workloads)
  • Whether to use the in-memory or Redis-backed implementation

The benchmarking tool is designed to make it convenient to run several benchmarks; each benchmark's parameters can be separate by a --. This makes it easy to understand the characteristics of the hitchhiker tree over a variety of settings for a parameter.

You can run a more sophisticated experiment benchmark by doing

lein bench OUTPUT_DIR options -- options-for-2nd-experiment -- options-for-3rd-experiment

This generates an Excel workbooks called "analysis.xlsx" with benchmark results. For instance, if you'd like to run experiments to understand the performance difference between various values of B (the branching factor), you can do:

lein bench perf_diff_experiment -b 10 -- -b 20 -- -b 40 -- -b 80 -- -b 160 -- -b 320 -- -b 640

And it will generate lots of data and the Excel workbook for analysis.

If you'd like to see the options for the benchmarking tool, just run lein bench.

Technical details

See the doc/ folder for technical details of the hitchhiker tree and Redis garbage collection system.

Gratitude

Thanks to the early reviewers, Kovas Boguta & Leif Walsh. Also, thanks to Tom Faulhaber for making the Excel analysis awesome!

License

Copyright © 2016 David Greenberg

Distributed under the Eclipse Public License version 1.0

hitchhiker-tree's People

Contributors

danboykis avatar dgrnbrg avatar ernestas avatar gfredericks avatar kyleburton avatar louisburke avatar radarhere avatar rbanffy avatar ricardojmendez avatar rockymeza avatar tommy-mor avatar yogthos avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.