xtdb / xtdb

An immutable database for application development and time-travel data compliance, with SQL and XTQL. Developed by @juxt

Home Page: https://xtdb.com

License: Mozilla Public License 2.0

Languages: Clojure 78.91%, Java 9.14%, Shell 0.33%, Dockerfile 0.07%, Emacs Lisp 0.02%, CSS 0.02%, Kotlin 10.16%, TypeScript 0.49%, Python 0.82%, Rust 0.01%, Jinja 0.05%

Topics: database, datalog, document-database, bitemporal, immutable-store, temporal, xtdb, sql

Introduction

XTDB Logo

XTDB is an open-source immutable database with comprehensive time-travel. XTDB has been built to simplify application development and address complex data compliance requirements. XTDB can be used via SQL and XTQL.

XTDB 2.x is currently in early access; if you are looking for a stable release of an immutable document database with bitemporal query capabilities, we are continuing to develop and support XTDB 1.x at https://github.com/xtdb/xtdb/tree/1.x.

Major features:

  • Immutable - while it’s optimised for current-time queries, you can audit the full history of your database at any point, without the need for snapshots.

  • 'Bitemporal' - all data is accurately versioned as updates are made ('system' time), but it also allows you to separately record and query when that data is, was, or will become valid in your business domain ('valid' time).

  • Dynamic - you don’t need to specify schema up-front before documents (rows with arbitrarily nested data) can be inserted.

  • Speaks both SQL and XTQL.

    XTQL is a data-oriented, composable query language - designed from the ground up to be amenable to both hand-written and generated queries. It is heavily inspired by the theoretical bases of both Datalog and the relational algebra.

    It also supports SQL, for compatibility with existing experience and tooling. In particular, it supports the bitemporal functionality specified in the SQL:2011 standard (see the sketch after this list).

  • Cloud native - the ACID, columnar engine is built on Apache Arrow and designed for object storage.

  • It is written and supported by JUXT.
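To make the two query surfaces concrete, here is a minimal sketch assuming the early-access Clojure client (xtdb.api). The function names, argument shapes, and the :users table/columns are illustrative only and may differ between snapshots:

(require '[xtdb.api :as xt])

;; `node` is assumed to be a running XTDB node or remote client handle.

;; XTQL: a data-oriented query expressed as plain EDN, so it is easy to
;; compose and to generate programmatically.
(xt/q node '(from :users [user-id first-name last-name]))

;; SQL, including SQL:2011-style temporal qualifiers -- here asking what
;; was valid in the business domain on a given date:
(xt/q node "SELECT u.first_name, u.last_name
            FROM users FOR VALID_TIME AS OF DATE '2023-01-01' AS u")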

Inside-out Architecture

XTDB embraces the transaction log as the central point of coordination when running as a distributed system.

What do we have to gain from turning the database inside out?

Simpler code, better scalability, better robustness, lower latency, and more flexibility for doing interesting things with data.

— Martin Kleppmann
XTDB 2.x Architecture Diagram

Pre-Release Snapshot Builds

Maven snapshot versions are periodically published under 2.0.0-SNAPSHOT to facilitate support and debugging activities during the development cycle. To access snapshot versions, the Sonatype snapshot repository must be added to your project definition:

<!-- pom.xml -->
<repository>
  <id>sonatype.snapshots</id>
  <name>Sonatype Snapshot Repository</name>
  <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
;; project.clj
:repositories [["sonatype-snapshots" {:url "https://s01.oss.sonatype.org/content/repositories/snapshots"}]]
;; deps.edn
:mvn/repos {"sonatype-snapshots" {:url "https://s01.oss.sonatype.org/content/repositories/snapshots"}}

In contrast to regular releases, which are immutable, a 2.0.0-SNAPSHOT release can be "updated" - this mutability is often useful, but it may also cause unexpected surprises when depending on 2.0.0-SNAPSHOT for longer than necessary. Snapshot versions, including full 2.0.0-<timestamp> coordinates (which are useful to avoid being caught out by mutation), can be found here.
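For example, pinning a specific build rather than the mutable snapshot might look like the following in deps.edn - the timestamped coordinate and artifact names here are hypothetical:

;; deps.edn -- pinning a (hypothetical) timestamped coordinate instead of
;; the mutable 2.0.0-SNAPSHOT:
:deps {com.xtdb/xtdb-api  {:mvn/version "2.0.0-20240101.120000-1"}
       com.xtdb/xtdb-core {:mvn/version "2.0.0-20240101.120000-1"}}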

XTDB 1.x

XTDB 1.x is a mature product offering that is used in production by many organizations, and its ongoing development is focused on hardening and performance. XTDB 1.x is an embeddable database that emphasizes in-process JVM usage to enable advanced functionality like user-defined transaction functions, speculative transactions, programmatic Datalog rules, and more.

XTDB 2.x’s initial objective is to take the key principles embodied in XTDB 1.x — immutability, schemaless records, and temporal querying — to a mainstream audience.

|                                                                | XTDB 1.x                          | XTDB 2.x (early access)                |
|----------------------------------------------------------------|-----------------------------------|----------------------------------------|
| Status                                                         | Stable                            | Experimental (pre-alpha)               |
| Initial Stable Release                                         | 2019                              | TBD                                    |
| Query languages                                                | EDN Datalog                       | XTQL + SQL:2011                        |
| Bitemporal Querying                                            | Timeslice only (point-in-time)    | Fully bitemporal - SQL:2011 and beyond |
| Query focus                                                    | OLTP                              | OLAP + OLTP ('HTAP')                   |
| Storage & Compute                                              | Coupled (nodes are full replicas) | Separated (cost-effective scale out)   |
| Primary Storage Format                                         | Custom Key-Value encodings        | Columnar Apache Arrow                  |
| Immutable Semantics                                            | Yes                               | Yes                                    |
| Online Transactions (ACID, strong consistency)                 | Yes                               | Yes                                    |
| Always-On Bitemporality                                        | Yes                               | Yes                                    |
| Dynamism (ad-hoc graph joins, union typing, schemaless, etc.)  | Yes                               | Yes                                    |

Repo Layout

2.x is split across multiple projects which are maintained within this repository.

  • api contains the user API to XTDB 2.x.

  • core contains the main functional components of XTDB along with interfaces for the pluggable storage components (Kafka, JDBC, S3, etc.). Implementations of these storage options are located in their own projects.

  • http-server and http-client-jvm contain the HTTP server implementation, and a remote client for JVM users.

  • Storage and other modules are under modules. Modules are published to Maven independently so that you can maintain granular dependencies on precisely the individual components needed for your application.

Questions, Thoughts & Feedback

We would love to hear from you: [email protected]

XTDB is licensed under the Mozilla Public License, version 2 or (at your option) any later version.

Copyright © 2018-2024 JUXT LTD.

xtdb's People

Contributors

akeboshiwind, danmason, dekkers, deobald, fiv0, hraberg, jarohen, jonpither, jsulmont, l4e21, malcolmsparks, mbutlerw, refset, tggreene, tlight, wotbrew


xtdb's Issues

Branch Snapshotting

The idea of branching off a time dimension. This is with a specific use case in mind: banks wanting to snapshot their data at business time T, but then to make corrections against this newly formed timeline.

It would be good to have a story around this in the docs - the need for it, and the way Crux solves this problem (even if it doesn't solve it directly, but offers a helpful path).

Currently, not MVP.

Kafka Deployment

There should be a way to deploy a Crux cluster into the cloud.

This might involve several layers, from Docker Compose to full AWS one-click deployment or Kubernetes pods.

For this to be simple to use, we're likely to need a remote REST API, see #8. Depends on #5.

For the MVP we assume that a Crux cluster also includes Kafka and ZooKeeper; it might later be possible to swap these components out.

Performance Bug - Joins

The join query in the micro-bench is extremely slow. We are not leveraging the AVT index in the triple store where we could be.

Backup & Restore

Ability to take a backup of the KV store and restore it on another node.

A variant would be to seed it on start with a URL to either a directory or an S3 resource from which to bootstrap the KV store.
There should also be a way to trigger a backup of a running node.

We currently want to avoid building intelligence about when to perform these actions into the code itself, as long as they can be scripted.

Bitemp Indexes

We want to support bitemporality. In our model the easiest way should be to add transaction time at the end of the index key, as business time is the usual axis for queries. When querying by transaction time, business time can default to the transaction time if not given, as there can (?) be no writes in the future. When transaction time is not given, it defaults to now, or the last part of the index key is simply ignored.

Needs further analysis.
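A minimal sketch of that key layout and the defaulting rules - names are hypothetical and no real byte-level encoding is shown:

;; Hypothetical index-key layout: business time sits before transaction
;; time, since business time is the usual query axis.
(defn index-key [attr value entity business-time transaction-time]
  [attr value entity business-time transaction-time])

;; The defaulting rules described above:
(defn resolve-times [{:keys [business-time transaction-time]} now]
  (let [tt (or transaction-time now)   ; transaction time defaults to now
        bt (or business-time tt)]      ; business time defaults to tx time
    {:business-time bt :transaction-time tt}))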

Documentation website

Because there's only so much you can fit in a GitHub README.

For MVP: something simple and clean in Asciidoc, with basic authentication.

Crux Query Node Index Status

Some of this info might be indirectly visible through a Kafka dashboard, as it depends on where the node is in its consumption of the topics.

Document Store Spike

We want to spike Proposal A from docs/internals.md alongside crux.kv; it should preferably be able to reuse most of the crux.kv-store and crux.db protocols and have crux.query work without too many changes.

Timeseries Data

We might want to create a separate topic for transient, high-volume data that can still be queried, but should not get stuck in the transaction log, and might also not be documents but rather individual measurements. Some of this high-churn data could be related to existing entities, and might also be pushed onto a second, more durable topic for various retention mechanisms.

You can potentially get some of this retention "for free" by putting measurements into a topic compacted by key/time-granularity, like :foo/#inst "2001-01-01", which ensures that only one version of :foo is kept for that day.
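A rough sketch of building such a day-granularity key - the function name and exact key format are hypothetical:

(defn day-key
  "Compaction key combining an entity key with a day-granularity
   timestamp, so Kafka compaction keeps one version per key per day."
  [entity-key ^java.util.Date ts]
  (str entity-key "/" (-> (.toInstant ts)
                          (.truncatedTo java.time.temporal.ChronoUnit/DAYS))))

;; (day-key :foo #inst "2001-01-01T13:37:00") ;=> ":foo/2001-01-01T00:00:00Z"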

Queries should allow Datasource to setup a context

There's a large perf gain for RocksDB in re-using Iterators in queries, especially where multiple seeks might be made. For example, there's a 30-40% improvement for the multiple-clauses query in the micro-bench.

We could add a new protocol fn in-context to Datasource, so that queries can surround query execution with this.

Some thoughts:

  • the name in-context can be improved upon
  • Since we may one day use reducers or pmap & friends, we should probably not use thread-locals and bite the bullet with a context arg to iterate-with.

I had an initial attempt, but have decided to wait on implementing this, until a conversation is had around the above two points. It's also a straightforward change, so it's not critical at this stage (but would be critical prior to the MVP release).
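A sketch of what the protocol fn might look like - in-context is the working name from this issue, and the surrounding shapes are hypothetical:

(defprotocol Datasource
  (in-context [this f]
    "Call f with a context that owns a reusable RocksDB Iterator, so
     repeated seeks within one query avoid re-allocating it."))

;; Query execution would then be wrapped explicitly, passing the context
;; down rather than hiding it in a thread-local:
;; (in-context datasource (fn [ctx] (run-query ctx q)))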

Query Node UberJAR

We want to be able to build a JAR that starts consuming from a Kafka topic and populates the KV store.

This implies adding some basic configuration, like where the broker lives, which topic to subscribe to, and where to store the KV data.

This does not include remote access to query the KV store yet, for this, see #8.
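The configuration might be as small as an EDN map along these lines - the key names are illustrative, not a finalised format:

;; Hypothetical node configuration:
{:bootstrap-servers "localhost:9092"   ; where the broker lives
 :tx-topic          "transaction-log"  ; which topic to subscribe to
 :kv-store          {:db-dir "data/kv"}} ; where to store the KV data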

Separate Topics for Transactions and Documents

By having the documents in one (or more) separate, potentially partitioned, topic(s) which the immutable transaction log refers to, we can delete and compact away data. This also opens up ways to archive older or seldom-used documents.

Local Dev Node

It should be possible to leverage the embedded Kafka setup used for tests to start up a local cluster, running the main method to have an ingest node running in-process, with state persisted across runs.

Micro-optimisation: use types for seek k/vs

As part of seek-and-iterate, we generate small vector tuples. Rudimentary REPL testing shows a slight perf gain if we use defrecords.

Note, the perf downside of small vectors could also be due to the way we are consuming them - more investigation needed.

Note (2): profiling shows that multiple calls to (.key i) on the Rocks iterator may have a cost. Refactoring to a micro-type should also tackle these redundant calls.

Eviction

Ability to really remove data as part of a transaction. Combined with Kafka compaction this can be used to implement various retention mechanisms and to delete personal data.

It should be a relatively cheap operation, and not require rebuilding the entire topics or indexes.

Crux Client API

We want a simple crux.client namespace that allows you to both submit transactions to Kafka and query the query nodes over HTTP. This might not be the end game. See #8.

Query Node API

There needs to be a way to query the data.

The simplest approach is to assume every participant is running their own query node as a library, and use the API directly.

Another approach is some form of remote API, either just taking Datalog over REST and returning EDN, or GraphQL (but likely not for MVP).

Delete, Upsert and Put

We want to be able to transact in a few different ways.

Delete can delete either an entire entity, or an attribute.
Upsert merges a new version of an entity with the existing one.
Put overwrites an entity with a new version.

It should preferably be possible to provide an identity attribute when transacting and have the entity id resolved if an entity with this attribute exists.
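Purely to illustrate the distinction, hypothetical operation shapes might look like the following - the final transaction format is not settled here:

[[:crux.tx/put    {:crux.kv/id :user-1 :name "Ada" :email "ada@old.example"}]
 [:crux.tx/upsert {:crux.kv/id :user-1 :email "ada@new.example"}] ; merged with existing version
 [:crux.tx/delete :user-1 :email]   ; delete a single attribute
 [:crux.tx/delete :user-1]]         ; delete the entire entity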

Retention Mechanisms

We want to explore various retention mechanisms. For the docs this can be done similarly to eviction (#32), having the query node decide which versions of an entity to delete. For TTL, this could mean deleting all versions of an entity older than a specific time (compared against transaction time).

Unlike eviction, retention is likely done by logic on the query node.
There are more advanced versions where you keep data for, say, every day or month.

LMDB Backup

Similar to RocksDB (#4), we need to be able to back up and restore (see #6) the LMDB back end.

Lookup-refs

We want to support lookup refs, or something similar, for the crux.kv/id attribute, both to simplify upserts and deletes. We also want to support user-defined identity attributes on the initial insert.

Datalog Rules

Recursion and traversal can only be expressed using rules in Datalog, so we need support for this.

Multi-Valued Attributes

We want to support more than one value for an attribute, similar to Datomic's cardinality many.

An extension of this is to support actual collection types beyond the implicit sets - primarily lists, but maybe also maps. This might be easier to do using documents rather than triplets. This card does not necessarily need to go that far, as that probably requires changing the underlying model.
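For illustration, hypothetical document shapes for the two levels of support:

;; Implicit set semantics, similar to Datomic's cardinality-many:
{:crux.kv/id :article-1
 :tags       #{:clojure :databases}}   ; multiple values for one attribute

;; The proposed extension to actual collection types:
{:crux.kv/id :article-2
 :authors    ["Ada" "Grace"]   ; ordered list
 :metadata   {:lang "en"}}     ; nested map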

History API

It should be possible to read the entire history for a set of entities.

This does not necessarily mean that we need to support queries over the history database.
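A hypothetical call shape, just to make the scope concrete - the function name is illustrative:

;; Full version history for a set of entities, each version carrying its
;; transaction/business times, without general queries over history:
(for [eid [:user-1 :user-2]]
  [eid (entity-history node eid)])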

Read Your Own Writes

There should be a way to ensure you can easily read what you just wrote.

When transacting, one could get some form of transaction context (transaction id, transaction time) back, and queries can block until the node has caught up to this point.
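A sketch of that flow, with hypothetical function names:

(let [tx (submit-tx node [[:crux.tx/put doc]])]  ; returns {:tx-id ... :tx-time ...}
  (await-tx node tx)   ; block until this node has indexed the transaction
  (q node query))      ; guaranteed to observe the write just made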

Micro-bench as Part of Build

We want the benchmarks to be runnable as part of the build, potentially in CI. They should be canary-style tests that can tell us if something suddenly got slow.

Index-based Query Backend

This new query backend will be based straight off the indexes and use worst-case optimal joins.

These indexes are designed to be semi-lazy, so they only need to realise full results from an index when sorting is necessary (and usually not for the entire query). One goal is to be able to get a lazy seq of results, so that much larger result sets can be handled.

The indexes have implicit support for normal unification joins and ranges, so these features will be kept from the current query capabilities.

Set-based not (!=) should be possible to add. Clause-based not (i.e. not-exists) should also be possible to implement. We also aim to support or. Support for and is implicit, but the operator might be needed for certain scenarios with nesting.

We don't aim to support or-join or not-join yet, as they are a bit quirkier both to understand and to implement.

Function predicates (other than the built-ins) can be added as a post-processing step when realising a single result, but often it should be possible to push them down as a decorator on a specific unary index. Predicates operate on the ground facts, and need the entity documents to retrieve the attribute values.

This issue is closely related to #10.
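To make the joining idea concrete: the heart of a worst-case optimal join is intersecting several sorted unary indexes one variable at a time. A toy sketch over sorted Clojure seqs (real indexes would seek lazily rather than scan):

(defn intersect-sorted
  "Leapfrog-style intersection of sorted seqs: repeatedly advance the
   lagging seqs to the current maximum head until all heads agree."
  [& xs]
  (lazy-seq
    (when (every? seq xs)
      (let [heads (map first xs)
            m     (apply max heads)]
        (if (apply = heads)
          (cons m (apply intersect-sorted (map rest xs)))
          (apply intersect-sorted
                 (map #(drop-while (fn [v] (< v m)) %) xs)))))))

;; (intersect-sorted [1 3 5 7] [3 4 5 9] [5 7]) ;=> (5)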

Continuous Integration

Once we have #1965 and #1972. Possibly using AWS CodeDeploy.

Connect with CircleCI.

Redeployment should be manual, not automatic, as long as we're all pushing to master with wild abandon. People might be using the deployed node to do tests; it shouldn't go down every time someone fixes a typo.

Datalog Console

For testing in the cloud: a simple web app presenting a Datalog console, using an embedded Crux, connecting to a provisioned Kafka cluster.

This is a tool for development now, that may or may not graduate into part of the actual product.
