unum-cloud / ustore Goto Github PK

Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

Home Page: https://unum-cloud.github.io/ustore/

License: Apache License 2.0

CMake 3.35% C 7.46% C++ 81.27% Python 5.96% Java 0.92% Shell 0.30% Dockerfile 0.50% Jupyter Notebook 0.24%

acid database key-value-store document-database graph-database json apache-arrow arrow networkx pandas

ustore's Introduction

UStore

Modular ¹ Multi-Modal ² Transactional ³ Database
For Artificial Intelligence ⁴ and Semantic Search ⁵

1. supports: RocksDB • LevelDB • UDisk • UCSet backends
2. can store: Blobs • Documents • Graphs • 🔜 Features • 🔜 Texts
3: guarantees Atomicity • Consistency • Isolation • Durability
4: comes with Pandas and NetworkX API and 🔜 PyTorch data-loaders
5: brings vector-search integrated with USearch and UForm

drivers: Python • C • C++ • GoLang • Java
packages: PyPI • CMake • Docker Hub

Youtube intro • Discord chat • Full documentation

Quickstart

Installing UStore is a breeze, and the usage is about as simple as a Python dict.

$ pip install ukv
$ python

from ukv import umem

db = umem.DataBase()
db.main[42] = 'Hi'

We have just create an in-memory embedded transactional database and added one entry in its main collection. Would you prefer that data on disk? Change one line.

from ukv import rocksdb

db = rocksdb.DataBase('/some-folder/')

Would you prefer to connect to a remote UStore server? UStore comes with an Apache Arrow Flight RPC interface!

from ukv import flight_client

db = flight_client.DataBase('grpc://0.0.0.0:38709')

Are you storing NetworkX-like MultiDiGraph? Or Pandas-like DataFrame?

db = rocksdb.DataBase()

users_table = db['users'].table
users_table.merge(pd.DataFrame([
    {'id': 1, 'name': 'Lex', 'lastname': 'Fridman'},
    {'id': 2, 'name': 'Joe', 'lastname': 'Rogan'},
]))

friends_graph = db['friends'].graph
friends_graph.add_edge(1, 2)

assert friends_graph.has_edge(1, 2) and \
    friends_graph.has_node(1) and \
    friends_graph.number_of_edges(1, 2) == 1

Function calls may look identical, but the underlying implementation can be addressing hundreds of terabytes of data placed somewhere in persistent memory on a remote machine.

Is someone else concurrently updating those collections? Bundle your operations to guarantee consistency!

db = rocksdb.DataBase()
with db.transact() as txn:
    txn['users'].table.merge(...)
    txn['friends'].graph.add_edge(1, 2)

So far we have only covered the tip of the UStore. You may use it to...

Get C99, Python, GoLang, or Java wrappers for RocksDB or LevelDB.
Serve them via Apache Arrow Flight RPC to Spark, Kafka, or PyTorch.
Store Document and Graphs in embedded DB, avoiding networking overheads.
Tier DBMS between in-memory and persistent backends under one API.

But UStore can more. Here is the map:

Basic Usage:
- Modalities
  - Storing Blobs
  - Storing Documents
  - Storing Graphs
  - Storing Vectors
- Drivers
  - For Python ∆
  - For C ∆
  - For C++ ∆
  - For GoLang ∆
  - For Java ∆
- AI Usecases ∆
- Frequently Questioned Answers
- Frequently Asked Questions
Advanced Usage for production, performance tuning, and administration:
For contributors and advanced users looking to fork, extend, wrap, or distribute and, potentially, monetize alternative builds of UStore:

## Basic Usage

UStore is intended not just as database, but as "build your database" toolkit and an open standard for NoSQL potentially-transactional databases, defining zero-copy binary interfaces for "Create, Read, Update, Delete" operations, or CRUD for short.

A few simple C99 headers can link almost any underlying storage engine to numerous high-level language drivers, extending their support for binary string values to graphs, flexible-schema documents, and other modalities, aiming to replace MongoDB, Neo4J, Pinecone, and ElasticSearch with a single ACID-transactional system.

Redis, for example, provides RediSearch, RedisJSON, and RedisGraph with similar objectives. UStore does it better, allowing you to add your favorite Key-Value Stores (KVS), embedded, standalone, or sharded, such as FoundationDB, multiplying its functionality.

Modalities

Blobs

Binary Large Objects can be placed inside UStore. The performance will vastly vary depending on the used underlying technology. The in-memory UCSet will be the fastest, but the least suited for larger objects. The persistent UDisk, when properly configured, can entirely bypass the the Linux kernel, including the filesystem layer, directly addressing block devices.

Modern persistent IO on high-end servers can exceed 100 GB/s per socket when built on user-space drivers like SPDK. This is close to the real-world throughput of high-end RAM and unlocks new, uncommon to databases use cases. One may now put a Gigabyte-sized video file in an ACID-transactional database, right next to its metadata, instead of using a separate object store, like MinIO.

Documents

JSON is the most commonly used document format these days. UStore document collections support JSON, as well as MessagePack, and BSON, used by MongoDB.

UStore doesn't scale horizontally yet, but provides much higher single-node performance, and has almost linear vertical scalability on many-core systems thanks to the open-source simdjson and yyjson libraries. Moreover, to interact with data, you don't need a custom query language like MQL. Instead we prioritize open RFC standards to truly avoid vendor locks:

JSON Pointer: RFC 6901 to address nested fields.
JSON Patch: RFC 6902 for field-level updates.
JSON MergePatch: RFC 7386 for document-level updates.

Graphs

Modern Graph databases, like Neo4J, struggle with large workloads. They require too much RAM, and their algorithms observe data one entry at a time. We optimize on both fronts:

Using delta-coding to compress inverted indexes.
Updating classical graph algorithms for high-latency storage to process graphs in Batch-like or Edge-centric fashion.

Vectors

Feature Stores and Vector Databases, like Pinecone, Milvus, and USearch provide standalone indexes for vector search. UStore implements it as a separate modality, on par with Documents and Graphs. Features:

8-bit integer quantization.
16-bit floating-point quantization.
Cosine, Inner Product, and Euclidean metrics.

Drivers

UStore for Python and for C++ look very different. Our Python SDK mimics other Python libraries - Pandas and NetworkX. Similarly, C++ library provides the interface C++ developers expect.

As we know, people use different languages for different purposes. Some C-level functionality isn't implemented for some languages. Either because there was no demand for it, or as we haven't gotten to it yet.

Name	Transact	Collections	Batches	Docs	Graphs	Copies
C99 Standard	✓	✓	✓	✓	✓	0

C++ SDK	✓	✓	✓	✓	✓	0
Python SDK	✓	✓	✓	✓	✓	0-1
GoLang SDK	✓	✓	✓	✗	✗	1
Java SDK	✓	✓	✗	✗	✗	1

Arrow Flight API	✓	✓	✓	✓	✓	0-2

Some frontends here have entire ecosystems around them! Apache Arrow Flight API, for instance, has its own drivers for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby and Rust.

Frequently Questioned Answers

Keys are 64-bit integers, by default. Why?
Values are binary strings under 4 GB long. Why?

Frequently Asked Questions

Transactions are ACI(D) by-default. What does it mean?
Why not use LevelDB or RocksDB interface? Answered
Why not use SQL, MQL or CYPHER? Answered
Does UStore support Time-To-Live? Answered
Does UStore support compression? Answered
Does UStore support queues? Answered
How can I add drivers for language X? Answered
How can I add database X as an engine? Answered

Advanced Usage

Engines

Following engines can be used almost interchangeably. Historically, LevelDB was the first one. RocksDB then improved on functionality and performance. Now it serves as the foundation for half of the DBMS startups.

	LevelDB	RocksDB	UDisk	UCSet
Speed	1x	2x	10x	30x
Persistent	✓	✓	✓	✗
Transactional	✗	✓	✓	✓
Block Device Support	✗	✗	✓	✗
Encryption	✗	✗	✓	✗
Watches	✗	✓	✓	✓
Snapshots	✓	✓	✓	✗
Random Sampling	✗	✗	✓	✓
Bulk Enumeration	✗	✗	✓	✓
Named Collections	✗	✓	✓	✓
Open-Source	✓	✓	✗	✓
Compatibility	Any	Any	Linux	Any
Maintainer	Google	Facebook	Unum	Unum

UCSet and UDisk are both designed and maintained by Unum. Both are feature-complete, but the most crucial feature our alternatives provide is performance. Being fast in memory is easy. The core logic of UCSet can be found in the templated header-only ucset library.

Designing UDisk was a much more challenging 7-year long endeavour. It included inventing new tree-like structures, implementing partial kernel bypass with io_uring, complete bypass with SPDK, CUDA GPU acceleration, and even a custom internal filesystem. UDisk is the first engine to be designed from scratch with parallel architectures and kernel-bypass in mind.

Transactions

Atomicity

Atomicity is always guaranteed. Even on non-transactional writes - either all updates pass or all fail.

Consistency

Consistency is implemented in the strictest possible form - "Strict Serializability" meaning that:

reads are "Serializable",
writes are "Linearizable".

The default behavior, however, can be tweaked at the level of specific operations. For that the ::ustore_option_transaction_dont_watch_k can be passed to ustore_transaction_init() or any transactional read/write operation, to control the consistency checks during staging.

	Reads	Writes
Head	Strict Serial	Strict Serial
Transactions over Snapshots	Serial	Strict Serial
Transactions w/out Snapshots	Strict Serial	Strict Serial
Transactions w/out Watches	Strict Serial	Sequential

If this topic is new to you, please check out the Jepsen.io blog on consistency.

Isolation

	Reads	Writes
Transactions over Snapshots	✓	✓
Transactions w/out Snapshots	✗	✓

Durability

Durability doesn't apply to in-memory systems by definition. In hybrid or persistent systems we prefer to disable it by default. Almost every DBMS that builds on top of KVS prefers to implement its own durability mechanism. Even more so in distributed databases, where three separate Write Ahead Logs may exist:

in KVS,
in DBMS,
in Distributed Consensus implementation.

If you still need durability, flush writes on commits with an optional flag. In the C driver you would call ustore_transaction_commit() with the ::ustore_option_write_flush_k flag.

Containers and Cloud Deployments

The entire DBMS fits into a sub 100 MB Docker image. Run the following script to pull and run the container, exposing Apache Arrow Flight server on the port 38709. Client SDKs will also communicate through that same port, by default.

docker run -d --rm --name ustore-test -p 38709:38709 unum/ustore

The default configuration file can be retrieved with:

cat /var/lib/ustore/config.json

The simplest way to connect and test would be the following command:

python ...

Pre-packaged UStore images are available on multiple platforms:

Docker Hub image: v0.7.
RedHat OpenShift operator: v0.7.
Amazon AWS Marketplace images:
- Free Community Edition: v0.4.
- In-Memory Edition: 🔜
- Performance Edition: 🔜

Don't hesitate to commercialize and redistribute UStore.

Configuration

Tuning databases is as much art as it is science. Projects like RocksDB provide dozens of knobs to optimize the behavior. We allow forwarding specialized configuration files to the underlying engine.

{
    "version": "1.0",
    "directory": "./tmp/"
}

We also have a simpler procedure, which would be enough for 80% of users. That can be extended to utilize multiple devices or directories, or to forward a specialized engine config.

{
    "version": "1.0",
    "directory": "/var/lib/ustore",
    "data_directories": [
        {
            "path": "/dev/nvme0p0/",
            "max_size": "100GB"
        },
        {
            "path": "/dev/nvme1p0/",
            "max_size": "100GB"
        }
    ],
    "engine": {
        "config_file_path": "./engine_rocksdb.ini",
    }
}

Database collections can also be configured with JSON files.

Key Sizes

As of the current version, 64-bit signed integers are used. It allows unique keys in the range from [0, 2^63). 128-bit builds with UUIDs are coming, but variable-length keys are highly discouraged. Why so?

Using variable length keys forces numerous limitations on the design of a Key-Value store. Firstly, it implies slow character-wise comparisons — a performance killer on modern hyperscalar CPUs. Secondly, it forces keys and values to be joined on a disk to minimize the needed metadata for navigation. Lastly, it violates our simple logical view of KVS as a "persistent memory allocator", putting a lot more responsibility on it.

The recommended approach to dealing with string keys is:

Choose a mechanism to generate unique integer keys (UID). Ex: monotonically increasing values.
Use "paths" modality build up a persistent hash map of strings to UIDs.
Use those UIDs to address the rest of the data in binary, document and graph modalities.

This will result in a single conversion point from string to integer representations and will keep most of the system snappy and the C-level interfaces simpler than they could have been.

Value Sizes

We can only address 4 GB values or smaller as of the current now. Why? Key-Value Stores are generally intended for high-frequency operations. Frequently (thousands of times each second), accessing and modifying 4 GB and larger files is impossible on modern hardware. So we stick to smaller length types, making using Apache Arrow representation slightly easier and allowing the KVS to compress indexes better.

Roadmap

Our development roadmap is public and is hosted within the GitHub repository. Upcoming tasks include:

Read full roadmap in our docs here.

ustore's People

Contributors

Stargazers

Watchers

ustore's Issues

Supporting in-document IDs and joined doc imports/exports

Benefits

Java, GoLang and many other bindings will receive "upsert" functionality with just a single char const * argument.
Similarly, streaming exports can emplace ID into the packed document, to simplify post-processing for user.
This form is compatible with Mongo DB and Elastic stack, which are behind in terms of Apache Arrow adoption.

Changes

If no docs_count is set:
- if the format is JSON - we count the newlines.
- we need to have at least the first length variable set.
Every input document is checked to contain an integer-castable top-level _id field.

Generating UUIDs

When inserting new objects without a known ID, the user would first call a separate function allocating a range of new IDs and would only then proceed to ukv_write or ukv_docs_write.

Refactoring `strided_range_gt < ukv_octet_t >`

Remove STL PMR dependency for compatibility

Query K8s for the network topology

There is an official C SDK for that: https://github.com/kubernetes-client/c

Batch document insertions via Pandas interface

The collection.table.update(df) must insert-or-assign entries to the current collection, building documents on the fly.
A single reference nlohmann::json can be built for every input arrow::Table, which will then be repeatedly updated with every row contents and dumped onto a tape, which will later be passed to ukv_docs_write. To make the process more efficient, the nlohmann::json may be instantiated with std::string_view keys and strings, avoiding any copies. Alternatively, we can probably directly use the underlying serialization engine without instantiating any associative structures in-memory.

GoLang Packaging

Inconsistent usage arena usage

Currently, safe_vector_gt and growing_tape_t both use arenas but receive them in different forms - pointer and reference. Furthermore, they shouldn't be default-constructible without that argument.

PyBind Batch Insertions

Stress-testing the serializability of transactions

Pandas Interface

Converting `ukv_doc_field_msgpack_k` to and from JSON in Docs

Typo detected in README.md

- Futhermore
+ Furthermore

https://github.com/unum-cloud/UKV/blob/0141d607a177fc1a36eb2536771b156619072242/README.md?plain=1#L219

Implement import/export tool for CSV/Parquet/JSON Arrow datasets

See 7a91352 for interface placeholder. Library should be implemented using Apache Arrow datasets functionality. The CLI interface will be added later.

Extend tests in C++

Valid cases:

roundtrips with batches of entries mixed between different collections.
listing present collection names.
overlapping graph transactions.

Invalid cases:

removing the default nameless collection.
passing banned options combinations.
passing null values with non-zero lengths.

PyBind Iterators & Streams

Refactoring `unit.cpp` tests to use `triplet_t`

Listing collection names in C++

The db_t must add a member function expected_gt<cols_t> db_t::collections(), where cols_t would be:

struct cols_t {
    span_gt<col_t> ids;
    strings_tape_iterator_t names;
};

Then db_t::clear() can be refactored to use new features.

The `noexcept` document patching can throw

Sections like this shouldn't create temporary objects and must use our arenas.

            modify_field(original_doc,
                         value,
                         (std::string(field) + yyjson_mut_get_str(path)).c_str(),
                         ukv_doc_modify_insert_k,
                         c_error);

Refactoring `span_gt` and `indexed_range_gt`

The span_gt was accidentally added to helpers.
Similar functionality was already provided by the indexed_range_gt in ukv/cpp/ranges.hpp.
Those should be merged into a single class.

Add Docker image for testing

That image must run tests for all bindings across all languages.

Add tests for shared memory exports

In unit.cpp we must create a child process, read into shared memory and check that the supplied pointed is available from a child process. The normal read must fail in that setting.

Empty writes should allow flushes

When an empty write request is received, we just return success without checking if a flush was requested.
That must change in all the engines.

LevelDB wrapper doesn't implement Snapshots

... Even though those are natively supported by LevelDB

The `docs_modify` unit tests don't check return values

Normalizing argument order

Currently, scans, reads, and writes may have a different order of arguments.
The suggested order of output arguments should be count, indicators, offsets, lengths, and data.
Going from less info to more.

Arrow + PyBind11

To allow zero-copy passage of PyArrow tables/data-frames, we need some more dependencies: https://stackoverflow.com/a/57907044/2766161

Our arenas must be used to avoid allocations in `yyjson`

The yyjson_read_opts and yyjson_val_write_opts are already implemented, but they seem to be unstable.

Using `struct`s for all the function interfaces

Some of our function calls are too mouthful with up to 26 arguments, most of which are optional and have default values. ukv_paths_match is a great example. We should consider packing them into structs and passing by a pointer to the underlying engine.

Translating Python Dictionaries into internal document format directly

For Document collections a list[dict] can be assigned to some keys. Those should be converted directly into internal format (often MsgPack) in the C++ layer.

Add JSON-Patching & Merging tests

Examples: https://github.com/json-patch/json-patch-tests

Docs update and insert can avoid full document reads

Currently both ukv_doc_modify_update_k and ukv_doc_modify_insert_k pull the entire document. They, however, only need presence indicators. That will use less memory and work much faster with engines, that can query keys without retrieving values.

Arrow Flight RPC Bindings

Converting `ukv_doc_field_bson_k` to and from JSON in Docs

Implement tabular benchmarks in Python and C++

The benchmark will be receiving a directory of Parquet, CSV, Avro or Orca files and exporting them simultaneously into graph, document, and, potentially, path collections. The latter can also be added into the twitter benchmark.

Implement baseline K Nearest Neighbors Search

Extend GoLang bindings to support Transactions, Batch and Document operations

Fixing Python batch document read & writes

At this point, the batch document reads and document writes are semantically wrong. They are internally replacing a batch request with a number of separate requests, which changes the expected behavior.

Worse than that, the implementation parses/converts all inputs at once, but serializes and writes them one-by-one. A proper implementation would create something like a growing_tape_t and parse+convert+serialize entries one-by-one, but would submit those for write just once!

Bringing back the ukv_format_docs_internal_k would allow us to immediately choose the optimal underlying serialization format to avoid following conversions in the backend.

Smarter bulk & unordered scans

Currently ukv_scan is only working for fully consistent sorted exported of keys from collections.
With the bulk flag we allow prioritizing throughput over consistency, but a point can be made, that ML-like pipelines don’t need any dependency in operations whatsoever. Instead they may use scans to uniformly random-sample entries, which would in turn require a full scan of keys. If the user leaves start_key unset, we can perform the bulk sampling behind the curtains ourselves.
It will make the interface more ugly by making a function dual-use, but will keep the interface short. Worth considering.

New operation-level memory-management options

By default, every new call to UKV discards previous contents of arena. We may want to keep it to reuse the same iterators and underlying memory for key-value pairs scans.
Similarly, for bulk analytical applications, we may want to add a flag that grows arena on-demand, if memory is available.

Implement `ukv_option_scan_sample_k` for `ukv_scan`

In LevelDB and RocksDB backends we are forced to scan the whole collection performing reservoir-sampling along the way. With in-memory containers it is somewhat cheaper. Unum KVS and consistent_set will support this feature natively. This feature has to be implemented in the engines and be propagated up to the Python level.

Configurations for LevelDB, RocksDB and their collections

Such configurations should control the exact behavior of separate collections and will differ depending on modality:

document collections may constrain types of JSON members under given JSON Pointers.
graph collections may enable or disable multi-graph support, directedness, and graph-specific compression.
vector collections may tune hyper-parameters affecting tradeoffs between search quality and speed.
binary collections may hint expected blob sizes that may affect the underlying compaction policy of LSM trees.

Aside from those, the more obvious modality-agnostic configurations would include:

manually set memory budgets for caching in persistent variants.
custom directory paths to support better tiering between SSDs and HDDs connected to the same node.

Most DBMS engines don't provide any space to store collection-specific metadata, so this will need a separate persisting mechanism. The simplest solution would be to use JSON files, as we already include two JSON parsers in our builds.

Extending support for `ukv_col_drop`

Following functions should be added, that would use ukv_col_drop under the surface:

status_t col_t::clear() noexcept, which would remove the keys and values.
status_t col_t::clear_values() noexcept, which would remove just the values with ukv_col_drop_vals_k.
status_t graph_ref_t::clear_edges() noexcept, which would clear_values() the underlying collection.
status_t graph_ref_t::clear() noexcept, which would clear() the underlying collection.

Same for the matching interfaces in Python:

Collection.clear
Collection.remove
DataBase.clear

Arrow Client needs additional arena and mutex for DB-wide operations

Following methods:

ukv_collection_init
ukv_collection_drop
ukv_transaction_init
ukv_transaction_commit

Currently have no arena to wrap like the other methods do:

    ar::Status ar_status;
    arrow_mem_pool_t pool(arena);
    arf::FlightCallOptions options = arrow_call_options(pool);

For that rpc_client_t must gain a mutex and an arena.

Twitter benchmark refactoring

Dynamic batch size (as a multiple of copies_per_tweet_k) must be allowed in docs_upsert
The full graph must be imported in the index_file
All insertions must be imported in transactions, to avoid state corruption

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.