Giter Club home page Giter Club logo

schrodingerdb's Introduction

SchrödingerDB

Maybe it's there, maybe it's not!

Overview

This is a toy project that is designed to be a learning project for people that I mentor in my company to understand how databases work under the hood.

Architecture

SchrödingerDB is a simple key-value store using a Hash Index.

Like most client-server databases, the following components must be implemented:

block-beta
columns 1
  block
    columns 1
    Transport
    space
    Transport --> QueryProcessor
    space
    QueryProcessor --> ExecutionEngine
    space
    ExecutionEngine --> StorageEngine
  end
Loading

Transport

Client communication

The DB must open a client communication port on a TCP port to handle Redis client requests.

Cluster communication

No cluster communication is planned for now.

Query Processor

The query processor is responsible for parsing, validating and interpreting queries. No RBAC will be implemented, so no access checks need to be performed afterward.

The DB will leverage RESP (Redis serialization protocol) because this protocol is well documented, and we will be able to use a simple REDIS CLI to test the DB.

Regarding the version, RESP2 is enough for our limited use case.

Query Parser

The DB should accept a limited subset of Redis/Valkey queries. A query planner is not required as the DB only supports point queries based on a single index.

Query Optimizer

No query optimizer is required as queries will be basic and straightforward.

Execution Engine

Local Execution

Remote Execution

No remote execution is planned for now.

Storage Engine

Constraint: The storage engine must be pluggable. Even if there is a single canonical implementation, a pluggable architecture allows developer to mock the storage engine during tests (spy, fault injection, ...).

Transaction Manager

No required for now. If we want to implement a Transaction-capable database, then using FoundationDB seems to most interesting option.

Lock Manager

We may want to implement a limited concurrency control mechanism.

Storage Structure Manager

This component is the one on which we will focus on. It organizes data on disk and manages data retrieval.

Buffer Manager

This component should maintain a cache of data pages.

Recovery Manager

A basic recovery manager should be implemented, not all edge cases will be covered.

Features

Writes

The database uses append-only data files, meaning that once a key-value pair is written to the database, the data file is never modified. Hence, challenges like update and deletion arise.

flowchart TD
    A([Client]) -->|APPEND key value| B[(Database)]
    B --> C{UPSERT}
    C --> |O_APPEND| D[fa:fa-file LogFile]
    C --> |ADD key offset| E[HashTable]
Loading

Now, what if we use multiple files (segments) instead of one?

  • What are the challenges?
  • How can we solve them?

Reads

flowchart TD
    A([Client]) -->|GET key| B[(Database)]
Loading

Exercise 1: complete the GET flowchart

Roadmap

  • Append-only data files
  • Indexing
  • Log file for durability
  • Query language
  • Partitioning
  • Sharding
  • Transactions
  • Replication
  • Clustering

Run the app

To run the project schrodingerDB you will need to have maven and a java SDK installed

mvn clean compile
mvn test
mvn package
java -jar target/schrodingerdb-<Version>.jar

schrodingerdb's People

Contributors

rhardouin avatar github-hugo-vincent avatar hy0g0 avatar

Stargazers

 avatar

Watchers

 avatar

schrodingerdb's Issues

Use a thread pool to handle client connections

The current implementation spawns one thread per client. This works when there are just a few of clients, but it doesn't scale. The number of threads should be fixed. It can be hardcoded for now, e.g set to the number of cores: Runtime.getRuntime().availableProcessors().

Implement in-memory GET/SET commands

A redis client must be able to SET and GET a value

SET requirement:

  • SET key value must work

  • All other options should be ignored: [NX | XX] [GET] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | KEEPTTL]

Implement PING Redis command

The DB must listen on the Redis TCP port and answer PONG to the PING command, like Redis.

The TCP port can be hardcoded to 6379, no need to make it configurable for now.

Handle array of bulk strings commands

Context

Up to now, the database handles "Inline commands" to easily test it with telnet or nc. However, a Redis client does not send inline commands as per the Redis documentation:

Clients send commands to a Redis server as an array of bulk strings. The first (and sometimes also the second) bulk string in the array is the command's name. Subsequent elements of the array are the arguments for the command

Task

SchrodingerDB should work with a standard Redis client, for instance:

❯ docker run --network=host --rm redis redis-cli PING

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.