chrislessard / lsm-tree

45.0 45.0 12.0 263 KB

An implementation of an LSM Tree in Python

Python 100.00%

lsm-tree's People

Contributors

Stargazers

Watchers

Forkers

sparklyys thyfate aiodb darionrichie ericyeungcode manosanaggh chang-lehung iamluisgb mlkimmins hubdwoo pavankoppineni nilay103

lsm-tree's Issues

Search segments with binary search

When the index and bloom filter fail, it would be smart to leverage the sorted nature of the SSTables to perform a binary search for the key in question on each segment.

This could be done either by loading the segment keys into memory or by performing the search in place.
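
A minimal sketch of the in-memory variant, assuming each segment line has the form `key,value` and that lines are sorted by key. The line format and the names `parse_segment` and `search_segment` are hypothetical and not taken from the repository.

```python
from bisect import bisect_left


def parse_segment(lines):
    """Split sorted 'key,value' lines into parallel key and value lists."""
    keys, values = [], []
    for line in lines:
        key, value = line.rstrip("\n").split(",", 1)
        keys.append(key)
        values.append(value)
    return keys, values


def search_segment(keys, values, key):
    """Binary search the sorted keys; return the matching value or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


# Hypothetical segment contents, already sorted by key.
keys, values = parse_segment(["apple,1\n", "banana,2\n", "cherry,3\n"])
found = search_segment(keys, values, "banana")
missing = search_segment(keys, values, "durian")
```

Parsing costs O(n) per segment, so this only pays off if the key list is built once and reused across lookups. An in-place search would avoid that memory cost but needs a way to find line boundaries in the file.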

Improve memtable add performance

The red-black tree's add method has been modified to perform updates as well. This is done lazily, by simply searching for a node and updating it before attempting an insertion.

It would be better, performance-wise, to update values in place.
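
One way to read "in place" is a single descent that either overwrites the value of an existing node or attaches a new one, rather than a search followed by a separate insertion. The sketch below shows that idea on a plain binary search tree. The red-black rebalancing is deliberately left out, and the `Node` and `Tree` classes are hypothetical, not the repository's.

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


class Tree:
    """Plain BST (no rebalancing) with an insert-or-update add method."""

    def __init__(self):
        self.root = None
        self.size = 0

    def add(self, key, value):
        """Insert the key, or overwrite its value, in one descent."""
        if self.root is None:
            self.root = Node(key, value)
            self.size = 1
            return
        node = self.root
        while True:
            if key == node.key:
                node.value = value  # existing key: update in place
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, value))
                self.size += 1
                return
            node = child

    def get(self, key):
        """Return the value stored for key, or None if absent."""
        node = self.root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return None
```

In a red-black tree the update branch needs no recoloring or rotation, since the shape of the tree does not change. Only the insertion branch would trigger the usual fix-up.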

Compaction algorithm: no reason to compress individual segments

Since the DB uses a memtable, there's no reason to run the compaction algorithm on a single segment, as an individual segment can't contain duplicate keys. If a duplicate had been written to the memtable, the corresponding node would simply have been updated.
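
If that holds, compaction only has to resolve duplicates across segments. A rough sketch, assuming each segment is a list of `(key, value)` pairs and that segments are passed oldest first. The function name `compact` and this in-memory representation are assumptions for illustration, not the repository's algorithm.

```python
def compact(segments):
    """Merge segments ordered oldest to newest; newer values win."""
    merged = {}
    for segment in segments:
        for key, value in segment:
            merged[key] = value  # a later segment overrides an earlier one
    # Emit a single segment, sorted by key like any other SSTable.
    return sorted(merged.items())


older = [("a", 1), ("b", 2)]
newer = [("b", 3), ("c", 4)]
result = compact([older, newer])
```

This version holds everything in a dict for clarity. A streaming merge over the already sorted segments would do the same job without loading them fully into memory.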

Anticipated inconsistency while flushing memtable to disk

In the function flush_memtable_to_disk, I believe the log entry needs to be written first, before the key-value node is added to the bloom filter and the index.

If there is a failure and the bloom filter is not updated but the key-value node has been added to the index, a key that is present might be ignored.
This is because the bloom filter may return false, indicating that the key is not present.
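
The ordering concern can be sketched as follows. This is not the repository's flush_memtable_to_disk: a Python set stands in for the bloom filter, a list stands in for the on-disk segment, and the names `flush` and `lookup` are hypothetical.

```python
def flush(memtable, segment, bloom, index):
    """Flush key-value pairs in key order: data, then filter, then index."""
    for key, value in sorted(memtable.items()):
        segment.append((key, value))   # 1. write the record itself
        bloom.add(key)                 # 2. let the filter know about the key
        index[key] = len(segment) - 1  # 3. publish the index entry last


def lookup(key, segment, bloom, index):
    """Read path that trusts the filter before consulting the index."""
    if key not in bloom:
        return None  # the filter says the key is absent, so stop here
    offset = index.get(key)
    if offset is None:
        return None
    return segment[offset][1]


segment, bloom, index = [], set(), {}
flush({"b": 2, "a": 1}, segment, bloom, index)

# Simulate the failure from the issue: a record and its index entry exist,
# but the filter was never updated for that key.
segment.append(("z", 9))
index["z"] = len(segment) - 1
```

With this read path, `lookup("z", ...)` returns None even though the record is stored, which is the inconsistency the issue anticipates. Updating the filter before the index means a crash between the two steps leaves, at worst, a filter entry with no index entry, which costs an extra search but does not hide data.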

Ideas to achieve parallelism

Memtable: I don't see much of a problem with multiple memtables existing among workers. Essentially, this just produces multiple SSTable segments on disk, and a dedicated worker will do the compaction anyway.
Append log: What exists in the append log is essentially just a time series of transactions. If a node failure happens, then the next boot should first compact the append log, and then you can start recovering the data lost from the last round's memtable.
Segments: When multiple workers / threads access the same segment, it unfortunately has to be locked for thread safety, and they have to wait for each other.

But from my perspective, it should not be so hard to implement the first two points.
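
The recovery step for the append log could look roughly like this, assuming the log stores one `key,value` entry per line in write order. That format and the name `replay_log` are assumptions made for illustration.

```python
def replay_log(lines):
    """Rebuild a memtable from append-log lines; later entries win."""
    memtable = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split(",", 1)
        memtable[key] = value  # replaying in order compacts duplicates
    return memtable


# Hypothetical log contents: key "a" was written twice before the crash.
recovered = replay_log(["a,1\n", "b,2\n", "a,3\n"])
```

Replaying in write order compacts the log as a side effect, because only the most recent value for each key survives.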

lsm-tree's Issues

Search segments with binary search

Improve memtable add performance

Compaction algorithm: no reason to compress individual segments

Anticipated inconsistency while flushing memtable to disk

Bloom filter isn't resistant to system failure

Changing the BloomFilter parameters should be resilient

Backup index with metadata

Ideas to achieve parallelism
