Giter Club home page Giter Club logo

lsm-tree's People

Contributors

chrislessard avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

lsm-tree's Issues

Search segments with binary search

When the index and bloom filter fail, it would be smart to leverage the sorted nature of the SSTables to perform a binary search for the key in question on each segment.

This could either be done by loading the segment keys into memory, or by performing the search in place

Improve memtable add performance

The red black tree's add method has been modified to perform updates as well. This is done lazily, by simple searching for a node then updating it before attempting addition.

It would be better, performance wise, to update values in place.

Compaction algorithm: no reason to compress individual segments

Since the DB uses a memtable, there's no reason to run the compaction algorithm on a segment as there cant be any duplicate keys in an individual segment. If a duplicate was written to the memtable, the corresponding node simply would have been updated.

Anticipated inconsistency while flushing memtable to disk

In the function flush_memtable_to_disk, I believe that the log needs to be added first before adding key value node to the bloom filter and index.

  • If there is a failure and the bloom filter doesn't work but the key-value node gets added to the index it might lead to ignoring a key that is present.
  • This is because the bloom filter may return false, indicating that the key is not present.

Bloomfilter isnt resistent to system failure

If the system crashes, there isn't a good way of backing up the bloom filter. Because it is checked to see if values are on disk, this could lead to a situation where the recovered DB instance incorrectly reports that keys aren't stored.

A backup of the bloom filter should be saved whenever the memtable is flushed to disk.

Backup index with metadata

Rather than calling the expensive repopulate_index() method on each DB initialization, it would be a good idea to just save the index to disk with the other metadata.

Ideas to achieve parallelism

I have some proposition how to achieve parallelism, if you are still interested in iterating this lib:

Essentially, we could allow each worker / thread have their own memtable and Apendlog.

  • Memtable: I dont see much problem with multiple memtables exist among workers. Essentially, it provides multiple SStable segments on disk, and a dedicated worker will do the compaction any way.
  • Apendlog: What exists in Apendlog are essentially just timeseries of transactions. If node failure happened, then the next boost should first compact the Apendlog, then you can start recovering the lost data from last round memtable.
  • For multiple workers / threads accessing the same segment, this is unfortunately have to be locked for thead-safe, and they have to wait for each other.

But from my perspective, it should not be so hard to make the first two points available.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.