chrislessard / lsm-tree Goto Github PK
View Code? Open in Web Editor NEWAn implementation of an LSM Tree in Python
An implementation of an LSM Tree in Python
When the index and bloom filter fail, it would be smart to leverage the sorted nature of the SSTables to perform a binary search for the key in question on each segment.
This could either be done by loading the segment keys into memory, or by performing the search in place
The red black tree's add method has been modified to perform updates as well. This is done lazily, by simple searching for a node then updating it before attempting addition.
It would be better, performance wise, to update values in place.
Since the DB uses a memtable, there's no reason to run the compaction algorithm on a segment as there cant be any duplicate keys in an individual segment. If a duplicate was written to the memtable, the corresponding node simply would have been updated.
In the function flush_memtable_to_disk
, I believe that the log needs to be added first before adding key value node to the bloom filter and index.
If the system crashes, there isn't a good way of backing up the bloom filter. Because it is checked to see if values are on disk, this could lead to a situation where the recovered DB instance incorrectly reports that keys aren't stored.
A backup of the bloom filter should be saved whenever the memtable is flushed to disk.
It would be good if there was a way to make sure the bloom filter doesn't need to be reset whenever its parameters change
Rather than calling the expensive repopulate_index()
method on each DB initialization, it would be a good idea to just save the index to disk with the other metadata.
I have some proposition how to achieve parallelism, if you are still interested in iterating this lib:
Essentially, we could allow each worker / thread have their own memtable and Apendlog.
But from my perspective, it should not be so hard to make the first two points available.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.