pebblesdb's People

Contributors

aakp10, abhijith97, jayashreemohan29, pandian4github, rescrv, vijay03, xxks-kkk

pebblesdb's Issues

YCSB legacy code in db_bench?

While going through db_bench.cc, I noticed that it ships with some code related to YCSB. What is the status of this part of the code? --benchmarks does not expose a ycsb option, and when I enable the option manually, the program core dumps.

Also, db_bench.cc seems somewhat broken when I run the benchmark with its default settings:

$ ./db_bench
LevelDB:    version 1.17
Date:       Tue Apr 17 18:12:07 2018
CPU:        1 * Intel(R) Core(TM) i5-2435M CPU @ 2.40GHz
CPUCache:   3072 KB
Keys:       16 bytes each
Values:     1024 bytes each (512 bytes after compression)
Entries:    1000000
RawSize:    991.8 MB (estimated)
FileSize:   503.5 MB (estimated)
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
fillseq      :     197.156 micros/op;    5.0 MB/s
fillsync     :    4347.880 micros/op;    0.2 MB/s (1000 ops)
fillrandom   :     234.785 micros/op;    4.2 MB/s
overwrite    :     285.580 micros/op;    3.5 MB/s
readrandom   :      61.021 micros/op; (1000000 of 1000000 found)
readrandom   :      58.686 micros/op; (1000000 of 1000000 found)
readseq      :       7.205 micros/op;  137.7 MB/s
readreverse  :      16.426 micros/op;   60.4 MB/s
lt-db_bench: ./db/dbformat.h:111: leveldb::Slice leveldb::ExtractUserKey(const leveldb::Slice&): Assertion `internal_key.size() >= 8' failed.
Aborted (core dumped)
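
The failing assertion says leveldb::ExtractUserKey received a slice shorter than the 8-byte trailer that LevelDB-family stores append to every user key, i.e. something that is not a valid internal key reached this code path. For context, here is a minimal sketch of the internal-key layout; it mirrors the stock LevelDB format, and I am assuming PebblesDB uses the same 8-byte (sequence, type) trailer:

#include <cassert>
#include <cstdint>
#include <string>

// LevelDB-style internal key: user_key | 8-byte trailer.
// The trailer packs (sequence << 8) | value_type as a little-endian fixed64.
std::string MakeInternalKey(const std::string& user_key,
                            uint64_t sequence, uint8_t type) {
  std::string result = user_key;
  uint64_t packed = (sequence << 8) | type;
  for (int i = 0; i < 8; i++) {  // append little-endian fixed64
    result.push_back(static_cast<char>(packed & 0xff));
    packed >>= 8;
  }
  return result;
}

// Counterpart of ExtractUserKey: strips the trailer. The assertion in the
// crash above fires when the slice is shorter than the 8-byte trailer.
std::string ExtractUserKey(const std::string& internal_key) {
  assert(internal_key.size() >= 8);
  return internal_key.substr(0, internal_key.size() - 8);
}

int main() {
  std::string ikey = MakeInternalKey("user_key", /*sequence=*/42, /*type=*/1);
  assert(ExtractUserKey(ikey) == "user_key");
  return 0;
}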

PebblesDB does not discard partially-flushed values

Verified in:

What happened:
After a power failure while adding values to PebblesDB with the verify_checksums and paranoid_checks parameters set to true, the database gets corrupted. After applying the recovery method suggested in https://github.com/google/leveldb/blob/main/doc/index.md (using RepairDB), a value that was only partially persisted is present.

The root cause of the problem is that some writes to the log file exceed the typical page size of the page cache. This can result in a "torn write" scenario, where only part of the write's payload is persisted, since page-cache pages can be flushed to disk out of order. There are several references to this problem: it was already reported against leveldb in google/leveldb#251 and does not exist in the latest leveldb release (1.23).
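
For intuition on why torn writes corrupt the log: once pages may land on disk out of order, a reader cannot trust file length alone and must checksum each record. The following self-contained sketch illustrates the idea (a toy record format with an FNV-1a checksum, not PebblesDB's actual log format):

#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Toy record: [4-byte length][4-byte checksum][payload]. If a torn write
// persists only some pages of the payload, the checksum no longer matches
// and the reader can reject the record.
uint32_t Checksum(const char* data, size_t n) {
  uint32_t h = 2166136261u;  // FNV-1a, for illustration only
  for (size_t i = 0; i < n; i++) { h ^= (uint8_t)data[i]; h *= 16777619u; }
  return h;
}

bool RecordIsIntact(const std::string& record) {
  if (record.size() < 8) return false;
  uint32_t len, crc;
  std::memcpy(&len, record.data(), 4);
  std::memcpy(&crc, record.data() + 4, 4);
  if (record.size() < 8 + len) return false;        // truncated record
  return Checksum(record.data() + 8, len) == crc;   // torn or corrupt?
}

int main() {
  std::string payload(12288, 'x');                  // spans three 4096-byte pages
  uint32_t len = payload.size();
  uint32_t crc = Checksum(payload.data(), payload.size());
  std::string record(8, '\0');
  std::memcpy(&record[0], &len, 4);
  std::memcpy(&record[4], &crc, 4);
  record += payload;
  assert(RecordIsIntact(record));
  record[8 + 5000] ^= 1;                            // simulate a torn middle page
  assert(!RecordIsIntact(record));
  return 0;
}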

How to reproduce
This issue can be replicated with LazyFS, a file system capable of simulating power failures and the OS behavior mentioned above, i.e., file-system pages being persisted to disk out of order.
The main problem is a 12288-byte write to the file 000003.log. LazyFS will persist portions of this write (in 4096-byte chunks) out of order and then crash, simulating a power failure.
To reproduce the problem, follow these steps (the mentioned files, write_test.cpp, etc., are in this zip: pebblesdb_test.zip):

  1. Mount LazyFS on a directory where the PebblesDB data will be saved, with a specified root directory. Assuming the data path for PebblesDB is /home/pebblesdb/data and the root directory is /home/pebblesdb/data-r, add the following lines to the default configuration file (located at config/default.toml):
[[injection]]
type="split_write"
file="/home/pebblesdb/data-r/000003.log"
persist=[1,3]
parts=3
occurrence=4

These lines define the fault to be injected: a power failure will be simulated after a write to the /home/pebblesdb/data-r/000003.log file. Since this write is large (12288 bytes), it is split into 3 parts (each of 4096 bytes), and only the first and the third parts will be persisted. The occurrence parameter specifies that the fault applies to the fourth write issued to this file.

  2. Start LazyFS with the following command:
    ./scripts/mount-lazyfs.sh -c config/default.toml -m /home/pebblesdb/data -r /home/pebblesdb/data-r -f

  3. Compile and execute the write_test.cpp file, which adds 4 key-value pairs to PebblesDB; the third pair is the only one that exceeds the size of a page in the page cache (a sketch of such a program appears at the end of this issue).

Immediately after this step, PebblesDB will shut down because LazyFS was unmounted, simulating the power failure. At this point, you can analyze the logs produced by LazyFS to see the system calls issued until the moment of the fault. Here is a simplified version of the log:

{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '262144', 'off': '0'}
{'syscall': 'read', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '131072', 'off': '0'}
{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '4096', 'off': '0'}
{'syscall': 'fsync', 'path': '/home/pebblesdb/data-r/000003.log'}
{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '4096', 'off': '0'}
{'syscall': 'fsync', 'path': '/home/pebblesdb/data-r/000003.log'}
{'syscall': 'write', 'path': '/home/pebblesdb/data-r/000003.log', 'size': '12288', 'off': '0'}
  4. Remove the fault from the configuration file and unmount the filesystem with fusermount -uz /home/pebblesdb/data.
  5. Mount LazyFS again with the command given earlier.
  6. Attempt to start PebblesDB (it fails).
  7. Compile and execute the repair.cpp file, which recovers the database.
  8. Compile and execute the read_test.cpp file, which reads and checks the previously inserted values. The value for the key k3 is only part of the initial value.

Note that when paranoid_checks and verify_checksums are set to false, PebblesDB does not fail on restart and discards the partial value of the key k3 (it reports that the key does not exist).
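
For reference, here is a hedged sketch of what write_test.cpp does, reconstructed from the description in step 3; the exact keys, value sizes, and include path are my assumptions, and the real file is in the attached zip:

#include <cassert>
#include <string>
#include "pebblesdb/db.h"  // assumed include path

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  options.paranoid_checks = true;              // as in the report
  leveldb::Status s = leveldb::DB::Open(options, "/home/pebblesdb/data", &db);
  assert(s.ok());

  leveldb::WriteOptions wo;
  wo.sync = true;                              // force the log writes to disk
  db->Put(wo, "k1", std::string(100, 'a'));
  db->Put(wo, "k2", std::string(100, 'b'));
  db->Put(wo, "k3", std::string(8192, 'c'));   // spans more than one page: the torn write
  db->Put(wo, "k4", std::string(100, 'd'));

  delete db;
  return 0;
}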

Create python binding for PebblesDB

I'd like PebblesDB to be accessible from Python and installable via pip. Currently, PebblesDB works well in C++, and somewhat less well from Java (using the LevelDB JNI binding).
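
Since PebblesDB keeps the LevelDB-compatible C++ API, the binding would essentially wrap calls like the following. This is a minimal usage sketch; the include path is an assumption:

#include <iostream>
#include <string>
#include "pebblesdb/db.h"  // assumed include path

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/pebblesdb_demo", &db);
  if (!s.ok()) { std::cerr << s.ToString() << "\n"; return 1; }

  db->Put(leveldb::WriteOptions(), "key", "value");
  std::string value;
  s = db->Get(leveldb::ReadOptions(), "key", &value);
  if (s.ok()) std::cout << value << "\n";  // prints "value"

  delete db;
  return 0;
}

On the Python side, mirroring the surface of an existing LevelDB binding such as plyvel would keep the API familiar to users.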

PebblesDB Crashing with large KV pairs

PebblesDB crashes when running large KV pairs with the YCSB benchmark. I have been trying to make it run, but it keeps crashing when I feed it KV pairs larger than a few tens of KB. If PebblesDB does not support large KV pairs, please mention that here; if it does, how can I work around this?
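
A minimal, self-contained repro would help narrow this down. Based on the description, something like the following sketch (value size, key names, and include path are my assumptions) should trigger the crash once values pass a few tens of KB:

#include <cassert>
#include <string>
#include "pebblesdb/db.h"  // assumed include path

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  assert(leveldb::DB::Open(options, "/tmp/pebblesdb_large_kv", &db).ok());

  // Values of ~64 KB each; the report says sizes past a few tens of KB crash.
  std::string value(64 * 1024, 'x');
  for (int i = 0; i < 10000; i++) {
    leveldb::Status s = db->Put(leveldb::WriteOptions(),
                                "key" + std::to_string(i), value);
    assert(s.ok());
  }
  delete db;
  return 0;
}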

Set max memory used by PebblesDB

Right now, PebblesDB uses a lot of memory for the TableCache (caching metadata) and for the bloom filters used for each sstable.

We want to add a command line option for PebblesDB which would limit the total amount of memory used by PebblesDB for the TableCache and bloom filters.

When allocating the specified amount of memory, preference should be given first to the table cache, and then to the bloom filters of the upper levels (level 0, level 1).
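
No such option exists today, so the following is only a sketch of how a single budget could be split with the stated priority: the table cache is served first, and bloom filters get the remainder (to be spent on level 0, then level 1, and so on). Every name here is hypothetical:

#include <cstddef>

// Hypothetical knobs; nothing below exists in PebblesDB today.
struct MemoryBudget {
  size_t table_cache_bytes;
  size_t bloom_filter_bytes;  // spent on level 0, then level 1, ...
};

// Split a total budget: the table cache is served first, bloom filters
// receive whatever remains.
MemoryBudget SplitBudget(size_t total_bytes, size_t table_cache_need) {
  MemoryBudget b;
  b.table_cache_bytes =
      table_cache_need < total_bytes ? table_cache_need : total_bytes;
  b.bloom_filter_bytes = total_bytes - b.table_cache_bytes;
  return b;
}

int main() {
  // 512 MB total; table cache asks for 128 MB, bloom filters get 384 MB.
  MemoryBudget b = SplitBudget(512u << 20, 128u << 20);
  (void)b;
  return 0;
}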

Some question about VersionSet::PickCompactionForGuards

Why not add all files at level `level` directly to guards_compaction_add_all_files, instead of taking the intersection of complete_guards and guards? I'm a little confused about this code, so I'd appreciate an explanation.
int guard_index_iter = 0;
for (size_t i = 0; i < complete_guards.size(); i++) {
  GuardMetaData* cg = complete_guards[i];
  int guard_index = -1;
  Slice guard_key = cg->guard_key.user_key(), next_guard_key;
  if (i + 1 < complete_guards.size()) {
    next_guard_key = complete_guards[i+1]->guard_key.user_key();
  }

  for (; guard_index_iter < guards.size(); guard_index_iter++) {
    int compare = icmp_.user_comparator()->Compare(guards[guard_index_iter]->guard_key.user_key(), guard_key);
    if (compare == 0) {
      guard_index = guard_index_iter;
      guard_index_iter++;
      break;
    } else if (compare > 0) {
      break;
    } else {
      // Ideally it should never reach here since there are no duplicates in
      // complete_guards and complete_guards is a superset of guards.
    }
  }

  if (guard_index == -1) {  // If guard is not found for this complete guard
    continue;
  }
  GuardMetaData* g = guards[guard_index];
  bool guard_added = false;
  for (unsigned j = 0; j < g->files.size(); j++) {
    FileMetaData* file = g->file_metas[j];
    Slice file_smallest = file->smallest.user_key();
    Slice file_largest = file->largest.user_key();
    if ((i < complete_guards.size()-1  // Not the last guard: check smallest and largest fit in the range
            && (icmp_.user_comparator()->Compare(file_smallest, guard_key) < 0
                || icmp_.user_comparator()->Compare(file_largest, next_guard_key) >= 0))
        || (i == complete_guards.size()-1  // Last guard: check the smallest fits in the guard
            && icmp_.user_comparator()->Compare(file_smallest, guard_key) < 0)) {
      guards_to_add_to_compaction.push_back(g);
      guards_compaction_add_all_files.push_back(true);
      guard_added = true;
      break;  // No need to check other files
    }
  }
  if (!guard_added && which == 0 && (force_compact || v->guard_compaction_scores_[current_level][guard_index] >= 1.0)) {
    guards_to_add_to_compaction.push_back(g);
    guards_compaction_add_all_files.push_back(false);
    continue;
  }
}

VersionSet::RemoveFileLevelBloomFilterInfo isn't thread-safe

How to Reproduce
Run db_test many times; DBTest.MultiThreaded crashes occasionally.

Stack trace

==== Test DBTest.MultiThreaded
[New Thread 0x7fff4cfd9700 (LWP 23220)]
[New Thread 0x7fff4c7d8700 (LWP 23221)]
[New Thread 0x7fff4bfd7700 (LWP 23222)]
... starting thread 0
[New Thread 0x7fff4b7d6700 (LWP 23223)]
... starting thread 1
[New Thread 0x7fff4afd5700 (LWP 23224)]
... starting thread 2
[New Thread 0x7fff4a7d4700 (LWP 23225)]
... starting thread 3

Thread 307 "db_test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4c7d8700 (LWP 23221)]
0x00007ffff7afe106 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) where
#0  0x00007ffff7afe106 in std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00005555555d46e3 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#2  0x00005555555d2919 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase[abi:cxx11](std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#3  0x00005555555d0a64 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#4  0x00005555555cd5e9 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase[abi:cxx11](std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::_Rb_tree_const_iterator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >) ()
#5  0x00005555555c8e3c in std::_Rb_tree<unsigned long, std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*>, std::_Select1st<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase(unsigned long const&) ()
#6  0x00005555555c6273 in std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> > >::erase(unsigned long const&) ()
#7  0x00005555555b9358 in leveldb::VersionSet::RemoveFileLevelBloomFilterInfo(unsigned long) ()
#8  0x0000555555591e7f in leveldb::DBImpl::DeleteObsoleteFiles() ()
#9  0x0000555555595600 in leveldb::DBImpl::BackgroundCompactionGuards(leveldb::FileLevelFilterBuilder*) ()
#10 0x0000555555594dc3 in leveldb::DBImpl::CompactLevelThread() ()
#11 0x000055555559e395 in leveldb::DBImpl::CompactLevelWrapper(void*) ()
#12 0x00005555555e7403 in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#13 0x00007ffff7326494 in start_thread (arg=0x7fff4c7d8700) at pthread_create.c:333
#14 0x00007ffff7068acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
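
The trace shows std::map::erase crashing under DeleteObsoleteFiles, the classic symptom of two threads mutating the same map without synchronization (std::map is not safe for concurrent writes). A minimal sketch of the shape of a fix follows, guarding the bookkeeping map with a mutex; the class and member names are illustrative, not the actual PebblesDB fields:

#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Illustrative stand-in for the file-level bloom-filter bookkeeping
// in VersionSet; all accesses go through one mutex.
class FileLevelBloomFilterTable {
 public:
  void Add(uint64_t file_number, std::string* filter) {
    std::lock_guard<std::mutex> lock(mu_);  // serialize writers
    filters_[file_number] = filter;
  }
  void Remove(uint64_t file_number) {
    std::lock_guard<std::mutex> lock(mu_);  // erase under the same lock
    auto it = filters_.find(file_number);
    if (it != filters_.end()) {
      delete it->second;
      filters_.erase(it);
    }
  }
 private:
  std::mutex mu_;
  std::map<uint64_t, std::string*> filters_;
};

int main() {
  FileLevelBloomFilterTable table;
  table.Add(7, new std::string("filter-bits"));
  table.Remove(7);  // safe even if another thread calls Add concurrently
  return 0;
}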

Fix memory leak when using PebblesDB with small key-value pairs

PebblesDB has a known memory leak when used with a large number of small key-value pairs. For some reason, the leak does not appear when PebblesDB is used with large key-value pairs (the default). It does not affect default behavior, but we would like to fix it going forward.
