pjtatlow / jammdb
Just Another Memory Mapped Database
License: Apache License 2.0
On a multithreaded workload I regularly get a panic when retrieving an item, while writes may be happening (95% read, 5% insert). From what I can tell, it does not occur on single-threaded workloads, and happens more often when using many threads.
There are two kinds of errors I'm getting: a panic in page_node::PageNode::index, and get_bucket returning the Error value BucketMissing out of the blue.
Using https://github.com/marvin-j97/rust-storage-bench, run with:
RUST_BACKTRACE=full cargo run -r -- --out jammdb_test.jsonl --workload task-f --backend jamm-db --fsync --threads 16 --minutes 5 --key-size 8 --value-size 256 --items 100 --cache-size 5000000
You may need to run it multiple times; it's very non-deterministic.
Panic
thread '<unnamed>' panicked at jammdb-0.11.0/src/page_node.rs:69:22:
INVALID PAGE TYPE FOR INDEX: 4
stack backtrace:
0: 0x563f6b08a69c - std::backtrace_rs::backtrace::libunwind::trace::ha637c64ce894333a
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
1: 0x563f6b08a69c - std::backtrace_rs::backtrace::trace_unsynchronized::h47f62dea28e0c88d
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x563f6b08a69c - std::sys_common::backtrace::_print_fmt::h9eef0abe20ede486
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
3: 0x563f6b08a69c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hed7f999df88cc644
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
4: 0x563f6b0b44e0 - core::fmt::rt::Argument::fmt::h1539a9308b8d058d
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
5: 0x563f6b0b44e0 - core::fmt::write::h3a39390d8560d9c9
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
6: 0x563f6b087ccf - std::io::Write::write_fmt::h5fc9997dfe05f882
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
7: 0x563f6b08a484 - std::sys_common::backtrace::_print::h894006fb5c6f3d45
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
8: 0x563f6b08a484 - std::sys_common::backtrace::print::h23a2d212c6fff936
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
9: 0x563f6b08bc37 - std::panicking::default_hook::{{closure}}::h8a1d2ee00185001a
10: 0x563f6b08b99f - std::panicking::default_hook::h6038f2eba384e475
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:292:9
11: 0x563f6b08c0b8 - std::panicking::rust_panic_with_hook::h2b5517d590cab22e
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:779:13
12: 0x563f6b08bf9e - std::panicking::begin_panic_handler::{{closure}}::h233112c06e0ef43e
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
13: 0x563f6b08ab66 - std::sys_common::backtrace::__rust_end_short_backtrace::h6e893f24d7ebbff8
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
14: 0x563f6b08bd02 - rust_begin_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
15: 0x563f6a169d15 - core::panicking::panic_fmt::hbf0e066aabfa482c
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
16: 0x563f6a32080d - jammdb::page_node::PageNode::index::hb7e48e18bd0cd9e3
17: 0x563f6a31ec48 - jammdb::cursor::search::he06cf7d0bde89993
18: 0x563f6a1ef32f - jammdb::bucket::Bucket::get::h648e73f8fb9f722d
19: 0x563f6a2018d3 - worker::db::DatabaseWrapper::get::h286a7b65683b822d
Not a panic, but a non-deterministic Err value (the bucket is definitely not missing, given that millions of reads before it succeeded):
thread '<unnamed>' panicked at src/worker/db.rs:196:52:
called `Result::unwrap()` on an `Err` value: BucketMissing
I think a read-only transaction should be db.tx(false), right?
`// open the existing database file
let db = DB::open("my-database.db")?;
// open a read-only transaction to get the data
let tx = db.tx(true)?;`
Hi,
Is there any way to list all the buckets in a DB?
May 04 17:38:59 c003-n3 on-prem-agent[25208]: thread 'tokio-runtime-worker' panicked at 'assertion failed: self.meta.root_page == page_id || self.page_parents.contains_key(&page_id)', /home/REDACTED/.cargo/registry/src/github.com-1285ae84e5963aae/jammdb-0.5.0/src/bucket.rs:723:17
Hi folks, I found that range does not work as expected. In the example, range only yields one element (key = 1, val = "1"), but I think the correct range should yield two elements: key = 1 and key = 2.
fn main() {
let db = jammdb::DB::open("foo").unwrap();
let tx = db.tx(true).unwrap();
let b = tx.get_or_create_bucket("foo").unwrap();
b.put(1u64.to_be_bytes(), "1").unwrap();
b.put(2u64.to_be_bytes(), "2").unwrap();
b.put(3u64.to_be_bytes(), "3").unwrap();
tx.commit().unwrap();
let tx = db.tx(true).unwrap();
let b = tx.get_bucket("foo").unwrap();
for i in b.range(1u64.to_be_bytes().as_slice()..=2u64.to_be_bytes().as_slice()) {
println!("remove key {:?}", i.key());
b.delete(i.key()).unwrap();
}
tx.commit().unwrap();
let tx = db.tx(false).unwrap();
let b = tx.get_bucket("foo").unwrap();
// panic unexpectedly, as the code should remove key in range 1..=2
assert!(b.get(2u64.to_be_bytes().as_slice()).is_none());
}
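A hedged workaround sketch, assuming the early stop comes from deleting keys while the range cursor is still walking the same page: collect the matching keys first, then delete them. A std BTreeMap<Vec<u8>, Vec<u8>> stands in for the bucket and `delete_range` is a hypothetical helper, so this only illustrates the pattern and the expected inclusive-range result; it is not jammdb code. Because the keys are u64::to_be_bytes, byte order matches numeric order, so 1..=2 should cover exactly two keys.

```rust
use std::collections::BTreeMap;

/// Deletes every key in the inclusive byte range `start..=end` and returns the
/// removed keys. Keys are collected first so the map is not mutated while the
/// range iterator is still borrowing it. (Hypothetical helper, not jammdb API.)
fn delete_range(bucket: &mut BTreeMap<Vec<u8>, Vec<u8>>, start: &[u8], end: &[u8]) -> Vec<Vec<u8>> {
    let keys: Vec<Vec<u8>> = bucket
        .range(start.to_vec()..=end.to_vec())
        .map(|(k, _)| k.clone())
        .collect();
    for k in &keys {
        bucket.remove(k);
    }
    keys
}

fn main() {
    let mut bucket = BTreeMap::new();
    for i in 1u64..=3 {
        bucket.insert(i.to_be_bytes().to_vec(), i.to_string().into_bytes());
    }
    let removed = delete_range(&mut bucket, &1u64.to_be_bytes(), &2u64.to_be_bytes());
    println!("removed {} keys", removed.len());
    // Key 2 is gone, matching the assertion in the report above.
    assert!(bucket.get(&2u64.to_be_bytes()[..]).is_none());
}
```

The same two-phase shape (read keys, then delete) should carry over to a jammdb bucket inside one write transaction, but whether it sidesteps this particular bug is untested.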
Thank you @pjtatlow
I tried to clone the DB and pass it through an Arc, but I get [E0596] cannot borrow data in an Arc as mutable. Any idea?
let jdb = jammdb::DB::open("../jammdb").unwrap();
HttpServer::new(move || {
App::new()
.data(jdb.clone())
async fn fn_inituserdb(req: HttpRequest, _jdb: Data<jammdb::DB>) -> Result<String, Error> {
let _tx = _jdb.tx(true).borrow_mut().as_ref().map_err(|e| actix_web::http::StatusCode::BAD_REQUEST);
| ^^^^ cannot borrow as mutable
[E0596] cannot borrow data in an Arc as mutable.
[Note] cannot borrow as mutable
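E0596 appears whenever a method taking `&mut self` is called through an Arc, because an Arc only hands out shared references. A minimal std-only sketch of the two usual options, using a hypothetical `Handle` type in place of jammdb::DB: methods taking `&self` can be called directly through the Arc, while anything needing `&mut self` has to go through a Mutex (or RwLock). Whether jammdb's `tx` takes `&self` or `&mut self` depends on the version in use, so its signature is worth checking first.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a database handle (not jammdb::DB).
struct Handle {
    name: String,
    writes: u64,
}

impl Handle {
    // Takes &self: callable directly through an Arc<Handle>.
    fn name(&self) -> &str {
        &self.name
    }

    // Takes &mut self: calling this through a bare Arc<Handle> fails with E0596.
    fn record_write(&mut self) {
        self.writes += 1;
    }
}

/// Spawns `n` threads that each record one write through a shared Arc<Mutex<Handle>>.
fn run(n: usize) -> u64 {
    let shared = Arc::new(Mutex::new(Handle { name: "db".to_string(), writes: 0 }));
    let workers: Vec<_> = (0..n)
        .map(|_| {
            let h = Arc::clone(&shared);
            // lock() yields a MutexGuard, which derefs mutably to Handle.
            thread::spawn(move || h.lock().unwrap().record_write())
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    // Bind first so the guard is dropped before `shared` goes out of scope.
    let writes = shared.lock().unwrap().writes;
    writes
}

fn main() {
    let read_only = Arc::new(Handle { name: "db".to_string(), writes: 0 });
    println!("{}", read_only.name()); // fine: &self through Arc
    println!("writes = {}", run(4));
}
```

A Mutex serializes every caller, so it is only a reasonable choice if the wrapped method genuinely needs exclusive access.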
The following code panics with:
thread 'main' panicked at 'attempt to subtract with overflow', .../jammdb-0.5.0/src/page.rs:107:25
const TEST_DB: &str = "tmp.db";
fn main() {
let _ = std::fs::remove_file(TEST_DB);
{
let db = jammdb::DB::open(TEST_DB).unwrap();
{
let tx = db.tx(true).unwrap();
let root = tx.get_or_create_bucket("ROOT").unwrap();
tx.commit().unwrap();
}
{
let tx = db.tx(true).unwrap();
let root = tx.get_or_create_bucket("ROOT").unwrap();
let child = root.get_or_create_bucket("CHILD").unwrap();
tx.commit().unwrap(); // panic! is here, page.rs:107: self.overflow = num_pages - 1; and num_pages is 0
}
}
let _ = std::fs::remove_file(TEST_DB);
}
If I do everything in a single transaction, everything is OK:
fn main() {
let _ = std::fs::remove_file(TEST_DB);
{
let db = jammdb::DB::open(TEST_DB).unwrap();
{
let tx = db.tx(true).unwrap();
let root = tx.get_or_create_bucket("ROOT").unwrap();
// remove tx.commit() + db.tx()
let root = tx.get_or_create_bucket("ROOT").unwrap();
let child = root.get_or_create_bucket("CHILD").unwrap();
tx.commit().unwrap();
}
}
let _ = std::fs::remove_file(TEST_DB);
}
Unfortunately, this problem prevents me from using nested buckets at all.
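The panic comment points at `self.overflow = num_pages - 1` with `num_pages == 0`. A hedged sketch of the arithmetic involved, using hypothetical helper names: if a page count is derived by ceiling division of a byte size by the page size, a zero-sized allocation yields 0 pages and the subtraction underflows. Clamping the count to at least one page (or using `saturating_sub`) avoids the panic; whether a zero-page allocation is itself the real bug in jammdb 0.5.0 is not established here.

```rust
/// Number of pages needed to hold `size` bytes, never less than one page.
/// (Hypothetical helper; not jammdb's actual code.)
fn pages_needed(size: u64, page_size: u64) -> u64 {
    // Ceiling division; `max(1)` keeps an empty node from reporting 0 pages.
    ((size + page_size - 1) / page_size).max(1)
}

/// Overflow pages beyond the first one; cannot underflow.
fn overflow_pages(num_pages: u64) -> u64 {
    num_pages.saturating_sub(1)
}

fn main() {
    let page_size = 4096;
    for size in [0u64, 100, 4096, 4097, 10_000] {
        let n = pages_needed(size, page_size);
        println!("size={size} pages={n} overflow={}", overflow_pages(n));
    }
}
```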
Error:
thread 'db::test::test_db' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x7f877000191c', src/index.crates.io-6f17d22bba15001f/jammdb-0.9.0/src/freelist.rs:64:29
Code:
use jammdb::DB;
#[test]
fn test_db() -> Result<(), Box<dyn std::error::Error>> {
let db = DB::open("data/123.db")?;
let tx = db.tx(true)?;
tx.create_bucket("test")?;
tx.commit()?;
Ok(())
}
Version:
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)
rustc 1.71.0-nightly (fec9adcdb 2023-04-21)
jammdb: 0.9.0
system: Linux 6.2.8-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC x86_64 GNU/Linux
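This panic comes from the debug-mode check, added around Rust 1.70, that a pointer is suitably aligned before it is dereferenced: the address 0x7f877000191c is 4-byte aligned but not 8-byte aligned. A minimal sketch of reading 8-byte page ids out of an mmapped byte buffer without assuming alignment, by copying each 8-byte chunk into a u64. The buffer layout (a run of native-endian u64 ids starting at some offset) and the `read_page_ids` name are assumptions for illustration, not jammdb's actual freelist format.

```rust
/// Reads `count` native-endian u64 page ids starting at byte `offset` of `buf`.
/// Works for any offset because each value is copied out byte-wise instead of
/// dereferencing a `*const u64` that might be misaligned.
fn read_page_ids(buf: &[u8], offset: usize, count: usize) -> Vec<u64> {
    buf[offset..offset + count * 8]
        .chunks_exact(8)
        .map(|chunk| {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(chunk);
            u64::from_ne_bytes(bytes)
        })
        .collect()
}

fn main() {
    // 4 bytes of header, then three ids: the ids start at a 4-byte offset,
    // so they are not 8-byte aligned relative to the buffer start.
    let mut buf = vec![0u8; 4];
    for id in [7u64, 42, 1_000_000] {
        buf.extend_from_slice(&id.to_ne_bytes());
    }
    println!("{:?}", read_page_ids(&buf, 4, 3));
}
```

`std::ptr::read_unaligned` is the other standard option when working with raw pointers into the mapping.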
In boltdb you can specify mmap flags to tell the OS to read ahead aggressively for faster reads, as well as file flags to control how the OS buffers writes before committing them to disk. We need to support this in jammdb too.
Hey!
Would it make sense to add an option for an async version via something like maybe_async?
I want to implement this because I want to use this database in an async context, and calling tokio::spawn_blocking or similar for every db call doesn't feel right.
How tolerant is JammDB to random power outages when it is writing data? Is data corruption a risk with JammDB if the machine is unexpectedly stopped mid-write?
Hi,
I got the error below when integrating jammdb with an actix-web app. Any idea how to resolve this?
let mut tx = _jdb.tx(true)?;
| ^ the trait ResponseError is not implemented for jammdb::Error
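The `?` operator converts errors via `From`, and Rust's orphan rule forbids implementing a foreign trait (actix-web's ResponseError) for a foreign type (jammdb::Error) in your own crate. The usual fix is a local newtype wrapper. A std-only sketch of that pattern, with std::io::Error standing in for jammdb::Error and hypothetical `AppError`, `open_tx` and `handler` names; in a real app you would additionally write `impl ResponseError for AppError`, which is omitted here so the example compiles without actix-web.

```rust
use std::fmt;
use std::io;

/// Local error type that wraps the library's error.
/// (Hypothetical name; io::Error stands in for jammdb::Error.)
#[derive(Debug)]
struct AppError(io::Error);

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "database error: {}", self.0)
    }
}

impl std::error::Error for AppError {}

// This From impl is what lets `?` turn the library error into AppError.
impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self {
        AppError(e)
    }
}

// Stand-in for a fallible library call such as opening a transaction.
fn open_tx(fail: bool) -> Result<u32, io::Error> {
    if fail {
        Err(io::Error::new(io::ErrorKind::Other, "tx failed"))
    } else {
        Ok(1)
    }
}

// A handler-shaped function: `?` converts io::Error into AppError.
fn handler(fail: bool) -> Result<String, AppError> {
    let tx = open_tx(fail)?;
    Ok(format!("opened tx {}", tx))
}

fn main() {
    println!("{:?}", handler(false));
    println!("{:?}", handler(true).map_err(|e| e.to_string()));
}
```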
Please describe the complexity bounds of your DB. Are reads/writes bounded? Are they logarithmic?
A few minutes ago I rejected Ozon, another Rust port of BoltDB, because it doubles its capacity, which at the very least uses disk space inefficiently. Does jammdb share this "feature" with Ozon?
P.S. I need a key/value store with bounded or logarithmic reads and writes.
Hi,
I'm coming from Go and was looking for something similar to boltdb.
After finding your project and starting to learn Rust, I wanted to ask whether it would be feasible to implement the typestate pattern for transactions.
Instead of only allowing tx() to be called with a bool, you could also call ro() or rw(), which would return a transaction that doesn't even have the write functions available. That way you could catch these errors at compile time.
If you think this is a good idea, I'd be happy to do some of the grunt work of adding the state to each necessary type.
I don't think I can handle all problems however, so I'd be grateful for some help later on.
I would suggest these 3 TypeStates:
This way, other projects would still work, while the compiler could type-check the transaction type.
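A minimal std-only sketch of the proposed typestate pattern, with hypothetical names (`ReadOnly`, `ReadWrite`, `ro()`, `rw()`) and an in-memory map instead of a real database: `put` is only implemented for `Tx<'_, ReadWrite>`, so calling it on a read-only transaction is a compile error rather than a runtime one. Commit and rollback are left out; writes go straight into the map.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::marker::PhantomData;

// Hypothetical marker types; not part of jammdb's API.
pub struct ReadOnly;
pub struct ReadWrite;

pub struct Db {
    data: RefCell<BTreeMap<Vec<u8>, Vec<u8>>>,
}

/// A transaction whose capabilities are fixed by the `Mode` type parameter.
pub struct Tx<'db, Mode> {
    db: &'db Db,
    _mode: PhantomData<Mode>,
}

impl Db {
    pub fn new() -> Self {
        Db { data: RefCell::new(BTreeMap::new()) }
    }
    pub fn ro(&self) -> Tx<'_, ReadOnly> {
        Tx { db: self, _mode: PhantomData }
    }
    pub fn rw(&self) -> Tx<'_, ReadWrite> {
        Tx { db: self, _mode: PhantomData }
    }
}

// Read methods are available on every transaction mode.
impl<'db, Mode> Tx<'db, Mode> {
    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.db.data.borrow().get(key).cloned()
    }
}

// Write methods exist only for read-write transactions.
impl<'db> Tx<'db, ReadWrite> {
    pub fn put(&self, key: &[u8], value: &[u8]) {
        self.db.data.borrow_mut().insert(key.to_vec(), value.to_vec());
    }
}

fn main() {
    let db = Db::new();
    db.rw().put(b"k", b"v");
    let r = db.ro();
    println!("{:?}", r.get(b"k"));
    // r.put(b"k", b"x"); // does not compile: no method `put` on Tx<ReadOnly>
}
```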
Is there a project design document? Or a description of the database structure design? I want to do something interesting with your implementation, so I need more details.
Hello! Noob question here, as I'm using your project to learn Rust.
Is there a way to read and write in the same transaction in order to do a conditional .put()?
For instance, adding a (key, value) pair only if it is not already present.
Example:
I'm opening a tx at the root, then passing the buckets by reference to some functions. Inside such a function I check whether a key exists: if it does, I return an error, and if not, I do a .put().
match schema_index_kinds_bucket.get(kindName) {
Some(data) => match data {
Data::Bucket(_) => Err(Error::KindExists),
Data::KeyValue(_) => Err(Error::KindExists),
},
None => {
schema_kinds_bucket
.put(kindId.to_be_bytes(), kindName)
.map_err(Error::Database)?;
schema_index_kinds_bucket
.put(kindName, kindId.to_be_bytes())
.map_err(Error::Database)?;
Ok(())
}
}
The error I get, which would make sense if this is not compatible with read-write transactions, is that "schema_kinds_bucket" may not live long enough.
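Check-then-put inside a single write transaction is the usual way to get insert-if-absent semantics, since no other writer can commit in between (assuming jammdb, like BoltDB, allows only one write transaction at a time). A std-only sketch of that control flow, with BTreeMaps standing in for the two buckets and a hypothetical `add_kind` function; it does not reproduce jammdb's bucket lifetimes, which are likely what the "may not live long enough" error is about.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Error {
    KindExists,
}

/// Inserts `kind_id -> kind_name` and the reverse index entry, but only if
/// `kind_name` is not already indexed. (Hypothetical helper, not jammdb API.)
fn add_kind(
    kinds: &mut BTreeMap<Vec<u8>, Vec<u8>>,
    index: &mut BTreeMap<Vec<u8>, Vec<u8>>,
    kind_id: u64,
    kind_name: &str,
) -> Result<(), Error> {
    // Read first: refuse to overwrite an existing kind.
    if index.contains_key(kind_name.as_bytes()) {
        return Err(Error::KindExists);
    }
    // Then write both entries.
    kinds.insert(kind_id.to_be_bytes().to_vec(), kind_name.as_bytes().to_vec());
    index.insert(kind_name.as_bytes().to_vec(), kind_id.to_be_bytes().to_vec());
    Ok(())
}

fn main() {
    let (mut kinds, mut index) = (BTreeMap::new(), BTreeMap::new());
    println!("{:?}", add_kind(&mut kinds, &mut index, 1, "user"));
    println!("{:?}", add_kind(&mut kinds, &mut index, 2, "user"));
}
```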
Hi, I was trying to use jammdb in a project of mine but encountered a segmentation fault after boxing an iterator derived from a jammdb::Cursor.
This is a small example that demonstrates the issue; my-database.db has been initialized with the same content as the example in the README.
use jammdb::{Data, DB};
fn iterate(db: &mut DB) -> Box<dyn Iterator<Item = String>> {
Box::new(
db.tx(false)
.unwrap()
.get_bucket("names")
.unwrap()
.cursor()
.filter_map(|item| match item {
Data::Bucket(_) => None,
Data::KeyValue(kv) => Some(kv),
})
.map(|kv| String::from_utf8(kv.value().to_vec()).unwrap()),
)
}
fn main() {
let mut db = DB::open("my-database.db").unwrap();
for surname in iterate(&mut db) {
println!("{}", surname);
}
}
GDB backtrace:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
core::ptr::non_null::NonNull<T>::as_ref (self=0x90)
at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/ptr/non_null.rs:115
(gdb) bt
#0 core::ptr::non_null::NonNull<T>::as_ref (self=0x90) at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/ptr/non_null.rs:115
#1 0x00005555555918ff in alloc::sync::Arc<T>::inner (self=0x90) at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/liballoc/sync.rs:731
#2 0x0000555555591b0f in <alloc::sync::Arc<T> as core::ops::deref::Deref>::deref (self=0x90) at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/liballoc/sync.rs:982
#3 0x0000555555597c6a in jammdb::transaction::TransactionInner::page (self=0x0, id=93824993075216)
at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/transaction.rs:223
#4 0x000055555556bf2e in jammdb::bucket::Bucket::page_node (self=0x555555622910, page=93824993075216)
at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/bucket.rs:621
#5 0x000055555556f889 in jammdb::cursor::Cursor::seek_first (self=0x555555623000) at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/cursor.rs:204
#6 0x000055555556faac in <jammdb::cursor::Cursor as core::iter::traits::iterator::Iterator>::next (self=0x555555623000)
at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/cursor.rs:232
#7 0x0000555555565cb8 in core::iter::traits::iterator::Iterator::try_fold (self=0x555555623000, init=(), f=...)
at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/traits/iterator.rs:1876
#8 0x0000555555565c4f in core::iter::traits::iterator::Iterator::find_map (self=0x555555623000, f=0x555555623000)
at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/traits/iterator.rs:2207
#9 0x0000555555565515 in <core::iter::adapters::FilterMap<I,F> as core::iter::traits::iterator::Iterator>::next (self=0x555555623000)
at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/adapters/mod.rs:1070
#10 0x000055555556591d in <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::next (self=0x555555623000)
at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/adapters/mod.rs:791
#11 0x0000555555565a0e in <alloc::boxed::Box<I> as core::iter::traits::iterator::Iterator>::next (self=0x7fffffffd1d0)
at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/liballoc/boxed.rs:951
#12 0x00005555555648d5 in jammdb_ub_minimal::main () at src/main.rs:21
Of course it would be pointless to implement such a function; the code above is just a reduced case of where this happens. From what I understand, I guess this is caused by the Tx, Bucket and/or Cursor structs being dropped while the boxed iterator returned by the function still points to memory inside them. I saw there are unsafe blocks in the code, which is why I'm guessing this is what's happening.
Unfortunately, I'm not an experienced Rust user. I tried to look at the code but couldn't easily work out how to fix this; otherwise I would have submitted a PR.
Hope this helps, thanks for the otherwise nice crate!
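If the guess above is right, safe Rust would normally prevent this by tying the iterator's lifetime to the transaction it borrows from. A std-only sketch with a hypothetical in-memory `Tx` (not jammdb's type) shows the two common shapes: returning an iterator whose `'a` bound makes the compiler reject any use after the transaction is dropped, or collecting the values into owned Strings before the transaction goes away.

```rust
/// Hypothetical stand-in for a read transaction that owns a bucket's data.
struct Tx {
    kvs: Vec<(Vec<u8>, Vec<u8>)>,
}

/// The returned iterator borrows `tx`; the `'a` bound lets the compiler
/// reject any use of it after `tx` has been dropped.
fn values<'a>(tx: &'a Tx) -> impl Iterator<Item = String> + 'a {
    tx.kvs
        .iter()
        .map(|(_, v)| String::from_utf8_lossy(v).into_owned())
}

/// Alternative: copy the values out so the result no longer depends on `tx`.
fn collect_values(tx: &Tx) -> Vec<String> {
    values(tx).collect()
}

fn main() {
    let tx = Tx {
        kvs: vec![
            (b"ada".to_vec(), b"Lovelace".to_vec()),
            (b"alan".to_vec(), b"Turing".to_vec()),
        ],
    };
    for surname in collect_values(&tx) {
        println!("{}", surname);
    }
}
```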
The description mentions that the library uses mmap and links to the "Single-level store" article, which suggests that jammdb may give users references pointing into mmapped files.
I expected e.g. jammdb::DB::open to be an unsafe fn, as the full protection against unreasonable things needed to make it completely sound (or even sound enough for practical purposes) may not be viable given the architecture.
But that seems not to be the case: the crate API does not hint at the potential undefined behaviour users may face with the library. For example, what is the worst that could happen if the database file is mangled by an external process while in use? What if multiple processes open the same database for writing (including over a networked filesystem without locking)?
Should the entry function be marked unsafe to make users commit to not doing unreasonable things to the database? Or does jammdb actually handle all possible complications of memory mapping, so that the API is indeed sound even when misused?
I have a boltdb config file generated by a Go program, and I want to read it using jammdb, but it panics when I open it with the following error:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `4`,
right: `3`', /home/jicky/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-be2141875385cea5/jammdb-0.10.0/src/page.rs:69:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)