jammdb's People

Contributors

brennenmm7, geniot, lemmih, pjtatlow, wackbyte

jammdb's Issues

Crashes during multithreaded workload

On a multithreaded workload I regularly get a panic when retrieving an item, while writes may be happening (95% read, 5% insert). From what I can tell, it does not occur on single-threaded workloads, and happens more often when using many threads.

There are two kinds of errors I'm getting:

  • Panic in page_node::PageNode::index
  • get_bucket unexpectedly returns the error value BucketMissing

System

  • Ubuntu 22.04 LTS
  • i9 11900K
  • 32 GB RAM
  • Samsung PM9A3 NVMe SSD

Reproduction?

Using https://github.com/marvin-j97/rust-storage-bench, run with:

RUST_BACKTRACE=full cargo run -r -- --out jammdb_test.jsonl --workload task-f --backend jamm-db --fsync --threads 16 --minutes 5 --key-size 8 --value-size 256 --items 100 --cache-size 5000000

You may need to run it several times, because the failure is very non-deterministic.

Stack trace

Panic

thread '<unnamed>' panicked at jammdb-0.11.0/src/page_node.rs:69:22:
INVALID PAGE TYPE FOR INDEX: 4

stack backtrace:
   0:     0x563f6b08a69c - std::backtrace_rs::backtrace::libunwind::trace::ha637c64ce894333a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x563f6b08a69c - std::backtrace_rs::backtrace::trace_unsynchronized::h47f62dea28e0c88d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x563f6b08a69c - std::sys_common::backtrace::_print_fmt::h9eef0abe20ede486
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x563f6b08a69c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hed7f999df88cc644
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x563f6b0b44e0 - core::fmt::rt::Argument::fmt::h1539a9308b8d058d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
   5:     0x563f6b0b44e0 - core::fmt::write::h3a39390d8560d9c9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
   6:     0x563f6b087ccf - std::io::Write::write_fmt::h5fc9997dfe05f882
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
   7:     0x563f6b08a484 - std::sys_common::backtrace::_print::h894006fb5c6f3d45
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x563f6b08a484 - std::sys_common::backtrace::print::h23a2d212c6fff936
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x563f6b08bc37 - std::panicking::default_hook::{{closure}}::h8a1d2ee00185001a
  10:     0x563f6b08b99f - std::panicking::default_hook::h6038f2eba384e475
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:292:9
  11:     0x563f6b08c0b8 - std::panicking::rust_panic_with_hook::h2b5517d590cab22e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:779:13
  12:     0x563f6b08bf9e - std::panicking::begin_panic_handler::{{closure}}::h233112c06e0ef43e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
  13:     0x563f6b08ab66 - std::sys_common::backtrace::__rust_end_short_backtrace::h6e893f24d7ebbff8
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x563f6b08bd02 - rust_begin_unwind
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
  15:     0x563f6a169d15 - core::panicking::panic_fmt::hbf0e066aabfa482c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
  16:     0x563f6a32080d - jammdb::page_node::PageNode::index::hb7e48e18bd0cd9e3
  17:     0x563f6a31ec48 - jammdb::cursor::search::he06cf7d0bde89993
  18:     0x563f6a1ef32f - jammdb::bucket::Bucket::get::h648e73f8fb9f722d
  19:     0x563f6a2018d3 - worker::db::DatabaseWrapper::get::h286a7b65683b822d

The second failure is not a panic inside jammdb but a non-deterministic Err value (the bucket is definitely not missing, since millions of earlier reads succeeded):

thread '<unnamed>' panicked at src/worker/db.rs:196:52:
called `Result::unwrap()` on an `Err` value: BucketMissing

`range` does not seem to work correctly

Hi folks, I found that range does not work as expected.

In the example below, the range yields only one element (key = 1, value = "1"), but I think it should yield two: key = 1 and key = 2.

fn main() {
    let db = jammdb::DB::open("foo").unwrap();
    let tx = db.tx(true).unwrap();
    let b = tx.get_or_create_bucket("foo").unwrap();
    b.put(1u64.to_be_bytes(), "1").unwrap();
    b.put(2u64.to_be_bytes(), "2").unwrap();
    b.put(3u64.to_be_bytes(), "3").unwrap();
    tx.commit().unwrap();

    let tx = db.tx(true).unwrap();
    let b = tx.get_bucket("foo").unwrap();
    for i in b.range(1u64.to_be_bytes().as_slice()..=2u64.to_be_bytes().as_slice()) {
        // keys are raw bytes, so print them with {:?}
        println!("remove key {:?}", i.key());
        b.delete(i.key()).unwrap();
    }
    tx.commit().unwrap();

    let tx = db.tx(false).unwrap();
    let b = tx.get_bucket("foo").unwrap();

    // This assertion fails unexpectedly: the loop above should have removed every key in 1..=2.
    assert!(b.get(2u64.to_be_bytes().as_slice()).is_none());
}
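
A possible workaround, sketched under assumptions: collect the keys in the range first, then delete them in a second pass, so that nothing is removed while the range is still being walked. The sketch uses a std BTreeMap as a stand-in for a bucket; it is not jammdb, `delete_range_inclusive` is a hypothetical helper, and whether deleting during iteration is actually what trips up range here is a guess rather than a confirmed diagnosis.

```rust
use std::collections::BTreeMap;

// Stand-in bucket with the same three big-endian keys as the report.
fn sample_bucket() -> BTreeMap<[u8; 8], String> {
    let mut bucket = BTreeMap::new();
    for n in 1u64..=3 {
        bucket.insert(n.to_be_bytes(), n.to_string());
    }
    bucket
}

// Hypothetical helper: remove every key in start..=end and return the removed keys.
// Keys are collected before any removal, so the map is never mutated mid-iteration.
fn delete_range_inclusive(
    bucket: &mut BTreeMap<[u8; 8], String>,
    start: u64,
    end: u64,
) -> Vec<[u8; 8]> {
    let keys: Vec<[u8; 8]> = bucket
        .range(start.to_be_bytes()..=end.to_be_bytes())
        .map(|(k, _)| *k)
        .collect();
    for k in &keys {
        bucket.remove(k);
    }
    keys
}

fn main() {
    let mut bucket = sample_bucket();
    let removed = delete_range_inclusive(&mut bucket, 1, 2);
    println!("removed {} keys, {} left", removed.len(), bucket.len());
}
```

Big-endian encoding matters here: byte-wise ordering of the keys then matches numeric ordering, so an inclusive byte range covers exactly the intended numbers.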

[E0596] cannot borrow data in an Arc as mutable.

Thank you @pjtatlow

I tried to clone the DB and pass it through an Arc, but the compiler reports [E0596] cannot borrow data in an Arc as mutable. Any ideas?

let jdb = jammdb::DB::open("../jammdb").unwrap();
HttpServer::new(move || {
    App::new()
        .data(jdb.clone())

async fn fn_inituserdb(req: HttpRequest, _jdb: Data<jammdb::DB>) -> Result<String, Error> {
    let _tx = _jdb.tx(true).borrow_mut().as_ref().map_err(|e| actix_web::http::StatusCode::BAD_REQUEST);
    | ^^^^ cannot borrow as mutable

[E0596] cannot borrow data in an Arc as mutable.
[Note] cannot borrow as mutable

panicked at 'attempt to subtract with overflow'

The following code panics with
thread 'main' panicked at 'attempt to subtract with overflow', .../jammdb-0.5.0/src/page.rs:107:25:

const TEST_DB: &str = "tmp.db";
fn main() {
    let _ = std::fs::remove_file(TEST_DB);
    {
        let db = jammdb::DB::open(TEST_DB).unwrap();
        {
            let tx = db.tx(true).unwrap();
            let root = tx.get_or_create_bucket("ROOT").unwrap();
            tx.commit().unwrap();
        }
        {
            let tx = db.tx(true).unwrap();
            let root = tx.get_or_create_bucket("ROOT").unwrap();
            let child = root.get_or_create_bucket("CHILD").unwrap();
            tx.commit().unwrap(); // panics here: page.rs:107 is `self.overflow = num_pages - 1;` and num_pages is 0
        }
    }
    let _ = std::fs::remove_file(TEST_DB);
}

If I do everything in a single transaction, it works:

fn main() {
    let _ = std::fs::remove_file(TEST_DB);
    {
        let db = jammdb::DB::open(TEST_DB).unwrap();
        {
            let tx = db.tx(true).unwrap();
            let root = tx.get_or_create_bucket("ROOT").unwrap();
            // tx.commit() and db.tx() removed here
            let root = tx.get_or_create_bucket("ROOT").unwrap();
            let child = root.get_or_create_bucket("CHILD").unwrap();
            tx.commit().unwrap();
        }
    }
    let _ = std::fs::remove_file(TEST_DB);
}

Unfortunately, this problem prevents me from using nested buckets at all.
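
For illustration only, the arithmetic behind the panic message can be reproduced in isolation. The expression `num_pages - 1` comes from the report; the integer type (u32 here) and the helper name `overflow_pages` are assumptions, since the surrounding jammdb code is not shown. With overflow checks enabled (the default in debug builds), subtracting 1 from an unsigned 0 panics with "attempt to subtract with overflow"; `checked_sub` makes that case explicit instead.

```rust
// Hypothetical stand-in for the expression quoted from page.rs:107.
// The real field types are not shown in the report; u32 is an assumption.
fn overflow_pages(num_pages: u32) -> Option<u32> {
    // Returns None instead of panicking when num_pages is 0.
    num_pages.checked_sub(1)
}

fn main() {
    println!("{:?}", overflow_pages(3));
    println!("{:?}", overflow_pages(0));
}
```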

Misaligned ptr

Error:

thread 'db::test::test_db' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x7f877000191c', src/index.crates.io-6f17d22bba15001f/jammdb-0.9.0/src/freelist.rs:64:29

Code:

use jammdb::DB;

#[test]
fn test_db() -> Result<(), Box<dyn std::error::Error>> {
    let db = DB::open("data/123.db")?;

    let tx = db.tx(true)?;
    tx.create_bucket("test")?;
    tx.commit()?;

    Ok(())
}

Version:

cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)
rustc 1.71.0-nightly (fec9adcdb 2023-04-21)
jammdb: 0.9.0
system: Linux 6.2.8-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC x86_64 GNU/Linux
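
The address in the panic message, 0x7f877000191c, is a multiple of 4 but not of 8, which is why dereferencing it as an 8-byte-aligned value is rejected. The following sketch is independent of jammdb's freelist code (which is not shown in the report): it reads a u64 from a byte buffer at an arbitrary offset by copying the bytes, so no aligned pointer dereference is involved. `read_u64_at` is a hypothetical helper.

```rust
// Read a native-endian u64 starting at `offset`, with no alignment requirement.
// Returns None if fewer than 8 bytes are available from that offset.
fn read_u64_at(buf: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(buf.get(offset..end)?);
    Some(u64::from_ne_bytes(bytes))
}

fn main() {
    // The reported address is 4 bytes past an 8-byte boundary.
    let addr: u64 = 0x7f87_7000_191c;
    println!("addr % 8 = {}", addr % 8);

    // Store a value at offset 4 (not a multiple of 8) and read it back.
    let mut buf = vec![0u8; 16];
    buf[4..12].copy_from_slice(&42u64.to_ne_bytes());
    println!("{:?}", read_u64_at(&buf, 4));
}
```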

Async version

Hey!

Would adding an option to have an async version via something like maybe_async make sense?

I would like to implement this myself: I want to use this database in an async context, and wrapping every DB call in tokio::task::spawn_blocking or something similar doesn't feel right.
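
One alternative to spawning a blocking task per call is a single dedicated worker thread that owns the database handle and receives requests over a channel. The sketch below shows that pattern with std only; it is not an async API and not part of jammdb. A HashMap stands in for the database, `Request`, `spawn_db_worker` and `get` are hypothetical names, and in a real async program the blocking reply channel would have to be swapped for one the runtime can await.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Requests the worker thread understands (hypothetical protocol).
enum Request {
    Put(String, String),
    Get(String, mpsc::Sender<Option<String>>),
}

// Start a worker that owns the store and serves requests until all senders are dropped.
fn spawn_db_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // Stand-in for a database handle owned by this thread.
        let mut store: HashMap<String, String> = HashMap::new();
        for req in rx {
            match req {
                Request::Put(k, v) => {
                    store.insert(k, v);
                }
                Request::Get(k, reply) => {
                    // Ignore the error if the caller stopped waiting.
                    let _ = reply.send(store.get(&k).cloned());
                }
            }
        }
    });
    tx
}

// Send a Get request and wait for the worker's reply.
fn get(worker: &mpsc::Sender<Request>, key: &str) -> Option<String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    worker.send(Request::Get(key.to_string(), reply_tx)).ok()?;
    reply_rx.recv().ok()?
}

fn main() {
    let worker = spawn_db_worker();
    worker
        .send(Request::Put("name".to_string(), "jamm".to_string()))
        .unwrap();
    println!("{:?}", get(&worker, "name"));
}
```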

Power outage tolerance

How tolerant is JammDB to random power outages when it is writing data? Is data corruption a risk with JammDB if the machine is unexpectedly stopped mid-write?

Describe complexity bounds

Please describe complexity bounds of your DB. Are reads/writes bounded? Are they logarithmic?

A few minutes ago I rejected Ozon, another Rust port of BoltDB, because it doubles its capacity when it grows and therefore uses at least disk space inefficiently. Does jammdb share this "feature" with Ozon?

P.S. I need a key/value store whose reads and writes are both bounded or logarithmic.

[Suggestion] Type state pattern for transactions

Hi,

I'm coming from Go and was looking for something similar to boltdb.
After finding your project and starting to learn Rust, I wanted to ask whether it would be feasible to implement the typestate pattern for transactions.

Instead of only allowing tx() to be called with a bool, you could also call ro() or rw(), which would return a transaction that does not even expose the write functions. That way these errors would be caught at compile time.

If you think this is a good idea, I'd be happy to do some of the grunt work of adding the state to each necessary type.
I don't think I can handle every problem on my own, however, so I'd be grateful for some help later on.

I would suggest these three type states:

  • Legacy (would be used when tx is called)
  • ReadOnly
  • WriteOnly

This way existing projects would keep working, while the compiler could check the transaction type.
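
A minimal sketch of what the suggestion above could look like, using a toy in-memory store rather than jammdb. All names here (`Db`, `Tx`, `ReadOnly`, `ReadWrite`) are hypothetical and only illustrate the typestate idea: the mode lives in a type parameter, and `put` is implemented only for the read-write mode.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

// Marker types for the transaction mode. These names are hypothetical.
struct ReadOnly;
struct ReadWrite;

// Toy in-memory database standing in for a real one.
struct Db {
    data: BTreeMap<String, String>,
}

// A transaction whose mode exists only in the type, not at runtime.
struct Tx<'db, Mode> {
    db: &'db mut Db,
    _mode: PhantomData<Mode>,
}

impl Db {
    fn new() -> Self {
        Db { data: BTreeMap::new() }
    }

    // Both constructors borrow mutably to keep the toy simple.
    fn ro<'db>(&'db mut self) -> Tx<'db, ReadOnly> {
        Tx { db: self, _mode: PhantomData }
    }

    fn rw<'db>(&'db mut self) -> Tx<'db, ReadWrite> {
        Tx { db: self, _mode: PhantomData }
    }
}

// Reads are available in every mode.
impl<'db, Mode> Tx<'db, Mode> {
    fn get(&self, key: &str) -> Option<&str> {
        self.db.data.get(key).map(|v| v.as_str())
    }
}

// Writes exist only on read-write transactions.
impl<'db> Tx<'db, ReadWrite> {
    fn put(&mut self, key: &str, value: &str) {
        self.db.data.insert(key.to_string(), value.to_string());
    }
}

fn main() {
    let mut db = Db::new();
    {
        let mut tx = db.rw();
        tx.put("k", "v");
    }
    let tx = db.ro();
    println!("{:?}", tx.get("k"));
    // tx.put("k", "w"); // would not compile: Tx<ReadOnly> has no `put`
}
```

A legacy mode, as proposed in the list above, could be a third marker type that implements both reads and writes and keeps the runtime check.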

Database design

Is there a project design document? Or a description of the database structure design? I want to do something interesting with your implementation, so I need more details.

Read-write operations

Hello! Noob question here, as I'm using your project to learn Rust.

Is there a way to read and write in the same transaction in order to do a conditional .put()?

For instance, adding a (key, value) pair only if the key is not already present.

Example:
I open a tx at the root, then pass the buckets by reference to some functions. Inside such a function I check whether a key exists: if it does, I return an error, and if not, I do a .put().

    match schema_index_kinds_bucket.get(kindName) {
        Some(data) => match data {
            Data::Bucket(_) => Err(Error::KindExists),
            Data::KeyValue(_) => Err(Error::KindExists),
        },
        None => {
            schema_kinds_bucket
                .put(kindId.to_be_bytes(), kindName)
                .map_err(Error::Database)?;
            schema_index_kinds_bucket
                .put(kindName, kindId.to_be_bytes())
                .map_err(Error::Database)?;
            Ok(())
        }
    }

The error I get, which would make sense if reading and writing in one transaction is not supported, is that "schema_kinds_bucket" may not live long enough.
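
The check-then-put logic described above, reduced to a self-contained sketch. Two std BTreeMaps stand in for the two buckets; this is not jammdb and says nothing about the lifetime error itself. `register_kind` is a hypothetical function, and the `Error` enum only mirrors the `KindExists` variant from the snippet.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Error {
    KindExists,
}

// Insert the kind into both maps only if its name is not indexed yet.
fn register_kind(
    kinds: &mut BTreeMap<[u8; 8], String>,
    index: &mut BTreeMap<String, [u8; 8]>,
    kind_id: u64,
    kind_name: &str,
) -> Result<(), Error> {
    // Read first: refuse to overwrite an existing entry.
    if index.contains_key(kind_name) {
        return Err(Error::KindExists);
    }
    // Then write to both maps.
    kinds.insert(kind_id.to_be_bytes(), kind_name.to_string());
    index.insert(kind_name.to_string(), kind_id.to_be_bytes());
    Ok(())
}

fn main() {
    let mut kinds = BTreeMap::new();
    let mut index = BTreeMap::new();
    println!("{:?}", register_kind(&mut kinds, &mut index, 1, "user"));
    println!("{:?}", register_kind(&mut kinds, &mut index, 2, "user"));
}
```

Other reports on this page read and write through the same writable transaction (for example, the range report iterates and deletes inside `db.tx(true)`), so the pattern itself does not look unusual for jammdb; the lifetime error is more likely about how the bucket references are passed around, though that is an assumption.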

Segmentation fault when boxing cursor

Hi, I was trying to use jammdb in a project of mine but hit a segmentation fault after boxing an iterator derived from a jammdb::Cursor.

Here is a small example that demonstrates the issue; my-database.db has been initialized with the same content as the example in the README.

use jammdb::{Data, DB};

fn iterate(db: &mut DB) -> Box<dyn Iterator<Item = String>> {
    Box::new(
        db.tx(false)
            .unwrap()
            .get_bucket("names")
            .unwrap()
            .cursor()
            .filter_map(|item| match item {
                Data::Bucket(_) => None,
                Data::KeyValue(kv) => Some(kv),
            })
            .map(|kv| String::from_utf8(kv.value().to_vec()).unwrap()),
    )
}

fn main() {
    let mut db = DB::open("my-database.db").unwrap();

    for surname in iterate(&mut db) {
        println!("{}", surname);
    }
}

GDB backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
core::ptr::non_null::NonNull<T>::as_ref (self=0x90)
    at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/ptr/non_null.rs:115

(gdb) bt
#0  core::ptr::non_null::NonNull<T>::as_ref (self=0x90) at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/ptr/non_null.rs:115
#1  0x00005555555918ff in alloc::sync::Arc<T>::inner (self=0x90) at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/liballoc/sync.rs:731
#2  0x0000555555591b0f in <alloc::sync::Arc<T> as core::ops::deref::Deref>::deref (self=0x90) at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/liballoc/sync.rs:982
#3  0x0000555555597c6a in jammdb::transaction::TransactionInner::page (self=0x0, id=93824993075216)
    at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/transaction.rs:223
#4  0x000055555556bf2e in jammdb::bucket::Bucket::page_node (self=0x555555622910, page=93824993075216)
    at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/bucket.rs:621
#5  0x000055555556f889 in jammdb::cursor::Cursor::seek_first (self=0x555555623000) at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/cursor.rs:204
#6  0x000055555556faac in <jammdb::cursor::Cursor as core::iter::traits::iterator::Iterator>::next (self=0x555555623000)
    at /home/giacomo/.cargo/registry/src/github.com-1ecc6299db9ec823/jammdb-0.3.0/src/cursor.rs:232
#7  0x0000555555565cb8 in core::iter::traits::iterator::Iterator::try_fold (self=0x555555623000, init=(), f=...)
    at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/traits/iterator.rs:1876
#8  0x0000555555565c4f in core::iter::traits::iterator::Iterator::find_map (self=0x555555623000, f=0x555555623000)
    at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/traits/iterator.rs:2207
#9  0x0000555555565515 in <core::iter::adapters::FilterMap<I,F> as core::iter::traits::iterator::Iterator>::next (self=0x555555623000)
    at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/adapters/mod.rs:1070
#10 0x000055555556591d in <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::next (self=0x555555623000)
    at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/iter/adapters/mod.rs:791
#11 0x0000555555565a0e in <alloc::boxed::Box<I> as core::iter::traits::iterator::Iterator>::next (self=0x7fffffffd1d0)
    at /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/liballoc/boxed.rs:951
#12 0x00005555555648d5 in jammdb_ub_minimal::main () at src/main.rs:21

Of course it would be pointless to implement such a function; the code above is just a reduced case of where this happens. From what I understand, this is probably caused by the Tx, Bucket and/or Cursor structs being dropped while the boxed iterator returned by the function still points to their memory. I saw that the code uses unsafe, which is why I suspect this is what's happening.

Unfortunately I'm not an experienced Rust user. I tried to read the code but could not easily work out how to fix this; otherwise I would have tried to submit a PR.
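
If the guess above is right, one way to avoid the problem from the caller's side is to copy the values into owned Strings while the transaction is still alive, and only then return an iterator over the owned data. The sketch below shows that shape with a stand-in type; `Snapshot` and `open_snapshot` are hypothetical and no jammdb API is used.

```rust
// Stand-in for a transaction that owns the bytes a cursor would walk.
struct Snapshot {
    names: Vec<Vec<u8>>,
}

// Hypothetical constructor with two sample values.
fn open_snapshot() -> Snapshot {
    Snapshot {
        names: vec![b"first".to_vec(), b"second".to_vec()],
    }
}

// Copy everything into owned Strings before the snapshot is dropped,
// so the returned iterator borrows nothing from it.
fn iterate() -> Box<dyn Iterator<Item = String>> {
    let snapshot = open_snapshot();
    let names: Vec<String> = snapshot
        .names
        .iter()
        .map(|raw| String::from_utf8_lossy(raw).into_owned())
        .collect();
    Box::new(names.into_iter())
}

fn main() {
    for name in iterate() {
        println!("{}", name);
    }
}
```

The cost is that all values are materialized up front, which may not suit large buckets.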

Hope this helps, thanks for the otherwise nice crate!

API safety and soundness

The description mentions that the library uses mmap and also links to the "Single-level store" article, which suggests that jammdb may hand users references that point into mmapped files.

I expected e.g. jammdb::DB::open to be an unsafe fn, because the full protection against unreasonable use that would be needed to make it completely sound (or even sound enough for practical purposes) may not be viable given the architecture.

But that does not seem to be the case: the crate API does not hint at the potential undefined behaviour users may run into. For example, what is the worst that could happen if the database file is mangled by an external process while in use? What if multiple processes open the same database for writing (including over a networked filesystem without locking)?

Should the entry function be marked unsafe so that users commit to not doing unreasonable things to the database? Or does jammdb actually handle all possible complications of memory mapping, so that the API is indeed sound even when misused?

Can't open boltdb file

I have a boltdb config file that was generated by a Go program. I want to read it with jammdb, but it panics when I open it. The error is as follows:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `4`,
 right: `3`', /home/jicky/.cargo/registry/src/mirrors.sjtug.sjtu.edu.cn-be2141875385cea5/jammdb-0.10.0/src/page.rs:69:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)
