plabayo / venndb
in memory Rust database to query your data like a Venn diagram
Home Page: https://venndb.rs
License: Apache License 2.0
It can be useful to let users define custom columns at runtime so that they can filter on them efficiently. This could replace the current query predicate.
The behaviour where inserted rows with an any value for a given property match all queried values of that property is correct and desired. It is the opposite direction that is currently wrong, and fixing it will require another breaking change release.
When querying, one can specify a desired value for a filter map property. This can also be an any value, given that it lives within the same type dimension as the property of the row to begin with. At first this was seen as an undesired side effect. However, it is actually meaningful, and in that light the current behaviour turns out to be wrong. In v0.3 (and before) we ignore the query value in case it is an any value, which has the indirect effect that all rows match regardless of their value, including rows where the value is undefined. This is however incorrect for two reasons:
This has to be corrected, but doing so requires a breaking change, as it changes existing behaviour.
Sometimes you want to search for something faster. Bloom/Quotient/Cuckoo filters help with that.
https://en.wikipedia.org/wiki/Approximate_Membership_Query_Filter
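As a rough illustration of what such an approximate membership filter does, here is a minimal Bloom filter sketch using only the standard library. It is not part of venndb; the type name `BloomFilter`, its methods, and the sizing parameters are all assumptions made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A tiny Bloom filter (hypothetical type, not a venndb API).
/// It may report false positives but never false negatives.
pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    /// `num_bits` and `num_hashes` are illustrative tuning knobs.
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self {
            bits: vec![false; num_bits],
            num_hashes,
        }
    }

    /// Derive the i-th bit position by seeding the hasher with `i`.
    fn position<T: Hash + ?Sized>(&self, item: &T, i: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        i.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Set every bit position derived from `item`.
    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for i in 0..self.num_hashes {
            let pos = self.position(item, i);
            self.bits[pos] = true;
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    pub fn may_contain<T: Hash + ?Sized>(&self, item: &T) -> bool {
        (0..self.num_hashes).all(|i| self.bits[self.position(item, i)])
    }
}
```

A caller would check `may_contain` first and only fall back to the exact (slower) lookup when it returns `true`.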
Rust's type system helps a lot with validation, given that plenty of assumptions and requirements can be encoded in the types used. At times, however, properties only carry meaning as a group, where some combinations make sense and others do not. Enums (sum types) usually help in this situation. However, they are not always desirable, and in the case of venndb they do not play well with the rest of the setup. For that reason and others it can be useful to allow custom validation of a row prior to appending it, so that one can guarantee that every appended row fulfils all requirements.
Example:
#[derive(Debug, VennDB)]
#[venndb(name = "MyDB", validator = "my_validator_fn")]
pub struct Value {
    pub foo: String,
    pub bar: u32,
}
// A row is only valid when foo is non-empty and bar is positive.
fn my_validator_fn(value: &Value) -> bool {
    !value.foo.is_empty() && value.bar > 0
}
let mut db = MyDB::default();
assert!(db.append(Value {
    foo: "".to_owned(),
    bar: 42,
}).is_err()); // fails because foo is empty
The above example illustrates that the type system can only bring you so far.
Of course one can argue that a NewType can solve this particular issue by only allowing newtype instances to be created with values that match such conditions. And this is true.
However, what if the legal value space of a column depends upon the specific value of another? E.g. what if foo can be empty when bar is 2? This becomes a lot more difficult, and given the large value space of unsigned integers it might even become impossible to capture such rules in a sum type, if that is desired at all, given it would also make usage within a venndb-derived db more difficult.
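A cross-field rule like this is easy to express as a plain function. The sketch below shows only the validator itself, without the derive macro; combining "foo may be empty when bar is 2" with the earlier "bar > 0" requirement is an assumption made for illustration.

```rust
#[derive(Debug)]
pub struct Value {
    pub foo: String,
    pub bar: u32,
}

/// Hypothetical combined rule (an assumption for this sketch):
/// `bar` must be greater than 0, and `foo` may only be empty when `bar` is 2.
pub fn my_validator_fn(value: &Value) -> bool {
    let foo_ok = !value.foo.is_empty() || value.bar == 2;
    foo_ok && value.bar > 0
}
```

The legality of `foo` here depends on the concrete value of `bar`, which is exactly the kind of rule that is awkward to encode in a newtype or a sum type.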
However, ban it for Keys
Sometimes you may need to remove some records from a table. This can be done through an additional optional filter removed.
Consider:
#[derive(Debug, VennDB)]
pub struct Employee {
    #[venndb(filter, any)]
    pub country: Option<String>,
}
We need to find a way to generate code such that one can say:
let db = EmployeeDB::from_iter(/* ... */);
let mut query = db.query();
query.country("US".into());
// get all employees which are in US or have the "Any" country defined...
// whatever that means
let employees = query.exec()...
E.g. to allow anyone from the Engineering or HR department.
Let's say you have something called struct StringFilter(String) that you use as the property in your proxy db instance struct. Most likely you implement From<String> for StringFilter, where you sanitise the string prior to storing it.
Thing is... right now if you have the property foo: StringFilter, you need to do query.foo("a".into()).
This QoL improvement would make it so that you can do query.foo("a") instead, which is easier to use and also closer to the intent of the property, given that the StringFilter wrapper is only there to facilitate the behaviour we want from the database interaction, rather than a high-level detail that a user should really care about.
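One way the generated query method could achieve this is by accepting `impl Into<StringFilter>`. The following is a hand-written sketch, not the code venndb generates; the `Query` struct, the `From<&str>` impl, and the trim-and-lowercase sanitisation are assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StringFilter(String);

impl From<&str> for StringFilter {
    fn from(s: &str) -> Self {
        // Example sanitisation (an assumption): trim and lowercase.
        StringFilter(s.trim().to_lowercase())
    }
}

impl From<String> for StringFilter {
    fn from(s: String) -> Self {
        StringFilter::from(s.as_str())
    }
}

/// Hypothetical stand-in for a generated query type.
#[derive(Debug, Default)]
pub struct Query {
    foo: Option<StringFilter>,
}

impl Query {
    /// Accepting `impl Into<StringFilter>` lets callers pass a `&str`,
    /// a `String`, or a ready-made `StringFilter` without writing `.into()`.
    pub fn foo(&mut self, value: impl Into<StringFilter>) -> &mut Self {
        self.foo = Some(value.into());
        self
    }
}
```

With this signature, `query.foo("a")` and `query.foo(StringFilter::from("a"))` both compile, so existing call sites that already use `.into()` keep working.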
The space (and construction time) complexity of a Db for a type that has many (but not all) unique #[venndb(filter)] fields appears to be O(n^2), which quickly makes constructing (and holding) such a Db infeasible for collections with more than ~100,000 unique fields.
Among other solutions, I wonder if we could use some sort of compressed bitset implementation, like roaring or hi_sparse_bitset to better handle this scenario with minimal to no performance loss in the "large number of total entries, small number of unique filter field values" case.
If we're not interested in adjusting the current implementation, this could also be something gated behind a feature flag or configured by some derive attribute.
I'd be happy to take a look at feasibility if it's something that would be considered.
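To make the quadratic growth concrete, here is a back-of-the-envelope sketch. It assumes each unique filter value owns a dense bitset with one bit per row, and compares that with a simplified sparse layout (a sorted list of 32-bit row indices per value). The sparse layout is a stand-in for illustration only and is not how roaring or hi_sparse_bitset store data; both function names are hypothetical.

```rust
/// Bytes needed when every unique filter value owns a dense bitset
/// with one bit per row (assumed layout; grows as rows * unique_values).
pub fn dense_bytes(rows: u64, unique_values: u64) -> u64 {
    unique_values * ((rows + 7) / 8)
}

/// Bytes needed when every unique value instead stores a list of 32-bit
/// row indices; each row appears in exactly one list for a given field,
/// so the total grows linearly with the number of rows.
pub fn sparse_bytes(rows: u64) -> u64 {
    rows * 4
}
```

Under these assumptions, 100,000 rows with 100,000 unique values need 1,250,000,000 bytes in the dense layout against 400,000 bytes in the sparse one, while a field with only a handful of unique values stays cheap in the dense layout.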
This test case breaks on v0.2.1, even though it is expected to pass:
#[derive(Debug, VennDB)]
pub struct Worker {
    #[venndb(key)]
    id: u32,
    is_admin: bool,
    is_active: Option<bool>,
    #[venndb(filter, any)]
    department: Option<Department>,
}
#[test]
fn test_any_row_optional_filter_map_white_rabbit() {
    let db = WorkerDB::from_rows(vec![
        Worker {
            id: 1,
            is_admin: false,
            is_active: Some(true),
            department: Some(Department::Engineering),
        },
        Worker {
            id: 2,
            is_admin: false,
            is_active: None,
            department: None,
        },
        Worker {
            id: 3,
            is_admin: false,
            is_active: Some(true),
            department: Some(Department::Any),
        },
        Worker {
            id: 4,
            is_admin: false,
            is_active: Some(true),
            department: Some(Department::HR),
        },
    ])
    .unwrap();
    let mut query = db.query();
    query.department(Department::Marketing);
    let results = query.execute().unwrap().iter().collect::<Vec<_>>();
    // Only the row with the any value (id 3) should match.
    assert_eq!(results.len(), 1);
    assert_eq!(results[0].id, 3);
}
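The outcome this test expects can be modelled without venndb as a simple row-side rule. The sketch below is a plain-Rust model of that rule, not the library's implementation; the `matches` and `query_department` helpers are hypothetical, and the `Department` enum is assumed to have the four variants used in the test. It deliberately does not address how a query whose own value is `Any` should behave.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Department {
    Any,
    Engineering,
    HR,
    Marketing,
}

pub struct Worker {
    pub id: u32,
    pub department: Option<Department>,
}

/// Row-side rule expected by the test: a row matches when it holds the
/// queried value or the `Any` value; rows without a value never match.
pub fn matches(row: Option<Department>, query: Department) -> bool {
    match row {
        Some(Department::Any) => true,
        Some(value) => value == query,
        None => false,
    }
}

/// Return the ids of all rows matching the queried department.
pub fn query_department(rows: &[Worker], query: Department) -> Vec<u32> {
    rows.iter()
        .filter(|w| matches(w.department, query))
        .map(|w| w.id)
        .collect()
}
```

Applied to the four workers from the test, a query for `Department::Marketing` yields only the worker with id 3, whose department is `Any`.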