Comments (6)
I usually use Mockaroo for generating data.
Sakila is also commonly used as a sample database by open-source database apps.
I've also gathered datasets from various sources and put them here.
I've also written some web apps that showcase these databases.
from locustdb.
Some properties we probably want from our data generator:
- able to run in-process and without reading from disk
- generate 100M values within seconds
- deterministic given a fixed seed
- can be extended with new data distributions
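A minimal sketch of a generator with these properties, assuming a hypothetical `ColumnGenerator` trait and `UniformInt` distribution (illustrative names, not LocustDB APIs): values are produced in memory from a seed, the same seed always yields the same column, and new distributions are added by implementing the trait. SplitMix64 is used as the PRNG only to keep the example free of dependencies.

```rust
// Hypothetical sketch: `ColumnGenerator` and `UniformInt` are illustrative, not LocustDB APIs.

/// SplitMix64: a tiny, fast, seedable PRNG (chosen here only to avoid external crates).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Extension point: every new data distribution implements this trait.
/// Output depends only on `len` and `seed`, never on disk or global state.
trait ColumnGenerator {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64>;
}

/// Uniform integers in the inclusive range [min, max].
struct UniformInt {
    min: i64,
    max: i64,
}

impl ColumnGenerator for UniformInt {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64> {
        assert!(self.min <= self.max);
        // Assumes the range is narrower than the full i64 domain (otherwise span wraps to 0).
        let span = (self.max as i128 - self.min as i128 + 1) as u64;
        let mut rng = SplitMix64::new(seed);
        // Modulo has a slight bias for huge spans; acceptable for test data.
        (0..len)
            .map(|_| self.min.wrapping_add((rng.next_u64() % span) as i64))
            .collect()
    }
}

fn main() {
    let uniform = UniformInt { min: 0, max: 99 };
    // Same seed, same column: benchmark runs are reproducible.
    assert_eq!(uniform.generate(5, 42), uniform.generate(5, 42));
    println!("{:?}", uniform.generate(10, 42));
}
```

Putting each distribution behind one trait keeps the benchmark harness independent of which distribution it is fed.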
I think you guys are looking for meaningful data rather than just noise, but SeaHash might be of some use. You can define your own seed for it. The following code iterates 100 million times in just under a second on my machine.
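// Requires the `seahash` crate as a Cargo dependency (no `extern crate` needed in Rust 2018+).
use seahash::SeaHasher;
use std::hash::Hasher;

fn main() {
    // SeaHasher::with_seeds(k1, k2, k3, k4) can be used instead to pick a custom seed.
    let mut hash = SeaHasher::new();
    let mut last = 0;
    // Feed each output back in as the next input, exactly 100 million times.
    for _ in 0..100_000_000 {
        hash.write_u64(last);
        last = hash.finish();
    }
    println!("{:#X}", last);
}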
Actually, completely random and incompressible data is an important test case, so SeaHash could be a very useful basis for one of the generators.
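A minimal sketch of such a random-column generator in counter mode: row `i` is a hash of `(seed, i)`, so any slice of the column can be generated independently (for example, one chunk per thread) and still match a single sequential pass. To avoid a crate dependency it uses the SplitMix64 mixing function as a stand-in for SeaHash; `random_value` and `random_column` are hypothetical names, and a SeaHash-based hash of `(seed, i)` could be substituted.

```rust
use std::thread;

const GOLDEN_GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// SplitMix64 finalizer: a cheap, bijective mix of a 64-bit integer (stand-in for SeaHash).
fn mix64(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// The value at row `i` depends only on (seed, i), never on earlier rows.
fn random_value(seed: u64, i: u64) -> u64 {
    mix64(seed.wrapping_add(i.wrapping_add(1).wrapping_mul(GOLDEN_GAMMA)))
}

/// Rows [start, start + len) of a random-looking u64 column.
fn random_column(seed: u64, start: u64, len: usize) -> Vec<u64> {
    (0..len as u64).map(|k| random_value(seed, start + k)).collect()
}

fn main() {
    let seed = 0xDEAD_BEEF;
    let total = 1_000_000u64;
    let chunks = 4u64;
    let per_chunk = total / chunks;
    // Generate the column in independent chunks, one per thread.
    let handles: Vec<_> = (0..chunks)
        .map(|c| thread::spawn(move || random_column(seed, c * per_chunk, per_chunk as usize)))
        .collect();
    let column: Vec<u64> = handles.into_iter().flat_map(|h| h.join().unwrap()).collect();
    // The parallel result is identical to a single sequential pass.
    assert_eq!(column, random_column(seed, 0, total as usize));
    println!("generated {} values, first = {:#X}", column.len(), column[0]);
}
```

Unlike the feedback loop above, this layout has no dependency between consecutive rows, which is what makes chunked or parallel generation possible.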
09467c6 implements a basic framework for random column generation, as well as specific generators for int and string columns based on Markov chains, which should capture most of the properties we care about with regard to performance (except correlations between columns).
It is still worthwhile to add simpler distributions that are easier to configure and faster to generate.
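For intuition about the Markov-chain approach, here is a minimal first-order sketch (not the generator in 09467c6): each value depends only on the previous one, so the transition weights control run lengths and value repetition, properties that matter for compression and grouping performance. `MarkovIntGen` and its fields are hypothetical, and SplitMix64 stands in for whatever PRNG the real code uses.

```rust
// Hypothetical sketch of a first-order Markov-chain integer column; not LocustDB's implementation.

struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// `transitions[i][j]` is the unnormalized weight of moving from `values[i]` to `values[j]`.
struct MarkovIntGen {
    values: Vec<i64>,
    transitions: Vec<Vec<f64>>,
}

impl MarkovIntGen {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64> {
        let mut rng = SplitMix64::new(seed);
        let mut state = 0; // always start in the first state (an arbitrary choice)
        let mut out = Vec::with_capacity(len);
        for _ in 0..len {
            out.push(self.values[state]);
            state = self.next_state(state, rng.next_f64());
        }
        out
    }

    /// Pick the next state by walking the cumulative weights of the current row.
    fn next_state(&self, state: usize, u: f64) -> usize {
        let row = &self.transitions[state];
        let mut target = u * row.iter().sum::<f64>();
        for (j, &w) in row.iter().enumerate() {
            if target < w {
                return j;
            }
            target -= w;
        }
        // Only reached through floating-point rounding: fall back to the last reachable state.
        row.iter().rposition(|&w| w > 0.0).unwrap_or(state)
    }
}

fn main() {
    // Each state stays put with weight 0.9, so the column has long runs of equal values.
    let runs = MarkovIntGen {
        values: vec![1, 2, 3],
        transitions: vec![
            vec![0.9, 0.05, 0.05],
            vec![0.05, 0.9, 0.05],
            vec![0.05, 0.05, 0.9],
        ],
    };
    println!("{:?}", runs.generate(20, 7));
}
```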
f8688fe adds a composable generator that picks a portion of partitions and replaces them with all-null values, and #41 adds generators for random strings.
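A minimal sketch of that composable null-partition idea, assuming a hypothetical `ColumnGenerator` trait and `NullPartitions` wrapper (illustrative names, not the code in f8688fe): the wrapper runs any inner generator, splits its output into fixed-size partitions, and with probability `null_fraction` replaces a whole partition with nulls.

```rust
// Hypothetical sketch; `ColumnGenerator`, `Sequence` and `NullPartitions` are illustrative names.

struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

trait ColumnGenerator {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64>;
}

/// Trivial inner generator for the demo: 0, 1, 2, ...
struct Sequence;

impl ColumnGenerator for Sequence {
    fn generate(&self, len: usize, _seed: u64) -> Vec<i64> {
        (0..len as i64).collect()
    }
}

/// Wraps any generator and nulls out a random subset of whole partitions.
struct NullPartitions<G> {
    inner: G,
    partition_size: usize,
    null_fraction: f64,
}

impl<G: ColumnGenerator> NullPartitions<G> {
    fn generate(&self, len: usize, seed: u64) -> Vec<Option<i64>> {
        assert!(self.partition_size > 0);
        let values = self.inner.generate(len, seed);
        // Separate stream so null decisions don't correlate with the inner values.
        let mut rng = SplitMix64::new(seed ^ 0x5555_5555_5555_5555);
        values
            .chunks(self.partition_size)
            .flat_map(|chunk| {
                // One decision per partition: either all values survive or all become null.
                let all_null = rng.next_f64() < self.null_fraction;
                chunk.iter().map(move |&v| if all_null { None } else { Some(v) })
            })
            .collect()
    }
}

fn main() {
    let nulls = NullPartitions { inner: Sequence, partition_size: 3, null_fraction: 0.5 };
    println!("{:?}", nulls.generate(12, 99));
}
```

Because the wrapper is generic over the inner generator, the same null-injection logic can be layered on top of any distribution.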
Related Issues (20)
- Revisit choice of hash functions during hash grouping
- Query planner chooses names for anonymous result columns that might be identical to existing ones
- Fix performance regression in benchmark case
- String packed columns break things
- Unable to build `LocustDB` on Mac OS X
- Logo?
- Queries that have type errors or reference missing columns should give helpful errors/warnings
- Order by string column fails with `top_n_asc not supported for type ScalarStr`
- Support window functions like row_number()
- Default ordering of index columns and inserts to the already existing data
- Why not support ansi SQL?
- Allow simple GROUP BY clauses
- It does not work at all.
- Unary minus and negative constants don't parse
- Tweak RocksDB options
- Optimize RocksDB layout for multiple tables
- Fix usages of `unsafe` related to hard-to-model lifetimes
- Perform merging of select queries by constructing and executing query plan
- Expand cases where intermediary results can be streamed between operators
- Columns in query output are not always in same order as projections in query