
Comments (6)

ivanceras commented on September 24, 2024

I usually use Mockaroo for generating data.
Sakila is also commonly used as a sample database in open-source database apps.
I've also gathered data from various sources and put them here.

I've also written a web app that showcases displaying these databases.

from locustdb.

cswinter commented on September 24, 2024

Some properties we probably want from our data generator:

  • able to run in-process and without reading from disk
  • generate 100M values within seconds
  • deterministic with fixed seed
  • can be extended with new data distributions
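A minimal sketch of how these properties could fit together, not the project's actual design. The `ColumnGenerator` trait, the `Uniform` and `Choice` types, and the use of SplitMix64 as the random source are all assumptions made for illustration. Everything runs in memory, the output depends only on the seed, and a new distribution is added by implementing the trait.

```rust
/// SplitMix64: a small, fast, seedable PRNG (assumed here as the random source).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical extension point: each data distribution implements this trait.
trait ColumnGenerator {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64>;
}

/// Integers in [min, max), assuming min < max. The modulo introduces a
/// slight bias, which is usually acceptable for benchmark data.
struct Uniform {
    min: i64,
    max: i64,
}

impl ColumnGenerator for Uniform {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64> {
        let mut rng = SplitMix64::new(seed);
        let span = (self.max - self.min) as u64;
        (0..len)
            .map(|_| self.min + (rng.next_u64() % span) as i64)
            .collect()
    }
}

/// Low-cardinality column: values drawn from a small, non-empty fixed set.
struct Choice {
    values: Vec<i64>,
}

impl ColumnGenerator for Choice {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64> {
        let mut rng = SplitMix64::new(seed);
        let n = self.values.len() as u64;
        (0..len)
            .map(|_| self.values[(rng.next_u64() % n) as usize])
            .collect()
    }
}

fn main() {
    let generators: Vec<Box<dyn ColumnGenerator>> = vec![
        Box::new(Uniform { min: 0, max: 1_000 }),
        Box::new(Choice { values: vec![1, 2, 3, 5, 8] }),
    ];
    for g in &generators {
        let a = g.generate(1_000_000, 42);
        let b = g.generate(1_000_000, 42);
        // Same seed, same data.
        assert_eq!(a, b);
        println!("generated {} values, first = {}", a.len(), a[0]);
    }
}
```

Whether this reaches 100M values within seconds depends on the machine and build flags, so the sketch makes no throughput claim.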


cedws commented on September 24, 2024

I think you guys are looking for meaningful data rather than just noise, but SeaHash might be of some use. You can define your own seed for it. The following code iterates 100 million times in just under a second on my machine.

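extern crate seahash;

use seahash::SeaHasher;
use std::hash::Hasher;

fn main() {
    // SeaHasher::new() uses the default seed; use SeaHasher::with_seeds to supply your own.
    let mut hash = SeaHasher::new();
    let mut last = 0;

    // Feed each hash output back in as the next input, 100 million times.
    for _ in 0..100_000_000 {
        hash.write_u64(last);
        last = hash.finish();
    }

    println!("{:#X}", last);
}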


cswinter commented on September 24, 2024

Actually, completely random and incompressible data is an important test case, so SeaHash could be a very useful basis for one of the generators.


cswinter commented on September 24, 2024

09467c6 implements a basic framework for random column generation, as well as specific generators for int and string columns based on Markov chains, which should capture most properties we care about with regard to performance (except correlations between columns).
It would still be worthwhile to add simpler distributions that are easier to configure and faster to generate.
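To illustrate the general idea of a Markov chain column generator, here is a rough sketch. It is not the code from 09467c6: the `MarkovChain` struct, its fields, and the SplitMix64 random source are assumptions for illustration. Each state emits a fixed value, and a row of transition probabilities decides the next state, so "sticky" states produce the long runs of repeated values that matter for compression and scan performance.

```rust
/// SplitMix64: a small seedable PRNG (assumed here as the random source).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical Markov chain generator.
/// `values[i]` is emitted while in state i; `transitions[i][j]` is the
/// probability of moving from state i to state j (each row sums to 1).
struct MarkovChain {
    values: Vec<i64>,
    transitions: Vec<Vec<f64>>,
}

impl MarkovChain {
    fn generate(&self, len: usize, seed: u64) -> Vec<i64> {
        let mut rng = SplitMix64(seed);
        let mut state = 0usize; // always start in state 0
        let mut out = Vec::with_capacity(len);
        for _ in 0..len {
            out.push(self.values[state]);
            let r = rng.next_f64();
            let row = &self.transitions[state];
            // Fall back to the last state if rounding leaves r >= the row sum.
            let mut next = row.len() - 1;
            let mut acc = 0.0;
            for (j, p) in row.iter().enumerate() {
                acc += *p;
                if r < acc {
                    next = j;
                    break;
                }
            }
            state = next;
        }
        out
    }
}

fn main() {
    // Two sticky states: long runs of 10, occasionally switching to runs of 20.
    let chain = MarkovChain {
        values: vec![10, 20],
        transitions: vec![vec![0.99, 0.01], vec![0.05, 0.95]],
    };
    let column = chain.generate(1_000, 42);
    let switches = column.windows(2).filter(|w| w[0] != w[1]).count();
    println!("{} values, {} value changes", column.len(), switches);
}
```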


cswinter commented on September 24, 2024

f8688fe adds a composable generator that picks a portion of the partitions and replaces them with all-null values, and #41 adds generators for random strings.
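As a rough illustration of what such a composable generator might look like (this is not the code from f8688fe), the sketch below wraps an inner generator and swaps a fraction of its partitions for all-null ones. The `PartitionGenerator` trait, the `Counter` and `WithNullPartitions` types, and the `null_fraction` parameter are hypothetical names.

```rust
/// SplitMix64: a small seedable PRNG (assumed here as the random source).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical interface: produce one partition of a nullable int column.
trait PartitionGenerator {
    fn generate(&self, partition: u64, len: usize, seed: u64) -> Vec<Option<i64>>;
}

/// Trivial inner generator used for the demo: emits 0, 1, 2, ...
struct Counter;

impl PartitionGenerator for Counter {
    fn generate(&self, _partition: u64, len: usize, _seed: u64) -> Vec<Option<i64>> {
        (0..len as i64).map(Some).collect()
    }
}

/// Wrapper that replaces roughly `null_fraction` of partitions with nulls
/// and delegates the rest to `inner`.
struct WithNullPartitions<G> {
    inner: G,
    null_fraction: f64,
}

impl<G: PartitionGenerator> PartitionGenerator for WithNullPartitions<G> {
    fn generate(&self, partition: u64, len: usize, seed: u64) -> Vec<Option<i64>> {
        // Derive the decision from (seed, partition) so it is reproducible
        // no matter which order partitions are generated in.
        let mut rng = SplitMix64(seed ^ partition.wrapping_mul(0x9E37_79B9_7F4A_7C15));
        if rng.next_f64() < self.null_fraction {
            vec![None; len]
        } else {
            self.inner.generate(partition, len, seed)
        }
    }
}

fn main() {
    let generator = WithNullPartitions { inner: Counter, null_fraction: 0.25 };
    let null_partitions = (0..1_000u64)
        .filter(|&p| generator.generate(p, 4, 42).iter().all(|v| v.is_none()))
        .count();
    println!("{} of 1000 partitions are all null", null_partitions);
}
```

Because the wrapper implements the same trait as the generator it wraps, it can be stacked on top of any other generator.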

