external_sort

Provides the ability to perform external sorts on structs, which allows for rapid sorting of extremely large data streams.
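
For readers unfamiliar with the term, external sorting generally means sorting data in bounded-size runs and then merging the sorted runs. The sketch below illustrates that general technique only and makes no claim about how this crate is implemented. The function name `external_sort_sketch` and the `run_len` parameter are hypothetical, and the runs stay in memory where a real external sort would write them to temporary files.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sort `input` by splitting it into runs of at most `run_len` items,
/// sorting each run, and merging the runs with a min-heap.
/// Assumption: runs are kept in memory here; a real external sort
/// would spill each sorted run to a temporary file.
fn external_sort_sketch<T: Ord + Clone>(input: Vec<T>, run_len: usize) -> Vec<T> {
    assert!(run_len > 0, "run_len must be positive");
    let total = input.len();

    // Phase 1: build sorted runs of bounded size.
    let mut runs: Vec<Vec<T>> = input.chunks(run_len).map(|c| c.to_vec()).collect();
    for run in runs.iter_mut() {
        run.sort();
    }

    // Phase 2: k-way merge. Each heap entry is
    // (next item, index of its run, position inside that run).
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(first) = run.first() {
            heap.push(Reverse((first.clone(), run_idx, 0usize)));
        }
    }

    let mut out = Vec::with_capacity(total);
    while let Some(Reverse((item, run_idx, pos))) = heap.pop() {
        out.push(item);
        // Refill the heap from the run the popped item came from.
        if let Some(next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next.clone(), run_idx, pos + 1)));
        }
    }
    out
}

fn main() {
    let sorted = external_sort_sketch(vec![5, 2, 1, 3, 4], 2);
    assert_eq!(sorted, vec![1, 2, 3, 4, 5]);
}
```

Only one item per run has to be resident during the merge phase, which is why the approach scales to inputs much larger than memory.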

Usage

Add this to your Cargo.toml:

[dependencies]
external_sort = "^0.1.1"

and this to your crate root:

extern crate external_sort;

Examples

The following shows using external_sort to sort a vector of simple structs.

Note that your struct must impl Ord and Clone, as well as the serde Serialize and Deserialize traits. Additionally, in order for external_sort to track its memory buffer usage, your struct must be able to report its size (via external_sort::ExternallySortable).

extern crate external_sort;
#[macro_use]
extern crate serde_derive;

use external_sort::{ExternalSorter, ExternallySortable};

#[derive(Serialize, Deserialize, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Num {
    the_num: u32,
}

impl Num {
    fn new(num: u32) -> Num {
        Num { the_num: num }
    }
}

impl ExternallySortable for Num {
    fn get_size(&self) -> u64 {
        4
    }
}

fn main() {
    let unsorted = vec![
        Num::new(5),
        Num::new(2),
        Num::new(1),
        Num::new(3),
        Num::new(4),
    ];
    let sorted = vec![
        Num::new(1),
        Num::new(2),
        Num::new(3),
        Num::new(4),
        Num::new(5),
    ];

    let external_sorter = ExternalSorter::new(16, None);
    let iter = external_sorter.sort(unsorted.into_iter()).unwrap();
    for (idx, i) in iter.enumerate() {
        assert_eq!(i.unwrap().the_num, sorted[idx].the_num);
    }
}

If your struct is unable to report its size, simply return 1 from get_size(), and then pass the number of objects (rather than bytes) that the ExternalSorter should keep in memory when calling ExternalSorter::new().
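
The effect of returning 1 from get_size() can be modeled without the crate. In the minimal sketch below, the `ReportsSize` trait is a hypothetical local stand-in for external_sort::ExternallySortable, and `chunk_by_budget` with its flush-when-budget-reached policy is an assumption made for illustration, not the crate's actual buffering logic.

```rust
/// Hypothetical stand-in for `external_sort::ExternallySortable`,
/// defined locally so the sketch compiles without the crate.
trait ReportsSize {
    fn get_size(&self) -> u64;
}

#[derive(Debug, Clone, PartialEq)]
struct Record {
    name: String,
}

impl ReportsSize for Record {
    // The real size is hard to know (heap-allocated String),
    // so every record counts as one unit.
    fn get_size(&self) -> u64 {
        1
    }
}

/// Split `items` into chunks, closing a chunk once the sizes reported
/// by `get_size` reach `budget`. Assumed policy, for illustration only.
fn chunk_by_budget<T: ReportsSize>(items: Vec<T>, budget: u64) -> Vec<Vec<T>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut used = 0u64;
    for item in items {
        used += item.get_size();
        current.push(item);
        if used >= budget {
            // Budget reached: hand off the chunk and start a new one.
            chunks.push(std::mem::take(&mut current));
            used = 0;
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let records: Vec<Record> = (0..7)
        .map(|i| Record { name: format!("r{}", i) })
        .collect();
    // With get_size() == 1, a budget of 3 means "3 records per chunk".
    let lens: Vec<usize> = chunk_by_budget(records, 3).iter().map(|c| c.len()).collect();
    assert_eq!(lens, vec![3, 3, 1]);
}
```

Under this model the budget is expressed in whatever unit get_size() reports, so a unit of 1 per object turns a byte budget into an object count.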

external_sort's People

Contributors

aaronrphillips

external_sort's Issues

Add compression to optimize disk usage

Hi! First of all thank you so much for this crate, it has solved a lot of problems for me <3

Secondly, it would be great to compress the files saved in the tmp folder to prevent huge disk usage. For example, with this simple struct:

struct CorResult {
    gene: String,
    gem: String,
    correlation: f64,
    p_value: f64,
    adjusted_p_value: Option<f64>,
}

Using get_size = 1 and a sort chunk of 10,000,000 elements, sorting 360,000,000 elements consumes around 40 GB of disk. I've observed that the saved chunks are in plain text, so maybe zlib compression could be added. I've tried the flate2-rs crate, but I'm getting some errors.
I'm relatively new to Rust, and I don't know anything about existing compression algorithms. I'll keep trying in my spare time to get this working for a PR, but if you have some ideas about this improvement, they would be really appreciated.
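
To put the reported figures in perspective, here is a small back-of-the-envelope calculation. It assumes "40 GB" means 40 × 10^9 bytes and that the data is spread evenly across chunks; the function names are hypothetical and nothing here measures the crate itself.

```rust
/// Average on-disk bytes per element (integer division, rounds down).
fn bytes_per_element(total_bytes: u64, elements: u64) -> u64 {
    total_bytes / elements
}

/// Number of chunks needed to hold `elements` items, rounding up.
fn chunk_count(elements: u64, chunk_len: u64) -> u64 {
    (elements + chunk_len - 1) / chunk_len
}

fn main() {
    // Figures quoted in the issue; 40 GB is assumed to be decimal gigabytes.
    let total_bytes: u64 = 40_000_000_000;
    let elements: u64 = 360_000_000;
    let chunk_len: u64 = 10_000_000;

    let per_element = bytes_per_element(total_bytes, elements);
    let chunks = chunk_count(elements, chunk_len);
    let per_chunk = total_bytes / chunks;

    println!("about {} bytes per element", per_element); // 111
    println!("{} chunk files", chunks); // 36
    println!("about {} bytes per chunk file", per_chunk);

    assert_eq!(per_element, 111);
    assert_eq!(chunks, 36);
}
```

Roughly 111 bytes per record for two short strings and three floating-point values is consistent with a text serialization, which is the kind of data that general-purpose compression tends to shrink well.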
