RustySparseMMX

A Rust program that converts a dense ','- or '\t'-separated table to the MatrixMarket sparse matrix format. Only integer values are supported; float values, as provided by some databases, are converted to integers.
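To illustrate the conversion, here is a minimal sketch rather than the tool's actual implementation. The function name `dense_to_matrix_market` is hypothetical, the input is assumed to contain only numeric fields (no header row or row names), and floats are assumed to be truncated toward zero; the real program may round or handle labels differently. The output uses the MatrixMarket coordinate layout: a banner line, a `rows cols entries` line, then one 1-based `row col value` line per non-zero value.

```rust
/// Convert a dense, separator-delimited numeric table into MatrixMarket
/// coordinate text. Hypothetical helper for illustration only.
fn dense_to_matrix_market(table: &str, sep: char) -> Result<String, String> {
    let mut entries: Vec<(usize, usize, i64)> = Vec::new();
    let mut n_rows = 0;
    let mut n_cols = 0;

    for (r, line) in table.lines().filter(|l| !l.trim().is_empty()).enumerate() {
        let fields: Vec<&str> = line.split(sep).collect();
        if r == 0 {
            // The first row defines the expected number of columns.
            n_cols = fields.len();
        } else if fields.len() != n_cols {
            return Err(format!(
                "row {}: expected {} fields, found {}",
                r + 1,
                n_cols,
                fields.len()
            ));
        }
        n_rows = r + 1;

        for (c, field) in fields.iter().enumerate() {
            let value: f64 = field
                .trim()
                .parse()
                .map_err(|e| format!("row {}, column {}: {}", r + 1, c + 1, e))?;
            // Assumption: floats are truncated toward zero.
            let int_value = value as i64;
            // Only non-zero values are stored in the sparse format.
            if int_value != 0 {
                entries.push((r + 1, c + 1, int_value));
            }
        }
    }

    let mut out = String::from("%%MatrixMarket matrix coordinate integer general\n");
    out.push_str(&format!("{} {} {}\n", n_rows, n_cols, entries.len()));
    for (r, c, v) in entries {
        out.push_str(&format!("{} {} {}\n", r, c, v));
    }
    Ok(out)
}
```

Calling `dense_to_matrix_market("0,2.7,0\n1,0,3\n", ',')` yields a 2 x 3 matrix with three stored entries; the zeros are dropped and 2.7 is stored as 2.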

A preprint of this is available on ResearchGate

Usage

rusty_sparse_mmx -h
rusty_sparse_mmx 0.1.0
Stefan L. <[email protected]>
Convert a dense csv table to the MatrixMarket format used by 10x CellRanger. Meaning the outfiles
are matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz. To circumvent problems while importing into
Scanpy the files are created in a folder named 'filtered_feature_bc_matrix'

USAGE:
    rusty_sparse_mmx [OPTIONS] --ipath <IPATH>

OPTIONS:
    -h, --help                     Print help information
    -i, --ipath <IPATH>            the input input path
    -s, --sep <SEP>                the column separator str [default: ,]
    -t, --transpose <TRANSPOSE>    transpose the data [default: false]
    -V, --version                  Print version information
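The output layout described in the help text can be sketched as follows. This is an illustration, not the program's code: the helper name `cellranger_output_paths` is hypothetical, and it assumes the output folder is derived by dropping the input file's extension, which is inferred only from the path shown in the expected test output (`testData/DenseMatrix/filtered_feature_bc_matrix`).

```rust
use std::path::{Path, PathBuf};

/// Build the three CellRanger-style output paths for a given input file.
/// Hypothetical helper; the folder derivation is an assumption.
fn cellranger_output_paths(input: &Path) -> [PathBuf; 3] {
    // "testData/DenseMatrix.csv" -> "testData/DenseMatrix/filtered_feature_bc_matrix"
    let out_dir = input.with_extension("").join("filtered_feature_bc_matrix");
    [
        out_dir.join("matrix.mtx.gz"),
        out_dir.join("features.tsv.gz"),
        out_dir.join("barcodes.tsv.gz"),
    ]
}
```

For the input `testData/DenseMatrix.csv` this returns the paths of `matrix.mtx.gz`, `features.tsv.gz` and `barcodes.tsv.gz` inside `testData/DenseMatrix/filtered_feature_bc_matrix`.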

Install

  1. Clone this repo.

Inside the cloned repo, run:

cargo build -r
sudo cp target/release/rusty_sparse_mmx /usr/bin/
sudo cp target/release/chimera2sparse /usr/bin/

You can of course also run the target/release/dense_2_sparse program from its original location or copy it somewhere else.

Testing

cargo build -r
target/release/rusty_sparse_mmx -i testData -s "\t"
Rscript testData/Rtest.R
Rscript testData/Rtest_transp.R

This output is expected:

Processing file "testData/DenseMatrix.csv"
I have detected 300 columns
100 columns 300 rows and 2693 data points read
sparse Matrix: 300 cell(s), 100 gene(s) and 100 entries written to path Ok("testData/DenseMatrix/filtered_feature_bc_matrix"); 
Attaching SeuratObject
[1] "OK"

Speed

Converting the 3.2 GB GSE166895_postQC_mRNAraw_FL-FBM-CB.csv file from the Human Cell Atlas database (Bone -> "Blood and immune development in human fetal bone marrow and Down syndrome") takes 1 min 46 s on one core of an AMD Ryzen 5 3600X 6-Core Processor running Ubuntu 22.04.1 LTS with Linux kernel 5.15.0-58-generic and a magnetic disk as storage.

This is quite impressive, isn't it?

Memory Usage

rusty_sparse_mmx comparative memory usage

Memory usage was measured over time using the top program on Linux. Shown is the top value 'RES', sampled every 5 seconds, for an R dense-to-sparse conversion [red], the rusty_sparse_mmx Rust process for the same data [blue] and the loading of the sparse matrix into R [black]. The file GSE166895_postQC_mRNAraw_FL-FBM-CB.csv from [The Human Cell Atlas / Data] was used for these measurements.
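A similar measurement could be scripted without top. The sketch below is an alternative technique, not the method used for the figure: on Linux, the `VmRSS` field in `/proc/<pid>/status` reports the resident set size that top shows as 'RES'. The function name `parse_vm_rss_kb` is hypothetical.

```rust
/// Extract the resident set size in kB from the text of /proc/<pid>/status.
/// Returns None if no parsable "VmRSS:" line is present.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        // Find the line that starts with "VmRSS:" and keep the remainder.
        .find_map(|line| line.strip_prefix("VmRSS:"))
        // The remainder looks like "   2048 kB"; take the numeric part.
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|number| number.parse().ok())
}
```

To reproduce the sampling scheme, one could read `/proc/<pid>/status` with `std::fs::read_to_string` every 5 seconds (for example with `std::thread::sleep`) and record the parsed value until the process exits.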

References

The Human Cell Atlas / Data. Blood and immune development in human fetal bone marrow and Down syndrome. 2022. url: https://data.humancellatlas.org/explore/projects/04ad400c-58cb-40a5-bc2b-2279e13a910b/m/project-matrices
