minimap2-rs

A Rust FFI library for minimap2. In development! Feedback appreciated!

https://crates.io/crates/minimap2 https://docs.rs/minimap2/latest/minimap2/

Structure

minimap2-sys is the library of raw FFI bindings to minimap2. minimap2 is the higher-level, more idiomatic Rust wrapper built on top of it.

How to use

Requirements

minimap2 = "0.1.10"

Also see Features

Tested with rustc 1.64.0 and nightly, so it is probably a good idea to upgrade before building. If you run into pain points with older versions, let me know and I will try to fix them!

rustup update

Usage

Create an Aligner

let mut aligner = Aligner::builder()
    .map_ont()
    .with_threads(8)
    .with_cigar()
    .with_index("ReferenceFile.fasta", None)
    .expect("Unable to build index");

Align a sequence:

let seq: Vec<u8> = b"ACTGACTCACATCGACTACGACTACTAGACACTAGACTATCGACTACTGACATCGA".to_vec();
let alignment = aligner
    .map(&seq, false, false, None, None)
    .expect("Unable to align");

Presets

All minimap2 presets should be available (see functions section):

let aligner = map_ont();
let aligner = asm20();

Customization

MapOpt and IdxOpt can be customized with Rust's struct update syntax, layered on top of a preset's mapping settings. Inspired by bevy.

Aligner {
    mapopt: MapOpt {
        seed: 42,
        best_n: 1,
        ..Default::default()
    },
    idxopt: IdxOpt {
        k: 21,
        ..Default::default()
    },
    ..map_ont()
}

Working Example

There is a binary called "fakeminimap2" that I use to test for memory leaks; its source code doubles as a usage example. It also shows some helper functions for identifying compression types and FASTA vs FASTQ files. I used my own parsers as they are well fuzzed, but I am open to removing them or putting them behind a feature flag.
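As a rough illustration of what such helpers can look like, here is a minimal sketch that sniffs the first bytes of a file. The names `Compression`, `SeqFormat`, `detect_compression`, and `detect_format` are hypothetical and are not the functions shipped in fakeminimap2; only the magic numbers are standard.

```rust
/// Compression guessed from leading magic bytes (hypothetical enum, for illustration).
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

/// Sequence format guessed from the first non-whitespace byte (hypothetical enum).
#[derive(Debug, PartialEq)]
enum SeqFormat {
    Fasta,
    Fastq,
    Unknown,
}

/// Compare the start of the file against well-known magic numbers.
fn detect_compression(head: &[u8]) -> Compression {
    if head.starts_with(&[0x1f, 0x8b]) {
        Compression::Gzip
    } else if head.starts_with(b"BZh") {
        Compression::Bzip2
    } else if head.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Compression::Xz
    } else if head.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        Compression::Zstd
    } else {
        Compression::Uncompressed
    }
}

/// FASTA records start with '>', FASTQ records with '@'.
/// Expects decompressed bytes.
fn detect_format(head: &[u8]) -> SeqFormat {
    match head.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'>') => SeqFormat::Fasta,
        Some(b'@') => SeqFormat::Fastq,
        _ => SeqFormat::Unknown,
    }
}
```

Calling `detect_compression` on the raw file header first, and `detect_format` on the decompressed stream afterwards, keeps the two checks independent.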

Alignment functions return a Mapping struct. The Alignment struct is only returned when the Aligner is created using .with_cigar().

A very simple example would be:

// Open the query file and wrap it in a buffered reader.
let file = std::fs::File::open(query_file).expect("Unable to open query file");
let mut reader = BufReader::new(file);
let fasta = Fasta::from_buffer(&mut reader);

// Map each record; every call returns the mappings for one sequence.
for seq in fasta {
    let seq = seq.unwrap();
    let alignment: Vec<Mapping> = aligner
        .map(&seq.sequence.unwrap(), false, false, None, None)
        .expect("Unable to align");
    println!("{:?}", alignment);
}

There is a map_file function that works on an entire file, but it is not lazy (it reads and maps everything before returning) and is thus not suitable for large files. It may be removed in the future or moved to a separate library.

let mappings: Result<Vec<Mapping>> = aligner.map_file("query.fa", false, false);
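For large inputs, a lazy alternative is to pull one record at a time and map it before reading the next. The sketch below uses only the standard library; `FastaRecords` is a hypothetical helper, not part of minimap2-rs, and it assumes plain (uncompressed) FASTA input.

```rust
use std::io::BufRead;

/// Hypothetical lazy FASTA reader: yields (header, sequence) pairs one at a time.
struct FastaRecords<R: BufRead> {
    reader: R,
    pending_header: Option<String>, // header of the next record, already read
    done: bool,
}

impl<R: BufRead> FastaRecords<R> {
    fn new(reader: R) -> Self {
        FastaRecords { reader, pending_header: None, done: false }
    }
}

impl<R: BufRead> Iterator for FastaRecords<R> {
    type Item = std::io::Result<(String, Vec<u8>)>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut header = self.pending_header.take();
        let mut seq = Vec::new();
        let mut line = String::new();
        loop {
            line.clear();
            match self.reader.read_line(&mut line) {
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
                // End of input: emit the last record, if there is one.
                Ok(0) => {
                    self.done = true;
                    return header.map(|h| Ok((h, seq)));
                }
                Ok(_) => {
                    let trimmed = line.trim_end();
                    if let Some(name) = trimmed.strip_prefix('>') {
                        if let Some(h) = header.take() {
                            // A new header closes the current record.
                            self.pending_header = Some(name.to_string());
                            return Some(Ok((h, seq)));
                        }
                        header = Some(name.to_string());
                    } else if header.is_some() {
                        // Sequence lines may be wrapped; concatenate them.
                        seq.extend_from_slice(trimmed.as_bytes());
                    }
                }
            }
        }
    }
}
```

Each yielded sequence could then be passed to `aligner.map(...)` as in the loop above, so only one record is held in memory at a time.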

Multithreading

Multithreading is supported; see fakeminimap2 for an example implementation. Minimap2 also supports threading itself and will use a minimum of 3 cores for building the index. Multithreading for mapping is left to the end user.

let mut aligner = Aligner::builder()
    .map_ont()
    .with_threads(8);
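One way to spread mapping over several threads is a shared work queue with scoped threads. The sketch below is only a pattern, under the assumption that the aligner can be shared by reference across threads; `MockAligner` and `map_parallel` are hypothetical stand-ins so the example compiles without the crate, and `MockAligner::map` simply returns the read length in place of real mappings.

```rust
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the real aligner (assumption: it is usable via &self).
struct MockAligner;

impl MockAligner {
    /// Placeholder for a real mapping call; returns the read length.
    fn map(&self, seq: &[u8]) -> usize {
        seq.len()
    }
}

/// Map all reads using `n_threads` workers; results come back in input order.
fn map_parallel(
    aligner: &MockAligner,
    reads: Vec<Vec<u8>>,
    n_threads: usize,
) -> Vec<(usize, usize)> {
    // Shared queue of (index, read); each worker pulls the next item.
    let queue = Mutex::new(reads.into_iter().enumerate());
    let (tx, rx) = mpsc::channel();

    thread::scope(|s| {
        for _ in 0..n_threads {
            let tx = tx.clone();
            let queue = &queue;
            s.spawn(move || loop {
                // Hold the lock only while taking the next read.
                let next = queue.lock().unwrap().next();
                match next {
                    Some((idx, seq)) => tx.send((idx, aligner.map(&seq))).unwrap(),
                    None => break,
                }
            });
        }
    });

    // Drop the original sender so the receiver sees the end of the stream.
    drop(tx);
    let mut results: Vec<(usize, usize)> = rx.into_iter().collect();
    results.sort_by_key(|&(idx, _)| idx);
    results
}
```

Scoped threads (stable since Rust 1.63) let workers borrow the aligner directly, which avoids wrapping it in an `Arc`.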

Features

The following crate features are available:

  • mm2-fast - Replace minimap2 with mm2-fast. This is likely not portable.
  • htslib - Support output of bam/sam files using htslib.
  • simde - Compile minimap2 / mm2-fast with simd-everywhere support.
  • map-file - Default - Convenience function for mapping an entire file. Caution, this is single-threaded.

Map-file is a default feature and enabled unless otherwise specified.

Building for MUSL

Follow these instructions.

In brief, using bash shell:

docker pull messense/rust-musl-cross:x86_64-musl
alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src messense/rust-musl-cross:x86_64-musl'
rust-musl-builder cargo build --release

Please note minimap2 is only tested on x86_64. Other platforms may work; please open an issue if minimap2 compiles but minimap2-rs does not.

Features tested with MUSL

  • mm2-fast - Fail
  • htslib - Success
  • simde - Success

Want feedback

  • Many fields are i32 / i8 to mimic the C environment, but would it make more sense to convert to u32 / u8 / usize?
  • Let me know pain points
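To make the trade-off in the first point concrete, here is a minimal sketch of converting C-style signed fields into unsigned Rust types at the API boundary. `RawHit` and `Hit` are hypothetical types invented for this illustration and do not mirror the crate's actual structs.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical C-style record with signed coordinates.
#[derive(Debug, PartialEq)]
struct RawHit {
    query_start: i32,
    query_end: i32,
}

/// Hypothetical Rust-facing record with unsigned coordinates.
#[derive(Debug, PartialEq)]
struct Hit {
    query_start: usize,
    query_end: usize,
}

impl TryFrom<RawHit> for Hit {
    type Error = TryFromIntError;

    /// Fails instead of silently wrapping if the C side reports a negative value.
    fn try_from(raw: RawHit) -> Result<Self, Self::Error> {
        Ok(Hit {
            query_start: usize::try_from(raw.query_start)?,
            query_end: usize::try_from(raw.query_end)?,
        })
    }
}
```

The checked conversion surfaces unexpected negative values as errors, at the cost of an extra `Result` in the public API.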

Tools using this binding

  • Chopper - Long read trimming and filtering
  • mappy-rs - Drop-in multi-threaded replacement for python's mappy

Pain Points

Probably not freeing C memory somewhere.... Not sure yet, if so it's just leaking a little... Need to do a large run to test it.

Next things todo

  • Print other tags so we can have an entire PAF format
  • -sys Compile with SSE2 / SSE4.1 (auto-detect, but also make with features)
  • Multi-thread guide (tokio async threads or use crossbeam queue and traditional threads?)
  • Iterator interface for map_file
  • MORE TESTS
  • -sys Get SSE working with "sse" feature (compiles and tests work in -sys crate, but not main crate)
  • -sys Possible to decouple from pthread?
  • -sys Enable Lisa-hash for mm2-fast? But must handle build arguments from the command-line.
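For the PAF item, the twelve mandatory PAF columns are query name, query length, query start, query end, strand, target name, target length, target start, target end, number of matching bases, alignment block length, and mapping quality, separated by tabs. A minimal sketch of writing one such line follows; `PafRecord` and `to_paf_line` are hypothetical names for illustration and are not the crate's `Mapping` struct.

```rust
/// Hypothetical record holding the twelve mandatory PAF columns.
struct PafRecord<'a> {
    query_name: &'a str,
    query_len: usize,
    query_start: usize,
    query_end: usize,
    strand: char, // '+' or '-'
    target_name: &'a str,
    target_len: usize,
    target_start: usize,
    target_end: usize,
    matches: usize,   // number of matching bases
    block_len: usize, // alignment block length
    mapq: u8,
}

/// Format the mandatory columns as one tab-separated PAF line (no trailing newline).
fn to_paf_line(r: &PafRecord) -> String {
    format!(
        "{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}",
        r.query_name,
        r.query_len,
        r.query_start,
        r.query_end,
        r.strand,
        r.target_name,
        r.target_len,
        r.target_start,
        r.target_end,
        r.matches,
        r.block_len,
        r.mapq
    )
}
```

Optional SAM-style tags would be appended as further tab-separated `key:type:value` fields after column twelve.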

Citation

You should cite the minimap2 papers if you use this in your work.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191

and/or:

Li, H. (2021). New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37:4572-4574. doi:10.1093/bioinformatics/btab705

Changelog

0.1.12 UNRELEASED

0.1.11

  • HTS lib: add support for optional quality scores by @eharr

0.1.10

  • HTS lib support by @eharr
  • HTS lib: Output sam/bam files by @eharr
  • More tests by @eharr
  • Display impl for Strand thanks to @ahcm
  • Update minimap2-sys to latest version by @jguhlin
  • -sys crate mm2fast added as additional backend by @jguhlin
  • zlib dep changes by @jguhlin (hopefully now it is more portable and robust)
  • -sys crate now supports SIMDe

0.1.9

  • Thanks to @Adoni5 for switching to the builder pattern, and to @eharr for adding additional fields to alignment.
  • Do not require libclang for normal compilation.

0.1.8

  • Multithreading support (use fewer raw pointers, and treat them more like Rust structs)

0.1.7

  • Use libc instead of std::ffi::c_int as well

0.1.6

  • Support slightly older versions of rustc by using libc:: rather than std::ffi for c_char (Thanks dwpeng!)
  • Use fffx module for fasta/q parsing

Funding

Genomics Aotearoa
