adscodex's Introduction

Adaptive DNA Storage Codec (ADS Codex)

ADS Codex is a DNA storage codec that provides high density and can adapt to different requirements for DNA synthesis and sequencing.

External Dependencies

Reed-Solomon Package

ADS Codex depends on https://github.com/klauspost/reedsolomon

Please install it using

go get -u github.com/klauspost/reedsolomon

Lookup Tables

Lookup tables significantly speed up ADS Codex. You can generate them using the tblgen tool (see below), or download them from GitHub (1.7 GB file):

https://github.com/lanl/adscodex/releases/download/1.0/tables.zip

Unpack the zip into the tbl directory, where the tools and the unit tests expect the lookup tables.

Installation

To get ADS Codex, clone this repository and build the packages and commands that you are interested in. The description in docs/howtos/HOWTO-setup-go-and-adscodex.txt has more detailed information on how to build it.

Documentation

The specification of the codec is in the slides in the doc directory. More documentation on the implementation is in the source code.

The HOWTO documents in docs/howtos have more information on how to encode and decode data with ADS Codex.

Packages

oligo

Contains the basic abstraction of an oligo that is used by the rest of the packages.

oligo/short

An implementation of the basic oligo interface that stores an oligo in a 64-bit integer, and therefore can handle short oligos (up to 32 nts).
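The 32-nt limit follows from using two bits per nucleotide in a 64-bit word. The sketch below illustrates that packing scheme only; the function names and the A=0, T=1, C=2, G=3 mapping are assumptions made for this example and are not taken from the oligo/short package.

```go
package main

import (
	"errors"
	"fmt"
)

// nts lists the nucleotides in the order of their 2-bit codes.
// The order A, T, C, G is an assumption made for this illustration.
const nts = "ATCG"

// pack stores up to 32 nts in a uint64, two bits per nt.
// The length is returned separately because leading 'A's encode as zero bits.
func pack(seq string) (uint64, int, error) {
	if len(seq) > 32 {
		return 0, 0, errors.New("oligo longer than 32 nts")
	}
	var v uint64
	for i := 0; i < len(seq); i++ {
		var code uint64
		switch seq[i] {
		case 'A':
			code = 0
		case 'T':
			code = 1
		case 'C':
			code = 2
		case 'G':
			code = 3
		default:
			return 0, 0, fmt.Errorf("invalid nt %q at position %d", seq[i], i)
		}
		v = v<<2 | code
	}
	return v, len(seq), nil
}

// unpack reverses pack: it reads n two-bit codes from v, last nt first.
func unpack(v uint64, n int) string {
	buf := make([]byte, n)
	for i := n - 1; i >= 0; i-- {
		buf[i] = nts[v&3]
		v >>= 2
	}
	return string(buf)
}

func main() {
	v, n, err := pack("ACGT")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x %d %s\n", v, n, unpack(v, n))
}
```

Keeping the length next to the packed value is what lets oligos shorter than 32 nts round-trip correctly.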

oligo/long

An implementation of the basic oligo interface that can store an arbitrarily long oligo. It uses one byte per nt.

criteria

Abstract interface for oligo viability criteria. It is used by the Level 0 codec (l0) to check if an oligo can be synthesized/sequenced. The package implements a single criterion, H4G2, which rejects oligos with homopolymers longer than 4 nts (for A, T, and C) or 2 nts (for G).
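The H4G2 rule can be checked with a single pass over the sequence. This is a minimal sketch of the rule as stated above, not the criteria package's code; maxRun and h4g2OK are hypothetical names.

```go
package main

import "fmt"

// maxRun returns the longest homopolymer run allowed for a nucleotide
// under the H4G2 rule: 2 for G, 4 for A, T, and C.
func maxRun(nt byte) int {
	if nt == 'G' {
		return 2
	}
	return 4
}

// h4g2OK reports whether seq contains no homopolymer run longer than allowed.
func h4g2OK(seq string) bool {
	run := 0
	for i := 0; i < len(seq); i++ {
		if i > 0 && seq[i] == seq[i-1] {
			run++ // same nt as the previous position: extend the run
		} else {
			run = 1 // different nt: start a new run
		}
		if run > maxRun(seq[i]) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"AAAACGG", "AAAAAC", "ACGGGT"} {
		fmt.Printf("%s: %v\n", s, h4g2OK(s))
	}
}
```

Here AAAACGG passes (runs of exactly 4 A and 2 G), while AAAAAC and ACGGGT fail on a run of 5 A and 3 G respectively.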

l0

Level 0 of the ADS Codex codec (bit packing). Theoretically, it can pack any value up to 64 bits. In practice, packing large values is prohibitively slow, and lookup tables are required even for 17-bit values to achieve reasonable performance.

l1

Level 1 of the ADS Codex codec. Packs an address and array of bytes into a single oligo.

l2

Level 2 of the ADS Codex codec. Packs an arbitrary array of bytes into a collection of oligos. Provides erasure code oligos for recovery of the data in case of errors.

Tools

The tools in the repository use the packages to provide some convenient commands.

tblgen

Generates encoding and decoding lookup tables that speed up Level 0 encoding and decoding.

For example, an encoding lookup table for 17-nt oligos with 2^13 entries can be generated by:

./tblgen -e encnt17b13.tbl -l 17 -b 13

A decoding lookup table for 17-nt oligos with 2^14 entries can be generated by:

./tblgen -d decnt17b7.tbl -l 17 -b 7

Although the code is parallelized and uses all available cores, it can take a few hours to generate the table.

encode

Encodes the specified file and outputs a list of oligos that represent it.

decode

Decodes the specified list of oligos into a file. If not all data can be recovered, the output file might have holes.

Miscellaneous utilities

The utils directory contains many utilities that can be used to analyze sequenced data.

Unit Tests

The packages have some limited unit tests that can be run with the standard command:

go test

The unit tests will slowly be extended to cover all use cases.

Limitations

There are multiple TODO and FIXME comments in the source code that describe things that are missing, or implementation restrictions that should be fixed eventually.

adscodex's People

Contributors

dmanno, lionkov


adscodex's Issues

flag provided but not defined: -etbl

Hi,

I'm running the encoder tool (using macOS):

adscodex % go run encode/main.go -dseqnum 3 -rseqnum 2 -etbl ~/ALTGOPATH/src/adscodex/tbl/h4g2-17-13.etbl \
    -p3 CAGTGAGCTGGCAACTTCCA -p5 CGACATCTCGATGGCAGCAT $HOME/32kfilerand >> dna.out

flag provided but not defined: -etbl
Usage of /var/folders/lt/0b3s6w253tg4jq9x3qh9qk740000gn/T/go-build1056971182/b001/exe/main:
  -addr uint
        start address
  -compat
        compatibility with 0.9
  -dbnum int
        number of data blocks (default 5)
  -dseqnum int
        number of data oligos per erasure group (default 3)
  -dtcsum string
        L1 data blocks checksum type (parity or even) (default "parity")
  -mdcnum int
        metadata error detection blocks (default 2)
  -mdcsum string
        L1 metadata blocks checksum type (rs for Reed-Solomon, crc for CRC) (default "crc")
  -mdsz int
        metadata block size (default 4)
  -p3 string
        3'-end primer (default "CAGTGAGCTGGCAACTTCCA")
  -p5 string
        5'-end primer (default "CGACATCTCGATGGCAGCAT")
  -rndmz
        randomze data
  -rseqnum int
        number of erasure oligos per erasure group (default 2)
  -shuffle int
        random seed for shuffling the order of the oligos (0 disable)
exit status 2
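
For context, "flag provided but not defined" is the message Go's standard flag package reports when the command line contains a flag that the program never registered, and -etbl does not appear in the usage listing above. The sketch below reproduces that behavior in isolation. It is not the encode tool's code: parseArgs is a hypothetical helper that registers only two of the listed flags, and the sketch does not show which option, if any, replaces -etbl.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs registers two of the flags from the usage listing (for
// illustration only) and parses args, returning any parse error.
func parseArgs(args []string) error {
	fs := flag.NewFlagSet("encode", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the usage text out of the output
	fs.Int("dseqnum", 3, "number of data oligos per erasure group")
	fs.Int("rseqnum", 2, "number of erasure oligos per erasure group")
	return fs.Parse(args)
}

func main() {
	// -etbl was never registered, so Parse fails when it reaches it.
	err := parseArgs([]string{"-dseqnum", "3", "-etbl", "some.etbl"})
	fmt.Println(err)
}
```

The real tool exits with status 2 because the default command-line flag set uses flag.ExitOnError rather than returning the error.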
