Giter Club home page Giter Club logo

codereconstruction's Introduction

Reconstruction Algorithms for DNA Storage Systems

Omer Sabary, Alexander Yucovich, Guy Shapira, Eitan Yaakobi

In the trace reconstruction problem a length-n string x yields a collection of noisy copies, called traces, y1, …, yt where each yi is independently obtained from x by passing through a deletion channel, which deletes every symbol with some fixed probability. The main goal under this paradigm is to determine the required minimum number of i.i.d traces in order to reconstruct x with high probability. The trace reconstruction problem can be extended to the model where each trace is a result of x passing through a deletion-insertion-substitution channel, which introduces also insertions and substitutions. Motivated by the storage channel of DNA, this work is focused on another variation of the trace reconstruction problem, which is referred by the DNA reconstruction problem. A DNA reconstruction algorithm is a mapping which receives t traces y1, …, yt as an input and produces x^, an estimation of x. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm’s estimation. For the deletion channel case, the problem is referred by the deletion DNA reconstruction problem and the goal is to minimize the Levenshtein distance.

In this work, we present several new algorithms for these reconstruction problems. Our algorithms look globally on the entire sequence of the traces and use dynamic programming algorithms, which are used for the shortest common supersequence and the longest common subsequence problems, in order to decode the original sequence. Our algorithms do not require any limitations on the input and the number of traces, and more than that, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data as well as on data from previous DNA experiments and are shown to outperform all previous algorithms.

Algorithms

This repository includes our implementation of the following algorithms: Algorithms for the deletion-channel:

  1. BMA Algorithm. Batu el al.
  2. ML-SCS Algorithm. Our work
  3. ML-SCS Algorithm for Large Clusters. Our work

Algorithms for the deletion-insertion-substitution-channel:

  1. BMA-Lookahead Algorithm. Gopalan et al.
  2. VS Algorithm. Viswanathan and Swaminathan
  3. LCS-Anchor Algorithm. Our work
  4. The Iterative Reconstruction Algortihm. Our work
  5. Divider BMA Algorithm. Our work
  6. Hybrid Algorithm. Our work

Compilation

g++ -std=c++0x -O3 -g3 -Wall -c -fmessage-length=0 -o *.cpp g++ -o DNA *.o

Usage

./DNA >results.txt

The output file presents edit distance histogram. The i-th entry of the histogram shows the number of clusters that their estimated sequence has edit distance of i errors from the original strings.

The output also includes average edit distance rate (the rate are presented multiplied by 10^-3 for convenience).

Edlib Installation

Use the package manager pip to install foobar.

pip install edlib

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

TBA

codereconstruction's People

Contributors

kennard123661 avatar

Watchers

James Cloos avatar  avatar

Forkers

xavier-h-10

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.