sv-merger

Requirements:

Requires Python 2 or 3.

Additional Python package: intervaltree

Install it with: pip install intervaltree

Sample execution

arg 0: Name of the step to execute.

arg 1: Tab-separated file containing the structural variants (SVs) to be merged.

arg 2: File containing tandem repeat coordinates given under folder trf_coords.

arg 3: SV type, e.g. DEL, INS.

python main.py MERGE ./test_data/toy_SV_data.csv ./trf_coords/chr21.trf.sorted.gor DEL

Note: The MERGE step merges SVs within and outside tandem repeats separately. It uses an overlap threshold of 50% for SVs outside tandem repeats and 85% for SVs within them.
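The note above does not say how the overlap percentage is computed. A minimal sketch, assuming reciprocal overlap (shared length divided by the longer of the two SVs), shows how the two thresholds could be applied. The function names and the overlap definition are assumptions for illustration and are not taken from main.py.

```python
def reciprocal_overlap(a_begin, a_end, b_begin, b_end):
    """Return the reciprocal overlap of two intervals as a percentage (0-100).

    Assumed definition: shared length divided by the longer interval,
    which equals the smaller of the two per-interval overlap fractions.
    """
    shared = min(a_end, b_end) - max(a_begin, b_begin)
    longest = max(a_end - a_begin, b_end - b_begin)
    if shared <= 0 or longest <= 0:
        return 0.0
    return 100.0 * shared / longest


def passes_threshold(sv_a, sv_b, in_trr, in_trr_pct=85.0, out_trr_pct=50.0):
    """Apply the 85% (within TRR) or 50% (outside TRR) threshold from the note.

    sv_a and sv_b are (begin, end) tuples.
    """
    threshold = in_trr_pct if in_trr else out_trr_pct
    overlap = reciprocal_overlap(sv_a[0], sv_a[1], sv_b[0], sv_b[1])
    return overlap >= threshold
```

Under this assumed definition, two 100 bp deletions offset by 50 bp overlap by 50%. They would qualify for merging outside tandem repeats but not within them.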

In-depth execution

Finding SVs within and outside tandem repeat regions (TRRs).

arg 0: Name of the step to execute.

arg 1: Tab-separated file containing the structural variants (SVs) to be merged.

arg 2: File containing tandem repeat coordinates given under folder trf_coords.

arg 3: Output file name for this step.

arg 4: Relaxation parameter used when deciding whether an SV lies within or outside a tandem repeat. An SV is accepted as within a given TRR when (SV.begin >= TRR.begin - relaxation) and (SV.end <= TRR.end + relaxation), where SV and TRR denote a structural variant and a tandem repeat region, respectively.

python main.py FIND_TRR_OVERLAPS ./test_data/toy_SV_data.csv ./trf_coords/chr21.trf.sorted.gor ./test_data/toy_SV_data.csv.trr_overlap 5
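The containment rule given for arg 4 can be written as a small predicate. This is a sketch of the stated condition only; the function name is hypothetical and the coordinates in the usage note are made up for illustration.

```python
def is_within_trr(sv_begin, sv_end, trr_begin, trr_end, relaxation=0):
    """Return True when the SV lies inside the TRR, allowing `relaxation`
    bases of slack on each side, as described for arg 4."""
    return (sv_begin >= trr_begin - relaxation) and (sv_end <= trr_end + relaxation)
```

With a TRR spanning 1000 to 1100 and a relaxation of 5, an SV spanning 996 to 1103 counts as within the TRR, while an SV spanning 990 to 1050 does not, because its begin site falls more than 5 bases before the TRR start.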

Pre-clustering SVs.

arg 0: Name of the step to execute.

arg 1: File name containing SV and TRR overlaps, i.e. the output file name from the previous step.

arg 2: Output file name for this step.

arg 3: Merging overlap percentage. Note: If different merging overlap percentages are used for SVs within and outside TRRs, use the smaller percentage in this step.

arg 4: Boolean flag for using the tandem repeat coordinates in SV pre-clustering and merging. A value of 1 carries each SV within a TRR to the start site of its TRR; a value of 0 keeps the original SV sites.

python main.py PRE_CLUSTER ./test_data/toy_SV_data.csv.trr_overlap ./test_data/toy_SV_data.csv.precluster 50 1
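One possible reading of the arg 4 flag is sketched below. It assumes that carrying an SV to the TRR start site means shifting the SV so that its begin site equals the TRR begin site while its length is preserved. That interpretation, the function name, and the parameter names are assumptions and may differ from what main.py actually does.

```python
def clustering_coords(sv_begin, sv_end, trr_begin=None, use_trr=1):
    """Return the (begin, end) pair used for pre-clustering.

    Assumption: when use_trr is 1 and the SV lies within a TRR, the SV is
    shifted to start at the TRR begin site and keeps its original length.
    SVs outside any TRR (trr_begin is None) keep their original sites.
    """
    if use_trr and trr_begin is not None:
        length = sv_end - sv_begin
        return trr_begin, trr_begin + length
    return sv_begin, sv_end
```

Under this reading, two equally long SVs reported at different offsets inside the same TRR receive identical coordinates, so they overlap fully during merging.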

Merging SVs.

arg 0: Name of the step to execute.

arg 1: File name containing pre-clustered SVs, i.e. output file name from the previous step.

arg 2: Output file name for merged SVs within TRRs.

arg 3: Output file name for merged SVs outside TRRs.

arg 4: SV type, e.g. DEL, INS.

arg 5: Overlap percentage for SVs within TRRs.

arg 6: Overlap percentage for SVs outside TRRs.

python main.py FIND_CLIQUES ./test_data/toy_SV_data.csv.precluster ./test_data/toy_SV_data.csv.intrr.merged.csv ./test_data/toy_SV_data.csv.outtrr.merged.csv DEL 85 50
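The three in-depth steps feed each other through their output files. The sketch below builds the three command lines in order, using the file suffixes from the examples above. The helper name and its defaults are hypothetical, and it only constructs the argument lists without running anything.

```python
def build_pipeline(sv_file, trf_file, sv_type, relaxation=5,
                   in_trr_pct=85, out_trr_pct=50, use_trr=1):
    """Return the FIND_TRR_OVERLAPS, PRE_CLUSTER and FIND_CLIQUES commands
    as argument lists, wiring each step's output into the next step."""
    overlap_file = sv_file + ".trr_overlap"
    precluster_file = sv_file + ".precluster"
    in_trr_out = sv_file + ".intrr.merged.csv"
    out_trr_out = sv_file + ".outtrr.merged.csv"
    # PRE_CLUSTER takes the smaller of the two merging percentages.
    precluster_pct = min(in_trr_pct, out_trr_pct)
    return [
        ["python", "main.py", "FIND_TRR_OVERLAPS", sv_file, trf_file,
         overlap_file, str(relaxation)],
        ["python", "main.py", "PRE_CLUSTER", overlap_file, precluster_file,
         str(precluster_pct), str(use_trr)],
        ["python", "main.py", "FIND_CLIQUES", precluster_file, in_trr_out,
         out_trr_out, sv_type, str(in_trr_pct), str(out_trr_pct)],
    ]
```

Each list can be passed to subprocess.check_call from the repository root, so that a failing step stops the run before the next one starts.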

Columns for the input file containing the SVs

0: chromosome

1: begin site

2: end site (use begin site + SV length for both insertions and deletions)

3: SV id (unique identifier for the SVs)

4: Sample id (unique identifier for the sample the given SV is found in)

5: Method/algorithm that found the SV

6: SV type (e.g. DEL, INS)

7: SV length
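The column layout above can be checked before running the tool. The sketch below parses one input line and verifies the rule that the end site equals the begin site plus the SV length. It assumes there is no header line, that coordinates are integers, and that the SV length is stored as a non-negative number. The record type and function name are hypothetical.

```python
from collections import namedtuple

# Field names mirror columns 0-7 of the input file.
SVRecord = namedtuple(
    "SVRecord",
    ["chrom", "begin", "end", "sv_id", "sample_id", "method", "sv_type", "length"],
)


def parse_sv_line(line):
    """Parse one tab-separated input line into an SVRecord."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected 8 tab-separated columns, got %d" % len(fields))
    chrom, begin, end, sv_id, sample_id, method, sv_type, length = fields[:8]
    record = SVRecord(chrom, int(begin), int(end), sv_id, sample_id,
                      method, sv_type, int(length))
    # Column 2 should be begin site + SV length for insertions and deletions.
    if record.end != record.begin + record.length:
        raise ValueError("end site != begin site + SV length for %s" % sv_id)
    return record
```

For example, a made-up line such as "chr21\t1000\t1050\tsv1\tsampleA\tcallerX\tDEL\t50" parses cleanly, while the same line with an end site of 1060 raises a ValueError.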

Output

The columns are as follows:

0: SV ids

1: The final clique id for the SV id given in column 0.

For each clique id, a representative SV can be chosen when the clique contains more than one SV. One approach is to pick an SV with the most frequent begin, end, or begin-and-end coordinate among the SVs that share the clique id.
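The begin-and-end variant of this approach can be sketched as follows. The sketch assumes the output has been loaded into a mapping from SV id to clique id and the input into a mapping from SV id to (begin, end). The function name, both mappings, and the tie-breaking rule (smallest coordinates, then smallest SV id) are choices made here for illustration.

```python
from collections import Counter, defaultdict


def pick_representatives(clique_of, coords_of):
    """Return {clique_id: sv_id}, choosing for each clique an SV whose
    (begin, end) pair is the most frequent among the clique's members."""
    members = defaultdict(list)
    for sv_id, clique_id in clique_of.items():
        members[clique_id].append(sv_id)

    representatives = {}
    for clique_id, sv_ids in members.items():
        counts = Counter(coords_of[sv_id] for sv_id in sv_ids)
        # Highest count wins; ties go to the smallest coordinates.
        best_coords = min(counts, key=lambda c: (-counts[c], c))
        # Among SVs with the winning coordinates, take the smallest id.
        representatives[clique_id] = min(
            sv_id for sv_id in sv_ids if coords_of[sv_id] == best_coords
        )
    return representatives
```

With three SVs in one clique, two at (100, 150) and one at (102, 150), the representative is one of the two SVs at (100, 150).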

Contributors

dbeyter
