Giter Club home page Giter Club logo

parties's Introduction

ParTIES

PARamecium Toolbox for Interspersed DNA Elimination Studies

Description

We present the Paramecium Toolbox for Interspersed DNA Elimination Studies (ParTIES), designed for Paramecium species, that (i) identifies eliminated sequences, (ii) measures their presence in a sequencing sample and (iii) detects rare elimination polymorphisms.

For a full description of the software, its options and results look at the user manual ("user_manual.pdf")

Install

ParTIES requires some other programs. An installation example is provided at the end of the user manual, as well as in the "INSTALL" file. Once all the dependencies are installed, use the "check" file to make sure everything is ok.

 ./check

Usage

To run ParTIES use the following command

parties [MODE] : PARamecium Toolbox for Interspersed DNA Elimination Studies
 Run : Run ParTIES using the configuration file
 Map : Map reads on a reference using bowtie2
 MIRAA : Method of Identification by Read Alignment Anomalies
 MICA : Method of Identification by Comparison of Assemblies
 Insert : Insert IES within a genome to create an IES containing reference
 MIRET : Method of Ies RETention
 Assembly : Filter reads and assemble them
 MILORD : Method of Identification and Localization of Rare Deletion
 Compare : Compare IES/InDel datasets

License

This software is distributed under the GNU GPL v3 license. See the "LICENSE" file for details.

How to cite

If you use this software please cite the following publication

Denby Wilkes C, Arnaiz O, Sperling L. ParTIES : a toolbox for Paramecium interspersed DNA elimination studies. Bioinformatics. 2015 Nov 20. pii: btv691. [Epub ahead of print] PubMed PMID: 26589276.

Example

The example directory contains the following files :

File Description
scaffold51_1.fa The somatic reference for this example [a single somatic scaffold of the Paramecium tetraurelia genome]
Example_reads_1.fastq.gz Read file 1 (paired with read file 2), 100 nt-long reads
Example_reads_2.fastq.gz Read file 2 (paired with read file 1), 100 nt-long reads
example.cfg Configuration file used to gives all needed options to run PARTIES with the All module (see below)

The following command will run the entire pipeline (based on the config file), generating results in the $OUT directory. You can comment lines in the configuration file by adding "#" at the begining of a line.

OUT=Test1
gunzip example/Example_reads_1.fastq.gz
gunzip example/Example_reads_2.fastq.gz
parties Run -genome example/scaffold51_1.fa -out_dir $OUT -config example/example.cfg

Before running the pipeline, check the configuration file ("example/example.cfg") to set the number of threads.

You can also run each step independently, specifying the intermediate result files on the command line.

The map module will align the reads on the reference.

OUT=Test2
parties Map -genome example/scaffold51_1.fa -out_dir $OUT \
 -fastq1 example/Example_reads_1.fastq -fastq2 example/Example_reads_2.fastq \
 -max_insert_size 500 -index_genome -threads 4 

The MIRAA module searches for breakpoints in an alignment file.

parties MIRAA -genome example/scaffold51_1.fa -out_dir $OUT \
 -bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
 -min_break_coverage 5 -threads 4 

The Assembly module filters sequencing reads and assemble them into contigs. Three different assemblies are created.

parties Assembly -genome example/scaffold51_1.fa -out_dir $OUT \
 -bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
 -miraa $OUT/MIRAA/MIRAA.gff3 \
 -fastq1 example/Example_reads_1.fastq -fastq2 example/Example_reads_2.fastq \
 -insert_size 300 -kmer 51 -threads 4 

The MICA module computes comparisons between genomes, looking for insertions in germline genomes.

parties MICA -genome example/scaffold51_1.fa -out_dir $OUT \
 -bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
 -miraa $OUT/MIRAA/MIRAA.gff3 \
 -germline_genome $OUT/Assembly/VELVET_51_at_least_one_no_match/VELVET_51_at_least_one_no_match_contigs.fa \
 -germline_genome $OUT/Assembly/VELVET_51_no_filter/VELVET_51_no_filter_contigs.fa \
 -germline_genome $OUT/Assembly/VELVET_51_no_mac_junctions/VELVET_51_no_mac_junctions_contigs.fa \
 -insert_size 300 -threads 4 

The Insert module creates an IES containing reference

parties Insert -genome example/scaffold51_1.fa -out_dir $OUT \
 -ies $OUT/MICA/MICA.gff3 -suffix _with_IES \
 -threads 4 

We use the Map module once again to align the reads on the IES containing reference.

parties Map -genome $OUT/Insert/Insert.fa -out_dir $OUT \
 -fastq1 example/Example_reads_1.fastq -fastq2 example/Example_reads_2.fastq \
 -max_insert_size 500 -index_genome -threads 4 -force

The MIRET module calculates precisely the level of retention of each IES in a sample.

parties MIRET -genome example/scaffold51_1.fa -out_dir $OUT \
 -germline_genome $OUT/Insert/Insert.fa \
 -bam $OUT/Map/$OUT.scaffold51_1.fa.BOWTIE.sorted.bam \
 -germline_bam $OUT/Map/$OUT.Insert.fa.BOWTIE.sorted.bam \
 -ies $OUT/MICA/MICA.gff3 \
 -germline_ies $OUT/Insert/Insert.gff3 \
 -score_method Boundaries -threads 4 

The MILORD module searches for rare deletions in sequencing reads compared to a reference.

When run on a germline genome, we do expect to see deletions that correspond to somatic reads.

parties MILORD -genome $OUT/Insert/Insert.fa -out_dir $OUT \
 -bam $OUT/Map/$OUT.Insert.fa.BOWTIE.sorted.bam \
 -ies $OUT/Insert/Insert.gff3 \
 -threads 4 

The Compare module allows coordinate-based comparisons between elements (MICA and/or MILORD results)

parties Compare -genome $OUT/Insert/Insert.fa -out_dir $OUT \
 -reference_set $OUT/Insert/Insert.gff3 \
 -current_set $OUT/MILORD/MILORD.gff3 \
 -threads 4 

parties's People

Contributors

oarnaiz avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Forkers

gdelevoye

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.