decoil-pre's Introduction

Decoil

Decoil (deconvolve extrachromosomal circular DNA isoforms from long-read data) is a software package for reconstructing circular DNA.

Getting started using docker or singularity

As a prerequisite, you need to have docker or singularity installed (from the official website or using conda).

1.1 Download as docker image

Download the Decoil docker image from Docker Hub. The image contains the full environment, including all packages and dependencies needed to run the software, so no additional installation is required.

# docker
docker pull madagiurgiu25/decoil:1.1.2-slim

1.2 Download as singularity image

# singularity
singularity pull decoil.sif  docker://madagiurgiu25/decoil:1.1.2-slim

2. Run example using docker or singularity

To test your installation, check the Example.


3. Run Decoil reconstruction using docker or singularity

To run Decoil on your data you need to configure the following parameters:

# run decoil with your input with standard parameters
BAM_INPUT="<absolute path to your BAM file>"
OUTPUT_FOLDER="<absolute path to your output folder>"
NAME="<sample name>"
GENOME="<absolute path to your reference genome file>"
ANNO="<absolute path to your gtf annotation file>"

and then run the following command:

# docker
docker run -it --platform=linux/amd64 \
    -v ${BAM_INPUT}:/data/input.bam \
    -v ${BAM_INPUT}.bai:/data/input.bam.bai \
    -v ${GENOME}:/annotation/reference.fa \
    -v ${ANNO}:/annotation/anno.gtf \
    -v ${OUTPUT_FOLDER}:/mnt \
    -t madagiurgiu25/decoil:1.1.2-slim \
    decoil-pipeline sv-reconstruct \
            -b /data/input.bam \
            -r /annotation/reference.fa \
            -g /annotation/anno.gtf \
            -o /mnt --name ${NAME}
# singularity
mkdir -p ${OUTPUT_FOLDER}
mkdir -p ${OUTPUT_FOLDER}/logs
mkdir -p ${OUTPUT_FOLDER}/tmp
singularity run \
    --bind ${OUTPUT_FOLDER}/logs:/mnt/logs \
    --bind ${OUTPUT_FOLDER}/tmp:/tmp \
    --bind ${BAM_INPUT}:/data/input.bam \
    --bind ${BAM_INPUT}.bai:/data/input.bam.bai \
    --bind ${GENOME}:/annotation/reference.fa \
    --bind ${ANNO}:/annotation/anno.gtf \
    --bind ${OUTPUT_FOLDER}:/mnt \
    decoil.sif \
    decoil-pipeline sv-reconstruct \
            -b /data/input.bam \
            -r /annotation/reference.fa \
            -g /annotation/anno.gtf \
            -o /mnt --name ${NAME}
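
Both commands mount the BAM file together with its index (${BAM_INPUT}.bai) and expect absolute paths, so a missing index or a relative path may only surface once the container starts. The sketch below is a small pre-flight check. check_decoil_inputs is a hypothetical helper name, not part of Decoil.

```shell
# Hypothetical helper (not part of Decoil): check the files that the
# docker/singularity commands above mount into the container.
# Arguments: BAM, reference genome, GTF annotation, output folder.
check_decoil_inputs() {
    bam=$1; genome=$2; anno=$3; outdir=$4
    status=0
    # The BAM index is expected next to the BAM file as <bam>.bai
    for f in "$bam" "$bam.bai" "$genome" "$anno"; do
        case $f in
            /*) ;;
            *) echo "not an absolute path: $f" >&2; status=1 ;;
        esac
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    # The output folder may not exist yet, so only its form is checked
    case $outdir in
        /*) ;;
        *) echo "not an absolute path: $outdir" >&2; status=1 ;;
    esac
    return $status
}
```

It could be called as check_decoil_inputs "$BAM_INPUT" "$GENOME" "$ANNO" "$OUTPUT_FOLDER" before either run command. A non-zero exit status means at least one problem was reported on stderr.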

Install Decoil from source

You can install the latest version of Decoil from the repository (git and conda/mamba required):

git clone https://github.com/madagiurgiu25/decoil-pre.git
cd  decoil-pre

# create conda environment
mamba env create -f environment.yml
conda activate envdecoil

python -m pip install -r requirements.txt
python setup.py build install

# add decoil in $PATH
ROOT=`dirname $(which decoil)`
export PATH=$PATH:$ROOT

And check if the installation worked:

# might take a while
decoil-pipeline --version
decoil --version
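
If either command is not found, the install location is probably not on $PATH. The following is a minimal sketch that reports which entry points are missing. check_on_path is a hypothetical helper, and the tool names passed to it are simply the two commands used above.

```shell
# Hypothetical helper (not part of Decoil): report which of the given
# commands cannot be resolved on the current $PATH.
check_on_path() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, check_on_path decoil-pipeline decoil returns 0 only when both commands resolve.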


Decoil configurations

An overview of the available functionality:

                 decoil-pipeline   decoil             decoil-viz
                 (recommended)     (advanced users)   (recommended)
SV calling       x
coverage track   x
reconstruction   x                 x
visualization                                         x
docker           x                 x                  x
singularity      x                 x                  x


1. Reconstruct ecDNA using decoil-pipeline (recommended)

To reconstruct ecDNA, we recommend using decoil-pipeline in sv-reconstruct mode.
This requires only a .bam file as input and internally generates all the files required for the reconstruction.

# call help
docker run -it --platform=linux/amd64 -t madagiurgiu25/decoil:1.1.2-slim decoil-pipeline --help

usage: decoil-pipeline <workflow> <parameters> [<target>]
Example: 
    # run decoil including the processing and visualization steps
    decoil-pipeline -f sv-recontruct --bam <input> --outputdir <outputdir> --name <sample> --sv-caller <sniffles> -r <reference-genome> -g <annotation-gtf>
        

Decoil 1.1.2: reconstruct ecDNA from long-read data

positional arguments:
  {sv-only,sv-reconstruct,reconstruct-only}
                        sub-command help
    sv-only             Perform preprocessing
    sv-reconstruct      Perform preprocessing and reconstruction

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -n, --dry-run
  -f, --force
  -c, --use-conda

The pipeline has the following running modes:

  • sv-only
  • sv-reconstruct
  • reconstruct-only


2. Visualization of ecDNA threads using decoil-viz (recommended)

To interpret and visualize the results of the ecDNA reconstruction threads, use decoil-viz.


3. Reconstruct ecDNA using decoil (advanced users only)

This configuration is the most flexible and allows users to use their own SV calls. For details go here.


FAQ

Check recommendations for filtering or debugging in the FAQ section.


File formats

The relevant output files for the users are:

  • reconstruct.bed - contains all genomic fragments composing all reconstructions
  • reconstruct.ecDNA.bed - contains all genomic fragments composing all the ecDNA labeled reconstructions
  • summary.txt - summarizes all the circular reconstructions

cat reconstruct.bed

#chr    start   end     circ_id fragment_id     strand  coverage        estimated_proportions
chr2    15585356        15633376        0       5       +       149     75
chr3    11150000        11160001        0       41      -       103     75
chr3    11049997        11060001        0       33      +       117     75
chr2    15585356        15633376        3       5       +       149     36
chr3    11150000        11160001        3       41      -       103     36
chr3    11049997        11060001        3       33      +       117     36
chr2    15585356        15633376        3       5       +       149     36
chr2    16521052        16628305        3       13      +       37      36
chr3    10981202        11028470        3       25      -       31      36
chr12   68807722        68970910        2       53      +       252     252

cat summary.txt

circ_id chr_origin      size(MB)        label   topology_idx    topology_name   estimated_proportions
0       chr3,chr2       0.068025                4       multi_region_inter_chr  75
3       chr3,chr2       0.270566        ecDNA   5       simple_duplications     36
2       chr12           0.163188        ecDNA   0       simple_circle           252
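
In the example above, the size(MB) column of summary.txt equals the summed fragment lengths (end - start) of each circ_id in reconstruct.bed, divided by 1,000,000. Fragments that occur twice in a circle are counted twice, as for fragment 5 in circle 3. The sketch below reproduces that sum and also lists ecDNA-labeled circles above a chosen estimated_proportions value. circle_sizes and ecdna_above are hypothetical helper names. The second function assumes summary.txt is tab-delimited, with an empty label field for unlabeled circles.

```shell
# Hypothetical helpers (not part of Decoil) for the two output tables.

# Sum fragment lengths per circ_id (column 4) in reconstruct.bed and
# print "circ_id<TAB>size in Mb", sorted by circ_id.
circle_sizes() {
    awk '!/^#/ && NF >= 4 { size[$4] += $3 - $2 }
         END { for (id in size) printf "%s\t%.6f\n", id, size[id] / 1000000 }' "$1" |
        sort -k1,1n
}

# Print circ_id, size(MB) and estimated_proportions for rows of
# summary.txt labeled "ecDNA" with estimated_proportions >= $2.
# Assumes tab-delimited columns in the order shown above.
ecdna_above() {
    awk -F'\t' -v min="$2" \
        'NR > 1 && $4 == "ecDNA" && $7 + 0 >= min + 0 { print $1 "\t" $3 "\t" $7 }' "$1"
}
```

On the example files, circle_sizes reconstruct.bed yields 0.068025, 0.163188 and 0.270566 for circles 0, 2 and 3, and ecdna_above summary.txt 100 keeps only circle 2. The threshold of 100 is an arbitrary illustration, not a recommended cutoff.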

Citation

If you use Decoil for your work please cite our paper:

Madalina Giurgiu, Nadine Wittstruck, Elias Rodriguez-Fos, Rocio Chamorro Gonzalez, Lotte Bruckner, Annabell Krienelke-Szymansky, Konstantin Helmsauer, Anne Hartebrodt, Philipp Euskirchen, Richard P. Koche, Kerstin Haase*, Knut Reinert*, Anton G. Henssen*. Reconstructing extrachromosomal DNA structural heterogeneity from long-read sequencing data using Decoil. Genome Research 2024, DOI: https://doi.org/10.1101/gr.279123.124

@article{Giurgiu2024ReconstructingDecoil,
    title = {{Reconstructing extrachromosomal DNA structural heterogeneity from long-read sequencing data using Decoil}},
    year = {2024},
    journal = {Genome Research},
    author = {Giurgiu, Madalina and Wittstruck, Nadine and Rodriguez-Fos, Elias and Chamorro Gonzalez, Rocio and Brueckner, Lotte and Krienelke-Szymansky, Annabell and Helmsauer, Konstantin and Hartebrodt, Anne and Euskirchen, Philipp and Koche, Richard P. and Haase, Kerstin and Reinert, Knut and Henssen, Anton G.},
    month = {8},
    pages = {gr.279123.124},
    doi = {10.1101/gr.279123.124},
    issn = {1088-9051}
}

Paper repository: https://github.com/henssen-lab/decoil-paper

License

Decoil is distributed under the BSD 3-Clause license. Consult the accompanying LICENSE file for more details.

Disclaimer

Decoil and the content of this research-repository (i) is not suitable for a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.

decoil-pre's People

Contributors

madagiurgiu25

decoil-pre's Issues

Stuck at "Combine simple circles" step while running Decoil

Hi! I'm currently facing an issue while running Decoil: after going through various samples, I've encountered a persistent problem with one specific sample. The process seems to be stuck at a particular step, and there has been no new output for over a day.

Here are the details:
Operating System: Windows 11, Docker Desktop.
Command Used:

# run decoil with your input with standard parameters
$BAM_INPUT="./GBM5.md.bam"
$OUTPUT_FOLDER="./output"
$NAME="GBM5"
$GENOME="./reference.fa"
$ANNO="./anno.gtf"

# run decoil in `sv-reconstruct` mode
docker run -it --platform=linux/amd64 `
    -v ${PWD}/${BAM_INPUT}:/data/input.bam `
    -v ${PWD}/${BAM_INPUT}.bai:/data/input.bam.bai `
    -v ${PWD}/${GENOME}:/annotation/reference.fa `
    -v ${PWD}/${ANNO}:/annotation/anno.gtf `
    -v ${PWD}/${OUTPUT_FOLDER}:/output `
    -t madagiurgiu25/decoil:1.1.1-slim-test `
    decoil -f sv-reconstruct `
            -b /data/input.bam `
            -r /annotation/reference.fa `
            -g /annotation/anno.gtf `
            -o /output -n ${NAME}

Output on command:
image

Output file:
image

The issue arises with only this specific sample; all other samples are processed correctly. Thank you for your time and assistance.

No summary.txt file produced

I've managed to get decoil to run with my data and produce all the right files, but I am missing the summary.txt file.
This file was produced with the test data but not with my own data.
(I used singularity to work with the docker image as a sif file.)

The command I used was:

export BAM_INPUT="/path/to/${SAMPLE}_mod_aligned.bam"
export OUTPUT_FOLDER="/path/to/${SAMPLE}T"
export NAME="${SAMPLE}T"
export GENOME="path/to/reference.fa"
export ANNO="path/to/anno.gtf"

singularity run \
    -B $PWD:$HOME \
    -B ${BAM_INPUT}:/data/input.bam \
    -B ${BAM_INPUT}.bai:/data/input.bam.bai \
    -B ${GENOME}:/annotation/reference.fa \
    -B ${ANNO}:/annotation/anno.gtf \
    -B ${OUTPUT_FOLDER}:/output \
    decoil.sif decoil -f sv-reconstruct \
        -b /data/input.bam \
        -r /annotation/reference.fa \
        -g /annotation/anno.gtf \
        -o /output -n ${NAME}

Genes contained within ecDNA

Would there be a way to extract the genes that are contained within ecDNA? The decoil-viz html report shows protein-coding genes in the visualisations, but it would be helpful to have a list or table of these genes as part of the summary in the report or as part of the output of decoil.
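
One possible workaround is to intersect reconstruct.ecDNA.bed with the GTF annotation passed to Decoil. The sketch below uses only awk. genes_on_ecdna is a hypothetical helper, not a Decoil command. It assumes the BED columns shown in the File formats section, 0-based half-open BED coordinates, a non-empty BED file, and GTF gene records that carry a gene_name attribute.

```shell
# Hypothetical helper (not part of Decoil): list genes overlapping the
# fragments of each ecDNA-labeled circle.
# $1 = reconstruct.ecDNA.bed, $2 = GTF annotation
# Output: "circ_id<TAB>gene_name", one unique pair per line.
genes_on_ecdna() {
    awk -F'\t' '
        NR == FNR {                        # first file: BED fragments
            if ($0 ~ /^#/ || NF < 4) next
            n++; chr[n] = $1; s[n] = $2; e[n] = $3; circ[n] = $4
            next
        }
        $0 !~ /^#/ && $3 == "gene" {       # second file: GTF gene records
            name = "NA"
            if (match($9, /gene_name "[^"]+"/))
                name = substr($9, RSTART + 11, RLENGTH - 12)
            # GTF is 1-based inclusive, BED is assumed 0-based half-open
            for (i = 1; i <= n; i++)
                if ($1 == chr[i] && $4 - 1 < e[i] && $5 > s[i])
                    print circ[i] "\t" name
        }' "$1" "$2" | sort -u
}
```

Calling genes_on_ecdna reconstruct.ecDNA.bed anno.gtf > ecdna_genes.tsv would write the pairs to a table. Genes that only partially overlap a fragment are included.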

ecDNA Filtering

Hi,

I have a question about potential filtering steps for the decoil output.

I used Decoil on a sample and found 43 ecDNA, which was more than I was expecting. I also ran CoRAL on the same long-read sequencing dataset and found 3 ecDNA. AmpliconArchitect, run on short-read sequencing of this sample, found 3 ecDNA as well.

I was curious if there are any parameters that you would recommend adjusting or potentially filtering steps that might be helpful.

I used the decoil-pipeline via Docker as outlined in the readme section. I haven't adjusted any of the optional arguments yet. I'm happy to provide additional information if that would be helpful.

Thank you so much!

parameters

I wasn't completely clear about all the parameters in the readme:
--sv-caller and --dry-run were not accepted.
Also, --dry-run (-n) and --name (-n) share the same short flag, which wouldn't work?

adjusting decoil reconstruct parameters

Similar to #11, I'd like to make my decoil ecDNA calls more conservative. But most of my samples have ~25-30x coverage and I don't want to downsample them and lose information unnecessarily.

With default parameters decoil is calling ecDNA in all my samples, which is unlikely. Visually inspecting these regions confirms that some are high-confidence and "look" like ecDNA, while others do not have strong evidence.

I have tried increasing --min-vaf to 0.1, which did reduce the number of ecDNA slightly.

Given your samples were ~5-10x, how should I adjust the parameters accordingly?

What is the difference between --min-cov and --fragment-min-cov?

Does --min-cov-alt refer to the number of reads spanning junctions?

By default, does --filter-score remove cycles with estimated copy number/proportion < WGS coverage, and should I consider increasing this number?

visualising the output of decoil

If I have this output from decoil, which files can I use to visualise any potential ecDNA? The manuscript mentions using gGnome.
clean.vcf
config.json
coverage.bw
fragments_clean_1.bed
fragments_clean_2.bed
fragments_clean_3.bed
fragments_initial.bed
graph.gpickle
logs
metrics_frag_cov_distribution.png
metrics_frag_len_cov_correlation.png
metrics_frag_len_distribution.png
reconstruct.bed
reconstruct.bed_debug
reconstruct.ecDNA.bed
reconstruct.json
reconstruct.links.txt_debug
sv.sniffles.bedpe
sv.sniffles.vcf

Hope a command only for Decoil Reconstruction

The current version of Decoil supports three tasks: SV calling using Sniffles1, bigWig track generation using bamCoverage, and Decoil reconstruction. I would like to request a separate command for the third step, or a way to run the three steps individually. This way, I could run the third step directly when the files from the first two steps are already available, making parameter testing more convenient.
Additionally, I noticed the use of NanoFilt to filter Nanopore data in your pre-print. Is this step necessary?

T2T Reference Genome

Hi,

I was wondering if Decoil currently supports the T2T reference genome?

I used Decoil on the same sample with the GRCh38 and then the T2T reference genome. The GRCh38 reference gave the expected output of a few ecDNA structures, but the T2T reference output no structures.

I followed up with another published dataset of the cell line COLO320DM just to validate what I was seeing and got the same result.

I also ran bamCoverage and Sniffles on those samples, and the T2T and GRCh38 references showed similar coverage and SVs.

Maybe I'm doing something incorrectly, but I was wondering if you might have any thoughts on the issue?

Thank you!

An IsADirectoryError when generate summary.txt in the final step.

This tool is really useful for ecDNA analysis with ONT data, and it's easy to use.
However, I encountered an error when running it with the test data, as shown in the screenshot, and it seems to be an error in the source code rather than in my data.
Could you offer a simple solution for this problem? Thanks!

Screenshot 2023-12-05 10 49 57

By the way, would Decoil be available in conda or pip or python script in the future? It would be more convenient to use in the servers.
