
amycne's Introduction

AMYCNE

AMYCNE is a copy number estimation toolkit designed for WGS data. It contains modules for genotyping copy number, counting the number of chromosomes, annotating vcf files, and calling CNVs. AMYCNE requires coverage tab files as input; these files may be produced using TIDDIT.

NOTE: The variant calling module is in a development stage, use it at your own risk!

Installation

AMYCNE requires bottleneck, scipy and numpy. These three packages may be installed using pip:

pip install numpy
pip install scipy
pip install bottleneck

AMYCNE has been tested on python 2.7.11, but might run on older versions of python as well. To improve performance, the AMYCNE code may be compiled using cython:

python setup.py build_ext --inplace

Run

Type the following for a list of modules:

python AMYCNE.py

The following section describes the basic commands for running AMYCNE. For more info, use the --help flag for each module. Each module requires a coverage file and a gc content file with the same bin size.

Genotype: estimate the copy number in one or more target regions
  Use the genotype module by typing:
    python AMYCNE.py --genotype
    
  Use the following command to genotype a specified region (chr1:100-10000):
  python AMYCNE.py --genotype --gc GC_tab_file.tab --coverage coverage_tab_file.tab --R chr1:100-10000
  
  To genotype all coverage files in a folder, type the following command:
  python AMYCNE.py --genotype --gc GC_tab_file.tab --folder /path/to/folder --R chr1:100-10000
  
  Multiple regions can be genotyped using a region text file instead of the --R flag:
  python AMYCNE.py --genotype --gc GC_tab_file.tab --coverage coverage_tab_file.tab --region region.txt
  python AMYCNE.py --genotype --gc GC_tab_file.tab --folder /path/to/folder --region region.txt
  
The region file consists of operations. Each line within the region text file describes one operation. The supported operations are sum (sum) and average (avg). The operations are written in the following format:

sum(1:104198143-104207173|1:104230040-104238912|1:104292279-104301311)
avg(1:104198143-104207173|1:104230040-104238912|1:104292279-104301311)

Regions within an operation are separated by |. An operation may contain any number of regions (at least one), and a region file may contain any number of operations.
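The format above can be read with a few lines of Python. The following is a minimal sketch for illustration only: `parse_operation`, `OPERATION_RE` and `REGION_RE` are hypothetical names and are not part of AMYCNE, whose own parser may behave differently.

```python
import re

# Hypothetical helpers for illustration; not AMYCNE's own parser.
OPERATION_RE = re.compile(r"^(sum|avg)\((.+)\)$")
# The greedy chromosome group keeps contig names that contain ':' intact.
REGION_RE = re.compile(r"^(.+):(\d+)-(\d+)$")


def parse_operation(line):
    """Parse one region-file line into (mode, [(chrom, start, end), ...])."""
    match = OPERATION_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sum(...) or avg(...) operation: %r" % line)
    mode, body = match.groups()
    regions = []
    for token in body.split("|"):
        region = REGION_RE.match(token.strip())
        if region is None:
            raise ValueError("malformed region: %r" % token)
        chrom, start, end = region.groups()
        regions.append((chrom, int(start), int(end)))
    return mode, regions


mode, regions = parse_operation(
    "sum(1:104198143-104207173|1:104230040-104238912|1:104292279-104301311)"
)
print("%s over %d regions" % (mode, len(regions)))
```

Each parsed region is a (chromosome, start, end) tuple, so a malformed line is rejected before any analysis is started.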

Annotate: estimate the copy number across structural variants stored in a vcf file
    Use the annotate module by typing:
      python AMYCNE.py --anotate
      
    The annotate module requires a coverage file, a gc content file, as well as the structural variant vcf:
     python AMYCNE.py --anotate --gc gc_content_file.tab --coverage coverage.tab --vcf sv.vcf > annotated.sv.vcf

Call: Perform CNV calling
  Use the call module by typing:
      python AMYCNE.py --call
      
  The call module requires a coverage file and a gc content file, and writes a vcf file to the path given by --output:
     python AMYCNE.py --call --gc gc_content_file.tab --coverage coverage.tab --output out.vcf

Generate GC content file

The Generate_GC_tab.py script may be used to generate gc content files:

python Generate_GC_tab.py --fa reference.fa --size bin_size > gc_content.tab

Note that AMYCNE requires the coverage file to have the same bin size.
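To make the idea of a binned gc content file concrete, here is a minimal sketch that computes the GC fraction of consecutive bins of one sequence. It is not the implementation of Generate_GC_tab.py: the function name `gc_content_bins`, the rounding to two decimals, and the use of -1.0 for bins without any A/C/G/T bases are all assumptions made for illustration.

```python
def gc_content_bins(sequence, bin_size):
    """Yield (start, end, gc_fraction) for consecutive bins of one sequence."""
    for start in range(0, len(sequence), bin_size):
        window = sequence[start:start + bin_size].upper()
        gc = window.count("G") + window.count("C")
        called = gc + window.count("A") + window.count("T")
        # Assumption: bins without any A/C/G/T bases get -1.0 as a sentinel.
        fraction = round(gc / float(called), 2) if called else -1.0
        yield start, start + len(window), fraction


# Toy sequence; a real run would read each contig from the reference fasta.
for start, end, fraction in gc_content_bins("NNNNGGCCATAT", 4):
    print("chr1\t%d\t%d\t%s" % (start, end, fraction))
```

The last bin of a sequence may be shorter than `bin_size`; the sketch reports its true end coordinate.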

Generate coverage Files

Coverage tab files may be generated using TIDDIT. The files should be given in the following format:

#chromosome start end coverage quality
chr1 0 100 23 10
chr1 100 200 23 10
chr1 200 300 23 10
chr2 0 100 23 10
chr2 100 200 23 10
chr2 200 300 23 10
chrX 0 100 23 10
chrY 0 100 23 10

The quality column and the header are both optional; the header is not read by the software. The bins need to cover the entire genome.
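A coverage file produced by another tool can be sanity-checked before it is passed to AMYCNE. The sketch below is an assumption-laden illustration, not part of AMYCNE: `check_coverage_bins` is a hypothetical helper that verifies that each chromosome is tiled from position 0 without gaps or overlaps, and it takes the bin size from the first data line. It cannot confirm that the bins reach the end of each chromosome, since that would require the chromosome lengths.

```python
def check_coverage_bins(lines):
    """Return the bin size if every chromosome is tiled from 0 without gaps."""
    bin_size = None
    next_start = {}  # chromosome -> expected start of its next bin
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the optional header and blank lines
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if bin_size is None:
            bin_size = end - start  # assumption: the first bin is full-sized
        if start != next_start.get(chrom, 0) or start % bin_size != 0:
            raise ValueError("gap or overlap at %s:%d" % (chrom, start))
        if end - start > bin_size:
            raise ValueError("oversized bin at %s:%d" % (chrom, start))
        next_start[chrom] = end
    return bin_size


example = """#chromosome start end coverage quality
chr1 0 100 23 10
chr1 100 200 23 10
chr2 0 100 23 10
""".splitlines()
print(check_coverage_bins(example))
```

Bins shorter than the bin size are accepted so that the final bin of a chromosome does not trigger an error.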

amycne's Issues

Calculators FFY and FFX

Hi
I have looked at the code to calculate FFY and FFX, but I don't quite understand how to calculate it. Can you describe in detail how it is calculated?

UnboundLocalError: local variable 'region' referenced before assignment

Trying to use the program returns the error in the title. The cause appears to be in genotype.py at line 40:

def main(Data,GC_hist,args):
    #get the coverage across the region

    coverage_list=[]
    len_list=[]
    gc_list=[]
    ref_list=[]
    total_bin_list=[]
    bin_count = 0
    used_bin_count = 0
    
    operations=[]
    if args.region:
        for line in open(args.region):
            mode,regions = retrieve_regions(line)
            operations.append({"mode": mode, "regions":regions,"command":line.strip()})
    else:
    
        chromosome=region.split(":")[0]

The chromosome=region.split(":")[0] line fails because it calls region (which has not been defined yet) rather than args.region as it's likely meant to.

Edit: It should be args.R not args.region.
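A minimal sketch of the fix suggested in the edit above, assuming the value of the --R flag is stored as args.R. The function `first_chromosome` is a hypothetical stand-in for the failing branch and is not the actual code in genotype.py.

```python
from argparse import Namespace


def first_chromosome(args):
    """Return the chromosome of the single region passed with --R."""
    # Reading the region from args.R avoids the undefined local name `region`.
    region = args.R
    return region.split(":")[0]


# Namespace mimics the parsed command line when --region is not given.
print(first_chromosome(Namespace(region=None, R="chr1:100-10000")))
```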

Just tried AMYCNE, stack trace after a while

I get this stack output:

I mapped data to grch38 (the full set, including alt contigs etc.)
I used sambamba depth to generate the coverage file and your GC script for the GC tsv file.

example input:

$head ref_gc.tsv 
1       0       100     -1.0
1       100     200     -1.0
1       200     300     -1.0
1       300     400     -1.0
1       400     500     -1.0
1       500     600     -1.0
1       600     700     -1.0
1       700     800     -1.0
1       800     900     -1.0
1       900     1000    -1.0
$ tail ref_gc.tsv 
HLA-DRB1*16:02:01       10100   10200   0.4
HLA-DRB1*16:02:01       10200   10300   0.4
HLA-DRB1*16:02:01       10300   10400   0.52
HLA-DRB1*16:02:01       10400   10500   0.53
HLA-DRB1*16:02:01       10500   10600   0.38
HLA-DRB1*16:02:01       10600   10700   0.52
HLA-DRB1*16:02:01       10700   10800   0.45
HLA-DRB1*16:02:01       10800   10900   0.49
HLA-DRB1*16:02:01       10900   11000   0.55
HLA-DRB1*16:02:01       11000   11005   0.2

$ head Y2_6pg_13_cycles.md.even_cov.cov 
# chrom    chromStart      chromEnd        readCount       meanCoverage    sampleName
1       16400   16500   1       0.7     Y2_6pg_13_cycles
1       16500   16600   2       0.63    Y2_6pg_13_cycles
1       16600   16700   1       0.17    Y2_6pg_13_cycles
1       16700   16800   0       0       Y2_6pg_13_cycles
1       16800   16900   0       0       Y2_6pg_13_cycles
1       16900   17000   0       0       Y2_6pg_13_cycles
1       17000   17100   0       0       Y2_6pg_13_cycles
1       17100   17200   0       0       Y2_6pg_13_cycles
1       17200   17300   0       0       Y2_6pg_13_cycles
$ tail Y2_6pg_13_cycles.md.even_cov.cov 
HLA-DRB1*16:02:01       10000   10100   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10100   10200   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10200   10300   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10300   10400   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10400   10500   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10500   10600   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10600   10700   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10700   10800   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10800   10900   0       0       Y2_6pg_13_cycles
HLA-DRB1*16:02:01       10900   11000   0       0       Y2_6pg_13_cycles
Command output:
  finished reading the coverage data
  applying filters
  computing coverage histogram

Command error:
  /mnt/flash_scratch/nextflow_conda/env-4731703d9d1e13495b9cb3da6a63e29e/lib/python2.7/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
    out=out, **kwargs)
  /mnt/flash_scratch/nextflow_conda/env-4731703d9d1e13495b9cb3da6a63e29e/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
    ret = ret.dtype.type(ret / rcount)
  Traceback (most recent call last):
    File "...bin/AMYCNE.py", line 164, in <module>
      call.main(Data,GC_hist,args)
    File "call.py", line 338, in call.main
      ratio_hist=chromosome_hist(Data,args.Q)
    File "call.py", line 156, in call.chromosome_hist
      for chromosome in Data["chromosomes"]:
  TypeError: list indices must be integers, not str

Predicting fetal fraction

Hello,
How do I predict the fetal fraction using python AMYCNE.py --ff? Which files should I use for predicting the fetal fraction? Please let me know.

Problems with reproduction of AMY1 test

Hello,

I've been trying to execute the following line:

python AMYCNE.py --genotype --gc ref_gc_cont.tab --coverage jap_hum_cov.tab --region AMY1.txt --Q 0

I'm using the AMY1.txt file provided by the documentation, and the gc and coverage tables were generated with TIDDIT and Generate_GC_tab.py. The gc table was generated using the human reference genome ch38. The coverage tab was generated from the file HGDP00772.alt_bwamem_GRCh38DH.20181023.Japanese.cram collected from HGDP, converted to .bam with samtools.

I receive the following error message:

Error: Too many low quality regions! consider rerunning the analysis using a smaller --size_cutoff, and less strict regions masking

I tried using the optional argument --s_cutoff with the values 10, 1 and 0 and still got the same message.

Is there a value that you could recommend? Or a better .bam file to run this test?

Thank you in advance,

Luiza Gomes
