cleancall's People

Contributors

hyunminkang, mrflick

cleancall's Issues

Ref allele mismatch error and best practices

Dear Hyun,

Thank you very much for developing the CleanCall tool.
May I ask for your help in understanding the proper usage of the tool and in resolving the errors I have encountered?

  1. The ref alleles in the generated pileup files do not match those in the genome reference file, and this aborts cctools verify:
FATAL ERROR -
ref G do not match with

I see that a few percent of the positions do indeed have mismatches, for example c versus C, A versus AG, and so on. This happens mostly at low-coverage sites. The mismatches are, of course, unique to each sample. How can I overcome this?

  2. A general question: what is the best way to find possible cross-contamination among a large number of samples (N = 1000-2000)?
    Is it OK to build N pileups, one for each individual BAM file, with cctools pileup and then supply them to cctools verify via a PED file?
$ cat *ped
#FAM_ID IND_ID  FAT_ID  MOT_ID  SEX     MPU
24-0992 24-0992 0       0       0       tmp.pileup.24-0992.txt.gz
24-1155 24-1155 0       0       0       tmp.pileup.24-1155.txt.gz
24-1000 24-1000 0       0       0       tmp.pileup.24-1000.txt.gz

Will this detect contamination of sample A by sample B?

  3. Is it true that the same VCF file with variant frequencies (e.g., from HapMap) is used for pileup --loci, verify --vcf, and genotype --invcf?
    That is how it appears in the Quick Start usage info. I am not sure I understand where the input VCF for genotype correction comes from.
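The two kinds of mismatch quoted in the first question look different in nature: c versus C differs only in letter case, while A versus AG differs in allele length, as an indel allele would. Below is a minimal Python sketch for tallying the two kinds separately before deciding how to handle them. The function name and category labels are hypothetical, and it assumes the (pileup REF, genome REF) pairs have already been extracted, since the column layout of the cctools pileup files is not described here; it does not reproduce how cctools verify itself compares alleles.

```python
from collections import Counter


def classify_ref_mismatch(pileup_ref: str, genome_ref: str) -> str:
    """Label how a pileup REF allele differs from the genome REF allele.

    Hypothetical triage helper; not part of cctools.
    """
    if pileup_ref == genome_ref:
        return "match"
    if pileup_ref.upper() == genome_ref.upper():
        # Same bases, different letter case, e.g. c vs C.
        return "case-only"
    if len(pileup_ref) != len(genome_ref):
        # Different allele lengths, e.g. A vs AG (indel-style allele).
        return "length"
    # Same length, genuinely different bases.
    return "base"


if __name__ == "__main__":
    # Example pairs in the style of the mismatches described above.
    pairs = [("c", "C"), ("A", "AG"), ("G", "G")]
    print(Counter(classify_ref_mismatch(p, g) for p, g in pairs))
```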

Many thanks in advance,
Vasily.
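The index file in the second question has one row per sample: the sample ID is repeated as FAM_ID and IND_ID, the parent and sex columns are 0, and MPU holds the path to that sample's pileup. With 1000 to 2000 samples it is easier to generate this file than to write it by hand. A minimal Python sketch follows; pileup_ped_lines is a hypothetical helper rather than part of cctools, and it assumes tab-separated columns and the tmp.pileup.<ID>.txt.gz naming shown in the example. It only builds the index file and says nothing about how verify uses it.

```python
from typing import Iterable, List

PED_HEADER = ["#FAM_ID", "IND_ID", "FAT_ID", "MOT_ID", "SEX", "MPU"]


def pileup_ped_lines(sample_ids: Iterable[str], prefix: str) -> List[str]:
    """Build tab-separated PED lines: a header plus one row per sample.

    FAM_ID and IND_ID are both the sample ID, FAT_ID/MOT_ID/SEX are 0,
    and MPU points at the per-sample pileup <prefix>.<ID>.txt.gz.
    """
    lines = ["\t".join(PED_HEADER)]
    for sid in sample_ids:
        lines.append("\t".join([sid, sid, "0", "0", "0", f"{prefix}.{sid}.txt.gz"]))
    return lines


if __name__ == "__main__":
    # The three samples from the example index above.
    for line in pileup_ped_lines(["24-0992", "24-1155", "24-1000"], "tmp.pileup"):
        print(line)
```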

Error when trying to install cleancall

I got the following message after running make. It seems that the hard-coded path /home/hmkang/code/working/cleancall/ is the problem.

CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /home/hmkang/code/working/cleancall/missing --run aclocal-1.11 -I m4
/bin/sh: 0: Can't open /home/hmkang/code/working/cleancall/missing
Makefile:270: recipe for target 'aclocal.m4' failed
make: *** [aclocal.m4] Error 127

Please let me know if it is an easy fix.

Strange symbols in "ALT" field

Hi, and thank you for answering my other question.
cctools genotype writes strange symbols into the ALT field of the VCF file.
An example BAM file, its index, and the output files can be found at https://cb.skoltech.ru/~enabieva/for_cleancall/ (the browser may complain about the certificate). An example of an offending line is line 80 (chr20:310376). The reference is ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
The VCF file is HapMap 3.3 from the GATK Resource Bundle (ftp://[email protected]/bundle/hg38/hapmap_3.3.hg38.vcf.gz).

Another issue I'm having: if I don't restrict genotype to a particular chromosome (chr20 in the example), it creates a Makefile target named just ".tbi" and then complains about it. I think this may be related to chromosome names being prefixed with "chr".

Problems with processing RNAseq data

Hi there, thank you for creating cleancall.
I have run into some problems when analyzing RNA-seq data.
The result file is as follows:
[Screenshot of the result file; image not available]

How can I get contamination results for RNA-seq data? I only want the contamination results; I don't need the correction.
Or is there a problem with the way I am running the tool?
'''
prod 11:48:21 R290-1: /mnt/GenePlus001/prod/maxx/software/contamination4/dbsnp
$ cat dbsnp_pileup.sh
/mnt/GenePlus001/prod/maxx/software/cleancall/bin/cctools pileup -loci /mnt/GenePlus001/prod/maxx/software/contamination3/dbsnp_138.b37.vcf.gz -index 180022992FR1 -out 180022992FRdbsnp_pileup -ref /products/repos/prod/Akso/BNC_v2/BNC/program/NoahCare/db/alignment/tgp_phase2_flat/hs37d5.fa --run 1 &
prod 11:49:17 R290-1: /mnt/GenePlus001/prod/maxx/software/contamination4/dbsnp
$ cat dbsnp_verify.sh
/mnt/GenePlus001/prod/maxx/software/cleancall/bin/cctools verify -index /mnt/GenePlus001/prod/maxx/software/contamination4/dbsnp/180022992FRdbsnp_pileup.ped -o 180022992FR_verify -vcf /mnt/GenePlus001/prod/maxx/software/contamination3/dbsnp_138.b37.vcf.gz --run 1
'''

Problem when running the Makefile

Hi there, thanks for creating cleancall.

Below I describe some problems I'm having with the tool:

I ran cctools pileup as described in the documentation, with -loci pointing to a VCF file and --index pointing to tabular_file_with_sampleID_bam_path.

I get the following error message:

Finished generating CAPT Makefile
Running 10 parallel jobs of CAPT
make -f t.pileup.Makefile -j 10
/bin/sh: 1: set: Illegal option -o pipefail
make: *** [t.pileup.sample1.txt.gz.tbi] Error 2
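The failing line, set: Illegal option -o pipefail, is reported by /bin/sh itself. GNU make runs recipe lines with /bin/sh unless SHELL is set, and the pipefail option is supported by bash but has historically been rejected by dash, which is /bin/sh on Debian and Ubuntu systems. A small Python sketch to check which shells on a machine accept the option; shell_supports_pipefail is a hypothetical helper name. If /bin/sh fails and bash passes, one possible workaround (not confirmed to be what the cctools documentation recommends) is to rerun the generated Makefile as make SHELL=/bin/bash -f t.pileup.Makefile -j 10.

```python
import shutil
import subprocess


def shell_supports_pipefail(shell: str = "/bin/sh") -> bool:
    """Return True if `shell` accepts `set -o pipefail` without error."""
    result = subprocess.run(
        [shell, "-c", "set -o pipefail"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit is the answer, not an exception
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Compare the default recipe shell with bash, if bash is installed.
    for candidate in ("/bin/sh", shutil.which("bash")):
        if candidate:
            print(candidate, shell_supports_pipefail(candidate))
```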

I opened the Makefile (*.pileup.Makefile) and was able to copy and run its lines manually, with the exception of

(echo "#FAM_ID IND_ID FAT_ID MOT_ID SEX MPU"; cut -f 1 bam.man | perl -lane 'print join("\t",$$F[0],$$F[0],0,0,0,"output2.pileup.".$$F[0].".txt.gz")') > output2.pileup.ped

which I believe needs to be edited (?!) so that $$F[0] becomes $F[0].
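In a Makefile recipe, $$ is the escape for a literal dollar sign: make replaces each $$ with $ before passing the line to the shell, so when make runs this rule, perl already receives $F[0]. The $$F[0] in the file is therefore most likely intended, and the edit is needed only when the line is pasted into a shell by hand. The Python sketch below shows just that substitution; recipe_line_for_shell is a hypothetical name, and the function deliberately ignores every other kind of make expansion, such as $(VAR) or $@.

```python
def recipe_line_for_shell(recipe_line: str) -> str:
    """Turn each make-escaped `$$` into the literal `$` the shell receives.

    Only handles the `$$` escape; other make references such as $(VAR) or
    $@ are left untouched, so this is not a full make expander.
    """
    return recipe_line.replace("$$", "$")


if __name__ == "__main__":
    # The perl part of the PED-generating rule, as written in the Makefile.
    recipe = r"""perl -lane 'print join("\t",$$F[0],$$F[0],0,0,0,"output2.pileup.".$$F[0].".txt.gz")'"""
    print(recipe_line_for_shell(recipe))
```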

Here is the command I ran: cctools pileup --loci VCF --index tabular_file_with_sampleID_bam_path --out output2.pileup --ref REF_genome --run 10

So there are two possible problems: 1) the Makefile does not run correctly after the cctools pileup call (maybe I am missing something that should be installed?), and 2) the way pileup.ped is generated may be problematic.

Thank you!!!
