zstephens / exogene
A workflow for identifying viral integrations in both short and long read data
License: GNU General Public License v3.0
Could you explain each column in integrations.tsv? I am confused about the meaning of the header row.
Is there a way to adapt the pipeline for nonhuman organisms?
Hi.
I ran exogene on WGS data. I think the final output file is "integration.tsv".
Here is my example output.
The results show many integration sites for Encephalomyocarditis virus.
I think these are false positives because this sample is liver cancer. Perhaps the HBV hits are the true ones.
How can I filter out the false positives?
Also, many rows in the results show a SOFTCLIP_MAPQ of 0%. Is that OK?
Thanks.
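One way to narrow the table to an expected virus after the run is a post-hoc filter on the virus name. The sketch below is not part of exogene and does not explain why the unexpected hits appear. It assumes the output is tab-separated with a header row containing a column named VIRUS (a name taken from the header printed in a log elsewhere on this page, and not confirmed for this file). The function name filter_by_virus is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of exogene.
# Assumption: FILE is tab-separated and its header row has a column named VIRUS.
# Prints the header plus every row whose VIRUS field matches PATTERN
# (case-insensitive regular expression).
filter_by_virus() {
    local tsv="$1" pattern="$2"
    awk -F'\t' -v pat="$pattern" '
        NR == 1 {
            # Locate the VIRUS column by name instead of hard-coding its position.
            for (i = 1; i <= NF; i++) if ($i == "VIRUS") col = i
            if (!col) { print "no VIRUS column in header" > "/dev/stderr"; exit 1 }
            print
            next
        }
        tolower($col) ~ tolower(pat)
    ' "$tsv"
}

# Example usage (file name and pattern are placeholders):
# filter_by_virus integrations.tsv "hepatitis b" > integrations.hbv.tsv
```

Looking up the column by header name means the filter fails loudly if the assumed column is absent, instead of silently filtering on the wrong field.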
Thanks so much for making this workflow available and dockerizing it!
When testing it with a custom viral genome file (-v), I noticed that the workflow would run, but I saw a suspicious early "[E::bwa_idx_load_from_disk] fail to locate the index files" message. The rest of the run would continue and eventually fail to find any integration sites.
It turns out this was because the viral FASTA I was supplying had not been indexed with bwa index (init_ref.sh indexes the joint reference but not the viral one alone), even though it is used as the target of the initial mapping step (presumably your included reference is already indexed). This is easy enough to do, but it took a while to figure out because the README does not mention that this file needs to be indexed, and I was trying to work out whether the joint indexing had failed.
As far as solutions, I was thinking of either:
- a note in the README that the viral FASTA must be indexed with bwa index (this wouldn't require any repackaging of the docker container), or
- a change to init_ref.sh that also indexes the supplied viral reference FASTA with samtools and bwa if the -v flag is specified. If you'd prefer this not be the default behavior, there could be an additional command-line flag to enable it, or a check for matching bwa index files with the appropriate suffixes so it's not reindexed if those files already exist.
Cheers!
Tim J
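The "check for a matching bwa index file" idea could look roughly like the sketch below. It is not taken from init_ref.sh. The function names and the example path are hypothetical, and it assumes bwa and samtools are on the PATH. It relies on bwa index writing its output next to the FASTA with the suffixes .amb, .ann, .bwt, .pac and .sa.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of exogene or init_ref.sh.

# Succeed only if all five files written by `bwa index` exist next to the FASTA.
bwa_index_present() {
    local fasta="$1" ext
    for ext in amb ann bwt pac sa; do
        [ -f "${fasta}.${ext}" ] || return 1
    done
    return 0
}

# Index the FASTA with bwa and samtools unless a bwa index is already there.
ensure_bwa_index() {
    local fasta="$1"
    if [ ! -f "$fasta" ]; then
        echo "viral FASTA not found: $fasta" >&2
        return 1
    fi
    if bwa_index_present "$fasta"; then
        echo "bwa index already present for $fasta"
    else
        echo "indexing $fasta"
        bwa index "$fasta" && samtools faidx "$fasta"
    fi
}

# Example usage (placeholder path):
# ensure_bwa_index /path/to/custom_viral.fa
```

Checking for the files first keeps repeated runs cheap and avoids rewriting an index that another process may be reading.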
Hi Stephen, I am not an experienced Docker user and couldn't solve this issue by myself. I am running Docker on WSL (Cygwin and Ubuntu). After starting exogene in Docker, I ran this command as you described:
./Exogene-SR.sh -f1 test_data/SRR3104446_1.fq.gz -f2 test_data/SRR3104446_2.fq.gz -r refs/HumanViral_Reference_12-12-2018.fa -o output
Even though the path is correct, gzip can't open or read the files:
gzip: test_data/SRR3104446_1.fq.gz: No such file or directory
gzip: test_data/SRR3104446_2.fq.gz: No such file or directory
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: /usr/bin/bwa mem -k 30 -t 4 /home/refs/HumanViral_Reference_12-12-2018.fa -
[main] Real time: 0.018 sec; CPU: 0.020 sec
Traceback (most recent call last):
File "/home/exogene/dev/readlist_2_fq.py", line 38, in
fi_1 = get_file_handle(IN_R1, 'r')
File "/home/exogene/dev/readlist_2_fq.py", line 17, in get_file_handle
return open(fn, rw)
IOError: [Errno 2] No such file or directory: 'test_data/SRR3104446_1.fq'
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] EOF marker is absent. The input is probably truncated.
We were unable to grab template length stats from bwa.log, making a complete guess...
estimated template length: 350 50
=== BREAKPOINT DEVIATIONS:
CHR INTEGRATION_POS #READS VIRUS ANNOTATION SOFTCLIP_POS #SOFTCLIP DISCORDANT_POS #DISCORDANT LONGREAD_POS #LONGREAD NEAREST_GENE
mv: cannot stat '*_hits.ids': No such file or directory
Thank you for your time.
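Every later error in this log follows from the first one: the FASTQ files could not be opened, so bwa received no reads. A possible cause, which is an assumption and not confirmed in this thread, is that test_data/ does not exist relative to the working directory inside the container (for example, because the host directory was not mounted into it). A small pre-flight check like the sketch below would make that visible before the pipeline starts. The function name check_inputs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of exogene.
# Reports every input path that is not readable from the current working
# directory and returns non-zero if any is missing.
check_inputs() {
    local status=0 f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f (working directory: $PWD)" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage inside the container, with the paths from the command above:
# check_inputs test_data/SRR3104446_1.fq.gz test_data/SRR3104446_2.fq.gz \
#              refs/HumanViral_Reference_12-12-2018.fa \
#   && ./Exogene-SR.sh -f1 test_data/SRR3104446_1.fq.gz -f2 test_data/SRR3104446_2.fq.gz \
#                      -r refs/HumanViral_Reference_12-12-2018.fa -o output
```

Printing the working directory alongside each missing path helps tell a wrong relative path apart from a file that was never made available to the container.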