

metasv's People

Contributors

chapmanb, joelmartin, johnmu, marghoob, sbandara


metasv's Issues

Questions on genotyping

I was wondering: does MetaSV do "joint genotyping" of multiple samples, analogous to what GATK does for SNPs?

Does MetaSV do breakpoint resolution? For example, if I have a population of samples with structural variants in the same region, can MetaSV use that information to refine the boundaries of the structural variant?

Can the genotyping be done separately from the variant calling/discovery?

Thank you,

Luz

OSError: [Errno 12] Cannot allocate memory

Hello,

I've run MetaSV and received the following message:


ERROR 2016-02-23 02:20:01,971 run_spades_single-<Process(PoolWorker-15, started daemon)> Caught exception in worker thread
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/metasv/run_spades.py", line 75, in run_spades_single
retcode = cmd.run(cmd_log_fd_out=spades_log_fd, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/metasv/external_cmd.py", line 20, in run
self.p = subprocess.Popen(self.cmd, stderr=cmd_log_fd_err, stdout=cmd_log_fd_out)
File "/usr/lib/python2.7/subprocess.py", line 710, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1223, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
()
INFO 2016-02-23 02:20:07,528 metasv.run_spades Merging the contigs from []


I am running on a machine with 96 GB of RAM, so memory shouldn't be an issue here. Monitoring RAM while the script is running, it seems to use about 55 GB. Do you have any idea what the issue might be?

Thanks again for your assistance,
Madeline
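For what it's worth, `[Errno 12]` raised by `os.fork()` usually means the kernel refused to *commit* a copy of the parent's address space, not that physical RAM ran out: a worker holding ~55 GB needs 55 GB of commitable memory to fork, even though fork is copy-on-write. A hedged diagnostic sketch (the sysctl workaround is a common Linux suggestion, not MetaSV guidance):

```python
# Check the kernel's overcommit policy; under the default heuristic (0) a
# ~55 GB process can be refused permission to fork even on a 96 GB machine.
try:
    with open("/proc/sys/vm/overcommit_memory") as fd:
        policy = fd.read().strip()
except IOError:
    policy = "unavailable (non-Linux system)"
print("vm.overcommit_memory =", policy)  # 0 heuristic, 1 always allow, 2 strict
# Common workarounds: `sudo sysctl vm.overcommit_memory=1`, adding swap,
# or lowering --num_threads so each forking worker stays smaller.
```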

Object has no attribute 'length' in the last step, Output final VCF file

Hello,

I ran:
python2.7 run_metasv.py --version
run_metasv.py 0.5

module load bedtools/2.24.0
module load gcc/4.8.4

PATH=$PATH:/bcbiometasv/miniconda/bin
PYTHONPATH="${PYTHONPATH}:/bcbiometasv/miniconda:/bcbiometasv/miniconda/lib/python2.7/site-packages"

python2.7 run_metasv.py --reference hg19_chromosome.fa --boost_sc
--age /bcbiometasv/miniconda/bin/AGE-master/age_align
--pindel_vcf 5.realigned.pindelx5_1toY.N0_PTonly_LI.filtered.somatic.142.recode.vcf 6.realigned.pindelx5_1toY_N0.PTonly_TD.filtered.somatic.142.recode.vcf 7.realigned.pindelx5_1toY_N0.PTonly_D.filtered.somatic.142.recode.vcf 8.realigned.pindelx5_1toY_N0.PTonly_INV.filtered.somatic.142.recode.vcf 9.realigned.pindelx5_1toY_N0.PTonly_SI.filtered.somatic.142.recode.vcf
--cnvnator_vcf 4.PTonly.NTrealign.root.cnvnator.N0.filtered.somatic.142.recode.vcf
--lumpy_vcf 3.tumor.gt.lumpy.svtyper.PRECISE.N0.PTonly.filtered.somatic.142.recode.vcf --manta_vcf 1.somaticSV_manta.PASS.N0only.PTonly.filtered.somatic.142.recode.vcf
--breakdancer_native 2.breakdancer.cfg.LIBTN.a.TumorOnly.noCTXITX.somatic.manEdit.out
--sample filter.somatic
--bam Clean3_mergedL7L8_hg19_kmer_q15_TrimN_N0_L70.recal_sort2_dedup2.realigned2.NTrealign.bam
--spades /bcbiometasv/miniconda/bin/SPAdes-3.6.0/bin/spades.py
--spades_options '-k 71'
--num_threads 4
--workdir /bcbiometasv/miniconda/bin/UP53input
--outdir out_somatic --min_support_ins 2 --max_ins_intervals 1000000
--mean_read_length 146 --isize_mean 365 --isize_sd 104

It fails at the last step. I can see variant.vcf with only a header, and I also see the following error. Can you advise me on a workaround for this?

INFO 2016-02-28 23:00:57,715 genotype_interval-<Process(PoolWorker-16, started daemon)> For interval chrY:22260563-22301084 DEL counts are 36, 142 and normal_frac is 0.253521 gt is 0/1
INFO 2016-02-28 23:00:58,091 genotype_interval-<Process(PoolWorker-16, started daemon)> For interval chrY:28792949-28793380 DEL counts are 296, 4088 and normal_frac is 0.072407 gt is 0/1
INFO 2016-02-28 23:00:58,245 genotype_interval-<Process(PoolWorker-16, started daemon)> For interval chrY:28805583-28814110 DEL counts are 229, 3217 and normal_frac is 0.0711843 gt is 0/1
INFO 2016-02-28 23:00:58,700 genotype_intervals-<Process(PoolWorker-16, started daemon)> Genotyped 351 intervals in 1.00553 minutes
INFO 2016-02-28 23:00:58,790 parallel_genotype_intervals-<_MainProcess(MainProcess, started)> Following BED files will be merged: ['/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/bin/UP53input/genotyping/0/genotyped.bed', '/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/bin/UP53input/genotyping/2/genotyped.bed', '/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/bin/UP53input/genotyping/1/genotyped.bed', '/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/bin/UP53input/genotyping/3/genotyped.bed']
INFO 2016-02-28 23:00:58,878 parallel_genotype_intervals-<_MainProcess(MainProcess, started)> Finished parallel genotyping of 1410 intervals in 1.08642 minutes
INFO 2016-02-28 23:00:58,882 metasv.main Output final VCF file
Traceback (most recent call last):
File "run_metasv.py", line 5, in
pkg_resources.run_script('MetaSV==0.5', 'run_metasv.py')
File "/usr/local/python/2.7.9/lib/python2.7/site-packages/distribute-0.6.28-py2.7.egg/pkg_resources.py", line 499, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/python/2.7.9/lib/python2.7/site-packages/distribute-0.6.28-py2.7.egg/pkg_resources.py", line 1239, in run_script
execfile(script_filename, namespace, namespace)
File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/lib/python2.7/site-packages/MetaSV-0.5-py2.7.egg/EGG-INFO/scripts/run_metasv.py", line 142, in
sys.exit(run_metasv(args))
File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/lib/python2.7/site-packages/MetaSV-0.5-py2.7.egg/metasv/main.py", line 335, in run_metasv
convert_metasv_bed_to_vcf(bedfile=genotyped_bed, vcf_out=final_vcf, workdir=args.workdir, sample=args.sample, reference=args.reference, pass_calls=False)
File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/lib/python2.7/site-packages/MetaSV-0.5-py2.7.egg/metasv/generate_final_vcf.py", line 476, in convert_metasv_bed_to_vcf
vcf_writer = vcf.Writer(open(vcf_out, "w"), vcf_template_reader)
File "/scratch/RDS-SMS-PCaGenomes-RW/weejar/bcbiometasv/miniconda/lib/python2.7/site-packages/vcf/parser.py", line 673, in init
if line.length:
AttributeError: 'tuple' object has no attribute 'length'

Sorry to bother you with so many questions. Thank you for your time in helping my research.

James

filter_gaps

Hello,

I'm using Lumpy, BreakSeq, BreakDancer, Pindel and CNVnator to look for CNVs in FASTQ data obtained by WES (I know these are not the best-suited tools for WES).

I would like to know the impact of the --filter_gaps option on merging. How does it work?
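In the meantime, a hedged sketch of what gap filtering amounts to (my assumption: --filter_gaps drops merged calls that overlap reference assembly gaps, taken from a BED supplied via --gaps, presumably via pybedtools internally):

```python
# Toy illustration of the overlap test: keep only SVs that do not intersect
# any gap interval (half-open BED coordinates). All intervals are hypothetical.
gaps = [("chr1", 400, 800)]                       # an assembly gap
svs = [("chr1", 100, 500), ("chr1", 9000, 9500)]  # merged SV calls

def overlaps_gap(sv):
    chrom, start, end = sv
    return any(chrom == g[0] and start < g[2] and end > g[1] for g in gaps)

kept = [sv for sv in svs if not overlaps_gap(sv)]
print(kept)  # [('chr1', 9000, 9500)]
```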

region overlapped between duplication and deletion

Hi,
I checked the output from MetaSV and found that some regions are reported as both a duplication and a deletion. What causes this?

Scaffold_12628_Chr2 93711 . G . PASS CIEND=-10,10;END=102822;SVLEN=9110;SVTYPE=DUP;CIPOS=-10,10;HOMLEN=2;HOMSEQ=TG;SOURCES=Scaffold_12628_Chr2-93711-Scaffold_12628_Chr2-102819-9108-Manta,Scaffold_12628_Chr2-93711-Scaffold_12628_Chr2-102821-9110-Lumpy,Scaffold_12628_Chr2-93712-Scaffold_12628_Chr2-102822-9110-WHAM;NUM_SVMETHODS=6;NUM_SVTOOLS=3;SVMETHOD=RP,RP,RP,SR,SR,SR;A=15;CF=1.0;CIEND95=0,0;CIPOS95=0,0;CW=0.0,1.0,0.0,0.0,0.0;D=0;DI=0.0;EV=8;I=0;PE=0;SR=7;SS=0;STRANDS=-+:7;SU=7;T=0;TAGS=2931_t4;TF=0;U=15;V=0 GT 0/1
Scaffold_12628_Chr2 96164 . T . PASS CIEND=-6,5;END=98928;IMPRECISE;SVLEN=-2764;SVTYPE=DEL;CIPOS=-6,5;SOURCES=Scaffold_12628_Chr2-96164-Scaffold_12628_Chr2-98928-2764-Manta,Scaffold_12628_Chr2-96190-Scaffold_12628_Chr2-98902-2712-Lumpy;NUM_SVMETHODS=4;NUM_SVTOOLS=2;SVMETHOD=RP,RP,SR,SR;CIEND95=0,0;CIPOS95=0,0;PE=0;SR=6;STRANDS=+-:6;SU=6 GT 0/1

Memory error when using metasv

I'm using MetaSV to merge the output of Pindel, CNVnator and BreakDancer, but I get the error "OSError: [Errno 12] Cannot allocate memory". I would like to know whether MetaSV reads the entire BAM file into memory when merging the SVs.
Here is my command:
run_metasv.py --reference /public/home/ylma/genome/Sus_scrofa/Sus_scrofa.Sscrofa10.2.dna.toplevel.fa
--breakdancer_native rc.out
--cnvnator_native rc.cnv
--pindel_native rc_D rc_LI rc_SI rc_TD rc_INV
--sample BMX --bams SAMN02298127.02.bam SAMN02298128.02.bam SAMN02298129.02.bam SAMN02298130.02.bam SAMN02298131.02.bam SAMN02298132.02.bam
--spades /public/home/ylma/tools/SPAdes-3.10.1-Linux/bin/spades.py
--age /public/home/ylma/tools/AGE/age_align --num_threads 15 --workdir work --outdir out
--max_ins_intervals 500000 --isize_mean 500 --isize_sd 150

Reference file not indexed

When I run metaSV, I get the following error: Reference file hg19_reference/hg19_multifasta is not indexed
What kind of indexing does the reference fasta file need?

Thank you,
Madeline
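A hedged answer based on how pysam-based tools generally behave (my assumption: MetaSV opens the reference through pysam): it expects a samtools-style `.fai` index next to the FASTA, produced by `samtools faidx reference.fa`. Note also that the error quotes `hg19_reference/hg19_multifasta` without a `.fa` extension, so it is worth double-checking the exact --reference value. A small check:

```python
import os

# Hypothetical path, reconstructed from the error message above.
fasta = "hg19_reference/hg19_multifasta.fa"
fai = fasta + ".fai"
if os.path.isfile(fai):
    status = "index found: %s" % fai
else:
    status = "index missing: run `samtools faidx %s`" % fasta
print(status)
```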

Run could not finish for DUP variant with options "--svs_to_assemble --svs_to_softclip"

I am running MetaSV with local assembly for duplication variants. My input is only 5 duplication variants, and 4 of them were skipped due to small size, so only 1 duplication will be processed for local assembly. I am wondering how much time this process will take: my job has been running for over a day with 2 threads and 12 GB of memory per thread. Is there any way to speed this up? For other variant types (DEL, insertion, duplication), the same settings finished in a few hours with all variants from one chromosome. Thanks, Justin

My input parameters: --svs_to_assemble DUP --svs_to_softclip DUP

Where I am now, based on the output:

INFO 2017-02-14 17:02:34,915 metasv.sv_interval Loading SV intervals from /work/s167568/MGRAK_2016_10_17_WGS14_1507_0_MetaSV/MantaBreakdancer_metaSV/test_DUP.vcf
WARNING 2017-02-14 17:02:34,923 metasv.sv_interval Skipping Record(CHROM=1, POS=821604, REF=T, ALT=[DUP:TANDEM]) due to small size
WARNING 2017-02-14 17:02:34,923 metasv.sv_interval Skipping Record(CHROM=1, POS=2324462, REF=G, ALT=[DUP:TANDEM]) due to small size
WARNING 2017-02-14 17:02:34,924 metasv.sv_interval Skipping Record(CHROM=1, POS=3714245, REF=T, ALT=[DUP:TANDEM]) due to small size
WARNING 2017-02-14 17:02:34,924 metasv.sv_interval Skipping Record(CHROM=1, POS=4789624, REF=T, ALT=[DUP:TANDEM]) due to small size
INFO 2017-02-14 17:02:34,924 metasv.main SV types are set(['DUP'])
INFO 2017-02-14 17:02:34,924 metasv.main Output per-tool VCFs
INFO 2017-02-14 17:02:34,925 metasv.main Outputting single tool VCF for Manta
INFO 2017-02-14 17:02:34,976 metasv.main Indexing single tool VCF for Manta
INFO 2017-02-14 17:02:35,050 metasv.main Do merging
INFO 2017-02-14 17:02:35,050 metasv.main Processing SVs of type DUP
INFO 2017-02-14 17:02:35,050 metasv.main Intra-tool Merging SVs of type DUP
INFO 2017-02-14 17:02:35,050 metasv.main First level merging for DUP for tool Manta
INFO 2017-02-14 17:02:35,050 metasv.main Inter-tool Merging SVs of type DUP
INFO 2017-02-14 17:02:35,051 metasv.main Output merged VCF without assembly
INFO 2017-02-14 17:02:35,103 metasv.main ('DUP', 'LowQual', 'IMPRECISE', ('Manta',)):1
INFO 2017-02-14 17:02:35,103 metasv.main Running assembly
INFO 2017-02-14 17:02:35,103 metasv.main Creating directory /work/s167568/MGRAK_2016_10_17_WGS14_1507_0_MetaSV/MantaBreakdancer_metaSV/metasv_work_test5DUP/spades
INFO 2017-02-14 17:02:35,111 metasv.main Creating directory /work/s167568/MGRAK_2016_10_17_WGS14_1507_0_MetaSV/MantaBreakdancer_metaSV/metasv_work_test5DUP/age
INFO 2017-02-14 17:02:35,122 metasv.main Generating Soft-Clipping intervals.
INFO 2017-02-14 17:02:35,122 parallel_generate_sc_intervals-<_MainProcess(MainProcess, started)> SVs to soft-clip: set(['DUP', 'INV', 'DEL', 'INS'])
INFO 2017-02-14 17:02:35,315 get_bp_intervals-<_MainProcess(MainProcess, started)> 2 total candidate bp intervals in other methods
INFO 2017-02-14 17:02:35,325 generate_sc_intervals-<Process(PoolWorker-1, started daemon)> Generating candidate intervals from /work/s167568/MGRAK_2016_10_17_WGS14_1507_0_MetaSV/input/HCC4017_Clone4.DupsMarked_RG.bam for chromsome 1
INFO 2017-02-14 17:27:36,793 generate_sc_intervals-<Process(PoolWorker-1, started daemon)> 6949907 candidate reads
INFO 2017-02-14 17:28:07,973 generate_sc_intervals-<Process(PoolWorker-1, started daemon)> 574885 candidate NONE reads
INFO 2017-02-14 17:28:07,974 generate_sc_intervals-<Process(PoolWorker-1, started daemon)> Gather intervals from breakpoints in other methods
INFO 2017-02-14 17:28:12,076 generate_sc_intervals-<Process(PoolWorker-1, started daemon)> 574885 bps in other methods
INFO 2017-02-14 17:44:31,879 resolve_none_svs-<Process(PoolWorker-1, started daemon)> 127 unresolved intervals
INFO 2017-02-14 17:44:33,931 resolve_none_svs-<Process(PoolWorker-1, started daemon)> 94 merged unresolved intervals
INFO 2017-02-14 17:44:34,789 resolve_none_svs-<Process(PoolWorker-1, started daemon)> 94 filtered unresolved intervals
INFO 2017-02-14 17:44:34,935 resolve_none_svs-<Process(PoolWorker-1, started daemon)> 79 coverage filtered unresolved intervals
INFO 2017-02-14 17:44:36,884 resolve_none_svs-<Process(PoolWorker-1, started daemon)> 58 coverage filtered unresolved intervals
INFO 2017-02-14 17:57:45,636 generate_sc_intervals-<Process(PoolWorker-1, started daemon)> 179755 merged intervals with left bp support

IOError: [Errno 2] No such file or directory: 'cnvnator.call'

Hi,

I installed MetaSV using pip, and when I run it I get a missing-file error for cnvnator.call. The full log is below my signature.

Thanks, Colin

run_metasv.py --reference /wd5/sq/grch37decoy/hs37d5.000.fa --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --cnvnator_native cnvnator.call --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --bam chimera.bam --spades SPAdes/spades.py --age AGE/age_align --num_threads 11 --workdir work --outdir out --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150 --sample 1
INFO 2018-02-01 15:07:02,114 metasv.main Running MetaSV 0.5.2
INFO 2018-02-01 15:07:02,114 metasv.main Command-line /st2/colin/.local/bin/run_metasv.py --reference /wd5/sq/grch37decoy/hs37d5.000.fa --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --cnvnator_native cnvnator.call --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --bam chimera.bam --spades SPAdes/spades.py --age AGE/age_align --num_threads 11 --workdir work --outdir out --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150 --sample 1
INFO 2018-02-01 15:07:02,114 metasv.main Arguments are Namespace(age='AGE/age_align', age_timeout=300, age_window=20, assembly_max_tools=1, assembly_pad=500, bams=['chimera.bam'], boost_sc=False, breakdancer_native=['breakdancer.out'], breakdancer_vcf=[], breakseq_native=['breakseq.gff'], breakseq_vcf=[], chromosomes=[], cnvkit_vcf=[], cnvnator_native=['cnvnator.call'], cnvnator_vcf=[], disable_assembly=False, enable_per_tool_output=False, extraction_max_read_pairs=10000, filter_gaps=False, gaps=None, gatk_vcf=[], gt_normal_frac=0.05, gt_window=100, inswiggle=100, isize_mean=500.0, isize_sd=150.0, keep_standard_contigs=False, lumpy_vcf=[], manta_vcf=[], max_ins_cov_frac=1.5, max_ins_intervals=500000, max_nm=10, maxsvlen=1000000, mean_read_coverage=50, mean_read_length=100, min_avg_base_qual=20, min_del_subalign_len=50, min_ins_cov_frac=0.5, min_inv_subalign_len=50, min_mapq=5, min_matches=50, min_soft_clip=20, min_support_frac_ins=0.05, min_support_ins=15, minsvlen=50, num_threads=11, outdir='out', overlap_ratio=0.5, pindel_native=['pindel_D', 'pindel_LI', 'pindel_SI', 'pindel_TD', 'pindel_INV'], pindel_vcf=[], reference='/wd5/sq/grch37decoy/hs37d5.000.fa', sample='1', sc_other_scale=5, spades='SPAdes/spades.py', spades_max_interval_size=50000, spades_options='', spades_timeout=300, stop_spades_on_fail=False, svs_to_assemble=set(['DUP', 'INV', 'DEL', 'INS']), svs_to_report=set(['INV', 'CTX', 'INS', 'DEL', 'ITX', 'DUP']), svs_to_softclip=set(['DUP', 'INV', 'DEL', 'INS']), wham_vcf=[], wiggle=100, workdir='work')
INFO 2018-02-01 15:07:02,115 metasv.main Only SVs on the following contigs will be reported: ['GL000191.1', 'GL000192.1', 'GL000193.1', 'GL000194.1', 'GL000195.1', 'GL000196.1', 'GL000197.1', 'GL000198.1', 'GL000199.1', 'GL000200.1', 'GL000201.1', 'GL000202.1', 'GL000203.1', 'GL000204.1', 'GL000205.1', 'GL000206.1', 'GL000207.1', 'GL000208.1', 'GL000209.1', 'GL000210.1', 'GL000211.1', 'GL000212.1', 'GL000213.1', 'GL000214.1', 'GL000215.1', 'GL000216.1', 'GL000217.1', 'GL000218.1', 'GL000219.1', 'GL000220.1', 'GL000221.1', 'GL000222.1', 'GL000223.1', 'GL000224.1', 'GL000225.1', 'GL000226.1', 'GL000227.1', 'GL000228.1', 'GL000229.1', 'GL000230.1', 'GL000231.1', 'GL000232.1', 'GL000233.1', 'GL000234.1', 'GL000235.1', 'GL000236.1', 'GL000237.1', 'GL000238.1', 'GL000239.1', 'GL000240.1', 'GL000241.1', 'GL000242.1', 'GL000243.1', 'GL000244.1', 'GL000245.1', 'GL000246.1', 'GL000247.1', 'GL000248.1', 'GL000249.1', 'NC_007605', 'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY', 'hs37d5']
INFO 2018-02-01 15:07:02,115 metasv.main Load native files
INFO 2018-02-01 15:07:02,115 metasv.cnvnator_reader File is cnvnator.call
Traceback (most recent call last):
File "/st2/colin/.local/bin/run_metasv.py", line 143, in
sys.exit(run_metasv(args))
File "/home/colin/.local/lib/python2.7/site-packages/metasv/main.py", line 106, in run_metasv
for record in svReader(native_file, svs_to_report=args.svs_to_report):
File "/home/colin/.local/lib/python2.7/site-packages/metasv/cnvnator_reader.py", line 110, in init
self.file_fd = open(file_name)
IOError: [Errno 2] No such file or directory: 'cnvnator.call'
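The traceback shows cnvnator_reader.py opening the path exactly as given on the command line, so a bare `cnvnator.call` is resolved relative to the directory run_metasv.py is launched from; an absolute path avoids the ambiguity. A minimal check, with the filename taken from the command above:

```python
import os

path = "cnvnator.call"  # the value passed to --cnvnator_native above
exists = os.path.isfile(path)
if exists:
    print("found:", os.path.abspath(path))
else:
    print("not found relative to", os.getcwd())
```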

No such file or directory: 'cnvnator.call'

Hello,

When I use the following command:
run_metasv.py --reference hg19_reference/hg19_multifasta.fa --boost_ins --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --cnvnator_native cnvnator.call --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --sample A10890 --bam A10890_C1VBNACXX_1.bam --spades SPAdes-3.6.2-Linux/bin/spades.py --age AGE-simple-parseable-output/age_align.cpp --num_threads 15 --workdir work --outdir out --min_ins_support 2 --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150

I get the following output and error message:
INFO 2016-01-20 10:56:47,410 metasv.main Only SVs on the following contigs will be reported: ['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY']
INFO 2016-01-20 10:56:47,410 metasv.main Load native files
INFO 2016-01-20 10:56:47,410 metasv.cnvnator_reader File is cnvnator.call
Traceback (most recent call last):
File "/usr/local/bin/run_metasv.py", line 6, in
exec(compile(open(file).read(), file, 'exec'))
File "/home/hpcuser01/metasv-0.3/scripts/run_metasv.py", line 108, in
stop_spades_on_fail=args.stop_spades_on_fail, gt_window=args.gt_window, gt_normal_frac=args.gt_normal_frac, isize_mean=args.isize_mean, isize_sd=args.isize_sd, extraction_max_read_pairs=args.extraction_max_read_pairs))
File "/home/hpcuser01/metasv-0.3/metasv/main.py", line 130, in run_metasv
for record in svReader(native_file):
File "/home/hpcuser01/metasv-0.3/metasv/cnvnator_reader.py", line 106, in init
self.file_fd = open(file_name)
IOError: [Errno 2] No such file or directory: 'cnvnator.call'

I am not sure why. Help would be appreciated!

Thank you,
Madeline

Deletion assembly and soft clip

Hello,

There seems to be some bug in MetaSV when the --boost_sc option is included. With this option, my deletion detection sensitivity drops dramatically, from ~80% to 5% (as measured using VarSim). My command is as follows, using MetaSV v0.5.3:
run_metasv.py --bam $BAM --reference $REF --sample $SAMPLE --boost_sc --cnvnator_native $SAMPLE.bam_CNVcall.100 --lumpy_vcf $SAMPLE.bam_lumpy.vcf --spades /home/hpcuser01/SPAdes-3.6.2-Linux/bin/spades.py --age /home/hpcuser01/AGE/age_align --min_support_ins 2 --max_ins_intervals 500000 --isize_mean $INSMEAN --isize_sd $INSSD --num_threads $THREADS --outdir $SAMPLE.metaSV.out --workdir $SAMPLE.metaSV.work
Even if I specify --svs_to_assemble INS I still have deletions dropping out. Not sure why this is.

output files filtering

Hi there,

I have generated some VCF files with MetaSV, merging different tools with assembly, but I don't know how to extract high-confidence SVs from the output. Could you give me some ideas?

thanks a lot.

Kai

pybedtools.helpers.BEDToolsError: missing bed file

Hi,

I want to detect deletions only. This is my command:
run_metasv.py --reference ../data/MIC_supercont_chr_combine.txt --breakdancer_native ./BDcaller/version1/BD/CU427_EC_chr_BD --pindel_native ./BDcaller/version1/pindel/CU427_D --sample WT --bam ../data/CU427_chr_EC_bwa_sorted.bam --spades spades.py --age age_align --num_threads 4 --workdir work --outdir DEL --isize_mean 236.88 --isize_sd 172.95 --svs_to_assemble DEL --svs_to_report DEL

The error message:
Traceback (most recent call last):
File "/usr/local/bin/run_metasv.py", line 143, in
sys.exit(run_metasv(args))
File "/usr/local/lib/python2.7/dist-packages/metasv/main.py", line 338, in run_metasv
convert_metasv_bed_to_vcf(bedfile=genotyped_bed, vcf_out=final_vcf, workdir=args.workdir, sample=args.sample, reference=args.reference, pass_calls=False)
File "/usr/local/lib/python2.7/dist-packages/metasv/generate_final_vcf.py", line 573, in convert_metasv_bed_to_vcf
filterd_bed = filter_confused_INS_calls(nonfilterd_bed, filterd_bed)
File "/usr/local/lib/python2.7/dist-packages/metasv/generate_final_vcf.py", line 164, in filter_confused_INS_calls
bad_INS = bedtool_INS.window(bedtool_bp_nonINS, w=wiggle)
File "/usr/local/lib/python2.7/dist-packages/pybedtools/bedtool.py", line 664, in decorated
result = method(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pybedtools/bedtool.py", line 243, in wrapped
check_stderr=check_stderr)
File "/usr/local/lib/python2.7/dist-packages/pybedtools/helpers.py", line 423, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

bedtools window -w 20 -b /root/work/genotyping/pybedtools.WpFB8r.tmp -a /root/work/genotyping/pybedtools.rfbScw.tmp

Error message was:
Error: The requested bed file (/root/work/genotyping/pybedtools.rfbScw.tmp) could not be opened. Exiting!

I am not sure what the problem is; please let me know if there is anything I can do. Thank you!

MetaSV 0.5.4 job being terminated on our cluster because it is asking for 178 GB of memory

Hi there,

I am trying to test MetaSV. I have BreakSeq, Pindel, and BreakDancer data that I am providing to MetaSV. However, when I ran MetaSV on our cluster using 16 CPUs and 128 GB of memory, it was terminated because it exceeded this memory allotment -- it went up to 178 GB. Is it normal behaviour to use this much memory? Any ideas what could be happening?

The run command I used was this:
python run_metasv.py --breakdancer_native $list --reference /projects/trans_scratch/references/genomes/transabyss/bwamem-0.7.10/hg19a.fa --sample TEST --outdir /projects/trans_scratch/validations/workspace/dpaulino/metasvtest --bam /projects/analysis/analysis24/A36971/merge_bwa-mem-0.7.6a/150nt/hg19a/A36971_2_lanes_dupsFlagged.bam --spades /home/dpaulino/.linuxbrew/Cellar/spades/3.10.1/bin/spades.py --age /home/dpaulino/software/ageAligner/AGE-master/age_align --num_threads 5 --breakseq_native /projects/trans_scratch/validations/workspace/dpaulino/breakseqTest/work/breakseq.gff --pindel_native $pindellist

$list and $pindellist contain paths to the BreakDancer and Pindel data.

The dataset is a human genome with chromosomes 1-22, X, and Y. Any advice on how to get metasv to run properly is greatly appreciated!

Thanks,
Daniel

Permissions errors during run_spades_single in run_spades.py

Hi and thanks in advance for your help!

While running MetaSV, it got stuck during the SPAdes step. There are a few 'OSError: [Errno 13] Permission denied' errors from the run_spades_single process, and then it seems to hang while merging the contigs. The merged.fa and spades.log files are both empty. Running MetaSV on the same sample with --disable_assembly went fine with no errors. The error messages are below; please let me know if there is anything I should try or additional information that would be helpful.

Thanks!

Amanda

Errors:
INFO 2016-04-06 14:10:19,200 extract_read_pairs-<Process(PoolWorker-7, started daemon)> Examined 256 pairs in 11.3396 seconds
INFO 2016-04-06 14:10:19,200 extract_read_pairs-<Process(PoolWorker-7, started daemon)> Extraction counts [('all_pair_hq', 256), ('non_perfect_hq', 107)]
INFO 2016-04-06 14:10:19,202 run_spades_single-<Process(PoolWorker-7, started daemon)> Running /HOME/BIOINFORMATICS/SOFTWARE/SPADES-3.7.1-LINUX/BIN/ with arguments ['-1', 'work/spades/2/_all_pair_hq_1.fq', '-2', 'work/spades/2/_all_pair_hq_2.fq', '-o', 'work/spades/2/spades_all_pair_hq/', '-m', '4', '-t', '1', '--phred-offset', '33']
ERROR 2016-04-06 14:10:19,950 run_spades_single-<Process(PoolWorker-7, started daemon)> Caught exception in worker thread
Traceback (most recent call last):
File "/home/.local/lib/python2.7/site-packages/metasv/run_spades.py", line 75, in run_spades_single
retcode = cmd.run(cmd_log_fd_out=spades_log_fd, timeout=timeout)
File "/home/.local/lib/python2.7/site-packages/metasv/external_cmd.py", line 20, in run
self.p = subprocess.Popen(self.cmd, stderr=cmd_log_fd_err, stdout=cmd_log_fd_out)
File "/usr/local/apps/python-2.7.11/lib/python2.7/subprocess.py", line 710, in init
errread, errwrite)
File "/usr/local/apps/python-2.7.11/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
INFO 2016-04-06 14:10:20,589 metasv.run_spades Merging the contigs from []
^CTraceback (most recent call last):
File "/home/Bioinformatics/software/metasv/scripts/run_metasv.py", line 143, in
sys.exit(run_metasv(args))
File "/home/.local/lib/python2.7/site-packages/metasv/main.py", line 306, in run_metasv
assembly_max_tools=args.assembly_max_tools)
File "/home/.local/lib/python2.7/site-packages/metasv/run_spades.py", line 214, in run_spades_parallel
for line in fileinput.input(assembly_fastas):
File "/usr/local/apps/python-2.7.11/lib/python2.7/fileinput.py", line 254, in next
line = self.readline()
File "/usr/local/apps/python-2.7.11/lib/python2.7/fileinput.py", line 349, in readline
self._buffer = self._file.readlines(self._bufsize)
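One clue in the log above: SPAdes is invoked as `/HOME/BIOINFORMATICS/SOFTWARE/SPADES-3.7.1-LINUX/BIN/` — a directory, not the `spades.py` script. exec() of a directory fails with EACCES, which Python surfaces as `OSError: [Errno 13] Permission denied`. A hedged sanity check (the path below is hypothetical):

```python
import os

spades = "/opt/SPAdes-3.7.1-Linux/bin/spades.py"  # hypothetical --spades value
if os.path.isfile(spades) and os.access(spades, os.X_OK):
    msg = "spades path looks runnable"
else:
    msg = "--spades should point at the spades.py file itself, not its bin/ directory"
print(msg)
```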

Error: reference file is not indexed

Hi,
I'm trying to use MetaSV for the first time.
I installed version 0.5.2.
I checked that reference.fa and reference.fa.fai are in the same folder with exactly the same base name.

I'm trying the following command:
run_metasv.py --reference ~/Documents/Post-Doc/Results/SequencingData/Reference_genomes/Celegans/c_elegans.PRJNA13758.WS243.genomic.fa --boost_sc --breakdancer_native breakdancer.out --breakseq_native breakseq.gff --cnvnator_native cnvnator.call --pindel_native pindel_D pindel_LI pindel_SI pindel_TD pindel_INV --sample 516 --bam ~/Bureau/temp_PSMN/Post-doc/N2vs516/NON-MASKED/BAM-516_20X-unM_RG-sorted-dedup-realign_BQSR1.bam --spades ~/softwares/SPAdes-3.13.0-Linux/bin/spades.py --age ~/softwares/AGE-master/age_align --num_threads 1 --workdir ~/Bureau/temp_PSMN/Post-doc/MetaSV/MetaSV_516/work --outdir ~/Bureau/temp_PSMN/Post-doc/MetaSV/MetaSV_516/out --isize_mean 470 --isize_sd 35

In the folder containing my reference file, I do have the .fa and .fai (I re-generated the index with samtools faidx to be sure):
ls -l | grep "c_elegans.PRJNA13758.WS243.genomic.fa"
-rwxrwxrwx 1 fabfab fabfab 102292161 mai 19 2014 c_elegans.PRJNA13758.WS243.genomic.fa
-rwxrwxrwx 1 fabfab fabfab 14 avril 22 2015 c_elegans.PRJNA13758.WS243.genomic.fa.amb
-rwxrwxrwx 1 fabfab fabfab 231 avril 22 2015 c_elegans.PRJNA13758.WS243.genomic.fa.ann
-rwxrwxrwx 1 fabfab fabfab 100286508 avril 22 2015 c_elegans.PRJNA13758.WS243.genomic.fa.bwt
-rwxrwxrwx 1 fabfab fabfab 181 oct. 18 11:28 c_elegans.PRJNA13758.WS243.genomic.fa.fai
-rwxrwxrwx 1 fabfab fabfab 25071602 avril 22 2015 c_elegans.PRJNA13758.WS243.genomic.fa.pac
-rwxrwxrwx 1 fabfab fabfab 50143256 avril 22 2015 c_elegans.PRJNA13758.WS243.genomic.fa.sa

Any idea what I'm doing wrong?

thanks,
Fabrice

How to include SV calls from Delly

Hi Marghoob,
I’m interested in using MetaSV for identifying high-confidence SVs from 3 different algorithms, e.g. BreakDancer, Delly and CNVnator.
I was wondering whether it is possible to include Delly’s output in MetaSV; if so, your guidance will be highly appreciated.

Thanks
Mesbah

some problems about assembly using parameter --spades

Hi,

While running MetaSV, I got errors during the SPAdes step. It seems SPAdes never runs successfully, and I end up with no output. Running MetaSV on the same sample with --disable_assembly went fine with no errors. The error messages are below; please let me know if there is anything I should try or additional information that would be helpful.

Thanks!
jsxu

(screenshot of the error messages attached to the original issue)

ValueError: dictionary update sequence element #0 has length 1; 2 is required

Hi,

I am trying a complete run of MetaSV 0.5.4 (installed from Bioconda) using two SV detectors (Pindel and BreakDancer), soft-clip-based analysis, and local assembly, and got this error:

INFO 2022-02-15 15:51:33,585 metasv.main          Load native files
INFO 2022-02-15 15:51:33,586 metasv.pindel_reader File is LA3111t13-LA4330t13_D
INFO 2022-02-15 16:18:19,892 metasv.pindel_reader File is LA3111t13-LA4330t13_SI
INFO 2022-02-15 16:32:20,639 metasv.pindel_reader File is LA3111t13-LA4330t13_TD
INFO 2022-02-15 16:34:33,923 metasv.pindel_reader File is LA3111t13-LA4330t13_INV
INFO 2022-02-15 16:34:35,183 metasv.breakdancer_reader File is breakdancer.sv.vcf
Traceback (most recent call last):
  File "/data/home/users/g.silvaarias/anaconda3/envs/metasv/bin/run_metasv.py", line 143, in <module>
    sys.exit(run_metasv(args))
  File "/data/home/users/g.silvaarias/anaconda3/envs/metasv/lib/python2.7/site-packages/metasv/main.py", line 106, in run_metasv
    for record in svReader(native_file, svs_to_report=args.svs_to_report):
  File "/data/home/users/g.silvaarias/anaconda3/envs/metasv/lib/python2.7/site-packages/metasv/breakdancer_reader.py", line 222, in next
    self.header.parse_header_line(line)
  File "/data/home/users/g.silvaarias/anaconda3/envs/metasv/lib/python2.7/site-packages/metasv/breakdancer_reader.py", line 74, in parse_header_line
    self.header_dict[fields[0]] = dict(field.split(":") for field in fields[1:])
ValueError: dictionary update sequence element #0 has length 1; 2 is required

Here is the full command:

run_metasv.py --reference $ref \
   --outdir $outdir \
   --boost_sc \
   --breakdancer_native breakdancer.sv.vcf \
   --pindel_native LA3111t13-LA4330t13_D LA3111t13-LA4330t13_SI LA3111t13-LA4330t13_TD LA3111t13-LA4330t13_INV \
   --sample LA3111t13 --sample LA4330t13 --bam LA3111t13_dedup_RG.bam LA4330t13_dedup_RG.bam --spades spades.py \
   --age age_align --num_threads $threads \
   --min_support_ins 10 --isize_mean 400 --isize_sd 100

I would appreciate any suggestion to fix that.

Best,
Gustavo
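A reading of the traceback (hedged, based only on the line it quotes): the failure is in the BreakDancer *native* reader, but the file passed to --breakdancer_native is `breakdancer.sv.vcf`, which looks like a VCF. The native header parser splits each header line on whitespace and expects the fields to be `key:value` pairs; a VCF `#CHROM` column header has no colons, so `dict()` receives 1-element sequences. A minimal reproduction of the exception — and if the file really is a VCF, --breakdancer_vcf is likely the flag to use:

```python
# The parse step from breakdancer_reader.py line 74, applied to a VCF-style
# column header instead of BreakDancer's native "key:value" header fields.
vcf_header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
fields = vcf_header.split()
try:
    dict(field.split(":") for field in fields[1:])
    err = None
except ValueError as exc:
    err = str(exc)
print(err)  # dictionary update sequence element #0 has length 1; 2 is required
```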

error in bed to output final vcf file

Hello,

I have been getting the error below when running MetaSV with soft-clip analysis. Have you seen this before? I am using the correct versions of the tools as listed at http://bioinform.github.io/metasv/.

INFO 2016-08-31 22:00:33,727 genotype_intervals-<Process(PoolWorker-16, started daemon)> Genotyped 6 intervals in 0.0038044 minutes
INFO 2016-08-31 22:00:33,803 parallel_genotype_intervals-<_MainProcess(MainProcess, started)> Following BED files will be merged: ['work/genotyping/0/genotyped.bed', 'work/genotyping/1/genotyped.bed', 'work/genotyping/2/genotyped.bed', 'work/genotyping/3/genotyped.bed']
INFO 2016-08-31 22:00:33,845 parallel_genotype_intervals-<_MainProcess(MainProcess, started)> Finished parallel genotyping of 27 intervals in 0.00704072 minutes
INFO 2016-08-31 22:00:33,847 metasv.main Output final VCF file
feature.field:0/1
Traceback (most recent call last):
File "/mnt/galaxyTools/tools/pymodules/python2.7/bin/run_metasv.py", line 146, in <module>
sys.exit(run_metasv(args))
File "/mnt/galaxyTools/tools/pymodules/python2.7/lib/python/MetaSV-0.5-py2.7.egg/metasv/main.py", line 335, in run_metasv
convert_metasv_bed_to_vcf(bedfile=genotyped_bed, vcf_out=final_vcf, workdir=args.workdir, sample=args.sample, reference=args.reference, pass_calls=False)
File "/mnt/galaxyTools/tools/pymodules/python2.7/lib/python/MetaSV-0.5-py2.7.egg/metasv/generate_final_vcf.py", line 435, in convert_metasv_bed_to_vcf
interval_info = get_interval_info(interval,pass_calls)
File "/mnt/galaxyTools/tools/pymodules/python2.7/lib/python/MetaSV-0.5-py2.7.egg/metasv/generate_final_vcf.py", line 77, in get_interval_info
info.update(json.loads(base64.b64decode(feature.fields[10])))
File "/mnt/galaxyTools/tools/pymodules/python2.7/lib/python/base64.py", line 76, in b64decode
raise TypeError(msg)
TypeError: Incorrect padding
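
For reference, `TypeError: Incorrect padding` from `base64.b64decode` means the string stored in column 11 of the genotyped BED lost its trailing `=` padding somewhere along the way (tools that trim trailing characters from BED columns can do this). Because base64 input must be a multiple of 4 characters, the padding can be restored before decoding; a defensive sketch, not metaSV code — `b64_json_loads` is a hypothetical helper:

```python
import base64
import json

def b64_json_loads(field):
    """Decode a possibly padding-stripped base64 field into JSON."""
    padded = field + "=" * (-len(field) % 4)  # restore any stripped '='
    return json.loads(base64.b64decode(padded))

# 'eyJhIjogMX0' is 'eyJhIjogMX0=' (the JSON {"a": 1}) with its '=' stripped
print(b64_json_loads("eyJhIjogMX0"))  # {'a': 1}
```

Tracking down which step stripped the padding (e.g. a BED post-processing command) would be the real fix; the helper only works around it.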

example command

Dear developers,

Thank you for contributing the tool to the community.

At the moment, I have ref and bam files, and when I run:

run_metasv.py --sample sample_A --reference ./ref/ref.fasta --bam ./bam/sample_A.bam --outdir ./out

I got an error - "Nothing to do since no SV file specified".

Could you please let me know where goes wrong? And what's the best way to run the tool? Could you show me an example?

Best wishes,
Fengyuan

Speed up computation time

Hi metasv developer,

Currently, I am using metasv to merge SVs from the outputs of BreakDancer, CNVnator, and Pindel for a human genome. I was wondering if there are some tricks I could use to speed up the computation?

I downloaded metasv from anaconda by using the command below:
conda install -c bioconda metasv
The version of metasv:

[ksu2 18:11:36 ksu2_SVE]$ run_metasv.py --version
run_metasv.py 0.5.4

I performed run_metasv.py on the example files without any issue, so I moved to my own data. The running time of metasv on our HPC is over 5 days now. If you have a chance to give me some suggestions, that would be great. Here I list my bash command.
`
#!/bin/bash
#SBATCH --qos=long
#SBATCH --time=7-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=64G

module load anaconda/2.5.0 bedtools/2.27.1
module load gcc/4.8.2
module load cmake/3.0.2 ROOT/5.34.36
export CONDA_ENVS_PATH=/lustre/project/ksu2_SVE
unset PYTHONPATH
source activate SVE

metaSV_ref=Homo_sapiens_assembly38.fasta
breakdancer_our=/data/BreakDancer_out/Subject_ID.sv.tbl
cnvnator_call=/data/CNVnator_out/Subject_ID.cnv.xls
pindel_out=/data/pindel_out/Sample_dir/Subject_ID/*
sample_id=Subject_ID_tbl
alignments_bam=/data/Subject_ID.bam
spades_exe=/ksu2_SVE/SVE/bin/spades.py
age_align_exe=/ksu2_SVE/SVE/bin/age_align
threads=20
work=/data/metaSV_work2
OUTDIR=/data/metaSV_out2
insert_size_mean=260.04
insert_size_sd=56.34
metaSV_svs_to_assemble={'DEL','INS','INV','DUP'}

run_metasv.py --reference $metaSV_ref \
--breakdancer_native $breakdancer_our \
--cnvnator_native $cnvnator_call \
--pindel_native $pindel_out \
--sample $sample_id \
--bam $alignments_bam \
--spades $spades_exe \
--age $age_align_exe \
--num_threads $threads \
--workdir $work \
--outdir $OUTDIR \
--isize_mean $insert_size_mean \
--isize_sd $insert_size_sd
`
I didn't find any issues in the log file so far, but the running time is longer than I expected.

If you need more information, please let me know, and thank you for your time.

Ray
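
One practical option for long whole-genome runs is to shard the merge by chromosome (run_metasv.py accepts a `--chromosomes` argument) and run the shards concurrently, since `--num_threads` only parallelizes parts of the pipeline. A sketch that builds one command per chromosome — all file names here are placeholders, and whether sharding suits your callers' outputs is an assumption:

```python
def metasv_cmd(chrom, sample="Subject_ID"):
    """Build one per-chromosome MetaSV command line (paths are placeholders)."""
    return ["run_metasv.py",
            "--reference", "Homo_sapiens_assembly38.fasta",
            "--sample", sample,
            "--chromosomes", chrom,
            "--workdir", "work_" + chrom,
            "--outdir", "out_" + chrom,
            "--num_threads", "2"]

chroms = ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]
cmds = [metasv_cmd(c) for c in chroms]
# hand each entry of cmds to subprocess.call or a cluster job array,
# then concatenate the per-chromosome outputs
```

Each shard gets its own workdir/outdir so the runs do not clobber each other's intermediate files.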

metasv error - Floating point exception

Hi,

I'm trying to run metasv to integrate calls from breakdancer, lumpy and cnvnator.

I've installed and tested metasv v0.4 without problems, but I get a "Floating point exception" error when trying to test it on my data (1 sample and 1 chr).
I copy below the last lines of the log and the command I've run.

It creates the work directory and the pre_asm.vcf file, but it's empty.
I've tried combining the output of only 2 callers as well, but all the combinations give the same error.

I would appreciate if you could help me identify what is causing the error.

Thank you in advance and best regards,

amaia

$METASV --reference $resourceDir/human_g1k_v37_decoy.fasta --outdir $WorkingDir/metasv --sample $sample --bam $WorkingDir/bam_sort/$sample.final.sorted.bam --chromosomes $chr  --cnvnator_native $WorkingDir/cnvnator/calls/$sample.calls  --breakdancer_native $WorkingDir/breakdancer/intrachr/samples61_breakdancer_o_$chr.ctx  --lumpy_vcf $WorkingDir/lumpy/$fam'_samples.vcf' --disable_assembly --filter_gaps --keep_standard_contigs

...
INFO 2015-11-12 11:46:24,844 metasv.main SV types are set(['DEL', 'DUP', 'INV', 'INS'])
INFO 2015-11-12 11:46:24,845 metasv.main Do merging
INFO 2015-11-12 11:46:24,845 metasv.main Processing SVs of type DEL
INFO 2015-11-12 11:46:24,845 metasv.main Intra-tool Merging SVs of type DEL
INFO 2015-11-12 11:46:24,845 metasv.main First level merging for DEL for tool CNVnator
INFO 2015-11-12 11:46:24,903 metasv.main First level merging for DEL for tool BreakDancer
INFO 2015-11-12 11:46:25,207 metasv.main First level merging for DEL for tool Lumpy
INFO 2015-11-12 11:46:27,096 metasv.main Inter-tool Merging SVs of type DEL
INFO 2015-11-12 11:46:27,588 metasv.main Checking overlaps SVs of type DEL
INFO 2015-11-12 11:46:28,249 metasv.main Processing SVs of type DUP
INFO 2015-11-12 11:46:28,249 metasv.main Intra-tool Merging SVs of type DUP
INFO 2015-11-12 11:46:28,249 metasv.main First level merging for DUP for tool CNVnator
INFO 2015-11-12 11:46:28,267 metasv.main First level merging for DUP for tool Lumpy
INFO 2015-11-12 11:46:28,477 metasv.main Inter-tool Merging SVs of type DUP
INFO 2015-11-12 11:46:28,549 metasv.main Checking overlaps SVs of type DUP
INFO 2015-11-12 11:46:29,211 metasv.main Processing SVs of type INV
INFO 2015-11-12 11:46:29,211 metasv.main Intra-tool Merging SVs of type INV
INFO 2015-11-12 11:46:29,212 metasv.main First level merging for INV for tool BreakDancer
INFO 2015-11-12 11:46:29,225 metasv.main First level merging for INV for tool Lumpy
INFO 2015-11-12 11:46:29,229 metasv.main Inter-tool Merging SVs of type INV
INFO 2015-11-12 11:46:29,280 metasv.main Checking overlaps SVs of type INV
INFO 2015-11-12 11:46:29,333 metasv.main Processing SVs of type INS
INFO 2015-11-12 11:46:29,333 metasv.main Intra-tool Merging SVs of type INS
INFO 2015-11-12 11:46:29,333 metasv.main First level merging for INS for tool BreakDancer
INFO 2015-11-12 11:46:36,834 metasv.main Inter-tool Merging SVs of type INS
INFO 2015-11-12 11:46:56,726 metasv.main Checking overlaps SVs of type INS
INFO 2015-11-12 11:47:17,721 metasv.main Output merged VCF without assembly
Floating point exception

KeyError: 'Assuming'

Error after running MetaSV; can you give me some suggestions?

$METASV --pindel_native /bak01/yangqj/pindel/20220116/vcf_format/ps0001/ps0001_* --cnvnator_native /bak01/yangqj/cnvnator/ps0001.call --reference /bak01/yangqj/Metasv/hg19.fa --outdir out --sample ps0001 --filter_gaps --minsvlen 500 --maxsvlen 500000 --disable_assembly --keep_standard_contigs

INFO 2022-01-25 16:34:31,082 metasv.main Running MetaSV 0.5.4
INFO 2022-01-25 16:34:31,082 metasv.main Command-line /lustre/yangqj/software/miniconda3/envs/metasv/bin/run_metasv.py --pindel_native /bak01/yangqj/pindel/20220116/vcf_format/ps0001/ps0001_DEL.vcf /bak01/yangqj/pindel/20220116/vcf_format/ps0001/ps0001_TD.vcf --cnvnator_native /bak01/yangqj/cnvnator/ps0001.call --reference /bak01/yangqj/Metasv/hg19.fa --outdir out --sample ps0001 --filter_gaps --minsvlen 500 --maxsvlen 500000 --disable_assembly --keep_standard_contigs
INFO 2022-01-25 16:34:31,083 metasv.main Arguments are Namespace(age=None, age_timeout=300, age_window=20, assembly_max_tools=1, assembly_pad=500, bams=[], boost_sc=False, breakdancer_native=[], breakdancer_vcf=[], breakseq_native=[], breakseq_vcf=[], chromosomes=[], cnvkit_vcf=[], cnvnator_native=['/bak01/yangqj/cnvnator/ps0001.call'], cnvnator_vcf=[], disable_assembly=True, enable_per_tool_output=False, extraction_max_read_pairs=10000, filter_gaps=True, gaps=None, gatk_vcf=[], gt_normal_frac=0.05, gt_window=100, inswiggle=100, isize_mean=350.0, isize_sd=50.0, keep_standard_contigs=True, lumpy_vcf=[], manta_vcf=[], max_ins_cov_frac=1.5, max_ins_intervals=10000, max_nm=10, maxsvlen=500000, mean_read_coverage=50, mean_read_length=100, min_avg_base_qual=20, min_del_subalign_len=50, min_ins_cov_frac=0.5, min_inv_subalign_len=50, min_mapq=5, min_matches=50, min_soft_clip=20, min_support_frac_ins=0.05, min_support_ins=15, minsvlen=500, num_threads=1, outdir='out', overlap_ratio=0.5, pindel_native=['/bak01/yangqj/pindel/20220116/vcf_format/ps0001/ps0001_DEL.vcf', '/bak01/yangqj/pindel/20220116/vcf_format/ps0001/ps0001_TD.vcf'], pindel_vcf=[], reference='/bak01/yangqj/Metasv/hg19.fa', sample='ps0001', sc_other_scale=5, spades=None, spades_max_interval_size=50000, spades_options='', spades_timeout=300, stop_spades_on_fail=False, svs_to_assemble=set(['DUP', 'INV', 'INS']), svs_to_report=set(['INV', 'CTX', 'INS', 'DEL', 'ITX', 'DUP']), svs_to_softclip=set(['DUP', 'INV', 'INS']), wham_vcf=[], wiggle=100, workdir='work')
INFO 2022-01-25 16:34:31,084 metasv.main Only SVs on the following contigs will be reported: ['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY']
INFO 2022-01-25 16:34:31,084 metasv.sv_interval Loading the gaps in the genome from /lustre/yangqj/software/miniconda3/envs/metasv/lib/python2.7/site-packages/metasv/resources/hg19.gaps.bed
INFO 2022-01-25 16:34:31,117 metasv.main Load native files
INFO 2022-01-25 16:34:31,117 metasv.cnvnator_reader File is /bak01/yangqj/cnvnator/ps0001.call
Traceback (most recent call last):
File "/lustre/yangqj/software/miniconda3/envs/metasv/bin/run_metasv.py", line 143, in <module>
sys.exit(run_metasv(args))
File "/lustre/yangqj/software/miniconda3/envs/metasv/lib/python2.7/site-packages/metasv/main.py", line 106, in run_metasv
for record in svReader(native_file, svs_to_report=args.svs_to_report):
File "/lustre/yangqj/software/miniconda3/envs/metasv/lib/python2.7/site-packages/metasv/cnvnator_reader.py", line 123, in next
record = CNVnatorRecord(line.strip())
File "/lustre/yangqj/software/miniconda3/envs/metasv/lib/python2.7/site-packages/metasv/cnvnator_reader.py", line 38, in __init__
self.sv_type = sv_type_dict[fields[0]]
KeyError: 'Assuming'
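
The KeyError points at CNVnator log chatter inside the `.call` file: CNVnator prints lines such as "Assuming male genome" to stdout, and if that output was captured into ps0001.call, the reader tries to interpret the word "Assuming" as an SV type. Filtering the file down to real call lines (first column `deletion` or `duplication`) before handing it to `--cnvnator_native` should avoid this; a sketch (the sample call line is illustrative, not from your data):

```python
SV_TYPES = ("deletion", "duplication")

def clean_cnvnator_calls(lines):
    """Keep only real CNVnator call lines; drop log chatter and blank lines."""
    kept = []
    for line in lines:
        fields = line.strip().split("\t")
        if fields[0] in SV_TYPES:
            kept.append(line)
    return kept

calls = ["Assuming male genome.\n",
         "deletion\tchr1:10001-20000\t10000\t0.12\t1e-10\t0\t1e-10\t0\t0.5\n"]
print(len(clean_cnvnator_calls(calls)))  # 1
```

An equivalent one-liner with grep over the file would do the same job before invoking run_metasv.py.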

pybedtools.cbedtools.MalformedBedLineError: Start is greater than stop

I tried to run MetaSV with the following command:

run_metasv.py --reference ../../../../../mnt/data/GRCh37_bcgsc/GRCh37-lite.fa --boost_ins --breakdancer_native ../../breakdancer/perl/A10898_breakdancer --breakseq_native ../../work/breakseq.gff --cnvnator_native ../../CNVnator_v0.3.2/src/A10898_100_cnvnator --pindel_native ../../pindel-master/A10898_D ../../pindel-master/A10898_LI ../../pindel-master/A10898_SI ../../pindel-master/A10898_TD ../../pindel-master/A10898_INV --sample A10898 --bam ../../../../mnt/data/A10898_3_lanes_dupsFlagged.bam --spades ../../SPAdes-3.6.2-Linux/bin/spades.py --age ../../AGE-simple-parseable-output/age_align --num_threads 15 --workdir A10898_work_metaSV --outdir A10898_out_metaSV --min_ins_support 2 --max_ins_intervals 500000 --isize_mean 500 --isize_sd 150

Eventually I get this traceback:
Traceback (most recent call last):
File "/usr/local/bin/run_metasv.py", line 6, in <module>
exec(compile(open(file).read(), file, 'exec'))
File "/home/hpcuser01/metasv-0.3/scripts/run_metasv.py", line 108, in <module>
    stop_spades_on_fail=args.stop_spades_on_fail, gt_window=args.gt_window, gt_normal_frac=args.gt_normal_frac, isize_mean=args.isize_mean, isize_sd=args.isize_sd, extraction_max_read_pairs=args.extraction_max_read_pairs))
File "/home/hpcuser01/metasv-0.3/metasv/main.py", line 327, in run_metasv
min_support_frac=min_support_frac, max_intervals=max_intervals)
File "/home/hpcuser01/metasv-0.3/metasv/generate_sv_intervals.py", line 243, in parallel_generate_sc_intervals
"Merging %d features with %d features from %s" % (bedtool.count(), skip_bedtool.count(), skip_bed))
File "/usr/local/lib/python2.7/dist-packages/pybedtools-0.7.4-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 2261, in count
return sum(1 for _ in iter(self))
File "/usr/local/lib/python2.7/dist-packages/pybedtools-0.7.4-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 2261, in <genexpr>
return sum(1 for _ in iter(self))
File "pybedtools/cbedtools.pyx", line 772, in pybedtools.cbedtools.IntervalIterator.next (pybedtools/cbedtools.cxx:11001)
File "pybedtools/cbedtools.pyx", line 682, in pybedtools.cbedtools.create_interval_from_list (pybedtools/cbedtools.cxx:9798)
pybedtools.cbedtools.MalformedBedLineError: Start is greater than stop

Prior to this message, I was getting errors for each chromosomes:
Traceback (most recent call last):
File "/home/hpcuser01/metasv-0.3/metasv/generate_sv_intervals.py", line 148, in generate_sc_intervals
filtered_bed)
File "/usr/local/lib/python2.7/dist-packages/pybedtools-0.7.4-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 775, in decorated
result = method(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pybedtools-0.7.4-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 2912, in moveto
fn = self._collapse(self, fn=fn)
File "/usr/local/lib/python2.7/dist-packages/pybedtools-0.7.4-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 1215, in _collapse
for i in iterable:
File "pybedtools/cbedtools.pyx", line 731, in pybedtools.cbedtools.IntervalIterator.next (pybedtools/cbedtools.cxx:10588)
line = next(self.stream)
File "/usr/local/lib/python2.7/dist-packages/pybedtools-0.7.4-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 876, in _generator
result = func(f, *args, **kwargs)
File "/home/hpcuser01/metasv-0.3/metasv/generate_sv_intervals.py", line 90, in merged_interval_features
interval_readcount = bam_handle.count(reference=feature.chrom, start=feature.start, end=feature.end)
File "csamtools.pyx", line 1169, in pysam.csamtools.Samfile.count (pysam/csamtools.c:13478)
File "csamtools.pyx", line 989, in pysam.csamtools.Samfile._parseRegion (pysam/csamtools.c:11668)
File "csamtools.pyx", line 923, in pysam.csamtools.Samfile.gettid (pysam/csamtools.c:10827)
File "csamtools.pyx", line 57, in pysam.csamtools._force_bytes (pysam/csamtools.c:3393)
TypeError: Expected bytes, got unicode

I'm not sure what the issue is. Your help would be appreciated!

Thanks,
Madeline
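
The final `TypeError: Expected bytes, got unicode` is raised inside old pysam's csamtools layer, which (under Python 2) accepts only byte strings for contig names; a unicode chromosome name read from a file with unicode semantics trips it. Coercing names before they reach pysam works around it; a small sketch written to behave the same under Python 2 and 3:

```python
def force_contig_bytes(name):
    """Old pysam (csamtools) wants bytes, not unicode, for contig names."""
    return name if isinstance(name, bytes) else name.encode("ascii")

print(force_contig_bytes(u"22"))  # b'22' under Python 3
```

Upgrading pysam (newer releases coerce strings themselves) is usually the cleaner fix than patching call sites.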

Insertion detection

Hello,

When I try to run enhanced insertion detection, I get an error message. This is my command:

run_metasv.py --reference /mnt/data/GRCh37_bcgsc/GRCh37-lite.fa --boost_sc --sample A48018 --bam /mnt/data/A48018_2_lanes_dupsFlagged.bam --spades /home/hpcuser01/SPAdes-3.6.2-Linux/spades.py --age /home/hpcuser01/AGE/age_align --num_threads 50 --workdir A48018_work_boostins --outdir A48018_out_boostinst --max_ins_intervals 500000 --isize_mean 462 --isize_sd 119 --chromosomes X

And the error message:

ERROR 2016-06-24 11:59:42,265 run_spades_single-<Process(PoolWorker-51, started daemon)> Caught exception in worker thread
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/metasv/run_spades.py", line 75, in run_spades_single
retcode = cmd.run(cmd_log_fd_out=spades_log_fd, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/metasv/external_cmd.py", line 20, in run
self.p = subprocess.Popen(self.cmd, stderr=cmd_log_fd_err, stdout=cmd_log_fd_out)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

What might be the issue here?

Also, the http://bioinform.github.io/metasv/ website should be updated for enhanced insertion detection (e.g. boost_ins is no longer an option).

Thanks for your help,
Madeline

Merged calls (without assembly) have only homozygous ALT genotype and inheritance analysis.

Hello,

It seems to be a great tool for merging SV calls from multiple tools.
I did a quick run with the provided script "run_test.sh" as well as with our data from CNVnator, Lumpy and Manta. In both cases, the output file contains only the homozygous ALT genotype (i.e. 1/1).
How should the GT field be interpreted? Is there any flag that identifies whether a call is heterozygous or homozygous ALT?

Also, we in the lab are interested in analyzing de novo SVs; is there functionality that looks at genotypes across multiple samples? On that note, how does metaSV treat multi-sample VCF input? It does not give any error when provided a multi-sample VCF from Manta or Lumpy.

Thanks.

Best,
Nick

type 'exceptions.IOError': Broken pipe

Hi,
I'm running metaSV with cmd like:

run_metasv.py --reference /home/gst/work/ref/Zea_mays.AGPv4.dna.toplevel.fa --boost_sc --pindel_vcf Q417_pindel.vcf --breakdancer_vcf Q417_breakdancer.vcf --cnvnator_native Q417_cnvnator.out --manta_vcf Q417_manta.vcf --lumpy_vcf Q417_lumpy.vcf --wham_vcf Q417_whamg.vcf --mean_read_length 150 --sample Q417 --bam /home/gst/work/b73_BWAMEM_bam/Q417_bwa/Q417.bwamem.sort.bam --spades ~/gst/sftw/SPAdes-3.11.1-Linux/bin/spades.py --age ~/gst/sftw/anaconda2/bin/age_align --num_threads 8 --workdir work --outdir out --min_support_ins 4 --max_ins_intervals 500000 --isize_mean 350 --isize_sd 50

and it broke with this error:

INFO 2018-07-10 02:09:14,322 parallel_generate_sc_intervals-<_MainProcess(MainProcess, started)> Selecting the top 500000 intervals based on normalized read support
INFO 2018-07-10 02:12:48,947 parallel_generate_sc_intervals-<_MainProcess(MainProcess, started)> After merging with work/metasv.bed 124230 features
<type 'exceptions.IOError'>: Broken pipe
The command was:

    bedtools sort -i stdin

Things to check:
Traceback (most recent call last):
File "/home/gst/sftw/anaconda2/bin/run_metasv.py", line 143, in <module>
sys.exit(run_metasv(args))
File "/home/gst/sftw/anaconda2/lib/python2.7/site-packages/metasv/main.py", line 292, in run_metasv
other_scale=args.sc_other_scale)
File "/home/gst/sftw/anaconda2/lib/python2.7/site-packages/metasv/generate_sv_intervals.py", line 1165, in parallel_generate_sc_intervals
bedtool = bedtool.each(partial(fix_precise_coords)).sort().saveas(interval_bed)
File "/home/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/bedtool.py", line 668, in decorated
result = method(self, *args, **kwargs)
File "/home/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/bedtool.py", line 243, in wrapped
check_stderr=check_stderr)
File "/home/gst/sftw/anaconda2/lib/python2.7/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
print '\n\t' + '\n\t'.join(problems[err.errno])
KeyError: 32

What's the cause of this error?

Thank you!

best wishes,

songtao gui
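
The `KeyError: 32` is itself secondary: errno 32 is EPIPE, meaning the `bedtools sort -i stdin` child process exited before reading all of its input (commonly out of memory or a full temp directory when sorting a 124230-feature stream), and pybedtools' error reporter then crashed because it has no canned hint for that errno. Confirming the mapping:

```python
import errno

# pybedtools looked up problems[32] for a hint and found none;
# errno 32 is the broken-pipe error
print(errno.errorcode[32])  # EPIPE
```

So the thing to investigate is why `bedtools sort` died — free memory, disk space under /tmp (or pybedtools' TMPDIR), and the bedtools version on PATH — rather than the KeyError itself.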

Release date?

Hi, I enjoyed your talk at UKGS, where it was stated that the software would be released in "a couple of weeks". Please could you clarify when we can expect MetaSV to be released? I'm looking forward to trying it with my datasets.

Error while reading AGE output

Hi,

I have used version 0.4 and seen the error below. Is this fixed in version 0.5?

INFO 2016-02-14 17:11:46,150 run_age_single-<Process(PoolWorker-10, started daemon)> Writing the ref sequence for region chr1.1076464.1076874
INFO 2016-02-14 17:11:46,151 run_age_single-<Process(PoolWorker-10, started daemon)> Processing 13 contigs for region (chr1, 1076464, chr1, 1076874)
INFO 2016-02-14 17:11:46,151 run_age_single-<Process(PoolWorker-10, started daemon)> Writing the assembeled sequence chr1_1076464_1076874_INS_0_NODE_1_length_1122_cov_20.4015_ID_295 of length 1122
INFO 2016-02-14 17:11:46,157 run_age_single-<Process(PoolWorker-10, started daemon)> Running /BCBIOMETASV/MINICONDA/METASV27JAN/AGE_ALIGN with arguments ['-indel', '-both', '-go=-6', '/bcbiometasv/miniconda/metasv27jan/UP53input/age/chr1.1076464.1076874.ref.fa', '/bcbiometasv/miniconda/metasv27jan/UP53input/age/0df25e19515155dfddbed3a8c720a98a.as.fa']
INFO 2016-02-14 17:11:46,199 run_age_single-<Process(PoolWorker-9, started daemon)> Will process 589 intervals
INFO 2016-02-14 17:11:46,218 run_age_single-<Process(PoolWorker-9, started daemon)> Matching interval chr1 1180324 1180344 eyJOVU1fU1ZNRVRIT0RTIjogMSwgIk5VTV9TVlRPT0xTIjogMSwgIlNDX0NPVkVSQUdFIjogIjQxOCIsICJTT1VSQ0VTIjogImNocjEtMTE3OTgyNC1jaHIxLTExODA4NDQtMC1Tb2Z0Q2xpcCIsICJTQ19ORUlHSF9TVVBQT1JUIjogIjQxIiwgIlNDX1JFQURfU1VQUE9SVCI6ICIzIiwgIlNDX0NIUjJfU1RSIjogImNocjE7NDsxMTgwMDI4OzExODA0MjgsLTE7NDstMTs3MixjaHIyOzMyOzMzMTQxMjk5OzMzMTQxNjY4LGNocjE1OzE7ODgwNzA1MzQ7ODgwNzA2MDcifQ==,INS,0,SC 1 .

INFO 2016-02-14 17:11:46,229 run_age_single-<Process(PoolWorker-9, started daemon)> Writing the ref sequence for region chr1.1180324.1180344
INFO 2016-02-14 17:11:46,231 run_age_single-<Process(PoolWorker-9, started daemon)> Processing 3 contigs for region (chr1, 1180324, chr1, 1180344)
INFO 2016-02-14 17:11:46,231 run_age_single-<Process(PoolWorker-9, started daemon)> Writing the assembeled sequence chr1_1180324_1180344_INS_0_NODE_1_length_1095_cov_19.8838_ID_63 of length 1095
INFO 2016-02-14 17:11:46,234 run_age_single-<Process(PoolWorker-9, started daemon)> Running /BCBIOMETASV/MINICONDA/METASV27JAN/AGE_ALIGN with arguments ['-indel', '-both', '-go=-6', '/bcbiometasv/miniconda/metasv27jan/UP53input/age/chr1.1180324.1180344.ref.fa', '/bcbiometasv/miniconda/metasv27jan/UP53input/age/3e23e83081dd0bbc7bc0548bbb1e4534.as.fa']
INFO 2016-02-14 17:11:46,269 run_age_single-<Process(PoolWorker-10, started daemon)> Returned code 0 (0.0812719 seconds)
ERROR 2016-02-14 17:11:46,271 run_age_single-<Process(PoolWorker-10, started daemon)> Caught exception in worker thread
Traceback (most recent call last):
File "/bcbiometasv/miniconda/metasv27jan/metasv/run_age.py", line 146, in run_age_single
age_record = AgeRecord(out,tr_region_1=tr_region)
File "/bcbiometasv/miniconda/metasv27jan/metasv/age_parser.py", line 85, in __init__
INFO 2016-02-14 17:11:46,295 run_age_single-<Process(PoolWorker-9, started daemon)> Returned code 0 (0.04352 seconds)
ERROR 2016-02-14 17:11:46,296 run_age_single-<Process(PoolWorker-9, started daemon)> Caught exception in worker thread
Traceback (most recent call last):
File "/bcbiometasv/miniconda/metasv27jan/metasv/run_age.py", line 146, in run_age_single
age_record = AgeRecord(out,tr_region_1=tr_region)
File "/bcbiometasv/miniconda/metasv27jan/metasv/age_parser.py", line 85, in __init__
self.read_from_age_file(age_out_file)
File "/bcbiometasv/miniconda/metasv27jan/metasv/age_parser.py", line 186, in read_from_age_file
file2, len2 = self.parse_input_descriptor(age_fd)
File "/bcbiometasv/miniconda/metasv27jan/metasv/age_parser.py", line 126, in parse_input_descriptor
raise AgeFormatError("INPUT DESCRIPTOR", age_fd.line_num)
self.read_from_age_file(age_out_file)
File "/bcbiometasv/miniconda/metasv27jan/metasv/age_parser.py", line 186, in read_from_age_file
AgeFormatError: Error while reading AGE output, L13 (section INPUT DESCRIPTOR).
file2, len2 = self.parse_input_descriptor(age_fd)
File "/bcbiometasv/miniconda/metasv27jan/metasv/age_parser.py", line 126, in parse_input_descriptor
raise AgeFormatError("INPUT DESCRIPTOR", age_fd.line_num)
AgeFormatError: Error while reading AGE output, L13 (section INPUT DESCRIPTOR).
Exception in thread Thread-9:
Traceback (most recent call last):
File "/bcbiometasv/miniconda/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/bcbiometasv/miniconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/bcbiometasv/miniconda/lib/python2.7/multiprocessing/pool.py", line 389, in _handle_results
    task = get()
TypeError: ('__init__() takes exactly 3 arguments (1 given)', <class 'metasv.age_parser.AgeFormatError'>, ())

James

Lumpy translocations ignored by metaSV

It looks like metaSV ignores translocations in Lumpy VCF files. When I run metaSV, I get these messages (a short sample):

INFO 2016-10-28 11:30:02,622 metasv.main Load VCF files
INFO 2016-10-28 11:30:02,623 metasv.sv_interval Loading SV intervals from /mnt/data/SV_analysis/HSAN1-c3_bwamem.bam_lumpy.vcf
ERROR 2016-10-28 11:30:02,634 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=1, POS=29878039, REF=N, ALT=[N]1:31129380]])
ERROR 2016-10-28 11:30:02,634 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=1, POS=31129380, REF=N, ALT=[N]1:29878039]])
ERROR 2016-10-28 11:30:02,634 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=1, POS=16415316, REF=N, ALT=[N]1:16416094]])
ERROR 2016-10-28 11:30:02,634 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=1, POS=16416094, REF=N, ALT=[N]1:16415316]])
ERROR 2016-10-28 11:30:02,635 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=1, POS=6515317, REF=N, ALT=[[1:17019741[N])
ERROR 2016-10-28 11:30:02,635 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=1, POS=17019741, REF=N, ALT=[[1:6515317[N])

Lumpy doesn't annotate translocations as CTX or ITX; rather, the SVTYPE is replaced by the second breakend coordinate (the first is the entry for the start position of the translocation). Is it possible to add support for Lumpy translocations to metaSV?
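
Until native breakend support exists, the mate coordinate that Lumpy encodes in the ALT field (VCF breakend bracket notation, exactly the `N]1:31129380]` and `[1:17019741[N` forms in the log above) can be recovered with a small parser, which a pre-processing script could use to rewrite such records into a form metaSV accepts; a sketch, not metaSV code:

```python
import re

# VCF breakend ALT forms: t[p[, t]p], ]p]t, [p[t  where p is chrom:pos
_BND = re.compile(r"[\[\]](?P<chrom>[^\[\]:]+):(?P<pos>\d+)[\[\]]")

def mate_coordinate(alt):
    """Return (chrom, pos) of the mate breakend, or None for non-BND ALTs."""
    m = _BND.search(alt)
    return (m.group("chrom"), int(m.group("pos"))) if m else None

print(mate_coordinate("N]1:31129380]"))  # ('1', 31129380)
print(mate_coordinate("<DEL>"))          # None
```

Whether the rewritten records should become CTX (inter-chromosomal) or ITX (intra-chromosomal) would depend on comparing the record's CHROM with the parsed mate chromosome.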

VCF Output Description

Hello, can you please add a description of metasv's output explaining all the flags in the VCF? This will help with further filtering of the metasv results.

MemoryError

Hi @marghoob ,

I've got an error as following, could you advise me on a fix?

Traceback (most recent call last):
  File "/home/cbrcmod/scratch/modules/out/modulebin/metasv/0.3/bin/run_metasv.py", line 108, in <module>
    stop_spades_on_fail=args.stop_spades_on_fail, gt_window=args.gt_window, gt_normal_frac=args.gt_normal_frac, isize_mean=args.isize_mean, isize_sd=args.isize_sd, extraction_max_read_pairs=args.extraction_max_read_pairs))
  File "/home/cbrcmod/scratch/modules/out/modulebin/metasv/0.3/lib/python2.7/site-packages/metasv/main.py", line 295, in run_metasv
    pybedtools.BedTool(bed_intervals).saveas(merged_bed)
  File "/home/cbrcmod/scratch/modules/out/modulebin/metasv/0.3/lib/python2.7/site-packages/pybedtools/bedtool.py", line 390, in __init__
    fn = BedTool(iter(fn)).saveas().fn
  File "/home/cbrcmod/scratch/modules/out/modulebin/metasv/0.3/lib/python2.7/site-packages/pybedtools/bedtool.py", line 668, in decorated
    result = method(self, *args, **kwargs)
  File "/home/cbrcmod/scratch/modules/out/modulebin/metasv/0.3/lib/python2.7/site-packages/pybedtools/bedtool.py", line 2729, in saveas
    fn = self._collapse(self, fn=fn, trackline=trackline)
  File "/home/cbrcmod/scratch/modules/out/modulebin/metasv/0.3/lib/python2.7/site-packages/pybedtools/bedtool.py", line 1097, in _collapse
    for i in iterable:
  File "pybedtools/cbedtools.pyx", line 638, in pybedtools.cbedtools.IntervalIterator.__next__ (pybedtools/cbedtools.cpp:9096)
MemoryError

Best wishes,
Fengyuan

Request to specify one kmer for local SPADES assembly

Hello MetaSV Team,

I have used version 0.4 and seen 3 k-mer values used to refine each SV. Is it possible to have an option to use only one k-mer per SV?

I know k-mers need to be optimised, but users have prior knowledge of their reads, such as read length, so they could reasonably choose a single k-mer for MetaSV to use and save computation.

Any thoughts? Thank you for this very great program.

Regards,
James
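
run_metasv.py exposes a `--spades_options` pass-through for extra assembler flags, so a single k-mer can probably be requested today via SPAdes' own `-k` flag rather than a new MetaSV option; a small helper building the extra arguments (that the string is forwarded verbatim to SPAdes is an assumption based on the option name):

```python
def single_kmer_args(k):
    """Extra run_metasv.py arguments that force one SPAdes k-mer.

    SPAdes itself requires an odd k below the read length.
    """
    if k % 2 == 0:
        raise ValueError("SPAdes k-mers must be odd")
    return ["--spades_options", "-k %d" % k]

print(single_kmer_args(41))  # ['--spades_options', '-k 41']
```

Equivalently, appending `--spades_options "-k 41"` to the command line should have the same effect.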

metasv error when running local assembly

Hi Brad,

I found a problem with metaSV when running the local assembly - Spades.
It seems a similar error to: bcbio/bcbio-nextgen#1075, but I'm not sure about what is causing the problem.
Please find the error copied below.

Thanks,
amaia

INFO 2015-12-16 10:43:31,315 metasv.run_spades    5176 intervals selected
INFO 2015-12-16 10:43:31,315 metasv.run_spades    22 intervals ignored
INFO 2015-12-16 10:43:32,807 run_spades_single-<Process(PoolWorker-2, started daemon)> Processing interval 22   16054689        16054690        eyJOVU1fU1ZNRVRIT0RTIjogMSwgIkJEX1NDT1JFIjogMzUuMCwgIkJEX09SSTIiOiAiMTQrMTMtIiwgIkJEX1BPUzEiOiAxNjA1NDY4OCwgIkJEX1BPUzIiOiAxNjA1NDgyOSwgIkJEX09SSTEiOiAiMTQrMTMtIiwgIkJEX0NIUjEiOiAiMjIiLCAiU09VUkNFUyI6ICIyMi0xNjA1NDY4OS0xNjA1NDY4OS0xMjQtQnJlYWtEYW5jZXIiLCAiQkRfQ0hSMiI6ICIyMiIsICJOVU1fU1ZUT09MUyI6IDEsICJCRF9TVVBQT1JUSU5HX1JFQURfUEFJUlMiOiA0fQ==,INS,124,RP   1       .
INFO 2015-12-16 10:43:32,808 extract_read_pairs-<Process(PoolWorker-2, started daemon)> Extracting reads from /data/corpora/MPI_workspace/lag/workspaces/lg-ngs/working//bam_sort/DYS14587.final.sorted.bam for region 22:16054689-16054690 with padding 500 using functions ['all_pair', 'non_perfect']
ERROR 2015-12-16 10:43:32,909 run_spades_single-<Process(PoolWorker-2, started daemon)> Caught exception in worker thread
Traceback (most recent call last):
  File "/home/amacar/.local/lib/python2.7/site-packages/metasv/run_spades.py", line 68, in run_spades_single
    max_read_pairs=max_read_pairs, sv_type=sv_type)
  File "/home/amacar/.local/lib/python2.7/site-packages/metasv/extract_pairs.py", line 93, in extract_read_pairs
    aln_list = [aln for aln in bam.fetch(chr_name, start=chr_start, end=chr_end) if not aln.is_secondary]
  File "csamtools.pyx", line 1059, in pysam.csamtools.Samfile.fetch (pysam/csamtools.c:12490)
  File "csamtools.pyx", line 989, in pysam.csamtools.Samfile._parseRegion (pysam/csamtools.c:11668)
  File "csamtools.pyx", line 923, in pysam.csamtools.Samfile.gettid (pysam/csamtools.c:10827)
  File "csamtools.pyx", line 57, in pysam.csamtools._force_bytes (pysam/csamtools.c:3393)
TypeError: Expected bytes, got unicode
INFO 2015-12-16 10:43:33,066 metasv.run_spades    Merging the contigs from []
Traceback (most recent call last):
  File "/data/corpora/MPI_workspace/lag/workspaces/lg-ngs/working/programs/metasv-0.4/scripts/run_metasv.py", line 136, in <module>
    sys.exit(run_metasv(args))
  File "/home/amacar/.local/lib/python2.7/site-packages/metasv/main.py", line 307, in run_metasv
    assembly_max_tools=args.assembly_max_tools)
  File "/home/amacar/.local/lib/python2.7/site-packages/metasv/run_spades.py", line 204, in run_spades_parallel
    for line in fileinput.input(assembly_fastas):
  File "/usr/lib64/python2.7/fileinput.py", line 253, in next
    line = self.readline()
  File "/usr/lib64/python2.7/fileinput.py", line 346, in readline
    self._buffer = self._file.readlines(self._bufsize)

output vcf information

hi,

there is some information like "TAGS=sample name" in the output VCF file. Does this information tell me which samples have this variation?

thanks,

Kai

Pip installation error

I'm trying to set up metaSV on a shared HPC on ComputeCanada's Cedar and running into an error with the pip installation.

Following the installation instructions I download/load the system requirements first.

First load provided modules and setup Python env:

module load python/3.8
module load spades/3.13.1
module load samtools/0.1.20

virtualenv metaSV
source metaSV/bin/activate
pip install Cython # needs to be installed before the following 3 dependencies
pip install pysam
pip install pybedtools
pip install pyvcf

SPAdes was already available, but I needed to download and compile AGE with make OMP=no

Now I try to install metaSV with pip install https://github.com/bioinform/metasv/archive/0.5.2.tar.gz and get an error:

Ignoring pip: markers 'python_version < "3"' don't match your environment
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Collecting https://github.com/bioinform/metasv/archive/0.5.2.tar.gz
  Using cached https://github.com/bioinform/metasv/archive/0.5.2.tar.gz
    ERROR: Command errored out with exit status 1:
     command: /project/6013424/common/tools/CNV/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-pd08her8/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-pd08her8/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-req-build-pd08her8/pip-egg-info
         cwd: /tmp/pip-req-build-pd08her8/
    Complete output (6 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-pd08her8/setup.py", line 8
        print version
              ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print(version)?
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

My uname -a:

Linux cedar1.cedar.computecanada.ca 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 GNU/Linux
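For context, the failure is that metasv's setup.py is Python-2-only: print version is a valid statement in Python 2 but a SyntaxError under the python/3.8 module loaded above, which is exactly the error pip reports. A minimal reproduction (hypothetical fragment, not the real setup.py):

```python
# Python 2's print statement fails to compile under Python 3,
# which is what pip hits when it executes setup.py.
src = "print version"
try:
    compile(src, "<setup.py fragment>", "exec")
    ok = True
except SyntaxError as e:
    ok = False
    print("SyntaxError:", e.msg)
print("parsed:", ok)
```

So the package needs a Python 2 interpreter (or a ported setup.py) rather than the Python 3.8 virtualenv.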

custom age makefile fails with g++

The Makefile for the custom version of age_align has the -O3 option added. On my Debian systems, at least, this pretty well breaks age_align: it hangs most of the time, and even at -O it hangs on the test.sh script 90%+ of the time.

I tried compiling with gcc 4.6.3 and 4.9.2, with nearly the same results.

Removing the -O option gets it working.
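For anyone hitting the same hang, the workaround amounts to deleting the optimization flag before building. A sketch of the edit (the exact variable name in AGE's Makefile may differ; CXXFLAGS is an assumption):

```make
# Before (hangs intermittently on Debian, gcc 4.6.3 / 4.9.2):
#   CXXFLAGS = -O3 -fopenmp
# After (stable):
CXXFLAGS = -fopenmp
```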
