Comments (15)
Hi, thank you for the confirmation. I re-downloaded the file using the SRA Toolkit and now it works fine.
from wengan.
Hi,
It seems that the problem is with the fastmin-sg binary. What gcc/g++ version do you have? Are you using the pre-compiled binaries?
Thanks
Best,
Alex
Yes, I use the pre-compiled binaries.
The gcc/g++ version I'm using is 9.3.0-17 on Ubuntu 20.04.
OK, can you recompile just the fastmin-sg binary? And can you share the data so I can reproduce the problem?
Someone else DMed me about this problem, but they were not able to share their data.
The weird thing is that the core dump happens at the end of the fastmin-sg run, so it might be related to the data itself (a corrupt file).
Best,
Alex
Hi @adigenova, I tried the following:
- recompiled fastmin-sg
- re-downloaded the long and short reads, this time using a single file each instead of files concatenated with cat
but the error still persists:
wengan.pl -x pacraw \
  -a M \
  -s /mnt/Data/VGP_Assembly/NA12878/FASTQ_Illumina/U0a_CGATGT_L001_R1_001.fastq.gz,/mnt/Data/VGP_Assembly/NA12878/FASTQ_Illumina/U0a_CGATGT_L001_R2_001.fastq.gz \
  -l /mnt/Data/VGP_Assembly/NA12878/FASTQ_PcBio/HG001.m64013_190412_043951.consensusreads.fastq.gz \
  -p asm_wengan \
  -t 14 \
  -g 3000
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/minia -in asm_wengan.minia_reads.41.txt -kmer-size 41 -abundance-min 2 -out asm_wengan.minia.41 -minimizer-size 10 -max-memory 5000 -nb-cores 14 2> asm_wengan.minia.41.err > asm_wengan.minia.41.log
rm -f asm_wengan.minia.41.unitigs.fa.glue* asm_wengan.minia.41.h5 asm_wengan.minia.41.unitigs.fa
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/minia -in asm_wengan.minia_reads.81.txt -kmer-size 81 -abundance-min 2 -out asm_wengan.minia.81 -minimizer-size 10 -max-memory 5000 -nb-cores 14 2> asm_wengan.minia.81.err > asm_wengan.minia.81.log
rm -f asm_wengan.minia.81.unitigs.fa.glue* asm_wengan.minia.81.h5 asm_wengan.minia.81.unitigs.fa
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/minia -in asm_wengan.minia_reads.121.txt -kmer-size 121 -abundance-min 2 -out asm_wengan.minia.121 -minimizer-size 10 -max-memory 5000 -nb-cores 14 2> asm_wengan.minia.121.err > asm_wengan.minia.121.log
rm -f asm_wengan.minia.121.unitigs.fa.glue* asm_wengan.minia.121.h5 asm_wengan.minia.121.unitigs.fa
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/seqtk seq -L 200 asm_wengan.minia.121.contigs.fa | /home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/seqtk seq -l 60 - > asm_wengan.minia.contigs.fa
grep ">" asm_wengan.minia.contigs.fa | sed 's/km:f://' | awk '{print $1" "$4}' | sed 's/>//g' > asm_wengan.minia.contigs.cov.txt
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/fastmin-sg shortr -c 50 -k 21 -w 10 -q 20 -r 50000 -t 14 asm_wengan.minia.contigs.fa asm_wengan.fms.txt 2>asm_wengan.fms.err >asm_wengan.fms.log
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/intervalmiss -d 7 --fst 0.1 -b asm_wengan.minia.contigs.cov.txt -t 14 -s asm_wengan.fms.sams.txt -c asm_wengan.minia.contigs.fa -p asm_wengan 2>asm_wengan.im.err >asm_wengan.im.log
grep ">" asm_wengan.MBC7.msplit.fa | sed 's/>//' | awk '{print $1" "$2}' | sed 's/>//g' > asm_wengan.MBC7.msplit.cov.txt
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/fastmin-sg pacraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 14 -p asm_wengan -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000 asm_wengan.MBC7.msplit.fa asm_wengan.fml.txt 2>asm_wengan.fml.err >asm_wengan.fml.log
make: *** [asm_wengan.mk:58: longreads.asm_wengan1.fa] Error 139
make: *** Deleting file 'longreads.asm_wengan1.fa'
Hi,
Can you post the links to the public data you are using?
That way I can reproduce the error and write a patch.
Thanks in advance.
Alex
Hi, I downloaded the long reads from here: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP194450
and the short reads from here: https://github.com/genome-in-a-bottle/giab_data_indexes/blob/master/NA12878/sequence.index.NA12878_Illumina300X_wgs_09252015 (there are many files; you can use the first 3-4 FASTQ files for each of read 1 and read 2, which should be enough for ~15X)
Great, thanks!
I'll take a look and let you know.
Best
Alex
Hello,
I encountered the same issue, at the same step (fastmin-sg):
/bin/sh: line 1: 50252 Segmentation fault (core dumped) /gs7k1/binaries/wengan/0.2/bin/fastmin-sg ontraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 20 -p /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565 -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.MBC7.msplit.fa /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.txt 2> /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.err > /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.log
make: *** [longreads./homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A5651.fa] Error 139
Since the SAM files and IntervalMiss files are already present in the output folder, I think the crash happens in the fastmin-sg step that maps the long reads.
Since this thread started, have you found anything I could do to fix this?
Best,
Cecile
Hi again,
I tried adding "ulimit -s unlimited" to my script before running wengan.pl, as suggested in issue #19; however, I still get the same error 139.
/gs7k1/binaries/wengan/0.2/bin/fastmin-sg ontraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 20 -p /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565 -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.MBC7.msplit.fa /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.txt 2>/homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.err >/homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.log
/bin/sh: line 1: 475427 Segmentation fault (core dumped) /gs7k1/binaries/wengan/0.2/bin/fastmin-sg ontraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 20 -p /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565 -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.MBC7.msplit.fa /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.txt 2> /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.err > /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.log
make: *** [longreads./homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A5651.fa] Error 139
I'm running it on an SGE cluster using the bigmem.q queue, and I wonder whether I'm doing something wrong. My script is below:
#!/bin/bash
#$ -q bigmem.q
#$ -N Wengan_Koka-A565
# JOB BEGIN
module load bioinfo/wengan/0.2
ulimit -s unlimited
wengan.pl -x ontraw -a M -s /<path_to>/HNDT27.KkM.R1.fastq.gz,/path_to/HNDT27.KkM.R2.fastq.gz -l /<path_to>/A565_31052020.fastq.gz -p /<path_to>/Koka_A565 -t 20 -g 1100
# JOB END
If you have any idea what is causing error 139, it would be greatly appreciated!
Best,
Cécile
Hi both,
I looked into this issue and was able to find where the segfault happens:
#0 0x00007f0c8f33d6c3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000055915fc84822 in ks_getuntil2.part.0.constprop ()
#2 0x000055915fc84db4 in kseq_read ()
#3 0x000055915fc85247 in maplongreads ()
#4 0x00007f0c8f3aa609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007f0c8f2d1293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
It happens while reading the long-read sequences, and it crashes when reaching the end of the file. I checked the file HG001.m64011_190329_072846.consensusreads.fastq.gz that I was using to reproduce the behavior, and it is corrupted: it gives the "unexpected end of file" message when uncompressing it or checking it with gzip -v -t:
#while uncompressing the file
zcat HG001.m64011_190329_072846.consensusreads.fastq.gz > HG001.m64011_190329_072846.consensusreads.fastq
**gzip: HG001.m64011_190329_072846.consensusreads.fastq.gz: unexpected end of file**
#while checking the integrity of the gzip file
gzip -v -t HG001.m64011_190329_072846.consensusreads.fastq.gz
HG001.m64011_190329_072846.consensusreads.fastq.gz:
**gzip: HG001.m64011_190329_072846.consensusreads.fastq.gz: unexpected end of file**
To check this further, I uncompressed the file with zcat, picked the first 1000 sequences, compressed them again with gzip, and re-ran fastmin-sg:
zcat HG001.m64011_190329_072846.consensusreads.fastq.gz | head -n 4000 | gzip > test.fastq.gz
fastmin-sg finished without crashing. Thus, it seems that this is not an issue with the fastmin-sg code but rather with corrupted gzip or FASTQ files.
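The gzip -v -t check above covers one file. When a run uses several read files, a small loop over all of them makes it easy to spot the broken one. This is a minimal sketch, assuming bash and GNU gzip; the function name check_gz_files is hypothetical and not part of Wengan. Note that gzip -t only verifies the compression layer (stream structure, CRC and length), so a file can pass it and still contain malformed FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Wengan): check_gz_files FILE.gz [FILE.gz ...]
# Runs "gzip -t" on every file, prints OK or CORRUPT per file,
# and returns non-zero if at least one file fails the test.
check_gz_files() {
  local f status=0
  for f in "$@"; do
    # gzip -t decompresses in memory and checks the trailer (CRC and length).
    if gzip -t -- "$f" 2>/dev/null; then
      printf 'OK\t%s\n' "$f"
    else
      printf 'CORRUPT\t%s\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```

For example, check_gz_files /path/to/reads/*.fastq.gz (a placeholder path) prints one line per file; any CORRUPT file should be re-downloaded before re-running wengan.pl.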
- Thanks for providing the test dataset.
- If you want 15X short-read coverage, you have to include more data from GIAB. The files U0a_CGATGT_L001_R1_001.fastq.gz and U0a_CGATGT_L001_R2_001.fastq.gz contain only about 4 million read pairs, which represent about 0.4X genome coverage (4000000*150*2/3000000000).
- The recommended short-read genome coverage is about 50X. Lower coverage results in more fragmented assemblies. I recommend setting the following parameters (-M 1000 -d 2) to deal with lower short-read coverage.
- I recommend checking the integrity of your compressed long-read files.
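The coverage arithmetic in the first bullet can be scripted. The sketch below is a rough estimate under stated assumptions: paired-end reads of a fixed length, the ~3 Gb genome size used in the formula above, and a 4-line FASTQ layout when counting bases directly from files. The function names are hypothetical helpers, not Wengan options.

```shell
#!/usr/bin/env bash
# Rough short-read depth arithmetic: depth = total bases / genome size.
# Assumes paired-end reads of a fixed length (e.g. 2 x 150 bp).

# depth_from_pairs PAIRS READ_LEN GENOME_SIZE -> depth with two decimals
depth_from_pairs() {
  awk -v p="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.2f\n", p * l * 2 / g }'
}

# pairs_for_depth TARGET_DEPTH READ_LEN GENOME_SIZE -> read pairs needed, rounded up
pairs_for_depth() {
  awk -v d="$1" -v l="$2" -v g="$3" 'BEGIN {
    x = d * g / (l * 2)
    n = int(x)
    if (n < x) n++
    printf "%.0f\n", n
  }'
}

# count_bases FILE.fastq[.gz] [...] -> total sequence bases (4-line FASTQ assumed)
count_bases() {
  # -f lets gzip pass uncompressed input through unchanged when combined with -c.
  gzip -cdf -- "$@" | awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }'
}

# estimate_depth GENOME_SIZE FILE.fastq[.gz] [...] -> depth measured from the files
estimate_depth() {
  local genome_size=$1
  shift
  local bases
  bases=$(count_bases "$@")
  awk -v b="$bases" -v g="$genome_size" 'BEGIN { printf "%.2f\n", b / g }'
}
```

For example, depth_from_pairs 4000000 150 3000000000 prints 0.40, matching the ~0.4X estimate, and pairs_for_depth 15 150 3000000000 prints 150000000, i.e. roughly 150 million 2 x 150 bp pairs for 15X (500 million for the recommended 50X). To measure files directly, pass both mates, e.g. estimate_depth 3000000000 U0a_CGATGT_L001_R1_001.fastq.gz U0a_CGATGT_L001_R2_001.fastq.gz; this decompresses the whole files, so it can be slow on large inputs.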
Best,
Alex
Hello Alex,
Thank you for coming back to us! I'm glad things are working for @NTNguyen13, but unfortunately the problem is not solved for me.
I checked the integrity of my fastq.gz files using gzip -v -t, and they appear to be OK!
I then tried decompressing and recompressing the files; the problem is still exactly the same.
Finally, I tried a subset of my long-read fastq.gz file containing only the first 1000 sequences. The issue is still there: the same error 139 as before.
The files have not been modified; they came straight from the sequencing facility.
Do you have any other advice, or anything I could triple-check?
Best,
Cecile
Hi Cecile,
One thing I did was to map the reads with minimap2 to the contigs used as input for fastmin-sg. Minimap2 can handle some corrupt FASTQ files, and it reports in its error log where the problem might be (which sequence). Can you do the same exercise? Also, would it be possible for you to share the data to reproduce the issue? I mean just the input to fastmin-sg. Moreover, can you post the log that fastmin-sg generates?
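The minimap2 approach above locates a problematic record indirectly. As a lighter-weight alternative (a different technique, not what minimap2 does internally), the sketch below walks a FASTQ file and reports the first record that breaks the basic structure, including a record cut off at the end of the file, which is where the crash was observed. It assumes bash, GNU gzip, awk, and the common 4-line FASTQ layout; FASTQ with wrapped sequence lines, which kseq can parse, would be reported as malformed. validate_fastq is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Wengan or minimap2): validate_fastq FILE.fastq[.gz]
# Assumes the 4-line FASTQ layout; wrapped sequence/quality lines are flagged.
# Prints "<N> records OK" on success, otherwise the first problem found, and returns 1.
validate_fastq() {
  # -f lets gzip pass uncompressed input through unchanged when combined with -c.
  gzip -cdf -- "$1" | awk '
    function bad(msg) {
      printf "record %d (line %d): %s\n", int((NR - 1) / 4) + 1, NR, msg
      failed = 1
    }
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad("header does not start with @"); exit 1 }
    NR % 4 == 2 { seq_len = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad("separator line does not start with +"); exit 1 }
    NR % 4 == 0 && length($0) != seq_len { bad("quality length differs from sequence length"); exit 1 }
    END {
      if (failed) exit 1
      if (NR % 4 != 0) {
        printf "record %d: file ends in the middle of a record\n", int(NR / 4) + 1
        exit 1
      }
      printf "%d records OK\n", NR / 4
    }'
  # Capture both exit codes immediately after the pipeline.
  local -a st=("${PIPESTATUS[@]}")
  if [ "${st[1]}" -ne 0 ]; then
    return 1
  fi
  if [ "${st[0]}" -ne 0 ]; then
    printf 'gzip could not fully decompress %s\n' "$1"
    return 1
  fi
  return 0
}
```

Running validate_fastq on the long-read file passed to wengan.pl -l prints either the record count or the first offending record and line. For the minimap2 route, an invocation such as minimap2 -x map-ont contigs.fa reads.fastq.gz > reads.paf 2> minimap2.err (with the preset matching your data) keeps any parsing messages in minimap2.err.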
Best,
Alex
Alex,
Thank you for your quick response!
I'm not sure I can share the data yet, but I may be able to share a subset soon. I have asked my collaborators for permission.
Meanwhile, I'll try the same exercise you did. I'll keep you posted if I succeed!
The *.fml.log file is empty after the error 139.
The *.fml.err file contains the following information:
LOG: Mapping mode =ontraw H=0 k=20 w=5 L=2000 l=250 q=40 m=150 c=65 r=300 t=20 o=/Path_to_directory/Koka_A565 I=500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 s=1
Building contig index
[M::mm_idx_gen::1606476568.013*0.00] collected minimizers
[M::mm_idx_gen::1606476570.172*0.00] sorted minimizers
[M::mm_mapopt_update::1606476572.161*0.00] mid_occ = 15
[M::mm_idx_stat] kmer size: 20; skip: 5; is_hpc: 0; #seq: 409021
[M::mm_idx_stat::1606476574.210*0.00] distinct minimizers: 141411081 (98.66% are singletons); average occurrences: 1.029; average spacing: 3.056
Index construction time: 57.330000 seconds for 409021 target sequence(s)
I hope this is what you were asking for.
Best,
Cécile
Hi Cécile,
Yes, I just wanted to check that the input assembly for fastmin-sg has sequences, and it indeed has 409k contigs.
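The same sanity check can be done without the fastmin-sg log by counting FASTA headers in the assembly passed to fastmin-sg (the *.MBC7.msplit.fa file in the commands above). A minimal sketch, assuming bash and one '>' header line per record; count_fasta_records is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_fasta_records FILE.fa
# Prints the number of FASTA header lines ('>' at the start of a line) and
# returns non-zero if the file is unreadable or contains no records.
count_fasta_records() {
  if [ ! -r "$1" ]; then
    printf 'cannot read %s\n' "$1" >&2
    return 2
  fi
  local n
  # grep -c prints 0 (and exits 1) when nothing matches; the printed count is what we use.
  n=$(grep -c '^>' -- "$1")
  printf '%s\n' "$n"
  [ "$n" -gt 0 ]
}
```

For example, count_fasta_records Koka_A565.MBC7.msplit.fa should agree with the #seq value in the [M::mm_idx_stat] line of the .fml.err log (409021 here).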
Let me know if you can share a subset of your data, ideally one that reproduces the error.
Best,
Alex