

NTNguyen13 commented on September 12, 2024

Hi, thank you for the confirmation. I re-downloaded the file using the SRA Toolkit and now it's working fine.

from wengan.

adigenova commented on September 12, 2024

Hi,

It seems that the problem is with the fastmin-sg binary. What gcc/g++ version do you have? Are you using the pre-compiled binaries?
Thanks

Best,
Alex


NTNguyen13 commented on September 12, 2024

Yes, I use the pre-compiled binaries.

The gcc/g++ version I'm using is 9.3.0-17 on Ubuntu 20.04.


adigenova commented on September 12, 2024

OK, can you recompile just the fastmin-sg binary? And can you share the data to reproduce the problem?
Someone else DMed me about this problem, but it was not possible for them to share their data.
The weird thing is that the core dump happens at the end of the fastmin-sg run, so it might be related to the data itself (a corrupt file).

Best,
Alex


NTNguyen13 commented on September 12, 2024

Hi @adigenova, I tried the following:

  • recompiled fastmin-sg
  • re-downloaded the long and short reads, this time using a single file for each instead of the concatenated (cat) files

but the error still persists:

wengan.pl -x pacraw \
    -a M \
    -s /mnt/Data/VGP_Assembly/NA12878/FASTQ_Illumina/U0a_CGATGT_L001_R1_001.fastq.gz,/mnt/Data/VGP_Assembly/NA12878/FASTQ_Illumina/U0a_CGATGT_L001_R2_001.fastq.gz \
    -l /mnt/Data/VGP_Assembly/NA12878/FASTQ_PcBio/HG001.m64013_190412_043951.consensusreads.fastq.gz \
    -p asm_wengan \
    -t 14 \
    -g 3000

/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/minia -in asm_wengan.minia_reads.41.txt -kmer-size 41 -abundance-min 2 -out asm_wengan.minia.41 -minimizer-size 10 -max-memory 5000 -nb-cores 14 2> asm_wengan.minia.41.err > asm_wengan.minia.41.log
rm -f asm_wengan.minia.41.unitigs.fa.glue* asm_wengan.minia.41.h5 asm_wengan.minia.41.unitigs.fa
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/minia -in asm_wengan.minia_reads.81.txt -kmer-size 81 -abundance-min 2 -out asm_wengan.minia.81 -minimizer-size 10 -max-memory 5000 -nb-cores 14 2> asm_wengan.minia.81.err > asm_wengan.minia.81.log
rm -f asm_wengan.minia.81.unitigs.fa.glue* asm_wengan.minia.81.h5 asm_wengan.minia.81.unitigs.fa
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/minia -in asm_wengan.minia_reads.121.txt -kmer-size 121 -abundance-min 2 -out asm_wengan.minia.121 -minimizer-size 10 -max-memory 5000 -nb-cores 14 2> asm_wengan.minia.121.err > asm_wengan.minia.121.log
rm -f asm_wengan.minia.121.unitigs.fa.glue* asm_wengan.minia.121.h5 asm_wengan.minia.121.unitigs.fa
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/seqtk seq -L 200 asm_wengan.minia.121.contigs.fa  | /home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/seqtk seq -l 60 - > asm_wengan.minia.contigs.fa
grep ">" asm_wengan.minia.contigs.fa | sed 's/km:f://' | awk '{print $1" "$4}' | sed 's/>//g' > asm_wengan.minia.contigs.cov.txt
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/fastmin-sg shortr -c 50 -k 21 -w 10 -q 20 -r 50000 -t 14 asm_wengan.minia.contigs.fa asm_wengan.fms.txt 2>asm_wengan.fms.err >asm_wengan.fms.log
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/intervalmiss -d 7 --fst 0.1 -b asm_wengan.minia.contigs.cov.txt -t 14 -s asm_wengan.fms.sams.txt -c  asm_wengan.minia.contigs.fa -p  asm_wengan 2>asm_wengan.im.err >asm_wengan.im.log
grep ">" asm_wengan.MBC7.msplit.fa | sed 's/>//' | awk '{print $1" "$2}' | sed 's/>//g' > asm_wengan.MBC7.msplit.cov.txt
/home/nguyen/Exec/wengan-v0.2-bin-Linux/bin/fastmin-sg pacraw -k 20 -w 5 -q 40 -m 150 -r 300 -t  14 -p asm_wengan -I  500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000 asm_wengan.MBC7.msplit.fa asm_wengan.fml.txt 2>asm_wengan.fml.err >asm_wengan.fml.log
make: *** [asm_wengan.mk:58: longreads.asm_wengan1.fa] Error 139
make: *** Deleting file 'longreads.asm_wengan1.fa'


adigenova commented on September 12, 2024

Hi,

Can you post the links to the public data that you are using?
That way I can reproduce the error and code a patch.
Thanks in advance.
Alex


NTNguyen13 commented on September 12, 2024

Hi, I downloaded the long reads from here: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP194450
and the short reads from here: https://github.com/genome-in-a-bottle/giab_data_indexes/blob/master/NA12878/sequence.index.NA12878_Illumina300X_wgs_09252015 (there are many files; using the first 3-4 FASTQ files for each of read 1 and read 2 should be enough for ~15X)


adigenova commented on September 12, 2024

Great, thanks!
I'll take a look and let you know.
Best
Alex


CTriay commented on September 12, 2024

Hello,

I encountered the same issue, at the same step (fastmin-sg):

/bin/sh: line 1: 50252 Segmentation fault      (core dumped) /gs7k1/binaries/wengan/0.2/bin/fastmin-sg ontraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 20 -p /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565 -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.MBC7.msplit.fa /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.txt 2> /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.err > /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.log
make: *** [longreads./homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A5651.fa] Error 139
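The `Error 139` that make prints is a shell exit status rather than a Wengan-specific code: POSIX shells report a child killed by signal N as 128 + N, and 139 - 128 = 11, which is SIGSEGV on Linux, matching the "Segmentation fault (core dumped)" line. A minimal Python sketch of that decoding, assuming the usual 128 + N shell convention; `decode_shell_status` is a hypothetical helper, not part of Wengan:

```python
import signal


def decode_shell_status(status: int) -> str:
    """Interpret a shell/make exit status (hypothetical helper).

    Assumes the POSIX shell convention that a child killed by
    signal N is reported with exit status 128 + N.
    """
    if status > 128:
        sig = status - 128
        try:
            return f"killed by {signal.Signals(sig).name} (signal {sig})"
        except ValueError:
            # Number does not map to a signal known on this platform.
            return f"killed by unknown signal {sig}"
    return f"exited normally with status {status}"


print(decode_shell_status(139))  # killed by SIGSEGV (signal 11)
```

If a binary is launched directly through Python's `subprocess` rather than via a shell, `returncode` is already negative (-11 for a segfault), so this decoding is only needed for statuses relayed by a shell or make.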

As the SAM files and IntervalMiss files are already present in the output folder, I think the failure occurs in the fastmin-sg step that maps the long reads.
Has anything turned up since the thread started that I could try to fix this?

Best,

Cecile


CTriay commented on September 12, 2024

Hi again,

I tried adding "ulimit -s unlimited" to my script before running wengan.pl, as suggested in issue #19; however, I still get the same error 139.

/gs7k1/binaries/wengan/0.2/bin/fastmin-sg ontraw -k 20 -w 5 -q 40 -m 150 -r 300 -t  20 -p /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565 -I  500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.MBC7.msplit.fa /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.txt 2>/homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.err >/homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.log
/bin/sh: line 1: 475427 Segmentation fault      (core dumped) /gs7k1/binaries/wengan/0.2/bin/fastmin-sg ontraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 20 -p /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565 -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.MBC7.msplit.fa /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.txt 2> /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.err > /homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A565.fml.log
make: *** [longreads./homedir/triay/work/7.LONG_READS_MinION_Kp-Kk-Ho/7.5.DENOVO_WENGAN/Koka_A5651.fa] Error 139

I'm running it on an SGE cluster using bigmem.q and wonder whether I'm doing something wrong. I've pasted my script below:

#!/bin/bash
#$ -q bigmem.q
#$ -N Wengan_Koka-A565

# JOB BEGIN

module load bioinfo/wengan/0.2

ulimit -s unlimited
wengan.pl -x ontraw -a M -s /<path_to>/HNDT27.KkM.R1.fastq.gz,/path_to/HNDT27.KkM.R2.fastq.gz -l /<path_to>/A565_31052020.fastq.gz -p /<path_to>/Koka_A565 -t 20 -g 1100

# JOB END

If you have any idea what is causing error 139, it would be greatly appreciated!

Best,

Cécile


adigenova commented on September 12, 2024

Hi both,

I took a look at this issue and I was able to find where the seg fault happens:

#0  0x00007f0c8f33d6c3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000055915fc84822 in ks_getuntil2.part.0.constprop ()
#2  0x000055915fc84db4 in kseq_read ()
#3  0x000055915fc85247 in maplongreads ()
#4  0x00007f0c8f3aa609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f0c8f2d1293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

It happens while reading the long-read sequences: the crash occurs when reaching the end of the file. I checked the file HG001.m64011_190329_072846.consensusreads.fastq.gz that I used to reproduce the behavior, and it is corrupted: it gives an "unexpected end of file" message when decompressing or testing it (gzip -v -t):

# while decompressing the file
zcat HG001.m64011_190329_072846.consensusreads.fastq.gz > HG001.m64011_190329_072846.consensusreads.fastq
gzip: HG001.m64011_190329_072846.consensusreads.fastq.gz: unexpected end of file
# while checking the integrity of the gzip file
gzip -v -t HG001.m64011_190329_072846.consensusreads.fastq.gz
HG001.m64011_190329_072846.consensusreads.fastq.gz:
gzip: HG001.m64011_190329_072846.consensusreads.fastq.gz: unexpected end of file
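The same integrity test can be scripted, for example to screen many read files before launching a long assembly job. A minimal Python sketch, with `gzip_is_intact` as a hypothetical helper: like `gzip -t`, it decompresses the whole stream and treats an early end of stream (`EOFError`) or a corrupt member (`OSError`, which covers `gzip.BadGzipFile`, or `zlib.error`) as a failure.

```python
import gzip
import zlib


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file, as `gzip -t` does, and report success.

    Hypothetical helper: a truncated archive raises EOFError, a bad
    header or CRC raises OSError, corrupt deflate data raises zlib.error.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read in chunks so large FASTQ files never sit fully in memory.
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error) as exc:
        print(f"{path}: {exc}")
        return False
    return True
```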

To check this further, I decompressed the file with zcat, took the first 1000 sequences, recompressed them with gzip, and reran fastmin-sg:

 zcat HG001.m64011_190329_072846.consensusreads.fastq.gz  | head -n 4000 | gzip >  test.fastq.gz

fastmin-sg finished without crashing. So this does not seem to be an issue with the fastmin-sg code, but rather with corrupted gzip or FASTQ files.
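The `zcat | head -n 4000 | gzip` pipeline keeps 1000 records because each FASTQ record spans four lines. A Python sketch of the same subsetting, assuming unwrapped four-line records; `subset_fastq_gz` is a hypothetical helper:

```python
import gzip
from itertools import islice


def subset_fastq_gz(src: str, dst: str, n_records: int = 1000) -> int:
    """Copy the first n_records FASTQ records from src to dst (both gzipped).

    Hypothetical helper mirroring `zcat src | head -n 4000 | gzip > dst`
    for n_records=1000. Assumes 4 lines per record (no wrapped sequences).
    Returns the number of complete records written.
    """
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        lines = list(islice(fin, 4 * n_records))
        fout.writelines(lines)
    return len(lines) // 4
```

Because only the first records are read, the damaged tail of a truncated archive is never reached, which is presumably why the subset ran cleanly.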

@NTNguyen13

  1. Thanks for providing the test dataset.
  2. If you want 15X short-read coverage, you have to include more data from GIAB. The files U0a_CGATGT_L001_R1_001.fastq.gz and U0a_CGATGT_L001_R2_001.fastq.gz contain only about 4 million reads each, which represents about 0.4X genome coverage (4000000*150*2/3000000000).
  3. The recommended short-read genome coverage is about 50X; lower coverage results in more fragmented assemblies. To deal with lower short-read coverage, I recommend setting the following parameters: -M 1000 -d 2.
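The coverage estimate in item 2 is total sequenced bases divided by genome size: reads per file × read length × 2 mates / 3 Gb. A small Python sketch of that arithmetic, also inverted to estimate how many read pairs a target depth needs; it assumes 150 bp paired-end reads and a 3 Gb genome as in the estimate above, and both function names are hypothetical:

```python
import math


def short_read_coverage(read_pairs: int, read_len: int, genome_size: int) -> float:
    """Approximate depth = total sequenced bases / genome size (paired-end)."""
    return read_pairs * read_len * 2 / genome_size


def read_pairs_needed(target_cov: float, read_len: int, genome_size: int) -> int:
    """Read pairs required to reach target_cov, rounded up."""
    return math.ceil(target_cov * genome_size / (2 * read_len))


print(short_read_coverage(4_000_000, 150, 3_000_000_000))  # 0.4
print(read_pairs_needed(15, 150, 3_000_000_000))  # 150000000
print(read_pairs_needed(50, 150, 3_000_000_000))  # 500000000
```

Under these assumptions, the recommended ~50X corresponds to roughly 500 million read pairs, far more than the 3-4 files per mate mentioned earlier.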

@CTriay

  1. I recommend checking the integrity of your compressed long-read files.

Best,
Alex


CTriay commented on September 12, 2024

Hello Alex,

Thank you for getting back to us! I'm glad things are working for @NTNguyen13, but unfortunately the problem is not solved for me.

I checked the integrity of my fastq.gz files using gzip -v -t and they appear to be OK!
I then tried decompressing and recompressing the files; the problem is still exactly the same.
Finally, I tried with a subset of my fastq.gz long-read file, taking only the first 1000 sequences. The issue is still there, with the same error 139 as before.
The files have not been modified; they came straight from the sequencing facility...

Do you have any other advice, or anything I could triple-check?

Best,

Cecile


adigenova commented on September 12, 2024

Hi Cecile,

One thing I did was map the reads with minimap2 to the contigs used as input for fastmin-sg. Minimap2 can handle some corrupt FASTQ files, and it reports in its error log where the problem might be (which sequence). Can you do the same exercise? Is it possible for you to share the data to reproduce the issue? I mean just the input to fastmin-sg. Also, can you post the log that fastmin-sg generates?
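As a lighter-weight complement to the minimap2 check, a FASTQ file can be scanned for the first structurally broken record. A minimal Python sketch, assuming unwrapped four-line records (common for instrument output but not guaranteed by the format); `first_fastq_problem` is a hypothetical helper and does not reproduce minimap2's own parser:

```python
import gzip
from typing import Optional


def first_fastq_problem(path: str) -> Optional[str]:
    """Describe the first malformed record, or return None if none is found.

    Hypothetical helper. Assumes unwrapped 4-line FASTQ records:
    '@name', sequence, '+[name]', quality string of the same length.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        record = 0
        while True:
            header = fh.readline()
            if not header:
                return None  # reached end of file cleanly
            record += 1
            name = header.rstrip("\r\n")[:40]
            seq = fh.readline().rstrip("\r\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\r\n")
            if not header.startswith("@"):
                return f"record {record}: header does not start with '@': {name!r}"
            if not plus.startswith("+"):
                return f"record {record} ({name}): missing '+' separator line"
            if len(seq) != len(qual):
                return (f"record {record} ({name}): sequence length {len(seq)} "
                        f"!= quality length {len(qual)}")
```

A truncated gzip stream surfaces as an `EOFError` from Python's gzip module rather than as a reported record, so the two failure modes stay distinguishable.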

Best,
Alex


CTriay commented on September 12, 2024

Alex,

Thank you for your quick response!

I'm not sure I can share the data yet, but I may be able to share a subset soon; I have asked my collaborators for permission.
Meanwhile, I'll try the same exercise you did and keep you posted!

The *.fml.log file is empty after the error 139.
The *.fml.err file contains the following information:

LOG: Mapping mode =ontraw H=0 k=20 w=5 L=2000 l=250 q=40 m=150 c=65 r=300 t=20 o=/Path_to_directory/Koka_A565 I=500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000,30000,40000,50000 s=1
Building contig index
[M::mm_idx_gen::1606476568.013*0.00] collected minimizers
[M::mm_idx_gen::1606476570.172*0.00] sorted minimizers
[M::mm_mapopt_update::1606476572.161*0.00] mid_occ = 15
[M::mm_idx_stat] kmer size: 20; skip: 5; is_hpc: 0; #seq: 409021
[M::mm_idx_stat::1606476574.210*0.00] distinct minimizers: 141411081 (98.66% are singletons); average occurrences: 1.029; average spacing: 3.056
Index construction time: 57.330000 seconds for 409021 target sequence(s)

I hope this is what you were asking for.

Best,

Cécile


adigenova commented on September 12, 2024

Hi Cécile,

Yes, I just wanted to check that the input assembly for fastmin-sg contains sequences, and it does: 409k contigs.
Let me know if you can share a subset of your data, ideally one that reproduces the error.
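Checking that the fastmin-sg input assembly has sequences amounts to counting FASTA headers, which the index log above reports as `#seq: 409021`. A Python sketch, equivalent to `grep -c '^>'` on the file (in this pipeline, the `<prefix>.MBC7.msplit.fa` file passed to fastmin-sg); `count_fasta_records` is a hypothetical helper:

```python
import gzip


def count_fasta_records(path: str) -> int:
    """Count FASTA header lines (starting with '>'), like `grep -c '^>'`.

    Hypothetical helper; transparently handles a .gz suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return sum(1 for line in fh if line.startswith(">"))
```

Matching only at the line start avoids counting a stray `>` inside a sequence or comment, which a plain `grep -c ">"` would include.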

Best,
Alex

