
hifiasm-meta's People

Contributors

chhylp123, lh3, xfengnefx


hifiasm-meta's Issues

General question regarding treatment of contained reads

The manuscript briefly mentions that hifiasm-meta uses a new method for filtering contained reads. I'm interested in learning about the filtering mechanism. Could you please share more details of the algorithm, or point me to the appropriate place in the code? Pasting the text from your manuscript:

Treatment of contained reads. The standard procedure to construct a string graph discards a read contained in a longer read. This may lead to an assembly gap if the contained read and the longer read actually reside on different haplotypes [10]. The original hifiasm patches such gaps by rescuing contained reads after graph construction. Hifiasm-meta tries to resolve the issue before graph construction instead. It retains a contained read if other reads exactly overlapping with the read are inferred to come from different haplotypes. In other words, hifiasm-meta only drops a contained read if there are no other similar haplotypes around it. This strategy often retains extra contained reads that are actually redundant. These extra reads usually lead to bubble-like subgraphs and are later removed by the bubble popping algorithm in the original hifiasm.
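To make the question concrete, here is a minimal sketch of the retention rule as described in the excerpt. The haplotype test and all names are illustrative, not hifiasm-meta's code; the exact conditions and thresholds are precisely what is being asked about.

def looks_like_other_haplotype(read_alleles, other_alleles):
    # Toy haplotype test: the two exactly-overlapping reads disagree at
    # at least one shared informative site (maps of position -> base).
    shared = set(read_alleles) & set(other_alleles)
    return any(read_alleles[p] != other_alleles[p] for p in shared)

def keep_contained_read(read_alleles, exact_overlaps_alleles):
    # Retain the contained read if any read exactly overlapping it appears
    # to come from a different haplotype; otherwise it is safe to drop.
    # Per the excerpt, redundant retained reads form bubbles popped later.
    return any(looks_like_other_haplotype(read_alleles, other)
               for other in exact_overlaps_alleles)

# Example: the contained read is kept because an overlapping read differs at site 42.
print(keep_contained_read({10: "A", 42: "C"}, [{10: "A", 42: "T"}]))  # True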

I wish to understand the exact conditions / threshold values that decide whether to retain a contained read.

Thank you.

Too much RAM required for metagenome assembly.

Greetings! This is not a bug but rather an optimization issue. We are currently working with Revio data, one sample per cell, which leaves us with a raw file of 30-40 gigabytes per metagenome assembly.
We are currently using a server with 750 GB of RAM, and it seems not to be enough. Is there a parameter that can be tweaked to reduce memory usage? Is there an approximate formula we can use to calculate how much memory is needed?

Thank you very much!

Duplicate GFA links

Hello

It seems there are duplicate edges in the produced GFA. What is the purpose of these?

E.g., if we take sheepB.hifiasm-meta.a_ctg.gfa.gz, we end up with:

L       s0.ctg000590l   +       s0.ctg027907l   -       10632M  L1:i:29150
...
L       s0.ctg027907l   +       s0.ctg000590l   -       10637M  L1:i:14435

Note that the overlap lengths differ as well, which does look suspicious...
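A small sketch to enumerate such pairs: canonicalize each L-line (an edge and its reverse complement describe the same junction) and report junctions recorded with inconsistent overlap lengths. It assumes a tab-separated, uncompressed GFA (use gzip.open for .gfa.gz) and is not hifiasm-meta code.

import sys
from collections import defaultdict

def canon(a, sa, b, sb):
    # An edge a(sa)->b(sb) and its reverse complement b(flip sb)->a(flip sa)
    # describe the same junction; pick one canonical orientation as the key.
    flip = {"+": "-", "-": "+"}
    return min((a, sa, b, sb), (b, flip[sb], a, flip[sa]))

links = defaultdict(list)
with open(sys.argv[1]) as gfa:
    for line in gfa:
        if line.startswith("L\t"):
            _, a, sa, b, sb, ovl = line.rstrip("\n").split("\t")[:6]
            links[canon(a, sa, b, sb)].append(ovl)

for junction, ovls in links.items():
    if len(set(ovls)) > 1:
        print(junction, ovls)  # same junction recorded with differing overlaps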

Resistance gene not assembled in primary contig file, but is present in alternate contig file

In a comparison of metagenomic assemblies made only from Illumina short-read data (metaSPAdes) to hifiasm-meta assemblies (the primary contig file, .p_ctg.gfa), we found an ARG assembled in a metaSPAdes assembly that was not present in the hifiasm-meta assembly for that sample. ~20 HiFi reads align well to the ARG, but the ARG is not assembled by hifiasm-meta. However, the ARG is assembled in the alternate contig file made by hifiasm-meta (.a_ctg.gfa), as well as in the .r_utg.gfa and .p_utg.gfa files. How would you recommend we run hifiasm-meta so that the ARG content we care about lies solely within the primary contig file, or should we look for ARGs in both the primary and alternate contig files? ARGs are usually surrounded by mobile genetic elements and usually belong to low-abundance species.

hifiasm-meta produces redundant assemblies?

Hello,

I performed de novo assembly on two human faecal metagenomes sequenced with PacBio Sequel II.
I tested metaFlye (2.9-b1768) and hifiasm-meta (v0.2.1).
As you can see below, hifiasm-meta produces much larger assemblies.

I mapped Illumina paired-end reads obtained from the same samples onto the PacBio assemblies.
Even though the hifiasm-meta assemblies are much larger, the proportion of mapped reads only increases slightly.
In addition, the proportion of reads aligned exactly 1 time is much lower.
This suggests that hifiasm-meta produces redundant assemblies.
What do you think?

Thanks for your help,
Florian

Donor 1

                                                      metaFlye      hifiasm_meta
assembly size (bp)                                    596 522 308   831 187 874
# contigs                                             9 253         15 586
N50 (bp)                                              164 736       132 052
% Illumina reads aligned concordantly exactly 1 time  50.79         39.45
% Illumina reads aligned concordantly > 1 time        23.50         38.31
% Illumina reads aligned concordantly                 74.29         77.76

Donor 2

                                                      metaFlye      hifiasm_meta
assembly size (bp)                                    264 656 715   551 812 461
# contigs                                             3 836         17 080
N50 (bp)                                              243 801       44 732
% Illumina reads aligned concordantly exactly 1 time  55.28         20.34
% Illumina reads aligned concordantly > 1 time        33.15         74.26
% Illumina reads aligned concordantly                 88.43         94.6
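One way to test the redundancy hypothesis directly is to align the assembly against itself and measure, per contig, how many of its bases are covered by alignments to other contigs. A sketch using the mappy bindings for minimap2; the file name and preset are illustrative, not a prescribed workflow.

# Requires the mappy package (minimap2 Python bindings): pip install mappy
import mappy as mp

asm = "asm.p_ctg.fa"                        # hypothetical: assembly FASTA
aligner = mp.Aligner(asm, preset="asm20")   # assembly-to-assembly preset
for name, seq, _ in mp.fastx_read(asm):
    # query intervals of alignments to *other* contigs, merged and summed
    hits = sorted((h.q_st, h.q_en) for h in aligner.map(seq) if h.ctg != name)
    covered, end = 0, 0
    for st, en in hits:
        if en <= end:
            continue
        covered += en - max(st, end)
        end = en
    print(name, len(seq), round(covered / len(seq), 3))

High covered fractions across many short contigs would support the redundancy interpretation.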

Question about read overlap parameters and usage

Hi,

I'm trying to understand the difference between the read overlap parameters --force-rs, -S and --force-preovec. Do they all do the same thing?

Also, I've noticed there are different usage instructions in the manpage, the README, and the output of hifiasm_meta -h. I found this a bit confusing; could you clarify which one is correct, or most complete?

Thanks!

Hi-C integration?

Hi,
are you back-porting (up-porting? side-porting?) the Hi-C integration from hifiasm? We are sequencing some species where up to half the sample might be bacteria and fungi (the target species is a plant), and are considering using hifiasm-meta for this as the first step, and then mapping and extracting the plant-specific reads for a separate assembly with regular hifiasm. We are also getting Hi-C reads for these samples, so I wondered if Hi-C integration might be helpful for separating species in hifiasm-meta.

Sincerely,
Ole

hifiasm-meta exits unexpectedly!

For assembly of metagenomic samples with HiFi reads, I ran hifiasm-meta with default parameters. The command line was as follows:
hifiasm_meta -t 32 -o asm hifi_reads.fastq.gz

But the process exited at the "checkpoint: post-assembly" step. The log file showed:

********** checkpoint: post-assembly **********
[M::hamt_clean_graph] (peak RSS so far: 18.8 GB)
[M::hamt_ug_opportunistic_elementary_circuits] collected 0 circuits, used 0.00s
[M::hamt_ug_opportunistic_elementary_circuits] wrote all rescued circles, used 0.00s
[T::hamt_ug_opportunistic_elementary_circuits_helper_deduplicate_minhash] got the sequences, used 0.0s
[T::hamt_minhash_mashdist] sketched - 0.0s.
[T::hamt_minhash_mashdist] compared - 0.0s.
[T::hamt_ug_opportunistic_elementary_circuits_helper_deduplicate_minhash] collected mash distances for 0 seqs, used 0.0s
[M::hamt_ug_opportunistic_elementary_circuits_helper_deduplicate_minhash] had 0 paths, 0 remained (0 dropped by length diff, 0 by length abs),used 0.0s after sketching.
[M::hamt_ug_opportunistic_elementary_circuits] deduplicated rescued circles, used 0.01s
[M::hamt_ug_opportunistic_elementary_circuits] wrote deduplicated rescued circles, used 0.00s
[M::hamt_simple_binning] Will try to bin on 147 contigs (skipped 0 because blacklist).
Using random seed: 42
Perplexity too large for the number of data points!
################

How can I fix it? I look forward to your reply. Thank you very much!
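The final message matches the guard used by common Barnes-Hut t-SNE implementations, which the binning step appears to invoke: the run aborts unless there are roughly three data points per unit of perplexity. A sketch of that check; the perplexity value the binning step actually uses is not shown in the log.

def tsne_perplexity_ok(n_points, perplexity):
    # bhtsne-style guard: abort unless n_points - 1 >= 3 * perplexity
    return n_points - 1 >= 3 * perplexity

# With the 147 contigs from the log, any perplexity above ~48 trips the guard:
print(tsne_perplexity_ok(147, 30))  # True
print(tsne_perplexity_ok(147, 50))  # False: 146 < 150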

Problems understanding the output files

Hello,

I'm testing hifiasm for my PacBio HiFi reads. Running the program works fine; however, I'm having some problems making sense of the output files. With other assemblers such as HiCanu and Flye, you get one FASTA file that contains the complete assembly, to which the reads can be mapped.
With hifiasm, you get multiple unitig and contig output files (the asm.a*, asm.r*, and asm.p* files, in addition to the noseq versions) in GFA format. These are all the GFA files that I got:

asm.a_ctg.gfa
asm.p_ctg.noseq.gfa
asm.p_utg.noseq.gfa
asm.r_utg.noseq.gfa
asm.a_ctg.noseq.gfa
asm.p_ctg.gfa
asm.p_utg.gfa
asm.r_utg.gfa

Could you tell me which of these files corresponds most closely to the single assembly.fasta produced by other assemblers?
That is the one I want to use for further analysis, such as read mapping and binning.
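For reference, the hifiasm documentation extracts contig FASTA from the GFA S-lines with an awk one-liner; below is a Python equivalent. The .p_ctg.gfa file (primary contigs) is usually the closest analogue of other assemblers' single assembly FASTA; the .noseq.gfa variants store "*" instead of sequence and cannot be converted this way.

import sys

# FASTA from GFA S-lines; a Python equivalent of the documented one-liner
#   awk '/^S/{print ">"$2; print $3}' asm.p_ctg.gfa > asm.p_ctg.fa
with open(sys.argv[1]) as gfa:
    for line in gfa:
        if line.startswith("S\t"):
            _, name, seq = line.rstrip("\n").split("\t")[:3]
            print(">" + name)
            print(seq)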

Thanks a lot for any help,

Hanna

HiFi reads: Is it better to perform assembly before taxonomic and functional identification?

Hello

I am a beginner and I have a question about metagenomic analysis using PacBio HiFi long reads. In short-read metagenomics, I have seen papers that suggest doing taxonomic and functional profiling after assembly, to increase precision. I was wondering whether, with long reads, we can use the raw reads directly for profiling, or whether it is still better to perform assembly first.

Thank you

GFA file size issue

Hi, I used hifiasm-meta to assemble urogenital tract metagenomic data from CAMI.

This data was simulated by CAMISIM, average read length: 3,000 bp, read length s.d.: 1,000 bp.

Run log:

$ hifiasm_meta -o cami_0.hifiasm_meta.out -t 32 /database/openstack.cebitec.uni-bielefeld.de/swift/v1/CAMI_Urogenital_tract/pacbio/2018.01.23_14.08.31_sample_0/reads/anonymous_reads.fq.gz

[M::hamt_assemble] Skipped read selection.
[M::ha_analyze_count] lowest: count[16383] = 0
[M::hamt_ft_gen::278.101*[email protected]] ==> filtered out 0 k-mers occurring 750 or more times
[M::hamt_assemble] Generated flt tab.
alloc 1666925 uint16_t
[M::ha_pt_gen::398.464*4.70] ==> counted 131777689 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
tot_cnt=59765
tot_pos=59765
[M::ha_pt_gen::431.595*5.13] ==> indexed 59765 positions
[M::hamt_assemble::439.470*[email protected]] ==> corrected reads for round 1
[M::hamt_assemble] # bases: 4957619989; # corrected bases: 0; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.132GB
[M::ha_pt_gen::470.852*6.04] ==> counted 131777979 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
tot_cnt=59765
tot_pos=59765
[M::ha_pt_gen::506.590*6.28] ==> indexed 59765 positions
[M::hamt_assemble::514.866*[email protected]] ==> corrected reads for round 2
[M::hamt_assemble] # bases: 4957619989; # corrected bases: 0; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.132GB
[M::ha_pt_gen::559.852*6.81] ==> counted 131777979 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
tot_cnt=59765
tot_pos=59765
[M::ha_pt_gen::597.090*6.98] ==> indexed 59765 positions
[M::hamt_assemble::606.630*[email protected]] ==> corrected reads for round 3
[M::hamt_assemble] # bases: 4957619989; # corrected bases: 0; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.132GB
[M::ha_pt_gen::643.258*7.55] ==> counted 131777979 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
tot_cnt=59765
tot_pos=59765
[M::ha_pt_gen::674.827*7.68] ==> indexed 59765 positions
[M::hamt_assemble::683.525*[email protected]] ==> found overlaps for the final round
[M::ha_print_ovlp_stat] # overlaps: 0
[M::ha_print_ovlp_stat] # strong overlaps: 0
[M::ha_print_ovlp_stat] # weak overlaps: 0
[M::ha_print_ovlp_stat] # exact overlaps: 0
[M::ha_print_ovlp_stat] # inexact overlaps: 0
[M::ha_print_ovlp_stat] # overlaps without large indels: 0
[M::ha_print_ovlp_stat] # reverse overlaps: 0
[M::hist_readlength] <1.0k:
[M::hist_readlength] 1.0k: ]]]]]]]]
[M::hist_readlength] 1.5k: ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
[M::hist_readlength] 2.0k: ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
[M::hist_readlength] 2.5k: ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
[M::hist_readlength] 3.0k: ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
[M::hist_readlength] 3.5k: ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]                                                                                                                                                                                    
[M::hist_readlength] 4.0k: ]]]]]]]]]]]]]]]]]]]]]]]
[M::hist_readlength] 4.5k: ]]]]]]]]]]]]]
[M::hist_readlength] 5.0k: ]]]]]]]
[M::hist_readlength] 5.5k: ]]]]
[M::hist_readlength] 6.0k: ]]
[M::hist_readlength] 6.5k: ]
[M::hist_readlength] 7.0k: ]
[M::hist_readlength] 7.5k: ]
[M::hist_readlength] 8.0k: ]
[M::hist_readlength] 8.5k: ]
[M::hist_readlength] 9.0k: ]
[M::hist_readlength] 9.5k: ]
[M::hist_readlength] 10.0k: ]
[M::hist_readlength] 10.5k: ]
[M::hist_readlength] 11.0k: ]
[M::hist_readlength] 11.5k: ]
[M::hist_readlength] >50.0k: 0
Writing reads to disk...
wrote cmd of length 323: version=0.13-r308, CMD= hifiasm_meta -o cami_0.hifiasm_meta.out -t 32 /database/openstack.cebitec.uni-bielefeld.de/swift/v1/CAMI_Urogenital_tract/pacbio/2018.01.23_14.08.31_sample_0/reads/anonymous_reads.fq.gz
Bin file was created on Wed Dec 30 15:31:02 2020
Hifiasm_meta 0.1-r022 (hifiasm code base 0.13-r308).
Reads has been written.
[hamt::write_All_reads] Writing per-read coverage info...
[hamt::write_All_reads] Finished writing.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
Writing raw unitig GFA to disk...
[M::hamt_output_unitig_graph_advance] Writing GFA...
[M::hamt_output_unitig_graph_advance] Writing GFA...
[M::hamt_output_unitig_graph_advance] Writing GFA...
Inconsistency threshold for low-quality regions in BED files: 70%
Writing debug asg to disk...
[M::write_debug_assembly_graph] took 0.02s

[M::main] Hifiasm code base version: 0.13-r308
[M::main] Hifiasm_meta version: 0.1-r022
[M::main] CMD: hifiasm_meta -o cami_0.hifiasm_meta.out -t 32 /database/openstack.cebitec.uni-bielefeld.de/swift/v1/CAMI_Urogenital_tract/pacbio/2018.01.23_14.08.31_sample_0/reads/anonymous_reads.fq.gz
[M::main] Real time: 691.048 sec; CPU: 5463.747 sec; Peak RSS: 16.432 GB

Output:

$ ll
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.a_ctg.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.a_ctg.noseq.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.dbg_asg
.rw-r--r-- zhujie 2782  1.2 GB Wed Dec 30 15:31:04 2020 cami_0.hifiasm_meta.out.ec.bin
.rw-r--r-- zhujie 2782 38.2 MB Wed Dec 30 15:31:04 2020 cami_0.hifiasm_meta.out.ec.mt.bin
.rw-r--r-- zhujie 2782  6.7 MB Wed Dec 30 15:31:00 2020 cami_0.hifiasm_meta.out.ovecinfo.bin
.rw-r--r-- zhujie 2782  9.5 MB Wed Dec 30 15:31:04 2020 cami_0.hifiasm_meta.out.ovlp.reverse.bin
.rw-r--r-- zhujie 2782  9.5 MB Wed Dec 30 15:31:04 2020 cami_0.hifiasm_meta.out.ovlp.source.bin
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.p_ctg.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.p_ctg.noseq.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.p_utg.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.p_utg.noseq.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.r_utg.gfa
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.r_utg.lowQ.bed
.rw-r--r-- zhujie 2782    0 B  Wed Dec 30 15:31:07 2020 cami_0.hifiasm_meta.out.r_utg.noseq.gfa

All the GFA files are zero bytes.

Any help? Thanks ~

Good settings for samples enriched in similar sequences

Hi,
We are struggling to perform de novo assembly of metagenomic bacterial samples, selectively cultured with antimicrobials from wastewater, using hifiasm-meta with the default parameters. The sequencing depth seems to be fine, but the number of circularized bacterial genomes and plasmids is small, so the resulting contigs are not good. We suspect the cause is increased sequence redundancy (closely related bacterial species and plasmids). Does anyone know of effective settings for this kind of data?
Thanks!

contig coverage

I ran hifiasm-meta on HiFi reads from a rumen microbiome, and the contigs were generated from various bacteria and eukaryotes.
I am wondering if I can find out the coverage of each contig.
Is there any hifiasm-meta output file that provides coverage information for each contig,
or any other way to extract the coverage information?
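One place to look: the S-lines of the noseq GFA carry optional per-contig tags, including a depth tag ("dp", asked about in the "gfa s-line" issue below). A sketch that prints it, assuming the tag is present and is named dp in your file:

import sys

with open(sys.argv[1]) as gfa:              # e.g. asm.p_ctg.noseq.gfa
    for line in gfa:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, tags = fields[1], {}
        for t in fields[3:]:                # optional TAG:TYPE:VALUE fields
            parts = t.split(":", 2)
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        if "dp" in tags:
            print(name, tags["dp"])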

redundancy of hifiasm-meta and metaflye

hello

I tested the assembly efficiency of hifiasm-meta and metaFlye with a mock community (MSA 1003).

For f5bcb58692924cb7_1 (ATCC-12228, length: 2,503,245 bp), hifiasm-meta produced 544 contigs; the longest is 2,387,482 bp and the others are shorter than 30,000 bp. When I mapped these contigs to the reference genome, I found high redundancy among them; in particular, the longest contig contains many of the shorter contigs. metaFlye, on the other hand, produced a single contig of almost exactly the reference genome length. But for 5964adb8d0df4fde_1 (ATCC-33323, length: 1,854,273 bp), hifiasm-meta produced 8 contigs with good coverage and almost no overlap among them.

So, I want to ask:
1. Why do the assembly results differ between reference genomes?
2. How should I set the parameters to get a set of contigs with low redundancy while maintaining high coverage?

The current command was: hifiasm_meta -t 36 --force-rs -o mock2 ../mock2.fastq.gz

Thanks for your help!

gfa s-line

Hello,

Could you please explain more about the S-lines of the noseq.gfa file?
What do the "dp" and "ts" tags represent, respectively?

Thank you.

A serious error in simulated data

[*** stack smashing detected ***]

When a dataset has high coverage, this error may appear. The software used for simulation is pbsim2.

No circular contigs recovered

Hi,

I have tested hifiasm-meta on PacBio HiFi data obtained from the fecal metagenome of a healthy human.

Below are the library statistics:

sum = 13017330229, n = 1646208, ave = 7907.46, largest = 21324
N50 = 8596, n = 605631
N60 = 7871, n = 763863
N70 = 7149, n = 937306
N80 = 6377, n = 1129752
N90 = 5392, n = 1350332
N100 = 104, n = 1646208

Below are the assembly statistics (asm.p_ctg.gfa):

sum = 831324548, n = 15560, ave = 53427.03, largest = 3704035
N50 = 132051, n = 896
N60 = 73324, n = 1769
N70 = 45743, n = 3226
N80 = 29874, n = 5501
N90 = 19672, n = 8924
N100 = 2682, n = 15560

Unfortunately, it seems that there are no circular contigs, even though some contigs are very long (>3 Mb).
Here is a screenshot: [screenshot not reproduced]

Is there something I'm doing wrong?

Thanks for your help,
Florian

Understanding which reads contribute to contigs

Hi Xiaowen,
I am wondering if it is possible to obtain a list of reads that contribute to each contig in the assembly?

This seems like it would be highly valuable for metagenomics, as it can identify all reads associated with specific bacterial genomes. In addition, it would be extremely valuable for a more specific use-case I describe below.

I am working on a problem where I am trying to assemble an endosymbiotic bacterium from a larger HiFi dataset focused on the host organism. Assembly of the full dataset with hifiasm did not produce a complete bacterial contig; the genome was present as several smaller contigs. I attempted to re-assemble and improve the quality of these results. To accomplish this, I have:

  1. Mapped contigs from a hifiasm assembly of the full dataset to a reference of the target bacteria, to identify and extract relevant bacteria contigs.
  2. Mapped reads to those putative bacteria contigs to identify reads that are most likely target bacteria, and extract them.
  3. Performed assembly with this subset of putative bacteria reads using hifiasm-meta.

This resulted in a complete, circular genome for the target bacterium, along with a few small tangled contigs, suggesting the approach worked pretty well. The small contigs in the new assembly are likely some combination of host reads and perhaps strain variation.

The genome has a few frameshifts, and I would like to try polishing it using only the reads that were used to build the complete bacterial contig. I have used minimap2 to align the subset of reads to this contig, and there are several short regions in which some proportion of reads map poorly (alignments are <1,000 bp and are hard-clipped by >3,000 bp on each side). I think these are potentially host reads. I can filter them out using samclip, but it would be helpful to know whether or not they were used to construct this contig and therefore deserve to be excluded.

Given that metagenomic assemblies often yield several complete genomes, I think the same topic will come up there as well. Polishing would also be desirable, but problematic read alignments would be more prevalent due to more species, shared repeats, etc. Having the ability to assign reads to particular contigs would be a tremendous help here too.
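For reference, hifiasm-format GFA output includes A-lines, one per read used to build each contig/unitig, which record exactly this read-to-contig assignment. A sketch tabulating them, assuming the usual hifiasm A-line layout (contig name in column 2, read name in column 5):

import sys

# Print (contig, read) pairs from the A-lines of a hifiasm-style GFA.
with open(sys.argv[1]) as gfa:
    for line in gfa:
        if line.startswith("A\t"):
            fields = line.rstrip("\n").split("\t")
            print(fields[1], fields[4])    # contig name, read name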

Any advice would be greatly appreciated!

Thanks,
Dan

Potential for improvement: A great test dataset here!

This project is quite exciting, but as you mentioned in your preprint, there is very little public training data to help optimize for this use case.

I'd like to point the authors to a substantially larger and more representative dataset: 11 real, individual human HiFi fecal metagenomes (which are NOT pooled). They have a more realistic distribution of species (some highly abundant, but many lower-abundance ones).

PRJNA754443
11_sra_samples.csv

Expected differences in this real dataset compared to the "pooled" samples used for benchmarking:

  1. These new samples have less equitable (but arguably more realistic) distributions of microbes than the pooled samples, because you aren't merging multiple non-overlapping sets of high-abundance bugs; there is more of an exponential decay in abundances.
  2. These new samples would be expected to have less tangled graphs, as they are less likely to contain mixtures of near-identical strains from different people in the same sample. Large numbers of closely related genomes are less likely within a given individual, where selection has limited the diversity of closely related strains competing for the same resources/niches in the gut.
  3. Overall depth is slightly lower, with a median of roughly 1 million reads of 7 kb length.
  4. Despite point 3, there may be more potential to capture rare microbes, because these single samples have twice the effective read depth per human subject compared to the pooled samples, which ostensibly have twice the volume of data in total.

I've already run the latest version of this assembler on these samples, and I see substantially fewer closed genomes (and overall HQ MAGs!) per sample than in the pooled samples, as expected. I aim to do numerous further experiments with some of the recent cleaning options and potentially other (graph-aware?) binning tweaks, but I don't expect the overall picture to change much.

I'm curious to see whether further improvements can be made given the availability of this larger corpus of individual-level human microbiome HiFi data.

Segmentation fault with conda installed 0.3-r063.2 (hifiasm code base 0.13-r308)

Hello,

I am trying to assemble metagenomic reads from public SRA accessions using the conda-installed hifiasm_meta 0.3-r063.2, and I am often receiving a segmentation fault.

For example, I tried to assemble accession SRR13392911 - "human gut metagenome sequencing" - and received this fault after the majority of the program had run, generating 5 ha_hist_line plots and 1 hist_readlength plot. It is able to write the .bin files but seems to fail at writing the .gfa files.

Writing reads to disk... 
wrote cmd of length 254: hamt version=0.3-r063.2, ha base version=0.13-r308, CMD= hifiasm_meta -o SRR13392911.v1.hifiasm_meta/SRR13392911.v1.hifiasm_meta -t 64 D1181.fastq.gz
Bin file was created on Tue Feb  7 23:26:52 2023
Hifiasm_meta 0.3-r063.2 (hifiasm code base 0.13-r308).
Reads has been written.
[hamt::write_All_reads] Writing per-read coverage info...
[hamt::write_All_reads] Finished writing.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
bin files have been written.
[M::hamt_clean_graph] no debug gfa
[debug::hamt_normalize_ma_hit_t_single_side_advance] nb_batch: 2093
[M::hamt_normalize_ma_hit_t_single_side_advance] typeA 335 B 339, used 370.1s

[debug::hamt_normalize_ma_hit_t_single_side_advance] nb_batch: 2093
[M::hamt_normalize_ma_hit_t_single_side_advance] typeA 0 B 25, used 334.7s

[M::clean_weak_ma_hit_t] treated 0, used 0.1s
[M::ma_hit_sub] remained 8572776, deleted 0, used 0.1s
[M::detect_chimeric_reads_conservative] n_simple_remove: 0, n_complex_remove: 0/0, used 0.1
[M::ma_hit_cut] typeA 0 , typeB 8572488, used 0.0
[M::ma_hit_flt] typeA 0 , typeB 8572488, used 0.0s
[M::hamt_hit_contained_multi] treated roughly 0 spots, used 0.2s
[M::hamt_hit_contained_drop_singleton_multi] treated roughly 4 spots, used 0.1s
[debug::ma_hit_contained_advance] ret0: 32, used 0.02 s
[M::ma_hit_contained_advance] dropped 8572680 reads, used total of 0.15 s

[M::asg_arc_del_trans] reduced 0 arcs, used 0.1s
[M::asg_cut_tip] cut 65 tips, used 0.1 s
[M::hamt_clean_graph] ====== initial clean ======


**********0-th round drop: drop_ratio = 0.200000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.7s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s


**********1-th round drop: drop_ratio = 0.400000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.5s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s


**********2-th round drop: drop_ratio = 0.600000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.5s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s


**********3-th round drop: drop_ratio = 0.800000**********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.2 s

[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_false_node_meta] removed 0 single nodes, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_exact] removed 0 inexact overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_diploid_by_length] removed 0 short overlaps, used 0.1
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_short_false_link] removed 0 false overlaps, used 1.5s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_complex_false_link] removed 0 false overlaps, used 1.5s
[M::asg_cut_tip] cut 0 tips, used 0.1 s


********** last round **********
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.2s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::hamt_asgarc_drop_tips_and_bubbles] did 0 rounds, dropped 0 spots, used 0.3 s

[M::asg_arc_del_short_diploi_by_suspect_edge] removed 0 suspect overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_del_triangular_directly] removed 0 triangular overlaps, used 0.1s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_identify_simple_bubbles_multi] dropped total of 0, used 0.1s
[M::asg_arc_del_too_short_overlaps] removed 0 short overlaps, used 0.1s
[M::asg_cut_tip] cut 0 tips, used 0.1 s
[M::asg_arc_del_simple_circle_untig] removed 0 self-circles, used 0.1s


********** checkpoint: r_utg **********
Writing raw unitig GFA to disk... 
[M::hamt_clean_graph] ======= preclean =======
[M::hamt_ug_pop_simpleInvertBubble] popped 0 locations
[M::hamt_ug_oneutgCircleCut] treated 0 spots
[M::hamt_clean_graph] round 0, dropped 0, used 0.9s


********** checkpoint: p_utg **********

[M::hamt_output_unitig_graph_advance] Writing GFA... 
[M::hamt_ug_drop_shorter_ovlp] cut 0
[M::hamt_asgarc_ugCovCutDFSCircle_aggressive] cut 0.


[M::hamt_clean_graph] time check #1, 1.2s
[M::hamt_ug_oneutgCircleCut] treated 0 spots
[M::hamt_ug_basic_topoclean_simple] total cut: 0
/bin/bash: line 3:    61 Segmentation fault      (core dumped) hifiasm_meta -o $BD/$BD -t 64 D1181.fastq.gz

Thanks for your help.

Is it necessary to conduct binning after assembly with HiFi reads to get MAGs?

Hello, xfengnefx!

With NGS shotgun reads, to get MAGs we usually assemble paired-end reads into contigs, and then recover MAGs through binning.

What I want to ask is: for HiFi reads, in order to get higher-quality MAGs, is it still necessary to conduct binning after we get contigs from hifiasm-meta?

Thanks for your help.

Extracting circular contigs

Hi, thanks for the great tool! I am currently testing it on a few datasets and was wondering if there's a straightforward way to extract the contigs that are circular? In Canu/Flye, for example, there is either a header flag or a separate file indicating which contigs are circular.
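For what it's worth, hifiasm and hifiasm-meta name segments with a trailing "l" for linear and "c" for circular (e.g., s0.ctg000590l in an earlier issue), so circular contigs can be pulled straight from the GFA; a sketch:

import sys

with open(sys.argv[1]) as gfa:              # e.g. asm.p_ctg.gfa (with sequences)
    for line in gfa:
        if line.startswith("S\t"):
            _, name, seq = line.rstrip("\n").split("\t")[:3]
            if name.endswith("c"):          # "c" = circular, "l" = linear
                print(">" + name)
                print(seq)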

Thank you!

Segfaults using the conda package version hamtv0.3--h5b5514e_1

Hi,

I've been wrapping hifiasm_meta to install it on Galaxy and I've come across three reproducible segfaults using the biocontainer, which is built from the conda package.

I've uploaded the test data I'm using here.

Thanks!

1. The --force-rs parameter

Running this command causes a segfault:

singularity exec \
	docker://quay.io/biocontainers/hifiasm_meta:hamtv0.3--h5b5514e_1 \
	hifiasm_meta \
	--force-rs \
	-o asm \
	zymoD6331std-ecoli-ten-percent.42.1.fq.gz
Log:

[M::CommandLine_process] Forced pre-ovec read selection. Ignoring count of ovlp.
[M::main] Start: Thu Jan 19 10:03:55 2023

[prof::yak_count] step 1 total 0.02 s, step2 0.06 s, step3 0.25 s.
[M::ha_analyze_count] lowest: count[16383] = 0
[prof::hamt_mark] step1 total 0.02 s, step2 0.14 s
[M::hamt_flt_withsorting] Reads sorted. 
[M::hamt_assemble] read kmer stats collected.
[prof::yak_count] step 1 total 0.03 s, step2 0.06 s, step3 0.24 s.
[M::ha_analyze_count] lowest: count[16383] = 0
[M::hamt_ft_gen::7.527*[email protected]] ==> filtered out 0 k-mers occurring 1500 or more times
[M::hamt_assemble] generated flt tab.
[M::hamt_pre_ovec_v2] Entered pre-ovec read selection.
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::7.616*1.01] ==> counted 58537 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 21999, pt->tot_pos is 21999
[M::ha_pt_gen::7.709*1.01] ==> indexed 21999 positions
[prof::hamt_pre_ovec_v2] start ~ done ha_idx: 0.18 s
[M::hamt_pre_ovec_v2] 0 reads with more than desired targets(150). (Total reads: 241)
[prof::hamt_pre_ovec_v2] ha_idx ~ done estimation: 0.09 s
[M::hamt_pre_ovec_v2] Ignore estimated total number of overlaps and proceed to read selection.
[M::hamt_pre_ovec_v2] plan to keep 161 out of 241 reads (66.80%).
[M::hamt_flt_withsorting_supervised] entering round#2, pass#0...
[W::worker_process_one_read_inclusive2] read limit reached in round#2.
[prof::hamt_flt_withsorting_supervised] step1 0.10 s, step2 0.06 s, step3 0.14 s
[prof::hamt_flt_withsorting_supervised] used 0.24 s
[M::hamt_flt_withsorting_supervised] finished selection, retained 162 reads (goal was 161; total read count is 241).
[prof::hamt_pre_ovec_v2]     ~ done supervised: 0.24 s
[M::hamt_pre_ovec_v2] finished read selection, took 0.51s.
[M::hamt_assemble] read selection dropped reads, recalculate ha_flt_tab...
[prof::yak_count] step 1 total 0.00 s, step2 0.05 s, step3 0.18 s.
[M::ha_analyze_count] lowest: count[16383] = 0
[M::hamt_ft_gen::9.574*[email protected]] ==> filtered out 0 k-mers occurring 1500 or more times
[M::hamt_assemble] finished redo ha_flt_tab.


[M::hamt_assemble] entered read correction round 1
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::9.665*1.01] ==> counted 58537 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 21999, pt->tot_pos is 21999
[M::ha_pt_gen::9.757*1.01] ==> indexed 21999 positions
[M::hamt_assemble::9.842*[email protected]] ==> corrected reads for round 1
[M::hamt_assemble] # bases: 1696054; # corrected bases: 8; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.007GB
[probe::hamt_assemble] used 0.27 s


[M::hamt_assemble] entered read correction round 2
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::9.933*1.01] ==> counted 58533 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.09 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 22004, pt->tot_pos is 22004
[M::ha_pt_gen::10.027*1.01] ==> indexed 22004 positions
[M::hamt_assemble::10.111*[email protected]] ==> corrected reads for round 2
[M::hamt_assemble] # bases: 1696052; # corrected bases: 0; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.007GB
[probe::hamt_assemble] used 0.27 s


[M::hamt_assemble] entered read correction round 3
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::10.201*1.01] ==> counted 58533 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.09 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 22004, pt->tot_pos is 22004
[M::ha_pt_gen::10.295*1.01] ==> indexed 22004 positions
[M::hamt_assemble::10.377*[email protected]] ==> corrected reads for round 3
[M::hamt_assemble] # bases: 1696052; # corrected bases: 0; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.007GB
[probe::hamt_assemble] used 0.27 s
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::10.467*1.01] ==> counted 58533 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.09 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 22004, pt->tot_pos is 22004
[M::ha_pt_gen::10.560*1.01] ==> indexed 22004 positions
[M::hamt_assemble::10.622*[email protected]] ==> found overlaps for the final round
[probe::hamt_assemble] used 0.25 s
[M::ha_print_ovlp_stat] # overlaps: 57
[M::ha_print_ovlp_stat] # strong overlaps: 0
[M::ha_print_ovlp_stat] # weak overlaps: 57
[M::ha_print_ovlp_stat] # exact overlaps: 5
[M::ha_print_ovlp_stat] # inexact overlaps: 52
[M::ha_print_ovlp_stat] # overlaps without large indels: 57
[M::ha_print_ovlp_stat] # reverse overlaps: 0
[M::hist_readlength] <  1.0k: 0
[M::hist_readlength] 1.0k: 0
[M::hist_readlength] 1.5k: 0
[M::hist_readlength] 2.0k: 0
[M::hist_readlength] 2.5k: 0
[M::hist_readlength] 3.0k: ] 3
[M::hist_readlength] 3.5k: ] 6
[M::hist_readlength] 4.0k: ] 3
[M::hist_readlength] 4.5k: ] 13
[M::hist_readlength] 5.0k: ] 5
[M::hist_readlength] 5.5k: ] 15
[M::hist_readlength] 6.0k: ] 13
[M::hist_readlength] 6.5k: ] 8
[M::hist_readlength] 7.0k: ] 14
[M::hist_readlength] 7.5k: ] 12
[M::hist_readlength] 8.0k: ] 11
[M::hist_readlength] 8.5k: ] 7
[M::hist_readlength] 9.0k: ] 8
[M::hist_readlength] 9.5k: ] 10
[M::hist_readlength] 10.0k: ] 8
[M::hist_readlength] 10.5k: ] 9
[M::hist_readlength] 11.0k: ] 12
[M::hist_readlength] 11.5k: ] 6
[M::hist_readlength] 12.0k: ] 6
[M::hist_readlength] 12.5k: ] 9
[M::hist_readlength] 13.0k: ] 5
[M::hist_readlength] 13.5k: ] 5
[M::hist_readlength] 14.0k: ] 6
[M::hist_readlength] 14.5k: ] 5
[M::hist_readlength] 15.0k: ] 4
[M::hist_readlength] 15.5k: ] 6
[M::hist_readlength] 16.0k: ] 4
[M::hist_readlength] 16.5k: ] 4
[M::hist_readlength] 17.0k: ] 1
[M::hist_readlength] 17.5k: ] 3
[M::hist_readlength] 18.0k: ] 2
[M::hist_readlength] 18.5k: ] 3
[M::hist_readlength] 19.0k: ] 1
[M::hist_readlength] 19.5k: ] 3
[M::hist_readlength] 20.0k: ] 1
[M::hist_readlength] 20.5k: 0
[M::hist_readlength] 21.0k: ] 2
[M::hist_readlength] 21.5k: ] 1
[M::hist_readlength] 22.0k: ] 5
[M::hist_readlength] 22.5k: ] 1
[M::hist_readlength] 23.0k: 0
[M::hist_readlength] 23.5k: 0
[M::hist_readlength] 24.0k: 0
[M::hist_readlength] 24.5k: 0
[M::hist_readlength] 25.0k: 0
[M::hist_readlength] 25.5k: 0
[M::hist_readlength] 26.0k: 0
[M::hist_readlength] 26.5k: 0
[M::hist_readlength] 27.0k: ] 1
[M::hist_readlength] >50.0k: 0
Writing reads to disk... 
wrote cmd of length 245: hamt version=0.2-r058, ha base version=0.13-r308, CMD= /usr/local/bin/hifiasm_meta --force-rs -o asm zymoD6331std-ecoli-ten-percent.42.1.fq.gz
Bin file was created on Thu Jan 19 10:04:06 2023
Hifiasm_meta 0.2-r058 (hifiasm code base 0.13-r308).
Reads has been written.
[hamt::write_All_reads] Writing per-read coverage info...
[hamt::write_All_reads] Finished writing.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
Writing ma_hit_ts to disk... 
ma_hit_ts has been written.
bin files have been written.
[M::hamt_clean_graph] no debug gfa
[debug::hamt_normalize_ma_hit_t_single_side_advance] nb_batch: 4
[M::hamt_normalize_ma_hit_t_single_side_advance] takes 0.00s, typeA 17 B 39

[debug::hamt_normalize_ma_hit_t_single_side_advance] nb_batch: 4
[M::hamt_normalize_ma_hit_t_single_side_advance] takes 0.00s, typeA 0 B 0

[M::clean_weak_ma_hit_t] takes 0.00 s, treated 0

[debug::ma_hit_contained_advance] ret0: 1, used 0.00 s
[M::ma_hit_contained_advance] dropped 225 reads, used total of 0.00 s

Writing raw unitig GFA to disk... 
[M::hamt_ug_pop_simpleInvertBubble] popped 0 locations
[M::hamt_ug_oneutgCircleCut] treated 0 spots
[M::hamt_output_unitig_graph_advance] Writing GFA... 
[M::hamt_ug_drop_shorter_ovlp] cut 0
[M::hamt_asgarc_ugCovCutDFSCircle_aggressive] cut 0.
[M::hamt_ug_oneutgCircleCut] treated 0 spots
[M::hamt_ug_basic_topoclean_simple] total cut: 0
Segmentation fault (core dumped)

2. Trying to write to an output directory that doesn't exist

Running this command causes a segfault:

ls -lh . &&
singularity exec \
	docker://quay.io/biocontainers/hifiasm_meta:hamtv0.3--h5b5514e_1 \
	hifiasm_meta \
	-o somedir/asm \
	zymoD6331std-ecoli-ten-percent.42.1.fq.gz
Log:

-rw-rw-r-- 1 tom tom 912K Jan 19 09:54 zymoD6331std-ecoli-ten-percent.42.1.fq.gz
INFO:    Using cached SIF image
[M::main] Start: Thu Jan 19 10:06:12 2023

[M::hamt_assemble] Skipped read selection.
[prof::yak_count] step 1 total 0.02 s, step2 0.06 s, step3 0.23 s.
[M::ha_analyze_count] lowest: count[16383] = 0
[M::hamt_ft_gen::3.800*[email protected]] ==> filtered out 0 k-mers occurring 750 or more times
[M::hamt_assemble] Generated flt tab.


[M::hamt_assemble] entered read correction round 1
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.03 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::3.919*1.01] ==> counted 58537 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 21999, pt->tot_pos is 21999
[M::ha_pt_gen::4.010*1.01] ==> indexed 21999 positions
[M::hamt_assemble::4.184*[email protected]] ==> corrected reads for round 1
[M::hamt_assemble] # bases: 2515730; # corrected bases: 27; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.007GB
[probe::hamt_assemble] used 0.38 s


[M::hamt_assemble] entered read correction round 2
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::4.270*1.01] ==> counted 58533 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 22004, pt->tot_pos is 22004
[M::ha_pt_gen::4.361*1.01] ==> indexed 22004 positions
[M::hamt_assemble::4.530*[email protected]] ==> corrected reads for round 2
[M::hamt_assemble] # bases: 2515713; # corrected bases: 2; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.007GB
[probe::hamt_assemble] used 0.35 s


[M::hamt_assemble] entered read correction round 3
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::4.617*1.01] ==> counted 58533 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 22004, pt->tot_pos is 22004
[M::ha_pt_gen::4.707*1.01] ==> indexed 22004 positions
[M::hamt_assemble::4.875*[email protected]] ==> corrected reads for round 3
[M::hamt_assemble] # bases: 2515713; # corrected bases: 0; # recorrected bases: 0
[M::hamt_assemble] size of buffer: 0.007GB
[probe::hamt_assemble] used 0.34 s
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[M::ha_pt_gen::4.961*1.01] ==> counted 58533 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[16383] = 0
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 0.00 s, step2 0.08 s, step3 0.00 s.
[debug::ha_pt_gen] tot_cnt is 22004, pt->tot_pos is 22004
[M::ha_pt_gen::5.051*1.01] ==> indexed 22004 positions
Segmentation fault (core dumped)

This works fine:

mkdir somedir &&
singularity exec \
	docker://quay.io/biocontainers/hifiasm_meta:hamtv0.3--h5b5514e_1 \
	hifiasm_meta \
	-o somedir/asm \
	zymoD6331std-ecoli-ten-percent.42.1.fq.gz

3. Running hifiasm_meta --help, or another parameter that doesn't exist

singularity exec \
	docker://quay.io/biocontainers/hifiasm_meta:hamtv0.3--h5b5514e_1 \
	hifiasm_meta --help

[ERROR] unknown option in "--help"
[M::main] Start: Thu Jan 19 10:08:51 2023

[M::hamt_assemble] Skipped read selection.
Segmentation fault (core dumped)

singularity exec \
	docker://quay.io/biocontainers/hifiasm_meta:hamtv0.3--h5b5514e_1 \
	hifiasm_meta --someparam

[ERROR] unknown option in "--someparam"
[M::main] Start: Thu Jan 19 10:09:34 2023

[M::hamt_assemble] Skipped read selection.
Segmentation fault (core dumped)

hifiasm_meta --version doesn't exit cleanly (not a segfault)

I get an exit code of 1 for this.

singularity exec \
	docker://quay.io/biocontainers/hifiasm_meta:hamtv0.3--h5b5514e_1 \
	hifiasm_meta --version ;
echo $?

ha base version: 0.13-r308
hamt version: 0.2-r058
1

Failure to write GFA file

Hi xfengnefx,

Recently, I used hifiasm-meta to assemble my metagenomic HiFi data and encountered the same error twice, on two compute clusters: the run suddenly stops at the "Writing GFA" step, so I can't get my final contig GFA file. Here are my two log files (the first from a Slurm system, the second from a regular system); can you figure it out for me?
job-26237_1.err.txt
nohup.out.txt
I'd appreciate it.

Input file?

Hello

Sorry if I am asking silly questions. I am a beginner and a bit confused by the pipeline. Is there any tutorial showing how to run this workflow? Where should we supply our input HiFi FASTA files?
Also, it seems all the outputs are graphs (contig graph, raw unitig graph, cleaned unitig graph). Will we get SAMPLE.contigs.fasta files to use in the "HiFi-MAG-Pipeline"?

Thanks

Some warnings on compilation

I get a few warnings; maybe they're of note?

htab.cpp: In function ‘yak_bf_t* yak_bf_init(int, int)’:
htab.cpp:95:23: warning: ignoring return value of ‘int posix_memalign(void**, size_t, size_t)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   95 |         posix_memalign(&ptr, 1<<(YAK_BLK_SHIFT-3), 1ULL<<(n_shift-3));
      |         ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In file included from Overlaps_hamt.h:4,
                 from Overlaps_hamt.cpp:10:
Overlaps_hamt.cpp: In function ‘void hamt_asgarc_util_get_the_two_targets(asg_t*, uint32_t, uint32_t*, uint32_t*, int, int, int)’:
Overlaps_hamt.cpp:1756:15: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
 1756 |     assert(idx=2);  // deadly
      |            ~~~^~
In file included from htab.cpp:11:
htab.cpp: In function ‘ha_pt_t* ha_pt_gen(ha_ct_t*, int)’:
htab.h:112:58: warning: argument 1 range [18446744071562067968, 18446744073709551615] exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
  112 | #define CALLOC(ptr, len) ((ptr) = (__typeof__(ptr))calloc((len), sizeof(*(ptr))))
      |                                                    ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
htab.cpp:366:9: note: in expansion of macro ‘CALLOC’
  366 |         CALLOC(pt->h, 1<<pt->pre);
      |         ^~~~~~
In file included from /usr/include/c++/11/cstdlib:75,
                 from /usr/include/c++/11/stdlib.h:36,
                 from htab.cpp:5:
/usr/include/stdlib.h:542:14: note: in a call to allocation function ‘void* calloc(size_t, size_t)’ declared here
  542 | extern void *calloc (size_t __nmemb, size_t __size)
      |              ^~~~~~
In file included from Hash_Table.h:3,
                 from Assembly.cpp:8:
Assembly.cpp: In function ‘void ha_overlap_and_correct(int)’:
htab.h:112:58: warning: argument 1 range [18446744071562067968, 18446744073709551615] exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
  112 | #define CALLOC(ptr, len) ((ptr) = (__typeof__(ptr))calloc((len), sizeof(*(ptr))))
      |                                                    ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
Assembly.cpp:1089:9: note: in expansion of macro ‘CALLOC’
 1089 |         CALLOC(e, asm_opt.thread_num);
      |         ^~~~~~
In file included from /usr/include/c++/11/cstdlib:75,
                 from /usr/include/c++/11/stdlib.h:36,
                 from Assembly.cpp:2:
/usr/include/stdlib.h:542:14: note: in a call to allocation function ‘void* calloc(size_t, size_t)’ declared here
  542 | extern void *calloc (size_t __nmemb, size_t __size)
      |              ^~~~~~
htab.cpp: In function ‘void worker_process_one_read_HPC(plmt_step_t*, int)’:
htab.cpp:1265:28: warning: ‘buf’ may be used uninitialized [-Wmaybe-uninitialized]
 1265 |         double mean = meanl(buf, idx);
      |                       ~~~~~^~~~~~~~~~
In file included from htab.h:7,
                 from htab.cpp:11:
meta_util.h:10:8: note: by argument 1 of type ‘const uint16_t*’ {aka ‘const short unsigned int*’} to ‘double meanl(const uint16_t*, uint32_t)’ declared here
   10 | double meanl(const uint16_t *counts, uint32_t l);
      |        ^~~~~
htab.cpp: In function ‘void worker_process_one_read_noHPC(plmt_step_t*, int)’:
htab.cpp:1320:28: warning: ‘buf’ may be used uninitialized [-Wmaybe-uninitialized]
 1320 |         double mean = meanl(buf, idx);
      |                       ~~~~~^~~~~~~~~~
In file included from htab.h:7,
                 from htab.cpp:11:
meta_util.h:10:8: note: by argument 1 of type ‘const uint16_t*’ {aka ‘const short unsigned int*’} to ‘double meanl(const uint16_t*, uint32_t)’ declared here
   10 | double meanl(const uint16_t *counts, uint32_t l);
      |        ^~~~~
