eblerjana / pangenie
Pangenome-based genome inference
License: MIT License
Hi!
I used your pipeline pangenome-graph-from-assemblies
to construct a VCF file, but I noticed that some of the variants have an unusually high number of alleles; see below:
1 18300 . GACACACAACACACAC GACAGACAGACACACACAC,GACAGACAGACAGACAGACACAC,GACAGACAGACAGACAGACAGACACACACACACACACACAC,G,GACACACACACACACACACACAC,GACAGACAGACAGACACACACACACACAC
1 19171 . CAAAAAAAA CAA,CAAA,CAAAAAAAAAAAA,CAAAAAAAAAAAAAAAAAAAA,CAAAAAAAAAAAAAAAAAAAAA,CAAAAAAAAAAAAAAAAAAAAAA,CAAAAA,CAAAA,CAAAAAAAAAA,C,CA,CAAAAAAAAAAA
I suspect that this may be due to repetitive sequences in my input assemblies, which were directly obtained from hifiasm assembly results without being masked.
Hi, @eblerjana
Can the variants VCF contain just a single chromosome? For parallelizing large-scale sample sets, can we use a per-chromosome VCF for genotyping/phasing, or does PanGenie need whole-genome variants for more precise k-mer counting and genotyping?
Thanks.
Zhigui Bao
Hi Jana,
I've noticed that the memory usage when counting k-mers directly from reads (rather than from a .jf file) is incredibly high, and it looks like PanGenie reads the entire FASTQ file into memory and then counts the k-mers. I think this also leads to under-reporting of usage, because the PanGenie log for a run says
Count kmers in genome ...
#### Memory usage until now: 61.7867 GB ####
Determine unique kmers ...
#### Memory usage until now: 79.1394 GB ####
....
Total maximum memory usage: 84.4493 GB
Meanwhile, the cluster reports a peak usage of 223.6 GB for the same run. At one point during k-mer counting it was 176.3 GB, so k-mer counting definitely seems to be a big bottleneck, especially for moderate/high-coverage samples.
Just running Jellyfish by itself to produce the .jf file peaked at only 42.1 GB, so that currently seems like the better approach.
I tried, and thoroughly failed, to implement a streaming feature in Jellyfish. It looks like they have some code set up for that, but I didn't get very far.
Currently, genotype likelihoods from subsampling steps are added and normalized after each step. A better approach would be to first add up all likelihoods, and then do the normalization once at the end. Otherwise, the likelihoods are not weighted equally, which might lead to genotyping errors.
Line 96 in ae47956
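The weighting issue can be seen with a toy sketch (hypothetical numbers, not PanGenie code): re-normalizing after every subsampling step gives later steps more weight than earlier ones, while a single normalization at the end weights all steps equally.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# genotype likelihoods from two hypothetical subsampling steps
steps = [[2.0, 6.0], [30.0, 10.0]]

# current behaviour: add and re-normalize after every step
acc = [0.0, 0.0]
for step in steps:
    acc = normalize([a + s for a, s in zip(acc, step)])

# proposed: add up all likelihoods first, normalize once at the end
once = normalize([sum(col) for col in zip(*steps)])

print(acc, once)  # the two results differ, i.e. the steps are weighted unequally
```

In the first variant, step one is squashed to sum 1 before step two's raw (larger) values are added, so later steps dominate.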
Hi Jana,
I ran PanGenie for genotyping on my own dataset, and it finished without errors or warnings. However, I found that on all 11 chromosomes the genotyping results stopped too early, around 11 Mb or 13 Mb, while the chromosome lengths should be close to 30 Mb.
What is your recommendation to solve this problem?
Thank you!
Honghe
Hello,
I'm trying to genotype short reads using PanGenie, but during the indexing step with the VCF output of minigraph-cactus I'm encountering this error:
terminate called after throwing an instance of 'std::runtime_error'
what(): GraphBuilder::GraphBuilder: Found invalid genotype. Genotypes must be diploid (.|. if missing).
Aborted (core dumped)
Here is a sample from my minigraph cactus vcf :
chr1 13551 >2370>2374 T C,A 60.0 . AC=1,16;AF=0.0434783
=>2370>2371>2374,>2370>2373>2374,>2370>2372>2374;NS=23;LV=0 GT 0 2 2 2 2 2 2 2 . . . . 2 2 0 . 2 . 0 0 2 2 0 1 . 2 .
I'm not working on the human genome, so I can't use the pre-processing pipeline. Does my VCF need pre-processing after minigraph-cactus (apart from vcfbub)?
Thank you beforehand.
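For what it's worth, the constraint behind this error can be sketched as a small check (illustrative Python, not PanGenie's actual code): every sample field needs a two-allele GT, with `.|.` for missing calls, whereas the minigraph-cactus record above carries haploid calls like `0` or `.`.

```python
def invalid_genotypes(vcf_line):
    """Return the GT values that are not diploid in a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    bad = []
    for col in fields[9:]:                        # sample columns start at index 9
        gt = col.split(":")[0]
        if len(gt.replace("/", "|").split("|")) != 2:
            bad.append(gt)                        # haploid calls like "0" or "." fail
    return bad

line = "chr1\t13551\t.\tT\tC,A\t60.0\t.\t.\tGT\t0\t2\t.|."
print(invalid_genotypes(line))   # flags the haploid "0" and "2"; ".|." passes
```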
Hi,
Thank you very much for such a nice tool. The usage says:
PanGenie [options] -i <reads.fa/fq> -r <reference.fa> -v <variants.vcf>
I found that the paper (https://www.biorxiv.org/content/10.1101/2020.11.11.378133v1) says: "we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation..."
May I ask whether PanGenie can only use short reads? Can long reads (such as Nanopore or PacBio) be used for <reads.fa/fq>? What is the advantage over second-generation (short-read) data?
The genotype in the output file is the genotype of the reads (i.e. -i <reads.fa/fq>) corresponding to the sample, right?
Thanks.
Running PanGenie with a Jellyfish database in .jf format and multiple threads (-t) produces randomized output.
Executing it with either a .jf file or multiple threads alone works fine, but not with both options provided.
jellyfish count -m 31 -s 3G -p 126 -c 7 -C -L 1 --if <path-segments.fasta> <input.fasta> -o input.jf
PanGenie -i input.jf -r <reference.fa> -v <variants.vcf> -t 2 -o V1
PanGenie -i input.jf -r <reference.fa> -v <variants.vcf> -t 2 -o V2
diff V1_genotyping.vcf V2_genotyping.vcf | wc -l
Both genotyping files should be identical:
diff V1_genotyping.vcf V2_genotyping.vcf | wc -l
0
However, there are many (small) differences:
diff V1_genotyping.vcf V2_genotyping.vcf | wc -l
10790792
The problem seems to be in the JellyfishReader class. I'm not familiar with the Jellyfish source code, but I presume that the line abundance = this->db->check(jelly_kmer); in the function getKmerAbundance cannot be parallelized as is.
I tried guarding the line with a mutex, and this does resolve the problem. However, the locking overhead makes the "Determine unique kmers" section almost as slow as serial execution, so it's definitely not an optimal solution.
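Conceptually (a Python stand-in, not the actual C++ fix), the mutex serializes every lookup into the shared database, which is why the section becomes nearly as slow as single-threaded execution:

```python
import threading

_db_lock = threading.Lock()
_db = {"ACGT": 5}                 # stands in for the Jellyfish database

def get_kmer_abundance(kmer):
    # every thread queues here, so the hot loop is effectively serialized,
    # but results become deterministic again
    with _db_lock:
        return _db.get(kmer, 0)

print(get_kmer_abundance("ACGT"))   # 5
```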
Remark: The bug cannot be observed with the provided test data, because that VCF file only contains a single chromosome. In that case PanGenie will only use a single thread, regardless of how many were specified on the command line.
The PanGenie command line help indicates that a jellyfish database in jf format is an option for input.
-i VAL sequencing reads in FASTA/FASTQ format or Jellyfish database in jf format.
NOTE: INPUT FASTA/Q FILE MUST NOT BE COMPRESSED. (required).
What is the recommended Jellyfish command for creating the .jf file? How does the .jf file compare in size to the uncompressed FASTA or FASTQ file?
Hi, pangenie group,
I ran into a problem when running PanGenie with Singularity: it seems it cannot open the reference FASTA file, even though I can read it myself. The error is listed below. In addition, I'd like to ask how to select an appropriate reference FASTA file: is it also a part of our pangenome?
singularity exec /share/apps/gene/pangenie-1.0.1/pangenie-new-gcc.sif cat /metadata/jellyfish.lib.version
libjellyfish-2.0-2:amd64 2.3.0-4build1 libjellyfish-2.0-dev:amd64 2.3.0-4build1
singularity exec /share/apps/gene/pangenie-1.0.1/pangenie-new-gcc.sif cat /metadata/pangenie.git.version
6e0da25
singularity exec /share/apps/gene/pangenie-1.0.1/pangenie-new-gcc.sif PanGenie -i $input.fastq -r $reference.fa -v $graph.vcf -o test
program: PanGenie - genotyping and phasing based on kmer-counting and known haplotype sequences.
author: Jana Ebler
Files and parameters used:
-c 0
-d 0
-e 3000000000
-g 1
-i $input.fastq
-j 1
-k 31
-o test
-p 0
-r $reference.fa
-s sample
-t 1
-u 0
-v $graph.vcf
Determine allele sequences ...
terminate called after throwing an instance of 'std::runtime_error'
what(): FastaReader::parse_file: reference file cannot be opened.
Aborted
Hi @eblerjana ,
For the newest code in the master branch, I encountered errors like this:
[ 1%] Building CXX object src/CMakeFiles/PanGenieLib.dir/emissionprobabilitycomputer.cpp.o
In file included from /public/home/wjwei/SFTW/pangenie-3.0.0/src/uniquekmers.hpp:9,
from /public/home/wjwei/SFTW/pangenie-3.0.0/src/emissionprobabilitycomputer.hpp:8,
from /public/home/wjwei/SFTW/pangenie-3.0.0/src/emissionprobabilitycomputer.cpp:5:
/public/home/wjwei/SFTW/pangenie-3.0.0/src/kmerpath.hpp:8:10: fatal error: cereal/access.hpp: No such file or directory
8 | #include <cereal/access.hpp>
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [src/CMakeFiles/PanGenieLib.dir/emissionprobabilitycomputer.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/PanGenieLib.dir/all] Error 2
make: *** [all] Error 2
After using conda to create an env, cereal is here:
ll /public/home/wjwei/SFTW/conda/envs/pangenie/include/cereal/
total 515K
-rw-rw-r-- 2 wjwei JBYan 15K Feb 28 2022 access.hpp
drwxr-xr-x 2 wjwei JBYan 4.0K Aug 22 21:48 archives/
-rw-rw-r-- 2 wjwei JBYan 47K Feb 28 2022 cereal.hpp
drwxr-xr-x 2 wjwei JBYan 4.0K Aug 22 21:48 details/
drwxr-xr-x 4 wjwei JBYan 4.0K Aug 22 21:48 external/
-rw-rw-r-- 2 wjwei JBYan 6.6K Feb 28 2022 macros.hpp
-rw-rw-r-- 2 wjwei JBYan 6.5K Feb 28 2022 specialize.hpp
drwxr-xr-x 3 wjwei JBYan 4.0K Aug 22 21:48 types/
-rw-rw-r-- 2 wjwei JBYan 2.4K Feb 28 2022 version.hpp
So, I'm confused about the source-code install.
Sincerely looking forward to your reply.
Best wishes, Wei
Hi Jana,
Since PanGenie relies on exact k-mer matches, I want to know whether it will perform well with input SVs that have imprecise breakpoints. Have you investigated how inaccurate breakpoints impact the genotyping results? Thank you.
Best regards,
Zheng zhuqing
Description:
I'm encountering an issue when running the following command:
PanGenie-index -r test.fa -v test.vcf -t 24 -o pan_test
The error message I'm receiving is:
program: PanGenie - genotyping based on kmer-counting and known haplotype sequences.
author: Jana Ebler
version: v3.0.1
Files and parameters used:
-e 3000000000
-k 31
-o pan_test
-r test.fa
-t 24
-v test.vcf
Determine allele sequences ...
Read reference genome ...
Found 168 chromosome(s) from the reference file.
Read input VCF ...
terminate called after throwing an instance of 'std::runtime_error'
what(): GraphBuilder: number of paths is limited to 254 in current implementation.
The test.vcf file contains 170 samples, but PanGenie supports a maximum of 254 haplotype paths, which corresponds to 127 diploid samples. Can I extract an individual from test.vcf, change all of that individual's genotypes to 1/1, and then proceed with PanGenie-index?
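Mechanically, extracting a single sample just means keeping the nine fixed VCF columns plus one sample column; a hedged sketch follows (whether re-setting genotypes to 1/1 afterwards is statistically sound is a separate question for the maintainers).

```python
def extract_sample(line, sample_index):
    """Keep the 9 fixed VCF columns plus the sample column at sample_index."""
    if line.startswith("##"):
        return line                                  # meta lines pass through
    f = line.rstrip("\n").split("\t")
    return "\t".join(f[:9] + [f[9 + sample_index]])  # works for #CHROM and data lines

header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2"
print(extract_sample(header, 1))   # keeps only sample S2
```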
I want to install PanGenie on an ARM server, but it keeps producing errors. I don't know whether PanGenie supports the ARM architecture; if you know, please tell me. Thanks.
After downloading pangenie from GitHub and installing jellyfish-2.2.10 via conda, the error is:
-- The C compiler identification is GNU 8.5.0
-- The CXX compiler identification is GNU 8.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.4.2")
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.11")
-- Checking for module 'jellyfish-2.0'
-- Package 'jellyfish-2.0', required by 'virtual:world', not found
CMake Error at /usr/share/cmake/Modules/FindPkgConfig.cmake:556 (message):
A required package was not found
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPkgConfig.cmake:778 (_pkg_check_modules_internal)
CMakeLists.txt:11 (pkg_check_modules)
-- Configuring incomplete, errors occurred!
Hi @eblerjana,
I've been working with PanGenie more often since my last issue. I was keen on assessing its performance for genome inference of HG002, so I benchmarked my PanGenie outputs for this sample (obtained after running the tool on both my pangenome graphs, built with minigraph-CACTUS and PGGB) against the truth set from GIAB.
While not as good as with the HPRC graph, the recall is reasonable; the precision, on the other hand, is straight-up bad... After talking with the team at the HPRC here in Santa Cruz, they pointed out that most of the FPs lowering the precision are likely in satellite and other difficult regions (I guess?), and explained that most of a PanGenie run's precision improvement comes from post-processing filtering with an ML approach (I've been reading #14).
Therefore, my question is: how can I run this script for just my single HG002 sample? What I got from issue #14 is that it is particularly beneficial for large cohorts with trio data; is there a way to apply it to a single sample? Also, you pointed to a concept of genotype quality which might work in the absence of that information; how can I assess that?
Thanks in advance. Also, looking online I came across these two links:
Are they in any way related to how the ML approach works or what it is based on? This is just for my understanding. Thanks again!
I have VCF files generated by aligning FASTQ files to the pan-sequences FASTA. The resulting BAM files were used in CNVcaller to produce a VCF file. Can I use this VCF file for PanGenie? I read your paper on PanGenie, hence I wondered if I may ask you how to start to achieve this.
Is there a way to generate a joint-genotyped VCF with PanGenie? If I understand the software correctly, only the variants in the pangenome reference are genotyped across all samples, while any non-pangenome variants detected are called within individual samples.
Hi there,
My aim is to get a genome inference for a human sample (HG005) to benchmark my graph in terms of SV calling using PanGenie. I assembled it both with minigraph-CACTUS (which I'm testing right now) and with pggb.
With that said, although the pangenomes are quite small (only five individuals), after correctly producing the .vcf files for both of them, I run into an error when launching from the Snakefile. Below is the output for the command:
File "Snakefile", line 13
rule all:
^
SyntaxError: invalid syntax
Now, the reason I'm doing this is the non-overlapping variants requirement. Since my .vcf files weren't generated using your pipeline, I'm assuming they don't conform to the specific standards you mentioned (particularly the third one); hence why I proceeded this way.
One last thing: I'm not very experienced with Snakefiles, and from looking online I've seen they are launched by invoking python3; is that the correct way? Let me know, thanks in advance!
P.S. Attached is a version of the .yaml edited to conform to the inputs for the script (config.log).
Hello,
When I try to run PanGenie using my VCF file, I get an error message. Here is the error message I receive:
GraphBuilder: skip variant at chr1:1000404 since it is contained in a previous one.
PanGenie:/biosoft/pangenie/src/graph.cpp:80: void Graph::add_variant_cluster(std::vector<std::shared_ptr >*, std::vector<std::vector<std::__cxx11::basic_string > >&, bool): Assertion `defined_alleles.size() == (variant_ids[i].size()+1)' failed.
/var/spool/job55890972/slurm_script: line 30: 11280 Aborted
I ensured the reference genome matches the variants in the VCF file. However, I continue to encounter this error.
Would you be able to assist me in understanding what is causing this problem? Could you please let me know if there are any specific requirements or formats for the VCF file that I have overlooked?
Thank you for your help.
Best,
Hongbo
Hello, authors,
In the demo, 'test-variants.vcf' was a regular variant VCF file. I am confused about which step uses the graph pangenome format shown in Extended Data Fig. 1. I used the 'pangenome-graph-from-assemblies' pipeline to generate a callset.vcf file. What should I do next? Should I use vcfbub to convert it into a bubble file and then perform genotyping?
I would appreciate your help.
Tian Tian
Description:
I'm encountering an issue when running PanGenie:
Files and parameters used:
-a 0
-c 0
-d 0
-e 3000000000
-g 0
-i /BIGDATA2/scau_jyang_1/software/EVG-main/test/genotype/PanGenie/test1/test1.fq
-j 10
-k 31
-o test1
-p 0
-r /BIGDATA2/scau_jyang_1/software/EVG-main/test/convert/convert.test.fa
-s test1
-t 10
-u 0
-v /BIGDATA2/scau_jyang_1/software/EVG-main/test/genotype/PanGenie/test1/test1.vcf
Determine allele sequences ...
Found 1 chromosome(s) from the reference file.
Identified 769 variants in total from VCF-file.
#### Memory usage until now: 0.008824 GB ####
Write path segments to file: test1_path_segments.fasta ...
Found 1 chromosome(s) in the VCF.
#### Memory usage until now: 0.008824 GB ####
Count kmers in reads ...
Histogram peak: 1 (3323)
Computed kmer abundance peak: 1
Count kmers in genome ...
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::unbounded_array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 17333333368 bytes of memory.
I am confused by this issue and would appreciate any guidance on how to resolve it.
Hi,
Thanks for your work on this program. The method seems really elegant and we've had some really exciting results from it so far.
Here is a brief outline of our preliminary analysis:
I have a couple of questions to make sure I'm getting the most out of the program:
I initially had good results with minigraph-cactus for generating a comprehensive VCF file. However, I read on the PanGenie issues page that this is not a recommended program. There was a reference to vcfbub, but it's not clear to me what steps need to be taken to go from the reference + long-read assemblies to an appropriate pangenome graph. I didn't notice any obvious issues with the subsequent short-read genotyping results from minigraph-cactus, but I'm worried that I'm not getting the best results.
I was trying to be quite conservative with the genotyping results: I used the GQ flag as mentioned in the paper, and then also applied a minimum KC of 7 per sample. Is this a reasonable filtering approach?
I noticed that there is an experimental phasing option. We are quite interested in the haplotype structure of our data, and initially we used SHAPEIT (using the HiFi individuals as reference haplotypes) to phase the population data. Are there any more details available about PanGenie phasing? Is it worth applying PanGenie phasing prior to SHAPEIT, perhaps? Is it worth combining PanGenie genotyping with read-based phasing programs like WhatsHap?
Variants in the input VCF that are less than the k-mer size apart are internally merged by PanGenie and treated as a single variant. Alleles of this merged variant are composed of combinations of the original variants' alleles (defined by the paths). The number of unique k-mers per allele is computed only for this merged variant. When writing the output VCF, these merged variants are converted back to their original representation (recreating the original entries). The AK counts for each original allele are then computed by summing the unique k-mers of all combined alleles that include that particular allele. The problem is that the same k-mer can be counted several times when it occurs on several combined alleles, leading to AK values that exceed UK. There is not really any proper way of computing AK counts for the original alleles.
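The double counting can be illustrated with a toy example (hypothetical k-mer sets): summing unique k-mers over every merged allele that contains an original allele counts shared k-mers more than once, so AK can exceed UK.

```python
# merged alleles are combinations of original alleles (here A1 with B1 or B2),
# each with its set of unique kmers
merged_allele_kmers = {
    ("A1", "B1"): {"k1", "k2"},
    ("A1", "B2"): {"k2", "k3"},   # k2 occurs on both merged alleles
}

# UK: unique kmers over the whole merged variant
uk = len(set().union(*merged_allele_kmers.values()))

# AK for original allele A1: sum over every merged allele containing A1,
# so the shared kmer k2 is counted twice
ak_A1 = sum(len(kmers) for alleles, kmers in merged_allele_kmers.items()
            if "A1" in alleles)

print(uk, ak_A1)   # 3 4 -> AK exceeds UK because k2 is counted twice
```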
Hi,
When I tried to reproduce your pangenome graph, I noticed several errors in the vcf-merging/pangenome-graph-from-assemblies Snakemake file, mainly in generating the trio sample list and PED file.
According to the function parse_trios in scripts/mendelian-consistency.py:
def parse_trios(ped_file, samples):
    samples_to_include = set()
    with open(samples, 'r') as listing:
        for line in listing:
            samples_to_include.add(line.strip())
    trios = {}
    for line in open(ped_file, 'r'):
        if line.startswith('#'):
            continue
        fields = line.split()
        # map trio_name -> [child, parent1, parent2, nr consistent, nr_inconsistent, nr_untyped, nr total]
        if any([s not in samples_to_include for s in fields[1:4]]):
            continue
        trios[(fields[1], fields[2], fields[3])] = [fields[1], fields[2], fields[3], 0, 0, 0, 0]
    return trios
the trio sample list (results/trio-samples.txt) should look like:
HG00731
HG00732
HG00733
HG00512
HG00513
HG00514
NA19238
NA19239
NA19240
However, the Snakemake file generated a file like:
Puerto Rican
HG00731
HG00732
HG00733
Chinese
HG00512
HG00513
HG00514
Yoruban
NA19238
NA19239
NA19240
And the PED file (results/trios.ped) should look like:
Puerto_Rican HG00733 HG00731 HG00732
Chinese HG00514 HG00512 HG00513
Yoruban NA19240 NA19238 NA19239
instead of:
Puerto Rican Puerto Rican HG00731 HG00732
Chinese Chinese HG00512 HG00513
Yoruban Yoruban NA19238 NA19239
(Also, trio names should not contain spaces)
To solve the problems above, you may need to modify some lines in the Snakemake file:
#################################################################
# 5) Check mendelian consistency for trios and construct graph
#################################################################
# generate a file specifying the trio relationships
rule generate_ped_file:
    output:
        "{outdir}trios.ped"
    run:
        with open(output[0], "w") as ped_output:
            for trio in config['trios']:
                father = config['trios'][trio][0]
                mother = config['trios'][trio][1]
                # Original:
                # ped_output.write('\t'.join([trio, trio, father, mother]) + '\n')
                # MODIFIED:
                child = config['trios'][trio][2]
                ped_output.write('\t'.join([trio, child, father, mother]) + '\n')

rule generate_samples_file:
    output:
        "{outdir}trio-samples.txt"
    run:
        with open(output[0], "w") as sample_output:
            for trio in config['trios']:
                # Original:
                # sample_output.write(trio + '\n')
                # MODIFIED:
                for sample in config['trios'][trio]:
                    sample_output.write(sample + '\n')
Hi pangenie team,
First of all, thank you very much for developing such a beautiful tool for the community.
I am wondering: instead of whole-genome sequencing data, can PanGenie also work with genotyping-by-sequencing (GBS) data? In plant breeding, people normally do GBS to save money, since there might be hundreds of plants that need to be genotyped.
Looking forward to your answer and thank you very much in advance.
Best wishes,
Yutang
Any chance of pushing the singularity image to a cloud service?
It would be great for those of us who try to integrate this tool into nextflow genotyping pipelines.
Thank you.
As the title says, the usage output of the PanGenie-index command in the newest master branch is:
program: PanGenie - genotyping based on kmer-counting and known haplotype sequences.
author: Jana Ebler
version: v3.0.0
usage:
PanGenie-index [options] -i <reads.fa/fq> -r <reference.fa> -v <variants.vcf> -o <index-prefix>
options:
-e VAL size of hash used by jellyfish (default: 3000000000).
-k VAL kmer size (default: 31).
-o VAL prefix of the output files. NOTE: the given path must not include non-existent folders.
-r VAL reference genome in FASTA format. NOTE: INPUT FASTA FILE MUST NOT BE COMPRESSED.
-t VAL number of threads to use for kmer-counting (default: 1).
-v VAL variants in VCF format. NOTE: INPUT VCF FILE MUST NOT BE COMPRESSED.
Error: option -v is mandatory.
The usage info shows -i <reads.fa/fq>, but it's actually not needed there. It could be a trivial mistake :)
Best, Wei
Hello,
I've encountered the following error while running a program:
The input file used is the output from minigraph-cactus.
When I created the input VCF using only 3 genome sequences, the calling worked well. However, after adding 200 more genomes, I ran into the following error.
Can anyone help me resolve this issue?
Thank you.
$ singularity exec $programs/pangenie/container/pangenie.sif PanGenie -i 10-5.fastq -r $ref -v ./pangenome.vcf -l 31 -j 4 -t 4 -o 10-5 -s 10-5
program: PanGenie - genotyping and phasing based on kmer-counting and known haplotype sequences.
author: Jana Ebler
Files and parameters used:
-a 0
-c 0
-d 0
-e 3000000000
-g 0
-i 10-5.fastq
-j 4
-k 31
-o 10-5
-p 0
-r ./cw_final.fasta
-s 10-5
-t 4
-u 0
-v ./pangenome.vcf
Determine allele sequences ...
Found 11 chromosome(s) from the reference file.
terminate called after throwing an instance of 'std::runtime_error'
what(): VariantReader: number of paths is limited to 254 in current implementation.
[1] 229709 abort (core dumped) singularity exec $programs/pangenie/container/pangenie.sif PanGenie -i -r -
The command line help is accurate (reads.fa/fq), but since uncompressed read files are uncommon, it would be helpful to state that requirement more prominently, e.g.:
-i VAL sequencing reads in FASTA/FASTQ format or Jellyfish database in jf format (required).
NOTE: INPUT FASTA/Q FILE MUST NOT BE COMPRESSED
Hi again,
I'm having serious trouble processing the PGGB .vcf files... At first the tool pointed out that haplotypes were unphased; I fixed this by collapsing the two columns of each individual's haplotypes into a single sample column using the following awk command:
awk -F '\t' '/^#/ {print;next;} {OFS="\t";for(i=j=10; i < NF; i+=2) {$j = $i"|"$(i+1); j++} NF=j-1}1' pangenome_ref_guided_GRCh38.vcf > pangenome_ref_guided_GRCh38-phased.vcf
In fact, compared to minigraph-CACTUS, PGGB outputs a .vcf file where the haplotypes of each individual in the graph are split into two columns, which leads PanGenie to think they are unphased. Anyway, I then proceeded to fix the column headers for the respective samples, but despite running to completion the process still returns an empty .vcf.
I'm not sure what the cause was, and after talking with Glenn and looking online I realized I was missing some pre-processing steps, namely removing complex bubbles and decomposing nested alleles with vcfbub and vcfwave, respectively.
I ran the following command:
vcfbub -l 0 -r 10000 -i pangenome_ref_guided_GRCh38-phased.vcf.gz | vcfwave -L 10000 -t 16 -n | bgzip -c -@ 16 > decomposed_and_cleaned.vcf.gz
Unfortunately, I got the same result... So, going back to one of my hypotheses, this could be linked to having # in the filenames as per the PanSN naming standard used by PGGB.
I tested removing everything but the string chr(number) from the #CHROM column; however, even after this attempt the result is always an empty .vcf file.
I thought it could be helpful to re-run the vcfbub and vcflib commands, but this time I got the following error:
error: more sample names in header than sample fields
samples: CHM13 HG002 HG00514 HG00733 HG03492 NA19240
line: chr1 418412 >11061496>11061499 T C 60.0 . AC=1;AF=1;AN=1;AT=>11061496>11061497>11061499,>11061496>11061498>11061499;NS=1;LV=0 GT .|. .|. .|. .|. .|1
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Os { code: 32, kind: BrokenPipe, message: "Broken pipe" })', src/main.rs:199:46
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
This is strange, as I didn't get that error before; maybe because I previously ran the pre-processing before editing with awk, I don't know for sure...
Please, if you have any clue/idea of what is happening I would really appreciate some help. I tried several approaches but all failed and I wonder whether there is something I'm missing.
With that said, I'm happy to attach a screenshot of the original .vcf file if needed and I'm sorry for the long text.
Thanks in advance!
P.S. I always made sure the contig names in the reference and the #CHROM values in the .vcf were consistent in all trials.
Hi Jana,
Previously I thought PanGenie only worked with SNPs, insertions, and deletions. However, I read about a study with your contribution (https://doi.org/10.1101/2021.12.20.472354, "Haplotype-resolved inversion landscape reveals hotspots of mutational recurrence associated with genomic disorders") which used PanGenie to genotype inversions. May I ask how the inversions should be represented in the input VCF file of PanGenie?
Thank you!
Best regards,
Honghe
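For context, my understanding (to be confirmed by the authors) is that PanGenie needs sequence-resolved alleles, so an inversion would be written with the affected segment as REF and its reverse complement as ALT, rather than as a symbolic <INV> allele. A minimal sketch:

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

ref_allele = "ACCGTT"              # hypothetical inverted segment on the reference
alt_allele = revcomp(ref_allele)   # the inversion allele, spelled out explicitly
print(ref_allele, alt_allele)      # ACCGTT AACGGT
```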
It is not obvious how PanGenie handles the output prefix if its path includes non-existent folders. Judging by the runtime errors, I assume PanGenie does not create them.
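Assuming that is indeed the behavior, a small pre-step can create the missing folders before the run (the prefix value here is hypothetical):

```python
import os
import tempfile

def ensure_prefix_dirs(prefix):
    """Create any missing folders in an output prefix like results/run1/sample."""
    parent = os.path.dirname(prefix)
    if parent:
        os.makedirs(parent, exist_ok=True)

# hypothetical -o value inside a scratch directory
tmp = tempfile.mkdtemp()
ensure_prefix_dirs(os.path.join(tmp, "results", "run1", "sample"))
```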
Dear @eblerjana,
Thank you for this useful tool. I have a question about Pangenie installation.
#here is the pangenie installation method I chose
git clone https://github.com/eblerjana/pangenie.git
cd pangenie
conda env create -f environment.yml
conda activate pangenie
mkdir build; cd build; cmake .. ; make
#(pangenie) cche@login02 21:25:03
~/lixin/software/pangenie/build
$
make
[ 37%] Built target PanGenieLib
[ 40%] Built target PanGenie
[ 42%] Built target PanGenie-index
[100%] Built target tests
#but PanGenie can't run after 'make' process
(pangenie) cche@login02 22:07:42
~/lixin/software/pangenie/build
./build/src/PanGenie
program: PanGenie - genotyping based on kmer-counting and known haplotype sequences.
author: Jana Ebler
version: v3.0.0
usage:
PanGenie [options] -f -i <reads.fa/fq> -o
PanGenie [options] -i <reads.fa/fq> -r <reference.fa> -v <variants.vcf> -o
Update: actually I have no problem, and PanGenie can be run successfully.
Thanks and wish you all the best.
Li Xin
Hello.
I am a beginner in this field and I'm currently trying to learn how to use this tool. In regard to the input files for this tool, I have a very naive question.
I would like to use some larger datasets to run this program. I found some relevant information linked from the paper, such as this: https://zenodo.org/record/5607680. However, the FASTA files in this link seem to be categorized by genotypes. How can I extract information about a single chromosome from these files and then obtain corresponding reference and variant files that match each other?
Best.
Hi, is it possible to add PanGenie to Bioconda instead of having to build the tool manually?
Hello,
I have about 200 samples. There are two fastq.gz files (sample.R1.fq.gz and sample.R2.fq.gz) for each sample. Each file is very large (more than 50 GB compressed). How can I unzip them and combine the paired-end files into one file?
Thank you !
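Assuming the goal is the single uncompressed FASTQ that PanGenie expects, one way is to stream-decompress and concatenate each pair without ever holding a file in memory (filenames below are examples); a sketch:

```python
import gzip
import shutil

def combine_pair(r1_gz, r2_gz, out_fq):
    """Decompress two gzipped FASTQ files and concatenate them, streaming."""
    with open(out_fq, "wb") as dst:
        for path in (r1_gz, r2_gz):
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, dst)   # chunked copy; avoids loading 50 GB

# combine_pair("sample.R1.fq.gz", "sample.R2.fq.gz", "sample.fq")
```

Simple concatenation is fine here because PanGenie only counts k-mers from the reads; read order and pairing are not needed.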
Hi Jana,
I have some questions regarding the path-segments.fasta file; please correct me if I am wrong:
- path-segments.fasta is generated by PanGenie-graph -r <reference.fa> -v <variants.vcf> -o <output-prefix> and is not affected by the input FASTA file.
- it is then used for k-mer counting via jellyfish count --if <path-segments.fasta>
As PanGenie-graph seems to be a single-threaded process and takes me ~1 hour to generate this file (likely due to my HDD speed), I try to avoid regenerating it every time by separately using Jellyfish to count k-mers first. But I found that path-segments.fasta is always generated.
Hence, I am wondering if it would be possible to add an option to reuse path-segments.fasta when genotyping individual samples, or to skip generating this file when the input is a .jf database?
Thanks a lot.
Best,
Han
Hi,
Thank you very much for such a nice tool.
I would like to represent not only snps but also structural variants in the input VCF.
Could you please tell me how to do that?
This is my toy example:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=1>
##ALT=<ID=DEL,Description="Deletion">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00732 NA12878
1 15951 . T . PASS . GT 0|0 1|1
End error message:
VariantReader: skip variant at 1:15950 since alleles contain undefined nucleotides:
Best,
Mehmet
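The error suggests PanGenie only accepts alleles spelled out as nucleotide sequences, so a symbolic deletion record would need to be rewritten with the deleted bases made explicit. A pure-Python sketch with a hypothetical helper and a toy reference sequence:

```python
def resolve_deletion(ref_seq, pos, del_len):
    """Return sequence-resolved (REF, ALT) for a deletion of del_len bases
    immediately after 1-based position pos: REF is the anchor base plus the
    deleted bases, ALT is the anchor base alone."""
    ref = ref_seq[pos - 1 : pos + del_len]   # anchor + deleted segment
    alt = ref_seq[pos - 1]                   # anchor base only
    return ref, alt

chrom_seq = "GGTACGTTAG"                      # toy reference sequence
print(resolve_deletion(chrom_seq, 3, 4))      # deletes "ACGT" after the T at pos 3
```

In a real VCF this means replacing the symbolic ALT with the short allele and widening REF accordingly, with the bases pulled from the actual reference FASTA.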
Hi there,
Thanks for this great tool. I noticed that, in both the HPRC reference release paper and the original paper describing this tool, you employ complex pre- and post-PanGenie variant filtering:
"We used vcfbub (version 0.1.0, (Garrison, 2021)) with parameters -l 0 and -r 100000 in order to filter the VCF..."
"We additionally remove all records for which more than 80% of all 88 haplotypes carry a missing allele..."
"we have implemented a decomposition approach which aims at detecting all variant alleles nested inside of multi-allelic top-level records..."
"[We] defined a high quality subset of SV alleles that we can reliably genotype. For this purpose, we applied a machine learning approach similar to what we have presented previously..."
How important are these methods to the quality/reliability of the final genotypes, especially for large SVs (> 50 bp)?
Does this package implement these filters, and if not, is there code that does?
Do you have pre-trained models to use for the machine learning/regression based filter?
Just trying to get high-quality SV genotypes on my diverse short-read dataset, so any help is greatly appreciated.
Thanks again!
-Joe
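Of the quoted filters, the missing-allele threshold is simple enough to sketch in pure Python; the 0.8 cutoff comes from the quoted text, while the parsing helpers and their names are illustrative (not code from the paper pipelines):

```python
# Sketch: drop VCF records where more than 80% of haplotype alleles across
# all samples are missing ('.'), as described in the quoted filtering step.

def missing_fraction(vcf_line):
    """Fraction of haplotype alleles that are '.' across all sample columns."""
    fields = vcf_line.rstrip("\n").split("\t")
    total, missing = 0, 0
    for sample in fields[9:]:              # sample columns start at index 9
        gt = sample.split(":")[0]          # GT is the first FORMAT field
        for allele in gt.replace("|", "/").split("/"):
            total += 1
            if allele == ".":
                missing += 1
    return missing / total if total else 1.0

def keep_record(vcf_line, max_missing=0.8):
    return missing_fraction(vcf_line) <= max_missing
```

Applied line by line to the body of a VCF, this reproduces only the missing-allele step; vcfbub, the decomposition, and the ML-based SV filter are separate tools.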
Hi,
Thanks for developing this great tool. I have one naive question regarding the input fastq files. Is it important to pre-process input fastq files (e.g., trimming by sequencing quality, filtering by length after trimming)? Can we directly use the raw sequencing reads?
Thanks,
Han
It would be useful to be able to work with, e.g., exome data. If variants outside the given region set could be imputed, that would be even better.
Hi
I started to install pangenie using the following code:
git clone https://github.com/eblerjana/pangenie.git
cd pangenie
conda activate pangenie
mkdir build; cd build; cmake .. ; make
Since I use an old Linux HPC with an old glibc, I got this message:
cmake: /lib64/libc.so.6: version `GLIBC_2.17' not found
I installed glibc 2.17 using this command:
conda install -c fridh glibc
Then I reactivated the pangenie environment and ran cmake again in the pangenie directory.
I get:
Segmentation fault
Thank you
Hi,
Thank you for developing PanGenie! Recently, I have been trying to evaluate PanGenie's output VCF with Truvari, but it turns out that entries with multiple sequences (comma-separated) in the ALT field are misclassified as false positives by Truvari. Is there a way or criterion to choose only one sequence from the ALT field, or to find the most likely variant corresponding to such an entry?
For example, given a record like this:
chr1 899960 . TGGGAGGGGTCCGCGCGTCCGCAGTGGGGATGTGCTGCGGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGA TGGGAGGGGTCCACGCGTCCGCAGTGGGGCTGCCGGGAGGGGGGCCGGGTCCACAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCGGGAAGGGGGGGGCCGGGTCCGCAGTGGGGCTGTGCTGCCGGGA,TGGGAGGGGTCCGCGCGTCCGCAGTGGGGCTGCCGGGAGGGGGGCCGGGTCCACAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCGGGAAGGGGGGGGCCGGGTCCGCAGTGGGGCTGTGCTGCCGGGA,TGGGAGGGGTCCGCGCGTCCGCAGTGGGGCTGTGCTGCGGGAAG,CCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCGGGAAGGGGGGGGCCGGGTCCGCAGTGGGGATGTGCTGCCGGGA,TGGGAGGGGTCCGCGCGTCCGCAGTGGGGCTGTGCTGCGGGAAGAGGGGGGCCGGGTCCGCAGTGGGGATGTGCTGCCGGGA,CGGGGA,CGGGGAGGGGGGCGCGGGTCCGTAGTGGGGATGTGCTGCGGGAAGGGGGGGGCCGGGTCCACAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCCGGGAGGGGGGCGCGGGTCCGCAGTGGGGATGTGCTGCGGGAA . PASS AF=0.05,0.05,0.15,0.05,0.1,0.15,0.05;UK=301;AK=52,112,95,44,47,70,54,97;MA=8;ID=chr1-899972-SNV-0-1:chr1-899987-DEL-0-5:chr1-899998-SNV-0-1:chr1-900009-DEL-1-1:chr1-900018-SNV-0-1:chr1-900112-SNV-0-1:chr1-900115-SNV-0-1:chr1-900123-SNV-0-1:chr1-900124-INS-0-1:chr1-900141-SNV-0-1,chr1-899987-DEL-0-5:chr1-899998-SNV-0-1:chr1-900009-DEL-1-1:chr1-900018-SNV-0-1:chr1-900112-SNV-0-1:chr1-900115-SNV-0-1:chr1-900123-SNV-0-1:chr1-900124-INS-0-1:chr1-900141-SNV-0-1,chr1-899989-SNV-0-1:chr1-900001-SNV-0-1:chr1-900003-DEL-0-151,chr1-899955-DEL-0-113:chr1-900112-SNV-0-1:chr1-900115-SNV-0-1:chr1-900123-SNV-0-1:chr1-900124-INS-0-1,chr1-899989-SNV-0-1:chr1-900000-INS-0-2:chr1-900009-DEL-2-115,chr1-899955-DEL-1-189:chr1-900150-SNV-0-1,chr1-899955-DEL-2-37:chr1-900019-SNV-0-1:chr1-900036-SNV-0-1:chr1-900039-SNV-0-1:chr1-900047-SNV-0-1:chr1-900048-INS-0-1:chr1-900056-SNV-0-1:chr1-900150-SNV-0-1:chr1-900153-SNV-0-1 GT:GQ:GL:KC 
3/3:10000:-82.8146,-359.1,-264.5,-291,-281.3,-185.3,-93.41,-258.7,-197.2,0,-138.8,-309.4,-199.5,-35.39,-57.8,-169.7,-335,-273.5,-101.3,-122.4,-72.88,-187.1,-362.2,-315.3,-117.7,-191.2,-194,-120.8,-247.5,-409.4,-342.2,-178.1,-270.1,-210.3,-201.2,-202.4:52
how should I interpret it, and how can I convert it (if possible) into a simple variant record with only one sequence in both the REF and ALT fields?
Thank you very much
Best
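For records like the one above, one pragmatic option is to keep only the allele(s) actually selected by the GT field (e.g. GT 3/3 picks the third comma-separated ALT sequence). A sketch with a helper of my own; it does not recover the nested alleles encoded in the ID annotation, it only extracts the called sequences:

```python
# Sketch: reduce a genotyped multi-allelic VCF record to the allele
# sequences that GT actually calls for one sample. Allele index 0 is REF;
# index N is the N-th comma-separated ALT sequence.

def called_alleles(ref, alt_field, gt):
    """Return the allele sequences called by GT (missing calls are skipped)."""
    alleles = [ref] + alt_field.split(",")
    calls = gt.replace("|", "/").split("/")
    return [alleles[int(c)] for c in calls if c != "."]
```

For a homozygous non-reference call such as 3/3, both returned sequences are identical, so the record can be rewritten as a bi-allelic REF/ALT pair for Truvari.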
PanGenie is a user-friendly and fast genotyper.
I generated a VCF from minigraph-cactus.
When I used PanGenie v2.1.0, it produced results, but too many variants (nearly 36%) were skipped. The command is
~/tool/pangenie-2.1.0/build/src/PanGenie -i $fq -r $ref -v $mcvcf -j $th -s $sample -o $sample -t 20
The log contains messages like
VariantReader: skip variant at Chr01:5201 since it is contained in a previous one.
...
The source VCF file had been converted to the diploid version. Here are the first and second variants.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10
Chr01 5194 >1008>1119 TGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTAAACCCTGAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC TGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC,TGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTAAACCCTAAACCCTAAACTCTAAACCCTAAACCCTAAACTCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTGAACCCTAAATCCTAAACCCTAAACCCTAAACCCTAAACCC,TGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCTGAACCCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCTGAACCCCTAAACCCTGAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC,TAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAACCCTAAACCCTAAACCCTAACCCTAAGCCCTAAACCCTAAACCCTAAGCCCTAAGCCCTAAACCCTAAGCCCTAAACCCTAAGCCCTAAGCCCTAAACCCTAAGCCCTAAGCCCTAAACCCTAAGCCCTAAACCCTAAACCCTAAACCCTAAGCCCTAAGCCCTAAGCCCTAAGCCCTAAACCCTAAGCCCTAAGCCCTAAGCCCTAAACCCTAAGCCCTAAGCCCTAAGCCCTAAGCCCTAAGCCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC,TATT 60 . 
AC=1,1,1,1,1;AF=0.166667,0.166667,0.166667,0.166667,0.166667;AN=6;AT=>1008>1011>1012>1013>1014>1016>1017>1019>1020>1022>1023>1025>1026>1028>1029>1031>1032>1034>1035>1037>1038>1040>1047>1048>1050>1051>1053>1054>1056>1057>1059>1060>1062>1104>1106>1108>1109>1110>1112>1113>1115>1116>1118>1119,>1008>1011>1012>1013>1014>1016>1017>1019>1020>1022>1023>1025>1026>1028>1029>1031>1032>1034>1035>1037>1038>1040>1047>1048>1050>1051>1053>1054>1056>1057>1059>1060>1062>1104>1106>1107>1109>1110>1112>1114>1115>1116>1118>1119,>1008>1011>1012>1013>1014>1016>1017>1019>1020>1022>1023>1025>1026>1028>1029>1031>1032>1034>1035>1037>1038>1040>1047>1048>1050>1051>1053>1054>1056>1057>1059>1060>1062>1063>1065>1066>1068>1069>1071>1072>1074>1075>1077>1078>1080>1081>1083>1084>1086>1087>1089>1090>1092>1093>1095>1097>1099>1100>1102>1103>1104>1106>1107>1109>1111>1112>1113>1115>1117>1118>1119,>1008>1011>1012>1013>1014>1016>1017>1019>1020>1022>1023>1025>1026>1028>1029>1031>1032>1034>1035>1037>1038>1040>1041>1043>1044>1046>1047>1048>1050>1051>1053>1054>1056>1057>1059>1060>1062>1063>1065>1066>1068>1069>1071>1072>1074>1075>1077>1078>1080>1081>1083>1084>1086>1087>1089>1090>1092>1093>1095>1096>1097>1098>1100>1101>1103>1104>1106>1107>1109>1110>1112>1113>1115>1116>1118>1119,>1008>1009>1012>1013>1015>1016>1018>1019>1021>1022>1024>1025>1027>1028>1030>1031>1033>1034>1036>1037>1039>1040>1042>1043>1045>1046>1047>1049>1050>1052>1053>1055>1056>1058>1059>1061>1062>1064>1065>1067>1068>1070>1071>1073>1074>1076>1077>1079>1080>1082>1083>1085>1086>1088>1089>1091>1092>1094>1095>1097>1099>1100>1101>1103>1104>1105>1106>1107>1109>1110>1112>1114>1115>1116>1118>1119,>1008>1009>1010>1119;NS=6;LV=0 GT 5|5 .|. 3|3 1|1 0|0 2|2 .|. 4|4 .|. .|.
Chr01 5202 >1013>1016 G A 60 . AC=1;AF=0.2;AN=5;AT=>1013>1014>1016,>1013>1015>1016;NS=5;LV=1;PS=>1008>1119 GT .|. .|. 0|0 0|0 0|0 0|0 .|. 1|1 .|. .|.
...
I checked the REF/ALT sequences and found that the second variant is redundant.
REF [Sample5]: the base at position 5202 can be inferred from the first variant: TGAACCCT[g]AA
ALT [Sample8]: the base at position 5202 can likewise be inferred from the first variant: TAAACCCT[a]AA
I know that PanGenie only uses non-overlapping variants from the VCF file, but I wonder how to check whether a VCF file can be used as PanGenie input. Are there suggested steps for pre-processing the VCF file?
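As a pre-check, the containment condition from the log message ("contained in a previous one") can be approximated by comparing the reference intervals spanned by consecutive records; this is my reading of the message, not PanGenie's actual code:

```python
# Sketch: a record is flagged if the reference span of its REF allele lies
# entirely inside the span of the record before it. Positions are 1-based
# VCF POS values; a REF of length L covers [pos, pos + L - 1].

def contained_in_previous(prev_pos, prev_ref, pos, ref):
    prev_end = prev_pos + len(prev_ref) - 1  # last reference base covered
    end = pos + len(ref) - 1
    return prev_pos <= pos and end <= prev_end
```

In the example above, the first record starts at 5194 with a 209 bp REF allele (covering up to 5402), so the SNP at 5202 falls inside it and is skipped. Scanning a sorted VCF with this check would flag such records before genotyping.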
Hello,
Since I have interleaved reads ready to use, I was wondering whether it is OK to use them directly instead of concatenating the reads, or whether that would cause problems down the line?
Thanks!
Hello,
My sequencing files (.fastq) are paired-end, but I didn't find a PanGenie parameter for this kind of file. Could you give me an example of how to use paired-end files?
Thanks!
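As far as I understand, PanGenie works from k-mer counts, so read-pairing information is not used and paired-end files can simply be combined into a single input file beforehand. A small sketch (the helper name is mine):

```python
# Sketch: concatenate paired-end FASTQ files into one reads file, as a
# pre-step before passing a single file to PanGenie's reads input.
import shutil

def concat_fastqs(out_path, *fastq_paths):
    """Append each input FASTQ to out_path, in order."""
    with open(out_path, "wb") as out:
        for p in fastq_paths:
            with open(p, "rb") as f:
                shutil.copyfileobj(f, out)  # stream copy, low memory use
```

The same effect can be had with a plain `cat r1.fastq r2.fastq > all.fastq` on the command line.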
Hi,
Thanks for developing this great tool. I would like to use the "HPRC-CHM13 (88 haplotypes)" variant reference. Based on the name, I think the reference genome I should use for genotyping is CHM13, but I am not sure which version of CHM13 to use. Could you provide links to the reference genome you used when generating the variant reference vcf?
Thanks,
Han
pangenie]$ cd build/
build]$ cmake ..
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /xxx/software/miniconda3/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /xxx/software/miniconda3/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Checking for module 'jellyfish-2.0'
-- Found jellyfish-2.0, version 2.2.10
-- /Parastor300s_G30S/zhangjj/software/miniconda3/envs/pangenie/include/jellyfish-2.2.10
-- /parastor300/work01/zhangjj/software/PanGenie/pangenie/build/src
-- Configuring done
-- Generating done
-- Build files have been written to: /parastor300/work01/zhangjj/software/PanGenie/pangenie/build
build]$ make
Scanning dependencies of target PanGenieLib
[ 1%] Building CXX object src/CMakeFiles/PanGenieLib.dir/emissionprobabilitycomputer.cpp.o
[ 2%] Building CXX object src/CMakeFiles/PanGenieLib.dir/copynumber.cpp.o
[ 4%] Building CXX object src/CMakeFiles/PanGenieLib.dir/commandlineparser.cpp.o
[ 5%] Building CXX object src/CMakeFiles/PanGenieLib.dir/columnindexer.cpp.o
[ 7%] Building CXX object src/CMakeFiles/PanGenieLib.dir/dnasequence.cpp.o
[ 8%] Building CXX object src/CMakeFiles/PanGenieLib.dir/fastareader.cpp.o
[ 10%] Building CXX object src/CMakeFiles/PanGenieLib.dir/genotypingresult.cpp.o
[ 11%] Building CXX object src/CMakeFiles/PanGenieLib.dir/histogram.cpp.o
[ 13%] Building CXX object src/CMakeFiles/PanGenieLib.dir/hmm.cpp.o
...
[ 38%] Linking CXX executable PanGenie-graph
/xxx/software/miniconda3/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -ljellyfish-2.0
collect2: error: ld returned 1 exit status
make[2]: *** [src/CMakeFiles/PanGenie-graph.dir/build.make:104: src/PanGenie-graph] Error 1
make[1]: *** [CMakeFiles/Makefile2:166: src/CMakeFiles/PanGenie-graph.dir/all] Error 2
make: *** [Makefile:114: all] Error 2
build]$ jellyfish --version
jellyfish 2.2.10
Does the version of Jellyfish have to be 2.0?