lczech / grenedalf
Toolkit for Population Genetic Statistics from Pool-Sequenced Samples, e.g., in Evolve and Resequence experiments
License: GNU General Public License v3.0
I am currently using grenedalf v0.5.0 with some pooled sequencing data, and noticed that in my outputs the total.missing column is always 0, even when information is missing from my mpileup files. I have attached a screenshot of my mpileup file and of the output from grenedalf diversity to illustrate this.
Hello,
I am trying to run grenedalf frequency on my vcf file generated by PoolSNP from an mpileup file using
GRENEDALF=/gxfs_home/geomar/smomw573/software/grenedalf/bin/grenedalf
VCF=/gxfs_home/geomar/smomw573/work/seasonal_adaptation/analysis/PoolSNP/finalfile.vcf.gz
GENOME=/gxfs_work1/geomar/smomw573/seasonal_adaptation/genome/Eaffinis.Baltic.PseudoRef.Mar22.fasta
OUTPUT=/gxfs_home/geomar/smomw573/work/seasonal_adaptation/analysis/grenedalf-freq
$GRENEDALF frequency --vcf-path $VCF --allow-file-overwriting --reference-genome-fasta-file $GENOME --write-sample-alt-freq > $OUTPUT/grenedalftrial.log
and got the following error:
Cannot iterate over VCF file /gxfs_home/geomar/smomw573/work/seasonal_adaptation/analysis/PoolSNP/finalfile.vcf using the "AD" FORMAT field to count allelic depths, as that field is not part of the VCF file.
PoolSNP output vcf head
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT EA_2009_T1 EA_2009_T2 EA_2009_T3 EA_2009_T4 EA_2011_T1 EA_2011_T2 EA_2015_T1 EA_2015_T2 EA_2015_T3 EA_2015_T4 EA_2022_T1 EA_2022_T2 EA_2022_T3 EA_2022_T4
TRINITY_DN17_c1_g1_i1 185 . G A . . ADP=82.42857142857143;NC=0 GT:RD:AD:DP:FREQ ./.:.:.:.:. 0/1:51:1:52:0.02 0/1:71:3:74:0.04 0/1:94:3:97:0.03 0/1:64:2:66:0.03 0/1:107:2:109:0.02 0/1:109:2:111:0.02 0/0:91:0:91:0.0 0/1:48:2:50:0.04 0/0:96:0:96:0.0 0/1:99:2:101:0.02 0/1:92:1:93:0.01 0/1:79:3:82:0.04 0/1:100:2:102:0.02
TRINITY_DN17_c1_g1_i1 190 . C G . . ADP=83.92857142857143;NC=0 GT:RD:AD:DP:FREQ ./.:.:.:.:. 0/1:48:3:51:0.06 0/1:73:4:77:0.05 0/1:96:4:100:0.04 0/1:67:1:68:0.01 0/1:105:5:110:0.05 0/1:108:2:110:0.02 0/1:92:3:95:0.03 0/1:48:2:50:0.04 0/1:94:4:98:0.04 0/1:104:2:106:0.02 0/1:87:4:91:0.04 0/1:82:3:85:0.04 0/0:102:0:102:0.0
The VCF file was then reformatted with the following Python script:
import gzip

# vcf_file_path and output_file_path are defined elsewhere.
# Open the VCF file in read mode
with gzip.open(vcf_file_path, 'rt', encoding='utf-8') as vcf_file:
    with open(output_file_path, 'a', encoding='utf-8') as output_file:
        line_number = 0
        # Iterate through each line in the file
        for line in vcf_file:
            # Skip header lines
            if line.startswith('#'):
                continue
            # Split the line into fields based on tabs
            fields = line.strip().split('\t')
            # Rewrite the FORMAT column: GT:RD:AD:DP:FREQ -> GT:AD:DP:FREQ
            fsplit = fields[8].split(':')
            fields[8] = ':'.join([fsplit[0], fsplit[2], fsplit[3], fsplit[4]])
            # Rewrite each sample column (9th column forward): merge RD and AD
            # into a single comma-separated AD field holding ref,alt depths
            for idx, fcolumn in enumerate(fields[9:], start=9):
                tmpsplit = fcolumn.split(':')
                fields[idx] = ':'.join([
                    tmpsplit[0],
                    ','.join([tmpsplit[1], tmpsplit[2]]),
                    ':'.join([tmpsplit[3], tmpsplit[4]]),
                ])
            # Write the modified line to the output file
            output_file.write('\t'.join(fields) + '\n')
            line_number += 1
            # Print progress every 50,000 lines
            if line_number % 50000 == 0:
                print(f"Processed {line_number} lines.")
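The per-sample rewrite in the script can also be checked in isolation; a minimal sketch, assuming the PoolSNP GT:RD:AD:DP:FREQ sample layout shown above (merge_rd_ad is a hypothetical helper, not part of the original script):

```python
def merge_rd_ad(sample_field):
    """Convert a PoolSNP sample field GT:RD:AD:DP:FREQ into GT:AD:DP:FREQ,
    where the new AD holds "ref_depth,alt_depth", matching a FORMAT AD
    definition with Number=R (one depth per allele, ref allele first)."""
    gt, rd, ad, dp, freq = sample_field.split(':')
    return f"{gt}:{rd},{ad}:{dp}:{freq}"
```

For example, merge_rd_ad("0/1:51:1:52:0.02") yields "0/1:51,1:52:0.02". Note that fully missing samples like "./.:.:.:.:." become "./.:.,.:.:.", mirroring what the script above produces.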
and we corrected the AD field definition in the VCF header:
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
and now grenedalf frequency is running.
Many thanks,
Jennifer
hi,
thanks for developing this -- it looks really useful.
I had more of a conceptual question. The PoPoolation manuals always suggest subsampling pileups to create equal coverage. Is that something that is still necessary/useful here? And if so, how does one decide either on the subsampling value, or on min/max coverage?
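For reference, PoPoolation's subsampling reduces every position to the same target coverage by drawing reads without replacement. A minimal sketch of that idea applied to per-allele base counts (subsample_counts is a hypothetical helper, not part of grenedalf or PoPoolation):

```python
import random

def subsample_counts(counts, target, seed=None):
    """Subsample per-allele read counts (e.g. A,C,G,T) down to a fixed
    target coverage, drawing reads without replacement. Positions at or
    below the target are returned unchanged (one could also discard them,
    as PoPoolation does with --min-coverage)."""
    total = sum(counts)
    if total <= target:
        return list(counts)
    # Expand counts into a pool of allele indices, draw `target` of them.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    drawn = random.Random(seed).sample(pool, target)
    out = [0] * len(counts)
    for i in drawn:
        out[i] += 1
    return out
```

This makes the trade-off visible: subsampling discards data to equalize coverage, which is why grenedalf's estimators, which account for coverage directly, may make the step unnecessary.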
cheers and many thanks in advance.
Hello,
I'm trying to create a Nix package of your tool (which is used in my computing center), but I'm facing an issue during compilation that I can't explain. I don't know if you have any Nix knowledge, but here are the sources of these work-in-progress packages (genesis and grenedalf):
bzizou/nixpkgs@9a30cd2
bzizou/nixpkgs@1d366e5
Note that Nix provides a way to pin dependencies to specific versions, such as the commit hash of CLI11 for example. This replaces the "manual" build you made in tools/cmake/IncludeHtslib.cmake, which is why it has been disabled by a patch.
The compiler used by default here is gcc 13.2, but I tried with Clang, gcc 11, and gcc 8, and I always get the same error.
Here is the full trace of the error:
[bzizou@bart:~/git/nixpkgs]$ nix-build -A grenedalf
this derivation will be built:
/nix/store/alvqp910rn1ajb0avjsvg59aihp27qi8-grenedalf-0.3.0.drv
building '/nix/store/alvqp910rn1ajb0avjsvg59aihp27qi8-grenedalf-0.3.0.drv'...
Running phase: unpackPhase
unpacking source archive /nix/store/rzv22r10gywmw7d6fjvyv1r1s7z4hdbg-source
source root is source
Running phase: patchPhase
Running phase: updateAutotoolsGnuConfigScriptsPhase
Running phase: configurePhase
fixing cmake files...
cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/share/doc/grenedalf -DCMAKE_INSTALL_INFODIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/include -DCMAKE_INSTALL_SBINDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/sbin -DCMAKE_INSTALL_BINDIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/bin -DCMAKE_INSTALL_NAME_DIR=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0/lib -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_OSX_SYSROOT= -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_STRIP=/nix/store/ln6zld1ia7rxddmxgbpfhrmb42rbxdw8-gcc-wrapper-13.2.0/bin/strip -DCMAKE_RANLIB=/nix/store/ln6zld1ia7rxddmxgbpfhrmb42rbxdw8-gcc-wrapper-13.2.0/bin/ranlib -DCMAKE_AR=/nix/store/ln6zld1ia7rxddmxgbpfhrmb42rbxdw8-gcc-wrapper-13.2.0/bin/ar -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_INSTALL_PREFIX=/nix/store/6iz6bqk2kriqrhry0ha8vjh2wvj55z9x-grenedalf-0.3.0 -DGENESIS_INCLUDE_DIR=/nix/store/jyjd5a9gh4336kb7l40yz5sfpyhzvxr8-genesis-0.30.0/include
CMake Deprecation Warning at CMakeLists.txt:26 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- The CXX compiler identification is GNU 13.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /nix/store/ln6zld1ia7rxddmxgbpfhrmb42rbxdw8-gcc-wrapper-13.2.0/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- grenedalf build type: Release
-- Static linking of system libraries: OFF
-- Building without LTO/IPO support
-- CMAKE_EXE_LINKER_FLAGS
-- GENESIS_LINK_LIBRARIES
-- Configuring done (0.3s)
-- Generating done (0.0s)
CMake Warning:
Manually-specified variables were not used by the project:
BUILD_TESTING
CMAKE_C_COMPILER
CMAKE_EXPORT_NO_PACKAGE_REGISTRY
CMAKE_FIND_USE_PACKAGE_REGISTRY
CMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY
CMAKE_INSTALL_BINDIR
CMAKE_INSTALL_DOCDIR
CMAKE_INSTALL_INCLUDEDIR
CMAKE_INSTALL_INFODIR
CMAKE_INSTALL_LIBDIR
CMAKE_INSTALL_LIBEXECDIR
CMAKE_INSTALL_LOCALEDIR
CMAKE_INSTALL_MANDIR
CMAKE_INSTALL_OLDINCLUDEDIR
CMAKE_INSTALL_SBINDIR
-- Build files have been written to: /build/source/build
cmake: enabled parallel building
cmake: enabled parallel installing
Running phase: buildPhase
build flags: -j8 SHELL=/nix/store/087167dfxal194pm54cmcbbxsfy3cjgn-bash-5.2p26/bin/bash
[ 12%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/convert/frequency.cpp.o
[ 12%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/analyze/fst.cpp.o
[ 12%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/convert/sync.cpp.o
[ 12%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/analyze/diversity.cpp.o
[ 19%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/tools/license.cpp.o
[ 19%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/tools/citation.cpp.o
[ 22%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/simulate/simulate.cpp.o
[ 25%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/tools/version.cpp.o
[ 29%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/tools/wiki.cpp.o
[ 32%] Building CXX object CMakeFiles/grenedalf.dir/src/commands/visualize/afs_heatmap.cpp.o
[ 35%] Building CXX object CMakeFiles/grenedalf.dir/src/main.cpp.o
[ 38%] Building CXX object CMakeFiles/grenedalf.dir/src/options/file_input.cpp.o
[ 41%] Building CXX object CMakeFiles/grenedalf.dir/src/options/file_output.cpp.o
[ 45%] Building CXX object CMakeFiles/grenedalf.dir/src/options/global.cpp.o
[ 48%] Building CXX object CMakeFiles/grenedalf.dir/src/options/poolsizes.cpp.o
[ 51%] Building CXX object CMakeFiles/grenedalf.dir/src/options/table_output.cpp.o
[ 54%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_file_frequency_table.cpp.o
[ 58%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_file_pileup.cpp.o
[ 61%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_file_sam.cpp.o
[ 64%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_file_sync.cpp.o
[ 67%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_file_vcf.cpp.o
/build/source/src/options/variant_file_sam.cpp: In member function 'virtual VariantFileOptions::VariantInputIterator VariantFileSamOptions::get_iterator_(const std::string&) const':
/build/source/src/options/variant_file_sam.cpp:158:5: error: 'SamVariantInputIterator' was not declared in this scope; did you mean 'VariantInputIterator'?
158 | SamVariantInputIterator reader;
| ^~~~~~~~~~~~~~~~~~~~~~~
| VariantInputIterator
/build/source/src/options/variant_file_sam.cpp:159:5: error: 'reader' was not declared in this scope
159 | reader.min_map_qual( sam_min_map_qual_.value );
| ^~~~~~
/build/source/src/options/variant_file_sam.cpp:162:31: error: 'string_to_sam_flag' was not declared in this scope
162 | reader.flags_include_all( string_to_sam_flag( sam_flags_include_all_.value ));
| ^~~~~~~~~~~~~~~~~~
/build/source/src/options/variant_file_sam.cpp:168:12: error: 'make_variant_input_iterator_from_sam_file' was not declared in this scope
168 | return make_variant_input_iterator_from_sam_file( filename, reader );
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 70%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_filter_numerical.cpp.o
/build/source/src/options/variant_file_vcf.cpp: In member function 'virtual VariantFileOptions::VariantInputIterator VariantFileVcfOptions::get_iterator_(const std::string&) const':
/build/source/src/options/variant_file_vcf.cpp:74:12: error: 'make_variant_input_iterator_from_pool_vcf_file' was not declared in this scope
74 | return make_variant_input_iterator_from_pool_vcf_file( filename );
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 74%] Building CXX object CMakeFiles/grenedalf.dir/src/options/variant_filter_region.cpp.o
make[2]: *** [CMakeFiles/grenedalf.dir/build.make:328: CMakeFiles/grenedalf.dir/src/options/variant_file_sam.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/grenedalf.dir/build.make:356: CMakeFiles/grenedalf.dir/src/options/variant_file_vcf.cpp.o] Error 1
/build/source/src/options/variant_filter_region.cpp: In member function 'void VariantFilterRegionOptions::prepare_region_filters() const':
/build/source/src/options/variant_filter_region.cpp:225:22: error: 'genome_locus_set_from_vcf_file' was not declared in this scope
225 | add_filter_( genome_locus_set_from_vcf_file( file ));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/grenedalf.dir/build.make:384: CMakeFiles/grenedalf.dir/src/options/variant_filter_region.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/grenedalf.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
error: builder for '/nix/store/alvqp910rn1ajb0avjsvg59aihp27qi8-grenedalf-0.3.0.drv' failed with exit code 2;
last 10 log lines:
> make[2]: *** [CMakeFiles/grenedalf.dir/build.make:328: CMakeFiles/grenedalf.dir/src/options/variant_file_sam.cpp.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> make[2]: *** [CMakeFiles/grenedalf.dir/build.make:356: CMakeFiles/grenedalf.dir/src/options/variant_file_vcf.cpp.o] Error 1
> /build/source/src/options/variant_filter_region.cpp: In member function 'void VariantFilterRegionOptions::prepare_region_filters() const':
> /build/source/src/options/variant_filter_region.cpp:225:22: error: 'genome_locus_set_from_vcf_file' was not declared in this scope
> 225 | add_filter_( genome_locus_set_from_vcf_file( file ));
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> make[2]: *** [CMakeFiles/grenedalf.dir/build.make:384: CMakeFiles/grenedalf.dir/src/options/variant_filter_region.cpp.o] Error 1
> make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/grenedalf.dir/all] Error 2
> make: *** [Makefile:91: all] Error 2
For full logs, run 'nix log /nix/store/alvqp910rn1ajb0avjsvg59aihp27qi8-grenedalf-0.3.0.drv'.
If you have any idea...
Dear Lucas,
great to see a revision of PoolSeq software for the next round of high throughput analyses. Exciting times ahead.
I know that you have already done a lot of work, but one thing that would help advance the community is the ability to generate Dxy per window, as a measure of divergence between populations.
why?
There is a lot of existing empirical and theoretical work on when Fst vs. Dxy are informative, so it would be nice to have a framework where both Fst and Dxy can be estimated from PoolSeq. This was a limitation of the PoPoolation2 package that, I think, held back PoolSeq analyses from more commonly combining Fst and Dxy.
Currently, in the individual-based VCF world, this is perhaps best done via pixy, which improves on most existing VCF-based approaches for pi and Fst by properly accounting for missing data.
https://pixy.readthedocs.io/en/latest/
Hopefully you can do something similar?
Hi @lczech
I am using grenedalf v0.5.0 to calculate FST between a set of samples. My samples come from 3 different VCF files, so I am using the --sample-group-merge-table-file option to create the different groups (it works pretty well).
Moreover, my window-type is a gff3 file containing genes only.
Here is my command:
grenedalf fst --vcf-path set1.bcf --vcf-path set2.bcf --vcf-path set3.bcf \
--reference-genome-fasta-file genome.fasta \
--sample-group-merge-table-file grouping.list \
--filter-total-only-biallelic-snps \
--window-region-gff gff_genes.gff3 \
--window-type regions \
--pool-sizes pools.list \
--filter-total-snp-min-count 2 \
--method unbiased-hudson \
--window-average-policy available-loci \
--write-pi-tables \
--comparand-list comparand.list \
--file-prefix fst_genes_hudson. --threads 4 --log-file fst_genes_hudson.log
I am running into the following error:
Computing FST between 398 pairs of samples.
Streaming over 25351 total regions across 63 chromosomes.
Error: Cannot convert VcfRecord to Variant, as one of the VcfRecord REF or ALT sequences/alleles is not a single nucleotide (it is not a SNP), at chr1:18018. At the time being, we are not supporting indels and other such variants.
The position mentioned is an indel and it is only present on set1.bcf
CHROM POS ID REF ALT QUAL
chr1 18018 Repeat TA T 4259.94
I also tried the following:
- the --filter-region-bed flag, giving a BED file that does not contain that position - did not work
- the --filter-total-only-biallelic-snps flag - did not work
I had a look at the previous version that I was using (v0.3.0), and there was an implementation to handle multiallelic positions. Has that been removed in the new version? In any case, it should skip non-SNP positions by default, right?
Any idea how I could solve this problem?
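One possible workaround (a sketch, not grenedalf functionality): pre-filter the VCFs to biallelic SNPs before passing them in, e.g. with bcftools view -m2 -M2 -v snps, or with a small script along these lines, where keep_biallelic_snps is a hypothetical helper:

```python
import gzip

def keep_biallelic_snps(in_path, out_path):
    """Copy a VCF, keeping all header lines but only those records whose
    REF and ALT are each a single nucleotide, i.e. biallelic SNPs.
    Indels (like the TA>T at chr1:18018) and multiallelic records
    (comma-separated ALT) are dropped."""
    opener = gzip.open if in_path.endswith(".gz") else open
    bases = {"A", "C", "G", "T"}
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)
                continue
            fields = line.split("\t", 5)
            ref, alt = fields[3], fields[4]
            if ref in bases and alt in bases:
                fout.write(line)
```

The bcftools route is likely faster for large files; the script mainly illustrates what "only biallelic SNPs" means at the record level.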
Cheers,
Vinicius
Hi there,
Excited to use grenedalf, thanks for your work on this! I'm providing multiple VCFs to a grenedalf frequency call to start, and am getting the following error
what(): Malformed VCF file /DATA/home/juliak//pathogen_evo/vcfs/poolseq/Sample_100.Q20.ANNO.vcf: unordered chromosomes and positions going from NODE_9_length_182528_cov_101.096338:62956 to NODE_10_length_150116_cov_77.788002:93243 Aborted (core dumped)
I've taken a look at the vcfs in this region and they look correctly sorted (see below). Is there something about how the chromosome name is parsed that might be throwing this error?
Thanks,
Julia
Dear Lucas,
I have been using your tool ‘Grenedalf’ for analysing pooled data, and I would like to first thank you for putting such a great tool together!
I work with pooled data (coming from shotgun metagenomics sequencing of faecal samples infected with parasitic eggs or pools of eggs directly) and I am interested in analysing both nuclear and mitochondrial data to understand genetic variation within those parasites. I work directly on stool extracts mostly, so any downstream analysis is based on allele frequencies not genotypes.
I have used both VCF and BAM files to calculate Fst and theta pi nucleotide diversity from a couple of samples, and I am getting different results depending on whether I use BAM or VCF files, for the same number of positions to be considered (n=229 mitochondrial variants, after filtering for missingness, quality, etc.). You have explained previously how you recommend using BAM files as opposed to VCFs, and that makes sense, so I will switch to using BAM files moving forward.
First Q: For calculating Fst, when I use the ‘Karlsson’ method (as I am unsure of my pool size = I have hundreds of eggs in each environmental sample) with a VCF, then the number of SNPs that are considered for the analysis is n=93 but it is 95 if I am using the corresponding BAM files. However, the total number of SNPs in the VCF is 229 and they are all biallelic from looking at the VCF.
When I switch to the ‘Kofler’ method, the number of SNPs considered for the BAM files is 156 (!), but for the VCF it is still 93. I am unsure why ~73 positions are considered ‘not SNPs’ by grenedalf (with the Kofler/BAM method), though they appear to be SNPs in my VCF, and why, when I use the ‘Karlsson’ method, the tool throws out some ‘non-biallelic SNPs’ (n=61) as well as the ‘not SNPs’ (n=73).
Are there any assumptions made by grenedalf? Do the BAM files hold information about the reads (like indel positions, even though they have been filtered out) that makes grenedalf ignore them?
The VCFs I have used to test these have been filtered for min/max alleles (2 for both, bcftools), and then the positions have been filtered for high quality (based on custom QUAL distributions and for 0.7 max-missingness).
Does it have to do with the fact that I am using mitogenome data (ploidy of 1)?
Command line for pi: grenedalf diversity --sam-path $i --filter-sample-min-count 2 --filter-sample-min-coverage 1 --filter-region-list $k --window-type chromosomes --pool-sizes 1000 --file-prefix ${nickname}_${name}_pi_
Command line for Fst: grenedalf fst --sam-path $i $j --method karlsson --filter-region-list 0.7_pools.recode.kept.sites --window-type chromosomes --file-prefix ${i}_${j}_
Second Q: Is there a way to calculate within sample pi diversity when you don't know the pool size?
Third Q: Is it possible to generate a collated table from when running Fst with BAMs? I have 88 bams and all the possible combos generate over 3,000 CSVs that have to be merged later...
Thank you for all the fantastic work and responsiveness!
Marina
Hi guys,
This is a tiny issue (and may be deliberate) but thought I would flag it in case not.
The output diversity file (calculating all three measures and putting them in one file), when run per chromosome, has the start and end of each chromosome as zero. Not sure if you had wanted the last base of each chromosome as the end instead. Just thinking it would make the number of SNPs and coverage more immediately interpretable.
All the best,
Jenni
Hi Lucas,
me again. Another important feature which would be very useful is sample-specific masking conditions. Global masking may be useful to mask TEs etc. However, individual libraries may differ in read depth, which may require more specific masking for individual samples.
In our case, we have individual BED/MASK files (in FASTA format) for each sample and currently, I need to break the input sample-wise and run grenedalf for each sample separately, which is quite an effort for >700 samples.
Is there a more elegant way to do that, e.g. by reading the BED files for each sample first and creating a matrix of window-wise masks per sample, which can then be used to calculate averages?
Cheers, Martin
Hi,
I've been plotting files and realised that the SNP windows have the same coordinates for all my pairwise comparisons, but that some have 'nan'. Does this mean that windows are determined as those which hold 100 SNPs across all samples (I asked for 100 SNPs), but when two samples being compared have fewer than 100 SNPs in a given window, the statistic for that window is not calculated for that comparison?
Thanks,
Jenni
Hiya,
I am trying to install grenedalf to test the software on my pool-seq data, but I am having a hard time installing it. I've encountered an issue related to OpenMP. Sorry, I am a bit of a novice when it comes to compiling. Below is my error output:
#######################
Running CMake...
CMake Deprecation Warning at CMakeLists.txt:26 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.
-- The CXX compiler identification is AppleClang 12.0.5.12050022
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- grenedalf build type: RELEASE
-- Static linking of system libraries: OFF
CMake Deprecation Warning at libs/genesis/CMakeLists.txt:28 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.
-- Configuring Genesis
-- CMake version 3.21.0-rc2
-- The C compiler identification is AppleClang 12.0.5.12050022
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- OS: Darwin 20.5.0
-- Genesis version: v0.25.0
-- Building Genesis as a dependency
-- Build type: RELEASE
-- Unity build: FULL
-- C++ compiler: AppleClang 12.0.5.12050022 at /Library/Developer/CommandLineTools/usr/bin/c++
-- C compiler : AppleClang 12.0.5.12050022 at /Library/Developer/CommandLineTools/usr/bin/cc
-- Looking for zlib
-- Found ZLIB: /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk/usr/lib/libz.tbd (found version "1.2.11")
-- Found zlib: /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk/usr/include 1.2.11
-- Using zlib
-- Looking for Threads
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found Threads:
-- Using Threads
-- Looking for OpenMP
-- Using find_package( OpenMP ) patch
-- Try OpenMP C flag = [ ]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-fopenmp=libiomp5]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-fopenmp=libomp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-Xpreprocessor -fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-Xclang -fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [/openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-Qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-xopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [+Oopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-qsmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP C flag = [-mp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [ ]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp=libiomp5]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-fopenmp=libomp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-Xpreprocessor -fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-Xclang -fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [/openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-Qopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-openmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-xopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [+Oopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-qsmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
-- Try OpenMP CXX flag = [-mp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Failed
CMake Warning (dev) at /Applications/CMake.app/Contents/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
The package name passed to find_package_handle_standard_args
(OpenMP)
does not match the name of the calling package (OpenMP_patch). This can
lead to problems in calling code that expects find_package
result
variables (e.g., _FOUND
) to follow a certain pattern.
Call Stack (most recent call first):
libs/genesis/tools/cmake/FindOpenMP_patch.cmake:384 (find_package_handle_standard_args)
libs/genesis/CMakeLists.txt:336 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Could NOT find OpenMP (missing: OpenMP_C_FLAGS OpenMP_CXX_FLAGS)
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND)
-- OpenMP not found
-- You are trying to compile with Clang and OpenMp. This does not seem to work right now. Try installing libiomp-dev
first, or the equivalent for your system.
-- To build without OpenMP support, call CMake with -DGENESIS_USE_OPENMP=OFF
CMake Error at libs/genesis/CMakeLists.txt:389 (message):
Required package OpenMP not found.
####################
Best,
Lenny
Hello,
I have problems compiling the software. I have tried with gcc 7, gcc 11, and gcc 12. I attach the compilation log.
best regards,
David
make.log
Hi Lucas,
thanks for the great new features - particularly the averaging options are very useful. However, I am wondering if you could also implement the possibility to mask sites when averaging by window length. As far as I understand, this is not possible yet.
In our case, we have already called SNPs and want to average diversity stats in windows (by window-length) based on a SYNC file with SNPs only. In addition, we have BED files, which contain the regions that should be masked in the genomes.
When I am using --window-average-policy valid-loci
in combination with --filter-mask-fasta ${PWD}/data/BED/${ID}.mask.gz
I get a lot of missing positions, but (I guess) the correct number of masked sites per window.
When I am using --window-average-policy window-length
in combination with --filter-mask-bed ${PWD}/data/BED/${ID}.bed.gz
I get no (or a few??) missing positions, but also no masked sites per window.
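To sanity-check the denominators, the number of masked bases falling into each window can be pre-computed directly from the BED intervals; a sketch (hypothetical helper, assuming 0-based half-open BED coordinates and non-overlapping intervals on one chromosome):

```python
def masked_bases_per_window(bed_intervals, chrom_len, window_size):
    """Given 0-based half-open (start, end) BED intervals on a single
    chromosome, return the number of masked bases in each fixed-size
    window. Intervals spanning a window boundary are split accordingly."""
    n_windows = (chrom_len + window_size - 1) // window_size
    masked = [0] * n_windows
    for start, end in bed_intervals:
        end = min(end, chrom_len)
        pos = start
        while pos < end:
            w = pos // window_size
            w_end = min((w + 1) * window_size, end)
            masked[w] += w_end - pos
            pos = w_end
    return masked
```

Comparing these per-window totals against the masked-site counts grenedalf reports under each averaging policy should show whether the BED mask is being applied in the window-length denominator at all.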
Thanks, Martin
Hi Lucas,
I'm trying to generate a sync file with the following command:
grenedalf sync --threads 8 --sam-path /shared/projects/invalbo/tiphaine/results/04_Polishing/Reunion/O_rg_marked_realigned.bam
--sam-path /shared/projects/invalbo/tiphaine/results/04_Polishing/Reunion/P_rg_marked_realigned.bam
--reference-genome-dict-file /shared/projects/invalbo/tiphaine/resources/genomes/AalbF5_filtered.dict
--out-dir /shared/projects/invalbo/tiphaine/results/06_Grenedalf
--multi-file-locus-set intersection --with-header --sam-min-base-qual 30 --sam-min-map-qual 20 --sam-flags-include-any /^0x[1-2]+/
--file-suffix Reunion --filter-region NC_085136.1:1-1000000 --compress --allow-file-overwriting
I have tried this with and without the "--compress" option.
I don't really understand some parts of the error message:
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
Failed to read BGZF block data at offset 3493214333 expected 20418 bytes; hread returned 0
Read block operation failed with error 4 after 50 of 379 bytes
Do you have a suggestion for the origin of the problem?
If it comes from my BAM files, I don't understand in what way they could be truncated.
Thank you in advance for your help,
Cheers,
Tiphaine
Hi Lucas,
I've noticed my theta pi numbers were far, far higher than I would anticipate. I'm calculating based on sliding windows of 10000 positions. I found one high pi area to look at what was going on. That region looks as follows:
2L 5776 C 0:0:32:192:0:0
2L 6353 C 0:183:30:0:0:0
2L 6921 G 0:192:0:27:0:0
2L 7542 G 0:169:0:45:0:0
2L 7801 C 0:161:36:0:0:0
2L 8667 G 0:0:29:185:0:0
2L 8774 T 0:27:172:0:0:0
So in a 10000 bp window I have 7 segregating sites with reasonably good and consistent coverage. The per-site nucleotide diversity (based on the naive calculation of per-site expected heterozygosity) would be about 0.00016. However, I get 0.269, which is reasonably close to the value expected from only the 7 segregating sites (~0.23). So it seems the windows are not behaving as I would expect: the statistic is calculated based on just the bi-allelic sites, rather than the entire window. Is there a way to make it acknowledge the rest of the window that I missed?
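For concreteness, here is the toy calculation I did as a Python sketch, using the naive expected heterozygosity per site (this is not grenedalf's actual estimator; sync count columns are A:T:C:G:N:del):

```python
# The 7 segregating sites from the window above, in sync column order A:T:C:G:N:del.
sites = [
    "0:0:32:192:0:0", "0:183:30:0:0:0", "0:192:0:27:0:0",
    "0:169:0:45:0:0", "0:161:36:0:0:0", "0:0:29:185:0:0",
    "0:27:172:0:0:0",
]

def heterozygosity(counts):
    # Naive expected heterozygosity: 1 minus the sum of squared allele
    # frequencies over the four nucleotide counts (A, T, C, G).
    nucs = counts[:4]
    total = sum(nucs)
    return 1.0 - sum((c / total) ** 2 for c in nucs)

hets = [heterozygosity([int(c) for c in s.split(":")]) for s in sites]
pi_per_window = sum(hets) / 10000   # ~0.00018, close to my expected 0.00016
pi_per_snp = sum(hets) / len(hets)  # ~0.26, close to the 0.269 I actually get
```

The two normalizations differ only in the denominator, which is exactly the behavior in question.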
Thank you,
Tyler
Hi,
I am working with a dataset of pooled, naturally occurring populations that have been processed through variant calling (GATK). I have already calculated the different diversity measures per sample, but I was wondering if it would be possible to create groups of samples (e.g. 10 samples for group A and 10 samples for group B) and then calculate per-group diversity measures?
It would of course be possible to merge on fastq or bam level, but that would then lead to lots of extra preprocessing.
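To make the idea concrete, here is a hypothetical sketch (not, as far as I know, an existing grenedalf feature) of pooling sync-style count columns within a group; the effective pool size of such a merged group would then be the sum of the individual pool sizes:

```python
# Pool sync-style count columns (A:T:C:G:N:del) across the samples of one group,
# by summing the six count fields position by position.
def pool_counts(sample_fields):
    """Sum the six count fields across a group of sample columns."""
    parsed = [[int(c) for c in field.split(":")] for field in sample_fields]
    return ":".join(str(sum(col)) for col in zip(*parsed))

# Two hypothetical samples of group A at one position:
group_a = ["0:10:5:0:0:0", "0:8:7:0:0:0"]
pooled = pool_counts(group_a)  # "0:18:12:0:0:0"
```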
Thank you.
It may be an overly cautious gcc 12.3.0 compiler, but during an otherwise successful link I see:
[100%] Linking CXX executable /sw/bioinfo/grenedalf/0.3.0/src/grenedalf/bin/grenedalf
In member function 'update_',
inlined from 'update' at /sw/bioinfo/grenedalf/0.3.0/src/grenedalf/build/genesis_unity_sources/lib/all.cpp:67744:12,
inlined from 'update' at /sw/bioinfo/grenedalf/0.3.0/src/grenedalf/build/genesis_unity_sources/lib/all.cpp:67731:15:
/sw/bioinfo/grenedalf/0.3.0/src/grenedalf/build/genesis_unity_sources/lib/all.cpp:67831:11: warning: '__builtin_memcpy' reading between 0 and 4294967295 bytes from a region of size 0 [-Wstringop-overread]
67831 | memcpy( &buffer_[index], &input[i], length-i );
| ^
/sw/bioinfo/grenedalf/0.3.0/src/grenedalf/build/genesis_unity_sources/lib/all.cpp: In member function 'update':
/sw/bioinfo/grenedalf/0.3.0/src/grenedalf/build/genesis_unity_sources/lib/all.cpp:67725:10: note: at offset [65, 4294967295] into source object 'sbuf' of size 64
67725 | char sbuf[MD5::BlockSize];
| ^
[100%] Built target grenedalf
Note that I build with cmake separately, as I'd prefer to use the non-system modules for liblzma, libbz2, libdeflate, and libz that we already have.
Not very important, but the outputs of fst and frequency both have .csv file extensions, even though the files are tab-delimited.
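In the meantime, anyone loading these files just needs to set the delimiter explicitly; for example, with the Python standard library (read_grenedalf_table is a hypothetical helper name):

```python
import csv

def read_grenedalf_table(path):
    # The grenedalf ".csv" outputs are tab-separated, so set the
    # delimiter explicitly instead of relying on the file extension.
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```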
Hi,
Thank you for the amazing tool. I have a question on how the --sam-min-map-qual filter works. According to the wiki:
--sam-min-map-qual
UINT:UINT in [0 - 90]=0 Needs: --sam-path
Minimum phred-scaled mapping quality score [0-90] for a read in sam/bam/cram files to be considered. Any read that is below the given value of mapping quality will be completely discarded, and its bases not taken into account. Default is 0, meaning no filtering by base quality.
I have aligned the reads to the genome using BWA MEM and processed the bam files with the following command.
/grenedalf/bin/grenedalf frequency --sam-path Eng_ORG.bam Eng_MID.bam Eng_CUR.bam
--reference-genome-fasta-file genome.fa
--sam-min-base-qual 25 --sam-min-map-qual 30 --write-sample-read-depth --write-sample-counts --write-total-read-depth --write-sample-ref-freq --write-total-counts --out-dir ./ --file-prefix p2
My understanding is that only the reads with mapq > 30 will be used to estimate the frequencies, so the read depth of a sample in the output should match the number of reads with a mapq score above 30 in the bam file. Am I correct in assuming this?
I summed up the DEPTH column of a sample in R, and this value does not match the number of reads with mapq > 30 in the bam file.
> sum(fsnp$Eng_MID.DEPTH)
[1] 488310302
samtools view Eng_MID.bam | awk '$5 > 30' | wc -l
27079399
The library size of Eng_MID.fq.gz was ~43 million PE reads
The same is the case with the other samples. Is this because of multimapping? If yes, how can this affect the allele frequency, diversity estimates, and fst?
Thanks in advance
Dear @lczech,
I have four pool sequencing libraries. I have finished SNP calling using bwa + GATK, and a total of 9.8 M SNPs (after filtering for depth and missing rate) have been identified. I found that grenedalf can work directly with BAM and VCF formats, so I first calculated FST using BAM as the input. At the end of the program, the following statistics were reported. How should each statistic be understood, and which number best represents the number of SNPs? If 28568428 represents the number of SNPs, it is significantly different from 9.8 M.
Sample filter summary (summed up across all samples):
Passed: 4036079496
Empty (after counts): 32041114
Above max coverage: 263818
Total filter summary (after applying all sample filters):
Passed: 28568428
Below min coverage: 51188538
Not SNP: 937243195
Finished 2023-10-02 23:48:24
When using VCF as the input, the statistics look much more normal. It seems that 9795814 represents the number of SNPs after filtering by minor allele count > 0.
Processed 39 chromosomes with 9795814 (non-filtered) positions in 47245 windows.
Total filter summary (after applying all sample filters):
Passed: 9795814
Not SNP: 120324
Finished 2023-10-03 10:29:20
Sincerely,
Zhuqing Zheng
Hi @lczech
Very nice tool. When installing it, I get the following errors. If possible, could you provide a binary program, or any advice on installing under an anaconda environment?
[ 21%] Building CXX object libs/genesis/lib/genesis/CMakeFiles/genesis_lib_obj.dir/__/__/__/__/genesis_unity_sources/lib/all.cpp.o
/public/home/zhengzhuqing/01.software/install/grenedalf/build/genesis_unity_sources/lib/all.cpp: In member function ‘void genesis::population::BedReader::read_(std::shared_ptr<genesis::utils::BaseInputSource>, std::function<void(genesis::population::BedReader::Feature&&)>) const’:
/public/home/zhengzhuqing/01.software/install/grenedalf/build/genesis_unity_sources/lib/all.cpp:7696:24 warning: missing initializer for member ‘genesis::population::BedReader::Feature::chrom’ [-Wmissing-field-initializers]
feat = Feature{};
make[3]: *** [libs/genesis/lib/genesis/CMakeFiles/genesis_lib_obj.dir/__/__/__/__/genesis_unity_sources/lib/all.cpp.o] Error 1
make[2]: *** [libs/genesis/lib/genesis/CMakeFiles/genesis_lib_obj.dir/all] Error 2
make[1]: *** [all] Error 2
make: *** [build] Error 2
Sincerely,
Zheng zhuqing
I'm trying to make a cathedral plot for a couple of different chromosomes. A subset of them produce the following error:
Plotting cathedral-plot-scaffold_6.csv
Error: Svg axis label position out of [ 0.0, 1.0 ]
terminate called after throwing an instance of 'std::runtime_error'
what(): Svg axis label position out of [ 0.0, 1.0 ]
Aborted (core dumped)
I have tried using the --clip, --min, and --max flags, but am still getting the same error message. Please let me know if any additional information would be helpful.
Dear Lucas,
I have a suggestion for how to handle missing information. In our recent papers (e.g. here and here), we extended the original SYNC file code by a missing data code. That is, instead of using 0:0:0:0:0:0, which may indicate either missing data (e.g. by not passing SNP calling criteria) or zero coverage, we introduced the .:.:.:.:.:. code, which unambiguously indicates that the position is masked.
It would be very useful if your software could handle this as well.
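On the parsing side the distinction is simple; here is a minimal sketch (hypothetical, not grenedalf's actual parser) of how a reader could treat the two codes:

```python
def parse_sync_sample(field):
    """Parse one sync sample column into counts, or None if masked.

    "0:0:0:0:0:0" is ambiguous (zero coverage, or filtered out), while
    ".:.:.:.:.:." unambiguously marks the position as missing/masked.
    """
    if field == ".:.:.:.:.:.":
        return None  # explicitly masked position
    return [int(c) for c in field.split(":")]
```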
Thanks a lot,
Martin
Hi Lucas,
thanks for this new implementation. This will be very useful!
However, I am running into a problem when I want to exclude samples from a sync file. I provide a list of 737 names with --sample-name-list, and then a list of 13 samples to be excluded with --filter-samples-exclude.
Now, I get the following error which indicates that the sample name list is not correctly updated when the samples are excluded. Is there a quick fix for that?
--sample-name-list(names.txt): Invalid sample names list that contains 737 name entries. This is incongruent with the input file, which contains 724 samples (after filtering, if a sample name filter was given).
Thanks, Martin
Hi Lucas,
me again! I have another suggestion that may be useful as you are restructuring your code right now. One common problem with PoPoolation is that it does not take into account that not only SNPs but also non-polymorphic positions may fail heuristic SNP-calling parameters or a posteriori filtering. This commonly leads to an underestimation of per-site averages for any popgen statistic when the sum of absolute values is simply divided by the window size. To yield correct values, the window sizes need to be adjusted to account for all sites that do not pass the same criteria (heuristic parameters, removal of low-complexity regions, etc.) as the actual SNPs. We describe this issue in the Methods section of our paper (https://academic.oup.com/mbe/article/37/9/2661/5837682).
One way to pass such information is to read either BED or MASK files. The latter are FASTA-type files in which all positions with a 0 are retained and all positions > 0 are ignored; see also the documentation of vcftools:
>1
0000011111222...
>2
2222211111000...
Based on this information, the window-size could be adjusted to calculate correct per-site averages. Hope this is helpful and apologies for swamping you with stuff ;-)
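As a sketch of the idea (a hypothetical helper, assuming a vcftools-style mask string as above, where 0 means retained):

```python
def valid_window_sizes(mask, window_size):
    """Count retained sites (mask digit 0) per non-overlapping window.

    A correct per-site average for a window is then the sum of the
    statistic divided by this count, not by the nominal window size.
    """
    return [
        sum(1 for ch in mask[start:start + window_size] if ch == "0")
        for start in range(0, len(mask), window_size)
    ]

# Toy mask: 5 filtered sites, 10 retained sites, 4 filtered sites.
mask = "1111100000000001111"
sizes = valid_window_sizes(mask, 10)  # [5, 5]
```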
Cheers, Martin
Hi @lczech,
I ran grenedalf and got this error:
Started 2024-06-14 14:57:37
Processing 4 samples
At chromosome chromosome_1
At chromosome chromosome_2
At chromosome chromosome_3
At chromosome chromosome_4
At chromosome chromosome_5
At chromosome chromosome_6
At chromosome chromosome_7
At chromosome chromosome_8
At chromosome chromosome_9
Error: Cannot iterate multiple input sources in parallel, as (at least) one of them is not in the correct sorting order. By default, we expect lexicographical sorting of chromosomes, and then sorting by position within chromosomes. Alternatively, when a sequence dictionary is specified (such as from a .dict or .fai file, or from a reference genome .fasta file), we expect the order of chromosomes as specified there. Offending input source: PS2, going from chromosome_9:58459040 to chromosome_10:6
terminate called after throwing an instance of 'std::runtime_error'
what(): Cannot iterate multiple input sources in parallel, as (at least) one of them is not in the correct sorting order. By default, we expect lexicographical sorting of chromosomes, and then sorting by position within chromosomes. Alternatively, when a sequence dictionary is specified (such as from a .dict or .fai file, or from a reference genome .fasta file), we expect the order of chromosomes as specified there. Offending input source: PS2, going from chromosome_9:58459040 to chromosome_10:6
Aborted (core dumped)
The four bam files come from grenepipe/dedup.
Is this related to the length of the chromosome label? The problem occurs when we change from chromosome 9 to 10.
This was my first try with grenedalf, and I didn't use a reference genome file.
/home/dau1/software/grenedalf-0.5.1/bin/grenedalf diversity --sam-path PN1.bam PN2.bam PS1.bam PS2.bam --pool-sizes pool-sizes.csv --window-type genome --window-average-policy valid-loci --filter-sample-min-count 2 --filter-sample-min-read-depth 4 --no-extra-columns
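For what it's worth, plain lexicographic sorting does place chromosome_10 before chromosome_9, which matches the error message (a quick standalone check, independent of grenedalf):

```python
# Lexicographically, "chromosome_10" sorts before "chromosome_9", because
# "1" < "9" at the first differing character. A BAM sorted in natural order
# (1, 2, ..., 9, 10) therefore violates the default lexicographic
# expectation unless a sequence dictionary (.dict/.fai/.fasta) is given.
names = ["chromosome_9", "chromosome_10"]
print(sorted(names))  # ['chromosome_10', 'chromosome_9']
```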
thank you
osp