V-pipe's Introduction

V-pipe is a workflow designed for the analysis of next-generation sequencing (NGS) data from viral pathogens. It produces a number of results in a curated format (e.g., consensus sequences, SNV calls, local/global haplotypes). V-pipe is written using the Snakemake workflow management system.

Usage

Different ways of initializing V-pipe are presented below. We strongly encourage you to deploy it using the quick install script, as this is our preferred method.

To configure V-pipe, refer to the documentation in config/README.md.

V-pipe expects the input samples to be organized in a two-level directory hierarchy, and the sequencing reads must be provided in a sub-folder named raw_data. Further details can be found on the website. Check the utils subdirectory for mass-importer tools that can assist you in generating this hierarchy.
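
For illustration, a minimal sketch of such a hierarchy for two hypothetical samples (the sample IDs, dates, and read file names below are placeholders; see the website for the exact naming requirements):

mkdir -p samples/patientA/20240101/raw_data samples/patientB/20240102/raw_data
# place each sample's paired-end reads inside its raw_data sub-folder
cp patientA_R1.fastq.gz patientA_R2.fastq.gz samples/patientA/20240101/raw_data/
cp patientB_R1.fastq.gz patientB_R2.fastq.gz samples/patientB/20240102/raw_data/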

We provide virus-specific base configuration files which contain handy defaults for, e.g., HIV and SARS-CoV-2. Set the virus in the general section of the configuration file:

general:
  virus_base_config: hiv

Also see Snakemake's documentation to learn more about the command-line options available when executing the workflow.

Tutorials introducing usage of V-pipe are available in the docs/ subdirectory.

Using quick install script

To deploy V-pipe, use the installation script with the following parameters:

curl -O 'https://raw.githubusercontent.com/cbg-ethz/V-pipe/master/utils/quick_install.sh'
# the downloaded script is not executable by default
chmod +x quick_install.sh
./quick_install.sh -w work

This script will download and install Miniconda, check out the V-pipe git repository (use -b to specify which branch/tag), and set up a work directory (specified with -w) containing an executable script that runs the workflow:

cd work
# edit config.yaml and provide samples/ directory
./vpipe --jobs 4 --printshellcmds --dry-run
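
Once the dry run completes without errors, the actual analysis can be started by dropping --dry-run (the job count is only an example; depending on your setup you may also want to add --use-conda, as in the other examples on this page):

./vpipe --jobs 4 --printshellcmds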

Using Docker

Note: the Docker image is only set up with the components needed to run the workflow for the HIV and SARS-CoV-2 virus base configurations. Using V-pipe with other viruses or configurations might require internet connectivity to fetch additional software components.

Create config.yaml or vpipe.config and then populate the samples/ directory. For example, the following config file could be used:

general:
  virus_base_config: hiv

output:
  snv: true
  local: true
  global: false
  visualization: true
  QA: true

Then execute:

docker run --rm -it -v $PWD:/work ghcr.io/cbg-ethz/v-pipe:master --jobs 4 --printshellcmds --dry-run

Using Snakedeploy

First install mamba, then create and activate an environment with Snakemake and Snakedeploy:

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
conda activate snakemake

Snakemake's official workflow installer, Snakedeploy, can now be used:

snakedeploy deploy-workflow https://github.com/cbg-ethz/V-pipe --tag master .
# edit config/config.yaml and provide samples/ directory
snakemake --use-conda --jobs 4 --printshellcmds --dry-run
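
As before, dropping --dry-run starts the actual run once the configuration looks correct:

snakemake --use-conda --jobs 4 --printshellcmds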

Dependencies

  • Conda

    Conda is a cross-platform package management system and an environment manager application. Snakemake uses mamba as a package manager.

  • Snakemake

    Snakemake is the central workflow and dependency manager of V-pipe. It determines the order in which individual tools are invoked and checks that programs do not exit unexpectedly.

  • VICUNA

VICUNA is a de novo assembly software designed for populations with high mutation rates. It is used to build an initial reference for mapping reads with the ngshmmalign aligner when a references/cohort_consensus.fasta file is not provided. Further details can be found in the wiki pages.

Computational tools

Other dependencies are managed using isolated conda environments per rule. Below we list some of the computational tools integrated in V-pipe:

  • FastQC

    FastQC gives an overview of the raw sequencing data. Flowcells that have been overloaded or have otherwise failed during sequencing can easily be detected with FastQC.

  • PRINSEQ

    Trimming and clipping of reads is performed by PRINSEQ. It is currently the most versatile raw read processor with many customization options.

  • ngshmmalign

    We perform the alignment of the curated NGS data using our custom ngshmmalign aligner, which takes structural variants into account. It produces multiple consensus sequences that include either majority bases or ambiguous bases.

  • bwa

    In order to detect specific cross-contaminations with other probes, the Burrows-Wheeler aligner is used. It quickly yields estimates for foreign genomic material in an experiment. Additionally, it can be used as an alternative aligner to ngshmmalign.

  • MAFFT

    To standardise multiple samples to the same reference genome (say HXB2 for HIV-1), the multiple sequence aligner MAFFT is employed. The multiple sequence alignment helps in determining regions of low conservation and thus makes standardisation of alignments more robust.

  • Samtools and bcftools

    The Swiss Army knife of alignment postprocessing and diagnostics. bcftools is also used to generate the consensus sequence with indels.

  • SmallGenomeUtilities

    We perform genomic liftovers to standardised reference genomes using our in-house Python library of utilities for rewriting alignments.

  • ShoRAH

    ShoRAH performs SNV calling and local haplotype reconstruction using Bayesian clustering.

  • LoFreq

    LoFreq (version 2) is an SNV and indel caller for next-generation sequencing data, and can be used as an alternative engine for SNV calling.

  • SAVAGE and Haploclique

    We use HaploClique or SAVAGE to perform global haplotype reconstruction for heterogeneous viral populations using an overlap graph.

Citation

If you use this software in your research, please cite:

Posada-Céspedes S., Seifert D., Topolsky I., Jablonski K.P., Metzner K.J., and Beerenwinkel N. 2021. "V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput sequencing data." Bioinformatics, January. doi:10.1093/bioinformatics/btab015.

Contributions

* software maintainer ; ** group leader

Contact

We encourage users to use the issue tracker. For further enquiries, you can also contact the V-pipe Dev Team [email protected].


V-pipe's Issues

Running with the tutorial dataset gives all empty files

Hi,

I have tried to run the tutorial dataset (SARS-CoV-2) with the command mentioned on the webpage,
https://cbg-ethz.github.io/V-pipe/tutorial/sars-cov2/

However, after the whole script ran successfully, the generated output files are all empty. Could anyone help me find out where the problem is?

Is it happening in prinseq? Below is the output of prinseq:


The length cutoff is: 200
[prinseq-lite-0.20.4] [01/21/2022 23:24:18] Executing PRINSEQ with command: "perl prinseq-lite.pl -fastq samples/SRR10903401/20200102/extracted_data/R1.fastq -fastq2 samples/SRR10903401/20200102/extracted_data/R2.fastq -out_format 3 -trim_qual_right 30 -trim_qual_left 30 -trim_qual_type min -trim_qual_rule lt -trim_qual_window 10 -trim_qual_step 1 -log samples/SRR10903401/20200102/preprocessed_data/prinseq.out.log -min_len 200 -min_qual_mean 30 -ns_max_n 4 -out_good samples/SRR10903401/20200102/preprocessed_data/R -out_bad null"
[prinseq-lite-0.20.4] [01/21/2022 23:24:18] Parsing and processing input data: "samples/SRR10903401/20200102/extracted_data/R1.fastq" and "samples/SRR10903401/20200102/extracted_data/R2.fastq"
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Done parsing and processing input data
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Input sequences (file 1): 476,632
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Input bases (file 1): 71,758,102
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Input mean length (file 1): 150.55
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Input sequences (file 2): 476,632
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Input bases (file 2): 71,807,572
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Input mean length (file 2): 150.66
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Good sequences (pairs): 0
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Good sequences (singletons file 1): 0 (0.00%)
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Good sequences (singletons file 2): 0 (0.00%)
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Bad sequences (file 1): 476,632 (100.00%)
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Bad bases (file 1): 71,758,102
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Bad mean length (file 1): 150.55
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Bad sequences (file 2): 0 (0.00%)
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] Sequences filtered by specified parameters:
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] trim_qual_left: 180134
[prinseq-lite-0.20.4] [01/21/2022 23:25:05] min_len: 773130

Please let me know,

Best regards,
Wasim

got stuck in this rule convert_to_ref, convert_reference did not work

Job counts:
count jobs
1 convert_to_ref
1

    convert_reference -t gi|1142969405|gb|KY272010.1| -m references/ALL_aln_ambig.fasta -i samples/HFMD/71147/alignments/full_aln.bam -o samples/HFMD/71147/alignments/REF_aln.bam > samples/HFMD/71147/alignments/convert_to_ref.out.log 2> >(tee samples/HFMD/71147/alignments/convert_to_ref.err.log >&2)

/bin/bash: 1142969405: command not found
/bin/bash: gb: command not found
/bin/bash: KY272010.1: command not found
/bin/bash: -m: command not found
/bin/bash: 1142969405: command not found
/bin/bash: gb: command not found
/bin/bash: KY272010.1: command not found
/bin/bash: -m: command not found
/bin/bash: 1142969405: command not found
/bin/bash: gb: command not found
/bin/bash: KY272010.1: command not found
/bin/bash: 1142969405: command not found
/bin/bash: gb: command not found
/bin/bash: KY272010.1: command not found
/bin/bash: -m: command not found
/bin/bash: -m: command not found
usage: convert_reference [-h] -t TO [-v] -m input -i input [-o output] [-p]
[-X] [-H]
convert_reference: error: the following arguments are required: -m, -i
usage: convert_reference [-h] -t TO [-v] -m input -i input [-o output] [-p]
[-X] [-H]
convert_reference: error: the following arguments are required: -m, -i
usage: convert_reference [-h] -t TO [-v] -m input -i input [-o output] [-p]
[-X] [-H]
convert_reference: error: the following arguments are required: -m, -i
usage: convert_reference [-h] -t TO [-v] -m input -i input [-o output] [-p]
[-X] [-H]
convert_reference: error: the following arguments are required: -m, -i
[Fri Oct 30 07:39:20 2020]
Error in rule convert_to_ref:
jobid: 0
output: samples/HFMD/71157/alignments/REF_aln.bam
log: samples/HFMD/71157/alignments/convert_to_ref.out.log, samples/HFMD/71157/alignments/convert_to_ref.err.log (check log file(s) for error message)
conda-env: /Volumes/AKiTiO_duo3/CoxA10/trimmed/match_to_virus_genome/v-pipe-working-dir/.snakemake/conda/77a26a2e
shell:

    convert_reference -t gi|1142969405|gb|KY272010.1| -m references/ALL_aln_ambig.fasta -i samples/HFMD/71157/alignments/full_aln.bam -o samples/HFMD/71157/alignments/REF_aln.bam > samples/HFMD/71157/alignments/convert_to_ref.out.log 2> >(tee samples/HFMD/71157/alignments/convert_to_ref.err.log >&2)

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
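
For what it's worth, the bash messages above (e.g. "1142969405: command not found") suggest that the unquoted pipe characters in the reference name are being interpreted by the shell. Purely as an illustration, the same call with the reference ID quoted (log redirections omitted):

convert_reference -t 'gi|1142969405|gb|KY272010.1|' -m references/ALL_aln_ambig.fasta -i samples/HFMD/71147/alignments/full_aln.bam -o samples/HFMD/71147/alignments/REF_aln.bam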

error for assigning config file by snakemake option

Hey,
I have another issue regarding the use of the config file.
I'm running different tests with different parameters and different sample sets. Therefore it would be great if I could just create different config files and run vpipe using Snakemake's --configfile option. I thought that should be possible, since it is basic Snakemake functionality.
However, after some tests I figured out that using this option always throws an error, even if I pass the path to the default path/to/vpipe/workdir/vpipe.config with --configfile, i.e. exactly the same file that is definitely used when running vpipe without specifying a config file. I get YAML format errors; that's why I tested multiple times whether it really is the same file I'm passing. I really can't think of any reason for this anymore.
Do you have an idea what could be the cause? Or have I indeed just misunderstood some Snakemake logic?

Those are the errors I get:

yaml.parser.ParserError: expected '<document start>', but found '<scalar>'
  in "vpipe.config", line 2, column 1
[...]
During handling of the above exception, another exception occurred:
[...]
snakemake.exceptions.WorkflowError: Config file is not valid JSON or YAML. In case of YAML, make sure to not mix whitespace and tab indentation.
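
A possible explanation: the traceback reports that the file is not valid YAML, and vpipe.config uses INI-style [section] headers (see the vpipe.config example in another issue on this page), whereas Snakemake's --configfile expects YAML or JSON. A sketch of passing an equivalent YAML file instead (the file name and the keys shown are assumptions based on the examples on this page):

cat > my_config.yaml <<'EOF'
general:
  aligner: bwa
output:
  snv: true
EOF
./vpipe --configfile my_config.yaml --dry-run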

Warning: cannot detect conda environment

Hi
The following problem I got with V-pipe master (HIV-1 analysis): when I try to initialise the project I have the following Warning:
Warning: cannot detect conda environment V-pipe project initialized!

Conda V-pipe environment is activated.

And after trying --dryrun, the following error came up:
$ ./vpipe --dryrun
VPIPE_BASEDIR = /Users/sviat/V-pipe
Migrating .snakemake folder to new format...
Migration complete
Building DAG of jobs...
WorkflowError:
WorkflowError:
MissingInputException: Missing input files for rule gunzip:
samples/ADA1038B/20210521/extracted_data/R1.fastq.gz
CyclicGraphException: Cyclic dependency on rule convert_to_ref.
MissingInputException: Missing input files for rule sam2bam:
samples/ADA1038B/20210521/alignments/REF_aln.sam

ADA1038B is the first sample in my sample list.
Samples prepared according to the manual:
v-pipe_workdir/samples/ADA1038B/20210521/raw_data/ADA1038B_R1.fastq
v-pipe_workdir/samples/ADA1038B/20210521/raw_data/ADA1038B_R2.fastq
...

If the files are in *.fastq.gz format, the error message looks a bit different:
Building DAG of jobs...
MissingInputException in line 10 of /Users/sviat/V-pipe/rules/quality_assurance.smk:
Missing input files for rule gunzip:
samples/ADA1038B/20210521/extracted_data/R1.fastq.gz

Can you please help me with this issue?
Thank you!
P.S. The SARS-CoV-2 V-pipe branch works perfectly!

Primer trimming

Does V-pipe trim primers? I can't find the information. (I only see this parameter in the config file: primers_file=)

If yes, is it done post-alignment?

shorah sampling status

I am performing local haplotype reconstruction on more than 100 samples with shorah.

Is there a way to check whether the MCMC sampling has converged for a particular region or overall?

Cause of (shorah_regions) error?

Any idea about the cause of the following error?
Best

[Mon Feb  1 23:30:09 2021]
Finished job 16.
21 of 27 steps (78%) done

[Mon Feb  1 23:30:09 2021]
localrule shorah_regions:
    input: variants/coverage_intervals.tsv
    output: samples/10919588/K-5771/variants/coverage_intervals.tsv, samples/10919594/K-5770/variants/coverage_intervals.tsv
    jobid: 10

VPIPE_BASEDIR = /home/x/V-pipe
Job counts:
	count	jobs
	1	shorah_regions
	1
[Mon Feb  1 23:30:10 2021]
Error in rule shorah_regions:
    jobid: 0
    output: samples/10919588/K-5771/variants/coverage_intervals.tsv, samples/10919594/K-5770/variants/coverage_intervals.tsv

RuleException:
FileNotFoundError in line 65 of /home/x/V-pipe/rules/snv.smk:
[Errno 2] No such file or directory: 'samples/10919588-K/5771/variants/coverage_intervals.tsv'
  File "/home/x/V-pipe/rules/snv.smk", line 65, in __rule_shorah_regions
  File "/home/x/miniconda3/envs/V-pipe/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Exiting because a job execution failed. Look above for error message
[Mon Feb  1 23:30:19 2021]
Finished job 3.
22 of 27 steps (81%) done
[Mon Feb  1 23:30:22 2021]
Finished job 4.
23 of 27 steps (85%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

vpipe.config

[input]
reference = references/ref.fasta
samples_file = samples.tsv
paired = True

[output]
snv = True
local = True
global = False

[hmm_align]
leave_msa_temp = true

[general]
aligner = bwa
threads = 10
snv_caller = shorah

[preprocessing]
extra = -ns_max_n 4 -min_qual_mean 30 -trim_qual_left 30 -trim_qual_right 30 -trim_qual_window 10

The order of jobs does not reflect the order of execution. Any help?

I have run "./vpipe --dryrun" and got this:

VPIPE_BASEDIR = /users/Carlotta/v-pipe/testing/V-pipe
Building DAG of jobs...
Job stats:
job      count    min threads    max threads
all          1              1              1
total        1              1              1

[Tue Oct 5 15:46:26 2021]
localrule all:
jobid: 0
resources: tmpdir=/var/folders/rb/rdcsh0j507n5b1ctytvr7xy00000gn/T

Job stats:
job      count    min threads    max threads
all          1              1              1
total        1              1              1

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

What does it mean?
Any help is really appreciated, thanks a lot!
Carlotta

posterior values using lofreq rule?

Hey, thanks for all the effort you put in this pipeline!

Because I have to call variants in regions with quite low coverage, I recently tried running the v-pipe SARS-CoV branch using lofreq as the SNV caller, defined in the config file as described in the documentation. After some issues I also adjusted the "coverage" value in the "coverage_intervals" section to 10 (to fit the lofreq filter).

In the visualization, however, I only get posterior scores of 1 for every variant. Since it also calls the ShoRAH rule after lofreq I was wondering why this is the case but couldn't find anything so far.
Is this an expected behaviour?
Is there a way to adjust the snv rule to get the posterior scores also when using lofreq as a snv caller?
Do you maybe have any recommendations how to apply certain frequency filtering on lofreq variants, regardless of whether they can be included in the visualization afterwards or not? (I think it's calculating a p-value but I couldn't find how to make use of this in v-pipe)

Any hints where I could start to look at, would be highly appreciated. Thanks!

Error in rule hmm_align: HIV in Vpipe

Hello, I would like to ask what this error means and how I can solve it.

Activating conda environment: .snakemake/conda/0a11ef3f67c5c382159134f72bbed3ac_
ERROR: Count argument '2-testing-work-results-2VM-sim-20170904' is not an integral/floating point value! Aborting.
[Thu Feb 2 15:03:36 2023]
Error in rule hmm_align:
jobid: 3
input: /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references/initial_consensus.fasta, /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/preprocessed_data/R1.fastq, /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/preprocessed_data/R2.fastq
output: /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/full_aln.sam, /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/rejects.sam, /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references/ref_ambig.fasta, /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references/ref_majority.fasta
log: /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/ngshmmalign.out.log, /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/ngshmmalign.err.log (check log file(s) for error details)
conda-env: /home/diamantev/HIV/Vpipe/working_2/.snakemake/conda/0a11ef3f67c5c382159134f72bbed3ac_
shell:

        CONSENSUS_NAME=/home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904
        CONSENSUS_NAME="${CONSENSUS_NAME#*/}"
        CONSENSUS_NAME="${CONSENSUS_NAME//\//-}"

        # 1. clean previous run
        rm -rf   /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments
        rm -f    /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references/ref_ambig.fasta
        rm -f    /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references/ref_majority.fasta
        mkdir -p /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments
        mkdir -p /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references

        # 2. perform alignment # -l = leave temps
        ngshmmalign -v  -R /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/references/initial_consensus.fasta -o /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/full_aln.sam -w /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/rejects.sam -t 1 -N "${CONSENSUS_NAME}"  /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/preprocessed_data/R1.fastq /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/preprocessed_data/R2.fastq > /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/ngshmmalign.out.log 2> >(tee /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/alignments/ngshmmalign.err.log >&2)

        # 3. move references into place
        mv /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/{alignments,references}/ref_ambig.fasta
        mv /home/diamantev/HIV/Vpipe/working_2/testing/work/results/2VM-sim/20170904/{alignments,references}/ref_majority.fasta
        
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-02-02T150334.285845.snakemake.log

Conflict in quality_assurance.smk with custom datadir-path

When defining a datadir different from samples/ in the config.yaml, e.g.

input:
      reference: ../../resources/reference/ancestor_consensus.fasta
      datadir: ../../resources/samples/Experiment3

the function len_cutoff() in quality_assurance.smk takes the wrong parts of the split string:

def len_cutoff(wildcards):
    parts = wildcards.dataset.split("/")
    patient_ID = parts[1] # should be: patient_ID = parts[-2]  
    date = parts[2] # should be: date = parts[-1]

Instead of

 patient_ID = parts[1] 
 date = parts[2] 

it should be

patient_ID = parts[-2]  
date = parts[-1]

Expose parameters in config file?

Do we plan to have the parameters used in the various rules exposed in the config file? Or do we envision users who want to change some of them to do this in some other way?

[UX issue] multiple GFF

There are multiple places to specify GFFs in V-pipe and this is confusing to users.

  • section input, property gff_directory : a directory with multiple GFFs used for the visualisation
    • visualisation supports using multiple GFF, e.g.: for genes/open reading frames, for mature product/proteases-clived, for domains on the proteins.
  • section input, property metainfo_file : a YAML file giving extra information about the GFF inside the directory above
    • as it is used only for visualisation, the metainfo is free-form.
  • section frameshift_deletions_checks, property genes_gff : a single GFF, specifically genes, used only by QA.
    • this one is currently completely different from the ones used for visualisation.

All this leads to confusion for users, see here

VICUNA: conda environment

Conda environment for rule 'initial_vicuna' does not work properly. The package on bioconda, mvicuna, is a slightly different tool with different command-line arguments and interface. Moreover, VICUNA and mvicuna are no longer maintained.
Additionally, the interface for jar files is implemented differently when dependencies are fetched from the bioconda channel.

Vicuna error

Hello,

When I run V-Pipe I get the following error file in the initial consensus folder. The V-Pipe run did not complete. Any assistance would be much appreciated.

[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index consensus.fasta
[main] Real time: 0.020 sec; CPU: 0.004 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 22 sequences (4752 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 1, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 22 reads in 0.005 CPU sec, 0.005 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 4 consensus.fasta ../preprocessed_data/R1.fastq ../preprocessed_data/R2.fastq
[main] Real time: 0.007 sec; CPU: 0.008 sec
INFO 2019-10-14 23:36:08 SamToFastq

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** SamToFastq -I mapped.bam -FASTQ cleaned/R1.fastq -SECOND_END_FASTQ cleaned/R2.fastq -VALIDATION_STRINGENCY SILENT


23:36:09.046 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/matt_hopken/software/V-pipe/.snakemake/conda/b85da07e/share/picard-2.21.1-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Oct 14 23:36:09 MDT 2019] SamToFastq INPUT=mapped.bam FASTQ=cleaned/R1.fastq SECOND_END_FASTQ=cleaned/R2.fastq VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Oct 14 23:36:09 MDT 2019] Executing as matt_hopken@abdoserver1 on Linux 4.15.0-51-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.1-SNAPSHOT
[Mon Oct 14 23:36:09 MDT 2019] picard.sam.SamToFastq done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=514850816
/bin/bash: line 66: vicuna: command not found

Move hard-coded CSS to external cascading stylesheet.

For future development it is inevitable to externalize the CSS to respective stylesheets. This not only improves maintainability and enforces standardization, it also encourages reusability of defined classes. It might also make sense to make full use of CSS 3 to keep up with contemporary Web 11.2 standards.

Best,
Simon

Getting snake error

VPIPE_BASEDIR = /opt/V-dock/V-pipe/workflow
Importing legacy configuration file vpipe.config
MissingSectionHeaderError in line 107 of /opt/V-dock/V-pipe/workflow/rules/common.smk:
File contains no section headers.
file: 'vpipe.config', line: 1
'general:\n'
File "/opt/V-dock/V-pipe/workflow/Snakefile", line 12, in
File "/opt/V-dock/V-pipe/workflow/rules/common.smk", line 259, in
File "/opt/V-dock/V-pipe/workflow/rules/common.smk", line 170, in process_config
File "/opt/V-dock/V-pipe/workflow/rules/common.smk", line 107, in load_legacy_ini
File "/opt/conda/envs/snakemake/lib/python3.10/configparser.py", line 698, in read
File "/opt/conda/envs/snakemake/lib/python3.10/configparser.py", line 1086, in _read

hiv-tutorial works with haploclique, but fails with predicthaplo

Removing output files of failed job predicthaplo since they might be corrupted:
samples/CAP188/4/variants/global/REF_aln.sam
After parsing the reads in file samples/CAP217/4390/variants/global/REF_aln.sam: average read length= -nan 0
First read considered in the analysis starts at position 100000. Last read ends at position 0
There are 0 reads
/usr/bin/bash: line 3:  1586 Segmentation fault      predicthaplo --sam samples/CAP217/4390/variants/global/REF_aln.sam --reference samples/cohort_consensus.fasta --prefix samples/CAP217/4390/variants/global/predicthaplo/ --have_true_haplotypes 0 --min_length 0 2> >(tee -a samples/CAP217/4390/variants/global/predicthaplo.err.log >&2)
[Fri Nov 18 12:35:01 2022]
Error in rule predicthaplo:
    jobid: 41
    input: samples/CAP217/4390/alignments/REF_aln.bam, samples/cohort_consensus.fasta
    output: samples/CAP217/4390/variants/global/REF_aln.sam, samples/CAP217/4390/variants/global/predicthaplo_haplotypes.fasta
    log: samples/CAP217/4390/variants/global/predicthaplo.out.log, samples/CAP217/4390/variants/global/predicthaplo.err.log (check log file(s) for error message)
    conda-env: /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/.snakemake/conda/2bf4df4a26b143afa975c4ca179e069b_
    shell:
        
            samtools sort -n samples/CAP217/4390/alignments/REF_aln.bam -o samples/CAP217/4390/variants/global/REF_aln.sam 2> >(tee samples/CAP217/4390/variants/global/predicthaplo.err.log >&2)

            predicthaplo                 --sam samples/CAP217/4390/variants/global/REF_aln.sam                 --reference samples/cohort_consensus.fasta                 --prefix samples/CAP217/4390/variants/global/predicthaplo/                 --have_true_haplotypes 0                 --min_length 0                 2> >(tee -a samples/CAP217/4390/variants/global/predicthaplo.err.log >&2)

            # TODO: copy over actual haplotypes
            touch samples/CAP217/4390/variants/global/predicthaplo_haplotypes.fasta
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job predicthaplo since they might be corrupted:
samples/CAP217/4390/variants/global/REF_aln.sam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-11-18T121401.627778.snakemake.log

I followed https://github.com/cbg-ethz/V-pipe/blob/master/docs/tutorial_hiv.md and then configured the following:

general:
    virus_base_config: 'hiv'
    # e.g: 'hiv', 'sars-cov-2', or absent

    
    # enable haplotype reconstruction

    # this failed:
    haplotype_reconstruction: predicthaplo

    # this worked:
    #haplotype_reconstruction: haploclique

input:
    samples_file: samples.tsv

output:
    datadir: samples/

    trim_primers: false
    # see: config/README.md#amplicon-protocols
    snv: false
    local: false

    # enable haplotype reconstruction
    global: true
    
    
    visualization: false
    diversity: false
    QA: false
    upload: false
    dehumanized_raw_reads: false

Haplotype percentage or frequency estimation

Hi,
Thank you for your support and for creating this pipeline.

The problem I face is the lack of access to the percentage and frequency of each haplotype.

What I did:
Set global True in the config file
Then got the freq_est.py script from here (https://bitbucket.org/jbaaijens/savage/src/master/)
Unfortunately, it does not work and there is no percentage for haplotypes.

Is this not possible by default in the pipeline? (Like what is seen in the webinars and graphs of this pipeline?)

Does the use of Haploclique have an effect on the creation of this report for the percentage of haplotypes?

Sincerely yours,
Naser

Conda not installed in standard location

I am working on a server where conda and activate are not in a central location, and therefore snakemake isn't able to load up the environments, giving a /usr/bin/bash: /usr/bin/activate: No such file or directory error. The snakemake envs load up fine if I do it myself.

Activating conda environment: /XXX/V-pipe/.snakemake/conda/3d1013d0
/usr/bin/bash: /usr/bin/activate: No such file or directory

# but this loads it up fine
conda activate /XXX/V-pipe/.snakemake/conda/3d1013d0

How can I adjust the snakemake or the init_project.sh to use conda or conda activate from a different location than /usr/bin/activate?

I do have --use-conda in my call:

./vpipe --dryrun --use-conda
./vpipe --cores 24 --use-conda

Thanks.

Warning: All reads at position 4045 in the same reverse orientation ?

Hello V-pipe team,
Thanks for the wonderful tool.
My v-pipe run works fine; however, I am getting this warning (Warning: All reads at position 4045 in the same reverse orientation ?) for around 50 positions, and I don't know what is wrong in the dataset. Can you please explain why I get this warning and how to rectify it?
Thanks
Vinoy

ShoRAH: list index out of range

I ran the whole pipeline starting from the usual raw reads, i.e. alignments are made within V-pipe.
ShoRAh error log:

Traceback (most recent call last):
  File "/data/nasif12/home_if12/dvoretsk/projects/V-pipe-SARS/.snakemake/conda/5f9cc436/bin/shorah", line 14, in <module>
    main()
  File "/data/nasif12/home_if12/dvoretsk/projects/V-pipe-SARS/.snakemake/conda/5f9cc436/lib/python3.6/site-packages/shorah/cli.py", line 196, in main
    args.func(args)
  File "/data/nasif12/home_if12/dvoretsk/projects/V-pipe-SARS/.snakemake/conda/5f9cc436/lib/python3.6/site-packages/shorah/cli.py", line 75, in shotgun_run
    shotgun.main(args)
  File "/data/nasif12/home_if12/dvoretsk/projects/V-pipe-SARS/.snakemake/conda/5f9cc436/lib/python3.6/site-packages/shorah/shotgun.py", line 440, in main
    r = list(aligned_reads.keys())[0]
IndexError: list index out of range

threading question

Hi, thanks for this pipeline, loving it. But I am not yet a snakemake guru.

I have a question regarding optimizing of the compute. Seems like I can run the pipeline two ways:

  1. start each sample independently
  2. start the batch

If I do (1), I am starting vpipe with the --cores 128 option (AMD server with 128 physical cores) but it seems to use only 4 threads for those sub-programs that can use them. In the vpipe config files, I see the threads option, but that seems to be set to 1. So, where did it get the 4 and is there an easy way to change that globally? --threads=128 or something?

If I do (2), is there a way to specify the number of samples that should be processed simultaneously AND similar to above, the threads to use for each process? Something like process 8 samples at a time using 16 threads each.

Thanks
Bob
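
Not an authoritative answer, but two knobs that may be relevant are sketched below: the threads setting in the [general] section of the config (set to 10 in a vpipe.config shown in another issue on this page), and Snakemake's per-rule --set-threads override, assuming a reasonably recent Snakemake. The rule name and numbers below are purely hypothetical examples:

# cap the total number of cores; Snakemake then schedules jobs from several samples in parallel
./vpipe --cores 128
# override the thread count of one specific rule (rule name here is hypothetical)
./vpipe --cores 128 --set-threads bwa_align=16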

lofreq environment file contains unsolvable conflicts

Output of conda while trying to construct environment:

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package htslib conflicts for:
lofreq=2.1.4 -> samtools -> htslib[version='>=1.10,<1.11.0a0|>=1.9,<1.10.0a0']
lofreq=2.1.4 -> htslib[version='>=1.10.2,<1.11.0a0']

Package libgcc-ng conflicts for:
lofreq=2.1.4 -> libgcc-ng[version='>=7.3.0|>=7.5.0']
lofreq=2.1.4 -> python[version='>=3.7,<3.8.0a0'] -> libgcc-ng[version='>=4.9|>=7.2.0']

Package libcurl conflicts for:
lofreq=2.1.4 -> htslib[version='>=1.10.2,<1.11.0a0'] -> libcurl[version='>=7.64.1,<8.0a0|>=7.71.1,<8.0a0']
samtools=1.9 -> curl[version='>=7.64.0,<8.0a0'] -> libcurl[version='7.59.0|7.60.0|7.61.0|7.61.1|7.61.1|7.61.1|7.62.0|7.62.0|7.63.0|7.63.0|7.64.0|7.64.0|7.64.0|7.64.0|7.64.0|7.64.1|7.64.1|7.65.2|7.65.3|7.68.0|7.68.0|7.69.1|7.71.0|7.71.1|7.71.0|7.69.1|7.68.0|7.67.0|7.65.3|7.65.2|7.64.1|7.63.0|7.63.0|7.62.0',build='h1ad7b7a_0|h20c2e04_0|hbdb9355_0|h01ee5af_1000|hbdb9355_0|h01ee5af_1000|h20c2e04_2|h20c2e04_0|h01ee5af_0|hda55be3_4|hda55be3_0|hda55be3_0|hcdd3856_0|hf7181ac_0|hf7181ac_1|hda55be3_0|hda55be3_0|hf7181ac_1|hf7181ac_5|h541490c_2|h20c2e04_0|h20c2e04_0|h20c2e04_0|h20c2e04_0|h20c2e04_0|h20c2e04_0|h01ee5af_1002|hbdb9355_2|h20c2e04_1000|h20c2e04_0|h20c2e04_0|h1ad7b7a_0|h1ad7b7a_0']
bcftools=1.9 -> curl[version='>=7.64.1,<8.0a0'] -> libcurl[version='7.59.0|7.60.0|7.61.0|7.61.1|7.61.1|7.61.1|7.62.0|7.62.0|7.63.0|7.63.0|7.64.0|7.64.0|7.64.0|7.64.0|7.64.0|7.64.1|7.64.1|7.64.1|7.65.2|7.65.3|7.68.0|7.68.0|7.69.1|7.71.0|7.71.1|7.71.0|7.69.1|7.68.0|7.67.0|7.65.3|7.65.2|7.63.0|7.63.0|7.62.0',build='h1ad7b7a_0|h1ad7b7a_0|h20c2e04_1000|hbdb9355_0|h01ee5af_1000|hbdb9355_0|h01ee5af_1000|h20c2e04_2|h01ee5af_0|hda55be3_4|hf7181ac_5|h20c2e04_0|h20c2e04_0|hda55be3_0|hda55be3_0|hcdd3856_0|hf7181ac_0|hf7181ac_1|hda55be3_0|hda55be3_0|hf7181ac_1|h20c2e04_0|h20c2e04_0|h20c2e04_0|h20c2e04_0|h20c2e04_0|h541490c_2|h01ee5af_1002|hbdb9355_2|h20c2e04_0|h20c2e04_0|h20c2e04_0|h1ad7b7a_0']

Package libstdcxx-ng conflicts for:
lofreq=2.1.4 -> python[version='>=3.7,<3.8.0a0'] -> libstdcxx-ng[version='>=4.9|>=7.5.0|>=7.2.0']
lofreq=2.1.4 -> libstdcxx-ng[version='>=7.3.0']

Package zlib conflicts for:
lofreq=2.1.4 -> zlib[version='>=1.2.11,<1.3.0a0']
lofreq=2.1.4 -> samtools -> zlib[version='1.2.*|1.2.11|1.2.11.*|1.2.8.*|1.2.8']

Package libssh2 conflicts for:
bcftools=1.9 -> curl[version='>=7.64.1,<8.0a0'] -> libssh2[version='>=1.8.0,<2.0.0a0|>=1.9.0,<2.0a0|>=1.8.2,<2.0a0|>=1.8.0,<2.0a0']
samtools=1.9 -> curl[version='>=7.64.0,<8.0a0'] -> libssh2[version='>=1.8.0,<2.0.0a0|>=1.9.0,<2.0a0|>=1.8.2,<2.0a0|>=1.8.0,<2.0a0']

Package samtools conflicts for:
lofreq=2.1.4 -> samtools
samtools=1.9

Missing input files for rule generate_web_visualization

Hi,
I have installed v-pipe and modified the config.yaml and created the folder "samples".
I tried ./vpipe --dryrun and got:

VPIPE_BASEDIR = /Users/opentrons-b10/V-pipe/workflow
Using base configuration virus SARS-CoV-2
WARNING: protocols YAML look-up file </Users/opentrons-b10/V-pipe/workflow/../resources/sars-cov-2/primers.yaml> specified, but no sample ever uses it: fourth column absent from samples TSV file.
Building DAG of jobs...
MissingInputException in rule generate_web_visualization in file /Users/opentrons-b10/V-pipe/workflow/rules/visualization.smk, line 10:
Missing input files for rule generate_web_visualization:
output: samples/20230220/torino/visualization/snv_calling.html, samples/20230220/torino/visualization/alignment.html, samples/20230220/torino/visualization/reference_uri_file, samples/20230220/torino/visualization/bam_uri_file
wildcards: dataset=samples/20230220/torino
affected files:
/Users/opentrons-b10/V-pipe/workflow/../resources/sars-cov-2/primers/v3/nCoV-2019.tsv

Could you please help me in solving this issue?
many thanks for the help,
Carlotta Olivero

hiv-tutorial fails with haplotype_reconstruction: savage, and SAVAGE command output is dumped into terminal instead of log

here's the log... savage failed on all three samples... the output was not put into a log file:

Activating conda environment: .snakemake/conda/e6edb6f18a80cf5f3d8af13ded28a55d_
patch 3 - De novo overlap computations - RunningProcessing output[Fri Nov 18 13:13:47 2022]
Finished job 30.
26 of 30 steps (87%) done
patch 9 - De novo overlap computations - Running rust-overlapsINFO      2022-11-18 13:13:51       SamToFastq

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    SamToFastq -I samples/CAP217/4390/alignments/REF_aln.bam -FASTQ samples/CAP217/4390/variants/global/R1.fastq -SECOND_END_FASTQ samples/CAP217/4390/variants/global/R2.fastq -RC false
**********


patch 10 - De novo overlap computations - Running rust-overlaps13:13:51.522 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/.snakemake/conda/e6edb6f18a80cf5f3d8af13ded28a55d_/share/picard-2.22.3-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Nov 18 13:13:51 CET 2022] SamToFastq INPUT=samples/CAP217/4390/alignments/REF_aln.bam FASTQ=samples/CAP217/4390/variants/global/R1.fastq SECOND_END_FASTQ=samples/CAP217/4390/variants/global/R2.fastq RE_REVERSE=false    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Nov 18 13:13:51 CET 2022] Executing as ubuntu@TIM-N716 on Linux 5.10.102.1-microsoft-standard-WSL2 amd64; OpenJDK 64-Bit Server VM 11.0.8-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.22.3
WARNING: BAM index file /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/samples/CAP217/4390/alignments/REF_aln.bam.bai is older than BAM /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/samples/CAP217/4390/alignments/REF_aln.bam
patch 3 - De novo overlap computations - RunningProcessing output[Fri Nov 18 13:13:51 CET 2022] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=536870912
patch 11 - De novo overlap computations - Running rust-overlaps
-------------------------------------------
SAVAGE - Strain Aware VirAl GEnome assembly
-------------------------------------------
Version: 0.4.2
Author: Jasmijn Baaijens
    
Command used:
/home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/.snakemake/conda/e6edb6f18a80cf5f3d8af13ded28a55d_/opt/savage-0.4.2/savage.py -t 1 --split 20 -p1 /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/samples/CAP217/4390/variants/global/R1.fastq -p2 /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/samples/CAP217/4390/variants/global/R2.fastq -o samples/CAP217/4390/variants/global/

Parameter values:
filtering = True
reference = None
merge_contigs = 0.0
remove_branches = True
contig_len_stage_c = 100
split_num = 20
use_subreads = True
no_assembly = False
diploid_contig_len = 200
overlap_stage_c = 100
input_p2 = /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/samples/CAP217/4390/variants/global/R2.fastq
input_p1 = /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/samples/CAP217/4390/variants/global/R1.fastq
count_strains = False
min_clique_size = 4
diploid_overlap_len = 30
compute_overlaps = True
preprocessing = True
threads = 1
stage_a = True
stage_b = True
stage_c = True
max_tip_len = None
min_overlap_len = None
outdir = samples/CAP217/4390/variants/global/
average_read_len = None
sfo_mm = 50
revcomp = False
input_s = None
diploid = False

Input fastq stats:
Number of single-end reads = 0
Number of paired-end reads = 4872
Total number of bases = 1398543
Average sequence length = 287.1

Using max_tip_len = 287
Using min_overlap_len = 172

*******************
Preprocessing input
Done!                                                        s 
********************
Overlap computations
Done!                                                            t nningProcessing output 
**************
SAVAGE Stage a
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [33, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [10, 0]
Processing outputpipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [8, 0]
patch 7 - De novo overlap computationspipeline_per_stage.py  
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [3, 0]
 - Running rust-overlapspipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [0, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [0, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [0, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [1, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [33, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [9, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [0, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [1, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [16, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [4, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [1, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [15, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [0, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [2, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [33, 0]
pipeline_per_stage.py
Processing outputStage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [1, 0]
combine_contigs.py
cat: stage_a/patch0/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch1/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch2/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch3/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch4/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch5/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch6/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch7/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch9/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch10/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch11/stage_a/singles.fastq: No such file or directory
Processing outputcat: stage_a/patch12/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch13/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch14/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch15/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch16/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch17/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch18/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch19/stage_a/singles.fastq: No such file or directory
patch 8 - De novo overlap computationsDone!                  
**************
SAVAGE Stage b
Empty set of contigs from Stage a (contigs_stage_a.fasta) --> Exiting SAVAGE.
[Fri Nov 18 13:13:58 2022]
Error in rule savage:
    jobid: 41
    input: samples/CAP188/30/alignments/REF_aln.bam
    output: samples/CAP188/30/variants/global/R1.fastq, samples/CAP188/30/variants/global/R2.fastq, samples/CAP188/30/variants/global/contigs_stage_c.fasta
    log: samples/CAP188/30/variants/global/savage.out.log, samples/CAP188/30/variants/global/savage.err.log (check log file(s) for error message)
    conda-env: /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/.snakemake/conda/e6edb6f18a80cf5f3d8af13ded28a55d_
    shell:
        
            # Convert BAM to FASTQ without re-reversing reads - SAVAGE expect all reads in the same direction
            source /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/scripts/functions.sh
            SamToFastq picard I=samples/CAP188/30/alignments/REF_aln.bam FASTQ=samples/CAP188/30/variants/global/R1.fastq SECOND_END_FASTQ=samples/CAP188/30/variants/global/R2.fastq RC=false 2> >(tee samples/CAP188/30/variants/global/savage.err.log >&2)
            # Remove /1 and /2 from the read names
            sed -i -e "s:/1$::" samples/CAP188/30/variants/global/R1.fastq
            sed -i -e "s:/2$::" samples/CAP188/30/variants/global/R2.fastq

            R1=${PWD}/samples/CAP188/30/variants/global/R1.fastq
            R2=${PWD}/samples/CAP188/30/variants/global/R2.fastq
            savage -t 1 --split 20 -p1 ${R1} -p2 ${R2} -o samples/CAP188/30/variants/global/ 2> >(tee -a samples/CAP188/30/variants/global/savage.err.log >&2)
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job savage since they might be corrupted:
samples/CAP188/30/variants/global/R1.fastq, samples/CAP188/30/variants/global/R2.fastq
Done!                                                            t 
**************
SAVAGE Stage a
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [127, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [11, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [16, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [63, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [15, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [18, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [36, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [70, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [48, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [98, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [12, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [14, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [111, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [58, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [51, 0]
Processing outputpipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [53, 0]
pipeline_per_stage.py
patch 8 - De novo overlap computationsStage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [68, 0]
 - Running rust-overlapspipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [9, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [32, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [52, 0]
combine_contigs.py
cat: stage_a/patch0/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch1/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch2/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch3/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch4/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch5/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch6/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch7/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch8/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch9/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch10/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch11/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch12/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch13/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch14/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch15/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch17/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch19/stage_a/singles.fastq: No such file or directory
Done!
**************
SAVAGE Stage b
Empty set of contigs from Stage a (contigs_stage_a.fasta) --> Exiting SAVAGE.
[Fri Nov 18 13:14:09 2022]
Error in rule savage:
    jobid: 39
    input: samples/CAP188/4/alignments/REF_aln.bam
    output: samples/CAP188/4/variants/global/R1.fastq, samples/CAP188/4/variants/global/R2.fastq, samples/CAP188/4/variants/global/contigs_stage_c.fasta
    log: samples/CAP188/4/variants/global/savage.out.log, samples/CAP188/4/variants/global/savage.err.log (check log file(s) for error message)
    conda-env: /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/.snakemake/conda/e6edb6f18a80cf5f3d8af13ded28a55d_
    shell:
        
            # Convert BAM to FASTQ without re-reversing reads - SAVAGE expect all reads in the same direction
            source /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/scripts/functions.sh
            SamToFastq picard I=samples/CAP188/4/alignments/REF_aln.bam FASTQ=samples/CAP188/4/variants/global/R1.fastq SECOND_END_FASTQ=samples/CAP188/4/variants/global/R2.fastq RC=false 2> >(tee samples/CAP188/4/variants/global/savage.err.log >&2)
            # Remove /1 and /2 from the read names
            sed -i -e "s:/1$::" samples/CAP188/4/variants/global/R1.fastq
            sed -i -e "s:/2$::" samples/CAP188/4/variants/global/R2.fastq

            R1=${PWD}/samples/CAP188/4/variants/global/R1.fastq
            R2=${PWD}/samples/CAP188/4/variants/global/R2.fastq
            savage -t 1 --split 20 -p1 ${R1} -p2 ${R2} -o samples/CAP188/4/variants/global/ 2> >(tee -a samples/CAP188/4/variants/global/savage.err.log >&2)
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job savage since they might be corrupted:
samples/CAP188/4/variants/global/R1.fastq, samples/CAP188/4/variants/global/R2.fastq
Done!
**************
SAVAGE Stage a
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [31, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [13, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [56, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [18, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [29, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [25, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [34, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [51, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [193, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [55, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [29, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [25, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [83, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [18, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [85, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [9, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [3, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [28, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [44, 0]
pipeline_per_stage.py
Stage a done in 1 iterations
Maximum read length per iteration:      [0]
Number of contigs per iteration:        [0]
Number of overlaps per iteration:       [27, 0]
combine_contigs.py
cat: stage_a/patch0/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch1/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch2/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch3/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch4/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch5/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch6/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch7/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch9/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch10/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch11/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch13/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch14/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch15/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch16/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch17/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch18/stage_a/singles.fastq: No such file or directory
cat: stage_a/patch19/stage_a/singles.fastq: No such file or directory
Done!
**************
SAVAGE Stage b
Empty set of contigs from Stage a (contigs_stage_a.fasta) --> Exiting SAVAGE.
[Fri Nov 18 13:14:31 2022]
Error in rule savage:
    jobid: 40
    input: samples/CAP217/4390/alignments/REF_aln.bam
    output: samples/CAP217/4390/variants/global/R1.fastq, samples/CAP217/4390/variants/global/R2.fastq, samples/CAP217/4390/variants/global/contigs_stage_c.fasta
    log: samples/CAP217/4390/variants/global/savage.out.log, samples/CAP217/4390/variants/global/savage.err.log (check log file(s) for error message)
    conda-env: /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-hiv-example/.snakemake/conda/e6edb6f18a80cf5f3d8af13ded28a55d_
    shell:
        
            # Convert BAM to FASTQ without re-reversing reads - SAVAGE expect all reads in the same direction
            source /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/scripts/functions.sh
            SamToFastq picard I=samples/CAP217/4390/alignments/REF_aln.bam FASTQ=samples/CAP217/4390/variants/global/R1.fastq SECOND_END_FASTQ=samples/CAP217/4390/variants/global/R2.fastq RC=false 2> >(tee samples/CAP217/4390/variants/global/savage.err.log >&2)
            # Remove /1 and /2 from the read names
            sed -i -e "s:/1$::" samples/CAP217/4390/variants/global/R1.fastq
            sed -i -e "s:/2$::" samples/CAP217/4390/variants/global/R2.fastq

            R1=${PWD}/samples/CAP217/4390/variants/global/R1.fastq
            R2=${PWD}/samples/CAP217/4390/variants/global/R2.fastq
            savage -t 1 --split 20 -p1 ${R1} -p2 ${R2} -o samples/CAP217/4390/variants/global/ 2> >(tee -a samples/CAP217/4390/variants/global/savage.err.log >&2)
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job savage since they might be corrupted:
samples/CAP217/4390/variants/global/R1.fastq, samples/CAP217/4390/variants/global/R2.fastq
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-11-18T125029.533222.snakemake.log

I followed https://github.com/cbg-ethz/V-pipe/blob/master/docs/tutorial_hiv.md; my config was:

general:
    virus_base_config: 'hiv'
    # e.g: 'hiv', 'sars-cov-2', or absent

    
    # enable haplotype reconstruction

    # let's try
    haplotype_reconstruction: savage

    # this failed:
    #haplotype_reconstruction: predicthaplo

    # this worked:
    #haplotype_reconstruction: haploclique

input:
    samples_file: samples.tsv

output:
    datadir: samples/

    trim_primers: false
    # see: config/README.md#amplicon-protocols
    snv: false
    local: false

    # enable haplotype reconstruction
    global: true
    
    
    visualization: false
    diversity: false
    QA: false
    upload: false
    dehumanized_raw_reads: false

Solved. It was a verification error.

Sorry, false alarm: it was caused by the automatically recognised file path, which did not match the file path on the local system.

[UX issue] unknown keywords in config file

Currently, unknown keywords in config files are silently ignored by snakemake's default "snakemake.utils.validate" validator.

This makes typos in section and option names hard to spot for users.

TODO: warn about unknown keywords and suggest similar-looking keywords.
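
A minimal sketch of the kind of check described above (the schema here is a placeholder, not V-pipe's actual validation schema): load the YAML config, compare its keys against the known ones, and use difflib to suggest the closest match.

# Hypothetical "did you mean?" check for config keywords; KNOWN is a
# placeholder schema, not V-pipe's real one.
import difflib
import yaml

KNOWN = {
    "general": {"virus_base_config", "haplotype_reconstruction"},
    "output": {"snv", "local", "global", "visualization", "QA"},
}

def warn_unknown_keys(config_path):
    with open(config_path) as handle:
        config = yaml.safe_load(handle) or {}
    for section, options in config.items():
        if section not in KNOWN:
            hint = difflib.get_close_matches(section, KNOWN, n=1)
            print(f"WARNING: unknown section '{section}'"
                  + (f", did you mean '{hint[0]}'?" if hint else ""))
            continue
        for key in (options or {}):
            if key not in KNOWN[section]:
                hint = difflib.get_close_matches(key, KNOWN[section], n=1)
                print(f"WARNING: unknown option '{section}:{key}'"
                      + (f", did you mean '{hint[0]}'?" if hint else ""))

warn_unknown_keys("config.yaml")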

SARS-CoV-2 workflow comparison - kindly check if your work is represented correctly

Hello V-pipe Team,

I am from the University Hospital Essen, Germany, and we work extensively with SARS-CoV-2 in our research. We have also developed a SARS-CoV-2 workflow. In preparation for the publication of our workflow, we have looked at several other SARS-CoV-2 related workflows, including your work. We will present this review in the publication and want to ensure that your work is represented as accurately as possible.

Moreover, there is currently little to no overview of SARS-CoV-2-related workflows. Therefore, we have decided to make the above comparison publicly available via this GitHub repository. It contains a table with an overview of the functions of different SARS-CoV-2 workflows and the tools used to implement these functions.

We would like to give you the opportunity to correct any misunderstandings on our side. Please take a moment to look at this overview table and make sure we are not misrepresenting your work or leaving out important parts of it. If you feel that something is missing or misrepresented, please feel free to give us feedback by contributing directly to the repository.

Thank you very much!

cc @alethomas

ImportError: Bio.Alphabet has been removed from Biopython

Hello,
I am having a first try at V-pipe, and it seems that Bio.Alphabet has been removed from Biopython this month, causing an error/crash at the report-generation step.

Traceback (most recent call last):
  File "/Users/ywenger/vpipe/V-pipe/scripts/assemble_visualization_webpage.py", line 16, in <module>
    from Bio.Alphabet import IUPAC
  File "/Users/ywenger/vpipe/work/.snakemake/conda/c7ff4d0f/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

Best,
Yvan

Error in rule generate_web_visualization

Hi !
Got this error in rule generate_web_visualization, from the assemble_visualization_webpage.py script, on line 16:
from Bio.Alphabet import IUPAC

ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

What should I do to correct that? Should I roll back my version of Biopython?
Thanks!
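
For reference, on Biopython >= 1.78 the usual fix is to drop the Bio.Alphabet/IUPAC import entirely and, where a molecule type is still needed (e.g. when writing GenBank), set it as a SeqRecord annotation. A minimal sketch with example values only:

# Biopython >= 1.78: no Bio.Alphabet / IUPAC import; the molecule type is a
# plain annotation on the SeqRecord (only needed for formats such as GenBank).
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(
    Seq("ACGTACGT"),                      # example sequence only
    id="example",
    description="consensus fragment",
    annotations={"molecule_type": "DNA"},
)
print(record.format("fasta"))

Pinning biopython to a version below 1.78 also works around the crash, but removing the import is the forward-compatible fix.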

./vpipe --dryrun error

Hi all,
I ran the above with my own data but got:
(vpipe_env) ibseq:~/testing/work$ ./vpipe --dryrun
VPIPE_BASEDIR = /univ/ibseq/testing/V-pipe
AssertionError in line 68 of /univ/ibseq/testing/V-pipe/rules/common.smk:
ERROR: Line '13' does not contain at least two entries!
File "/univ/ibseq/testing/V-pipe/vpipe.snake", line 11, in
File "/univ/ibseq/testing/V-pipe/rules/common.smk", line 68, in

from https://cbg-ethz.github.io/V-pipe/tutorial/sars-cov2/
any advice?
thanks
ibseq
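
For context, this assertion fires when a line of the samples TSV does not split into at least two tab-separated fields; a frequent cause is spaces instead of real TAB characters between the columns. Below is a minimal sketch of an equivalent check, with illustrative column names (sample name and date; the optional third column is typically the read length), not the actual code from common.smk.

# Equivalent of the failing check: every non-empty line of samples.tsv must
# have at least two TAB-separated fields (sample name and date).
import csv

def check_samples_tsv(path):
    with open(path, newline="") as handle:
        for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row or not "".join(row).strip():
                continue  # ignore empty lines
            assert len(row) >= 2, \
                f"ERROR: Line '{lineno}' does not contain at least two entries!"
            sample, date = row[0], row[1]  # optional 3rd column: read length
            print(f"{sample}\t{date}")

check_samples_tsv("samples.tsv")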

predicthaplo does not work on sars-cov-2 tutorial

Hi there. So apparently that data does not have enough divergence, so a crash with haploclique is expected.

But predicthaplo should apparently work, yet it does not:

Removing output files of failed job predicthaplo since they might be corrupted:
samples/SRR10903401/20200102/variants/global/REF_aln.sam
Configuration:
  prefix = samples/SRR10903402/20200102/variants/global/predicthaplo/
  cons = /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta
  visualization_level = 1
  FASTAreads = samples/SRR10903402/20200102/variants/global/REF_aln.sam
  have_true_haplotypes = 0
  FASTAhaplos = 
  do_local_Analysis = 1
After parsing the reads in file samples/SRR10903402/20200102/variants/global/REF_aln.sam: average read length= -nan 0
First read considered in the analysis starts at position 100000. Last read ends at position 0
There are 0 reads
/usr/bin/bash: line 3: 25922 Segmentation fault      predicthaplo --sam samples/SRR10903402/20200102/variants/global/REF_aln.sam --reference /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta --prefix samples/SRR10903402/20200102/variants/global/predicthaplo/ --have_true_haplotypes 0 --min_length 0 2> >(tee -a samples/SRR10903402/20200102/variants/global/predicthaplo.err.log >&2)
[Fri Nov 18 11:34:03 2022]
Error in rule predicthaplo:
    jobid: 22
    input: samples/SRR10903402/20200102/alignments/REF_aln.bam, /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta
    output: samples/SRR10903402/20200102/variants/global/REF_aln.sam, samples/SRR10903402/20200102/variants/global/predicthaplo_haplotypes.fasta
    log: samples/SRR10903402/20200102/variants/global/predicthaplo.out.log, samples/SRR10903402/20200102/variants/global/predicthaplo.err.log (check log file(s) for error message)
    conda-env: /home/ubuntu/new-vpipe-haplotype-recon-experiments/work-sars-cov-2-example/.snakemake/conda/648dc97f886b8633756d6cd60de0ff7c_
    shell:
        
            samtools sort -n samples/SRR10903402/20200102/alignments/REF_aln.bam -o samples/SRR10903402/20200102/variants/global/REF_aln.sam 2> >(tee samples/SRR10903402/20200102/variants/global/predicthaplo.err.log >&2)

            predicthaplo                 --sam samples/SRR10903402/20200102/variants/global/REF_aln.sam                 --reference /home/ubuntu/new-vpipe-haplotype-recon-experiments/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta                 --prefix samples/SRR10903402/20200102/variants/global/predicthaplo/                 --have_true_haplotypes 0                 --min_length 0                 2> >(tee -a samples/SRR10903402/20200102/variants/global/predicthaplo.err.log >&2)

            # TODO: copy over actual haplotypes
            touch samples/SRR10903402/20200102/variants/global/predicthaplo_haplotypes.fasta
            
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job predicthaplo since they might be corrupted:
samples/SRR10903402/20200102/variants/global/REF_aln.sam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-11-18T113259.947175.snakemake.log

I just followed the tutorial https://github.com/cbg-ethz/V-pipe/blob/master/docs/tutorial_sarscov2.md and configured the following in config.yaml:

general:
    virus_base_config: 'sars-cov-2'
    # e.g: 'hiv', 'sars-cov-2', or absent
    
    # the tool selected as haplotype_reconstruction does the global haplotype reconstruction
    haplotype_reconstruction: predicthaplo

output:
    # enable global haplotype reconstruction
    # might not work with this data...
    #
    # > nope, this data does not support haplotype reconstruction, not enough divergence?...
    global: true

the rest is left at the default options
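
As a diagnostic hint (not from the original report): the log above says "There are 0 reads" and a NaN average read length, so it can help to confirm that the name-sorted SAM handed to predicthaplo actually contains mapped primary reads. A minimal sketch using pysam (pysam and the hard-coded path are illustrative assumptions):

# Count mapped, primary reads in the SAM handed to predicthaplo.
import pysam

def count_usable_reads(path):
    usable = 0
    with pysam.AlignmentFile(path) as aln:
        for read in aln:
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            usable += 1
    return usable

print(count_usable_reads(
    "samples/SRR10903402/20200102/variants/global/REF_aln.sam"))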

ResolvePackageNotFound: - savage=0.4.2

I keep running into this problem. Any ideas?

CreateCondaEnvironmentException:
Could not create conda environment from /Users/jameschen/CloudStation/Bioinform/V-pipe/envs/savage.yaml:
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

ResolvePackageNotFound:

  • savage=0.4.2

How to run analysis for single-end reads

I have successfully run the test data and the Wuhan data, which are paired-end, but I have not been able to run single-end data, as there is no specific guide/manual for it.
I understand that it supports single-end data, as reported in the publication.
I tried renaming the file to read_R1.fastq, but with no success.

(base) zuber@gbrc-hpc-42:/opt/data/env-V-pipe/ENV/work$ ./vpipe --cores 40
VPIPE_BASEDIR = /opt/data/env-V-pipe/ENV/V-pipe
AssertionError in line 369 of /opt/data/env-V-pipe/ENV/V-pipe/rules/common.smk:
ERROR: Line '3' does not contain at least two entries!
File "/opt/data/env-V-pipe/ENV/V-pipe/vpipe.snake", line 11, in
File "/opt/data/env-V-pipe/ENV/V-pipe/rules/common.smk", line 369, in

Errors on Snakemake 5.x

Reported by Maryam:
I tried to run v-pipe and I got the following error:

SyntaxError:
Not all output, log and benchmark files of rule gunzip contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
  File ".../V-pipe/vpipe.snake", line 415, in <module>

Then I changed the snakemake version from 5.4.2 to 4.8.0 and I got a different error:

Building DAG of jobs...
MissingInputException in line 389 of .../V-pipe/vpipe.snake:
Missing input files for rule all:
samples/patient1/20170904/variants/local/snvs.csv

any comments would be appreciated.

According to @sposadac:

At first sight, can you try running it with:
[output]
local = False

Indeed, I haven't tried snakemake 5+ versions, and there might be some things that need updating. Therefore, running V-pipe using snakemake version 4.8.0 sounds like a good idea for the time being.

frameshift_deletion_checks, simBench & Haplotype callers

Dear V-pipe authors,

Thank you very much for your pipeline. I am attempting to use V-pipe to assess intra-host viral diversity in plant samples. However, I encountered the following issues:

  1. When running the Quality assessment module (setting it to true in the config.yaml file), I get an error related to the frameshift_deletion_checks script. I have attached the relevant error log below:
Error in rule frameshift_deletions_checks:
    jobid: 9
    output: results_rev/capsicum/41239130/references/frameshift_deletions_check.tsv
    log: results_rev/capsicum/41239130/references/frameshift_deletions_check.out.log, results_rev/capsicum/41239130/references/frameshift_deletions_check.err.log (check log file(s) for error message)
    conda-env: {...}/vpipe_merged_reads/41239130_Capsicum_annuum_TSWV/.snakemake/conda/c2a6b8c9e98375cd16af9408a0e9b8b2
    shell:

        frameshift_deletions_checks -i results_rev/capsicum/41239130/alignments/REF_aln.bam -c results_rev/capsicum/41239130/references/consensus.bcftools.fasta -f references_rev/41239130_tswv_conc_rev.fasta -g  --english=true -o results_rev/capsicum/41239130/references/frameshift_deletions_check.tsv 2> >(tee results_rev/capsicum/41239130/references/frameshift_deletions_check.err.log >&2)

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: {...}/vpipe_merged_reads/41239130_Capsicum_annuum_TSWV/.snakemake/log/2022-11-01T152833.441917.snakemake.log

The error is not fixed even if I provide the .gff file in the config.yaml with the following two alternative syntaxes:

root:
    frameshift_deletion_checks:
        genes_gff: gff_dir/[…].gff3

or

frameshift_deletion_checks:
    genes_gff: gff_dir/[…].gff3

  2. I cannot find the benchmarking functionality in the latest documentation. In the config HTML file, there are no explanations or tags on how to generate reads and, in general, how to use the benchmarking modules. Downloading the benchmark branch and adapting to the deprecated documentation was not successful. Are you planning to update the documentation for the newest version of your master branch, so that the benchmarking capabilities of the pipeline can be utilized? Can you please point me to the documentation for benchmarking?

  3. Regarding the haplotype-calling modules, I consistently get segmentation faults when using PredictHaplo (switching between local and global analysis) and (core dumped) errors when using Haploclique. I was trying these options as I want to perform reference-based reconstruction of haplotypes. Can you provide any ideas on why such errors pop up?

Haploclique error log:

terminate called after throwing an instance of 'std::bad_alloc'
what():  std::bad_alloc 

The PredictHaplo error log remains empty, but after completing the local haplotype reconstruction, it raises a segmentation fault.
It may be relevant that my datasets show very high coverage (50,000x–350,000x).

Please let me know if you need any more information to answer these issues. Thank you very much for your consideration.

Dimitris Karapliafis
