
16s-dada2's Introduction

Snakemake workflow: Dada2


This workflow is an implementation of the popular DADA2 tool. I followed the steps in the DADA2 tutorial and use IDtaxa for taxonomic annotation.


Authors

  • Silas Kieser (@silask)

Usage

Step 1: Install workflow

If you simply want to use this workflow, download and extract the latest release. If you intend to modify and further develop this workflow, fork this repository. Please consider providing any generally applicable modifications via a pull request.

In any case, if you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository and, if available, its DOI (see above).

Requirements:

The pipeline has some dependencies which can be installed with conda:

conda env create -n dada2_env --file dependencies.yml

Databases:

For taxonomic annotation I use IDtaxa. Download a database, e.g. the one from GTDB, and add its path to the config file.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the file config.yaml.

Create a sample table like the example shown below. You can use the script prepare_sample_table.py for it. The script searches for fastq(.gz) files inside a folder (structure). If you have paired-end files, they should have R1/R2 somewhere in the filename. It might be a good idea to simplify sample names.

./prepare_sample_table.py path/to/fastq(.gz)files

The script creates a samples.tsv in the working directory. Here is an example.

	R1	R2
sample1	/path/to/fastqs/sample1/sample1_R1.fastq.gz	/path/to/fastqs/sample1/sample1_R2.fastq.gz
sample2	/path/to/fastqs/sample2_R1.fastq.gz	/path/to/fastqs/sample2_R2.fastq.gz
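
As a rough illustration of the R1/R2 pairing convention described above, the following Python sketch groups fastq(.gz) paths by the read token in their filenames and renders them in the same table layout. It is not the code of prepare_sample_table.py: the function names pair_fastq_files and to_tsv, the regular expressions, and the rule that the sample name is everything before the read token are all assumptions made for this example.

```python
import re
from pathlib import Path

# Assumption: the read token is "R1" or "R2" delimited by '_', '.', or '-'.
READ_TOKEN = re.compile(r"(?:^|[_.\-])(R[12])(?=[_.\-]|$)")
FASTQ_SUFFIX = re.compile(r"\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastq_files(paths):
    """Group fastq paths into {sample: {"R1": path, "R2": path}}."""
    table = {}
    for path in sorted(str(p) for p in paths):
        name = Path(path).name
        if not FASTQ_SUFFIX.search(name):
            continue  # not a fastq(.gz) file
        stem = FASTQ_SUFFIX.sub("", name)
        match = READ_TOKEN.search(stem)
        if match is None:
            continue  # no read token; this sketch simply skips such files
        # Hypothetical naming rule: sample name = text before the read token.
        sample = stem[: match.start()]
        table.setdefault(sample, {})[match.group(1)] = path
    return table


def to_tsv(table):
    """Render the table with an empty first header cell, then R1 and R2."""
    lines = ["\tR1\tR2"]
    for sample in sorted(table):
        reads = table[sample]
        lines.append(f"{sample}\t{reads.get('R1', '')}\t{reads.get('R2', '')}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        "/path/to/fastqs/sample1/sample1_R1.fastq.gz",
        "/path/to/fastqs/sample1/sample1_R2.fastq.gz",
    ]
    print(to_tsv(pair_fastq_files(example)), end="")
```

Deriving the sample name from the part of the filename before the read token is one simple way to shorten names; the real script may use a different rule.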

Step 3: Execute workflow

Test your configuration by performing a dry-run via

snakemake --configfile path/config.yaml -n

Execute the workflow locally via

snakemake --configfile path/config.yaml --cores $N

using $N cores or run it in a cluster environment via

snakemake --configfile path/config.yaml --cluster qsub --jobs 100

or

snakemake --configfile path/config.yaml --drmaa --jobs 100

See the Snakemake documentation for further details.
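
If the dry-run and the real run are launched from a script, the two commands above can be assembled from one helper so that they always share the same config file. This is only a sketch: the function names snakemake_command and run are hypothetical, and only the flags already shown above (--configfile, -n, --cores) are used.

```python
import shlex
import subprocess


def snakemake_command(configfile, cores=None, dry_run=False):
    """Build the argument list for one of the snakemake calls shown above."""
    cmd = ["snakemake", "--configfile", configfile]
    if dry_run:
        cmd.append("-n")
    if cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


def run(cmd):
    """Print the command, then execute it; raises CalledProcessError on failure."""
    print(" ".join(shlex.quote(arg) for arg in cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; call run(...) to actually execute them.
    print(snakemake_command("path/config.yaml", dry_run=True))
    print(snakemake_command("path/config.yaml", cores=4))
```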

Testing

You can test the pipeline with the script test.py.

Cite

DADA2:

Callahan, B., McMurdie, P., Rosen, M. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583 (2016). https://doi.org/10.1038/nmeth.3869

IDtaxa:

Murali, A., Bhargava, A. & Wright, E.S. IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 6, 140 (2018). https://doi.org/10.1186/s40168-018-0521-5

16s-dada2's Issues

Missing test data

The readme indicates that test data are in the subfolder .test, but I can't find that subfolder in this repository. Is the test data located somewhere else?

Errors in filter step

Hey

Sorry for the avalanche of issues, I cannot seem to get this to work

Running with a samples.tsv:

        R1      R2
Sample_ID107:ID107      fastq/D1081/00_data/Sample_ID107/ID107_AACGCTGA-CTACTATA_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID107/ID107_AACGCTGA-CTACTATA_L001_R2_001.fastq
Sample_ID108:ID108      fastq/D1081/00_data/Sample_ID108/ID108_CGTAGCGA-CTACTATA_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID108/ID108_CGTAGCGA-CTACTATA_L001_R2_001.fastq
Sample_ID119:ID119      fastq/D1081/00_data/Sample_ID119/ID119_AACGCTGA-CGTTACTA_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID119/ID119_AACGCTGA-CGTTACTA_L001_R2_001.fastq
Sample_ID120:ID120      fastq/D1081/00_data/Sample_ID120/ID120_CGTAGCGA-CGTTACTA_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID120/ID120_CGTAGCGA-CGTTACTA_L001_R2_001.fastq
Sample_ID131:ID131      fastq/D1081/00_data/Sample_ID131/ID131_AACGCTGA-AGAGTCAC_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID131/ID131_AACGCTGA-AGAGTCAC_L001_R2_001.fastq
Sample_ID143:ID143      fastq/D1081/00_data/Sample_ID143/ID143_AACGCTGA-TACGAGAC_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID143/ID143_AACGCTGA-TACGAGAC_L001_R2_001.fastq
Sample_ID155:ID155      fastq/D1081/00_data/Sample_ID155/ID155_AACGCTGA-ACGTCTCG_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID155/ID155_AACGCTGA-ACGTCTCG_L001_R2_001.fastq
Sample_ID167:ID167      fastq/D1081/00_data/Sample_ID167/ID167_AACGCTGA-TCGACGAG_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID167/ID167_AACGCTGA-TCGACGAG_L001_R2_001.fastq
Sample_ID178:ID178      fastq/D1081/00_data/Sample_ID178/ID178_TGCTCGTA-GATCGTGT_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID178/ID178_TGCTCGTA-GATCGTGT_L001_R2_001.fastq
Sample_ID179:ID179      fastq/D1081/00_data/Sample_ID179/ID179_AACGCTGA-GATCGTGT_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID179/ID179_AACGCTGA-GATCGTGT_L001_R2_001.fastq
Sample_ID190:ID190      fastq/D1081/00_data/Sample_ID190/ID190_TGCTCGTA-GTCAGATA_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID190/ID190_TGCTCGTA-GTCAGATA_L001_R2_001.fastq
Sample_ID191:ID191      fastq/D1081/00_data/Sample_ID191/ID191_AACGCTGA-GTCAGATA_L001_R1_001.fastq      fastq/D1081/00_data/Sample_ID191/ID191_AACGCTGA-GTCAGATA_L001_R2_001.fastq
Sample_ID001:ID001      fastq/D1082/00_data/Sample_ID001/ID001_AACGCTGA-CTACTATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID001/ID001_AACGCTGA-CTACTATA_L001_R2_001.fastq
Sample_ID002:ID002      fastq/D1082/00_data/Sample_ID002/ID002_CGTAGCGA-CTACTATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID002/ID002_CGTAGCGA-CTACTATA_L001_R2_001.fastq
Sample_ID003:ID003      fastq/D1082/00_data/Sample_ID003/ID003_AACGCTGA-CGTTACTA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID003/ID003_AACGCTGA-CGTTACTA_L001_R2_001.fastq
Sample_ID004:ID004      fastq/D1082/00_data/Sample_ID004/ID004_CGTAGCGA-CGTTACTA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID004/ID004_CGTAGCGA-CGTTACTA_L001_R2_001.fastq
Sample_ID005:ID005      fastq/D1082/00_data/Sample_ID005/ID005_AACGCTGA-AGAGTCAC_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID005/ID005_AACGCTGA-AGAGTCAC_L001_R2_001.fastq
Sample_ID006:ID006      fastq/D1082/00_data/Sample_ID006/ID006_AACGCTGA-TACGAGAC_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID006/ID006_AACGCTGA-TACGAGAC_L001_R2_001.fastq
Sample_ID007:ID007      fastq/D1082/00_data/Sample_ID007/ID007_AACGCTGA-ACGTCTCG_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID007/ID007_AACGCTGA-ACGTCTCG_L001_R2_001.fastq
Sample_ID008:ID008      fastq/D1082/00_data/Sample_ID008/ID008_AACGCTGA-TCGACGAG_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID008/ID008_AACGCTGA-TCGACGAG_L001_R2_001.fastq
Sample_ID009:ID009      fastq/D1082/00_data/Sample_ID009/ID009_TGCTCGTA-GATCGTGT_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID009/ID009_TGCTCGTA-GATCGTGT_L001_R2_001.fastq
Sample_ID010:ID010      fastq/D1082/00_data/Sample_ID010/ID010_AACGCTGA-GATCGTGT_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID010/ID010_AACGCTGA-GATCGTGT_L001_R2_001.fastq
Sample_ID011:ID011      fastq/D1082/00_data/Sample_ID011/ID011_TGCTCGTA-GTCAGATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID011/ID011_TGCTCGTA-GTCAGATA_L001_R2_001.fastq
Sample_ID012:ID012      fastq/D1082/00_data/Sample_ID012/ID012_AACGCTGA-GTCAGATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID012/ID012_AACGCTGA-GTCAGATA_L001_R2_001.fastq
Sample_ID013:ID013      fastq/D1082/00_data/Sample_ID013/ID013_AACGCTGA-CTACTATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID013/ID013_AACGCTGA-CTACTATA_L001_R2_001.fastq
Sample_ID014:ID014      fastq/D1082/00_data/Sample_ID014/ID014_CGTAGCGA-CTACTATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID014/ID014_CGTAGCGA-CTACTATA_L001_R2_001.fastq
Sample_ID015:ID015      fastq/D1082/00_data/Sample_ID015/ID015_AACGCTGA-CGTTACTA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID015/ID015_AACGCTGA-CGTTACTA_L001_R2_001.fastq
Sample_ID016:ID016      fastq/D1082/00_data/Sample_ID016/ID016_CGTAGCGA-CGTTACTA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID016/ID016_CGTAGCGA-CGTTACTA_L001_R2_001.fastq
Sample_ID017:ID017      fastq/D1082/00_data/Sample_ID017/ID017_AACGCTGA-AGAGTCAC_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID017/ID017_AACGCTGA-AGAGTCAC_L001_R2_001.fastq
Sample_ID018:ID018      fastq/D1082/00_data/Sample_ID018/ID018_AACGCTGA-TACGAGAC_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID018/ID018_AACGCTGA-TACGAGAC_L001_R2_001.fastq
Sample_ID019:ID019      fastq/D1082/00_data/Sample_ID019/ID019_AACGCTGA-ACGTCTCG_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID019/ID019_AACGCTGA-ACGTCTCG_L001_R2_001.fastq
Sample_ID020:ID020      fastq/D1082/00_data/Sample_ID020/ID020_AACGCTGA-TCGACGAG_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID020/ID020_AACGCTGA-TCGACGAG_L001_R2_001.fastq
Sample_ID021:ID021      fastq/D1082/00_data/Sample_ID021/ID021_TGCTCGTA-GATCGTGT_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID021/ID021_TGCTCGTA-GATCGTGT_L001_R2_001.fastq
Sample_ID022:ID022      fastq/D1082/00_data/Sample_ID022/ID022_AACGCTGA-GATCGTGT_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID022/ID022_AACGCTGA-GATCGTGT_L001_R2_001.fastq
Sample_ID023:ID023      fastq/D1082/00_data/Sample_ID023/ID023_TGCTCGTA-GTCAGATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID023/ID023_TGCTCGTA-GTCAGATA_L001_R2_001.fastq
Sample_ID024:ID024      fastq/D1082/00_data/Sample_ID024/ID024_AACGCTGA-GTCAGATA_L001_R1_001.fastq      fastq/D1082/00_data/Sample_ID024/ID024_AACGCTGA-GTCAGATA_L001_R2_001.fastq

Then

snakemake --configfile test_config.yaml --cores 4

Produces:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       combine_read_counts
        1       dereplicate
        1       filter
        1       filterLength
        1       get_rep_seq
        1       learnErrorRates
        1       removeChimeras
        8

[Mon Sep 13 17:26:17 2021]
rule filter:
    input: fastq/D1081/00_data/Sample_ID107/ID107_AACGCTGA-CTACTATA_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID108/ID108_CGTAGCGA-CTACTATA_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID119/ID119_AACGCTGA-CGTTACTA_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID120/ID120_CGTAGCGA-CGTTACTA_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID131/ID131_AACGCTGA-AGAGTCAC_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID143/ID143_AACGCTGA-TACGAGAC_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID155/ID155_AACGCTGA-ACGTCTCG_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID167/ID167_AACGCTGA-TCGACGAG_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID178/ID178_TGCTCGTA-GATCGTGT_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID179/ID179_AACGCTGA-GATCGTGT_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID190/ID190_TGCTCGTA-GTCAGATA_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID191/ID191_AACGCTGA-GTCAGATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID001/ID001_AACGCTGA-CTACTATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID002/ID002_CGTAGCGA-CTACTATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID003/ID003_AACGCTGA-CGTTACTA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID004/ID004_CGTAGCGA-CGTTACTA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID005/ID005_AACGCTGA-AGAGTCAC_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID006/ID006_AACGCTGA-TACGAGAC_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID007/ID007_AACGCTGA-ACGTCTCG_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID008/ID008_AACGCTGA-TCGACGAG_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID009/ID009_TGCTCGTA-GATCGTGT_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID010/ID010_AACGCTGA-GATCGTGT_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID011/ID011_TGCTCGTA-GTCAGATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID012/ID012_AACGCTGA-GTCAGATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID013/ID013_AACGCTGA-CTACTATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID014/ID014_CGTAGCGA-CTACTATA_L001_R1_001.fastq, 
fastq/D1082/00_data/Sample_ID015/ID015_AACGCTGA-CGTTACTA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID016/ID016_CGTAGCGA-CGTTACTA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID017/ID017_AACGCTGA-AGAGTCAC_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID018/ID018_AACGCTGA-TACGAGAC_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID019/ID019_AACGCTGA-ACGTCTCG_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID020/ID020_AACGCTGA-TCGACGAG_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID021/ID021_TGCTCGTA-GATCGTGT_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID022/ID022_AACGCTGA-GATCGTGT_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID023/ID023_TGCTCGTA-GTCAGATA_L001_R1_001.fastq, fastq/D1082/00_data/Sample_ID024/ID024_AACGCTGA-GTCAGATA_L001_R1_001.fastq, fastq/D1081/00_data/Sample_ID107/ID107_AACGCTGA-CTACTATA_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID108/ID108_CGTAGCGA-CTACTATA_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID119/ID119_AACGCTGA-CGTTACTA_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID120/ID120_CGTAGCGA-CGTTACTA_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID131/ID131_AACGCTGA-AGAGTCAC_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID143/ID143_AACGCTGA-TACGAGAC_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID155/ID155_AACGCTGA-ACGTCTCG_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID167/ID167_AACGCTGA-TCGACGAG_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID178/ID178_TGCTCGTA-GATCGTGT_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID179/ID179_AACGCTGA-GATCGTGT_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID190/ID190_TGCTCGTA-GTCAGATA_L001_R2_001.fastq, fastq/D1081/00_data/Sample_ID191/ID191_AACGCTGA-GTCAGATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID001/ID001_AACGCTGA-CTACTATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID002/ID002_CGTAGCGA-CTACTATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID003/ID003_AACGCTGA-CGTTACTA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID004/ID004_CGTAGCGA-CGTTACTA_L001_R2_001.fastq, 
fastq/D1082/00_data/Sample_ID005/ID005_AACGCTGA-AGAGTCAC_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID006/ID006_AACGCTGA-TACGAGAC_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID007/ID007_AACGCTGA-ACGTCTCG_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID008/ID008_AACGCTGA-TCGACGAG_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID009/ID009_TGCTCGTA-GATCGTGT_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID010/ID010_AACGCTGA-GATCGTGT_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID011/ID011_TGCTCGTA-GTCAGATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID012/ID012_AACGCTGA-GTCAGATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID013/ID013_AACGCTGA-CTACTATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID014/ID014_CGTAGCGA-CTACTATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID015/ID015_AACGCTGA-CGTTACTA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID016/ID016_CGTAGCGA-CGTTACTA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID017/ID017_AACGCTGA-AGAGTCAC_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID018/ID018_AACGCTGA-TACGAGAC_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID019/ID019_AACGCTGA-ACGTCTCG_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID020/ID020_AACGCTGA-TCGACGAG_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID021/ID021_TGCTCGTA-GATCGTGT_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID022/ID022_AACGCTGA-GATCGTGT_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID023/ID023_TGCTCGTA-GTCAGATA_L001_R2_001.fastq, fastq/D1082/00_data/Sample_ID024/ID024_AACGCTGA-GTCAGATA_L001_R2_001.fastq
    output: filtered/Sample_ID107:ID107_R1.fastq.gz, filtered/Sample_ID108:ID108_R1.fastq.gz, filtered/Sample_ID119:ID119_R1.fastq.gz, filtered/Sample_ID120:ID120_R1.fastq.gz, filtered/Sample_ID131:ID131_R1.fastq.gz, filtered/Sample_ID143:ID143_R1.fastq.gz, filtered/Sample_ID155:ID155_R1.fastq.gz, filtered/Sample_ID167:ID167_R1.fastq.gz, filtered/Sample_ID178:ID178_R1.fastq.gz, filtered/Sample_ID179:ID179_R1.fastq.gz, filtered/Sample_ID190:ID190_R1.fastq.gz, filtered/Sample_ID191:ID191_R1.fastq.gz, filtered/Sample_ID001:ID001_R1.fastq.gz, filtered/Sample_ID002:ID002_R1.fastq.gz, filtered/Sample_ID003:ID003_R1.fastq.gz, filtered/Sample_ID004:ID004_R1.fastq.gz, filtered/Sample_ID005:ID005_R1.fastq.gz, filtered/Sample_ID006:ID006_R1.fastq.gz, filtered/Sample_ID007:ID007_R1.fastq.gz, filtered/Sample_ID008:ID008_R1.fastq.gz, filtered/Sample_ID009:ID009_R1.fastq.gz, filtered/Sample_ID010:ID010_R1.fastq.gz, filtered/Sample_ID011:ID011_R1.fastq.gz, filtered/Sample_ID012:ID012_R1.fastq.gz, filtered/Sample_ID013:ID013_R1.fastq.gz, filtered/Sample_ID014:ID014_R1.fastq.gz, filtered/Sample_ID015:ID015_R1.fastq.gz, filtered/Sample_ID016:ID016_R1.fastq.gz, filtered/Sample_ID017:ID017_R1.fastq.gz, filtered/Sample_ID018:ID018_R1.fastq.gz, filtered/Sample_ID019:ID019_R1.fastq.gz, filtered/Sample_ID020:ID020_R1.fastq.gz, filtered/Sample_ID021:ID021_R1.fastq.gz, filtered/Sample_ID022:ID022_R1.fastq.gz, filtered/Sample_ID023:ID023_R1.fastq.gz, filtered/Sample_ID024:ID024_R1.fastq.gz, filtered/Sample_ID107:ID107_R2.fastq.gz, filtered/Sample_ID108:ID108_R2.fastq.gz, filtered/Sample_ID119:ID119_R2.fastq.gz, filtered/Sample_ID120:ID120_R2.fastq.gz, filtered/Sample_ID131:ID131_R2.fastq.gz, filtered/Sample_ID143:ID143_R2.fastq.gz, filtered/Sample_ID155:ID155_R2.fastq.gz, filtered/Sample_ID167:ID167_R2.fastq.gz, filtered/Sample_ID178:ID178_R2.fastq.gz, filtered/Sample_ID179:ID179_R2.fastq.gz, filtered/Sample_ID190:ID190_R2.fastq.gz, filtered/Sample_ID191:ID191_R2.fastq.gz, 
filtered/Sample_ID001:ID001_R2.fastq.gz, filtered/Sample_ID002:ID002_R2.fastq.gz, filtered/Sample_ID003:ID003_R2.fastq.gz, filtered/Sample_ID004:ID004_R2.fastq.gz, filtered/Sample_ID005:ID005_R2.fastq.gz, filtered/Sample_ID006:ID006_R2.fastq.gz, filtered/Sample_ID007:ID007_R2.fastq.gz, filtered/Sample_ID008:ID008_R2.fastq.gz, filtered/Sample_ID009:ID009_R2.fastq.gz, filtered/Sample_ID010:ID010_R2.fastq.gz, filtered/Sample_ID011:ID011_R2.fastq.gz, filtered/Sample_ID012:ID012_R2.fastq.gz, filtered/Sample_ID013:ID013_R2.fastq.gz, filtered/Sample_ID014:ID014_R2.fastq.gz, filtered/Sample_ID015:ID015_R2.fastq.gz, filtered/Sample_ID016:ID016_R2.fastq.gz, filtered/Sample_ID017:ID017_R2.fastq.gz, filtered/Sample_ID018:ID018_R2.fastq.gz, filtered/Sample_ID019:ID019_R2.fastq.gz, filtered/Sample_ID020:ID020_R2.fastq.gz, filtered/Sample_ID021:ID021_R2.fastq.gz, filtered/Sample_ID022:ID022_R2.fastq.gz, filtered/Sample_ID023:ID023_R2.fastq.gz, filtered/Sample_ID024:ID024_R2.fastq.gz, stats/Nreads_filtered.txt
    log: logs/dada2/filter.txt
    jobid: 1
    threads: 4

Loading required package: Rcpp
Error in filterAndTrim(snakemake@input[["R1"]], snakemake@output[["R1"]],  :
  These are the errors (up to 5) encountered in individual cores...
Error in validObject(.Object) :
  invalid class “SRFilterResult” object: superclass "Mnumeric" not defined in the environment of the object's class
Error in validObject(.Object) :
  invalid class “SRFilterResult” object: superclass "Mnumeric" not defined in the environment of the object's class
Error in validObject(.Object) :
  invalid class “SRFilterResult” object: superclass "Mnumeric" not defined in the environment of the object's class
Error in validObject(.Object) :
  invalid class “SRFilterResult” object: superclass "Mnumeric" not defined in the environment of the object's class
Error in validObject(.Object) :
  invalid class “SRFilterResult” object: superclass "Mnumeric" not defined in the environment of the object's class
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :
  all scheduled cores encountered errors in user code
Execution halted
[Mon Sep 13 17:26:26 2021]
Error in rule filter:
    jobid: 1
    output: filtered/Sample_ID107:ID107_R1.fastq.gz, filtered/Sample_ID108:ID108_R1.fastq.gz, filtered/Sample_ID119:ID119_R1.fastq.gz, filtered/Sample_ID120:ID120_R1.fastq.gz, filtered/Sample_ID131:ID131_R1.fastq.gz, filtered/Sample_ID143:ID143_R1.fastq.gz, filtered/Sample_ID155:ID155_R1.fastq.gz, filtered/Sample_ID167:ID167_R1.fastq.gz, filtered/Sample_ID178:ID178_R1.fastq.gz, filtered/Sample_ID179:ID179_R1.fastq.gz, filtered/Sample_ID190:ID190_R1.fastq.gz, filtered/Sample_ID191:ID191_R1.fastq.gz, filtered/Sample_ID001:ID001_R1.fastq.gz, filtered/Sample_ID002:ID002_R1.fastq.gz, filtered/Sample_ID003:ID003_R1.fastq.gz, filtered/Sample_ID004:ID004_R1.fastq.gz, filtered/Sample_ID005:ID005_R1.fastq.gz, filtered/Sample_ID006:ID006_R1.fastq.gz, filtered/Sample_ID007:ID007_R1.fastq.gz, filtered/Sample_ID008:ID008_R1.fastq.gz, filtered/Sample_ID009:ID009_R1.fastq.gz, filtered/Sample_ID010:ID010_R1.fastq.gz, filtered/Sample_ID011:ID011_R1.fastq.gz, filtered/Sample_ID012:ID012_R1.fastq.gz, filtered/Sample_ID013:ID013_R1.fastq.gz, filtered/Sample_ID014:ID014_R1.fastq.gz, filtered/Sample_ID015:ID015_R1.fastq.gz, filtered/Sample_ID016:ID016_R1.fastq.gz, filtered/Sample_ID017:ID017_R1.fastq.gz, filtered/Sample_ID018:ID018_R1.fastq.gz, filtered/Sample_ID019:ID019_R1.fastq.gz, filtered/Sample_ID020:ID020_R1.fastq.gz, filtered/Sample_ID021:ID021_R1.fastq.gz, filtered/Sample_ID022:ID022_R1.fastq.gz, filtered/Sample_ID023:ID023_R1.fastq.gz, filtered/Sample_ID024:ID024_R1.fastq.gz, filtered/Sample_ID107:ID107_R2.fastq.gz, filtered/Sample_ID108:ID108_R2.fastq.gz, filtered/Sample_ID119:ID119_R2.fastq.gz, filtered/Sample_ID120:ID120_R2.fastq.gz, filtered/Sample_ID131:ID131_R2.fastq.gz, filtered/Sample_ID143:ID143_R2.fastq.gz, filtered/Sample_ID155:ID155_R2.fastq.gz, filtered/Sample_ID167:ID167_R2.fastq.gz, filtered/Sample_ID178:ID178_R2.fastq.gz, filtered/Sample_ID179:ID179_R2.fastq.gz, filtered/Sample_ID190:ID190_R2.fastq.gz, filtered/Sample_ID191:ID191_R2.fastq.gz, 
filtered/Sample_ID001:ID001_R2.fastq.gz, filtered/Sample_ID002:ID002_R2.fastq.gz, filtered/Sample_ID003:ID003_R2.fastq.gz, filtered/Sample_ID004:ID004_R2.fastq.gz, filtered/Sample_ID005:ID005_R2.fastq.gz, filtered/Sample_ID006:ID006_R2.fastq.gz, filtered/Sample_ID007:ID007_R2.fastq.gz, filtered/Sample_ID008:ID008_R2.fastq.gz, filtered/Sample_ID009:ID009_R2.fastq.gz, filtered/Sample_ID010:ID010_R2.fastq.gz, filtered/Sample_ID011:ID011_R2.fastq.gz, filtered/Sample_ID012:ID012_R2.fastq.gz, filtered/Sample_ID013:ID013_R2.fastq.gz, filtered/Sample_ID014:ID014_R2.fastq.gz, filtered/Sample_ID015:ID015_R2.fastq.gz, filtered/Sample_ID016:ID016_R2.fastq.gz, filtered/Sample_ID017:ID017_R2.fastq.gz, filtered/Sample_ID018:ID018_R2.fastq.gz, filtered/Sample_ID019:ID019_R2.fastq.gz, filtered/Sample_ID020:ID020_R2.fastq.gz, filtered/Sample_ID021:ID021_R2.fastq.gz, filtered/Sample_ID022:ID022_R2.fastq.gz, filtered/Sample_ID023:ID023_R2.fastq.gz, filtered/Sample_ID024:ID024_R2.fastq.gz, stats/Nreads_filtered.txt
    log: logs/dada2/filter.txt (check log file(s) for error message)

RuleException:
CalledProcessError in line 34 of /home/ubuntu/16S-dada2/rules/dada2.smk:
Command 'set -euo pipefail;  Rscript --vanilla /home/ubuntu/16S-dada2/.snakemake/scripts/tmpas45fiyy.filter.R' returned non-zero exit status 1.
  File "/home/ubuntu/16S-dada2/rules/dada2.smk", line 34, in __rule_filter
  File "/home/ubuntu/miniconda3/envs/dada2_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/ubuntu/16S-dada2/.snakemake/log/2021-09-13T172617.160599.snakemake.log

dada2.smk threads issue

Thanks for the rapid changes. I am not sure why dada2 has an issue.

My config.yaml is unchanged in the threads section:

threads: 4

snakemake
KeyError in line 28 of /ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/rules/dada2.smk:
'threads'
  File "/ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/Snakefile", line 97, in <module>
  File "/ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/rules/dada2.smk", line 28, in <module>

(dada2_env) rcug@hpc-rc03:/ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2$ snakemake --cores 10
KeyError in line 28 of /ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/rules/dada2.smk:
'threads'
  File "/ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/Snakefile", line 97, in <module>
  File "/ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/rules/dada2.smk", line 28, in <module>
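
A KeyError for 'threads' raised while the rules file is parsed suggests that the config dictionary did not contain that key at that point. One possible cause, which this thread does not confirm, is that neither command above passes --configfile, so config.yaml may never have been loaded. The Python sketch below shows how such a lookup could be checked up front; the names REQUIRED_KEYS, missing_config_keys and check_config are hypothetical, and the key list is taken only from the tracebacks quoted in these issues.

```python
# Assumption: these are the keys the workflow needs; taken from the tracebacks.
REQUIRED_KEYS = ("sampletable", "threads")


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the config dictionary."""
    return [key for key in required if key not in config]


def check_config(config, required=REQUIRED_KEYS):
    """Raise one descriptive KeyError instead of failing on the first lookup."""
    missing = missing_config_keys(config, required)
    if missing:
        raise KeyError(
            "config is missing key(s): " + ", ".join(missing)
            + " (was a config file passed via --configfile?)"
        )


if __name__ == "__main__":
    # An empty dict mimics a run in which no config file was loaded.
    print(missing_config_keys({}))
    print(missing_config_keys({"sampletable": "samples.tsv", "threads": 4}))
```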

EOL while scanning string literal

Hi

I created the conda env and activated it, then tried a dry run:

snakemake --configfile config.yaml -n

This produces an error:

SyntaxError in line 10 of 16S-dada2/Snakefile:
EOL while scanning string literal

Not sure what's going on, is this a Python version issue?

$ which python
/home/ubuntu/miniconda3/envs/dada2_env/bin/python
$ python --version
Python 3.6.13

problematic samples ?

Hi,
just trying this, the test and install worked fine. thanks.

It seems my sample table isn't ok despite using the supplied script.

It looks like this, I've tried removing many samples, removing SE read samples (public data), gzipping, gunzipping, adding _001.fastq, removing _001.fastq ...... no luck.

Are the names just too long or contain illegal chars ?

16S-dada2$ snakemake --cores 10

KeyError in line 5 of /ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/Snakefile:
'sampletable'
File "/ngsssd1/rcug/public_metagenome/soil_16S/16S-dada2/Snakefile", line 5, in

Thanks,
Colin

cat samples.tsv
        R2      R1
SRR12688636-16S-microbiome-001  /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688636_16S_microbiome_R2_001.fastq     /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688636_16S_microbiome_R1_001.fastq
SRR12688638-16S-microbiome-001  /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688638_16S_microbiome_R2_001.fastq     /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688638_16S_microbiome_R1_001.fastq
SRR12688640-16S-microbiome-001  /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688640_16S_microbiome_R2_001.fastq     /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688640_16S_microbiome_R1_001.fastq
SRR12688641-16S-microbiome-001  /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688641_16S_microbiome_R2_001.fastq     /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688641_16S_microbiome_R1_001.fastq
SRR12688642-16S-microbiome-001  /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688642_16S_microbiome_R2_001.fastq     /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688642_16S_microbiome_R1_001.fastq
SRR12688637-16S-microbiome-001  /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688637_16S_microbiome_R2_001.fastq     /ngsssd1/rcug/public_metagenome/soil_16S/renamed/microbiome/SRR12688637_16S_microbiome_R1_001.fastq
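
To rule the table itself in or out, a few basic properties can be checked independently of the workflow. The sketch below reads a tab-separated sample table by column name (so the R2-before-R1 order above is fine) and reports unusual characters in sample names, duplicate names, and rows where R1 and R2 point to the same file. The function name check_sample_table and the set of characters treated as safe are assumptions for this example, not rules stated by the workflow.

```python
import csv
import io
import re

# Assumption: letters, digits, '_' and '-' are treated as safe in sample names.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_\-]+$")


def check_sample_table(tsv_text):
    """Return a list of human-readable problems found in a sample table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    problems = [f"missing column {col}" for col in ("R1", "R2") if col not in fieldnames]
    if problems:
        return problems
    name_column = fieldnames[0]  # first column holds the sample name
    seen = set()
    for row in reader:
        name = row[name_column]
        if not SAFE_NAME.match(name):
            problems.append(f"unusual characters in sample name: {name}")
        if name in seen:
            problems.append(f"duplicate sample name: {name}")
        seen.add(name)
        if row["R1"] == row["R2"]:
            problems.append(f"R1 and R2 are the same file for: {name}")
    return problems


if __name__ == "__main__":
    table = "\tR2\tR1\nsampleA\t/data/a_R2.fastq\t/data/a_R1.fastq\n"
    print(check_sample_table(table))
```

An empty result only means that these particular checks passed; it says nothing about whether the config file that points to the table was loaded.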
