anacapa's People

Contributors

gauravsk, jessegomer, jtdaugh, limey-bean, lpipes, marinednadude, max-mapper, zjgold

anacapa's Issues

minimal example data

db lives in /u/flashscratch/e/eecurd/UCLA_generated_data/Anacapa-master/Anacapa_db

good example raw data for now would be UCLA_generated_data/dada2_test_2/RAW - Miseq w/ nextera adapters. output is in dada2_out and bowtie2_runs

I will try to produce same output as a first pass

biopython on hoffman

User needs to make sure that they have biopython.

module load
pip install biopython --user
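A quick way to confirm that biopython is importable before starting a run is sketched below. The helper name `has_python_module` and the `PYTHON` variable are illustrative assumptions and are not part of Anacapa; `Bio` is the import name of the biopython package.

```shell
#!/bin/bash
# Return success if the given interpreter can import the given module.
# has_python_module is a hypothetical helper, not an Anacapa function.
has_python_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# Assumption: the interpreter the pipeline will use is reachable as "python";
# override by exporting PYTHON before running this check.
PYTHON="${PYTHON:-python}"

if has_python_module "$PYTHON" Bio; then
    echo "biopython is available to $PYTHON"
else
    echo "biopython not found for $PYTHON; try: pip install biopython --user"
fi
```

The check should be run after the relevant module has been loaded, so that it tests the same interpreter the pipeline will see.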

Running with dada2 version 1.22

Hi limey-bean,

I am a Master's student at the University of Johannesburg, South Africa.
My project uses eDNA metabarcoding as a tool to study estuarine macroinvertebrates,
so the Anacapa toolkit is essential for me to understand and use (thank you for this toolkit, by the way).

I would like to ask the following.

I currently have dada2 version 1.22 installed in R (version 4.1.3) on Ubuntu Linux.
The script (anacapa_QC) works well until the dada2 step, but I can't seem to get any files in the dada2 output folder of my output directory.
Checking the run log, I see that it halts the process and requires that I install dada2 1.16.

Would it be possible to tweak the scripts to allow the use of dada2 version 1.22?
If so, how would I go about this? For example, do I change the anacapa_config.sh script, or something else?
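For anyone hitting the same message, a minimal sketch for checking which dada2 version R actually loads is given below. The value 1.16 is taken from the run log described in this issue; the helper names `major_minor` and `installed_dada2_version` are illustrative assumptions, and the sketch does not change anything in the pipeline.

```shell
#!/bin/bash
# Reduce a version string such as 1.22.0 to its major.minor part (1.22)
major_minor() {
    echo "$1" | cut -d. -f1,2
}

# Print the installed dada2 version, or nothing if Rscript or dada2 is missing
installed_dada2_version() {
    command -v Rscript >/dev/null 2>&1 || return 0
    Rscript -e 'cat(as.character(packageVersion("dada2")))' 2>/dev/null || true
}

REQUIRED="1.16"   # the version the run log in this issue asks for
INSTALLED=$(installed_dada2_version)

if [ -z "$INSTALLED" ]; then
    echo "dada2 (or Rscript) not found"
elif [ "$(major_minor "$INSTALLED")" = "$REQUIRED" ]; then
    echo "dada2 $INSTALLED is in the $REQUIRED series"
else
    echo "dada2 $INSTALLED is installed; the run log asks for $REQUIRED"
fi
```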

Figures - to do

Crux vs. the world

  • Smithsonian's CO1 database comparisons (Emily will do the comparisons)
  • Improve combined heatmap + histogram figure (Gaurav to do this)

Bowtie2-BLCA v Blast-BLCA v Qiime2

All comparisons are happening at the 60% cutoff

  • Histogram, one bar for each of the classifiers. In text, refer to what was wrong.

Hi-Seq vs Mi-Seq

  • Scatterplot of MiSeq v HiSeq taxonomy assignments; explore this a bit.

Anacapa vs. the world

  • Forthcoming...

To-do list: Dada2 step

Summary stats

  • how many sequences were kept/discarded at each step?

  • what is the average sequence length (or sequence length distribution)?

  • how many sequences were left unpaired?

  • filterAndTrim() issue:

I get the following error running filterAndTrim() with the parameters as currently defined:

Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0,  : [...]
Mismatched forward and reverse sequence files: 9587, 9588.

I get this because after the first round of cutadapt filtering, I have a slightly different number of sequences left in the forward and reverse files. I might be doing something different in my cutadapt runs, though, than what is in the pipeline now...

wc -l 740_2016_S70_L001_R*
   438348 740_2016_S70_L001_R1_001_cut.fastq
   438352 740_2016_S70_L001_R2_001_cut.fastq

I can get filterAndTrim() to run by setting matchIDs = TRUE. With this option, the function checks the header of each sequence rather than assuming that the two files match up.

Either way, I think it can't hurt to set matchIDs = TRUE, just to have an extra check in place. Emily?
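The line counts above differ by four lines, which is one FASTQ record if each record occupies exactly four lines. A small sketch for reporting record counts and mean read length per file, and for flagging forward/reverse files with unequal counts, could look like this. The function names are illustrative assumptions, and the four-lines-per-record layout is assumed rather than checked.

```shell
#!/bin/bash
# Number of records in a FASTQ file, assuming 4 lines per record
count_fastq_records() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Mean read length, taken from the sequence line (line 2 of each record)
mean_read_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n) printf "%.1f\n", total / n }' "$1"
}

# Compare record counts of a forward and a reverse file
check_pairs() {
    local fwd_n rev_n
    fwd_n=$(count_fastq_records "$1")
    rev_n=$(count_fastq_records "$2")
    if [ "$fwd_n" -eq "$rev_n" ]; then
        echo "OK: $fwd_n records in both files"
    else
        echo "MISMATCH: forward=$fwd_n reverse=$rev_n"
        return 1
    fi
}
```

For the files in this issue the call would be `check_pairs 740_2016_S70_L001_R1_001_cut.fastq 740_2016_S70_L001_R2_001_cut.fastq`. Equal counts do not prove that the reads are in the same order, so this complements matchIDs = TRUE rather than replacing it.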

Anacapa documentation - options for own data

Hi,

I have managed to install the Anacapa container and run the 12S example successfully.

Now I want to run my own data, but I am not sure I have found all the options that need to be customized.

I have two markers, COI and 12S

What I did

  1. I created a new folder inside Anacapa_db called COI_data

  2. I modified the run-anacapa-COI.sh file as follows

  • pointed DATA to the COI_data folder
  • pointed OUT to new directory COI_results
  • pointed FORWARD and REVERSE to forward_primers.txt & reverse_primers.txt
  • checked that the flags -a truseq & -t MiSeq are right for my project
  3. I changed the primers in forward_primers.txt & reverse_primers.txt and removed the primers for the barcodes that I didn't have.

  4. I changed the expected length in metabarcode_loci_min_merge_length.txt

However, I seem to be missing something

  • I get output for COI AND 16S. How do I need to provide the files so that the pipeline recognizes which files belong to which primer? (I provided only COI files; the files are de-replicated by primer)
  • Did I miss options that I need to adapt?

Thanks!

Fabian
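One thing that can be checked independently of the pipeline is which locus names are still present in the two primer files. The sketch below assumes the primer files are FASTA-like, with one `>LOCUS` header line per primer; that layout and the function names are assumptions for illustration and are not confirmed in this thread.

```shell
#!/bin/bash
# List the locus names (header lines starting with ">") in a primer file
list_loci() {
    grep '^>' "$1" | sed 's/^>//' | sort
}

# Check that forward and reverse primer files name the same loci
compare_primer_files() {
    local fwd rev
    fwd=$(list_loci "$1")
    rev=$(list_loci "$2")
    if [ "$fwd" = "$rev" ]; then
        echo "Loci present in both files:"
        echo "$fwd"
    else
        echo "Locus names differ between $1 and $2"
        return 1
    fi
}
```

Running `compare_primer_files forward_primers.txt reverse_primers.txt` should then list only COI if the other primers were removed from both files. If 16S still appears in the output directory, the copies being read by the run script may not be the edited ones.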

Minor code issues that need fixing to get Anacapa working

Hi all, I was trying to get the latest version of Anacapa to run through the Singularity container and ran into a few lines of code that need fixing to get Anacapa to run. The first is an erroneous (and double) print statement in line 327 in blca_from_bowtie.py, which results in termination of the script and therefore failure of the pipeline.

The second is a problem in local mode in the run_*_blca.sh scripts, which pass -p ${DB}/muscle as the muscle path to blca_from_bowtie.py, where ${DB} points to the Anacapa_db directory. The result is a failure on line 369. This should instead point to the muscle path as specified in anacapa_config.sh (related to issue #40?).

As I'm not sure whether this pipeline is still being maintained, I've attached all code that is required to get Anacapa running through Singularity in local mode. NOTE: I get slightly different taxonomy annotations in the 12S example, see issue #60.

Download and modify files

#!/bin/bash
# Path to install Anacapa
BASE_PATH="/path/to/preferred/directory"
ANACAPA_PATH="${BASE_PATH}/anacapa"

# Download singularity container
# (-p avoids an error if the directory exists; the URL is quoted because of the "?")
mkdir -p "${ANACAPA_PATH}"
cd "${ANACAPA_PATH}"
wget "https://zenodo.org/record/2602180/files/anacapa-1.5.0.img?download=1" \
    -O anacapa-1.5.0.img

# Test if container can be executed and whether muscle is callable
# singularity shell ${ANACAPA_PATH}/anacapa-1.5.0.img
# muscle
# exit

# Clone the Anacapa repository
git clone https://github.com/limey-bean/Anacapa

# Replace the configuration with one for singularity usage
CONFIG_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/anacapa_config.sh
mv ${CONFIG_PATH} ${CONFIG_PATH/config.sh/config.sh.bak}
wget https://raw.githubusercontent.com/dat-ecosystem-archive/anacapa-container/master/config/anacapa_config.sh \
    -O ${CONFIG_PATH}

# Remove a print statement that causes an error in the BLCA procedure
BLCA_PY_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/blca_from_bowtie.py
sed -i.bak '327d' ${BLCA_PY_PATH}

# Alter the path to MUSCLE in the run_blca.sh script - it assumes that
# MUSCLE is callable from Anacapa_db/muscle, whereas in the container it is
# found through the $PATH variable. Using $PATH is the default behaviour,
# so the -p option can be removed
BLCA_SH_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/run_blca.sh
BOWTIE_BLCA_SH_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/run_bowtie2_blca.sh
sed -i.bak 's;-p ${DB}/muscle;;g' ${BLCA_SH_PATH}
sed -i.bak 's;-p ${DB}/muscle;;g' ${BOWTIE_BLCA_SH_PATH}
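Whether the two sed substitutions took effect can be checked with a fixed-string search for the flag. This is a minimal sketch; `count_muscle_flag` and `report_muscle_flag` are illustrative helper names, not part of Anacapa.

```shell
#!/bin/bash
# Count lines in a script that still pass "-p ${DB}/muscle".
# Single quotes keep ${DB} literal, exactly as it is written in the scripts.
count_muscle_flag() {
    grep -cF -- '-p ${DB}/muscle' "$1" || true
}

# Report whether a script still contains the hard-coded MUSCLE path
report_muscle_flag() {
    local n
    if [ ! -f "$1" ]; then
        echo "not found: $1"
        return 1
    fi
    n=$(count_muscle_flag "$1")
    if [ "$n" -eq 0 ]; then
        echo "clean: $1"
    else
        echo "$n line(s) still pass -p \${DB}/muscle in $1"
        return 1
    fi
}
```

After the edits above, `report_muscle_flag "${BLCA_SH_PATH}"` and `report_muscle_flag "${BOWTIE_BLCA_SH_PATH}"` should both print a "clean" line, and the original scripts remain available with a .bak suffix.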

Run the 12S example

#!/bin/bash
# Define paths
BASE_PATH="/path/to/preferred/directory"
ANACAPA_PATH="${BASE_PATH}/anacapa/Anacapa"
CONTAINER_PATH="${BASE_PATH}/anacapa/anacapa-1.5.0.img"

# Unzip the 12S bowtie and tax databases
cd ${ANACAPA_PATH}
# Quote the pattern so that unzip, not the shell, expands the wildcard
unzip ${ANACAPA_PATH}/Example_data/12S_Oct2019.zip '12S_Oct2019/*'
mv 12S_Oct2019 ${ANACAPA_PATH}/Anacapa_db/12S

# Test the QC and ASV parsing module
singularity exec -B ${ANACAPA_PATH} ${CONTAINER_PATH} /bin/bash -c "${ANACAPA_PATH}/Anacapa_db/anacapa_QC_dada2.sh \
    -i ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data \
    -o ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test \
    -d ${ANACAPA_PATH}/Anacapa_db \
    -f ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data/forward.txt \
    -r ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data/reverse.txt \
    -e ${ANACAPA_PATH}/Anacapa_db/metabarcode_loci_min_merge_length.txt \
    -a nextera \
    -t MiSeq \
    -l"

# Compare the output with the expected output. Should not return any lines.
OUT="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test/12S"
EXP="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/Anacapa_test_data_expected_output_after_QC_dada2/12S"
git diff --no-index --stat ${EXP} ${OUT}

# Test the taxonomic classification module
singularity exec -B ${ANACAPA_PATH} ${CONTAINER_PATH} /bin/bash -c \
    "${ANACAPA_PATH}/Anacapa_db/anacapa_classifier.sh \
    -o ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test \
    -d ${ANACAPA_PATH}/Anacapa_db \
    -l"

# Compare the output with the expected output
OUT="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test/12S"
EXP="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/Anacapa_test_data_expected_output_after_classifier/12S"
git diff --no-index --stat ${EXP} ${OUT}

Edit: added reference to issue #60.

Slightly different taxonomy annotations with 12S example

After getting Anacapa to work through Singularity and running the 12S example, I seem to get some small differences between my output and the expected output, see below. For a reproducible example, please see #59. I was wondering what could be the cause of this discrepancy? Is it likely to be the result of different versions of Anacapa / software, or only of the 12S database used? I made use of the 12S_Oct2019.zip database. Judging by the diff below, it seems that my output contains more taxa.

Check which files are different

# Compare my output with the expected output
OUT="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test/12S"
EXP="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/Anacapa_test_data_expected_output_after_classifier/12S"
git diff --no-index --stat ${EXP} ${OUT}

 .../12S_taxonomy_tables/12S_ASV_taxonomy_brief.txt |  32 +-
 .../12S_ASV_taxonomy_detailed.txt                  |  32 +-
 .../100/12S_ASV_raw_taxonomy_100.txt               |   8 +-
 .../100/12S_ASV_sum_by_taxonomy_100.txt            |   6 +-
 .../40/12S_ASV_raw_taxonomy_40.txt                 |   6 +-
 .../40/12S_ASV_sum_by_taxonomy_40.txt              |   4 +-
 .../50/12S_ASV_raw_taxonomy_50.txt                 |   4 +-
 .../50/12S_ASV_sum_by_taxonomy_50.txt              |   2 +-
 .../60/12S_ASV_raw_taxonomy_60.txt                 |   4 +-
 .../60/12S_ASV_sum_by_taxonomy_60.txt              |   2 +-
 .../70/12S_ASV_raw_taxonomy_70.txt                 |   4 +-
 .../70/12S_ASV_sum_by_taxonomy_70.txt              |   2 +-
 .../80/12S_ASV_raw_taxonomy_80.txt                 |   4 +-
 .../80/12S_ASV_sum_by_taxonomy_80.txt              |   2 +-
 .../90/12S_ASV_raw_taxonomy_90.txt                 |   4 +-
 .../90/12S_ASV_sum_by_taxonomy_90.txt              |   2 +-
 .../95/12S_ASV_raw_taxonomy_95.txt                 |   4 +-
 .../95/12S_ASV_sum_by_taxonomy_95.txt              |   2 +-
 .../12S/12Sbowtie2_out/12S_bowtie2_all.sam         | 702 +++++++++++----------
 .../12Sbowtie2_out/12S_bowtie2_all.sam.blca.out    |  32 +-
 .../single_read_forward_12S_end_to_end.sam         |  33 +-
 .../single_read_forward_12S_local.sam              | 118 ++--
 .../single_read_merged_12S_end_to_end.sam          | 351 ++++++-----
 .../single_read_merged_12S_local.sam               | 200 +++---
 24 files changed, 826 insertions(+), 734 deletions(-)

Compare the brief taxonomy tables

# Compare the Brief taxonomy tables
diff ${EXP}/12S_taxonomy_tables/12S_ASV_taxonomy_brief.txt ${OUT}//12S_taxonomy_tables/12S_ASV_taxonomy_brief.txt
4,9c4,9
< forward_12S_3 1       0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Pomacentridae;genus:Hypsypops;species:Hypsypops rubicundus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; FJ616348.1;FJ616297.1;FJ616291.1;AF285932.1;JQ707048.1;FJ616360.1;JN935815.1;LC091986.1;LC069643.1
< forward_12S_4 1       0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Pomacentridae;genus:Hypsypops;species:Hypsypops rubicundus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; FJ616348.1;FJ616291.1;FJ616360.1;FJ616297.1
< forward_12S_5 0       1       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Oxyjulis;species:Oxyjulis californica;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:98.0;species:98.0;   AY279632.1;KY815300.1;LC104651.1;LC146271.1;AJ810137.1;AY279611.1;KY421797.1;AJ810136.1;KY815304.1;AY279589.1;AB974580.1;AY279628.1;AB972165.1;KY815292.1;LC021281.1;AY279590.1;AY279635.1;LC037152.1
< forward_12S_6 0       1       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Clupeiformes;family:Engraulidae;genus:Engraulis;species:Engraulis mordax;   superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; LC091576.1;LC091575.1;LC020920.1;AF417342.1;AB040676.1;KF765500.2;AP009137.1
< forward_12S_7 0       1       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Pleuronectiformes;family:Paralichthyidae;genus:Citharichthys;species:Citharichthys stigmaeus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:98.75;species:91.6666666667; LC049675.1;LC092080.1;LC145943.1;AF488499.1;AF488502.1
< merged_12S_1  435     190     superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Lutjaniformes;family:Lutjanidae;genus:Scombrops;species:Microcanthus strigatus;     superkingdom:100.0;phylum:100.0;class:100.0;order:36.3857142857;family:36.3857142857;genus:24.4357142857;species:16.6666666667; LC146222.1;LC146225.1;KC136483.1;EF616892.1;LC036796.1;LC036741.1;AB972198.1;AP006009.1;LC036780.1;LC036779.1;LC006297.3;LC208773.1;AB378750.1;LC069537.1;AB378749.1;FJ171339.1;LC021264.1;KX641477.1;AB128870.1;KC136457.1;JQ010988.1;JQ010987.1;LC036864.1;AB236128.1;AB236130.1;LC021271.1;AB236129.1;AB214535.1;KM658974.1;LC069635.1;LC069607.1;LC069633.1;KC136370.1;LC092033.1;AP017437.1;LC278180.1;LC026623.1;AP006025.1;LC037134.1;KC136536.1;LC278257.1;AP006814.1;LC104688.1;LC104689.1;LC036807.1;KC136395.1;KC136577.1;LC049802.1;LC146329.1;AY700235.1;LC278258.1;AF294451.1;AP017438.1;LC021167.1;KT309078.1;LC021166.1;KR363149.1
---
> forward_12S_3 1       0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Pomacentridae;genus:Hypsypops;species:Hypsypops rubicundus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; FJ616348.1;FJ616297.1;FJ616291.1;JQ707048.1;AF285932.1;JN935815.1;FJ616360.1;LC091986.1;LC499326.1;LC069643.1
> forward_12S_4 1       0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Pomacentridae;genus:Hypsypops;species:Hypsypops rubicundus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; FJ616348.1;FJ616291.1;FJ616297.1;FJ616360.1
> forward_12S_5 0       1       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Oxyjulis;species:Oxyjulis californica;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:98.0;species:98.0;   AY279632.1;KY815300.1;LC104651.1;AJ810137.1;KY421797.1;AY279611.1;LC146271.1;AJ810136.1;KY815304.1;AB974580.1;LC499368.1;AY279589.1;AY279628.1;AB972165.1;KY815292.1;AY279590.1;LC021281.1;AY279635.1;LC037152.1
> forward_12S_6 0       1       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Clupeiformes;family:Engraulidae;genus:Engraulis;species:Engraulis mordax;   superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; NC_041097.1;LC091575.1;LC091576.1;LC468860.1;AB040676.1;AF417342.1;AP017957.1;LC499587.1;LC385201.1;LC468861.1;LC020920.1;KF765500.2;LC340033.1;AP009137.1
> forward_12S_7 0       1       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Pleuronectiformes;family:Paralichthyidae;genus:Citharichthys;species:Citharichthys stigmaeus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:99.6666666667;species:92.5;  LC092080.1;LC049675.1;LC145943.1;AF488499.1;AF488502.1
> merged_12S_1  435     190     superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Centrarchiformes;family:Lutjanidae;genus:Scombrops;species:Microcanthus strigatus;  superkingdom:100.0;phylum:100.0;class:100.0;order:42.7643162393;family:22.1083333333;genus:20.8132478632;species:20.3205128205; LC146225.1;EF616892.1;KC136483.1;LC036796.1;AP006009.1;LC036741.1;LC421698.1;AB938146.1;LC500739.1;LC208773.1;LC036779.1;LC036780.1;DQ872160.1;AB972189.1;AB739063.1;LC474184.1;MF621710.1;AB378750.1;AB378749.1;FJ171339.1;LC146230.1;LC055190.1;MH248214.1;KX373635.1;KX641477.1;JQ010988.1;LC036864.1;LC021266.1;AB236129.1;LC021271.1;AB236128.1;AB214535.1;AB236130.1;KM282429.1;MH248193.1;KC136370.1;AP017437.1;LC092033.1;LC026623.1;LC037134.1;LC340157.1;KC136536.1;AF055600.1;LC499379.1;AP006814.1;LC104689.1;KC136517.1;LC104688.1;MG748713.1;MH248192.1;LC340159.1;LC036807.1;KC136395.1;KC136577.1;LC499334.1;LC049802.1;KR152255.1;KC136518.1;KR152252.1;AB972222.1;LC499336.1;LC458333.1;LC421748.1;LC278255.1;LC278252.1;KT337336.1;KR363149.1;KT309078.1;LC499366.1;LC021167.1;LC474187.1;LC021166.1;AP017438.1
12,15c12,15
< merged_12S_4  104     0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Oxyjulis;species:Oxyjulis californica;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:90.2;species:90.2;   AY279632.1;KY815300.1;AY279611.1;KY421797.1;AJ810137.1;LC146271.1;LC104651.1;AJ810136.1;KY815304.1;AY279589.1;AB974580.1;AB972165.1;LC021281.1;AY279590.1
< merged_12S_5  0       99      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Oxyjulis;species:Oxyjulis californica;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:99.5;species:99.5;   AY279632.1;KY815300.1;LC146271.1;KY421797.1;AY279611.1;AJ810137.1;LC104651.1;AJ810136.1;KY815304.1;AB974580.1;AY279589.1;AY279628.1;KY815292.1;AY279635.1;AB972165.1;LC037152.1;AY279590.1;LC021281.1
< merged_12S_6  31      43      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Clupeiformes;family:Engraulidae;genus:Engraulis;species:Engraulis mordax;   superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; LC091575.1;LC091576.1;KF765500.2;AF417342.1;AB040676.1;LC020920.1;AP009137.1;AP011557.1
< merged_12S_7  37      26      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Centrarchiformes;family:Kyphosidae;genus:Girella;species:Girella simplicidens;      superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:77.0;  KC136553.1;KC136546.1;KC136395.1;KC136577.1;KC136536.1;AB236128.1;AB214535.1;LC021271.1;AB236130.1;AB236129.1;KC136370.1;AB233478.1;LC278022.1;LC037029.1;AB233477.1;LC278153.1;AB208649.1;AB233476.1;AB233475.1;KC136555.1;LC037030.1;LC278154.1;AB233479.1;KC136477.1;AB233481.1;AB972232.1;AB233485.1;AB233484.1;AP011060.1;AB233483.1;AB233482.1;AB208648.1;AB233489.1;AB233488.1;AB233491.1;AB233486.1;AB233480.1;AB233490.1;AB233487.1
---
> merged_12S_4  104     0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Oxyjulis;species:Oxyjulis californica;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:90.2;species:90.2;   AY279632.1;KY815300.1;LC146271.1;AY279611.1;KY421797.1;AJ810137.1;LC104651.1;AJ810136.1;KY815304.1;LC499368.1;AB974580.1;AY279589.1;AB972165.1;AY279590.1;LC021281.1
> merged_12S_5  0       99      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Oxyjulis;species:Oxyjulis californica;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:99.0;species:99.0;   AY279632.1;KY815300.1;KY421797.1;AJ810137.1;LC146271.1;AY279611.1;LC104651.1;AJ810136.1;KY815304.1;LC499368.1;AB974580.1;AY279589.1;AY279628.1;KY815292.1;AY279635.1;AB972165.1;LC037152.1;LC021281.1;AY279590.1
> merged_12S_6  31      43      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Clupeiformes;family:Engraulidae;genus:Engraulis;species:Engraulis mordax;   superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; LC091575.1;NC_041097.1;LC091576.1;AP017957.1;LC468860.1;LC020920.1;LC385201.1;LC468861.1;AB040676.1;KF765500.2;LC499587.1;AF417342.1;LC340033.1;AP009137.1;AP011557.1
> merged_12S_7  37      26      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Centrarchiformes;family:Kyphosidae;genus:Girella;species:Girella simplicidens;      superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:76.9166666667; KC136553.1;KC136546.1;KC136395.1;KC136536.1;KC136577.1;AB214535.1;LC021271.1;KC136370.1;AB236129.1;AB236128.1;AB236130.1;AB208649.1;AB233477.1;LC278154.1;LC037029.1;LC037030.1;AB233478.1;KC136555.1;LC492400.1;LC278022.1;LC385204.1;AB233476.1;AB233475.1;LC278153.1;LC458160.1;AB233482.1;AB233489.1;AP011060.1;AB233488.1;AB233491.1;AB233483.1;LC421696.1;AB233481.1;AB233485.1;KC136477.1;AB972232.1;LC492401.1;AB208648.1;AB233484.1;AB233479.1;AB233480.1;AB233486.1;AB233490.1;AB233487.1
17,21c17,21
< merged_12S_9  0       51      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Pleuronectiformes;family:Paralichthyidae;genus:Citharichthys;species:Citharichthys stigmaeus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:98.6666666667; LC092080.1;LC049675.1;LC145943.1;AF488499.1;AF488502.1
< merged_12S_10 0       44      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Centrarchiformes;family:Kyphosidae;genus:Medialuna;species:Medialuna californiensis;        superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; KC136408.1;AP011062.1;LC037028.1;KC136456.1;KC136474.1;KC136544.1;KC136573.1;KC136377.1;KC136516.1;KC136375.1;KC136428.1;KC136398.1;LC037025.1;KC136418.1;KC136540.1;KC136499.1;KC136382.1;KC136362.1;KC136451.1;KC136568.1;KC136550.1;KC136404.1;KC136505.1;KC136441.1;KC136503.1;LC037027.1;KC136407.1;KC136378.1;KC136444.1;KC136450.1;KC136470.1;KC136480.1;KC136496.1;KC136368.1;KC136520.1;KC136386.1;KC136500.1;KC136488.1;KC136365.1;KC136463.1;KC136529.1;KC136491.1;AB972197.1;KC136565.1;AP011061.1;KC136453.1;KC136507.1;KC136535.1;KC136538.1;KC136464.1;KC136506.1;LC037024.1;KC136414.1;KC136405.1;KC136393.1;KC136564.1;KC136442.1;KC136557.1;KC136438.1;KC136548.1;KC136559.1;KC136361.1;KC136448.1;KC136479.1;KC136384.1;KC136560.1;KC136383.1;KC136423.1;KC136399.1;KC136381.1;KC136436.1;KC136437.1;KC136409.1;KC136411.1;KC136416.1;EF616895.1;KC136417.1;KC136574.1;KC136481.1;KC136521.1;KC136523.1;KC136433.1;KC136492.1;KC136524.1;KC136455.1;KC136511.1;KC136533.1;KC136426.1;KC136558.1;KC136473.1;KC136466.1;KC136476.1;KC136374.1;KC136439.1;KC136522.1;KC136532.1;KC136403.1;KC136430.1;KC136379.1;KC136364.1
< merged_12S_11 35      8       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Semicossyphus;species:Semicossyphus pulcher;      superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; AY279644.1;EU601226.1;AB974611.1;KY815247.1
< merged_12S_12 0       38      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Pomacentridae;genus:Chromis;species:Chromis punctipinnis; superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; FJ616322.1;FJ616317.1;JQ707030.1;JQ707036.1;FJ616328.1;AB969956.1;FJ616320.1;LC104634.1;FJ616290.1;AF285921.1;JN935809.1;FJ616295.1;LC104618.1;KF374999.1;FJ616313.1;AP006016.1;FJ616326.1;LC069664.1;AF285920.1;AB969974.1;FJ616327.1;LC104613.1;FJ616293.1;LC069660.1;AB969957.1;JQ707032.1;AB974585.1;LC069650.1;AY279570.1;LC069665.1;LC104614.1;FJ616296.1;JQ707028.1;LC021279.1;JQ707034.1;FJ616289.1;FJ616298.1;LC104616.1;AF285926.1;FJ616314.1;LC091980.1;LC069651.1;FJ616321.1;JN935818.1;FJ616316.1;LC146253.1;FJ616294.1;FJ616292.1;LC146252.1;FJ616311.1;AY098623.1;FJ616297.1;FJ616318.1;JQ707035.1;LC104619.1;FJ616319.1;FJ616315.1;LC104637.1;LC069656.1;JQ707031.1;LC069597.1;KU531434.1;LC091970.1;JN628854.1;FJ616340.1;KU140665.1;KX631426.1;KT277287.1;KT166981.1;JN628852.1;JN628861.1;JN628858.1;KX595333.1;KT943516.1;KT221043.1;HE961974.1;AY597335.1;FJ616350.1;KM658974.1
< merged_12S_13 0       35      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Atheriniformes;family:Atherinopsidae;genus:Odontesthes;species:Odontesthes incisa;  superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:37.9928571429; GQ352653.1;GQ352655.1;GQ352652.1;KF791036.1;GQ352654.1;GQ352659.1;AB370894.1;GQ352656.1;GQ352658.1;GQ352657.1;GQ352651.1
---
> merged_12S_9  0       51      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Pleuronectiformes;family:Paralichthyidae;genus:Citharichthys;species:Citharichthys stigmaeus;       superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; LC092080.1;LC049675.1;LC145943.1;AF488499.1;AF488502.1
> merged_12S_10 0       44      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Centrarchiformes;family:Kyphosidae;genus:Medialuna;species:Medialuna californiensis;        superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; KC136408.1;LC037028.1;AP011062.1;KC136499.1;KC136531.1;KC136568.1;KC136543.1;KC136428.1;KC136422.1;KC136402.1;KC136573.1;KC136505.1;KC136519.1;KC136540.1;KC136377.1;KC136375.1;KC136451.1;KC136474.1;KC136544.1;LC037025.1;KC136418.1;KC136463.1;KC136542.1;KC136480.1;KC136488.1;KC136525.1;KC136496.1;KC136520.1;KC136444.1;KC136512.1;KC136427.1;KC136527.1;KC136378.1;KC136500.1;KC136380.1;LC037026.1;KC136386.1;KC136545.1;KC136470.1;KC136529.1;KC136391.1;KC136400.1;AP011061.1;KC136538.1;AB972197.1;KC136526.1;KC136491.1;KC136461.1;KC136565.1;KC136436.1;KC136537.1;KC136572.1;KC136433.1;KC136502.1;KC136455.1;KC136372.1;KC136446.1;KC136574.1;EF616895.1;KC136445.1;KC136487.1;KC136523.1;KC136405.1;KC136569.1;KC136511.1;KC136533.1;KC136411.1;KC136417.1;KC136557.1;KC136447.1;KC136489.1;KC136414.1;AY279561.1;KC136559.1;KC136384.1;KC136541.1;KC136492.1;KC136506.1;KC136369.1;KC136399.1;KC136383.1;KC136448.1;KC136381.1;KC136554.1;KC136481.1;KC136479.1;KC136548.1;KC136468.1;KC136524.1;KC136409.1;KC136514.1;KC136423.1;KC136421.1;KC136442.1;AP014538.1;KC136476.1;KC136460.1;KC136532.1;KC136556.1;KC136439.1
> merged_12S_11 35      8       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Labriformes;family:Labridae;genus:Semicossyphus;species:Semicossyphus pulcher;      superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:100.0;species:100.0; AY279644.1;EU601226.1;LC499589.1;AB974611.1;LC340201.1;LC499309.1;KY815247.1
> merged_12S_12 0       38      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Pomacentridae;genus:Chromis;species:Chromis punctipinnis; superkingdom:100.0;phylum:100.0;class:100.0;order:99.0;family:100.0;genus:100.0;species:99.0;   FJ616322.1;FJ616317.1;JQ707030.1;JQ707036.1;FJ616290.1;FJ616320.1;LC474196.1;AB969956.1;LC474195.1;FJ616328.1;LC104634.1;FJ616313.1;LC340187.1;AP006016.1;LC499559.1;FJ616295.1;KF374999.1;LC500710.1;JN935809.1;FJ616326.1;LC104618.1;LC069664.1;AF285921.1;AB969957.1;AB969974.1;AY279570.1;FJ616327.1;AB974585.1;FJ616293.1;LC069660.1;LC104613.1;LC499560.1;LC421703.1;AF285920.1;LC069665.1;JQ707032.1;LC069650.1;LC499303.1;NC_041192.1;LC104614.1;MK100717.1;LC458258.1;JQ707028.1;LC021279.1;FJ616296.1;JQ707034.1;FJ616298.1;FJ616289.1;LC104616.1;AF285926.1;LC091980.1;FJ616314.1;FJ616321.1;LC069651.1;FJ616316.1;JN935818.1;LC146253.1;FJ616294.1;LC146252.1;FJ616292.1;LC499305.1;MH248164.1;AY098623.1;FJ616297.1;FJ616311.1;JQ707035.1;FJ616318.1;LC104619.1;FJ616319.1;FJ616315.1;LC385286.1;LC069656.1;JQ707031.1;KU531434.1;LC069597.1;NC_031181.1;LC091970.1;JN628861.1;JN628858.1;KU140665.1;JN628854.1;KT277287.1;KT943516.1;KT166981.1;JN628852.1;KX631426.1;FJ616340.1;HE961974.1;KX595333.1;KT221043.1;MG603675.1;AY597335.1;NC_037019.1;FJ616350.1;NC_037020.1;KM658974.1;MG603674.1
> merged_12S_13 0       35      superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:Atheriniformes;family:Atherinopsidae;genus:Leuresthes;species:Leuresthes tenuis;    superkingdom:100.0;phylum:100.0;class:100.0;order:100.0;family:100.0;genus:73.0927350427;species:73.0927350427; MN181432.1;NC_044649.1;KF791036.1;GQ352655.1;GQ352653.1;GQ352652.1;GQ352654.1;GQ352659.1;GQ352656.1;AB370894.1;GQ352657.1;GQ352658.1;GQ352651.1
23c23
< merged_12S_15 16      0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Embiotocidae;genus:Rhacochilus;species:Rhacochilus vacca; superkingdom:100.0;phylum:100.0;class:100.0;order:31.6;family:100.0;genus:94.8;species:94.8;    LC091972.1;AY279572.1;LC091971.1;LC091976.1;LC091975.1;AF285918.1;AY279573.1;KU530212.1;LC091978.1;LC104608.1;LC091977.1;FJ616394.1;AP009128.1;LC091974.1;LC091973.1;AB969917.1;AP009129.1;JN125224.1;JN125222.1;AY279571.1;JN125226.1;JN125223.1;JN125227.1;AB969916.1;JN125225.1
---
> merged_12S_15 16      0       superkingdom:Eukaryota;phylum:Chordata;class:Actinopteri;order:NA;family:Embiotocidae;genus:Rhacochilus;species:Rhacochilus vacca; superkingdom:100.0;phylum:100.0;class:100.0;order:31.1666666667;family:100.0;genus:93.5;species:93.5;   AY279572.1;LC091972.1;LC091971.1;LC091975.1;LC091976.1;AY279573.1;AF285918.1;KU530212.1;LC091977.1;FJ616394.1;AP009128.1;LC091978.1;LC104608.1;LC091973.1;LC091974.1;AB969917.1;AP009129.1;JN125222.1;JN125224.1;AY279571.1;JN125227.1;JN125223.1;JN125226.1;AB969916.1;JN125225.1
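Several of the rows in the diff above (forward_12S_3, for example) differ only in the final column, which lists the matching accessions. To see which rows differ in the taxonomy path or the confidence values, the last column can be dropped from both tables before diffing. The sketch below assumes the tables are tab-separated with the accession list as the last field; the function names are illustrative.

```shell
#!/bin/bash
# Print a tab-separated table without its last column.
# A line with a single field is printed unchanged.
strip_last_column() {
    awk -F'\t' '{ out = $1
                  for (i = 2; i < NF; i++) out = out "\t" $i
                  print out }' "$1"
}

# Diff two taxonomy tables while ignoring the accession column
diff_without_accessions() {
    local a b status=0
    a=$(mktemp)
    b=$(mktemp)
    strip_last_column "$1" > "$a"
    strip_last_column "$2" > "$b"
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return $status
}
```

With the variables defined earlier, the call would be `diff_without_accessions ${EXP}/12S_taxonomy_tables/12S_ASV_taxonomy_brief.txt ${OUT}/12S_taxonomy_tables/12S_ASV_taxonomy_brief.txt`. Rows that remain in the output changed in taxonomy or confidence, not only in the set of database hits.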

Problem in dada2 step

Hi,

I am testing the Anacapa QC dada2 script with the 12S example data and am currently stuck at the dada2 step.

Checking that Paired reads are still paired:

12S ...
12S_first1000reads-LSC-A-1-S19-L001 ...check!
12S_first1000reads-LSC-A-2-S20-L001 ...check!
mar feb 22 20:42:21 -05 2022

Process metabarcode reads for with dada2

12S
Running Dada2 inline

Running dada2 on paired reads
0

Everything stops there indefinitely. I have R version 4.0.0 and have been trying to figure out why the script stops there, but with no success. I hope someone can help.

JB

Can't get to run ANACAPA

I am trying to use the ANACAPA pipeline instead of running the whole dada2 workflow as an R script.

I have installed all the required programs and entered their paths in anacapa_config.sh as shown below.

MODULE_SOURCE=""
FASTX_TOOLKIT=""
ANACONDA_PYTHON="/root/Descargas/env/bin/python"
BOWTIE2="/root/.conda/envs/bowtie2/bin/bowtie2"
ATS=""
R="/opt/R/4.0.0/bin/R"
PYTHONWNUMPY=""
GCC="/usr/bin/gcc"

CUTADAPT="/root/.conda/envs/cutadaptenv/bin/cutadapt"
MUSCLE="/home/dna_server/Documentos/anacapa/anacapa/Anacapa_db/muscle"
RUNNER="/bin/bash"
QUEUESUBMIT="qsub"
export PATH="/usr/local/anacapa/miniconda/bin:$PATH"
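Before running the pipeline, it can help to confirm that every absolute path entered in anacapa_config.sh exists and is executable. The sketch below only reads simple `NAME="value"` lines, skips empty values, `export` lines, and values that are not absolute paths (such as qsub). The function name `check_config_paths` is an illustrative assumption.

```shell
#!/bin/bash
# Report, for each NAME="/absolute/path" line in a config file,
# whether the path exists and is executable.
check_config_paths() {
    local config="$1" name value missing=0
    while IFS='=' read -r name value || [ -n "$name" ]; do
        case "$name" in
            ''|\#*|export\ *) continue ;;   # blank, comment, or export line
        esac
        value=${value%\"}                   # strip surrounding double quotes
        value=${value#\"}
        case "$value" in
            /*) ;;                          # absolute path: check it
            *) continue ;;                  # empty or not a path: skip
        esac
        if [ -x "$value" ]; then
            echo "ok       $name=$value"
        else
            echo "missing  $name=$value"
            missing=1
        fi
    done < "$config"
    return $missing
}
```

A call such as `check_config_paths Anacapa_db/scripts/anacapa_config.sh` prints one line per configured path and returns a non-zero status if any of them is missing. A directory also passes the executable test, so a line marked ok can still point at a folder rather than a binary.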

I am currently missing the fastx_toolkit path, since there is no fastx-toolkit executable. Is there any specific executable I should use? I installed fastx using bioconda (conda install fastx_toolkit).

Besides this problem, when I run the code I get the following error:

cutadapt: error: unrecognized arguments: -f /home/dna_server/Documentos/eDNA_analysis/HN00160001/anacapa_QC_dada2_output//QC/fastq/At-F-10_1.fastq /home/dna_server/Documentos/eDNA_analysis/HN00160001/anacapa_QC_dada2_output//QC/fastq/At-F-10_2.fastq

I guess cutadapt loads but cannot process the -f parameter, and I don't know how to solve this. I am not a programmer, so I am sure I am missing something here. I would appreciate any help!

thanks

muscle in db folder?

Should we have the user add muscle to the anacapa folder? I suppose we can just use the config...

Problem with the dada2 part of the pipeline

Hello!

I am trying to learn and run the Anacapa pipeline for my PhD project. I am using macOS Catalina with a Miniconda environment (Python 2.7), with both Conda and Homebrew installed. I also have R and RStudio installed.

So far I have managed to install all the dependencies (such as the fastx toolkit, dada2, and others) and run the script (the Anacapa QC, dada2 and BLCA) up to the dada2 part of the pipeline. I am using your examples and the 12S example data to learn how to use it and how to manage the outputs. I managed to sort out some macOS issues (such as updating Bash so that it recognises the &>> redirection used in the shell scripts). But apparently there is a problem when the pipeline gets to the dada2 part.
Here is the Terminal output:

**Running in local mode
Using User Defined Primers
Required Arguments Given

Sun 26 Apr 2020 02:21:02 BST

Preprocessing: 1) Generate an md5sum file
Sun 26 Apr 2020 02:21:02 BST
Preprocessing: 2) Change file suffixes
Sun 26 Apr 2020 02:21:02 BST
Preprocessing: 3) Uncompress files
Sun 26 Apr 2020 02:21:02 BST
QC: 1) Run cutadapt to remove 5'sequncing adapters and 3'primers + sequencing adapters, sort for length, and quality.

Generating Primer and Primer + Adapter files for cutadapt steps. Your adapter type is nextera.

first1000reads-LSC-A-1-S19-L001 ...
forward...
check
reverse...
check
Sun 26 Apr 2020 02:21:03 BST

first1000reads-LSC-A-2-S20-L001 ...
forward...
check
reverse...
check
Sun 26 Apr 2020 02:21:03 BST
12S

Checking that Paired reads are still paired:

12S ...
12S_first1000reads-LSC-A-1-S19-L001 ...check!
12S_first1000reads-LSC-A-2-S20-L001 ...check!
Sun 26 Apr 2020 02:21:04 BST

Process metabarcode reads for with dada2

12S
Running Dada2 inline

Running dada2 on paired reads
0
moving on
Sun 26 Apr 2020 02:22:00 BST

Running dada2 on forward reads
0
moving on
Sun 26 Apr 2020 02:22:54 BST

Running dada2 on reverse reads
0
moving on
Sun 26 Apr 2020 02:23:48 BST

If a dada2 job fails you can find the run script...**

When I go to check the dada2 output it says this:

**Downloading GitHub repo benjjneb/dada2@v1.6

checking for file ‘/private/var/folders/f9/rgzykm7d5439h5_mpc1h3lw80000gn/T/Rtmp91tC20/remotesac236df086cb/benjjneb-dada2-553008d/DESCRIPTION’ ...

✔ checking for file ‘/private/var/folders/f9/rgzykm7d5439h5_mpc1h3lw80000gn/T/Rtmp91tC20/remotesac236df086cb/benjjneb-dada2-553008d/DESCRIPTION’

─ preparing ‘dada2’:

checking DESCRIPTION meta-information ...

✔ checking DESCRIPTION meta-information

─ cleaning src

─ checking for LF line-endings in source and make files and shell scripts

─ checking for empty or unneeded directories

─ building ‘dada2_1.6.0.tar.gz’

  • installing source package ‘dada2’ ...
    ** using staged installation
    ** libs
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
    In file included from RcppExports.cpp:4:
    In file included from ./../inst/include/dada2.h:7:
    ./../inst/include/dada2_RcppExports.h:14:14: warning: unused function 'validateSignature' [-Wunused-function]
    void validateSignature(const char* sig) {
    ^
    1 warning generated.
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c Rmain.cpp -o Rmain.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c chimera.cpp -o chimera.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c cluster.cpp -o cluster.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c error.cpp -o error.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c evaluate.cpp -o evaluate.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c filter.cpp -o filter.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c kmers.cpp -o kmers.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c misc.cpp -o misc.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c nwalign_endsfree.cpp -o nwalign_endsfree.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c nwalign_vectorized.cpp -o nwalign_vectorized.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c pval.cpp -o pval.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c strmap.cpp -o strmap.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppParallel/include' -I/usr/local/include -fPIC -Wall -g -O2 -c taxonomy.cpp -o taxonomy.o
    clang++ -mmacosx-version-min=10.13 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o dada2.so RcppExports.o Rmain.o chimera.o cluster.o error.o evaluate.o filter.o kmers.o misc.o nwalign_endsfree.o nwalign_vectorized.o pval.o strmap.o taxonomy.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
    installing to /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-dada2/00new/dada2/libs
    ** R
    ** data
    *** moving datasets to lazyload DB
    ** inst
    ** byte-compile and prepare package for lazy loading
    Error: object ‘narrow’ is not exported by 'namespace:Biostrings'
    Execution halted
    ERROR: lazy loading failed for package ‘dada2’
  • removing ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library/dada2’
  • restoring previous ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library/dada2’
    Error: Failed to install 'dada2' from GitHub:
    (converted from warning) installation of package ‘/var/folders/f9/rgzykm7d5439h5_mpc1h3lw80000gn/T//Rtmp91tC20/fileac232b86ad44/dada2_1.6.0.tar.gz’ had non-zero exit status
    Execution halted**

I have made sure that dada2 is installed (along with the Bioconductor package manager for R) and that I can load it in RStudio, but for some reason the Anacapa script tries to download dada2 version 1.6 from GitHub instead of working with the version I already have installed. When I try to install dada2 1.6 myself from R or RStudio, it fails with the same error messages as in the dada2 Anacapa output file.

I am no expert at working with code and shells. Do you think this is something I still have to fix on my side, or is the Anacapa pipeline insisting on dada2 version 1.6 instead of the one I have installed (the latest version from Bioconductor)? If version 1.6 can no longer be installed from GitHub or other repos, that may be why I cannot get the pipeline running.

Do you know what I can do to address this issue? Until it is fixed I cannot test the rest of the Anacapa pipeline.

Thanks a lot and have a great start of the week!

merge_asv.py bug

If dada2 does not generate a nochim_unmerged.txt, then the merge_asv.py file will throw an error and not build the summary table required to append the blca results.
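One possible guard, sketched here under assumptions: the tables are tab-separated text files, and `read_table_or_empty` and `combine_tables` are hypothetical helper names, not functions from merge_asv.py. The idea is to treat a missing or empty dada2 output as an empty table so that the summary can still be built from whatever files exist.

```python
import csv
import os


def read_table_or_empty(path):
    """Return rows of a tab-separated table, or [] if the file is absent or empty."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def combine_tables(paths):
    """Concatenate the rows of every table that exists, skipping missing ones."""
    rows = []
    for path in paths:
        rows.extend(read_table_or_empty(path))
    return rows
```

With this shape, a run that produced no nochim_unmerged.txt would contribute zero rows instead of raising an error.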

clarifications in README

  • Line 62 of README:

All reads are first globally aligned against the CRUX database using Bowtie 2.

Should this read "All ASVs are first globally..."? The same applies to the following sentence.

  • Line 29 of README indicates that there are CRUX DBs available in DRYAD, but later on (Line 208) the link is to a Google Drive. Which will you want to use in the long term?

How to convert own reference database to Anacapa-compatible library?

Hi!

Thank you so much for your work and for making things a lot easier for non-coding people. :)
I would like to ask if there is a way to convert our own reference database to an Anacapa-compatible library. I found a link about this in your GitHub documentation, but the link is broken. Here is the link: https://github.com/limey-bean/CRUX_Creating-Reference-libraries-Using-eXisting-tools/tree/master/crux_release_V1_db/scripts

Thank you for your attention and looking forward to hearing from you.

Best regards and more power,
Joyce

Problem with my own reads/data

Hello again,

I managed to use the anacapa data_container and the test_data works great! I am now trying to use my own dataset, so I started with a couple of files to check that everything runs smoothly. I am also using the 12S_mitofish primers, so there is no change there. I have checked my data with my service provider: the nextera adapters have already been trimmed and all reads start with my primer sequence. Then I ran the code, and this is the output:

Thu Feb 24 18:36:29 -05 2022

Preprocessing: 1) Generate an md5sum file
Thu Feb 24 18:36:30 -05 2022
Preprocessing: 2) Change file suffixes
Thu Feb 24 18:36:30 -05 2022
QC: 1) Run cutadapt to remove 5'sequncing adapters and 3'primers + sequencing adapters, sort for length, and quality.

Generating Primer and Primer + Adapter files for cutadapt steps.  Your adapter type is nextera.

At-F-06 ...
forward...
check
reverse...
check
Thu Feb 24 18:38:25 -05 2022
12S

Checking that Paired reads are still paired:

 ...
 3' primer sequences 12S

Process metabarcode reads for with dada2

If a dada2 job fails you can find the run script in /home/dna_server/Documentos/anacapa_datacontainer/anacapa/Fish/12S_qc_dada2_output//Run_info/run_scripts and the dada2 output in /home/dna_server/Documentos/anacapa_datacontainer/anacapa/Fish/12S_qc_dada2_output//Run_info/dada2_out/

good luck!

real    1m56,296s
user    1m53,513s
sys     0m1,742s

It seems something is not working as it never starts the reads in dada2. It has to be an issue with the cutadapted reads, but I have no idea what I could be doing wrong. I am attaching the cutadapt report if that may help. Thanks for the help.

cutadapt-report.txt

check config file script?

At some point we should write a script to check that the user has what they need to run the scripts
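A rough sketch of what such a check could look like, assuming the config file keeps the `NAME="value"` shell assignments seen in anacapa_config.sh. The function names are hypothetical, and which variables are actually required (some may legitimately be empty in local mode) is left to the caller.

```python
import re
import shutil

# Matches lines such as  R="/opt/R/4.0.0/bin/R"  or  export NAME="value".
ASSIGNMENT = re.compile(r'^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)="?([^"\n]*)"?\s*$')


def parse_config(text):
    """Collect NAME="value" assignments from shell-style config text."""
    settings = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def missing_tools(settings, names):
    """Return the names whose value is empty or does not resolve to an executable."""
    problems = []
    for name in names:
        value = settings.get(name, "")
        # shutil.which accepts both bare command names and full paths.
        if not value or shutil.which(value) is None:
            problems.append(name)
    return problems
```

A wrapper could read the config file, call `missing_tools` with the variables a given run mode needs, and stop with a readable message before any pipeline step starts.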

License

Hi,

What is the license for this tool?

Thanks,
Cornel

To-do list- data cleaning step

Data cleaning

  • check if user has periods in file names; break script if so and ask user to rename files
  • rewrite the perl script into python; make it put all of the cleaned output files into a new folder
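For the first item, a minimal sketch of the file-name check. The list of recognised suffixes and the function name are assumptions for illustration, not part of the existing scripts.

```python
def names_with_extra_periods(filenames, suffixes=(".fastq.gz", ".fastq", ".fq.gz", ".fq")):
    """Return file names that contain a period outside the recognised suffix."""
    offenders = []
    for name in filenames:
        stem = name
        for suffix in suffixes:
            if name.endswith(suffix):
                # Strip the suffix so its own periods are not counted.
                stem = name[: -len(suffix)]
                break
        if "." in stem:
            offenders.append(name)
    return offenders
```

The calling script could exit early and list the offending names so the user knows which files to rename.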

Remove explicit sh from command spawning

During my containerization process I've been getting source: command not found errors. I tracked it down to usage of sh explicitly in the various Anacapa shell scripts.

So I would recommend changing lines like this:

sh ~/Anacapa_db/scripts/anacapa_release_V1.sh

To simply this:

~/Anacapa_db/scripts/anacapa_release_V1.sh

Since all of the .sh files have #!/bin/bash at the top, putting sh before them in the commands is unnecessary. In fact it overrides the shebang and forces the script to run under sh instead of bash, which is what causes my issue here.

The reason is:

https://stackoverflow.com/questions/13702425/source-command-not-found-in-sh-shell

/bin/sh is usually some other shell trying to mimic The Shell. Many distributions use /bin/bash for sh, it supports source. On Ubuntu, though, /bin/dash is used which does not support source. If you cannot edit the script, try to change the shell which runs it.

I think the version of sh on Hoffman is bash, but on other systems (such as in my Ubuntu container) the sh doesn't support source.
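If it helps, here is a small sketch of how the change could be applied in bulk. It is an illustration only: `drop_explicit_sh` is a hypothetical helper, it only handles commands that start a line, and the target scripts must be executable (chmod +x) for the direct call to work.

```python
import re

# Matches a command that starts with "sh " followed by a path ending in ".sh".
EXPLICIT_SH = re.compile(r'^(\s*)sh\s+(\S+\.sh\b)')


def drop_explicit_sh(script_text):
    """Rewrite 'sh path/to/script.sh ...' lines to call the script directly."""
    fixed = []
    for line in script_text.splitlines():
        # Keep the indentation (group 1) and the script path (group 2).
        fixed.append(EXPLICIT_SH.sub(r'\1\2', line))
    return "\n".join(fixed)
```

Any arguments after the script path are left untouched, and lines that do not start with an explicit sh call pass through unchanged.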

rename script file according to step

It might be an idea to rename the script files in order of their usage, e.g. 01-clean-seqs.sh, 02-paired-end-dada2.R, ... 08-biom-automation.R

(matters once we chunk things up into 5, not so relevant right now)

Write check_paired.pl in python?

It looks a little sloppy to have a single random perl script thrown in with everything else here- can we rewrite in Python? Probably good for long-term maintenance etc. I'm probably not the best person to do it but can do it if needed.
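As a starting point, a hedged sketch of a Python version. The behaviour of check_paired.pl is not shown here, so this assumes the goal is to keep reads whose IDs occur in both the forward and reverse FASTQ files and to set the rest aside; the function names are hypothetical, and everything is held in memory, which may not suit large runs.

```python
def read_fastq(lines):
    """Yield (read_id, record_lines) for each 4-line FASTQ record."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # The ID is the header up to the first space, minus "@" and any /1 or /2 suffix.
            read_id = record[0][1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id, record
            record = []


def split_paired(forward_lines, reverse_lines):
    """Return (paired_forward, paired_reverse, unpaired_forward, unpaired_reverse)."""
    forward = dict(read_fastq(forward_lines))
    reverse = dict(read_fastq(reverse_lines))
    shared = [rid for rid in forward if rid in reverse]
    paired_f = [forward[rid] for rid in shared]
    paired_r = [reverse[rid] for rid in shared]
    unpaired_f = [rec for rid, rec in forward.items() if rid not in reverse]
    unpaired_r = [rec for rid, rec in reverse.items() if rid not in forward]
    return paired_f, paired_r, unpaired_f, unpaired_r
```

Both paired lists come back in the forward file's order, so the two output files stay in sync when written record by record.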

Weirdness in unknown taxonomy

I'm confused by the behavior on unknown taxonomy:

  • what's the difference between "NA;NA;NA;NA;NA;NA" and just "" in sum.taxonomy?
  • I guess we are just at the mercy of NCBI's db/Entrez qiime here, but I find taxonomic calls like Arthropoda;Insecta;Anthoathecata;Hydractiniidae;NA;Podocoryna carnea hard to interpret as a user, and also when doing biom comparison stuff. Why isn't Podocoryna being listed as genus? We may find the R package taxize useful:
library(taxize)
classification("Podocoryna carnea", db = "ncbi")

Retrieving data for taxon 'Podocoryna carnea'

$`Podocoryna carnea`
                 name         rank     id
1  cellular organisms      no rank 131567
2           Eukaryota superkingdom   2759
3        Opisthokonta      no rank  33154
4             Metazoa      kingdom  33208
5           Eumetazoa      no rank   6072
6            Cnidaria       phylum   6073
7            Hydrozoa        class   6074
8        Hydroidolina     subclass  37516
9       Anthoathecata        order 406427
10           Filifera     suborder 406428
11     Hydractiniidae       family   6094
12         Podocoryna        genus   6095
13  Podocoryna carnea      species   6096
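To show what I would expect from that lookup, here is a small Python sketch that picks fixed ranks out of the classification above and writes NA for any rank that is absent. The six-rank order (phylum through species) is my assumption based on the six-field strings in sum.taxonomy, and `lineage_to_string` is a hypothetical name.

```python
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def lineage_to_string(classification, ranks=RANKS):
    """Build a semicolon-separated lineage, writing NA for any rank that is absent."""
    by_rank = {rank: name for name, rank in classification if rank != "no rank"}
    return ";".join(by_rank.get(rank, "NA") for rank in ranks)


# (name, rank) pairs copied from the taxize output above.
podocoryna = [
    ("cellular organisms", "no rank"),
    ("Eukaryota", "superkingdom"),
    ("Opisthokonta", "no rank"),
    ("Metazoa", "kingdom"),
    ("Eumetazoa", "no rank"),
    ("Cnidaria", "phylum"),
    ("Hydrozoa", "class"),
    ("Hydroidolina", "subclass"),
    ("Anthoathecata", "order"),
    ("Filifera", "suborder"),
    ("Hydractiniidae", "family"),
    ("Podocoryna", "genus"),
    ("Podocoryna carnea", "species"),
]

print(lineage_to_string(podocoryna))
```

Built this way, the genus field is filled from the rank labels rather than from position in the lineage, so Podocoryna lands in the genus slot.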


@jessegomer @limey-bean

using only the taxonomic assignment part - help wanted

Hi,

I came across this while reading the Lin et al preprint and it sounds very useful! I have metabarcoding data for two markers (COI and 16S) from a very diverse community (I know that the COI sequences contain animals, plants and fungi at least) and I have been struggling with the taxonomic assignment.

I had spent quite a bit of time making a refdb with obitools from the embl database, but I saw that you have done all that work already, and done a better job than I did :-)

And just while I was trying to make my DBs with CRUX, I even saw that you already had pre-made databases for COI and 16S!

So far - so awesome!

However, I am struggling to find out how to just run the taxonomic assignment bit of the pipeline.

I have ASV tables (dada2 denoised) for both marker genes.

could you give me some pointers?

Thanks!

Fabian
