Giter Club home page Giter Club logo

primalscheme's Introduction

primalscheme

License: GPL v3 Build Quality Gate Status


about

primalscheme is a tool for designing primer panels for multiplex PCR. It uses a greedy algorithm to find primers for tiling amplicon generation for multiple reference genomes. It works best on viral isolates of known lineages e.g. outbreak strains.

web interface

a web interface is available at primalscheme.com, although it does not currently support all available options.

requirements

  • Linux, macOS or Windows
  • Python 3.6+

installation from PyPI using pip

Create & activate a virtual environment (recommended):

python3 -m venv /path/to/primal-venv
source /path/to/primal-venv/bin/activate

or on Windows:

python -m venv c:\path\to\primal-venv
c:\path\to\primal-venv\Scripts\activate.bat

Install primalscheme with pip:

pip install primalscheme

installation from source

git clone https://github.com/aresti/primalscheme.git primalscheme
cd primalscheme
python3 -m venv venv
source venv/bin/activate
pip install .

Note: if you'd like your installation to be editable, use:

pip install flit
flit install --pth-file

(--pth-file vs --symlink: maintains module visibility for IDE debuggers, and works on Windows.)

command line interface

To design a multiplex scheme run primalscheme multiplex <FASTA>.

If you're looking for test inputs you can try CHIKV_demo.fa, Ebov-10-Pan.fasta or nCov-2019.fasta

optional arguments

  • --amplicon-size, -a: Amplicon size target. Pass twice to set an exact range (e.g, -a 220 -a 250), otherwise expect +/- 5% (default 380, 420).
  • --outpath, -o: Path to output directory (default: ./output).
  • --name, -n: Prefix name for your outputs (default: scheme).
  • --target-overlap, -t: Target insert overlap size (default: 0).
  • --debug, -d: Set log level DEBUG.
  • --force, -f: Force output to an existing directory, overwrite files.
  • --pinned, -p: Only consider primers from the first reference.
  • --high-gc, -g: Use config suitable for high-GC sequences.
  • --help, -h: Show help message and exit.

modifying config

Many parameters are not directly exposed in the CLI as we believe they are necessary for successful multiplex PCR. However, most values can be modified in config.py if you install the package from source in editable mode with flit install --pth-file (see above).

output

  • {output_dir}/{prefix}.reference.fasta - all input references
  • {output_dir}/{prefix}.primer.bed - coordinates of primer positions
  • {output_dir}/{prefix}.insert.bed - coordinates of trimmed amplicons
  • {output_dir}/{prefix}.primer.tsv - primer sequences and information
  • {output_dir}/{prefix}.plot.pdf - diagrammatic overview of scheme (PDF)
  • {output_dir}/{prefix}.plot.svg - diagrammatic overview of scheme (SVG)
  • {output_dir}/{prefix}.report.json - run report (reference ids, regions, gaps, coverage)
  • {output_dir}/{prefix}.pickle - pickled Python objects (if --debug)
  • {output_dir}/{prefix}.log - run logs (more detail if --debug)

how to design a scheme

  1. Download complete genomes from GenBank and concatenate into a single FASTA file.
    • e.g. cat "sequence (1).fasta" "sequence (2).fasta" > input.fasta
  2. Align all sequences using Clustal Omega.
  3. The genomes should be complete and roughly the same length.
    • Primers are only designed within the limits of the alignment.
    • The first reference will be used for the coordinate system in the output scheme.
  4. Check the 'Result Summary' tab then the 'Percent Identity Matrix' link to check the identity matrix of all sequences.
    • Check that the maximum sequence divergence is not more than 5%.
    • If your sequences are more divergent, it may be necessary to separate them into multiple independent schemes; the 'Phylogenetic Tree' tab may be useful for this.
  5. Remove sequences with 99-100% identity to other genomes in the file as these will bias the primer selection
  6. Run primalscheme on your FASTA file.
    • Optionally, set amplicon min/max length as required by your sequencing technology.

protocol

We strongly recommend using the PCR conditions outlined in the Nature Protocols paper.

description of the algorithm

  • Set the first reference as primary, to be used for the coordinate system.
  • Open a region window of width AMPLICON_SIZE_MAX onto the primary reference.
  • Align the flanks of this reference slice to all other references using parasail.
  • Take all aligned flanks and digest them into K-mers (primers) of length PRIMER_SIZE_MIN to PRIMER_SIZE_MAX.
  • Hard filter the primers to eliminate those with unacceptable GC, Tm, homodimer and homopolymer length.
  • Filter out any primers with > PRIMER_MAX_MISMATCHES against any reference.
  • Sort left and right primers according to base penalty plus weighted mismatch score.
  • Check for heterodimers in same pool using primer3-py; select top scoring, non-interacting pair.
  • Open new window position set to maintain overlap if region is successful.
  • Repeat routine until end of genome is reached.
  • If during the above routine there is a) failed alignment b) no primers after filtering then:
    • Move window position left by STEP_DISTANCE before retrying.
    • Continue stepping left until a hard-left limit is reached (start of genome, primer collision or no progress).
    • Reset window position to initial value.
    • Move window position right (to open a gap or reduce overlap) before retrying.
    • Exclude secondary reference on repeated flank alignment failure.

citing

If you use primalscheme please cite:

Quick J et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc. 2017 Jun;12(6):1261-1276.

primalscheme's People

Contributors

aresti avatar joshquick avatar

Forkers

b1b18

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.