Giter Club home page Giter Club logo

barcoder's Introduction

#########################################
# This file is part of barCoder.
#
# barCoder is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# barCoder is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with barCoder.  If not, see <https://www.gnu.org/licenses/>.
#
#########################################
# Introduction
#########################################
# This document describes how to use the barcode generation code. The code will
# generate a list of sequences that can be directly synthesized and inserted
# into the target genomes via user-specified targeting vectors. 
#
# Running the code
#
# This software was tested on CentOS 7, and requires at least 64 GB of RAM in 
# order to support BLAST searches against the "nt" database. A modern CPU is 
# recommended, however only the BLAST searches utilize multi-threaded
# processing. The default behavior is to use all cores available for BLAST. 
#
# Dependencies:
#
# * perl 5.8 or higher
# * Bioperl. Information on installing Bioperl can be found at
#   https://bioperl.org/INSTALL.html Specific Bioperl modules used in this program
#   include:
#	- StandAloneBlastPlus (bl2seq, blastn)
#	- EMBOSS
# * nt database from NCBI: ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.*
# * See "Detailed Installation" below for more one specific path.
#
# Test data is located in the Projects directory.	
#
# The purpose of the code is to generate a set of barcodes that minimize the chances of 
#		non-unique detection.
#
# Generating unique barcodes: 
# 	A project contains a set of genomes that need to have barcodes generated
# for them, called "target genomes". These barcodes meet a minimum set of 
# requirements for PCR-based detection, including standard PCR parameters
# (Tm, length, etc.) and high specificity. The specificity includes low homology
# to the target genomes, other primers, and a second set of genomes called
# "other genomes". This second set of genomes includes things that do not need
# to be barcoded, but are expected to be present in the environment. The primers
# must not match these other genomes in order to minimize false positives. The 
# primers are also blasted against the NCBI nt database to maximize the odds of
# uniqueness.
#	Three primers are generated. The first primer is the same across a 
# single project and is one of the two primers needed for qPCR detection using 
# either SYBR Green or TaqMan technologies. The second qPCR primer is unique for
# each target genome in the project. For each target genome a second unique 
# sequence is generated for use as an optional probe sequence, needed only for
# TaqMan-based qPCR detection.
#	A filler sequence completes the barcode. This sequence is randomly 
# generated with 2 constraints: (1) the GC content matches that of the target
# genome and (2) there are no major secondary structures to interfere with 
# the PCR. For each target genome, a second set of 3 primers are generated as
# a backup in case any of the sets fail for some unforseen reason.
#	At the time of execution, the user can choose the number of barcodes to 
# generate per target genome. This provides multiple alternative barcodes in the
# case that 1 or more do not function well.
#
#
#########################################
# User Guide
#########################################
# The first step is to create a new project directory. This directory needs to 
# 	be placed under the Barcode/Projects/ directory. There are directories
#	there called BAsterne, BTK and YPco92 as examples.
#
# This folder should contain the following (some are optional):
#	(1) Subdirectory containing the target genomes. These can be in genbank or
#		fasta format. 
#	(2) Text file containing the script parameters, called "parameters.txt".
#		These include things like target Tm and blast threshold to count 
#		as a "match." This file should be copied from the BAsterne, BTK or YPco92  
#		project and tweaked as appropriate. Each parameter is described 
#		in the "parameters.txt" file in the project directory.
#	(3) Subdirectory containing other genomes (genbank or fasta 
#		format). The subdirectory is required, but genome files are optional, so 
#		the subdirectory may be left empty.
#	(4) (optional) List of primers to avoid from previous projects. Fasta
#		file listing primers. File should be named prevPrimers* (i.e.
#		can have prevPrimers1.fasta, prevPrimersProject2.fas, etc.)
#
# Usage: 
# 	from the Barcoder/ directory, type:
#
#	perl barcoder.pl "project_name" nBarcodes
#
#	Where:
#	- project_name = name of the subdirectory of the current project, located
#		at Barcode/Projects/project_name
#	- nBarcodes = number of barcodes to generate per target genome (it is 
#		useful to generate multiple in case some fail). 
# Outputs:
#	- primerList.fa: list of primers generated for barcodes
#	- barcodeList.fa: list of barcodes generated
#	- log files: complete description of what the script did (see
#	 	parameters.txt for some options)
#
#########################################
# Detailed Installation
#########################################
#
# Steps for installing dependencies and running barCoder, starting
# from a bare Ubuntu 22 LTS instance.  You need at least 64 GB of RAM
# and, if you're going to BLAST against the NCBI database, a disk of
# at least 500 GB.
#
#   sudo apt-get update
#   sudo apt-get upgrade
#   sudo apt install emacs perl git cpanminus gcc libexpat1-dev \
#                    amap-align ncbi-blast+ bioperl-run
#   curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
#   sh Miniconda3-latest-Linux-x86_64.sh
#    -> say yes to all questions
#   source ~/.bashrc
#   conda install -c bioconda blast
#   conda update --all
#   conda install perl-bioperl
#
# Possibly you also need to run:
#
#   cpanm Bio::Perl
#   cpanm Bio::DB::EUtilities
#   cpanm Bio::Tools::Run::StandAloneBlastPlus --force
#   cpanm Bio::Factory::EMBOSS --force
#
# Install the NCBI DB:
#
#   update_blastdb --decompress nt
#   sudo mkdir /nt/
#   sudo chown ubuntu /nt/
#   mv nt.* /nt/
#    -> then modify parameters.txt to replace "/nt/nt_v5" with "/nt/nt"
#########################################

barcoder's People

Contributors

ecbcgit avatar jeffkaufman avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.