jiqingxiaoxi / glapd

Design common and specific primers for LAMP using the whole genome

License: GNU General Public License v2.0

C 49.71% Makefile 0.04% Cuda 48.69% Perl 1.56%

glapd's Introduction

GLAPD (whole genome based LAMP primer design for a set of target genomes) is a customizable system for designing LAMP primer sets.
######
Introduction:
LAMP (loop-mediated isothermal amplification) is a simple and effective method for amplifying DNA sequences.
One LAMP primer set contains four single LAMP primers derived from six primer regions in the genome. The six regions are F3, F2, F1c, B1c, B2 and B3. F3 and B3 are used as the outer primers, sequences from F1c and F2 are synthesized into the primer FIP, and sequences from B1c and B2 are synthesized into the primer BIP. To accelerate the amplification, two additional loop primers (LF, LB) can be added.

GLAPD designs LAMP primer sets based on the whole genome. It can design primers that are common to a set of target genomes and specific against a set of background genomes.
First, GLAPD identifies all candidate single primer regions. Second, those single primers are aligned to the target genomes and background genomes. Third, GLAPD combines candidate single primers into LAMP primer sets. Finally, the commonality and specificity of each LAMP primer set are checked using the information from the alignment.
######
SYSTEM REQUIREMENTS:
GLAPD currently runs under the Linux operating system and requires perl and gcc. To run the GPU version, the GPU must have compute capability >= 2.0.
######
OTHER SOFTWARE THAT MAY BE NEEDED:
Bowtie, which can be downloaded from http://bowtie-bio.sourceforge.net/index.shtml
CUDA driver, which can be downloaded from http://www.nvidia.com
######
Files and Directories:
This system is packaged in one file, "GLAPD.tar.gz", which can be downloaded from http://cgm.sjtu.edu.cn/GLAPD/ and https://github.com/jiqingxiaoxi/GLAPD2.git. Important files inside this tar.gz file are listed here.

|-Par/
    tm_nn_parameter.txt     Parameters for calculating Tm.
    stab_parameter.txt      Parameters for calculating the stability of single primers.
    *.db, *.ds              Sixteen files for calculating the secondary structure of primers.
|-par.pl                    Gets the positions and mismatches for each single primer.
|-single.c                  The CPU version of GLAPD for identifying single primer regions.
|-LAMP.c                    The CPU version of GLAPD for designing LAMP primer sets.
|-Makefile                  Makefile for the CPU version.
|-GPU/
    single.cu               The GPU version of GLAPD for identifying single primer regions.
    LAMP.cu                 The GPU version of GLAPD for designing LAMP primer sets.
    Makefile                Makefile for the GPU version.
|-bowtie/
    bowtie                  The bowtie program.
    bowtie-build            The bowtie-build program.
|-example/
    example.fa              Example sequences, from "NC_002951.2 Staphylococcus aureus subsp. aureus COL chromosome".
    target-list.txt         The list of names of the target genomes. It contains 3 strains of S. aureus.
    background-list.txt     The list of names of the background genomes. It contains 3 strains of other bacteria.
    index.*                 The index files for the Bowtie software.
######
INSTALLATION:
tar -zxvf GLAPD.tar.gz
If you want to use the CPU version:
cd GLAPD/
make
If you want to use the GPU version:
cd GLAPD/GPU/
make
######
QUICK START:
1. If you want to design LAMP primers for a sequence without considering commonality or specificity:
cd GLAPD/
./Single -in example/example.fa -out Test
(Two files, "Inner/Test" and "Outer/Test", are created.)
./LAMP -in Test -ref example/example.fa -out success.txt
(Ten LAMP primer sets are designed and stored in the "success.txt" file.)

2. If you want to design common LAMP primers without considering specificity:
cd GLAPD/
./Single -in example/example.fa -out Test
(Two files, "Inner/Test" and "Outer/Test", are created.)
perl par.pl --in Test --ref example/example.fa --bowtie Bowtie_path/bowtie --index example/index --common example/target-list.txt
(Three files, "Inner/Test-common_list.txt", "Inner/Test-common.txt" and "Outer/Test-common.txt", are created.)
./LAMP -in Test -ref example/example.fa -out success.txt -common
(Ten LAMP primer sets are designed and stored in the "success.txt" file.)

3. If you want to design specific LAMP primers without considering commonality:
cd GLAPD/
./Single -in example/example.fa -out Test
(Two files, "Inner/Test" and "Outer/Test", are created.)
perl par.pl --in Test --ref example/example.fa --bowtie Bowtie_path/bowtie --index example/index --specific example/background-list.txt
(Two files, "Inner/Test-specific.txt" and "Outer/Test-specific.txt", are created.)
./LAMP -in Test -ref example/example.fa -out success.txt -specific
(Ten LAMP primer sets are designed and stored in the "success.txt" file.)

4. If you want to design common and specific LAMP primers:
cd GLAPD/
./Single -in example/example.fa -out Test
(Two files, "Inner/Test" and "Outer/Test", are created.)
perl par.pl --in Test --ref example/example.fa --bowtie Bowtie_path/bowtie --index example/index --common example/target-list.txt --left
(Five files, "Inner/Test-common_list.txt", "Inner/Test-common.txt", "Inner/Test-specific.txt", "Outer/Test-common.txt" and "Outer/Test-specific.txt", are created.)
./LAMP -in Test -ref example/example.fa -out success.txt -common -specific
(Ten LAMP primer sets are designed and stored in the "success.txt" file.)
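The four scenarios above differ only in which par.pl and LAMP flags are passed. As a rough sketch (not part of GLAPD), the Python helper below assembles the command lines for a chosen scenario. The function name glapd_commands and its parameters are hypothetical; only flags shown in this README are used, and the commands are returned as argument lists rather than executed. It assumes that "--common" and "--specific" may be passed to par.pl together, which the synopsis in step 2 suggests but does not state outright.

```python
import shlex


def glapd_commands(ref, prefix, out, bowtie=None, index=None,
                   common_list=None, specific_list=None, use_left=False):
    """Build the Single / par.pl / LAMP command lines for one scenario.

    Hypothetical helper: names and defaults are illustrative only.
    Returns a list of argument lists; nothing is executed here.
    """
    if use_left and not common_list:
        # The README says --left is used together with --common.
        raise ValueError("--left requires a common (target) list")

    cmds = [["./Single", "-in", ref, "-out", prefix]]
    lamp = ["./LAMP", "-in", prefix, "-ref", ref, "-out", out]

    if common_list or specific_list:
        if not (bowtie and index):
            raise ValueError("par.pl needs both --bowtie and --index")
        par = ["perl", "par.pl", "--in", prefix, "--ref", ref,
               "--bowtie", bowtie, "--index", index]
        if common_list:
            par += ["--common", common_list]
            lamp.append("-common")
        if specific_list:
            # --left is invalid when --specific is given, so it is skipped.
            par += ["--specific", specific_list]
            lamp.append("-specific")
        elif use_left:
            par.append("--left")
            lamp.append("-specific")
        cmds.append(par)

    cmds.append(lamp)
    return cmds


if __name__ == "__main__":
    # Scenario 4: common and specific primers, background defined by --left.
    for cmd in glapd_commands("example/example.fa", "Test", "success.txt",
                              bowtie="Bowtie_path/bowtie",
                              index="example/index",
                              common_list="example/target-list.txt",
                              use_left=True):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from inside the GLAPD/ directory once the printed commands look right.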
######
RUN THE SYSTEM:
1. Identify candidate single primer regions:
Command:
Single -in <ref_genome> -out <single_primers> [options]*

Arguments:
-in <ref_genome>
    reference genome, fasta format
-out <single_primers>
    output the candidate single primers
-dir <directory>
    the directory for the output file
    default: current directory
-loop
    identify candidate single primer regions for loop primers
-check <int>
    whether to check the secondary structure of single primers
    0: don't check secondary structure; other values: check
    default: 1
-par <par_directory>
    parameter files under this directory are used to check the secondary structure of primers
    default: GLAPD/Par/
-h/-help
    print usage

2. Align sequences from single primer regions (optional):
Command:
perl par.pl --in <single_primers_file> --ref <ref_genome> --common[--specific] <genomes_list> --bowtie <bowtie> --index <database> [options]*

Arguments:
--in <single_primers_file>
    the file name of the candidate single primer regions; the files are generated by the Single program
--ref <ref_genome>
    reference genome, fasta format
--dir <directory>
    directory containing the files of candidate single primer regions
    default: current directory
--loop
    include loop primers
--common <genomes_list>
    the genomes in this file (target genomes) are expected to be amplified by the LAMP primer sets
--specific <genomes_list>
    the genomes in this file (background genomes) are not expected to be amplified by the LAMP primer sets
--left
    background_group = all_genome_in_database - target_group
    used with --common
    invalid if --specific is given
--bowtie <bowtie>
    the bowtie program
--index <database>
    bowtie index file name(s), comma-separated
--mis_s <int>
    the maximum number of mismatches allowed when aligning single primers to background genomes
    the value must be between 0 and 3; the larger the value, the more specific the primers
    default: 2
--mis_c <int>
    the maximum number of mismatches allowed when aligning single primers to target genomes
    the value must be between 0 and 3; the smaller the value, the more common the primers
    default: 0
--threads <int>
    number of threads to launch when aligning
    default: 1
--help|--h
    print help information

3. Design LAMP primer sets:
Command:
LAMP -in <single_primers_file> -ref <ref_genome> -out <LAMP_primer_sets> [options]*

Arguments:
-in <single_primers_file>
    the file name of the candidate single primer regions; the files are generated by the Single program
-ref <ref_genome>
    reference genome, fasta format
-dir <directory>
    the directory for the output file
    default: current directory
-out <LAMP_primer_sets>
    output the successfully designed LAMP primer sets
-num <int>
    the expected number of LAMP primer sets to output
    default: 10
-loop
    design LAMP primer sets with loop primers
-common
    design common LAMP primer sets that can amplify more than one target genome
-specific
    design specific LAMP primer sets that cannot amplify any background genome
-check <int>
    whether to check the tendency of primers within one LAMP primer set to bind to one another
    0: don't check; other values: check
    default: 1
-par <par_directory>
    parameter files under this directory are used to check the binding tendency of primers
    default: GLAPD/Par/
-fast
    fast mode for designing LAMP primer sets; in this mode GLAPD may lose some valid results
-h/-help
    print usage
######
INPUT FILE FORMAT:
1) The reference genome must be in fasta format.
2) The common list file and the specific list file used in step 2 must contain one genome name per line. They can be generated by taking the sequence names directly from the target or background genomes.
3) The bowtie index can be generated with the "bowtie-build" command; more details are available at http://bowtie-bio.sourceforge.net/tutorial.shtml.
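As an illustration of point 2, the Python sketch below collects sequence names from fasta headers and writes them one per line. It is not part of GLAPD. It assumes that the "genome name" is the first whitespace-delimited token of each header line (as in "NC_002951.2" for the example genome); check this against your own index before relying on it. The function names fasta_names and write_genome_list are hypothetical.

```python
def fasta_names(lines):
    """Return the sequence names found in an iterable of fasta lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip empty headers such as a bare '>'
                names.append(fields[0])
    return names


def write_genome_list(fasta_paths, list_path):
    """Write one genome name per line, taken from the given fasta files."""
    with open(list_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for name in fasta_names(handle):
                    out.write(name + "\n")


if __name__ == "__main__":
    demo = [
        ">NC_002951.2 Staphylococcus aureus subsp. aureus COL chromosome\n",
        "ACGTACGT\n",
    ]
    print(fasta_names(demo))
```

For a target list, write_genome_list would be called with the fasta files of the target genomes only; the file names themselves are up to the user.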
OUTPUT FILE FORMAT:
1) In the files generated by the "Single" program, each line describes one candidate single primer. For example:
"pos:3 length:25 +:2 -:0 61.89"
Each line has five fields separated by tabs. From left to right, the fields are:
1. The position of the single primer in the reference genome (0-based)
2. The length of the single primer
3,4. Sum of all applicable flags. "+" refers to the plus strand of the reference genome and "-" refers to the minus strand. A value of "0" means the single primer is not from that strand. The flags are:
1: this single primer can be used to amplify a target if the GC-content of the target region is >=60%
2: this single primer can be used to amplify a target if the GC-content of the target region is <=45%
4: this single primer can be used to amplify a target if the GC-content of the target region is between 45% and 60%
5. Tm
2) In the files generated by "par.pl" ("XX-common.txt" and "XX-specific.txt"), each line stores one alignment. For example:
"3 25 2 501407 1 0"
Each line has six fields separated by tabs. From left to right, the fields are:
1. The position of the single primer in the reference genome (0-based)
2. The length of the single primer
3. The index of the genome within the target group or background group (0-based)
4. The position of the alignment in this genome (field 3)
5. "1" means this single primer can be used to amplify the genome (field 3) on the plus strand. "0" means it cannot.
6. "1" means this single primer can be used to amplify the genome (field 3) on the minus strand. "0" means it cannot.
3) In the file generated by "par.pl" ("XX-common_list.txt"), each line contains one target genome. For example:
"NC_002951.2 0"
Each line has two fields separated by a tab. From left to right, the fields are:
1. The name of the target genome
2. The index of the target genome within the target group (0-based)
4) The LAMP primer sets are stored in the file generated by the "LAMP" program, for example:
"The 1 LAMP primers:
F3: pos:36,length:18 bp, primer(5'-3'):CGGTTCCCTGTACTCGAA
F2: pos:74,length:19 bp, primer(5'-3'):AATTCCTTTGTTGAGGCCG
F1c: pos:115,length:24 bp, primer(5'-3'):CGAAATCTTCAAACACTACGTGCT
B1c: pos:175,length:22 bp, primer(5'-3'):CCTGACGGAAGCAGCATTAAGT
B2: pos:237,length:19 bp, primer(5'-3'):CGAACGTAACCAAAGTCGT
B3: pos:276,length:23 bp, primer(5'-3'):TAAAAAATAAAAAACCGTGCACC
This set of LAMP primers could be used in 3 genomes, there are: NC_002951.2, NC_017340.1, NC_002745.2"
One LAMP primer set contains at least six single primers. The positions of the single primers in the reference genome, their lengths and their sequences are listed in this file. When common primers are designed, the target genomes that can be amplified by each primer set are also listed.
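The record layouts above can be read back with a few lines of Python. The sketch below is not part of GLAPD and is based only on the example lines shown in this section; the real files may contain details not covered here. All function and key names (parse_single_line, decode_flags, parse_alignment_line, parse_lamp_sets) are hypothetical.

```python
import re

# Flag values from the "Single" output description (fields 3 and 4).
GC_FLAGS = {1: "GC >= 60%", 2: "GC <= 45%", 4: "GC 45% to 60%"}

HEADER = re.compile(r"^\s*The (\d+) LAMP primers:")
PRIMER = re.compile(
    r"^\s*(\w+):\s*pos:(\d+),\s*length:(\d+)\s*bp,"
    r"\s*primer\(5'-3'\):\s*([A-Za-z]+)"
)


def parse_single_line(line):
    """Parse one 'Single' record, e.g. 'pos:3<TAB>length:25<TAB>+:2<TAB>-:0<TAB>61.89'."""
    pos, length, plus, minus, tm = line.split()
    return {
        "pos": int(pos.split(":")[1]),
        "length": int(length.split(":")[1]),
        "plus": int(plus.split(":")[1]),
        "minus": int(minus.split(":")[1]),
        "tm": float(tm),
    }


def decode_flags(value):
    """Expand a summed flag value into the GC-content classes it covers."""
    return [label for bit, label in GC_FLAGS.items() if value & bit]


def parse_alignment_line(line):
    """Parse one par.pl alignment record with six tab-separated fields."""
    pos, length, genome, hit_pos, plus, minus = (int(x) for x in line.split())
    return {
        "pos": pos,
        "length": length,
        "genome": genome,      # 0-based index within the group
        "hit_pos": hit_pos,    # position of the alignment in that genome
        "plus": plus == 1,
        "minus": minus == 1,
    }


def parse_lamp_sets(text):
    """Parse LAMP output into a list of {region: {pos, length, seq}} dicts."""
    sets = []
    for line in text.splitlines():
        if HEADER.match(line):
            sets.append({})
            continue
        match = PRIMER.match(line)
        if match and sets:
            name, pos, length, seq = match.groups()
            sets[-1][name] = {"pos": int(pos), "length": int(length), "seq": seq}
    return sets


if __name__ == "__main__":
    record = parse_single_line("pos:3\tlength:25\t+:2\t-:0\t61.89")
    print(record, decode_flags(record["plus"]))
```

With the sets parsed, positions can be compared directly, for instance to keep only sets in which the B1c position minus the F1c position exceeds a chosen threshold.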
######
TIPS:
1) Select the reference genome:
The reference genome can be chosen at random from the group of target genomes, or it can be the genome you most want the LAMP primer set to amplify.
2) Specific file:
When running step 2, if you have the "common file", you can use the "--left" option instead of "--specific". In this way, all genomes in the database except those in the "common file" are defined as the background genomes.
3) Fast mode:
When running step 3, the "-fast" option accelerates the design, but this mode may lose some valid LAMP primer sets.
######
If you have any questions, please contact us:
Ben Jia: [email protected]
Chaochun Wei: [email protected]

glapd's Issues

genome size

Hello.
I want to try your program, but before installing it and trying to use it, I want to ask whether there is a limit on the size of the genomes that the program can handle. I work with fungal genomes, which are very big compared to bacterial ones. Also, the genome assemblies are made of many contigs and I wonder if that will work. Thank you

GPU version can't find primers

Hello,

I did some tests with both the CPU and GPU versions of GLAPD, but it looks like the GPU version did not work, even with the same example inputs:

CPU Single step message:
It takes 4 seconds to prepare.
There ara 10965 candidate primers used as F3/F2/B2/B3.
There are 9613 candidate primers used as F1c/B1c.
There are 9659 candidate primers used as LF/LB.
It takes 56 seconds to identify candidate single primer regions.

GPU Single step message:
It takes 3 seconds to prepare.
There ara 0 candidate primers used as F3/F2/B2/B3.
There are 0 candidate primers used as F1c/B1c.
There are 0 candidate primers used as LF/LB.
Warning: there don't have enough primers(>=4) used as F3/F2/B2/B3.
Warning: there don't have enough primers(>=2) used as F1c/B1c.
Warning: there don't have enough primers(>=1) used as LF/LB. But you can design LAMP primers without loop primer.
It takes 1 seconds to identify candidate single primer regions.

Are there any limitations regarding the GPU scripts?

GPU make message:
nvcc -arch sm_86 single.cu -o Single
nvcc -arch sm_86 LAMP.cu -o LAMP

CUDA compiler:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

Permission denied

Hey,
It is a very nice tool for making LAMP primers, but now I have an error and I don't know what went wrong. Maybe someone can help me fix this.

First I tested the tool with a small 30 kb genome, to check that everything works. Then I made a new bowtie database with other bacterial genomes. The output of Single was fine:
There ara 5843008 candidate primers used as F3/F2/B2/B3.
There are 3970570 candidate primers used as F1c/B1c.

Then the goal was to make specific primers for the detection:
perl par.pl --in xxx --ref /PATH/TO/GENOME.fna --common /PATH/TO/target-list.txt --bowtie /PATH/TO/GLAPD/bowtie --index /PATH/TO/GLAPD/bowtie/INDEX --left.
But then it goes wrong:
Now the program is handling the 1-th file, total files is 2...
Can't exec "/PATH/TO/GLAPD/bowtie": Permission denied at par.pl line 342.
Can't open the /PATH/TO//GLAPD/Inner/xxxxx-0.bowtie file!
The file in Inner doesn't exist, and which permission is denied?
I already checked the bowtie database and made a new one. I read in another issue about plasmids, so I also checked the genome of interest for chromosome and plasmid sequences, but it contained only the chromosome.

So what is going wrong and what would be the solution?

Use bowtie2

Hello

Thank you for the awesome work with GLAPD,
I wonder if we could benefit from the speed/memory/sensitivity advantages of Bowtie 2 compared to Bowtie v1.

Thanks!

COMMON genomes saved locally

Please advise how to code the common option for scenario (4), if I only have the assembled .fna files to be targeted that are saved locally, not in NCBI.
Thank you!

Error(?) messages: Use of uninitialized value...

Hello,

When running the par.pl step, I receive these messages:

Use of uninitialized value $array[0] in exists at ./par.pl line 212, <IN> line 2.
Use of uninitialized value $array[0] in hash element at ./par.pl line 217, <IN> line 2.
Use of uninitialized value $list_common[1] in concatenation (.) or string at ./par.pl line 323.

I've checked the lines in the perl script but could not figure out what the variables mean in those lines.

Seems like the results are ok, though.

Do I need to worry about these messages?

Thanks again !

Single identifies primers beyond range

Hello! I'm having a problem when running GLAPD. After running Single on a target genome, some primers generated by the program fall on positions beyond the length of the genome (if the genome is 5 040 356bp long, Single calculates primers for positions up to 5 040 965). Single runs normally, without giving any errors, but when I then try using par.pl on the results it cannot get some primers from the genome sequence using the substring command.
I have also run the program with other genomes which have a single chromosome (as opposed to this one, which has one chromosome and several plasmids), and it seems to run ok.
Is this a bug or am I making some mistake running the program?

I add the commands and output below:

First, run Single:

./Single -in Aeromonas_test.fna -out Aeromonas_test

Output stream:
It takes 0 seconds to prepare.
There ara 6893989 candidate primers used as F3/F2/B2/B3.
There are 7833456 candidate primers used as F1c/B1c.
It takes 37439 seconds to identify candidate single primer regions.

Second, run par.pl:

perl par.pl \
--in Aeromonas_test \
--ref Aeromonas_test.fna \
--bowtie bowtie/bowtie \
--index indexes/Aeromonas_test_index \
--common Aeromonas_common.txt \
--left \
--threads 8

Output stream:
Now the program is handling the 1-th file, total files is 2...
In this step, it takes 5879 seconds.
Now the program is handling the 2-th file, total files is 2...
In this step, it takes 5589 seconds.

Error stream:
substr outside of string at par.pl line 311, line 7832666.
Use of uninitialized value $primer in concatenation (.) or string at par.pl line 312, line 7832666.
...
Error while flushing and closing output
terminate called after throwing an instance of 'int'

Are there any parameters to adjust the distance between F1c and B1c?

I need primers that satisfy [position_B1c - position_F1c > 50]. Are there any parameters to adjust the distance between F1c and B1c?
