Interfaces the Basic Local Alignment Search Tool (BLAST) for searching
genetic sequence databases with the Bioconductor infrastructure. This
includes interfaces to blastn, blastp, blastx, and makeblastdb.
The BLAST software needs to be downloaded and installed separately.
Other R interfaces for bioinformatics are also available:
- rRDP: Interface to the RDP Classifier
- rMSA: Interface for Popular Multiple Sequence Alignment Tools including ClustalW, MAFFT, MUSCLE, and Kalign
- Install the Bioconductor package Biostrings following the Bioconductor installation instructions for that package.
- Install rBLAST from r-universe using
  install.packages('rBLAST', repos = 'https://mhahsler.r-universe.dev')
- Install the BLAST software by following the instructions found in the help page for blast:
  library('rBLAST')
  ? blast
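Because BLAST is installed separately, it can help to confirm that R can see the executables before going further. The following is a minimal sketch using only base R; it assumes the BLAST binaries are expected on the system search path under their standard names, and the variable names (blast_tools, found, missing_tools) are illustrative only and not part of rBLAST.

```r
# Names of the BLAST executables mentioned in this README.
blast_tools <- c("blastn", "blastp", "blastx", "makeblastdb")

# Sys.which() returns the full path of each executable,
# or an empty string if it is not on the search path.
found <- Sys.which(blast_tools)

# Collect the tools that could not be located.
missing_tools <- names(found)[found == ""]

if (length(missing_tools) > 0) {
  message("Not found on the search path: ",
          paste(missing_tools, collapse = ", "))
} else {
  message("All BLAST executables were found.")
}
```

If some tools are reported as missing, the BLAST installation directory probably needs to be added to the PATH environment variable before R is started.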
library(rBLAST)
Download the 16S Microbial database from NCBI
download.file("https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz",
"16S_ribosomal_RNA.tar.gz", mode='wb')
untar("16S_ribosomal_RNA.tar.gz", exdir="16S_rRNA_DB")
Load some test sequences
seq <- readRNAStringSet(system.file("examples/RNA_example.fasta",
package="rBLAST"))
seq
## RNAStringSet object of length 5:
## width seq names
## [1] 1481 AGAGUUUGAUCCUGGCUCAGAAC...GGUGAAGUCGUAACAAGGUAACC 1675 AB015560.1 d...
## [2] 1404 GCUGGCGGCAGGCCUAACACAUG...CACGGUAAGGUCAGCGACUGGGG 4399 D14432.1 Rho...
## [3] 1426 GGAAUGCUNAACACAUGCAAGUC...AACAAGGUAGCCGUAGGGGAACC 4403 X72908.1 Ros...
## [4] 1362 GCUGGCGGAAUGCUUAACACAUG...UACCUUAGGUGUCUAGGCUAACC 4404 AF173825.1 A...
## [5] 1458 AGAGUUUGAUUAUGGCUCAGAGC...UGAAGUCGUAACAAGGUAACCGU 4411 Y07647.2 Dre...
Load a BLAST database (replace db with the location + name of the BLAST DB)
bl <- blast(db="./16S_rRNA_DB/16S_ribosomal_RNA")
bl
## BLAST Database
## Location: /home/hahsler/baR/rBLAST/16S_rRNA_DB/16S_ribosomal_RNA
## BLAST Type: blastn
## Database: 16S ribosomal RNA (Bacteria and Archaea type strains)
## 21,856 sequences; 31,790,086 total bases
##
## Date: May 1, 2021 5:36 AM Longest sequence: 3,600 bases
##
## BLASTDB Version: 5
##
## Volumes:
## /home/hahsler/baR/rBLAST/16S_rRNA_DB/16S_ribosomal_RNA
Query a sequence using BLAST and find sequences with a percent identity of 99% or higher.
cl <- predict(bl, seq[1,], BLAST_args = "-perc_identity 99")
cl
## QueryID SubjectID Perc.Ident Alignment.Length Mismatches Gap.Openings
## 1 1675 NR_151899.1 100 40 0 0
## 2 1675 NR_041235.1 100 37 0 0
## 3 1675 NR_036779.1 100 34 0 0
## 4 1675 NR_117153.1 100 32 0 0
## Q.start Q.end S.start S.end E Bits
## 1 22 61 2 41 1.22e-12 75.0
## 2 22 58 2 38 5.66e-11 69.4
## 3 24 57 2 35 2.63e-09 63.9
## 4 27 58 1 32 3.40e-08 60.2
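The result printed above looks like an ordinary data frame, so it can presumably be post-processed with base R. The sketch below assumes exactly that; to stay runnable without a BLAST installation it rebuilds a few of the columns by hand from the output shown above rather than calling predict(). The object names (hits, long_hits, best) and the 35-base cutoff are illustrative choices, not part of rBLAST.

```r
# Hand-built copy of selected columns from the output above
# (an assumption standing in for the object returned by predict()).
hits <- data.frame(
  QueryID          = c("1675", "1675", "1675", "1675"),
  SubjectID        = c("NR_151899.1", "NR_041235.1",
                       "NR_036779.1", "NR_117153.1"),
  Perc.Ident       = c(100, 100, 100, 100),
  Alignment.Length = c(40, 37, 34, 32),
  E                = c(1.22e-12, 5.66e-11, 2.63e-09, 3.40e-08),
  Bits             = c(75.0, 69.4, 63.9, 60.2),
  stringsAsFactors = FALSE
)

# Keep only alignments that span at least 35 bases.
long_hits <- hits[hits$Alignment.Length >= 35, ]

# Pick the best hit per query: sort by query and descending bit score,
# then keep the first row for each QueryID.
ordered <- hits[order(hits$QueryID, -hits$Bits), ]
best <- ordered[!duplicated(ordered$QueryID), ]

print(long_hits)
print(best)
```

The same filtering could be applied directly to cl from the example above, provided its column names match those shown in the printed output.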
Cite the use of this package as:
Hahsler M, Nagar A (2019). rBLAST: R Interface for the Basic Local Alignment Search Tool. R package version 0.99.2, URL: https://github.com/mhahsler/rBLAST.
BibTeX
@Manual{,
title = {{rBLAST: R Interface for the Basic Local Alignment Search Tool}},
author = {Michael Hahsler and Anurag Nagar},
year = {2019},
note = {R package version 0.99.2},
url = {https://github.com/mhahsler/rBLAST}
}