Cas-OFFinder

Cas-OFFinder is an OpenCL-based, ultrafast, and versatile program that searches for potential off-target sites of CRISPR/Cas-derived RNA-guided endonucleases (RGENs).

Cas-OFFinder is not limited by the number of mismatches and allows variations in protospacer-adjacent motif (PAM) sequences recognized by Cas9, the essential protein component in RGENs.

Cas-OFFinder requires an OpenCL device to run properly.

Cas-OFFinder is distributed under the new BSD license (3-clause).

Cas-OFFinder has been tested on the platforms below:

  • Microsoft Windows (7 and 8)
  • GNU/Linux (CentOS, Ubuntu and Elementary OS)
  • (NEW!) Mac OS X (Mavericks)

CRISPR/Cas-derived RNA-guided endonucleases (RGEN)

RGENs use complementary base pairing to recognize target sites.

RGENs consist of two parts.

  • Guide RNA
    • Dual RNA components comprising sequence-invariant tracrRNA and sequence-variable guide RNA termed crRNA
    • ...or single-chain guide RNA (sgRNA) constructed by linking essential portions of tracrRNA and crRNA
  • Cas9 Protein
    • A fixed protein component that recognizes the protospacer adjacent motif (PAM) downstream of target DNA sequences corresponding to guide RNA.

PAM sites:

  • SpCas9 from Streptococcus pyogenes: 5’-NGG-3’ (to a lesser extent, 5’-NAG-3’)
  • StCas9 from Streptococcus thermophilus: 5’-NNAGAAW-3’ (W = A or T)
  • NmCas9 from Neisseria meningitidis: 5’-NNNNGMTT-3’ (M = A or C)
  • SaCas9 from Staphylococcus aureus: 5’-NNGRRT-3’ (R = A or G)
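
As a rough illustration of how the degenerate codes in these PAMs expand, the following Python sketch turns a PAM string into a regular expression and tests candidate sequences against it. The helper names `pam_to_regex` and `matches_pam` are hypothetical and are not part of Cas-OFFinder; only the codes that appear in the list above are included.

```python
import re

# Degenerate codes used by the PAMs listed above (a subset of the mixed-base codes).
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "W": "[AT]", "M": "[AC]", "N": "[ACGT]",
}

def pam_to_regex(pam):
    """Translate a PAM such as 'NGG' into a compiled regular expression."""
    return re.compile("".join(CODES[base] for base in pam.upper()))

def matches_pam(pam, seq):
    """Return True if seq has the same length as pam and is compatible with it."""
    return len(seq) == len(pam) and pam_to_regex(pam).fullmatch(seq.upper()) is not None

print(matches_pam("NGG", "AGG"))        # SpCas9 PAM against AGG
print(matches_pam("NNGRRT", "TTGAAT"))  # SaCas9 PAM against TTGAAT
print(matches_pam("NGG", "AAG"))        # second base is not G
```

The three calls print `True`, `True`, and `False`, in that order.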

Usage

Cas-OFFinder can be run with:

cas-offinder {input_file} {G|C|A} {output_file}

G stands for using all available GPU devices, C for using all CPUs, and A for using all accelerators.
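
If you prefer to launch Cas-OFFinder from a script, the command line above can be assembled in Python. This is only a sketch: the function names and the default binary path `./cas-offinder` are assumptions made for illustration, and only the three documented arguments are used.

```python
import subprocess

# Device letters documented above.
DEVICE_TYPES = {"G": "GPUs", "C": "CPUs", "A": "accelerators"}

def build_command(input_file, device, output_file, binary="./cas-offinder"):
    """Return the argument list for one Cas-OFFinder run."""
    if device not in DEVICE_TYPES:
        raise ValueError("device must be one of G, C or A")
    return [binary, input_file, device, output_file]

def run(input_file, device, output_file, binary="./cas-offinder"):
    """Run Cas-OFFinder; subprocess raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(input_file, device, output_file, binary), check=True)

print(build_command("input.txt", "G", "out.txt"))
```

Calling `run("input.txt", "G", "out.txt")` would then start the search, provided the binary exists at the assumed path.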

A short example may be helpful!

First, download the chromosome FASTA files of your target organism, for example from the UCSC or Ensembl sequence libraries.

Extract all FASTA files into a single directory.

For example (human chromosomes, in a POSIX environment):

$> wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
$> mkdir -p /var/chromosomes/human_hg19
$> tar zxf chromFa.tar.gz -C /var/chromosomes/human_hg19
$> ls -al /var/chromosomes/human_hg19
  drwxrwxr-x.  2 user group      4096 2013-10-18 11:49 .
  drwxrwxr-x. 16 user group      4096 2013-11-12 12:44 ..
  -rw-rw-r--.  1 user group 254235640 2009-03-21 00:58 chr1.fa
  -rw-rw-r--.  1 user group 138245449 2009-03-21 01:00 chr10.fa
  -rw-rw-r--.  1 user group 137706654 2009-03-21 01:00 chr11.fa
  -rw-rw-r--.  1 user group 136528940 2009-03-21 01:01 chr12.fa
  -rw-rw-r--.  1 user group 117473283 2009-03-21 01:01 chr13.fa
  -rw-rw-r--.  1 user group 109496538 2009-03-21 01:01 chr14.fa
  ...

Now download the Cas-OFFinder binary from

https://sourceforge.net/projects/cas-offinder/files/Binaries

and save it to any directory you want.

Then run it without arguments to print a short help message:

$> ./cas-offinder
  Cas-OFFinder v2.2 (2014-10-22)
  
  Copyright (c) 2013 Jeongbin Park and Sangsu Bae
  Website: http://github.com/snugel/cas-offinder
  
  Usage: cas-offinder {input_file} {C|G|A} {output_file}
  (C: using CPUs, G: using GPUs, A: using accelerators)
  
  Example input file:
  /var/chromosomes/human_hg19
  NNNNNNNNNNNNNNNNNNNNNRG
  GGCCGACCTGTCGCTGACGCNNN 5
  CGCCAGCGTCAGCGACAGGTNNN 5
  ACGGCGCCAGCGTCAGCGACNNN 5
  GTCGCTGACGCTGGCGCCGTNNN 5
  
  Available device list:
  Type: CPU, 'Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz'
  Type: GPU, 'Pitcairn'

The help message also lists all available OpenCL devices.

On Windows, if you encounter a missing .dll error, you may need to download and install the Visual C++ Redistributable Packages for Visual Studio 2013.

Now create an input file:

  • The first line of the input file gives the path of the directory containing the chromosome FASTA files,
  • The second line indicates the desired pattern, including the PAM site,
  • ...and the following lines are the query sequences and their maximum mismatch numbers, separated by spaces. (The desired pattern and the query sequences must have the same length!)

For the pattern and the query sequences, mixed bases are allowed to account for the degeneracy in PAM sequences.

Also, the number of mismatched bases is not limited!

The following codes are supported:

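   A    |    C     |    G    |    T
:------:|:--------:|:-------:|:------:
Adenine | Cytosine | Guanine | Thymine

   R    |   Y    |   S    |   W    |   K    |   M
:------:|:------:|:------:|:------:|:------:|:------:
 A or G | C or T | G or C | A or T | G or T | A or C

     B     |     D     |     H     |     V     |   N
:---------:|:---------:|:---------:|:---------:|:------:
C or G or T|A or G or T|A or C or T|A or C or G|any base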
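
To make the role of the mixed-base codes concrete, here is a small Python sketch that compares a query against a candidate site, counts mismatches, and lowercases the mismatched bases. It is a simplified illustration, not the OpenCL kernel Cas-OFFinder actually runs; the name `mark_mismatches` is hypothetical, and the site is assumed to contain only A, C, G, and T.

```python
# Each code maps to the set of bases it stands for (see the table above).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mark_mismatches(query, site):
    """Return (annotated_site, mismatch_count) for two equal-length sequences.

    A site base is a match when it is one of the bases allowed by the query
    code at that position; mismatched bases are written in lowercase.
    """
    if len(query) != len(site):
        raise ValueError("query and site must have the same length")
    annotated = []
    mismatches = 0
    for code, base in zip(query.upper(), site.upper()):
        if base in IUPAC[code]:
            annotated.append(base)
        else:
            annotated.append(base.lower())
            mismatches += 1
    return "".join(annotated), mismatches

print(mark_mismatches("GGCCGACCTGTCGCTGACGCNNN", "GGGCATCCTGTCGCAGACACAGG"))
```

For this pair the function returns `('GGgCatCCTGTCGCaGACaCAGG', 5)`: the trailing `NNN` in the query matches any base, so only the five differing positions in the first 20 bases are counted.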

An example of input file:

/var/chromosomes/human_hg19
NNNNNNNNNNNNNNNNNNNNNRG
GGCCGACCTGTCGCTGACGCNNN 5
CGCCAGCGTCAGCGACAGGTNNN 5
ACGGCGCCAGCGTCAGCGACNNN 5
GTCGCTGACGCTGGCGCCGTNNN 5
...

Save it as 'input.txt'.
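
The input file can also be generated from a script. The sketch below assembles the three parts described above (genome directory, pattern, query lines) and checks the length rule before returning the text; `build_input_text` is a hypothetical helper, not something shipped with Cas-OFFinder.

```python
def build_input_text(genome_dir, pattern, queries):
    """Return the input file contents; queries is a list of (sequence, max_mismatches)."""
    lines = [genome_dir, pattern]
    for seq, max_mismatches in queries:
        # The pattern and every query must have the same length.
        if len(seq) != len(pattern):
            raise ValueError(f"{seq} does not match the pattern length {len(pattern)}")
        lines.append(f"{seq} {max_mismatches}")
    return "\n".join(lines) + "\n"

pattern = "N" * 21 + "RG"  # 21 free positions followed by the RG of the PAM
queries = [
    ("GGCCGACCTGTCGCTGACGC" + "NNN", 5),
    ("CGCCAGCGTCAGCGACAGGT" + "NNN", 5),
]
print(build_input_text("/var/chromosomes/human_hg19", pattern, queries), end="")
```

Writing the returned string to 'input.txt' gives a file in the format shown above.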

Now you can run Cas-OFFinder as follows (using GPUs):

$> ./cas-offinder input.txt G out.txt
...

The output file will then be generated:

  • The first column of the output file indicates the given query sequence,
  • The second column is the FASTA title (if you downloaded it from UCSC or Ensembl, it is usually a chromosome name),
  • The third column is the position of the off-target site (same convention as Bowtie),
  • The fourth column shows the actual sequence at that position (mismatched bases are written in lowercase letters),
  • The fifth column indicates forward strand(+) or reverse strand(-) of the found sequence,
  • ... and the last column is the number of the mismatched bases.

out.txt:

GGCCGACCTGTCGCTGACGCNNN chr8    49679        GGgCatCCTGTCGCaGACaCAGG +       5
GGCCGACCTGTCGCTGACGCNNN chr8    517739       GcCCtgCaTGTgGCTGACGCAGG +       5
GGCCGACCTGTCGCTGACGCNNN chr8    599935       tGCCGtCtTcTCcCTGACGCCAG -       5
GGCCGACCTGTCGCTGACGCNNN chr8    5308348      GGCaGgCCTGgCttTGACGCAGG -       5
GGCCGACCTGTCGCTGACGCNNN chr8    9525579      GGCCcAgCTGTtGCTGAtGaAAG +       5
GGCCGACCTGTCGCTGACGCNNN chr8    12657177     GGCCcACCTGTgGCTGcCcaTAG -       5
GGCCGACCTGTCGCTGACGCNNN chr8    12808911     GGCCGACCaGgtGCTccCGCCGG +       5
GGCCGACCTGTCGCTGACGCNNN chr8    21351922     GGCCcACCTGaCtCTGAgGaCAG -       5
GGCCGACCTGTCGCTGACGCNNN chr8    21965064     GGCCGtCCTGcgGCTGctGCAGG -       5
GGCCGACCTGTCGCTGACGCNNN chr8    22409058     GcCCGACCccTCcCcGACGCCAG +       5
...
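
For downstream analysis, each output line can be read into a small record. The following Python sketch assumes the six columns described above, separated by whitespace; the `OffTarget` type and `parse_line` function are hypothetical names chosen for illustration. The line is split from both ends so that a FASTA title containing spaces would stay in one piece.

```python
from collections import namedtuple

OffTarget = namedtuple("OffTarget", "query title position site strand mismatches")

def parse_line(line):
    """Split one output line into an OffTarget record."""
    query, rest = line.split(None, 1)
    # The last four columns never contain spaces, so split them off from the right.
    title, position, site, strand, mismatches = rest.rsplit(None, 4)
    return OffTarget(query, title.strip(), int(position), site, strand, int(mismatches))

line = "GGCCGACCTGTCGCTGACGCNNN chr8    49679        GGgCatCCTGTCGCaGACaCAGG +       5"
record = parse_line(line)
print(record)

# Lowercase letters in the site mark mismatches, so they can be cross-checked.
lowercase = sum(1 for base in record.site if base.islower())
print(lowercase == record.mismatches)
```

For the first line of the sample output, the number of lowercase bases (5) agrees with the last column.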

Advanced Usage

Cas-OFFinder is designed mainly for CRISPR/Cas9-derived RGENs. However, it can also search for off-target sites of other nucleases, e.g. TALENs (transcription activator-like effector nucleases) or ZFNs (zinc-finger nucleases), by specifying the pattern sequence as all 'N's.

Example input file for TALENs:

/var/chromosomes/human_hg19
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
TTCTGGAGGTGCCTGAGGCCNNNNNNNNNNNNGAGGCCACCTTTCCAGTCCA 5
TGGCCAATGTGACGCTGACGNNNNNNNNNNNNCTGGAGACTCCAGACTTCCA 5
....
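
The query lines in this example appear to consist of two 20-nt sequences joined by a run of 12 'N's, with a pattern line of the same total length. Under that reading, the lines could be assembled as in the sketch below; `talen_query` and `all_n_pattern` are hypothetical helpers, and the spacer length is an input you choose, not a value fixed by Cas-OFFinder.

```python
def talen_query(left_site, right_site, spacer_length):
    """Join two half-site sequences with a run of N covering the spacer."""
    return left_site + "N" * spacer_length + right_site

def all_n_pattern(query):
    """Return a pattern line of the same length as the query, made only of N."""
    return "N" * len(query)

query = talen_query("TTCTGGAGGTGCCTGAGGCC", "GAGGCCACCTTTCCAGTCCA", 12)
print(all_n_pattern(query))
print(query, 5)  # query sequence followed by the maximum number of mismatches
```

Because the pattern is all 'N's, no PAM constraint is applied and only the mismatch limit restricts the hits.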

Installation

Compile

An OpenCL library is required to compile Cas-OFFinder.

To support cross-platform compilation on various operating systems, the CMake build system is used (more information at http://www.cmake.org).

First, download CMake from http://www.cmake.org/cmake/resources/software.html. On Ubuntu Linux, you can also install it via apt-get (apt-get install cmake).

Check out the source code of Cas-OFFinder with a Git client, or download it manually from the GitHub website.

In a POSIX environment (g++ should be pre-installed), launch a terminal and type the following in the source directory to build Cas-OFFinder:

  cmake -G "Unix Makefiles" .
  make

On Windows (Visual Studio should be pre-installed), launch the 'Visual Studio Command Prompt' (you can find it under 'Start menu' - 'Microsoft Visual Studio xxxx' - 'Visual Studio Tools') and type the following in the source directory (assuming that the CMake binary is installed in 'C:\Program Files (x86)\CMake 2.8\bin'):

  "C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -G "NMake Makefiles" .
  nmake

The cas-offinder binary will then be generated. Copy it wherever you want.

Download & Source

The binaries can be downloaded from

https://sourceforge.net/projects/cas-offinder/files/Binaries

The source code is available from

https://github.com/snugel/cas-offinder

Changelog

  • 2.3
    • Removed cl.hpp due to lack of C++ binding support in the new OpenCL 2.0 standard.
    • Constant arguments are stored in constant or local memory, rather than global memory.
    • Added support for 2bit format.
    • Removed kseq.h
    • Precise running time measurement on POSIX platforms.
  • 2.2
    • Corrected a critical bug (when cas-offinder finds no binding sites in the given genome chunk, it crashes).
    • Cas-OFFinder now reads a whole FASTA file at once, which speeds up searching in FASTA files that contain many small scaffolds.
  • 2.1
    • Reduced the computing load on the CPU by using atomic operations. In our benchmark, the total computation was about twice as fast as before.
    • When lowercase sequences are given, they are converted to uppercase before computation.
    • Corrected a bug (mixed bases were shown as lowercase letters even when they had matched normal bases).
    • Now supports 'accelerators', with 'A' option.
  • 1.1
    • When Cas-OFFinder is launched without parameters, it now displays the list of available devices.
    • If the given chromosomes directory does not exist, it now returns an error message.
    • Corrected a bug (when Cas-OFFinder couldn't find any OpenCL device it would hang).
  • 1.0
    • Initial release.

License

Cas-OFFinder (except dirent.h) is licensed under the new BSD license.

Copyright (c) 2013, Jeongbin Park and Sangsu Bae All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the Seoul National University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
