NLRextract: search for NLR-related HMMs

NLRextract searches multi-FASTA protein sequences for NLR-related HMMs. It uses HMMs from Pfam v32 and also searches for the MADA motif associated with the CC domain [1]. The tool is under development, so let me know if something looks odd or can be improved. I compared the performance of NLRextract with NLR-parser [2] and NLRtracker [3] in a short article on my website: https://www.biotinkertech.eu/project_NLRextract.html

IMPORTANT!!
24.02.22 - Potentially wrong output of NLRextract when using the system-supplied awk:

Until now, the hmmsearch / hmmscan output was filtered with the system-supplied awk. A standard installation of Ubuntu 20.04 ships mawk as its awk, and this version has problems with numbers in scientific notation. Therefore, gawk=5.1.0 was added to environment.yml, and the bash script 'NLRextract' was adjusted to use gawk for filtering the results. If you have run NLRextract on a standard Ubuntu 20.04 installation, please rerun your analysis with the updated version!

IMPORTANT!!
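
To see which awk a system would pick up, and what a numeric E-value filter looks like, here is a minimal sketch. The two-column sample rows, the 1e-5 cutoff and the helper name filter_by_evalue are assumptions for illustration; they are not taken from the NLRextract script.

```shell
#!/usr/bin/env bash
# Sketch only: sample rows, cutoff and helper name are illustrative assumptions.

# Prefer gawk when it is installed; otherwise fall back to the system awk.
AWK_BIN="$(command -v gawk || command -v awk)"
echo "Using awk at: $AWK_BIN"

# Print column 1 for rows whose column 2 (an E-value) is below the cutoff.
# Adding 0 forces a numeric comparison instead of a string comparison.
filter_by_evalue() {
    "$AWK_BIN" -v cutoff="$1" '($2 + 0) < (cutoff + 0) { print $1 }'
}

printf 'protA 1.2e-30\nprotB 0.5\nprotC 3e-07\n' | filter_by_evalue 1e-5
```

If the reported path is not gawk, results filtered with an older NLRextract version on that machine are worth rerunning.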

What it does:

  • Search for NB-ARC, CC, TIR, RPW8 and LRR domains in the proteins
  • Create a venn diagram from the domain information
  • Extract the overlaps of the different domain combinations
  • Extract sequences of proteins with the different domain combinations
  • Plot the number of different domains and the number of different NLRs
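
Extracting the overlaps amounts to set operations on lists of sequence names. NLRextract does this step in vennNLR.r; the sketch below only illustrates the idea with standard shell tools. The file names, protein IDs and the intersect helper are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: file names and protein IDs are hypothetical.
workdir="$(mktemp -d)"
printf 'AT1G01\nAT1G02\nAT1G03\n' > "$workdir/nbarc.txt"
printf 'AT1G02\nAT1G03\nAT1G04\n' > "$workdir/lrr.txt"
printf 'AT1G03\nAT1G05\n'         > "$workdir/tir.txt"

# IDs present in both lists: de-duplicate each list, concatenate,
# and keep the IDs that then occur twice.
intersect() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Proteins with both an NB-ARC and an LRR domain
intersect "$workdir/nbarc.txt" "$workdir/lrr.txt" > "$workdir/nbarc_lrr.txt"

# Proteins with TIR, NB-ARC and LRR domains
intersect "$workdir/nbarc_lrr.txt" "$workdir/tir.txt"
```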

What it needs:

Please have a look at the environment.yml file for more information

(base) 💻 daniel:NLRextract $ cat environment.yml

What it includes:

  • NLRextract.sh -> Script to run hmmsearch, hmmscan and plotNLR.r/vennNLR.r
  • vennNLR.r -> Create venn diagram from the domain information and extract the combinations
  • plotNLR.r -> Create barplots for the domains and the NLR proteins

How to install it:

  1. Clone it:
(base) 💻 daniel ~ $ git clone https://github.com/Daniel-Ze/NLRextract.git
  2. Put the containing folder in your $PATH
  3. chmod a+x NLRextract
  4. Edit line 3 in NLRextract if you installed it anywhere other than your home folder:
1 #!/bin/bash
2 ########################################################### To get the script running
3 NLRextracthome=~/NLRextract                               # edit this line
4 ########################################################### and put the folder in PATH
  5. Install the environment:
(base) 💻 daniel ~ $ cd NLRextract
(base) 💻 daniel:NLRextract $ mamba env create -f environment.yml
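
After creating and activating the environment, it can help to confirm that the required tools are on the PATH. The sketch below checks for the tools named in this README (Rscript, hmmsearch, hmmscan, bedtools, gawk); check_tools is a hypothetical helper and not part of NLRextract.

```shell
#!/usr/bin/env bash
# Sketch only: check_tools is a hypothetical helper, not part of NLRextract.
# Run it after activating the conda environment.

# Report each tool and return non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "[info]  - Found $tool in your path."
        else
            echo "[error] $tool not found in your path."
            missing=1
        fi
    done
    return "$missing"
}

check_tools Rscript hmmsearch hmmscan bedtools gawk \
    || echo "Some tools are missing; activate the conda environment first."
```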

How to run it:

(base) 💻 daniel:NLRextract $ NLRextract
/Users/daniel/miniconda3/etc/profile.d/conda.sh exists.
[info]	No conda environment name supplied. Defaulting to: NRLextract
[info]	Activating conda environment NLRextract:
[info]	 - Found Rscript in your path.
[info]	 - Found hmmsearch in your path.
[info]	 - Found bedtools in your path.
[error]	No protein file supplied.

	NLRextract will run hmmersearch and search for the
	HMMs located in /Users/daniel/NLRextract/hmm/.
	Make sure that you supply a protein multifasta file.

Usage: NLRextract -p protein.fa
	-c number of CPUs to use (default: 1) 
	-p path to the protein multifasta (mandatory)
	-s suffix for folder (default: random string)
	-e conda environment (default: NLRextract)
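
A typical call could look like the sketch below. The options are the ones listed in the usage text; the input file name, the CPU count and the suffix are placeholders, and the two helper functions are hypothetical. The wrapper only guards against passing a missing file or a file without FASTA records.

```shell
#!/usr/bin/env bash
# Sketch only: file name, CPU count and suffix are placeholders.

# Count FASTA records by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical wrapper: check the input, then call NLRextract.
run_nlrextract() {
    proteins="$1"
    if [ ! -f "$proteins" ]; then
        echo "File not found: $proteins" >&2
        return 1
    fi
    n="$(count_fasta_records "$proteins")"
    if [ "$n" -eq 0 ]; then
        echo "No FASTA records found in $proteins" >&2
        return 1
    fi
    echo "Found $n protein sequences in $proteins"
    # -p input proteins, -c number of CPUs, -s suffix for the output folder
    NLRextract -p "$proteins" -c 4 -s TAIR10_test
}

# Example: run_nlrextract proteins.fa
```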

Output:

The script generates several folders containing the output of the different tools used:

NLRextract_TAIR10_test/
├── [3.1K]  nlr.stderr          # Error reports of all steps
├── [204K]  clust/              # Alignment and phylogenetic tree data
├── [ 92K]  domain/             # Fasta sequences of the extracted domains
├── [1.1M]  fasta/              # Fasta sequences of the extracted NLR proteins
├── [1.2M]  gff/                # GFF files of domains and NLR proteins
├── [233M]  hmmer/              # HMMER results for the NLR HMM motifs
├── [ 59K]  name/               # Sequence names of NLR proteins and domains
├── [ 26K]  seqname/            # Sequence names of proteins with a NLR domain
└── [ 34K]  stats/              # Summary of findings plus plots

Changes:

UPDATE 24.02.21

  • switched from awk to gawk=5.1.0

PREVIOUS UPDATES

  • Added usage info
  • Added suffix and number of CPU options
  • Adjusted plotNLR.r output
  • Added phylogenetic trees

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

[1] Hiroaki Adachi, Mauricio P Contreras, Adeline Harant, Chih-hang Wu, Lida Derevnina, Toshiyuki Sakai, Cian Duggan, Eleonora Moratto, Tolga O Bozkurt, Abbas Maqbool, Joe Win, Sophien Kamoun, 2019, An N-terminal motif in NLR immune receptors is functionally conserved across distantly related plant species. eLife, 8:e49956 http://dx.doi.org/10.7554/eLife.49956
[2] Burkhard Steuernagel, Florian Jupe, Kamil Witek, Jonathan D.G. Jones, Brande B.H. Wulff, 2015, NLR-parser: rapid annotation of plant NLR complements. Bioinformatics, Vol. 31, Issue 10, Pages 1665–1667 https://doi.org/10.1093/bioinformatics/btv005
[3] Jiorgos Kourelis, Toshiyuki Sakai, Hiroaki Adachi, Sophien Kamoun, 2021, RefPlantNLR: a comprehensive collection of experimentally validated plant NLRs. bioRxiv, https://doi.org/10.1101/2020.07.08.193961
