REPET-Slurm

A collection of scripts to get started with running the REPET pipeline on a cluster with the SLURM resource manager and a module system installed.

Caveats/Warnings

  1. FASTA Format
    • Header
      • Recommended format: ">XX_i" (XX = letters, i = numbers)
      • Avoid spaces and symbols such as "=", ";", ":", and "|"
    • Sequence: 60 bp or fewer per line
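The FASTA constraints above can be applied with a small filter. This is a minimal sketch, not part of the repository: the function name `normalize_fasta` and the prefix argument are hypothetical, and it discards the original header text, so keep a copy of the input if you need to map the new IDs back.

```shell
# Hypothetical helper: rewrite every header as ">PREFIX_n" and wrap
# sequence lines at 60 bp. Reads FASTA on stdin, writes FASTA on stdout.
normalize_fasta() {
    prefix="$1"
    awk -v prefix="$prefix" -v width=60 '
        # Print the buffered sequence in chunks of at most "width" characters.
        function flush_seq(   i) {
            for (i = 1; i <= length(seq); i += width)
                print substr(seq, i, width)
            seq = ""
        }
        # Header line: emit the previous record, then a sanitized header.
        /^>/ { flush_seq(); n++; print ">" prefix "_" n; next }
        # Sequence line: strip whitespace and append to the buffer.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { flush_seq() }
    '
}

# Example (file names are placeholders):
#   normalize_fasta DM < genome_raw.fa > genome.fa
```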

Prerequisite Files

TEdenovo

  1. Host genome (FASTA format)
  2. REPET-specific Pfam HMM File
  3. rDNA (FASTA format) of host genome
  4. RepBase Amino Acid Database
  5. RepBase Nucleotide Database
  6. cDNA of host genome (FASTA format)

A RepeatScout bank can also be provided, but it requires additional pre-processing before it can be used in the pipeline. See the TEdenovo tutorial web page or the text file included with REPET. These scripts currently do NOT perform these pre-processing steps.

TEannot

  1. Host genome (FASTA format)
  2. TE library (FASTA format)
    • from TEdenovo or another source
  3. RepBase Amino Acid Database
  4. RepBase Nucleotide Database

Getting Started

TEdenovo

  1. Clone the repository and copy the default configuration.
$ git clone https://github.com/stajichlab/REPET-slurm
$ cd REPET-slurm/TEdenovo
$ cp /path/to/REPET/config/TEdenovo.cfg .
  2. Change the settings in TEdenovo.cfg and TEdenovo_AllSteps.sh to match your environment/project.
  3. Copy/link the prerequisite files into the TEdenovo folder.
  4. Run sh TEdenovo_AllSteps.sh or submit it with sbatch TEdenovo_AllSteps.sh.

TEannot

If you already ran TEdenovo, then skip step 1.

  1. Clone the repository and copy the default configuration.
$ git clone https://github.com/stajichlab/REPET-slurm
$ cd REPET-slurm/TEannot
$ cp /path/to/REPET/config/TEannot.cfg .
  2. Change the settings in TEannot.cfg and TEannot_AllSteps.sh to match your environment/project.
  3. Copy/link the prerequisite files into the TEannot folder.
    • TE library has a required naming format: <project_name>_refTEs.fa
  4. Run sh TEannot_AllSteps.sh or submit it with sbatch TEannot_AllSteps.sh.
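Linking the TE library under the required name might look like the sketch below. The function `link_te_library` is hypothetical, and it assumes that `<project_name>` is the project name you set in TEannot.cfg; the function does not read the configuration file, so the two must be kept in sync by hand.

```shell
# Hypothetical helper: expose an existing TE library in the current
# directory as <project_name>_refTEs.fa via a symlink.
link_te_library() {
    project_name="$1"   # assumed to match the project name in TEannot.cfg
    te_library="$2"     # path to a TE library in FASTA format
    if [ ! -s "$te_library" ]; then
        echo "TE library is missing or empty: $te_library" >&2
        return 1
    fi
    # Resolve to an absolute path so the link does not depend on where
    # it is later read from.
    abs_dir=$(cd "$(dirname "$te_library")" && pwd) || return 1
    ln -sf "$abs_dir/$(basename "$te_library")" "${project_name}_refTEs.fa"
}

# Example, run from inside the TEannot folder (paths are placeholders):
#   link_te_library MyProject /path/to/te_library.fa
```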

Contributors

hyphaltip, twrightsman

repet-slurm's Issues

Failed steps don't have output folders removed

As a result, failed steps are skipped on automatic restart, because their output folders already exist. The fix is to identify the final file(s) that each step outputs and that the next step needs, and to check for those files instead of doing a crude folder check.
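One possible shape for such a check is sketched below. The function names and the example file name are hypothetical, and the actual list of final files for each REPET step would still need to be worked out; a file is treated as present only if it exists and is non-empty.

```shell
# Return 0 only if every listed file exists and is non-empty.
step_is_complete() {
    [ "$#" -gt 0 ] || return 1
    for f in "$@"; do
        [ -s "$f" ] || return 1
    done
    return 0
}

# Decide whether a step can be skipped. If its expected outputs are
# missing, remove the leftover folder so the step starts clean.
# Usage: prepare_step STEP_DIR EXPECTED_FILE...
prepare_step() {
    step_dir="$1"
    [ -n "$step_dir" ] || return 2   # never call rm on an empty path
    shift
    if step_is_complete "$@"; then
        return 0                     # outputs present: caller may skip
    fi
    rm -rf -- "$step_dir"            # partial or failed run: clear it
    return 1                         # caller should (re)submit the step
}

# Example (directory and file names are placeholders):
#   if prepare_step "${PROJECT}_Step1" "${PROJECT}_Step1/final_output.fa"; then
#       echo "Step 1 already done, skipping"
#   else
#       sbatch TEdenovo_Step1.sh
#   fi
```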

Array job manually defined in TEdenovo steps 3 and 4

Steps 3 and 4 are executed once for each clusterer installed (Grouper, Recon, and/or Piler) but currently the number of array jobs is manually defined in TEdenovo_Step3.sh and TEdenovo_Step4.sh. It would be easier to move the array definition to the master scheduler script, which can check how many clusterers are defined to be available.

Note: This means that a check must be made for the $SLURM_ARRAY_TASK_ID variable, since it won't be set if the step script is executed independently.
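A rough sketch of both halves of this idea follows. It assumes the available clusterers are given as a space-separated list (the `CLUSTERERS` variable in the usage comments is hypothetical), and it simply fails with a message when `$SLURM_ARRAY_TASK_ID` is unset; a real step script might prefer a different fallback.

```shell
# Master script side: build an sbatch --array range from the clusterer list.
array_spec() {
    # shellcheck disable=SC2086  # word splitting is intended here
    set -- $1
    [ "$#" -gt 0 ] || return 1
    echo "1-$#"
}

# Step script side: map the array task ID to a clusterer name.
select_clusterer() {
    task_id="${SLURM_ARRAY_TASK_ID:-}"
    case "$task_id" in
        '' | *[!0-9]*)
            echo "SLURM_ARRAY_TASK_ID is unset or not a number" >&2
            return 1
            ;;
    esac
    # shellcheck disable=SC2086  # word splitting is intended here
    set -- $1
    if [ "$task_id" -lt 1 ] || [ "$task_id" -gt "$#" ]; then
        echo "Array task ID $task_id is outside 1-$#" >&2
        return 1
    fi
    shift $((task_id - 1))
    echo "$1"
}

# Example:
#   CLUSTERERS="Grouper Recon Piler"
#   sbatch --array="$(array_spec "$CLUSTERERS")" TEdenovo_Step3.sh
# and inside TEdenovo_Step3.sh:
#   clusterer=$(select_clusterer "$CLUSTERERS") || exit 1
```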

Customizable config file name

Instead of hard-coding the configuration file name as TEdenovo.cfg or TEannot.cfg, allow the user to specify a name in the script configuration.
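One way this could work is sketched below, with the current name kept as the default. The `REPET_CONFIG` variable and the `resolve_config` function are hypothetical names introduced for illustration, not existing settings in these scripts.

```shell
# Print the configuration file to use: the value of REPET_CONFIG if set,
# otherwise the default name passed as the first argument.
resolve_config() {
    default_name="$1"
    config_file="${REPET_CONFIG:-$default_name}"
    if [ ! -r "$config_file" ]; then
        echo "Configuration file not found or unreadable: $config_file" >&2
        return 1
    fi
    echo "$config_file"
}

# Example, near the top of TEdenovo_AllSteps.sh:
#   CONFIG=$(resolve_config TEdenovo.cfg) || exit 1
# "$CONFIG" would then replace each hard-coded TEdenovo.cfg in the scripts.
```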
