repet-slurm's Introduction

REPET-Slurm

A collection of scripts to get started with running the REPET pipeline on a cluster with the SLURM resource manager and a module system installed.

Caveats/Warnings

FASTA Format
- Header
  - Recommended format: ">XX_i" (XX = letters, i = numbers)
  - Avoid spaces and symbols like "=;:|"
- 60 bp (or fewer) per line for sequences
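
The recommendations above can be checked before starting a run. The following is a minimal sketch, not part of REPET-slurm: check_fasta is a hypothetical helper name, and the header pattern assumes "XX" means one or more ASCII letters and "i" means one or more digits.

```shell
# check_fasta FILE: print one message per line that breaks the recommendations.
# Hypothetical helper; the ">letters_digits" pattern is an assumed reading of ">XX_i".
check_fasta() {
    awk '
        /^>/ {
            # Header lines: expect ">", letters, "_", digits and nothing else
            if ($0 !~ /^>[A-Za-z]+_[0-9]+$/)
                printf "line %d: header not in >XX_i form: %s\n", NR, $0
            next
        }
        length($0) > 60 {
            # Sequence lines: at most 60 characters
            printf "line %d: sequence line has %d characters (max 60)\n", NR, length($0)
        }
    ' "$1"
}
```

Running check_fasta genome.fa (genome.fa being a placeholder file name) prints nothing when every header and sequence line follows the recommendations.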

Prerequisite Files

TEdenovo

Host genome (FASTA format)
REPET-specific Pfam HMM File
rDNA (FASTA format) of host genome
- RNAmmer
RepBase Amino Acid Database
RepBase Nucleotide Database
cDNA of host genome (FASTA format)

A RepeatScout bank can also be provided, but it needs additional pre-processing before it can be used in the pipeline. See the TEdenovo tuto webpage or the text file included with REPET. These scripts currently do NOT perform these pre-processing steps.

TEannot

Host genome (FASTA format)
TE library (FASTA format)
- from TEdenovo or another source
RepBase Amino Acid Database
RepBase Nucleotide Database

Getting Started

TEdenovo

1. Clone the repository and copy the default configuration.

   $ git clone https://github.com/stajichlab/REPET-slurm
   $ cd REPET-slurm/TEdenovo
   $ cp /path/to/REPET/config/TEdenovo.cfg .

2. Change the settings in TEdenovo.cfg and TEdenovo_AllSteps.sh to match your environment/project.
3. Copy/link the prerequisite files into the TEdenovo folder.
4. Run sh TEdenovo_AllSteps.sh, or submit it with sbatch TEdenovo_AllSteps.sh.

TEannot

If you already ran TEdenovo, then skip step 1.

1. Clone the repository and copy the default configuration.

   $ git clone https://github.com/stajichlab/REPET-slurm
   $ cd REPET-slurm/TEannot
   $ cp /path/to/REPET/config/TEannot.cfg .

2. Change the settings in TEannot.cfg and TEannot_AllSteps.sh to match your environment/project.
3. Copy/link the prerequisite files into the TEannot folder.
   - The TE library has a required naming format: <project_name>_refTEs.fa
4. Run sh TEannot_AllSteps.sh, or submit it with sbatch TEannot_AllSteps.sh.
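
Because the TE library must be named <project_name>_refTEs.fa, it can help to create the link with a small helper instead of typing the name by hand. This is a sketch under assumptions: link_te_library is a hypothetical function that is not shipped with these scripts, and the project name passed to it is assumed to match the one set in TEannot.cfg.

```shell
# link_te_library PROJECT SOURCE [DEST_DIR]
# Symlink SOURCE into DEST_DIR (default: current folder) as PROJECT_refTEs.fa.
# Hypothetical helper, not part of REPET-slurm.
link_te_library() {
    project=$1
    src=$2
    dest_dir=${3:-.}
    if [ ! -f "$src" ]; then
        echo "TE library not found: $src" >&2
        return 1
    fi
    # Resolve to an absolute path so the link stays valid from any folder
    abs_src="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
    ln -sf "$abs_src" "$dest_dir/${project}_refTEs.fa"
}
```

For example, link_te_library myproject /path/to/library.fa creates myproject_refTEs.fa in the current folder; both arguments here are placeholders.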

repet-slurm's Issues

Failed steps don't have output folders removed

Because the output folder of a failed step is left in place, that step is skipped on automatic restart. The solution is to identify the final file(s) that each step outputs and that the next step needs, and to check whether those exist instead of doing a crude folder check.
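
One possible shape for such a file-based check is sketched below. It is an illustration only and not the repository's current behaviour: step_done is a hypothetical helper, and the output file name in the usage comment is a placeholder, since the files each step actually produces are not listed here.

```shell
# step_done FILE...: succeed only if every listed file exists and is non-empty.
# Hypothetical helper; with no arguments it reports "not done".
step_done() {
    [ "$#" -gt 0 ] || return 1
    for f in "$@"; do
        [ -s "$f" ] || return 1
    done
    return 0
}

# Possible use in the scheduler script (file and folder names are placeholders):
# if step_done "step3_output/final_result.fa"; then
#     echo "Step 3 already complete, skipping"
# else
#     rm -rf "step3_output"        # clear partial output from a failed run
#     sbatch TEdenovo_Step3.sh
# fi
```

Testing for non-empty files with -s, and not only for existence, also catches a step that died right after creating its output file.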

Array job manually defined in TEdenovo steps 3 and 4

Steps 3 and 4 are executed once for each clusterer installed (Grouper, Recon, and/or Piler) but currently the number of array jobs is manually defined in TEdenovo_Step3.sh and TEdenovo_Step4.sh. It would be easier to move the array definition to the master scheduler script, which can check how many clusterers are defined to be available.

Note: This means that a check must be made for the $SLURM_ARRAY_TASK_ID variable, since it won't be set if the step script is executed independently.
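
The proposal and the note above could be sketched as follows. Both functions are hypothetical, and the comma-separated clusterer list is an assumed input format chosen for illustration; how the available clusterers are actually read from the configuration is not shown here.

```shell
# array_spec LIST: turn a comma-separated clusterer list such as
# "Grouper,Recon,Piler" into a range usable with sbatch --array.
# Hypothetical helper; fails if the list is empty.
array_spec() {
    n=$(printf '%s\n' "$1" |
        awk -F, '{ for (i = 1; i <= NF; i++) if ($i != "") n++ } END { print n + 0 }')
    [ "$n" -gt 0 ] || return 1
    printf '1-%s\n' "$n"
}

# clusterer_for_task LIST: print the clusterer for the current array task.
# Falls back to task 1 when SLURM_ARRAY_TASK_ID is unset, which is the case
# when a step script is executed on its own.
clusterer_for_task() {
    task_id="${SLURM_ARRAY_TASK_ID:-1}"
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "${task_id}p"
}
```

The master scheduler script could then submit a step with sbatch --array="$(array_spec "Grouper,Recon,Piler")" TEdenovo_Step3.sh, and the step script could call clusterer_for_task with the same list to decide which clusterer to run.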
