riri

Pipelines to process bacterial PE RNAseq and SE RiboSeq

RiboSee aligns single-end ribosome profiling reads (RiboSeq) to a reference (it can also handle UMI-tagged sequencing) and outputs alignments, counts, P-site determination, etc. RNAseeker aligns paired-end sequencing reads to a reference and outputs alignments, counts and statistics.

Secondary scripts can be used for analyses.

Differential RNAseq
P-site statistical analyses
Etc

I have written little to no error handling, so check the logs. Something strange is also happening with the prealignments to the stable RNAs: many reads are getting through, so I masked these in the reference just in case.
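Since the pipelines do little error handling themselves, a quick scan of the logs after each run can help. The sketch below is a hypothetical helper, not part of riri: it assumes the logs are plain-text files ending in `.log` somewhere under the output directory, which may not match what the pipelines actually write.

```shell
# Hypothetical helper: list log files under a directory that mention an
# error or failure. The *.log suffix is an assumption, not a riri convention.
scan_logs() {
  local dir="$1"
  # -r recurse, -i ignore case, -l print matching file names only
  grep -rilE --include='*.log' 'error|fail|exception' "$dir" | sort
}

# Usage: scan_logs "$out_dir"
```

Any file it prints is worth opening by hand; an empty result only means none of the three keywords appeared.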

If the scripts do not execute, make them executable:

chmod +x RNAseeker_pipe.sh riboSee_pipe.sh gtf_primer.py singularity_continer_setup.sh

TODO

- Fix error with qualimap? (Error while calculating counts! Failed to detect annotations file format.)

Download scripts

git clone --recursive https://github.com/SemiQuant/riri.git

Download singularity container

./RNAseeker_pipe.sh --container

RiboSee

Short flag	Long flag	Description
-t	--threads	Number of threads to use
-g	--genome_reference	Full path to reference genome
-gtf	--GTF_reference	Full path to reference annotations
-rd	--read_dir	Full path to location of read file
-r	--reads	Read name
-o	--out_dir	Full path to output directory
-n	--name	Sample name
-s	--strand	Stranded sequencing (yes
-sd	--script_directory	Path to script location
-dl	--container	Download singularity container and exit
-mt	--get_metrics	Compile report of metrics after run; path to folder of results
-fq	--fastQC	Perform FastQC analysis
-mm	--max_missmatch	Maximum mismatches for alignment
-mn	--min_len	Minimum read length
-mx	--max_len	Maximum read length
-tm	--trim_fasta	Path to adapter and linker multi-FASTA; uses Trimmomatic
-ca	--cut\adapt	Adapter sequence to cut (e.g., CTGTAGGCACCATCAAT); overrides trim_fasta and uses Cutadapt
-ms	--mask	Mask stable RNAs in the reference instead of prealigning to them
-u	--umi	UMI sequence if present, e.g., GNNNNNNNNGACTGGAGTTCAGACGTGTGCTCTTCCGA
-p	--prime	plastid 3' or 5' end (default = 3)
-os	--offset	plastid offset (default = 14)
-d	--downstream	plastid downstream (default = 100)
-l	--landmark	plastid landmark (default = cds_start)
-c	--codon_buffer	plastid codon_buffer (default = 5)
-no	--normalize_over	plastid normalize_over (default = '30 200')
-m	--min_counts	plastid min_counts (default = 20)
-pi	--plastid_input_extras	A TSV file where each column is a list of genes of interest, with the first entry being the name of the list
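The layout described for `--plastid_input_extras` (one gene list per column, with the list name in the first row) can be sketched as follows. The list names, locus tags and function name are illustrative placeholders, and how the pipeline treats lists of unequal length is not covered here.

```shell
# Write a small two-column TSV: row 1 holds the list names, the rows below
# hold the genes. All names and gene IDs are illustrative placeholders.
make_gene_lists() {
  local out="$1"
  {
    printf '%s\t%s\n' "list_A" "list_B"   # first row: one name per list
    printf '%s\t%s\n' "Rv0001" "Rv0005"
    printf '%s\t%s\n' "Rv0002" "Rv0006"
  } > "$out"
}

# Usage: make_gene_lists gene_lists.tsv
```

The resulting file would then be passed as `--plastid_input_extras gene_lists.tsv`.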

Example run

Also see "wynton_slurm_wrapper.sge".

out_dir="/wynton/home/ribSeq"
container="/wynton/home/riri_v0.1.sif"
script_dir="/wynton/home/riri"
read_dir="/wynton/home/fastq"
nm="file_name"

mkdir -p "$out_dir"
cd "$out_dir"

singularity exec "$container" \
  "${script_dir}/riboSee_pipe.sh" \
  --threads 8 \
  --genome_reference "${script_dir}/references/NC_000962_rRNAsMasked.fasta" \
  --GTF_reference "${script_dir}/references/NC_000962.gff" \
  --reads "${read_dir}/${nm}_L2_1.fq.gz" \
  --out_dir "$out_dir" \
  --name "$nm" \
  --strand "reverse" \
  --script_directory "${script_dir}" \
  --fastQC \
  --max_missmatch 2 \
  --min_len 24 \
  --max_len 36 \
  --trim_fasta "${script_dir}/references/adapts.fasta"

RNAseeker

Short flag	Long flag	Description
-r	--ref	Full path to reference genome
-t	--threads	Number of threads to use
-g	--gtf	Full path to reference annotations
-r1	--read1	Full path to location of read1 file
-r2	--read2	Full path to location of read2 file
-n	--name	Sample name
-o	--out_dir	Full path to output directory
-m	--ram	RAM
-s	--strand	Stranded sequencing (yes
-rR	--remove_rRNA	Remove rRNA from annotation file
-sd	--script_directory	Path to script location
-dl	--container	Download singularity container and exit
-mt	--get_metrics	Compile report of metrics after run; path to folder of results
-tr	--trim_metrics	Trim reads
-a	--adapters	Path to adapter and linker multi-FASTA
-fq	--fastQC	Perform FastQC analysis

Example run

Also see "wynton_slurm_wrapper.sge".

out_dir="/wynton/home/RNAseq"
container="/wynton/home/riri_v0.1.sif"
script_dir="/wynton/home/riri"
read_dir="/wynton/home/fastq"
nm="file_name"

mkdir -p "$out_dir"
cd "$out_dir"

singularity exec "$container" \
  "${script_dir}/RNAseeker_pipe.sh" \
    --ref "${script_dir}/references/NC_000962.fasta" \
    --threads 8 \
    --gtf "${script_dir}/references/NC_000962.gff" \
    --read1 "${read_dir}/${nm}_L2_1.fq.gz" \
    --read2 "${read_dir}/${nm}_L2_2.fq.gz" \
    --name "$nm" \
    --out_dir "$out_dir" \
    --adapters "${script_dir}/references/adapts.fasta" \
    --strand "reverse" \
    --trim \
    --remove_rRNA \
    --fastQC \
    --keep_unpaired \
    --script_directory "${script_dir}"
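Because the pipelines have little error handling, it may be worth confirming that both mate files exist before starting a paired-end run. The function below is a hypothetical pre-flight check, not part of riri; its name and messages are made up for illustration.

```shell
# Hypothetical pre-flight check: succeed only if both read files exist
# and are non-empty; otherwise report the offending file on stderr.
check_pair() {
  local r1="$1" r2="$2" f
  for f in "$r1" "$r2"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty read file: $f" >&2
      return 1
    fi
  done
}

# Usage:
# check_pair "${read_dir}/${nm}_L2_1.fq.gz" "${read_dir}/${nm}_L2_2.fq.gz" || exit 1
```

Placed before the `singularity exec` call, this stops the job early instead of failing partway through alignment.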

Random

To list the files in a folder so they can be pasted into the array, you can use:

for i in *_L2_1.fq.gz; do printf '"%s" ' "$i"; done
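If sample names rather than file names are wanted (for example for `--name`), the read-1 suffix can be stripped with shell parameter expansion. This is a minimal sketch: `sample_name` is a hypothetical helper, and the default suffix simply follows the example file names used above.

```shell
# Strip the directory and a read-1 suffix from a file name to get the
# sample name. The default suffix follows the example file names above.
sample_name() {
  local file="$1"
  local suffix="${2:-_L2_1.fq.gz}"
  local base="${file##*/}"            # drop any leading directory
  printf '%s\n' "${base%"$suffix"}"   # drop the trailing suffix
}

# Print one sample name per read-1 file in the current directory.
for f in *_L2_1.fq.gz; do
  [ -e "$f" ] || continue             # skip the literal pattern if nothing matches
  sample_name "$f"
done
```

A different naming scheme can be handled by passing the suffix as the second argument, e.g. `sample_name sampleA_R1.fastq.gz _R1.fastq.gz`.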
