RiboSee aligns single-end ribosome profiling reads (RiboSeq) to a reference (it can also handle UMI-tagged sequencing) and outputs alignments, counts, P-site determination, etc. RNAseeker aligns paired-end sequencing reads to a reference and outputs alignments, counts, and statistics.
Secondary scripts can be used for downstream analyses:
- Differential RNAseq
- P-site statistical analyses
- Etc.
I have written little to no error handling, so check the logs. There is also something strange happening with the prealignments to the stable RNAs: many reads are getting through, so I masked these in the reference just in case.
If the scripts don't execute, make them executable:
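Because the pipelines do little error checking themselves, a quick scan of the logs after a run can help. The following is a minimal sketch, assuming log files end in `.log` and that problems show up as lines containing "error", "exception", or "fail". The helper name `scan_logs` and these search terms are hypothetical and may need adjusting to the actual output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines that look like errors from any *.log file
# under a results directory. The "*.log" naming and the search terms are
# assumptions, not something the pipelines guarantee.
scan_logs() {
    local dir="${1:-.}"
    # -r recurse, -i ignore case, -n show line numbers, -E extended regex
    grep -rinE --include='*.log' 'error|exception|fail' "$dir" \
        || echo "No matches in ${dir}"
}

# Usage: scan_logs /path/to/out_dir
```

A match is not necessarily a failure (some tools print "0 errors"), so treat the output as a list of places to look rather than a verdict.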
chmod +x RNAseeker_pipe.sh riboSee_pipe.sh gtf_primer.py singularity_continer_setup.sh
- Fix error with qualimap? (Error while calculating counts! Failed to detect annotations file format.)
git clone --recursive https://github.com/SemiQuant/riri.git
./RNAseeker_pipe.sh --container
Short flag | Long flag | Description |
---|---|---|
-t | --threads | Number of threads to use |
-g | --genome_reference | Full path to reference genome |
-gtf | --GTF_reference | Full path to reference annotations |
-rd | --read_dir | Full path to location of read file |
-r | --reads | Read name |
-o | --out_dir | Full path to output directory |
-n | --name | Sample name |
-s | --strand | Stranded sequencing (yes |
-sd | --script_directory | Path to script location |
-dl | --container | Download singularity container and exit |
-mt | --get_metrics | Compile report of metrics after run, path to folder of results |
-fq | --fastQC | Perform FastQC analysis |
-mm | --max_missmatch | Maximum mismatches for alignment |
-mn | --min_len | Minimum read length |
-mx | --max_len | Maximum read length |
-tm | --trim_fasta | Path to multi-FASTA of adapters and linkers; uses Trimmomatic |
-ca | --cut\adapt | Adapter sequence to cut (e.g., CTGTAGGCACCATCAAT); overrides trim_fasta and uses Cutadapt |
-ms | --mask | Mask stable RNAs in reference instead of prealigning to them? |
-u | --umi | UMI sequence if present, e.g., GNNNNNNNNGACTGGAGTTCAGACGTGTGCTCTTCCGA |
-p | --prime | plastid three or five prime (default = 3) |
-os | --offset | plastid offset (default = 14) |
-d | --downstream | plastid downstream (default = 100) |
-l | --landmark | plastid landmark (default = cds_start) |
-c | --codon_buffer | plastid codon_buffer (default = 5) |
-no | --normalize_over | plastid normalize_over (default = '30 200') |
-m | --min_counts | plastid min_counts (default = 20) |
-pi | --plastid_input_extras | A TSV file where each column is a list of genes of interest, with the first entry the name of the list |
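
One way to put together the file for `--plastid_input_extras` is to keep each gene list in its own text file (list name on the first line, one gene ID per line after that) and join them column-wise with `paste`. This is a sketch only: the helper name `build_gene_lists_tsv` and the file names are made up, and it is an assumption that the pipeline tolerates empty cells when the lists differ in length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine per-list text files into one TSV,
# one column per list, with the list name as the first entry of each column.
build_gene_lists_tsv() {
    local out="$1"
    shift
    # paste joins the input files line by line, separated by tabs;
    # shorter files simply contribute empty cells.
    paste "$@" > "$out"
}

# Example (hypothetical file names):
#   build_gene_lists_tsv gene_lists.tsv list_a.txt list_b.txt
```
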
Example RiboSee run (also see "wynton_slurm_wrapper.sge"):
out_dir="/wynton/home/ribSeq"
container="/wynton/home/riri_v0.1.sif"
script_dir="/wynton/home/riri"
read_dir="/wynton/home/fastq"
nm="file_name"
mkdir -p "$out_dir"
cd "$out_dir"
singularity exec "$container" \
"${script_dir}/riboSee_pipe.sh" \
--threads 8 \
--genome_reference "${script_dir}/references/NC_000962_rRNAsMasked.fasta" \
--GTF_reference "${script_dir}/references/NC_000962.gff" \
--reads "${read_dir}/${nm}_L2_1.fq.gz" \
--out_dir "$out_dir" \
--name "$nm" \
--strand "reverse" \
--script_directory "${script_dir}" \
--fastQC \
--max_missmatch 2 \
--min_len 24 \
--max_len 36 \
--trim_fasta "${script_dir}/references/adapts.fasta"
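
To sanity-check the `--min_len` and `--max_len` choices, it can help to look at the read-length distribution of the trimmed reads. A minimal sketch, assuming a gzipped FASTQ with four lines per record; the function name `read_length_hist` is hypothetical and not part of the pipeline:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "length<TAB>count" for every read length
# found in a gzipped FASTQ file, sorted by length.
read_length_hist() {
    # In a 4-line FASTQ record, the sequence is line 2 of each record.
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (l in n) print l "\t" n[l] }' |
        sort -n
}

# Usage: read_length_hist /path/to/trimmed_reads.fq.gz
```

Untrimmed reads will mostly share one length (the sequencing cycle count), so the histogram is only informative after adapter removal.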
Short flag | Long flag | Description |
---|---|---|
-r | --ref | Full path to reference genome |
-t | --threads | Number of threads to use |
-g | --gtf | Full path to reference annotations |
-r1 | --read1 | Full path to location of read1 file |
-r2 | --read2 | Full path to location of read2 file |
-n | --name | Sample name |
-o | --out_dir | Full path to output directory |
-m | --ram | RAM |
-s | --strand | Stranded sequencing (yes |
-rR | --remove_rRNA | Remove rRNA from annotation file |
-sd | --script_directory | Path to script location |
-dl | --container | Download singularity container and exit |
-mt | --get_metrics | Compile report of metrics after run, path to folder of results |
-tr | --trim_metrics | Trim reads? |
-a | --adapters | Path to multi-FASTA of adapters and linkers |
-fq | --fastQC | Perform FastQC analysis |
Example RNAseeker run (also see "wynton_slurm_wrapper.sge"):
out_dir="/wynton/home/RNAseq"
container="/wynton/home/riri_v0.1.sif"
script_dir="/wynton/home/riri"
read_dir="/wynton/home/fastq"
nm="file_name"
mkdir -p "$out_dir"
cd "$out_dir"
singularity exec "$container" \
"${script_dir}/RNAseeker_pipe.sh" \
--ref "${script_dir}/references/NC_000962.fasta" \
--threads 8 \
--gtf "${script_dir}/references/NC_000962.gff" \
--read1 "${read_dir}/${nm}_L2_1.fq.gz" \
--read2 "${read_dir}/${nm}_L2_2.fq.gz" \
--name "${nm}" \
--out_dir "$out_dir" \
--adapters "${script_dir}/references/adapts.fasta" \
--strand "reverse" \
--trim \
--remove_rRNA \
--fastQC \
--keep_unpaired \
--script_directory "${script_dir}"
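
Since the pipelines have little error handling, a small pre-flight check on the input files before launching can save a failed run. This is a sketch under that assumption; `check_inputs` is a hypothetical helper and not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return non-zero if any given file is missing or empty.
check_inputs() {
    local missing=0 f
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "Missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the variables from the example above:
#   check_inputs "${read_dir}/${nm}_L2_1.fq.gz" "${read_dir}/${nm}_L2_2.fq.gz" || exit 1
```
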
To list the files in a folder in a form you can paste into the array, you can use:
for i in *_L2_1.fq.gz; do printf '"%s" ' "$i"; done
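
The quoted names can then go into a bash array and be looped over. A rough sketch, assuming every read file ends in `_L2_1.fq.gz`; the file names and the helper names `sample_name` and `print_commands` are placeholders, and the printed command is shortened to the two per-sample flags (the remaining flags would be as in the RiboSee example):

```shell
#!/usr/bin/env bash
# Placeholder file names; paste the real list here.
reads=("sampleA_L2_1.fq.gz" "sampleB_L2_1.fq.gz")

# Strip the directory and the assumed "_L2_1.fq.gz" suffix to get a sample name.
sample_name() {
    local f
    f=$(basename "$1")
    echo "${f%_L2_1.fq.gz}"
}

# Dry run: print one (abbreviated) command per read file instead of running it.
print_commands() {
    local r
    for r in "$@"; do
        echo "riboSee_pipe.sh --reads ${r} --name $(sample_name "$r")"
    done
}

print_commands "${reads[@]}"
```

Printing the commands first makes it easy to check the derived sample names before swapping the `echo` for the real `singularity exec` call.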