miRDeep2 README

About

Authors: Sebastian Mackowiak & Marc Friedländer

This is miRDeep2 developed by Sebastian Mackowiak & Marc Friedländer. miRDeep2 discovers active known or novel miRNAs from deep sequencing data (Solexa/Illumina, 454, ...).

(minor edits to README, TUTORIAL, CHANGELOG, and FAQ, conversion to Markdown, trailing whitespace removal & CI setup by Marcel Schilling)

Requirements

Linux system, 2 GB RAM, and enough disk space for your deep sequencing data

Testing version

MacOSX with Xcode and the gcc compiler installed. (Xcode can be obtained from the App Store; if there are any issues with installing it, please look for help online.)

To compile the Vienna package it may be necessary to have GNU grep installed since the MacOSX grep is BSD based and sometimes not accepted by the installer. To get a GNU grep you could for example install homebrew by typing

ruby -e "$(curl -fsSL \
  https://raw.githubusercontent.com/Homebrew/install/master/install)"

(the link could be out of date, in that case look up online what to do)

After that typing

brew tap homebrew/dupes; brew install grep

will install GNU grep as ggrep in /usr/local/bin/

Installation

Option 1: with the provided install.pl script

Type

perl install.pl

Option 2: without the install.pl script

Follow the instructions given below

Dependencies

First download all necessary packages listed here

  1. bowtie short read aligner
  2. Vienna package with RNAfold
  3. SQUID library (go to Squid and download it)
  4. randfold
  5. Perl package PDF::API2

Manual installation

Once the packages are downloaded:

  1. attach the miRDeep2 executable path to your PATH

echo 'export PATH=$PATH:your_path_to_mirdeep2/src' >> ~/.bashrc

  2. unzip bowtie-0.11.3-bin-linux-x86_64.zip

  3. put the bowtie directory into your PATH variable, e.g.

echo 'export PATH=$PATH:your_path_to_bowtie' >> ~/.bashrc

  4. tar xvvzf ViennaRNA-1.8.4.tar.gz

  5. cd to the Vienna dir

  6. type

./configure --prefix=your_path_to_Vienna/install_dir
make
make install

  7. add Vienna binaries to your PATH variable, e.g.

echo 'export PATH=$PATH:your_path_to_Vienna/install_dir/bin' >> ~/.bashrc

  8. tar xvvzf squid-1.9g.tar.gz

  9. tar xvvzf randfold-2.0.tar.gz

  10. cd randfold2.0

  11. edit Makefile, e.g. emacs Makefile:

change line with INCLUDE=-I. to INCLUDE=-I. -I<your_path_to_squid-1.9g> -L<your_path_to_squid-1.9g>, e.g. INCLUDE=-I. -I/home/Pattern/squid-1.9g/ -L/home/Pattern/squid-1.9g/

  12. make

  13. add randfold to your PATH variable, e.g.

echo 'export PATH=$PATH:your_path_to_randfold' >> ~/.bashrc

  14. tar xvvzf PDF-API2-0.73.tar.gz

  15. cd to your PDF_API2 directory

  16. then type in

perl Makefile.PL INSTALL_BASE=your_path_to_miRDeep2 LIB=your_path_to_miRDeep2/lib
make
make test
make install

  17. add your library to the PERL5LIB, e.g.

echo \
  'export PERL5LIB=$PERL5LIB:your_path_to_miRDeep2/lib/perl5' \
  >> ~/.bashrc

  18. cd to your mirdeep2 directory (the one containing install.pl)

  19. touch install_successful

  20. start a new shell session to apply the changes to environment variables

Test installation

To test if everything is installed properly type in

  1. bowtie
  2. RNAfold -h
  3. randfold
  4. make_html.pl

You should not get any error messages. Otherwise something is not correctly installed.
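
As a quick pre-check before running the four commands above, the following minimal Python sketch reports which of them cannot be found on the current PATH. It is not part of miRDeep2; the function name `missing_tools` is made up for this illustration. It only checks that an executable with that name is found, not that the tool actually runs without errors.

```python
import shutil

# Commands listed in the "Test installation" steps above.
REQUIRED_TOOLS = ["bowtie", "RNAfold", "randfold", "make_html.pl"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported as missing, the corresponding `export PATH=...` line was probably not added to `~/.bashrc`, or no new shell session was started afterwards.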

Install Paths

Everything that is downloaded by the installer will be in a directory called <your_path_to_mirdeep2>/essentials

Script Reference

miRDeep2 analyses can be performed using the three scripts miRDeep2.pl, mapper.pl and quantifier.pl.

miRDeep2.pl

Description

Wrapper script for the miRDeep2 program package. It runs all the scripts of the miRDeep2 package that are necessary to perform a microRNA detection analysis on deep sequencing data.

Input

  • A FASTA file with deep sequencing reads,
  • a FASTA file of the corresponding genome,
  • a file of mapped reads to the genome in miRDeep2 ARF format,
  • an optional FASTA file with known miRNAs of the analysed species, and
  • an optional FASTA file of known miRNAs of related species.

Output

  • A spreadsheet and
  • an HTML file

with an overview of all detected miRNAs in the deep sequencing input data.

Options

option description
‑a <int> minimum read stack height that triggers analysis. Using this option disables automatic estimation of the optimal value.
‑b <int> minimum score cut-off for predicted novel miRNAs to be displayed in the overview table. This score cut-off is by default 0.
‑c disable randfold analysis
‑t <species> species being analyzed - this is used to link to the appropriate UCSC browser
‑u output list of UCSC browser species that are supported and exit
‑v remove directory with temporary files
‑q <file> miRBase.mrd file from quantifier module to show miRBase miRNAs in data that were not scored by miRDeep2

Examples:

The miRDeep2 module identifies known and novel miRNAs in deep sequencing data. The output of the mapper module can be directly plugged into the miRDeep2 module.

Example use 1

The user wishes to identify miRNAs in mouse deep sequencing data, using default options. The miRBase_mmu_v14.fa file contains all miRBase mature mouse miRNAs, while the miRBase_rno_v14.fa file contains all the miRBase mature rat miRNAs. The 2> redirects all progress output to the report.log file.

miRDeep2.pl reads_collapsed.fa genome.fa reads_collapsed_vs_genome.arf \
  miRBase_mmu_v14.fa miRBase_rno_v14.fa precursors_ref_this_species.fa \
  -t Mouse 2>report.log

This command will generate

  • a directory with PDFs showing the structures, read signatures and score breakdowns of novel and known miRNAs in the data,
  • an HTML webpage that links to all results generated (result.html),
  • a copy of the novel and known miRNAs contained in the webpage but in text format which allows easy parsing (result.csv),
  • a copy of the performance survey contained in the webpage but in text format (survey.csv), and
  • a copy of the miRNA read signatures contained in the PDFs but in text format (output.mrd).

Example use 2

The user wishes to identify miRNAs in deep sequencing data from an animal with no related species in miRBase:

miRDeep2.pl reads_collapsed.fa genome.fa reads_collapsed_vs_genome.arf \
  none none none 2>report.log

This command will generate the same type of files as example use 1 above. Note that in practice it will always improve miRDeep2 performance if miRNAs from some related species are input, even if that species is not closely related.


mapper.pl

Description

Processes reads and/or maps them to the reference genome.

Input

Default input is

  • a file in FASTA, seq.txt or qseq.txt format.

More input can be given depending on the options used.

Output

The output depends on the options used (see below).

Either

  • a FASTA file with processed reads, or
  • an ARF file with mapped reads, or
  • both

are output.

Options

Read input file
option description
‑a input file is seq.txt format
‑b input file is qseq.txt format
‑c input file is FASTA format
Preprocessing/mapping
option description
‑h parse to FASTA format
‑i convert RNA to DNA alphabet (to map against genome)
‑j remove all entries that have a sequence that contains letters other than a, c, g, t, u, n, A, C, G, T, U, or N.
‑k <seq> clip 3' adapter sequence
‑l <int> discard reads shorter than <int> nts
‑m collapse reads
‑p <genome> map to genome (must be indexed by bowtie-build). The genome string must be the prefix of the bowtie index. For instance, if the first indexed file is called h_sapiens_37_asm.1.ebwt then the prefix is h_sapiens_37_asm.
‑q map with one mismatch in the seed (mapping takes longer)
Output files
option description
‑s <file> print processed reads to this file
‑t <file> print read mappings to this file
Other
option description
‑u do not remove directory with temporary files
‑v outputs progress report

Examples

The mapper module is designed as a tool to process deep sequencing reads and/or map them to the reference genome. The module works in sequence space, and can process or map data that is in sequence FASTA format. A number of the functions of the mapper module are implemented specifically with Solexa/Illumina data in mind. For an example of how to post-process mappings in color space, see example use 5.

Example use 1

The user wishes to parse a file in qseq.txt format to FASTA format, convert from RNA to DNA alphabet, remove entries with non-canonical letters (letters other than a, c, g, t, u, n, A, C, G, T, U, or N), clip adapters, discard reads shorter than 18 nts and collapse the reads:

mapper.pl reads_qseq.txt -b -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \
  -s reads_collapsed.fa

Example use 2

The user wishes to map a FASTA file against the reference genome. The genome has already been indexed by bowtie-build. The first of the indexed files is named genome.1.ebwt:

mapper.pl reads_collapsed.fa -c -p genome -t reads_collapsed_vs_genome.arf

Example use 3

The user wishes to process the reads as in example use 1 and map the reads as in example use 2 in a single step, while observing the progress:

mapper.pl reads_qseq.txt -b -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \
  -p genome -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v

Example use 4

The user wishes to parse a GEO file to FASTA format and process it as in example use 1. The GEO file is in tabular format, with the first column showing the sequence and the second column showing the read counts:

geo2fasta.pl GSM.txt > reads.fa

mapper.pl reads.fa -c -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \
  -s reads_collapsed.fa

Example use 5

The user has already removed 3' adapters in color space and has mapped the reads against the genome using the BWA tool. The BWA output file is named reads_vs_genome.sam. Notice that the BWA output contains extra fields that are not required for SAM format. Our converter requires these fields and thus may not work with all types of SAM files. The user wishes to generate reads_collapsed.fa and reads_vs_genome.arf to input to miRDeep2:

bwa_sam_converter.pl reads_vs_genome.sam reads.fa reads_vs_genome.arf

mapper.pl reads.fa -c -i -j -l 18 -m -s reads_collapsed.fa

quantifier.pl

Description

The module maps the deep sequencing reads to predefined miRNA precursors and thereby determines the expression of the corresponding miRNAs. First, the predefined mature miRNA sequences are mapped to the predefined precursors. Optionally, predefined star sequences can be mapped to the precursors too. This determines the positions of the mature and star sequences in the precursors. Second, the deep sequencing reads are mapped to the precursors. The number of reads falling into an interval from 2 nt upstream to 5 nt downstream of the mature/star sequence is determined.

Input

  • A FASTA file with precursor sequences,
  • a FASTA file with mature miRNA sequences,
  • a FASTA file with deep sequencing reads, and
  • optionally a FASTA file with star sequences and the 3 letter code of the species of interest.

Output

  • A 2 column table file called miRNA_expressed.csv with miRNA identifiers and their read counts,
  • a file called miRNA_not_expressed.csv with all miRNAs having 0 read counts,
  • a signature file called miRBase.mrd,
  • a file called expression.html that gives an overview of all miRNAs in the input data, and
  • a directory called pdfs that contains for each miRNA a PDF file showing its signature and structure.

Options

option description
-p [file.fa] miRNA precursor sequences (around 70 bp; one line per precursor sequence)
-m [file.fa] mature miRNA sequences (around 22 nt)
-r [reads.fa] deep sequencing reads
-P specify this option if your mature miRNA file contains 5p and 3p ids only
-c [file] config.txt file with different sample ids... or just the one sample id (deprecated)
-s [star.fa] optional star sequences from miRBase
-t [species] e.g. Mouse or mmu; if not searching in a specific species, all species in your files will be analyzed, else only the species in your dataset is considered
-y [time] optional; otherwise a new one is generated
-d if given, PDFs will not be generated; otherwise PDFs will be generated
-o if given, reads are not sorted by sample in the PDF file; default is sorting
-k also considers precursor-mature mappings that have different ids, e.g. let7c would be allowed to map to pre-let7a
-n do not do file conversion again
-x do not do mapping against precursor again
-g [int] number of allowed mismatches when mapping reads to precursors, default 1
-e [int] number of nucleotides upstream of the mature sequence to consider, default 2
-f [int] number of nucleotides downstream of the mature sequence to consider, default 5
-j do not create an output.mrd file and pdfs if specified
-W read counts are weighted by their number of mappings, e.g. if a read maps twice, each position gets 0.5 added to its read profile
-U use only unique read mappings. Caveat: some miRNAs have multiple precursors; these will be underestimated in their expression since the multimappers are excluded
-u list all values allowed for the species parameter that have an entry at UCSC

Example usage

quantifier.pl -p precursors.fa -m mature.fa -r reads.fa
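
The counting window from the description (2 nt upstream and 5 nt downstream of the mature/star sequence, adjustable with -e and -f) can be illustrated with a minimal Python sketch. This is not the quantifier.pl code. It assumes 1-based inclusive coordinates on the precursor and assumes that a read is counted only if it lies entirely inside the extended window; the function name and the tuple layout are made up for this illustration.

```python
def count_reads_in_window(read_spans, mature_start, mature_end,
                          upstream=2, downstream=5):
    """Sum the counts of reads lying fully inside the extended window.

    read_spans: iterable of (start, end, count) tuples on the precursor,
    with 1-based inclusive coordinates (an assumption of this sketch).
    """
    window_start = mature_start - upstream
    window_end = mature_end + downstream
    return sum(
        count
        for start, end, count in read_spans
        if start >= window_start and end <= window_end
    )
```

For a mature sequence at positions 10 to 31, the default window is 8 to 36, so a read spanning 8 to 30 is counted while a read spanning 7 to 30 is not.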

make_html.pl

Description

It creates a file called result.html that gives an overview of the miRNAs detected by miRDeep2 (known and novel ones). The HTML file lists each detected miRNA and provides, among other things, information on its miRDeep2 score, the reads mapped to its mature, loop and star sequences, and the mature, star and consensus precursor sequences themselves. It also provides links to BLAST, BLAT, miRBase (for miRBase miRNAs) and to a PDF file that shows the signature and structure.

Input

  • A miRDeep2 output.mrd file and
  • a miRDeep2 survey.csv file

Output

  • A result.html file with an entry for each provisional miRNA that contains information about its assigned Id, miRDeep2 score, estimated probability that the miRNA candidate is a true positive, rfam alert, total read count, mature read count, loop read count, star read count, significant randfold p-value, miRBase miRNA, example miRBase miRNA with the same seed, BLAT, BLAST, consensus mature sequence, consensus star sequence and consensus precursor sequence. Furthermore, the miRBase miRNAs existent in the input data but not scored by miRDeep2 are listed.
  • A directory called pdfs that contains for each provisional miRNA ID a PDF with its signature and structure.
  • A file called result.csv (when option -c is used) that contains the same entries as the HTML file.

Options

option description
‑v <int> only output hairpins with score above <int>
‑c also create overview in excel format
‑k <file> supply file with known miRNAs
‑s <file> supply survey file if a score cut-off is used, to get information about the confidence of the resulting reads
‑f <file> miRDeep2 output MRD file
‑e report complete survey file
‑g report survey for current score cutoff
‑w <project_folder> automatically used when running webinterface, otherwise don't use it
‑r <file> Rfam file to check for already reported small RNA sequences
‑q <file> miRBase.mrd file produced by quantifier module
‑x <file> signature.arf file with mapped reads to precursors
‑t <org> specify the organism from which your sequencing data was obtained
‑u print all available UCSC input organisms
‑d do not generate PDFs
‑y timestamp
‑z switch is automatically used when script is called by quantifier.pl
‑o print reads in PDF signature sorted by their 3 letter code in front of their identifier

Example usage

make_html.pl -f miRDeep_outfile -s survey.csv -c -e -y 123456789

clip_adapters.pl

Description

Removes 3' end adapters from deep sequenced small RNAs. The script searches for occurrences of the first six nucleotides of the adapter in the read sequence, starting after position 18 in the read sequence (so the shortest clipped read will be 18 nts). If no match to the first six nts of the adapter is found in a read, the 3' end of the read is searched for shorter matches to the first 5 to 1 nts of the adapter.

Input

  • A FASTA file with the deep sequencing reads and the adapter sequence (both in RNA or DNA alphabet).

Output

  • A FASTA file with the clipped reads.

FASTA IDs are retained. If no matches to the adapter prefixes are identified in a given read, the unclipped read is output.

Example usage

clip_adapters.pl reads.fa TCGTATGCCGTCTTCTGCTTGT > reads_clipped.fa

Notes

It is possible to clip adapters using more sophisticated methods. Users are encouraged to test other methods with the miRDeep2 modules.
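
The clipping rule from the description can be sketched in a few lines of Python. This is an illustration only, not the clip_adapters.pl code. It assumes that the first occurrence of the six-nucleotide adapter prefix after position 18 is used, and it does not enforce a minimum length when the shorter 3' end matches are clipped; the function and parameter names are made up.

```python
def clip_adapter(read, adapter, min_length=18, seed_length=6):
    """Clip a 3' adapter from a read, following the rule described above."""
    seed = adapter[:seed_length]
    # Search for the adapter prefix, skipping the first min_length nts.
    hit = read.find(seed, min_length)
    if hit != -1:
        return read[:hit]
    # Fall back to shorter adapter prefixes (5 down to 1 nt) at the 3' end.
    for k in range(seed_length - 1, 0, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No match: the unclipped read is returned.
    return read
```

With the adapter TCGTATGCCGTCTTCTGCTTGT, a read consisting of 20 nts followed by TCGTATGCC is clipped to its first 20 nts, and a read that ends in only TCG loses just those three nucleotides.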


collapse_reads.pl

Description

Collapses reads in the FASTA file to ensure that each sequence only occurs once. To indicate how many reads the sequence represents, a suffix is added to each FASTA identifier. E.g. a sequence that represents ten reads in the data will have the _x10 suffix added to the identifier.

Input

  • A FASTA file, either in standard format or in the collapsed suffix format.

Output

  • A FASTA file in the collapsed suffix format.

Options

option description
‑a outputs progress

Example usage

collapse_reads.pl reads.fa > reads_collapsed.fa

Notes

Since the script reads all FASTA entries into a hash using the sequence as key, it can potentially use more than 3 GB memory when collapsing very big datasets, >50 million reads. In this case, the user can partition the reads (for instance based on the 5' nucleotide), collapse separately and concatenate.
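
The collapsing step itself can be sketched in Python with a dictionary keyed by sequence, which also shows why memory use grows with the number of distinct sequences. This is a minimal illustration, not the collapse_reads.pl code. The `seq` prefix, the running number starting at 0 and the ordering by descending count are assumptions of this sketch.

```python
import re
from collections import Counter

# Matches an existing collapsed suffix such as "_x10" at the end of an ID.
_SUFFIX = re.compile(r"_x(\d+)$")


def collapse_reads(records, prefix="seq"):
    """Collapse (identifier, sequence) pairs so each sequence occurs once.

    Input identifiers may be plain or already carry an _xN suffix.
    """
    counts = Counter()
    for identifier, sequence in records:
        match = _SUFFIX.search(identifier)
        counts[sequence] += int(match.group(1)) if match else 1
    # Most abundant sequences first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(f"{prefix}_{i}_x{n}", seq) for i, (seq, n) in enumerate(ordered)]
```

Partitioning the reads as suggested above (for instance by 5' nucleotide) works because reads with different first nucleotides can never collapse into the same entry.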


excise_precursors_iterative.pl

Description

This script is a wrapper for excise_precursors.pl, which it calls one or more times, incrementing the height of the read stack required for initiating excision until the number of excised precursors falls below a given threshold.

Input

  • The reference genome in FASTA format,
  • the mapped reads in .arf format,
  • a filename that the excised precursors will be written to, and
  • the maximal number of precursors that should be reported.

Output

The excised precursors in FASTA format.

Options

option description
‑a Output progress to screen.

Example usage

excise_precursors_iterative.pl genome.fa reads_vs_genome.arf \
  potential_precursors.fa 50000 -a
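
The wrapper logic described above can be sketched as a simple loop in Python. This is an illustration under stated assumptions, not the actual script: `count_precursors` is a hypothetical callable standing in for one run of excise_precursors.pl at a given stack height, and the starting height of 1, the step of 1 and the `max_height` guard are choices made for this sketch.

```python
def find_stack_height(count_precursors, max_precursors,
                      start_height=1, max_height=1000):
    """Raise the required read stack height until few enough precursors remain.

    count_precursors: hypothetical callable that returns the number of
    precursors excised at a given stack height.
    Returns the (height, number_of_precursors) pair that was accepted.
    """
    for height in range(start_height, max_height + 1):
        excised = count_precursors(height)
        if excised < max_precursors:
            return height, excised
    raise ValueError("number of precursors never fell below the threshold")
```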

excise_precursors.pl

Description

Excises precursors from the genome using the mapped reads as guidelines.

Input

  • The reference genome in FASTA format and
  • the mapped reads in .arf format.

Output

  • The excised precursors in FASTA format.

Options

option description
‑a <integer> Only excise if the highest local read stack is <integer> reads high (default 2).
‑b Output progress to screen.

Example usage

excise_precursors.pl genome.fa reads_vs_genome.arf -b

fastaparse.pl

Description

Performs simple filtering of entries in a FASTA file.

Input

  • A FASTA file.

Output

  • A filtered FASTA file.

Options

option description
‑a <int> only output entries where the sequence is at least <int> nts long
‑b remove all entries that have a sequence that contains letters other than a, c, g, t, u, n, A, C, G, T, U, or N.
‑s output progress

Example usage

fastaparse.pl reads.fa -a 18 -s > reads_no_short.fa
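
The two filters (-a and -b) are easy to express in Python. The sketch below is only an illustration of the filtering rules listed in the options table, not the fastaparse.pl code; the function and parameter names are made up.

```python
import re

# Letters accepted by the -b filter, as listed in the options table.
_CANONICAL = re.compile(r"[acgtunACGTUN]+")


def filter_fasta(records, min_length=None, canonical_only=False):
    """Filter (identifier, sequence) pairs by length and alphabet."""
    kept = []
    for identifier, sequence in records:
        if min_length is not None and len(sequence) < min_length:
            continue  # corresponds to -a <int>
        if canonical_only and not _CANONICAL.fullmatch(sequence):
            continue  # corresponds to -b
        kept.append((identifier, sequence))
    return kept
```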

fastaselect.pl

Description

This script only prints out the FASTA entries that match an ID in the ID file.

Input

  • A FASTA file and a file with IDs, one ID per line.

Output

  • A FASTA file containing the FASTA entries that match an ID.

Options

option description
‑a only print entries whose ID is not present in the ID file.

Example usage

fastaselect.pl reads.fa reads_select.ids > reads_select.fa

find_read_count.pl

Description

Scans a file searching for the suffixes that are generated by collapse_reads.pl (e.g. _x10). It sums up the integer values in the suffixes and outputs the sum. If a given ID occurs multiple times in the file, its integer value is counted each time. Only the first integer occurrence in a given line is counted.

Input

  • Any file containing the suffixes that are generated by collapse_reads.pl.

This will typically be a FASTA file or a list of IDs.

Output

  • The sum of integer values (the total read count).

Example usage

find_read_count.pl reads_collapsed.fa
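
The counting rule can be reproduced with a short Python sketch, which may help when checking totals by hand. It is an illustration, not the find_read_count.pl code, and it assumes the suffix always has the form `_x` followed by digits.

```python
import re

# Suffix written by collapse_reads.pl, e.g. "_x10".
_SUFFIX = re.compile(r"_x(\d+)")


def total_read_count(lines):
    """Sum the first _xN suffix found on each line."""
    total = 0
    for line in lines:
        match = _SUFFIX.search(line)  # only the first occurrence per line
        if match:
            total += int(match.group(1))
    return total
```

As in the description, an ID that appears on several lines is counted once per line.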

geo2fasta.pl

Description

Parses GSM format files into FASTA format.

Input

  • GSM files in tabular format.

The first column should be sequences and the second column the number of times the sequence occurs in the data.

Output

  • A FASTA file, one sequence per line (the sequences are expanded).

Example usage

geo2fasta.pl GSM.txt > reads.fa
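
The expansion of a tabular GSM file can be sketched in Python as follows. This is an illustration of the idea, not the geo2fasta.pl code. The identifier scheme (`seq_` plus a running number) and the skipping of lines whose second column is not an integer are assumptions of this sketch.

```python
def geo_to_fasta(lines, prefix="seq"):
    """Expand 'sequence<TAB>count' lines into one FASTA record per read."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip header, blank or malformed lines
        sequence, count = fields[0], int(fields[1])
        for _ in range(count):
            records.append((f"{prefix}_{len(records)}", sequence))
    return records
```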

illumina_to_fasta.pl

Description

Parses seq.txt or qseq.txt output from the Solexa/Illumina platform to FASTA format.

Input

  • A seq.txt or
  • qseq.txt file.

By default seq.txt.

Output

  • A FASTA file, one entry for each line of seq.txt.

The entries are named seq plus a running number that is incremented by one for each entry. Any . characters in the seq.txt file are substituted with an N.

Options

option description
‑a format is qseq.txt

Example usage

illumina_to_fasta.pl s_1.qseq.txt -a > reads.fa

miRDeep2_core_algorithm.pl

Description

For each potential miRNA precursor input, the miRDeep2 core algorithm either discards it or assigns it a log-odds score that reflects the probability that the precursor is a genuine miRNA.

Input

Default input is

  • an ARF file with the read signatures and
  • an RNAfold output file with the structures of the potential miRNA precursors.

Output

  • A .mrd file with all potential miRNA precursors that are scored.

Options

option description
‑h print this usage
‑s FASTA file with reference mature miRNAs from one or more related species
‑t print filtered
‑u limited output (only ids)
‑v cut-off (default 1)
‑x sensitive option for Sanger sequences
‑y file with randfold p-values
‑z consider Drosha processing

Example usage

miRDeep2_core_algorithm.pl signature.arf potential_precursors.str \
  -s miRBase_related_species.fa -y potential_precursors.rand > output.mrd

Notes

The -z option has not been thoroughly tested.


parse_mappings.pl

Description

Performs simple filtering of entries in an .arf file.

Input

Default input is

  • an .arf file.

Output

  • A filtered .arf file.

Options

option description
‑a <int> Discard mappings of edit distance higher than this
‑b <int> Discard mappings of read queries shorter than this
‑c <int> Discard mappings of read queries longer than this
‑d <file> Discard read queries not in this file
‑e <file> Discard read queries in this file
‑f <file> Discard reference dbs not in this file
‑g <file> Discard reference dbs in this file
‑h Discard remaining suboptimal mappings
‑i <int> Discard remaining suboptimal mappings and discard any reads that have more remaining mappings than this
‑j Remove any unmatched nts in the very 3' end
‑k Output progress to standard output

Example usage

parse_mappings.pl reads_vs_genome.arf -a 0 -b 18 -c 25 -i 5 \
  > reads_vs_genome_parsed.arf

perform_controls.pl

Description

Performs a designated number of rounds of permuted controls (for details, see Friedländer et al., Nature Biotechnology, 2008).

Input

The permutation controls estimate the number of false positives produced by a miRDeep2_core_algorithm.pl run. The input to perform_controls.pl should be

  • a file containing the exact command line used to initiate the miRDeep2_core_algorithm.pl run,
  • the structure file input to miRDeep2_core_algorithm.pl, and
  • the desired rounds of controls.

Output

  • A file in .mrd format.

The output of each control run is separated by a line 'permutation integer'. The mean number of entries output by the control runs gives an estimate of the false positives produced. The further contents (besides the number of entries) of the .mrd output by perform_controls.pl are not biologically meaningful.

Options

option description
‑a Output progress to screen

Example usage

perform_controls.pl line potential_precursors.str 100 \
  > output_controls.mrd

permute_structure.pl

Description

In a file output by RNAfold, each entry can be partitioned into an 'id' part and an 'other' part, consisting of the dot-bracket structure, sequence, mfe etc. This script reads all 'id' parts into a hash and pairs them with 'other' parts from random entries. This is used by the perform_controls.pl script.

Input

  • An RNAfold output file.

Output

  • An RNAfold output file with IDs moved to random entries.
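
The permutation idea can be sketched in Python by shuffling the 'id' parts while leaving the 'other' parts in place. This is a minimal illustration, not the permute_structure.pl code. It assumes a true permutation (each ID is used exactly once), and the `seed` parameter exists only to make the sketch reproducible.

```python
import random


def permute_ids(entries, seed=None):
    """Reassign the IDs of (id, other) entries at random.

    The 'other' parts (structure, sequence, mfe) keep their order; only
    the IDs are shuffled among them.
    """
    rng = random.Random(seed)
    ids = [identifier for identifier, _ in entries]
    rng.shuffle(ids)
    return [(new_id, other) for new_id, (_, other) in zip(ids, entries)]
```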

Example usage

permute_structure.pl potential_precursors.str \
  > potential_precursors_permuted.str

prepare_signature.pl

Description

Prepares the signature file to be input to the miRDeep2_core_algorithm.pl script.

Input

  • A FASTA file with deep sequencing reads and
  • a FASTA file with precursors.

Output

  • A signature file in .arf format.

Options

option description
‑a <file> FASTA file with the sequences of known mature miRNAs for the species. These sequences will not influence the miRDeep scoring, but will subsequently make it easy to estimate sensitivity of the run.
‑b Output progress to screen

Example usage

prepare_signature.pl reads_collapsed.fa potential_precursors.fa \
  -a miRBase_this_species.fa > signature.arf

rna2dna.pl

Description

Substitutes 'u's and 'U's with 'T's. This is useful since bowtie does not match Us to Ts.

Input

  • A FASTA file.

Output

  • A substituted FASTA file.

Example usage

rna2dna.pl reads_RNA_alphabet.fa > reads_DNA_alphabet.fa
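
The substitution can be illustrated with a small Python sketch. It is not the rna2dna.pl code. It follows the description literally (both 'u' and 'U' become 'T') and assumes that identifier lines starting with '>' are left untouched.

```python
# Both lower-case and upper-case U are mapped to T, as described above.
_U_TO_T = str.maketrans({"u": "T", "U": "T"})


def rna_to_dna(lines):
    """Convert the sequence lines of a FASTA file from RNA to DNA alphabet."""
    return [
        line if line.startswith(">") else line.translate(_U_TO_T)
        for line in lines
    ]
```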

select_for_randfold.pl

Description

This script identifies potential precursors whose structure is basically consistent with Dicer recognition. Since running randfold is time-consuming, it is practical only to estimate p-values for those potential precursors that actually fold into hairpin structures.

Input

  • An ARF file with the read signatures and
  • an RNAfold output file with the structures of the potential miRNA precursors.

Output

  • A list of ids, separated by newlines.

Example usage

select_for_randfold.pl signature.arf potential_precursors.str \
  > potential_precursors_for_randfold.ids

survey.pl

Description

Surveys miRDeep2 performance at score cut-offs from -10 to 10.

Input

Default input is

  • a .mrd file output by the miRDeep2_core_algorithm.pl script.

Output

  • A .csv file with performance statistics.

Options

option description
‑a <file> file outputted by controls
‑b <file> mature miRNA FASTA reference file for the species
‑c <file> signature file
‑d <int> read stack height necessary for triggering excision

Example usage

survey.pl output.mrd -a output_controls.mrd -b miRBase_this_species.fa \
  -c signature.arf -d 2 > survey.csv

convert_bowtie_output.pl

Description

It converts a bowtie bwt mapping file to a mirdeep arf file.

Input

  • A file in bwt format.

Output

  • A file in mirdeep arf format.

bwa_sam_converter.pl

Description

It converts a bwa sam mapping file to a mirdeep arf file.

Input

  • A bwa created file in sam format.

Output

  • A file in mirdeep arf format.

samFLAGinfo.pl

Description

It gives information about the bwa FLAG in a bwa created mapping file in sam format.

Input

  • A FLAG number created by bwa.

Output

  • Information about the alignment created by bwa.

sanity_check_genome.pl

Description

It checks the supplied genome FASTA file for correctness. Identifier lines must not contain whitespace and must be unique. Sequence lines must not contain characters other than A, C, G, T, N, a, c, g, t, or n.

Input

  • A genome file in FASTA format
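
The three rules from the description can be checked with a short Python sketch, for example before starting a long run. This is an illustration of the rules, not the sanity_check_genome.pl code; the function name and the wording of the messages are made up.

```python
import re

# Characters allowed in genome sequence lines, as listed above.
_SEQUENCE_OK = re.compile(r"[ACGTNacgtn]*")


def check_genome_fasta(lines):
    """Return a list of problems found in the lines of a genome FASTA file."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(">"):
            identifier = line[1:]
            if re.search(r"\s", identifier):
                problems.append(f"line {number}: identifier contains whitespace")
            if identifier in seen:
                problems.append(f"line {number}: identifier is not unique")
            seen.add(identifier)
        elif not _SEQUENCE_OK.fullmatch(line):
            problems.append(f"line {number}: invalid character in sequence")
    return problems
```

An empty list means that none of the three rules was violated.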

sanity_check_mapping_file.pl

Description

It checks the supplied mapping file for its correctness. Each line in the file must be in the ARF format.

Input

  • A mapping file in ARF format.

sanity_check_mature_ref.pl

Description

It checks the supplied mature miRNA FASTA file for correctness. Identifier lines must not contain whitespace and must be unique. Sequence lines must not contain characters other than A, C, G, T, N, a, c, g, t, or n.

Input

  • A mature miRNA file in FASTA format.

sanity_check_reads_ready.pl

Description

It checks the supplied reads file for correctness. Each identifier line must have the format >name_uniqueNumber_xnumber, e.g. >xyz_1_x20. See also the file format_descriptions.txt for more detailed information.

Input

  • A reads file in FASTA format.
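
The identifier format can be expressed as a regular expression. The Python sketch below is one possible reading of the format >name_uniqueNumber_xnumber, not the check used by sanity_check_reads_ready.pl. It assumes that the name is any run of non-whitespace characters; the real script may be stricter.

```python
import re

# ">" + name + "_" + unique number + "_x" + read count, e.g. ">xyz_1_x20".
_READ_ID = re.compile(r">\S+_\d+_x\d+")


def is_valid_read_id(line):
    """Check one identifier line against the collapsed read ID format."""
    return _READ_ID.fullmatch(line.rstrip("\n")) is not None
```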

extract_miRNAs.pl

Description

Extracts mature and precursor sequences from miRBase fasta files for species of interest.

Input

  • A FASTA file from miRBase
  • One or more three-letter species code abbreviations

Output

  • A fasta file in a proper format usable by quantifier.pl and miRDeep2.pl.
  • Multiline sequences from input files are put on a single line and MacOS and Windows linebreaks/carriage returns are removed

Example usage

extract_miRNAs.pl mature_miRBase.fa hsa > mature_hsa.fa
extract_miRNAs.pl hairpin_miRBase.fa hsa > hairpin_hsa.fa
extract_miRNAs.pl mature_miRBase.fa mmu,chi > mature_other.fa

mirdeep2's Issues

miRDeep2 does not generate result.html

Hi,

unfortunately I could not find result.html or result.csv in the result directory after miRDeep2.pl command. Strangely I did not get any errors but this result is missing.

here are the generated files I got
[screenshot of the generated files, not preserved]

and here is command line screenshot, which looks just fine without any errors
[screenshot of the command-line output, not preserved]

I deeply appreciate any kind of help.
Dewi

Rfam_for_miRdeep.fa

I'm Running miRDeep2 v2.0.0.8 on Linux Mint 18.1, fresh install. installed miRDeep2 and dependencies via Bioconda.

I'm trying to get miRDeep2 running, and the error "Rfam_for_miRDeep.fa not found in your miRDeep2 scripts directory" kept popping up. The file Rfam_for_miRDeep.fa was in the /bin directory along with miRDeep2.pl, but it didn't work. Just through an accident, I placed the Rfam_for_miRDeep.fa file in the directory above the /bin, and that error stopped occurring. Is that expected behavior? I kept getting the same error on a fresh Ubuntu install as well, but I have not confirmed the fix there.

Thanks

Read counts of quantifier module versus known miRNA detected by miRDeep2 module

Hi,

Sorry if my question seems weird, but any help is appreciated.

After running quantifier.pl and miRDeep2.pl (ver. 2.0.0.8), I obtained the read counts and the overall results for my data, respectively. When I ran a differential expression analysis with DESeq2 on the read counts, mir16 came out as differentially expressed. But when I check the final output of miRDeep2.pl, mir16 is not among the known miRNAs detected by miRDeep2.

Is this conflict due to different cut-off criteria (or mismatch allowances) in the two modules (quantifier, mirdeep2)? How should I interpret this result?

Thanks.

make_html.pl: lines 2680 - 2694.

Hi, I am getting error messages saying

"Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/option{ <-- HERE t}\s*=\s*(\S+)/ at /Users/tina/anaconda2/envs/mirdeep2/bin/make_html.pl" on lines 2680, 2683, 2686, 2689, 2692, and 2694.

I suspect that line

}elsif(/option{t}\s*=\s*(\S+)/){

should be changed to

}elsif(/\Q$options{'t'}\E\s*=\s*(\S+)/){

and all the other mentioned lines edited accordingly.
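For reference, older Perl versions treated the braces in option{t} as literal characters, so escaping them (option\{t\}) should keep the original matching behavior while silencing the message. Below is a minimal Python sketch of a helper that applies that rewrite to a single source line. The helper name is made up for illustration, and it is only meant for the lines named in the error, since it would also rewrite any other text of the form option{...}.

```python
import re

# Matches the literal word "option" directly followed by an unescaped {name}.
# An already escaped "option\{t\}" does not match, so the rewrite is idempotent.
_UNESCAPED = re.compile(r"option\{(\w+)\}")


def escape_option_braces(perl_line: str) -> str:
    """Return the Perl source line with option{x} rewritten to option\\{x\\}."""
    return _UNESCAPED.sub(r"option\\{\1\\}", perl_line)


if __name__ == "__main__":
    before = r"}elsif(/option{t}\s*=\s*(\S+)/){"
    print(escape_option_braces(before))
```

Applied to the line quoted above, this turns `/option{t}\s*=\s*(\S+)/` into `/option\{t\}\s*=\s*(\S+)/` and leaves the rest of the line untouched.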

fix mirdeep2 wrapper to use modern genome references

It's really unfortunate that mapper.pl will happily map away using a modern reference like hg38, but your new mirdeep2.pl wrapper chokes on the very same reference - complaining about the spaces in the identifiers.

Please fix mirdeep2 to handle spaces in identifiers so that we don't have to do a bunch of jiggery pokery to fool mirdeep2 into thinking it is working with some kind of ancient reference.
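Until the wrapper accepts such headers, one possible workaround is to cut every FASTA header at its first whitespace before running miRDeep2.pl. The following is a minimal Python sketch of that preprocessing step. The function name is hypothetical, and it assumes that the first whitespace-delimited token of each header is unique and is the identifier used in the mappings.

```python
from typing import Iterable, Iterator


def sanitize_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each header cut at its first whitespace."""
    seen = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # Assumption: the first token alone identifies the sequence.
            identifier = parts[0] if parts else ""
            if not identifier or identifier in seen:
                raise ValueError(f"empty or duplicate identifier: {line!r}")
            seen.add(identifier)
            yield f">{identifier}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    example = [">chr1 AC:CM000663.2 gi:568336023\n", "ACGT\n"]
    print("".join(sanitize_fasta_headers(example)), end="")
```

The same cleaned file would then have to be used for every step that refers to the genome, so that identifiers stay consistent between the reference and the mappings.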

mapper.pl

Hi,

I am using mapper.pl to map reads (FASTQ) to a genome index generated with bowtie. The command looks like:
mapper.pl file.fastq -e -h -m -p INDEX -s collapsed_reads.fa -t reads_collapsed_vs_genome.arf

The problem is that I am not getting 100% of the reads mapped, even though the initial input file (file.fastq) consists of reads that I had previously mapped to the genome with bowtie, keeping only the ones that mapped.

bowtie command: bowtie -v 1 -S -m 5 --best --strata -a INDEX raw_reads.fastq mapped.sam

I wonder how I can solve this and make mapper.pl map 100% of the reads, since the only reason I am using mapper.pl is to get an *.arf file that I can use with mirdeep2.pl later on.

Thanks for the help,
G

bowtie issues when installing mirdeep2

Hi,
I need some help installing mirdeep2 on my Ubuntu system.
I ran the install.pl script and got the following error.

Building of /home/robert/programs/mirdeep2_0_0_8/essentials/bowtie-1.2.2/bowtie not successful
Please have a look at the install.log and install_error.log in
the essentials directory
at ./install.pl line 766.

I tried downloading bowtie manually and ran the script again, but the same error appears. When I look into the log file for the error message, I do not see anything written in the file.
I would greatly appreciate any help with this.
Thank you in advance
Robert

No file mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.fa_stack found

Hello all! I am trying to use miRDeep2 on transcriptomic data (RNA-seq). However, this error pops up every time I run the miRDeep2.pl script.

No file mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.fa_stack found

My parameters are as follows:

date && time miRDeep2.pl ~/Project/reads_collapsed.fa ~/Project/newgenome.fa ~/Project/reads_vs_refdb.arf ~/Project/new_human_mature.fasta none ~/Project/new_human_stem.fasta -t Human -g -1 hsa -v

I have elaborated more about this on my Biostars post.
Any and all help would be much appreciated. Thanks.

Output bed files (precursor, mature and star) have the same coordinates

Hi,
After running mirdeep2 and predicting microRNAs, I realised that the 3 bed files have exactly the same positions:

examples:
mature bed file
LQNS02278089.1 848170 848230 LQNS02278089.1_34108 2249659.3 - 848170 848230 0,0,255
LQNS02278089.1 847652 847711 LQNS02278089.1_34106 1566285.5 - 847652 847711 0,0,255

precursor bed file
LQNS02278089.1 848170 848230 LQNS02278089.1_34108 2249659.3 - 848170 848230 0,0,255
LQNS02278089.1 847652 847711 LQNS02278089.1_34106 1566285.5 - 847652 847711 0,0,255

Therefore I am blocked at this stage.
How can I get a bed file containing the coordinates of the mature sequences, or of the star sequences, rather than the precursor coordinates three times?

Thanks for the help!

mirdeep2 quantification questions

Hello,
I'm trying to use mirdeep2 for a plant miRNA study. The first issue I noticed is that the precursor sequences in the "mature miRNAs detected by miRDeep2" part of the output differ from the ones I provided (downloaded from miRBase); they are usually shorter than the miRBase precursors. I suppose this is because of the de novo precursor identification step in the mirdeep2.pl module? How can I make miRDeep2 stick with the provided precursor sequences for scoring and quantification?

Plant miRNAs are a little bit different, so I also tried miRDP2, a modified version for plant miRNAs, to identify novel miRNAs in my species. I then used only the quantifier.pl module for quantification. However, almost no significantly differentially expressed miRNAs were detected with this quantification (tested with DESeq2). As plant miRNAs have many families/copies, I believe that not every single miRNA/copy is expressed at the same level. But with the quantifier.pl algorithm, I get similar expression for different mature miRNAs of a family because of their sequence similarity. Does miRDeep2 try to tackle this issue? I'm not sure whether the precursor selection steps in the miRDeep2 module can help with this problem.

Thanks
Ziliang

quantifier.pl: uninitialized hash values

I ran some new data through my usual pipeline and noticed 'uninitialized value' errors in quantifier.pl.

The full error message is

Use of uninitialized value in numeric gt (>) at <path-to>/quantifier.pl line 987, <IN> line <number>.
Use of uninitialized value in numeric gt (>) at <path-to>/quantifier.pl line 998, <IN> line <number>.

where <number> equals the number of lines in reads.processed_mapped.bwt .
I see this pair of error messages exactly as many times, as the number of lines in miRNA_not_expressed.csv (minus the header line).

The reported lines are in the if conditionals in the following block of the PrintExpressionValuesSamples subroutine:

mirdeep2/src/quantifier.pl

Lines 983 to 1006 in ee3417c

for(my $i = 1; $i <= $hash{$_}{'c'}; $i++){
    printf OUTG ("%s\t%.2f\t%s\t%.2f",$hash{$_}{$i}{'mature'},$hash{$_}{$i}{'score'},$_,$hash{$_}{$i}{'score'});
    for my $sample(sort keys %hash_sample){
        next if($sample =~ /config/);
        if($hash_sample{$sample}{$_}{$i}{'score'} > 0){
            # print OUTG "\t$hash_sample{$sample}{$_}{$i}{'score'}";
            printf OUTG ("\t%.2f",$hash_sample{$sample}{$_}{$i}{'score'});
        }else{
            print OUTG "\t0";
        }
    }
    for my $sample(sort keys %hash_sample){
        next if($sample =~ /config/);
        if($hash_sample{$sample}{$_}{$i}{'score'} > 0){
            #print OUTG "\t$hash_sample{$sample}{$_}{$i}{'score'}";
            printf OUTG ("\t%.2f",$total_t*$hash_sample{$sample}{$_}{$i}{'score'}/$total{$sample});
        }else{
            print OUTG "\t0";
        }
    }
    print OUTG "\n";

so the uninitialized field must be $hash_sample{$sample}{$_}{$i}{'score'}.

Is this a bug or a feature?
Can I trust the results of this run to be correct or is something wrong?
If this is expected, can we silence the warning?
If not, can we replace it by a useful warning/error message?

This is done with the latest release and the following commands:

mapper.pl reads.trimmed.fa -c -j -l 18 -m -v -o 12 -p mm10 -s reads.processed -t alignments.arf
quantifier.pl -P -d -j -t mmu -p hairpin.fa -m mature.fa -r reads.processed

novel microRNA counts for each sample

Hi,

I am wondering whether there is any function I can use to extract per-sample read counts for the novel microRNAs predicted by miRDeep2.

After mapping with mapper.pl and config.txt (a list of all samples) as follows:

mapper.pl config.txt -d -e -h -i -j -l 18 -m -p bowtie_index_mmu -s all_samples.fa -t all_samples.arf

followed by miRDeep2.pl:

miRDeep2.pl all_samples.fa mmu_reference.fa all_samples.arf mature_mmu.fa hsa_rno_mature.fa mmu_hairpin.fa -t Mouse

I could only find per-sample read counts in miRNAs_expressed_all_samples.csv for the known microRNAs, but not for the novel microRNAs. Did I miss something here? Thanks in advance for any kind of help!

best,
Dewi

mirdeep2 for miRNA detection in giardia

Hi,
In Giardia, miRNA length peak is reportedly 26 nt (PMID: 24586143) (compared to 22 nt e.g. in vertebrates).
Will this hamper the sensitivity of mirdeep2 in identifying miRNA in this species?
thanks,
iddo

why empty reads_collapsed_vs_genome.arf

command

perl ./mirdeep2_0_0_8/bin/mapper.pl \
  ./S14.clean.fq \
  -e -h -j -l 18 -m \
  -p ./mirna/data/hg19 \
  -s S14.collapsed.fa \
  -t reads_collapsed_vs_genome.arf \
  -v -n \
  2>log.txt

S14.collapsed.fa contains data, but reads_collapsed_vs_genome.arf is an empty file. Why?


edit (by @mschilli): formatting

mirdeep2 randfold pvalues from two reference genomes for bos taurus

Hi,
I am testing mirdeep2.pl on two bovine reference genomes (UMD3.1 and ARS-UCD1.2).
ARS-UCD1.2 is the most recently published bovine reference genome.
I get almost the same expressed miRNAs with both references. However, with UMD3.1 some of the expressed miRNAs have significant randfold p-values (yes) and some do not (no). Surprisingly, with ARS-UCD1.2 none of the expressed miRNAs has a significant randfold p-value.

Is this something you have observed before? Could it be something inherently related to the difference in the two genomes?
Thank you,
Robert

Reads with mismatches are not mapped to the genome in the initial mapping by the mapper.pl

Hi, I have tested miRDeep2.0.1.2 and it seems not to work with FASTQ anymore.
I tried the following command:

mapper.pl config.txt -d -e -o 15 -r 30 -n -h -j -q -l 17 -m -p bowtie_index/genome -s All_collapsed.fa -t Collapsed_vs_genome.arf -v

It returns the following error messages:
Use of uninitialized value $query in pattern match (m//) at /usr/local/mirdeep2-master/bin/collapse_reads_md.pl line 129.
Use of uninitialized value $seq in transliteration (tr///) at /usr/local/mirdeep2-master/bin/collapse_reads_md.pl line 96.
Use of uninitialized value $seq in hash element at /usr/local/mirdeep2-master/bin/collapse_reads_md.pl line 97.

However, when I convert the FASTQ files to FASTA, mapper.pl works just fine.

Thanks a lot,

Can't locate Compress/Zlib.pm in @INC

Hi, I get this error every time - can you please advise? Thank you.

#producing graphic results
started: 18:55:35
make_html.pl -f mirdeep_runs/run_01_09_2019_t_17_13_38/output.mrd -p mirdeep_runs/run_01_09_2019_t_17_13_38/tmp/precursors.coords -v 0 -s mirdeep_runs/run_01_09_2019_t_17_13_38/survey.csv -c -e -r /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/bin//../Rfam_for_miRDeep.fa -y 01_09_2019_t_17_13_38 -o -V 2.0.1.2

Can't locate Compress/Zlib.pm in @INC (@INC contains: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/lib/perl5/site_perl /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/lib/perl5 /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/x86_64-linux-thread-multi /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5 /lustre03/project/6006069/scripts/mirdeep2/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/PDF/API2/Content.pm line 11.
BEGIN failed--compilation aborted at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/PDF/API2/Content.pm line 11.
Compilation failed in require at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/PDF/API2/Page.pm line 12.
BEGIN failed--compilation aborted at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/PDF/API2/Page.pm line 12.
Compilation failed in require at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/PDF/API2.pm line 17.
BEGIN failed--compilation aborted at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/lib/perl5/PDF/API2.pm line 17.
Compilation failed in require at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/bin/make_html.pl line 25.
BEGIN failed--compilation aborted at /lustre03/project/6006069/dansapoz/scripts/mirdeep2/mirdeep2/bin/make_html.pl line 25.

ended: 18:55:39
total:0h:0m:4s

got an error during manual installation of PDF-API2

Hi,

I'm trying to install mirdeep2 and have run into some issues. I'm using Debian version 9.

Initially I tried to install it with install.pl. It worked until it tried to download randfold, which failed. So I started following the manual installation from step 9. randfold installed successfully, but I couldn't install PDF-API2. At first I tried the latest version (2.036) of PDF-API2, but that failed. Then I downloaded the older version (0.73) mentioned in the readme file.

Below are the command I typed and the error message. I ran the command in the PDF-API2 folder as instructed.

perl Makefile.PL PREFIX=/home/haewonchung/software/mirdeep2 LIB=/home/haewonchung/software/mirdeep2/lib

Can't locate lib/PDF/API2/Version.pm in @INC (you may need to install the lib::PDF::API2::Version module) (@INC contains: /home/haewonchung/software/mirdeep2/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.24.1 /usr/local/share/perl/5.24.1 /usr/lib/x86_64-linux-gnu/perl5/5.24 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.24 /usr/share/perl/5.24 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at Makefile.PL line 42.

It seems that it cannot find the version file. The version file is actually in /home/haewonchung/software/mirdeep2/essentials/PDF-API2-0.73/lib/PDF/API2, so I thought the PREFIX should be /home/haewonchung/software/mirdeep2/essentials/PDF-API2-0.73

So I tried this:

perl Makefile.PL PREFIX=/home/haewonchung/software/mirdeep2/essentials/PDF-API2-0.73 LIB=/home/haewonchung/software/mirdeep2/essentials/PDF-API2-0.73/lib

But I still got the exact same error message. It seems that PREFIX is not being read. Could you please help me install it?

Thanks,

HaeWon

Issue changing Make_html.pl

Hi,
I am an admin on an HPC cluster and one of our users is trying to use this software. I have created a job script, but it fails when run. See the following error:

"mkdir: cannot create directory `/opt/apps/mirdeep2_0_0_8/src/indexes': Permission denied
Could not open index file for writing: "/opt/apps/mirdeep2_0_0_8/src/indexes/Rfam_index.3.ebwt"
Please make sure the directory exists and that permissions allow writing by
Bowtie.
Command: bowtie-build --wrapper basic-0 /opt/apps/mirdeep2_0_0_8/src//../Rfam_for_miRDeep.fa /opt/apps/mirdeep2_0_0_8/src/indexes/Rfam_index
Settings:
  Output files: "/opt/apps/mirdeep2_0_0_8/src/indexes/Rfam_index.*.ebwt"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 5 (one in 32)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /opt/apps/mirdeep2_0_0_8/src//../Rfam_for_miRDeep.fa
Reading reference sizes
  Time reading reference sizes: 00:00:00
Total time for call to driver() for forward index: 00:00:00
 
Rfam index could not be created in /opt/apps/mirdeep2_0_0_8/src/indexes/
Please check if miRDeep2 is allowed to create the directory /opt/apps/mirdeep2_0_0_8/src/indexes/ and write to it
The Rfam analysis will be skipped as long as this is not possible"

I have tracked this down to the user not having permission to create and write the "indexes" directory inside the "src" directory. Is there any way to change the path where the indexes directory is created, via the command line or the job script? I am using CentOS Linux.

I got two different result when running miRDeep2 core algorithm twice

I ran the miRDeep2 core algorithm twice on human data. I got a novel miRNA in the first result, and some verification work has already been done on it. However, I cannot find this novel miRNA in the second result. My criteria for calling two small RNAs identical are that they have the same precursor from the same genomic locus, the same mature sequence, and the same star sequence. I checked the detailed information and found that the novel miRNA has different scores: 2 in the first run and -1.7 in the second. The score cut-off I set is 0, so only miRNAs with a score > 0 predicted by the miRDeep2 core algorithm are kept as final miRNAs. How can I explain this?

how to do data processing

It seems that mapper.pl processes the data automatically, including handling a specific adapter sequence. We input an SE50 FASTA file (QC was done in advance and it seems to contain no adapter any more), and still get a result FASTA file with 50 bp sequences.
Raw data:
(image)
After mapper.pl:
(image)
Some questions:
1) Is it normal to have 50 bp sequences as input for the quantification module and the mirdeep2 module? Will they be too long?
2) Should I change U to T in advance, to convert RNA to DNA?
3) As for the output, how is a true mature miRNA defined? By a miRDeep2 score > 0?
(image)

thanks

Illegal left brace in line 2680

Hi,
I'm getting the following error with my Anaconda installation (both with my data and with run_tuto.sh).
This issue is similar to #13,
and there are reports of this issue in other repositories.

started: 16:16:56
make_html.pl -f mirdeep_runs/run_26_05_2019_t_16_16_10/output.mrd -p mirdeep_runs/run_26_05_2019_t_16_16_10/tmp/precursors.coords -v 0 -s mirdeep_runs/run_26_05_2019_t_16_16_10/survey.csv -c -e -r /home/bio/anaconda3/bin//../Rfam_for_miRDeep.fa -y 26_05_2019_t_16_16_10 -o  -V 2.0.0.8 

Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/option{ <-- HERE t}\s*=\s*(\S+)/ at /home/bio/anaconda3/bin/make_html.pl line 2680.

ended: 16:16:56
total:0h:0m:0s

Find attached the log file
error-conda.log

I also got a similar error with the installation via install.pl; see the attached
error-installpl.log
I installed the module via conda and cpan.

Is there any solution for this issue?
Cheers,
Luis Alfonso.

error in install.pl

I'm getting an error from the install.pl file.
The error is as follows:
Use of uninitialized value $shell in pattern match (m//) at install.pl line 46.
'which' is not recognized as an internal or external command,
operable program or batch file.
'which' is not recognized as an internal or external command,
operable program or batch file.
No grep found on the system

Empty mapped ARF file

I am using the example data and no files are empty.

This is the call I'm using: perl /opt/mirdeep2/src/mapper.pl reads.fa -c -k TCGTATGCCGTCTTCTGCTTGT -m -p cel_cluster -s reads_processed.fq -t reads_mapped.arf -v -n

Here is the log:

clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
trimming unmapped nts in the 3' ends
Log file for this run is in mapper_logs and called mapper.log_319
Mapping statistics
#desc	total	mapped	unmapped	%mapped	%unmapped
Use of uninitialized value $count2 in subtraction (-) at /opt/mirdeep2/src/mapper.pl line 709.
total: 378333	Use of uninitialized value $count2 in print at /opt/mirdeep2/src/mapper.pl line 709.
	378333	Use of uninitialized value $count2 in division (/) at /opt/mirdeep2/src/mapper.pl line 710.
Use of uninitialized value $count2 in division (/) at /opt/mirdeep2/src/mapper.pl line 710.
0.000	1.000
Use of uninitialized value in subtraction (-) at /opt/mirdeep2/src/mapper.pl line 712.
seq: 378333	Use of uninitialized value in print at /opt/mirdeep2/src/mapper.pl line 712.
	378333	Use of uninitialized value in division (/) at /opt/mirdeep2/src/mapper.pl line 713.
Use of uninitialized value in division (/) at /opt/mirdeep2/src/mapper.pl line 713.

Uninformative error for zero-mappings.

I actually hit the error I traced down in #9 (comment).
In the case of #9 it turned out an invalid index was used.
In my case however, I simply have an extraordinarily bad sRNA-seq run that actually doesn't seem to contain any miRNA reads.
While it is nice to see that miRDeep2 is specific enough to not report random miRNAs in this case, reporting '0 mapped' reads would clearly be better than failing with an error.

I know that running miRDeep2 on an input file without valid miRNA reads is pointless but this is one sample out of a big batch that I run through a standardized Snakemake workflow. Having to trace back that error and manually remove the sample to avoid errors seems pointless given the (supposedly) easy fix.
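As an illustration of the requested behavior, here is a minimal Python sketch (not miRDeep2's actual Perl code) of a statistics row that treats a missing mapped count as zero instead of failing. The function name is hypothetical, and the row layout (description, total, mapped, unmapped, %mapped, %unmapped, with percentages to three decimals) is an assumption modeled on the mapping statistics tables quoted in these issues.

```python
from typing import Optional


def mapping_stats_line(desc: str, total: int, mapped: Optional[int]) -> str:
    """Format one row: desc, total, mapped, unmapped, %mapped, %unmapped."""
    # A missing count (None) is reported as zero mapped reads.
    mapped = mapped or 0
    unmapped = total - mapped
    if total > 0:
        pct_mapped = 100.0 * mapped / total
        pct_unmapped = 100.0 * unmapped / total
    else:
        # Avoid dividing by zero for an empty input.
        pct_mapped = pct_unmapped = 0.0
    return (
        f"{desc}: {total}\t{mapped}\t{unmapped}"
        f"\t{pct_mapped:.3f}\t{pct_unmapped:.3f}"
    )


if __name__ == "__main__":
    print("#desc\ttotal\tmapped\tunmapped\t%mapped\t%unmapped")
    print(mapping_stats_line("total", 378333, None))
```

With this guard, a sample without any mapped reads produces a regular row with 0 mapped reads, which a batch workflow can filter on, rather than a cascade of uninitialized-value warnings.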

species not at UCSC

I want to analyze miRNA-seq data from rice, but when I use quantifier.pl, I find that rice is not available at UCSC. Is there any way to use quantifier.pl for this species?

Unique novel miRNA identifier

Hi,

I am using miRDeep2 version 2.0.08. All the generated output is fine, and I have all the known and novel miRNAs. My question is this: I want to use the total counts of the novel miRNAs to analyze their expression pattern (control samples vs. treated samples), but each novel miRNA ID has a unique identifier. The major issue is that when I import the data into software like DESeq2 to identify differentially expressed miRNAs, the identifiers differ across all samples (controls and treatments), so the software cannot perform normalization and other downstream analysis because the factor levels differ.

I am not sure whether there is a way to make miRDeep2 generate output as HTSeq does (first column gene ID, second column read counts) across all the samples. miRDeep2 maps the reads against the reference genome, but for each sample it generates a unique identifier like NW_018742577.1_6527, which makes the downstream analysis nearly impossible. The numbers after the _ are unique in every sample, which is why all the samples have different factor levels.

I also checked the .bed files of the samples. They contain the scaffold ID, like NW_018742577.1, but then I would have to compare thousands of coordinates to find matching predicted novel miRNAs, which is sort of life-long work!

Any help or suggestion to fix the output identifier is appreciated.
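One way to approach this outside miRDeep2 is to key each novel miRNA by its genomic coordinates rather than by the per-run identifier. The following is a minimal Python sketch under stated assumptions: the input records (chromosome, start, end, strand, read count) are hypothetical and would have to be assembled from each sample's output, and loci are matched only when their coordinates are exactly identical, which may miss precursors that are predicted a few nucleotides apart in different runs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# chrom, start, end, strand, read count (hypothetical per-sample input)
Record = Tuple[str, int, int, str, float]


def merge_novel_counts(
    samples: Dict[str, Iterable[Record]],
) -> Dict[str, Dict[str, float]]:
    """Return {locus_id: {sample: count}} keyed by genomic coordinates."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for sample, records in samples.items():
        for chrom, start, end, strand, count in records:
            # The shared ID depends only on the location, not on the run.
            locus_id = f"{chrom}:{start}-{end}({strand})"
            table[locus_id][sample] = table[locus_id].get(sample, 0.0) + count
    # Fill in zeros so every locus has a value for every sample.
    for counts in table.values():
        for sample in samples:
            counts.setdefault(sample, 0.0)
    return dict(table)


if __name__ == "__main__":
    demo = {
        "control": [("NW_018742577.1", 100, 160, "+", 12.0)],
        "treated": [("NW_018742577.1", 100, 160, "+", 30.0)],
    }
    for locus, counts in merge_novel_counts(demo).items():
        print(locus, counts)
```

The resulting table has one row per locus and one column per sample, which is the shape that count-based tools such as DESeq2 expect.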

No file precursors.fa_stack found

Hi,
I ran into a problem. I wanted to use mirdeep2 to find new miRNAs in data from rice. When I ran miRDeep.pl, this error appeared:

41      66600
42      65026
43      63654
44      62300
45      60926
46      59812
47      58674
48      57480
49      56400
No file mirdeep_runs/run_26_10_2018_t_10_49_21/tmp/precursors.fa_stack found

So I checked excise_precursors_iterative_final.pl and found that I can increase the value of upper_bound or pres_max to work around this. I tried increasing upper_bound from 50 to 70 and got a precursors.fa_stack file which contained 56. But I don't know the meaning of the two parameters. Could you help me fix it? Thank you.


edited by @mschilli87: fixed formatting

installation error

mirdeep2 version 8: getting this error:

Building of /Users/sysadmin/desktop/mirdeep2_0_0_8/essentials/bowtie-1.2.2/bowtie not successful 
Please have a look at the install.log and install_error.log in the essentials directory at install.pl line 766.

edited by @mschilli87: copied/formatted issue text from issue title.

Mapper block question

Hello, I am trying to incorporate miRDeep2 into our workflow, in parallel with an RNA-seq analysis. I ran the mapper.pl and mirdeep.pl scripts with default settings and it works. However, our workflow generates a report file with information about RNA-seq sample status and mapping metrics, and we use a BAM file to extract some of these metrics. I would like to extract the same information from the mapper.pl output or its temporary files, but I found that it uses a non-redundant FASTA file (collapsed_reads.fa) as bowtie input, which biases the extracted metrics. So I need to map all raw, redundant reads to the genome to generate a SAM or BWT file, convert that output to BAM and extract all the information, and finally generate a "collapsed" ARF for use with the mirdeep2.pl script.
I have also seen that a BWA SAM file can be used as input, but the documentation does not make clear whether the BWA mapping should be performed with all raw reads or with collapsed reads:

The user has already removed 3' adapters in color space and has mapped the reads against the genome using the BWA tool. The BWA output file is named reads_vs_genome.sam. Notice that the BWA output contains extra fields that are not required for SAM format. Our converter requires these fields and thus may not work with all types of SAM files.

After that, the documentation states that reads_collapsed.fa and reads_vs_genome.arf must be generated for use with mirdeep.pl:

The user wishes to generate reads_collapsed.fa and reads_vs_genome.arf to input to miRDeep2

So I don't really know whether the bwa_sam_converter.pl script needs a non-collapsed FASTA file mapped to the genome and then generates a collapsed ARF file itself, or whether the FASTA file needs to be collapsed before BWA mapping.

Could you explain this behavior more specifically?

Another tangential question concerns this statement:

Notice that the BWA output contains extra fields that are not required for SAM format. Our converter requires these fields and thus may not work with all types of SAM files.

Which extra fields does bwa_sam_converter.pl need from the BWA SAM file?

Thanks.

Quantifier.pl maps zero mature sequences

Hello, I have an issue with the quantifier.pl module. I tried feeding it either the precursor and the mature miRNAs it found in my dataset, or the precursor and mature miRNAs downloaded from miRBase, along with my collapsed reads fasta file. The result is always as follows:

getting samples and corresponding read numbers

Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics

#desc	total	mapped	unmapped	%mapped	%unmapped
total: 212332886	44658309	167674577	21.032	78.968
FM1: 21036246	295714	20740532	1.406	98.594
FM3: 23723442	2274090	21449352	9.586	90.414
FM5: 25850499	4551095	21299404	17.605	82.395
FT1: 18912541	6798307	12114234	35.946	64.054
MM1: 23057199	5215099	17842100	22.618	77.382
MM2: 23531979	5291248	18240731	22.485	77.515
MM3: 20608633	5292529	15316104	25.681	74.319
MT1: 18055671	5188102	12867569	28.734	71.266
MT2: 17623204	2908784	14714420	16.505	83.495
MT3: 19933472	6843341	13090131	34.331	65.669
analyzing data

0 mature mappings to precursors

Expressed miRNAs are written to expression_analyses/expression_analyses_1567534683/miRNA_expressed.csv
    not expressed miRNAs are written to expression_analyses/expression_analyses_1567534683/miRNA_not_expressed.csv

Creating miRBase.mrd file

Mapped READS readin - DONE 

make_html2.pl -q expression_analyses/expression_analyses_1567534683/miRBase.mrd -k results_all_mature.fasta -y 1567534683  -o -i expression_analyses/expression_analyses_1567534683/results_all_mature.fasta_mapped.arf  -l  -M miRNAs_expressed_all_samples_1567534683.csv  
miRNAs_expressed_all_samples_1567534683.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files




Warning: 0 mature sequences mapped to any of your given precursor sequences

How is it possible, given that miRDeep2.pl actually finds miRNAs and their precursors?

perl install.pl error: URL check for Vienna package fails though URL resolves properly

I want to install mirdeep2, but when I run perl install.pl I get this error:
If in doubt our program is right, nature is at fault.
Comments should be sent to [email protected].

Downloading Vienna package now

Vienna package not found at http://www.tbi.univie.ac.at/RNA/packages/source/ViennaRNA-1.8.4.tar.gz
Please try to download the Vienna package from here http://www.tbi.univie.ac.at/RNA/RNAfold.html

But I have already installed the Vienna package. Running RNAfold -h prints:
RNAfold 2.2.6
Calculate minimum free energy secondary structures and partition function of
RNAs
Usage: RNAfold [OPTIONS]...

How can I install mirdeep2?

Detect synthetic miRNA via miRDeep2

Hello,

I have a question regarding the detection of synthetic miRNA.
So far I have been using miRDeep2 for biological data and it worked fine.
Now, synthetic miRNA are not detected/counted, although I adapted the input files accordingly by adding the synthetic miRNA to the reference files ($genome, $mature) for mapper.pl and quantifier.pl:

mapper.pl ${sampleid}.fq -e -h -m -l 18 -s ${sampleid}/collapsed.fa
mapper.pl ${sampleid}/collapsed.fa -c -p $genome -t $${sampleid}/mapped.arf
quantifier.pl -p $hairpin -m $mature -r ${sampleid}/collapsed.fa -t $spshort

Is it possible that also the precursors ($hairpin) have to be adapted?

Many thanks in advance

Allowed species

Hi,

The species Sheep you specified is not available
allowed species are
.....

Is there an alternative way to include Sheep? Earlier, I had no issues analyzing sheep miRNA data.

Thanks,
Kisun

Empty mapped_arf file

Hi,
I am getting the error below when using mapper.pl:

desc   total   mapped  unmapped        %mapped %unmapped
Use of uninitialized value $count2 in subtraction (-) at /data1/WHRI-GenomeCentre/software/mirdeep2/mirdeep2/bin/mapper.pl line 709.
total: 4162286  Use of uninitialized value $count2 in print at /data1/WHRI-GenomeCentre/software/mirdeep2/mirdeep2/bin/mapper.pl line 709.
        4162286 Use of uninitialized value $count2 in division (/) at /data1/WHRI-GenomeCentre/software/mirdeep2/mirdeep2/bin/mapper.pl line 710.

Below is the command used to run the program:

mapper.pl ${PROJECTDIR}/config.txt -d -e -i -l 18 -h -j -m -p /data/WHRI-GenomeCentre/Genome/Human/mirdeep/GRCh38 -s processed_reads.fa -t mapped_reads.arf -n -v

I am also attaching the log file.

mapper.log_25882_1529404721.txt

mirdeep2.pl:-g parameter

I wonder why the default value of the -g parameter is 50000. Can -g be set to -1 directly? Would that slow the program down or make the prediction results inaccurate?

Estimated false positive number is greater than the number of reported novel miRNA precursors

Hello,

When running miRDeep2.pl I get a very weird results file. I attached a portion of the .arf and .fa files that generates results similar to those from the whole files. I'm using the following command line (with the bovine genome from UCSC, bosTau8 or UMD3.1.1, and version 22 of the miRBase database): miRDeep2.pl allsamples-aligned.fa UMD3.1.1.fa allsamples-aligned.arf bta_mature_v22.fa mature_others_v22.fa bta_hairpin_v22.fa -t Cow

I attached the result file generated from the whole .arf and .fa files (result_14_12_2018_t_14_28_01IL.xlsx) so you can have a look at it, as well as the result file generated with only part of the .arf and .fa files (result_smaller_arf_fa.xlsx).

How is it possible that I obtain more estimated false positives than novel miRNAs reported by miRDeep2, which leads to an estimate of (0 +/- 0%)?

Could you try it on your system with the small files to verify whether you get the same kind of output as I do? In the file that I generate, there are novel miRNAs with a miRDeep2 score above 5, but they do not seem to be counted in the table at the top. Is that normal?

Thanks for your help!

result_smaller_arf_fa.xlsx

result_14_12_2018_t_14_28_01IL.xlsx

allsamples-aligned.fa.txt
allsamples-aligned.arf.txt

quantifier.pl -w error: "Mature mapping file hairpin_sfr.fa.dummy_mapped.bwt not found"

I am using miRDeep2-0.1.2. When running quantifier.pl -w:

quantifier.pl -k -w -p ../index/hairpin_sfr.fa \
    -m ../index/mature_sfr.fa \
    -r reads_collapsed.fa &>quantifier_run1.log

I get the following error, and no quantification is performed:

Mature mapping file hairpin_sfr.fa.dummy_mapped.bwt not found

However, without -w, quantifier.pl runs without errors and generates all expected files:

quantifier.pl -k -p ../index/hairpin_sfr.fa \
    -m ../index/mature_sfr.fa \
    -r reads_collapsed.fa &>quantifier_run2.log

Edit: if needed, I can put together a script to reproduce the issue.

miRNA normalization based on total reads includes multiple mapped reads

Hi!

I was just checking how exactly miRDeep2 calculates the normalized reads column (seq(norm)) and read in the FAQ that it is a simple RPM calculation where the library size is the number of reads mapped against precursors.

This number includes reads that map against multiple precursors and counts them once per precursor. If 10 reads each map against 2 precursors, the library size is 20.

This leads to library size variations that depend on the miRNA profile: if miRNAs with multiple precursors are overexpressed, the library size will be larger.

I would thus suggest using the sum of all unique mapped reads (each read counted once) for the RPM calculation, as RPM normalization aims to normalize for sequencing library size and should not depend on the sequences found.
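To make the difference concrete, here is a minimal Python sketch of the two library-size definitions discussed above. It is not miRDeep2's actual code: the `(read, precursor)` hit representation and the function names `library_size` and `rpm` are assumptions made for illustration, and "unique" is read here as "each mapped read counted once".

```python
from collections import Counter

def library_size(hits, unique=False):
    """hits: list of (read_id, precursor_id) pairs, one pair per mapping.

    unique=False counts every mapping, so a read that maps to two
    precursors contributes twice (the behaviour described in the post).
    unique=True counts each read once (the suggested alternative).
    """
    if unique:
        return len({read_id for read_id, _ in hits})
    return len(hits)

def rpm(hits, unique=False):
    """Reads per million for each precursor under the chosen library size."""
    lib = library_size(hits, unique)
    if lib == 0:
        return {}
    per_precursor = Counter(precursor for _, precursor in hits)
    return {p: count * 1_000_000 / lib for p, count in per_precursor.items()}

# The example from the post: 10 reads, each mapping against 2 precursors.
hits = [(f"read{i}", p) for i in range(10) for p in ("pre-1", "pre-2")]

print(library_size(hits))               # 20
print(library_size(hits, unique=True))  # 10
print(rpm(hits)["pre-1"])               # 500000.0
print(rpm(hits, unique=True)["pre-1"])  # 1000000.0
```

With the unique definition, the denominator no longer changes when a multi-precursor miRNA becomes more abundant relative to single-precursor ones.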

bin/env/perl -W Permission denied - Already solved; would like to share only!

I am running miRDeep2 through Anaconda. When I ran the miRDeep2.pl command, the error bin/envs/perl -W Permission denied appeared at the start of the process.

To resolve the issue, go to the Anaconda directory where miRDeep2 is installed, something like /home/$USER/.conda/envs/mirdeep2/bin. There should be four Perl scripts starting with sanity_check_. Open each of them, remove -W from the first line, and save it.
You are good to go!
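If you would rather not edit the four files by hand, the same change can be sketched in Python. This is only an illustration of the manual steps above: the function names `strip_w_flag` and `fix_sanity_checks` are hypothetical, and the script assumes the `-W` flag appears as a separate word on the shebang line. Back up the files before running anything like this.

```python
from pathlib import Path

def strip_w_flag(script):
    """Remove a standalone -W from the first (shebang) line of one script.

    Returns True if the file was changed. Everything after the first
    line is written back byte for byte.
    """
    data = script.read_bytes()
    first, newline, rest = data.partition(b"\n")
    if not first.startswith(b"#!"):
        return False  # no shebang line, leave the file alone
    words = first.split()
    kept = [w for w in words if w != b"-W"]
    if kept == words:
        return False  # no -W flag present
    script.write_bytes(b" ".join(kept) + newline + rest)
    return True

def fix_sanity_checks(bin_dir):
    """Apply strip_w_flag to every sanity_check_* file; return changed names."""
    return sorted(
        path.name
        for path in Path(bin_dir).glob("sanity_check_*")
        if path.is_file() and strip_w_flag(path)
    )
```

Calling `fix_sanity_checks` with the path of your environment's bin directory returns the names of the scripts it modified; a second call returns an empty list because the flag is already gone.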

Rfam_for_miRDeep.fa not found in your miRDeep2 scripts

I am running miRDeep2.0.0.x on a Mac. I went through the installation process specified at https://github.com/rajewsky-lab/mirdeep2 and everything looks fine. I confirmed that bowtie, RNAfold, randfold and make_html.pl are working. However, when I launch miRDeep2 I get this error:

miRDeep2 started at 22:44:14

mkdir mirdeep_runs/run_17_01_2019_t_22_44_14

Error: Rfam_for_miRDeep.fa not found in your miRDeep2 scripts directory
Please copy this file from the miRDeep2 archive to your miRDeep2 directory

I checked another post (#1), which says the file Rfam_for_miRDeep.fa needs to be in the mirdeep2 directory. However, this file is already there in my case.

Files in the mirdeep2 directory:
CHANGELOG.md Rfam_for_miRDeep.fa install_successful
CREDITS TUTORIAL.md lib
DISCLAIMER bin man
FAQ.md documentation.html src
LICENSE essentials tutorial_dir
README.md install.pl

Could anyone provide some help?

Thanks
