In this project we predict the gene structure of the 5 unannotated genomes, i.e. genome6.fa, genome7.fa, genome8.fa, genome9.fa, and genome10.fa (found in genome/). We use hidden Markov models (HMMs) to predict the gene structure, which involves the following three steps:
- Decide on an initial model structure, i.e. the number of hidden states and which transition and emission probabilities should be fixed (e.g. 0 for "not possible", or 1 for "always the case").
- Tune the model parameters by training, i.e. set the non-fixed emission and transition probabilities.
- Use the best model to predict the gene structure of the 5 unannotated genomes using the Viterbi algorithm with subsequent backtracking, i.e. for each unannotated genome, find the most likely sequence of states in the best model generating it, and convert this sequence of states into a FASTA file giving the annotation of each nucleotide as N, C, or R.
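
As an illustration of the third step, here is a minimal sketch of Viterbi decoding with backtracking in log space. It is written in Python for brevity and is not the code in this repository. The function name `viterbi`, the helper `log`, and the toy model with a single hidden state per annotation label (N, C, R) are all assumptions made for the example; a real gene-finding model would normally use several states per label, and its probabilities would come from training rather than being written by hand.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, states, init, trans, emit):
    """Return the most likely state path for obs, computed in log space."""
    if not obs:
        return []
    # score[k] = log-probability of the best path ending in state k
    score = {k: log(init[k]) + log(emit[k][obs[0]]) for k in states}
    back = []  # back[i][k] = best predecessor of state k at position i + 1
    for x in obs[1:]:
        prev, score, pointers = score, {}, {}
        for k in states:
            best = max(states, key=lambda j: prev[j] + log(trans[j][k]))
            score[k] = prev[best] + log(trans[best][k]) + log(emit[k][x])
            pointers[k] = best
        back.append(pointers)
    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: one hidden state per annotation label.
# All probabilities below are made up for illustration only.
STATES = ["N", "C", "R"]
INIT = {"N": 1.0, "C": 0.0, "R": 0.0}  # fixed: always start in non-coding
TRANS = {
    "N": {"N": 0.90, "C": 0.05, "R": 0.05},
    "C": {"N": 0.10, "C": 0.90, "R": 0.00},  # fixed 0: no direct C -> R
    "R": {"N": 0.10, "C": 0.00, "R": 0.90},  # fixed 0: no direct R -> C
}
EMIT = {
    "N": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "C": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "R": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

# One annotation character per nucleotide, ready to be written as FASTA.
annotation = "".join(viterbi("ATGCGCGCTA", STATES, INIT, TRANS, EMIT))
```

Working with log-probabilities avoids numerical underflow on genome-length sequences, and a fixed probability of 0 becomes a score of minus infinity, so forbidden transitions are never chosen during backtracking as long as some allowed path exists.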

To build and run the program:
- Ensure that you have mono installed.
- In the root directory, run
    xbuild /tv:4.0
- Change into the directory
    hhm/bin/[Debug]/
- Run
    mono hhm.exe