Before the beginning, I recommend you to know some basic of pairwise alignment algorithms such as Needleman-Wunch algorithm because I use to calculate pairwise alignments by Needleman-Wunch.
- Find Sc maximizing Σic D(Sc, Si ).(Note: Some resources say minimizing)
- Iteratively construct the multiple alignment Mc:
- Mc={Sc}
- Add the sequences in S{Sc} to Mc one by one so that the induced alignment aMc(Sc, Si) of every newly added sequence Si with Sc is optimal. Add spaces, when needed, to all pre-aligned sequences.
Combination(k,2) * O(n^2) = O(k^2 * n^2)
AC-BC
DCABC
AC--BC
DCAABC
=
AC--BC
DCA-BC
DCAABC
The format of the input file must be this way;
{matchScore},{mismatchScore},{gapScore}
{sequence1}
{sequence2}
{sequence3}
.
.
.
{sequenceN}
for example; whether matchScore = 1, mismatchScore = -1, gapScore = -4 and four strings, the file should be;
1,-1,-4
MPE
MKE
MSKE
SKE
python cstar.py inputfile outputfile
- Score matrix support []