The current implementation of the simple substitution cipher runs in at least O(n^3) time, which is startlingly inefficient. Rewrite SimpleSubstitutionCipher to use chi-square scoring, frequency analysis, a tree of possibilities, and, only after these, an indexed dictionary.
Integrate this into the UI (frequencyanalysissimulator.presentation.main.Main) and the dataanalysis package, with a button to choose the substitution cipher. The algorithm should find the plaintext efficiently when given a ciphertext without the key; a run should seem instantaneous. The UI can also display various properties of the result where relevant.
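As a starting point for the chi-square step, here is a minimal sketch of how a candidate decryption could be scored against English letter frequencies. The class name ChiSquareScorer and its methods are hypothetical and do not exist in the project. The frequency table holds commonly quoted approximate percentages, so it should be replaced if the project already has its own table.

```java
/** Hypothetical helper (not part of the existing project) that scores text against English. */
final class ChiSquareScorer {
    // Approximate relative frequencies of A..Z in English text, in percent (assumed values).
    private static final double[] ENGLISH_PERCENT = {
        8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
        6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07
    };

    /** Counts occurrences of each letter A..Z, ignoring case and every non-letter character. */
    static int[] letterCounts(String text) {
        int[] counts = new int[26];
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            }
        }
        return counts;
    }

    /** Chi-square statistic of the observed letter counts; lower means closer to English. */
    static double chiSquare(String text) {
        int[] counts = letterCounts(text);
        int total = 0;
        for (int n : counts) {
            total += n;
        }
        if (total == 0) {
            return Double.POSITIVE_INFINITY; // no letters, nothing to compare
        }
        double sum = 0.0;
        for (int i = 0; i < 26; i++) {
            double expected = total * ENGLISH_PERCENT[i] / 100.0;
            double diff = counts[i] - expected;
            sum += diff * diff / expected;
        }
        return sum;
    }
}
```

Counting letters is a single pass over the text and the score is a fixed 26-term sum, so scoring one candidate is linear in the text length. How candidates are generated (the tree of possibilities) and when the dictionary is consulted are left open here.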
According to William Friedman's test, the index of coincidence for a polyalphabetic cipher should be less than 0.0660. Every input in the dataset should have this property, because all of them are encrypted with the Vigenère cipher. Instead, the index of coincidence for some values at the beginning of the data collection for friedman_kerckhoff takes large negative values, because the index of coincidence is greater than kappa_r (the probability that two letters selected uniformly at random from the case-insensitive English alphabet match, 1/26 ≈ 0.0385).
The bug is unlikely to be in the Friedman test itself, because the formula was copied verbatim. Possible sources of the bug:
Vigenère encryption, in which case the entire dataset must be recollected (all experimental groups, not just the ones using Friedman) once tests prove that the Vigenère encryption is correct.
The index of coincidence calculation (though it also appears to be taken directly from the formula).
Floating-point precision.
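To help narrow down the candidates above, the sketch below recomputes the index of coincidence with integer counts and applies the commonly cited form of the Friedman key-length estimate, (kappa_p - kappa_r) / (kappa_o - kappa_r). The class FriedmanSketch and its method names are hypothetical, and the project's copied formula may differ from this form. KAPPA_P is taken from the 0.0660 figure quoted above.

```java
/** Hypothetical reference implementation for cross-checking the project's values. */
final class FriedmanSketch {
    static final double KAPPA_P = 0.0660;     // index of coincidence of English, as quoted above
    static final double KAPPA_R = 1.0 / 26.0; // uniform random letters, about 0.0385

    /** Index of coincidence over the letters A..Z only; case and non-letters are ignored. */
    static double indexOfCoincidence(String text) {
        long[] counts = new long[26];
        long n = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
                n++;
            }
        }
        if (n < 2) {
            throw new IllegalArgumentException("need at least two letters");
        }
        long matches = 0;
        for (long f : counts) {
            matches += f * (f - 1); // ordered pairs of equal letters
        }
        // Integer arithmetic up to the final division keeps rounding error out of the sums.
        return (double) matches / (n * (n - 1));
    }

    /** Friedman key-length estimate from an observed index of coincidence kappaO. */
    static double estimateKeyLength(double kappaO) {
        return (KAPPA_P - KAPPA_R) / (kappaO - KAPPA_R);
    }
}
```

In this form the numerator is a positive constant, so the sign of the estimate is decided entirely by the sign of kappa_o - KAPPA_R, and the magnitude grows without bound as kappa_o approaches KAPPA_R. Comparing these values with the stored ones for a few early friedman_kerckhoff inputs would show whether the index of coincidence or the encryption step is responsible.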
Right now, DataCollector takes a substring of the original cipher length, but the original input includes punctuation and spaces, so the resulting data is erroneous. Use a different algorithm, or modify the current one to ignore spaces and punctuation when counting the length, unless the effect turns out to be negligible.
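One possible modification is to measure length in letters rather than characters. The sketch below is only an illustration: the class LetterLength and both methods are hypothetical, and it assumes that only the ASCII letters A-Z and a-z count toward the cipher length.

```java
/** Hypothetical helper for measuring and truncating text by letter count. */
final class LetterLength {
    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** Number of letters in the text, ignoring spaces, digits and punctuation. */
    static int letterCount(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isAsciiLetter(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /**
     * Shortest prefix of the text that contains the requested number of letters.
     * Returns the whole text if it has fewer letters than requested.
     */
    static String prefixWithLetters(String text, int letters) {
        if (letters <= 0) {
            return "";
        }
        int seen = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isAsciiLetter(text.charAt(i)) && ++seen == letters) {
                return text.substring(0, i + 1);
            }
        }
        return text;
    }
}
```

For example, `LetterLength.prefixWithLetters("Hi, there!", 3)` returns `"Hi, t"`, whereas `"Hi, there!".substring(0, 3)` returns `"Hi,"`, which contains only two letters. Comparing `letterCount` with `length()` across the dataset would also show whether the discrepancy is negligible.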