The goal of this project is to build both unigram and bigram language identification systems. The application takes test sentences as input and writes its output files to the log directory. See the following sections for examples and for information on running the application.
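As a rough illustration of the bigram idea, here is a minimal sketch of a character-level bigram model with add-delta smoothing. It is not the project's actual code. The class name `BigramModel`, the a-z normalisation (which drops accented letters), the smoothing constant 0.5, and the `identify` helper are all assumptions made for the example. One model is trained per language, and a sentence is assigned to the language whose model gives it the highest log probability.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical character-bigram model for one language (illustration only). */
class BigramModel {
    // Assumed vocabulary: plain a-z. Accented letters are dropped, which is a simplification.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    // Assumed add-delta smoothing constant.
    private static final double DELTA = 0.5;

    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<Character, Integer> contextCounts = new HashMap<>();

    /** Lowercases the text and keeps only characters in ALPHABET. */
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (ALPHABET.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Counts every adjacent character pair, and how often each first character occurs. */
    void train(String text) {
        String s = normalize(text);
        for (int i = 0; i + 1 < s.length(); i++) {
            bigramCounts.merge(s.substring(i, i + 2), 1, Integer::sum);
            contextCounts.merge(s.charAt(i), 1, Integer::sum);
        }
    }

    /** Sums log10 of the smoothed P(next | previous) over the sentence. */
    double logProbability(String sentence) {
        String s = normalize(sentence);
        double total = 0.0;
        for (int i = 0; i + 1 < s.length(); i++) {
            int pair = bigramCounts.getOrDefault(s.substring(i, i + 2), 0);
            int context = contextCounts.getOrDefault(s.charAt(i), 0);
            total += Math.log10((pair + DELTA) / (context + DELTA * ALPHABET.length()));
        }
        return total;
    }

    /** Returns the label of the model that scores the sentence highest. */
    static String identify(Map<String, BigramModel> models, String sentence) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, BigramModel> entry : models.entrySet()) {
            double score = entry.getValue().logProbability(sentence);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

A unigram model would follow the same pattern, counting single characters instead of pairs. Scores are summed in log space so that long sentences do not underflow to zero.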
To run the application, open the project in your favourite Java IDE and run it from there. A jar file may or may not be provided at demo time.
Below is a list of all the training files available in the /texts directory.
en-full.txt
Contains both texts below.
en-moby-dick.txt
en-the-little-prince.txt
fr-full.txt
Contains both texts below.
fr-le-petit-prince.txt
fr-vingt-mille-lieues-sous-les-mers.txt
po-full.txt
Contains both texts below.
po-emma.txt
po-brida.txt
sentences.txt
In this file, the first 10 sentences were given in the handout, the middle 10 are classified correctly, and the last 10 are classified incorrectly by at least one of the models.