Implementation of Chinese Word Segmentation Using an HMM
The corpus was downloaded from SIGHAN.
Data preprocessing and the calculation of the HMM parameters are done in preprocess.py.
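
As a rough illustration of the parameter calculation, here is a minimal sketch that estimates start, transition, and emission probabilities from an already segmented corpus by counting. It assumes the common B/M/E/S character tagging scheme (begin, middle, end, single), which this README does not confirm, and the names `tag_word` and `estimate_hmm` are hypothetical, not taken from preprocess.py.

```python
from collections import Counter, defaultdict


def tag_word(word):
    """Map one word to its per-character tags (assumed B/M/E/S scheme)."""
    if len(word) == 1:
        return "S"
    return "B" + "M" * (len(word) - 2) + "E"


def normalize(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def estimate_hmm(sentences):
    """Estimate HMM parameters from sentences given as lists of words.

    Returns (start_p, trans_p, emit_p) as plain dicts of probabilities.
    No smoothing is applied in this sketch.
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for words in sentences:
        words = [w for w in words if w]  # skip empty tokens
        if not words:
            continue
        chars = "".join(words)
        tags = "".join(tag_word(w) for w in words)
        start[tags[0]] += 1
        for prev_tag, cur_tag in zip(tags, tags[1:]):
            trans[prev_tag][cur_tag] += 1
        for ch, tag in zip(chars, tags):
            emit[tag][ch] += 1
    start_p = normalize(start)
    trans_p = {tag: normalize(c) for tag, c in trans.items()}
    emit_p = {tag: normalize(c) for tag, c in emit.items()}
    return start_p, trans_p, emit_p
```

For example, `estimate_hmm([["我", "爱", "北京"], ["北京", "欢迎", "你"]])` counts tag and character frequencies over the two toy sentences. A real corpus would usually also need smoothing for unseen characters.
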
The Viterbi algorithm is implemented in hmm.py.
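
To show how Viterbi decoding turns HMM parameters into a segmentation, here is a minimal sketch in log space. It assumes B/M/E/S character tags and parameters stored as nested dicts of probabilities; the names `viterbi`, `tags_to_words`, and `FLOOR` are hypothetical and are not taken from hmm.py.

```python
import math

STATES = "BMES"   # assumed tag set: begin, middle, end, single
FLOOR = 1e-12     # assumed stand-in probability for unseen events


def log_p(table, key):
    """Log-probability of key, with a floor so unseen events stay finite."""
    return math.log(max(table.get(key, 0.0), FLOOR))


def viterbi(text, start_p, trans_p, emit_p):
    """Return the most likely tag string for text under the given HMM."""
    if not text:
        return ""
    # score[t][s] = best log-probability of any tag path ending in s at t
    score = [{s: log_p(start_p, s) + log_p(emit_p.get(s, {}), text[0])
              for s in STATES}]
    back = []  # back[t-1][s] = best predecessor of state s at position t
    for ch in text[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES,
                       key=lambda p: prev[p] + log_p(trans_p.get(p, {}), s))
            cur[s] = (prev[best] + log_p(trans_p.get(best, {}), s)
                      + log_p(emit_p.get(s, {}), ch))
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    tags = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return "".join(reversed(tags))


def tags_to_words(text, tags):
    """Cut text into words: a word ends at every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:  # keep a trailing fragment if the tags end on B or M
        words.append(current)
    return words
```

Working in log space avoids numeric underflow on long sentences, and the floor keeps unseen characters or transitions from producing `log(0)`.
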
Run example.py to get the word segmentation result.
Before running it, change test_dir and train_dir to point to your own data.
I downloaded the corpus to the ./data/ directory, and the result is stored in ./data/result.