Assignments for the Coursera Natural Language Processing course by Michael Collins, Columbia University
----
H1: Hidden Markov Models
----
For instructions, refer to h1/h1.pdf
hmm.py
Hmm_ex, extending Hmm, calculates and stores:
* e(x|y)
* q(y_i|y_{i-1}, y_{i-2})
* count(x)
* rare_word
* all tags
* all words
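The stored quantities are maximum-likelihood estimates from counts over the tagged training data. A minimal sketch of how they can be computed (the function name and data layout are illustrative, not the actual Hmm_ex API):

```python
from collections import defaultdict

def mle_params(tagged_sents):
    """Estimate e(x|y) and trigram q(y_i|y_{i-2}, y_{i-1}) by MLE.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    (Illustrative sketch, not the repo's Hmm_ex implementation.)
    """
    emit = defaultdict(int)       # count(tag -> word)
    tag_count = defaultdict(int)  # count(tag)
    ngram = defaultdict(int)      # tag bigram and trigram counts

    for sent in tagged_sents:
        # pad with two start symbols and a STOP symbol, as in the assignment
        tags = ["*", "*"] + [t for _, t in sent] + ["STOP"]
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
        for i in range(2, len(tags)):
            ngram[(tags[i - 2], tags[i - 1], tags[i])] += 1
            ngram[(tags[i - 2], tags[i - 1])] += 1

    e = {k: c / tag_count[k[0]] for k, c in emit.items()}
    q = {k: c / ngram[k[:2]] for k, c in ngram.items() if len(k) == 3}
    return e, q
```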
SimpleTagger does simple tagging as instructed in Part 1
ViterbiTagger does Viterbi tagging as instructed in Part 2
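The Viterbi tagger finds the highest-probability tag sequence under the trigram HMM with dynamic programming. A sketch of the standard trigram Viterbi recursion (the `q`/`e` dict layout matches the MLE parameters above but is an assumption, not ViterbiTagger's actual interface; unseen parameters are floored at 1e-12 for illustration rather than smoothed properly):

```python
import math

def viterbi(words, tags, q, e):
    """Trigram Viterbi decoding (sketch, not the repo's ViterbiTagger).

    pi[(k, u, v)] = best log-prob of a tag sequence for words[:k]
                    ending with tags u (position k-1) and v (position k).
    """
    def S(k):                       # allowed tags at position k
        return ["*"] if k < 1 else tags

    n = len(words)
    pi = {(0, "*", "*"): 0.0}
    bp = {}
    for k in range(1, n + 1):
        for u in S(k - 1):
            for v in S(k):
                best, best_w = float("-inf"), None
                for w in S(k - 2):
                    prev = pi.get((k - 1, w, u), float("-inf"))
                    score = prev \
                        + math.log(q.get((w, u, v), 1e-12)) \
                        + math.log(e.get((v, words[k - 1]), 1e-12))
                    if score > best:
                        best, best_w = score, w
                pi[(k, u, v)] = best
                bp[(k, u, v)] = best_w
    # best final tag pair, including the transition to STOP
    best, (u, v) = max(
        (pi[(n, u, v)] + math.log(q.get((u, v, "STOP"), 1e-12)), (u, v))
        for u in S(n - 1) for v in S(n)
    )
    seq = [u, v]
    for k in range(n, 2, -1):       # follow backpointers
        seq.insert(0, bp[(k, seq[0], seq[1])])
    return seq[-n:]
```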
p1.py
Part 1
p2.py
Part 2
p3.py
Part 3
Not as good as required: the achieved F1-score is 35.009 versus the goal of 39.519.
util.py
Helper methods including
* handling rare words (applying different rules)
* test data iterator
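The rare-word rules map infrequent words to class tokens before estimating emission parameters, so test-time unknowns still get sensible probabilities. A sketch of one common rule set for this assignment (the class names are illustrative, not necessarily the ones util.py uses):

```python
import re

def classify_rare(word):
    """Map an infrequent word to a word-class token.

    Illustrative rule set: numerals, all-caps, and trailing-capital
    words get their own classes; everything else is _RARE_.
    """
    if re.search(r"\d", word):
        return "_NUMERIC_"
    if word.isupper():
        return "_ALL_CAPS_"
    if word and word[-1].isupper():
        return "_LAST_CAP_"
    return "_RARE_"
```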
----
H2: Probabilistic Context-Free Grammar (PCFG)
----
For instructions, refer to h2/h2.pdf
pcfg.py
PCFG, extending Count, calculates and stores:
* q(X->Y1Y2)
* q(X->w)
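Both rule probabilities are maximum-likelihood estimates: q(X -> Y1 Y2) = count(X -> Y1 Y2) / count(X) and q(X -> w) = count(X -> w) / count(X). A minimal sketch (the count-dict arguments are illustrative, not the repo's Count API):

```python
def rule_probs(binary_counts, unary_counts, nonterminal_counts):
    """MLE rule probabilities for a PCFG in Chomsky normal form.

    binary_counts:      {(X, Y1, Y2): count of X -> Y1 Y2}
    unary_counts:       {(X, w): count of X -> w}
    nonterminal_counts: {X: count of X}
    (Sketch; count dicts are assumptions, not the actual PCFG class.)
    """
    q_binary = {r: c / nonterminal_counts[r[0]] for r, c in binary_counts.items()}
    q_unary = {r: c / nonterminal_counts[r[0]] for r, c in unary_counts.items()}
    return q_binary, q_unary
```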
CKYTagger implements the CKY algorithm
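CKY finds the highest-probability parse under a CNF PCFG by dynamic programming over spans. A sketch of the algorithm (the rule-dict layout follows the MLE estimates above and is an assumption, not CKYTagger's actual interface):

```python
import math
from collections import defaultdict

def cky(words, binary_rules, unary_rules, start="S"):
    """CKY parsing for a PCFG in Chomsky normal form (sketch).

    binary_rules: {(X, Y, Z): q(X -> Y Z)}
    unary_rules:  {(X, w): q(X -> w)}
    Returns (log-prob, tree as nested lists), or None if no parse.
    """
    n = len(words)
    pi = defaultdict(lambda: float("-inf"))  # pi[(i, j, X)] = best log-prob
    bp = {}
    for i, w in enumerate(words):            # length-1 spans: X -> w
        for (X, word), q in unary_rules.items():
            if word == w:
                pi[(i, i, X)] = math.log(q)
                bp[(i, i, X)] = w
    for span in range(1, n):                 # longer spans: X -> Y Z
        for i in range(n - span):
            j = i + span
            for (X, Y, Z), q in binary_rules.items():
                for s in range(i, j):        # split point
                    score = math.log(q) + pi[(i, s, Y)] + pi[(s + 1, j, Z)]
                    if score > pi[(i, j, X)]:
                        pi[(i, j, X)] = score
                        bp[(i, j, X)] = (s, Y, Z)
    if pi[(0, n - 1, start)] == float("-inf"):
        return None

    def build(i, j, X):                      # rebuild tree from backpointers
        if i == j:
            return [X, bp[(i, i, X)]]
        s, Y, Z = bp[(i, j, X)]
        return [X, build(i, s, Y), build(s + 1, j, Z)]

    return pi[(0, n - 1, start)], build(0, n - 1, start)
```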
p1.py
Part 1
p2.py
Part 2
The expected development total F1-scores are 0.79 for Part 2 and 0.83 for Part 3.
p3.py
Part 3
----
H3: IBM Model 1 & 2
----
For instructions, refer to h3/h3.pdf
ibmmodel.py
Count calculates and stores:
* t(f|e)
IBMModel1 implements the EM and alignment algorithms
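IBM Model 1 learns t(f|e) by EM: the E-step distributes each foreign word's count over its possible English alignments in proportion to the current t values, and the M-step renormalizes. A minimal sketch (function name and data layout are illustrative, not the repo's IBMModel1 API):

```python
from collections import defaultdict

def ibm1_em(pairs, iterations=5):
    """EM for IBM Model 1 translation parameters t(f|e).

    pairs: list of (english_words, foreign_words) sentence pairs; a NULL
    token is prepended to every English sentence, as in the assignment.
    (Sketch, not the repo's IBMModel1 implementation.)
    """
    NULL = "_NULL_"
    # uniform initialisation over the foreign vocabulary
    f_vocab = {f for _, fs in pairs for f in fs}
    init = 1.0 / len(f_vocab)
    t = defaultdict(lambda: init)
    for _ in range(iterations):
        count = defaultdict(float)   # expected c(e, f)
        total = defaultdict(float)   # expected c(e)
        for es, fs in pairs:
            es = [NULL] + list(es)
            for f in fs:
                z = sum(t[(e, f)] for e in es)   # normaliser over alignments
                for e in es:
                    delta = t[(e, f)] / z        # posterior alignment prob
                    count[(e, f)] += delta
                    total[e] += delta
        t = defaultdict(float,
                        {(e, f): c / total[e] for (e, f), c in count.items()})
    return t
```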
p1.py
Part 1
The expected development F-scores are 0.420 and 0.449; a basic intersection alignment should give 0.485 for the last part.
----
H4: Global Linear Model (GLM)
----
glm.py
Part 1
```
Found 1337 GENEs. Expected 642 GENEs; Correct: 280.
precision recall F1-Score
GENE: 0.209424 0.436137 0.282971
```
Part 2
```
Found 775 GENEs. Expected 642 GENEs; Correct: 390.
precision recall F1-Score
GENE: 0.503226 0.607477 0.550459
```
Part 3
```
Found 571 GENEs. Expected 642 GENEs; Correct: 366.
precision recall F1-Score
GENE: 0.640981 0.570093 0.603462
```
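Global linear model taggers like the one scored above are typically trained with the structured perceptron: decode each sentence with the current weights and, on a mistake, promote the gold features and demote the predicted ones. A minimal sketch of that update (the `decode` and `features` helpers are assumptions, not glm.py's actual API):

```python
from collections import defaultdict

def perceptron_train(data, decode, features, epochs=5):
    """Structured perceptron training for a global linear model (sketch).

    data:     list of (sentence, gold_tags) pairs
    decode:   (sentence, weights) -> predicted tag sequence (e.g. Viterbi)
    features: (sentence, tags) -> dict mapping feature -> count
    (Illustrative helpers, not the repo's glm.py interface.)
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for sent, gold in data:
            pred = decode(sent, w)
            if pred != gold:
                # additive update: promote gold features, demote predicted
                for f, c in features(sent, gold).items():
                    w[f] += c
                for f, c in features(sent, pred).items():
                    w[f] -= c
    return w
```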