Adam Poliak — apoliak1
Jordan Matelsky — jmatels1
Leaderboard submission: j6k4m8 (and previously, azpoliak)
We set out to implement an alignment algorithm that minimizes alignment error rate (AER), basing our work primarily on IBM Model 1 [1]. In the following sections, we explain (1) our implementation and (2) the specific modifications we made to IBM Model 1, along with the mathematical motivation behind those modifications.
Our implementation is based on IBM Model 1, defining the likelihood of alignment of native word $e_i$ to foreign word $f_j$ as the translation probability $t(f_j \mid e_i)$, which we estimate with expectation-maximization (EM) over the parallel corpus.
We experimented with varying numbers of iterations over which to train our model, finally settling on 5 iterations.
The implementation of IBM Model 1 can be found in the Python script `align`.
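For concreteness, here is a minimal sketch of that training loop, assuming the standard Model 1 EM updates and omitting the NULL word; the function and variable names are illustrative, and the real implementation lives in `align`.

```python
from collections import defaultdict

def train_model1(bitext, iterations=5):
    """EM training of IBM Model 1 translation probabilities t(f | e).

    bitext: list of (foreign_sentence, native_sentence) pairs,
    each a list of tokens. Returns a mapping (f, e) -> t(f | e).
    """
    # Uniform initialization: every co-occurring pair starts with equal weight.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # normalizer per native word e
        # E-step: collect fractional counts for every candidate link.
        for f_sent, e_sent in bitext:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f | e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```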
In order to improve the accuracy of our alignments, we chose to treat the corpus both as a French→English corpus and as an English→French corpus, training a model in each direction and merging the two sets of alignments.
Our initial results took the form of relative translation probabilities. That is, for each pair of a foreign word and a native word, the trained model produced the relative probability that the one translates to the other.
From this point, we chose to experiment with ways of combining the relative probabilities in order to maximize alignment accuracy. Our functions took the form of scoring functions over candidate word pairs (evaluated with `score-alignments`).
The function upon which we finally arrived was of the form

$$\mathrm{score}(i, j) = t(f_j \mid e_i) \cdot \frac{1}{1 + |i - j|}$$

...where $t(f_j \mid e_i)$ is the Model 1 translation probability and $i$ and $j$ are the positions of the foreign and native words within their respective sentences. This function favors 'diagonal' alignments: alignments in which a word at index $i$ in one sentence maps to a word at or near index $i$ in the other, as in the following example.
```
Alignment 8 KEY: ( ) = guessed, * = sure, ? = possible
---------------------------------------------------------------
|(?) ? | jusque
| ? (?) | ici
| * ( ) | ,
| (?) | près
| ? ( ) | de
| * ( ) | $
| * ( ) | 250,000
| * ( ) | ont
| * ( ) | été
| (?) | octroyés
| (*) | par
| (*) ? | le
| (*) | institut
| * ( ) | en
| (*) | subventions
| (?) | à
| ( ) ? | la
| * ( ) | recherche
| * ( ) | ou
| (?) | à
| ? ? ? ? ( ) | de
| ( ) ? ? ? | les
| ? ? (?) | programmes
| ? ? ? ( ) | de
| ? * (?) | éducation
| ? ? ? ( ) | de
| ( ) ? ? ? | le
| ? ? (?) | public
| (*) | .
---------------------------------------------------------------
s f , a $ 2 h b i b t i i g f r o p e a .
o a r 5 a e s y h n n r o e r u d c
r o 0 v e s e s a r s b u t
u , e n u t n e l c i
n 0 e i t a i a v
d 0 d t s r c t i
0 u c i t
t h o y
e n
```
In this case, word alignments that are close to the same index (low $|i - j|$), such as *research* and *recherche* above, are favored. With this system, a word with index $i$ in the foreign sentence is biased toward aligning with a word near index $i$ in the native sentence. This bias can hurt accuracy between languages with different canonical word orders (subject-verb-object vs. object-verb-subject, for instance) or in languages where word order is unimportant or variable.
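A sketch of a scoring function with this diagonal bias follows; the exact decay term (here $1/(1 + |i - j|)$, matching the formula above) is an illustrative assumption, as is the greedy per-word argmax.

```python
def score(t, f_sent, e_sent, i, j):
    """Score aligning foreign word f_sent[i] to native word e_sent[j].

    Combines the Model 1 translation probability with a positional
    term that shrinks as the indices drift apart (the diagonal bias).
    The exact decay term is illustrative, not the tuned function.
    """
    return t[(f_sent[i], e_sent[j])] * (1.0 / (1.0 + abs(i - j)))

def align_sentence(t, f_sent, e_sent):
    """For each foreign word, pick the best-scoring native index."""
    return [max(range(len(e_sent)),
                key=lambda j: score(t, f_sent, e_sent, i, j))
            for i in range(len(f_sent))]
```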
The implementation of the modified IBM Model 1, trained both as a French→English model and as an English→French model, can be found in the Python script `align_merge`.
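The exact merge rule lives in `align_merge`; as a sketch, two standard ways to combine the directions are the intersection and the union of the forward links with the flipped backward links. The large recall gain in the second results table below is consistent with a union-style merge, but that is an inference, not a statement of the shipped rule.

```python
def merge_alignments(forward, backward, mode="union"):
    """Combine directional alignments for one sentence pair.

    forward:  set of (i, j) links from the French->English model.
    backward: set of (j, i) links from the English->French model.
    Union favors recall; intersection favors precision.
    """
    flipped = {(i, j) for (j, i) in backward}
    if mode == "union":
        return forward | flipped
    return forward & flipped
```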
We foresee several ways in which our algorithm could be improved, both in the French→English case we used for testing and experimentation and in the general language-to-language case.
- Parallelization. We began to experiment with parallelizing independent segments of our algorithm in order to improve wall-clock time with no change to algorithmic time complexity (see the sketch after this list). These tests are available in the `align_merge_parallel` executable. However, we experienced memory-allocation errors at larger volumes of input data when running on conventional hardware (an AWS EC2 m3.medium). Though we anticipate more performant runs on cluster-based architectures, we had inadequate time to test this fully.
- Improved, later-generation IBM Model. Using a more sophisticated basis for alignments, prior to adding our "back-and-forth" system, should further reduce our alignment error rate. We anticipate making these changes in the future, and have written our system to be modular in order to simplify such additions.
- Loop combination. We iterate through our vocabularies several times. We anticipate being able to reduce the amount of looping we perform in several ways, given adequate development time.
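As referenced in the parallelization item above, the per-sentence expected counts in the E-step are independent, so the corpus can be split across worker processes and the partial counts summed. A sketch follows (the chunking scheme and worker count are illustrative); note that copying the full `t` table to every worker is consistent with the memory pressure we observed on larger corpora.

```python
from collections import defaultdict
from multiprocessing import Pool

def _partial_e_step(args):
    """E-step over one independent chunk of the bitext."""
    chunk, t = args
    count, total = defaultdict(float), defaultdict(float)
    for f_sent, e_sent in chunk:
        for f in f_sent:
            z = sum(t.get((f, e), 1.0) for e in e_sent)
            for e in e_sent:
                c = t.get((f, e), 1.0) / z
                count[(f, e)] += c
                total[e] += c
    return count, total

def parallel_e_step(bitext, t, workers=4):
    """Fan the corpus out to worker processes, then sum partial counts."""
    chunks = [bitext[k::workers] for k in range(workers)]
    # Each worker receives a full copy of t; this duplication is the
    # likely source of memory pressure at larger input volumes.
    with Pool(workers) as pool:
        parts = pool.map(_partial_e_step, [(c, dict(t)) for c in chunks])
    count, total = defaultdict(float), defaultdict(float)
    for part_count, part_total in parts:
        for key, v in part_count.items():
            count[key] += v
        for e, v in part_total.items():
            total[e] += v
    return count, total
```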
We collected metrics over a variety of runs. The timing data are averaged over several runs; the (deterministic) AER, precision, and recall values were identical across runs with the same parameters.
Figure: x-axis: number of sentences; y-axis: value. On the left, green represents AER, red represents recall, and blue represents precision; the code used to generate this figure is available in `generate_table1.py`. On the right, black points mark the time taken to align that many sentences using 5 iterations. The growth is clearly superlinear; we predict approximately $O(n^3)$.
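For reference, the metrics in the tables below follow the standard sure/possible formulation matching the key in the alignment grid above, with gold sure links $S$, gold possible links $P$ (where $S \subseteq P$), and guessed links $A$; a straightforward computation over sets of links:

```python
def alignment_metrics(guessed, sure, possible):
    """Compute precision, recall, and AER over sets of (i, j) links."""
    p = sure | possible  # sure links count as possible too
    precision = len(guessed & p) / len(guessed)
    recall = len(guessed & sure) / len(sure)
    aer = 1.0 - (len(guessed & sure) + len(guessed & p)) \
              / (len(guessed) + len(sure))
    return precision, recall, aer
```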
Here is a table describing our results for running our modified IBM Model 1 implementation in a single direction (French→English):
| Training data size | 1k lines | 10k lines |
|---|---|---|
| Precision | 0.582734 | 0.682152 |
| Recall | 0.488166 | 0.573964 |
| AER | 0.459603 | 0.366801 |
Here is a table describing our results for running our modified IBM Model 1 implementation after combining the French→English and English→French models:
| Training data size | 1k lines | 10k lines | 20k lines | 50k lines | 100k lines |
|---|---|---|---|---|---|
| Precision | 0.595007 | 0.663435 | 0.672199 | 0.673130 | 0.684211 |
| Recall | 0.748521 | 0.849112 | 0.866864 | 0.872781 | 0.881657 |
| AER | 0.355996 | 0.277358 | 0.265787 | 0.263208 | 0.252830 |
As seen in the tables above, our IBM Model 1 implementation that combines the two directional models outperforms the single-direction version at every training size for which we have both results, cutting AER from 0.366801 to 0.277358 at 10k lines.
The best modification was the IBM Model 1 implementation with the combined directional models trained on 100k lines, which achieved an AER of 0.252830.
[1] Our implementation is derived in general from the text available here.