Machine translation project for the SPP 'Machine Translation' by the RWTH Aachen computer science department i6, summer semester 2023.
- Andreas Pletschko : [email protected]
- Lukas Vierling : [email protected]
- Glen Grant : [email protected]
- Justus Peretti : [email protected]
- Benedikt Hilmes
- Prof. Dr.-Ing. Hermann Ney
| Model | BLEU |
|---|---|
| Feed Forward Model | 0.213 |
| RNN Encoder-Decoder Model | 0.322 |
To train the Feed-forward model, execute the following command:
```
python main.py train ff
```
Or train the recurrent model with:
```
python main.py train rnn
```
You can either pass hyperparameters as arguments to the Python script, e.g.:

```
python main.py train rnn --epoch 10 --optimizer adam
```
Alternatively, you can provide a predefined YAML config file:
```
python main.py train rnn --config <path to config file>
```
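A config file might look like the following. The key names mirror the command-line flags shown above; any further keys depend on what the script actually accepts, so this is an illustrative sketch rather than a definitive schema:

```yaml
# Hypothetical training configuration; key names are illustrative.
epoch: 10
optimizer: adam
```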
To continue training an existing model, use the following command:

```
python main.py train rnn --model_dir <path to model> --optimizer_dir <path to optimizer>
```
To evaluate a model, use the following command:
```
python main.py evaluate rnn --path_to_folder <path to folder with model checkpoints> --model_type <model type, rnn or ff> --dest_path <path where the BLEU scores will be stored>
```
To translate text with a trained model, use the following command:
```
python decode.py --model_type <rnn or ff> --model_path <path to model> --source_path <path to source>
```
Scoring methods are important for machine translation because they quantify the accuracy and quality of the translation output. They make it possible to compare a system's hypotheses against human reference translations, to benchmark different translation models against each other, and to identify where a system needs improvement.
We implement several scoring methods to compare hypotheses against references.
- WER (Word Error Rate)
- PER (Position-independent Error Rate)
- BLEU (Bilingual Evaluation Understudy)
- Levenshtein-Distance
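As a sketch, the word-level metrics above can be implemented with a few lines of dynamic programming. The PER variant here uses a simplified position-independent match count rather than the exact textbook formula, and all function names are our own:

```python
from collections import Counter

def levenshtein(ref, hyp):
    # Dynamic-programming edit distance between two token sequences.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def wer(ref, hyp):
    # Word Error Rate: edit distance normalized by reference length.
    return levenshtein(ref, hyp) / len(ref)

def per(ref, hyp):
    # Position-independent Error Rate (simplified): fraction of
    # reference tokens not matched in the hypothesis, ignoring order.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 1 - matches / len(ref)
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion), and `per(["a", "b"], ["b", "a"])` returns 0 because PER ignores word order while WER does not.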
Byte Pair Encoding (BPE) is a tokenization algorithm that splits words into subwords based on their frequency in a given text corpus. BPE is an important preprocessing step for many NLP tasks, as it reduces the vocabulary size and handles rare and unseen words gracefully. Batching is likewise crucial for efficient training of neural networks, as it allows parallel processing of multiple input samples. Together, BPE and batching make NLP models faster and more scalable for real-world applications: GPT-2, for example, uses byte-level BPE, and BERT uses the closely related WordPiece algorithm, to achieve strong results on a variety of NLP benchmarks.
Implementation of several preprocessing steps.
- Byte Pair Encoding (BPE)
- A dictionary mapping tokens to indices
- Batch Function
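The BPE learning loop can be sketched in a few lines, following the original formulation of repeatedly merging the most frequent adjacent symbol pair. The end-of-word marker `'</w>'` and the function name are our own choices, not the project's actual interface:

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    # corpus: list of words; each word is represented as a tuple of
    # symbols, with an end-of-word marker appended.
    vocab = Counter(tuple(word) + ('</w>',) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace the most frequent pair with its merged symbol everywhere.
        merged_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_vocab[tuple(out)] = freq
        vocab = merged_vocab
    return merges
```

On the toy corpus `["low", "low", "lower"]`, the first two merges are `('l', 'o')` and `('lo', 'w')`, since those pairs occur most frequently.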
A first simple neural model for translating German sentences into English is implemented. The model is built with torch; we then write a training script to learn the model's weights. Finally, we tune the model's hyperparameters and experiment with different architectures to achieve the best possible perplexity on the dev set.
- Training on batches created in Part 2
- Saving and loading models
- Evaluating model on development data periodically
- Printing architecture of the model
- Learning rate scheduling
- Hyper parameter tuning
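To illustrate the learning-rate scheduling point above, a simple step-decay schedule can be written as a standalone function (PyTorch's `torch.optim.lr_scheduler.StepLR` implements the same idea; this sketch just makes the arithmetic explicit):

```python
def lr_schedule(initial_lr, epoch, drop=0.5, epochs_per_drop=5):
    # Step decay: multiply the learning rate by `drop`
    # every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))
```

With the defaults, an initial rate of 0.1 stays at 0.1 for epochs 0-4, drops to 0.05 at epoch 5, and so on.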
Using the obtained translation model, we now perform search to employ the model in a real-world translation scenario. We first implement a scoring function that, given a model and source/target sentence pairs, calculates how likely the model is to predict each target sentence from its source sentence. Specifically, we implement the greedy and beam search algorithms for decoding, as well as an early stopping functionality in our training script. Finally, we evaluate our own model on real translation tasks with the newly implemented search methods.
- Scoring function that calculates a given model's score for source/target sentence pairs
- Search algorithms, featuring greedy search and beam search
- Decoding interface that the user can interact with to input sentences into the system
- Early Stopping
- Automatically evaluating a folder of model checkpoints with BLEU
- Examining the BLEU values of our self-implemented model
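A sketch of beam search over a generic next-token distribution. The `step` callback, token names, and signature are placeholders for illustration, not the project's actual interface:

```python
import math

def beam_search(step, start_token, end_token, beam_size=4, max_len=20):
    # `step(prefix)` is assumed to return a list of (token, probability)
    # pairs for the next position, given the current prefix.
    beams = [([start_token], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in step(prefix):
                candidates.append((prefix + [token], score + math.log(prob)))
        # Keep only the `beam_size` highest-scoring prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == end_token:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```

Greedy search is the special case `beam_size=1`: only the single most probable continuation survives each step.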
RNNs are crucial in encoder-decoder structures for machine translation due to their ability to capture sequential dependencies. Unlike FNNs, RNNs excel at modeling contextual information, making them more suitable for variable-length input sequences. Their recurrent nature enables them to retain memory of past inputs, capturing long-range dependencies and improving translation quality. RNNs with attention mechanisms dynamically focus on relevant information, aligning source and target languages for better translations. Compared to simple FNNs, RNNs in encoder-decoder structures achieve more accurate and fluent machine translations.
To enhance the performance of our model, we performed hyperparameter tuning. Hyperparameter tuning is an essential part of machine learning, as it allows us to find the best possible configuration for our model. We tuned the following hyperparameters and techniques:
- Number of layers
- Number of hidden units
- Size of embedding layer
- Dropout rate
- Batch size
- BPE encoding
- Layer normalization
- Gradient clipping
- Teacher forcing
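Tuning over options like the ones above can be organized as a simple grid search. Here `train_and_eval` is a placeholder for whatever trains one configuration and returns its dev score; the function and key names are illustrative:

```python
from itertools import product

def grid_search(grid, train_and_eval):
    # `grid` maps hyperparameter names to lists of candidate values;
    # `train_and_eval(cfg)` is assumed to train one configuration
    # and return its score on the dev set (higher is better).
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

For example, a grid with two layer counts and two hidden sizes yields four training runs, and the configuration with the best dev score is returned.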