Machine translation project for the SPP 'Machine Translation' by the RWTH Aachen computer science department i6, summer semester 2023.
- Andreas Pletschko : [email protected]
- Lukas Vierling : [email protected]
- Glen Grant : [email protected]
- Justus Peretti : [email protected]
- Benedikt Hilmes
- Prof. Dr.-Ing. Hermann Ney
| Model | BLEU |
|---|---|
| Feed Forward Model | 0.213 |
| RNN Encoder-Decoder Model | 0.322 |
To train the Feed-forward model, execute the following command:
```
python main.py train ff
```
Or train the recurrent model with:
```
python main.py train rnn
```
You can either pass hyperparameters as arguments to the Python script, e.g.:

```
python main.py train rnn --epoch 10 --optimizer adam
```
Alternatively, you can provide a predefined YAML config file:
```
python main.py train rnn --config <path to config file>
```
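A config file might look like the following. The key names mirror the command-line flags shown above; any further keys depend on what the script actually accepts, so this is an illustrative sketch rather than a definitive schema:

```yaml
# Hypothetical training configuration; key names are illustrative.
epoch: 10
optimizer: adam
```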
To continue training an existing model, use the following command:

```
python main.py train rnn --model_dir <path to model> --optimizer_dir <path to optimizer>
```
To evaluate a model, use the following command:
```
python main.py evaluate rnn --path_to_folder <path to folder with model checkpoints> --model_type <model type, rnn or ff> --dest_path <path where the BLEU scores will be stored>
```
To translate text with a trained model, use the following command:
```
python decode.py --model_type <rnn or ff> --model_path <path to model> --source_path <path to source>
```
Scoring methods are important for machine translation because they quantify the accuracy and quality of the translation output. They make it possible to compare a system's hypotheses against human reference translations, to benchmark different translation models against each other, and to identify where a system needs improvement.
We implement several scoring methods to compare hypotheses against references.
- WER (Word Error Rate)
- PER (Position-independent Error Rate)
- BLEU (Bilingual Evaluation Understudy)
- Levenshtein-Distance
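As a sketch, the word-level metrics above can be implemented with a few lines of dynamic programming. The PER variant here uses a simplified position-independent match count rather than the exact textbook formula, and all function names are our own:

```python
from collections import Counter

def levenshtein(ref, hyp):
    # Dynamic-programming edit distance between two token sequences.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def wer(ref, hyp):
    # Word Error Rate: edit distance normalized by reference length.
    return levenshtein(ref, hyp) / len(ref)

def per(ref, hyp):
    # Position-independent Error Rate (simplified): fraction of
    # reference tokens not matched in the hypothesis, ignoring order.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 1 - matches / len(ref)
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion), and `per(["a", "b"], ["b", "a"])` returns 0 because PER ignores word order while WER does not.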
Byte Pair Encoding (BPE) is a tokenization algorithm that splits words into subwords based on their frequency in a given text corpus. BPE is an important preprocessing step for many NLP tasks, as it reduces the vocabulary size and handles rare and unseen words gracefully. Batching is likewise crucial for efficient training of neural networks, as it allows parallel processing of multiple input samples. Together, BPE and batching make NLP models faster and more scalable for real-world applications: GPT-2, for example, uses byte-level BPE, and BERT uses the closely related WordPiece algorithm, to achieve strong results on a variety of NLP benchmarks.
Implementation of several preprocessing steps.
- Byte Pair Encoding (BPE)
- A dictionary mapping tokens to indices
- Batch Function
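The BPE learning loop can be sketched in a few lines, following the original formulation of repeatedly merging the most frequent adjacent symbol pair. The end-of-word marker `'</w>'` and the function name are our own choices, not the project's actual interface:

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    # corpus: list of words; each word is represented as a tuple of
    # symbols, with an end-of-word marker appended.
    vocab = Counter(tuple(word) + ('</w>',) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace the most frequent pair with its merged symbol everywhere.
        merged_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_vocab[tuple(out)] = freq
        vocab = merged_vocab
    return merges
```

On the toy corpus `["low", "low", "lower"]`, the first two merges are `('l', 'o')` and `('lo', 'w')`, since those pairs occur most frequently.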
A first simple neural model for translating German sentences into English is implemented. The model is built with torch; we then write a training script to learn the model's weights. Finally, we tune the model's hyperparameters and experiment with different architectures to achieve the best possible perplexity on the dev set.
- Training on batches created in Part 2
- Saving and loading models
- Evaluating model on development data periodically
- Printing architecture of the model
- Learning rate scheduling
- Hyper parameter tuning
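To illustrate the learning-rate scheduling point above, a simple step-decay schedule can be written as a standalone function (PyTorch's `torch.optim.lr_scheduler.StepLR` implements the same idea; this sketch just makes the arithmetic explicit):

```python
def lr_schedule(initial_lr, epoch, drop=0.5, epochs_per_drop=5):
    # Step decay: multiply the learning rate by `drop`
    # every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))
```

With the defaults, an initial rate of 0.1 stays at 0.1 for epochs 0-4, drops to 0.05 at epoch 5, and so on.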
Using the obtained translation model, we now perform search to employ the model in a real-world translation scenario. We first implement a scoring function that, given a model and source/target sentence pairs, calculates how likely the model is to predict each target sentence from its source sentence. Specifically, we implement the greedy and beam search algorithms for decoding, as well as an early stopping functionality in our training script. Finally, we evaluate our own model on real translation tasks with the newly implemented search methods.
- Scoring function that calculates a given model's score for source/target sentence pairs
- Search algorithms, featuring greedy search and beam search
- Decoding interface that the user can interact with to input sentences into the system
- Early Stopping
- Automatically evaluating a folder of model checkpoints with BLEU
- Examining the BLEU values of our self-implemented model
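A sketch of beam search over a generic next-token distribution. The `step` callback, token names, and signature are placeholders for illustration, not the project's actual interface:

```python
import math

def beam_search(step, start_token, end_token, beam_size=4, max_len=20):
    # `step(prefix)` is assumed to return a list of (token, probability)
    # pairs for the next position, given the current prefix.
    beams = [([start_token], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in step(prefix):
                candidates.append((prefix + [token], score + math.log(prob)))
        # Keep only the `beam_size` highest-scoring prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == end_token:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```

Greedy search is the special case `beam_size=1`: only the single most probable continuation survives each step.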
RNNs are crucial in encoder-decoder structures for machine translation due to their ability to capture sequential dependencies. Unlike FNNs, RNNs excel at modeling contextual information, making them more suitable for variable-length input sequences. Their recurrent nature enables them to retain memory of past inputs, capturing long-range dependencies and improving translation quality. RNNs with attention mechanisms dynamically focus on relevant information, aligning source and target languages for better translations. Compared to simple FNNs, RNNs in encoder-decoder structures achieve more accurate and fluent machine translations.
To enhance the performance of our model, we performed hyperparameter tuning. Hyperparameter tuning is an essential part of machine learning, as it allows us to find the best possible configuration for our model. We tuned the following hyperparameters and techniques:
- Number of layers
- Number of hidden units
- Size of embedding layer
- Dropout rate
- Batch size
- BPE encoding
- Layer normalization
- Gradient clipping
- Teacher forcing
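Tuning over options like the ones above can be organized as a simple grid search. Here `train_and_eval` is a placeholder for whatever trains one configuration and returns its dev score; the function and key names are illustrative:

```python
from itertools import product

def grid_search(grid, train_and_eval):
    # `grid` maps hyperparameter names to lists of candidate values;
    # `train_and_eval(cfg)` is assumed to train one configuration
    # and return its score on the dev set (higher is better).
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

For example, a grid with two layer counts and two hidden sizes yields four training runs, and the configuration with the best dev score is returned.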