

Machine Translation

Machine translation project for the SPP 'Machine Translation' offered by the RWTH Aachen computer science department i6 in the summer semester 2023.

Creators:

Supervisors:

  • Benedikt Hilmes
  • Prof. Dr.-Ing. Hermann Ney

Results

Model                       BLEU
Feed Forward Model          0.213
RNN Encoder-Decoder Model   0.322

Usage

To train the Feed-forward model, execute the following command:

python main.py train ff

Or train the recurrent model with:

python main.py train rnn

You can pass hyperparameters as command-line arguments, e.g.:

python main.py train rnn --epoch 10 --optimizer adam

Alternatively, you can provide a predefined YAML config file:

python main.py train rnn --config <path to config file>

To continue training an existing model, use the following command:

python main.py train rnn --model_dir <path to model> --optimizer_dir <path to optimizer>

To evaluate a model, use the following command:

python main.py evaluate rnn --path_to_folder <path to folder with model checkpoints> --model_type <model type, rnn or ff> --dest_path <path where the BLEU scores will be stored>

To translate text with a trained model, use the following command:

python decode.py --model_type <rnn or ff> --model_path <path to model> --source_path <path to source>

Part 1

Scoring methods are important for machine translation because they provide a way to measure the accuracy and quality of the translation output. This helps to identify areas of improvement and evaluate the performance of different translation models. Additionally, scoring methods are necessary to compare the translation output to the reference or human-generated translations, which is essential for benchmarking and evaluation of machine translation systems.
Implementation of several scoring methods to compare hypotheses against references.

  • WER (Word Error Rate)
  • PER (Position-independent Error Rate)
  • BLEU (Bilingual Evaluation Understudy)
  • Levenshtein-Distance
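The WER and Levenshtein metrics above can be sketched in a few lines of dynamic programming. This is an illustrative implementation, not the repository's actual code; function names are made up for the example.

```python
def levenshtein(hyp, ref):
    """Edit distance between two token sequences via dynamic programming."""
    # prev[j] holds the distance between hyp[:i-1] and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (h != r),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]

def wer(hyp, ref):
    """Word Error Rate: word-level edit distance normalized by reference length."""
    return levenshtein(hyp.split(), ref.split()) / len(ref.split())
```

PER drops the ordering constraint (it only compares word counts), and BLEU replaces edit distance with n-gram precision plus a brevity penalty.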

Part 2

Byte Pair Encoding (BPE) is a tokenization algorithm that splits words into subwords based on their frequency in a given text corpus. BPE is an important preprocessing step for many NLP tasks, as it reduces the vocabulary size and lets the model handle rare and unseen words as sequences of known subwords. In addition, batching is a crucial technique for efficient training of neural networks, as it allows parallel processing of multiple input samples. Together, BPE and batching make NLP models significantly faster and more scalable for real-world applications: GPT-2, for example, relies on byte-level BPE, and BERT uses the closely related WordPiece algorithm.
Implementation of several preprocessing steps.

  • Byte Pair Encoding (BPE)
  • A dictionary mapping tokens to indices
  • Batch Function
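The core of BPE training is repeatedly merging the most frequent adjacent symbol pair. A minimal sketch of that loop, assuming a vocabulary of symbol tuples with frequencies (illustrative only, not the repository's implementation):

```python
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs over a {symbol-tuple: frequency} vocab."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` by its concatenation."""
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(vocab, num_merges):
    """Learn `num_merges` BPE merge operations from a symbol-level vocab."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(vocab)
        vocab = merge_pair(vocab, pair)
        merges.append(pair)
    return merges, vocab
```

At apply time, the learned merge list is replayed in order on new text; the batch function then groups the resulting token sequences into fixed-size, padded batches.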

Part 3

A first simple neural model for translating German sentences to English is implemented. The model is implemented in PyTorch; we then write a training script to learn the model's weights. Finally, we tune the model's hyperparameters and experiment with different architectures to achieve the best possible perplexity on the dev set.

  • Training on batches created in Part 2
  • Saving and loading models
  • Evaluating model on development data periodically
  • Printing architecture of the model
  • Learning rate scheduling
  • Hyperparameter tuning
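The learning rate scheduling mentioned above can follow a simple reduce-on-plateau rule: halve the rate whenever dev perplexity stops improving for a few epochs. A minimal sketch, assuming perplexity is evaluated once per epoch (class and parameter names are illustrative):

```python
class ReduceOnPlateau:
    """Halve the learning rate when dev perplexity stops improving."""

    def __init__(self, lr, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor        # multiplier applied on a plateau
        self.patience = patience    # epochs without improvement to tolerate
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, dev_perplexity):
        """Call once per evaluation; returns the (possibly reduced) lr."""
        if dev_perplexity < self.best:
            self.best = dev_perplexity
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

PyTorch ships an equivalent built-in (`torch.optim.lr_scheduler.ReduceLROnPlateau`); the sketch just makes the logic explicit.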

Model Architecture

Part 4

Using the trained translation model, we now perform search to employ the model in a real translation scenario. We first implement a scoring function that, given a model and source/target sentence pairs, calculates how likely the model is to predict each target sentence from its source sentence. Specifically, we implement the greedy and beam search algorithms for decoding, as well as early stopping in our training script. Finally, we evaluate our model on real translation tasks with the newly implemented search methods.

  • Scoring function that calculates a given model's score for source/target sentence pairs
  • Search algorithms, featuring greedy search and beam search
  • Decoding interface that the user can interact with to translate input sentences
  • Early Stopping
  • Automatically evaluating a folder of model checkpoints with BLEU
  • Examining the BLEU values with our self-implemented model
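Beam search keeps the `beam_size` best partial hypotheses per step instead of just the single best (greedy is the `beam_size=1` special case). A model-agnostic sketch, assuming a `step_fn` that maps a prefix to next-token probabilities (names are illustrative, not the repository's API):

```python
import math

def beam_search(step_fn, bos, eos, beam_size=3, max_len=10):
    """step_fn(prefix) -> {token: probability}; returns the best sequence.

    Hypotheses are scored by summed log-probabilities; a hypothesis is
    finished once it emits `eos`."""
    beams = [([bos], 0.0)]              # (prefix, log-prob score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, p in step_fn(prefix).items():
                candidates.append((prefix + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:                   # every surviving hypothesis ended
            break
    best = max(finished + beams, key=lambda c: c[1])
    return best[0]
```

In practice the scores would come from the trained model's softmax output, and length normalization is often added so shorter hypotheses are not unfairly favored.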

Part 5

RNNs are crucial in encoder-decoder structures for machine translation due to their ability to capture sequential dependencies. Unlike FNNs, RNNs excel at modeling contextual information, making them more suitable for variable-length input sequences. Their recurrent nature enables them to retain memory of past inputs, capturing long-range dependencies and improving translation quality. RNNs with attention mechanisms dynamically focus on relevant information, aligning source and target languages for better translations. Compared to simple FNNs, RNNs in encoder-decoder structures achieve more accurate and fluent machine translations.
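The attention mechanism described above boils down to softmax-normalized similarity scores between a decoder query and the encoder states, used to form a weighted average. A dependency-free sketch with plain lists (illustrative only):

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax(query . key_i) weights over values."""
    # similarity of the query to each encoder state
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # context vector: attention-weighted average of the values
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context
```

The decoder consumes the context vector at each step, so it can focus on different source positions for different target words.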

Part 6

To enhance the performance of our model, we performed hyperparameter tuning. Hyperparameter tuning is an essential part of machine learning, as it allows us to find the best possible configuration for our model. We tuned the following hyperparameters:

  • Number of layers
  • Number of hidden units
  • Size of embedding layer
  • Dropout rate
  • Batch size
  • BPE encoding
  • Layer normalization
  • Gradient clipping
  • Teacher forcing
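Tuning over a list of hyperparameters like the one above is often organized as a grid search: train and evaluate one model per combination and keep the best. A minimal sketch, assuming a `train_and_eval` callback that returns a dev score where lower is better (names are illustrative):

```python
import itertools

def grid_search(train_and_eval, grid):
    """Evaluate every hyperparameter combination in `grid`.

    `grid` maps a hyperparameter name to the list of values to try;
    `train_and_eval(config)` returns a dev score (lower is better)."""
    best_config, best_score = None, float("inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = train_and_eval(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Since each full training run is expensive, in practice one usually tunes a few hyperparameters at a time or samples configurations randomly instead of exhausting the grid.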

Contributors

  • lukasvierling
