Argos Translate | Tutorial | Video tutorial
Argos Train trains an OpenNMT PyTorch Transformer model and a SentencePiece tokenizer and packages them with Stanza data as an Argos Translate package. Argos Translate packages, which are zip archives with a .argosmodel extension, can be used with Argos Translate, LibreTranslate, and Dot Lexicon.
Pre-trained Argos Translate packages are available for download. If you have trained packages you're willing to share, please get in contact so that they can be published on the Argos Translate package index.
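Because packages are ordinary zip archives, their contents can be inspected with standard tooling. The following is a minimal sketch using Python's `zipfile` module; the function name `list_package_files` is hypothetical and not part of Argos Train, and the sketch makes no assumption about which files a given package contains.

```python
import zipfile


def list_package_files(package_path):
    """Return the names of all entries stored in a zip-based package.

    Hypothetical helper for illustration; it only relies on the fact
    that .argosmodel files are zip archives.
    """
    with zipfile.ZipFile(package_path) as archive:
        return archive.namelist()
```

For example, `list_package_files("en_es.argosmodel")` would return the entry names of a package previously saved by `argos-train`.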
$ su argosopentech
$ source ~/argos-train-init
...
$ argos-train
From code (ISO 639): en
To code (ISO 639): es
From name: English
To name: Spanish
Version: 1.0
...
Package saved to /home/argosopentech/argos-train/run/en_es.argosmodel
Data from data-index.json is used for training. Argos Translate primarily uses data from the Opus project.
To train a model with custom data, add your data to data-index.json after running argos-train-init, with a link to download your custom data package. Data packages are zipped directories with a .argosdata extension that contain a source and target file with parallel data in corresponding lines and a metadata.json file.
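A data package of the shape described above could be assembled roughly as follows. This is a sketch under stated assumptions, not the project's own tooling: the function name `build_data_package`, the directory name, and the metadata keys shown in the usage note are illustrative, so check an existing entry in data-index.json for the fields Argos Train actually expects.

```python
import json
import zipfile


def build_data_package(source_lines, target_lines, metadata, out_path, dir_name):
    """Write a zipped directory containing source, target and metadata.json.

    Hypothetical helper: line i of `source` is the translation pair of
    line i of `target`, so both lists must have the same length.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    for line in list(source_lines) + list(target_lines):
        if "\n" in line:
            # An embedded newline would shift every following line pair.
            raise ValueError("each sentence must fit on a single line")

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{dir_name}/source", "\n".join(source_lines) + "\n")
        archive.writestr(f"{dir_name}/target", "\n".join(target_lines) + "\n")
        archive.writestr(f"{dir_name}/metadata.json", json.dumps(metadata, indent=4))
    return out_path
```

A call might look like `build_data_package(["Hello", "Thank you"], ["Hola", "Gracias"], {"name": "my-custom-data", "from_code": "en", "to_code": "es"}, "my-custom-data.argosdata", "my-custom-data")`, where the metadata keys are assumptions. The resulting file then needs to be hosted somewhere and linked from data-index.json.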
Docker image available at argosopentech/argostrain.
docker run -it argosopentech/argostrain /bin/bash
argos-train
Training requires CUDA and has been tested on vast.ai.
Licensed under either the MIT or CC0 License (same as Argos Translate).