Comments (18)
@sriprad
The issue is with fairseq: sadly, it is not compatible with Windows due to a badly written setup script.
Fairseq is required for the mBART and m2m models.
If you use opus-mt, fairseq is not required. You can then install EasyNMT like this:
pip install --no-deps easynmt
pip install tqdm transformers numpy nltk sentencepiece
You also need pytorch:
https://pytorch.org/get-started/locally/
If you need automatic language detection, you also need fastText, which can be installed like this:
pip install fasttext
or when you use Anaconda: https://anaconda.org/conda-forge/fasttext
In that case you can use the opus-mt model.
from easynmt.
Thank you so much @nreimers. Really helpful. However, I ran into another issue, which may not be related to the library. I always end up in the wrong place to download this. Can you help me here?
building 'fasttext_pybind' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
Also, I am running the code like this. I hope this is correct.
from easynmt import EasyNMT
model = EasyNMT('opus-mt')
#Translate a single sentence to German
print(model.translate('This is a sentence we want to translate to German', target_lang='de'))
#Translate several sentences to German
sentences = ['You can define a list with sentences.',
             'All sentences are translated to your target language.',
             'Note, you could also mix the languages of the sentences.']
print(model.translate(sentences, target_lang='de'))
fastText is used for the automatic language detection.
When on Windows and using Anaconda, you can install it like this:
https://anaconda.org/conda-forge/fasttext
Yes, the code you showed is correct.
If you provide the source language (source_lang=), fastText is not needed. In that case, you don't have to install it.
Translation of documents (like Word, PowerPoint, Excel) is sadly not yet supported (it is on the agenda, help appreciated).
But you could implement it yourself for Excel:
- Store the sheet as CSV
- Load it with Python
- Translate all entries
- Write it back to CSV
- Load the CSV with Excel
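The steps above can be sketched as follows. This is only a minimal sketch under stated assumptions, not part of EasyNMT: the helper name `translate_csv_column`, the default column name `translated_text`, and the file names in the usage note are hypothetical. `translate_fn` stands for any callable that takes a list of strings and returns a list of translations.

```python
import csv


def translate_csv_column(in_path, out_path, column, translate_fn,
                         new_column="translated_text"):
    """Read a CSV, translate one column, and write the result to a new CSV.

    translate_fn is a hypothetical callable: list of str -> list of str.
    """
    # Load all rows from the CSV exported from Excel
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # Translate all entries of the chosen column in a single call
    texts = [row[column] for row in rows]
    translations = translate_fn(texts) if texts else []

    # Attach each translation to its row under the new column
    for row, translated in zip(rows, translations):
        row[new_column] = translated
    if new_column not in fieldnames:
        fieldnames.append(new_column)

    # Write the rows back to a CSV that Excel can open
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

With EasyNMT, the callable could wrap the model, for example `translate_csv_column("in.csv", "out.csv", "source_text", lambda texts: model.translate(texts, source_lang="en", target_lang="de"))`, assuming the CSV has a column named `source_text`.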
Thank you @nreimers. If I provide (source_lang=), what should the code look like? What changes do I need to make below to:
- set source_lang
- pass the CSV file
from easynmt import EasyNMT
model = EasyNMT('opus-mt')
#Translate a single sentence to German
print(model.translate('This is a sentence we want to translate to German', target_lang='de'))
#Translate several sentences to German
sentences = ['You can define a list with sentences.',
             'All sentences are translated to your target language.',
             'Note, you could also mix the languages of the sentences.']
print(model.translate(sentences, target_lang='de'))
Sure, happy to support in development.
Your code must then look like this:
from easynmt import EasyNMT
model = EasyNMT('opus-mt')
#Translate a single sentence to German
print(model.translate('This is a sentence we want to translate to German', source_lang='en', target_lang='de'))
Note, with version 1.0.2:
https://github.com/UKPLab/EasyNMT/releases/tag/v1.0.2
I added two more options for language detection that are compatible with Windows:
pip install langid
If fastText is not installed, it will fall back to either langid or to langdetect.
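To see which of these detection packages is available in an environment, a small check like the one below can help. It is a sketch only and does not reproduce EasyNMT's internal logic: the function name `pick_language_detector` is hypothetical, and the preference order (fastText first, then langid, then langdetect) is an assumption based on the comment above.

```python
import importlib.util


def pick_language_detector(candidates=("fasttext", "langid", "langdetect")):
    """Return the name of the first importable package in candidates, or None."""
    for name in candidates:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prints the first available package name, or None if none is installed
print(pick_language_detector())
```

If this prints `None`, none of the three packages is installed, and passing `source_lang` explicitly avoids the need for language detection.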
Thank you @nreimers. I could run it with your examples.
Below, I am trying to load a CSV with a column that needs to be translated.
How should I pass the DataFrame?
Thanks
df = pd.read_csv("C:/xx/trans.csv",encoding = 'unicode_escape')
#how to pass the dataframe?
print(model.translate(, source_lang='en', target_lang='de'))
Hi @sriprad
You could try:
df['translated_text'] = model.translate(df['source_text'], source_lang='en', target_lang='de')
I'm not sure what the pandas DataFrame looks like for your CSV. You might have to update the column names in the Python code.
Thank you @nreimers. Very helpful.
Do you need a GPU to run it? I am running the translation on 10k rows of data. Each row varies in length from 5 to 8 lines. It has been running for the last 30 minutes and is still running.
It finally finished after 1 hour 25 minutes. But brilliant. The translation is amazing. Great work @nreimers
Happy to hear that :)
You can pass show_progress_bar=True to the translate method.
But yes, these models are quite slow on a CPU (see the README). I recommend using Google Colab: there you get a GPU for free, which significantly speeds up the processing.
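For long CPU runs like the 10k-row job above, one option is to translate in chunks so that progress is visible and partial results exist if the run is interrupted. The sketch below is an illustration, not an EasyNMT feature: `translate_in_chunks`, `chunk_size`, and `on_progress` are hypothetical names, and `translate_fn` is any callable that maps a list of strings to a list of translations.

```python
def translate_in_chunks(texts, translate_fn, chunk_size=500, on_progress=None):
    """Translate texts chunk by chunk and return all translations in order."""
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # Each call handles at most chunk_size texts
        results.extend(translate_fn(chunk))
        if on_progress is not None:
            # Report how many texts are done out of the total
            on_progress(len(results), len(texts))
    return results
```

With EasyNMT, `translate_fn` could be `lambda chunk: model.translate(chunk, source_lang="en", target_lang="de")`, and `on_progress` could simply print the two numbers it receives.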
Thank you @nreimers :). Sure, I will look into Google Colab. But I am not sure whether we are allowed to use Google Colab for official purposes.
@sriprad
A Docker container will soon be published. This will make it easy to run (as long as you have Docker installed).
@nreimers thank you. Is there a possibility of extending this to translate Word documents, or lengthy contracts?
@sriprad
Translating Word documents is quite difficult, as docx is quite a complex format. Further, the content is mixed together with style and format commands. So extracting the text, translating it, and putting it back into a valid and nicely formatted Word document is non-trivial.
This is also happening to me on Arch Linux.
Edit: Fixed it using Python 3.7 instead of Python 3.9
@nreimers If you can somehow use Okapi filters, you would be able to use most formats.
https://okapiframework.org/wiki/index.php/Filters
Or maybe give higher priority to XLIFF files, since this is the main format in the translation industry.