Comments (13)
Some support was added to LibreTranslate in LibreTranslate/LibreTranslate#12
from argos-translate.
Recently I saw an article comparing language detection tools. FastText could be a viable alternative to langdetect because it is a lot faster.
Another option, which can be quite accurate for longer texts, is n-grams. There are predetermined n-grams for all supported languages, and it is easy to generate new lists. The advantages of this approach are that the models are really small, the implementation is easy, and it does not need any extra library. In any case, if help is needed, I can implement these.
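A minimal sketch of the n-gram idea using only the Python standard library, assuming the classic rank-based approach (Cavnar and Trenkle's "out-of-place" distance): each language is represented by its most frequent character 1- to 3-grams, and the input is assigned to the language whose ranked list is closest. The function names, the profile size of 300, the penalty for unseen n-grams, and the two one-sentence toy profiles are hypothetical choices for illustration; real profiles would be generated from much larger per-language corpora, and none of this is existing Argos Translate code.

```python
from __future__ import annotations

from collections import Counter

TOP_K = 300  # assumed profile size; also used as the penalty for an unseen n-gram


def ngram_profile(text: str, n_max: int = 3, top_k: int = TOP_K) -> list[str]:
    """Rank the character 1..n_max-grams of `text`, most frequent first."""
    counts: Counter[str] = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # "_" marks word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc: list[str], lang: list[str], penalty: int = TOP_K) -> int:
    """Sum of rank differences; n-grams missing from `lang` cost `penalty`."""
    rank = {gram: i for i, gram in enumerate(lang)}
    return sum(abs(i - rank[g]) if g in rank else penalty for i, g in enumerate(doc))


def detect(text: str, profiles: dict[str, list[str]]) -> str:
    """Return the language code whose profile is closest to `text`."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda code: out_of_place(doc, profiles[code]))


# Hypothetical toy profiles built from one sentence each; real ones need large corpora.
profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat "
                        "is sleeping in the house with the children"),
    "de": ngram_profile("der schnelle braune fuchs springt über den faulen hund und "
                        "die katze schläft im haus mit den kindern"),
}
print(detect("the dog is in the house", profiles))  # expected: en
```

Because each language is just a ranked list of a few hundred short strings, profiles like these could be stored as plain text and regenerated whenever a language is added. With such short inputs and toy profiles the result is only indicative; accuracy improves with longer texts, as noted above.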
This is the way to do it for core Argos Translate; the only thing I might change is using "detect" instead of "auto-detect".
This would be pretty useful for any automated translation mechanism!
@hollorol If you can do this with just the Python standard library, a pull request would be appreciated.
LibreTranslate already has a system for language detection, so this hasn't been a priority. My plan was to use CTranslate2 models to map input text to a language code, but I'm open to suggestions.
Interesting. I think using the same pipeline would be a good long-term solution, but this could be something to do in the meantime. One issue with using the pipeline is that as soon as we add a new language, we also have to retrain the detector. This would probably also be lighter weight than a 100 MB model file. The main interest in this currently comes from LibreTranslate, so if someone wants to extend the Python API to use this, that would be welcome, and the API could be reimplemented in the future if it makes sense.
Not everyone uses LibreTranslate.
The way Argos Translate currently works it would be a breaking change to add this but I'm planning to add it in the next major version. It would also be possible to add language detection to the GUI (which is in a separate repo) using a third party library like Lingua.
@PJ-Finlay, I'll do it only for the CLI because I don't use the GUI part of the program, but I guess adapting it to the GUI afterwards will be easy.
That sounds good; it should probably be its own file/module that can be integrated into the CLI.
Lingua might be useful for this. Lingua is written in Python, works with short strings, works offline, and is licensed under Apache-2.0.
I could see it being used as a special input value that would trigger language detection. The syntax could be something like this:
echo "Text to translate" | argos-translate --from-lang auto-detect --to-lang en
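A rough sketch of how the proposed `--from-lang auto-detect` syntax could be wired up with the standard `argparse` module, keeping detection in its own function as suggested. This is not the actual argos-translate CLI: `detect_language` is a hypothetical placeholder with a toy umlaut rule, `resolve_from_lang` and `main(argv, text)` are illustrative names, and translation itself is left out.

```python
from __future__ import annotations

import argparse
import sys

# Sentinel value from the proposed syntax (an assumption; "detect" would work the same way).
AUTO_DETECT = "auto-detect"


def detect_language(text: str) -> str:
    """Hypothetical stand-in for a detector living in its own module.

    Toy rule for illustration only: text containing German umlauts or "ß" is "de",
    everything else is "en". A real build would call an n-gram detector or Lingua.
    """
    return "de" if any(ch in text.lower() for ch in "äöüß") else "en"


def resolve_from_lang(from_lang: str, text: str) -> str:
    """Replace the sentinel with a detected code; pass explicit codes through."""
    return detect_language(text) if from_lang == AUTO_DETECT else from_lang


def main(argv: list[str] | None = None, text: str | None = None) -> str:
    parser = argparse.ArgumentParser(prog="argos-translate")
    parser.add_argument("--from-lang", required=True,
                        help=f'source language code, or "{AUTO_DETECT}"')
    parser.add_argument("--to-lang", required=True)
    args = parser.parse_args(argv)
    if text is None:
        text = sys.stdin.read()  # e.g. echo "Text to translate" | ...
    from_lang = resolve_from_lang(args.from_lang, text)
    # Translation itself is out of scope for this sketch; just report the pair.
    print(f"{from_lang} -> {args.to_lang}", file=sys.stderr)
    return from_lang


if __name__ == "__main__":
    main()
```

Explicit codes pass through untouched, so only the sentinel value changes behavior, and switching the keyword to "detect" would only mean changing the `AUTO_DETECT` constant.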