Comments (7)
Hi! Your project looks really interesting, I should try it out.
At first I thought that pymorphy2 might be enough as well, but after reading the paper I realized that it apparently does not consider the context of words, so unlike spaCy it can't detect plural/singular for words where this depends solely on context (like лица), which would reduce the accuracy of the stresser by a lot. So I probably can't do without spaCy.
I really want to publish the module on PyPI, but first I plan to rework the database to make it smaller (right now it is needlessly large) and optimize the performance. And probably add some additional data. But it will definitely happen.
from add-stress-to-epub.
What is the database? Does it generate a database beforehand, and then use it to label word stress?
With a dependency on spaCy and its model, this seems too large to be packaged locally into VocabSieve, especially since VocabSieve is not a Russian-specific tool. However, it would be great if I could convert this to a Flask API (and host it on a server) and then allow VocabSieve to query it to label stress on sentences and words.
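To make the idea concrete, here is a rough sketch of what such a stress-labeling endpoint could look like. It uses only the standard library's WSGI interface rather than Flask, so it runs without extra dependencies; the same handler shape would carry over to a Flask route. The `add_stress` function, the `text` query parameter, and the JSON field names are all hypothetical placeholders, not part of add-stress-to-epub or VocabSieve.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def add_stress(text: str) -> str:
    """Hypothetical placeholder: a real service would call the stresser here."""
    return text


def app(environ, start_response):
    """WSGI app: GET /?text=... returns the (placeholder) stressed text as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    payload = {"text": text, "stressed": add_stress(text)}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/json; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the app on a local development server (blocks until interrupted)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` would start a local server; a client such as VocabSieve could then send the sentence in the `text` parameter and read the `stressed` field from the response.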
Also, is there a reason why fb2 cannot be supported directly? It is simply an XML file with the text in it, arguably much simpler than epub.
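As a rough illustration of how simple the format is, the paragraphs of an FB2 file can be pulled out with Python's standard XML parser. This is only a sketch: the sample document and the `fb2_paragraphs` helper are made up for the example, and a real implementation would also have to write the stressed text back into the tree.

```python
import xml.etree.ElementTree as ET

# Namespace used by FictionBook 2.0 documents.
FB2_NS = "{http://www.gribuser.ru/xml/fictionbook/2.0}"

# Tiny made-up FB2 document for demonstration purposes.
SAMPLE_FB2 = """<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <section>
      <p>First paragraph.</p>
      <p>Second <emphasis>paragraph</emphasis>.</p>
    </section>
  </body>
</FictionBook>"""


def fb2_paragraphs(fb2_xml: str) -> list[str]:
    """Return the plain text of every <p> element, inline markup flattened."""
    root = ET.fromstring(fb2_xml.encode("utf-8"))
    return ["".join(p.itertext()) for p in root.iter(f"{FB2_NS}p")]


print(fb2_paragraphs(SAMPLE_FB2))
```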
You are right about the FB2 support, looks reasonable, I have added it to my TODO list.
The database is generated by my other project (you can find the more stable, tested version in the releases here, I think). I also use this database to create a StarDict dictionary (link in this post), which your program supports, if I read correctly. So it would probably be really cool if this all worked together.
Hosting it on a server is a good idea. Future versions will probably be faster as well.
Interesting. I have also made one such dictionary from the kaikki.org dump, though a simple version with only the definitions (no examples).
VocabSieve does not support HTML in the definition though.
The dictionary I extracted is here: https://freelanguagetools.org/wikt-kaikki-ru.json
Also, there seems to be significant overlap in the work we do :-)
It would be great if you join the chat on https://github.com/FreeLanguageTools/vocabsieve on Matrix or Telegram.
Yeah, the Kaikki data is really great. In my version I tried to get all the inflections and to link them up properly with the definitions (which is sometimes complicated when there are links you would have to click several times to get to the original definition in Wiktionary, like with some diminutives). I also spent quite a long time trying to add the OpenRussian data, which has some additional words. I don't have examples or parts of speech either, though.
I could probably replace the HTML with things like \n newlines; that should not take much time.
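Something along these lines might be enough, as a minimal sketch. The `html_to_plain` helper and the set of tags it treats as line breaks are assumptions for illustration; the actual dictionary markup may need other rules.

```python
import html
import re

# Tags whose end (or <br>) should become a line break; an assumed, minimal set.
LINE_BREAK_TAGS = re.compile(r"<br\s*/?>|</(?:p|li|div)>", flags=re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def html_to_plain(definition: str) -> str:
    """Turn a simple HTML definition into newline-separated plain text."""
    text = LINE_BREAK_TAGS.sub("\n", definition)
    text = ANY_TAG.sub("", text)   # drop all remaining tags
    text = html.unescape(text)     # decode entities such as &amp;
    lines = (line.strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


print(html_to_plain("<ol><li>face</li><li>person &amp; individual</li></ol>"))
```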
I joined your chat 👍
I haven't gotten around to updating the dictionaries yet, so it might download old versions of them, but the package has been installable through pip install git+https://github.com/Vuizur/add-stress-to-epub for a while now.
I thought about it a bit, and in principle it should be pretty easy to host everything on a server. I just have no experience with Docker or with deploying to a VPS, so that would be the biggest challenge.
And I haven't gotten around to optimizing it fully yet, but at least I ran some benchmarks, and there really does not appear to be a better option that I am aware of 🙂.