For what it's worth, we talked about this a bit in the past, while I was working with Hillary. In short, here is what I proposed:
- Make use of readthedocs (it is free for open source and supports GitHub integration, document versions, and multilingual documentation):
  - Use a GitHub repo, common_voice/cv-documentation, and link it to readthedocs.
  - Create multiple localized branches and make locale leads moderators, but keep final control over formatting etc. The content (translation) should be under the locale moderators.
  - Update the documentation through GitHub workflows (issues & PRs).
  - Whenever you need to mention something in the WebApp, Discourse, or Matrix, just point to the documentation. That would reduce the support load a lot.
- Divide the documentation as follows:
  - General info on what MCV is, release timing, etc.
  - Info on AI and voice AI, and how it is done (from text corpora to models: short info and external links)
  - User documentation (for the WebApp: what is where)
  - Community Leads documentation (mostly from the community handbook)
  - Dataset user documentation (structure)
  - Developer documentation (how it works internally)
  - Terminology (needed to explain concepts like "variant" to the general population)
  - FAQ (for questions like "How much should I record?" and "How can I add sentences in bulk?"; add "see: link" for these, ...)
  - Future development & how to contribute to the code
  - etc.
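As a concrete starting point, the layout above could be scaffolded with a short script. This is a minimal sketch under stated assumptions, not part of the original proposal: it assumes Sphinx as the engine behind readthedocs, reStructuredText sources, and the hypothetical section slugs in `SECTIONS`. The generated `conf.py` reads `READTHEDOCS_LANGUAGE`, an environment variable Read the Docs sets during builds (worth checking against its current documentation), so each localized branch or project can build in its own language.

```python
"""Hypothetical scaffold for the proposed cv-documentation repository.

Assumptions (not from the original proposal): Sphinx as the docs engine,
reStructuredText sources, and the section slugs below.
"""
from pathlib import Path

# Proposed top-level sections, in reading order (hypothetical slug -> title).
SECTIONS = {
    "general": "General info",
    "ai": "Info on AI and voice AI",
    "user": "User documentation",
    "community-leads": "Community Leads documentation",
    "dataset": "Dataset user documentation",
    "developer": "Developer documentation",
    "terminology": "Terminology",
    "faq": "FAQ",
    "contributing": "Future development & how to contribute",
}

# Template for Sphinx's conf.py. Assumption: Read the Docs exposes
# READTHEDOCS_LANGUAGE during builds; local builds fall back to English.
CONF_PY = '''\
import os

project = "Common Voice documentation"
language = os.environ.get("READTHEDOCS_LANGUAGE", "en")
extensions = []
html_theme = "alabaster"
'''


def underline(title: str, char: str = "=") -> str:
    """Return an reST heading underline as long as the title."""
    return char * len(title)


def index_rst(sections: dict[str, str]) -> str:
    """Build the root index.rst with a toctree listing every section."""
    title = "Common Voice documentation"
    lines = [title, underline(title), "", ".. toctree::", "   :maxdepth: 2", ""]
    lines += [f"   {slug}/index" for slug in sections]
    return "\n".join(lines) + "\n"


def scaffold(root: Path, sections: dict[str, str] = SECTIONS) -> None:
    """Create conf.py, index.rst and one stub page per section under root."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "conf.py").write_text(CONF_PY, encoding="utf-8")
    (root / "index.rst").write_text(index_rst(sections), encoding="utf-8")
    for slug, title in sections.items():
        page = root / slug / "index.rst"
        page.parent.mkdir(exist_ok=True)
        # Never overwrite a page that locale moderators may have translated.
        if not page.exists():
            page.write_text(f"{title}\n{underline(title)}\n", encoding="utf-8")


if __name__ == "__main__":
    scaffold(Path("docs"))
```

Running the script (under a filename of your choice) from the repository root creates `docs/`. Existing section pages are left untouched, so rerunning it on a localized branch after the stubs have been translated keeps the locale moderators' content intact while the shared structure stays in one place.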
I keep rather detailed documentation in Turkish on Discourse and try to keep it up to date:
https://discourse.mozilla.org/t/surec-dogrular-yanlislar-ve-veri-kumesinin-iyilestirilmesi/85938
You may want to Google-translate it to see what I mean. Except for casual users who record only 5-10 sentences, you need to give contributors introductory info on AI and voice AI to show them why things are the way they are. You'll also see locale-specific information, such as where to check spelling, borrowed words in that language, etc. That is why Pontoon falls short here and you need multilingual branches on GitHub.
I hope this helps...
Edit: Added "Terminology".
from common-voice.
Related Issues (20)
- [BUG] Both ways of donating in CV not functional (android)
- [BUG] validated_sentences.tsv for pa-IN is incomplete
- LOCALISATION REQUEST: ISO-639-2/3
- [FR] Detail unvalidated text corpus status
- [BUG] reported.tsv has broken rows due to LF & TAB characters in sentence and reason fields
- Rare letters in toki pona [BUG]
- Create issues template for documentation updates or new docs needed
- [BUG] Unable to modify e-mail address.
- [FR] (suggestion) Make delta releases easily usable
- [DOCS] Removing discontinued platforms.
- [FR] Add missing major "sentence_domain"s
- Change language name of 'gom' to "Konkani (Romi)"
- Multi-orthography for Konkani - linking sentences collected in the gom and knn datasets
- [BUG] Delta for v10.0 & v11.0 are buggy and should be removed
- LOCALISATION REQUEST: nqo_Nkoo
- [BUG] Should purge voted sentences in "review" from local storage
- [BUG] On changing the language on review page, sentences from previous language appear even after refresh
- LOCALISATION REQUEST for Shan (ISO-639-3: shn) language
- Support bulk-ban or bulk-remove sentences