masakhane-io / masakhane-community
All our community docs! Start here! Let's put Africa on the NLP Map
License: MIT License
They're currently missing from publications.md.
Add https://www.openslr.org/32/ to the speech datasets in https://github.com/masakhane-io/masakhane-community/blob/master/list-of-datasets.md
I'll grant access to the relevant spreadsheet to the assignee :)
Sometimes it's nice to get someone in to come do a talk about a topic that is relevant to the community. If you have something you're specifically interested in learning more about, please post the Topic and a brief motivation as to why you think it would be important to the community.
As the community, please 👍 the topics that are most important for you right now (try not to upvote everything :P That won't help us prioritize which talks to organise first).
For example, /content/drive/My Drive/masakhane/$src-$tgt-$tag can cause issues, but the following situation also caused an error for me:
source_file = f"/content/drive/My Drive/Research/Hani Machine Translation/hni_story_corpus/v2/hani_story_corpus_train.{source_language}"
target_file = f"/content/drive/My Drive/Research/Hani Machine Translation/hni_story_corpus/v2/hani_story_corpus_train.{target_language}"
# They should both have the same length.
! wc -l $source_file
! wc -l $target_file
Mitigations we could do:
Actually, it seems you can just change from using My Drive to MyDrive paths, which helps a lot so long as there aren't spaces elsewhere in the path, e.g. in my case, where Hani Machine Translation was in the path to train.eng and train.hni.
We could quote the path variables in the shell commands. For example,
! wc -l "$source_file"
instead of
! wc -l $source_file
and
! head "$source_file"*
instead of
! head $source_file*
but this doesn't completely solve it, and it can get complicated in some of the more complex cases later in the notebook, like
!cp -r joeynmt/models/${src}${tgt}_transformer/* "$gdrive_path/models/${src}${tgt}_transformer/"
or within the yaml file:
#load_model: "{gdrive_path}/models/{name}_transformer/1.ckpt" # if uncommented, load a pre-trained model from this checkpoint
Add a section that checks all the paths for whitespace and warns the user that it would probably be easier to just remove the spaces.
We could rewrite a lot of these commands to use pathlib.
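As a rough sketch of what the pathlib route might look like, the line counts and the model copy can be done in Python without going through the shell, so spaces in the path never need quoting. The helper names count_lines and copy_model_dir are made up for illustration and are not part of the notebook; shutil.copytree with dirs_exist_ok needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the lines in a file without calling `wc -l` through the shell.

    Unlike `wc -l`, a final line without a trailing newline is also counted.
    """
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def copy_model_dir(src_dir: Path, dst_dir: Path) -> None:
    """Copy the contents of src_dir into dst_dir, roughly like `cp -r src/* dst/`.

    dst_dir (and any missing parents) is created; existing files are overwritten.
    """
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)


# Hypothetical usage in the notebook (paths taken from the example above):
# corpus_dir = Path("/content/drive/My Drive/Research/Hani Machine Translation/hni_story_corpus/v2")
# source_file = corpus_dir / f"hani_story_corpus_train.{source_language}"
# target_file = corpus_dir / f"hani_story_corpus_train.{target_language}"
# assert count_lines(source_file) == count_lines(target_file)
```

This only covers the Python side. Paths that end up inside the yaml config or are passed to other command-line tools would still need to be checked separately.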
See also pjreddie/darknet#1672 and https://stackoverflow.com/questions/56640534/cannot-open-train-txt-with-white-space-my-drivehe
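The whitespace check suggested among the mitigations above could look something like the following. This is only a sketch: the function names and the idea of passing the notebook's paths in as a dict are assumptions, and the example paths in the usage comment are illustrative.

```python
import warnings
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[str, Path]


def find_paths_with_spaces(paths: Dict[str, PathLike]) -> List[str]:
    """Return the names of all paths that contain a whitespace character."""
    return [
        name
        for name, path in paths.items()
        if any(ch.isspace() for ch in str(path))
    ]


def warn_about_spaces(paths: Dict[str, PathLike]) -> List[str]:
    """Emit one warning per offending path and return the offending names."""
    offenders = find_paths_with_spaces(paths)
    for name in offenders:
        warnings.warn(
            f"{name} = {str(paths[name])!r} contains whitespace; "
            "later shell commands may fail. Consider renaming the folder "
            "or using a path without spaces."
        )
    return offenders


# Hypothetical usage near the top of the notebook:
# warn_about_spaces({
#     "gdrive_path": "/content/drive/My Drive/masakhane/en-hni-baseline",
#     "source_file": "/content/drive/MyDrive/train.eng",
# })
```

Running this once, right after the paths are defined, would flag the problem before any of the later shell cells fail.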