Skip-gram with negative sampling, subsampling, and subword embeddings (FastText) using character n-grams, implemented in word2vec.py
Based on Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, “Enriching Word Vectors with Subword Information”, ACL 2017
Korea University Information Retrieval (COSE 472), Assignment 5
- This model uses character 2-, 3-, and 4-grams plus special grams.
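To illustrate what the 2-, 3-, and 4-grams of a word look like, here is a minimal sketch. It is not taken from word2vec.py: the function name `char_ngrams` is hypothetical, the `<` and `>` boundary markers follow the convention of the FastText paper, and treating the "special gram" as the whole word wrapped in those markers is an assumption.

```python
def char_ngrams(word, sizes=(2, 3, 4)):
    """Return the character n-grams of a word, plus one whole-word gram.

    Hypothetical helper: the boundary markers and the whole-word
    "special gram" are assumptions, not confirmed by word2vec.py.
    """
    marked = "<" + word + ">"  # assumed word-boundary markers
    grams = []
    for n in sizes:
        # Slide a window of width n over the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    grams.append(marked)  # assumed special gram: the full marked word
    # Drop duplicates while keeping first-seen order (matters for short words).
    return list(dict.fromkeys(grams))


print(char_ngrams("cat"))
```

In a subword model of this kind, the vector of a word is typically built from the vectors of all its grams, so rare or unseen words still share parameters with words that have overlapping grams.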
Run "word2vec.py" to train and test the model.
How to run
python word2vec.py [mode] [partition] [update_system] [sub_sampling]
- mode
  - "SG" for skip-gram only
- partition
  - "part" to train on a part of the corpus (fast training, but worse performance)
  - "full" to train on the full corpus (better performance, but very slow training)
- update_system
  - "NS" for negative sampling only
- sub_sampling
  - True or False
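The four positional arguments above could be validated along the following lines. This is a sketch under stated assumptions, not the script's actual parsing code: the function name `parse_cli` and the returned dictionary are hypothetical, and only the accepted values come from the list above.

```python
def parse_cli(argv):
    """Validate [mode] [partition] [update_system] [sub_sampling].

    Hypothetical helper; word2vec.py may read its arguments differently.
    """
    if len(argv) != 4:
        raise ValueError(
            "usage: python word2vec.py [mode] [partition] [update_system] [sub_sampling]"
        )
    mode, partition, update_system, sub_sampling = argv
    if mode != "SG":
        raise ValueError('mode must be "SG"')
    if partition not in ("part", "full"):
        raise ValueError('partition must be "part" or "full"')
    if update_system != "NS":
        raise ValueError('update_system must be "NS"')
    if sub_sampling not in ("True", "False"):
        raise ValueError("sub_sampling must be True or False")
    return {
        "mode": mode,
        "partition": partition,
        "update_system": update_system,
        # Compare strings explicitly: bool("False") would evaluate to True.
        "sub_sampling": sub_sampling == "True",
    }


# Corresponds to: python word2vec.py SG part NS True
print(parse_cli(["SG", "part", "NS", "True"]))
```

The explicit string comparison for sub_sampling matters because command-line arguments arrive as strings, and any non-empty string is truthy in Python.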
Result