yuvalpinter / mimick
Code for Mimicking Word Embeddings using Subword RNNs (EMNLP 2017)
License: GNU General Public License v3.0
I ran the Mimick algorithm on a small dataset and it takes 5 minutes per epoch on CPU, but the same run on GPU takes 40 minutes per epoch. Is there a way to fix this?
Can the batch size be increased?
Hi, is it possible to integrate this with transformer-based models, such as a variant of BERT?
Hi, I was trying out your demo when I ran into an error at line 166 of Mimick/mimick/model.py:
trainer = dy.MomentumSGDTrainer(model.model, options.learning_rate, 0.9, 0.1)
The error message says that MomentumSGDTrainer takes 3 parameters, as in
MomentumSGDTrainer(ParameterCollection &m, real learning_rate = 0.01, real mom = 0.9)
Is this a version conflict? I installed DyNet v2.0 following your README. So what is this last parameter, 0.1? Can I simply delete it?
Thanks in advance!
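A minimal sketch of the corrected call, assuming the trailing 0.1 was the pre-2.0 decay argument that DyNet 2.0 dropped (the signature below matches the one quoted in the error message):

```python
import dynet as dy

pc = dy.ParameterCollection()
learning_rate = 0.01  # stand-in for options.learning_rate

# DyNet 2.0 trainers no longer accept a trailing decay argument,
# so the 0.1 is simply removed:
trainer = dy.MomentumSGDTrainer(pc, learning_rate, 0.9)
```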
Currently, none of the DyNet code for LSTMs in the tagging task code (model.py) makes use of the initial_state() method. This means that in char2tag (or both) mode, the model keeps its state across words along the entire dataset. Within sentences, it also means there is backprop across word boundaries, since there is no call to renew_cg(). This effect may be insignificant due to the <PAD> characters, but I don't know for sure.
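A minimal sketch of the per-word reset being described, assuming hypothetical names (char_lstm, char_embs) rather than the repo's actual variables:

```python
import dynet as dy

pc = dy.ParameterCollection()
char_lstm = dy.LSTMBuilder(1, 20, 50, pc)        # layers, input dim, hidden dim
char_embs = pc.add_lookup_parameters((100, 20))  # toy character vocabulary

def encode_sentence(sentence):
    """sentence: list of words, each a list of character ids."""
    dy.renew_cg()                                # fresh computation graph per sentence
    outputs = []
    for word in sentence:
        s = char_lstm.initial_state()            # state does not leak across word boundaries
        for c in word:
            s = s.add_input(char_embs[c])
        outputs.append(s.output())
    return outputs

vecs = encode_sentence([[3, 14], [15, 9, 2]])
```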
Finally got around to experimenting with Mimick, only to discover that it targets Python 2 only. (Insert rant that Python 3 is already a decade old.) Do you by any chance plan to add support for Python 3?
Thanks!
The variable in_vocab is set to zero on line 80 of the make_dataset.py file. As a result, when there are OOV words in the vocab file, the "words in training" count in the output is always zero.
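A hypothetical sketch of the intended counting logic; the variable names and data structures here are assumed for illustration, not taken from make_dataset.py:

```python
# Hypothetical sketch; names are illustrative, not the repo's actual code.
pretrained_vectors = {"the": [0.1], "cat": [0.2]}   # toy pre-trained embeddings
training_words = ["the", "cat", "flarble"]          # "flarble" is OOV

in_vocab = 0
for word in training_words:
    if word in pretrained_vectors:                  # count only words that have a vector
        in_vocab += 1
print("words in training: {} / {}".format(in_vocab, len(training_words)))
```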
They're dead references, left over from an ancient version of the code that had a CRF with Viterbi decoding on top of the word-level LSTM.
Since the upgrade to DyNet 2.0, training loss doesn't seem to converge for the Mimick algorithm (the tagger code is fine; the models also make sense).
This seems to be due to the change in learning-rate behavior in DyNet's trainers. The current implementation here uses AdamTrainer, but SGDTrainer and AdaGradTrainer exhibit the same issue.
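A sketch of one way to compensate, assuming DyNet 2.0's trainer API where the rate is exposed as trainer.learning_rate; the specific values here are illustrative, not tuned:

```python
import dynet as dy

pc = dy.ParameterCollection()
trainer = dy.AdamTrainer(pc, alpha=0.001)  # set the rate explicitly rather than relying on old defaults

# Since the pre-2.0 automatic decay argument is gone, decay can be
# applied manually at epoch boundaries:
trainer.learning_rate *= 0.95
```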
This line should not be concatenating char_embs[-1], but rather dy.concatenate([char_embs[-1][:h], char_embs[0][h:]]) for the appropriate h.
The in-model Mimick code is fine, since it uses separate forward and backward char-level models rather than DyNet's built-in BiRNNBuilder. The word-level BiLSTM is also fine, because it does sequence prediction.
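A minimal sketch of the fix described above, with assumed names and dimensions (h, char_lookup); it relies on BiRNNBuilder emitting [forward ; backward] concatenations at each position:

```python
import dynet as dy

h = 50                                              # per-direction hidden size (assumed)
pc = dy.ParameterCollection()
char_lookup = pc.add_lookup_parameters((100, 20))   # toy character vocabulary
birnn = dy.BiRNNBuilder(1, 20, 2 * h, pc, dy.LSTMBuilder)

dy.renew_cg()
char_embs = birnn.transduce([char_lookup[c] for c in [3, 14, 15]])
# Each output is [forward ; backward] of size 2*h. The true "final" encoding
# combines the last forward state with the last backward state, and the
# latter lives in the FIRST output, since the backward RNN runs right-to-left:
final = dy.concatenate([char_embs[-1][:h], char_embs[0][h:]])
```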