Comments (4)
Hi,
Apologies for the late response.
- The bugs you mention are a result of the `utils` file being moved up to the main directory, followed by a new `util` file being opened. I will think about how best to fix this (for now, you might as well just copy the `utils` file into the directory). You're right about the `wb` line.
- `--vocab` is indeed a text file containing all the words whose predicted embeddings you want, if you want to plug them into your model as a preprocessing step (you can always just load a Mimick model into your downstream application and call it on the fly as well).
- The `all_from_mimick` flag asks whether you want the embeddings of all `--vocab` words (including in-vocab ones) to be inferred by Mimick; the default is to copy any in-vocab words over from the original dictionary you're training the Mimick model on.
- `--normalized-targets` normalizes the input embeddings before training happens, as you hypothesized. In my experiments I did not see any major effect from this flag.
- Mimick is constrained to predict embeddings with the same dimensionality as the input embeddings. If you want a different size, the best way is to change the input embeddings' dimensionality accordingly (e.g. by some projection, or PCA).
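The flag behaviour and the dimensionality advice in this reply can be illustrated with a small NumPy sketch. It is not Mimick's own code: `normalize_rows`, `reduce_dim_pca`, `build_output_embeddings` and the `predict` callable (standing in for a trained Mimick model) are hypothetical names, and treating `--normalized-targets` as L2 (unit-length) normalization is an assumption. PCA is used as one possible projection for shrinking, e.g., 300-dimensional input vectors to 100 dimensions before training Mimick, so that its predictions come out 100-dimensional.

```python
import numpy as np


def normalize_rows(embs: np.ndarray) -> np.ndarray:
    """Scale every embedding to unit L2 norm (assumed meaning of --normalized-targets)."""
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as they are
    return embs / norms


def reduce_dim_pca(embs: np.ndarray, k: int) -> np.ndarray:
    """Project embeddings onto their top-k principal components (via SVD)."""
    if k > min(embs.shape):
        raise ValueError("k cannot exceed min(n_words, dim)")
    centered = embs - embs.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def build_output_embeddings(vocab, original, predict, all_from_mimick=False):
    """Return {word: vector} for every word listed in the --vocab file.

    original: dict of the embeddings the Mimick model was trained on.
    predict:  hypothetical callable mapping a word to a Mimick-predicted vector.
    """
    out = {}
    for word in vocab:
        if word in original and not all_from_mimick:
            out[word] = original[word]  # default: copy in-vocab vectors over
        else:
            out[word] = predict(word)   # OOV words, or every word if the flag is set
    return out


# Toy usage: shrink 300-d vectors to 100-d before they are used to train Mimick.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(1000, 300))
reduced = reduce_dim_pca(pretrained, 100)
print(reduced.shape)  # (1000, 100)
```

SVD on the centered matrix keeps the sketch dependent on NumPy alone; `sklearn.decomposition.PCA(n_components=100)` would give the same projection up to the sign of each component.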
from mimick.
Hi Priyansh, does closing the issue mean you figured out these points? I would be glad to help if not.
@yuvalpinter Actually, I accidentally cleared my issue's content while editing and then saved it, so I am rewriting it here. While training with `model.py`, what should I give to `--vocab`? Does this option take my file of rare words that were not present in the vocabulary of my training data? If so, what is the use of `--all_from_mimick`, given that the help text says that turning it "ON" overrides the vectors from the original training set with Mimick-generated vectors? I mean, can't we give the training-data words along with my rare-word file, or just these words? What will happen in each case?
Also, the help for `--normalized-targets` says "if toggled, train on normalized vectors from set". Does this mean that it normalizes the vectors before training happens? Can you elaborate on how this option is used?
Moreover, regarding the dimensionality of the word vectors: I want the word vectors produced by the Mimick algorithm to have dimension x (let's say 100). What needs to be changed for this?
@yuvalpinter There are some coding bugs, which I describe below. Kindly fix them.
- In the `make_dataset.py` file (inside the `mimick` directory), on line 19, `utils` should be changed to `util`. Inside `util.py`, `import codecs, numpy as np` should be added, along with the code for the functions `read_text_embs` and `read_pickle_embs`.
- In the same file, on line 90, the file should be pickled in `"wb"` mode rather than `"w"`. In my case, `"w"` throws an error.
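The two fixes reported here can be sketched as follows. The thread does not show the bodies of `read_text_embs` and `read_pickle_embs`, so the versions below are hypothetical: they assume a GloVe-style text file (one word followed by its space-separated components per line, no header) and a pickle holding a `(words, matrix)` pair, and `write_pickle_embs` is a made-up stand-in for the dump on line 90. What carries over regardless is the file mode: `pickle` writes bytes, so under Python 3 opening the file with `"w"` fails with a `TypeError`, while `"wb"` (and `"rb"` for loading) works.

```python
import codecs
import pickle

import numpy as np


def read_text_embs(path):
    """Hypothetical reader: one word per line followed by its vector components.

    Assumes a GloVe-style file without a header line; returns (words, matrix).
    """
    words, vecs = [], []
    with codecs.open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.vstack(vecs)


def write_pickle_embs(path, words, embs):
    """Hypothetical stand-in for the dump on line 90 of make_dataset.py."""
    # pickle emits bytes, so the file must be opened in binary mode:
    # open(path, "w") raises TypeError under Python 3, "wb" works.
    with open(path, "wb") as f:
        pickle.dump((words, embs), f)


def read_pickle_embs(path):
    """Hypothetical reader matching write_pickle_embs; returns (words, matrix)."""
    with open(path, "rb") as f:  # binary mode for loading as well
        return pickle.load(f)
```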
Related Issues (11)
- params in MomentumSGDTrainer
- Code runs very slow on GPU
- Compatibility with Python 3
- Call `initial_state()` on all levels of BiLSTM
- Char2Tag takes wrong representations from backward LSTM
- Add early stopping
- Remove `START_TAG` and `STOP_TAG` from `model.py` and `make_dataset.py`
- Transformer Models
- Find best Trainer
- in_vocab count is set to zero