Model arguments confusion!! about mimick HOT 4 CLOSED

yuvalpinter commented on May 20, 2024

Model arguments confusion!!

from mimick.

yuvalpinter commented on May 20, 2024

Hi,
Apologies for the late response.

The bugs you mention come from the utils file being moved up to the main directory, after which a new util file was created. I will think about how best to fix this; for now, you can simply copy the utils file into the directory. You're right about the "wb" line.

--vocab is indeed a text file containing all the words you want predicted embeddings for, in case you want to plug Mimick into your model as a preprocessing step (you can also just load a Mimick model into your downstream application and call it on the fly).
The --all_from_mimick flag controls whether the embeddings for all --vocab words, including in-vocabulary ones, are inferred by Mimick; the default is to copy any in-vocabulary words over from the original dictionary that the Mimick model is trained from.
--normalized-targets normalizes the input embeddings before training, as you hypothesized. In my experiments this flag had no major effect.
Mimick is constrained to predicting embeddings of the same dimensionality as the input embeddings. If you want a different dimensionality, the best way is to change the dimensionality of the input embeddings accordingly (e.g. by some projection, or PCA).
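
To make the last three points concrete, here is a minimal NumPy sketch. It is not Mimick's own code: the helper names (`l2_normalize`, `pca_project`, `assemble_output`) and the `predict` callable standing in for a trained Mimick model are hypothetical, and the toy data is random. It assumes the pretrained embeddings are held as a words-by-dimensions matrix.

```python
import numpy as np


def l2_normalize(embs):
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    return embs / np.where(norms == 0.0, 1.0, norms)


def pca_project(embs, target_dim):
    """Project rows onto their top `target_dim` principal components."""
    centered = embs - embs.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:target_dim].T


def assemble_output(vocab, pretrained, predict, all_from_mimick=False):
    """Pick, per word, either the original vector or a predicted one.

    `predict` is a hypothetical stand-in for a trained model's inference call.
    """
    out = {}
    for word in vocab:
        if word in pretrained and not all_from_mimick:
            out[word] = pretrained[word]  # default: copy in-vocabulary vectors
        else:
            out[word] = predict(word)     # OOV word, or everything if flag is on
    return out


# Toy data: 200 random "words" with 300-dimensional vectors.
rng = np.random.default_rng(0)
pretrained_matrix = rng.normal(size=(200, 300))

reduced = pca_project(pretrained_matrix, target_dim=100)  # shape (200, 100)
normalized = l2_normalize(reduced)                        # unit-length rows

toy_pretrained = {"cat": np.ones(3)}
toy_predict = lambda word: np.zeros(3)  # hypothetical model output
merged_default = assemble_output(["cat", "zzz"], toy_pretrained, toy_predict)
merged_all = assemble_output(["cat", "zzz"], toy_pretrained, toy_predict,
                             all_from_mimick=True)
```

Under these assumptions, you would reduce the pretrained embeddings to the desired size first (here 100) and then train on the reduced vectors, so that the predicted embeddings come out in that same dimensionality.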

yuvalpinter commented on May 20, 2024

Hi Priyansh, does closing the issue mean you figured out these points? I would be glad to help if not.

Priyansh2 commented on May 20, 2024

@yuvalpinter Actually, I accidentally cleared the content of my issue while editing and then saved it, so I am rewriting it here.

While training with 'model.py', what should I pass to --vocab? Does this option take my file of rare words, i.e. words that were not present in the vocabulary of my training data? If so, what is the use of --all_from_mimick? Its description says that when it is on, the vectors in the original training set are overridden by Mimick-generated vectors. Can I pass the training-data words together with my rare-word file, or only the rare words? What happens in each case?

Also, the description of --normalized-targets says "if toggled, train on normalized vectors from set". Does this mean the vectors are normalized before training happens? Can you elaborate on how this option is used?

Finally, regarding the dimensionality of the word vectors: I have word vectors, and I want the vectors produced by the Mimick algorithm to have dimension x (let's say 100). What needs to be changed for this?

Priyansh2 commented on May 20, 2024

@yuvalpinter There are some bugs in the code, which I describe below. Kindly fix them.

In make-dataset.py (inside the mimick directory), on line 19, utils should be changed to util, and the following should be added to util.py: import codecs, numpy as np, along with the code for the functions read_text_embs and read_pickle_embs.
In the same file, on line 90, the file should be opened for pickling in "wb" mode rather than "w". In my case "w" throws an error.
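
For reference, the "w" versus "wb" problem can be reproduced outside the repository. The snippet below is only a sketch assuming Python 3; the `dataset` dictionary and the temporary file path are made up for illustration and are not what make-dataset.py actually writes.

```python
import os
import pickle
import tempfile

# Hypothetical toy data; the real script's data structure may differ.
dataset = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")

# Text mode fails on Python 3 because pickle writes bytes, not str.
try:
    with open(path, "w") as f:
        pickle.dump(dataset, f)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

# Binary mode works for writing, and "rb" is the matching mode for reading.
with open(path, "wb") as f:
    pickle.dump(dataset, f)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

The same applies on the reading side: a pickle written with "wb" should be opened with "rb".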
