
Comments (4)

yuvalpinter commented on May 20, 2024

Hi,
Apologies for the late response.

  1. The bugs you mention are the result of the utils file being moved up to the main directory, followed by a new util file being created. I will think about how best to fix this (for now, you might as well just copy the utils file into the directory). You're right about the wb line.
  • --vocab is indeed a text file containing all words you want to know the predicted embeddings for, if you want to plug it into your model as a preprocessing step (you can always just load a Mimick model into your downstream application and call it on-the-fly as well).
  • The --all_from_mimick flag controls whether all embeddings for the --vocab words (including in-vocab words) are inferred from Mimick; the default is to copy over any in-vocab words from the original dictionary you're training the Mimick model from.
  • --normalized-targets normalizes the input embeddings before training happens, as you hypothesized. In my experiments I did not encounter any major effect of this flag.
  • Mimick is constrained to predicting embeddings in the same dimensionality as the input embeddings. If you want a different output dimensionality, the best way would be to change the dimensionality of the input embeddings accordingly (e.g. by some projection, or PCA).
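
To make the last point concrete, here is a minimal sketch of reducing the input embeddings with PCA before training. It is not part of the Mimick code base: the function name `pca_project` and the random matrix standing in for real embeddings are assumptions for illustration, and it only uses NumPy's SVD. Note that PCA can only reduce dimensionality, not increase it.

```python
import numpy as np


def pca_project(embeddings, target_dim):
    """Project row-vector embeddings onto their top `target_dim` principal components.

    Hypothetical helper, not a Mimick function.
    Returns (reduced, mean, components) so the same projection can be reused later.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError("embeddings must be a 2-D array (one row per word)")
    if not 1 <= target_dim <= min(X.shape):
        raise ValueError("target_dim must be between 1 and min(n_words, n_dims)")

    # PCA works on mean-centered data.
    mean = X.mean(axis=0)
    centered = X - mean

    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:target_dim]

    reduced = centered @ components.T
    return reduced, mean, components


if __name__ == "__main__":
    # Stand-in data: 1000 "words" with 300-dimensional vectors (assumed sizes).
    rng = np.random.default_rng(0)
    original = rng.normal(size=(1000, 300))
    reduced, mean, components = pca_project(original, 100)
    print(reduced.shape)
```

The reduced matrix would then be written back out in whatever embedding format you train Mimick from, so that the predicted vectors come out in the new dimensionality.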


yuvalpinter commented on May 20, 2024

Hi Priyansh, does closing the issue mean you figured out these points? I would be glad to help if not.


Priyansh2 commented on May 20, 2024

@yuvalpinter I accidentally cleared the content of my issue while editing it and then saved it, so I am rewriting it here.

When training with 'model.py', what should I pass to --vocab? Does this option take my file of rare words that were not present in the vocabulary of my training data? If so, what is the use of --all_from_mimick? It is written that when it is set to "ON", the vectors in the original training set are overridden by Mimick-generated vectors. Can't we pass the training-data words along with my rare-word file, or only the rare words? What happens in each case?

Also, the description of --normalized-targets says that, if toggled, training uses normalized vectors from the set. Does this mean the vectors are normalized before training happens? Can you elaborate on how this option is used?

Finally, regarding the dimensionality of the word vectors: I have word vectors and want their dimension after the Mimick algorithm to be x (let's say 100). What needs to be changed for this?


Priyansh2 commented on May 20, 2024

@yuvalpinter There are some coding bugs, which I have listed below. Kindly fix them.

  1. In the make-dataset.py file (inside the mimick directory), on line 19, utils should be changed to util. Inside util.py, the following should be added: import codecs, numpy as np, along with the code for the functions read_text_embs and read_pickle_embs.

  2. In the same file, on line 90, the file should be opened in "wb" mode rather than "w" before pickling. In my case, "w" throws an error.
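
For reference, the "wb" point can be reproduced outside the repository. The sketch below assumes Python 3, where pickle.dump writes bytes and therefore fails with a TypeError on a file opened in text mode. The helper names `save_pickle` and `load_pickle` and the sample dictionary are made up for illustration and are not taken from make-dataset.py.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Write `obj` to `path` with pickle, opening the file in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read a pickled object back, again in binary mode."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical stand-in for a dataset object.
    dataset = {"words": ["cat", "dog"], "dim": 2}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dataset.pkl")
        save_pickle(dataset, path)
        assert load_pickle(path) == dataset

        # Text mode fails under Python 3 because pickle writes bytes.
        try:
            with open(path, "w") as f:
                pickle.dump(dataset, f)
        except TypeError as err:
            print("text mode failed:", err)
```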

