Word2Vec-Algorithm-From-Scratch

Write code to implement the skip-gram model of word2vec in Python. The code should read input from a text file, train the model using the word2vec algorithm, save the embedding for each word, and finally validate the learned embeddings using cosine similarity.
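The task asks for reading the training text from a file and saving the embeddings. Below is a minimal sketch of that I/O. It assumes UTF-8 input and a plain-text output format modeled on the one used by the original word2vec tool: a header line with the vocabulary size and vector dimension, then one word per line followed by its vector components. The function names `read_corpus`, `save_embeddings` and `load_embeddings` are hypothetical and not taken from this repository.

```python
def read_corpus(path):
    """Return the whole text file at `path` as one string (assumes UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def save_embeddings(path, embeddings):
    """Write {word: [float, ...]} to `path`.

    Assumed word2vec-style text format: a "<vocab_size> <dimension>" header,
    then one line per word: the word followed by its space-separated components.
    Words must not contain whitespace.
    """
    dim = len(next(iter(embeddings.values()))) if embeddings else 0
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(embeddings)} {dim}\n")
        for word, vector in embeddings.items():
            values = " ".join(f"{x:.6f}" for x in vector)
            f.write(f"{word} {values}\n")


def load_embeddings(path):
    """Read a file written by save_embeddings back into {word: [float, ...]}."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "<vocab_size> <dimension>" header line
        for line in f:
            word, *values = line.split()
            embeddings[word] = [float(x) for x in values]
    return embeddings
```

Values are written with six decimal places, so a save/load round trip only preserves that much precision. Whitespace tokenization guarantees that words never contain spaces, which keeps the one-word-per-line format unambiguous.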

Skip Gram

Steps (for guideline purposes only):

1) Build the corpus vocabulary
2) Apply pre-processing
3) Build one-hot encodings for target and context words
4) Build a neural network with two weight matrices: W (between the input layer and the hidden layer) and W' (between the hidden layer and the output layer). W' stores the context embeddings
5) Make the number of neurons in the hidden layer equal to the size of the word vectors
6) Train the model with the target word as input and, as output, the probability of each word in the vocabulary being a context word of the target. Calculate the loss and update the weights
7) Get two embeddings for each word by indexing into the input matrix W and into the transpose of the context matrix W'
8) Write a function to find the cosine similarity between the vectors of any two words
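Steps 1 to 3 might look like the following sketch. The pre-processing scheme (lowercasing and keeping only ASCII letters), the sliding `window` of 2 and the `min_count` threshold are assumptions, and `preprocess`, `build_vocab`, `one_hot` and `skipgram_pairs` are hypothetical names.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and keep only runs of ASCII letters (assumed scheme)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(tokens, min_count=1):
    """Map each word seen at least `min_count` times to an index (sorted order)."""
    counts = Counter(tokens)
    words = sorted(w for w, c in counts.items() if c >= min_count)
    word_to_idx = {w: i for i, w in enumerate(words)}
    idx_to_word = {i: w for w, i in word_to_idx.items()}
    return word_to_idx, idx_to_word


def one_hot(index, vocab_size):
    """Return a list of length `vocab_size` with a 1 at `index` and 0 elsewhere."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def skipgram_pairs(tokens, word_to_idx, window=2):
    """Return (target_index, [context_indices]) for every in-vocabulary token.

    Out-of-vocabulary tokens are dropped before windowing, so the context of a
    word is taken from its in-vocabulary neighbours.
    """
    ids = [word_to_idx[t] for t in tokens if t in word_to_idx]
    pairs = []
    for i, target in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        context = [ids[j] for j in range(lo, hi) if j != i]
        pairs.append((target, context))
    return pairs
```

For "The quick brown fox jumps over the lazy dog." this yields an 8-word vocabulary and, for the first "the", the context words "quick" and "brown". Multiplying a one-hot vector by W simply selects one row of W, so implementations usually keep the integer index rather than materializing the one-hot list.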
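A worked form of steps 4 to 6, assuming the standard full-softmax skip-gram with cross-entropy loss (the steps above do not specify the loss). Here V is the vocabulary size, N the hidden-layer size, W is V x N, W' is N x V, x_t is the one-hot vector of the target word t, and C is the list of its context words:

```latex
\begin{aligned}
\mathbf{h} &= W^\top \mathbf{x}_t = W_{t,:}^\top \\
\mathbf{u} &= W'^\top \mathbf{h}, \qquad
\hat{y}_j = \frac{e^{u_j}}{\sum_{k=1}^{V} e^{u_k}} \\
\mathcal{L} &= -\sum_{c \in C} \log \hat{y}_c
             = -\sum_{c \in C} u_c + |C| \log \sum_{k=1}^{V} e^{u_k} \\
\mathbf{e} &= \frac{\partial \mathcal{L}}{\partial \mathbf{u}}
            = \sum_{c \in C} \left(\hat{\mathbf{y}} - \mathbf{x}_c\right) \\
\frac{\partial \mathcal{L}}{\partial W'} &= \mathbf{h}\,\mathbf{e}^\top, \qquad
\frac{\partial \mathcal{L}}{\partial W_{t,:}} = \left(W' \mathbf{e}\right)^\top
\end{aligned}
```

With learning rate eta, plain SGD updates W' by -eta h e^T and changes only row t of W, by -eta (W' e)^T. W' e has to be computed before W' is modified.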
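Steps 4 to 6 can be sketched in pure Python as below. This is one possible implementation, assuming a full softmax over the vocabulary (no negative sampling or hierarchical softmax), uniform random initialization in [-0.5, 0.5) and plain per-example SGD; `train_skipgram`, `softmax` and all default hyperparameters are hypothetical choices, not values from this repository. Training data is a list of (target index, list of context indices) pairs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_skipgram(pairs, vocab_size, dim=10, lr=0.05, epochs=50, seed=0):
    """Train W (V x N) and W_prime (N x V) with a full softmax and plain SGD.

    pairs: list of (target_index, [context_indices]).
    Returns (W, W_prime, losses), where losses[k] is the summed loss of epoch k.
    """
    rng = random.Random(seed)
    # Assumed initialization: uniform in [-0.5, 0.5).
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    W_prime = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)] for _ in range(dim)]
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for target, contexts in pairs:
            if not contexts:
                continue
            # Hidden layer: a one-hot input times W just selects row `target`.
            h = W[target]
            # Output scores u = W'^T h, one per vocabulary word.
            u = [sum(h[i] * W_prime[i][j] for i in range(dim)) for j in range(vocab_size)]
            y_hat = softmax(u)
            # Cross-entropy loss summed over all context words of this target.
            epoch_loss -= sum(math.log(y_hat[c]) for c in contexts)
            # Error vector e = sum over contexts of (y_hat - one_hot(c)).
            e = [len(contexts) * p for p in y_hat]
            for c in contexts:
                e[c] -= 1.0
            # Gradient w.r.t. h is W' e; compute it before W' is updated.
            grad_h = [sum(W_prime[i][j] * e[j] for j in range(vocab_size)) for i in range(dim)]
            # Update W' with the outer product h e^T.
            for i in range(dim):
                for j in range(vocab_size):
                    W_prime[i][j] -= lr * h[i] * e[j]
            # Only the target's row of W receives a gradient.
            for i in range(dim):
                W[target][i] -= lr * grad_h[i]
        losses.append(epoch_loss)
    return W, W_prime, losses
```

Each update costs O(V x N), so this sketch is only practical for small toy corpora. The full softmax keeps the code close to the step list; negative sampling is the usual replacement when the vocabulary grows.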
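Steps 7 and 8 can be sketched as follows, assuming W is stored as a V x N list of rows and W' as an N x V list of rows, with an index-to-word mapping. Row i of W is the input embedding of word i, and column j of W' (row j of its transpose) is the context embedding of word j. `get_embeddings`, `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def get_embeddings(W, W_prime, idx_to_word):
    """Return two {word: vector} dicts: rows of W and rows of W'^T."""
    input_emb = {idx_to_word[i]: list(row) for i, row in enumerate(W)}
    # Column j of W' is the context vector of word j.
    context_emb = {
        idx_to_word[j]: [W_prime[i][j] for i in range(len(W_prime))]
        for j in range(len(W_prime[0]))
    }
    return input_emb, context_emb


def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings, top_n=3):
    """Return the top_n other words ranked by cosine similarity to `word`."""
    target = embeddings[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in embeddings.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

Cosine similarity ignores vector length, so words that occur very often (and tend to get larger vectors) are not favoured just for their norm. A common choice is to validate with the input embeddings from W, or with the average of the two embeddings per word.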

Contributors

kheem-dh
