
wombat's Introduction

Note that in its current state the code is a bit of a mess. A lot of remnants of related experiments are left in the code, and substantial refactoring is needed.

I've started refactoring heavily in the v2 branch, but v2 does not currently build or work. Please post issues on GitHub and I'll try to take a look, but my time on this is very limited :(


word matrix batches



This code was developed as part of my Master's thesis research.

A paper is available that describes the methods in this package on IEEE:
Efficient and accurate Word2Vec implementations in GPU and shared-memory multicore architectures

The work builds upon ideas presented in BIDMach and further refined in Intel's pWord2Vec.

This code supports:

  • Both CPU and GPU matrix-based fast Word2Vec
  • Both SkipGram and Hierarchical Softmax Word2Vec architectures

This code does not support:

  • Distributed computing techniques (see pWord2Vec)
  • CBOW Word2Vec architectures
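To illustrate what "matrix-based" means here, the following is a minimal sketch of one batched SkipGram update with shared negative samples, in the spirit of the pWord2Vec idea referenced above. It is not taken from this repository: the `Matrix` type, the `train_batch` function, and the convention that row 0 of `out` holds the target word are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense matrix. Hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c, float v = 0.0f)
        : rows(r), cols(c), data(r * c, v) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// One gradient step on a single batch (assumed layout, not the repo's):
//   in : B x D, input vectors of the B context words of one window
//   out: S x D, output vectors of the target word (row 0) followed by
//        S-1 negative samples shared by every context word in the batch
// Returns the batch's negative log-likelihood measured before the update.
float train_batch(Matrix& in, Matrix& out, float lr) {
    assert(in.cols == out.cols);
    const std::size_t B = in.rows, S = out.rows, D = in.cols;

    // err = label - sigmoid(in * out^T): one B x S matrix product
    // instead of B * S independent vector dot products.
    Matrix err(B, S);
    float loss = 0.0f;
    for (std::size_t i = 0; i < B; ++i) {
        for (std::size_t j = 0; j < S; ++j) {
            float dot = 0.0f;
            for (std::size_t k = 0; k < D; ++k) dot += in.at(i, k) * out.at(j, k);
            const float label = (j == 0) ? 1.0f : 0.0f;
            const float p = sigmoid(dot);
            loss -= std::log(label > 0.5f ? p : 1.0f - p);
            err.at(i, j) = label - p;
        }
    }

    // Both gradients are computed from the pre-update values:
    //   grad_in = err * out (B x D), grad_out = err^T * in (S x D).
    Matrix grad_in(B, D), grad_out(S, D);
    for (std::size_t i = 0; i < B; ++i)
        for (std::size_t j = 0; j < S; ++j)
            for (std::size_t k = 0; k < D; ++k) {
                grad_in.at(i, k) += err.at(i, j) * out.at(j, k);
                grad_out.at(j, k) += err.at(i, j) * in.at(i, k);
            }

    // Apply the updates to both embedding blocks.
    for (std::size_t n = 0; n < in.data.size(); ++n) in.data[n] += lr * grad_in.data[n];
    for (std::size_t n = 0; n < out.data.size(); ++n) out.data[n] += lr * grad_out.data[n];
    return loss;
}
```

Calling `train_batch` repeatedly on the same small batch should drive the returned loss down. The triple loops are written out only for readability; in an optimized build each of the three products would typically be handed to a GEMM routine (for example from MKL on the CPU) or to a GPU kernel.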

Installation

The Makefile (hackishly) supports g++, CUDA, or ICPC.

Different source files are used for different compilers.

To compile, use make:

For g++:

make

For CUDA:

make cuda

For MKL support and ICPC:

make intel

Once made, you can use the scripts in /scripts to run test programs:

Testing g++ or icpc compiled program:

./cpu.sh [num threads]

Testing CUDA (requires CUDA compute capability 6.0):

./cuda.numCPUT-batchSize-batchesPerT.sh  [num cpu threads] [batch size] [batches per thread]

For all programs, to get test data:

./get-data.sh
