
Introduction

NumEr is a collection of Erlang NIF functions for BLAS operations on vectors and matrices, executed on the GPU with CUDA. Vectors and matrices are implemented natively as Thrust host/device vectors, and special "buffer" classes are used to transfer them from Erlang to CUDA and back.

Installation on Windows x64

git clone git://github.com/vascokk/NumEr.git
cd NumEr

All the commands from this point forward should be executed in a VC++ 10.0 command-line window

set TARGET_ARCH=x64

Make sure you have the following bat file:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\vcvars64.bat

with this line inside:

call "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x64

Execute the above bat file and compile:

mkdir priv
rebar compile
rebar eunit suites=numer_helpers_tests

You should see:

==> numer (eunit)
======================== EUnit ========================
module 'numer_helpers_tests'
  numer_helpers_tests: gemm_test...[0.421 s] ok
  numer_helpers_tests: gemm_2_test...[0.375 s] ok
  numer_helpers_tests: sum_by_cols_test...[0.375 s] ok
  numer_helpers_tests: gemv_test...[0.437 s] ok
  numer_helpers_tests: gemv_2_test...[0.421 s] ok
  numer_helpers_tests: gemv_3_test...[0.405 s] ok
  numer_helpers_tests: saxpy_test...[0.390 s] ok
  numer_helpers_tests: smm_test...[0.390 s] ok
  numer_helpers_tests: m2v_test...ok
  numer_helpers_tests: v2m_test...ok
  numer_helpers_tests: transpose_test...[0.390 s] ok
  numer_helpers_tests: sigmoid_test...[0.390 s] ok
  numer_helpers_tests: sigmoid_2_test...[0.453 s] ok
  numer_helpers_tests: tanh_test...[0.390 s] ok
  numer_helpers_tests: tanh_2_test...[0.406 s] ok
  numer_helpers_tests: log_test...[0.390 s] ok
  numer_helpers_tests: log_2_test...[0.390 s] ok
  numer_helpers_tests: ones_test...ok
  numer_helpers_tests: ones_2_test...ok
  numer_helpers_tests: zeros_test...ok
  numer_helpers_tests: zeros_2_test...ok
  [done in 6.349 s]
=======================================================
  All 21 tests passed.

Installation on Mac OS X

Strictly follow the NVIDIA Mac OS X Getting Started guide and set the environment variables:

export PATH=/Developer/NVIDIA/CUDA-5.0/bin:$PATH
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-5.0/lib:$DYLD_LIBRARY_PATH

Compile and run eunit:

mkdir priv
./rebar compile
./rebar eunit suites=numer_helpers_tests

TODO: Linux

Operations with vectors and matrices

% this is a row-major matrix:
A = [[4.0,6.0,8.0,2.0],[5.0,7.0,9.0,3.0]].

% this is a vector:
X = [2.0,5.0,1.0,7.0].

% create a CUDA context and transfer to "buffers"
{ok, Ctx} = numer_nifs:new_context().
{ok, Buf_A} = numer_nifs:new_matrix_float_buffer(Ctx, A, ?ROW_MAJOR).
{ok, Buf_X} = numer_nifs:new_float_buffer(Ctx).
numer_nifs:write_buffer(Buf_X, X).

As you can see, one of the parameters of the matrix buffer is ?ROW_MAJOR. The idea is borrowed from the Boost library, but it is not yet fully implemented in NumEr: currently only row-major matrices are supported. Under the hood, however, the numbers are stored in the Thrust vectors in column-major order. I chose this layout because the cuBLAS library, being a derivative of the Fortran BLAS library, uses column-major storage.
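
For illustration, here is a minimal plain-Erlang sketch (written for this README, not part of NumEr) of how a row-major list-of-lists matrix maps onto the column-major element order that ends up in the Thrust vector:

%% Illustrative only: flatten a row-major list-of-lists matrix into
%% the column-major element order used by cuBLAS.
to_column_major(Rows) ->
    lists:append(transpose(Rows)).

transpose([[] | _]) -> [];
transpose(Rows) ->
    [[hd(R) || R <- Rows] | transpose([tl(R) || R <- Rows])].

For the matrix A above, to_column_major(A) yields [4.0,5.0,6.0,7.0,8.0,9.0,2.0,3.0].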

There are several modules that wrap the NIF functions, for example: numer_blas.erl for BLAS operations, numer_buffer.erl for operations with buffers (new, delete, read, write), and so on.

Using the numer_buffer module, the above example looks like this:

 {ok, Ctx} = numer_context:new().
 {ok, Buf_A} = numer_buffer:new(Ctx, matrix, float, row_major, A).
 {ok, Buf_X} = numer_buffer:new(Ctx, float).
 numer_buffer:write(Buf_X, X).
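
Reading the data back into an Erlang list is symmetric, using the read function of the same numer_buffer API:

 {ok, [2.0,5.0,1.0,7.0]} = numer_buffer:read(Buf_X).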

BLAS GEMV example:

%  GEMV: y <- α op ( A ) x + β y
gemv_test()->
    {ok, Ctx} = numer_context:new(),
    A = [[4.0,6.0,8.0,2.0],[5.0,7.0,9.0,3.0]],
    _m = 2, %rows A
    _n = 4, %columns A
    _alpha = 1.0,
    _beta = 0.0,
    X = [2.0,5.0,1.0,7.0],
    Y = [0.0, 0.0], 
    {ok, Buf_A} = numer_buffer:new(Ctx, matrix, float, row_major, A),
    {ok, Buf_X} = numer_buffer:new(Ctx, float),
    numer_buffer:write(Buf_X, X),
    {ok, Buf_Y} = numer_buffer:new(Ctx, float),
    numer_buffer:write(Buf_Y, Y),
    ok = numer_blas:gemv(Ctx, no_transpose, _m, _n, _alpha, Buf_A, Buf_X, _beta, Buf_Y),
    {ok, [60.0,75.0]} = numer_buffer:read(Buf_Y),
    ok = numer_buffer:destroy(Buf_A),
    ok = numer_buffer:destroy(Buf_X),
    ok = numer_buffer:destroy(Buf_Y),
    ok = numer_context:destroy(Ctx).
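
To sanity-check the result on the CPU, a plain-Erlang reference version of the same computation (y <- α A x + β y; written here for illustration, not part of NumEr) can be used:

%% Reference gemv on plain lists, for checking GPU results on small inputs.
gemv_ref(Alpha, A, X, Beta, Y) ->
    [Alpha * dot(Row, X) + Beta * Yi || {Row, Yi} <- lists:zip(A, Y)].

dot(V1, V2) ->
    lists:sum([E1 * E2 || {E1, E2} <- lists:zip(V1, V2)]).

gemv_ref(1.0, A, X, 0.0, [0.0, 0.0]) returns [60.0,75.0]: the first row gives 4.0*2.0 + 6.0*5.0 + 8.0*1.0 + 2.0*7.0 = 60.0, and the second 5.0*2.0 + 7.0*5.0 + 9.0*1.0 + 3.0*7.0 = 75.0.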

Using "helpers" module

Since buffer operations can make the code awkward to read, there is also a helper module, numer_helpers.erl, which can be used for prototyping algorithms. WARNING: do not use this module in iterative algorithms (e.g. Machine Learning training loops); use it for prototyping or one-off calculations with big matrices/vectors. Here is how:

%% requires -include_lib("eunit/include/eunit.hrl") for the ?assertEqual macro
gemv_2_test()->
    A = [[4.0,6.0,8.0,2.0],[5.0,7.0,9.0,3.0]],
    _alpha = 1.0,
    _beta = 0.0,
    X = [2.0,5.0,1.0,7.0],
    Res = numer_helpers:gemv(no_transpose, _alpha, A, X, _beta, []),
    ?assertEqual([60.0,75.0], Res).

It is much more readable and useful for one-off calculations, but in an ML "training" stage (with hundreds of iterations) it would be unusable, because every call performs multiple buffer transfers.

Machine Learning with Erlang & CUDA - "Logistic Regression"

There is an implementation of the Logistic Regression learning function (without regularization) with Gradient Descent optimization. Take a look at learn_buf() and gradient_descent() in the numer_logreg.erl module and run the eunit test:

Windows 7, GPU - Quadro FX 1800M 1GB:

rebar eunit suites=numer_logreg_tests tests=learn_buf2_test

NOTICE: Using experimental option 'tests'
    Running test function(s):
      numer_logreg_tests:learn_buf2_test/0
======================== EUnit ========================
numer_logreg_tests: learn_buf2_test...test/numer_logreg_tests.erl:108:<0.187.0>:
 Learned:[-0.02788488380610943,0.010618738830089569,6.68175402097404e-4]
[3.557 s] ok
=======================================================
  Test passed.

In learn_buf2_test() all the "buffers" needed are created upfront and passed to the NIFs in order to avoid multiple buffer creations and transfers during the iterations.
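
In outline, the pattern looks like this (a sketch, not the actual test code; gradient_step/2 is a hypothetical stand-in for the numer_blas calls of one descent step):

%% Sketch: allocate all buffers once, then iterate against them, so no
%% buffers are created and no host/device transfers happen per iteration.
gd_loop(_Ctx, _Bufs, 0) ->
    ok;
gd_loop(Ctx, Bufs, Iters) ->
    ok = gradient_step(Ctx, Bufs),  %% hypothetical: one step of numer_blas calls
    gd_loop(Ctx, Bufs, Iters - 1).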

The same test on a MacBook Pro with a GeForce 9400M 256 MB:

NOTICE: Using experimental option 'tests'
    Running test function(s):
      numer_logreg_tests:learn_buf2_test/0
======================== EUnit ========================
numer_logreg_tests: learn_buf2_test...test/numer_logreg_tests.erl:108:<0.197.0>: 
 Learned:[-0.02788488380610943,0.010618738830089569,6.68175402097404e-4]
[0.430 s] ok
=======================================================
  Test passed.

The numer_ml.erl module contains a C++ implementation of Logistic Regression (behind a single NIF function), while numer_logreg.erl uses the numer_blas.erl module. I used the former to compare the speed of the "native" implementation against the one built from NumEr modules. On Windows the difference is considerable (NumEr modules: 3.5 s; single NIF: under 1 s), while on the Mac there is almost none (both under 0.5 s).

The project is still a work in progress and needs a lot of polishing, so if anyone is willing to give a hand I'll be more than happy. Any suggestions to improve the framework are also very welcome.
