Giter Club home page Giter Club logo

binmf's Introduction

Binary Matrix Factorization

This package performs low-rank factorization of sparse binary matrices. Model is based on minimization of hinge loss, and is fit through projected sub-gradient descent updates similar to the pegasos algorithm for SVM. At each iteration, it samples entries from the negative class (having a value of zero) at random with replacement, until having an equal number of missing and non-missing entries for each row in the original matrix.

Package is written in C with Python and R interfaces. Requires a C compiler and some BLAS library such as MKL or OpenBLAS (in Python, will try to use the same one that NumPy installation is using).

Computations are parallelized and speed (depending on sparsity) is in the same scale as implicit-ALS with the CG method as implemented in the implicit package.

Installation

  • Python Clone or download the repository and then install with setup.py, e.g.:
git clone https://github.com/david-cortes/binmf.git
cd binmf
python setup.py install

(It requires package findblas, can be installed with pip install findblas.)

  • R
devtools::install_github("david-cortes/binmf")

Usage

  • R See documentation (help(binmf::binmf)

  • Python

Package does not have a fully-fledged API, only a function that modifies already-initialized parameter matrices in-place.

Takes as input a sparse matrix in CSR format, and already-initialized factor matrices.

Will take the values in the non-zero entries as weights. If a uniform weighting is desired, the input matrix should have all values equal to 1.

Example:

import numpy as np
from scipy.sparse import csr_matrix

## Generating random sparse data
nrows = 100
ncols = 500 ## will be a lot slower when rows are not too sparse
sparsity = 0.1
np.random.seed(1)
dense_mat = (np.random.random(size=(nrows, ncols)) >= sparsity).astype('uint8')
sp_mat = csr_matrix(dense_mat)

## Initializing factor matrices
k = 5
np.random.seed(2)
row_fact = np.random.normal(size=(nrows, k))
col_fact = np.random.normal(size=(ncols, k))

from binmf import fit_spfact
fit_spfact(sp_mat, row_fact, col_fact, reg_param=1e-1, niter=100, nthreads=1) #adjust number of threads for your setup

## Prediction for entry (10, 15)
np.dot(row_fact[10], col_fact[15]) # If greater than zero, prediction is value 1, otherwise prediction is value 0

## Note: more iterations will do better in very large datasets

Can also be called directly from C and wrapped in other languages:

/* Main function
	A                       :  Already-initialized A matrix (model parameters)
	B                       :  Already-initialized B matrix (model parameters)
	dimA                    :  Number of rows in matrix A
	dimB                    :  Number of rows in matrix B
	k                       :  Dimensionality of low-rank approximation (number of columns in A and B)
	nnz                     :  Number of non-zero entries in the X matrix
	X_indptr, X_ind,  Xr    :  X matrix (dim A x B) in row-sparse format - values indicate weights, Xr is ignored when there's no weights
	reg_param               :  Strength of l2 regularization
	niter                   :  Number of sub-gradient iterations
	projected               :  Whether to apply a projection step at each update (recommended)
	nthreads                :  Number of parallel threads to use
*/
void psgd(double *A, double *B, size_t dimA, size_t dimB, size_t k, size_t nnz,
	size_t *X_indptr, size_t *X_ind, double *Xr,
	double reg_param, size_t niter, int projected, int nthreads)

References

  • Pan, Rong, et al. "One-class collaborative filtering." Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 2008.
  • Shalev-Shwartz, Shai, et al. "Pegasos: Primal estimated sub-gradient solver for svm." Mathematical programming 127.1 (2011): 3-30.

binmf's People

Contributors

david-cortes avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.