ML-algorithms

Implementations of several supervised and unsupervised machine learning algorithms, using both probabilistic and non-probabilistic approaches.

This repository provides Python implementations of the following machine learning algorithms:

  1. Supervised Learning
    • Regression: L2-regularized least squares linear regression (ridge regression), gradient descent, and an active learning procedure for linear regression
    • Classification: Perceptron and K-class Bayes classifier
  2. Unsupervised Learning
    • Clustering: K-means and EM for Gaussian mixture models
    • Probabilistic Matrix Factorization

Supervised Learning

Supervised learning seeks a function that, given input data x, predicts the corresponding output value. These algorithms learn a mapping from inputs to outputs.

Regression

Given a set of inputs, regression predicts a real-valued output.

Ridge Regression (non-probabilistic)

L2-regularized least squares linear regression. It constrains the model parameters by adding a penalty term with two components:

  • lambda > 0: the regularization parameter
  • g(w) > 0: a penalty function encouraging the desired properties

In this case, g(w) = ||w||^2 (the squared L2 norm), which addresses high-variance issues.
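
A minimal NumPy sketch of the closed-form ridge solution (illustrative only; the function name ridge_fit and the toy data below are not part of this repository):

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (lam * I + X^T X)^(-1) X^T y.
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Toy usage: recover a noisy linear relation y ~ 2x.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + 0.1 * rng.normal(size=100)
print(ridge_fit(X, y, lam=0.5))  # approximately [2.0]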

Gradient Descent (non-probabilistic)

Iterative optimization method that searches for the best parameters of the linear model by simultaneously updating all of its weights. It uses a learning rate alpha, set at the beginning, which determines the step size at each iteration. It converges to a local optimum.
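
As a rough sketch (assuming a least-squares objective; gradient_descent and its defaults are illustrative, not the module's actual interface):

import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # Batch gradient descent on (1/2n) * ||Xw - y||^2.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n  # gradient of the objective
        w = w - alpha * grad          # simultaneous update of all weights
    return w

With a suitably small alpha this converges toward the least-squares solution; too large a step size makes the iterates diverge.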

Active Learning (probabilistic)

Probabilistic learning strategy that models the predictive distribution for unknown (test) data given the known (training) data and the posterior distribution. It successively predicts unmeasured data points, starting with those for which the estimated variance is highest, and continuously updates the posterior with each new observation during this iterative process.
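
A sketch of the selection rule, assuming a Gaussian prior N(0, (1/lambda) I) on the weights and noise variance sigma2 (these assumptions mirror the lambda and sigma2 arguments of Regression.py but are not taken from its source):

import numpy as np

def posterior(X, y, lam, sigma2):
    # Posterior N(mu, Sigma) over the weights of Bayesian linear regression.
    d = X.shape[1]
    Sigma = np.linalg.inv(lam * np.eye(d) + X.T @ X / sigma2)
    mu = Sigma @ X.T @ y / sigma2
    return mu, Sigma

def next_query(X_pool, Sigma, sigma2):
    # Pick the unmeasured point with the largest predictive variance
    # sigma2 + x^T Sigma x; measuring it shrinks the posterior the most.
    pred_var = sigma2 + np.einsum('ij,jk,ik->i', X_pool, Sigma, X_pool)
    return int(np.argmax(pred_var))

After each measurement, the newly observed pair is appended to the training set and the posterior is recomputed before the next query.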

Classification

Given a set of inputs, classification predicts a discrete label.

Perceptron (non-probabilistic)

The simplest neural network and a classification method for perfectly separable data. It is a linear classifier that assigns data to one of two classes by taking a linear combination of the features and using the sign as the activation function. It learns by adjusting the weights (adding or subtracting x) whenever the hyperplane misclassifies the current example x. It converges only on perfectly separable data, stopping at the first hyperplane that correctly splits all training points.
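
A compact sketch of the training loop (labels assumed to be in {-1, +1} and X assumed to include a bias column; names are illustrative):

import numpy as np

def perceptron(X, y, max_epochs=100):
    # Perceptron learning: adjust w by +/- x on every misclassified example.
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.sign(xi @ w) <= 0:  # wrong side of (or on) the hyperplane
                w += yi * xi
                mistakes += 1
        if mistakes == 0:  # converged: all training points separated
            break
    return w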

K-class Bayes classifier (probabilistic)

Plug-in classifier (generative model) that predicts, for a given input x, the most probable label conditioned on x. It uses Bayes' rule to make these predictions based on the class prior (the distribution on y) and the class-conditional distribution of X, both of which are estimated from the available data. It can produce either a decision boundary or decision regions.
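
A sketch using Gaussian class-conditional densities (a common choice; the repository may parameterize the densities differently):

import numpy as np
from scipy.stats import multivariate_normal

def fit_bayes(X, y):
    # Estimate the class prior and a Gaussian class-conditional
    # density (mean, covariance) for each class from the data.
    classes = np.unique(y)
    priors = {k: np.mean(y == k) for k in classes}
    params = {k: (X[y == k].mean(axis=0), np.cov(X[y == k].T)) for k in classes}
    return priors, params

def predict(x, priors, params):
    # Plug-in Bayes rule: argmax over k of prior(k) * N(x; mu_k, Sigma_k).
    scores = {k: priors[k] * multivariate_normal.pdf(x, mean=m, cov=C)
              for k, (m, C) in params.items()}
    return max(scores, key=scores.get)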

Unsupervised Learning

Seeks to learn the underlying structure of the input data x.

Clustering

K-Means (non-probabilistic)

Simple hard-clustering algorithm. Given input data x, it predicts a vector c of cluster assignments and K mean vectors mu (the centroids).

To make these assignments, it minimizes the Euclidean distance from each data point to the centroid of its assigned cluster.
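
A minimal sketch of Lloyd's algorithm, the standard way to alternate these two steps (function name and defaults are illustrative):

import numpy as np

def kmeans(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]  # initial centroids
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        c = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        # Update step: each centroid becomes the mean of its points.
        mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                       for k in range(K)])
    return c, mu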

EM for Gaussian Mixture Models (probabilistic)

Soft-clustering algorithm. Given input data x, it distributes the data across K clusters probabilistically, accounting for borderline cases.

To estimate these cluster probability distributions, the Expectation-Maximization (EM) algorithm is used in combination with K Gaussian generative distributions.
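
A condensed sketch of the two alternating steps (illustrative; no numerical safeguards beyond a small covariance ridge):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, iters=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1 / K)                   # mixing weights
    mu = X[rng.choice(n, K, replace=False)]  # initial means
    Sigma = np.stack([np.eye(d)] * K)        # initial covariances
    for _ in range(iters):
        # E-step: responsibility of each cluster for each point.
        R = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                             for k in range(K)])
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances.
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            Sigma[k] = (R[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, R

The responsibilities R are the soft assignments: each row gives the probability that the corresponding point belongs to each of the K clusters.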

Probabilistic Matrix Factorization

Assuming a low-rank ratings matrix is provided, this algorithm factorizes it into the product of two (also) low-rank matrices U and V, reducing computational complexity.

A distribution on the data is assumed, with user and object locations generated from the same (generative-model) distribution. To estimate the matrices U and V, the log joint likelihood is maximized.
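
A sketch of the MAP coordinate-ascent updates commonly used for this objective (assuming a boolean mask of observed entries; pmf and its defaults are illustrative, not the module's interface):

import numpy as np

def pmf(M, mask, rank=5, lam=2.0, sigma2=0.1, iters=50, seed=0):
    # Alternately solve the ridge-like updates for the user rows U
    # and object rows V that maximize the log joint likelihood.
    n, m = M.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=1 / lam, size=(n, rank))
    V = rng.normal(scale=1 / lam, size=(m, rank))
    I = lam * sigma2 * np.eye(rank)
    for _ in range(iters):
        for i in range(n):
            obs = mask[i]  # objects rated by user i
            U[i] = np.linalg.solve(I + V[obs].T @ V[obs], V[obs].T @ M[i, obs])
        for j in range(m):
            obs = mask[:, j]  # users who rated object j
            V[j] = np.linalg.solve(I + U[obs].T @ U[obs], U[obs].T @ M[obs, j])
    return U, V

The product U @ V.T then fills in the unobserved ratings.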

Running the algorithms

All implementations are modular and can be applied to any dataset supplied on the command line. When a combined Xy file is given as input, the response variable y is expected to be in the last column. To run the different modules, use the following commands:

Regression

$ python3 Regression.py lambda sigma2 X_train.csv y_train.csv X_test.csv

Gradient Descent

$ python3 GradientDescent.py Xy_train.csv output_filename.csv

Perceptron

$ python3 Perceptron.py Xy_train.csv output_filename.csv

K-class Bayes classifier

$ python3 Classification.py X_train.csv y_train.csv X_test.csv

Clustering

$ python3 Clustering.py X.csv

Probabilistic Matrix Factorization

$ python3 ProbabilisticMatrixFactorization.py Ratings.csv
