Logistic PCA

logisticPCA is an R package for dimensionality reduction of binary data. It is still in the early stages of development, and its conventions may change in the future. A manuscript describing logistic PCA can be found here.

Installation

To install R, visit r-project.org/.

The package can be installed from CRAN.

install.packages("logisticPCA")

To install the development version, first install devtools from CRAN. Then run the following commands.

# install.packages("devtools")
library("devtools")
install_github("andland/logisticPCA")

Classes

Three types of dimensionality reduction are provided. For all of the functions, the user must supply the desired dimension k. The data must be an n x d matrix of binary variables (i.e. all 0's and 1's).

Logistic PCA

logisticPCA() estimates the natural parameters of a Bernoulli distribution in a lower dimensional space. This is done by projecting the natural parameters from the saturated model. A rank-k projection matrix, or equivalently a d x k orthogonal matrix U, is chosen to minimize the Bernoulli deviance. Since the natural parameters from the saturated model are either negative or positive infinity, an additional tuning parameter m is needed to approximate them. You can use cv.lpca() to select m by cross validation. Typical values of m range from 3 to 10.

In the fitted model, mu is a main effects vector of length d and U is the d x k loadings matrix.
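
As a rough illustration of the projection described above, the following base R sketch approximates the saturated natural parameters by +m for ones and -m for zeros, centers them by mu, and projects them onto k orthonormal directions. It does not call logisticPCA(). The simulated data, the use of column means for mu, and the use of ordinary PCA loadings for U are assumptions made for illustration only; the package instead chooses the projection to minimize the Bernoulli deviance.

```r
set.seed(1)
n <- 50; d <- 8; k <- 2; m <- 4

# Simulated n x d binary matrix (illustrative data, not from the package)
x <- matrix(rbinom(n * d, size = 1, prob = 0.4), nrow = n, ncol = d)

# Approximate the saturated natural parameters: +m where x == 1, -m where x == 0
theta_tilde <- m * (2 * x - 1)

# Assumption: take mu to be the column means of theta_tilde
mu <- colMeans(theta_tilde)
centered <- sweep(theta_tilde, 2, mu)  # subtract mu from every row

# Assumption: stand in for the optimized U with the top-k PCA loadings
U <- eigen(crossprod(centered), symmetric = TRUE)$vectors[, 1:k, drop = FALSE]

# Rank-k projection of the centered parameters, then add mu back
theta_hat <- sweep(centered %*% U %*% t(U), 2, mu, FUN = "+")

# Bernoulli deviance of the projected natural parameters
p_hat <- plogis(theta_hat)
deviance <- -2 * sum(x * log(p_hat) + (1 - x) * log(1 - p_hat))
deviance
```

A larger m pushes the approximated natural parameters further toward plus or minus infinity, which is why the text above suggests choosing it by cross validation.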

Logistic SVD

logisticSVD() estimates the natural parameters by a matrix factorization. In the factorization, mu is a main effects vector of length d, B is the d x k loadings matrix, and A is the n x k principal component score matrix.
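
To make the roles of mu, A, and B concrete, here is a small base R sketch that builds a natural parameter matrix as main effects plus a rank-k product and simulates binary data from it. It assumes the factorization takes the form of mu added to each row of A times the transpose of B. The values of mu, A, and B are random placeholders, not estimates, and logisticSVD() is not called.

```r
set.seed(2)
n <- 30; d <- 6; k <- 2

mu <- rnorm(d)                      # main effects, length d (placeholder values)
A  <- matrix(rnorm(n * k), n, k)    # n x k principal component scores
B  <- matrix(rnorm(d * k), d, k)    # d x k loadings

# Natural parameters: mu added to every row of the rank-k product A %*% t(B)
theta <- outer(rep(1, n), mu) + A %*% t(B)

# Convert to Bernoulli probabilities and simulate an n x d binary matrix
p <- plogis(theta)
x <- matrix(rbinom(n * d, size = 1, prob = p), n, d)

dim(theta)
```

After removing the main effects, the natural parameter matrix has rank at most k, which is what makes the representation low dimensional.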

Convex Logistic PCA

convexLogisticPCA() relaxes the problem of solving for a projection matrix to solving for a matrix in the k-dimensional Fantope, which is the convex hull of rank-k projection matrices. This has the advantage that the global minimum can be obtained efficiently. The disadvantage is that the k-dimensional Fantope solution may have a rank much larger than k, which reduces interpretability. It is also necessary to specify m in this function.

In the fitted model, mu is a main effects vector of length d, H is the d x d Fantope matrix, and U is the d x k loadings matrix, whose columns are the first k eigenvectors of H.
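
The rank issue mentioned above can be seen in a small base R sketch. It averages two random rank-k projection matrices, which yields a point in the k-dimensional Fantope whose trace is k but whose rank generally exceeds k, and then extracts U as the first k eigenvectors. The helper random_projection() is hypothetical and written only for this example; convexLogisticPCA() is not called.

```r
set.seed(3)
d <- 6; k <- 2

# Hypothetical helper: a random d x d projection matrix of rank k
random_projection <- function(d, k) {
  Q <- qr.Q(qr(matrix(rnorm(d * k), d, k)))  # orthonormal basis of a random subspace
  Q %*% t(Q)
}

P1 <- random_projection(d, k)
P2 <- random_projection(d, k)

# A convex combination of rank-k projections lies in the k-dimensional Fantope
H <- 0.5 * P1 + 0.5 * P2

ev <- eigen(H, symmetric = TRUE)
trace_H <- sum(diag(H))             # equals k up to rounding error
rank_H  <- sum(ev$values > 1e-8)    # generally larger than k

# Loadings: the first k eigenvectors of H
U <- ev$vectors[, 1:k, drop = FALSE]

c(trace = trace_H, rank = rank_H)
```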

Methods

Each of the classes has associated methods to make data analysis easier.

  • print(): Prints a summary of the fitted model.
  • fitted(): Returns the fitted low dimensional matrix of either natural parameters or probabilities.
  • predict(): Predicts the PCs on new data. Can also predict the low dimensional matrix of natural parameters or probabilities on new data.
  • plot(): Plots either the deviance trace, the first two PC loadings, or the first two PC scores, using the package ggplot2.
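
The methods above might be used as in the following sketch, which requires the logisticPCA package to be installed. The data are simulated, and the type values ("response", "PCs", "scores") are assumptions about the argument names; check the help pages (for example ?logisticPCA) for the exact arguments in your installed version.

```r
library(logisticPCA)

set.seed(4)
# Simulated binary matrices for fitting and for prediction (illustrative data)
x     <- matrix(rbinom(100 * 10, size = 1, prob = 0.3), nrow = 100, ncol = 10)
x_new <- matrix(rbinom(20 * 10, size = 1, prob = 0.3), nrow = 20, ncol = 10)

# Fit logistic PCA with k = 2 components and tuning parameter m = 4
fit <- logisticPCA(x, k = 2, m = 4)

print(fit)                                           # summary of the fitted model

# Assumed type values; see the package help pages
probs      <- fitted(fit, type = "response")         # fitted probabilities, n x d
new_scores <- predict(fit, x_new, type = "PCs")      # PC scores for new rows

plot(fit, type = "scores")                           # first two PC scores (ggplot2)
```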

In addition, there are functions for performing cross validation.

  • cv.lpca(), cv.lsvd(), cv.clpca(): Run cross validation over the rows of the matrix to assess candidate values of m and/or k.
  • plot.cv(): Plots the cross validation results.
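
A possible cross validation workflow for choosing m is sketched below. It requires the logisticPCA package, uses simulated data, and assumes the argument names ks, ms, and folds; consult ?cv.lpca for the exact interface. Because the candidate values are ms = 1:5, the position of the smallest cross validation error is also the selected value of m.

```r
library(logisticPCA)

set.seed(5)
# Simulated n x d binary matrix (illustrative data)
x <- matrix(rbinom(100 * 10, size = 1, prob = 0.3), nrow = 100, ncol = 10)

# Cross validate over m = 1, ..., 5 with k fixed at 2 (assumed argument names)
cv_res <- cv.lpca(x, ks = 2, ms = 1:5, folds = 3)

plot(cv_res)                      # cross validation error by m

# With ms = 1:5, the index of the minimum equals the chosen m
best_m <- unname(which.min(cv_res))

# Refit on all rows with the selected m
fit <- logisticPCA(x, k = 2, m = best_m)
```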
