pyipca's Introduction

pyIPCA

A Python package for Incremental Principal Component Analysis (IPCA) that conforms to the scikit-learn API.

Installation

python setup.py install

Dependencies

The IPCA algorithms require scikit-learn and numpy. The examples also require matplotlib and OpenCV (the cv2 module, which is installed separately from scikit-learn, for example via the opencv-python package).

Contributing

Submit a pull request!

pyipca's Issues

Can CCIPCA be used to do partial_fit and transform incrementally in one iteration loop on a large dataset?

Hi @kevinhughes27,

I have a dataset with more than 2 million samples. Can I call partial_fit and transform in the same loop, as in (1), or does partial_fit need to run over the whole dataset before transform is called, as in (2)?

(1)

for example in examples:
 ccipca.partial_fit(example)
 ccipca.transform(example)

(2)

for example in examples:
 ccipca.partial_fit(example)

for example in examples:
 ccipca.transform(example)
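
The difference between the two patterns holds for any incremental estimator: transform uses whatever state the model has at the moment it is called. The sketch below uses a hypothetical toy class, RunningCenterer, that only tracks a running mean. It is not pyIPCA's CCIPCA and says nothing about how quickly CCIPCA converges; it only illustrates that pattern (1) transforms early samples with an immature model, while pattern (2) transforms every sample with the final one.

```python
import numpy as np


class RunningCenterer:
    """Hypothetical toy estimator: tracks a running mean, nothing more."""

    def __init__(self):
        self.n_seen_ = 0
        self.mean_ = None

    def partial_fit(self, x):
        x = np.asarray(x, dtype=float)
        if self.mean_ is None:
            self.mean_ = np.zeros_like(x)
        self.n_seen_ += 1
        # Standard incremental mean update.
        self.mean_ += (x - self.mean_) / self.n_seen_
        return self

    def transform(self, x):
        # Uses the mean as estimated so far, not the final mean.
        return np.asarray(x, dtype=float) - self.mean_


rng = np.random.default_rng(0)
examples = rng.random((1000, 3)) * 100

# Pattern (1): fit and transform in the same loop.
model = RunningCenterer()
interleaved = np.array([model.partial_fit(x).transform(x) for x in examples])

# Pattern (2): fit on everything first, then transform.
model = RunningCenterer()
for x in examples:
    model.partial_fit(x)
two_pass = np.array([model.transform(x) for x in examples])

# In pattern (1) the first sample is centered on itself ...
first_interleaved = interleaved[0]
# ... while the last sample sees the same final mean in both patterns.
last_matches = np.allclose(interleaved[-1], two_pass[-1])
```

If the early outputs matter, pattern (2) or a warm-up pass over part of the data before transforming is the safer choice; if only a rough online projection is needed, pattern (1) avoids a second pass over the data.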

Problematic results

I made a toy example to test your code, and I suspect the results are incorrect. This is the code I used.

# under IPython

from sklearn.decomposition import PCA
from pyIPCA import CCIPCA, Skocaj_IPCA, Hall_IPCA
from matplotlib import pyplot
import numpy as np

# make toy data
data = np.random.rand(10000, 10) * 100

# use sklearn PCA
ncomp = 2
pca = PCA(n_components=ncomp)
pca.fit(data)
data_pca = pca.transform(data)
pyplot.scatter(data_pca[:, 0], data_pca[:, 1])
pyplot.title('Sklearn-PCA')
pyplot.show()
# [figure: sklearn_pca]

# use CCIPCA
ipca = CCIPCA(n_components=ncomp)
ipca.fit(data)
idata_pca = ipca.transform(data)
pyplot.scatter(idata_pca[:, 0], idata_pca[:, 1])
pyplot.title('CCIPCA')
pyplot.show()
# [figure: ccipca]

# use Skocaj_IPCA
ipca = Skocaj_IPCA(n_components=ncomp)
ipca.fit(data)
idata_pca = ipca.transform(data)
pyplot.scatter(idata_pca[:, 0], idata_pca[:, 1])
pyplot.title('Skocaj_IPCA')
pyplot.show()
# [figure: skocaj_ipca]

# use Hall_IPCA
ipca = Hall_IPCA(n_components=ncomp)
ipca.fit(data)
idata_pca = ipca.transform(data)
pyplot.scatter(idata_pca[:, 0], idata_pca[:, 1])
pyplot.title('Hall_IPCA')
pyplot.show()
# [figure: hall_ipca]

It seems that neither CCIPCA nor Skocaj_IPCA works properly: after transformation, their centers are far from the origin (0, 0), and their shapes look more like an oval than a circle.
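
One way to make this comparison quantitative is to look at the mean of the projected data. A batch PCA that subtracts the training mean before projecting yields output centered at the origin, while skipping that subtraction shifts every projected point by the same constant. The sketch below is a plain NumPy reference with a hypothetical helper named project. It is not pyIPCA code and does not establish what CCIPCA or Skocaj_IPCA do internally; it only shows what an offset center looks like numerically.

```python
import numpy as np


def project(data, n_components, center=True):
    """Project data onto its top principal axes (batch PCA via SVD)."""
    mean = data.mean(axis=0)
    # The axes are always computed from the centered data.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    components = vt[:n_components]
    source = data - mean if center else data
    return source @ components.T


rng = np.random.default_rng(0)
data = rng.random((10000, 10)) * 100

centered = project(data, 2, center=True)
uncentered = project(data, 2, center=False)

# Distance of each projection's center from the origin (0, 0).
centered_offset = np.linalg.norm(centered.mean(axis=0))
uncentered_offset = np.linalg.norm(uncentered.mean(axis=0))

# The two projections differ by one constant shift applied to every row.
shift = uncentered - centered
```

Running the same mean check on the output of each estimator's transform would show whether the offset seen in the plots is a constant shift of this kind or something else.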

By the way, Skocaj_IPCA often raises the following warning on my machine:
RuntimeWarning: invalid value encountered in divide
explained_variance_.sum())
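
NumPy emits this warning when a floating-point division has no defined result, for example 0.0 / 0.0. Assuming the truncated line divides by explained_variance_.sum(), a sum of zero would be one way to trigger it; whether that is what happens inside Skocaj_IPCA is not confirmed here. A minimal sketch of a guarded ratio, using the hypothetical helper name safe_variance_ratio:

```python
import numpy as np


def safe_variance_ratio(explained_variance):
    """Return variance / total, or zeros when the total is not positive."""
    variance = np.asarray(explained_variance, dtype=float)
    total = variance.sum()
    if not np.isfinite(total) or total <= 0:
        # Avoid 0/0, which is what produces the "invalid value" warning.
        return np.zeros_like(variance)
    return variance / total


ratio = safe_variance_ratio([3.0, 1.0])
degenerate = safe_variance_ratio([0.0, 0.0])
```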

Many thanks for your contributions to sklearn.

Rex

Calling CCIPCA on a big HDF5 dataset loads the whole dataset into memory

I have a big dataset in HDF5 (~6 GB), so I cannot load all of it into memory at once (one of the reasons to use IPCA). When I used your CCIPCA code, I ran into these lines of the fit() function:

1: X = array2d(X)
3: X = as_float_array(X, copy=self.copy)

The first one tries to load the whole dataset into memory (from disk), and the second one raises a MemoryError, indicating that the whole matrix would be loaded into memory.
Since both are just checks, I commented them out and had no further problems using the code. I think this is a common use case, hence the issue. I'm not sure whether the problem also happens with pure numpy arrays, but since h5py datasets have a similar interface (slicing, etc.), the results should be the same.
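
One common way to avoid the full load is to slice the dataset in row blocks and feed each block to the estimator, since slicing an h5py dataset reads only the requested rows from disk. The sketch below assumes only that the dataset exposes shape and supports slicing, as NumPy arrays and h5py datasets both do. The helper iter_row_chunks and the chunk size are hypothetical, and a running mean stands in for the estimator so that the block runs without pyIPCA. Whether CCIPCA's partial_fit accepts a block of rows or one row at a time is not stated here.

```python
import numpy as np


def iter_row_chunks(dataset, chunk_size):
    """Yield consecutive blocks of at most chunk_size rows as float arrays."""
    n_rows = dataset.shape[0]
    for start in range(0, n_rows, chunk_size):
        # Only this slice is materialized in memory.
        yield np.asarray(dataset[start:start + chunk_size], dtype=float)


# Stand-in for an on-disk dataset; an h5py dataset could be passed the same way.
dataset = np.arange(50, dtype=float).reshape(10, 5)

n_seen = 0
running_sum = np.zeros(dataset.shape[1])
chunk_sizes = []
for chunk in iter_row_chunks(dataset, chunk_size=4):
    # A real loop would feed the chunk (or its rows) to partial_fit here.
    running_sum += chunk.sum(axis=0)
    n_seen += chunk.shape[0]
    chunk_sizes.append(chunk.shape[0])

mean = running_sum / n_seen
```

Peak memory then scales with the chunk size rather than with the dataset, at the cost of one disk read per chunk.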

Configuration:
python (2.7.6)
h5py (2.3.0)
numpy (1.8.1)
scikit-learn (0.14.1)
scipy (0.14.0)
