EigenGame: Top-k Eigendecompositions for Large, Streaming Data

Background and Description

EigenGame formulates top-k eigendecomposition as a k-player game, enabling a distributed approach that iteratively learns eigenvectors of large matrices defined as expectations over large datasets. This setting is common in machine learning, statistics, and science more generally.
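To make the game formulation concrete, here is a minimal sketch of the gradient update described in "EigenGame: PCA as a Nash Equilibrium". Each player i holds a unit vector, is rewarded for its own Rayleigh quotient, and is penalized for aligning with players j < i. This is not the repository's implementation: it uses NumPy on a small dense matrix rather than JAX on streamed minibatches, and the names eigengame_step, run_demo, and lr are illustrative assumptions.

```python
import numpy as np


def eigengame_step(V, M, lr):
    """Apply one simultaneous update to all k players.

    V:  (k, d) array whose rows are unit-norm eigenvector estimates.
    M:  (d, d) symmetric positive definite matrix.
    lr: step size (illustrative value, not a tuned hyperparameter).
    """
    MV = V @ M  # row j equals M @ V[j] because M is symmetric
    new_V = np.empty_like(V)
    for i in range(V.shape[0]):
        # Penalty: player i is pushed away from every "parent" j < i.
        penalty = np.zeros(V.shape[1])
        for j in range(i):
            penalty += (V[i] @ MV[j]) / (V[j] @ MV[j]) * MV[j]
        grad = 2.0 * (MV[i] - penalty)
        # Project onto the tangent space of the unit sphere at V[i] ...
        grad -= (grad @ V[i]) * V[i]
        # ... take an ascent step and renormalise back onto the sphere.
        v = V[i] + lr * grad
        new_V[i] = v / np.linalg.norm(v)
    return new_V


def run_demo(steps=2000, lr=0.05, seed=0):
    """Recover the top-2 eigenvectors of a small diagonal matrix."""
    rng = np.random.default_rng(seed)
    M = np.diag([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
    V = rng.normal(size=(2, 6))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(steps):
        V = eigengame_step(V, M, lr)
    return V
```

Because the test matrix is diagonal, the two rows returned by run_demo should align (up to sign) with the first two standard basis vectors, which are the eigenvectors with the largest eigenvalues.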

This repository contains an implementation of some of the algorithms and experiments described in a series of papers:

  • “EigenGame: PCA as a Nash Equilibrium”, Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel, ICLR (2021)
  • “EigenGame Unloaded: When playing games is better than optimizing”, Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel, ICLR (2022)
  • “The Generalized Eigenvalue Problem as a Nash Equilibrium”, Ian Gemp, Charlie Chen, Brian McWilliams, ICLR (2023).

Charlie Chen was the primary author of the source code with guidance and support from Ian Gemp and Zhe Wang. Brian McWilliams and Ian Gemp wrote initial versions of the implementation, which were used for inspiration. Sukhdeep Singh was the program manager for this project.

WARNING: This is a research-level release of a JAX implementation and is under active development.

Installation

Running pip install -e . installs all required dependencies. This is best done inside a virtual environment (pip install virtualenv).

cd eigengame
virtualenv ~/venv/eigengame
source ~/venv/eigengame/bin/activate
pip install -e .

Note that the jaxlib version (which may be specified in setup.py) must correspond to the existing CUDA installation you wish to use. Please see the JAX documentation for more details.
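One way to check that the installed jaxlib matches the local accelerator setup is to ask JAX which backend and devices it sees. The helper below is a small sketch, not part of this repository; the function name describe_jax_install is an assumption, and it reports rather than raises if JAX is missing or fails to initialise a backend.

```python
from importlib import metadata


def describe_jax_install():
    """Return a small report on the local JAX setup, or note its absence."""
    try:
        import jax
    except ImportError:
        return {"installed": False}

    report = {"installed": True}
    for package in ("jax", "jaxlib"):
        try:
            report[f"{package}_version"] = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[f"{package}_version"] = "unknown"
    try:
        report["backend"] = jax.default_backend()  # e.g. "cpu" or "gpu"
        report["devices"] = [str(d) for d in jax.devices()]
    except RuntimeError as err:
        # Backend initialisation can fail, for example on a CUDA mismatch.
        report["error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in describe_jax_install().items():
        print(f"{key}: {value}")
```

If the report lists only CPU devices on a machine with GPUs, the installed jaxlib build is a likely place to look first.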

Usage

eigengame uses the ConfigDict from ml_collections to configure the system. A few example scripts are included under eigengame/configs/. These are mostly for testing, so they may need additional settings for a production-level calculation.

Take synthetic_dataset_pca as an example:

cd eigengame/examples/synthetic_dataset_pca
python experiment.py --config ./config.py --jaxline_mode train_eval_multithreaded

This will train EigenGame to find the top-256 eigenvectors of a dataset drawn from a 1000-dimensional multivariate normal distribution. The system and hyperparameters can be controlled by modifying the config file. Details of all available config settings are in eigengame/eg_base_config.py.
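For small problems, the solution that EigenGame is expected to approach can be computed directly, which is useful as a sanity check. The sketch below draws samples from a multivariate normal distribution and takes the top-k eigenvectors of the sample covariance with numpy.linalg.eigh. The diagonal covariance, the sizes (far smaller than 1000 dimensions and k = 256), and the function names are assumptions for illustration and are not taken from the example's data pipeline.

```python
import numpy as np


def top_k_eigenvectors(M, k):
    """Return the k leading eigenvalues and eigenvectors of a symmetric M.

    Eigenvectors are returned as rows, ordered by decreasing eigenvalue.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(M)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:k]
    return eigenvalues[order], eigenvectors[:, order].T


def synthetic_reference(dim=20, k=4, num_samples=50_000, seed=0):
    """Sample zero-mean Gaussian data and return its top-k eigenpairs."""
    rng = np.random.default_rng(seed)
    # Assumed covariance: diagonal with variances dim, dim - 1, ..., 1.
    variances = np.arange(dim, 0, -1, dtype=float)
    data = rng.normal(size=(num_samples, dim)) * np.sqrt(variances)
    covariance = data.T @ data / num_samples  # data is zero-mean by construction
    return top_k_eigenvectors(covariance, k)
```

A dense eigendecomposition like this scales cubically with the dimension and needs the full covariance in memory, which is the regime the streaming approach is meant to avoid.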

Other systems can easily be set up by creating a new config, data_pipeline, and experiment file.

Note: to train on larger datasets with large batch sizes, multi-GPU parallelisation is essential. This is supported via JAX's pmap. Multiple GPUs will be automatically detected and used if available.

Output

Evaluation metrics such as cosine similarity are saved in [config.checkpoint_dir]/eval as tensorboard event logs. Note that EigenGame must be run with jaxline_mode=train_eval_multithreaded as indicated in the example for these metrics to be saved.

The eigenvectors are saved to [config.checkpoint_dir] as .npy files containing numpy arrays of shape (num_devices, k, dimensionality).
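The saved arrays can be inspected with plain NumPy. The sketch below loads an array of the stated shape and computes a per-vector cosine similarity against a reference set. It is an illustration only: the file name eigenvectors.npy is hypothetical, the demo writes its own stand-in file, and the similarity function is one common definition rather than necessarily the metric the repository logs.

```python
import os
import tempfile

import numpy as np


def load_eigenvectors(path):
    """Load a saved array of shape (num_devices, k, dimensionality)."""
    vectors = np.load(path)
    if vectors.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {vectors.shape}")
    return vectors


def cosine_similarity(estimates, reference):
    """Absolute cosine similarity between matching rows of two (k, d) arrays.

    The absolute value is used because an eigenvector is only defined
    up to sign.
    """
    dots = np.sum(estimates * reference, axis=1)
    norms = np.linalg.norm(estimates, axis=1) * np.linalg.norm(reference, axis=1)
    return np.abs(dots) / norms


def demo():
    """Round-trip a stand-in file: 2 devices, k = 3, dimensionality = 5."""
    reference = np.eye(5)[:3]
    fake = np.stack([reference, -reference])  # second block has flipped signs
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "eigenvectors.npy")  # hypothetical file name
        np.save(path, fake)
        loaded = load_eigenvectors(path)
    return [cosine_similarity(block, reference) for block in loaded]
```

Each entry of the returned list corresponds to one slice along the leading num_devices axis, so per-device results can be compared side by side.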

Giving Credit

If you use this code in your work, we would appreciate it if you cited the associated papers. The initial paper details the architecture and results on a range of systems:

@inproceedings{gemp2021eigengame,
  author    = {Gemp, Ian and
               McWilliams, Brian and
               Vernade, Claire and
               Graepel, Thore},
  title     = {{EigenGame}: {PCA} as a {N}ash Equilibrium},
  booktitle = {International Conference on Learning Representations},
  year      = {2021}
}
@inproceedings{gemp2022eigengame,
  author    = {Gemp, Ian and
               McWilliams, Brian and
               Vernade, Claire and
               Graepel, Thore},
  title     = {{EigenGame} Unloaded: When playing games is better than optimizing},
  booktitle = {International Conference on Learning Representations},
  year      = {2022}
}

and a third paper describes the most current implementation:

@inproceedings{gemp2023generalized,
  author    = {Gemp, Ian and
               Chen, Charlie and
               McWilliams, Brian},
  title     = {The Generalized Eigenvalue Problem as a {N}ash Equilibrium},
  booktitle = {International Conference on Learning Representations},
  year      = {2023}
}

This repository can be cited using:

@software{eigengame_github,
  author  = {Chen, Charlie and
             Wang, Zhe and
             Gemp, Ian and
             Singh, Sukhdeep and
             {EigenGame} Contributors},
  title   = {{EigenGame}},
  url     = {http://github.com/deepmind/eigengame},
  year    = {2023}
}

Disclaimer

This is not an official Google product.
