Codebase for "VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain"

Authors: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar

Reference: Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar, "VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain," Neural Information Processing Systems (NeurIPS), 2020.

Paper Link: TBD

Contact: [email protected]

This directory contains an implementation of the VIME framework for self- and semi-supervised learning in the tabular domain, demonstrated on the MNIST dataset.

To run the training and evaluation pipeline for the VIME framework, run python3 main_vime.py, or see the Jupyter notebook tutorial in tutorial_vime.ipynb.

Note that any model architecture, such as a CNN, can be used for the encoder and predictor models.

Code explanation

(1) data_loader.py

  • Load and preprocess MNIST data, converting it into tabular form
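
As a rough illustration of what "converting into tabular form" can mean, the sketch below flattens each 28x28 image into a row of 784 features scaled to [0, 1]. The function name images_to_tabular and the scaling choice are assumptions made for this example, not necessarily what data_loader.py does, and synthetic arrays stand in for MNIST so the snippet runs without a download.

```python
import numpy as np


def images_to_tabular(images):
    """Flatten (n, height, width) images into an (n, height * width) feature table.

    Hypothetical helper: pixel values are assumed to be in 0..255 and are
    rescaled to [0, 1].
    """
    images = np.asarray(images)
    n = images.shape[0]
    return images.reshape(n, -1).astype(np.float32) / 255.0


# Synthetic stand-in for MNIST images (5 samples of 28x28 pixels).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

x_tabular = images_to_tabular(fake_images)
print(x_tabular.shape)  # (5, 784)
```

Each row of x_tabular can then be treated like any other tabular sample, with one column per pixel.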

(2) supervised_model.py

  • Define logistic regression, multi-layer perceptron (MLP), and XGBoost models
  • All of them are supervised models for classification

(3) vime_self.py

  • Self-supervised learning part of the VIME framework
  • Return the encoder part of the VIME framework
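
The self-supervised stage relies on corrupting inputs with probability p_m, so that an encoder can be trained to recover both the mask and the original features. The sketch below shows one way to generate such a mask and a corrupted sample with NumPy. The function names, the explicit rng argument, and the column-wise shuffling used to draw replacement values are assumptions for illustration and may differ from the code in vime_self.py and vime_utils.py.

```python
import numpy as np


def mask_generator(p_m, x, rng):
    """Draw a binary mask with the same shape as x; 1 means 'corrupt this entry'."""
    return rng.binomial(1, p_m, size=x.shape)


def pretext_generator(m, x, rng):
    """Corrupt x where m == 1 using values taken from other rows of the same column.

    Returns the effective mask (entries that actually changed) and the
    corrupted sample.
    """
    n, dim = x.shape
    # Shuffle each column independently so replacement values follow
    # that column's empirical distribution.
    x_bar = np.empty_like(x)
    for j in range(dim):
        x_bar[:, j] = x[rng.permutation(n), j]
    # Keep the original value where m == 0, take the shuffled value where m == 1.
    x_tilde = x * (1 - m) + x_bar * m
    # A masked entry may receive its own value back, so recompute the mask.
    m_new = (x != x_tilde).astype(int)
    return m_new, x_tilde


rng = np.random.default_rng(0)
x = rng.random((6, 4))   # toy table: 6 samples, 4 features
m = mask_generator(0.3, x, rng)
m_new, x_tilde = pretext_generator(m, x, rng)
```

In a full pipeline, the encoder would take x_tilde as input, and alpha would weight the feature reconstruction loss against the mask estimation loss.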

(4) vime_semi.py

  • Semi-supervised learning part of the VIME framework
  • Save the trained predictor and return its predictions on the testing set
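
To make the roles of K and beta more concrete, the sketch below combines a supervised cross-entropy term on labeled samples with a consistency term over K augmented copies of each unlabeled sample, weighted by beta. This README does not spell out the exact form of the unsupervised loss, so the mean squared difference used here, the function name semi_supervised_loss, and the array shapes are all assumptions for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def semi_supervised_loss(logits_labeled, y_onehot, logits_unlabeled,
                         logits_augmented, beta):
    """Return (total, supervised, unsupervised) losses.

    Assumed shapes:
      logits_labeled:   (n_l, c)     predictor outputs on labeled samples
      y_onehot:         (n_l, c)     one-hot labels
      logits_unlabeled: (n_u, c)     outputs on original unlabeled samples
      logits_augmented: (K, n_u, c)  outputs on K corrupted copies
    """
    # Supervised term: cross-entropy on the labeled samples.
    probs = softmax(logits_labeled)
    sup = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
    # Unsupervised term: outputs on corrupted copies should stay close
    # to the outputs on the original samples.
    unsup = np.mean((logits_augmented - logits_unlabeled[None]) ** 2)
    return sup + beta * unsup, sup, unsup


rng = np.random.default_rng(0)
logits_l = rng.normal(size=(4, 3))
y = np.eye(3)[[0, 1, 2, 0]]
logits_u = rng.normal(size=(5, 3))
logits_aug = logits_u[None] + 0.1 * rng.normal(size=(3, 5, 3))  # K = 3

total, sup, unsup = semi_supervised_loss(logits_l, y, logits_u, logits_aug, beta=1.0)
```

Setting beta to 0 reduces this to purely supervised training, while a larger beta puts more weight on agreement across the augmented copies.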

(5) main_vime.py

  • Report the prediction performance of the supervised-only model, the self-supervised part of VIME, and the entire VIME framework.

(6) vime_utils.py

  • Utility functions for metrics and the VIME framework.

Command inputs:

  • iterations: Number of experiment iterations
  • label_no: Number of labeled data to be used
  • model_name: Supervised model name (e.g., mlp, logit, or xgboost)
  • p_m: Corruption probability for self-supervised learning
  • alpha: Hyper-parameter to control the weights of feature and mask losses
  • K: Number of augmented samples
  • beta: Hyper-parameter to control the weights of supervised and unsupervised losses
  • label_data_rate: Ratio of labeled data

Note that hyper-parameters should be optimized for different datasets.
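
The inputs listed above map naturally onto a command-line parser. The following is a minimal argparse sketch of such an interface, not a copy of main_vime.py: the argument types, the choices for model_name, and the default values are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the command inputs listed in this README.

    Types and defaults are illustrative placeholders, not tuned values.
    """
    parser = argparse.ArgumentParser(description="VIME experiment settings (sketch)")
    parser.add_argument("--iterations", type=int, default=10,
                        help="number of experiment iterations")
    parser.add_argument("--label_no", type=int, default=1000,
                        help="number of labeled data to be used")
    parser.add_argument("--model_name", type=str, default="xgboost",
                        choices=["mlp", "logit", "xgboost"],
                        help="supervised model name")
    parser.add_argument("--p_m", type=float, default=0.3,
                        help="corruption probability for self-supervised learning")
    parser.add_argument("--alpha", type=float, default=2.0,
                        help="weight between feature and mask losses")
    parser.add_argument("--K", type=int, default=3,
                        help="number of augmented samples")
    parser.add_argument("--beta", type=float, default=1.0,
                        help="weight between supervised and unsupervised losses")
    parser.add_argument("--label_data_rate", type=float, default=0.1,
                        help="ratio of labeled data")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--model_name", "mlp", "--p_m", "0.2"])
print(args.model_name, args.p_m, args.K)
```

Arguments that are not passed fall back to the placeholder defaults, which makes it easy to sweep a single hyper-parameter while holding the others fixed.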

Example command

$ python3 main_vime.py --iterations 10 --label_no 1000 --model_name xgboost \
  --p_m 0.3 --alpha 2.0 --K 3 --beta 1.0 --label_data_rate 0.1

Outputs

  • results: performance of three models (supervised only, VIME-self, and VIME)
