Giter Club home page Giter Club logo

distributed-frank-wolfe's Introduction

Distributed Frank-Wolfe Framework for Trace Norm Minimization

This package solves trace norm minimization in a distributed way. It is written in Python and works on clusters deployed with Apache SPARK.

Dependency

Use Python3.5 for the widest compatibility with Apache SPARK. Other versions of Python may not be supported.

Structure

The package is composed of the front end and the back end. The front end implements high-level functions, which allows the users to run the distributed Frank-Wolfe algorithms without caring about the detail. The back end implements various low-level functions about every aspect of Frank-Wolfe.

Frontend

The front end (frontend.py) has two functions: solve and evaluate. The former solves a minimization problem, as the name indicates. It returns an FWpath object, which stores the solution path. The latter takes a dataset and an FWpath object to evaluate the fitness of the result.

Backend

The back end is the base of the framework. It is composed of two parts: the Frank-Wolfe strategy part, which is generic, and the model part, which is not generic.

We currently have implemented two models: multitask least square (mls) and multinomial logistic regression (mlr). Each model is implemented in a separated file (e.g., mls.py, mlr.py) and follows a certain API, among which the most important functions are stats and update. The model part is written in pure Python. Users can implement their own models without any prior distributed computing knowledge.

The strategy part (fw.py) implements the 4 distributed strategies solving the linear subproblem as well as various step size defining devices. This part is written in PySpark, SPARK's Python API.

Basic usage

The typical workflow is:

  1. Open a PySpark terminal or a PySpark notebook
  2. Prepare your own RDD, and from frontend import solve, evaluate
  3. Feed your RDD to the solve function, as well as provide necessary parameters
  4. Evaluate your result with evaluate function.

The solve function takes as input

  • data RDD,
  • metadata dictionary containing the dimension of the input matrices (i.e., n, m, p)
  • model (either mls or mlr, should be imported)
  • hyperparameter characterizing the intended trace norm
  • number of epochs
  • linear minimization oracle
  • step size oracle

Linear minimization oracle candidates:

fw.centralize                                                             # centralize
fw.avgmix                                                                 # singular vectors mixture
lambda rdd, **kwargs: fw.poweriter(rdd, lambda t: 1, 'random', **kwargs)  # power1
lambda rdd, **kwargs: fw.poweriter(rdd, lambda t: 2, 'random', **kwargs)  # power2
lambda rdd, **kwargs: fw.poweriter(rdd, lambda t: 1, **kwargs)            # power1 with warm start
lambda rdd, **kwargs: fw.poweriter(rdd, lambda t: 2, **kwargs)            # power2 with warm start
lambda rdd, **kwargs: fw.poweriter(rdd, lambda t: fw.loground(t, c=1), 'random', **kwargs)  # powlog

Step size oracle candidates:

fw.linesearch
fw.naivestep                                     # 2 / (t+2)
lambda **kwargs: fw.fixedstep(const=0.01)        # fixed step size

The evaluate function takes as input

  • test data RDD,
  • solution path yield from solve
  • metadata dictionary containing the dimension of the input matrices (i.e., n, m, p)
  • model (either mls or mlr, should be imported)
  • (optional) ground truth matrix

For the mls model, it outputs the objective function value and, if provided with the ground truth, estimation error. For the mlr model, it outputs the objective function value and the top-5 misclassification rate.

This workflow works either on a cluster or a PC. For users with difficulties to install SPARK, I recommend my Park package. It allows to develop and/or test basic Spark code on a single PC without SPARK installed.

Demo

Two demos are provided, in both notebook format and HTML format.

Reference

This repository accompanies my working paper

  • A Distributed Frank-Wolfe Framework for Learning Low-Rank Matrices with the Trace Norm

as well as my Ph.D. dissertation

  • A Distributed Frank-Wolfe Framework for Trace Norm Minimization

Difference between these two works

  • My working paper focuses on matrices in the general space, whereas my dissertation studies general matrices, symmetric matrices and positive semidefinite matrices.
  • My dissertation includes only the slow convergence rate, whereas my working paper quantifies also the fast convergence rate when hypotheses could be made on the distribution of the singular values.
  • My working paper presents only the convergence in expectation, whereas my dissertation presents also the convergence in probability.
  • My working paper describes only the dense representation, whereas yy dissertation discusses also the low-rank representation.
  • My working paper provides a coarse-grained analysis of the elapsed time, whereas my dissertation provides a fine-grained one.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.