
frankwolfe.jl's People

Contributors

alejandro-carderera, dhendryc, dviladrich95, elwirth, gdalle, github-actions[bot], hannahtro, j-geuter, jannishal, jeremiahpslewis, matbesancon, pokutta, sebastiendesignolle, victorthouvenot, zevwoodstock

frankwolfe.jl's Issues

When rewriting the Birkhoff LMO to the MathOpt one, keep the same matrix interface

Is there a way to ensure that when we call the convert method on the Birkhoff LMO to MOI, it keeps the matrix interface?

import FrankWolfe
import GLPK
using Random

n = 10  # side length of the doubly stochastic matrices (example value)

# initial direction for first vertex
direction_vec = Vector{Float64}(undef, n * n)
randn!(direction_vec)
direction_mat = reshape(direction_vec, n, n)

# takes a matrix and returns a matrix
lmo = FrankWolfe.BirkhoffPolytopeLMO()
x00 = FrankWolfe.compute_extreme_point(lmo, direction_mat)

# switch to the GLPK-backed MathOpt variant
o = GLPK.Optimizer()

# takes a vector and returns a vector
lmo = FrankWolfe.convert_mathopt(lmo, o, dimension=n)
x00 = FrankWolfe.compute_extreme_point(lmo, direction_vec)

It would be better if the MathOpt variant kept working with matrices so that it can be used as a drop-in replacement. This could probably be done right at the beginning of convert and right before the return.
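One possible way to do this, sketched below; the wrapper type and the method it overloads are illustrative assumptions, not the package API. The idea is to wrap the vector-based MathOpt LMO, flatten the matrix direction before calling it, and reshape the returned vertex.

import FrankWolfe

# Hypothetical wrapper that keeps the matrix interface around a vector-based LMO.
struct MatrixInterfaceLMO{LMO}
    inner::LMO   # e.g. the LMO returned by convert_mathopt
    n::Int       # side length of the Birkhoff matrices
end

function FrankWolfe.compute_extreme_point(w::MatrixInterfaceLMO, direction::AbstractMatrix)
    # flatten the matrix direction, call the vector-based oracle,
    # and reshape the returned vertex back into an n-by-n matrix
    v = FrankWolfe.compute_extreme_point(w.inner, vec(direction))
    return reshape(v, w.n, w.n)
end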

LCG crashes on MovieLens

Lazified Conditional Gradients (Frank-Wolfe + Lazification).
EMPHASIS: memory STEPSIZE: adaptive EPSILON: 1.0e-9 max_iteration: 1000 PHIFACTOR: 2 TYPE: Float64
cache_size Inf GREEDYCACHE: false
WARNING: In memory emphasis mode iterates are written back into x0!

─────────────────────────────────────────────────────────────────────────────────────────────────
  Type     Iteration         Primal           Dual       Dual Gap           Time     Cache Size
─────────────────────────────────────────────────────────────────────────────────────────────────
ERROR: LoadError: MethodError: Cannot `convert` an object of type 
  FrankWolfe.RankOneMatrix{Float64,Array{Float64,1},Array{Float64,1}} to an object of type 
  AbstractArray{T,1} where T
Closest candidates are:
  convert(::Type{T}, ::T) where T<:AbstractArray at abstractarray.jl:14
  convert(::Type{T}, ::Factorization) where T<:AbstractArray at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:55
  convert(::Type{T}, ::T) where T at essentials.jl:171
Stacktrace:
 [1] push!(::Array{AbstractArray{T,1} where T,1}, ::FrankWolfe.RankOneMatrix{Float64,Array{Float64,1},Array{Float64,1}}) at ./array.jl:934
 [2] compute_extreme_point(::FrankWolfe.VectorCacheLMO{FrankWolfe.NuclearNormLMO{Float64},AbstractArray{T,1} where T}, ::Array{Float64,2}; threshold::Float64, store_cache::Bool, greedy::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/spokutta/Code/fwjulia/FrankWolfe.jl/src/oracles.jl:210
 [3] lcg(::typeof(f), ::typeof(grad!), ::FrankWolfe.NuclearNormLMO{Float64}, ::FrankWolfe.RankOneMatrix{Float64,Array{Float64,1},Array{Float64,1}}; line_search::FrankWolfe.LineSearchMethod, L::Int64, phiFactor::Int64, cache_size::Float64, greedy_lazy::Bool, epsilon::Float64, max_iteration::Int64, print_iter::Float64, trajectory::Bool, verbose::Bool, linesearch_tol::Float64, emphasis::FrankWolfe.Emphasis, gradient::Nothing) at /home/spokutta/Code/fwjulia/FrankWolfe.jl/src/FrankWolfe.jl:359
 [4] top-level scope at /home/spokutta/Code/fwjulia/FrankWolfe.jl/examples/movielens.jl:96
in expression starting at /home/spokutta/Code/fwjulia/FrankWolfe.jl/examples/movielens.jl:96
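For context, the failure appears to come from the cache vector being typed as holding 1-d arrays while the nuclear-norm oracle returns a RankOneMatrix. The same MethodError can be reproduced outside the package; a minimal illustration (not the actual fix):

# A Vector typed for 1-d arrays cannot absorb a matrix-shaped element,
# which is exactly the convert/MethodError shown in the stack trace above.
cache = AbstractVector[]      # Array{AbstractArray{T,1} where T, 1}
push!(cache, rand(2, 2))      # MethodError: Cannot `convert` a Matrix to an AbstractVector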

Away-Step FW memory mode

Think about whether to improve the active set memory consumption with a memory mode, i.e., for the argmin oracle.

Note: the AFW algorithm is more expensive due to the active set anyway, so we could potentially leave this as is. Decide how necessary it is once we have BCG.

another weird issue with momentum + memory

import FrankWolfe
import LinearAlgebra


n = Int(1e3)
k = 10000

xpi = rand(n);
total = sum(xpi);
const xp = xpi ./ total;

f(x) = LinearAlgebra.norm(x-xp)^2
grad(x) = 2 * (x-xp)

lmo = FrankWolfe.UnitSimplexOracle(1.0);
x0 = FrankWolfe.compute_extreme_point(lmo, rand(n))

FrankWolfe.benchmark_oracles(f, grad, lmo, n; k=100, T=Float64)

@time x, v, primal, dualGap, trajectoryM = FrankWolfe.fw(f,grad,lmo,x0,maxIt=k,
    stepSize=FrankWolfe.shortstep,L=2,printIt=k/10,emph=FrankWolfe.blas,verbose=true, trajectory=true, momentum=0.9);

@time x, v, primal, dualGap, trajectoryM = FrankWolfe.fw(f,grad,lmo,x0,maxIt=k,
    stepSize=FrankWolfe.shortstep,L=2,printIt=k/10,emph=FrankWolfe.memory,verbose=true, trajectory=true, momentum=0.9);

The first call works; the second one blows up.

benchmark_oracles does not work with matrix types

The problem seems to be the n that we pass, which is used for allocating the vectors. Either we can have a function with a different signature, or we can pass the shape etc. Suggestions welcome.

Example that fails:

FrankWolfe.benchmark_oracles(f, (str, x) -> grad!(str, x), lmo, n; k=100, T=Float64)

added to movielens.jl
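One possible fix, sketched with assumed names (benchmark_oracles_shaped and its keyword arguments are illustrative, and the timing/reporting of the current function is omitted): pass the shape instead of n and allocate the test points with that shape.

import FrankWolfe
using Random

# Sketch: allocate trial points with an arbitrary shape instead of a flat vector.
# `dims` would be e.g. (n,) for vector problems or (m, n) for matrix problems.
function benchmark_oracles_shaped(f, grad!, lmo, dims::Tuple; k=100, T=Float64)
    x = randn(T, dims...)
    storage = similar(x)
    for _ in 1:k
        f(x)                                      # exercise the function oracle
        grad!(storage, x)                         # exercise the in-place gradient
        FrankWolfe.compute_extreme_point(lmo, x)  # exercise the LMO
        randn!(x)                                 # fresh point for the next round
    end
end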

Vanilla Frank-Wolfe with convex combination representation

As discussed, we need an implementation of vanilla FW which maintains the current iterate as a set of atoms together with their convex combination weights.

This could make the nuclear norm problems more stable by explicitly representing a low-rank matrix as a weighted sum of rank-1 matrices, whereas the current result has a lot of near-zero singular values.
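A minimal sketch of the required bookkeeping, assuming the standard update x_{t+1} = (1 - γ) x_t + γ v_t (all names are illustrative):

# Store the iterate implicitly as sum_i weights[i] * atoms[i] instead of a dense matrix.
atoms = Any[]          # e.g. rank-one matrices returned by the nuclear norm LMO
weights = Float64[]

function add_atom!(atoms, weights, v, γ)
    weights .*= (1 - γ)   # scale down all existing weights
    push!(atoms, v)       # the new FW vertex enters the decomposition
    push!(weights, γ)     # ... with weight γ
    return atoms, weights
end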

compute_gradient and compute_value do not rescale

We need both functions to rescale according to the size of the batch so that in expectation we obtain the exact gradient. Right now it seems that we are off by some batch normalization factor in the stochastic cases.

This requires discussion; maybe I am missing something.
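For concreteness, assuming the full objective is the average over the data, the batch estimate should also be averaged so that its expectation matches the exact gradient (grad_i below is a hypothetical per-sample gradient):

# Unbiased mini-batch estimate when the full objective is (1/N) * sum_i f_i(θ):
# average over the batch rather than summing, so E[estimate] equals the exact gradient.
batch_gradient(grad_i, θ, batch) = sum(grad_i(θ, x_i) for x_i in batch) / length(batch)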

BCG out-of-box improvements

  • SD steps sometimes do not make enough progress although they should (see the movielens example); likely a problem with the line search strategy
  • fix numerical issues

Finding vertex in convex decomposition.

When taking a step towards a FW vertex or an away vertex, we need to update the active set. This requires checking whether the vertex already exists in the active set. Right now we loop through the active set to see whether any of its vertices is equal to the vertex we want to add. There are several occasions on which this is not needed:

  • Taking an away step: We already know the index in the active set, no need to loop through the active set again.
  • Taking a lazy FW step: Same as above!

Finding the vertex in the convex decomposition can be very costly. An easy fix would be to give active_set_update! an optional index argument, which can be used to update the convex decomposition more quickly in the two cases above; a sketch follows below. This will likely yield a significant improvement when the optimal face is relatively sparse and the active set already contains most of its vertices.
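A sketch of that optional argument (the body is illustrative and only covers the FW-step update; the real function also has to handle away steps):

# Sketch: when the caller already knows the vertex's position in the active set
# (away steps, lazy FW steps), skip the linear search over the atoms.
function active_set_update_sketch!(weights, atoms, γ, v; index=nothing)
    weights .*= (1 - γ)                          # standard convex-combination rescaling
    i = index === nothing ? findfirst(a -> a == v, atoms) : index
    if i === nothing
        push!(atoms, v)                          # genuinely new vertex
        push!(weights, γ)
    else
        weights[i] += γ                          # vertex already present, just add weight
    end
    return weights, atoms
end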

Make AFW and BCG emphasis aware after refactoring

After refactoring AFW and BCG, we lost the memory awareness due to the subroutines. We should restore this. It is not that critical because both are memory-heavy anyway, but it will still impact the speed of the iterations, in particular if we do not need to call the LMO.

Implement the core of FW methods as iterators

Most methods consist of:

  1. Setup
  2. Iteration until criterion met
  3. Cleanup and return

Part 2 could look roughly the same in many algorithms, with the difference being what happens inside each iteration.
For this, an iteration interface could be nice. It also lets users debug, inspect, and so on, without us having to anticipate everything they might want to inspect at each iteration. The top-level functions can be kept as-is, but users with high-performance needs would just do the setup and iteration parts, without allocating or logging.
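A rough sketch of what such an iterator could look like, using Julia's iteration protocol (the type and the agnostic step size are purely illustrative):

import FrankWolfe

# Hypothetical iterable: each `iterate` call performs one FW step and hands the
# state back to the caller, who decides about stopping, logging, and inspection.
struct FWIterable{F,G,LMO,XT}
    f::F
    grad!::G
    lmo::LMO
    x0::XT
end

function Base.iterate(it::FWIterable, state=(it.x0, 0))
    x, t = state
    gradient = similar(x)
    it.grad!(gradient, x)
    v = FrankWolfe.compute_extreme_point(it.lmo, gradient)
    γ = 2 / (t + 2)                 # agnostic step size, just for the sketch
    x = (1 - γ) * x + γ * v
    return (x, t + 1), (x, t + 1)
end

# The caller controls termination and logging:
# for (x, t) in FWIterable(f, grad!, lmo, x0)
#     t >= 1000 && break
# end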

Resources:
This discussion mentions the approach in Manopt.jl:
https://discourse.julialang.org/t/ann-optimkit-jl-a-blissfully-ignorant-julia-package-for-gradient-optimization/41063

A blog post describes the iterator approach to solve a linear system:
https://lostella.github.io/2018/07/25/iterative-methods-done-right.html

This is not high-priority, but it can make FW more flexible and let users decide what they want to log and how.

Non-Euclidean / non-vector space examples

Add examples and check adaptability of the code base to non-trivial atoms.

Easy ones:

  • matrix spaces (symmetric or not)
  • complex fields

More exotic:

  • Base learners (i.e. https://arxiv.org/abs/1910.03742 Greedy Convex Ensemble)
  • Measure spaces - (speculation) applications to optimal transport, Wasserstein barycenter problems

[Needs Documentation] Fix issue with Int64(...) inexact from MaybeHotVector

A (non-minimal) example:

using FrankWolfe
using LinearAlgebra
using ReverseDiff

n = Int(1e3);
k = 1000

xpi = rand(n);
total = sum(xpi);
const xp = xpi ./ total;

f(x) = 2 * LinearAlgebra.norm(x-xp)^3 - LinearAlgebra.norm(x)^2
const grad = x -> ReverseDiff.gradient(f, x)

# pick feasible region
lmo = FrankWolfe.ProbabilitySimplexOracle(1);

# compute some initial vertex
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n));

# benchmarking oracles
FrankWolfe.benchmarkOracles(f,grad,lmo,n;k=100,T=Float64)

# memory variant
@time x, v, primal, dualGap, trajectory = FrankWolfe.fw(f,grad,lmo,x0,maxIt=k,
    stepSize=FrankWolfe.nonconvex,printIt=k/10,emph=FrankWolfe.memory,verbose=true);

# blas variant ## works only with casting at the moment
# x0 = convert(Vector{promote_type(eltype(x0), Float64)}, x0)
@time x, v, primal, dualGap, trajectory = FrankWolfe.fw(f,grad,lmo,x0,maxIt=k,
    stepSize=FrankWolfe.nonconvex,printIt=k/10,emph=FrankWolfe.blas,verbose=true);

Test type genericity

Test that the algorithms work with (see the sketch after this list):

  1. Extended precision (BigFloat)
  2. Reduced precision (Float16/32)
  3. Rational
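A minimal sketch of such a test on the probability simplex, reusing the oracle names that appear in the examples above; the constructor with a rational radius is an assumption about the API.

using Test
import FrankWolfe

# Rational arithmetic: with a rational radius, the extreme point should stay exactly rational.
lmo_rat = FrankWolfe.ProbabilitySimplexOracle(1//1)
v = FrankWolfe.compute_extreme_point(lmo_rat, Rational{Int}.(rand(-10:10, 5)))
@test eltype(v) <: Rational

# Reduced and extended precision: the oracle should accept Float16/Float32/BigFloat directions.
for T in (Float16, Float32, BigFloat)
    lmo_T = FrankWolfe.ProbabilitySimplexOracle(one(T))
    @test FrankWolfe.compute_extreme_point(lmo_T, T.(randn(5))) isa AbstractVector
end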

Gradient interface

For a generic function F(x), one needs to evaluate the gradient at a point, grad(x), as a dense vector and then perform operations on it, mostly passing it to compute_extreme_point. Could there be an interface to avoid this?

One solution:

grad: x -> V{T}

is any function-like object that, given a point, returns a vector-like object (does not have to be a fully materialized vector).
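A small sketch of such a function-like object for f(x) = ||x - xp||^2: it returns a vector-like value whose entries are computed on access instead of materializing a dense gradient (all names are hypothetical):

# Lazy representation of grad f(x) = 2 (x - xp): entries are computed in getindex,
# so compute_extreme_point can scan it without a dense gradient ever being allocated.
struct LazyQuadGradient{XT<:AbstractVector,PT<:AbstractVector} <: AbstractVector{Float64}
    x::XT
    xp::PT
end

Base.size(g::LazyQuadGradient) = size(g.x)
Base.getindex(g::LazyQuadGradient, i::Int) = 2 * (g.x[i] - g.xp[i])

# A "grad"-like callable in the proposed interface: x -> vector-like object.
make_grad(xp) = x -> LazyQuadGradient(x, xp)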

svdl seems very slow

Benchmarking svdl suggests that it needs about 0.5 s per call for the movielens example, which seems way too high. @alejandro-carderera, can you provide some numbers from the Python code for the survey in terms of running time so that we can compare?

CG versions to be added

General extensions for all

  • add Momentum to base variant
  • general lazification
  • add adaptive step-size strategy from section 6.3.2 from the survey (requires only a single function)

Stochastic Algorithms

  • Stochastic FW

Active set algorithms

  • Away-Step FW (and Pairwise)
  • BCG

Add dual prices

The algorithms should compute dual prices at near-optimal solutions.

Sparse gradient descent

The numerical instabilities seem to come from the gradient being projected onto the probability simplex of the current active set, yielding very low coefficients and high accumulation error.

One could take a few vertices and do a descent step only on those. We need vertices with both positive and negative dot products with the gradient, since some weights will be reduced and some increased.

This should especially help for BCG in Float64 at large scale, as in the example.
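A sketch of one such step with two vertices of the active set: shift mass from the atom most aligned with the gradient to the atom least aligned with it (all names and the step size are illustrative):

using LinearAlgebra

# One "sparse" descent step on the weights of the convex decomposition:
# move a small amount of mass between the two extreme atoms only.
function sparse_weight_step!(weights, atoms, gradient; η=1e-2)
    scores = [dot(gradient, a) for a in atoms]
    i_up   = argmin(scores)            # best-aligned descent direction, weight grows
    i_down = argmax(scores)            # worst atom, weight shrinks
    δ = min(η, weights[i_down])        # keep all weights non-negative
    weights[i_down] -= δ
    weights[i_up]   += δ
    return weights
end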

Stochastic FW: optional batched user-provided functions

With the current SFW interface, users provide a function that processes one data point; batching happens a level higher, when we call the provided functions.

One possibility would be to make users provide batched functions by default:

f_batched(θ, xs) = sum(f(θ, x_i) for x_i in xs)
g_batched(θ, xs) = sum(g(θ, x_i) for x_i in xs)

What they provide now is the equivalent of the functions f and g above.

Localized gradient doesn't have tests

There is an example, but some tests should be added so that we can validate changes when running the tests. This could be verifying the quality of solutions on simple instances.

Implement more complex LP oracles

  • Permutahedron
  • Birkhoff Polytope
  • Nuclear Norm Ball

  1. do the permutahedron in GLPK (not sure whether this is smart)
  2. add an LP example with GLPK for testing (see the sketch below)
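A minimal sketch of such an LP-based oracle with GLPK through JuMP, shown here for the probability simplex as a stand-in (the permutahedron or Birkhoff polytope would just swap in their own constraints); this is not the package's actual MathOpt interface:

using JuMP
import GLPK
using LinearAlgebra

# LP-based linear minimization oracle: argmin_{x in P} <direction, x>,
# illustrated on the probability simplex { x >= 0, sum(x) == 1 }.
function lp_extreme_point(direction::AbstractVector)
    n = length(direction)
    model = Model(GLPK.Optimizer)
    @variable(model, x[1:n] >= 0)
    @constraint(model, sum(x) == 1)
    @objective(model, Min, dot(direction, x))
    optimize!(model)
    return value.(x)
end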
