zib-iol / FrankWolfe.jl
Julia implementation for various Frank-Wolfe and Conditional Gradient variants
License: MIT License
Add a function to verify user-provided gradients against numerical ones...
Is there a way to ensure that when we call the convert method on the Birkhoff LMO to MOI, it stays a matrix?
using FrankWolfe
using Random
import GLPK
n = 5  # assumed: n was defined elsewhere; a small value for illustration
# initial direction for first vertex
direction_vec = Vector{Float64}(undef, n * n)
randn!(direction_vec)
direction_mat = reshape(direction_vec, n, n)
# takes a matrix and returns a matrix
lmo = FrankWolfe.BirkhoffPolytopeLMO()
x00 = FrankWolfe.compute_extreme_point(lmo, direction_mat)
# switch to the GLPK-backed MOI variant
o = GLPK.Optimizer()
# takes a vector and returns a vector
lmo = FrankWolfe.convert_mathopt(lmo, o, dimension=n)
x00 = FrankWolfe.compute_extreme_point(lmo, direction_vec)
It would be better if it stayed with matrices so that we can simply drop-in replace it. This could probably be done right at the beginning of convert and right before the return.
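A minimal sketch of the idea (hypothetical wrapper, not the package API): reshape to a vector right at the beginning and reshape the vertex back to a matrix right before returning, so the MOI-backed LMO stays a drop-in replacement for the matrix-based one.

struct MatrixShapeLMO{LMO}
    inner::LMO  # vector-based LMO, e.g. the result of convert_mathopt
    n::Int
end
function compute_extreme_point_matrix(lmo::MatrixShapeLMO, direction::AbstractMatrix)
    v = FrankWolfe.compute_extreme_point(lmo.inner, vec(direction))  # flatten at the boundary
    return reshape(v, lmo.n, lmo.n)                                  # back to a matrix before returning
end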
Lazified Conditional Gradients (Frank-Wolfe + Lazification).
EMPHASIS: memory STEPSIZE: adaptive EPSILON: 1.0e-9 max_iteration: 1000 PHIFACTOR: 2 TYPE: Float64
cache_size Inf GREEDYCACHE: false
WARNING: In memory emphasis mode iterates are written back into x0!
─────────────────────────────────────────────────────────────────────────────────────────────────
Type Iteration Primal Dual Dual Gap Time Cache Size
─────────────────────────────────────────────────────────────────────────────────────────────────
ERROR: LoadError: MethodError: Cannot `convert` an object of type
FrankWolfe.RankOneMatrix{Float64,Array{Float64,1},Array{Float64,1}} to an object of type
AbstractArray{T,1} where T
Closest candidates are:
convert(::Type{T}, ::T) where T<:AbstractArray at abstractarray.jl:14
convert(::Type{T}, ::Factorization) where T<:AbstractArray at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:55
convert(::Type{T}, ::T) where T at essentials.jl:171
Stacktrace:
[1] push!(::Array{AbstractArray{T,1} where T,1}, ::FrankWolfe.RankOneMatrix{Float64,Array{Float64,1},Array{Float64,1}}) at ./array.jl:934
[2] compute_extreme_point(::FrankWolfe.VectorCacheLMO{FrankWolfe.NuclearNormLMO{Float64},AbstractArray{T,1} where T}, ::Array{Float64,2}; threshold::Float64, store_cache::Bool, greedy::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/spokutta/Code/fwjulia/FrankWolfe.jl/src/oracles.jl:210
[3] lcg(::typeof(f), ::typeof(grad!), ::FrankWolfe.NuclearNormLMO{Float64}, ::FrankWolfe.RankOneMatrix{Float64,Array{Float64,1},Array{Float64,1}}; line_search::FrankWolfe.LineSearchMethod, L::Int64, phiFactor::Int64, cache_size::Float64, greedy_lazy::Bool, epsilon::Float64, max_iteration::Int64, print_iter::Float64, trajectory::Bool, verbose::Bool, linesearch_tol::Float64, emphasis::FrankWolfe.Emphasis, gradient::Nothing) at /home/spokutta/Code/fwjulia/FrankWolfe.jl/src/FrankWolfe.jl:359
[4] top-level scope at /home/spokutta/Code/fwjulia/FrankWolfe.jl/examples/movielens.jl:96
in expression starting at /home/spokutta/Code/fwjulia/FrankWolfe.jl/examples/movielens.jl:96
Think about whether to improve the active-set memory consumption with a memory mode, i.e., for the argmin oracle.
Note: the AFW algorithm is more expensive due to the active set anyway, so we could potentially leave this as is. Decide how necessary it is once we have BCG.
Adjust symbol names to follow the Julia style guide: https://docs.julialang.org/en/v1/manual/style-guide/
Currently many functions are camelCase but should use snake_case (underscores) instead.
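For instance (illustrative, based on names appearing on this page):
benchmarkOracles -> benchmark_oracles
maxIt -> max_iteration
stepSize -> step_size
printIt -> print_iter
dualGap -> dual_gap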
import FrankWolfe
import LinearAlgebra
n = Int(1e3)
k = 10000
xpi = rand(n);
total = sum(xpi);
const xp = xpi ./ total;
f(x) = LinearAlgebra.norm(x-xp)^2
grad(x) = 2 * (x-xp)
lmo = FrankWolfe.UnitSimplexOracle(1.0);
x0 = FrankWolfe.compute_extreme_point(lmo, rand(n))
FrankWolfe.benchmark_oracles(f, grad, lmo, n; k=100, T=Float64)
@time x, v, primal, dualGap, trajectoryM = FrankWolfe.fw(f, grad, lmo, x0, maxIt=k,
    stepSize=FrankWolfe.shortstep, L=2, printIt=k/10, emph=FrankWolfe.blas, verbose=true, trajectory=true, momentum=0.9);
@time x, v, primal, dualGap, trajectoryM = FrankWolfe.fw(f, grad, lmo, x0, maxIt=k,
    stepSize=FrankWolfe.shortstep, L=2, printIt=k/10, emph=FrankWolfe.memory, verbose=true, trajectory=true, momentum=0.9);
The first one works; the second one blows up.
Currently we cannot broadcast with the new sparse types, which somewhat offsets their power.
Have a wrapper function for the different step sizes and strategies, so that we do not have to copy the same code block everywhere.
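A minimal sketch of such a wrapper (hypothetical types and names; the two rules shown are the standard agnostic and short-step ones, with d = x - v):

import LinearAlgebra
abstract type StepSizeRule end
struct Agnostic <: StepSizeRule end
struct ShortStep <: StepSizeRule
    L::Float64  # Lipschitz constant of the gradient
end
# open-loop rule: gamma_t = 2 / (t + 2)
compute_step_size(::Agnostic, t, gradient, d) = 2 / (t + 2)
# short step: gamma_t = <gradient, d> / (L * ||d||^2), clipped to [0, 1]
compute_step_size(rule::ShortStep, t, gradient, d) =
    clamp(LinearAlgebra.dot(gradient, d) / (rule.L * LinearAlgebra.dot(d, d)), 0, 1)

Every algorithm then calls compute_step_size instead of carrying its own copy of the logic.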
simple routine to numerically verify gradients
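A minimal sketch of such a routine (hypothetical name and signature), comparing a user-provided gradient against central finite differences:

function check_gradient(f, grad, x; h=1e-6, atol=1e-4)
    g = grad(x)
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h
        g_num = (f(x + e) - f(x - e)) / (2h)  # central difference for the i-th partial
        abs(g[i] - g_num) <= atol || return false
    end
    return true
end
check_gradient(x -> sum(abs2, x), x -> 2x, rand(5))  # true for a correct gradient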
The problem seems to be the n that we pass, which is used for allocating the vectors. Either we could have a function with a different signature, or we could pass the shape, etc. Suggestions welcome.
Example that fails:
FrankWolfe.benchmark_oracles(f, (str, x) -> grad!(str, x), lmo, n; k=100, T=Float64)
added to movielens.jl
This will allow us to access the predefined sets and projections from MathOptSetDistances.jl.
From a user's point of view, this means one could potentially use the algorithms from the package through the https://jump.dev interface.
As discussed, we need an implementation of vanilla FW which maintains the current iterate as a set of atoms and their convex combination.
This could make the nuclear norm problems more stable by explicitly representing a low-rank matrix as a weighted sum of rank-1 matrices, while the current result has a lot of near-zero singular values.
see discussion at the end of #57
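A minimal sketch of the representation (hypothetical types, not the package API):

struct AtomicIterate{AT}
    weights::Vector{Float64}  # convex weights: nonnegative and summing to 1
    atoms::Vector{AT}         # vertices returned by the LMO, e.g. rank-1 matrices
end
# materialize x = sum_i weights[i] * atoms[i] only when a dense iterate is needed
materialize(x::AtomicIterate) = sum(w * a for (w, a) in zip(x.weights, x.atoms))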
lmo = FrankWolfe.LpNormLMO{Float64, 2}(1.0)
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n))
This results in a NaN answer, presumably because the extreme point of the L2 ball normalizes by the norm of the direction, which is zero here.
carry over the momentum computation from FW to lazified FW
We need both functions to rescale according to the size of the batch so that in expectation we obtain the exact gradient. Right now it seems that we are off by some batch normalization factor in the stochastic case.
This requires discussion; maybe I am missing something.
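For reference, the scaling that makes the minibatch estimator unbiased (a sketch with hypothetical names, assuming the full gradient is the average over all N data points):

# full gradient: (1/N) * sum_i grad_i(θ); for a uniformly drawn batch B,
# E[(1/|B|) * sum_{i in B} grad_i(θ)] = (1/N) * sum_i grad_i(θ),
# so the batch sum must be divided by the batch size, not by N.
batched_gradient(grad_i, θ, batch) = sum(grad_i(θ, i) for i in batch) / length(batch)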
Make FrankWolfe.jl accessible from Python.
When taking a step towards a FW vertex or an away vertex, we need to update the active set. This requires finding whether the vertex exists in the active set. Right now we loop through the active set to check if any vertex in it is equal to the vertex we want to add. There are several occasions on which this is not needed.
Finding the vertex in the convex decomposition can be very costly. An easy fix would be to give active_set_update! an optional index argument, which can be used to update the convex decomposition more quickly in the two cases above. This will likely yield a much bigger improvement if the optimal face is relatively sparse and the active set already contains most of these vertices.
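A sketch of the proposed fix (hypothetical signature, using a weights-plus-atoms representation like the one sketched above): the caller passes the index whenever it already knows it, skipping the linear scan.

function active_set_update!(weights, atoms, lambda, atom; index=nothing)
    idx = index === nothing ? findfirst(a -> a == atom, atoms) : index
    weights .*= (1 - lambda)   # rescale the existing convex weights
    if idx === nothing
        push!(atoms, atom)     # vertex not present: enters the active set
        push!(weights, lambda)
    else
        weights[idx] += lambda # vertex already present: only gains weight
    end
    return weights, atoms
end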
BCG on the movielens example has only two points in the beginning and simply cycles back and forth. Why?
Some of the step sizes require relatively expensive computations. It would be good to make them @emphasis-aware as well and optimize them here and there.
After refactoring AFW and BCG, we lost the memory awareness due to the subroutines. We should restore this. It is not that critical because both are hard on memory anyway, but it will still impact the speed of the iterations, in particular if we do not need to call the LMO.
Most methods consist of a setup part (part 1) and an iteration loop (part 2).
Part 2 could look roughly the same in many algorithms, with the difference being what happens inside each iteration.
For this, an iteration interface could be nice; see the sketch after the resources below. This also lets users debug, inspect, and so on, without us having to anticipate everything they might want to inspect at each iteration. The top-level functions can be kept as-is, but users with high-performance requirements would just do the setup and iteration parts themselves, without allocating or logging.
Resources:
This discussion mentions the approach in Manopt.jl:
https://discourse.julialang.org/t/ann-optimkit-jl-a-blissfully-ignorant-julia-package-for-gradient-optimization/41063
A blog post describes the iterator approach to solve a linear system:
https://lostella.github.io/2018/07/25/iterative-methods-done-right.html
This is not high-priority, but it can make FW more flexible and let users decide what they want to log and how.
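A minimal sketch of what the iteration interface could look like (hypothetical names, not the package API), in the spirit of the posts above:

struct FWIterable{F,G,LMO,XT}
    f::F
    grad!::G
    lmo::LMO
    x0::XT
end
Base.IteratorSize(::Type{<:FWIterable}) = Base.IsInfinite()
function Base.iterate(it::FWIterable, state=(it.x0, 0))
    x, t = state
    gradient = similar(x)
    it.grad!(gradient, x)
    v = FrankWolfe.compute_extreme_point(it.lmo, gradient)
    x_new = x + 2 / (t + 2) * (v - x)  # agnostic step size
    return (x_new, (x_new, t + 1))
end
# the user drives the loop and decides what to log:
# for (t, x) in enumerate(Iterators.take(FWIterable(f, grad!, lmo, x0), 100))
#     t % 10 == 0 && @info "iteration $t" f(x)
# end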
Add examples and check adaptability of the code base to non-trivial atoms.
Easy ones:
More exotic:
E.g., using the stdlib Logger (https://docs.julialang.org/en/v1/stdlib/Logging/) will probably be a good start to log the progression of the algorithm, with users optionally passing a logger to the function to print the log to a file.
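A minimal sketch using only the stdlib Logging API (the assumption here is that the algorithm emits its progress through @info, so a user-supplied logger can redirect it):

using Logging
io = open("fw_progress.log", "w")
with_logger(SimpleLogger(io)) do
    # inside the algorithm, progress would be emitted like this:
    @info "iteration" t=1 primal=0.5 dual_gap=0.1
end
close(io)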
https://github.com/JuliaTesting/ReferenceTests.jl
This could be interesting for things that are hard to test. Most algorithms are not random, so they could be tested against reference outputs to check, for instance, that a change did not alter the per-iteration evolution.
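E.g., a sketch (assuming the trajectory from the runs above is serialized to text; the reference path is hypothetical):

using Test, ReferenceTests
# compares against the stored file and creates it on the first run
@test_reference "references/fw_trajectory.txt" string(trajectoryM)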
A (non-minimal) example:
using FrankWolfe
using LinearAlgebra
using ReverseDiff;
n = Int(1e3);
k = 1000
xpi = rand(n);
total = sum(xpi);
const xp = xpi ./ total;
f(x) = 2 * LinearAlgebra.norm(x - xp)^3 - LinearAlgebra.norm(x)^2
const grad = x -> ReverseDiff.gradient(f, x)
# pick feasible region
lmo = FrankWolfe.ProbabilitySimplexOracle(1);
# compute some initial vertex
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n));
# benchmarking Oracles
FrankWolfe.benchmarkOracles(f, grad, lmo, n; k=100, T=Float64)
# memory variant
@time x, v, primal, dualGap, trajectory = FrankWolfe.fw(f, grad, lmo, x0, maxIt=k,
    stepSize=FrankWolfe.nonconvex, printIt=k/10, emph=FrankWolfe.memory, verbose=true);
# blas variant ## works only with casting at the moment
# x0 = convert(Vector{promote_type(eltype(x0), Float64)}, x0)
@time x, v, primal, dualGap, trajectory = FrankWolfe.fw(f, grad, lmo, x0, maxIt=k,
    stepSize=FrankWolfe.nonconvex, printIt=k/10, emph=FrankWolfe.blas, verbose=true);
Things to add
needs to be documented properly
Test that the algorithms work with:
For a generic function F(x), one needs to evaluate the gradient at a point, grad(x), as a dense vector, and then perform operations on it, mostly passing it to compute_extreme_point. Could there be an interface to avoid this?
One solution: grad: x -> V{T} is any function-like object that, given a point, returns a vector-like object (it does not have to be a fully materialized vector).
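A minimal sketch of such a function-like object (hypothetical types), here for the gradient 2(x - xp) from the examples above; entries are computed on demand, and the generic AbstractVector fallbacks provide iteration and dot products:

struct LazyGradient{T} <: AbstractVector{T}
    x::Vector{T}
    xp::Vector{T}
end
Base.size(g::LazyGradient) = size(g.x)
Base.getindex(g::LazyGradient, i::Int) = 2 * (g.x[i] - g.xp[i])  # entry on demand
grad = x -> LazyGradient(x, xp)  # "function-like object" returning a vector-like object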
Benchmarking svdl suggests that it needs about 0.5 s per call for the movielens example. This seems way too high. @alejandro-carderera, can you provide some numbers from the Python code for the survey in terms of running time, so that we can compare?
Bring in the lazification from BCG or LCG to AFW so that we can be more efficient for the "localized steps".
Looking at https://github.com/ZIB-IOL/FrankWolfe.jl/blob/master/src/utils.jl#L9, the line search is not doing the same thing: we are not updating gamma in the iteration, only M, so the returned gamma is the initial one.
The algorithms should compute dual prices at near-optimal solutions.
The numerical instabilities seem to come from the gradient being projected onto the probability simplex of the current active set, yielding very low coefficients and high accumulated error.
One could take a few vertices and do a descent step only on those. It needs vertices with both positive and negative dot products with the gradient, since some will be reduced and some increased.
This should especially help BCG in Float64 at large scale, as in the example.
With the current SFW interface, users provide a function that processes one data point; batching happens a level higher, when we call the provided functions.
One possibility would be to make users provide batched functions by default:
f_batched(θ, xs) = sum(f(θ, x_i) for x_i in xs)
g_batched(θ, xs) = sum(g(θ, x_i) for x_i in xs)
What they provide now is the equivalent of the functions f and g above.
There is an example, but some tests should be added so that we can validate changes when we run the tests. This could be verifying the quality of solutions on simple instances:
Permutahedron
Birkhoff Polytope
Nuclear Norm Ball