Comments (13)
Alright I've had some time to debug now.
How tied are you to the particular L-BFGS implementation in the codebase?
In order to figure out whether the issue is that the Lovász theta SDP is intrinsically difficult for LR-SDP, or whether the issue is with the mlpack-specific code, I reproduced another implementation in Python. Here's the gist: https://gist.github.com/anonymous/07956ad4465ff2b96b4c. Using the same initialization points and penalty update strategy suggested by BM03, the Python version gets the right answer of -14, whereas the mlpack implementation (without sparse matrices, but with the off-by-one bug fixed) incorrectly converges towards essentially R = 0.
From what I can tell (checking the numeric outputs), the gradient/Lagrangian computations are the same. The major difference is the L-BFGS method: the Python one uses scipy's `fmin_l_bfgs_b`, which is a wrapper around the popular L-BFGS-B (http://users.iems.northwestern.edu/~nocedal/lbfgsb.html). I can't help but wonder if swapping the solver here would make these problems go away.
But of course new library dependencies should not be taken lightly. Another alternative might be Google's Ceres library (http://ceres-solver.org/gradient_solver.html). I have not used it before, but it looks promising.
Thoughts?
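For concreteness, here is a minimal pure-Python sketch of the BM03-style augmented Lagrangian outer loop with a pluggable inner solver, which is the piece under discussion; plain gradient descent stands in for L-BFGS, and the toy problem (minimize x² + y² subject to x + y = 1) and all names are illustrative, not mlpack's actual API:

```python
# Toy augmented-Lagrangian loop in the BM03 style: minimize f(x) subject to
# c(x) = 0, with a swappable unconstrained inner solver (the piece this
# discussion is about replacing). The problem and names are illustrative.

def f(x):                      # objective: x0^2 + x1^2
    return x[0]**2 + x[1]**2

def c(x):                      # single equality constraint: x0 + x1 - 1 = 0
    return x[0] + x[1] - 1.0

def grad_aug_lagrangian(x, lam, sigma):
    # gradient of f(x) - lam*c(x) + (sigma/2)*c(x)^2
    common = -lam + sigma * c(x)
    return [2 * x[0] + common, 2 * x[1] + common]

def gradient_descent(x, grad, iters=2000, step=1e-2):
    # stand-in inner solver; this is where L-BFGS / L-BFGS-B would plug in
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def solve(x, inner=gradient_descent, lam=0.0, sigma=10.0, outer=20):
    for _ in range(outer):
        x = inner(x, lambda y: grad_aug_lagrangian(y, lam, sigma))
        lam -= sigma * c(x)    # multiplier update after each inner solve
        # (a full implementation would also grow sigma based on progress)
    return x

x = solve([0.0, 0.0])
print(x)  # approaches [0.5, 0.5], the constrained minimum
```

Swapping `gradient_descent` for a different `inner` callable is the structural analogue of swapping the L-BFGS implementation behind LRSDP.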
from mlpack.
Alternatively, we could get rid of the modes entirely and just use sparse matrices in place.
The modes are actually just a workaround from a long time ago to deal with the fact that, back then, Armadillo didn't have sparse matrices. So if `aModes[index] == 1`, then the constraint matrix `a[index]` is assumed to have three rows and `n_nonzero` columns (basically, a sparse matrix stored in coordinate form).
But you are right that it is far better to actually use `arma::sp_mat`, and I think the reason this didn't happen is that this code hasn't been revisited since sparse matrices were added to Armadillo. I think that in this case option a) is probably best, but if we're storing separate vectors for sparse and dense constraint matrices, there's no reason to store `ai_modes`. We can just say 'sparse constraints are ordered before dense constraints' or something like that, so the index of the first dense constraint is `sparse_ais.size()`. Does that make sense?
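A pure-Python sketch of the storage scheme and indexing convention described above (the names and the tiny example data are illustrative, not the actual mlpack code):

```python
# Each sparse constraint is kept in 3 x n_nonzero coordinate form
# [rows; cols; values]; dense constraints are full 2-D arrays; and sparse
# constraints are ordered before dense ones, so the index of the first
# dense constraint is simply len(sparse_ais).

def coord_to_dense(coord, n):
    """Expand a [rows, cols, values] coordinate triple to an n x n matrix."""
    dense = [[0.0] * n for _ in range(n)]
    rows, cols, vals = coord
    for r, c, v in zip(rows, cols, vals):
        dense[r][c] = v
    return dense

n = 3
# One sparse constraint: entries (0,1) = 1 and (1,0) = 1 (an edge constraint).
sparse_ais = [[[0, 1], [1, 0], [1.0, 1.0]]]
# One dense constraint: the identity (a trace constraint).
dense_ais = [[[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]]

def constraint(index):
    """Look up constraint `index` using the sparse-before-dense convention."""
    if index < len(sparse_ais):
        return coord_to_dense(sparse_ais[index], n)
    return dense_ais[index - len(sparse_ais)]

print(constraint(0))  # the expanded sparse (edge) constraint
print(constraint(1))  # the dense identity constraint
```

With this convention no per-constraint mode flag is needed: the index alone tells you which vector to look in.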
Also, if you are looking for a home for your collection of SDP solvers, maybe mlpack could be a place, if you were interested. There are lots of things we could build on top of reliable, fast SDP solvers. :)
Alright here's my first pass at the implementation: stephentu@186ffaa
It doesn't quite work yet: it is taking forever on the Lovász theta SDP test case (though it does reach the right objective value), so I need to look into why the optimization is not terminating. For now, I'm mainly looking for feedback on the API.
I'd do this in pull request form, but the git history got all mangled again, so I'm going to wait a bit until that gets sorted out.
Looks good to me. We should also update the documentation to clarify the difference between the sparse and dense As, bs, and Cs, but that's easy enough to do. I made a few comments on the commit, by the way. One other very minor thing for code consistency with the rest of mlpack: avoid underscores in names (e.g. `sparseA` instead of `a_sparse`, which then has accessor and mutator `SparseA()`).
I'd be interested to see how the speed compares. It should be about the same, maybe a little faster since there's no need to branch on the values of `aModes` anymore.
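To illustrate the branch being removed, here is a pure-Python sketch of the two layouts: a mixed constraint list with a parallel modes array versus pre-split sparse/dense lists. The data and helper names are made up; the point is only that the split layout computes the same quantity without the per-constraint branch:

```python
# Both layouts compute sum_i <A_i, X> over a mix of coordinate-form and
# dense constraint matrices. Data below is illustrative.

n = 4
X = [[float(i + j) for j in range(n)] for i in range(n)]

# Layout 1: one mixed list plus a parallel modes array (1 = coordinate form).
ais = [
    [[0, 1], [1, 0], [2.0, 2.0]],                                    # sparse
    [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)],  # dense
]
a_modes = [1, 0]

def inner_dense(A, X):
    return sum(A[i][j] * X[i][j] for i in range(n) for j in range(n))

def inner_coord(A, X):
    rows, cols, vals = A
    return sum(v * X[r][c] for r, c, v in zip(rows, cols, vals))

def total_branching(ais, a_modes, X):
    # branches on the mode flag for every constraint
    return sum(inner_coord(A, X) if m == 1 else inner_dense(A, X)
               for A, m in zip(ais, a_modes))

# Layout 2: split lists, no mode flag needed.
sparse_ais = [ais[0]]
dense_ais = [ais[1]]

def total_split(sparse_ais, dense_ais, X):
    return (sum(inner_coord(A, X) for A in sparse_ais)
            + sum(inner_dense(A, X) for A in dense_ais))

print(total_branching(ais, a_modes, X), total_split(sparse_ais, dense_ais, X))
```

Any speed difference would have to be measured, of course; the branch itself is cheap, so the gain is mostly in code clarity.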
I've got one more big mangling of the commit history to do, probably later today, then I won't do it again (at least for a while), I promise. :)
How confident are you that the optimal value of johnson8-4-4 is actually -14? I realize that the BM paper says they got -14, but Table 5 of BM also says johnson8-4-4 has |V|=2048, |E|=56320, whereas the actual `johnson8-4-4.csv` in the repo contains |V|=70, |E|=560.
I bring this up because, in the course of debugging, I realized that the original test case (before my changes) is actually missing one constraint: the size of the B and Lambda vectors should be `edges.n_cols + 1`, not `edges.n_cols`. With that change, the optimization now converges to -48. I'm going to use Mosek to solve this SDP and see what Mosek gets.
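For reference, the "+ 1" comes from the standard Lovász theta SDP having one constraint per edge plus a single trace constraint. A small illustrative sketch (toy triangle graph; names are assumptions, not mlpack's):

```python
# The Lovász theta SDP
#   max <J, X>  s.t.  tr(X) = 1,  X_ij = 0 for every edge (i, j),
# has one constraint per edge plus the trace constraint, hence
# edges.n_cols + 1 total. Tiny illustrative graph: a triangle on 3 vertices.
n = 3
edges = [(0, 1), (1, 2), (0, 2)]

constraints = []  # list of (A_i, b_i) pairs
# trace constraint: <I, X> = 1
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
constraints.append((identity, 1.0))
# one constraint per edge: <E_ij + E_ji, X> = 0, i.e. X_ij = 0
for i, j in edges:
    a = [[0.0] * n for _ in range(n)]
    a[i][j] = a[j][i] = 1.0
    constraints.append((a, 0.0))

print(len(constraints))  # 4 = |E| + 1; B and Lambda must match this length
```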
Alright, Mosek agrees with -14. Gist: https://gist.github.com/anonymous/d639bdf3b576d734040d
Time to track down the bug.
Ah, okay. I wouldn't have been completely surprised if my test case was wrong. I don't have an explanation for the disparity between the BM paper and the size of the dataset in the repo, and consulting commit logs and personal emails doesn't provide a solution either. Let me know if you want any help tracking down the bug.
Cool -- sorry I've been a bit slow to track this down; I'm actually on vacation now, so it'll take me a while to find the cycles to debug.
To reiterate what you said (correct me if I've misunderstood): it seems that the minor changes you've made cause LRSDP not to converge, apparently because L-BFGS finds a stationary point at R = 0. I'm digging pretty far into my memory banks here, but I seem to remember that the LRSDP gradient is defined in such a way that there does exist a stationary point at R = 0. I also recall an email to Sam Burer in which I described the phenomenon (which at the time was occurring when I was trying to do MVU via LRSDP); he agreed that such a thing could happen, but mentioned that he had not put much thought into it, nor had he encountered it in his simulations.
I think at this point that we are looking at two separate issues inside of this one issue; please correct me if you think otherwise:
- The LRSDP code has been successfully adapted to use arma::mat and arma::sp_mat in stephentu/mlpack@186ffaa
- The Johnson844LovaszThetaSDP test fails to converge once you have added the last constraint.
So I would say that if the code you have in your branch passes the test before you added the constraint, let's merge it in---I'm satisfied that it works (especially since the changes are so straightforward). Then I would say this issue is resolved (and we could either rename it or open a new one or what, it doesn't matter to me).
For the test failing to converge, I think the analysis is that our optimizer is unfortunately finding the stationary point R = 0. I'm not confident enough to conclude that, given that I'm just going off my memory and what you've told me, but it seems at least somewhat likely. One solution, which you've suggested, is a different optimizer. mlpack has a simulated annealing implementation and an SGD implementation, but SA is super slow and SGD requires a decomposable function, so we can't use either out of the box. I'm hesitant about the dependencies you've suggested: scipy would bring in a giant mess of Python dependencies; liblbfgs doesn't appear to be heavily maintained, and I don't know about its portability; Ceres doesn't seem to be available in the Debian or Fedora repositories; and how well any of these would work with the framework mlpack already has in place is not (yet) clear.
If you think L-BFGS-B is necessary to solve this problem, I'll implement it (unless you want to); I should be able to reuse the existing L-BFGS code or simply templatize it further. I am going to spend a little time tweaking the L-BFGS parameters with the updated test a little bit to see if that helps, and I'll let you know what I find in the next day or two.
I think mlpack's set of optimizers is a little lacking at the moment and I'm happy to try and set aside some time to implement other algorithms.
Last thing before I finish rambling: is it possible to use plain old L-BFGS in place of L-BFGS-B in your gist? If so, and if that converges okay, then you may have exposed a bug in L-BFGS (which I'd be happy to hunt down, if such a thing exists).
You are right about R=0 as a stationary point; the gradient is indeed proportional to R. However, I honestly hadn't encountered this problem either when running max-cut and matrix completion instances recently.
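This is easy to check numerically: in the standard-form augmented Lagrangian used by LRSDP, every term depends on R only through R Rᵀ, so the gradient is a matrix times R and vanishes at R = 0. A finite-difference check on a made-up 2x2 instance (all data below is illustrative):

```python
# Numeric check that R = 0 is a stationary point of the LRSDP augmented
# Lagrangian
#   L(R) = <C, R R^T> - sum_i lam_i (<A_i, R R^T> - b_i)
#          + (sigma/2) * sum_i (<A_i, R R^T> - b_i)^2
# on a toy 2x2 instance with one (trace) constraint.
import itertools

n, k = 2, 2
C = [[1.0, 2.0], [2.0, -1.0]]
As = [[[1.0, 0.0], [0.0, 1.0]]]  # one constraint matrix: the identity
bs = [1.0]
lams = [0.5]
sigma = 10.0

def inner(M, R):
    # <M, R R^T> = sum_{i,j} M[i][j] * (R R^T)[i][j]
    return sum(M[i][j] * sum(R[i][a] * R[j][a] for a in range(k))
               for i in range(n) for j in range(n))

def L(R):
    val = inner(C, R)
    for A, b, lam in zip(As, bs, lams):
        viol = inner(A, R) - b
        val += -lam * viol + 0.5 * sigma * viol * viol
    return val

# central finite differences at R = 0
R0 = [[0.0] * k for _ in range(n)]
h = 1e-5
grad_norm = 0.0
for i, a in itertools.product(range(n), range(k)):
    Rp = [row[:] for row in R0]; Rp[i][a] += h
    Rm = [row[:] for row in R0]; Rm[i][a] -= h
    grad_norm += abs((L(Rp) - L(Rm)) / (2 * h))

print(grad_norm)  # vanishes (up to rounding): R = 0 is a stationary point
```

Since L is even in R near zero (every term is quadratic or quartic in the entries of R), the finite differences cancel, confirming the stationary point regardless of whether it is a minimum.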
Your summary of the two outstanding issues is correct. I will do two things:
- I will do some git-fu to get my patch, with the last constraint omitted from the test case, mergeable against the current master branch.
- I will then close this issue and open a new one to deal with the second issue you described.
I agree with your assessment of the dependencies. I was, however, thinking of doing the following. We copy one file, `lbfgsb.f` (https://github.com/trsaunders/L-BFGS-B-Wrapper/blob/master/lbfgsb.f), into the mlpack codebase. We then maintain a copy of a C++ wrapper around `setulb`, shown here (https://github.com/trsaunders/L-BFGS-B-Wrapper/blob/master/lbfgsb_mex.cpp), gutting out all the MEX cruft and making it fit into the existing `L_BFGS` API. The wrapper code is fairly straightforward. There are no external library dependencies, other than requiring a Fortran compiler to build the project (I think this is a reasonable requirement for compiling scientific/statistical software).
I like this option more than re-implementing some variant of L-BFGS, since this code is pretty much the reference implementation, so we can have reasonable confidence that it works robustly. I think robustness is important because the issue with hand-rolled optimization routines is often that they work great on your test problems, but then somebody else comes along, their problem hits some corner case, and you're left scratching your head wondering whether the bug is in the higher-level code that calls the optimization routine or in the routine itself (as in the current situation). There's a reason the scipy people go through such glue-code hell to call into these Fortran routines with obscure APIs 😄 What do you think of this option?
Your faith in the correctness of reference implementations is greater than mine (having found bugs in "bulletproof" reference implementations before after months of digging and testing). :)
The FORTRAN compiler situation isn't bad and I wouldn't be too incredibly opposed to it as a dependency but honestly I'd be perfectly happy to sit down, write, and test L-BFGS-B in order to avoid the additional dependency. I really don't like wrapping code in that way and I'll go to pretty big lengths to avoid it. So if you're fine with me providing an implementation of L-BFGS-B, I can have it done probably in about a week or so. Let me know what you think.
I'm closing this issue to continue the discussion about the optimizer in a new issue.