
Comments (13)

stephentu commented on May 22, 2024

Alright I've had some time to debug now.

How tied are you to the particular L-BFGS implementation in the codebase?

In order to figure out whether the Lovász theta SDP is intrinsically difficult for LR-SDP, or whether the issue is in the mlpack-specific code, I wrote another implementation in Python. Here's the gist: https://gist.github.com/anonymous/07956ad4465ff2b96b4c. Using the same initialization points and penalty update strategy suggested by BM03, the Python version gets the right answer of -14, whereas the mlpack implementation (without sparse matrices, but with the off-by-one bug fixed) incorrectly converges toward essentially R = 0.

From what I can tell (checking the numeric outputs), the gradient/Lagrangian computations are the same. The major difference is the L-BFGS implementation: the Python version uses scipy's fmin_l_bfgs_b, which is a wrapper around the popular L-BFGS-B code (http://users.iems.northwestern.edu/~nocedal/lbfgsb.html). I can't help but wonder if swapping the solver here would make these problems go away.
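For concreteness, this is roughly the quantity both implementations evaluate, written with Armadillo types. It's just a sketch with hypothetical names, not the actual mlpack or gist code:

```cpp
// Sketch of the LR-SDP augmented Lagrangian
//   L(R) = tr(C R R^T) - sum_i y_i (tr(A_i R R^T) - b_i)
//        + (sigma / 2) sum_i (tr(A_i R R^T) - b_i)^2
// Hypothetical names; not the actual mlpack or gist code.
#include <armadillo>
#include <vector>

double AugmentedLagrangian(const arma::mat& R,
                           const arma::mat& C,
                           const std::vector<arma::mat>& A,
                           const arma::vec& b,
                           const arma::vec& y,
                           const double sigma)
{
  const arma::mat X = R * R.t();      // low-rank factorization X = R R^T
  double value = arma::trace(C * X);  // objective term
  for (size_t i = 0; i < A.size(); ++i)
  {
    // Constraint violation tr(A_i X) - b_i.
    const double viol = arma::trace(A[i] * X) - b(i);
    value += -y(i) * viol + 0.5 * sigma * viol * viol;
  }
  return value;
}
```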

But of course new library dependencies should not be taken lightly. Another alternative might be to use google's ceres library (http://ceres-solver.org/gradient_solver.html). I have not used this before, but it looks promising.

Thoughts?

stephentu commented on May 22, 2024

Alternatively, we could get rid of the modes entirely and just use sparse matrices in their place.

rcurtin commented on May 22, 2024

The modes are actually just a workaround from a long time ago to deal with the fact that back then, Armadillo didn't have sparse matrices. So if aModes[index] == 1 then the constraint matrix a[index] is assumed to have three rows and n_nonzero columns (basically, a sparse matrix stored in coordinate form).

But you are right that it is far better to actually use arma::sp_mat; I think the reason this didn't happen is that this code hasn't been revisited since sparse matrices were added to Armadillo. I think that in this case option a) is probably best, but if we're storing separate vectors for sparse and dense constraint matrices, there's no reason to store ai_modes. We can just say 'sparse constraints are ordered before dense constraints' or something like that, so the index of the first dense constraint is sparse_ais.size(). Does that make sense?
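A rough sketch of what I mean (hypothetical names, not a final API):

```cpp
// Sketch of the proposed constraint storage: sparse constraints come first,
// then dense ones, so the per-constraint mode flag goes away entirely.
// Hypothetical names; not a final API.
#include <armadillo>
#include <vector>

class LRSDPConstraints
{
 public:
  // Constraint i is sparse if i < sparseA.size(); otherwise it is dense.
  double EvaluateConstraint(const size_t i, const arma::mat& X) const
  {
    if (i < sparseA.size())
    {
      // Sparse-times-dense multiplication yields a dense result.
      const arma::mat prod = sparseA[i] * X;
      return arma::trace(prod);
    }
    return arma::trace(denseA[i - sparseA.size()] * X);
  }

 private:
  std::vector<arma::sp_mat> sparseA;  // sparse constraint matrices A_i
  std::vector<arma::mat> denseA;      // dense constraint matrices A_i
};
```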

Also, if you are looking for a home for your collection of SDP solvers, maybe mlpack could be a place, if you were interested. There are lots of things we could build on top of reliable, fast SDP solvers. :)

stephentu commented on May 22, 2024

Alright here's my first pass at the implementation: stephentu@186ffaa

It doesn't quite work yet: it's taking forever on the Lovász theta SDP test case (though it does reach the right objective value), so I need to look into why the optimization is not terminating. For now, I'm mostly looking for feedback on the API.

I'd do this in pull request form, but the git history got all mangled again, so I'm going to wait a bit until that gets sorted out.

rcurtin commented on May 22, 2024

Looks good to me. We should also update the documentation to clarify the difference between the sparse and dense As, bs, and Cs, but that's easy enough to do. I made a few comments on the commit, by the way. One other very minor thing for code consistency with the rest of mlpack: avoid underscores in names (e.g. sparseA instead of a_sparse, which then has accessor and mutator SparseA()).
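To illustrate the convention (the member type here is just an example, not a prescription):

```cpp
// Illustration of the naming convention only; the member type is an example.
#include <armadillo>
#include <vector>

class LRSDP
{
 public:
  //! Access the sparse constraint matrices.
  const std::vector<arma::sp_mat>& SparseA() const { return sparseA; }
  //! Modify the sparse constraint matrices.
  std::vector<arma::sp_mat>& SparseA() { return sparseA; }

 private:
  //! Sparse constraint matrices (camelCase, no underscores).
  std::vector<arma::sp_mat> sparseA;
};
```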

I'd be interested to see how the speed compares. It should be about the same, maybe a little faster since there's no need to branch on the values of aModes anymore.

I've got one more big mangling of the commit history to do, probably later today, then I won't do it again (at least for a while), I promise. :)

stephentu commented on May 22, 2024

How confident are you that the optimal value for johnson8-4-4 is actually -14? I realize that the BM paper says they got -14, but Table 5 of BM also says johnson8-4-4 has |V| = 2048, |E| = 56320, whereas the actual johnson8-4-4.csv in the repo contains |V| = 70, |E| = 560.

I bring this up because, in the course of debugging, I realized that the original test case (before my changes) is actually missing one constraint (the sizes of the B and Lambda vectors should be edges.n_cols + 1, not edges.n_cols). After changing that, the optimization now converges to -48. I'm going to use Mosek to solve this SDP and see what it gets.
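For reference, the formulation I'm working from is the standard Lovász theta SDP (assuming that's also what the test is meant to encode), with $J$ the all-ones matrix:

$$
\begin{aligned}
\min_{X \succeq 0} \quad & -\operatorname{tr}(J X) \\
\text{s.t.} \quad & X_{ij} = 0 \quad \forall\,(i, j) \in E, \\
& \operatorname{tr}(X) = 1,
\end{aligned}
$$

which is $|E| + 1$ linear constraints in total, hence edges.n_cols + 1 multipliers.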

stephentu commented on May 22, 2024

Alright, Mosek agrees with -14. Gist: https://gist.github.com/anonymous/d639bdf3b576d734040d

Time to track down the bug.

rcurtin commented on May 22, 2024

Ah, okay. I wouldn't have been completely surprised if my test case was wrong. I don't have an explanation for the disparity between the BM paper and the size of the dataset in the repo, and consulting commit logs and personal emails doesn't provide a solution either. Let me know if you want any help tracking down the bug.

stephentu commented on May 22, 2024

Cool; sorry I've been a bit slow to track this down. I'm on vacation now, so it'll take me a while to find the cycles to debug.

rcurtin commented on May 22, 2024

To reiterate what you said (so you can correct me if I've misunderstood): it seems the minor changes you've made have caused LRSDP not to converge, apparently because L-BFGS finds a stationary point at R = 0. I'm digging pretty far into my memory banks here, but I seem to remember that the LRSDP gradient is defined in such a way that there does exist a stationary point at R = 0. I also recall an email to Sam Burer in which I described the phenomenon (at the time it was occurring when I was trying to do MVU via LRSDP); he agreed that such a thing could happen, but mentioned that he had not put much thought into it, nor had he encountered it in his simulations.

I think at this point that we are looking at two separate issues inside of this one issue; please correct me if you think otherwise:

  • The LRSDP code has been successfully adapted to use arma::mat and arma::sp_mat in stephentu/mlpack@186ffaa
  • The Johnson844LovaszThetaSDP test fails to converge once you have added the last constraint.

So I would say that if the code in your branch passes the test before you added the constraint, let's merge it in; I'm satisfied that it works (especially since the changes are so straightforward). Then I would say this issue is resolved (and we can either rename it or open a new one; it doesn't matter to me).

For the test failing to converge, I think the analysis is that our optimizer is unfortunately finding the stationary point R = 0. I'm not confident enough to conclude that, since I'm just going off my memory and what you've told me, but it seems at least somewhat likely. One solution to this, which you've suggested, is a different optimizer. mlpack has a simulated annealing implementation and an SGD implementation, but SA is super slow and SGD requires a decomposable function, so we can't use either out of the box. I'm hesitant about the dependencies you've suggested: scipy would bring in a giant mess of Python dependencies; liblbfgs doesn't appear to be heavily maintained, and I don't know about its portability; ceres doesn't seem to be available in the Debian or Fedora repositories; and it's not yet clear how well any of these would work with the framework mlpack already has in place.

If you think L-BFGS-B is necessary to solve this problem, I'll implement it (unless you want to); I should be able to reuse the existing L-BFGS code or simply templatize it further. I'm going to spend a little time tweaking the L-BFGS parameters with the updated test to see if that helps, and I'll let you know what I find in the next day or two.
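(The main addition over the plain L-BFGS code would be handling the bound constraints; for instance, clamping an iterate into the box looks something like this. Just a sketch, not the actual implementation:)

```cpp
// Sketch only: elementwise projection of an iterate onto the box [lower, upper],
// the basic operation any bound-constrained L-BFGS variant needs.
#include <armadillo>

arma::mat ProjectToBox(const arma::mat& x,
                       const arma::mat& lower,
                       const arma::mat& upper)
{
  // Elementwise clamp: max(lower, min(x, upper)).
  return arma::max(lower, arma::min(x, upper));
}
```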

I think mlpack's set of optimizers is a little lacking at the moment, and I'm happy to try to set aside some time to implement other algorithms.

Last thing before I finish rambling: is it possible to use plain old L-BFGS in place of L-BFGS-B in your gist? If so, and if that converges okay, then you may have exposed a bug in L-BFGS (which I'd be happy to hunt down, if such a thing exists).

stephentu commented on May 22, 2024

You are right about R=0 as a stationary point; the gradient is indeed proportional to R. However, I honestly hadn't encountered this problem either when running max-cut and matrix completion instances recently.
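To spell it out (taking $C$ and the $A_i$ symmetric, as in the SDP): with the augmented Lagrangian

$$
\mathcal{L}(R) = \operatorname{tr}(C R R^{\top}) - \sum_i y_i \bigl(\operatorname{tr}(A_i R R^{\top}) - b_i\bigr) + \frac{\sigma}{2} \sum_i \bigl(\operatorname{tr}(A_i R R^{\top}) - b_i\bigr)^2,
$$

the gradient is

$$
\nabla_R \mathcal{L}(R) = 2 \Bigl( C - \sum_i \bigl( y_i - \sigma\,(\operatorname{tr}(A_i R R^{\top}) - b_i) \bigr) A_i \Bigr) R,
$$

which vanishes at $R = 0$ no matter what the multipliers and penalty parameter are, so $R = 0$ is always a stationary point.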

Your summary of the two outstanding issues is correct. I will do two things:

  • I will do some git-fu to get my patch, w/ the last constraint omitted in the test case, mergeable against the current master branch.
  • I will then close this issue and open up a new one to deal w/ the second issue you described.

I agree with your assessment of the dependencies. I was, however, thinking of doing the following. We copy one file, lbfgsb.f (https://github.com/trsaunders/L-BFGS-B-Wrapper/blob/master/lbfgsb.f), into the mlpack codebase. We then maintain a copy of a C++ wrapper around setulb, as shown here (https://github.com/trsaunders/L-BFGS-B-Wrapper/blob/master/lbfgsb_mex.cpp), gutting out all the mex crap and making it fit into the existing L_BFGS API. The wrapper code is fairly straightforward. There are no external library dependencies, other than requiring a Fortran compiler to build the project (I think that's a reasonable requirement for compiling scientific/statistical software).
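Roughly, the wrapper just needs a declaration like this for the Fortran routine (a sketch: the trailing length arguments for the CHARACTER parameters are an assumption based on gfortran's calling convention and would need to be checked for whatever compiler we use):

```cpp
// Sketch of the C-side declaration of setulb from lbfgsb.f.  The argument
// order follows the Fortran source; the trailing hidden length arguments for
// the CHARACTER parameters (task, csave) are assumed per gfortran's calling
// convention and must be verified for the compiler in use.
extern "C" void setulb_(const int* n, const int* m, double* x,
                        const double* l, const double* u, const int* nbd,
                        double* f, double* g, const double* factr,
                        const double* pgtol, double* wa, int* iwa,
                        char* task, const int* iprint, char* csave,
                        int* lsave, int* isave, double* dsave,
                        long taskLength, long csaveLength);
```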

I like this option more than re-implementing some variant of L-BFGS, since this code is pretty much the reference implementation, so we can have reasonable confidence it works robustly. I think robustness is important because the issue with hand-rolled optimization routines is often that they work great for your test problems, but then somebody else comes along, their problem hits some corner case, and you're left scratching your head wondering whether it's the higher-level code calling the optimization routine that is broken or the routine itself (as in the current situation). There's a reason the scipy people go through such glue-code hell to call into these Fortran routines with obscure APIs 😄 What do you think of this option?

rcurtin commented on May 22, 2024

Your faith in the correctness of reference implementations is greater than mine (having found bugs in "bulletproof" reference implementations before after months of digging and testing). :)

The Fortran compiler situation isn't bad, and I wouldn't be too opposed to it as a dependency, but honestly I'd be perfectly happy to sit down, write, and test L-BFGS-B in order to avoid the additional dependency. I really don't like wrapping code in that way, and I'll go to pretty great lengths to avoid it. So if you're fine with me providing an implementation of L-BFGS-B, I can probably have it done in about a week or so. Let me know what you think.

stephentu commented on May 22, 2024

I'm closing this issue to continue the discussion about the optimizer in a new issue.
