Comments (12)
Note: NLopt.jl also uses a set of integers to indicate different termination reasons. (See lines 73-86 in https://github.com/stevengj/NLopt.jl/blob/master/src/NLopt.jl.)
from optim.jl.
Very good point. I've been thinking about expanding on the simple Boolean exit status for a while. Symbols do seem like a good approach. I'll read through the flags from fmincon and try to add them.
For me, the biggest open question is the boundary between true error conditions in which we should raise an error and conditions in which we return results with warnings that convergence was not reached.
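As a sketch of what that boundary could look like (the function and symbol names here are hypothetical, not an existing Optim.jl API): raise on true errors, return a symbolic status otherwise.

```julia
# Hypothetical sketch: symbolic termination status that raises only on true
# errors and otherwise reports *why* the run stopped.
function check_termination(iter, max_iter, gnorm, g_tol, f_x)
    isfinite(f_x) || error("objective value became non-finite")  # true error
    gnorm <= g_tol   && return :g_converged        # success
    iter >= max_iter && return :max_iter_reached   # warn, but return results
    return :not_terminated
end

status = check_termination(100, 100, 1e-3, 1e-8, 1.0)
status == :max_iter_reached && @warn "maximum iterations reached before convergence"
```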
from optim.jl.
Looking through NLopt, it seems like we should implement multiple convergence diagnostics -- e.g. convergence in gradient norm vs. convergence in function values vs. convergence in state. I've been debating this for some time, but hesitated. Since NLopt is doing it, it seems like we'd be wise to follow suit.
from optim.jl.
Convergence criteria are an interesting topic. It seems that most optimization routines threshold the gradient (e.g., all components have absolute value < 1e-6). However, the physicist in me just cringes: I always imagine my different variables having different units, so with this criterion you're comparing convergence in "per-parsec" vs "per-microsecond," which makes no sense. Another way to say it is that this criterion is not scale-invariant. For that reason, I could not bring myself to adopt this criterion in `cgdescent`.
Perhaps I shouldn't worry so much about this; it does bother me that `cgdescent` is different, I think, from the majority of other algorithms. And it's also the case that the intermediate steps in optimization are not scale-invariant (at least not until the Hessian gets "learned"), but to me it seems more important to have that property for a stopping criterion than for the intermediates in a calculation.
For what it's worth, one that does work out from a "units" perspective is this one:

`max(abs(g.*dx)) < tol*(abs(f_new) + abs(f_old))`

Here `g` is the gradient, `dx` is the step taken (their product has the same units as the objective function), `tol` is a dimensionless number (e.g., `sqrt(eps(T))`), and `f_new` and `f_old` are the function values on the current and previous iterations. I use both of them in case the function value happens to "pass through" zero. Whether one wants `max` or `mean` or something else is debatable (it looks like I used `mean` in `cgdescent`), but thinking about it I suspect `max` is the better choice.
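As a concrete sketch of this test (the function name `units_converged` is mine, not from `cgdescent`):

```julia
# Unit-consistent stopping test: g .* dx carries the same units as f, so the
# two sides of the comparison are dimensionally compatible.
function units_converged(g, dx, f_new, f_old; tol = sqrt(eps(Float64)))
    # |f_new| + |f_old| guards against the function value passing through zero
    return maximum(abs.(g .* dx)) < tol * (abs(f_new) + abs(f_old))
end

g = [1e-9, 2e-9]; dx = [0.1, 0.1]
units_converged(g, dx, 3.2, 3.2000001)   # true at this tolerance
```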
from optim.jl.
I'm very sympathetic to the concerns about units, but I'm not sure that scale-invariance is ultimately essential: the conventional measures of the difficulty of basic quadratic optimization problems like condition number are arguably themselves not scale-invariant.
For CG, Nocedal and Wright mention using this measure of convergence,
`max(abs(g)) < tol * (1 + abs(f_new))`
which is quite close to the one you're describing -- but different enough to not be scale-invariant.
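A quick numeric illustration of that difference (made-up numbers): rescaling one variable's units, `x1 -> a*x1`, maps `g1 -> g1/a` and `dx1 -> a*dx1`, leaving `g .* dx` unchanged while `max(abs(g))` moves.

```julia
a  = 1/1024                     # power of two, so the rescaling is exact
g  = [1e-7, 1e-7];     dx  = [1e-2, 1e-2]
g2 = [g[1] / a, g[2]]; dx2 = [dx[1] * a, dx[2]]

maximum(abs.(g .* dx)) == maximum(abs.(g2 .* dx2))  # true: g .* dx is invariant
maximum(abs.(g)) == maximum(abs.(g2))               # false: 1024e-7 vs 1e-7
```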
For now, I think we should stick with conventional metrics -- while allowing users some flexibility to select among competing standards.
from optim.jl.
As a first pass at this, I've enlarged OptimizationResults so that it separately specifies whether the function values converged and whether the gradient converged. It also includes the full set of function values encountered along the way, for times when the trajectory is important but a full trace isn't being kept.
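Schematically, the enlarged type might look something like this (field names are illustrative; see the actual OptimizationResults definition for the real ones):

```julia
# Illustrative sketch, not the actual Optim.jl definition.
struct SketchOptimizationResults{T}
    minimizer::Vector{T}
    minimum::T
    iterations::Int
    f_converged::Bool      # successive function values stopped changing
    gr_converged::Bool     # gradient norm fell below tolerance
    f_values::Vector{T}    # f(x) per iteration: the trajectory, sans full trace
end
```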
from optim.jl.
I've finished my draft work on this with 1dd3392: the algorithms now assess convergence in terms of change in `x`, change in `f(x)`, and the norm of the gradient, `gr`. All of this information is stored in OptimizationResults using `Bool`s, along with information about exceeding the maximum number of iterations.
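For concreteness, a minimal sketch of the three tests (the tolerances and the relative-change form are illustrative choices, not necessarily what the committed code does):

```julia
# One Bool per criterion: change in x, change in f(x), gradient norm.
function sketch_assess_convergence(x, x_prev, f_x, f_x_prev, gr;
                                   x_tol = 1e-8, f_tol = 1e-8, g_tol = 1e-8)
    x_converged  = maximum(abs.(x .- x_prev)) < x_tol
    f_converged  = abs(f_x - f_x_prev) < f_tol * (abs(f_x) + f_tol)  # relative
    gr_converged = maximum(abs.(gr)) < g_tol
    return x_converged, f_converged, gr_converged
end
```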
For now, I think we're done, but would like to see if others think we need more information than this.
from optim.jl.
What do people want to do here? Right now, you get information about convergence in `x`, `f(x)`, and `norm(g(x), Inf)`. Which of the other errors from NLopt seem worth implementing?
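For reference, checking these on a result might look roughly like this (a sketch; accessor names and the default method may differ across versions):

```julia
using Optim

# Classic Rosenbrock test problem.
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

res = optimize(rosenbrock, zeros(2))
if Optim.converged(res)   # overall flag built from the per-criterion Bools
    println("converged in ", Optim.iterations(res), " iterations")
else
    @warn "stopped without meeting any convergence criterion"
end
```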
from optim.jl.
As you can tell, I'm back to looking at this package again. To me the information seems quite adequate.
One small thing I noticed: should `converged` be exported?
from optim.jl.
The current convergence information seems sufficient to me.
from optim.jl.
Agreed.
from optim.jl.
Maybe one related issue: #331.
from optim.jl.