Comments (12)
Hello @stefan-k, have you considered adding criterion.rs style benchmarks in order to track the crate's algorithm performance?
I've quickly turned the backtracking example into one and run it on the commit prior to #112 and the current main branch.
This isn't strictly related to benchmarking L-BFGS in comparison to the Fortran library but could ease evaluating changes that aim to improve performance, and also would prevent degrading it in future releases.
As you can see in the changes in my fork, the changes necessary are really minimal:
- main: main...jonboh:argmin:main
- prior to #112: main...jonboh:argmin:6deb382_benchmark
With those changes we can generate reports with a couple of commands
git checkout 6deb382_benchmark
cargo bench --bench backtracking -- --save-baseline 6deb382
git checkout main
cargo bench --bench backtracking -- --baseline 6deb382
xdg-open ./target/criterion/report/index.html
(blue corresponds to main and red to the commit prior to #112 )
Let me know your thoughts, if you think it would be a worthwhile addition I think I could make a PR this week to add these benchmarks.
I think this would ease working on the performance of the LBFGS algorithm (and others). It would give a good target in which to run the profiles.
from argmin.
Such a benchmark would be really interesting. I haven't compared against these implementations at all, so besides the performance differences, it may be possible that they do not produce the exact same results.
In terms of performance, I suspect (or better, hope) that most time is spent on linear algebra. I think flamegraph could be a very useful tool for this.
I will see if I can find problems for the benchmark (which is something I wanted to do for a long time anyway).
from argmin.
I tested this today. I'm using the original Fortran implementation v3. My application runs in
- 15s with the L-BFGS-B version
- 18s with the L-BFGS version
I really wanted to test argmin-rs because I want to get rid of Fortran in my perfect Rust codebase :D I changed only the L-BFGS part, from the original version to the argmin-rs version. My application now runs in 1m09s. I'm aware that this is not a benchmark at all! Still, now I don't need to code one as I have a ballpark of what I would gain (or lose). Of course, this slowdown is unacceptable, so I won't use this version. On the plus side though
- the code I needed to write using argmin-rs is is much more pleasant! A lot less code and no damn Fortran.
- The results are really close, with a lower final cost, like 0.005 less.
This makes me wonder what tricks they used in the original version. It's obviously optimized by a group of genius mathematicians :P If I test the C port, I'll tell update this page.
from argmin.
Thanks for the feedback @nilgoyette ! I also did some benchmarks recently between pyargmin and scipy for L-BFGS-B on the logistic regression problem, am also getting comparable results (i.e. pyargmin is ~4x slower). The WIP comparison notebook (in Python) can be found here.
I think there are two directions to explore to improve this situation,
- parameters: we need to make sure that we run benchmarks exactly with the same parameters (including line search) as the Fortran version for the results to be comparable. #37 helps, there is more work needed for a rigorous comparison. This may also lead to a conclusion that some default values may be sub-optimal.
- code optimization: as far as I understood form @stefan-k comment above, no one has spent much too time optimizing LBFGS in argmin, so even a little benchmarking & tuning effort might yield reasonable results. In particular, one clear advantage over Fortran we have is the ability to use BLAS.
from argmin.
To have a starting point for this discussion about optimization, here are results of,
cargo flamegraph --example lbfgs --features ndarrayl
with a modified lbfgs example to increase problem dimensionality (and run longer),
diff --git a/Cargo.toml b/Cargo.toml
index e4c176fd..9cce01f1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -54,3 +54,6 @@ visualizer = ["gnuplot"]
[badges]
travis-ci = { repository = "argmin-rs/argmin", branch = "master" }
maintenance = { status = "actively-developed" }
+
+[profile.release]
+debug = true
diff --git a/examples/lbfgs.rs b/examples/lbfgs.rs
index c22fa30f..5ca625c9 100644
--- a/examples/lbfgs.rs
+++ b/examples/lbfgs.rs
@@ -42,19 +42,272 @@ fn run() -> Result<(), Error> {
let cost = Rosenbrock { a: 1.0, b: 100.0 };
// Define initial parameter vector
- let init_param: Array1<f64> = array![-1.2, 1.0];
- // let init_param: Array1<f64> = array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 10.0];
+ // Output of
+ // >>> import numpy as np
+ //>>> np.random.RandomState(0).rand(1000)
+ let init_param: Array1<f64> = array![
+ 5.48813504e-01, 7.15189366e-01, 6.02763376e-01, 5.44883183e-01,
+ 4.23654799e-01, 6.45894113e-01, 4.37587211e-01, 8.91773001e-01,
+ 9.63662761e-01, 3.83441519e-01, 7.91725038e-01, 5.28894920e-01,
+ 5.68044561e-01, 9.25596638e-01, 7.10360582e-02, 8.71292997e-02,
+ 2.02183974e-02, 8.32619846e-01, 7.78156751e-01, 8.70012148e-01,
+ 9.78618342e-01, 7.99158564e-01, 4.61479362e-01, 7.80529176e-01,
+ 1.18274426e-01, 6.39921021e-01, 1.43353287e-01, 9.44668917e-01,
+ 5.21848322e-01, 4.14661940e-01, 2.64555612e-01, 7.74233689e-01,
+ 4.56150332e-01, 5.68433949e-01, 1.87898004e-02, 6.17635497e-01,
+ 6.12095723e-01, 6.16933997e-01, 9.43748079e-01, 6.81820299e-01,
+ 3.59507901e-01, 4.37031954e-01, 6.97631196e-01, 6.02254716e-02,
+ 6.66766715e-01, 6.70637870e-01, 2.10382561e-01, 1.28926298e-01,
+ 3.15428351e-01, 3.63710771e-01, 5.70196770e-01, 4.38601513e-01,
+ 9.88373838e-01, 1.02044811e-01, 2.08876756e-01, 1.61309518e-01,
+ 6.53108325e-01, 2.53291603e-01, 4.66310773e-01, 2.44425592e-01,
+ 1.58969584e-01, 1.10375141e-01, 6.56329589e-01, 1.38182951e-01,
+ 1.96582362e-01, 3.68725171e-01, 8.20993230e-01, 9.71012758e-02,
+ 8.37944907e-01, 9.60984079e-02, 9.76459465e-01, 4.68651202e-01,
+ 9.76761088e-01, 6.04845520e-01, 7.39263579e-01, 3.91877923e-02,
+ 2.82806963e-01, 1.20196561e-01, 2.96140198e-01, 1.18727719e-01,
+ 3.17983179e-01, 4.14262995e-01, 6.41474963e-02, 6.92472119e-01,
+ 5.66601454e-01, 2.65389491e-01, 5.23248053e-01, 9.39405108e-02,
+ 5.75946496e-01, 9.29296198e-01, 3.18568952e-01, 6.67410380e-01,
+ 1.31797862e-01, 7.16327204e-01, 2.89406093e-01, 1.83191362e-01,
+ 5.86512935e-01, 2.01075462e-02, 8.28940029e-01, 4.69547619e-03,
+ 6.77816537e-01, 2.70007973e-01, 7.35194022e-01, 9.62188545e-01,
+ 2.48753144e-01, 5.76157334e-01, 5.92041931e-01, 5.72251906e-01,
+ 2.23081633e-01, 9.52749012e-01, 4.47125379e-01, 8.46408672e-01,
+ 6.99479275e-01, 2.97436951e-01, 8.13797820e-01, 3.96505741e-01,
+ 8.81103197e-01, 5.81272873e-01, 8.81735362e-01, 6.92531590e-01,
+ 7.25254280e-01, 5.01324382e-01, 9.56083635e-01, 6.43990199e-01,
+ 4.23855049e-01, 6.06393214e-01, 1.91931983e-02, 3.01574817e-01,
+ 6.60173537e-01, 2.90077607e-01, 6.18015429e-01, 4.28768701e-01,
+ 1.35474064e-01, 2.98282326e-01, 5.69964911e-01, 5.90872761e-01,
+ 5.74325249e-01, 6.53200820e-01, 6.52103270e-01, 4.31418435e-01,
+ 8.96546596e-01, 3.67561870e-01, 4.35864925e-01, 8.91923355e-01,
+ 8.06193989e-01, 7.03888584e-01, 1.00226887e-01, 9.19482614e-01,
+ 7.14241300e-01, 9.98847007e-01, 1.49448305e-01, 8.68126057e-01,
+ 1.62492935e-01, 6.15559564e-01, 1.23819983e-01, 8.48008229e-01,
+ 8.07318959e-01, 5.69100739e-01, 4.07183297e-01, 6.91669955e-02,
+ 6.97428773e-01, 4.53542683e-01, 7.22055599e-01, 8.66382326e-01,
+ 9.75521505e-01, 8.55803342e-01, 1.17140842e-02, 3.59978064e-01,
+ 7.29990562e-01, 1.71629677e-01, 5.21036606e-01, 5.43379883e-02,
+ 1.99996525e-01, 1.85217945e-02, 7.93697703e-01, 2.23924688e-01,
+ 3.45351681e-01, 9.28081293e-01, 7.04414402e-01, 3.18389295e-02,
+ 1.64694156e-01, 6.21478401e-01, 5.77228589e-01, 2.37892821e-01,
+ 9.34213998e-01, 6.13965956e-01, 5.35632803e-01, 5.89909976e-01,
+ 7.30122030e-01, 3.11944995e-01, 3.98221062e-01, 2.09843749e-01,
+ 1.86193006e-01, 9.44372390e-01, 7.39550795e-01, 4.90458809e-01,
+ 2.27414628e-01, 2.54356482e-01, 5.80291603e-02, 4.34416626e-01,
+ 3.11795882e-01, 6.96343489e-01, 3.77751839e-01, 1.79603678e-01,
+ 2.46787284e-02, 6.72496315e-02, 6.79392773e-01, 4.53696845e-01,
+ 5.36579211e-01, 8.96671293e-01, 9.90338947e-01, 2.16896984e-01,
+ 6.63078203e-01, 2.63322377e-01, 2.06509995e-02, 7.58378654e-01,
+ 3.20017151e-01, 3.83463894e-01, 5.88317114e-01, 8.31048455e-01,
+ 6.28981844e-01, 8.72650655e-01, 2.73542035e-01, 7.98046834e-01,
+ 1.85635944e-01, 9.52791657e-01, 6.87488276e-01, 2.15507677e-01,
+ 9.47370590e-01, 7.30855807e-01, 2.53941643e-01, 2.13311977e-01,
+ 5.18200714e-01, 2.56627181e-02, 2.07470075e-01, 4.24685469e-01,
+ 3.74169980e-01, 4.63575424e-01, 2.77628706e-01, 5.86784346e-01,
+ 8.63855606e-01, 1.17531856e-01, 5.17379107e-01, 1.32068106e-01,
+ 7.16859681e-01, 3.96059703e-01, 5.65421312e-01, 1.83279836e-01,
+ 1.44847759e-01, 4.88056281e-01, 3.55612738e-01, 9.40431945e-01,
+ 7.65325254e-01, 7.48663620e-01, 9.03719740e-01, 8.34224354e-02,
+ 5.52192470e-01, 5.84476069e-01, 9.61936379e-01, 2.92147527e-01,
+ 2.40828780e-01, 1.00293942e-01, 1.64296296e-02, 9.29529317e-01,
+ 6.69916547e-01, 7.85152912e-01, 2.81730106e-01, 5.86410166e-01,
+ 6.39552661e-02, 4.85627596e-01, 9.77495140e-01, 8.76505245e-01,
+ 3.38158952e-01, 9.61570155e-01, 2.31701626e-01, 9.49318822e-01,
+ 9.41377705e-01, 7.99202587e-01, 6.30447937e-01, 8.74287967e-01,
+ 2.93020285e-01, 8.48943555e-01, 6.17876692e-01, 1.32368578e-02,
+ 3.47233518e-01, 1.48140861e-01, 9.81829390e-01, 4.78370307e-01,
+ 4.97391365e-01, 6.39472516e-01, 3.68584606e-01, 1.36900272e-01,
+ 8.22117733e-01, 1.89847912e-01, 5.11318983e-01, 2.24317029e-01,
+ 9.78444845e-02, 8.62191517e-01, 9.72919489e-01, 9.60834658e-01,
+ 9.06555499e-01, 7.74047333e-01, 3.33145152e-01, 8.11013900e-02,
+ 4.07241171e-01, 2.32234142e-01, 1.32487635e-01, 5.34271818e-02,
+ 7.25594364e-01, 1.14274586e-02, 7.70580749e-01, 1.46946645e-01,
+ 7.95220826e-02, 8.96030342e-02, 6.72047807e-01, 2.45367210e-01,
+ 4.20539467e-01, 5.57368791e-01, 8.60551174e-01, 7.27044263e-01,
+ 2.70327905e-01, 1.31482799e-01, 5.53743204e-02, 3.01598634e-01,
+ 2.62118149e-01, 4.56140567e-01, 6.83281336e-01, 6.95625446e-01,
+ 2.83518847e-01, 3.79926956e-01, 1.81150962e-01, 7.88545512e-01,
+ 5.68480764e-02, 6.96997242e-01, 7.78695396e-01, 7.77407562e-01,
+ 2.59422564e-01, 3.73813138e-01, 5.87599635e-01, 2.72821902e-01,
+ 3.70852799e-01, 1.97054280e-01, 4.59855884e-01, 4.46123013e-02,
+ 7.99795885e-01, 7.69564470e-02, 5.18835149e-01, 3.06810100e-01,
+ 5.77542949e-01, 9.59433341e-01, 6.45570244e-01, 3.53624358e-02,
+ 4.30402440e-01, 5.10016852e-01, 5.36177495e-01, 6.81392511e-01,
+ 2.77596098e-01, 1.28860565e-01, 3.92675677e-01, 9.56405723e-01,
+ 1.87130892e-01, 9.03983955e-01, 5.43805950e-01, 4.56911422e-01,
+ 8.82041410e-01, 4.58603962e-01, 7.24167637e-01, 3.99025322e-01,
+ 9.04044393e-01, 6.90025020e-01, 6.99622054e-01, 3.27720402e-01,
+ 7.56778643e-01, 6.36061055e-01, 2.40020273e-01, 1.60538822e-01,
+ 7.96391475e-01, 9.59166603e-01, 4.58138827e-01, 5.90984165e-01,
+ 8.57722644e-01, 4.57223453e-01, 9.51874477e-01, 5.75751162e-01,
+ 8.20767121e-01, 9.08843718e-01, 8.15523819e-01, 1.59414463e-01,
+ 6.28898439e-01, 3.98434259e-01, 6.27129520e-02, 4.24032252e-01,
+ 2.58684067e-01, 8.49038308e-01, 3.33046265e-02, 9.58982722e-01,
+ 3.55368848e-01, 3.56706890e-01, 1.63285027e-02, 1.85232325e-01,
+ 4.01259501e-01, 9.29291417e-01, 9.96149302e-02, 9.45301533e-01,
+ 8.69488531e-01, 4.54162397e-01, 3.26700882e-01, 2.32744129e-01,
+ 6.14464706e-01, 3.30745915e-02, 1.56060644e-02, 4.28795722e-01,
+ 6.80740740e-02, 2.51940988e-01, 2.21160915e-01, 2.53191194e-01,
+ 1.31055231e-01, 1.20362229e-02, 1.15484297e-01, 6.18480260e-01,
+ 9.74256213e-01, 9.90345002e-01, 4.09054095e-01, 1.62954426e-01,
+ 6.38761757e-01, 4.90305347e-01, 9.89409777e-01, 6.53042072e-02,
+ 7.83234438e-01, 2.88398497e-01, 2.41418620e-01, 6.62504572e-01,
+ 2.46063185e-01, 6.65859118e-01, 5.17308517e-01, 4.24088988e-01,
+ 5.54687809e-01, 2.87051520e-01, 7.06574706e-01, 4.14856869e-01,
+ 3.60545560e-01, 8.28656915e-01, 9.24966912e-01, 4.60073109e-02,
+ 2.32626993e-01, 3.48519369e-01, 8.14966479e-01, 9.85491428e-01,
+ 9.68971705e-01, 9.04948346e-01, 2.96556265e-01, 9.92011243e-01,
+ 2.49420041e-01, 1.05906155e-01, 9.50952611e-01, 2.33420255e-01,
+ 6.89768265e-01, 5.83563590e-02, 7.30709099e-01, 8.81720212e-01,
+ 2.72436895e-01, 3.79056896e-01, 3.74296183e-01, 7.48788258e-01,
+ 2.37807243e-01, 1.71853099e-01, 4.49291649e-01, 3.04468407e-01,
+ 8.39189122e-01, 2.37741826e-01, 5.02389457e-01, 9.42583600e-01,
+ 6.33997698e-01, 8.67289405e-01, 9.40209689e-01, 7.50764862e-01,
+ 6.99575060e-01, 9.67965567e-01, 9.94400790e-01, 4.51821683e-01,
+ 7.08697782e-02, 2.92794031e-01, 1.52354706e-01, 4.17486375e-01,
+ 1.31289328e-01, 6.04117804e-01, 3.82808059e-01, 8.95385884e-01,
+ 9.67794672e-01, 5.46884902e-01, 2.74823570e-01, 5.92230419e-01,
+ 8.96761158e-01, 4.06733346e-01, 5.52078277e-01, 2.71652768e-01,
+ 4.55444149e-01, 4.01713535e-01, 2.48413465e-01, 5.05866384e-01,
+ 3.10380826e-01, 3.73034864e-01, 5.24970442e-01, 7.50595023e-01,
+ 3.33507466e-01, 9.24158767e-01, 8.62318547e-01, 4.86902960e-02,
+ 2.53642524e-01, 4.46135513e-01, 1.04627889e-01, 3.48475989e-01,
+ 7.40097526e-01, 6.80514481e-01, 6.22384429e-01, 7.10528403e-01,
+ 2.04923687e-01, 3.41698115e-01, 6.76242482e-01, 8.79234763e-01,
+ 5.43678054e-01, 2.82699651e-01, 3.02352580e-02, 7.10336829e-01,
+ 7.88410351e-03, 3.72679070e-01, 5.30537215e-01, 9.22111462e-01,
+ 8.94945450e-02, 4.05942322e-01, 2.43131997e-02, 3.42610984e-01,
+ 6.22231059e-01, 2.79067948e-01, 2.09749950e-01, 1.15703233e-01,
+ 5.77140244e-01, 6.95270006e-01, 6.71957141e-01, 9.48861021e-01,
+ 2.70321389e-03, 6.47196654e-01, 6.00392237e-01, 5.88739610e-01,
+ 9.62770320e-01, 1.68716734e-02, 6.96482431e-01, 8.13678650e-01,
+ 5.09807197e-01, 3.33964870e-01, 7.90840163e-01, 9.72429256e-02,
+ 4.42035638e-01, 5.19952375e-01, 6.93956411e-01, 9.08857320e-02,
+ 2.27759502e-01, 4.10301563e-01, 6.23294673e-01, 8.86960781e-01,
+ 6.18826168e-01, 1.33461471e-01, 9.80580133e-01, 8.71785735e-01,
+ 5.02720761e-01, 9.22347982e-01, 5.41380794e-01, 9.23306068e-01,
+ 8.29897369e-01, 9.68286410e-01, 9.19782811e-01, 3.60338174e-02,
+ 1.74772004e-01, 3.89134677e-01, 9.52142697e-01, 3.00028919e-01,
+ 1.60467644e-01, 8.86304666e-01, 4.46394415e-01, 9.07875594e-01,
+ 1.60230466e-01, 6.61117512e-01, 4.40263753e-01, 7.64867690e-02,
+ 6.96463145e-01, 2.47398756e-01, 3.96155226e-02, 5.99442982e-02,
+ 6.10785371e-02, 9.07732957e-01, 7.39883918e-01, 8.98062357e-01,
+ 6.72582311e-01, 5.28939929e-01, 3.04446364e-01, 9.97962251e-01,
+ 3.62189059e-01, 4.70648949e-01, 3.78245175e-01, 9.79526929e-01,
+ 1.74658385e-01, 3.27988001e-01, 6.80348666e-01, 6.32076183e-02,
+ 6.07249374e-01, 4.77646503e-01, 2.83999977e-01, 2.38413281e-01,
+ 5.14512743e-01, 3.67927581e-01, 4.56519891e-01, 3.37477382e-01,
+ 9.70493694e-01, 1.33439432e-01, 9.68039532e-02, 3.43391729e-01,
+ 5.91026901e-01, 6.59176472e-01, 3.97256747e-01, 9.99277994e-01,
+ 3.51892996e-01, 7.21406668e-01, 6.37582695e-01, 8.13053863e-01,
+ 9.76225663e-01, 8.89793656e-01, 7.64561974e-01, 6.98248478e-01,
+ 3.35498170e-01, 1.47685578e-01, 6.26360031e-02, 2.41901704e-01,
+ 4.32281481e-01, 5.21996274e-01, 7.73083554e-01, 9.58740923e-01,
+ 1.17320480e-01, 1.07004140e-01, 5.89694723e-01, 7.45398074e-01,
+ 8.48150380e-01, 9.35832080e-01, 9.83426242e-01, 3.99801692e-01,
+ 3.80335184e-01, 1.47808677e-01, 6.84934439e-01, 6.56761958e-01,
+ 8.62062596e-01, 9.72579948e-02, 4.97776908e-01, 5.81081930e-01,
+ 2.41557040e-01, 1.69025406e-01, 8.59580836e-01, 5.85349222e-02,
+ 4.70620904e-01, 1.15834001e-01, 4.57058761e-01, 9.79962326e-01,
+ 4.23706353e-01, 8.57124918e-01, 1.17315564e-01, 2.71252077e-01,
+ 4.03792741e-01, 3.99812140e-01, 6.71383479e-01, 3.44718127e-01,
+ 7.13766868e-01, 6.39186899e-01, 3.99161145e-01, 4.31760128e-01,
+ 6.14527700e-01, 7.00421901e-02, 8.22406738e-01, 6.53421161e-01,
+ 7.26342464e-01, 5.36923001e-01, 1.10477111e-01, 4.05035613e-01,
+ 4.05373583e-01, 3.21042990e-01, 2.99503249e-02, 7.37254243e-01,
+ 1.09784458e-01, 6.06308133e-01, 7.03217496e-01, 6.34786323e-01,
+ 9.59142252e-01, 1.03298155e-01, 8.67167159e-01, 2.91902348e-02,
+ 5.34916855e-01, 4.04243618e-01, 5.24183860e-01, 3.65099877e-01,
+ 1.90566915e-01, 1.91228974e-02, 5.18149814e-01, 8.42776863e-01,
+ 3.73215956e-01, 2.22863818e-01, 8.05320035e-02, 8.53109231e-02,
+ 2.21396446e-01, 1.00014061e-01, 2.65039698e-01, 6.61494621e-02,
+ 6.56048672e-02, 8.56276180e-01, 1.62120261e-01, 5.59682406e-01,
+ 7.73455544e-01, 4.56409565e-01, 1.53368878e-01, 1.99596142e-01,
+ 4.32984206e-01, 5.28234089e-01, 3.49440292e-01, 7.81479600e-01,
+ 7.51021649e-01, 9.27211807e-01, 2.89525490e-02, 8.95691291e-01,
+ 3.92568788e-01, 8.78372495e-01, 6.90784776e-01, 9.87348757e-01,
+ 7.59282452e-01, 3.64544626e-01, 5.01063173e-01, 3.76389155e-01,
+ 3.64911836e-01, 2.60904499e-01, 4.95970295e-01, 6.81739945e-01,
+ 2.77340271e-01, 5.24379811e-01, 1.17380294e-01, 1.59845287e-01,
+ 4.68063547e-02, 9.70731443e-01, 3.86035151e-03, 1.78579968e-01,
+ 6.12866753e-01, 8.13695989e-02, 8.81896503e-01, 7.19620158e-01,
+ 9.66389971e-01, 5.07635547e-01, 3.00403683e-01, 5.49500573e-01,
+ 9.30818717e-01, 5.20761437e-01, 2.67207032e-01, 8.77398789e-01,
+ 3.71918749e-01, 1.38335000e-03, 2.47685022e-01, 3.18233509e-01,
+ 8.58777468e-01, 4.58503167e-01, 4.44587288e-01, 3.36102266e-01,
+ 8.80678123e-01, 9.45026777e-01, 9.91890329e-01, 3.76741267e-01,
+ 9.66147446e-01, 7.91879570e-01, 6.75689148e-01, 2.44889479e-01,
+ 2.16457261e-01, 1.66047825e-01, 9.22756610e-01, 2.94076662e-01,
+ 4.53094245e-01, 4.93957834e-01, 7.78171595e-01, 8.44234962e-01,
+ 1.39072701e-01, 4.26904360e-01, 8.42854888e-01, 8.18033306e-01,
+ 1.02413758e-01, 1.56383349e-01, 3.04198692e-01, 7.53590691e-02,
+ 4.24663003e-01, 1.07617705e-01, 5.68217594e-01, 2.46556940e-01,
+ 5.96433065e-01, 1.17525643e-01, 9.75883868e-01, 9.32561204e-01,
+ 3.91796939e-01, 2.42178594e-01, 2.50398213e-01, 4.83393535e-01,
+ 3.99928019e-02, 6.39705106e-01, 4.08302908e-01, 3.77406573e-01,
+ 8.09364971e-01, 7.09035460e-01, 9.54333815e-01, 3.51936240e-01,
+ 8.97542765e-01, 7.69967186e-01, 3.57424652e-01, 6.21665436e-01,
+ 2.88569958e-01, 8.74399917e-01, 1.12427317e-01, 2.12434361e-01,
+ 1.83033292e-01, 4.03026002e-01, 7.45232960e-01, 5.26907449e-01,
+ 4.87676324e-01, 5.45964897e-04, 4.25401725e-01, 6.35537748e-02,
+ 2.08253252e-01, 9.32393939e-01, 2.15398204e-01, 8.58337639e-01,
+ 8.02893372e-01, 1.59146237e-01, 6.05711957e-01, 1.15661872e-01,
+ 7.27888158e-01, 6.37462277e-01, 8.11938562e-01, 4.79384549e-01,
+ 9.14863088e-01, 4.93489468e-02, 2.92888565e-01, 7.15052597e-01,
+ 4.18109212e-01, 1.72951354e-01, 1.07210745e-01, 8.17339111e-01,
+ 4.73142978e-01, 8.82283672e-01, 7.33289134e-01, 4.09726206e-01,
+ 3.73511014e-01, 5.15638347e-01, 8.89059953e-01, 7.37278580e-01,
+ 5.15296427e-03, 6.94157851e-01, 9.19507407e-01, 7.10455760e-01,
+ 1.77005782e-01, 4.83518127e-01, 1.40316018e-01, 3.58995278e-01,
+ 9.37117042e-01, 9.23305308e-01, 2.82836852e-01, 3.39631044e-01,
+ 6.00212868e-01, 9.63197295e-01, 1.47801334e-01, 2.56916644e-01,
+ 8.73556827e-01, 4.91892232e-01, 8.98961092e-01, 1.85517898e-01,
+ 5.32668587e-01, 3.26269633e-01, 3.16542560e-01, 4.46876964e-01,
+ 4.33077449e-01, 3.57346880e-01, 9.14970770e-01, 7.31744185e-01,
+ 7.27546991e-01, 2.89913450e-01, 5.77709424e-01, 7.79179433e-01,
+ 7.95590369e-01, 3.44530461e-01, 7.70872757e-01, 7.35893897e-01,
+ 1.41506486e-01, 8.65945469e-01, 4.41321470e-01, 4.86410449e-01,
+ 4.48369179e-01, 5.67846001e-01, 6.21169247e-01, 4.98179566e-01,
+ 8.66788543e-01, 6.27734756e-01, 4.01427949e-01, 4.16691757e-01,
+ 8.10838615e-01, 3.48191943e-01, 2.11454796e-01, 5.93831880e-02,
+ 8.76026848e-01, 9.18546451e-01, 1.20120182e-01, 3.34473741e-01,
+ 1.75372070e-01, 1.15898469e-01, 8.99866743e-01, 5.68772591e-02,
+ 9.80485663e-01, 9.64508607e-02, 8.63470649e-01, 5.66506107e-01,
+ 3.67917488e-01, 3.42342377e-01, 7.57364143e-01, 3.14573295e-01,
+ 6.57318917e-01, 5.17326084e-01, 4.84965645e-01, 9.01162171e-01,
+ 5.54645059e-01, 8.26861603e-01, 7.25573534e-01, 3.85572461e-02,
+ 7.73110053e-01, 2.16870250e-01, 9.03149647e-01, 4.29241906e-02,
+ 3.33072034e-01, 9.97329472e-02, 4.75589117e-01, 8.20022436e-01,
+ 2.98187360e-01, 1.50934897e-01, 3.30267036e-01, 8.13880142e-01,
+ 1.40383958e-01, 2.27362449e-01, 6.88519645e-02, 7.05710044e-01,
+ 3.95233244e-01, 3.10839977e-01, 7.18626390e-01, 3.35977542e-01,
+ 7.27771273e-01, 8.15199395e-01, 2.17662843e-01, 9.73818697e-01,
+ 1.62357948e-01, 2.90840907e-01, 1.79795291e-01, 3.45505656e-01,
+ 4.80060888e-01, 5.22175869e-01, 8.53606042e-01, 8.89447909e-01,
+ 2.20103861e-01, 6.22894032e-01, 1.11496057e-01, 4.58969860e-01,
+ 3.22333538e-01, 3.16500745e-01, 4.82584242e-01, 7.29827636e-01,
+ 6.91826588e-02, 8.79173338e-01, 7.34813775e-01, 1.76499389e-01,
+ 9.39160909e-01, 5.06312224e-01, 9.99808578e-01, 1.97259474e-01,
+ 5.34908198e-01, 2.90248043e-01, 3.04173557e-01, 5.91065381e-01,
+ 9.21719067e-01, 8.05263856e-01, 7.23941399e-01, 5.59173782e-01,
+ 9.22298504e-01, 4.92361407e-01, 8.73832178e-01, 8.33981644e-01,
+ 2.13835347e-01, 7.71225463e-01, 1.21711569e-02, 3.22829538e-01,
+ 2.29567445e-01, 5.06862958e-01, 7.36853162e-01, 9.76763674e-02,
+ 5.14922202e-01, 9.38412022e-01, 2.28646551e-01, 6.77141144e-01];
+
// set up a line search
let linesearch = MoreThuenteLineSearch::new().c(1e-4, 0.9)?;
// Set up solver
- let solver = LBFGS::new(linesearch, 7);
+ let solver = LBFGS::new(linesearch, 10);
// Run solver
let res = Executor::new(cost, solver, init_param)
// .add_observer(ArgminSlogLogger::term(), ObserverMode::Always)
- .max_iters(100)
+ .max_iters(300)
.run()?;
// Wait a second (lets the logger flush everything before printing again)
The output is,
ArgminResult:
param (best): [0.9998660430280772, 0.9997711162728186, 1.0000183343714812, 0.9999545581212057, 1.0001581312529746, ..., 1.0020650224246153, 1.0026319569449569, 1.005436729578769, 1.008673050260663, 1.017108153862398], shape=[1000], strides=[1], layout=C | F (0x3), const ndim=1
cost (best): 764.7226773367395
iters (best): 299
iters (best): 299
iters (total): 300
termination: Maximum number of iterations reached
time: 697.088308ms
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1,652 MB perf.data (200 samples) ]
writing flamegraph to "flamegraph.svg"
and the resulting flamegraph can be found below,
full svg here.
I'm not used to do much profiling of Rust application, but it does looks like it's spending a lot of the time in clone and in managing openblas threads. The latter is somewhat surprising as I have also tried setting OMP_NUM_THREADS=1
environmental variable with no effect. Profiling run on 12 core CPU.
from argmin.
Though, maybe I'm doing something wrong with flamegraph. If I run the same example with cargo run
it takes 1.70s
which is a reasonable run time for profiling. However with cargo flamegraph
the same 100 iterations of the solver are done in only 25.428ms
according to the time printed to stdout. If true the above results are likely not meaningful (or something is wrong with the timer).
Edit: yes, the --release
option was missing from cargo run
, so I need a bigger example. I updated the figures above with an example that takes longer to run (700ms now), but generally we still might need a more realistic problem than Rosenbrock.
from argmin.
This is a lot slower than I expected.
code optimization: as far as I understood form @stefan-k comment above, no one has spent much too time optimizing LBFGS in argmin, so even a little benchmarking & tuning effort might yield reasonable results. In particular, one clear advantage over Fortran we have is the ability to use BLAS.
This is true, unfortunately performance optimization has so far not been one of the main concerns (even though this is the most fun part). When looking at the LBFGS implementation, I believe one of the main problems might be the clones. In particular, the operator is cloned every time the line search is executed (in OpWrapper::new_from_op(&op)
). It should be possible to solve this without clones. This may then also remove the requirement on operators having to implement Clone
.
I have to admit I'm not used to flamegraphs: Do I see this correctly that a lot of time is spent on computing the gradient?
from argmin.
parameters: we need to make sure that we run benchmarks exactly with the same parameters (including line search) as the Fortran version for the results to be comparable.
Yes, this was one of the problem I had. My original code was
let mut lbfgsb = Lbfgsb::new(x0, f, None);
lbfgsb.max_iteration(100);
lbfgsb.set_matric_correction(10);
lbfgsb.set_termination_tolerance(1e-7);
lbfgsb.set_tolerance(1e-05);
I think that matric_correction
is the second parameter in LBFGS::new
and max_iteration
is easy to find. However, I couldn't find termination_tolerance
and tolerance
in argmin. Also, I used MoreThuenteLineSearch
with "random" parameters as I don't know what they are using in the original Fortran version.
from argmin.
Mea culpa for bringing this back from the dead - I'd be interested whether any progress has been made here?
from argmin.
None that I'm aware of. I seem to recall that one of the released version made the whole crate faster, but I can't find any proof of that!
In all case, even if someone works on that problem, I doubt that a general optimization crate can be faster than a specialized Fortran L-BFGS library.
from argmin.
I seem to recall that one of the released version made the whole crate faster, but I can't find any proof of that!
Probably this change: #112 . My initial design of observers was... well, stupid. Before this PR, quite a lot of time was spent on observers, even if no observer was used (some details are here: #111). I made further improvements to the observer design later on, but I think those had less of an impact performance-wise.
I doubt that a general optimization crate can be faster than a specialized Fortran L-BFGS library.
That's probably correct, but benchmarks could still be insightful.
I'd be interested whether any progress has been made here?
Unfortunately no. I don't think I'll be able to commit time to this in the near future. I'd appreciate it if someone with a bit of benchmarking experience could help out with this one.
from argmin.
Hi @jonboh,
thanks a lot for this comment and the work you have put into it! I have heard of criterion.rs but haven't looked into it in detail. This looks amazing and I think this would be an extremely useful addition and a very good basis for future performance improvements. I would highly appreciate it if you could issue a PR :)
from argmin.
Related Issues (20)
- Create inplace variants of math traits
- ParticleSwarm is not reproducible HOT 3
- More-Thuente Linesearch Algorithm Bug: "Search direction must be a descent direction" HOT 7
- BrentRoot is missing tests
- Add `faer-rs` as math backend
- Add stopping criteria based on the number of function evaluations
- Add LaTeX support to docs
- Add derivatives and Hessians to test functions in argmin_testfunctions
- Add more test functions to `argmin_testfunctions`
- Create visualization of all test functions in `argmin_testfunctions`
- `owl-qn` example does not converge HOT 4
- Observers and checkpointing are missing tests
- Enable observers to create final report
- Thoughts on future state handling
- Various non productive overhead HOT 1
- Example (lbfgs) cannot be compiled after upgrade to version 0.10 HOT 2
- rss/atom feed of the blog HOT 1
- newton and newton cg freeze rather than stopping HOT 1
- Difference between bfgs.rs and bfgs of optimtool.unconstrain.newton_quasi HOT 2
- Add Differential Evolution Algorithm HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from argmin.