Giter Club home page Giter Club logo

Comments (12)

jonboh avatar jonboh commented on May 30, 2024 3

Hello @stefan-k, have you considered adding criterion.rs style benchmarks in order to track the crate's algorithm performance?
I've quickly turned the backtracking example into one and run it on the commit prior to #112 and the current main branch.
This isn't strictly related to benchmarking L-BFGS in comparison to the Fortran library but could ease evaluating changes that aim to improve performance, and also would prevent degrading it in future releases.
As you can see in the changes in my fork, the changes necessary are really minimal:

With those changes we can generate reports with a couple of commands

git checkout 6deb382_benchmark
cargo bench --bench backtracking -- --save-baseline 6deb382
git checkout main
cargo bench --bench backtracking -- --baseline 6deb382
xdg-open ./target/criterion/report/index.html

image
image
(blue corresponds to main and red to the commit prior to #112 )

Let me know your thoughts, if you think it would be a worthwhile addition I think I could make a PR this week to add these benchmarks.
I think this would ease working on the performance of the LBFGS algorithm (and others). It would give a good target in which to run the profiles.

from argmin.

stefan-k avatar stefan-k commented on May 30, 2024 1

Such a benchmark would be really interesting. I haven't compared against these implementations at all, so besides the performance differences, it may be possible that they do not produce the exact same results.
In terms of performance, I suspect (or better, hope) that most time is spent on linear algebra. I think flamegraph could be a very useful tool for this.
I will see if I can find problems for the benchmark (which is something I wanted to do for a long time anyway).

from argmin.

nilgoyette avatar nilgoyette commented on May 30, 2024

I tested this today. I'm using the original Fortran implementation v3. My application runs in

  • 15s with the L-BFGS-B version
  • 18s with the L-BFGS version

I really wanted to test argmin-rs because I want to get rid of Fortran in my perfect Rust codebase :D I changed only the L-BFGS part, from the original version to the argmin-rs version. My application now runs in 1m09s. I'm aware that this is not a benchmark at all! Still, now I don't need to code one as I have a ballpark of what I would gain (or lose). Of course, this slowdown is unacceptable, so I won't use this version. On the plus side though

  • the code I needed to write using argmin-rs is is much more pleasant! A lot less code and no damn Fortran.
  • The results are really close, with a lower final cost, like 0.005 less.

This makes me wonder what tricks they used in the original version. It's obviously optimized by a group of genius mathematicians :P If I test the C port, I'll tell update this page.

from argmin.

rth avatar rth commented on May 30, 2024

Thanks for the feedback @nilgoyette ! I also did some benchmarks recently between pyargmin and scipy for L-BFGS-B on the logistic regression problem, am also getting comparable results (i.e. pyargmin is ~4x slower). The WIP comparison notebook (in Python) can be found here.

I think there are two directions to explore to improve this situation,

  • parameters: we need to make sure that we run benchmarks exactly with the same parameters (including line search) as the Fortran version for the results to be comparable. #37 helps, there is more work needed for a rigorous comparison. This may also lead to a conclusion that some default values may be sub-optimal.
  • code optimization: as far as I understood form @stefan-k comment above, no one has spent much too time optimizing LBFGS in argmin, so even a little benchmarking & tuning effort might yield reasonable results. In particular, one clear advantage over Fortran we have is the ability to use BLAS.

from argmin.

rth avatar rth commented on May 30, 2024

To have a starting point for this discussion about optimization, here are results of,

cargo flamegraph --example lbfgs --features ndarrayl

with a modified lbfgs example to increase problem dimensionality (and run longer),

diff --git a/Cargo.toml b/Cargo.toml
index e4c176fd..9cce01f1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -54,3 +54,6 @@ visualizer = ["gnuplot"]
 [badges]
 travis-ci = { repository = "argmin-rs/argmin", branch = "master" }
 maintenance = { status = "actively-developed" }
+
+[profile.release]
+debug = true
diff --git a/examples/lbfgs.rs b/examples/lbfgs.rs
index c22fa30f..5ca625c9 100644
--- a/examples/lbfgs.rs
+++ b/examples/lbfgs.rs
@@ -42,19 +42,272 @@ fn run() -> Result<(), Error> {
     let cost = Rosenbrock { a: 1.0, b: 100.0 };
 
     // Define initial parameter vector
-    let init_param: Array1<f64> = array![-1.2, 1.0];
-    // let init_param: Array1<f64> = array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 10.0];
+    // Output of 
+    // >>> import numpy as np
+    //>>> np.random.RandomState(0).rand(1000)
+    let init_param: Array1<f64> = array![
+       5.48813504e-01, 7.15189366e-01, 6.02763376e-01, 5.44883183e-01,
+       4.23654799e-01, 6.45894113e-01, 4.37587211e-01, 8.91773001e-01,
+       9.63662761e-01, 3.83441519e-01, 7.91725038e-01, 5.28894920e-01,
+       5.68044561e-01, 9.25596638e-01, 7.10360582e-02, 8.71292997e-02,
+       2.02183974e-02, 8.32619846e-01, 7.78156751e-01, 8.70012148e-01,
+       9.78618342e-01, 7.99158564e-01, 4.61479362e-01, 7.80529176e-01,
+       1.18274426e-01, 6.39921021e-01, 1.43353287e-01, 9.44668917e-01,
+       5.21848322e-01, 4.14661940e-01, 2.64555612e-01, 7.74233689e-01,
+       4.56150332e-01, 5.68433949e-01, 1.87898004e-02, 6.17635497e-01,
+       6.12095723e-01, 6.16933997e-01, 9.43748079e-01, 6.81820299e-01,
+       3.59507901e-01, 4.37031954e-01, 6.97631196e-01, 6.02254716e-02,
+       6.66766715e-01, 6.70637870e-01, 2.10382561e-01, 1.28926298e-01,
+       3.15428351e-01, 3.63710771e-01, 5.70196770e-01, 4.38601513e-01,
+       9.88373838e-01, 1.02044811e-01, 2.08876756e-01, 1.61309518e-01,
+       6.53108325e-01, 2.53291603e-01, 4.66310773e-01, 2.44425592e-01,
+       1.58969584e-01, 1.10375141e-01, 6.56329589e-01, 1.38182951e-01,
+       1.96582362e-01, 3.68725171e-01, 8.20993230e-01, 9.71012758e-02,
+       8.37944907e-01, 9.60984079e-02, 9.76459465e-01, 4.68651202e-01,
+       9.76761088e-01, 6.04845520e-01, 7.39263579e-01, 3.91877923e-02,
+       2.82806963e-01, 1.20196561e-01, 2.96140198e-01, 1.18727719e-01,
+       3.17983179e-01, 4.14262995e-01, 6.41474963e-02, 6.92472119e-01,
+       5.66601454e-01, 2.65389491e-01, 5.23248053e-01, 9.39405108e-02,
+       5.75946496e-01, 9.29296198e-01, 3.18568952e-01, 6.67410380e-01,
+       1.31797862e-01, 7.16327204e-01, 2.89406093e-01, 1.83191362e-01,
+       5.86512935e-01, 2.01075462e-02, 8.28940029e-01, 4.69547619e-03,
+       6.77816537e-01, 2.70007973e-01, 7.35194022e-01, 9.62188545e-01,
+       2.48753144e-01, 5.76157334e-01, 5.92041931e-01, 5.72251906e-01,
+       2.23081633e-01, 9.52749012e-01, 4.47125379e-01, 8.46408672e-01,
+       6.99479275e-01, 2.97436951e-01, 8.13797820e-01, 3.96505741e-01,
+       8.81103197e-01, 5.81272873e-01, 8.81735362e-01, 6.92531590e-01,
+       7.25254280e-01, 5.01324382e-01, 9.56083635e-01, 6.43990199e-01,
+       4.23855049e-01, 6.06393214e-01, 1.91931983e-02, 3.01574817e-01,
+       6.60173537e-01, 2.90077607e-01, 6.18015429e-01, 4.28768701e-01,
+       1.35474064e-01, 2.98282326e-01, 5.69964911e-01, 5.90872761e-01,
+       5.74325249e-01, 6.53200820e-01, 6.52103270e-01, 4.31418435e-01,
+       8.96546596e-01, 3.67561870e-01, 4.35864925e-01, 8.91923355e-01,
+       8.06193989e-01, 7.03888584e-01, 1.00226887e-01, 9.19482614e-01,
+       7.14241300e-01, 9.98847007e-01, 1.49448305e-01, 8.68126057e-01,
+       1.62492935e-01, 6.15559564e-01, 1.23819983e-01, 8.48008229e-01,
+       8.07318959e-01, 5.69100739e-01, 4.07183297e-01, 6.91669955e-02,
+       6.97428773e-01, 4.53542683e-01, 7.22055599e-01, 8.66382326e-01,
+       9.75521505e-01, 8.55803342e-01, 1.17140842e-02, 3.59978064e-01,
+       7.29990562e-01, 1.71629677e-01, 5.21036606e-01, 5.43379883e-02,
+       1.99996525e-01, 1.85217945e-02, 7.93697703e-01, 2.23924688e-01,
+       3.45351681e-01, 9.28081293e-01, 7.04414402e-01, 3.18389295e-02,
+       1.64694156e-01, 6.21478401e-01, 5.77228589e-01, 2.37892821e-01,
+       9.34213998e-01, 6.13965956e-01, 5.35632803e-01, 5.89909976e-01,
+       7.30122030e-01, 3.11944995e-01, 3.98221062e-01, 2.09843749e-01,
+       1.86193006e-01, 9.44372390e-01, 7.39550795e-01, 4.90458809e-01,
+       2.27414628e-01, 2.54356482e-01, 5.80291603e-02, 4.34416626e-01,
+       3.11795882e-01, 6.96343489e-01, 3.77751839e-01, 1.79603678e-01,
+       2.46787284e-02, 6.72496315e-02, 6.79392773e-01, 4.53696845e-01,
+       5.36579211e-01, 8.96671293e-01, 9.90338947e-01, 2.16896984e-01,
+       6.63078203e-01, 2.63322377e-01, 2.06509995e-02, 7.58378654e-01,
+       3.20017151e-01, 3.83463894e-01, 5.88317114e-01, 8.31048455e-01,
+       6.28981844e-01, 8.72650655e-01, 2.73542035e-01, 7.98046834e-01,
+       1.85635944e-01, 9.52791657e-01, 6.87488276e-01, 2.15507677e-01,
+       9.47370590e-01, 7.30855807e-01, 2.53941643e-01, 2.13311977e-01,
+       5.18200714e-01, 2.56627181e-02, 2.07470075e-01, 4.24685469e-01,
+       3.74169980e-01, 4.63575424e-01, 2.77628706e-01, 5.86784346e-01,
+       8.63855606e-01, 1.17531856e-01, 5.17379107e-01, 1.32068106e-01,
+       7.16859681e-01, 3.96059703e-01, 5.65421312e-01, 1.83279836e-01,
+       1.44847759e-01, 4.88056281e-01, 3.55612738e-01, 9.40431945e-01,
+       7.65325254e-01, 7.48663620e-01, 9.03719740e-01, 8.34224354e-02,
+       5.52192470e-01, 5.84476069e-01, 9.61936379e-01, 2.92147527e-01,
+       2.40828780e-01, 1.00293942e-01, 1.64296296e-02, 9.29529317e-01,
+       6.69916547e-01, 7.85152912e-01, 2.81730106e-01, 5.86410166e-01,
+       6.39552661e-02, 4.85627596e-01, 9.77495140e-01, 8.76505245e-01,
+       3.38158952e-01, 9.61570155e-01, 2.31701626e-01, 9.49318822e-01,
+       9.41377705e-01, 7.99202587e-01, 6.30447937e-01, 8.74287967e-01,
+       2.93020285e-01, 8.48943555e-01, 6.17876692e-01, 1.32368578e-02,
+       3.47233518e-01, 1.48140861e-01, 9.81829390e-01, 4.78370307e-01,
+       4.97391365e-01, 6.39472516e-01, 3.68584606e-01, 1.36900272e-01,
+       8.22117733e-01, 1.89847912e-01, 5.11318983e-01, 2.24317029e-01,
+       9.78444845e-02, 8.62191517e-01, 9.72919489e-01, 9.60834658e-01,
+       9.06555499e-01, 7.74047333e-01, 3.33145152e-01, 8.11013900e-02,
+       4.07241171e-01, 2.32234142e-01, 1.32487635e-01, 5.34271818e-02,
+       7.25594364e-01, 1.14274586e-02, 7.70580749e-01, 1.46946645e-01,
+       7.95220826e-02, 8.96030342e-02, 6.72047807e-01, 2.45367210e-01,
+       4.20539467e-01, 5.57368791e-01, 8.60551174e-01, 7.27044263e-01,
+       2.70327905e-01, 1.31482799e-01, 5.53743204e-02, 3.01598634e-01,
+       2.62118149e-01, 4.56140567e-01, 6.83281336e-01, 6.95625446e-01,
+       2.83518847e-01, 3.79926956e-01, 1.81150962e-01, 7.88545512e-01,
+       5.68480764e-02, 6.96997242e-01, 7.78695396e-01, 7.77407562e-01,
+       2.59422564e-01, 3.73813138e-01, 5.87599635e-01, 2.72821902e-01,
+       3.70852799e-01, 1.97054280e-01, 4.59855884e-01, 4.46123013e-02,
+       7.99795885e-01, 7.69564470e-02, 5.18835149e-01, 3.06810100e-01,
+       5.77542949e-01, 9.59433341e-01, 6.45570244e-01, 3.53624358e-02,
+       4.30402440e-01, 5.10016852e-01, 5.36177495e-01, 6.81392511e-01,
+       2.77596098e-01, 1.28860565e-01, 3.92675677e-01, 9.56405723e-01,
+       1.87130892e-01, 9.03983955e-01, 5.43805950e-01, 4.56911422e-01,
+       8.82041410e-01, 4.58603962e-01, 7.24167637e-01, 3.99025322e-01,
+       9.04044393e-01, 6.90025020e-01, 6.99622054e-01, 3.27720402e-01,
+       7.56778643e-01, 6.36061055e-01, 2.40020273e-01, 1.60538822e-01,
+       7.96391475e-01, 9.59166603e-01, 4.58138827e-01, 5.90984165e-01,
+       8.57722644e-01, 4.57223453e-01, 9.51874477e-01, 5.75751162e-01,
+       8.20767121e-01, 9.08843718e-01, 8.15523819e-01, 1.59414463e-01,
+       6.28898439e-01, 3.98434259e-01, 6.27129520e-02, 4.24032252e-01,
+       2.58684067e-01, 8.49038308e-01, 3.33046265e-02, 9.58982722e-01,
+       3.55368848e-01, 3.56706890e-01, 1.63285027e-02, 1.85232325e-01,
+       4.01259501e-01, 9.29291417e-01, 9.96149302e-02, 9.45301533e-01,
+       8.69488531e-01, 4.54162397e-01, 3.26700882e-01, 2.32744129e-01,
+       6.14464706e-01, 3.30745915e-02, 1.56060644e-02, 4.28795722e-01,
+       6.80740740e-02, 2.51940988e-01, 2.21160915e-01, 2.53191194e-01,
+       1.31055231e-01, 1.20362229e-02, 1.15484297e-01, 6.18480260e-01,
+       9.74256213e-01, 9.90345002e-01, 4.09054095e-01, 1.62954426e-01,
+       6.38761757e-01, 4.90305347e-01, 9.89409777e-01, 6.53042072e-02,
+       7.83234438e-01, 2.88398497e-01, 2.41418620e-01, 6.62504572e-01,
+       2.46063185e-01, 6.65859118e-01, 5.17308517e-01, 4.24088988e-01,
+       5.54687809e-01, 2.87051520e-01, 7.06574706e-01, 4.14856869e-01,
+       3.60545560e-01, 8.28656915e-01, 9.24966912e-01, 4.60073109e-02,
+       2.32626993e-01, 3.48519369e-01, 8.14966479e-01, 9.85491428e-01,
+       9.68971705e-01, 9.04948346e-01, 2.96556265e-01, 9.92011243e-01,
+       2.49420041e-01, 1.05906155e-01, 9.50952611e-01, 2.33420255e-01,
+       6.89768265e-01, 5.83563590e-02, 7.30709099e-01, 8.81720212e-01,
+       2.72436895e-01, 3.79056896e-01, 3.74296183e-01, 7.48788258e-01,
+       2.37807243e-01, 1.71853099e-01, 4.49291649e-01, 3.04468407e-01,
+       8.39189122e-01, 2.37741826e-01, 5.02389457e-01, 9.42583600e-01,
+       6.33997698e-01, 8.67289405e-01, 9.40209689e-01, 7.50764862e-01,
+       6.99575060e-01, 9.67965567e-01, 9.94400790e-01, 4.51821683e-01,
+       7.08697782e-02, 2.92794031e-01, 1.52354706e-01, 4.17486375e-01,
+       1.31289328e-01, 6.04117804e-01, 3.82808059e-01, 8.95385884e-01,
+       9.67794672e-01, 5.46884902e-01, 2.74823570e-01, 5.92230419e-01,
+       8.96761158e-01, 4.06733346e-01, 5.52078277e-01, 2.71652768e-01,
+       4.55444149e-01, 4.01713535e-01, 2.48413465e-01, 5.05866384e-01,
+       3.10380826e-01, 3.73034864e-01, 5.24970442e-01, 7.50595023e-01,
+       3.33507466e-01, 9.24158767e-01, 8.62318547e-01, 4.86902960e-02,
+       2.53642524e-01, 4.46135513e-01, 1.04627889e-01, 3.48475989e-01,
+       7.40097526e-01, 6.80514481e-01, 6.22384429e-01, 7.10528403e-01,
+       2.04923687e-01, 3.41698115e-01, 6.76242482e-01, 8.79234763e-01,
+       5.43678054e-01, 2.82699651e-01, 3.02352580e-02, 7.10336829e-01,
+       7.88410351e-03, 3.72679070e-01, 5.30537215e-01, 9.22111462e-01,
+       8.94945450e-02, 4.05942322e-01, 2.43131997e-02, 3.42610984e-01,
+       6.22231059e-01, 2.79067948e-01, 2.09749950e-01, 1.15703233e-01,
+       5.77140244e-01, 6.95270006e-01, 6.71957141e-01, 9.48861021e-01,
+       2.70321389e-03, 6.47196654e-01, 6.00392237e-01, 5.88739610e-01,
+       9.62770320e-01, 1.68716734e-02, 6.96482431e-01, 8.13678650e-01,
+       5.09807197e-01, 3.33964870e-01, 7.90840163e-01, 9.72429256e-02,
+       4.42035638e-01, 5.19952375e-01, 6.93956411e-01, 9.08857320e-02,
+       2.27759502e-01, 4.10301563e-01, 6.23294673e-01, 8.86960781e-01,
+       6.18826168e-01, 1.33461471e-01, 9.80580133e-01, 8.71785735e-01,
+       5.02720761e-01, 9.22347982e-01, 5.41380794e-01, 9.23306068e-01,
+       8.29897369e-01, 9.68286410e-01, 9.19782811e-01, 3.60338174e-02,
+       1.74772004e-01, 3.89134677e-01, 9.52142697e-01, 3.00028919e-01,
+       1.60467644e-01, 8.86304666e-01, 4.46394415e-01, 9.07875594e-01,
+       1.60230466e-01, 6.61117512e-01, 4.40263753e-01, 7.64867690e-02,
+       6.96463145e-01, 2.47398756e-01, 3.96155226e-02, 5.99442982e-02,
+       6.10785371e-02, 9.07732957e-01, 7.39883918e-01, 8.98062357e-01,
+       6.72582311e-01, 5.28939929e-01, 3.04446364e-01, 9.97962251e-01,
+       3.62189059e-01, 4.70648949e-01, 3.78245175e-01, 9.79526929e-01,
+       1.74658385e-01, 3.27988001e-01, 6.80348666e-01, 6.32076183e-02,
+       6.07249374e-01, 4.77646503e-01, 2.83999977e-01, 2.38413281e-01,
+       5.14512743e-01, 3.67927581e-01, 4.56519891e-01, 3.37477382e-01,
+       9.70493694e-01, 1.33439432e-01, 9.68039532e-02, 3.43391729e-01,
+       5.91026901e-01, 6.59176472e-01, 3.97256747e-01, 9.99277994e-01,
+       3.51892996e-01, 7.21406668e-01, 6.37582695e-01, 8.13053863e-01,
+       9.76225663e-01, 8.89793656e-01, 7.64561974e-01, 6.98248478e-01,
+       3.35498170e-01, 1.47685578e-01, 6.26360031e-02, 2.41901704e-01,
+       4.32281481e-01, 5.21996274e-01, 7.73083554e-01, 9.58740923e-01,
+       1.17320480e-01, 1.07004140e-01, 5.89694723e-01, 7.45398074e-01,
+       8.48150380e-01, 9.35832080e-01, 9.83426242e-01, 3.99801692e-01,
+       3.80335184e-01, 1.47808677e-01, 6.84934439e-01, 6.56761958e-01,
+       8.62062596e-01, 9.72579948e-02, 4.97776908e-01, 5.81081930e-01,
+       2.41557040e-01, 1.69025406e-01, 8.59580836e-01, 5.85349222e-02,
+       4.70620904e-01, 1.15834001e-01, 4.57058761e-01, 9.79962326e-01,
+       4.23706353e-01, 8.57124918e-01, 1.17315564e-01, 2.71252077e-01,
+       4.03792741e-01, 3.99812140e-01, 6.71383479e-01, 3.44718127e-01,
+       7.13766868e-01, 6.39186899e-01, 3.99161145e-01, 4.31760128e-01,
+       6.14527700e-01, 7.00421901e-02, 8.22406738e-01, 6.53421161e-01,
+       7.26342464e-01, 5.36923001e-01, 1.10477111e-01, 4.05035613e-01,
+       4.05373583e-01, 3.21042990e-01, 2.99503249e-02, 7.37254243e-01,
+       1.09784458e-01, 6.06308133e-01, 7.03217496e-01, 6.34786323e-01,
+       9.59142252e-01, 1.03298155e-01, 8.67167159e-01, 2.91902348e-02,
+       5.34916855e-01, 4.04243618e-01, 5.24183860e-01, 3.65099877e-01,
+       1.90566915e-01, 1.91228974e-02, 5.18149814e-01, 8.42776863e-01,
+       3.73215956e-01, 2.22863818e-01, 8.05320035e-02, 8.53109231e-02,
+       2.21396446e-01, 1.00014061e-01, 2.65039698e-01, 6.61494621e-02,
+       6.56048672e-02, 8.56276180e-01, 1.62120261e-01, 5.59682406e-01,
+       7.73455544e-01, 4.56409565e-01, 1.53368878e-01, 1.99596142e-01,
+       4.32984206e-01, 5.28234089e-01, 3.49440292e-01, 7.81479600e-01,
+       7.51021649e-01, 9.27211807e-01, 2.89525490e-02, 8.95691291e-01,
+       3.92568788e-01, 8.78372495e-01, 6.90784776e-01, 9.87348757e-01,
+       7.59282452e-01, 3.64544626e-01, 5.01063173e-01, 3.76389155e-01,
+       3.64911836e-01, 2.60904499e-01, 4.95970295e-01, 6.81739945e-01,
+       2.77340271e-01, 5.24379811e-01, 1.17380294e-01, 1.59845287e-01,
+       4.68063547e-02, 9.70731443e-01, 3.86035151e-03, 1.78579968e-01,
+       6.12866753e-01, 8.13695989e-02, 8.81896503e-01, 7.19620158e-01,
+       9.66389971e-01, 5.07635547e-01, 3.00403683e-01, 5.49500573e-01,
+       9.30818717e-01, 5.20761437e-01, 2.67207032e-01, 8.77398789e-01,
+       3.71918749e-01, 1.38335000e-03, 2.47685022e-01, 3.18233509e-01,
+       8.58777468e-01, 4.58503167e-01, 4.44587288e-01, 3.36102266e-01,
+       8.80678123e-01, 9.45026777e-01, 9.91890329e-01, 3.76741267e-01,
+       9.66147446e-01, 7.91879570e-01, 6.75689148e-01, 2.44889479e-01,
+       2.16457261e-01, 1.66047825e-01, 9.22756610e-01, 2.94076662e-01,
+       4.53094245e-01, 4.93957834e-01, 7.78171595e-01, 8.44234962e-01,
+       1.39072701e-01, 4.26904360e-01, 8.42854888e-01, 8.18033306e-01,
+       1.02413758e-01, 1.56383349e-01, 3.04198692e-01, 7.53590691e-02,
+       4.24663003e-01, 1.07617705e-01, 5.68217594e-01, 2.46556940e-01,
+       5.96433065e-01, 1.17525643e-01, 9.75883868e-01, 9.32561204e-01,
+       3.91796939e-01, 2.42178594e-01, 2.50398213e-01, 4.83393535e-01,
+       3.99928019e-02, 6.39705106e-01, 4.08302908e-01, 3.77406573e-01,
+       8.09364971e-01, 7.09035460e-01, 9.54333815e-01, 3.51936240e-01,
+       8.97542765e-01, 7.69967186e-01, 3.57424652e-01, 6.21665436e-01,
+       2.88569958e-01, 8.74399917e-01, 1.12427317e-01, 2.12434361e-01,
+       1.83033292e-01, 4.03026002e-01, 7.45232960e-01, 5.26907449e-01,
+       4.87676324e-01, 5.45964897e-04, 4.25401725e-01, 6.35537748e-02,
+       2.08253252e-01, 9.32393939e-01, 2.15398204e-01, 8.58337639e-01,
+       8.02893372e-01, 1.59146237e-01, 6.05711957e-01, 1.15661872e-01,
+       7.27888158e-01, 6.37462277e-01, 8.11938562e-01, 4.79384549e-01,
+       9.14863088e-01, 4.93489468e-02, 2.92888565e-01, 7.15052597e-01,
+       4.18109212e-01, 1.72951354e-01, 1.07210745e-01, 8.17339111e-01,
+       4.73142978e-01, 8.82283672e-01, 7.33289134e-01, 4.09726206e-01,
+       3.73511014e-01, 5.15638347e-01, 8.89059953e-01, 7.37278580e-01,
+       5.15296427e-03, 6.94157851e-01, 9.19507407e-01, 7.10455760e-01,
+       1.77005782e-01, 4.83518127e-01, 1.40316018e-01, 3.58995278e-01,
+       9.37117042e-01, 9.23305308e-01, 2.82836852e-01, 3.39631044e-01,
+       6.00212868e-01, 9.63197295e-01, 1.47801334e-01, 2.56916644e-01,
+       8.73556827e-01, 4.91892232e-01, 8.98961092e-01, 1.85517898e-01,
+       5.32668587e-01, 3.26269633e-01, 3.16542560e-01, 4.46876964e-01,
+       4.33077449e-01, 3.57346880e-01, 9.14970770e-01, 7.31744185e-01,
+       7.27546991e-01, 2.89913450e-01, 5.77709424e-01, 7.79179433e-01,
+       7.95590369e-01, 3.44530461e-01, 7.70872757e-01, 7.35893897e-01,
+       1.41506486e-01, 8.65945469e-01, 4.41321470e-01, 4.86410449e-01,
+       4.48369179e-01, 5.67846001e-01, 6.21169247e-01, 4.98179566e-01,
+       8.66788543e-01, 6.27734756e-01, 4.01427949e-01, 4.16691757e-01,
+       8.10838615e-01, 3.48191943e-01, 2.11454796e-01, 5.93831880e-02,
+       8.76026848e-01, 9.18546451e-01, 1.20120182e-01, 3.34473741e-01,
+       1.75372070e-01, 1.15898469e-01, 8.99866743e-01, 5.68772591e-02,
+       9.80485663e-01, 9.64508607e-02, 8.63470649e-01, 5.66506107e-01,
+       3.67917488e-01, 3.42342377e-01, 7.57364143e-01, 3.14573295e-01,
+       6.57318917e-01, 5.17326084e-01, 4.84965645e-01, 9.01162171e-01,
+       5.54645059e-01, 8.26861603e-01, 7.25573534e-01, 3.85572461e-02,
+       7.73110053e-01, 2.16870250e-01, 9.03149647e-01, 4.29241906e-02,
+       3.33072034e-01, 9.97329472e-02, 4.75589117e-01, 8.20022436e-01,
+       2.98187360e-01, 1.50934897e-01, 3.30267036e-01, 8.13880142e-01,
+       1.40383958e-01, 2.27362449e-01, 6.88519645e-02, 7.05710044e-01,
+       3.95233244e-01, 3.10839977e-01, 7.18626390e-01, 3.35977542e-01,
+       7.27771273e-01, 8.15199395e-01, 2.17662843e-01, 9.73818697e-01,
+       1.62357948e-01, 2.90840907e-01, 1.79795291e-01, 3.45505656e-01,
+       4.80060888e-01, 5.22175869e-01, 8.53606042e-01, 8.89447909e-01,
+       2.20103861e-01, 6.22894032e-01, 1.11496057e-01, 4.58969860e-01,
+       3.22333538e-01, 3.16500745e-01, 4.82584242e-01, 7.29827636e-01,
+       6.91826588e-02, 8.79173338e-01, 7.34813775e-01, 1.76499389e-01,
+       9.39160909e-01, 5.06312224e-01, 9.99808578e-01, 1.97259474e-01,
+       5.34908198e-01, 2.90248043e-01, 3.04173557e-01, 5.91065381e-01,
+       9.21719067e-01, 8.05263856e-01, 7.23941399e-01, 5.59173782e-01,
+       9.22298504e-01, 4.92361407e-01, 8.73832178e-01, 8.33981644e-01,
+       2.13835347e-01, 7.71225463e-01, 1.21711569e-02, 3.22829538e-01,
+       2.29567445e-01, 5.06862958e-01, 7.36853162e-01, 9.76763674e-02,
+       5.14922202e-01, 9.38412022e-01, 2.28646551e-01, 6.77141144e-01];
+
 
     // set up a line search
     let linesearch = MoreThuenteLineSearch::new().c(1e-4, 0.9)?;
 
     // Set up solver
-    let solver = LBFGS::new(linesearch, 7);
+    let solver = LBFGS::new(linesearch, 10);
 
     // Run solver
     let res = Executor::new(cost, solver, init_param)
         // .add_observer(ArgminSlogLogger::term(), ObserverMode::Always)
-        .max_iters(100)
+        .max_iters(300)
         .run()?;
 
     // Wait a second (lets the logger flush everything before printing again)

The output is,

ArgminResult:
    param (best):  [0.9998660430280772, 0.9997711162728186, 1.0000183343714812, 0.9999545581212057, 1.0001581312529746, ..., 1.0020650224246153, 1.0026319569449569, 1.005436729578769, 1.008673050260663, 1.017108153862398], shape=[1000], strides=[1], layout=C | F (0x3), const ndim=1
    cost (best):   764.7226773367395
    iters (best):  299
    iters (best):  299
    iters (total): 300
    termination: Maximum number of iterations reached
    time:        697.088308ms

[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1,652 MB perf.data (200 samples) ]
writing flamegraph to "flamegraph.svg"

and the resulting flamegraph can be found below,
flamegraph svg

full svg here.

I'm not used to do much profiling of Rust application, but it does looks like it's spending a lot of the time in clone and in managing openblas threads. The latter is somewhat surprising as I have also tried setting OMP_NUM_THREADS=1 environmental variable with no effect. Profiling run on 12 core CPU.

from argmin.

rth avatar rth commented on May 30, 2024

Though, maybe I'm doing something wrong with flamegraph. If I run the same example with cargo run it takes 1.70s which is a reasonable run time for profiling. However with cargo flamegraph the same 100 iterations of the solver are done in only 25.428ms according to the time printed to stdout. If true the above results are likely not meaningful (or something is wrong with the timer).

Edit: yes, the --release option was missing from cargo run, so I need a bigger example. I updated the figures above with an example that takes longer to run (700ms now), but generally we still might need a more realistic problem than Rosenbrock.

from argmin.

stefan-k avatar stefan-k commented on May 30, 2024

This is a lot slower than I expected.

code optimization: as far as I understood form @stefan-k comment above, no one has spent much too time optimizing LBFGS in argmin, so even a little benchmarking & tuning effort might yield reasonable results. In particular, one clear advantage over Fortran we have is the ability to use BLAS.

This is true, unfortunately performance optimization has so far not been one of the main concerns (even though this is the most fun part). When looking at the LBFGS implementation, I believe one of the main problems might be the clones. In particular, the operator is cloned every time the line search is executed (in OpWrapper::new_from_op(&op)). It should be possible to solve this without clones. This may then also remove the requirement on operators having to implement Clone.

I have to admit I'm not used to flamegraphs: Do I see this correctly that a lot of time is spent on computing the gradient?

from argmin.

nilgoyette avatar nilgoyette commented on May 30, 2024

parameters: we need to make sure that we run benchmarks exactly with the same parameters (including line search) as the Fortran version for the results to be comparable.

Yes, this was one of the problem I had. My original code was

let mut lbfgsb = Lbfgsb::new(x0, f, None);
lbfgsb.max_iteration(100);
lbfgsb.set_matric_correction(10);
lbfgsb.set_termination_tolerance(1e-7);
lbfgsb.set_tolerance(1e-05);

I think that matric_correction is the second parameter in LBFGS::new and max_iteration is easy to find. However, I couldn't find termination_tolerance and tolerance in argmin. Also, I used MoreThuenteLineSearch with "random" parameters as I don't know what they are using in the original Fortran version.

from argmin.

dbischof90 avatar dbischof90 commented on May 30, 2024

Mea culpa for bringing this back from the dead - I'd be interested whether any progress has been made here?

from argmin.

nilgoyette avatar nilgoyette commented on May 30, 2024

None that I'm aware of. I seem to recall that one of the released version made the whole crate faster, but I can't find any proof of that!

In all case, even if someone works on that problem, I doubt that a general optimization crate can be faster than a specialized Fortran L-BFGS library.

from argmin.

stefan-k avatar stefan-k commented on May 30, 2024

I seem to recall that one of the released version made the whole crate faster, but I can't find any proof of that!

Probably this change: #112 . My initial design of observers was... well, stupid. Before this PR, quite a lot of time was spent on observers, even if no observer was used (some details are here: #111). I made further improvements to the observer design later on, but I think those had less of an impact performance-wise.

I doubt that a general optimization crate can be faster than a specialized Fortran L-BFGS library.

That's probably correct, but benchmarks could still be insightful.

I'd be interested whether any progress has been made here?

Unfortunately no. I don't think I'll be able to commit time to this in the near future. I'd appreciate it if someone with a bit of benchmarking experience could help out with this one.

from argmin.

stefan-k avatar stefan-k commented on May 30, 2024

Hi @jonboh,

thanks a lot for this comment and the work you have put into it! I have heard of criterion.rs but haven't looked into it in detail. This looks amazing and I think this would be an extremely useful addition and a very good basis for future performance improvements. I would highly appreciate it if you could issue a PR :)

from argmin.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.