sleipnirgroup / sleipnir Goto Github PK

View Code? Open in Web Editor NEW

39.0 5.0 10.0 3.89 MB

A linearity-exploiting sparse nonlinear constrained optimization problem solver that uses the interior-point method.

License: BSD 3-Clause "New" or "Revised" License

CMake 2.07% C++ 84.06% Python 13.60% Shell 0.27%

optimization solver

sleipnir's Introduction

Sleipnir

Sparsity and Linearity-Exploiting Interior-Point solver - Now Internally Readable

Named after Odin's eight-legged horse from Norse mythology, Sleipnir is a linearity-exploiting sparse nonlinear constrained optimization problem solver that uses the interior-point method.

#include <print>

#include <sleipnir/optimization/OptimizationProblem.hpp>

int main() {
  // Find the x, y pair with the largest product for which x + 3y = 36
  sleipnir::OptimizationProblem problem;

  auto x = problem.DecisionVariable();
  auto y = problem.DecisionVariable();

  problem.Maximize(x * y);
  problem.SubjectTo(x + 3 * y == 36);
  problem.Solve();

  // x = 18.0, y = 6.0
  std::println("x = {}, y = {}", x.Value(), y.Value());
}

#!/usr/bin/env python3

from jormungandr.optimization import OptimizationProblem


def main():
    # Find the x, y pair with the largest product for which x + 3y = 36
    problem = OptimizationProblem()

    x = problem.decision_variable()
    y = problem.decision_variable()

    problem.maximize(x * y)
    problem.subject_to(x + 3 * y == 36)
    problem.solve()

    # x = 18.0, y = 6.0
    print(f"x = {x.value()}, y = {y.value()}")


if __name__ == "__main__":
    main()

Sleipnir's internals are intended to be readable by those who aren't domain experts with links to explanatory material for its algorithms.

Benchmarks


flywheel-scalability-results-casadi.csv flywheel-scalability-results-sleipnir.csv	cart-pole-scalability-results-casadi.csv cart-pole-scalability-results-sleipnir.csv

Generated by tools/generate-scalability-results.sh from benchmarks/scalability source.

CPU: AMD Ryzen 7 7840U
RAM: 64 GB, 5600 MHz DDR5
Compiler version: g++ (GCC) 14.1.1 20240507

The following thirdparty software was used in the benchmarks:

CasADi 3.6.5 (autodiff and NLP solver frontend)
Ipopt 3.14.16 (NLP solver backend)
MUMPS 5.6.2 (linear solver)

Ipopt uses MUMPS by default because it has free licensing. Commercial linear solvers may be much faster.

See benchmark details for more.

Install

C++ library

See the build instructions.

Python library

pip install sleipnirgroup-jormungandr

API docs

See the C++ API docs and Python API docs.

Examples

See the examples, C++ optimization unit tests, and Python optimization unit tests.

Build

Dependencies

C++23 compiler
- On Windows, install Visual Studio Community 2022 and select the C++ programming language during installation
- On Linux, install GCC 14 or greater via sudo apt install g++
- On macOS 14 or greater, install the Xcode command-line build tools via xcode-select --install. Xcode 15.3 or greater is required.
CMake 3.21 or greater
- On Windows, install from the link above
- On Linux, install via sudo apt install cmake
- On macOS, install via brew install cmake
Python 3.9 or greater
- On Windows, install from the link above
- On Linux, install via sudo apt install python
- On macOS, install via brew install python
Eigen
nanobind (build only)
Catch2 (tests only)

Library dependencies which aren't installed locally will be automatically downloaded and built by CMake.

The benchmark executables require CasADi to be installed locally.

C++ library

On Windows, open a Developer PowerShell. On Linux or macOS, open a Bash shell.

# Clone the repository
git clone [email protected]:SleipnirGroup/Sleipnir
cd Sleipnir

# Configure; automatically downloads library dependencies
cmake -B build -S .

# Build
cmake --build build

# Test
ctest --test-dir build --output-on-failure

# Install
cmake --install build --prefix pkgdir

The following build types can be specified via -DCMAKE_BUILD_TYPE during CMake configure:

Debug
- Optimizations off
- Debug symbols on
Release
- Optimizations on
- Debug symbols off
RelWithDebInfo (default)
- Release build type, but with debug info
MinSizeRel
- Minimum size release build
Asan
- Enables address sanitizer
Tsan
- Enables thread sanitizer
Ubsan
- Enables undefined behavior sanitizer
Perf
- RelWithDebInfo build type, but with frame pointer so perf utility can use it

Python library

On Windows, open a Developer PowerShell. On Linux or macOS, open a Bash shell.

# Clone the repository
git clone [email protected]:SleipnirGroup/Sleipnir
cd Sleipnir

# Setup
pip install --user build

# Build
python -m build --wheel

# Install
pip install --user dist/sleipnirgroup_jormungandr-*.whl

# Test
pytest

Test diagnostics

Passing the --enable-diagnostics flag to the test executable enables solver diagnostic prints.

Some test problems generate CSV files containing their solutions. These can be plotted with tools/plot_test_problem_solutions.py.

Benchmark details

Running the benchmarks

Benchmark projects are in the benchmarks folder. To compile and run them, run the following in the repository root:

# Install CasADi and [matplotlib, numpy, scipy] pip packages first
cmake -B build -S .
cmake --build build
./tools/generate-scalability-results.sh

See the contents of ./tools/generate-scalability-results.sh for how to run specific benchmarks.

How we improved performance

Make more decisions at compile time

During problem setup, equality and inequality constraints are encoded as different types, so the appropriate setup behavior can be selected at compile time via operator overloads.

Reuse autodiff computation results that are still valid (aka caching)

The autodiff library automatically records the linearity of every node in the computational graph. Linear functions have constant first derivatives, and quadratic functions have constant second derivatives. The constant derivatives are computed in the initialization phase and reused for all solver iterations. Only nonlinear parts of the computational graph are recomputed during each solver iteration.

For quadratic problems, we compute the Lagrangian Hessian and constraint Jacobians once with no problem structure hints from the user.

Use a performant linear algebra library with fast sparse solvers

Eigen provides these. It also has no required dependencies, which makes cross compilation much easier.

Use a pool allocator for autodiff expression nodes

This promotes fast allocation/deallocation and good memory locality.

We could mitigate the solver's high last-level-cache miss rate (~42% on the machine above) further by breaking apart the expression nodes into fields that are commonly iterated together. We used to use a tape, which gave computational graph updates linear access patterns, but tapes are monotonic buffers with no way to reclaim storage.

sleipnir's People

Contributors

Stargazers

Watchers

Forkers

juliaschatz neumann-a deltadizzy glaycia calcmogul spacey-sooty bugger2 pietroglyph bbscholar shueja

sleipnir's Issues

Python cart-pole test crashes on Windows

Add docstrings to Python bindings

Consider autogenerating them from the C++ docs with https://github.com/pybind/pybind11_mkdoc. I couldn't get it to find my local clang install.

Replace feasibility restoration phase with one-phase IPM

The algorithm described by https://arxiv.org/pdf/1801.03072.pdf keeps the dual iterates bounded and is more robust on problems with narrow feasible regions.

Support VariableMatrix slicing

Make block matrices support negative slices (e.g., Col(-1) where -1 means last element). Python slices should also support strides not equal to 1.

Here's prior art for a C++ matrix slicing API: https://eigen.tuxfamily.org/dox/group__TutorialSlicingIndexing.html

Add tests for expression tree pruning

Variable operations like multiplication check the operands for constants and avoid creating extra nodes if they're zeroes or ones. We should have unit tests ensuring this behavior occurs correctly.

Eigen needs update to support macOS compilation

Building Sleipnir on macOS Ventura with Apple Clang 14.0.3 results in the following compilation error:

FAILED: CMakeFiles/Sleipnir.dir/src/autodiff/Gradient.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -DFMT_SHARED -DSLEIPNIR_EXPORTS -DSleipnir_EXPORTS -I/Users/prateekma/git/Sleipnir/src -I/Users/prateekma/git/Sleipnir/thirdparty/llvm/include -I/Users/prateekma/git/Sleipnir/include -I/Users/prateekma/git/Sleipnir/build/_deps/eigen3-src -I/Users/prateekma/git/Sleipnir/build/_deps/fmt-src/include -O2 -g -DNDEBUG -std=gnu++20 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.2 -fPIC -Wall -pedantic -Wextra -Werror -Wno-unused-parameter -MD -MT CMakeFiles/Sleipnir.dir/src/autodiff/Gradient.cpp.o -MF CMakeFiles/Sleipnir.dir/src/autodiff/Gradient.cpp.o.d -o CMakeFiles/Sleipnir.dir/src/autodiff/Gradient.cpp.o -c /Users/prateekma/git/Sleipnir/src/autodiff/Gradient.cpp
In file included from /Users/prateekma/git/Sleipnir/src/autodiff/Gradient.cpp:3:
In file included from /Users/prateekma/git/Sleipnir/include/sleipnir/autodiff/Gradient.hpp:7:
In file included from /Users/prateekma/git/Sleipnir/build/_deps/eigen3-src/Eigen/SparseCore:61:
/Users/prateekma/git/Sleipnir/build/_deps/eigen3-src/Eigen/src/SparseCore/TriangularSolver.h:273:13: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable]
      Index count = 0;
            ^
1 error generated.

This patch should be applied to the version of Eigen being used.

Handle autodiff domain errors more gracefully

The Solver performs poorly with a sqrt() in the cost function.

#include <gtest/gtest.h>
#include <sleipnir/optimization/OptimizationProblem.hpp>

#include "CmdlineArguments.hpp"

TEST(EdgeCaseTest, NarrowFeasibleRegion) {
  sleipnir::OptimizationProblem problem;

  auto x = problem.DecisionVariable();
  x.SetValue(20.0);

  auto y = problem.DecisionVariable();
  y.SetValue(50.0);

  problem.Minimize(sleipnir::sqrt(x * x + y * y));

  problem.SubjectTo(y == -x + 5.0);

  auto status =
      problem.Solve({.diagnostics = CmdlineArgPresent(kEnableDiagnostics)});

  EXPECT_EQ(sleipnir::ExpressionType::kNonlinear, status.costFunctionType);
  EXPECT_EQ(sleipnir::ExpressionType::kLinear, status.equalityConstraintType);
  EXPECT_EQ(sleipnir::ExpressionType::kNone, status.inequalityConstraintType);
  EXPECT_EQ(sleipnir::SolverExitCondition::kSuccess, status.exitCondition);

  EXPECT_NEAR(2.5, x.Value(), 1e-2);
  EXPECT_NEAR(2.5, y.Value(), 1e-2);
}

This diverges and gives a bad search direction exit condition. Making the cost function just x * x + y * y converges to the correct answer.

I'm not sure how to deal with this besides implementing some kind of "feasible-only mode" that shrinks steps if they'll cause domain errors or ill-conditioning.

Adding the following constraints works around the issue, but we should ideally still handle domain errors more gracefully.

  problem.SubjectTo(x >= 1e-2);
  problem.SubjectTo(y >= 1e-2);

Publish Python wheels to PyPI

Should this conceptually give the same result as coin-or ipopt?

So I have defined the following problem:

   sleipnir::OptimizationProblem problem;
   auto x = problem.DecisionVariable((int)in.system_matrix.cols());

   problem.SubjectTo(x >= 0);
   //problem.SubjectTo(x < 1);
   auto cost_vec_f = in.system_matrix*x-in.measurement;
   auto f = sleipnir::sqrt(cost_vec_f*cost_vec_f);  // Should really be cost_vec_f.norm()
   problem.Minimize(f);

   problem.Solve({.tolerance=1E-10,.maxIterations = 100000,.diagnostics = true});

In ipopt I have:

   f_vec = (in.system_matrix * x - in.measurement).eval();
   norm = f_vec.norm();

where norm is the cost.

While ipopt and a manually implemented barrier + fminsearch in matlab give similar results sleipnir seems not to converge to a meaningful solution.

Did I do something wrong in defining the problem or is sleipnir not useful for constraints which are too simple ?

Cart-pole problem iterates oscillate around infeasible location

The solver's iterates oscillate around an infeasible location, so the error never drops below the threshold and the max iteration count is reached. A feasibility restoration phase is needed to fix this.

Note that this is only an issue for poorly behaved cost function manifolds.

Jormungandr cart-pole test reports infeasible

Copied the file jormungandr/test/optimization/cart_pole_problem_test.py, removed the asserts and then just ran test_direct_transcription() as main.

Threw error error: numpy method __call__, ufunc <ufunc 'matmul'> not implemented for VariableBlock

Make autodiff caching more fine-grained

While our autodiff is faster than CasADi, it's still a bottleneck compared to the IPM algorithm and linear system solve itself. One way to speed it up would be to eliminate unnecessary work. We already cache the linear rows of Jacobians, but we could go further by caching subtrees within the expression tree too.

One way to do this is to set a dirty bit on each node so computations can be skipped. Since variables are flagged from the bottom of the tree, and reverse autodiff works from the top-down, we'll need to be clever about avoiding unnecessary node traversal.

Implement autodiff linear system solver

This is useful for formulating optimization problems that contain implicit ODEs (e.g., manipulator equations in ma = F form).

The API would essentially be Solve(lhs, rhs).

Based on https://libeigen.gitlab.io/docs/group__TutorialLinearAlgebra.html, we could do either an LU decomposition or a QR decomposition. The latter is more general because it works for non-invertible matrices.

Expand OCPSolver top-level class docs

It currently defers to https://underactuated.mit.edu/trajopt.html, but we may want to include more information inline, or show some example problem formulations.

RegularizedLDLT gets stuck in an infinite loop for some problems

The following loop gets stuck for some problems, resulting in an infinite loop that ignores timeout and iteration limit conditions:
https://github.com/SleipnirGroup/Sleipnir/blob/main/src/optimization/RegularizedLDLT.hpp#L95-L119

The following code is sufficient to generate this error, presumably due to the addition of the height limit constraint.

  sleipnir::OptimizationProblem problem;

  double total_time = 4;
  auto dt = total_time / _N;

  auto elevator = problem.DecisionVariable(2, _N + 1);
  auto elevator_accel = problem.DecisionVariable(1, _N);

  auto arm = problem.DecisionVariable(2, _N + 1);
  auto arm_accel = problem.DecisionVariable(1, _N);

  for (size_t k = 0; k < _N; k++) {
    problem.SubjectTo(elevator(0, k+1) == elevator(0, k) + elevator(1, k) * dt);
    problem.SubjectTo(elevator(1, k+1) == elevator(1, k) + elevator_accel(0, k) * dt);

    problem.SubjectTo(arm(0, k+1) == arm(0, k) + arm(1, k) * dt);
    problem.SubjectTo(arm(1, k+1) == arm(1, k) + arm_accel(0, k) * dt);
  }

  // Start and End Conditions
  problem.SubjectTo(elevator.Col(0) == Eigen::Vector2d({ start.height.value(), 0.0 }));
  problem.SubjectTo(elevator.Col(_N) == Eigen::Vector2d({ end.height.value(), 0.0 }));

  problem.SubjectTo(arm.Col(0) == Eigen::Vector2d({ start.angle.value(), 0.0 }));
  problem.SubjectTo(arm.Col(_N) == Eigen::Vector2d({ end.angle.value(), 0.0 }));

  // Velocity and Acceleration Limits
  problem.SubjectTo(-_constraints.max_velocity_elevator.value() <= elevator.Row(1));
  problem.SubjectTo(elevator.Row(1) <= _constraints.max_velocity_elevator.value());

  problem.SubjectTo(-_constraints.max_acceleration_elevator.value() <= elevator_accel);
  problem.SubjectTo(elevator_accel <= _constraints.max_acceleration_elevator.value());

  problem.SubjectTo(-_constraints.max_velocity_arm.value() <= arm.Row(1));
  problem.SubjectTo(arm.Row(1) <= _constraints.max_velocity_arm.value());

  problem.SubjectTo(-_constraints.max_acceleration_arm.value() <= arm_accel);
  problem.SubjectTo(arm_accel <= _constraints.max_acceleration_arm.value());

  // Height Limit
  auto y = elevator.Row(0) + _config.arm.armLength.value() * sleipnir::sin(arm.Row(0));
  problem.SubjectTo(y <= _constraints.max_height_end_effector.value());

  // Cost Function
  sleipnir::Variable costTotal = 0.0;
  for (int k = 0; k < _N + 1; k++) {
    costTotal += sleipnir::pow(end.height.value() - elevator(0, k), 2) + sleipnir::pow(end.angle.value() - arm(0, k), 2);
  }
  problem.Minimize(costTotal);

Where:

_config.arm.armLength = 1m
_constraints.max_velocity_elevator = 1m/s
_constraints.max_acceleration_elevator = 2m/s/s
_constraints.max_velocity_arm = 360deg/s
_constraints.max_acceleration_arm = 720deg/s/s
_constraints.max_height_end_effector = 1.8m
start.height = 1m
start.angle = 0deg
end.height = 1.25m
end.angle = 180deg

Add Rust bindings

This would make it possible for Choreo to call the solver directly in Tauri.

Use Jormungandr as a reference for what to bind. Instead of NumPy, use nalgebra.

We need to be able to overload the math operators (see https://doc.rust-lang.org/core/ops/) for various combinations of:

double
int
Variable
VariableMatrix
VariableBlock

We also need to overload the equality operator to return a constraint object instead of bool; and overload the comparison operators to return a constraint object instead of Ordering. If that's not possible, provide the following:

Constraints::eq(lhs, rhs)
Constraints::lt(lhs, rhs)
Constraints::le(lhs, rhs)
Constraints::gt(lhs, rhs)
Constraints::ge(lhs, rhs)

Unable to build with current master of Eigen3

E:\all\build\vcpkg_installed\x64-win-llvm-static-md\include\eigen3\Eigen/src/Core/MathFunctions.h(1562,10): error: implicit instantiation of undefined template 'Eigen::numext::signbit_impl<sleipnir::Variable, false, false>'
  return signbit_impl<Scalar>::run(x);
         ^

Full build log:
install-x64-win-llvm-static-md-dbg-out.log

Make Hessian symmetric by construction

We should prune expression graph parts that have child in the upper triangle (50% of the work), since the linear system solver doesn't use the upper triangle. The Hessian class would only return a lower triangular view.

Add tests for autodiff Python wrappers

They should mirror the C++ tests.

Add Python bindings via pybind11

The following links may help as a guide:
https://github.com/RobotLocomotion/drake/tree/master/bindings
https://github.com/RobotLocomotion/drake/blob/master/bindings/pydrake/math_py.cc

The variable matrix classes will need constructors that take MatrixXd so Python can use them for numpy arrays.

Add a way to visualize sparsity patterns in optimization problem matrices

CasADi shows the following in https://web.casadi.org/blog/ocp/:

We may not be able to have batteries-included plotting like that until we have a Python API, but we can at least do the C++ side first. It would help with debugging Sleipnir too.

Add a function to autodiff that computes the sparsity pattern of a sparse matrix.
Add functions to OptimizationProblem that return the sparsity pattern for the Hessian of the Lagrangian, equality constraint Jacobian, and inequality constraint Jacobian.
For nonlinear problems, we could return the sparsity pattern as a function of the iteration number.
Add a function that writes the data out into a plottable format.
Add a Python script that animates the evolution of a given matrix's sparsity pattern over time? Here's how I implemented animations for frccontrol, for reference: https://github.com/calcmogul/frccontrol/blob/main/examples/double_jointed_arm.py#L360-L394

Add symbolic partial derivative to autodiff

We already have Gradient, Jacobian, and Hessian for double matrices, but we don't expose Variable versions of these. This would be useful for, say, minimizing the derivative of something with respect to a decision variable.

ExpressionGraph::GenerateGradientTree() seems to do the work required.

Expression graph destructors cause stack overflow through recursion

The Windows CI failure in #77 is from Expression graph destructors causing a stack overflow through recursion. Here's the bottom of the stacktrace:

464c 000000a5`e1efa0a0 00007ffe`4c6176e8     Sleipnird!sleipnir::Expression::~Expression+0x1b
464d 000000a5`e1efa0d0 00007ffe`4c614526     Sleipnird!sleipnir::Expression::`scalar deleting destructor'+0x18
464e 000000a5`e1efa100 00007ffe`4c6144b9     Sleipnird!std::destroy_at<sleipnir::Expression>+0x16 [C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.34.31933\include\xmemory @ 313] 
464f 000000a5`e1efa130 00007ffe`4c61793a     Sleipnird!std::_Normal_allocator_traits<sleipnir::PoolAllocator<sleipnir::Expression,32768> >::destroy<sleipnir::Expression>+0x19 [C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.34.31933\include\xmemory @ 587] 
4650 000000a5`e1efa160 00007ffe`4c6165c2     Sleipnird!sleipnir::IntrusiveSharedPtrDecRefCount+0x5a [C:\Users\thadh\Documents\GitHub\Sleipnir\include\sleipnir\autodiff\Expression.hpp @ 349] 
4651 000000a5`e1efa1b0 00007ffe`4c616af7     Sleipnird!sleipnir::IntrusiveSharedPtr<sleipnir::Expression>::~IntrusiveSharedPtr<sleipnir::Expression>+0x22 [C:\Users\thadh\Documents\GitHub\Sleipnir\include\sleipnir\IntrusiveSharedPtr.hpp @ 54] 
4652 000000a5`e1efa1e0 00007ffe`4c64b5e8     Sleipnird!sleipnir::Variable::~Variable+0x17
4653 000000a5`e1efa210 00007ffe`4c646ac1     Sleipnird!sleipnir::OptimizationProblem::InteriorPoint+0x4838 [C:\Users\thadh\Documents\GitHub\Sleipnir\src\optimization\OptimizationProblem.cpp @ 1040] 
4654 000000a5`e1efdf10 00007ff7`f10fa5b6     Sleipnird!sleipnir::OptimizationProblem::Solve+0x5c1 [C:\Users\thadh\Documents\GitHub\Sleipnir\src\optimization\OptimizationProblem.cpp @ 337] 
4655 000000a5`e1efe2a0 00007ff7`f11ea39d     SleipnirTest!ArmOnElevatorProblemTest_DirectTranscription_Test::TestBody+0x1516 [C:\Users\thadh\Documents\GitHub\Sleipnir\test\src\optimization\ArmOnElevatorProblemTest.cpp @ 96]

Apparently the graph is 2200 elements deep.

This is a fundamental issue with using a linked list graph for the data structure. We need to either serialize the destructor calls somehow or switch to a tape data structure.

A tape uses an arena allocator, which we moved away from because it meant we couldn't throw away unused nodes and hit OOM often. An arena per OptimizationProblem instance is an option, but that'll negatively impact the user-facing API since you can't make Variables wherever you want anymore.

Implement multistart

Start parallel solvers on extra threads and return the solution with the lowest cost.

https://pyomo.readthedocs.io/en/stable/contributed_packages/multistart.html

This will require doing deep copies of the autodiff expression tree so the threads don't need to share data structures. NumPy calls the member function for doing this copy(): https://numpy.org/doc/stable/reference/generated/numpy.copy.html.

C++ OCPSolver cart-pole test fails with "locally infeasible"

The C++ cart-pole problem test passes now, but some other constrained NLPs still fail with the "bad search direction" status. This could either be a precision issue, another bug, or something a feasibility restoration phase would address.

See section 3.3 of http://cepac.cheme.cmu.edu/pasilectures/biegler/ipopt.pdf for an example formulation of the feasibility restoration phase.

sleipnirgroup / sleipnir Goto Github PK

sleipnir's Introduction

Sleipnir

Benchmarks

Install

C++ library

Python library

API docs

Examples

Build

Dependencies

C++ library

Python library

Test diagnostics

Benchmark details

Running the benchmarks

How we improved performance

Make more decisions at compile time

Reuse autodiff computation results that are still valid (aka caching)

Use a performant linear algebra library with fast sparse solvers

Use a pool allocator for autodiff expression nodes

sleipnir's People

Contributors

Stargazers

Watchers

Forkers

sleipnir's Issues

Recommend Projects

Recommend Topics

Recommend Org