Using vectorization seems to make no difference. about proxsuite HOT 12 CLOSED

simple-robotics commented on July 28, 2024

Using vectorization seems to make no difference.

from proxsuite.

Comments (12)

RobustControl commented on July 28, 2024

a / N + b / N = (a + b) / N

If N is very large, it is indeed better to divide outside the loop. But sure, you can do it outside the loop to have fewer divisions.

Okay, I misunderstood.


jcarpent commented on July 28, 2024

Could you share your setup and a minimal example?
How do you set up the vectorization? Did you use -march=native when compiling?
Did you try the Python interface with conda?


RobustControl commented on July 28, 2024

Could you share your setup and a minimal example? How do you set up the vectorization? Did you use -march=native when compiling? Did you try the Python interface with conda?

I only used the C++ interface. I didn't try the Python interface.
I first installed libsimde-dev_0.7.2-4_all.deb, and then installed the ProxSuite according to the document https://simple-robotics.github.io/proxsuite/md_doc_5_installation.html.
1. mkdir build && cd build
2. cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF
3. make
4. make install
I did not build the Python interface.

All my QP problems are quite large, and the data can't be exported directly to a file. Can you provide a C++ QP example that shows the difference between running with and without vectorization? I can then try it on my computer.


jcarpent commented on July 28, 2024

We have limited time to handle this issue. It would simplify our work if you could generate a random QP in C++ that matches your specificities. Could you please do that?


RobustControl commented on July 28, 2024

We have limited time to handle this issue. It would simplify our work if you could generate a random QP in C++ that matches your specificities. Could you please do that?

Please try the following code

#include <chrono>
#include <iostream>  // std::cout, std::endl

#include <proxsuite/proxqp/dense/dense.hpp>
#include <proxsuite/proxqp/utils/random_qp_problems.hpp>  // used for generating a random convex qp

using namespace proxsuite::proxqp;
using T = double;

int main() {
  auto time_begin = std::chrono::system_clock::now();
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      dense::isize n_eq(dim * 3);
      dense::isize n_in(dim * 3);
      T strong_convexity_factor(1.e-2);
      dense::Model<T> qp_random = utils::dense_strongly_convex_qp(
          dim, n_eq, n_in, sparsity_factor, strong_convexity_factor);

      dense::QP<T> qp(dim, n_eq, n_in);
      qp.settings.max_iter = 10000;
      qp.settings.max_iter_in = 1000;
      qp.settings.eps_abs = 1e-5;
      qp.settings.eps_rel = 0;
      qp.init(qp_random.H, qp_random.g, qp_random.A, qp_random.b, qp_random.C,
              qp_random.u, qp_random.l);
      qp.solve();
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  }
  auto time_end = std::chrono::system_clock::now();

  std::cout << "Time consumption(s): "
            << std::chrono::duration_cast<std::chrono::milliseconds>(time_end -
                                                                     time_begin)
                       .count() /
                   1000.0
            << std::endl;
  return 0;
}

Whether or not the option "-DBUILD_WITH_VECTORIZATION_SUPPORT=OFF" is passed when building ProxSuite, this program takes about 8.5 s to run.
My CPU is an Intel i7-12700K, and the OS is Ubuntu 20.04.


jcarpent commented on July 28, 2024

@RobustControl The way you proceed with the timings of the loops is not correct.
You should only time the qp.solve().
Also, how do you compile this small example? Could you share your command lines or CMake file?


RobustControl commented on July 28, 2024

Try the following code:

#include <iostream>  // std::cout, std::endl

#include <proxsuite/proxqp/dense/dense.hpp>
#include <proxsuite/proxqp/sparse/sparse.hpp>  // get the sparse API of ProxQP
#include <proxsuite/proxqp/utils/random_qp_problems.hpp>  // used for generating a random convex qp

using namespace proxsuite::proxqp;
using T = double;

int main() {
  double solve_time = 0.0;
  double setup_time = 0.0;
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      dense::isize n_eq(dim * 3);
      dense::isize n_in(dim * 3);
      T strong_convexity_factor(1.e-2);
      dense::Model<T> qp_random = utils::dense_strongly_convex_qp(
          dim, n_eq, n_in, sparsity_factor, strong_convexity_factor);

      dense::QP<T> qp(dim, n_eq, n_in);
      qp.settings.max_iter = 10000;
      qp.settings.max_iter_in = 1000;
      qp.settings.eps_abs = 1e-5;
      qp.settings.eps_rel = 0;
      qp.init(qp_random.H, qp_random.g, qp_random.A, qp_random.b, qp_random.C,
              qp_random.u, qp_random.l);
      qp.solve();
      solve_time += qp.results.info.solve_time;
      setup_time += qp.results.info.setup_time;
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  }

  std::cout << "Setup Time consumption(dense): " << setup_time / 1e6 << "s"
            << std::endl
            << "Solve Time consumption(dense): " << solve_time / 1e6 << "s"
            << std::endl;

  solve_time = 0.0;
  setup_time = 0.0;
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      isize n_eq(dim * 3);
      isize n_in(dim * 3);

      T sparsity_factor = 0.15;  // level of sparsity
      T conditioning = 10.0;     // conditioning level for H

      auto H = ::proxsuite::proxqp::utils::rand::sparse_positive_definite_rand(
          dim, conditioning, sparsity_factor);
      auto A = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_eq, dim, sparsity_factor);
      auto C = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_in, dim, sparsity_factor);
      auto g = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto x_sol = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto b = A * x_sol;
      auto l = C * x_sol;
      auto u = (l.array() + 10).matrix().eval();

      proxsuite::proxqp::sparse::QP<T, isize> qp(H.cast<bool>(), A.cast<bool>(),
                                                 C.cast<bool>());
      qp.init(H, g, A, b, C, u, l);
      qp.solve();

      solve_time += qp.results.info.solve_time;
      setup_time += qp.results.info.setup_time;
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  }
  std::cout << "Setup Time consumption(sparse): " << setup_time / 1e6 << "s"
            << std::endl
            << "Solve Time consumption(sparse): " << solve_time / 1e6 << "s"
            << std::endl;

  return 0;
}

I modified the proxsuite/examples/cpp/overview-simple.cpp file and compiled it with the command from the official documentation: "g++ -std=c++17 examples/cpp/overview-simple.cpp -o overview-simple $(pkg-config --cflags proxsuite)".
Here is a typical result (whether or not vectorization is enabled when installing proxsuite)


jcarpent commented on July 28, 2024

You are missing the -O3 -march=native flags, which are needed to enable vectorization for your CPU configuration.
@fabinsch will provide more details soon.


fabinsch commented on July 28, 2024

Hey @RobustControl, thanks for providing this benchmark file. I suggest adding another loop that repeats init and solve N times and averages the timings, so that one-off latency is not what gets measured. I also modified the file to be compatible with the newest version of proxsuite (v0.2.0) and to take the sparsity factor into account in the sparse case (you were overwriting it on each iteration).

The timings are the following on my Intel i7-11850H and Ubuntu 20.04:

sparsity_factor: 0.1
Setup Time consumption(dense): 0.00246659s
Solve Time consumption(dense): 0.00977696s
sparsity_factor: 0.2
Setup Time consumption(dense): 0.00414675s
Solve Time consumption(dense): 0.0160001s
sparsity_factor: 0.3
Setup Time consumption(dense): 0.00586993s
Solve Time consumption(dense): 0.0223937s
sparsity_factor: 0.4
Setup Time consumption(dense): 0.00758707s
Solve Time consumption(dense): 0.0289477s
sparsity_factor: 0.1
Setup Time consumption(sparse): 0.000428303s
Solve Time consumption(sparse): 0.169426s
sparsity_factor: 0.2
Setup Time consumption(sparse): 0.00111007s
Solve Time consumption(sparse): 0.33179s
sparsity_factor: 0.3
Setup Time consumption(sparse): 0.00204471s
Solve Time consumption(sparse): 0.484549s
sparsity_factor: 0.4
Setup Time consumption(sparse): 0.00323929s
Solve Time consumption(sparse): 0.651238s

To compile the file, I used:

 g++ -O3 -march=native -DNDEBUG -DPROXSUITE_VECTORIZE -std=gnu++17 timings.cpp -o timings $(pkg-config --cflags proxsuite)

I will document this in our readme file; thanks for pointing out that clear instructions were missing. Using only the -DPROXSUITE_VECTORIZE option is not enough: you also need to tell the compiler to use the corresponding instruction set for your CPU, see also here.

The file timings.cpp has the following content:

#include <iostream>  // std::cout, std::endl

#include <proxsuite/proxqp/dense/dense.hpp>
#include <proxsuite/proxqp/sparse/sparse.hpp>  // get the sparse API of ProxQP
#include <proxsuite/proxqp/utils/random_qp_problems.hpp>  // used for generating a random convex qp

using namespace proxsuite::proxqp;
using T = double;

int main() {
  double N = 100;
  double counter = 0.0;  // outer loop
  double solve_time = 0.0;
  double setup_time = 0.0;
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      dense::isize n_eq(dim * 3);
      dense::isize n_in(dim * 3);
      T strong_convexity_factor(1.e-2);
      dense::Model<T> qp_random = utils::dense_strongly_convex_qp(
          dim, n_eq, n_in, sparsity_factor, strong_convexity_factor);

      for (int i = 0; i < N; i++) {
        dense::QP<T> qp(dim, n_eq, n_in);
        qp.settings.max_iter = 10000;
        qp.settings.max_iter_in = 1000;
        qp.settings.eps_abs = 1e-5;
        qp.settings.eps_rel = 0;
        qp.init(qp_random.H, qp_random.g, qp_random.A, qp_random.b, qp_random.C,
                qp_random.l, qp_random.u);
        qp.solve();
        solve_time += qp.results.info.solve_time / N;
        setup_time += qp.results.info.setup_time / N;
      }
      counter += 1.0;
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
    std::cout << "Setup Time consumption(dense): " << setup_time / (1e6 * counter) << "s"
              << std::endl
              << "Solve Time consumption(dense): " << solve_time / (1e6 * counter) << "s"
              << std::endl;
    counter = 0.0;
  }

  solve_time = 0.0;
  setup_time = 0.0;

  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      isize n_eq(dim * 3);
      isize n_in(dim * 3);  
      T conditioning = 10.0;     // conditioning level for H  
      auto H = ::proxsuite::proxqp::utils::rand::sparse_positive_definite_rand(
          dim, conditioning, sparsity_factor);
      auto A = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_eq, dim, sparsity_factor);
      auto C = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_in, dim, sparsity_factor);
      auto g = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto x_sol = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto b = A * x_sol;
      auto l = C * x_sol;
      auto u = (l.array() + 10).matrix().eval();  
      for (int i = 0; i < N; i++) {
        proxsuite::proxqp::sparse::QP<T, isize> qp(H.cast<bool>(), A.cast<bool>(),
                                                   C.cast<bool>());
      
        qp.settings.max_iter = 10000;
        qp.settings.max_iter_in = 1000;
        qp.settings.eps_abs = 1e-5;
        qp.settings.eps_rel = 0;
        qp.init(H, g, A, b, C, l, u);
        qp.solve();
        solve_time += qp.results.info.solve_time / N;
        setup_time += qp.results.info.setup_time / N;
      }
      counter += 1.0;  
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  
    std::cout << "Setup Time consumption(sparse): " << setup_time / (1e6 * counter) << "s"
              << std::endl
              << "Solve Time consumption(sparse): " << solve_time / (1e6 * counter) << "s"
              << std::endl;
    counter = 0.0;
  }

  return 0;
}

Note after discussing with @Bambade: your specific setup of having n_constraint = 3 * n_vars is not favorable for the current implementation of the sparse backend. If you have fewer constraints, like n_constraint = 0.1 * n_vars, using the sparse backend gets more interesting.


jcarpent commented on July 28, 2024

I will close this issue as it seems to be solved.


RobustControl commented on July 28, 2024

Hello @fabinsch, thank you for your detailed reply! I ran your example and got the same result.
But I have a question about your test code: why is the division by N placed inside the for loop? I think you meant to write the code as follows.

      for (int i = 0; i < N; i++) {
        proxsuite::proxqp::sparse::QP<T, isize> qp(H.cast<bool>(), A.cast<bool>(),
                                                   C.cast<bool>());
      
        qp.settings.max_iter = 10000;
        qp.settings.max_iter_in = 1000;
        qp.settings.eps_abs = 1e-5;
        qp.settings.eps_rel = 0;
        qp.init(H, g, A, b, C, l, u);
        qp.solve();
        solve_time += qp.results.info.solve_time;
        setup_time += qp.results.info.setup_time;
      }
      solve_time /= N;
      setup_time /= N;

I will try enabling vectorization in our project later.


fabinsch commented on July 28, 2024

a / N + b / N = (a + b) / N

If N is very large, it is indeed better to divide outside the loop. But sure, you can do it outside the loop to have fewer divisions.
