
Comments (12)

RobustControl avatar RobustControl commented on July 28, 2024 1

a / N + b / N = (a + b) / N

If N is very large, it is indeed better to divide once outside the loop, and doing so also saves divisions.

Okay, I misunderstood.

from proxsuite.

jcarpent avatar jcarpent commented on July 28, 2024

Could you share your setup and a minimal example?
How do you set up the vectorization? Did you use -march=native when compiling?
Did you try the Python interface with conda?


RobustControl avatar RobustControl commented on July 28, 2024

Could you share your setup and a minimal example? How do you set up the vectorization? Did you use -march=native when compiling? Did you try the Python interface with conda?

I only used the C++ interface. I didn't try the Python interface.
I first installed libsimde-dev_0.7.2-4_all.deb, and then installed ProxSuite following the installation documentation at https://simple-robotics.github.io/proxsuite/md_doc_5_installation.html:
1. mkdir build && cd build
2. cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF
3. make
4. make install
I did not build the Python interface.

All my QP problems are quite large, and the data can't be exported directly to a file. Can you provide a C++ example of a QP problem that shows the difference between running with and without vectorization? I can then try it on my computer.


jcarpent avatar jcarpent commented on July 28, 2024

We have limited time to handle this issue. It would simplify our work if you could generate a random QP in C++ that matches your specificities. Could you please do that?


RobustControl avatar RobustControl commented on July 28, 2024

We have limited time to handle this issue. It would simplify our work if you could generate a random QP in C++ that matches your specificities. Could you please do that?

Please try the following code

#include <chrono>
#include <iostream>

#include <proxsuite/proxqp/dense/dense.hpp>
#include <proxsuite/proxqp/utils/random_qp_problems.hpp>  // used for generating a random convex qp

using namespace proxsuite::proxqp;
using T = double;

int main() {
  auto time_begin = std::chrono::system_clock::now();
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      dense::isize n_eq(dim * 3);
      dense::isize n_in(dim * 3);
      T strong_convexity_factor(1.e-2);
      dense::Model<T> qp_random = utils::dense_strongly_convex_qp(
          dim, n_eq, n_in, sparsity_factor, strong_convexity_factor);

      dense::QP<T> qp(dim, n_eq, n_in);
      qp.settings.max_iter = 10000;
      qp.settings.max_iter_in = 1000;
      qp.settings.eps_abs = 1e-5;
      qp.settings.eps_rel = 0;
      qp.init(qp_random.H, qp_random.g, qp_random.A, qp_random.b, qp_random.C,
              qp_random.u, qp_random.l);
      qp.solve();
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  }
  auto time_end = std::chrono::system_clock::now();

  std::cout << "Time consumption(s): "
            << std::chrono::duration_cast<std::chrono::milliseconds>(time_end -
                                                                     time_begin)
                       .count() /
                   1000.0
            << std::endl;
  return 0;
}

Whether or not the option "-DBUILD_WITH_VECTORIZATION_SUPPORT=OFF" is passed when compiling ProxSuite, this program takes about 8.5s to run.
My CPU is an Intel i7-12700K, and the OS is Ubuntu 20.04.


jcarpent avatar jcarpent commented on July 28, 2024

@RobustControl The way you time the loops is not correct.
You should only time qp.solve().
Also, how do you compile this small example? Could you share your command lines or CMake file?


RobustControl avatar RobustControl commented on July 28, 2024

Try the following code:

#include <iostream>

#include <proxsuite/proxqp/dense/dense.hpp>
#include <proxsuite/proxqp/sparse/sparse.hpp>  // get the sparse API of ProxQP
#include <proxsuite/proxqp/utils/random_qp_problems.hpp>  // used for generating a random convex qp

using namespace proxsuite::proxqp;
using T = double;

int main() {
  double solve_time = 0.0;
  double setup_time = 0.0;
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      dense::isize n_eq(dim * 3);
      dense::isize n_in(dim * 3);
      T strong_convexity_factor(1.e-2);
      dense::Model<T> qp_random = utils::dense_strongly_convex_qp(
          dim, n_eq, n_in, sparsity_factor, strong_convexity_factor);

      dense::QP<T> qp(dim, n_eq, n_in);
      qp.settings.max_iter = 10000;
      qp.settings.max_iter_in = 1000;
      qp.settings.eps_abs = 1e-5;
      qp.settings.eps_rel = 0;
      qp.init(qp_random.H, qp_random.g, qp_random.A, qp_random.b, qp_random.C,
              qp_random.u, qp_random.l);
      qp.solve();
      solve_time += qp.results.info.solve_time;
      setup_time += qp.results.info.setup_time;
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  }

  std::cout << "Setup Time consumption(dense): " << setup_time / 1e6 << "s"
            << std::endl
            << "Solve Time consumption(dense): " << solve_time / 1e6 << "s"
            << std::endl;

  solve_time = 0.0;
  setup_time = 0.0;
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      isize n_eq(dim * 3);
      isize n_in(dim * 3);

      T sparsity_factor = 0.15;  // level of sparsity
      T conditioning = 10.0;     // conditioning level for H

      auto H = ::proxsuite::proxqp::utils::rand::sparse_positive_definite_rand(
          dim, conditioning, sparsity_factor);
      auto A = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_eq, dim, sparsity_factor);
      auto C = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_in, dim, sparsity_factor);
      auto g = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto x_sol = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto b = A * x_sol;
      auto l = C * x_sol;
      auto u = (l.array() + 10).matrix().eval();

      proxsuite::proxqp::sparse::QP<T, isize> qp(H.cast<bool>(), A.cast<bool>(),
                                                 C.cast<bool>());
      qp.init(H, g, A, b, C, u, l);
      qp.solve();

      solve_time += qp.results.info.solve_time;
      setup_time += qp.results.info.setup_time;
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  }
  std::cout << "Setup Time consumption(sparse): " << setup_time / 1e6 << "s"
            << std::endl
            << "Solve Time consumption(sparse): " << solve_time / 1e6 << "s"
            << std::endl;

  return 0;
}

I modified the proxsuite/examples/cpp/overview-simple.cpp file and compiled it with the command from the official documentation: "g++ -std=c++17 examples/cpp/overview-simple.cpp -o overview-simple $(pkg-config --cflags proxsuite)".
Here is a typical result (the same whether or not vectorization is enabled when installing ProxSuite):
[image: result]


jcarpent avatar jcarpent commented on July 28, 2024

You are missing the -O3 -march=native flags, which enable vectorization for your CPU configuration.
@fabinsch will provide more details soon.


fabinsch avatar fabinsch commented on July 28, 2024

Hey @RobustControl, thanks for providing this benchmark file. I suggest adding another loop that runs init and solve N times and averages the timings, so that latency is not measured. I also modified the file to be compatible with the newest version of proxsuite (v0.2.0) and to take the sparsity factor into account in the sparse case (you were overwriting it on each iteration).

The timings on my Intel i7-11850H with Ubuntu 20.04 are the following:

sparsity_factor: 0.1
Setup Time consumption(dense): 0.00246659s
Solve Time consumption(dense): 0.00977696s
sparsity_factor: 0.2
Setup Time consumption(dense): 0.00414675s
Solve Time consumption(dense): 0.0160001s
sparsity_factor: 0.3
Setup Time consumption(dense): 0.00586993s
Solve Time consumption(dense): 0.0223937s
sparsity_factor: 0.4
Setup Time consumption(dense): 0.00758707s
Solve Time consumption(dense): 0.0289477s
sparsity_factor: 0.1
Setup Time consumption(sparse): 0.000428303s
Solve Time consumption(sparse): 0.169426s
sparsity_factor: 0.2
Setup Time consumption(sparse): 0.00111007s
Solve Time consumption(sparse): 0.33179s
sparsity_factor: 0.3
Setup Time consumption(sparse): 0.00204471s
Solve Time consumption(sparse): 0.484549s
sparsity_factor: 0.4
Setup Time consumption(sparse): 0.00323929s
Solve Time consumption(sparse): 0.651238s

To compile the file, I used

 g++ -O3 -march=native -DNDEBUG -DPROXSUITE_VECTORIZE -std=gnu++17 timings.cpp -o timings $(pkg-config --cflags proxsuite)

I will document this in our README file; thanks for pointing out that clear instructions were missing. Using only the -DPROXSUITE_VECTORIZE option is not enough: you also need to tell the compiler to use the corresponding instruction set for your CPU, see also here.

The file timings.cpp has the following content:

#include <iostream>

#include <proxsuite/proxqp/dense/dense.hpp>
#include <proxsuite/proxqp/sparse/sparse.hpp>  // get the sparse API of ProxQP
#include <proxsuite/proxqp/utils/random_qp_problems.hpp>  // used for generating a random convex qp

using namespace proxsuite::proxqp;
using T = double;

int main() {
  double N = 100;
  double counter = 0.0;  // outer loop
  double solve_time = 0.0;
  double setup_time = 0.0;
  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      dense::isize n_eq(dim * 3);
      dense::isize n_in(dim * 3);
      T strong_convexity_factor(1.e-2);
      dense::Model<T> qp_random = utils::dense_strongly_convex_qp(
          dim, n_eq, n_in, sparsity_factor, strong_convexity_factor);

      for (int i = 0; i < N; i++) {
        dense::QP<T> qp(dim, n_eq, n_in);
        qp.settings.max_iter = 10000;
        qp.settings.max_iter_in = 1000;
        qp.settings.eps_abs = 1e-5;
        qp.settings.eps_rel = 0;
        qp.init(qp_random.H, qp_random.g, qp_random.A, qp_random.b, qp_random.C,
                qp_random.l, qp_random.u);
        qp.solve();
        solve_time += qp.results.info.solve_time / N;
        setup_time += qp.results.info.setup_time / N;
      }
      counter += 1.0;
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
    std::cout << "Setup Time consumption(dense): " << setup_time / (1e6 * counter) << "s"
              << std::endl
              << "Solve Time consumption(dense): " << solve_time / (1e6 * counter) << "s"
              << std::endl;
    counter = 0.0;
  }

  solve_time = 0.0;
  setup_time = 0.0;

  for (T sparsity_factor = 0.1; sparsity_factor < 0.5; sparsity_factor += 0.1) {
    for (dense::isize dim = 100; dim < 200; dim += 20) {
      isize n_eq(dim * 3);
      isize n_in(dim * 3);  
      T conditioning = 10.0;     // conditioning level for H  
      auto H = ::proxsuite::proxqp::utils::rand::sparse_positive_definite_rand(
          dim, conditioning, sparsity_factor);
      auto A = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_eq, dim, sparsity_factor);
      auto C = ::proxsuite::proxqp::utils::rand::sparse_matrix_rand<T>(
          n_in, dim, sparsity_factor);
      auto g = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto x_sol = ::proxsuite::proxqp::utils::rand::vector_rand<T>(dim);
      auto b = A * x_sol;
      auto l = C * x_sol;
      auto u = (l.array() + 10).matrix().eval();  
      for (int i = 0; i < N; i++) {
        proxsuite::proxqp::sparse::QP<T, isize> qp(H.cast<bool>(), A.cast<bool>(),
                                                   C.cast<bool>());
      
        qp.settings.max_iter = 10000;
        qp.settings.max_iter_in = 1000;
        qp.settings.eps_abs = 1e-5;
        qp.settings.eps_rel = 0;
        qp.init(H, g, A, b, C, l, u);
        qp.solve();
        solve_time += qp.results.info.solve_time / N;
        setup_time += qp.results.info.setup_time / N;
      }
      counter += 1.0;  
    }
    std::cout << "sparsity_factor: " << sparsity_factor << std::endl;
  
    std::cout << "Setup Time consumption(sparse): " << setup_time / (1e6 * counter) << "s"
              << std::endl
              << "Solve Time consumption(sparse): " << solve_time / (1e6 * counter) << "s"
              << std::endl;
    counter = 0.0;
  }

  return 0;
}

Note after discussing with @Bambade: your specific setup of having n_constraint = 3 * n_vars is not favorable for the current implementation of the sparse backend. If you have fewer constraints, like n_constraint = 0.1 * n_vars, using the sparse backend gets more interesting.


jcarpent avatar jcarpent commented on July 28, 2024

I will close this issue as it seems to be solved.


RobustControl avatar RobustControl commented on July 28, 2024


Hello @fabinsch, thank you for your detailed reply! I ran your example and got the same result.
But I have a question about your test code: why is the division by N placed inside the for loop? I think you meant to write the code as follows.

      for (int i = 0; i < N; i++) {
        proxsuite::proxqp::sparse::QP<T, isize> qp(H.cast<bool>(), A.cast<bool>(),
                                                   C.cast<bool>());
      
        qp.settings.max_iter = 10000;
        qp.settings.max_iter_in = 1000;
        qp.settings.eps_abs = 1e-5;
        qp.settings.eps_rel = 0;
        qp.init(H, g, A, b, C, l, u);
        qp.solve();
        solve_time += qp.results.info.solve_time;
        setup_time += qp.results.info.setup_time;
      }
      solve_time /= N;
      setup_time /= N;

I will try to enable vectorization in our project later.


fabinsch avatar fabinsch commented on July 28, 2024

a / N + b / N = (a + b) / N

If N is very large, it is indeed better to divide once outside the loop, and doing so also saves divisions.

