
Comments (7)

Maratyszcza avatar Maratyszcza commented on May 10, 2024

Here is the list of reasons why NNPACK doesn't use OpenMP (ordered by importance):

  1. Not all platforms support OpenMP. In particular, the default compilers in Xcode and (Portable) Native Client do not support it.
  2. pthreadpool is a small threading library with well-documented source code, so it is easy to modify for NNPACK's additional needs. In contrast, OpenMP implementations are huge, intertwined with compiler front-ends and internal representations, and have a steep learning curve.
  3. NNPACK uses size_t everywhere, but OpenMP requires int loop counters. Layering NNPACK's threading on top of OpenMP would invite subtle bugs.
  4. Unlike OpenMP, pthreadpool can parallelize 2D loops (and does it without division for each loop iteration).
  5. pthreadpool uses work stealing for balancing load between different threads. It is a more efficient scheduling strategy than the ones implemented in OpenMP, and it produces predictable memory access patterns, which NNPACK relies on.
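
Point 4 can be illustrated with a minimal sketch. This is not pthreadpool's actual source; the names `task_2d_fn`, `run_2d_slice`, `slice_bounds`, `grid_t`, and `mark_visit` are hypothetical and exist only for illustration. The idea shown is that a 2D loop is flattened into one linear range, each worker takes a contiguous slice, and the (i, j) pair is recovered with a single division at the start of the slice and then advanced by increments. All counters are size_t, in line with point 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task signature: called once per (i, j) pair. */
typedef void (*task_2d_fn)(void *context, size_t i, size_t j);

/*
 * Run the slice [start, end) of the flattened range_i x range_j space.
 * One division and one modulo recover the starting (i, j); afterwards
 * the indices advance by increment and wrap-around only.
 */
void run_2d_slice(task_2d_fn task, void *context,
                  size_t range_j, size_t start, size_t end)
{
    assert(range_j != 0 || start >= end);
    if (start >= end) {
        return;
    }
    size_t i = start / range_j;
    size_t j = start % range_j;
    for (size_t linear = start; linear < end; linear++) {
        task(context, i, j);
        if (++j == range_j) {
            j = 0;
            i++;
        }
    }
}

/*
 * Static split of `total` items among `workers` slices (assumption: a
 * real thread pool would rebalance these slices via work stealing).
 * The first `total % workers` slices get one extra item.
 */
void slice_bounds(size_t total, size_t workers, size_t w,
                  size_t *start, size_t *end)
{
    assert(workers != 0 && w < workers);
    size_t base = total / workers;
    size_t extra = total % workers;
    *start = w * base + (w < extra ? w : extra);
    *end = *start + base + (w < extra ? 1 : 0);
}

/* Example task: count visits in a row-major grid passed as context. */
typedef struct {
    unsigned *visits;
    size_t range_j;
} grid_t;

void mark_visit(void *context, size_t i, size_t j)
{
    grid_t *grid = context;
    grid->visits[i * grid->range_j + j]++;
}
```

In a real pool, each worker thread would call `run_2d_slice` on its own bounds; calling it once per slice from a single thread visits every (i, j) pair exactly once, which is an easy way to check the index arithmetic.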

from nnpack.

Darwin2011 avatar Darwin2011 commented on May 10, 2024

Thanks!


Darwin2011 avatar Darwin2011 commented on May 10, 2024

Thanks for your explanation, @Maratyszcza

  • OpenMP 3.0 and later support both signed and unsigned integer loop counters.
  • OpenMP can also collapse a two-level loop nest, although not gracefully.
  • For me, the real problem is that CPU utilization is low on a dual-socket Haswell machine: NNPACK cannot fully utilize all cores for thread-level parallelism.
  • I have recently been trying to tune NNPACK with OpenMP, and I hope you can give me some help.


Maratyszcza avatar Maratyszcza commented on May 10, 2024

@Darwin2011 Poor dual-socket performance is not related to the threading library, but rather the result of the assumption in NNPACK that all cores share L3 cache. When this assumption doesn't hold, the cores evict each other's cache lines.


Darwin2011 avatar Darwin2011 commented on May 10, 2024

@Maratyszcza
Is there any plan to fix this? I can also work on it if you can give me some help.


Maratyszcza avatar Maratyszcza commented on May 10, 2024

@Darwin2011 I'm thinking about a plan to improve multi-socket scaling. Fundamentally, two problems need to be solved:

  1. NNPACK assumes that all threads in a thread pool share L3 cache. NNPACK arranges computations in such a way that blocks of L3 cache prefetched by different cores are reused by all cores.
  2. NNPACK's memory allocation is not NUMA-aware, and all memory allocation is done on the calling thread, which means memory is allocated on the NUMA node that called the NNPACK function. Very likely, you could get better performance by running the NNPACK-linked application with numactl --interleave=all.


Darwin2011 avatar Darwin2011 commented on May 10, 2024

Can I just split the input images into two streams (one stream per socket) and prepare two thread pools, one for each stream?

