Comments (7)
Here is the list of reasons why NNPACK doesn't use OpenMP (ordered by importance):
- Not all platforms support OpenMP. In particular, the default compilers in Xcode and (Portable) Native Client do not support it.
- pthreadpool is a small threading library with well-documented source code. It is easy to modify for the additional needs of NNPACK. In contrast, OpenMP implementations are huge, intertwined with compiler front-ends and internal representations, and have a steep learning curve.
- NNPACK uses size_t everywhere, but OpenMP requires int loop counters. Threading on top of OpenMP would be a source of subtle bugs.
- Unlike OpenMP, pthreadpool can parallelize 2D loops (and does it without a division for each loop iteration).
- pthreadpool uses work stealing to balance load between threads. It is a more efficient scheduling strategy than the ones implemented in OpenMP, and it produces predictable memory access patterns, which NNPACK relies on.
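To illustrate the 2D-loop point, here is a minimal sketch of how a chunk of a flattened 2D iteration space can be walked with one division per chunk instead of one per iteration. It is not pthreadpool's actual code: the names task_2d_fn, run_2d_chunk, visit_counts, and count_visit are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical callback type: receives the 2D index of one work item. */
typedef void (*task_2d_fn)(void *context, size_t i, size_t j);

/* Processes linear items [start, end) of an iteration space whose inner
 * dimension has range_j elements. The 2D index is derived with a single
 * division/modulo per chunk and then advanced by increment-with-carry. */
static void run_2d_chunk(task_2d_fn task, void *context,
                         size_t range_j, size_t start, size_t end) {
    if (start >= end || range_j == 0) {
        return;
    }
    size_t i = start / range_j; /* the only division for this chunk */
    size_t j = start % range_j;
    for (size_t linear = start; linear < end; linear++) {
        task(context, i, j);
        if (++j == range_j) { /* carry into the outer index */
            j = 0;
            i++;
        }
    }
}

/* Example task: counts how many times each (i, j) cell is visited. */
struct visit_counts {
    size_t range_j;
    unsigned *counts; /* row-major array of range_i * range_j counters */
};

static void count_visit(void *context, size_t i, size_t j) {
    struct visit_counts *v = context;
    v->counts[i * v->range_j + j]++;
}
```

In a real thread pool each worker would receive its own [start, end) range; calling run_2d_chunk on two adjacent ranges, as two workers would, visits every cell exactly once.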
from nnpack.
Thanks!
from nnpack.
Thanks for your explanation, @Maratyszcza
- OpenMP 3.0 now supports both signed and unsigned loop counters.
- OpenMP can also collapse a 2-level loop nest, even though it is not graceful.
- For me, the real problem is that on a dual-socket Haswell machine the CPU utilization is low. NNPACK cannot fully utilize all cores for thread parallelism.
- I have recently been trying to tune NNPACK with OpenMP, and I hope you can give me some help.
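For reference, the two OpenMP 3.0 features mentioned above can be sketched as follows. This is a generic illustration, not code from NNPACK; scale_matrix is a hypothetical function. The pragma takes effect only when the compiler is run with OpenMP enabled (for example -fopenmp with GCC or Clang); otherwise it is ignored and the loops run serially.

```c
#include <stddef.h>

/* Multiplies every element of a rows x cols row-major matrix by factor.
 * size_t (an unsigned type) is a valid loop counter since OpenMP 3.0,
 * and collapse(2) merges both loops into a single parallel iteration space. */
static void scale_matrix(float *data, size_t rows, size_t cols, float factor) {
    #pragma omp parallel for collapse(2)
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            data[r * cols + c] *= factor;
        }
    }
}
```

Note that collapse requires the loops to be perfectly nested with bounds that do not depend on the outer counter, and how the runtime recovers r and c from the collapsed index is implementation-defined.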
from nnpack.
@Darwin2011 Poor dual-socket performance is not related to the threading library, but rather the result of the assumption in NNPACK that all cores share L3 cache. When this assumption doesn't hold, the cores evict each other's cache lines.
from nnpack.
@Maratyszcza
Any plan to fix this? I can also work on this if you can give me some help.
from nnpack.
@Darwin2011 I'm thinking about a plan to improve multi-socket scaling. Fundamentally, two problems need to be solved:
- NNPACK assumes that all threads in a thread pool share an L3 cache. NNPACK arranges computations in such a way that blocks of L3 cache prefetched by different cores are reused by all cores.
- NNPACK's memory allocation is not NUMA-aware: all memory allocation is done on the calling thread, which means memory is allocated on the NUMA node of the thread that called the NNPACK function. Very likely, you could get better performance by running the NNPACK-linked application with numactl --interleave all.
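As an aside on the second point, a different technique from numactl interleaving is to rely on the first-touch page placement that Linux uses by default: a page is placed on the NUMA node of the thread that first writes to it. The sketch below is not part of NNPACK, and alloc_first_touch, touch_slice, and struct slice are hypothetical names. It assumes large allocations (so that malloc returns untouched pages) and worker threads that are already pinned to the sockets that will later use each slice; neither assumption is enforced by the code.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One worker's slice of the shared buffer. */
struct slice {
    float *base;
    size_t count;
};

/* Each worker writes its own slice first, so under a first-touch policy
 * the backing pages are placed on the node where this thread runs. */
static void *touch_slice(void *arg) {
    struct slice *s = arg;
    memset(s->base, 0, s->count * sizeof(float));
    return NULL;
}

/* Hypothetical helper: allocates count floats and lets `threads` workers
 * zero disjoint slices. Returns NULL on failure; free with free(). */
static float *alloc_first_touch(size_t count, size_t threads) {
    if (count == 0 || threads == 0) {
        return NULL;
    }
    float *buffer = malloc(count * sizeof(float));
    pthread_t *ids = malloc(threads * sizeof(pthread_t));
    struct slice *slices = malloc(threads * sizeof(struct slice));
    if (!buffer || !ids || !slices) {
        free(buffer);
        free(ids);
        free(slices);
        return NULL;
    }
    size_t per_thread = (count + threads - 1) / threads; /* round up */
    size_t started = 0;
    for (; started < threads; started++) {
        size_t begin = started * per_thread;
        if (begin > count) begin = count;
        size_t end = begin + per_thread;
        if (end > count) end = count;
        slices[started].base = buffer + begin;
        slices[started].count = end - begin;
        if (pthread_create(&ids[started], NULL, touch_slice,
                           &slices[started]) != 0) {
            break;
        }
    }
    /* Join only the workers that were actually started. */
    for (size_t t = 0; t < started; t++) {
        pthread_join(ids[t], NULL);
    }
    int ok = (started == threads);
    free(ids);
    free(slices);
    if (!ok) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

Placement happens at page granularity, so slices that are not page-aligned will share boundary pages between nodes; this only matters for small buffers.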
from nnpack.
Can I just split the input images into two streams (one stream per socket) and prepare two thread pools, one for each stream?
from nnpack.