Sure, I should have included more metrics initially. Here's what you asked for:
Type | 2k | 4k | 8k | 16k |
---|---|---|---|---|
Uncompressed Scanlines | 11.32ms | 38.18ms | 143.57ms | 554.91ms |
Uncompressed Tiled | 12.11ms | 43.66ms | 156.71ms | 560.20ms |
DWAB Scanlines | 13.33ms | 32.94ms | 67.87ms | 248.48ms |
DWAB Tiled | 9.79ms | 23.24ms | 67.52ms | 235.73ms |
Those results were taken with calls to `RgbaInputFile::readPixels` and `TiledRgbaInputFile::readTiles`, surrounded by `QueryPerformanceCounter`. I saw many of your suggestions in #1717 for a similar (inverse) problem, and I don't think our method of loading from storage is the issue: we're able to saturate PCIe in our actual app (not the test app I used for these results), so I'm mainly wondering about decompression performance.
Any thoughts appreciated.
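The timing method described above (a high-resolution timer around each full-image read) can be sketched generically. This is a minimal Python sketch, not the harness used for these numbers: `time_read_ms` and `read_fn` are hypothetical names, `read_fn` stands in for a call such as `readPixels` or `readTiles`, and `time.perf_counter` plays the role of `QueryPerformanceCounter`. The warm-up run and the median over several repeats are assumptions added to reduce noise from caching and scheduling.

```python
import statistics
import time


def time_read_ms(read_fn, repeats=5, warmup=1):
    """Return the median wall-clock time of read_fn() in milliseconds.

    read_fn is a hypothetical stand-in for one full-image read
    (e.g. a readPixels or readTiles call wrapped in a lambda).
    """
    for _ in range(warmup):
        read_fn()                          # warm caches; timing discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()        # high-resolution monotonic timer
        read_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `time_read_ms(lambda: sum(range(1_000_000)))` times a dummy workload; in a real test the lambda would perform one complete read of the image.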
As a baseline, do you have metrics for uncompressed scanline and tiled reads?
This is fantastic data, thank you. Are you aware of the `staging/cpp-rewrite` branch? The Core has been rewritten in C with concurrency and general performance in mind; it will be merged to main once it matures. It would be great to have the same metrics for the branch, although I hate to ask you to take on more work.
I wasn't aware of that branch; thanks for pointing it out. I've given it a test run, and here's the equivalent data:
Type | 2k | 4k | 8k | 16k |
---|---|---|---|---|
Uncompressed Scanlines | 12.31ms | 33.26ms | 96.85ms | 275.59ms |
Uncompressed Tiled | 7.44ms | 19.96ms | 77.50ms | 253.73ms |
DWAB Scanlines | 21.48ms | 42.18ms | 107.85ms | 380.17ms |
DWAB Tiled | 14.86ms | 31.81ms | 99.20ms | 391.36ms |
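Putting the two tables side by side makes the comparison concrete. A minimal Python sketch (the names `MAIN`, `BRANCH` and `ratios` are illustrative; the numbers are copied from the two tables in this thread) that computes the branch/main read-time ratio for each row and image size:

```python
# Read times in ms copied from the tables above, for 2k, 4k, 8k and 16k.
MAIN = {
    "Uncompressed Scanlines": [11.32, 38.18, 143.57, 554.91],
    "Uncompressed Tiled":     [12.11, 43.66, 156.71, 560.20],
    "DWAB Scanlines":         [13.33, 32.94, 67.87, 248.48],
    "DWAB Tiled":             [9.79, 23.24, 67.52, 235.73],
}
BRANCH = {
    "Uncompressed Scanlines": [12.31, 33.26, 96.85, 275.59],
    "Uncompressed Tiled":     [7.44, 19.96, 77.50, 253.73],
    "DWAB Scanlines":         [21.48, 42.18, 107.85, 380.17],
    "DWAB Tiled":             [14.86, 31.81, 99.20, 391.36],
}


def ratios(main, branch):
    """branch/main time ratio per row and size; below 1.0 means the branch is faster."""
    return {
        name: [round(b / m, 2) for m, b in zip(main[name], branch[name])]
        for name in main
    }


for name, row in ratios(MAIN, BRANCH).items():
    print(f"{name:24s}", row)
```

On these numbers the DWAB rows come out between about 1.28 and 1.66 (the branch is slower at every size), while the uncompressed rows drop to about 0.45 to 0.50 at 16k.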
The speedups for uncompressed data are quite incredible! Unfortunately, as things currently stand, DWA compression seems to take a noticeable decompression performance hit on my system: DWAB reads are roughly 1.3x to 1.7x slower than on main. I've double-checked everything, and here's a brief look at where my CPU is spending time on `staging/cpp_core_rewrite`:
16k raw scanline:
Function | Share of total CPU time |
---|---|
`unpack_16bit_4chan_interleave_rev` | 75.11% |
`default_read_func` | 15.61% |
other | ~9.28% |
16k raw tile:
Function | Share of total CPU time |
---|---|
`unpack_16bit_4chan_interleave_rev` | 79.97% |
`default_read_func` | 19.19% |
other | ~0.84% |
16k DWAB scanline:
Function | Share of total CPU time |
---|---|
`DwaCompressor_uncompress` | 73.6% |
`unpack_16bit_4chan_interleave_rev` | 19.21% |
`DwaCompressor_destroy` | 4.68% |
other | ~2% |
16k DWAB tiles:
Function | Share of total CPU time |
---|---|
`DwaCompressor_uncompress` | 56.55% |
`unpack_16bit_4chan_interleave_rev` | 31.85% |
`DwaCompressor_destroy` | 10.82% |
other | ~1% |
A slightly finer-grained, but still brief, look at `DwaCompressor_uncompress` on `staging/cpp_core_rewrite`:
`DwaCompressor_uncompress` breakdown (scanline & tiled are similar):
Function | Share of `DwaCompressor_uncompress` time |
---|---|
`fromHalfZigZag_scalar` | ~20% |
`DwaCompressor_initializeBuffers` | ~20% (scanline 22%, tiled 18%) |
`convertFloatToHalf64_scalar` | ~17% |
`internal_huf_decompress` | ~11% |
other | ~32% |
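For context on `fromHalfZigZag_scalar`: DWA encodes 8x8 blocks with a DCT, and this step, as its name suggests, reorders each block's 64 coefficients from zigzag scan order back to row-major order while widening them from half to float. A minimal Python sketch, assuming the JPEG-style zigzag scan (`zigzag_order` and `from_half_zigzag` are hypothetical names, not OpenEXR functions):

```python
import struct


def zigzag_order(n=8):
    """Row-major indices of an n x n block, listed in JPEG-style zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                   # s = row + col selects one anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right, odd ones top-right to bottom-left.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend(r * n + (s - r) for r in rows)
    return order


ZIGZAG = zigzag_order(8)


def from_half_zigzag(src_bytes):
    """Un-zigzag one 8x8 block: 64 little-endian halves in scan order -> 64 row-major floats."""
    src = struct.unpack("<64e", src_bytes)       # 'e' = IEEE 754 half, widened to Python float
    dst = [0.0] * 64
    for k, idx in enumerate(ZIGZAG):
        dst[idx] = src[k]                        # k-th scanned coefficient goes to its (row, col)
    return dst
```

In a scalar path this amounts to 64 individual loads and half-to-float conversions per block, which is the kind of work SIMD half-conversion instructions (such as x86 F16C) can batch.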
I notice that, as you mentioned, the `cpp_core_rewrite` DWA code paths are quite different from the `main` ones, so a comparison may not be very useful, but here's a brief attempt:
`DwaCompressor::uncompress` on `main` (scanline & tiled are similar):
Function | Share of `DwaCompressor::uncompress` time |
---|---|
`Imf_3_3::<anon namespace>::fromHalfZigZag_scalar` | ~6% |
`Imf_3_3::<anon namespace>::convertFloatToHalf64_scalar` | ~27% |
`Imf_3_3::hufUncompress` | ~12% |
other | ~55% |
(I couldn't find an equivalent of `DwaCompressor_initializeBuffers`: `Imf_3_3::DwaCompressor::initializeBuffers` takes only ~0.07% of `DwaCompressor::uncompress`, so I imagine this work happens elsewhere.)
How do these timings line up with what you'd expect from the branch? I don't believe my CPU is making use of many of the optimisations in `internal_dwa_simd.h`, which is a shame.
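How much SIMD paths could help is bounded by Amdahl's law. A rough sketch, assuming `fromHalfZigZag_scalar` and `convertFloatToHalf64_scalar` are the parts a SIMD path would accelerate, and using the approximate shares from the 16k DWAB scanline profiles on `staging/cpp_core_rewrite` (`amdahl_speedup` is a hypothetical helper, and the 4x factor is purely illustrative):

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime becomes `factor` times faster."""
    if not 0.0 <= fraction <= 1.0 or factor <= 0:
        raise ValueError("fraction must be in [0, 1] and factor must be > 0")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# DwaCompressor_uncompress is 73.6% of total CPU time in the 16k DWAB scanline
# profile; within it, fromHalfZigZag_scalar (~20%) and convertFloatToHalf64_scalar
# (~17%) account for ~37%.
p = 0.736 * (0.20 + 0.17)

print(f"share of total time: {p:.3f}")
print(f"overall speedup if 4x faster: {amdahl_speedup(p, 4.0):.2f}")
print(f"upper bound (infinitely fast): {1.0 / (1.0 - p):.2f}")
```

With p ≈ 0.27, a 4x faster path for both functions gives only about 1.26x overall, and the ceiling is about 1.37x. By this rough estimate, SIMD alone would bring the 16k DWAB scanline read from 380.17ms to roughly 277ms at best, still above main's 248.48ms; profile shares are CPU time, so wall-clock gains under multithreading may differ.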
Aside from that, any further suggestions to try would be appreciated. If this is the best we'll get on CPU, that's fine: real-time feels quite far away, and I realise I'm asking for long shots.