Comments (4)

GeorgeTattersallFn commented on August 28, 2024

Sure, I should have included some more metrics initially - here's what you asked for:

Type                      2k        4k        8k         16k
Uncompressed Scanlines    11.32ms   38.18ms   143.57ms   554.91ms
Uncompressed Tiled        12.11ms   43.66ms   156.71ms   560.20ms
DWAB Scanlines            13.33ms   32.94ms   67.87ms    248.48ms
DWAB Tiled                9.79ms    23.24ms   67.52ms    235.73ms

Those results were taken by timing calls to RgbaInputFile::readPixels and TiledRgbaInputFile::readTiles with QueryPerformanceCounter. I saw a lot of your suggestions in #1717 for a similar-ish (inverse) problem, and I don't think our method of loading from storage should be an issue: we're able to saturate PCIe in our actual app (not the test app I used for these results). So I'm mainly wondering about decompression performance.
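For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a wall-clock timing wrapper. It is not the test app used for the numbers above: `time_ms` is a hypothetical helper name, and it uses `std::chrono::steady_clock` as a portable stand-in for calling QueryPerformanceCounter directly.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once and return the elapsed wall-clock
// time in milliseconds. steady_clock is monotonic, so the result is not
// affected by system clock adjustments.
inline double time_ms(const std::function<void()>& work)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    work();
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Assuming `file` is an already opened RgbaInputFile with its frame buffer set, a read could then be timed with something like `time_ms([&] { file.readPixels(minY, maxY); })`. A single run includes any one-off costs, such as a cold file cache, so repeated runs are worth comparing.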

Any thoughts appreciated.

meshula commented on August 28, 2024

As a baseline, do you have metrics for uncompressed scanline and tiled reads?

meshula commented on August 28, 2024

This is fantastic data, thank you. Are you aware of the staging/cpp-rewrite branch? The Core has been rewritten in C with concurrency and general performance in mind; it will be merged to main once it matures. It would be great to have the same metrics for the branch, although I hate to ask you to take on more work.

GeorgeTattersallFn commented on August 28, 2024

I wasn't aware of that branch - thanks for pointing it out. I've given it a test run and here's the equivalent data:

Type                      2k        4k        8k         16k
Uncompressed Scanlines    12.31ms   33.26ms   96.85ms    275.59ms
Uncompressed Tiled        7.44ms    19.96ms   77.50ms    253.73ms
DWAB Scanlines            21.48ms   42.18ms   107.85ms   380.17ms
DWAB Tiled                14.86ms   31.81ms   99.20ms    391.36ms
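To compare the two tables directly, the ratio of main time to branch time can be computed per cell. The sketch below only restates the figures already posted in this thread; `Row`, `kRows` and `speedup` are illustrative names, not part of OpenEXR.

```cpp
#include <array>
#include <cstddef>

// One row per file type; columns are 2k, 4k, 8k, 16k, in milliseconds.
struct Row
{
    const char*           type;
    std::array<double, 4> main_ms;    // timings from main
    std::array<double, 4> branch_ms;  // timings from the rewrite branch
};

constexpr std::array<Row, 4> kRows = {{
    {"Uncompressed Scanlines", {11.32, 38.18, 143.57, 554.91}, {12.31, 33.26, 96.85, 275.59}},
    {"Uncompressed Tiled",     {12.11, 43.66, 156.71, 560.20}, {7.44, 19.96, 77.50, 253.73}},
    {"DWAB Scanlines",         {13.33, 32.94, 67.87, 248.48},  {21.48, 42.18, 107.85, 380.17}},
    {"DWAB Tiled",             {9.79, 23.24, 67.52, 235.73},   {14.86, 31.81, 99.20, 391.36}},
}};

// Ratio of main time to branch time: above 1 means the branch is faster,
// below 1 means the branch is slower.
constexpr double speedup(const Row& r, std::size_t col)
{
    return r.main_ms[col] / r.branch_ms[col];
}
```

At 16k this gives roughly 2.0x for uncompressed scanlines (554.91 / 275.59) and roughly 0.65x for DWAB scanlines (248.48 / 380.17), meaning the DWAB read takes about 1.5 times as long on the branch.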

The speedups for uncompressed data are quite incredible! Unfortunately, as things currently stand, DWA-compressed files seem to take a decompression performance hit on my system compared with main. I've double-checked everything and tried to include a brief look at where my CPU is spending time on staging/cpp_core_rewrite:

16k raw scanline:
|--------------------------------------------------| total CPU time 100%
|--------------------------------------|             unpack_16bit_4chan_interleave_rev 75.11%
|--------|                                           default_read_func 15.61%
|-----|                                              other ~9.28%

16k raw tile:
|--------------------------------------------------| total CPU time 100%
|----------------------------------------|           unpack_16bit_4chan_interleave_rev 79.97%
|----------|                                         default_read_func 19.19%
|-|                                                  other ~0.84%

16k DWAB scanline:
|--------------------------------------------------| total CPU time 100%
|-------------------------------------|              DwaCompressor_uncompress 73.6%
|----------|                                         unpack_16bit_4chan_interleave_rev 19.21%
|--|                                                 DwaCompressor_destroy 4.68%
|-|                                                  other ~2%


16k DWAB tiles:
|--------------------------------------------------| total CPU time 100%
|----------------------------|                       DwaCompressor_uncompress 56.55%
|----------------|                                   unpack_16bit_4chan_interleave_rev 31.85%
|-----|                                              DwaCompressor_destroy 10.82%
|-|                                                  other ~1%

A slightly finer grained, but still brief, look at DwaCompressor_uncompress on staging/cpp_core_rewrite:

DwaCompressor_uncompress (scanline & tiled are similar)
|--------------------------------------------------| total CPU time 100%
|----------|                                         fromHalfZigZag_scalar ~20%
|----------|                                         DwaCompressor_initializeBuffers ~20% (scanline 22%, tiled 18%)
|---------|                                          convertFloatToHalf64_scalar ~17%
|------|                                             internal_huf_decompress ~11%
|----------------|                                   other ~32%

I notice that, as you mentioned, the cpp_core_rewrite DWA codepaths are quite different from the ones on main, so a comparison might not be much use, but here's a brief attempt:

DwaCompressor::uncompress (scanline & tiled are similar)
|--------------------------------------------------| total CPU time 100%
|---|                                                Imf_3_3::<anon namespace>::fromHalfZigZag_scalar ~6%
|--------------|                                     Imf_3_3::<anon namespace>::convertFloatToHalf64_scalar ~27%
|------|                                             Imf_3_3::hufUncompress ~12%
|----------------------------|                       other ~55%
(I was unable to find an equivalent to DwaCompressor_initializeBuffers. Imf_3_3::DwaCompressor::initializeBuffers seems to take up only ~0.07% of DwaCompressor::uncompress, so I imagine this functionality happens elsewhere.)

How do these timings line up with what you'd expect from the branch? I don't believe my CPU is making use of a lot of the optimisations in internal_dwa_simd.h, which is a shame.
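One thing that may be worth ruling out is whether the build enables the relevant instruction sets at compile time. The sketch below reports which common x86 SIMD feature macros the compiler defines for the translation unit it is compiled in. `compiled_simd_features` is a hypothetical helper. It says nothing about how OpenEXR itself was built or about any runtime CPU detection the library may do, so it is only a rough check on the compiler flags in use.

```cpp
#include <string>

// Hypothetical helper: list the SIMD feature macros the compiler defined
// for this translation unit. MSVC does not define __SSE2__ or __F16C__,
// so SSE2 is inferred from _M_X64 / _M_IX86_FP there.
inline std::string compiled_simd_features()
{
    std::string out;
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    out += "SSE2 ";
#endif
#if defined(__SSE4_1__)
    out += "SSE4.1 ";
#endif
#if defined(__AVX__)
    out += "AVX ";
#endif
#if defined(__F16C__)
    out += "F16C ";
#endif
#if defined(__AVX2__)
    out += "AVX2 ";
#endif
    if (out.empty())
        out = "none";
    return out;
}
```

Compiling this with the same flags as the test app and printing the result shows whether options such as `/arch:AVX2` (MSVC) or `-mavx2 -mf16c` (GCC/Clang) are actually reaching the compiler.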

Aside from that, if you have any more suggestions to try, they'd be appreciated. Though, if this is the best we'll get on CPU, that's fine - it feels like real-time is quite far away, and I realise I'm asking for long shots.
