Comments (6)

htot commented on July 23, 2024

@aklomp, are these in any way useful?

from base64.

aklomp commented on July 23, 2024

@htot Thanks for your work. It's interesting to see that not all "improvements" to the library have led to actual improvements in real-world benchmarks. This shows that we need to be careful when introducing new tricks, because some users may end up worse off. That said, apart from SSSE3, the trend seems to be upward.

The SSSE3 thing could be due to register pressure. I think I saw the same degradation happen on my Atom N270 (a super weird processor, a 32-bit core with up to SSSE3 support), and when I tried to hand-optimize it with inline assembler, I found out that this architecture has far fewer SSE registers available to it than the big-boy x86s (8 XMM registers in 32-bit mode versus 16 in 64-bit mode). That results in lots of register moves and slow code. I didn't bother much with it because I considered that use case so niche...
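As a rough illustration of the register-count difference, the sketch below reports how many XMM registers the compilation target exposes, using only standard predefined compiler macros (the function name is hypothetical). With only eight registers, an SSSE3 kernel that tries to keep its lookup tables, masks and several working vectors resident can run out and spill to the stack.

```c
/* Hypothetical helper: number of XMM registers available to SSE code
 * on the current compilation target. 32-bit x86 exposes only xmm0-xmm7;
 * x86-64 adds xmm8-xmm15 (AVX-512 targets have 32, not modelled here). */
static int xmm_register_count(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return 16;
#elif defined(__i386__) || defined(_M_IX86)
    return 8;
#else
    return 0; /* not an x86 target */
#endif
}
```

Building the same SSSE3 code with `-m32` and with `-m64` and comparing the generated assembly for stack spills is one way to check whether register pressure explains the N270 result.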

I think these benchmarks are cool and might be useful as a jumping-off point for analyzing performance degradations in past commits, but apart from that I don't see a major use for them. The idea of graphing out performance over time is very powerful though, and I'll try to remember it for my toolbox.


htot commented on July 23, 2024

I think the Atom is, like Bay Trail (and Edison), an x86_64 CPU; they support SSSE3 but not AVX. The core is Silvermont (SLM), which has a penalty for long 64-bit instructions (complicated story), and that might be the case here too (I have not tested in i686 mode). If so, Goldmont/Airmont may behave completely differently (but I don't have those here).

My i7-10700, by the way, has 16MB of L3 cache, so the benchmarks above are not really representative (in practice nobody would encode the same string twice). I patched the benchmark to add a 100MB string and found that in all cases except "plain" we are near the DDR bandwidth limit. And even "plain" reaches the bandwidth limit with OpenMP.
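A minimal sketch of that kind of benchmark patch, assuming POSIX `clock_gettime` and using a plain scalar encoder as the workload (not the library's own codec): timing the same encoder on a buffer that fits in L3 and on one far larger than it separates cache effects from DRAM bandwidth. The names `b64_encode_plain` and `bench_encode_mbps` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical plain scalar encoder used as the timed workload.
 * Writes 4 * ceil(len / 3) bytes to out and returns that count. */
static size_t b64_encode_plain(const uint8_t *src, size_t len, char *out)
{
    char *o = out;
    size_t i = 0;
    for (; i + 3 <= len; i += 3) {
        uint32_t v = (uint32_t)src[i] << 16 | (uint32_t)src[i + 1] << 8 | src[i + 2];
        *o++ = b64_alphabet[(v >> 18) & 63];
        *o++ = b64_alphabet[(v >> 12) & 63];
        *o++ = b64_alphabet[(v >> 6) & 63];
        *o++ = b64_alphabet[v & 63];
    }
    if (i < len) { /* 1 or 2 trailing bytes, padded with '=' */
        uint32_t v = (uint32_t)src[i] << 16;
        if (i + 1 < len)
            v |= (uint32_t)src[i + 1] << 8;
        *o++ = b64_alphabet[(v >> 18) & 63];
        *o++ = b64_alphabet[(v >> 12) & 63];
        *o++ = (i + 1 < len) ? b64_alphabet[(v >> 6) & 63] : '=';
        *o++ = '=';
    }
    return (size_t)(o - out);
}

/* Hypothetical benchmark: one encode pass over len bytes, returning input MB/s.
 * Both buffers are touched before timing so first-touch page faults are not measured. */
static double bench_encode_mbps(size_t len)
{
    assert(len > 0);
    size_t outlen = (len + 2) / 3 * 4;
    uint8_t *src = malloc(len);
    char *out = malloc(outlen);
    if (src == NULL || out == NULL) {
        free(src);
        free(out);
        return -1.0;
    }
    for (size_t i = 0; i < len; i++)
        src[i] = (uint8_t)(i * 131u);
    memset(out, 0, outlen);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t n = b64_encode_plain(src, len, out);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile char sink = out[n - 1]; /* keep the encode from being optimized away */
    (void)sink;
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(out);
    return (double)len / secs / 1e6;
}
```

Comparing `bench_encode_mbps(1u << 20)` with `bench_encode_mbps(100u << 20)` on the same machine shows whether the small run was flattered by the cache; a real benchmark would also repeat each measurement and report the best or median.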

All these optimizations are particularly useful on the slow Atoms, but that is exactly where we saw a degradation. I'll add some improvements here; maybe you can label this "not a bug"?


aklomp commented on July 23, 2024

The Intel Atom N270 really is a 32-bit Diamondville core with SSSE3 extensions, as you can see on Intel's site. It's a very low power (2.5W TDP), passively cooled mobile processor with this weird feature mix for some reason. I've been using it in my home server for the last 12 years. Bit slow at times but gets the job done. Anyway.

I created a "benchmarking" label and added it to this issue. I'll leave it open for the time being, then.


htot commented on July 23, 2024

This is with a 100MB buffer and OPENMP.

[Figure: decode-threaded]

Here you can see that the earlier i7 results were optimistic because of the L3 cache (Edison is not affected; it has no L3 cache).

[Figure: encode-threaded]

And something strange is happening here with AVX2 decoding.
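The threaded runs above use OPENMP. As a hedged illustration of why base64 encoding parallelises so easily (not necessarily how the library's OpenMP path is written), the sketch below splits the input into chunks whose length is a multiple of 3. Each 3-byte group maps to exactly 4 output characters, so every chunk's output position is known in advance and threads never need to coordinate. All names and the chunk size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes, where len is a multiple of 3, with no padding. */
static void encode_triplets(const uint8_t *src, size_t len, char *out)
{
    assert(len % 3 == 0);
    for (size_t i = 0; i < len; i += 3, out += 4) {
        uint32_t v = (uint32_t)src[i] << 16 | (uint32_t)src[i + 1] << 8 | src[i + 2];
        out[0] = alphabet[(v >> 18) & 63];
        out[1] = alphabet[(v >> 12) & 63];
        out[2] = alphabet[(v >> 6) & 63];
        out[3] = alphabet[v & 63];
    }
}

/* Hypothetical threaded encoder: chunk c starts at input offset c * CHUNK and
 * at output offset (c * CHUNK / 3) * 4. Only the final 1-2 bytes are padded.
 * Without -fopenmp the pragma is ignored and the loop runs serially. */
static size_t b64_encode_parallel(const uint8_t *src, size_t len, char *out)
{
    const size_t CHUNK = 3u * 65536u; /* bytes per task; must be a multiple of 3 */
    const size_t full = len / 3 * 3;  /* bytes covered by complete triplets */
    const long nchunks = (long)((full + CHUNK - 1) / CHUNK);

    #pragma omp parallel for schedule(static)
    for (long c = 0; c < nchunks; c++) {
        size_t off = (size_t)c * CHUNK;
        size_t n = (full - off < CHUNK) ? full - off : CHUNK;
        encode_triplets(src + off, n, out + off / 3 * 4);
    }

    size_t rem = len - full; /* 0, 1 or 2 leftover bytes */
    if (rem > 0) {
        char *o = out + full / 3 * 4;
        uint32_t v = (uint32_t)src[full] << 16;
        if (rem == 2)
            v |= (uint32_t)src[full + 1] << 8;
        o[0] = alphabet[(v >> 18) & 63];
        o[1] = alphabet[(v >> 12) & 63];
        o[2] = (rem == 2) ? alphabet[(v >> 6) & 63] : '=';
        o[3] = '=';
    }
    return (len + 2) / 3 * 4;
}
```

Compile with `-fopenmp` (GCC or Clang) to enable the threads. Once the threads together saturate the memory bus, adding more of them stops helping, which is consistent with even "plain" hitting the bandwidth limit with OpenMP.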

Nevertheless, looking at the history:

[Figure: i7-encode-large]

We know that the specialized encoders are much faster than plain, but since they are mostly limited by DDR bandwidth, that doesn't show here.
However, plain has seen some nice improvements over time.
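A back-of-the-envelope bound makes the bandwidth ceiling concrete, assuming for illustration that encoding is limited purely by a sustained DRAM bandwidth B (ignoring caches and non-temporal stores). Encoding N bytes reads N bytes and writes 4N/3 bytes:

```latex
t \;\ge\; \frac{N + \tfrac{4}{3}N}{B} = \frac{7N}{3B}
\qquad\Longrightarrow\qquad
\frac{N}{t} \;\le\; \frac{3}{7}\,B

% if every store also triggers a read-for-ownership, output bytes count twice:
t \;\ge\; \frac{N + 2\cdot\tfrac{4}{3}N}{B} = \frac{11N}{3B}
\qquad\Longrightarrow\qquad
\frac{N}{t} \;\le\; \frac{3}{11}\,B
```

Once a codec's input throughput sits near this bound, a faster inner loop can no longer show up in the results, which is why the SIMD encoders bunch together on the 100MB runs.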

[Figure: i7-decode-large]

After the early decoding improvements, not much changed on i7. The decoding performance hit seen on Edison (Atom, or maybe even unique to Silvermont) is not present on i7. It would be worth reviving the old decoder for when an Atom is detected.
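One way such a revival could be wired up is a one-time runtime dispatch. The sketch below uses the GCC/Clang x86 builtins `__builtin_cpu_supports` and `__builtin_cpu_is`; the decoder variants in the enum are hypothetical placeholders, not the library's real codec names, and only CPUs the compiler reports as `"silvermont"` are matched (whether Airmont parts are included is not assumed here).

```c
/* Hypothetical decoder variants; the library's real codec names may differ. */
typedef enum {
    DEC_PLAIN,
    DEC_SSSE3_OLD, /* the pre-regression SSSE3 decoder */
    DEC_SSSE3_CURRENT,
    DEC_AVX2
} decoder_kind;

/* Pick a decoder once at startup using GCC/Clang x86 CPU-detection builtins. */
static decoder_kind choose_decoder(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return DEC_AVX2;
    if (__builtin_cpu_supports("ssse3")) {
        /* Silvermont (Bay Trail, Edison) is where the regression was measured. */
        if (__builtin_cpu_is("silvermont"))
            return DEC_SSSE3_OLD;
        return DEC_SSSE3_CURRENT;
    }
#endif
    return DEC_PLAIN;
}

static const char *decoder_name(decoder_kind k)
{
    switch (k) {
    case DEC_SSSE3_OLD:     return "ssse3-old";
    case DEC_SSSE3_CURRENT: return "ssse3";
    case DEC_AVX2:          return "avx2";
    default:                return "plain";
    }
}
```

Calling `choose_decoder()` once and storing the result (or a function pointer derived from it) keeps the CPU check out of the hot decode loop.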


htot commented on July 23, 2024

I see. That's confusing; there are also 64-bit Diamondville CPUs (e.g. the Atom 230).
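Since the model family doesn't settle whether a given Atom is 64-bit capable, asking the CPU is the most reliable check. A minimal sketch using the `<cpuid.h>` header shipped with GCC and Clang (x86 only; the struct and function names are hypothetical) reads the bits discussed in this thread: SSSE3 is CPUID leaf 1, ECX bit 9; AVX is leaf 1, ECX bit 28; Intel 64 (long mode) is leaf 0x80000001, EDX bit 29. An N270 should report no long mode, while an Atom 230 should report it.

```c
#include <stdbool.h>
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Hypothetical summary of the features discussed in this thread. */
typedef struct {
    bool ssse3;     /* CPUID.1:ECX[9] */
    bool avx;       /* CPUID.1:ECX[28]; OS support (OSXSAVE/XGETBV) is not checked here */
    bool long_mode; /* CPUID.80000001h:EDX[29], Intel 64 */
} cpu_features;

static cpu_features probe_cpu(void)
{
    cpu_features f = { false, false, false };
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.ssse3 = (ecx >> 9) & 1u;
        f.avx = (ecx >> 28) & 1u;
    }
    /* __get_cpuid returns 0 if the extended leaf is not supported. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        f.long_mode = (edx >> 29) & 1u;
#endif
    return f;
}
```

On Linux the same long-mode bit also shows up as the `lm` flag in `/proc/cpuinfo`.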