Giter Club home page Giter Club logo

Comments (12)

mkj avatar mkj commented on July 21, 2024 1

I think having a default of -O3 for the crypto/maths libraries makes sense (that's where performance matters), but -Os is a better default for non-performance code. I guess there could be defaults for each of them which can be overridden with environment variables - LIBTOM_CFLAGS=-O3 from Dropbear's configure script perhaps.

from dropbear.

mkj avatar mkj commented on July 21, 2024

I wasn't aware of it, though haven't benchmarked.
libtommath was updated in 2020.79 which seems the most likely change.
Update LibTomMath to 1.2.0

update ltm to 1.1.0 and enable FIPS 186.4 compliant key-generation
(The "FIPS 186.4 compliant" change should have actually made 2048 RSA faster, 6 rounds rather than 8)

There are also a few changes to dbrandom.c though those seem less likely to change speed.

from dropbear.

davidbernard04 avatar davidbernard04 commented on July 21, 2024

About FIPS compliancy, it is actually 12 rounds rather than 8, since primes size for a 2048-bit RSA key are 1024 bits. But I've made a few test loops by hardcoding 8 rounds on the 2022.82 version, and it has no significant impact (some loops are actually worst, lol).

I've made another run of test with 80 keys generated on our ARMv5 platform, and that's in line with previous results:

  • 2018.76: 15.9 sec average for 80 keys
  • 2022.82: 38.1 sec average for 40 keys (with default 12 rounds)
  • 2022.82: 33.6 sec average for 40 keys (with hardcoded 8 rounds)

So I guess that's really the LibTomMath upgrade that affected the generation time.

Any idea what we can do to improve? Otherwise I suppose we'll have to increase our generation timeout on our low-power platforms. Well, that's the cost of progress. ;)

from dropbear.

mkj avatar mkj commented on July 21, 2024

Ah, the CFLAGS order changed in libtommath's own makefile. Previously its own -O3 came after Dropbear's default -Os, in 2022.82 it builds with -Os. Could you try the patch below? I see 30kB extra binary size on x64 though, will have to have a look at that.

--- a/libtommath/makefile_include.mk
+++ b/libtommath/makefile_include.mk
@@ -104,7 +104,7 @@ LIBTOOLFLAGS += -no-undefined
 endif
 
 # add in the standard FLAGS
-LTM_CFLAGS += $(CFLAGS)
+LTM_CFLAGS := $(CFLAGS) $(LTM_CFLAGS)
 LTM_LFLAGS += $(LFLAGS)
 LTM_LDFLAGS += $(LDFLAGS)
 LTM_LIBTOOLFLAGS += $(LIBTOOLFLAGS)

For reference
2018.76 (from ps aux so may be a bit different)
-D _FORTIFY_SOURCE=2 -D DROPBEAR_SERVER -D DROPBEAR_CLIENT ../../libtommath/bn_mp_fwrite.c -quiet -dumpbase bn_mp_fwrite.c -dumpbase-ext .c -mtune=generic -march=x86-64 -g -Os -O3 -Wno-pointer-sign -Wall -Wsign-compare -Wextra -Wshadow -Wsystem-headers -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -fno-strict-overflow -fPIE -fstack-protector-strong -funroll-loops -fomit-frame-pointer -fasynchronous-unwind-tables -fstack-protector-strong -Wformat-security -fstack-clash-protection -fcf-protection

2022.82
-Wall -Wsign-compare -Wextra -Wshadow -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -Wsystem-headers -O3 -funroll-loops -fomit-frame-pointer -Os -W -Wall -Wno-pointer-sign -fno-strict-overflow -fPIE -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Werror -ggdb -I../../libtommath -I../libtomcrypt/src/headers/ -I../../libtommath/../libtomcrypt/src/headers/ -I../ -I../../libtommath/../ -Wno-deprecated -I../libtomcrypt/src/headers/ -DLOCALOPTIONS_H_EXISTS -I. -I.. -DDROPBEAR_SERVER -DDROPBEAR_CLIENT ../../libtommath/bn_mp_div_2.c -o bn_mp_div_2.o

from dropbear.

davidbernard04 avatar davidbernard04 commented on July 21, 2024

Nice catch! I've tried with your patch and there is an improvement:

  • 2022.82: 35 sec average (-Os)
  • 2022.82: 25 sec average (-O3)

Although still not as fast as the 2018 version:

  • 2018.76: 16 sec average (-O3)

About binary size, there is an extra of ~20 KB on each dropbear and dropbearkey executables between -Os and -O3 for the 2022.82 version (ARM platform).

FYI on our side, we might choose to stick with the -Os flag to save some space (we have some platforms in dire need of free space, lol).

from dropbear.

mkj avatar mkj commented on July 21, 2024

OK right. When I get a chance I'll see if I can compare perf of the releases (and rummage for some older boards to try).
On x64 here the make timing; ./timing benchmark of standalone libtommath is the same between the versions.

from dropbear.

davidbernard04 avatar davidbernard04 commented on July 21, 2024

@mkj On second thought, I personally prefer the makefile_include.mk before your patch above. This way, we can control the optimization flag for everything (dropbear, libtomcrypto and libtommath) without having one of the lib that forces a -O3 regardless of the CFLAGS we specified.

from dropbear.

sjaeckel avatar sjaeckel commented on July 21, 2024

@davidbernard04 that's caused by internal changes of libtommath where previously the prime verification simply wasn't as good as it is now... and this comes at a speed penalty.

tl;dr - if you are only using RSA you can set the compile-time flag LTM_USE_ONLY_MR of libtommath (I hope you use the bundled version :) this makes life here easier for once), which will disable the LR test and you should be fine to go. Please let me know how your performance numbers evolve, since there were some other changes in this area.


longer story...

2018.76 still used ltm 1.0.1 which always only did t deterministic Miller-Rabin Tests for primality testing. (That's definitively not what you want).
Since ltm 1.1.0 we run t rounds of the MR test against random numbers + a Lucas-Selfridge test. That's how FIPS 186-4 expects how primality testing is done ideally. Before it was simply not according to "the standard"^TM.

After having a quick look into FIPS 186-4 I just realized that it states in Appendix C.3 that for RSA keys there's no requirement for a LR test, only for DSA keys.

... maybe we overshot a bit on the paranoia scale? Maybe, but I prefer to offer a secure default and users can then still decide whether performance is more important to them.


@mkj:

please feel free to tag me in such issues, I don't watch dropbear that closely and only stumbled over this by chance.

I'll take this back to the project and check whether we can find a compatible way where LR tests can be enabled at compile time, but disabled at runtime. Currently that's not possible.

I'd say you can safely turn on the LTM_USE_ONLY_MR switch if DSA key generation is disabled.
For configurations that require both, RSA and DSA key generation at runtime, I don't have a good solution ready.
Do you even still have DSA key generation enabled per default?

One possibility would be to always enable LTM_USE_ONLY_MR and use the number of MR iterations of column 'M-R Tests Only' of Table C.1 in FIPS 186-4 for DSA and the numbers returned by mp_prime_rabin_miller_trials() for RSA...

from dropbear.

davidbernard04 avatar davidbernard04 commented on July 21, 2024

@sjaeckel Thanks a lot for your feedback! I've made some tests on the 2022.82 version with the LTM_USE_ONLY_MR define (on our ARMv5 platform), with 80 keys generated for each test below.

For -Os size-optimized build:

  • 36 sec (no define)
  • 26 sec (with the MR define)

For -O3 speed-optimized build:

  • 24 sec (no define)
  • 20 sec (with the MR define)

So it does improve performance for RSA keys generation!

But our app do generate both RSA and ECDSA keys, so unless you or @mkj implement a way for LR tests to be dynamic at runtime (i.e. enabled only for DSA keys), it is preferable that we don't use the LTM_USE_ONLY_MR define.

from dropbear.

mkj avatar mkj commented on July 21, 2024

I've now added a LTM_CFLAGS argument to configure - it defaults to -O3 -funroll-loops like before, but you can set
./configure LTM_CFLAGS=-Os
54a90dd

I'm also setting LTM_USE_ONLY_MR only when DROPBEAR_DSS is enabled (and have set DROPBEAR_DSS disabled by default,
it's on the way out).
0e70732

@sjaeckel just to confirm, ecdsa doesn't seem to need primality testing at all does it?

from dropbear.

mkj avatar mkj commented on July 21, 2024

That should read "setting LTM_USE_ONLY_MR only when DROPBEAR_DSS is disabled".
And I've noticed the check was wrong, fixed 960d374

from dropbear.

sjaeckel avatar sjaeckel commented on July 21, 2024

@sjaeckel just to confirm, ecdsa doesn't seem to need primality testing at all does it?

Correct, primality testing is only required for RSA and DSA.

from dropbear.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.