Hi! We are using dropbearkey in embedded systems, and noticed that the average delay f

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

RSA key generation time seems to have doubled about dropbear HOT 12 CLOSED

mkj commented on July 21, 2024

RSA key generation time seems to have doubled

from dropbear.

Comments (12)

mkj commented on July 21, 2024 1

I think having a default of -O3 for the crypto/maths libraries makes sense (that's where performance matters), but -Os is a better default for non-performance code. I guess there could be defaults for each of them which can be overridden with environment variables - LIBTOM_CFLAGS=-O3 from Dropbear's configure script perhaps.

from dropbear.

mkj commented on July 21, 2024

I wasn't aware of it, though haven't benchmarked.
libtommath was updated in 2020.79 which seems the most likely change.
Update LibTomMath to 1.2.0

update ltm to 1.1.0 and enable FIPS 186.4 compliant key-generation
(The "FIPS 186.4 compliant" change should have actually made 2048 RSA faster, 6 rounds rather than 8)

There are also a few changes to dbrandom.c though those seem less likely to change speed.

from dropbear.

davidbernard04 commented on July 21, 2024

About FIPS compliancy, it is actually 12 rounds rather than 8, since primes size for a 2048-bit RSA key are 1024 bits. But I've made a few test loops by hardcoding 8 rounds on the 2022.82 version, and it has no significant impact (some loops are actually worst, lol).

I've made another run of test with 80 keys generated on our ARMv5 platform, and that's in line with previous results:

2018.76: 15.9 sec average for 80 keys
2022.82: 38.1 sec average for 40 keys (with default 12 rounds)
2022.82: 33.6 sec average for 40 keys (with hardcoded 8 rounds)

So I guess that's really the LibTomMath upgrade that affected the generation time.

Any idea what we can do to improve? Otherwise I suppose we'll have to increase our generation timeout on our low-power platforms. Well, that's the cost of progress. ;)

from dropbear.

mkj commented on July 21, 2024

Ah, the CFLAGS order changed in libtommath's own makefile. Previously its own -O3 came after Dropbear's default -Os, in 2022.82 it builds with -Os. Could you try the patch below? I see 30kB extra binary size on x64 though, will have to have a look at that.

--- a/libtommath/makefile_include.mk
+++ b/libtommath/makefile_include.mk
@@ -104,7 +104,7 @@ LIBTOOLFLAGS += -no-undefined
 endif
 
 # add in the standard FLAGS
-LTM_CFLAGS += $(CFLAGS)
+LTM_CFLAGS := $(CFLAGS) $(LTM_CFLAGS)
 LTM_LFLAGS += $(LFLAGS)
 LTM_LDFLAGS += $(LDFLAGS)
 LTM_LIBTOOLFLAGS += $(LIBTOOLFLAGS)

For reference
2018.76 (from ps aux so may be a bit different)
-D _FORTIFY_SOURCE=2 -D DROPBEAR_SERVER -D DROPBEAR_CLIENT ../../libtommath/bn_mp_fwrite.c -quiet -dumpbase bn_mp_fwrite.c -dumpbase-ext .c -mtune=generic -march=x86-64 -g -Os -O3 -Wno-pointer-sign -Wall -Wsign-compare -Wextra -Wshadow -Wsystem-headers -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -fno-strict-overflow -fPIE -fstack-protector-strong -funroll-loops -fomit-frame-pointer -fasynchronous-unwind-tables -fstack-protector-strong -Wformat-security -fstack-clash-protection -fcf-protection

2022.82
-Wall -Wsign-compare -Wextra -Wshadow -Wdeclaration-after-statement -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wpointer-arith -Wsystem-headers -O3 -funroll-loops -fomit-frame-pointer -Os -W -Wall -Wno-pointer-sign -fno-strict-overflow -fPIE -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Werror -ggdb -I../../libtommath -I../libtomcrypt/src/headers/ -I../../libtommath/../libtomcrypt/src/headers/ -I../ -I../../libtommath/../ -Wno-deprecated -I../libtomcrypt/src/headers/ -DLOCALOPTIONS_H_EXISTS -I. -I.. -DDROPBEAR_SERVER -DDROPBEAR_CLIENT ../../libtommath/bn_mp_div_2.c -o bn_mp_div_2.o

from dropbear.

davidbernard04 commented on July 21, 2024

Nice catch! I've tried with your patch and there is an improvement:

2022.82: 35 sec average (-Os)
2022.82: 25 sec average (-O3)

Although still not as fast as the 2018 version:

2018.76: 16 sec average (-O3)

About binary size, there is an extra of ~20 KB on each dropbear and dropbearkey executables between -Os and -O3 for the 2022.82 version (ARM platform).

FYI on our side, we might choose to stick with the -Os flag to save some space (we have some platforms in dire need of free space, lol).

from dropbear.

mkj commented on July 21, 2024

OK right. When I get a chance I'll see if I can compare perf of the releases (and rummage for some older boards to try).
On x64 here the make timing; ./timing benchmark of standalone libtommath is the same between the versions.

from dropbear.

davidbernard04 commented on July 21, 2024

@mkj On second thought, I personally prefer the makefile_include.mk before your patch above. This way, we can control the optimization flag for everything (dropbear, libtomcrypto and libtommath) without having one of the lib that forces a -O3 regardless of the CFLAGS we specified.

from dropbear.

sjaeckel commented on July 21, 2024

@davidbernard04 that's caused by internal changes of libtommath where previously the prime verification simply wasn't as good as it is now... and this comes at a speed penalty.

tl;dr - if you are only using RSA you can set the compile-time flag LTM_USE_ONLY_MR of libtommath (I hope you use the bundled version :) this makes life here easier for once), which will disable the LR test and you should be fine to go. Please let me know how your performance numbers evolve, since there were some other changes in this area.

longer story...

2018.76 still used ltm 1.0.1 which always only did t deterministic Miller-Rabin Tests for primality testing. (That's definitively not what you want).
Since ltm 1.1.0 we run t rounds of the MR test against random numbers + a Lucas-Selfridge test. That's how FIPS 186-4 expects how primality testing is done ideally. Before it was simply not according to "the standard"^TM.

After having a quick look into FIPS 186-4 I just realized that it states in Appendix C.3 that for RSA keys there's no requirement for a LR test, only for DSA keys.

... maybe we overshot a bit on the paranoia scale? Maybe, but I prefer to offer a secure default and users can then still decide whether performance is more important to them.

@mkj:

please feel free to tag me in such issues, I don't watch dropbear that closely and only stumbled over this by chance.

I'll take this back to the project and check whether we can find a compatible way where LR tests can be enabled at compile time, but disabled at runtime. Currently that's not possible.

I'd say you can safely turn on the LTM_USE_ONLY_MR switch if DSA key generation is disabled.
For configurations that require both, RSA and DSA key generation at runtime, I don't have a good solution ready.
Do you even still have DSA key generation enabled per default?

One possibility would be to always enable LTM_USE_ONLY_MR and use the number of MR iterations of column 'M-R Tests Only' of Table C.1 in FIPS 186-4 for DSA and the numbers returned by mp_prime_rabin_miller_trials() for RSA...

from dropbear.

davidbernard04 commented on July 21, 2024

@sjaeckel Thanks a lot for your feedback! I've made some tests on the 2022.82 version with the LTM_USE_ONLY_MR define (on our ARMv5 platform), with 80 keys generated for each test below.

For -Os size-optimized build:

36 sec (no define)
26 sec (with the MR define)

For -O3 speed-optimized build:

24 sec (no define)
20 sec (with the MR define)

So it does improve performance for RSA keys generation!

But our app do generate both RSA and ECDSA keys, so unless you or @mkj implement a way for LR tests to be dynamic at runtime (i.e. enabled only for DSA keys), it is preferable that we don't use the LTM_USE_ONLY_MR define.

from dropbear.

mkj commented on July 21, 2024

I've now added a LTM_CFLAGS argument to configure - it defaults to -O3 -funroll-loops like before, but you can set
./configure LTM_CFLAGS=-Os
54a90dd

I'm also setting LTM_USE_ONLY_MR only when DROPBEAR_DSS is enabled (and have set DROPBEAR_DSS disabled by default,
it's on the way out).
0e70732

@sjaeckel just to confirm, ecdsa doesn't seem to need primality testing at all does it?

from dropbear.

mkj commented on July 21, 2024

That should read "setting LTM_USE_ONLY_MR only when DROPBEAR_DSS is disabled".
And I've noticed the check was wrong, fixed 960d374

from dropbear.

sjaeckel commented on July 21, 2024

@sjaeckel just to confirm, ecdsa doesn't seem to need primality testing at all does it?

Correct, primality testing is only required for RSA and DSA.

from dropbear.

RSA key generation time seems to have doubled about dropbear HOT 12 CLOSED

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent