Giter Club home page Giter Club logo

Comments (5)

ludocode avatar ludocode commented on August 9, 2024

Yep I agree, there is too much inlining right now. I've also ran into code size problems with MPack. I have a port of MPack to Pebble smartwatch lying around here, and it compiles fine on it, but the unit test suite is way too huge to run. I've only been able to run minor individual tests, and there's no point in the port if it takes up most of the space available to the program.

The recent inlining came about due to some aggressive optimizations I've been doing the past couple months. I do want to point out that none of the inlining was done speculatively. A few months ago I started writing a benchmarking suite for schemaless data parsers which I'm getting ready to release publicly soon. It compares the performance and code size of various libraries including the reference msgpack implementation, MPack, CMP, RapidJSON, and YAJL. This benchmarking suite largely drove the development of v0.5 of MPack which made major performance improvements. It's what I've been using to do these performance tests, and the inlining changes were all made on observation of very real, statistically significant performance improvements in my tests (on a desktop PC, a Chromebook, and a Raspberry Pi.) So I'm not just assuming inlining is automatically faster or guessing at what to inline; they do actually improve performance.

Some of these inlines don't need to be done directly with different optimization options however. I'm switching over the benchmarking suite to -O3 -flto -fwhole-program, which makes any use-once inline unnecessary. This is what the kernel style guide you linked is referring to. I had originally avoided this though, since in my experience most projects in the real world aren't actually compiled this way, and I wanted to keep the benchmarking as realistic as possible (I'm still avoiding profile-guided optimization in benchmarking for the same reason.) And I do agree that performance isn't everything, especially since typically the data parser is such a minor part of any large application, so optimizing for code size instead of speed is usually better anyway. But these things are hard to demonstrate objectively, and I do still want MPack to be the fastest schemaless parser available. So there's a tradeoff to be made here.

In the meantime you might be able to get away with replacing static inline with MPACK_INLINE in the code, and then defining MPACK_INLINE to static __attribute__((unused)) or something. If your linker is good and/or configured properly it should be able to merge down duplicate static function definitions. In the longer term, I might consider adding a configuration option to choose between optimizing for size or speed which would conditionally inline some functions. This might make MPack be able to claim to be the fastest and smallest MessagePack parser when configured appropriately. (This is especially true since it doesn't do any of the templating that the reference implementation does, or that other parsers claim is key to their performance, like RapidJSON. It isn't. I believe that the extent of templating they do is bad, and one of the ways I want to be able to demonstrate this is by having a smaller code footprint.)

Anyway thanks for bringing this up. It's good to know the issues you're having. I agree and have been having similar issues, and it gives me a good direction to work on. I'll definitely be making this a priority for v0.6.

from mpack.

xor-gate avatar xor-gate commented on August 9, 2024

Hi Nicholas (@ludocode),

Thanks for you clear answer. This library is one of the only ones which fit embedded (micro-controller) usage because use of malloc/calloc free is left how it is configured. The use of dynamic memory is not acceptable for MISRA-C. Code size is the most important for deeply embedded systems (e.g ARM Cortex-Mx). I will track the development and when I have some enhancements I will send you pull requests.

from mpack.

ludocode avatar ludocode commented on August 9, 2024

This is now implemented on develop (b2bb68c). This work was done as part of a conversion of inline usage away from static inline, instead using "extern inlines" as appropriate to provide a single definition when a function cannot be inlined.

A new option MPACK_OPTIMIZE_FOR_SIZE is available in mpack-config.h. If you are using GCC it will detect the use of -Os and will automatically enable the option. Any non-trivial functions previously declared inline are now only declared inline if MPACK_OPTIMIZE_FOR_SIZE is not true.

You can see the complexity of supporting all of this here:

/*
* Definition of inline macros.
*
* MPack supports several different modes for inline functions:
* - functions declared with a platform-specific always-inline (MPACK_ALWAYS_INLINE)
* - functions declared inline regardless of optimization options (MPACK_INLINE)
* - functions declared inline only in builds optimized for speed (MPACK_INLINE_SPEED)
*
* MPack does not use static inline in header files; only one non-inline definition
* of each function should exist in the final build. This reduces the binary size
* in cases where the compiler cannot or chooses not to inline a function.
* The addresses of functions should also compare equal across translation units
* regardless of whether they are declared inline.
*
* The above requirements mean that the declaration and definition of non-trivial
* inline functions must be separated so that the definitions will only
* appear when necessary. In addition, three different linkage models need
* to be supported:
*
* - The C99 model, where "inline" does not emit a definition and "extern inline" does
* - The GNU model, where "inline" emits a definition and "extern inline" does not
* - The C++ model, where "inline" emits a definition with weak linkage
*
* The macros below wrap up everything above. All inline functions defined in header
* files have a single non-inline definition emitted in the compilation of
* mpack-platform.c.
*
* Inline functions in source files are defined static, so MPACK_STATIC_INLINE
* is used for small functions and MPACK_STATIC_INLINE_SPEED is used for
* larger optionally inline functions.
*/
#if defined(__cplusplus)
// C++ rules
// The linker will need weak symbol support to link C++ object files,
// so we don't need to worry about emitting a single definition.
#define MPACK_INLINE inline
#elif defined(__GNUC__) && (defined(__GNUC_GNU_INLINE__) || \
!defined(__GNUC_STDC_INLINE__) && !defined(__GNUC_GNU_INLINE__))
// GNU rules
#ifdef MPACK_EMIT_INLINE_DEFS
#define MPACK_INLINE inline
#else
#define MPACK_INLINE extern inline
#endif
#else
// C99 rules
#ifdef MPACK_EMIT_INLINE_DEFS
#define MPACK_INLINE extern inline
#else
#define MPACK_INLINE inline
#endif
#endif
#define MPACK_STATIC_INLINE static inline
#if MPACK_OPTIMIZE_FOR_SIZE
#define MPACK_STATIC_INLINE_SPEED static
#define MPACK_INLINE_SPEED /* nothing */
#ifdef MPACK_EMIT_INLINE_DEFS
#define MPACK_DEFINE_INLINE_SPEED 1
#else
#define MPACK_DEFINE_INLINE_SPEED 0
#endif
#else
#define MPACK_STATIC_INLINE_SPEED static inline
#define MPACK_INLINE_SPEED MPACK_INLINE
#define MPACK_DEFINE_INLINE_SPEED 1
#endif
#ifdef MPACK_OPTIMIZE_FOR_SPEED
#error "You should define MPACK_OPTIMIZE_FOR_SIZE, not MPACK_OPTIMIZE_FOR_SPEED."
#endif

This new option seems like it may be pointless though. When using my benchmarking suite as a semi-realistic test of executable size, enabling MPACK_OPTIMIZE_FOR_SIZE under -Os -flto doesn't seem to accomplish anything; in fact with GCC 5.2 it actually makes the writer test larger by about 170 bytes!

Remember, inline is just a compiler suggestion, and when compiling with -Os, the compiler is likely to ignore it if it does not actually reduce the size of the binary. I'm guessing adding inline in lots of places makes the compiler perform more inter-procedural analysis to find cases where it can actually reduce the size, but it doesn't hurt anything because most of the time it refuses to inline the function. The avoidance of static also doesn't matter, since the linker was already smart enough to merge duplicate definitions.

So the reasoning in your kernel.org documentation may no longer be true, at least not when using a modern compiler and configuring it to optimize for size with link-time optimization. Using static and inline liberally doesn't actually bloat the binary at all, and in fact can actually further reduce the size.

Anyway, give the new code a try and see if the option does anything for you. I didn't try any other platforms or compilers, and I didn't try a non-lto build either, so maybe there are cases where it helps. Make sure you are compiling with -Os -flto and linking with -Os -flto -fwhole-program if possible. Also, and I'm sure this is obvious, but double-check to make sure neither of DEBUG and _DEBUG are defined. Try toggling the MPACK_OPTIMIZE_FOR_SIZE option to see how it affects the size of your binary. If it doesn't have a significant effect I'm most likely going to remove it and just use MPACK_INLINE for everything.

from mpack.

ludocode avatar ludocode commented on August 9, 2024

I just released version 0.6 with the above inline fixes and the MPACK_OPTIMIZE_FOR_SIZE option, so I'm resolving this as fixed. The size option might be removed at some point if it doesn't really reduce the size under proper size-optimized builds, but either way it's fine for now.

from mpack.

xor-gate avatar xor-gate commented on August 9, 2024

Thanks for putting your time and effort in this, totaly agree on your points and have nothing to add.

Kind regards,
Jerry

from mpack.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.