ARMv7 Functions

This is a collection of functions optimized for ARMv7 and NEON.

The five holy laws

  1. Never return floating-point values by value. Doing so would work fine if -mfloat-abi=hard were supported everywhere, but sadly it is not. With the more common -mfloat-abi=softfp, every return my_float_value emits either an fmrs or a vstr, followed by a load to read the result back. Instead, pass a non-const reference as the first parameter. This lets the compiler inline your intermediate results smoothly, without unnecessary loads and stores, just as it would if hard floats were available (this works for vector types too).
  2. Try to minimize loads and stores. However, GCC doesn't support evolved vldmia/vstmia and generates poor code for operations on float32x4x4_t, so hand-coding them makes sense in that case.
  3. Use vector types everywhere it makes sense. Functions prefixed with vec3_ and vec4_ directly work on float32x4_t. Those prefixed with mat44_ directly work with float32x4x4_t. Parameters are passed as references, so the compiler doesn't perform unnecessary ARM register transfers.
  4. Don't hard-code registers. Instead of clobbering specific registers, use dummy variables as operands and let the compiler allocate registers as needed.
  5. A good clobber list is an empty clobber list. If you let the compiler handle loads for you, "memory" shouldn't even show up in your clobber list. The only item that might is "cc".
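
As a rough illustration of laws 1, 4 and 5, here is a minimal sketch. The names vec4, vec4_dot and add_mul are hypothetical and are not taken from this library. The vec4 struct is only a stand-in for float32x4_t so that the snippet also builds on non-ARM hosts, and the inline assembly path is assumed to target GCC on 32-bit ARM, with a plain C++ fallback elsewhere.

```cpp
#include <cstdint>

// Hypothetical stand-in for float32x4_t, so the sketch compiles off-ARM too.
struct vec4 { float v[4]; };

// Law 1: the result is the first parameter, a non-const reference,
// instead of a float returned by value.
static inline void vec4_dot(float &result, const vec4 &a, const vec4 &b) {
    result = a.v[0] * b.v[0] + a.v[1] * b.v[1]
           + a.v[2] * b.v[2] + a.v[3] * b.v[3];
}

// Laws 4 and 5: computes (a + b) * c without naming any register.
static inline void add_mul(uint32_t &result, uint32_t a, uint32_t b, uint32_t c) {
#if defined(__GNUC__) && defined(__arm__)
    uint32_t tmp;  // dummy variable: the compiler picks the scratch register
    __asm__("add %[tmp], %[a], %[b]\n\t"
            "mul %[res], %[tmp], %[c]"
            : [res] "=r"(result), [tmp] "=&r"(tmp)  // tmp is written before c is read
            : [a] "r"(a), [b] "r"(b), [c] "r"(c));
    // Empty clobber list: no hard-coded registers, no "memory",
    // and no "cc" since neither instruction sets the flags.
#else
    result = (a + b) * c;  // portable fallback for non-ARM builds
#endif
}
```

A caller declares the destination first and passes it in, e.g. float d; vec4_dot(d, a, b);, which leaves the compiler free to keep d in a register once the call is inlined. The early-clobber marker on tmp is there because tmp is written while c is still pending as an input.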

Compilation flags

For best performance, I usually use the following CFLAGS: -mthumb -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -O3 -ffast-math -fomit-frame-pointer -fstrict-aliasing -fgcse-las -funsafe-loop-optimizations -fsee -ftree-vectorize. Add -arch armv7 when building with gcc for iOS, or -march=armv7-a when building with eabi-none-gcc.

Preprocessor macros

Several preprocessor macros, when defined, change the behaviour of the code. See config.h and config-defaults.h for details…
