Comments (7)
Apparently it only happens when compiling with something like -O0.
from base64.
So the problem is that the following set of four registers, which together form the lookup table, are not sequentially numbered:
{v5.16b, v6.16b, v7.16b, v16.16b}
That sucks, because as you mention, the code goes to great lengths to load that table into four hardcoded sequential registers: v8, v9, v10 and v11.
For some unclear reason, the compiler chooses to rename those registers when returning from the function. I was really hoping that any reasonable compiler would never do that, because the hardcoded registers are already taken and the table stays live for the duration of the encoder.
Yet here we are. My little gambit failed.
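To make the constraint concrete: the AArch64 table-lookup instruction takes its table as a list of consecutively numbered vector registers (wrapping from v31 to v0). The sketch below is a plain-C check of that property, not code from the library; the function and array names are hypothetical, and the register numbers are the ones quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Register numbers the compiler actually picked (from the error above). */
static const unsigned observed_regs[4] = { 5, 6, 7, 16 };

/* Register numbers the code tries to pin the table to. */
static const unsigned intended_regs[4] = { 8, 9, 10, 11 };

/* Hypothetical helper: returns true if the n register numbers in regs[]
 * are consecutive modulo 32, which is what a multi-register table
 * operand requires. */
static bool
regs_are_sequential (const unsigned *regs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (regs[i] != (regs[i - 1] + 1) % 32) {
			return false;
		}
	}
	return true;
}
```

With these inputs, regs_are_sequential(observed_regs, 4) is false because of the jump from v7 to v16, while regs_are_sequential(intended_regs, 4) is true.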
Testing a fix sucks, because I don't have an ARM64 machine that I can test on, and even then I'm not sure that I can reproduce the bug.
The silver lining is that clang should not be affected by the codegen bug that GCC has for vld1q_u8_x4. So we should hopefully be able to use that instead...
Could you try changing line 28 to this:
#if defined(BASE64_NEON64_USE_ASM) && !defined(__clang__)
from base64.
Another thing to try is to add the always_inline attribute to the function:
__attribute__((always_inline))
static inline uint8x16x4_t
load_64byte_table (const uint8_t *p)
{
#ifdef BASE64_NEON64_USE_ASM
I believe that -O0 can turn off inlining, and that may mean that the compiler can't make the reasonable inference that it should not rename the registers.
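For anyone who wants to try the attribute in isolation, here is a minimal sketch that builds on any target with GCC or clang. It is not the library's function: table64_t is a hypothetical plain-C stand-in for uint8x16x4_t, and load_64byte_table_sketch just copies 64 bytes. The point is only the placement of the attribute, which GCC and clang honor even when no optimization level is given.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint8x16x4_t: four 16-byte lanes. */
typedef struct {
	uint8_t val[4][16];
} table64_t;

/* always_inline asks the compiler to inline this function at every call
 * site, including at -O0 where "static inline" alone is usually ignored. */
__attribute__((always_inline))
static inline table64_t
load_64byte_table_sketch (const uint8_t *p)
{
	table64_t t;

	/* Copy the 64-byte table into the four lanes in order. */
	memcpy(t.val, p, sizeof(t.val));
	return t;
}
```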
from base64.
Both suggestions result in the same compiler errors.
FWIW I don't have an arm64 device handy either, so I just installed and used clang (v14) with an aarch64 sysroot (https://developer.arm.com/-/media/Files/downloads/gnu-a/10.3-2021.07/binrel/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu.tar.xz).
from base64.
Here's the command line I'm using (from the project root) to test FWIW (on Linux):
clang-14 -DHAVE_NEON64=1 -I./include -I./lib -O0 -I/tmp/aarch64-none-linux-gnu/libc/usr/include -target arm64-linux-gnu -c lib/arch/neon64/codec.c -o base64_neon64.codec.o
from base64.
Thanks for linking to the sysroot and for sharing your script! Those will be useful in the future. I was able to reproduce the bug and also confirm your conclusion that my proposed fixes don't work.
This looks like a nasty bug. Even when I inline the table-loading code into the encoder loop, the bug appears. Even when I don't create a uint8x16x4_t, but pass the t0-t3 registers (which should surely be in v8-v11...) directly to the inline assembly, the bug manifests itself.
I'm unsure of how to fix this, other than to rewrite the whole encoder logic in assembly. (That was something that I was actually planning on, because it would let me interleave loads and stores more naturally.)
Maybe the best fix for the time being is indeed the one you pushed: to just disable inline asm for clang when not optimizing.
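A guard along those lines could look roughly like the sketch below. The exact condition in the pushed fix is not shown in this thread, so treat this as an assumption: it relies on the __OPTIMIZE__ macro, which GCC and clang predefine whenever optimization is enabled, and neon64_asm_enabled is a hypothetical helper that only reports which path was compiled in.

```c
/* Defined here only so the sketch compiles on its own; in the real build
 * this macro would come from the project's configuration. */
#define BASE64_NEON64_USE_ASM 1

/* Assumed workaround: drop the inline-asm path when clang is building
 * without optimization (for example with -O0). */
#if defined(BASE64_NEON64_USE_ASM) && defined(__clang__) && !defined(__OPTIMIZE__)
#  undef BASE64_NEON64_USE_ASM
#endif

/* Hypothetical helper: returns 1 if the inline-asm path is still enabled
 * after the guard above, 0 otherwise. */
static int
neon64_asm_enabled (void)
{
#ifdef BASE64_NEON64_USE_ASM
	return 1;
#else
	return 0;
#endif
}
```

GCC builds and optimized clang builds keep the asm path; only unoptimized clang builds fall back to the intrinsics.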
from base64.
Yesterday I set up a small AArch64 Debian VM using qemu-system-aarch64 to do quick prototyping on the AArch64 platform. I was hoping that it would be relatively simple to rewrite the entire NEON64 encoding loop in inline assembly, and it turns out I was right. AArch64 assembly is pretty approachable. I managed to implement the entire loop in inline assembly, including proper interleaving and pipelining of the 8x unrolled loop. All tests pass, and I'm reasonably happy with the cleanness of the code.
I've created a new issue (#98) for this enhancement and also pushed a testing branch, issue98.
This was the nuclear option, but also the only way I saw to fix this bug. I was not hopeful that I could find any more tricks to get the compiler to generate the correct code by itself.
from base64.