Giter Club home page Giter Club logo

fastbase64's Introduction

fastbase64

Story

We are investigating the possibility of SIMD-accelerated base64 codecs.

  • Wojciech Muła, Daniel Lemire, Faster Base64 Encoding and Decoding Using AVX2 Instructions, ACM Transactions on the Web (to appear) https://arxiv.org/abs/1704.00605

Our AVX2 decoder does not use floating-point numbers.

Sample usage

We extend's Nick Galbreath's base64 library (this high-performance library is used in Chromium).

We assume that you have an AVX2-capable machine (recent Intel processor from 2013 and up). In practice, which function is called should be determined based on the underlying hardware.

Encoding

Original encoding...

 char* src = ...;
 int srclen = ...; //the length of number of bytes in src
 char* dest = (char*) malloc(chromium_base64_encode_len(srclen)); // decode_len effectively multiplies by 4/3
 int len = chromium_base64_encode(dest, src, sourcelen); // returns how many bytes were decoded.

Encoding with AVX2...

 char* src = ...;
 int srclen = ...; //the length of number of bytes in src
 char* dest = (char*) malloc(chromium_base64_encode_len(srclen));
 int len = fast_avx2_base64_encode(dest, src, sourcelen); // returns how many bytes were decoded.

Decoding

Original decoding...

char* src = ...;
int srclen = ...; // or if you don't know use strlen(src)
char* dest = (char*) malloc(chromium_base64_decode_len(srclen)); // effectively multiplies by 3/4
int len = chromium_base64_decode(dest, src, sourcelen);
if (len == MODP_B64_ERROR) { error }

Decoding with AVX2...

char* src = ...;
int srclen = ...; // or if you don't know use strlen(src)
char* dest = (char*) malloc(chromium_base64_decode_len(srclen)); // effectively multiplies by 3/4
int len = fast_avx2_base64_decode(dest, src, sourcelen);
if (len == MODP_B64_ERROR) { error }

Results

We compare SIMD decoding with competitive alternatives. One particularly fast decoder is used by Google Chrome (and available in Chromium). Chromium uses code produced by Nick Galbreath called "high performance base64 encoder / decoder". We also use the decoder found in the Linux kernel as well as the one found in the QuickTime code (which was derived from code from the Apache HTTP server).

Let us look at real data (images and text):

$ ./basic_benchmark
rdtsc_overhead set to 30
Testing first with random data.
See files encodingperf.txt decodingperf.txt ...
Testing with real data.
lena [jpg]
decoding a base64 input of  141020 bytes, original size = 105764
memcpy(buffer, data, datalength)                                :  0.09 cycles per operation (best)     0.09 cycles per operation (avg)
linux_base64_decode(buffer, data, data + datalength)            :  19.73 cycles per operation (best)     19.79 cycles per operation (avg)
quicktime_base64_decode(buffer, data)                           :  3.10 cycles per operation (best)     3.17 cycles per operation (avg)
chromium_base64_decode(buffer, data, datalength)                :  1.84 cycles per operation (best)     1.84 cycles per operation (avg)
scalar_base64_decode(data,datalength,buffer,&outputlength)      :  2.07 cycles per operation (best)     2.09 cycles per operation (avg)
klomp_avx2_base64_decode(data,datalength,buffer,&outputlength)    :  0.43 cycles per operation (best)     0.44 cycles per operation (avg)
fast_avx2_base64_decode(buffer, data, datalength)               :  0.21 cycles per operation (best)     0.21 cycles per operation (avg)

peppers [jpg]
decoding a base64 input of  12640 bytes, original size = 9478
memcpy(buffer, data, datalength)                                :  0.03 cycles per operation (best)     0.04 cycles per operation (avg)
linux_base64_decode(buffer, data, data + datalength)            :  15.05 cycles per operation (best)     15.94 cycles per operation (avg)
quicktime_base64_decode(buffer, data)                           :  3.15 cycles per operation (best)     3.17 cycles per operation (avg)
chromium_base64_decode(buffer, data, datalength)                :  1.82 cycles per operation (best)     1.82 cycles per operation (avg)
scalar_base64_decode(data,datalength,buffer,&outputlength)      :  2.08 cycles per operation (best)     2.09 cycles per operation (avg)
klomp_avx2_base64_decode(data,datalength,buffer,&outputlength)    :  0.44 cycles per operation (best)     0.44 cycles per operation (avg)
fast_avx2_base64_decode(buffer, data, datalength)               :  0.21 cycles per operation (best)     0.21 cycles per operation (avg)

mandril [jpg]
decoding a base64 input of  329632 bytes, original size = 247222
memcpy(buffer, data, datalength)                                :  0.11 cycles per operation (best)     0.12 cycles per operation (avg)
linux_base64_decode(buffer, data, data + datalength)            :  20.02 cycles per operation (best)     20.08 cycles per operation (avg)
quicktime_base64_decode(buffer, data)                           :  3.10 cycles per operation (best)     3.17 cycles per operation (avg)
chromium_base64_decode(buffer, data, datalength)                :  1.84 cycles per operation (best)     1.84 cycles per operation (avg)
scalar_base64_decode(data,datalength,buffer,&outputlength)      :  2.07 cycles per operation (best)     2.08 cycles per operation (avg)
klomp_avx2_base64_decode(data,datalength,buffer,&outputlength)    :  0.43 cycles per operation (best)     0.44 cycles per operation (avg)
fast_avx2_base64_decode(buffer, data, datalength)               :  0.22 cycles per operation (best)     0.22 cycles per operation (avg)

moby_dick [text]
decoding a base64 input of  1484 bytes, original size = 1111
memcpy(buffer, data, datalength)                                :  0.04 cycles per operation (best)     0.05 cycles per operation (avg)
linux_base64_decode(buffer, data, data + datalength)            :  3.61 cycles per operation (best)     4.33 cycles per operation (avg)
quicktime_base64_decode(buffer, data)                           :  3.20 cycles per operation (best)     3.21 cycles per operation (avg)
chromium_base64_decode(buffer, data, datalength)                :  1.83 cycles per operation (best)     1.83 cycles per operation (avg)
scalar_base64_decode(data,datalength,buffer,&outputlength)      :  2.11 cycles per operation (best)     2.13 cycles per operation (avg)
klomp_avx2_base64_decode(data,datalength,buffer,&outputlength)    :  0.53 cycles per operation (best)     0.54 cycles per operation (avg)
fast_avx2_base64_decode(buffer, data, datalength)               :  0.28 cycles per operation (best)     0.29 cycles per operation (avg)

google logo [png]
decoding a base64 input of  3144 bytes, original size = 2357
memcpy(buffer, data, datalength)                                :  0.05 cycles per operation (best)     0.05 cycles per operation (avg)
linux_base64_decode(buffer, data, data + datalength)            :  4.36 cycles per operation (best)     6.37 cycles per operation (avg)
quicktime_base64_decode(buffer, data)                           :  3.16 cycles per operation (best)     3.18 cycles per operation (avg)
chromium_base64_decode(buffer, data, datalength)                :  1.81 cycles per operation (best)     1.82 cycles per operation (avg)
scalar_base64_decode(data,datalength,buffer,&outputlength)      :  2.09 cycles per operation (best)     2.10 cycles per operation (avg)
klomp_avx2_base64_decode(data,datalength,buffer,&outputlength)    :  0.47 cycles per operation (best)     0.47 cycles per operation (avg)
fast_avx2_base64_decode(buffer, data, datalength)               :  0.23 cycles per operation (best)     0.24 cycles per operation (avg)

bing.com social icons [png]
decoding a base64 input of  1808 bytes, original size = 1355
memcpy(buffer, data, datalength)                                :  0.04 cycles per operation (best)     0.04 cycles per operation (avg)
linux_base64_decode(buffer, data, data + datalength)            :  3.97 cycles per operation (best)     5.25 cycles per operation (avg)
quicktime_base64_decode(buffer, data)                           :  3.13 cycles per operation (best)     3.20 cycles per operation (avg)
chromium_base64_decode(buffer, data, datalength)                :  1.82 cycles per operation (best)     1.83 cycles per operation (avg)
scalar_base64_decode(data,datalength,buffer,&outputlength)      :  2.11 cycles per operation (best)     2.27 cycles per operation (avg)
klomp_avx2_base64_decode(data,datalength,buffer,&outputlength)    :  0.46 cycles per operation (best)     0.47 cycles per operation (avg)
fast_avx2_base64_decode(buffer, data, datalength)               :  0.24 cycles per operation (best)     0.24 cycles per operation (avg)

Next plot shows results using random data of varying size:

We see that for base64 inputs of 100 bytes or more the AVX2 decoder is much faster, being more than three times faster.

How does SIMD base64 decoding works?

Let us focus on decoding, the most performance-sensitive task.

Character decoding (lookup)

Base64 writes 6-bit bytes in text form, not as byte values in [0,64). It is useful to take the text input and convert it to values in [0,64) if we want to decode base64 text. (This is not a necessary step however: some high performance base64 decoders do not include such a separate step, decoding base64 in one pass instead.) Muła calls this a lookup, possibly because it is commonly solved using a lookup table.

Muła showed (https://github.com/WojciechMula/base64simd) that you could quickly take a 32-byte vector of base64 encoded text and convert it to an array of integers in [0,64) using shifts, bitwise logical operations and shuffles. It is fast.

Bit packing

Given the byte values in [0,64), i.e., 6-bit values, we must then pack them to finish the decoding. Base64 works by packing 4 bytes into 3 bytes as follows. The normal 4-byte to 3-byte base64 decoding routine goes as follows...

output[0] =  ( input[0] << 2 ) | ( input[1] >> 4)
output[1] =  ( input[1] << 4 ) | ( input[2] >> 2)
output[2] =  ( input[3] << 6 ) |  input[3]

See https://en.wikipedia.org/wiki/Base64#Sample_Implementation_in_Java for a reference implementation.

(Base64 decoders such as the one in the Chromium code base avoid shifts entirely by looking up bytes as "pre-shifted" 32-bit values.)

Muła addresses the issue of "gathering data" from the result of the lookup: http://0x80.pl/notesen/2016-01-17-sse-base64-decoding.html#gathering-data

In a naive form, Muła suggests we use code as this :

const __m128i bits_a = _mm_and_si128(values, _mm256_set1_epi32(0x0000003f));
const __m128i bits_b = _mm_srli_epi32(_mm_and_si128(values, _mm256_set1_epi32(0x00003f00)), 2);
const __m128i bits_c = _mm_srli_epi32(_mm_and_si128(values, _mm256_set1_epi32(0x003f0000)), 4);
const __m128i bits_d = _mm_srli_epi32(_mm_and_si128(values, _mm256_set1_epi32(0x3f000000)), 6);

result = _mm_or_si128(bits_a, _mm_or_si128(bits_b, _mm_or_si128(bits_c, bits_d)));

This almost correct, but base64 works in big endian mode so proper byte shuffling is needed.

fastbase64's People

Contributors

byronhe avatar ksergey avatar lemire avatar longqimin avatar user959 avatar wojciechmula avatar yujinqiu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fastbase64's Issues

Using fastbase64 to encode a file and put it in JSON

Hello!

I'm coding a program that should transfer a 160Mb file through JSON for an API and I wanted to test this AVX2 approach to encode the content of the file, paste the encoded string into the JSON and send it to the API.

I first modified unit.c to open a file, pass the content of the file in binary to a variable, encoding and storing it to a variable and then paste it into a file again, after some tests I got the encoded file (except for Images) but I always had a garbage tail at the end of it so I simplified the problem and just modified the unit.c main like this:

` size_t len = 0;
size_t codedlen = 0;
printf("Encoding...\n");
char * dest1 = (char*)malloc(chromium_base64_encode_len(strlen(wikipediasource)));
codedlen = fast_avx2_base64_encode(dest1, wikipediasource, strlen(wikipediasource));

assert(strncmp(dest1, wikipediacoded, codedlen) == 0);
printf("Assert ok\n\nBASE64:\n%s\n", dest1);

char * dest2 = (char*)malloc(chromium_base64_decode_len(codedlen));
len = fast_avx2_base64_decode(dest2, dest1, codedlen);

assert(len == strlen(wikipediasource));
assert(strncmp(dest2, wikipediasource, strlen(wikipediasource)) == 0);
printf("Asserts ok\n\nDECODED TEXT:\n%s\n", dest2);
printf("\n\nSOURCE:\n%s\n",wikipediasource);`

No exception is thrown but printf of SOURCE (wikipediasource) differs from prinf of DECODE TEXT (dest2) (see image)
git

The garbage tail is always changing:
²²²²¦¦¦¦¦¦¦Ñr�¥&
²²²²
²²²²¦¦¦¦¦¦¦┐90¨+ì

I assumed that this was something related to printf or related to a variable type so I stored the decoded string into a file but unfortunately it saved the whole thing including the garbage tail.

So I'm not sure if this is how it's supposed to work and/or if I'm doing something wrong?? Could you give me some advice?

Many thanks in advance!

Best,

Can't use inside c++ code

gcc failed to compile:

/home/ksergey/dev/BithumbGateway/deps/fastbase64/include/chromiumbase64.h: In function ‘std::__cxx11::string& chromium_base64_encode(std::__cxx11::string&)’:
/home/ksergey/dev/BithumbGateway/deps/fastbase64/include/chromiumbase64.h:162:21: error: redefinition of ‘std::__cxx11::string& chromium_base64_encode(std::__cxx11::string&)’
 inline std::string& chromium_base64_encode(std::string& s)
                     ^
/home/ksergey/dev/BithumbGateway/deps/fastbase64/include/chromiumbase64.h:144:21: note: ‘std::__cxx11::string& chromium_base64_encode(std::__cxx11::string&)’ previously defined here
 inline std::string& chromium_base64_encode(std::string& s)
...
#ifdef __cplusplus                                                                                                                                             
}                                                                                                                                                              
                                                                                                                                                               
#include <string>                                                                                                                                              
                                                                                                                                                               
inline std::string& chromium_base64_encode(std::string& s)                                                                                                     
{                                                                                                                                                              
    std::string x(chromium_base64_encode_len(s.size()), '\0');                                                                                                 
    size_t d = chromium_base64_encode(const_cast<char*>(x.data()), s.data(), (int)s.size());                                                                       x.erase(d, std::string::npos);                                                                                                                             
    s.swap(x);                                                                                                                                                 
    return s;                                                                                                                                                  
}                                                                                                                                                                                                                                                                                                                             
/**                                                                                                                                                            
 * base 64 decode a string (self-modifing)                                                                                                                     
 * On failure, the string is empty.                                                                                                                            
 *                                                                                                                                                             
 * This function is for C++ only (duh)                                                                                                                         
 *                                                                                                                                                             
 * \param[in,out] s the string to be decoded                                                                                                                   
 * \return a reference to the input string                                                                                                                     
 */                                                                                                                                                            
inline std::string& chromium_base64_encode(std::string& s)                                                                                                     
{                                                                                                                                                              
    std::string x(chromium_base64_encode_len(s.size()), '\0');                                                                                                 
    size_t d = chromium_base64_encode(const_cast<char*>(x.data()), s.data(), (int)s.size());                                                                   
    if (d == MODP_B64_ERROR) {                                                                                                                                 
        x.clear();                                                                                                                                             
    } else {                                                                                                                                                   
        x.erase(d, std::string::npos);                                                                                                                         
    }                                                                                                                                                          
    s.swap(x);                                                                                                                                                 
    return s;                                                                                                                                                  
}                                                                                                                                                              
                                                                                                                                                               
#endif /* __cplusplus */                                                                                                                                       
#endif                                                                                                                                                        
...

Fail to build on macOS 10.15

➜  fastbase64 git:(master) ✗ make
cc -fPIC -std=c99 -Wall -Wextra -Wshadow -Wpsabi -Wfatal-errors -O3 -march=native -march=native -mavx2 -c src/chromiumbase64.c -Iinclude
warning: unknown warning option '-Wpsabi' [-Wunknown-warning-option]
1 warning generated.
cc -fPIC -std=c99 -Wall -Wextra -Wshadow -Wpsabi -Wfatal-errors -O3 -march=native -march=native -mavx2 -c src/fastavxbase64.c -Iinclude
warning: unknown warning option '-Wpsabi' [-Wunknown-warning-option]
1 warning generated.
cc -fPIC -std=c99 -Wall -Wextra -Wshadow -Wpsabi -Wfatal-errors -O3 -march=native -march=native -mavx2 -c src/fastavx512bwbase64.c -Iinclude
warning: unknown warning option '-Wpsabi' [-Wunknown-warning-option]
src/fastavx512bwbase64.c:87:23: fatal error: always_inline function
      '_mm512_loadu_si512' requires target feature 'avx512f', but would be
      inlined into function 'fast_avx512bw_base64_encode' that is compiled
      without support for 'avx512f'
        inputvector = _mm512_loadu_si512((__m512i *)(str));
                      ^
1 warning and 1 error generated.
make: *** [fastavx512bwbase64.o] Error 1

Not sure if it's a mistake in Makefile. With make AVX512BW=1 it compiles.

How to convert array of RGBA to base_64

Hi there
I have a uint8_t* which contains unpacked rgba data, my question is how to convert it to base_64?

int _width, _height, _comp;
uint8_t* _pixels = stbi_load("test.png", &_width, &_height, &_comp, 4);//4 for rgba
size_t _size = _width * _height * _comp;
char* _buffer = (char*)malloc(_size * 2);
size_t _expected = chromium_base64_encode(_buffer, reinterpret_cast<char*>(_pixels), _size);

this will not give me the right base64 encoded string.
Would you help me on this issue?
Thanks in advance.

*encoded

Any plans on supporting aarch64?

Hi! I'm currently looking at ways to use this library in ClickHouse (ClickHouse/ClickHouse#31643), and my very first step was to build fastbase64 on my machine. I got several errors on v0.1.0, and on master pretty much the only one is

In file included from src/fastavxbase64.c:3:
In file included from /opt/homebrew/Cellar/llvm/16.0.6/lib/clang/16/include/x86intrin.h:15:
/opt/homebrew/Cellar/llvm/16.0.6/lib/clang/16/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
#error "This header is only meant to be used on x86 and x64 architecture"
 ^

I'm trying to build on Apple M1 Max.
Do you have any plans on supporting ARM?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.