Comments (27)
Hi Ferry, noticed you guys are from my old neighbourhood in Delft. Greetings from The Hague :)
I must admit that I don't have any experience with OpenMP, but I get from this that it's a library that can split the work over multiple cores? Looks good, and is in line with the aim of this project, which is to be fast most of all. ("Correctness" is also important, but sort of meaningless when a base64 string is either correct or not.)
I'd agree with you that the results look bounded by memory latency rather than raw throughput, but to me that's a good thing, it means the code is getting the theoretical maximal performance on the target hardware. If you wanted to get more accurate benchmarks, you could try to work on data only in the L1 or L2 caches.
I'd certainly be interested to pull, but there is one showstopper with the current code: the library would require OpenMP. That's not worth it; I'd like to remain cross-platform and portable. You could fix this by using an environment variable to Make, such as `ENABLE_OPENMP`, to conditionally include `omp.h` and link the library. I also noticed some spacing issues which I will comment on separately. If you fix these (or I can fix them, but then I'd become the author of your commit), I'll pull it.
Not being familiar with pull requests is no problem, as long as I have your commits and can merge them manually.
from base64.
This use of OpenMP is new for me as well, so I needed to learn a few things to get it right. It works mostly by compiler pragmas, so it is ignored when the compiler doesn't support it. Yesterday I already put a conditional around the only OpenMP library function that is called from the code, so now it also builds without `-fopenmp`. But you are right, it should then not include `omp.h`. I'll fix that up, along with the whitespace formatting.
I've split up the code, fixed formatting and permissions and added some benchmarks.
BTW, I checked my memory configuration and found it was set to DDR1333 while I actually have DDR1600. Fixed that and got some more performance.
Thanks for the contribution! I'm amazed, on my Core i5 the SSSE3 and AVX2 decoders are now equally fast and bounded only by memory latency. This is a very cool technology to have in my toolbox, thanks.
I just pushed an 'openmp' branch which contains your work, plus a merge commit where I make some small cleanups. I fixed some typos, I tried to keep the line length at 80 characters, and I made some minor whitespace changes. If you agree on the changes, I'll push it into master.
Also, I think you deserve to be in the Acknowledgements section of the readme, so I'll add a mention if you agree.
Added a shout-out to the Readme.
Thanks for this.
Today I modified the benchmark program to benchmark with different buffer sizes. I found that buffers of 100 kB - 1 MB seem to be the best case, so my previous benchmarks at 10 MB were not really comparable with yours. Fixed that up and added a graph at the bottom of Readme.md.
You can see that above 10 MB the cache is not very effective, so you become memory-constrained, while below 10 kB it's not really worth creating threads. In the sweet spot you can achieve 27 GB/sec.
Er, that should previously have read 100 MB. Now I repeat 10 MB ten times to measure the timing accurately, and repeat proportionally more times when the block is smaller.
With this out of the way, I can get to doing something useful with the code: using it to encode binary data over a 2 Mb/s serial connection. Using base64 I can add control/synchronization characters (0x00 for errors, 0xFF for the preamble, STX, ETX) with almost no performance loss. With the Edison doing 150 MB/s even without threading, encoding will increase communication time by only ~1% (plus the inherent 33% base64 overhead, of course).
Thanks!
Ah, I saw that you merged your branch on top of my master. Could you maybe try to rebase it onto master instead? Rebasing will replay your commits on top of my master branch and "fake history" so that it looks like you were working off the current HEAD all the time. That gives a nice linear history without merge commits.
With rebase you can also squash commits so that it looks like they never happened. You could for instance suppress the plaintext ASCII table that appeared and disappeared in the README. The "recorded history" becomes a clean set of additive patches. Faking history in git is an awesome perfectionistic superpower :)
It's OK if you don't want to rebase, because I'm happy to merge your changes. Let me know if you're OK with the mention I put in the README and with the small tweaks I made in my merge commit.
The question of which benchmark size to choose is tricky. If you choose 100MB, the tests take very long to run on a Raspberry Pi or other slow hardware. If you choose 10MB, you'll be constrained by the speed of memory on desktop systems. (Which I think is an awesome achievement btw :) If you choose a lower size, you get fairer benchmarks for very fast hardware, but the timing variation is larger. I guess I'm not the first to note that meaningful benchmarking is both very difficult and quite subjective.
I think it's cool that you're using this library for serial comms. It's nice that you stretch out 3x8 bits over 4x6 so that you can fit in control characters in the top two bits. But if that's the approach you're taking, I have to note that base64-encoded data is not 6-bit clean, because after reshuffling, it gets translated to the base64 alphabet. (0x00 becomes 'A', and so on.) So the top two bits are not free for use. If you want to patch that out, get rid of the "translate" sections during encoding/decoding.
:-) I think this was my first C code in 10 years, and now I'm attempting OpenMP, git, GitHub and Markdown at the same time. As soon as I figure out how to rebase, I'll attempt to make the history linear.
What I modified in the benchmark is to use the same buffer multiple times, in such a way that the total time is constant regardless of the buffer size. This way we should get the same accuracy for small and large buffers. It takes a bit of time on the Edison and probably a bit more on a Raspberry Pi.
The correct way would of course be to select a constant time and do as many buffer conversions as possible within that time. But I think this is good enough for now.
I measured on the Edison today that the cache has no effect (or doesn't exist), so there is no peak for small buffers. And thread creation costs 9 µs for one thread and 12 µs for two. So in my application (a 2 kB buffer at 2 Mb/sec = 0.2 MB/sec, with the SSE3 codec running at 150 MB/sec), not using OpenMP for encoding/decoding will probably be optimal. Irony.
Actually, base64 should be just fine, I think: in ASCII the control characters sit between 0x00 and 0x1F, and 0xFF is also reserved. I think I saw that your code uses 0xFF for error signalling, so that would be just fine for me. And all I actually need is 0x00 (which the serial port should be able to generate in case of parity errors), 0xFF for preambles, and STX (0x02) and ETX (0x03) to enclose the base64 text.
With respect to rebasing and rewriting history: please tell me how to do that and I'll get it done. I'm a little afraid to mess things up.
Otherwise, just go ahead and merge as you see fit. Thanks for the mention in the Readme!
Take a look at the Pro Git Book, especially at the sections about Rebasing and Rewriting History.
Cheers
So, how would I continue then? Create a new branch starting from my initial branch point (beaf4e3) and then reorder and squash commits into 1) add openmp stuff + 2) benchmarks?
Alright, I think I sorted things out locally. Now how to fix on github? Delete the openmp branch and push up again? Or create a new branch?
You can overwrite a remote branch's content by doing a force push, i.e. `git push -f`. You shouldn't delete a branch with an open pull request, because the pull request will be irreversibly closed and you'd have to open a new one.
EDIT: whoops, just noticed that this is just an issue and not a pull request, so in this case it doesn't really matter whether you force-push, recreate, or branch again.
I force-pushed. Hope that didn't screw anything up.
Hey, I just merged your work into my `openmp` branch. I'll probably merge that into master tomorrow after I sleep on it.
I added two things: a merge commit and a new commit on top. The merge commit basically changes nothing except some spacing/style/syntax issues. The new commit shuffles things a little bit so that the plain functions are back in `lib/lib.c`, and `lib/lib_plain.c` is no longer needed. I think that makes the code cleaner, because `lib/lib_openmp.c` was basically copying the bodies of the plain codecs for the case where the buffer size was below the OMP threshold.
That's great news. Thanks for this great project and for letting me contribute something back!
Merged into master branch, thanks a lot for a great contribution!
Actually, I think I have one last question about the OpenMP code, in regards to this line. `base64_stream_decode` returns `-1` if the requested codec is not implemented, `0` if decoding failed because of invalid input, and `1` if everything is OK. What happens if we spawn four OpenMP threads and one thread encounters a decode error? All of the return values are added, so the accumulated result will be `3`, which is truncated to `1`, which means "success". Shouldn't any value of `result` other than `num_threads` indicate an error?
You're right; I didn't realize there is no test case for this.
We could replace `+=` with `&=` (assuming `-1` cannot occur) and initialize `result` to `1`?
Well, if there are no errors, the sum must be equal to `num_threads`. The `-1` case will come out as `-num_threads`, since all threads encounter the same missing codec. Any result other than those two indicates that one or more threads hit an error.
Pushed fix to master.