prefix-sum's Introduction

Comparison of Prefix Sum Methods

I measured the elapsed time and the mean absolute error (MAE) of various methods for computing the prefix sum of an array of 1M floats.
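To make the two measured quantities concrete, here is a minimal sketch of a baseline prefix sum, a Kahan-compensated variant, and an MAE computation. The function names (`prefix_sum_simple`, `prefix_sum_kahan`, `mean_abs_error`) are hypothetical, and taking a double-precision running sum as the reference is an assumption; the repository's actual implementations and error definition may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Baseline: out[i] is the running float sum of in[0..i].
std::vector<float> prefix_sum_simple(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}

// Kahan (compensated) summation: c carries the rounding error lost in the
// previous addition and feeds it back into the next one.
// Note: flags such as -ffast-math may optimize the compensation away.
std::vector<float> prefix_sum_kahan(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    float sum = 0.0f;
    float c = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        float y = in[i] - c;   // apply the stored correction
        float t = sum + y;     // low-order bits of y may be lost here
        c = (t - sum) - y;     // recover what was lost
        sum = t;
        out[i] = sum;
    }
    return out;
}

// Mean absolute error of `out` against a double-precision running sum of `in`.
// Assumption: the double-precision sum is treated as the reference.
double mean_abs_error(const std::vector<float>& in, const std::vector<float>& out) {
    if (in.empty()) return 0.0;
    double ref = 0.0;
    double err = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        ref += in[i];
        err += std::fabs(ref - static_cast<double>(out[i]));
    }
    return err / static_cast<double>(in.size());
}
```

Calling `mean_abs_error(in, prefix_sum_simple(in))` and `mean_abs_error(in, prefix_sum_kahan(in))` on the same input gives a rough feel for the accuracy gap that the MAE column reports.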

Compiled with SSE2 option

                           g++ 5.4.0         g++ 7.5.0         g++ 9.3.0         clang 11.0.3*     msvc 19.26        Avg SpeedUp  MAE
Type    Method             Time(s)  SpeedUp  Time(s)  SpeedUp  Time(s)  SpeedUp  Time(s)  SpeedUp  Time(s)  SpeedUp
float   simple(baseline)   1.459    0%       1.255    0%       0.983    0%       0.901    0%       1.247    0%       0%           2.037
        simple_double      1.546    -6%      1.648    -24%     1.334    -26%     1.281    -30%     1.156    8%       -11%         -
        sse                0.521    180%     0.549    128%     0.341    188%     0.444    103%     0.384    225%     172%         0.683
        kahan              5.924    -75%     4.789    -74%     3.987    -75%     3.497    -74%     4.100    -70%     -73%         0.000
        unroll4            1.401    4%       1.247    1%       0.972    1%       0.893    1%       1.017    23%      9%           2.037
        unroll4_reorder1   1.068    37%      0.984    27%      0.802    23%      0.745    21%      0.818    52%      36%          0.768
        unroll4_shift      0.896    63%      0.514    144%     0.919    7%       0.545    65%      0.917    36%      57%          0.683
        unroll8            1.431    2%       1.240    1%       1.042    -6%      0.886    2%       1.048    19%      7%           2.037
        unroll8_reorder1   1.062    37%      0.926    36%      0.763    29%      0.660    37%      0.780    60%      44%          1.160
        unroll8_reorder2   1.249    17%      0.849    48%      0.645    52%      0.634    42%      0.692    80%      55%          0.833
        unroll8_shift      1.210    21%      0.657    91%      1.248    -21%     0.591    52%      1.294    -4%      23%          0.344
        unroll16           1.378    6%       1.242    1%       1.009    -3%      0.897    0%       1.036    20%      8%           2.037
        unroll16_reorder1  0.880    66%      0.891    41%      0.715    37%      0.701    29%      0.728    71%      52%          1.198
        unroll16_reorder2  0.657    122%     0.793    58%      0.613    60%      0.533    69%      0.847    47%      65%          2.277
double  simple(baseline)   1.486    0%       1.291    0%       0.997    0%       0.885    0%       1.526    0%       0%           2.037
        kahan              5.563    -73%     4.813    -73%     4.079    -76%     3.466    -74%     4.267    -64%     -70%         -
        unroll4            1.478    1%       1.248    3%       1.032    -3%      0.878    1%       1.018    50%      19%          2.037
        unroll4_reorder1   1.079    38%      1.010    28%      0.789    26%      0.741    19%      0.794    92%      51%          0.768
        unroll4_shift      0.927    60%      0.544    138%     0.919    8%       0.468    89%      0.967    58%      70%          0.683
        unroll8            1.549    -4%      1.226    5%       0.958    4%       0.883    0%       1.035    47%      19%          2.037
        unroll8_reorder1   0.929    60%      0.944    37%      0.754    32%      0.671    32%      0.765    100%     61%          1.160
        unroll8_reorder2   0.808    84%      0.831    55%      0.619    61%      0.616    44%      0.673    127%     83%          0.833
        unroll8_shift      1.240    20%      0.648    99%      1.208    -18%     0.606    46%      1.269    20%      32%          0.344
        unroll16           1.431    4%       1.247    4%       1.015    -2%      0.904    -2%      1.019    50%      19%          2.037
        unroll16_reorder1  0.861    73%      0.907    42%      0.690    44%      0.622    42%      0.727    110%     72%          1.198
        unroll16_reorder2  0.621    139%     0.812    59%      0.619    61%      0.509    74%      0.802    90%      85%          2.277

Compiled with AVX option

                           g++ 5.4.0         g++ 7.5.0         g++ 9.3.0         clang 11.0.3*     msvc 19.26        Avg SpeedUp  MAE
Type    Method             Time(s)  SpeedUp  Time(s)  SpeedUp  Time(s)  SpeedUp  Time(s)  SpeedUp  Time(s)  SpeedUp
float   simple(baseline)   1.245    0%       1.071    0%       0.995    0%       0.862    0%       1.243    0%       0%           2.037
        simple_double      1.614    -23%     1.515    -29%     1.263    -21%     1.173    -27%     1.269    -2%      -17%         -
        avx                0.514    142%     0.502    113%     0.492    102%     0.481    79%      0.577    116%     108%         0.344
        sse                0.394    216%     0.358    199%     0.345    189%     0.405    113%     0.504    147%     159%         0.683
        kahan              4.825    -74%     4.246    -75%     3.860    -74%     3.406    -75%     4.069    -69%     -73%         0.000
        unroll4            1.179    6%       1.213    -12%     0.992    0%       0.861    0%       1.249    0%       -1%          2.037
        unroll4_reorder1   0.988    26%      0.871    23%      0.813    22%      0.727    19%      0.972    28%      24%          0.768
        unroll4_shift      0.679    84%      0.910    18%      0.937    6%       0.570    51%      0.534    133%     76%          0.683
        unroll8            1.229    1%       1.049    2%       0.949    5%       0.862    0%       1.027    21%      9%           2.037
        unroll8_reorder1   1.026    21%      0.753    42%      0.734    36%      0.630    37%      0.792    57%      43%          1.160
        unroll8_reorder2   0.758    64%      0.673    59%      0.627    59%      0.570    51%      0.715    74%      63%          0.833
        unroll8_shift      0.756    65%      1.356    -21%     1.246    -20%     0.776    11%      0.631    97%      42%          0.344
        unroll16           1.171    6%       1.009    6%       1.003    -1%      0.876    -2%      1.074    16%      7%           2.037
        unroll16_reorder1  0.829    50%      0.769    39%      0.680    46%      0.571    51%      0.773    61%      53%          1.198
        unroll16_reorder2  0.726    71%      0.626    71%      0.569    75%      0.494    75%      0.943    32%      58%          2.277
double  simple(baseline)   1.214    0%       1.066    0%       0.952    0%       0.875    0%       1.537    0%       0%           2.037
        kahan              4.790    -75%     4.175    -74%     3.821    -75%     3.431    -75%     3.986    -61%     -70%         -
        unroll4            1.152    5%       1.032    3%       0.933    2%       0.866    1%       1.243    24%      10%          2.037
        unroll4_reorder1   0.975    25%      0.890    20%      0.810    18%      0.732    19%      0.959    60%      35%          0.768
        unroll4_shift      0.683    78%      0.950    12%      0.919    4%       0.567    54%      0.523    194%     98%          0.683
        unroll8            1.194    2%       1.063    0%       0.965    -1%      0.870    1%       1.034    49%      18%          2.037
        unroll8_reorder1   0.876    39%      0.761    40%      0.715    33%      0.639    37%      0.838    83%      54%          1.160
        unroll8_reorder2   0.791    54%      0.665    60%      0.642    48%      0.576    52%      0.721    113%     76%          0.833
        unroll8_shift      0.750    62%      1.237    -14%     1.259    -24%     0.763    15%      0.624    146%     61%          0.344
        unroll16           1.158    5%       1.031    3%       0.960    -1%      0.874    0%       1.034    49%      19%          2.037
        unroll16_reorder1  0.825    47%      0.750    42%      0.705    35%      0.609    44%      0.747    106%     66%          1.198
        unroll16_reorder2  0.752    61%      0.636    68%      0.587    62%      0.492    78%      0.958    61%      66%          2.277

Contributors

bab2min
