With JAVA_HOME and PATH pointing to a Panama vectorIntrinsics build, run:
./mvnw package && java --add-modules jdk.incubator.vector -jar target/bench.jar
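Before running the command above, it can help to confirm that the JDK on PATH actually ships the incubating Vector API module. The following is a minimal sketch, not part of this repository: the helper name has_vector_module is hypothetical, and it only inspects the output of java --list-modules.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Reads the output of `java --list-modules` on stdin and succeeds only if
# the incubating Vector API module is listed.
has_vector_module() {
    grep -Eq '^jdk\.incubator\.vector(@|$)'
}

# Example usage:
#   java --list-modules | has_vector_module \
#     || echo "jdk.incubator.vector not found in this JDK" >&2
```

If the check fails, JAVA_HOME and PATH most likely point to a JDK that is not a vectorIntrinsics build.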
If you do not have a local Panama vectorIntrinsics build, run:
./ci.sh
This shallow-clones the GitHub mirror of the Panama vectorIntrinsics branch, builds the JDK, and runs the benchmarks with it. Make sure your system fulfills the OpenJDK build requirements; see the "Clean Ubuntu Setup" section below for the packages to install on a clean Ubuntu system. The cloned and fully built JDK takes up ~5.6GB inside the panama-vector directory. In addition, the hsdis utility library is built and installed into the JDK's lib directory.
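Since the build needs roughly 5.6GB, a quick free-space check before starting ./ci.sh can save a failed build. This is a sketch under assumptions: the helper names free_kb and enough_space_kb are hypothetical, and the 6000000 KiB threshold is simply the ~5.6GB figure rounded up.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository).

# Prints the free space, in KiB, of the filesystem holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if the filesystem holding directory $1 has at least $2 KiB free.
enough_space_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example usage (threshold is an assumption: ~5.6GB rounded up):
#   enough_space_kb . 6000000 || echo "less than ~5.6GB free here" >&2
```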
sudo apt install -y libasound2-dev \
libfontconfig1-dev \
libcups2-dev \
libx11-dev \
libxext-dev \
libxrender-dev \
libxrandr-dev \
libxtst-dev \
libxt-dev \
git \
zip \
unzip \
automake \
autoconf \
build-essential
In order to see the x86 code generated by the JIT compiler for all methods, run:
java --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*Matrix*.* -cp target/bench.jar bench.C2
The x86 code is then printed to stdout. This requires the hsdis utility library to be available in the $JAVA_HOME/lib directory, which ./ci.sh provides.
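Because the disassembly depends on hsdis being present in $JAVA_HOME/lib, a small check can tell you up front whether the command will print assembly. This is a sketch, not part of this repository: find_hsdis is a hypothetical helper, and the file-name pattern hsdis-<arch>.so (or .dylib) is an assumption about how the library is named on Linux and macOS.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints the path of the first hsdis library found in the lib directory of
# the JDK at $1, or fails if none is found.
# Assumption: the library is named hsdis-<arch>.so or hsdis-<arch>.dylib.
find_hsdis() {
    for f in "$1"/lib/hsdis-*.so "$1"/lib/hsdis-*.dylib; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Example usage:
#   find_hsdis "$JAVA_HOME" || echo "hsdis not found in $JAVA_HOME/lib" >&2
```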
Benchmark                      Mode  Cnt    Score    Error  Units
Bench.Matrix4f_invert          avgt    5   29.046 ±  3.309  ns/op
Bench.Matrix4f_storePutBB      avgt    5    7.324 ±  0.177  ns/op
Bench.Matrix4f_storePutFB      avgt    5    5.018 ±  0.389  ns/op
Bench.Matrix4f_storeU          avgt    5    2.764 ±  0.070  ns/op
Bench.Matrix4fvArr_invert128   avgt    5  100.726 ± 13.251  ns/op
Bench.Matrix4fvArr_storePutFB  avgt    5    5.087 ±  0.062  ns/op
Bench.Matrix4fvArr_storeU      avgt    5    2.905 ±  0.034  ns/op
Bench.Matrix4fvArr_storeV256   avgt    5    1.806 ±  0.007  ns/op
Bench.Matrix4fvArr_storeV512   avgt    5   36.230 ±  1.678  ns/op
Bench.mul128LoopArr            avgt    5    8.702 ±  0.033  ns/op
Bench.mul128LoopBB             avgt    5    9.211 ±  0.066  ns/op
Bench.mul128UnrolledArr        avgt    5   10.544 ±  0.064  ns/op
Bench.mul128UnrolledBB         avgt    5   10.633 ±  0.013  ns/op
Bench.mul256Arr                avgt    5    8.193 ±  0.102  ns/op
Bench.mul256BB                 avgt    5    8.129 ±  0.059  ns/op
Bench.mulAffineScalarFma       avgt    5   11.534 ±  0.258  ns/op
Bench.mulScalar                avgt    5   20.053 ±  0.279  ns/op
Bench.mulScalarFma             avgt    5   14.636 ±  0.266  ns/op
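To compare variants against a baseline such as Bench.mulScalar, the result lines above can be piped through a small filter. This is a sketch, not part of this repository: relative_to is a hypothetical helper that assumes the column layout shown above (name in the first column, avgt in the second, score in the fourth).

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Reads JMH result lines on stdin and prints each benchmark name followed by
# (baseline score / benchmark score). Since the scores are average times,
# a value above 1 means the benchmark is faster than the baseline $1.
relative_to() {
    awk -v base="$1" '
        $2 == "avgt" {
            name[++n] = $1
            score[n] = $4
            if ($1 == base) ref = $4
        }
        END {
            if (ref == "") exit 1   # baseline not found in the input
            for (i = 1; i <= n; i++)
                printf "%s %.2f\n", name[i], ref / score[i]
        }'
}

# Example usage, with the table above saved to a file named results.txt:
#   relative_to Bench.mulScalar < results.txt
```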