The fp16computetest from keijiro

FP16ComputeTest

This repository contains a simple program that tests the performance of half-precision floating-point operations on DirectX 11/12 using the min16float type.

The following screenshot shows the result on a Radeon RX 460. The first two highlighted lines show the time taken by large matrix multiplications with the float type. The next two lines show the time taken by the same operation with the min16float type.

The next screenshot shows the result with transposed matrices, which improve performance thanks to better data locality.
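The data-locality idea behind the transposed variant can be sketched on the CPU as follows. This is only an illustration in Python, not the repository's shader code; the function names (`matmul_naive`, `transpose`, `matmul_transposed`) are hypothetical, and the sketch assumes row-major storage, where reading a column of the second operand is a strided access while reading a row is contiguous.

```python
# Illustrative CPU-side sketch only; the actual test runs on the GPU.
# All function names here are hypothetical, not taken from the repository.

def matmul_naive(a, b):
    """Compute A x B, reading B column by column (strided in row-major storage)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def transpose(b):
    """Return B^T, so each original column of B becomes a contiguous row."""
    return [list(col) for col in zip(*b)]


def matmul_transposed(a, bt):
    """Compute A x B given B^T: the inner loop reads both operands row-wise."""
    return [[sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in bt]
            for row_a in a]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = matmul_transposed(a, transpose(b))
```

Both functions produce the same product; only the memory access pattern of the inner loop differs, which is where any locality benefit would come from.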

It seems that min16float improved performance even though the RX 460 doesn't have an FP16 pipeline.

The following screenshots show the results of the same program on a GeForce GTX 1050 Ti. In these cases min16float made performance worse. It seems that Pascal's FP16 pipelines are not utilized for some reason.

Please note that I'm not trying to draw an accurate conclusion from these results. You may find some doubtful points in them. For example, why does the 1050 Ti run 10x faster than the RX 460? The only meaningful conclusion is that you can't get a quick performance boost simply by using min16float.
