DocumentSASS

The instruction sets for NVIDIA GPUs have very sparse official documentation.

Other projects have worked on examining the instructions mainly through reverse-engineering, such as MaxAs, AsFermi, CuAssembler, TuringAs, KeplerAs, Decuda, and the paper Dissecting the NVidia Turing T4 GPU.

Since the instructions and architecture change from generation to generation, it is an uphill battle.
What if a description of the instruction encoding could be found within the tools provided by NVIDIA?
What if the instruction latencies could be found inside these as well?

The answer is that of course they can; otherwise the compiler would do a poor job of scheduling instructions. Furthermore, for SASS it turns out that fixed-latency instructions have the number of stall cycles hard-coded into them [src]. It is just a question of finding where this data is hidden.

It turns out that an extensive description of SASS instructions, as well as their latencies, is contained in two specific strings in nvdisasm. Instead of having to write micro-benchmarks to find latencies, or use reverse engineering to make an assembler, one could in theory just consult the files generated from these strings. Instruction scheduling info is given in the latencies file, with the minimum time for fixed-latency ops essentially being the latency. See NOTES.

For some additional, unrelated observations, see OTHER.

How to run

The easy way is by simply running this notebook in Google Colab. No requirements.

Requirements to run locally: Linux, Python 3, CUDA Toolkit. Run make to generate the raw files describing instructions and latencies. Be sure to change the paths at the top of the Makefile if they are different on your system. Tested with CUDA 11.6.

How it works

  1. nvcc is used to compile example.cu to .cubin binaries for a list of architectures.
  2. cc is used to compile intercept.c to a .so library that serves as a man-in-the-middle for data from memcpy calls.
  3. We intercept nvdisasm applied on each binary file using intercept.so.
  4. The result is filtered with strings to only get text, and then the script funnel.py gathers the relevant portions and writes them to files.
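The text-filtering part of step 4 can be sketched roughly as follows. This is a minimal illustration and not the actual contents of funnel.py: the function names extract_strings and select_large_blocks, the minimum lengths, and the sample bytes are all hypothetical, chosen only to show the idea of pulling printable runs out of captured binary data (similar in spirit to the strings utility) and keeping the long ones.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (plus tab/newline) of at least min_len bytes.

    Newlines are treated as part of a run so that a multi-line text block
    stays together as a single string.
    """
    pattern = re.compile(rb"[\x20-\x7e\t\n]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def select_large_blocks(strings: List[str], min_chars: int = 32) -> List[str]:
    """Keep only long strings, assuming the interesting data is a large text blob."""
    return [s for s in strings if len(s) >= min_chars]


if __name__ == "__main__":
    # Placeholder bytes standing in for data captured from memcpy calls.
    # This is NOT real nvdisasm output.
    captured = (
        b"\x00\x01ab\x00"
        b"placeholder description line one\nplaceholder description line two"
        b"\x00\xff\x7fxyz\x00"
    )
    for block in select_large_blocks(extract_strings(captured)):
        print(block)
```

In a real run, the input would be the bytes captured through intercept.so rather than a literal, and the selection criterion would need to match whatever distinguishes the instruction and latency descriptions from other strings.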

An initial approach was to simply run strings nvdisasm to get text embedded in the executable, but it turned out the relevant strings were dynamically generated (and only for the input architecture), which is why this solution is needed.

TODO

  • It appears the instruction string may be slightly corrupted for compute capability 3.5 currently.

documentsass's Issues

Generated Files

Has anyone posted a zipped copy of the output files somewhere for those of us not on Linux? I tried using the link to the google Colab, but it appears to be dead.
