Comments (4)

syifan commented on July 1, 2024

I think I made a mistake in my earlier post. If I profile an HC++ or HIP program, I still get the kernel information. If I profile an OpenCL benchmark, however, I do not. The main reason seems to be that the API trace terminates early, so some information is missing from the trace. Maybe that is because hsa_shut_down is never called? Is there a way to properly record the whole trace for an OpenCL benchmark?

from radeon_compute_profiler.

syifan commented on July 1, 2024

OK, I partially solved the problem myself by adding an hsa_shut_down call at the end of the program. However, I do not think this is a legitimate solution, since the program is otherwise purely an OpenCL program. Is there a way to avoid that?

chesik-amd commented on July 1, 2024

In current RCP builds, we don't have a reliable workaround for applications that don't call hsa_shut_down (which includes all OpenCL applications).

For trace profiling of an OpenCL application, you may get better results with the latest release of RCP (the 5.3 release, available at https://github.com/GPUOpen-Tools/RCP/releases).

With this release, you can try OpenCL tracing instead of HSA tracing. Simply replace "-A --hsaaqlpackettrace" with "--apitrace" on the command line.
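As a rough illustration of the flag swap described above, here is a minimal Python sketch that rewrites a profiler command line. Only the flags "-A", "--hsaaqlpackettrace", and "--apitrace" come from this thread; the helper name `use_opencl_trace`, the executable name `rcprof`, and the benchmark path are assumptions used as placeholders.

```python
import shlex

# Flags quoted in the comment above; nothing else about the command line is assumed.
HSA_TRACE_FLAGS = ["-A", "--hsaaqlpackettrace"]
OPENCL_TRACE_FLAG = "--apitrace"


def use_opencl_trace(command: str) -> str:
    """Drop the HSA tracing flags and put the OpenCL API trace flag in their place."""
    swapped = []
    inserted = False
    for arg in shlex.split(command):
        if arg in HSA_TRACE_FLAGS:
            # Insert --apitrace once, at the position of the first HSA flag.
            if not inserted:
                swapped.append(OPENCL_TRACE_FLAG)
                inserted = True
            continue
        swapped.append(arg)
    return shlex.join(swapped)


# Hypothetical invocation: executable name and benchmark path are placeholders.
print(use_opencl_trace("rcprof -A --hsaaqlpackettrace ./my_opencl_benchmark"))
```

A command that contains neither HSA flag is returned unchanged, so the helper can be applied to an existing launch script without checking first.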

We are planning to officially support OpenCL-on-ROCm profiling in a future RCP release, but for now, you can try the above to see if it gives better results for an OpenCL application that doesn't explicitly call hsa_shut_down.

Collecting perf counters using the OpenCL profiler (i.e. using --perfcounter) still won't work for OCL-on-ROCm applications, but this is something that should work better in future releases.

chesik-amd commented on July 1, 2024

This should work better in recent RCP releases, as well as in the profiler available with ROCm 2.0. You should now be able to profile OpenCL applications using --perfcounter when running on the ROCm stack.
