mperf

Release Notes | Roadmap | Apps | 中文

mperf is a modular micro-benchmark/toolkit for kernel performance analysis.

Features.

  • Investigate the basic micro-architectural (uarch) parameters of the target CPU/GPU.
  • Plot hierarchical roofline models, used to evaluate performance.
  • Collect CPU/GPU PMU event data.
  • Analyze CPU/GPU PMU event data (TMA methodology and customized metrics) to identify performance bottlenecks.
  • OpenCL Linter, used to guide manual OpenCL kernel optimization [TBD].
  • C++ project.
  • Supported platforms: ARM CPUs, Mali GPUs, Adreno 6xx GPUs.
  • Lightweight and embeddable library.
  • The iOS platform is not yet fully functional.
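
For readers new to the roofline model named in the feature list, the arithmetic behind it can be sketched as below. This is a generic illustration of the model, not mperf code: the names roofline_gflops, MemLevel and hierarchical_roofline_gflops are hypothetical, and any numbers passed in are made up by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Attainable performance (GFLOP/s) under the classic roofline model:
// the smaller of the compute ceiling and (bandwidth x arithmetic intensity).
double roofline_gflops(double peak_gflops, double bandwidth_gbps, double intensity) {
    return std::min(peak_gflops, bandwidth_gbps * intensity);
}

// One memory level (e.g. L1, L2, DRAM) in a hierarchical roofline.
struct MemLevel {
    double bandwidth_gbps;  // sustained bandwidth of this level, GB/s
    double intensity;       // FLOP per byte moved at this level
};

// Hierarchical variant: every memory level contributes its own ceiling,
// and the tightest ceiling bounds the kernel.
double hierarchical_roofline_gflops(double peak_gflops,
                                    const std::vector<MemLevel>& levels) {
    double bound = peak_gflops;
    for (const MemLevel& level : levels) {
        bound = std::min(bound, level.bandwidth_gbps * level.intensity);
    }
    return bound;
}
```

A kernel whose bound comes from a memory level is memory-bound at that level; one whose bound equals the compute ceiling is compute-bound.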

Installation

mperf supports the CMake build system and requires a CMake version newer than 3.15.2. To build mperf, follow these steps:

  • clone or download the project
    git clone https://github.com/MegEngine/mperf.git
    cd mperf
    git submodule update --init --recursive
  • choose a test platform
    • to test an ARM processor on Android
      • an NDK is required
        • download the NDK and extract it on the host machine
        • set the NDK_ROOT environment variable to the path of the extracted NDK directory
    • to test an x86 processor on Linux
      • cmake must be able to find a gcc or clang compiler through the PATH environment variable
  • if your target OS is Android, run android_build.sh to build
    • print the usage of android_build.sh
      ./android_build.sh -h
    • build for armv7 cpu
      ./android_build.sh -m armeabi-v7a
    • build for arm64 cpu
      ./android_build.sh [-m arm64-v8a] # default march is arm64-v8a
    • build with mali mobile gpu
      ./android_build.sh -g mali [arm64-v8a, armeabi-v7a]
    • build with adreno mobile gpu
      ./android_build.sh -g adreno [arm64-v8a, armeabi-v7a]
    • build with pfm
      ./android_build.sh -p [arm64-v8a, armeabi-v7a]
    • build in debug mode
      ./android_build.sh -d [arm64-v8a, armeabi-v7a]
    • build with your custom install directory
      ./android_build.sh -i /your/custom/cmake/install/prefix [arm64-v8a, armeabi-v7a]
      e.g.: ./android_build.sh -i ~/mperf_install [-m arm64-v8a] # default march is arm64-v8a
  • if your target OS is Linux, build with cmake directly; to enable pfm, add -DMPERF_ENABLE_PFM=ON to the cmake command
    cmake -S . -B "build-x86" -DMPERF_ENABLE_PFM=ON
    cmake --build "build-x86" --config Release 
  • after the build, the executables are stored in the apps directory under the mperf build directory. You can then install mperf to your system path or to a custom install directory with
    cmake --build <mperf_build_dir> --target install 
    e.g.: cmake --build ./build-arm64-v8a/ --target install
  • you can now import the installed mperf with the find_package command, for example
    set(mperf_DIR /path/to/your/installed/mperfConfig.cmake) # Note: this must be the directory containing mperfConfig.cmake, not the file itself, e.g. set(mperf_DIR ~/mperf_install/lib/cmake/mperf/)
    find_package(mperf REQUIRED)
    target_link_libraries(your_target mperf::mperf)
  • alternatively, add_subdirectory(mperf) incorporates the library directly into your CMake project.

Usage

  • basic usage for mperf xpmu module:
    mperf::CpuCounterSet cpuset = "CYCLES,INSTRUCTIONS,...";
    mperf::XPMU xpmu(cpuset);
    xpmu.run();
    
    ... // add your function to be measured
    
    xpmu.sample();
    xpmu.stop();
    please see cpu_pmu / mali_pmu / adreno_pmu for more details.
  • basic usage for mperf tma module:
    mperf::tma::MPFTMA mpf_tma(mperf::MPFXPUType::A55);
    mpf_tma.init(
            {"Frontend_Bound", "Bad_Speculation", "Backend_Bound", "Retiring", ...});
    size_t gn = mpf_tma.group_num();
    for (size_t i = 0; i < gn; ++i) {
        mpf_tma.start(i);
        for (size_t j = 0; j < iter_num; ++j) {
            ... // add your function to be measured
        }
        mpf_tma.sample_and_stop(iter_num);
    }
    mpf_tma.deinit();
    please see arm_cpu_tma for more details.

Source Directory Structure

  • apps: Various user examples; see the apps doc for more details.
  • eca: A module for collecting and analyzing PMU event data (including TMA analysis).
  • uarch: A set of low-level micro-benchmarks to investigate the basic micro-architectural (uarch) parameters of the target CPU/GPU.
  • doc: Documents about roofline and TMA usage; see the index for the list.
  • cmake: CMake-related files.
  • common: Common helper functions.
  • third_party: Third-party dependencies.
  • linter: OpenCL Linter [TBD].

Tutorial

  • A tutorial on optimizing matmul to reach peak performance on an ARM A55 core, which illustrates the basic workflow of using mperf to guide an optimization job; see "optimize the matmul with the help of mperf".

License

mperf is licensed under the Apache-2.0 license.

mperf's Issues

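cntfrq_el0 precision issue

A question: is it true that cntfrq_el0 cannot reach cycle-level precision, so that PMCCNTR_EL0 has to be used to get cycle-level precision?
But PMCCNTR_EL0 requires root privileges, right? It seems it has to be enabled from kernel mode first.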
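
As background for the question above: cntfrq_el0 holds the frequency of the Arm generic timer, whose count is read through cntvct_el0. That timer usually ticks much more slowly than the core clock, so one tick spans many CPU cycles. The sketch below is not mperf code; counter_freq_hz, counter_now and tick_ns are hypothetical helper names, and it assumes the OS allows user-space reads of the generic timer, as Linux normally does. On non-aarch64 hosts it falls back to std::chrono::steady_clock so that it still builds.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>

// Tick rate of the counter in Hz: cntfrq_el0 on aarch64,
// the steady_clock tick rate elsewhere (fallback for portability).
static inline uint64_t counter_freq_hz() {
#if defined(__aarch64__)
    uint64_t freq;
    asm volatile("mrs %0, cntfrq_el0" : "=r"(freq));
    return freq;
#else
    using period = std::chrono::steady_clock::period;
    return static_cast<uint64_t>(period::den / period::num);
#endif
}

// Current counter value: cntvct_el0 on aarch64 (isb keeps the read
// from being reordered ahead of earlier instructions).
static inline uint64_t counter_now() {
#if defined(__aarch64__)
    uint64_t value;
    asm volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(value) : : "memory");
    return value;
#else
    return static_cast<uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Duration of one counter tick in nanoseconds, i.e. the best resolution
// this counter can offer.
static inline double tick_ns(uint64_t freq_hz) {
    return 1e9 / static_cast<double>(freq_hz);
}
```

Comparing tick_ns(counter_freq_hz()) with the core clock period shows how many cycles a single tick covers on a given device. Reading PMCCNTR_EL0 directly from user space only works when the kernel has enabled EL0 access to the PMU.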

DSU_L3D_CACHE_REFILL and DSU_L3D_CACHE_WB cannot be collected on A55

Following plot the roofline.md, capturing the CPU function call TMA data fails. The events behind "Metric_DRAM_BW_Use" appear to be unsupported: https://github.com/ARM-software/data/blob/master/pmu/cortex-a55.json indeed lists neither DSU_L3D_CACHE_REFILL nor DSU_L3D_CACHE_WB. Oddly, changing the event to L3D_CACHE_REFILL still produces the same error.

enable the following metrics for mperf tma in cpu_tma_transpose

mpf_tma.init(
        {"Metric_GFLOPs_Use", "Metric_DRAM_BW_Use", "Metric_L3_BW_Use", "Metric_L2_BW_Use"});

The runtime error is as follows:

PD2148:/data/local/tmp/mperf $ ./cpu_tma_transpose
the iter num is 10
the gn and uncore_evt_nums 3, 2
terminating with uncaught exception of type mperf::MperfError: Failed to get a file descriptor for DSU_L3D_CACHE_REFILL

/usr/bin/ld: ../build-x86/third_party/libpfm4//libpfm.a(pfmlib_common.c.o):(.data.rel.ro+0x38): undefined reference to `amd64_k8_revb_support

g++ main.cpp -o test --std=c++11 -I ../include/ -I ../build-x86 -L ../build-x86/eca/xpmu/ -L ../build-x86/eca/tma/ -L ../build-x86/third_party/libpfm4/ -lmperf_xpmu -lpfm_x86 -lpfm
/usr/bin/ld: /tmp/ccRZS8u0.o: in function `main':
main.cpp:(.text+0x280): undefined reference to `set_cpu_thread_affinity_spec_core(unsigned long)'
/usr/bin/ld: main.cpp:(.text+0x2b5): undefined reference to `mperf::WallTimer::WallTimer()'
/usr/bin/ld: main.cpp:(.text+0x2ef): undefined reference to `mperf::WallTimer::get_msecs() const'
/usr/bin/ld: ../build-x86/third_party/libpfm4//libpfm.a(pfmlib_common.c.o):(.data.rel.ro+0x20): undefined reference to `netburst_support'
/usr/bin/ld: ../build-x86/third_party/libpfm4//libpfm.a(pfmlib_common.c.o):(.data.rel.ro+0x28): undefined reference to `netburst_p_support'
/usr/bin/ld: ../build-x86/third_party/libpfm4//libpfm.a(pfmlib_common.c.o):(.data.rel.ro+0x30): undefined reference to `amd64_k7_support'
/usr/bin/ld: ../build-x86/third_party/libpfm4//libpfm.a(pfmlib_common.c.o):(.data.rel.ro+0x38): undefined reference to `amd64_k8_revb_support'

error with ioctl on qcom adreno 730

./gpu_adreno_pmu_test 1000 10 1
opencl lib found but can not be opened: /system/lib64/libOpenCL.so err=(null)
opencl lib found but can not be opened: /system/lib64/libOpenCL_system.so err=(null)
opencl lib found but can not be opened: /system/lib64/egl/libGLES_mali.so err=(null)
opencl lib found but can not be opened: /system/vendor/lib64/egl/libGLES_mali.so err=(null)
opencl lib found but can not be opened: /system/vendor/lib64/libPVROCL.so err=(null)
opencl lib found but can not be opened: /usr/lib64/libGLESv2.so.2.1.0 err=(null)
opencl lib found but can not be opened: /data/data/org.pocl.libs/files/lib64/libpocl.so err=(null)
use opencl: libOpenCL.so
ioctl: read in the current value of an event set failed: Operation not permitted
terminating with uncaught exception of type std::runtime_error: Failed to read the value of an event set.
Aborted


device info: Qualcomm, Adreno (TM) 730
build cmd: ./android_build.sh -g adreno

armv7 CPU compile error

Building with ./android_build.sh -m armeabi-v7a fails with the following errors:

/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:27:13: error: invalid operand in inline asm: 'prfm pldl1keep, [$0, #64]3:ldr d2, [$0]fmla v3.4s, v2.4s, v1.s[0]add $0, $0, #64prfm pldl1keep, [$0, #64]subs ${1:w}, ${1:w}, #1bne 3b'
"prfm pldl1keep, [%[b_ptr], #64]\n"
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:27:13: error: invalid operand in inline asm: 'prfm pldl1keep, [$0, #64]3:ldr d2, [$0]fmla v3.4s, v2.4s, v1.s[0]add $0, $0, #64prfm pldl1keep, [$0, #64]subs ${1:w}, ${1:w}, #1bne 3b'
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:27:13: error: invalid instruction
:1:2: note: instantiated into assembly here
prfm pldl1keep, [r0, #64]
^~~~
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:29:14: error: operand must be a register in range [r0, r15]
"ldr d2, [%[b_ptr]]\n"
^
:3:5: note: instantiated into assembly here
ldr d2, [r0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:30:14: error: unexpected token in argument list
"fmla v3.4s, v2.4s, v1.s[0]\n"
^
:4:24: note: instantiated into assembly here
fmla v3.4s, v2.4s, v1.s[0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:32:14: error: invalid instruction
"prfm pldl1keep, [%[b_ptr], #64]\n"
^
:6:1: note: instantiated into assembly here
prfm pldl1keep, [r0, #64]
^~~~
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:33:14: error: unexpected token in operand
"subs %w[K], %w[K], #1\n"
^
:7:6: note: instantiated into assembly here
subs , , #1
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:41:13: error: invalid operand in inline asm: 'prfm pldl1keep, [$0, #512]3:ldr d2, [$0]fmla v3.4s, v2.4s, v1.s[0]add $0, $0, #64prfm pldl1keep, [$0, #512]subs ${1:w}, ${1:w}, #1bne 3b'
"prfm pldl1keep, [%[b_ptr], #512]\n"
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:41:13: error: invalid operand in inline asm: 'prfm pldl1keep, [$0, #512]3:ldr d2, [$0]fmla v3.4s, v2.4s, v1.s[0]add $0, $0, #64prfm pldl1keep, [$0, #512]subs ${1:w}, ${1:w}, #1bne 3b'
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:41:13: error: invalid instruction
:1:2: note: instantiated into assembly here
prfm pldl1keep, [r0, #512]
^~~~
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:43:14: error: operand must be a register in range [r0, r15]
"ldr d2, [%[b_ptr]]\n"
^
:3:5: note: instantiated into assembly here
ldr d2, [r0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:44:14: error: unexpected token in argument list
"fmla v3.4s, v2.4s, v1.s[0]\n"
^
:4:24: note: instantiated into assembly here
fmla v3.4s, v2.4s, v1.s[0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:46:14: error: invalid instruction
"prfm pldl1keep, [%[b_ptr], #512]\n"
^
:6:1: note: instantiated into assembly here
prfm pldl1keep, [r0, #512]
^~~~
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:47:14: error: unexpected token in operand
"subs %w[K], %w[K], #1\n"
^
:7:6: note: instantiated into assembly here
subs , , #1
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:63:13: error: register expected
"ld1 {v1.4s}, [%[a_ptr]]\n"
^
:1:7: note: instantiated into assembly here
ld1 {v1.4s}, [r0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:64:14: error: register expected
"ld1 {v2.4s}, [%[a_ptr]]\n"
^
:2:6: note: instantiated into assembly here
ld1 {v2.4s}, [r0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:65:14: error: register expected
"ld1 {v3.4s}, [%[a_ptr]]\n"
^
:3:6: note: instantiated into assembly here
ld1 {v3.4s}, [r0]
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:41:13: error: invalid operand in inline asm: 'prfm pldl1keep, [$0, #512]3:ldr d2, [$0]fmla v3.4s, v2.4s, v1.s[0]add $0, $0, #64prfm pldl1keep, [$0, #512]subs ${1:w}, ${1:w}, #1bne 3b'
"prfm pldl1keep, [%[b_ptr], #512]\n"
^
/workspace/mperf/apps/cpu_pmu_analysis/prefetch.cpp:41:13: error: invalid operand in inline asm: 'prfm pldl1keep, [$0, #512]3:ldr d2, [$0]fmla v3.4s, v2.4s, v1.s[0]add $0, $0, #64prfm pldl1keep, [$0, #512]subs ${1:w}, ${1:w}, #1bne 3b'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
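
The log shows AArch64-only assembly (prfm, fmla on v registers, ld1 {v1.4s}, the %w operand modifier) being fed to the 32-bit ARM assembler, which rejects it. One common way to keep such code building on both targets is to guard it by architecture. The sketch below only illustrates that idea: prefetch_l1 and sum_with_prefetch are hypothetical names, and this is not the project's actual fix for prefetch.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: issue an L1 prefetch hint using the instruction
// that matches the target architecture.
static inline void prefetch_l1(const void* ptr) {
#if defined(__aarch64__)
    asm volatile("prfm pldl1keep, [%0]" : : "r"(ptr));  // AArch64 only
#elif defined(__arm__)
    asm volatile("pld [%0]" : : "r"(ptr));              // 32-bit ARM (armv7)
#else
    __builtin_prefetch(ptr, 0, 3);                      // GCC/Clang builtin
#endif
}

// Sum an array while prefetching 16 floats (64 bytes) ahead.
float sum_with_prefetch(const float* data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        // Prefetch once per 64-byte block, and only inside the array.
        if ((i % 16) == 0 && i + 16 < n) {
            prefetch_l1(data + i + 16);
        }
        acc += data[i];
    }
    return acc;
}
```

The NEON arithmetic in the failing file would need the same treatment, for example an armv7 variant of the kernel or excluding that app from armeabi-v7a builds.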

cmake error with NDK r25c, WSL ubuntu 20.04

env:

  1. android-ndk-r25c
  2. WSL Ubuntu 20.04
$ ./android_build.sh -g mali
the SRC_DIR val is /home/stayua01/code/mperf
build with gpu:mali
NDK_ROOT: /mnt/c/wsl/software/android-ndk-r25c/
strip remove old build
build dir: /home/stayua01/code/mperf/build-arm64-v8a/
build ARCH: arm64-v8a
build ABI: arm64-v8a
build native level: 21
BUILD MAKEFILE_TYPE: Ninja
cmake install prefix: /usr/local
create build dir
CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_ASM_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "/home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeOutput.log".

So I installed ninja-build (sudo apt install ninja-build) and ran the build again:

$ ./android_build.sh -g mali
the SRC_DIR val is /home/stayua01/code/mperf
build with gpu:mali
NDK_ROOT: /mnt/c/wsl/software/android-ndk-r25c/
strip remove old build
build dir: /home/stayua01/code/mperf/build-arm64-v8a/
build ARCH: arm64-v8a
build ABI: arm64-v8a
build native level: 21
BUILD MAKEFILE_TYPE: Ninja
cmake install prefix: /usr/local
create build dir
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- The ASM compiler identification is unknown
-- Found assembler: /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
-- Check for working C compiler: /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang - broken
CMake Error at /home/stayua01/software/miniconda3/lib/python3.9/site-packages/cmake/data/share/cmake-3.21/Modules/CMakeTestCCompiler.cmake:69 (message):
  The C compiler

    "/mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/ninja cmTC_71f46 && [1/2] Building C object CMakeFiles/cmTC_71f46.dir/testCCompiler.c.o
    FAILED: CMakeFiles/cmTC_71f46.dir/testCCompiler.c.o
    /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang   -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -o CMakeFiles/cmTC_71f46.dir/testCCompiler.c.o -c /home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeTmp/testCCompiler.c
    /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang: 1: clang-14: not found
    ninja: build stopped: subcommand failed.


  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:4 (project)


-- Configuring incomplete, errors occurred!
See also "/home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeOutput.log".
See also "/home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeError.log".

More info:

$ cat  /home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeOutput.log
The target system is: Android - 1 - aarch64
The host system is: Linux - 5.10.102.1-microsoft-standard-WSL2 - x86_64

$ cat /home/stayua01/code/mperf/build-arm64-v8a/CMakeFiles/CMakeError.log
Compiling the C compiler identification source file "CMakeCCompilerId.c" failed.
Compiler: /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
Build flags: -g;-DANDROID;-fdata-sections;-ffunction-sections;-funwind-tables;-fstack-protector-strong;-no-canonical-prefixes;-D_FORTIFY_SOURCE=2;-Wformat;-Werror=format-security;
Id flags: -c;--target=aarch64-none-linux-android21

The output was:
127
/mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang: 1: clang-14: not found


Compiling the C compiler identification source file "CMakeCCompilerId.c" failed.
Compiler: /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
Build flags: -g;-DANDROID;-fdata-sections;-ffunction-sections;-funwind-tables;-fstack-protector-strong;-no-canonical-prefixes;-D_FORTIFY_SOURCE=2;-Wformat;-Werror=format-security;
Id flags:

The output was:
127
/mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang: 1: clang-14: not found

However, I can find clang:

(base) stayua01@100013000760:~/code/mperf$ ls /mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
/mnt/c/wsl/software/android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang

terminate called after throwing an instance of 'mperf::MperfError'

./cpu_pmu_transpose
1 1 INSTRUCTIONS
the warm iter 0, and time use 19.946565
the warm iter 1, and time use 13.541801
the warm iter 2, and time use 13.487423
the warm iter 3, and time use 13.471814
the warm iter 4, and time use 13.482091
terminate called after throwing an instance of 'mperf::MperfError'
what(): Failed to get a file descriptor for INSTRUCTIONS
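
The message says a counter could not be opened. Assuming the counter is opened through the Linux perf_event_open system call (an assumption, not confirmed in this thread), the errno of a direct call can narrow down the cause. The probe below is a standalone sketch, not mperf code; try_open_instructions_counter is a hypothetical name.

```cpp
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstring>

// Try to open a user-space-only hardware instruction counter for the
// calling thread on any CPU. Returns 0 on success, or -errno on failure.
int try_open_instructions_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        // only probing, never started
    attr.exclude_kernel = 1;  // count user space only
    attr.exclude_hv = 1;

    // pid = 0 (this thread), cpu = -1 (any), group_fd = -1, flags = 0
    long fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0) {
        return -errno;
    }
    close(static_cast<int>(fd));
    return 0;
}
```

A result of -EACCES or -EPERM usually points to a permission restriction such as the kernel.perf_event_paranoid setting, while -ENOENT usually means the event is not supported on that device.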

Error: mperf::CpuCounterSet not found

The code comes from the example in the README:
mperf::CpuCounterSet cpuset = "CYCLES,INSTRUCTIONS,...";
mperf::XPMU xpmu(cpuset);
xpmu.run();

... // add your function to be measured

xpmu.sample();
xpmu.stop();
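Using it fails with a "not found" error. Is measuring the GFLOPs and GBPs of an operator's execution on an Android device CPU supported? If so, how should it be tested? Following the current documentation results in an error.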
