clr-avx-tools's People

Contributors

bryteise, fenrus75, lpereira, phmccarty, thiagomacieira, victorrodriguez

clr-avx-tools's Issues

avxjudge.py counts callq/jmp instructions as AVX2

The script classifies the following jmp/callq instructions as AVX2
when run on libopenblas_nehalemp-r0.3.3.so with the '-d' option:

 ...
 AVX2 instruction ? jmp    39092d <zsymm3m_olcopyi@@Base+0x1f8d>
 AVX2 instruction ? jl     39092a <zsymm3m_olcopyi@@Base+0x1f8a>
 AVX2 instruction ? jne    3909d8 <zsymm3m_olcopyi@@Base+0x2038>
 AVX2 instruction ? jg     390a20 <zsymm3m_olcopyi@@Base+0x2080>
 AVX2 instruction ? jmpq   38ef4e <zsymm3m_olcopyi@@Base+0x5ae>
 AVX2 instruction ? callq  857c0 <ssymm_@plt>
 AVX2 instruction ? je     390aa1 <zsymm3m_olcopyi@@Base+0x2101>
 ...

 File total (SSE):  555391 instructions with score 208750
 File total (AVX2):  3834 instructions with score 38 <--false report
 File total (AVX512):  0 instructions with score 0

I will file another PR with two patches to fix it.
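
Every branch target flagged in the log above (zsymm3m_olcopyi, ssymm_) contains the letters "ymm" in its symbol name. A plausible cause, offered here as an assumption because the detection code is not shown in this report, is a plain substring test on the operand text. The sketch below contrasts such a test with a stricter match on AT&T-syntax register operands. The function names are hypothetical and are not taken from avxjudge.py.

```python
import re

# Hypothetical helpers for illustration; not the code in avxjudge.py.
# Matches a register operand such as %ymm0 or %ymm15 in AT&T syntax.
YMM_REGISTER = re.compile(r"%ymm\d+\b")


def naive_uses_ymm(args: str) -> bool:
    """Substring test: also fires on symbol names that contain 'ymm'."""
    return "ymm" in args


def strict_uses_ymm(args: str) -> bool:
    """Fires only when a real %ymmN register operand is present."""
    return YMM_REGISTER.search(args) is not None


samples = [
    "39092d <zsymm3m_olcopyi@@Base+0x1f8d>",  # jmp target from the log
    "857c0 <ssymm_@plt>",                     # callq target from the log
    "$0x1,%ymm0,%xmm1",                       # a genuine ymm register operand
]

for args in samples:
    print(naive_uses_ymm(args), strict_uses_ymm(args), args)
```

With a register-anchored match, the two branch targets are no longer treated as ymm operands, while the genuine register operand still is.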

Duplicate counting of AVX2 and AVX512?

Hi,
I am here to bother you now :)
I added some debug code to check for what looks like duplicate counting. I believe I patched the right place; if I am wrong, please correct me.

I cloned the code from the git repo and ran make install:

git clone https://github.com/clearlinux/clr-avx-tools.git

make install

Then I patched avxjudge.py with the following debug.patch:

/usr/share/clr-avx-tools # diff avxjudge.py avxjudge.py.patched
184a185,187
> sse_avx2_duplicate_cnt = 0
> avx2_avx512_duplicate_cnt = 0
191a195,196
> global sse_avx2_duplicate_cnt
> global avx2_avx512_duplicate_cnt
235a241,248
> if sse_score >=0.0 and avx2_score >= 0.0:
>     sse_avx2_duplicate_cnt +=1
>     print("duplicate count for sse & avx2 ?", ins, arg,
>           sse_avx2_duplicate_cnt)
> if avx512_score >= 0.0 and avx2_score >= 0.0:
>     avx2_avx512_duplicate_cnt +=1
>     print("duplicate count for avx2 & avx512 ?", ins, arg,
>           avx2_avx512_duplicate_cnt)
262a276,277
> print("File duplicate count of sse&avx2", sse_avx2_duplicate_cnt,
>       ", duplicate count of avx2&avx512", avx2_avx512_duplicate_cnt);

Next I scored /usr/lib64/avxstatus/openblas_clr/libopenblas_skylakexp-r0.3.3.so with the patched script, avxjudge.py.patched:

python3 avxjudge.py.patched /usr/lib64/avxstatus/openblas_clr/libopenblas_skylakexp-r0.3.3.so

This produces the following output:

...
duplicate count for sse & avx2 ? vfmadd132sd 0x15473f(%rip),%xmm3,%xmm1 66799
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm1 66800
duplicate count for avx2 & avx512 ? vextractf64x4 $0x1,%zmm0,%ymm0 11497
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm0 66801
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm1 66802
duplicate count for avx2 & avx512 ? vextractf64x4 $0x1,%zmm0,%ymm0 11498
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm0 66803
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm1 66804
duplicate count for avx2 & avx512 ? vextractf64x4 $0x1,%zmm0,%ymm0 11499
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm0 66805
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm1 66806
duplicate count for avx2 & avx512 ? vextractf32x8 $0x1,%zmm0,%ymm0 11500
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm0 66807
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm1 66808
duplicate count for avx2 & avx512 ? vextractf32x8 $0x1,%zmm0,%ymm0 11501
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm0 66809
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm1 66810
duplicate count for avx2 & avx512 ? vextractf32x8 $0x1,%zmm0,%ymm0 11502
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm0 66811
duplicate count for sse & avx2 ? vfmadd231ss 0x436(%rip),%xmm1,%xmm0 66812
duplicate count for sse & avx2 ? vfmadd231sd 0x5e7(%rip),%xmm1,%xmm0 66813
Top SSE functions by instruction count
sgetrf_single@@base 87.5 %s
slaed6_@@base 86.79 %s
dlaed6_@@base 86.79 %s
slasd6_@@base 84.89 %s
dlasd6_@@base 84.89 %s

Top SSE functions by value
clarfy_@@base 2076.4
cgemm3m_oncopyr@@base 1504.61
cgemm3m_oncopyi@@base 1459.66
csymm3m_iucopyb@@base 1459.66
zgemm_kernel_b@@base 1389.3

Top AVX2 functions by instruction count
zgemm_incopy@@base 49.48 %s
zgemm_kernel_b@@base 49.24 %s
cgemm_incopy@@base 49.15 %s
zgemm_kernel_r@@base 49.11 %s
zgemm_kernel_l@@base 49.11 %s

Top AVX2 functions by value
cgemm_kernel_b@@base 3439.4
cgemm_incopy@@base 3439.4
cgemm_kernel_r@@base 3439.16
cgemm_kernel_l@@base 3439.16
zgemm_kernel_b@@base 3190.86

Top AVX512 functions by instruction count
zgemm_itcopy@@base 57.67 %s
zgemm3m_oncopyr@@base 52.59 %s
zgemm3m_oncopyi@@base 51.9 %s
zsymm3m_iucopyb@@base 51.73 %s
cgemm_otcopy@@base 43.35 %s

Top AVX512 functions by value
zgemm3m_oncopyr@@base 1952.7
cgemm3m_oncopyr@@base 1703.05
zgemm3m_oncopyi@@base 1389.69
zsymm3m_iucopyb@@base 1389.54
csymm3m_iucopyb@@base 1251.03

File total (SSE): 542451 instructions with score 126060
File total (AVX2): 114279 instructions with score 112833
File total (AVX512): 80538 instructions with score 30653

File duplicate count of sse&avx2 66813 , duplicate count of avx2&avx512 11502

Please take a look.

The debug.patch and the output log are attached.

Thanks,
Ethan
duplicate_count.log.tar.gz
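
To make the overlap in the log easier to reason about, here is a minimal self-contained sketch that tallies instructions whose operands mix register widths. It is not the logic in avxjudge.py: the register-to-label mapping and the helper name classes_for are assumptions made for illustration, and the sample operands are copied from the output above.

```python
import re
from collections import Counter

# Hypothetical mapping from register prefix to the labels used in the
# report; the real script may classify instructions differently.
REGISTER_CLASS = {"xmm": "SSE", "ymm": "AVX2", "zmm": "AVX512"}
REGISTER = re.compile(r"%([xyz]mm)\d+\b")


def classes_for(args: str) -> set:
    """Return every class whose register width appears in the operands."""
    return {REGISTER_CLASS[prefix] for prefix in REGISTER.findall(args)}


# Instruction/operand pairs taken from the debug output above.
samples = [
    ("vextractf64x4", "$0x1,%zmm0,%ymm0"),
    ("vextractf128", "$0x1,%ymm0,%xmm1"),
    ("vfmadd231sd", "0x5e7(%rip),%xmm1,%xmm0"),
]

overlaps = Counter()
for ins, args in samples:
    found = classes_for(args)
    if len(found) > 1:
        # The instruction would be attributed to more than one class.
        overlaps[" & ".join(sorted(found))] += 1
    print(ins, sorted(found))

print(dict(overlaps))
```

This register-only view explains the vextract lines, which move data between two register widths. It does not flag the vfmadd lines, which use only xmm registers, so their "sse & avx2" overlap presumably comes from matching on the instruction name, which is not modelled here.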
