
Comments (3)

imaginary-person commented on May 24, 2024

It doesn't result in a speedup, as memcpy is memory-bound.

Test                     Size(B)         Avg.Time(us)
gdr_copy_to_mapping             1             0.2038
gdr_copy_to_mapping             2             0.1940
gdr_copy_to_mapping             4             0.1860
gdr_copy_to_mapping             8             0.1865
DBG:  using AVX2 implementation of gdr_copy_to_bar
gdr_copy_to_mapping            16             0.1960
gdr_copy_to_mapping            32             0.1928
gdr_copy_to_mapping            64             0.1901
gdr_copy_to_mapping           128             0.1869
gdr_copy_to_mapping           256             0.1926
gdr_copy_to_mapping           512             0.2109
gdr_copy_to_mapping          1024             0.2547
gdr_copy_to_mapping          2048             0.3260
gdr_copy_to_mapping          4096             0.4883
gdr_copy_to_mapping          8192             0.8617
gdr_copy_to_mapping         16384             1.6531
gdr_copy_to_mapping         32768             3.2493
gdr_copy_to_mapping         65536             6.4663
gdr_copy_to_mapping        131072            12.8850
gdr_copy_to_mapping        262144            25.7638
gdr_copy_to_mapping        524288            51.4691
gdr_copy_to_mapping       1048576           102.8449
gdr_copy_to_mapping       2097152           206.1706
gdr_copy_to_mapping       4194304           413.7580
gdr_copy_to_mapping       8388608           828.0581
gdr_copy_to_mapping      16777216          1676.4880
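
To make the memory-bound claim concrete, the table's rows can be converted into throughput. This is only a sketch: `throughput_gbps` is a hypothetical helper, not part of gdrcopy, and it assumes the Avg.Time column is the time for a single copy of Size bytes.

```c
#include <stdio.h>

/* Hypothetical helper (not part of gdrcopy): converts a copy size in bytes
 * and an average time in microseconds into decimal GB/s.
 * bytes per microsecond equals MB/s, so dividing by 1000 gives GB/s. */
static double throughput_gbps(double size_bytes, double avg_time_us)
{
    return size_bytes / avg_time_us / 1000.0;
}

/* Prints the throughput for three rows taken from the table above. */
static void print_sample_rows(void)
{
    printf("8 KiB:  %.2f GB/s\n", throughput_gbps(8192.0, 0.8617));
    printf("1 MiB:  %.2f GB/s\n", throughput_gbps(1048576.0, 102.8449));
    printf("16 MiB: %.2f GB/s\n", throughput_gbps(16777216.0, 1676.4880));
}
```

Under that assumption, the large transfers all land at roughly 10 GB/s, and the time doubles each time the size doubles. That is the pattern expected when the copy is limited by bandwidth rather than by the instructions used to issue it.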


imaginary-person commented on May 24, 2024

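BTW, to test for AVX2 or AVX512 support, __cpuid_count should be used instead of __get_cpuid. Those feature bits are reported in CPUID leaf 7, subleaf 0, and __get_cpuid does not set the subleaf in ECX.
has_avx2 seems to be computed incorrectly in the source code: it is 0 even when it should be 1.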
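
A minimal sketch of such a check, assuming GCC or Clang on x86 with `<cpuid.h>`. The helper names `cpu_leaf7_ebx_bit`, `cpu_has_avx2`, and `cpu_has_avx512f` are hypothetical and are not gdrcopy's own functions:

```c
#include <cpuid.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the given EBX bit of CPUID leaf 7,
 * subleaf 0 is set, and 0 otherwise. */
static int cpu_leaf7_ebx_bit(unsigned int bit)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 must be within the highest basic leaf the CPU supports. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    /* __cpuid_count loads the subleaf (0) into ECX before running CPUID. */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> bit) & 1u);
}

/* AVX2 is reported in CPUID.(EAX=7,ECX=0):EBX bit 5. */
static int cpu_has_avx2(void)
{
    return cpu_leaf7_ebx_bit(5);
}

/* AVX-512 Foundation is reported in CPUID.(EAX=7,ECX=0):EBX bit 16. */
static int cpu_has_avx512f(void)
{
    return cpu_leaf7_ebx_bit(16);
}
```

These bits only say that the CPU implements the instructions. A complete check would also confirm that the OS has enabled the wider register state (OSXSAVE and XGETBV), which this sketch leaves out.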


hongbilu commented on May 24, 2024

> It doesn't result in a speedup, as memcpy is memory-bound.

May I ask what you did for the AVX2 optimization compared to AVX?
