ptillet commented on July 22, 2024

Hi, this data is actually rather old. Auto-TVM has improved since then in FP32, and the focus of Triton has shifted to tensor cores. There is more data in Chapter 5 of my PhD dissertation if you are interested ( https://search.proquest.com/openview/8e39f66528f4f49a4aab91412cff9d05/1?pq-origsite=gscholar&cbl=18750&diss=y ).

I am working on a library of standard ops for deep learning (matmul, conv, batchnorm, softmax, etc.). This should provide plenty of examples for people interested in learning more about Triton. My ultimate goal is to provide a compact, viable and open-source alternative to cuBLAS and cuDNN. When I release it, I will include scripts to generate these kinds of plots, including comparisons against TVM (Tensor Comprehensions is now deprecated AFAIK). It should be out in about 1 month (or 2 at most).

Note that support for INT8 is currently poor (in fact nonexistent for tensor cores) and that the focus is more on training. TensorRT would be hard to replace at this point.

from triton.

leiwen83 commented on July 22, 2024

I see.

From my daily work, I've found that the two most time-consuming parts are convolution (CONV) and data movement. Data-movement ops like nhwc2nchw in TRT are good at reaching the peak bandwidth provided by the hardware.
What TRT leaves out is that, besides ops like nhwc2nchw, there are many other data-movement ops that need to be optimized. So if Triton could help there, that would be great.
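Layout conversions like nhwc2nchw are usually judged by effective bandwidth: bytes read plus bytes written, divided by runtime, compared against the hardware peak. A minimal NumPy sketch, assuming an out-of-place conversion that reads and writes each element exactly once; the helper names and the tensor shape are hypothetical, and a CPU timing of NumPy only illustrates the arithmetic, not GPU or TRT performance.

```python
import time
import numpy as np

def effective_bandwidth_gbs(num_elements, bytes_per_element, seconds):
    """Effective bandwidth of an out-of-place layout conversion, in GB/s.

    Each element is read once and written once, so the bytes moved are
    2 * num_elements * bytes_per_element (hypothetical helper).
    """
    bytes_moved = 2 * num_elements * bytes_per_element
    return bytes_moved / seconds / 1e9

def nhwc_to_nchw(x):
    # ascontiguousarray forces the permuted copy to be materialized
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))

x = np.random.rand(8, 56, 56, 64).astype(np.float32)  # NHWC, hypothetical shape
start = time.perf_counter()
y = nhwc_to_nchw(x)
elapsed = time.perf_counter() - start
print(f"{effective_bandwidth_gbs(x.size, x.itemsize, elapsed):.1f} GB/s")
```

The ratio of this number to the device's peak bandwidth is what makes two conversion kernels comparable.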

ptillet commented on July 22, 2024

Do you think you could create a list of ops that you need? I can try to include them in the upcoming op library if possible.

leiwen83 commented on July 22, 2024

I think the following ops would be worth trying first:

  1. Deformable convolution; see the DCV2_op in https://github.com/msracver/Deformable-ConvNets.
  2. nchw2nhwc and nhwc2nchw implemented in Triton, so that their bandwidth can be compared with TRT.
  3. nn.shuffle()
  4. Op fusion that merges [nhwc2nchw -> nn.shuffle() -> nchw2nhwc] into a single op.
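Item 4 is attractive because, assuming nn.shuffle() means PixelShuffle, the whole chain is pure data movement: nhwc2nchw -> PixelShuffle -> nchw2nhwc collapses into one reshape plus one axis permutation of the NHWC tensor, i.e. a single read and a single write. A NumPy sketch of that equivalence, following the shape rule documented for torch.nn.PixelShuffle; `unfused`, `fused` and `pixel_shuffle_nchw` are hypothetical names, and a Triton kernel would still have to implement the fused permutation itself.

```python
import numpy as np

def pixel_shuffle_nchw(x, r):
    # (N, C*r*r, H, W) -> (N, C, H*r, W*r), as in torch.nn.PixelShuffle
    n, c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(n, c, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # (N, C, H, r, W, r)
    return x.reshape(n, c, h * r, w * r)

def unfused(x_nhwc, r):
    # Three separate data-movement steps: nhwc2nchw, PixelShuffle, nchw2nhwc
    x_nchw = x_nhwc.transpose(0, 3, 1, 2)
    y_nchw = pixel_shuffle_nchw(x_nchw, r)
    return np.ascontiguousarray(y_nchw.transpose(0, 2, 3, 1))

def fused(x_nhwc, r):
    # Same result as one permutation of a reshaped NHWC tensor
    n, h, w, c_r2 = x_nhwc.shape
    c = c_r2 // (r * r)
    x = x_nhwc.reshape(n, h, w, c, r, r)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # (N, H, r, W, r, C)
    return np.ascontiguousarray(x.reshape(n, h * r, w * r, c))

x = np.arange(2 * 3 * 4 * 20).reshape(2, 3, 4, 20)  # NHWC with C*r*r = 20, r = 2
assert np.array_equal(unfused(x, 2), fused(x, 2))
```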

ptillet commented on July 22, 2024

There will definitely be a permute op similar to https://github.com/ptillet/torch-blocksparse/blob/master/torch_blocksparse/permute.py that will allow high-bandwidth conversion to and from any CNN input format. Right now I think it is only tested for conversion between NCHW and CHWN, but it can easily be edited to accommodate NHWC as well.
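Any conversion between CNN input formats is an axis permutation, so a generic permute op only needs the source and destination layout strings. The NumPy reference below sketches that idea, for instance as ground truth when testing a kernel; `convert_layout` is a hypothetical helper and not the interface of the permute.py file linked above.

```python
import numpy as np

def convert_layout(x, src, dst):
    """Convert x from layout `src` to layout `dst`, e.g. 'NCHW' -> 'CHWN'.

    Output axis i is the input axis whose letter is dst[i].
    """
    assert sorted(src) == sorted(dst), "layouts must contain the same axes"
    perm = [src.index(axis) for axis in dst]
    return np.ascontiguousarray(x.transpose(perm))

x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)  # NCHW
print(convert_layout(x, 'NCHW', 'CHWN').shape)    # (3, 4, 5, 2)
print(convert_layout(x, 'NCHW', 'NHWC').shape)    # (2, 4, 5, 3)
```

Deriving the permutation from the layout strings means NHWC support falls out of the same code path as NCHW <-> CHWN.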

By nn.shuffle, do you mean https://pytorch.org/docs/stable/generated/torch.nn.PixelShuffle.html ? I could do that; it seems like low-hanging fruit and a generally useful op.
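PixelShuffle does no arithmetic, only rearranges data, so it is a natural candidate for a bandwidth-bound kernel. A NumPy reference that follows the shape rule documented for torch.nn.PixelShuffle, (N, C * r**2, H, W) -> (N, C, H * r, W * r); the function name is hypothetical and this is a sketch for checking results, not a fast implementation.

```python
import numpy as np

def pixel_shuffle(x, upscale_factor):
    """NumPy reference for torch.nn.PixelShuffle on an NCHW tensor.

    Input channel c * r**2 + i * r + j of pixel (h, w) lands at output
    pixel (h * r + i, w * r + j) of channel c.
    """
    r = upscale_factor
    n, c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by upscale_factor**2")
    c = c_r2 // (r * r)
    x = x.reshape(n, c, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # (N, C, H, r, W, r)
    return x.reshape(n, c, h * r, w * r)

x = np.arange(1 * 4 * 2 * 2).reshape(1, 4, 2, 2)
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 1, 4, 4)
```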

leiwen83 commented on July 22, 2024

Yes, it is torch.nn.PixelShuffle.

ptillet commented on July 22, 2024

To answer the initial question, I have dug through my filesystem and found the plot code I used for the roofline method:

```python
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# Poster mode: larger fonts, markers and figure
poster_mode = False
if poster_mode:
    font = {'size': 22}
    matplotlib.rc('font', **font)
    matplotlib.rc('lines', markersize=11)
    plt.figure(figsize=(20, 10))

colors = {'cublas': '#0d770e',
          'triton': '#a81034',
          'tvm': '#f4b95f',
          'tc': '#adb9d3',
          'plaidml': '#00a0dc'}

# Measured matmul performance (TFLOP/s), one value per N below
perf = {'cublas':  [0.37, 0.77, 1.55, 3, 4.42, 5.51, 6.43, 6.66, 6.92],
        'triton':  [0.45, 0.91, 1.79, 2.99, 4.38, 5.27, 5.84, 5.97, 6.5],
        'tc':      [0.29, 0.32, 0.52, 0.96, 1.1, 1.48, 1.8, 2.2, 2.25],
        'tvm':     [0.14, 0.37, 0.93, 1.85, 2.91, 3.43, 3.38, 3.85, 3.86],
        'plaidml': [0.1, 0.17, 0.3, 0.52, 0.93, 1.51, 1.76, 2.12, 2.44]}

# Arithmetic intensity of C = A @ B, A: M x K, B: K x N, M = K = 1760, N = 4..1024
m = np.repeat(1760, 9)
n = 2**np.arange(2, 11)
k = np.repeat(1760, 9)
flops = 2.*m*n*k              # one multiply and one add per (m, n, k) triple
transfer = 4.*(m*k + k*n)     # FP32 bytes read for A and B (writes to C not counted)
intensity = flops / transfer  # FLOP/byte

# Device properties
bandwidth = 256*1e9           # memory bandwidth (bytes/s)
max_flops = 7.5               # peak compute throughput (TFLOP/s)
roofline = np.minimum(bandwidth*intensity*1e-12, max_flops)
plt.loglog(intensity, roofline, label='Roofline Model', color='black')
plt.scatter(intensity, perf['cublas'], label='cuBLAS 10.0', color=colors['cublas'])
plt.scatter(intensity, perf['triton'], label='Triton', color=colors['triton'])
plt.scatter(intensity, perf['tvm'], label='Auto-TVM', color=colors['tvm'])
plt.scatter(intensity, perf['tc'], label='Tensor Comprehensions', color=colors['tc'])
plt.scatter(intensity, perf['plaidml'], label='PlaidML', color=colors['plaidml'])
plt.legend()
# intensity is FLOP/byte and perf is TFLOP/s
plt.xlabel('Arithmetic Intensity (FLOP/byte)')
plt.ylabel('Performance (TFLOP/s)')
name = 'roofline-baseline'
if poster_mode:
    name += '-poster'
plt.savefig(name + '.pdf', transparent=False, bbox_inches='tight', pad_inches=0)
plt.show()
```

Unfortunately, I don't have the script I used to collect the performance data. I don't think the FP32 data is meaningful anymore: Auto-TVM has gotten significantly better, and Tensor Comprehensions and PlaidML have been deprecated.
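The roofline in the plot is min(bandwidth * intensity, peak). A short sketch that reuses the plotting script's parameters (256 GB/s, 7.5 TFLOP/s, FP32, M = K = 1760) to compute the ridge point and how close each cuBLAS measurement sits to its roofline bound; the function names are hypothetical, and the output only restates the data already in the plot.

```python
import numpy as np

bandwidth = 256e9   # bytes/s, as in the plotting script
max_flops = 7.5     # TFLOP/s, as in the plotting script

def matmul_intensity(m, n, k, bytes_per_element=4):
    # FLOPs divided by bytes read for A and B (output writes ignored, as in the plot)
    flops = 2.0 * m * n * k
    transfer = bytes_per_element * (m * k + k * n)
    return flops / transfer

def roofline_tflops(intensity):
    # Bandwidth-bound below the ridge point, compute-bound above it
    return np.minimum(bandwidth * intensity * 1e-12, max_flops)

ridge = max_flops * 1e12 / bandwidth  # FLOP/byte where both limits meet
n = 2 ** np.arange(2, 11)
ai = matmul_intensity(1760, n, 1760)
cublas = np.array([0.37, 0.77, 1.55, 3, 4.42, 5.51, 6.43, 6.66, 6.92])
print(f"ridge point: {ridge:.1f} FLOP/byte")
for n_i, ai_i, p in zip(n, ai, cublas):
    bound = roofline_tflops(ai_i)
    print(f"N={int(n_i):5d}  AI={ai_i:7.1f}  roofline={bound:.2f}  cuBLAS at {p / bound:.0%}")
```

Points left of the ridge point are limited by memory bandwidth, so their distance to the roofline reflects how well the kernel streams A and B rather than raw compute efficiency.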

ptillet commented on July 22, 2024

Closing the issue for now. Opened an issue on the permute op here: https://github.com/ptillet/triton/issues/56