lvnwpu's Projects

autogptq

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

caffe

Caffe: a fast open framework for deep learning.

chromium

The official GitHub mirror of the Chromium source

cnn

A MATLAB implementation of a convolutional neural network.

code-eval

Run evaluations of LLMs using the human-eval benchmark.

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit.

cuda_gemm

A simple high performance CUDA GEMM implementation.

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

cutlass

CUDA Templates for Linear Algebra Subroutines

deepspeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

deepspeed-mii

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

emopy

A deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER).

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

how_to_optimize_in_gpu

A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.

leveldb

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

nn-cuda-example

Several simple examples for popular neural network toolkits calling custom CUDA operators.

qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.