sirius93123 Goto Github PK
Type: User
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Fast inference from large language models via speculative decoding
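To make the idea concrete, here is a minimal toy sketch of speculative decoding: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, keeping the longest agreeing prefix. The models and token scheme below are entirely hypothetical stand-ins, not the repo's actual implementation.

```python
# Toy speculative decoding loop. Both "models" are hypothetical
# deterministic functions over integer token sequences.

def draft_model(prefix):
    # Hypothetical cheap draft model: next token = last token + 1 (mod 10).
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Hypothetical expensive target model: agrees with the draft
    # except right after token 4, where it emits 0 instead.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    # Draft proposes k tokens autoregressively.
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    # Target verifies each proposed token; accepted drafts are kept,
    # and the first rejected draft is replaced by the target's token.
    accepted = list(prefix)
    for i in range(len(prefix), len(proposal)):
        t = target_model(proposal[:i])
        accepted.append(t)
        if t != proposal[i]:
            break  # stop at the first mismatch
    return accepted

print(speculative_step([3]))  # accepts 4, then the target corrects 5 -> 0
```

The speedup comes from the real target model scoring all k draft positions in one batched forward pass instead of k sequential ones.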
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.
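A small sketch of the storage idea behind INT4 weight kernels: two 4-bit weights are packed per byte and unpacked at inference time. The layout below is illustrative only; the kernel's actual packing, tiling, and scale handling differ.

```python
import numpy as np

def pack_int4(w):
    # w: integer weights already quantized into [0, 15].
    w = np.asarray(w, dtype=np.uint8)
    # Pack pairs of 4-bit values into single bytes (high nibble, low nibble).
    return (w[0::2] << 4) | w[1::2]

def unpack_int4(packed):
    # Recover the two nibbles from each byte, in original order.
    hi = (packed >> 4) & 0xF
    lo = packed & 0xF
    return np.stack([hi, lo], axis=1).reshape(-1)

w = np.array([1, 15, 7, 2])
packed = pack_int4(w)          # 4 weights stored in 2 bytes
print(unpack_int4(packed))     # round-trips the original weights
```

Halving (versus FP16, one-quarter) the bytes moved per weight is what makes such kernels fast at small batch sizes, where the matmul is memory-bound.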
Master Thesis on Bayesian Convolutional Neural Network using Variational Inference
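As a minimal illustration of the variational-inference ingredient, here is a mean-field Bayesian linear layer using the reparameterization trick, assuming a Gaussian posterior N(mu, sigma^2) per weight. All names and shapes are illustrative, not the thesis code.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_linear(x, mu, rho, rng):
    # sigma is parameterized as softplus(rho) so it stays positive.
    sigma = np.log1p(np.exp(rho))
    # Reparameterization trick: sample eps ~ N(0, 1), shift and scale it,
    # so gradients can flow through mu and rho.
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps       # one posterior sample of the weight matrix
    return x @ w

x = np.ones((2, 3))
mu = np.zeros((3, 4))
rho = -5.0 * np.ones((3, 4))   # softplus(-5) is tiny, so outputs stay near x @ mu
y = bayesian_linear(x, mu, rho, rng)
print(y.shape)
```

Each forward pass draws a fresh weight sample, so repeated passes give a predictive distribution rather than a point estimate.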
Max pooling filter implemented in CUDA, both with the built-in library and with shared memory
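For reference, the operation itself is simple; a host-side 2x2 max pool can be written in a few lines. The CUDA version in the repo would additionally tile the input into shared memory so each thread block reuses loaded values; this numpy sketch only defines the expected output.

```python
import numpy as np

def max_pool_2x2(x):
    # Reshape (H, W) into (H/2, 2, W/2, 2) windows and take the max
    # over each 2x2 window. Assumes H and W are even.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))  # [[ 5  7]
                        #  [13 15]]
```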
Assembler for NVIDIA Maxwell architecture
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
OpenMMLab Computer Vision Foundation
OpenMMLab Detection Toolbox and Benchmark
MobileNetV2-YOLOv5s pruning and distillation, with ncnn and TensorRT deployment support. Ultra-light but better performance!
Model compression and deployment. Compression: (1) quantization: quantization-aware training at 16/8/4/2-bit (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and ternary/binary (TWN/BNN/XNOR-Net), plus 8-bit post-training quantization (TensorRT); (2) pruning: normal, regular, and grouped convolutional channel pruning; (3) grouped convolution structure; (4) batch-normalization folding for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), dynamic shape.
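The quantization-aware-training part of such toolchains rests on one core trick: fake quantization, where values are rounded to the integer grid and immediately dequantized in the forward pass, so the network learns to tolerate the quantization error. A minimal symmetric sketch, not the repo's actual code:

```python
import numpy as np

def fake_quant(x, num_bits=8):
    # Symmetric per-tensor fake quantization.
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for int8
    scale = np.abs(x).max() / qmax           # map the max magnitude to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                          # dequantize back to float

x = np.array([-1.0, -0.5, 0.0, 0.26, 1.0])
print(fake_quant(x, num_bits=2))  # at 2 bits only {-1, 0, 1} survive
```

In real QAT the non-differentiable round is bypassed with a straight-through estimator in the backward pass; batch-norm folding (item 4 above) is applied first so the quantized weights match what inference actually runs.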
Collection of algorithms for approximating the Fisher Information Matrix for Natural Gradient (and second-order methods in general)
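The simplest member of that family is a diagonal empirical Fisher: approximate F by the per-parameter mean of squared per-sample gradients, then precondition the gradient by its inverse. A hedged sketch of that one approximation (the library covers far richer ones, e.g. block-diagonal and Kronecker-factored):

```python
import numpy as np

def natural_gradient_step(grads, damping=1e-3):
    # grads: (num_samples, num_params) per-sample gradients.
    g = grads.mean(axis=0)                    # average gradient
    fisher_diag = (grads ** 2).mean(axis=0)   # diagonal empirical Fisher
    # Damping keeps the step finite where the Fisher estimate is near zero.
    return g / (fisher_diag + damping)        # preconditioned direction

grads = np.array([[1.0, 0.1],
                  [1.0, -0.1]])              # toy per-sample gradients
print(natural_gradient_step(grads))
```

Directions where per-sample gradients agree (first coordinate) keep their scale, while directions where they cancel (second coordinate) are suppressed.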
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
PyTorch*-based Neural Network Compression Framework for enhanced OpenVINO™ inference
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
CUDA Kernel Benchmarking Library
Enabling on-the-fly manipulations with LLVM IR code of CUDA sources
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
one-shot-tuner
Open standard for machine learning interoperability
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator