
lightseq's Introduction

LightSeq: A High Performance Library for Sequence Processing and Generation

Release Notes

[2022.10.25] Release v3.0.0 version, which supports int8 mixed-precision training and inference. [中文介绍]

[2021.06.18] Release v2.0.0 version, which supports fp16 mixed-precision training. [中文介绍]

[2019.12.06] Release v1.0.0 version, which supports fp16 mixed-precision inference. [中文介绍]

Introduction

LightSeq is a high performance training and inference library for sequence processing and generation, implemented in CUDA. It enables highly efficient computation of modern NLP and CV models such as BERT, GPT, and Transformer. It is therefore best suited to machine translation, text generation, image classification, and other sequence-related tasks.

The library is built on top of the official CUDA libraries (cuBLAS, Thrust, CUB) and custom kernel functions which are specially fused and optimized for the Transformer model family. In addition to model components, the inference library also provides an easy-to-deploy model management and serving backend based on TensorRT Inference Server. With LightSeq, one can easily develop a modified Transformer architecture with little additional code.

LightSeq training and inference are very fast. Below is the overall performance:

  • LightSeq fp16 training achieves a speedup of up to 3x, compared to PyTorch fp16 training.
  • LightSeq int8 training achieves a speedup of up to 5x, compared to PyTorch QAT (i.e., quantization aware training).
  • LightSeq fp16 and int8 inference achieve a speedup of up to 12x and 15x, compared to PyTorch fp16 inference, respectively.

Support Matrix

LightSeq supports multiple features, as shown in the table below.

| Features | Support List |
|---|---|
| Model | Transformer, BERT, BART, GPT2, ViT, T5, MT5, XGLM, VAE, Multilingual, MoE |
| Layer | embedding, encoder, decoder, criterion, optimizer |
| Precision | fp32, fp16, int8 |
| Mode | training, inference |
| Compatibility | Fairseq, Hugging Face, DeepSpeed |
| Decoding Algorithm | beam search, diverse beam search, sampling, CRF |
| Others | gradient communication quantization, auto-tune GEMM algorithm |

The table below shows the running modes and precision currently supported by different models.

| Models | fp16 Training | fp16 Inference | int8 Training | int8 Inference |
|---|---|---|---|---|
| Transformer | Yes | Yes | Yes | Yes |
| BERT | Yes | Yes | Yes | Yes |
| GPT2 | Yes | Yes | Yes | Yes |
| BART | Yes | Yes | - | - |
| T5 | - | Yes | - | - |
| MT5 | - | Yes | - | - |
| XGLM | - | Yes | - | - |
| ViT | Yes | Yes | Yes | Yes |
| VAE | - | Yes | - | - |
| Multilingual | - | Yes | - | Yes |
| MoE | - | Yes | - | - |

Performance

We test the speedup of LightSeq training and inference using both fp16 and int8 mixed precision on Transformer and BERT models. The baseline is PyTorch fp16 mixed precision. Training experiments are run on one A100 GPU and inference experiments are run on eight A100 GPUs.

More performance results are available here.

Speedup of Transformer Training

| Batch Token Size | PyTorch QAT | LightSeq fp16 | LightSeq int8 |
|---|---|---|---|
| 512 | 0.36 | 1.99 | 1.86 |
| 1024 | 0.37 | 1.78 | 1.69 |
| 2048 | 0.37 | 1.56 | 1.50 |
| 4096 | 0.39 | 1.47 | 1.44 |
| 8192 | 0.41 | 1.44 | 1.44 |
| 15000 | 0.43 | 1.44 | 1.44 |

Speedup of BERT Training

| Batch Token Size | PyTorch QAT | LightSeq fp16 | LightSeq int8 |
|---|---|---|---|
| 8 | 0.45 | 2.12 | 1.99 |
| 16 | 0.44 | 1.92 | 1.80 |
| 32 | 0.42 | 1.59 | 1.52 |
| 64 | 0.46 | 1.62 | 1.58 |
| 128 | 0.46 | 1.74 | 1.70 |
| 256 | 0.46 | 1.68 | 1.73 |

Speedup of Transformer Inference

| Batch Size | Sequence Length | LightSeq fp16 | LightSeq int8 |
|---|---|---|---|
| 1 | 8 | 8.00 | 9.33 |
| 1 | 32 | 6.48 | 7.38 |
| 1 | 128 | 6.24 | 6.19 |
| 8 | 8 | 9.38 | 10.71 |
| 8 | 32 | 8.24 | 8.75 |
| 8 | 128 | 6.83 | 7.28 |
| 32 | 8 | 11.82 | 14.44 |
| 32 | 32 | 9.68 | 11.15 |
| 32 | 128 | 6.68 | 7.74 |

Speedup of BERT Inference

| Batch Size | Sequence Length | LightSeq fp16 | LightSeq int8 |
|---|---|---|---|
| 1 | 8 | 9.22 | 9.87 |
| 1 | 32 | 10.51 | 11.30 |
| 1 | 128 | 9.96 | 10.85 |
| 8 | 8 | 9.88 | 10.33 |
| 8 | 32 | 7.79 | 8.22 |
| 8 | 128 | 4.04 | 4.35 |
| 32 | 8 | 10.60 | 11.02 |
| 32 | 32 | 8.11 | 8.85 |
| 32 | 128 | 1.82 | 2.04 |

Installation

Install from PyPI

You can install LightSeq from PyPI, which only supports Python 3.6 to 3.8 on Linux:

pip install lightseq

Build from Source

You can also build from source:

PATH=/usr/local/hdf5/:$PATH ENABLE_FP32=0 ENABLE_DEBUG=0 pip install -e $PROJECT_DIR

Detailed building introduction is available here.

Getting Started

We provide several samples here to show the usage of LightSeq. Refer to the complete user guide and examples for more details.

LightSeq Training from Scratch

You can use the modules provided by LightSeq to build your own models. The following is an example of building a Transformer encoder layer.

First, import the LightSeq Transformer encoder module:

from lightseq.training import LSTransformerEncoderLayer

Then create an encoder configuration, and create a LightSeq Transformer encoder layer initialized with the configuration:

config = LSTransformerEncoderLayer.get_config(
    max_batch_tokens=4096,
    max_seq_len=512,
    hidden_size=1024,
    intermediate_size=4096,
    nhead=16,
    attn_prob_dropout_ratio=0.1,
    activation_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    pre_layer_norm=True,
    activation_fn="relu",
    fp16=True,
    local_rank=0,
)
layer = LSTransformerEncoderLayer(config)

In addition to encoder layers, the other modules can be created in a similar way and then trained as normal PyTorch models.
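
For instance, here is a minimal sketch of running a forward and backward pass through the layer created above, treating it like any other PyTorch module. The forward signature (hidden states plus a padding mask) and the mask convention (non-zero marks padded positions) are assumptions for illustration; check the user guide and examples linked below for the exact API.

import torch

# `config` and `layer` are the objects created in the snippet above (fp16=True),
# so the inputs are half-precision CUDA tensors
hidden_states = torch.randn(
    8, 128, config.hidden_size, device="cuda", dtype=torch.half
)
padding_mask = torch.zeros(8, 128, device="cuda", dtype=torch.half)

out = layer(hidden_states, padding_mask)  # assumed forward signature
loss = out.float().mean()                 # dummy loss for illustration
loss.backward()                           # gradients flow through the fused kernels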

More usage is available here.

LightSeq Training from Fairseq

LightSeq integrates all the fast and lightning modules into Fairseq.

First install the following two requirements:

pip install fairseq==0.10.2 sacremoses

You can train an fp16 mixed-precision translation task on the WMT14 English-German dataset by running:

sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh

(Optional) Then you can start int8 mixed-precision training on the basis of the fp16 pre-trained model by running:

sh examples/training/fairseq/ls_fairseq_quant_wmt14en2de.sh

More usage is available here.

LightSeq Training from Hugging Face BERT

LightSeq replaces the encoder layers of Hugging Face BERT with LightSeq fast layers.

First you should install these requirements:

pip install transformers seqeval datasets

Before training, you need to switch to the following directory:

cd examples/training/huggingface/bert

Then you can easily fine-tune BERT for different tasks. Taking named entity recognition as an example, you can train BERT with fp16 mixed precision using:

python task_ner/run_ner.sh

(Optional) You can also start int8 mixed-precision training on the basis of the fp16 pre-trained model by running:

python task_ner/run_quant_ner.sh

More usage is available here.

LightSeq Inference from Fairseq

After training using the above scripts, you can quickly infer the models using LightSeq.

You should transform the fp16 PyTorch weights to LightSeq protobuf or HDF5:

python export/fairseq/ls_fs_transformer_export.py

(Optional) You can also transform the int8 PyTorch weights to LightSeq protobuf or HDF5:

python export/fairseq/ls_fs_quant_transformer_export.py

Once you have obtained the LightSeq weights, you can quickly run inference with them using the following code:

import lightseq.inference as lsi
model = lsi.Transformer(MODEL_PATH, MAX_BATCH_SIZE)
results = model.infer([[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1]])

Here MODEL_PATH is the path to your LightSeq weights and MAX_BATCH_SIZE is the maximum batch size of your input sentences.

You can also quickly infer the int8 LightSeq weights by replacing lsi.Transformer with lsi.QuantTransformer.
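
For example, a minimal sketch of the int8 variant (same MODEL_PATH and MAX_BATCH_SIZE placeholders as above, pointing at the int8 weights produced by the quantized export script):

import lightseq.inference as lsi
model = lsi.QuantTransformer(MODEL_PATH, MAX_BATCH_SIZE)
results = model.infer([[63, 47, 65, 1507, 88, 74, 10, 2057, 362, 9, 284, 6, 2, 1]])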

More usage is available here.

LightSeq Inference from Hugging Face BERT

We provide an end-to-end bert-base example to see how fast LightSeq is compared to the original Hugging Face implementation.

First you should install the requirements and change to the specified directory:

pip install transformers
cd examples/inference/python

Then you can check the performance by simply running the following commands. hf_bert_export.py is used to transform PyTorch weights to LightSeq protobuf or HDF5.

python export/huggingface/hf_bert_export.py
python test/ls_bert.py

More usage is available here.

LightSeq Deployment Using Inference Server

We provide a Docker image which contains tritonserver and LightSeq's dynamic link library. You can deploy an inference server by simply replacing the model file with your own model file.

sudo docker pull hexisyztem/tritonserver_lightseq:22.01-1
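
As a rough sketch of how such a deployment might be launched (the mount path, port mappings, and entrypoint below are assumptions based on standard Triton usage, not the documented interface of this image; see the usage guide linked below for the exact command):

sudo docker run --gpus all --rm \
    -p8000:8000 -p8001:8001 -p8002:8002 \
    -v /path/to/your/model_repo:/models \
    hexisyztem/tritonserver_lightseq:22.01-1 \
    tritonserver --model-repository=/models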

More usage is available here.

Cite Us

If you use LightSeq in your research, please cite the following papers.

@InProceedings{wang2021lightseq,
    title = "{L}ight{S}eq: A High Performance Inference Library for Transformers",
    author = "Wang, Xiaohui and Xiong, Ying and Wei, Yang and Wang, Mingxuan and Li, Lei",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers (NAACL-HLT)",
    month = jun,
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "113--120",
}

@article{wang2021lightseq2,
  title={LightSeq2: Accelerated Training for Transformer-based Models on GPUs},
  author={Wang, Xiaohui and Xiong, Ying and Qian, Xian and Wei, Yang and Li, Lei and Wang, Mingxuan},
  journal={arXiv preprint arXiv:2110.05722},
  year={2021}
}

We are Hiring!

The LightSeq team is hiring interns and full-time employees with backgrounds in deep learning systems, natural language processing, computer vision, speech, etc. We are based in Beijing and Shanghai. If you are interested, please send your resume to [email protected].

lightseq's People

Contributors

aachong, anaivebird, anychnn, aseaday, godweiyang, handh1998, hexisyztem, kangmo, lileicc, lszxb, neopro12, nomadlx, taka152, xian8, xingyaoww, zjersey

lightseq's Issues

Which version of fairseq is this code v2.0.2 compatible with?

I ran the script ls_fairseq_wmt14en2de.sh, but I encountered an error:

AttributeError: 'TranslationTask' object has no attribute 'args'

It appears that the fairseq version is incompatible. Which version of fairseq is lightseq v2.0.2 compatible with?

Environment:

  • python 3.7.3
  • pytorch 1.7.1+cu101
  • fairseq 0.10.2 (installed from source, commit id d18e44a)
  • lightseq v2.0.2 (pip install lightseq)

CUDA Graph to further improve performance?

Hi there! Thank you for your amazing work on implementing faster components for transformer-based models! I've found that you have multiple GPU kernels in an encoder or decoder. Have you ever tried the CUDA Graph mechanism introduced by NVIDIA to combine a graph of kernels into one, to further reduce launch overhead and memory copies? It seems to me that we could easily take advantage of this mechanism in lightseq. Wonder if you are willing to have a try :)

[inference] RuntimeError: CUBLAS_STATUS_NOT_SUPPORTED on cards with compute capability greater than 80

2021-06-30 20:37:00.954006: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
initializing gpt tokenizer...
lightseq tokenizer pad token id: 0
huggingface tokenizer pad token id: 0
creating lightseq model...
finish initializing emb_wei from host to device
finish initializing enc_wei from host to device
finish initializing all weight from host to device
gpt2 buf_bytesize: 37281664
creating huggingface model...
====================START warmup====================
=========lightseq=========
lightseq generating...
Traceback (most recent call last):
File "ls_gpt.py", line 118, in
main()
File "ls_gpt.py", line 94, in main
warmup(ls_tokenizer, hf_tokenizer, ls_model, hf_model, sentences)
File "ls_gpt.py", line 56, in warmup
ls_generate(ls_model, ls_tokenizer, ls_inputs)
File "ls_gpt.py", line 33, in ls_generate
ls_res_ids, ls_time = ls_gpt2(model, inputs)
File "ls_gpt.py", line 12, in ls_gpt2
generated_ids = model.sample(inputs)
RuntimeError: [CUDA][ERROR] /tmp/build-via-sdist-ifaem8qq/lightseq-2.0.3/lightseq/inference/model/gpt_encoder.cc.cu(397): CUBLAS_STATUS_NOT_SUPPORTED

Stuck at “Using ~/.cache/torch_extensions as PyTorch extensions root...”

I ran the script ls_fairseq_wmt14en2de.sh, but it was stuck for a long time. I don't think it will take that long to load the WMT14 data.

Environment:

  • GPU: GeForce GTX 1080
  • Driver Version: 430.40
  • CUDA Version: 10.1
  • python 3.7.3
  • pytorch 1.7.1+cu101
  • fairseq 0.10.2 (source code installation, commit id 83e615d)
  • lightseq v2.0.2 (pip install lightseq, because a ModuleNotFoundError: No module named 'examples.training' error occurs when installing from source)

I printed the process's wchan file, and it shows poll_schedule_timeout.

UnboundLocalError: local variable 'logging_output' referenced before assignment

I tried to run the example in lightseq/examples/training/fairseq (I just entered the two lines of commands in the README). After a few seconds, an error was raised:

return logging_output
UnboundLocalError: local variable 'logging_output' referenced before assignment

Could anyone tell me what happened, please?

The 47x speedup results could be incorrect since the hf model is not set up well. It should be ~5x speedup.

  1. Set up the hf model and input tensors on the cuda device.
  2. Warm up the model with some simple sentences.

ls_bart.py

import time
import argparse

import torch
import lightseq
from transformers import BartTokenizer, BartForConditionalGeneration


def ls_bart(model, inputs):
    torch.cuda.synchronize()
    start_time = time.perf_counter()
    generated_ids = model.infer(inputs)
    torch.cuda.synchronize()
    end_time = time.perf_counter()
    return generated_ids, end_time - start_time


def hf_bart(model, inputs):
    inputs = inputs.to('cuda')
    torch.cuda.synchronize()
    start_time = time.perf_counter()
    generated_ids = model.generate(inputs, max_length=50)
    torch.cuda.synchronize()
    end_time = time.perf_counter()
    return generated_ids, end_time - start_time


def ls_generate(model, tokenizer, inputs_id):
    print("=========================lightseq=========================")
    print("lightseq generating...")
    ls_res_ids, ls_time = ls_bart(model, inputs_id)
    ls_res_ids = [ids[0] for ids in ls_res_ids[0]]
    ls_res = tokenizer.batch_decode(ls_res_ids, skip_special_tokens=True)
    print(f"lightseq time: {ls_time}s")
    print("lightseq results:")
    for sent in ls_res:
        print(sent)


def hf_generate(model, tokenizer, inputs_id):
    print("=========================huggingface=========================")
    print("huggingface generating...")
    hf_res_ids, hf_time = hf_bart(model, inputs_id)
    hf_res = tokenizer.batch_decode(hf_res_ids, skip_special_tokens=True)
    print(f"huggingface time: {hf_time}s")
    print("huggingface results:")
    for sent in hf_res:
        print(sent)


def warmup(tokenizer, ls_model, hf_model,
           sentences=["Let's do a quick warm up <mask>.", "what do you <mask> of", "that is a <mask> idea"]):
    print("======START warmup=====")
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    inputs_id = inputs["input_ids"]

    ls_generate(ls_model, tokenizer, inputs_id)
    hf_generate(hf_model, tokenizer, inputs_id)
    print("======END warmup=====")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--user_input', action="store_true")
    args = parser.parse_args()

    print("initializing bart tokenizer...")
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

    print("creating lightseq model...")
    ls_model = lightseq.Transformer("lightseq_bart_base.pb", 128)
    print("creating huggingface model...")
    hf_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
    hf_model.eval()
    hf_model.to('cuda')

    warmup(tokenizer, ls_model, hf_model)

    while True:
        if args.user_input:
            sentences = [input("input the masked sentence:\n")]
        else:
            sentences = [
                "I love that girl, but <mask> does not <mask> me.",
                "She is so <mask> that I can not help glance at <mask>.",
                "Nothing's gonna <mask> my love for you.",
                "Drop everything now. Meet me in the pouring <mask>. Kiss me on the sidewalk."
            ]

        print("tokenizing the sentences...")
        inputs = tokenizer(sentences, return_tensors="pt", padding=True)
        inputs_id = inputs["input_ids"]

        ls_generate(ls_model, tokenizer, inputs_id)
        hf_generate(hf_model, tokenizer, inputs_id)

        if not args.user_input:
            break


if __name__ == "__main__":
    main()

output

tokenizing the sentences...
=========================lightseq=========================
lightseq generating...
lightseq time: 0.041104961186647415s
lightseq results:
I love that girl, but she does not love me.
She is so beautiful that I can not help glance at her.
Nothing's gonna change my love for you.
Drop everything now. Meet me in the pouring rain. Kiss me on the sidewalk.
=========================huggingface=========================
huggingface generating...
huggingface time: 0.153672456741333s
huggingface results:
I love that girl, but she does not love me.
She is so beautiful that I can not help glance at her.
Nothing's gonna change my love for you.
Drop everything now. Meet me in the pouring rain. Kiss me on the sidewalk.

cuda 11.3 already has cub, will conflict with 3rdparty/cub/

CUDA 11.3 already has cub in /usr/local/cuda/include/cub, which conflicts with 3rdparty/cub/.
error:
/usr/local/cuda/include/cub/block/../util_type.cuh(72): error: class template "cub::If" has already been defined
/usr/local/cuda/include/cub/block/../util_type.cuh(81): error: class template "cub::If" has already been defined
/usr/local/cuda/include/cub/block/../util_type.cuh(98): error: class template "cub::Equals" has already been defined
/usr/local/cuda/include/cub/block/../util_type.cuh(72): error: class template "cub::If" has already been defined
/usr/local/cuda/include/cub/block/../util_type.cuh(109): error: class template "cub::Equals" has already been defined
/usr/local/cuda/include/cub/block/../util_type.cuh(81): error: class template "cub::If" has already been defined

loss nan when using apex amp

We have found that LightSeq will cause nan loss with apex amp.

You can try opt_level=O3 to solve this, and a PR fixing the compatibility with O1 is on the way.

lightseq gpt2-medium generates abnormal results

lightseq == 2.0.0

The code was modified as follows to test the speedup of gpt2-medium:

diff --git a/example/python/hf_gpt2_export.py b/example/python/hf_gpt2_export.py
index 9cd2845..0ed5e4b 100644
--- a/example/python/hf_gpt2_export.py
+++ b/example/python/hf_gpt2_export.py
@@ -106,9 +106,9 @@ def extract_gpt_weights(


 if __name__ == "__main__":
-    output_lightseq_model_name = "lightseq_gpt2.pb"
-    input_huggingface_gpt_model = "gpt2"
-    head_number = 12
+    output_lightseq_model_name = "lightseq_gpt2_medium.pb"
+    input_huggingface_gpt_model = "gpt2-medium"
+    head_number = 24
     # generation_method should be "topk" or "topp"
     generation_method = "topk"
     topk = 1
diff --git a/example/python/ls_gpt.py b/example/python/ls_gpt.py
index 6a13d76..63a5fab 100644
--- a/example/python/ls_gpt.py
+++ b/example/python/ls_gpt.py
@@ -65,21 +65,21 @@ def main():

     print("initializing gpt tokenizer...")

-    ls_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+    ls_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
     # lightseq use len(tokenizer) as pad_token in default
     ls_tokenizer.add_special_tokens({"pad_token": "[PAD]"})
     print(f"lightseq tokenizer pad token id: {ls_tokenizer.pad_token_id}")

-    hf_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+    hf_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
     # use EOS as PAD for huggingface to avoid warning according to https://huggingface.co/blog/how-to-generate while avoid reshaping the model embedding
     hf_tokenizer.pad_token = hf_tokenizer.eos_token
     print(f"huggingface tokenizer pad token id: {hf_tokenizer.pad_token_id}")

     print("creating lightseq model...")
-    ls_model = lsi.Gpt("lightseq_gpt2.pb", max_batch_size=16, max_step=50)
+    ls_model = lsi.Gpt("lightseq_gpt2_medium.pb", max_batch_size=16, max_step=50)

     print("creating huggingface model...")
-    hf_model = GPT2LMHeadModel.from_pretrained("gpt2")
+    hf_model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
     hf_model.to("cuda:0")

     # lightseq gpt perplexity supports batch infer with different lengths,

The generation results on a V100 are as follows:

$python ls_gpt.py
initializing gpt tokenizer...
lightseq tokenizer pad token id: 50257
huggingface tokenizer pad token id: 50256
creating lightseq model...
finish initializing emb_wei from host to device
finish initializing enc_wei from host to device
finish initializing all weight from host to device
gpt2 buf_bytesize: 160692864
creating huggingface model...
====================START warmup====================
=========lightseq=========
lightseq generating...
lightseq time: 0.1644422672688961s
lightseq results:
My name is GPT or to


My name is GPT, – – of the,, U the U one of one U one U one U one of one U one of one U U U U U U U SP one U S one U SP one U S one S one
My name is GPT's P and and and and as as as as I and I and I and I and I and I and I and I and I I the to the to the to the to the to the to the to the to the
My name is GPT
-- the: the the the the
, the the the the the the the
 the, a the the the the
 a the the the the the the the the the the the the the- a a the
=========huggingface=========
huggingface generating...
huggingface time: 1.4769768044352531s
huggingface results:
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
====================END warmup====================
tokenizing the sentences...
=========lightseq=========
lightseq generating...
lightseq time: 0.15774999745190144s
lightseq results:
My name is GPT or " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
My name is GPT, – The, " The, U, one, one, one, one's one of the U one's one, the first, one, the first, one, one, one, one, one, the first
My name is GPT's P.- and as as as as we to and and and and and to to to to to to to to to to to to to to to to to to to to to to to to to to to the to
My name is GPT ( "--- D "- the---------------------------------,--
=========huggingface=========
huggingface generating...
huggingface time: 1.204436743631959s
huggingface results:
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic
My name is GPT. I am a professional photographer and I am a member of the International Photographic Union (IPU). I am a member of the International Photographic Union (IPU) and I am a member of the International Photographic

Is there something wrong with my code modifications?

not able to load the gptlm example given in readme

When I run the following command I get this error. Is it because of some mistake in the gpt.pb file?
sudo docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/media/zlabs-nlp/hdd1/ravi/ravi/lightseq/modelzoo:/models nvcr.io/nvidia/tensorrtserver:19.05-py3 trtserver --model-store=/models

===============================
== TensorRT Inference Server ==
===============================

NVIDIA Release 19.05 (build 6393584)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.
Copyright 2019 The TensorFlow Authors.  All rights reserved.
Copyright 2019 The TensorFlow Serving Authors.  All rights reserved.
Copyright (c) 2016-present, Facebook Inc. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

I0929 14:23:02.049226 1 main.cc:267] Starting endpoints, 'inference:0' listening on
I0929 14:23:02.049323 1 main.cc:271]  localhost:8001 for gRPC requests
I0929 14:23:02.049440 1 grpc_server.cc:265] Building nvrpc server
I0929 14:23:02.049452 1 grpc_server.cc:272] Register TensorRT GRPCService
I0929 14:23:02.049464 1 grpc_server.cc:275] Register Infer RPC
I0929 14:23:02.049470 1 grpc_server.cc:279] Register StreamInfer RPC
I0929 14:23:02.049474 1 grpc_server.cc:284] Register Status RPC
I0929 14:23:02.049480 1 grpc_server.cc:288] Register Profile RPC
I0929 14:23:02.049484 1 grpc_server.cc:292] Register Health RPC
I0929 14:23:02.049490 1 grpc_server.cc:304] Register Executor
I0929 14:23:02.054788 1 main.cc:282]  localhost:8000 for HTTP requests
I0929 14:23:02.096256 1 main.cc:294]  localhost:8002 for metric reporting
I0929 14:23:02.098009 1 metrics.cc:149] found 1 GPUs supporting NVML metrics
I0929 14:23:02.103680 1 metrics.cc:158]   GPU 0: GeForce GTX 1080 Ti
I0929 14:23:02.104140 1 server.cc:243] Initializing TensorRT Inference Server
I0929 14:23:02.109893 1 server_status.cc:106] New status tracking for model 'gptlm'
2020-09-29 14:23:02.110033: I external/tf_serving/tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2020-09-29 14:23:02.110065: I external/tf_serving/tensorflow_serving/model_servers/server_core.cc:562]  (Re-)adding model: gptlm
2020-09-29 14:23:02.210513: I external/tf_serving/tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: gptlm version: 1}
2020-09-29 14:23:02.210607: I external/tf_serving/tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: gptlm version: 1}
2020-09-29 14:23:02.210631: I external/tf_serving/tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: gptlm version: 1}
I0929 14:23:02.211445 1 custom_bundle.cc:164] Creating instance gptlm_0_0_gpu0 on GPU 0 (6.1) using libgptlm.so
2020-09-29 14:23:02.219823: I external/tf_serving/tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: gptlm version: 1}
Trtis instance init start
plz set environment variable MODEL_ZOO !
E0929 14:23:10.055644 1 dynamic_batch_scheduler.cc:162] Initialization failed for dynamic-batch scheduler thread 0: initialize error for 'gptlm': (18) load gpt weight in .pb failed

This is the example given in the readme. Also, if possible, I would like the config.pbtxt and weight file for the transformer model that you support.
Thanks in advance!

Different inference results between TransformerEncoder and standard BertEncoder?

Hi, thanks for your attention.

I have run into the problem below.

  • inference results A of BERT with HuggingFace Transformers (standard BertEncoder)
  • inference results B of BERT with LightSeq (TransformerEncoder)
  • There is a difference between A and B.
  • Specifically, even the first encoder layer output is different with the same input vector.
  • I run BERT inference just like BERT run_ner.py, replacing BertEncoder with TransformerEncoder.

Also, I noticed the following explanation in README.md:

Currently, Lightseq use TransformerEncoder in Bert finetune task, and it has a few differences with BertEncoder, which will influence performance of acc or f1. We will support standard BertEncoder ASAP.

So, can you help me figure out the difference?

  • Where is the difference? (in cuda kernel, or layer norm, or config, or activation func)?
  • Moreover, why is it different?

I want to speed up my bert model's inference time with your cool LightSeq, and the model has already been trained with HuggingFace Transformers.

Thanks!

PyTorch GPT model GPU memory OOM; cannot run with CUDA 11

Thanks for your open-source work. After converting my own model I can successfully run prediction with lightseq, and the speedup is indeed obvious. However, the vocab size can only be a very small value; once it gets too large, GPU memory overflows. The example model's vocab size is 5004. Is there a solution for this? Also, lightseq cannot be used on the 3090. Do you plan to support the 3090?

Any plan to support NEZHA?

Reason: according to our internal tests, the nezha model performs better than BERT models such as bert-wwm-ext. Meanwhile, as far as we know, FasterTransformer does not support nezha for now either, and onnxruntime only applies ordinary optimizations to nezha, which do not reach the same level of speedup as BERT.

support for MBART (big models)?

Hello, thank you for your contribution. However, I notice that all mbart models exceed 2GB. Do you have any plan to fix this issue?

Questions about gpu driver version

Hi there, this is a really nice library for inference. However, we encountered the following problem of an insufficient CUDA driver version while attempting to load the model file. Here are the logs:
model = lightseq.Transformer("lightseq_bart_base.pb", 128)
RuntimeError: [CUDA][ERROR] /workspace/pywrapper/transformer.cc.cu(43): CUDA driver version is insufficient for CUDA runtime version

We found that the library has a painfully high device requirement.
Is it possible for you to support a wider range of devices and driver versions for different environments?
Thank you very much!
Regards!

Example/Support of converting Fairseq Model to run in LightSeq

I am curious to try LightSeq to speed up inference for a vanilla Transformer encoder-decoder (Vaswani et al., 2017) model. My original model was trained with fairseq (or OpenNMT-py). Is there any example or reference you can point me to that would help me convert my Transformer model to a format compatible with LightSeq?

GPT with beam search decoding: can't get the same results using the same inputs

I have no idea where to start debugging. Below is the detailed usage process. Any guidance would be greatly appreciated. Thanks a lot!

  1. The gpt.proto file is as follows. Its content is basically the same as proto/gpt.proto, except that a beam_size parameter (int32 beam_size = 7;) is added in the message ModelConf {} section:
message ModelConf {
  int32 head_num = 1; // head number for multi-head attention
  int32 src_padding_id = 2; // source padding id
  string sampling_method = 3; // choice of beam_search, topk, topp, topk_greedy
  float topp = 4; // parameter for topp sampling
  int32 topk = 5; // parameter for topk sampling
  int32 eos_id = 6;
  int32 beam_size = 7;  // beam size of beam search
}

  2. Generate gpt_pb2.py from gpt.proto:
$ protoc --proto_path=./ --python_out=./ gpt.proto
  3. Convert the original model to gpt.pb

  4. Download the files:

    Download config.pbtxt:
    wget https://github.com/bytedance/lightseq/releases/download/v0.0.1/v0.0.1_gptlm.config.pbtxt

    Download libgptgenerate.so:
    wget https://github.com/bytedance/lightseq/releases/download/v1.1.0/v1.1.0_libs.tar.gz
    tar -xvf v1.1.0_libs.tar.gz
    mv libgptgenerate.so.fp32 libgptgenerate.so

  5. Organize the files:

model_zoo
└── gpt_generation
    ├── 1
    │   └── libgptgenerate.so
    ├── config.pbtxt
    └── gpt.pb

CUBLAS_STATUS_INTERNAL_ERROR when run examples/training/fairseq/ls_fairseq_wmt14en2de.sh

When I run examples/training/fairseq/ls_fairseq_wmt14en2de.sh, CUBLAS_STATUS_INTERNAL_ERROR appears.

Here is the error log:

2021-06-30 15:37:49 | INFO | fairseq.tasks.translation | [en] dictionary: 40480 types
2021-06-30 15:37:49 | INFO | fairseq.tasks.translation | [de] dictionary: 42720 types
2021-06-30 15:37:49 | INFO | fairseq.data.data_utils | loaded 39414 examples from: /tmp/wmt14_en_de/valid.en-de.en
2021-06-30 15:37:49 | INFO | fairseq.data.data_utils | loaded 39414 examples from: /tmp/wmt14_en_de/valid.en-de.de
2021-06-30 15:37:49 | INFO | fairseq.tasks.translation | /tmp/wmt14_en_de/ valid en-de 39414 examples
Using /home/lxl/.cache/torch_extensions as PyTorch extensions root...
Using /home/lxl/.cache/torch_extensions as PyTorch extensions root...
Using /home/lxl/.cache/torch_extensions as PyTorch extensions root...
Using /home/lxl/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/lxl/.cache/torch_extensions/lightseq_layers/build.ninja...
Building extension module lightseq_layers...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module lightseq_layers...
Time to load lightseq_layers op: 0.5408718585968018 seconds
Loading extension module lightseq_layers...
Time to load lightseq_layers op: 0.4066014289855957 seconds
Loading extension module lightseq_layers...
Time to load lightseq_layers op: 0.507136344909668 seconds
Loading extension module lightseq_layers...
Time to load lightseq_layers op: 0.5070416927337646 seconds
Traceback (most recent call last):
  File "/home/lxl/.local/bin/lightseq-train", line 33, in <module>
    sys.exit(load_entry_point('lightseq', 'console_scripts', 'lightseq-train')())
  File "/home/lxl/workspace/lightseq/examples/training/fairseq/lightseq_fairseq_train_cli.py", line 10, in ls_cli_main
    cli_main(*args, **kwargs)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq/distributed_utils.py", line 286, in call_main
    nprocs=args.distributed_num_procs,
  File "/home/lxl/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/lxl/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/lxl/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/lxl/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq/distributed_utils.py", line 270, in distributed_main
    main(args, **kwargs)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq_cli/train.py", line 68, in main
    model = task.build_model(args)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq/tasks/translation.py", line 327, in build_model
    model = super().build_model(args)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 547, in build_model
    model = models.build_model(args, self)
  File "/home/lxl/.local/lib/python3.7/site-packages/fairseq/models/__init__.py", line 58, in build_model
    return ARCH_MODEL_REGISTRY[model_cfg.arch].build_model(model_cfg, task)
  File "/home/lxl/workspace/lightseq/examples/training/fairseq/fs_modules/ls_transformer.py", line 137, in build_model
    args, src_dict, args.encoder_embed_dim, args.max_source_positions
  File "/home/lxl/workspace/lightseq/examples/training/fairseq/fs_modules/ls_transformer.py", line 159, in build_embedding
    emb = LSTransformerEmbeddingLayer(config)
  File "/home/lxl/workspace/lightseq/lightseq/training/ops/pytorch/transformer_embedding_layer.py", line 113, in __init__
    self.config.padding_idx,
RuntimeError: [CUDA][ERROR] /home/lxl/workspace/lightseq/lightseq/training/csrc/ops/includes/context.h(15): CUBLAS_STATUS_INTERNAL_ERROR

cat /home/lxl/workspace/lightseq/lightseq/training/csrc/ops/includes/context.h
#pragma once

#include <cublas_v2.h>
#include <cuda.h>

#include <iostream>
#include <string>

#include "cuda_util.h"

class Context {
 public:
  Context() : _stream(nullptr) {
    CHECK_GPU_ERROR(cublasCreate(&_cublasHandle));
  }

  virtual ~Context() {}

  static Context &Instance() {
    static Context _ctx;
    return _ctx;
  }

  void set_stream(cudaStream_t stream) {
    _stream = stream;
    CHECK_GPU_ERROR(cublasSetStream(_cublasHandle, _stream));
  }

  cudaStream_t get_stream() { return _stream; }

  cublasHandle_t get_cublashandle() { return _cublasHandle; }

 private:
  cudaStream_t _stream;
  cublasHandle_t _cublasHandle;
};

My PyTorch works fine with cuBLAS and matmul.
It seems cublasCreate failed. Why?

RuntimeError: Parse weights from [lightseq_bart_base.hdf5] failed

When I tried to run the example case like this

python hf_bart_export.py
python ls_bart.py

It has some errors

initializing bart tokenizer...
creating lightseq model...
Traceback (most recent call last):
  File "ls_bart.py", line 102, in <module>
    main()
  File "ls_bart.py", line 69, in main
    ls_model = lsi.Transformer("lightseq_bart_base.hdf5", 128)
RuntimeError: Parse weights from [lightseq_bart_base.hdf5] failed.

Alright, I tried to run another case, the huggingface gpt2 example:

python hf_gpt2_export.py
python ls_gpt.py

It had errors again:

initializing gpt tokenizer...
Downloading: 100%|███████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 1.81MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 1.36MB/s]
Downloading: 100%|███████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 2.29MB/s]
lightseq tokenizer pad token id: 50257
huggingface tokenizer pad token id: 50256
creating lightseq model...
Traceback (most recent call last):
  File "ls_gpt.py", line 119, in <module>
    main()
  File "ls_gpt.py", line 79, in main
    ls_model = lsi.Gpt("lightseq_gpt2_base.hdf5", max_batch_size=16)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. lightseq.inference.Gpt(weight_path: str, max_batch_size: int, max_step: int)

Invoked with: 'lightseq_gpt2_base.hdf5'; kwargs: max_batch_size=16

I don't know how to fix them. Can you give me some advice? Thank you very much.

When setting the sampling method to topk=8, there are more than 8 different values of the first token with the same input.

Hello, thank you very much for your help!

The experimental conditions are:

  • set the sampling method of the lightseq GPT model to topk=8
  • keep the same input and generate 30 times repeatedly, then remove duplicates from the 30 obtained sentences.

It is found that there are more than 8 different values of the first token in the sentences generated by the repeated experiments.

P.S. \[\w+\] in the picture are words in vocab.txt.

cudaErrorIllegalAddress when integrate lightseq.training.ops.pytorch.transformer_decoder_layer

Thanks for the training support in LightSeq; I've successfully integrated the encoder layer.

But it's quite frustrating when I try to integrate the decoder part; I couldn't make progress due to:

  • transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered, or RuntimeError: CUDA error: an illegal memory access was encountered; making the batch size smaller doesn't solve this problem.

Besides, the following issues are also annoying:

  • We have nhead in the Config class, but config.heads is used in the forward method
  • We have nlayer in the Config class, but self.nlayer rather than self.config.nlayer is used in the forward method

Environments:

  • Pytorch Version: 1.7.1
  • CUDA Version: 10.2

Is the code not correctly implementing the self-attention mask in LSTransformerDecoder?

In the C++ code for TransformerDecoderLayer, I see only one mask operation, which is integrated into the softmax kernel, but in Transformer there are two types of mask operations (key padding mask and attention mask). The decoder should apply the attention mask in self-attention so that it cannot see tokens to the right, but I see that the mask parameter of the softmax in the decoder's self-attention is set to nullptr (code). Is the code not implementing these two mask operations correctly? Or is it hidden in some other code I didn't see?

In addition, I didn't find any code for position embeddings, whether absolute or relative. Will relative positions be implemented later?

not able to run the gptlm_example.cu.cc given in the model directory

I wanted to run gptlm_example.cu.cc in the model directory. As far as I know, to run gptlm_example.cu.cc we need two files: 1) gpt.pb and 2) test_case.
Please correct me if I have understood this wrong.

When I run the following command:
rkoy@rkoy:lightseq/lightseq/example$ g++ gptlm_example.cu.cc gpt.pb test_case
I get the following error:

In file included from /usr/include/cublas_v2.h:65:0,
                 from lightseq/lightseq/model/gpt_encoder.h:3,
                 from gptlm_example.cu.cc:3:
/usr/include/cublas_api.h:72:10: fatal error: driver_types.h: No such file or directory
 #include "driver_types.h"
          ^~~~~~~~~~~~~~~~
compilation terminated.

Please let me know how I should resolve this.

I am able to run the command below without any error:
rkoy@rkoy:lightseq$ ./v1.0.0_libs/gptlm_example.fp32 ./v0.0.1_gptlm.pkg/gpt.pb ./v0.0.1_gptlm.pkg/test_case

I think the first and second commands have almost the same functionality. Let me know if I have got anything wrong.
Thanks in advance!

Gpt exceeds maximum protobuf size of 2GB: 3096122166

When I use lightseq (2.0) to export gpt2-large, it raises an error:
ValueError: Message Gpt exceeds maximum protobuf size of 2GB: 3096122166

hf_gpt2_export.py is as follows

if __name__ == "__main__":
    output_lightseq_model_name = "lightseq_gpt2_large.pb"
    input_huggingface_gpt_model = "gpt2-large"
    head_number = 36
    # generation_method should be "topk" or "topp"
    generation_method = "topk"
    topk = 1
    topp = 0.75
    # default eos_id from https://huggingface.co/transformers/model_doc/gpt2.html#gpt2lmheadmodel
    eos_id = 50256
    pad_id = 50257
    extract_gpt_weights(
        output_lightseq_model_name,
        input_huggingface_gpt_model,
        head_num=head_number,  # layer number
        generation_method=generation_method,
        topk=topk,
        topp=topp,
        eos_id=eos_id,
        pad_id=pad_id,
    )
['transformer.h.34.mlp.c_proj.bias'] -> ffn_second_bias, shape: (1280,), convert finished.
['transformer.h.35.ln_1.weight'] -> multihead_norm_scale, shape: (1280,), convert finished.
['transformer.h.35.ln_1.bias'] -> multihead_norm_bias, shape: (1280,), convert finished.
['transformer.h.35.attn.c_attn.weight'] -> multihead_project_kernel_qkv, shape: (1280, 3840), convert finished.
['transformer.h.35.attn.c_attn.bias'] -> multihead_project_bias_qkv, shape: (3840,), convert finished.
['transformer.h.35.attn.c_proj.weight'] -> multihead_project_kernel_output, shape: (1280, 1280), convert finished.
['transformer.h.35.attn.c_proj.bias'] -> multihead_project_bias_output, shape: (1280,), convert finished.
['transformer.h.35.ln_2.weight'] -> ffn_norm_scale, shape: (1280,), convert finished.
['transformer.h.35.ln_2.bias'] -> ffn_norm_bias, shape: (1280,), convert finished.
['transformer.h.35.mlp.c_fc.weight'] -> ffn_first_kernel, shape: (1280, 5120), convert finished.
['transformer.h.35.mlp.c_fc.bias'] -> ffn_first_bias, shape: (5120,), convert finished.
['transformer.h.35.mlp.c_proj.weight'] -> ffn_second_kernel, shape: (5120, 1280), convert finished.
['transformer.h.35.mlp.c_proj.bias'] -> ffn_second_bias, shape: (1280,), convert finished.
['transformer.ln_f.weight'] -> norm_scale, shape: (1280,), convert finished.
['transformer.ln_f.bias'] -> norm_bias, shape: (1280,), convert finished.
['transformer.wte.weight'] -> token_embedding, shape: (50257, 1280), convert finished.
['transformer.wpe.weight'] -> position_embedding, shape: (1024, 1280), convert finished.
Wrting to lightseq_gpt2_large.pb
Traceback (most recent call last):
  File "hf_gpt2_export.py", line 127, in <module>
    pad_id=pad_id,
  File "hf_gpt2_export.py", line 100, in extract_gpt_weights
    fout.write(gpt.SerializeToString())
ValueError: Message Gpt exceeds maximum protobuf size of 2GB: 3096122166

How large is the inference error?

Hello, I want to know, if I use lightseq for Transformer model inference, how large the numerical error is between lightseq and the PyTorch Transformer in FP16.
I couldn't find any related documentation explaining this.
Thank you! Looking forward to your reply~

Is message ModelConf{} in gpt.proto editable?

Hi! I'm a newcomer to LightSeq.

I want to add some parameters to the "message ModelConf {}" section, and I notice that the ModelConf in lightseq/docs/export_model.md and lightseq/proto/transformer.proto differ:

lightseq/docs/export_model.md

message ModelConf {
  int32 head_num = 1;   // head number for multi-head attention
  int32 beam_size = 2;  // beam size of beam search
  int32 extra_decode_length =3;  // extra decode length compared with source length
  float length_penalty = 4;  // length penalty of beam search
  int32 src_padding_id = 5;  // source padding id
  int32 trg_start_id = 6;    // target start id
 }

lightseq/proto/transformer.proto

message ModelConf {
  int32 head_num = 1;   // head number for multi-head attention
  int32 beam_size = 2;  // beam size of beam search
  int32 extra_decode_length =
      3;                     // extra decode length compared with source length
  float length_penalty = 4;  // length penalty of beam search
  int32 src_padding_id = 5;  // source padding id
  int32 trg_start_id = 6;    // target start id
  float diverse_lambda = 7; // diverse beam search lambda
  string sampling_method = 8; // choice of beam_search, topk, topp, topk_greedy
  float topp = 9; // parameter for topp sampling
  int32 topk = 10; // parameter for topk sampling
  int32 trg_end_id = 11; // eos of target embedding
  bool is_post_ln = 12; // Pre-LN or Post-LN
  bool no_scale_embedding = 13; // whether to scale embedding by sqrt(emb_dim)
  bool use_gelu = 14; // use gelu for activation otherwise relu
  // Whether it is a multilingual model.
  // If it is set to true, lang_emb and trg_vocab_mask should be non-empty.
  bool is_multilingual = 15;
}

So I wonder: is message ModelConf {} in gpt.proto editable? Can you provide the set of parameters that can be added?

Thanks a lot for your time😊

Any plan to support T5/GPT2 models?

Hello,
Thanks for open-sourcing this. I tried lightseq and the results are really impressive, with about a 10x speedup.
Are you also considering supporting T5/GPT2, which are among the stronger models for generation? Is there a concrete timeline?

How to implement in vision transformer?

Thx for the marvelous work!
It seems lightseq only integrates its operators into NLP models. Does it support vision transformers? What do we need to do to adopt lightseq in vision transformer works such as ViT or Swin Transformer?

Do you have export model detail example code?

Hi! I want to try lightseq, but I am new to inference speedup frameworks such as TensorRT. I have read all the docs but still don't know how to export my BERT model weights and convert them to the .pb format. Can you give me an example? Thanks.
