Comments (8)
Please try the tensorflow_bert demo in v2.
from fastertransformer.
@byshiue thank you for the quick reply, but is there a reason? Which part is different from v1? Thanks.
from fastertransformer.
The encoders of v1 and v2 are the same. v1 also provides the tensorflow_bert demo, but its README does not show how to use it. So I recommend you first run tensorflow_bert by following the v2 README.
There are many possible reasons, and I cannot give the answer because we do not have enough information.
from fastertransformer.
Thank you. I went back and tested again, and found that without FasterTransformer, the original BERT inference time for a single example (one input, not the average over a test file) is around 15 ms. So we actually reduced the time from 15 ms to 9 ms, which means FasterTransformer works.
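A comparison like the one above is easier to reproduce with a small timing helper. The following is a minimal sketch, not code from the thread: `measure_latency_ms` and `speedup` are hypothetical helper names, and `fn` stands for whatever callable runs one inference (for example, a wrapper around a single `sess.run` call).

```python
import statistics
import time


def measure_latency_ms(fn, warmup=5, repeats=50):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up calls are discarded so one-time setup cost is not counted.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


# The figures reported in the comment above: 15 ms without, 9 ms with.
print("speedup: %.2fx" % speedup(15.0, 9.0))
```

Measuring both configurations with the same helper, on the same input, keeps the two numbers comparable.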
I was wondering about one thing. For the original BERT without FasterTransformer, we can export the model, save it as a pb file, and feed features to it for inference, which reduces the time a lot. Can we use an exported model with FasterTransformer as well? If so, how?
thanks so much
from fastertransformer.
Yes. There are two ways.
First, you can restore the checkpoint, get the variables with tf.get_default_graph().get_tensor_by_name or a similar function, and pass them to FasterTransformer. If you pass the variables as tf.Tensor objects, the overhead of the FasterTransformer constructor is smaller because it does not need to copy the memory.
The other way is to pass the weights as numpy arrays. In this case the constructor overhead is larger, but inference time is not affected.
from fastertransformer.
@byshiue thank you, but this is a bit overwhelming. I am new to TensorFlow and BERT. Is there any code I can reference?
from fastertransformer.
You can start with sample/tensorflow/encoder_sample.py and sample/tensorflow/utils/encoder.py.
This is an easy environment to verify the correctness and the inference speed.
For example, in encoder.py you can try replacing "encoder_vars[val_off + 0]" with "tf.get_default_graph().get_tensor_by_name('layer_%d/attention/self/query/kernel:0' % layer_idx)".
Another option is to use sess.run(all_var) to get the values of all variables as numpy arrays, and then pass them to the FasterTransformer op.
Once you understand how to use FasterTransformer, you can modify the tensorflow_bert sample to run the test on BERT.
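The two lookups described above can be sketched as follows. This is only an illustration under stated assumptions: the toy variables created by `build_toy_graph` (a hypothetical helper) stand in for a restored BERT checkpoint, the sizes are made up, and the call into the FasterTransformer op is left out because its signature is not given in this thread. It uses the TF1-style API through `tensorflow.compat.v1`.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # the samples in this thread use graphs and sessions

NUM_LAYERS = 2  # toy values, not a real BERT configuration
HIDDEN = 4


def build_toy_graph():
    """Create stand-in variables named like the query kernels mentioned above."""
    for layer_idx in range(NUM_LAYERS):
        with tf.variable_scope("layer_%d/attention/self/query" % layer_idx):
            tf.get_variable("kernel", shape=[HIDDEN, HIDDEN],
                            initializer=tf.ones_initializer())


graph = tf.Graph()
with graph.as_default():
    build_toy_graph()

    # Way 1: look the weights up as graph tensors by name (no copy to host).
    query_kernels = [
        graph.get_tensor_by_name(
            "layer_%d/attention/self/query/kernel:0" % layer_idx)
        for layer_idx in range(NUM_LAYERS)
    ]

    # Way 2: fetch the values of all variables as numpy arrays.
    all_var = tf.global_variables()
    with tf.Session(graph=graph) as sess:
        sess.run(tf.global_variables_initializer())
        all_values = sess.run(all_var)

# Map each variable name to its numpy value for later use.
numpy_weights = {var.name: value for var, value in zip(all_var, all_values)}
print(sorted(numpy_weights))
```

With a real model, the variables would come from restoring the checkpoint rather than from `build_toy_graph`, and either `query_kernels` or `numpy_weights` would then be passed to the FasterTransformer op in place of `encoder_vars`.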
from fastertransformer.
closing due to inactivity
from fastertransformer.