Hi, my environment is tf 1.13.1 . I already set up the fastertransformer v1 and used b


single example inference seems slow (fastertransformer, CLOSED)

nvidia commented on June 27, 2024


Comments (8)

byshiue commented on June 27, 2024

Please try the tensorflow_bert demo in v2.


652994331 commented on June 27, 2024

@byshiue Thank you for the quick reply. Is there a reason? Which part is different from v1? Thanks.


byshiue commented on June 27, 2024

The encoders of v1 and v2 are the same. v1 also provides the tensorflow_bert demo, but its README does not show how to use it, so I recommend that you first run tensorflow_bert by following the README of v2.
There are many possible reasons for the slow inference, and I cannot give an answer because we do not have enough information.


652994331 commented on June 27, 2024

Thank you. I went back and tested again, and I found that without FasterTransformer, the original BERT inference time for a single example (one input, not the average over a test file) is around 15 ms. So we actually reduced the time from 15 ms to 9 ms, which means FasterTransformer works.
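For anyone reproducing the single-example measurement described above, here is a minimal sketch of a timing harness. It is not taken from the FasterTransformer samples: `time_single_example`, `summarize`, and `dummy_infer` are hypothetical names, and `dummy_infer` only stands in for whatever call runs the model on one input (for example a `sess.run(...)` call). The warm-up calls are excluded because the first few calls of a session are often slower than steady state.

```python
import statistics
import time


def time_single_example(infer_fn, example, warmup=5, repeats=50):
    """Return per-call latencies in milliseconds for one input."""
    for _ in range(warmup):
        infer_fn(example)  # warm-up calls are not timed
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        infer_fn(example)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Report median, min, and max so one slow call does not skew the result."""
    ordered = sorted(latencies_ms)
    return {
        "median_ms": statistics.median(ordered),
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
    }


def dummy_infer(example):
    # Hypothetical stand-in for the real inference call on one example.
    return sum(x * x for x in example)


if __name__ == "__main__":
    print(summarize(time_single_example(dummy_infer, list(range(1000)))))
```

Running the same harness against the baseline model and the FasterTransformer model, with the same input, gives numbers that can be compared directly.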

I was also wondering about one thing. With the original BERT (without FasterTransformer), we can export the model, save it as a .pb file, and feed features to it for inference, which reduces the time a lot. Can we export the model the same way with FasterTransformer? If so, how?

thanks so much


byshiue commented on June 27, 2024

Yes. There are two ways.
First, you can restore the checkpoint, get the variables with tf.get_default_graph().get_tensor_by_name or a similar function, and pass them to FasterTransformer. If you pass the variables as TensorFlow tensors, the overhead of the FasterTransformer constructor is smaller because it does not need to copy the memory.
Second, you can pass the weights as NumPy arrays. In this case the constructor overhead is larger, but the inference time is not affected.


652994331 commented on June 27, 2024

@byshiue Thank you, but this is kind of overwhelming. I am new to TensorFlow and BERT. Is there any code I can reference?


byshiue commented on June 27, 2024

You can start with sample/tensorflow/encoder_sample.py and sample/tensorflow/utils/encoder.py.
They provide a simple environment for verifying correctness and inference speed.
For example, you can try replacing "encoder_vars[val_off + 0]" in encoder.py with "tf.get_default_graph().get_tensor_by_name('layer_%d/attention/self/query/kernel:0' % layer_idx)".
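As a rough illustration of the name-based lookup suggested above, the sketch below builds the per-layer query-kernel names and resolves them through a lookup callable. Only the name pattern comes from the comment. The helpers `query_kernel_names` and `lookup_tensors`, the layer count of 12, and the `fake_graph` dictionary are assumptions made for illustration. In a real TF 1.x session the callable would be `tf.get_default_graph().get_tensor_by_name`, and the names in your checkpoint may carry a different scope prefix.

```python
# Name pattern quoted from the comment above; other weights would need
# their own patterns, which are not listed in this thread.
QUERY_KERNEL_PATTERN = "layer_%d/attention/self/query/kernel:0"


def query_kernel_names(num_layers, pattern=QUERY_KERNEL_PATTERN):
    """Build the query-kernel tensor name for every encoder layer."""
    return [pattern % layer_idx for layer_idx in range(num_layers)]


def lookup_tensors(get_tensor_by_name, names):
    """Resolve each name with the supplied callable.

    In TF 1.x, pass tf.get_default_graph().get_tensor_by_name here.
    """
    return {name: get_tensor_by_name(name) for name in names}


if __name__ == "__main__":
    names = query_kernel_names(num_layers=12)  # 12 is an assumed layer count
    # Hypothetical stand-in for a restored graph, so the sketch runs without TF.
    fake_graph = {name: "tensor<%s>" % name for name in names}
    tensors = lookup_tensors(fake_graph.__getitem__, names)
    print(len(tensors), names[0])
```

Keeping the lookup behind a callable makes it easy to check the generated names against a checkpoint before wiring them into encoder.py.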

Another option is to use sess.run(all_var) to get the values of all variables as NumPy arrays, and then pass them to the FasterTransformer op.

Once you understand how to use FasterTransformer, you can modify the tensorflow_bert sample to run the test on BERT.


byshiue commented on June 27, 2024

closing due to inactivity

