Comments (7)
It might be caused by a cache conflict introduced by the recent fix. Could you try removing all existing intermediate cache-related files, as follows:
rm -rf ~/.aitemplate/*
rm -rf ./tmp/profiler/*
And re-run the example? Sorry for the inconvenience. Thanks!
from aitemplate.
cc @asroy
ok trying again
If it is the 4K seq & batch > 128 case, the failure is expected, because one dimension exceeds the limits of the gemm + permute range.
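As a quick sanity check, the M value the profiler reports is just batch_size * seq_length. A minimal sketch of that check, where MAX_M is a hypothetical illustrative bound and not AITemplate's actual gemm + permute limit:

```python
# M dimension of the profiled gemm for a [batch, seq, hidden] input.
# MAX_M is a made-up illustrative bound for this sketch, NOT the real
# limit inside AITemplate's gemm + permute kernels.
MAX_M = 1 << 18  # 262144, hypothetical

def gemm_m(batch_size: int, seq_length: int) -> int:
    """Rows of the flattened activation matrix fed to the gemm."""
    return batch_size * seq_length

for bs in (64, 128):
    m = gemm_m(bs, 4096)
    print(f"batch={bs}: M={m}, exceeds bound: {m > MAX_M}")
```

With seq_length 4096, batch 128 gives M = 524288, the value shown in the failing profile log, while batch 64 gives M = 262144, matching the run that completed.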
Seems to have failed in the same place
2022-10-06 23:38:40,017 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profile: gemm_rcr_bias_permute_m2n3_11233: M == 524288 && N == 2304 && K == 768
2022-10-06 23:38:40,018 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling gemm_rcr_bias_permute_m2n3_11233
Traceback (most recent call last):
File "benchmark_ait.py", line 298, in <module>
compile_and_benchmark()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "benchmark_ait.py", line 284, in compile_and_benchmark
mod = compile_module(
File "benchmark_ait.py", line 210, in compile_module
mod = compile_model(y, target, "./tmp", model_name)
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/compiler.py", line 176, in compile_model
compiler.transform.profile(
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/transform/profile.py", line 67, in profile
func.profile(
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/ops/gemm_universal/gemm_common.py", line 675, in profile
best_algo, workspace, split_k = self._profile_single_workload(
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/ops/gemm_universal/gemm_common.py", line 593, in _profile_single_workload
raise RuntimeError(
RuntimeError: Profile workload: failed. Results: [].
Yes. What are the batch size and sequence length in this model?
Yeah, that's the issue. I was just running with the defaults, so I didn't notice it had moved past the tuning phase into a full inference loop.
# python3 benchmark_ait.py
...
[23:34:55] ./tmp/BERT_fast_gelu_64_4096/model_container.cpp:471: Benchmark runtime ms/iter: 944.369
[23:36:30] ./tmp/BERT_fast_gelu_64_4096/model_container.cpp:471: Benchmark runtime ms/iter: 944.431
[23:38:04] ./tmp/BERT_fast_gelu_64_4096/model_container.cpp:471: Benchmark runtime ms/iter: 944.373
batch_size: 64, seq_length: 4096, latency: 944.4016723632812
output_0 shape: [128, 4096, 768]
2022-10-06 23:38:04,564 INFO <aitemplate.backend.target> Loading profile cache from: /root/.aitemplate/rocm.db
INFO:/usr/local/lib/python3.8/dist-packages/aitemplate/utils/graph_utils.py:Dumped toposort graph to ./tmp/BERT_fast_gelu_128_4096/toposort_graph.txt
INFO:/usr/local/lib/python3.8/dist-packages/aitemplate/utils/graph_utils.py:Dumped toposort pseudo code to ./tmp/BERT_fast_gelu_128_4096/toposort_pseudo_code.txt