Comments (5)
Please make sure you use the latest PyTorch version. You can try pytorch-nightly.
from chakra.
But when I run `conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch-nightly -c nvidia` to install pytorch-nightly, I get only the CPU build of PyTorch. Output of `conda list`:
pytorch 2.4.0.dev20240610 py3.11_cpu_0 pytorch-nightly
After building the latest PyTorch from source, I can now collect Record function id, but every value is 0.
Excerpt from kineto_trace_0.json:
"distributedInfo": {"backend": "nccl", "rank": 0, "world_size": 2, "pg_count": 1, "pg_config": [{"pg_name": "0", "pg_desc": "default_pg", "backend_config": "cuda:nccl", "pg_size": 2, "ranks": [0, 1]}], "nccl_version": "2.20.5"},
"traceEvents": [
{
"ph": "X", "cat": "cpu_op", "name": "autograd::engine::evaluate_function: NllLossBackward0", "pid": 19449, "tid": 19871,
"ts": 6200919497584.082, "dur": 69.113,
"args": {
"External id": 2561,"Record function id": 0, "Sequence number": 1357, "Fwd thread id": 1, "Ev Idx": 0
}
},
{
"ph": "X", "cat": "cpu_op", "name": "NllLossBackward0", "pid": 19449, "tid": 19871,
"ts": 6200919497588.947, "dur": 57.355,
"args": {
"External id": 2562,"Record function id": 0, "Sequence number": 1357, "Fwd thread id": 1, "Ev Idx": 1
}
},
{
"ph": "f", "id": 1, "pid": 19449, "tid": 19871, "ts": 6200919497588.947,
"cat": "fwdbwd", "name": "fwdbwd", "bp": "e"
},
{
"ph": "X", "cat": "cpu_op", "name": "aten::nll_loss_backward", "pid": 19449, "tid": 19871,
"ts": 6200919497593.060, "dur": 52.436,
"args": {
"External id": 2563,"Record function id": 0, "Ev Idx": 2
}
},
{
"ph": "X", "cat": "cpu_op", "name": "aten::zero_", "pid": 19449, "tid": 19871,
"ts": 6200919497606.506, "dur": 27.350,
"args": {
"External id": 2564,"Record function id": 0, "Ev Idx": 3
}
},
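To check a whole trace file for this symptom instead of reading events by eye, the values can be tallied with a short script. This is only a minimal sketch: it assumes the file has the top-level "traceEvents" list and the "Record function id" key shown in the excerpt above, and the helper names (`load_trace`, `record_function_id_counts`, `all_ids_zero`) are hypothetical, not part of Chakra or PyTorch.

```python
import json
from collections import Counter


def load_trace(path):
    """Load a Kineto/Chrome trace JSON file into a dict."""
    with open(path) as f:
        return json.load(f)


def record_function_id_counts(trace):
    """Tally "Record function id" values over the cpu_op events of a trace."""
    counts = Counter()
    for event in trace.get("traceEvents", []):
        if event.get("cat") != "cpu_op":
            continue  # skip flow events such as "fwdbwd"
        args = event.get("args", {})
        if "Record function id" in args:
            counts[args["Record function id"]] += 1
    return counts


def all_ids_zero(trace):
    """True when at least one cpu_op carries an id and every id is 0."""
    counts = record_function_id_counts(trace)
    return bool(counts) and set(counts) == {0}


# Two events and one flow marker, trimmed from the excerpt above.
sample = {
    "traceEvents": [
        {"ph": "X", "cat": "cpu_op", "name": "aten::nll_loss_backward",
         "args": {"External id": 2563, "Record function id": 0, "Ev Idx": 2}},
        {"ph": "X", "cat": "cpu_op", "name": "aten::zero_",
         "args": {"External id": 2564, "Record function id": 0, "Ev Idx": 3}},
        {"ph": "f", "id": 1, "cat": "fwdbwd", "name": "fwdbwd", "bp": "e"},
    ]
}
```

For a real file, `all_ids_zero(load_trace("kineto_trace_0.json"))` returning True would indicate the same problem as in the excerpt.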
Hi, @XcodeRole. I am not sure why this is happening. Could you please create an issue in PyTorch since it is related to the PyTorch profiler?
Thank you for your reply. I have solved the problem. When I collect the Kineto trace with on_trace_ready=torch.profiler.tensorboard_trace_handler, the Record function id values come out normal. When I export with prof.export_chrome_trace(f"kineto_trace.json") instead, the Record function id information is lost.
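For anyone hitting the same thing, the working setup might look roughly like the following. It is a sketch under assumptions, not the exact script from this issue: the tiny matmul workload, the CPU-only activity list, and the function name `collect_trace` are placeholders chosen for illustration. Whether the ids come out non-zero may also depend on the PyTorch version.

```python
import glob
import os


def collect_trace(log_dir):
    """Profile a tiny workload and let tensorboard_trace_handler write the trace.

    torch is imported inside the function so that this file can be imported
    on a machine without PyTorch installed.
    """
    import torch
    from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

    x = torch.randn(64, 64, requires_grad=True)

    # Placeholder workload; replace with the real training step.
    # The handler runs when the profiler stops and writes
    # <worker_name>.<timestamp>.pt.trace.json into log_dir.
    with profile(
        activities=[ProfilerActivity.CPU],
        on_trace_ready=tensorboard_trace_handler(log_dir),
    ):
        loss = (x @ x).sum()
        loss.backward()

    return sorted(glob.glob(os.path.join(log_dir, "*.pt.trace.json")))
```

Calling `collect_trace("./kineto_traces")` returns the paths of the trace files written by the handler. The only difference from the failing setup is that nothing calls prof.export_chrome_trace afterwards.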