Comments (2)
Thanks for your issue. Could you try pulling the most recent repo? I fixed this last week.
from colossalai.
Thanks for the answer. I pulled colossalai==0.3.7. With torch==2.2.1, the following error occurs:
File "/opt/conda/lib/python3.9/site-packages/colossalai/auto_parallel/tensor_shard/initialize.py", line 355, in autoparallelize
rst_to_unpack = initialize_model(
File "/opt/conda/lib/python3.9/site-packages/colossalai/auto_parallel/tensor_shard/initialize.py", line 265, in initialize_model
gm = ColoGraphModule(model, graph, model.__class__.__name__)
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/fx/graph_module.py", line 110, in __init__
super().__init__(root, graph, class_name)
File "/opt/conda/lib/python3.9/site-packages/torch/fx/graph_module.py", line 428, in __init__
self.graph = graph
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in __setattr__
super().__setattr__(name, value)
File "/opt/conda/lib/python3.9/site-packages/torch/fx/graph_module.py", line 472, in graph
self.recompile()
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/fx/graph_module.py", line 141, in recompile
python_code = self._graph.python_code(root_module="self")
File "/opt/conda/lib/python3.9/site-packages/torch/fx/graph.py", line 1328, in python_code
return self._python_code(root_module, namespace, verbose=verbose)
File "/opt/conda/lib/python3.9/site-packages/torch/fx/graph.py", line 1331, in _python_code
return self._codegen._gen_python_code(self.nodes, root_module, namespace, verbose=verbose)
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/fx/codegen.py", line 472, in _gen_python_code
return PythonCode(fn_code, globals_)
TypeError: __init__() missing 1 required positional argument: '_lineno_map'
With torch==2.1.1, the following error occurs:
File "/opt/conda/lib/python3.9/site-packages/colossalai/auto_parallel/tensor_shard/initialize.py", line 355, in autoparallelize
rst_to_unpack = initialize_model(
File "/opt/conda/lib/python3.9/site-packages/colossalai/auto_parallel/tensor_shard/initialize.py", line 267, in initialize_model
shape_prop_pass(gm, *meta_args.values())
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/fx/passes/shape_prop.py", line 269, in shape_prop_pass
ShapeProp(module).propagate(*args, device=_current_device(module))
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/fx/passes/shape_prop.py", line 253, in propagate
return super().run(*tree_map(wrap_fn, args))
File "/opt/conda/lib/python3.9/site-packages/torch/fx/interpreter.py", line 138, in run
self.env[node] = self.run_node(node)
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/fx/passes/shape_prop.py", line 116, in run_node
r = getattr(self, n.op)(n.target, args, kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/fx/interpreter.py", line 312, in call_module
return submod(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/opt/conda/lib/python3.9/site-packages/torch/nn/functional.py", line 2202, in embedding
return handle_torch_function(
File "/opt/conda/lib/python3.9/site-packages/torch/overrides.py", line 1577, in handle_torch_function
result = torch_func_method(public_api, types, args, kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/_tensor.py", line 1386, in __torch_function__
ret = func(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/_subclasses/meta_tensor.py", line 113, in __torch_dispatch__
ret = func(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/_ops.py", line 448, in __call__
return self._op(*args, **kwargs or {})
File "/opt/conda/lib/python3.9/site-packages/torch/_decomp/decompositions.py", line 1141, in embedding
return weight[indices]
File "/opt/conda/lib/python3.9/site-packages/torch/_meta_registrations.py", line 2790, in meta_index_Tensor
return self.new_empty(before_shape + replacement_shape + after_shape)
File "/opt/conda/lib/python3.9/site-packages/torch/_refs/__init__.py", line 4483, in new_empty
return torch.empty(
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/_subclasses/meta_tensor.py", line 188, in _new
return MetaTensor(
File "/opt/conda/lib/python3.9/site-packages/colossalai/_analyzer/_subclasses/meta_tensor.py", line 60, in __new__
r = torch.Tensor._make_wrapper_subclass(
RuntimeError: !check_has_torch_dispatch(obj) INTERNAL ASSERT FAILED at "../torch/csrc/autograd/python_variable.cpp":1934, please report a bug to PyTorch. While HermeticPyObject was enabled, we attempted to create a tensor subclass with __torch_dispatch__. This violates the invariant that operations in HermeticPyObject have equivalent C++ implementations. If your operator registered from Python operator registration isn't doing anything strange, there may be an internal PyTorch bug involving not appropriately disabling TorchDispatchMode before executing Python op registration.
While executing %transformer_wte : [num_users=1] = call_module[target=transformer.wte](args = (%view,), kwargs = {})
Original traceback:
None
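For anyone reading the first traceback (torch==2.2.1): the `TypeError` says that `PythonCode(fn_code, globals_)` in `colossalai/_analyzer/fx/codegen.py` is called with two arguments, while the installed torch expects a third required one named `_lineno_map`. Below is a minimal sketch of a version-tolerant way to build such an object. It makes several assumptions: that `PythonCode` is a dataclass whose first two fields are `src` and `globals`, and that `None` is an acceptable value for the extra field. `PythonCodeStandIn` and `build_python_code` are hypothetical names used only for illustration; the stand-in class replaces the real torch class so the snippet runs without torch installed. This is not the fix the maintainers applied.

```python
import dataclasses
from typing import Any, Dict, Optional


# Hypothetical stand-in that mimics the three-field shape implied by the
# error message (src, globals, _lineno_map). It is NOT the real torch class.
@dataclasses.dataclass
class PythonCodeStandIn:
    src: str
    globals: Dict[str, Any]
    _lineno_map: Optional[Dict[int, Optional[int]]]


def build_python_code(cls, src: str, globals_: Dict[str, Any]):
    """Build cls(src=..., globals=...) and pass None for any other
    dataclass field that has no default value.

    Assumption: None is a valid placeholder for the extra fields.
    """
    kwargs = {"src": src, "globals": globals_}
    for field in dataclasses.fields(cls):
        if field.name in kwargs:
            continue
        has_default = (
            field.default is not dataclasses.MISSING
            or field.default_factory is not dataclasses.MISSING
        )
        if not has_default:
            # Required field unknown to the caller: fill with None.
            kwargs[field.name] = None
    return cls(**kwargs)


code = build_python_code(PythonCodeStandIn, "def forward(self): pass", {})
```

The same helper also works on a two-field dataclass, which is the shape the two-argument call in the traceback appears to have been written for. Pinning a torch version that your colossalai release was tested against is likely the simpler route.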