Comments (6)
This is very strange, as TorchScript explicitly forbids the `Sequence` type
annotation: https://pytorch.org/docs/stable/jit_language_reference.html#supported-type
I also ran it inside the NGC Docker container with torch 2.1.0; it didn't work either.
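A minimal, torch-free sketch of what the fix would presumably look like (hypothetical; based only on the function body shown in the traceback in this thread): replace the `Sequence[int]` return annotation with `Tuple[int, int]`, which is on TorchScript's supported-type list.

```python
from typing import Tuple

# Sketch of the helper from megatron/core/tensor_parallel/utils.py with a
# TorchScript-compatible return annotation: Tuple[int, int] instead of
# Sequence[int] (TorchScript does not recognize typing.Sequence).
def vocab_range_from_per_partition_vocab_size(
    per_partition_vocab_size: int, rank: int, world_size: int
) -> Tuple[int, int]:
    index_f = rank * per_partition_vocab_size   # first vocab index owned by this rank
    index_l = index_f + per_partition_vocab_size  # one past the last owned index
    return index_f, index_l

print(vocab_range_from_per_partition_vocab_size(1000, 2, 8))  # (2000, 3000)
```

The body is unchanged; only the annotation differs, so scripted and eager behavior stay identical.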
from megatron-lm.
Can confirm I get the same error using
- PyTorch 2.3.1
- Megatron-LM e33c8f7
- CUDA 12.1
from megatron-lm.
> Can confirm I get the same error using
> - PyTorch 2.3.1
> - Megatron-LM e33c8f7
> - CUDA 12.1

While it wasn't mentioned there, using the newest NVIDIA PyTorch container (torch 2.4) seems to work.
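For anyone wanting to try that route, a sketch of launching the NGC PyTorch container; the `24.07-py3` tag is my assumption for a release carrying a torch 2.4 build, so check the NGC catalog for the exact tag:

```shell
# Tag is an assumption; verify the version inside the container with
#   python -c "import torch; print(torch.__version__)"
docker run --gpus all -it --rm \
    -v "$PWD:/workspace/Megatron-LM" \
    nvcr.io/nvidia/pytorch:24.07-py3
```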
from megatron-lm.
Thanks for pointing this out. We will have a fix shortly.
from megatron-lm.
When processing the GPT training dataset with the tools/preprocess_data.py
script, I ran into this issue:
'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
from megatron-lm.
```shell
python tools/preprocess_data.py \
    --input data/oscar-1GB.jsonl \
    --output-prefix my-gpt3 \
    --vocab-file gpt2-vocab.json \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file gpt2-merges.txt \
    --append-eod
```
```
Traceback (most recent call last):
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/tools/preprocess_data.py", line 23, in <module>
    from megatron.training.tokenizer import build_tokenizer
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/__init__.py", line 16, in <module>
    from .initialize import initialize_megatron
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/initialize.py", line 18, in <module>
    from megatron.training.arguments import parse_args, validate_args
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/arguments.py", line 14, in <module>
    from megatron.core.models.retro.utils import (
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/retro/__init__.py", line 12, in <module>
    from .decoder_spec import get_retro_decoder_block_spec
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/retro/decoder_spec.py", line 9, in <module>
    from megatron.core.models.gpt.gpt_layer_specs import (
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/gpt/__init__.py", line 1, in <module>
    from .gpt_model import GPTModel
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 13, in <module>
    from megatron.core.models.common.language_module.language_module import LanguageModule
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/common/language_module/language_module.py", line 9, in <module>
    from megatron.core.fusions.fused_cross_entropy import fused_vocab_parallel_cross_entropy
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 27, in <module>
    def calculate_predicted_logits(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_recursive.py", line 1010, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_recursive.py", line 1010, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
RuntimeError:
Unknown type constructor Sequence:
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/utils.py", line 106
    def vocab_range_from_per_partition_vocab_size(
        per_partition_vocab_size: int, rank, world_size: int
    ) -> Sequence[int]:
         ~~~~~~~~~~~~~ <--- HERE
        index_f = rank * per_partition_vocab_size
        index_l = index_f + per_partition_vocab_size
'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/cross_entropy.py", line 41
        # Get the partition's vocab indices
        get_vocab_range = VocabUtility.vocab_range_from_per_partition_vocab_size
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        partition_vocab_size = vocab_parallel_logits.size()[-1]
        rank = get_tensor_model_parallel_rank()
'calculate_predicted_logits' is being compiled since it was called from 'calculate_predicted_logits'
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 31
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
        (
        ~
            target_mask,
            ~~~~~~~~~~~~
            masked_target_1d,
            ~~~~~~~~~~~~~~~~~
            predicted_logits,
            ~~~~~~~~~~~~~~~~~
            sum_exp_logits,
            ~~~~~~~~~~~~~~~
            exp_logits,
            ~~~~~~~~~~~
        ) = VocabParallelCrossEntropy.calculate_predicted_logits(
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            vocab_parallel_logits, target, logits_max
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        )
```
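Until a fix lands, one workaround (assuming you only need the preprocessing path, which doesn't depend on the fused kernels actually being scripted) is to disable TorchScript entirely via PyTorch's documented `PYTORCH_JIT=0` switch, which makes `torch.jit.script` a pass-through, so the offending `Sequence` annotation is never compiled:

```shell
# PYTORCH_JIT=0 disables TorchScript compilation for the whole process.
PYTORCH_JIT=0 python tools/preprocess_data.py \
    --input data/oscar-1GB.jsonl \
    --output-prefix my-gpt3 \
    --vocab-file gpt2-vocab.json \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file gpt2-merges.txt \
    --append-eod
```

Note this disables scripting globally, so it is a stopgap for preprocessing rather than something to use during training.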
from megatron-lm.
Related Issues (20)
- [QUESTION] Question about Mixtral compatibility with Megatron-LM core 0.7.0
- [BUG] megatron.training not found
- [QUESTION] How to time the code
- [BUG] pipeline_paralle is not available when pp_size > 2
- [BUG] RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
- [QUESTION] When pretraining BERT, meet bug: cuBLAS Error: the requested functionality is not supported
- [QUESTION] Gloo connectFullMesh failed when the number of nodes setting "export GLOO_SOCKET_IFNAME=bond4" exceeds 60
- [QUESTION] OSError: [Errno 28] No space left on device
- [QUESTION] --overlap-grad-allreduce failing as gradients coming through as None in param hook
- [BUGS] Pipeline Parallelism fails/hangs with Megatron Core example
- [QUESTION] What's the internal difference for training when setting only "fp8-format" or setting "fp8-format"+"bf16"
- [QUESTION] Why is TELayerNormColumnParallelLinear used instead of TEColumnParallelLinear in gpt_layer_specs
- [QUESTION] Why does the tokenizer of mamba-2-hybrid have two ids for the token 'Yes'? id 24639 and id 7298
- [QUESTION] Has standalone_embedding_stage been supported yet in core?
- [QUESTION] Sample idx, bin files in public domain for trying out pretrain_gpt.py?
- [QUESTION] Getting tools/preprocess_data.py to work is painful
- [REGRESSION] MoEs are obtaining higher loss than they should during training
- [BUG] Question about helpers.cpp in version core_v0.7.0
- Batch_input and elapsed time per iteration slow down during model training