Comments (6)

Edenzzzz commented on July 17, 2024

This is very weird, as TorchScript explicitly forbids using the Sequence annotation: https://pytorch.org/docs/stable/jit_language_reference.html#supported-type
I also tried the NGC Docker container with torch 2.1.0; it didn't work either.
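
To check which annotations in a function would hit this restriction, the sketch below scans a function's type hints for `typing.Sequence`. It uses only the standard library and does not call TorchScript itself; the helper name `find_sequence_annotations` and the two sample functions are hypothetical and not part of Megatron-LM or PyTorch.

```python
import collections.abc
from typing import Callable, List, Sequence, Tuple, get_origin, get_type_hints


def find_sequence_annotations(fn: Callable) -> List[str]:
    """Return the annotated names ('return' included) that use typing.Sequence."""
    hints = get_type_hints(fn)
    return [
        name
        for name, hint in hints.items()
        # typing.Sequence[...] resolves to collections.abc.Sequence as its origin
        if get_origin(hint) is collections.abc.Sequence
    ]


# Hypothetical stand-ins for a function that would be passed to torch.jit.script.
def with_sequence(size: int, rank: int) -> Sequence[int]:
    return rank * size, (rank + 1) * size


def with_tuple(size: int, rank: int) -> Tuple[int, int]:
    return rank * size, (rank + 1) * size


print(find_sequence_annotations(with_sequence))
print(find_sequence_annotations(with_tuple))
```

The first call reports the `return` annotation and the second reports nothing, so the helper only flags the `Sequence` case.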

from megatron-lm.

bentherien commented on July 17, 2024

Can confirm I get the same error using

  • PyTorch 2.3.1
  • Megatron-LM e33c8f7
  • CUDA 12.1

Edenzzzz commented on July 17, 2024

Can confirm I get the same error using

  • PyTorch 2.3.1
  • Megatron-LM e33c8f7
  • CUDA 12.1

While they didn't say so, using the newest NVIDIA PyTorch container (torch 2.4) seems to work.

deepakn94 commented on July 17, 2024

Thanks for pointing this out. We will have a fix shortly.

divisionblur commented on July 17, 2024

When I was processing the training dataset for GPT using the tools/preprocess_data.py script, I encountered this issue.

'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'

divisionblur commented on July 17, 2024

python tools/preprocess_data.py \
    --input data/oscar-1GB.jsonl \
    --output-prefix my-gpt3 \
    --vocab-file gpt2-vocab.json \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file gpt2-merges.txt \
    --append-eod
Traceback (most recent call last):
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/tools/preprocess_data.py", line 23, in <module>
    from megatron.training.tokenizer import build_tokenizer
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/__init__.py", line 16, in <module>
    from .initialize import initialize_megatron
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/initialize.py", line 18, in <module>
    from megatron.training.arguments import parse_args, validate_args
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/training/arguments.py", line 14, in <module>
    from megatron.core.models.retro.utils import (
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/retro/__init__.py", line 12, in <module>
    from .decoder_spec import get_retro_decoder_block_spec
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/retro/decoder_spec.py", line 9, in <module>
    from megatron.core.models.gpt.gpt_layer_specs import (
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/gpt/__init__.py", line 1, in <module>
    from .gpt_model import GPTModel
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 13, in <module>
    from megatron.core.models.common.language_module.language_module import LanguageModule
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/models/common/language_module/language_module.py", line 9, in <module>
    from megatron.core.fusions.fused_cross_entropy import fused_vocab_parallel_cross_entropy
  File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 27, in <module>
    def calculate_predicted_logits(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_recursive.py", line 1010, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_recursive.py", line 1010, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/mnt/users/lihai/miniconda3/envs/gpt3/lib/python3.9/site-packages/torch/jit/_script.py", line 1381, in script
    fn = torch._C._jit_script_compile(
RuntimeError:
Unknown type constructor Sequence:
File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/utils.py", line 106
    def vocab_range_from_per_partition_vocab_size(
        per_partition_vocab_size: int, rank, world_size: int
    ) -> Sequence[int]:
         ~~~~~~~~~~~~~ <--- HERE
        index_f = rank * per_partition_vocab_size
        index_l = index_f + per_partition_vocab_size
'vocab_range_from_per_partition_vocab_size' is being compiled since it was called from 'calculate_predicted_logits'
File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/tensor_parallel/cross_entropy.py", line 41

    # Get the partition's vocab indices
    get_vocab_range = VocabUtility.vocab_range_from_per_partition_vocab_size
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    partition_vocab_size = vocab_parallel_logits.size()[-1]
    rank = get_tensor_model_parallel_rank()

'calculate_predicted_logits' is being compiled since it was called from 'calculate_predicted_logits'
File "/mnt/users/lihai/gpt3/code/Megatron-LM/megatron/core/fusions/fused_cross_entropy.py", line 31
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:

(
~
    target_mask,
    ~~~~~~~~~~~~
    masked_target_1d,
    ~~~~~~~~~~~~~~~~~
    predicted_logits,
    ~~~~~~~~~~~~~~~~~
    sum_exp_logits,
    ~~~~~~~~~~~~~~~
    exp_logits,
    ~~~~~~~~~~~
) = VocabParallelCrossEntropy.calculate_predicted_logits(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    vocab_parallel_logits, target, logits_max
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
)
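
One way to avoid the `Sequence` return annotation flagged in the error above is to annotate the function with `Tuple[int, int]`, which is among the types listed in the TorchScript language reference. The following is only a sketch under assumptions, not the maintainers' fix: the `return` statement and the `int` annotation on `rank` do not appear in the traceback and are guesses, and the function is shown as plain Python without any TorchScript decorator.

```python
from typing import Tuple


def vocab_range_from_per_partition_vocab_size(
    per_partition_vocab_size: int, rank: int, world_size: int
) -> Tuple[int, int]:
    # world_size is unused here; it is kept only to mirror the signature
    # shown in the traceback.
    index_f = rank * per_partition_vocab_size
    index_l = index_f + per_partition_vocab_size
    # Assumed return value: first index and one-past-last index for this rank.
    return index_f, index_l


# Example: rank 2 of 8 with 4 vocabulary entries per partition.
vocab_range = vocab_range_from_per_partition_vocab_size(4, 2, 8)
print(vocab_range)  # (8, 12)
```

Whether this change alone is enough depends on how the callers are compiled in a given Megatron-LM and PyTorch version, so treat it as a starting point for a local patch.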
