Model saving (via `.save_pretrained` or `.push_to_hub`) produces inconsistent shard sizes when some weights are offloaded

xenova commented on September 12, 2024

Model saving (via `.save_pretrained` or `.push_to_hub`) produces inconsistent shard sizes when some weights are offloaded

Comments (2)

xenova commented on September 12, 2024

Additionally, the total size only takes into account the first n-1 shards.

SunMarc commented on September 12, 2024

Thanks for the report @xenova! The easiest solution would be to update the `get_tensor_size` function in the `huggingface_hub` library, as it doesn't "work" with meta tensors:

def get_tensor_size(tensor: "torch.Tensor") -> int:
    return tensor.numel() * tensor.element_size()

In accelerate, we have the following for example:

from typing import Tuple

import torch


def id_tensor_storage(tensor: torch.Tensor) -> Tuple[torch.device, int, int]:
    """
    Unique identifier to a tensor storage. Multiple different tensors can share the same underlying storage. For
    example, "meta" tensors all share the same storage, and thus their identifier will all be equal. This identifier is
    guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with
    non-overlapping lifetimes may have the same id.
    """
    _SIZE = {
        torch.int64: 8,
        torch.float32: 4,
        torch.int32: 4,
        torch.bfloat16: 2,
        torch.float16: 2,
        torch.int16: 2,
        torch.uint8: 1,
        torch.int8: 1,
        torch.bool: 1,
        torch.float64: 8,
    }
    try:
        storage_ptr = tensor.untyped_storage().data_ptr()
        storage_size = tensor.untyped_storage().nbytes()
    except Exception:
        # Fallback for torch==1.10
        try:
            storage_ptr = tensor.storage().data_ptr()
            storage_size = tensor.storage().size() * _SIZE[tensor.dtype]
        except NotImplementedError:
            # Fallback for meta storage
            storage_ptr = 0
            # On torch >=2.0 this is the tensor size
            storage_size = tensor.nelement() * _SIZE[tensor.dtype]

    return tensor.device, storage_ptr, storage_size

This way, the state dict will be split properly, using the right tensor sizes. Note that the state_dict will contain meta tensors, but we update it afterwards using `get_state_dict_from_offload` (we can't do that beforehand, as there might not be enough memory on GPUs + CPU because some layers are stored on disk). LMK if this works for you @Wauplin!
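To illustrate why the per-tensor size matters for sharding, here is a minimal torch-free sketch. It is not the `huggingface_hub` or `accelerate` implementation: `nbytes_from_metadata`, `split_into_shards`, and the example tensor names are hypothetical, and the greedy packing strategy is an assumption made for illustration. The size is derived from shape and dtype only, which is the same idea as the meta-storage fallback above (`nelement() * _SIZE[dtype]`), and the total is summed over every shard, including the last one.

```python
from math import prod
from typing import Dict, List, Tuple

# Bytes per element, keyed by dtype name (same values as the _SIZE table above).
DTYPE_SIZE = {
    "int64": 8, "float64": 8,
    "float32": 4, "int32": 4,
    "bfloat16": 2, "float16": 2, "int16": 2,
    "uint8": 1, "int8": 1, "bool": 1,
}


def nbytes_from_metadata(shape: Tuple[int, ...], dtype: str) -> int:
    """Size in bytes from shape and dtype only; no tensor data is required."""
    return prod(shape) * DTYPE_SIZE[dtype]


def split_into_shards(sizes: Dict[str, int], max_shard_size: int) -> List[Dict[str, int]]:
    """Greedily pack tensors, in order, into shards of at most max_shard_size bytes."""
    shards: List[Dict[str, int]] = []
    current: Dict[str, int] = {}
    current_size = 0
    for name, size in sizes.items():
        # Close the current shard if adding this tensor would overflow it.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = size
        current_size += size
    if current:
        shards.append(current)  # the last shard must be kept as well
    return shards


# Hypothetical checkpoint: names, shapes, and dtypes are made up for illustration.
tensors = {
    "embed.weight": ((1000, 64), "float16"),
    "layer0.weight": ((64, 64), "float32"),
    "layer1.weight": ((64, 64), "float32"),
    "head.weight": ((1000, 64), "float16"),
}
sizes = {name: nbytes_from_metadata(shape, dtype) for name, (shape, dtype) in tensors.items()}
shards = split_into_shards(sizes, max_shard_size=150_000)

# Sum over all shards so the reported total covers the final shard too.
total_size = sum(sum(shard.values()) for shard in shards)
print(len(shards), total_size)
```

Because the sizes come from metadata alone, the shard boundaries do not depend on where (or whether) the tensor data currently lives. If a size function instead reported a wrong value for some tensors, the same greedy loop would produce uneven shards.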
