kyegomez / longnet

Plug-and-play implementation of the attention mechanism from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

Home Page: https://discord.gg/qUtxnK2NMf

License: Apache License 2.0

Python 100.00%

artificial-intelligence attention attention-is-all-you-need attention-mechanisms chatgpt gpt3 gpt4 machine-learning transformer context-length

longnet's Introduction

LongNet: Scaling Transformers to 1,000,000,000 Tokens

This is an open-source implementation of the paper LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Furu Wei. LongNet is a Transformer variant designed to scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences.

Installation

pip install longnet

Usage

Once you have installed LongNet, you can use the DilatedAttention class as follows:

import torch
from long_net import DilatedAttention


# model config
dim = 512
heads = 8
dilation_rate = 2
segment_size = 64

# input data
batch_size = 32
seq_len = 8192


# create model and data
model = DilatedAttention(dim, heads, dilation_rate, segment_size, qk_norm=True)
x = torch.randn((batch_size, seq_len, dim))

output = model(x)
print(output)

`LongNetTransformer`

A ready-to-train transformer model built from dilated transformer blocks, feedforward layers with LayerNorm and SwiGLU, and a parallel transformer block.

import torch
from long_net.model import LongNetTransformer

longnet = LongNetTransformer(
    num_tokens=20000,
    dim=512,
    depth=6,
    dim_head=64,
    heads=8,
    ff_mult=4,
)

tokens = torch.randint(0, 20000, (1, 512))
logits = longnet(tokens)
print(logits)

Train

To run a simple training job on the enwik8 dataset, clone the repository, install the dependencies listed in requirements.txt, and then run python3 train.py

LongNet Summarized

Scaling sequence length has become a critical bottleneck in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, which restricts the maximum sequence length. The paper introduces LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, the authors propose dilated attention, which expands the attentive field exponentially as the distance grows.
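
To make the expanding attentive field concrete, here is a minimal sketch of how a dilated pattern picks token positions inside one segment. The function name dilated_positions and the (segment_size, dilation_rate) values are illustrative assumptions; they are not taken from the paper's experiments or from this repository's code.

```python
def dilated_positions(segment_start, segment_size, dilation_rate):
    """Positions kept in one segment when every `dilation_rate`-th token is selected."""
    return list(range(segment_start, segment_start + segment_size, dilation_rate))


# Hypothetical configurations: segment size and dilation rate grow together.
configs = [(4, 1), (8, 2), (16, 4)]
patterns = {cfg: dilated_positions(0, *cfg) for cfg in configs}

for (segment_size, dilation_rate), kept in patterns.items():
    print(f"w={segment_size:2d} r={dilation_rate}: {kept}")
```

Each configuration keeps four positions, yet the span they cover grows from 4 to 8 to 16 tokens, so the reach increases while the work per segment stays the same.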

Features

LongNet has significant advantages:

It has linear computational complexity and a logarithmic dependency between tokens.
It can serve as a distributed trainer for extremely long sequences.
Its dilated attention is a drop-in replacement for standard attention, which can be seamlessly integrated with existing Transformer-based optimization.
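
The linear-complexity claim can be checked with simple counting. The sketch below compares the number of query-key pairs scored by full attention with the number scored when each segment attends only within itself after dilation. It models a single (segment_size, dilation_rate) pair, whereas LongNet mixes several, and the function names are hypothetical.

```python
def dense_pairs(seq_len):
    """Query-key pairs scored by standard full attention."""
    return seq_len * seq_len


def dilated_pairs(seq_len, segment_size, dilation_rate):
    """Query-key pairs scored when each segment keeps every
    `dilation_rate`-th token and attends only within itself.

    Assumption of this sketch: seq_len is a multiple of segment_size and
    segment_size is a multiple of dilation_rate.
    """
    if seq_len % segment_size or segment_size % dilation_rate:
        raise ValueError("lengths must divide evenly in this sketch")
    num_segments = seq_len // segment_size
    kept_per_segment = segment_size // dilation_rate
    return num_segments * kept_per_segment * kept_per_segment


for n in (8192, 16384):
    print(n, dense_pairs(n), dilated_pairs(n, segment_size=64, dilation_rate=2))
```

Doubling the sequence length quadruples the dense count but only doubles the dilated count, because the per-segment cost is fixed and only the number of segments grows.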

Experimental results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. The work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence.

Citation

@article{ding2023longnet,
  title={LongNet: Scaling Transformers to 1,000,000,000 Tokens},
  author={Ding, Jiayu and Ma, Shuming and Dong, Li and Zhang, Xingxing and Huang, Shaohan and Wang, Wenhui and Wei, Furu},
  journal={arXiv preprint arXiv:2307.02486},
  year={2023}
}

Todo

Fix the ParallelTransformer Block's forward pass with dilated attn
Train on enwik8 and test
Create multihead iteration

longnet's Issues

Incorrect argument type passed into utils.sparsifyIndices()

Hi all,

I have just finished reading the paper and am trying to understand the code. I think there might be an issue in LongNet/attention.py: self.head_offsets is initialized as a matrix, but this matrix is passed to utils.sparsifyIndices() as head_idx, which should be an integer.

I also suspect that head_idx is meant to generate a different sparsified pattern per head, but it stays the same inside the function, which might cause all heads to share the same pattern.

Just a little bit confused here.
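
For illustration, here is a minimal sketch of how an integer head index can yield a distinct sparse pattern per head. It is not the repository's utils.sparsifyIndices(); the function name sparse_indices_for_head and the offset rule head_idx % dilation_rate are assumptions of this sketch.

```python
def sparse_indices_for_head(head_idx, segment_size, dilation_rate):
    """Positions kept within one segment for a given head.

    Assumption: each head shifts its starting position by
    head_idx % dilation_rate so that different heads cover different tokens.
    """
    offset = head_idx % dilation_rate
    return list(range(offset, segment_size, dilation_rate))


# Four hypothetical heads over a segment of 8 tokens with dilation rate 4.
patterns = {h: sparse_indices_for_head(h, segment_size=8, dilation_rate=4)
            for h in range(4)}
for head, kept in patterns.items():
    print(head, kept)
```

With a per-head integer offset, the four heads together cover every position in the segment. Passing the same value for all heads would give them identical patterns, which matches the concern described above.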

AMD Support

Hello,
How can I use my AMD GPU with longnet?

Upvote & Fund

We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.

RuntimeError: shape '[32, 1, -1, 64, 512]' is invalid for input of size 524288

Hello, thank you for your great work.
I tried to run example_main.py, but it raises an error.
It looks like the length of the input sequence must be a multiple of dilation_rate * segment_size.

Traceback (most recent call last):
  File "D:\github_repo\LongNet\example_main.py", line 35, in <module>
    _ = attention(x)
        ^^^^^^^^^^^^
  File "Z:\Software\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\github_repo\LongNet\LongNet\attention.py", line 88, in forward
    x_ = x_.contiguous().view(batch_size, 1, -1, self.segment_size, self.d_model)  # Add an extra dimension for the number of heads
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[32, 1, -1, 64, 512]' is invalid for input of size 524288
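
The numbers in the traceback can be checked by hand: a view with one -1 dimension only works when the element count is divisible by the product of the known dimensions. The sketch below does that check and adds a helper that rounds a sequence length up to a multiple of dilation_rate * segment_size. That rule is the hypothesis stated in this issue and has not been verified against the repository; both function names are hypothetical.

```python
import math


def view_is_valid(numel, known_dims):
    """True if a tensor with `numel` elements can be viewed with these known dims plus one -1."""
    return numel % math.prod(known_dims) == 0


def padded_length(seq_len, segment_size, dilation_rate):
    """Round seq_len up to the next multiple of segment_size * dilation_rate (assumed rule)."""
    block = segment_size * dilation_rate
    return -(-seq_len // block) * block  # integer ceiling division


# Values from the error message: shape [32, 1, -1, 64, 512], input of size 524288.
known = (32, 1, 64, 512)
print(math.prod(known))                # product of the known dimensions
print(view_is_valid(524288, known))    # whether the view can succeed
print(padded_length(1000, 64, 2))      # example of padding an arbitrary length
```

The known dimensions multiply to 1048576, twice the number of elements available, so no value of the -1 dimension can satisfy the view.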

Basemodel usage

Hey kyegomez,

I'm interested in trying out the implementation.
Is it already possible to use a basemodel for this?

pip install Error

Running pip install fails with the following error:

Collecting git+https://github.com/kyegomez/LongNet.git
  Cloning https://github.com/kyegomez/LongNet.git to /tmp/pip-req-build-z5f85jrm
  Running command git clone --filter=blob:none --quiet https://github.com/kyegomez/LongNet.git /tmp/pip-req-build-z5f85jrm
  Resolved https://github.com/kyegomez/LongNet.git to commit a621fc8ed60520ee90bafee4ab26c13ea8bec19e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/tmp/pip-build-env-3lxviw8j/overlay/local/lib/python3.10/dist-packages/poetry/core/masonry/api.py", line 42, in prepare_metadata_for_build_wheel
          poetry = Factory().create_poetry(Path(".").resolve(), with_groups=False)
        File "/tmp/pip-build-env-3lxviw8j/overlay/local/lib/python3.10/dist-packages/poetry/core/factory.py", line 58, in create_poetry
          raise RuntimeError("The Poetry configuration is invalid:\n" + message)
      RuntimeError: The Poetry configuration is invalid:
        - data.documentation must be uri

Any demo python I can play with?

Hi,

I have installed LongNet on Ubuntu with a 4090. (Is that enough to run LongNet?)
When I run python example.py, I get an error.
I tried "pip install torchscale", but it did not help; the error is the same.
I am still reading the GitHub repository and the paper. Is there a demo Python program I can play with first? Thanks.

Details (No error during LongNet installation)

pip install LongNet
Requirement already satisfied: LongNet in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (0.1.3)
Requirement already satisfied: torch in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (2.0.1)
Requirement already satisfied: einops in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (0.6.1)
Requirement already satisfied: flash-attn in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (1.0.5)
Requirement already satisfied: accelerate in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (0.20.3)
Requirement already satisfied: bitsandbytes in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (0.39.1)
Requirement already satisfied: fairscale in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (0.4.0)
Requirement already satisfied: timm in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (0.4.12)
Requirement already satisfied: flamingo-pytorch in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from LongNet) (0.1.2)
Requirement already satisfied: numpy>=1.17 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from accelerate->LongNet) (1.25.0)
Requirement already satisfied: packaging>=20.0 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from accelerate->LongNet) (23.1)
Requirement already satisfied: psutil in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from accelerate->LongNet) (5.9.5)
Requirement already satisfied: pyyaml in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from accelerate->LongNet) (6.0)
Requirement already satisfied: filelock in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (3.12.2)
Requirement already satisfied: typing-extensions in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (4.7.1)
Requirement already satisfied: sympy in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (1.12)
Requirement already satisfied: networkx in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (3.1)
Requirement already satisfied: jinja2 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (3.1.2)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.7.99)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.7.99)
Requirement already satisfied: nvidia-cuda-cupti-cu11==11.7.101 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.7.101)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (8.5.0.96)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.10.3.66)
Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (10.9.0.58)
Requirement already satisfied: nvidia-curand-cu11==10.2.10.91 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (10.2.10.91)
Requirement already satisfied: nvidia-cusolver-cu11==11.4.0.1 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.4.0.1)
Requirement already satisfied: nvidia-cusparse-cu11==11.7.4.91 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.7.4.91)
Requirement already satisfied: nvidia-nccl-cu11==2.14.3 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (2.14.3)
Requirement already satisfied: nvidia-nvtx-cu11==11.7.91 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (11.7.91)
Requirement already satisfied: triton==2.0.0 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torch->LongNet) (2.0.0)
Requirement already satisfied: setuptools in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch->LongNet) (67.8.0)
Requirement already satisfied: wheel in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch->LongNet) (0.38.4)
Requirement already satisfied: cmake in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from triton==2.0.0->torch->LongNet) (3.26.4)
Requirement already satisfied: lit in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from triton==2.0.0->torch->LongNet) (16.0.6)
Requirement already satisfied: einops-exts in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from flamingo-pytorch->LongNet) (0.0.4)
Requirement already satisfied: ninja in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from flash-attn->LongNet) (1.11.1)
Requirement already satisfied: torchvision in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from timm->LongNet) (0.15.2)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from jinja2->torch->LongNet) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from sympy->torch->LongNet) (1.3.0)
Requirement already satisfied: requests in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torchvision->timm->LongNet) (2.31.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from torchvision->timm->LongNet) (10.0.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from requests->torchvision->timm->LongNet) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from requests->torchvision->timm->LongNet) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from requests->torchvision->timm->LongNet) (2.0.3)
Requirement already satisfied: certifi>=2017.4.17 in /home/ak/anaconda3/envs/LongNet/lib/python3.9/site-packages (from requests->torchvision->timm->LongNet) (2023.5.7)


python example.py 
Traceback (most recent call last):
  File "/media/ak/HD/LongNet/example.py", line 3, in <module>
    from LongNet import DilatedAttention
  File "/media/ak/HD/LongNet/LongNet/__init__.py", line 6, in <module>
    from LongNet.attention import DilatedAttention
  File "/media/ak/HD/LongNet/LongNet/attention.py", line 6, in <module>
    from LongNet.torchscale import XPOS, RelativePositionBias
ImportError: cannot import name 'XPOS' from 'LongNet.torchscale' (unknown location)

Module not Found Error : 'packaging'

Hi, I got this error while installing through
pip install LongNet

Collecting flash-attn (from LongNet)
Using cached flash_attn-2.0.9.tar.gz (2.2 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
  File "/home/tu/.local/lib/python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/home/tu/.local/lib/python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/tu/.local/lib/python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/tmp/pip-build-env-7v4k1tnr/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
  File "/tmp/pip-build-env-7v4k1tnr/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-7v4k1tnr/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 488, in run_setup
    self).run_setup(setup_script=setup_script)
  File "/tmp/pip-build-env-7v4k1tnr/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 338, in run_setup
    exec(code, locals())
  File "<string>", line 8, in <module>
ModuleNotFoundError: No module named 'packaging'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.

train error

I ran train.py and got the error below:

Traceback (most recent call last):
  File "/public/home/wangycgroup/public/02_Data/Internal/phage/train.py", line 86, in <module>
    loss = model(next(train_loader))
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 326, in forward
    logits = self.net(x_inp, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 272, in forward
    x = self.transformer(x)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 245, in forward
    x = block(x) + x
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/long_net/model.py", line 206, in forward
    attn = self.attn(q, k, v)
  File "/public/home/wangycgroup/wangjn/software/miniconda3/envs/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 4 were given

the output is

`No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Using StableAdamWUnfused-v1

training: 0%| | 0/100000 [00:00<?, ?it/s]
training: 0%| | 0/100000 [00:00<?, ?it/s]
`
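
The TypeError in the traceback is what Python raises when a forward method written for a single input is called with three (q, k, v); self counts as the first positional argument, hence "2 ... but 4". A minimal reproduction in plain Python follows. The class name is hypothetical and the example depends on neither long_net nor torch.

```python
class SingleInputAttention:
    """Stand-in for an attention module whose forward takes only `x`."""

    def forward(self, x):
        return x

    def __call__(self, *args, **kwargs):
        # Mirrors how a module forwards its call arguments to forward().
        return self.forward(*args, **kwargs)


attn = SingleInputAttention()

try:
    attn("q", "k", "v")  # three inputs, but forward accepts only one
except TypeError as err:
    message = str(err)
    print(message)

print(attn("x"))  # a single input matches the signature
```

If the installed attention class accepts one tensor, the caller and the class disagree about the interface, so one of the two has to change; which one is right depends on the installed long_net version.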

Training with gpus

Could you tell me which versions of accelerate and torch you used?

example.py does not work

RuntimeError Traceback (most recent call last)
in <cell line: 25>()
24 #test forward pass
25 with torch.no_grad():
---> 26 output = model(x)
27 print(f"Output shape: {output.shape}") # expected (batch_size, seq_Len)
28

4 frames
in apply_rotary_pos_emb(x, sin, cos, scale)
33 sin, cos = map(lambda t: duplicate_interleave(t * scale), (sin, cos))
34 # einsum notation for lambda t: repeat(t[offset:x.shape[1]+offset,:], "n d -> () n () (d j)", j=2)
---> 35 return (x * cos) + (rotate_every_two(x) * sin)
36
37

RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2

Train Error

(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-05 23:56:10,524 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-05 23:56:17.908409: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training: 0%| | 0/100000 [00:01<?, ?it/s]

Traceback (most recent call last):
  File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
    loss = model(next(train_loader))
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
    logits = self.net(x_inp, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
    x = self.transformer(x)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
    x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1

After setting up the environment, I ran 'python3 train.py' and this happened. Could you take a look? Thank you!

Can LongNet be used for fine-tuning large language models?

If I want to use LongNet to fine-tune an already trained large language model, can that be done?

OutOfMemoryError

Non-A100 GPU detected, using math or mem efficient attention if input tensor is on cuda

OutOfMemoryError Traceback (most recent call last)
in <cell line: 22>()
20 #create model and data
21 model = DilatedAttention(d_model, num_heads, dilation_rate, segment_size).to(device)
---> 22 x = torch.randn((batch_size, seq_len, d_model), device=device, dtype=dtype)
23
24

OutOfMemoryError: CUDA out of memory. Tried to allocate 305.18 GiB (GPU 0; 14.75 GiB total capacity; 16.00 KiB already allocated; 14.09 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
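
The size of the failed allocation can be estimated before creating the tensor: it is the product of the dimensions times the bytes per element. The sketch below does that arithmetic. The shape and dtype used are assumptions chosen for illustration; the issue does not state the actual batch_size, seq_len, d_model, or dtype, and other combinations give the same figure.

```python
def tensor_gib(shape, bytes_per_element):
    """Memory needed for a dense tensor of this shape, in GiB."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * bytes_per_element / 2**30


# Hypothetical configuration: batch 32, 5,000,000 tokens, width 512, float32 (4 bytes).
example = tensor_gib((32, 5_000_000, 512), bytes_per_element=4)
print(f"{example:.2f} GiB")

# The README example shape (32, 8192, 512) in float32, for comparison.
readme = tensor_gib((32, 8192, 512), bytes_per_element=4)
print(f"{readme:.2f} GiB")
```

An input alone that needs hundreds of GiB cannot fit on a 14.75 GiB device regardless of the attention implementation, so the batch size or sequence length has to shrink, or the sequence has to be split across devices.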

KeyError: 'module.token_embs.0.gamma'

@kyegomez
hello bro!!
Thanks for your code
When I run training.py, something goes wrong:
  File "LongNet/training.py", line 364, in decoupled_optimizer
    no_decay_param.append(param_dict[param])
KeyError: 'module.token_embs.0.gamma'

Import issues

Hello,

Thank you for the great effort to provide a version of LongNet! 😃
I noticed that I have some import issues, for example in the LongNet/model.py file:

ModuleNotFoundError: No module named 'LongNet.transformer'

I think this is due to some modification of the imports that used to be from LongNet.Transformer import LongNet but was changed to from LongNet.transformer import LongNet

Would it be possible to check that? 🔧

Thank you!

ModuleNotFoundError: No module named 'LongNet'

Issue Description:

I encountered an issue while using the pip install LongNet command to install the LongNet package. When attempting to import a module using from LongNet.attention import DilatedAttention, I received the following error:

ModuleNotFoundError: No module named 'LongNet'

Upon investigating the installation directory of pip packages (/usr/local/lib/python3.11/dist-packages), I noticed that the actual installation directory for the LongNet library is named longnet (all lowercase).

Suggested Resolution:

It appears that there might be a mismatch between the library's name and the installation directory. To resolve this issue, one of the following actions could be taken:

Rename the Installation Directory:
Rename the installation directory from longnet to LongNet to match the capitalization of the library's name. This would align the installation directory with the expected import statement.
Adjust Import Statements:
Modify the import statements within the library's codebase to use lowercase import paths, matching the installation directory. For example, change from LongNet.attention import DilatedAttention to from longnet.attention import DilatedAttention.
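
Before changing anything, it can help to check which spelling of the package name is actually importable in the current environment. The sketch below uses the standard library's importlib.util.find_spec for that. The helper name first_importable is hypothetical, and the candidate names are the spellings that appear on this page.

```python
import importlib.util


def first_importable(candidates):
    """Return the first module name that the import system can locate, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # find_spec can raise for malformed names or broken parent packages.
            continue
    return None


# Spellings seen in the README and in the issues on this page.
print(first_importable(["LongNet", "longnet", "long_net"]))
```

The call prints None when none of the spellings is installed; otherwise it prints the name to use in import statements.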

This was generated by ChatGPT based on my description.

The README usage code failed to run.

I tried running the README usage code in my Colab environment with an A100 GPU, but it appears to have failed. Please see the attached screenshot and my Colab code: https://colab.research.google.com/drive/1wU-O7kKk_Frq9q-bhXE87e-wNE47YZqV?usp=sharing

Thanks!

Where can I find experiments on a real dataset?

Where can I find some examples showing how to train longnet on a real dataset?

Link to official implementation, remove misleading citation

It's fine to have your own re-implementation but you need to acknowledge it as such, and link the official implementation and paper:

https://github.com/microsoft/unilm
https://arxiv.org/abs/2307.02486v1

To confirm that is indeed the correct implementation, you can look at paperswithcode and see it's listed under code, and microsoft's repo lists LongNet and links to the arxiv paper.

You also include the citation on this repository, which is very misleading as this is not the official implementation and is not at all associated with the paper.

For people who aren't familiar with this guy, he regularly tries to steal people's work and mislead people into thinking it's his own:
kyegomez/tree-of-thoughts#56
kyegomez/tree-of-thoughts#74
kyegomez/tree-of-thoughts#34 (providing the code and implementation before the authors did only means you are clout chasing and trying to mislead people)

Can't install

:~/LongNet$ pip install -r requirements.txt
Requirement already satisfied: torch in /home/straughterguthrie/robust/lib/python3.10/site-packages (from -r requirements.txt (line 1)) (2.0.1)
Collecting einops
Using cached einops-0.6.1-py3-none-any.whl (42 kB)
Collecting flash_attn
Using cached flash_attn-1.0.8.tar.gz (2.0 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
Traceback (most recent call last):
  File "/home/straughterguthrie/robust/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
    main()
  File "/home/straughterguthrie/robust/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/straughterguthrie/robust/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/tmp/pip-build-env-tjgu9b0f/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
  File "/tmp/pip-build-env-tjgu9b0f/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-tjgu9b0f/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
    exec(code, locals())
  File "<string>", line 13, in <module>
ModuleNotFoundError: No module named 'torch'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
(robust) straughterguthrie@straughterguthrie-OMEN-by-HP-Obelisk-Desktop-875-1xxx:~/LongNet$
