
lit-llama's People

Contributors

andrei-aksionov, aniketmaurya, awaelchli, borda, carmocca, chris-alexiuk-1, derekjuba-nist, dnhkng, dspinellis, ever2after, gkroiz, gregor-soniox, h4dr1en, joaopalotti, lantiga, laurentmazare, lucas-ventura, lun-4, mentoc3000, mosheber, rasbt, robieta, rubenfricke, t-vi, thequert, timothylimyl, vihangd, waitzkin, williamfalcon, wlsdnen


lit-llama's Issues

AssertionError

$ python3 generate.py --quantize llm.int8 --prompt "Hello, my name is"

Error message (Ubuntu 22.04):

Traceback (most recent call last):
  File "generate.py", line 159, in <module>
    CLI(main)
  File "/home/yongun/.local/lib/python3.8/site-packages/jsonargparse/cli.py", line 82, in CLI
    return _run_component(component, cfg_init)
  File "/home/yongun/.local/lib/python3.8/site-packages/jsonargparse/cli.py", line 138, in _run_component
    return component(**cfg)
  File "generate.py", line 104, in main
    assert checkpoint_path.is_file()
AssertionError

Reproducing alpaca

python scripts/prepare_alpaca.py fails to run if I don't run python setup.py install first.

Traceback (most recent call last):
  File "/home/gregor/experiments/lit-llama/scripts/prepare_alpaca.py", line 9, in <module>
    from lit_llama.tokenizer import Tokenizer
ModuleNotFoundError: No module named 'lit_llama'
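A minimal workaround sketch, assuming the only problem is that the lit_llama package is not importable until it is installed: the script could prepend the repository root to sys.path before the import (the path handling below is my assumption, not code from the repo).

# Hypothetical workaround: make `lit_llama` importable without `python setup.py install`.
import sys
from pathlib import Path

repo_root = Path(__file__).resolve().parent.parent  # assumes this file lives in scripts/
sys.path.insert(0, str(repo_root))

from lit_llama.tokenizer import Tokenizer  # noqa: E402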

TPU

Does this code work on TPU?

Expected is_sm80 to be true, but got false

I tried running the finetuning scripts on a 3090 GPU and got this error:

/home/adrian/repositories/lightning-llama/lit_llama/model.py:43: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
  ).to(complex_dtype)
Traceback (most recent call last):
  File "/home/adrian/repositories/lightning-llama/finetune_adapter.py", line 201, in <module>
    main()
  File "/home/adrian/repositories/lightning-llama/finetune_adapter.py", line 67, in main
    train(fabric, model, optimizer, train_data, val_data)
  File "/home/adrian/repositories/lightning-llama/finetune_adapter.py", line 97, in train
    fabric.backward(loss / gradient_accumulation_steps)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 365, in backward
    self._precision.backward(tensor, module, *args, **kwargs)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/lightning/fabric/plugins/precision/amp.py", line 70, in backward
    super().backward(tensor, model, *args, **kwargs)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/lightning/fabric/plugins/precision/precision.py", line 81, in backward
    tensor.backward(*args, **kwargs)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected is_sm80 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

This was on the branch of #100 where I added the EmptyInitOnDevice() context manager. It looks like the conversion to complex_dtype caused problems in the backward.

Both

python finetune_lora.py

and

python finetune_adapter.py

fail with this error.

Precommit hooks

If you like, I can add some pre-commit hooks for automatic linting before making this public.

Politically Kind License Wording

Hi,

This seems like a really wonderful project for which I am thankful.

I personally support the GPL and don’t see it as preventing academic or commercial use at all.

When I read the wording in the readme that the GPL prevents these things, and is a problem to be solved (rather than a solution to problems), I feel pain. I don’t understand why or how these things would be. The function of the GPL is to keep derivative works open source. Do academic institutions need to keep their source code private?

Would you be willing to cite where these opinions stem from, and/or state the expression as opinion rather than fact?

Cannot save checkpoint while train.py on single GPU

I got the following error in utils.py while saving a checkpoint during pre-training on a single GPU. Any hint on how I should fix it? Thanks.

state_dict = model._forward_module.state_dict()                            

NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet
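For reference, a possible workaround sketch (not from this repo): it assumes the error comes from gathering a full state dict with offload_to_cpu=True while the single-GPU run uses the NO_SHARD sharding strategy, and that model below stands for the FSDP-wrapped module.

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import FullStateDictConfig, StateDictType

# Assumption: turning the CPU offload off side-steps the NO_SHARD limitation on a single GPU.
cfg = FullStateDictConfig(offload_to_cpu=False, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
    state_dict = model.state_dict()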

This is a brazen insult to open source community

Meta released LLaMA under GPLv3; you took their code, made some irrelevant changes, claimed it as yours, and relicensed it. You could do the same to other open source contributors: steal their work and rebrand it as yours. As a company, you also ignored the safety issues with LLaMA and just want to promote your framework without putting any safety measures in place. Shame on you.

Typo? "7B require ~26 GB of GPU memory (A100 GPU)."

Is the following a typo, or does the lit-llama implementation require vastly more VRAM than the original implementation? The 7B model fits natively on a single 3090 24 GB GPU in the original LLaMA implementation.

This will run the 7B model and require ~26 GB of GPU memory (A100 GPU).

Convergence of LLaMA-adapter

Dear Sir,

I find that the LLaMA-Adapter implementation does not converge. The loss stays around 0.8-1.0.

I wonder if you can solve this?

And thanks for your attention.

Missing rope_cache for model with lora

Hi, according to #81, the rope_cache argument has been removed from CausalSelfAttention in model.py. I think the same change should also be made in lora.py.

How to finetune with the multi-GPU

Hi, I'm wondering how to change the code for multi-GPU fine-tuning. Currently, I tried
fabric = L.Fabric(devices=4, accelerator="gpu", strategy="ddp")
But I encounter an error about initialization:
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
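For reference, a sketch of the setup with an explicit launch call; my assumption is that fabric.launch() is what initializes the default process group when the script is run directly (as opposed to via lightning run model):

import lightning as L

fabric = L.Fabric(devices=4, accelerator="gpu", strategy="ddp")
fabric.launch()  # assumed to spawn the worker processes and call init_process_group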

AttributeError: module 'lightning' has no attribute 'Fabric'

$ python generate.py --prompt "Hello, my name is"

Traceback (most recent call last):
  File "generate.py", line 148, in <module>
    CLI(main)
  File "/axp/aida/data/platformds/aiservices/conda/envs/llama/lib/python3.7/site-packages/jsonargparse/cli.py", line 82, in CLI
    return _run_component(component, cfg_init)
  File "/axp/aida/data/platformds/aiservices/conda/envs/llama/lib/python3.7/site-packages/jsonargparse/cli.py", line 138, in _run_component
    return component(**cfg)
  File "generate.py", line 104, in main
    fabric = L.Fabric(accelerator=accelerator, devices=1)
AttributeError: module 'lightning' has no attribute 'Fabric'

How to fine tune llama with peft?

I have a dataset. I tried the OpenAI embeddings, but they are not good. I want to fine-tune LLaMA with PEFT on a single consumer GPU.

So, how can I do this?
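For context, the kind of setup I have in mind is roughly the following sketch with the Hugging Face peft library (the checkpoint path and hyperparameters are placeholders, not anything from this repo):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder path; substitute whatever LLaMA checkpoint is available locally.
model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")

# LoRA keeps the base weights frozen and only trains small low-rank adapter matrices,
# which is what makes fine-tuning on a single consumer GPU feasible.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()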

Loading the 13B checkpoint

Hi!

I was trying to load the 13B model checkpoint but it seems that there is a mismatch between dimensions:

e.g:

size mismatch for transformer.h.39.mlp.c_proj.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).

Error in convert_checkpoint.py when converting 13B weights

Whenever I try to convert the 13B weights (unmodified) sourced from the dalai llama download, the first checkpoint successfully completes; however, the second checkpoint fails to convert.

I am using this command:

python scripts/convert_checkpoint.py \
    --output_dir checkpoints/lit-llama \
    --ckpt_dir dalai/llama/models \
    --tokenizer_path /dalai/llama/models/tokenizer.model \
    --model_size 13B

And receive the following error:

python scripts/convert_checkpoint.py --output_dir checkpoints/lit-llama --ckpt_dir dalai/llama/models --tokenizer_path dalai/llama/models/tokenizer.model --model_size 13B
50%|███████████████████████████████████████████ | 1/2 [00:32<00:32, 32.09s/it]

Killed

Question: about left padding

If the model is padded on the left and it's a causal language model, does that mean that padding tokens will receive attention from the rest of the sequence?

Should there be a mask to prevent tokens from attending to padding?
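To make the question concrete, here is an illustrative sketch (not code from this repo) of a mask that is both causal and blocks attention to padding keys:

import torch

T = 6
is_pad = torch.tensor([True, True, False, False, False, False])  # example: two left-padding positions

causal = torch.tril(torch.ones(T, T, dtype=torch.bool))  # standard causal mask
attn_mask = causal & ~is_pad.view(1, T)                  # additionally block attention to pad keys

# attn_mask[i, j] is True where query position i may attend to key position j;
# a boolean mask like this can be passed to F.scaled_dot_product_attention(..., attn_mask=attn_mask).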

Readme wording

Readme wording suggestion:

Before:

  • Simple, single-file, no boilerplate
  • Numerically equivalent to the original model
  • Optimized to run on consumer hardware or at scale
  • Open-source no strings attached

Suggested:

  • Simple, single-file implementation without boilerplate
  • Numerically equivalent to the original model
  • Optimized to run on consumer hardware or at scale
  • Open-source and no strings attached

Saving FSDP Model

I was trying to change the fine-tuning to use FSDP training, but currently there is no way to save the checkpoint.

Saving in train.py is commented out, and the saving in finetune.pt only supports LoRA.

I am trying to compare full fine-tuning with LoRA. For full fine-tuning using FSDP, my checkpoint is saved with the embedding layer and lm_head in a single flat_param, which prevents me from loading the checkpoint afterwards. How can I recover the original model architecture and load that checkpoint onto a single GPU for inference?

Vocabulary size when training the tokenizer

As far as I can tell, this will likely use the sentencepiece defaults (i.e., a vocabulary size of 8000 tokens), whereas LLaMA was supposedly trained using a vocabulary size of 32000?

@staticmethod
def train(input: str, destination: str) -> None:
    model_prefix = os.path.join(destination, "tokenizer")
    SentencePieceTrainer.Train(input=input, model_prefix=model_prefix)

My suggestion (see the sketch below) is to:

  • add a parameter to set the vocabulary size
  • set the default value to 32000 to match LLaMA
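A minimal sketch of that change, written as a standalone function for clarity (the vocab_size keyword follows the sentencepiece trainer API; 32000 is the value mentioned above):

import os
from sentencepiece import SentencePieceTrainer

def train(input: str, destination: str, vocab_size: int = 32000) -> None:
    # Expose the vocabulary size instead of relying on the sentencepiece default (8000);
    # 32000 matches the size LLaMA reportedly used.
    model_prefix = os.path.join(destination, "tokenizer")
    SentencePieceTrainer.Train(input=input, model_prefix=model_prefix, vocab_size=vocab_size)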

How to use deepspeed zero-3-offload strategy correctly? (Parameters Duplication Issue)

Hi, I wonder how to write the code for using the deepspeed zero-3-offload strategy correctly. Currently, my code looks like:

from lightning.fabric.strategies import DeepSpeedStrategy
deep_speed = DeepSpeedStrategy(
    stage=3,
    offload_optimizer=True,
    offload_parameters=True,
)
fabric = L.Fabric(accelerator="gpu", devices=num_devices, strategy=deep_speed)

However, it seems the parameters are duplicated across all GPUs. I attached a screenshot showing the GPU utilization after model, optimizer = fabric.setup(model, optimizer):

[screenshot: GPU utilization]

According to my understanding, the parameters should be distributed on different devices, right?

Use FlashAttention with LLaMA-Adapter

Looking at the LLaMA-Adapter implementation, at https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/adapter.py#L91

            # inefficient attention because we need to insert the gate for the adaption in the middle
            aT = prefix.size(1)
            _, ak, av = self.c_attn(prefix).split(self.n_embd, dim=2)
            ak = ak.view(1, aT, self.n_head, head_size).repeat(B, 1, 1, 1).transpose(1, 2)
            av = av.view(1, aT, self.n_head, head_size).repeat(B, 1, 1, 1).transpose(1, 2)

            ascores = torch.matmul(q, ak.transpose(2, 3)) / math.sqrt(self.n_embd)
            ascores = self.gating_factor * F.softmax(ascores.float(), dim=-1).type_as(q)
            y = y + torch.matmul(ascores, av)

it looks to me like we could replace the above with

            a_mask = torch.ones(aT, aT, dtype=torch.bool)
            ay = F.scaled_dot_product_attention(q, ak, av, attn_mask=a_mask, dropout_p=0.0, is_causal=False)
            y = y + self.gating_factor * ay

since

(gating * softmax(ascores)) @ av

is equivalent to

gating * (softmax(ascores) @ av)
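A quick numerical check of that equivalence (illustrative shapes and values only):

import torch
import torch.nn.functional as F

B, n_head, T, aT, head_size = 2, 4, 8, 5, 16
ascores = torch.randn(B, n_head, T, aT)       # stand-in for the raw adapter attention scores
av = torch.randn(B, n_head, aT, head_size)
gating = 0.3

lhs = torch.matmul(gating * F.softmax(ascores, dim=-1), av)
rhs = gating * torch.matmul(F.softmax(ascores, dim=-1), av)
print(torch.allclose(lhs, rhs, atol=1e-6))    # True: scalar gating commutes with the matmul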

Model compilation support

With FSDP, the code currently cannot be run.

If you try to add model compilation to the training like:

...
fabric = L.Fabric(accelerator="cuda", devices=8, precision="bf16-mixed", strategy=strategy)
fabric.launch()
...

model = fabric.setup_module(model)
# compile() should go after the wrapping as per https://github.com/huggingface/transformers/commit/fb0a38b4f275727d6228fb4a78c15c6dd8480e91
# (though it does not work either if it goes before setup_module(); you get the same issue, see below)
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), ...)
optimizer = fabric.setup_optimizers(optimizer)
...

train(model, ...)

and try it via:

lightning run model --accelerator=cuda --devices=8 train.py ...

you'll get:

  File ".../.venv/lib/python3.8/site-packages/torch/_dynamo/variables/builder.py", line 172, in __call__
    return self._wrap(value).clone(**self.options())
  File ".../.venv/lib/python3.8/site-packages/torch/_dynamo/variables/builder.py", line 345, in _wrap
    assert getattr(
AssertionError: Dynamo only supports FSDP with use_orig_params=True

If I pass use_orig_params=True into the FSDPStrategy() constructor, you get:

ValueError: The optimizer does not seem to reference any FSDP parameters. HINT: Make sure to create the optimizer after setting up the model.
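For clarity, the constructor variant I mean is roughly this (a sketch; it assumes FSDPStrategy forwards extra keyword arguments to the underlying torch FSDP wrapper):

from lightning.fabric.strategies import FSDPStrategy

strategy = FSDPStrategy(use_orig_params=True)  # assumption: kwarg is passed through to torch FSDP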

So one can then remove optimizer = fabric.setup_optimizers(optimizer), as it is a no-op for FSDP anyway, but even in this case I see:

from user code:
   File ".../.venv/lib/python3.8/site-packages/lightning_utilities/core/apply_func.py", line 75, in apply_to_collection
    is_namedtuple_ = is_namedtuple(data)

Also, I was not able to find any tests of a compiled model with FSDP, neither here nor here.

I wonder if anyone has been able to successfully launch a compiled model in an FSDP regime? Thanks a lot for the help!


P.S.: if I try to run similar code using the Hugging Face Trainer, I run into the exact same AssertionError: Dynamo only supports FSDP with use_orig_params=True :)

Error when running python3 generate.py --quantize true: undefined symbol: cget_col_row_stats

u20@u20:~/lit-llama$ python3 generate.py --quantize true --prompt "Hello, my name is"
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/home/u20/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
Loading model ...
Traceback (most recent call last):
  File "generate.py", line 147, in <module>
    CLI(main)
  File "/home/u20/.local/lib/python3.8/site-packages/jsonargparse/cli.py", line 82, in CLI
    return _run_component(component, cfg_init)
  File "/home/u20/.local/lib/python3.8/site-packages/jsonargparse/cli.py", line 138, in _run_component
    return component(**cfg)
  File "generate.py", line 108, in main
    model = LLaMA.from_name(model_size)
  File "/home/u20/lit-llama/lit_llama/model.py", line 223, in from_name
    return cls(LLaMAConfig.from_name(name))
  File "/home/u20/lit-llama/lit_llama/model.py", line 179, in __init__
    self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
  File "/home/u20/lit-llama/lit_llama/quantization.py", line 31, in __init__
    self._quantize_weight(self.weight.data)
  File "/home/u20/lit-llama/lit_llama/quantization.py", line 48, in _quantize_weight
    CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
  File "/home/u20/.local/lib/python3.8/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
    row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
  File "/home/u20/.local/lib/python3.8/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
    lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/u20/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

Step method?

What is the purpose of the step method? It doesn't seem to be used anywhere. Can it be a) removed for the sake of simplicity, or b) used wherever the loss is currently calculated with F.cross_entropy elsewhere?

def step(self, idx: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    logits = self(idx)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
    return loss

Command python generate.py --quantize true --prompt "Hello, my name is" Not working

Getting the error below while executing the given command.

Loading model ...
Traceback (most recent call last):
  File "generate.py", line 147, in <module>
    CLI(main)
  File "/mnt/hdd1/rajeevy/anaconda3/envs/lama/lib/python3.8/site-packages/jsonargparse/cli.py", line 82, in CLI
    return _run_component(component, cfg_init)
  File "/mnt/hdd1/rajeevy/anaconda3/envs/lama/lib/python3.8/site-packages/jsonargparse/cli.py", line 138, in _run_component
    return component(**cfg)
  File "generate.py", line 110, in main
    model.load_state_dict(checkpoint)
  File "/mnt/hdd1/rajeevy/anaconda3/envs/lama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "/mnt/hdd1/rajeevy/anaconda3/envs/lama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "/mnt/hdd1/rajeevy/anaconda3/envs/lama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2009, in load
    module._load_from_state_dict(
  File "/mnt/hdd1/rajeevy/nlp/Lama/lit-llama/lit_llama/quantization.py", line 35, in _load_from_state_dict
    weight_key = next(name for name in local_state_dict.keys() if name.endswith("weight"))
StopIteration

src folder

For better readability etc., do we want to reorganize this with the code in a src folder or a llama subfolder? Since there's already a setup.py, it would probably be more organized and readable this way.

Apache - 2.0 - Commercial License

Hey team,
Thanks for releasing the code and repo under Apache-2.0

I'm still wondering, though, how this would be truly open-source and commercialisable if we're still loading the official LLaMA weights (under the GPL license) and converting them into Lit-LLaMA weights?

Or does it only mean that if we train from scratch using this code instead, without using the official Llama weights, then the end model could be used for commercial purposes?

Please help clarify.
TIA.
