
Comments (9)

moyix commented on April 24, 2024

I don't know of a good guide to fine-tuning unfortunately! One of my colleagues, @shailja-thakur, has fine-tuned CodeGen on Verilog code, but it takes a lot of VRAM to fine-tune the 16B model (we had to use 80GB A100s).

The --dataset_name is just the location of the code you want to train on in a format that Huggingface Datasets recognizes. The simplest is probably to use JSONL format – a JSON file with one dictionary per line, using the format:

{"text": "content_of_source_file_1", "url": "path_to_source_file_1"}
{"text": "content_of_source_file_2", "url": "path_to_source_file_2"}
...

(You can add other keys if you want; the only field used by the training script is text, but I find it helpful to include some extra metadata so I can keep track of where the code came from.)
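
For concreteness, here is a minimal sketch (the paths my_project/ and my_code.jsonl are hypothetical) of building such a JSONL file from a local source tree and checking that Huggingface Datasets can parse it:

    import json
    from pathlib import Path
    from datasets import load_dataset

    # Write one JSON object per line; only "text" is required by the training
    # script, "url" is just extra metadata for bookkeeping.
    with open("my_code.jsonl", "w") as out:
        for path in Path("my_project").rglob("*.c"):
            record = {"text": path.read_text(errors="ignore"), "url": str(path)}
            out.write(json.dumps(record) + "\n")

    # Quick sanity check that the file loads as a dataset.
    ds = load_dataset("json", data_files="my_code.jsonl", split="train")
    print(len(ds), ds[0]["url"])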

You can see an example of a dataset I put together of C/C++ code found in Debian here: https://huggingface.co/datasets/moyix/debian_csrc

I would not expect the bigger models to get much better from being fine-tuned on a relatively small amount of code, but the smallest models (like 350M) might benefit from seeing your code.

Also note that it is still a bit tricky to get a custom model working – you'll have to run the conversion from HF to FasterTransformer after training it, and create a configuration file for the new model (there is a script for this in the converter directory: https://github.com/moyix/fauxpilot/blob/main/converter/triton_config_gen.py).

shailja-thakur commented on April 24, 2024

leemgs commented on April 24, 2024

AttributeError: 'CodeGenAttention' object has no attribute 'causal_mask'

FIXED. I figured out what was causing this problem: the Transformers version I used for fine-tuning and the version I used for sampling were different. The problem was resolved by using the most recent Transformers version (e.g., 4.25.0.dev0) for both, and by fixing the incorrect weights referenced in the config.json file. I hope this report is useful to anyone who runs into a similar difficulty in the near future. 😄
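
A trivial check that goes with this fix: print the library versions in both the fine-tuning environment and the sampling environment before digging into the weights themselves (a small sketch, nothing project-specific):

    import torch
    import transformers

    # Run this in both the training and the sampling environment; the
    # causal_mask error above came from these versions not matching.
    print("transformers:", transformers.__version__)
    print("torch       :", torch.__version__)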

  • The model card information: fine-tuned Codegen-350M-multi model
    • /mylab/fine-tuning-codegen/codegen-350M-finetuned$ cat ./README.md

license: bsd-3-clause
tags:
- generated_from_trainer
datasets:
- moyix/debian_csrc
model-index:
- name: codegen-350M-finetuned
  results: []

This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment.

codegen-350M-finetuned

This model is a fine-tuned version of Salesforce/codegen-350M-multi on the moyix/debian_csrc dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0

Training results

Framework versions

  • Transformers 4.25.0.dev0
  • Pytorch 1.13.0
  • Datasets 2.6.1
  • Tokenizers 0.11.0
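
For reference, a sketch of how the hyperparameters above map onto Huggingface TrainingArguments (the actual run used the stock run_clm.py command-line flags plus DeepSpeed rather than a hand-written script, so this is only illustrative):

    from transformers import TrainingArguments

    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments
    # defaults, so they do not need to be passed explicitly.
    args = TrainingArguments(
        output_dir="./codegen-350M-finetuned",
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=32,
        seed=42,
        num_train_epochs=1.0,
        lr_scheduler_type="linear",
        fp16=True,
    )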


leemgs commented on April 24, 2024

I would not expect the bigger models to get much better from being fine-tuned on a relatively small amount of code, but the smallest models (like 350M) might benefit from seeing your code.

Yepp, I think so. :)


leemgs commented on April 24, 2024

I don't know of a good guide to fine-tuning unfortunately! One of my colleagues, @shailja-thakur, has fine-tuned CodeGen on Verilog code, but it takes a lot of VRAM to fine-tune the 16B model (we had to use 80GB A100s).

@moyix, @shailja-thakur, I got an unexpected OOM issue (e.g., torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 11.90 GiB total capacity; 10.55 GiB already allocated; 200.50 MiB free; 10.70 GiB reserved in total by PyTorch)) while running the fine-tuning task with the smallest model (e.g., 350M) and your Debian dataset on my Ubuntu 22.04 machine (32GB DRAM + Nvidia Titan Xp with 12GB VRAM).

Have you had a similar experience? Did you have to use an Nvidia A100 with 80GB (or 40GB) of VRAM even when fine-tuning the smallest model, such as the 350M? Can we change the ds_config.json file to reduce GPU VRAM consumption so that the fine-tuning run completes successfully? Any feedback will be appreciated.

  • Screenshot:
$ my-codegen-350m-deepspeed-finetune.sh
     ......... OMISSION ..........
[INFO|trainer.py:1608] 2022-11-04 11:17:11,278 >> ***** Running training *****
[INFO|trainer.py:1609] 2022-11-04 11:17:11,278 >>   Num examples = 3786289
[INFO|trainer.py:1610] 2022-11-04 11:17:11,278 >>   Num Epochs = 1
[INFO|trainer.py:1611] 2022-11-04 11:17:11,278 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1612] 2022-11-04 11:17:11,278 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1613] 2022-11-04 11:17:11,278 >>   Gradient Accumulation steps = 32
[INFO|trainer.py:1614] 2022-11-04 11:17:11,278 >>   Total optimization steps = 118321
[INFO|trainer.py:1615] 2022-11-04 11:17:11,278 >>   Number of trainable parameters = 354858103
  0%|
/work/qtlab/transformers/src/transformers/models/codegen/modeling_codegen.py:167: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version
  attn_weights = torch.where(causal_mask, attn_weights, mask_value)
Traceback (most recent call last):
  File "/work/qtlab/./transformers/examples/pytorch/language-modeling/run_clm.py", line 580, in <module>
    main()
  File "/work/qtlab/./transformers/examples/pytorch/language-modeling/run_clm.py", line 528, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/work/qtlab/transformers/src/transformers/trainer.py", line 1501, in train
    return inner_training_loop(
  File "/work/qtlab/transformers/src/transformers/trainer.py", line 1749, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/work/qtlab/transformers/src/transformers/trainer.py", line 2508, in training_step
    loss = self.compute_loss(model, inputs)
  File "/work/qtlab/transformers/src/transformers/trainer.py", line 2540, in compute_loss
    outputs = model(**inputs)
  File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/invain/anaconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/home/invain/anaconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1680, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/qtlab/transformers/src/transformers/models/codegen/modeling_codegen.py", line 711, in forward
    lm_logits = self.lm_head(hidden_states).to(torch.float32)
  File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 11.90 GiB total capacity; 10.55 GiB already allocated; 200.50 MiB free; 10.70 GiB reserved in total by PyTorch) If re
  0%|
[2022-11-04 11:17:13,621] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3296
[2022-11-04 11:17:13,621] [ERROR] [launch.py:324:sigkill_handler] ['/home/invain/anaconda3/envs/deepspeed/bin/python', '-u', './run_clm.py', '--local_rank= 'moyix/debian_csrc', '--tokenizer_name', 'Salesforce/codegen-350M-multi', '--block_size', '2048', '--gradient_accumulation_steps', '32', '--do_train', '--fp16', '--overwrite_output_dir', '--deepspeed',

real    94m15.273s
user    461m18.611s
sys     3m52.003s


leemgs commented on April 24, 2024

Can you share your my-codegen-350m-deepspeed-finetune.sh, ds_config.json, and the size of the training data, so I get an idea of what could be happening in your case?

@shailja-thakur, here they are. I don't know why this training setup still gives a CUDA out-of-memory error on an older Nvidia GPU (e.g., 12GB VRAM).

  • fine-tune option with deepspeed framework (e.g., my-codegen-350m-deepspeed-finetune.sh)
    • 12th Gen Intel Core i7 + DRAM 31GB + Nvidia Titan Xp (VRAM 12GB): failed due to CUDA OOM 😭
    • 12th Gen Intel Core i7 + DRAM 31GB + Nvidia A100 (VRAM 80GB): succeeded thanks to the 80GB of VRAM 😄
 deepspeed --num_gpus 1 --num_nodes 1 $RUN_CLM --model_name_or_path=Salesforce/codegen-${PARAM_SIZE}-multi \
 --per_device_train_batch_size=1 --learning_rate 2e-5 --num_train_epochs 1 \
 --output_dir=./codegen-${PARAM_SIZE}-finetuned --dataset_name $MY_DATASET \
 --tokenizer_name Salesforce/codegen-${PARAM_SIZE}-multi  \
 --block_size 2048 --gradient_accumulation_steps 32 --do_train --fp16 --overwrite_output_dir \
 --deepspeed $DS_CONFIG
  • ds_config.json
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false

  • the size of the training data
    • 153G ~/.cache/huggingface/datasets/moyix___parquet/

At that time, I concentrated on the parameters, gradients, and optimizer states to avoid the CUDA OOM issue on the Nvidia GPU with 12GB of VRAM. However, I still could not find a recipe that avoids the CUDA OOM on a 12GB card.

image
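
In case it helps anyone else hitting the same wall on a 12GB card, below is a hedged sketch (not the exact config used in this thread, and not a verified recipe) of a more aggressive DeepSpeed setup: ZeRO stage 3 with both optimizer states and parameters offloaded to CPU, written out as a JSON file that can be passed via --deepspeed. Possibly combined with a smaller --block_size and --gradient_checkpointing, it trades training speed for lower VRAM usage:

    import json

    # ZeRO stage 3 with CPU offload; the "auto" values are filled in by the
    # Huggingface Trainer integration. This is an assumption-laden starting
    # point, not a guaranteed fix for a 12GB GPU.
    ds_config = {
        "fp16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
            "reduce_bucket_size": "auto",
            "stage3_prefetch_bucket_size": "auto",
            "stage3_param_persistence_threshold": "auto",
        },
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "steps_per_print": 2000,
        "wall_clock_breakdown": False,
    }

    with open("ds_config_zero3.json", "w") as f:
        json.dump(ds_config, f, indent=4)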


leemgs commented on April 24, 2024

12th Gen Intel Core i7 + DRAM 31GB + Nvidia Titan Xp (VRAM 12GB): failed due to CUDA OOM 😭
12th Gen Intel Core i7 + DRAM 31GB + Nvidia A100 (VRAM 80GB): succeeded thanks to the 80GB of VRAM 😄

@shailja-thakur, are there any hints or clues for getting fine-tuning to work on an NVIDIA Titan Xp? I tried various things, but I failed. So for now, I use a high-performance GPU (e.g., an NVIDIA A100 with 80GB VRAM) to avoid the CUDA OOM reported above.


leemgs commented on April 24, 2024

Also note that it is still a bit tricky to get a custom model working
– you'll have to run the conversion from HF to FasterTransformer after training it,

@moyix, first of all, thank you for sharing your experiences.
Thanks to what you shared, I was able to create a fine-tuned model (e.g., codegen-350M-multi-finetuned) as follows.

$ tree ./codegen-350M-multi-finetuned/
./codegen-350M-multi-finetuned/
├── added_tokens.json
├── all_results.json
├── config.json
├── merges.txt
├── pytorch_model.bin
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── trainer_state.json
├── training_args.bin
├── train_results.json
└── vocab.json

$ ls -al ./codegen-350M-multi-finetuned/
total 778380
drwxr-xr-x 2 leemgs leemgs      4096 Nov 10 16:40 .
drwxr-xr-x 6 leemgs leemgs      4096 Nov 10 16:43 ..
-rw-r--r-- 1 leemgs leemgs      1080 Nov 10 16:31 added_tokens.json
-rw-r--r-- 1 leemgs leemgs       582 Nov 10 16:31 all_results.json
-rw-r--r-- 1 leemgs leemgs      1011 Nov 10 16:31 config.json
-rw-r--r-- 1 leemgs leemgs    456356 Nov 10 16:31 merges.txt
-rw-r--r-- 1 leemgs leemgs 793630000 Nov 10 16:31 pytorch_model.bin
-rw-r--r-- 1 leemgs leemgs      1149 Nov 10 16:31 README.md
-rw-r--r-- 1 leemgs leemgs        99 Nov 10 16:31 special_tokens_map.json
-rw-r--r-- 1 leemgs leemgs       283 Nov 10 16:31 tokenizer_config.json
-rw-r--r-- 1 leemgs leemgs   2114827 Nov 10 16:31 tokenizer.json
-rw-r--r-- 1 leemgs leemgs       998 Nov 10 16:31 trainer_state.json
-rw-r--r-- 1 leemgs leemgs      4539 Nov 10 16:31 training_args.bin
-rw-r--r-- 1 leemgs leemgs       582 Nov 10 16:31 train_results.json
-rw-r--r-- 1 leemgs leemgs    798156 Nov 10 16:31 vocab.json
(deepspeed) leemgs@ai02:~/qtlab/CodeGen/checkpoints$

Using the generated fine-tuned model, I performed the "def hello_world():" test, following the sampling instructions in the official CodeGen documentation.

However, I get an unexpected error message like this:

  • error message: 'CodeGenAttention' object has no attribute 'causal_mask'
    I am perplexed as to why the pytorch_model.bin file I produced through the fine-tuning process is incompatible.
    I believe any feedback or experience with this error message would be helpful.
(.venv) $ python3 -m jaxformer.hf.sample --model codegen-350M-multi --context "def hello_world():"


loading parameters
loading parameters took 9.95s
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/home/leemgs/qtlab/CodeGen/jaxformer/hf/sample.py", line 253, in <module>
    main()
  File "/data/home/leemgs/qtlab/CodeGen/jaxformer/hf/sample.py", line 225, in main
    model = create_model(ckpt=ckpt, fp16=use_fp16).to(device)
  File "/data/home/leemgs/qtlab/CodeGen/jaxformer/hf/sample.py", line 63, in create_model
    return CodeGenForCausalLM.from_pretrained(ckpt, revision='float16', torch_dtype=torch.float16, low_cpu_mem_usage=True)
  File "/data/home/leemgs/qtlab/CodeGen/.venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1526, in from_pretrained
    cls._load_state_dict_into_model_low_mem(model, loaded_state_dict_keys, resolved_archive_file)
  File "/data/home/leemgs/qtlab/CodeGen/.venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1786, in _load_state_dict_into_model_low_mem
    new_val = getattr(submodule, param_name)
  File "/data/home/leemgs/qtlab/CodeGen/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CodeGenAttention' object has no attribute 'causal_mask'
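
For what it's worth, one way to sanity-check the fine-tuned checkpoint without going through jaxformer.hf.sample is to load it directly with the same Transformers version used for training (a minimal sketch, assuming the ./codegen-350M-multi-finetuned directory above, a CUDA GPU, and transformers >= 4.21, which ships the CodeGen classes). If this generates code, the checkpoint itself is fine and the causal_mask error is the version mismatch described in the FIXED comment above:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "./codegen-350M-multi-finetuned"  # the fine-tuned output directory
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).to("cuda")
    model.eval()

    # The same "def hello_world():" prompt used above.
    inputs = tokenizer("def hello_world():", return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                             temperature=0.2, pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0], skip_special_tokens=True))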


leemgs commented on April 24, 2024

I would not expect the bigger models to get much better from being fine-tuned on a relatively small amount of code, but the smallest models (like 350M) might benefit from seeing your code.

@moyix, I have one question about the fine-tuned CodeGen model. With the 350M model, how can I compare the quality/accuracy of the original CodeGen model and the fine-tuned CodeGen model? I'm curious whether there are any well-known benchmarking tools or general methods for comparing the quality/accuracy of these two models.
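
One rough, hedged way to compare them is perplexity on a held-out set of your own code: if the fine-tuned model assigns your code a noticeably lower perplexity than the base model, the fine-tuning has at least adapted to it. For generation quality, pass@k on a benchmark such as OpenAI's HumanEval is the usual reference point for code models. A minimal sketch, assuming a hypothetical holdout.jsonl in the same {"text": ...} format as the training data:

    import math
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(ckpt, data_file, block_size=1024, max_examples=200):
        tok = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForCausalLM.from_pretrained(
            ckpt, torch_dtype=torch.float16).to("cuda").eval()
        ds = load_dataset("json", data_files=data_file, split="train")
        ds = ds.select(range(min(max_examples, len(ds))))
        losses = []
        with torch.no_grad():
            for row in ds:
                ids = tok(row["text"], return_tensors="pt", truncation=True,
                          max_length=block_size).input_ids.to("cuda")
                # Causal LM loss with labels == inputs is the average
                # per-token cross-entropy on this file.
                losses.append(model(ids, labels=ids).loss.item())
        return math.exp(sum(losses) / len(losses))

    # Compare the Hub base checkpoint against the fine-tuned output directory.
    print("base      :", perplexity("Salesforce/codegen-350M-multi", "holdout.jsonl"))
    print("fine-tuned:", perplexity("./codegen-350M-finetuned", "holdout.jsonl"))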
