
opendelta's Introduction

An Open-Source Framework for Parameter-Efficient Tuning (Delta Tuning).


Overview · Installation · Basic Usage · Docs · Performance


Overview

OpenDelta is a toolkit for parameter-efficient tuning methods (which we dub delta tuning): users can flexibly assign (or add) a small number of parameters to update while keeping most parameters frozen. Using OpenDelta, users can easily implement prefix-tuning, adapters, LoRA, or any other type of delta tuning with their preferred PTMs.

  • The latest version of OpenDelta is tested on Python==3.8.13, PyTorch==1.12.1, transformers==4.22.2. Other versions are likely to be supported as well. If you encounter bugs when using your own package versions, please raise an issue and we will look into it as soon as possible.

  • A demo of using OpenDelta to modify a PLM (e.g., BART), showing how the PLM changes under delta tuning.

News

  • 2022.10.25 Release v0.3.2. Support BMTrain! Improve docs. Add inspect utilities.
  • 2022.10.14 Release v0.3.0. We make the usage of the default configurations of each delta tuning method (i.e., the positions they are attached to) more friendly! If a custom model contains one of our supported models as a submodule, the default configuration is also available. Other key changes can be seen in the Update Log.
  • 2022.10.10 Merge a long-developed branch v0.2.4 into the master branch. Key updates are (1) an example unifying the delta tuning paradigm and the prompt-tuning paradigm; (2) support for Delta Center, whose webpage is still under construction. Details can be seen in the Update Log.
  • 2022.03.24 We noticed several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids and token_type_ids; we are fixing them! For now, please use the other methods, which are more stable and perform better.
  • 2022.03.20 Add a Colab example to illustrate efficient training and space-saving multitask-serving.
  • 2022.03.20 A new pip version released.
  • 2022.02.16 Support regular expression in named-based addressing.

Installation

  1. Create a virtualenv (optional)
conda create -n opendelta_env python=3.8
conda activate opendelta_env
  2. Install the latest version
pip install git+https://github.com/thunlp/OpenDelta.git

or install the latest pip version (more stable)

pip install opendelta

or build from source

git clone git@github.com:thunlp/OpenDelta.git
cd OpenDelta
python setup.py install
# python setup.py develop  # if you want to modify the code for your research

Must Try

The following code and comments walk you through the key functionality of OpenDelta. It is also available in must_try.py and in must_try.ipynb on Colab.

# use transformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example
inputs_ids = t5_tokenizer.encode("Is Harry Potter written by J.K. Rowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0]) 
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'


# use existing delta models
from opendelta import AutoDeltaModel, AutoDeltaConfig
# use existing delta models from DeltaCenter
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()


t5_tokenizer.decode(t5.generate(inputs_ids)[0]) 
# >>> <pad> Is Harry Potter written by J.K. Rowling?</s>


# Now save merely the delta models, not the whole backbone model, to .tmp/
delta.save_finetuned(".tmp")
import os; os.listdir(".tmp")
# >>>  The state dict size is 1.443 MB
# >>>  We encourage users to push their final and public models to delta center to share them with the community!


# reload the delta model from the local directory and add it to the pre-trained T5.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
delta1 = AutoDeltaModel.from_finetuned(".tmp", backbone_model=t5)
import shutil; shutil.rmtree(".tmp") # don't forget to remove the tmp files. 
t5_tokenizer.decode(t5.generate(inputs_ids)[0]) 
# >>> <pad> Is Harry Potter written by J.K. Rowling?</s>

# detach the delta models, the model returns to the unmodified status.
delta1.detach()
t5_tokenizer.decode(t5.generate(inputs_ids)[0])  
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'

# use the default configuration for customized wrapped models which have PLMs inside. This is a common need for users.
import torch.nn as nn
class WrappedModel(nn.Module):
  def __init__(self, inner_model):
    super().__init__()
    self.inner = inner_model
  def forward(self, *args, **kwargs):
    return self.inner(*args, **kwargs)

wrapped_model = WrappedModel(WrappedModel(t5))

# say we use LoRA
delta_config = AutoDeltaConfig.from_dict({"delta_type":"lora"})
delta2 = AutoDeltaModel.from_config(delta_config, backbone_model=wrapped_model)
delta2.log()
# >>> root
#       -- inner
#          -- inner
#             ...
#             ... lora_A:[8,1024], lora_B:[1024,8]
delta2.detach()

# use a non-default configuration
# say we add LoRA to the last four layers of the decoder of t5, with lora rank=5
delta_config3 = AutoDeltaConfig.from_dict({"delta_type":"lora", "modified_modules":["[r]decoder.*((20)|(21)|(22)|(23)).*DenseReluDense\.wi"], "lora_r":5})
delta3 = AutoDeltaModel.from_config(delta_config3, backbone_model=wrapped_model)
delta3.log()

Verified Default Configurations

  • You can try to use OpenDelta on any PyTorch-based backbone model.

  • However, there is a small chance that the interfaces of the backbone model's submodules are not supported. Therefore, we have verified some commonly used models that OpenDelta is sure to support.

  • We will keep testing more and more emerging models.

  • Pull requests are welcome when you successfully apply OpenDelta to your own backbone model.

Citation

@article{hu2023opendelta,
  title={OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models},
  author={Hu, Shengding and Ding, Ning and Zhao, Weilin and Lv, Xingtai and Zhang, Zhen and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2307.03084},
  year={2023}
}
@article{ding2022delta,
  title={Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models},
  author={Ding, Ning and Qin, Yujia and Yang, Guang and Wei, Fuchao and Yang, Zonghan and Su, Yusheng and Hu, Shengding and Chen, Yulin and Chan, Chi-Min and Chen, Weize and others},
  journal={arXiv preprint arXiv:2203.06904},
  year={2022}
}

opendelta's People

Contributors

achazwl, caffreyr, guang-yng, guspan-tanadi, hirasawaayui, maxpa1n, mxschmdt, namezhenzhang, ningding97, shengdinghu, telxt, xcjthu, zibuyu, zt-wang19


opendelta's Issues

Failed to load adapter layer using OpenDelta+BMTrain

Environment:

python=3.7, pytorch=1.13.0+cu117, bmtrain=0.2.1, model_center=1.0.0, opendelta=0.3.2

Description:

  • I am encountering an issue while loading a trained adapter layer using OpenDelta and BMTrain. I am trying to load saved parameters from disk, but the adapter layer is not getting loaded, and I am getting an "unexpected key" message from the load_state_dict() function.
  • The saved parameters include the adapter layer and the classifier layer. The classifier layer is loaded correctly, while the adapter layer is not.
  • The saved parameters and the same layer in the model are not the same, indicating that the model has not loaded the saved parameters correctly.

Code:

# load from huggingface
model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xxlarge", torch_dtype=torch.float16, num_labels=2)

# bmtrain wrapper
model = bmt.BMTrainModelWrapper(model)

# add adapter
delta_model = AdapterModel(backbone_model=model, modified_modules=['output'], backend='bmt')
delta_model.freeze_module(exclude=["deltas", "classifier"], set_state_dict=True)

# load adapter weights
model.load_state_dict(torch.load(args.model_path), strict=False)
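
A quick way to see exactly which keys failed to match (plain PyTorch behavior, not an OpenDelta-specific API) is to inspect the return value of load_state_dict:

# load_state_dict(strict=False) returns the lists of missing and unexpected keys
result = model.load_state_dict(torch.load(args.model_path), strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)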

Can OpenDelta execute on CUDA?

When I add .to('cuda') to my backbone T5, there is a bug as follows:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

And I checked the source code in opendelta.delta_models.soft_prompt; some variables are not sent to 'cuda', for example:
soft_embeds.data = torch.clone(self.raw_embedding(torch.tensor([i for i in range(self.num_tokens)])))

Although I have solved this problem, I want to know: is it a bug in OpenDelta, or my misunderstanding?
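
For reference, a device-safe version of the quoted line might look like the following (a sketch of the assumed fix, not necessarily the actual OpenDelta patch):

# Build the index tensor on the same device as the embedding weights,
# so the lookup also works after the backbone is moved to CUDA.
device = self.raw_embedding.weight.device
soft_embeds.data = torch.clone(
    self.raw_embedding(torch.arange(self.num_tokens, device=device))
)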

Differences between Houlsby and Pfeiffer adapters

Thanks for providing such a great work here! There are structural differences between Houlsby and Pfeiffer adapters (Houlsby et al. place two adapters sequentially within one transformer layer, one after the multi-head attention and one after the FFN sub-layer, while the Pfeiffer et al. adapter is inserted only after the FFN "add & layer norm" sub-layer), which seem to be missing in the code.
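
A minimal sketch of the structural difference described above (an illustrative layout, not OpenDelta's internal implementation):

import torch.nn as nn

class Adapter(nn.Module):
    # Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    def __init__(self, d_model, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

# Houlsby et al.: two adapters per transformer layer,
#   h = norm(h + adapter_attn(self_attention(h)))
#   h = norm(h + adapter_ffn(ffn(h)))
# Pfeiffer et al.: a single adapter per layer, applied only after the
# FFN "add & layer norm" sub-layer:
#   h = norm(h + self_attention(h))
#   h = adapter(norm(h + ffn(h)))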

save issue

expected behavior

Configuration saved in test_delta_model/config.json

Model weights saved in test_delta_model/pytorch_model.bin

Model weights push to hub

situation description:

Thanks for the impressive work! However, although I followed the documentation, I still cannot save the entire model after training.

https://opendelta.readthedocs.io/en/latest/notes/saveload.html#saveload

reproduce the situation:

packages:

gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

Python 3.8.0

opendelta 0.2.4

torch 1.12.0+cu113

transformers 4.21.1

code:

Inherited from:

https://github.com/thunlp/OpenDelta/blob/main/examples/examples_seq2seq/run_seq2seq.py

original code:

   repo_name = create_hub_repo_name(root="DeltaHub",
                        dataset=data_args.task_name,
                        delta_type = delta_args.delta_type,
                        model_name_or_path= model_args.model_name_or_path)
   results['repo_name'] = repo_name
   if training_args.push_to_hub: # TODO add description here
       delta_model.save_finetuned(push_to_hub=True, save_directory=repo_name, use_auth_token=True)
       # trainer.push_to_hub(**kwargs)
   else:
       delta_model.save_finetuned(push_to_hub=False, save_directory=repo_name, use_auth_token=True)

replaced with:

    delta_model.save_finetuned("test_delta_model")

it would display the following:

Traceback (most recent call last):
  File "/sharefs/healthshare/fujie/jiajunzhu/repo/full_supervisetry/OpenDelta/examples/examples_seq2seq/wino.py", line 474, in <module>
    result = main()
  File "/sharefs/healthshare/fujie/jiajunzhu/repo/full_supervisetry/OpenDelta/examples/examples_seq2seq/wino.py", line 467, in main
    delta_model.save_finetuned("jjztest_delta_model")
  File "/home/fujie/miniconda3/envs/opendel/lib/python3.8/site-packages/opendelta/utils/saving_loading_utils.py", line 149, in save_finetuned
    final_center_args = self.create_delta_center_args(center_args=center_args,
  File "/home/fujie/miniconda3/envs/opendel/lib/python3.8/site-packages/opendelta/utils/saving_loading_utils.py", line 367, in create_delta_center_args
    mdict['name'] = self.create_default_name(**mdict)
  File "/home/fujie/miniconda3/envs/opendel/lib/python3.8/site-packages/opendelta/utils/saving_loading_utils.py", line 389, in create_default_name
    reponame += kwargs["model_path_public"].split("/")[-1]+"_" if kwargs['model_path_public'] is not None else kwargs['backbone_model']
KeyError: 'model_path_public'

and if I modify the code as

delta_model.save_finetuned("test_delta_model", push_to_hub = True)

it would display

Traceback (most recent call last):
  File "/sharefs/healthshare/fujie/jiajunzhu/repo/full_supervisetry/OpenDelta/examples/examples_seq2seq/wino.py", line 474, in <module>
    result = main()
  File "/sharefs/healthshare/fujie/jiajunzhu/repo/full_supervisetry/OpenDelta/examples/examples_seq2seq/wino.py", line 467, in main
    delta_model.save_finetuned("test_delta_model", push_to_hub = True)
TypeError: save_finetuned() got an unexpected keyword argument 'push_to_hub'

LICENSE

Dear authors,

Thanks for the useful toolkit.

Would you mind specifying the LICENSE of this toolkit? Thanks!

colab -- opendelta_must_try.ipynb, HTTP Error 502: Bad Gateway

HTTPError                                 Traceback (most recent call last)
[<ipython-input-3-329ab0880ca3>](https://localhost:8080/#) in <cell line: 14>()
     12 from opendelta import AutoDeltaModel, AutoDeltaConfig
     13 # use existing delta models from DeltaCenter
---> 14 delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
     15 # freeze the whole backbone model except the delta models.
     16 delta.freeze_module()

10 frames
[/usr/lib/python3.9/urllib/request.py](https://localhost:8080/#) in http_error_default(self, req, fp, code, msg, hdrs)
    639 class HTTPDefaultErrorHandler(BaseHandler):
    640     def http_error_default(self, req, fp, code, msg, hdrs):
--> 641         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    642 
    643 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 502: Bad Gateway

Is it possible to extract the Visualization module as an independent python packages?

Visualization(model).structure_graph() is especially useful for viewing large language models, and sometimes I would like to use it in other scenarios.

So instead of installing the whole OpenDelta package, is it possible to isolate the Visualization functionality from OpenDelta, so that it becomes more lightweight and easier to install?

`ModuleNotFoundError` caused by `turtle` package

Bug

Attempting to import opendelta results in an error caused by a module not found failure stemming from the turtle graphics package. Is it possible that the following line was unintended, as it is unused elsewhere in the project?

from turtle import back

Traceback

>>> import opendelta
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".env/lib/python3.8/site-packages/opendelta/__init__.py", line 14, in <module>
    from .basemodel import DeltaBase
  File ".env/lib/python3.8/site-packages/opendelta/basemodel.py", line 6, in <module>
    from turtle import back
  File "/usr/lib/python3.8/turtle.py", line 107, in <module>
    import tkinter as TK
ModuleNotFoundError: No module named 'tkinter'

`sequential` parameter is not used in `AdapterModel`

Hi,

Thanks for the awesome tool!
I noticed that the sequential: Optional[str]=True parameter of the AdapterModel is not used,
so the user cannot actually insert the adapter in a parallel manner for the AdapterModel class by setting sequential=False.
I think it's a little bit confusing for the user.
Maybe you can add an insert_parallel_module() function to the AdapterModel class,
or simply not let the user set the sequential parameter when initializing the AdapterModel class.
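
For clarity, the behavioral difference that the sequential flag presumably intends (a conceptual sketch, not OpenDelta internals):

# sequential insertion: the adapter transforms the module's output
#   out = adapter(module(x))
# parallel insertion: the adapter branches off the module's input
#   out = module(x) + adapter(x)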

Fix for a Bug in BitFit for T5

Hi, I was trying to run BitFit for T5 and was getting some errors while initializing the bias parameters. It seems that you register the bias parameter on the linear class after it has already been initialized, which leads to an error. The error happened here; moving L185 to the place of L179 fixes the bug.

Thanks,
Prateek

Soft prompt for BERT

Hi, I was training soft prompt for BERT and met the following error:

  File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/delta_models/soft_prompt.py", line 90, in pre_forward
    inputs_embeds = self.raw_embedding(input_ids)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_text_classification.py", line 652, in <module>
    main()
  File "train_text_classification.py", line 561, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 1332, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 1891, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 1923, in compute_loss
    outputs = model(**inputs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/utils/decorate.py", line 47, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/basemodel.py", line 502, in _caller
    args, kwargs = delta_module.pre_forward(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/delta_models/soft_prompt.py", line 92, in pre_forward
    raise RuntimeError("neither inputs_embeds nor input_ids is specified.")
RuntimeError: neither inputs_embeds nor input_ids is specified.

I used the example configs and example_text_classification/run_glue.py to launch the job.
Could you help me with this? Thank you!

Where to find the trained delta checkpoints?

Hi @telxt, @ShengdingHu, @WuNein, thanks for creating the amazing repository. I was trying to perform inference on some of the results from the papers and was trying to locate the fine-tuned delta checkpoints for various methods. In the documentation, I saw a link indicating that multiple models are supported, but I was unable to find the Delta Center from which I can load these finetuned delta checkpoints into the model and perform inference.

Would it be possible for you to point me to these delta checkpoints?

Thanks,
Prateek

Prefix tuning for T5-small

Hi, I met an error when using Prefix tuning with T5-small.

File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/basemodel.py", line 502, in _caller
    args, kwargs = delta_module.pre_forward(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/delta_models/prefix.py", line 68, in pre_forward
    kwargs['past_key_value'] = (expand_batchsize(past_key), expand_batchsize(past_value))
  File "/home/user/anaconda3/lib/python3.7/site-packages/OpenDelta/opendelta/delta_models/prefix.py", line 60, in expand_batchsize
    x = x.reshape(self.prefix_token_num, self.num_heads, -1).transpose(0,1)
RuntimeError: shape '[6, 6, -1]' is invalid for input of size 2048

In T5-small, with 6 heads, it looks impossible to evenly divide 2048 by 6, no matter what num_prefix_token is.

def expand_batchsize(x):
    x = x.reshape(self.prefix_token_num, self.num_heads, -1).transpose(0, 1)
    x = x.unsqueeze(0).expand(batch_size, *x.shape)
    return x

Could you help me with this? Thank you!

May I request for the trainable codes for the paper section 5.1?

Hello, thank you for creating this repo! It is very useful and the API is so neat :D

I read the paper, and I'd love to reproduce several results in section 5.1.
However, it seems that the examples directory only contains the GLUE and SuperGLUE configs and source/target/metric information (in examples/examples_prompt/configs and examples/examples_prompt/data_processors/tasks.py, respectively).
According to the paper, you searched for the best hyper-parameters for each task, and I'd like to know the configuration. I also think templates_text will make a difference in the output, so I opened this issue. (Also, there are 100+ tasks, and I'm not sure I could search for the hparams / create all the templates ;-D)

If it is OK, may I request the config / task code that you used to create the numbers in section 5.1? (Another branch, a zip file, ..any method that is convenient for you is very much appreciated!)

RuntimeError: This is a delta model, which should be attached to a backbone model and can't forward any data by itself. Please using the backbone model's forward function after attach the delta model to the backbone. / The batch received was empty, your model won't be able to train on it. Double-check that your training dataset contains keys expected by the model: args,kwargs,label_ids,label.

I used a BERT model to train on the RAFT datasets; the original model worked well. But when I tried to add a LowRankAdapterModel to finetune, it went wrong. I just simply applied the code in this. @ShengdingHu

#!/usr/bin/env python
# coding: utf-8

# In[1]:


import datasets

datasets.logging.set_verbosity_error()


# In[2]:


from datasets import get_dataset_config_names

RAFT_TASKS = get_dataset_config_names("ought/raft")
RAFT_TASKS


# In[3]:


from datasets import load_dataset

TASK = "ade_corpus_v2"
raft_dataset = load_dataset("ought/raft", name=TASK)
raft_dataset


# In[4]:


from transformers import AutoTokenizer,Seq2SeqTrainingArguments, TrainerCallback
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

from sklearn.model_selection import train_test_split
X = raft_dataset["train"]['Sentence']
y = raft_dataset["train"]['Label']

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=512)
X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=512)


# In[5]:


# X_train_tokenized


# In[19]:


item={}
for key, val in X_train_tokenized.items():
    if key == 'input_ids':
        item['label_ids']=torch.tensor(val[idx])
    else:
        item[key]=torch.tensor(val[idx])
        
item
        


# In[6]:


import torch
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
#         item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item={}
        for key, val in self.encodings.items():
            if key == 'input_ids':
                item['label_ids']=torch.tensor(val[idx])
            else:
                item[key]=torch.tensor(val[idx])
        if self.labels:
            item["label"] = torch.tensor(self.labels[idx]-1)
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])

train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)


# In[7]:


train_dataset[0]


# In[8]:


from transformers import TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification,EarlyStoppingCallback

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)


# In[9]:


from opendelta import Visualization
Visualization(model).structure_graph();


# In[13]:


from opendelta import LowRankAdapterModel
delta_model1 = LowRankAdapterModel(backbone_model=model, modified_modules=['LayerNorm'])
# delta_model1.freeze_module(set_state_dict = True)
delta_model1.log(delta_ratio=True, trainable_ratio=True, visualization=True)

from opendelta import LoraModel
delta_model2 = LoraModel(backbone_model=model, modified_modules=['dense'])
# delta_model2.freeze_module(set_state_dict = True)
delta_model2.log(delta_ratio=True, trainable_ratio=True, visualization=True)

from opendelta import CompacterModel
delta_model3 = CompacterModel(backbone_model=model, modified_modules=['dense'])
# delta_model3.freeze_module(set_state_dict = True)
delta_model3.log(delta_ratio=True, trainable_ratio=True, visualization=True)
# In[14]:


import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def compute_metrics(p):
    pred, labels = p
    pred = np.argmax(pred, axis=1)

    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred)
    precision = precision_score(y_true=labels, y_pred=pred)
    f1 = f1_score(y_true=labels, y_pred=pred)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Define Trainer
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    eval_steps=500,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    seed=0,
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=delta_model1,
#     model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

# Train pre-trained model
trainer.train()


# TrainOutput(global_step=15, training_loss=0.5652575810750325, metrics={'train_runtime': 11.1754, 'train_samples_per_second': 10.738, 'train_steps_per_second': 1.342, 'total_flos': 4563332366400.0, 'train_loss': 0.5652575810750325, 'epoch': 3.0})


RuntimeError: This is a delta model, which should be attached to a backbone model and can't forward any data by itself. Please using the backbone model's forward function after attach the delta model to the backbone.

BitFit for GPT-2 Models

Are there results on what the best finetuning scheme for GPT-style (autoregressive) models is? I couldn't find it in the Delta Tuning paper... Does BitFit finetuning perform well for GPT-2 -- are there any public benchmarks that show its performance?

LowRankAdapter not working with Bert models

OK, I am trying to use LowRankAdapterModel with bert-base-uncased and bert-large-uncased, and I am getting the following error. Please look into it.


KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
      1 from opendelta import LowRankAdapterModel
----> 2 delta_model1 = LowRankAdapterModel(backbone_model=model)
      3 delta_model1.freeze_module(set_state_dict = True)
      4 delta_model1.log(delta_ratio=True, trainable_ratio=True, visualization=True)

5 frames
/usr/local/lib/python3.7/dist-packages/opendelta/delta_models/low_rank_adapter.py in __init__(self, backbone_model, reduction_factor, non_linearity, low_rank_w_init, low_rank_rank, modified_modules, exclude_modules, unfrozen_modules, common_structure, interactive_modify)
167 unfrozen_modules=unfrozen_modules,
168 common_structure=common_structure,
--> 169 interactive_modify=interactive_modify,
170 )
171 arg_names = get_arg_names_inside_func(self.__init__)

/usr/local/lib/python3.7/dist-packages/opendelta/basemodel.py in __init__(self, backbone_model, modified_modules, exclude_modules, unfrozen_modules, interactive_modify, common_structure)
130 self.common_structure = common_structure
131 if self.common_structure:
--> 132 self.structure_mapping = CommonStructureMap.load(self.backbone_model)
133 else:
134 self.structure_mapping = None

/usr/local/lib/python3.7/dist-packages/opendelta/utils/structure_mapping.py in load(cls, backbone_model, strict, warining, visualize)
317 if backbone_class not in cls.Mappings:
318 raise KeyError(backbone_class)
--> 319 mapping = cls.Mappings[backbone_class]
320 if visualize:
321 logger.info("Since you are using the common structure mapping, draw the transformed parameter structure for checking.")

/usr/local/lib/python3.7/dist-packages/opendelta/utils/structure_mapping.py in __getitem__(self, key)
279 raise KeyError(key)
280 value = self._mapping_string[key]
--> 281 self._mapping[key] = eval(value)
282 return self._mapping[key]
283

/usr/local/lib/python3.7/dist-packages/opendelta/utils/structure_mapping.py in ()

/usr/local/lib/python3.7/dist-packages/opendelta/utils/structure_mapping.py in mapping_for_SequenceClassification(mapping, type)
252 }
253 elif type == "bert":
--> 254 mapping.pop("lm_head")
255 mapping["classifier"] = {"name": "classifier"}
256 elif type == "deberta":

KeyError: 'lm_head'

This is how model is defined

config = AutoConfig.from_pretrained(
    "bert-base-uncased",
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
config.dropout_rate = 0.0
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",
    cache_dir=model_args.cache_dir,
    use_fast=model_args.use_fast_tokenizer,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
model.resize_token_embeddings(len(tokenizer))
model.resize_token_embeddings(len(tokenizer))

Could you provide some example code, like the OpenPrompt project does?

There are some things I don't quite understand in usage; I hope some detailed reference code could be provided, thanks!

The problem I currently run into: when using PrefixModel, the reparams parameters are different between specifying modified_modules=["0.layer.0"] and not passing the modified_modules argument. Am I using it the wrong way?

With modified_modules=["0.layer.0"]: reparams.control_trans.2: weight:[3072, 512] bias:[3072], and model.generate raises an error: The size of tensor a (2) must match the size of tensor b (12) at non-singleton dimension 3

Without this argument: reparams.control_trans.2: weight:[36864, 512] bias:[36864], and generate works fine.

The model is T5.

XGLM: How to apply OpenDelta to a new model?

Hi,

Thanks for providing this useful library for delta tuning!

How can we apply OpenDelta to a new model, such as facebook/xglm-564M? Its architecture looks like:

root
├── model (XGLMModel)
│   ├── embed_tokens (Embedding) weight:[256008, 1024]
│   ├── embed_positions (XGLMSinusoidalPositionalEmbedding) weights:[2050, 1024]
│   ├── layers (ModuleList)
│   │   └── 0-23(XGLMDecoderLayer)
│   │       ├── self_attn (XGLMAttention)
│   │       │   └── k_proj,v_proj,q_proj,out_proj(Linear) weight:[1024, 1024] bias:[1024]
│   │       ├── self_attn_layer_norm,final_layer_norm(LayerNorm) weight:[1024] bias:[1024]
│   │       ├── fc1 (Linear) weight:[4096, 1024] bias:[4096]
│   │       └── fc2 (Linear) weight:[1024, 4096] bias:[1024]
│   └── layer_norm (LayerNorm) weight:[1024] bias:[1024]
└── lm_head (Linear) weight:[256008, 1024]

To reproduce

from opendelta import LoraModel
from transformers import XGLMForCausalLM

backbone_model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")
delta_model = LoraModel(backbone_model)
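
A possible workaround (assumed usage, following the modified_modules patterns shown elsewhere in this README, with module names taken from the structure graph above) is to name the target submodules explicitly rather than relying on the common structure mapping:

delta_model = LoraModel(
    backbone_model=backbone_model,
    modified_modules=["self_attn.q_proj", "self_attn.v_proj"],
)
delta_model.freeze_module(exclude=["deltas"])
delta_model.log()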

Prefix tuning for BERT

Hi, I met an error when training Prefix tuning on BERT. Looks like Prefix tuning does not support BERT now.

Code: example_text_classification/run_glue.py
PLM: google/bert_uncased_L-4_H-512_A-8
Error message:

opendelta/delta_models/prefix.py, line 553, in new_module_like
    raise NotImplementedError(type(module))
NotImplementedError: <class 'transformers.models.bert.modeling_bert.BertAttention'>

The error is from the following function:

def new_module_like(self, module):
        # TODO: support more Attention modules

        if isinstance(module, T5Attention) or isinstance(module, T5LayerSelfAttention): 
            if isinstance(module, T5LayerSelfAttention):
                module = module.SelfAttention # innermodule
            module_device = get_device(module)
            prefixlayer = PrefixLayerT5(prefix_token_num=self.prefix_token_num, num_heads=module.n_heads ,device=module_device)
        elif isinstance(module, MultiHeadSelfAttention):  # MultiHeadSelfAttention didn't provide past_key_value in the interface of the forward function.
            module_device = get_device(module)
            prefixlayer = PrefixLayerDistilBert(prefix_token_num=self.prefix_token_num, device=module_device)
            self.insert_sequential_module(getattr(module, "k_lin"), pre_caller=prefixlayer.key_pre_forward, post_caller=prefixlayer.key_forward)
            self.insert_sequential_module(getattr(module, "v_lin"), pre_caller=prefixlayer.value_pre_forward, post_caller=prefixlayer.value_forward)
        elif isinstance(module, BertSelfAttention):
            raise NotImplementedError
        elif isinstance(module, RobertaAttention):
            module_device = get_device(module)
            prefixlayer = PrefixLayerRoberta(prefix_token_num=self.prefix_token_num, num_heads=module.self.num_attention_heads,device=module_device)
        elif isinstance(module, GPT2Attention):
            module_device = get_device(module)
            prefixlayer = PrefixLayerGPT2(prefix_token_num=self.prefix_token_num, num_heads=module.num_heads ,device=module_device)
        elif isinstance(module, BartAttention):
            module_device = get_device(module)
            prefixlayer = PrefixLayerBart(prefix_token_num=self.prefix_token_num, num_heads=module.num_heads ,device=module_device)
        else:
            raise NotImplementedError(type(module))
        return prefixlayer, module

Could you help me with this? Thank you!

How to perform inference on already trained checkpoints.

Hi @telxt, @ShengdingHu, @WuNein, thank you for creating this useful repository. I am trying to understand how to use this repo for some use cases.

I see that a lot of the trained checkpoints are available on the hugging face hub. I wanted to use these checkpoints to perform inference on the respective datasets to obtain the val/test set performances of these trained models. I am sure there is a way to run inference for these models available on the hub but I am not able to figure it out. Would it be possible for you to share an example command on how this can be done?

Thanks,
Prateek

Import error

from opendelta import AutoDeltaModel, AutoDeltaConfig
/root/anaconda3/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (2.1.1)/charset_normalizer (2.1.1) doesn't match a supported version!
warnings.warn(
Traceback (most recent call last):
File "", line 1, in
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/OpenDelta/opendelta/init.py", line 13, in
from .utils.saving_loading_utils import SaveLoadMixin
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/OpenDelta/opendelta/utils/saving_loading_utils.py", line 9, in
from DeltaCenter import OssClient
File "/root/anaconda3/lib/python3.8/site-packages/delta_center_client-0.0.4-py3.8.egg/DeltaCenter/init.py", line 3, in
from .client import Client, help
File "/root/anaconda3/lib/python3.8/site-packages/delta_center_client-0.0.4-py3.8.egg/DeltaCenter/client.py", line 5, in
import oss2
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 655, in _load_unlocked
File "", line 618, in _load_backward_compatible
File "", line 259, in load_module
File "/root/anaconda3/lib/python3.8/site-packages/oss2-2.15.0-py3.8.egg/oss2/init.py", line 3, in
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 655, in _load_unlocked
File "", line 618, in _load_backward_compatible
File "", line 259, in load_module
File "/root/anaconda3/lib/python3.8/site-packages/oss2-2.15.0-py3.8.egg/oss2/models.py", line 9, in
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 655, in _load_unlocked
File "", line 618, in _load_backward_compatible
File "", line 259, in load_module
File "/root/anaconda3/lib/python3.8/site-packages/oss2-2.15.0-py3.8.egg/oss2/utils.py", line 32, in
File "/root/anaconda3/lib/python3.8/site-packages/Crypto/Cipher/init.py", line 7, in
from Crypto.Cipher._mode_ctr import _create_ctr_cipher
File "/root/anaconda3/lib/python3.8/site-packages/Crypto/Cipher/_mode_ctr.py", line 35, in
from Crypto.Util.number import long_to_bytes
File "/root/anaconda3/lib/python3.8/site-packages/Crypto/Util/number.py", line 387
s = pack('>I', n & 0xffffffffL) + s
^
SyntaxError: invalid syntax

Flash Attention and Open Delta LoRA

Hello @ShengdingHu,

Are you able to confirm whether Flash Attention will be compatible with Open Delta LoRA?

For example:

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")
tokenizer.pad_token = tokenizer.mask_token

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1.4b")

max_positions = model_args.max_positions
tokenizer.model_max_length = max_positions
for layer in model.gpt_neox.layers:
    original_emb = layer.attention.rotary_emb
    layer.attention.rotary_emb = RotaryEmbedding(layer.attention.rotary_ndims,max_positions,10000)
    layer.attention.bias = torch.tril(torch.ones((max_positions, max_positions), dtype=torch.uint8)).view(
                1, 1, max_positions, max_positions
            )
    layer.attention = FlashAttentionWrapper(layer.attention, max_seqlen = max_positions)

# patching for the random contiguous tensors bug
for p in model.parameters():
    p = p.contiguous()

Visualization(model).structure_graph()

delta_model1 = LoraModel(
    backbone_model=model, 
    modified_modules=[
        'attention.attention.query_key_value',
        'mlp.dense_h_to_4h',
    ]
)
delta_model1.freeze_module()
delta_model1.log(delta_ratio=True, trainable_ratio=True, visualization=True)


Thank you for your great work,

Enrico

How to implement prefix tuning with BartForConditionalGeneration?

Thank you for the awesome work. Currently, I am trying to implement prefix-tuning experiments with BART. The original code provided by the author is a total mess.
Then I found your work here. However, I cannot find enough docs on the usage. For example, I don't know how to run an experiment with the PrefixModel you provide. And I checked the source code, but I haven't figured out how it works.
Could you please give me more information about that?
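
A minimal sketch of what attaching a PrefixModel to BART might look like (assumed usage, mirroring the patterns shown elsewhere in this README):

from transformers import BartForConditionalGeneration
from opendelta import PrefixModel

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
delta_model = PrefixModel(backbone_model=bart)
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
delta_model.log()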

How to change the size of the dataset?

Hi @ShengdingHu, thanks for sharing the code. May I ask how to change the size of the dataset? It seems that COPA has 400 train examples, 50 val examples, and 50 test examples. I want to change the size a little bit, but changing "max_train_samples", "max_val_samples", and "max_test_samples" does not seem to work.

Error when running pip install opendelta

From the error message, it looks like setup.py needs to be updated to use "scikit-learn" instead of "sklearn".

error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.

  Here is how to fix this error in the main use cases:
  - use 'pip install scikit-learn' rather than 'pip install sklearn'
  - replace 'sklearn' by 'scikit-learn' in your pip requirements files
    (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
  - if the 'sklearn' package is used by one of your dependencies,
    it would be great if you take some time to track which package uses
    'sklearn' instead of 'scikit-learn' and report it to their issue tracker
  - as a last resort, set the environment variable
    SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
  
  More information is available at
  https://github.com/scikit-learn/sklearn-pypi-package
  
  If the previous advice does not cover your use case, feel free to report it at
  https://github.com/scikit-learn/sklearn-pypi-package/issues/new
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Integration with Huggingface Trainer

Hi,

I wonder what's the correct way to integrate this library into Huggingface Trainer?

Does it work out of the box? Or do we need to explicitly save the delta parameters, for example by implementing a model-saving callback after each evaluation?
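
One possible approach (a sketch, not an official OpenDelta integration) is a Huggingface TrainerCallback that saves merely the delta weights after each evaluation, where delta_model is the attached OpenDelta object:

from transformers import TrainerCallback

class SaveDeltaCallback(TrainerCallback):
    def __init__(self, delta_model):
        self.delta_model = delta_model

    def on_evaluate(self, args, state, control, **kwargs):
        # save only the delta parameters, as with save_finetuned above
        self.delta_model.save_finetuned(f"{args.output_dir}/delta-step{state.global_step}")

Pass callbacks=[SaveDeltaCallback(delta_model)] when constructing the Trainer.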

`index.html` is not included in the package if installing from PyPI

Thanks for the excellent package.

Problem

The index.html file in opendelta/utils/interactive/templates/ is a static file, and it will not be included in the distributed package file (like the wheel file) unless you add the package data manually in setup.py, as sketched after the reproduction steps below.

Reproduce

On a clean environment,

$ pip install opendelta
$ python examples/tutorial/0_interactive.py
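
A sketch of the likely fix in setup.py (the exact keys here are an assumption): declare the template as package data so that built wheels include it.

from setuptools import setup, find_packages

setup(
    name="opendelta",
    packages=find_packages(),
    include_package_data=True,
    # ship the static HTML template inside the wheel
    package_data={"opendelta": ["utils/interactive/templates/*.html"]},
)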

compatibility with pytorch

Hi, here is another problem. I use OpenDelta and PyTorch Lightning to fine-tune my model using LoRA. But when I try to load the checkpoint, it seems wrong: there are state keys missing. Apparently, it did not save the LoRA weights.
@ShengdingHu


def opendelta_modify_with_lora(transformer, config):
    # pass
    LoraModel(backbone_model=transformer, modified_modules=['[r](\d).SelfAttention.[q,v,o,k]'])
    LoraModel(backbone_model=transformer, modified_modules=['[r](\d).EncDecAttention.[q,v,o,k]'])
    delta_model = LoraModel(backbone_model=transformer, modified_modules=['[r](\d).DenseReluDense.w[o,i]'])

    delta_model.freeze_module(exclude=["layer_norm", "lora_A", "lora_B"])
    # delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
    # Visualization(transformer).structure_graph();
    return transformer

class EncoderDecoder(LightningModule):
    """
    Encoder Decoder
    """

    def __init__(self, config, tokenizer, transformer, dataset_reader):
        """
        :param config
        """
        super().__init__()
        self.config = config
        self.tokenizer = tokenizer
        self.model = transformer
        self.dataset_reader = dataset_reader

        self.use_deepspeed = self.config.compute_strategy.startswith("deepspeed")
        self.use_ddp = self.config.compute_strategy.startswith("ddp")
        self.load_model()

        self._last_global_step_saved = -1

        if self.config.fishmask_mode is not None:
            fishmask_plugin_on_init(self)

model = EncoderDecoder.load_from_checkpoint("my file path")


does opendelta support gradient_checkpointing?

Thank you for the awesome work.
I met some problems when using OpenDelta with gradient_checkpointing; it just throws:
"RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn"
By the way, the code works well when gradient_checkpointing is disabled.

So, does OpenDelta support gradient_checkpointing?
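
A workaround that often helps when combining a mostly frozen backbone with gradient checkpointing (an assumption that it applies here, not a confirmed OpenDelta fix) is to make the embedding outputs require grad, so that each checkpointed block sees at least one input with requires_grad=True:

# enable_input_require_grads is a standard transformers utility on PreTrainedModel
model.enable_input_require_grads()
model.gradient_checkpointing_enable()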

some bug about reparameterize flag in PrefixModel

When I turn off the reparameterize flag during training, I get an error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I found that the following code snippet in PrefixLayerT5 led to the above error (other classes have the same problem); accessing .data detaches the tensor from the autograd graph, so no gradient can flow into the prefix parameters:

past_key = self.past_key.data

and
past_value = self.past_value.data

To solve my problem, they should be:
past_key = self.past_key
and
past_value = self.past_value

tutorial doc bug

Hi, I noticed that there are some bugs in the BMTrain tutorial file; would you mind fixing them in the future?

argument bug

returns: 2_with_bmtrain.py: error: unrecognized arguments: --delta_type low_rank_adapter

delta model visualization bug

returns:

File "./2_with_bmtrain.py", line 132, in get_model
od.Visualization(model).structure_graph()
AttributeError: module 'opendelta' has no attribute 'Visualization'

In order to reproduce it, I worked with OpenDelta 0.3.2.

lowrank adapter with bert

When using BERT with the low-rank adapter, it returns:

AttributeError: str(forward() got an unexpected keyword argument 'output_pooler_output')
        The LowRankAdapterModel requires a dummy_inputs to be passed through the model to understand the dimensionality of each tensor in the computation graph. 
         The BertModel Class has no dummy_inputs, and automatically created dummy_inputs failed.
         Refer to `https://opendelta.readthedocs.io/en/latest/notes/faq.html` for detail.

lora with bert

Traceback (most recent call last):
  File "./2_with_bmtrain.py", line 371, in <module>
    main()
  File "./2_with_bmtrain.py", line 360, in main
    tokenizer, model, optimizer, lr_scheduler = setup_model_and_optimizer(args)
  File "./2_with_bmtrain.py", line 204, in setup_model_and_optimizer
    model = get_model(args)
  File "./2_with_bmtrain.py", line 135, in get_model
    delta_model = LoraModel(backbone_model=model, modified_modules=['project_q', 'project_k'], backend='bmt')
  File "/root/miniconda3/lib/python3.8/site-packages/opendelta/delta_models/lora.py", line 136, in __init__
    self.add_all_delta_to_backbone(self.backbone_model,
  File "/root/miniconda3/lib/python3.8/site-packages/opendelta/basemodel.py", line 213, in add_all_delta_to_backbone
    self.update_module(backbone, key)
  File "/root/miniconda3/lib/python3.8/site-packages/opendelta/delta_models/lora.py", line 143, in update_module
    parallel_module = self.new_module_like(child_module=child_ref)
  File "/root/miniconda3/lib/python3.8/site-packages/opendelta/delta_models/lora.py", line 151, in new_module_like
    in_features, out_features = child_module.in_features, child_module.out_features
  File "/root/miniconda3/lib/python3.8/site-packages/bmtrain-0.1.8-py3.8-linux-x86_64.egg/bmtrain/layer.py", line 12, in __getattr__
    ret = super().__getattr__(name)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Linear' object has no attribute 'in_features'

incorrect installation commands

root@container-bb51118b3c-1fa0644b:~/OpenDelta/examples/tutorial# pip install git@github.com:OpenBMB/ModelCenter.git
ERROR: Invalid requirement: 'git@github.com:OpenBMB/ModelCenter.git'
Hint: It looks like a path. File 'git@github.com:OpenBMB/ModelCenter.git' does not exist.

Thanks for your contribution to the open-source community; if you get some time in the future, it would be great to update the tutorial.
With regards, Jiajun

error when running on multiple cards

When running the program with the adapter inserted on multiple cards,
from opendelta import AdapterModel
delta_model = AdapterModel(backbone_model=model, modified_modules=['fc2'], bottleneck_dim=12)

the following error occurs:
RuntimeError: Caught RuntimeError in replica 0 on device 0.
and when moving to a single card, no more errors are reported.
