flagai-open / flagai
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
License: Apache License 2.0
I want to know if OPT is supported on M1/M1Pro/M1Max?
trainer.train(model=model, train_dataset=dataset, collate_fn=cifar10_collate_fn)
After training finishes, how do I export the model for inference?
Hi, thanks for your great work on AltCLIP.
In the paper, in the first stage of AltCLIP training, the teacher's text embedding is taken from the original CLIP text encoder at the [TOS] token. Does this [TOS] token correspond to the position selected in OpenAI's CLIP: https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/clip/model.py#L354
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)]
However, Figure 1 in the paper uses the [EOS] token. Are [TOS] and [EOS] the same token in OpenAI's implementation?
Also, does the student model XLM-R use the [CLS] token as its text embedding when computing the MSE loss between teacher and student?
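As a reading aid for the indexing expression quoted above: in OpenAI's CLIP tokenizer the end-of-text token has the largest id in the vocabulary, so an argmax over each row of token ids returns the position of that token, and the expression picks the hidden state there. The plain-Python sketch below mirrors that selection without torch. The function name `pool_at_argmax` and the toy values are illustrative assumptions, not AltCLIP or CLIP code.

```python
def pool_at_argmax(hidden_states, token_ids):
    """For each sequence, return the hidden state at the position of the largest token id.

    Mirrors x[torch.arange(x.shape[0]), text.argmax(dim=-1)] on nested lists.
    """
    pooled = []
    for states, ids in zip(hidden_states, token_ids):
        # Position of the maximum id; ties resolve to the first occurrence.
        position = max(range(len(ids)), key=lambda i: ids[i])
        pooled.append(states[position])
    return pooled


# Toy batch: 49406 / 49407 play the role of start / end-of-text ids, 0 is padding.
token_ids = [
    [49406, 320, 1125, 49407, 0, 0],
    [49406, 2368, 49407, 0, 0, 0],
]
# One made-up 2-dimensional hidden state per token position.
hidden_states = [
    [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1], [5.0, 5.1]],
    [[0.5, 0.6], [1.5, 1.6], [2.5, 2.6], [3.5, 3.6], [4.5, 4.6], [5.5, 5.6]],
]
pooled = pool_at_argmax(hidden_states, token_ids)
```

Under this reading, the selected position is the end-of-text token of each sequence; whether the paper's [TOS] and [EOS] labels both refer to it is the question for the authors.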
Describe the bug
train_10b_clue.py from ./examples/glm_superglue
afqmc task in CLUE
pytorch setting
task_name = 'afqmc'
trainer = Trainer(env_type="pytorch",
batch_size=16,
epochs=10,
eval_interval=10,
load_dir=None,
pytorch_device="cuda",
save_dir="./glm_superglue_en",
save_epoch=1)
model = GLMForSingleTokenCloze.from_pretrain(download_path="/mnt/test_10b_models",
model_name="GLM-large-ch")
tokenizer = GLMLargeChTokenizer()
train_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='train',
tokenizer=tokenizer,
cloze_eval=True)
valid_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='dev',
tokenizer=tokenizer,
cloze_eval=True)
cl_args = CollateArguments()
cl_args.cloze_eval = True
cl_args.multi_token = False
collate_fn = ConstructSuperglueStrategy(cl_args,
tokenizer,
task_name=task_name)
trainer.train(model,
train_dataset=train_dataset,
valid_dataset=valid_dataset,
collate_fn=collate_fn,
metric_methods=[["acc", accuracy_metric]])
Tasks
To Reproduce
(base) root@deepspeed:~/FlagAI/examples/glm_superglue# python train_10b_clue.py
file cog-pretrain.vocab not exist in ['cog-pretrain.model', 'cog-pratrain.vocab', 'pytorch_model.bin', 'vocab.txt', 'config.json', 'README.md']
{'pad': 50000, 'eos': 50000, 'sep': 50001, 'ENC': 50002, 'MASK': 50003, 'unk': 50004, 'sop': 50006, 'eop': 50007, 'sMASK': 50008, 'gMASK': 50009}
Creating afqmc dataset from file at ./datasets/ (split=train)
Returning 34334 train examples with label dist.: [('0', 23761), ('1', 10573)]
Creating afqmc dataset from file at ./datasets/ (split=dev)
Returning 4316 dev examples with label dist.: [('0', 2978), ('1', 1338)]
Optimizer = Adam
[2022-06-09 10:10:29,530] [INFO] [logger.py:70:log_dist] [Rank -1] loading checkpoints form None
[2022-06-09 10:10:29,530] [INFO] [logger.py:70:log_dist] [Rank -1] working on epoch 0 ...
Traceback (most recent call last):
File "train_10b_clue.py", line 64, in <module>
trainer.train(model,
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 460, in train
lm_loss, skipped_iter, _ = self.train_step(batch,
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 568, in train_step
step_output = self.forward_step(data, model, mems)
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 635, in forward_step
model_output = model(**data)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/model/glm_model.py", line 754, in forward
model_out = self.model(input_ids,
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/model/glm_model.py", line 453, in forward
loss = F.cross_entropy(logits_parallel.contiguous().float(),
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected target size [16, 50048], got [16]
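The shape complaint in the last line follows from how `torch.nn.functional.cross_entropy` reads its arguments: for an input of shape (N, C, d1, ...) it treats dimension 1 as the class dimension and expects a target of shape (N, d1, ...). The helper below is a torch-free sketch of that rule only. `expected_target_shape` is a hypothetical name, and the middle dimension 3 in the example is a placeholder because the traceback does not show it. It does not identify the root cause inside FlagAI.

```python
def expected_target_shape(input_shape):
    """Shape of class-index targets that cross_entropy expects for a given input shape."""
    if len(input_shape) == 1:
        # Unbatched input (C,): the target is a scalar class index.
        return ()
    # Batched input (N, C, d1, ...): drop the class dimension at index 1.
    return (input_shape[0],) + tuple(input_shape[2:])


# Three-dimensional logits: dimension 1 is read as classes, so the target needs
# shape (16, 50048), which is what the error message asks for.
three_dim = expected_target_shape((16, 3, 50048))

# Two-dimensional logits (batch, vocab) match a target of shape (16,).
two_dim = expected_target_shape((16, 50048))
```

Read this way, the message suggests that the logits reaching the loss were three-dimensional while the labels held one class index per example.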
OS (please complete the following information):
Hi, I see that diffusers already supports AltDiffusion, so I tried DreamBooth on AltDiffusion using diffusers. It only requires changing the original StableDiffusion pipeline to the AltDiffusion pipeline and replacing the text encoder, and the results look great!
Here are some results I generated in Chinese. I use the special token <鸣人> to represent Uzumaki Naruto.
Prompt: 一张<鸣人>男孩的照片,背景是沙漠,masterpieces
Prompt: 一张<鸣人>男孩的照片,背景是富士山,masterpieces
I'm curious whether FlagAI could support DreamBooth. Thanks!
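The FlagAI logo has a lot of blank space above and below it, which makes the top of the README look unbalanced. Could you check whether the blank space above and below the logo can be reduced? Thanks.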
1. Initialization code:
auto_loader = AutoLoader(
task_name="txt_img_matching",
model_dir="./checkpoints",
model_name="AltCLIP-XLMR-L" # Load the checkpoints from Modelhub(model.baai.ac.cn/models)
)
2. Error:
│ d:\Users\bigdata\Anaconda3\lib\site-packages\transformers\configuration_utils.py:688 in │
│ from_dict │
│ │
│ 685 │ │ if "_commit_hash" in kwargs and "_commit_hash" in config_dict: │
│ 686 │ │ │ kwargs["_commit_hash"] = config_dict["_commit_hash"] │
│ 687 │ │ │
│ ❱ 688 │ │ config = cls(**config_dict) │
│ 689 │ │ │
│ 690 │ │ if hasattr(config, "pruned_heads"): │
│ 691 │ │ │ config.pruned_heads = dict((int(key), value) for key, value in config.pruned │
│ │
│ C:\Users\bigdata\AppData\Roaming\Python\Python38\site-packages\flagai\model\mm\AltCLIP.py:79 in │
│ __init__ │
│ │
│ 76 │ │ │ │ num_layers=3, │
│ 77 │ │ │ │ variant='invert', │
│ 78 │ │ │ │ **kwargs): │
│ ❱ 79 │ │ super().__init__(text_config_dict, vision_config_dict, projection_dim, │
│ 80 │ │ │ │ │ │ logit_scale_init_value, **kwargs) │
│ 81 │ │ if text_config_dict is None: │
│ 82 │ │ │ text_config_dict = {}
本文总结了十个可穿戴产品的设计原则,而这些原则同样也是笔者认为是这个行业最吸引人的地方,1为人们解决重复性问题2从人开始而不是从机器开始3要引起注意但不要刻意4提升用户能力而不是取代人。 :
--------------sample 0 :-------------------
-----------random sample: --------------
{'input_ids': [23694, 35526, 12895, 43392, 32153, 2837, 101, 1369, 43359, 24733, 1369, 1736, 88, 11921, 5789, 15658, 43469, 39550, 247, 4153, 43377, 797, 341, 3075, 30639, 43372, 43576, 43371, 71, 1878, 43576, 1354, 71, 43393, 43385, 817, 295, 30057, 7692, 43413, 852, 439, 169, 1878, 6170, 43371, 43361], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
Traceback (most recent call last):
File "glm_title_ch.py", line 33, in <module>
predictor.predict_generate_randomsample(text,
File "/home/lz/miniconda3/envs/flagai/lib/python3.8/site-packages/flagai/model/predictor/predictor.py", line 288, in predict_generate_randomsample
return glm_random_sample(self.model, self.tokenizer, text,
File "/home/lz/miniconda3/envs/flagai/lib/python3.8/site-packages/flagai/model/predictor/utils.py", line 600, in glm_random_sample
position_ids = torch.tensor([data['position_ids']],
KeyError: 'position_ids'
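The printed sample contains only `input_ids` and `token_type_ids`, while the traceback shows `glm_random_sample` reading `data['position_ids']`. A small check like the one below makes that kind of mismatch visible before generation starts. The helper name and the list of required keys are assumptions for illustration; only `position_ids` is confirmed by the traceback.

```python
def missing_keys(encoded, required=("input_ids", "position_ids")):
    """Return the required keys that are absent from an encoded sample."""
    return [key for key in required if key not in encoded]


# Shortened version of the sample printed above.
sample = {
    "input_ids": [23694, 35526, 12895],
    "token_type_ids": [0, 0, 0],
}
absent = missing_keys(sample)
if absent:
    print(f"tokenizer output is missing: {absent}")
```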
Describe the question
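Hello, I followed the example in the README and fine-tuned on my own dataset. After the first iteration, lm_loss is always None.
I printed data inside the forward_step method and it looks normal, but I cannot get a correct model_output. Could someone please take a look?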
Describe the bug
Tasks
To Reproduce
Traceback (most recent call last):
File "train_large_clue.py", line 51, in <module>
trainer.train(model,
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 598, in train
eval_dict = self.evaluate_and_print_results(
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 1103, in evaluate_and_print_results
eval_dict = self.evaluate(forward_step_func=forward_step_func,
File "/root/anaconda3/envs/py38/lib/python3.8/site-packages/flagai-1.6.1-py3.8.egg/flagai/trainer.py", line 1051, in evaluate
metrics[i] += eval_method(all_logits, all_labels, meta=meta, tokenizer=self.tokenizer)
TypeError: accuracy_metric() got an unexpected keyword argument 'tokenizer'
The corresponding functions:
"""train_large_clue.py"""
trainer.train(model,
              train_dataset=train_dataset,
              valid_dataset=valid_dataset,
              collate_fn=collate_fn,
              metric_methods=[["acc", accuracy_metric]])
"""flagai.metrics.accuracy_metric.py"""
def accuracy_metric(predictions, labels, meta=None):
    '''
    predictions: torch.size(n, class_num)
    labels: torch.size(n)
    '''
    count = 0
    assert len(predictions) == len(labels)
    if predictions.size() != labels.size():
        predictions = torch.argmax(predictions, dim=-1)
        for prediction, label in zip(predictions, labels):
            count += prediction == label
    else:
        prediction, label = predictions[0], labels[0]
        if sigmoid(prediction) >= 0.5:
            count += label == 1
        else:
            count += label == 0
    return 100.0 * count / len(labels)
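The traceback shows the trainer calling the metric with `meta=` and `tokenizer=`, while this `accuracy_metric` accepts only `meta`. One possible workaround, sketched here under the assumption that the metric does not need the tokenizer, is to wrap the metric so that keyword arguments it does not declare are dropped. `drop_unknown_kwargs` and `toy_accuracy` are hypothetical names, not part of FlagAI.

```python
import functools
import inspect


def drop_unknown_kwargs(func):
    """Wrap func so keyword arguments missing from its signature are ignored."""
    params = inspect.signature(func).parameters
    accepts_any = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not accepts_any:
            kwargs = {k: v for k, v in kwargs.items() if k in params}
        return func(*args, **kwargs)

    return wrapper


def toy_accuracy(predictions, labels, meta=None):
    """Stand-in metric with the same signature as the accuracy_metric shown above."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return 100.0 * correct / len(labels)


safe_metric = drop_unknown_kwargs(toy_accuracy)
# Called the way the traceback shows, with an extra tokenizer keyword.
score = safe_metric([1, 0, 1, 1], [1, 0, 0, 1], meta=None, tokenizer=object())
```

With such a wrapper the training call would pass `drop_unknown_kwargs(accuracy_metric)` in `metric_methods`. Whether a matching FlagAI release already aligns the two signatures is not something this excerpt confirms.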
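As the title says.
One overall problem with the project right now is the lack of concrete code for effective training.
The examples all use very small amounts of data. Unless GLM has very strong few-shot ability,
users cannot verify the training process or the effectiveness of the model on their own data.
The already-trained models, such as GLM-large-ch and the other preloadable models, all perform very well.
It would be great if you could show how these models are trained from random initialization on the full data through to completion.
In other words, the project is very good in the out-of-the-box sense, but lacks guidance on how to reproduce the models and debug with full data.
Could you open-source more of the related parts? Thanks.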
Is it like this?
text = '''
问题:1994年3月,范廷颂担任什么职务?答案:[MASK] 范廷颂枢机(,),圣名保禄·若瑟(),是越南罗马天主教枢机。1963年被任为主教;1990年被擢升为天主教河内总教区宗座署理;1994年被擢升为总主教,同年年底被擢升为枢机;2009年2月离世。范廷颂于1919年6月15日在越南宁平省天主教发艳教区出生;童年时接受良好教育后,被一位越南神父带到河内继续其学业。范廷颂于1940年在河内大修道院完成神学学业。范廷颂于1949年6月6日在河内的主教座堂晋铎;及后被派到圣女小德兰孤儿院服务。1950年代,范廷颂在河内堂区创建移民接待中心以收容到河内避战的难民。1954年,法越战争结束,越南**共和国建都河内,当时很多天主教神职人员逃至越南的南方,但范廷颂仍然留在河内。翌年管理圣若望小修院;惟在1960年因捍卫修院的自由、自治及拒绝政府在修院设政治课的要求而被捕。1963年4月5日,教宗任命范廷颂为天主教北宁教区主教,同年8月15日就任;其牧铭为「我信天主的爱」。由于范廷颂被越南政府软禁差不多30年,因此他无法到所属堂区进行牧灵工作而专注研读等工作。范廷颂除了面对战争、贫困、被当局**天主教会等问题外,也秘密恢复修院、创建女修会团体等。1990年,教宗若望保禄二世在同年6月18日擢升范廷颂为天主教河内总教区宗座署理以填补该教区总主教的空缺。1994年3月23日,范廷颂被教宗若望保禄二世擢升为天主教河内总教区总主教并兼天主教谅山教区宗座署理;同年11月26日,若望保禄二世擢升
'''
output=predictor.predict_generate_beamsearch(
text,
out_max_length = 30
)
output
Or like this?
text = '''
问题:1994年3月,范廷颂担任什么职务?答案:[MASK] 范廷颂枢机(,),圣名保禄·若瑟(),是越南罗马天主教枢机。1963年被任为主教;1990年被擢升为天主教河内总教区宗座署理;1994年被擢升为总主教,同年年底被擢升为枢机;2009年2月离世。范廷颂于1919年6月15日在越南宁平省天主教发艳教区出生;童年时接受良好教育后,被一位越南神父带到河内继续其学业。范廷颂于1940年在河内大修道院完成神学学业。范廷颂于1949年6月6日在河内的主教座堂晋铎;及后被派到圣女小德兰孤儿院服务。1950年代,范廷颂在河内堂区创建移民接待中心以收容到河内避战的难民。1954年,法越战争结束,越南**共和国建都河内,当时很多天主教神职人员逃至越南的南方,但范廷颂仍然留在河内。翌年管理圣若望小修院;惟在1960年因捍卫修院的自由、自治及拒绝政府在修院设政治课的要求而被捕。1963年4月5日,教宗任命范廷颂为天主教北宁教区主教,同年8月15日就任;其牧铭为「我信天主的爱」。由于范廷颂被越南政府软禁差不多30年,因此他无法到所属堂区进行牧灵工作而专注研读等工作。范廷颂除了面对战争、贫困、被当局**天主教会等问题外,也秘密恢复修院、创建女修会团体等。1990年,教宗若望保禄二世在同年6月18日擢升范廷颂为天主教河内总教区宗座署理以填补该教区总主教的空缺。1994年3月23日,范廷颂被教宗若望保禄二世擢升为天主教河内总教区总主教并兼天主教谅山教区宗座署理;同年11月26日,若望保禄二世擢升
'''
output=predictor.predict_generate_randomsample(
text,
out_max_length = 30
)
output
Neither output seems right.
The question comes before [MASK], and the context comes after it.
Hello, below is the code I am using.
import os
import numpy as np
import torch
from torch.utils.data import Dataset
from flagai.auto_model.auto_loader import AutoLoader
from flagai.trainer import Trainer
from flagai.metrics import accuracy_metric
from flagai.data.dataset import SuperGlueDataset
from flagai.test_utils import CollateArguments
from flagai.data.dataset import ConstructSuperglueStrategy
from flagai.data.dataset.superglue.control import DEFAULT_METRICS, MULTI_TOKEN_TASKS, CH_TASKS
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
task_name = "tnews"
auto_loader = AutoLoader('classification',
model_name="GLM-large-ch",
model_dir="./checkpoints",
load_pretrain_params=True,
class_num=15)
cl_args = CollateArguments()
cl_args.cloze_eval = False
cl_args.multi_token = task_name in MULTI_TOKEN_TASKS
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()
train_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='train',
tokenizer=tokenizer)
collate_fn = ConstructSuperglueStrategy(cl_args,
tokenizer,
task_name=task_name)
valid_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='dev',
tokenizer=tokenizer)
trainer = Trainer(
env_type="pytorch",
experiment_name="GLM_cls",
batch_size=1,
lr=1e-5,
weight_decay=1e-5,
epochs=10,
log_interval=1,
eval_interval=10000,
pytorch_device=device,
checkpoint_activations=False,
save_dir="./glm_cls",
save_interval=10000,
)
trainer.train(model,
train_dataset=train_dataset,
valid_dataset=valid_dataset,
collate_fn=collate_fn,
metric_methods=[["acc", accuracy_metric]])
import torch
from flagai.trainer import Trainer
from flagai.model.glm_model import GLMForSequenceClassification
from flagai.data.tokenizer import Tokenizer
from flagai.metrics import accuracy_metric
from flagai.data.dataset import SuperGlueDataset
from flagai.test_utils import CollateArguments
from flagai.data.dataset.superglue.control import DEFAULT_METRICS, MULTI_TOKEN_TASKS, CH_TASKS
from flagai.data.dataset import ConstructSuperglueStrategy
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
task_name = "tnews"
model_name = 'GLM-large-ch'
cl_args = CollateArguments()
cl_args.cloze_eval = False
cl_args.multi_token = task_name in MULTI_TOKEN_TASKS
tokenizer = Tokenizer.from_pretrained(model_name)
class_num = 15
model = GLMForSequenceClassification.from_pretrain(model_name=model_name, spell_length=2,
class_num=class_num, tune_prefix_layers=1)
train_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='train',
tokenizer=tokenizer)
collate_fn = ConstructSuperglueStrategy(cl_args,
tokenizer,
task_name=task_name)
valid_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='dev',
tokenizer=tokenizer)
trainer = Trainer(env_type='pytorch',
pytorch_device=device,
epochs=2,
batch_size=1,
lr=1e-5,
weight_decay=1e-5,
eval_interval=10000,
checkpoint_activations=False,
fp16=True,
log_interval=1000,
save_interval=10000,
save_dir="./glm_large_clue")
trainer.train(model,
train_dataset=train_dataset,
valid_dataset=valid_dataset,
collate_fn=collate_fn,
metric_methods=[["acc", accuracy_metric]])
Run the following code:
import torch
from flagai.model.predictor.predictor import Predictor
from flagai.auto_model.auto_loader import AutoLoader
if __name__ == "__main__":
    """Main training program."""
    print('Generate Samples')
    # Random seeds for reproducibility.
    # Model,
    loader = AutoLoader(task_name='lm',
                        model_name='GLM-large-en',
                        only_download_config=False)
    model = loader.get_model()
    tokenizer = loader.get_tokenizer()
    model.cuda(torch.cuda.current_device())
    predictor = Predictor(model, tokenizer)
    # generate samples
    text = [
        'Question: Is drinking beer bad for your health? Answer: [gMASK]',
    ]
    for t in text:
        output = predictor.predict_generate_randomsample(
            t, top_k=50, repetition_penalty=4.0, top_p=1.0)
        print(t, '\n', output)
This produces the following output:
******************** lm glm-large-en
Question: Is drinking beer bad for your health? Answer: [gMASK]
[CLS] question : is drinking beer bad for your health ? answer : [gMASK] <|startofpiece|> , , 1 <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> 
<|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|> <|endofpiece|>
Describe the bug
Tasks
To Reproduce
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
if __name__ == '__main__':
    loader = AutoLoader("seq2seq", "glm-10b-ch", model_dir="./checkpoints/")
    model = loader.get_model()
    tokenizer = loader.get_tokenizer()
    predictor = Predictor(model, tokenizer)
    text = "今天天气不错[gMASK]"
    output = predictor.predict_generate_beamsearch(text, out_max_length=5, beam_size=1)
    print(output)
The result printed is ?? ?? ??,
Debugging inside shows that the IDs passed to the tokenizer's decode are all 0.
Environment: Win10 x64, Python 3.10, and the FlagAI version is the latest currently on pip. It seems to run on the CPU by default; CPU usage is noticeably high.
Using AutoLoader("lm", "glm-10b-ch", model_dir="./checkpoints/")
gives the same problem.
Describe the question
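Hello, is there a direct download address for the models? Downloading through the code is rather slow. Is there a download link such as Baidu Netdisk? Thanks.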
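I am testing AltDiffusion. The documentation says that more than 10 GB of GPU memory is enough, but it will not run on my 12 GB GPU because it runs out of memory. Why is that?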
I've just installed the package locally and ran the test code quickstart/title_en.py, and got the following error.
Any possible reasons? Thanks! See the details below.
skys-MacBook-Pro:quickstart sky$ python3 title_en.py
******************** title-generation 100013 bert-base-en
Traceback (most recent call last):
File "title_en.py", line 29, in <module>
print(predictor.predict_generate_beamsearch(text, out_max_length=50, beam_size=3))
File "../flagai/model/predictor/predictor.py", line 231, in predict_generate_beamsearch
return bert_beamsearch(self.model,
File "../flagai/model/predictor/utils.py", line 676, in bert_beamsearch
out_puts_ids = bert_beam_search(model,
File "../flagai/model/predictor/utils.py", line 280, in bert_beam_search
scores = bert_predict_generate(model, new_input_ids,
File "../flagai/model/predictor/utils.py", line 235, in bert_predict_generate
score = model(**{
File "/Users/sky/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "../flagai/model/bert_model.py", line 359, in forward
encoder_out, pooler_out = self.model(
File "/Users/sky/Library/Python/3.8/lib/python/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "../flagai/model/bert_model.py", line 153, in forward
extended_attention_mask = extended_attention_mask * attention_mask
RuntimeError: The size of tensor a (3) must match the size of tensor b (171) at non-singleton dimension 2
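The final error is PyTorch's broadcasting check: after aligning two shapes from the right, every pair of sizes must be equal or contain a 1. The sketch below reimplements that rule on plain tuples to show why sizes 3 and 171 clash at dimension 2. The shapes used are made up for illustration, since the traceback does not print the actual mask shapes, and `broadcast_conflicts` is a hypothetical helper.

```python
def broadcast_conflicts(shape_a, shape_b):
    """Return the dimension indices at which two shapes cannot be broadcast together."""
    rank = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both are aligned from the right.
    a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
    b = (1,) * (rank - len(shape_b)) + tuple(shape_b)
    return [
        dim
        for dim, (size_a, size_b) in enumerate(zip(a, b))
        if size_a != size_b and size_a != 1 and size_b != 1
    ]


# Hypothetical shapes: sizes 3 and 171 meet at dimension 2.
conflict = broadcast_conflicts((1, 1, 3, 3), (1, 1, 171, 1))
# Shapes that differ only where one side has size 1 broadcast without conflict.
compatible = broadcast_conflicts((3, 1, 1, 171), (3, 1, 171, 171))
```

Since the script runs with `beam_size=3`, one thing worth checking is whether one of the two masks was built per beam and the other per token, but the traceback alone does not establish that.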
The existing examples only cover text generation and title generation. Is there an example for text classification?
There seems to be a problem connecting to the Hugging Face proxy to download the safety checker model. Could the safety checker model download URL be changed from Hugging Face to the BAAI ModelHub?
Below is the error output when running python generate.py:
root@-0:~/FlagAI/examples/AltDiffusion# python generate.py
******************** text2img altdiffusion-m9
Extension horovod.torch has not been built: /usr/local/lib/python3.8/dist-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-38-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still avaiable.
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 96, in create_connection
raise err
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 86, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 358, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/repos/c3/33/c333b2b94c5a8a06ddcbb20b02e728f6bef192870028f8a6859247cabb771a03/64b8393f1afd5a0c1ed2aa5f341fa7c08286839a48f3743162a76a2835c808bd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T011244Z&X-Amz-Expires=259200&X-Amz-Signature=6b4bb6d3d218d24cf8b030d2ee60679e3175ba64c350072718017b7701b01d02&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2007, in from_pretrained
resolved_archive_file = cached_path(
File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 284, in cached_path
output_path = get_from_cache(
File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 594, in get_from_cache
http_get(url_to_download, temp_file, proxies=proxies, resume_size=resume_size, headers=headers)
File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 432, in http_get
r = requests.get(url, stream=True, proxies=proxies, headers=headers)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3-proxy.huggingface.tech', port=443): Max retries exceeded with url: /lfs.huggingface.co/repos/c3/33/c333b2b94c5a8a06ddcbb20b02e728f6bef192870028f8a6859247cabb771a03/64b8393f1afd5a0c1ed2aa5f341fa7c08286839a48f3743162a76a2835c808bd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA4N7VTDGOZQA2IKWK%2F20230104%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230104T011244Z&X-Amz-Expires=259200&X-Amz-Signature=6b4bb6d3d218d24cf8b030d2ee60679e3175ba64c350072718017b7701b01d02&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22pytorch_model.bin%22&x-id=GetObject (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2b1c1d1b80>: Failed to establish a new connection: [Errno 110] Connection timed out'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "generate.py", line 19, in <module>
predictor.predict_generate_images(
File "/usr/local/lib/python3.8/dist-packages/flagai/model/predictor/predictor.py", line 342, in predict_generate_images
safety_checker, safety_feature_extractor = get_safety_checker()
File "/usr/local/lib/python3.8/dist-packages/flagai/model/predictor/utils.py", line 24, in get_safety_checker
safety_checker = StableDiffusionSafetyChecker.from_pretrained(safety_model_id)
File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2096, in from_pretrained
raise EnvironmentError(
OSError: Can't load the model for 'CompVis/stable-diffusion-safety-checker'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'CompVis/stable-diffusion-safety-checker' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
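The chain of exceptions above starts with a TCP connection timeout to the download host, so the failure is at the network level rather than in the model code. A quick pre-flight check such as the following sketch can confirm that before a long generation script is launched. `can_connect` is a hypothetical helper using only the standard library; the host name in the usage comment is copied from the traceback.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False


# Example (performs a real network call, so it is left commented out):
# if not can_connect("s3-proxy.huggingface.tech"):
#     print("download host unreachable; fetch the safety checker another way")
```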
As the title says.
Loading works normally when the transpose_weight method is disabled, as follows:
def load_weights_without_trans(self, checkpoint_path):
    checkpoint = torch.load(checkpoint_path,
                            map_location=torch.device("cpu"))
    if "module" in checkpoint:
        # ddp
        checkpoint = checkpoint["module"]
    #checkpoint = self.transpose_weight(checkpoint)
    self.load_state_dict(checkpoint, strict=False)
    return checkpoint
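As the title says.
On a single V100 I can run inference with the English glm-10b, but when I change the model to glm-10b-ch for the quickstart tasks it runs out of memory (OOM).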
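For a rough sense of scale, the memory needed just to hold model weights is the parameter count times the bytes per parameter. The sketch below applies that arithmetic to a 10-billion-parameter model, taking the count from the model name as an assumption. Activations, caches and framework overhead come on top, and the snippet does not explain why the English and Chinese checkpoints behave differently.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory in GiB needed to store the weights alone."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 10_000_000_000  # assumed from the "10b" in the model name

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # 32-bit floats
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit floats
print(f"fp32: {fp32_gib:.1f} GiB, fp16: {fp16_gib:.1f} GiB")
```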
Since we have moved the repo to FlagAI-Open, remember to update the README.md file.
It still contains the following line.
git clone https://github.com/BAAI-Open/FlagAI.git
Congratulations on the new release of AltDiffusion-m9, which supports 9 popular languages. But when I followed the link to examples/AltDiffusion, I couldn't find any m9 information in the README until I was almost at the end of the file.
It would be good to add the multilingual support information at the very beginning of the README.
Below is my code. The output is not good, and I wonder whether the prompt is suitable. Could you give me some sample configs?
from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler
from PIL import Image
from diffusers import AltDiffusionImg2ImgPipeline
if __name__ == '__main__':
    text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
    img2img = AltDiffusionImg2ImgPipeline(**text2img.components)
    img2img = img2img.to("cuda")
    img = Image.open('input/高圆圆.jpeg')
    out_imgs = img2img(prompt="((masterpiece)), (((best quality))), ((ultra-detailed)), ((illustration)), girl, genshin impact,vision",\
                       init_image=img, strength=0.7,\
                       guidance_scale=30,\
                       negative_prompt='nsfw, longbody, lowres, bad anatomy, bad hands, missing fingers, pubic hair,extra digit, fewer digits, cropped, worst quality, low quality').images[0]
    out_imgs.save(f'output.png')
1. As the title says: is there a model that has been fine-tuned on reading comprehension?
2. Also, the prompt template needed by the predictor is constructed inside collate_fn:
elif self.task_name in ["cmrc"]:
    mask_id = self.tokenizer.get_command_id('MASK')
    source_text = example.text_a
    target_text = example.meta["answer"].strip()
    question = example.meta["question"].strip()
    source_tokens = self.tokenizer.EncodeAsIds(source_text.rstrip())
    question_tokens = self.tokenizer.EncodeAsIds("问题:" + question +
                                                 "答案:")
    max_src_length = self.args.max_src_length - len(
        question_tokens) - 2
    if max_src_length <= 0:
        question_tokens = question_tokens[self.args.max_src_length //
                                          4]
    source_tokens = [cls_id] + question_tokens + [
        mask_id
    ] + source_tokens[:max_src_length]
Would you consider documenting this part?
3. Dataset loading and preprocessing depend on specific dataset names rather than on more general task and dataset formats, which raises the cost for users of imitating the dataset input format.
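The cmrc branch quoted in item 2 builds the model input as [CLS], the question tokens, [MASK], and then as much of the passage as fits. The sketch below reproduces that layout with plain integer lists. The function name, the toy ids and the choice to raise an error when the question leaves no room are assumptions for illustration; the quoted code handles that case differently.

```python
def build_cmrc_source(question_tokens, passage_tokens, cls_id, mask_id, max_src_length):
    """Lay out [CLS] + question + [MASK] + truncated passage, as in the cmrc branch."""
    # Reserve room for the question plus the two special tokens.
    room_for_passage = max_src_length - len(question_tokens) - 2
    if room_for_passage <= 0:
        raise ValueError("question is too long for max_src_length")
    return [cls_id] + question_tokens + [mask_id] + passage_tokens[:room_for_passage]


# Toy ids standing in for real tokenizer output.
CLS_ID, MASK_ID = 1, 2
question = [11, 12, 13]          # stands for the tokens of "问题:" + question + "答案:"
passage = list(range(100, 120))  # 20 passage tokens
source = build_cmrc_source(question, passage, CLS_ID, MASK_ID, max_src_length=10)
```

This is the same ordering as the prompts shown earlier on this page, where the question and "答案:" precede [MASK] and the passage follows it.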
Is there any information on the resource usage of each OPT model size?
Describe the bug
key error from superGLUE example
task_name = 'qqp'
trainer = Trainer(env_type='pytorch',
pytorch_device="cuda",
epochs=2,
batch_size=1,
eval_interval=1000,
checkpoint_activations=False,
fp16=True,
log_interval=1,
save_dir="./glm_superglue_en",
# master_ip='127.0.0.1',
# master_port=17755,
# num_nodes=1,
# num_gpus=2,
# hostfile='./hostfile',
model_parallel_size=2,
deepspeed_config='./deepspeed.json',
training_script=__file__)
model = GLMForSingleTokenCloze.from_pretrain(download_path="/mnt/test_10b_models",
model_name="GLM-large-en")
tokenizer = GLM10bENBPETokenizer()
train_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='train',
tokenizer=tokenizer,
cloze_eval=True)
valid_dataset = SuperGlueDataset(task_name=task_name,
data_dir='./datasets/',
dataset_type='dev',
tokenizer=tokenizer,
cloze_eval=True)
cl_args = CollateArguments()
cl_args.cloze_eval = True
if task_name in ['copa', 'wsc', 'record']:
    cl_args.multi_token = True
from flagai.data.dataset import ConstructSuperglueStrategy
collate_fn = ConstructSuperglueStrategy(cl_args,
tokenizer,
task_name=task_name)
trainer.train(model,
train_dataset=train_dataset,
valid_dataset=valid_dataset,
collate_fn=collate_fn,
metric_methods=[["acc", accuracy_metric]])
Tasks
To Reproduce
Creating qqp dataset from file at ./datasets/ (split=train)
Returning 363846 train examples with label dist.: [('0', 229468), ('1', 134378)]
Creating qqp dataset from file at ./datasets/ (split=dev)
Returning 40430 dev examples with label dist.: [('0', 25545), ('1', 14885)]
Optimizer = Adam
[2022-06-08 17:54:06,911] [INFO] [logger.py:70:log_dist] [Rank -1] loading checkpoints form checkpoints/99
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] WARNING: could not find the metadata file checkpoints/99/latest_checkpointed_iteration.txt
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] will not load any checkpoints and will start from random
[2022-06-08 17:54:06,912] [INFO] [logger.py:70:log_dist] [Rank -1] working on epoch 0 ...
Traceback (most recent call last):
File "train_10b_superglue.py", line 59, in <module>
trainer.train(model,
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/trainer.py", line 448, in train
for iteration_, batch in enumerate(train_dataloader):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/data_collator/collate_fn.py", line 105, in __call__
sample = self.pvp.encode(example, {})
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 195, in encode
raw_parts_a, raw_parts_b = self.get_parts(example)
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 1493, in get_parts
return [text_a], [" Do you mean ", text_b, [self.mask], "."]
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/dataset/superglue/pvp.py", line 99, in mask
return self.tokenizer.get_command('MASK').Id
File "/opt/conda/lib/python3.8/site-packages/flagai-1.0.1-py3.8.egg/flagai/data/tokenizer/tokenizer.py", line 172, in get_command
return self.command_name_map[name]
KeyError: 'MASK'
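The traceback ends in a plain dictionary lookup (`self.command_name_map[name]`), so the error means the tokenizer in use has no command registered under the name `'MASK'`. The sketch below reproduces that shape with a toy class and shows a pre-flight check that lists missing command names before training starts. `ToyTokenizer` and `missing_commands` are hypothetical helpers written for illustration; they are not FlagAI APIs, and the toy returns integer ids rather than command objects.

```python
class ToyTokenizer:
    """Hypothetical stand-in; not the FlagAI tokenizer class."""

    def __init__(self, command_names):
        # Map each command name to an integer id.
        self.command_name_map = {name: i for i, name in enumerate(command_names)}

    def get_command(self, name):
        # Same shape as the failing lookup in the traceback: a plain dict index.
        return self.command_name_map[name]


def missing_commands(tokenizer, required):
    """Return the required command names the tokenizer does not register."""
    return [name for name in required if name not in tokenizer.command_name_map]


tok = ToyTokenizer(["pad", "eos", "cls"])  # deliberately has no "MASK" entry
absent = missing_commands(tok, ["MASK", "cls"])
print(absent)
```

A check like this, run right after the tokenizer is constructed, would surface a mismatch between tokenizer and task template before the first batch is collated.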
Is there any plan for Swin Transformer?
Hi, running the following program produces the error "The model_name: altdiffusion-m9 is not be supported":
import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
# Initialize
prompt = "Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = AutoLoader(task_name="text2img", #contrastive learning
model_name="AltDiffusion-m9",
model_dir="./checkpoints")
model = loader.get_model()
model.eval()
model.to(device)
predictor = Predictor(model)
predictor.predict_generate_images(prompt)
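When I fine-tune GLM-10b-ch on two 3090 GPUs, I always run out of GPU memory at the validation stage, but never during training. Are two 3090s simply not enough to fine-tune GLM-10b-ch, or is something wrong with my parameter settings?
Below are the parameters I used for training: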
Trainer:
trainer = Trainer(
env_type="deepspeed+mpu",
epochs=10,
experiment_name="GLM-10b-ch-seq2seq",
eval_interval=2000,
log_interval=100,
load_dir=None,
# parallel settings
master_ip='127.0.0.1',
master_port=17750,
num_nodes=1,
num_gpus=2,
hostfile='hostfile',
training_script=__file__,
# deepspeed
deepspeed_config='./config/deepspeed.json',
# megatron-lm
model_parallel_size=2,
save_dir="checkpoints_glm_title_generation",
save_interval=1,
num_checkpoints=3,
)
deepspeed.json:
{
"train_micro_batch_size_per_gpu": 16,
"eval_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 2,
"steps_per_print": 100,
"gradient_clipping": 1.0,
"zero_optimization": {
"stage": 3,
"contiguous_gradients": false,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e7,
"allgather_bucket_size": 5e7,
"cpu_offload": true
},
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"optimizer": {
"type": "Adam",
"params": {
"lr": 0.000005,
"weight_decay": 0.01,
"betas": [
0.9,
0.98
],
"eps": 1e-6
}
},
"activation_checkpointing": {
"partition_activations": false,
"contiguous_memory_optimization": false
},
"wall_clock_breakdown": false
}
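As a rough aid for reading the settings above, the following sketch works out two numbers from them: the fp16 weight footprint per GPU under model parallelism, and the effective training batch size. It assumes the model has roughly 10 billion parameters (inferred only from the name GLM-10b-ch) and that the data-parallel size equals `num_gpus / model_parallel_size`. It deliberately ignores activations, gradients, optimizer state, and the effect of ZeRO stage 3 with CPU offload, so it gives a lower bound on memory and does not by itself explain the out-of-memory error at validation time. Both function names are hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int, model_parallel_size: int) -> float:
    """Memory for the weights alone on one GPU, in GiB."""
    return num_params * bytes_per_param / model_parallel_size / 2**30


def effective_batch_size(micro_batch: int, grad_accum: int, num_gpus: int, model_parallel_size: int) -> int:
    """Samples contributing to one optimizer step."""
    # GPUs in the same model-parallel group share one data-parallel replica.
    data_parallel = num_gpus // model_parallel_size
    return micro_batch * grad_accum * data_parallel


# Assumption: about 10e9 parameters, stored in fp16 (2 bytes each).
weights = weight_memory_gib(10e9, bytes_per_param=2, model_parallel_size=2)
# Values taken from the Trainer arguments and deepspeed.json above.
batch = effective_batch_size(micro_batch=16, grad_accum=2, num_gpus=2, model_parallel_size=2)
print(f"{weights:.2f} GiB of fp16 weights per GPU, effective batch size {batch}")
```

Under these assumptions the weights alone take a bit over 9 GiB of each 24 GB card, so what remains has to cover everything else at both training and evaluation time.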
Code in Trainer
if hasattr(tmp_model,
'config') and 'checkpoint_activations' in tmp_model.config:
tmp_model.config[
'checkpoint_activations'] = tmp_checkpoint_activations
Code in utils.py
if hasattr(model, 'save_config'):
model.save_config(config_path)
log_dist(' successfully saved {}'.format(config_path))
Describe the bug
There is an error when I try to fine-tune a BERT model on the masked language modeling task.
Tasks
To Reproduce
https://github.com/marscrazy/Tab2NL/blob/train_with_flagai/train_our_flagai.py
import os
import argparse
from data import get_dataset
from sklearn.metrics import roc_auc_score
import numpy as np
import random
import time
import torch
from flagai.trainer import Trainer
from flagai.auto_model.auto_loader import AutoLoader
from transformers import DataCollatorForLanguageModeling, AutoTokenizer
def set_seed(SEED):
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
np.random.seed(SEED)
random.seed(SEED)
#torch.backends.cudnn.deterministic = True
set_seed(26)
def compute_metrics(predictions, labels, meta=None):
predictions = predictions[:,1]
return {'roc_auc':roc_auc_score(labels,predictions)}
class txtDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
def finetuning_model(
train_x, train_y, val_x, val_y, cv_fold=1, dataset_id=11,
model_dir = "bert-base-ch", #bert-base-uncased
is_mlm = False,
num_train_epochs=10, #10
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=32, # batch size for evaluation
warmup_steps=200, # number of warmup steps for learning rate scheduler
weight_decay=0.1, # strength of weight decay
logging_steps=100,#20
seed=11,
learning_rate=4e-5,
metric_for_best_model=None,
config = None,
tokenizer = None,
model = None,
output_dir = None,
logging_dir = None,
return_model = False
):
if output_dir is None:
output_dir = './results/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-mlm'
if logging_dir is None:
logging_dir = './logs/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-mlm'
#if config is None:
# config = AutoConfig.from_pretrained(model_dir)
# import json
# config = json.load(open('./checkpoints/BERT-base-en/config.json'))
if model is None:
if is_mlm:
auto_loader = AutoLoader(
"masklm",
model_name="BERT-base-en",
model_dir='./checkpoints',
)
else:
auto_loader = AutoLoader(
"classification",
model_name="BERT-base-en",
model_dir='./checkpoints',
class_num = 2
)
model = auto_loader.get_model()
tokenizer = AutoTokenizer.from_pretrained("./checkpoints/BERT-base-en")
train_encodings = tokenizer(train_x.tolist(), truncation=True, padding=True)
val_encodings = tokenizer(val_x.tolist(), truncation=True, padding=True)
train_dataset = txtDataset(train_encodings, train_y.astype(np.longlong))
val_dataset = txtDataset(val_encodings, val_y.astype(np.longlong))
if is_mlm:
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm_probability=0.15
)
class MyTrainer(Trainer):
def forward_step(self, data, model, mems):
model_output = model(**{'input_ids':data['input_ids'],
'segment_ids':data['token_type_ids'],
'attention_mask':data['attention_mask']
})
print(model_output)
trainer = MyTrainer(
env_type='pytorch',
epochs=num_train_epochs,
weight_decay=weight_decay,
log_interval=logging_steps,
seed=seed,
lr=learning_rate,
save_dir=output_dir,
tensorboard_dir=logging_dir
)
trainer.train(model=model, # the instantiated 🤗 Transformers model to be trained
train_dataset=train_dataset, # training dataset
valid_dataset=val_dataset, # evaluation dataset
metric_methods=[compute_metrics] if not is_mlm else [],
collate_fn=data_collator if is_mlm else None)
dir_name = os.listdir(output_dir)[0]
cur_model_dir = os.path.join(output_dir,dir_name)
del model
torch.cuda.empty_cache()
time.sleep(5)
if return_model:
return cur_model_dir, tokenizer, config
def train_ptm_cls(train_x,train_y,val_x, val_y, test_x, test_y, cv_fold=1, dataset_id=11,tokenizer=None, config=None,
model_dir = "../contrastive/resources/bert-base-uncased"):
train_encodings = tokenizer(train_x.tolist(), truncation=True, padding=True)
val_encodings = tokenizer(val_x.tolist(), truncation=True, padding=True)
test_encodings = tokenizer(test_x.tolist(), truncation=True, padding=True)
train_dataset = txtDataset(train_encodings, train_y.astype(np.longlong))
test_dataset = txtDataset(test_encodings, test_y.astype(np.longlong))
val_dataset = txtDataset(val_encodings, val_y.astype(np.longlong))
model = AutoModelForSequenceClassification.from_pretrained(model_dir , config=config, from_tf=False,num_labels=2)
output_dir = './results/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-cls'
log_dir = './logs/'+str(dataset_id)+'-cv-'+str(cv_fold)+'-cls'
training_args = TrainingArguments(
output_dir=output_dir, # output directory
num_train_epochs=10, # total number of training epochs
per_device_train_batch_size=32, # batch size per device during training
per_device_eval_batch_size=32, # batch size for evaluation
warmup_steps=1000, # number of warmup steps for learning rate scheduler
weight_decay=0.1, # strength of weight decay
logging_dir=log_dir, # directory for storing logs
logging_steps=10,
eval_steps=10,
save_steps=10,
save_total_limit=1,
do_eval=True,
evaluation_strategy='steps',
learning_rate=2e-5,
seed=11,
#save_strategy='steps',
load_best_model_at_end=True,
metric_for_best_model="roc_auc"
)
trainer = Trainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=test_dataset, # evaluation dataset
compute_metrics=compute_metrics
#optimizers=(optimizer,None)
)
trainer.train()
train_rs = trainer.evaluate(train_dataset)
test_rs = trainer.evaluate(test_dataset)
val_rs = trainer.evaluate(val_dataset)
return train_rs['eval_roc_auc'], val_rs['eval_roc_auc'],test_rs['eval_roc_auc']
def train(dataset_id=1):
ds = get_dataset(dataset_id=dataset_id)
rs = []
for i, (train_x, val_x, test_x, train_y, val_y, test_y) in enumerate(ds.generate_datasets(to_txt=True)):
model_dir,tokenizer, config = finetuning_model(train_x,train_y,val_x, val_y,cv_fold=i, dataset_id=dataset_id,
model_dir = "../contrastive/resources/bert-base-uncased",is_mlm=True)
train_auc, val_auc, test_auc = finetuning_model(
train_x,train_y,val_x, val_y, test_x, test_y, cv_fold=i,dataset_id= dataset_id,tokenizer=tokenizer, config= config,
model_dir = model_dir,is_mlm=False)
rs.append((train_auc,val_auc,test_auc))
print("Train auc {:.3f}, val auc {:.3f}, Test auc {:.3f}".format(train_auc, val_auc, test_auc))
for x,y,z in rs:
print("Train auc {:.3f}, Val auc {:.3f}, Test auc {:.3f}".format(x,y,z))
print("avg auc is {:.3f}\t{:.3f}".format(np.mean([x[-1] for x in rs]), np.std([x[-1] for x in rs])))
#train_xgb(ds)
if __name__=="__main__":
parser = argparse.ArgumentParser(description='Train Classifier with mixup', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
# Data
parser.add_argument('--model_dir', type=str, default='H:\\contrast\\SimCSE-main\\SimCSE-main\\bert-base-uncased',help='the path to pretrained models')
parser.add_argument('--dataset_id', type=str, default='11',choices=['1','2','3','4','5','6','7','8','9','10','11'], help='Choose between 1-11.')
# MLM pretrain
parser.add_argument('--mlm_warmup_steps', default=1000, type=int, metavar='N', help='warmup steps (default: 1000)')
parser.add_argument('--mlm_learning_rate', type=float, default=2e-5)
parser.add_argument('--mlm_decay', type=float, default=0.1, help='weight decay (L2 penalty)')
parser.add_argument('--mlm_epochs', type=int, default=300, help='number of epochs to train')
parser.add_argument('--mlm_train_batch_size', type=int, default=32)
parser.add_argument('--mlm_eval_batch_size', type=int, default=32)
parser.add_argument('--mlm_logging_steps', default=10, type=int, metavar='N', help='logging frequency (default: 10)')
# text classification
parser.add_argument('--cls_epochs', type=int, default=300, help='number of epochs to train')
parser.add_argument('--cls_train_batch_size', type=int, default=32)
parser.add_argument('--cls_eval_batch_size', type=int, default=32)
parser.add_argument('--cls_warmup_steps', default=1000, type=int, metavar='N', help='warmup steps (default: 1000)')
parser.add_argument('--cls_decay', type=float, default=0.1, help='weight decay (L2 penalty)')
parser.add_argument('--cls_logging_steps', default=10, type=int, metavar='N', help='logging frequency (default: 10)')
parser.add_argument('--cls_learning_rate', type=float, default=2e-5)
# Optimization options
#parser.add_argument('--train', type=str, default='vanilla', choices=['vanilla', 'mixup', 'mixup_hidden', 'SRRS'], help='mixup layer')
# training
#parser.add_argument('--momentum', type=float, default=0.9)
#parser.add_argument('--schedule', type=int, nargs='+', default=[150, 225], help='decrease learning rate at these epochs')
#parser.add_argument('--gammas', type=float, nargs='+', default=[0.1, 0.1], help='LR is multiplied by gamma on schedule, number of gammas should be equal to schedule')
# Checkpoints
parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
parser.add_argument('--start_epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
# random seed
parser.add_argument('--seed', default=0, type=int, help='manual seed')
parser.add_argument('--add_name', type=str, default='')
parser.add_argument('--job_id', type=str, default='')
args = parser.parse_args()
ds = get_dataset(dataset_id=int(args.dataset_id))
rs = []
for i, (train_x, val_x, test_x, train_y, val_y, test_y) in enumerate(ds.generate_datasets(to_txt=True,with_title=True if args.dataset_id not in ['1','3'] else False)):
model_dir,tokenizer, config = finetuning_model(train_x, train_y, val_x, val_y,cv_fold=i, dataset_id=args.dataset_id,
model_dir = "hkunlp/T5_large_prefix_all_tasks_2upsample2",#bert-base-uncased,hkunlp/from_all_T5_large_prefix_sql2text2
is_mlm = True,
num_train_epochs=10, #args.mlm_epochs,10
per_device_train_batch_size=args.mlm_train_batch_size, # batch size per device during training
per_device_eval_batch_size=args.mlm_eval_batch_size, # batch size for evaluation
warmup_steps=args.mlm_warmup_steps, # number of warmup steps for learning rate scheduler
weight_decay=args.mlm_decay, # strength of weight decay
logging_steps=100,#20
seed=11,
learning_rate=4e-5,
metric_for_best_model=None,
config = None,
tokenizer = None,
model = None,
output_dir = None,
logging_dir = None,
return_model = False)
model_dir,tokenizer,config, trainer= finetuning_model(
train_x, train_y, val_x, val_y, cv_fold=i,dataset_id= args.dataset_id,tokenizer=tokenizer, config= config,
model_dir = model_dir,is_mlm=False, return_model=True)
test_encodings = tokenizer(test_x.tolist(), truncation=True, padding=True)
test_dataset = txtDataset(test_encodings, test_y.astype(np.longlong))
test_auc = trainer.evaluate(test_dataset)['eval_roc_auc']
rs.append(test_auc)
print("Test auc {:.3f}".format(test_auc))
print("avg auc is {:.3f}\t{:.3f}".format(np.mean(rs),np.std(rs)))
Expected behavior
fine-tuning BERT on MLM and classification tasks
The code does not seem to distinguish between devices.
loader = AutoLoader(task_name="lm", model_name="opt-1.3b-en")
self.wte = nn.Embedding(config.vocab_size, config.n_embd)
AttributeError: 'dict' object has no attribute 'vocab_size'
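The message indicates that the model constructor received a plain `dict` as `config` but reads it with attribute access (`config.vocab_size`). The sketch below reproduces that mismatch with the standard library only and shows one generic way to bridge it, by wrapping the dict in `types.SimpleNamespace`. The config values are toy numbers, and whether wrapping the config is the right fix inside FlagAI is an assumption; the underlying cause may instead be in how the loader builds the config for this model.

```python
from types import SimpleNamespace

# Toy values for illustration only; not the real OPT configuration.
config = {"vocab_size": 100, "n_embd": 16}

try:
    config.vocab_size  # attribute access on a dict fails
except AttributeError as err:
    message = str(err)

# Wrapping the dict exposes its keys as attributes.
ns_config = SimpleNamespace(**config)
print(message)
print(ns_config.vocab_size, ns_config.n_embd)
```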