
bert4torch's Introduction

bert4torch


Documentation | Torch4keras | Examples | build_MiniLLM_from_scratch

Table of Contents

1. Installation

Install the stable release

pip install bert4torch

Install the latest (development) version

pip install git+https://github.com/Tongjilibo/bert4torch
  • Note: the pip release lags behind the development version on git; if you git clone, mind the import path and check whether the weights need conversion
  • Test cases: git clone https://github.com/Tongjilibo/bert4torch, then adjust the pretrained-model and data paths in the examples to run the scripts
  • Training on your own data: modify the corresponding data-processing code blocks
  • Development environment: originally developed on torch==1.10, now developed on torch 2.0; feedback is welcome if other versions turn out to be incompatible

2. Features

  • LLM models: load open-source LLM weights such as chatglm, llama, baichuan, ziya and bloom for inference and fine-tuning

  • Core features: load pretrained weights for bert, roberta, albert, xlnet, nezha, bart, RoFormer, RoFormer_V2, ELECTRA, GPT, GPT2, T5, GAU-alpha, ERNIE and more for further finetuning, and flexibly define your own model on top of bert

  • Rich examples: solutions for llm, pretrain, sentence_classfication, sentence_embedding, sequence_labeling, relation_extraction, seq2seq, serving and more

  • Experimentally verified: validated on public datasets, using the datasets listed in examples

  • Easy-to-use tricks: common tricks are integrated and plug-and-play

  • Other features: works together with models from the transformers library; concise and efficient API; dynamic training progress bar; parameter counts via torchinfo; built-in Logger and Tensorboard for recording the training process; customizable fit loop for advanced needs

  • Training output

    2022-10-28 23:16:10 - Start Training
    2022-10-28 23:16:10 - Epoch: 1/2
    5000/5000 [==============================] - 13s 3ms/step - loss: 0.1351 - acc: 0.9601
    Evaluate: 100%|██████████████████████████████████████████████████| 2500/2500 [00:03<00:00, 798.09it/s] 
    test_acc: 0.98045. best_test_acc: 0.98045
    
    2022-10-28 23:16:27 - Epoch: 2/2
    5000/5000 [==============================] - 13s 3ms/step - loss: 0.0465 - acc: 0.9862
    Evaluate: 100%|██████████████████████████████████████████████████| 2500/2500 [00:03<00:00, 635.78it/s] 
    test_acc: 0.98280. best_test_acc: 0.98280
    
    2022-10-28 23:16:44 - Finish Training
    
Feature bert4torch transformers Notes
Training progress bar: the progress bar prints loss and user-defined metrics
Distributed training (dp/ddp): uses torch's native dp/ddp
Various callbacks: logging / tensorboard / early stopping / wandb, etc.
LLM inference with stream/batch output: shared across models, no per-model scripts to maintain
LLM fine-tuning: lora relies on the peft library, p-tuning-v2 is built in
Rich tricks: adversarial training and other tricks, plug and play
Concise, readable code with room for customization: high code reuse, keras-style training
Repo maintenance / influence / usage / compatibility: currently maintained by a single person

3. Quick Start
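
A minimal end-to-end sketch of the keras-style workflow (condensed from the sentiment-classification example quoted later in this page; the paths under ./pretrain_ckpt/ are placeholders you need to point at your own files, and the hidden size 768 assumes a bert-base checkpoint):

from bert4torch.tokenizers import Tokenizer
from bert4torch.models import build_transformer_model, BaseModel
import torch
import torch.nn as nn
import torch.optim as optim

# placeholder paths: point these at your own pretrained weights and vocab
config_path = './pretrain_ckpt/bert-base-chinese/config.json'
checkpoint_path = './pretrain_ckpt/bert-base-chinese/pytorch_model.bin'
dict_path = './pretrain_ckpt/bert-base-chinese/vocab.txt'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = Tokenizer(dict_path, do_lower_case=True)

class Model(BaseModel):
    def __init__(self):
        super().__init__()
        # with_pool=True also returns the pooled [CLS] vector
        self.bert = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, with_pool=True)
        self.dense = nn.Linear(768, 2)  # 768 = hidden size of bert-base

    def forward(self, token_ids, segment_ids):
        _, pooled_output = self.bert([token_ids, segment_ids])
        return self.dense(pooled_output)

model = Model().to(device)

# keras-style compile/fit provided by torch4keras
model.compile(loss=nn.CrossEntropyLoss(),
              optimizer=optim.Adam(model.parameters(), lr=2e-5),
              metrics=['accuracy'])

# train_dataloader is a normal torch DataLoader whose collate_fn yields ([token_ids, segment_ids], labels),
# as in the full example later in this page; uncomment once it is defined:
# model.fit(train_dataloader, epochs=2, steps_per_epoch=None)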

4. Versions and Update History

4.1 Version History

Update date bert4torch torch4keras Release notes
20240418 0.5.0 0.2.2 fix a chatglm3 bug, fix a multi-file bug in save_pretrained, add CausalLMLoss, adjust how deepspeed arguments are passed, fix a Text2Vec bug, improve the openai client, add get_weight_decay_optim_groups
20240317 0.4.9.post2 0.2.1.post2 add the get_weight_decay_optim_groups function, allow is_causal in attention, fix a repetition_penalty bug, split baichuan out of llama, fix a config_path bug, allow the num_key_value_heads parameter, torch4keras-v0.2.1.post2 new features
20240221 0.4.8 0.2.0 the fastapi service can offload to cpu when idle, build_transformer_model can download from hf, add a FillMask pipeline, add SequenceClassificationTrainer

More versions

4.2 Update History

More history

5. Pretrained Weights

  • Pretrained models can be loaded in several ways
from bert4torch.models import build_transformer_model

# 1. config_path only: initialize the model structure from scratch, without loading pretrained weights
model = build_transformer_model('./model/bert4torch_config.json')

# 2. checkpoint_path only:
## 2.1 a directory path: automatically looks for *.bin/*.safetensors weight files plus a bert4torch_config.json/config.json file under it
model = build_transformer_model(checkpoint_path='./model')

## 2.2 a file path/list of paths: the path(s) point to the weight file(s); the config is looked up in the same directory
model = build_transformer_model(checkpoint_path='./pytorch_model.bin')

## 2.3 a model_name: name of pretrained weights on hf; downloads the hf weights and the bert4torch_config.json file automatically
model = build_transformer_model(checkpoint_path='bert-base-chinese')

# 3. both config_path and checkpoint_path (any combination of local path and model_name):
config_path = './model/bert4torch_config.json'  # or 'bert-base-chinese'
checkpoint_path = './model/pytorch_model.bin'  # or 'bert-base-chinese'
model = build_transformer_model(config_path, checkpoint_path)
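
A short usage sketch to follow the loading options above: encode a sentence with the matching vocab and run one forward pass. The paths are placeholders, and the two outputs shown assume with_pool=True (sequence output plus pooled [CLS] vector), as in the examples later in this page:

import torch
from bert4torch.models import build_transformer_model
from bert4torch.tokenizers import Tokenizer

config_path = './model/bert4torch_config.json'   # placeholder
checkpoint_path = './model/pytorch_model.bin'    # placeholder
dict_path = './model/vocab.txt'                  # placeholder

tokenizer = Tokenizer(dict_path, do_lower_case=True)
model = build_transformer_model(config_path, checkpoint_path, with_pool=True)

token_ids, segment_ids = tokenizer.encode('今天天气不错')
token_ids = torch.tensor([token_ids], dtype=torch.long)
segment_ids = torch.tensor([segment_ids], dtype=torch.long)

with torch.no_grad():
    sequence_output, pooled_output = model([token_ids, segment_ids])
print(sequence_output.shape, pooled_output.shape)  # [1, seq_len, hidden], [1, hidden]
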
Model category Model name Weight source Weight link/checkpoint_path config_path
bert bert-base-chinese google-bert bert-base-chinese bert-base-chinese
chinese_L-12_H-768_A-12 Google github, tf, Tongjilibo/bert-chinese_L-12_H-768_A-12
chinese-bert-wwm-ext HFL github, hfl/chinese-bert-wwm-ext chinese-bert-wwm-ext
bert-base-multilingual-cased google-bert bert-base-multilingual-cased bert-base-multilingual-cased
macbert HFL github, hfl/chinese-macbert-base, hfl/chinese-macbert-large chinese-macbert-base, chinese-macbert-large
wobert Zhuiyi Technology github, junnyu/wobert_chinese_base, junnyu/wobert_chinese_plus_base wobert_chinese_base, wobert_chinese_plus_base
roberta chinese-roberta-wwm-ext HFL github, hfl/chinese-roberta-wwm-ext, hfl/chinese-roberta-wwm-ext-large chinese-roberta-wwm-ext, chinese-roberta-wwm-ext-large
roberta-small/tiny Zhuiyi Technology github, Tongjilibo/chinese_roberta_L-4_H-312_A-12, Tongjilibo/chinese_roberta_L-6_H-384_A-12
roberta-base FacebookAI roberta-base roberta-base
guwenbert ethanyt ethanyt/guwenbert-base guwenbert-base
albert albert brightmart github, torch, voidful/albert_chinese_tiny, voidful/albert_chinese_small, voidful/albert_chinese_base, voidful/albert_chinese_large, voidful/albert_chinese_xlarge, voidful/albert_chinese_xxlarge albert_chinese_tiny, albert_chinese_small, albert_chinese_base, albert_chinese_large, albert_chinese_xlarge, albert_chinese_xxlarge
nezha NEZHA Huawei github, torch, sijunhe/nezha-cn-base, sijunhe/nezha-cn-large, sijunhe/nezha-base-wwm, sijunhe/nezha-large-wwm nezha-cn-base, nezha-cn-large, nezha-base-wwm, nezha-large-wwm
nezha_gpt_dialog bojone github, Tongjilibo/nezha_gpt_dialog
xlnet chinese-xlnet HFL github, hfl/chinese-xlnet-base chinese-xlnet-base
transformer_xl huggingface transfo-xl/transfo-xl-wt103 transfo-xl-wt103
deberta Erlangshen-DeBERTa-v2 IDEA IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese, IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese, IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese Erlangshen-DeBERTa-v2-97M-Chinese, Erlangshen-DeBERTa-v2-320M-Chinese, Erlangshen-DeBERTa-v2-710M-Chinese
electra Chinese-ELECTRA HFL github, hfl/chinese-electra-base-discriminator chinese-electra-base-discriminator
ernie ernie Baidu ERNIE paddle, nghuyong/ernie-1.0-base-zh, nghuyong/ernie-3.0-base-zh ernie-1.0-base-zh, ernie-3.0-base-zh
roformer roformer Zhuiyi Technology github, junnyu/roformer_chinese_base roformer_chinese_base
roformer_v2 Zhuiyi Technology github, junnyu/roformer_v2_chinese_char_base roformer_v2_chinese_char_base
simbert simbert Zhuiyi Technology github, Tongjilibo/simbert-chinese-base, Tongjilibo/simbert-chinese-small, Tongjilibo/simbert-chinese-tiny
simbert_v2/roformer-sim Zhuiyi Technology github, junnyu/roformer_chinese_sim_char_base, junnyu/roformer_chinese_sim_char_ft_base, junnyu/roformer_chinese_sim_char_small, junnyu/roformer_chinese_sim_char_ft_small roformer_chinese_sim_char_base, roformer_chinese_sim_char_ft_base, roformer_chinese_sim_char_small, roformer_chinese_sim_char_ft_small
gau GAU-alpha Zhuiyi Technology github, Tongjilibo/chinese_GAU-alpha-char_L-24_H-768
uie uie Baidu github, torch, Tongjilibo/uie-base
gpt CDial-GPT thu-coai github, thu-coai/CDial-GPT_LCCC-base, thu-coai/CDial-GPT_LCCC-large CDial-GPT_LCCC-base, CDial-GPT_LCCC-large
cpm_lm (2.6B) Tsinghua github, TsinghuaAI/CPM-Generate CPM-Generate
nezha_gen huawei_noah github, Tongjilibo/chinese_nezha_gpt_L-12_H-768_A-12
gpt2-chinese-cluecorpussmall UER uer/gpt2-chinese-cluecorpussmall gpt2-chinese-cluecorpussmall
gpt2-ml imcaspar tf, torch, BaiduYun(84dh) gpt2-ml_15g_corpus, gpt2-ml_30g_corpus
bart bart_base_chinese Fudan fnlp github, v1.0, fnlp/bart-base-chinese bart-base-chinese, bart-base-chinese-v1.0
t5 t5 UER uer/t5-small-chinese-cluecorpussmall, uer/t5-base-chinese-cluecorpussmall t5-base-chinese-cluecorpussmall, t5-small-chinese-cluecorpussmall
mt5 Google google/mt5-base mt5-base
t5_pegasus Zhuiyi Technology github, Tongjilibo/chinese_t5_pegasus_small, Tongjilibo/chinese_t5_pegasus_base
chatyuan v1&v2 clue-ai github, ClueAI/ChatYuan-large-v1, ClueAI/ChatYuan-large-v2 ChatYuan-large-v1, ChatYuan-large-v2
PromptCLUE clue-ai github, ClueAI/PromptCLUE-base PromptCLUE-base
chatglm chatglm-6b THUDM github, THUDM/chatglm-6b, THUDM/chatglm-6b-int8, THUDM/chatglm-6b-int4, v0.1.0 chatglm-6b, chatglm-6b-int8, chatglm-6b-int4, chatglm-6b-v0.1.0
chatglm2-6b THUDM github, THUDM/chatglm2-6b, THUDM/chatglm2-6b-int4, THUDM/chatglm2-6b-32k chatglm2-6b, chatglm2-6b-int4, chatglm2-6b-32k
chatglm3-6b THUDM github, THUDM/chatglm3-6b, THUDM/chatglm3-6b-32k chatglm3-6b, chatglm3-6b-32k
llama llama meta github llama-7b, llama-13b
llama-2 meta github, meta-llama/Llama-2-7b-hf, meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-13b-hf, meta-llama/Llama-2-13b-chat-hf Llama-2-7b-hf, Llama-2-7b-chat-hf, Llama-2-13b-hf, Llama-2-13b-chat-hf
llama-3 meta github, meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-8B-Instruct Meta-Llama-3-8B, Meta-Llama-3-8B-Instruct
chinese_llama_alpaca HFL github chinese_alpaca_plus_7b, chinese_llama_plus_7b
Belle_llama LianjiaTech github, BelleGroup/BELLE-LLaMA-7B-2M-enc, merge instructions BELLE-LLaMA-7B-2M-enc
Ziya IDEA-CCNL IDEA-CCNL/Ziya-LLaMA-13B-v1, IDEA-CCNL/Ziya-LLaMA-13B-v1.1, IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1 Ziya-LLaMA-13B-v1, Ziya-LLaMA-13B-v1.1
Baichuan baichuan-inc github, baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Base, baichuan-inc/Baichuan-13B-Chat Baichuan-7B, Baichuan-13B-Base, Baichuan-13B-Chat
Baichuan2 baichuan-inc github, baichuan-inc/Baichuan2-7B-Base, baichuan-inc/Baichuan2-7B-Chat, baichuan-inc/Baichuan2-13B-Base, baichuan-inc/Baichuan2-13B-Chat Baichuan2-7B-Base, Baichuan2-7B-Chat, Baichuan2-13B-Base, Baichuan2-13B-Chat
vicuna lmsys lmsys/vicuna-7b-v1.5 vicuna-7b-v1.5
Yi 01-ai github, 01-ai/Yi-6B, 01-ai/Yi-6B-200K Yi-6B, Yi-6B-200K
bloom bloom bigscience bigscience/bloom-560m, bigscience/bloomz-560m bloom-560m, bloomz-560m
Qwen Qwen Alibaba Cloud github, Qwen/Qwen-1_8B, Qwen/Qwen-1_8B-Chat, Qwen/Qwen-7B, Qwen/Qwen-7B-Chat Qwen-1_8B, Qwen-1_8B-Chat, Qwen-7B, Qwen-7B-Chat
InternLM InternLM Shanghai AI Laboratory github, internlm/internlm-chat-7b, internlm/internlm-7b internlm-7b, internlm-chat-7b
Falcon Falcon tiiuae hf, tiiuae/falcon-rw-1b, tiiuae/falcon-7b, tiiuae/falcon-7b-instruct falcon-rw-1b, falcon-7b, falcon-7b-instruct
moe deepseek-moe deepseek github, deepseek-ai/deepseek-moe-16b-base, deepseek-ai/deepseek-moe-16b-chat deepseek-moe-16b-base, deepseek-moe-16b-chat
embedding text2vec-base-chinese shibing624 shibing624/text2vec-base-chinese text2vec-base-chinese
m3e moka-ai moka-ai/m3e-base m3e-base
bge BAAI BAAI/bge-large-en-v1.5, BAAI/bge-large-zh-v1.5 bge-large-en-v1.5, bge-large-zh-v1.5
gte thenlper thenlper/gte-large-zh, thenlper/gte-base-zh gte-base-zh, gte-large-zh

*Notes:

  1. Highlighted entries (e.g. bert-base-chinese) can be downloaded directly over the network by build_transformer_model()
  2. Speed up downloads via a mirror site inside China:
    • HF_ENDPOINT=https://hf-mirror.com python your_script.py
    • export HF_ENDPOINT=https://hf-mirror.com, then run your python code
    • or set it at the top of your python code as follows
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

6. Acknowledgements

  • Thanks to Su Jianlin for bert4keras; this implementation borrows from the bert4keras source in many places, and I sincerely thank him for his generous contribution;
  • Thanks also to the bert4pytorch project, which inspired the idea and approach of reimplementing bert4keras in pytorch.

7. Citation

@misc{bert4torch,
  title={bert4torch},
  author={Bo Li},
  year={2022},
  howpublished={\url{https://github.com/Tongjilibo/bert4torch}},
}

8. Other

  • WeChat & Star History Chart
    [images: WeChat ID, WeChat group, Star History Chart]

bert4torch's People

Contributors

nejweka407, nuass, skykiseki, tongjilibo


bert4torch's Issues

t5_pegasus_small support issue

Hi, I want to use t5_pegasus_small for a seq2seq task. The config.json file is as follows:
{
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 512,
"initializer_range": 0.02,
"intermediate_size": 1024,
"num_attention_heads": 6,
"attention_head_size": 64,
"num_hidden_layers": 8,
"vocab_size": 50000,
"hidden_act": "gelu",
"relative_attention_num_buckets": 32
}
But at runtime the layer module raises `assert hidden_size % num_attention_heads == 0`. Is something wrong with the config file?

A small question

In the relation-extraction model casrel, if I add extra network components to the model structure and use them in def forward(self, inputs), do I also need to add them under class Model(BaseModel) in def predict_subject(self, inputs) and def predict_object(self, inputs)?
[screenshots]

Averaging the output of the model's last few layers

I would like to use the average of the model's last few layers as the output. Can bert4torch do this at the moment?
I looked through the model definition but could not find a way.
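
Not an official answer, just a sketch of one common way to do this, assuming your bert4torch version can return all encoder layers (the output_all_encoded_layers=True argument below is an assumption; check the build_transformer_model signature of your version):

import torch
from bert4torch.models import build_transformer_model, BaseModel

class LastLayersMeanModel(BaseModel):
    def __init__(self, config_path, checkpoint_path, num_last_layers=4):
        super().__init__()
        # assumption: this flag makes the forward return a list with every layer's output
        self.bert = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path,
                                            output_all_encoded_layers=True)
        self.num_last_layers = num_last_layers

    def forward(self, token_ids, segment_ids):
        all_layers = self.bert([token_ids, segment_ids])           # list of [btz, seq_len, hidden]
        stacked = torch.stack(all_layers[-self.num_last_layers:])  # [n, btz, seq_len, hidden]
        return stacked.mean(dim=0)                                 # average of the last few layers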

Traceback (most recent call last):
  File "task_sentiment_classification_hierarchical_position.py", line 117, in <module>
    model.fit(train_dataloader, epochs=10, steps_per_epoch=None, callbacks=[evaluator])
  File "/home/huangjiaheng/.local/lib/python3.8/site-packages/bert4torch/models.py", line 274, in fit
    loss.backward(retain_graph=retain_graph)
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: cuda runtime error (710) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/THC/generic/THCTensorMath.cu:29

basic_masked_language_model.py

In this example,
input: [CLS]科学[MASK][MASK]是第一生产力[SEP]
the predicted result is ",," (two commas) rather than "技术".
The model used is bert-base-chinese from the hugging face model hub.
A large number of warnings appear while the model is loading:
[screenshot]
What is the problem here?

Virtual adversarial training

Hi, I tried adding adversarial training to the model; after adding it, the F1 score stays at 0.

Error when running t5-pegasus

[screenshot]
Hi, I converted the weights to pytorch_model.bin with the convert_t5_pegasus script you provided, and reused the base version's config.json, which gave the error above. After changing "hidden_act": ["gelu", "linear"] to "hidden_act": "gelu" I get the error below:
[screenshot]
Looking forward to your reply, thanks!

RuntimeError: masked_select: expected BoolTensor or ByteTensor for mask

Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/bert_crf/train.py", line 191, in
    model.fit(train_dataloader, epochs=20, steps_per_epoch=None, callbacks=[evaluator])
  File "D:\python36\lib\site-packages\bert4torch\models.py", line 213, in fit
    output, loss, loss_detail = self.train_step(train_X, train_y, grad_accumulation_steps)
  File "D:\python36\lib\site-packages\bert4torch\models.py", line 131, in train_step
    loss_detail = self.criterion(output, train_y)
  File "D:\python36\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:/Users/Administrator/Desktop/bert_crf/train.py", line 122, in forward
    return model.crf.neg_log_likelihood_loss(*outputs, labels)
  File "D:\python36\lib\site-packages\bert4torch\layers.py", line 913, in neg_log_likelihood_loss
    forward_score, scores = self._forward_alg(feats, mask)
  File "D:\python36\lib\site-packages\bert4torch\layers.py", line 862, in _forward_alg
    masked_cur_partition = cur_partition.masked_select(mask_idx)  # [x * tag_size]

Hi, could you provide prediction code? For example, sentiment prediction on unlabeled text, or entity prediction on text without annotated entities.

For this code case, for instance:
Could you provide inference code for predicting on new text?

#! -*- coding:utf-8 -*-
# Sentiment classification task, loading bert weights
# valid_acc: 94.72, test_acc: 94.11

from bert4torch.tokenizers import Tokenizer
from bert4torch.models import build_transformer_model, BaseModel
from bert4torch.snippets import sequence_padding, Callback, text_segmentate, ListDataset
import torch.nn as nn
import torch
import torch.optim as optim
import random, os, numpy as np
from torch.utils.data import DataLoader
from tensorboardX import SummaryWriter

maxlen = 256
batch_size = 16
config_path = 'F:/Projects/pretrain_ckpt/bert/[google_tf_base]--chinese_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'F:/Projects/pretrain_ckpt/bert/[google_tf_base]--chinese_L-12_H-768_A-12/pytorch_model.bin'
dict_path = 'F:/Projects/pretrain_ckpt/bert/[google_tf_base]--chinese_L-12_H-768_A-12/vocab.txt'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
writer = SummaryWriter(log_dir='./summary')  # prepare summary writer

# Fix the random seed
seed = 42
random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)

# Build the tokenizer
tokenizer = Tokenizer(dict_path, do_lower_case=True)

# Dataset definition
class MyDataset(ListDataset):
    @staticmethod
    def load_data(filenames):
        """Load the data, splitting it into sentences no longer than maxlen where possible
        """
        D = []
        seps, strips = u'\n。!?!?;;,, ', u';;,, '
        for filename in filenames:
            with open(filename, encoding='utf-8') as f:
                for l in f:
                    text, label = l.strip().split('\t')
                    for t in text_segmentate(text, maxlen - 2, seps, strips):
                        D.append((t, int(label)))
        return D

def collate_fn(batch):
    batch_token_ids, batch_segment_ids, batch_labels = [], [], []
    for text, label in batch:
        token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen)
        batch_token_ids.append(token_ids)
        batch_segment_ids.append(segment_ids)
        batch_labels.append([label])

    batch_token_ids = torch.tensor(sequence_padding(batch_token_ids), dtype=torch.long, device=device)
    batch_segment_ids = torch.tensor(sequence_padding(batch_segment_ids), dtype=torch.long, device=device)
    batch_labels = torch.tensor(batch_labels, dtype=torch.long, device=device)
    return [batch_token_ids, batch_segment_ids], batch_labels.flatten()

# Load the datasets
train_dataloader = DataLoader(MyDataset(['E:/Github/bert4torch/examples/datasets/sentiment/sentiment.train.data']), batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
valid_dataloader = DataLoader(MyDataset(['E:/Github/bert4torch/examples/datasets/sentiment/sentiment.valid.data']), batch_size=batch_size, collate_fn=collate_fn)
test_dataloader = DataLoader(MyDataset(['E:/Github/bert4torch/examples/datasets/sentiment/sentiment.test.data']), batch_size=batch_size, collate_fn=collate_fn)

# Define the model structure on top of bert
class Model(BaseModel):
    def __init__(self) -> None:
        super().__init__()
        self.bert, self.config = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, with_pool=True, return_model_config=True)
        self.dropout = nn.Dropout(0.1)
        self.dense = nn.Linear(self.config['hidden_size'], 2)

    def forward(self, token_ids, segment_ids):
        _, pooled_output = self.bert([token_ids, segment_ids])
        output = self.dropout(pooled_output)
        output = self.dense(output)
        return output

model = Model().to(device)

# Define the loss and optimizer to use; custom ones are supported here
model.compile(
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=2e-5),  # use a sufficiently small learning rate
    metrics=['accuracy']
)

# Define the evaluation function
def evaluate(data):
    total, right = 0., 0.
    for x_true, y_true in data:
        y_pred = model.predict(x_true).argmax(axis=1)
        total += len(y_true)
        right += (y_true == y_pred).sum().item()
    return right / total

class Evaluator(Callback):
    """Evaluate and save the model
    """
    def __init__(self):
        self.best_val_acc = 0.

    # def on_batch_end(self, global_step, batch, logs=None):
    #     if global_step % 10 == 0:
    #         writer.add_scalar(f"train/loss", logs['loss'], global_step)
    #         val_acc = evaluate(valid_dataloader)
    #         writer.add_scalar(f"valid/acc", val_acc, global_step)

    def on_epoch_end(self, global_step, epoch, logs=None):
        val_acc = evaluate(valid_dataloader)
        test_acc = evaluate(test_dataloader)
        if val_acc > self.best_val_acc:
            self.best_val_acc = val_acc
            # model.save_weights('best_model.pt')
        print(f'val_acc: {val_acc:.5f}, test_acc: {test_acc:.5f}, best_val_acc: {self.best_val_acc:.5f}\n')

if __name__ == '__main__':
    evaluator = Evaluator()
    model.fit(train_dataloader, epochs=10, steps_per_epoch=None, callbacks=[evaluator])
else:
    model.load_weights('best_model.pt')
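
For reference, a small inference sketch assembled only from pieces already in the script above (tokenizer.encode, sequence_padding, model.predict); it assumes the weights were saved by the model.save_weights('best_model.pt') line and is not an official example:

def predict_texts(texts):
    """Predict sentiment labels for a list of raw texts with the model defined above."""
    batch_token_ids, batch_segment_ids = [], []
    for text in texts:
        token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen)
        batch_token_ids.append(token_ids)
        batch_segment_ids.append(segment_ids)
    batch_token_ids = torch.tensor(sequence_padding(batch_token_ids), dtype=torch.long, device=device)
    batch_segment_ids = torch.tensor(sequence_padding(batch_segment_ids), dtype=torch.long, device=device)
    with torch.no_grad():
        logits = model.predict([batch_token_ids, batch_segment_ids])
    return logits.argmax(axis=1).cpu().numpy()

# model.load_weights('best_model.pt')  # assumes save_weights was uncommented during training
# print(predict_texts(['这家店的服务很好', '质量太差了']))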

BERT-MRC: P, R and F1 stay at 0

Epoch 1/50
2000/2000 [==============================] - 534s 267ms/step - loss: 0.0394
Evaluation: 100%|██████████| 1159/1159 [01:34<00:00, 12.25it/s]
[val] f1: 0.00000, p: 0.00000 r: 0.00000

basic_language_model_nezha_gen_gpt.py

Is the pytorch_model.bin used in this module obtained by converting Su's tf chinese_nezha_gpt_L-12_H-768_A-12 model with the convert_nezha_gpt_dialog.py script?

Original data

Hi, could you share the original datasets you used for training and testing? I mainly want to understand how you reached such high accuracy, thanks!

A possible small issue

At line 55 of layers.py, cond = cond.unsqueeze(dim=1); shouldn't dim be 0 here? Debugging suggests it is not quite right, please confirm.

Where was class CRF(nn.Module) ported from?

Could you share a link? I would like to look at the source it was ported from.
Mainly I want to know why the transition matrix adds 2 here:
init_transitions = torch.zeros(self.num_labels + 2, self.num_labels + 2)
I have looked at other implementations and this is the first time I have seen +2. Your code comments say it adds start and end tags, but I do not know what problem adding them solves.

How do I use a custom acc/loss?

For example:
How do I pass a custom function as the model's metric, just like in keras?

How should the following be modified?
def accuracy(y_pred, y_true):
    y_pred = torch.where(y_pred > 0.5, torch.ones_like(y_pred, dtype=torch.float32),
                         torch.zeros_like(y_pred, dtype=torch.float32))
    acc = torch.mean(1 - torch.abs(y_true - y_pred))
    return acc

model.compile(
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=2e-5),
    metrics={LogLoss}
)
[screenshot]
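
Not a confirmed answer, only a sketch of the pattern usually intended here: write the metric as a plain function of (y_pred, y_true) and hand it to compile. Whether metrics accepts a mapping of display name to callable, as assumed below, depends on your torch4keras/bert4torch version, so verify against its documentation:

import torch
import torch.nn as nn
import torch.optim as optim

def accuracy(y_pred, y_true):
    # threshold the predictions at 0.5 and compare with the labels
    y_pred = (y_pred > 0.5).float()
    return torch.mean(1 - torch.abs(y_true - y_pred))

# model is the BaseModel instance defined elsewhere;
# assumption: metrics can be given as {name: callable}
model.compile(
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=2e-5),
    metrics={'my_acc': accuracy},
)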

About gradient-checkpointing support

Hi!

Many thanks for writing this torch framework. Gradient checkpointing is a training technique that saves GPU memory, which helps a lot when training large models under tight resources. Su's blog covers it, and huggingface transformers has built-in support. Could this feature be added later?
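
For context, the generic PyTorch mechanism behind this request is torch.utils.checkpoint; a minimal sketch of wrapping arbitrary layers with it (plain PyTorch shown here, not an existing bert4torch switch):

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedEncoder(nn.Module):
    """Runs each sub-layer under gradient checkpointing: activations are recomputed
    during backward instead of being stored, trading compute for memory."""
    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.training:
                # use_reentrant=False needs a recent PyTorch; drop it on older versions
                hidden_states = checkpoint(layer, hidden_states, use_reentrant=False)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states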

A detail question about RoPE

class RoPEPositionEncoding(nn.Module):
    """旋转式位置编码: https://kexue.fm/archives/8265
    """
    def __init__(self, max_position, embedding_size):
        super(RoPEPositionEncoding, self).__init__()
        position_embeddings = get_sinusoid_encoding_table(max_position, embedding_size)  # [seq_len, hdsz]
        # cos_position = position_embeddings[:, 1::2].repeat(1, 2) 
        # sin_position = position_embeddings[:, ::2].repeat(1, 2)
        cos_position = position_embeddings[:, 1::2].repeat_interleave(2, dim=-1)  # after the change
        sin_position = position_embeddings[:, ::2].repeat_interleave(2, dim=-1)  # after the change
        self.register_buffer('cos_position', cos_position)
        self.register_buffer('sin_position', sin_position)
    
    def forward(self, qw, seq_len_dim=1):
        dim = len(qw.shape)
        assert (dim >= 2) and (dim <= 4), 'Input units should >= 2 dims(seq_len and hdsz) and usually <= 4 dims'
        seq_len = qw.shape[seq_len_dim]
        # qw2 = torch.cat([-qw[..., 1::2], qw[..., ::2]], dim=-1)
        qw2 = torch.stack([-qw[..., 1::2], qw[..., ::2]], dim=-1).reshape_as(qw)  # after the change

        if dim == 2:
            return qw * self.cos_position[:seq_len] + qw2 * self.sin_position[:seq_len]
        if dim == 3:
            return qw * self.cos_position[:seq_len].unsqueeze(0) + qw2 * self.sin_position[:seq_len].unsqueeze(0)
        else:
            return qw * self.cos_position[:seq_len].unsqueeze(0).unsqueeze(2) + qw2 * self.sin_position[:seq_len].unsqueeze(0).unsqueeze(2)

Hi, is there a small issue in bert4torch's RoPE implementation? According to Su's blog post, it should be the modified code above, right?
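
For reference, the interleaved rotary formulation from Su's post that both variants aim at is, for position m and each adjacent dimension pair (2i, 2i+1):

q'_{m,2i}   = q_{m,2i}\cos(m\theta_i) - q_{m,2i+1}\sin(m\theta_i)
q'_{m,2i+1} = q_{m,2i+1}\cos(m\theta_i) + q_{m,2i}\sin(m\theta_i), \qquad \theta_i = 10000^{-2i/d}

In this layout adjacent dimensions share the same angle \theta_i, which is the pairing that repeat_interleave (rather than repeat) produces.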

How do I load a huggingface Bert model? Bo, please help

I am running task_sequence_labeling_ner_global_pointer.py.
In my code I only changed the path to the bert I downloaded (I used an absolute path) and added model="bert" on the line that defines bert.

config_path = '/mnt/hdd0/lsn/bert/bert-base-uncased/config.json'
checkpoint_path = '/mnt/hdd0/lsn/bert/bert-base-uncased/pytorch_model.bin'
dict_path = '/mnt/hdd0/lsn/bert/bert-base-uncased/vocab.txt'

self.bert = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='bert', segment_vocab_size=0)

After running it, it prints a pile of [warning]s like the following:
[WARNIMG] bert.embeddings.LayerNorm.weight not found in pretrain models
[WARNIMG] bert.embeddings.LayerNorm.bias not found in pretrain models
[WARNIMG] bert.encoder.layer.0.attention.output.LayerNorm.weight not found in pretrain models
[WARNIMG] bert.encoder.layer.0.attention.output.LayerNorm.bias not found in pretrain models
[WARNIMG] bert.encoder.layer.0.output.LayerNorm.weight not found in pretrain models
[WARNIMG] bert.encoder.layer.0.output.LayerNorm.bias not found in pretrain models

Convergence issue during model training

A question for you, Bo.
I compared a few tasks, e.g. classification, sequence labeling and text generation, using bert4torch versus hugging face's tokenizer and model loading.

The hugging face version reaches a fairly good result after five or six epochs,
while bert4torch needs more than 20 epochs to get there,
and the final evaluation score of the hugging face version is about 1 to 2 points higher.
Comparing the code, I have not found the reason yet. Quite puzzled.

Model results are not great, not sure what the problem is

Hi, running the People's Daily data from your examples on task_sequence_labeling_ner_W2NER, the results are not good and I do not know what went wrong: the F1 on the validation set is only a little over 0.90. I am on Windows with a 3090, torch=1.11.3, bert4torch=0.2.2, Python=3.8

[val-token  level] f1: 0.91029, p: 0.89145 r: 0.93052
[val-entity level] f1: 0.90059, p: 0.91598 r: 0.88571 best_f1: 0.90154
============Finish Training=============

evaluate is fast the first time and very slow afterwards

Hi Bo, why is the first run of self.evaluate(valid_dataset.data[:valid_len]) on the validation set fast, while the second run on the same validation set is only about one twentieth of that speed? The evaluate code is as follows:
def evaluate(self, data):
    total = 0
    rouge_1, rouge_2, rouge_l, bleu = 0, 0, 0, 0
    for title, content in tqdm(data):
        total += 1
        title = ' '.join(title).lower()
        # with torch.no_grad():
        pred_title = ' '.join(autosumm.generate(content)).lower()
        if pred_title.strip():
            scores = self.rouge.get_scores(hyps=pred_title, refs=title)
            rouge_1 += scores[0]['rouge-1']['f']
            rouge_2 += scores[0]['rouge-2']['f']
            rouge_l += scores[0]['rouge-l']['f']
            bleu += sentence_bleu(references=[title.split(' ')], hypothesis=pred_title.split(' '),
                                  smoothing_function=self.smooth)
    rouge_1, rouge_2, rouge_l, bleu = rouge_1 / total, rouge_2 / total, rouge_l / total, bleu / total
    return {'rouge-1': rouge_1, 'rouge-2': rouge_2, 'rouge-l': rouge_l, 'bleu': bleu}
[screenshot]
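
Not a diagnosis of this particular case, but one generic thing to rule out with this pattern is autograd staying enabled during generation (note the commented-out torch.no_grad above), since that keeps graphs alive and costs memory and speed; a sketch of the usual change, everything else as in the snippet above:

import torch

def evaluate(self, data):
    total = 0
    rouge_1, rouge_2, rouge_l, bleu = 0, 0, 0, 0
    for title, content in tqdm(data):
        total += 1
        title = ' '.join(title).lower()
        with torch.no_grad():  # keep generation out of the autograd graph
            pred_title = ' '.join(autosumm.generate(content)).lower()
        # ... rouge/bleu accumulation exactly as in the snippet above ...
    return {'rouge-1': rouge_1 / total, 'rouge-2': rouge_2 / total,
            'rouge-l': rouge_l / total, 'bleu': bleu / total}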

Sharing a few problems I ran into, and how Bo helped resolve them

1. When testing basic_language_model_CDial_GPT.py, the generated text was gibberish.
[screenshot]
Fix: change probas to logits in @AutoRegressiveDecoder.wraps(default_rtype='probas').
2. Testing basic_language_model_nezha_gpt_dialog.py raised an error.
[screenshot]
Fix: the main issue was the model conversion; I had written my own converter and some layers were wrong, and following the conversion script in convert works fine. Also, for the relative-distance computation, Bo put it in the config file, whereas bert4keras implements it in code.

Training with task_sequence_labeling_ner_global_pointer.py gives results of 0

Trying the task_sequence_labeling_ner_global_pointer.py script.
The modifications and code are as follows.
1. Load bert weights from huggingface:
self.bert_dir = "/home/BERT/bert_torch/bert-base-chinese/"
self.config_path = self.bert_dir + 'config.json'
self.checkpoint_path = self.bert_dir + 'pytorch_model.bin'
self.dict_path = self.bert_dir + 'vocab.txt'
2. Modify the model.fit parameters:
model.fit(train_dataloader, epochs=20, steps_per_epoch=5, callbacks=[evaluator])
3. Result of a full run; partial output below:
1/5 [=====>........................] - ETA: 0s - loss: 2.5862
2/5 [===========>..................] - ETA: 0s - loss: 2.5127
3/5 [=================>............] - ETA: 0s - loss: 2.7961
4/5 [=======================>......] - ETA: 0s - loss: 2.8367
5/5 [==============================] - 1s 125ms/step - loss: 2.4887
[val] f1: 0.00000, p: 0.00000 r: 0.00000 best_f1: 0.00000
============Finish Training=============
Process finished with exit code 0
OS: ubuntu 20.0.4
pytorch version: 1.11.0+cu113
python: 3.7

Could you advise where the problem is that makes f1 stay at 0?

Hi, how do I write prediction code for W2NER?

When asking a question, please provide as much of the following information as possible:

Basic information

  • Operating system:
  • Python version:
  • Pytorch version:
  • bert4torch version:
  • Pretrained model loaded:

Core code

# paste your core code here

Output

# paste your debug output here

What you have tried

Please paste what you have already tried here

Code question

Line 115 is commented out, but this file is needed for inference later

def on_epoch_end(self, global_step, epoch, logs=None):
    val_acc = self.evaluate(valid_dataloader)
    test_acc = self.evaluate(test_dataloader)
    logs['val/acc'] = val_acc
    logs['test/acc'] = test_acc
    if val_acc > self.best_val_acc:
        self.best_val_acc = val_acc
        # model.save_weights('best_model.pt')
    print(f'val_acc: {val_acc:.5f}, test_acc: {test_acc:.5f}, best_val_acc: {self.best_val_acc:.5f}\n')

If pretrained-model loading fails, read this

Quite a few issues are about errors when loading pretrained models, including warnings and "wrong" config parameters. Explanation:

  • Cause: the warnings appear because the keys in the model file do not fully match bert4torch's keys; the config parameters look wrong because I modified the original config files (to unify parameter names)
  • Solution: see the end of the README; convert scripts are provided for some pretrained weights, and config notes are provided for the config parameters

tensorrt: converting onnx to an fp16 trt engine

When asking a question, please provide as much of the following information as possible:
A question: converting onnx to an fp32 trt engine with tensorrt trtexec works fine, but converting to fp16 gives very large errors and is basically unusable. Have you converted to an fp16 trt engine, and did it work?

Basic information

  • Operating system:
  • Python version:
  • Pytorch version:
  • bert4torch version:
  • Pretrained model loaded:

Core code

# paste your core code here

Output

# paste your debug output here

What you have tried

Please paste what you have already tried here

Hi Bo, is T5-PEGASUS's generate strategy different from the mt5 in the transformers library?

With the same weight file and the same test data, the following code generates summaries:
model = MT5ForConditionalGeneration.from_pretrained(args.pretrain_model).to(device)
gen = model.generate(max_length=args.max_len_generate,
                     eos_token_id=tokenizer.sep_token_id,
                     decoder_start_token_id=tokenizer.cls_token_id,
                     **content)

Using generate from the Autotitle example, the output is empty; I later found that the first value predicted in beam_search is already the end_id [SEP].
class AutoSummarize(AutoRegressiveDecoder):
    """seq2seq decoder
    """
    @AutoRegressiveDecoder.wraps(default_rtype='logits')
    def predict(self, inputs, output_ids, states):
        # inputs contains [decoder_ids, encoder_hidden_state, encoder_attention_mask]
        return model.decoder.predict([output_ids] + inputs)[-1][:, -1, :]  # keep only the last position

    def generate(self, text):
        gc.collect()
        torch.cuda.empty_cache()
        token_ids, _ = tokenizer.encode(text, maxlen=args.max_c_len)
        token_ids = torch.tensor([token_ids], device=device)
        encoder_output = model.encoder.predict([token_ids])
        output_ids = self.beam_search(encoder_output, topk=1)  # beam search
        return tokenizer.decode([int(i) for i in output_ids.cpu().numpy()])

I wonder whether it is because states is empty; I found that there are no states in encoder_output.

At prediction time: object has no attribute 'copy'

Predicting like this, model.load_weights('../../output/best_model.pt') raises an error:

Traceback (most recent call last):
  File "/opt/pycharm-2021.1.3/plugins/python/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/opt/pycharm-2021.1.3/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/gallup/study/search/kuafu/kuafu/matching/run_poly_encoders.py", line 149, in
    model.load_weights('../../output/' + args.model_name + '_best_model.pt')
  File "/home/gallup/anaconda3/envs/tf2-torch1/lib/python3.6/site-packages/bert4torch/models.py", line 272, in load_weights
    self.load_state_dict(state_dict, strict=strict)
  File "/home/gallup/anaconda3/envs/tf2-torch1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1455, in load_state_dict
    state_dict = state_dict.copy()
  File "/home/gallup/anaconda3/envs/tf2-torch1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1178, in __getattr__
    type(self).__name__, name))
AttributeError: 'BiEncoder' object has no attribute 'copy'
python-BaseException

Text summarization on the csl dataset

Hi Bo, is the csl dataset you used for text summarization the 10K-sample one or the 3K-sample one? When I run it, the F1 plateaus at around 60% and never reaches 68%.

The global_pointer example

The global_pointer example does not add cls and sep after tokenization; adding these tokens improves the metrics. Was this intentional or an oversight?
