nlpscott / bert-chinese-classification-task

BERT Chinese text classification in practice

Python 98.09% Shell 1.91%

bert-chinese-classification-task's Introduction

bert-Chinese-classification-task

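Add NewsProcessor to run_classifier_word.py; it is the preprocessing and loading code for the news data.
In the main method, register the processor so that labels for the news task are handled: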
processors = {
    "cola": ColaProcessor,
    "mnli": MnliProcessor,
    "mrpc": MrpcProcessor,
    "news": NewsProcessor,
}
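The README does not show NewsProcessor itself, so the following is only a minimal sketch of what such a processor could look like, modelled on the shape of MrpcProcessor. The file names train.tsv and dev.tsv, the two-column layout (label, then text), and the three default labels (taken from the topics the README names as examples) are all assumptions, not the repository's actual code.

```python
import csv
import os


class InputExample(object):
    """One text example and its label (same fields as in run_classifier.py)."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class NewsProcessor(object):
    """Hypothetical processor for a single-sentence news topic task."""

    def __init__(self, labels=("时政", "娱乐", "体育")):
        # Assumption: an illustrative subset; the real task uses 34 topics.
        self._labels = list(labels)

    def get_train_examples(self, data_dir):
        path = os.path.join(data_dir, "train.tsv")  # assumed file name
        return self._create_examples(self._read_tsv(path), "train")

    def get_dev_examples(self, data_dir):
        path = os.path.join(data_dir, "dev.tsv")  # assumed file name
        return self._create_examples(self._read_tsv(path), "dev")

    def get_labels(self):
        return self._labels

    @staticmethod
    def _read_tsv(path):
        # Decode explicitly as UTF-8 so Chinese text reads the same everywhere.
        with open(path, "r", encoding="utf-8", newline="") as f:
            return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))

    @staticmethod
    def _create_examples(rows, set_type):
        examples = []
        for i, row in enumerate(rows):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            label, text = row[0], row[1]  # assumed column order
            guid = "%s-%d" % (set_type, i)
            examples.append(InputExample(guid, text, None, label))
        return examples
```

Because the constructor has a default argument, the class can still be instantiated without arguments, which is how the processors dictionary above would typically be used.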

download_glue_data.py downloads the other public GLUE benchmark datasets from the BERT paper into glue_data.

The data directory contains a sample of the news data.

export GLUE_DIR=/search/odin/bert/extract_code/glue_data
export BERT_BASE_DIR=/search/odin/bert/chinese_L-12_H-768_A-12/
export BERT_PYTORCH_DIR=/search/odin/bert/chinese_L-12_H-768_A-12/

python run_classifier_word.py \
  --task_name NEWS \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/NewsAll/ \
  --vocab_file $BERT_BASE_DIR/vocab.txt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --init_checkpoint $BERT_PYTORCH_DIR/pytorch_model.bin \
  --max_seq_length 256 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./newsAll_output/ \
  --local_rank 3

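Chinese classification task in practice

The experiment classifies Chinese text into 34 topics (including current affairs, entertainment, sports, and others). The preprocessing stage of run_classifier.py needs a NewsProcessor module, similar to MrpcProcessor, with the Chinese text encoding adjusted as appropriate. The data, about 800,000 examples, was split 4:1 into training and test sets. Training on a single GPU took 18 hours and reached an accuracy of 92.8%.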
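The README states only the 4:1 ratio, not how the split was made. As a rough sketch, assuming a plain random shuffle (not necessarily stratified by topic, and not necessarily what the author did), the split could look like this. The function name and the seed are illustrative.

```python
import random


def split_four_to_one(examples, seed=42):
    """Shuffle a copy of `examples` and return (train, test) in a 4:1 ratio."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = len(shuffled) * 4 // 5           # first 80% goes to training
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Toy rows standing in for (label, text) pairs over 34 topics.
    rows = [("label_%d" % (i % 34), "text %d" % i) for i in range(1000)]
    train, test = split_four_to_one(rows)
    print(len(train), len(test))  # prints: 800 200
```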

eval_accuracy = 0.9281581998809113

eval_loss = 0.2222444740207354

global_step = 59826

loss = 0.14488934577978746
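As a sanity check, the reported global_step can be compared with the settings above. The sketch below assumes that global_step counts one optimizer step per batch, with no gradient accumulation and a final partial batch in each epoch; neither is stated in the README.

```python
import math


def steps_for(num_train_examples, batch_size=32, num_epochs=3):
    """Optimizer steps for a run, assuming one step per batch."""
    return math.ceil(num_train_examples / batch_size) * num_epochs


# About 800,000 examples split 4:1 leaves roughly 640,000 for training.
approx_steps = steps_for(640000)             # 20000 batches * 3 epochs

# Working backwards from the reported global_step = 59826:
steps_per_epoch = 59826 // 3                 # 19942 batches per epoch
max_train_examples = steps_per_epoch * 32    # at most 638144 training examples

print(approx_steps, steps_per_epoch, max_train_examples)
```

Under these assumptions the run used at most 638,144 training examples, or about 797,680 examples in total at a 4:1 split, which agrees with the "about 800,000" figure quoted above.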

bert-chinese-classification-task's Issues

What does pytorch_model.bin refer to? The download only contains .ckpt files.

@NLPScott

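After running the code I cannot reach this accuracy; in fact my accuracy is extremely low. Do you know what might be wrong?

Hello, could you share your classification dataset?

Where is the model saved after fine-tuning on my own dataset? Please let me know.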

RuntimeError: Error(s) in loading state_dict for BertModel:

Hello, and thank you for providing the code. My experience is limited, and when I run this step:
model.bert.load_state_dict(torch.load(args.init_checkpoint, map_location='cpu'))
I get the following error:
RuntimeError: Error(s) in loading state_dict for BertModel: Missing key(s) in state_dict: "embeddings.word_embeddings.weight. ...."
Why does this happen?

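How do I run prediction once the model has been trained?

Could you point out which dataset is the Chinese classification dataset? Thanks.

A question about the error: the following arguments are required: --vocab_file

Hello, I ran your code and got the error shown in the screenshot. I changed the export paths, but the pretrained model still does not seem to be loaded. Do you know why?

Where can pytorch_model.bin be downloaded?

Could you tell me where I can download pytorch_model.bin?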

Long text

How does this data perform with other models?

Hello, have you tested this data on other models (textCNN, LSTM, etc.)? Thanks.

No module

ModuleNotFoundError: No module named 'optimization'

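Is word segmentation applied to the Chinese text, or does the model work directly at the character level?

Could you also add optimization and the PyTorch checkpoint to the repository?

Could you also add optimization and the PyTorch checkpoint to the repository? The checkpoint I converted with the latest bert-pytorch master code fails with: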
model.bert.load_state_dict(torch.load(args.init_checkpoint, map_location='cpu'))
RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict:

Out-of-memory error when running on a 1080 Ti

A 1080 Ti should meet the requirements, though.

loading state_dict for BertModel

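Hello, and thank you very much for the code.
While debugging, I downloaded Google's chinese_base archive, unpacked it, and converted it to a .bin file with the method from https://github.com/huggingface/pytorch-pretrained-BERT/tree/1de35b624b9d7998feb4d518e4f7e8e53abac4e1. I also tried the Chinese version provided at https://github.com/NLPScott/bert-Chinese-classification-task/issues/13. Both fail when the model is loaded:
RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict: "embeddings.word_embeddings.weight",
The parameter names evidently do not match, probably because they were renamed at some point. I could not solve this myself; could you take a look?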
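The thread does not confirm the cause of this error. One possibility, stated here purely as an assumption, is that the converted checkpoint stores its parameters under a prefix such as "bert." (for example "bert.embeddings.word_embeddings.weight"), while model.bert expects the bare names. The sketch below first lists the mismatched names and then strips a prefix. It works on any mapping, so it can be tried on plain dictionaries before touching a real checkpoint. The helper names diff_keys and strip_prefix are hypothetical.

```python
from collections import OrderedDict


def diff_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, each sorted."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


def strip_prefix(state_dict, prefix="bert."):
    """Keep only keys that start with `prefix`, with the prefix removed.

    If no key carries the prefix, a copy of the mapping is returned as-is.
    """
    stripped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            stripped[key[len(prefix):]] = value
    return stripped if stripped else OrderedDict(state_dict)


if __name__ == "__main__":
    # Toy stand-ins for real tensors; only the key names matter here.
    checkpoint = {
        "bert.embeddings.word_embeddings.weight": 1,
        "cls.predictions.bias": 2,
    }
    expected = ["embeddings.word_embeddings.weight"]
    print(diff_keys(expected, checkpoint))
    print(diff_keys(expected, strip_prefix(checkpoint)))
```

If the diagnosis fits, the remapped dictionary could be passed to the existing call, for example model.bert.load_state_dict(strip_prefix(torch.load(args.init_checkpoint, map_location='cpu'))). If diff_keys still reports missing names afterwards, the mismatch has a different cause.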

model.bert.load_state_dict(torch.load(args.init_checkpoint, map_location='cpu'))

model.bert.load_state_dict(torch.load(args.init_checkpoint, map_location='cpu'))
RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict:
Is there a way to fix this? Thanks!

Originally posted by @wutonghua in #5 (comment)