DeepIE: Deep Learning for Information Extraction
Home Page: https://github.com/loujie0822/DeepIE
Hi,
I tried running etl_span_transformers and hit some errors:
2021-01-26 14:49:24,295 - transformers.tokenization_utils - INFO - Model name 'transformer_model_path' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). Assuming 'transformer_model_path' is a path or url to a directory containing tokenizer files.
2021-01-26 14:49:24,295 - transformers.tokenization_utils - INFO - Didn't find file transformer_model_path. We won't load it.
2021-01-26 14:49:24,296 - transformers.tokenization_utils - INFO - Didn't find file transformer_model_path\added_tokens.json. We won't load it.
2021-01-26 14:49:24,296 - transformers.tokenization_utils - INFO - Didn't find file transformer_model_path\special_tokens_map.json. We won't load it.
2021-01-26 14:49:24,296 - transformers.tokenization_utils - INFO - Didn't find file transformer_model_path\tokenizer_config.json. We won't load it.
Traceback (most recent call last):
  File "run/relation_extraction/etl_span_transformers/main.py", line 148, in <module>
    main()
  File "run/relation_extraction/etl_span_transformers/main.py", line 129, in main
    tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=True)
  File "D:\Anaconda3\envs\deepie\lib\site-packages\transformers\tokenization_utils.py", line 283, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "D:\Anaconda3\envs\deepie\lib\site-packages\transformers\tokenization_utils.py", line 347, in _from_pretrained
    list(cls.vocab_files_names.values())))
OSError: Model name 'transformer_model_path' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'transformer_model_path' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
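The OSError above means transformers could not find a vocab.txt under the local path passed as args.bert_model. One way to fail with a clearer message, as a minimal sketch (the helper name resolve_bert_model and the directory layout are my assumptions, not part of DeepIE):

```python
import os

def resolve_bert_model(path_or_name):
    """If path_or_name is a local directory, require the vocab.txt that
    BertTokenizer.from_pretrained needs; otherwise pass it through as a
    hub shortcut name such as 'bert-base-chinese'."""
    if os.path.isdir(path_or_name):
        vocab = os.path.join(path_or_name, "vocab.txt")
        if not os.path.isfile(vocab):
            raise FileNotFoundError(
                f"no vocab.txt in {path_or_name}; download a BERT "
                "checkpoint (e.g. bert-base-chinese) into that directory first")
    return path_or_name

# e.g. tokenizer = BertTokenizer.from_pretrained(
#          resolve_bert_model(args.bert_model), do_lower_case=True)
```

In short: 'transformer_model_path' must either be replaced by a real directory containing the downloaded BERT files, or by a known shortcut name.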
Could you share some model data? Thanks.
As titled.
The notes for etl_span_transformers say the dataset used is BaiduIE_2019 or BaiduIE_2020. Could the CHIP2020 Chinese medical-text entity-relation extraction dataset be used instead?
The md file says to upload the data to data/BaiduIE_2020/.
For the Baidu DuIE data, is it enough to just put the three JSON files in that directory?
Hi, both the BERT implementation of "Joint Extraction of Entities and Relations Based on a Novel Decomposition Strategy" and Su Jianlin's bert4keras information-extraction example reach an F1 of about 0.82 on Baidu 2019. But when actually extracting from news text, after splitting it into sentences the results are quite poor. Any recommended tricks?
Hi, when you applied the cascaded pointer-tagging scheme to the CHIP2020 named entity recognition task, the nine entity classes should suffer from label sparsity. How did you handle that? I treated it as multi-class classification with an LSTM followed by a Linear layer, but it performs badly and recognizes no entities at all.
I looked carefully at the data-reading code, and it does not seem to match the 2019 data format, unless I'm misreading it.
run/relation_extraction/etl_span_transformers/data_loader_v2.py, line 212:
In the 2019 data, spo['object'] is already a string, so it has no keys() method:
for spo_object in spo['object'].keys():
    if spo['predicate'] in self.spo_conf:
        label = spo['predicate']
    else:
        label = spo['predicate'] + '_' + spo_object
    spo_dict[self.spo_conf[label]] = spo['object'][spo_object]
  File "/home/powerop/work/DeepIE-master/run/relation_extraction/etl_span_transformers/data_loader_v2.py", line 212, in _read
    for spo_object in spo['object'].keys():
AttributeError: 'str' object has no attribute 'keys'
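One way to make that loop tolerate both DuIE formats is to normalize the object field before iterating. A minimal sketch (iter_spo_objects is a hypothetical helper, not part of DeepIE):

```python
def iter_spo_objects(spo):
    """Yield (object_key, object_value) pairs for one SPO record.

    DuIE 2020: spo['object'] is a dict such as {'@value': '...'} -> iterate its items.
    DuIE 2019: spo['object'] is a plain string -> expose it under the '@value' key.
    """
    obj = spo['object']
    if isinstance(obj, dict):
        yield from obj.items()
    else:
        yield '@value', obj
```

The loop at line 212 could then read `for spo_object, value in iter_spo_objects(spo): ...`, using `value` instead of `spo['object'][spo_object]`.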
Please upload the project code soon so we can learn from it.
If you could also add links to the datasets, ten thousand stars seem within reach!
Hi, is there code for the FLAT method? As far as I can tell, the original authors only provided an empty link.
/run/relation_extraction/etl_span/train.py
lines 145-147:
ans_dict = self.convert_spo_contour(qids, subject_pred, po_pred, eval_file,
answer_dict, use_bert=self.args.use_bert)
return ans_dict
convert_spo_contour is at lines 285-315.
That function contains no return statement, so I don't see how ans_dict gets its value. Is some advanced syntax or a torch feature involved here? I don't follow this part.
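For reference on the question above: in plain Python, a function without a return statement implicitly returns None, but if it mutates the dict it receives, the caller still sees those updates through its own reference; no torch feature is needed. A minimal sketch (fill_answers and the dict contents are illustrative assumptions, not DeepIE's code):

```python
def fill_answers(answer_dict):
    # Mutates the argument in place; there is no return statement,
    # so the function implicitly returns None.
    answer_dict.setdefault('spo_list', []).append(('subject', 'predicate', 'object'))

answers = {}
result = fill_answers(answers)
print(result)                # None
print(answers['spo_list'])   # the in-place mutation is visible to the caller
```

So if convert_spo_contour only mutates answer_dict in place, the ans_dict it "returns" would be None, while answer_dict itself still carries the results.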
Thanks, everyone.
What a pity that event extraction (EE) is missing.
Good work! How do I run prediction? Thanks.
Where can I see the detailed evaluation results (recall, precision, F1)?
transformers_model_path/added_tokens.json.
transformers_model_path/special_tokens_map.json.
transformers_model_path/tokenizer_config.json.
Are these three files part of the 2020 data? The 2019 data doesn't seem to include them.
Hello, I've been following your project, and I've read all of your Zhihu articles carefully. I noticed you recently committed a transformers-based SPO method; is it the approach from the SpERT paper?
Can transformers_multi_label_span currently be run to extract Baidu SPO triples?
Which of the papers in this project have you already reproduced? Could you briefly describe the network architectures used? Thanks.
A parameter in the code controls spo_version, but neither dataloader_v1 nor dataloader_v2 can read the BaiduIE 2019 dataset.