thunlp / chinese_nre

Source code for ACL 2019 paper "Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge"

License: MIT License

Python 100.00%

chinese_nre's Issues

Pre-trained model?

Is there a pre-trained model checkpoint I can use for testing? Training takes too long because of the batch_size.

Hello, why does the data in the SanWen dataset differ from the original dataset? Has it been processed?

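About the dataset

In the test file, is the order of the entities in the first and second columns fixed? Must the entity that appears earlier in the sentence come first and the one that appears later come second?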

File "Chinese_NRE\nn\mglattice.py", line 175, in reset_parameters
    self.weight_hh.data.set_(weight_hh_data)
RuntimeError: set_storage is not allowed on a Tensor created from .data or .detach(). If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset) without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.

Changing
self.weight_hh.data.set_(weight_hh_data)

to
with torch.no_grad():
    self.weight_hh.data.set_(weight_hh_data)

did not help.
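
The attempt above still goes through `.data`, which is the call the error message asks to remove. A minimal sketch of one possible workaround, assuming a recent PyTorch version: drop `.data` and copy the values into the parameter in place inside `torch.no_grad()`. `TinyCell` is a hypothetical stand-in for the cell in nn/mglattice.py, not the repository's code, and the `(hidden_size, 3 * hidden_size)` weight shape is an assumption made for illustration.

```python
import torch
import torch.nn as nn


class TinyCell(nn.Module):
    """Hypothetical stand-in for the LSTM cell; not the repository's class."""

    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.hidden_size = hidden_size
        # Assumed shape: one (hidden x hidden) block per gate, three gates.
        self.weight_hh = nn.Parameter(torch.empty(hidden_size, 3 * hidden_size))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Identity blocks repeated along the column axis.
        weight_hh_data = torch.eye(self.hidden_size).repeat(1, 3)
        # No .data and no set_(): copy values in place without autograd tracking.
        with torch.no_grad():
            self.weight_hh.copy_(weight_hh_data)


cell = TinyCell(4)
print(cell.weight_hh.shape)
```

Unlike `set_()`, `copy_()` only overwrites the values and leaves the parameter's storage alone, so the parameter still requires gradients afterwards.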

runtime error

ValueError: At least 2 points are needed to compute area under curve, but x.shape = 0
Is the AUC curve computation not receiving any data?
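
The message says the curve being integrated had zero points, which would happen if, for example, no predictions reached the evaluation step. A minimal sketch of a guard, written in plain Python with the trapezoidal rule; `safe_auc` is a hypothetical helper, not part of the repository or of any library, and it assumes `x` is sorted in ascending order.

```python
from typing import Optional, Sequence


def safe_auc(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Trapezoidal area under (x, y), or None when fewer than 2 points exist."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        # Nothing to integrate: report it instead of raising inside evaluation.
        return None
    area = 0.0
    for i in range(len(x) - 1):
        # Width of the interval times the mean height of its two endpoints.
        area += (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
    return area


print(safe_auc([], []))
print(safe_auc([0.0, 0.5, 1.0], [1.0, 1.0, 1.0]))
```

A `None` result is a signal to check whether the test file was loaded and whether any instances survived preprocessing.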

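Error in the data file

In the provided data file sense.txt, the embedding on the last line, for "铛铛车", is missing part of its values: that line has only 175 fields, while every other line has 201.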
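
A check like the one described can be sketched as follows. It assumes each line is a whitespace-separated token followed by its vector values; `find_bad_rows` and the sample rows are hypothetical and made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def find_bad_rows(
    lines: Iterable[str], expected: Optional[int] = None
) -> List[Tuple[int, str, int]]:
    """Return (line number, first token, field count) for rows of unexpected width."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if fields:  # skip blank lines
            rows.append((lineno, fields[0], len(fields)))
    if not rows:
        return []
    if expected is None:
        # Assume the most common field count is the correct one.
        expected = Counter(n for _, _, n in rows).most_common(1)[0][0]
    return [row for row in rows if row[2] != expected]


# Made-up sample: the third row is truncated.
sample = [
    "word_a 0.1 0.2 0.3",
    "word_b 0.4 0.5 0.6",
    "word_c 0.7",
]
print(find_bad_rows(sample))
```

For the real file, an open file handle can be passed in place of `sample`, with `expected=201` to match the count reported above.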

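About sense and sense_map

I came across your group's paper on GitHub and would like to know how sense.txt and sense_map.txt were generated, since I'd like to learn from your code. Is sense.txt the sense-vec.txt file that can be downloaded here? https://cloud.tsinghua.edu.cn/d/76ab4a71efa541bd8eb3/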

Is it possible to share the data processing script for ACE2005?

Thanks

How should the parameters be set to run only the basic lattice LSTM encoder?

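How can I change the batch_size?

So many parts of the model are involved that I don't know which batch_size to modify. I can see that GPU memory usage is only 980 MB. Could you please help with this? Thank you very much!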

RuntimeError: set_storage is not allowed on Tensor created from .data or .detach()

Traceback (most recent call last):
  File "D:/pycharm/work/Chinese_NRE-master/main.py", line 187, in <module>
    train(data, configure.savemodel)
  File "D:/pycharm/work/Chinese_NRE-master/main.py", line 97, in train
    model = MGLattice_model(data)
  File "D:\pycharm\work\Chinese_NRE-master\nn\framework.py", line 15, in __init__
    self.encoder = BiLstmEncoder(data)
  File "D:\pycharm\work\Chinese_NRE-master\nn\encoder.py", line 98, in __init__
    self.forward_lstm = LatticeLSTM(lstm_input, lstm_hidden, data.gaz_dropout, data.gaz_alphabet.size(), data.gaz_emb_dim, data.pretrain_gaz_embedding, True, data.HP_fix_gaz_emb, self.gpu)
  File "D:\pycharm\work\Chinese_NRE-master\nn\mglattice.py", line 262, in __init__
    self.rnn = MultiInputLSTMCell(input_dim, hidden_dim)
  File "D:\pycharm\work\Chinese_NRE-master\nn\mglattice.py", line 163, in __init__
    self.reset_parameters()
  File "D:\pycharm\work\Chinese_NRE-master\nn\mglattice.py", line 174, in reset_parameters
    self.weight_hh.data.set_(weight_hh_data)
RuntimeError: set_storage is not allowed on Tensor created from .data or .detach()

How do you adjust the batch_size?

About position embeddings

Regarding Equation 1 in the paper, why does the code additionally add the maximum sentence length plus 1?

return x + maxlen + 1
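
One plausible reading, not confirmed by the authors: relative positions can be negative, while an embedding lookup needs non-negative indices. Adding `maxlen + 1` shifts the range [-maxlen, maxlen] to [1, 2 * maxlen + 1], which leaves index 0 free. The sketch below assumes, as some other relation extraction code does, that index 0 and index 2 * maxlen + 2 are reserved for positions outside the range. Only the final `return` line comes from the issue; the function name `pos_index` and the two clipping branches are assumptions.

```python
def pos_index(x: int, maxlen: int) -> int:
    """Map a signed relative position to a non-negative embedding index."""
    if x < -maxlen:
        # Assumed: one shared index for everything too far to the left.
        return 0
    if x > maxlen:
        # Assumed: one shared index for everything too far to the right.
        return 2 * maxlen + 2
    # The line quoted in the issue: shift [-maxlen, maxlen] to [1, 2 * maxlen + 1].
    return x + maxlen + 1


maxlen = 60
print([pos_index(x, maxlen) for x in (-61, -60, 0, 60, 61)])
```

Under these assumptions the embedding table would need 2 * maxlen + 3 rows.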

I checked and re-labeled the FinRE dataset with these rules

I re-labeled the FinRE dataset as FinRE-v2 using the rules below:

Extend 订单 into the relations 订单 and 被下订单, so that the roles of "provider" and "client" can be captured.
Add the relations 砍单 and 被砍单: since both 增持 and 减持 exist, there should likewise be both 订单 and 砍单.
Check and extend the company relations in [交易, 签约, 重组] to capture more specifically what kind of trading is involved, e.g. 买资, 收购, 持股, 增持 or 减持.

The full relation class schema:

unknown 0
注资 1
拥有 2
纠纷 3
自己 4
增持 5
重组 6
买资 7
签约 8
持股 9
交易 10
入股 11
转让 12
成立 13
分析 14
合作 15
帮助 16
发行 17
商讨 18
合并 19
竞争 20
订单 21
砍单 22
减持 23
合资 24
收购 25
借壳 26
欠款 27
被发行 28
被转让 29
被成立 30
被注资 31
被持股 32
被拥有 33
被收购 34
被帮助 35
被借壳 36
被买资 37
被欠款 38
被增持 39
拟收购 40
被减持 41
被分析 42
被入股 43
被拟收购 44
被重组 45
被下订单 46
被砍单 47
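
A schema in this "name id" form can be loaded into a label-to-id mapping with a few lines of Python. The sketch below uses a five-line excerpt of the list above; `parse_schema` and `passive_pairs` are hypothetical helpers, and pairing a relation with its inverse by the 被 prefix is an assumption that does not hold for every entry (for example 订单 and 被下订单).

```python
from typing import Dict, List, Tuple

# Excerpt of the schema above, one "name id" pair per line.
SCHEMA_EXCERPT = """\
unknown 0
注资 1
订单 21
被注资 31
被下订单 46
"""


def parse_schema(text: str) -> Dict[str, int]:
    """Parse 'name id' lines into a relation-to-id mapping."""
    mapping: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, idx = line.rsplit(None, 1)  # split on the last whitespace run
        mapping[name] = int(idx)
    return mapping


def passive_pairs(rel2id: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pair each relation with its 被-prefixed counterpart when one exists."""
    return [(name, "被" + name) for name in rel2id if "被" + name in rel2id]


rel2id = parse_schema(SCHEMA_EXCERPT)
pairs = passive_pairs(rel2id)
print(rel2id)
print(pairs)
```

Relations whose inverse is named differently would need an explicit lookup table rather than the prefix rule.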

The re-labeled dataset is available through a Google Drive link in my GitHub repo: https://github.com/A-baoYang/NLP-techniques-chinese/tree/main/NLU/Classification/RelationClassification