coopercoppers / pfn

169.0 6.0 20.0 11.07 MB

EMNLP 2021 - A Partition Filter Network for Joint Entity and Relation Extraction

License: MIT License

Python 100.00%

relation-extraction nlp multi-task-learning

pfn's Issues

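Hello, could you provide an inference.py that works with nested-pfn? Thanks!

As in the title.

Handling of nested entities

Hello, how do you handle relation identification for nested entities under the table-based representation? Could you share the general idea? (The example below is the one from your GitHub page.)

Question about reproducing the experimental results

Hello, thank you for open-sourcing the code for your work; it has been very valuable to me! I have a question about reproducing the paper's results. I trained and tested on the WebNLG dataset using your code and the parameters you provided, but the results never reach those given in your link. I ran the experiment three times and none of the final results were satisfactory. Is there something wrong with my parameter settings? Many thanks! My training logs are below.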
WEBNLG_baseline_true.log
WEBNLG_baseline_true.txt

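Question about head/tail entity features and relation features

Hello, I have a question.
In the entity features, why are the entity start and end features defined this way? Could you explain your thinking?
Why does repeat(1, length, 1, 1) represent the start feature and repeat(length, 1, 1, 1) the end feature? Is there a particular meaning behind this?
`st = h_ner.unsqueeze(1).repeat(1, length, 1, 1)`
`en = h_ner.unsqueeze(0).repeat(length, 1, 1, 1)`
`ner = torch.cat((st, en, h_global), dim=-1)`

Also, for relation extraction, why are the relation features split into two sub-features, r1 and r2? Do they represent the head entity's features for relation r and the tail entity's features for relation r?
`r1 = h_re.unsqueeze(1).repeat(1, length, 1, 1)`
`r2 = h_re.unsqueeze(0).repeat(length, 1, 1, 1)`
`re = torch.cat((r1, r2, h_global), dim=-1)`

Thanks a lot for your guidance!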
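
The indexing behind the two `repeat` calls can be made concrete with a plain-Python analogue. This is only a sketch: it assumes `h_ner` has shape (length, batch, hidden), which the snippet suggests but the post does not state, and it drops the batch dimension. `build_span_table` is a hypothetical helper, not part of the repository. Under that assumption, `unsqueeze(1).repeat(1, length, 1, 1)` gives a table with `st[i][j] = h[i]`, and `unsqueeze(0).repeat(length, 1, 1, 1)` gives `en[i][j] = h[j]`, so cell (i, j) sees token i and token j together.

```python
def build_span_table(h):
    """Return (st, en) where st[i][j] = h[i] and en[i][j] = h[j]."""
    length = len(h)
    # Analogue of h.unsqueeze(1).repeat(1, length, 1, 1):
    # token i is copied across every column of row i.
    st = [[h[i] for _ in range(length)] for i in range(length)]
    # Analogue of h.unsqueeze(0).repeat(length, 1, 1, 1):
    # the whole sequence is copied down every row.
    en = [[h[j] for j in range(length)] for _ in range(length)]
    return st, en


h = ["w0", "w1", "w2"]  # stand-ins for per-token hidden vectors
st, en = build_span_table(h)

# Concatenating st and en cell by cell pairs every token with every other token.
pairs = [[(st[i][j], en[i][j]) for j in range(len(h))] for i in range(len(h))]
print(pairs[0][2])  # ('w0', 'w2')
```

Whether row i is read as the span start and column j as the span end is a naming choice made in the model code; the tensors themselves only guarantee that each cell holds one pair of token positions. The same reading would apply to `r1` and `r2`.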

why use albert-xxlarge instead of bert-base when training on some datasets?

I ran the code using bert-base on the CoNLL04 dataset and got F1 scores of approximately 66, which is much lower than with albert-large. Is the comparison between this model using albert-large and previous work using bert-base really fair?

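Why can the PFN-nested model make use of entity tail information during relation training?

Hello,
I really admire your design and implementation! I have one small question.
Your introduction says: "PFN-nested is an enhanced version of PFN. It is better in leveraging entity tail information and capable of handling nested triple prediction." In the PFN-nested network (PFN.py) there is this code:
re_head_score = self.re_head(h_re, h_share, mask)
re_tail_score = self.re_tail(h_share, h_re, mask)
These perform relation extraction using entity head and tail information respectively, right? Here self.re_head and self.re_tail are both re_unit structures, and the only difference is that h_share and h_re swap positions, so how is the tail information used? self.re_tail(h_share, h_re, mask) computes r1 and r2 from the information in h_share, so where does the tail information come in?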

FileNotFoundError: [Errno 2] No such file or directory: 'data/data/NYT/ner2idx.json'

Hi,
I tried "Evaluation on Pre-trained Model" for NYT and WEBNLG, but it fails with an error about the ner2idx.json file. The files exist but are empty. I also tried to generate them using dataloder.py, but that fails with: ModuleNotFoundError: No module named 'utils'.
Can you please fix them or suggest to me an alternate solution?
Thanks for sharing such a nice repo.

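How should the encoding scheme be set for the ablation study?

As in the title: I would like to know how to change the code and data to run the encoding-scheme ablation, comparing sequential encoding with parallel encoding. Thanks.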

An error, possibly caused by my environment

Hello, I am using 30-series GPUs and installed the environment according to the requirements file. However, when I run
python main.py
--data CONLL04
--do_train
--do_eval
--embed_mode albert
--batch_size 10
--lr 0.00002
--output_file conll04
--eval_metric micro
--clip 1.0
--epoch 200
I get the following error:

11/22/2021 14:59:58 - INFO - __main__ - ['main.py', '--data', 'CONLL04', '--do_train', '--do_eval', '--embed_mode', 'albert', '--batch_size', '10', '--lr', '0.00002', '--output_file', 'conll04', '--eval_metric', 'micro', '--clip', '1.0', '--epoch', '200']
11/22/2021 14:59:58 - INFO - __main__ - Namespace(batch_size=10, clip=1.0, data='CONLL04', do_eval=True, do_train=True, dropconnect=0.1, dropout=0.1, embed_mode='albert', epoch=200, eval_batch_size=10, eval_metric='micro', hidden_size=300, linear_warmup_rate=0.0, lr=2e-05, max_seq_len=128, output_file='conll04', seed=0, steps=50, weight_decay=0)
11/22/2021 15:00:19 - INFO - __main__ - ------Training------
Some weights of the model checkpoint at albert-xxlarge-v1 were not used when initializing AlbertModel: ['predictions.decoder.bias', 'predictions.bias', 'predictions.LayerNorm.weight', 'predictions.dense.bias', 'predictions.decoder.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight']

This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA GeForce RTX 3080 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
0%| | 0/93 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 196, in <module>
ner_pred, re_pred = model(text, mask)
File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ps/lwc/PFN/model/pfn.py", line 260, in forward
x = self.bert(**x)[0]
File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ps/anaconda3/envs/lwc/lib/python3.7/site-packages/transformers/models/albert/modeling_albert.py", line 715, in forward
extended_attention_mask = extended_attention_mask.to(dtype=self.dtype) # fp16 compatibility
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

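I don't really understand this error. The PyTorch community explains that it is caused by a mismatched PyTorch version, but I could not find a workable solution. Please help, thanks.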
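
The warning in the log already names the mismatch: the GPUs report capability sm_86, while the installed PyTorch build only ships kernels for sm_37, sm_50, sm_60 and sm_70. The sketch below reproduces that check in plain Python. It assumes the usual CUDA binary-compatibility rule (a compiled kernel runs on devices with the same major version and an equal or higher minor version) and ignores PTX just-in-time compilation; `parse_arch` and `has_compatible_binary` are hypothetical helpers. On a real install, the two inputs can be read from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`.

```python
def parse_arch(tag):
    """Turn a tag such as 'sm_86' into (major, minor) = (8, 6)."""
    digits = tag.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def has_compatible_binary(device_tag, build_tags):
    """True if some compiled arch shares the device's major version
    and does not exceed its minor version (PTX fallback is ignored)."""
    major, minor = parse_arch(device_tag)
    return any(
        m == major and n <= minor
        for m, n in (parse_arch(t) for t in build_tags)
    )


# Architecture list copied from the warning in the log above.
build = "sm_37 sm_50 sm_60 sm_70".split()

print(has_compatible_binary("sm_86", build))              # False
print(has_compatible_binary("sm_86", build + ["sm_80"]))  # True
```

If the check fails, as it does for the list in the log, the usual remedy is to install a PyTorch build compiled against a CUDA version that includes the sm_8x architectures, following the install page linked in the warning.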

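Question about the entity extraction and relation extraction F1 scores

Hello!
Are the entity extraction and relation extraction F1 scores in your paper joint scores, or the best score for each task individually? In your experiments, did the best individual scores ever fall in different epochs? Thanks!

Entities that are not a single word

Hello, I am applying your model to our lab's biomedical literature dataset. My data follows the CasRel format, but I found that your entity processing assumes every entity is a single word, whereas most entities in my dataset are multi-word, so preprocessing raises an error.
What was the reasoning behind using the last word of a multi-word entity as the entity?
And for a dataset with multi-word entities like this one, which launch arguments would you recommend?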

{ "text": "HES1 as an independent prognostic marker in esophageal squamous cell carcinoma .", "triple_list": [ [ "HES1", "/Gene/Cancer/prognostic_factor_orMarkers", "esophageal squamous cell carcinoma" ] ] }
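
For a sample like the one above, a preprocessing step first has to locate each entity's words in the tokenized sentence. The sketch below does this by whitespace tokenization and exact sub-list matching; both are assumptions made for illustration, and `find_span` is a hypothetical helper rather than the repository's preprocessing. It returns an end-exclusive span, from which either the first or the last word of the entity can be picked.

```python
def find_span(tokens, entity_tokens):
    """Return (start, end) with end exclusive, or None if the entity is absent."""
    n = len(entity_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == entity_tokens:
            return start, start + n
    return None


text = ("HES1 as an independent prognostic marker in "
        "esophageal squamous cell carcinoma .")
tokens = text.split()

subj = find_span(tokens, "HES1".split())
obj = find_span(tokens, "esophageal squamous cell carcinoma".split())

print(subj, obj)            # (0, 1) (7, 11)
print(tokens[obj[1] - 1])   # carcinoma  (last word of the multi-word entity)
```

Exact matching only finds the first occurrence and breaks when tokenization differs between the text and the triple, so a real pipeline would need to handle repeated mentions and punctuation.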

question about eval_metric

What is the difference between micro and macro?
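
As general background, here is a minimal sketch of the two standard averaging schemes, not the repository's metric code. Micro-averaging pools true positives, false positives and false negatives over all classes before computing F1, so frequent classes dominate. Macro-averaging computes F1 per class and then takes the unweighted mean, so rare classes count as much as frequent ones. The class names and counts are made up for illustration, and some papers define macro F1 slightly differently (from averaged precision and recall).

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts, with 0.0 for empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_f1(counts):
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)[2]


def macro_f1(counts):
    """Compute F1 per class, then take the unweighted mean."""
    scores = [prf(*c)[2] for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts: one frequent class, one rare class.
counts = {"Work_For": (90, 10, 10), "Kill": (1, 4, 4)}

print(round(micro_f1(counts), 4))  # 0.8667
print(round(macro_f1(counts), 4))  # 0.55
```

The gap between the two numbers comes entirely from the rare class: it barely moves the pooled counts but pulls the per-class mean down sharply.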

有关论文中公式6和源代码的一些疑惑

你好，我在阅读论文的时候，论文提及会使用公式6将memory做一个线性变换然后得到feature，但是我在阅读代码的时候发现好像没有实现公式6，而是直接将memory作为了最终的feature。请问你们是不是在哪里实现了同样的等价操作呢？谢谢！

When training model, is it necessary to set args.do_eval=True?

If args.do_eval is False, entity_best and triple_best passed to the saved_file.save call will be None.
The training command you list does not pass do_eval, so it defaults to False.

saved_file.save("best test result ner-p: {:.4f} \t ner-r: {:.4f} \t ner-f: {:.4f} \t re-p: {:.4f} \t re-r: {:.4f} \t re-f: {:.4f} ".format(entity_best["p"],
                        entity_best["r"], entity_best["f"], triple_best["p"], triple_best["r"], triple_best["f"]))
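
One way to avoid indexing into None is to guard the summary line. The sketch below is an assumption about how such a guard could look, not a patch from the repository; `format_best` is a hypothetical helper that returns the same format string as the quoted call when both results exist and a fallback message otherwise.

```python
def format_best(entity_best, triple_best):
    """Build the summary line, or a fallback when evaluation never ran."""
    if entity_best is None or triple_best is None:
        return "no evaluation was run, so there is no best result to report"
    return (
        "best test result ner-p: {:.4f} \t ner-r: {:.4f} \t ner-f: {:.4f} \t "
        "re-p: {:.4f} \t re-r: {:.4f} \t re-f: {:.4f} "
    ).format(
        entity_best["p"], entity_best["r"], entity_best["f"],
        triple_best["p"], triple_best["r"], triple_best["f"],
    )


# Without --do_eval both values stay None and the fallback is used.
print(format_best(None, None))

# With evaluation results (made-up numbers), the usual line is produced.
scores = {"p": 0.5, "r": 0.25, "f": 1 / 3}
print(format_best(scores, scores))
```

The result can then be passed to `saved_file.save(...)` in place of the unguarded format call.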

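About the Extension on Ablation Study

Hello, thank you for additionally reporting the NER results for the encoding schemes.
From the results you show, I observe that for NER, Sequential >> Parallel > original.
If the original model is the PFN model proposed in your paper, does this mean that PFN's encoding scheme hurts NER performance? The Sequential scheme only passes entity information to the relation model and does not pass relation information to the entity model, yet its NER results are far better than the original model's. Even Parallel is slightly better than the original.
However, the core claim of your paper, contrary to the conclusions of previous related work, is that RE benefits NER, which is also the point that attracted me most.
So I would like to ask: do the results of this Extension on Ablation Study contradict your conclusion? I hope to hear your answer.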

Question about entity extraction of out-of-triples

Thanks for your talented and excellent work!
Are there any viable ways to compensate for the relation information that out-of-triple entities miss out on?

RuntimeError: CUDA error: device-side assert triggered

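Hello, I have recently been trying to combine this method with semi-supervised learning for entity and relation extraction. I split the dataset into labeled and unlabeled parts, but training keeps raising errors, for example:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This happens when I run only the labeled training part of the code, and also when I run the whole framework. But the labeled training code is almost unmodified from the source, so why does this happen? Here is part of the code: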
if args.do_train:
    logger.info("------Training------")
    if args.embed_mode == "albert":
        input_size = 4096
    else:
        input_size = 768

    model = PFN(args, input_size, ner2idx, rel2idx)
    model.to(device)

    optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)

    if args.eval_metric == "micro":
        metric = micro(rel2idx, ner2idx)
    else:
        metric = macro(rel2idx, ner2idx)

    BCEloss = loss()
    best_result = 0
    triple_best = None
    entity_best = None

    for epoch in range(args.epoch):
        steps, train_loss, loss_unlabeled, loss_labeled = 0, 0, 0, 0
        file_num = 1
        model.train()        
        for labeled_data in tqdm(labeled_batch):
            steps += 1
            optimizer.zero_grad()
            
            # labeled data
            text = labeled_data[0]
            ner_label = labeled_data[1].to(device)
            re_label = labeled_data[2].to(device)
            mask = labeled_data[-1].to(device)

            ner_pred, re_pred = model(text, mask)
            labeled_loss = BCEloss(ner_pred, ner_label, re_pred, re_label)
            labeled_loss.backward()
            train_loss += labeled_loss.item()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=args.clip)
            optimizer.step()

            if steps % args.steps == 0:
                logger.info("Epoch: {}, step: {} / {}, train_loss = {:.4f}".format
                            (epoch, steps, len(labeled_batch), (train_loss) / steps))

        logger.info("------ Training Set Results ------")
        logger.info("loss : {:.4f}".format((train_loss) / steps))

evaluate the model with customized input

I appreciate the work you shared, but I'm having some problems

I have many samples to predict. How should I process them?
python inference.py
--model_file ${the path of your saved model}
--sent ${sentence you want to evaluate, str type restricted}
This approach seems to load the model once to process only one sample, which is very slow. Is there any way to process all samples after loading the model once? Do you have any suggestions?

Thanks.
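
The general pattern for this is to pay the loading cost once and then loop over the inputs in the same process. The sketch below shows only that pattern: `predict_all`, `fake_load` and `fake_predict` are hypothetical names, and the two stand-in functions would have to be replaced by the actual model-loading and prediction code from inference.py, which is not reproduced here.

```python
def predict_all(sentences, load_model, predict):
    """Load the model once, then run prediction on every sentence."""
    model = load_model()  # expensive step, executed a single time
    return [predict(model, sentence) for sentence in sentences]


# Stand-ins so the sketch runs without the real model or checkpoint.
calls = {"loads": 0}


def fake_load():
    calls["loads"] += 1
    return "model"


def fake_predict(model, sentence):
    return {"sentence": sentence, "n_tokens": len(sentence.split())}


sentences = ["John lives in Boston .", "Mary works for IBM ."]
results = predict_all(sentences, fake_load, fake_predict)

print(calls["loads"])  # 1
print(len(results))    # 2
```

In practice the sentences could be read from a text file, one per line, and grouped into batches if the prediction function supports it.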

chinese custom dataset

Which dataset's format should Chinese data be converted to?

OOM for my own bigger datasets

After training finished and testing began, an OOM error occurred. Since the model is not optimized for multi-GPU use, did you run into this problem before?

SEMEVAL dataset

I have a dataset that is a translated version of SEMEVAL. I changed your code to evaluate your model on my dataset, but I didn't get good results. So I'd like to know why you didn't report your model's performance on SEMEVAL: did you not try it at all, or did it not work for some reason?

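Training fails on other datasets

I applied your data processing (nytAndWebnlg) to another dataset.
My guess is that, in your dataloader,
`subj = entity[subj_idx]`
`obj = entity[obj_idx]`
`rc_head_labels+=[subj['start'], obj['start'], re['type']]`
`rc_tail_labels+=[subj['end']-1, obj['end']-1, re['type']]`
means the following. The entities become [1, 1, 'None', 16, 20, 'None'], where the two numbers are an entity's start and end word indices and None is its type. The relations become rc_head_labels = [1, 1, '/location/location/contains'] and rc_tail_labels = [16, 20, '/location/location/contains'], that is, the word-index pairs of the head entity and the tail entity together with the relation type.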
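
Read literally, the four quoted lines pair the two entities' start indices in one label and their last-word indices in the other, rather than putting one entity in each label. The sketch below replays them on made-up data. It assumes `end` is exclusive (suggested by the `- 1`, but not confirmed in the post), and the `head`/`tail` keys and the `relation_labels` helper are hypothetical names.

```python
def relation_labels(entities, relation):
    """Replay the quoted dataloader lines for a single relation."""
    subj = entities[relation["head"]]
    obj = entities[relation["tail"]]
    # Start word of the subject, start word of the object, relation type.
    head_label = [subj["start"], obj["start"], relation["type"]]
    # Last word of the subject, last word of the object, relation type.
    tail_label = [subj["end"] - 1, obj["end"] - 1, relation["type"]]
    return head_label, tail_label


# Subject covers word 1 only; object covers words 16..20 (end is exclusive).
entities = [
    {"start": 1, "end": 2, "type": "None"},
    {"start": 16, "end": 21, "type": "None"},
]
relation = {"head": 0, "tail": 1, "type": "/location/location/contains"}

head_label, tail_label = relation_labels(entities, relation)
print(head_label)  # [1, 16, '/location/location/contains']
print(tail_label)  # [1, 20, '/location/location/contains']
```

Under these assumptions, the entity list [1, 1, 'None', 16, 20, 'None'] would yield head and tail labels of [1, 16, ...] and [1, 20, ...], not [1, 1, ...] and [16, 20, ...].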

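My questions are:

Is my understanding of the data processing correct?
Is there any difference between an entity type of None and an actual type?
During training, forward() raises a dimension mismatch. What causes this, and how should I fix it?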

Traceback (most recent call last):
File "B:\work\pycharm\PYCHARM\PyCharm Community Edition 2020.3.4\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "B:\work\pycharm\PYCHARM\PyCharm Community Edition 2020.3.4\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "B:/model/PFN-nested/main.py", line 197, in <module>
ner_pred, re_head_pred, re_tail_pred = model(text, mask)
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\model\PFN-nested\model\pfn.py", line 260, in forward
x = self.bert(**x)[0]
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\work\anaconda\envs\pfn\lib\site-packages\transformers\models\bert\modeling_bert.py", line 989, in forward
past_key_values_length=past_key_values_length,
File "B:\work\anaconda\envs\pfn\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "B:\work\anaconda\envs\pfn\lib\site-packages\transformers\models\bert\modeling_bert.py", line 221, in forward
embeddings += position_embeddings
RuntimeError: The size of tensor a (588) must match the size of tensor b (512) at non-singleton dimension 1

Process finished with exit code 1
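
The last line of the traceback says the input has 588 positions while the BERT position-embedding table has 512, so at least one sample is longer than the encoder accepts. A minimal sketch for finding such samples before training follows. `split_by_length` is a hypothetical helper, the whitespace token counter is only a placeholder for the real subword tokenizer, and the allowance of two special tokens assumes a [CLS] ... [SEP] layout.

```python
MAX_POSITIONS = 512  # position-embedding limit reported in the traceback


def split_by_length(samples, count_tokens, max_positions=MAX_POSITIONS,
                    special_tokens=2):
    """Separate samples that fit the encoder from those that are too long."""
    kept, too_long = [], []
    for sample in samples:
        if count_tokens(sample) + special_tokens <= max_positions:
            kept.append(sample)
        else:
            too_long.append(sample)
    return kept, too_long


def count_words(sample):
    # Placeholder: real counts must come from the model's subword tokenizer.
    return len(sample.split())


samples = ["a short sentence", " ".join(["w"] * 586)]
kept, too_long = split_by_length(samples, count_words)

print(len(kept), len(too_long))  # 1 1
```

Dropping or splitting the over-length samples keeps the entity indices valid, whereas plain truncation would cut off any entity located after the limit.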

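Question about the OOT (out-of-triple) dataset

When testing OOT entities in your paper, were the training and validation sets also split into OOT-entity datasets? Was the model retrained on an OOT training set and then tested on OOT entities, or was the model trained on the original training set (containing both OOT and in-triple data) used to predict on the OOT test set?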

Small inconsistency, or not?

Here

PFN/PFN-nested/model/pfn.py

Line 271 in 6173b3e

re_tail_score = self.re_tail(h_share, h_re, mask)

we see

re_tail_score = self.re_tail(h_share, h_re, mask)

I think it should be

re_tail_score = self.re_tail(h_re, h_share, mask)

Purely from gradient-flow considerations, we have two almost identical modules, but with the inputs swapped, h_re and h_share receive gradients from upper layers for semantically different tasks/losses. Besides that, the corrected variant learns slightly better in my experiments.

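About the results of the ablation study

Hello, thank you very much for your excellent work, which made me reconsider the possibility that relation recognition helps entity recognition. While reading the paper I had some questions that I hope you can answer.

The paper says the ablation study was run on SciERC, but the results of the different ablations are very close. Are your results from a single training run? I ran your code and found that results on SciERC are very unstable and vary considerably across random seeds. Could you provide significance values? Or do the ablation studies on other datasets lead to the same conclusions as on SciERC?
The ablation results only show relation extraction. How do the different settings affect entity extraction? Are there cases where the relation extraction result is high but the entity extraction result is low?
You use the partition operation but do not verify its effect. With the filter operation in use, how do results with the partition operation compare with the alternatives?
Thank you very much!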

Question about the tail re unit

Is there a reason why the order of h_re and h_share is swapped here as compared to over here?

Thanks for any help.
