tricktreat / promptner

Code for the paper "PromptNER: Prompt Locating and Typing for Named Entity Recognition", accepted at ACL 2023.

Python 98.07% HTML 1.93%

promptner's Issues

Format of the data set

What is the format of the dataset? Can you give me an example?

Help: GPU is not enabled when parsing the conf file

python prompt4ner.py train --config "/hy-tmp/PromptNER-main/configs/bert/conll2003.conf"

Config:
Namespace(boundary_threshold=0.0, cache_path=None, cls_threshold=0.0, config='/hy-tmp/PromptNER-main/configs/bert/conll2003.conf', cpu=False, debug=False, decoder_layers=3, deeply_weight='linear', device_id=0, epochs=50, eval_batch_size=16, eval_every_epochs=1, example_count=None, final_eval=False, freeze_transformer=False, init_eval=False, label='conll03_train', last_layer_for_loss=1, local_rank=-1, log_path='data/checkpoint', loss_boundary_weight=2.0, loss_class_weight=2.0, lowercase=False, lr=2e-05, lr_warmup=0.1, lstm_layers=3, match_boundary_weight=2.0, match_class_weight=2.0, match_solver='hungarian', match_warmup_epoch=0, max_grad_norm=1.0, model_path='bert-large-cased', model_type='prompt4ner', nil_weight=-1.0, no_duplicate=True, no_overlapping=True, no_partial_overlapping=True, pool_type='max', prompt_individual_attention=False, prompt_length=2, prompt_number=50, prompt_type='soft', prop_drop=0.5, repeat_gt_entities=45, sampling_processes=4, save_optimizer=False, save_path='data/checkpoint', save_path_include_iteration=False, seed=47, sentence_individual_attention=True, split_epoch=5, stage_one_lr_scale=1.5, store_examples=False, store_predictions=False, tokenizer_path='bert-large-cased', train_batch_size=8, train_log_iter=1, train_path='data/datasets/conll03/[email protected]', type_loss='celoss', types_path='data/datasets/conll03/conll03_types.json', use_masked_lm=False, valid_path='data/datasets/conll03/[email protected]', weight_decay=0.01, withimage=False, world_size=-1)
Repeat 1 times

Iteration 0

{}
Need 1 GPU for Normal Training, All are busy. Waiting for Free GPU ......
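This fails with the message above. I am sure I am using a GPU and have torch 1.10.1+cu111 installed, but it still reports that the GPUs are busy.
Thank you very much.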
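
How the script decides that a GPU is "busy" is not shown in the log above. As a rough independent check, the sketch below parses the output of `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` and lists GPUs whose memory use is under a threshold. The function names and the 10% threshold are illustrative assumptions, not part of PromptNER.

```python
import subprocess


def query_gpu_memory():
    """Return nvidia-smi's CSV report: one 'index, used, total' line per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def free_gpu_indices(csv_text, max_used_fraction=0.1):
    """Pick GPUs whose used/total memory ratio is at or below the threshold.

    The 10% default is an arbitrary assumption for illustration.
    """
    free = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        if float(used) / float(total) <= max_used_fraction:
            free.append(int(index))
    return free
```

Calling `free_gpu_indices(query_gpu_memory())` on the training machine shows whether any device looks idle by this measure. If the list is empty, another process may be holding GPU memory.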

The mit-movie_types.json file cannot be found

types_path = data/datasets/few-shot/data/mit-movie/mit-movie_types.json. I could not find mit-movie_types.json among the data files. I may simply have missed it; could you provide a download link for this file? Thank you.

A question about the data used in the code

Hello, is the wiki_50000_80_first_types.json file used in the paper's code available for download now? I would like to reproduce the experiments in the paper.

Dynamic template filling

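During dynamic template filling, aren't the M prompts identical? If so, what difference does it make which of the M prompts an entity is matched to?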
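
The config dump in the GPU issue lists `match_solver='hungarian'`, which suggests gold entities are assigned to prompts by minimum-cost bipartite matching. The sketch below illustrates that idea only: it assumes each prompt produces its own prediction, so the cost of assigning a given entity differs from prompt to prompt. It uses brute-force search over permutations as a stand-in for the Hungarian algorithm (in practice `scipy.optimize.linear_sum_assignment` would be the usual choice), and the cost values and function name are made up for illustration.

```python
from itertools import permutations


def best_assignment(cost):
    """Assign each gold entity to a distinct prompt at minimum total cost.

    cost[i][j] is the (hypothetical) cost of matching gold entity i with
    prompt j. Requires len(cost) <= len(cost[0]). Brute force, so only
    suitable for tiny examples.
    """
    n_gold, n_prompts = len(cost), len(cost[0])
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n_prompts), n_gold):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Two gold entities, three prompts; lower cost means a better fit.
cost = [
    [9, 1, 5],  # entity 0 fits prompt 1 best
    [2, 8, 4],  # entity 1 fits prompt 0 best
]
assignment, total = best_assignment(cost)
```

Under these assumed costs, entity 0 goes to prompt 1 and entity 1 to prompt 0. A different cost matrix would give a different assignment, which is the sense in which the choice of prompt can matter.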

Multithreading hangs in debug mode

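Hello, I found your paper very inspiring and wanted to read the code to see how it is implemented, but I ran into a problem and would like to ask whether there is a solution. I am debugging on a cloud server over a VS Code SSH connection. During debugging, the multiprocessing in the code deadlocks for reasons I do not understand, and I cannot continue. The relevant code is shown below: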

    def _compute_extended_attention_mask(self, attention_mask, context_count, prompt_number):
        
        if not self.prompt_individual_attention and not self.sentence_individual_attention:
            # #batch x seq_len
            extended_attention_mask = attention_mask
        else:
            # #batch x seq_len x seq_len
            extended_attention_mask = attention_mask.unsqueeze(1).expand(-1, attention_mask.size(-1), -1).clone()
            # attention_mask=attention_mask.unsqueeze(1).expand(-1, attention_mask.size(-1), -1)
            # extended_attention_mask=torch.empty_like(attention_mask)
            # extended_attention_mask.copy_(attention_mask)
            for mask, c_count in zip(extended_attention_mask, context_count):
                # mask seq_len x seq_len
                # mask prompt for sentence encoding
                if self.prompt_individual_attention:
                    # encode for each prompt
                    for p in range(prompt_number):
                        mask[p*self.prompt_length:  p*self.prompt_length + self.prompt_length, :prompt_number*self.prompt_length] = 0
                        mask[p*self.prompt_length: p*self.prompt_length + self.prompt_length, p*self.prompt_length: p*self.prompt_length + self.prompt_length] = 1
                if self.sentence_individual_attention:
                    for c in range(c_count):
                        mask[c+self.prompt_length*prompt_number, :self.prompt_length*prompt_number] = 0

        return extended_attention_mask

    def _common_forward(
        self, 
        encodings: torch.Tensor, 
        context_masks: torch.Tensor, 
        raw_context_masks: torch.Tensor, 
        inx4locator: torch.Tensor, 
        pos_encoding: torch.Tensor, 
        seg_encoding: torch.Tensor, 
        context2token_masks: torch.Tensor,
        token_masks: torch.Tensor,
        image_inputs: dict = None,
        meta_doc = None):
        
        batch_size = encodings.shape[0]
        context_masks = context_masks.float()
        token_count = token_masks.long().sum(-1,keepdim=True)
        context_count = context_masks.long().sum(-1,keepdim=True)
        raw_context_count = raw_context_masks.long().sum(-1,keepdim=True)
        pos = None
        tgt = None
        tgt2 = None

        # pdb.set_trace()
        
        context_masks = self._compute_extended_attention_mask(context_masks, raw_context_count, self.prompt_number)
        # print(context_masks.shape) (n,len,len)
        # print(context_masks.shape)
        # self = self.eval()
        if self.model_type == "bert":
            model = self.bert
        if self.model_type == "roberta":
            model = self.roberta
        # model.embeddings.position_embeddings
        outputs = model(
                    input_ids=encodings,
                    attention_mask=context_masks,
                    # token_type_ids=seg_encoding,
                    # position_ids=pos_encoding,
                    output_hidden_states=True)
        # last_hidden_state, pooler_output, hidden_states
....

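Setting a breakpoint just before or after the line extended_attention_mask = attention_mask.unsqueeze(1).expand(-1, attention_mask.size(-1), -1).clone(), which is reached through context_masks = self._compute_extended_attention_mask(context_masks, raw_context_count, self.prompt_number), causes a multiprocess deadlock and makes debugging impossible. As far as I can see, the function body does not involve any multiprocess conflict. What could be causing this?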
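
This does not explain the hang, but the masking logic itself can be inspected outside the debugger. The sketch below re-creates `_compute_extended_attention_mask` for a single sequence using plain Python lists instead of tensors. It assumes, as the quoted code implies, that the prompt tokens occupy the first `prompt_number * prompt_length` positions. The function name and list-based form are illustrative, and unlike the original it always returns a square mask.

```python
def build_extended_mask(attention_mask, context_count, prompt_number, prompt_length,
                        prompt_individual_attention=False,
                        sentence_individual_attention=True):
    """Expand a 1-D attention mask to seq_len x seq_len for one sequence."""
    seq_len = len(attention_mask)
    prompt_span = prompt_number * prompt_length
    # Every row starts as a copy of the 1-D mask (the expand + clone step).
    mask = [list(attention_mask) for _ in range(seq_len)]
    if prompt_individual_attention:
        # Each prompt sees its own tokens but none of the other prompts.
        for p in range(prompt_number):
            lo, hi = p * prompt_length, (p + 1) * prompt_length
            for row in range(lo, hi):
                for col in range(prompt_span):
                    mask[row][col] = 1 if lo <= col < hi else 0
    if sentence_individual_attention:
        # Sentence tokens do not attend to any prompt token.
        for c in range(context_count):
            row = prompt_span + c
            for col in range(prompt_span):
                mask[row][col] = 0
    return mask


# 2 prompts of length 1, 3 sentence tokens, 1 padding position.
mask = build_extended_mask([1, 1, 1, 1, 1, 0], context_count=3,
                           prompt_number=2, prompt_length=1)
for row in mask:
    print(row)
```

With the default flags, the three sentence rows have zeros in the two prompt columns, while the prompt rows and the padding row keep the original mask.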

How do I get the JSON file

Hi, I want to know how to get the JSON file.
The uploaded files are in txt format, but the source code seems to require JSON, so they must be converted before use. 'mit-movie_types.json' is also missing from the dataset, and I could not find a conversion utility in the source code.
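
Since no converter is mentioned in this thread, here is a minimal sketch of what one might look like. Everything about the formats is an assumption: the input is taken to be CoNLL-style text with one `token TAG` pair per line, BIO tags, and a blank line between sentences, and the output schema (`tokens` plus `entities` with `type`, `start`, and an exclusive `end`) is hypothetical and may not match what PromptNER expects. If the txt files put the tag before the token, the two columns would need to be swapped.

```python
import json


def spans_from_bio(tags):
    """Turn a BIO tag sequence into entity spans with exclusive end indices."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes an open span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append({"type": etype, "start": start, "end": i})
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def bio_to_records(text):
    """Convert 'token TAG' lines into one dict per sentence (assumed schema)."""
    records = []
    tokens, tags = [], []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
        line = line.strip()
        if line:
            parts = line.split()
            tokens.append(parts[0])   # assumed: token in the first column
            tags.append(parts[-1])    # assumed: tag in the last column
            continue
        if tokens:
            records.append({"tokens": tokens, "entities": spans_from_bio(tags)})
            tokens, tags = [], []
    return records


sample = "John B-PER\nSmith I-PER\nvisited O\nParis B-LOC\n\nHello O\n"
records = bio_to_records(sample)
print(json.dumps(records, indent=2))
```

The same pass could collect the set of entity types seen, which might serve as a starting point for a types file, though the structure of `mit-movie_types.json` is not described here.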
