clovaai / bros Goto Github PK

License: Apache License 2.0

Python 99.58% Shell 0.42%

bros's Issues

Inference code for EL task

First of all, thanks for the awesome code! I would like to try it on my own dataset and measure its performance. However, I could not find the inference code anywhere. Is there a plan to release it? Many thanks!

How to solve lr = 0 after training 5 epochs

Thank you for the amazing work!
I am training the model on a custom dataset. However, I noticed that after 5 epochs of training the learning rate dropped to 0, which makes it hard for the model to keep learning. Could you point me to the learning rate schedule BROS uses and explain how to change it for my case? Thanks!

TRAIN [epoch: 0/50] || train_loss: 460.69653 || lr: 4e-05 || time: 193.6 secs.
precision: 0.9080, recall: 0.9023, f1: 0.9052
TRAIN [epoch: 1/50] || train_loss: 129.8502 || lr: 3e-05 || time: 198.6 secs.
precision: 0.9374, recall: 0.9184, f1: 0.9278
TRAIN [epoch: 2/50] || train_loss: 75.951 || lr: 3e-05 || time: 198.0 secs.
precision: 0.9293, recall: 0.9183, f1: 0.9237
TRAIN [epoch: 3/50] || train_loss: 46.87292 || lr: 2e-05 || time: 197.8 secs.
precision: 0.9442, recall: 0.9391, f1: 0.9416
TRAIN [epoch: 4/50] || train_loss: 28.64673 || lr: 1e-05 || time: 197.7 secs.
precision: 0.9444, recall: 0.9392, f1: 0.9418
TRAIN [epoch: 5/50] || train_loss: 16.82515 || lr: 0.0 || time: 197.6 secs.
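
The log above looks like a linear decay whose total step count covers only about five epochs, although the scheduler BROS actually uses is not shown in this thread. Below is a minimal sketch under that assumption. The names `lr_multiplier`, `warmup_steps`, `total_steps`, and all numeric values are hypothetical and not taken from the repository.

```python
def lr_multiplier(step, warmup_steps, total_steps):
    """Linear warmup followed by linear decay to zero (an assumed schedule)."""
    if step < warmup_steps:
        # Ramp up from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Decay linearly from 1 at the end of warmup to 0 at total_steps,
    # then stay at 0 for any later step.
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


base_lr = 5e-5                      # assumed value, not read from a BROS config
steps_per_epoch = 100               # hypothetical
total_steps = 5 * steps_per_epoch   # schedule sized for 5 epochs only

for epoch in range(7):
    step = (epoch + 1) * steps_per_epoch
    print(epoch, base_lr * lr_multiplier(step, 0, total_steps))
```

If the total step count is derived from a configured number of samples per epoch, a value smaller than the real dataset size would end the schedule early. That link is an assumption here, not something confirmed in this thread.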

What does the result mean?

last_hidden_state
tensor([[[-0.0342, 0.2487, -0.2819, ..., 0.1495, 0.0218, 0.0484],
[ 0.0792, -0.0040, -0.0127, ..., -0.0918, 0.0810, 0.0419],
[ 0.0808, -0.0918, 0.0199, ..., -0.0566, 0.0869, -0.1859],
[ 0.0862, 0.0901, 0.0473, ..., -0.1328, 0.0300, -0.1613],
[-0.2925, 0.2539, 0.1348, ..., 0.1988, -0.0148, -0.0982],
[-0.4160, 0.2135, -0.0390, ..., 0.6908, -0.2985, 0.1847]]],
grad_fn=)
last_hidden_state.shape
torch.Size([1, 6, 768])

The dataset for CORD linking task

Hello, I am interested in this great work. However, I am a little confused about the linking task in CORD. Is the entity with the category "menu.nm" linked to all the other entities within the same group? Also, do you use "is_key" to split a valid line (often at the bottom of an image) into 2 entities and then generate a link between them?

Can you also provide the inference file for this repository?

Great work! Can you also provide the inference file for this repository?

config parameter max_seq_length: 512

Hi!
Thank you for sharing BROS!
I ran into a document whose entities go beyond the limit of 512 tokens.
I see that BROS has a configuration parameter to extend this limit:

  max_seq_length: 512

but the pre-trained model available on Hugging Face only supports 512 tokens.
Does that mean fine-tuning is also limited to 512 tokens?

thank you,
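
A common workaround for encoders with a fixed maximum length is to split a long document into overlapping windows and run each window separately. This is a generic technique, not something the BROS repository is confirmed to provide, and the function name `split_into_windows` and the `stride` parameter are hypothetical. A minimal sketch:

```python
def split_into_windows(tokens, max_len=512, stride=128):
    """Split a token list into overlapping windows of at most max_len items."""
    if max_len <= stride:
        raise ValueError("max_len must be larger than stride")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        # Step forward, keeping `stride` tokens of overlap with the previous window.
        start += max_len - stride
    return windows


tokens = list(range(1000))          # stand-in for token ids
for window in split_into_windows(tokens):
    print(window[0], window[-1], len(window))
```

The bounding boxes would need to be sliced with the same indices so that each token keeps its box, and predictions in the overlapping regions would have to be merged afterwards.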

MLM Pretraining missing bbox inputs

Hi, great work on the package.
It seems that some of the model classes, e.g. BrosLMHeadModel, are missing the bbox inputs (example below). Correct me if I have misunderstood, but I think bbox should be added here.
If you like, I can open a PR to fix it here and in other places such as BrosForSequenceClassification and BrosPreTrainedModel.

MLM Model input

bros/bros/modeling_bros.py

Lines 1314 to 1318 in eb3aa51

def forward(
    self,
    input_ids=None,
    attention_mask=None,
    token_type_ids=None,

Bros Model call in MLM Model

bros/bros/modeling_bros.py

Lines 1378 to 1381 in eb3aa51

outputs = self.bros(
    input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,

Multiple linking in BROS

I am trying out multiple linking in the model. I changed the shape of "el_labels" to (2, self.max_seq_length) so that each box can have 2 links, and I changed the linking output by setting the parameter n_relations = 2.

## in bros_dataset.py
### Line 410
el_labels = np.ones((2, self.max_seq_length), dtype=int) * self.max_seq_length

## in bros_spade_rel.py
### Line 51
self.relation_net = RelationExtractor(
    n_relations=2,
    backbone_hidden_size=self.backbone_hidden_size,
    head_hidden_size=self.head_hidden_size,
    head_p_dropout=self.head_p_dropout,
)

Can anyone tell me how to change the loss function for multi-linking?

Suggestions for implement pre-training

Thanks for your impressive work.
Can you share how to implement pretraining code?

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

This error occurs while training the model with:
CUDA_VISIBLE_DEVICES=0 python3 train.py --config=configs/custom.yaml

Clarification regarding `num_samples_per_epoch`

Could you please clarify whether num_samples_per_epoch in the config files refers to the total number of documents in the training set, or whether it means something else?

I set the num_samples_per_epoch to the number of docs in my training set, however, the LRScheduler warmup is not working as expected.

Code release date

Hi.
I've seen that the model has been uploaded to the huggingface hub but without any information in the card: https://huggingface.co/naver-clova-ocr/bros-base-uncased
Just wondering when you are planning to upload the code in this repo.
Thanks in advance,

Change bert model

Hello!!

I want to train on a Korean dataset. Can I change the BERT model? If so, which part should I modify?

F-score on CORD dataset

Thanks for the excellent work! I am trying to reproduce the results on the CORD dataset. However, I find that the F-scores in your paper differ somewhat from those in the LayoutLMv2 paper. Specifically, LayoutLMv2*-base achieves 96.05 and LayoutLMv2*-large achieves 97.24 in your paper, while in the LayoutLMv2 paper LayoutLMv2-base achieves 94.95 and LayoutLMv2-large achieves 96.01. Could you give an example of fine-tuning BROS on the CORD dataset? Thanks!

How to convert BIO-tagged sequence to SPADE

Hi,

BIO is the dominant tagging strategy for token classification tasks. Could you explain how to convert a BIO-tagged sequence to SPADE? This would be useful for fine-tuning the SPADE-based EE model on custom datasets.

I know the same can be reverse-engineered from the codebase, but it'll be helpful if we have a concrete description of -

initial_tokens, subsequent_tokens
How to obtain them from a BIO-tagged sequence
How to "combine" the predicted initial_logits and subsequent_logits to determine the final class prediction for each token.

Thanks.
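
One possible reading of the conversion asked about above, as a sketch rather than the repository's confirmed behaviour. It assumes that the initial-token labels hold the class of the first token of each entity (0 meaning "O"), and that the subsequent-token labels point each later token at its predecessor, with the sequence length used as a "no link" index. The function names `bio_to_spade` and `spade_to_bio` are hypothetical.

```python
def bio_to_spade(tags, class_names, no_link):
    """Convert BIO tags to initial-token classes and subsequent-token links."""
    class_idx = {name: i + 1 for i, name in enumerate(class_names)}  # 0 is "O"
    initial = [0] * len(tags)
    subsequent = [no_link] * len(tags)
    prev_class = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_class = None
            continue
        prefix, name = tag.split("-", 1)
        if prefix == "I" and prev_class == name:
            subsequent[i] = i - 1          # continue the open entity
        else:
            initial[i] = class_idx[name]   # "B-" tag, or a stray "I-" treated as a start
        prev_class = name
    return initial, subsequent


def spade_to_bio(initial, subsequent, class_names, no_link):
    """Rebuild BIO tags by following links back to an initial token."""
    tags = ["O"] * len(initial)
    for i in range(len(initial)):
        if initial[i] > 0:
            tags[i] = "B-" + class_names[initial[i] - 1]
        elif subsequent[i] != no_link:
            j = subsequent[i]
            # Bounded walk so that a cycle in predicted links cannot loop forever.
            for _ in range(len(initial)):
                if initial[j] > 0 or subsequent[j] == no_link:
                    break
                j = subsequent[j]
            if initial[j] > 0:
                tags[i] = "I-" + class_names[initial[j] - 1]
    return tags


tags = ["B-HEADER", "I-HEADER", "O", "B-ANSWER", "I-ANSWER", "I-ANSWER"]
classes = ["HEADER", "ANSWER"]
initial, subsequent = bio_to_spade(tags, classes, no_link=len(tags))
print(initial, subsequent)
print(spade_to_bio(initial, subsequent, classes, no_link=len(tags)))
```

With model outputs, `initial` would come from an argmax over the initial-token logits and `subsequent` from an argmax over the subsequent-token logits for each token. That decoding step is likewise an assumption here.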

Clarification on table 5

Hi there, first of all thanks for sharing your excellent work.
I have a question about how you obtain the results in Table 5. In the paper you mention that you don't use the order information, but how exactly do you implement that? Do you remove the 1D absolute positional embeddings from the model? If so, does that require a new pre-training run? Finally, I guess you still train with the dataset order of the words and only shuffle the words at test time, is that right?

Thanks in advance!

KeyError: 'bros'

Hi, I am trying to load the model using the transformers library of HuggingFace. However I got a KeyError: 'bros' when trying to load the model from_pretrained. Specifically I have the following:
from transformers import AutoModel
model = AutoModel.from_pretrained("naver-clova-ocr/bros-base-uncased")

as stated in the doc https://huggingface.co/naver-clova-ocr/bros-base-uncased/tree/main

The full stack error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 396, in from_pretraine
config, kwargs = AutoConfig.from_pretrained(
File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 560, in from_pre
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 301, in __getite
raise KeyError(key)

I'm using huggingface-hub version 0.1.2 and transformers 4.12.5.

Any thoughts here?
Thanks.

Finetuned Checkpoints for SPADE EL Task

Hi, hope you are doing well.

I wanted to ask whether you have uploaded any checkpoints mentioned in the fine-tuning doc. I wanted to evaluate/test the model performance and training does not seem to be an option currently.

Thanks

Bounding box clarification

Thanks for contributing this awesome piece of research!

Quick question about the input boxes.

For BROS, is the expected format [x1, y1, x2, y2, x3, y3, x4, y4], where each (x, y) pair is a corner of the bounding box, starting from the top left and going clockwise?
Should each bounding box be normalized by dividing x values by the image width and y values by the image height?

I'm training on DocVQA, but the results are not that great. Just trying to make sure I'm doing everything right :)
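
The normalization described in the question can be sketched as follows. Whether BROS expects exactly this layout is the open question of this thread, so the 8-value clockwise format and the 0 to 1 range are assumptions, and `normalize_quad` is a hypothetical helper name.

```python
def normalize_quad(quad, width, height):
    """Scale [x1, y1, x2, y2, x3, y3, x4, y4] into the 0-1 range."""
    if len(quad) != 8:
        raise ValueError("expected 8 coordinates")
    normalized = []
    for i, value in enumerate(quad):
        # Even positions are x coordinates, odd positions are y coordinates.
        scale = width if i % 2 == 0 else height
        # Clamp so boxes that slightly overshoot the page stay in range.
        normalized.append(min(1.0, max(0.0, value / scale)))
    return normalized


# Corners listed clockwise from the top left on a 200 x 100 image.
print(normalize_quad([10, 20, 110, 20, 110, 60, 10, 60], width=200, height=100))
```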

End2end EE and EL

Hi,
first of all, thanks for the code, it is a great contribution to the community! From the paper I understood that the model could be fine-tuned end-to-end for EE and EL at the same time. However, looking at the code, I think it does not work like that, right? Is combined end-to-end EE and EL supported in some way?

Thanks,

Format of label & output in Relation Extraction task

Hi, thanks for the excellent work!
In your repository, I see the format in which the relation extraction labels are built from the dataset:
el_labels = np.ones(self.max_seq_length, dtype=int) * self.max_seq_length
el_labels[word_to] = word_from
where el_labels[i] = j means that words[j] links to words[i], with words[j] being the question and words[i] the answer.
With this format, a word labeled as an answer can be linked from only one word labeled as a question. What happens if several question words connect to the same answer word? Could that cause links (connections) to be lost?
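
The concern can be reproduced with a small list-based sketch of the same labelling scheme. It mirrors the two lines quoted above but uses plain Python lists instead of NumPy, and `build_el_labels` is a hypothetical helper name.

```python
def build_el_labels(links, max_seq_length):
    """links: (word_from, word_to) pairs. Returns (el_labels, dropped links)."""
    # max_seq_length is used as the "no link" value, as in the quoted snippet.
    el_labels = [max_seq_length] * max_seq_length
    dropped = []
    for word_from, word_to in links:
        if el_labels[word_to] != max_seq_length:
            # word_to already has an incoming link; the earlier one is overwritten.
            dropped.append((el_labels[word_to], word_to))
        el_labels[word_to] = word_from
    return el_labels, dropped


# Words 0 and 1 both link to word 2; only the last assignment survives.
labels, dropped = build_el_labels([(0, 2), (1, 2), (0, 3)], max_seq_length=5)
print(labels)
print(dropped)
```

Because each position stores a single source index, a second link into the same word replaces the first one, which is the loss of connections the question describes.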

Fine Tuning on Custom Dataset

Thank you very much for sharing this great work!
I was wondering if there are any instructions on how to prepare custom data to be used for fine-tuning Bros. I understand there are preprocessing codes for FUNSD, but if there are summarized instructions, it will be greatly helpful.

Model weights license

Hi team, thanks for the great work. I would like to know whether the pre-trained models are released under the same Apache-2.0 license as the code.

TorchText Issue on Google Colab

Hello,

I am trying to run the fine tuning scripts for FUNSD on Google Colab; I have installed all the required dependencies in requirements.txt, but when running

!CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_ee_bies.yaml

I am getting

OSError: /usr/local/lib/python3.8/dist-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_

I have tried installing torchtext and upgraded pytorch lightning correspondingly, to no avail.

Any ideas on what could be going on?

Thanks!

Question about EL Task Experiment Results

Thank you very much for sharing this great work again!

I have a question that came up while reading the paper carefully. I am curious why the EL tasks section of Table 5 (performance comparisons) contains no comparison between BROS and SPADE, given that BROS uses the SPADE decoder for the EL task. I am very interested in the result. Thank you!

Correct implementation of RelationExtractor

I find that the implementation of RelationExtractor in this repository is incorrect with respect to the original one. I'm aware that the implementation is (more or less) the same as the one in 2005.00642. But after digging into the code in clovaai/spade, I realized that the original implementation differs from the paper. Below, SPADE refers to the original implementation, BROS to this repository, and SPADE paper to the paper.

SPADE and SPADE paper use two score matrices for each relation; BROS has only one.
SPADE paper and BROS use a threshold to binarize the scores and obtain the adjacency matrix; SPADE uses an element-wise argmax over the two score matrices, so the two matrices act like the scores for "no edge" and "edge".
The loss function in SPADE is a weighted cross entropy, with a heavy weight on the second score matrix (edge present).

My version of RelationExtractor (which has been tested and achieves results roughly equivalent to the original SPADE):

import torch
from torch import nn


class RelationTagger(nn.Module):
    def __init__(self, n_fields, hidden_size):
        super().__init__()
        self.head = nn.Linear(hidden_size, hidden_size)
        self.tail = nn.Linear(hidden_size, hidden_size)
        self.field_embeddings = nn.Parameter(
            torch.rand(1, n_fields, hidden_size))
        self.W_label_0 = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_label_1 = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, enc):

        enc_head = self.head(enc)
        enc_tail = self.tail(enc)

        batch_size = enc_tail.size(0)
        field_embeddings = self.field_embeddings.expand(batch_size, -1, -1)
        enc_head = torch.cat([field_embeddings, enc_head], dim=1)

        score_0 = torch.matmul(
            enc_head, self.W_label_0(enc_tail).transpose(1, 2))
        score_1 = torch.matmul(
            enc_head, self.W_label_1(enc_tail).transpose(1, 2))

        score = torch.cat([score_0.unsqueeze(1), score_1.unsqueeze(1)], dim=1)
        return score

This implementation handles a single relation, but multiple instances of the layer can be used for multiple relations. The output shape is b * s * (n+f) * n, where b is the batch size, s = 2 is the number of score matrices, n is the sequence length, and f is the number of fields. The final relation matrix is obtained with score.argmax(dim=1).
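
The decoding step described above, an element-wise argmax over the two score matrices, can be illustrated without tensors. The sketch below uses nested lists for a single example, `decode_adjacency` is a hypothetical name, and resolving ties to "no edge" is a choice made for this illustration.

```python
def decode_adjacency(score_0, score_1):
    """Element-wise argmax over two score matrices of equal shape.

    score_0 holds the "no edge" scores and score_1 the "edge" scores.
    An entry becomes 1 only when the "edge" score is strictly larger.
    """
    return [
        [1 if s1 > s0 else 0 for s0, s1 in zip(row_0, row_1)]
        for row_0, row_1 in zip(score_0, score_1)
    ]


score_0 = [[0.2, 1.0], [0.5, 0.5]]
score_1 = [[0.9, -1.0], [0.5, 0.7]]
print(decode_adjacency(score_0, score_1))
```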
