clovaai / bros
License: Apache License 2.0
First of all, thanks for the awesome code! I would like to try it on my own dataset and measure the performance, but I could not find the inference code anywhere. Is there a plan to release it? Many thanks!
Thank you for the amazing work!
I am training the model on a custom dataset. However, I noticed that after 5 epochs of training the learning rate drops to 0, which makes it hard for the model to keep learning. Could you point me to the learning rate schedule used by BROS and explain how to adjust it for my case? Thanks!
TRAIN [epoch: 0/50] || train_loss: 460.69653 || lr: 4e-05 || time: 193.6 secs.
precision: 0.9080, recall: 0.9023, f1: 0.9052
TRAIN [epoch: 1/50] || train_loss: 129.8502 || lr: 3e-05 || time: 198.6 secs.
precision: 0.9374, recall: 0.9184, f1: 0.9278
TRAIN [epoch: 2/50] || train_loss: 75.951 || lr: 3e-05 || time: 198.0 secs.
precision: 0.9293, recall: 0.9183, f1: 0.9237
TRAIN [epoch: 3/50] || train_loss: 46.87292 || lr: 2e-05 || time: 197.8 secs.
precision: 0.9442, recall: 0.9391, f1: 0.9416
TRAIN [epoch: 4/50] || train_loss: 28.64673 || lr: 1e-05 || time: 197.7 secs.
precision: 0.9444, recall: 0.9392, f1: 0.9418
TRAIN [epoch: 5/50] || train_loss: 16.82515 || lr: 0.0 || time: 197.6 secs.
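A learning rate that falls to 0 after a few epochs is what a linear warmup/decay schedule produces when its total step count is smaller than the number of steps actually trained. The sketch below illustrates that effect in plain Python. It assumes a linear schedule and uses made-up step counts. The function names are hypothetical, and it is not taken from the BROS code.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step.

    Assumed schedule: linear ramp from 0 to 1 during warmup, then linear
    decay from 1 to 0 at total_steps, clamped at 0 afterwards.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


def lr_at_epoch_end(base_lr, steps_per_epoch, warmup_steps, total_steps, n_epochs):
    """Learning rate observed at the end of each epoch."""
    return [
        base_lr * linear_warmup_decay((epoch + 1) * steps_per_epoch, warmup_steps, total_steps)
        for epoch in range(n_epochs)
    ]


# Hypothetical numbers: the schedule is sized for 5 epochs of 100 steps each,
# but training continues for 7 epochs.
schedule = lr_at_epoch_end(
    base_lr=5e-5, steps_per_epoch=100, warmup_steps=0, total_steps=500, n_epochs=7
)
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.1e}")
```

If the training script derives the total step count from config values such as the number of epochs, samples per epoch and batch size, it may be worth checking that those values match the real dataset size.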
last_hidden_state
tensor([[[-0.0342, 0.2487, -0.2819, ..., 0.1495, 0.0218, 0.0484],
[ 0.0792, -0.0040, -0.0127, ..., -0.0918, 0.0810, 0.0419],
[ 0.0808, -0.0918, 0.0199, ..., -0.0566, 0.0869, -0.1859],
[ 0.0862, 0.0901, 0.0473, ..., -0.1328, 0.0300, -0.1613],
[-0.2925, 0.2539, 0.1348, ..., 0.1988, -0.0148, -0.0982],
[-0.4160, 0.2135, -0.0390, ..., 0.6908, -0.2985, 0.1847]]],
grad_fn=)
last_hidden_state.shape
torch.Size([1, 6, 768])
Hello, I am interested in this great work. However, I am a little confused about the linking task in CORD. Does the entity with category "menu.nm" link to all the other entities within the same group? Also, do you use "is_key" to split a valid line (often at the bottom of an image) into 2 entities and then generate a link between them?
Great work! Can you also provide the inference file for this repository?
Hi!
Thank you for sharing BROS!
I ran into a document whose entities go beyond the limit of 512 tokens.
I see that BROS has a configuration parameter to extend this limit:
max_seq_length: 512
However, the pre-trained model available on Hugging Face only supports 512 tokens.
Does that mean fine-tuning is also limited to 512 tokens?
thank you,
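One common workaround for a fixed 512-token limit is to split a long document into overlapping windows and run the model on each window separately. The sketch below only computes the window boundaries. The helper name and the overlap size are hypothetical, and this is a generic technique rather than something the BROS repository is known to provide.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, max_len: int = 512, overlap: int = 128) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs that cover n_tokens tokens.

    Consecutive windows share `overlap` tokens, so an entity cut by one
    window boundary is fully visible in the neighbouring window.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            break
        start = end - overlap
    return windows


# A 1000-token document split into windows of at most 512 tokens.
print(sliding_windows(1000))
```

In practice max_len would need to leave room for any special tokens the tokenizer adds, and predictions in the overlapping regions have to be merged afterwards.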
Hi, great work on the package.
It seems that some of the model classes, e.g. BrosLMHeadModel, are missing the bbox inputs (example below). Correct me if I misunderstood, but I think bbox should be added here.
If you would like, I can open a PR to fix it here and in the other places, such as BrosForSequenceClassification and BrosPreTrainedModel.
MLM Model input
Lines 1314 to 1318 in eb3aa51
Lines 1378 to 1381 in eb3aa51
I am trying out multiple linking in the model. I changed the shape of "el_labels" to (2, self.max_seq_length) in order to give 2 links for each box, and I changed the linking output by setting the parameter n_relations = 2.
## in bros_dataset.py
### Line 410
    el_labels = np.ones((2, self.max_seq_length), dtype=int) * self.max_seq_length
## in bros_spade_rel.py
### Line 51
    self.relation_net = RelationExtractor(
        n_relations=2,
        backbone_hidden_size=self.backbone_hidden_size,
        head_hidden_size=self.head_hidden_size,
        head_p_dropout=self.head_p_dropout,
    )
Can anyone tell me how to change the loss function for multi-linking?
Thanks for your impressive work.
Could you share the pretraining code, or explain how to implement it?
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
I get this error while training the model with the following command:
CUDA_VISIBLE_DEVICES=0 python3 train.py --config=configs/custom.yaml
Could you please clarify whether num_samples_per_epoch in the config files refers to the total number of documents in the training set, or does it mean something else?
I set num_samples_per_epoch to the number of docs in my training set. However, the LRScheduler warmup is not working as expected.
Hi.
I've seen that the model has been uploaded to the huggingface hub but without any information in the card: https://huggingface.co/naver-clova-ocr/bros-base-uncased
Just wondering when you are planning to upload the code in this repo.
Thanks in advance,
Hello!!
I want to train the model on a Korean dataset. Can I change the BERT model? If so, which part should I modify?
Thanks for the excellent work! I am trying to reproduce the result on the CORD dataset. However, I find that the F-scores in your paper differ somewhat from those in the LayoutLMv2 paper. Specifically, LayoutLMv2*-base achieves 96.05 and LayoutLMv2*-large achieves 97.24 in your paper, while in the LayoutLMv2 paper, LayoutLMv2-base achieves 94.95 and LayoutLMv2-large achieves 96.01. Could you give an example of fine-tuning BROS on the CORD dataset? Thanks!
Hi,
BIO is the dominant tagging strategy for token classification tasks. Could you explain how to convert a BIO-tagged sequence to the SPADE format? This would be useful for fine-tuning the SPADE-based EE model on custom datasets.
I know this can be reverse-engineered from the codebase, but it would be helpful to have a concrete description of initial_tokens and subsequent_tokens, and of how initial_logits and subsequent_logits are used to determine the final class prediction for each token.
Thanks.
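The conversion asked about here can be sketched as follows, under one assumed reading of the SPADE-style labels: an "initial" label gives the entity class of the first token of each entity (0 for every other token), and a "subsequent" label points each continuation token back to the previous token of the same entity (a sentinel value when there is none). The function names, the 1-based class indexing and the sentinel are assumptions made for illustration, not a description of the repository's exact format.

```python
from typing import List, Tuple


def bio_to_spade(tags: List[str], class_names: List[str], no_link: int) -> Tuple[List[int], List[int]]:
    """Convert BIO tags into (initial, subsequent) label lists."""
    class_idx = {name: i + 1 for i, name in enumerate(class_names)}  # 0 = not an initial token
    initial = [0] * len(tags)
    subsequent = [no_link] * len(tags)
    prev_class = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_class = None
            continue
        prefix, name = tag.split("-", 1)
        if prefix == "B" or prev_class != name:
            initial[i] = class_idx[name]  # first token of a new entity
        else:
            subsequent[i] = i - 1  # continuation: point to the previous token
        prev_class = name
    return initial, subsequent


def spade_to_entities(initial: List[int], subsequent: List[int], no_link: int):
    """Decode (class, token indices) pairs by following links from each initial token."""
    next_of = {prev: i for i, prev in enumerate(subsequent) if prev != no_link}
    entities = []
    for i, cls in enumerate(initial):
        if cls == 0:
            continue
        chain = [i]
        # The length guard stops the walk if predicted links ever form a cycle.
        while chain[-1] in next_of and len(chain) <= len(initial):
            chain.append(next_of[chain[-1]])
        entities.append((cls, chain))
    return entities


tags = ["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "I-ANSWER", "I-ANSWER"]
initial, subsequent = bio_to_spade(tags, ["QUESTION", "ANSWER"], no_link=len(tags))
print(initial, subsequent)
print(spade_to_entities(initial, subsequent, no_link=len(tags)))
```

At prediction time the same decoding would run on the argmax of the initial and subsequent logits. Note that BIO can only express entities whose tokens are adjacent in the sequence, whereas link-based labels can also connect tokens that are not.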
Hi there, first of all thanks for sharing your excellent work.
I have a question about how you obtained the results in Table 5. The paper says that you don't use the order information, but how exactly is that implemented? Do you remove the 1D absolute positional embeddings from the model? If so, does that require a new pre-training run? Finally, I assume you still train with the dataset's word order and shuffle the words only at test time, is that right?
Thanks in advance!
Hi, I am trying to load the model using the Hugging Face transformers library. However, I get a KeyError: 'bros' when calling from_pretrained. Specifically, I run the following:
    from transformers import AutoModel
    model = AutoModel.from_pretrained("naver-clova-ocr/bros-base-uncased")
as stated in the doc https://huggingface.co/naver-clova-ocr/bros-base-uncased/tree/main
The full stack error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 396, in from_pretraine
config, kwargs = AutoConfig.from_pretrained(
File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 560, in from_pre
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 301, in __getite
raise KeyError(key)
I'm using huggingface-hub version 0.1.2 and transformers 4.12.5.
Any thoughts here?
Thanks.
Hi, hope you are doing well.
I wanted to ask whether you have uploaded any checkpoints mentioned in the fine-tuning doc. I wanted to evaluate/test the model performance and training does not seem to be an option currently.
Thanks
Thanks for contributing this awesome piece of research!
Quick question about the input boxes.
For BROS, is the expected format [x1, y1, x2, y2, x3, y3, x4, y4], where each (x, y) pair is a corner of the bounding box, starting from the top left and going clockwise?
Should each bounding box be normalized by dividing x values by the image width and y values by the image height?
I'm training on DocVQA, but the results are not that great. Just trying to make sure I'm doing everything right :)
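For reference, the layout described in this question can be produced as sketched below: an axis-aligned box is expanded into four corners, clockwise from the top left, and each coordinate is divided by the image width or height. Whether BROS expects exactly this format is the open question here, so the code only illustrates the described layout. The function name is hypothetical.

```python
from typing import List, Sequence


def to_normalized_quad(box: Sequence[float], width: float, height: float) -> List[float]:
    """Convert [x_min, y_min, x_max, y_max] in pixels to
    [x1, y1, x2, y2, x3, y3, x4, y4] with values clamped to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    # Corner order: top-left, top-right, bottom-right, bottom-left (clockwise).
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    quad = []
    for x, y in corners:
        quad.append(min(max(x / width, 0.0), 1.0))
        quad.append(min(max(y / height, 0.0), 1.0))
    return quad


# A 200x100 pixel box on a 1000x500 pixel page.
quad = to_normalized_quad([100, 50, 300, 150], width=1000, height=500)
print(quad)
```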
Hi,
First of all, thanks for the code, it's a great contribution to the community! From the paper I understood that the model could be fine-tuned end-to-end for EE and EL at the same time. However, looking at the code, it does not seem to work that way, right? Is combined end-to-end EE and EL supported somehow?
Thanks,
Hi, thanks for the excellent work!
In your repository, I see that the relation extraction labels built from the dataset have this format (link):
el_labels = np.ones(self.max_seq_length, dtype=int) * self.max_seq_length
el_labels[word_to] = word_from
where el_labels[i] = j means that words[j] links to words[i], with words[j] the question and words[i] the answer.
In this scheme, a word labeled as an answer can only be linked from one word labeled as a question. What happens if several question words link to the same answer word? Could that cause links (connections) to be lost?
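The concern can be reproduced with a few lines of plain Python that mimic the assignment el_labels[word_to] = word_from on a list. The helper name and the example links are made up for illustration.

```python
def build_el_labels(links, max_seq_length):
    """Build one-parent-per-word link labels and report overwritten links.

    links: iterable of (word_from, word_to) pairs.
    Returns (el_labels, overwritten), where overwritten lists the
    (word_from, word_to) pairs that were replaced by a later link.
    """
    el_labels = [max_seq_length] * max_seq_length  # max_seq_length means "no link"
    overwritten = []
    for word_from, word_to in links:
        if el_labels[word_to] != max_seq_length:
            overwritten.append((el_labels[word_to], word_to))
        el_labels[word_to] = word_from
    return el_labels, overwritten


# Words 0 and 2 both link to word 5; only the last assignment survives.
el_labels, overwritten = build_el_labels([(0, 5), (2, 5), (3, 7)], max_seq_length=8)
print(el_labels)
print(overwritten)
```

With a single label vector, each word can store only one incoming link, so the earlier link (0, 5) is lost in this example.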
Thank you very much for sharing this great work!
I was wondering if there are any instructions on how to prepare custom data to be used for fine-tuning Bros. I understand there are preprocessing codes for FUNSD, but if there are summarized instructions, it will be greatly helpful.
Hi team, thanks for the great work. I just want to know whether the pre-trained models are released under the same Apache-2.0 license as the code.
Hello,
I am trying to run the fine-tuning scripts for FUNSD on Google Colab. I have installed all the required dependencies in requirements.txt, but when running
!CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_ee_bies.yaml
I am getting
OSError: /usr/local/lib/python3.8/dist-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
I have tried installing torchtext and upgraded pytorch lightning correspondingly, to no avail.
Any ideas on what could be going on?
Thanks!
Thank you very much for sharing this great work again!
I have a question after reading the paper carefully. I am curious why the EL tasks section of Table 5 (performance comparisons) does not compare BROS with SPADE, since BROS uses the SPADE decoder for the EL task. I am very interested in the result. Thank you!
I find that the implementation of RelationExtractor in this repository is incorrect (compared with the original one). I'm aware that the implementation is (more or less) the same as the one described in 2005.00642. But after digging into the code in clovaai/spade, I realized that the original implementation differs from the paper. I'll refer to the original implementation as SPADE, this repository as BROS, and the paper as the SPADE paper.
My version of RelationExtractor (which has been tested and achieves results roughly equivalent to the original SPADE):
    import torch
    from torch import nn


    class RelationTagger(nn.Module):
        def __init__(self, n_fields, hidden_size):
            super().__init__()
            # Separate projections for the head (source) and tail (target) roles
            self.head = nn.Linear(hidden_size, hidden_size)
            self.tail = nn.Linear(hidden_size, hidden_size)
            # Learned embeddings for the field nodes, shared across the batch
            self.field_embeddings = nn.Parameter(
                torch.rand(1, n_fields, hidden_size))
            self.W_label_0 = nn.Linear(hidden_size, hidden_size, bias=False)
            self.W_label_1 = nn.Linear(hidden_size, hidden_size, bias=False)

        def forward(self, enc):
            enc_head = self.head(enc)
            enc_tail = self.tail(enc)

            # Prepend the field embeddings to the head sequence: (b, f + n, h)
            batch_size = enc_tail.size(0)
            field_embeddings = self.field_embeddings.expand(batch_size, -1, -1)
            enc_head = torch.cat([field_embeddings, enc_head], dim=1)

            # One score matrix per label (0 = no link, 1 = link): (b, f + n, n)
            score_0 = torch.matmul(
                enc_head, self.W_label_0(enc_tail).transpose(1, 2))
            score_1 = torch.matmul(
                enc_head, self.W_label_1(enc_tail).transpose(1, 2))

            # Stack along a new dimension: (b, 2, f + n, n)
            score = torch.cat([score_0.unsqueeze(1), score_1.unsqueeze(1)], dim=1)
            return score
This implementation handles a single relation type, but multiple instances of this layer can be used for multiple relations. The output has shape b * s * (n+f) * n, where b is the batch size, s = 2 is the number of score matrices, n is the sequence length, and f is the number of fields. The final relation matrix is obtained by score.argmax(dim=1).