kamalkraj / bert-ner


Pytorch-Named-Entity-Recognition-with-BERT

License: GNU Affero General Public License v3.0

Python 54.30% CMake 0.56% C++ 45.14%

bert named-entity-recognition pytorch conll-2003 cpp11 bert-ner inference curl postman pretrained-models

bert-ner's Introduction

BERT NER

Use Google BERT to do CoNLL-2003 NER!

Train the model using Python and run inference using C++.

ALBERT-TF2.0

BERT-NER-TENSORFLOW-2.0

BERT-SQuAD

Requirements

python3
pip3 install -r requirements.txt

Run

python run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out_base --max_seq_length=128 --do_train --num_train_epochs 5 --do_eval --warmup_proportion=0.1

Result

BERT-BASE

Validation Data

             precision    recall  f1-score   support

        PER     0.9677    0.9745    0.9711      1842
        LOC     0.9654    0.9711    0.9682      1837
       MISC     0.8851    0.9111    0.8979       922
        ORG     0.9299    0.9292    0.9295      1341

avg / total     0.9456    0.9534    0.9495      5942

Test Data

             precision    recall  f1-score   support

        PER     0.9635    0.9629    0.9632      1617
        ORG     0.8883    0.9097    0.8989      1661
        LOC     0.9272    0.9317    0.9294      1668
       MISC     0.7689    0.8248    0.7959       702

avg / total     0.9065    0.9209    0.9135      5648
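
The f1-score column in these tables is the harmonic mean of the precision and recall columns. A minimal sketch for checking a row by hand follows; the function name `f1_score` is only illustrative, and because the printed precision and recall are already rounded, a recomputed value can differ in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# PER row of the BERT-BASE validation table (precision 0.9677, recall 0.9745)
print(round(f1_score(0.9677, 0.9745), 4))  # 0.9711
```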

Pretrained model download from here

BERT-LARGE

Validation Data

             precision    recall  f1-score   support

        ORG     0.9288    0.9441    0.9364      1341
        LOC     0.9754    0.9728    0.9741      1837
       MISC     0.8976    0.9219    0.9096       922
        PER     0.9762    0.9799    0.9781      1842

avg / total     0.9531    0.9606    0.9568      5942

Test Data

             precision    recall  f1-score   support

        LOC     0.9366    0.9293    0.9329      1668
        ORG     0.8881    0.9175    0.9026      1661
        PER     0.9695    0.9623    0.9659      1617
       MISC     0.7787    0.8319    0.8044       702

avg / total     0.9121    0.9232    0.9174      5648

Pretrained model download from here

Inference

from bert import Ner

model = Ner("out_base/")

output = model.predict("Steve went to Paris")

print(output)
'''
    [
        {
            "confidence": 0.9981840252876282,
            "tag": "B-PER",
            "word": "Steve"
        },
        {
            "confidence": 0.9998939037322998,
            "tag": "O",
            "word": "went"
        },
        {
            "confidence": 0.999891996383667,
            "tag": "O",
            "word": "to"
        },
        {
            "confidence": 0.9991968274116516,
            "tag": "B-LOC",
            "word": "Paris"
        }
    ]
'''

Inference C++

Pretrained and converted bert-base model download from here

Download libtorch from here

Install CMake (tested with CMake version 3.10.2).
Unzip the downloaded model and libtorch in BERT-NER.

Compile C++ App

  cd cpp-app/
  cmake -DCMAKE_PREFIX_PATH=../libtorch

make

Running the app
```
   ./app ../base
```

NB: The Bert-Base C++ model is split into two parts:

the BERT feature extractor and the NER classifier.
This is done because JIT tracing does not support input-dependent for loops or if conditions inside the model's forward function.

Deploy REST-API

The BERT NER model can be deployed as a REST API:

python api.py

The API will be live at 0.0.0.0:8000, with the endpoint /predict.

cURL request

curl -X POST http://0.0.0.0:8000/predict -H 'Content-Type: application/json' -d '{ "text": "Steve went to Paris" }'
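
The same request can be sent from Python using only the standard library. This is a sketch, not part of the repository: the helper names `build_predict_request` and `predict` are made up for illustration, and the URL and JSON body are copied from the cURL command above.

```python
import json
import urllib.request


def build_predict_request(text, url="http://0.0.0.0:8000/predict"):
    """Build the same POST request as the cURL command, without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url="http://0.0.0.0:8000/predict", timeout=10):
    """Send the request and decode the JSON reply (needs api.py to be running)."""
    request = build_predict_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the API is up:
#     print(predict("Steve went to Paris"))
```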

Output

{
    "result": [
        {
            "confidence": 0.9981840252876282,
            "tag": "B-PER",
            "word": "Steve"
        },
        {
            "confidence": 0.9998939037322998,
            "tag": "O",
            "word": "went"
        },
        {
            "confidence": 0.999891996383667,
            "tag": "O",
            "word": "to"
        },
        {
            "confidence": 0.9991968274116516,
            "tag": "B-LOC",
            "word": "Paris"
        }
    ]
}

cURL

Postman

C++ unicode support

http://github.com/ufal/unilib

Tensorflow version

https://github.com/kyzhouhzau/BERT-NER


bert-ner's Issues

What is `max_seq_length`?

Hi @kamalkraj !
Nice repo.
If a sentence is longer than 128 tokens, how do you predict NER tags for it?
Especially for test data.
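
One common workaround, which is not necessarily what this repository does, is to split a long sentence into overlapping windows, predict each window separately, and merge the tags afterwards. The sketch below covers only the splitting step. The function name and the `max_words` and `overlap` defaults are assumptions; since `max_seq_length` counts WordPiece tokens plus [CLS] and [SEP], `max_words` has to stay well below it.

```python
def split_into_windows(words, max_words=100, overlap=20):
    """Split a word list into overlapping windows.

    Returns (start_index, window_words) pairs so that predictions can be
    mapped back to positions in the original sentence.
    """
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    windows = []
    start = 0
    step = max_words - overlap
    while True:
        windows.append((start, words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += step
    return windows


sentence = ["w%d" % i for i in range(250)]
for start, window in split_into_windows(sentence):
    print(start, len(window))
```

Words inside an overlap receive two predictions; keeping the one from the window where the word sits further from the edge is one possible merge rule.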

warmup_linear - issue

Hi, thank you for sharing the model. I am getting an error while training the model.
The error says that warmup_linear is not available.

Could you please help me solve this issue?

Regards,
niranjan

line no 507:
lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion)

UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars

Hey,
As in the title, and training stopped at 12%. How can I deal with this situation?

Dataset compatibility issue

Can I use my own dataset in a format similar to CoNLL-2003 that has only word and tag columns? The tags differ from CoNLL-2003 but still comply with IOB2.

Reasons for predicting the labels of [CLS] and [SEP]?

Hi, thanks for your great work.
I have a question: why do you predict the labels of [CLS] and [SEP] rather than simply masking them? Does it improve performance on the NER task?

How to use BERT just for ENTITY extraction from a Sequence without classification in the NER task?

My requirement here is given a sentence(sequence), I would like to just extract the entities present in the sequence without classifying them to a type in the NER task. I see that BertForTokenClassification for NER does the classification. Can this be adapted for just the extraction?

Can you give me an idea of how to do entity extraction/identification using BERT?

Why does this not reproduce the BERT paper's result, where F1 reaches 0.924?

Hoping for a reply, thanks!

Lower support in results

Hi @kamalkraj, nice work! It helps me a lot.
I'm wondering why the support for the validation and test sets in this branch's results is much smaller than in the other branch.

I used your pretrained model on this dataset, but only got F1 = 0.9078 on test.txt.

Fine tuning BERT-NER on a domain-specific dataset

Hi @kamalkraj, nice work! I'm wondering how I can continue the training of a pre-trained CoNLL'03 BERT-NER model on a separate dataset? What POS tagger and chunker should I use to get proper train.text/valid.txt/test.txt files? How to start from the checkpoint and avoid the re-training of a model from scratch?

Alternative for CUDA

Hey @kamalkraj, thanks for your work.
I am trying to run this code on my AMD Vega 10 XT. Is there any way you can help me, since your code relies on CUDA?
Thanks in advance.

Incorrect ordering of arguments

Hey,
I was looking at the code and noticed that there is an error in the ordering of parameters given to BERT in line 486 of run_ner.py file

BertForTokenClassification (https://huggingface.co/pytorch-transformers/model_doc/bert.html#pytorch_transformers.BertForTokenClassification) expects the order to be input_ids, attention_mask, token_type_ids, position_ids, head_mask, labels whereas in the code as one can see below you have passed segment_ids (== token_type_ids) before input_mask (== attention_mask). Also label_ids (== labels) and l_mask (== position_ids) should also be switched.

Line 485-486
input_ids, input_mask, segment_ids, label_ids, valid_ids,l_mask = batch
loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)

The same applies at line 555.

Correct me if I've understood incorrectly
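
Whether the positional call is actually wrong depends on the signature of the class being called. A traceback quoted elsewhere on this page shows pytorch_pretrained_bert calling `self.bert(input_ids, token_type_ids, attention_mask, ...)`, which suggests that the older library takes `token_type_ids` before `attention_mask`. Passing arguments by keyword sidesteps the question. The sketch below uses a hypothetical stand-in `forward` whose parameter order is an assumption modelled on that older order; it only shows how the arguments get bound.

```python
def forward(input_ids, token_type_ids=None, attention_mask=None, labels=None):
    """Hypothetical stand-in for a model's forward(); reports how arguments were bound."""
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


# Placeholder values; real code would pass tensors.
input_ids, input_mask, segment_ids, label_ids = "ids", "mask", "segments", "labels"

# Positional call: correct only if the signature really has this order.
positional = forward(input_ids, segment_ids, input_mask, label_ids)

# Keyword call: correct regardless of parameter order.
by_keyword = forward(
    input_ids,
    attention_mask=input_mask,
    token_type_ids=segment_ids,
    labels=label_ids,
)

print(positional == by_keyword)  # True for this assumed signature
```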

How to make use of the pretrained weights from the download link?

Hi! Thanks for your work. I'm trying to run the model on my own train/test/valid set using the pretrained weights from your Google Drive link, but I can't work out the right parameters to pass to the run_ner script.

Cannot reproduce your reported F1 score

After downloading your pretrained model from the master branch and running this command:
'''
python run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out --max_seq_length=128 --num_train_epochs 5 --do_eval --warmup_proportion=0.4
'''
I got the F1 score of 90.9 in the test set, which is far away from what you reported. Could you help with my issue? Thanks!

Returning results as dictionary silently throws away tags for words that appear > once

BERT-NER/bert.py

Line 94 in c12b2ec

 output = {word:{"tag":label,"confidence":confidence} for word,label,confidence in zip(words,labels,logits_confidence)} 

Have made it work for me locally with:
output = [ {"word": word, "tag": label, "confidence": confidence} for word, label, confidence in zip(words, labels, logits_confidence) ]
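
The effect is easy to reproduce with plain Python, independent of the model. In the sketch below the words, tags, and confidence values are made up for illustration; a repeated word overwrites its earlier dictionary entry, while the list form keeps one entry per position.

```python
words = ["Paris", "loves", "Paris"]
labels = ["B-PER", "O", "B-LOC"]
confidences = [0.91, 0.99, 0.97]  # made-up values

# Dictionary keyed by word: the second "Paris" replaces the first.
as_dict = {
    word: {"tag": label, "confidence": conf}
    for word, label, conf in zip(words, labels, confidences)
}

# List of records: one entry per input position, duplicates preserved.
as_list = [
    {"word": word, "tag": label, "confidence": conf}
    for word, label, conf in zip(words, labels, confidences)
]

print(len(as_dict))  # 2
print(len(as_list))  # 3
```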

Syntax error

Because of the syntax error I had to change this line from a list

output = [word:{"tag":label,"confidence":confidence} for word,label,confidence in zip(words,labels,logits_confidence)]

to a dictionary

output = {word:{"tag":label,"confidence":confidence} for word,label,confidence in zip(words,labels,logits_confidence)}

The readfile() function does not return output as the suggested format

BERT-NER/run_ner.py

Line 116 in 48a868b

if len(sentence) >0:

To begin with, thank you very much for sharing the code, it did save me a huge amount of time!

The if statement in line 116 should be in the for loop above, otherwise the output would be a list of tuples of a list of a sentence followed by a list of its corresponding tags eg:

(['-', 'JAPAN', 'GET', 'LUCKY', 'WIN', ',', 'CHINA', 'IN', 'SURPRISE', 'DEFEAT', '.'],
['O', 'B-LOC', 'O', 'O', 'O', 'O', 'B-PER', 'O', 'O', 'O', 'O']),
(['Nadim', 'Ladki'], ['B-PER', 'I-PER']),
(['AL-AIN', ',', 'United', 'Arab', 'Emirates', '1996-12-06'],
['B-LOC', 'O', 'B-LOC', 'I-LOC', 'I-LOC', 'O'])]

If the desired output is as suggested in the code, i.e.

[ ['EU', 'B-ORG'], ['rejects', 'O'], ['German', 'B-MISC'], ['call', 'O'], ['to', 'O'], ['boycott', 'O'], ['British', 'B-MISC'], ['lamb', 'O'], ['.', 'O'] ]

then the if statement could be modified as:
if len(sentence) > 0: sentence.extend(label); data.append(sentence); sentence = []; label = []
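
For comparison, here is a small stand-alone reader that returns one (words, tags) pair per sentence, the shape shown in the first example above. It is a sketch and not the repository's readfile(): the name `read_conll` is hypothetical, it assumes the word is in the first column and the entity tag in the last, and it also flushes the final sentence when the input does not end with a blank line.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into (words, tags) pairs, one per sentence."""
    data, words, tags = [], [], []
    for line in lines:
        line = line.strip()
        # A blank line or a -DOCSTART- marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if words:
                data.append((words, tags))
                words, tags = [], []
            continue
        columns = line.split()
        words.append(columns[0])   # first column: the word
        tags.append(columns[-1])   # last column: the entity tag
    if words:  # no trailing blank line
        data.append((words, tags))
    return data


sample = [
    "-DOCSTART- -X- -X- O",
    "",
    "EU NNP B-NP B-ORG",
    "rejects VBZ B-VP O",
    "",
    "Peter NNP B-NP B-PER",
    "Blackburn NNP I-NP I-PER",
]
print(read_conll(sample))
```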

Dataset concerns

Hi @kamalkraj, nice work ! It helps me a lot.
I'm wondering whether this dataset is the CoNLL-03 dataset.

How to identify entities made of more than one word?

Hi, I would like to recognize entities that are made up of more than one word: e.g.

Stephen King
The Art of War
United States of America

etc...

Your program splits each word making this impossible. Any workaround for this?
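
With B-/I- tags, multi-word entities can be recovered after prediction by merging each B- tag with the I- tags of the same type that follow it. The sketch below works on a list of {"word", "tag"} records like the one shown in the Inference section; `group_entities` is a hypothetical helper, not part of the repository, and the tags in the example are written by hand rather than produced by the model.

```python
def group_entities(predictions):
    """Merge consecutive B-/I- tagged words into (text, type) entities."""
    entities = []
    current_words, current_type = [], None
    for item in predictions:
        prefix, _, entity_type = item["tag"].partition("-")
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if starts_new:
            # Close the open entity, then start a new one.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [item["word"]], entity_type
        elif prefix == "I":
            current_words.append(item["word"])
        else:
            # "O" (or any other tag) closes the open entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


example = [
    {"word": "Stephen", "tag": "B-PER"},
    {"word": "King", "tag": "I-PER"},
    {"word": "visited", "tag": "O"},
    {"word": "New", "tag": "B-LOC"},
    {"word": "York", "tag": "I-LOC"},
]
print(group_entities(example))  # [('Stephen King', 'PER'), ('New York', 'LOC')]
```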

FP16 not working!

When I pass --fp16 parameter to train faster, it gives the following error:

| 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_ner.py", line 594, in <module>
    main()
  File "run_ner.py", line 487, in main
    loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "run_ner.py", line 46, in forward
    logits = self.classifier(sequence_output)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2'

Evaluation does not use GPU

Hi, I just noticed that inference with Python does not run on the GPU; it seems to use only the CPU. How can we move it to the GPU? Is there an option for this, or has it not been implemented?

Thanks!

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.

Hello guys,

Anytime I try to run the script, I get this error.

Any suggestions on how to fix it?

(base) C:\Users\user1\Desktop\BERT-NER-experiment>activate neuro

(neuro) C:\Users\user1\Desktop\BERT-NER-experiment>python
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> from bert import Ner
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
>>> model = Ner("out_!x/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user1\Desktop\BERT-NER-experiment\bert.py", line 35, in __init__
    self.model , self.tokenizer, self.model_config = self.load_model(model_dir)
  File "C:\Users\user1\Desktop\BERT-NER-experiment\bert.py", line 48, in load_model
    model.load_state_dict(torch.load(output_model_file))
  File "C:\Users\user1\Anaconda3\envs\neuro\lib\site-packages\torch\serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\user1\Anaconda3\envs\neuro\lib\site-packages\torch\serialization.py", line 574, in _load
    result = unpickler.load()
  File "C:\Users\user1\Anaconda3\envs\neuro\lib\site-packages\torch\serialization.py", line 537, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "C:\Users\user1\Anaconda3\envs\neuro\lib\site-packages\torch\serialization.py", line 119, in default_restore_location
    result = fn(storage, location)
  File "C:\Users\user1\Anaconda3\envs\neuro\lib\site-packages\torch\serialization.py", line 95, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "C:\Users\user1\Anaconda3\envs\neuro\lib\site-packages\torch\serialization.py", line 79, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.

>>> output = model.predict("Steve went to Paris")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'model' is not defined

N-Gram support

Hey Kamal,
Great work on the repo and major thanks for hosting the trained model.

Would it be possible to add N-Gram support to this model, for say, 'New York' instead of detecting on 'New' and then 'York'?

Also, any plans to train a larger model for QA or NER? Say RoBERTa or XL-NET?

simple training speed improvement suggestion.

Hi @kamalkraj, nice work! I noticed that training in the experiment branch is much slower than in the master branch, which might be caused by the two nested for loops in the forward pass:

for i in range(batch_size):
         jj = -1
         for j in range(max_len):
                 if valid_ids[i][j].item() == 1:
                     jj += 1
                     valid_output[i][jj] = sequence_output[i][j]

Instead, we could use

valid_mask = valid_ids.eq(1)
for i in range(batch_size):
        valid_mask_b = valid_mask[i]
        mask_len = torch.sum( valid_mask_b )
        valid_output[i, :mask_len] = sequence_output[i][ valid_mask_b ]

This has the same performance and runs at a speed similar to the master branch.

What's wrong with the dictionary?

Thanks for sharing. After downloading your code, the editor always reports errors like the following:

I use PyCharm as my IDE. Can you help check what's wrong? Many thanks.

Implement multiple workers for NER task?

Is it possible to implement parallelized workers for the NER task, like in this repo? This does not have support for PyTorch models.
Any suggestions?

RuntimeError: a Tensor with 3145728 elements cannot be converted to Scalar

06/09/2019 23:16:28 - INFO - main - ***** Running training *****
06/09/2019 23:16:28 - INFO - main - Num examples = 14041
06/09/2019 23:16:28 - INFO - main - Batch size = 32
06/09/2019 23:16:28 - INFO - main - Num steps = 2190
Epoch: 0%| | 0/5 [00:00<?, ?it/s]
| 0/439 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_ner.py", line 534, in <module>
    main()
  File "run_ner.py", line 430, in main
    loss = model(input_ids, segment_ids, input_mask, label_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ub16c9/ub16_prj/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 1022, in forward
    sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ub16c9/ub16_prj/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 628, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ub16c9/ub16_prj/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 198, in forward
    embeddings = self.LayerNorm(embeddings)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 149, in forward
    input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 21, in forward
    input_, self.normalized_shape, weight_, bias_, self.eps)
RuntimeError: a Tensor with 3145728 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)

Label [SEP] in results

If a small --max_seq_length is used (32 in the example below), [SEP] appears as a label in the results.
The lower max_seq_length is, the more [SEP] entries appear.

             precision    recall  f1-score   support

        LOC     0.9248    0.9248    0.9248      1529
        PER     0.9556    0.9582    0.9569      1436
        ORG     0.8860    0.8993    0.8926      1539
       MISC     0.7620    0.8242    0.7919       637
      [SEP]     1.0000    1.0000    1.0000       637

avg / total     0.9125    0.9235    0.9178      5778

warmup_linear not working with pytorch-pretrained-bert 0.6.2

Hi Kamal, thank you for sharing the code. I installed pytorch-pretrained-bert version 0.6.2 on my PC and ran your code. I get the error below when executing it. Can you please guide me on how to solve this with the latest 0.6.2 version?

from pytorch_pretrained_bert.optimization import BertAdam, warmup_linear
ImportError: cannot import name 'warmup_linear' from 'pytorch_pretrained_bert.optimization' (C:\ProgramData\Anaconda3\lib\site-packages\pytorch_pretrained_bert\optimization.py)

lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion)

Is there another way we can change the code?

Regards,
Niranjan
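
If the installed library no longer exports warmup_linear, one option is to define a local replacement with the call shape used in the line above, warmup_linear(progress, warmup_proportion). The version below is a sketch under assumptions: it ramps the multiplier linearly from 0 to 1 during warmup and then decays it linearly to 0, which may not match the curve of the original helper exactly.

```python
def warmup_linear(progress, warmup=0.002):
    """Learning-rate multiplier: linear ramp during warmup, then linear decay to 0.

    progress: fraction of training completed (global_step / total_steps).
    warmup:   fraction of training used for the ramp, in [0, 1).
    """
    if not 0.0 <= warmup < 1.0:
        raise ValueError("warmup must be in [0, 1)")
    if progress < warmup:
        return progress / warmup
    # Decay from 1.0 at progress == warmup to 0.0 at progress == 1.0.
    return max(0.0, (1.0 - progress) / (1.0 - warmup))


learning_rate = 5e-5  # illustrative value
for step in (0, 50, 100, 550, 1000):
    print(step, learning_rate * warmup_linear(step / 1000, 0.1))
```

Pinning the library to the version that requirements.txt was written for is the other obvious route.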

About valid_output

When reading the code, I have a question about the following part:
for i in range(batch_size):
    jj = -1
    for j in range(max_len):
        if valid_ids[i][j].item() == 1:
            jj += 1
            valid_output[i][jj] = sequence_output[i][j]

Why is the output written to valid_output[i][jj] rather than valid_output[i][j]? Could you explain it?

Thanks.
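
As far as the quoted loop shows, j walks over every WordPiece position while jj counts only the positions flagged as valid, so the kept vectors are packed to the front of the row. A plausible reason is that the labels are stored one per word, and packing keeps output k lined up with label k. The sketch below replays the loop for a single sequence using plain lists; the token strings and flags are illustrative, not taken from the repository.

```python
def compact_valid(sequence_output, valid_ids, pad=None):
    """Move entries flagged valid (1) to the front, keeping order; pad the tail."""
    compacted = [pad] * len(sequence_output)
    jj = -1
    for j, flag in enumerate(valid_ids):
        if flag == 1:
            jj += 1
            compacted[jj] = sequence_output[j]
    return compacted


tokens = ["[CLS]", "Jim", "Hen", "##son", "was", "[SEP]"]
valid = [1, 1, 1, 0, 1, 1]  # "##son" is a continuation piece, so it is skipped
print(compact_valid(tokens, valid))
# ['[CLS]', 'Jim', 'Hen', 'was', '[SEP]', None]
```

Writing to index j instead would leave a gap at the position of "##son", and every later output would sit one slot to the right of its label.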

What does 'valid_positions' mean?

According to the original paper, the input consists of three parameters: input_ids, input_mask, and segment_ids. Your code also includes valid_positions, which I find difficult to understand. Can you explain it? Thanks.

Performance and fine-tuning custom NER

Hey, I would like to know how fast were your predictions for a single request with multiple entities? And did you perform any load testing, if so what are the results?
Also I would like to know approaches for fine-tuning a custom NER model using BERT. If you know any approaches, please help me.
Thanks.

WordPiece and labels?

In the latest version, how are WordPiece tokens and labels aligned? For example, is "Jim Hen ##son was a puppet ##eer" mapped to [Jim, Hen, was, a, puppet]?

Is only the first sub-token of each word split by WordPiece kept?

Have you tried other methods for an experimental comparison?
Hoping for your reply ^^
Thanks
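
The first-sub-token scheme described in the question can be sketched as follows: each word keeps its single label, and a 0/1 flag per WordPiece marks which pieces start a word. This illustrates the general technique rather than the repository's exact code; `align_labels`, `toy_tokenize`, and the tiny vocabulary are hypothetical stand-ins for a real WordPiece tokenizer.

```python
# Toy stand-in for a WordPiece tokenizer: only two words are split.
TOY_VOCAB = {"Henson": ["Hen", "##son"], "puppeteer": ["puppet", "##eer"]}


def toy_tokenize(word):
    """Return the word's pieces, or the word itself if it is not split."""
    return TOY_VOCAB.get(word, [word])


def align_labels(words, labels, tokenize):
    """Return (tokens, labels, valid), where valid[k] is 1 for the first piece of a word."""
    tokens, token_labels, valid = [], [], []
    for word, label in zip(words, labels):
        for k, piece in enumerate(tokenize(word)):
            tokens.append(piece)
            valid.append(1 if k == 0 else 0)
            if k == 0:
                token_labels.append(label)  # one label per word, on its first piece
    return tokens, token_labels, valid


words = ["Jim", "Henson", "was", "a", "puppeteer"]
labels = ["B-PER", "I-PER", "O", "O", "O"]
tokens, token_labels, valid = align_labels(words, labels, toy_tokenize)
print(tokens)  # ['Jim', 'Hen', '##son', 'was', 'a', 'puppet', '##eer']
print(valid)   # [1, 1, 0, 1, 1, 1, 0]
```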

Suggestion for training on my own dataset with the pretrained model

Thanks for the great project.
I checked the training data format.
It contains the text, a POS tag, a bio-tag (the syntactic chunk column), and the entity tag (BIO scheme),
for example:
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O
I have 10,000 sentences of my own and can do the entity tagging myself, but how can I produce the POS and bio-tag columns? Is there a Python library for this, especially for the bio-tag?

Thanks

While training a custom NER model with the large pretrained model, I get "ZeroDivisionError: Weights sum to zero, can't be normalized"

Bert Version ?

Is this using the older BERT version or BERT-NER Version 2 ? Thanks

Just a question

Hello,
Excellent script for BERT NER. I am wondering whether it could be used, with slight changes, to train a token-level classification task, i.e. one similar to NER but where the token (target word) to be classified is given in advance. An example is training a model for word-sense disambiguation: given a word in a sentence, determine its sense.
E.g. in "He went to the store", the target word "went" has the sense "motion".
Any ideas?

License

Nice work! Do you plan to make a license available for the code and pretrained model?

Can't reproduce F1 scores

After training the model using the default parameters, the result is

             precision    recall  f1-score   support

        ORG     0.7019    0.7685    0.7337       337
        LOC     0.8190    0.8854    0.8509       419
       MISC     0.8188    0.8278    0.8233       273
        PER     0.7619    0.7453    0.7535       322

avg / total     0.7761    0.8113    0.7929      1351

which is different from the README results.

BERT NER very slow across multiple docker containers

When BERT NER runs in a single container the code executes fine, but when scaled across 2+ containers the speed drops drastically (from 20 seconds to 20 minutes).
This has been isolated to BERT, since the slowdown does not happen without it. Have you come across anything like this?
The problem occurs specifically when calling model.predict().
For one sentence it takes ~20s to output a prediction.
EDIT:
The slowdown is at lines 88-89 in bert.py:
with torch.no_grad():
    logits = self.model(input_ids, segment_ids, input_mask,valid_ids)

resulting sequence

In the following code block :

                    if m and label_map[label_ids[i][j]] != "X":
                        temp_1.append(label_map[label_ids[i][j]])
                        temp_2.append(label_map[logits[i][j]])
                    else:
                        temp_1.pop()
                        temp_2.pop()
                        y_true.append(temp_1)
                        y_pred.append(temp_2)
                        break

Why are temp_1 and temp_2 popped when m is true and label_map[label_ids[i][j]] == "X"?

Shouldn't the code block look more like this:

                    if m:
                        if label_map[label_ids[i][j]] != "X":
                            temp_1.append(label_map[label_ids[i][j]])
                            temp_2.append(label_map[logits[i][j]])
                    else:
                        temp_1.pop()
                        temp_2.pop()
                        y_true.append(temp_1)
                        y_pred.append(temp_2)
                        break

KeyError: 0 during evaluation

ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/BERT-NER+kamalkraj$ python3.5 run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out --max_seq_length=128 --do_train --num_train_epochs 5 --do_eval --warmup_proportion=0.4
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
06/11/2019 19:02:56 - INFO - main - device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
06/11/2019 19:02:57 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt from cache at /home/ub16c9/.pytorch_pretrained_bert/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1
06/11/2019 19:02:58 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz from cache at /home/ub16c9/.pytorch_pretrained_bert/distributed_-1/a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c
06/11/2019 19:02:58 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /home/ub16c9/.pytorch_pretrained_bert/distributed_-1/a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c to temp dir /tmp/tmpyj8ar20e
06/11/2019 19:03:01 - INFO - pytorch_pretrained_bert.modeling - Model config {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 2,
"vocab_size": 28996
}

06/11/2019 19:03:05 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForTokenClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
06/11/2019 19:03:05 - INFO - pytorch_pretrained_bert.modeling - Weights from pretrained model not used in BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
06/11/2019 19:03:07 - INFO - main - *** Example ***
06/11/2019 19:03:07 - INFO - main - guid: train-0
06/11/2019 19:03:07 - INFO - main - tokens: EU rejects German call to boycott British la ##mb .
06/11/2019 19:03:07 - INFO - main - input_ids: 101 7270 22961 1528 1840 1106 21423 1418 2495 12913 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - *** Example ***
06/11/2019 19:03:07 - INFO - main - guid: train-1
06/11/2019 19:03:07 - INFO - main - tokens: Peter Blackburn
06/11/2019 19:03:07 - INFO - main - input_ids: 101 1943 14428 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - *** Example ***
06/11/2019 19:03:07 - INFO - main - guid: train-2
06/11/2019 19:03:07 - INFO - main - tokens: BR ##US ##SE ##LS 1996 - 08 - 22
06/11/2019 19:03:07 - INFO - main - input_ids: 101 26660 13329 12649 15928 1820 118 4775 118 1659 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - *** Example ***
06/11/2019 19:03:07 - INFO - main - guid: train-3
06/11/2019 19:03:07 - INFO - main - tokens: The European Commission said on Thursday it disagreed with German advice to consumers to s ##hun British la ##mb until scientists determine whether mad cow disease can be transmitted to sheep .
06/11/2019 19:03:07 - INFO - main - input_ids: 101 1109 1735 2827 1163 1113 9170 1122 19786 1114 1528 5566 1106 11060 1106 188 17315 1418 2495 12913 1235 6479 4959 2480 6340 13991 3653 1169 1129 12086 1106 8892 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - *** Example ***
06/11/2019 19:03:07 - INFO - main - guid: train-4
06/11/2019 19:03:07 - INFO - main - tokens: Germany ' s representative to the European Union ' s veterinary committee Werner Z ##wing ##mann said on Wednesday consumers should buy sheep ##me ##at from countries other than Britain until the scientific advice was clearer .
06/11/2019 19:03:07 - INFO - main - input_ids: 101 1860 112 188 4702 1106 1103 1735 1913 112 188 27431 3914 14651 163 7635 4119 1163 1113 9031 11060 1431 4417 8892 3263 2980 1121 2182 1168 1190 2855 1235 1103 3812 5566 1108 27830 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:03:10 - INFO - main - ***** Running training *****
06/11/2019 19:03:10 - INFO - main - Num examples = 14041
06/11/2019 19:03:10 - INFO - main - Batch size = 32
06/11/2019 19:03:10 - INFO - main - Num steps = 2190
Epoch: 100%|██████████| 5/5 [17:28<00:00, 209.73s/it]
100%|██████████| 439/439 [03:27<00:00, 2.24it/s]
06/11/2019 19:20:40 - INFO - main - *** Example ***
06/11/2019 19:20:40 - INFO - main - guid: dev-0
06/11/2019 19:20:40 - INFO - main - tokens: CR ##IC ##KE ##T - L ##EI ##CE ##ST ##ER ##S ##H ##IR ##E T ##A ##KE O ##VE ##R AT TO ##P A ##FT ##ER IN ##NI ##NG ##S VI ##CT ##OR ##Y .
06/11/2019 19:20:40 - INFO - main - input_ids: 101 15531 9741 22441 1942 118 149 27514 10954 9272 9637 1708 3048 18172 2036 157 1592 22441 152 17145 2069 13020 16972 2101 138 26321 9637 15969 27451 11780 1708 7118 16647 9565 3663 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - *** Example ***
06/11/2019 19:20:40 - INFO - main - guid: dev-1
06/11/2019 19:20:40 - INFO - main - tokens: L ##ON ##D ##ON 1996 - 08 - 30
06/11/2019 19:20:40 - INFO - main - input_ids: 101 149 11414 2137 11414 1820 118 4775 118 1476 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - *** Example ***
06/11/2019 19:20:40 - INFO - main - guid: dev-2
06/11/2019 19:20:40 - INFO - main - tokens: West Indian all - round ##er Phil Simmons took four for 38 on Friday as Leicestershire beat Somerset by an innings and 39 runs in two days to take over at the head of the county championship .
06/11/2019 19:20:40 - INFO - main - input_ids: 101 1537 1890 1155 118 1668 1200 5676 14068 1261 1300 1111 3383 1113 5286 1112 21854 3222 8860 1118 1126 6687 1105 3614 2326 1107 1160 1552 1106 1321 1166 1120 1103 1246 1104 1103 2514 2899 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - *** Example ***
06/11/2019 19:20:40 - INFO - main - guid: dev-3
06/11/2019 19:20:40 - INFO - main - tokens: Their stay on top , though , may be short - lived as title rivals Essex , Derbyshire and Surrey all closed in on victory while Kent made up for lost time in their rain - affected match against Nottinghamshire .
06/11/2019 19:20:40 - INFO - main - input_ids: 101 2397 2215 1113 1499 117 1463 117 1336 1129 1603 118 2077 1112 1641 9521 8493 117 15964 1105 9757 1155 1804 1107 1113 2681 1229 5327 1189 1146 1111 1575 1159 1107 1147 4458 118 4634 1801 1222 21942 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - *** Example ***
06/11/2019 19:20:40 - INFO - main - guid: dev-4
06/11/2019 19:20:40 - INFO - main - tokens: After bowling Somerset out for 83 on the opening morning at Grace Road , Leicestershire extended their first innings by 94 runs before being bowled out for 29 ##6 with England disc ##ard Andy C ##ad ##dick taking three for 83 .
06/11/2019 19:20:40 - INFO - main - input_ids: 101 1258 11518 8860 1149 1111 6032 1113 1103 2280 2106 1120 4378 1914 117 21854 2925 1147 1148 6687 1118 5706 2326 1196 1217 21663 1149 1111 1853 1545 1114 1652 6187 2881 4827 140 3556 25699 1781 1210 1111 6032 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
06/11/2019 19:20:41 - INFO - main - ***** Running evaluation *****
06/11/2019 19:20:41 - INFO - main - Num examples = 3250
06/11/2019 19:20:41 - INFO - main - Batch size = 8
Evaluating: 11%|████████████████████████▉ | 45/407 [00:01<00:14, 24.73it/s]
Traceback (most recent call last):
  File "run_ner.py", line 534, in <module>
    main()
  File "run_ner.py", line 518, in main
    temp_2.append(label_map[logits[i][j]])
KeyError: 0
ub16c9@ub16c9-gpu
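
One possible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: if the id-to-label map is built starting at 1, so that id 0 is left for padding, then any predicted id of 0 has no entry and the lookup raises KeyError: 0. A minimal sketch of a defensive decode follows. The names label_list, PAD_ID and decode are illustrative and are not taken from run_ner.py.

```python
# Hypothetical label set; the real list lives in the training script.
label_list = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "[CLS]", "[SEP]"]

# Assumption: ids start at 1, so id 0 (padding) has no label.
label_map = {i: label for i, label in enumerate(label_list, 1)}
PAD_ID = 0


def decode(pred_ids, fallback="O"):
    """Map predicted ids to labels, using `fallback` for ids with no entry."""
    return [label_map.get(i, fallback) for i in pred_ids]


# A plain label_map[PAD_ID] would raise KeyError: 0 here.
print(decode([6, 2, PAD_ID, 1, 7]))
```

Whether falling back to "O" is appropriate depends on how the evaluation treats padded positions; skipping those positions entirely is another option.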

Timeout error while handling quotes in the text

The model fails to respond when the text contains quotes.
For example,
{
"text" : "Steve went to Paris. He said, "Paris is amazing city." "
}

It fails whenever the text contains single or double quotes. I need to keep the quotes in the text while it goes through the model, so I can't strip them during preprocessing either.
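
One thing worth checking, as an assumption since the issue does not show the server-side error: the request body above is not valid JSON, because the inner double quotes end the string early. Building the payload with a JSON serializer escapes them automatically. A small sketch using Python's standard json module:

```python
import json

# The text keeps its inner double quotes; nothing is stripped.
text = 'Steve went to Paris. He said, "Paris is amazing city."'

# json.dumps escapes the inner quotes as \" so the body is valid JSON.
payload = json.dumps({"text": text})
print(payload)

# Round trip: the server side would recover the original text unchanged.
restored = json.loads(payload)["text"]
print(restored == text)
```

Single quotes need no escaping inside a JSON string, so only the double quotes change in the serialized form.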

attention_mask_label

I noticed attention_mask_label in your experiment code, but why is it set to None so that it is never used? Is performance worse if the loss is computed only over the active positions?
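
To make the question concrete, here is a minimal pure-Python sketch of what "loss over active positions only" means: the per-token cross-entropy is averaged over the positions whose mask is 1, instead of over every position. The function names and the toy numbers are illustrative; this is not the loss code from the repository, and it assumes at least one active position.

```python
import math


def token_nll(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def mean_loss(logits_seq, labels, mask=None):
    """Average per-token loss; with a mask, only positions where mask == 1 count."""
    if mask is None:
        mask = [1] * len(labels)
    losses = [
        token_nll(row, y)
        for row, y, keep in zip(logits_seq, labels, mask)
        if keep == 1
    ]
    return sum(losses) / len(losses)


logits_seq = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # last position is padding
labels = [0, 1, 0]
mask = [1, 1, 0]

full = mean_loss(logits_seq, labels)            # padding position included
active = mean_loss(logits_seq, labels, mask)    # padding position ignored
print(full, active)
```

In this toy case the padded position has uninformative logits, so including it raises the average loss.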

How long does fine-tuning on the CoNLL-2003 task take?

Fine-tuning on the CoNLL-2003 task took me only a few minutes,
so I think I did something wrong somewhere.
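
For a rough point of comparison, the training log quoted earlier on this page reports 14041 examples, batch size 32, 5 epochs, 2190 steps and about 17.5 minutes in total on that user's machine. The sketch below reproduces the step count under the assumption that steps are computed as floor(examples / batch size) times epochs. That formula is inferred from the logged numbers, not quoted from run_ner.py.

```python
import math

num_examples = 14041   # "Num examples" in the log
batch_size = 32        # "Batch size" in the log
epochs = 5

# Inferred formula for the logged "Num steps".
steps = (num_examples // batch_size) * epochs

# Batches actually iterated per epoch (the last batch is partial).
batches_per_epoch = math.ceil(num_examples / batch_size)

total_seconds = 17 * 60 + 28   # "5/5 [17:28<00:00" in the log
print(steps, batches_per_epoch, total_seconds / epochs)
```

A run that finishes far faster than this on comparable hardware may be worth checking for a smaller dataset, fewer epochs, or a skipped training flag.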

Model trained on a custom dataset predicts only CoNLL-2003 entities

I have a custom dataset with around 8 entity types. I combined it with CoNLL-2003 (since I am also interested in the CoNLL-2003 entities) and trained the model. However, the trained model does not predict any entity outside the CoNLL-2003 set. Could you please tell me whether I am missing something when training on a custom dataset?

I used the command below to train the model.

nohup python3 run_ner.py --data_dir=data --bert_model=bert-large-cased --task_name=ner --output_dir=out_bert_large --max_seq_length=128 --num_train_epochs 10 --do_train --do_eval --no_cuda --warmup_proportion=0.4 > log.txt &
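
A possible cause, stated as an assumption because the issue does not show the label configuration: if the training script uses a fixed label list that covers only the CoNLL-2003 tags, tags that appear only in the custom data cannot be predicted. The sketch below collects the tag set from CoNLL-style lines (token first, tag last) so it can be compared with the configured list. collect_labels and the B-DRUG tag are illustrative.

```python
def collect_labels(lines):
    """Return the sorted set of tags in CoNLL-style lines (tag is the last column)."""
    labels = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            continue  # skip sentence breaks and document markers
        labels.add(line.split()[-1])
    return sorted(labels)


sample = [
    "-DOCSTART- -X- -X- O",
    "",
    "Steve NNP B-NP B-PER",
    "bought VBD B-VP O",
    "aspirin NN B-NP B-DRUG",   # hypothetical custom tag
]

conll_labels = {
    "O", "B-PER", "I-PER", "B-ORG", "I-ORG",
    "B-LOC", "I-LOC", "B-MISC", "I-MISC",
}

found = collect_labels(sample)
missing = [tag for tag in found if tag not in conll_labels]
print(found, missing)
```

Any tag reported in missing would have to be added to the label list used for training before the model can learn it.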

About the 'X' label

In my opinion, you should exclude the 'X' label from evaluation. Because you add a label that the standard dataset does not have, I can't tell how much of the F1-score increase comes from the extra 'X' label. I think the 'X' label is not equivalent to the 'O' label in the standard dataset and the BERT paper, but in your code they may be treated the same.
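
One common way to keep word-piece continuations out of the score is to evaluate only the first piece of each word. A minimal sketch, assuming WordPiece continuations are marked with a leading ## as in the token logs on this page. first_piece_labels is an illustrative helper, not a function from this repository.

```python
def first_piece_labels(tokens, labels):
    """Keep the label of each word's first piece; drop '##' continuation pieces."""
    return [lab for tok, lab in zip(tokens, labels) if not tok.startswith("##")]


tokens = ["Werner", "Z", "##wing", "##mann", "said"]
gold = ["B-PER", "I-PER", "X", "X", "O"]
pred = ["B-PER", "I-PER", "X", "O", "O"]   # differs only on a continuation piece

gold_words = first_piece_labels(tokens, gold)
pred_words = first_piece_labels(tokens, pred)
print(gold_words, pred_words)
```

After filtering, the prediction and the gold labels agree, so a disagreement on a continuation piece neither helps nor hurts the word-level score.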

Why is num_labels = len(label_list) + 1?
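
A plausible explanation, given as an assumption and not as the author's stated reason: if label ids start at 1 and id 0 is used for padded positions, the classifier needs one output per label plus one for id 0. The sketch below shows the counting with a shortened, illustrative label list.

```python
label_list = ["O", "B-PER", "I-PER", "[CLS]", "[SEP]"]  # shortened, illustrative

# Assumption: ids start at 1 so that 0 can mark padding.
label_to_id = {label: i for i, label in enumerate(label_list, 1)}
num_labels = len(label_list) + 1   # one extra output for id 0

max_seq_length = 8
label_ids = [label_to_id[name] for name in ["[CLS]", "B-PER", "O", "[SEP]"]]
label_ids += [0] * (max_seq_length - len(label_ids))   # pad with id 0

# Every id, including the padding id, must be a valid class index.
print(label_ids, num_labels)
```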

NER fails horribly when uncased

In the pre-trained example provided, try changing the casing in the sentence.

output = model.predict("Steve went to paris")
{'paris': {'tag': 'O', 'confidence': 0.9998948574066162},
'Steve': {'tag': 'B-PER', 'confidence': 0.9998831748962402}}

output = model.predict("steve went to Paris")
{'Paris': {'tag': 'B-LOC', 'confidence': 0.9998199343681335},
'steve': {'tag': 'O', 'confidence': 0.9998823404312134}}

Maybe training should be done uncased?

NER on a document

Thanks, Kamal, for the wonderful work.

I am looking for help with tracking coreference of entities within a document. For example, the person name James Paul appears 10 times in the document (as James, Paul, or James Paul). Can you suggest how to group all of these mentions together? It gets tricky if the document has two people, such as James Paul and James Real: how can one tell which James is being referred to?

Sorry, just looking for some help.

Thanks
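
As a starting point only, here is a simple string heuristic, not real coreference resolution: attach each mention to the full name whose tokens contain it, and set aside mentions that match zero or several full names. group_mentions is an illustrative helper; resolving the set-aside mentions would need document context, which this sketch does not use.

```python
def group_mentions(mentions):
    """Group mentions under the unique full name that contains all their tokens."""
    full_names = sorted({m for m in mentions if len(m.split()) > 1})
    groups = {name: [] for name in full_names}
    unresolved = []
    for mention in mentions:
        parts = set(mention.split())
        candidates = [n for n in full_names if parts <= set(n.split())]
        if len(candidates) == 1:
            groups[candidates[0]].append(mention)
        else:
            unresolved.append(mention)   # no match, or more than one match
    return groups, unresolved


mentions = ["James Paul", "Paul", "James", "James Real", "Real"]
groups, unresolved = group_mentions(mentions)
print(groups)
print(unresolved)
```

Here "James" alone matches both James Paul and James Real, so it is left unresolved instead of being assigned arbitrarily.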
