
keras-bert's Introduction

Keras BERT


[中文|English]

Implementation of BERT in Keras. Official pre-trained models can be loaded for feature extraction and prediction.

Install

pip install keras-bert

Usage

External Links

Load Official Pre-trained Models

In the feature extraction demo, you should be able to get the same extraction results as the official model chinese_L-12_H-768_A-12. In the prediction demo, the missing word in the sentence can be predicted.
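
A minimal sketch of what the demos do, assuming the placeholder paths below point to an uncompressed official checkpoint and that a 512-token sequence length is acceptable:

import codecs
import numpy as np
from keras_bert import load_trained_model_from_checkpoint, Tokenizer

# Placeholder paths to an uncompressed official checkpoint
config_path = 'uncased_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'uncased_L-12_H-768_A-12/bert_model.ckpt'
vocab_path = 'uncased_L-12_H-768_A-12/vocab.txt'

# Build the token dictionary from the released vocabulary
token_dict = {}
with codecs.open(vocab_path, 'r', 'utf8') as reader:
    for line in reader:
        token_dict[line.strip()] = len(token_dict)

model = load_trained_model_from_checkpoint(config_path, checkpoint_path, training=False, seq_len=512)
tokenizer = Tokenizer(token_dict)

indices, segments = tokenizer.encode('all work and no play', max_len=512)
features = model.predict([np.array([indices]), np.array([segments])])
print(features.shape)  # (1, 512, 768) for the base model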

Run on TPU

The extraction demo shows how to convert the model so that it runs on a TPU.

The classification demo shows how to apply the model to simple classification tasks.

Tokenizer

The Tokenizer class is used for splitting texts and generating indices:

from keras_bert import Tokenizer

token_dict = {
    '[CLS]': 0,
    '[SEP]': 1,
    'un': 2,
    '##aff': 3,
    '##able': 4,
    '[UNK]': 5,
}
tokenizer = Tokenizer(token_dict)
print(tokenizer.tokenize('unaffable'))  # The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]']`
indices, segments = tokenizer.encode('unaffable')
print(indices)  # Should be `[0, 2, 3, 4, 1]`
print(segments)  # Should be `[0, 0, 0, 0, 0]`

print(tokenizer.tokenize(first='unaffable', second='钢'))
# The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]', '钢', '[SEP]']`
indices, segments = tokenizer.encode(first='unaffable', second='钢', max_len=10)
print(indices)  # Should be `[0, 2, 3, 4, 1, 5, 1, 0, 0, 0]`
print(segments)  # Should be `[0, 0, 0, 0, 0, 1, 1, 0, 0, 0]`

Train & Use

from tensorflow import keras
from keras_bert import get_base_dict, get_model, compile_model, gen_batch_inputs


# A toy input example
sentence_pairs = [
    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]


# Build token dictionary
token_dict = get_base_dict()  # A dict that contains some special tokens
for pairs in sentence_pairs:
    for token in pairs[0] + pairs[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())  # Used for selecting a random word


# Build & train the model
model = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=25,
    feed_forward_dim=100,
    seq_len=20,
    pos_num=20,
    dropout_rate=0.05,
)
compile_model(model)
model.summary()

def _generator():
    while True:
        yield gen_batch_inputs(
            sentence_pairs,
            token_dict,
            token_list,
            seq_len=20,
            mask_rate=0.3,
            swap_sentence_rate=1.0,
        )

model.fit_generator(
    generator=_generator(),
    steps_per_epoch=1000,
    epochs=100,
    validation_data=_generator(),
    validation_steps=100,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
    ],
)


# Use the trained model
inputs, output_layer = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=25,
    feed_forward_dim=100,
    seq_len=20,
    pos_num=20,
    dropout_rate=0.05,
    training=False,      # The input layers and output layer will be returned if `training` is `False`
    trainable=False,     # Whether the model is trainable. The default value is the same with `training`
    output_layer_num=4,  # The number of layers whose outputs will be concatenated as a single output.
                         # Only available when `training` is `False`.
)
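
The frozen outputs can feed a downstream head. Keras' built-in pooling layers reject the mask that the BERT layers propagate, so the sketch below defines a small mask-aware average pooling inline; the head, its sizes and the loss are illustrative assumptions rather than part of keras-bert, and the snippet assumes keras-bert is running on tf.keras as above:

from tensorflow import keras
from tensorflow.keras import backend as K


class MaskedAveragePooling(keras.layers.Layer):
    """Average over time steps, ignoring masked (padding) positions."""

    def __init__(self, **kwargs):
        super(MaskedAveragePooling, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        return None  # the pooled vector no longer needs a mask

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = K.cast(K.expand_dims(mask, axis=-1), K.floatx())
            return K.sum(inputs * mask, axis=1) / (K.sum(mask, axis=1) + K.epsilon())
        return K.mean(inputs, axis=1)


# `inputs` and `output_layer` come from the `get_model(training=False, ...)` call above
pooled = MaskedAveragePooling()(output_layer)
probs = keras.layers.Dense(1, activation='sigmoid')(pooled)
classifier = keras.Model(inputs=inputs, outputs=probs)
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])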

Use Warmup

The AdamWarmup optimizer is provided for warmup and decay. The learning rate will reach lr in warmup_steps steps and decay to min_lr in decay_steps steps. There is a helper function calc_train_steps for calculating these two step counts:

import numpy as np
from keras_bert import AdamWarmup, calc_train_steps

train_x = np.random.standard_normal((1024, 100))

total_steps, warmup_steps = calc_train_steps(
    num_example=train_x.shape[0],
    batch_size=32,
    epochs=10,
    warmup_proportion=0.1,
)

optimizer = AdamWarmup(total_steps, warmup_steps, lr=1e-3, min_lr=1e-5)
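
The resulting optimizer is then passed to Keras as usual; a one-line sketch in which the model and the loss are placeholders:

model.compile(optimizer=optimizer, loss='mse')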

Download Pretrained Checkpoints

Several download URLs have been added. You can get the downloaded and uncompressed path of a checkpoint with:

from keras_bert import get_pretrained, PretrainedList, get_checkpoint_paths

model_path = get_pretrained(PretrainedList.multi_cased_base)
paths = get_checkpoint_paths(model_path)
print(paths.config, paths.checkpoint, paths.vocab)

Extract Features

You can use the helper function extract_embeddings if token or sentence features (without further tuning) are what you need. To extract the features of all tokens:

from keras_bert import extract_embeddings

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
texts = ['all work and no play', 'makes jack a dull boy~']

embeddings = extract_embeddings(model_path, texts)

The returned result is a list with the same length as texts. Each item in the list is a numpy array truncated to the length of the corresponding input. The shapes of the outputs in this example are (7, 768) and (8, 768).

When the inputs are sentence pairs and you need the NSP output and max-pooling over the last 4 layers:

from keras_bert import extract_embeddings, POOL_NSP, POOL_MAX

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
texts = [
    ('all work and no play', 'makes jack a dull boy'),
    ('makes jack a dull boy', 'all work and no play'),
]

embeddings = extract_embeddings(model_path, texts, output_layer_num=4, poolings=[POOL_NSP, POOL_MAX])

There are no per-token features in the results. The NSP and max-pooling outputs are concatenated, giving a final shape of (768 x 4 x 2,).

The second argument of the helper function also accepts a generator. To extract features from a file:

import codecs
from keras_bert import extract_embeddings

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'

with codecs.open('xxx.txt', 'r', 'utf8') as reader:
    texts = map(lambda x: x.strip(), reader)
    embeddings = extract_embeddings(model_path, texts)

keras-bert's People

Contributors

bojone, cyberzhg, hughchi, nikitos9000, thethiny, tim5go


keras-bert's Issues

MaskedGlobalAveragePool1D

Is your feature request related to a problem? Please describe.
Masked Global Average Pool1D

Describe the solution you'd like

Hi Cyber,
I wrote this piece of code with comments; could you help take a look? Thanks. I have been using it and it seems to work fine.

import keras
from keras import backend as K


class MaskedGlobalAveragePool1D(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(MaskedGlobalAveragePool1D, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        return None

    def compute_output_shape(self, input_shape):
        return input_shape[:-2] + (input_shape[-1],)

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = K.cast(mask, K.floatx())         # cast mask to float
            inputs *= K.expand_dims(mask, axis=-1)  # zero out masked positions
        return K.mean(inputs, axis=-2)              # average over time steps

Model of bert that it is using.

I am using this file, and after I ran pip3 install keras_bert --user I got it working; before that it was not.

I am wondering which BERT model it is using?

SQuAD

Hi,
is it possible to fine-tune the original BERT model on the SQuAD dataset as in the paper?

OOV (out of vocab)

Is your feature request related to a problem? Please describe.
Some words are not in vocab.txt (which is from https://github.com/google-research/bert and contains 30,522 words); e.g., "edits" is OOV.

Describe the solution you'd like
I am thinking of randomly initializing a 512-dimensional embedding for the OOV words, but I am not sure whether or how that would work.
Besides, since OOV is a very common problem, maybe there is already some off-the-shelf solution for keras-bert?
Many thanks
Describe alternatives you've considered
NA

Additional context

Is this able to reproduce original results?

Hi,

Thanks for the repo and your time! Do you know whether this is able to reproduce the original results in terms of accuracy and model speed?

Thank you for your time and God bless!

(Sorry I mislabeled this as a bug.)

Generators file for low-memory users - Code Provided

Hi,
I would suggest these two generators for people with limited memory, or for GPUs with low memory.

def _generator(sentence_pairs, batch_size, seq_len):
    # Generate one large batch, then yield it in smaller slices.
    while True:
        inp, outp = gen_batch_inputs(sentence_pairs, token_dict, token_list, seq_len=seq_len)
        for i in range(0, len(sentence_pairs), batch_size):
            yield ([inp[x][i:i + batch_size] for x in range(len(inp))],
                   [outp[x][i:i + batch_size] for x in range(len(outp))])


def _generator_v(sentence_pairs, batch_size, seq_len):
    # Generate each small batch directly from a slice of the sentence pairs.
    while True:
        for i in range(0, len(sentence_pairs), batch_size):
            yield gen_batch_inputs(sentence_pairs[i:i + batch_size], token_dict, token_list, seq_len=seq_len)

How can I get sentence vectors?

Hello,
Your code has helped me a great deal. What I get now are character (token) vectors; how can I get sentence vectors?
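
One possible answer, sketched with the helper described in the README above: extract_embeddings with a pooling strategy returns one vector per sentence instead of per-token vectors (the model path is a placeholder):

from keras_bert import extract_embeddings, POOL_NSP

model_path = 'xxx/yyy/chinese_L-12_H-768_A-12'
texts = ['all work and no play', 'makes jack a dull boy']

# One (768,) vector per sentence, taken from the [CLS]/NSP position
sentence_vectors = extract_embeddings(model_path, texts, poolings=[POOL_NSP])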

How to use Bert in keras Embedding layer?

Hi,

I found embedding.py:
https://github.com/CyberZHG/keras-bert/blob/master/keras_bert/layers/embedding.py

but how can I add it to my Keras model, and how do I feed data into the pre-trained BERT embedding?

My Keras model is:

model = Sequential()
model.add(TokenEmbedding)
model.add(Bidirectional(LSTM(256, return_sequences=True)))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(len(tag2index))))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(0.001),metrics=['accuracy'])

This errors out; the error message is:
TypeError: The added layer must be an instance of class Layer. Found: <class 'main.TokenEmbedding'>

I have a matrix of shape [batch_size, sequence_len]. How could I get my data's embedding of shape [batch_size, sequence_len, 768] via test_embedding.py?


Unable to load official pre-trained checkpoint using load_trained_model_from_checkpoint

When I try to use load_trained_model_from_checkpoint() to load the official BERT-base uncased model (https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip), I get the following error:

Traceback (most recent call last):
  File "ner_training_code_BERT.py", line 139, in <module>
    model, loss = build_model()
  File "ner_training_code_BERT.py", line 67, in build_model
    bert_model = load_trained_model_from_checkpoint(_bert_config_path, _bert_checkpoint_path, training=False)
  File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/keras_bert/loader.py", line 26, in load_trained_model_from_checkpoint
    tf.train.load_variable(checkpoint_file, 'bert/embeddings/word_embeddings'),
  File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 81, in load_variable
    return reader.get_tensor(name)
  File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 334, in get_tensor
    status)
  File "/Users/scguo/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: bert/embeddings/word_embeddings not found in checkpoint file

It seems like the op names we expect do not match those in the official model?

bug in Custom Feature Extraction

def _custom_layers(x, trainable=True):
    return keras.layers.LSTM(
        units=768,
        trainable=trainable,
        name='LSTM',
    )(x)

It seems that the LSTM in custom_layers should set return_sequences=True.
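
For reference, a sketch of the corrected helper under that suggestion, keeping the reporter's structure and only adding return_sequences=True so the downstream per-token heads still see one vector per position (the keras import is assumed to match the backend keras-bert uses):

from tensorflow import keras


def _custom_layers(x, trainable=True):
    # return_sequences=True keeps the (batch, seq_len, units) shape
    return keras.layers.LSTM(
        units=768,
        trainable=trainable,
        return_sequences=True,
        name='LSTM',
    )(x)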

How to get encoded sentence vector

Hello, Sir

Thank you for implementing the Keras version of BERT.

Is there any way to get an encoded sentence vector?

I have a sentence like:

The Top 25 Songs That Matter Right Now

and I want to get an encoded sentence vector after feeding this sentence to BERT.

Thanks again.

About specifying seq_len

Hello, I would like to ask whether it is possible not to specify the sequence length when loading the BERT model. For example, when predicting in an NER task, each sentence has a different length.

Inconsistency definition of the training param of load_trained_model_from_checkpoint function

I loaded the pre-trained BERT model from an official TF checkpoint using load_trained_model_from_checkpoint with training=False.
I don't want to train the BERT model from scratch (i.e. with MLM or NSP); however, I do want my downstream data to update the parameters inside the BERT model. As shown in the attached screenshot, the BERT model is trainable as a Keras model, yet all the weights inside it are non-trainable.

I'm confused: is there something wrong with my code, or with the training parameter?

finetune BERT with custom dataset

Is your feature request related to a problem? Please describe.
I wish to fine-tune BERT (MLM, sentence pairs) on a custom dataset, e.g. text extracted from a book.

Describe the solution you'd like

Describe alternatives you've considered
Into which function can we feed a custom dataset, for example a text file from a book?
Do we need to write a function to format the text file so that it can be consumed by BERT?

Additional context

Retraining vs. training from scratch inconsistency?

Hello, thanks for your effort and work on this project. I attempted to train two models using the same data and generator code (based on gen_batch_inputs in bert.py), once using get_model and once using load_model_from_checkpoint. As an aside, I am still unsure whether the second option is OK: your readme states that "Official pre-trained models could be loaded for feature extraction and prediction", but in Issue #1 you seem to say that the official models cannot be loaded correctly with this implementation.
In any case, when using get_model training proceeds as expected, but when using load_model_from_checkpoint, I get:
ValueError: Error when checking target: expected MLM to have shape (512, 30522) but got array with shape (512,1)
(30522 is the length of my token_list / size of my token_dict)
Am I missing an obvious reason for this, or is this a problem with the code?

About parameter trainable for classifier

Hi. When using the BERT model for classification tasks with the following code:

  • If trainable=True is set and then any classifier is added, the result is only ever biased towards a certain category, whether multi-class or binary classification. model.summary() reports Non-trainable params: 0.

  • But if trainable is not set and the other code is exactly the same, the result is quite normal. In that case, Non-trainable params: 101,306,880.

So how do I fine-tune correctly?

bert_model = load_trained_model_from_checkpoint(config_path, checkpoint_path, seq_len=maxlen,
                                                training=False, trainable=True)
pool_layer = MaskedGlobalMaxPool1D(name='Pooling')(bert_model.output)
out = Dense(32, activation='relu')(pool_layer)
output = Dense(units=class_num, activation=output_activation)(out)
model = Model(bert_model.input, output)
model.compile(loss=model_loss, optimizer=model_optimizer, metrics=['categorical_accuracy'])

tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 3531060969 vs. calculated on the restored bytes 1701788620

Test environment

tensorflow1.13.1+python3.6+ubuntu18.04+cuda10.0

keras-bert version: 0.29.0, 0.30.0

The error message is as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/notebooks/ner.py", line 95, in train
    embedding = BERTEmbedding('./bert', seq_len)
  File "/usr/local/lib/python3.6/dist-packages/kashgari/embeddings/embeddings.py", line 69, in __init__
    self.build(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kashgari/embeddings/embeddings.py", line 301, in build
    seq_len=self.sequence_length)
  File "/usr/local/lib/python3.6/dist-packages/keras_bert/loader.py", line 71, in load_trained_model_from_checkpoint
    loader('bert/encoder/layer_%d/attention/self/value/kernel' % i),
  File "/usr/local/lib/python3.6/dist-packages/keras_bert/loader.py", line 10, in _loader
    return tf.train.load_variable(checkpoint_file, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 82, in load_variable
    return reader.get_tensor(name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 370, in get_tensor
    status)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Checksum does not match: stored 3531060969 vs. calculated on the restored bytes 1701788620

Special tokens representation

In bert.py, the special tokens in base_dict use angle brackets, but in the pre-trained BERT models released by Google, the vocab file uses square brackets; e.g., your TOKEN_UNK is '<UNK>' but in the vocab file it is '[UNK]'. Also, your padding token is an empty string while the vocab file's padding token is '[PAD]'. This causes problems when trying to adapt your bert_fit_demo to continue training the released models instead of training a model from scratch.
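
One hypothetical workaround until the constants match: build the token dictionary from the released vocab.txt instead of get_base_dict(), so the special tokens and their indices agree with the checkpoint (the path is a placeholder):

import codecs

vocab_path = 'xxx/yyy/uncased_L-12_H-768_A-12/vocab.txt'

token_dict = {}
with codecs.open(vocab_path, 'r', 'utf8') as reader:
    for line in reader:
        token_dict[line.strip()] = len(token_dict)

# token_dict now maps '[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]' and the
# WordPiece vocabulary to the same indices the released checkpoint was trained with.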

MaskConv1D

Is your feature request related to a problem? Please describe.
Error when adding a Conv1D layer on top of the BERT embeddings, complaining about "mask".

Describe the solution you'd like
Could you add a class for a MaskConv1D layer, similar to MaskedGlobalMaxPool1D?

Describe alternatives you've considered
PR

Additional context
N/A

ImportError: cannot import name 'Tokenizer'

Describe the Bug

When I use keras_bert to run the demo https://github.com/CyberZHG/keras-bert/blob/master/demo/load_model/load_and_extract.py, it gives me an error about importing Tokenizer. The detailed description is shown below.

What is the cause of this problem?
Thanks

Minimal Code To Reproduce

import keras_bert

from keras_bert import load_trained_model_from_checkpoint
from keras_bert import Tokenizer

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-27-cdcfeee15447> in <module>()
      1 from keras_bert import load_trained_model_from_checkpoint
----> 2 from keras_bert import Tokenizer

ImportError: cannot import name 'Tokenizer'


save and load model

Is your feature request related to a problem? Please describe.
When the model is saved and loaded, an error occurs due to "mask".
So far the workaround is to save and load only the weights.

Thanks
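
A sketch of that workaround, assuming the model was built with load_trained_model_from_checkpoint and that config_path and checkpoint_path are still available when restoring:

from keras_bert import load_trained_model_from_checkpoint

# Fine-tune, then persist only the weights
model.save_weights('bert_finetuned.h5')

# Later: rebuild the same architecture and restore the weights,
# so no custom-layer deserialization is needed.
# (If extra layers were added on top, rebuild the same head before load_weights.)
restored = load_trained_model_from_checkpoint(config_path, checkpoint_path, training=False)
restored.load_weights('bert_finetuned.h5')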

tf.placeholder doesn't exist in TF 2.0

Describe the Bug

On TensorFlow 2.0, there is no tf.placeholder:

Traceback (most recent call last):
  File "load_and_extract.py", line 18, in <module>
    model = load_trained_model_from_checkpoint(config_path, checkpoint_path)
  File "/usr/local/lib/python3.5/dist-packages/keras_bert/loader.py", line 43, in load_trained_model_from_checkpoint
    training=training,
  File "/usr/local/lib/python3.5/dist-packages/keras_bert/bert.py", line 58, in get_model
    inputs = get_inputs(seq_len=seq_len)
  File "/usr/local/lib/python3.5/dist-packages/keras_bert/layers/inputs.py", line 15, in get_inputs
    ) for name in names]
  File "/usr/local/lib/python3.5/dist-packages/keras_bert/layers/inputs.py", line 15, in <listcomp>
    ) for name in names]
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/input_layer.py", line 87, in __init__
    name=self.name)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 517, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
AttributeError: module 'tensorflow' has no attribute 'placeholder'

Version Info

tf-nightly-gpu-2.0-preview 2.0.0.dev20190304
Keras 2.2.4
python 3.5.2

Single-Sentence Input?

I'm a little new to the BERT model, but it seems that in your code the gen_batch_inputs() function only allows sentence pairs to be passed through. How would you implement a dataset for single-sentence encoding or classification?
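
If the goal is single-sentence encoding or classification rather than pretraining, one option is to skip gen_batch_inputs and encode each sentence on its own with the Tokenizer; the second segment is simply absent and the segment ids are all zero. A tiny sketch, assuming token_dict was built from a vocabulary as in the README:

from keras_bert import Tokenizer

tokenizer = Tokenizer(token_dict)
indices, segments = tokenizer.encode('all work and no play', max_len=128)
# segments == [0, 0, ..., 0]; batches of [indices, segments] are fed to the model as usual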

Minor issue in demo

Hi,

in https://github.com/CyberZHG/keras-bert/blob/master/demo/load_model/load_and_extract.py#L28

the sequence input is encoded as

seg_input = np.asarray([[0] * len(tokens) + [0] * (512 - len(tokens))])

I think the second part should be [1], i.e.

seg_input = np.asarray([[0] * len(tokens) + [1] * (512 - len(tokens))])

Only then is the padding part masked correctly. For the demo it does not make any direct difference, as only the first len(tokens) tokens are checked. But when this is used as the basis for something more complex, it might create a problem.

Setting up Keras Bert for Reading Comprehension

I am looking at section 4.2 of the BERT paper for how to set up BERT for reading comprehension. It looks like a module needs to be added to the end of BERT, S and E are new parameters, and a log-softmax loss is calculated for the start and end positions.

This extension is included in the original tensorflow BERT in the 'run_squad.py' script in the repository.

Does such an extension exist for Keras BERT?
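
Whether or not such an extension exists, the span head from section 4.2 is small enough to sketch by hand. The snippet below is an illustrative assumption rather than a tested recipe: it assumes config_path and checkpoint_path point to a checkpoint, that keras-bert and this code share the same Keras (here tf.keras), and it names a hypothetical NonMasking helper to stop the propagated mask from reaching layers that reject it:

from tensorflow import keras
from keras_bert import load_trained_model_from_checkpoint


class NonMasking(keras.layers.Layer):
    """Pass tensors through unchanged but stop the BERT mask from propagating."""

    def __init__(self, **kwargs):
        super(NonMasking, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        return None

    def call(self, inputs, mask=None):
        return inputs


bert = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                          training=False, trainable=True, seq_len=384)
seq_output = NonMasking()(bert.output)                            # (batch, 384, 768)

# One score per token for the start and end positions; the softmax over positions
# plays the role of the paper's S and E vectors (the two columns of this Dense layer).
span_logits = keras.layers.Dense(2, use_bias=False)(seq_output)   # (batch, 384, 2)
span_logits = keras.layers.Permute((2, 1))(span_logits)           # (batch, 2, 384)
span_probs = keras.layers.Softmax(axis=-1)(span_logits)

model = keras.Model(bert.inputs, span_probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# Targets: one (start_index, end_index) integer pair per example, shape (batch, 2).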

batch prediction ?

How can I feed a batch of data to the model?

model.input_shape
[(None, 512), (None, 512)]

When I was trying to feed the model with a batch of data of shape [64, 2, 512], it gives:
"ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays ..."

I was using something like this:

from keras.layers import Lambda, Dense
from keras.models import Model
import keras.backend as K
from keras_bert import get_model


def mean(x):
    return K.mean(x, axis=1, keepdims=False)


inputs, embeds = get_model(
    token_num=len(token_dict),
    head_num=12,
    transformer_num=12,
    embed_dim=768,
    feed_forward_dim=100,
    seq_len=512,
    pos_num=512,
    dropout_rate=0.05,
    training=False,
)

avg_embeds = Lambda(mean)(embeds)
pred = Dense(1, activation="sigmoid")(avg_embeds)
model = Model(inputs=inputs, outputs=pred)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])

Thanks
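
The error above suggests the model wants each batch as a list of two arrays (token indices and segment ids) rather than one stacked array. A possible fix, assuming data is the (64, 2, 512) array described above:

import numpy as np

token_ids = data[:, 0, :]    # (64, 512)
segment_ids = data[:, 1, :]  # (64, 512)

predictions = model.predict([token_ids, segment_ids], batch_size=16)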

Do you change the shape of the prediction result

Thanks for your great work.
I see that the original BERT prediction returns 3 numpy arrays, but your demo (load_and_predict.py) only returns 2. I am not sure whether I am misunderstanding you or the original BERT.

Building custom model over the final embedding layer

BERT supposedly generates 768-dimensional embeddings for tokens. I am trying to build a multi-class classification model on top of this. My assumption is that the output of the layer Encoder-12-FeedForward-Norm, of shape (None, [seq_length], 768), gives these embeddings. This is what I am trying:

model = load_trained_model_from_checkpoint(config_path, checkpoint_path, training=True, seq_len=seq_len)

new_out = Bidirectional(LSTM(50, return_sequences=True, 
                       dropout=0.1, 
                       recurrent_dropout=0.1))(model.layers[-9].output)
new_out = GlobalMaxPool1D()(new_out)
new_out = Dense(50, activation='relu')(new_out)
new_out = Dropout(0.1)(new_out)
new_out = Dense(6, activation='sigmoid')(new_out)

newModel = Model(model.inputs[:2], new_out)

I get the following error for new_out = GlobalMaxPool1D()(new_out) :

TypeError: Layer global_max_pooling1d_11 does not support masking, but was passed an input_mask: Tensor("Encoder-12-FeedForward-Add/All:0", shape=(?, 128), dtype=bool)

I am not sure how masking is involved if I am just using the output of the encoder.

The paper mentions that the output corresponding to just the first [CLS] token should be used for classification. On trying this :

new_out = Lambda(lambda x: x[:,0,:])(model.layers[-9].output)

the model trains (although with poor results).

How can the pre-loaded model be used for classification?

How to apply BERT to a cloze-style QA task?

Great job! Can I apply this to a cloze-style QA task, i.e. predicting the masked words according to the context? How should the input of the model be organized?

For example:

The dataset looks like:

From Monday to Friday most people are busy working or studying, but in the evenings and weekends they are free and _ themselves.
And there are four candidate answers:

"options": [
    ["love", "work", "enjoy", "play"]
]

Apparently the correct answer is "enjoy". How can I organize the input so that BERT can predict the missing word given the context? Thank you for your excellent code!
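
One way to organize it, sketched under the assumption that the model is loaded with training=True so the MLM head is available; the three-input/two-output layout follows the load_and_predict demo, and the paths and token_dict are placeholders. Put '[MASK]' at the blank and compare the MLM probabilities of the candidates:

import numpy as np
from keras_bert import load_trained_model_from_checkpoint, Tokenizer

model = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                           training=True, seq_len=512)
tokenizer = Tokenizer(token_dict)  # token_dict built from the checkpoint's vocab.txt

text = 'but in the evenings and weekends they are free and enjoy themselves'
tokens = tokenizer.tokenize(text)
mask_position = tokens.index('enjoy')       # the position of the blank
indices, segments = tokenizer.encode(text, max_len=512)
indices[mask_position] = token_dict['[MASK]']

masked = np.zeros((1, 512))
masked[0, mask_position] = 1
mlm_probs = model.predict([np.array([indices]), np.array([segments]), masked])[0]

candidates = ['love', 'work', 'enjoy', 'play']
scores = {word: mlm_probs[0, mask_position, token_dict[word]] for word in candidates}
print(max(scores, key=scores.get))          # ideally 'enjoy'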

using model.fit instead of model.fit_generator

Hello!

I am trying to build a classifier on top of the keras-bert model. However, when I attempt to run model.fit on the new model, I constantly get ValueError: An operation has `None` for gradient.

Upon further testing, running model.fit on the loaded keras-bert model alone also gives this error. As such, may I clarify whether fit_generator must be used for fine-tuning the BERT model?

(My rationale for not using the original BERT code to fine-tune the model is that I am making use of Keras shared layers as part of my classifier, which does not appear to be doable with TensorFlow alone.)

Thank you so much!

Do we support a BERT fine-tuned classification model?

As the title says: I have been using BERT recently and it really does work well on small datasets. I would like to ask whether this package supports fine-tuning a label classification model.

if reps is tensor.vector, you should specify the ndim

I am trying to run this code:

from keras_bert import get_base_dict, get_model, gen_batch_inputs


# A toy input example
sentence_pairs = [
    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]


# Build token dictionary
token_dict = get_base_dict()  # A dict that contains some special tokens
for pairs in sentence_pairs:
    for token in pairs[0] + pairs[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())  # Used for selecting a random word


# Build & train the model
model = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=25,
    feed_forward_dim=100,
    seq_len=20,
    pos_num=20,
    dropout_rate=0.05,
)
model.summary()

But it shows me this error:
ValueError Traceback (most recent call last)
in ()
28 seq_len=20,
29 pos_num=20,
---> 30 dropout_rate=0.05,
31 )
32 model.summary()

~\Anaconda3\lib\keras_bert\bert.py in get_model(token_num, pos_num, seq_len, embed_dim, transformer_num, head_num, feed_forward_dim, dropout_rate, attention_activation, feed_forward_activation, custom_layers, training, lr)
63 pos_num=pos_num,
64 dropout_rate=dropout_rate,
---> 65 trainable=training,
66 )
67 transformed = embed_layer

~\Anaconda3\lib\keras_bert\layers\embedding.py in get_embedding(inputs, token_num, pos_num, embed_dim, dropout_rate, trainable)
54 trainable=trainable,
55 name='Embedding-Position',
---> 56 )(embed_layer)
57 if dropout_rate > 0.0:
58 dropout_layer = keras.layers.Dropout(

~\Anaconda3\lib\site-packages\keras\engine\base_layer.py in call(self, inputs, **kwargs)
455 # Actually call the layer,
456 # collecting output(s), mask(s), and shape(s).
--> 457 output = self.call(inputs, **kwargs)
458 output_mask = self.compute_mask(inputs, previous_mask)
459

~\Anaconda3\lib\keras_pos_embd\pos_embd.py in call(self, inputs, **kwargs)
128 pos_embeddings = K.tile(
129 K.expand_dims(self.embeddings[:seq_len, :self.output_dim], axis=0),
--> 130 K.stack([batch_size, 1, 1]),
131 )
132 if self.mode == self.MODE_ADD:

~\Anaconda3\lib\site-packages\keras\backend\theano_backend.py in tile(x, n)
1067
1068 def tile(x, n):
-> 1069 y = T.tile(x, n)
1070 if hasattr(x, '_keras_shape'):
1071 if _is_explicit_shape(n):

~\Anaconda3\lib\site-packages\theano\tensor\basic.py in tile(x, reps, ndim)
5413 elif ndim_check == 1:
5414 if ndim is None:
-> 5415 raise ValueError("if reps is tensor.vector, you should specify "
5416 "the ndim")
5417 else:

ValueError: if reps is tensor.vector, you should specify the ndim
Any help would be appreciated.

Tokenization on Cased Model

Is your feature request related to a problem? Please describe.
In the original BERT implementation from Google, the tokenizer does not perform normalization (lower-casing, accent stripping, or Unicode normalization) on the input when using a cased model, e.g. Multilingual Cased.

Describe the solution you'd like
In the Tokenizer class, prevent normalization of the input when the model is cased.

Additional context
Lines 71 - 73 in tokenizer.py

could you provide a demo for classifier?

I want to use keras-bert in my project; it is cleaner than the original BERT source for me. But I don't know how to convert the original BERT classifier code to a keras-bert classifier.
Could you provide a demo of a classifier based on model.get_pooled_output()?

Regards.

Padding zeros

Hi,

Thank you for your keras BERT code. I'd like to report a padding issue.

The rest of the segment ids should be padded with zeros, not ones.

segments += [1] * (max_len - len(segments))
==>
segments += [0] * (max_len - len(segments))

The code is from tokenizer.py.

Use BERT to compute sentence similarity

I want to compute the similarity between two sentences (sentA and sentB).
I have encoded each sentence using the script load_and_extract.py, so the embedding matrices of sentA and sentB each have shape (1, 512, 768). After that I am thinking of adding a fully connected layer to compute the similarity between the two sentences.

Note: I am using the base model (with 12 hidden layers).

Question: Is this the right approach to using BERT for sentence similarity? Furthermore, I have seen some people apply MaskedGlobalMaxPool1D after the hidden layers to encode the sentences. Do I have to take the embeddings after applying MaskedGlobalMaxPool1D? Why is MaskedGlobalMaxPool1D needed?

Thanks in advance.
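
For what it's worth, a sketch of one common approach: pool each sentence to a single vector (for example the NSP/[CLS] pooling offered by extract_embeddings) and compare with cosine similarity. The model path is a placeholder, and whether raw BERT vectors give good similarity scores without fine-tuning is an open question:

import numpy as np
from keras_bert import extract_embeddings, POOL_NSP

model_path = 'xxx/yyy/uncased_L-12_H-768_A-12'
sent_a = 'all work and no play'
sent_b = 'makes jack a dull boy'

# One (768,) vector per sentence
vec_a, vec_b = extract_embeddings(model_path, [sent_a, sent_b], poolings=[POOL_NSP])
cosine = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
print(cosine)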

How to apply BERT to a custom classification model?

Let's say I'm trying to load a pre-trained BERT model from a checkpoint, add layers after it, and build the model with Model.compile().

What does the input have to look like if BERT is the first "layer"? It would be nice to have a simple example.
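
A minimal sketch of one way to set this up; the paths, token_dict, sequence length and the tiny frozen-feature classifier are illustrative assumptions rather than an official recipe:

import numpy as np
from tensorflow import keras
from keras_bert import load_trained_model_from_checkpoint, Tokenizer

bert = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                          training=False, seq_len=128)

# The input is two arrays per batch: token indices and segment ids.
tokenizer = Tokenizer(token_dict)  # token_dict built from the checkpoint's vocab.txt
texts = ['all work and no play', 'makes jack a dull boy']
encoded = [tokenizer.encode(text, max_len=128) for text in texts]
token_ids = np.array([ids for ids, _ in encoded])
segment_ids = np.array([segs for _, segs in encoded])

# Simplest variant: keep BERT frozen, take the [CLS] feature as a sentence
# vector and train a small separate classifier on top of it.
features = bert.predict([token_ids, segment_ids])[:, 0, :]   # (num_texts, 768)

labels = np.array([0, 1])                                    # placeholder labels
clf = keras.Sequential([keras.layers.Dense(1, activation='sigmoid', input_shape=(768,))])
clf.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
clf.fit(features, labels, epochs=3)

For end-to-end fine-tuning you would instead attach Keras layers directly to bert.output with trainable=True, taking care to use mask-aware pooling as discussed in the issues above.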
