utterworks / fast-bert
Super easy library for BERT based NLP models
License: Apache License 2.0
After installing fast_bert with pip install fast_bert, I imported BertLearner with from fast_bert.learner import BertLearner and tried to run the example. I got this error:
databunch = BertDataBunch('./data/', './data/',
tokenizer='bert-base-uncased',
train_file='train.csv',
val_file='val.csv',
label_file='labels.csv',
text_col='text',
label_col='label',
batch_size_per_gpu=16,
max_seq_length=512,
multi_gpu=True,
multi_label=False,
model_type='bert',
no_cache=True)
TypeError Traceback (most recent call last)
in
11 multi_label=False,
12 model_type='bert',
---> 13 no_cache=True)
/data/miniconda3/envs/pt/lib/python3.7/site-packages/fast_bert/data_cls.py in __init__(self, data_dir, label_dir, tokenizer, train_file, val_file, test_data, label_file, text_col, label_col, batch_size_per_gpu, max_seq_length, multi_gpu, multi_label, backend, model_type, logger, clear_cache, no_cache)
288 self.tokenizer = tokenizer
289 self.data_dir = data_dir
--> 290 self.cache_dir = data_dir/'cache'
291 self.max_seq_length = max_seq_length
292 self.batch_size_per_gpu = batch_size_per_gpu
TypeError: unsupported operand type(s) for /: 'str' and 'str'
Could you help me to deal with that?
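For what it's worth, the / in data_dir/'cache' is pathlib's overloaded division operator, so it fails when data_dir is a plain string. A minimal workaround sketch, keeping the rest of the call unchanged: pass Path objects instead of strings.

from pathlib import Path

# Passing Path objects lets data_dir / 'cache' resolve via pathlib
# instead of failing on str / str.
DATA_PATH = Path('./data/')
LABEL_PATH = Path('./data/')

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='bert',
                          no_cache=True)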
Everything works perfectly until I want to create the BertLearner.
When I run the following cell
learner = BertLearner.from_pretrained_model(databunch, 'bert-base-multilingual-uncased',
                                            metrics, device, logger,
                                            finetuned_wgts_path=None,
                                            is_fp16=args['fp16'], loss_scale=args['loss_scale'],
                                            multi_gpu=multi_gpu, multi_label=False)
the cell is stuck loading.
The logger gives me the following hints:
`07/17/2019 10:05:36 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz from cache at /home/ec2-user/.pytorch_pretrained_bert/437da855f7aeb6dcc47ee03b11ac55bfbc069d31354f6867f3b298aad8429925.dd2dce7e7331017693bd2230dbc8015b12a975201a420a856a6efbf7ae9d84c5
07/17/2019 10:05:36 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /home/ec2-user/.pytorch_pretrained_bert/437da855f7aeb6dcc47ee03b11ac55bfbc069d31354f6867f3b298aad8429925.dd2dce7e7331017693bd2230dbc8015b12a975201a420a856a6efbf7ae9d84c5 to temp dir /tmp/tmp5yuiacnx
07/17/2019 10:05:43 - INFO - pytorch_pretrained_bert.modeling - Model config {
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 105879
}
07/17/2019 10:05:48 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
07/17/2019 10:05:48 - INFO - pytorch_pretrained_bert.modeling - Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']`
I followed your instructions using my own data.
Since the batch size was too big for my data, I changed it to 6.
Then I got this error during evaluation:
08/23/2019 17:50:14 - INFO - root - Running evaluation
08/23/2019 17:50:14 - INFO - root - Num examples = 9833
08/23/2019 17:50:14 - INFO - root - Batch size = 6
Traceback (most recent call last):
File "train_fast_bert_doc_rerank.py", line 81, in <module>
optimizer_type="lamb"
File "/usr/local/lib/python3.6/site-packages/fast_bert/learner_cls.py", line 295, in fit
results = self.validate()
File "/usr/local/lib/python3.6/site-packages/fast_bert/learner_cls.py", line 382, in validate
validation_scores[metric['name']] = metric['function'](all_logits, all_labels)
File "/usr/local/lib/python3.6/site-packages/fast_bert/metrics.py", line 31, in accuracy_thresh
return ((y_pred > thresh) == y_true.byte()).float().mean().item()
RuntimeError: The size of tensor a (2) must match the size of tensor b (9833) at non-singleton dimension
Could you help me?
Thank you in advance
I have trained a multi-class text classifier using BERT and I am getting accuracy of around 90%. The only issue is that the model classifies out-of-domain sentences with a very high confidence score (e.g. 0.9954564).
I have seen other supervised models classify out-of-domain sentences with very low confidence, which helps to detect them. Is there any method to solve this problem?
/content/xlnet_cased_L-12_H-768_A-12/output/tensorboard
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
AttributeError Traceback (most recent call last)
in ()
----> 1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
2 frames
/usr/local/lib/python3.6/dist-packages/fast_bert/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
29 if sigmoid:
30 y_pred = y_pred.sigmoid()
---> 31 return ((y_pred > thresh) == y_true.bool()).float().mean().item()
32 # return np.mean(((y_pred>thresh)==y_true.byte()).float().cpu().numpy(), axis=1).sum()
33
AttributeError: 'Tensor' object has no attribute 'bool'
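Tensor.bool() was only added in PyTorch 1.2, so on older PyTorch versions this metric raises the AttributeError above. Either upgrade torch, or patch the comparison locally; a one-line sketch of the latter:

# fast_bert/metrics.py, accuracy_thresh: .byte() exists on older PyTorch
# releases where .bool() does not.
return ((y_pred > thresh) == y_true.byte()).float().mean().item()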
I use metrics as [{'name': 'F1-score', 'function': F1}] and run the sample data for 4 epochs.
However, after each epoch the F1 score is 0. What's wrong?
from fast_bert.learner import *
from fast_bert.metrics import *
from pytorch_pretrained_bert.tokenization import BertTokenizer
from bert_data import *
import sys
import torch
from fastai.text import *
import datetime
run_start_time = datetime.datetime.today().strftime('%Y-%m-%d_%H-%M-%S')
LOG_PATH=Path('logs/')
MODEL_PATH=Path('models/')
if not LOG_PATH.exists():
LOG_PATH.mkdir()
import logging
args = {
"run_text": "my_test",
"max_seq_length": 512,
"do_lower_case": True,
"train_batch_size": 16,
"learning_rate": 6e-5,
"num_train_epochs": 12.0,
"warmup_proportion": 0.002,
"local_rank": -1,
"gradient_accumulation_steps": 1,
"fp16": True,
"loss_scale": 128
}
logfile = str(LOG_PATH/'log-{}-{}.txt'.format(run_start_time, args["run_text"]))
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
datefmt='%m/%d/%Y %H:%M:%S',
handlers=[
logging.FileHandler(logfile),
logging.StreamHandler(sys.stdout)
])
logger = logging.getLogger()
device = torch.device('cuda')
if torch.cuda.device_count() > 1:
multi_gpu = True
else:
multi_gpu = False
print('multi_gpu={}'.format('True' if multi_gpu else 'False'))
DATA_PATH = Path('data/sample/data/')
LABEL_PATH = Path('data/sample/labels')
BERT_PRETRAINED_MODEL = "bert/bert-base-uncased"
args["do_lower_case"] = True
args["train_batch_size"] = 16
args["learning_rate"] = 6e-5
args["max_seq_length"] = 512
args["fp16"] = True
tokenizer = BertTokenizer.from_pretrained(BERT_PRETRAINED_MODEL,
do_lower_case=args['do_lower_case'])
label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer, train_file='train.csv', val_file='valid.csv',
test_data='test.csv', label_file="labels.csv",
text_col="comment_text", label_col=label_cols,
bs=args['train_batch_size'], maxlen=args['max_seq_length'],
multi_gpu=multi_gpu, multi_label=True)
#metrics = [{'name': 'accuracy', 'function': accuracy_multilabel}]
#metrics = [{'name': 'roc_auc', 'function': roc_auc}]
metrics = [{'name': 'F1-score', 'function': F1}]
learner = BertLearner.from_pretrained_model(databunch, BERT_PRETRAINED_MODEL, metrics, device, logger,
is_fp16=args['fp16'], loss_scale=args['loss_scale'],
multi_gpu=multi_gpu, multi_label=True)
learner.fit(4, lr=args['learning_rate'], schedule_type="warmup_linear")
Does this return the attention weights that it is possible to obtain from the BERT model through pytorch-transformers?
This might not be an issue related to fast-bert, but I give it a shot here either way. I now have a dataset of 500+ labels. At first, fast-bert predicts various values between 0-1 for every label which seems fine, but the more I train it the more it predicts only zeros for everything. Logically, it seems wise as only 1/500 is a positive label while the rest are zeros. Is there a way to fix this? Can I change the loss function somehow? Possibly introduce class weights to really penalize false-negatives?
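One option, sketched below under the assumption that you can swap in the loss used for multi-label training: weight positive targets more heavily with BCEWithLogitsLoss's pos_weight, so that predicting all zeros is penalized. The counts here are placeholders; compute them from your training set.

import torch

num_labels = 500
# Hypothetical per-label counts; replace with real statistics from your data.
pos_counts = torch.full((num_labels,), 100.0)
neg_counts = torch.full((num_labels,), 49900.0)

# pos_weight > 1 makes false negatives on rare labels far more costly.
pos_weight = neg_counts / pos_counts.clamp(min=1.0)
loss_fct = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# usage: loss = loss_fct(logits, labels.float())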
Hi
I have an issue when running
learner.fit(epochs=6,
lr=6e-5,
validate=True, # Evaluate the model after each epoch
schedule_type="warmup_linear")
using the following learner object:
logger = logging.getLogger()
device_cuda = torch.device("cuda")
metrics = [{'name': 'accuracy', 'function': accuracy}]
learner = BertLearner.from_pretrained_model(
databunch,
pretrained_path='bert-base-uncased',
metrics=metrics,
device=device_cuda,
logger=logger,
#output_dir=OUTPUT_DIR,
finetuned_wgts_path=None,
#warmup_steps=500,
multi_gpu=True,
is_fp16=True,
multi_label=False,
max_grad_norm=1.0)
TypeError Traceback (most recent call last)
in
2 lr=6e-5,
3 validate=True, # Evaluate the model after each epoch
----> 4 schedule_type="warmup_linear")
~/.conda/envs/transformers/lib/python3.7/site-packages/fast_bert/learner.py in fit(self, epochs, lr, validate, schedule_type)
462
463 if self.use_amp_optimizer == False:
--> 464 self.fit_old(epochs, lr, validate=validate, schedule_type=schedule_type)
465 return
466
~/.conda/envs/transformers/lib/python3.7/site-packages/fast_bert/learner.py in fit_old(self, epochs, lr, validate, schedule_type)
573 num_train_steps = int(len(self.data.train_dl) / self.grad_accumulation_steps * epochs)
574 if self.optimizer is None:
--> 575 self.optimizer, self.schedule = self.get_optimizer_old(lr , num_train_steps)
576
577 t_total = num_train_steps
~/.conda/envs/transformers/lib/python3.7/site-packages/fast_bert/learner.py in get_optimizer_old(self, lr, num_train_steps, schedule_type)
233 lr=lr,
234 bias_correction=False,
--> 235 max_grad_norm=1.0)
236
237 if self.loss_scale == 0:
TypeError: init() got an unexpected keyword argument 'max_grad_norm'
Does anyone know how to fix this?
Thanks!
Earlier I was able to see accuracy and F-beta scores while training the model, but now I can't see anything. The model just completes its epochs without printing anything.
Any suggestions?
Hi,
I'm trying to test out fast-bert, and when I set up a train.csv file as follows:
index text label
0 test neg
2 test2 pos
It's a tab-separated train file, and I get the following error:
Traceback (most recent call last):
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4729, in get_value
return libindex.get_value_box(s, key)
File "pandas/_libs/index.pyx", line 51, in pandas._libs.index.get_value_box
File "pandas/_libs/index.pyx", line 47, in pandas._libs.index.get_value_at
File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "bert.py", line 17, in
model_type='bert')
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 332, in init
train_file, text_col=text_col, label_col=label_col)
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 222, in get_train_examples
return self._create_examples(data_df, "train", text_col=text_col, label_col=label_col)
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 257, in _create_examples
return list(df.apply(lambda row: InputExample(guid=row.index, text_a=row[text_col], label=str(row[label_col])), axis=1))
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 6906, in apply
return op.get_result()
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
return self.apply_standard()
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 292, in apply_standard
self.apply_series_generator()
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 321, in apply_series_generator
results[i] = self.f(v)
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 257, in
return list(df.apply(lambda row: InputExample(guid=row.index, text_a=row[text_col], label=str(row[label_col])), axis=1))
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/series.py", line 1064, in getitem
result = self.index.get_value(self, key)
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4737, in get_value
raise e1
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4723, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('text', 'occurred at index 0')
Code:
from fast_bert.data_cls import BertDataBunch
from pathlib import Path
DATA_PATH = Path('./')
LABEL_PATH = Path('./')
databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
tokenizer='bert-base-uncased',
train_file='train.csv',
val_file='val.csv',
label_file='labels.csv',
text_col='text',
label_col='label',
batch_size_per_gpu=16,
max_seq_length=512,
multi_gpu=True,
multi_label=False,
model_type='bert')
Am I doing something wrong?
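Possibly relevant: fast-bert reads the files with pandas' default comma separator, so a tab-separated file likely collapses into a single column and the 'text' lookup fails with the KeyError above. A minimal sketch of a workaround is to re-save the file as comma-separated first:

import pandas as pd

# Convert the tab-separated file to a standard CSV so the 'text' and
# 'label' columns are parsed as separate columns.
df = pd.read_csv('train.csv', sep='\t')
df.to_csv('train.csv', index=False)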
How can I use DistilBERT for multi-label classification to build a fast and deployable model?
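A minimal sketch, assuming your fast-bert version accepts the 'distilbert' model type (later releases do); the paths and file names follow the examples above:

from fast_bert.data_cls import BertDataBunch

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='distilbert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col=label_cols,   # list of label columns
                          batch_size_per_gpu=16,
                          max_seq_length=256,
                          multi_gpu=False,
                          multi_label=True,
                          model_type='distilbert')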
Hi @kaushaltrivedi. Thanks so much for creating this library, it's great.
I was using it a few days ago and it worked well. But now I'm getting an import error from fast_bert/learner.py. I think it's due to an incomplete class Learner(object):. Complete message below:
File "/usr/local/lib/python3.6/dist-packages/fast_bert/learner.py", line 61
    class BertLearner(object):
    ^
IndentationError: expected an indented block
When I tried to run the model for a multi-class problem, training and then running evaluation throws:
RuntimeError Traceback (most recent call last)
1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
52 if len(types) <= 1:
---> 53 return orig_fn(*args, **kwargs)
54 elif len(types) == 2 and types == set(['HalfTensor', 'FloatTensor']):
55 new_args = utils.casted_args(cast_fn,
RuntimeError: The size of tensor a (4) must match the size of tensor b (74) at non-singleton dimension 1
The metric I have used is fbeta.
Thank you for your contribution.
As the paper says, the LAMB optimizer can also be used for ImageNet classification. I am trying to incorporate the LAMB implementation here into my own code. Could the optimizer you contributed also be applied to this kind of classification?
Many thanks.
I'm trying to just get the included toxicity notebook to work from a fresh clone and am having some issues:
Out of the box, the data & labels directory are pointing to the wrong place and the DataBunch is using filenames that are not part of the repo. These are fixed easily enough.
It would help if there was a pointer to where to get the PyTorch pretrained model uncased_L-12_H-768_A-12. There is a Google download which will not work with the from_pretrained_model cell:
FileNotFoundError: [Errno 2] No such file or directory: '../../bert/bert-models/uncased_L-12_H-768_A-12/pytorch_model.bin'
I have been able to get past this step by using 'bert-base-uncased' instead of BERT_PRETRAINED_PATH as the model spec in the tokenizer and from_pretrained_model steps.
RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 7.43 GiB total capacity; 6.91 GiB already allocated; 10.94 MiB free; 24.36 MiB cached)
This is a standard 8G GPU compute engine instance on GCP. Advice on how to not run out of memory would help the tutorial a lot.
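The usual levers here, for what it's worth: lower batch_size_per_gpu and max_seq_length when building the databunch, keep fp16 on, and recover the effective batch size through gradient accumulation (grad_accumulation_steps appears in the from_pretrained_model signature quoted later in this thread). A sketch with assumed values for an 8 GB card:

# Hypothetical settings: databunch built with batch_size_per_gpu=4 and
# max_seq_length=256 instead of 16 and 512.
learner = BertLearner.from_pretrained_model(
    databunch, 'bert-base-uncased', metrics, device, logger,
    is_fp16=True,               # half precision roughly halves activation memory
    grad_accumulation_steps=4,  # 4 steps x batch 4 = effective batch 16
    multi_gpu=False, multi_label=True)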
When I train a fast-bert model and save it using save_and_reload(), the model output is not consistent with the model's output before saving.
Code to reproduce:
from fast_bert import BertClassificationPredictor
databunch = BertDataBunch(args['data_dir'], LABEL_PATH, tokenizer, train_file='train.csv', val_file='val.csv',
test_data=test_df['content'].tolist(),
text_col="content", label_col=label_cols,
bs=args['train_batch_size'], maxlen=args['max_seq_length'],
multi_gpu=True, multi_label=True)
databunch.save()
metrics = []
metrics.append({'name': 'accuracy_thresh', 'function': accuracy_thresh})
metrics.append({'name': 'roc_auc', 'function': roc_auc})
metrics.append({'name': 'fbeta', 'function': fbeta})
metrics.append({'name': 'accuracy_single', 'function': accuracy_multilabel})
learner = BertLearner.from_pretrained_model(databunch, BERT_PRETRAINED_PATH, metrics, device, logger,
finetuned_wgts_path=FINETUNED_PATH,
is_fp16=args['fp16'], loss_scale=args['loss_scale'],
multi_gpu=True, multi_label=True,)
learner.fit(4, lr=args['learning_rate'], schedule_type="warmup_cosine_hard_restarts",validate=True)
#save prediction on test set
prediction_before_saving = learner.predict_batch(test_df['content'].tolist())
model_path = os.getcwd()+'/fastBertModels'
model_name = 'fastBert_split_'+str(idx)+'_test'
learner.save_and_reload(model_path,model_name)
predictor = BertClassificationPredictor(model_path=model_path+'/'+model_name+'.bin', pretrained_path = BERT_PRETRAINED_PATH, label_path = LABEL_PATH, multi_label=True)
#save prediction on test set (again)
prediction_after_loading = predictor.predict_batch(test_df['content'].tolist())
#remove column names from predictions
prediction_before_saving = [[x[0][1],x[1][1]] for x in prediction_before_saving]
prediction_after_loading = [[x[0][1],x[1][1]] for x in prediction_after_loading]
for x,y in zip(prediction_before_saving,prediction_after_loading):
print(x==y,x,y)
I also get a bunch of warnings regarding the BERT model weights when I run save_and_reload(), as well as when I load the model into a BertClassificationPredictor. I suspect this to be the culprit (example below).
05/28/2019 22:13:30 - INFO - pytorch_pretrained_bert.modeling - loading archive file uncased_L-12_H-768_A-12 from cache at uncased_L-12_H-768_A-12
05/28/2019 22:13:30 - INFO - pytorch_pretrained_bert.modeling - Model config {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 2,
"vocab_size": 30522
}
05/28/2019 22:13:36 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForMultiLabelSequenceClassification not initialized from pretrained model: ['bert.embeddings.word_embeddings.weight', 'bert.embeddings.position_embeddings.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.embeddings.LayerNorm.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.layer.1.attention.output.dense.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 
'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.value.bias', 
'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.weight', 'classifier.bias']
05/28/2019 22:13:36 - INFO - pytorch_pretrained_bert.modeling - Weights from pretrained model not used in BertForMultiLabelSequenceClassification: ['module.bert.embeddings.word_embeddings.weight', 'module.bert.embeddings.position_embeddings.weight', 'module.bert.embeddings.token_type_embeddings.weight', 'module.bert.embeddings.LayerNorm.weight', 'module.bert.embeddings.LayerNorm.bias', 'module.bert.encoder.layer.0.attention.self.query.weight', 'module.bert.encoder.layer.0.attention.self.query.bias', 'module.bert.encoder.layer.0.attention.self.key.weight', 'module.bert.encoder.layer.0.attention.self.key.bias', 'module.bert.encoder.layer.0.attention.self.value.weight', 'module.bert.encoder.layer.0.attention.self.value.bias', 'module.bert.encoder.layer.0.attention.output.dense.weight', 'module.bert.encoder.layer.0.attention.output.dense.bias', 'module.bert.encoder.layer.0.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.0.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.0.intermediate.dense.weight', 'module.bert.encoder.layer.0.intermediate.dense.bias', 'module.bert.encoder.layer.0.output.dense.weight', 'module.bert.encoder.layer.0.output.dense.bias', 'module.bert.encoder.layer.0.output.LayerNorm.weight', 'module.bert.encoder.layer.0.output.LayerNorm.bias', 'module.bert.encoder.layer.1.attention.self.query.weight', 'module.bert.encoder.layer.1.attention.self.query.bias', 'module.bert.encoder.layer.1.attention.self.key.weight', 'module.bert.encoder.layer.1.attention.self.key.bias', 'module.bert.encoder.layer.1.attention.self.value.weight', 'module.bert.encoder.layer.1.attention.self.value.bias', 'module.bert.encoder.layer.1.attention.output.dense.weight', 'module.bert.encoder.layer.1.attention.output.dense.bias', 'module.bert.encoder.layer.1.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.1.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.1.intermediate.dense.weight', 'module.bert.encoder.layer.1.intermediate.dense.bias', 'module.bert.encoder.layer.1.output.dense.weight', 'module.bert.encoder.layer.1.output.dense.bias', 'module.bert.encoder.layer.1.output.LayerNorm.weight', 'module.bert.encoder.layer.1.output.LayerNorm.bias', 'module.bert.encoder.layer.2.attention.self.query.weight', 'module.bert.encoder.layer.2.attention.self.query.bias', 'module.bert.encoder.layer.2.attention.self.key.weight', 'module.bert.encoder.layer.2.attention.self.key.bias', 'module.bert.encoder.layer.2.attention.self.value.weight', 'module.bert.encoder.layer.2.attention.self.value.bias', 'module.bert.encoder.layer.2.attention.output.dense.weight', 'module.bert.encoder.layer.2.attention.output.dense.bias', 'module.bert.encoder.layer.2.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.2.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.2.intermediate.dense.weight', 'module.bert.encoder.layer.2.intermediate.dense.bias', 'module.bert.encoder.layer.2.output.dense.weight', 'module.bert.encoder.layer.2.output.dense.bias', 'module.bert.encoder.layer.2.output.LayerNorm.weight', 'module.bert.encoder.layer.2.output.LayerNorm.bias', 'module.bert.encoder.layer.3.attention.self.query.weight', 'module.bert.encoder.layer.3.attention.self.query.bias', 'module.bert.encoder.layer.3.attention.self.key.weight', 'module.bert.encoder.layer.3.attention.self.key.bias', 'module.bert.encoder.layer.3.attention.self.value.weight', 'module.bert.encoder.layer.3.attention.self.value.bias', 'module.bert.encoder.layer.3.attention.output.dense.weight', 
'module.bert.encoder.layer.3.attention.output.dense.bias', 'module.bert.encoder.layer.3.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.3.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.3.intermediate.dense.weight', 'module.bert.encoder.layer.3.intermediate.dense.bias', 'module.bert.encoder.layer.3.output.dense.weight', 'module.bert.encoder.layer.3.output.dense.bias', 'module.bert.encoder.layer.3.output.LayerNorm.weight', 'module.bert.encoder.layer.3.output.LayerNorm.bias', 'module.bert.encoder.layer.4.attention.self.query.weight', 'module.bert.encoder.layer.4.attention.self.query.bias', 'module.bert.encoder.layer.4.attention.self.key.weight', 'module.bert.encoder.layer.4.attention.self.key.bias', 'module.bert.encoder.layer.4.attention.self.value.weight', 'module.bert.encoder.layer.4.attention.self.value.bias', 'module.bert.encoder.layer.4.attention.output.dense.weight', 'module.bert.encoder.layer.4.attention.output.dense.bias', 'module.bert.encoder.layer.4.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.4.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.4.intermediate.dense.weight', 'module.bert.encoder.layer.4.intermediate.dense.bias', 'module.bert.encoder.layer.4.output.dense.weight', 'module.bert.encoder.layer.4.output.dense.bias', 'module.bert.encoder.layer.4.output.LayerNorm.weight', 'module.bert.encoder.layer.4.output.LayerNorm.bias', 'module.bert.encoder.layer.5.attention.self.query.weight', 'module.bert.encoder.layer.5.attention.self.query.bias', 'module.bert.encoder.layer.5.attention.self.key.weight', 'module.bert.encoder.layer.5.attention.self.key.bias', 'module.bert.encoder.layer.5.attention.self.value.weight', 'module.bert.encoder.layer.5.attention.self.value.bias', 'module.bert.encoder.layer.5.attention.output.dense.weight', 'module.bert.encoder.layer.5.attention.output.dense.bias', 'module.bert.encoder.layer.5.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.5.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.5.intermediate.dense.weight', 'module.bert.encoder.layer.5.intermediate.dense.bias', 'module.bert.encoder.layer.5.output.dense.weight', 'module.bert.encoder.layer.5.output.dense.bias', 'module.bert.encoder.layer.5.output.LayerNorm.weight', 'module.bert.encoder.layer.5.output.LayerNorm.bias', 'module.bert.encoder.layer.6.attention.self.query.weight', 'module.bert.encoder.layer.6.attention.self.query.bias', 'module.bert.encoder.layer.6.attention.self.key.weight', 'module.bert.encoder.layer.6.attention.self.key.bias', 'module.bert.encoder.layer.6.attention.self.value.weight', 'module.bert.encoder.layer.6.attention.self.value.bias', 'module.bert.encoder.layer.6.attention.output.dense.weight', 'module.bert.encoder.layer.6.attention.output.dense.bias', 'module.bert.encoder.layer.6.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.6.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.6.intermediate.dense.weight', 'module.bert.encoder.layer.6.intermediate.dense.bias', 'module.bert.encoder.layer.6.output.dense.weight', 'module.bert.encoder.layer.6.output.dense.bias', 'module.bert.encoder.layer.6.output.LayerNorm.weight', 'module.bert.encoder.layer.6.output.LayerNorm.bias', 'module.bert.encoder.layer.7.attention.self.query.weight', 'module.bert.encoder.layer.7.attention.self.query.bias', 'module.bert.encoder.layer.7.attention.self.key.weight', 'module.bert.encoder.layer.7.attention.self.key.bias', 'module.bert.encoder.layer.7.attention.self.value.weight', 
'module.bert.encoder.layer.7.attention.self.value.bias', 'module.bert.encoder.layer.7.attention.output.dense.weight', 'module.bert.encoder.layer.7.attention.output.dense.bias', 'module.bert.encoder.layer.7.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.7.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.7.intermediate.dense.weight', 'module.bert.encoder.layer.7.intermediate.dense.bias', 'module.bert.encoder.layer.7.output.dense.weight', 'module.bert.encoder.layer.7.output.dense.bias', 'module.bert.encoder.layer.7.output.LayerNorm.weight', 'module.bert.encoder.layer.7.output.LayerNorm.bias', 'module.bert.encoder.layer.8.attention.self.query.weight', 'module.bert.encoder.layer.8.attention.self.query.bias', 'module.bert.encoder.layer.8.attention.self.key.weight', 'module.bert.encoder.layer.8.attention.self.key.bias', 'module.bert.encoder.layer.8.attention.self.value.weight', 'module.bert.encoder.layer.8.attention.self.value.bias', 'module.bert.encoder.layer.8.attention.output.dense.weight', 'module.bert.encoder.layer.8.attention.output.dense.bias', 'module.bert.encoder.layer.8.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.8.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.8.intermediate.dense.weight', 'module.bert.encoder.layer.8.intermediate.dense.bias', 'module.bert.encoder.layer.8.output.dense.weight', 'module.bert.encoder.layer.8.output.dense.bias', 'module.bert.encoder.layer.8.output.LayerNorm.weight', 'module.bert.encoder.layer.8.output.LayerNorm.bias', 'module.bert.encoder.layer.9.attention.self.query.weight', 'module.bert.encoder.layer.9.attention.self.query.bias', 'module.bert.encoder.layer.9.attention.self.key.weight', 'module.bert.encoder.layer.9.attention.self.key.bias', 'module.bert.encoder.layer.9.attention.self.value.weight', 'module.bert.encoder.layer.9.attention.self.value.bias', 'module.bert.encoder.layer.9.attention.output.dense.weight', 'module.bert.encoder.layer.9.attention.output.dense.bias', 'module.bert.encoder.layer.9.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.9.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.9.intermediate.dense.weight', 'module.bert.encoder.layer.9.intermediate.dense.bias', 'module.bert.encoder.layer.9.output.dense.weight', 'module.bert.encoder.layer.9.output.dense.bias', 'module.bert.encoder.layer.9.output.LayerNorm.weight', 'module.bert.encoder.layer.9.output.LayerNorm.bias', 'module.bert.encoder.layer.10.attention.self.query.weight', 'module.bert.encoder.layer.10.attention.self.query.bias', 'module.bert.encoder.layer.10.attention.self.key.weight', 'module.bert.encoder.layer.10.attention.self.key.bias', 'module.bert.encoder.layer.10.attention.self.value.weight', 'module.bert.encoder.layer.10.attention.self.value.bias', 'module.bert.encoder.layer.10.attention.output.dense.weight', 'module.bert.encoder.layer.10.attention.output.dense.bias', 'module.bert.encoder.layer.10.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.10.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.10.intermediate.dense.weight', 'module.bert.encoder.layer.10.intermediate.dense.bias', 'module.bert.encoder.layer.10.output.dense.weight', 'module.bert.encoder.layer.10.output.dense.bias', 'module.bert.encoder.layer.10.output.LayerNorm.weight', 'module.bert.encoder.layer.10.output.LayerNorm.bias', 'module.bert.encoder.layer.11.attention.self.query.weight', 'module.bert.encoder.layer.11.attention.self.query.bias', 'module.bert.encoder.layer.11.attention.self.key.weight', 
'module.bert.encoder.layer.11.attention.self.key.bias', 'module.bert.encoder.layer.11.attention.self.value.weight', 'module.bert.encoder.layer.11.attention.self.value.bias', 'module.bert.encoder.layer.11.attention.output.dense.weight', 'module.bert.encoder.layer.11.attention.output.dense.bias', 'module.bert.encoder.layer.11.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.11.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.11.intermediate.dense.weight', 'module.bert.encoder.layer.11.intermediate.dense.bias', 'module.bert.encoder.layer.11.output.dense.weight', 'module.bert.encoder.layer.11.output.dense.bias', 'module.bert.encoder.layer.11.output.LayerNorm.weight', 'module.bert.encoder.layer.11.output.LayerNorm.bias', 'module.bert.pooler.dense.weight', 'module.bert.pooler.dense.bias', 'module.classifier.weight', 'module.classifier.bias']
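The 'module.' prefix on every key in the last log is what torch.nn.DataParallel adds when multi_gpu=True, so when the checkpoint is loaded into an unwrapped model no key matches and all weights are re-initialized, which would explain the inconsistent predictions. A sketch of a repair under that assumption:

import torch
from collections import OrderedDict

# Strip the DataParallel 'module.' prefix so the keys match a bare model.
state_dict = torch.load('pytorch_model.bin', map_location='cpu')
fixed = OrderedDict((k[len('module.'):] if k.startswith('module.') else k, v)
                    for k, v in state_dict.items())
torch.save(fixed, 'pytorch_model.bin')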
It would be nice to have an option to save the model with the best validation score for a given metric.
It would also be nice to have a hook for running arbitrary code at the end of each epoch.
When I tried to use the roc_auc metric, I got an error:
ValueError: Found input variables with inconsistent numbers of samples: [64, 128]
"train_batch_size": 64, "eval_batch_size": 64,
multi_label=False
Hello,
It's possible to create a model that uses pre-trained BERT (or any other model), and feeds data from multiple datasets to predict multiple outputs?
Example, which I have 4 text datasets:
Dataset A contains [ ValueA, ValueB, ValueC ]
Dataset B contains [ ValueA, ValueB, ValueC, ValueD, ValueE, ValueF ]
Dataset C contains [ ValueA, ValueB ]
Dataset D contains [ ValueD, ValueE, ValueF ]
Since all of them are in English, I hope to use BERT to enhance the similarity between datasets.
One approach I thought of: merge the label spaces into a single target y, and add 0 to the label fields a dataset doesn't have. In this case, my prediction would be [ ValueA, ValueB, ValueC, ValueD, ValueE, ValueF ].
Hi, I found the following definition of accuracy_multilabel in fast_bert/metrics.py:
def accuracy_multilabel(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True):
    if sigmoid: y_pred = y_pred.sigmoid()
    outputs = np.argmax(y_pred, axis=1)
    real_vals = np.argmax(y_true, axis=1)
    return np.mean(outputs.numpy() == real_vals.numpy())
This piece of code seems incorrect, as the shape of y_pred and y_true is (batch_size, class_space). Doing an np.argmax with axis=1 returns a single class index for each sample, which is what we do for multi-class classification. However, in multi-class classification we don't normally use sigmoid on y_pred, although it is not wrong. This function looks much more like accuracy_multiclass than accuracy_multilabel.
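For comparison, the usual multi-label formulation is a thresholded element-wise accuracy; a minimal sketch (the 0.5 threshold is an assumption):

import torch

def accuracy_multilabel_thresh(y_pred: torch.Tensor, y_true: torch.Tensor,
                               thresh: float = 0.5, sigmoid: bool = True):
    # Score each label independently instead of taking argmax over classes.
    if sigmoid:
        y_pred = y_pred.sigmoid()
    return ((y_pred > thresh).float() == y_true.float()).float().mean().item()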
I wonder if there is something like fastai's callbacks for saving models and early stopping?
NameError Traceback (most recent call last)
in ()
8 from pytorch_pretrained_bert.tokenization import BertTokenizer
9
---> 10 from fast_bert.data import BertDataBunch
11 from fast_bert.learner import BertLearner
12 from fast_bert.metrics import accuracy, accuracy_thresh, fbeta, roc_auc
/opt/conda/lib/python3.6/site-packages/fast_bert/__init__.py in <module>()
1 from .modeling import BertForMultiLabelSequenceClassification
2 from .data import BertDataBunch, InputExample, InputFeatures, MultiLabelTextProcessor, convert_examples_to_features
----> 3 from .metrics import accuracy, accuracy_thresh, fbeta, roc_auc, accuracy_multilabel
4 from .learner import BertLearner
5 from .prediction import BertClassificationPredictor
/opt/conda/lib/python3.6/site-packages/fast_bert/metrics.py in <module>()
54 return roc_auc["micro"]
55
---> 56 def Hamming_loss(y_pred:Tensor, y_true:Tensor, sigmoid:bool = True, thresh:float = threshold, sample_weight = None):
57 if sigmoid: y_pred = y_pred.sigmoid()
58 y_pred = (y_pred > thresh).float()
NameError: name 'threshold' is not defined
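A minimal local patch, assuming the intent was a module-level default threshold (the 0.5 value is an assumption, not the library's):

from torch import Tensor

threshold = 0.5  # the missing module-level constant

def Hamming_loss(y_pred: Tensor, y_true: Tensor, sigmoid: bool = True,
                 thresh: float = threshold, sample_weight=None):
    if sigmoid:
        y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    # fraction of label slots that disagree (sample_weight unused in this sketch)
    return (y_pred != y_true.float()).float().mean().item()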
I was checking the memory consumption of RoBERTa and DistilBERT. I found there is no significant difference in memory usage, although inference time is around 1 s for DistilBERT and 2 s for RoBERTa.
Memory usage on CPU:
Port 9000: DistilBERT
Port 9002: RoBERTa
Have you seen any significant difference in memory usage, @kaushaltrivedi?
Hi,
Target size (torch.Size([0, 6])) must be the same as input size (torch.Size([32, 6]))
Below is the code.
databunch = BertDataBunch('fast-bert/sample_data/multi_label_toxic_comments/data', 'fast-bert/sample_data/multi_label_toxic_comments/label', tokenizer,
train_file='train_sample.csv', val_file='val_sample.csv',label_file='labels.csv',label_col=None,
bs=args['train_batch_size'], maxlen=args['max_seq_length'],
multi_gpu=multi_gpu, multi_label=True)
metrics = []
metrics.append({'name': 'accuracy', 'function': accuracy})
learner = BertLearner.from_pretrained_model(databunch, 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz', metrics, device, logger=None,
finetuned_wgts_path=None,
is_fp16=args['fp16'], loss_scale=args['loss_scale'],
multi_gpu=multi_gpu, multi_label=True)
learner.fit(1, lr=args['learning_rate'],
schedule_type="warmup_cosine_hard_restarts")
random_word() is called twice but it is not defined or imported.
https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L63
I am trying to detect lies in text: the person is either telling the truth or lying.
So this is not a multi_label problem, and my BertDataBunch therefore looks like:
databunch = BertDataBunch(args['data_dir'], LABEL_PATH, tokenizer, train_file='train.csv', val_file='val.csv',
test_data='test.csv',
text_col="content", label_col=label_cols,
bs=args['train_batch_size'], maxlen=args['max_seq_length'],
multi_gpu=multi_gpu, multi_label=False)
However, I am then getting a KeyError:
'lie 0\nName: 0, dtype: object'
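That KeyError body looks like an entire pandas row being used as a key, which is what happens when label_col is still the multi-label list; for a single-label problem, label_col should name the one label column. A sketch, assuming the column is called 'label':

databunch = BertDataBunch(args['data_dir'], LABEL_PATH, tokenizer,
                          train_file='train.csv', val_file='val.csv',
                          test_data='test.csv',
                          text_col="content",
                          label_col="label",   # single column name, not label_cols
                          bs=args['train_batch_size'], maxlen=args['max_seq_length'],
                          multi_gpu=multi_gpu, multi_label=False)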
I'm unable to load a trained model for inference on my Mac which doesn't have an Nvidia GPU.
I think it is because of this line. It should have a check around it to make sure CUDA is available before being called.
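A guard along these lines would avoid the call on CPU-only machines (a sketch of the idea, not the library's actual fix):

import torch

# Only query the current CUDA device when CUDA is actually available.
if torch.cuda.is_available():
    device_id = torch.cuda.current_device()
else:
    device_id = -1  # hypothetical sentinel for CPU-only runs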
Hi,
I'm getting
TypeError: unsupported operand type(s) for /: 'str' and 'str'
error when calling the BertDataBunch function. I'm actually surprised it works for others, because on line 294 of data_cls.py there is a divide symbol between two strings:
292 self.tokenizer = tokenizer
293 self.data_dir = data_dir
--> 294 self.cache_dir = data_dir/'cache'
295 self.max_seq_length = max_seq_length
296 self.batch_size_per_gpu = batch_size_per_gpu
Thanks!
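As noted earlier in this thread, that / is pathlib's overloaded division operator: it works when data_dir is a pathlib.Path, so passing Path('./data/') instead of the string './data/' avoids the TypeError.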
I can't load the model on CPU when it was trained on GPU. Can somebody tell me how?
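A common approach (plain PyTorch, not fast-bert-specific) is to load the checkpoint with map_location so CUDA tensors are remapped to the CPU. A minimal sketch:

import torch

# Remap tensors saved on GPU onto the CPU at load time.
state_dict = torch.load('pytorch_model.bin', map_location=torch.device('cpu'))
model.load_state_dict(state_dict)  # 'model' is your instantiated architecture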
Running the following code results in this error:
databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer,
train_file='train.csv', val_file='valid.csv', label_file='labels.csv',
bs=args['train_batch_size'], maxlen=args['max_seq_length'],
multi_gpu=multi_gpu, multi_label=False)
373 train_sampler = RandomSampler(train_data)
374 else:
--> 375 torch.distributed.init_process_group(backend="nccl",
376 init_method = "tcp://localhost:23459",
377 rank=0, world_size=1)
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'
Hi, I ran into this error. How do I solve it?
Traceback (most recent call last):
File "fastBertDemo.py", line 23, in
model_type='bert')
File "/usr/local/python3/lib/python3.6/site-packages/fast_bert/data_cls.py", line 332, in init
train_dataset = self.get_dataset_from_examples(train_examples, 'train')
File "/usr/local/python3/lib/python3.6/site-packages/fast_bert/data_cls.py", line 431, in get_dataset_from_examples
all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
File "/usr/local/python3/lib/python3.6/site-packages/fast_bert/data_cls.py", line 431, in
all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
AttributeError: 'str' object has no attribute 'input_ids'
What do we need to specify for the labels if we need logits as a result?
I used learner.save_and_reload to save my model, which produced a pretrained_bert.bin file. How can I use this .bin file to classify with learner.predict_batch()? I have been stuck for ages and don't know how.
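For what it's worth, an earlier post in this thread loads a saved .bin through BertClassificationPredictor; a sketch along those lines, with placeholder paths:

from fast_bert.prediction import BertClassificationPredictor

predictor = BertClassificationPredictor(
    model_path='models/pretrained_bert.bin',   # hypothetical location of the saved file
    pretrained_path=BERT_PRETRAINED_PATH,
    label_path=LABEL_PATH,
    multi_label=False)

predictions = predictor.predict_batch(['some text to classify'])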
Does fast-bert shuffle the train, val and eval datasets?
Getting "TypeError: init_weights() takes 1 positional argument but 2 were given" when running the below code for any of bert, xlnet model. Please note that this code was working couple of days back.
learner = BertLearner.from_pretrained_model(
databunch,
pretrained_path='bert-base-uncased',#xlnet-large-cased, bert-base-uncased
metrics=metrics,
device=device_cuda,
logger=logger,
output_dir=OUTPUT_DIR,
finetuned_wgts_path=None,
warmup_steps=500,
multi_gpu=True,
is_fp16=True,
multi_label=True,
logging_steps=50)
How can I train this model with fine-tuning on all layers?
args = Box({
    "run_text": "multilabel toxic comments with freezable layers",
    "train_size": -1,
    "val_size": -1,
    "log_path": LOG_PATH,
    "full_data_dir": DATA_PATH,
    "data_dir": DATA_PATH,
    "task_name": "toxic_classification_lib",
    "no_cuda": False,
    "bert_model": BERT_PRETRAINED_PATH,
    "output_dir": OUTPUT_PATH,
    "max_seq_length": 512,
    "do_train": True,
    "do_eval": True,
    "do_lower_case": True,
    "train_batch_size": 8,
    "eval_batch_size": 16,
    "learning_rate": 5e-5,
    "num_train_epochs": 4,
    "warmup_proportion": 0.0,
    "local_rank": -1,
    "seed": 42,
    "gradient_accumulation_steps": 1,
    "optimize_on_cpu": False,
    "fp16": True,
    "fp16_opt_level": "O1",
    "weight_decay": 0.0,
    "adam_epsilon": 1e-8,
    "max_grad_norm": 1.0,
    "max_steps": -1,
    "warmup_steps": 500,
    "logging_steps": 50,
    "eval_all_checkpoints": True,
    "overwrite_output_dir": True,
    "overwrite_cache": False,
    "loss_scale": 128,
    "task_name": 'intent',
    "model_name": 'bert-base-uncased',
    "model_type": 'bert'
})
databunch = BertDataBunch(args['data_dir'], LABEL_PATH, args.model_name,
                          train_file='train.csv', val_file='val.csv',
                          test_data='test.csv',
                          text_col="text", label_col=label_cols,
                          batch_size_per_gpu=args['train_batch_size'],
                          max_seq_length=args['max_seq_length'],
                          multi_gpu=args.multi_gpu, multi_label=True,
                          model_type=args.model_type)
learner = BertLearner.from_pretrained_model(databunch, args.model_name,
                                            metrics=metrics, device=device, logger=logger,
                                            output_dir=args.output_dir,
                                            finetuned_wgts_path=FINETUNED_PATH,
                                            warmup_steps=args.warmup_steps,
                                            multi_gpu=args.multi_gpu, is_fp16=args.fp16,
                                            multi_label=True, logging_steps=0)
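All layers of the loaded model are trainable unless something froze them; a generic PyTorch way to re-enable gradients everywhere (not a fast-bert-specific API) is:

# Make every parameter trainable before calling learner.fit(...).
for param in learner.model.parameters():
    param.requires_grad = True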
Hi @kaushaltrivedi ,
I used:
learner.fit(epochs=6,
lr=6e-5,
validate=True,  # Evaluate the model after each epoch
schedule_type="warmup_cosine")
However, that code only validates after the whole training run, not after each epoch.
What could I do?
Thanks
I was trying to create a databunch on Google Colab, using the Sentiment140 Twitter dataset. But no matter what batch size I use, the GPU always crashes. I tried all batch sizes from 2 to 256, and the runtime crashes every single time. Can anyone please help me solve this issue?
databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='xlnet-base-cased',
                          train_file='df_train2.csv',
                          val_file='df_valid2.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=2,
                          max_seq_length=128,
                          multi_gpu=False,
                          multi_label=False,
                          model_type='xlnet')
This is the code where it crashes.
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy
import logging
logger = logging.getLogger()
device_cuda = torch.device('cpu') #torch.device("cuda")
metrics = [{'name': 'accuracy', 'function': accuracy}]
learner = BertLearner.from_pretrained_model(
databunch,
pretrained_path='bert-base-uncased',
metrics=metrics,
device=device_cuda,
logger=logger,
output_dir=MODEL_PATH,
finetuned_wgts_path=None,
warmup_steps=500,
multi_gpu=multi_gpu,
is_fp16=True,
multi_label=False,
logging_steps=50)
AssertionError Traceback (most recent call last)
in
19 is_fp16=True,
20 multi_label=False,
---> 21 logging_steps=50)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fast_bert/learner_cls.py in from_pretrained_model(dataBunch, pretrained_path, output_dir, metrics, device, logger, finetuned_wgts_path, multi_gpu, is_fp16, loss_scale, warmup_steps, fp16_opt_level, grad_accumulation_steps, multi_label, max_grad_norm, adam_epsilon, logging_steps)
67 model = model_class[0].from_pretrained(pretrained_path, config=config)
68
---> 69 device_id = torch.cuda.current_device()
70 model.to(device)
71
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py in current_device()
349 def current_device():
350 r"""Returns the index of a currently selected device."""
--> 351 _lazy_init()
352 return torch._C._cuda_getDevice()
353
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
160 raise RuntimeError(
161 "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 162 _check_driver()
163 torch._C._cuda_init()
164 _cudart = _load_cudart()
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py in _check_driver()
73 def _check_driver():
74 if not hasattr(torch._C, '_cuda_isDriverSufficient'):
---> 75 raise AssertionError("Torch not compiled with CUDA enabled")
76 if not torch._C._cuda_isDriverSufficient():
77 if torch._C._cuda_getDriverVersion() == 0:
AssertionError: Torch not compiled with CUDA enabled
I can't run a model on OS X and I was wondering if I could train without using CUDA?
/usr/local/lib/python3.6/dist-packages/fast_bert/learner_cls.py in fit(self, epochs, lr, validate, schedule_type, optimizer_type)
211 def fit(self, epochs, lr, validate=True, schedule_type="warmup_cosine", optimizer_type='lamb'):
212
--> 213 tensorboard_dir = self.output_dir/'tensorboard'
214 tensorboard_dir.mkdir(exist_ok=True)
215 print(tensorboard_dir)
TypeError: unsupported operand type(s) for /: 'str' and 'str'
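Same root cause as the BertDataBunch error earlier in this thread: output_dir is combined with pathlib's / operator, so pass OUTPUT_DIR as a pathlib.Path rather than a plain string when constructing the learner.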
I saw that we can train on a labeled dataset using your module. But I have a huge corpus of unlabeled text data in sentence-sequence form. I just want to train a language-model-style model on my data to learn domain-specific word and sentence representations as embeddings, so that I can use those embeddings for downstream unsupervised tasks. Do you have any idea how I can train the BERT pretrained model on my corpus? Thank you.
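Later fast-bert releases ship a language-model fine-tuning path; if your version includes it, the flow looks roughly like this (module and argument names are from memory and may differ by version, so treat this as a sketch):

from fast_bert.data_lm import BertLMDataBunch
from fast_bert.learner_lm import BertLMLearner

# Build an LM databunch straight from a list of raw texts.
databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir=DATA_PATH,
    text_list=texts,                 # your unlabeled sentences
    tokenizer='bert-base-uncased',
    batch_size_per_gpu=16,
    max_seq_length=256,
    multi_gpu=False,
    model_type='bert',
    logger=logger)

learner = BertLMLearner.from_pretrained_model(
    databunch_lm, 'bert-base-uncased', output_dir=MODEL_PATH,
    metrics=[], device=device, logger=logger,
    multi_gpu=False, logging_steps=50)

learner.fit(epochs=4, lr=1e-4, validate=True)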
I'm trying to train fast-bert on a custom multi-labeled dataset (10 labels). It works perfectly when I strip down my dataset to only 6 labels (the same number as the provided toxic comments dataset), but when I change the labels to be more or fewer than that, I get the following error:
Traceback (most recent call last):
File "multilabel.py", line 149, in <module> learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 271, in fit outputs = self.model(**inputs)
File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__ result = self.forward(*input, **kwargs)
File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/fast_bert/modeling.py", line 194, in forward loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1, self.num_labels))
RuntimeError: shape '[-1, 10]' is invalid for input of size 36
It seems like fast-bert is hard-coded to work with exactly 6 labels, especially considering I get these different errors when I change the batch size as follows, with 10 labels in my dataset:
batch_size = 2 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 12
(2 batch_size * 6 (labels?) = 12?)
batch_size = 4 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 24
(4 batch_size * 6 (labels?) = 24?)
batch_size = 6 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 36
(6 batch_size * 6 (labels?) = 36?)
batch_size = 8 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 48
(8 batch_size * 6 (labels?) = 48?)
Any ideas how I can get fast-bert to use more than 6 labels?
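Nothing in the reported shapes points at a hard-coded 6 so much as at stale cached features: every size is batch_size x 6, i.e. the labels tensor still has the old 6-label width. The BertDataBunch signature quoted near the top of this thread includes clear_cache and no_cache flags; a sketch of forcing a rebuild under that assumption:

databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer,
                          train_file='train.csv', val_file='val.csv',
                          label_file='labels.csv',   # must list all 10 labels
                          text_col='text', label_col=label_cols,
                          batch_size_per_gpu=6, max_seq_length=512,
                          multi_gpu=False, multi_label=True,
                          model_type='bert',
                          clear_cache=True,   # drop the cached 6-label features
                          no_cache=True)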
How can I use the confusion matrix for each class, and the other metrics in #17?
learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
RuntimeError Traceback (most recent call last)
in
----> 1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
~/.conda/envs/fastbert/lib/python3.6/site-packages/fast_bert/learner_cls.py in fit(self, epochs, lr, validate, schedule_type, optimizer_type)
311 # Evaluate the model after every epoch
312 if validate:
--> 313 results = self.validate()
314 for key, value in results.items():
315 self.logger.info("eval_{} after epoch {}: {}: ".format(key, (epoch + 1), value))
~/.conda/envs/fastbert/lib/python3.6/site-packages/fast_bert/learner_cls.py in validate(self)
382 # Evaluation metrics
383 for metric in self.metrics:
--> 384 validation_scores[metric['name']] = metric['function'](all_logits, all_labels)
385
386 results = {'loss': eval_loss }
~/.conda/envs/fastbert/lib/python3.6/site-packages/fast_bert/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
29 if sigmoid:
30 y_pred = y_pred.sigmoid()
---> 31 return ((y_pred > thresh) == y_true.byte()).float().mean().item()
32 # return np.mean(((y_pred>thresh)==y_true.byte()).float().cpu().numpy(), axis=1).sum()
33
~/.conda/envs/fastbert/lib/python3.6/site-packages/apex/amp/wrap.py in wrapper(*args, **kwargs)
51
52 if len(types) <= 1:
---> 53 return orig_fn(*args, **kwargs)
54 elif len(types) == 2 and types == set(['HalfTensor', 'FloatTensor']):
55 new_args = utils.casted_args(cast_fn,
RuntimeError: Expected object of scalar type Bool but got scalar type Byte for argument #2 'other'
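For reference, the newer accuracy_thresh quoted earlier in this thread compares against y_true.bool() instead of y_true.byte(); a one-line local patch to metrics.py along those lines resolves the Bool/Byte mismatch on recent PyTorch:

# fast_bert/metrics.py, accuracy_thresh: compare bools with bools.
return ((y_pred > thresh) == y_true.bool()).float().mean().item()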
Hi,
Is it possible to use fast-bert to make submissions on Kaggle? I tried, but it threw the above error while making the databunch.
Hi, I'm trying to follow the notebook example provided in this repo with some of my own data. However, when I go to fit the model, I get the following:
ModuleNotFoundError Traceback (most recent call last)
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fast_bert/learner.py in get_optimizer(self, lr, num_train_steps, schedule_type)
197 try:
--> 198 from apex.optimizers import FP16_Optimizer
199 from apex.optimizers import FusedAdam
ModuleNotFoundError: No module named 'apex.optimizers'
I have installed Apex correctly using NVIDIA's documentation, and the Apex directory appears the same as in their repo, which leads me to think it's a fast-bert issue. I am using an AWS instance (ml.p3.8xlarge), and my environment is conda_pytorch_p36.
Thanks in advance for any help,
Darren