Comments (5)
The paper *Neural Architectures for Named Entity Recognition* says:

> y0 and yn are the start and end tags of a sentence, that we add to the set of possible tags. A is therefore a square matrix of size k+2.
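For concreteness, here is a minimal numpy sketch of that convention (my own illustration, not code from this repo; the tag ids are assumptions):

```python
import numpy as np

k = 5                         # e.g. B-PER, I-PER, B-LOC, I-LOC, O
START, END = k, k + 1         # the two extra tag ids from the paper
A = np.zeros((k + 2, k + 2))  # transition scores: square matrix of size k + 2
# the score of a tag path y_1 .. y_n then also includes the transitions
# A[START, y_1] into the sentence and A[y_n, END] out of it
```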
I think the author here only implements the start_logits, but not the corresponding end_logits. I revised the code to add end_logits in the loss_layer function:
```python
def loss_layer(self, project_logits, lengths, name=None):
    """
    calculate crf loss
    :param project_logits: [batch_size, num_steps, num_tags]
    :return: scalar loss
    """
    with tf.variable_scope("crf_loss" if not name else name):
        small = -1000.0
        # pad logits for crf loss: tag id num_tags is START, num_tags + 1 is END
        start_logits = tf.concat(
            [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]),
             tf.zeros(shape=[self.batch_size, 1, 1]),
             small * tf.ones(shape=[self.batch_size, 1, 1])], axis=-1)
        end_logits = tf.concat(
            [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]),
             small * tf.ones(shape=[self.batch_size, 1, 1]),
             tf.zeros(shape=[self.batch_size, 1, 1])], axis=-1)
        # two extra columns forbid the START/END tags at every real time step
        pad_logits = tf.cast(small * tf.ones([self.batch_size, self.num_steps, 2]), tf.float32)
        logits = tf.concat([project_logits, pad_logits], axis=-1)
        # prepend the START row and append the END row along the time axis
        logits = tf.concat([start_logits, logits, end_logits], axis=1)
        targets = tf.concat(
            [tf.cast(self.num_tags * tf.ones([self.batch_size, 1]), tf.int32),
             self.targets,
             tf.cast((self.num_tags + 1) * tf.ones([self.batch_size, 1]), tf.int32)],
            axis=-1)
        log_likelihood, self.trans = crf_log_likelihood(
            inputs=logits,
            tag_indices=targets,
            # transition_params=self.trans,
            sequence_lengths=lengths + 2)
        return tf.reduce_mean(-log_likelihood)
```
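To make the padding above concrete, here is the same construction in plain numpy with small illustrative shapes (the sizes are assumptions, not the repo's config):

```python
import numpy as np

batch_size, num_steps, num_tags = 2, 4, 5   # illustrative sizes only
small = -1000.0
project_logits = np.random.randn(batch_size, num_steps, num_tags)

# position 0 of each sequence: only the START column (index num_tags) is 0
start_logits = np.concatenate(
    [small * np.ones((batch_size, 1, num_tags)),
     np.zeros((batch_size, 1, 1)),
     small * np.ones((batch_size, 1, 1))], axis=-1)
# last position: only the END column (index num_tags + 1) is 0
end_logits = np.concatenate(
    [small * np.ones((batch_size, 1, num_tags)),
     small * np.ones((batch_size, 1, 1)),
     np.zeros((batch_size, 1, 1))], axis=-1)

# forbid START/END at the real time steps, then frame the sequence
pad_logits = small * np.ones((batch_size, num_steps, 2))
logits = np.concatenate([project_logits, pad_logits], axis=-1)
logits = np.concatenate([start_logits, logits, end_logits], axis=1)
print(logits.shape)   # (2, 6, 7): [batch_size, num_steps + 2, num_tags + 2]
```

Every real time step gets `small` in the two extra columns, so the CRF can only place START at position 0 and END at the last position.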
I made the matching change in the decode function:
```python
def decode(self, logits, lengths, matrix):
    """
    :param logits: [batch_size, num_steps, num_tags] float32, logits
    :param lengths: [batch_size] int32, real length of each sequence
    :param matrix: transition matrix for inference
    :return: decoded tag paths, one per sequence
    """
    # infer the final labels with the Viterbi algorithm
    paths = []
    small = -1000.0
    start = np.asarray([[small] * self.num_tags + [0, small]])  # START row
    end = np.asarray([[small] * self.num_tags + [small, 0]])    # END row
    for score, length in zip(logits, lengths):
        score = score[:length]
        # forbid START/END at the real time steps, then frame the
        # sequence with the START and END rows, mirroring loss_layer
        pad = small * np.ones([length, 2])
        padded_score = np.concatenate([score, pad], axis=1)
        padded_score = np.concatenate([start, padded_score, end], axis=0)
        path, _ = viterbi_decode(padded_score, matrix)
        paths.append(path[1:len(path) - 1])  # strip the START/END positions
    return paths
```
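For reference, `viterbi_decode` from `tf.contrib.crf` runs on plain numpy arrays, so each framed sequence can be decoded like this (a sketch with random scores, just to show the call):

```python
import numpy as np
from tensorflow.contrib.crf import viterbi_decode

num_tags = 5
score = np.random.randn(6, num_tags + 2)             # (length + 2, num_tags + 2) framed logits
trans = np.random.randn(num_tags + 2, num_tags + 2)  # learned transition matrix
path, path_score = viterbi_decode(score, trans)
print(path[1:-1])   # drop the START/END positions, as decode() does above
```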
I got a best test F1 score of 91.510, which is slightly higher than the score from the version with start_logits only. I also tested models with different numbers of hidden layers (see my issue), with and without start_logits, and found that two hidden layers with both start_logits and end_logits gave the best test F1 score.
I have the same question, too. @zjy-ucas
I'm also confused about loss_layer: why should you add padding before and after the predictions?
I removed all the padding, in both the tag dimension and the sentence_length/time_step dimension, and ran a 10-epoch test. The F1 score came out about the same.
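If I understand it correctly, that no-padding variant amounts to something like the following sketch (my reconstruction, not the exact code); crf_log_likelihood then learns a transition matrix over only the num_tags real tags:

```python
def loss_layer_no_pad(self, project_logits, lengths, name=None):
    """Sketch of a loss_layer without START/END padding (my reconstruction)."""
    with tf.variable_scope("crf_loss" if not name else name):
        # feed the raw [batch_size, num_steps, num_tags] logits straight to
        # the CRF; the transition matrix is num_tags x num_tags here
        log_likelihood, self.trans = crf_log_likelihood(
            inputs=project_logits,
            tag_indices=self.targets,
            sequence_lengths=lengths)
        return tf.reduce_mean(-log_likelihood)
```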
> (quoting the first comment above, which adds end_logits in loss_layer and decode)
Thanks for pointing out the end_logits bit. However, appending end_logits at the end of the padded sentences should not improve performance: crf_log_likelihood masks according to the provided sequence_lengths, counted from the beginning of each sentence, so the crf_log_likelihood in your code is actually processing the sequence [start, label, label, ..., label, pad] instead of [start, label, label, ..., label, end] (except for the longest sentence in the batch).
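A quick way to check this index arithmetic (my sketch, with illustrative numbers):

```python
# With num_steps = 4 and a sentence of length = 2, end_logits sits at the
# fixed position num_steps + 1 = 5, but sequence_lengths = length + 2 = 4,
# so crf_log_likelihood only scores positions 0..3: [start, label, label, pad].
num_steps, length = 4, 2
end_position = num_steps + 1      # where end_logits was concatenated
effective_len = length + 2        # what crf_log_likelihood actually scores
print(end_position >= effective_len)   # True whenever length < num_steps
```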