
Comments (5)

gaoisbest commented on June 11, 2024

The paper *Neural architectures for named entity recognition* says:

> y0 and yn are the start and end tags of a sentence, that we add to the set of possible tags. A is therefore a square matrix of size k+2.
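
For reference, this is the sentence score defined in that paper, where P holds the per-token emission scores and A the tag-transition scores (including transitions from the start tag and to the end tag):

    s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}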

I think the author here only implements the start_logits, but not the corresponding end_logits. I revised the code to add end_logits in the loss_layer function:

    def loss_layer(self, project_logits, lengths, name=None):
        """
        Calculate the CRF loss.
        :param project_logits: [batch_size, num_steps, num_tags]
        :param lengths: [batch_size] real length of each sequence
        :return: scalar loss
        """
        with tf.variable_scope("crf_loss" if not name else name):
            small = -1000.0
            # Pad the logits with two extra tag columns (<start> at index
            # num_tags, <end> at num_tags + 1) and two extra time steps;
            # only the virtual tag gets score 0 at its virtual step.
            start_logits = tf.concat(
                [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]),
                 tf.zeros(shape=[self.batch_size, 1, 1]),
                 small * tf.ones(shape=[self.batch_size, 1, 1])], axis=-1)
            end_logits = tf.concat(
                [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]),
                 small * tf.ones(shape=[self.batch_size, 1, 1]),
                 tf.zeros(shape=[self.batch_size, 1, 1])], axis=-1)

            pad_logits = tf.cast(small * tf.ones([self.batch_size, self.num_steps, 2]), tf.float32)
            logits = tf.concat([project_logits, pad_logits], axis=-1)
            logits = tf.concat([start_logits, logits, end_logits], axis=1)
            # Prepend the <start> tag index and append the <end> tag index.
            targets = tf.concat(
                [tf.cast(self.num_tags * tf.ones([self.batch_size, 1]), tf.int32),
                 self.targets,
                 tf.cast((self.num_tags + 1) * tf.ones([self.batch_size, 1]), tf.int32)], axis=-1)
            log_likelihood, self.trans = crf_log_likelihood(
                inputs=logits,
                tag_indices=targets,
                # transition_params=self.trans,
                sequence_lengths=lengths + 2)
            return tf.reduce_mean(-log_likelihood)
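
To see concretely what those concats produce, here is a minimal NumPy sketch of the same padding scheme (the shapes are hypothetical, chosen just for illustration):

    import numpy as np

    # Hypothetical shapes, for illustration only.
    batch_size, num_steps, num_tags = 2, 5, 7
    small = -1000.0

    project_logits = np.random.randn(batch_size, num_steps, num_tags).astype(np.float32)

    # Two extra tag columns: index num_tags = <start>, index num_tags + 1 = <end>.
    pad_logits = small * np.ones((batch_size, num_steps, 2), dtype=np.float32)
    logits = np.concatenate([project_logits, pad_logits], axis=-1)

    # One extra time step at each side; only the virtual tag scores 0 there.
    start_logits = np.full((batch_size, 1, num_tags + 2), small, dtype=np.float32)
    start_logits[:, 0, num_tags] = 0.0
    end_logits = np.full((batch_size, 1, num_tags + 2), small, dtype=np.float32)
    end_logits[:, 0, num_tags + 1] = 0.0

    logits = np.concatenate([start_logits, logits, end_logits], axis=1)
    print(logits.shape)  # (2, 7, 9): [batch_size, num_steps + 2, num_tags + 2]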

and the decode function:

    def decode(self, logits, lengths, matrix):
        """
        :param logits: [batch_size, num_steps, num_tags] float32 logits
        :param lengths: [batch_size] int32, real length of each sequence
        :param matrix: transition matrix for inference
        :return: list of decoded tag-index paths
        """
        # Infer the final labels with the Viterbi algorithm
        # (requires: import numpy as np; viterbi_decode from tf.contrib.crf).
        paths = []
        small = -1000.0
        start = np.asarray([[small] * self.num_tags + [0, small]])
        end = np.asarray([[small] * self.num_tags + [small, 0]])
        for score, length in zip(logits, lengths):
            score = score[:length]
            pad = small * np.ones([length, 2])
            padded_score = np.concatenate([score, pad], axis=1)
            padded_score = np.concatenate([start, padded_score, end], axis=0)
            path, _ = viterbi_decode(padded_score, matrix)
            # Strip the virtual <start> and <end> positions.
            paths.append(path[1:-1])
        return paths
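
For what it's worth, a hypothetical TF1-style call site might look like this (`sess`, `model`, and `feed_dict` are placeholders, not names from the repo):

    # Hypothetical usage; `sess`, `model`, and `feed_dict` are placeholders.
    logits, lengths = sess.run([model.logits, model.lengths], feed_dict=feed_dict)
    trans = sess.run(model.trans)  # learned [num_tags + 2, num_tags + 2] matrix
    paths = model.decode(logits, lengths, trans)  # one tag-index list per sentence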

I got a best test F1 score of 91.510, which is slightly higher than the score produced by the version with start_logits only.

I also tested models combining different numbers of hidden layers (see my issue) with or without start_logits, and found that two hidden layers with both start_logits and end_logits gave the best test F1 score.


zpppy commented on June 11, 2024

I have the same question too. @zjy-ucas


syw2014 commented on June 11, 2024

I'm also confused about the loss_layer: why do you add padding before and after the predictions?


yigexu commented on June 11, 2024

I removed all the padding in both the tag and sentence_length/time_step dimensions and ran a 10-epoch test. The F1 score was about the same.


vxacezxcv commented on June 11, 2024

Thanks for pointing out the end_logits bit. However, appending the end_logits after the padded sentences should not improve performance: crf_log_likelihood masks each sequence from the beginning according to the provided sequence_lengths, so your code actually scores a sequence [start, label, label, ..., label, pad] instead of [start, label, label, ..., label, end] (except for the longest sentence in the batch, where the end position and the last time step coincide).
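
A quick way to see this is to build the padded targets for a batch where one sentence is shorter than num_steps and look at the last position crf_log_likelihood actually scores; a minimal NumPy sketch with hypothetical tag values:

    import numpy as np

    num_tags, num_steps = 7, 5
    start_tag, end_tag = num_tags, num_tags + 1  # indices 7 and 8

    # Hypothetical batch: first sentence has real length 3, second fills all 5 steps.
    lengths = np.array([3, 5])
    targets = np.array([[1, 2, 3, 0, 0],   # positions 3-4 are padding (label 0)
                        [1, 2, 3, 4, 5]])

    batch_size = targets.shape[0]
    start_col = np.full((batch_size, 1), start_tag)
    end_col = np.full((batch_size, 1), end_tag)
    padded = np.concatenate([start_col, targets, end_col], axis=1)

    for tags, length in zip(padded, lengths):
        # crf_log_likelihood scores positions 0 .. length + 1 inclusive.
        print(tags[:length + 2])
    # [7 1 2 3 0]      <- last scored label is a pad, not the end tag 8
    # [7 1 2 3 4 5 8]  <- only the full-length sentence ends with the end tag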

