
Comments (5)

gaoisbest commented on June 11, 2024

The paper *Neural architectures for named entity recognition* says:

> y0 and yn are the start and end tags of a sentence, that we add to the set of possible tags. A is therefore a square matrix of size k+2.
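
For reference, this is the sentence score defined in that paper, where P holds the per-token emission scores and A the tag-transition scores (including transitions from the start tag and to the end tag):

    s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}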

I think the author here only implements the start_logits, but not the corresponding end_logits. I revised the code to add end_logits in the loss_layer function:

    def loss_layer(self, project_logits, lengths, name=None):
        """
        Calculate the CRF loss.
        :param project_logits: [batch_size, num_steps, num_tags]
        :param lengths: [batch_size] real length of each sequence
        :return: scalar loss
        """
        with tf.variable_scope("crf_loss" if not name else name):
            small = -1000.0
            # Pad the logits with two extra tag columns (<start> at index
            # num_tags, <end> at num_tags + 1) and two extra time steps;
            # only the virtual tag gets score 0 at its virtual step.
            start_logits = tf.concat(
                [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]),
                 tf.zeros(shape=[self.batch_size, 1, 1]),
                 small * tf.ones(shape=[self.batch_size, 1, 1])], axis=-1)
            end_logits = tf.concat(
                [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]),
                 small * tf.ones(shape=[self.batch_size, 1, 1]),
                 tf.zeros(shape=[self.batch_size, 1, 1])], axis=-1)

            pad_logits = tf.cast(small * tf.ones([self.batch_size, self.num_steps, 2]), tf.float32)
            logits = tf.concat([project_logits, pad_logits], axis=-1)
            logits = tf.concat([start_logits, logits, end_logits], axis=1)
            # Prepend the <start> tag index and append the <end> tag index.
            targets = tf.concat(
                [tf.cast(self.num_tags * tf.ones([self.batch_size, 1]), tf.int32),
                 self.targets,
                 tf.cast((self.num_tags + 1) * tf.ones([self.batch_size, 1]), tf.int32)], axis=-1)
            log_likelihood, self.trans = crf_log_likelihood(
                inputs=logits,
                tag_indices=targets,
                # transition_params=self.trans,
                sequence_lengths=lengths + 2)
            return tf.reduce_mean(-log_likelihood)
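
To see concretely what those concats produce, here is a minimal NumPy sketch of the same padding scheme (the shapes are hypothetical, chosen just for illustration):

    import numpy as np

    # Hypothetical shapes, for illustration only.
    batch_size, num_steps, num_tags = 2, 5, 7
    small = -1000.0

    project_logits = np.random.randn(batch_size, num_steps, num_tags).astype(np.float32)

    # Two extra tag columns: index num_tags = <start>, index num_tags + 1 = <end>.
    pad_logits = small * np.ones((batch_size, num_steps, 2), dtype=np.float32)
    logits = np.concatenate([project_logits, pad_logits], axis=-1)

    # One extra time step at each side; only the virtual tag scores 0 there.
    start_logits = np.full((batch_size, 1, num_tags + 2), small, dtype=np.float32)
    start_logits[:, 0, num_tags] = 0.0
    end_logits = np.full((batch_size, 1, num_tags + 2), small, dtype=np.float32)
    end_logits[:, 0, num_tags + 1] = 0.0

    logits = np.concatenate([start_logits, logits, end_logits], axis=1)
    print(logits.shape)  # (2, 7, 9): [batch_size, num_steps + 2, num_tags + 2]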

and the decode function:

    def decode(self, logits, lengths, matrix):
        """
        :param logits: [batch_size, num_steps, num_tags] float32 logits
        :param lengths: [batch_size] int32, real length of each sequence
        :param matrix: transition matrix for inference
        :return: list of decoded tag-index paths
        """
        # Infer the final labels with the Viterbi algorithm
        # (requires: import numpy as np; viterbi_decode from tf.contrib.crf).
        paths = []
        small = -1000.0
        start = np.asarray([[small] * self.num_tags + [0, small]])
        end = np.asarray([[small] * self.num_tags + [small, 0]])
        for score, length in zip(logits, lengths):
            score = score[:length]
            pad = small * np.ones([length, 2])
            padded_score = np.concatenate([score, pad], axis=1)
            padded_score = np.concatenate([start, padded_score, end], axis=0)
            path, _ = viterbi_decode(padded_score, matrix)
            # Strip the virtual <start> and <end> positions.
            paths.append(path[1:-1])
        return paths
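
For what it's worth, a hypothetical TF1-style call site might look like this (`sess`, `model`, and `feed_dict` are placeholders, not names from the repo):

    # Hypothetical usage; `sess`, `model`, and `feed_dict` are placeholders.
    logits, lengths = sess.run([model.logits, model.lengths], feed_dict=feed_dict)
    trans = sess.run(model.trans)  # learned [num_tags + 2, num_tags + 2] matrix
    paths = model.decode(logits, lengths, trans)  # one tag-index list per sentence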

I got a best test F1 score of 91.510, which is slightly higher than the score produced by the version with start_logits only.

I also tested models combining different numbers of hidden layers (see my issue) with or without start_logits, and found that two hidden layers with both start_logits and end_logits gave the best test F1 score.


zpppy commented on June 11, 2024

I have the same question too. @zjy-ucas


syw2014 commented on June 11, 2024

I'm also confused about the loss_layer: why do you add padding before and after the predictions?


yigexu commented on June 11, 2024

I removed all the padding in both the tag and sentence_length/time_step dimensions and ran a 10-epoch test. The F1 score was about the same.


vxacezxcv commented on June 11, 2024

Thanks for pointing out the end_logits bit. However, appending the end_logits after the padded sentences should not improve performance: crf_log_likelihood masks each sequence from the beginning according to the provided sequence_lengths, so your code actually scores a sequence [start, label, label, ..., label, pad] instead of [start, label, label, ..., label, end] (except for the longest sentence in the batch, where the end position and the last time step coincide).
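
A quick way to see this is to build the padded targets for a batch where one sentence is shorter than num_steps and look at the last position crf_log_likelihood actually scores; a minimal NumPy sketch with hypothetical tag values:

    import numpy as np

    num_tags, num_steps = 7, 5
    start_tag, end_tag = num_tags, num_tags + 1  # indices 7 and 8

    # Hypothetical batch: first sentence has real length 3, second fills all 5 steps.
    lengths = np.array([3, 5])
    targets = np.array([[1, 2, 3, 0, 0],   # positions 3-4 are padding (label 0)
                        [1, 2, 3, 4, 5]])

    batch_size = targets.shape[0]
    start_col = np.full((batch_size, 1), start_tag)
    end_col = np.full((batch_size, 1), end_tag)
    padded = np.concatenate([start_col, targets, end_col], axis=1)

    for tags, length in zip(padded, lengths):
        # crf_log_likelihood scores positions 0 .. length + 1 inclusive.
        print(tags[:length + 2])
    # [7 1 2 3 0]      <- last scored label is a pad, not the end tag 8
    # [7 1 2 3 4 5 8]  <- only the full-length sentence ends with the end tag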

