
hierarchical-attention-networks's People

Contributors

ematvey, mtdersvan, nthain

hierarchical-attention-networks's Issues

Performance on the paper's dataset

The performance reported in the README has not been computed on the same dataset used in the original paper (Hierarchical Attention Networks for Document Classification, Yang et al., 2016).

To understand the real performance of the implementation, it would be more useful to report accuracy on that dataset, where the training, dev, and test splits are predefined.
The dataset can be downloaded from Duyu Tang's homepage.
Download link: http://ir.hit.edu.cn/~dytang/paper/emnlp2015/emnlp-2015-data.7z

Error While Running yelp_prepare.py

Hello. While running yelp_prepare.py, I got the following error log.
The code was run with the Yelp dataset (round 10), TensorFlow 1.1.0, and Python 3.5.2 on Linux.

0it [00:00, ?it/s]
Traceback (most recent call last):
File "yelp_prepare.py", line 98, in <module>
make_data()
File "yelp_prepare.py", line 78, in make_data
for sent in en(review['text']).sents:
File "/home/wangtao/py35env/lib/python3.5/site-packages/spacy/language.py", line 330, in __call__
for name, proc in self.pipeline:
TypeError: 'Tagger' object is not iterable
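
This traceback typically points to a mismatch between the installed spaCy version and its language model (the pipeline loop expects (name, proc) tuples but finds bare components). For reference, a minimal sketch of the sentence splitting yelp_prepare.py relies on, assuming spaCy 2.x with en_core_web_sm installed; the review record is illustrative:

```python
# Minimal sketch, assuming spaCy 2.x and the en_core_web_sm model.
import spacy

nlp = spacy.load('en_core_web_sm')  # tagger + parser; the parser powers doc.sents

review = {'text': 'Great food. Terrible service.'}  # illustrative record
doc = nlp(review['text'])
sentences = [[token.text for token in sent] for sent in doc.sents]
print(sentences)  # [['Great', 'food', '.'], ['Terrible', 'service', '.']]
```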

Performance on Yelp 15

I used the same dataset (download link: http://ir.hit.edu.cn/~dytang/paper/emnlp2015/emnlp-2015-data.7z), but I can only get 68.5% on Yelp 2015 (the paper reports 71%). Is there anything wrong with my parameters? Here they are:
vocab_size: 49000 (byte-pair encoding with 50000 byte pairs; all tokens that appear at least 5 times)
learning_rate: 0.001
max tokens in a sentence: 48 (over 95% of sentences are shorter than 48 tokens)
max sentences in a document: 32 (over 95% of documents are shorter than 32 sentences)
word_embedding_size: 300 (pre-trained with word2vec)
word_output_size: 128
sentence_output_size: 128
LSTM hidden_dim: 64
LSTM layer_num: 5
dropout_keep_prob: 0.8 (using tf.nn.dropout; dropout is added after word_output and sentence_output)

some error in yelp_prepare.py

When I run yelp_prepare.py, it tells me: ValueError: sentence boundary detection requires the dependency parse, which requires data to be installed. For more info, see the documentation: http://spacy.io/docs/usage
Could you please give me some advice?
Thanks a lot!
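
This error means the loaded spaCy pipeline has no dependency parser, which doc.sents requires. A hedged sketch of downloading and checking a full English model, assuming spaCy 2.x (older versions use different model names):

```python
# Hedged sketch, assuming spaCy 2.x; older versions use different model names.
import spacy

spacy.cli.download('en_core_web_sm')  # fetch a model that includes a parser
nlp = spacy.load('en_core_web_sm')
assert 'parser' in nlp.pipe_names     # doc.sents needs the dependency parse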

What's the accuracy?

Could you tell me the accuracy you get on Yelp 2013/2014/2015 by running your code? I ran the code, but I could not reach the accuracy reported in the paper.

Thanks!

GRU VS LSTM

Hi @ematvey,
First of all, thanks a lot for your implementation!
This is actually more of a question than an issue: if I'm not mistaken, your code suggests that you initially used GRU cells and then switched to LSTM cells. May I ask why?

Error While Training with Yelp Dataset

While training the model on the Yelp dataset prepared by yelp_prepare.py, I got the following error log.
The code was run with TensorFlow 1.0.1 and Python 2.7.12 on Linux.

....
step 251, loss=1.55983, accuracy=0.2, t=19.89, inputs=(30, 30, 30)
Traceback (most recent call last):
  File "worker.py", line 220, in <module>
    main()
  File "worker.py", line 215, in main
    train()
  File "worker.py", line 195, in train
    ], fd)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[28,0,6] = 50000 is not in [0, 50000)
	 [[Node: tcm/tcm/embedding/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@tcm/embedding/embedding_matrix"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](tcm/embedding/embedding_matrix/read, _recv_tcm/inputs_0)]]

Caused by op u'tcm/tcm/embedding/embedding_lookup', defined at:
  File "worker.py", line 220, in <module>
    main()
  File "worker.py", line 215, in main
    train()
  File "worker.py", line 165, in train
    model, saver = model_fn(s)
  File "worker.py", line 97, in HAN_model_1
    is_training=is_training,
  File "/data/zhiheng/project/deep-text-classifier/HAN_model.py", line 63, in __init__
    self._init_embedding(scope)
  File "/data/zhiheng/project/deep-text-classifier/HAN_model.py", line 102, in _init_embedding
    self.embedding_matrix, self.inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/embedding_ops.py", line 111, in embedding_lookup
    validate_indices=validate_indices)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1359, in gather
    validate_indices=validate_indices, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): indices[28,0,6] = 50000 is not in [0, 50000)
	 [[Node: tcm/tcm/embedding/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@tcm/embedding/embedding_matrix"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](tcm/embedding/embedding_matrix/read, _recv_tcm/inputs_0)]]
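
The failing index equals the vocabulary size (50000), so at least one token id produced by preprocessing lies one past the end of the embedding matrix. A hedged workaround sketch, not taken from the repo, that clips ids before the lookup; the attribute names mirror _init_embedding but are assumptions:

```python
# Hedged sketch: clip token ids into [0, vocab_size) before the embedding lookup.
# A cleaner fix would be to set vocab_size = max_token_id + 1 when building the model.
import tensorflow as tf

clipped_inputs = tf.clip_by_value(self.inputs, 0, self.vocab_size - 1)
self.inputs_embedded = tf.nn.embedding_lookup(self.embedding_matrix, clipped_inputs)
```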

Are uw and us global weights? Just to confirm.

Thank you, ematvey, for this implementation.

I wonder whether uw and us are two vectors used as global weights, or whether there is a different uw for each sentence and a different us for each document.

From the code I think they are global vectors; am I right? Please help me confirm this.

As model_components.py puts it:

Performs task-specific attention reduction, using learned
attention context vector (constant within task of interest).

uw and us are defined in the function task_specific_attention(). Although both are referred to as attention_context_vector, are they different vectors in the computational graph? It would be helpful if you could explain this part a little.

attention_context_vector = tf.get_variable(
    name='attention_context_vector',
    shape=[output_size],
    initializer=initializer,
    dtype=tf.float32)

Thank you.
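
A small sketch of what tf.get_variable implies here, assuming (as the code appears to do) that task_specific_attention is called once under a word-level scope and once under a sentence-level scope: each scope then owns exactly one context vector, shared across all sentences and all documents respectively. Shapes are illustrative.

```python
# Illustrative only: one u_w per word-level scope, one u_s per sentence-level scope.
import tensorflow as tf

with tf.variable_scope('word'):
    u_w = tf.get_variable('attention_context_vector', shape=[128])
with tf.variable_scope('sentence'):
    u_s = tf.get_variable('attention_context_vector', shape=[128])

print(u_w.name)  # word/attention_context_vector:0
print(u_s.name)  # sentence/attention_context_vector:0
```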

The Attention code

Hi ematvey. I have read the paper, but I don't understand your code for the attention mechanism. How can I get the weight of each word in a sentence from your code?
Thanks.
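
A minimal sketch of the attention reduction, rewritten (not copied from the repo) so that it also returns the per-word weights; variable names are illustrative:

```python
# Hedged sketch of task-specific attention that also exposes the softmax weights.
# `inputs` is [batch, time, hidden]; names are illustrative, not the repo's.
import tensorflow as tf
from tensorflow.contrib import layers

def attention_with_weights(inputs, output_size, scope=None):
    with tf.variable_scope(scope or 'attention'):
        context = tf.get_variable('attention_context_vector',
                                  shape=[output_size], dtype=tf.float32)
        projection = layers.fully_connected(inputs, output_size,
                                            activation_fn=tf.tanh)
        # similarity of every timestep to the learned context vector
        scores = tf.reduce_sum(projection * context, axis=2, keep_dims=True)
        weights = tf.nn.softmax(scores, dim=1)             # [batch, time, 1]
        outputs = tf.reduce_sum(inputs * weights, axis=1)  # weighted sum of hidden states
        return outputs, tf.squeeze(weights, axis=2)        # weights: one per word
```

Running a sentence batch through this returns both the sentence vector and the per-word weights, which can then be inspected or visualized.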

Embeddings for special tokens/padding?

I was wondering where in the code you initialize the embeddings for the special tokens in the vocabulary (like the unknown and padding words). Shouldn't these be set to zero embeddings and excluded from training? Or how are you dealing with them?
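
The repo does not appear to special-case these rows; one common workaround (an assumption, not the repo's code) is to multiply the embedding matrix by a mask so the PAD row stays at zero and never receives gradients:

```python
# Hedged sketch, assuming PAD has id 0; vocab_size and embedding_size as in the model.
import tensorflow as tf

raw_matrix = tf.get_variable('embedding_matrix', [vocab_size, embedding_size])
pad_mask = tf.concat([tf.zeros([1, embedding_size]),
                      tf.ones([vocab_size - 1, embedding_size])], axis=0)
embedding_matrix = raw_matrix * pad_mask  # row 0 is always zero and gradient-free
```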

ValueError in running worker.py

Sorry to bother you again.
I used TensorFlow 1.2.1 and Python 3.6 and ran worker.py as per your instructions, but it hit an error:
ValueError: Trying to share variable tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/w_xh, but specified shape (100, 320) and found shape (200, 320).

en-core-web-sm needs to be installed beforehand

When trying to install using requirements.txt, I got errors like "Could not find a version that satisfies the requirement en-core-web-sm".

This can be avoided by first installing spaCy, then installing the English model as in step 2, and only then installing with requirements.txt.

Is the embedding initialized with a pre-trained one?

From the code, it seems the embedding is not initialized with a pre-trained embedding (e.g. word2vec), although the paper says it is. Am I right, or did I miss something? Many thanks!

Relevant code in _init_embedding:

def _init_embedding(self, scope):  # seemingly no pre-trained word embedding is used
    with tf.variable_scope(scope):
        with tf.variable_scope("embedding") as scope:
            self.embedding_matrix = tf.get_variable(
                name="embedding_matrix",
                shape=[self.vocab_size, self.embedding_size],
                initializer=layers.xavier_initializer(),
                dtype=tf.float32)
            self.inputs_embedded = tf.nn.embedding_lookup(
                self.embedding_matrix, self.inputs)
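
The snippet above uses a Xavier initializer, so it does not load word2vec on its own. A hedged sketch (not from the repo) of the usual feed-and-assign pattern for injecting pre-trained vectors; `pretrained` is a hypothetical [vocab_size, embedding_size] numpy array aligned with the vocabulary:

```python
# Hedged sketch: feed pre-trained vectors once after the session is created.
import tensorflow as tf

embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, embedding_size])
embedding_init = embedding_matrix.assign(embedding_placeholder)

# ... build the rest of the graph, create the session ...
session.run(embedding_init, {embedding_placeholder: pretrained})
```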

Mask for attention weight

Hi ematvey,

Thanks for sharing the code!

I notice the attention weights for sentences and words are not masked according to their actual lengths, which means the model will "pay attention" to the useless padded input. Is there a reason you didn't use a mask in this project?

Please correct me if I am wrong. Thanks!
Xianlonb
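
For reference, a minimal sketch (not from the repo) of masking attention scores by the true lengths before the softmax, so padded positions get effectively zero weight; `scores` is [batch, time, 1] and `lengths` is [batch], both illustrative names:

```python
# Hedged sketch: push padded positions toward -inf before the softmax.
import tensorflow as tf

mask = tf.sequence_mask(lengths, maxlen=tf.shape(scores)[1])  # [batch, time] bool
mask = tf.expand_dims(tf.cast(mask, tf.float32), axis=2)      # [batch, time, 1]
masked_scores = scores + (1.0 - mask) * -1e9
weights = tf.nn.softmax(masked_scores, dim=1)                 # ~zero weight on pads
```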

Won't the code lead to different input shapes for different batches?

In the file data_util.py, the code is as follows:
def batch(inputs):
    batch_size = len(inputs)

    document_sizes = np.array([len(doc) for doc in inputs], dtype=np.int32)  # differs between batches
    document_size = document_sizes.max()  # maximum number of sentences in a document

    sentence_sizes_ = [[len(sent) for sent in doc] for doc in inputs]  # length of every sentence in each document
    sentence_size = max(map(max, sentence_sizes_))  # maximum sentence length

    b = np.zeros(shape=[batch_size, document_size, sentence_size], dtype=np.int32)  # zeros == PAD

    sentence_sizes = np.zeros(shape=[batch_size, document_size], dtype=np.int32)
    for i, document in enumerate(inputs):
        for j, sentence in enumerate(document):
            sentence_sizes[i, j] = sentence_sizes_[i][j]
            for k, word in enumerate(sentence):
                b[i, j, k] = word

    return b, document_sizes, sentence_sizes
The output batch depends on the inputs. Won't this lead to different shapes of b, since the input is not padded beforehand? Each document may have a different number of sentences and each sentence may have a different number of words.
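
For what it's worth, the shape of b does differ from batch to batch, but it is consistent within a batch, since padding goes up to that batch's own maxima; the varying sizes are fine as long as the model's input placeholders are defined with dynamic dimensions. A toy usage sketch (inputs are illustrative):

```python
# Toy example: batch() pads to this batch's own maximum sizes.
import numpy as np

docs = [
    [[1, 2, 3], [4, 5]],  # 2 sentences, longest has 3 tokens
    [[6]],                # 1 sentence, 1 token
]
b, document_sizes, sentence_sizes = batch(docs)
print(b.shape)            # (2, 2, 3) for this batch; another batch may differ
print(document_sizes)     # [2 1]
print(sentence_sizes)     # [[3 2]
                          #  [1 0]]
```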

How to make `TensorBoard Projector` work.

I'd like to uncomment the Embedding Projector part to visualize the result, but I don't know how. For instance, embedding.metadata_path = vocab_tsv in the original code does not work, since vocab_tsv does not exist. What variable should I assign in this statement?

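A hedged sketch of one way to make it work: write the vocabulary metadata TSV yourself and point metadata_path at it. log_dir, vocab_words, and embedding_matrix are assumptions standing in for the repo's actual objects:

```python
# Hedged sketch: write a metadata file and register the embedding with the projector.
import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

vocab_tsv = os.path.join(log_dir, 'vocab.tsv')
with open(vocab_tsv, 'w') as f:
    for word in vocab_words:  # one line per embedding row, in id order
        f.write(word + '\n')

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embedding_matrix.name  # the model's embedding variable
embedding.metadata_path = vocab_tsv
projector.visualize_embeddings(tf.summary.FileWriter(log_dir), config)
```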

dev accuracy: nan???

I ran the program on a small amount of data, just a few records from review.json. I get dev accuracy = nan during training. I checked the dev.dataset file and it is not empty. Can someone explain this to me?

Attention layer output

The method task_specific_attention applies attention to the projected vectors instead of the hidden vectors (the outputs of the RNN cell).

Was this done on purpose, or was the paper's formulation missed, where the final sentence vector is the weighted sum of the hidden states and NOT of the inner projected vectors?
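
Reusing the illustrative names from the attention sketch earlier on this page, the two formulations differ only in which tensor is weighted and summed:

```python
# Illustrative fragment; `inputs`, `projection`, and `weights` come from the
# attention sketch above and are not the repo's exact variables.
import tensorflow as tf

# Paper: weighted sum of the RNN hidden states h_it.
sentence_vector_paper = tf.reduce_sum(inputs * weights, axis=1)
# Repo, as described in this issue: weighted sum of the tanh-projected vectors.
sentence_vector_repo = tf.reduce_sum(projection * weights, axis=1)
```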

Implementation using tf.contrib.seq2seq.

Hi, your implementation of Hierarchical Attention Networks for Document Classification (Yang et al., 2016) looks very good, especially after the bug fixes. Nice work!

I wonder what it would take to reimplement your HANs with the new tf.contrib.seq2seq API (> r1.2). There are some examples of how to do encoder-decoder attention with seq2seq, but no examples of pure encoding ('reducing') attention.

Thank you very much!
