Comments (15)
Well, it's a shape mismatch. tf.nn.sparse_softmax_cross_entropy_with_logits() requires labels of shape [batch_size]. I've updated prepare_data.py; you can use load_data with the argument one_hot=False to load the data, which doesn't convert the labels to one-hot format.
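For intuition on why the sparse op wants integer labels of shape [batch_size], here is a minimal numpy sketch (not the repo's code) comparing the sparse and one-hot formulations of cross-entropy; they compute the same loss, but from differently shaped labels:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])      # [batch_size, n_class]
labels_sparse = np.array([0, 1])          # shape [batch_size], integer class ids
labels_onehot = np.eye(3)[labels_sparse]  # shape [batch_size, n_class]

# sparse variant: pick out the probability of the true class directly
loss_sparse = -np.log(softmax(logits)[np.arange(2), labels_sparse])
# dense (one-hot) variant: sum over the one-hot label vector
loss_dense = -(labels_onehot * np.log(softmax(logits))).sum(axis=-1)

assert np.allclose(loss_sparse, loss_dense)
```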
from text-classification.
Thanks for that... but I'm getting another problem: while training, the validation accuracy shows 0, and after training the test accuracy is also zero:
Train Epoch time: 8.773 s
validation accuracy: 0.000
Epoch 17 start !
Train Epoch time: 8.756 s
validation accuracy: 0.000
Epoch 18 start !
Train Epoch time: 8.755 s
validation accuracy: 0.000
Epoch 19 start !
Train Epoch time: 9.088 s
validation accuracy: 0.000
Epoch 20 start !
Train Epoch time: 9.034 s
validation accuracy: 0.000
Training finished, time consumed : 181.54535388946533 s
Start evaluating:
Test accuracy : 0.000000 %
Can you help me figure out where I'm going wrong?
Thanks, it was solved after using loadv2.
@kbkreddy The information you provided is not enough to get any ideas. You might print the loss on the training set to see if the model is working correctly, and check the accuracy calculation.
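As a sketch of the kind of accuracy sanity check meant here (plain numpy, names illustrative): a shape mismatch between predictions and labels is a common way to end up with a bogus constant accuracy, so it's worth asserting the shapes match before comparing.

```python
import numpy as np

logits = np.array([[2.0, 0.1],
                   [0.3, 1.2],
                   [0.9, 0.8]])   # [batch, n_class]
labels = np.array([0, 1, 1])      # integer class ids, shape [batch]

preds = logits.argmax(axis=-1)    # [batch]
assert preds.shape == labels.shape  # guard against silent broadcasting
accuracy = (preds == labels).mean()
print(accuracy)  # 2 correct out of 3
```

If `labels` were accidentally one-hot (shape [batch, n_class]), the `==` comparison would broadcast instead of matching element-wise, and the resulting "accuracy" would be meaningless.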
When I change the code to work on my dataset, it's not getting trained (validation accuracy is zero). Can you help me figure out where I might be going wrong? I didn't change much of your code to make it work on my dataset; I just changed the number of classes to 5. One irony is that if I keep the number of classes at 15 and run on my dataset, the code works perfectly, but the problem comes when I change n_classes to 5.
Is there any way that I can visualize the attention vector? If so, how?
@kbkreddy You can print the attention weight by fetching the variable when evaluating the model.
I'm really confused about which variable I need to fetch (is it alpha?). Can you help me with the code to get the attention vector? I want to see whether the attention is really working or not. Is the attention vector static for all inputs, or does it change according to the input?
@kbkreddy You asked a great question, which helped me find a bug!
Here are the details on how to see whether attention is working or not:
We need to fetch alpha during training, so I made alpha a member of the classifier and defined a simple helper function in model_helper:

```python
def get_attn_weight(model, sess, batch):
    feed_dict = make_train_feed_dict(model, batch)
    return sess.run(model.alpha, feed_dict)
```
Then, fetch it when training or evaluating:

```python
for x_batch, y_batch in fill_feed_dict(x_train, y_train, config["batch_size"]):
    return_dict = run_train_step(classifier, sess, (x_batch, y_batch))
    attn = get_attn_weight(classifier, sess, (x_batch, y_batch))
    print(np.reshape(attn, (config["batch_size"], config["max_len"])))
```
Well, the result I got was that all the weights were equal to 1, like [1, 1, ..., 1, 1]; obviously, that's wrong.
I then checked the code: the problem was the dimension the softmax operated on. So I modified the code and got the expected result, like below:
[[0.07075115 0.09335954 0.09848893 0.12299125 0.12147289 0.17489243
0.22785075 0.09019314]
[0.10558738 0.15054257 0.11275174 0.07762329 0.09574074 0.09661827
0.12752177 0.23361419]
[0.08490831 0.13731416 0.16969529 0.18511814 0.14524785 0.11838076
0.09544718 0.06388821]
[0.07429677 0.07080048 0.6452198 0.05168323 0.04661963 0.04235588
0.02698826 0.04203588]]
To keep the result short, I set max_len = 8 and batch_size = 4. I've pushed the new code; you can try and test it. If you find something wrong, feel free to ask a question. Thank you very much ~
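The softmax-dimension bug described above is easy to reproduce in isolation. A minimal numpy sketch (shapes as in this thread, not the repo's code): applying softmax over the last axis of a [batch*max_len, 1] score tensor normalizes each length-1 row on its own, so every weight collapses to exactly 1; normalizing across the time steps instead gives a proper distribution per example.

```python
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = np.random.randn(4 * 8, 1)   # [batch * max_len, 1] attention scores

# Wrong: softmax over an axis of size 1 -- every weight becomes 1,
# the "all ones" result reported above.
wrong = softmax(scores, axis=-1)

# Right: reshape to [batch, max_len] first, then normalize over time steps.
right = softmax(scores.reshape(4, 8), axis=-1)

assert np.allclose(wrong, 1.0)
assert np.allclose(right.sum(axis=-1), 1.0)
```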
@TobiasLee Yeah, I too came across that error; thanks for the correction, it's giving some values now. I've got one doubt: I got the alpha values below.
Susan Su Ryden is a legislator in the U.S. state of Colorado. Elected to the Colorado House of Representatives as a Democrat in 2008 Ryden represents House District 36 which encompasses eastern Aurora Colorado.
[(0.24889317, 'a'), (0.17085373, 'in'), (0.11460834, 'the'), (0.07909548, 'legislator'), (0.06332039, 'is'), (0.05122245, 'Colorado.'), (0.037922077, 'Su'), (0.03440894, 'U.S.'), (0.025408622, 'Ryden'), (0.022541974, 'Susan')]
true value --> 5, pred value --> 5
How come it is able to predict correctly even though the three highest softmax probabilities are 'a', 'in', 'the'? I don't think 'a', 'in', 'the' alone can predict class 5.
May I know what the difference is between the attention mechanism in the modules folder and the one in attn_bi_lstm (this code)? Thanks.
Anyway, this is really clearing up my doubts about the attention mechanism, thanks @TobiasLee.
To my understanding, the attention mechanism is just a weighted sum of hidden states, which can thus provide more information for the prediction. The example you showed may be a bad case, which I cannot explain either. As for the differences between modules and attn_bi_lstm.py, the calculations of the attention weight are different. Computing the attention weight behaves like figuring out the relationship between input and output; here, the input and output are both hidden states, thus we call it self-attention (you can refer to "Attention Is All You Need" for more details). Usually this relation is computed by a feed-forward neural network, and the implementation of this network differs, but the idea is still the same.
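The weighted-sum view described above can be sketched in plain numpy (names, shapes, and the scoring function are illustrative, not the repo's code): score each time step's hidden state against a learned query vector, softmax the scores over time, and take the weighted sum of the states.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8, 64))      # [batch, max_len, hidden]: BiLSTM states
w = rng.standard_normal(64)              # learned attention query vector

scores = np.tanh(H) @ w                  # [batch, max_len] alignment scores
alpha = softmax(scores, axis=-1)         # attention weights, each row sums to 1
context = (alpha[..., None] * H).sum(1)  # [batch, hidden] weighted sum of states

assert np.allclose(alpha.sum(axis=-1), 1.0)
assert context.shape == (4, 64)
```

The context vector then feeds the classifier in place of (or alongside) the final hidden state.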
OK, thanks for that; I'm getting good at this now. One more doubt I came across, please see the output below:
Marynin [maˈrɨnin] is a village in the administrative district of Gmina Siedliszcze within Chełm County Lublin Voivodeship in eastern Poland.
Sorted attention values (alphas):
[(0.9983796, 'village'), (0.001056958, 'the'), (0.00024786787, 'administrative'), (0.00012790732, 'in'), (7.457812e-05, 'a'), (4.7753467e-05, 'Marynin'), (4.2701886e-05, '[maˈrɨnin]'), (6.1858645e-06, 'is'), (6.0190782e-06, 'district'), (4.3025084e-06, 'Voivodeship')]
Sorted output of the BiLSTM with attention (rnn_output * attention vector):
[(0.75726455, 'County'), (0.60007113, 'a'), (0.42976815, 'in'), (0.39802772, 'Gmina'), (0.25964364, 'the'), (0.20377639, 'of'), (0.20130117, 'village'), (0.10621535, 'Poland.'), (0.10221334, 'in'), (0.09738086, 'is')]
true value --> 9, pred value --> 9 (class 9 = village)
I used the attention mechanism in the modules folder. The alpha-vector generation seems to be correct in this example, but note that the output of rnn_output * attention vector above is not what I expected: all the values in the second array are similar, but what I'm expecting is that the BiLSTM node for 'village' should have more weight, right?
I think the mistake lies in the matrix multiplication of the attention vector with the RNN outputs:

```python
# from attention.py, line 77
output = tf.reduce_sum(inputs * tf.reshape(alphas, [-1, sequence_length, 1]), 1)
```

As I understand it, the first index of the attention vector should be multiplied by the rnn_output of the first node (whose length is 64, the number of hidden units in the LSTM cell).
Can you please help me get out of this?
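For what it's worth, the broadcasting in that reduce_sum line can be checked in isolation. A small numpy sketch (shapes illustrative) shows it does exactly what is described: reshaping alphas to [batch, seq_len, 1] multiplies each time step's hidden vector by its scalar weight before summing over time.

```python
import numpy as np

batch, seq_len, hidden = 2, 3, 4
rng = np.random.default_rng(1)
inputs = rng.standard_normal((batch, seq_len, hidden))  # RNN outputs
alphas = rng.random((batch, seq_len))
alphas /= alphas.sum(axis=-1, keepdims=True)            # rows sum to 1

# The reduce_sum line from attention.py, written with numpy:
output = (inputs * alphas.reshape(batch, seq_len, 1)).sum(axis=1)

# Equivalent explicit loop: weight each time step's hidden vector by its scalar.
expected = np.zeros((batch, hidden))
for b in range(batch):
    for t in range(seq_len):
        expected[b] += alphas[b, t] * inputs[b, t]

assert np.allclose(output, expected)
```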
Well, actually I'm not clear on what your problem is. Do you mean the implementation of the attention module may be wrong? I checked the origin repo and found the author had updated the attention module, so I updated it too. You can try the new implementation.
@TobiasLee: I too ran into the same issue as @kbkreddy above. I'm running adversarial_abblstm.py (the adversarial training example). In my dataset the number of classes is 7, and I get validation accuracy = 0, but if I change n_class to 15, I do get some decent accuracy values.
Can you look into the above? In order to fetch the attention vector while training, I did modify the alpha in my code to self.alpha:

```python
self.alpha = tf.nn.softmax(tf.matmul(tf.reshape(M, [-1, self.hidden_size]),
                                     tf.reshape(W, [-1, 1])))
```

in the cal_loss_logit function.