Comments (15)

TobiasLee commented on May 16, 2024

Well, it's a shape mismatch. tf.nn.sparse_softmax_cross_entropy_with_logits() requires labels with shape [batch_size]. I've updated prepare_data.py; you can call load_data with the argument one_hot=False, which loads the labels without converting them to one-hot format.
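For reference, a minimal sketch of the two label formats (the shapes and numbers here are illustrative, not taken from the repo):

import tensorflow as tf

# logits: [batch_size, n_class]; labels: plain integer class ids of shape [batch_size]
logits = tf.constant([[2.0, 0.5, 0.1],
                      [0.2, 1.5, 0.3]])
labels = tf.constant([0, 1])  # NOT one-hot, just class indices
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# if the labels were one-hot ([batch_size, n_class]), the dense variant would be used instead:
# loss = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_labels, logits=logits)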

kbkreddy commented on May 16, 2024

Thanks for that, but I'm running into another problem: during training the validation accuracy shows 0, and after training the test accuracy is also zero.

Train Epoch time: 8.773 s
validation accuracy: 0.000
Epoch 17 start !
Train Epoch time: 8.756 s
validation accuracy: 0.000
Epoch 18 start !
Train Epoch time: 8.755 s
validation accuracy: 0.000
Epoch 19 start !
Train Epoch time: 9.088 s
validation accuracy: 0.000
Epoch 20 start !
Train Epoch time: 9.034 s
validation accuracy: 0.000
Training finished, time consumed : 181.54535388946533 s
Start evaluating:

Test accuracy : 0.000000 %

Can you help me figure out where I'm going wrong?

kbkreddy commented on May 16, 2024

Thanks, it was solved after switching to loadv2.

TobiasLee commented on May 16, 2024

@kbkreddy The information you've provided isn't enough for me to form an idea of what's wrong. You might print the loss on the training set to see whether the model is working correctly, and also check the accuracy calculation.
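For example, something along these lines (a sketch only; it assumes the classifier exposes a scalar loss tensor and reuses the model_helper functions that appear later in this thread):

for x_batch, y_batch in fill_feed_dict(x_train, y_train, config["batch_size"]):
    # classifier.loss is assumed to be the scalar training loss tensor
    feed_dict = make_train_feed_dict(classifier, (x_batch, y_batch))
    loss_val = sess.run(classifier.loss, feed_dict)
    print("train loss: %.4f" % loss_val)

If the loss is decreasing but accuracy stays at zero, the problem is more likely in the label format or the accuracy calculation than in the model itself.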

kbkreddy commented on May 16, 2024

When I change the code to work on my dataset, it doesn't get trained (validation accuracy is zero). Can you help me figure out where I might be going wrong? I didn't change much of your code to adapt it to my dataset; I just changed the number of classes to 5. The odd thing is that if I keep the number of classes at 15 and run on my dataset, the code works perfectly, but the problem appears when I change n_classes to 5.

kbkreddy commented on May 16, 2024

Is there any way I can visualize the attention vector? If so, how?

TobiasLee commented on May 16, 2024

@kbkreddy You can print the attention weight by fetching the variable when evaluating the model.

kbkreddy commented on May 16, 2024

I'm really confused about which variable I need to fetch (is it alpha?). Can you help me with the code to get the attention vector? I want to see whether the attention is really working or not. Is the attention vector static for all inputs, or does it change according to the input?

TobiasLee commented on May 16, 2024

@kbkreddy You asked a great question, which helped me find a bug!
Here are the details on how to see whether attention is working or not:
We need to fetch alpha during training, so I made alpha a member of the classifier and defined a simple helper function in model_helper:

def get_attn_weight(model, sess, batch):
    # build the training feed dict for this batch and fetch the attention weights (model.alpha)
    feed_dict = make_train_feed_dict(model, batch)
    return sess.run(model.alpha, feed_dict)

Then, fetch it when training or evaluating:

        for x_batch, y_batch in fill_feed_dict(x_train, y_train, config["batch_size"]):
            return_dict = run_train_step(classifier, sess, (x_batch, y_batch))
            # fetch the attention weights for this batch and print them as [batch_size, max_len]
            attn = get_attn_weight(classifier, sess, (x_batch, y_batch))
            print(np.reshape(attn, (config["batch_size"], config["max_len"])))

Well, the result I got was that all the weights were equal to 1, like [1, 1, ..., 1, 1], which is obviously wrong.
I then checked the code; the problem was the dimension the softmax was applied over, so I modified the code and got the expected result, shown below:

[[0.07075115 0.09335954 0.09848893 0.12299125 0.12147289 0.17489243
0.22785075 0.09019314]
[0.10558738 0.15054257 0.11275174 0.07762329 0.09574074 0.09661827
0.12752177 0.23361419]
[0.08490831 0.13731416 0.16969529 0.18511814 0.14524785 0.11838076
0.09544718 0.06388821]
[0.07429677 0.07080048 0.6452198 0.05168323 0.04661963 0.04235588
0.02698826 0.04203588]]

To keep the result short, I set max_len = 8 and batch_size = 4. I've pushed the new code, so you can try it and test it. If you find something wrong, feel free to ask. Thank you very much ~
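To make the dimension issue concrete, here is a small standalone sketch (the sizes and the M/W tensors are illustrative stand-ins, not the repo's exact code): if the scores have shape [batch * max_len, 1] and the softmax runs over that last axis of size 1, every weight normalizes to exactly 1; reshaping so each example's time steps share one softmax gives the intended distribution.

import numpy as np
import tensorflow as tf

batch_size, max_len, hidden_size = 4, 8, 64
M = tf.constant(np.random.rand(batch_size, max_len, hidden_size), dtype=tf.float32)  # stand-in for the hidden states
W = tf.constant(np.random.rand(hidden_size), dtype=tf.float32)                       # stand-in for the attention vector

scores = tf.matmul(tf.reshape(M, [-1, hidden_size]), tf.reshape(W, [-1, 1]))  # shape [batch*max_len, 1]

alpha_buggy = tf.nn.softmax(scores)  # softmax over an axis of size 1: every weight is exactly 1
alpha_fixed = tf.nn.softmax(tf.reshape(scores, [batch_size, max_len]))  # softmax over the time steps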

kbkreddy commented on May 16, 2024

@TobiasLee Yeah, I came across that error too. Thanks for the correction; it's giving some values now. I have one doubt: I got the alpha values below.

Susan Su Ryden is a legislator in the U.S. state of Colorado. Elected to the Colorado House of Representatives as a Democrat in 2008 Ryden represents House District 36 which encompasses eastern Aurora Colorado.
[(0.24889317, 'a'), (0.17085373, 'in'), (0.11460834, 'the'), (0.07909548, 'legislator'), (0.06332039, 'is'), (0.05122245, 'Colorado.'), (0.037922077, 'Su'), (0.03440894, 'U.S.'), (0.025408622, 'Ryden'), (0.022541974, 'Susan')]
true value --> 5, pred value --> 5

How come it is able to predict correctly even though the three highest softmax probabilities go to 'a', 'in', and 'the'? I don't think 'a in the' alone can predict class 5.

kbkreddy commented on May 16, 2024

May I know what the difference is between the attention mechanism in the modules folder and the one in attn bilstm (this code)? Thanks.
Anyway, this is really clearing up my doubts about the attention mechanism. Thanks @TobiasLee

TobiasLee commented on May 16, 2024

To my understanding, the attention mechanism is just a weighted sum of hidden states, and thus can provide more information for prediction. The example you showed may be a bad case, which I cannot explain either. As for the differences between modules and attn_bi_lstm.py, the calculations of the attention weights are different. Computing attention weights amounts to figuring out the relationship between input and output; here, the input and output are both hidden states, so we call it self-attention (you can refer to "Attention Is All You Need" for more details). Usually this relationship is computed by a feed-forward neural network, and the implementation of that network differs, but the idea is the same.
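As a rough illustration of that weighted-sum view (the variable names and the tanh scorer here are generic, not necessarily the repo's exact formula):

import numpy as np
import tensorflow as tf

batch, max_len, hidden = 4, 8, 64
H = tf.constant(np.random.rand(batch, max_len, hidden), dtype=tf.float32)  # BiLSTM hidden states
w = tf.constant(np.random.rand(hidden), dtype=tf.float32)                  # learned query vector

# score each time step with a small feed-forward projection (implementations differ here)
scores = tf.tensordot(tf.tanh(H), w, axes=1)                     # [batch, max_len]
alpha = tf.nn.softmax(scores)                                    # attention weights over the time steps
# the shared idea: a weighted sum of hidden states as the context used for prediction
context = tf.reduce_sum(H * tf.expand_dims(alpha, -1), axis=1)   # [batch, hidden]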

kbkreddy commented on May 16, 2024

Okay, thanks for that; I'm getting better at this now. One more doubt I came across, please see the output below:

Marynin [maˈrɨnin] is a village in the administrative district of Gmina Siedliszcze within Chełm County Lublin Voivodeship in eastern Poland.
sorted attention values (alphas):-
[(0.9983796, 'village'), (0.001056958, 'the'), (0.00024786787, 'administrative'), (0.00012790732, 'in'), (7.457812e-05, 'a'), (4.7753467e-05, 'Marynin'), (4.2701886e-05, '[maˈrɨnin]'), (6.1858645e-06, 'is'), (6.0190782e-06, 'district'), (4.3025084e-06, 'Voivodeship')]
sorted output of BiLSTM with attention (rnn_output * attention vector):
[(0.75726455, 'County'), (0.60007113, 'a'), (0.42976815, 'in'), (0.39802772, 'Gmina'), (0.25964364, 'the'), (0.20377639, 'of'), (0.20130117, 'village'), (0.10621535, 'Poland.'), (0.10221334, 'in'), (0.09738086, 'is')]
true value --> 9, pred value --> 9 (class 9 = village)

I used the attention mechanism in the modules folder. The alpha vector generation seems to be correct in this example, but notice that the output of rnn_output * attention vector above is not what I expected: all the values in the second array are similar, whereas I'm expecting the BiLSTM node for 'village' to have more priority, right?
I think the mistake lies in the matrix multiplication of the attention vector with the RNN outputs:

output = tf.reduce_sum(inputs * tf.reshape(alphas, [-1, sequence_length, 1]), 1)  # from attention.py line 77

As I understand it, the first index of the attention vector should be multiplied with the rnn_output of one node (which has length 64, the number of hidden units in the LSTM cell).
Can you please help me get out of this?
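A quick NumPy shape check of that attention.py line (with illustrative sizes) confirms exactly that reading: each alpha scales the whole 64-dimensional hidden vector of its time step, and the sum over axis 1 collapses the time dimension into one context vector per example.

import numpy as np

batch, sequence_length, hidden = 4, 8, 64
inputs = np.random.rand(batch, sequence_length, hidden)      # BiLSTM outputs
alphas = np.random.rand(batch, sequence_length)
alphas /= alphas.sum(axis=1, keepdims=True)                  # normalized attention weights

weighted = inputs * alphas.reshape(-1, sequence_length, 1)   # alpha_t scales the whole h_t vector
output = weighted.sum(axis=1)                                # shape (batch, hidden)
print(output.shape)  # (4, 64)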

TobiasLee commented on May 16, 2024

Well, actually I'm not clear on what your problem is. Do you mean the implementation of the attention module may be wrong? I checked the original repo and found that the author had updated the attention module, so I updated it too. You can try the new implementation.

freshforlife commented on May 16, 2024

When I change the code to work on my dataset, it doesn't get trained (validation accuracy is zero). Can you help me figure out where I might be going wrong? I didn't change much of your code to adapt it to my dataset; I just changed the number of classes to 5. The odd thing is that if I keep the number of classes at 15 and run on my dataset, the code works perfectly, but the problem appears when I change n_classes to 5.

@TobiasLee: I too ran into the same issue as @kbkreddy above. I'm running adversarial_abblstm.py (the adversarial training example). In my dataset the number of classes is 7; I get validation accuracy = 0, but if I set n_class = 15, I do get some decent accuracy values.

Can you look into the above? In order to fetch the attention vector while training, I modified alpha in my code to self.alpha:

self.alpha = tf.nn.softmax(tf.matmul(tf.reshape(M, [-1, self.hidden_size]),
                                                tf.reshape(W, [-1, 1])))   

in the cal_loss_logit function.
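One thing worth ruling out here (an assumption, not something confirmed in this thread): sparse_softmax_cross_entropy_with_logits requires every integer label to lie in [0, n_class), so if the dataset's labels run higher than the configured class count (say up to 14) while n_class is set to 5 or 7, the loss becomes invalid and accuracy can collapse to zero, whereas n_class = 15 would happen to cover all the labels. A quick sanity check on the loaded labels:

import numpy as np

# y_train: integer class ids as returned by load_data(..., one_hot=False)
# config["n_class"] is assumed to be the class-count setting used by the script
print("label range:", y_train.min(), "to", y_train.max())
assert 0 <= y_train.min() and y_train.max() < config["n_class"], \
    "labels must lie in [0, n_class); remap them before training"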
