Comments (8)
Thank you for the detailed insights.
x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4))
My bad - I intended to use `c(rep(2, 4))`. I have now corrected that.
- I now get why "isn't" and "isn't showing" are yielding 0, and it makes perfect sense. For my domain, I am going to try `n.after = 1` to see how it does overall.
- I completely understand that 100% accuracy is impossible, and I have seen your comparison as well. It's just that sentences like "Crashing TV isn't showing" and "Don't even bother with this" kept making me wonder whether I was doing something wrong. Now I see how the valence shifter context produces these results.
- Lastly, on `update_key`: I understand the comparison check before adding a new word, but I am curious about my specific example. When I tried to add "Looks like" to the valence table, it did not allow me. I verified that it does not exist in either the polarity table or the valence table, yet I kept getting the error. One thing to note: the word "like" already exists in the polarity table. Does the duplication check look at every individual word within an n-gram when an n-gram entry is added to a table?
And once again, thanks a lot for building an excellent sentiment analysis tool.
from sentimentr.
Thanks for trying sentimentr.
It's hard to discuss this without a reproducible example. I believe I know where you are getting tripped up, but I will wait until you make a reproducible example so I can see your process. Please use markdown formatting to display inline and block code so that it's easy to read and grab.
Not sure if I understand the ask.
I am trying to evaluate/debug the anomalies I am getting in the sentiment scores and eventually improve my dictionary. One such example is the sentence "Crashing tv isn't showing".
All I am doing is running the `sentiment_by()` function on that sentence in RStudio:
sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table, valence_shifters_dt = vs_table)
Below is what I use to update the polarity table and the valence table.
vs_table <- sentimentr::valence_shifters_table
vs_table <- update_key(vs_table, drop=NULL, x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4), stringsAsFactors = FALSE), comparison=sentimentr::polarity_table, sentiment=F)
pk_table <- sentimentr::polarity_table
pk_table <- update_key(pk_table, x = data.frame(x = c("used to", "outdated", "restarts", "reboot", "i wish"), y = c(rep(-2, 5)), stringsAsFactors = FALSE))
Here are more examples where I am getting a positive score instead of a negative one:
- "Horrible can't even watch the game and it's football season, this app needs a face lift."
- "Looks like half the channels from basic cable lineup are missing!" (I tried adding "looks like" as a de-amplifier, but got a duplication error even though neither the polarity table nor the valence table contains it.)
- "It crashes every time I use it. They marketed it like it was as good or better then the Netflix app... Please. Don't even bother with this."
Let's start with this:
x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4))
These two vectors are not equal in length, so R invokes the recycling rule to make the data.frame. Is that really what you want? Also, what is the 4 for? Its use isn't documented, so I'm wondering what you are using it for.
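As an aside, the recycling behavior can be demonstrated directly; this is plain base R, not anything sentimentr-specific:

```r
# R's recycling rule: the shorter vector y = c(2, 4) is repeated to
# match the 4-row x column when the data.frame is built.
df <- data.frame(x = c("especially", "most", "more", "bigger"), y = c(2, 4))
df$y
# y becomes 2, 4, 2, 4 -- so "more" silently gets 2 and "bigger" gets 4,
# which is probably not what was intended.
```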
Realize that negators (`isn't` in this case) before or after a polarized word can flip its polarity. By default, if a negator occurs within 2 words after a polarized word, it flips the sign. You can tone this down, but doing so may affect other statements the opposite way.
sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table, valence_shifters_dt = vs_table, n.after = 1)
gives:
   element_id word_count sd ave_sentiment
1:          1          4 NA          -0.5
In your original post you wrote:
> Sentiment for "isn't showing" yields 0
> Sentiment for "isn't" yields 0 - This is surprising coz I have "isn't" as negator in my valence table
Regarding "isn't showing": `showing` is not a polarized word, so it's not surprising that this is considered neutral. Your second statement has me believing you don't understand the difference between a negative word and a negator. Negative words make polarity negative; negators flip the sign of the polarity. A negator has no polarity of its own, so it can only affect polarized words.
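The distinction shows up with a few toy sentences. This is a sketch using sentimentr's default dictionaries; no exact scores are claimed, only the direction:

```r
library(sentimentr)

sentiment("This is not good.")   # "not" (a negator) flips the positive "good" negative
sentiment("This is bad.")        # "bad" is itself a negative polarized word
sentiment("It isn't showing.")   # no polarized word, so the negator has
                                 # nothing to act on and the score is neutral
```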
The sentences you are showing are not surprising to me. Here are a few things to note:
- There are no claims of sentimentr being 100% accurate. Even the best taggers, such as Stanford's, do not come close to 100% accuracy. See the comparison between a few taggers here: https://github.com/trinker/sentimentr#comparing-sentimentr-syuzhet-and-stanford
- The `sentiment_by` function averages the sentiments of each sentence using a simple mean. So if you have a combination of negative and positive sentences, the mean smooths that out and may not be what you want. Use `sentiment` and figure out how to handle the differences between sentences yourself.
- The tagger requires properly formatted sentences, as it is based on a model of how English works. The sentence "Horrible can't even watch the game and it's football season, this app needs a face lift." in particular breaks this model. This is actually 2 sentences, not one; there should be a period after the word "horrible". Instead, the word "can't" negated "horrible". Not what you want.
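To see the difference, compare the two functions on a mixed-polarity text. This is a sketch with the default keys; the point is the shape of the output, not particular scores:

```r
library(sentimentr)

txt <- "Horrible. Can't even watch the game. This app needs a face lift."
sentiment(txt)      # one row of scores per sentence, so mixed polarity stays visible
sentiment_by(txt)   # a single averaged ave_sentiment, which smooths the mix away
```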
Also realize that `update_key` protects you from adding words to a key when they are found in the other key. In this case it won't let you add `isn't` to the sentiment key because it's in the valence key. You'll need to update the valence key first using the `drop` argument. This is why you're getting warnings. The keys are data.table objects, so you can check whether your added words made it in by looking at the key.
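A sketch of that workflow, reusing the argument names from the calls earlier in this thread (the `y = -1` value is only illustrative; verify the details against `?update_key`):

```r
library(sentimentr)

# 1. Remove "isn't" from the valence shifters key with the drop argument.
vs_table <- update_key(valence_shifters_table, drop = "isn't", sentiment = FALSE)

# 2. Now "isn't" can be added to the polarity key; the comparison check
#    against the updated valence key no longer flags it as a duplicate.
pk_table <- update_key(polarity_table,
                       x = data.frame(x = "isn't", y = -1, stringsAsFactors = FALSE),
                       comparison = vs_table)
```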
The act of making dictionaries is important, and the format in sentimentr was designed to be mutable, but it requires attention to detail. As you go through this process, if you have ideas to make the UX of dictionary updating smoother, please share.
I will check into this.
Can you show me the code you tried? It works for me. Here's my code and output:
update_key(
valence_shifters_table,
x = data.frame(x = c("Looks like"), y = c(3)),
comparison = sentimentr::polarity_table
)
Output:
x y
1: acute 2
2: acutely 2
3: ain't 1
4: although 4
5: aren't 1
6: barely 3
7: but 4
8: can't 1
9: cannot 1
10: certain 2
11: certainly 2
12: colossal 2
13: colossally 2
14: couldn't 1
15: deep 2
16: deeply 2
17: definite 2
18: definitely 2
19: didn't 1
20: doesn't 1
21: don't 1
22: enormous 2
23: enormously 2
24: extreme 2
25: extremely 2
26: faintly 3
27: few 3
28: greatly 2
29: hardly 3
30: hasn't 1
31: haven't 1
32: heavily 2
33: heavy 2
34: high 2
35: highly 2
36: however 4
37: huge 2
38: hugely 2
39: immense 2
40: immensely 2
41: incalculable 2
42: incalculably 2
43: isn't 1
44: least 3
45: little 3
46: looks like 3
47: massive 2
48: massively 2
49: mightn't 1
50: more 2
51: much 2
52: mustn't 1
53: neither 1
54: never 1
55: no 1
56: nobody 1
57: none 1
58: nor 1
59: not 1
60: only 3
61: particular 2
62: particularly 2
63: purpose 2
64: purposely 2
65: quite 2
66: rarely 3
67: real 2
68: really 2
69: seldom 3
70: serious 2
71: seriously 2
72: severe 2
73: severely 2
74: shan't 1
75: shouldn't 1
76: significant 2
77: significantly 2
78: slightly 3
79: sparesly 3
80: sporadically 3
81: sure 2
82: surely 2
83: totally 2
84: true 2
85: truly 2
86: vast 2
87: vastly 2
88: very 2
89: very few 3
90: very little 3
91: wasn't 1
92: weren't 1
93: won't 1
94: wouldn't 1
x y
Warning message:
In update_key(valence_shifters_table, x = data.frame(x = c("Looks like"), :
One or more terms in the first column contain capital letters. Capitals are ignored.
I found the following suspects:
* Looks like
These terms have been lower cased.