Comments (8)
Thank you for the detailed insights.
x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4))
My bad - I intended to use `c(rep(2, 4))`. I have now corrected that.
- I now get why "isn't" and "isn't showing" are yielding 0, and it makes perfect sense. For my domain, I am going to try `n.after = 1` to see how it does overall.
- I completely understand that 100% accuracy is impossible, and I have seen your comparison as well. It's just that sentences like "Crashing TV isn't showing" and "Don't even bother with this" kept making me wonder whether I was doing something wrong. Now I see how the valence shifter context produces these results.
- Lastly, on `update_key`: I understand the comparison check before adding a new word, but I am curious about my specific example. When I tried to add "Looks like" to the valence table, it did not allow me. I verified that it does not exist in either the polarity table or the valence table, yet I kept getting the error. One thing to note: the word "like" already exists in the polarity table. Does the duplication check look at every individual word within an n-gram when an n-gram entry is added to a table?
And once again, thanks a lot for building an excellent sentiment analysis tool.
from sentimentr.
Thanks for trying sentimentr.
It's hard to discuss this without a reproducible example. I believe I know where you are getting tripped up, but I will wait until you make a reproducible example so I can see your process. Please use markdown formatting to display inline and block code so that it's easy to read and grab.
Not sure if I understand the ask.
I am trying to evaluate/debug the anomalies I am getting in the sentiment scores and eventually improve my dictionary. One such example is the sentence "Crashing tv isn't showing".
All I am doing is running the `sentiment_by()` function on that sentence in RStudio:
sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table, valence_shifters_dt = vs_table)
Below is what I use to update the polarity table and the valence table.
vs_table <- sentimentr::valence_shifters_table
vs_table <- update_key(vs_table, drop=NULL, x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4), stringsAsFactors = FALSE), comparison=sentimentr::polarity_table, sentiment=F)
pk_table <- sentimentr::polarity_table
pk_table <- update_key(pk_table, x = data.frame(x = c("used to", "outdated", "restarts", "reboot", "i wish"), y = c(rep(-2, 5)), stringsAsFactors = FALSE))
Here are more examples where I am getting a positive score instead of a negative one:
- "Horrible can't even watch the game and it's football season, this app needs a face lift."
- "Looks like half the channels from basic cable lineup are missing!" (I tried adding "looks like" as a de-amplifier, but got a duplication error even though neither the polarity table nor the valence table contains it.)
- "It crashes every time I use it. They marketed it like it was as good or better then the Netflix app... Please. Don't even bother with this."
Let's start with this:
x = data.frame(x = c("especially", "most", "more", "bigger"), y = c(2,4))
These two vectors are not equal in length, so R invokes the recycling rule to make the data.frame. Is that really what you want? Also, what is the 4 for? Its use isn't documented, so I'm wondering what you are using it for.
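As an aside, the recycling behavior can be demonstrated directly; this is plain base R, not anything sentimentr-specific:

```r
# R's recycling rule: the shorter vector y = c(2, 4) is repeated to
# match the 4-row x column when the data.frame is built.
df <- data.frame(x = c("especially", "most", "more", "bigger"), y = c(2, 4))
df$y
# y becomes 2, 4, 2, 4 -- so "more" silently gets 2 and "bigger" gets 4,
# which is probably not what was intended.
```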
Realize that negators (`isn't` in this case) before or after a polarized word can flip its polarity. By default, if a negator occurs within 2 words after a polarized word, it flips the sign. You can tone this down, but doing so may affect other statements the opposite way.
sentiment_by("Crashing tv isn't showing", by = NULL, polarity_dt = pk_table, valence_shifters_dt = vs_table, n.after = 1)
gives:
   element_id word_count sd ave_sentiment
1:          1          4 NA          -0.5
In your original post you wrote:
> Sentiment for "isn't showing" yields 0
> Sentiment for "isn't" yields 0 - This is surprising coz I have "isn't" as negator in my valence table
Regarding "isn't showing": `showing` is not a polarized word, so it's not surprising that this is considered neutral. Your second statement has me believing you don't understand the difference between a negative word and a negator. Negative words make polarity negative; negators flip the sign of the polarity. A negator has no polarity of its own, so it can only affect polarized words.
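The distinction shows up with a few toy sentences. This is a sketch using sentimentr's default dictionaries; no exact scores are claimed, only the direction:

```r
library(sentimentr)

sentiment("This is not good.")   # "not" (a negator) flips the positive "good" negative
sentiment("This is bad.")        # "bad" is itself a negative polarized word
sentiment("It isn't showing.")   # no polarized word, so the negator has
                                 # nothing to act on and the score is neutral
```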
The sentences you are showing are not surprising to me. Here are a few things to note:
- There are no claims of sentimentr being 100% accurate. Even the best taggers, such as Stanford's, do not come close to 100% accuracy. See the comparison between a few taggers here: https://github.com/trinker/sentimentr#comparing-sentimentr-syuzhet-and-stanford
- The `sentiment_by` function averages the sentiments of each sentence using a simple mean. So if you have a combination of negative and positive sentences, the mean smooths that out and may not be what you want. Use `sentiment` and figure out how to handle the differences between sentences yourself.
- The tagger requires properly formatted sentences, as it is based on a model of how English works. The sentence "Horrible can't even watch the game and it's football season, this app needs a face lift." in particular breaks this model. This is actually 2 sentences, not one; there should be a period after the word "horrible". Instead, the word "can't" negated "horrible". Not what you want.
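To see the difference, compare the two functions on a mixed-polarity text. This is a sketch with the default keys; the point is the shape of the output, not particular scores:

```r
library(sentimentr)

txt <- "Horrible. Can't even watch the game. This app needs a face lift."
sentiment(txt)      # one row of scores per sentence, so mixed polarity stays visible
sentiment_by(txt)   # a single averaged ave_sentiment, which smooths the mix away
```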
Also realize that `update_key` protects you from adding words to a key when they are found in the other key. In this case it won't let you add `isn't` to the sentiment key because it's in the valence key. You'll need to update the valence key first using the `drop` argument. This is why you're getting warnings. The keys are data.table objects, so you can check whether your added words made it in by looking at the key.
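A sketch of that workflow, reusing the argument names from the calls earlier in this thread (the `y = -1` value is only illustrative; verify the details against `?update_key`):

```r
library(sentimentr)

# 1. Remove "isn't" from the valence shifters key with the drop argument.
vs_table <- update_key(valence_shifters_table, drop = "isn't", sentiment = FALSE)

# 2. Now "isn't" can be added to the polarity key; the comparison check
#    against the updated valence key no longer flags it as a duplicate.
pk_table <- update_key(polarity_table,
                       x = data.frame(x = "isn't", y = -1, stringsAsFactors = FALSE),
                       comparison = vs_table)
```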
The act of making dictionaries is important, and the format in sentimentr was designed to be mutable, but it requires attention to detail. As you go through this process, if you have ideas to make the UX of dictionary updating smoother, please share.
I will check into this.
Can you show me the code you tried? It works for me. Here's my code and output:
update_key(
valence_shifters_table,
x = data.frame(x = c("Looks like"), y = c(3)),
comparison = sentimentr::polarity_table
)
Output:
x y
1: acute 2
2: acutely 2
3: ain't 1
4: although 4
5: aren't 1
6: barely 3
7: but 4
8: can't 1
9: cannot 1
10: certain 2
11: certainly 2
12: colossal 2
13: colossally 2
14: couldn't 1
15: deep 2
16: deeply 2
17: definite 2
18: definitely 2
19: didn't 1
20: doesn't 1
21: don't 1
22: enormous 2
23: enormously 2
24: extreme 2
25: extremely 2
26: faintly 3
27: few 3
28: greatly 2
29: hardly 3
30: hasn't 1
31: haven't 1
32: heavily 2
33: heavy 2
34: high 2
35: highly 2
36: however 4
37: huge 2
38: hugely 2
39: immense 2
40: immensely 2
41: incalculable 2
42: incalculably 2
43: isn't 1
44: least 3
45: little 3
46: looks like 3
47: massive 2
48: massively 2
49: mightn't 1
50: more 2
51: much 2
52: mustn't 1
53: neither 1
54: never 1
55: no 1
56: nobody 1
57: none 1
58: nor 1
59: not 1
60: only 3
61: particular 2
62: particularly 2
63: purpose 2
64: purposely 2
65: quite 2
66: rarely 3
67: real 2
68: really 2
69: seldom 3
70: serious 2
71: seriously 2
72: severe 2
73: severely 2
74: shan't 1
75: shouldn't 1
76: significant 2
77: significantly 2
78: slightly 3
79: sparesly 3
80: sporadically 3
81: sure 2
82: surely 2
83: totally 2
84: true 2
85: truly 2
86: vast 2
87: vastly 2
88: very 2
89: very few 3
90: very little 3
91: wasn't 1
92: weren't 1
93: won't 1
94: wouldn't 1
x y
Warning message:
In update_key(valence_shifters_table, x = data.frame(x = c("Looks like"), :
One or more terms in the first column contain capital letters. Capitals are ignored.
I found the following suspects:
* Looks like
These terms have been lower cased.