- Find an upcoming match
- Start the Twitter streaming API one hour before
- End the stream one hour after
- Plot sentiment over time using R
- Look at past data as well
- Build a sentiment analysis machine
- Start with a basic one
- Then add one capable of learning
- Interact with the data in an R Shiny app
- What to do if two matches are being played at the same time? Join all the search terms into one Stream request and somehow separate the tweets by match later? Or just pick one match to follow (say, using an RNG) and ignore the other(s) on this occasion?
- The official hashtags alone are not much use. For example, #EFC was a much more popular hashtag than #WBAFC, suggesting we should also use the API shortname (maybe still as hashtags?)
- Manually create a dictionary of possible search terms for each team playing (e.g. Tottenham Hotspur FC might call for "Tottenham", "Spurs", "Tottenham Hotspur" and so on). However, this can be accepted as a limitation of the app for now; Twitter's search engine may be clever enough to work out which tweets to return anyway.
- Got a list of 'official' hashtags for each team in the 14-15 Premier League (see References).
- Improve the streamer so it writes useful information to a file, including tweet text, hashtags (maybe) and time of tweet. If possible, exclude retweets.
- Find a way to schedule the streamer to start before kick-off and end after the final whistle. This may involve Windows Task Scheduler, or else a small helper script that periodically checks conditions and then triggers the main streamer (etc.) when a match is taking place.
- Write an initial Python script to perform sentiment analysis using a corpus of positive and negative words. Be sure to remove stop words, hyperlinks etc.
- Later improve with some form of machine learning to assign sentiment scores to terms not found in the corpus.
- Strip down the data to team (or hashtag), sentiment score and timestamp. Create density plot.
- Maybe build into a Shiny app that allows adjustment of bandwidth of the density plot / width of bars in histograms. Host on http://shinyapps.io
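
A possible shape for the "small helper script" that decides when the streamer should run, as a rough sketch only. The names `stream_window` and `should_stream` are made up for illustration, and the one-hour padding either side of the match is taken from the plan at the top of these notes. Kick-off and full-time values are assumed to come from a fixture list; nothing here talks to Twitter.

```python
from datetime import datetime, timedelta

# Assumption: stream from one hour before kick-off to one hour after full time.
PADDING = timedelta(hours=1)


def stream_window(kickoff, full_time, padding=PADDING):
    """Return the (start, stop) datetimes between which the streamer should run."""
    return kickoff - padding, full_time + padding


def should_stream(now, kickoff, full_time, padding=PADDING):
    """True if 'now' falls inside the padded window around the match."""
    start, stop = stream_window(kickoff, full_time, padding)
    return start <= now <= stop
```

A scheduled task (Windows Task Scheduler or similar) could call `should_stream(datetime.now(), kickoff, full_time)` every few minutes and start or stop the main streamer accordingly.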
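
The basic word-list sentiment scorer from the list above might start out something like this. It is a minimal sketch, not a finished design: the word lists are tiny placeholders (a real run would load a proper positive/negative lexicon from file), the `RT @` prefix check is only a heuristic for spotting retweets, and all function names are illustrative.

```python
import re

# Placeholder word lists, for illustration only.
POSITIVE = {"good", "great", "brilliant", "win", "goal"}
NEGATIVE = {"bad", "awful", "poor", "lose", "miss"}
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "of", "and", "in", "it", "that", "what"}

URL_RE = re.compile(r"https?://\S+")   # hyperlinks to strip out
TOKEN_RE = re.compile(r"[a-z']+")      # runs of lowercase letters/apostrophes


def is_retweet(text):
    """Heuristic: old-style retweets start with 'RT @user'."""
    return text.startswith("RT @")


def tokenize(text):
    """Lowercase, drop hyperlinks, split into words and remove stop words."""
    text = URL_RE.sub(" ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOP_WORDS]


def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives
```

Words not found in either list contribute nothing to the score, which is the gap the later machine-learning step is meant to fill.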