Data Collection
Keywords To collect a variety of tweets addressing typical issues faced during the Covid-19 pandemic and lockdown in India, trending keywords were identified using Google Trends. The top 15 keywords selected are 'corona', 'virus', 'coronavirus', 'covid19', 'social', 'job', 'loss', 'jobloss', 'migrant', 'treatment', 'hospital', 'health', 'mask', 'lockdown', and 'curfew'.
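As an illustration, the keyword list can be applied as a simple text filter. This is a minimal sketch only: the source does not state which collection tool or matching rule was used, so the whole-token, case-insensitive matching and the helper name `matches_keywords` are assumptions.

```python
import re

# The 15 keywords listed in the text.
KEYWORDS = [
    "corona", "virus", "coronavirus", "covid19", "social", "job", "loss",
    "jobloss", "migrant", "treatment", "hospital", "health", "mask",
    "lockdown", "curfew",
]

# Assumption: a tweet matches when a keyword appears as a whole
# alphanumeric token (so "#JobLoss" yields the token "jobloss").
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def matches_keywords(text, keywords=KEYWORDS):
    """Return the sorted keywords found in a tweet (hypothetical helper)."""
    tokens = set(_TOKEN_RE.findall(text.lower()))
    return sorted(tokens.intersection(keywords))
```

For example, `matches_keywords("Lockdown extended, #JobLoss rising")` picks out 'jobloss' and 'lockdown', while a tweet containing none of the keywords yields an empty list. Under this rule a spelling such as "Covid-19" splits into two tokens and would not match 'covid19'; a real pipeline might normalize such variants first.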
Location To capture the sentiment of the whole country, tweets must be collected from different locations within it. Geotagged tweets were collected from six major cities in India: Delhi, Mumbai, Kolkata, Chennai, Bengaluru, and Hyderabad. The first three cities represent North India and the latter three represent South India. The collection radius around each city was set to 100 km.
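The 100 km radius rule can be sketched as a great-circle distance check. The city-centre coordinates below are approximate values supplied for illustration and do not come from the source, and `haversine_km` and `nearest_city` are hypothetical helpers; the source does not say how the geographic filter was actually implemented.

```python
import math

# Approximate city-centre coordinates (latitude, longitude) in degrees.
# Assumption: illustrative values, not taken from the source.
CITY_CENTRES = {
    "Delhi": (28.61, 77.21),
    "Mumbai": (19.08, 72.88),
    "Kolkata": (22.57, 88.36),
    "Chennai": (13.08, 80.27),
    "Bengaluru": (12.97, 77.59),
    "Hyderabad": (17.39, 78.49),
}
RADIUS_KM = 100.0         # radius stated in the text
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_city(lat, lon):
    """Return the closest city if the point lies within RADIUS_KM, else None."""
    best = min(CITY_CENTRES,
               key=lambda c: haversine_km(lat, lon, *CITY_CENTRES[c]))
    if haversine_km(lat, lon, *CITY_CENTRES[best]) <= RADIUS_KM:
        return best
    return None
```

A geotagged tweet would then be kept only when `nearest_city(lat, lon)` returns a city name, which also gives the label needed to group tweets by city.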
Date Range The tweets were collected from February 1 to July 31, 2020.
Data Preprocessing
• Lower casing
• Removal of punctuation
• Removal of stopwords
• Removal of frequent words
• Removal of rare words
• Stemming
• Lemmatization
• Conversion of emoticons to words
• Conversion of emojis to words
• Removal of URLs
• Removal of HTML tags
• Chat words conversion
• Spelling correction
• Removal of ‘#’ from hashtags
• Addition of hashtags to tweet
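A few of the steps listed above can be sketched with the Python standard library alone. This is an illustrative subset under stated assumptions, not the authors' pipeline: the order of the steps, the tiny `STOPWORDS` set, and the function name `preprocess` are all hypothetical. Steps such as stemming, lemmatization, spelling correction, and emoji or emoticon conversion usually rely on external libraries or lookup tables and are omitted here.

```python
import re
import string

# Tiny illustrative stopword list; an assumption, not the list used in the source.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(tweet):
    """Apply a subset of the listed cleaning steps to one tweet."""
    text = tweet.lower()               # lower casing
    text = URL_RE.sub(" ", text)       # removal of URLs
    text = HTML_TAG_RE.sub(" ", text)  # removal of HTML tags
    text = text.replace("#", "")       # removal of '#' from hashtags
    text = text.translate(PUNCT_TABLE) # removal of punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopwords
    return " ".join(tokens)
```

URLs and HTML tags are removed before punctuation so that their delimiters are still intact when the patterns run; stripping punctuation first would leave fragments such as "httpstcoabc" in the text.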