The project has some general library requirements and some that are specific to individual methods. The general requirements are as follows.
● NumPy, pandas
● scikit-learn
● scipy
● nltk
The library requirements specific to some methods are:
● Multinomial Naive Bayes
● SVM
● Logistic Regression
● xgboost for XGBoost
● Linear SVC
● Random Forest
Importing Important Libraries
● pandas, numpy, nltk, re, future, matplotlib.pyplot
● train_test_split, GridSearchCV
● CountVectorizer, TfidfVectorizer
● TfidfTransformer
● BernoulliNB, MultinomialNB
● metrics, roc_auc_score
● accuracy_score, label_binarize, LogisticRegression
● Pipeline, svm, LinearSVC, SVR
● RandomForestClassifier, DecisionTreeClassifier
● BeautifulSoup, stopwords, SnowballStemmer
● Mounting Google Drive or using any local path
● Reading the CSV with pd.read_csv and latin encoding
● Lowercasing the letters
● Removing Usernames
● Removing URLs
● Removing all digits
● Removing Quotations
● Replacing emojis with their corresponding sentiment, e.g. positive emoji or negative emoji
● Replacing contractions
● Removing punctuation
● Replacing double spaces with single spaces
● Showing Plots
● Showing Word clouds
● Used Count Vectorizer
● Used Multinomial Naive Bayes
● Used Linear SVC
● Used Logistic Regression
● Used SVM
● Used Decision Trees
● Used XGBoost
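The text-cleaning steps listed above (lowercasing through replacing double spaces) could be sketched roughly as follows. This is only an illustration that follows the listed order: the function name clean_tweet, the two small lookup tables, and the placeholder tokens positiveemoji and negativeemoji are assumptions, not the notebook's actual code. It also assumes that "removing quotations" means dropping double-quote characters, so that apostrophes survive until contractions are expanded.

```python
import re
import string

# Hypothetical lookup tables: the project's real emoji and contraction
# lists are not shown in this README, so these are small stand-ins.
EMOTICONS = {
    ":)": " positiveemoji ",
    ":-)": " positiveemoji ",
    ":(": " negativeemoji ",
    ":-(": " negativeemoji ",
}
CONTRACTIONS = {
    "can't": "can not",
    "won't": "will not",
    "n't": " not",   # generic fallback, applied after the special cases
    "i'm": "i am",
    "it's": "it is",
}

# Maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text):
    """Apply the cleaning steps in the order they are listed."""
    text = text.lower()                                    # lowercase
    text = re.sub(r"@\w+", " ", text)                      # usernames
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)     # URLs
    text = re.sub(r"\d+", " ", text)                       # digits
    text = text.replace('"', " ")                          # quotations (assumed: double quotes)
    for emoticon, token in EMOTICONS.items():              # emojis -> sentiment token
        text = text.replace(emoticon, token)
    for short, full in CONTRACTIONS.items():               # contractions
        text = text.replace(short, full)
    text = text.translate(PUNCT_TABLE)                     # punctuation
    return re.sub(r"\s+", " ", text).strip()               # repeated spaces


print(clean_tweet('@bob I can\'t wait!! Check http://t.co/abc 123 :) "yay"'))
```

The order matters here: emoticons and contractions are replaced before punctuation is stripped, otherwise ":)" and "can't" would already be broken apart.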
To run the project, open the Colab notebook and run all the cells in order. Provide the path to the CSV file where it is read. The first three classifiers give the best results:
- Logistic Regression
- Multinomial Naive Bayes
- Linear SVC
Below are the accuracy results of the three classifiers listed above.
Logistic Regression performed better than the other two. Linear SVC and Multinomial Naive Bayes performed almost equally well.
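A comparison of this kind can be sketched as below, assuming Count Vectorizer features feeding each of the three classifiers through a scikit-learn Pipeline. The toy sentences, labels, split size, and hyperparameters are placeholders chosen for illustration, not the project's data or settings, so the accuracies this prints say nothing about the results reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical). In the project the text would come from
# the cleaned CSV, e.g. pd.read_csv(path, encoding="latin-1").
texts = [
    "i love this movie", "what a great day", "so happy with the result",
    "this is wonderful", "really good service", "best experience ever",
    "i hate this movie", "what a terrible day", "so sad about the result",
    "this is awful", "really bad service", "worst experience ever",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Linear SVC": LinearSVC(),
}

scores = {}
for name, clf in classifiers.items():
    # Each pipeline turns raw text into token counts, then fits the classifier.
    model = Pipeline([("vectorizer", CountVectorizer()), ("classifier", clf)])
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping the vectorizer and classifier in one Pipeline keeps the vocabulary fitted on the training split only, which avoids leaking test-set words into the features.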