University project for the course Data Mining Technology for Business and Society, covering baseline and more advanced methods for text classification (an NLP task) applied to Sentiment Analysis.
The homework is organized in two parts:
- Given a dataset organized as a list of reviews, each containing the text and the sentiment of the review (1 for a positive review, 0 for a negative one), we perform a tf-idf encoding of the data and then train a classifier, optimising its hyper-parameters. We also compare the result with a pipeline of One-Hot-Encoding and PCA on the data, followed by a tuned KNeighborsClassifier.
- Given the same dataset, we use a pre-trained sentence embedding chosen from the "sentence-transformers" library and perform the best hyper-parameter optimisation we can afford in LESS than 10 minutes.
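
The first part can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the toy reviews, the choice of LogisticRegression as "the classifier", the parameter grids, and the helper `to_dense` are all hypothetical, and one-hot encoding is approximated here by a binary bag-of-words (`CountVectorizer(binary=True)`).

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy reviews standing in for the real dataset:
# 1 = positive review, 0 = negative review.
texts = [
    "great movie with a wonderful cast",
    "I loved every minute of this film",
    "an excellent and touching story",
    "brilliant acting and a great script",
    "a wonderful experience, highly recommended",
    "the best film I have seen this year",
    "terrible plot and awful acting",
    "I hated this boring movie",
    "a dull and disappointing story",
    "the worst film I have seen this year",
    "awful script, a complete waste of time",
    "boring, predictable and badly directed",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def to_dense(matrix):
    """Convert the sparse bag-of-words matrix to a dense array for PCA."""
    return matrix.toarray()


# Pipeline 1: tf-idf encoding followed by a classifier (assumed: LogisticRegression).
tfidf_search = GridSearchCV(
    Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
    param_grid={
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    },
    cv=3,
    scoring="accuracy",
)
tfidf_search.fit(texts, labels)

# Pipeline 2: binary bag-of-words, PCA, then a tuned KNeighborsClassifier.
knn_search = GridSearchCV(
    Pipeline([
        ("onehot", CountVectorizer(binary=True)),
        ("dense", FunctionTransformer(to_dense)),
        ("pca", PCA(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ]),
    param_grid={
        "pca__n_components": [2, 4],
        "knn__n_neighbors": [1, 3],
    },
    cv=3,
    scoring="accuracy",
)
knn_search.fit(texts, labels)

# Compare the cross-validated accuracy of the two tuned pipelines.
print("tf-idf + classifier:", tfidf_search.best_params_, tfidf_search.best_score_)
print("one-hot + PCA + KNN:", knn_search.best_params_, knn_search.best_score_)
```

On the real dataset the grids would be wider, and the comparison would normally be made on a held-out test split rather than on the cross-validation score alone.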