- 🔭 I'm currently working on a super secret Godot game project
- 🌱 I'm currently improving my game dev skills
- 📫 How to reach me: [email protected]
- 🧠 In my free time, I run, read & code
- 🖥️ For more about me, check out my personal site: https://www.adrientremblay.com
comp472-mp1's Introduction
2.1
Process the dataset using feature_extraction.text.CountVectorizer to extract tokens/words
and their frequencies. Display the number of tokens (the size of the vocabulary) in the dataset.
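Step 2.1 can be sketched as follows; the `posts` list is a small placeholder for the real dataset. Note that CountVectorizer's default tokenizer lowercases and drops single-character tokens.

```python
# Minimal sketch of step 2.1: count word frequencies and report vocabulary size.
from sklearn.feature_extraction.text import CountVectorizer

posts = [                      # placeholder posts; the real data comes from the dataset file
    "I am so happy today",
    "this is terrible news",
    "happy happy joy",
]

vectorizer = CountVectorizer()
word_counts = vectorizer.fit_transform(posts)  # sparse document-term frequency matrix

# The vocabulary size is the number of distinct tokens found.
print("vocabulary size:", len(vectorizer.vocabulary_))
```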
2.3.3
Base-MLP: a Multi-Layered Perceptron (neural_network.MLPClassifier) with the
default parameters.
2.4
For each of the 6 classifiers above and each of the classification tasks (emotion or sentiment),
produce and save the following information in a file called performance:
• a string clearly describing the model (e.g. the model name + hyper-parameter values) and the
classification task (emotion or sentiment)
• the confusion matrix (use metrics.confusion_matrix)
• the precision, recall, and F1-measure for each class, and the accuracy, macro-average F1 and
weighted-average F1 of the model (use metrics.classification_report)
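One possible way to append a model's results to the `performance` file; `y_test` and `y_pred` below are placeholders standing in for a real test split and a classifier's predictions.

```python
# Sketch of step 2.4: record one model's results in the `performance` file.
from sklearn.metrics import confusion_matrix, classification_report

y_test = ["joy", "anger", "joy", "sadness"]  # placeholder true labels
y_pred = ["joy", "joy", "joy", "sadness"]    # placeholder predictions

with open("performance", "a") as f:          # append, so all 6 models share one file
    f.write("Base-MNB, emotion classification, default hyper-parameters\n")
    f.write(str(confusion_matrix(y_test, y_pred)) + "\n")
    # classification_report covers per-class precision/recall/F1, accuracy,
    # and the macro- and weighted-average F1 in one call.
    f.write(classification_report(y_test, y_pred, zero_division=0) + "\n")
```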
2.3.6
Top-MLP: a better performing Multi-Layered Perceptron found using GridSearchCV.
The hyper-parameters that you will experiment with are:
• activation: sigmoid, tanh, relu and identity
• 2 network architectures of your choice: e.g., 2 hidden layers with 30 + 50 nodes, or 3 hidden
layers with 10 + 10 + 10
• solver: Adam and stochastic gradient descent
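A sketch of the Top-MLP search grid. Note that scikit-learn spells the sigmoid activation `'logistic'`; the two architectures are the examples suggested above, and `X_train`/`y_train` would come from step 2.2.

```python
# Sketch of step 2.3.6: grid search over MLP hyper-parameters.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    "activation": ["logistic", "tanh", "relu", "identity"],  # 'logistic' == sigmoid
    "hidden_layer_sizes": [(30, 50), (10, 10, 10)],          # the two architectures
    "solver": ["adam", "sgd"],                               # Adam and SGD
}
search = GridSearchCV(MLPClassifier(max_iter=300), param_grid, cv=2)
# search.fit(X_train, y_train)  # then inspect search.best_params_
```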
2.3
Train and test the following classifiers, for both the emotion and the sentiment classification, using
word frequency as features.
2.3.4
Top-MNB: a better performing Multinomial Naive Bayes Classifier found using GridSearchCV.
The grid search will allow you to find the best combination of hyper-parameters, as determined
by the evaluation metric that you chose in step 1.3. The only hyper-parameter that
you will experiment with is alpha (a float), with values 0.5, 0, and 2 other values of your choice.
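The Top-MNB search might look like this; the two extra alpha values (0.1 and 1.0) are arbitrary choices of our own, and `scoring="f1_macro"` stands in for whichever metric step 1.3 settled on.

```python
# Sketch of step 2.3.4: grid search over the MultinomialNB smoothing parameter.
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

param_grid = {"alpha": [0.5, 0.0, 0.1, 1.0]}  # 0.5, 0, plus 2 values of our choice
search = GridSearchCV(MultinomialNB(), param_grid, scoring="f1_macro", cv=5)
# search.fit(X_train, y_train); search.best_params_["alpha"] gives the winner
```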
2.2
Split the dataset into 80% for training and 20% for testing. For this, you can use train_test_split.
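Step 2.2 in one call; `word_counts` and `labels` are tiny placeholders for the CountVectorizer output and one of the two label sets.

```python
# Sketch of step 2.2: an 80/20 train/test split.
from sklearn.model_selection import train_test_split

word_counts = [[1, 0], [0, 1], [2, 1], [1, 1], [0, 2]]    # placeholder features
labels = ["joy", "anger", "joy", "joy", "anger"]          # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    word_counts, labels, test_size=0.2, random_state=0    # fixed seed for reproducibility
)
print(len(X_train), len(X_test))
```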
2.3.2
Base-DT: a Decision Tree (tree.DecisionTreeClassifier) with the default parameters.
2.3.1
Base-MNB: a Multinomial Naive Bayes Classifier (naive_bayes.MultinomialNB)
with the default parameters.
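The three base models (steps 2.3.1-2.3.3) share one pattern: instantiate with default hyper-parameters and fit on the training split. The tiny `X_train`/`y_train` below are placeholders.

```python
# Sketch of steps 2.3.1-2.3.3: the three baseline classifiers, defaults only.
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X_train = [[2, 0, 1], [0, 3, 0], [1, 1, 2], [0, 2, 1]]  # placeholder word counts
y_train = ["joy", "anger", "joy", "anger"]              # placeholder labels

for model in (MultinomialNB(), DecisionTreeClassifier(), MLPClassifier(max_iter=200)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.predict([[1, 0, 1]]))
```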
2.3.5
Top-DT: a better performing Decision Tree found using GridSearchCV. The
hyper-parameters that you will experiment with are:
• criterion: gini or entropy
• max_depth: 2 different values of your choice
• min_samples_split: 3 different values of your choice
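The Top-DT grid could look like this; the specific depth and split values are arbitrary examples of our own choosing.

```python
# Sketch of step 2.3.5: grid search over Decision Tree hyper-parameters.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [10, 50],            # 2 values of your choice
    "min_samples_split": [2, 5, 10],  # 3 values of your choice
}
search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
# search.fit(X_train, y_train); the winning tree is search.best_estimator_
```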
2.5
Do your own exploration: do only one of the following, depending on your own interest:
• Use tf-idf instead of word frequencies and redo all substeps of 2.3 above (you can use TfidfTransformer
for this). Display the results of this experiment.
• Remove stop words and redo all substeps of 2.3 above (you can use the stop_words parameter of
CountVectorizer for this). Display the results of this experiment.
• Play with train_test_split in order to have different splits of 80% training / 20% test sets and
different sizes of training sets, and redo all substeps of 2.3 above. Show and explain how the
performance of your models varies depending on which training/test sets are used.
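The first exploration option might be wired up with a Pipeline, which chains the CountVectorizer, TfidfTransformer, and classifier so the whole of 2.3 can be re-run unchanged; the two training posts here are placeholders. (The comment notes where option 2's stop-word removal would plug in.)

```python
# Sketch of exploration option 1 (step 2.5): tf-idf features via a Pipeline.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

pipeline = Pipeline([
    ("counts", CountVectorizer()),  # stop_words="english" here would cover option 2
    ("tfidf", TfidfTransformer()),  # rescales raw counts to tf-idf weights
    ("clf", MultinomialNB()),
])
pipeline.fit(["so happy today", "terrible awful news"], ["joy", "anger"])
print(pipeline.predict(["so happy today"]))
```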
1.3 - Create the graphs
Extract the posts and the 2 sets of labels (emotion and sentiment), then plot the distribution
of the posts in each category and save the graphic (a histogram or pie chart) as a PDF. Do this for both
the emotion and the sentiment categories. You can use matplotlib.pyplot and savefig to do this.
This pre-analysis of the dataset will allow you to determine whether the classes are balanced, and which
metric is more appropriate to use to evaluate the performance of your classifiers.
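Step 1.3 can be sketched as below, assuming the labels have already been extracted into a list; the label values are placeholders, and the non-interactive `Agg` backend is chosen so the script runs without a display.

```python
# Sketch of step 1.3: plot the class distribution and save it as a PDF.
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt
from collections import Counter

emotion_labels = ["joy", "anger", "joy", "sadness", "joy"]  # placeholder labels
counts = Counter(emotion_labels)

plt.bar(list(counts.keys()), list(counts.values()))
plt.xlabel("emotion")
plt.ylabel("number of posts")
plt.savefig("emotion_distribution.pdf")  # repeat with the sentiment labels
```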