- 🔭 I'm currently working on a super secret Godot game project
- 🌱 I'm currently improving my game dev skills
- 📫 How to reach me: [email protected]
- 🧠 In my free time, I run, read & code
- 🖥️ For more about me, check out my personal site: https://www.adrientremblay.com
comp472-mp1's Introduction
2.1
Process the dataset using feature_extraction.text.CountVectorizer to extract tokens/words
and their frequencies. Display the number of tokens (the size of the vocabulary) in the dataset.
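Step 2.1 can be sketched as follows; the `posts` list is a small placeholder for the real dataset. Note that CountVectorizer's default tokenizer lowercases and drops single-character tokens.

```python
# Minimal sketch of step 2.1: count word frequencies and report vocabulary size.
from sklearn.feature_extraction.text import CountVectorizer

posts = [                      # placeholder posts; the real data comes from the dataset file
    "I am so happy today",
    "this is terrible news",
    "happy happy joy",
]

vectorizer = CountVectorizer()
word_counts = vectorizer.fit_transform(posts)  # sparse document-term frequency matrix

# The vocabulary size is the number of distinct tokens found.
print("vocabulary size:", len(vectorizer.vocabulary_))
```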
2.3.3
Base-MLP: a Multi-Layered Perceptron (neural_network.MLPClassifier) with the
default parameters.
2.4
For each of the 6 classifiers above and each of the classification tasks (emotion or sentiment),
produce and save the following information in a file called performance:
• a string clearly describing the model (e.g. the model name + hyper-parameter values) and the
classification task (emotion or sentiment)
• the confusion matrix (use metrics.confusion_matrix)
• the precision, recall, and F1-measure for each class, and the accuracy, macro-average F1 and
weighted-average F1 of the model (use metrics.classification_report)
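One possible way to append a model's results to the `performance` file; `y_test` and `y_pred` below are placeholders standing in for a real test split and a classifier's predictions.

```python
# Sketch of step 2.4: record one model's results in the `performance` file.
from sklearn.metrics import confusion_matrix, classification_report

y_test = ["joy", "anger", "joy", "sadness"]  # placeholder true labels
y_pred = ["joy", "joy", "joy", "sadness"]    # placeholder predictions

with open("performance", "a") as f:          # append, so all 6 models share one file
    f.write("Base-MNB, emotion classification, default hyper-parameters\n")
    f.write(str(confusion_matrix(y_test, y_pred)) + "\n")
    # classification_report covers per-class precision/recall/F1, accuracy,
    # and the macro- and weighted-average F1 in one call.
    f.write(classification_report(y_test, y_pred, zero_division=0) + "\n")
```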
2.3.6
Top-MLP: a better performing Multi-Layered Perceptron found using GridSearchCV.
The hyper-parameters that you will experiment with are:
• activation: sigmoid, tanh, relu and identity
• 2 network architectures of your choice: e.g., 2 hidden layers with 30 + 50 nodes, or 3 hidden
layers with 10 + 10 + 10
• solver: Adam and stochastic gradient descent
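A sketch of the Top-MLP search grid. Note that scikit-learn spells the sigmoid activation `'logistic'`; the two architectures are the examples suggested above, and `X_train`/`y_train` would come from step 2.2.

```python
# Sketch of step 2.3.6: grid search over MLP hyper-parameters.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    "activation": ["logistic", "tanh", "relu", "identity"],  # 'logistic' == sigmoid
    "hidden_layer_sizes": [(30, 50), (10, 10, 10)],          # the two architectures
    "solver": ["adam", "sgd"],                               # Adam and SGD
}
search = GridSearchCV(MLPClassifier(max_iter=300), param_grid, cv=2)
# search.fit(X_train, y_train)  # then inspect search.best_params_
```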
2.3
Train and test the following classifiers, for both the emotion and the sentiment classification, using
word frequency as features.
2.3.4
Top-MNB: a better performing Multinomial Naive Bayes Classifier found using GridSearchCV.
The grid search will allow you to find the best combination of hyper-parameters, as determined
by the evaluation metric that you chose in step 1.3. The only hyper-parameter that
you will experiment with is alpha (a float), with values 0.5, 0, and 2 other values of your choice.
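The Top-MNB search might look like this; the two extra alpha values (0.1 and 1.0) are arbitrary choices of our own, and `scoring="f1_macro"` stands in for whichever metric step 1.3 settled on.

```python
# Sketch of step 2.3.4: grid search over the MultinomialNB smoothing parameter.
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

param_grid = {"alpha": [0.5, 0.0, 0.1, 1.0]}  # 0.5, 0, plus 2 values of our choice
search = GridSearchCV(MultinomialNB(), param_grid, scoring="f1_macro", cv=5)
# search.fit(X_train, y_train); search.best_params_["alpha"] gives the winner
```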
2.2
Split the dataset into 80% for training and 20% for testing. For this, you can use train_test_split.
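Step 2.2 in one call; `word_counts` and `labels` are tiny placeholders for the CountVectorizer output and one of the two label sets.

```python
# Sketch of step 2.2: an 80/20 train/test split.
from sklearn.model_selection import train_test_split

word_counts = [[1, 0], [0, 1], [2, 1], [1, 1], [0, 2]]    # placeholder features
labels = ["joy", "anger", "joy", "joy", "anger"]          # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    word_counts, labels, test_size=0.2, random_state=0    # fixed seed for reproducibility
)
print(len(X_train), len(X_test))
```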
2.3.2
Base-DT: a Decision Tree (tree.DecisionTreeClassifier) with the default parameters.
2.3.1
Base-MNB: a Multinomial Naive Bayes Classifier (naive_bayes.MultinomialNB)
with the default parameters.
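The three base models (steps 2.3.1-2.3.3) share one pattern: instantiate with default hyper-parameters and fit on the training split. The tiny `X_train`/`y_train` below are placeholders.

```python
# Sketch of steps 2.3.1-2.3.3: the three baseline classifiers, defaults only.
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X_train = [[2, 0, 1], [0, 3, 0], [1, 1, 2], [0, 2, 1]]  # placeholder word counts
y_train = ["joy", "anger", "joy", "anger"]              # placeholder labels

for model in (MultinomialNB(), DecisionTreeClassifier(), MLPClassifier(max_iter=200)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.predict([[1, 0, 1]]))
```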
2.3.5
Top-DT: a better performing Decision Tree found using GridSearchCV. The
hyper-parameters that you will experiment with are:
• criterion: gini or entropy
• max_depth: 2 different values of your choice
• min_samples_split: 3 different values of your choice
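The Top-DT grid could look like this; the specific depth and split values are arbitrary examples of our own choosing.

```python
# Sketch of step 2.3.5: grid search over Decision Tree hyper-parameters.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [10, 50],            # 2 values of your choice
    "min_samples_split": [2, 5, 10],  # 3 values of your choice
}
search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
# search.fit(X_train, y_train); the winning tree is search.best_estimator_
```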
2.5
Do your own exploration: do only one of the following, depending on your own interest:
• Use tf-idf instead of word frequencies and redo all substeps of 2.3 above (you can use TfidfTransformer
for this). Display the results of this experiment.
• Remove stop words and redo all substeps of 2.3 above (you can use the stop_words parameter of
CountVectorizer for this). Display the results of this experiment.
• Play with train_test_split in order to have different splits of 80% training / 20% test sets and
different sizes of training sets, and redo all substeps of 2.3 above. Show and explain how the
performance of your models varies depending on which training/test sets are used.
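The first exploration option might be wired up with a Pipeline, which chains the CountVectorizer, TfidfTransformer, and classifier so the whole of 2.3 can be re-run unchanged; the two training posts here are placeholders. (The comment notes where option 2's stop-word removal would plug in.)

```python
# Sketch of exploration option 1 (step 2.5): tf-idf features via a Pipeline.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

pipeline = Pipeline([
    ("counts", CountVectorizer()),  # stop_words="english" here would cover option 2
    ("tfidf", TfidfTransformer()),  # rescales raw counts to tf-idf weights
    ("clf", MultinomialNB()),
])
pipeline.fit(["so happy today", "terrible awful news"], ["joy", "anger"])
print(pipeline.predict(["so happy today"]))
```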
1.3 - Create the graphs
Extract the posts and the 2 sets of labels (emotion and sentiment), then plot the distribution
of the posts in each category and save the graphic (a histogram or pie chart) as a PDF. Do this for both
the emotion and the sentiment categories. You can use matplotlib.pyplot and savefig to do this.
This pre-analysis of the dataset will allow you to determine whether the classes are balanced, and which
metric is more appropriate to use to evaluate the performance of your classifiers.
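Step 1.3 can be sketched as below, assuming the labels have already been extracted into a list; the label values are placeholders, and the non-interactive `Agg` backend is chosen so the script runs without a display.

```python
# Sketch of step 1.3: plot the class distribution and save it as a PDF.
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt
from collections import Counter

emotion_labels = ["joy", "anger", "joy", "sadness", "joy"]  # placeholder labels
counts = Counter(emotion_labels)

plt.bar(list(counts.keys()), list(counts.values()))
plt.xlabel("emotion")
plt.ylabel("number of posts")
plt.savefig("emotion_distribution.pdf")  # repeat with the sentiment labels
```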