comp472-mp1's Introduction

Hi, I'm Adrien 👋

  • 🔭 I'm currently working on a super secret Godot game project
  • 🌱 I'm currently improving my game dev skills
  • 📫 How to reach me: [email protected]
  • 🧗 In my free time, I run, read & code
  • 🖥️ For more about me, check out my personal site: https://www.adrientremblay.com

comp472-mp1's People

Contributors

adrientremblay, kilner99, rosscopico

comp472-mp1's Issues

2.1

Process the dataset using feature_extraction.text.CountVectorizer to extract tokens/words
and their frequencies. Display the number of tokens (the size of the vocabulary) in the dataset.
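
A minimal sketch of this step, assuming the posts are already loaded into a Python list. The three posts below are hypothetical stand-ins, not the assignment dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in posts; the real dataset is not shown in this issue.
posts = [
    "I love this game",
    "this game is terrible",
    "love love love",
]

vectorizer = CountVectorizer()
# Learn the vocabulary and build the sparse document-term count matrix.
counts = vectorizer.fit_transform(posts)

# The vocabulary maps each token to its column index in `counts`.
vocabulary_size = len(vectorizer.vocabulary_)
print("Vocabulary size:", vocabulary_size)
```

With the default settings, CountVectorizer lowercases the text and ignores single-character tokens such as "I", so the reported vocabulary size can be smaller than a naive whitespace split would suggest.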

2.3.3

Base-MLP: a Multi-Layered Perceptron (neural_network.MLPClassifier) with the
default parameters.

2.4

For each of the 6 classifiers above and each of the classification tasks (emotion or sentiment),
produce and save the following information in a file called performance:
• a string clearly describing the model (e.g. the model name + hyper-parameter values) and the
classification task (emotion or sentiment)
• the confusion matrix (use metrics.confusion_matrix)
• the precision, recall, and F1-measure for each class, and the accuracy, macro-average F1 and
weighted-average F1 of the model (use metrics.classification_report)
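
One possible way to write such an entry is sketched below. The labels, predictions and description string are hypothetical, and appending to a plain-text file named performance is an assumption about the expected layout.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical gold labels and predictions for the emotion task.
y_true = ["joy", "anger", "joy", "sadness", "anger", "joy"]
y_pred = ["joy", "joy", "joy", "sadness", "anger", "anger"]

description = "Base-MNB (default parameters), task: emotion"
matrix = confusion_matrix(y_true, y_pred)
# The text report lists per-class precision/recall/F1 plus accuracy,
# macro-average and weighted-average rows.
report = classification_report(y_true, y_pred, zero_division=0)

# Append one entry per (model, task) pair to the same file.
with open("performance", "a", encoding="utf-8") as out:
    out.write(description + "\n")
    out.write("Confusion matrix (rows: true, columns: predicted):\n")
    out.write(str(matrix) + "\n")
    out.write(report + "\n")
```

Opening the file in append mode lets the same snippet be reused for all 6 classifiers and both tasks without overwriting earlier entries.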

2.3.6

Top-MLP: a better performing Multi-Layered Perceptron found using GridSearchCV.
The hyper-parameters that you will experiment with are:
• activation: sigmoid, tanh, relu and identity
• 2 network architectures of your choice: e.g., 2 hidden layers with 30 + 50 nodes and 3 hidden
layers with 10 + 10 + 10 nodes
• solver: Adam and stochastic gradient descent
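
A rough sketch of this grid follows. In scikit-learn the sigmoid activation is named "logistic" and stochastic gradient descent is "sgd". The toy posts, the f1_macro scoring, cv=2 and max_iter=50 are assumptions chosen only to keep the example small and fast.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Hypothetical toy data standing in for the real posts and labels.
posts = [
    "great fun game", "awful boring game", "great story", "boring awful story",
    "fun great level", "awful boring level", "great fun music", "boring music awful",
]
labels = ["positive", "negative"] * 4
X = CountVectorizer().fit_transform(posts)

param_grid = {
    "activation": ["logistic", "tanh", "relu", "identity"],  # logistic = sigmoid
    "hidden_layer_sizes": [(30, 50), (10, 10, 10)],
    "solver": ["adam", "sgd"],
}

# max_iter=50 is an assumption to shorten the search; expect convergence warnings.
search = GridSearchCV(
    MLPClassifier(max_iter=50, random_state=0),
    param_grid,
    scoring="f1_macro",
    cv=2,
)
search.fit(X, labels)
print("Best parameters:", search.best_params_)
```

The grid has 4 x 2 x 2 = 16 combinations, each fitted once per fold, so on the full dataset this search is likely to be the slowest step.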

2.3

Train and test the following classifiers, for both the emotion and the sentiment classification, using
word frequency as features.
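
As an illustration of the overall flow, the sketch below fits the three base classifiers on word counts. The toy posts and labels are hypothetical, and the dictionary of models is just one convenient way to organise the loop.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; replace with the real training and test splits.
train_posts = ["great fun game", "awful boring game", "great story", "boring awful story"]
train_labels = ["positive", "negative", "positive", "negative"]
test_posts = ["great fun", "awful boring"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_posts)  # fit the vocabulary on training data only
X_test = vectorizer.transform(test_posts)        # reuse the same vocabulary for test data

base_models = {
    "Base-MNB": MultinomialNB(),
    "Base-DT": DecisionTreeClassifier(),
    "Base-MLP": MLPClassifier(),
}

predictions = {}
for name, model in base_models.items():
    model.fit(X_train, train_labels)
    predictions[name] = list(model.predict(X_test))
    print(name, predictions[name])
```

The same loop can be run twice, once with the emotion labels and once with the sentiment labels.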

2.3.4

Top-MNB: a better performing Multinomial Naive Bayes Classifier found using GridSearchCV.
The gridsearch will allow you to find the best combination of hyper-parameters, as determined
by the evaluation function that you have determined in step 1.3. The only hyper-parameter that
you will experiment with is alpha (a float), with values 0.5, 0 and 2 other values of your choice.
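
A small sketch of this search, assuming macro-averaged F1 as the evaluation function. The toy data, the extra alpha values 1.0 and 2.0, and cv=2 are illustrative choices only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real posts and labels.
posts = [
    "great fun game", "awful boring game", "great story", "boring awful story",
    "fun great level", "awful boring level", "great fun music", "boring music awful",
]
labels = ["positive", "negative"] * 4
X = CountVectorizer().fit_transform(posts)

# 0 and 0.5 come from the task description; 1.0 and 2.0 are arbitrary picks.
param_grid = {"alpha": [0, 0.5, 1.0, 2.0]}

# scoring="f1_macro" is an assumption; use whichever metric step 1.3 suggests.
search = GridSearchCV(MultinomialNB(), param_grid, scoring="f1_macro", cv=2)
search.fit(X, labels)
print("Best alpha:", search.best_params_["alpha"])
```

Depending on the scikit-learn version, alpha=0 may trigger warnings about numerical issues, since it disables smoothing.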

2.2

Split the dataset into 80% for training and 20% for testing. For this, you can use train_test_split.
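
A minimal sketch of the split, using hypothetical posts and labels. The fixed random_state is an assumption added to make the split reproducible.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the extracted posts and one set of labels.
posts = [f"post number {i}" for i in range(10)]
labels = ["positive" if i % 2 == 0 else "negative" for i in range(10)]

# test_size=0.2 keeps 80% of the samples for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))
```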

2.3.2

Base-DT: a Decision Tree (tree.DecisionTreeClassifier) with the default parameters.

2.3.1

Base-MNB: a Multinomial Naive Bayes Classifier (naive_bayes.MultinomialNB)
with the default parameters.

2.3.5

Top-DT: a better performing Decision Tree found using GridSearchCV. The hyper-parameters
that you will experiment with are:
• criterion: gini or entropy
• max_depth: 2 different values of your choice
• min_samples_split: 3 different values of your choice
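
The grid above could be set up roughly as follows. The specific max_depth and min_samples_split values, the toy data, the f1_macro scoring and cv=2 are all placeholder assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the real posts and labels.
posts = [
    "great fun game", "awful boring game", "great story", "boring awful story",
    "fun great level", "awful boring level", "great fun music", "boring music awful",
]
labels = ["positive", "negative"] * 4
X = CountVectorizer().fit_transform(posts)

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 10],            # 2 values, chosen arbitrarily here
    "min_samples_split": [2, 4, 6],  # 3 values, chosen arbitrarily here
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, scoring="f1_macro", cv=2
)
search.fit(X, labels)
print("Best parameters:", search.best_params_)
```

After fitting, search.best_estimator_ holds the tree refitted on all the data passed to fit, which can then be evaluated on the held-out test set.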

2.5

Do your own exploration: Do only one of the following, depending on your own interest:
• Use tf-idf instead of word frequencies and redo all substeps of 2.3 above. You can use TfidfTransformer
for this. Display the results of this experiment.
• Remove stop words and redo all substeps of 2.3 above. You can use the stop_words parameter of
CountVectorizer for this. Display the results of this experiment.
• Play with train_test_split in order to have different splits of 80% training, 20% test sets and
different sizes of training sets, and redo all substeps of 2.3 above. Show and explain how the
performance of your models varies depending on which training/test sets are used.
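
The first two options can be sketched as below, with two hypothetical posts. TfidfTransformer is applied on top of the raw counts, and stop_words="english" uses scikit-learn's built-in English stop word list.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical stand-in posts.
posts = ["this is a great game", "this is an awful game"]

# Option 1: convert raw word counts into tf-idf weights.
counts = CountVectorizer().fit_transform(posts)
tfidf = TfidfTransformer().fit_transform(counts)

# Option 2: drop common English stop words while counting.
no_stop = CountVectorizer(stop_words="english")
no_stop_counts = no_stop.fit_transform(posts)

print("tf-idf matrix shape:", tfidf.shape)
print("Vocabulary without stop words:", sorted(no_stop.vocabulary_))
```

Either feature matrix can replace the plain counts as input to the classifiers from 2.3, as long as the fitted vectorizer or transformer is reused to transform the test posts.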

1.3 - Create the graphs

Extract the posts and the 2 sets of labels (emotion and sentiment), then plot the distribution
of the posts in each category and save the graphic (a histogram or pie chart) in pdf. Do this for both
the emotion and the sentiment categories. You can use matplotlib.pyplot and savefig to do this.
This pre-analysis of the dataset will allow you to determine if the classes are balanced, and which
metric is more appropriate to use to evaluate the performance of your classifiers.
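
A possible way to produce one of these plots is shown below. The emotion labels and the output file name are hypothetical, and the Agg backend is selected only so the script can run without a display.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering, no window needed
import matplotlib.pyplot as plt

# Hypothetical emotion labels extracted from the posts.
emotions = ["joy", "anger", "joy", "sadness", "joy", "anger"]
counts = Counter(emotions)

fig, ax = plt.subplots()
# One bar per category, with height equal to the number of posts.
ax.bar(list(counts.keys()), list(counts.values()))
ax.set_xlabel("Emotion")
ax.set_ylabel("Number of posts")
ax.set_title("Distribution of posts per emotion")

# The file extension tells savefig to write a PDF.
fig.savefig("emotion_distribution.pdf")
plt.close(fig)
```

Repeating the same steps with the sentiment labels gives the second graph. If the bars differ greatly in height, the classes are imbalanced and accuracy alone is likely to be a misleading metric.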
