Movie_review_classification

There are three parts to this project.

Part 1: Build the corpus vocabulary. This vocabulary represents the content of each document for the clustering and classification steps that follow, so we need to decide which terms are in and which are out. A term is kept only if it is:

  • important in at least one (preferably more) documents, and
  • prevalent in at least two or three documents.
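The keep/drop rule above can be sketched as a simple vocabulary filter. This is a minimal sketch, not the project's code: it assumes each document is already a list of cleaned tokens, and it uses a term's relative frequency within a document as a stand-in for "importance" (TF-IDF weights are a common alternative). The function name `select_vocabulary` and the thresholds `min_docs` and `min_importance` are hypothetical.

```python
from collections import Counter


def select_vocabulary(docs, min_docs=2, min_importance=0.05):
    """Keep terms that are prevalent (found in >= min_docs documents) and
    important (relative frequency >= min_importance in at least one document).

    Relative term frequency is an assumed proxy for "importance".
    """
    doc_freq = Counter()   # number of documents containing each term
    max_importance = {}    # highest relative frequency seen for each term
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        for term, n in counts.items():
            doc_freq[term] += 1
            max_importance[term] = max(max_importance.get(term, 0.0), n / total)
    return sorted(
        term for term in doc_freq
        if doc_freq[term] >= min_docs and max_importance[term] >= min_importance
    )


# Hypothetical toy documents, not data from the project.
docs = [["great", "acting", "great", "plot"], ["boring", "plot", "acting"], ["great", "soundtrack"]]
print(select_vocabulary(docs, min_docs=2, min_importance=0.4))  # ['great']
```

Raising `min_docs` enforces prevalence and raising `min_importance` enforces importance, so the two thresholds map directly onto the two criteria.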

Use several NLP text-mining techniques to clean the text:

  • remove punctuation
  • convert to lower case
  • remove tags
  • remove special characters and digits
  • lemmatize
  • remove stop words
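The cleaning steps above can be chained into one function. This is a minimal standard-library sketch under stated assumptions: the small `STOP_WORDS` set is illustrative only, and lemmatization is a pluggable `lemmatize` hook (identity by default) because real lemmatizers need external data; for example, NLTK's `WordNetLemmatizer().lemmatize` could be passed in after downloading the WordNet corpus. Tags are stripped before punctuation, since removing punctuation first would destroy the `<` and `>` delimiters and leave fragments such as `br` behind.

```python
import re
import string

# Illustrative stop-word list (assumption); a fuller list would be used in practice.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of", "to", "was"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text, lemmatize=lambda word: word, stop_words=STOP_WORDS):
    text = re.sub(r"<[^>]+>", " ", text)    # remove HTML-style tags such as <br />
    text = text.lower()                      # lower case
    text = text.translate(PUNCT_TO_SPACE)    # remove punctuation
    text = re.sub(r"[^a-z\s]", " ", text)    # remove remaining special chars and digits
    tokens = [lemmatize(tok) for tok in text.split()]        # lemmatization hook
    return [tok for tok in tokens if tok not in stop_words]  # remove stop words


print(clean_text("This movie was GREAT!<br /><br />10/10, loved it."))
# ['movie', 'great', 'loved']
```

Note that the `[^a-z\s]` pattern also drops non-ASCII letters, which may be too aggressive for reviews that contain accented names.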

Calculate TF-IDF scores to find the highest-weighted terms in the dataset. Use Word2Vec to calculate word vectors. Calculate the cosine similarity between documents across the corpus using the TF-IDF matrix.
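The TF-IDF and cosine-similarity steps can be sketched in plain Python. This is an illustrative sketch, not the project's code: it assumes tokenized documents, uses raw counts for TF with the smoothed IDF `ln((1 + N) / (1 + df)) + 1` and L2-normalised rows (similar to scikit-learn's `TfidfVectorizer` defaults), and ranks terms by their summed weight across the corpus. The helper names are hypothetical. Word2Vec training usually relies on a library such as gensim and is not reproduced here.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows) where rows[d][j] is the TF-IDF weight of vocab[j] in doc d."""
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0  # avoid dividing by zero
        rows.append([w / norm for w in row])
    return vocab, rows


def top_terms(vocab, rows, k=3):
    """Rank terms by their summed TF-IDF weight across all documents."""
    totals = [sum(col) for col in zip(*rows)]
    ranked = sorted(zip(vocab, totals), key=lambda pair: -pair[1])
    return [term for term, _ in ranked[:k]]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus.
docs = [["great", "plot"], ["great", "acting"], ["boring", "score"]]
vocab, rows = tfidf_matrix(docs)
print(top_terms(vocab, rows, k=1))  # ['great']
similarity = [[cosine_similarity(a, b) for b in rows] for a in rows]  # document-by-document matrix
```

Because each row is already L2-normalised, the cosine similarity here reduces to a dot product; the general formula is kept so the function also works on unnormalised vectors.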

Part 2:

Clustering:

Use K-means on the TF-IDF vectors to build a clustering model that groups movies of the same genre. Vary the number of clusters k to find the best K-means model.
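Tuning k can be sketched with a plain Lloyd's K-means that reports inertia (the within-cluster sum of squared distances). This is a minimal sketch, not the project's implementation; the function names are hypothetical, and the README does not say which criterion was used to pick the "best" k. Inertia always falls as k grows, so a common heuristic is to look for the "elbow" where the drop levels off (the silhouette score is another option).

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=100, seed=0):
    """Lloyd's K-means on a list of equal-length vectors; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia


# Hypothetical 2-D points standing in for TF-IDF rows.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
inertias = {k: kmeans(points, k)[1] for k in range(1, 4)}
print(inertias)  # inspect where the decrease levels off
```

On L2-normalised TF-IDF rows, Euclidean distance is a monotonic function of cosine similarity, so this Euclidean K-means behaves like a cosine-based clustering of the reviews.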

Sentiment Analysis:

Use the class corpus to perform sentiment analysis on the positive and negative reviews. Train SVM, logistic regression, naive Bayes, and random forest classifiers to predict whether a review is positive or negative.
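Of the four classifiers, naive Bayes is compact enough to sketch from scratch. This is a hypothetical multinomial naive Bayes with add-alpha (Laplace) smoothing over tokenized reviews, not the project's code; the other three models would more typically come from a library (for example scikit-learn's `LinearSVC`, `LogisticRegression`, and `RandomForestClassifier`).

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing; works in log space."""
    vocab = {t for doc in docs for t in doc}
    classes = sorted(set(labels))
    log_priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_likelihoods = {}
    for c in classes:
        # Token counts over all training documents of class c.
        counts = Counter(t for doc, lab in zip(docs, labels) if lab == c for t in doc)
        total = sum(counts.values())
        log_likelihoods[c] = {
            t: math.log((counts[t] + alpha) / (total + alpha * len(vocab))) for t in vocab
        }
    return log_priors, log_likelihoods


def predict(model, doc):
    """Return the class with the highest posterior score; unseen words are ignored."""
    log_priors, log_likelihoods = model
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in doc if t in log_likelihoods[c])
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Hypothetical toy reviews.
train_docs = [["great", "acting"], ["loved", "plot"], ["boring", "plot"], ["awful", "acting"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["great", "loved"]))  # pos
```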

Topic Modeling:

Apply LSA and LDA topic modeling to the entire class corpus. Vary the number of topics to get the best model performance.
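LDA can be sketched with collapsed Gibbs sampling, where each token's topic is resampled from (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta). This is a from-scratch illustration, not necessarily the inference method the project used (gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation` use variational inference instead). The function name and the `alpha`, `beta`, and `iterations` defaults are assumptions. LSA (a truncated SVD of the TF-IDF matrix, e.g. scikit-learn's `TruncatedSVD`) and the comparison across topic counts are not shown.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns (vocab, phi) where phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # n_dk: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # n_kw: times word w is assigned to topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):            # random initial topic for every token
        z = []
        for w in doc:
            k = rng.randrange(K)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Sample a new topic from the full conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)] for k in range(K)]
    return vocab, phi


# Hypothetical toy corpus with two obvious themes.
docs = [["space", "alien", "ship"], ["alien", "ship", "space"],
        ["love", "wedding", "romance"], ["romance", "love", "kiss"]]
vocab, phi = lda_gibbs(docs, num_topics=2, iterations=50)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [vocab[v] for v in top])
```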

Create an ontology knowledge graph to identify the relationships between entities.
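A knowledge graph can be stored as (subject, relation, object) triples, with `is_a` triples recording which ontology class each entity belongs to. This is a minimal in-memory sketch; the class name, the relation names, and the entities `MovieA`, `DirectorX`, and `GenreY` are hypothetical placeholders, not data from the project. Extracting the entities from reviews (for example with named-entity recognition) is assumed to happen beforehand.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory store of (subject, relation, object) facts."""

    def __init__(self):
        # subject -> list of (relation, object) pairs
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def objects(self, subject, relation):
        """All objects linked to `subject` by `relation`."""
        return [o for r, o in self.edges.get(subject, []) if r == relation]

    def relations_between(self, a, b):
        """All relations that point from entity `a` to entity `b`."""
        return [r for r, o in self.edges.get(a, []) if o == b]


kg = KnowledgeGraph()
kg.add("MovieA", "is_a", "Movie")             # ontology class membership
kg.add("DirectorX", "is_a", "Person")
kg.add("MovieA", "directed_by", "DirectorX")  # relationship between two entities
kg.add("MovieA", "has_genre", "GenreY")
print(kg.relations_between("MovieA", "DirectorX"))  # ['directed_by']
```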

Contributors: myl941222
