rimtouny / advanced-nlp-powered-sentiment-analysis-for-e-commerce-enhancement


Using NLP and a smart chatbot, this project gauges customer sentiment online, offering customization and real-time feedback. It applies BOW, TF-IDF, and LDA text representations with ML models to support e-commerce decisions, and was developed for an NLP course at uOttawa in 2023.

License: MIT License

Jupyter Notebook 100.00%
nlp sentiment-analysis bag-of-words bow classfication clustering hierarchica k-means latent-dirichlet-allocation lda

Sentiment Analysis NLP

Using NLP, this project gauges customer sentiment online, offering customization and real-time feedback. It applies BOW, TF-IDF, and LDA text representations with ML models to the train.csv dataset to support e-commerce decisions, and was developed for an NLP course at uOttawa in 2023.

  • Required libraries: scikit-learn, pandas, matplotlib.
  • Execute cells in a Jupyter Notebook environment.
  • The uploaded code has been executed and tested successfully within the Google Colab environment.

Supervised Sentiment Analysis for Text Classification

Perform supervised sentiment analysis to categorize user sentiments into three classes: Positive, Negative, and Neutral.

Independent Variables:

  • 'name': Name of the product.
  • 'brand': Brand of the product.
  • 'categories': Categories associated with the product.
  • 'primaryCategories': Primary category of the product.
  • 'reviews.date': Date of the review.
  • 'reviews.text': Text content of the review.
  • 'reviews.title': Title of the review.

Target variable:

  • 'sentiment': Dependent variable indicating the sentiment (Positive, Negative, Neutral) of the review.

Key Tasks Undertaken

  1. Data Exploration:

    • The most common keywords and their counts.

    • The most common Positive words using WordCloud.

    • The most common Negative words using WordCloud.

    • The most common Neutral words using WordCloud.
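
The keyword counts described above could be computed along the following lines. This is a minimal sketch, not the notebook's actual code: the helper `top_keywords` and the sample reviews are hypothetical, and only the Python standard library is used.

```python
from collections import Counter
import re


def top_keywords(texts, n=10):
    """Return the n most frequent lowercase word tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase first, then keep runs of letters and apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


# Made-up reviews for illustration only (not taken from train.csv).
reviews = [
    "Great tablet, great price",
    "The battery is great",
    "Poor battery life",
]

print(top_keywords(reviews, n=2))  # [('great', 3), ('battery', 2)]
```

Running the same helper on the reviews of a single sentiment class would give the per-class word lists that feed a word cloud.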

  2. Data Preparation:

    • Data Cleaning

      • Handling Missing Data: Fewer than 0.1% of cells are missing (10 values in reviews.title), so those rows can safely be dropped or the values imputed, depending on the context.
      • Handling Duplicate Rows: The dataset has 1.5% duplicate rows, which can be removed to ensure data integrity.
    • Renaming and Dropping Columns: Renamed the columns 'reviews.text', 'reviews.title', and 'reviews.date' to 'reviews_text', 'reviews_title', and 'reviews_date', respectively. Additionally, dropped the columns 'name', 'brand', 'categories', 'primaryCategories', and 'reviews.date' from the dataset.

    • Sentiment Label Encoding: Created a mapping dictionary for sentiment labels and encoded the 'sentiment' column into numerical form (1 for 'Positive,' -1 for 'Negative,' and 0 for 'Neutral').

    • Creating a New Column 'Polarity Scores': Apply the SentimentIntensityAnalyzer to the 'reviews_text' column to calculate a polarity score for each review. The score represents the sentiment of the text as a continuous value between -1 (negative) and 1 (positive).

    • Balancing Data: The classes are imbalanced, so techniques such as SMOTE can be applied to balance them.
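
The cleaning, renaming, and label-encoding steps above might look roughly like this in pandas. It is a sketch under stated assumptions: the small DataFrame is made up to stand in for train.csv, missing titles are dropped (imputing is the other option mentioned), and the polarity-score and SMOTE steps are left out because they depend on extra libraries and data downloads.

```python
import pandas as pd

# Tiny made-up frame standing in for train.csv (values are illustrative only).
df = pd.DataFrame({
    "name": ["Tablet A", "Tablet A", "Speaker B", "Reader C"],
    "brand": ["X", "X", "Y", "Z"],
    "categories": ["Electronics"] * 4,
    "primaryCategories": ["Electronics"] * 4,
    "reviews.date": ["2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01"],
    "reviews.text": ["Love it", "Love it", "Stopped working", "It is okay"],
    "reviews.title": ["Great", "Great", None, "Fine"],
    "sentiment": ["Positive", "Positive", "Negative", "Neutral"],
})

# Remove exact duplicate rows, then rows with a missing review title.
df = df.drop_duplicates()
df = df.dropna(subset=["reviews.title"])

# Replace the dots in the review column names with underscores.
df = df.rename(columns={
    "reviews.text": "reviews_text",
    "reviews.title": "reviews_title",
    "reviews.date": "reviews_date",
})

# Drop the columns that are not used for modeling.
df = df.drop(columns=["name", "brand", "categories",
                      "primaryCategories", "reviews_date"])

# Encode the sentiment labels numerically.
sentiment_map = {"Positive": 1, "Negative": -1, "Neutral": 0}
df["sentiment"] = df["sentiment"].map(sentiment_map)
df = df.reset_index(drop=True)

print(df)
```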

  3. Text Feature Engineering:

    • Case Folding: Convert all text to lowercase to ensure consistent comparisons between words.
    • Removing Punctuation: Eliminate special characters and punctuation marks from the text to avoid any interference in analysis.
    • Removing Numbers: Exclude numerical digits from the text as they may not be relevant for certain tasks like sentiment analysis.
    • Removing Stopwords: Remove common words that do not carry much meaning (e.g., "the," "and," "is") using stopwords from the English language.
    • Removing Rare Words: Eliminate words that appear infrequently in the dataset, as they may not contribute significantly to the analysis.
    • Lemmatization: Convert words to their base or root form (lemmas) to reduce inflected words to a common base form. For example, "running," "runs," and "ran" will all be transformed to "run."
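
Most of the steps above can be sketched with the standard library alone. The helpers `basic_clean` and `drop_rare`, the tiny stopword set, and the sample sentences are all hypothetical. A full English stopword list and a lemmatizer would normally come from an NLP library, so lemmatization is not shown here.

```python
import re
import string
from collections import Counter

# Small illustrative stopword set (an assumption); a full English list
# would normally come from an NLP library.
STOPWORDS = {"the", "and", "is", "a", "it", "this", "to", "of"}


def basic_clean(text):
    """Lowercase, strip punctuation and digits, then drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def drop_rare(token_lists, min_count=2):
    """Remove tokens that occur fewer than min_count times in the corpus."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return [[t for t in toks if counts[t] >= min_count] for toks in token_lists]


# Made-up sentences for illustration only.
docs = [
    "The battery is GREAT!!!",
    "Battery died after 2 days.",
    "Great screen, great sound",
]

tokens = [basic_clean(d) for d in docs]
cleaned = drop_rare(tokens, min_count=2)
print(cleaned)  # [['battery', 'great'], ['battery'], ['great', 'great']]
```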
  4. Text Transformations:

    • Bag-of-Words (BOW): Represent each document as a vector over the corpus vocabulary. In this binary variant, frequency is ignored and only whether a word appears or not is recorded.

    • Term Frequency-Inverse Document Frequency (TF-IDF): Weight each term's frequency in a document by its inverse document frequency, so that words common across the whole corpus count for less than words distinctive to a document.

    • Latent Dirichlet Allocation (LDA): Perform topic modeling to extract latent topics from the text data. Each document is represented as a mixture of topics.
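
The three representations above can be sketched with scikit-learn as follows. The sample documents and the choice of two topics are assumptions made for illustration; the notebook's actual vectorizer settings are not stated in this README.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up, already-cleaned documents for illustration only.
docs = [
    "great tablet great price",
    "battery died quickly",
    "great battery life",
    "screen stopped working",
]

# Binary bag-of-words: 1 if the word occurs in the document, else 0.
bow_vectorizer = CountVectorizer(binary=True)
X_bow = bow_vectorizer.fit_transform(docs)

# TF-IDF: term frequency weighted by inverse document frequency.
tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(docs)

# LDA is fitted on raw term counts; each row of X_lda is a topic mixture.
count_vectorizer = CountVectorizer()
X_counts = count_vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_lda = lda.fit_transform(X_counts)

print(X_bow.shape, X_tfidf.shape, X_lda.shape)
```

Each of the three matrices can then be passed to the classifiers and clustering algorithms in the modeling step.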

  5. Modeling

    • Classification (Random Forest, SVM, Logistic Regression, Gaussian Naive Bayes)

      • BOW Technique

      • TF-IDF Technique

      • LDA Technique
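
Fitting the four classifiers on one representation might look like the sketch below. It uses TF-IDF features on a made-up six-review sample; the hyperparameters are assumptions, and predictions are made on the training data only to keep the example short. A real evaluation would use a held-out test split.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Made-up reviews and labels (1 = Positive, -1 = Negative, 0 = Neutral).
texts = [
    "great tablet love it",
    "excellent screen great sound",
    "battery died terrible",
    "stopped working awful",
    "it is okay",
    "average tablet nothing special",
]
labels = [1, 1, -1, -1, 0, 0]

# GaussianNB needs a dense array, so the sparse TF-IDF matrix is converted.
X = TfidfVectorizer().fit_transform(texts).toarray()

# Hyperparameters here are illustrative assumptions.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
}

predictions = {}
for name, model in models.items():
    model.fit(X, labels)
    # Predicting on the training data only shows the API, not real accuracy.
    predictions[name] = model.predict(X)
    print(name, predictions[name])
```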

    • Clustering (K-Means, Hierarchical)

      • BOW Technique
            Silhouette Score (K-Means): 81.55401438608376
            Silhouette Score (Hierarchical) : 17.925024032592773

      • TF-IDF Technique
            Silhouette Score (K-Means): 0.7683612431807604
            Silhouette Score (Hierarchical) : 17.966507375240326

      • LDA Technique
            Silhouette Score (K-Means): 81.55401438608376
            Silhouette Score (Hierarchical) : 16.194509
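
A silhouette comparison of the two clustering methods could be computed as sketched below. The sample texts and the choice of two clusters are assumptions. Note that scikit-learn's `silhouette_score` returns a value between -1 and 1, so any rescaling (for example to a percentage) would be a separate step that this sketch does not perform.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Made-up, already-cleaned reviews for illustration only.
texts = [
    "great tablet great price",
    "great screen and great sound",
    "battery died quickly",
    "battery stopped charging",
    "great battery",
    "screen died",
]

# Dense array, since Ward-linkage agglomerative clustering expects one.
X = TfidfVectorizer().fit_transform(texts).toarray()

# Two clusters is an arbitrary choice for this small sample.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

kmeans_score = silhouette_score(X, kmeans_labels)
hier_score = silhouette_score(X, hier_labels)

print("Silhouette Score (K-Means):", kmeans_score)
print("Silhouette Score (Hierarchical):", hier_score)
```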

  6. Evaluations

  7. Champion Model
