Wine-Description-NLP

Natural Language Processing of Wine Descriptions and Origin Classification with Logistic Regression in PySpark

Introduction:

The curated data contains two fields: Origin and Description. Origin is the wine's country of origin, recorded as either US or Non-US (that is, whether the wine was made in the US or elsewhere). Description holds the review or tasting note written by a wine critic.
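Since every field in the CSV arrives as a string, preparing the data starts with turning Origin into a numeric label and Description into tokens. The sketch below does this in plain Python so it runs without Spark. The sample rows, the US = 1.0 / Non-US = 0.0 mapping, and the helper names `tokenize` and `load_records` are illustrative assumptions, not the project's code. In PySpark the same steps would typically use `spark.read.csv(..., header=True)` (which yields string columns unless a schema is inferred), `StringIndexer`, and `Tokenizer` or `RegexTokenizer` from `pyspark.ml.feature`.

```python
import csv
import io
import re

# Hypothetical sample in the two-column layout described above (Origin, Description).
# These rows are illustrative only, not taken from the project's dataset.
SAMPLE_CSV = """Origin,Description
US,"Ripe cherry and vanilla, with a soft, oaky finish."
Non-US,"Crisp and mineral, with notes of green apple and citrus."
"""

# Assumed label encoding for a binary classifier: US -> 1.0, Non-US -> 0.0.
LABELS = {"US": 1.0, "Non-US": 0.0}


def tokenize(text):
    """Deliberately simple tokenizer: lowercase, then keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def load_records(csv_text):
    """Parse CSV text (every field is a string) into (label, tokens) pairs."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        origin = row["Origin"].strip()
        description = row["Description"].strip()
        if origin not in LABELS:
            continue  # skip rows with a missing or unexpected origin value
        if not description:
            continue  # skip rows with an empty description
        records.append((LABELS[origin], tokenize(description)))
    return records


records = load_records(SAMPLE_CSV)
```

One design note: `StringIndexer` assigns indices by label frequency by default (`stringOrderType="frequencyDesc"`), so whether US becomes 0.0 or 1.0 depends on the class balance; an explicit mapping like the one above removes that ambiguity when interpreting results.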

The objective is to analyze the data and determine whether the words used in a description can predict the wine’s country of origin (US vs. Non-US). This is an end-to-end case study: the project uses PySpark and MLlib’s implementations of ML algorithms from data preparation through model evaluation.
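Predicting origin from words is a binary text-classification problem. Logistic regression models the probability of the positive class as the logistic (sigmoid) function of a weighted sum of word features. The dependency-free sketch below trains such a model with batch gradient descent on sparse `{term: value}` dictionaries; the toy examples, learning rate, and epoch count are hypothetical. It only illustrates the model: MLlib's `pyspark.ml.classification.LogisticRegression` fits the same kind of model with its own optimizer and regularization settings (for example `regParam` and `elasticNetParam`).

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """P(label = 1.0 | features) for a sparse {term: value} feature dict."""
    z = bias + sum(weights.get(term, 0.0) * value for term, value in features.items())
    return sigmoid(z)


def train_logistic_regression(examples, epochs=200, learning_rate=0.5, l2=0.0):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    examples: list of (label, {term: value}) pairs, label in {0.0, 1.0}.
    l2: optional L2 penalty strength (0.0 disables it).
    """
    weights = {}
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = {}
        grad_b = 0.0
        for label, features in examples:
            # Gradient of log-loss w.r.t. the linear score is (prediction - label).
            error = predict_proba(weights, bias, features) - label
            grad_b += error
            for term, value in features.items():
                grad_w[term] = grad_w.get(term, 0.0) + error * value
        # Apply the averaged gradients once per epoch (batch update).
        bias -= learning_rate * grad_b / n
        for term, g in grad_w.items():
            w = weights.get(term, 0.0)
            weights[term] = w - learning_rate * (g / n + l2 * w)
    return weights, bias


# Hypothetical toy data: 1.0 = US, 0.0 = Non-US, binary word-presence features.
examples = [
    (1.0, {"oaky": 1.0, "ripe": 1.0}),
    (1.0, {"oaky": 1.0, "vanilla": 1.0}),
    (0.0, {"mineral": 1.0, "crisp": 1.0}),
    (0.0, {"mineral": 1.0, "citrus": 1.0}),
]
weights, bias = train_logistic_regression(examples)
```

Under the label mapping assumed here, a positive weight pushes a description toward US and a negative one toward Non-US, so the fitted weights can be read directly as evidence for one origin or the other.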

Steps followed and questions answered:

  1. Clear statement of the problem to solve.
  2. Statement/comment on data collection (in terms of what is needed) and scope.
  3. Preliminary decision about the ML algorithm you will use. Justify your decision to use this particular algorithm.
  4. Statement/comments on data preparation and clean-up. What do you need to do to get the data ready for the ML algorithm of your choice?
  5. Perform data prep and clean-up.
  6. Conduct exploratory data analysis (EDA). Comment on the data distribution.
  7. Comment on the data transformations required to get the data ready for input to the ML algorithm of your choice. Remember, all data available in the CSV document are of string data type.
  8. Complete all needed data transformations. Use the TF-IDF method to transform tokens into their respective numeric values.
  9. Apply the ML algorithm of your choice. Use an 80/20 split to create the training and test datasets.
  10. Train the model using the training dataset. Comment on the overall model fit using the evaluation metrics of your chosen ML algorithm. Comment on your findings and draw conclusions about the trained model’s accuracy and worthiness.
  11. Test/evaluate your trained model with the test dataset. Again, evaluate the model’s performance using the evaluation metrics available for your ML algorithm, comment on your findings, and draw conclusions about the model’s accuracy and worthiness in terms of solving the problem you were trying to solve.
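Step 8 asks for TF-IDF features. The plain-Python sketch below shows what the weighting computes, using raw term counts as term frequency and the smoothed IDF documented for Spark MLlib's `IDF`: idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing term t. It keeps an explicit vocabulary rather than hashing terms into buckets as `HashingTF` does (so it is closer to `CountVectorizer` followed by `IDF`), and the toy corpus is hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequencies(corpus):
    """Smoothed IDF per term: log((m + 1) / (df(t) + 1)), m = number of documents."""
    m = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # each term counts at most once per document
    return {term: math.log((m + 1) / (d + 1)) for term, d in df.items()}


def tf_idf(tokens, idf):
    """Sparse {term: raw count * idf} vector; terms absent from idf are dropped."""
    counts = Counter(tokens)
    return {term: n * idf[term] for term, n in counts.items() if term in idf}


# Hypothetical toy corpus of already-tokenized descriptions.
corpus = [
    ["ripe", "cherry", "oaky", "finish"],
    ["crisp", "mineral", "citrus", "finish"],
    ["ripe", "plum", "oaky", "oaky", "finish"],
]
idf = inverse_document_frequencies(corpus)
vectors = [tf_idf(doc, idf) for doc in corpus]
```

A term that occurs in every description (here `finish`) gets an IDF of log(1) = 0 and contributes nothing, while rarer terms such as `cherry` receive larger weights. Fitting the IDF on the training split only and reusing it on the test split keeps test data from leaking into the features.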
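Steps 9 to 11 call for an 80/20 split and evaluation metrics. The dependency-free sketch below illustrates both; `train_test_split` and `binary_metrics` are hypothetical helpers, and US is assumed to be the positive class (label 1). In PySpark, `DataFrame.randomSplit([0.8, 0.2], seed=...)` produces an approximately (not exactly) 80/20 split, and `BinaryClassificationEvaluator` (area under ROC by default) and `MulticlassClassificationEvaluator` (for example `accuracy` or `f1`) report the corresponding metrics.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows with a fixed seed and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(labels, predictions):
    """Accuracy, precision, recall and F1 for 0/1 labels; 1 is the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    total = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical usage: 10 row ids split 80/20, then metrics on toy predictions.
train, test = train_test_split(range(10))
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting precision and recall alongside accuracy matters if US and Non-US reviews are imbalanced, since a model that always predicts the majority class can still reach a high accuracy.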

Contributors: venkatasowjanyakoka
