food-ingredients-for-good

Plan

  1. Data discovery
    • Kaggle dataset
    • Recipe data
  2. Data model
    • fact
    • dim
      • UOM
  3. EDA
  4. Classifying ingredients
    • Clustering
      • Numeric features
      • Unstructured features
    • Deep learning from unstructured text
  5. Conversions
    • Adding this step for now
  6. Recipe comparison
  7. Data pipeline
    • preprocessing (cleanse and prepare data)
      • pre deployment
      • post deployment
    • predictions
    • consumption
  8. Refactor code?

Virtual environment

pip3 install virtualenv
virtualenv venv
source venv/bin/activate

virtual environment setup guide

EDA Questions

  1. What foods can be substituted for meats and still have the same amount of protein?
  2. How many groups of foods are there based on ingredients?
  3. Can we classify foods simply based on their ingredients? Does it make intuitive sense?
  4. What foods have the highest sugar, protein, fat, or calories?
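Questions 1 and 4 could be explored with a few lines of plain Python. The following is a minimal sketch, not the project's actual EDA code. The rows, the numeric values, the `Protein_g` and `Sugar_g` column names, and the `is_meat` flag are all hypothetical placeholders; only `Descrip` is named in this README.

```python
# Made-up rows for illustration only. These are NOT values from the dataset,
# and the column names other than "Descrip" are assumptions.
foods = [
    {"Descrip": "example meat A", "Protein_g": 30.0, "Sugar_g": 0.0, "is_meat": True},
    {"Descrip": "example meat B", "Protein_g": 26.0, "Sugar_g": 0.0, "is_meat": True},
    {"Descrip": "example plant food C", "Protein_g": 25.0, "Sugar_g": 1.0, "is_meat": False},
    {"Descrip": "example plant food D", "Protein_g": 9.0, "Sugar_g": 2.0, "is_meat": False},
    {"Descrip": "example sweetener E", "Protein_g": 0.3, "Sugar_g": 80.0, "is_meat": False},
]


def top_n(rows, column, n=3):
    """Return the n rows with the largest value in `column` (question 4)."""
    return sorted(rows, key=lambda row: row[column], reverse=True)[:n]


def protein_substitutes(rows, meat_row, tolerance=0.2):
    """Return non-meat rows whose protein is within `tolerance`
    (as a fraction) of the protein in `meat_row` (question 1)."""
    target = meat_row["Protein_g"]
    return [
        row
        for row in rows
        if not row["is_meat"]
        and abs(row["Protein_g"] - target) <= tolerance * target
    ]


highest_sugar = top_n(foods, "Sugar_g", n=1)
substitutes = protein_substitutes(foods, foods[0])
print([row["Descrip"] for row in highest_sugar])
print([row["Descrip"] for row in substitutes])
```

The 20% tolerance is an arbitrary choice for the sketch; "the same amount of protein" would need a definition that suits the analysis.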

Clustering

Each observation in the dataset has a unique identifier: a key, "NBD_No", and a text label for the ingredient, "Descrip". My hypothesis is that similar foods have similar nutritional values, so foods may fall into distinct groups based on those values.

I examine three clustering algorithms for this dataset:

  1. K-means
  2. Agglomerative Clustering
  3. DBSCAN
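To make the first algorithm in the list concrete, here is a minimal, dependency-free sketch of K-means (Lloyd's algorithm). It is an illustration only and not the implementation used in this project, which may rely on a library instead. The function name `kmeans`, the toy points, and the fixed starting centroids are assumptions chosen so the example is deterministic.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Cluster `points` with Lloyd's algorithm, starting from `centroids`.

    Returns (labels, centroids), where labels[i] is the index of the
    centroid closest to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return labels, centroids


# Toy two-feature data standing in for (scaled) nutritional values.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

With real nutritional data the features sit on very different scales (for example calories versus grams), so they would usually be standardized before any distance-based clustering.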

Result: Clustering observations by nutritional value has some utility, because some clusters contain very similar foods. The approach is not flawless, however: other clusters mix very different foods, which makes it hard to say what a cluster represents in terms of food. The silhouette score for each clustering algorithm also indicates that the clusters overlap rather than being distinct from one another.
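For reference, the silhouette score mentioned above can be computed as follows. This is a minimal sketch of the standard definition in plain Python, using toy points and a hypothetical function name `mean_silhouette`; it is not the code that produced the result reported here. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to the members of any other cluster, and the point's score is `(b - a) / max(a, b)`.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Values near 1 indicate compact, well-separated clusters; values near 0
    or below indicate overlapping clusters. Points in singleton clusters
    are scored 0.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: the same four points with a good and a poor labeling.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = mean_silhouette(points, [0, 0, 1, 1])
overlapping = mean_silhouette(points, [0, 1, 0, 1])
print(separated, overlapping)
```

The first labeling groups nearby points together and scores close to 1, while the second splits neighbors across clusters and scores below 0, which is the kind of overlap the result above describes.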

Next Steps:

  • Keep a baseline clustering method available and iterate to improve
  • Examine other ways to assign foods to a group or give them a label using their nutritional value.
    • Deep learning on a subset of manually labeled ingredients
      • Decide on an a priori labeling method for food ingredients

Product

Directories

  • data

References

https://www.kaggle.com/datasets/thedevastator/now-with-more-nutrients
