amaranth's People

Contributors

dependabot[bot], ryanulep, timmy-ch, tommylau-exe

amaranth's Issues

Save model after training

Currently a trained model exists only in the memory of the application. This means that once the application stops, the model is lost. If we want the model to be distributed or trained more later, there should be functionality to save it to disk.

Update train.py path names to navigate from project root

Currently, files such as train.py define path names relative to the current file's location. For the sake of consistency and simplicity, these should be changed to path names relative to the project's root.
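
One possible way to resolve paths from the project root is sketched below. It assumes the root can be recognized by a marker file (requirements.txt is used here as a guess); the helper names find_project_root and from_root are hypothetical and not part of the project.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "requirements.txt") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {start}")


def from_root(root: Path, *parts: str) -> Path:
    """Build a path from segments given relative to the project root."""
    return root.joinpath(*parts)
```

Inside train.py this could be called as find_project_root(Path(__file__)), so every other path is built with from_root instead of depending on where the file happens to live.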

Identify key words model is recognizing

ML models are typically a black box. However, there are a number of tactics for probing their inner workings. One idea is to check which words most influence the model's output, as a sanity check.
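
A minimal sketch of one such probe, leave-one-word-out scoring, is shown below. It is not the project's method; the score callable stands in for whatever the real model exposes, and toy_score with its weights is a hypothetical placeholder used only to make the example runnable.

```python
from typing import Callable, Dict, List


def word_influence(
    words: List[str], score: Callable[[List[str]], float]
) -> Dict[str, float]:
    """Return, per word, how much the score drops when that word is removed.

    If a word occurs more than once, only its last occurrence is reported.
    """
    baseline = score(words)
    influence: Dict[str, float] = {}
    for i, word in enumerate(words):
        without = words[:i] + words[i + 1:]
        influence[word] = baseline - score(without)
    return influence


# Hypothetical stand-in for the real model: sums fixed per-word weights.
TOY_WEIGHTS = {"fried": 0.6, "salad": -0.4}


def toy_score(words: List[str]) -> float:
    return sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
```

Sorting the returned dictionary by absolute value gives a quick list of the words the model leans on most for a given dish name.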

Add intermediate layers in ML model

Expected Behavior

Model takes in word embedding, processes it with intermediate layers, and sends output to nodes in final layer.

Actual Behavior

Model takes in word embedding and sends it directly to output nodes in final layer.

Define HTML structure of Grubhub.com

Inspecting the DOM structure of this webpage may allow us to directly access dish names by traversing the DOM in a specific way. This would allow us to classify dish names on that site.

Update requirements.txt

Some Python dependencies are implicit and were left out of the requirements.txt file. These should be added back in before the project is wrapped up so that it runs easily in different environments.

Integrate new ml/amaranth_lib with ml/main

Writing tests for ml/amaranth_lib changed the behavior of some functions, and even resulted in functions being added and removed altogether. As a result, ml/main now calls functions that don't exist or that may no longer behave as expected.

Add dtypes to CSV reading code

Expected Behavior

CSV reading code tells TensorFlow what datatypes to expect for each column of every CSV file.

Actual Behavior

CSV reading code doesn't provide this information to TensorFlow, which slows down CSV reading and uses memory unnecessarily.

Steps to Reproduce the Problem

  1. Switch branch to model-dev
  2. Run ml/main.py
  3. Observe warning sys:1: DtypeWarning: Columns (9) have mixed types.Specify dtype option on import or set low_memory=False.
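
The DtypeWarning in step 3 is raised by pandas.read_csv, so the sketch below assumes the CSV files are loaded through pandas. The column names, the sample data, and the chosen types are hypothetical and do not reflect the project's actual schema.

```python
import io

import pandas as pd

# Hypothetical two-column CSV; the second column mixes numbers and text,
# which is the kind of data that forces pandas to guess column types.
csv_text = "dish,calories\nkale salad,120\nmystery stew,unknown\n"

# Declaring dtypes up front means pandas does not have to infer them.
df = pd.read_csv(io.StringIO(csv_text), dtype={"dish": str, "calories": str})

# Convert to numbers explicitly; values that cannot be parsed become NaN.
df["calories"] = pd.to_numeric(df["calories"], errors="coerce")
```

Reading the mixed column as text and converting it afterwards keeps the handling of bad values explicit instead of leaving it to type inference.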

Reformat project to fit the Python Package format

Example guide can be found on Python's site here.

This change would put the code into modules, which it attempts to do already. This change would be beneficial because type checking tools and the Python runtime currently disagree on how to interpret the project's format. This makes it difficult to automate code checks, and potentially makes it more difficult for people to understand this project.

Serialize Tokenization process

Although the ML model can be easily serialized using TensorFlow functions, the TextVectorization layer we were using previously cannot be. A new method of serializing this process must be found, and it must be compatible with JavaScript.
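
One candidate approach, sketched here under assumptions rather than taken from the project, is to keep tokenization as a plain word-to-id vocabulary and export it as JSON, which JavaScript can load with JSON.parse. The function names and the reserved ids for padding and unknown words are hypothetical choices.

```python
import json
from typing import Dict, List

# Assumed convention: id 0 is reserved for padding, id 1 for unknown words.
PAD, UNK = 0, 1


def build_vocab(dish_names: List[str]) -> Dict[str, int]:
    """Assign each distinct word an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for name in dish_names:
        for word in name.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # skip the two reserved ids
    return vocab


def tokenize(name: str, vocab: Dict[str, int]) -> List[int]:
    """Map a dish name to word ids, using UNK for words not in the vocabulary."""
    return [vocab.get(word, UNK) for word in name.lower().split()]


def dump_vocab(vocab: Dict[str, int]) -> str:
    """Serialize the vocabulary as a JSON object of word -> id."""
    return json.dumps(vocab, sort_keys=True)
```

Because the vocabulary is plain data, the same lookup can be reimplemented in a few lines on the JavaScript side, provided both sides apply identical lowercasing and splitting rules.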

Refactor NLP Pipeline to use Keras TextVectorization

The current implementation of the model's NLP pipeline is entirely custom and cannot be serialized in its current form. By transitioning to the Keras TextVectorization layer, although it is experimental, we'll be able to easily serialize the entirety of the model's logic. This allows us to reload the ML model in other applications, such as the upcoming Chrome extension.

Split helper functions off from ml/main.py

The helper functions that are used to create the model in ml/main.py should be separated for the sake of readability and encapsulation. These functions also need to be tested to ensure correctness.

Setup repo for Amaranth Chrome Extension

Prepare the repo for development on the front-end Chrome extension. This includes creating the extension's development directory, possibly creating the manifest.json, organizing tests, etc.

Create dish labeling pipeline

There are multiple steps involved in labeling a dish with our ML model. All of them must be implemented in JavaScript to get the same results as the Python ML model.

  1. Remove special characters from dish name
  2. Convert dish name to lowercase
  3. Split dish name on spaces
  4. Feed dish name into model
  5. Take softmax of outputs to get dish label
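
The steps above can be sketched on the Python side as a reference to compare a JavaScript port against. This is an illustration under assumptions: "special characters" is taken to mean anything other than ASCII letters, digits, and spaces, the label names and their order are guesses, and toy_model is a hypothetical stand-in for the trained model.

```python
import math
import re
from typing import Callable, List, Sequence

LABELS = ["low calorie", "high calorie"]  # assumed label names and order


def preprocess(dish_name: str) -> List[str]:
    """Steps 1-3: strip special characters, lowercase, split on spaces."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", dish_name)
    return cleaned.lower().split()


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted first for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_dish(
    dish_name: str, model: Callable[[List[str]], Sequence[float]]
) -> str:
    """Steps 4-5: run the model, take softmax, return the most probable label."""
    probs = softmax(model(preprocess(dish_name)))
    return LABELS[probs.index(max(probs))]


def toy_model(words: List[str]) -> List[float]:
    """Hypothetical stand-in for the trained model: returns two raw scores."""
    return [0.0, 2.0] if "fried" in words else [1.0, 0.0]
```

Running the same dish names through both implementations and comparing the outputs of preprocess is a cheap way to catch mismatches before the model is involved.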

Update README.md

All well-documented open-source projects have a descriptive README.md file. This is important to introduce users to the repo and briefly state its purpose. One solid template can be found here.

Increase training epochs

Currently the model goes through only one epoch of training. This should be increased to ensure it learns the data set accurately.

Add interactive interface to Chrome extension

Allow the user to type in arbitrary strings and have them classified as high or low calorie. This could happen in a little settings window for the Chrome extension, allowing for easy and quick access.

Create prototype calorie labels

In order to verify that the model is working on web pages we need identifiers. The encapsulation of the HTML fragments is yet to be determined as well.
