googleinterns / amaranth

License: Apache License 2.0

Python 73.25% JavaScript 23.91% Makefile 1.38% CSS 1.46%

amaranth's Issues

Save model after training

Currently a trained model exists only in the memory of the application. This means that once the application stops, the model is lost. If we want the model to be distributed or trained more later, there should be functionality to save it to disk.

Update train.py path names to navigate from project root

Currently, files such as train.py define paths relative to the file's own location. For the sake of consistency and simplicity, these should be defined relative to the project's root instead.
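One way to anchor paths at the project root is to walk upward from the current file until a known marker file is found. The sketch below is only an illustration: the function name `find_project_root` is hypothetical, and it assumes requirements.txt sits in the project's root directory.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "requirements.txt") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Assumption: the marker file lives in the project root.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found at or above {start}")


# Possible usage inside a module such as train.py (directory name is hypothetical):
#   PROJECT_ROOT = find_project_root(Path(__file__).parent)
#   DATA_DIR = PROJECT_ROOT / "data"
```

With this approach every module builds its paths from the same root, so moving a file within the project does not change where it looks for data.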

Add functionality to give custom input to model (interactive mode)

This enhancement will make it possible to see the ML model's output for any given string of text. This would be incredibly useful for sanity checks and future interactivity.
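A minimal sketch of what an interactive mode could look like, assuming the trained model can be wrapped in a function that maps a string to a label. The names `interactive_session` and `predict` are hypothetical, as are the "quit" and "exit" commands.

```python
from typing import Callable, Iterable, Iterator


def interactive_session(
    predict: Callable[[str], str],
    lines: Iterable[str],
) -> Iterator[str]:
    """Yield one formatted prediction per non-empty input line.

    `predict` stands in for a call into the trained model.
    The session ends at end of input or on "quit" / "exit".
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if text.lower() in {"quit", "exit"}:
            break
        yield f"{text!r} -> {predict(text)}"


# Possible usage from a terminal:
#   import sys
#   for result in interactive_session(my_predict, sys.stdin):
#       print(result)
```

Taking the input lines as a parameter rather than calling `input()` directly keeps the loop easy to test with a plain list of strings.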

Identify key words model is recognizing

ML models are typically a black box. However, there are a number of tactics for probing their inner workings. One idea is to check which words most influence the model's output, as a sanity check.
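One common tactic for this is leave-one-out (occlusion) scoring: remove each word in turn and measure how much the model's score changes. The sketch below is not the project's method. It assumes a hypothetical `score` function that maps a list of words to a single number, such as the probability of one label.

```python
from typing import Callable, List, Tuple


def word_importance(
    score: Callable[[List[str]], float],
    words: List[str],
) -> List[Tuple[str, float]]:
    """Rank words by how much the score drops when each one is removed."""
    baseline = score(words)
    drops = []
    for i, word in enumerate(words):
        without = words[:i] + words[i + 1:]
        drops.append((word, baseline - score(without)))
    # Largest drop first: these words influence the score the most.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a model, used only for illustration.
def toy_score(words: List[str]) -> float:
    return 1.0 if "tofu" in words else 0.2
```

Calling `word_importance(toy_score, ["spicy", "tofu", "bowl"])` ranks "tofu" first, since it is the only word whose removal changes the toy score.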

Add intermediate layers in ML model

Expected Behavior

Model takes in word embedding, processes it with intermediate layers, and sends output to nodes in final layer.

Actual Behavior

Model takes in word embedding and sends it directly to output nodes in final layer.

Define HTML structure of Grubhub.com

Inspecting the DOM structure of this webpage may allow us to directly access dish names by traversing the DOM in a specific way. This would allow us to classify dish names on that site.

Update requirements.txt

Some Python dependencies are implicit and were left out of the requirements.txt file. These should be added back in before the project is wrapped up so that the project runs easily in different environments.

Integrate new ml/amaranth_lib with ml/main

Writing tests for ml/amaranth_lib modified the functionality of some functions, and even resulted in the addition and removal of functions altogether. As a result, ml/main now calls functions that don't exist or that may no longer function as expected.

Relocate Amaranth ML tests to be grouped with source

This will make it easier to separate the current Python tests from future TypeScript tests for the Amaranth Chrome Extension.

Add dtypes to CSV reading code

Expected Behavior

CSV reading code tells TensorFlow what datatypes to expect for each column of every CSV file.

Actual Behavior

CSV reading code doesn't provide this information to TensorFlow, which slows down CSV reading and uses memory unnecessarily.

Steps to Reproduce the Problem

Switch branch to model-dev
Run ml/main.py
Observe warning sys:1: DtypeWarning: Columns (9) have mixed types.Specify dtype option on import or set low_memory=False.
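The warning text above matches the `DtypeWarning` that pandas emits from `read_csv`, so the sketch below assumes the CSV files are read through pandas. The column names, sample data, and the helper `read_dishes` are hypothetical, since the real schema is not given here.

```python
import io

import pandas as pd

# Hypothetical sample with a mixed-looking column ("12" vs "A7").
CSV_TEXT = "dish_name,code\nPad Thai,12\nTofu Bowl,A7\n"

# Declaring a dtype per column means pandas does not have to infer it.
COLUMN_DTYPES = {"dish_name": str, "code": str}


def read_dishes(source) -> pd.DataFrame:
    """Read a CSV from a path or file-like object using explicit dtypes."""
    return pd.read_csv(source, dtype=COLUMN_DTYPES)


df = read_dishes(io.StringIO(CSV_TEXT))
```

With the dtypes declared, the "code" column is read as text throughout instead of being guessed chunk by chunk.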

Reformat project to fit the Python Package format

Example guide can be found on Python's site here.

This change would put the code into modules, which the project already attempts to do. It would be beneficial because type checking tools and the Python runtime currently disagree on how to interpret the project's format. This makes it difficult to automate code checks, and potentially makes it more difficult for people to understand this project.

Serialize Tokenization process

Although the ML model can be easily serialized using TensorFlow functions, the TextVectorization layer we were using previously cannot be. A new method of serializing this process must be found, and it must be compatible with JavaScript.
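One JavaScript-compatible option is to store the token-to-index vocabulary as JSON, which both Python and JavaScript can parse natively. This is only a sketch of that idea, not the approach the project settled on. The function names and the index layout (0 for padding, 1 for unknown words) are assumptions.

```python
import json
from typing import Dict, List

OOV_INDEX = 1  # assumption: 0 is reserved for padding, 1 for unknown words


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct token an index, starting at 2, in first-seen order."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 2
    return vocab


def dump_vocab(vocab: Dict[str, int]) -> str:
    """Serialize the vocabulary to a JSON string."""
    return json.dumps(vocab, sort_keys=True)


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to indices, falling back to OOV_INDEX for unknown words."""
    return [vocab.get(token, OOV_INDEX) for token in tokens]
```

On the JavaScript side, the same file could be loaded with `JSON.parse` and used as a plain object lookup, which keeps the two tokenizers in agreement as long as both apply the same text cleaning first.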

Refactor NLP Pipeline to use Keras TextVectorization

The current implementation of the model's NLP pipeline is entirely custom, and cannot be serialized in its current form. By transitioning to the Keras TextVectorization layer, although it is experimental, we'll be able to easily serialize the entirety of the model's logic. This allows us to reload the ML model in other applications, such as the upcoming Chrome extension.

Complete Amaranth Chrome Extension prototype

Finish up the JS coding, and have a working sample to demo for my final presentation.

Split helper functions off from ml/main.py

The helper functions that are used to create the model in ml/main.py should be separated for the sake of readability and encapsulation. These functions also need to be tested to ensure correctness.

Setup repo for Amaranth Chrome Extension

Prepare the repo for development on the front-end Chrome extension. This includes creating the extension's development directory, possibly creating the manifest.json, organizing tests, etc.

Visualize embeddings in TensorFlow Embedding Projector

TensorFlow has an Embedding Projector tool that could be useful for inspecting what the model is learning. There is a tutorial on how to export an embedding for this tool here.

Create dish labeling pipeline

There are multiple steps involved in labeling a dish with our ML model. All of them must be implemented in JavaScript to get the same results as the Python ML model.

Remove special characters from dish name
Convert dish name to lowercase
Split dish name on spaces
Feed dish name into model
Take softmax of outputs to get dish label
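The steps above can be sketched on the Python side as a reference for the JavaScript port. This is an illustration under assumptions: "special characters" is taken to mean anything other than ASCII letters, digits, and spaces, repeated spaces are collapsed when splitting, and `model` is a hypothetical callable that returns one raw score per label.

```python
import math
import re
from typing import Callable, List, Sequence


def preprocess(dish_name: str) -> List[str]:
    """Remove special characters, lowercase, and split on spaces."""
    # Assumption: keep only ASCII letters, digits, and spaces.
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", dish_name)
    # str.split() with no argument also drops empty strings from repeated spaces.
    return cleaned.lower().split()


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_dish(
    dish_name: str,
    model: Callable[[List[str]], Sequence[float]],
    labels: Sequence[str],
) -> str:
    """Run the full pipeline and return the label with the highest probability."""
    probs = softmax(model(preprocess(dish_name)))
    return labels[probs.index(max(probs))]
```

Keeping each step as its own small function makes it easier to compare intermediate results between the Python and JavaScript versions when their outputs disagree.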
