googleinterns / amaranth
License: Apache License 2.0
Currently a trained model exists only in the memory of the application. This means that once the application stops, the model is lost. If we want to distribute the model or continue training it later, there should be functionality to save it to disk.
Currently, path names in files such as train.py are defined relative to the current file's location. For consistency and simplicity, they should instead be defined relative to the project's root.
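A minimal sketch of resolving paths from the project root, assuming the root can be recognized by a marker file. `requirements.txt` is used as the marker here because the repository keeps one at its root; the helper name `find_project_root` and the `data` directory in the usage comment are hypothetical.

```python
from pathlib import Path

# Hypothetical marker: any file that exists only at the repository root works.
ROOT_MARKER = "requirements.txt"


def find_project_root(start: Path, marker: str = ROOT_MARKER) -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / marker).exists():
            return directory
    raise FileNotFoundError(f"No {marker!r} found above {start}")


# Usage inside a script such as ml/train.py (paths are illustrative):
# PROJECT_ROOT = find_project_root(Path(__file__).parent)
# DATA_DIR = PROJECT_ROOT / "data"
```

Every script then builds its paths from the same anchor, so moving a file deeper into the tree does not change which data files it reads.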
This enhancement will make it possible to see the ML model's output for any given string of text. This would be incredibly useful for sanity checks and future interactivity.
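A minimal sketch of such an interactive mode, assuming the model is wrapped in a function that maps a string to a score between 0 and 1. `interactive_loop`, the `classify` callable, and the 0.5 threshold are all hypothetical; the high/low calorie labels follow the Chrome extension issue further down.

```python
import sys
from typing import Callable, Iterable, TextIO


def interactive_loop(
    classify: Callable[[str], float],
    lines: Iterable[str],
    out: TextIO = sys.stdout,
    threshold: float = 0.5,  # assumed decision boundary
) -> None:
    """Print the model's score and label for every non-empty input line."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank input
        score = classify(text)
        label = "high calorie" if score >= threshold else "low calorie"
        out.write(f"{text!r}: {score:.3f} ({label})\n")
```

In practice `lines` would be `sys.stdin`, so each line the user types is classified as it is entered.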
ML models are typically a black box. However, there are a number of tactics for probing their inner workings. One idea is to check which words most influence the model's output, as a sanity check.
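One such tactic is leave-one-out (occlusion) attribution: delete each word in turn and measure how much the model's score changes. The sketch below assumes a `score` function that maps raw text to a number; it is a generic technique, not something the project already implements.

```python
from typing import Callable, List, Tuple


def word_importance(
    text: str, score: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Rank words by how much removing each one changes the model's score.

    The drop in score when a word is deleted is treated as that word's influence.
    """
    words = text.split()
    baseline = score(text)
    deltas = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        deltas.append((word, baseline - score(reduced)))
    # Most influential (largest absolute change) first.
    return sorted(deltas, key=lambda pair: abs(pair[1]), reverse=True)
```

A word with a large positive delta pushes the score up; a word whose removal barely changes anything is one the model largely ignores.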
The model takes in a word embedding, processes it with intermediate layers, and sends the output to the nodes in the final layer.
The model takes in a word embedding and sends it directly to the output nodes in the final layer.
Inspecting the DOM structure of this webpage may allow us to directly access dish names by traversing the DOM in a specific way. This would allow us to classify dish names on that site.
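A rough sketch using Python's built-in `html.parser`, assuming dish names sit inside elements with a known CSS class. The class name `dish-name` is a placeholder; the real selector has to come from inspecting the page.

```python
from html.parser import HTMLParser
from typing import List

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "wbr"}


class DishNameParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.current: List[str] = []
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:  # closed the matching element
                self.names.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)
```

Usage: `parser = DishNameParser("dish-name"); parser.feed(html); parser.names`. The Chrome extension would do the equivalent with `querySelectorAll` in the browser.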
There are some Python dependencies that are implicit, and were left out of the requirements.txt file. These should be added back in before the project is wrapped up to ensure different environments can run the project easily.
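One way to find the missing entries is to scan the source for imports and drop the standard-library ones. This sketch requires Python 3.10+ for `sys.stdlib_module_names`; import names do not always match package names on PyPI (for example `sklearn` is installed as `scikit-learn`), so the result still needs a manual check against requirements.txt.

```python
import ast
import sys
from pathlib import Path
from typing import Iterable, Set


def imported_top_level_modules(source: str) -> Set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 means a relative import of the project's own code
            modules.add(node.module.split(".")[0])
    return modules


def third_party_imports(files: Iterable[Path]) -> Set[str]:
    """Imports across `files` that are not part of the standard library."""
    found: Set[str] = set()
    for path in files:
        found |= imported_top_level_modules(path.read_text())
    return found - set(sys.stdlib_module_names)
```

Running `third_party_imports(Path("ml").rglob("*.py"))` and diffing the result against requirements.txt would list candidates to add. Note that the project's own top-level packages, when imported absolutely, also show up and need to be filtered out by hand.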
Writing tests for ml/amaranth_lib modified the functionality of some functions, and even resulted in the addition and removal of functions altogether. As a result, ml/main now calls functions that no longer exist or may no longer behave as expected.
This will make it easier to separate the current Python tests from future Typescript tests for the Amaranth Chrome Extension.
CSV reading code tells TensorFlow what datatypes to expect for each column of every CSV file.
CSV reading code doesn't provide this information to TensorFlow, which slows down CSV reading and uses unnecessary memory.
sys:1: DtypeWarning: Columns (9) have mixed types.Specify dtype option on import or set low_memory=False.
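The `DtypeWarning` above is raised by pandas' `read_csv` when it infers different types for one column across the chunks it reads. Passing an explicit `dtype` mapping avoids the guessing. The column names and types below are hypothetical stand-ins for the project's real CSV schema.

```python
import pandas as pd

# Hypothetical schema; the real column names come from the project's CSV files.
COLUMN_TYPES = {
    "dish_name": str,
    "calories": float,
}


def read_dishes(source) -> pd.DataFrame:
    """Read a CSV with an explicit dtype per column instead of letting pandas guess."""
    return pd.read_csv(source, dtype=COLUMN_TYPES)
```

A column with genuinely mixed content, like column 9 in the warning, can be declared as `str` and converted afterwards.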
An example guide can be found on Python's site here.
This change would put the code into proper modules, which the project already attempts to do. It would be beneficial because type-checking tools and the Python runtime currently disagree on how to interpret the project's format. This makes it difficult to automate code checks, and potentially makes it harder for people to understand this project.
Although the ML model can easily be serialized using TensorFlow functions, the TextVectorization layer we were using previously cannot be. A new method of serializing this step must be found, and it must be compatible with JavaScript.
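One possible approach, sketched under the assumption that the JavaScript side only needs the token list: export the vocabulary as a plain JSON array and rebuild the lookup table in the browser. With a Keras layer the list could come from `TextVectorization.get_vocabulary()`; the standalone `build_vocabulary` below is a hypothetical substitute that follows the same index convention.

```python
import json
from collections import Counter
from typing import Iterable, List


def build_vocabulary(texts: Iterable[str], max_tokens: int) -> List[str]:
    """Index 0 is padding and index 1 is out-of-vocabulary (the convention
    Keras' TextVectorization uses); the rest are the most frequent tokens."""
    counts = Counter(token for text in texts for token in text.lower().split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common


def save_vocabulary(vocab: List[str], path: str) -> None:
    """Write the vocabulary as a JSON array a JavaScript client can load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
```

On the JavaScript side, the parsed array's positions are the token ids, so a `Map` from token to index is enough to reproduce the lookup.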
The current implementation of the model's NLP pipeline is entirely custom and cannot be serialized in its current form. By transitioning to the Keras TextVectorization layer (although it is experimental), we'll be able to easily serialize the entirety of the model's logic. This allows us to reload the ML model in other applications, such as the upcoming Chrome extension.
Finish up the JS coding and have a working sample to demo for my final presentation.
The helper functions that are used to create the model in ml/main.py should be separated for the sake of readability and encapsulation. These functions also need to be tested to ensure correctness.
Prepare the repo for development on the front-end Chrome extension. This includes creating the extension's development directory, possibly creating the manifest.json, organizing tests, etc.
TensorFlow has an Embedding Projector tool that could be useful for inspecting what the model is learning. There is a tutorial on how to export an embedding for this tool here.
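The Embedding Projector accepts two tab-separated files: one row of vector components per word, and a matching file of word labels. A minimal export sketch, assuming `weights` is the embedding matrix (in Keras, typically `embedding_layer.get_weights()[0]`) and that row 0 is the padding token, which is skipped. The function name and default file names are illustrative.

```python
from typing import Sequence


def export_for_projector(
    vocab: Sequence[str],
    weights: Sequence[Sequence[float]],
    vectors_path: str = "vectors.tsv",
    metadata_path: str = "metadata.tsv",
) -> None:
    """Write one embedding row per word to vectors_path and the matching
    word to metadata_path, keeping the two files line-aligned."""
    if len(vocab) != len(weights):
        raise ValueError("vocab and weights must have the same length")
    with open(vectors_path, "w", encoding="utf-8") as vec_f, open(
        metadata_path, "w", encoding="utf-8"
    ) as meta_f:
        for index, (word, vector) in enumerate(zip(vocab, weights)):
            if index == 0:
                continue  # skip the padding token
            vec_f.write("\t".join(str(x) for x in vector) + "\n")
            meta_f.write(word + "\n")
```

Both files can then be loaded in the projector, where nearby points show which words the model treats as similar.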
There are multiple steps involved in labeling a dish with our ML model. All of them must be implemented in JavaScript to reproduce the Python ML model's results.
All well-documented open-source projects have a descriptive README.md file. This is important for introducing users to the repo and briefly stating its purpose. One solid template can be found here.
Currently the model goes through only one epoch of training. This should be increased so that it learns the data set more accurately.
There is currently no public documentation to run any of the code stored in this repository. One way to make it more accessible is to create a Makefile to easily run different parts of the codebase.
The shift to a new serialized ML model structure broke the interface through which Amaranth's interactive mode worked. This should be fixed by implementing text pre-processing separately from the ML model itself.
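A sketch of pre-processing kept outside the model, assuming the model consumes fixed-length integer sequences with index 0 as padding and index 1 as the out-of-vocabulary token. `standardize` approximates the Keras default of lowercasing and stripping punctuation before splitting on whitespace; it is not guaranteed to match TensorFlow byte for byte, and both function names are hypothetical.

```python
import string
from typing import Dict, List

# Translation table that deletes ASCII punctuation.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def standardize(text: str) -> str:
    """Lowercase and strip ASCII punctuation."""
    return text.lower().translate(_STRIP_PUNCTUATION)


def vectorize(text: str, vocab: List[str], length: int) -> List[int]:
    """Map tokens to vocabulary indices, then pad or truncate to `length`."""
    index: Dict[str, int] = {token: i for i, token in enumerate(vocab)}
    ids = [index.get(token, 1) for token in standardize(text).split()]  # 1 = OOV
    return (ids + [0] * length)[:length]  # 0 = padding
```

Because these functions have no TensorFlow dependency, the interactive mode can call them first and pass the resulting ids to the serialized model, and the same steps can be ported to JavaScript one to one.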
Allow the user to type in arbitrary strings and have them classified as high or low calorie. This could happen in a little settings window for the Chrome extension, allowing for easy and quick access.
JavaScript is easy to mess up, so it's important to use a linter to catch simple mistakes. Google has an ESLint style guide, so ESLint would probably be the easiest linter to use.
To verify that the model is working on web pages, we need identifiers. How the HTML fragments will be encapsulated is also yet to be determined.
Internal docs exist for the model, but putting them in the same place as code, issues, project tracking, etc. would make them more visible to me as I'm working, and more visible to anyone else visiting the repository.