jrsmith3 / minimum_sugar Goto Github PK

Python 100.00%

minimum_sugar's People

Contributors

Watchers

Forkers

deanmalmgren

minimum_sugar's Issues

Refactor `menu_histogram` to handle SQLite

Flatten structure of data

The menu item data is presently held in a data structure with unnecessary nesting. It should instead be a list of dicts, each dict representing an individual menu item. The present organization by restaurant needlessly duplicates data and adds complexity.

This new, flatter structure requires that I write new tools to filter the data.
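
A minimal sketch of the flattening and of one filtering tool, written for Python 3. It assumes the nested structure is a dict keyed by restaurant name whose values are lists of menu-item dicts; that shape is a guess, since the issue does not describe it. `flatten_menu_data` and `filter_items` are hypothetical names, not functions from the repo, and the sample records are made up.

```python
def flatten_menu_data(nested):
    """Flatten {restaurant_name: [item, ...]} into one list of item dicts."""
    flat = []
    for restaurant, items in nested.items():
        for item in items:
            record = dict(item)  # copy so the nested input is not mutated
            # Keep the restaurant on the item itself (assumed field name).
            record.setdefault("brand_name", restaurant)
            flat.append(record)
    return flat


def filter_items(items, key, value):
    """Return the items whose `key` field equals `value`."""
    return [item for item in items if item.get(key) == value]


# Made-up sample data in the assumed nested shape.
nested = {
    "Wendy's": [{"item_name": "Placeholder Burger", "nf_sugars": 9}],
    "Taco Bell": [
        {"item_name": "Placeholder Taco", "nf_sugars": 2},
        {"item_name": "Placeholder Burrito", "nf_sugars": 5},
    ],
}

flat = flatten_menu_data(nested)
taco_bell_items = filter_items(flat, "brand_name", "Taco Bell")
```

With the restaurant stored on every item, any field can be filtered the same way, so no restaurant-specific lookup is needed.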

Check categorization of menu items using reported values

Some of the text reports generated in report.ipynb have results that indicate a problem in categorization. For example, in the "Maximum sugar" subsection the code

wendys_menu_items = minimum_sugar.filter_menu_items(menu_data, "brand_name", "Wendy's")
wendys_entree_items = minimum_sugar.filter_menu_items(wendys_menu_items, "menu_category", "entree")
wendys_high_sugar_menu_items = [menu_item for menu_item in wendys_entree_items if menu_item["nf_sugars"] > 18]
for menu_item in wendys_high_sugar_menu_items:
    print menu_item["item_name"] + ":", menu_item["nf_sugars"]

prints the result "Double Chocolate Chip Cookie: 28". Clearly that menu item is mis-categorized.

Separate non-entree menu items

I only care about the sugar content of the entree menu items for any particular restaurant. Thus I should separate the following categories:

beverage
condiment
side order
dessert
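
One way to do the separation, sketched for Python 3 under the assumption that each menu item dict carries a `menu_category` field holding one of the labels above. `split_entrees` is a hypothetical helper, and every sample record except the cookie mentioned in the categorization issue is a placeholder.

```python
NON_ENTREE_CATEGORIES = {"beverage", "condiment", "side order", "dessert"}


def split_entrees(menu_items):
    """Split items into (entrees, non_entrees) by their menu_category.

    Anything not in one of the four non-entree categories is treated as an
    entree, including items with no category at all.
    """
    entrees, non_entrees = [], []
    for item in menu_items:
        if item.get("menu_category") in NON_ENTREE_CATEGORIES:
            non_entrees.append(item)
        else:
            entrees.append(item)
    return entrees, non_entrees


sample_items = [
    {"item_name": "Placeholder Burger", "menu_category": "entree"},
    {"item_name": "Placeholder Cola", "menu_category": "beverage"},
    {"item_name": "Double Chocolate Chip Cookie", "menu_category": "dessert"},
]

entrees, non_entrees = split_entrees(sample_items)
```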

Tag and release

I am nearing the point where I can release this report on my blog a la #11. I will tag the commit that gets posted to the blog using the YYYYMMDD rubric (I don't think semver applies here). Additionally, I need to tag a commit for which the notebook cells containing plotting directives have been executed and the plots generated.

Post report to blog on jrsmith3.github.io

Once #10 is closed, post the report to my blog.

Note the url of this repo in `report.ipynb`

Eventually I am going to post report.ipynb on my blog (cf. #11). A link to this minimum_sugar repo should appear in that file so that people can see the source.

Plot histogram of sugar data

I can plot histograms of the sugar content of the menu items. These would probably make a nice visualization.

Separate report ipynb from data wrangling ipynb

The information in report.ipynb as of 36b5061 contains both data wrangling code and report copy/code. The data wrangling component should be separated into its own notebook.

Collect list of all restaurants and corresponding UIDs Nutritionix has

In closing issue #1, I got a small subset of the restaurant names and corresponding UIDs contained in the Nutritionix database. I should get the UIDs for all of the restaurants in the database.

Functionality to grab all of the nutrition data for all of a restaurant's menu items

I need to be able to fetch a list of menu items given an arbitrary restaurant UID; each item in that list should contain all of the nutritional data available.

Refactor `menu_histogram` to use pyplot.hist

See this SO question for an example of using pyplot.hist as opposed to the pyplot.bar I was using.

Download data I need

I want a local copy of this data so I don't have to keep hitting Nutritionix's server.

Handle duplicate entries as records are added to the database

As noted in #20, the Nutritionix API sometimes returns duplicate menu items. These duplicates need to be handled before attempting to add items to the SQLite database.
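
A minimal sketch of one way to handle this, written for Python 3: make `item_id` the primary key and let SQLite skip repeats with `INSERT OR IGNORE`. The table name, the column set, and the helper name `insert_unique` are assumptions for illustration, and the sample records are made up.

```python
import sqlite3


def insert_unique(conn, menu_items):
    """Insert menu items, skipping any item_id already in the table.

    Returns the number of rows actually added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS menu_items ("
        "item_id TEXT PRIMARY KEY, item_name TEXT, nf_sugars REAL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM menu_items").fetchone()[0]
    # OR IGNORE silently drops rows that would violate the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO menu_items (item_id, item_name, nf_sugars) "
        "VALUES (:item_id, :item_name, :nf_sugars)",
        menu_items,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM menu_items").fetchone()[0]
    return after - before


# Made-up sample records; the second and third share an item_id.
sample_items = [
    {"item_id": "a1", "item_name": "Placeholder Burger", "nf_sugars": 9},
    {"item_id": "b2", "item_name": "Placeholder Wrap", "nf_sugars": 4},
    {"item_id": "b2", "item_name": "Placeholder Wrap", "nf_sugars": 4},
]

conn = sqlite3.connect(":memory:")
added = insert_unique(conn, sample_items)
```

This keeps the first record seen for each `item_id`. If duplicates can differ in their other fields, they would need to be compared before insertion instead.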

Generate dict mapping restaurant names to UIDs

Nutritionix identifies restaurants by a unique ID number. For example, according to the API documentation, McDonald's ID is 513fbc1283aa2dc80c000053. I frequent the following places and need to determine the Nutritionix ID number for each:

McDonald's
Wendy's
Taco Bell
Qdoba
Chipotle
Five Guys
Costco

Add functionality to normalize histograms

I need functionality to make the histogram plots look uniform. Currently, the x scale for each restaurant is different because each restaurant has a different distribution of menu items. Additionally, the widths of the bars in the histograms differ from plot to plot.

I am ultimately going to plot these histograms in a column and so the horizontal and vertical scales should match.
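
One way to get matching bars, sketched for Python 3, is to compute a single set of bin edges from all restaurants' values and reuse it for every plot. `shared_bin_edges` is a hypothetical helper, the bin width of 5 is arbitrary, and the sample values are made up.

```python
def shared_bin_edges(values_by_restaurant, bin_width):
    """Return fixed-width bin edges that cover every restaurant's values."""
    all_values = [
        value
        for values in values_by_restaurant.values()
        for value in values
    ]
    # Snap the lowest edge down to a multiple of bin_width.
    low = bin_width * int(min(all_values) // bin_width)
    # Enough bins that the largest value falls inside the last one.
    n_bins = int((max(all_values) - low) // bin_width) + 1
    return [low + i * bin_width for i in range(n_bins + 1)]


# Made-up sugar values (grams) for two placeholder restaurants.
sugars = {"Restaurant A": [0, 3, 12], "Restaurant B": [7, 28]}
edges = shared_bin_edges(sugars, 5)
```

Passing the same `edges` list as the `bins` argument of each `pyplot.hist` call gives every histogram identical bar widths and x range. Creating the column of axes with `pyplot.subplots(n, 1, sharex=True, sharey=True)` should then keep the vertical scales in step as well.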

Organize code in fewer files

Presently (commit 1a7c11c), there are several files containing python source. The code in these files should be combined into a single library.

The following files should be concatenated:

data_manipulation.py
fetch_data.py
fetch_restaurant_ids.py

The file restaurant_menus.ipynb should be updated to reflect the change in the library.

Refactor to use SQLite instead of the list of dicts

I created a file named menu_data.json based on Nutritionix API calls, but I never committed that file to this repo. Nutritionix was nice enough to let me use their data, and I'm pretty sure they don't want me publishing it in my repo.

The menu_data.json file contains a list of dicts. A lot of the analysis would be easier if the menu data were contained in a SQLite database file. Thus, I need to:

Move data from menu_data.json to menu_data.db.
Refactor code to access menu_data.db instead.
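
The move could look roughly like the following Python 3 sketch. The table name `menu_items` and the choice of columns are assumptions (only field names that appear elsewhere in these issues are used), `load_json_into_sqlite` is a hypothetical helper, and the records are placeholders. It reads a JSON string and writes to an in-memory database so that it runs without the real files; in practice the text would come from menu_data.json and the connection from `sqlite3.connect("menu_data.db")`.

```python
import json
import sqlite3

COLUMNS = ("item_id", "item_name", "brand_name", "menu_category", "nf_sugars")


def load_json_into_sqlite(json_text, conn):
    """Copy a JSON list of menu-item dicts into a menu_items table."""
    items = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS menu_items ("
        "item_id TEXT PRIMARY KEY, item_name TEXT, brand_name TEXT, "
        "menu_category TEXT, nf_sugars REAL)"
    )
    # Missing fields become NULL; fields not in COLUMNS are dropped.
    rows = [tuple(item.get(col) for col in COLUMNS) for item in items]
    conn.executemany(
        "INSERT OR REPLACE INTO menu_items VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Placeholder records standing in for the contents of menu_data.json.
sample_json = json.dumps([
    {"item_id": "a1", "item_name": "Placeholder Burger",
     "brand_name": "Wendy's", "menu_category": "entree", "nf_sugars": 9},
    {"item_id": "b2", "item_name": "Placeholder Shake",
     "brand_name": "Wendy's", "menu_category": "dessert", "nf_sugars": 40},
])

conn = sqlite3.connect(":memory:")
load_json_into_sqlite(sample_json, conn)

# Filtering becomes a query instead of nested list comprehensions.
entree_names = [
    row[0]
    for row in conn.execute(
        "SELECT item_name FROM menu_items "
        "WHERE brand_name = ? AND menu_category = ?",
        ("Wendy's", "entree"),
    )
]
```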

`fetch_menu_item_data` returns duplicates

The following code will yield duplicate menu items.

# Assume `credentials` is a dictionary holding Nutritionix API credentials.
import minimum_sugar
import collections

# ID value 513fbc1283aa2dc80c000053 corresponds to McDonald's
menu_items = minimum_sugar.fetch_menu_item_data("513fbc1283aa2dc80c000053", credentials)
item_ids = [menu_item["item_id"] for menu_item in menu_items]

dups = [item for item, count in collections.Counter(item_ids).items() if count > 1]

print len(item_ids)
print len(item_ids) - len(dups)
print len(dups)

# Prints:
# 359
# 347
# 12

Write up report

Once #9 is closed, I need to write up a report of the results along with some development notes.

Refactor `print_max_sugar_menu_item` to handle SQLite

Refactor code to leverage `entree_items` list

Early in the report.ipynb, the following line of code occurs:

entree_items = minimum_sugar.filter_menu_items(menu_data, "menu_category", "entree")

Many times after that line, I filter the entree items out of menu_data again. I should refactor that code to rely on entree_items instead so I don't look like an amateur.

Write function to determine max value for particular restaurant menu item

I've repeated code that looks like the following as of 3143fbf

restaurant_name = "Taco Bell"

restaurant_menu_items = minimum_sugar.filter_menu_items(menu_data, "brand_name", restaurant_name)
restaurant_entree_items = minimum_sugar.filter_menu_items(restaurant_menu_items, "menu_category", "entree")
max_sugar = max(minimum_sugar.extract_variable(restaurant_entree_items, "nf_sugars"))

print "Max sugar:", max_sugar
menu_items = minimum_sugar.filter_menu_items(restaurant_entree_items, "nf_sugars", max_sugar)
for menu_item in menu_items:
    print "Item name:", menu_item["item_name"]

in order to determine the entree menu item(s) containing the most sugar for a particular restaurant. This code should be factored into its own function instead of copying and pasting it all over the place.
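
A possible shape for that function, sketched for Python 3 and kept self-contained by using list comprehensions rather than the repo's `filter_menu_items` and `extract_variable`. The name `max_sugar_entrees` and the sample records are hypothetical.

```python
def max_sugar_entrees(menu_data, restaurant_name):
    """Return (max_sugar, item_names) for a restaurant's entree items.

    Returns (None, []) when the restaurant has no entree items.
    """
    entrees = [
        item
        for item in menu_data
        if item["brand_name"] == restaurant_name
        and item["menu_category"] == "entree"
    ]
    if not entrees:
        return None, []
    max_sugar = max(item["nf_sugars"] for item in entrees)
    # Several items can tie for the maximum, so collect all of them.
    names = [
        item["item_name"] for item in entrees if item["nf_sugars"] == max_sugar
    ]
    return max_sugar, names


# Placeholder records using the field names from the snippet above.
menu_data = [
    {"brand_name": "Taco Bell", "menu_category": "entree",
     "item_name": "Placeholder Taco", "nf_sugars": 2},
    {"brand_name": "Taco Bell", "menu_category": "entree",
     "item_name": "Placeholder Burrito", "nf_sugars": 6},
    {"brand_name": "Taco Bell", "menu_category": "dessert",
     "item_name": "Placeholder Twists", "nf_sugars": 10},
]

max_sugar, item_names = max_sugar_entrees(menu_data, "Taco Bell")
```

Returning the values instead of printing them lets the notebook decide how to format the report text.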
