amazon-science / carbon-assessment-with-ml

CaML: Carbon Footprinting of Household Products with Zero-Shot Semantic Text Similarity

Home Page: https://www.amazon.science/publications/caml-carbon-footprinting-of-household-products-with-zero-shot-semantic-text-similarity

License: Apache License 2.0

Python 6.59% Jupyter Notebook 93.41%

carbon-assessment-with-ml's Introduction

Carbon assessment with machine learning

This code repository presents a machine-learning-based method for selecting an Environmental Impact Factor (EIF) for a given product, material, or activity, which is a fundamental step of carbon footprinting. The code implements the methods described in the following research papers.

EIF matching for EIO-LCA, published in WWW 2023:
CaML: Carbon Footprinting of Household Products with Zero-Shot Semantic Text Similarity
Bharathan Balaji, Venkata Sai Gargeya Vunnava, Geoffrey Guest, Jared Kramer

EIF matching for Process LCA, published in ACM Journal on Computing and Sustainable Societies (JCSS):
Flamingo: Environmental Impact Factor Matching for Life Cycle Assessment with Zero-Shot Machine Learning
Bharathan Balaji, Venkata Sai Gargeya Vunnava, Shikhar Gupta, Nina Domingo, Harsh Gupta, Geoffrey Guest, Aravind Srinivasan
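As a rough illustration of zero-shot EIF matching by semantic text similarity, the sketch below ranks candidate industry descriptions by cosine similarity to a product description. No training is involved. The papers rely on pretrained language-model embeddings. Here a toy bag-of-words vector stands in for them so the sketch runs without dependencies; this is an assumption, not the repository's model. The function names (embed, cosine, match_eif) and the small candidate set are hypothetical illustrations, not the package's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a pretrained sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_eif(product: str, candidates: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Zero-shot ranking: score every candidate description against the product text."""
    query = embed(product)
    scored = [(code, cosine(query, embed(desc))) for code, desc in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical candidate set: a few NAICS codes and their titles.
naics_titles = {
    "312111": "Soft Drink Manufacturing",
    "311920": "Coffee and Tea Manufacturing",
    "325611": "Soap and Other Detergent Manufacturing",
}
print(match_eif("Sparkling soft drink, 6 pack", naics_titles, k=1))  # top match is 312111
```

Swapping the toy embed function for a real sentence-embedding model keeps the ranking logic unchanged. Only the vectors improve, which lets synonyms such as "soda" and "soft drink" match.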

Installation

Required packages are listed in requirements.txt. Run the following commands to install the package:

git clone https://github.com/amazon-science/carbon-assessment-with-ml.git
cd carbon-assessment-with-ml
pip install -r requirements.txt
pip install -e .

Getting Started

Follow the code in the notebooks folder:
For EIO-LCA, use notebooks/eio/demo.ipynb
For process LCA, use notebooks/process/generate_ranked_preds.ipynb

Dataset

The dataset is for research purposes only, and is not indicative of Amazon’s business use for carbon footprinting.

The dataset consists of retail products mapped to North American Industry Classification System (NAICS) codes. The mapping was done with Amazon Mechanical Turk, aggregating ground truth from 5 annotations per product. The dataset is the basis for estimating the carbon emissions of a product using Economic Input-Output Life Cycle Assessment (EIO-LCA). The dataset is stored as a pandas DataFrame.
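The dataset description combines two steps, sketched below under stated assumptions. First, majority vote is one simple way to turn 5 annotations into a single label; the README does not say which aggregation rule was actually used. Second, EIO-LCA estimates a product's emissions as its purchase price times the emission factor of the matched industry. The factor value here is made up. The sketch also ignores adjustments such as converting the price to the factor's currency year.

```python
from collections import Counter


def aggregate_annotations(labels: list[str]) -> str:
    """Majority vote over annotator labels; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one annotation")
    return Counter(labels).most_common(1)[0][0]


def estimate_emissions(price_usd: float, naics_code: str, co2e_per_dollar: dict[str, float]) -> float:
    """EIO-LCA estimate: spend in USD times the industry's emission factor (kg CO2e per USD)."""
    return price_usd * co2e_per_dollar[naics_code]


# Five hypothetical Mechanical Turk annotations for one product.
annotations = ["312111", "312111", "311920", "312111", "312112"]
label = aggregate_annotations(annotations)

# Hypothetical emission factor; real factors come from an EIO model such as USEEIO.
factors = {"312111": 0.5}
print(label, estimate_emissions(6.0, label, factors))  # 312111 3.0
```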

Security

See CONTRIBUTING for more information.

License

This project is licensed under the terms of the Apache 2.0 license. See LICENSE. Included datasets are licensed under the terms of the CDLA Permissive license, version 2.0. See LICENSE-DATA.

Citation

If you would like to cite our work, please use the BibTeX entries below.

@Inproceedings{Balaji2023CaML,
 author = {Bharathan Balaji and Geoffrey Guest and Venkata Sai Gargeya Vunnava and Jared Kramer},
 title = {CaML: Carbon footprinting of household products with zero-shot semantic text similarity},
 year = {2023},
 url = {https://www.amazon.science/publications/caml-carbon-footprinting-of-household-products-with-zero-shot-semantic-text-similarity},
 booktitle = {The Web Conference 2023},
}

@article{Balaji2023Flamingo,
 author = {Bharathan Balaji and Venkata Sai Gargeya Vunnava and Shikhar Gupta and Nina Domingo and Harsh Gupta and Geoffrey Guest and Aravind Srinivasan},
 title = {Flamingo: Environmental Impact Factor Matching for Life Cycle Assessment with Zero-Shot Machine Learning},
 year = {2023},
 url = {https://www.amazon.science/publications/flamingo-environmental-impact-factor-matching-for-life-cycle-assessment-with-zero-shot-machine-learning},
 journal = {ACM Journal on Computing and Sustainable Societies},
}

carbon-assessment-with-ml's Issues

Query: Methods to improve similarity score.

I've been contemplating the introduction of an intermediate API call to look up user search queries on platforms like Amazon. By doing so, we could collect the first search result and extract its detailed product description, brand name, associated company, and more. These details can then be used to establish the similarity of the product description embedding to the industry product descriptions. Furthermore, the brand or company details can be matched to their respective NAICS industry associations. This dual-validation method, I believe, could significantly refine and enhance the match scores.

I'd be very interested to hear your thoughts on this. Is this a direction you've considered or possibly experimented with? If not, do you see any potential challenges or advantages in implementing this method? Your feedback and insights would be invaluable to me as I embark on extending your work.
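The proposed dual validation can be sketched like this. Each candidate industry's text-similarity score is blended with a 0/1 check of whether the product's brand is known to belong to that NAICS code. The brand-to-NAICS table, the weight alpha, and all scores below are hypothetical. In practice, building a reliable brand table is likely the hard part, because large brands often span several industries.

```python
from typing import Optional


def fused_score(text_similarity: float, candidate_naics: str, brand: Optional[str],
                brand_to_naics: dict[str, set[str]], alpha: float = 0.7) -> float:
    """Blend text similarity with a 0/1 brand-industry check (alpha weights the text score)."""
    known_codes = brand_to_naics.get(brand, set()) if brand else set()
    brand_match = 1.0 if candidate_naics in known_codes else 0.0
    return alpha * text_similarity + (1 - alpha) * brand_match


def rerank(candidates: list[tuple[str, float]], brand: Optional[str],
           brand_to_naics: dict[str, set[str]], alpha: float = 0.7) -> list[tuple[str, float]]:
    """Re-score (naics_code, text_similarity) pairs and sort best-first."""
    rescored = [(code, fused_score(sim, code, brand, brand_to_naics, alpha)) for code, sim in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical text-similarity output in which the wrong industry ranks first.
candidates = [("311920", 0.62), ("312111", 0.58)]
# Hypothetical lookup table; a brand may map to several industries, hence a set.
brand_to_naics = {"Coca-Cola": {"312111"}}
print(rerank(candidates, "Coca-Cola", brand_to_naics)[0][0])  # 312111
```

Without a brand (brand=None), the ranking falls back to text similarity alone. A missing lookup therefore cannot make results worse than the current method.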

I have attached a screenshot of my experiments with your project, in which I think the user's search-query intent did not match the industry mapping.

I'm hoping to get your opinion on the pros and cons of replacing the input product string and industry string with their respective descriptions. For example, instead of the product input "coca cola" and the industry "Soft drink manufacturing", we would use preprocessed and cleaned embeddings of the product description and the industry description, shown below as input-lhs and input-rhs.

input-lhs:

Coca-Cola Soda Soft Drink, 16.9 fl oz, 6 Pack
Product Description
Soda. Pop. Soft drink. Sparkling beverage. Whatever you call it, nothing compares to the refreshing, crisp taste of Coca-Cola Original Taste, the delicious soda you know and love. Enjoy with friends, on the go or with a meal. Whatever the occasion, wherever you are, Coca-Cola Original Taste makes life’s special moments a little bit better. Carefully crafted in 1886, its great taste has stood the test of time. Something so delicious, so unique and so familiar, it’s what makes you think “Coca-Cola” whenever you hear “soft drink.” Between that perfect taste and refreshing fizz, it’s sure to give you that “ahhh” moment whenever you want it. Coca-Cola is available in many different options in addition to Original Taste, including a variety of all-time favorite flavors like Coca-Cola Cherry and Coca-Cola Vanilla. Looking for something zero sugar or caffeine free? Then look no further than Coca-Cola Zero Sugar and Coca-Cola Caffeine Free. Whatever you’re looking for in a soda, there’s a Coca-Cola to satisfy your taste buds. Every sip, every “ahhh,” every smile—find that feeling with Coca-Cola Original Taste. Best enjoyed ice-cold for maximum refreshment. Grab a Coca-Cola Original Taste, take a sip and find your “ahhh” moment. Enjoy Coca-Cola Original Taste.

input-rhs:

Common types of business activities within NAICS Code 312111 - Soft Drink Manufacturing are:

Flavored water manufacturing
Coffee, iced, manufacturing
Iced coffee manufacturing
Soda carbonated, manufacturing
Pop, soda, manufacturing
Carbonated soda manufacturing
Artificially carbonated waters manufacturing
Fruit drinks (except juice), manufacturing
Soda pop manufacturing
Beverages, soft drink (including artificially carbonated waters), manufacturing
Water, flavored, manufacturing
Water, artificially carbonated, manufacturing
Soft drinks manufacturing
Beverages, fruit and vegetable drinks, cocktails, and ades, manufacturing
Carbonated soft drinks manufacturing
Iced tea manufacturing
Tea, iced, manufacturing
Drinks, fruit (except juice), manufacturing
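Here is a minimal sketch of one way to use a list like input-rhs. Each industry is scored by its best match among the title and the listed business activities (max-pooling), rather than by the title alone. The toy bag-of-words embedding stands in for a sentence-embedding model and is an assumption. The 312111 activity lines come from the list above. Code 311920 is given only its title, for illustration. Long marketing copy like input-lhs dilutes a bag-of-words vector, and many sentence-embedding models truncate long inputs, so the cleaning step mentioned above likely matters.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def industry_score(product: str, title: str, activities: list[str], use_activities: bool = True) -> float:
    """Best similarity between the product and the title or any listed activity (max-pooling)."""
    query = embed(product)
    lines = [title, *activities] if use_activities else [title]
    return max(cosine(query, embed(line)) for line in lines)


def rank_industries(product: str, industries: dict[str, tuple[str, list[str]]],
                    use_activities: bool = True) -> list[tuple[str, float]]:
    scores = {code: industry_score(product, title, acts, use_activities)
              for code, (title, acts) in industries.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


industries = {
    # Activity lines copied from the NAICS 312111 list in input-rhs.
    "312111": ("Soft Drink Manufacturing",
               ["Soda pop manufacturing", "Carbonated soft drinks manufacturing", "Iced tea manufacturing"]),
    # Title only, for illustration.
    "311920": ("Coffee and Tea Manufacturing", []),
}
print(rank_industries("Iced tea, 12 pack", industries, use_activities=False)[0][0])  # 311920
print(rank_industries("Iced tea, 12 pack", industries)[0][0])  # 312111
```

In this toy case, the activity list moves iced tea into 312111, where the NAICS list above places it. Matching on the title alone picks the coffee-and-tea industry instead.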

Query: Speed up evaluation with faiss search

Hi, I tried using FAISS inner-product similarity search in the evaluation.ipynb code.
The existing code took 25 minutes on the 6k annotated dataset, whereas the FAISS implementation took just 19 seconds. The accuracies are slightly lower (by about 1.6 to 1.8 percentage points) but comparable. Would this be a good contribution to make as a PR?
The comparable accuracy and roughly 79x speedup (25 minutes vs. 19 seconds) might make it easier to test multiple sentence-similarity models.
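For reference, the exact search that faiss.IndexFlatIP performs can be written out in plain Python. L2-normalize the vectors so that inner product equals cosine similarity, then take the top-k scores per query. Assume the existing evaluation ranks industries by cosine similarity over the same embeddings (an assumption; the notebook is not shown here). In that case an exact flat index over normalized vectors should reproduce its top-1 predictions, up to float32 rounding and ties. An accuracy gap like the one reported below may therefore be worth checking, for example for unnormalized vectors. The FAISS speedup likely comes from batched, vectorized search; this pure-Python version only illustrates the logic.

```python
import heapq
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search_inner_product(queries: list[list[float]], database: list[list[float]],
                         k: int = 1) -> list[list[tuple[int, float]]]:
    """Exact top-k inner-product search, the same search that faiss.IndexFlatIP performs.

    Equivalent FAISS calls (float32 NumPy arrays of shape [n, d]):
        faiss.normalize_L2(naics_emb); faiss.normalize_L2(product_emb)
        index = faiss.IndexFlatIP(naics_emb.shape[1])
        index.add(naics_emb)
        scores, ids = index.search(product_emb, k)
    """
    database = [l2_normalize(v) for v in database]
    results = []
    for query in map(l2_normalize, queries):
        scored = ((sum(q * v for q, v in zip(query, vec)), idx) for idx, vec in enumerate(database))
        results.append([(idx, score) for score, idx in heapq.nlargest(k, scored)])
    return results


# Toy 2-D "embeddings": 3 industry vectors and 2 product queries.
naics_vectors = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
product_vectors = [[2.0, 0.0], [0.0, 3.0]]
top1 = [hits[0][0] for hits in search_inner_product(product_vectors, naics_vectors)]
print(top1)  # [1, 2]
```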

Top-1 accuracy w.r.t NAICS codes: 0.6466165413533834
Correct predictions: 3698, Total Products: 5719
Top-1 accuracy w.r.t BEA codes: 0.7518796992481203
Correct predictions: 4300, Total Products: 5719

Top-1 accuracy w.r.t NAICS codes (FAISS): 0.6284315439762196
Correct predictions: 3594, Total Products: 5719
Top-1 accuracy w.r.t BEA codes (FAISS): 0.7361426822871131
Correct predictions: 4210, Total Products: 5719
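The two metrics above can be computed as sketched below. This assumes each NAICS code maps to exactly one BEA code, as the single "Reference USEEIO Code" column in get_naics_data() suggests. Under that assumption, every NAICS-level hit is also a BEA-level hit, so BEA accuracy can only be equal or higher, which matches the numbers reported. The predictions, gold labels, and BEA labels below are hypothetical.

```python
from typing import Optional


def top1_accuracy(predicted: list[str], gold: list[str],
                  naics_to_bea: Optional[dict[str, str]] = None) -> tuple[int, float]:
    """Top-1 accuracy, optionally after mapping both sides from NAICS to BEA codes."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if naics_to_bea is not None:
        predicted = [naics_to_bea[code] for code in predicted]
        gold = [naics_to_bea[code] for code in gold]
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct, correct / len(gold)


# Hypothetical predictions, gold labels, and NAICS-to-BEA mapping.
predicted = ["312111", "312112", "311920"]
gold = ["312111", "312111", "312111"]
naics_to_bea = {"312111": "BEA-1", "312112": "BEA-1", "311920": "BEA-2"}
print(top1_accuracy(predicted, gold))                # (1, 0.3333333333333333)
print(top1_accuracy(predicted, gold, naics_to_bea))  # (2, 0.6666666666666666)
```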

AttributeError: module 'caml.eio.config' has no attribute 'useeio_file'

In the demo notebook, I get an AttributeError when cell 3 is executed.
Below is the error:

----> 1 naics_df = naics.get_naics_data()
      2 naics_list = naics_df.naics_desc.values
      3 print(len(naics_list))

File ~/Desktop/carbon-assessment-with-ml/caml/eio/naics.py:5, in get_naics_data()
      4 def get_naics_data():
----> 5     useeio_df = pd.read_csv(config.useeio_file)
      6     useeio_df = useeio_df[['2017 NAICS Code', '2017 NAICS Title', 'Supply Chain Emission Factors with Margins', 'Reference USEEIO Code']]
      7     useeio_df = useeio_df.rename(columns={
      8         "2017 NAICS Code": "naics_code",
      9         "2017 NAICS Title": "naics_title",
     10         "Supply Chain Emission Factors with Margins": "co2e_per_dollar",
     11         "Reference USEEIO Code": "bea_code",
     12     })

AttributeError: module 'caml.eio.config' has no attribute 'useeio_file'

This data file is not present in this repo. Where can I get it from?

Thanks,
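The error means that caml/eio/config.py in this checkout defines no useeio_file, and the CSV it should point to is not shipped with the repository. The column names read in get_naics_data() appear to match the US EPA's Supply Chain Greenhouse Gas Emission Factors for US Industries and Commodities tables. That is an inference from the headers, not something the repository states. Once a suitable file is obtained, a loader that takes an explicit path avoids depending on the config attribute. The sketch below is hypothetical (load_emission_factors and parse_emission_factors are not part of the package). It uses the standard csv module and reproduces the column renaming shown in the traceback.

```python
import csv
from typing import Iterable

# Column names taken from get_naics_data() in the traceback above.
COLUMN_MAP = {
    "2017 NAICS Code": "naics_code",
    "2017 NAICS Title": "naics_title",
    "Supply Chain Emission Factors with Margins": "co2e_per_dollar",
    "Reference USEEIO Code": "bea_code",
}


def parse_emission_factors(lines: Iterable[str]) -> list[dict]:
    """Parse CSV lines into records with the renamed columns; fail loudly if a column is missing."""
    reader = csv.DictReader(lines)
    missing = [col for col in COLUMN_MAP if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"emission-factor file is missing columns: {missing}")
    records = []
    for row in reader:
        record = {new: row[old] for old, new in COLUMN_MAP.items()}
        record["co2e_per_dollar"] = float(record["co2e_per_dollar"])
        records.append(record)
    return records


def load_emission_factors(path: str) -> list[dict]:
    """Read the file from an explicit path instead of relying on config.useeio_file."""
    # utf-8-sig also strips a byte-order mark if the CSV was exported from Excel.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return parse_emission_factors(handle)
```

Usage would be load_emission_factors("path/to/emission_factors.csv"), where the path is a placeholder for wherever the downloaded file is saved. A missing or renamed column then produces a clear ValueError instead of an AttributeError.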
