carbon-assessment-with-ml's Introduction

Carbon assessment with machine learning

This code repository presents a machine-learning-based method for selecting an Environmental Impact Factor (EIF) for a given product, material, or activity, which is a fundamental step in carbon footprinting. The code documents the methods described in the following research papers.

  1. EIF matching for EIO-LCA, published in WWW 2023 --
    CaML: Carbon Footprinting of Household Products with Zero-Shot Semantic Text Similarity
    Bharathan Balaji, Venkata Sai Gargeya Vunnava, Geoffrey Guest, Jared Kramer

  2. EIF matching for Process LCA, published in ACM JCSS --
    Flamingo: Environmental Impact Factor Matching for Life Cycle Assessment with Zero-Shot Machine Learning
    Bharathan Balaji, Venkata Sai Gargeya Vunnava, Shikhar Gupta, Nina Domingo, Harsh Gupta, Geoffrey Guest, Aravind Srinivasan

Installation

Required packages are listed in requirements.txt. Run the following commands to install the package:

git clone https://github.com/amazon-science/carbon-assessment-with-ml.git
cd carbon-assessment-with-ml
pip install -r requirements.txt
pip install -e .

Getting Started

Follow the code in the notebooks folder.
For EIO-LCA, use: notebooks/eio/demo.ipynb
For process LCA, use: notebooks/process/generate_ranked_preds.ipynb

Dataset

The dataset is for research purposes only, and is not indicative of Amazon’s business use for carbon footprinting.

The dataset consists of retail products mapped to North American Industry Classification System (NAICS) codes. The mapping was done with Amazon Mechanical Turk, and the ground truth for each product was aggregated from 5 annotations. The dataset is the basis for estimating the carbon emissions of a product using Economic Input-Output Life Cycle Assessment (EIO-LCA). The dataset is stored as a pandas DataFrame.
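
The README does not say how the 5 annotations per product are combined. As a minimal sketch, assuming a simple majority vote (an assumption, not the authors' confirmed procedure), the aggregation could look like the following. The product names and annotation values are illustrative only and are not taken from the dataset.

```python
from collections import Counter


def aggregate_annotations(labels):
    """Return the most frequent label among a product's annotations.

    Assumption: majority vote. Ties go to the label seen first, which is
    an arbitrary choice made for this sketch.
    """
    if not labels:
        raise ValueError("at least one annotation is required")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical annotations: five NAICS codes per product (illustrative values).
annotations = {
    "cola 6 pack": ["312111", "312111", "312112", "312111", "311930"],
    "paper towels": ["322291", "322291", "322291", "322120", "322291"],
}

# One aggregated label per product.
ground_truth = {
    product: aggregate_annotations(labels)
    for product, labels in annotations.items()
}
```

A real pipeline might also drop products where no label reaches a minimum agreement threshold; that choice is left open here.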

Security

See CONTRIBUTING for more information.

License

This project is licensed under the terms of the Apache 2.0 license. See LICENSE. Included datasets are licensed under the terms of the CDLA Permissive license, version 2.0. See LICENSE-DATA.

Citation

Below is the BibTeX text, if you would like to cite our work.

@Inproceedings{Balaji2023CaML,
 author = {Bharathan Balaji and Geoffrey Guest and Venkata Sai Gargeya Vunnava and Jared Kramer},
 title = {CaML: Carbon footprinting of household products with zero-shot semantic text similarity},
 year = {2023},
 url = {https://www.amazon.science/publications/caml-carbon-footprinting-of-household-products-with-zero-shot-semantic-text-similarity},
 booktitle = {The Web Conference 2023},
}
@Inproceedings{Balaji2023Flamingo,
 author = {Bharathan Balaji and Venkata Sai Gargeya Vunnava and Shikhar Gupta and Nina Domingo and Harsh Gupta and Geoffrey Guest and Aravind Srinivasan},
 title = {Flamingo: Environmental Impact Factor Matching for Life Cycle Assessment with Zero-Shot Machine Learning},
 year = {2023},
 url = {https://www.amazon.science/publications/flamingo-environmental-impact-factor-matching-for-life-cycle-assessment-with-zero-shot-machine-learning},
 booktitle = {ACM Journal on Computing and Sustainable Societies},
}

carbon-assessment-with-ml's Issues

Query: Methods to improve similarity score.

I've been contemplating the introduction of an intermediate API call to look up user search queries on platforms like Amazon. By doing so, we could collect the first search result and extract its detailed product description, brand name, associated company, and more. These details can then be used to establish the similarity of the product description embedding to the industry product descriptions. Furthermore, the brand or company details can be matched to their respective NAICS industry associations. This dual-validation method, I believe, could significantly refine and enhance the match scores.

I'd be very interested to hear your thoughts on this. Is this a direction you've considered or possibly experimented with? If not, do you see any potential challenges or advantages in implementing this method? Your feedback and insights would be invaluable to me as I embark on extending your work.

I have attached a screenshot of my experiments with your project in which, I think, the intent of the user search query did not match the industry mapping.

I'm hoping to get your opinion on the pros and cons of replacing the input product string and industry string with their respective descriptions. For example, instead of the product input "coca cola" and the industry "Soft drink manufacturing", we would use preprocessed and cleaned embeddings of the product description and industry description, as shown below in input-lhs and input-rhs.

input-lhs:

Coca-Cola Soda Soft Drink, 16.9 fl oz, 6 Pack
Product Description
Soda. Pop. Soft drink. Sparkling beverage. Whatever you call it, nothing compares to the refreshing, crisp taste of Coca-Cola Original Taste, the delicious soda you know and love. Enjoy with friends, on the go or with a meal. Whatever the occasion, wherever you are, Coca-Cola Original Taste makes life’s special moments a little bit better. Carefully crafted in 1886, its great taste has stood the test of time. Something so delicious, so unique and so familiar, it’s what makes you think “Coca-Cola” whenever you hear “soft drink.” Between that perfect taste and refreshing fizz, it’s sure to give you that “ahhh” moment whenever you want it. Coca-Cola is available in many different options in addition to Original Taste, including a variety of all-time favorite flavors like Coca-Cola Cherry and Coca-Cola Vanilla. Looking for something zero sugar or caffeine free? Then look no further than Coca-Cola Zero Sugar and Coca-Cola Caffeine Free. Whatever you’re looking for in a soda, there’s a Coca-Cola to satisfy your taste buds. Every sip, every “ahhh,” every smile—find that feeling with Coca-Cola Original Taste. Best enjoyed ice-cold for maximum refreshment. Grab a Coca-Cola Original Taste, take a sip and find your “ahhh” moment. Enjoy Coca-Cola Original Taste.

input-rhs:

Common types of business activities within NAICS Code 312111 - Soft Drink Manufacturing are:

Flavored water manufacturing
Coffee, iced, manufacturing
Iced coffee manufacturing
Soda carbonated, manufacturing
Pop, soda, manufacturing
Carbonated soda manufacturing
Artificially carbonated waters manufacturing
Fruit drinks (except juice), manufacturing
Soda pop manufacturing
Beverages, soft drink (including artificially carbonated waters), manufacturing
Water, flavored, manufacturing
Water, artificially carbonated, manufacturing
Soft drinks manufacturing
Beverages, fruit and vegetable drinks, cocktails, and ades, manufacturing
Carbonated soft drinks manufacturing
Iced tea manufacturing
Tea, iced, manufacturing
Drinks, fruit (except juice), manufacturing
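
The idea of matching descriptions rather than short names can be illustrated with a small sketch. It uses toy bag-of-words vectors as a stand-in for a sentence embedding model, so it is not the repository's method, and a real embedding model captures meaning beyond word overlap. The "Breweries" activity text and the function names are hypothetical, added only for contrast.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_industries(product_text, industry_texts):
    """Return (score, industry) pairs sorted from best to worst match."""
    product_vec = bow_vector(product_text)
    scored = [
        (cosine_similarity(product_vec, bow_vector(text)), name)
        for name, text in industry_texts.items()
    ]
    return sorted(scored, reverse=True)


industries = {
    "Soft Drink Manufacturing": (
        "Soda pop manufacturing. Carbonated soft drinks manufacturing. "
        "Iced tea manufacturing."
    ),
    # Hypothetical contrast entry, not from the issue text.
    "Breweries": "Beer brewing. Ale brewing. Malt liquor brewing.",
}

short_query = "coca cola"
long_query = "Coca-Cola Soda Soft Drink, 16.9 fl oz, 6 Pack. Soda. Pop. Soft drink."

short_ranking = rank_industries(short_query, industries)
long_ranking = rank_industries(long_query, industries)
```

With these toy vectors the brand name alone shares no words with either industry text, while the longer description overlaps with the soft drink entry. Whether the same effect holds for a learned embedding model would need to be measured on the annotated dataset.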

Query: Speed up evaluation with faiss search

Hi, I tried using the FAISS inner product similarity metric on the code in evaluation.ipynb.
The existing code took 25 minutes on the 6k annotated dataset, whereas the FAISS implementation took just 19 seconds. The accuracies differ slightly but are comparable. I wanted to understand whether this would be a good contribution to make as a PR.
The comparable accuracy and the roughly 80x speedup (25 minutes versus 19 seconds) might be beneficial for testing multiple sentence similarity models.

Top-1 accuracy w.r.t NAICS codes: 0.6466165413533834
Correct predictions: 3698, Total Products: 5719
Top-1 accuracy w.r.t BEA codes: 0.7518796992481203
Correct predictions: 4300, Total Products: 5719

Top-1 accuracy w.r.t NAICS codes (FAISS): 0.6284315439762196
Correct predictions: 3594, Total Products: 5719
Top-1 accuracy w.r.t BEA codes (FAISS): 0.7361426822871131
Correct predictions: 4210, Total Products: 5719
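
One possible source of the small accuracy gap, offered only as a hypothesis, is that inner product and cosine similarity agree only when the vectors have unit length. The sketch below uses plain Python rather than FAISS, with made-up 2-D vectors and hypothetical function names, to show a top-1 search by inner product changing its answer after normalization, together with the top-1 accuracy calculation used for the figures above.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


def top1(query, candidates):
    """Index of the candidate with the largest inner product (brute force)."""
    scores = [inner_product(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(predicted, truth):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Made-up vectors: candidate 0 points almost the same way as the query,
# candidate 1 is less aligned but much longer.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [2.0, 2.0]]

raw_choice = top1(query, candidates)
unit_choice = top1(normalize(query), [normalize(c) for c in candidates])

# Reported baseline figure from the issue: 3698 correct out of 5719.
reported_accuracy = 3698 / 5719
```

Here the raw inner product prefers the longer vector and the normalized search prefers the better-aligned one. If the embeddings passed to FAISS were not normalized the same way as in the existing evaluation code, rankings could differ in this manner; checking that would require inspecting both code paths.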


AttributeError: module 'caml.eio.config' has no attribute 'useeio_file'

In the demo notebook, when cell 3 is executed, I get an AttributeError.
The error is shown below:

----> 1 naics_df = naics.get_naics_data()
      2 naics_list = naics_df.naics_desc.values
      3 print(len(naics_list))

File ~/Desktop/carbon-assessment-with-ml/caml/eio/naics.py:5, in get_naics_data()
      4 def get_naics_data():
----> 5     useeio_df = pd.read_csv(config.useeio_file)
      6     useeio_df = useeio_df[['2017 NAICS Code', '2017 NAICS Title', 'Supply Chain Emission Factors with Margins', 'Reference USEEIO Code']]
      7     useeio_df = useeio_df.rename(columns={
      8         "2017 NAICS Code": "naics_code",
      9         "2017 NAICS Title": "naics_title",
     10         "Supply Chain Emission Factors with Margins": "co2e_per_dollar",
     11         "Reference USEEIO Code": "bea_code",
     12     })

AttributeError: module 'caml.eio.config' has no attribute 'useeio_file'

This data file is not present in this repo. Where can I get it from?

Thanks,
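
The traceback shows that get_naics_data() reads config.useeio_file directly, so a config module without that attribute fails at the pd.read_csv call. As a sketch of a clearer failure mode, a small guard could check the setting first. The helper name require_setting, the SimpleNamespace stand-ins for the config module, and the path "data/useeio.csv" are all hypothetical and are not part of the repository.

```python
from types import SimpleNamespace


def require_setting(config, name):
    """Return config.<name>, or raise a descriptive error if it is missing."""
    value = getattr(config, name, None)
    if value is None:
        raise AttributeError(
            f"config has no '{name}' setting; "
            "set it to the path of the required data file"
        )
    return value


# Stand-ins for the real config module (hypothetical values).
config_without_path = SimpleNamespace()
config_with_path = SimpleNamespace(useeio_file="data/useeio.csv")

# With the setting defined, the path is returned unchanged.
useeio_path = require_setting(config_with_path, "useeio_file")
```

Calling require_setting(config_without_path, "useeio_file") raises an AttributeError that names the missing setting. This does not answer where the data file comes from; it only makes the missing configuration easier to diagnose.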
