environmental-footprint-data's People

Contributors

airloren, boavizta-gh-api, bpetit, elenaaab, nitot, pabluk, pcorpet, redapengam, sbaudoin, vincentvillet

environmental-footprint-data's Issues

Outdated/incorrect data

Hi,
First of all, thank you for this very useful data!

Some of the data reported in the CSV file does not match the data in the sources. For example,
the HP ProLiant DL360 Gen10 server is listed with a gwp_total of 1710 kgeqCO2 (with 77% of this caused by server usage). However, the corresponding source document (https://assets.ext.hpe.com/is/content/hpedam/a50002430enw) reports 6270 kgeqCO2 (with 87% caused by server usage).

Some other HP servers have the same problem (e.g. "ProLiant ML30 Gen10 server", "ProLiant DL160 Gen10 server"). HPE probably updated their datasheets.

Check memory unit

The memory attribute is expected to be a float number in GB. This means that 1) the "GB" suffix must be removed, and 2) the parsers have to guarantee that they parsed a number in the right unit, which is not the case yet.
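
As a rough illustration of what such a guarantee could look like, here is a minimal sketch of a unit-aware parser. The function name `parse_memory_gb`, the accepted units, and the decimal conversion factors (1 TB = 1000 GB) are assumptions for the example, not the project's existing code.

```python
import re

# Conversion factors to GB. Decimal prefixes are an assumption of this sketch.
_UNIT_TO_GB = {"MB": 1 / 1000, "GB": 1.0, "TB": 1000.0}

# A number, optionally followed by one of the known units.
_MEMORY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(MB|GB|TB)?\s*$", re.IGNORECASE)


def parse_memory_gb(raw: str) -> float:
    """Return the memory size in GB as a float, or raise ValueError."""
    match = _MEMORY_RE.match(raw)
    if match is None:
        # Unknown unit or garbage: fail loudly instead of storing a wrong value.
        raise ValueError(f"cannot parse memory value: {raw!r}")
    value, unit = match.groups()
    # A bare number is assumed to already be expressed in GB.
    factor = _UNIT_TO_GB[(unit or "GB").upper()]
    return float(value) * factor
```

Raising on anything unrecognized is a deliberate choice here: a parser that silently keeps an unparsed string is what produced the inconsistent column in the first place.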

Duplicated entries

Hello,
I found 4 duplicates in this database:

  • Apple Watch SE 44mm Aluminum Case with Sport Band --> L40 and L41
  • Apple Watch Series 8 45mm Aluminum Case with Sport Band --> L42 and L43
  • Apple Watch Ultra (GPS + Cellular) Titanium Case with Ocean Band --> L44 and L45
  • iPad (9th generation) Wi-Fi + Cellular with 64GB --> L50 and L51
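
A check like this could probably be automated. The sketch below lists every value of a column that appears more than once, together with the CSV line numbers; the column name `name` and the helper `find_duplicate_names` are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def find_duplicate_names(csv_text: str, key: str = "name") -> Dict[str, List[int]]:
    """Map each repeated value of `key` to the file line numbers where it occurs."""
    lines_by_name: Dict[str, List[int]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # reader.line_num is the line just read; the header is line 1.
        lines_by_name[row[key].strip()].append(reader.line_num)
    # Keep only the names seen on more than one line.
    return {name: lines for name, lines in lines_by_name.items() if len(lines) > 1}
```

Note that the line numbers only match the file when no field contains an embedded newline.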

Add multi criteria impact data from available LCA

Currently the database only focuses on carbon footprint, whereas other impacts such as abiotic depletion, primary energy, water, and human toxicity should be assessed and are available in several Life Cycle Assessments provided by manufacturers.
We already identified the following:

Nvidia numbers

If you find any data about Nvidia GPUs, can you please let me know? This is something I'm really interested in!
Thank you :)

Add a script to automatically merge multiple .csv files and deal with duplicates

We need a dedicated tool to merge multiple .csv files while detecting and merging duplicates.

I've started to implement it through a new static method of DeviceCarbonFootprint:

    @staticmethod
    def merge(device1: 'DeviceCarbonFootprint', device2: 'DeviceCarbonFootprint',
              conflict: Literal['keep2nd', 'interactive'] = 'keep2nd',
              verbose: bool = False) -> 'DeviceCarbonFootprint':

and a merge_csv.py file1 file2 standalone script written on top of the above merge function.

By default, priority is given to device2/file2.

Conflicts are detected only for attributes that are provided for both devices and whose values are clearly different. If the values are close enough, merge only prints a warning in verbose mode.

Then, there are two modes to resolve the conflicts:

  1. Simply keep device2 (and print the differences in verbose mode)
  2. Ask the user which version should be kept.
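
To make the first mode concrete, here is a simplified sketch that works on plain dictionaries rather than on DeviceCarbonFootprint. The helpers `values_conflict` and `merge_records` and the 5% relative tolerance are hypothetical; they only illustrate the "conflict only when both are provided and clearly different" rule, not the actual implementation.

```python
import math
from typing import Any, Dict, List, Tuple


def values_conflict(a: Any, b: Any, rel_tol: float = 0.05) -> bool:
    """True when two provided values are clearly different."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Numbers that are close enough are not treated as a conflict.
        return not math.isclose(a, b, rel_tol=rel_tol)
    # Other values are compared as case-insensitive, trimmed strings.
    return str(a).strip().lower() != str(b).strip().lower()


def merge_records(first: Dict[str, Any],
                  second: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Merge two records with priority to `second`; return (merged, conflicts)."""
    merged = dict(first)
    conflicts: List[str] = []
    for key, value in second.items():
        if value is None:
            continue  # not provided in `second`: keep the value from `first`
        old = merged.get(key)
        if old is not None and values_conflict(old, value):
            conflicts.append(key)  # provided on both sides and clearly different
        merged[key] = value  # 'keep2nd' behaviour: the second record wins
    return merged, conflicts
```

An interactive mode could reuse the returned list of conflicting keys to ask the user about each one instead of always keeping the second value.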

TODO:

  1. Add a non-regression mode only testing that device2 is consistent with device1 and that device1 does not contain more information.
  2. Clean up and unify some entries prior to merging to avoid false negatives (e.g., CN versus China, issue #64)
  3. Find a way to deal with PCF files reporting the same model name whereas they are not the same (in ecodiag I also extract the model name from the main html files)

HP vs HPE

HP Inc. (HP) and Hewlett Packard Enterprise (HPE) are two separate legal entities, and have been since 2015.

It would therefore be better if the products were correctly designated as such in the list.

Monitoring new data sources on Internet

The objective is to create a monitoring tool to detect publication of unknown product environmental footprint reports.
We could regularly search for specific keywords and alert when new reports are found.

Searches could be built from a combination of:

  1. Name of a manufacturer not already monitored by Boavizta's spiders (e.g. ibm, cisco, samsung...)
  2. Typical keywords found in known PCFs (e.g. PAIA, Product Carbon Footprint, Product Environmental Report, kgCO2eq, kgCO2e)
  3. Typical file type of the document (e.g. "type:pdf")
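
The combination above could be generated along these lines. This is only a sketch: `build_queries` is a hypothetical helper, and the `filetype:` operator is an assumption about the search engine's syntax, so it may need adapting to whichever engine is used.

```python
from itertools import product
from typing import Iterable, List


def build_queries(manufacturers: Iterable[str],
                  keywords: Iterable[str],
                  filetype: str = "pdf") -> List[str]:
    """Build one search query per (manufacturer, keyword) pair."""
    queries = []
    for manufacturer, keyword in product(manufacturers, keywords):
        # Quote the keyword so that multi-word phrases are matched exactly.
        queries.append(f'{manufacturer} "{keyword}" filetype:{filetype}')
    return queries
```

For example, `build_queries(["ibm"], ["Product Carbon Footprint"])` yields a single query string that could be submitted on a schedule, with new result URLs compared against those already known.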

Add manufacturing breakdown details

Most PCF files provide breakdown details for the manufacturing part. They are, however, not always fully consistent in how they partition it. Here is the list I ended up with on ecodiag's side:

  • packaging (PAIA)
  • chassis (PAIA desktops + laptops, and a very small number of HP monitors -> their mistake?)
  • mainboard (PAIA+HPE)
  • daughterboard (HPE)
  • power supply unit (PAIA)
  • HDD (PAIA)
  • SSD (PAIA+HPE)
  • optical drive (PAIA)
  • display (PAIA laptops & AiO, but also a few HP monitors -> their mistake?)
  • battery (PAIA)
  • housing (PAIA monitors)
  • electronics (PAIA monitors + wise-thin-client)
  • panel (PAIA monitors)
  • assembly (Dell wise-thin-client, a few HP laptops, many Lenovo laptops, HPE)
  • materials (Dell wise-thin-client, a few HP laptops)
  • LCD assembly (1 Dell + 2 HP laptops)
  • PWBs (2 HP laptops)
  • Integrated circuits (1 Dell + 3 HP laptops)
  • chassis+PSU (HPE)
  • others (various HP)

This long list is conservative, but that's a lot! So maybe some components could be merged together?

For instance, when the PSU is combined with the chassis, maybe we could just file it under "others" since this does not provide much information.

Some other proposals:

  • Merge housing and chassis (their use is exclusive)
  • Merge display and panel (their use is exclusive)
  • Merge mainboard, daughterboard, IC and PWB within electronics?
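
If these proposals were adopted, the merge could be expressed as a simple lookup table applied to each breakdown. The `CANONICAL` mapping and the `merge_components` helper below are hypothetical and only encode the suggestions above; the component spellings are taken from the list and lowercased.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical mapping encoding the proposed merges (keys are lowercased).
CANONICAL = {
    "housing": "chassis",
    "panel": "display",
    "mainboard": "electronics",
    "daughterboard": "electronics",
    "integrated circuits": "electronics",
    "pwbs": "electronics",
    "chassis+psu": "others",
}


def merge_components(breakdown: Dict[str, float]) -> Dict[str, float]:
    """Sum the shares of components that map to the same canonical name."""
    merged: Dict[str, float] = defaultdict(float)
    for component, share in breakdown.items():
        key = component.strip().lower()
        # Components without an entry in the mapping keep their own name.
        merged[CANONICAL.get(key, key)] += share
    return dict(merged)
```

Keeping the mapping as data rather than code would make it easy to discuss and adjust each merge separately.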

Names do not match model names in the device system

Hello,
We encounter issues when using this database because the name does not always match the model name in the device system.
Example:
In this database --> EliteBook ...
In the device registry --> HP EliteBook ...

This makes automation harder, as all inventory software uses the device registry to get that information.

Is this issue known?
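
As a possible workaround on the consumer side, names could be compared after stripping a leading manufacturer name. The sketch below assumes the mismatch is only such a prefix; `strip_manufacturer` and `same_model` are hypothetical helpers, not part of this repository.

```python
def strip_manufacturer(manufacturer: str, model: str) -> str:
    """Remove a leading manufacturer name from a model string, if present."""
    prefix = manufacturer.strip().lower()
    cleaned = model.strip()
    # Only strip when the prefix is a whole leading word ("HP " and not "HPE").
    if prefix and cleaned.lower().startswith(prefix + " "):
        cleaned = cleaned[len(prefix):].lstrip()
    return cleaned


def same_model(manufacturer: str, db_name: str, registry_name: str) -> bool:
    """Compare two model names, ignoring case and a manufacturer prefix."""
    left = strip_manufacturer(manufacturer, db_name).lower()
    right = strip_manufacturer(manufacturer, registry_name).lower()
    return left == right
```

This would not cover other differences, such as suffixes or marketing names that differ from registry names.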

New spider and parser for Apple

Apple hardware PCF documents can be downloaded here

The spider should get all PDF links on the page, as tools/monitoring/apple_check.py does, and simply launch the parser for each of these links.

The parser could be built on existing parsers such as tools/parsers/hp_workplace.py
The ECODIAG parser could also be used to find all the needed regexes.

Improve tools documentation

README files should:

  • list all prerequisites to run parsers and spiders
  • explain how to run parsers standalone
  • explain how to run spiders

Unify screen size unit

For Apple's smartphones and the like, screen_size corresponds to the screen resolution in pixels, whereas for monitors and laptops it corresponds to the size in inches.

Unify location names

The same location is sometimes spelled with its long name (China) and sometimes as a two-letter code (CN). This needs to be unified.
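
One way to unify them would be to map long names to two-letter ISO 3166-1 codes. The sketch below is only illustrative: the `LOCATION_CODES` table is deliberately tiny, and both it and `normalize_location` are hypothetical names.

```python
# Small illustrative table; a real one would need every location in the data.
LOCATION_CODES = {
    "china": "CN",
    "united states": "US",
    "germany": "DE",
    "france": "FR",
}


def normalize_location(raw: str) -> str:
    """Return a two-letter code when the location is recognized."""
    value = raw.strip()
    if len(value) == 2 and value.isalpha():
        # Already looks like a two-letter code: only fix the case.
        return value.upper()
    # Unknown long names are returned unchanged so they can be reviewed by hand.
    return LOCATION_CODES.get(value.lower(), value)
```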

Erroneous Dell's subcategory

Dell's parser assumes that 'Precision' models are desktops, whereas Precision laptops also exist.

In ecodiag, I extract the sub-categories from the main html file itself rather than from the PCF file.

License

Hi and thanks for sharing this project :)

It currently has no license, so it would be difficult for people to contribute to it.

There is this sentence in the README:

This data can be freely used for any purpose including without using Boavizta's methodology.

Given that, I'd advise you to use a Creative Commons public license. If you agree, I can make the PR.

Thanks and have a nice day!

Unify added date format

The date format of the initial parsing is 01-11-2020, and manually added rows use the same format, but the automatic parsers use a different format (2022-10-18).
I think it would be easier to change the initial parsing and the manually added rows. This avoids having to modify all the spiders.
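
The conversion itself could be a one-off script along these lines. It assumes that the older format is day-month-year (so 01-11-2020 would be 1 November 2020), which should be confirmed before running it on the data; `to_iso_date` is a hypothetical helper.

```python
from datetime import datetime


def to_iso_date(raw: str) -> str:
    """Convert a date to YYYY-MM-DD, accepting ISO or (assumed) DD-MM-YYYY input."""
    value = raw.strip()
    # Try the target format first so already-converted rows pass through unchanged.
    for fmt in ("%Y-%m-%d", "%d-%m-%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")
```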

New Spider and parser for HPE

HPE hardware PCF documents can be downloaded here

The spider should:

  • retrieve all PDF links:
    • get number of documents
    • create empty list of links
    • While number of links < number of documents
      • for all links (href) on element with class="gsr-result-head-link"
        • Click link
        • Get link (href) on element with id="downloadPdfLink"
        • add link to list of links
      • if number of links < number of documents
        • Go to next result page by clicking element with class="gsr-pagination-button next"
  • launch HPE parser
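
The pagination loop above can be sketched independently of any browser automation library. In the example below, `ResultPage` is a hypothetical interface that a real spider would implement with its own browser driver (clicking the result links and the "next" button); `StaticPage` is only an in-memory stand-in used to exercise the loop.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


class ResultPage(Protocol):
    """Hypothetical view of one page of search results."""

    def pdf_links(self) -> List[str]: ...

    def next_page(self) -> Optional["ResultPage"]: ...


@dataclass
class StaticPage:
    """In-memory stand-in for a result page, for demonstration only."""

    links: List[str]
    following: Optional["StaticPage"] = None

    def pdf_links(self) -> List[str]:
        return self.links

    def next_page(self) -> Optional["StaticPage"]:
        return self.following


def collect_pdf_links(first_page: ResultPage, expected_count: int) -> List[str]:
    """Walk result pages until `expected_count` links are found or pages run out."""
    links: List[str] = []
    page: Optional[ResultPage] = first_page
    while page is not None and len(links) < expected_count:
        for link in page.pdf_links():
            if link not in links:  # skip links already collected
                links.append(link)
        # Only move to the next result page if some links are still missing.
        page = page.next_page() if len(links) < expected_count else None
    return links
```

Stopping when there is no next page, and not only when the expected count is reached, keeps the loop from spinning forever if the site reports more documents than it actually lists.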

The parser could be built on the existing HP Workplace parser.
No need for OCR to analyse pie charts as all data is available as text.

Automate monitoring of all manufacturers

Monitoring of all manufacturers' webpages could be automated with GitHub Actions to:

  • run spiders to regularly check for new PDFs to analyse and launch parsers if needed
  • run generate-gh-pr.py to generate Pull Requests for new devices to add to the database

Warning: some improvements may be needed in generate-gh-pr.py, as it has not been tested since November 2021.
