
bw_hybrid's Introduction

Brightway Hybrid

Brightway

Warning

This package is a work in progress.
For further questions, contact @michaelweinold

This package provides tools for the hybridization of inventory databases and multi-regional input-output tables.

Literature (Excerpt):

Agez et al., 2020, Agez et al., 2019

๐Ÿ“ Related Repositories:

bw_hybrid's People

Contributors: michaelweinold

Stargazers: Maxime Agez, Victor Tulus

Watchers: James Cloos

Forkers: cmutel

bw_hybrid's Issues

`ecospold2matrix` Data Flowchart

Ecospold xml Ingestion

flowchart TD
Ain[IntermediateExchanges.xml] --> A["extract_products()"] --> Aout[products: pd.DataFrame]
Bin[ActivityIndex.xml <br> ActivityNames.xml] --> B["extract_activities()"] -->  Bout[activities: pd.DataFrame]
C["get_labels()"] --> Cextract1 --> Cout[PRO: pd.DataFrame <br> STR: pd.DataFrame]
Cin[ElementaryExchanges.xml <br> spold files] --> C --> Cextract2 --> Cout
Din[spold files] --> D["get_flows()"] --> Dextract["extract_flows()"] --> Dout[inflows: pd.DataFrame <br> outflows: pd.DataFrame <br> elementary_flows: pd.DataFrame]
Cextract1["build_PRO()"]
Cextract2["build_STR()"]

DataFrame Table

| object | type | columns (e2m) | columns correspondence (bw) or Ecospold | comment |
|---|---|---|---|---|
| inflows | pd.DataFrame | fileId<br>sourceActivityId<br>productId<br>amount<br>row_index | .spold file name<br>activityLinkId<br>intermediateExchangeId<br>amount<br>??? | extracted from .spold files |
| outflows | pd.DataFrame | fileId<br>productId<br>amount<br>productionVolume<br>outputGroup | .spold file name<br>intermediateExchangeId<br>amount<br>productionVolume<br>outputGroup | extracted from .spold files |
| elementary_flows | pd.DataFrame | fileId<br>elementaryExchangeId<br>amount | .spold file name<br>elementaryExchangeId<br>amount | extracted from .spold files |
| activities | pd.DataFrame | activityId<br>activityNameId<br>activityType<br>startDate<br>endDate<br>activityName | id<br>activityNameId<br>activityType<br>startDate<br>endDate<br>activityName | extracted from ActivityIndex.xml with activityName data merged from ActivityNames.xml |
| products | pd.DataFrame | productName<br>unitName<br>productId<br>unitId<br>cpc<br>properties | name<br>unitName<br>id<br>unitId<br>classification == 'cpc'<br>properties | extracted from IntermediateExchanges.xml |
| STR | pd.DataFrame | id<br>name<br>unit<br>cas<br>comp<br>subcomp | id<br>name<br>unitName<br>casNumber<br>compartment<br>subcompartment | extracted from ElementaryExchanges.xml |
| PRO | pd.DataFrame | 'activityId'<br>'productId'<br>'activityName'<br>'ISIC'<br>'price'<br>'priceUnit'<br>'EcoSpoldCategory'<br>'geography'<br>'technologyLevel'<br>'macroEconomicScenario'<br>properties_x<br>'productionVolume'<br>'productName'<br>'unitName'<br>'cpc'<br>properties_y<br>'activityNameId'<br>'activityType'<br>'startDate'<br>'endDate'<br>'activityName_duplicate' | 'id'<br>'productId'<br>'activityName'<br>'ISIC'<br>'price'<br>'priceUnit'<br>'EcoSpoldCategory'<br>'geography'<br>'technologyLevel'<br>'macroEconomicScenario'<br>properties_x<br>'productionVolume'<br>'productName'<br>'unitName'<br>'cpc'<br>properties_y<br>'activityNameId'<br>'activityType'<br>'startDate'<br>'endDate'<br>'activityName_duplicate' | extracted from .spold files |
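
The ingestion of `products` can be illustrated with a minimal sketch. This is not the `ecospold2matrix` implementation: the inline XML is a simplified, namespace-free stand-in for IntermediateExchanges.xml, and the function name `extract_products_sketch` is hypothetical. The output column names follow the `products` row of the DataFrame table.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Simplified stand-in for IntermediateExchanges.xml (assumption: real files
# use XML namespaces and carry more fields than shown here).
XML = """
<validIntermediateExchanges>
  <intermediateExchange id="p1" unitId="u1">
    <name>electricity, high voltage</name>
    <unitName>kWh</unitName>
  </intermediateExchange>
  <intermediateExchange id="p2" unitId="u2">
    <name>steel, low-alloyed</name>
    <unitName>kg</unitName>
  </intermediateExchange>
</validIntermediateExchanges>
"""


def extract_products_sketch(xml_text: str) -> pd.DataFrame:
    """Map each exchange element to one row, renaming fields as in the table."""
    root = ET.fromstring(xml_text.strip())
    rows = [
        {
            "productName": node.findtext("name"),
            "unitName": node.findtext("unitName"),
            "productId": node.get("id"),
            "unitId": node.get("unitId"),
        }
        for node in root.iter("intermediateExchange")
    ]
    return pd.DataFrame(rows)


products = extract_products_sketch(XML)
```

Building a list of dictionaries first and constructing the DataFrame once avoids repeated row-wise appends.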

Preparation and Cleanup

flowchart TD
in[activities: pd.DataFrame <br> products: pd.DataFrame] --> F["complement_labels()"] --> out[PRO: pd.DataFrame <br> STR: pd.DataFrame]

DataFrame Table

| object | type | columns (e2m) | columns correspondence (bw) or Ecospold | comment |
|---|---|---|---|---|
| PRO | pd.DataFrame | all prev. columns<br>'productionVolume'<br>all cols. from products<br>all cols. from activities | all prev. columns<br>'productionVolume'<br>all cols. from products<br>all cols. from activities | for merge keys, see below |

Join Table

| left | right | left_key | right_key | added cols. |
|---|---|---|---|---|
| PRO | outflows | index = 'abc' | index = 'abc' | 'productionVolume' |
| PRO | products | index = 'abc' | index = 'abc' | all except potential duplicates |
| PRO | activities | index = 'abc' | index = 'abc' | all except potential duplicates |
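
The index-based joins listed in the join table can be sketched with plain pandas. The toy data and index labels below are made up, and reading "all except potential duplicates" as "only columns that PRO does not already have" is an assumption, not something confirmed by `ecospold2matrix`.

```python
import pandas as pd

# Shared index standing in for the 'abc' key used on both sides of each join.
idx = pd.Index(["act1_prodA", "act2_prodB"])

PRO = pd.DataFrame({"activityName": ["power plant", "steel mill"]}, index=idx)
outflows = pd.DataFrame(
    {"productId": ["prodA", "prodB"], "productionVolume": [100.0, 250.0]},
    index=idx,
)
products = pd.DataFrame(
    {"productName": ["electricity", "steel"], "activityName": ["x", "y"]},
    index=idx,
)

# Step 1: add only 'productionVolume' from outflows.
PRO = PRO.join(outflows[["productionVolume"]])

# Step 2: add every column of products that PRO does not already have,
# so that 'activityName' is not duplicated with a suffix.
new_cols = products.columns.difference(PRO.columns)
PRO = PRO.join(products[new_cols])
```

Selecting the new columns before joining sidesteps the `_x`/`_y` suffixes that a blind merge would produce, such as the `properties_x` and `properties_y` columns seen in the ingestion step.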

DataFrame Construction (change heading)

flowchart TD
in[inflows: pd.DataFrame <br> elementary_flows: pd.DataFrame <br> outflows: pd.DataFrame] --> F["build_AF()"] --> out[A: pd.DataFrame <br> F: pd.DataFrame]

DataFrame Table

Pivot Table

| output | input | index | columns | values | output index | output cols. |
|---|---|---|---|---|---|---|
| A | inflows | 'row_index' = 'fileId' + 'productId' | 'fileId' | 'amount' | PRO.index = 'abc' | PRO.index = 'abc' |
| F | elementary_flows | 'elementaryExchangeId' | 'fileId' | 'amount' | STR.index = 'abc' | PRO.index = 'abc' |
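
The long-to-wide pivot that produces `A` can be sketched as below. The data are made up, `pro_index` and `file_ids` are hypothetical stand-ins for the real labels, and the final relabeling of the columns to `PRO.index` is left out because the mapping from `fileId` to that index is not specified here.

```python
import pandas as pd

# Long-format inflows: one row per (consuming file, supplied product) pair.
inflows = pd.DataFrame(
    {
        "fileId": ["f1", "f1", "f2"],
        "row_index": ["f2_pB", "f1_pA", "f1_pA"],
        "amount": [0.5, 0.1, 2.0],
    }
)
pro_index = pd.Index(["f1_pA", "f2_pB"])  # stands in for PRO.index
file_ids = pd.Index(["f1", "f2"])

A = (
    inflows.pivot_table(
        index="row_index", columns="fileId", values="amount", aggfunc="sum"
    )
    # Align to a fixed label order; labels missing from the data become NaN.
    .reindex(index=pro_index, columns=file_ids)
    # Absent flows are zeros in a technology matrix.
    .fillna(0.0)
)
```

The same pattern would apply to `F`, with 'elementaryExchangeId' as the pivot index and `STR.index` as the target row labels.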

Characterization

flowchart TD
in["LCIA Implementation v3.8.xlsx"] --> F1["if-else"]  --> F2["simple_characterisation_matching()"] --> out[A: pd.DataFrame <br> C: pd.DataFrame]

Pivot Table

| output | input | index | columns | values | output index | output cols. |
|---|---|---|---|---|---|---|
| C | C_long | 'impact_label' | 'stressorId' | 'CF' | N/A | N/A |

`pylcaio.LCAIO.update_prices_electricity`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Database Loader: Environmental Extensions (`complete_extensions`)

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Auxiliary Function: `get_inflation`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`pylcaio.LCAIO.extend_inventory`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`pylcaio.LCAIO.apply_scaling_without_prices` ("CONVERSION PART")

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`pylcaio.LCAIO.identify_rows`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/weinold/github/pylcaio_integration_with_brightway/notebooks/main.ipynb Cell 16 in <cell line: 1>()
      3 else:
      4     parser = e2m.Ecospold2Matrix(
      5         sys_dir = path_ecoinvent_local,
      6         project_name = 'ecoinvent_3_5_cutoff',
   (...)
      9         positive_waste = False,
     10         nan2null = True)
---> 11     parser.ecospold_to_Leontief(
     12         fileformats = 'Pandas',
     13         with_absolute_flows=True)
     14     ecoinvent = read_ecoinvent_pickle(path_e2m_project)

File ~/miniconda3/envs/pylcaio/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:398, in Ecospold2Matrix.ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
    396 self.extract_activities()
    397 self.get_flows()
--> 398 self.get_labels()
    400 # Clean up if necessary
    401 self.__find_unsourced_flows()

File ~/miniconda3/envs/pylcaio/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:617, in Ecospold2Matrix.get_labels(self)
    612         self.log.info(msg.format('Labels', filename, sha1))
    614 # OR EXTRACT FROM ECOSPOLD DATA...
    615 else:
--> 617     self.build_PRO()
    618     self.build_STR()
    620     # and optionally pickle for further use

File ~/miniconda3/envs/pylcaio/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:1166, in Ecospold2Matrix.build_PRO(self)
   1164     PRO.loc[file_index, ['price', 'priceUnit']] = [price, price_unit]
   1165 # Or complain if price already exists
-> 1166 elif not np.allclose([price_org], [price]):
   1167     print("WARNING: We have heterogeneous prices")
   1168 else:

File <__array_function__ internals>:180, in allclose(*args, **kwargs)

File ~/miniconda3/envs/pylcaio/lib/python3.9/site-packages/numpy/core/numeric.py:2265, in allclose(a, b, rtol, atol, equal_nan)
   2194 @array_function_dispatch(_allclose_dispatcher)
   2195 def allclose(a, b, rtol=1.e-5, atol=1.e-8, equal_nan=False):
   2196     """
   2197     Returns True if two arrays are element-wise equal within a tolerance.
   2198 
   (...)
   2263 
   2264     """
-> 2265     res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
   2266     return bool(res)

File <__array_function__ internals>:180, in isclose(*args, **kwargs)

File ~/miniconda3/envs/pylcaio/lib/python3.9/site-packages/numpy/core/numeric.py:2372, in isclose(a, b, rtol, atol, equal_nan)
   2369     dt = multiarray.result_type(y, 1.)
   2370     y = asanyarray(y, dtype=dt)
-> 2372 xfin = isfinite(x)
   2373 yfin = isfinite(y)
   2374 if all(xfin) and all(yfin):

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
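
The traceback does not show which value of `price_org` or `price` triggered the error. One plausible cause, assumed here, is a non-numeric entry such as `None` or a string, since `np.allclose` only accepts inputs it can treat as numbers. The sketch below reproduces that failure mode and shows a possible guard; the helper name `prices_agree` is hypothetical and not part of `ecospold2matrix`.

```python
import numpy as np


def prices_agree(price_a, price_b, rtol=1e-5, atol=1e-8):
    """Return True only if both prices are numeric and close to each other."""
    try:
        a, b = float(price_a), float(price_b)
    except (TypeError, ValueError):
        # None, empty strings, or other non-numeric values cannot be compared.
        return False
    return bool(np.isclose(a, b, rtol=rtol, atol=atol))


# Reproduce the failure mode: a non-numeric entry makes np.allclose raise.
try:
    np.allclose([None], [1.0])
    raised = False
except TypeError:
    raised = True
```

Returning `False` for values that cannot be compared means the caller would still report heterogeneous prices instead of crashing; whether that is the desired behavior is a design decision for the maintainers.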

`pylcaio.LCAIO.save_system`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Improve naming

The library and repository names differ (bw2hybrid versus brightway2-hybridization). Moreover, this library should be compatible not with Brightway 2 but with the "next generation" of Brightway, which is already partially available. I therefore suggest renaming both the repository and the library to one of bw_hybridization, bw_hybrid, or bw_heuristic_hybridization.

`pylcaio.LCAIO.calc_productions`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Auxiliary Function: `completing_extensions`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Switched variable names in `1_pylcaio_to_brightway.ipynb` notebook

path_file_ecoinvent_raw and path_file_exiobase_raw are flipped. In the cell below, the parsed EXIOBASE system is pickled to path_file_ecoinvent_raw:

exiobase: pymrio.IOSystem = pymrio.parse_exiobase3(path_dir_exiobase_raw / str_exiobase_zip_file)
with open(path_file_ecoinvent_raw, 'wb') as file_handle:
    pickle.dump(obj = exiobase, file = file_handle, protocol=pickle.HIGHEST_PROTOCOL)

Database Loader: Capital Endogenization (`path_to_capitals`)

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`pylcaio.LCAIO.hybridize`: "CONVERSION PART"

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Documentation

| pylcaio variable | domain | official name | status | comment |
|---|---|---|---|---|
| A_io | IO | A matrix | ongoing | |
| Y_io | IO | final demand matrix | ongoing | |
| F_io | IO | satellite account coefficient matrix | ongoing | usually named S |
| PRO_f | PDB | LCA database metadata | ongoing | |
| A_ff | PDB | ??? | not yet started | |
| F_f | PDB | environmental matrix | ongoing | |

`pylcaio.LCAIO.low_production_volume_processes`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`pylcaio.DatabaseLoader.combine_ecoinvent_exiobase`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`pylcaio.LCAIO.extract_*`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Database Loader: Regionalization (`regionalized`)

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

`KeyError: 'method'` when reading pickled hybridized databases using `pylcaio.Analysis`

When attempting to read a pickled pylcaio.LCAIO class instance with the pylcaio.Analysis class, the following error is raised:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/pylcaio/src/pylcaio.py:1915, in Analysis.__init__(self, path_to_hybrid_system)
   1914 try :
-> 1915     self.C = pd.concat([pd.DataFrame(self.C_f.todense(), index=self.IMP['method'], columns=self.STR['MATRIXID']),
   1916                pd.DataFrame(self.C_io.todense(), index=self.impact_categories_IO,
   1917                             columns=self.flows_of_IO)]).fillna(0)
   1918 except KeyError:

KeyError: 'method'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Cell In [10], line 1
----> 1 Analysis = pylcaio.Analysis(path_file_hybrid)

File ~/pylcaio/src/pylcaio.py:1919, in Analysis.__init__(self, path_to_hybrid_system)
   1915     self.C = pd.concat([pd.DataFrame(self.C_f.todense(), index=self.IMP['method'], columns=self.STR['MATRIXID']),
   1916                pd.DataFrame(self.C_io.todense(), index=self.impact_categories_IO,
   1917                             columns=self.flows_of_IO)]).fillna(0)
   1918 except KeyError:
-> 1919     self.C = pd.concat([pd.DataFrame(self.C_f.todense(), index=self.IMP['method'], columns=self.STR['MATRIXID']),
   1920                pd.DataFrame(self.C_io.todense(), index=self.impact_categories_IO,
   1921                             columns=self.flows_of_IO)]).fillna(0)
   1922 self.C_index = self.C.index.tolist()
   1923 self.C = back_to_sparse(self.C)

KeyError: 'method'
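
As the traceback shows, the `except KeyError` branch at line 1919 evaluates the same `self.IMP['method']` lookup as the `try` branch at line 1915, so the fallback raises the same error again. A working fallback would need a different column name. The sketch below shows one way to express that; the helper `first_available_column` and the choice of 'impact_label' as the alternative column are hypothetical, not taken from pylcaio.

```python
import pandas as pd


def first_available_column(df: pd.DataFrame, candidates):
    """Return the first column from `candidates` that exists in `df`."""
    for name in candidates:
        if name in df.columns:
            return df[name]
    raise KeyError(
        f"none of {list(candidates)} found; available: {list(df.columns)}"
    )


# Hypothetical IMP table that lacks a 'method' column.
IMP = pd.DataFrame(
    {"impact_label": ["GWP100", "AP"], "unit": ["kg CO2-eq", "kg SO2-eq"]}
)
labels = first_available_column(IMP, ["method", "impact_label"])
```

Listing the candidates explicitly also gives a clearer error message when none of them is present.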

`pylcaio.LCAIO.correct_double_counting`

Documentation:

  • add data input/output format documentation
  • add mathematical formulation of data manipulation

Refactoring:

  • implement function using native Pandas/NumPy objects and methods
  • document performance improvements
  • create MWE input/output data for unit test

Convert Static Data Files to `json` and `csv`

Currently, data files come in the formats txt, csv, and xlsx. To avoid unsafe functions like eval() and to simplify data import, all data should be converted to either json or csv.

File names should indicate the target Python type (e.g., dict_ecoinvent_aux_data.json or list_geography_correspondence.json).

Task List

  • countries_per_regions.txt
  • countries.txt
  • Filter.xlsx
  • geography_replacements.txt
  • Product_Concordances.xlsx
  • STAM_categories.txt
  • STAM_functional_categories.txt
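
A one-off conversion of this kind could look like the sketch below. It assumes that the txt files contain Python literals (which the current use of eval() suggests, but which is not confirmed here); the function name and the sample file content are hypothetical. `ast.literal_eval` is used for the migration because it parses literals without executing code.

```python
import ast
import json
import tempfile
from pathlib import Path


def convert_literal_txt_to_json(path_txt: Path, path_json: Path):
    """Parse a Python literal from a text file and write it out as JSON."""
    # ast.literal_eval only accepts literals (dict, list, str, numbers, ...),
    # so arbitrary code in the file is rejected instead of executed.
    data = ast.literal_eval(path_txt.read_text(encoding="utf-8"))
    path_json.write_text(
        json.dumps(data, indent=2, sort_keys=True), encoding="utf-8"
    )
    return data


# Usage with made-up content; the real file layout may differ.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "countries_per_regions.txt"
    dst = Path(tmp) / "dict_countries_per_regions.json"
    src.write_text("{'region_a': ['AA', 'BB'], 'region_b': ['CC']}", encoding="utf-8")
    convert_literal_txt_to_json(src, dst)
    roundtrip = json.loads(dst.read_text(encoding="utf-8"))
```

After the conversion, the library itself only needs `json.load`, and the target file name carries the expected Python type.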
