majeau-bettez / ecospold2matrix
Class for recasting Ecospold2 LCA dataset into Leontief matrix representations or Supply and Use Tables
License: BSD 2-Clause "Simplified" License
Attempting to run ecospold2matrix like so:
parser = e2m.Ecospold2Matrix(
    sys_dir=path_dir_ecoinvent_input,
    lci_dir=os.path.join(path_dir_ecoinvent_input, 'datasets'),
    project_name=e2m_project_name,
    characterisation_file=path_file_ecoinvent_characterisation,
    out_dir=path_dir_databases_pickle,
    positive_waste=False,
    nan2null=True,
)
parser.ecospold_to_Leontief(
    fileformats='Pandas',
    with_absolute_flows=True,
)
With ecoinvent 3.8, this returns the following error:
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
Cell In [29], line 1
----> 1 parser.ecospold_to_Leontief(
2 fileformats = 'Pandas',
3 with_absolute_flows=True
4 )
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:425, in Ecospold2Matrix.ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
423 else:
424 self.prepare_matching_load_parameters()
--> 425 self.process_inventory_elementary_flows()
426 self.read_characterisation()
427 self.populate_complementary_tables()
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:2803, in Ecospold2Matrix.process_inventory_elementary_flows(self)
2801 # export to tmp SQL table
2802 c = self.conn.cursor()
-> 2803 self.STR.to_sql('dirty_inventory',
2804 self.conn,
2805 index_label='id',
2806 if_exists='replace')
2807 c.execute( """
2808 INSERT INTO raw_inventory(id, name, comp, subcomp, unit, cas)
2809 SELECT DISTINCT id, name, comp, subcomp, unit, cas
2810 FROM dirty_inventory;
2811 """)
2813 self.clean_label('raw_inventory')
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/core/generic.py:2987, in NDFrame.to_sql(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
2830 """
2831 Write records stored in a DataFrame to a SQL database.
2832
(...)
2983 [(1,), (None,), (2,)]
2984 """ # noqa:E501
2985 from pandas.io import sql
-> 2987 return sql.to_sql(
2988 self,
2989 name,
2990 con,
2991 schema=schema,
2992 if_exists=if_exists,
2993 index=index,
2994 index_label=index_label,
2995 chunksize=chunksize,
2996 dtype=dtype,
2997 method=method,
2998 )
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:695, in to_sql(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method, engine, **engine_kwargs)
690 elif not isinstance(frame, DataFrame):
691 raise NotImplementedError(
692 "'frame' argument should be either a Series or a DataFrame"
693 )
--> 695 return pandas_sql.to_sql(
696 frame,
697 name,
698 if_exists=if_exists,
699 index=index,
700 index_label=index_label,
701 schema=schema,
702 chunksize=chunksize,
703 dtype=dtype,
704 method=method,
705 engine=engine,
706 **engine_kwargs,
707 )
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:2187, in SQLiteDatabase.to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method, **kwargs)
2176 raise ValueError(f"{col} ({my_type}) not a string")
2178 table = SQLiteTable(
2179 name,
2180 self,
(...)
2185 dtype=dtype,
2186 )
-> 2187 table.create()
2188 return table.insert(chunksize, method)
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:838, in SQLTable.create(self)
836 raise ValueError(f"'{self.if_exists}' is not valid for if_exists")
837 else:
--> 838 self._execute_create()
File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:1871, in SQLiteTable._execute_create(self)
1869 with self.pd_sql.run_transaction() as conn:
1870 for stmt in self.table:
-> 1871 conn.execute(stmt)
OperationalError: duplicate column name: id
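A guess at the cause: the error suggests that `self.STR` already carries an `id` column of its own, so writing it with `index_label='id'` asks SQLite to create the `id` column twice. A minimal sketch of the failure mode and one possible workaround (hypothetical data, not the package's actual table contents):

```python
import sqlite3

import pandas as pd

# Minimal reproduction (assumption about the cause): the frame already has
# an 'id' column, so index_label='id' duplicates it in the CREATE TABLE.
conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
try:
    df.to_sql('dirty_inventory', conn, index_label='id', if_exists='replace')
except sqlite3.OperationalError as e:
    print(e)  # duplicate column name: id

# One possible workaround: skip the redundant index column entirely.
df.to_sql('dirty_inventory', conn, index=False, if_exists='replace')
```

Whether dropping the index is safe depends on whether downstream SQL relies on it; renaming the frame's own column before the export would be the alternative.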
In line 1687, setting the absolute values for waste streams throws an error, as the operation does not appear to be allowed on sparse arrays:
if self.positive_waste:
    # In cutoff version of ecoinvent, some dummy waste processes do
    # not seem to have negative reference outputs. These must then be
    # identified more crudely based on string recognition, and their
    # rows forced positive in the A-matrix
    bo_cutoff = self.PRO.activityName.str.contains(self.__CUTOFFTXT)
    self.A.loc[bo_cutoff,:] = self.A.loc[bo_cutoff,:].abs()
I found this workaround, but perhaps there's a better way?
This is the log output/error message:
2020-06-30 14:19:24,609 - plcaio_test - INFO - Starting to assemble the matrices
2020-06-30 14:19:28,229 - plcaio_test - INFO - fillna
2020-06-30 14:20:11,665 - plcaio_test - INFO - Starting normalizing matrices
TypeError Traceback (most recent call last)
in <module>
8 characterisation_file=
9 '/home/jakobs/data/ecoinvent/ecoinvent 3.5_LCIA_implementation/LCIA_implementation_3.5.xlsx', )
---> 10 parser.ecospold_to_Leontief(fileformats='Pandas',with_absolute_flows=False)
~/Documents/IndEcol/OASES/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
405
406 # Finally, assemble normalized, symmetric matrices
--> 407 self.build_AF()
408
409 if with_absolute_flows:
~/Documents/IndEcol/OASES/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in build_AF(self)
1685 # rows forced positive in the A-matrix
1686 bo_cutoff = self.PRO.activityName.str.contains(self.__CUTOFFTXT)
-> 1687 self.A.loc[bo_cutoff,:] = self.A.loc[bo_cutoff,:].abs()
1688
1689 if self.force_all_positive:
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/indexing.py in setitem(self, key, value)
669 key = com.apply_if_callable(key, self.obj)
670 indexer = self._get_setitem_indexer(key)
--> 671 self._setitem_with_indexer(indexer, value)
672
673 def _validate_key(self, key, axis: int):
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
984 v = np.nan
985
--> 986 setter(item, v)
987
988 # we have an equal len ndarray/convertible to our labels
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/indexing.py in setter(item, v)
960 s._consolidate_inplace()
961 s = s.copy()
--> 962 s._data = s._data.setitem(indexer=pi, value=v)
963 s._maybe_update_cacher(clear=True)
964
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/internals/managers.py in setitem(self, **kwargs)
559
560 def setitem(self, **kwargs):
--> 561 return self.apply("setitem", **kwargs)
562
563 def putmask(self, **kwargs):
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
440 applied = b.apply(f, **kwargs)
441 else:
--> 442 applied = getattr(b, f)(**kwargs)
443 result_blocks = _extend_blocks(applied, result_blocks)
444
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/internals/blocks.py in setitem(self, indexer, value)
1801
1802 check_setitem_lengths(indexer, value, self.values)
-> 1803 self.values[indexer] = value
1804 return self
1805
~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/arrays/sparse/array.py in setitem(self, key, value)
459 # ExtensionBlock.where
460 msg = "SparseArray does not support item assignment via setitem"
--> 461 raise TypeError(msg)
462
463 @classmethod
TypeError: SparseArray does not support item assignment via setitem
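One possible shape for a workaround (a sketch with hypothetical variable names mirroring `self.A` and `bo_cutoff` in `build_AF`): since pandas sparse columns reject item assignment, densify the frame, force the selected rows positive, and convert the result back to sparse.

```python
import pandas as pd

# Hypothetical stand-ins for self.A (sparse) and bo_cutoff in build_AF().
A = pd.DataFrame([[0.0, -1.0], [2.0, 0.0]]).astype(pd.SparseDtype(float, 0.0))
bo_cutoff = pd.Series([True, False])

# Sparse columns do not support .loc item assignment, so round-trip through
# a dense frame: densify, take abs() on the masked rows, re-sparsify.
dense = A.sparse.to_dense()
dense.loc[bo_cutoff, :] = dense.loc[bo_cutoff, :].abs()
A = dense.astype(pd.SparseDtype(float, 0.0))
```

The densification costs memory proportional to the full matrix, so it may be preferable to densify only the masked rows; this is only meant to illustrate the round-trip.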
Hi, majeau-bettez,
Based on the "ecospold2matrix" class you shared, I successfully recast the Ecospold2 LCA dataset into Leontief matrix representations. But my results contain only the A, F, and C matrices, and no Z matrix. How can I modify it to obtain the Z matrix? And what should I do if I want to get the SUT?
Looking forward to your reply.
Thanks!
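For what it's worth, if the coefficient matrix A and the processes' total outputs are available, the absolute flow matrix Z can be recovered with the standard input-output identity. A minimal sketch (plain Leontief algebra, not a claim about ecospold2matrix's own API):

```python
import numpy as np

# Standard input-output identity: given a normalized coefficient matrix A
# and a vector of total outputs x, the absolute flow matrix is Z = A @ diag(x).
A = np.array([[0.1, 0.2],
              [0.3, 0.0]])
x = np.array([100.0, 50.0])
Z = A @ np.diag(x)  # Z[i, j] = A[i, j] * x[j]
```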
`
import ecospold2matrix as e2m
parser = e2m.Ecospold2Matrix('/home/radekl/Desktop/ecoinvent 3.4_cutoff_lci_ecoSpold02/datasets', project_name='eco34', out_dir='/home/radekl/eco_matrices', positive_waste=True)
2018-02-22 10:58:36,955 - eco34 - INFO - Ecospold2Matrix Processing
2018-02-22 10:58:36,961 - eco34 - INFO - Current git commit: ca2593a
2018-02-22 10:58:36,962 - eco34 - INFO - Project name: eco34
2018-02-22 10:58:36,962 - eco34 - INFO - Unit process and Master data directory: /home/radekl/Desktop/ecoinvent 3.4_cutoff_lci_ecoSpold02/datasets
2018-02-22 10:58:36,962 - eco34 - INFO - Data saved in: /home/radekl/eco_matrices
2018-02-22 10:58:36,962 - eco34 - INFO - Sign conventions changed to make waste flows positive
2018-02-22 10:58:36,962 - eco34 - INFO - Pickle intermediate results to files
2018-02-22 10:58:36,962 - eco34 - INFO - Order processes based on: ISIC, activityName
2018-02-22 10:58:36,962 - eco34 - INFO - Order elementary exchanges based on: comp, name, subcomp
2018-02-22 10:58:36,972 - eco34 - WARNING - Could not establish connection to database
parser.ecospold_to_Leontief(with_absolute_flows=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/radekl/workspace/ecospold2matrix/ecospold2matrix/ecospold2matrix.py", line 363, in ecospold_to_Leontief
self.extract_products()
File "/home/radekl/workspace/ecospold2matrix/ecospold2matrix/ecospold2matrix.py", line 669, in extract_products
assert os.path.exists(fp), "Can't find " + self.__INTERMEXCHANGE
AssertionError: Can't find IntermediateExchanges.xml
`
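A guess at the cause: `extract_products()` asserts that the master data file exists under the system directory, and in ecoSpold02 distributions `IntermediateExchanges.xml` normally sits in a `MasterData` folder next to `datasets/`. If the parser was pointed at the `datasets` subfolder instead of the distribution root, the assertion would fire. A quick sanity check (hypothetical path):

```python
import os

# Hypothetical root of the ecoSpold02 distribution (the parent of datasets/).
sys_dir = '/home/radekl/Desktop/ecoinvent 3.4_cutoff_lci_ecoSpold02'

# If sys_dir is the right root, this file should exist.
fp = os.path.join(sys_dir, 'MasterData', 'IntermediateExchanges.xml')
print(os.path.exists(fp))
```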
Hi,
I was wondering what you mean by the Leontief coefficients format. In the code the Leontief matrix is called A. Do you mean that your code transforms the ecoinvent data into a symmetric A matrix only, or is it also inverted and shaped into the form (I-A)^-1? What are the other attributes of the created A matrix? Does it include self-inputs, and does each process have the same output, like the 1 euro assumed in IO?
Thank you!
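To make the distinction concrete (standard Leontief algebra, not a statement about what the package outputs): a coefficient matrix A and its inverse form are different objects, and the total-requirements matrix must be computed separately if needed.

```python
import numpy as np

# A is a matrix of per-unit input coefficients; the Leontief inverse
# L = (I - A)^-1 gives total (direct + indirect) requirements per unit
# of final demand.
A = np.array([[0.1, 0.2],
              [0.3, 0.0]])
L = np.linalg.inv(np.eye(2) - A)
```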
Has anyone else encountered this error?
Hi Guillaume,
When I try to parse the XML files in lines 785 and 1011, I get an error message that the '<' sign is not in line 1, column 1. When inspecting the file with open(fp).readline(), we discovered that the XMLs contain a Byte Order Mark. We have been able to get around this by adjusting lines 784 and 1010 as follows:
with open(fp, 'r', encoding="utf-8") as fh:
ecoinvent now parses properly. I haven't been able to produce a C matrix yet.
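A possibly more robust variant (an assumption, not the project's own fix): Python's `utf-8-sig` codec strips a leading BOM automatically on read, whereas plain `utf-8` leaves the `'\ufeff'` character in the text.

```python
import os
import tempfile

# Write a UTF-8 file with a BOM, mimicking the problematic ecoinvent XMLs.
with tempfile.NamedTemporaryFile('wb', suffix='.xml', delete=False) as f:
    f.write(b'\xef\xbb\xbf<?xml version="1.0"?><root/>')
    fp = f.name

# Plain utf-8 keeps the BOM as a '\ufeff' character at the start ...
with open(fp, 'r', encoding='utf-8') as fh:
    assert fh.read().startswith('\ufeff')

# ... while utf-8-sig strips it, so the parser sees '<' at line 1, column 1.
with open(fp, 'r', encoding='utf-8-sig') as fh:
    text = fh.read()
    assert text.startswith('<?xml')

os.unlink(fp)
```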
Hi,
I cannot export SparseMatrix with the current master branch.
I used:
import ecospold2matrix as em2
parser = em2.Ecospold2Matrix('/home/tomas/dbs/3.1/cutoff_ecospold02', project_name='eco31_cutoff', positive_waste=True )
parser.ecospold_to_Leontief(fileformats=['SparseMatrix'], with_absolute_flows=True)
but the process cannot finish because of a fillna problem. Here is the complete output:
If IMP gets initialized as a DataFrame instead of a numpy array, then the process completes correctly (see tngTUDOR@2298089), but I'm not sure this is the best way to go. What do you think?
WARNING: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
2016-05-10 16:54:42,937 - eco31_cutoff - INFO - Ecospold2Matrix Processing
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Project name: eco31_cutoff
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Unit process and Master data directory: /home/tomas/dbs/3.1/cutoff_ecospold02
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Data saved in: /home/tomas/virtualenvs/ecospold2matrix-dev
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Sign conventions changed to make waste flows positive
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Pickle intermediate results to files
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Order processes based on: ISIC, activityName
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Order elementary exchanges based on: comp, name, subcomp
2016-05-10 16:54:44,502 - eco31_cutoff - WARNING - obs2char_subcomps constraints temporarily relaxed because not full recipe parsed
2016-05-10 16:54:44,553 - eco31_cutoff - INFO - Products extracted from IntermediateExchanges.xml with SHA-1 of ca2c05c4dff035265fc44c53c7b534a3a711ff70
2016-05-10 16:54:59,855 - eco31_cutoff - WARNING - Removed 176 duplicate rows from activity_list, see duplicate_activity_list.csv.
2016-05-10 16:54:59,877 - eco31_cutoff - INFO - Activities extracted from ActivityIndex.xml with SHA-1 of c579d38fb6fa4a52ec4e09e5b04b873df77ce4c9
2016-05-10 16:54:59,904 - eco31_cutoff - INFO - Processing 11301 files in /home/tomas/dbs/3.1/cutoff_ecospold02/datasets
2016-05-10 16:55:52,495 - eco31_cutoff - INFO - Flows saved in /home/tomas/dbs/3.1/cutoff_ecospold02/flows.pickle with SHA-1 of ee7fcb8b40433af5e79b09a20d884ac01a2e7189
2016-05-10 16:55:52,542 - eco31_cutoff - INFO - Processing 11301 files - this may take a while ...
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:1111: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
self.PRO = PRO.sort(columns=self.PRO_order)
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:971: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
self.STR = STR.sort(columns=self.STR_order)
2016-05-10 16:56:40,862 - eco31_cutoff - INFO - Elementary flows extracted from ElementaryExchanges.xml with SHA-1 of 8a3a0a95e8a023950f42704eebc248014164166c
2016-05-10 16:56:40,910 - eco31_cutoff - INFO - Labels saved in /home/tomas/dbs/3.1/cutoff_ecospold02/rawlabels.pickle with SHA-1 of 2be0897a16fd2b0814f8aa6f49424f9b8f131650
2016-05-10 16:56:40,928 - eco31_cutoff - INFO - OK. No untraceable flows.
2016-05-10 16:56:41,241 - eco31_cutoff - INFO - OK. Source activities seem in order. Each product traceable to an activity that actually does produce or distribute this product.
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:1475: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
self.PRO = self.PRO.sort(columns=self.PRO_order)
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:1476: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
self.STR = self.STR.sort(columns=self.STR_order)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/tomas/virtualenvs/ecospold2matrix-dev/export_3.1_leontief.py in <module>()
1 import ecospold2matrix as em2
2 parser = em2.Ecospold2Matrix('/home/tomas/dbs/3.1/cutoff_ecospold02', project_name='eco31_cutoff', positive_waste=True )
----> 3 parser.ecospold_to_Leontief(fileformats=['SparseMatrix'], with_absolute_flows=True)
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
449
450 # Save system to file
--> 451 self.save_system(fileformats)
452
453 # Read/load lci cummulative emissions and perform quality check
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in save_system(self, file_formats)
2015 PRO = self.PRO.fillna('').values
2016 STR = self.STR.fillna('').values
-> 2017 IMP = self.IMP.fillna('').values
2018 PRO_header = self.PRO.columns.values
2019 PRO_header = PRO_header.reshape((1,len(PRO_header)))
AttributeError: 'numpy.ndarray' object has no attribute 'fillna'
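The kind of fix discussed above can be sketched like this (an assumption about the state at the failure point: `self.IMP` ends up as a plain ndarray, which has no `.fillna()`; wrapping it in a DataFrame restores the method that `save_system()` expects):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.IMP: an object ndarray with missing values.
IMP = np.array([['GWP100', None], ['ODP', 'kg CFC-11 eq']], dtype=object)

# Wrapping in a DataFrame makes .fillna('') available, then .values
# returns to the ndarray form save_system() writes out.
IMP = pd.DataFrame(IMP).fillna('').values
```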
Hi Guillaume,
I received an error executing the following line (2437):
foo.cas = foo.cas.str.replace('^[0]*','')
This only occurred for those sheets in ReCiPe111.xlsx where there aren't any CAS numbers specified for any stressors (LOP, LTP, WDP, MDP, FDP). The NaNs are not recognized by str.replace.
A quick fix was to implement the following lines:
if foo.cas.values.dtype != 'float64':
    foo.cas = foo.cas.str.replace('^[0]*','')
Probably needs a more permanent solution though.
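A possibly more permanent direction (a sketch, not the project's actual code): normalize the column to strings up front, so sheets whose CAS column loads as an all-NaN float64 column still survive the `.str` call.

```python
import pandas as pd

# Hypothetical cas column: a NaN (no CAS given) plus a zero-padded number.
cas = pd.Series([float('nan'), '007732185'])

# Coerce to strings first, then strip leading zeros; NaN becomes ''.
cas = cas.fillna('').astype(str).str.replace('^0*', '', regex=True)
```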
Hi Guillaume,
there is a small bug in the characterisation branch: a missing closing parenthesis. I do not have rights to push, so I am attaching a diff you can apply. Or just type it yourself; it is just one ')'.
fix_closing_parenthesis.txt