pypsa-meets-earth / pypsa-earth
PyPSA-Earth: A flexible Python-based open optimisation model to study energy system futures around the world.
Home Page: https://pypsa-earth.readthedocs.io/en/latest/
See PyPSA-Eur
Selecting columns results in a KeyError in osm_pbf_power_data_extractor.py.
df_all_substations = df_all_substations[ { "id", "lonlat", "tags.power", "tags.substation", "tags.voltage", "tags.frequency", "Type", "Country", } ]
How to reproduce:
run with test_CC (Nigeria)
Possible fix for now:
df_all_substations = df_all_substations[ df_all_substations.columns & [ "id", "lonlat", "tags.power", "tags.substation", "tags.voltage", "tags.frequency", "Type", "Country", ] ]
However, pandas warns that "operating as a set operation is deprecated" for the `columns & [...]` form.
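A forward-compatible way to select only the columns that actually exist is `Index.intersection`, which avoids the deprecated set-style `&` on a column index. A minimal sketch (the miniature DataFrame below is hypothetical; the real one comes from the OSM extraction and may be missing some of the listed columns):

```python
import pandas as pd

# Hypothetical stand-in for df_all_substations with only some columns present.
df_all_substations = pd.DataFrame(
    {"id": [1, 2], "lonlat": ["(0,0)", "(1,1)"], "tags.power": ["substation"] * 2}
)

wanted = ["id", "lonlat", "tags.power", "tags.substation",
          "tags.voltage", "tags.frequency", "Type", "Country"]

# Index.intersection keeps only the columns that actually exist,
# so missing columns no longer raise a KeyError.
df_all_substations = df_all_substations[df_all_substations.columns.intersection(wanted)]
```

This also sidesteps the pandas deprecation warning, since no set operation on the Index is involved.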
We need a bus_id
To Do:
osm_pbf_power_data_extractor.py does not contain cleaning for generators. Some cleaning rules are given in the test Jupyter notebook; however, these raise a SettingWithCopyWarning.
Todo: correct the warning and commit to the main script.
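The usual cause of a SettingWithCopyWarning is assigning to a slice of a DataFrame; taking an explicit copy resolves it. A minimal sketch with hypothetical column values:

```python
import pandas as pd

# Hypothetical miniature OSM extract.
df = pd.DataFrame({"tags.power": ["generator", "line"],
                   "tags.voltage": ["330000", None]})

# Assigning to a plain slice triggers SettingWithCopyWarning:
#   gens = df[df["tags.power"] == "generator"]
#   gens["tags.voltage"] = ...   # <- warning
# An explicit .copy() makes the intent clear and silences the warning:
gens = df[df["tags.power"] == "generator"].copy()
gens["tags.voltage"] = gens["tags.voltage"].fillna("unknown")
```

The same pattern can be committed to the main script wherever the notebook rules slice before assigning.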
We have to review how the offshore buses are defined and fix the offshore shapes
Originally posted by @pz-max in #99 (comment)
The simple osm_build_network.py only uses the line.csv output and creates buses at the start and end of each line. Even though lines might look connected on the map at a high level, they are not point-to-point connected if you zoom in: the geolocation of a line end-point is not the geolocation of a start/end-point of another line. What was recognised, however, is that they end in certain areas which are eventually marked as substation areas.
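One hedged way to merge such nearly-coincident endpoints is to snap coordinates to a tolerance grid, so line ends that fall within the same cell collapse onto one bus. This is only a sketch with made-up coordinates and a made-up tolerance; a production version would likely use proper spatial clustering instead:

```python
# Hypothetical sketch: merge line endpoints that lie within ~tol degrees
# of each other by rounding, so each cluster becomes one bus id.
def snap(point, tol=0.01):
    """Round a (lon, lat) tuple onto a grid of cell size `tol`."""
    return (round(point[0] / tol) * tol, round(point[1] / tol) * tol)

lines = [((3.3751, 6.4510), (3.3902, 6.4601)),   # line A
         ((3.3899, 6.4603), (3.4100, 6.4800))]   # line B starts near A's end

buses = {}
for start, end in lines:
    for p in (start, end):
        # Endpoints snapping to the same grid cell share one bus id.
        buses.setdefault(snap(p), f"bus_{len(buses)}")
```

Here line A's end and line B's start collapse to the same bus, leaving three buses for four raw endpoints.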
TODO:
We had some issues reading OSM data for a list of countries via https://gitlab.com/dlr-ve-esy/esy-osmfilter .
A hacky solution was provided by @mnm-matin that adds some new functions in between the esy-osmfilter code.
So that the whole esy-osmfilter community can benefit from our changes/functions, we propose two solutions: to not only push our contributions back upstream, but also to have a stable package for our model:
Comparing the OSM-extracted network map to the World Bank-provided map, we could see that mixing both datasets would be best (check Egypt and Mauritania). So maybe it would make sense to combine these datasets, without adding duplicates, if possible.
To model hydro powerplants, we need data on:
Not all hydro power plants have a reservoir (run-of-river plants do not), but those that have reservoirs provide flexibility to the energy system by being able to shift generation in time.
As a starting point, we can investigate using the GRanD database (http://globaldamwatch.org/grand/). The data can be freely used, but note that the license doesn't allow us to redistribute the data. There may also be other suitable databases that we could use.
The main task for this issue is integrating the reservoir data (from e.g. GRanD) with our existing powerplant data (from powerplantmatching), and inflow modelling data (see #38).
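A simple way to start the integration is a nearest-neighbour match between plant and dam coordinates. The sketch below uses hypothetical names, coordinates and fields (the real data would come from powerplantmatching and GRanD), and a planar distance that is only a heuristic:

```python
import math

# Hypothetical hydro plants (powerplantmatching) and dams (GRanD).
plants = [{"name": "PlantA", "lat": 0.0, "lon": 32.4},
          {"name": "PlantB", "lat": -15.8, "lon": 28.8}]
dams = [{"dam": "Dam1", "lat": 0.05, "lon": 32.45, "capacity_mcm": 2700.0},
        {"dam": "Dam2", "lat": -15.75, "lon": 28.85, "capacity_mcm": 5000.0}]

def dist(a, b):
    # Rough planar distance in degrees; fine as a nearest-match heuristic.
    return math.hypot(a["lat"] - b["lat"], a["lon"] - b["lon"])

# Attach each plant to its nearest dam. A real version should also enforce
# a maximum-distance cutoff so run-of-river plants are not force-matched.
for p in plants:
    p["reservoir"] = min(dams, key=lambda d: dist(p, d))["dam"]
```

The cutoff is important precisely because not every plant has a reservoir.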
So far we have the OpenStreetMap (OSM) power.line dataset which represents overhead lines.
We could add OSM power.cable data to our dataset to add underground/oversea lines. Such underground/oversea lines are also implemented in PyPSA-Eur.
The script seems to run now, though a couple of non-critical problems appear. It may be good to solve them.
The osm_pbf_power_data_extractor.py extracts OpenStreetMap data for power lines, cables and substations at the moment. It could be put in an extra repository for the following reason:
Some thought needs to be put into that, as well as some time spent on proper execution.
For now, we extract OpenStreetMap data only for Africa in our osm_pbf_power_data_extractor.py.
Because we want to apply some ML to detect energy assets (and because it would be just a great feature for the community), we need to add all other continents to the extraction process. You might need to adjust the iso_country script to exclude countries that represent small islands.
The script should be operable from a list, i.e. ["Africa", "Europe", "Asia", "Islands"].
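Since Geofabrik publishes continent extracts under a fixed URL pattern, the list-driven interface could be as small as a URL map. A sketch (the pattern `<region>-latest.osm.pbf` matches Geofabrik's published layout; the continent list is the example from above):

```python
# Build Geofabrik download URLs from a list of continents.
# Geofabrik serves extracts as https://download.geofabrik.de/<region>-latest.osm.pbf
continents = ["africa", "europe", "asia"]
urls = {c: f"https://download.geofabrik.de/{c}-latest.osm.pbf" for c in continents}
```

The extractor could then iterate over `urls` instead of hard-coding Africa.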
A Tanzania line has issues in defining its ending: the boundary of the geometry does not consist of 2 points (start/end).
For the whole code-base, we need to agree on a way to internally represent country names. For WP2, I have adopted ISO 3166 two-letter country codes. For OSM data extraction, country names are used (see also this note in #37). We should find a convention and stick to it.
Personally, I would argue that ISO 3166 country codes (https://www.iso.org/iso-3166-country-codes.html) are the way to go, at least for internal representation in code. In WP2, I have already had to patch the powerplantmatching tool to work with two-letter country codes, because the different databases being merged use different names for some countries. Country names depend on language, have short and long forms and sometimes contain special characters (e.g. Côte d'Ivoire) which may or may not be converted to ASCII equivalent depending on the data source. Therefore I think we are setting ourselves up for trouble if we want to use full country names in code.
Of course, when it comes to presentation, we should use full country names. There is already the dictionary at https://github.com/pypsa-meets-africa/pypsa-africa/blob/main/scripts/iso_country_codes.py, and the python package pycountry
also provides easy tools for working with country names.
The alternative of using full country names internally is of course also possible, but then we need to at the very least have a strict standard for which form of the names we use. Let's discuss!
As a side note, I think that using full names internally in PyPSA-Eur works, but even there it might have been easier to just go for the country codes. I will probably raise the issue at least with powerplantmatching and see if the upstream there is interested in using two-letter country codes instead of full names (at least internally within powerplantmatching).
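The patching problem above boils down to normalising spelling variants onto alpha-2 codes. The pycountry package does this comprehensively; the sketch below only illustrates the idea with a small hand-made alias table (all entries are examples, not an exhaustive mapping):

```python
# Minimal sketch of the normalisation pycountry provides more completely:
# map spelling variants found in merged databases to ISO 3166-1 alpha-2.
ALIASES = {
    "côte d'ivoire": "CI", "cote d'ivoire": "CI", "ivory coast": "CI",
    "democratic republic of the congo": "CD", "congo, dem. rep.": "CD",
    "nigeria": "NG",
}

def to_alpha2(name: str) -> str:
    """Return the two-letter code for a known country-name variant."""
    return ALIASES[name.strip().lower()]
```

Names with diacritics, short/long forms and database-specific spellings all funnel into one stable code, which is exactly why the codes are safer for internal use.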
The way we collect powerplant data (at least conventional) is using the powerplantmatching tools (see WP2).
WP5 has produced a geojson file with powerplant data for Africa. To integrate this with the other powerplant databases (GPD, CARMA, GEO), a function has to be written for powerplantmatching to "import" the geojson file. See https://powerplantmatching.readthedocs.io/en/latest/contributing.html#integrating-new-data-sources. For the time being, we are using a fork of powerplantmatching at https://github.com/koen-vg/powerplantmatching. Note that this fork has been modified to work with two-letter country codes (ISO 3166-1) rather than full names.
A working prototype of the importer function is good enough for now.
Eventually, we need to think about how to pass the data to the powerplantmatching tool, etc. When our solution is robust, we should contribute it back upstream to powerplantmatching, which will also simplify our workflow.
The osm_build_network.py script creates buses from the line data at the start and end point of each line.
Though, looking at the image below, which can be reproduced with pypsa-africa/notebooks/osm_build_network_plot.ipynb, we find that some buses appear to be relatively far from the main grid:
Further, osm_build_network.py also defines the low-voltage buses. Currently, most buses are set to "True", and at locations with multiple buses (LV, MV, HV) only the lowest-voltage bus is "True" while the others are set to "False". We recognised, for instance by looking at hydropower plants, that we currently assume these are also LV buses. We need to:
Our goal is to keep the network structure of the "real system" in the representation. Since the Voronoi cells will spread from these LV buses later, it may be a good option to keep the generators as LV_bus. In Aswan, Egypt, a roughly 1.4 GW solar PV plant is installed in the desert, far from the city. If Egypt is modelled in high resolution, it might be useful to have the grid bottlenecks represented, i.e. by spreading the Voronoi cell from the generator substation.
As discussed with @pz-max, it is well known that the build_bus_regions output appears to have wide blank areas, mainly related to Somalia, Chad, the Central African Republic and South Sudan.
For Somalia there are really no data, but for the other countries there are only a few data points (1-5 lines/buses), so that when the Voronoi cells are calculated, vast areas of the countries remain blank.
My feeling is that this issue depends on the implementation of voronoi_partition_pts, which limits the size of each Voronoi cell. When the number of points is large and widespread all over the region this is fine, but in our case, where in some countries we have 1-3 points very close to each other, the above does not work.
I see two options:
Option (a) may be faster.
Regarding the offshore areas, the same problem may apply there as well; however, some countries may not have offshore buses.
I am not sure whether we should introduce fake nodes and lines to do so at this stage. Maybe we should tackle that issue when we will address the "under_construction" lines or similar.
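To make the sparse-seed effect concrete, here is a brute-force toy illustration (pure Python, hypothetical coordinates): with only two seeds very close together, each "Voronoi cell" (nearest-seed region, evaluated on a coarse grid) covers roughly half of a 10x10 degree box unless it is clipped to the country shape.

```python
# Toy illustration: with very few, tightly clustered seed points, each
# nearest-seed region (the Voronoi cell) blankets a huge area.
seeds = [(0.0, 0.0), (0.1, 0.1)]  # two buses very close to each other

def region_of(x, y):
    """Index of the nearest seed, i.e. which Voronoi cell (x, y) falls in."""
    return min(range(len(seeds)),
               key=lambda i: (x - seeds[i][0]) ** 2 + (y - seeds[i][1]) ** 2)

# Count grid cells per region over a 10x10 degree box at 0.1 degree steps.
counts = [0, 0]
for gx in range(-50, 50):
    for gy in range(-50, 50):
        counts[region_of(gx / 10, gy / 10)] += 1
```

Both regions end up enormous relative to the seed spacing, which is the behaviour that leaves vast country areas attached to 1-3 buses.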
As we are developing the database of powerplants, we also need to model hydro inflow for all hydro plants. That is, for each hydro plant, we need hourly time series giving the amount of water flowing into the reservoir of the power plant (expressed as energy-equivalent inflow in MW).
As a first step, we should set up atlite to do this. That's how it's done in PyPSA-Eur, and can serve as inspiration here. The inflow modelling in PyPSA-Eur is quite simple, but is a good starting point. If we want, we can later integrate more advanced hydro-specific tools such as LISFLOOD.
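The core of the PyPSA-Eur-style approach is a normalisation step: take a relative runoff profile and scale it so the annual inflow energy matches a reported total. A sketch with made-up numbers (the profile and annual total are purely illustrative; atlite would supply the real runoff series):

```python
# Scale a relative runoff profile so that the annual inflow energy
# matches a reported total (the PyPSA-Eur-style normalisation).
runoff = [0.2, 0.5, 1.0, 0.8, 0.5, 0.2, 0.1, 0.1]  # hypothetical hourly shape
annual_generation_mwh = 400.0                       # hypothetical reported total

scale = annual_generation_mwh / sum(runoff)
inflow_mw = [r * scale for r in runoff]             # hourly inflow in MW
```

This keeps the temporal shape from the runoff data while anchoring the magnitude to observed generation statistics.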
Resources:
We download generators, towers, lines, etc. from Geofabrik.
By default, Geofabrik assigns one country shape to "Senegal-Gambia".
According to ISO 3166, these should be split and represented separately.
We need to do the following:
Functions should also work outside the snakemake workflow (see PyPSA-Eur PR PyPSA/pypsa-eur#275). Instead of using snakemake inside a function, we should simply assign it to a variable.
An example of do's and don'ts
Don't:
def load(n, cost=snakemake.input["cost"]):
Don't:
def load(n):
    cost = snakemake.input["cost"]
Do:
def load(n, costs):
    ...
and later, at execution, assign e.g. the costs from snakemake:
costs = snakemake.input["cost"]
BUG 1 – Something wrong with filling the dataframe. I would try another method than df.loc[variable, "pop"] = gadm_pop to test where the bug comes from.
(toast) max@max-XPS-13-9300:~/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa$ /home/max/anaconda3/envs/toast/bin/python /home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py
This is the repository path: /home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts
Had to go 0 folder(s) up.
Create country shapes
/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
for feature in features_lst:
Create offshore shapes
/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
for feature in features_lst:
Merge country shapes
Creation GADM GeoDataFrame
/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
for feature in features_lst:
Add population data to GADM GeoDataFrame
Download WorldPop datasets
DZ : 0 out of 1504
103142.46 (PRINT BY MAX. To check the population data)
Traceback (most recent call last):
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 574, in <module>
gadm_shapes = gadm(layer_id, update, out_logging, year)
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 529, in gadm
add_population_data(df_gadm, countries, year, update, out_logging)
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 383, in add_population_data
df_gadm.loc[index, "pop"] = pop_by_geom
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/indexing.py", line 723, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/indexing.py", line 1730, in _setitem_with_indexer
self._setitem_with_indexer_split_path(indexer, value, name)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/indexing.py", line 1817, in _setitem_with_indexer_split_path
self._setitem_single_column(loc, value, pi)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/indexing.py", line 1920, in _setitem_single_column
ser._mgr = ser._mgr.setitem(indexer=(pi,), value=value)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 355, in setitem
return self.apply("setitem", indexer=indexer, value=value)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 327, in apply
applied = getattr(b, f)(**kwargs)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 926, in setitem
if not self._can_hold_element(value):
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 622, in _can_hold_element
return can_hold_element(self.values, element)
File "/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 2192, in can_hold_element
if is_float(element) and element.is_integer():
AttributeError: 'numpy.float32' object has no attribute 'is_integer'
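The traceback suggests pandas chokes when a numpy.float32 scalar is assigned via .loc in this pandas version. A hedged workaround (assuming the value really arrives as a numpy scalar, as the print statement indicates) is to cast to a builtin float before the assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature df_gadm with the population column to fill.
df_gadm = pd.DataFrame({"pop": [0.0, 0.0]})
pop_by_geom = np.float32(103142.46)  # value as returned by the raster sum

# Casting to a builtin float avoids the `.is_integer()` call on the
# numpy scalar that crashes inside pandas' indexing machinery.
df_gadm.loc[0, "pop"] = float(pop_by_geom)
```

Alternatively, computing the raster sums as float64 from the start would sidestep the cast entirely.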
The script works fine but there are some areas of improvements which I will list in this issue. There are various TODO: comments scattered throughout the code as well.
Disclaimer: I have written most of the code in the script and that is why I feel so liberal critiquing it.
To avoid issues with the accessibility of online resources (see the problems with the gadm platform), we may decide to store a sample of raw data to be used internally for debugging.
The sample may cover just 1-2 countries, enough to run the code.
To do so, we may allow pushing changes to the folder data/raw, for example (assuming that will be the one used to store the raw data).
I let the extractor script run from scratch. The tags were extracted for the lines, but not for the cables.
The missing tags.circuits and tags.frequency columns lead to issues in the cleaning script.
The cleaning script is now stable, but maybe we can solve that issue.
This has been discussed many times but I think a document explaining how the fields in the csv files produced by entsoegridkit are related should be created. In other words: what rules are followed for the relation of generator_id, bus_id, bus_0, bus_1, etc.
An explanation of the format of the "tags" would also be helpful, e.g. "TSO"=>"".
Some geometries are listed as "oid"=>"29894" (ArcGIS?), although values such as POINT(39.673004 22.086912) are given in later fields without headers.
It is also important to keep in mind the format of the osm data. We currently have generators, substations and lines only.
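The `"key"=>"value"` pairs follow the hstore-style text format, so a rough parser is straightforward. A sketch (assumptions: values contain no escaped quotes, and the CSV quote-doubling has already been undone by the csv reader):

```python
import re

# Rough parser for hstore-style `"key"=>"value"` tag strings.
# Assumes no escaped quotes inside values and that CSV quote-doubling
# was already undone by the csv reader.
def parse_tags(raw: str) -> dict:
    return dict(re.findall(r'"([^"]*)"=>"([^"]*)"', raw))
```

For example, `parse_tags('"oid"=>"29894", "TSO"=>""')` recovers both pairs, including the empty TSO value.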
Climate Compatible Growth is providing a documented dataset for each country in Africa.
Some of their data is useful for our project, for instance capital cost, variable cost, discount rates, and average capacity factors for non-variable generators. This data is also provided per country over the lifetime from 2015-2050.
One task would be to create a dataframe from the above data that looks like the PyPSA-Eur one, just for multiple years and with a new column called 'country'.
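The target shape could be a long table with one row per (country, year, technology) and the cost fields as columns. A sketch with hypothetical countries, years and values, just to fix the layout:

```python
import pandas as pd

# Hypothetical sketch: per-country, per-year cost assumptions in a long
# PyPSA-Eur-style table with an extra 'country' column.
records = []
for country in ["NG", "KE"]:
    for year in [2015, 2030, 2050]:
        records.append({"country": country, "year": year,
                        "technology": "solar", "capital_cost": 600.0})

costs = pd.DataFrame(records)
```

From this layout, the single-year PyPSA-Eur view is just a filter: `costs[costs["year"] == 2030]`.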
In the GADM dataset, used to load the shapes, Morocco and Western Sahara are managed as two different countries, whereas the geofabrik dataset accounts for them as a single country.
Thus we have two options:
The build_renewable_profiles.py script from PyPSA-Eur might be split into:
build_renewable_area_availability
build_renewable_profiles
The intention behind this is that the code base might stay more comprehensible. Further, potential new additions might require new features either on the time-series or the land-availability side; for instance, an area availability map could be produced for each technology separately.
Plotted the PyPSA-Eur 'base_network.py' inputs here. Creating static plots is good, but dynamic plots are even better. By dynamic plots, I mean that you can zoom in and out and move around the map, which is not only useful for detailed data exploration at various locations but also beneficial for future analysis.
The above script already builds a dynamic plot via Folium. However, slightly more effort should be put into this so that multiple assets can be plotted in one map with different colours.
Further, one could put some thought into how all plots in PyPSA could be dynamic, i.e. n.plot(dynamic=True).
In short:
ERA5 is ok for wind time-series, but not so good for solar time-series in some regions.
Urraca et al (2018) summarise it nicely:
"[...] This makes ERA5 comparable with satellite-derived products in terms of the mean bias in most inland stations, but ERA5 results degrade in coastal areas and mountains. The bias of ERA5 varies with the cloudiness, overestimating under cloudy conditions and slightly underestimating under clear-skies, which suggests a poor prediction of cloud patterns and leads to larger absolute errors than that of satellite-based products. [...] We conclude that ERA5 and COSMO-REA6 have reduced the gap between reanalysis and satellite-based data, but further development is required in the prediction of clouds while the spatial grid of ERA5 (31 km) remains inadequate for places with high variability of surface irradiance (coasts and mountains). Satellite-based data should be still used when available, but having in mind their limitations, ERA5 is a valid alternative for situations in which satellite-based data are missing (polar regions and gaps in times series) while COSMO-REA6 complements ERA5 in Central and Northern Europe mitigating the limitations of ERA5 in coastal areas."
Originally posted by @euronion in #50 (comment)
We have created a "simple" network topology with osm_build_network.py. This network data is suitable for trying to make the PyPSA-Eur base_network.py run, which does many of the jobs needed.
Error message: zipfile.BadZipFile: File is not a zip file. Seems to me that GADM has some failing features. Or did it work for you for the whole continent @davide-f?
This is the repository path: /home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts
Had to go 0 folder(s) up.
Create country shapes
/home/max/anaconda3/envs/toast/lib/python3.9/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
for feature in features_lst:
Traceback (most recent call last):
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 565, in <module>
country_shapes = countries(update, out_logging)
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 172, in countries
df_countries = get_GADM_layer(countries, 0, update)
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 112, in get_GADM_layer
file_gpkg, name_file = download_GADM(country_code, False)
File "/home/max/OneDrive/PHD-Flexibility/07_pypsa-africa/0github/pypsa-africa/pypsa-africa/scripts/build_shapes.py", line 86, in download_GADM
with zipfile.ZipFile(GADM_inputfile_zip, "r") as zip_ref:
File "/home/max/anaconda3/envs/toast/lib/python3.9/zipfile.py", line 1257, in __init__
self._RealGetContents()
File "/home/max/anaconda3/envs/toast/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
We need timeseries data for solar, wind and hydro. I will try to implement existing Atlite approaches.
Update: we tested the current version of powerplantmatching in branch WP2. @mnm-matin and I had the following issue:
Further, just a note: it seems that the issue you had with the custom_config is solved (see PyPSA/powerplantmatching#41). As you mentioned in the README, it probably makes sense to use the less hacky custom_config feature now that the issue is fixed. There are also quite some updates upstream that might remove the above-described issue.
Let us know your thoughts or plans 👍
I introduce the current missing elements of build_shapes to discuss, track and define the next steps.
snakemake -j 1 build_shapes
run from the pypsa-africa root (currently requires path adjustments). AttributeError: 'numpy.float32' object has no attribute 'is_integer' (see comment below from Max).
The file of the EEZ zones currently has to be downloaded by the user from the website (https://www.marineregions.org/download_file.php?name=World_EEZ_v11_20191118_gpkg.zip). We may include this file in the package to spare the user the manual download.
(other issues will be progressively added)
As in PyPSA-Eur.
To do:
separate voltage
under construction info
Create bus_id -> start & end
Line ID
DC (links) and AC line separation (Transformer in between & station at the location)
Intersecting lines should join?
Line changes voltage -> transformer in between
Line end and start should be a bus
Big point. Connect all the loose lines (create one network?)
When applying the power extractor tool to the whole of Africa, about 100 unexpected lines are saved in the csv.
Such lines may not have buses, or a specific country associated.
I pushed the output file I obtained to GitHub; it is the most up to date, given the scripts.
See TODO comments in cleaning scripts. Would have to refine and add rules for cleaning inclusive of comments.
We should operate osm_pbf_power_data_extractor.py with snakemake, e.g. via the command snakemake -j 1 extract_osm_data.
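A rule along these lines could expose the extractor to the workflow; the output path below is hypothetical and would need to match what the script actually writes. This is a config fragment, not a tested implementation:

```snakemake
rule extract_osm_data:
    output:
        # Hypothetical path; adjust to the extractor's real output files.
        "data/raw/africa_all_raw_lines.geojson"
    script:
        "scripts/osm_pbf_power_data_extractor.py"
```

With the script: directive, snakemake injects the `snakemake` object, which is also why the functions themselves should not reference it directly (see the do's and don'ts above).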
Our documentation is currently not built automatically, see:
https://pypsa-meets-africa.readthedocs.io/en/latest/api_reference.html
We should fix that before the prototype is released.
To reproduce the error message:
conda env create -f envs/environment.docs.yaml
cd doc
make html
to build the documentation locally. Some places that can help to solve the issue:
Readthedocs uses Sphinx. In particular, the sphinx.ext.autodoc creates the automated documentation. It is documented here.
Examples of other APIs (automated documentation):