pylib's Introduction

ICOS Carbon Portal Python Package

About ICOS

The Integrated Carbon Observation System, ICOS, is a European-wide greenhouse gas research infrastructure. ICOS produces standardised data on greenhouse gas concentrations in the atmosphere, as well as on carbon fluxes between the atmosphere, earth and oceans. This information is being used by scientists as well as by decision makers in predicting and mitigating climate change. The high-quality and open ICOS data is based on the measurements from over 130 stations across Europe. For more information about the ICOS station network, data quality control and assurance, and much more, please read the ICOS Handbook 2022, or visit our website https://www.icos-cp.eu/.

This package is under active development. Please be aware that names of functions and classes may change without further notice. If you try it out, please send us feedback on any recommendations, issues, etc.

What is the package about? In essence, this package gives you direct access to data objects hosted at the ICOS Carbon Portal (https://data.icos-cp.eu/) for which a "Preview" is available. By using this library you can load data files directly into memory.

Please be aware that by either downloading data or accessing data directly through this library, you agree and accept that all ICOS data is provided under a CC BY 4.0 licence.

Installation

The latest release is available on https://pypi.org/project/icoscp/. You can simply run

pip install icoscp

If you need the cutting-edge version, you may install the library directly from GitHub with

pip install git+https://github.com/ICOS-Carbon-Portal/pylib.git

We encourage you to use a Python virtual environment to test this library. For example, with Miniconda you can create a new environment with:

  • conda create -n icos python
  • activate icos
  • pip install icoscp
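
After installing, one way to confirm that the package is visible in the active environment is to ask Python for the installed distribution version. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of icoscp.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("icoscp")
print(version if version else "icoscp is not installed in this environment")
```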

Documentation

The full documentation for the library and all its modules is available at https://icos-carbon-portal.github.io/pylib/

Development

For instructions about how to go about extending and testing this software, please see <development.md>

pylib's People

Contributors

altix, andreby, claudiodonofrio, gareth-j, karolinapntzt, klarakristina, mirzov, tylere, ukarst, zogopz

pylib's Issues

Filter out non-icos stations.

Station.getList('AS') returns stations that are not ICOS stations.
Due to a change in the metadata the sparql-query used to retrieve the stations needs an update. This affects several notebooks.

  • Update the needed sparql queries to return the correct stations.
  • Filter out non-icos stations when querying themed stations.

Order of columns in `get_ts()`.

  • Fix the order of the columns returned by get_ts() which seems a bit arbitrary.
  • Also fix the order of the aforementioned columns in the documentation.

Check compatibility of notebooks with pylib 0.1.15

The following files contain a reference to .station, .info or .colNames, all of which have changed in the new pylib version.
We need to check all of them and possibly provide a fix.

.info

  • notebooks\icos_jupyter_notebooks\radiocarbon\gui_measured_cp.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\gui_stilt.py IS
  • notebooks\icos_jupyter_notebooks\station_characterization\gui.py IS
  • notebooks\icos_jupyter_notebooks\station_characterization\stc_generate_PDFs.py IS
  • notebooks\project_jupyter_notebooks\envrifair_winterschool\timeseries\modules\bokeh_funcs.py AD
  • notebooks\education\PhD\upscaling_carbon_fluxes\notebooks\exercise2.ipynb ZZ
  • notebooks\education\PhD\upscaling_carbon_fluxes\notebooks\Station_example.ipynb ZZ
  • notebooks\education\PhD\upscaling_carbon_fluxes\notebooks\tools.ipynb ZZ
  • notebooks\icos_jupyter_notebooks\as_obs_tools\icos_as_obs_tools.ipynb AD
  • notebooks\icos_jupyter_notebooks\as_stat_tools\icos_as_stat_tools.ipynb AD
  • notebooks\icos_jupyter_notebooks\icos_STILT\icos_stilt_tools.ipynb AD
  • notebooks\project_jupyter_notebooks\envrifair_winterschool\map\exercise3_station.ipynb ZZ
  • notebooks\project_jupyter_notebooks\envrifair_winterschool\timeseries\icos_obs_vs_stilt_timeseries.ipynb AD
  • notebooks\pylib_examples\ex1_data.ipynb ZZ
  • notebooks\pylib_examples\ex2_station.ipynb ZZ
  • notebooks\pylib_examples\ex4_collection.ipynb ZZ

.station

  • notebooks\icos_jupyter_notebooks\network_characterization\gui_network_characterization.py IS
  • notebooks\icos_jupyter_notebooks\network_characterization\gui_percent_aggregate_footprints.py IS
  • notebooks\icos_jupyter_notebooks\network_characterization\network_object.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\gui_measured_cp.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\gui_overview_radiocarbon.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\gui_stilt.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\radiocarbon_functions.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\radiocarbon_object.py IS
  • notebooks\icos_jupyter_notebooks\radiocarbon\radiocarbon_object_cp.py IS
  • notebooks\icos_jupyter_notebooks\station_characterization\gui.py IS
  • notebooks\icos_jupyter_notebooks\station_characterization\stationchar.py IS
  • notebooks\icos_jupyter_notebooks\station_characterization\stc_functions.py IS
  • notebooks\icos_jupyter_notebooks\station_characterization\stc_generate_PDFs.py IS
  • notebooks\project_jupyter_notebooks\envrifair_winterschool\timeseries\modules\bokeh_funcs.py AD
  • notebooks\icos_jupyter_notebooks\as_obs_tools\icos_as_obs_tools.ipynb ZZ
  • notebooks\icos_jupyter_notebooks\as_stat_tools\icos_as_stat_tools.ipynb ZZ
  • notebooks\icos_jupyter_notebooks\availability_tools\icos_availability_tools.ipynb ZZ
  • notebooks\icos_jupyter_notebooks\icos_STILT\icos_stilt_tools.ipynb AD
  • notebooks\project_jupyter_notebooks\envrifair_winterschool\map\exercise3_station.ipynb ZZ
  • notebooks\project_jupyter_notebooks\envrifair_winterschool\timeseries\icos_obs_vs_stilt_timeseries.ipynb AD
  • notebooks\project_jupyter_notebooks\RINGO_T1.3\modules\flasksampling_modules_ff.ipynb ZZ
  • notebooks\pylib_examples\ex2_station.ipynb ZZ
  • notebooks\pylib_examples\ex5_sparql.ipynb ZZ

.colNames

  • notebooks\icos_jupyter_notebooks\as_obs_tools\icos_as_obs_tools.ipynb
  • notebooks\pylib_examples\ex1_data.ipynb ZZ
  • notebooks\pylib_examples\ex4_collection.ipynb ZZ

stats reporting

Adjust stats reporting to reflect the origin of the request:
either in-house Jupyter services or external.

find station method

The station module should contain a function .find('search string') that finds all relevant stations. For example, a search for Norunda should return two (2) station objects, one for the ecosystem station and one for the atmospheric station (ids SE-Nor and NOR). This find function should be a full-text search over all station properties.
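
A full-text search of this kind can be sketched as a case-insensitive substring match over every property value. The code below is only an illustration under assumptions: the function signature, the dictionary layout and the sample records are made up for the example and do not reflect the real station metadata.

```python
def find(stations, text):
    """Return all station records where any property contains the search text."""
    needle = text.casefold()
    return [
        station
        for station in stations
        if any(needle in str(value).casefold() for value in station.values())
    ]


# Illustrative records only; real station objects carry many more properties.
STATIONS = [
    {"id": "SE-Nor", "name": "Norunda", "theme": "ES"},
    {"id": "NOR", "name": "Norunda", "theme": "AS"},
    {"id": "HTM", "name": "Hyltemossa", "theme": "AS"},
]

matches = find(STATIONS, "norunda")
print([station["id"] for station in matches])
```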

Regenerate json file in stilt module.

Use the built-in _save function in icoscp/stilt/geoinfo.py to regenerate the stations.json file.
The script must be executed on a Carbon Portal server, with access to the stilt data file system.

  • Regenerate file stations.json.
  • Update pylib's documentation to include these changes.

Problems when building the library.

When building pylib to release the next version, running python setup.py sdist bdist_wheel automatically modifies the file icoscp.egg-info/PKG-INFO.

Release 0.1.14

Tasks:

  • StiltStations: get columns.. make case InSeNsitive
  • Fix folium map of stations.
    Generating a folium map of stations has stopped working with 'ALL' and 'ICOS' projects:
    station.getIdList(project='ALL', outfmt='map')
    station.getIdList(project='ICOS', outfmt='map')
    The code executes successfully with 'NEON', 'INGOS', and 'FLUXNET' projects.
  • Nominatim
    • When pausing the docker container of icos nominatim, requests to reverse geocoding are not properly forwarded to OpenStreetMap nominatim.
    • Reverse geocoding is not working correctly for zoom levels 0-4 for Spain, Gibraltar and Russia (Russia is not included in the europe .pbf files and needs to be added docker side.)

Regenerate json file in stilt module.

Due to updates in the SPARQL query of the getStation() function, the function getIdList() has also changed, so we need to regenerate the stations.json file.

  • Regenerate file stations.json.

Report usage of stilt module.

Track and report stilt's data usage back to rest-heart.
Information to be reported back:

  • station id,
  • station coordinates,
  • station country,
  • data type (or which function was executed, namely get_ts() (get time-series) or get_fp() (get footprint)),
  • library making the call,
  • version of icoscp,
  • internal flag (currently always True).

Select a unique key for this data entry and pick a constant format/schema for the data.
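
One possible shape for such a record is sketched below, assuming a flat JSON document whose key is a hash of its own content. All names here (StiltUsage, to_payload, the field names, the timestamp field) are hypothetical, and the sample coordinates are illustrative; the actual schema and key are still to be decided.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StiltUsage:
    station_id: str
    latitude: float
    longitude: float
    country: str
    function: str       # 'get_ts' or 'get_fp'
    library: str        # library making the call
    version: str        # version of icoscp
    timestamp: str      # ISO 8601 time of the call (assumed extra field)
    internal: bool = True

    def to_payload(self):
        """Return a dict with a constant set of fields plus a content-derived key."""
        body = asdict(self)
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        body["key"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return body


record = StiltUsage("HTM150", 56.10, 13.42, "SE", "get_ts",
                    "icoscp", "0.1.14", "2023-02-21T17:24:19Z")
print(json.dumps(record.to_payload(), indent=2))
```

Hashing the canonical JSON keeps the key stable for identical entries, so the same call reported twice with the same timestamp maps to the same key.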

Tasks for future releases

See here which tasks were included in release 0.1.18

  • release 0.1.19
    • #61
    • remove all unused sparql queries from the pylib, add a new tool to provide access to all example queries from the sparql endpoint http....
    • ⛓️ ICOS stations: add column to .data() indicating if the data object can be loaded with the pylib or not
    • ⛓️ ICOS dobj: if a pid is set where no data is available, return a sensible error message to the user
    • abstract the backend server for Dobj. The goal is to read datasets from other RI like fieldsites
    • add .help() function returning the url to the documentation
    • #76
    • #24
    • #132
    • #133
    • Rework the code from Jonathan here, which was included in release 0.1.18


  • Issue or card?
    • #134 This could perhaps wait until we remove the sparqls from the station module.
    • Reporting data. __portalUse() currently reports data access. Maybe create a module that can be used from stilt and dobj or collections.
    • Add icos data link to stilt station. (https://stilt.icos-cp.eu/viewer/stationinfo) (Needs elaboration)
    • Consider renaming the cpb module.
    • Filter station.data() by product ('co2' for example).
    • Go through the code and check for strings that need to be added to constants.
    • Licence by pid is available from the metadata store. Check it out by sparqling any pid. License should be fetched from the metadata store and should not be set as a fixed string.
    • Automation of pylib. (🌿 $git checkout tests locally for Zois)
      • Test automation (pytest)
      • Force coding styles (flake8)
      • Automate distribution to PyPI (GitHub workflows, actions, etc.) (tox)
      • Zois & Claudio have both worked on this. No issue has yet been published.
    • Convert countries to module (🌿 Here's the branch published, although there isn't any actual work in it: https://github.com/ICOS-Carbon-Portal/pylib/tree/zz_countries_module)
    • Switch nominatim to something lighter independent of release (Probably make a card for this)
      • Include automation (see point below) (Learn how to automate)
      • Implement as webservice with docker.
      • Minimal python
      • Shapefiles
      • More info on slack.

  • Anders's comments/ideas
    • There are several function calls of the pylib with the parameter ‘project’. Some of this code is implemented only for ‘project’ = ‘ICOS’ (which is fine), but some are related to 'NEON', 'INGOS' or, 'FLUXNET' which is confusing for a user who might compare the pylib to the data portal (which, by the time of writing, has these seven projects: European ObsPack, GCP, ICOS, InGOS, Miscellaneous (Various other data not associated with a specific project), SOCAT, Swedish National Network).
      Should we clarify or explain this somehow?
    • Some improvement suggestions for the Station object, of icoscp.station:
      1. The Station object has the property products. By default products() returns a pandas DataFrame (provided the station object has ‘data’), but it can also deliver a dictionary. That dictionary is just the DataFrame converted into a dictionary under the key 0 (zero), so products(‘dict’)[0] is itself a dictionary. A better choice would be to return:
        • A dictionary with useful keys. In the atmospheric case (we could have other implementations for ES, OS): product labels with sampling heights like: {'Atmospheric CO2 product': [2.5, 8, 33, 60, 150], 'ICOS ATC CO2 Release': [2.5, 10, 30, 60, 150],...}
        • Just a list with labels instead of a data-frame
      2. The Station object could have a method get_dobj() with parameters doi or a product-key, where the product-key, in the case of 'AS' could be a tuple like (‘ICOS ATC CO2 Release’, 60).
    • icoscp.station.fmap() – if provided a dataframe with product data, we could give much more specific info on the station – e.g. if the product_df contains the column 'samplingheight', 'height' or 'depth'. This could be displayed on the map. Here is an example where I renamed the column 'samplingheight' to 'elevation'
      Screenshot from 2023-02-21 17-24-19

Update stationData() sparql.

The data property of the station object is slow, and its query lacks the prefixes:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

It would be better to use a narrower query instead of fetching records we don't want.

Property returns wrong content.

station.name property returns some index number, not the string containing the station name.
station[2] gives the correct station name.

See for example:

from icoscp.station import station
stationlist=station.getIdList(project='all')
for i, stat in stationlist.iterrows():
    if (stat.theme in ['AS','ATMO']): 
        print(stat.name,stat.id,stat.uri,stat[2])
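
This behaviour is consistent with how pandas works: each row yielded by iterrows() is a Series, and the attribute Series.name holds the row's index label, so stat.name never reaches a column that happens to be called name. The sketch below uses a small made-up DataFrame (not the real getIdList() output) to show the difference and two ways to read the column by label.

```python
import pandas as pd

# Made-up data with a column literally called "name".
stations = pd.DataFrame(
    {"id": ["HTM", "NOR"], "name": ["Hyltemossa", "Norunda"], "theme": ["AS", "AS"]}
)

for idx, stat in stations.iterrows():
    # Series.name is the index label of the row, not the "name" column.
    assert stat.name == idx
    print(stat["id"], stat["name"])

# Bracket access always refers to the column.
names_by_label = [stat["name"] for _, stat in stations.iterrows()]

# itertuples() exposes columns as attributes of a namedtuple.
names_by_tuple = [row.name for row in stations.itertuples(index=False)]
```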

Multiple PIs not supported; bad naming of PI-related props

There are multiple ICOS stations with more than one PI. The object returned from e.g. station.get('CMN') does not reflect that. Additionally, the object representing the station has the properties "firstName", "lastName" and "email", which does not make sense for a station. There should be a property called "pis", with an array of objects representing people as the value.

Also, there is no reason for the "uri" property to be an array.
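
The proposed shape could look roughly like the following. This is only a sketch of the data model suggested above: the class names, field names, URI and people are placeholders, not the existing Station implementation or real metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    first_name: str
    last_name: str
    email: str


@dataclass
class Station:
    station_id: str
    uri: str                                         # a single string, not an array
    pis: List[Person] = field(default_factory=list)  # zero or more PIs


# Placeholder values for illustration only.
cmn = Station(
    station_id="CMN",
    uri="https://example.org/stations/CMN",
    pis=[
        Person("First", "PI", "first.pi@example.org"),
        Person("Second", "PI", "second.pi@example.org"),
    ],
)
print([f"{p.first_name} {p.last_name}" for p in cmn.pis])
```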

Controlled exiting of stilt error.

Currently when users try to run the stilt module locally they sometimes get a
NameError: name 'exit' is not defined error.

  • Update user information when they are trying to access the module locally.
  • Fix the NameError: name 'exit' is not defined error.
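
For background, exit and quit are added to the builtins by Python's site module for interactive use, so library code cannot rely on them being defined. A hedged sketch of a more controlled alternative follows: raise a dedicated exception inside the library and let a script decide to stop with sys.exit. The path, exception name and function names are hypothetical.

```python
import os
import sys

# Hypothetical placeholder; the real location of the STILT data is not shown here.
STILT_DATA_PATH = "/path/that/only/exists/on/icos/servers"


class StiltNotAvailableError(RuntimeError):
    """Raised when the STILT data file system cannot be reached."""


def check_stilt_available(path=STILT_DATA_PATH):
    if not os.path.isdir(path):
        raise StiltNotAvailableError(
            "The stilt module can only be used on ICOS servers; "
            f"the data directory '{path}' was not found."
        )
    return True


def run_script():
    """Entry point for a script: stop with a message instead of a traceback."""
    try:
        check_stilt_available()
    except StiltNotAvailableError as err:
        sys.exit(str(err))  # raises SystemExit; always available via the sys module
```

Raising an exception keeps the decision to terminate with the caller, which matters in notebooks where ending the interpreter is rarely what the user wants.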

Instrument in atmospheric measurements metadata.

Sorry if this isn't the right place but I was wondering if the instrument used for atmospheric measurements is available anywhere in the metadata? I haven't been able to find it in the station or measurement metadata, I might have just missed it though.

Thanks,

Gareth

docstring in station module check example

The example in the docstring of the station module, which shows how to get a list of atmospheric stations, does not work as described:
myList = station.getList('AS') # returns a list of Atmospheric stations
returns an error, whereas
myList = station.getList(['AS']) works as expected.

return_all_stiltstation

The function to find all STILT stations does not return the full list of stations:

from icoscp.stilt import stiltstation
stiltstation.find()

The progress bar shows that there are over 200 stations, but only around 100 are returned.

Resolve deprecation warnings from numpy.

In stiltstation and geoinfo we use

  lon = np.float(clon[:-1])
  lat = np.float(clat[:-1])
  alt = np.int(loc_ident[-5:])

These calls lead to deprecation warnings:
DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
lon = np.float(clon[:-1])
DeprecationWarning: np.int is a deprecated alias for the builtin int. To silence this warning, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
alt = np.int(loc_ident[-5:])
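
As the warning text suggests, the builtin float and int can be used in place of the deprecated aliases without changing behaviour. A minimal sketch, wrapped in a hypothetical helper; the input strings are made-up examples, since the real format of clon, clat and loc_ident is not shown here.

```python
def parse_location(clon, clat, loc_ident):
    """Same slicing as the snippet above, using builtins instead of np.float/np.int."""
    lon = float(clon[:-1])       # drop the trailing character, then convert
    lat = float(clat[:-1])
    alt = int(loc_ident[-5:])    # last five characters hold the altitude
    return lon, lat, alt


# Hypothetical inputs for illustration only.
print(parse_location("013.42E", "56.10N", "HTM00150"))
```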

Rework the default return of `get_ts` in stiltobj.py

Today the default return is ["isodate","co2.stilt","co2.fuel","co2.bio", "co2.background"].
It would be nice to have ["isodate","co2.stilt","co2.fuel","co2.bio", "co2.cement", "co2.background"]
instead, since these are the components of co2.stilt:
co2.stilt = co2.fuel + co2.bio + co2.cement + co2.background
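
With the proposed default columns, a user could verify the relation directly from the returned values. The sketch below uses a plain dictionary with made-up numbers as a stand-in for one row of the time series; the helper name is hypothetical.

```python
import math

COLUMNS = ["isodate", "co2.stilt", "co2.fuel", "co2.bio", "co2.cement", "co2.background"]
COMPONENTS = ["co2.fuel", "co2.bio", "co2.cement", "co2.background"]


def components_add_up(row, rel_tol=1e-9):
    """Check that co2.stilt equals the sum of its four components."""
    total = sum(row[name] for name in COMPONENTS)
    return math.isclose(row["co2.stilt"], total, rel_tol=rel_tol)


# Made-up values for illustration only.
row = {
    "isodate": "2020-01-01 00:00",
    "co2.stilt": 413.75,
    "co2.fuel": 5.0,
    "co2.bio": -1.5,
    "co2.cement": 0.25,
    "co2.background": 410.0,
}
print(components_add_up(row))
```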

Add availability table as output format to stilt stations

Extend the STILT module to provide an availability table for STILT stations.
The code to process and calculate STILT availabilities is already implemented and exists in jupyter-collaboration-space under cptools project.

Documentation for the STILT module can be found here.

Performance message in `Dobj()`

The following snippet:

from icoscp.cpb.dobj import Dobj
pid = 'https://meta.icos-cp.eu/objects/BEK6kHXAhE4yDdk_P9i5nF-K'
df = Dobj(pid).data

generates the warning:

/opt/conda/lib/python3.10/site-packages/icoscp/cpb/dobj.py:381: PerformanceWarning:

DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead.  To get a de-fragmented frame, use `newframe = frame.copy()`
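
The warning text itself names the usual remedy: prepare all columns first and join them once with pd.concat(axis=1) instead of inserting them one by one. The sketch below is generic pandas with made-up column names; it does not reproduce the code at dobj.py line 381, which is not shown in this issue.

```python
import pandas as pd


def build_frame(columns):
    """Build a DataFrame from a dict of column name -> values in a single concat."""
    series = [pd.Series(values, name=name) for name, values in columns.items()]
    return pd.concat(series, axis=1)


# Many columns, as in a wide data object; values are arbitrary.
raw = {f"col_{i}": [i, i + 1, i + 2] for i in range(200)}
df = build_frame(raw)
print(df.shape)
```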

Add `icon` argument to `getIdList()` function.

Extend getIdList() function by adding an icon argument.
By default, when running:
station.getIdList(project='icos', outfmt='map') the code adds a country flag to each station's location.

  • Change default behavior to a folium's built-in default marker.
    station.getIdList(project='icos', outfmt='map')
  • Add functionality for custom user defined icon.
    station.getIdList(project='icos', outfmt='map', icon='/home/.../small_sized_image.png')
  • Keep the previous functionality when the user explicitly sets icon='flag'.
    station.getIdList(project='icos', outfmt='map', icon='flag')
  • Update pylib's documentation to include these changes.

Release 0.1.12

  • Fix folium map of stations.
    Generating a folium map of stations has stopped working with 'ALL' and 'ICOS' projects:
    station.getIdList(project='ALL', outfmt='map')
    station.getIdList(project='ICOS', outfmt='map')
    The code executes successfully with 'NEON', 'INGOS', and 'FLUXNET' projects.

    The issue above was forwarded to #53

  • Add "internal" icos nominatim to the stilt reverse geocoding country search.
    The external nominatim service has a usage policy which only allows an absolute maximum of 1 request per second thus limiting the Stilt module in its reverse geocoding needs. For this reason we have deployed our own icos nominatim web service and it needs to be added in the reverse geocoding implementation.

    • First responder when reverse geocoding using latitude, longitude coordinates should be icos nominatim.
    • Second responder should be nominatim.
  • #50.
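
The first/second responder order described above can be sketched as a simple fallback chain. The two responder functions below are stand-ins that simulate an unreachable ICOS service and a working public service; they make no network calls, and none of the names are taken from the pylib code.

```python
def reverse_geocode(lat, lon, responders):
    """Try each responder in order and return the first non-empty answer."""
    for responder in responders:
        try:
            result = responder(lat, lon)
        except OSError:   # covers ConnectionError and TimeoutError
            continue
        if result:
            return result
    return None


def icos_nominatim(lat, lon):
    # Stand-in: simulate a paused or unreachable ICOS nominatim container.
    raise ConnectionError("icos nominatim not reachable")


def osm_nominatim(lat, lon):
    # Stand-in: pretend the public service answered.
    return {"country_code": "se"}


print(reverse_geocode(56.10, 13.42, [icos_nominatim, osm_nominatim]))
```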

Disable cache control

Sometimes using SPARQL to fetch data might fail. The python request will succeed, giving you an OK status_code, but the actual response is an incomplete json object. This happens due to the response being too big in size, and the cache being limited to a specific number of bytes.
Example run:

from icoscp.sparql.runsparql import RunSparql

object_specification = '<http://meta.icos-cp.eu/resources/cpmeta/atcLosGatosL0DataObject>' + '<http://meta.icos-cp.eu/resources/cpmeta/atcPicarroL0DataObject>'
query = (
    f"prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>\n"
    f"prefix prov: <http://www.w3.org/ns/prov#>\n"
    f"select ?dobj ?hasNextVersion ?spec ?fileName ?size ?submTime ?timeStart ?timeEnd\n"
    f"where {{\n"
    f"  VALUES ?spec {{{object_specification}}}\n"
    f"  ?dobj cpmeta:hasObjectSpec ?spec .\n"
    f"  ?dobj cpmeta:hasSizeInBytes ?size .\n"
    f"  ?dobj cpmeta:hasName ?fileName .\n"
    f"  ?dobj cpmeta:wasSubmittedBy/prov:endedAtTime ?submTime .\n"
    f"  ?dobj cpmeta:hasStartTime | (cpmeta:wasAcquiredBy/prov:startedAtTime) ?timeStart .\n"
    f"  ?dobj cpmeta:hasEndTime | (cpmeta:wasAcquiredBy/prov:endedAtTime) ?timeEnd .\n"
    f"  FILTER NOT EXISTS {{[] cpmeta:isNextVersionOf ?dobj}}\n"
    f"  }}\n"
    f"order by desc(?submTime)\n"
)
raw_data = RunSparql(sparql_query=query, output_format='json').run()

and the error:

SPARQL RESPONSE TOO LARGE TO BE CACHED.
The largest cacheable response size is 8388608 bytes.
Try running the query with 'Cache-Control: no-cache' to get full response
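
Following the hint in the error message, the request can carry a Cache-Control: no-cache header. The sketch below only builds such a request with the standard library and does not send it. The endpoint URL, the POST body format and the content type are assumptions, and this is not how RunSparql is implemented.

```python
import urllib.request

# Assumed endpoint URL; check the Carbon Portal documentation before relying on it.
SPARQL_ENDPOINT = "https://meta.icos-cp.eu/sparql"


def build_request(query, endpoint=SPARQL_ENDPOINT):
    """Build a POST request that asks the server to bypass its response cache."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "text/plain",   # assumed body format
            "Cache-Control": "no-cache",
        },
        method="POST",
    )


request = build_request("select * where { ?s ?p ?o } limit 1")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen() and decoding the JSON body.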

Add outfmt argument to getIdList() function.

  • Add the optional argument outfmt in getIdList() function in pylib/icoscp/station/station.py.
  • The outfmt argument can have either 'pandas' or 'map' values assigned.
  • Among other information the link to the station's DOI should be provided.
  • Rework the function's documentation (docstring).
  • Add the gh-pages documentation.

Release 0.1.10

  • Include non-code files to the distribution.
  • Report pylib's version to the back-end instead of dobj.py version.
  • Check if pylib is run locally and inform the user that the stilt module can only be used on ICOS servers.

Fix sparql query.

During the testing phase of pylib patch release 0.1.16 we noticed that the code returns some ICOS stations with None values for the station properties firstName, lastName and siteType. These are the first and last name of the station's principal investigator, and the site type of the station (for example mountain, tower, ground or grassland).

  • Update the SPARQL query in getStations() function of the SPARQL module according to Oleg's input.
  • Update stations_with_pi() function to use the getStations() function instead.
  • Add a deprecation warning for stations_with_pi() function.
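
The last two tasks can be sketched with the standard warnings module. The function bodies below are stand-ins that return made-up data; only the pattern of warning and delegating is the point.

```python
import warnings


def getStations():
    # Stand-in for the real query; returns illustrative data only.
    return [{"id": "HTM", "firstName": "First", "lastName": "PI", "siteType": "tower"}]


def stations_with_pi():
    """Deprecated wrapper that delegates to getStations()."""
    warnings.warn(
        "stations_with_pi() is deprecated; use getStations() instead.",
        DeprecationWarning,
        stacklevel=2,   # point the warning at the caller, not at this wrapper
    )
    return getStations()
```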

Remove eag attribute.

Ute's comments

The ‘eag’ (= elevation above ground level, see comments in the code) attribute doesn’t make sense anymore.
It is set to the same values as ‘eas’ (=elevation above sea level) - which is complete nonsense. As long as the eag information was coming from the labelling app, it was at least not wrong. eag should simply be removed from the list of attributes in the next version.

  • Remove the eag attribute.
  • Find any associations of the attribute with other components of the pylib and adjust them accordingly.

Bug: stiltstation.find(outfmt = 'list') leads to TypeError

In order to reproduce, run the code:

from icoscp.stilt import stiltstation

stations = stiltstation.find(country="sweden",outfmt='list')

Result:

 /opt/conda/lib/python3.8/site-packages/icoscp/stilt/stiltstation.py in <listcomp>(.0)
    301 def __get_object(stations):
    302 
--> 303     return [StiltStation().get_info(stations[st]) for st in stations.keys()]
    304 
    305 

TypeError: __init__() missing 1 required positional argument: 'st_dict'
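
The traceback indicates that StiltStation is instantiated without the st_dict argument its constructor requires. The sketch below reproduces that situation with a stand-in class and shows one plausible fix, passing the station dictionary to the constructor. The class here is hypothetical and much simpler than the real one, whose get_info signature is not shown in this issue.

```python
class StiltStation:
    """Stand-in with the same constructor requirement as in the traceback."""

    def __init__(self, st_dict):
        self.info = st_dict

    def get_info(self):
        return self.info


def get_objects(stations):
    # Pass each station dictionary to the constructor instead of calling it bare.
    return [StiltStation(stations[st]) for st in stations]


# Illustrative input only.
stations = {"HTM150": {"id": "HTM150", "country": "Sweden"}}
objects = get_objects(stations)
print([obj.get_info()["id"] for obj in objects])
```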

units for each column

Units can be extracted from the .info method (metadata information about the data set).
Make it easier for the user by returning the unit for each column in the function/property
.colNames

Rework stiltstation.find(id='...')

stiltstation.find(id='HTM') (for any stilt station id) should not cycle through all the stations.
Minimize the list first before assembling the station.
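
The idea of minimizing first can be sketched as filtering the raw station dictionary by id before any expensive per-station assembly runs. All names below are hypothetical; assemble stands for whatever work is needed to build a full station entry.

```python
def find_by_id(stations, station_id, assemble):
    """Select matching ids first, then assemble only the selected stations."""
    wanted = station_id.upper()
    selected = {key: raw for key, raw in stations.items() if key.upper() == wanted}
    return {key: assemble(key, raw) for key, raw in selected.items()}


calls = []


def assemble(key, raw):
    # Stand-in for the expensive step (geocoding, metadata lookups, ...).
    calls.append(key)
    return {"id": key, **raw}


raw_stations = {"HTM150": {"alt": 150}, "NOR100": {"alt": 100}, "CMN760": {"alt": 760}}
result = find_by_id(raw_stations, "htm150", assemble)
print(result, calls)
```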

correct ICOS sampling height in stiltstation dictionary

Currently, the uppermost sampling height of an ICOS station is provided regardless of the sampling height in STILT. This can be misleading for the user. A list of all available sampling heights at the ICOS station would be better. The corresponding sampling height would be perfect.
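
Picking the corresponding height could be as simple as choosing the ICOS sampling height closest to the STILT height. This is a sketch under that assumption; the helper name is hypothetical and the list of heights is only an example.

```python
def closest_sampling_height(icos_heights, stilt_height):
    """Return the ICOS sampling height nearest to the STILT height, or None."""
    if not icos_heights:
        return None
    return min(icos_heights, key=lambda height: abs(height - stilt_height))


# Example heights in metres, for illustration only.
heights = [2.5, 10, 30, 60, 150]
print(closest_sampling_height(heights, 30))
print(closest_sampling_height(heights, 50))
```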
