bird-house / finch
A Web Processing Service for Climate Indicators
Home Page: https://finch.readthedocs.io/en/latest/
License: Apache License 2.0
export WPS_SERVICE=http://finch.crim.ca/wps
export VERIFY_SSL=False
birdy tn_min --tasmin https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip3/tasmin.sresa2.miub_echo_g.run1.atm.da.nc
Fails with
Warning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Warning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Error: Connection failed.
The processes list is quite long, so it would make sense to add some configuration parameters to disable some process groups.
Build on Docker Hub.
Do I have permissions to do that?
There was a bug fixed in Ouranosinc/xclim#372 that is required to perform some computations with bccaqv2 files.
Some inputs have units that could be described in the DescribeProcess document, for example temperature (degC) or precipitation (mm/day). These could be used as default units when none are provided. @huard what do you think?
Also, these units can be used by the platform frontend. Currently, the only way I know of to tell which unit to provide to the process is by parsing the default values...
Related PR: geopython/pywps#523
Or at least that's my interpretation of an error that occurs when accessing the output from an indicator calculation using birdy.
If not specified, PyWPS will use the first format in the supported_formats list.
A bbox:
lon0 = -5 # Minimum longitude.
lon1 = 17.0 # Maximum longitude.
lat0 = 10.5 # Minimum latitude.
lat1 = 24.0 # Maximum latitude.
out = finch_i.subset_bbox(
resource=ds[0], lon0=lon0, lon1=lon1, lat0=lat0, lat1=lat1)
failed with:
owslib.wps.WPSException : {'code': 'NoApplicableCode', 'locator': 'None', 'text': 'Process error: method=wps_xsubsetbbox.py._handler, line=170, msg=Input longitude bounds ([-5. 17.]) cross the 0 degree meridian but dataset longitudes are all positive.'}
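A possible client-side workaround (an assumption, not finch's actual behavior) is to convert the requested bounds to the dataset's 0-360 longitude convention before calling subset_bbox:

```python
def to_0_360(lon):
    """Map a longitude from the [-180, 180] convention to [0, 360)."""
    return lon % 360

# The failing bounds from the error message above:
lon0, lon1 = -5.0, 17.0
lon0, lon1 = to_0_360(lon0), to_0_360(lon1)
print(lon0, lon1)  # 355.0 17.0
```

Note that the converted bounds now wrap around 360 (lon0 > lon1), so a subset across the meridian would still need to be split into two longitude slices.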
Calculation of indices is mostly connected with time aggregation (month, year, ...), but the output file still contains
:frequency = "day" ;
Also check whether the timestamps are set according to the archive specifications. I guess you need to drop the day/month information.
There is a need to subset the BCCAQv2 data with multiple grid cells at a time. Not a bounding box, just a list of lat-lon coordinates.
@huard I was wondering how we should implement this and I wanted to get your opinion. Here is how I could do it. Currently, the SubsetBCCAQV2Process
process accepts lat0, lat1, lon0, lon1 coordinates. If the lat1 and lon1 are not given, the process makes a single grid cell subset.
I was thinking of changing lon0 and lat0 to accept a comma-separated list of floats instead. Do you think this would be confusing? That way the same process could be used for:
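A sketch of how such an overloaded input could be parsed (parse_coords is a hypothetical helper, not the actual SubsetBCCAQV2Process code):

```python
def parse_coords(value):
    """Parse a WPS literal input that is either a single float or a
    comma-separated list of floats (hypothetical helper)."""
    return [float(v) for v in str(value).split(",")]

print(parse_coords("46.5"))          # single grid cell -> [46.5]
print(parse_coords("46.5,47.2,48"))  # multiple cells -> [46.5, 47.2, 48.0]
```

A single value then becomes the one-element case of the list, so both usages go through the same code path.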
Travis CI fails with a segmentation fault.
The test responsible for the segmentation fault is: tests/test_wps_xsubsetpoint.py::test_thredds.
I could reproduce the error with this very simplified piece of code:
import xarray as xr
from pathlib import Path
here = Path(__file__).parent
url1 = "http://test.opendap.org:8080/opendap/netcdf/examples/tos_O1_2001-2002.nc"
url2 = "http://test.opendap.org:8080/opendap/netcdf/examples/sresa1b_ncar_ccsm3_0_run1_200001.nc"
for n, url in enumerate([url2, url1]):
    ds = xr.open_dataset(url)
    ds.to_netcdf(str(here / f'{n}.nc'))
Notice how no part of finch or pywps is touched in this code. The error we get is either:
1] 22220 segmentation fault (core dumped) env USER=ubuntu SHLVL=0 HOME=/home/ubuntu LOGNAME=ubuntu NAME=###my_hostname###
Or when we use opendap files from pavics.ouranos.ca, we actually get a traceback:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/finch/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 240, in __del__
self.close(needs_lock=False)
File "/home/ubuntu/miniconda3/envs/finch/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 218, in close
file.close()
File "netCDF4/_netCDF4.pyx", line 2485, in netCDF4._netCDF4.Dataset.close
File "netCDF4/_netCDF4.pyx", line 2449, in netCDF4._netCDF4.Dataset._close
File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: cannot delete file
python: /home/conda/feedstock_root/build_artifacts/libnetcdf_1574519229483/work/libdap2/dceconstraints.c:512: dcefree: Assertion `0' failed.
I will try to simplify the python environment in which I encounter this issue to see if I can isolate a particular library, and keep this issue updated of my findings.
I originally thought that inheriting logging and subsetting functions between the bccaqv2 processes and their base processes was a good idea to improve code reuse, but I've been slowly coming to the conclusion it was not. The processes depend on each other too much. The main reason to use inheritance instead of simple functions was to reuse logging... but I think it's possible to extract logging into its own function.
There are more bccaqv2 processes to expose in the short term, and I believe it's a good time to do this refactoring.
If a list of files is provided, only one (the last element in the list) gets processed.
len(pr_files)
151
Processing started
Opening as local file: /tmp/pywps_process_e3e6824c/pr_AFR-22_NCC-NorESM1-M_historical_r1i1p1_CLMcom-KIT-CCLM5-0-15_v1_day_20010101-20051231_NER.nc
Computing the output netcdf
[# ] | 10% Done | 0.0s
[###############] | 100% Done | 1.1s
Processing finished successfully
observed for prcptot
and cdd
prcptot has the option of providing tas to distinguish between rain and snow;
If the daily mean temperature is provided, ...
But if the file is not provided, the process gives an error:
ows:Exception exceptionCode="MissingParameterValue" locator="tas"
It should be possible to run the process without it.
Take care of the convention: Snowfall Flux is prsn.
https://is-enes-data.github.io/cordex_archive_specifications.pdf
Birdhouse chose the name "Oxford" for the next release cycle.
Given one or multiple polygons and a netCDF file, compute the spatial average (area-weighted) over each region and store along a new "geometry" dimension.
Might require to pass a file storing the cell area for accurate computations.
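A minimal sketch of the area weighting itself, using cos(latitude) as a proxy for cell area on a regular grid (the real process would additionally need the polygon masks and, as noted, possibly a stored cell-area field):

```python
import math

def area_weighted_mean(rows, lats):
    """Area-weighted spatial mean over a (lat, lon) grid of values,
    approximating each cell's area by cos(latitude) of its row."""
    num = den = 0.0
    for row, lat in zip(rows, lats):
        w = math.cos(math.radians(lat))  # weight shared by the whole row
        num += w * sum(row)
        den += w * len(row)
    return num / den

# Sanity check: a uniform field averages to its constant value.
field = [[2.0] * 4 for _ in range(3)]
print(area_weighted_mean(field, [0.0, 15.0, 30.0]))  # 2.0
```

Storing the result along a new "geometry" dimension would then just mean running this once per polygon mask.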
Version 0.2.3 is not on https://hub.docker.com/r/birdhouse/finch/tags.
I think the automatic build probably looks for a v at the beginning of the tags, and this new tag does not have it. Old tags: v0.1, v0.2, v0.2.1.
Indicator functions in xclim now require units to be explicit. Finch processes need to account for that.
The percent completion for the subsetting task is useless at the moment.
Add tests for subsets, since there is a failure on the prod server.
Xclim master has support for new streamflow indicators that are part of the Raven project. Add support for those in Finch.
Implement a mechanism to get a progress update during the computation.
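One way this could work is through PyWPS's status-update hook inside the process handler (a sketch: the update_status(message, percent) signature is assumed from PyWPS 4.x, and FakeResponse is a stand-in for the real response object so the snippet runs standalone):

```python
class FakeResponse:
    """Stand-in for pywps' response object, recording status updates."""
    def __init__(self):
        self.updates = []

    def update_status(self, message, status_percentage):
        self.updates.append((message, status_percentage))

def process_files(files, response):
    """Report per-file progress through response.update_status, the way a
    PyWPS handler could during a long computation."""
    for i, path in enumerate(files, start=1):
        # ... compute the indicator for `path` here ...
        response.update_status(f"processed {path}", int(100 * i / len(files)))

resp = FakeResponse()
process_files(["a.nc", "b.nc"], resp)
print(resp.updates[-1])  # ('processed b.nc', 100)
```

Clients polling the status document (e.g. birdy with progress=True) would then see the percentage advance per file.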
Running finch.ipynb in xclim/docs/notebooks/xclim_training returns a link to output files at localhost instead of pavics.ouranos.ca. The output files of course cannot be found at that address.
ProcessSucceeded
frost_daysResponse(
output_netcdf='http://localhost:5000/outputs/e435307c-7bc8-11e9-9d55-0242ac120017/out.nc',
output_log='http://localhost:5000/outputs/e435307c-7bc8-11e9-9d55-0242ac120017/log.txt'
)
Also, we're using port 5000; should that be changed?
Similar to what is done in Raven.
finch was initially created when we still had the buildout
template. We should update to the latest version without buildout.
See docs/source/notebooks/basic.ipynb
Loading the wps client raises an error. I'm guessing this is due to an older pywps version on the production server.
I don't think the current version supports multiple netCDF file inputs. There are two cases to consider, and it may not be possible to support both at once for the moment.
I think in the short term it is probably best to support the second usage. The first usage can be done by looping over function calls.
Set output filename
output is given as:
outputs/2f91fe3a-3f5d-11ea-a7e1-9cb6d08a53e7/out.nc'
Suggestion: use eggshell:
https://eggshell.readthedocs.io/en/latest/_modules/eggshell/nc/nc_utils.html#drs_filename
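For reference, a simplified sketch of what such a DRS-style filename builder does (not eggshell's actual implementation; the field order follows the CMIP/CORDEX pattern):

```python
def drs_filename(variable, model, experiment, frequency, start, end):
    """Build a CMIP/CORDEX-style output filename (simplified sketch)."""
    return f"{variable}_{model}_{experiment}_{frequency}_{start}-{end}.nc"

print(drs_filename("pr", "MPI-ESM-LR", "rcp85", "day", "20060101", "21001231"))
# pr_MPI-ESM-LR_rcp85_day_20060101-21001231.nc
```

The point is that the output name carries the variable, model, experiment, frequency and time range, instead of the generic out.nc.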
Use xclim.utils.subset_bbox
Support multiple outputs using metalink
See branch bccaqv2_nb for notebook example of download. The output file is empty.
Currently, when we want to know whether a URL is an OPeNDAP URL, we append .dds to it and check if we get a result.
This causes a problem when sending a link of the following format:
.../thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily_v2/pr/nrcan_canada_daily_pr_2017.nc?pr[0:1:3000][0:1:2][0:1:2]
where appending .dds to the URL returns a 404.
We need a more resilient way to find out whether a URL is an OPeNDAP URL or not.
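A first step (an assumption about the fix, not finch's actual code) is to append .dds to the path component rather than to the full URL, so a DAP constraint expression in the query string no longer breaks the probe:

```python
from urllib.parse import urlsplit, urlunsplit

def dds_probe_url(url):
    """Build the .dds probe URL: append the suffix to the path and drop
    any DAP constraint expression carried in the query string."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path + ".dds", "", ""))

print(dds_probe_url(
    "https://host/thredds/dodsC/pr/nrcan_canada_daily_pr_2017.nc?pr[0:1:3000]"
))
# https://host/thredds/dodsC/pr/nrcan_canada_daily_pr_2017.nc.dds
```

A more robust check could additionally inspect the response headers or try opening the URL with a DAP-aware library, but stripping the constraint expression already covers the failing case above.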
The documentation generation is currently broken (on master and for version 0.3.1), I'm not sure why. It seems to have to do with the dynamic process generation.
I'll confirm in a clean environment.
cd docs
make html
To support server-side DAP subsetting, we need to be able to open data store using the xarray pydap backend. That means doing something like:
store = xr.backends.PydapDataStore.open(url)
ds = xr.open_dataset(store)
instead of just ds = xr.open_dataset(url), where url can be something like `.nc?pr[0:1:5][0:1:2][0:1:3]`.
In contrast to flyingpigeon, finch operates on single files instead of entire datasets, which is a good idea for avoiding memory issues. But the multiple output files should be merged at some point.
Suggestion: provide a process doing cdo mergetime infiles outfile, and/or give an additional option for the indices processes. By default, the indices processes should respect the CMIP/CORDEX archive specifications in terms of file slices.
https://is-enes-data.github.io/cordex_archive_specifications.pdf
Hint: here is a dataset-sorting function that finds the corresponding files:
https://eggshell.readthedocs.io/en/latest/_modules/eggshell/nc/nc_utils.html#sort_by_filename
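A simplified sketch of what such a sorting function does, grouping time-sliced files by dropping the trailing date range (this is not eggshell's actual implementation, just the idea):

```python
import re
from collections import defaultdict

def sort_by_dataset(filenames):
    """Group time-sliced CMIP/CORDEX-style files into datasets by dropping
    the trailing YYYYMMDD-YYYYMMDD date range from each name."""
    groups = defaultdict(list)
    for fn in filenames:
        key = re.sub(r"_\d{8}-\d{8}\.nc$", ".nc", fn)
        groups[key].append(fn)
    return {k: sorted(v) for k, v in groups.items()}

files = [
    "pr_AFR-22_x_day_20010101-20051231.nc",
    "pr_AFR-22_x_day_20060101-20101231.nc",
    "tas_AFR-22_x_day_20010101-20051231.nc",
]
print(sort_by_dataset(files))  # two groups: one for pr, one for tas
```

Each group could then be fed to a single mergetime-style step before (or after) the indicator computation.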
Check with CMIP5, CMIP6 and CORDEX.
Include an option all for a calculation over the entire file and/or a multiple-file dataset.
Here are eggshell's time_group options:
https://github.com/bird-house/eggshell/blob/master/eggshell/nc/ocg_utils.py#L9
Suggestion: keep the CMIP/CORDEX file convention 'yr' instead of 'YS'.
Be careful with the file end and e.g. winter aggregation (DEC, JAN, FEB): values might be in two separate files.
There is the parameter variable : {'tasmin', 'tasmax', 'pr'}.
What about others, e.g. tas?
Suggestion: use get_variable in eggshell.nc.nc_utils.
For rcp : {'rcp26', 'rcp45', 'rcp85'}.
What about rcp60, historical and evaluation?
Also a suggestion: use eggshell's sort_by_filename, which brings files belonging to one dataset together.
use xclim.utils.subset_gridpoint
Support multiple outputs using metalink.
Subsetting the BCCAQv2 datasets along the time dimension takes 5-10 minutes for a single file. There are 270 files, so the processing is needlessly long for this simple operation.
While modifying the original data is not desired, here are the proposed solutions:
Related to #33
Make the processes 'dataset-agnostic'. Currently, some processes are coupled with the bccaqv2 datasets. Try to make it as generic as reasonable, with configuration parameters for specific datasets.
Related to Ouranosinc/xclim#359
Subset processes fail because MetaFile is given a Path instead of a file string.
This should also be supported by PyWPS eventually.
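The fix on the finch side is presumably a simple coercion before handing the path to MetaFile (a sketch; as_file_arg is a hypothetical helper):

```python
from pathlib import Path

def as_file_arg(p):
    """Coerce a pathlib.Path to the str that MetaFile currently expects."""
    return str(p) if isinstance(p, Path) else p

print(as_file_arg(Path("out.nc")))  # out.nc
print(as_file_arg("out.nc"))        # out.nc (strings pass through)
```

Once PyWPS accepts Path objects directly, this shim can be dropped.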
Running a finch subset gridpoint using progress=True never gets beyond 'process accepted'
Running directly on small data sets works fine but a large (e.g. NcML aggregate) results in a server timeout (I think)
https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps
from birdy import WPSClient
import xarray as xr
import numpy as np
import os
url = 'https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps'
wps_sync = WPSClient(url)
wps_prog = WPSClient(url, progress=True)
# single year file
tasmin = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily/tasmin/nrcan_canada_daily_tasmin_2010.nc"
#ncml aggregate (all years - 3 variables)
ncml = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/ouranos/cb-oura-1.0/MPI-ESM-LR/rcp85/day/MPI-ESM-LR_rcp85_allvars.ncml'
# non progress small file == OK
resp = wps_sync.subset_gridpoint(resource=tasmin, lat=47.0, lon=-78.0)
# non progress large file == ERROR
resp = wps_sync.subset_gridpoint(resource=ncml, lat=47.0, lon=-78.0)
# progress=True small file == Stops at 'ProcessAccepted' - 10% complete
resp = wps_prog.subset_gridpoint(resource=tasmin, lat=47.0, lon=-78.0)
# progress=True large file == Stops at 'ProcessAccepted' - 10% complete
resp = wps_prog.subset_gridpoint(resource=ncml, lat=47.0, lon=-78.0)
Re-running multiple calls to wps_prog will eventually fill up the queue and give a 'too many parallel processes' error.
I think this is a general problem and not limited to the subsetting functions. I am unsure whether this is a code/finch issue or a configuration issue on the server.
Currently, all the 24 hard-coded models are taken when calculating an ensemble. We need to add an input so that the user can choose which models to use.
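An illustrative sketch of the proposed input handling: default to the full hard-coded list, otherwise validate and keep only the requested models (the model names below are an illustrative subset, not finch's actual 24-model list):

```python
# Illustrative subset; finch hard-codes 24 BCCAQv2 model names.
ALL_MODELS = ["BNU-ESM", "CCSM4", "CanESM2"]

def select_models(requested=None):
    """Return the requested subset of models, defaulting to all of them."""
    if not requested:
        return list(ALL_MODELS)
    unknown = set(requested) - set(ALL_MODELS)
    if unknown:
        raise ValueError(f"unknown models: {sorted(unknown)}")
    return [m for m in ALL_MODELS if m in requested]

print(select_models())             # ['BNU-ESM', 'CCSM4', 'CanESM2']
print(select_models(["CanESM2"]))  # ['CanESM2']
```

Rejecting unknown names early gives the user a clear WPS error instead of a silently smaller ensemble.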
xclim has an API change that breaks the travis build.
make_nc_input only accepts FORMATS.NETCDF. It should also include FORMATS.DODS.