bird-house / finch
A Web Processing Service for Climate Indicators
Home Page: https://finch.readthedocs.io/en/latest/
License: Apache License 2.0
export WPS_SERVICE=http://finch.crim.ca/wps
export VERIFY_SSL=False
birdy tn_min --tasmin https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip3/tasmin.sresa2.miub_echo_g.run1.atm.da.nc
Fails with
Warning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Warning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Error: Connection failed.
The processes list is quite long, so it would make sense to add some configuration parameters to disable some process groups.
Build on Docker Hub.
Do I have permissions to do that?
There was a bug fixed in Ouranosinc/xclim#372 that is required to perform some computations with bccaqv2 files.
Some inputs have units that could be described in the DescribeProcess document, for example temperature (degC) or precipitation (mm/day). These could be used as default units when none are provided. @huard what do you think?
Also, these units can be used by the platform frontend. Currently, the only way I know of to tell which unit to provide to the process is by parsing the default values...
Related PR: geopython/pywps#523
Or at least that's my interpretation of an error that occurs when accessing the output from an indicator calculation using birdy.
If not specified, PyWPS will use the first format in the supported_formats list.
A bbox:
lon0 = -5 # Minimum longitude.
lon1 = 17.0 # Maximum longitude.
lat0 = 10.5 # Minimum latitude.
lat1 = 24.0 # Maximum latitude.
out = finch_i.subset_bbox(
resource=ds[0], lon0=lon0, lon1=lon1, lat0=lat0, lat1=lat1)
failed with:
owslib.wps.WPSException : {'code': 'NoApplicableCode', 'locator': 'None', 'text': 'Process error: method=wps_xsubsetbbox.py._handler, line=170, msg=Input longitude bounds ([-5. 17.]) cross the 0 degree meridian but dataset longitudes are all positive.'}
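A possible client-side workaround (an assumption, not finch's actual behavior) is to convert the requested bounds to the dataset's 0-360 longitude convention before calling subset_bbox:

```python
def to_0_360(lon):
    """Map a longitude from the [-180, 180] convention to [0, 360)."""
    return lon % 360

# The failing bounds from the error message above:
lon0, lon1 = -5.0, 17.0
lon0, lon1 = to_0_360(lon0), to_0_360(lon1)
print(lon0, lon1)  # 355.0 17.0
```

Note that the converted bounds now wrap around 360 (lon0 > lon1), so a subset across the meridian would still need to be split into two longitude slices.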
Calculation of indices is mostly connected with time aggregation (month, year, ...), but the output file still contains
:frequency = "day" ;
Also check whether the timestamps are set according to the archive specifications. I guess you need to drop the day/month information.
There is a need to subset the BCCAQv2 data with multiple grid cells at a time. Not a bounding box, just a list of lat-lon coordinates.
@huard I was wondering how we should implement this and I wanted to get your opinion. Here is how I could do it. Currently, the SubsetBCCAQV2Process
process accepts lat0, lat1, lon0, lon1 coordinates. If the lat1 and lon1 are not given, the process makes a single grid cell subset.
I was thinking of changing lon0 and lat0 to accept a comma-separated list of floats instead. Do you think this would be confusing? That way the same process could be used for:
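A sketch of how such an overloaded input could be parsed (parse_coords is a hypothetical helper, not the actual SubsetBCCAQV2Process code):

```python
def parse_coords(value):
    """Parse a WPS literal input that is either a single float or a
    comma-separated list of floats (hypothetical helper)."""
    return [float(v) for v in str(value).split(",")]

print(parse_coords("46.5"))          # single grid cell -> [46.5]
print(parse_coords("46.5,47.2,48"))  # multiple cells -> [46.5, 47.2, 48.0]
```

A single value then becomes the one-element case of the list, so both usages go through the same code path.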
Travis CI fails with a segmentation fault.
The test responsible for the segmentation fault is: tests/test_wps_xsubsetpoint.py::test_thredds.
I could reproduce the error with this very simplified piece of code:
import xarray as xr
from pathlib import Path
here = Path(__file__).parent
url1 = "http://test.opendap.org:8080/opendap/netcdf/examples/tos_O1_2001-2002.nc"
url2 = "http://test.opendap.org:8080/opendap/netcdf/examples/sresa1b_ncar_ccsm3_0_run1_200001.nc"
for n, url in enumerate([url2, url1]):
    ds = xr.open_dataset(url)
    ds.to_netcdf(str(here / f'{n}.nc'))
Notice how no part of finch or pywps is touched in this code. The error we get is either:
1] 22220 segmentation fault (core dumped) env USER=ubuntu SHLVL=0 HOME=/home/ubuntu LOGNAME=ubuntu NAME=###my_hostname###
Or when we use opendap files from pavics.ouranos.ca, we actually get a traceback:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/finch/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 240, in __del__
self.close(needs_lock=False)
File "/home/ubuntu/miniconda3/envs/finch/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 218, in close
file.close()
File "netCDF4/_netCDF4.pyx", line 2485, in netCDF4._netCDF4.Dataset.close
File "netCDF4/_netCDF4.pyx", line 2449, in netCDF4._netCDF4.Dataset._close
File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: cannot delete file
python: /home/conda/feedstock_root/build_artifacts/libnetcdf_1574519229483/work/libdap2/dceconstraints.c:512: dcefree: Assertion `0' failed.
I will try to simplify the python environment in which I encounter this issue to see if I can isolate a particular library, and keep this issue updated of my findings.
I originally thought that inheriting logging and subsetting functions between the bccaqv2 processes and their base processes was a good idea to improve code reuse, but I've been slowly coming to the conclusion it was not. The processes depend on each other too much. The main reason to use inheritance instead of simple functions was to reuse logging... but I think it's possible to extract logging into its own function.
There are more bccaqv2 processes to expose in the short term, and I believe it's a good time to do this refactoring.
If a list of files is provided, only one (the last element in the list) gets processed.
len(pr_files)
151
Processing started
Opening as local file: /tmp/pywps_process_e3e6824c/pr_AFR-22_NCC-NorESM1-M_historical_r1i1p1_CLMcom-KIT-CCLM5-0-15_v1_day_20010101-20051231_NER.nc
Computing the output netcdf
[# ] | 10% Done | 0.0s
[###############] | 100% Done | 1.1s
Processing finished successfully
observed for prcptot
and cdd
prcptot has the option of providing tas to distinguish between rain and snow;
If the daily mean temperature is provided, ...
But if the file is not provided, the process gives an error:
ows:Exception exceptionCode="MissingParameterValue" locator="tas"
It should be possible to run the process without it.
Take care of the convention: Snowfall Flux is prsn.
https://is-enes-data.github.io/cordex_archive_specifications.pdf
Birdhouse chose the name "Oxford" for the next release cycle.
Given one or multiple polygons and a netCDF file, compute the spatial average (area-weighted) over each region and store along a new "geometry" dimension.
Might require to pass a file storing the cell area for accurate computations.
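A minimal sketch of the area weighting itself, using cos(latitude) as a proxy for cell area on a regular grid (the real process would additionally need the polygon masks and, as noted, possibly a stored cell-area field):

```python
import math

def area_weighted_mean(rows, lats):
    """Area-weighted spatial mean over a (lat, lon) grid of values,
    approximating each cell's area by cos(latitude) of its row."""
    num = den = 0.0
    for row, lat in zip(rows, lats):
        w = math.cos(math.radians(lat))  # weight shared by the whole row
        num += w * sum(row)
        den += w * len(row)
    return num / den

# Sanity check: a uniform field averages to its constant value.
field = [[2.0] * 4 for _ in range(3)]
print(area_weighted_mean(field, [0.0, 15.0, 30.0]))  # 2.0
```

Storing the result along a new "geometry" dimension would then just mean running this once per polygon mask.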
Version 0.2.3 is not on https://hub.docker.com/r/birdhouse/finch/tags.
I think the automatic build probably looks for a v at the beginning of the tags, and this new tag does not have it. Old tags: v0.1, v0.2, v0.2.1.
Indicator functions in xclim now require units to be explicit. Finch processes need to account for that.
The percent completion for the subsetting task is useless at the moment.
Add tests for subsets, since there is a failure on the prod server.
Xclim master has support for new streamflow indicators that are part of the Raven project. Add support for those in Finch.
Implement a mechanism to get a progress update during the computation.
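One way this could work is through PyWPS's status-update hook inside the process handler (a sketch: the update_status(message, percent) signature is assumed from PyWPS 4.x, and FakeResponse is a stand-in for the real response object so the snippet runs standalone):

```python
class FakeResponse:
    """Stand-in for pywps' response object, recording status updates."""
    def __init__(self):
        self.updates = []

    def update_status(self, message, status_percentage):
        self.updates.append((message, status_percentage))

def process_files(files, response):
    """Report per-file progress through response.update_status, the way a
    PyWPS handler could during a long computation."""
    for i, path in enumerate(files, start=1):
        # ... compute the indicator for `path` here ...
        response.update_status(f"processed {path}", int(100 * i / len(files)))

resp = FakeResponse()
process_files(["a.nc", "b.nc"], resp)
print(resp.updates[-1])  # ('processed b.nc', 100)
```

Clients polling the status document (e.g. birdy with progress=True) would then see the percentage advance per file.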
Running finch.ipynb in xclim/docs/notebooks/xclim_training returns a link to output files at localhost instead of pavics.ouranos.ca. The output files of course cannot be found at that address.
ProcessSucceeded
frost_daysResponse(
output_netcdf='http://localhost:5000/outputs/e435307c-7bc8-11e9-9d55-0242ac120017/out.nc',
output_log='http://localhost:5000/outputs/e435307c-7bc8-11e9-9d55-0242ac120017/log.txt'
)
Also, we're using port 5000; should that be changed?
Similar to what is done in Raven.
finch was initially created when we still had the buildout
template. We should update to the latest version without buildout.
See docs/source/notebooks/basic.ipynb
Loading the wps client raises an error. I'm guessing this is due to an older pywps version on the production server.
I don't think the current version supports multiple netCDF file inputs. There are two cases to consider, and it may not be possible to support both at once for the moment.
I think in the short term it is probably best to support the second usage. The first usage can be done by looping over function calls.
Set output filename
output is given as:
outputs/2f91fe3a-3f5d-11ea-a7e1-9cb6d08a53e7/out.nc'
Suggestion: use eggshell:
https://eggshell.readthedocs.io/en/latest/_modules/eggshell/nc/nc_utils.html#drs_filename
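For reference, a simplified sketch of what such a DRS-style filename builder does (not eggshell's actual implementation; the field order follows the CMIP/CORDEX pattern):

```python
def drs_filename(variable, model, experiment, frequency, start, end):
    """Build a CMIP/CORDEX-style output filename (simplified sketch)."""
    return f"{variable}_{model}_{experiment}_{frequency}_{start}-{end}.nc"

print(drs_filename("pr", "MPI-ESM-LR", "rcp85", "day", "20060101", "21001231"))
# pr_MPI-ESM-LR_rcp85_day_20060101-21001231.nc
```

The point is that the output name carries the variable, model, experiment, frequency and time range, instead of the generic out.nc.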
Use xclim.utils.subset_bbox
Support multiple outputs using metalink
See branch bccaqv2_nb for notebook example of download. The output file is empty.
Currently, when we want to know whether a URL is an OPeNDAP URL, we append .dds to it and check if we get a result.
This causes a problem when sending a link of the following format:
.../thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily_v2/pr/nrcan_canada_daily_pr_2017.nc?pr[0:1:3000][0:1:2][0:1:2]
where appending .dds to the URL returns a 404.
We need a more resilient way to find out whether a URL is an OPeNDAP URL or not.
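A first step (an assumption about the fix, not finch's actual code) is to append .dds to the path component rather than to the full URL, so a DAP constraint expression in the query string no longer breaks the probe:

```python
from urllib.parse import urlsplit, urlunsplit

def dds_probe_url(url):
    """Build the .dds probe URL: append the suffix to the path and drop
    any DAP constraint expression carried in the query string."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path + ".dds", "", ""))

print(dds_probe_url(
    "https://host/thredds/dodsC/pr/nrcan_canada_daily_pr_2017.nc?pr[0:1:3000]"
))
# https://host/thredds/dodsC/pr/nrcan_canada_daily_pr_2017.nc.dds
```

A more robust check could additionally inspect the response headers or try opening the URL with a DAP-aware library, but stripping the constraint expression already covers the failing case above.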
The documentation generation is currently broken (on master and for version 0.3.1), I'm not sure why. It seems to have to do with the dynamic process generation.
I'll confirm in a clean environment.
cd docs
make html
To support server-side DAP subsetting, we need to be able to open data store using the xarray pydap backend. That means doing something like:
store = xr.backends.PydapDataStore.open(url)
ds = xr.open_dataset(store)
instead of just ds = xr.open_dataset(url), where url can be something like `.nc?pr[0:1:5][0:1:2][0:1:3]`.
In contrast to flyingpigeon, finch operates on single files instead of entire datasets, which is a good idea for avoiding memory issues. But the multiple output files should be merged at some point.
Suggestion: provide a process doing cdo mergetime infiles outfile, and/or give an additional option for the indices processes. By default, the indices processes should respect the CMIP/CORDEX archive specifications in terms of file slices.
https://is-enes-data.github.io/cordex_archive_specifications.pdf
Hint: here is a dataset-sorting function that finds the corresponding files:
https://eggshell.readthedocs.io/en/latest/_modules/eggshell/nc/nc_utils.html#sort_by_filename
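A simplified sketch of what such a sorting function does, grouping time-sliced files by dropping the trailing date range (this is not eggshell's actual implementation, just the idea):

```python
import re
from collections import defaultdict

def sort_by_dataset(filenames):
    """Group time-sliced CMIP/CORDEX-style files into datasets by dropping
    the trailing YYYYMMDD-YYYYMMDD date range from each name."""
    groups = defaultdict(list)
    for fn in filenames:
        key = re.sub(r"_\d{8}-\d{8}\.nc$", ".nc", fn)
        groups[key].append(fn)
    return {k: sorted(v) for k, v in groups.items()}

files = [
    "pr_AFR-22_x_day_20010101-20051231.nc",
    "pr_AFR-22_x_day_20060101-20101231.nc",
    "tas_AFR-22_x_day_20010101-20051231.nc",
]
print(sort_by_dataset(files))  # two groups: one for pr, one for tas
```

Each group could then be fed to a single mergetime-style step before (or after) the indicator computation.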
Check with CMIP5, CMIP6 and CORDEX.
Include an option all for a calculation over the entire file and/or a multiple-file dataset.
Here are eggshell's time_group options:
https://github.com/bird-house/eggshell/blob/master/eggshell/nc/ocg_utils.py#L9
Suggestion: keep the CMIP/CORDEX file convention 'yr' instead of 'YS'.
Be careful with the file end and e.g. winter aggregation (DEC, JAN, FEB): values might be in two separate files.
There is the parameter variable : {'tasmin', 'tasmax', 'pr'}.
What about others, e.g. tas?
Suggestion: use get_variable in eggshell.nc.nc_utils.
For rcp : {'rcp26', 'rcp45', 'rcp85'}.
What about rcp60, historical and evaluation?
Also a suggestion: use eggshell's sort_by_filename, which brings files belonging to one dataset together.
use xclim.utils.subset_gridpoint
Support multiple outputs using metalink.
Subsetting the BCCAQv2 datasets along the time dimension takes 5-10 minutes for a single file. There are 270 files, so the processing is needlessly long for this simple operation.
While modifying the original data is not desired, here are the proposed solutions:
Related to #33
Make the processes 'dataset-agnostic'. Currently, some processes are coupled with the bccaqv2 datasets. Try to make it as generic as reasonable, with configuration parameters for specific datasets.
Related to Ouranosinc/xclim#359
Subset processes fail because MetaFile is given a Path instead of a file string.
This should also be supported by PyWPS eventually.
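The fix on the finch side is presumably a simple coercion before handing the path to MetaFile (a sketch; as_file_arg is a hypothetical helper):

```python
from pathlib import Path

def as_file_arg(p):
    """Coerce a pathlib.Path to the str that MetaFile currently expects."""
    return str(p) if isinstance(p, Path) else p

print(as_file_arg(Path("out.nc")))  # out.nc
print(as_file_arg("out.nc"))        # out.nc (strings pass through)
```

Once PyWPS accepts Path objects directly, this shim can be dropped.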
Running a finch subset gridpoint using progress=True never gets beyond 'process accepted'
Running directly on small data sets works fine but a large (e.g. NcML aggregate) results in a server timeout (I think)
https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps
from birdy import WPSClient
import xarray as xr
import numpy as np
import os
url = 'https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps'
wps_sync = WPSClient(url)
wps_prog = WPSClient(url, progress=True)
# single year file
tasmin = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily/tasmin/nrcan_canada_daily_tasmin_2010.nc"
#ncml aggregate (all years - 3 variables)
ncml = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/ouranos/cb-oura-1.0/MPI-ESM-LR/rcp85/day/MPI-ESM-LR_rcp85_allvars.ncml'
# non progress small file == OK
resp = wps_sync.subset_gridpoint(resource=tasmin, lat=47.0, lon=-78.0)
# non progress large file == ERROR
resp = wps_sync.subset_gridpoint(resource=ncml, lat=47.0, lon=-78.0)
# progress=True small file == Stops at 'ProcessAccepted' - 10% complete
resp = wps_prog.subset_gridpoint(resource=tasmin, lat=47.0, lon=-78.0)
# progress=True large file == Stops at 'ProcessAccepted' - 10% complete
resp = wps_prog.subset_gridpoint(resource=ncml, lat=47.0, lon=-78.0)
Re-running multiple calls to wps_prog will eventually fill up the queue and give a 'too many parallel processes' error.
I think this is a general problem and not limited to the subsetting functions. I am unsure whether this is a code/finch issue or a configuration issue on the server.
Currently, all the 24 hard-coded models are taken when calculating an ensemble. We need to add an input so that the user can choose which models to use.
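An illustrative sketch of the proposed input handling: default to the full hard-coded list, otherwise validate and keep only the requested models (the model names below are an illustrative subset, not finch's actual 24-model list):

```python
# Illustrative subset; finch hard-codes 24 BCCAQv2 model names.
ALL_MODELS = ["BNU-ESM", "CCSM4", "CanESM2"]

def select_models(requested=None):
    """Return the requested subset of models, defaulting to all of them."""
    if not requested:
        return list(ALL_MODELS)
    unknown = set(requested) - set(ALL_MODELS)
    if unknown:
        raise ValueError(f"unknown models: {sorted(unknown)}")
    return [m for m in ALL_MODELS if m in requested]

print(select_models())             # ['BNU-ESM', 'CCSM4', 'CanESM2']
print(select_models(["CanESM2"]))  # ['CanESM2']
```

Rejecting unknown names early gives the user a clear WPS error instead of a silently smaller ensemble.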
xclim has an API change that breaks the travis build.
make_nc_input only accepts FORMATS.NETCDF. It should also include FORMATS.DODS.