
finch's People

Contributors

aulemahal, cehbrecht, cjauvin, davidcaron, dependabot[bot], fmigneault, huard, matprov, perronld, pre-commit-ci[bot], snyk-bot, tlogan2000, tlvu, zeitsperre


finch's Issues

birdy terminal client fails

Description

export WPS_SERVICE=http://finch.crim.ca/wps
export VERIFY_SSL=False
birdy tn_min --tasmin https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/flyingpigeon/cmip3/tasmin.sresa2.miub_echo_g.run1.atm.da.nc

Fails with

Warning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.

Warning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.

Error: Connection failed.

Dockerhub hook

Description

Build on dockerhub.
Do I have permission to do that?

Add units to inputs for process descriptions

Description

Some inputs have units that could be described in the DescribeProcess document, for example temperature (degC) or precipitation (mm/day). These could be used as default units when none are provided. @huard what do you think?

Also, these units can be used by the platform frontend. Currently, the only way I'm aware of to know which unit to provide to the process is by parsing the default values...

Related PR: geopython/pywps#523

Bbox fails when wrapping around dataset boundaries

Description

A bbox:

lon0 = -5 # Minimum longitude.
lon1 = 17.0 # Maximum longitude.
lat0 = 10.5 # Minimum latitude.
lat1 = 24.0 # Maximum latitude.

out = finch_i.subset_bbox(
    resource=ds[0], lon0=lon0, lon1=lon1, lat0=lat0, lat1=lat1)

failed with:

owslib.wps.WPSException : {'code': 'NoApplicableCode', 'locator': 'None', 'text': 'Process error: method=wps_xsubsetbbox.py._handler, line=170, msg=Input longitude bounds ([-5. 17.]) cross the 0 degree meridian but dataset longitudes are all positive.'}
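A possible client-side workaround (a sketch, not finch's actual handling): convert the requested bounds to the dataset's [0, 360] longitude convention, and split bounds that still wrap into two non-wrapping boxes. The function name is hypothetical:

```python
def split_bbox_lon(lon0, lon1):
    """Split longitude bounds that cross the 0-degree meridian into
    non-wrapping intervals in the [0, 360] convention.

    Returns a list of (lon_min, lon_max) pairs that can each be sent
    as a separate bbox request.
    """
    lon0, lon1 = lon0 % 360, lon1 % 360
    if lon0 <= lon1:
        # No wrap after conversion: a single bbox is enough.
        return [(lon0, lon1)]
    # Still wraps: split at the 0/360 seam into two boxes.
    return [(lon0, 360.0), (0.0, lon1)]

# The bounds from the issue: -5 to 17 crosses the 0-degree meridian.
print(split_bbox_lon(-5.0, 17.0))  # [(355.0, 360.0), (0.0, 17.0)]
```

The results of the two sub-requests would then need to be concatenated along the longitude dimension client-side.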

wrong metadata frequency

Description

Calculation of indices is mostly connected with time aggregation (month, year, ...).

But the output file still contains

	:frequency = "day" ;

Also check whether the timestamps are set according to the archive specifications. Guess you need to drop the day/month information.
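One possible fix (a sketch; the mapping and function name are assumptions, not finch code) is to derive the frequency attribute to write from the resampling frequency used for the aggregation:

```python
# Hypothetical mapping from pandas/xarray resampling frequencies to
# CF/CMIP-style frequency attribute values.
CF_FREQUENCY = {"D": "day", "MS": "mon", "QS-DEC": "sea", "YS": "yr"}

def updated_frequency(resample_freq, default="day"):
    """Return the frequency attribute to write after aggregating
    with the given resampling frequency."""
    return CF_FREQUENCY.get(resample_freq, default)

print(updated_frequency("YS"))  # yr
```

After a yearly aggregation, the output's `:frequency` attribute would then be set to `"yr"` instead of being copied from the daily input.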

Multiple grid cells when subsetting

Description

There is a need to subset the BCCAQv2 data with multiple grid cells at a time. Not a bounding box, just a list of lat-lon coordinates.

@huard I was wondering how we should implement this and I wanted to get your opinion. Here is how I could do it. Currently, the SubsetBCCAQV2Process process accepts lat0, lat1, lon0, lon1 coordinates. If the lat1 and lon1 are not given, the process makes a single grid cell subset.

I was thinking of changing lon0 and lat0 to accept a list of comma-separated floats instead. Do you think this would be confusing? The same process could then be used for:

  • Single grid cell subset: lat0, lon0
  • Bounding box subset: lat0, lon0, lat1, lon1
  • Multiple grid cell subset: lat0, lon0 as lists of comma-separated floats
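Parsing such an input could look like the following sketch (`parse_coords` is a hypothetical helper, not existing finch code):

```python
def parse_coords(value):
    """Parse a WPS input that may be a single number or a
    comma-separated list of floats; return a list of floats."""
    if isinstance(value, (int, float)):
        return [float(value)]
    return [float(v) for v in str(value).split(",") if v.strip()]

print(parse_coords(45.5))              # [45.5]
print(parse_coords("45.5,46.2,47.0"))  # [45.5, 46.2, 47.0]
```

A single float then behaves exactly like a one-element list, so the three use cases above share one code path.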

Fix segmentation fault in Travis ci test suite

Description

Travis CI fails with a segmentation fault.

Environment

  • Finch version used, if any: latest
  • Python version, if any: 3.6
  • Operating System: Linux

Steps to Reproduce

The test responsible for the segmentation fault is: tests/test_wps_xsubsetpoint.py::test_thredds.

I could reproduce the error with this very simplified piece of code:

import xarray as xr
from pathlib import Path

here = Path(__file__).parent

url1 = "http://test.opendap.org:8080/opendap/netcdf/examples/tos_O1_2001-2002.nc"
url2 = "http://test.opendap.org:8080/opendap/netcdf/examples/sresa1b_ncar_ccsm3_0_run1_200001.nc"

for n, url in enumerate([url2, url1]):
    ds = xr.open_dataset(url)
    ds.to_netcdf(str(here / f'{n}.nc'))

Notice how no part of finch or pywps is touched in this code. The error we get is either:

1]    22220 segmentation fault (core dumped)  env USER=ubuntu  SHLVL=0 HOME=/home/ubuntu  LOGNAME=ubuntu NAME=###my_hostname###

Or when we use opendap files from pavics.ouranos.ca, we actually get a traceback:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/finch/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 240, in __del__
    self.close(needs_lock=False)
  File "/home/ubuntu/miniconda3/envs/finch/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 218, in close
    file.close()
  File "netCDF4/_netCDF4.pyx", line 2485, in netCDF4._netCDF4.Dataset.close
  File "netCDF4/_netCDF4.pyx", line 2449, in netCDF4._netCDF4.Dataset._close
  File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: cannot delete file
python: /home/conda/feedstock_root/build_artifacts/libnetcdf_1574519229483/work/libdap2/dceconstraints.c:512: dcefree: Assertion `0' failed.

Additional Information

I will try to simplify the python environment in which I encounter this issue to see if I can isolate a particular library, and keep this issue updated with my findings.

Untangle bccaqv2 inheritance with base processes

Description

I originally thought that inheriting logging and subsetting functions between the bccaqv2 processes and their base processes was a good idea to improve code reuse, but I've slowly come to the conclusion that it wasn't. The processes depend on each other too much. The main reason to use inheritance instead of simple functions was to reuse logging... but I think it's possible to extract logging into its own function.

There are more bccaqv2 processes to expose in the short term, and I believe it's a good time to do this refactoring.

only one file gets processed

If a list of files is provided, only one (the last element in the list) gets processed.

len(pr_files)
151

Processing started
Opening as local file: /tmp/pywps_process_e3e6824c/pr_AFR-22_NCC-NorESM1-M_historical_r1i1p1_CLMcom-KIT-CCLM5-0-15_v1_day_20010101-20051231_NER.nc
Computing the output netcdf
[#              ] | 10% Done |  0.0s
[###############] | 100% Done |  1.1s
Processing finished successfully

observed for prcptot and cdd

Release

Description

Birdhouse chose the name "Oxford" for the next release cycle.

Average over shape process

Description

Given one or multiple polygons and a netCDF file, compute the spatial average (area-weighted) over each region and store along a new "geometry" dimension.

Might require passing a file storing the cell areas for accurate computations.
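In the absence of a cell-area file, a common approximation on a regular lat-lon grid is to weight by cos(latitude). A minimal numpy sketch (names are illustrative, not finch code):

```python
import numpy as np

def area_weighted_mean(field, lats):
    """Area-weighted spatial mean of a (lat, lon) field on a regular
    grid, using cos(latitude) as a proxy for cell area."""
    weights = np.cos(np.deg2rad(lats))[:, np.newaxis]  # shape (nlat, 1)
    weights = np.broadcast_to(weights, field.shape)
    return float((field * weights).sum() / weights.sum())

# Sanity check: for a uniform field the weighted mean equals the value.
lats = np.array([0.0, 30.0, 60.0])
field = np.full((3, 4), 2.5)
print(area_weighted_mean(field, lats))  # 2.5
```

For the process itself, this per-region mean would be computed once per polygon mask and stacked along the new "geometry" dimension.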

Add streamflow indicators

Description

Xclim master has support for new streamflow indicators that are part of the Raven project. Add support for those in Finch.

Ouranos deployment has configuration problem

Description

Running finch.ipynb in xclim/docs/notebooks/xclim_training returns a link to output files at localhost instead of pavics.ouranos.ca. The output files of course cannot be found at that address.

ProcessSucceeded
frost_daysResponse(
    output_netcdf='http://localhost:5000/outputs/e435307c-7bc8-11e9-9d55-0242ac120017/out.nc',
    output_log='http://localhost:5000/outputs/e435307c-7bc8-11e9-9d55-0242ac120017/log.txt'
)

Finch instance not accessible using birdy

Description

See docs/source/notebooks/basic.ipynb

Loading the wps client raises an error. I'm guessing this is due to an older pywps version on the production server.

Support multiple input netCDF files for indicators

Description

I don't think the current version supports multiple netCDF file inputs. There are two cases to consider, and it may not be possible to support both at once for the moment.

  • You want to compute indicators over each file independently (multi-model ensemble)
  • You want to compute indicators over the aggregation of all files (e.g. simulation split over time)

I think in the short term it is probably best to support the second usage. The first can be done by looping over function calls.

More reliable way to identify OPeNDAP urls

Description

Currently, when we want to know whether a url is an OPeNDAP url, we append .dds to it and check if we get a result.

This causes a problem when sending a link of the following format:

.../thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily_v2/pr/nrcan_canada_daily_pr_2017.nc?pr[0:1:3000][0:1:2][0:1:2]

where appending .dds to the url returns a 404.

We need a more resilient way to determine whether a url is an OPeNDAP url or not.
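One more resilient option (a sketch, not the current implementation) is to strip the constraint expression before appending .dds, so the probe hits the bare dataset url. The host below is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

def dds_url(url):
    """Build the .dds probe URL for a candidate OPeNDAP endpoint,
    dropping any constraint expression (query string) so the probe
    does not 404 on subsetted urls."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path + ".dds", "", ""))

url = "https://example.org/thredds/dodsC/pr_2017.nc?pr[0:1:3000][0:1:2][0:1:2]"
print(dds_url(url))  # https://example.org/thredds/dodsC/pr_2017.nc.dds
```

The HTTP check itself stays the same; only the probed url changes.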

Fix documentation generation

Description

The documentation generation is currently broken (on master and for version 0.3.1); I'm not sure why. It seems to be related to the dynamic process generation.

I'll confirm in a clean environment.

Environment

  • finch version used, if any: 0.3.1, and master branch
  • Python version, if any: 3.7
  • Operating System: Ubuntu (WSL)

Steps to Reproduce

cd docs
make html

Add support for pydap data store urls

Description

To support server-side DAP subsetting, we need to be able to open data stores using the xarray pydap backend. That means doing something like:

store = xr.backends.PydapDataStore.open(url) 
ds = xr.open_dataset(store)

instead of just ds = xr.open_dataset(url).

where url can be something like `.nc?pr[0:1:5][0:1:2][0:1:3]`

merge multiple files of one dataset, or according to frequency

Description

In contrast to flyingpigeon, finch operates on single files instead of entire datasets, which is a good idea to avoid running into memory issues. But the multiple output files should be merged at some point.

Suggestion: provide a process doing cdo mergetime infiles outfile and/or add an option to the indices processes. By default, the indices processes should respect the CMIP/CORDEX archive specifications in terms of file slices.
https://is-enes-data.github.io/cordex_archive_specifications.pdf

Hint: here is a dataset-sorting function that finds the corresponding files:
https://eggshell.readthedocs.io/en/latest/_modules/eggshell/nc/nc_utils.html#sort_by_filename
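A minimal pure-Python sketch of such grouping (the regex assumes CMIP/CORDEX-style file names ending in a YYYYMMDD-YYYYMMDD time range; names are illustrative, not eggshell's actual code):

```python
import re
from collections import defaultdict

def group_by_dataset(paths):
    """Group file names that belong to the same dataset by stripping
    the trailing YYYYMMDD-YYYYMMDD time range, keeping each group
    sorted in time order."""
    groups = defaultdict(list)
    for p in sorted(paths):
        key = re.sub(r"_\d{8}-\d{8}(?=\.nc$)", "", p)
        groups[key].append(p)
    return dict(groups)

files = [
    "pr_AFR-22_x_day_20010101-20051231.nc",
    "pr_AFR-22_x_day_20060101-20101231.nc",
    "tas_AFR-22_x_day_20010101-20051231.nc",
]
print(group_by_dataset(files))
```

Each group could then be handed to a mergetime step before (or after) computing the indices.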

subset_ensemble_bccaqv2 parameters

Description

There are the parameters variable: {'tasmin', 'tasmax', 'pr'}.
What about others, e.g. tas?
Suggestion: use get_variable in eggshell.nc.nc_utils.

For rcp: {'rcp26', 'rcp45', 'rcp85'}.
What about rcp60, historical and evaluation?

Also a suggestion to use eggshell's sort_by_filename, which brings files belonging to one dataset together.

Performance for subsetting a gridpoint is very slow

Description

Subsetting the BCCAQv2 datasets along the time dimension takes 5-10 minutes for a single file. There are 270 files, so the processing is needlessly long for this simple operation.

While modifying the original data is not desired, here are the proposed solutions:

  • Don't do anything, keep using the original data
  • re-chunk the data
  • re-align the data so that the time dimension is last

Related to #33

Remove specific references to bccaqv2 in processes

Description

Make the processes 'dataset-agnostic'. Currently, some processes are coupled with the bccaqv2 datasets. Try to make it as generic as reasonable, with configuration parameters for specific datasets.

Fix bug with file path

Description

Subset processes fail because MetaFile is given a Path instead of a file string.

This should also be supported by PyWPS eventually.
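A minimal workaround (a sketch; the helper name is hypothetical) is to coerce the Path to a string before handing it to MetaFile:

```python
from pathlib import Path

def as_file_string(path):
    """Coerce a pathlib.Path (or an existing string) to the plain
    string that MetaFile currently expects."""
    return str(path) if isinstance(path, Path) else path

out = as_file_string(Path("workdir") / "out.nc")
print(type(out).__name__)  # str
```

Once PyWPS accepts Path objects natively, this conversion can be dropped.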

Using progress=True results in hung process

Description

Running a finch subset gridpoint using progress=True never gets beyond 'ProcessAccepted'.
Running directly on small datasets works fine, but a large one (e.g. an NcML aggregate) results in a server timeout (I think).


Environment

https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps

Steps to Reproduce

from birdy import WPSClient
import xarray as xr
import numpy as np
import os
url = 'https://pavics.ouranos.ca/twitcher/ows/proxy/finch/wps'
wps_sync = WPSClient(url)
wps_prog = WPSClient(url, progress=True)

# single year file 
tasmin = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily/tasmin/nrcan_canada_daily_tasmin_2010.nc"

#ncml aggregate (all years - 3 variables)
ncml = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/ouranos/cb-oura-1.0/MPI-ESM-LR/rcp85/day/MPI-ESM-LR_rcp85_allvars.ncml'

# non progress small file == OK
resp = wps_sync.subset_gridpoint(resource=tasmin, lat=47.0, lon=-78.0)

# non progress large file == ERROR
resp = wps_sync.subset_gridpoint(resource=ncml, lat=47.0, lon=-78.0)

# progress=True small file == Stops at 'ProcessAccepted'  - 10% complete
resp = wps_prog.subset_gridpoint(resource=tasmin, lat=47.0, lon=-78.0)

# progress=True large file == Stops at 'ProcessAccepted'  - 10% complete
resp = wps_prog.subset_gridpoint(resource=ncml, lat=47.0, lon=-78.0)

Additional Information

Re-running multiple calls to wps_prog will eventually fill up the queue and give a 'too many parallel processes' error.

I think this is a general problem and not limited to the subsetting functions. I am unsure whether this is a code/finch issue or a configuration issue on the server.

