eccc-msc / geomet-data-registry

GeoMet Data Registry is a system to manage access to Environment and Climate Change Canada's Meteorological Service of Canada (MSC) open data, including raw numerical weather prediction (NWP) model data layers and the weather radar mosaic, via Open Geospatial Consortium (OGC) standards such as the Web Map Service (WMS). Meteorological layers are served dynamically through WMS so that end users can display meteorological data within their own tools, on interactive web maps and in mobile apps.

Home Page: https://www.canada.ca/en/environment-climate-change/services/weather-general-tools-resources/weather-tools-specialized-data/geospatial-web-services.html

License: Other

Dockerfile 0.56% Makefile 1.50% Python 96.34% Shell 1.60%

geomet-data-registry's People

Contributors

dukestep, philippeth, rousseaulambertlp, tomkralidis

geomet-data-registry's Issues

add graceful error when file not in configuration

At the moment, when we want to add a file to the tileindex (geomet-data-registry data add --file /path/to/file.grib2) and this file is not part of the configuration, the script fails with a KeyError.

For example: KeyError: 'CWAT_EATM_0'

These errors should be handled gracefully.
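One possible shape for graceful handling is sketched below. It is only an illustration: `LAYER_CONFIG` and `lookup_layer` are hypothetical names, and the real configuration structure in geomet-data-registry may differ.

```python
import logging

LOGGER = logging.getLogger(__name__)

# Hypothetical stand-in for the layer configuration (assumption:
# the real configuration maps variable names to layer definitions).
LAYER_CONFIG = {'TMP_TGL_2': {'layer': 'GDPS.ETA_TT'}}


def lookup_layer(wx_variable, config=LAYER_CONFIG):
    """Return the configuration entry for a variable, or None if absent."""
    try:
        return config[wx_variable]
    except KeyError:
        # Log and skip the file instead of letting the KeyError propagate.
        LOGGER.warning('Variable "%s" not in configuration, skipping file',
                       wx_variable)
        return None
```

A caller would then skip any file for which `lookup_layer` returns None rather than aborting.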

CC @tomkralidis @Dukestep

Full support for ISO 8601 with milliseconds (WMS time dimension)

Hi,

I'm using the leaflet.timeDimension plugin to make WMS requests for RADAR precip data from geomet. The plugin formats the ISO 8601 date string with milliseconds, even though the milliseconds value is always zero, e.g. &time=2019-07-23T11:50:00.000Z. The problem is that (at least for the layer I'm testing) geomet does not accept 2019-07-23T11:50:00.000Z as a valid time; it only accepts values without milliseconds, e.g. 2019-07-23T11:50:00Z. I'll find a workaround in the meantime by modifying the plugin code, but I'm wondering if your service can be configured to fully implement the ISO 8601 standard, including milliseconds, and make equivalence checks that handle varying amounts of precision. E.g. 2019-01-01T00Z == 2019-01-01T00:00Z == 2019-01-01T00:00:00Z == 2019-01-01T00:00:00.000Z
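The equivalence check described above could be sketched as follows. This is not how geomet parses times; `parse_wms_time` is a hypothetical helper that only handles UTC values ending in Z, with optional minutes, seconds and fractional seconds.

```python
import re
from datetime import datetime, timezone

# Date and hour are required; minutes, seconds and fraction are optional.
_ISO_RE = re.compile(
    r'^(\d{4})-(\d{2})-(\d{2})T(\d{2})'
    r'(?::(\d{2})(?::(\d{2})(?:\.(\d+))?)?)?Z$'
)


def parse_wms_time(value):
    """Parse a UTC ISO 8601 time of varying precision into a datetime."""
    match = _ISO_RE.match(value)
    if match is None:
        raise ValueError('Unsupported time value: {}'.format(value))
    year, month, day, hour, minute, second, fraction = match.groups()
    # Pad or truncate the fractional part to microseconds.
    micro = int((fraction or '0').ljust(6, '0')[:6])
    return datetime(int(year), int(month), int(day), int(hour),
                    int(minute or 0), int(second or 0), micro,
                    tzinfo=timezone.utc)
```

Comparing the parsed datetimes, rather than the raw strings, makes all four spellings of the same instant compare equal.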

Thanks, and apologies if this Issue is opened on the wrong repo.

add URL property to handling workflow

Given that we are thinking of downloading data on the fly when we get a GetMap or GetFeatureInfo request, we should add a property for the source data to the ES mapping.

For example, this source data could be the https path to a particular file on datamart.

This information (I think) could come from the amqp notification we get, something like parent.msg.baseurl. This could then be passed to the handler as a variable and written to ES for a feature.

The benefit here is that we would only need to modify general files, like the layer base.py and the handler. And if the source changes, the property will be updated automatically because it comes directly from the amqp notifications.

We could call that property source_path? Other ideas?

cc @tomkralidis @Dukestep

Add error logger for incomplete count in store

In layer/base.py we should add an error logger for when a file is added in a new model run and the previous model run did not receive the total required number of files.

This way we could know if we missed a layer from the amqp feed.
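A minimal sketch of such a check follows, assuming the store behaves like a dict and keeps a per-run file count. The key format, the function name and the `expected_count` parameter are all hypothetical and not taken from layer/base.py.

```python
import logging

LOGGER = logging.getLogger(__name__)


def check_previous_run(store, layer, previous_run, expected_count):
    """Log an error if the previous model run is incomplete.

    store: dict-like object (stand-in for the real store)
    Returns True when the previous run has all expected files.
    """
    # Hypothetical key format: "<layer>_<model run>_count"
    key = '{}_{}_count'.format(layer, previous_run)
    count = int(store.get(key, 0))
    if count < expected_count:
        LOGGER.error('Layer %s run %s incomplete: %d of %d files received',
                     layer, previous_run, count, expected_count)
        return False
    return True
```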

Add test to see if a radar timestep is missing

We should add a condition in the code that adds a key to the store (Redis) holding the latest timestep. This key will act as the default WMS time for the mapfile and will let us see whether we missed any radar composite.

Example key could be like: RADAR_1KM_RRAI: 2019-10-23T12:10:00Z
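The idea can be sketched with a dict standing in for Redis. The 10-minute interval between radar composites is an assumption, and `update_latest_timestep` is a hypothetical name.

```python
from datetime import datetime, timedelta

TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def update_latest_timestep(store, key, new_time,
                           interval=timedelta(minutes=10)):
    """Update the latest timestep in the store and return missed timesteps.

    store: dict-like stand-in for Redis
    new_time: naive datetime in UTC for the composite just received
    interval: assumed spacing between composites (10 minutes here)
    """
    missing = []
    previous = store.get(key)
    if previous is not None:
        previous_time = datetime.strptime(previous, TIME_FORMAT)
        expected = previous_time + interval
        # Every expected timestep before the new one was never received.
        while expected < new_time:
            missing.append(expected.strftime(TIME_FORMAT))
            expected += interval
        if new_time <= previous_time:
            # Older or duplicate composite: keep the stored latest value.
            return missing
    store[key] = new_time.strftime(TIME_FORMAT)
    return missing
```

With the example key above stored as 2019-10-23T12:10:00Z, receiving the 12:40 composite next would report 12:20 and 12:30 as missed.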

Raising LayerError when variable not found causing CLI tool to fail

Recent changes introduced in #40 are causing issues when using the directory (-d) flag with the CLI tool.

i.e., geomet-data-registry data add -d /data/geomet/feeds/dev/ensemble/.../00 errors out when a variable cannot be found and all further processing stops.

Code (example from layer/reps.py):

        if self.wx_variable not in var_path:
            msg = 'Variable "{}" not in ' \
                  'configuration file'.format(self.wx_variable)
            LOGGER.exception(msg)
            raise LayerError(msg)

Instead, I suggest that the identify method return False, effectively stopping the processing for that file, since handler/core.py only goes on to register if identify returns True. We could still log a message saying that the variable could not be found (maybe a warning instead of an exception?).
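The suggested behaviour might look like the simplified sketch below. `ExampleLayer` and `process_files` are hypothetical stand-ins and do not reproduce the real classes in layer/reps.py or handler/core.py.

```python
import logging

LOGGER = logging.getLogger(__name__)


class ExampleLayer:
    """Hypothetical, simplified layer used only for illustration."""

    def __init__(self, known_variables):
        self.known_variables = known_variables
        self.wx_variable = None

    def identify(self, wx_variable):
        """Return True if the variable is configured, False otherwise."""
        self.wx_variable = wx_variable
        if self.wx_variable not in self.known_variables:
            # Warn and report failure instead of raising LayerError.
            LOGGER.warning('Variable "%s" not in configuration file',
                           self.wx_variable)
            return False
        return True


def process_files(layer, variables):
    """Register only identified variables; skip the rest and keep going."""
    registered = []
    for variable in variables:
        if layer.identify(variable):
            registered.append(variable)
    return registered
```

With this shape, one unknown variable in a directory no longer stops the remaining files from being processed.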

Thoughts @tomkralidis & @RousseauLambertLP?

Enhancement: calculate register_datetime with ES

We are currently calculating the register_datetime property of a tile in the tileindex by using elasticsearch.update_by_query once the file is indexed.

i.e., the file is registered in ES --> calculate datetime.now() --> update the file's document in ES with es.update_by_query().

I suggest that we look at a different approach that uses Elasticsearch's ingest pipelines to calculate register_datetime at the time the document is indexed. This would avoid the update_by_query call and remove the need for the refresh='wait_for' argument when we call es.index() for the received file.

We would have to set this pipeline during the initial creation of the index.

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html and https://kb.objectrocket.com/elasticsearch/how-to-create-a-timestamp-field-for-an-elasticsearch-index-275
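As a rough illustration, an ingest pipeline that stamps the document at index time could be defined as below. The constant name and the `properties.register_datetime` field path are assumptions based on the document layout used by the tileindex, not a tested configuration for this project.

```python
# Pipeline body using the "set" processor and the ingest timestamp
# metadata field provided by Elasticsearch.
REGISTER_DATETIME_PIPELINE = {
    'description': 'Set register_datetime when the document is indexed',
    'processors': [
        {
            'set': {
                # Assumed field path, nested under "properties".
                'field': 'properties.register_datetime',
                'value': '{{_ingest.timestamp}}'
            }
        }
    ]
}

# Depending on the client version, the pipeline could be stored with
# es.ingest.put_pipeline(id=..., body=REGISTER_DATETIME_PIPELINE) and then
# referenced through the pipeline parameter of es.index().
```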

@tomkralidis, @RousseauLambertLP, thoughts on this?

All datetimes should contain Zulu designator 'Z'

Currently, an indexed document looks like this:

    "properties": {
      "elevation": "2m",
      "identifier": "GDPS.ETA_TT-20190701000000-20190701000000",
      "reference_datetime": "2019-07-01T00:00:00",
      "file_creation_datetime": "2019-07-01T03:46:46.100000",
      "filepath": "/data-san/geomet/dev/feeds/model_gem_global/25km/grib2/lat_lon/00/000/CMC_glb_TMP_TGL_2_latlon.24x.24_2019070100_P000.grib2",
      "forecast_hour_datetime": "2019-07-01T00:00:00",
      "member": null,
      "identify_datetime": "2019-09-27T18:55:51.066768",
      "receive_datetime": "2019-09-27T18:55:50.893790",
      "layer": "GDPS.ETA_TT",
      "register_datetime": "2019-09-27T18:55:51.067903"
    }

I suggest that all datetimes for each layer handler (i.e., forecast_hour_datetime, reference_datetime, file_creation_datetime, identify_datetime, etc.) should contain the Zulu designator (Z).

For Elasticsearch, this will allow us to define the expected format more strictly in the ES mapping. For example:

                    'reference_datetime': {
                        'type': 'date',
                        'format': 'date_time_no_millis'
                    },
                    'file_creation_datetime': {
                        'type': 'date',
                        'format': 'date_time'
                    },

Unfortunately, datetime does not seem to support this out of the box, and the only solution I've found is to use isoformat() + 'Z'. I don't find this very desirable. @tomkralidis, any other options?
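One alternative to isoformat() + 'Z' is an explicit strftime-based helper, sketched here. `to_zulu` is a hypothetical name, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_zulu(dt, millis=False):
    """Format a datetime as a UTC string ending in 'Z'.

    Naive datetimes are assumed to already be in UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        dt = dt.astimezone(timezone.utc)
    if millis:
        # Three-digit milliseconds, as in the ES date_time format.
        return (dt.strftime('%Y-%m-%dT%H:%M:%S.')
                + '{:03d}Z'.format(dt.microsecond // 1000))
    # No fractional seconds, as in the ES date_time_no_millis format.
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
```

The `millis` flag lets one helper cover both of the mapping formats shown above.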

Let ES handle and assign _id

According to the ES documentation, inserting new entries in ES might be faster if we let ES assign the _id field by itself instead of assigning it ourselves.

I don't see any negative impact from letting ES create the _id. Any thoughts, @tomkralidis and @Dukestep?
