
ardac-toolbox's Introduction

ardac-toolbox

Notebooks, modules, and more to expose SNAP data holdings and the SNAP Data API.

What is this?

This repository is a place to store notebooks, modules, widgets, and code that might be used in ARDAC. This repo is a work in progress and all content should be considered incomplete.

ardac-toolbox's People

Contributors: joshdpaul, kyleredilla

Stargazers: Charlie Parr

Watchers: Bruce Crevensten, Bob Torgerson, Craig Stephenson, Charlie Parr

ardac-toolbox's Issues

Add functionality for getting alaska met station data

Station data is some of the most useful companion data to the data that will be exposed via ARDAC, so it makes sense to demonstrate how to get such data and work with it alongside ours.

The Iowa Environmental Mesonet interface is great, and it already has an API in place for getting Alaska weather station data programmatically; it can serve as our backup plan.

The source data is available through NCEI's API through the "global-hourly" dataset, which is the Integrated Surface Dataset (ISD). I think it makes sense to build queries to this dataset so we are accessing the source data directly, but if it seems too hairy to work with then we can fall back to the IEM.

Here is a working query for Fairbanks airport temperature:
https://www.ncei.noaa.gov/access/services/data/v1?dataset=global-hourly&dataTypes=TMP&stations=70261026411&startDate=1980-01-01&endDate=1980-01-01&format=csv
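The query above can be built programmatically so the station, variable, and date range are easy to swap. A minimal sketch using only the standard library (the parameter names come straight from the working URL above):

```python
# Build an NCEI Access Data Service query for the global-hourly (ISD)
# dataset from its component parameters.
from urllib.parse import urlencode

NCEI_BASE = "https://www.ncei.noaa.gov/access/services/data/v1"

def build_isd_query(station, data_types, start, end, fmt="csv"):
    """Return a query URL for the global-hourly (ISD) dataset."""
    params = {
        "dataset": "global-hourly",
        "dataTypes": ",".join(data_types),
        "stations": station,
        "startDate": start,
        "endDate": end,
        "format": fmt,
    }
    # safe="," keeps comma-separated dataTypes readable in the URL
    return f"{NCEI_BASE}?{urlencode(params, safe=',')}"

# Fairbanks airport (station 70261026411) temperature for one day:
url = build_isd_query("70261026411", ["TMP"], "1980-01-01", "1980-01-01")
print(url)
```

The resulting CSV can then be read directly, e.g. with `pandas.read_csv(url)`.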

Bias correction of reanalysis and GCM output

This builds on fetching met station data and fetching GCM data, so both of those notebooks should probably be completed first.
Use the accessed station data to bias-correct GCM and/or reanalysis data with e.g. the xclim package.
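For orientation, the core idea can be sketched with plain numpy: empirical quantile mapping, which packages like xclim (its sdba module) implement far more robustly. The arrays below are synthetic stand-ins, not real station or model data.

```python
# Minimal empirical quantile-mapping sketch: map model values onto the
# observed distribution. This is an illustration only; use a vetted
# package (e.g. xclim) for real bias correction.
import numpy as np

def quantile_map(obs, model_hist, model_fut, n_quantiles=100):
    """obs: observed series (reference period);
    model_hist: model output over the same period;
    model_fut: model output to be corrected."""
    q = np.linspace(0, 100, n_quantiles)
    obs_q = np.percentile(obs, q)
    mod_q = np.percentile(model_hist, q)
    # For each future value, find its quantile in the historical model
    # distribution and substitute the observed value at that quantile.
    return np.interp(model_fut, mod_q, obs_q)

rng = np.random.default_rng(0)
obs = rng.normal(0.0, 1.0, 1000)         # "station" temperatures
model_hist = rng.normal(2.0, 1.5, 1000)  # biased model, reference period
model_fut = rng.normal(3.0, 1.5, 1000)   # biased model, future period
corrected = quantile_map(obs, model_hist, model_fut)
print(float(corrected.mean()))
```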

ERA5 data via CDS API

Add a notebook that demonstrates working with the CDS API to pull ERA5 data subsets using AK points and polygons.
Perhaps compare with observational data, linking to the notebook (issue needed) on getting observational data for Alaska.

Fetching CMIP6 data from ESGF

We do not have hosted CMIP6 data yet, so we can demonstrate using existing libraries to fetch CMIP6 data from ESGF.
Options include the CDS API and ESGF (via the pyesgf package).
This will be done for points and areas in Alaska. Visualization with other geographical data (e.g. boundaries and points from the API) for context is encouraged.
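A rough sketch of an ESGF search via pyesgf. The constraint names follow CMIP6 search facets but should be verified against the index node used; the search itself needs network access and the pyesgf package, so the calls are left commented out.

```python
# CMIP6 search constraints (facet names assumed from the CMIP6
# controlled vocabulary; verify against your ESGF index node).
constraints = {
    "project": "CMIP6",
    "experiment_id": "ssp585",
    "variable_id": "tas",
    "frequency": "mon",
    "source_id": "GFDL-ESM4",
}
# from pyesgf.search import SearchConnection
# conn = SearchConnection("https://esgf-node.llnl.gov/esg-search", distrib=True)
# ctx = conn.new_context(**constraints)
# for result in ctx.search():
#     print(result.dataset_id)
print(sorted(constraints))
```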

Using SNAP API to add environmental covariates (eg temp/precip) to observed data locations

Use existing spatial observations (field data sets from real scientists) and add data via SNAP API. Mostly tabular operations.

May include some spatial joining operations, for example using point data observations and joining to existing polygon boundaries (from SNAP places/boundary data).

The notebook result could be a table, or a simple plot, model, etc.
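The tabular workflow described above can be sketched with pandas. The `fetch_covariates()` function below is a hypothetical stand-in for a real SNAP Data API point query; it returns canned values so the example runs offline.

```python
# Attach climate covariates to point observations, one API call per
# coordinate. fetch_covariates() is a placeholder, not the real API.
import pandas as pd

obs = pd.DataFrame({
    "site": ["A", "B", "C"],
    "lat": [64.84, 61.22, 66.90],
    "lon": [-147.72, -149.90, -156.78],
    "count": [12, 7, 31],  # e.g. animals observed at each site
})

def fetch_covariates(lat, lon):
    """Stand-in for an API point query; returns canned covariates."""
    return {"mean_temp_c": round(-10 + 0.5 * (lat - 60), 1),
            "mean_precip_mm": 300}

covars = obs.apply(lambda r: pd.Series(fetch_covariates(r.lat, r.lon)), axis=1)
table = pd.concat([obs, covars], axis=1)
print(table.columns.tolist())
```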

Interactive hypercube viz

Use interactive plotting library e.g. hvplot to visualize some data, with widgets to control what data is displayed.

Survey / ask people what kinds of tools/notebooks would be useful

"What would you do if you had this data API?"

Josh/Kyle will meet with fisheries biologists (W. Samuel / E. Schoen), an ecologist (K. Spellman), and a wildlife biology PhD student (S. Zavoico) to gather some ideas from non-geospatial subject matter experts who might use this data.

Make conda environment.yml more complete and portable

I encountered some compatibility/portability problems creating a conda environment from environment.yml while reviewing PR #21. It worked initially but was missing the following packages:

  • xarray
  • rioxarray
  • cftime
  • rasterio

@kyleredilla added these packages and exported a new environment.yml, but attempting to create the conda environment on my side from this file failed. I thought maybe upgrading macOS on my side to a more current version would help with this problem, but then I wasn't able to install the conda environment from either version of environment.yml.

I've seen this same compatibility/portability problem in some of our other repos as well. After a little bit of research, it looks like the fix is to export the conda environment using the --from-history option. So, something like this:

conda env export --from-history | grep -v "^prefix: " > environment.yml

This will export only the packages that were explicitly installed when the conda environment was created, not hundreds of platform-specific libraries and such, which should help make environment.yml much more portable. But we'll want to do some careful testing between multiple people/platforms to make sure this is the case.

Basic cartography example (notebook)

Build a basemap and overlay with vector data from the API, or raster data from Rasdaman

Rasdaman WMS
USGS WMS/WFS data?
Include ArcGIS REST services (maybe wildland fire layers from here or here?)

See Hosted Layers - do we know any UA data that's hosted on an ArcGIS server that we could pull into a map example?
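Since a WMS GetMap request is just a URL with well-known parameters, an overlay image can be fetched without a heavy client library. A sketch using the standard library; the endpoint and layer name below are illustrative placeholders, not real Rasdaman values.

```python
# Build a WMS 1.3.0 GetMap URL by hand (endpoint/layer are placeholders).
from urllib.parse import urlencode

def wms_getmap_url(base, layer, bbox, width=512, height=512):
    """bbox is (min_x, min_y, max_x, max_y) in the request CRS."""
    params = {
        "service": "WMS",
        "version": "1.3.0",
        "request": "GetMap",
        "layers": layer,
        "crs": "EPSG:3338",  # Alaska Albers
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
        "transparent": "true",
    }
    # safe=",:": keep the bbox commas and "EPSG:3338" colon readable
    return f"{base}?{urlencode(params, safe=',:')}"

url = wms_getmap_url("https://example.org/rasdaman/ows", "example_layer",
                     (-2176000, 405000, 1494000, 2384000))
print(url)
```

The returned PNG bytes could then be drawn on a basemap, e.g. as a matplotlib image layer.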

Primer on climate data

What I'm thinking here is a primer on using various types of climate data offered by SNAP and elsewhere.

  • explain gridded observed data
  • explain reanalysis and compare it with observed data, both station data and gridded observed data
  • explain GCMs and compare their historical runs with reanalysis, demonstrating mismatch at fine scales (e.g. hourly) but expected agreement for historical summaries over 10-year periods, etc.
  • demonstrate bias between point extraction and station data, with potential comparison to elevation variation within a pixel
  • demonstrate the more useful ways to summarize GCM data
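The fine-scale-mismatch point can be shown with a toy example: two synthetic "model" series disagree day to day but agree closely on decadal summaries. Everything below is fabricated data for illustration.

```python
# Two series share the same seasonal cycle but independent daily noise:
# large daily differences, tiny differences in decadal means.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
days = pd.date_range("1980-01-01", "1999-12-31", freq="D")
seasonal = 10 * np.sin(2 * np.pi * days.dayofyear / 365.25)
a = pd.Series(seasonal + rng.normal(0, 5, len(days)), index=days)
b = pd.Series(seasonal + rng.normal(0, 5, len(days)), index=days)

daily_err = float((a - b).abs().mean())  # big day-to-day mismatch
decadal_a = a.groupby(a.index.year // 10 * 10).mean()  # 1980s, 1990s
decadal_b = b.groupby(b.index.year // 10 * 10).mean()
decadal_err = float((decadal_a - decadal_b).abs().mean())
print(round(daily_err, 2), round(decadal_err, 3))
```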

QGIS basic cartography template project

I think the idea here is to have a QGIS project with various layers configured as requests to our API and Rasdaman?
Also potentially with other external services like USGS
Similar to #6 but QGIS version

Informal survey notes

Josh and Kyle sat down with a few UAF researchers in the biological sciences. We met with:

  • Dr. Katie Spellman (IARC, works in plant ecology, citizen science, climate change education)
  • Sebastian Zavoico (PhD candidate, works in wildlife biology, has studied moose population dynamics, beavers in the arctic)
  • Will Samuel (works in fish/wildlife, has studied grayling in beaver ponds, wildfire/fish/beaver interactions)
  • Dr. Erik Schoen (IARC, fisheries biologist, has studied salmon response to climate stressors)

These were 30-45 min chats around a laptop, with no real agenda and no heavy note-taking. I (Josh) think we succeeded in keeping it low key and low stakes. We briefly demo'ed the SNAP data holdings accessible via the API and asked:

"How could you use this in your work?"
"What would you do with this tool?"
"What's missing?"

We let the participants know that we are coming up with ideas for example/template notebooks using the API and wanted these notebooks to be relevant to their field. Their comments and discussion topics are summarized below; these items could be used as jumping-off points for ARDAC notebook projects or as ideas for further discussion about what ARDAC is/isn't.

  • interest in historical data: whether interpolated (eg DAYMET) or reanalysis (eg ERA5, downscaled ERA5), historical data is important to biologists who are building models from their observed data. For some applications monthly means will work just fine, but daily data is more appropriate for others (eg, timing of salmon outmigration, flowering phenology, river ice observations, conditions leading up to wildfires). A demo showing a quick and easy pull of CRU-TS historical temperature data for a random lat/lon was exciting for participants.

  • "Wow, I wish we had this earlier for paper XYZ..." : this comment came up more than once. In previous research the participant either had to a) hire someone with GIS/programming skills to wrangle this data for them, b) ask SNAP to wrangle this data for them, or c) revise a research question because it was difficult to get at the data they wanted. This shows that the API functionality is needed and would potentially speed up research / free up resources for more in-depth research. It also shows that there are probably ongoing projects that could use this tool but don't know it yet... how do we reach those people before they publish?

  • interest in projected daily data for computation of metrics: certain metrics (or "indicators" as we call them in the API) are computed from daily data. Growing degree days came up multiple times in these conversations, as did rolling averages over n-day periods (eg, 5-day precip, 7-day temperature, etc). The list of metrics is of course very long and specific to different disciplines. If requested enough, these could be calculated by SNAP and added to the current indicators list, or could be calculated ad hoc from dailies in an ARDAC "how-to" type notebook. That type of notebook would require daily data to be available.

  • interest in metrics that represent extremes: research focusing on shorter-duration extreme climate events and their effects on biological processes is becoming more common. To paraphrase an analogy from Erik Schoen: "imagine you spent all day in the kitchen, and for 5 seconds in the middle of the day you burned your hand on the stovetop. The average temperature of your hand for that day would not tell the story of the extreme event, but the event would still have large consequences for your health on that day and in days to come." This was his way of communicating how biological research is leaning more into extreme events (eg, heat waves, flash floods, rain on snow), which requires finer temporal resolution for both observation and modeling.

  • do downscaled models flatten or otherwise mask extremes?: I (Josh) do not know enough about this subject to answer clearly, but I gather from conversations with Kyle and others that the method of downscaling (statistical vs dynamical) is important with regard to preservation of extreme values in the datasets. Computing metrics that represent extremes in the variables may be fraught if we use the wrong dataset, so any work in this direction should be vetted with downscaling experts!

  • R users: Of these researchers, those with programming skills all use R exclusively. Their R workflows are generally not geospatial in nature, but work with tabular data derived from geospatial processing in ESRI or done by other collaborators in python. This is really good to know: since the outputs of the API are tabular in nature, nothing precludes an R user from requesting a CSV response format and working with it immediately. Multiple participants simply asked "can I just get this as a CSV?" to which I was happily able to answer "yes, of course!". Whether or not we offer ARDAC notebooks in the R language, the "researcher-performing-spatial-analysis-without-geospatial-programming-skills" is an important user group to keep in mind. I think this type of user is more common than one might suppose.

  • model selection and the "kitchen sink" approach: researchers do not know exactly which variables are going to become important when building their predictive models. They sometimes use a "kitchen sink" approach where their observations are stacked with a large number of available variables, and then use statistical methods to select the variables that contribute most to the result. I (Josh) do not understand those stats very well, but I know this "all of the above" approach is currently hard to perform using our API. The datasets are not all summarized over the same time buckets, the model and scenario options are not consistent, etc. What would it take to offer a "kitchen sink" endpoint? For example, "here's my coordinate, get me all variables summarized by decade"? The resulting monstrous CSV output might actually be what some people are looking for.

  • the API documentation was not exciting: The participants listed above are all very intelligent people, but showing them the API documentation did not really elicit a "Wow!" or "Aha!" response. To a person, their reaction was something like "I checked out that website, but I'm not really sure what it is, or what it's supposed to do." I think this underscores the need for the ARDAC notebooks to be something shiny, engaging, and fun. The documentation is fantastic at being The Documentation, but does not make clear the potential of the API as a tool for research. This made me feel that this informal survey was perhaps premature - I felt like we were showing them the parts in the garage when we really might want to show them the fully assembled car.
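The daily-data metrics raised above (growing degree days, n-day rolling windows) are straightforward to compute with pandas once dailies are available. A sketch on a synthetic daily series; the base temperature and window length are discipline-specific choices, not fixed values.

```python
# Growing degree days and a 5-day rolling precipitation total computed
# from a synthetic daily series (one leap year of fabricated data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2000-01-01", "2000-12-31", freq="D")
tmean = pd.Series(
    -10 + 25 * np.sin(np.pi * (days.dayofyear - 90) / 365) ** 2
    + rng.normal(0, 3, len(days)), index=days)
precip = pd.Series(rng.gamma(0.4, 4.0, len(days)), index=days)

BASE = 5.0  # degC; GDD base temperature, a discipline-specific choice
gdd = (tmean - BASE).clip(lower=0).cumsum()  # accumulated degree days
precip_5day = precip.rolling(5).sum()        # 5-day precip total

print(int(gdd.iloc[-1]), round(float(precip_5day.max()), 1))
```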
