iomb-data's Introduction

IOMB-Data

This repository stores the scripts used to download observational data from various sources and format them as CF-compliant netCDF4 files which can be used for model benchmarking via ILAMB.

Please note that the repository contains no data. If you need to download our observational data, please see the ilamb-fetch tutorial. This collection of scripts is meant to:

  • archive how we have produced the datasets compared to models with IOMB
  • expose the details of our formatting choices for transparency
  • provide the community a path to contributing new datasets as well as pointing out errors in the current collection

Contributing

If you have a suggestion or issue with the observational data IOMB uses, we encourage you to use the issue tracker associated with this repository rather than that of the ILAMB codebase. This is because the ILAMB codebase is meant to be a general framework for model-data intercomparison and is agnostic to the source of the observational data. Here are a few ways you can contribute to this work:

  • If you notice an irregularity/bug/error with a dataset in our collection, please raise an issue here with the tag bug. We also welcome pull requests which fix these errors, but please first raise an issue to give a record and location where we can have a dialog about the issue.
  • If you know of a dataset which would be a great addition to IOMB, raise an issue here with the tag new dataset. Please provide us with details of where we can find the dataset as well as some reasoning for the recommendation.
  • We also encourage pull requests with scripts that encode new datasets. The procedure is described in the next section.

Formatting Guidelines

We appreciate the community interest in improving IOMB. We believe that more quality observational constraints will lead to a better Earth system model ecosystem and so are always interested in new observational data. We ask that you follow this procedure.

  • Before you encode the dataset, first search the open and closed issues here on the issue tracker. Someone may already be assigned to this work, and we do not want you to duplicate effort. We may also have already considered the dataset and found its quality insufficient.
  • If no issue is found, raise a new issue with the tag new dataset. This will allow for some discussion and let us know you intend to do the work.
  • You may use any language you wish to encode the dataset, but we strongly encourage you to use python3 if at all possible. You can find examples in this repository to use as a guide. See this tutorial for details and feel free to ask questions in the issue corresponding to the dataset you are adding.
  • Once you have formatted the dataset, we recommend running it against a collection of models, along with other relevant observational datasets, using ILAMB. There are tutorials to help you do this. This will allow the community to evaluate the new addition and decide whether and how it should be included in the curated collection.
  • After you have these results, attend one of our conference calls where you can present the results of the intercomparison and the group can discuss. Once the group agrees, then you can submit a pull request and your addition will be included.

iomb-data's People

Contributors

nocollier

iomb-data's Issues

Add Boyer

  • Find current script and submit a PR adding it to IOMB-Data

Merging Olu's and Weiwei's IOMB

Now that we are (almost) past Weiwei's paper publishing, I wanted to go back and look at what Olu had done. The following table shows the datasets that both papers use. There is a lot of overlap but some clear differences too. It would be good to have a discussion on which of these products we should make sure get into IOMB moving forward.

Olu Weiwei
anthropogenic dic OCIM, Gruber
chlorophyll SeaWIFS GLODAPv2, SeaWIFS, MODIS-Aqua
detritalorganiccarbon JAMSTEC
dimethylsulfide Lana, Ogunro
dissolvedinorganiccarbon JAMSTEC GLODAPv2
mixedlayerdepth Boyer.Montegut Boyer
netprimaryproductivity OSU
nitrate WOA, JAMSTEC GLODAPv2, WOA
oxygen WOA, JAMSTEC GLODAPv2, WOA
phosphate WOA GLODAPv2, WOA
phytoplankton JAMSTEC
salinity WOA, JAMSTEC GLODAPv2, WOA
shortwaveradiation RDA.UCAR
silicate WOA GLODAPv2, WOA
talk GLODAPv2
temperature WOA, JAMSTEC GLODAPv2, LDEO, WOA
windspeed NCEP
zooplankton JAMSTEC

GLODAP dataset improvements

  • The grid (depth, lat, lon) is arbitrarily chosen. In particular, I chose 20 uniformly spaced depth levels down to 1000 [m]. This was largely because there seems to be a wide diversity of choices among models. We could do better here but I am not sure exactly how.
  • We could use the standard deviation of values in each bin as a measure of uncertainty and then incorporate this into the benchmark.
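The binning idea in the second point can be sketched as follows. This is a minimal illustration rather than the repository's actual GLODAP script: the function name `bin_statistics`, the 1 degree cell size, and the sample values are all assumptions, and only the Python standard library is used to keep it dependency-free (a real script would more likely use numpy or xarray). Depth is left out for brevity; the same grouping extends to a third index.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def bin_statistics(samples, dlat=1.0, dlon=1.0):
    """Group (lat, lon, value) samples into grid cells.

    Returns {(i_lat, i_lon): (mean, std, count)}, where the indices count
    cells northward from 90S and eastward from 180W. The population
    standard deviation of the samples in a cell is used as a rough
    measure of uncertainty for that cell.
    """
    cells = defaultdict(list)
    for lat, lon, value in samples:
        key = (int((lat + 90.0) // dlat), int((lon + 180.0) // dlon))
        cells[key].append(value)
    return {k: (fmean(v), pstdev(v), len(v)) for k, v in cells.items()}


# Hypothetical samples: two fall in the same 1 degree cell, one is alone.
samples = [(10.2, 20.1, 2.0), (10.7, 20.9, 4.0), (-5.5, 100.3, 7.0)]
stats = bin_statistics(samples)
for cell, (mean, std, count) in sorted(stats.items()):
    print(cell, mean, std, count)
```

A cell holding a single sample reports a standard deviation of zero, which is not a meaningful uncertainty. Keeping the count alongside the statistics lets a benchmark ignore or down-weight such cells.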

World Ocean Atlas

@weiweifu

I wanted to start getting all the ocean data conversion scripts into this repository and thought we could start with WOA, specifically temperature. My idea is that if I can show you one script that does things in a good way, it will be simpler to replicate for the others and we can work on them together.

I have some questions:

  • I am downloading data from here, choosing the NetCDF format, 1 degree, files from the monthly column.
  • I see that there are several available decadal periods: '55-'64, '65-'74, '75-'84, '85-'94, '95-'04, '05-'17, or '81-'10. I get that we are getting an average year across these decades, but which one do we want to use?
  • Whatever one we pick, should we be comparing to a model average across the same decades?
  • I also see that in the netCDF files, there is a t_se whose standard name is sea_water_temperature standard_error. Could we use this as a measure of 'observational' uncertainty?
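On the question of comparing against a model average over the same period, one way to build a matching "average year" from model output is sketched below. The function name `monthly_climatology`, the flat-list layout of the model series, and the sample numbers are assumptions made for illustration; they are not taken from ILAMB or from the WOA files.

```python
from statistics import fmean


def monthly_climatology(series, series_start_year, first_year, last_year):
    """Average each calendar month over [first_year, last_year] inclusive.

    `series` is a flat list of monthly means that begins in January of
    `series_start_year`. The result has 12 entries, January to December.
    """
    by_month = [[] for _ in range(12)]
    for i, value in enumerate(series):
        year = series_start_year + i // 12
        if first_year <= year <= last_year:
            by_month[i % 12].append(value)
    if not all(by_month):
        raise ValueError("series does not cover the requested period")
    return [fmean(values) for values in by_month]


# Hypothetical model output: 3 years of monthly values starting in 1980.
model = [float(m + 10 * y) for y in range(3) for m in range(12)]

# Restrict the average to 1981-1982, mimicking the period of a climatology.
clim = monthly_climatology(model, 1980, 1981, 1982)
print(clim[0], clim[11])
```

Restricting the model to the years covered by the chosen observational period keeps any trend over the record from appearing as a bias.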

Revisit methodology

The [Collier2018](https://doi.org/10.1029/2018MS001354) methodology normalizes errors by the standard deviation of the reference variable:

s(x) = exp( - bias(x) / std( ref(t, x) ) )

This worked reasonably well for land, but leads to unhelpful error maps for the ocean, where the variability is comparatively small. We should rethink this in the context of ocean benchmarking. One alternative would be to use regional quantiles over ocean basins as we have proposed and implemented in ILAMB.
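A small numerical sketch shows why a small reference standard deviation makes the score uninformative. It assumes the magnitude of the bias is what gets normalized, so the score stays between 0 and 1. The function name `bias_score` and the numbers are invented for illustration and are not ILAMB code.

```python
import math


def bias_score(bias, ref_std):
    """Score in (0, 1] computed as exp(-|bias| / std(ref)).

    Taking the absolute value of the bias is an assumption of this sketch.
    """
    if ref_std <= 0:
        raise ValueError("reference standard deviation must be positive")
    return math.exp(-abs(bias) / ref_std)


# The same bias of 0.5 units, scored against two hypothetical variabilities.
land = bias_score(bias=0.5, ref_std=5.0)   # large temporal variability
ocean = bias_score(bias=0.5, ref_std=0.1)  # small temporal variability
print(round(land, 3), round(ocean, 3))
```

With the variability fifty times smaller, the same bias drives the score close to zero, so a map of scores saturates and stops distinguishing good regions from bad ones.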

Add SeaWIFS

Jim was under the impression that chlorophyll is high at some river outlets and therefore a poor comparison to the global models. Should we try to mask out these areas?

Add LDEO

@weiweifu In your IOMB_AR6 results, we have an SST data product labeled LDEO, which I assume is Lamont-Doherty Earth Observatory. It appears to be another shiptrack product but I am unable to locate more information about it. Can you please tell me where you obtained this data and what it is?
