numfocusfalldev's Issues

Make `xarray` datasets discoverable

There is a large and growing number of publicly available datasets that can be loaded into xarray from cloud storage buckets. Currently, however, there is no effective way to discover these datasets.

Using standards like OGC Catalog Service for the Web (CSW) and OpenSearch, these xarray datasets could be made discoverable via sites like data.gov (and data.gov.uk, data.gov.au, etc.), but this requires producing the ISO metadata that these sites consume.

It would also be possible to discover xarray datasets via sites like Google's Dataset Search, but this would require producing the JSON-LD metadata that these sites consume.

Since xarray preserves the content of datasets that follow the CF and ACDD metadata conventions, it should be straightforward to generate both types of metadata from the xarray dataset object, using metadata tools already developed for datasets that adhere to the CF conventions. The ncISO tool already generates ISO records from netCDF files or OPeNDAP endpoints, so its mapping from CF/ACDD attributes to ISO could be reused for records generated from xarray. Similarly, work has already been done to create nco-json metadata from netCDF files; nco-json is a complete metadata representation from which the JSON-LD content could be extracted.
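To give a sense of how direct the attribute-to-JSON-LD step could be, here is a minimal sketch that turns ACDD-style global attributes (e.g. an xarray `ds.attrs` dict) into a schema.org `Dataset` record of the kind Google's Dataset Search reads. The mapping table `ACDD_TO_SCHEMA_ORG` and the function `acdd_to_jsonld` are hypothetical illustrations that cover only a few attributes; they are not the ncISO mapping and not an existing xarray API.

```python
import json

# Hypothetical, deliberately partial mapping from ACDD global attribute names
# to schema.org Dataset properties (not the ncISO mapping).
ACDD_TO_SCHEMA_ORG = {
    "title": "name",
    "summary": "description",
    "license": "license",
}

BOX_ATTRS = ("geospatial_lat_min", "geospatial_lon_min",
             "geospatial_lat_max", "geospatial_lon_max")


def acdd_to_jsonld(attrs: dict) -> dict:
    """Build a schema.org Dataset record from ACDD-style global attributes."""
    doc = {"@context": "https://schema.org/", "@type": "Dataset"}
    for acdd_name, schema_name in ACDD_TO_SCHEMA_ORG.items():
        if acdd_name in attrs:
            doc[schema_name] = attrs[acdd_name]

    # ACDD keywords are a comma-separated string; emit them as a list.
    if "keywords" in attrs:
        doc["keywords"] = [k.strip() for k in attrs["keywords"].split(",") if k.strip()]

    # Assumes the creator is a person; ACDD's creator_type may say otherwise.
    if "creator_name" in attrs:
        doc["creator"] = {"@type": "Person", "name": attrs["creator_name"]}

    # temporalCoverage as an ISO 8601 interval "start/end".
    start = attrs.get("time_coverage_start")
    end = attrs.get("time_coverage_end")
    if start and end:
        doc["temporalCoverage"] = f"{start}/{end}"

    # GeoShape box: lower corner then upper corner, each written as "lat lon".
    if all(name in attrs for name in BOX_ATTRS):
        box = " ".join(str(attrs[name]) for name in BOX_ATTRS)
        doc["spatialCoverage"] = {"@type": "Place",
                                  "geo": {"@type": "GeoShape", "box": box}}
    return doc


# Usage with a made-up attribute dict; with xarray this could be `ds.attrs`.
example_attrs = {
    "title": "Example SST analysis",
    "summary": "Hypothetical gridded sea surface temperature.",
    "keywords": "ocean, temperature",
    "time_coverage_start": "2000-01-01",
    "time_coverage_end": "2000-12-31",
    "geospatial_lat_min": -10, "geospatial_lon_min": 20,
    "geospatial_lat_max": 10, "geospatial_lon_max": 40,
}
print(json.dumps(acdd_to_jsonld(example_attrs), indent=2))
```

Attributes that are absent are simply omitted rather than guessed, which keeps the generated record no richer than the source metadata.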

Proposed Work:

  • Develop code that integrates the nco-json spec into the xarray package to represent the complete metadata of an xarray object.

  • Develop code that, from the complete nco-json metadata associated with xarray objects, generates the more restrictive ISO and JSON-LD metadata formats.
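The two proposed steps can be sketched as a "complete record first, restrictive views second" pipeline. The layout below (top-level `dimensions`, `variables`, and `attributes` objects) is only loosely modeled on NCO's JSON output, and the exact field names are an assumption rather than a transcription of the nco-json spec. `to_ncojson_like` and `variables_measured` are hypothetical helpers. The second function derives one restrictive view: a schema.org `variableMeasured` list.

```python
import json


def to_ncojson_like(dims: dict, variables: dict, global_attrs: dict) -> dict:
    """Assemble a complete metadata record loosely modeled on NCO's JSON output.

    dims:      {dimension name: length}
    variables: {name: {"shape": [dimension names], "type": str, "attributes": {...}}}
    """
    return {
        "dimensions": dict(dims),
        "variables": {name: dict(var) for name, var in variables.items()},
        "attributes": dict(global_attrs),
    }


def variables_measured(record: dict) -> list:
    """Derive a restrictive schema.org variableMeasured list from the full record."""
    measured = []
    for name, var in record["variables"].items():
        # Skip CF coordinate variables (1-D, named after their own dimension).
        if var.get("shape") == [name]:
            continue
        attrs = var.get("attributes", {})
        entry = {"@type": "PropertyValue", "name": name}
        if "long_name" in attrs:
            entry["description"] = attrs["long_name"]
        if "units" in attrs:
            entry["unitText"] = attrs["units"]
        measured.append(entry)
    return measured


# Hypothetical dataset description, for illustration only.
record = to_ncojson_like(
    dims={"time": 12, "lat": 180},
    variables={
        "time": {"shape": ["time"], "type": "double",
                 "attributes": {"units": "days since 2000-01-01"}},
        "sst": {"shape": ["time", "lat"], "type": "float",
                "attributes": {"long_name": "sea surface temperature", "units": "K"}},
    },
    global_attrs={"title": "Example SST analysis"},
)
print(json.dumps(variables_measured(record), indent=2))
```

Keeping the complete record as the single source means each restrictive format (ISO, JSON-LD) is just another projection of it, so the projections cannot drift apart.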

revive Pydap project

Pydap is "a Python library implementing the Data Access Protocol (DAP, aka OPeNDAP or DODS)." It facilitates the streaming of datasets over a network using DAP. Its development and maintenance have slowed over the past few years, despite its continued use (and support in xarray).

This proposal would do the following:

  • provide a proof of concept for a pydap server backed by xarray and dask
  • provide some much needed developer maintenance to the pydap project and the pydap-xarray backend (not sexy work but absolutely needed)
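Part of what a DAP server backed by xarray and dask has to do is translate between DAP2 constraint expressions and array indexing. In DAP2, a hyperslab is written per dimension as `[start:stride:stop]` with an *inclusive* stop, while Python slices use an exclusive stop. The sketch below shows the client-side direction of that translation. `dap2_hyperslab` and `dap2_data_url` are hypothetical helpers, not pydap functions, and the server URL is made up.

```python
def dap2_hyperslab(var: str, slices: list) -> str:
    """Turn Python-style slices (exclusive stop) into a DAP2 hyperslab.

    DAP2 writes each dimension as [start:stride:stop] with an inclusive stop.
    """
    parts = []
    for s in slices:
        start = 0 if s.start is None else s.start
        step = 1 if s.step is None else s.step
        if step < 1 or start < 0 or s.stop is None or s.stop <= start:
            raise ValueError(f"unsupported slice for this sketch: {s}")
        # Last index actually selected by the Python slice.
        last = start + ((s.stop - 1 - start) // step) * step
        parts.append(f"[{start}:{step}:{last}]")
    return var + "".join(parts)


def dap2_data_url(base_url: str, projections: list) -> str:
    """Build a binary-data (.dods) request for one or more projections."""
    return f"{base_url}.dods?" + ",".join(projections)


# Hypothetical server URL, for illustration only.
url = dap2_data_url(
    "http://example.com/opendap/sst.nc",
    [dap2_hyperslab("sst", [slice(0, 10), slice(0, 100, 2)])],
)
print(url)  # http://example.com/opendap/sst.nc.dods?sst[0:1:9][0:2:98]
```

A server would run the inverse mapping: parse `[start:stride:stop]` into `slice(start, stop + 1, stride)` and hand it to the (possibly dask-backed) xarray variable, so that only the requested hyperslab is computed and streamed.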

per: https://twitter.com/rabernat/status/1039209501482778624

NumFOCUS project: Xarray
ESIP member institution: NCAR

cc @mrocklin @rabernat @shoyer

Improve conda-forge automation

The ESIP community relies on the conda-forge channel to install the packages they need for their workflows, since these packages are often not available in the defaults conda channel. Maintaining 5000+ packages is a lot of work, and some straightforward improvements could be made with modest funding.

Conda-forge has always relied on heavy automation to reduce the maintenance burden and keep the stack up to date and stable. Some problems remain, however.

Existing Problems

  • Due to a constantly shifting landscape of ABI incompatibilities, conda-forge periodically needs to change the pinnings on critical packages to guarantee interoperability across the entire stack. When this happens, portions of the stack must be rebuilt so that they use the new binaries. Currently, setting up these migrations is a manual process.

  • At the heart of conda-forge is a directed graph that describes the dependency relationships among all the packages in the distribution. This graph is used both for properly installing packages into environments and for migrating packages to new pinnings and compilers. However, it is possible to get these dependencies wrong in a package's recipe.
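The migration problem described above boils down to two graph operations: find every package that transitively depends on the re-pinned one, then rebuild those in dependency order. Below is a minimal sketch of that idea using the standard library's `graphlib`. The toy dependency graph is hypothetical and heavily simplified, and this is not conda-forge's actual migration tooling.

```python
from collections import deque
from graphlib import TopologicalSorter  # Python 3.9+


def affected_packages(deps: dict, pinned: str) -> set:
    """Packages that depend, directly or transitively, on `pinned`."""
    # Invert the graph: requirement -> packages that require it.
    dependents = {}
    for pkg, reqs in deps.items():
        for req in reqs:
            dependents.setdefault(req, set()).add(pkg)

    seen, queue = set(), deque([pinned])
    while queue:
        for child in dependents.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def rebuild_order(deps: dict, pinned: str) -> list:
    """Affected packages ordered so each is rebuilt after its affected deps."""
    affected = affected_packages(deps, pinned)
    # Keep only edges between affected packages; unaffected ones need no rebuild.
    subgraph = {pkg: {r for r in deps.get(pkg, ()) if r in affected} for pkg in affected}
    # static_order() yields predecessors (here: dependencies) first and
    # raises graphlib.CycleError if the recipes form a cycle.
    return list(TopologicalSorter(subgraph).static_order())


# Toy, hypothetical dependency graph: package -> packages it needs.
deps = {
    "zlib": set(),
    "hdf5": {"zlib"},
    "libnetcdf": {"hdf5", "zlib"},
    "numpy": set(),
    "netcdf4": {"libnetcdf", "numpy"},
    "pandas": {"numpy"},
}
print(rebuild_order(deps, "hdf5"))  # ['libnetcdf', 'netcdf4']
```

This also shows why wrong recipe dependencies matter: a missing edge silently drops a package from the affected set, so it is never rebuilt against the new pinning.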

Proposed work:

  • Automate the process so that, when pinnings change, our stack can be rebuilt seamlessly to take advantage of the newer binaries. This will require automatically detecting when a new pinning has been issued, inspecting the dependency graph, and extracting the portions touched by the newly pinned package.

  • Regenerate Python (and possibly R) recipes from the package metadata to get the most accurate dependency list possible. This can be achieved with some small improvements to the conda-skeleton recipe generator.

  • Extend Eric Dill's work on https://github.com/ericdill/depfinder to automatically find dependencies for packages in the conda-forge ecosystem. If unreported dependencies are found, they can be submitted as PRs to the recipes, helping to keep the recipes correct. This approach would then be extended beyond pure-Python packages with depfinder analogues for other languages.

Update on NumFocus/ESIP?

@abburgess, whatever happened here?
Did some of these get submitted?
Did some of these get funded?

cloud optimized netCDF and zarr

As part of the Pangeo project, we have been exploring the concept of "cloud optimized netCDF", building on "cloud optimized GeoTIFF". Zarr is an open-source Python library and storage spec "providing an implementation of chunked, compressed, N-dimensional arrays." The spec is simple, clearly documented, and well suited for use in cloud object stores.

Last year, we (@rabernat, myself, and others from the xarray/dask/pangeo projects) wrote an experimental xarray backend for zarr and we have been testing its use on public clouds over the last year. The community is eager to see some formal effort put behind these concepts.
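Part of why Zarr suits object stores is that every chunk is a separate object with a predictable key, so a reader can fetch exactly the chunks a selection needs. In the Zarr v2 layout, a chunk's key is its chunk-grid indices joined with "." by default (e.g. `"1.0"`), stored under the array's path. The sketch below computes which keys a contiguous selection touches. `chunk_keys_for_selection` is a hypothetical helper for illustration (zarr-python does this internally), and the array path prefix is omitted.

```python
from itertools import product


def chunk_keys_for_selection(shape, chunks, selection, sep="."):
    """Zarr v2-style chunk keys touched by a contiguous selection.

    shape, chunks: per-dimension array and chunk lengths
    selection:     one slice per dimension (step must be 1 in this sketch)
    """
    ranges = []
    for size, chunk, sl in zip(shape, chunks, selection):
        start, stop, step = sl.indices(size)  # normalizes None/negative bounds
        if step != 1:
            raise ValueError("this sketch only handles step-1 slices")
        if stop <= start:
            return []  # empty selection touches no chunks
        ranges.append(range(start // chunk, (stop - 1) // chunk + 1))
    # Each key names one object in the store, e.g. "1.0" for chunk (1, 0).
    return [sep.join(map(str, idx)) for idx in product(*ranges)]


keys = chunk_keys_for_selection(
    shape=(100, 100), chunks=(10, 25), selection=(slice(5, 15), slice(0, 30))
)
print(keys)  # ['0.0', '0.1', '1.0', '1.1']
```

Because the key computation needs only the array's metadata (shape and chunk lengths), the requests for independent chunks can be issued in parallel, which is what makes the format a good match for dask.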

This proposal would do the following:

  • Complete a netCDF+zarr spec (work started in zarr-developers/zarr-python#276)
  • Adapt the xarray backend to support any spec changes
  • Add missing functionality to the xarray zarr backend such as the ability to append to datasets
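Appending to a chunked dataset along one dimension touches a predictable set of chunks. If the existing length is not a multiple of the chunk length, the trailing partial chunk must be rewritten, because in this model each chunk is a single compressed object that an object store cannot modify in place. The array metadata must also be updated with the new shape. The sketch below computes which chunk indices an append writes. `chunks_written_by_append` is a hypothetical helper, not part of zarr or xarray.

```python
def chunks_written_by_append(old_len: int, n_new: int, chunk: int) -> range:
    """Chunk indices along the append axis that an append must (re)write.

    A partially filled trailing chunk has to be rewritten, because in this
    model each chunk is a single compressed object in the store.
    """
    if n_new <= 0:
        return range(0)
    new_len = old_len + n_new
    first = old_len // chunk  # first chunk containing any new element
    last = (new_len - 1) // chunk
    return range(first, last + 1)


# 25 existing rows, chunk length 10, appending 12 rows:
# chunk 2 (rows 20-29) was partial and is rewritten, chunk 3 is new.
print(list(chunks_written_by_append(25, 12, 10)))  # [2, 3]
```

When the existing length is an exact multiple of the chunk length, no existing chunk is rewritten, which is one reason to align append sizes with chunk boundaries.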

Other possible development objectives include:

per: https://twitter.com/rabernat/status/1039210134600396800

NumFOCUS project: Xarray
ESIP member institution: NCAR

cc @mrocklin @rabernat @shoyer @alimanfoo @WardF
