numfocusfalldev's Issues
Integrate unit support into pandas via pint or other units packages
NumFOCUS project: pandas
ESIP Member: Air Sciences
Relevant chatter around the topic:
pandas-dev/pandas#10349
hgrecco/pint#684
Make `xarray` datasets discoverable
There are a large and growing number of publicly available datasets that can be loaded into `xarray` from cloud storage buckets. Currently, however, there is no effective way to discover these datasets.
Using standards like the OGC Catalog Service for the Web (CSW) and OpenSearch, it would be possible to discover these `xarray` datasets via sites like data.gov (and data.gov.uk, data.gov.au, etc.), but doing so requires producing the ISO metadata that these sites consume.
It would also be possible to discover `xarray` datasets via sites like Google's Dataset Search, but it would be necessary to produce the `json-ld` metadata that these sites consume.
Since `xarray` preserves the content of datasets that follow the CF and ACDD metadata conventions, it should be possible to generate both types of metadata in a straightforward way from the `xarray` dataset object, using metadata tools already developed for datasets that adhere to the CF conventions. The ncISO tool generates ISO records from netCDF or OPeNDAP endpoints, so its mapping from CF/ACDD attributes to ISO could be reused for records from `xarray`. Similarly, work has already been done to create `nco-json` metadata from netCDF files, a complete metadata representation from which the `json-ld` content could be extracted.
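Google's Dataset Search consumes schema.org `Dataset` records serialized as JSON-LD, so the core of the `json-ld` path is a crosswalk from ACDD global attributes to schema.org properties. The sketch below is only an illustration of that idea: the `ACDD_TO_SCHEMA_ORG` table and `acdd_to_jsonld` function are hypothetical, cover a hand-picked subset of attributes, and are not the mapping used by ncISO or any existing tool. It takes a plain dict, which is what `ds.attrs` is on an `xarray` dataset; the example attributes are made up.

```python
import json

# Hypothetical, hand-picked crosswalk from ACDD attribute names to schema.org
# Dataset properties; a real tool would reuse an established mapping.
ACDD_TO_SCHEMA_ORG = {
    "title": "name",
    "summary": "description",
    "license": "license",
    "id": "identifier",
}


def acdd_to_jsonld(attrs: dict) -> dict:
    """Build a minimal schema.org Dataset record from ACDD-style global attributes."""
    record = {"@context": "https://schema.org/", "@type": "Dataset"}
    for acdd_name, schema_name in ACDD_TO_SCHEMA_ORG.items():
        if acdd_name in attrs:
            record[schema_name] = attrs[acdd_name]

    # ACDD stores keywords as a single comma-separated string.
    if "keywords" in attrs:
        record["keywords"] = [k.strip() for k in attrs["keywords"].split(",") if k.strip()]

    if "creator_name" in attrs:
        record["creator"] = {"@type": "Person", "name": attrs["creator_name"]}

    # Temporal coverage expressed as an ISO 8601 interval "start/end".
    start, end = attrs.get("time_coverage_start"), attrs.get("time_coverage_end")
    if start and end:
        record["temporalCoverage"] = f"{start}/{end}"

    # schema.org GeoShape "box" is "lat_min lon_min lat_max lon_max" (south west north east).
    bounds = ("geospatial_lat_min", "geospatial_lon_min",
              "geospatial_lat_max", "geospatial_lon_max")
    if all(b in attrs for b in bounds):
        record["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": " ".join(str(attrs[b]) for b in bounds)},
        }
    return record


# Made-up attributes standing in for ds.attrs.
example_attrs = {
    "title": "Sea surface temperature analysis",
    "summary": "Monthly gridded SST.",
    "keywords": "ocean, temperature, SST",
    "time_coverage_start": "2000-01-01",
    "time_coverage_end": "2000-12-31",
    "geospatial_lat_min": -90, "geospatial_lat_max": 90,
    "geospatial_lon_min": -180, "geospatial_lon_max": 180,
}
print(json.dumps(acdd_to_jsonld(example_attrs), indent=2))
```

Attributes that are absent are simply left out of the record, which keeps the output valid for datasets that only partially follow ACDD.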
Proposed Work:
- Develop code that integrates the `nco-json` spec into the `xarray` package, representing the complete metadata of the `xarray` object.
- Develop code that, from the complete `nco-json` metadata associated with `xarray` objects, generates the more restrictive `ISO` and `json-ld` metadata formats.
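The first item can be pictured as a function that walks a dataset's global attributes, dimensions, and variables and emits them in an NCO-JSON-style layout (`attributes`, `dimensions`, and `variables` with `shape`, `type`, and `attributes`). This is a rough sketch, not the nco-json spec itself: the function name and the dtype-to-netCDF type table are assumptions, data values are omitted, and it only relies on the `attrs`, `sizes`, and `variables` attributes that an `xarray.Dataset` exposes. A `SimpleNamespace` stands in for a real dataset so the example runs without `xarray`.

```python
import json
from types import SimpleNamespace

# Assumed mapping from NumPy dtype names to netCDF type names; unknown dtypes
# fall back to the NumPy name.
NETCDF_TYPE_NAMES = {
    "int8": "byte", "uint8": "ubyte", "int16": "short", "uint16": "ushort",
    "int32": "int", "uint32": "uint", "int64": "int64", "uint64": "uint64",
    "float32": "float", "float64": "double",
}


def to_nco_json_metadata(ds) -> dict:
    """Describe an xarray-like dataset in an NCO-JSON-style layout (metadata only)."""
    return {
        "attributes": dict(ds.attrs),
        "dimensions": dict(ds.sizes),
        "variables": {
            name: {
                "shape": list(var.dims),  # dimension names, in order
                "type": NETCDF_TYPE_NAMES.get(str(var.dtype), str(var.dtype)),
                "attributes": dict(var.attrs),
            }
            for name, var in ds.variables.items()
        },
    }


# Stand-in for an xarray.Dataset with one variable (made-up values).
demo = SimpleNamespace(
    attrs={"Conventions": "CF-1.7, ACDD-1.3", "title": "Demo"},
    sizes={"time": 12, "lat": 180},
    variables={
        "sst": SimpleNamespace(dims=("time", "lat"), dtype="float32",
                               attrs={"units": "degC"}),
    },
)
print(json.dumps(to_nco_json_metadata(demo), indent=2))
```

With a real dataset, attribute values that are NumPy scalars or arrays would need converting to plain Python types before `json.dumps`. The `ISO` and `json-ld` records would then be derived from this one dict rather than from the dataset object directly.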
revive Pydap project
Pydap is "a Python library implementing the Data Access Protocol (DAP, aka OPeNDAP or DODS)." It facilitates the streaming of datasets over a network using DAP. Its development and maintenance have slowed over the past few years, despite its continued use (and support in xarray).
This proposal would do the following:
- provide a proof of concept for a pydap server backed by xarray and dask
- provide some much-needed developer maintenance to the pydap project and the pydap-xarray backend (not sexy work, but absolutely needed)
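A pydap server backed by xarray and dask has to translate DAP2 constraint expressions into indexing that can be evaluated lazily. As a minimal sketch, assuming only DAP2 hyperslab projections (`var[start:stride:stop]`, `var[start:stop]`, or `var[index]`) and ignoring selection clauses and nested structure names, the hypothetical helper `parse_hyperslab` below converts one projection into Python slices. The detail that matters is that DAP stop indices are inclusive while Python's are exclusive.

```python
import re

# Each bracketed group in a DAP2 projection, e.g. "[0:2:10]" or "[5]".
_BRACKET = re.compile(r"\[([^\]]*)\]")


def parse_hyperslab(expr: str) -> tuple:
    """Split a DAP2 projection like 'sst[0:2:10][5]' into a name and Python slices."""
    name = expr.split("[", 1)[0]
    slices = []
    for group in _BRACKET.findall(expr):
        fields = [int(f) for f in group.split(":")]
        if len(fields) == 1:        # [index]
            start, stride, stop = fields[0], 1, fields[0]
        elif len(fields) == 2:      # [start:stop]
            start, stride, stop = fields[0], 1, fields[1]
        elif len(fields) == 3:      # [start:stride:stop]
            start, stride, stop = fields
        else:
            raise ValueError(f"bad hyperslab [{group}] in {expr!r}")
        # DAP2 stop indices are inclusive; Python slice stops are exclusive.
        slices.append(slice(start, stop + 1, stride))
    return name, tuple(slices)


name, slices = parse_hyperslab("sst[0:2:10][5]")
print(name, slices)  # prints: sst (slice(0, 11, 2), slice(5, 6, 1))
```

A single index is kept as a length-1 slice so the dimension is preserved, as DAP2 does. The resulting tuple could then be applied positionally to the corresponding `xarray` variable so that dask reads only the requested chunks; that wiring is not shown here.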
per: https://twitter.com/rabernat/status/1039209501482778624
NumFOCUS project: Xarray
ESIP member institution: NCAR
Improve conda-forge automation
The ESIP community relies on the `conda-forge` channel for installing the packages they need to enable their workflows, as these packages are often not found in the `defaults` conda channel. Maintaining 5000+ packages is a lot of work, and there are some straightforward improvements that could be made with modest funding.
Conda-forge has always relied on heavy automation to reduce maintenance burdens and keep the stack up to date and stable. There are, however, some remaining problems.
Existing Problems
- Due to a constantly shifting landscape of ABI incompatibilities, conda-forge periodically needs to change the pinnings on critical packages to guarantee interoperability of the entire stack. When this happens, portions of the stack must be rebuilt so that they use the new binaries. Currently, setting up these migrations is a manual process.
- At the heart of conda-forge is a directed graph that describes the dependency relationships among all the packages in the distribution. This graph is used to install packages correctly into environments and to migrate packages to new pinnings and compilers. However, a package recipe can specify these dependencies incorrectly, which leaves the graph inaccurate.
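Because the dependency graph is explicit, the set of packages touched by a pinning change can be computed rather than assembled by hand. The sketch below is a simplified illustration, not conda-forge's actual migration tooling. It assumes the graph is a plain dict mapping each package to the set of packages it requires, and it conservatively treats every package that transitively depends on the re-pinned package as needing a rebuild (a real migration would likely narrow this, e.g. to packages that link against the pinned library). The standard library's `graphlib` (Python 3.9+) then orders the rebuilds so each package is built after its dependencies. The function name and example graph are made up.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def rebuild_order(deps: dict[str, set[str]], pinned: str) -> list[str]:
    """Packages to rebuild after `pinned` changes, each listed after its dependencies."""
    # Invert the graph so we can walk from a package to the packages requiring it.
    dependents: dict[str, set[str]] = {}
    for pkg, reqs in deps.items():
        for req in reqs:
            dependents.setdefault(req, set()).add(pkg)

    # Conservatively mark everything that transitively depends on `pinned`.
    affected: set[str] = set()
    stack = [pinned]
    while stack:
        for pkg in dependents.get(stack.pop(), set()):
            if pkg not in affected:
                affected.add(pkg)
                stack.append(pkg)

    # Keep only edges inside the affected subgraph, then sort topologically.
    subgraph = {pkg: deps.get(pkg, set()) & affected for pkg in affected}
    return list(TopologicalSorter(subgraph).static_order())


# Made-up, heavily simplified dependency graph for illustration only.
example_deps = {
    "numpy": {"python", "libblas"},
    "scipy": {"python", "numpy", "libblas"},
    "pandas": {"python", "numpy"},
    "xarray": {"python", "numpy", "pandas"},
    "requests": {"python"},
}
print(rebuild_order(example_deps, "libblas"))
```

For a `libblas` pin change, `numpy` comes first, `scipy` and `pandas` follow in either order, and `xarray` comes last; `requests` is left out because nothing it requires depends on `libblas`.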
Proposed work:
- Automate the process so that when pinnings change, our stack can be seamlessly rebuilt to take advantage of the newer binaries. This will require inspecting the dependency graph, extracting the portions touched by the newly pinned package, and automatically detecting when a new pinning has been issued.
- Regenerate Python (and maybe R) recipes from the package metadata to get the most accurate dependency list possible. This can be achieved with some small improvements to the conda-skeleton recipe generator.
- Extend Eric Dill's work on https://github.com/ericdill/depfinder to automatically find dependencies for packages in the conda-forge ecosystem. If unreported dependencies are found, they can be submitted as PRs to the recipes, helping to keep the recipes correct. This approach would then be extended beyond pure Python packages, with depfinder analogues for other languages.
Update on NumFOCUS/ESIP?
@abburgess, whatever happened here?
Did some of these get submitted?
Did some of these get funded?
cloud optimized netCDF and zarr
As part of the Pangeo project, we have been exploring the concept of "cloud optimized netCDF", building off of "cloud optimized GeoTIFF". Zarr is an open-source Python library and storage spec "providing an implementation of chunked, compressed, N-dimensional arrays." The spec is simple, clearly documented, and well suited for use in cloud object storage.
Last year, we (@rabernat, myself, and others from the xarray/dask/pangeo projects) wrote an experimental xarray backend for zarr and we have been testing its use on public clouds over the last year. The community is eager to see some formal effort put behind these concepts.
This proposal would do the following:
- Complete a netCDF+zarr spec (zarr-developers/zarr-python#276 got started here)
- Adapt the xarray backend to support any spec changes
- Add missing functionality to the xarray zarr backend such as the ability to append to datasets
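Two pieces of that work can be sketched with plain dictionaries. The first function builds a zarr v2 `.zarray` document plus a `.zattrs` document that records netCDF-style dimension names under `_ARRAY_DIMENSIONS`, the attribute xarray's zarr backend uses for this. The second shows what appending along the first axis involves: growing `shape` and rewriting the partially filled last chunk along with any new ones. Both are hypothetical helpers for illustration; they skip compressors, filters, and group metadata, and they only update metadata rather than moving any data.

```python
import json


def zarr_v2_metadata(shape, chunks, dtype, dims, attrs):
    """Build .zarray and .zattrs documents for one netCDF-style variable."""
    zarray = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,  # NumPy type string, e.g. "<f4" for little-endian float32
        "compressor": None,
        "fill_value": None,
        "order": "C",
        "filters": None,
    }
    # Dimension names live in an ordinary attribute, as xarray's zarr backend does it.
    zattrs = {**attrs, "_ARRAY_DIMENSIONS": list(dims)}
    return zarray, zattrs


def append_along_first_axis(zarray, n_new):
    """Grow axis 0 by n_new and return the axis-0 chunk indices that must be written."""
    old_len, chunk = zarray["shape"][0], zarray["chunks"][0]
    new_len = old_len + n_new
    zarray["shape"][0] = new_len
    # Start at the chunk holding index old_len: if it was partially filled it is
    # rewritten along with the new chunks; if old_len fell on a boundary it is new.
    first = old_len // chunk
    last = (new_len - 1) // chunk
    return list(range(first, last + 1))


zarray, zattrs = zarr_v2_metadata(
    (365, 720, 1440), (30, 360, 360), "<f4", ("time", "lat", "lon"), {"units": "degC"}
)
print(json.dumps(zattrs))
print(append_along_first_axis(zarray, 10))  # prints [12]
```

Appending 10 time steps to 365 only touches chunk 12 along the time axis, which already holds days 360 to 364; everything before it is left untouched.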
Other possible development objectives include:
- cloud-API-specific stores for zarr (e.g. zarr-developers/zarr-python#252)
per: https://twitter.com/rabernat/status/1039210134600396800
NumFOCUS project: Xarray
ESIP member institution: NCAR