Comments (10)
Did you try the rioxarray
engine instead of rasterio
?
from rioxarray.
@dcherian Sorry I realized I should have posted this in rasterio repo. I was confused rasterio and rioxarray! I will try them out and let you know soon, but you may close the issue if you think this doesn't fit here.
from rioxarray.
No! This is the place. Just try engine="rioxarray"
. engine="rasterio"
is bundled with xarray and is about be removed. rioxarray
is a more full-featured version of the same funcitonality
from rioxarray.
Actually, engine="rasterio"
comes from rioxarray ref. It calls rioxarray.open_rasterio
.
from rioxarray.
Oh that's confusing! In that case, i guess it's calling rioxarray
since its installed.
from rioxarray.
So I tried open_rasterio with NSIDC output, but the results are the same.
The issue did not happen with data directly downloaded from NASA Earth data (see updates on pydata/xarray#7621), so I am not sure it's an issue with rioxarray or the NSIDC output.
ds_NSIDC_rioxarray = rioxarray.open_rasterio(os.path.join(input_path, fn_NSIDC_output))
ds_NSIDC_rioxarray = ds_NSIDC_rioxarray.set_coords(["cell_lat", "cell_lon"])
ds_NSIDC_rioxarray
ds_NSIDC_rioxarray.precipitation_total_surface_flux[0].plot(y="cell_lat", x="cell_lon", vmin=-0.001)
from rioxarray.
If you do this:
rds = rioxarray.open_rasterio(os.path.join(input_path, fn_NSIDC_output), group="Geophysical_Data")
or this:
rds = xarray.open_dataset(os.path.join(input_path, fn_NSIDC_output), group="Geophysical_Data", engine="rasterio")
The data plots correctly with rasterio.
EDIT: This does not fix the issue. The plot has the correct orientation. However, the data is missing coordinates.
from rioxarray.
If you do this, the lat/lon data looks correct:
ds_NSIDC_root = xarray.open_dataset(data_path, variable=["cell_lon", "cell_lat"], engine='rasterio')
ds_NSIDC_root.cell_lat.plot()
from rioxarray.
This appears to be a rasterio/GDAL issue:
import rasterio
import netCDF4
import numpy
with netCDF4.Dataset(data_path) as nfh:
nc_data = nfh['Geophysical_Data']['precipitation_total_surface_flux'][:]
with rasterio.open('netcdf:SMAP_L4_SM_gph_20150331T013000_Vv7032_001_HEGOUT.nc:/Geophysical_Data/precipitation_total_surface_flux') as rfh:
rio_data = rfh.read(1, masked=True)
numpy.array_equal(nc_data, rio_data)
False
numpy.array_equal(nc_data, numpy.flip(rio_data, 0))
True
This is not something that can be resolved in rioxarray.
from rioxarray.
Hi,
I'm just found this issue while searching for a related problem. Thank you for documenting it so well.
I'm actually working on a tutorial for NSIDC to demonstrate working with SMAP data. So this is quite timely.
I do not think it is a software/tool problem but how you are trying to read the data. I'm using the original HDF5 file from NSIDC but the result is the same. I'm guessing you used a reprojection service to get the dataset you posted here.
TLDR: I think the problem is that rasterio
does not interpret the coordinate information and treats the data as an image - which is what rasterio
is for. If you want the features offered by rioxarray
you can use:
ds_rioxarray = xr.merge([
xr.open_dataset(filepath, decode_coords="all"),
xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
])
I'm using the following packages
from pathlib import Path
import xarray as xr
import rioxarray
print(xr.__version__)
print(rioxarray.__version__)
2024.2.0
0.15.0
filepath = Path("smap_data/SMAP_L4_SM_gph_20150331T013000_Vv7032_001.h5")
The SMAP datasets you are using are either HDF5 (the original file from NSIDC) or NetCDF4 and follow most of the CF-Conventions, at least for the Level-4 data. So you can use
engine = "netcdf4"
ds = xr.merge([
xr.open_dataset(filepath, engine=engine),
xr.open_dataset(filepath, group='Geophysical_Data', engine=engine)['precipitation_total_surface_flux'],
])
ds
Or
engine="h5netcdf"
ds = xr.merge([
xr.open_dataset(filepath, engine=engine, phony_dims="sort"),
xr.open_dataset(filepath, group='Geophysical_Data', engine=engine,
phony_dims="sort")['precipitation_total_surface_flux'],
])
ds
These both return an xarray dataset like
<xarray.Dataset> Size: 175MB
Dimensions: (phony_dim_2: 1, y: 1624, x: 3856)
Coordinates:
* x (x) float64 31kB -1.736e+07 ... 1.736e+07
* y (y) float64 13kB 7.31e+06 ... -7.31e+06
Dimensions without coordinates: phony_dim_2
Data variables:
EASE2_global_projection (phony_dim_2) |S1 1B ...
cell_column (y, x) float64 50MB ...
cell_lat (y, x) float32 25MB ...
cell_lon (y, x) float32 25MB ...
cell_row (y, x) float64 50MB ...
time (phony_dim_2) datetime64[ns] 8B ...
precipitation_total_surface_flux (y, x) float32 25MB ...
Attributes:
Comment: HDF-5
Contact: http://gmao.gsfc.nasa.gov/
Conventions: CF
Filename: /discover/nobackup/projects/gmao/smap/SMAP_L4/L4_SM/Vv7032/...
History: File written by ldas2daac.x
Institution: NASA Global Modeling and Assimilation Office
References: see SMAP L4_SM Product Specification Documentation
Source: v17.11.1
Title: SMAP L4_SM Geophysical (GPH) Data Granule
Note that the x
and y
coordinates are projected-coordinates for the EASE-Grid v2.0 Global CRS. In your example they are geographic coordinates.
What I think is happening when you use either the rasterio
backend or rioxarray
is that the HDF5 GDAL driver (which is what rasterio
is using under-the-hood) is not able to parse the coordinate information.
engine = "rasterio"
ds_rasterio = xr.open_dataset(filepath, engine=engine)
This returns a bunch of warnings about not being able to determine georeferencing
[/home/apbarret/mambaforge/envs/nsidc-tutorials/lib/python3.9/site-packages/rioxarray/_io.py:1132](http://localhost:8890/home/apbarret/mambaforge/envs/nsidc-tutorials/lib/python3.9/site-packages/rioxarray/_io.py#line=1131): NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
warnings.warn(str(rio_warning.message), type(rio_warning.message)) # type: ignore
Printing x
and y
shows that the data are not georeferenced because the grid cell column and row indices - image coordinates - are returned not the projected coordinates.
print(ds_rasterio.x, rasterio.y)
<xarray.DataArray 'x' (x: 3856)> Size: 31kB
array([5.0000e-01, 1.5000e+00, 2.5000e+00, ..., 3.8535e+03, 3.8545e+03,
3.8555e+03])
Coordinates:
* x (x) float64 31kB 0.5 1.5 ... 3.854e+03 3.856e+03
EASE2_global_projection int64 8B ... <xarray.DataArray 'y' (y: 1624)> Size: 13kB
array([5.0000e-01, 1.5000e+00, 2.5000e+00, ..., 1.6215e+03, 1.6225e+03,
1.6235e+03])
Coordinates:
* y (y) float64 13kB 0.5 1.5 ... 1.622e+03 1.624e+03
EASE2_global_projection int64 8B ...
I suspect that rasterio
is treating the data as an image, which by convention have the origin of image coordinates at the upper-left corner of the upper-left pixel. So pixel[0.5, 0.5]
is the upper-left-most pixel center. However, when you plot the data with the xarray.DataArray.plot
method, the coordinate sytem that pcolormesh
(the default plotting method) uses is a normal cartesian coordinate system with the origin at the lower left, so the image appears upside down.
Plotting cell_lat
shows that this is the case, southern hemisphere grid-cells are plotted at the top of the image.
ds_rasterio.cell_lat.plot()
Using ds_rasterio.cell_lat.squeeze().plot.imshow(origin="upper")
to plot cell lat, which sets the origin to be the upper-left cell, gives the expected result: postive latitudes at the top of the image.
If you want to take advantage of the accessors and methods provided by rioxarray
then you need to import rioxarray
and set decode_coords="all"
. Note: this throws a warning because there is no coordinate information in the Geophysical_Data group and xarray will only look in the group specified. IMO I don't think nested groups are necessary for these kinds of high-level products. Merging the root and precipitation datasets retains the georeferencing from the first call to open_dataset
.
ds_rioxarray = xr.merge([
xr.open_dataset(filepath, decode_coords="all"),
xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
])
print(ds_rioxarray)
<xarray.Dataset> Size: 175MB
Dimensions: (phony_dim_2: 1, y: 1624, x: 3856)
Coordinates:
EASE2_global_projection (phony_dim_2) |S1 1B ...
* x (x) float64 31kB -1.736e+07 ... 1.736e+07
* y (y) float64 13kB 7.31e+06 ... -7.31e+06
Dimensions without coordinates: phony_dim_2
Data variables:
cell_column (y, x) float64 50MB ...
cell_lat (y, x) float32 25MB ...
cell_lon (y, x) float32 25MB ...
cell_row (y, x) float64 50MB ...
time (phony_dim_2) datetime64[ns] 8B ...
precipitation_total_surface_flux (y, x) float32 25MB ...
Attributes:
Source: v17.11.1
Institution: NASA Global Modeling and Assimilation Office
History: File written by ldas2daac.x
Comment: HDF-5
Filename: /discover/nobackup/projects/gmao/smap/SMAP_L4/L4_SM/Vv7032/...
Title: SMAP L4_SM Geophysical (GPH) Data Granule
Conventions: CF
References: see SMAP L4_SM Product Specification Documentation
Contact: http://gmao.gsfc.nasa.gov/
[/tmp/ipykernel_230171/3642432009.py:3](http://localhost:8890/tmp/ipykernel_230171/3642432009.py#line=2): UserWarning: Variable(s) referenced in grid_mapping not in variables: ['EASE2_global_projection']
xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
ds_rioxarray.precipitation_total_surface_flux.plot(vmax=0.001)
from rioxarray.
Related Issues (20)
- rioxarray.open_rasterio(file) doesn't work HOT 1
- Support GCPs without known z coordinate HOT 3
- `spatial_ref` coordinate not accessible after saving dataset HOT 2
- Add a rio.fill.fillnodata-based nodata interpolation in addition to existing scipy-based HOT 1
- write somewhere in the documentation that extentions are not respecting F401
- Rename bands as variables using long_name attribute HOT 9
- typo in docs
- `rio.transform()` does not retrieve exact transform stored in `rio.write_transform()` HOT 2
- Document difference between `set_crs` and `write_crs` HOT 4
- Typo in docs: longitute
- Fail to reproject and reproject_match a dataset with rotation affine. HOT 3
- reproject_match renames dims HOT 2
- Xarray padding with mode='reflect'
- Padding and Croping doesn't end up same result HOT 2
- overview_level failing in xarray with engine='rasterio' due to missing doc? HOT 2
- Rio array merge missing HOT 3
- Delayed/chunked opening (sentinel) SAFE data with bands as variables fails HOT 1
- `reproject_match` raises `MissingSpatialDimensionError` with spatial dims set HOT 1
- Save larger raster with zstd compression writes dirty block HOT 3
- Memory leak when looping through data variables of a dataset loaded from a VRT HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rioxarray.