Giter Club home page Giter Club logo

Comments (10)

dcherian avatar dcherian commented on September 23, 2024

Did you try the rioxarray engine instead of rasterio?

from rioxarray.

RY4GIT avatar RY4GIT commented on September 23, 2024

@dcherian Sorry I realized I should have posted this in rasterio repo. I was confused rasterio and rioxarray! I will try them out and let you know soon, but you may close the issue if you think this doesn't fit here.

from rioxarray.

dcherian avatar dcherian commented on September 23, 2024

No! This is the place. Just try engine="rioxarray". engine="rasterio" is bundled with xarray and is about be removed. rioxarray is a more full-featured version of the same funcitonality

from rioxarray.

snowman2 avatar snowman2 commented on September 23, 2024

Actually, engine="rasterio" comes from rioxarray ref. It calls rioxarray.open_rasterio.

from rioxarray.

dcherian avatar dcherian commented on September 23, 2024

Oh that's confusing! In that case, i guess it's calling rioxarray since its installed.

from rioxarray.

RY4GIT avatar RY4GIT commented on September 23, 2024

So I tried open_rasterio with NSIDC output, but the results are the same.
The issue did not happen with data directly downloaded from NASA Earth data (see updates on pydata/xarray#7621), so I am not sure it's an issue with rioxarray or the NSIDC output.

ds_NSIDC_rioxarray = rioxarray.open_rasterio(os.path.join(input_path, fn_NSIDC_output))
ds_NSIDC_rioxarray = ds_NSIDC_rioxarray.set_coords(["cell_lat", "cell_lon"])
ds_NSIDC_rioxarray

image

ds_NSIDC_rioxarray.precipitation_total_surface_flux[0].plot(y="cell_lat", x="cell_lon", vmin=-0.001)
image

from rioxarray.

snowman2 avatar snowman2 commented on September 23, 2024

If you do this:

rds = rioxarray.open_rasterio(os.path.join(input_path, fn_NSIDC_output), group="Geophysical_Data")

or this:

rds = xarray.open_dataset(os.path.join(input_path, fn_NSIDC_output), group="Geophysical_Data", engine="rasterio")

The data plots correctly with rasterio.

EDIT: This does not fix the issue. The plot has the correct orientation. However, the data is missing coordinates.

from rioxarray.

snowman2 avatar snowman2 commented on September 23, 2024

If you do this, the lat/lon data looks correct:

ds_NSIDC_root  = xarray.open_dataset(data_path, variable=["cell_lon", "cell_lat"], engine='rasterio')
ds_NSIDC_root.cell_lat.plot()

from rioxarray.

snowman2 avatar snowman2 commented on September 23, 2024

This appears to be a rasterio/GDAL issue:

import rasterio
import netCDF4
import numpy

with netCDF4.Dataset(data_path) as nfh:
    nc_data = nfh['Geophysical_Data']['precipitation_total_surface_flux'][:]
with rasterio.open('netcdf:SMAP_L4_SM_gph_20150331T013000_Vv7032_001_HEGOUT.nc:/Geophysical_Data/precipitation_total_surface_flux') as rfh:
    rio_data = rfh.read(1, masked=True)
numpy.array_equal(nc_data, rio_data)
False
numpy.array_equal(nc_data, numpy.flip(rio_data, 0))
True

This is not something that can be resolved in rioxarray.

from rioxarray.

andypbarrett avatar andypbarrett commented on September 23, 2024

Hi,

I'm just found this issue while searching for a related problem. Thank you for documenting it so well.

I'm actually working on a tutorial for NSIDC to demonstrate working with SMAP data. So this is quite timely.

I do not think it is a software/tool problem but how you are trying to read the data. I'm using the original HDF5 file from NSIDC but the result is the same. I'm guessing you used a reprojection service to get the dataset you posted here.

TLDR: I think the problem is that rasterio does not interpret the coordinate information and treats the data as an image - which is what rasterio is for. If you want the features offered by rioxarray you can use:

ds_rioxarray  = xr.merge([
    xr.open_dataset(filepath, decode_coords="all"),
    xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
])

I'm using the following packages

from pathlib import Path

import xarray as xr
import rioxarray

print(xr.__version__)
print(rioxarray.__version__)
2024.2.0
0.15.0
filepath = Path("smap_data/SMAP_L4_SM_gph_20150331T013000_Vv7032_001.h5")

The SMAP datasets you are using are either HDF5 (the original file from NSIDC) or NetCDF4 and follow most of the CF-Conventions, at least for the Level-4 data. So you can use

engine = "netcdf4"
ds = xr.merge([
    xr.open_dataset(filepath, engine=engine),
    xr.open_dataset(filepath, group='Geophysical_Data', engine=engine)['precipitation_total_surface_flux'],
    ])
ds

Or

engine="h5netcdf"
ds = xr.merge([
    xr.open_dataset(filepath, engine=engine, phony_dims="sort"),
    xr.open_dataset(filepath, group='Geophysical_Data', engine=engine, 
                               phony_dims="sort")['precipitation_total_surface_flux'],
    ])
ds

These both return an xarray dataset like

<xarray.Dataset> Size: 175MB
Dimensions:                           (phony_dim_2: 1, y: 1624, x: 3856)
Coordinates:
  * x                                 (x) float64 31kB -1.736e+07 ... 1.736e+07
  * y                                 (y) float64 13kB 7.31e+06 ... -7.31e+06
Dimensions without coordinates: phony_dim_2
Data variables:
    EASE2_global_projection           (phony_dim_2) |S1 1B ...
    cell_column                       (y, x) float64 50MB ...
    cell_lat                          (y, x) float32 25MB ...
    cell_lon                          (y, x) float32 25MB ...
    cell_row                          (y, x) float64 50MB ...
    time                              (phony_dim_2) datetime64[ns] 8B ...
    precipitation_total_surface_flux  (y, x) float32 25MB ...
Attributes:
    Comment:      HDF-5
    Contact:      http://gmao.gsfc.nasa.gov/
    Conventions:  CF
    Filename:     /discover/nobackup/projects/gmao/smap/SMAP_L4/L4_SM/Vv7032/...
    History:      File written by ldas2daac.x
    Institution:  NASA Global Modeling and Assimilation Office
    References:   see SMAP L4_SM Product Specification Documentation
    Source:       v17.11.1
    Title:        SMAP L4_SM Geophysical (GPH) Data Granule

Note that the x and y coordinates are projected-coordinates for the EASE-Grid v2.0 Global CRS. In your example they are geographic coordinates.

What I think is happening when you use either the rasterio backend or rioxarray is that the HDF5 GDAL driver (which is what rasterio is using under-the-hood) is not able to parse the coordinate information.

engine = "rasterio"
ds_rasterio = xr.open_dataset(filepath, engine=engine)

This returns a bunch of warnings about not being able to determine georeferencing

[/home/apbarret/mambaforge/envs/nsidc-tutorials/lib/python3.9/site-packages/rioxarray/_io.py:1132](http://localhost:8890/home/apbarret/mambaforge/envs/nsidc-tutorials/lib/python3.9/site-packages/rioxarray/_io.py#line=1131): NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  warnings.warn(str(rio_warning.message), type(rio_warning.message))  # type: ignore

Printing x and y shows that the data are not georeferenced because the grid cell column and row indices - image coordinates - are returned not the projected coordinates.

print(ds_rasterio.x, rasterio.y)
<xarray.DataArray 'x' (x: 3856)> Size: 31kB
array([5.0000e-01, 1.5000e+00, 2.5000e+00, ..., 3.8535e+03, 3.8545e+03,
       3.8555e+03])
Coordinates:
  * x                        (x) float64 31kB 0.5 1.5 ... 3.854e+03 3.856e+03
    EASE2_global_projection  int64 8B ... <xarray.DataArray 'y' (y: 1624)> Size: 13kB
array([5.0000e-01, 1.5000e+00, 2.5000e+00, ..., 1.6215e+03, 1.6225e+03,
       1.6235e+03])
Coordinates:
  * y                        (y) float64 13kB 0.5 1.5 ... 1.622e+03 1.624e+03
    EASE2_global_projection  int64 8B ...

I suspect that rasterio is treating the data as an image, which by convention have the origin of image coordinates at the upper-left corner of the upper-left pixel. So pixel[0.5, 0.5] is the upper-left-most pixel center. However, when you plot the data with the xarray.DataArray.plot method, the coordinate sytem that pcolormesh (the default plotting method) uses is a normal cartesian coordinate system with the origin at the lower left, so the image appears upside down.

Plotting cell_lat shows that this is the case, southern hemisphere grid-cells are plotted at the top of the image.

ds_rasterio.cell_lat.plot()

smap_h5_cell_lat

Using ds_rasterio.cell_lat.squeeze().plot.imshow(origin="upper") to plot cell lat, which sets the origin to be the upper-left cell, gives the expected result: postive latitudes at the top of the image.

If you want to take advantage of the accessors and methods provided by rioxarray then you need to import rioxarray and set decode_coords="all". Note: this throws a warning because there is no coordinate information in the Geophysical_Data group and xarray will only look in the group specified. IMO I don't think nested groups are necessary for these kinds of high-level products. Merging the root and precipitation datasets retains the georeferencing from the first call to open_dataset.

ds_rioxarray  = xr.merge([
    xr.open_dataset(filepath, decode_coords="all"),
    xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
])

print(ds_rioxarray)
<xarray.Dataset> Size: 175MB
Dimensions:                           (phony_dim_2: 1, y: 1624, x: 3856)
Coordinates:
    EASE2_global_projection           (phony_dim_2) |S1 1B ...
  * x                                 (x) float64 31kB -1.736e+07 ... 1.736e+07
  * y                                 (y) float64 13kB 7.31e+06 ... -7.31e+06
Dimensions without coordinates: phony_dim_2
Data variables:
    cell_column                       (y, x) float64 50MB ...
    cell_lat                          (y, x) float32 25MB ...
    cell_lon                          (y, x) float32 25MB ...
    cell_row                          (y, x) float64 50MB ...
    time                              (phony_dim_2) datetime64[ns] 8B ...
    precipitation_total_surface_flux  (y, x) float32 25MB ...
Attributes:
    Source:       v17.11.1
    Institution:  NASA Global Modeling and Assimilation Office
    History:      File written by ldas2daac.x
    Comment:      HDF-5
    Filename:     /discover/nobackup/projects/gmao/smap/SMAP_L4/L4_SM/Vv7032/...
    Title:        SMAP L4_SM Geophysical (GPH) Data Granule
    Conventions:  CF
    References:   see SMAP L4_SM Product Specification Documentation
    Contact:      http://gmao.gsfc.nasa.gov/
[/tmp/ipykernel_230171/3642432009.py:3](http://localhost:8890/tmp/ipykernel_230171/3642432009.py#line=2): UserWarning: Variable(s) referenced in grid_mapping not in variables: ['EASE2_global_projection']
  xr.open_dataset(filepath, group='Geophysical_Data', decode_coords="all")['precipitation_total_surface_flux'],
ds_rioxarray.precipitation_total_surface_flux.plot(vmax=0.001)

smap_h5_precipitation

from rioxarray.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.