aazuspan / wxee
A Python interface between Earth Engine and xarray for processing time series data
Home Page: https://wxee.readthedocs.io/en/latest/
License: MIT License
Hi! First of all: congrats on this amazing library, it's really useful!
I am trying to download an image collection (DK_collection) that has 36 images per year over 15 years (2004-2020).
I have been running into a "User memory limit" error while trying to do it.
At first, I tried to download the collection in a single call, but it raised a memory error.
For that reason, I have implemented one iteration per year, as you can see in the screenshot.
This solves the memory error, but the performance is not good: the first year takes 3:39 min (fast), and the time increases linearly to 10 min by the 8th year. So, in the end, at least 3 h are needed to download the data.
The image collection is very simple: it consists of only one band, and the resolution is very low: 11 km for a very small area, which makes 24 values for latitude and 35 for longitude. I don't think the images are heavy enough to explain this poor performance... so I am surely missing something!
Any ideas on how to improve this? Thanks in advance.
Best regards,
Amelia
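For what it's worth, once the per-year downloads succeed, the yearly results can be stitched back into one continuous time series locally. A minimal sketch using synthetic stand-ins for the yearly `wx.to_xarray()` outputs (the variable names, sizes, and coordinate values are illustrative, not the user's actual data):

```python
import numpy as np
import xarray as xr

# Stand-ins for the per-year datasets returned by collection.wx.to_xarray():
# 36 time steps per year on the 24x35 grid described above.
years = []
for year in (2004, 2005):
    ds = xr.Dataset(
        {"band": (("time", "y", "x"), np.zeros((36, 24, 35)))},
        coords={"time": np.arange(36) + (year - 2004) * 36},
    )
    years.append(ds)

# Concatenate the yearly chunks along the time dimension.
combined = xr.concat(years, dim="time")
print(combined.sizes["time"])  # 72
```

This keeps each server request small (avoiding the memory limit) while still producing a single dataset at the end.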
Thanks for the great tool!
When I try to download an ASTER or SRTM elevation image clipped by an AOI, the error "EEException: Date: Parameter 'value' is required" is produced.
Code:
aoi_ASTER = ee.Image('NASA/ASTER_GED/AG100_003').select('elevation').clip(aoi)
aoi_ASTER_ds = aoi_ASTER.wx.to_xarray(scale=50, crs='EPSG:6931')
Error:
`---------------------------------------------------------------------------
HttpError Traceback (most recent call last)
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/data.py in _execute_cloud_call(call, num_retries)
333 try:
--> 334 return call.execute(num_retries=num_retries)
335 except googleapiclient.errors.HttpError as e:
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs)
133 logger.warning(message)
--> 134 return wrapped(*args, **kwargs)
135
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/googleapiclient/http.py in execute(self, http, num_retries)
914 if resp.status >= 300:
--> 915 raise HttpError(resp, content, uri=self.uri)
916 return self.postproc(resp, content)
HttpError: <HttpError 400 when requesting https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/value:compute?prettyPrint=false&alt=json returned "Date: Parameter 'value' is required.". Details: "Date: Parameter 'value' is required.">
During handling of the above exception, another exception occurred:
EEException Traceback (most recent call last)
/var/folders/9t/w440ycrj6z94kk6yxbspq8n40000gn/T/ipykernel_56124/1447908745.py in
1 # Get into xarray datasets
----> 2 overuman_ASTER_ds = overuman_ASTER.wx.to_xarray(scale=50, crs='EPSG:4236')
3 overuman_ASTER_ds
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/wxee/image.py in to_xarray(self, path, region, scale, crs, masked, nodata, progress, max_attempts)
80 """
81 with tempfile.TemporaryDirectory(prefix=constants.TMP_PREFIX) as tmp:
---> 82 files = self.to_tif(
83 out_dir=tmp,
84 region=region,
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/wxee/image.py in to_tif(self, out_dir, description, region, scale, crs, file_per_band, masked, nodata, progress, max_attempts)
161 )
162
--> 163 url = self._get_url(region, scale, crs, file_per_band, nodata, max_attempts)
164
165 tifs = self._url_to_tif(
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/wxee/image.py in _get_url(self, region, scale, crs, file_per_band, nodata, max_attempts)
237 url = img.getDownloadURL(
238 params=dict(
--> 239 name=image_id.getInfo(),
240 scale=scale,
241 crs=crs,
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/computedobject.py in getInfo(self)
96 The object can evaluate to anything.
97 """
---> 98 return data.computeValue(self)
99
100 def encode(self, encoder):
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/data.py in computeValue(obj)
670 The result of evaluating that object on the server.
671 """
--> 672 return _execute_cloud_call(
673 _get_cloud_api_resource().projects().value().compute(
674 body={'expression': serializer.encode(obj, for_cloud_api=True)},
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/data.py in _execute_cloud_call(call, num_retries)
334 return call.execute(num_retries=num_retries)
335 except googleapiclient.errors.HttpError as e:
--> 336 raise _translate_cloud_exception(e)
337
338
EEException: Date: Parameter 'value' is required.`
I have a code file that uses wxee to convert an ee image to an xarray array, and it ran successfully on Windows.
But when I run the same piece of code on Windows Subsystem for Linux (WSL) Ubuntu, it crashes.
Example:
import ee
ee.Initialize()
import wxee
wxee.Initialize()
myregion=ee.Geometry.LineString([[-84, 30], [-70, 45], [-70, 45], [-84, 30]])
cfsr=[]
dem = ee.ImageCollection('NOAA/CFSV2/FOR6H').filter(ee.Filter.date('1996-02-14', '1996-02-19')).select(['u-component_of_wind_height_above_ground'])
etc = dem.wx.to_xarray(region=myregion, scale=2000)
print(etc)
The error was
Requesting data: 0%| | 0/20 [00:00<?, ?it/s]malloc(): unsorted double linked list corrupted
Aborted
Again, it ran successfully on Windows, but not on WSL.
This would add a wxee.TimeSeries.timeline() method that plots a timeline showing each image's system:time_start date within the time series. Static plots would be doable using matplotlib, which is already a dependency through rasterio, but an interactive plot where you could see the system:id of each image on hover would be much cooler. Unfortunately, that would probably require adding plotly as a dependency.
Currently, the climatology methods take a reducer and apply it directly to generate monthly aggregates. For example, using climatology_month with ee.Reducer.max() would give you the max of all months in the time period. Instead, it should apply the given reducer to generate monthly statistics and then always apply ee.Reducer.mean() over the months to get, for example, the climatological average max in each month. The docs (and/or names of methods and kwargs) should be updated to make the output of the climatology methods clear. It should also be made clear that if the data is in the same time unit as the climatology, the reducer will have no effect (e.g. a max day-of-year climatology would only work with hourly data; if the data were already daily, the max reducer would have nothing to reduce).
I would like to produce a time series of global daily temperature based on ERA5 data.
To do that, I'd have to apply a Reducer.mean() operation over space rather than time.
The docs only explain how to average over time. Is there any way to compute an average over the x, y dimensions and then download the result?
Obviously, I would like to avoid having to download all the data and compute the average locally.
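For reference, once a dataset has been downloaded (possibly at a coarse scale to keep the request small), the spatial reduction itself is a one-liner in xarray. A sketch with illustrative names and values; `t2m` is a stand-in for the ERA5 temperature band, not something wxee defines:

```python
import numpy as np
import xarray as xr

# A tiny stand-in for a downloaded temperature dataset:
# 3 time steps on a 2x2 grid (values are illustrative).
ds = xr.Dataset(
    {"t2m": (("time", "y", "x"), np.array([
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 5.0], [5.0, 5.0]],
        [[0.0, 0.0], [0.0, 4.0]],
    ]))},
    coords={"time": [0, 1, 2]},
)

# Reduce over the spatial dimensions, keeping the time dimension.
global_mean = ds["t2m"].mean(dim=("y", "x"))
print(global_mean.values)  # [2.5 5.  1. ]
```

Doing the reduction server-side before downloading would instead require mapping a spatial reducer over the collection in Earth Engine.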
Hi Aaron, I am wondering how to call a country using ee.Geometry.Polygon in wxee or is there any other way? Since Google Fusion Tables is not supported any more on Earth Engine, is there a way out to call a country polygon?
Thank you.
The standard multiprocessing library forces a lot of awkward workarounds in wxee, e.g. aliased instance methods, using functools to pass args, zipping args, etc. It slows down development and won't scale well.
There are a variety of non-standard-library packages for parallelizing that may do a better job, like Ray and Dask. I need to decide if there's a better option available and implement it if so.
To avoid adding dependencies (or if a conda-forge recipe is not available), I may use a soft dependency where downloads will run in serial if the package cannot be imported, but the user will be notified and instructed on how to download.
Hi, @aazuspan!
Just wanted to say that I love wxee! I'm using it to combine products from Earth Engine and Planetary Computer, and that's amazing! I'm using it almost every day, but sometimes this error happens:
---------------------------------------------------------------------------
MergeError Traceback (most recent call last)
/tmp/ipykernel_1042/4012842980.py in <module>
1 CLOUD_MASK = PCL_s2cloudless(S2_ee).map(PSL).map(PCSL).map(matchShadows).select("CLOUD_MASK")
----> 2 CLOUD_MASK_xarray = CLOUD_MASK.wx.to_xarray(scale = 20,crs = "EPSG:" + str(S2.epsg.data),region = ee_aoi)
/srv/conda/envs/notebook/lib/python3.8/site-packages/wxee/collection.py in to_xarray(self, path, region, scale, crs, masked, nodata, num_cores, progress, max_attempts)
135 )
136
--> 137 ds = _dataset_from_files(files)
138
139 # Mask the nodata values. This will convert int datasets to float.
/srv/conda/envs/notebook/lib/python3.8/site-packages/wxee/utils.py in _dataset_from_files(files)
120 das = [_dataarray_from_file(file) for file in files]
121
--> 122 return xr.merge(das)
123
124
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs)
898 dict_like_objects.append(obj)
899
--> 900 merge_result = merge_core(
901 dict_like_objects,
902 compat,
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
633
634 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
--> 635 variables, out_indexes = merge_collected(
636 collected, prioritized, compat=compat, combine_attrs=combine_attrs
637 )
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_collected(grouped, prioritized, compat, combine_attrs)
238 variables = [variable for variable, _ in elements_list]
239 try:
--> 240 merged_vars[name] = unique_variable(name, variables, compat)
241 except MergeError:
242 if compat != "minimal":
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
147
148 if not equals:
--> 149 raise MergeError(
150 f"conflicting values for variable {name!r} on objects to be combined. "
151 "You can skip this check by specifying compat='override'."
MergeError: conflicting values for variable 'CLOUD_MASK' on objects to be combined. You can skip this check by specifying compat='override'.
It is weird because it doesn't happen all the time, and most of the time I just have to re-run the code and it works. So I don't know exactly what the problem is.
Anyway, here is the error I got. I was trying to get a cloud mask in GEE and download it as an xarray. I already tried it again and now it works, but, as I said, I don't know why. It also happens with other datasets. I was downloading some Sentinel-2 data (just as it is, without any processing steps) and sometimes it works, but sometimes it doesn't, and I can't reproduce the error because when I re-run it, it usually works.
Ok, that was it!
Thank you!
Currently, the progress bar only displays after URLs have been returned by GEE, but when a large amount of processing is required on the server side, this can lead to a long wait before anything noticeably happens.
To fix this, there are two options:
I'll test the first implementation and fall back to the second if that doesn't work well.
List the ID, number of images, climatology frequency, maybe the climatology period, etc.
This would add a wxee.TimeSeries.climatology_std method: identical to climatology_mean but using ee.Reducer.stdDev.
Add a method (or methods) to wxee.TimeSeries for temporal interpolation to allow filling in missing data, regridding, etc. Usage would look something like below, with different methods (or a kwarg option) for linear, cosine, cubic, and maybe spline implementations. This will require writing some reliable tools for getting images surrounding a target date.
gridmet = ee.ImageCollection("IDAHO_EPSCOR/GRIDMET")
col = gridmet.filterDate("2020-01-01", "2020-01-03")
interp: ee.Image = col.wx.linearInterpolate(ee.Date("2020-01-01T18"))
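The core of a linear implementation is just a pair of time-based weights for the two images bracketing the target date. A sketch with standard-library datetimes; `linear_weights` is a hypothetical helper, not part of wxee:

```python
from datetime import datetime

def linear_weights(t0: datetime, t1: datetime, t: datetime):
    """Interpolation weights for a target time between two image times."""
    total = (t1 - t0).total_seconds()
    w1 = (t - t0).total_seconds() / total
    return 1 - w1, w1

# Weighting 2020-01-01T18 between two daily images timestamped at 00:00:
w0, w1 = linear_weights(
    datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 1, 18)
)
print(w0, w1)  # 0.25 0.75
```

Server-side, the interpolated image would then be something like `img0.multiply(w0).add(img1.multiply(w1))`.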
GEE doesn't natively support seasons, but it should be relatively straightforward to group months into seasons to allow temporal resampling and climatology calculation at the seasonal level.
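One common convention (assumed here, not prescribed by GEE or wxee) is the meteorological DJF/MAM/JJA/SON grouping, which is a simple month-to-label mapping:

```python
# Month-to-season mapping using meteorological seasons (one possible
# convention); wxee could map months to these labels before reducing.
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

def season_of(month: int) -> str:
    """Return the season label for a 1-12 month number."""
    return SEASONS[month]

print(season_of(1), season_of(7))  # DJF JJA
```

The only subtlety is that DJF straddles the year boundary, so a seasonal climatology has to decide which year December belongs to.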
Hi, @aazuspan!
First of all, WOW! Your work with eexarray is amazing, keep it going! 🚀
I was using your dev repo to try to convert an S2 collection to xarray, and it works, but when I compute a spectral index using eemont (which uses ee.Image.expression), it doesn't work:
This works!
import ee, eemont, eexarray
ee.Initialize()
tw = ee.Geometry.Point([10.4522,51.0792])
bf = tw.buffer(500)
xt = bf.bounds()
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
.filterBounds(xt) \
.preprocess() \
.map(lambda x: x.addBands(x.normalizedDifference(["B8","B4"]).rename("NDVI"))) \
.limit(10) \
.map(lambda x: x.clip(xt)) \
.eex.resample_daily(reducer = ee.Reducer.median())
S2eex = S2.eex.to_xarray(scale=10)
This doesn't work (using eemont)
import ee, eemont, eexarray
ee.Initialize()
tw = ee.Geometry.Point([10.4522,51.0792])
bf = tw.buffer(500)
xt = bf.bounds()
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
.filterBounds(xt) \
.preprocess() \
.spectralIndices("NDVI") \
.limit(10) \
.map(lambda x: x.clip(xt)) \
.eex.resample_daily(reducer = ee.Reducer.median())
S2eex = S2.eex.to_xarray(scale=10)
This doesn't work (not using eemont)
import ee, eemont, eexarray
ee.Initialize()
tw = ee.Geometry.Point([10.4522,51.0792])
bf = tw.buffer(500)
xt = bf.bounds()
def addExpressionNDVI(x):
    params = {"N": x.select("B8"), "R": x.select("B4")}
    NDVI = x.expression("(N-R)/(N+R)", params).rename("NDVI")
    return x.addBands(NDVI)
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
.filterBounds(xt) \
.preprocess() \
.map(addExpressionNDVI) \
.limit(10) \
.map(lambda x: x.clip(xt)) \
.eex.resample_daily(reducer = ee.Reducer.median())
S2eex = S2.eex.to_xarray(scale=10)
Error
AttributeError: Can't pickle local object 'Image.expression.<locals>.ReinterpretedFunction'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-37-94ef9caa673d> in <module>
----> 1 S2eex = S2.eex.to_xarray(scale=10)
~/anaconda3/envs/gee/lib/python3.9/site-packages/eexarray/ImageCollection.py in to_xarray(self, path, region, scale, crs, masked, nodata, num_cores, progress, max_attempts)
90 collection = self._rename_by_time()
91
---> 92 files = collection.eex.to_tif(
93 out_dir=tmp,
94 region=region,
~/anaconda3/envs/gee/lib/python3.9/site-packages/eexarray/ImageCollection.py in to_tif(self, out_dir, prefix, region, scale, crs, file_per_band, masked, nodata, num_cores, progress, max_attempts)
198 max_attempts=max_attempts,
199 )
--> 200 tifs = list(
201 tqdm(
202 p.imap(params, imgs),
~/anaconda3/envs/gee/lib/python3.9/site-packages/tqdm/std.py in __iter__(self)
1183
1184 try:
-> 1185 for obj in iterable:
1186 yield obj
1187 # Update and possibly print the progressbar.
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/pool.py in next(self, timeout)
868 if success:
869 return value
--> 870 raise value
871
872 __next__ = next # XXX
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
535 break
536 try:
--> 537 put(task)
538 except Exception as e:
539 job, idx = task[:2]
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/connection.py in send(self, obj)
209 self._check_closed()
210 self._check_writable()
--> 211 self._send_bytes(_ForkingPickler.dumps(obj))
212
213 def recv_bytes(self, maxlength=None):
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/reduction.py in dumps(cls, obj, protocol)
49 def dumps(cls, obj, protocol=None):
50 buf = io.BytesIO()
---> 51 cls(buf, protocol).dump(obj)
52 return buf.getbuffer()
53
AttributeError: Can't pickle local object 'Image.expression.<locals>.ReinterpretedFunction'
Versions
It seems to be something related specifically to that earthengine-api method, but if you can find a workaround, that would be amazing! 🚀
And again, thank you very much for eexarray!
A common wxee issue is that EEException: Date: Parameter 'value' is required is thrown when to_xarray is called on an image without a system:time_start property (#43, #50, #53). I should catch this error and throw something more helpful that explains the cause, suggests a workaround, and links to relevant issues.
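A sketch of what catching and re-raising could look like; `MissingTimeError` and the message text are hypothetical names, not part of the wxee API:

```python
# Translate the cryptic Earth Engine date error into an actionable one.
class MissingTimeError(Exception):
    pass

def rethrow_if_missing_time(exc: Exception) -> None:
    """Re-raise the opaque EE error with an explanation and workaround."""
    if "Date: Parameter 'value' is required" in str(exc):
        raise MissingTimeError(
            "The image is missing a system:time_start property. Set one with "
            "img.set('system:time_start', ee.Date('2000-01-01').millis()) "
            "before calling to_xarray. See issues #43, #50, and #53."
        ) from exc
    raise exc
```

Wrapping the `getInfo()` call site in a try/except that funnels exceptions through this helper would cover all three reported cases.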
Several dependencies are relatively niche and would make more sense as optional:
plotly is only used in TimeSeries.timeline
netcdf4 is only used when writing NetCDFs
The current download system is pretty solid with automated retrying, but the cdsapi package has a more extensive system that should improve download stability. See their implementation for reference.
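A more extensive retry system usually amounts to exponential backoff around the download call. A minimal sketch; the function name, signature, and delays are illustrative, not wxee's or cdsapi's actual API:

```python
import time

def download_with_retries(fetch, max_attempts=3, base_delay=0.01):
    """Retry a flaky download callable with exponential backoff (sketch)."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except IOError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(base_delay * 2 ** attempt)

# Simulate a download that fails twice before succeeding.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("connection reset by peer")
    return "ok"

result = download_with_retries(flaky)
print(result, len(attempts))  # ok 3
```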
Does it not work in Python versions < 3.7?
I was trying to download a median image to xarray and encountered this error below. I understand that we need time series image collections, but wonder if there is a workaround for ee.Image?
Thanks,
Daniel
EEException: Date: Parameter 'value' is required.
I did the same with Sentinel-1 GRD scenes; the issue is that some values are just converted to NaN.
So I am getting most of the backscatter values as NaN. Why does this happen?
Originally posted by @ashishgitbisht in #46 (comment)
Make sure valid options are listed and handled in only one location. Probably implement similarly to scikit-learn's scorer system, with a submodule for storing and retrieving valid options. Rather than hardcoding valid options in the error messages, print them programmatically from the valid list.
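The idea can be sketched as a single registry dict with a lookup helper that builds its error message from the registry, so the valid options are never duplicated. All names here are hypothetical, not wxee internals:

```python
# Single source of truth for valid frequency options (illustrative values).
FREQUENCIES = {"year": 1, "month": 2, "week": 3, "day": 4, "hour": 5}

def get_frequency(name: str) -> int:
    """Look up a frequency, listing valid options programmatically on error."""
    try:
        return FREQUENCIES[name]
    except KeyError:
        valid = ", ".join(sorted(FREQUENCIES))
        raise ValueError(f"Unknown frequency '{name}'. Choose from: {valid}.")

print(get_frequency("month"))  # 2
```

Adding a new option then requires touching only the registry; every error message stays current automatically.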
Any suggestions or contributions from wxee users are welcome! :)
It looks like there's a bug in the temp directory handling for 0.4.0
that only seems to affect Windows (and therefore not the CI workflow). It's also possible this was introduced with Windows 11, since I upgraded recently.
This would be handled automatically by resolving #19, so if this isn't a quick fix I may need to wait for that.
import wxee
import ee
wxee.Initialize()
img = ee.ImageCollection("IDAHO_EPSCOR/GRIDMET").first()
img.wx.to_xarray(scale=100_000)
Raises:
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:627](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:627), in _rmtree_unsafe(path, onerror)
626 try:
--> 627 os.unlink(fullname)
628 except OSError:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\az\\AppData\\Local\\Temp\\wxee_tmpstdzug4g\\IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif'
During handling of the above exception, another exception occurred:
PermissionError Traceback (most recent call last)
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:805](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:805), in TemporaryDirectory._rmtree..onerror(func, path, exc_info)
804 try:
--> 805 _os.unlink(path)
806 # PermissionError is raised on FreeBSD for directories
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\az\\AppData\\Local\\Temp\\wxee_tmpstdzug4g\\IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif'
During handling of the above exception, another exception occurred:
NotADirectoryError Traceback (most recent call last)
[c:\Users\az\wxee\ee_computepixels.ipynb](file:///C:/Users/az/wxee/ee_computepixels.ipynb) Cell 9 in ()
[4](vscode-notebook-cell:/c%3A/Users/az/wxee/ee_computepixels.ipynb#X11sZmlsZQ%3D%3D?line=3) wxee.Initialize()
[6](vscode-notebook-cell:/c%3A/Users/az/wxee/ee_computepixels.ipynb#X11sZmlsZQ%3D%3D?line=5) img = ee.ImageCollection("IDAHO_EPSCOR/GRIDMET").first()
----> [7](vscode-notebook-cell:/c%3A/Users/az/wxee/ee_computepixels.ipynb#X11sZmlsZQ%3D%3D?line=6) img.wx.to_xarray(scale=100_000)
File [c:\Users\az\wxee\wxee\image.py:90](file:///C:/Users/az/wxee/wxee/image.py:90), in Image.to_xarray(self, path, region, scale, crs, masked, nodata, progress, max_attempts)
77 with tempfile.TemporaryDirectory(prefix=constants.TMP_PREFIX) as tmp:
78 files = self.to_tif(
79 out_dir=tmp,
80 region=region,
(...)
87 progress=progress,
88 )
---> 90 ds = _dataset_from_files(files, masked, nodata)
92 if path:
93 msg = (
94 "The path argument is deprecated and will be removed in a future "
95 "release. Use the `xarray.Dataset.to_netcdf` method instead."
96 )
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:830](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:830), in TemporaryDirectory.__exit__(self, exc, value, tb)
829 def __exit__(self, exc, value, tb):
--> 830 self.cleanup()
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:834](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:834), in TemporaryDirectory.cleanup(self)
832 def cleanup(self):
833 if self._finalizer.detach():
--> 834 self._rmtree(self.name)
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:816](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:816), in TemporaryDirectory._rmtree(cls, name)
813 else:
814 raise
--> 816 _shutil.rmtree(name, onerror=onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:759](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:759), in rmtree(path, ignore_errors, onerror)
757 # can't continue even if onerror hook returns
758 return
--> 759 return _rmtree_unsafe(path, onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:629](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:629), in _rmtree_unsafe(path, onerror)
627 os.unlink(fullname)
628 except OSError:
--> 629 onerror(os.unlink, fullname, sys.exc_info())
630 try:
631 os.rmdir(path)
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:808](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:808), in TemporaryDirectory._rmtree..onerror(func, path, exc_info)
806 # PermissionError is raised on FreeBSD for directories
807 except (IsADirectoryError, PermissionError):
--> 808 cls._rmtree(path)
809 except FileNotFoundError:
810 pass
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:816](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:816), in TemporaryDirectory._rmtree(cls, name)
813 else:
814 raise
--> 816 _shutil.rmtree(name, onerror=onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:759](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:759), in rmtree(path, ignore_errors, onerror)
757 # can't continue even if onerror hook returns
758 return
--> 759 return _rmtree_unsafe(path, onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:610](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:610), in _rmtree_unsafe(path, onerror)
608 entries = list(scandir_it)
609 except OSError:
--> 610 onerror(os.scandir, path, sys.exc_info())
611 entries = []
612 for entry in entries:
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:607](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:607), in _rmtree_unsafe(path, onerror)
605 def _rmtree_unsafe(path, onerror):
606 try:
--> 607 with os.scandir(path) as scandir_it:
608 entries = list(scandir_it)
609 except OSError:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\az\\AppData\\Local\\Temp\\wxee_tmpstdzug4g\\IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif'
Hi, thanks for the great library. I ran into the following error:
collection_name = "MODIS/061/MOD13A2"
collection = ee.ImageCollection(collection_name) \
    .filterDate('2019-11-01', '2019-12-31') \
    .filterBounds(roi)
collection.wx.to_xarray()
EEException: Image.clipToBoundsAndScale: The geometry for image clipping must be bounded.
This error also shows up when I remove the bounds filter.
This would add a wxee.TimeSeries.climatology_anomaly method. The method would take a frequency, whether or not to standardize, and climatological mean and std TimeSeries objects. Users would run wxee.TimeSeries.climatology_mean and wxee.TimeSeries.climatology_std to generate those inputs and then pass those and another TimeSeries to calculate anomalies from. Something like:
ts = wxee.TimeSeries("IDAHO_EPSCOR/GRIDMET").filterDate("1981", "2011").select("pr")
mean = ts.climatology_mean("month")
std = ts.climatology_std("month")
anom = ts.climatology_anomaly("month", mean, std, standardize=True)
Need to decide whether I want tests that pull data from the server and take forever to run or quick tests that ensure things run but don't validate any results.
Currently, running climatology_dayofyear groups days by Julian date. In a leap year, all days after February 29 are pushed back one Julian day, so the climatological day-of-year 365 would represent December 31 in non-leap years and December 30 in leap years, for example. Day 366 would always represent December 31, but would be aggregated from 1/4 as many days as other days of the year.
Tools like Ferret handle this by re-gridding all years into 365 steps regardless of leap days (Reference 1, Reference 2).
Regridding may not be a practical solution in GEE, but it should be considered. If the current solution is kept, the docs should be updated to make that distinction clear.
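The misalignment is easy to demonstrate with the standard library: every date after February 29 shifts by one day of year in a leap year.

```python
from datetime import date

# December 31 is Julian day 365 in a common year but 366 in a leap year.
doy_common = date(2019, 12, 31).timetuple().tm_yday  # 365
doy_leap = date(2020, 12, 31).timetuple().tm_yday    # 366

# March 1 (and everything after it) shifts by one day in a leap year.
mar1_common = date(2019, 3, 1).timetuple().tm_yday   # 60
mar1_leap = date(2020, 3, 1).timetuple().tm_yday     # 61

print(doy_common, doy_leap, mar1_common, mar1_leap)
```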
xr.open_rasterio() doesn't work with the latest xarray package (encountered in wx.to_xarray(...)).
This can be solved easily by adding rioxarray to utils.py (or pinning the xarray version).
Use the nbsphinx prolog feature to automatically add Github, Binder, and Colab links to all example notebooks. See sankee implementation
Once that's done, remove the manually added links from notebooks.
Automated requests to Earth Engine (everything made by wxee) should be made through the high-volume endpoint. This feature would add a wxee.Initialize function to initialize with that endpoint.
Examples should be updated to use wxee.Initialize instead of ee.Initialize, and an explanation should be added to the docs.
Dear Aaron Zuspan,
Thank you very much for this wonderful package.
I have a shapefile with 64 points in my assets, and also locally as GeoJSON. I tried following your instructions here (#28) to download Sentinel-2 bands to xarray for those specific 64 points. But the total number of points depends on the scale and region, differing in number and location from those 64.
Is there any way to download those specific points to xarray?
Thanks in advance.
Walter Pereira
This would add two methods allowing ee.ImageCollection and its subclass objects to be exported to Drive and then imported into an xarray.Dataset. Dimensions and coordinates would be stored in filenames and parsed on import. This feature would allow users to handle time series data when the file size or grid size is too large or computations time out.
Planned usage reference:
ts = wxee.TimeSeries("IDAHO_EPSCOR/GRIDMET").filterDate("2020", "2021")
task = ts.wx.to_drive(crs="EPSG:5070", scale=4_000)
# Once files are exported, user manually downloads them to a local folder
data_dir = "data"
ds = wxee.load_dataset(data_dir)
Drive exporting will be very similar to the wxee.image._get_url method, but will instead run and return a batch export task. All of the importing functionality is already implemented in the private wxee.utils._dataset_from_files, so that portion should be simple.
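Parsing coordinates back out of filenames is the only new piece. A sketch assuming the `<id>.time.<YYYYmmddTHHMMSS>.<band>.tif` convention visible in wxee's temporary downloads elsewhere in this tracker; the helper name is hypothetical:

```python
from datetime import datetime

def parse_time(filename: str) -> datetime:
    """Recover the time coordinate embedded in an exported filename."""
    stamp = filename.split(".time.")[1].split(".")[0]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S")

name = "IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif"
print(parse_time(name))  # 1979-01-01 06:00:00
```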
Dear Aaron,
I would like to transform hourly precipitation from GPM to daily. As GPM precipitation images (mm/h) come every 30 minutes, I have to divide by 2. But my code raised the error "Date: Parameter 'value' is required". Could you help me? See the example of my code below:
# use .filter() to select only the Minas Gerais (MG) area
regiao = ee.FeatureCollection('FAO/GAUL/2015/level1').filter(ee.Filter.eq('ADM1_NAME', 'Minas Gerais'))
# load the data
gpm = ee.ImageCollection('NASA/GPM_L3/IMERG_V06') \
    .select('precipitationCal') \
    .filterDate('2021-01-01', '2021-03-11') \
    .filterBounds(regiao)
gpm = gpm.map(lambda img: img.multiply(0.5))
ts = gpm.wx.to_time_series()
daily = ts.aggregate_time(frequency='day', reducer=ee.Reducer.sum())
ds = daily.wx.to_xarray(region=regiao.geometry(), scale=7000)
Thank you very much.
Best Regards,
Enrique
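For reference, the unit conversion in the code above (half-hourly rates in mm/h, each contributing rate × 0.5 h of accumulation) can be checked locally with pandas on illustrative numbers:

```python
import pandas as pd

# Four half-hourly rates in mm/h over two hours (illustrative values).
rates = pd.Series(
    [2.0, 2.0, 4.0, 0.0],
    index=pd.date_range("2021-01-01 00:00", periods=4, freq="30min"),
)

# Each 30-minute value contributes rate * 0.5 h, hence the multiply(0.5);
# summing by day gives the daily accumulation in mm.
daily_mm = (rates * 0.5).resample("D").sum()
print(daily_mm.iloc[0])  # 4.0
```

The error itself is unrelated to the conversion; it is the missing system:time_start issue discussed elsewhere in this tracker.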
Any parallel operations (specifically wxee.TimeSeries.wx.to_xarray()) will fail and may crash Python in a fresh install. On Linux, the issue causes an immediate crash and a "segmentation fault" message. On Windows, it throws an SSL error, usually after downloading several images, or Python crashes silently. This happens on a clean install of wxee from conda-forge but has not happened in my development environment, so it is probably a package version or missing dependency issue.
Setting num_cores to 1 (which disables multiprocessing) seems to resolve the issue but slows down downloads.
Hi aazuspan, thank you for providing this package, it is very helpful!
I'm having problems getting images from the Landsat-8 ImageCollection and converting them to an array due to the size limit. I selected only a one-month period and two bands, but I'm getting an error message unless I set the scale to 250 m or higher (I want 30 m). Do you know if there is a way to solve this, or is this size a GEE or code restriction? Thank you.
Error: "ee.ee_exception.EEException: Total request size (238694952 bytes) must be less than or equal to 50331648 bytes."
My code
import ee
import wxee
from geetools import tools
wxee.Initialize()
ee.Initialize()
# Using CONUS C2 ARD tile 2613 tile (assuming there is a better way to import the grid)
aoi = ee.Geometry.Polygon([[[-81.5595250019701439, 32.8743922803664361], [-79.6900076077309478, 32.8743922803664361], [-79.6900076077309478, 34.4158935126628762], [-81.5595250019701439, 34.4158935126628762], [-81.5595250019701439,32.8743922803664361]]])
# Define image collection (here we are using Landsat-8 surface reflectance)
L8= ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
# Filtering date, study area and selecting bands (in case of not using all of them)
collection = L8.filterDate("2020-07-05", "2020-08-11").filterBounds(aoi).select("B5", "B4")
# The coordinate reference system to use
crs = "EPSG:4326"
# Spatial resolution in CRS units (meters)
## PS: I set 250 otherwise is too big "e.ee_exception.EEException: Total request size (238694952 bytes) must be less than or equal to 50331648 bytes."
scale = 250
arr = collection.wx.to_xarray(scale=scale, crs=crs, region=aoi)
arr
path = "ARD2613_20200705_20200811_B4_B5.nc"
arr = collection.wx.to_xarray(path=path, scale=scale, crs=crs)
Hi, I have a Landsat time-series in epsg:4326 downloaded from the google earth engine that I am trying to convert to xarray.
The area covers the entire Las Vegas. Using ds = landsat_ts.wx.to_xarray() resulted in a ds with coarse scale of 1 decimal degree.
My question is how to define scale and crs parameters in the wx.to_xarray() function to get the raw Landsat's resolution of 30m?
Thanks,
Daniel
Attributes:
    transform: (1.0, 0.0, -116.0, 0.0, -1.0, 37.0)
    crs: +init=epsg:4326
    res: (1.0, 1.0)
    is_tiled: 1
    nodatavals: (-32768.0,)
    scales: (1.0,)
    offsets: (0.0,)
    AREA_OR_POINT: Area
    TIFFTAG_RESOLUTIONUNIT: 1 (unitless)
    TIFFTAG_XRESOLUTION: 1
    TIFFTAG_YRESOLUTION: 1
The 3.7 build is failing because of an incompatibility between xarray and the new importlib-metadata release (see pydata/xarray#7149).
xarray dropped support for 3.7 quite a while ago, so it's time wxee does too.
Hi, I am wondering if wxee could convert half-hourly / 3-hourly data to daily / monthly data for the following data sets:
Thank you.
The rgb method for plotting xarray objects lets users override most of the default arguments, but col is always set to time.
Line 100 in f171265
We should allow col to be set by the user to allow plotting non-time-series data.
Add a wxee.TimeSeries.smooth_time method that applies pixel-wise temporal smoothing to a time series. This could be implemented as wxee.TimeSeries.rolling_time rather than as a "type" of smooth_time.
Temporal smoothing (#31) requires gap-filled time series images. This feature would be a wxee.TimeSeries.fill_gaps method that uses a selectable method of interpolation (nearest neighbor, linear, or cubic) to unmask each image in the time series. Either drop images that don't have enough neighbors or just fall back to using nearest neighbor...
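On the EE side these would be built from reducers and joins; conceptually, per-pixel gap filling and rolling smoothing look like the following NumPy sketch on a single pixel's series (the function names mirror the proposed API but are purely illustrative):

```python
import numpy as np

def fill_gaps_linear(values):
    """Linearly interpolate NaN gaps in a 1D temporal series (one pixel)."""
    values = np.asarray(values, dtype=float)
    t = np.arange(values.size)
    mask = np.isnan(values)
    filled = values.copy()
    filled[mask] = np.interp(t[mask], t[~mask], values[~mask])
    return filled

def smooth_time(values, window=3):
    """Centered moving-average smoothing of a gap-free 1D series."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")

series = [1.0, np.nan, 3.0, 4.0, np.nan, 6.0]
filled = fill_gaps_linear(series)
print(filled)  # [1. 2. 3. 4. 5. 6.]
smoothed = smooth_time(filled)
```

Gap filling first, smoothing second matters: convolving over NaNs would propagate them through the whole window.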
Passing a path to ee.ImageCollection.wx.to_xarray to automatically save to NetCDF was deprecated in 0.4.0 in favor of using the to_netcdf method (just as easy and much more flexible). However, there are still references to NetCDF export support in the documentation and tutorials that use the deprecated parameter. Those should be removed.
Some collections (at least Landsat 7) are missing system:time_end properties for most images, even though they have system:time_start properties. This causes dataframe to fail because the fields used to initialize the pd.DataFrame are of different lengths. I'll probably drop system:time_end from dataframe since it's rarely useful and filling missing values is not supported by aggregate_array.
pt = ee.Geometry.Point([-121.690476, 45.432933])
(wxee.TimeSeries("LANDSAT/LE07/C01/T1_SR")
.filterBounds(pt).timeline()
)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-af49cab8b808> in <module>
1 pt = ee.Geometry.Point([-121.690476, 45.432933])
2
----> 3 (wxee.TimeSeries("LANDSAT/LE07/C01/T1_SR")
4 .filterBounds(pt).timeline()
5 )
~\anaconda3\envs\gee\lib\site-packages\wxee\time_series.py in timeline(self)
151 A Plotly graph object interactive plot showing the acquisition time of each image in the time series.
152 """
--> 153 df = self.dataframe()
154 df["y"] = 0
155
~\anaconda3\envs\gee\lib\site-packages\wxee\time_series.py in dataframe(self)
139 ends = [_millis_to_datetime(ms) for ms in ends_millis]
140
--> 141 df = pd.DataFrame({"id": ids, "time_start": starts, "time_end": ends})
142 df.index.id = collection_id
143 return df
~\anaconda3\envs\gee\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
527
528 elif isinstance(data, dict):
--> 529 mgr = init_dict(data, index, columns, dtype=dtype)
530 elif isinstance(data, ma.MaskedArray):
531 import numpy.ma.mrecords as mrecords
~\anaconda3\envs\gee\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
285 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
286 ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
288
289
~\anaconda3\envs\gee\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
78 # figure out the index, if necessary
79 if index is None:
---> 80 index = extract_index(arrays)
81 else:
82 index = ensure_index(index)
~\anaconda3\envs\gee\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
399 lengths = list(set(raw_lengths))
400 if len(lengths) > 1:
--> 401 raise ValueError("arrays must all be same length")
402
403 if have_dicts:
ValueError: arrays must all be same length
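An alternative to dropping system:time_end entirely would be padding the short array before building the frame. A hypothetical sketch (not the wxee implementation; the helper name is made up):

```python
import pandas as pd

def safe_dataframe(ids, starts, ends):
    """Build the dataframe even when some images lack system:time_end.

    Missing end times are padded with pd.NaT instead of letting
    pd.DataFrame raise "arrays must all be same length".
    """
    ends = list(ends) + [pd.NaT] * (len(ids) - len(ends))
    return pd.DataFrame({"id": ids, "time_start": starts, "time_end": ends})

df = safe_dataframe(
    ids=["img1", "img2", "img3"],
    starts=["2020-01-01", "2020-01-17", "2020-02-02"],
    ends=["2020-01-01"],  # Landsat 7 style: most time_end values missing
)
print(df["time_end"].isna().sum())  # 2 missing ends padded with NaT
```

The caveat noted above still applies: because aggregate_array silently drops missing values, there's no way to know *which* images lacked time_end, so positional padding like this can misalign rows; dropping the column is the safer fix.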
geedim is a Python package that supports downloading EE images with automatic tiling to bypass file size limits. I've been wanting to improve the download system in wxee for a while (see #19), and using geedim might be a good way to do that, with the added bonus of removing most of the low-level thread and tempfile management that causes a lot of headaches. Ideally, I would replace the entire image downloading system with geedim, both for to_tif and for to_xarray.
It will be quite a bit of work just to figure out how feasible this is, so I'm going to start keeping track of and checking off potential incompatibilities below as I figure them out.
- geedim uses threads to download tiles of large images, whereas wxee uses threads to download images within collections. I'll need to figure out the feasibility of parallelizing on both dimensions, or else download speed would tank on large collections of small images, which is the primary focus of wxee.
- geedim tracks progress of image tiles, whereas I need to track progress of images in collections (or both would be fine). I give separate progress bars for retrieving data (requesting the download URLs) and the download itself, because the URL request can take a lot of time, and I don't think this will be possible with geedim.
- geedim supports file outputs, but tempfiles are typically what you want when converting to xarray. I don't want to have to manage files manually, so I'll need to think more about how this will work. Maybe just create temp directories and download into them?
- geedim automatically sets filePerBand=False for all downloads. I'll need to do some rewriting to load xarray objects from multi-band images, but that may improve performance on the IO side by reading/writing fewer files.
- wxee takes a nodata argument and replaces masked values with that; after downloading, it sets that value in the image metadata or xarray.Dataset. geedim takes a different approach of adding a "FILL_MASK" band to the image before downloading. The advantage of the geedim approach is that you don't need to choose between exporting everything as a float or risking assigning nodata to real values, but it does require downloading more data from EE, and once you actually get the image into xarray and mask it, there's no advantage, since xarray will promote everything to float64 anyway to accommodate NaN values. I'll probably live with the geedim approach by applying and removing the mask band after downloading, but I should do some experiments to see how that affects performance (and to make sure I'm fully understanding the geedim approach).
- The geedim.MaskedImage class exposes and caches EE properties, so building filenames from metadata is straightforward. The only consideration is that we need to persist that MaskedImage instance throughout the download process to avoid having to retrieve properties multiple times.

Hi, though wxee serves the purpose of subsetting using a rectangular grid or a country, I am wondering if there is any provision for subsetting using a shapefile of an area, like the way we can run on the GEE platform using an asset in the form of a shapefile.
Thanking you.
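On the nested-parallelism question in the geedim notes above: stacking two thread pools (images on the outer level, tiles on the inner) is one way to parallelize both dimensions. A toy sketch with stand-in download functions (no EE or geedim calls; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def download_tile(image_id, tile_id):
    """Stand-in for fetching one tile of one image."""
    return f"{image_id}/tile{tile_id}"

def download_image(image_id, n_tiles, tile_workers=4):
    # Inner pool: geedim-style parallelism over tiles of one image
    with ThreadPoolExecutor(max_workers=tile_workers) as pool:
        return list(pool.map(lambda t: download_tile(image_id, t), range(n_tiles)))

def download_collection(image_ids, n_tiles=2, image_workers=8):
    # Outer pool: wxee-style parallelism over images in a collection
    with ThreadPoolExecutor(max_workers=image_workers) as pool:
        return list(pool.map(lambda i: download_image(i, n_tiles), image_ids))

results = download_collection([f"img{i}" for i in range(3)])
print(len(results), len(results[0]))  # 3 images, 2 tiles each
```

For large collections of small images, the outer pool dominates throughput, so the inner pool's worker count could be kept small (or tied to image size) to avoid oversubscribing threads.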
Turning time series into RGB plots is currently a headache (see the MODIS example notebook). This feature would add an rgb method for visualizing multispectral data with color composites, probably extended to xarray.Dataset using a wx accessor, possibly via hvplot.
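Making col user-overridable (per the issue above) is just a matter of merging user kwargs over the defaults instead of hard-coding them. A hypothetical sketch, not the actual rgb implementation:

```python
def rgb(ds, **kwargs):
    """Stand-in plotting wrapper: user kwargs override the defaults."""
    defaults = {"col": "time", "robust": True}
    # A default only applies when the user hasn't set that argument
    options = {**defaults, **kwargs}
    return options  # a real implementation would pass these to the plotter

print(rgb(None)["col"])              # falls back to "time"
print(rgb(None, col="band")["col"])  # user override wins
```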