aazuspan / wxee
A Python interface between Earth Engine and xarray for processing time series data
Home Page: https://wxee.readthedocs.io/en/latest/
License: MIT License
Hi! First of all: congrats on this amazing library, it's really useful!
I am trying to download an image collection (DK_collection) that has 36 images per year over 15 years (2004-2020).
I have been running into a "User memory limit" error while trying to do it.
At first, I tried to download the collection in a single call, but it raised a memory error.
For that reason, I have implemented one iteration per year, as you can see in the screenshot.
This solves the memory error, but the performance is not good: the first year takes 3:39 min (fast), and the time increases linearly to 10 min by the 8th year. So, in the end, at least 3 h are needed to download the data.
The image collection is very simple: it consists of only one band, and the resolution is very low: 11 km for a very small area, which makes 24 values for latitude and 35 for longitude. I don't think the images are heavy enough to explain this poor performance... so I am surely missing something!
Any ideas on how to improve this? Thanks in advance.
Best regards,
Amelia
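For what it's worth, once the per-year downloads succeed, the yearly results can be stitched back into one continuous time series locally. A minimal sketch using synthetic stand-ins for the yearly `wx.to_xarray()` outputs (the variable names, sizes, and coordinate values are illustrative, not the user's actual data):

```python
import numpy as np
import xarray as xr

# Stand-ins for the per-year datasets returned by collection.wx.to_xarray():
# 36 time steps per year on the 24x35 grid described above.
years = []
for year in (2004, 2005):
    ds = xr.Dataset(
        {"band": (("time", "y", "x"), np.zeros((36, 24, 35)))},
        coords={"time": np.arange(36) + (year - 2004) * 36},
    )
    years.append(ds)

# Concatenate the yearly chunks along the time dimension.
combined = xr.concat(years, dim="time")
print(combined.sizes["time"])  # 72
```

This keeps each server request small (avoiding the memory limit) while still producing a single dataset at the end.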
Thanks for the great tool!
When I try to download an ASTER or SRTM elevation image clipped by an AOI, the error "EEException: Date: Parameter 'value' is required" is produced.
Code:
aoi_ASTER = ee.Image('NASA/ASTER_GED/AG100_003').select('elevation').clip(aoi)
aoi_ASTER_ds = aoi_ASTER.wx.to_xarray(scale=50, crs='EPSG:6931')
Error:
`---------------------------------------------------------------------------
HttpError Traceback (most recent call last)
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/data.py in _execute_cloud_call(call, num_retries)
333 try:
--> 334 return call.execute(num_retries=num_retries)
335 except googleapiclient.errors.HttpError as e:
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs)
133 logger.warning(message)
--> 134 return wrapped(*args, **kwargs)
135
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/googleapiclient/http.py in execute(self, http, num_retries)
914 if resp.status >= 300:
--> 915 raise HttpError(resp, content, uri=self.uri)
916 return self.postproc(resp, content)
HttpError: <HttpError 400 when requesting https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/value:compute?prettyPrint=false&alt=json returned "Date: Parameter 'value' is required.". Details: "Date: Parameter 'value' is required.">
During handling of the above exception, another exception occurred:
EEException Traceback (most recent call last)
/var/folders/9t/w440ycrj6z94kk6yxbspq8n40000gn/T/ipykernel_56124/1447908745.py in
1 # Get into xarray datasets
----> 2 overuman_ASTER_ds = overuman_ASTER.wx.to_xarray(scale=50, crs='EPSG:4236')
3 overuman_ASTER_ds
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/wxee/image.py in to_xarray(self, path, region, scale, crs, masked, nodata, progress, max_attempts)
80 """
81 with tempfile.TemporaryDirectory(prefix=constants.TMP_PREFIX) as tmp:
---> 82 files = self.to_tif(
83 out_dir=tmp,
84 region=region,
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/wxee/image.py in to_tif(self, out_dir, description, region, scale, crs, file_per_band, masked, nodata, progress, max_attempts)
161 )
162
--> 163 url = self._get_url(region, scale, crs, file_per_band, nodata, max_attempts)
164
165 tifs = self._url_to_tif(
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/wxee/image.py in _get_url(self, region, scale, crs, file_per_band, nodata, max_attempts)
237 url = img.getDownloadURL(
238 params=dict(
--> 239 name=image_id.getInfo(),
240 scale=scale,
241 crs=crs,
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/computedobject.py in getInfo(self)
96 The object can evaluate to anything.
97 """
---> 98 return data.computeValue(self)
99
100 def encode(self, encoder):
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/data.py in computeValue(obj)
670 The result of evaluating that object on the server.
671 """
--> 672 return _execute_cloud_call(
673 _get_cloud_api_resource().projects().value().compute(
674 body={'expression': serializer.encode(obj, for_cloud_api=True)},
~/opt/miniconda3/envs/SnowSat/lib/python3.9/site-packages/ee/data.py in _execute_cloud_call(call, num_retries)
334 return call.execute(num_retries=num_retries)
335 except googleapiclient.errors.HttpError as e:
--> 336 raise _translate_cloud_exception(e)
337
338
EEException: Date: Parameter 'value' is required.`
I have a code file that uses wxee to convert an ee image to an xarray array, and it ran successfully on Windows.
But when I run the same piece of code on Windows Subsystem for Linux (WSL) Ubuntu, it crashes.
Example:
import ee
ee.Initialize()
import wxee
wxee.Initialize()
myregion=ee.Geometry.LineString([[-84, 30], [-70, 45], [-70, 45], [-84, 30]])
cfsr=[]
dem = ee.ImageCollection('NOAA/CFSV2/FOR6H').filter(ee.Filter.date('1996-02-14', '1996-02-19')).select(['u-component_of_wind_height_above_ground'])
etc = dem.wx.to_xarray(region=myregion, scale=2000)
print(etc)
The error was
Requesting data: 0%| | 0/20 [00:00<?, ?it/s]malloc(): unsorted double linked list corrupted
Aborted
Again, it ran successfully on Windows, but not on WSL.
This would add a wxee.TimeSeries.timeline() method that plots a timeline showing each image's system:time_start date within the time series. Static plots would be doable using matplotlib, which is already a dependency through rasterio, but an interactive plot where you could see the system:id of each image on hover would be much cooler. Unfortunately, that would probably require adding plotly as a dependency.
Currently, the climatology methods take a reducer and apply it directly to generate monthly aggregates. For example, using climatology_month with ee.Reducer.max() would give you the max of all months in the time period. Instead, it should apply the given reducer to generate monthly statistics and then always apply ee.Reducer.mean() over the months to get, for example, the climatological average max in each month. The docs (and/or names of methods and kwargs) should be updated to make the output of the climatology methods clear. It should also be made clear that if the data is in the same time unit as the climatology, the reducer will have no effect (e.g. a max day-of-year climatology would only work with hourly data; if the data were already daily, the max reducer would have nothing to reduce).
I would like to produce a time series of global daily temperature based on ERA5 data.
To do that, I'd have to apply a Reducer.mean() operation over space rather than time.
The docs only explain how to average over time. Is there any way to compute an average over the x, y dimensions and then download the result?
Obviously, I would like to avoid having to download all the data and compute the average locally.
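For reference, once a dataset has been downloaded (possibly at a coarse scale to keep the request small), the spatial reduction itself is a one-liner in xarray. A sketch with illustrative names and values; `t2m` is a stand-in for the ERA5 temperature band, not something wxee defines:

```python
import numpy as np
import xarray as xr

# A tiny stand-in for a downloaded temperature dataset:
# 3 time steps on a 2x2 grid (values are illustrative).
ds = xr.Dataset(
    {"t2m": (("time", "y", "x"), np.array([
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 5.0], [5.0, 5.0]],
        [[0.0, 0.0], [0.0, 4.0]],
    ]))},
    coords={"time": [0, 1, 2]},
)

# Reduce over the spatial dimensions, keeping the time dimension.
global_mean = ds["t2m"].mean(dim=("y", "x"))
print(global_mean.values)  # [2.5 5.  1. ]
```

Doing the reduction server-side before downloading would instead require mapping a spatial reducer over the collection in Earth Engine.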
Hi Aaron, I am wondering how to call a country using ee.Geometry.Polygon in wxee or is there any other way? Since Google Fusion Tables is not supported any more on Earth Engine, is there a way out to call a country polygon?
Thank you.
The standard multiprocessing library forces a lot of awkward workarounds in wxee, e.g. aliased instance methods, using functools to pass args, zipping args, etc. It slows down development and won't scale well.
There are a variety of non-standard-library packages for parallelizing that may do a better job, like Ray and Dask. I need to decide if there's a better option available and implement it if so.
To avoid adding dependencies (or if a conda-forge recipe is not available), I may use a soft dependency where downloads will run in serial if the package cannot be imported, but the user will be notified and instructed on how to download.
Hi, @aazuspan!
Just wanted to say that I love wxee! I'm using it to combine products from Earth Engine and Planetary Computer, and that's amazing! I'm using it almost every day, but sometimes this error happens:
---------------------------------------------------------------------------
MergeError Traceback (most recent call last)
/tmp/ipykernel_1042/4012842980.py in <module>
1 CLOUD_MASK = PCL_s2cloudless(S2_ee).map(PSL).map(PCSL).map(matchShadows).select("CLOUD_MASK")
----> 2 CLOUD_MASK_xarray = CLOUD_MASK.wx.to_xarray(scale = 20,crs = "EPSG:" + str(S2.epsg.data),region = ee_aoi)
/srv/conda/envs/notebook/lib/python3.8/site-packages/wxee/collection.py in to_xarray(self, path, region, scale, crs, masked, nodata, num_cores, progress, max_attempts)
135 )
136
--> 137 ds = _dataset_from_files(files)
138
139 # Mask the nodata values. This will convert int datasets to float.
/srv/conda/envs/notebook/lib/python3.8/site-packages/wxee/utils.py in _dataset_from_files(files)
120 das = [_dataarray_from_file(file) for file in files]
121
--> 122 return xr.merge(das)
123
124
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs)
898 dict_like_objects.append(obj)
899
--> 900 merge_result = merge_core(
901 dict_like_objects,
902 compat,
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
633
634 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
--> 635 variables, out_indexes = merge_collected(
636 collected, prioritized, compat=compat, combine_attrs=combine_attrs
637 )
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_collected(grouped, prioritized, compat, combine_attrs)
238 variables = [variable for variable, _ in elements_list]
239 try:
--> 240 merged_vars[name] = unique_variable(name, variables, compat)
241 except MergeError:
242 if compat != "minimal":
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
147
148 if not equals:
--> 149 raise MergeError(
150 f"conflicting values for variable {name!r} on objects to be combined. "
151 "You can skip this check by specifying compat='override'."
MergeError: conflicting values for variable 'CLOUD_MASK' on objects to be combined. You can skip this check by specifying compat='override'.
It is weird because it doesn't happen all the time, and most of the time I just have to re-run the code and it works. So I don't know exactly what the problem is.
Anyway, here is the error I got. I was trying to get a cloud mask in GEE and download it as an xarray. I already tried it again and now it works, but, as I said, I don't know why. It also happens with other datasets. I was downloading some Sentinel-2 data (just as it is, without any processing steps) and sometimes it works, but sometimes it doesn't, and I can't reproduce the error because when I re-run it, it usually works.
Ok, that was it!
Thank you!
Currently, the progress bar only displays after URLs have been returned by GEE, but when a large amount of processing is required on the server side, this can lead to a long wait before anything noticeably happens.
To fix this, there are two options:
I'll test the first implementation and fall back to the second if that doesn't work well.
List the ID, number of images, climatology frequency, maybe the climatology period, etc.
This would add a wxee.TimeSeries.climatology_std method: identical to climatology_mean but using ee.Reducer.stdDev.
Add a method (or methods) to wxee.TimeSeries for temporal interpolation to allow filling in missing data, regridding, etc. Usage would look something like below, with different methods (or a kwarg option) for linear, cosine, cubic, and maybe spline implementations. This will require writing some reliable tools for getting images surrounding a target date.
gridmet = ee.ImageCollection("IDAHO_EPSCOR/GRIDMET")
col = gridmet.filterDate("2020-01-01", "2020-01-03")
interp: ee.Image = col.wx.linearInterpolate(ee.Date("2020-01-01T18"))
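The core of a linear implementation is just a pair of time-based weights for the two images bracketing the target date. A sketch with standard-library datetimes; `linear_weights` is a hypothetical helper, not part of wxee:

```python
from datetime import datetime

def linear_weights(t0: datetime, t1: datetime, t: datetime):
    """Interpolation weights for a target time between two image times."""
    total = (t1 - t0).total_seconds()
    w1 = (t - t0).total_seconds() / total
    return 1 - w1, w1

# Weighting 2020-01-01T18 between two daily images timestamped at 00:00:
w0, w1 = linear_weights(
    datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 1, 18)
)
print(w0, w1)  # 0.25 0.75
```

Server-side, the interpolated image would then be something like `img0.multiply(w0).add(img1.multiply(w1))`.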
GEE doesn't natively support seasons, but it should be relatively straightforward to group months into seasons to allow temporal resampling and climatology calculation at the seasonal level.
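One common convention (assumed here, not prescribed by GEE or wxee) is the meteorological DJF/MAM/JJA/SON grouping, which is a simple month-to-label mapping:

```python
# Month-to-season mapping using meteorological seasons (one possible
# convention); wxee could map months to these labels before reducing.
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

def season_of(month: int) -> str:
    """Return the season label for a 1-12 month number."""
    return SEASONS[month]

print(season_of(1), season_of(7))  # DJF JJA
```

The only subtlety is that DJF straddles the year boundary, so a seasonal climatology has to decide which year December belongs to.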
Hi, @aazuspan!
First of all, WOW! Your work with eexarray is amazing, keep it going! 🚀
I was using your dev repo to try to convert an S2 collection to xarray, and it works, but when I compute a spectral index using eemont (which uses ee.Image.expression), it doesn't work:
This works!
import ee, eemont, eexarray
ee.Initialize()
tw = ee.Geometry.Point([10.4522,51.0792])
bf = tw.buffer(500)
xt = bf.bounds()
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
.filterBounds(xt) \
.preprocess() \
.map(lambda x: x.addBands(x.normalizedDifference(["B8","B4"]).rename("NDVI"))) \
.limit(10) \
.map(lambda x: x.clip(xt)) \
.eex.resample_daily(reducer = ee.Reducer.median())
S2eex = S2.eex.to_xarray(scale=10)
This doesn't work (using eemont)
import ee, eemont, eexarray
ee.Initialize()
tw = ee.Geometry.Point([10.4522,51.0792])
bf = tw.buffer(500)
xt = bf.bounds()
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
.filterBounds(xt) \
.preprocess() \
.spectralIndices("NDVI") \
.limit(10) \
.map(lambda x: x.clip(xt)) \
.eex.resample_daily(reducer = ee.Reducer.median())
S2eex = S2.eex.to_xarray(scale=10)
This doesn't work (not using eemont)
import ee, eemont, eexarray
ee.Initialize()
tw = ee.Geometry.Point([10.4522,51.0792])
bf = tw.buffer(500)
xt = bf.bounds()
def addExpressionNDVI(x):
    params = {"N": x.select("B8"), "R": x.select("B4")}
    NDVI = x.expression("(N-R)/(N+R)", params).rename("NDVI")
    return x.addBands(NDVI)
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
.filterBounds(xt) \
.preprocess() \
.map(addExpressionNDVI) \
.limit(10) \
.map(lambda x: x.clip(xt)) \
.eex.resample_daily(reducer = ee.Reducer.median())
S2eex = S2.eex.to_xarray(scale=10)
Error
AttributeError: Can't pickle local object 'Image.expression.<locals>.ReinterpretedFunction'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-37-94ef9caa673d> in <module>
----> 1 S2eex = S2.eex.to_xarray(scale=10)
~/anaconda3/envs/gee/lib/python3.9/site-packages/eexarray/ImageCollection.py in to_xarray(self, path, region, scale, crs, masked, nodata, num_cores, progress, max_attempts)
90 collection = self._rename_by_time()
91
---> 92 files = collection.eex.to_tif(
93 out_dir=tmp,
94 region=region,
~/anaconda3/envs/gee/lib/python3.9/site-packages/eexarray/ImageCollection.py in to_tif(self, out_dir, prefix, region, scale, crs, file_per_band, masked, nodata, num_cores, progress, max_attempts)
198 max_attempts=max_attempts,
199 )
--> 200 tifs = list(
201 tqdm(
202 p.imap(params, imgs),
~/anaconda3/envs/gee/lib/python3.9/site-packages/tqdm/std.py in __iter__(self)
1183
1184 try:
-> 1185 for obj in iterable:
1186 yield obj
1187 # Update and possibly print the progressbar.
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/pool.py in next(self, timeout)
868 if success:
869 return value
--> 870 raise value
871
872 __next__ = next # XXX
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
535 break
536 try:
--> 537 put(task)
538 except Exception as e:
539 job, idx = task[:2]
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/connection.py in send(self, obj)
209 self._check_closed()
210 self._check_writable()
--> 211 self._send_bytes(_ForkingPickler.dumps(obj))
212
213 def recv_bytes(self, maxlength=None):
~/anaconda3/envs/gee/lib/python3.9/multiprocessing/reduction.py in dumps(cls, obj, protocol)
49 def dumps(cls, obj, protocol=None):
50 buf = io.BytesIO()
---> 51 cls(buf, protocol).dump(obj)
52 return buf.getbuffer()
53
AttributeError: Can't pickle local object 'Image.expression.<locals>.ReinterpretedFunction'
Versions
It seems to be something related specifically to that earthengine-api method, but if you can find a workaround, that would be amazing! 🚀
And again, thank you very much for eexarray!
A common wxee issue is that EEException: Date: Parameter 'value' is required is thrown when to_xarray is called on an image without a system:time_start property (#43, #50, #53). I should catch this error and throw something more helpful that explains the cause, suggests a workaround, and links to relevant issues.
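A sketch of what catching and re-raising could look like; `MissingTimeError` and the message text are hypothetical names, not part of the wxee API:

```python
# Translate the cryptic Earth Engine date error into an actionable one.
class MissingTimeError(Exception):
    pass

def rethrow_if_missing_time(exc: Exception) -> None:
    """Re-raise the opaque EE error with an explanation and workaround."""
    if "Date: Parameter 'value' is required" in str(exc):
        raise MissingTimeError(
            "The image is missing a system:time_start property. Set one with "
            "img.set('system:time_start', ee.Date('2000-01-01').millis()) "
            "before calling to_xarray. See issues #43, #50, and #53."
        ) from exc
    raise exc
```

Wrapping the `getInfo()` call site in a try/except that funnels exceptions through this helper would cover all three reported cases.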
Several dependencies are relatively niche and would make more sense as optional:
plotly is only used in TimeSeries.timeline
netcdf4 is only used when writing NetCDFs
The current download system is pretty solid with automated retrying, but the cdsapi package has a more extensive system that should improve download stability. See their implementation for reference.
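A more extensive retry system usually amounts to exponential backoff around the download call. A minimal sketch; the function name, signature, and delays are illustrative, not wxee's or cdsapi's actual API:

```python
import time

def download_with_retries(fetch, max_attempts=3, base_delay=0.01):
    """Retry a flaky download callable with exponential backoff (sketch)."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except IOError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(base_delay * 2 ** attempt)

# Simulate a download that fails twice before succeeding.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("connection reset by peer")
    return "ok"

result = download_with_retries(flaky)
print(result, len(attempts))  # ok 3
```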
Does it not work in Python versions < 3.7?
I was trying to download a median image to xarray and encountered this error below. I understand that we need time series image collections, but wonder if there is a workaround for ee.Image?
Thanks,
Daniel
EEException: Date: Parameter 'value' is required.
I did the same with Sentinel-1 GRD scenes; the issue is that some values are just converted to NaN.
So I am getting most of the backscatter values as NaN. Why does this happen?
Originally posted by @ashishgitbisht in #46 (comment)
Make sure valid options are listed and handled in only one location. Probably implement similarly to scikit-learn's scorer system, with a submodule for storing and retrieving valid options. Rather than hardcoding valid options in the error messages, print them programmatically from the valid list.
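The idea can be sketched as a single registry dict with a lookup helper that builds its error message from the registry, so the valid options are never duplicated. All names here are hypothetical, not wxee internals:

```python
# Single source of truth for valid frequency options (illustrative values).
FREQUENCIES = {"year": 1, "month": 2, "week": 3, "day": 4, "hour": 5}

def get_frequency(name: str) -> int:
    """Look up a frequency, listing valid options programmatically on error."""
    try:
        return FREQUENCIES[name]
    except KeyError:
        valid = ", ".join(sorted(FREQUENCIES))
        raise ValueError(f"Unknown frequency '{name}'. Choose from: {valid}.")

print(get_frequency("month"))  # 2
```

Adding a new option then requires touching only the registry; every error message stays current automatically.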
Any suggestions or contributions from wxee users are welcome! :)
It looks like there's a bug in the temp directory handling for 0.4.0
that only seems to affect Windows (and therefore not the CI workflow). It's also possible this was introduced with Windows 11, since I upgraded recently.
This would be handled automatically by resolving #19, so if this isn't a quick fix I may need to wait for that.
import wxee
import ee
wxee.Initialize()
img = ee.ImageCollection("IDAHO_EPSCOR/GRIDMET").first()
img.wx.to_xarray(scale=100_000)
Raises:
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:627](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:627), in _rmtree_unsafe(path, onerror)
626 try:
--> 627 os.unlink(fullname)
628 except OSError:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\az\\AppData\\Local\\Temp\\wxee_tmpstdzug4g\\IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif'
During handling of the above exception, another exception occurred:
PermissionError Traceback (most recent call last)
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:805](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:805), in TemporaryDirectory._rmtree..onerror(func, path, exc_info)
804 try:
--> 805 _os.unlink(path)
806 # PermissionError is raised on FreeBSD for directories
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\az\\AppData\\Local\\Temp\\wxee_tmpstdzug4g\\IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif'
During handling of the above exception, another exception occurred:
NotADirectoryError Traceback (most recent call last)
[c:\Users\az\wxee\ee_computepixels.ipynb](file:///C:/Users/az/wxee/ee_computepixels.ipynb) Cell 9 in ()
[4](vscode-notebook-cell:/c%3A/Users/az/wxee/ee_computepixels.ipynb#X11sZmlsZQ%3D%3D?line=3) wxee.Initialize()
[6](vscode-notebook-cell:/c%3A/Users/az/wxee/ee_computepixels.ipynb#X11sZmlsZQ%3D%3D?line=5) img = ee.ImageCollection("IDAHO_EPSCOR/GRIDMET").first()
----> [7](vscode-notebook-cell:/c%3A/Users/az/wxee/ee_computepixels.ipynb#X11sZmlsZQ%3D%3D?line=6) img.wx.to_xarray(scale=100_000)
File [c:\Users\az\wxee\wxee\image.py:90](file:///C:/Users/az/wxee/wxee/image.py:90), in Image.to_xarray(self, path, region, scale, crs, masked, nodata, progress, max_attempts)
77 with tempfile.TemporaryDirectory(prefix=constants.TMP_PREFIX) as tmp:
78 files = self.to_tif(
79 out_dir=tmp,
80 region=region,
(...)
87 progress=progress,
88 )
---> 90 ds = _dataset_from_files(files, masked, nodata)
92 if path:
93 msg = (
94 "The path argument is deprecated and will be removed in a future "
95 "release. Use the `xarray.Dataset.to_netcdf` method instead."
96 )
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:830](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:830), in TemporaryDirectory.__exit__(self, exc, value, tb)
829 def __exit__(self, exc, value, tb):
--> 830 self.cleanup()
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:834](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:834), in TemporaryDirectory.cleanup(self)
832 def cleanup(self):
833 if self._finalizer.detach():
--> 834 self._rmtree(self.name)
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:816](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:816), in TemporaryDirectory._rmtree(cls, name)
813 else:
814 raise
--> 816 _shutil.rmtree(name, onerror=onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:759](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:759), in rmtree(path, ignore_errors, onerror)
757 # can't continue even if onerror hook returns
758 return
--> 759 return _rmtree_unsafe(path, onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:629](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:629), in _rmtree_unsafe(path, onerror)
627 os.unlink(fullname)
628 except OSError:
--> 629 onerror(os.unlink, fullname, sys.exc_info())
630 try:
631 os.rmdir(path)
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:808](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:808), in TemporaryDirectory._rmtree..onerror(func, path, exc_info)
806 # PermissionError is raised on FreeBSD for directories
807 except (IsADirectoryError, PermissionError):
--> 808 cls._rmtree(path)
809 except FileNotFoundError:
810 pass
File [c:\ProgramData\Miniconda3\envs\ee\lib\tempfile.py:816](file:///C:/ProgramData/Miniconda3/envs/ee/lib/tempfile.py:816), in TemporaryDirectory._rmtree(cls, name)
813 else:
814 raise
--> 816 _shutil.rmtree(name, onerror=onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:759](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:759), in rmtree(path, ignore_errors, onerror)
757 # can't continue even if onerror hook returns
758 return
--> 759 return _rmtree_unsafe(path, onerror)
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:610](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:610), in _rmtree_unsafe(path, onerror)
608 entries = list(scandir_it)
609 except OSError:
--> 610 onerror(os.scandir, path, sys.exc_info())
611 entries = []
612 for entry in entries:
File [c:\ProgramData\Miniconda3\envs\ee\lib\shutil.py:607](file:///C:/ProgramData/Miniconda3/envs/ee/lib/shutil.py:607), in _rmtree_unsafe(path, onerror)
605 def _rmtree_unsafe(path, onerror):
606 try:
--> 607 with os.scandir(path) as scandir_it:
608 entries = list(scandir_it)
609 except OSError:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\az\\AppData\\Local\\Temp\\wxee_tmpstdzug4g\\IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif'
Hi, thanks for the great library. I ran into the following error:
collection_name = "MODIS/061/MOD13A2"
collection = ee.ImageCollection(collection_name) \
    .filterDate('2019-11-01', '2019-12-31') \
    .filterBounds(roi)
collection.wx.to_xarray()
EEException: Image.clipToBoundsAndScale: The geometry for image clipping must be bounded.
This error also shows up when I remove the bounds filter.
This would add a wxee.TimeSeries.climatology_anomaly method. The method would take a frequency, whether or not to standardize, and climatological mean and std TimeSeries objects. Users would run wxee.TimeSeries.climatology_mean and wxee.TimeSeries.climatology_std to generate those inputs and then pass those and another TimeSeries to calculate anomalies from. Something like:
ts = wxee.TimeSeries("IDAHO_EPSCOR/GRIDMET").filterDate("1981", "2011").select("pr")
mean = ts.climatology_mean("month")
std = ts.climatology_std("month")
anom = ts.climatology_anomaly("month", mean, std, standardize=True)
Need to decide whether I want tests that pull data from the server and take forever to run or quick tests that ensure things run but don't validate any results.
Currently, running climatology_dayofyear groups days by Julian date. In a leap year, all days after February 29 are pushed back one Julian day, so the climatological day-of-year 365 would represent December 31 in non-leap years and December 30 in leap years, for example. Day 366 would always represent December 31, but would be aggregated from 1/4 as many days as other days of the year.
Tools like Ferret handle this by re-gridding all years into 365 steps regardless of leap days (Reference 1, Reference 2).
Regridding may not be a practical solution in GEE, but it should be considered. If the current solution is kept, the docs should be updated to make that distinction clear.
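The misalignment is easy to demonstrate with the standard library: every date after February 29 shifts by one day of year in a leap year.

```python
from datetime import date

# December 31 is Julian day 365 in a common year but 366 in a leap year.
doy_common = date(2019, 12, 31).timetuple().tm_yday  # 365
doy_leap = date(2020, 12, 31).timetuple().tm_yday    # 366

# March 1 (and everything after it) shifts by one day in a leap year.
mar1_common = date(2019, 3, 1).timetuple().tm_yday   # 60
mar1_leap = date(2020, 3, 1).timetuple().tm_yday     # 61

print(doy_common, doy_leap, mar1_common, mar1_leap)
```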
xr.open_rasterio() doesn't work with the latest xarray package (encountered in wx.to_xarray(...)).
This can be solved easily by adding rioxarray to utils.py (or pinning the xarray version).
Use the nbsphinx prolog feature to automatically add Github, Binder, and Colab links to all example notebooks. See sankee implementation
Once that's done, remove the manually added links from notebooks.
Automated requests to Earth Engine (everything made by wxee) should be made through the high-volume endpoint. This feature would add a wxee.Initialize function to initialize with that endpoint.
Examples should be updated to use wxee.Initialize instead of ee.Initialize, and an explanation should be added to the docs.
Dear Aaron Zuspan,
Thank you very much for this wonderful package.
I have a shapefile with 64 points in my assets, and also locally as GeoJSON. I tried following your instructions here (#28) to download Sentinel-2 bands to xarray for those specific 64 points. But the total number of points depends on the scale and region, differing in number and location from those 64.
Is there any way to download those specific points to xarray?
Thanks in advance.
Walter Pereira
This would add two methods allowing ee.ImageCollection and its subclass objects to be exported to Drive and then imported into an xarray.Dataset. Dimensions and coordinates would be stored in filenames and parsed on import. This feature would allow users to handle time series data when the file size or grid size is too large or computations time out.
Planned usage reference:
ts = wxee.TimeSeries("IDAHO_EPSCOR/GRIDMET").filterDate("2020", "2021")
task = ts.wx.to_drive(crs="EPSG:5070", scale=4_000)
# Once files are exported, user manually downloads them to a local folder
data_dir = "data"
ds = wxee.load_dataset(data_dir)
Drive exporting will be very similar to the wxee.image._get_url method, but will instead run and return a batch export task. All of the importing functionality is already implemented in the private wxee.utils._dataset_from_files, so that portion should be simple.
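Parsing coordinates back out of filenames is the only new piece. A sketch assuming the `<id>.time.<YYYYmmddTHHMMSS>.<band>.tif` convention visible in wxee's temporary downloads elsewhere in this tracker; the helper name is hypothetical:

```python
from datetime import datetime

def parse_time(filename: str) -> datetime:
    """Recover the time coordinate embedded in an exported filename."""
    stamp = filename.split(".time.")[1].split(".")[0]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S")

name = "IDAHO_EPSCOR_GRIDMET_19790101.time.19790101T060000.pr.tif"
print(parse_time(name))  # 1979-01-01 06:00:00
```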
Dear Aaron,
I would like to transform hourly precipitation from GPM to daily. As GPM precipitation images (mm/h) come every 30 minutes, I have to divide by 2. But my code raised the error "Date: Parameter 'value' is required". Could you help me? See the example of my code below:
# use .filter() to select only the Minas Gerais (MG) area
regiao = ee.FeatureCollection('FAO/GAUL/2015/level1').filter(ee.Filter.eq('ADM1_NAME', 'Minas Gerais'))
# load the data
gpm = ee.ImageCollection('NASA/GPM_L3/IMERG_V06') \
    .select('precipitationCal') \
    .filterDate('2021-01-01', '2021-03-11') \
    .filterBounds(regiao)
gpm = gpm.map(lambda img: img.multiply(0.5))
ts = gpm.wx.to_time_series()
daily = ts.aggregate_time(frequency='day', reducer=ee.Reducer.sum())
ds = daily.wx.to_xarray(region=regiao.geometry(), scale=7000)
Thank you very much.
Best Regards,
Enrique
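For reference, the unit conversion in the code above (half-hourly rates in mm/h, each contributing rate × 0.5 h of accumulation) can be checked locally with pandas on illustrative numbers:

```python
import pandas as pd

# Four half-hourly rates in mm/h over two hours (illustrative values).
rates = pd.Series(
    [2.0, 2.0, 4.0, 0.0],
    index=pd.date_range("2021-01-01 00:00", periods=4, freq="30min"),
)

# Each 30-minute value contributes rate * 0.5 h, hence the multiply(0.5);
# summing by day gives the daily accumulation in mm.
daily_mm = (rates * 0.5).resample("D").sum()
print(daily_mm.iloc[0])  # 4.0
```

The error itself is unrelated to the conversion; it is the missing system:time_start issue discussed elsewhere in this tracker.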
Any parallel operations (specifically wxee.TimeSeries.wx.to_xarray()) will fail and may crash Python in a fresh install. On Linux, the issue causes an immediate crash and a "segmentation fault" message. On Windows, it throws an SSL error, usually after downloading several images, or Python crashes silently. This happens on a clean install of wxee from conda-forge but has not happened in my development environment, so it is probably a package version or missing dependency issue.
Setting num_cores to 1 (which disables multiprocessing) seems to resolve the issue but slows down downloads.
Hi aazuspan, thank you for providing this package, it is very helpful!
I'm having problems getting images from the Landsat-8 ImageCollection and converting them to an array due to the size limit. I selected only a one-month period and two bands, but I'm getting an error message unless I set the scale to 250 m or higher (I want 30 m). Do you know if there is a way to solve this, or is this size a GEE or code restriction? Thank you.
Error: "ee.ee_exception.EEException: Total request size (238694952 bytes) must be less than or equal to 50331648 bytes."
My code
import ee
import wxee
from geetools import tools
wxee.Initialize()
ee.Initialize()
# Using CONUS C2 ARD tile 2613 tile (assuming there is a better way to import the grid)
aoi = ee.Geometry.Polygon([[[-81.5595250019701439, 32.8743922803664361], [-79.6900076077309478, 32.8743922803664361], [-79.6900076077309478, 34.4158935126628762], [-81.5595250019701439, 34.4158935126628762], [-81.5595250019701439,32.8743922803664361]]])
# Define image collection (here we are using Landsat-8 surface reflectance)
L8= ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
# Filtering date, study area and selecting bands (in case of not using all of them)
collection = L8.filterDate("2020-07-05", "2020-08-11").filterBounds(aoi).select("B5", "B4")
# The coordinate reference system to use
crs = "EPSG:4326"
# Spatial resolution in CRS units (meters)
## PS: I set 250 otherwise is too big "e.ee_exception.EEException: Total request size (238694952 bytes) must be less than or equal to 50331648 bytes."
scale = 250
arr = collection.wx.to_xarray(scale=scale, crs=crs, region=aoi)
arr
path = "ARD2613_20200705_20200811_B4_B5.nc"
arr = collection.wx.to_xarray(path=path, scale=scale, crs=crs)
Hi, I have a Landsat time-series in epsg:4326 downloaded from the google earth engine that I am trying to convert to xarray.
The area covers the entire Las Vegas. Using ds = landsat_ts.wx.to_xarray() resulted in a ds with coarse scale of 1 decimal degree.
My question is how to define scale and crs parameters in the wx.to_xarray() function to get the raw Landsat's resolution of 30m?
Thanks,
Daniel
Attributes:
    transform: (1.0, 0.0, -116.0, 0.0, -1.0, 37.0)
    crs: +init=epsg:4326
    res: (1.0, 1.0)
    is_tiled: 1
    nodatavals: (-32768.0,)
    scales: (1.0,)
    offsets: (0.0,)
    AREA_OR_POINT: Area
    TIFFTAG_RESOLUTIONUNIT: 1 (unitless)
    TIFFTAG_XRESOLUTION: 1
    TIFFTAG_YRESOLUTION: 1
The 3.7 build is failing because of an incompatibility between xarray and the new importlib-metadata release (see pydata/xarray#7149).
xarray dropped support for 3.7 quite a while ago, so it's time wxee does too.
Hi, I am wondering if wxee could convert half-hourly / 3-hourly data to daily / monthly data for the following data sets:
Thank you.
The rgb method for plotting xarray objects lets users override most of the default arguments, but col is always set to time.
Line 100 in f171265
We should allow col to be set by the user to allow plotting non-time-series data.
Add a wxee.TimeSeries.smooth_time method that applies pixel-wise temporal smoothing to a time series. This could be implemented as wxee.TimeSeries.rolling_time rather than as a "type" of smooth_time.
Temporal smoothing (#31) requires gap-filled time series images. This feature would be a wxee.TimeSeries.fill_gaps method that uses a selectable method of interpolation (nearest neighbor, linear, or cubic) to unmask each image in the time series. Either drop images that don't have enough neighbors or just fall back to using nearest neighbor...
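On the EE side these would be built from reducers and joins; conceptually, per-pixel gap filling and rolling smoothing look like the following NumPy sketch on a single pixel's series (the function names mirror the proposed API but are purely illustrative):

```python
import numpy as np

def fill_gaps_linear(values):
    """Linearly interpolate NaN gaps in a 1D temporal series (one pixel)."""
    values = np.asarray(values, dtype=float)
    t = np.arange(values.size)
    mask = np.isnan(values)
    filled = values.copy()
    filled[mask] = np.interp(t[mask], t[~mask], values[~mask])
    return filled

def smooth_time(values, window=3):
    """Centered moving-average smoothing of a gap-free 1D series."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")

series = [1.0, np.nan, 3.0, 4.0, np.nan, 6.0]
filled = fill_gaps_linear(series)
print(filled)  # [1. 2. 3. 4. 5. 6.]
smoothed = smooth_time(filled)
```

Gap filling first, smoothing second matters: convolving over NaNs would propagate them through the whole window.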
Passing a path to ee.ImageCollection.wx.to_xarray to automatically save to NetCDF was deprecated in 0.4.0 in favor of using the to_netcdf method (just as easy and much more flexible). However, there are still references to NetCDF export support in the documentation and tutorials that use the deprecated parameter. Those should be removed.
Some collections (at least Landsat 7) are missing system:time_end properties for most images, even though they have system:time_start properties. This causes dataframe to fail because the fields used to initialize the pd.DataFrame are of different lengths. I'll probably drop system:time_end from dataframe since it's rarely useful and filling missing values is not supported by aggregate_array.
pt = ee.Geometry.Point([-121.690476, 45.432933])
(wxee.TimeSeries("LANDSAT/LE07/C01/T1_SR")
.filterBounds(pt).timeline()
)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-af49cab8b808> in <module>
1 pt = ee.Geometry.Point([-121.690476, 45.432933])
2
----> 3 (wxee.TimeSeries("LANDSAT/LE07/C01/T1_SR")
4 .filterBounds(pt).timeline()
5 )
~\anaconda3\envs\gee\lib\site-packages\wxee\time_series.py in timeline(self)
151 A Plotly graph object interactive plot showing the acquisition time of each image in the time series.
152 """
--> 153 df = self.dataframe()
154 df["y"] = 0
155
~\anaconda3\envs\gee\lib\site-packages\wxee\time_series.py in dataframe(self)
139 ends = [_millis_to_datetime(ms) for ms in ends_millis]
140
--> 141 df = pd.DataFrame({"id": ids, "time_start": starts, "time_end": ends})
142 df.index.id = collection_id
143 return df
~\anaconda3\envs\gee\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
527
528 elif isinstance(data, dict):
--> 529 mgr = init_dict(data, index, columns, dtype=dtype)
530 elif isinstance(data, ma.MaskedArray):
531 import numpy.ma.mrecords as mrecords
~\anaconda3\envs\gee\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
285 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
286 ]
--> 287 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
288
289
~\anaconda3\envs\gee\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype, verify_integrity)
78 # figure out the index, if necessary
79 if index is None:
---> 80 index = extract_index(arrays)
81 else:
82 index = ensure_index(index)
~\anaconda3\envs\gee\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
399 lengths = list(set(raw_lengths))
400 if len(lengths) > 1:
--> 401 raise ValueError("arrays must all be same length")
402
403 if have_dicts:
ValueError: arrays must all be same length
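An alternative to dropping system:time_end entirely would be padding the short array before building the frame. A hypothetical sketch (not the wxee implementation; the helper name is made up):

```python
import pandas as pd

def safe_dataframe(ids, starts, ends):
    """Build the dataframe even when some images lack system:time_end.

    Missing end times are padded with pd.NaT instead of letting
    pd.DataFrame raise "arrays must all be same length".
    """
    ends = list(ends) + [pd.NaT] * (len(ids) - len(ends))
    return pd.DataFrame({"id": ids, "time_start": starts, "time_end": ends})

df = safe_dataframe(
    ids=["img1", "img2", "img3"],
    starts=["2020-01-01", "2020-01-17", "2020-02-02"],
    ends=["2020-01-01"],  # Landsat 7 style: most time_end values missing
)
print(df["time_end"].isna().sum())  # 2 missing ends padded with NaT
```

The caveat noted above still applies: because aggregate_array silently drops missing values, there's no way to know *which* images lacked time_end, so positional padding like this can misalign rows; dropping the column is the safer fix.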
geedim is a Python package that supports downloading EE images with automatic tiling to bypass file size limits. I've been wanting to improve the download system in wxee for a while (see #19), and using geedim might be a good way to do that, with the added bonus of removing most of the low-level thread and tempfile management that causes a lot of headaches. Ideally, I would replace the entire image downloading system with geedim, both for to_tif and for to_xarray.
It will be quite a bit of work just to figure out how feasible this is, so I'm going to start keeping track of and checking off potential incompatibilities below as I figure them out.
- geedim uses threads to download tiles of large images, whereas wxee uses threads to download images within collections. I'll need to figure out the feasibility of parallelizing on both dimensions, or else download speed would tank on large collections of small images, which is the primary focus of wxee.
- geedim tracks progress of image tiles, whereas I need to track progress of images in collections (or both would be fine). I give separate progress bars for retrieving data (requesting the download URLs) and the download itself, because the URL request can take a lot of time, and I don't think this will be possible with geedim.
- geedim supports file outputs, but tempfiles are typically what you want when converting to xarray. I don't want to have to manage files manually, so I'll need to think more about how this will work. Maybe just create temp directories and download into them?
- geedim automatically sets filePerBand=False for all downloads. I'll need to do some rewriting to load xarray objects from multi-band images, but that may improve performance on the IO side by reading/writing fewer files.
- wxee takes a nodata argument and replaces masked values with that; after downloading, it sets that value in the image metadata or xarray.Dataset. geedim takes a different approach of adding a "FILL_MASK" band to the image before downloading. The advantage of the geedim approach is that you don't need to choose between exporting everything as a float or risking assigning nodata to real values, but it does require downloading more data from EE, and once you actually get the image into xarray and mask it, there's no advantage, since xarray will promote everything to float64 anyway to accommodate NaN values. I'll probably live with the geedim approach by applying and removing the mask band after downloading, but I should do some experiments to see how that affects performance (and to make sure I'm fully understanding the geedim approach).
- The geedim.MaskedImage class exposes and caches EE properties, so building filenames from metadata is straightforward. The only consideration is that we need to persist that MaskedImage instance throughout the download process to avoid having to retrieve properties multiple times.

Hi, though wxee serves the purpose of subsetting using a rectangular grid or a country, I am wondering if there is any provision for subsetting using a shapefile of an area, like the way we can run on the GEE platform using an asset in the form of a shapefile.
Thanking you.
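On the nested-parallelism question in the geedim notes above: stacking two thread pools (images on the outer level, tiles on the inner) is one way to parallelize both dimensions. A toy sketch with stand-in download functions (no EE or geedim calls; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def download_tile(image_id, tile_id):
    """Stand-in for fetching one tile of one image."""
    return f"{image_id}/tile{tile_id}"

def download_image(image_id, n_tiles, tile_workers=4):
    # Inner pool: geedim-style parallelism over tiles of one image
    with ThreadPoolExecutor(max_workers=tile_workers) as pool:
        return list(pool.map(lambda t: download_tile(image_id, t), range(n_tiles)))

def download_collection(image_ids, n_tiles=2, image_workers=8):
    # Outer pool: wxee-style parallelism over images in a collection
    with ThreadPoolExecutor(max_workers=image_workers) as pool:
        return list(pool.map(lambda i: download_image(i, n_tiles), image_ids))

results = download_collection([f"img{i}" for i in range(3)])
print(len(results), len(results[0]))  # 3 images, 2 tiles each
```

For large collections of small images, the outer pool dominates throughput, so the inner pool's worker count could be kept small (or tied to image size) to avoid oversubscribing threads.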
Turning time series into RGB plots is currently a headache (see the MODIS example notebook). This feature would add an rgb method for visualizing multispectral data with color composites, probably extended to xarray.Dataset using a wx accessor, possibly via hvplot.
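Making col user-overridable (per the issue above) is just a matter of merging user kwargs over the defaults instead of hard-coding them. A hypothetical sketch, not the actual rgb implementation:

```python
def rgb(ds, **kwargs):
    """Stand-in plotting wrapper: user kwargs override the defaults."""
    defaults = {"col": "time", "robust": True}
    # A default only applies when the user hasn't set that argument
    options = {**defaults, **kwargs}
    return options  # a real implementation would pass these to the plotter

print(rgb(None)["col"])              # falls back to "time"
print(rgb(None, col="band")["col"])  # user override wins
```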