
Comments (5)

FabianHofmann avatar FabianHofmann commented on July 2, 2024 1

@zoltanmaric I am impressed by how fast you are catching up with the atlite code! Definitely makes sense!

from atlite.

FabianHofmann avatar FabianHofmann commented on July 2, 2024 1

Interesting, but I would not bother with the value differences between time resolutions, since they are related to ERA5 internals only.


zoltanmaric avatar zoltanmaric commented on July 2, 2024

The culprit is this attempt to infer the frequency of the time index:

# ERA5 variables are mean values for previous hour, i.e. 13:01 to 14:00 are labelled as "14:00"
# account by calculating the SolarPosition for the center of the interval for aggregation happens
# see https://github.com/PyPSA/atlite/issues/158
# Do not show DeprecationWarning from new SolarPosition calculation (#199)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    # Convert dt / time frequency to timedelta and shift solar position by half
    # (freqs like ["H", "30T"] do not work with pd.to_timedelta(...))
    time_shift = (
        -1
        / 2
        * pd.to_timedelta(
            pd.date_range(
                "1970-01-01", periods=1, freq=pd.infer_freq(ds["time"])
            ).freq
        )
    )
    sp = SolarPosition(ds, time_shift=time_shift)
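For reference, on a contiguous hourly index this expression does produce the intended shift. A quick pandas-only check (variable names are mine, not atlite's):

```python
import pandas as pd

# With a contiguous hourly index, pd.infer_freq finds the hourly frequency,
# and the expression above evaluates to a -30 minute shift, i.e. the centre
# of ERA5's hour-ending averaging interval.
times = pd.date_range("2022-06-30", periods=24, freq="60min")
freq = pd.infer_freq(times)
time_shift = (
    -1
    / 2
    * pd.to_timedelta(pd.date_range("1970-01-01", periods=1, freq=freq).freq)
)
assert time_shift == pd.Timedelta(minutes=-30)
```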

An ERA5 CDS request spanning June 30 and July 1st looks like this:

{
  'product': 'reanalysis-era5-single-levels',
  'year': '2022',
  'month': [6, 7],
  'day': [30, 1],
  'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00']
}

which returns results covering June 1st, June 30th, July 1st, and July 30th:
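The four dates fall out of the cartesian product the CDS API takes over the list-valued month and day fields; a minimal standard-library sketch:

```python
from datetime import date
from itertools import product

# The CDS API expands list-valued fields combinatorially, so
# month=[6, 7] and day=[30, 1] select all four (month, day) pairs.
months, days = [6, 7], [30, 1]
dates = sorted(date(2022, m, d) for m, d in product(months, days))
for d in dates:
    print(d.isoformat())
# 2022-06-01
# 2022-06-30
# 2022-07-01
# 2022-07-30
```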

Time index contents, from get_data_influx:

print(ds.time)

Output:

<xarray.DataArray 'time' (time: 96)>
array(['2022-06-01T00:00:00.000000000', '2022-06-01T01:00:00.000000000',
       '2022-06-01T02:00:00.000000000', '2022-06-01T03:00:00.000000000',
       '2022-06-01T04:00:00.000000000', '2022-06-01T05:00:00.000000000',
       '2022-06-01T06:00:00.000000000', '2022-06-01T07:00:00.000000000',
       '2022-06-01T08:00:00.000000000', '2022-06-01T09:00:00.000000000',
       '2022-06-01T10:00:00.000000000', '2022-06-01T11:00:00.000000000',
       '2022-06-01T12:00:00.000000000', '2022-06-01T13:00:00.000000000',
       '2022-06-01T14:00:00.000000000', '2022-06-01T15:00:00.000000000',
       '2022-06-01T16:00:00.000000000', '2022-06-01T17:00:00.000000000',
       '2022-06-01T18:00:00.000000000', '2022-06-01T19:00:00.000000000',
       '2022-06-01T20:00:00.000000000', '2022-06-01T21:00:00.000000000',
       '2022-06-01T22:00:00.000000000', '2022-06-01T23:00:00.000000000',
       '2022-06-30T00:00:00.000000000', '2022-06-30T01:00:00.000000000',
       '2022-06-30T02:00:00.000000000', '2022-06-30T03:00:00.000000000',
       '2022-06-30T04:00:00.000000000', '2022-06-30T05:00:00.000000000',
       '2022-06-30T06:00:00.000000000', '2022-06-30T07:00:00.000000000',
       '2022-06-30T08:00:00.000000000', '2022-06-30T09:00:00.000000000',
       '2022-06-30T10:00:00.000000000', '2022-06-30T11:00:00.000000000',
       '2022-06-30T12:00:00.000000000', '2022-06-30T13:00:00.000000000',
       '2022-06-30T14:00:00.000000000', '2022-06-30T15:00:00.000000000',
       '2022-06-30T16:00:00.000000000', '2022-06-30T17:00:00.000000000',
       '2022-06-30T18:00:00.000000000', '2022-06-30T19:00:00.000000000',
       '2022-06-30T20:00:00.000000000', '2022-06-30T21:00:00.000000000',
       '2022-06-30T22:00:00.000000000', '2022-06-30T23:00:00.000000000',
       '2022-07-01T00:00:00.000000000', '2022-07-01T01:00:00.000000000',
       '2022-07-01T02:00:00.000000000', '2022-07-01T03:00:00.000000000',
       '2022-07-01T04:00:00.000000000', '2022-07-01T05:00:00.000000000',
       '2022-07-01T06:00:00.000000000', '2022-07-01T07:00:00.000000000',
       '2022-07-01T08:00:00.000000000', '2022-07-01T09:00:00.000000000',
       '2022-07-01T10:00:00.000000000', '2022-07-01T11:00:00.000000000',
       '2022-07-01T12:00:00.000000000', '2022-07-01T13:00:00.000000000',
       '2022-07-01T14:00:00.000000000', '2022-07-01T15:00:00.000000000',
       '2022-07-01T16:00:00.000000000', '2022-07-01T17:00:00.000000000',
       '2022-07-01T18:00:00.000000000', '2022-07-01T19:00:00.000000000',
       '2022-07-01T20:00:00.000000000', '2022-07-01T21:00:00.000000000',
       '2022-07-01T22:00:00.000000000', '2022-07-01T23:00:00.000000000',
       '2022-07-30T00:00:00.000000000', '2022-07-30T01:00:00.000000000',
       '2022-07-30T02:00:00.000000000', '2022-07-30T03:00:00.000000000',
       '2022-07-30T04:00:00.000000000', '2022-07-30T05:00:00.000000000',
       '2022-07-30T06:00:00.000000000', '2022-07-30T07:00:00.000000000',
       '2022-07-30T08:00:00.000000000', '2022-07-30T09:00:00.000000000',
       '2022-07-30T10:00:00.000000000', '2022-07-30T11:00:00.000000000',
       '2022-07-30T12:00:00.000000000', '2022-07-30T13:00:00.000000000',
       '2022-07-30T14:00:00.000000000', '2022-07-30T15:00:00.000000000',
       '2022-07-30T16:00:00.000000000', '2022-07-30T17:00:00.000000000',
       '2022-07-30T18:00:00.000000000', '2022-07-30T19:00:00.000000000',
       '2022-07-30T20:00:00.000000000', '2022-07-30T21:00:00.000000000',
       '2022-07-30T22:00:00.000000000', '2022-07-30T23:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2022-06-01 ... 2022-07-30T23:00:00
Attributes:
    long_name:  time

Because the time index is not continuous, pd.infer_freq returns None; pd.date_range then falls back to its default daily frequency, so the time shift becomes minus 12 hours instead of minus 30 minutes.
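A minimal pandas-only reproduction, rebuilding the gapped index from the four dates in the output above:

```python
import pandas as pd

# Rebuild the gapped time index returned by the request above:
# 24 hourly stamps on each of the four dates.
parts = [
    pd.date_range(day, periods=24, freq="60min")
    for day in ["2022-06-01", "2022-06-30", "2022-07-01", "2022-07-30"]
]
gapped = parts[0].append(parts[1:])

# No single frequency fits an index with gaps.
assert pd.infer_freq(gapped) is None

# pd.date_range falls back to daily frequency when freq is None,
# so the shift silently becomes -12 hours instead of -30 minutes.
time_shift = (
    -1
    / 2
    * pd.to_timedelta(
        pd.date_range("1970-01-01", periods=1, freq=pd.infer_freq(gapped)).freq
    )
)
assert time_shift == pd.Timedelta(hours=-12)
```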


zoltanmaric avatar zoltanmaric commented on July 2, 2024

I think the fix should be as simple as always setting the time shift to minus 30 minutes (proposed diff).

The get_data function of era5.py already hard-codes reanalysis-era5-single-levels as the product:

retrieval_params = {
    "product": "reanalysis-era5-single-levels",
    "area": _area(coords),
    "chunks": cutout.chunks,
    "grid": [cutout.dx, cutout.dy],
    "tmpdir": tmpdir,
    "lock": lock,
}

To my understanding, this product's resolution is always hourly (see docs), so there's no reason to attempt to infer a different frequency.
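A sketch of what the proposed simplification boils down to (pandas only; the constant replaces the frequency inference):

```python
import pandas as pd

# Sketch of the proposed fix: ERA5 single-levels data is hourly and
# hour-ending labelled, so the centre of the averaging interval is always
# 30 minutes before the timestamp, independent of the requested sampling.
time_shift = pd.Timedelta(minutes=-30)

# The value labelled "14:00" is the mean over 13:00-14:00, so the solar
# position should be evaluated at 13:30.
label = pd.Timestamp("2022-06-30 14:00")
assert label + time_shift == pd.Timestamp("2022-06-30 13:30")
```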

What do you think @euronion @FabianHofmann ?


zoltanmaric avatar zoltanmaric commented on July 2, 2024

Comparing ERA5 Values Requested at Different Time Samplings

Here's a comparison of values received from ERA5 at 1h, 2h, 3h, 4h, and 6h sampling:

[table image: ssr, ssrd, tisr, and fdir values at each sampling rate, side by side]

Code to Generate the Above Table

import xarray
import atlite.datasets.era5 as era5
from dask.utils import SerializableLock
import functools

# Create lists of hours like [00:00, 01:00, 02:00, ...], sampled
# every hour, every 2 hours, etc.
time_sampling = {}
for rate in [1, 2, 3, 4, 6]:
    time_sampling[rate] = [f"{hour:02}:00" for hour in range(0, 24, rate)]

retrieval_params = {
    'product': 'reanalysis-era5-single-levels',
    'area': [57.0, -0.5, 56.0, 0.5],
    'chunks': {'time': 100},
    'grid': [0.25, 0.25],
    'tmpdir': '/tmp',
    'lock': SerializableLock(),
    'year': '2013',
    'month': [1],
    'day': [1]
}

param_sets = {hour: {**retrieval_params, **{'time': time}} for hour, time in time_sampling.items()}


def retrieve_data_for_single_raster(params: dict) -> "pandas.DataFrame":
    variable = [
        "surface_net_solar_radiation",
        "surface_solar_radiation_downwards",
        "toa_incident_solar_radiation",
        "total_sky_direct_solar_radiation_at_surface",
    ]
    ds = era5.retrieve_data(variable=variable, **params)
    # Pick a single grid cell; to_dataframe() returns a pandas DataFrame
    return (
        ds.sel(latitude=56, longitude=0)
        .load()
        .to_dataframe()[["ssr", "ssrd", "tisr", "fdir"]]
    )


# Retrieve ERA5 data for each different time sampling
raw_ds = {hour: retrieve_data_for_single_raster(params) for hour, params in param_sets.items()}


def join_dfs(left_sampling_and_df, right_sampling_and_df):
    left_sampling, left_df = left_sampling_and_df
    right_sampling, right_df = right_sampling_and_df

    suffix = f"_{right_sampling}h"

    return left_sampling, left_df.join(right_df, on='time', how='left', rsuffix=suffix)


# Merge all data into a single dataframe to show differences in values
# for each hour next to each other
_, merged = functools.reduce(join_dfs, raw_ds.items())


samplings_compared = merged.sort_index(axis='columns')\
    .query('time.dt.hour > 7 and time.dt.hour < 18')\
    .astype(float)\
    .round(decimals=2)

# Remove date part
samplings_compared.index = samplings_compared.index.time

samplings_compared.to_csv('/tmp/samplings_compared.csv')

While this does show that the time shift is not proportional to the sampling, there is an additional twist. The values for 2h, 3h, 4h, and 6h sampling appear to be equal, but the values for 1h sampling differ slightly (by less than 1% at noon).

It would be interesting to find out why that is, but as far as this issue is concerned, I still think it is more appropriate to always shift the time by 30 minutes than by half of the sampling interval.

