satip's Introduction

Satip

Satip is a library for satellite image processing that provides the functionality needed to retrieve and store EUMETSAT data.


Installation

To install the satip library please run:

pip install satip

Or if you're working in the development environment you can run the following from the directory root:

pip install -e .

Conda

Or, if you want to use conda from a cloned Satip repository:

conda env create -f environment.yml
conda activate satip
pip install -e .

If you plan to work on the development of Satip then also consider installing these development tools:

conda install pytest flake8 jedi mypy black pre-commit
pre-commit install

Development Environment

In order to contribute:

  • It is recommended that you use a Linux-based OS, which is currently used for all CI/CD testing, production, and development.
  • At the time of writing (21-Dec-23), the Python version used is 3.11, with work under way to update to Python 3.12. This is subject to change over time.

Operation

Getting your own API key

In order to contribute to development or just test-run some scripts, you will need your own EUMETSAT API key. Please follow these steps:

  1. Go to https://eoportal.eumetsat.int and register an account.
  2. Log in and go to https://data.eumetsat.int/ to check the available data services. From there, go to your profile and choose the option "API key", or go to https://api.eumetsat.int/api-key/ directly.
  3. Make sure that you add the key and secret to your user's environment variables.

Downloading EUMETSAT Data

The following command will download the last 2 hours of RSS imagery into NetCDF files at the specified location:

python satip/app.py --api-key=<EUMETSAT API Key> --api-secret=<EUMETSAT API Secret> --save-dir="/path/to/saving/files/" --history="2 hours"

To download more historical data, the command below will download the native files, compress them with bz2, and save them into a subdirectory.

python satip/get_raw_eumetsat_data.py --user-key=<EUMETSAT API Key> --user-secret=<EUMETSAT API Secret>

Converting Native files to Zarr

scripts/convert_native_to_zarr.py converts EUMETSAT .nat files to Zarr datasets, using very mild lossy JPEG-XL compression. (JPEG-XL is the "new kid on the block" of image compression algorithms.) JPEG-XL makes the files about a quarter the size of the equivalent bz2-compressed files, whilst the images are visually indistinguishable.

JPEG-XL understands float32 values in the range [0, 1] but cannot represent NaNs, so NaNs are encoded as the value 0.025 and all "real" values are in the range [0.075, 1]. We leave a gap between "NaNs" and "real values" because there is very slight "ringing" around areas of constant value (see this comment for more details). Use satip.jpeg_xl_float_with_nans.JpegXlFloatWithNaNs to decode the satellite data. This class will reconstruct the NaNs and rescale the data to the range [0, 1].
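The NaN handling described above can be illustrated with a small NumPy sketch. This is not the code in satip.jpeg_xl_float_with_nans; the function names, the linear mapping, and the decision threshold (the midpoint of the gap) are assumptions made for illustration. Only the values 0.025 and 0.075 come from the description.

```python
import numpy as np

NAN_VALUE = 0.025    # value that stands in for NaN (from the description)
LOWER_BOUND = 0.075  # smallest encoded "real" value (from the description)
# Assumption: anything below the midpoint of the gap is decoded as NaN, so
# slight "ringing" around the NaN value does not leak into the real values.
NAN_THRESHOLD = (NAN_VALUE + LOWER_BOUND) / 2


def encode(data: np.ndarray) -> np.ndarray:
    """Map values in [0, 1] to [0.075, 1] and NaNs to 0.025."""
    data = np.asarray(data, dtype=np.float32)
    encoded = LOWER_BOUND + data * (1.0 - LOWER_BOUND)
    return np.where(np.isnan(data), NAN_VALUE, encoded).astype(np.float32)


def decode(encoded: np.ndarray) -> np.ndarray:
    """Invert encode(): restore the NaNs and rescale real values to [0, 1]."""
    encoded = np.asarray(encoded, dtype=np.float32)
    decoded = (encoded - LOWER_BOUND) / (1.0 - LOWER_BOUND)
    decoded = np.clip(decoded, 0.0, 1.0)
    return np.where(encoded < NAN_THRESHOLD, np.nan, decoded).astype(np.float32)
```

In a real pipeline the lossy compression step would sit between encode() and decode(); the gap between 0.025 and 0.075 is what lets the decoder still tell the two kinds of pixel apart afterwards.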

Running in Production

The live service uses app.py as the entrypoint for running the live data download for OCF's forecasting service, and has a few configuration options, configurable by command line argument or environment variable.

--api-key or API_KEY is the EUMETSAT API key

--api-secret or API_SECRET is the EUMETSAT API secret

--save-dir or SAVE_DIR is the top-level directory in which to save the output files. A latest subfolder will be added to that directory to contain the latest data

--history or HISTORY is the number of history timesteps to use in the latest.zarr files

--db-url or DB_URL is the URL to the database to save to when a run has finished

--use-rescaler or USE_RESCALER sets whether or not to rescale the satellite data to between 0 and 1 when saving to disk. It exists primarily for backwards compatibility with the current production models; all new training and production Zarrs should use the rescaled data.
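The "command line argument or environment variable" pattern can be sketched with the standard library. This is an illustration only, not how app.py is implemented: the build_parser helper is hypothetical, and the rule that an explicit command-line value takes precedence over the environment variable is an assumption.

```python
import argparse
import os


def build_parser(env=os.environ) -> argparse.ArgumentParser:
    """Build a parser whose defaults are read from environment variables."""
    parser = argparse.ArgumentParser(description="Hypothetical app.py-style options")
    # Each default comes from the matching environment variable, so a value
    # given on the command line overrides the environment (an assumption).
    parser.add_argument("--api-key", default=env.get("API_KEY"))
    parser.add_argument("--api-secret", default=env.get("API_SECRET"))
    parser.add_argument("--save-dir", default=env.get("SAVE_DIR"))
    parser.add_argument("--history", default=env.get("HISTORY"))
    parser.add_argument("--db-url", default=env.get("DB_URL"))
    return parser
```

For example, build_parser().parse_args(["--history", "2 hours"]) would take the history from the command line and everything else from the environment.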

Testing

To run tests, simply run pytest . from the root of the repository. To generate the test plots, run python scripts/generate_test_plots.py.

Environment Variables

Some tests require environment variables to be set; in production, these values are passed in as command line arguments. They are as follows:

  • EUMETSAT_USER_KEY: the EUMETSAT API key
  • EUMETSAT_USER_SECRET: the EUMETSAT API secret

These can be added using the export command in your shell environment. To add these permanently, the export statements can be added to the configuration file for the shell environment (e.g. "~/.bashrc" if using bash).
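Before running the tests it can be handy to confirm that both variables are visible to Python. The following is a minimal sketch; the missing_variables helper is hypothetical and not part of Satip.

```python
import os

REQUIRED_VARIABLES = ("EUMETSAT_USER_KEY", "EUMETSAT_USER_SECRET")


def missing_variables(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("EUMETSAT credentials found.")
```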

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Jacob Bieker 💻
  • Jack Kelly 💻
  • Ayrton Bourn 💻
  • Laurence Watson 💻
  • Notger Heinz 📖
  • Peter Dudfield 📖
  • Azah Norbline 💻
  • Tom Pughe 💻
  • Zhenbang Feng 💻
  • jsbaasi 💻
  • Suleman Karigar 💻
  • Richa 💻
  • Nathan Simpson 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

satip's People

Contributors

14richa, allcontributors[bot], aryanbhosale, ayrtonb, dependabot[bot], devsjc, jackkelly, jacobbieker, jacqueline-j, jasonfenggit, jsbaasi, ludobegins, mduffin95, norbline, notger, peterdudfield, pre-commit-ci[bot], rabscuttler, simlmx, suleman1412, tompughe

satip's Issues

Add Unit Tests

Detailed Description

We should add more unit tests for the repo, downloading, etc.

Context

Unit tests are good

Possible Implementation

Possibly pass in the key and secret through GH Repo secrets? Then it could try downloading some data, etc.

Latest data

Detailed Description

  • Create app.py that saves latest satellite file to a directory.
  • add option of where to save the directory
  • add options for the channels to save
  • add option for how much history of satellite images to get, for example 2 hours.
  • save both satellite and hrv
  • good to test that it can save to s3
  • save as netcdf file, latest.netcdf and {datetimenow}.netcdf

Context

Need this for nowcasting
example of nwp is here

Update to readme

Detailed Description

  • provide local command to collect live satellite data
  • add section on environment variables
  • how to run tests
  • add codecov badge
  • Give some examples (e.g. downloading EUMETSAT data)

Context

Good to have good documentation

Create Zarrs of entire geographical extent of RSS imagery

Detailed Description

Create Zarr files that cover the whole geographical extent of the RSS image, so we can pre-train on satellite imagery across all of Europe, not just the UK. We would have to deal with the NaNs for pixels that are in space, which would currently result in every image being discarded for having NaNs. Possibly a mask or something?

Context

Pre-training can help models learn, and pre-training works best with as many examples as possible. By having the whole geographical extent of the RSS image ready to use, we have a lot more variety of geographies, climates, etc. for the models to learn how clouds move, and 10-11 years of data to train on.

Possible Implementation

Add a mask for dealing with the NaNs caused by space, and just don't crop the image when creating the Zarr. We would probably also want to figure out #39 first, so that if we want the cloud masks included we can do it with this. I would assume that this will take quite a while to create.

Write script to convert `.nat` EUMETSAT files to Zarr intermediate

Features of this script

  • Takes command-line arguments for directory of .nat files; and target directory for the Zarr.
  • The script should be able to append newly downloaded data to an existing Zarr store, so we can incrementally grow the Zarr store whenever we download new .nat data: When the script starts, it checks through all the .nat files (recursively), and checks through the existing Zarr, and only converts data which is present in the .nat files but absent in the Zarr. I think you can append to Zarr stores using something like xr.Dataset.to_zarr(mode='a', append_dim='time'). Definitely have a look at the xarray docs on appending to Zarr. It's possible that appending to Zarr only works correctly if data is appended in order, but I'm not certain! (Zarr's fragility when it comes to appending data might be one strong argument for swapping to using GeoTIFF or individual NetCDF files per EUMETSAT timestep, instead of Zarr... But let's try to get Zarr to work because it does seem to enable the fastest reads).
  • Save to Zarr as int16, using only 10 bits per pixel per channel. i.e., re-scale each channel to [0, 1023], and save in np.int16 dtype. This results in really good compression (better than using float16), and is probably more precise (see the raw benchmark results here). I benchmarked a bunch of compression algorithms: compressor = numcodecs.Blosc(cname="zstd", clevel=5) was the best setting I found. If we want to be really ambitious we could try compressing with a lossless, modern image compression algorithm like AVIF or WebP. Some more notes about these options are in #13. But, for now, zstd is probably fine.
  • Save EUMETSAT metadata into the Zarr stores? (Maybe this isn't very important given that we currently have no plans to use the EUMETSAT metadata!)
  • Discard any images with NaNs.
  • Optionally: #15
  • Optionally only saves a geographical subset of the data (perhaps with some handy human-readable shortcuts like "UK")
  • Use all the CPU cores.
  • Each Zarr chunk should probably be at least 500 kBytes on disk. Any smaller and it becomes really inefficient to load small files! We probably want 1 chunk per timestep (so we can efficiently read any combination of timesteps). Or maybe 1 chunk for a small number of timesteps (4?). One chunk could hold all satellite channels, given that we usually use all satellite channels.
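The 10-bit rescaling step from the list above could look roughly like this. It is a sketch under stated assumptions: the function name and the per-channel lower and upper bounds are hypothetical (the issue does not say how the bounds are chosen), and images containing NaNs are rejected here to mirror the "discard any images with NaNs" item.

```python
import numpy as np


def rescale_to_10_bits(channel: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Linearly map [lower, upper] to the integers 0..1023, stored as int16."""
    if np.isnan(channel).any():
        # NaN has no integer representation, so such images must be dropped.
        raise ValueError("image contains NaNs and should be discarded")
    scaled = (channel - lower) / (upper - lower)  # now in [0, 1]
    scaled = np.clip(scaled, 0.0, 1.0) * 1023.0   # now in [0, 1023]
    return np.round(scaled).astype(np.int16)
```

Values outside the chosen bounds are clipped rather than wrapped, so a poorly chosen range loses detail at the extremes but never produces negative or overflowing integers.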

Related:

Bug with time range on backfill pipeline

NING - 2021-03-20 15:28:12,702 - INFO - ********** Download Manager Initialised **************
[2021-03-20 15:28:12,703] {logging_mixin.py:112} INFO - [2021-03-20 15:28:12,702] {eumetsat.py:327} INFO - ********** Download Manager Initialised **************
[2021-03-20 15:28:12,791] {logging_mixin.py:112} INFO - Earliest 2020-01-09T00:00:00, latest 2020-01-13T00:00:00
[2021-03-20 15:28:12,798] {logging_mixin.py:112} WARNING - 2021-03-20 15:28:12 - dagster - ERROR - download_missing_data_pipeline - manual__2021-03-20T15:27:52.932900+00:00 - 17566 - download_missing_eumetsat_files.compute - STEP_FAILURE - Execution of step "download_missing_eumetsat_files.compute" failed.

ipypb.progressbar.ProgressBarInputError: Please specify the total number of iterations

  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/errors.py", line 180, in user_code_error_boundary
    yield
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 475, in _user_event_sequence_for_step_compute_fn
    for event in iterate_with_context(raise_interrupts_immediately, gen):
  File "/srv/airflow/lib/python3.7/site-packages/dagster/utils/__init__.py", line 443, in iterate_with_context
    next_output = next(iterator)
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/compute.py", line 105, in _execute_core_compute
    for step_output in _yield_compute_results(compute_context, inputs, compute_fn):
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/compute.py", line 76, in _yield_compute_results
    for event in user_event_sequence:
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/definitions/decorators/solid.py", line 230, in compute
    for item in result:
  File "/srv/airflow/lib/python3.7/site-packages/satip/backfill.py", line 54, in download_missing_eumetsat_files
    missing_datasets = io.identifying_missing_datasets(start_date, end_date)
  File "/srv/airflow/lib/python3.7/site-packages/satip/io.py", line 189, in identifying_missing_datasets
    for i in track(range(len(month_split)-1)):
  File "/srv/airflow/lib/python3.7/site-packages/ipypb/progressbar.py", line 118, in __init__
    raise ProgressBarInputError('Please specify the total number of iterations')
[2021-03-20 15:28:12,879] {taskinstance.py:1145} ERROR - step failed with error: (ProgressBarInputError) - ipypb.progressbar.ProgressBarInputError: Please specify the total number of iterations

Stack Trace: 
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/errors.py", line 180, in user_code_error_boundary
    yield
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 475, in _user_event_sequence_for_step_compute_fn
    for event in iterate_with_context(raise_interrupts_immediately, gen):
  File "/srv/airflow/lib/python3.7/site-packages/dagster/utils/__init__.py", line 443, in iterate_with_context
    next_output = next(iterator)
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/compute.py", line 105, in _execute_core_compute
    for step_output in _yield_compute_results(compute_context, inputs, compute_fn):
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/compute.py", line 76, in _yield_compute_results
    for event in user_event_sequence:
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/definitions/decorators/solid.py", line 230, in compute
    for item in result:
  File "/srv/airflow/lib/python3.7/site-packages/satip/backfill.py", line 54, in download_missing_eumetsat_files
    missing_datasets = io.identifying_missing_datasets(start_date, end_date)
  File "/srv/airflow/lib/python3.7/site-packages/satip/io.py", line 189, in identifying_missing_datasets
    for i in track(range(len(month_split)-1)):
  File "/srv/airflow/lib/python3.7/site-packages/ipypb/progressbar.py", line 118, in __init__
    raise ProgressBarInputError('Please specify the total number of iterations')
Traceback (most recent call last):
  File "/srv/airflow/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/srv/airflow/lib/python3.7/site-packages/dagster_airflow/vendor/python_operator.py", line 108, in execute
    return_value = self.execute_callable()
  File "/srv/airflow/lib/python3.7/site-packages/dagster_airflow/vendor/python_operator.py", line 113, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/srv/airflow/lib/python3.7/site-packages/dagster_airflow/operators/python_operator.py", line 10, in python_callable
    dagster_operator_parameters.invocation_args, ts, dag_run, **kwargs
  File "/srv/airflow/lib/python3.7/site-packages/dagster_airflow/operators/util.py", line 142, in invoke_steps_within_python_operator
    check_events_for_failures(events)
  File "/srv/airflow/lib/python3.7/site-packages/dagster_airflow/operators/util.py", line 16, in check_events_for_failures
    "step failed with error: %s" % event.event_specific_data.error.to_string()
airflow.exceptions.AirflowException: step failed with error: (ProgressBarInputError) - ipypb.progressbar.ProgressBarInputError: Please specify the total number of iterations

Stack Trace: 
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/errors.py", line 180, in user_code_error_boundary
    yield
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 475, in _user_event_sequence_for_step_compute_fn
    for event in iterate_with_context(raise_interrupts_immediately, gen):
  File "/srv/airflow/lib/python3.7/site-packages/dagster/utils/__init__.py", line 443, in iterate_with_context
    next_output = next(iterator)
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/compute.py", line 105, in _execute_core_compute
    for step_output in _yield_compute_results(compute_context, inputs, compute_fn):
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/execution/plan/compute.py", line 76, in _yield_compute_results
    for event in user_event_sequence:
  File "/srv/airflow/lib/python3.7/site-packages/dagster/core/definitions/decorators/solid.py", line 230, in compute
    for item in result:
  File "/srv/airflow/lib/python3.7/site-packages/satip/backfill.py", line 54, in download_missing_eumetsat_files
    missing_datasets = io.identifying_missing_datasets(start_date, end_date)
  File "/srv/airflow/lib/python3.7/site-packages/satip/io.py", line 189, in identifying_missing_datasets
    for i in track(range(len(month_split)-1)):
  File "/srv/airflow/lib/python3.7/site-packages/ipypb/progressbar.py", line 118, in __init__
    raise ProgressBarInputError('Please specify the total number of iterations')

[2021-03-20 15:28:12,880] {taskinstance.py:1202} INFO - Marking task as FAILED.dag_id=download_missing_data_pipeline, task_id=download_missing_eumetsat_files, execution_date=20210320T152752, start_date=20210320T152812, end_date=20210320T152812
[2021-03-20 15:28:22,000] {logging_mixin.py:112} INFO - [2021-03-20 15:28:22,000] {local_task_job.py:103} INFO - Task exited with return code 1

Add plots in Geostationary projection to `generate_test_plots.py`

Detailed Description

Modify scripts/generate_test_plots.py to produce an additional set of plots: Satellite imagery plotted in geostationary projection with coastlines overlaid, like this:

image

Context

The OSGB plot just shows the OSGB domain, which is just over Britain. The Geostationary plot shows the whole geographical extent of the imagery.

It'd be good to have both OSGB and Geostationary plots.

Possible Implementation

Something like this should work:

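import cartopy.crs as ccrs
import matplotlib.pyplot as plt

# `data_array` is assumed to be an xarray.DataArray with `time`, `x` and `y` coordinates.
projection = ccrs.Geostationary(central_longitude=9.5)
ax = plt.axes(projection=projection)
data_array.sel(time="2020-06-01 12:00").plot.pcolormesh(
    ax=ax,
    transform=projection,
    x="x",
    y="y",
    add_colorbar=False,
)
ax.coastlines()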

Satip fails with multiple processes because of rate-limiting

Describe the bug
When downloading EUMETSAT data with the download script using multiprocessing, the code crashes after a few days of data have been downloaded, as a result of rate-limiting from the Data Store.

To Reproduce
Steps to reproduce the behavior:

  1. Try running download script with a large (>4) number of processes
  2. Wait, and see error with the API returning a SUSPENDED value

Expected behavior
Satip to deal with the rate limiting without crashing.

Additional context
One option would be to back off from calling the API when that error is returned. A current hot-fix is to wait a random amount of time before calling download_date_range, to hopefully reduce the load enough, but that is probably not ideal.
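A back-off of this kind is commonly written as exponential backoff with jitter. The sketch below shows the general technique rather than a patch for Satip: RateLimitedError and call_with_backoff are hypothetical names, and how the SUSPENDED response would be turned into an exception is left open.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical error raised when the API reports rate-limiting."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff plus random jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the error
            # The delay doubles on each failed attempt, up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter stops parallel processes from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Each worker process would wrap its API call, for example call_with_backoff(lambda: manager.download_date_range(start, end)), where manager, start and end stand for whatever the download script already uses.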

Rename `stacked_eumetsat_data` to `data`; and rename `variable` to `channels`?

Detailed Description

To be consistent with our other Zarrs (such as NWP and GSP PV Zarrs)

Possible Implementation

Maybe when we re-compress the existing satellite zarrs using JPEG-XL, we can also rename these?

And then change Satip so it always uses data and channels?

Unless anyone thinks these new names are silly?! 🙂

Related to #62

Zarrs seem to be float64 when they should be int8 or int16

Describe the bug
After loading a cloudmask Zarr whose DataArrays were set as int8, calling print(zarr) shows that the stacked_eumetsat_data was float64.

To Reproduce
Steps to reproduce the behavior:

  1. Open any of the currently created Zarrs, or create a new one with satip and then load it.

Expected behavior
The dtype of the images to be either int8 or int16.

Additional context
Fixing this might also result in much smaller Zarrs, as the data would be in a smaller dtype.

Add support for Data Tailor

Detailed Description

The data tailor (API docs here: https://eumetsatspace.atlassian.net/wiki/spaces/DSDS/pages/564527165/Jupyter+Notebook+Using+the+Data+Tailor+REST+API) allows for subsetting and changing the output format, etc. for data from EUMETSAT.

Context

For a production service it would be nice to be able to get the data in a non-native format, and with different ROI, like just around the UK for example, which should speed up processing and inference.

Possible Implementation

The linked Jupyter notebook has nice examples.

RAM usage continually grows when running Zarr creation script

Describe the bug
The Zarr creation script slowly uses more and more RAM over time, resulting in OOM errors eventually, and crashing.

To Reproduce
Steps to reproduce the behavior:

  1. Run the Zarr script
  2. See error

Expected behavior
RAM usage stays the same once it starts running

Fix CI bug

Describe the bug
The current Python test GitHub Actions are failing

To Reproduce
See latest actions

Expected behavior
The CI tests should pass and upload coverage to Codecov

Add plotting data in CI step

Detailed Description

Have tests that download data, transform it, and plot it.

Context

There was an issue with the OSGB coordinates being off by a bit. By plotting them on every PR, along with any other plots that would be helpful, we can easily see whether what we are doing changes the coordinates or data in a way that makes them worse.

Possible Implementation

We would want plots for:

  • RSS Satellite Channel from Native File
  • Cloud Mask from GRIB file
  • Data Tailor RSS Imagery
  • Data Tailor Cloud Mask

Write "proper" datetimes into Zarr

Detailed Description

There's a quirk in xarray's Zarr implementation which means that, out-of-the-box, it corrupts datetimes if appending to the time dimension of an existing Zarr.

But there's a simple work-around (described in pydata/xarray#3942 (comment)) which means we can write "proper" datetimes into the Zarr (and append).

Possible Implementation

This is the function I've written for appending NWPs into a Zarr (maybe this function should go into a utils repo?!)

from pathlib import Path
from typing import Union

import xarray as xr


def append_to_zarr(dataset: xr.Dataset, zarr_path: Union[str, Path]):
    """Append dataset along `time` if the Zarr exists, otherwise create it."""
    zarr_path = Path(zarr_path)
    if zarr_path.exists():
        # The store already exists, so append along the time dimension.
        to_zarr_kwargs = dict(
            append_dim="time",
        )
    else:
        to_zarr_kwargs = dict(
            # Need to manually set the time units otherwise xarray defaults to using
            # units of *days* (and hence cannot represent sub-day temporal resolution), which corrupts
            # the `time` values when we append to Zarr.  See:
            # https://github.com/pydata/xarray/issues/5969 and
            # http://xarray.pydata.org/en/stable/user-guide/io.html#time-units
            encoding={
                'time': {
                    'units': 'nanoseconds since 1970-01-01'
                },
            },
        )

    dataset.to_zarr(zarr_path, **to_zarr_kwargs)

This definitely isn't urgent... this should probably wait until early 2022 (because it takes a while to re-create the satellite Zarrs!)

Encode sequences of satellite images using modern video compression like AV1

Video compression has developed a lot over recent years (driven by Netflix etc.)

Our sequences of satellite images and NWPs can be considered video sequences. There's lots of redundant information across frames. So, if we wanted to squish the data down as much as possible (e.g. for sharing with students; or for regularly sending to Lancium; or just for archiving many years of data without breaking the bank) then we might want to consider using video compression like AV1 to compress our satellite data and/or NWP data.

ffmpeg supports AV1 encoding, including lossless, 10-bit, and 12-bit

And ffmpeg-python supports moving data between numpy arrays and ffmpeg.

If we really wanted to, we could probably write a numcodecs-like compression library to allow us to use ffmpeg to compress stuff, and still save into NetCDF / Zarr.

In terms of pre-prepared batches, it may be far easier to save each example as a standard video file (rather than trying to use AV1 within NetCDF... e.g. save as a sequence of TIFFs, and then ask ffmpeg to convert those TIFFs to a video file compressed using AV1). Which we can do now that we're saving each modality separately :)

This is not a priority, of course!

Twitter discussion.

Issues with satellite data

These are data issues found in the intermediate Zarr.

Check if these data issues are in the native data

  • VIS006 (and other chans?) is all zeros (or has a sizable chunk of zeros) at:
    • 2020-05-18 15:25
    • 2020-05-24 11:20
    • 2020-06-01 07:20
    • 2020-06-04 10:00
    • 2020-12-28 13:10 and 13:20
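A simple way to scan for more timesteps like these is to measure the fraction of zero-valued pixels in each image. This is a rough sketch: the function names and the 0.5 threshold are hypothetical, and images is assumed to be an array with time as its first axis.

```python
import numpy as np


def fraction_of_zeros(image: np.ndarray) -> float:
    """Return the fraction of pixels in the image that are exactly zero."""
    return float(np.mean(image == 0))


def find_suspect_timesteps(images: np.ndarray, threshold: float = 0.5) -> list:
    """Return indices along axis 0 whose image is more than `threshold` zeros."""
    return [i for i, image in enumerate(images) if fraction_of_zeros(image) > threshold]
```

Running the same check on the native files and on the intermediate Zarr would show whether the zeros are already present in the source data.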

Documentation on openclimatefix.org out of date after satip simplification

Describe the bug

The satip documentation on openclimatefix.org is out of date and doesn't work since the radical Satip simplification in #7.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://openclimatefix.org/Satip/102_reprojecting
  2. Follow the instructions

Expected behavior

Online documentation should be current and up-to-date.

Actual behaviour

ImportError: cannot import name 'reproj' from 'satip' (/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/satip/__init__.py)

Additional context

openclimatefix looks great, but it seems satip is now reduced to a thin EUMETSAT Data Store wrapper? Which seems reasonable, since we have pytroll with satpy and pyresample for satellite data processing and reprojecting. Maybe consider renaming the package?

rename x --> x_osgb

Detailed Description

Rename x to x_osgb and y to y_osgb

Context

Good to use the same variable names across different repos
linked with openclimatefix/nowcasting_dataset#558

Possible Implementation

just rename zarr file, rather than remake all data

Fix pydocstyle and flake8 errors

Describe the bug
pydocstyle and flake8 raise a bunch of errors:

To Reproduce

  1. pip install pre-commit flake8 pydocstyle
  2. pre-commit install
  3. pre-commit run --all-files

Here's the error in full

(satip) jack@jack-NUC:~/dev/ocf/Satip$ pre-commit run --all-files
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...............................................................Passed
Debug Statements (Python)................................................Passed
Detect Private Key.......................................................Passed
pydocstyle...............................................................Failed
- hook id: pydocstyle
- exit code: 1

satip/utils.py:1 at module level:
        D100: Missing docstring in public module
satip/utils.py:199 in public function `convert_scene_to_dataarray`:
        D103: Missing docstring in public function
satip/utils.py:256 in public function `save_dataset_to_zarr`:
        D417: Missing argument descriptions in the docstring (argument(s) channel_chunk_size, dtype are missing descriptions in 'save_dataset_to_zarr' docstring)
satip/utils.py:342 in public function `create_markdown_table`:
        D205: 1 blank line required between summary line and description (found 0)
satip/utils.py:383 in public function `set_up_logging`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:1 at module level:
        D100: Missing docstring in public module
satip/download.py:65 in public function `download_eumetsat_data`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:138 in public function `download_time_range`:
        D103: Missing docstring in public function
satip/download.py:185 in public function `sanity_check_files_and_move_to_directory`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:237 in public function `process_rss_images`:
        D103: Missing docstring in public function
satip/download.py:330 in public function `eumetsat_native_filename_to_datetime`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:330 in public function `eumetsat_native_filename_to_datetime`:
        D209: Multi-line docstring closing quotes should be on a separate line
satip/download.py:335 in public function `eumetsat_cloud_name_to_datetime`:
        D103: Missing docstring in public function
satip/download.py:339 in public function `get_basename`:
        D103: Missing docstring in public function
satip/download.py:346 in public function `get_missing_datetimes_from_list_of_files`:
        D205: 1 blank line required between summary line and description (found 0)
satip/compression.py:1 at module level:
        D100: Missing docstring in public module
satip/compression.py:7 in public class `Compressor`:
        D101: Missing docstring in public class
satip/compression.py:8 in public method `__init__`:
        D107: Missing docstring in __init__
satip/geospatial.py:1 at module level:
        D100: Missing docstring in public module
satip/intermediate.py:1 at module level:
        D100: Missing docstring in public module
satip/intermediate.py:28 in public function `split_per_month`:
        D417: Missing argument descriptions in the docstring (argument(s) hrv_zarr_path, temp_directory are missing descriptions in 'split_per_month' docstring)
satip/intermediate.py:111 in public function `wrapper`:
        D103: Missing docstring in public function
satip/intermediate.py:126 in public function `cloudmask_split_per_month`:
        D417: Missing argument descriptions in the docstring (argument(s) temp_directory are missing descriptions in 'cloudmask_split_per_month' docstring)
satip/intermediate.py:191 in public function `cloudmask_wrapper`:
        D103: Missing docstring in public function
satip/intermediate.py:206 in public function `create_or_update_zarr_with_cloud_mask_files`:
        D417: Missing argument descriptions in the docstring (argument(s) temp_directory are missing descriptions in 'create_or_update_zarr_with_cloud_mask_files' docstring)
satip/intermediate.py:262 in public function `create_or_update_zarr_with_native_files`:
        D417: Missing argument descriptions in the docstring (argument(s) hrv_zarr_path, temp_directory are missing descriptions in 'create_or_update_zarr_with_native_files' docstring)
satip/intermediate.py:321 in public function `pool_init`:
        D103: Missing docstring in public function
satip/intermediate.py:326 in public function `native_wrapper`:
        D103: Missing docstring in public function
satip/eumetsat.py:1 at module level:
        D100: Missing docstring in public module
satip/eumetsat.py:55 in public function `request_access_token`:
        D417: Missing argument descriptions in the docstring (argument(s) user_key, user_secret are missing descriptions in 'request_access_token' docstring)
satip/eumetsat.py:90 in public function `query_data_products`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:90 in public function `query_data_products`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, num_features, product_id, start_date, start_index are missing descriptions in 'query_data_products' docstring)
satip/eumetsat.py:131 in public function `identify_available_datasets`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:131 in public function `identify_available_datasets`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, log, product_id, start_date are missing descriptions in 'identify_available_datasets' docstring)
satip/eumetsat.py:184 in public function `dataset_id_to_link`:
        D103: Missing docstring in public function
satip/eumetsat.py:192 in public function `json_extract`:
        D103: Missing docstring in public function
satip/eumetsat.py:202 in public function `check_valid_request`:
        D417: Missing argument descriptions in the docstring (argument(s) r are missing descriptions in 'check_valid_request' docstring)
satip/eumetsat.py:223 in public class `DownloadManager`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:237 in public method `__init__`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:237 in public method `__init__`:
        D417: Missing argument descriptions in the docstring (argument(s) data_dir, log_fp, logger_name, user_key, user_secret are missing descriptions in '__init__' docstring)
satip/eumetsat.py:279 in public method `request_access_token`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:279 in public method `request_access_token`:
        D417: Missing argument descriptions in the docstring (argument(s) user_key, user_secret are missing descriptions in 'request_access_token' docstring)
satip/eumetsat.py:303 in public method `download_single_dataset`:
        D417: Missing argument descriptions in the docstring (argument(s) data_link are missing descriptions in 'download_single_dataset' docstring)
satip/eumetsat.py:324 in public method `download_date_range`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:324 in public method `download_date_range`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, product_id, start_date are missing descriptions in 'download_date_range' docstring)
satip/eumetsat.py:339 in public method `download_datasets`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:339 in public method `download_datasets`:
        D417: Missing argument descriptions in the docstring (argument(s) datasets, product_id are missing descriptions in 'download_datasets' docstring)
satip/eumetsat.py:380 in public method `download_tailored_date_range`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:380 in public method `download_tailored_date_range`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, file_format, product_id, projection, roi, start_date are missing descriptions in 'download_tailored_date_range' docstring)
satip/eumetsat.py:454 in private method `_download_single_tailored_dataset`:
        D417: Missing argument descriptions in the docstring (argument(s) product_id are missing descriptions in '_download_single_tailored_dataset' docstring)
satip/eumetsat.py:543 in public function `get_dir_size`:
        D103: Missing docstring in public function
satip/eumetsat.py:563 in public function `eumetsat_filename_to_datetime`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:563 in public function `eumetsat_filename_to_datetime`:
        D209: Multi-line docstring closing quotes should be on a separate line
satip/__init__.py:1 at module level:
        D104: Missing docstring in public package
satip/download.py:1 at module level:
        D100: Missing docstring in public module
satip/download.py:65 in public function `download_eumetsat_data`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:138 in public function `download_time_range`:
        D103: Missing docstring in public function
satip/download.py:185 in public function `sanity_check_files_and_move_to_directory`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:237 in public function `process_rss_images`:
        D103: Missing docstring in public function
satip/download.py:330 in public function `eumetsat_native_filename_to_datetime`:
        D205: 1 blank line required between summary line and description (found 0)
satip/download.py:330 in public function `eumetsat_native_filename_to_datetime`:
        D209: Multi-line docstring closing quotes should be on a separate line
satip/download.py:335 in public function `eumetsat_cloud_name_to_datetime`:
        D103: Missing docstring in public function
satip/download.py:339 in public function `get_basename`:
        D103: Missing docstring in public function
satip/download.py:346 in public function `get_missing_datetimes_from_list_of_files`:
        D205: 1 blank line required between summary line and description (found 0)
scripts/move_files.py:1 at module level:
        D100: Missing docstring in public module
satip/utils.py:1 at module level:
        D100: Missing docstring in public module
satip/utils.py:199 in public function `convert_scene_to_dataarray`:
        D103: Missing docstring in public function
satip/utils.py:256 in public function `save_dataset_to_zarr`:
        D417: Missing argument descriptions in the docstring (argument(s) channel_chunk_size, dtype are missing descriptions in 'save_dataset_to_zarr' docstring)
satip/utils.py:342 in public function `create_markdown_table`:
        D205: 1 blank line required between summary line and description (found 0)
satip/utils.py:383 in public function `set_up_logging`:
        D205: 1 blank line required between summary line and description (found 0)
satip/compression.py:1 at module level:
        D100: Missing docstring in public module
satip/compression.py:7 in public class `Compressor`:
        D101: Missing docstring in public class
satip/compression.py:8 in public method `__init__`:
        D107: Missing docstring in __init__
satip/geospatial.py:1 at module level:
        D100: Missing docstring in public module
satip/intermediate.py:1 at module level:
        D100: Missing docstring in public module
satip/intermediate.py:28 in public function `split_per_month`:
        D417: Missing argument descriptions in the docstring (argument(s) hrv_zarr_path, temp_directory are missing descriptions in 'split_per_month' docstring)
satip/intermediate.py:111 in public function `wrapper`:
        D103: Missing docstring in public function
satip/intermediate.py:126 in public function `cloudmask_split_per_month`:
        D417: Missing argument descriptions in the docstring (argument(s) temp_directory are missing descriptions in 'cloudmask_split_per_month' docstring)
satip/intermediate.py:191 in public function `cloudmask_wrapper`:
        D103: Missing docstring in public function
satip/intermediate.py:206 in public function `create_or_update_zarr_with_cloud_mask_files`:
        D417: Missing argument descriptions in the docstring (argument(s) temp_directory are missing descriptions in 'create_or_update_zarr_with_cloud_mask_files' docstring)
satip/intermediate.py:262 in public function `create_or_update_zarr_with_native_files`:
        D417: Missing argument descriptions in the docstring (argument(s) hrv_zarr_path, temp_directory are missing descriptions in 'create_or_update_zarr_with_native_files' docstring)
satip/intermediate.py:321 in public function `pool_init`:
        D103: Missing docstring in public function
satip/intermediate.py:326 in public function `native_wrapper`:
        D103: Missing docstring in public function
scripts/convert_cloudmask_to_zarr.py:1 at module level:
        D100: Missing docstring in public module
scripts/convert_cloudmask_to_zarr.py:30 in public function `create_eumetsat_zarr`:
        D103: Missing docstring in public function
scripts/get_raw_eumetsat_data.py:1 at module level:
        D100: Missing docstring in public module
scripts/get_raw_eumetsat_data.py:23 in public function `validate_date`:
        D103: Missing docstring in public function
scripts/get_raw_eumetsat_data.py:96 in public function `download_sat_files`:
        D103: Missing docstring in public function
setup.py:1 at module level:
        D100: Missing docstring in public module
scripts/generate_test_plots.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
scripts/generate_test_plots.py:32 in public function `plot_tailored`:
        D103: Missing docstring in public function
scripts/generate_test_plots.py:73 in public function `plot_dataset`:
        D103: Missing docstring in public function
satip/intermediate.py:1 at module level:
        D100: Missing docstring in public module
satip/intermediate.py:28 in public function `split_per_month`:
        D417: Missing argument descriptions in the docstring (argument(s) hrv_zarr_path, temp_directory are missing descriptions in 'split_per_month' docstring)
satip/intermediate.py:111 in public function `wrapper`:
        D103: Missing docstring in public function
satip/intermediate.py:126 in public function `cloudmask_split_per_month`:
        D417: Missing argument descriptions in the docstring (argument(s) temp_directory are missing descriptions in 'cloudmask_split_per_month' docstring)
satip/intermediate.py:191 in public function `cloudmask_wrapper`:
        D103: Missing docstring in public function
satip/intermediate.py:206 in public function `create_or_update_zarr_with_cloud_mask_files`:
        D417: Missing argument descriptions in the docstring (argument(s) temp_directory are missing descriptions in 'create_or_update_zarr_with_cloud_mask_files' docstring)
satip/intermediate.py:262 in public function `create_or_update_zarr_with_native_files`:
        D417: Missing argument descriptions in the docstring (argument(s) hrv_zarr_path, temp_directory are missing descriptions in 'create_or_update_zarr_with_native_files' docstring)
satip/intermediate.py:321 in public function `pool_init`:
        D103: Missing docstring in public function
satip/intermediate.py:326 in public function `native_wrapper`:
        D103: Missing docstring in public function
satip/eumetsat.py:1 at module level:
        D100: Missing docstring in public module
satip/eumetsat.py:55 in public function `request_access_token`:
        D417: Missing argument descriptions in the docstring (argument(s) user_key, user_secret are missing descriptions in 'request_access_token' docstring)
satip/eumetsat.py:90 in public function `query_data_products`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:90 in public function `query_data_products`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, num_features, product_id, start_date, start_index are missing descriptions in 'query_data_products' docstring)
satip/eumetsat.py:131 in public function `identify_available_datasets`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:131 in public function `identify_available_datasets`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, log, product_id, start_date are missing descriptions in 'identify_available_datasets' docstring)
satip/eumetsat.py:184 in public function `dataset_id_to_link`:
        D103: Missing docstring in public function
satip/eumetsat.py:192 in public function `json_extract`:
        D103: Missing docstring in public function
satip/eumetsat.py:202 in public function `check_valid_request`:
        D417: Missing argument descriptions in the docstring (argument(s) r are missing descriptions in 'check_valid_request' docstring)
satip/eumetsat.py:223 in public class `DownloadManager`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:237 in public method `__init__`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:237 in public method `__init__`:
        D417: Missing argument descriptions in the docstring (argument(s) data_dir, log_fp, logger_name, user_key, user_secret are missing descriptions in '__init__' docstring)
satip/eumetsat.py:279 in public method `request_access_token`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:279 in public method `request_access_token`:
        D417: Missing argument descriptions in the docstring (argument(s) user_key, user_secret are missing descriptions in 'request_access_token' docstring)
satip/eumetsat.py:303 in public method `download_single_dataset`:
        D417: Missing argument descriptions in the docstring (argument(s) data_link are missing descriptions in 'download_single_dataset' docstring)
satip/eumetsat.py:324 in public method `download_date_range`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:324 in public method `download_date_range`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, product_id, start_date are missing descriptions in 'download_date_range' docstring)
satip/eumetsat.py:339 in public method `download_datasets`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:339 in public method `download_datasets`:
        D417: Missing argument descriptions in the docstring (argument(s) datasets, product_id are missing descriptions in 'download_datasets' docstring)
satip/eumetsat.py:380 in public method `download_tailored_date_range`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:380 in public method `download_tailored_date_range`:
        D417: Missing argument descriptions in the docstring (argument(s) end_date, file_format, product_id, projection, roi, start_date are missing descriptions in 'download_tailored_date_range' docstring)
satip/eumetsat.py:454 in private method `_download_single_tailored_dataset`:
        D417: Missing argument descriptions in the docstring (argument(s) product_id are missing descriptions in '_download_single_tailored_dataset' docstring)
satip/eumetsat.py:543 in public function `get_dir_size`:
        D103: Missing docstring in public function
satip/eumetsat.py:563 in public function `eumetsat_filename_to_datetime`:
        D205: 1 blank line required between summary line and description (found 0)
satip/eumetsat.py:563 in public function `eumetsat_filename_to_datetime`:
        D209: Multi-line docstring closing quotes should be on a separate line
satip/__init__.py:1 at module level:
        D104: Missing docstring in public package
tests/test_eumetsat.py:1 at module level:
        D100: Missing docstring in public module
tests/test_eumetsat.py:4 in public function `test_filename_to_datetime`:
        D103: Missing docstring in public function
tests/test_eumetsat.py:8 in public function `test_data_tailor`:
        D103: Missing docstring in public function
scripts/cloudmask.py:1 at module level:
        D100: Missing docstring in public module
scripts/convert_native_to_zarr.py:1 at module level:
        D100: Missing docstring in public module
scripts/convert_native_to_zarr.py:36 in public function `create_eumetsat_zarr`:
        D103: Missing docstring in public function

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

satip/utils.py:7:1: F401 'typing.Union' imported but unused
satip/utils.py:13:1: F401 'zarr' imported but unused
satip/utils.py:26:101: E501 line too long (221 > 100 characters)
satip/utils.py:153:101: E501 line too long (104 > 100 characters)
satip/utils.py:186:101: E501 line too long (104 > 100 characters)
satip/utils.py:419:101: E501 line too long (115 > 100 characters)
satip/utils.py:420:101: E501 line too long (118 > 100 characters)
satip/download.py:50:1: E731 do not assign a lambda expression, use a def
satip/download.py:73:101: E501 line too long (114 > 100 characters)
satip/download.py:225:101: E501 line too long (125 > 100 characters)
satip/download.py:245:101: E501 line too long (108 > 100 characters)
satip/download.py:249:9: E722 do not use bare 'except'
satip/download.py:267:9: E722 do not use bare 'except'
satip/download.py:271:101: E501 line too long (121 > 100 characters)
satip/download.py:276:9: E722 do not use bare 'except'
satip/download.py:319:101: E501 line too long (104 > 100 characters)
satip/download.py:361:101: E501 line too long (103 > 100 characters)
satip/download.py:374:101: E501 line too long (113 > 100 characters)
satip/geospatial.py:1:1: F401 'datetime' imported but unused
satip/eumetsat.py:10:1: F401 'typing.List' imported but unused
satip/eumetsat.py:14:1: F401 'requests.auth.HTTPBasicAuth' imported but unused
satip/eumetsat.py:80:1: E731 do not assign a lambda expression, use a def
satip/eumetsat.py:186:101: E501 line too long (136 > 100 characters)
satip/eumetsat.py:213:13: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
satip/eumetsat.py:258:26: F541 f-string is missing placeholders
satip/eumetsat.py:363:13: E722 do not use bare 'except'
satip/eumetsat.py:435:13: E722 do not use bare 'except'
satip/eumetsat.py:566:51: W605 invalid escape sequence '\d'
satip/eumetsat.py:566:55: W605 invalid escape sequence '\.'
satip/eumetsat.py:573:101: E501 line too long (112 > 100 characters)
scripts/move_files.py:2:1: F401 'shutil' imported but unused
scripts/move_files.py:6:101: E501 line too long (137 > 100 characters)
scripts/move_files.py:10:14: F821 undefined name 'xr'
scripts/move_files.py:10:27: F821 undefined name 'xr'
scripts/move_files.py:12:31: F821 undefined name 'osgb_x'
scripts/move_files.py:14:39: F821 undefined name 'osgb_x'
scripts/move_files.py:14:49: F821 undefined name 'osgb_y'
scripts/move_files.py:18:9: F821 undefined name 'Scene'
scripts/move_files.py:18:48: F821 undefined name 'filename'
scripts/move_files.py:25:28: F821 undefined name 'GEOGRAPHIC_BOUNDS'
scripts/move_files.py:28:18: F821 undefined name 'lat_lon_to_osgb'
scripts/move_files.py:33:101: E501 line too long (141 > 100 characters)
scripts/move_files.py:37:14: F821 undefined name 'xr'
scripts/move_files.py:37:27: F821 undefined name 'xr'
scripts/get_raw_eumetsat_data.py:20:1: E731 do not assign a lambda expression, use a def
scripts/get_raw_eumetsat_data.py:35:101: E501 line too long (105 > 100 characters)
scripts/get_raw_eumetsat_data.py:55:101: E501 line too long (106 > 100 characters)
scripts/get_raw_eumetsat_data.py:62:101: E501 line too long (104 > 100 characters)
scripts/get_raw_eumetsat_data.py:68:101: E501 line too long (110 > 100 characters)
setup.py:14:101: E501 line too long (103 > 100 characters)
tests/test_eumetsat.py:1:1: F401 'satip.eumetsat' imported but unused
scripts/cloudmask.py:1:1: F401 'satpy' imported but unused
scripts/convert_native_to_zarr.py:3:1: F401 'satip.intermediate.create_or_update_zarr_with_native_files' imported but unused

isort....................................................................Passed
black....................................................................Passed
prettier.................................................................Passed
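
Several of the flake8 codes reported above (E731, E722, E712, W605) have fairly mechanical fixes. The sketch below shows the general pattern for each one; the function names are made up for illustration and are not the actual Satip code.

```python
import re


# E731: define a named function rather than assigning a lambda to a name.
def format_dt(dt_string):
    return dt_string.replace(" ", "T")


# E722: catch a specific exception instead of using a bare `except:`.
def safe_int(text):
    try:
        return int(text)
    except ValueError:
        return None


# E712: test truthiness instead of comparing with `== False`.
def describe(ok):
    if not ok:
        return "invalid"
    return "valid"


# W605: use a raw string so `\d` and `\.` reach the regex engine unchanged.
TIMESTAMP_PATTERN = re.compile(r"(\d{14})\.")

match = TIMESTAMP_PATTERN.search(
    "MSG3-SEVI-MSG15-0100-NA-20181113041418.116000000Z-NA.nat"
)
print(match.group(1))
```

The E501 (line too long) and F401 (unused import) findings usually only need a line wrap or a deleted import.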

Test for latest app

Write a test to check that satellite data is downloaded when the main app is run to get the latest data.

Possible Implementation

This might need a mocked API.
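
A minimal sketch of how the API could be mocked with `unittest.mock`, so the test never touches the network. `FakeClient`, `list_latest` and `download_latest` are hypothetical names chosen for this example; the real app entry point and client class will differ.

```python
from unittest import mock


class FakeClient:
    """Stand-in for a real API client (hypothetical, not Satip's class)."""

    def list_latest(self):
        raise RuntimeError("network access is not allowed in tests")


def download_latest(client, save_dir):
    """Hypothetical app entry point: returns the paths it would have saved."""
    return [f"{save_dir}/{name}" for name in client.list_latest()]


def test_download_latest_uses_api():
    client = FakeClient()
    # Replace the network call with a canned response.
    with mock.patch.object(client, "list_latest", return_value=["latest.nc"]) as fake:
        saved = download_latest(client, "/tmp/out")
    fake.assert_called_once_with()
    assert saved == ["/tmp/out/latest.nc"]


test_download_latest_uses_api()
```

The same pattern should carry over to pytest: patch the method that talks to EUMETSAT and assert on what the app writes.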

Why does `git` download 355 MBytes when cloning `satip`?!

jack@leonardo:~/dev/ocf$ git clone git@github.com:openclimatefix/Satip.git
Cloning into 'Satip'...
Enter passphrase for key '/home/jack/.ssh/id_ed25519': 
remote: Enumerating objects: 2221, done.
remote: Counting objects: 100% (101/101), done.
remote: Compressing objects: 100% (94/94), done.
remote: Total 2221 (delta 44), reused 54 (delta 0), pack-reused 2120
Receiving objects: 100% (2221/2221), 354.96 MiB | 5.65 MiB/s, done.
Resolving deltas: 100% (1215/1215), done.

jack@leonardo:~/dev/ocf$ du -h satip/
8.0K	satip/.git/logs/refs/remotes/origin
12K	satip/.git/logs/refs/remotes
8.0K	satip/.git/logs/refs/heads
24K	satip/.git/logs/refs
32K	satip/.git/logs
8.0K	satip/.git/info
4.0K	satip/.git/objects/info
356M	satip/.git/objects/pack
356M	satip/.git/objects
64K	satip/.git/hooks
4.0K	satip/.git/branches
8.0K	satip/.git/refs/remotes/origin
12K	satip/.git/refs/remotes
8.0K	satip/.git/refs/heads
4.0K	satip/.git/refs/tags
28K	satip/.git/refs
356M	satip/.git
12K	satip/.github/workflows
16K	satip/.github
8.0K	satip/tests
36K	satip/satip
356M	satip/

So almost all the space is being taken up by .git/objects/pack

I presume this is old testing data that's been removed from master, but still exists in the git history?

Plot latest.netcdf file

Detailed Description

Related to #71: it would be great to have a script that plots the latest.netcdf file. This would help with debugging.

Validate Zarr data

Detailed Description

Check that values are in the expected range, and that datetimes are correct, etc.

Possible Implementation

Validation could happen in two places:

  1. As part of the native to Zarr conversion.
  2. As a separate zarr_validation.py script.

(related to openclimatefix/nwp#8)
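
A rough sketch of what such a validation check might look like, written against plain NumPy arrays so it can be called from either place. The function name, the 0 to 1023 range and the 5-minute step are assumptions made for illustration; the real bounds depend on the channel and on how the data was scaled.

```python
import numpy as np


def validate_timestep_data(values, times, lower, upper, expected_step):
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems = []
    # Ignore NaNs when checking the value range.
    finite = values[np.isfinite(values)]
    if finite.size and (finite.min() < lower or finite.max() > upper):
        problems.append(f"values outside [{lower}, {upper}]")
    # Datetimes should increase strictly and at a constant step.
    steps = np.diff(times)
    if np.any(steps <= np.timedelta64(0, "s")):
        problems.append("datetimes are not strictly increasing")
    elif np.any(steps != expected_step):
        problems.append("unexpected gap between timesteps")
    return problems


times = np.array(
    ["2020-07-01T12:00", "2020-07-01T12:05", "2020-07-01T12:15"],
    dtype="datetime64[m]",
)
values = np.array([[0.0, 512.0], [1023.0, np.nan], [2000.0, 1.0]])
print(validate_timestep_data(values, times, 0, 1023, np.timedelta64(5, "m")))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a store in a single pass.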

RSS Extent is off by 1 when appending to Zarr

Describe the bug
When using the full RSS extent, the y axis is either 1339 or 1340 pixels, so appending a timestep fails whenever its size differs from the size used to create the Zarr store.

To Reproduce
Steps to reproduce the behavior:
Run the creation script

Expected behavior
All RSS extent images are the same size


Add Linters

Detailed Description

Add the same linters used in other OCF repos, and fix any issues

Context

It's good to keep the code style consistent throughout our codebase

Possible Implementation

Any of the .github folders in the main OCF repos should work

Permission Denied on creating file

Describe the bug
The download script raises a PermissionError when it moves compressed native files into the dated output directories.

To Reproduce
Steps to reproduce the behavior:

  1. Run download script on leonardo for dates other than 2020/2021
  2. See error
PermissionError: [Errno 13] Permission denied: '/mnt/storage_a/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/native/2018/11/13/MSG3-SEVI-MSG15-0100-NA-20181113041418.116000000Z-NA.nat.bz2'
Error [Errno 13] Permission denied: '/mnt/storage_a/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/native/2018/11/13/MSG3-SEVI-MSG15-0100-NA-20181113041918.181000000Z-NA.nat.bz2' when sanity-checking /mnt/storage_a/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/native/MSG3-SEVI-MSG15-0100-NA-20181113041918.181000000Z-NA.nat.  Deleting this file.  Will be downloaded next time this script is run.
Traceback (most recent call last):
  File "/home/jacob/Satip/satip/download.py", line 257, in process_rss_images
    fs.move(
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/fsspec/spec.py", line 1171, in move
    return self.mv(path1, path2, **kwargs)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/fsspec/spec.py", line 883, in mv
    self.copy(path1, path2, recursive=recursive, maxdepth=maxdepth)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/fsspec/spec.py", line 845, in copy
    self.cp_file(p1, p2, **kwargs)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/fsspec/implementations/local.py", line 114, in cp_file
    shutil.copyfile(path1, path2)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/shutil.py", line 265, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: '/mnt/storage_a/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/native/2018/11/13/MSG3-SEVI-MSG15-0100-NA-20181113041918.181000000Z-NA.nat.bz2'


Experiment with using lossless JPEG-XL (colorspace=YUV 400)

Detailed Description

JPEG-XL is the "new kid on the block" for image compression. And it can losslessly compress greyscale images (using colorspace YUV 400). It might be great for compressing satellite images into our Zarr datasets.

Context

See issue #45 for previous experiments and notes.

This great comparison of lossless compression using JPEG-XL, WebP, AVIF, and PNG suggests JPEG-XL wins.

Possible Implementation

imagecodecs includes an interface to JPEG-XL.

Next step is to try manually compressing images using the cjxl app (not ImageMagick).

If that looks good then create a stand-alone little adapter library to adapt imagecodecs to be used with Zarr. Here's a super-simple little python library (just 51 lines of code!) which enables jpeg-2000 compression in Zarr using imagecodecs. Maybe it'd be possible to use the same pattern, but for JPEG-XL? UPDATE: @cgohlke has already implemented this in imagecodecs! (See comment below)

To use ImageMagick for quick experiments at the commandline:

You need ImageMagick version > 7.0.10 to use JPEG-XL.

To install ImageMagick >7.0.10:

sudo apt-get install imagemagick libmagick++-dev 

See here for how to install ImageMagick from source.

Then use the 'magick' command, not 'convert'.

I'll do some experiments later today or tomorrow 🙂

TODO

  • investigate whether the JPEG-XL lossy uint8 images are fine as is. It looks great visually. And 4 timesteps of UK HRV is only 0.6 MB (compared to 2.9 MB with bzip2; and 2.2 MB with JPEG-XL lossless uint16).
  • investigate whether we can use different color profiles. See these docs.
  • investigate if imagecodecs JPEG-XL can simply be installed through pip (or does it require libjxl to be manually installed first?) If it requires manual install then that maybe makes it inappropriate for a dataset that we might release publicly?
  • to get to 8bits, divide by 4 AND ROUND
  • Try the ideas Jon suggested on the JPEG-XL github issue queue (especially putting all 12 channels into a single JPEG-XL, using lossless compression)
  • See if it's possible to change the suffix of each Zarr chunk to .jxl (can't see how to do this and, anyway, Chrome cannot currently open jxl files)
  • Check output is the same (or roughly the same) as the input: plot gamma curve; compute MSE; etc.
  • try using alpha channel for NaNs. (haven't tried this but not going to bother because we can just use float16)
  • try float16 for saving NaNs. Yup, float16 saves NaNs, and there's no gamma curve. Need to map values to the range [0, 1].
  • try float32 with jpeg-xl
  • measure decompression speed of jpeg-xl vs bzip2
  • prepare a PR for Satip for using jpeg-xl
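
Two of the TODO items above ("divide by 4 AND ROUND" and "compute MSE") can be sketched with NumPy alone. This assumes the input holds 10-bit counts in the range 0 to 1023; `to_uint8` and `mse` are illustrative helper names, not part of Satip or imagecodecs.

```python
import numpy as np


def to_uint8(image_10bit):
    """Map 10-bit counts (0..1023) to 8 bits: divide by 4, round, clip, then cast."""
    scaled = np.round(image_10bit.astype(np.float32) / 4.0)
    # 1022 and 1023 round up to 256, so clip before casting to avoid wrap-around.
    return np.clip(scaled, 0, 255).astype(np.uint8)


def mse(original, restored):
    """Mean squared error between two arrays of the same shape."""
    diff = original.astype(np.float64) - restored.astype(np.float64)
    return float(np.mean(diff ** 2))


image = np.array([[0, 1, 2, 3], [4, 510, 1022, 1023]], dtype=np.uint16)
small = to_uint8(image)
restored = small.astype(np.uint16) * 4
print(small)
print(mse(image, restored))
```

Note that `np.round` rounds halves to the nearest even integer, so 2 / 4 = 0.5 becomes 0 while 510 / 4 = 127.5 becomes 128. Casting without rounding would truncate instead, which biases every value downwards.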

codecov > 70%

Detailed Description

Nice to get codecov greater than 70%

Add cloud masks to intermediate Zarr

Detailed Description

There are cloud masks available since 2008 for the RSS images that tell if a pixel is cloudy or not. Having these as another input could be useful to the model.

Context

This could be useful for telling the model explicitly where clouds are in the images, potentially improving its performance.

Possible Implementation

The cloud masks are in a different format from the native files, but Satpy does open them, so it could be worth just adding it to the files that are opened by Satpy, and save it as another 'channel' in the Zarr file.
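
A minimal sketch of the "another channel" idea using NumPy arrays. It assumes the cloud mask has already been loaded and resampled onto the same pixel grid as the image, which is the hard part and is not shown; `add_cloud_mask_channel` is a hypothetical helper.

```python
import numpy as np


def add_cloud_mask_channel(data, cloud_mask):
    """Append a (y, x) cloud mask to (y, x, channel) image data as one more channel."""
    if cloud_mask.shape != data.shape[:2]:
        raise ValueError("cloud mask and image must cover the same pixels")
    # Add a trailing channel axis and match the image dtype before concatenating.
    extra = cloud_mask[..., np.newaxis].astype(data.dtype)
    return np.concatenate([data, extra], axis=-1)


image = np.zeros((3, 4, 2), dtype=np.int16)
mask = np.ones((3, 4), dtype=bool)
combined = add_cloud_mask_channel(image, mask)
print(combined.shape)
```

Storing the mask with the same dtype as the imagery wastes some space, so a separate Zarr store for the mask may turn out to be the better design.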

Cannot open file on `leonardo`

Describe the bug
There seems to be an issue with loading files with SatPy on leonardo: the memmap call fails.

To Reproduce
Steps to reproduce the behavior:

Traceback (most recent call last):
  File "/home/jacob/Satip/scripts/convert_native_to_zarr.py", line 34, in <module>
    create_eumetsat_zarr()
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/jacob/Satip/scripts/convert_native_to_zarr.py", line 30, in create_eumetsat_zarr
    create_or_update_zarr_with_native_files(*args, **kwargs)
  File "/home/jacob/Satip/satip/intermediate.py", line 53, in create_or_update_zarr_with_native_files
    dataset, hrv_dataset = native_wrapper((compressed_native_files[0], region))
  File "/home/jacob/Satip/satip/intermediate.py", line 87, in native_wrapper
    return load_native_to_dataset(filename, area)
  File "/home/jacob/Satip/satip/utils.py", line 113, in load_native_to_dataset
    scene = Scene(filenames={"seviri_l1b_native": [decompressed_filename]})
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/scene.py", line 108, in __init__
    self._readers = self._create_reader_instances(filenames=filenames,
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/scene.py", line 157, in _create_reader_instances
    return load_readers(filenames=filenames,
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/__init__.py", line 495, in load_readers
    reader_instance.create_filehandlers(
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/yaml_reader.py", line 604, in create_filehandlers
    filehandlers = self._new_filehandlers_for_filetype(filetype_info,
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/yaml_reader.py", line 592, in _new_filehandlers_for_filetype
    return list(filtered_iter)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/yaml_reader.py", line 560, in filter_fh_by_metadata
    for filehandler in filehandlers:
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/yaml_reader.py", line 501, in _new_filehandler_instances
    yield filetype_cls(filename, filename_info, filetype_info, *req_fh, **fh_kwargs)
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/seviri_l1b_native.py", line 108, in __init__
    self.dask_array = da.from_array(self._get_memmap(), chunks=(CHUNK_SIZE,))
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/satpy/readers/seviri_l1b_native.py", line 187, in _get_memmap
    return np.memmap(fp, dtype=data_dtype,
  File "/home/jacob/miniconda3/envs/ocf/lib/python3.9/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 19] No such device


Fix rectangle of zeros to the north east of the UK on `eumetsat_*.zarr`

Describe the bug
From manually looking at the "eumetsat_*.zarr" data (i.e. the non-HRV data), most timesteps have a rectangle of zeros like this:

image

Occasionally the rectangle extends further west:
image

Occasionally there is no rectangle:
image
image

The HRV data doesn't appear to have this "rectangle of zeros" problem.

In the non-HRV data: for a given timestep, the rectangle appears to be the same shape across all the non-HRV channels.

To Reproduce

import pandas as pd
import xarray as xr

ZARR_PATH = "/mnt/storage_ssd_8tb/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/zarr/v2/eumetsat_*.zarr"

# Open every matching Zarr store as one dataset, concatenated along time.
ds_from_zarr = xr.open_mfdataset(
    ZARR_PATH, mode="r", engine="zarr", chunks='auto', parallel=True, concat_dim='time', combine='nested',
    preprocess=lambda dataset: dataset.drop_vars("acq_time", errors="ignore")
)

# Plot a single channel at a single timestep.
VARIABLE = "VIS006"
data = ds_from_zarr['stacked_eumetsat_data'].sel(variable=VARIABLE, time=pd.Timestamp("2020-07-01T12:00"))
data.plot.imshow(figsize=(10, 10), x='x', y='y');

Expected behavior
No rectangle of zeros 🙂

Additional context
For now, we can probably work around this issue:

  • The "rectangle of zeros" probably won't affect our training too much, because I would guess the "rectangle of zeros" is too far north to appear in most of our examples? (Although I don't remember how large of a satellite image we're currently using?)
  • We should maybe re-compute the means and standard deviations for the satellite data, ignoring the northern latitudes where the "rectangle of zeros" sometimes appears.
  • This doesn't appear to explain the -1 issue (openclimatefix/nowcasting_dataset#484), although it might be related (I'm not sure yet!)

`compress()` should probably use `clip(min=0, max=upper_bound)`

Detailed Description

In Compressor, if Compressor.mins and/or Compressor.maxs aren't quite right, then the 'compressed' data will not be guaranteed to lie in the range [0, upper_bound]. So, around line 108 of compression.py (just after round()), it might be good to do something like dataarray = dataarray.clip(min=0, max=upper_bound)

(although need to make sure clip is done before setting NaNs to -1 and casting to int)
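
The ordering described above (round, then clip, then replace NaNs and cast) could look roughly like this. It is a simplified NumPy stand-in, not the actual Compressor code; the scalar `vmin`/`vmax` and the default `upper_bound=1023` are assumptions.

```python
import numpy as np


def compress(values, vmin, vmax, upper_bound=1023):
    """Rescale to [0, upper_bound], clip, then mark NaNs as -1 and cast to int."""
    scaled = (values - vmin) / (vmax - vmin) * upper_bound
    scaled = np.round(scaled)
    # Clip before touching NaNs: np.clip leaves NaN as NaN, so the mask below still works.
    scaled = np.clip(scaled, 0, upper_bound)
    scaled = np.where(np.isnan(scaled), -1, scaled)
    return scaled.astype(np.int16)


data = np.array([-5.0, 0.0, 50.0, 120.0, np.nan])
compressed = compress(data, vmin=0.0, vmax=100.0)
print(compressed)
```

If the clip ran after the NaN replacement, the -1 markers would be clipped to 0 and become indistinguishable from real data.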

Re-compute means and standard deviation of satellite data, ignoring 'rectangle of zeros' and ignoring `v2/eumetsat_zarr_2020_02.zarr`

Detailed Description

Issue #30 documents an issue in non-HRV Zarr data whereby there's a 'rectangle of zeros'. We should re-compute the means and standard deviations, ignoring northern latitudes which often have a 'rectangle of zeros'.

openclimatefix/nowcasting_dataset#484 documents an issue whereby v2/eumetsat_zarr_2020_02.zarr is full of noise from -30,000 to 30,000. We should ignore v2/eumetsat_zarr_2020_02.zarr when computing the means and stds.

Possible Implementation

Maybe create a simple script in Satip which computes the means and standard deviations (using dask?)
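
As a starting point, the statistics could be computed over a restricted band of rows. The sketch below uses in-memory NumPy arrays with an assumed (time, y, x, channel) layout; which row indices correspond to the affected northern latitudes depends on the array orientation and would need checking against the real data.

```python
import numpy as np


def channel_stats(data, rows):
    """Per-channel mean and std over the selected rows, ignoring NaNs.

    `data` has shape (time, y, x, channel); `rows` is a slice along y.
    """
    subset = data[:, rows, :, :]
    mean = np.nanmean(subset, axis=(0, 1, 2))
    std = np.nanstd(subset, axis=(0, 1, 2))
    return mean, std


# Toy data: rows 0-1 play the part of the 'rectangle of zeros'.
data = np.zeros((2, 4, 3, 2))
data[:, 2:, :, 0] = 10.0
data[0, 2:, :, 1] = 2.0
data[1, 2:, :, 1] = 4.0

mean, std = channel_stats(data, rows=slice(2, None))
print(mean, std)
```

Excluding a bad store such as v2/eumetsat_zarr_2020_02.zarr would then just be a matter of not passing it in. For the full dataset the same reductions would probably need to run lazily rather than on an in-memory array.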

Only compute OSGB coordinates once per Zarr

Detailed Description

These two lines in utils.convert_scene_to_dataarray take several seconds to run and are currently run for every timestep:

    lon, lat = scene[band].attrs["area"].get_lonlats()
    osgb_x, osgb_y = lat_lon_to_osgb(lat, lon)

But xarray.Dataset.to_zarr() doesn't do anything with the coordinates when appending to a Zarr store. So the two lines above are wasted computation. (See the xarray docs on appending to Zarrs)
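
One way to avoid the repeated work is to compute the coordinates only for the first timestep written to a store. The sketch below shows the control flow with plain Python; `compute_osgb_coords` is a hypothetical stand-in for the two slow lines, and the call counter exists only to show how often it runs.

```python
calls = {"count": 0}


def compute_osgb_coords(shape):
    """Hypothetical stand-in for the slow get_lonlats() + lat_lon_to_osgb() pair."""
    calls["count"] += 1
    height, width = shape
    return [[(x, y) for x in range(width)] for y in range(height)]


def convert_timesteps(images):
    """Attach coordinates to the first timestep only; later ones are appended without."""
    converted = []
    for index, image in enumerate(images):
        shape = (len(image), len(image[0]))
        coords = compute_osgb_coords(shape) if index == 0 else None
        converted.append({"data": image, "coords": coords})
    return converted


frames = [[[1, 2], [3, 4]] for _ in range(5)]
result = convert_timesteps(frames)
print(calls["count"])
```

This is only safe if the spatial coordinates really are identical for every timestep.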

As a related task, we should check if the SEVIRI satellite changes the spatial coordinates of the imagery when the satellite "wobbles" due to propellant sloshing around.

Benchmark candidate intermediate file formats for EUMETSAT data

Detailed Description

In openclimatefix/nowcasting_dataset#176, it has become clear that opening .nat files using SatPy is way too slow to be used by nowcasting_dataset. So we need an intermediate file format, that's easier to load from.

Possible Implementation

Benchmark & explore several candidate intermediate formats:

  • GeoTIFF
  • NetCDF (one per timestep?)
  • Zarr
  • AVIF (modern compression, lossy or lossless, up to 12 bits per channel. The pillow plug in is only 8bpp)
  • WebP (modern compression, lossy or lossless, 8 bpp, chroma subsampling)
  • png
  • lossless JPEG 2000
  • any others

Try imagecodecs and tifffile https://www.lfd.uci.edu/~gohlke/

I like Zarr but it has two downsides for this usage:

  • one metadata file describes the entire Zarr dataset. If you mess up that metadata then you pretty much have to throw the whole thing away and start again. In contrast, if you mess up a few GeoTIFF files, just throw the bad ones away
  • I don't think Zarr seeks within chunks. It loads entire chunks. Which isn't great for random sampling. In contrast, I think we can seek into NetCDF and geotiff (and maybe others?)
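
The candidate formats could be compared with a small timing harness along these lines. The loaders here are dummies; in a real benchmark each one would open the same timestep from a GeoTIFF, NetCDF, Zarr store and so on. All names are illustrative.

```python
import time


def benchmark(loaders, repeats=3):
    """Return the best wall-clock time in seconds for each named loader."""
    results = {}
    for name, load in loaders.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            load()
            timings.append(time.perf_counter() - start)
        # The minimum is less sensitive to background noise than the mean.
        results[name] = min(timings)
    return results


loaders = {
    "small": lambda: sum(range(1_000)),
    "large": lambda: sum(range(100_000)),
}
results = benchmark(loaders)
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.6f} s")
```

For a fair comparison of on-disk formats the operating system's file cache would also need to be taken into account, which this sketch does not do.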


Save the channel-specific metadata to the Zarr

Detailed Description

SatPy inserts some useful channel-specific metadata into data_array.attrs under the keys name, calibration, wavelength, standard_name, and _satpy_id.

SatPy stores this channel specific metadata as attrs under each DataArray in the Dataset. We can't do that because we collapse all the channel-specific DataArrays into a single DataArray (so we can put multiple channels into a single Zarr chunk).

Context

TBH, we don't currently use this metadata in our downstream processing. But we might in the future. And other users definitely might want to see this metadata (for when we publish multi-channel SEVIRI data).

Possible Implementation

So we maybe need to write code to loop through each DataArray's attrs and prepend the channel name to the key. (e.g. the key name becomes VIS006_name etc.) and then concatenate all the channels' attrs together and then we can save all those keys and values into the single DataArray's attrs?
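
A minimal sketch of that key-prefixing idea using plain dictionaries. The channel names and attribute values are examples only, and `merge_channel_attrs` is a hypothetical helper rather than existing Satip code.

```python
def merge_channel_attrs(attrs_by_channel):
    """Flatten per-channel attrs into one dict with keys like 'VIS006_name'."""
    merged = {}
    for channel, attrs in attrs_by_channel.items():
        for key, value in attrs.items():
            merged[f"{channel}_{key}"] = value
    return merged


attrs_by_channel = {
    "VIS006": {"name": "VIS006", "calibration": "reflectance"},
    "IR_108": {"name": "IR_108", "calibration": "brightness_temperature"},
}
merged = merge_channel_attrs(attrs_by_channel)
print(merged)
```

The merged dictionary could then be assigned to the combined DataArray's attrs. Values that are not plain strings or numbers would probably need converting (for example with `str()`) before they can be written to a Zarr store.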

Add FAQ/Example

Detailed Description

Add an FAQ for easier use, especially with the output Zarr files/xarray

Context

Useful as we enter more public competitions and the data gains other users.

Possible Implementation
