opendatacube / eo-datasets Goto Github PK

View Code? Open in Web Editor NEW

49.0 49.0 26.0 191.76 MB

Easily write, validate and convert EO datasets and metadata.

License: Apache License 2.0

Python 99.40% Shell 0.09% Dockerfile 0.26% Makefile 0.25%

hacktoberfest

eo-datasets's People

Contributors

Stargazers

Watchers

eo-datasets's Issues

Writing metadata only

Hello,
I want to create a datacube metadata index with absolute paths because my data are stored on another hard-drive. But when i try to write only metadata in a folder outside my data folder it doesn't work and it doesn't work neither when i try it in the same data folder, i have to write data too.
I have this error :

f"Path {value.as_posix()!r} is outside path {base_directory.as_posix()!r} "
ValueError: Path '/mnt/data/imagery/sentinel-2/SENTINEL2A_20180517-111020-570_L2A_T30UXU_D_V1-7/SENTINEL2A_20180517-111020-570_L2A_T30UXU_D_V1-7_FRE_B11.tif' is outside path '/home/alvarez_e/ODC' (allow_paths_outside_base=False)

It isn't possible to turn True allow_paths_outside_base in make_paths_relative() ?

Record groundstation metadata: Data quality

As detailed by @vrooke:

Provides Data quality

Read the processing log file (LANDSAT-7.86231.S3A2C2D4R4) which creates the raw data (RCC). I have attached a copy of this file to this email. To read the quality of this go to line

% error bins at 0.0 BER:

Read the Value, example: % error bins at 0.0 BER: 100.0

100% good data

Add raw MD packaging integration test

Rename 'ga-label' to 'label' on non-telemetry products

The label field was originally called 'ga-label' because it was expected to be in a GA-specific format (what was previously called "dataset id" on old systems).

AGDCv2 now uses labels, and defaults to 'label' as the field name (but the 'eo' type in agdc explicitly sets it as "ga_label").

We should rename it to 'label' for the metadata we write, as it's no longer GA specific, and it'll be a headache if we're using a different field name than every other AGDC instance. This should be done before the reprocess.

Satellite telemetry data should remain using "ga_label", because we've already written its metadata, and telemetry data will now be a separate metadata type in AGDC anyway (due to aos/los start/stop time conflicts Greg's reporting).

Add metadata support for product-specific fields.

Some fields are useful to track but cannot be generalised. Add a 'product-specific' section to the metadata model.

Use dataset ids in output NBAR band names

Requested by @lanweiwang

Could you please update the output reflectance file name? So that reflectance_brdf_1.tif become Dataset_ID_B1.tif; for terrain corrected SR, let’s use TNBAR to replace NBAR for the Dataset ID for the moment.

Unable to install on Windows

Running the command pip install . from within a Windows conda environment fails with the error:

Processing c:\users\u68320\pycharmprojects\eo-datasets
Could not install packages due to an EnvironmentError: [('C:\\Users\\u68320\\PycharmProjects\\eo-datasets\\tests\\integration\\input\\npp-viirs\\data\\NPP_VIIRS_STD-RDR_P00_NPP.VIIRS.18966.ALICE_0_0_20150626T053709Z20150626T054942_1\\RNSCA-RVIRS_npp_d20150626_t0537097_e0549423_b18966_c20150626055046759000_nfts_drl.h5', 'C:\\Users\\u68320\\AppData\\Local\\Temp\\pip-req-build-mkb3dieh\\tests\\integration\\input\\npp-viirs\\data\\NPP_VIIRS_STD-RDR_P00_NPP.VIIRS.18966.ALICE_0_0_20150626T053709Z20150626T054942_1\\RNSCA-RVIRS_npp_d20150626_t0537097_e0549423_b18966_c20150626055046759000_nfts_drl.h5', "[Errno 2] No such file or directory: 'C:\\\\Users\\\\u68320\\\\AppData\\\\Local\\\\Temp\\\\pip-req-build-mkb3dieh\\\\tests\\\\integration\\\\input\\\\npp-viirs\\\\data\\\\NPP_VIIRS_STD-RDR_P00_NPP.VIIRS.18966.ALICE_0_0_20150626T053709Z20150626T054942_1\\\\RNSCA-RVIRS_npp_d20150626_t0537097_e0549423_b18966_c20150626055046759000_nfts_drl.h5'")]

The no such file error path is 276 characters long. If I run set TMP=C:\tmp and set TEMP=C:\tmp first, the install succeeds. Windows doesn't like long file names. :(

Perhaps it would be possible to shorten the test data file names a bit to work around this.

change .yml extension to .yaml for consistency

eg. pq_metadata.yml

Allow creation of dataset metadata in memory

Allow creation of dataset metadata in memory, so that a separate tool can handle writing the data itself.

Datacube Statistician handles efficient writing to Object Stores and compressing to GeoTIFF itself, but to avoid duplication and incompatibilities we would like to use eo-datasets for the metadata document generation.

This assumes data/COGs/TIFs are already in place.

This may be a duplicate of #128 , I'm not sure.

@Kirill888

Support old-style yamls/datasets as sources

Currently the DatasetAssembler.add_source_* methods assume the source dataset is using the newer eo3 metadata format.

Perhaps we could look for the $schema variable in the metdata, and if missing assume it's an old-style dataset.

Add polygon boundary to ga-metadata.yml for NBAR/T and PQ

simplified json format generalised bounding box with 1 pixel buffer appended to yaml metadata as per @sixy6e example - addresses opendatacube/datacube-core#121

Move away from Travis CI

Travis queues are getting longer and failing more often. Our other projects have moved to Github Actions so this project should too.

We accidentally merged a PR with a (trivial) test failure yesterday, because it was merged manually to work around a Travis failure.

Add Public API for providing destination paths.

Implement a public API to provide destination paths (currently this function is not exposed publicly, but does exist).

This is required for Statistician to start using eo-datasets for metadata generation.

@Kirill888

Tests fail on local system

When I run pytest locally I'm getting the errors below.
Before running the tests I did pip install --user -e . to install the required packages.
Since the unit tests pass in the CI maybe there is a third-party package that needs to be more fully specified.

(dea) osboxes@osboxes:~/sandbox/pullrequest/eo-datasets/tests$ pytest
/home/osboxes/envs/dea/lib/python3.6/site-packages/_pytest/compat.py:340: PytestDeprecationWarning: The TerminalReporter.writer attribute is deprecated, use TerminalReporter._tw instead at your own risk.
See https://docs.pytest.org/en/stable/deprecations.html#terminalreporter-writer for more information.
  return getattr(object, name, default)
========================================================================================= test session starts =========================================================================================
platform linux -- Python 3.6.7, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/osboxes/sandbox/pullrequest/eo-datasets, configfile: setup.cfg
plugins: celery-4.3.0, cov-2.8.1
collected 85 items                                                                                                                                                                                    

test_documents.py ..                                                                                                                                                                            [  2%]
test_verify.py ..                                                                                                                                                                               [  4%]
integration/h5downsample.py .                                                                                                                                                                   [  5%]
integration/test_assemble.py ...........                                                                                                                                                        [ 18%]
integration/test_image.py ..                                                                                                                                                                    [ 21%]
integration/test_packagewagl.py ...                                                                                                                                                             [ 24%]
integration/test_recompress.py ......                                                                                                                                                           [ 31%]
integration/test_serialise.py ...                                                                                                                                                               [ 35%]
integration/test_thumbnail.py ..                                                                                                                                                                [ 37%]
integration/test_tostac.py ...                                                                                                                                                                  [ 41%]
integration/test_validate.py .........................................                                                                                                                          [ 89%]
integration/prepare/test_noaa_c_c_prwtreatm_1.py .                                                                                                                                              [ 90%]
integration/prepare/test_prepare_esa_sentinel_l1.py F                                                                                                                                           [ 91%]
integration/prepare/test_prepare_landsat_l1.py ......                                                                                                                                           [ 98%]
integration/prepare/test_prepare_sinergise_sentinel_l1.py F                                                                                                                                     [100%]

============================================================================================== FAILURES ===============================================================================================
______________________________________________________________________________________________ test_run _______________________________________________________________________________________________

tmp_path = PosixPath('/tmp/pytest-of-osboxes/pytest-4/test_run0')
expected_dataset_document = {'$schema': 'https://schemas.opendatacube.org/dataset', 'accessories': {'metadata:mtd_ds': {'path': 'S2B_MSIL1C_202010...800.0, 5990200.0], [600000.0, 5990200.0], [600000.0, 6099700.0], [600000.0, 6100000.0], ...]], 'type': 'Polygon'}, ...}

    def test_run(tmp_path, expected_dataset_document):
    
        # GIVEN:
        #     A folder of imagery
        dataset_id = DATASET_PATH.name.split(".")[0]
        outdir = tmp_path
        indir = DATASET_PATH
    
        if indir.is_file():
            shutil.copy(indir, outdir)
        else:
            shutil.copytree(indir, outdir)
    
        # WHEN:
        #    Run prepare on that folder
        output_yaml_path = outdir / (dataset_id + ".yaml")
    
        run_prepare_cli(
            sentinel_l1c_prepare.main,
            "--dataset",
            outdir / DATASET_PATH.name,
            "--dataset-document",
            output_yaml_path,
        )
    
        # THEN
        #     A metadata file is added to it, with valid properties
        #     Assert doc is expected doc
        with output_yaml_path.open("r") as f:
            generated_doc = yaml.safe_load(f)
            del generated_doc["id"]
>       assert expected_dataset_document == generated_doc
E       AssertionError: assert Doc differs in minor float precision:
E            ['properties']['odc:processing_datetime']: 
E                   datetime.datetime(2020, 10, 11, 1, 14, 46, tzinfo=datetime.timezone.utc)
E                != datetime.datetime(2020, 10, 11, 1, 14, 46)
E            ['properties']['datetime']: 
E                   datetime.datetime(2020, 10, 11, 0, 2, 49, 24000, tzinfo=datetime.timezone.utc)
E                != datetime.datetime(2020, 10, 11, 0, 2, 49, 24000)

integration/prepare/test_prepare_esa_sentinel_l1.py:214: AssertionError
_____________________________________________________________________________________ test_sinergise_sentinel_l1 ______________________________________________________________________________________

tmp_path = PosixPath('/tmp/pytest-of-osboxes/pytest-4/test_sinergise_sentinel_l10')
expected_dataset_document = {'$schema': 'https://schemas.opendatacube.org/dataset', 'accessories': {'metadata:product_info': {'path': 'productInfo...00000.0], [600332.7272727273, 6100000.0], [709800.0, 6100000.0], [709800.0, 5990200.0], ...]], 'type': 'Polygon'}, ...}

    def test_sinergise_sentinel_l1(tmp_path, expected_dataset_document):
    
        # GIVEN:
        #     A folder of imagery
        outdir = tmp_path / DATASET_DIR.name
        indir = DATASET_DIR
    
        if indir.is_file():
            shutil.copy(indir, outdir)
        else:
            shutil.copytree(indir, outdir)
    
        # WHEN:
        #    Run prepare on that folder
    
        output_yaml_path = outdir / "test.yaml"
    
        run_prepare_cli(
            sentinel_l1c_prepare.main,
            "--dataset",
            outdir,
            "--dataset-document",
            output_yaml_path,
        )
    
        # THEN
        #     A metadata file is added to it, with valid properties
        #     Assert doc is expected doc
        with output_yaml_path.open("r") as f:
            generated_doc = yaml.safe_load(f)
            del generated_doc["id"]
            from pprint import pprint
    
            pprint(generated_doc)
>       assert expected_dataset_document == generated_doc
E       AssertionError: assert Documents differ:
E            ['geometry']['coordinates'][0][0][0]: 
E                   600000.0
E                != 600332.7272727273
E            ['geometry']['coordinates'][0][0][1]: 
E                   5990200.0
E                != 6100000.0
E            ['geometry']['coordinates'][0][1][0]: ...
E         
E         ...Full output truncated (36 lines hidden), use '-vv' to show

integration/prepare/test_prepare_sinergise_sentinel_l1.py:176: AssertionError

Add provenance capture of local software

Improve error messages for bad/missing packaging data.

Some scenarios which currently have opaque error messages:

Parent datasets are missing or incomplete (eg. no metadata file)
Insufficient metadata detected for a package
Source/target locations are not readable/writable.

eo3-validate doesn't support multiple documents in one yaml file

(I thought it did?)

ODC core contains one of these files in its test data:

$ eo3-validate datacube-core/integration_tests/data/dataset_add/datasets_eo3.ym
...
ruamel.yaml.composer.ComposerError: expected a single document in the stream

Add support for retrieving a `DatasetDoc` from `DatasetAssembler`

I'd like to add support for getting a DatasetDoc from DatasetAssembler. Maybe this could be done with something as simple as:

class DatasetAssembler:
     ...
    def done(..., write_metadata=True): # option to not write metadata to file
        ....
        self.dataset = DatasetDoc(...) # store dataset for user retrieva

What do you reckon @jeremyh?

Add PQA packaging integration tests

A small number of datasets are slow to package, reporting self-intersection errors

Reported by @tebadi

Hi Jeremy, I'm running LCCS and a small number of tiles run for a considerable amount of time, something close to 2 hours while all the other tiles finish within 30 minutes. In all of these tiles I see
Ring Self-intersection at or near point ... and the CPU activity remains minimal. see a sample below (check the timestamps). They all happen when writing level4 data:

021-07-12 04:11:13,772] {verify.py:119} INFO - Checksumming PosixPath('/tmp/lccs/.odcdataset-clwyq2l3/ga_ls_landcover_class_cyear_2_1-0-0_au_2013-01-01_baregrad-phy-cat-l4d-au.tif')
[2021-07-12 04:11:14,188] {env.py:433} WARNING - CPLE_NotSupported in driver GTiff does not support creation option COUNT
[2021-07-12 04:11:14,188] {env.py:433} WARNING - CPLE_NotSupported in driver GTiff does not support creation option WIDTH
[2021-07-12 04:11:14,188] {env.py:433} WARNING - CPLE_NotSupported in driver GTiff does not support creation option HEIGHT
[2021-07-12 04:11:14,188] {env.py:433} WARNING - CPLE_NotSupported in driver GTiff does not support creation option CRS
[2021-07-12 04:11:14,188] {env.py:433} WARNING - CPLE_NotSupported in driver GTiff does not support creation option TRANSFORM
[2021-07-12 04:11:14,188] {env.py:433} WARNING - CPLE_NotSupported in driver GTiff does not support creation option DTYPE
[2021-07-12 04:11:14,432] {verify.py:119} INFO - Checksumming PosixPath('/tmp/lccs/.odcdataset-clwyq2l3/ga_ls_landcover_class_cyear_2_1-0-0_au_2013-01-01_level4.tif')
[2021-07-12 04:13:18,660] {geos.py:252} INFO - Ring Self-intersection at or near point 3467 46
[2021-07-12 04:13:18,667] {geos.py:252} INFO - Ring Self-intersection at or near point 3451 65
[2021-07-12 04:13:18,684] {geos.py:252} INFO - Ring Self-intersection at or near point 3477 111
[2021-07-12 04:13:18,684] {geos.py:252} INFO - Ring Self-intersection at or near point 3351 113
[2021-07-12 04:13:18,687] {geos.py:252} INFO - Ring Self-intersection at or near point 3387 111
[2021-07-12 04:13:18,691] {geos.py:252} INFO - Ring Self-intersection at or near point 3463 123
[2021-07-12 04:13:18,692] {geos.py:252} INFO - Ring Self-intersection at or near point 3484 126
[2021-07-12 04:13:18,697] {geos.py:252} INFO - Ring Self-intersection at or near point 3742 125
[2021-07-12 04:13:18,699] {geos.py:252} INFO - Ring Self-intersection at or near point 3454 132
[2021-07-12 04:13:18,706] {geos.py:252} INFO - Ring Self-intersection at or near point 3374 140
[2021-07-12 04:13:18,715] {geos.py:252} INFO - Ring Self-intersection at or near point 3494 149
[2021-07-12 04:13:18,717] {geos.py:252} INFO - Ring Self-intersection at or near point 3378 155
[2021-07-12 04:13:18,720] {geos.py:252} INFO - Ring Self-intersection at or near point 3368 153
[2021-07-12 04:13:18,725] {geos.py:252} INFO - Ring Self-intersection at or near point 3386 161
[2021-07-12 04:13:18,726] {geos.py:252} INFO - Ring Self-intersection at or near point 3885 162
[2021-07-12 04:13:18,778] {geos.py:252} INFO - Ring Self-intersection at or near point 3955 297
[2021-07-12 04:13:18,790] {geos.py:252} INFO - Ring Self-intersection at or near point 3903 303
[2021-07-12 04:13:18,795] {geos.py:252} INFO - Ring Self-intersection at or near point 3919 319
[2021-07-12 04:13:18,797] {geos.py:252} INFO - Ring Self-intersection at or near point 3936 319
[2021-07-12 04:13:18,803] {geos.py:252} INFO - Ring Self-intersection at or near point 3816 323
[2021-07-12 04:13:18,812] {geos.py:252} INFO - Ring Self-intersection at or near point 3921 342
[...............]
[2021-07-12 04:13:27,081] {geos.py:252} INFO - Ring Self-intersection at or near point 2637 3997
[2021-07-12 04:30:14,013] {verify.py:119} INFO - Checksumming PosixPath('/tmp/lccs/.odcdataset-clwyq2l3/ga_ls_landcover_class_cyear_2_1-0-0_au_2013-01-01.proc-info.yaml')
[2021-07-12 04:30:14,045] {verify.py:119} INFO - Checksumming PosixPath('/tmp/lccs/.odcdataset-clwyq2l3/ga_ls_landcover_class_cyear_2_1-0-0_au_2013-01-01.odc-metadata.yaml')
[2021-07-12 04:30:14,054] {gridded_classification.py:344} INFO -  Exporting Level 4 Classification RGB image
[2021-07-12 04:30:14,332] {gridded_export.py:158} INFO - Saved to /tmp/lccs/2013_21_-30L4_rgb_v-0.5.0.tif
[2021-07-12 04:30:14,373] {process_lccs.py:140} INFO - Uploading to lccs-dev
[2021-07-12 04:30:14,653] {process_lccs.py:157} INFO - tile 21 -30: uploaded 15 files to lccs-devCPU activity remains minimal

The rgb image for the above tile looks like:

Example failure:

Tile id: 21, -30, year 2013
Land cover results: s3://lccs-dev/1-0-0/2013/x_21/y_-30/

Also, this doesn't happen at low resolution [-100, 100]. It only happens when I run at full resolution [-25, 25].

Doing singleband thumbnails will sometimes generate a warning

/env/lib/python3.6/site-packages/eodatasets3/images.py:724: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

The code triggering it is out_data[index][data == value] = rgb[index], which needs to be fixed.

metadata for Landsat collection not complete - requires updating

PQ:
Relevant snippets defining product:
pq_metadata.yml should be ga-metadata.yml for consistency
software version in pq metadata not populated and lineage from L1T and NBAR need to be included

software_repository: https://github.com/GeoscienceAustralia/ga-neo-landsat-processor.git
  software_version: _!!python/unicode_ '4.1+3.g73458ae.dirty'

NBAR/T:
Relevant snippets defining product:

            eodatasets:
                repo_url: https://github.com/GeoscienceAustralia/eo-datasets.git
                version: 0.2.2+13.gc397b5b
            _nbar: 4.1+dirty_

            lineage:
                algorithm:
                    name: LPGS
                    version: 12.7.0

Level1 lineage looks like this which is good!!! - addressing this issue will aim to maintain consistency in NBAR/T and PQ products:NBAR and PQ should style and build from this example

lineage:
    algorithm:
        name: LPGS
        version: 12.7.0
            eodatasets:
                repo_url: https://github.com/GeoscienceAustralia/eo-datasets.git
                version: '0.4'
            galpgs:
                repo_url: https://github.com/GeoscienceAustralia/galpgs.git
                version: 1.1.1
            gqa:
                repo_url: https://github.com/GeoscienceAustralia/gqa.git
                version: '0.6'
            pinkmatter: 4.1.4084

Setting processed time to "now" is error-prone for users

It's easy to set processing time like this:

p.processed = datetime.now()

But that's wrong! The date has no timezone, so the assembler will assume it's in UTC.

This is technically the users' fault but is an easy mistake to make. Ideally the api should make this mistake less easy.

Our fields use UTC by default because almost every metadata format we read from has dates in UTC (without any timezone).

Some options:

Warn when there's no timezone, so the user wants to include one explicitly.
Add UTC to the parameter name p.processed_utc = ...
Add a "p.processed_now()` function that fills it in for you. This could perhaps fill-in other host information too.

Replace gdal-cli calls with pure-python/rasterio

The cog-writing methods are descended from wagl's code, and use a mixture of both rasterio and the gdal shell commands (gdal_translate and gdalwarp).

This is not ideal.

there's nothing guaranteeing the user has gdal on their path, as @omad has pointed out.
it leads to "double" the memory usage, as the Python process grows its memory usage during rio operations, but this memory is not reused when calling the external commands. (this affected some pbs jobs)

Warn if package is finished without calling `done()`

It's probably an easy mistake to make to not call done() at the end of the an assembler block.

Ortho band metadata doesn't match the files: different case.

The packager reads band names from the MTL, which have an uppercase '.TIF' suffix.

But output files have a lowercase '.tif' suffix. Filenames aren't being translate properly.

Add support for writing Stac Items directly (without conversion)

As suggested by @alexgleith in #98

Extract basic metadata for raw Modis datasets

Document command-line packaging scripts

NBAR bands are sometimes not present in metadata

When defining a new naming convention with 'platform' as an optional field, a default plaform can be accidentally chosen.

Oh wow, with no platform specified at all in that test it's marking it as "ls". I assume we don't want that. It's because the derivate code assumes platform is required.

Originally posted by @jeremyh in #166 (comment)

update packaging for GA Landsat NBAR and NBAR-T

yaml format metadata is now part of production NBAR-T - updates are required to support packaging for NBAR-T

see examples here:

/g/data/v10/testing_ground/nbar-tc/2015-10/output/LS8_OLITIRS_NBAR_P51_GALPGS01-032_115_079_20151001/nbar-metadata.yml

We are targeting both NBAR and NBAR-T products
Eg. /g/data/v10/testing_ground/nbar-tc/2015-10/output/LS8_OLITIRS_NBAR_P51_GALPGS01-032_115_079_20151001/Reflectance_Outputs/
NBAR = reflectance_brdf_X.bin
NBAR-T = reflectance_terrain_X.bin

The ultimate aim is to have the data packaged and ingested to enable production of our first NBAR-T mosaic.

Overview image in generated COGs use default 128 pixel tile size

Default tile size for overview images is still stuck in the 90s, 128x128 pixel.

tiffinfo ga_ls_wo_3_101072_2019-12-13_final_water.tif | grep "Tile Width"
  Tile Width: 512 Tile Length: 512
  Tile Width: 128 Tile Length: 128
  Tile Width: 128 Tile Length: 128
  Tile Width: 128 Tile Length: 128

This is way too small given that our overview images can be as big as 4800x4800 pixels.

Overview tile size is configured via GDAL_TIFF_OVR_BLOCKSIZE= parameter, see here for example

https://github.com/opendatacube/datacube-core/blob/afca22831eecf8092e0c293064a9ea465aa6d778/datacube/utils/cog.py#L144

I think this code should be using write_cog from datacube lib.
https://github.com/GeoscienceAustralia/eo-datasets/blob/05c5e812ecf2e8baaa8a3e33cc55717dd452b46d/eodatasets3/images.py#L440-L448

https://datacube-core.readthedocs.io/en/latest/dev/api/generate/datacube.utils.cog.write_cog.html

Add LS7 packaging integration tests

Record groundstation metadata: Antenna, Channel, Demod

As detailed by @vrooke:

Provide Ground station, Antenna, Channel and Demod used

We can retrieve the antenna, channel number and demod number from the RCC files names and/or packaged directory names for L7

Directory: LANDSAT-7.86231.S3A2C2D4R4, S3 = ASN, A2=5m antenna, C2 = Channel 2, D4 = HDRM4 (Demod 4)

All available options

S (Site) options S1 = ASA , S3 = ASN

A (Antenna) A1 = 9m antenna , A2 = 5m antenna

C (Channel) options C1 = Channel 1 , C2 = Channel 2

D (Demods) options D1 = Demod1-HDRM1 , D2 = Demod2-HDRM2 , D3 = Demod3-HDRM3 , D4 = Demod4-HDRM4

File name: L7EB2015182144955ASN111Q00.data

ASN111 = ASN , Antenna 1, Channel 1, Demod1-HDRM1

All available options

YYYACD, YYY = Site (ASN or ASA), A = antenna, C = Channel (1 or 2) , D = Demod (HDRM1, HDRM2, HDRM3, HDRM4)

Add integration tests for packaging of Landsat datasets

Requirement for rapidjson is not set

Using EODatasets in datacube-alchemist has failed with:

Traceback (most recent call last):
  File "/env/bin/datacube-alchemist", line 5, in <module>
    from datacube_alchemist.cli import cli_with_envvar_handling
  File "/env/lib/python3.6/site-packages/datacube_alchemist/cli.py", line 10, in <module>
    from datacube_alchemist.worker import Alchemist, get_messages
  File "/env/lib/python3.6/site-packages/datacube_alchemist/worker.py", line 26, in <module>
    from datacube_alchemist._utils import (
  File "/env/lib/python3.6/site-packages/datacube_alchemist/_utils.py", line 11, in <module>
    from eodatasets3.scripts.tostac import dc_to_stac, json_fallback
  File "/env/lib/python3.6/site-packages/eodatasets3/scripts/tostac.py", line 14, in <module>
    import eodatasets3.stac as eo3stac
  File "/env/lib/python3.6/site-packages/eodatasets3/stac.py", line 11, in <module>
    import rapidjson
ModuleNotFoundError: No module named 'rapidjson'

Need additional instructions on setting up to perform development

Using current instructions pytest fails with missing imports. Additional steps are needed for pytest to work smoothly ( assumes conda is used to allow pre-compiled GDAL) :

pip install -e .
pip install -e .[test]
conda install gdal
export GDAL_DATA = "where GDAL_DATA went"
pip install eodatasets3[ancillary]
pip install h5py

As of 16/12/2019 running pytest here has the following result:

=== 2 failed, 91 passed, 25 warnings in 7.57s ===

against origin/HEAD

Add validation for minimum required metadata fields

Changes to the BRDF date logic for dataset maturity

We've updated the date logic for BRDF fallback determination. The previous logic was for anything pre 2001.

It has now changed to 2002-07-01.

The test located at:
https://github.com/GeoscienceAustralia/eo-datasets/blob/eodatasets3/tests/integration/test_packagewagl.py#L552

Needs to be updated.

Enforce consistency of Naming Conventions (e.g. GeoTiff/GeoTIFF)

Eo-datasets currently uses both GeoTiff and GeoTIFF internally as defaults, it should be brought into some kind of alignment internally.

If there is a requirement to support both (for historical reasons) this should be configurable at run-time.

Add LS5 packaging integration tests

Add DatasetAssembly support for outputting the `location` field

The eo3 format added the ability to include the location of a dataset inside the metadata, and datacube 1.8.2+ will use this value instead of the metadata's own location when it is provided.

eo-datasets can parse this field, and keeps track of locations, but the DatasetAssembler API doesn't currently have an option to write it into the document.

Ideally we want to add this before people start using the old workarounds in new products, such as using absolute paths in bands (which eodatasets currently warns about but does not provide an alternative to).

NoneType' object has no attribute 'tzinfo'

I am trying to use eo-datasets to generate a dataset document, and then load that data using datacube.
I use the devel branches from both eo-datasets and datacube-core.

I use the following script to generate the dataset document:

from eodatasets3 import DatasetAssembler
from datetime import datetime
from pathlib import Path

lc8 = Path('/home/danlipsa/data/iarpa_smart_sampledata/LC08_L1GT_044034_20130330_20170310_01_T2')
[blue_geotiff_path] = lc8.rglob("L*_B2.TIF")
out = Path(blue_geotiff_path.parent / Path(blue_geotiff_path.stem + '.yaml'))
print("Out: {}", out)

with DatasetAssembler(metadata_path=out) as p:
  p.datetime = datetime(2019, 7, 4, 13, 7, 5)
  p.product_family = "landsat8_example_product"
  p.properties["odc:file_format"] = "GeoTIFF"
  p.processed_now()

  # Note the measurement in the metadata. (instead of ``write``)
  p.note_measurement('blue', blue_geotiff_path,
                     relative_to_dataset_location=True)

  p.done()

which results in the following yaml file:

---
# Dataset
$schema: https://schemas.opendatacube.org/dataset
id: 2828e23a-8501-43ca-9449-fa7427e49614

label: landsat8_example_product_2019-07-04
product:
  name: landsat8_example_product

crs: epsg:32610
geometry:
  type: Polygon
  coordinates: [[[374895.0, 4258485.0], [332775.0, 4080195.0], [334387.96078400686,
        4079775.8375337427], [514935.0, 4036365.0], [515054.10427500436, 4036717.7239312488],
      [515804.18418644054, 4039868.051384181], [534524.1924086247, 4118768.0860084835],
      [557466.2132034355, 4216026.213203436], [556671.7954048398, 4216244.220240811],
      [374895.0, 4258485.0]]]
grids:
  default:
    shape: [7421, 7511]
    transform: [30.0, 0.0, 332385.0, 0.0, -30.0, 4258815.0, 0.0, 0.0, 1.0]

properties:
  datetime: 2019-07-04 13:07:05Z
  odc:file_format: GeoTIFF
  odc:product_family: landsat8_example_product

image:
  bands:
    'blue':
      path: LC08_L1GT_044034_20130330_20170310_01_T2_B2.TIF

accessories: {}

lineage: {}
...

Note that I had to replace

measurements:
  blue:

with

image:
  bands:
    'blue':

Otherwise the dataset does not get added to datacube. After this I can add both the product and the dataset to datacube.
Here is the product:

name: landsat8_example_product
description: Landsat 8 example product
metadata_type: eo

metadata:
    product:
        name: landsat8_example_product
    # Alternatively, include specific items to match
    # properties:
        # eo:instrument: OLI_TIRS
        # eo:platform: landsat-8

measurements:
    - name: 'blue'
      aliases: [band_2, sr_band2]
      dtype: int16
      nodata: -9999
      units: 'reflectance'

I try to show the image using datacube using the following script:

import datacube
import matplotlib.pyplot as plt

dc = datacube.Datacube(app='plot-rgb-recipe')
print(dc.list_products())
print(dc.list_measurements())

query = {
    'lat': (36.46, 38.46),
    'lon': (-124.92, -122.35)
}

data = dc.load(product='landsat8_example_product', measurements=['blue'], resolution=(-30, 30), output_crs="EPSG:32610")
print(data.data_vars['blue'])
data.data_vars['blue'].plot()
plt.show()

however I get the following error:

env) [~/tasks/datacube]$ python show-result-assembler.py 
                        name                description   lon   lat  time format platform creation_time label instrument product_type  crs  resolution  tile_size  spatial_dimensions
id                                                                                                                                                                                   
1   landsat8_example_product  Landsat 8 example product  None  None  None   None     None          None  None       None         None  NaN         NaN        NaN                 NaN
----------------------------------------------------------------------
                                      name  dtype        units  nodata             aliases
product                  measurement                                                      
landsat8_example_product blue         blue  int16  reflectance   -9999  [band_2, sr_band2]
Traceback (most recent call last):
  File "show-result-assembler.py", line 14, in 
    data = dc.load(product='landsat8_example_product', measurements=['blue'], resolution=(-30, 30), output_crs="EPSG:32610")
  File "/home/danlipsa/projects/datacube-core/env/lib/python3.8/site-packages/datacube/api/core.py", line 315, in load
    grouped = self.group_datasets(datasets, group_by)
  File "/home/danlipsa/projects/datacube-core/env/lib/python3.8/site-packages/datacube/api/core.py", line 404, in group_datasets
    datasets = sorted(datasets, key=group_func)
  File "/home/danlipsa/projects/datacube-core/env/lib/python3.8/site-packages/datacube/api/query.py", line 153, in _extract_time_from_ds
    return normalise_dt(ds.center_time)
  File "/home/danlipsa/projects/datacube-core/env/lib/python3.8/site-packages/datacube/utils/dates.py", line 109, in normalise_dt
    if dt.tzinfo is not None:
AttributeError: 'NoneType' object has no attribute 'tzinfo'

Any suggestions of what am I doing wrong? Thanks!

Record groundstation metadata: Demod version and configuration

As detailed by @vrooke:

Provide Demod software and config file version/name

Read the demod config file, LANDSAT-7.86231_D1

... load desktop LANDSAT-7_V0.1.ndt

This provides the name of the demod configuration used for this pass and the version is embedded with the filename.

Upload directory to s3 is incorrect if destination directory exists

This is for branch network-fs-support-newbase, but just to keep track.

When moving temp data from /some/temp_dir/* to s3://bucket/final/destination/ we expect that files from /some/temp_dir/ are copied to s3, but the directory itself is not copied to s3.

https://github.com/GeoscienceAustralia/eo-datasets/blob/075c0e787bc8809d9831afb917de4d4acad7091d/eodatasets3/utils.py#L177

Behaviour of the line above differs depending on whether destination folder exists, at least for s3, but probably for fs as well.

Expected behaviour is equivalent to:

mkdir -p $FINAL_DIR && cp -r $TMP_DIR/* $FINAL_DIR/

Actual behaviour when $FINAL_DIR exists is:

cp $TMP_DIR $FINAL_DIR/

which creates extra directory layer under $FINAL_DIR

Some of the new S2 integration test data is too large

A few of the new integration tests are quite large:

1.9M	tests/integration/data/S2B_OPER_MSI_L1C_TL_EPAE_20180617T013729_A006677_T55JGF_N02.06.AWSPDS.zip
11M	tests/integration/data/wagl-input
16M	tests/integration/data/esa_s2_l1c

These have made the download size for eodatasets quite big. Previous test datasets were only in the kilobytes for similar scenes.

Add TIRS SSM model for Landsat 8 to the Level 1 packaged metadata

One additional ancillary source needs to be added in the yaml file for LS8 product from the main LPGS log file (landsat-processor.log)

ga-metadata.yaml:
……
ancillary_quality: DEFINITIVE
ancillary:
cpf:
……………………..
access_dt: 2016-05-18 18:13:19
modification_dt: 2014-11-26 13:03:32
checksum_sha1: b59e3585757e2a09937a7b8482786c205354807e
ssm:
name: xxxxxxx
uri: /g/data/v10/eoancillarydata/sensor-specific/LANDSAT8/TIRS-SSM/xxxxxxxxx
…….
And, how do you define the field “ancillary_quality” – I assume that if all ancillary = definitive, then DEFINITIVE?

In LS8 case, ssm may not be DEFINITIVE. We could add “TIRS_SSM_MODEL” (from MTL) in yaml file. If TIRS_SSM_MODEL = Preliminary, then ancillary_quality is not DEFINITIVE.

And, how do you define the field “ancillary_quality” – I assume that if all ancillary = definitive, then DEFINITIVE?

In LS8 case, ssm may not be DEFINITIVE. We could add “TIRS_SSM_MODEL” (from MTL) in yaml file. If TIRS_SSM_MODEL = Preliminary, then ancillary_quality is not DEFINITIVE.

Can't use custom `eo:platform` metadata value

I'm trying to create eo3 dataset metadata for some MODIS data with the DatasetAssembler, and I'd like to include whether the data is from Terra or Aqua. The best fit in the metadata definition I've found is the eo:platform property.

But when I try to pass in a platform code, I get a NotImplementedError, raised in:

https://github.com/GeoscienceAustralia/eo-datasets/blob/e07a3e87e6d6990ef3a6045784930bbb49455bfd/eodatasets3/model.py#L337

and likely also in

https://github.com/GeoscienceAustralia/eo-datasets/blob/e07a3e87e6d6990ef3a6045784930bbb49455bfd/eodatasets3/model.py#L356

meaning I can't use any platform name that doesn't start with sentinel-1, sentinel-2 or landsat.

Is there any reason the DatasetAssembler can't just use the user defined platform code if it doesn't match S1/S2/LS ?

Excessive memory usage in NCI jobs

On monitored systems such as NCI, the reported memory usage includes any cached objects. GDAL defaults to using 5% of the available memory.

One suggestion would be to tile the I/O rather than read the whole image at once into memory. This would enable processing of images larger than the available memory, but would require some time to rewrite. Unfortunately, time is not on our side so this approach will have to wait.

A quicker solution for the short term is to change the GDAL_CACHEMAX environment. This can be done temporarily at the time of opening any imagery by:

with rasterio.Env(GDAL_CACHEMAX=64):
    with rasterio.open(<filename>) as src:
        # do stuff

Where 64 indicates 64MB.

opendatacube / eo-datasets Goto Github PK

eo-datasets's People

Contributors

Stargazers

Watchers

Forkers

eo-datasets's Issues

Recommend Projects

Recommend Topics

Recommend Org