
opera-sds-pcm's Introduction

opera-pcm

OPERA PCM (Process Control & data Management) Software and Configuration for the Science Data System (SDS)

opera-sds-pcm's People

Contributors

chrisjrd, collinss-jpl, gshiroma, hhlee445, maseca, philipjyoon, riverma, sjlewis-jpl


opera-sds-pcm's Issues

[Bug]: data_subscriber_download_timer throws exception when there is no data to download

Checked for duplicates

Yes - I've already checked

Describe the bug

While testing hlsl30_query_timer, hlss30_query_timer, and data_subscriber_download_timer, I noticed that data_subscriber_download throws the exception below when hlsl30_query_timer and hlss30_query_timer fail to execute.

Traceback (most recent call last):
File "/home/ops/verdi/ops/hysds-1.1.5/hysds/job_worker.py", line 1237, in run_job
raise RuntimeError("Got non-zero exit code: {}".format(status))
RuntimeError: Got non-zero exit code: 1

What did you expect?

I expected the data_subscriber_download_timer to execute without error and to report that the data_subscriber_catalog ES index does not exist or that there is no data to download.
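A minimal sketch of the expected behavior, assuming an elasticsearch-py client and a `downloaded` flag on catalog documents (both illustrative, not the actual data_subscriber implementation): the job checks for the index and for pending entries and exits cleanly instead of raising.

```python
# Illustrative only: exit cleanly when the catalog index or pending data is missing.
from elasticsearch import Elasticsearch

def download_pending(es: Elasticsearch, index: str = "data_subscriber_catalog") -> int:
    # If no query has ever run, the index may not exist yet; that is not an error.
    if not es.indices.exists(index=index):
        print(f"Index {index} does not exist yet; nothing to download.")
        return 0

    result = es.search(index=index, query={"term": {"downloaded": False}}, size=1000)
    hits = result["hits"]["hits"]
    if not hits:
        print("No pending granules found; exiting without error.")
        return 0

    # ... download each pending granule here ...
    return len(hits)
```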

Reproducible steps

No response

Environment

- PCM branch: issue_75 
- Lambda branch: issue_75

[New Feature]: Add all use cases for R1 in the smoke test script

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

PCM needs to have a smoke test script to exercise R1 capabilities.

  • ingest hls l2 data
  • execute l3_dswx_hls_pge
  • ingest l3 outputs
  • send notification to daac
  • receive notification from daac and update the delivery status
  • query and download using data subscriber script

Refactor terraform install script

The Terraform install script should be generalized so that it can provision a cluster using a project-specific Terraform variables config file.

PCM Core capabilities verification

Verify the PCM core capabilities with NISAR PCM R2 deployment to OPERA dev venue.

Use NISAR PCM Operator Guide and Use Cases documents for details.

  1. Ingest Ancillary/Auxiliary Data
  2. Ingest main data such as L0A, L0B, etc
  3. Produce products from PGEs
    • verify that the pge is executed properly
    • verify that the outputs are generated
    • verify that the outputs are ingested in PCM
  4. Verify that inputs/outputs are copied into the correct storage (rolling storage vs. long term storage)
  5. Deliver data product to DAAC
    • verify that PCM generates CNM-S and notify
    • validate the products
    • catalog
    • handle multiple version
  6. Access data at daac once the data is delivered
  7. Fault Tolerance
    • Detect software failure (PCM or PGE)
    • Detect hardware failure
    • self-healing hardware
    • Identify and resolve processing failures
  8. Monitor SDS
  9. Control SDS
  10. Reports (data processing, data delivery, data accountability)
  11. Metric - latency
  12. Autoscaling - exercise with smoke test script?

Write a report with steps and results

[New Feature]: Use different queue for data download jobs and data_query_timer_trigger_frequency variable for query

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

Data download jobs were using the 'opera-job_worker-small' queue, which is also used by PGE jobs, so data downloads were not occurring while PGE jobs were executing.

Describe the feature request

  1. Use a different queue for data download jobs and PGE job execution.

  2. Pass the data_query_timer_trigger_frequency variable to daac_data_subscriber_query.sh as the -m option.

Initial design for bulk historical processing

OPERA SDS needs to process 125 months of historical data within a 12-month period.
We need a tool to automate the processing of bulk historical data in addition to the forward processing.
PGE to trigger: L2_CSLC_S1 PGE
Data region to cover: North America
Input data: S1 A/B SLC

Update on L3_DSWx_HLS_PGE integration

PCM ER2 contains the initial integration of L3_DSWx_HLS_PGE with PCM.
PCM can trigger the L3_DSWx_HLS_PGE after staging HLSS30 or HLSL30 data into the ISL bucket. However, PCM currently handles only a single .tif output and has not been tested with many outputs.

L3_DSWx_HLS_PGE plans to produce multiple .tif files.
PCM should be able to ingest all outputs and send notification for each product.

We also need to verify that all of the following steps are working properly.

  1. Invocation of L3_DSWx_HLS PGE
  2. Storage of L3_DSWx_HLS products on rolling storage
  3. Notification of product availability to PO DAAC
  4. Update delivery status of output product upon PO DAAC confirmation of retrieval (may not be done in ER3)

Trigger rule for L3_DSWx_HLS execution

Define a trigger rule to execute L3_DSWx_HLS_PGE.

To run the PGE, the following files are required (a completeness-check sketch follows the list below).

  • For L30 (Landsat) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B05, B06, B07, and Fmask

  • For S30 (Sentinel) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B8A, B11, B12, and Fmask
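A minimal sketch of a completeness check a trigger rule evaluator could run before submitting the PGE job; the band sets are taken from the lists above, while the function and its inputs are hypothetical.

```python
# Sketch of a per-granule completeness check before triggering L3_DSWx_HLS.
REQUIRED_LAYERS = {
    "L30": {"B02", "B03", "B04", "B05", "B06", "B07", "Fmask"},
    "S30": {"B02", "B03", "B04", "B8A", "B11", "B12", "Fmask"},
}

def ready_to_trigger(product_type: str, downloaded_layers: set[str]) -> bool:
    """Return True once every required layer for this HLS granule is on hand."""
    return REQUIRED_LAYERS[product_type].issubset(downloaded_layers)

# Example: an S30 granule missing B12 is not yet ready.
print(ready_to_trigger("S30", {"B02", "B03", "B04", "B8A", "B11", "Fmask"}))  # False
```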

Manual DAAC query and retrieval script

SDS needs to be able to query various DAACs (ASF DAAC, LP DAAC, or PO DAAC) and retrieve data from them into the OPERA ISL location.

The following information should be passed to the query:

  • product type(s)
  • which DAAC (ASF DAAC or LP DAAC)
  • time range (start date and/or end date)
  • ISL bucket name to download into
  • access info to DAACs (optional)
  • download mode (S3-default or http)

Note: we can use NISAR daac retrieval script as reference.
https://github.jpl.nasa.gov/IEMS-SDS/nisar-pcm/blob/develop/tools/sds_daac_retrieval.py

This script will be used by a PCM worker to automatically query and retrieve data into OPERA ISL S3 bucket.

Split DAAC query and retrieval execution

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

The current data-subscriber script queries the DAAC CMR and retrieves the data into our S3 bucket within one script.
We should split the query and retrieval functions into separate scripts so that the query script can be executed by a factotum worker and the retrieval script can be executed by many verdi workers. A sketch of the proposed split follows.
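A rough sketch of the split, assuming a shared data_subscriber_catalog ES index with illustrative field names; a real implementation would also need the download side to claim documents atomically so parallel verdi workers do not duplicate work.

```python
# Hypothetical split: the query step (factotum) only records what to fetch; the
# download step (verdi, many workers) only fetches what is recorded as pending.
from elasticsearch import Elasticsearch

INDEX = "data_subscriber_catalog"  # illustrative index/field names

def query_step(es: Elasticsearch, cmr_results: list[dict]) -> None:
    """Record granule URLs returned by the CMR query; download nothing here."""
    for granule in cmr_results:
        es.index(index=INDEX, id=granule["id"],
                 document={"url": granule["url"], "downloaded": False})

def download_step(es: Elasticsearch, transfer) -> None:
    """Fetch pending granules and mark them downloaded."""
    pending = es.search(index=INDEX, query={"term": {"downloaded": False}}, size=100)
    for hit in pending["hits"]["hits"]:
        transfer(hit["_source"]["url"])  # caller supplies the actual S3/HTTP transfer
        es.update(index=INDEX, id=hit["_id"], doc={"downloaded": True})
```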

Add query timestamp to data subscriber catalog index and download by oldest timestamp first

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

#58 (comment)

It would be great to add an ingested-time field to the data_subscriber_catalog index and then download after sorting on that field in ascending order. Then we will be able to get a complete set of files for each granule (tile based) faster, to trigger the L3_DSWx_HLS PGE sooner. A sketch follows the list below.

  • Add query timestamp field to data_subscriber_catalog index
  • Download in ascending order on query timestamp field
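A sketch of the proposed ordering, assuming a query_datetime field written at query time; the index name, field names, and client calls are illustrative rather than the actual catalog schema.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# At query time, stamp each catalog entry with the time it was discovered.
es.index(index="data_subscriber_catalog",
         id="HLS.S30.T17SLU.2020117T160901.v2.0.B01.tif",
         document={"downloaded": False,
                   "query_datetime": datetime.now(timezone.utc).isoformat()})

# At download time, work oldest-query-first so complete granule (tile) sets become
# available sooner and can trigger the L3_DSWx_HLS PGE earlier.
pending = es.search(index="data_subscriber_catalog",
                    query={"term": {"downloaded": False}},
                    sort=[{"query_datetime": {"order": "asc"}}],
                    size=100)
```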

Split data query timer for HLS S30 and HLS L30

The data_query_timer queries both the HLS S30 and HLS L30 types in a single script, and it always ingests HLS L30 types first, so the download timer won't download HLS S30 until the HLS L30 data download is done. The query timer needs to be split into two separate timers for the S30 and L30 types.

Rename data_subscriber timer and shell script

  • data_query_timer -> hlsl30_query_timer & hlss30_query_timer
  • data_download_timer -> hls_download_timer
  • daac_data_subscriber_download.sh -> hls_download.sh
  • daac-data_subscriber_query.sh -> hlsl30_query.sh & hlss30_query.sh

Add the following log messages at the end of the query log (see the sketch after this list):

  • number of files found to download
  • number of files skipped (non-required bands)
  • number of files ingested to download
  • Range of data query: begin time & end time
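A minimal sketch of the requested end-of-run summary; the counters are assumed to be tracked by the query script already, and the logger name is illustrative.

```python
import logging

logger = logging.getLogger("daac_data_subscriber_query")  # illustrative name

def log_query_summary(found: int, skipped: int, ingested: int,
                      start_time: str, end_time: str) -> None:
    logger.info("Files found to download: %d", found)
    logger.info("Files skipped (non-required bands): %d", skipped)
    logger.info("Files ingested to download: %d", ingested)
    logger.info("Query range: %s to %s", start_time, end_time)
```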

[New Feature]: Use product_counter for L3_DSWx_HLS output

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

PCM currently generates outputs from the L3_DSWx_HLS PGE with product_counter set to 1. Therefore, we can't re-submit a PGE job with the same inputs unless we clear out the L3 outputs. We should be able to increment the product_counter when there are existing L3 outputs.
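A small sketch of how the next counter could be derived from already-ingested outputs; the trailing two-digit counter matches the output naming convention described elsewhere in this repo, and the lookup of existing product IDs is left to the caller.

```python
import re

def next_product_counter(existing_product_ids: list[str]) -> int:
    """Return 1 when no L3 outputs exist for these inputs, otherwise max(counter) + 1."""
    counters = [int(m.group(1))
                for pid in existing_product_ids
                if (m := re.search(r"_(\d{2,})$", pid))]   # trailing counter, e.g. "_01"
    return max(counters, default=0) + 1

print(next_product_counter([]))                                                          # 1
print(next_product_counter(["OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01"]))   # 2
```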

[New Feature]: Migrate to the Universal AMIs

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

The PCM should be able to use the Universal AMIs used by all projects (not specific to OPERA)

Details are available at https://github.jpl.nasa.gov/IEMS-SDS/nisar-pcm/pull/468#issuecomment-457901

There now exists a Jenkins job that determines the latest patched AMIs and runs a smoke test against it:
https://opera-pcm-ci.jpl.nasa.gov/job/opera-pcm-develop-latest-patched-amis/

[New Feature]: Generate CNM-S msg with multiple tif files from L3_DSWx_HLS PGE

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

We need to generate a CNM-S message with multiple tif files after executing the L3_DSWx_HLS PGE.

We also need to be able to set data_version and schema_version from the CNM notify trigger rule, and to set a different product type for the DAAC rather than using the ProductType metadata. A rough example of the message shape follows.
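A rough Python sketch of the shape such a message could take, with one files entry per output .tif; field names loosely follow the public CNM schema and the values are placeholders, so they must be checked against the CNM version, data_version, and schema_version configured in the trigger rule.

```python
import json
from datetime import datetime, timezone

output_tifs = ["output_layer_1.tiff", "output_layer_2.tiff"]  # placeholder filenames

cnm_s = {
    "version": "1.0",                       # schema_version from the trigger rule
    "provider": "JPL-OPERA",                # placeholder
    "collection": "OPERA_L3_DSWX_HLS",      # DAAC product type, not ProductType metadata
    "submissionTime": datetime.now(timezone.utc).isoformat(),
    "identifier": "OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01",
    "product": {
        "name": "OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01",
        "dataVersion": "1.0",               # data_version from the trigger rule
        "files": [{"type": "data", "name": name,
                   "uri": f"s3://opera-rs-bucket/{name}"}   # hypothetical RS bucket
                  for name in output_tifs],
    },
}
print(json.dumps(cnm_s, indent=2))
```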

[New Feature]: Launch multiple workers by data download timer

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

The current DAAC data download timer submits a job that uses only one worker to download data from the DAAC to our ISL bucket. We need to be able to launch many workers to download data simultaneously.
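A sketch of one way to fan the pending list out over several workers: chunk the URLs and submit one download job per chunk. `submit_download_job` is a hypothetical stand-in for whatever HySDS job-submission call PCM actually uses.

```python
from typing import Callable

def fan_out_downloads(pending_urls: list[str], num_workers: int,
                      submit_download_job: Callable[[list[str]], str]) -> list[str]:
    """Split the URL list into num_workers chunks and submit one download job each."""
    chunk_size = max(1, -(-len(pending_urls) // num_workers))   # ceiling division
    job_ids = []
    for start in range(0, len(pending_urls), chunk_size):
        job_ids.append(submit_download_job(pending_urls[start:start + chunk_size]))
    return job_ids

# Example with a dummy submitter: 10 URLs over 3 workers -> chunks of 4, 4, 2.
print(fan_out_downloads([f"u{i}" for i in range(10)], 3, lambda chunk: f"job({len(chunk)})"))
```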

Benchmark testing for ASF DAAC query/retrieval

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

Repeat of #22 with ASF DAAC data

Describe the feature request

PCM needs to test how fast we can query and retrieve data from ASF DAAC in preparation for bulk historical processing.

DAAC retrieval timer needs to query HLS L30

Checked for duplicates

Yes - I've already checked

Describe the bug

When I tested data-subscriber-timer, I noticed it only queries the HLSS30 type.
We should query the HLSL30 type as well.
I also noticed that the ISL bucket name is always 'opera-dev-isl-fwd-mplotkin'. The ISL bucket name should be entered by the user.

What did you expect?

I expected to query the DAAC for both HLSS30 and HLSL30 using the user's ISL bucket.
The log message should include the parameters that were passed to execute the script.
It would be better to combine the stdout.txt messages into the log.
The log filename should contain the data type and log creation day.
ex) HLSS30_daac_data_subscriber.log

Reproducible steps

1.

Environment

- opera-pcm v1.0.0-er.2.0
- HySDS v4.0.1-beta.8 (CentOS 7) AMIs

Automated DAAC query and retrieval

A worker should automatically query and retrieve data from the DAACs (LP DAAC, ASF DAAC, or PO DAAC) into the OPERA ISL. The duration of query and retrieval should be configurable. A CMR query sketch follows the steps below.

  1. Query DAAC CMR to get product metadata and URLs
  2. Ingest product metadata returned by CMR query into PCM ElasticSearch catalog
  3. Download the data, copy it into the ISL, and update the data status to 'downloaded'
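A minimal sketch of step 1 against the public CMR search API; the short_name, provider, and temporal values are only examples, and steps 2-3 are indicated by comments rather than implemented.

```python
import requests

CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"
params = {
    "short_name": "HLSS30",          # or HLSL30, etc.
    "provider": "LPCLOUD",
    "temporal": "2022-01-01T00:00:00Z,2022-01-02T00:00:00Z",
    "page_size": 2000,
}

response = requests.get(CMR_URL, params=params)
response.raise_for_status()
for item in response.json()["items"]:
    umm = item["umm"]
    urls = [u["URL"] for u in umm.get("RelatedUrls", []) if u.get("Type") == "GET DATA"]
    # Step 2: index the umm metadata + urls into the PCM ElasticSearch catalog.
    # Step 3: download each URL into the ISL and mark the entry 'downloaded'.
    print(umm["GranuleUR"], urls)
```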

Runconfig generation for L3_DSWx_HLS PGE

Generate a RunConfig for the L3_DSWx_HLS PGE. A sample RunConfig will be provided by the PGE team.

Input: L2_HLS (HLSS30 or HLSL30)
Anc: DEM, Global Landcover, Global built-up land layer

This should be testable with a mock-up L3_DSWx_HLS PGE.
Test data and a sample RunConfig file are located at
https://artifactory-fn.jpl.nasa.gov/artifactory/webapp/#/artifacts/browse/tree/General/general/gov/nasa/jpl/opera/adt/r1/interface/test_datasets.tar.gz

Required files:

  • For L30 (Landsat) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B05, B06, B07, and Fmask
  • For S30 (Sentinel) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B8A, B11, B12, and Fmask

L3_DSWx_HLS_PGE data ingestion

Ingest input and output data for L3_DSWx_HLS_PGE. PCM should be able to extract metadata and ingest into PCM catalog (ElasticSearch).

The input dataset is the HLS product version 2.0. HLS products provide surface reflectance (SR) data from the Operational Land Imager (OLI) aboard the Landsat 8 remote sensing satellite and the Multi-Spectral Instrument (MSI) aboard the Sentinel-2 A/B remote sensing satellite.

Input File Naming Convention:
HLS.<product type>.<tile id>.<data acquisition time>.<version>.<band/mask>.tif

product type: S30 or L30
data acquisition time: YYYYDOYTHHMMSS
version: v2.0
band or mask: B01 - B08, B8A, B09, B10, B11, B12, Fmask, SZA, SAA, VZA, VAA
ex) HLS.S30.T17SLU.2020117T160901.v2.0.B01.tif

Output File Naming Convention:
<Project>_<Level>_<ProductType>_<Source>_<Sensor>_<TileID>_<DateTime>_<ProductVersion>.<ext>

If the input is HLS.S30, the Sensor is 'Sentinel2'
If the input is HLS.L30, the Sensor is 'Landsat8'

production date time: YYYYMMDDTHHMMSS
ex) OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01.tiff

Format: GeoTIFF
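A small sketch of parsing the input naming convention above; the regular expression simply transcribes the pattern described here.

```python
import re

HLS_RE = re.compile(
    r"^HLS\.(?P<product_type>S30|L30)"
    r"\.(?P<tile_id>T[0-9A-Z]+)"
    r"\.(?P<acq_time>\d{7}T\d{6})"       # YYYYDOYTHHMMSS
    r"\.(?P<version>v[\d.]+)"
    r"\.(?P<band>[0-9A-Za-z]+)"
    r"\.tif$"
)

m = HLS_RE.match("HLS.S30.T17SLU.2020117T160901.v2.0.B01.tif")
print(m.groupdict())
# {'product_type': 'S30', 'tile_id': 'T17SLU', 'acq_time': '2020117T160901',
#  'version': 'v2.0', 'band': 'B01'}
# The Sensor for the output name is then 'Sentinel2' for S30 inputs and 'Landsat8' for L30.
```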

[New Feature]: PGE log file should be included in the output dataset

Checked for duplicates

Yes - I've already checked

Alternatives considered

No - I haven't considered

Related problems

I'm frustrated when testing the PCM ER2 release: I couldn't find any log for the PGE execution after the PGE executed successfully and the verdi worker was no longer up and running.

Describe the feature request

I want to include the PGE log as an output dataset and stage it into the RS location as the other output files are.

[New Feature]: Add spatial information for HLSS30 and HLSL30 products into the ES index

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

We need to store spatial information for HLSL30 and HLSS30 products after querying CMR. The spatial information is required to find the ancillary and DEM files for triggering the L3_DSWx_HLS PGE.

We want to store more metadata in our ES index along with the spatial information; a sketch of the resulting document follows the list below.

  • umm:DataGranule:ProductionDateTime
  • umm:DataGranule:Identifiers or umm:GranuleUR
  • umm:Platforms:ShortName
  • provider: LPCLOUD, ASFCLOUD, or POCLOUD

hls_spatial_catalog: granuleid, bbox, shortname, production_datetime, index_create_time
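A hypothetical sketch of building one hls_spatial_catalog document from a CMR UMM-G record; the UMM paths correspond to the fields listed above, but HLS granules may carry GPolygons rather than bounding rectangles, so the bbox handling shown here is only illustrative.

```python
from datetime import datetime, timezone

def to_spatial_doc(umm: dict, provider: str) -> dict:
    # Real granules may provide GPolygons instead of BoundingRectangles; handle both.
    rect = (umm["SpatialExtent"]["HorizontalSpatialDomain"]
               ["Geometry"]["BoundingRectangles"][0])
    return {
        "granuleid": umm["GranuleUR"],
        "bbox": [rect["WestBoundingCoordinate"], rect["SouthBoundingCoordinate"],
                 rect["EastBoundingCoordinate"], rect["NorthBoundingCoordinate"]],
        "shortname": umm["CollectionReference"]["ShortName"],
        "production_datetime": umm["DataGranule"]["ProductionDateTime"],
        "provider": provider,                                # LPCLOUD, ASFCLOUD, or POCLOUD
        "index_create_time": datetime.now(timezone.utc).isoformat(),
    }
```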

Improve data subscriber logging

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

#58 (comment)

I noticed that most log messages are stored in _stderr.txt rather than _stdout.txt. The logging level was DEBUG; how can we change it to a lower logging level?

  • Change default logging level to INFO
  • Verify all relevant information is being logged on the INFO level
  • Capture a summary of the query and download from the DAAC at the end of processing (# of files returned by the query, # of successful downloads, # of failed downloads, # of skipped downloads, etc.)

[New Feature]: On-demand daac data subscriber processing

Checked for duplicates

Yes - I've already checked

Alternatives considered

No - I haven't considered

Related problems

No response

Describe the feature request

PCM should allow using the Tosca UI to submit a job that queries and downloads products from the DAACs by entering geographic information and/or a time range.

The following fields need to be filled in:

  • provider: LP DAAC (LP CLOUD), ASF DAAC, PO DAAC (PO CLOUD?)
  • start datetime
  • end datetime
  • bounding box: optional
  • ISL bucket name

[Bug]: data-subscriber-timer throws an exception and downloads at most 2000 granules

Checked for duplicates

Yes - I've already checked

Describe the bug

  1. When I tested data-subscriber jobs for both HLS S30 & HLS L30 types (issue_50), I noticed the data-subscriber-timer throws an exception before finishing the HLS data download.

Traceback (most recent call last):
File "/export/home/hysdsops/verdi/ops/hysds-1.1.5/hysds/job_worker.py", line 1193, in run_job
monitoredRunner.join()
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/process.py", line 148, in join
res = self._popen.wait(timeout)
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/popen_fork.py", line 57, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/popen_fork.py", line 33, in poll
pid, sts = os.waitpid(self.pid, flag)
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/pool.py", line 229, in soft_timeout_sighandler
raise SoftTimeLimitExceeded()
billiard.exceptions.SoftTimeLimitExceeded: SoftTimeLimitExceeded()

s3://opera-dev-triage-fwd-hyunlee/triaged_job-data-subscriber-timer-20220308T032000_20220308T033000-20220308T033020.402154Z_task-c768baf6-9194-46a6-aea8-c2cd7900ac8d/

  2. It seems the number of results from the CMR query was larger than PAGE_SIZE (= 2000), and there was a warning message:
    Warning: only the most recent 2000 granules will be downloaded; Try adjusting your search criteria.

What did you expect?

  1. I expected the data-subscriber to continue to query and download HLS data into the ISL bucket.

  2. It should log the number of results from the CMR query so we know how many products are available to download, and it should still download the data when the count is larger than 2000 (see the paging sketch below).
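A sketch of paging past the 2000-granule PAGE_SIZE using CMR's search-after mechanism, so the subscriber can report the full result count and keep downloading; the query parameters are examples only.

```python
import requests

CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"
params = {"short_name": "HLSS30", "provider": "LPCLOUD", "page_size": 2000,
          "temporal": "2022-03-01T00:00:00Z,2022-03-08T00:00:00Z"}

granules, search_after = [], None
while True:
    headers = {"CMR-Search-After": search_after} if search_after else {}
    resp = requests.get(CMR_URL, params=params, headers=headers)
    resp.raise_for_status()
    items = resp.json()["items"]
    if not items:
        break
    granules.extend(items)
    search_after = resp.headers.get("CMR-Search-After")
    if not search_after:
        break

print(f"CMR query returned {len(granules)} granules")   # log the total so >2000 is visible
```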

Reproducible steps

No response

Environment

PCM cluster using HySDS v4.0.1-beta.8-oraclelinux

Parse an input list file instead of querying CMR

The DAAC query and retrieval script should be able to read a file containing a list of inputs (URIs), download the data, and stage it into the ISL. This capability is for when there is an issue connecting to CMR. A minimal sketch follows.
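A minimal sketch of the list-file fallback, assuming one URI per line with blank lines and '#' comments ignored; the download/stage call itself is left to the existing subscriber code.

```python
def read_input_list(path: str) -> list[str]:
    """Return the URIs listed in the file, skipping blank lines and comments."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]

for uri in read_input_list("input_uris.txt"):   # hypothetical input file
    print("would download and stage into the ISL:", uri)
```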

Ancillary/Auxiliary data ingestion for L3_DSWx_HLS_PGE

Add a capability to ingest ancillary/auxiliary products

  • detect anc/aux data from ISL
  • ingest data and store metadata into ES
  • move anc/aux data to LTS (long term storage)
  1. Digital Elevation Model (DEM)
  2. Copernicus Human Settlement Layer (GHSL)
  3. Copernicus Land Cover Layer (CLCL)

Download source information is documented in OPERA DSWX HLS SAS Design Doc.

Automated DAAC retrieval using S3 URIs instead of https

PCM should be able to download data using S3 URIs into our ISL.
For Release 1, we won't have enough time to set up a direct DAAC S3 -> OPERA SDS S3 transfer.

We will proceed with the following data copy setup:
DAAC S3 -> local disk -> OPERA SDS S3

This capability will ensure that PCM can use many verdi workers to retrieve data.

We should only download the required bands instead of all available bands and masks (see the sketch after this list).

  • For L30 (Landsat) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B05, B06, B07, and Fmask

  • For S30 (Sentinel) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B8A, B11, B12, and Fmask
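A sketch of the interim copy path (DAAC S3 -> local disk -> OPERA SDS S3) with the required-band filter from the list above; bucket names are illustrative and cross-account read access to the DAAC bucket is assumed to be configured elsewhere.

```python
import os
import boto3

REQUIRED = {"L30": {"B02", "B03", "B04", "B05", "B06", "B07", "Fmask"},
            "S30": {"B02", "B03", "B04", "B8A", "B11", "B12", "Fmask"}}

s3 = boto3.client("s3")

def stage_granule_file(daac_bucket: str, key: str, isl_bucket: str,
                       product_type: str, workdir: str = "/tmp") -> bool:
    band = key.rsplit(".", 2)[-2]           # e.g. ...v2.0.B02.tif -> B02
    if band not in REQUIRED[product_type]:  # skip bands the PGE does not need
        return False
    local_path = os.path.join(workdir, os.path.basename(key))
    s3.download_file(daac_bucket, key, local_path)                  # DAAC S3 -> local disk
    s3.upload_file(local_path, isl_bucket, os.path.basename(key))   # local disk -> ISL
    os.remove(local_path)
    return True
```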
