
opera-sds-pcm's Introduction

opera-pcm

OPERA PCM (Process Control & data Management) Software and Configuration for the Science Data System (SDS)

opera-sds-pcm's People

Contributors

chrisjrd, collinss-jpl, gshiroma, hhlee445, maseca, philipjyoon, riverma, sjlewis-jpl


opera-sds-pcm's Issues

[Bug]: data_subscriber_download_timer throws exception when there is no data to download

Checked for duplicates

Yes - I've already checked

Describe the bug

While testing hlsl30_query_timer, hlss30_query_timer, and data_subscriber_download_timer, I noticed that data_subscriber_download throws the exception below when hlsl30_query_timer and hlss30_query_timer fail to execute.

Traceback (most recent call last):
File "/home/ops/verdi/ops/hysds-1.1.5/hysds/job_worker.py", line 1237, in run_job
raise RuntimeError("Got non-zero exit code: {}".format(status))
RuntimeError: Got non-zero exit code: 1

What did you expect?

I expected the data_subscriber_download_timer to execute without error and to report that the data_subscriber_catalog ES index does not exist or that there is no data to download.
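A minimal sketch of the expected behavior, assuming an elasticsearch-py client and a `downloaded` flag on catalog documents (both illustrative, not the actual data_subscriber implementation): the job checks for the index and for pending entries and exits cleanly instead of raising.

```python
# Illustrative only: exit cleanly when the catalog index or pending data is missing.
from elasticsearch import Elasticsearch

def download_pending(es: Elasticsearch, index: str = "data_subscriber_catalog") -> int:
    # If no query has ever run, the index may not exist yet; that is not an error.
    if not es.indices.exists(index=index):
        print(f"Index {index} does not exist yet; nothing to download.")
        return 0

    result = es.search(index=index, query={"term": {"downloaded": False}}, size=1000)
    hits = result["hits"]["hits"]
    if not hits:
        print("No pending granules found; exiting without error.")
        return 0

    # ... download each pending granule here ...
    return len(hits)
```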

Reproducible steps

No response

Environment

- PCM branch: issue_75 
- Lambda branch: issue_75

[New Feature]: Add all use cases for R1 in the smoke test script

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

PCM needs to have a smoke test script to exercise R1 capabilities.

  • ingest hls l2 data
  • execute l3_dswx_hls_pge
  • ingest l3 outputs
  • send notification to daac
  • receive notification from daac and update the delivery status
  • query and download using data subscriber script

Refactor terraform install script

The Terraform install script should be generalized so that it can provision a cluster using a project-specific Terraform variables config file.

PCM Core capabilities verification

Verify the PCM core capabilities with NISAR PCM R2 deployment to OPERA dev venue.

Use NISAR PCM Operator Guide and Use Cases documents for details.

  1. Ingest Ancillary/Auxiliary Data
  2. Ingest main data such as L0A, L0B, etc
  3. Produce products from PGEs
    • verify that the pge is executed properly
    • verify that the outputs are generated
    • verify that the outputs are ingested in PCM
  4. Verify that inputs/outputs are copied into the correct storage (rolling storage vs. long term storage)
  5. Deliver data product to DAAC
    • verify that PCM generates CNM-S and notify
    • validate the products
    • catalog
    • handle multiple version
  6. Access data at daac once the data is delivered
  7. Fault Tolerance
    • Detect software failure (PCM or PGE)
    • Detect hardware failure
    • self-healing hardware
    • Identify and resolve processing failures
  8. Monitor SDS
  9. Control SDS
  10. Reports (data processing, data delivery, data accountability)
  11. Metric - latency
  12. Autoscaling - exercise with smoke test script?

Write a report with steps and results

[New Feature]: Use different queue for data download jobs and data_query_timer_trigger_frequency variable for query

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

Data download jobs were using the 'opera-job_worker-small' queue, which is also used by PGE jobs, so data downloads were not occurring while PGE jobs were executing.

Describe the feature request

  1. Use a different queue for data download jobs and PGE job execution.

  2. Pass the data_query_timer_trigger_frequency variable to daac_data_subscriber_query.sh as the -m option.

Initial design for bulk historical processing

OPERA SDS needs to process 125 months of historical data within a 12-month period.
We need a tool to automate the processing of bulk historical data in addition to the forward processing.
PGE to trigger: L2_CSLC_S1 PGE
Data region to cover: North America
Input data: S1 A/B SLC

Update on L3_DSWx_HLS_PGE integration

PCM ER2 contains the initial integration of L3_DSWx_HLS_PGE with PCM.
PCM can trigger the L3_DSWx_HLS_PGE after staging HLSS30 or HLSL30 data into the ISL bucket. However, PCM currently handles only a single .tif output and has not been tested with many outputs.

L3_DSWx_HLS_PGE plans to produce multiple .tif files.
PCM should be able to ingest all outputs and send notification for each product.

We also need to verify that all of the following steps are working properly.

  1. Invocation of L3_DSWx_HLS PGE
  2. Storage of L3_DSWx_HLS products on rolling storage
  3. Notification of product availability to PO DAAC
  4. Update delivery status of output product upon PO DAAC confirmation of retrieval (may not be done in ER3)

Trigger rule for L3_DSWx_HLS execution

Define a trigger rule to execute L3_DSWx_HLS_PGE.

To run the PGE, the following files are required (a completeness-check sketch follows the list below).

  • For L30 (Landsat) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B05, B06, B07, and Fmask

  • For S30 (Sentinel) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B8A, B11, B12, and Fmask
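A minimal sketch of a completeness check a trigger rule evaluator could run before submitting the PGE job; the band sets are taken from the lists above, while the function and its inputs are hypothetical.

```python
# Sketch of a per-granule completeness check before triggering L3_DSWx_HLS.
REQUIRED_LAYERS = {
    "L30": {"B02", "B03", "B04", "B05", "B06", "B07", "Fmask"},
    "S30": {"B02", "B03", "B04", "B8A", "B11", "B12", "Fmask"},
}

def ready_to_trigger(product_type: str, downloaded_layers: set[str]) -> bool:
    """Return True once every required layer for this HLS granule is on hand."""
    return REQUIRED_LAYERS[product_type].issubset(downloaded_layers)

# Example: an S30 granule missing B12 is not yet ready.
print(ready_to_trigger("S30", {"B02", "B03", "B04", "B8A", "B11", "Fmask"}))  # False
```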

Manual DAAC query and retrieval script

SDS needs to be able to query various DAACs (ASF DAAC, LP DAAC, or PO DAAC) and retrieve data from them into the OPERA ISL location.

The following information should be passed to the query:

  • product type(s)
  • which DAAC (ASF DAAC or LP DAAC)
  • time range (start date and/or end date)
  • ISL bucket name to download into
  • access info to DAACs (optional)
  • download mode (S3-default or http)

Note: we can use NISAR daac retrieval script as reference.
https://github.jpl.nasa.gov/IEMS-SDS/nisar-pcm/blob/develop/tools/sds_daac_retrieval.py

This script will be used by a PCM worker to automatically query and retrieve data into OPERA ISL S3 bucket.

Split DAAC query and retrieval execution

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

The current data-subscriber script queries the DAAC CMR and retrieves the data into our S3 bucket within one script.
We should split the query and retrieval functions into separate scripts so that the query script can be executed by a factotum worker and the retrieval script can be executed by many verdi workers. A sketch of the proposed split follows.
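A rough sketch of the split, assuming a shared data_subscriber_catalog ES index with illustrative field names; a real implementation would also need the download side to claim documents atomically so parallel verdi workers do not duplicate work.

```python
# Hypothetical split: the query step (factotum) only records what to fetch; the
# download step (verdi, many workers) only fetches what is recorded as pending.
from elasticsearch import Elasticsearch

INDEX = "data_subscriber_catalog"  # illustrative index/field names

def query_step(es: Elasticsearch, cmr_results: list[dict]) -> None:
    """Record granule URLs returned by the CMR query; download nothing here."""
    for granule in cmr_results:
        es.index(index=INDEX, id=granule["id"],
                 document={"url": granule["url"], "downloaded": False})

def download_step(es: Elasticsearch, transfer) -> None:
    """Fetch pending granules and mark them downloaded."""
    pending = es.search(index=INDEX, query={"term": {"downloaded": False}}, size=100)
    for hit in pending["hits"]["hits"]:
        transfer(hit["_source"]["url"])  # caller supplies the actual S3/HTTP transfer
        es.update(index=INDEX, id=hit["_id"], doc={"downloaded": True})
```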

Add query timestamp to data subscriber catalog index and download by oldest timestamp first

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

#58 (comment)

It would be great to add an ingested-time field to the data_subscriber_catalog index and then download after sorting on that field in ascending order. Then we will be able to get a complete set of files for each granule (tile based) faster, to trigger the L3_DSWx_HLS PGE sooner. A sketch follows the list below.

  • Add query timestamp field to data_subscriber_catalog index
  • Download in ascending order on query timestamp field
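A sketch of the proposed ordering, assuming a query_datetime field written at query time; the index name, field names, and client calls are illustrative rather than the actual catalog schema.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# At query time, stamp each catalog entry with the time it was discovered.
es.index(index="data_subscriber_catalog",
         id="HLS.S30.T17SLU.2020117T160901.v2.0.B01.tif",
         document={"downloaded": False,
                   "query_datetime": datetime.now(timezone.utc).isoformat()})

# At download time, work oldest-query-first so complete granule (tile) sets become
# available sooner and can trigger the L3_DSWx_HLS PGE earlier.
pending = es.search(index="data_subscriber_catalog",
                    query={"term": {"downloaded": False}},
                    sort=[{"query_datetime": {"order": "asc"}}],
                    size=100)
```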

Split data query timer for HLS S30 and HLS L30

The data_query_timer queries both the HLS S30 and HLS L30 types in a single script, and it always ingests HLS L30 types first, so the download timer won't download HLS S30 until the HLS L30 data download is done. The query timer needs to be split into two separate timers for the S30 and L30 types.

Rename data_subscriber timer and shell script

  • data_query_timer -> hlsl30_query_timer & hlss30_query_timer
  • data_download_timer -> hls_download_timer
  • daac_data_subscriber_download.sh -> hls_download.sh
  • daac-data_subscriber_query.sh -> hlsl30_query.sh & hlss30_query.sh

Add the following log messages at the end of the query log (see the sketch after this list):

  • number of files found to download
  • number of files skipped (non-required bands)
  • number of files ingested to download
  • Range of data query: begin time & end time
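A minimal sketch of the requested end-of-run summary; the counters are assumed to be tracked by the query script already, and the logger name is illustrative.

```python
import logging

logger = logging.getLogger("daac_data_subscriber_query")  # illustrative name

def log_query_summary(found: int, skipped: int, ingested: int,
                      start_time: str, end_time: str) -> None:
    logger.info("Files found to download: %d", found)
    logger.info("Files skipped (non-required bands): %d", skipped)
    logger.info("Files ingested to download: %d", ingested)
    logger.info("Query range: %s to %s", start_time, end_time)
```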

[New Feature]: Use product_counter for L3_DSWx_HLS output

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

PCM currently generates outputs from the L3_DSWx_HLS PGE with product_counter set to 1. Therefore, we can't re-submit a PGE job with the same inputs unless we clear out the L3 outputs. We should be able to increment the product_counter when there are existing L3 outputs.
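A small sketch of how the next counter could be derived from already-ingested outputs; the trailing two-digit counter matches the output naming convention described elsewhere in this repo, and the lookup of existing product IDs is left to the caller.

```python
import re

def next_product_counter(existing_product_ids: list[str]) -> int:
    """Return 1 when no L3 outputs exist for these inputs, otherwise max(counter) + 1."""
    counters = [int(m.group(1))
                for pid in existing_product_ids
                if (m := re.search(r"_(\d{2,})$", pid))]   # trailing counter, e.g. "_01"
    return max(counters, default=0) + 1

print(next_product_counter([]))                                                          # 1
print(next_product_counter(["OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01"]))   # 2
```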

[New Feature]: Migrate to the Universal AMIs

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

The PCM should be able to use the Universal AMIs used by all projects (not specific to OPERA)

Details are available at https://github.jpl.nasa.gov/IEMS-SDS/nisar-pcm/pull/468#issuecomment-457901

There now exists a Jenkins job that determines the latest patched AMIs and runs a smoke test against it:
https://opera-pcm-ci.jpl.nasa.gov/job/opera-pcm-develop-latest-patched-amis/

[New Feature]: Generate CNM-S msg with multiple tif files from L3_DSWx_HLS PGE

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

We need to generate a CNM-S message with multiple tif files after executing the L3_DSWx_HLS PGE.

We also need to be able to set data_version and schema_version from the CNM notify trigger rule, and to set a different product type for the DAAC rather than using the ProductType metadata. A rough example of the message shape follows.
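A rough Python sketch of the shape such a message could take, with one files entry per output .tif; field names loosely follow the public CNM schema and the values are placeholders, so they must be checked against the CNM version, data_version, and schema_version configured in the trigger rule.

```python
import json
from datetime import datetime, timezone

output_tifs = ["output_layer_1.tiff", "output_layer_2.tiff"]  # placeholder filenames

cnm_s = {
    "version": "1.0",                       # schema_version from the trigger rule
    "provider": "JPL-OPERA",                # placeholder
    "collection": "OPERA_L3_DSWX_HLS",      # DAAC product type, not ProductType metadata
    "submissionTime": datetime.now(timezone.utc).isoformat(),
    "identifier": "OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01",
    "product": {
        "name": "OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01",
        "dataVersion": "1.0",               # data_version from the trigger rule
        "files": [{"type": "data", "name": name,
                   "uri": f"s3://opera-rs-bucket/{name}"}   # hypothetical RS bucket
                  for name in output_tifs],
    },
}
print(json.dumps(cnm_s, indent=2))
```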

[New Feature]: Launch multiple workers by data download timer

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

The current DAAC data download timer submits a job that uses only one worker to download data from the DAAC to our ISL bucket. We need to be able to launch many workers to download data simultaneously.
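A sketch of one way to fan the pending list out over several workers: chunk the URLs and submit one download job per chunk. `submit_download_job` is a hypothetical stand-in for whatever HySDS job-submission call PCM actually uses.

```python
from typing import Callable

def fan_out_downloads(pending_urls: list[str], num_workers: int,
                      submit_download_job: Callable[[list[str]], str]) -> list[str]:
    """Split the URL list into num_workers chunks and submit one download job each."""
    chunk_size = max(1, -(-len(pending_urls) // num_workers))   # ceiling division
    job_ids = []
    for start in range(0, len(pending_urls), chunk_size):
        job_ids.append(submit_download_job(pending_urls[start:start + chunk_size]))
    return job_ids

# Example with a dummy submitter: 10 URLs over 3 workers -> chunks of 4, 4, 2.
print(fan_out_downloads([f"u{i}" for i in range(10)], 3, lambda chunk: f"job({len(chunk)})"))
```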

Benchmark testing for ASF DAAC query/retrieval

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

Repeat of #22 with ASF DAAC data

Describe the feature request

PCM needs to test how fast we can query and retrieve data from ASF DAAC in preparation for bulk historical processing.

DAAC retrieval timer needs to query HLS L30

Checked for duplicates

Yes - I've already checked

Describe the bug

When I tested data-subscriber-timer, I noticed it only queries the HLSS30 type.
We should query the HLSL30 type as well.
I also noticed that the ISL bucket name is always 'opera-dev-isl-fwd-mplotkin'. The ISL bucket name should be entered by the user.

What did you expect?

I expected to query the DAAC for both HLSS30 and HLSL30 using the user's ISL bucket.
The log message should include the parameters that were passed to execute the script.
It would be better to combine the stdout.txt messages into the log.
The log filename should contain the data type and log creation day.
ex) HLSS30_daac_data_subscriber.log

Reproducible steps

1.

Environment

- opera-pcm v1.0.0-er.2.0
- HySDS v4.0.1-beta.8 (CentOS 7) AMIs

Automated DAAC query and retrieval

A worker should automatically query and retrieve data from the DAACs (LP DAAC, ASF DAAC, or PO DAAC) into the OPERA ISL. The duration of query and retrieval should be configurable. A CMR query sketch follows the steps below.

  1. Query DAAC CMR to get product metadata and URLs
  2. Ingest product metadata returned by CMR query into PCM ElasticSearch catalog
  3. Download the data, copy it into the ISL, and update the data status to 'downloaded'
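A minimal sketch of step 1 against the public CMR search API; the short_name, provider, and temporal values are only examples, and steps 2-3 are indicated by comments rather than implemented.

```python
import requests

CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"
params = {
    "short_name": "HLSS30",          # or HLSL30, etc.
    "provider": "LPCLOUD",
    "temporal": "2022-01-01T00:00:00Z,2022-01-02T00:00:00Z",
    "page_size": 2000,
}

response = requests.get(CMR_URL, params=params)
response.raise_for_status()
for item in response.json()["items"]:
    umm = item["umm"]
    urls = [u["URL"] for u in umm.get("RelatedUrls", []) if u.get("Type") == "GET DATA"]
    # Step 2: index the umm metadata + urls into the PCM ElasticSearch catalog.
    # Step 3: download each URL into the ISL and mark the entry 'downloaded'.
    print(umm["GranuleUR"], urls)
```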

Runconfig generation for L3_DSWx_HLS PGE

Generate a RunConfig for the L3_DSWx_HLS PGE. A sample RunConfig will be provided by the PGE team.

Input: L2_HLS (HLSS30 or HLSL30)
Anc: DEM, Global Landcover, Global built-up land layer

This should be testable with a mock-up L3_DSWx_HLS PGE.
Test data and a sample RunConfig file are located at
https://artifactory-fn.jpl.nasa.gov/artifactory/webapp/#/artifacts/browse/tree/General/general/gov/nasa/jpl/opera/adt/r1/interface/test_datasets.tar.gz

Required files:

  • For L30 (Landsat) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B05, B06, B07, and Fmask
  • For S30 (Sentinel) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B8A, B11, B12, and Fmask

L3_DSWx_HLS_PGE data ingestion

Ingest input and output data for L3_DSWx_HLS_PGE. PCM should be able to extract metadata and ingest into PCM catalog (ElasticSearch).

The input dataset is the HLS product version 2.0. HLS products provide surface reflectance (SR) data from the Operational Land Imager (OLI) aboard the Landsat 8 remote sensing satellite and the Multi-Spectral Instrument (MSI) aboard the Sentinel-2 A/B remote sensing satellite.

Input File Naming Convention:
HLS.<product type>.<tile id>.<data acquisition time>.<version>.<band/mask>.tif

product type: S30 or L30
data acquisition time: YYYYDOYTHHMMSS
version: v2.0
band or mask: B01 - B08, B8A, B09, B10, B11, B12, Fmask, SZA, SAA, VZA, VAA
ex) HLS.S30.T17SLU.2020117T160901.v2.0.B01.tif

Output File Naming Convention:
<Project>_<Level>_<ProductType>_<Source>_<Sensor>_<TileID>_<DateTime>_<ProductVersion>.<ext>

If the input is HLS.S30, the Sensor is 'Sentinel2'
If the input is HLS.L30, the Sensor is 'Landsat8'

production date time: YYYYMMDDTHHMMSS
ex) OPERA_L3_DSWx_HLS_Sentinel2_T15SXR_20210211T163901_01.tiff

Format: GeoTIFF
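A small sketch of parsing the input naming convention above; the regular expression simply transcribes the pattern described here.

```python
import re

HLS_RE = re.compile(
    r"^HLS\.(?P<product_type>S30|L30)"
    r"\.(?P<tile_id>T[0-9A-Z]+)"
    r"\.(?P<acq_time>\d{7}T\d{6})"       # YYYYDOYTHHMMSS
    r"\.(?P<version>v[\d.]+)"
    r"\.(?P<band>[0-9A-Za-z]+)"
    r"\.tif$"
)

m = HLS_RE.match("HLS.S30.T17SLU.2020117T160901.v2.0.B01.tif")
print(m.groupdict())
# {'product_type': 'S30', 'tile_id': 'T17SLU', 'acq_time': '2020117T160901',
#  'version': 'v2.0', 'band': 'B01'}
# The Sensor for the output name is then 'Sentinel2' for S30 inputs and 'Landsat8' for L30.
```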

[New Feature]: PGE log file should be included in the output dataset

Checked for duplicates

Yes - I've already checked

Alternatives considered

No - I haven't considered

Related problems

I'm frustrated when testing the PCM ER2 release: I couldn't find any log for the PGE execution after the PGE executed successfully and the verdi worker was no longer up and running.

Describe the feature request

I want to include the PGE log as an output dataset and stage it into the RS location as the other output files are.

[New Feature]: Add spatial information for HLSS30 and HLSL30 products into the ES index

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

We need to store spatial information for HLSL30 and HLSS30 products after querying CMR. The spatial information is required to find the ancillary and DEM files for triggering the L3_DSWx_HLS PGE.

We want to store more metadata in our ES index along with the spatial information; a sketch of the resulting document follows the list below.

  • umm:DataGranule:ProductionDateTime
  • umm:DataGranule:Identifiers or umm:GranuleUR
  • umm:Platforms:ShortName
  • provider: LPCLOUD, ASFCLOUD, or POCLOUD

hls_spatial_catalog: granuleid, bbox, shortname, production_datetime, index_create_time
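A hypothetical sketch of building one hls_spatial_catalog document from a CMR UMM-G record; the UMM paths correspond to the fields listed above, but HLS granules may carry GPolygons rather than bounding rectangles, so the bbox handling shown here is only illustrative.

```python
from datetime import datetime, timezone

def to_spatial_doc(umm: dict, provider: str) -> dict:
    # Real granules may provide GPolygons instead of BoundingRectangles; handle both.
    rect = (umm["SpatialExtent"]["HorizontalSpatialDomain"]
               ["Geometry"]["BoundingRectangles"][0])
    return {
        "granuleid": umm["GranuleUR"],
        "bbox": [rect["WestBoundingCoordinate"], rect["SouthBoundingCoordinate"],
                 rect["EastBoundingCoordinate"], rect["NorthBoundingCoordinate"]],
        "shortname": umm["CollectionReference"]["ShortName"],
        "production_datetime": umm["DataGranule"]["ProductionDateTime"],
        "provider": provider,                                # LPCLOUD, ASFCLOUD, or POCLOUD
        "index_create_time": datetime.now(timezone.utc).isoformat(),
    }
```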

Improve data subscriber logging

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

#58 (comment)

I noticed that most log messages are stored in _stderr.txt rather than _stdout.txt. The logging level was DEBUG; how can we change it to a lower logging level?

  • Change default logging level to INFO
  • Verify all relevant information is being logged on the INFO level
  • Capture a summary of the query and download from the DAAC at the end of processing (# of files returned by the query, # of successful downloads, # of failed downloads, # of skipped downloads, etc.)

[New Feature]: On-demand daac data subscriber processing

Checked for duplicates

Yes - I've already checked

Alternatives considered

No - I haven't considered

Related problems

No response

Describe the feature request

PCM should allow using the Tosca UI to submit a job that queries and downloads products from the DAACs by entering geographic information and/or a time range.

The following fields need to be filled in:

  • provider: LP DAAC (LP CLOUD), ASF DAAC, PO DAAC (PO CLOUD?)
  • start datetime
  • end datetime
  • bounding box: optional
  • ISL bucket name

[Bug]: data-subscriber-timer throws an exception and downloads at most 2000 granules

Checked for duplicates

Yes - I've already checked

Describe the bug

  1. When I tested data-subscriber jobs for both HLS S30 & HLS L30 types (issue_50), I noticed the data-subscriber-timer throws an exception before finishing the HLS data download.

Traceback (most recent call last):
File "/export/home/hysdsops/verdi/ops/hysds-1.1.5/hysds/job_worker.py", line 1193, in run_job
monitoredRunner.join()
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/process.py", line 148, in join
res = self._popen.wait(timeout)
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/popen_fork.py", line 57, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/popen_fork.py", line 33, in poll
pid, sts = os.waitpid(self.pid, flag)
File "/export/home/hysdsops/verdi/lib/python3.9/site-packages/billiard/pool.py", line 229, in soft_timeout_sighandler
raise SoftTimeLimitExceeded()
billiard.exceptions.SoftTimeLimitExceeded: SoftTimeLimitExceeded()

s3://opera-dev-triage-fwd-hyunlee/triaged_job-data-subscriber-timer-20220308T032000_20220308T033000-20220308T033020.402154Z_task-c768baf6-9194-46a6-aea8-c2cd7900ac8d/

  2. It seems the number of results from the CMR query was larger than PAGE_SIZE (= 2000), and there was a warning message:
    Warning: only the most recent 2000 granules will be downloaded; Try adjusting your search criteria.

What did you expect?

  1. I expected the data-subscriber to continue to query and download HLS data into the ISL bucket.

  2. It should log the number of results from the CMR query so we know how many products are available to download, and it should still download the data when the count is larger than 2000 (see the paging sketch below).
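A sketch of paging past the 2000-granule PAGE_SIZE using CMR's search-after mechanism, so the subscriber can report the full result count and keep downloading; the query parameters are examples only.

```python
import requests

CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"
params = {"short_name": "HLSS30", "provider": "LPCLOUD", "page_size": 2000,
          "temporal": "2022-03-01T00:00:00Z,2022-03-08T00:00:00Z"}

granules, search_after = [], None
while True:
    headers = {"CMR-Search-After": search_after} if search_after else {}
    resp = requests.get(CMR_URL, params=params, headers=headers)
    resp.raise_for_status()
    items = resp.json()["items"]
    if not items:
        break
    granules.extend(items)
    search_after = resp.headers.get("CMR-Search-After")
    if not search_after:
        break

print(f"CMR query returned {len(granules)} granules")   # log the total so >2000 is visible
```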

Reproducible steps

No response

Environment

PCM cluster using HySDS v4.0.1-beta.8-oraclelinux

Parse an input list file instead of querying CMR

The DAAC query and retrieval script should be able to read a file containing a list of inputs (URIs), download the data, and stage it into the ISL. This capability is for when there is an issue connecting to CMR. A minimal sketch follows.
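A minimal sketch of the list-file fallback, assuming one URI per line with blank lines and '#' comments ignored; the download/stage call itself is left to the existing subscriber code.

```python
def read_input_list(path: str) -> list[str]:
    """Return the URIs listed in the file, skipping blank lines and comments."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]

for uri in read_input_list("input_uris.txt"):   # hypothetical input file
    print("would download and stage into the ISL:", uri)
```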

Ancillary/Auxiliary data ingestion for L3_DSWx_HLS_PGE

Add a capability to ingest ancillary/auxiliary products

  • detect anc/aux data from ISL
  • ingest data and store metadata into ES
  • move anc/aux data to LTS (long term storage)
  1. Digital Elevation Model (DEM)
  2. Copernicus Human Settlement Layer (GHSL)
  3. Copernicus Land Cover Layer (CLCL)

Download source information is documented in OPERA DSWX HLS SAS Design Doc.

Automated DAAC retrieval using S3 URIs instead of https

PCM should be able to download data using S3 URIs into our ISL.
For Release 1, we won't have enough time to set up a direct DAAC S3 -> OPERA SDS S3 transfer.

We will proceed with the following data copy setup:
DAAC S3 -> local disk -> OPERA SDS S3

This capability will ensure that PCM can use many verdi workers to retrieve data.

We should only download the required bands instead of all available bands and masks (see the sketch after this list).

  • For L30 (Landsat) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B05, B06, B07, and Fmask

  • For S30 (Sentinel) HLS products, we need layers (6 reflectance bands + Q/A mask): B02, B03, B04, B8A, B11, B12, and Fmask
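A sketch of the interim copy path (DAAC S3 -> local disk -> OPERA SDS S3) with the required-band filter from the list above; bucket names are illustrative and cross-account read access to the DAAC bucket is assumed to be configured elsewhere.

```python
import os
import boto3

REQUIRED = {"L30": {"B02", "B03", "B04", "B05", "B06", "B07", "Fmask"},
            "S30": {"B02", "B03", "B04", "B8A", "B11", "B12", "Fmask"}}

s3 = boto3.client("s3")

def stage_granule_file(daac_bucket: str, key: str, isl_bucket: str,
                       product_type: str, workdir: str = "/tmp") -> bool:
    band = key.rsplit(".", 2)[-2]           # e.g. ...v2.0.B02.tif -> B02
    if band not in REQUIRED[product_type]:  # skip bands the PGE does not need
        return False
    local_path = os.path.join(workdir, os.path.basename(key))
    s3.download_file(daac_bucket, key, local_path)                  # DAAC S3 -> local disk
    s3.upload_file(local_path, isl_bucket, os.path.basename(key))   # local disk -> ISL
    os.remove(local_path)
    return True
```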
