Integration and testing activities and related artifacts
cmr_survey has been deprecated; the survey feature is now built into daac_data_subscriber.py in opera-sds-pcm.
License: Apache License 2.0
This ticket is for updating the VnV Matrix for R2 Testing.
Requirement: "Demonstrate that the PCM system can successfully ingest ancillary files "offline", and use these files during operations."
Things to do:
o Verify that static data for DEMs and Land Cover is stored on S3 buckets
o Submit a PGE via standard procedure
o Verify that the PGE uses static data by looking at the RunConfig and job log file
Edge Case Test List
R2-RC9: retest the 3 granules that failed to make sure they pass.
An additional list of edge-case granules (high latitude, anti-meridian) is to be provided (Steven/Rosanna).
There are NO static layer products for these.
Make sure to set the ASG max of the CNM send job queue to 0. ASF is not yet ready to receive our products.
Use the following script to trigger jobs using text file list of granules:
https://github.com/nasa/opera-sds-ops/blob/r2-accountability/trigger_from_list.sh
While testing, I noted multiple issues with the RunConfig file:
I could not find where to update:
landcover_file: input_dir/TBD.tif
worldcover_file: input_dir/TBD.tif
I executed the script ignoring those options and got the following errors:
hysdsops@opera-int-mozart-pop1:~/mozart/ops/opera-pcm/tools/OPERA-SDSVnV-13_2$ docker run --rm -u
Running preprocessor for RtcS1PreProcessorMixin
Traceback (most recent call last):
File "/home/rtc_user/opera/pge/base/base_pge.py", line 96, in _validate_runconfig
self.runconfig.validate()
File "/home/rtc_user/opera/pge/base/runconfig.py", line 142, in validate
yamale.validate(pge_schema, runconfig_data, strict=strict_mode)
File "/home/rtc_user/miniconda3/lib/python3.9/site-packages/yamale/yamale.py", line 43, in validate
raise YamaleError(results)
yamale.yamale_error.YamaleError: Error validating data '/home/conda/runconfig/rtc_s1_sample_runconfig-v2.0.0-er.5.0.yaml' with schema '/home/rtc_user/opera/pge/base/schema/base_pge_schema.yaml'
RunConfig.Groups.SAS.runconfig.groups.input_file_group.burst_id: Length of [] is less than 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/rtc_user/opera/scripts/pge_main.py", line 187, in <module>
pge_main()
File "/home/rtc_user/opera/scripts/pge_main.py", line 183, in pge_main
pge_start(run_config_filename)
File "/home/rtc_user/opera/scripts/pge_main.py", line 157, in pge_start
pge.run()
File "/home/rtc_user/opera/pge/base/base_pge.py", line 740, in run
self.run_preprocessor(**kwargs)
File "/home/rtc_user/opera/pge/rtc_s1/rtc_s1_pge.py", line 56, in run_preprocessor
super().run_preprocessor(**kwargs)
File "/home/rtc_user/opera/pge/base/base_pge.py", line 179, in run_preprocessor
self._validate_runconfig()
File "/home/rtc_user/opera/pge/base/base_pge.py", line 105, in _validate_runconfig
self.logger.critical(
File "/home/rtc_user/opera/util/logger.py", line 407, in critical
raise RuntimeError(description)
RuntimeError: Validation of RunConfig file /home/conda/runconfig/rtc_s1_sample_runconfig-v2.0.0-er.5.0.yaml failed, reason(s):
Error validating data '/home/conda/runconfig/rtc_s1_sample_runconfig-v2.0.0-er.5.0.yaml' with schema '/home/rtc_user/opera/pge/base/schema/base_pge_schema.yaml'
RunConfig.Groups.SAS.runconfig.groups.input_file_group.burst_id: Length of [] is less than 1
hysdsops@opera-int-mozart-pop1:~/mozart/ops/opera-pcm/tools/OPERA-SDSVnV-13_2$
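The validation error says the burst_id list under the SAS input_file_group is empty. A hedged sketch of the RunConfig fragment that would satisfy the schema — the key path is taken from the error message, but the burst ID value is a made-up example:

```yaml
RunConfig:
  Groups:
    SAS:
      runconfig:
        groups:
          input_file_group:
            # Schema requires at least one entry; this ID is illustrative only.
            burst_id:
              - t047_100908_iw3
```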
Please see the ticket for details:
https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3996156&group_by=cases:title&group_order=asc&group_id=-1
We need methods to capture some metrics to support our load testing goals. In some cases, there may be existing tools we can leverage already available in the PCM or by TPS tools, but in other cases, we may need to write custom scripts for metric collection.
Currently, the above have to be calculated by hand.
There are likely tools already available to capture all the above; we probably just need a script that gathers this information and writes it to a CSV file for easy plotting.
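As a starting point, a minimal sketch of such a collection script, assuming the per-job records have already been fetched (e.g. from the Mozart/GRQ Elasticsearch job documents) as Python dicts; all field names here are illustrative, not the actual PCM schema:

```python
import csv

# Hypothetical metric fields -- substitute the real job-document fields.
METRIC_FIELDS = ["job_id", "job_type", "start_time", "end_time", "status"]

def write_metrics_csv(records, path):
    """Dump a list of per-job metric dicts to a CSV file for easy plotting."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=METRIC_FIELDS)
        writer.writeheader()
        writer.writerows(records)
```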
Some suggested resources to evaluate:
Demonstrate that the PCM system can successfully query CMR for available input products for all PGEs (from current and previous releases), and download them from LP.DAAC to the SDS Input Storage Location (ISL). Do this for different time range selections, including the latest products (to support FWD mode) and some time interval in the past (to support "catch-up" mode).
Verify that the SDS adheres to the external interfaces with the DAACs as specified in the ICD agreement, specifically:
o The SDS identifies its input products by querying the ESDIS CMR (Common Metadata Registry)
o The SDS downloads its input data products using the Cumulus download service deployed at the source DAAC
TestRail link:
https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3351557&group_by=cases:section_id&group_order=asc&group_id=69570
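For reference, the CMR granule search can be exercised directly when spot-checking this. A minimal sketch that only builds the query URL — the endpoint follows the public CMR search API, but the collection short name shown is an assumption to verify against the ICD:

```python
from urllib.parse import urlencode

# Public ESDIS CMR search endpoint.
CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"

def cmr_granule_query(short_name, start, end, page_size=10):
    """Build a CMR granule search URL for a product short name and UTC range."""
    params = {
        "short_name": short_name,
        # Temporal range supports both FWD (recent) and catch-up (past) modes.
        "temporal": f"{start},{end}",
        "page_size": page_size,
    }
    return f"{CMR_GRANULES}?{urlencode(params)}"
```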
We've been making significant progress towards being able to execute the PGE smoke test by a single script invocation:
https://github.com/nasa/opera-sds-int/blob/main/r2_smoketest/run_r2_smoketest_validation.sh
Two more changes need to be made:
Verify that the SDS software stack can be deployed and operated in multiple AWS environments (aka "venues"): DEV for developers, I&T for Integration and Testing, CalVal for product validation, and OPS for operations.
TestRail link: https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3718320&group_by=cases:section_id&group_order=asc&group_id=69570
We now have this process automated. Just clone this repo and run this script https://github.com/nasa/opera-sds-int/blob/main/r2_smoketest/run_r2_smoketest_validation.sh
with the following parameters:
final_0.5.2 final_1.0.1 opera-int-rs-fwd 2.0.0
The naming convention of the gold files changed this release, so the script needs to be modified slightly.
L3 requirement:
Inspect the data products generated by each PGE to verify that they cover the spatial and temporal extent specified in the project data generation table. Also, capture PCM execution metrics to verify product accountability (i.e. that all expected products are actually generated) and latency (i.e. that products are generated within the required time).
The instructions in step 24 should be updated to better reflect what the tester will actually see when conducting the test. They appear to be left over from Release 1 and were never updated.
PST requires that we carry over the GRQ ES content into the new deployment every time. This must be performed BEFORE the PST cluster is brought down. Use the following instructions:
https://wiki.jpl.nasa.gov/display/operasds/ElasticSearch+Backup+and+Restoration
In order to kick off load testing (as well as future I&T automated tests), we'll need a "starter script" that kicks off the first step in our OPERA SDS PCM processing pipeline: querying the input DAACs for products.
o crontab
o file that lists all the possible queries for the PCM to invoke the DAAC's CMR for products to begin downloading/ingesting/processing
L3 requirement:
Verify that the PGE for generation of L3_DSWX_HLS products, wrapping the final version of the corresponding SAS algorithm, is delivered to PCM. Test that the PGE can be successfully executed using the RunConfig file specified in the PGE/PCM ICS.
Verify that all products generated by the L3_DSWX_HLS PGE contain a version number in their file name. Verify that the version number is incremented if and only if the checksum of the output data product changes. In other words, generating the same product twice with exactly the same software should NOT result in a version increment.
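The checksum comparison at the heart of this check can be sketched as follows (hypothetical helper, not part of the PGE code):

```python
import hashlib

def needs_version_increment(previous_product, new_product):
    """Increment the version if and only if the output bytes actually changed;
    identical inputs plus identical software yield an identical checksum."""
    old = hashlib.sha256(previous_product).hexdigest()
    new = hashlib.sha256(new_product).hexdigest()
    return old != new
```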
Verify that the DSWX_HLS PGE is accompanied by a corresponding Interface Specification Document (ICS) which should contain the complete information needed by the PCM system to invoke and monitor the PGE, including:
o The input and ancillary data sources
o The format of the RunConfig file that encapsulates all the input parameters
o The set of EC2 machines that the PGE should be running on
o The success and error status codes
o Any other relevant information
Inspect the output location of a PGE execution to verify that it contains files with quality metrics information, with fields specific to each PGE
Verify that DAAC metadata is correctly produced for DSWX_HLS products, and that this is done for each DSWX_HLS product.
Inspect the data products, log files and metadata generated by the PGE and verify that all timestamps use Coordinated Universal Time (UTC) and include the "Z" suffix to denote the Greenwich (Zulu) time zone.
TestRail link: https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3351586&group_by=cases:section_id&group_order=asc&group_id=69570
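A simple validator for the timestamp format this requirement describes (ISO-8601 UTC with a mandatory "Z" suffix) could look like the following; this is a hypothetical helper for spot-checking log and metadata files, not project code:

```python
import re

# ISO-8601 UTC with a mandatory "Z" suffix, e.g. 2023-06-01T12:00:00.123Z
ISO_UTC_Z = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")

def is_utc_z(timestamp):
    """True when a timestamp string matches the required UTC 'Z' format."""
    return bool(ISO_UTC_Z.match(timestamp))
```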
JPL wiki: operasds/PST+R2-RC10+RTC+CalVal+Processing
Run the DSWx-S1 pipeline in FWD mode, using the latest data being generated by the RTC-S1 pipeline. We should not try to deliver to the DAAC at this point, but we should have the system generate the CNM messages for inspection.
The test only needs to run long enough for multiple products to complete. This could be as short as a few hours, and not longer than an overnight run. We want to see the system apply the trigger/selection rules, kick off the relevant jobs, generate the products, and handle the resulting output correctly.
For this test, follow the V&V instructions here: TestRail: E2E Tests for DSWx-S1 (excluding anything related to DAAC delivery or archival - we will test those later).
Each time this test is run, it will be important to record the exact datetimes when it started & stopped. This will let us query the DAAC afterwards as an independent verification of the products it should have seen, and should have produced.
VnV Activity:
"Verify that the system can retain files (input products, output products, and dynamic ancillary data) in SDS storage. The duration of storage should be configurable in the system (without need for redeploying). Verify that files are purged at the end of their duration."
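The purge decision being verified can be sketched as follows, assuming retention is expressed in days and measured against file modification time (illustrative only; the real PCM purge logic may differ):

```python
import os
import time

def expired(path, retention_days):
    """A file is expired when it is older than the configured retention
    period, measured against its modification time."""
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > retention_days * 86400
```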
Issue: PGE execution is throwing an error for the L2_CSLC_S1:
Please see the details here:
Split this into multiple tickets depending on what disruption is simulated
DSWx-HLS
RTC-S1
CSLC-S1
Requirement: "For each PGE (from current and previous releases), verify that uploading the proper inputs to the designated Input Storage Location will trigger a successful PGE execution, followed by PCM moving all generated output to the designated Output Storage Location. Includes inspecting the job logs to verify successful job execution.
Inspect the output location of a PGE execution to verify that it contains files with quality metrics information, with fields specific to each PGE.
Inspect the latency metrics reports generated by the PCM system (one for production time, and one for retrieval time for FWD processing), and ensure that they contain all relevant information to assess whether generation of each OPERA product meets the product latency requirement for that product.
Inspect the accountability metrics generated by the PCM system, and verify they contain all necessary information to assess that all expected output products are generated, for each PGE."
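As an illustration, production latency between two of the "Z"-suffixed UTC timestamps in these reports could be computed like this (hypothetical helper; the actual report field names are assumptions):

```python
from datetime import datetime, timezone

def latency_minutes(input_available, product_generated):
    """Production latency in minutes between two ISO-8601 'Z' timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    t0 = datetime.strptime(input_available, fmt).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(product_generated, fmt).replace(tzinfo=timezone.utc)
    return (t1 - t0).total_seconds() / 60.0
```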
VnV Activity:
"Verify that the SDS can retrieve and ingest input products for all available PGEs (for current and previous releases) automatically within the time allotted by OPERA-SDS-44 and OPERA-SDS-45.
"
Repeat Issue #33 but with static_layer turned on. Before running, the following must be performed:
Run on INT-FWD
static_layer or non-static or both???
Verification activity:
"Demonstrate that the PCM system can successfully retrieve and ingest dynamic ancillary files needed for PGE execution."
Make sure to set the ASG max of the CNM send job queue to 0. ASF is not yet ready to receive our products.
L3 Requirement:
Verify that the standard data products are available at the designated DAACs as defined in the ICD. Use the CMR query service to locate the products of interest, and Cumulus data services to download the data. Execute these tests for each OPERA data product, and for multiple time intervals and geographic selections.
Verify that the SDS adheres to the external interfaces with the DAACs as specified in the ICD agreement, specifically:
o The SDS uses CNM ("Cloud Notification Mechanism") to notify the target DAAC that new output products are available for archiving
Verify that the DAAC metadata is delivered to the DAAC with the product.
TestRail case: https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3678197&group_by=cases:section_id&group_order=asc&group_id=69570
Steps to Reproduce:
Open page: https://github.com/orgs/nasa/teams/opera-sds/repositories
You will notice this page requires login and does not show publicly.
Expected Result: the landing page should be open sourced and should not require login to access the repositories. Please see the following link for the test results:
Requirement: "Verify that all relevant information for the DSWX_HLS data product (description of science algorithm, validation datasets, test results, etc.) is available from the same DAAC where the data products are archived."
To Do:
o Define the list of all documents that need to be archived at the DAACs (i.e., the DAAC that will host the data products)
Follow these steps to toggle static layer generation
https://wiki.jpl.nasa.gov/display/operasds/Making+changes+to+settings.yaml+on+Mozart
Requirement: "Demonstrate that system operations will begin within 1 month of the completion of product validation, for each major release"
To do:
o Record the date when CalVal ends (for each PGE)
o Record the date when Operations begin (for the same PGE)
o Verify that the difference is <= 1 month
We need to clear out Rolling Storage Input folder and probably also the slc_catalog entries so that we can process the same granules with static_layer turned on.
Verification Activity:
Verify that the interface allows for control of the operational system, including:
Dates TBD
Requirement: "Verify that OPERA SDS has passed the NASA A&A process to obtain the ATO (Authorization To Operate)."
To do:
Simply record the date when the ATO is given to OPERA, and provide a link to the official document. The ISO will be notified and will notify everybody else.
L3 Requirement:
Verify that the SDS software repositories are hosted on the public GitHub, and that other tools (forums, tickets, etc.) are available for Open Source development.
TestRail case: https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3551351&group_by=cases:section_id&group_order=asc&group_id=69570
Demonstrate that the PCM system can successfully query CMR for available input products for all PGEs (from current and previous releases), and download them from LP.DAAC to the SDS Input Storage Location (ISL). Do this for different time range selections, including the latest products (to support FWD mode) and some time interval in the past (to support "catch-up" mode).
Verify that the SDS adheres to the external interfaces with the DAACs as specified in the ICD agreement, specifically:
o The SDS identifies its input products by querying the ESDIS CMR (Common Metadata Registry)
o The SDS downloads its input data products using the Cumulus download service deployed at the source DAAC
TestRail link: https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3718400&group_by=cases:section_id&group_order=asc&group_id=69570
Run the DSWx-S1 pipeline in On-Demand mode, running a predefined set of jobs. The list of jobs to use can be found here [link coming shortly]. We should not try to deliver to the DAAC at this point, but we should have the system generate the CNM messages for inspection.
For this test, follow the V&V instructions here: TestRail: E2E Tests for DSWx-S1 (excluding anything related to DAAC delivery or archival - we will test those later).
Make sure to set the ASG max of all CNM send job queues to 0. ASF is not yet ready to receive our products.
Steps:
Access the following link:
https://cae-testrail.jpl.nasa.gov/testrail/index.php?/tests/view/3989252&group_by=cases:title&group_order=asc&group_id=-1
Update the test results accordingly
We should put together the I&T/V&V Report for Release 1. This will include updating the VnV Matrix for R1 testing.
Updating the VnV Matrix for R1:
What the report needs to contain (can be a doc, or slides):