nucleus's People

Contributors

dependabot[bot], jordanpadams, nutjob4life, ramesh-maddegoda, sjoshi-jpl, tloubrieu-jpl


nucleus's Issues

As a user, I want to monitor a nucleus workflow execution

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Operator

๐Ÿ’ช Motivation

...so that I can know

  • where the pipeline is currently executing
  • whether something has failed, and what
  • when it has completed successfully
  • how to see the failure reports from the specific tools

📖 Additional Details

  • Capture validate and other tool logs in a consistent fashion
  • Monitoring User Guide

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

โš™๏ธ Engineering Details

Develop Cost Model

๐Ÿ’ก Description

Sub-tasks necessary in order to benchmark and develop cost model for Nucleus deployments per number of apps in the pipeline

Develop Logging and Monitoring Strategy

💡 Description

We need a consistent approach to logging and monitoring baseline EN components used in Airflow, including a clear and succinct error handling guide for users to know how to track down failures in the pipeline.

Sub-tasks:

Work with IMG to connect to S3 Bucket with MESSENGER data

💡 Description

> aws s3 ls --no-sign-request s3://asc-pds-messenger
                           PRE MESSDEM_1001/
                           PRE MSGRMDS_1001/
                           PRE MSGRMDS_2001/
                           PRE MSGRMDS_3001/
                           PRE MSGRMDS_4001/
                           PRE MSGRMDS_5001/
                           PRE MSGRMDS_6001/
                           PRE MSGRMDS_7001/
                           PRE MSGRMDS_7101/
                           PRE MSGRMDS_7201/
                           PRE MSGRMDS_7301/
                           PRE MSGRMDS_8001/
  • Document how we can replicate "external" S3 buckets into a staging bucket
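One minimal sketch of that replication, using boto3 (the staging bucket name and the `staging/messenger/` prefix are assumptions, and the server-side copy assumes our credentials are allowed to read the public source bucket):

```python
def dest_key(src_key: str, staging_prefix: str = "staging/messenger/") -> str:
    """Map an object key in the external bucket to its key in our staging bucket.

    The staging prefix is a hypothetical layout choice, not a project convention.
    """
    return staging_prefix + src_key


def replicate_prefix(src_bucket: str, prefix: str, dest_bucket: str) -> int:
    """Server-side copy of every object under `prefix` into the staging bucket."""
    import boto3  # deferred so the sketch can be read without boto3 installed

    s3 = boto3.client("s3")
    copied = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=src_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.copy({"Bucket": src_bucket, "Key": obj["Key"]}, dest_bucket, dest_key(obj["Key"]))
            copied += 1
    return copied


# e.g. replicate_prefix("asc-pds-messenger", "MESSDEM_1001/", "pds-nucleus-staging")
```

For very large holdings, `aws s3 sync` or S3 Batch Operations would likely be more appropriate than a per-object loop; the point here is only the key-mapping and copy pattern to document.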

Initial Prototyping and Implementation

💡 Description

Initial implementation as POC.

The POC will implement this simple workflow:

  • pre-condition: scalable harvest runs as a service
  • some data is pushed to S3
  • validate is triggered
  • if validate is successful, the data is ingested with harvest-cli

The candidate solutions tested are:

  • Apache Airflow
  • CWS and/or Camunda Community Edition (depending on when CWS will be upgraded with latest Camunda)
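If Airflow is the candidate selected, the POC workflow above might be sketched as a two-task DAG. The commands and paths below are illustrative, not the actual tool invocations, and the Airflow imports are deferred into a builder so the file can be read without Airflow installed:

```python
def validate_cmd(bundle: str) -> str:
    # Hypothetical validate invocation; real flags depend on the deployment.
    return f"validate --target {bundle}"


def harvest_cmd(bundle: str) -> str:
    # Hypothetical harvest-cli invocation.
    return f"harvest-cli --load {bundle}"


def build_poc_dag(bundle: str = "/mnt/staging/bundle"):
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(dag_id="nucleus_poc", start_date=datetime(2022, 1, 1),
             schedule_interval=None) as dag:
        validate = BashOperator(task_id="validate", bash_command=validate_cmd(bundle))
        harvest = BashOperator(task_id="harvest", bash_command=harvest_cmd(bundle))
        # harvest runs only if validate exits 0 (Airflow's default trigger rule)
        validate >> harvest
    return dag
```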

Develop Support For Off-Nominal Data Loads

💡 Description

Need to build out the designs for off-nominal behavior in archive management:

  • Acceptance of validation errors
  • Acceptance of validation warnings
  • "floating" files that do not contain labels, but that we want in the archive anyway
  • Re-run data that has failed unexpectedly due to system failure
  • Re-run data that has been edited in the cloud (includes support for how someone could inspect and update data in AWS)
  • Remove data, and reload from the ground

As a user, I want to estimate the cost for a custom/adapted workflow

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Manager

๐Ÿ’ช Motivation

...so that I can adapt or create new workflows in nucleus and know how much it will cost

📖 Additional Details

  • Adaptation Cost Model
  • Cost Modeling User Guide: Adaptation

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

As a user, I want to know the baseline architecture and deployment

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Operator

๐Ÿ’ช Motivation

...so that I can understand what Nucleus is, the baseline tools and services included in the deployment, and what this is all for.

📖 Additional Details

  • Nucleus Intro User Guide

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

As a user, I want to configure a nucleus pipeline

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Operator

๐Ÿ’ช Motivation

...so that I can customize the baseline Nucleus deployment for my needs, e.g. how the data will be delivered, or any particular tool configuration options

📖 Additional Details

  • Provide the ability to configure nucleus baseline components
  • Configuration User Guide

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Evaluate self-managed Airflow on AWS EKS

Amazon Managed Workflows for Apache Airflow (MWAA) has ongoing costs because the Airflow UI is always up and running, and there is no option to suspend an MWAA environment when it is not in use.

This task is to evaluate self-managed Airflow on AWS EKS as an alternative to MWAA, to reduce the operating cost of Nucleus.

B13.0 Initial Design and Trade Study

See sub-tasks for details, but this initial phase of PDS Nucleus will look into the overall design of the pipeline as well as an initial trade study to determine existing technologies to accomplish this task.

Deliverables:

  • Preliminary SRD
  • Preliminary set of user stories
  • Trade Study

Here are some background docs / diagrams that were the origins of this pipeline idea.

As a data provider, I want to deliver cloud-native data to Nucleus

Checked for duplicates

No - I haven't checked

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

data provider

๐Ÿ’ช Motivation

...so that I can trigger Nucleus when a data provider delivers data in the cloud

📖 Additional Details

We want to support this use case in a couple of ways:

  1. PDS node data provider - an in-place S3 connection and ingestion, where we do not need to copy the data into our cloud storage; we just want to execute the pipeline against the data and leave it where it is
  2. Mission/other data provider - we will want to replicate the data in PDS cloud storage for long term preservation and trigger the pipeline that way (most likely a new requirement on DUM or some DUM-equivalent tool)
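A hypothetical way to route between the two cases is to key off the bucket named in the S3 event notification; the bucket list below is purely illustrative:

```python
# Buckets owned by PDS node data providers: case 1, process in place.
# The real list would come from configuration; this entry is illustrative.
IN_PLACE_BUCKETS = {"asc-pds-messenger"}


def delivery_mode(s3_event: dict) -> str:
    """Return 'in-place' (case 1) or 'replicate' (case 2) for one S3 event record."""
    bucket = s3_event["Records"][0]["s3"]["bucket"]["name"]
    return "in-place" if bucket in IN_PLACE_BUCKETS else "replicate"
```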

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Tentatively adding this towards initial CSS migration, but may need to be punted depending on how those efforts go.

As a user, I want to add a new component to the baseline nucleus workflow

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Operator

๐Ÿ’ช Motivation

...so that I can add my own custom data processing tools into the baseline nucleus pipeline

📖 Additional Details

  • Adaptation User Guide

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Evaluate options to backup and restore the metadata database and logs of MWAA

Amazon Managed Workflows for Apache Airflow (MWAA) has ongoing costs because the Airflow UI is always up and running, and there is no option to suspend an MWAA environment when it is not in use.

This task is to evaluate options to backup and restore the metadata database and logs of MWAA, to reduce the operating cost of Nucleus.

As a user, I want to estimate the cost for a new deployment

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Manager

๐Ÿ’ช Motivation

...so that I can accurately measure how much it will cost for a baseline Nucleus deployment

📖 Additional Details

  • Baseline Nucleus Cost Model
  • Cost Modeling User Guide

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

As a user, I want to monitor the pipeline cost

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Manager

๐Ÿ’ช Motivation

...so that I can ensure I am within my budget / cost estimate

📖 Additional Details

  • Cost Monitoring User Guide

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Evaluate self-managed Airflow on AWS EC2

Amazon Managed Workflows for Apache Airflow (MWAA) has ongoing costs because the Airflow UI is always up and running, and there is no option to suspend an MWAA environment when it is not in use.

This task is to evaluate self-managed Airflow on AWS EC2 as an alternative to MWAA, to reduce the operating cost of Nucleus.

Evaluate self-managed Airflow with ECS and Fargate for Nucleus

Amazon Managed Workflows for Apache Airflow (MWAA) has ongoing costs because the Airflow UI is always up and running, and there is no option to suspend an MWAA environment when it is not in use.

This task is to evaluate self-managed Airflow with ECS and Fargate as an alternative to MWAA, to reduce the operating cost of Nucleus.

Workflow Manager Utility

💡 Description

Currently, if we want to develop a new workflow for Nucleus, we have to complete the following steps.

  1. Create and push the docker image to ECR
  2. Create an EFS volume to share data/configuration files between docker containers (optional, only if the use case must share data/configuration between tasks)
  3. Create an ECS task (This can be done through AWS Console or Terraform)
  4. Implement a DAG (workflow) primarily using the Airflow ECS Operator
  5. Upload DAGs to S3 bucket
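As a sketch of step 4, one ECS-backed task might be declared as follows, assuming the Amazon provider's `EcsRunTaskOperator` (named `ECSOperator` in older provider releases); the cluster, task definition, and subnet values are placeholders:

```python
def container_overrides(container: str, env: dict) -> dict:
    """Build the ECS overrides block for one container from a plain env mapping."""
    return {
        "containerOverrides": [
            {
                "name": container,
                "environment": [{"name": k, "value": v} for k, v in env.items()],
            }
        ]
    }


def build_validate_task(dag, bundle: str):
    # Deferred import so the sketch is readable without the Amazon provider installed.
    from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

    return EcsRunTaskOperator(
        task_id="validate",
        dag=dag,
        cluster="pds-nucleus",            # placeholder cluster name
        task_definition="pds-validate",   # placeholder ECS task definition
        launch_type="FARGATE",
        overrides=container_overrides("pds-validate", {"BUNDLE": bundle}),
        network_configuration={
            "awsvpcConfiguration": {"subnets": ["subnet-PLACEHOLDER"]}
        },
    )
```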

We have to execute the above steps because different PDS docker images have different setups. For example, some of the PDS docker images need various environment variables, and others need various volumes to be mounted. As a result, we cannot use a common template.

This problem exists regardless of the workflow tool we use (it is not specific to Airflow). With any workflow tool, we will have to maintain different setups for different docker images (for example, when we used docker compose, we had different settings for each component in the docker compose file).

Implementing and uploading the DAG is easy if the PDS components are already available as ECS tasks. If we plan to automatically generate Terraform scripts, then we have to discuss:

  1. What are the INPUTS that we get from users to generate these scripts?
  2. What should those INPUTS look like?
  3. Should we maintain a catalog of docker components that users have to select from?
  4. Should we have a UI to select the docker components required for a workflow and, after making the selection, a way to generate and execute the scripts?

A Workflow Manager Utility could be a solution to address the above issues.
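For instance, such a utility could expand one catalog entry (image, environment, optional EFS volume — the INPUTS from question 1) into an ECS task-definition payload. Every field choice below is an assumption about the eventual design, not a decided interface:

```python
from typing import Optional


def ecs_task_definition(name: str, image: str, env: Optional[dict] = None,
                        efs_volume_id: Optional[str] = None) -> dict:
    """Expand one hypothetical catalog entry into a register-task-definition payload."""
    container = {
        "name": name,
        "image": image,
        "environment": [{"name": k, "value": v} for k, v in (env or {}).items()],
    }
    volumes = []
    if efs_volume_id:  # only if the use case must share data between tasks
        container["mountPoints"] = [{"sourceVolume": "shared", "containerPath": "/mnt/shared"}]
        volumes.append({"name": "shared",
                        "efsVolumeConfiguration": {"fileSystemId": efs_volume_id}})
    return {
        "family": f"pds-{name}",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "512",      # placeholder sizing
        "memory": "1024",  # placeholder sizing
        "containerDefinitions": [container],
        "volumes": volumes,
    }
```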

As a user, I want to be notified when the actual/projected cost is reaching the estimated budget

Checked for duplicates

Yes - I've already checked

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Node Manager

๐Ÿ’ช Motivation

...so that I can know when our budget is approaching or passing our estimated budget.

📖 Additional Details

  • Trigger at actual cost >= 80% of budget
  • Trigger at projected cost > 100% of budget
  • Trigger at actual cost >= 100% of budget
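The three triggers can be written down as a small rule check (a sketch of the logic only; the real alerting would be an AWS Budgets configuration):

```python
def budget_alerts(actual: float, projected: float, budget: float) -> list:
    """Return which of the three notification rules fire for the given costs."""
    alerts = []
    if actual >= 0.80 * budget:
        alerts.append("actual >= 80% of budget")
    if projected > budget:
        alerts.append("projected > 100% of budget")
    if actual >= budget:
        alerts.append("actual >= 100% of budget")
    return alerts
```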

⚖️ Acceptance Criteria

  • Open up AWS Costs and Reporting
  • Verify the Budget Report is set up to trigger on the budget guidelines noted above.
  • Verify email is being sent to [email protected]

โš™๏ธ Engineering Details

Analyze best way for DAG steps to share data

💡 Description

Options are EFS, S3, or other databases.

That could be managed as:

  • a constraint on the DAG step implementations, e.g. each must support S3 read/write interfaces
  • a generic DAG component running during the execution and made available, somehow, to the other components
  • specific DAG running environments providing these specific component interfaces to the other components
  • ...

A good place to start is to get inputs from PODAAC on how they do that in Cumulus.
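Under the S3 option, for example, each step could write its outputs under a per-run prefix and hand only the URI to the next step (in Airflow the URI would travel via XCom). The bucket name and layout here are assumptions:

```python
def run_prefix(bucket: str, dag_id: str, run_id: str) -> str:
    """Per-run S3 prefix under which every DAG step reads and writes its files."""
    return f"s3://{bucket}/runs/{dag_id}/{run_id}/"


def step_output_uri(bucket: str, dag_id: str, run_id: str, step: str, filename: str) -> str:
    """URI a step publishes (e.g. via XCom) for downstream steps to consume."""
    return run_prefix(bucket, dag_id, run_id) + f"{step}/{filename}"
```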

As a user, I want to deploy a baseline nucleus automatically.

๐Ÿง‘โ€๐Ÿ”ฌ User Persona(s)

Sys Admin

๐Ÿ’ช Motivation

...so that I can automatically deploy a new baseline nucleus instance with minimal manual intervention.

📖 Additional Details

  • Terraform scripts for deploying in AWS (deploy.sh?)
  • Docker container for local deployment (it was decided to use MWAA on AWS, and a local deployment is not applicable with MWAA)
  • Document any manual requirements

⚖️ Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details
