issues-and-projects's Issues

IRODS ingestion of gantry data

Task to do

Set up and document an automated process for ingesting gantry data into CyVerse Data Store (IRODS)

Reason

  • Ease of access from VICE (for workbench/data exploration)
  • Close to CyVerse & UA computational resources

Result

  • Scripts and configuration necessary for setting this up
  • Document (as text file in the code repository where the automation & logic end up)
  • Monitoring & alerting (at least documenting what success/failure looks like)

Steps to take

  • Julian to talk to Tony, Edwin and Max to see what's involved in getting this done, and then add next steps below

Provide technical documentation for base template-lidar-plot

Task to do
Create technical documentation for template-lidar-plot and add it to the Organization web site

Reason
Documenting the technical approach of the repo, and what the code provides, is helpful for developers

Result
Technical documentation on the Lidar plot-level template is available

Note:

  1. Starting with the existing template-rgb-plot technical documentation would most likely shorten the timeframe needed to produce the documentation

Create base plot-level Lidar Docker image to be used with the template-lidar-plot derived repos

Task to do
Create the base image used with the template-lidar-plot repo. This is similar to the rgb-plot-base-transformer folder, except for Lidar [try the following link if the previous one doesn't work and browse to a branch that has the folder: https://github.com/AgPipeline/drone-pipeline-transformer]

Reason
As part of making writing Lidar plot-level algorithms easier, a base image is needed to provide the context the algorithm runs in

Result
Ability to create new Docker images that can be used as the basis for plot-level Lidar algorithms

Steps to take

  • Create a new folder next to common-image in the ua-gantry-transformer repository to hold the files
  • Add a configuration.py file and fill it in with the correct information (a sketch follows the notes below)
  • Add a transformer.py file that provides the support needed for the lidar plot-level template
  • Add the Dockerfile, requirements.txt, and packages.txt files used to build the image for Lidar
  • Create a README.md file and fill it in to document the code
  • Create a Docker image that can be used
  • Integrate TravisCI to test only the add_parameters(), check_continue(), and perform_process() functions (don't add tests for other methods since they'll probably be moved; what's not moved will be added to TravisCI later), and to test the Docker image

Note:

  1. Clone the rgb-plot-base-transformer repo to get a quick start on what's needed
  2. There should be quite a bit of overlap between the RGB and Lidar repos; moving the overlap into a library is a separate issue
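The configuration.py mentioned in the steps above would carry the image's metadata. A minimal sketch follows, assuming variable names modeled on the existing rgb-plot-base-transformer; the actual names and values need to be confirmed against that repo before use.

```python
# configuration.py - hypothetical sketch for the Lidar plot-level base image.
# All variable names and values below are assumptions modeled on the RGB base
# transformer and must be checked against the real template.

# Version of this base transformer
TRANSFORMER_VERSION = '1.0'

# Human-readable description of what the base image provides
TRANSFORMER_DESCRIPTION = 'Plot-level Lidar base transformer'

# Maintainer contact information
AUTHOR_NAME = 'AgPipeline maintainers'
AUTHOR_EMAIL = 'maintainers@example.org'

# Sensor and transformer type identifiers written into generated metadata
TRANSFORMER_SENSOR = 'scanner3DTop'   # assumed gantry Lidar sensor name
TRANSFORMER_TYPE = 'lidar_plot'
```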

Add ability for transformers to download files after check_continue() call

Task to do
Enable the download of data after the check_continue() call returns an indication that data can be downloaded.

Reason
For some environments it's better to delay downloading data until conditions are right. This issue is to enable that functionality

Result
The environment is able to download data when needed
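A minimal sketch of how the calling environment might act on such an indication is shown below; the return-code convention and the download helper are assumptions, not the actual entrypoint API.

```python
# Hypothetical sketch: defer downloading until check_continue() signals that
# data can be fetched. The positive-code convention and environment.download_data()
# helper are assumptions for illustration only.
import logging

CODE_DOWNLOAD_REQUESTED = 1   # assumed convention: "continue, and fetch the data now"

def run_with_deferred_download(transformer, environment, params):
    result = transformer.check_continue(environment, **params)
    code = result[0] if isinstance(result, tuple) else result

    if code == CODE_DOWNLOAD_REQUESTED:
        # Only now pay the cost of pulling down (possibly large) input files
        environment.download_data(params)          # hypothetical helper

    if code >= 0:
        return transformer.perform_process(environment, **params)

    logging.warning("check_continue() declined to run (code %s)", code)
    return {'code': code}
```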

Use ua-gantry-pipeline template-rgb-plot to develop new version of Canopy Cover

Task to do
Redevelop Canopy Cover algorithm to use plot-level template

Reason
Reduces the specialized code that needs to be maintained

Result

  • Focused canopy cover algorithm (a calculation sketch follows the steps below)
  • Testing out new template-rgb-plot code
  • Demonstrates common algorithm sourcing and a possible implementation strategy

Steps to take

  • Create new repo to hold derived algorithms
  • Submodule current plot-level canopy cover (https://github.com/Chris-Schnaufer/canopy-cover)
  • Create Dockerfile to create new image
  • Create any other scripts to assist Dockerfile
  • Test new Docker image
  • Make new image available
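For context on what the focused algorithm computes, here is a minimal sketch of a plot-level canopy cover calculation on a soil-masked image; it illustrates the shape of the computation, not the exact code being migrated.

```python
# Sketch: percent canopy cover of a plot as the fraction of non-soil pixels
# in a soil-masked RGB image. Illustrative only.
import numpy as np

def canopy_cover_percent(masked_pixels: np.ndarray) -> float:
    """masked_pixels: HxWx3 array in which soil pixels have been zeroed out."""
    plant = np.count_nonzero(masked_pixels.max(axis=2))   # any non-zero channel counts as plant
    total = masked_pixels.shape[0] * masked_pixels.shape[1]
    return (plant / total) * 100.0 if total else 0.0
```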

Proposal to evolve transformer architecture

Task to do
Write a proposal and plan to modify the transformer architecture

Reason

We want to balance reliability, reproducibility, and developer ergonomics of the pipeline transformers

Result

Steps to take

  • Summarize and justify requirements for transformers (David starts this) #152
  • Explain why the current solution doesn't meet all of these; generalize this into an explanation of our decisions and an analysis of the trade-offs
  • Write a draft proposal to evolve the current solution so it meets all the requirements
    • migrate this to the wiki; summarize, etc.
    • dig up other relevant material too
    • explain why DAG workflows (see gist) vs. a message queue
  • Create an iterative plan with steps to implement the proposal
  • Get approval from David for proposal and the plan to implement it
  • Create epic with issues for the plan

Merge plot-level canopy cover from Chris Schnaufer's repo to AgPipeline repo

Task to do
Due to class requirements, the current canopy cover code was kept intact and development happened in Chris Schnaufer's UA account. This needs to be merged back into the AgPipeline organization.

Reason
Common source for calculating canopy cover

Result
AgPipeline canopy cover will be plot-level based and common with Drone Pipeline

Upgrade jupyter notebook to jupyter lab

Jupyter Lab seems to be the future of Jupyter and code notebooks, so everything we currently do in a regular Jupyter Notebook should also work with Jupyter Lab.

This shouldn't really be a problem since both use the same file format; Jupyter Lab simply has more user-facing functionality, but it's good to keep this in mind.

A small RGB dataset to test pipeline

Task to do
See #46 (comment)

Would it be possible to provide a small dataset that could be used to run this? It could be made available in .tar format on Google Drive under the account that Jorge set up.

Reason

To allow easy testing of the pipeline.

Result

One or more sample data sets (different sizes) available from a URL (e.g. Google Drive or CyVerse DE public link)

Process captured RGB data as it arrives through to canopy cover & save data products

Task to do
Pre-production testing of the process through the final analysis step, storing the resulting data products

Reason
Ensure the environment is set up to correctly process data and that all data products are available. Allows follow-on testing of plot-level analysis with Urbana-generated data

Result
Able to access final analysis results to compare against the current system. Able to compare intermediate data products against Urbana-generated data.

Fix problems with travis test integration

  • ua-gantry-transformer needs some dependencies that are proving troublesome to install
  • base-docker-support needs a little more input for some tests
  • template-transformer-simple might be missing certain elements in the base repo

I'll have to go over this with Chris to figure out what needs to be done.

Changes to initial cut of tests for base-docker-support/base-image

Task to do
Change some tests for better coverage

Reason
Changes to help expand testing in the future and for better coverage

Result

  • Run the testing code through pylint (using the rc file at the root of the Organization-info repo)
  • Place testing files into a test folder where possible (e.g. base-image/test)

test_entrypoint.py:

  • test_handle_result(): change 'print' to 'file' and perform a file check

test_transformer.py:

  • transformer.check_continue() actually returns a tuple (the template needed updating); see the sketch below
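Hypothetical pytest sketches of the two changes above; the handle_result() and check_continue() call signatures are assumptions and need to be matched to the actual base-docker-support code.

```python
# Sketches only: the signatures below are assumed, not taken from the real modules.
import json

def test_handle_result_writes_file(tmp_path):
    import entrypoint                                              # module under test in base-image
    out_file = tmp_path / "result.json"
    entrypoint.handle_result('file', str(out_file), {'code': 0})   # assumed signature
    assert out_file.exists()
    assert json.loads(out_file.read_text())['code'] == 0

def test_check_continue_returns_tuple():
    import transformer                                             # template's transformer module
    result = transformer.check_continue(None, {}, {}, [])          # assumed signature
    assert isinstance(result, tuple)
```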

Port laz/ply extractors: las2height

Task to do
Port las2height and panicle_detection extractors to transformers (for Eric's class as well)

Reason
Giving these higher priority so they can be used by the class

Result
Dockerized transformers

Ensure all captured data at ua-mac field site are delivered to UA

Task to do
Ensure all data captured at the Maricopa field site is shipped to UA.

Reason
This is not only a necessary first step but also allows longer-term testing of transfers before entering production mode, determination of space requirements, and other logistics.

Result
Regular data captures are available for UA processing

Steps to take

  • Data is transferred to UA
  • Data is appropriately stored with no overwrites or deletions in a manner allowing easy discovery
  • Data is available for processing

Create test cases for the new extractor template

Task to do
Create unit and functional tests for the non-docker-image and docker-image template

Reason
Preparation for integrating with TravisCI

Result
A fully tested extractor template

Convert base image container in base-docker-support to use Python3.7

Task to do
To support Python 3.7, change the links in /usr/bin for python3 to point to python3.7. Also add a python link that runs Python 3.

Reason
Python 3.7 has additional features that are useful for timestamp conversions, among other benefits

Result
Scripts will run Python 3.7 by default (not 3.6)

Commands to add to the Dockerfile

  • ln -sfn /usr/bin/python3.7 /usr/bin/python3
  • ln -sfn /usr/bin/python3.7m /usr/bin/python3m
  • ln -sfn /usr/bin/python3 /usr/bin/python
  • Create Pull Request

Determine minimum required transfer rate from Cache Server

Based on data transfer rates to date, it is clear that we don't need the 1 Gigabit line that we currently have, which costs thousands of dollars per month. What is the minimum transfer rate from MAC to the internet that we need so that we can keep up with historical data generation rates?

Discuss with JD and Sean Stevens, then let Matt Rahr know the requirements
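A back-of-the-envelope sketch of the calculation; the daily volume below is a placeholder that must be replaced with the measured gantry generation rate.

```python
# Minimum sustained rate needed to keep up with data generation (placeholder numbers).
daily_volume_tb = 3.0                        # placeholder: TB of new data per day
bits_per_day = daily_volume_tb * 1e12 * 8
required_mbps = bits_per_day / 86400 / 1e6   # sustained megabits per second
print(f"minimum sustained rate: {required_mbps:.0f} Mbit/s")
# ~278 Mbit/s with the placeholder above; add headroom for retries, catch-up
# after outages, and protocol overhead before picking a line speed.
```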

Fix problem when no TIF file specified to process

The current behavior or issue
The code crashes if no TIF file is specified on the command line.

The steps taken to reproduce the behavior or issue, or specify a location where the steps were recorded
To reproduce, run the code with no TIF file specified on the command line

Expected behavior
A warning gets reported and the code doesn't crash

Add other supporting information that may be useful
https://github.com/AgPipeline/transformer-soilmask/blob/6ec902e26a2d0bc2b33a4ef8aeb2ae051b8bec86/transformer.py#L363

Completion criteria
The code doesn't crash
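A minimal sketch of the kind of guard that would satisfy the completion criteria, warning and returning instead of crashing; how the file list reaches the code is an assumption about the surrounding transformer.

```python
# Sketch: bail out with a warning instead of crashing when no TIF is supplied.
import logging

def find_tif(file_list):
    tif_files = [f for f in file_list if f.lower().endswith(('.tif', '.tiff'))]
    if not tif_files:
        logging.warning("No TIF file specified on the command line; nothing to process")
        return None
    return tif_files[0]
```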

Move code to retrieve file's EPSG code to Transformer class

Task to do
The current code calls directly into terrautils to get the EPSG code. Moving this to the Transformer class allows the removal of a dependency and introduces flexibility that benefits the Drone Pipeline through common code

Reason
Moving the functionality allows greater flexibility in providing a common code solution

Result
The dependent code no longer calls directly into terrautils, allowing the Transformer class to better provide for its environment
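A sketch of the kind of helper the Transformer class could expose, reading the EPSG code directly with GDAL instead of going through terrautils; the method name is an assumption.

```python
# Sketch: retrieve a GeoTIFF's EPSG code via GDAL/OSR directly.
from osgeo import gdal, osr

class Transformer:
    @staticmethod
    def get_image_file_epsg(source_path: str):
        """Returns the file's EPSG code as a string, or None if it can't be determined."""
        dataset = gdal.Open(source_path)
        if dataset is None:
            return None
        srs = osr.SpatialReference(wkt=dataset.GetProjection())
        return srs.GetAttrValue('AUTHORITY', 1)
```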

Move common code for rgb-plot-base-transformer and lidar-plot-base-transformer to library

Task to do
Move common code to a library that's published

Reason
Common sourcing plot-level code allows easier updating of dependent applications and Docker images

Result
A published library containing common code for plot-level algorithm bases

Steps to take

  • Move common code to a separate library repo
  • Create the appropriate build environment (adding setup.py, etc.; see the sketch below) and build the library
  • Publish the library
  • Create a README.md that clearly defines the goals and details of the library (such as which online package repositories are supported, etc.)
  • Integrate with TravisCI to run pylint on code
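A hypothetical setup.py sketch for the shared library; the package name and metadata are placeholders, not the published names.

```python
# setup.py sketch for the common plot-level base library (placeholder metadata).
from setuptools import setup, find_packages

setup(
    name='plot-base-common',      # placeholder package name
    version='0.1.0',
    description='Common code shared by the RGB and Lidar plot-level base transformers',
    packages=find_packages(),
    install_requires=[
        # runtime dependencies shared by both base images go here
    ],
    python_requires='>=3.6',
)
```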

Makeflow pipeline steps should not call out to external databases

The current behavior or issue

Currently the agpipeline/cleanmetadata and agpipeline/canopycover steps call out to an external database (BETYdb at Illinois).

This will cause reliability, scalability and reproducibility problems.

Expected behavior

Every run of a pipeline (same code and input) should be deterministic and idempotent.

There can be a 'stateful' wrapper around a deterministic core which talks to external systems.

Completion criteria

(DRAFT SOLUTION)

  • An initialization step (pre-workflow-workflow?) to generate text (JSON) files or databases (sqlite or pglite); see the sketch below
  • Use these generated files as one of the inputs to the pipeline (along with current input like image files, etc.)
  • Modify steps which call out to external systems for input: Take the local files created above as their input
  • Modify steps which push results out to external systems: Produce local files as output
  • A finalization step (post-workflow-workflow?) to push the local generated result files to one or more external systems.

See https://github.com/terraref/workflow-pilot for inspiration.
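An illustrative sketch of the initialization step in the draft solution: snapshot whatever the pipeline needs from the external system into a local JSON file once, and have the workflow steps read only that file. The fetch_plot_boundaries() call and the URL are placeholders for whatever query the real step performs against BETYdb.

```python
# Sketch of the pre-workflow / post-workflow split (placeholder helpers).
import json
import os

def prepare_local_inputs(output_path: str) -> None:
    """Pre-workflow step: snapshot external state into a local input file."""
    betydb_url = os.environ.get('BETYDB_URL', 'https://betydb.example.org')
    plots = fetch_plot_boundaries(betydb_url)        # placeholder for the real query
    with open(output_path, 'w') as out_file:
        json.dump(plots, out_file)

def load_local_inputs(input_path: str) -> dict:
    """Workflow steps call this instead of talking to the external database."""
    with open(input_path) as in_file:
        return json.load(in_file)
```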

Finish makeflow workflow for drone processing pipeline

Task to do
Finish makeflow workflow for Drone Processing Pipeline

Reason
Migration from Clowder-only workflow

Result
Able to leverage the CyVerse environment

Steps to take

  • Finish workflow .jx file
  • Test workflow
  • Commit workflow to repo

Integrate extractor template with TravisCI

Task to do
After tests are built, integrate with TravisCI

Reason
Able to automatically test changes to extractor template

Result
Full integration with TravisCI and ability to make Pull Requests dependent upon successful CI runs

Create ua-gantry-pipeline template-rgb-plot repo

Take the drone pipeline template-rgb-plot code and develop the ua-gantry-pipeline equivalent.

Tasks:

  • Create a ua-gantry-compatible version of the drone pipeline code
  • Common-source the shared code (as a library or submodule) in a way that separates the RGB-specific functions from the (future) Lidar-, FLIR-, etc.-specific code
  • Update/provide documentation for ua-gantry-pipeline implementation

Come up with plan to move TERRA REF data archives from NCSA to UA GDrive

Need to archive data from the Storage Condo and Nearline tape at UIUC on GDrive using Globus transfer.

Coordinate with Sean Stevens to set up a plan. The allocation on the Storage Condo officially ends in December (?) and the tape in March of next year, so we need to transfer ~1 PB of the zipped archive files by then.

Create ua-gantry-pipeline template-lidar-plot repo

Task to do
Create the code structure and repository needed for simple plot-level lidar algorithms

Reason
Provides simplified interface for implementing plot-level lidar transformers

Result

  • algorithm developers will have minimal work to do to create plot-level lidar transformers
  • lower maintenance costs associated with lidar analysis algorithms through common code
  • able to quickly prove out new algorithms

Steps to take

  • create common code to base template-lidar-plot algorithms on (similar to rgb-plot-base-transformer)
  • create template repo for template-lidar-plot

Modify DPP workflow to add metadata to Clowder instance

Task to do
Enable loading metadata to Clowder

Reason
Leverage Clowder search capabilities

Result
Metadata is available in Clowder

Steps to take

  • Determine what data to store in Clowder (work with David)
  • Add the ability to create Clowder datasets (in spaces and collections) to hold metadata to the workflow environment
  • Add metadata upload to workflow

Add a base docker image override to configuration.py for template

Task to do
Add another variable to configuration.py that the generate_docker.py script can use as the base image

Reason
Currently there's only a command line parameter override that allows the base image to be changed when generating the Dockerfile. This will allow a more permanent change that doesn't rely on the command line parameter being remembered

Result
The variable will allow the base image for the Dockerfile to be overridden, while still allowing the command line override to take precedence

Steps to take

  • Update the template-transformer-simple repo to add and use the new variable
  • Document this variable and how it interacts with the command line parameter
  • Propagate update to ua-gantry-transformer and other template repos
  • Create Pull Request
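A sketch of the precedence described above: the command line override, if given, wins over a configuration.py value, which in turn wins over the built-in default. The BASE_DOCKER_IMAGE variable name is an assumption.

```python
# Sketch of base-image resolution for generate_docker.py (variable name assumed).
import configuration

DEFAULT_BASE_IMAGE = 'ubuntu:18.04'   # placeholder default

def determine_base_image(cmdline_override: str = None) -> str:
    if cmdline_override:                                             # command line wins
        return cmdline_override
    configured = getattr(configuration, 'BASE_DOCKER_IMAGE', None)   # new variable
    if configured:
        return configured
    return DEFAULT_BASE_IMAGE
```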

Move canopy height transformer to template-lidar-plot based transformer

Task to do
Use the new transformer-lidar-plot repo as the basis for canopy height analysis

Reason

  • reduces the overhead of maintaining canopy-height transformer
  • proves out the base image
  • provides demonstration of specialized templated transformers

Result
Specialized image for calculating canopy height

Create a template-lidar-plot repo and populate it

Task to do
Create a Lidar plot-level template similar to the template-rgb-plot template, except dealing with Lidar data.

Reason
Enables plot-level Lidar algorithm developers to easily create working workflow components

Result
Have a repository that can be used as a template to create new workflow algorithms

Steps to take

  • Create the algorithm_lidar.py template file, filled in (a skeleton sketch follows the notes below)
  • Create a generate.py executable script that's used to create the Dockerfile and supporting files
  • Create a testing.py executable script to be used when testing the algorithm
  • Obtain plot-level lidar files, ZIP them together, and place them on Google Drive alongside the RGB testing files
  • Create a HOW_TO.md file with detailed instructions on how to use the new template file
  • Let the AgPipeline owner know when the new repository is created so the correct permissions can be applied to the repo
  • Integrate with TravisCI: use the testing.py script to validate the default algorithm using the ZIP of Lidar data (include pylint, etc. as well)

Notes:

  1. cloning the template-rgb-plot repo to use as a starting point is probably the fastest approach to finishing this
  2. the generated Dockerfile should reference the Lidar specific base image (created through different ticket(s))
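A hypothetical skeleton of the algorithm_lidar.py template file; the calculate() entry point mirrors the shape of the RGB plot-level template, but the exact signature is an assumption to confirm against template-rgb-plot.

```python
# algorithm_lidar.py skeleton (illustrative; names and signature are assumptions).

ALGORITHM_NAME = 'my lidar algorithm'   # placeholder
VERSION = '1.0'

def calculate(plot_las_path: str) -> dict:
    """Receives the path to a plot-level LAS/LAZ file and returns the computed
    values as a dictionary of trait name to value."""
    # Algorithm developers replace this body with their plot-level calculation
    raise NotImplementedError("fill in the plot-level Lidar calculation here")
```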

Create Jupyter Notebook on how to develop a transformer

  • Review how other groups have done this (see slack message)
  • Show how to develop a transformer using a Jupyter notebook so that the result can be copied and pasted into an actual transformer.
  • Create a Docker image for a CyVerse DE app with the resulting notebook
