zooniverse / theia

Building the next-generation Floating Forests pipeline

Dockerfile 0.23% Python 38.25% Shell 0.19% HTML 1.29% CSS 24.47% JavaScript 35.56%

data-import

theia's People

amy-langley trellixvulnteam

theia's Issues

reject images that contain all water or no water

Floating Forests only cares about coastline images. Detecting coastlines is hard, but maybe the answer lies in (a) pixel_qa data that estimates water vapor or (b) a simple neural net that looks at the RGB histograms from #16.

Images with no water, or images that are only water, shouldn't be uploaded.
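
The rejection rule could be sketched as follows, assuming an earlier step has already produced a per-pixel water mask (for example from pixel_qa). The function name and the 5% / 95% thresholds are hypothetical, not values taken from the project.

```python
from typing import Sequence


def should_reject(water_mask: Sequence[bool],
                  min_water: float = 0.05,
                  max_water: float = 0.95) -> bool:
    """Return True when a tile is (almost) all water or (almost) all land.

    water_mask holds one flag per pixel, True meaning water. The thresholds
    are illustrative assumptions and would need tuning on real scenes.
    """
    if not water_mask:
        return True  # nothing to classify, so nothing worth uploading
    water_fraction = sum(1 for is_water in water_mask if is_water) / len(water_mask)
    return water_fraction < min_water or water_fraction > max_water
```

A tile that is half water and half land passes; a tile that is entirely one or the other is rejected before upload.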

more logging

The service should do more logging in general so we can look at the kube logs to see what it's doing.
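
A minimal sketch of a logging setup that writes to stdout, which is where container log collection normally reads from. The logger name "theia" and the format string are assumptions.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send log records to stdout so container log collectors pick them up."""
    logger = logging.getLogger("theia")  # logger name is an assumption
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging().info("starting stage")` once at startup would then be enough for each stage to report what it is doing.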

integration with USGS EROS

locate potential scenes for download according to user criteria

Design object model

Needs to describe image acquisition request, including search criteria, target project, and configuration options.
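
One possible shape for such a request, as a sketch only. The class name, field names, and the `status` field are hypothetical and not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ImageryRequest:
    """Hypothetical shape for one image acquisition request."""
    project_id: int                      # target project
    search_criteria: Dict[str, Any]      # e.g. date range or cloud cover limits
    options: Dict[str, Any] = field(default_factory=dict)  # configuration options
    status: str = "pending"              # assumed lifecycle field, not in the issue
```

For example, `ImageryRequest(project_id=1, search_criteria={"max_cloud_cover": 10})` describes a request with default options.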

django secret key in environment instead of config
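
A small sketch of reading the key from the environment. The variable name `DJANGO_SECRET_KEY` and the helper name are assumptions.

```python
import os


def load_secret_key(env_var: str = "DJANGO_SECRET_KEY") -> str:
    """Read the secret key from the environment and fail loudly if it is missing."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value


# In settings.py this would be used as: SECRET_KEY = load_secret_key()
```

Failing at startup when the variable is missing avoids silently falling back to a key committed in the config.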

Login button on homepage obscured by footer

Desired Behavior: When visiting the Theia homepage, I can click the "Connect to Zooniverse" button to login/authenticate.

Current Behavior: The footer component of the webpage obscures the "Connect to Zooniverse" button and I cannot click it to follow the login link. Note: if I use dev tools to hide the footer component, I'm able to click the button and follow the link.

Device Info: Chrome 89 on Mac OS X 10.14; also tested in Firefox 87, with the same behavior.

make sure tests are comprehensive

Reduce 16-bit-depth TIFFs to 8-bit-depth

Related to #13
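
The bit-depth reduction can be illustrated with a plain linear rescale. This is a sketch on a flat list of samples, not the project's implementation; real scenes often occupy only part of the 16-bit range, so a contrast stretch may be preferable in practice.

```python
from typing import Iterable, List


def to_8bit(samples: Iterable[int]) -> List[int]:
    """Linearly rescale 16-bit samples (0..65535) to 8-bit (0..255), rounding to nearest."""
    return [(value * 255 + 32767) // 65535 for value in samples]
```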

interaction with queueing system works

Meaningful unit testing for image actions

Mocking parts of the system has been difficult; read this:

http://www.developersite.net/question-335023

Limit celery retries

Currently celery tasks are retried forever, which has the potential for a ton of noise. It's fairly easy to specify how many retries a task should get and what back-off period to use before retrying.
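
Celery exposes this through task options such as `max_retries` and `retry_backoff`. The sketch below does not call Celery; it only illustrates the kind of schedule a retry limit with capped exponential back-off produces. The function name and default values are hypothetical.

```python
from typing import List


def retry_delays(max_retries: int, base: float = 1.0,
                 factor: float = 2.0, cap: float = 600.0) -> List[float]:
    """Delays in seconds before each retry; the list length is the retry limit."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]
```

With `retry_delays(5)` a task would be retried at most five times, waiting 1, 2, 4, 8, and 16 seconds, and then give up instead of retrying forever.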

create basic django app

Issue panoptes credentials for app and configure container

Issue doorkeeper auth
Configure kube secrets

extract GIS data from imagery if possible

social-auth backend for panoptes

here are some Paw requests that might help

I had to create an issue because you can't upload files to the wiki

Paw.zip

project coastlines onto an image

Get Landsat 7, Landsat 4/5 images working

They should be pretty much done but there may be a little additional work needed in the adapter to get them looking right.

Upload entire directory

Since tiles are an entire directory full of images, we need a way to process all of them in a stage, especially for uploading them all.
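
A sketch of a stage that applies one action to every file in a directory. The function name, the `*.png` default pattern, and the callback signature are assumptions; the real upload step would be passed in as `action`.

```python
from pathlib import Path
from typing import Callable, List


def process_directory(directory: str, action: Callable[[Path], None],
                      pattern: str = "*.png") -> List[Path]:
    """Apply `action` to every matching file in a directory, in sorted order."""
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    for path in files:
        action(path)
    return files
```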

CSRF protection for oauth login links

When using oauth login links we have to be careful not to allow the oauth login process to be activated without ensuring the request originated from a known, logged-in user.

This is a recent exploit that was raised in rails land via omniauth/omniauth#809

A mitigation would be CSRF validation via a POST method to the social auth routes before redirecting to the upstream social auth provider. Depending on what your application does with the upstream user data, it may be a vector for account takeover. I assume in this app it won't be (at worst it would most likely change the logged-in user), but it is something to keep in mind with oauth flows.
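
The check described above can be sketched in framework-free form. In a Django app the built-in CSRF middleware would normally do this work; the function names and the plain-dict session here are hypothetical stand-ins.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Store a fresh random token in the user's session and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def may_start_oauth(method: str, session: dict, submitted_token: str) -> bool:
    """Allow the redirect to the provider only for a POST carrying the session's token."""
    expected = session.get("csrf_token", "")
    return (method == "POST" and bool(expected)
            and hmac.compare_digest(expected.encode(), submitted_token.encode()))
```

A plain GET to the login link, or a POST with a token that does not match the session, would be refused before any redirect to the provider.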

extract histograms from images for analysis
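
A sketch of per-channel histogram extraction over 8-bit RGB pixels, written against plain tuples so it stands alone. An imaging library would likely provide this directly; the function name and the `bins` parameter are assumptions.

```python
from typing import Iterable, List, Tuple


def rgb_histograms(pixels: Iterable[Tuple[int, int, int]],
                   bins: int = 256) -> List[List[int]]:
    """Count 8-bit R, G and B values into `bins` equal-width buckets per channel."""
    histograms = [[0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            histograms[channel][value * bins // 256] += 1
    return histograms
```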

Build out docker container

If we want to be able to deploy this in EC2 then it should be containerized so it can run on kubernetes.

do not run production in debug
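
One way to make debug opt-in rather than the default, as a sketch. The variable name `DJANGO_DEBUG` and the helper name are assumptions.

```python
import os


def debug_enabled(env_var: str = "DJANGO_DEBUG") -> bool:
    """Debug is off unless the variable is explicitly set to a truthy string."""
    return os.environ.get(env_var, "").strip().lower() in {"1", "true", "yes"}


# In settings.py this would be used as: DEBUG = debug_enabled()
```

Because the default is off, a production deployment that sets nothing runs without debug.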

slice images
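
Slicing can be reduced to computing crop boxes, sketched below. The function name and the fixed square tile size are assumptions; the boxes use the (left, upper, right, lower) ordering common to image libraries.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def tile_boxes(width: int, height: int, tile: int) -> Iterator[Box]:
    """Yield crop boxes that cover the image row by row; edge tiles may be smaller."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, top, min(left + tile, width), min(top + tile, height))
```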

Determine projects accessible by authenticated user

kubernetes config

build config file to load theia containers into the kube

update static assets

Current behavior: The current app pulls static images (for logos, etc) via Wordpress upload URLs (e.g., https://chelseatroy.com/wp-content/uploads/2020/07/nasa-partner.png).

Desired behavior: The preference is to add these images as static assets bundled with the app (e.g., in a /static dir), but they could also be moved to Zoo-hosted blob storage (e.g., https://static.zooniverse.org/assets/zooniverse-icon-web-black.png).

dispose of old working directories after a while
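
A sketch of age-based cleanup, assuming each job gets its own subdirectory under a common root. The function name and parameters are hypothetical. Note that a directory's modification time changes only when entries are added or removed, so this is a rough measure of age.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_dirs(root: str, max_age_seconds: float,
                      now: Optional[float] = None) -> List[Path]:
    """Delete subdirectories of `root` whose mtime is older than the cutoff."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = []
    for child in Path(root).iterdir():
        if child.is_dir() and child.stat().st_mtime < cutoff:
            shutil.rmtree(child)
            removed.append(child)
    return removed
```

Run periodically, for example `remove_stale_dirs(root, 7 * 24 * 3600)` would drop working directories untouched for a week.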

create subject sets and add subjects in panoptes

jenkinsfile for automated builds

Integrate oauth stuff with rest-framework

Right now anyone who can access the /api routes can do configuration for any project without any kind of authentication or authorization. It'd be ideal to get django-rest-framework to play nice with django-social-auth but this has been quite a struggle.

logging

add logging statements
configure Graylog or other provider
errors to rollbar

Get running in local kube

github repo integrations

need to integrate with:

jenkins
travis
something to run linter
require tests to pass before merging

integrate with sentry for error logging

Technology selection

Since we're going to tentatively build this pipeline out in Python, we need to select Python versions of our familiar tools:

ORM (Rails)
Job queue (Sidekiq)
REST (Faraday)
Oauth
Unit tests (rspec)

As well as tools for handling some novel challenges:

Image processing
GIS processing

REST API

Necessary operations:

authenticate
create request
get request status
cancel request

kubernetes secrets

Create kubernetes secrets for staging and production API keys

[Security] Bump Pillow to >= 9.0.0

Multiple security updates have been tagged over the past few months; however, dependabot isn't opening an automated PR. These are found in this repo under Security - Dependabot alerts - Pillow. Info for the most recent bump is found here, and they are all moderate vulnerabilities.