kids-first / kf-api-study-creator
Powers investigator-driven data staging. Backend for Data Tracker app
Home Page: https://kids-first.github.io/kf-api-study-creator
License: Apache License 2.0
Currently all JWTs are assumed to be valid. We should instead be getting the public key used to validate them from ego; the public key is available from /oauth/token/public_key on ego.
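A minimal sketch of that flow, assuming the PyJWT and requests packages and a hypothetical EGO_API setting holding ego's base url (neither is existing code):

import jwt
import requests
from django.conf import settings


def validate_ego_token(encoded_token):
    """Return the token's claims if it was signed by ego, otherwise None."""
    # The key only changes when ego is redeployed, so this response could be cached
    resp = requests.get(f"{settings.EGO_API}/oauth/token/public_key", timeout=10)
    resp.raise_for_status()
    public_key = resp.content

    try:
        return jwt.decode(encoded_token, public_key, algorithms=["RS256"])
    except jwt.exceptions.InvalidTokenError:
        return None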
Object tags containing at least the app name, kf_id, and date created should be added to files uploaded to S3.
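A rough sketch using boto3's put_object_tagging; the exact tag keys and the point at which this runs (right after upload) are assumptions:

from datetime import datetime, timezone

import boto3


def tag_uploaded_file(bucket, key, kf_id):
    """Attach the required object tags to a file that was just uploaded to S3."""
    s3 = boto3.client("s3")
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={
            "TagSet": [
                {"Key": "app", "Value": "kf-api-study-creator"},
                {"Key": "kf_id", "Value": kf_id},
                {"Key": "date_created", "Value": datetime.now(timezone.utc).isoformat()},
            ]
        },
    )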
We should remove the authentication middleware from the development configuration to allow developers to use /graphql unrestricted.
Files uploaded as in #9 and stored in S3 as in #10 need to be downloadable by users. This may need to be a separate endpoint such as /data, as GraphQL does not support file transfers.

The endpoint will be:

GET /download/study/[studyId]/file/[fileId]?token=[token]

where studyId refers to the study that the file belongs to, fileId is the internal file identifier, and token is an ego JWT whose user must be in the studyId group to download. This will download the latest version of the file.

GET /download/study/[studyId]/file/[fileId]/version/[version]?token=[token]

This will download the given version of the file.
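One possible wiring of those routes in Django; the view module and view names are placeholders, not existing code:

from django.urls import path

from creator.files import views  # hypothetical module holding the download view

urlpatterns = [
    # Latest version of a file
    path("download/study/<study_id>/file/<file_id>", views.download, name="download"),
    # A specific version of a file
    path(
        "download/study/<study_id>/file/<file_id>/version/<version_id>",
        views.download,
        name="download-version",
    ),
]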
The settings.py module should be split up into different files so that we can configure the application depending on what deployment environment we're working in. creator/settings.py should become:
creator/settings/dev.py
creator/settings/test.py
creator/settings/prd.py
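A common way to do the split is to pull shared settings into a base module and override per environment; the base.py shown here is an assumption about how the package would be laid out, not an agreed layout:

# creator/settings/dev.py
from .base import *  # noqa: F401,F403 -- settings shared by every environment

DEBUG = True
ALLOWED_HOSTS = ["*"]
# Development-only conveniences (e.g. dropping the auth middleware) go here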
Each study should connect to some sort of owner/user/investigator (I'll leave the naming up to you) node. This will allow us to group and manage studies by investigator.
Use case query:
{
  allStudies {
    edges {
      node {
        id
        kfId
        name
        investigator
      }
    }
  }
}
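One possible shape for the relation backing that query; the model and field names here are placeholders, per the naming note above:

from django.db import models


class Investigator(models.Model):
    """Owner/user/investigator node that studies can be grouped under."""
    name = models.CharField(max_length=200)


class Study(models.Model):
    kf_id = models.CharField(max_length=11, primary_key=True)
    name = models.CharField(max_length=500)
    # Grouping and managing studies by investigator happens through this relation
    investigator = models.ForeignKey(
        Investigator, null=True, related_name="studies", on_delete=models.SET_NULL
    )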
The entrypoint.sh should sync studies with the dataservice every time a container is run.
Update documentation with: DATASERVICE data as PRELOAD_DATA, and fix minor typos.
The GET /download/study/<study_id>/file/<file_id> endpoint should allow a file download to occur if the user belongs to the study_id group or has an ADMIN role. It should return 403 otherwise.

Users will have to specify their token in the Authorization header as usual during the GET request to verify their identity. This means that blind sharing of the file urls will not be possible; sharing will instead have to occur through some interface that handles sending of the token.
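The check inside the download view could look roughly like this; request.user carrying groups and roles populated by an auth middleware is an assumption:

from django.http import HttpResponseForbidden


def check_download_auth(request, study_id):
    """Return a 403 response unless the user is in the study's group or is an admin."""
    user = request.user
    if study_id in getattr(user, "groups", []):
        return None
    if "ADMIN" in getattr(user, "roles", []):
        return None
    return HttpResponseForbidden(f"Not allowed to download files from {study_id}")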
Right now you can only fetch a single study via its node id. Ideally we would like to be able to get a study via its kfId, as that is what would be used in the url for the single study view, something like /study/<kfid>/files.
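A graphene-django style sketch of such a query; StudyNode, the module paths, and the query name studyByKfId are all assumptions:

import graphene

from creator.studies.models import Study  # assumed module path
from creator.studies.schema import StudyNode  # assumed node type


class Query(graphene.ObjectType):
    study_by_kf_id = graphene.Field(StudyNode, kf_id=graphene.String(required=True))

    def resolve_study_by_kf_id(self, info, kf_id):
        # Look the study up by its kfId rather than the relay node id
        return Study.objects.filter(kf_id=kf_id).first()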
Add an on_create hook on the study model to send a request to the dataservice to create a new study. This should trigger both a new object in the dataservice and a new bucket for the study.
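The hook could be a post_save signal along these lines; the DATASERVICE_URL setting, module path, and request payload are assumptions, and the bucket is presumed to be created downstream of this call:

import requests
from django.conf import settings
from django.db.models.signals import post_save
from django.dispatch import receiver

from creator.studies.models import Study  # assumed module path


@receiver(post_save, sender=Study)
def create_study_in_dataservice(sender, instance, created, **kwargs):
    """When a study is first saved locally, create the matching study in the dataservice."""
    if not created:
        return
    resp = requests.post(
        f"{settings.DATASERVICE_URL}/studies",
        json={"name": instance.name, "external_id": instance.external_id},
        timeout=30,
    )
    resp.raise_for_status()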
We should switch to using postgres under the hood sooner rather than later to avoid any possibility of adding models that are not compatible between the two.

Below are the fields on the dataservice's study model. These should also exist on the study model in the study creator:
{
  "attribution": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001168.v1.p1",
  "created_at": "2018-05-22T21:12:42.999818+00:00",
  "data_access_authority": "dbGaP",
  "external_id": "phs001168",
  "kf_id": "SD_9PYZAHHE",
  "modified_at": "2018-05-22T21:12:42.999823+00:00",
  "name": "Genomic Studies of Orofacial Cleft Birth Defects",
  "release_status": "Pending",
  "short_name": "Orofacial Cleft: European Ancestry",
  "version": "v1.p1",
  "visible": true
}
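Rough Django translations of those fields onto the study creator's model; the field lengths and defaults below are guesses, not taken from the dataservice schema:

from django.db import models


class Study(models.Model):
    kf_id = models.CharField(max_length=11, primary_key=True)
    name = models.CharField(max_length=500)
    short_name = models.CharField(max_length=500, blank=True)
    external_id = models.CharField(max_length=200)
    attribution = models.CharField(max_length=500, blank=True)
    data_access_authority = models.CharField(max_length=200, blank=True)
    version = models.CharField(max_length=50, blank=True)
    release_status = models.CharField(max_length=50, blank=True)
    visible = models.BooleanField(default=True)
    created_at = models.DateTimeField(auto_now_add=True)
    modified_at = models.DateTimeField(auto_now=True)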
It seems anyone can download using urls returned in a file's downloadUrl field. This should only be allowed for requests containing a valid JWT of a user that belongs to the study that the file is part of.
When an S3 error occurs, the File object is still created although the Object is not, and boto's error message for the failed upload is returned to the client. This should instead return a standard 'problem uploading' error message and not create a new object.
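A sketch of the intended behavior inside the upload mutation: write to storage first, return a generic message on failure, and only create the File row on success. The model fields and the exact storage call are assumptions:

from botocore.exceptions import ClientError
from django.core.files.storage import default_storage
from graphql import GraphQLError

from creator.files.models import File  # assumed module path


def create_file(upload, study):
    """Create a File record only after the S3 upload has succeeded."""
    try:
        key = default_storage.save(f"source/uploads/{upload.name}", upload)
    except ClientError:
        # Hide boto's raw error and leave no dangling File behind
        raise GraphQLError("Problem uploading file")
    return File.objects.create(study=study, name=upload.name, key=key)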
We should add the django-cors-headers module to allow preflight requests and add CORS headers to responses.
For now, it's probably ok to allow all origins and we can limit it further once it's deployed.
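The settings changes for django-cors-headers would look roughly like this (allow-all for now, per the note above):

INSTALLED_APPS += ["corsheaders"]

# CorsMiddleware needs to run before anything that can generate a response
MIDDLEWARE = ["corsheaders.middleware.CorsMiddleware"] + MIDDLEWARE

CORS_ORIGIN_ALLOW_ALL = True  # tighten to an explicit origin list once deployed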
A mutation to modify file descriptor fields needs to be added so that file properties may be modified after they have been uploaded.
A user should only be able to upload a file if the file is being uploaded to a batch in a study that they belong to.
Our deployments are dependent on loading variables from the environment. Using django-dotenv will make this easier and allow us to configure the application from various .env files and override them directly with variables in the environment.
We may wish to only do this or #24 as they may address the same issue.
The settings file should be split out into development, testing, and production to allow different configuration based on environment.
Users should be able to pass their ego JWT in the Authorization header as a Bearer token. When a request with a token comes in, it should be validated against ego's public key to ensure the token was issued by ego. The user's role and groups should be populated on the user's context in the Django request in an authentication middleware for future authorization.
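A sketch of such a middleware; validate_ego_token is the hypothetical helper from the public-key issue above, and the claim layout (context.user.roles/groups) is an assumption about ego's token format:

from creator.authentication import validate_ego_token  # hypothetical helper, see above


class EgoUser:
    """Lightweight user object carrying what authorization checks need."""

    def __init__(self, claims):
        context = claims.get("context", {}).get("user", {})
        self.roles = context.get("roles", [])
        self.groups = context.get("groups", [])


class EgoAuthenticationMiddleware:
    """Validate the Bearer token and populate role/groups on the request's user."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        header = request.META.get("HTTP_AUTHORIZATION", "")
        if header.startswith("Bearer "):
            claims = validate_ego_token(header.split(" ", 1)[1])
            if claims:
                request.user = EgoUser(claims)
        return self.get_response(request)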
Files uploaded through GraphQL in #9 need to be uploaded to the proper S3 study bucket. We should use the django-storages module to support S3 uploads for this.

Files uploaded to the API should be placed in:

s3://kf-study-us-east-1-{env}-sd-XXXXXXXX/source/uploads
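With django-storages, the storage backend can be pointed at the right bucket and prefix per study; the bucket-name formatting below is only a guess based on the path above:

from storages.backends.s3boto3 import S3Boto3Storage


def study_storage(kf_id, env="dev"):
    """Return a storage backend aimed at a study's upload prefix."""
    bucket = f"kf-study-us-east-1-{env}-{kf_id.lower().replace('_', '-')}"
    return S3Boto3Storage(bucket_name=bucket, location="source/uploads")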
@baileyckelly commented on Tue Sep 18 2018
Notes from Sprint Planning:
More clarification is needed about which expected values are required and which are optional, and whether this is a dynamic set of values.
File types should expose a downloadUrl field in their schema that points to the url where they may be downloaded.

Need to check the user's permission to view files to sort out which files to return in the allFiles query. The allFiles query should return only files that are in a study whose group the user belongs to.
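A sketch of that filtering in the resolver, assuming the auth middleware puts the user's study groups and roles on the request's user; FileNode and the module paths are assumptions:

import graphene
from graphene_django.fields import DjangoConnectionField

from creator.files.models import File  # assumed module path
from creator.files.schema import FileNode  # assumed node type


class Query(graphene.ObjectType):
    all_files = DjangoConnectionField(FileNode)

    def resolve_all_files(self, info, **kwargs):
        user = info.context.user
        if "ADMIN" in getattr(user, "roles", []):
            return File.objects.all()
        # Only files that belong to a study the user is in the group of
        return File.objects.filter(study__kf_id__in=getattr(user, "groups", []))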
The download and download-latest views can be combined into one view to avoid writing the same download logic twice.
Write a manage command to sync the study creator's study objects with those that exist in the dataservice. This should be run with ./manage.py syncdataservice during the entry point, or on demand. This command will scrape the /studies endpoint of the dataservice and insert any studies that exist in the dataservice but not in the study creator.
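Skeleton of that command (it would live at creator/management/commands/syncdataservice.py); the DATASERVICE_URL setting and the pagination handling are assumptions about the dataservice's response format:

import requests
from django.conf import settings
from django.core.management.base import BaseCommand

from creator.studies.models import Study  # assumed module path


class Command(BaseCommand):
    help = "Sync studies from the dataservice into the study creator"

    def handle(self, *args, **options):
        url = f"{settings.DATASERVICE_URL}/studies"
        while url:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            body = resp.json()
            for study in body.get("results", []):
                # Insert studies that exist in the dataservice but not here
                Study.objects.update_or_create(
                    kf_id=study["kf_id"],
                    defaults={"name": study.get("name") or ""},
                )
            # Follow the dataservice's paginated links, if any
            next_link = body.get("_links", {}).get("next")
            url = settings.DATASERVICE_URL + next_link if next_link else None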
Need to add a service type 1 Jenkinsfile.
Upload instructions are outdated now. Modify to:
Curl example
^^^^^^^^^^^^
.. code-block:: bash

   curl localhost:8080/graphql \
       -F operations='{ "query": "mutation ($file: Upload!, $studyId: String!) { createFile(file: $file, studyId: $studyId) { success } }", "variables": { "file": null, "studyId": <study kf id> } }' \
       -F map='{ "0": ["variables.file"] }' \
       -F 0=@<your filepath>
The Object schema should return a downloadUrl field for that specific version of the file, similar to the downloadUrl on the File object.

Need to check the user's permission to view files to sort out which objects to return in the allVersions query. The allVersions query should return only objects that are in a study whose group the user belongs to.
The Batch concept still needs to be better defined, but it's leaning towards being more of a selection of entities created from a study's files. This makes files being directly related to a study more natural, so the Files entity should point directly to a Study rather than a Batch.
Currently, only the latest file version may be downloaded from the /download endpoint. We should support specifying what version to download through an endpoint:

GET /download/study/<study_id>/file/<file_id>/version/<version_id>

This will have the same authorization mechanism as the download-by-file endpoint.
Add a state field to Version that is an enumeration of one of the below:
Pending Review
Changes Needed
Approved
Processed
The state flow will look something like the following, although it won't be enforced:

               +> Changes Needed       +> Changes Needed
               |                       |
               |                       |
Pending Review +------------> Approved +-------------> Processed
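As Django choices this could look like the following; the three-letter codes are placeholders, not agreed-upon values:

from django.db import models


class Version(models.Model):
    STATES = (
        ("PEN", "Pending Review"),
        ("CHN", "Changes Needed"),
        ("APP", "Approved"),
        ("PRC", "Processed"),
    )
    state = models.CharField(max_length=3, choices=STATES, default="PEN")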
The createFile mutation should return the new file in its schema.
Need a mutation for batches so that users may create new batches through the API.
The ('SAM', 'Sample Manifest') enum on the file_type field in the File model should be changed to ('SEQ', 'Sequencing Manifest').
We should rename FileEssence to simply File.
/graphql does not need CSRF checks on it. We can exclude the middleware on the route in our urls.py.
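The usual pattern in urls.py (shown here with the stock graphene-django view; a file-upload view would be wrapped the same way):

from django.urls import path
from django.views.decorators.csrf import csrf_exempt
from graphene_django.views import GraphQLView

urlpatterns = [
    # Exempt only this route from CSRF checks rather than disabling the middleware globally
    path("graphql", csrf_exempt(GraphQLView.as_view(graphiql=True))),
]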
Instead of using the base Dockerfile and installing test dependencies in the entrypoint.sh, it may be better to create a second Dockerfile.dev that includes these dependencies already installed and runs a different entrypoint to either pre-populate with mock data or from the data service.
CircleCI should be set up to run tests for PR status checks.
I wanted to use my KF dataservice docker container with the KF study creator API container so that I could preload studies from my dataservice.
Not sure if this is the best approach, but I added the kf-data-stack user-defined network to both of the docker compose files for the dataservice and the study creator. Then I just set PRELOAD_DATA to http://<dataservice web server container name>.

Maybe also add something to the Sphinx docs about using your dockerized dataservice.
Create a GraphQL API with a first-draft data model and mock data for evaluating use cases.
The primary key, id, on the File model is currently an incremental integer. This should be changed to a unique uuid type to cause less confusion when evaluating a url. The id field should also be renamed to uuid so that it will not conflict with the graphql id field.
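What the change could look like on the File model (the same pattern would apply to Object); other fields are omitted here:

import uuid

from django.db import models


class File(models.Model):
    # Non-sequential primary key, named uuid so it doesn't clash with the graphql id
    uuid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)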
The primary key, id, on the Object model is currently an incremental integer. This should be changed to a unique uuid type to cause less confusion when evaluating a url. The id field should also be renamed to uuid so that it will not conflict with the graphql id field.
Save the shared study and file upload setup process to a fixture for test_download and test_query.
We need to allow uploading of a file through the API. This is possible through the GraphQL multipart request spec, and there is an existing library for Django, graphene-file-upload.
Need to have some restrictions on the size of files being uploaded to the study creator.