
kf-api-study-creator's People

Contributors

alubneuski, bdolly, blackdenc, chris-s-friedman, dankolbman, dependabot[bot], fiendish, gsantia, jgnieuwhof, shanewilson, snyk-bot, xuthebunny, znatty22

kf-api-study-creator's Issues

Add file download endpoint

Files uploaded as in #9 and stored in S3 as in #10 need to be downloadable by users.
This may need to be a separate endpoint such as /data, since GraphQL does not support file transfers.

The endpoint will be: GET /download/study/[studyId]/file/[fileId]?token=[token]
where studyId refers to the study that the file belongs to,
fileId is the internal file identifier,
and token is an Ego JWT whose user must be in the studyId group to download.
This will download the latest version of the file.

GET /download/study/[studyId]/file/[fileId]/version/[version]?token=[token]
This will download the given version of the file.
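A hypothetical helper showing how the two routes above compose (the `/version` segment and the `token` query parameter are both optional; the example identifiers in the usage note are made up):

```python
def download_url(study_id, file_id, version_id=None, token=None):
    """Build a download URL for a file, optionally pinned to a version.

    Mirrors GET /download/study/[studyId]/file/[fileId](/version/[versionId])?token=[token]
    """
    url = f"/download/study/{study_id}/file/{file_id}"
    if version_id is not None:
        url += f"/version/{version_id}"
    if token is not None:
        url += f"?token={token}"
    return url
```

For example, `download_url("SD_00000000", "SF_00000000")` yields the latest-version route, while passing `version_id` yields the pinned-version route.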

Split settings module

The settings.py module should be split up into different files so that we may configure the application depending on what deployment environment we're working in.

creator/settings.py should become:

  • creator/settings/dev.py
  • creator/settings/test.py
  • creator/settings/prd.py
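A minimal sketch of what `creator/settings/dev.py` could look like, assuming shared defaults get factored into a `creator/settings/base.py` (the `base` module and the specific flag values here are assumptions, not decisions):

```python
# creator/settings/dev.py -- development overrides (sketch)
from .base import *  # noqa: F401,F403  -- assumed shared base settings module

DEBUG = True
ALLOWED_HOSTS = ["*"]

# Local sqlite for development until the move to postgres lands
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "dev.sqlite3",
    }
}
```

`test.py` and `prd.py` would follow the same import-and-override pattern, with `DJANGO_SETTINGS_MODULE` selecting between them per environment.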

Add Investigator/StudyContact Node

Each study should connect to some sort of owner/user/investigator node (I'll leave the naming up to you). This will allow us to group and manage studies by investigator.

Use case query:

{
    allStudies {
      edges {
        node {
          id
          kfId
          name
          investigator
        }
      }
    }
  }
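One possible shape for the node, assuming a Django model with a reverse relation from Study (all field names here are suggestions, not decisions):

```python
import uuid

from django.db import models


class Investigator(models.Model):
    """Owner/contact for one or more studies (final naming TBD)."""
    name = models.CharField(max_length=200)
    institution = models.CharField(max_length=200, blank=True)


class Study(models.Model):
    # Nullable so existing studies migrate cleanly before backfill
    investigator = models.ForeignKey(
        Investigator,
        null=True,
        related_name="studies",
        on_delete=models.SET_NULL,
    )
```

Exposing `investigator` on the graphene Study node would then satisfy the query above.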

Add permission check to file downloads

The GET /download/study/<study_id>/file/<file_id> endpoint should allow a file download to occur if the user belongs to the study_id group, or has an ADMIN role. Should return 403 otherwise.

Users will have to specify their token in the Authorization header as usual during the get request to verify their identity. This means that blind sharing of the file urls will not be possible, but will instead have to occur through some interface that handles sending of the token.
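The authorization rule reduces to a small predicate. A sketch, assuming `roles` and `groups` are read from the already-validated JWT (names are illustrative):

```python
def may_download(study_id, roles, groups):
    """Allow a download if the user is an ADMIN or belongs to the study's group."""
    return "ADMIN" in roles or study_id in groups
```

The download view would return 403 whenever this predicate is False.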

Fetch study node by kfId

Right now you can only fetch a single study via its node id. Ideally we would like to be able to get a study via its kfId, as that is what would be used in the URL for the single study view, something like /study/<kfid>/files.
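In graphene terms this would be an extra query argument; the lookup itself is just a filter on kfId. A stdlib-only sketch of the resolver logic, where the `studies` list stands in for the ORM queryset:

```python
def resolve_study_by_kf_id(studies, kf_id):
    """Return the study whose kfId matches, or None if absent."""
    return next((s for s in studies if s["kfId"] == kf_id), None)
```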

Add mutation to create study in dataservice

Add an on_create hook to the study model that sends a request to the dataservice to create a new study.

This should trigger both a new object in the dataservice and a new bucket for the study.
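A sketch of the hook's core, with the HTTP call injected so it can be exercised without a live dataservice (the payload fields, the `/studies` path, and the base URL are assumptions based on the dataservice's study resource):

```python
def push_study_to_dataservice(study, post, base_url="http://dataservice"):
    """Create `study` in the dataservice; `post` is e.g. requests.post."""
    payload = {
        "name": study["name"],
        "external_id": study.get("external_id"),
    }
    return post(f"{base_url}/studies", json=payload)
```

In Django this would typically hang off a post_save signal (or the mutation itself), with bucket creation triggered alongside.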

Move to postgres

We should switch to using postgres under the hood sooner rather than later, to avoid any possibility of adding models that are not compatible between the two databases.

Add fields from dataservice to study

The below are the fields on the dataservice's study model. These should also exist on the study model in the study creator:

{
  "attribution": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001168.v1.p1",
  "created_at": "2018-05-22T21:12:42.999818+00:00",
  "data_access_authority": "dbGaP",
  "external_id": "phs001168",
  "kf_id": "SD_9PYZAHHE",
  "modified_at": "2018-05-22T21:12:42.999823+00:00",
  "name": "Genomic Studies of Orofacial Cleft Birth Defects",
  "release_status": "Pending",
  "short_name": "Orofacial Cleft: European Ancestry",
  "version": "v1.p1",
  "visible": true
}

Anyone can download from downloadUrl

It seems anyone can download using the urls returned in a file's downloadUrl field. This should only be allowed for requests containing a valid JWT of a user that belongs to the study the file is part of.

Files are created even when upload fails

When an S3 error occurs, the File record is still created even though the object is not. The boto error message for the failed upload is also returned to the user. We should instead return a standard 'problem uploading' error message and not create a new File.

Add corsheaders package

We should add the django-cors-headers module to allow preflight requests and add CORS headers to responses.

For now, it's probably ok to allow all origins and we can limit it further once it's deployed.

Add mutation to modify files

A mutation to modify file descriptor fields needs to be added so that file properties may be modified after they have been uploaded.

Add django-dotenv package

Our deployments are dependent on loading variables from the environment. Using django-dotenv will make this easier and allow us to configure the application from various .env files and override them directly with variables in the environment.

We may wish to do only this or only #24, as they may address the same issue.

Split settings file

The settings file should be split out into development, testing and production to allow different configuration based on environment.

Implement authentication for users with ego

Users should be able to pass their ego JWT in the Authorization header as a Bearer token.
When a request with a token comes in, it should be validated against ego's public key to ensure the token was issued by ego.

The user's role and groups should be populated on the user's context in the Django request in an authentication middleware for future authorization.
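Signature validation itself should verify the token against ego's public key (e.g. PyJWT's `jwt.decode` with the RS256 key); the claim-extraction half of the middleware can be sketched with the stdlib alone. Note: this decodes WITHOUT verifying and exists only to show where roles and groups come from (the `context.user` claim layout is an assumption); real middleware must never skip verification:

```python
import base64
import json


def jwt_claims(token):
    """Decode a JWT's payload segment (NO signature check -- sketch only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def user_context(token):
    """Pull role/groups out of an ego token's claims for the request context."""
    user = jwt_claims(token).get("context", {}).get("user", {})
    return {"roles": user.get("roles", []), "groups": user.get("groups", [])}
```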

Add s3 storage backend for uploads

Files uploaded through GraphQL in #9 need to be uploaded to the proper S3 study bucket.
We should use the django-storages module to support S3 uploads for this.

Files uploaded to the api should be placed in:

s3://kf-study-us-east-1-{env}-sd-XXXXXXXX/source/uploads
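Assuming the lowercase bucket suffix is derived from the study's kf_id (an assumption; e.g. SD_XXXXXXXX becoming sd-xxxxxxxx), the destination prefix could be computed as:

```python
def upload_prefix(env, kf_id):
    """Return the S3 prefix that uploads for a study should land under."""
    suffix = kf_id.lower().replace("_", "-")  # SD_ABC123 -> sd-abc123 (assumed convention)
    return f"s3://kf-study-us-east-1-{env}-{suffix}/source/uploads"
```

django-storages would then be pointed at the bucket, with `source/uploads` as the key prefix.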

Write up approach/operations spec for how to handle study files

@baileyckelly commented on Tue Sep 18 2018

Notes from Sprint Planning:

  • How are we tracking changes to files?
  • Which files do we distribute?
  • How do we communicate the changes we made?
    • Long term: There is an upload tool.
  • Original files, or original files that have been updated because we've been told things are wrong.
    • Upload everything - only mark the ones we used as visible = true?
    • No Duplicates uploaded.
  • Folder Names:
    • Have them separated on S3 Original vs Modified.
  • Review some study specific use cases
  • Shipping Manifests are included in this.
  • Should we have a Working directory for derived files?
  • Need to get them into the dataservice and in QA so OICR can start working.
  • How do we handle study files when we have multiple years in one study file?
  • What are the ACLs on these files?
    • From Allison: The other thing is consent codes, iirc once you have any dbgap access you have access to the DS files, so that would mean we just put all of them on the study files - but need to double check

Support expected value counts

More clarification is needed about what expected values are required and optional and whether this is a dynamic set of values.

Add permission check to allFiles query

Need to check the user's permission to view files to sort out which files to return in the allFiles query.

The allFiles query should return only files that are in a study that a user is in the group of.
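The filtering rule is a one-liner over the user's groups. A sketch (in practice this would be a queryset filter in the graphene resolver; the admin-sees-all branch is an assumption carried over from the download-permission issue):

```python
def visible_files(files, user_groups, is_admin=False):
    """Files visible to a user: everything for admins, else only own-study files."""
    if is_admin:
        return list(files)
    return [f for f in files if f["study_id"] in user_groups]
```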

Construct method to sync studies with dataservice

Write a management command to sync the study creator's study objects with those that exist in the dataservice.

This should be run as ./manage.py syncdataservice during the entry point, or on demand.

This command will scrape the /studies endpoint of the dataservice and insert any studies that exist in the dataservice, but not in the study creator.
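The core of the command is a scrape-and-diff. A sketch of the diff step (pagination of the /studies endpoint and the ORM inserts are omitted):

```python
def studies_missing_locally(dataservice_studies, local_kf_ids):
    """Studies present in the dataservice but absent from the study creator."""
    local = set(local_kf_ids)
    return [s for s in dataservice_studies if s["kf_id"] not in local]
```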

Update Upload page on sphinx site

Upload instructions are outdated now. Modify to:

Curl example
^^^^^^^^^^^^
.. code-block:: bash

    curl localhost:8080/graphql \
      -F operations='{ "query": "mutation ($file: Upload!, $studyId: String!) { createFile(file: $file, studyId: $studyId) { success } }", "variables": { "file": null, "studyId": <study kf id> } }' \
      -F map='{ "0": ["variables.file"] }' \
      -F 0=@<your filepath>

Add permission check for allVersions query

Need to check the user's permission to view files to sort out which objects to return in the allVersions query.

The allVersions query should return only objects that are in a study that a user is in the group of.

Change files to relate to studies

The Batch concept still needs to be better defined, but it's leaning towards being more of a selection of entities created from a study's files.

This makes files being directly related to a study more natural, so the Files entity should point directly to a Study rather than a Batch.

Add versions download endpoint

Currently, only the latest file version may be downloaded from the /download endpoint. We should support specifying which version to download through an endpoint:
GET /download/study/<study_id>/file/<file_id>/version/<version_id>.
This will have the same authorization mechanism as the /download by file endpoint.

Add state field to file

Add a state field to the Version model that is an enumeration with one of the below values:

  • Pending Review
  • Changes Needed
  • Approved
  • Processed

The state flow will look something like the following, although it won't be enforced:

                +--> Changes Needed       +--> Changes Needed
                |                         |
Pending Review -+-----------> Approved ---+------------> Processed
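Even though the flow won't be enforced, the suggested transitions can still be documented in code. A sketch using only the arrows drawn above (state names condensed to identifiers; arrows out of Changes Needed are deliberately left undefined since the diagram doesn't show any):

```python
# Suggested (unenforced) transitions for the Version state field
SUGGESTED_TRANSITIONS = {
    "PENDING_REVIEW": {"CHANGES_NEEDED", "APPROVED"},
    "APPROVED": {"CHANGES_NEEDED", "PROCESSED"},
    "CHANGES_NEEDED": set(),
    "PROCESSED": set(),
}


def is_suggested(src, dst):
    """True if the diagram draws an arrow from `src` to `dst`."""
    return dst in SUGGESTED_TRANSITIONS.get(src, set())
```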

Cleanup docker entrypoint for dev

Instead of using the base Dockerfile and installing test dependencies in the entrypoint.sh, it may be better to create a second Dockerfile.dev that includes these dependencies already installed and runs a different entrypoint to either pre-populate with mock data or from the data service.

Set up CircleCI

CircleCI should be set up to run tests for PR status checks.

Suggestion - Make it easier to dev w dockerized dataservice

I wanted to use my KF dataservice docker container with the KF study creator API container so that I could preload studies from my dataservice.

Not sure if this is the best approach, but I added the kf-data-stack user-defined network to both of the docker-compose files for the dataservice and the study creator. Then I just set PRELOAD_DATA to http://<dataservice web server container name>

Maybe also add something to the sphinx docs about using your dockerized dataservice

Change file primary key to uuid

The primary key, id, on the File model is currently an incremental integer.
This should be changed to a uuid type to cause less confusion when evaluating a url.
The id field should also be renamed to uuid so that it will not conflict with the graphql id field.

Change object primary key to uuid

The primary key, id, on the Object model is currently an incremental integer.
This should be changed to a uuid type to cause less confusion when evaluating a url.
The id field should also be renamed to uuid so that it will not conflict with the graphql id field.
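For both models this is the stock Django pattern, shown here for File (a schema sketch, not a migration plan):

```python
import uuid

from django.db import models


class File(models.Model):
    # Renamed from `id` so it won't shadow graphene's global `id` field
    uuid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
```

The Object model would get the identical field; existing integer keys would need a data migration to backfill uuids.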
