kf-api-dataservice's People

Contributors

allisonheath, alubneuski, aw3334, baileyckelly, blackdenc, calkinsh, chris-s-friedman, christina-j-diaz, dankolbman, dependabot[bot], devbyaccident, fiendish, liberaliscomputing, parimalak, r3m0chop, youngnm, znatty22

Forkers

connorbarnhill

kf-api-dataservice's Issues

Add Investigator to the data model

Per AOC feedback of the portal mockups:
"When they do a query can they?
They want to know who the investigators are
They also want to do a query based on an investigator"

This will require us to add investigator to the data model.

Create Initial 'Demographic' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with Person Demographics.

Tasks
[ ] Create Model
[ ] Resource
[ ] API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Rename person to participant

After surveying a few people, including the AOC, a better term for people whose data are in Kids First is participant. This aligns with what people in clinical trials and population studies are called.

This will also help distinguish between people who are users versus people whose data is in the Kids First dataset (although these two could overlap in the future).

Create initial Study model

The externally existing concept is one of a "dbGaP Study". In dbGaP two major use cases for the study are:

  1. Access control, once you're approved for access you have access to all data in the study
  2. Consent groups for DUL and reviewing of access requests

The second use case is typically not discussed outside of dbGaP but is an important component for Kids First. In reality, consent is tied to a specific study protocol as determined by an IRB. So someone can participate in multiple different studies, resulting in physical samples, genomic data, clinical data, etc. that are covered under the consent of one or more of the studies they participated in.

Separately in portal design discussions we had developed the concept of Dataset being
"a set of data created by a particular entity tied to access, data use limitations, IRB/institutions (specific X01 cohorts (dbGaP))".

So we need to resolve the above into what we're going to track in our data model. The immediate use cases I see are:

  • Tracking what dbGaP study each file (or perhaps entity as well) belongs to for authorized access
  • Tracking a participant's consent tied to a specific IRB study protocol and which of the entities related to that participant are covered under that consent, especially if a participant is under more than one dbGaP study

There are some foreseeable use cases if we want to bring in datasets that aren't managed under the purview of dbGaP. This would include designating "data ownership" and the ability for that data owner to grant access (this is really how I think we should think about dbGaP; it just happens in an automated way from our point of view). For example, including consortium-based or foundation-based datasets. This is longer term, but perhaps important to consider so we don't get backed into only supporting the dbGaP use cases.

Create Initial 'Sample' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Sample.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create Initial Diagnosis Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Diagnosis.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create initial SequencingExperiment Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Experiment.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Switch to use Postgres as primary backend

Although SQLite reduces dependency requirements for development, we'll need to move to Postgres for our deployments. We should at least support this in the ProductionConfig SQLALCHEMY_DATABASE_URI.
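A minimal sketch of how the config classes could split SQLite for development from Postgres for deployments. The PG_* environment variable names, the defaults, and the DevelopmentConfig class are assumptions for illustration; only ProductionConfig and SQLALCHEMY_DATABASE_URI come from this issue.

```python
import os
from urllib.parse import quote_plus


def postgres_uri(env=None):
    """Build a Postgres URI from environment-style settings.

    The PG_* names and defaults are hypothetical, not the project's own.
    """
    env = os.environ if env is None else env
    return "postgresql://{user}:{password}@{host}:{port}/{name}".format(
        # Quote credentials so characters like '@' do not break the URI.
        user=quote_plus(env.get("PG_USER", "postgres")),
        password=quote_plus(env.get("PG_PASS", "")),
        host=env.get("PG_HOST", "localhost"),
        port=env.get("PG_PORT", "5432"),
        name=env.get("PG_NAME", "dataservice"),
    )


class Config:
    """Settings shared by every environment."""
    SQLALCHEMY_TRACK_MODIFICATIONS = False


class DevelopmentConfig(Config):
    # SQLite keeps local development free of external services.
    SQLALCHEMY_DATABASE_URI = "sqlite:///dev.sqlite"


class ProductionConfig(Config):
    # Deployments read their connection details from the environment.
    SQLALCHEMY_DATABASE_URI = postgres_uri()
```

Building the URI in a function keeps the password out of the source tree and makes the string easy to test without a running database.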

Add field masks to endpoints

Users should be able to select which fields they want the api to return by specifying them in the request.

Eg:
GET /participants?fields=kf_id,demographic,diagnoses.pathological_diagnosis

{
  "results": [
    {
      "kf_id": "AABB1123",
      "demographic": {
        "race": "white",
        "age": 324,
        "gender": "male"
      },
      "diagnoses": {
        "pathological_diagnosis": "medulloblastoma"
      }
    }
  ]
}

Here, the user specifies they want the kf_id of the participant, the participant's demographic with default fields, and the pathological_diagnosis field from the participant's diagnosis.
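One possible way to apply such a mask, sketched in plain Python on an already-serialized dict. The function names parse_fields and apply_mask are hypothetical, and the sketch assumes a field named without a sub-path is returned whole.

```python
def parse_fields(fields):
    """Turn 'a,b.c' into a nested dict: {'a': {}, 'b': {'c': {}}}."""
    tree = {}
    for path in filter(None, (p.strip() for p in fields.split(","))):
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def apply_mask(data, tree):
    """Keep only the keys named in tree; an empty subtree keeps everything."""
    if not tree:
        return data
    if isinstance(data, list):
        # Apply the same mask to every item of a nested list.
        return [apply_mask(item, tree) for item in data]
    if not isinstance(data, dict):
        return data
    return {key: apply_mask(data[key], sub)
            for key, sub in tree.items() if key in data}


mask = parse_fields("kf_id,demographic,diagnoses.pathological_diagnosis")
```

Fields that are requested but absent from the record are silently skipped here; whether the API should instead reject unknown field names is left open.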

Fix Participant put and delete methods in resource class

Problem:
When trying to put/delete a participant that does not exist (kf_id does not exist in db), an exception is thrown because these methods are not returning when they should be.

Solution:
Add a return statement before the self._not_found() call in both the put and delete methods in:
dataservice.api.participant resources.py

Validate:
Write tests verifying that put and delete for a kf_id that does not exist return the correct response (content and status code 404).

Example code change:

@participant.expect(participant_fields)
def put(self, kf_id):
    """
    Update an existing participant
    """
    body = request.json
    participant = models.participant.query.filter_by(kf_id=kf_id).one_or_none()
    if not participant:
        return self._not_found(kf_id)

    participant.external_id = body.get('external_id')
    db.session.commit()

    return kf_response(participant, 201, 'participant updated')
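To make the failure mode concrete, here is a framework-free sketch of the same bug. The class, the in-memory store, and the message text are stand-ins invented for illustration; only the missing return mirrors the issue.

```python
class ParticipantDemo:
    """Stand-in for the resource class, backed by a dict instead of the db."""

    def __init__(self):
        self.store = {"P1": {"external_id": "a"}}

    def _not_found(self, kf_id):
        return {"message": "participant '{}' not found".format(kf_id)}, 404

    def put_buggy(self, kf_id, body):
        participant = self.store.get(kf_id)
        if not participant:
            self._not_found(kf_id)  # result discarded, execution continues
        # participant is None here for an unknown kf_id, so this raises.
        participant["external_id"] = body.get("external_id")
        return participant, 201

    def put_fixed(self, kf_id, body):
        participant = self.store.get(kf_id)
        if not participant:
            return self._not_found(kf_id)  # stop here with a 404
        participant["external_id"] = body.get("external_id")
        return participant, 201
```

A test for the real resource would follow the same shape: request an unknown kf_id and assert on both the 404 status and the response content.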

Deploy multiple branches into dev

It would be nice to have the latest commit from every active branch deployed inside the dev environment, as the deployment process is somewhat slow and different feature branches will likely overwrite one another.

Eg:
If I want to view the API with the new add-entities branch, I should be able to navigate to
add-entities.kf-api-dataservice-dev.kids-first.io and view that api deployment.
Alternatively, I may want to go to the hash directly, such as:
abc1234.kf-api-dataservice-dev.kids-first.io

Issues to consider:
Any features that make changes to the data model and therefore the database will break other branches. Perhaps we need to create a new database for each branch?
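A small sketch of how a deploy step might derive a per-branch hostname and, if we go the database-per-branch route, a database name. The function names and the dataservice_ prefix are hypothetical; the base domain is the one from the examples above.

```python
import re

BASE_DOMAIN = "kf-api-dataservice-dev.kids-first.io"


def branch_hostname(branch, base=BASE_DOMAIN):
    """Map a branch name or commit hash to a DNS-safe subdomain."""
    label = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    label = label[:63].strip("-")  # a DNS label is at most 63 characters
    if not label:
        raise ValueError("branch name has no usable characters")
    return "{}.{}".format(label, base)


def branch_database(branch, prefix="dataservice_"):
    """Derive a database name for a branch (hypothetical naming scheme)."""
    suffix = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    return (prefix + suffix)[:63]  # Postgres truncates longer identifiers
```

For example, branch_hostname("add-entities") gives add-entities.kf-api-dataservice-dev.kids-first.io, matching the URL in the issue.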

Add Jenkinsfile for CI/CD

We need to define a Jenkinsfile that will outline the pipeline for testing, building, and deployment in Jenkins. It should look something like the example @alubneuski laid out here

Re-write serialization layer in Marshmallow

Flask-RESTPlus gets in the way with regard to error handling and doesn't provide very powerful request parsing. Marshmallow is a popular (de)serialization library that supports inheriting schemas from SQLAlchemy models. There is also apispec, which will generate much of the swagger documentation using marshmallow schemas.
Replacing Flask-RESTPlus should be straightforward in terms of functionality and will give us much more power in parsing requests and responses as well as reduce the number of hacky work-arounds needed.

Sync updates to Gen3

When relevant updates come in, they should be synchronously updated to Gen3 to provide authn/authz for the files.

Create initial GenomicFile Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a File.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create MVP ERD

Note: This is due EOD Thursday 1/25/18

We need to create an ERD for OICR of the MVP entities

  • Participant
    • Rename Person to Participant
  • Sample
  • File
  • Demographic
  • Diagnosis
  • ~Participant Relationships
    • (Needs spec’d out)
  • ~Phenotype
    • Just HPO ID
    • Age at observation (Optional)
  • ~Dataset
    • Needs spec’d out
  • ~Outcome/Encounter/Event
    • Needs to be spec’d out

Create Dummy Data Generator Application

Since we will continually need to generate dummy data, we will create an app that goes with the data service to populate our data model/service with dummy data.

Make sure we populate data model with dummy data when we're done.

Create Initial Aliquot Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Aliquot.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Secret management with vault

To be able to connect to Postgres in our deployment environments, we will need to maintain a password to Postgres. We are using Vault, so will need to incorporate the retrieval of our password with the Vault cluster, probably using the hvac package.
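A rough sketch of what the retrieval might look like with hvac, assuming the password lives in a KV version 2 secrets engine. The secret path, key name, and mount point below are placeholders, not the real layout of our Vault cluster.

```python
def postgres_password(client, path="postgres/dataservice",
                      key="password", mount_point="secret"):
    """Read the Postgres password from Vault.

    client is assumed to be an authenticated hvac.Client, for example:
        import hvac
        client = hvac.Client(url=VAULT_ADDR, token=VAULT_TOKEN)
    The path, key and mount_point defaults are hypothetical.
    """
    response = client.secrets.kv.v2.read_secret_version(
        path=path, mount_point=mount_point)
    # KV v2 nests the secret payload under data -> data.
    return response["data"]["data"][key]
```

Taking the client as an argument keeps the authentication method (token, AWS IAM, etc.) out of this function and lets tests pass in a fake client.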

Create Initial 'Person' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Person.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Abstract common models to the base API

Many responses are formed inside common envelopes, for example, status messages and pagination.

These should be defined on the api level so that they may be shared among all resources.
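As a starting point for discussion, a shared envelope could be as small as the helper below. The _status key and the function names are assumptions, not an agreed format.

```python
def status_envelope(code, message):
    """Status block shared by every response (key names are hypothetical)."""
    return {"_status": {"code": code, "message": message}}


def make_response_body(results, code=200, message="success", **extra):
    """Wrap results in the common envelope and return (body, status code).

    extra carries resource-independent additions such as pagination info.
    """
    body = status_envelope(code, message)
    body.update(extra)
    body["results"] = results
    return body, code
```

Each resource would then call make_response_body instead of assembling its own dict, so a change to the envelope happens in one place.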

Create initial 'Demographic' Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Demographic.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create Dockerfile

A Dockerfile is needed to build a container with our API so that it may be shipped to our container registry during our deployment.

Acceptance Criteria:
Be deployable to our environment
Be usable by other groups (IE OICR) to set up a local dev environment.

Research Best Methods for Swagger Documentation

Switching away from RESTPlus cost us the automatic swagger documentation.
There are still a couple of tools that make documentation easier with Marshmallow:

apispec
flasgger

Time box research for Flask-APIspec to no more than 4 hours. If it is not straightforward, then we need to proceed with manual documentation for MVP, which would require a ticket for each resource (~2 pts each).

Implement optional routing parameters to consolidate resources

Most resources will require routes with optional parameters, for example:

POST /persons - to create a person
GET /persons - to get a list of persons
GET /persons/1 - to get person with id 1

This requires a resource persons with optional parameter id. Our Resource, Person, should be able to support these different parameters using a single Resource class definition.
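The idea can be sketched without any framework: one route pattern with an optional id segment, and one class whose handlers treat the id as optional. Everything here (PersonResource, dispatch, the in-memory store) is illustrative only; in Flask the equivalent is typically registering two URL rules that point at the same view.

```python
import re

# The id segment is optional, so one pattern covers /persons and /persons/1.
ROUTE = re.compile(r"^/persons(?:/(?P<person_id>\d+))?$")


class PersonResource:
    """Single resource class serving both the list and the item routes."""

    def __init__(self):
        self._persons = {}
        self._next_id = 1

    def get(self, person_id=None):
        if person_id is None:
            return list(self._persons.values()), 200
        person = self._persons.get(person_id)
        if person is None:
            return {"message": "person {} not found".format(person_id)}, 404
        return person, 200

    def post(self, body=None, person_id=None):
        if person_id is not None:
            return {"message": "POST is only allowed on /persons"}, 405
        person = dict(body or {}, id=self._next_id)
        self._persons[self._next_id] = person
        self._next_id += 1
        return person, 201


def dispatch(resource, method, path, body=None):
    """Route a request to the handler named after the HTTP method."""
    match = ROUTE.match(path)
    if match is None:
        return {"message": "no such route"}, 404
    handler = getattr(resource, method.lower(), None)
    if handler is None:
        return {"message": "method not allowed"}, 405
    raw_id = match.group("person_id")
    person_id = int(raw_id) if raw_id is not None else None
    if method.upper() == "GET":
        return handler(person_id=person_id)
    return handler(body=body, person_id=person_id)
```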

List Desired Enums

Participant Fields

  • Race
  • Gender
  • Ethnicity

Diagnosis Fields

  • diagnosis_category

Create initial Dataset Entity

User Story
As a Kids First developer, I would like basic CRUD functionality to work with a Dataset. Note - this needs to be spec'd out from a data model perspective.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

Create Initial Phenotype Model

User Story
As a Kids First developer, I would like basic CRUD functionality to work with Phenotypes. Note - this needs to be spec'd out from a data model perspective.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API

History for entities

For each entity type we'll want to capture a history of updates. Exact implementation TBD as we should lay out some of the key use cases first.

Some builtin concepts that might help with the TBD are trigger functionality to provide similar features to postgres' old time travel feature: https://www.postgresql.org/docs/current/static/contrib-spi.html or temporal tables extensions: https://github.com/arkhipov/temporal_tables

Should use care, as time travel was originally removed because of storage and performance: https://www.postgresql.org/docs/6.3/static/c0503.htm
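To illustrate the trigger approach, here is a sketch that uses SQLite from the standard library as a stand-in for Postgres. The table and column names are made up for the example, and a real Postgres version would use a trigger function rather than an inline trigger body.

```python
import sqlite3

SCHEMA = """
CREATE TABLE participant (
    kf_id TEXT PRIMARY KEY,
    external_id TEXT
);
CREATE TABLE participant_history (
    kf_id TEXT,
    external_id TEXT,
    replaced_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Copy the old row into the history table on every update.
CREATE TRIGGER participant_keep_history
AFTER UPDATE ON participant
BEGIN
    INSERT INTO participant_history (kf_id, external_id)
    VALUES (OLD.kf_id, OLD.external_id);
END;
"""


def history_demo():
    """Update one participant twice and return (current value, history)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO participant VALUES ('P1', 'first')")
    conn.execute("UPDATE participant SET external_id = 'second' WHERE kf_id = 'P1'")
    conn.execute("UPDATE participant SET external_id = 'third' WHERE kf_id = 'P1'")
    current = conn.execute(
        "SELECT external_id FROM participant WHERE kf_id = 'P1'").fetchone()[0]
    history = [row[0] for row in conn.execute(
        "SELECT external_id FROM participant_history ORDER BY rowid")]
    conn.close()
    return current, history
```

Note that the history table grows by one row per update, which is the storage concern mentioned above.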

Create FamilyRelationship Model

The vast majority of current Kids First data is trio based (proband/mother/father). However, some of the cohorts are more generally family based and potentially would not have the mother/father, but perhaps two participants that are siblings. So while a rare use case, it is important to support because it can impact analysis, especially for rare diseases where perhaps more unusual family relationships were the only ones available.

The initial model in #43 has a structure like this:
[screenshot of the initial model from #43, 2018-01-25]

This would let us capture all of the existing relationships to help verify that the above is actually true. However, it makes a lot of the typical queries of just getting the trio or getting everyone in a family a bit more complex. A proposed structure to support those queries might look like:
[screenshot of the proposed structure, 2018-01-25]

But then that can't support relationships where the mother/father isn't present. One could imagine a hybrid of the two, perhaps leaving mother/father in the participant and relegating the FamilyRelationship to only non parent/child relationships.

Need to determine how we want to move forward for an initial MVP.
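A sketch of the hybrid option, using dataclasses in place of database models. The field names (family_id, mother_id, father_id, relationship) are assumptions for illustration, not a decided schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    kf_id: str
    family_id: str
    # Parents live on the participant so trio queries stay simple.
    mother_id: Optional[str] = None
    father_id: Optional[str] = None


@dataclass
class FamilyRelationship:
    """Only non parent/child relationships, e.g. siblings."""
    participant_id: str
    relative_id: str
    relationship: str


def trio(participants, proband_id):
    """Return (proband, mother, father); a missing parent comes back as None."""
    proband = participants[proband_id]
    return (proband,
            participants.get(proband.mother_id),
            participants.get(proband.father_id))


def family_members(participants, family_id):
    """Everyone in a family, regardless of how they are related."""
    return [p for p in participants.values() if p.family_id == family_id]


def relatives(relationships, kf_id, relationship):
    """Ids related to kf_id by the given relationship, in either direction."""
    found = []
    for rel in relationships:
        if rel.relationship != relationship:
            continue
        if rel.participant_id == kf_id:
            found.append(rel.relative_id)
        elif rel.relative_id == kf_id:
            found.append(rel.participant_id)
    return found
```

The trade-off is that parent/child links and other relationships end up in two places, so a query for "all relatives" has to consult both.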

Add pagination to participant resource

The top level resources that return lists should return items in a paginated format using a configurable number of results.

This should be implemented using good practices for pagination against the database such as use of a cursor.

Information about the page number, total number of results, etc. should also be returned either through the header or in a standardized envelope.

Examples:

GET /persons

{
  "_links": {
    "prev": "/persons?page=1",
    "self": "/persons?page=2",
    "next": "/persons?page=3"
  },
  "total": 23,
  "limit": 10,
  "results": [
    {"kf_id": "001"},
    {"kf_id": "002"},
    ...
    {"kf_id": "010"}
  ]
}
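A minimal sketch of building this envelope, with an in-memory list standing in for the database query. The paginate name and its parameters are hypothetical. A cursor-based version would filter on the last key seen instead of computing an offset, but the envelope would look the same.

```python
import math


def paginate(items, page=1, limit=10, base="/persons"):
    """Return one page of items wrapped in the envelope shown above."""
    total = len(items)
    pages = max(1, math.ceil(total / limit))
    page = min(max(page, 1), pages)  # clamp out-of-range page numbers
    start = (page - 1) * limit
    links = {"self": "{}?page={}".format(base, page)}
    if page > 1:
        links["prev"] = "{}?page={}".format(base, page - 1)
    if page < pages:
        links["next"] = "{}?page={}".format(base, page + 1)
    return {
        "_links": links,
        "total": total,
        "limit": limit,
        "results": items[start:start + limit],
    }
```

The prev and next links are omitted on the first and last page respectively, so clients can detect the ends without comparing page numbers against the total.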

Deploy to Development

This is depending on Alex getting a DB up. Trying to figure out how to mark a ticket in another repo as a blocker... But for now - you get the point.

Create Initial Outcome Model

User Story
As a Kids First developer, I would like basic CRUD functionality to work with an Outcome/encounter/event. Note - this needs to be spec'd out from a data model perspective. We also need to define the name.

Tasks

  • Create Model
  • Resource
  • API: Put, Patch, Delete, Get

Acceptance Criteria
Documentation
Have Put, Patch, Delete, Get
Unit Tests - Tests on Model, Resource, API
