cnr-ibba / smarter-backend

SMARTER Backend API

Home Page: https://webserver.ibba.cnr.it/smarter-api/docs/

License: GNU General Public License v3.0

JavaScript 0.17% Python 98.02% Dockerfile 1.06% Makefile 0.33% Batchfile 0.42%

flask mongoengine flask-restful flask-mongoengine rest-api smarter

smarter-backend's Issues

:boom: substitute flask-restful components with better alternatives

As suggested in flask-restful/issues/883, there are better alternatives to flask-restful for some functionalities:

Flask MethodView or flask-classy in replacement of Resources
webargs in replacement of reqparse
flask-marshmallow to define fields (see also marshmallow-mongoengine for database-related stuff, e.g. to specify which fields to render)
flask-cors which can be useful to handle CORS

:memo: complete the RtD documentation

Add installation section
Add example for python
Remove 3rd party software
Describe SWAGGER interface
Add information about cnr-ibba/r-smarter-api

:sparkles: support searching with patterns on original and smarter id

Implement search through patterns in samples original_id and smarter_id
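
A minimal sketch of how such a pattern could be turned into a database filter, assuming users supply shell-style wildcards (`*` and `?`) and that the result is used as a MongoDB `$regex` condition. `pattern_to_filter` is a hypothetical helper, not part of the backend:

```python
import re


def pattern_to_filter(field, pattern):
    """Translate a shell-style pattern into a MongoDB-style regex filter.

    Hypothetical helper: ``*`` matches any run of characters and ``?``
    a single character; everything else is matched literally.
    """
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            # escape regex metacharacters so they are matched literally
            parts.append(re.escape(char))
    return {field: {"$regex": "^" + "".join(parts) + "$"}}
```

Anchoring the expression with `^` and `$` keeps the match on the whole identifier, so a pattern without wildcards behaves like an exact search.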

:zap: try to compress responses

Try to figure out how to reply in gzipped format. See:
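
As an illustration only, the core of a gzipped reply can be sketched with the standard library. `maybe_gzip` and its `min_size` threshold are assumptions made for this example; in a real deployment the same result might come from a Flask extension such as Flask-Compress or from the reverse proxy:

```python
import gzip


def maybe_gzip(body, accept_encoding, min_size=500):
    """Return (payload, headers), compressing only when the client allows it.

    ``min_size`` is an assumed threshold: tiny payloads are left alone
    because the gzip overhead can outweigh the saving.
    """
    # q-values (e.g. "gzip;q=0") are ignored in this sketch
    encodings = [item.split(";")[0].strip().lower()
                 for item in accept_encoding.split(",")]
    if "gzip" not in encodings or len(body) < min_size:
        return body, {"Content-Length": str(len(body))}
    payload = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Vary": "Accept-Encoding",
        "Content-Length": str(len(payload)),
    }
    return payload, headers
```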

:sparkles: return 4xx error when querying with wrong parameters

When making a request with a wrong parameter (a parameter that is not implemented), the parameter is silently ignored: there's no clue that it is wrong (for example, because of a typo) and more results than expected are returned. The allowed parameters need to be clearly stated when defining an endpoint
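
A minimal sketch of the check, assuming each endpoint declares its set of allowed parameters. `validate_params` and the shape of the error body are hypothetical:

```python
def validate_params(received, allowed):
    """Return (status, body): 400 when any parameter is not in ``allowed``."""
    unknown = sorted(set(received) - set(allowed))
    if unknown:
        message = "Unknown parameter(s): " + ", ".join(unknown)
        # listing the allowed names helps the client spot a typo
        return 400, {"message": message, "allowed": sorted(allowed)}
    return 200, {}
```

For example, a request with `bred=Texel` against an endpoint that allows `breed` and `country` would be rejected with a 400 instead of returning every sample.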

:sparkles: query with multiple parameters

Support queries with multiple parameters like /smarter-api/datasets?type=foreground&type=phenotypes, translated in R as:

get_smarter_datasets(token, query=list(type='foreground', type='phenotypes'))

When searching in array fields, all the conditions need to be applied (AND). When searching in a single-valued field, any one of the conditions needs to be applied (OR)

Desired behaviour:

query for multiple dataset types (all condition)
query for multiple chips (all condition) in variants
query for multiple chips (or condition) in dataset and samples
query for multiple breeds, breed codes and countries in samples (or condition)
query for multiple datasets in samples (or condition)
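
The two behaviours above can be sketched as follows, assuming repeated query-string keys are mapped to MongoDB's `$all` operator for array fields and to `$in` for single-valued fields. `ARRAY_FIELDS` and `build_filter` are hypothetical names, and the real split between fields depends on each endpoint:

```python
from urllib.parse import parse_qs

# Hypothetical split: array fields need every value ($all),
# single-valued fields need any of them ($in)
ARRAY_FIELDS = {"type"}


def build_filter(query_string):
    """Turn a query string with repeated keys into a MongoDB-style filter."""
    mongo_filter = {}
    for key, values in parse_qs(query_string).items():
        if len(values) == 1:
            mongo_filter[key] = values[0]
        elif key in ARRAY_FIELDS:
            mongo_filter[key] = {"$all": values}
        else:
            mongo_filter[key] = {"$in": values}
    return mongo_filter
```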

:sparkles: add endpoint for supported assemblies variants

Add endpoints for the 4 major assembly versions supported
Filter and project locations only for supported assemblies

:sparkles: support for GIS search in sample endpoints

Search samples by point and radius
Search samples using Bounding Boxes
Search samples inside a polygon
Add endpoints for GeoJSON objects
Document GIS parameters and endpoints
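
The three searches listed above could be expressed with MongoDB's `$geoWithin` operator. This is a sketch under assumptions: the field name passed in by the caller (for example `locations`) and the helper names are hypothetical, coordinates are given as longitude, latitude, and `$box` works on planar geometry:

```python
EARTH_RADIUS_KM = 6378.1  # equatorial radius, used to convert km to radians


def within_sphere(field, lng, lat, radius_km):
    """Point-and-radius search; $centerSphere expects the radius in radians."""
    center = [[lng, lat], radius_km / EARTH_RADIUS_KM]
    return {field: {"$geoWithin": {"$centerSphere": center}}}


def within_box(field, min_lng, min_lat, max_lng, max_lat):
    """Bounding-box search from the bottom-left and upper-right corners."""
    box = [[min_lng, min_lat], [max_lng, max_lat]]
    return {field: {"$geoWithin": {"$box": box}}}


def within_polygon(field, ring):
    """Polygon search; the GeoJSON ring is closed if the caller left it open."""
    coords = [list(point) for point in ring]
    if coords[0] != coords[-1]:
        coords.append(coords[0])
    geometry = {"type": "Polygon", "coordinates": [coords]}
    return {field: {"$geoWithin": {"$geometry": geometry}}}
```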

:sparkles: enforce token authentication

Enforce token authentication to browse data:

Install and configure bcrypt and jwt-extended flask module
Add user using script (no API for new user)
Test cli script
Add endpoint for authentication
Test authentication
Protect all endpoints using authentication
Test authentication with endpoints

:bug: return error when searching variant with a wrong region

Querying for a region like 1:1000 (or any other term different from <chrom>:<start>-<end>) should raise a validation error. At least, searching using only chromosomes could be supported
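
A minimal validation sketch, assuming a bare chromosome is accepted as well as the full <chrom>:<start>-<end> form. `parse_region` is a hypothetical helper; the endpoint would translate the `ValueError` into a 4xx response:

```python
import re

# chromosome name, optionally followed by :<start>-<end>
REGION = re.compile(r"^(?P<chrom>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_region(region):
    """Parse ``<chrom>:<start>-<end>`` (or a bare chromosome) or raise ValueError."""
    match = REGION.match(region)
    if match is None:
        raise ValueError(
            f"Invalid region {region!r}: expected <chrom>:<start>-<end>")
    chrom, start, end = match.group("chrom", "start", "end")
    if start is None:
        # only the chromosome was given
        return chrom, None, None
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"Invalid region {region!r}: start is after end")
    return chrom, start, end
```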

:sparkles: support `text/csv` header request

Honor other values of the Accept request header, such as text/csv, in order to support data download in CSV format
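
One way to sketch the content negotiation, assuming the records are already flat dictionaries (nested documents would need flattening first). `render` is a hypothetical helper and the header check is deliberately simplistic:

```python
import csv
import io
import json


def render(records, accept="application/json"):
    """Serialize ``records`` (a list of flat dicts) according to the Accept header."""
    if "text/csv" in accept.lower():
        # union of all keys, so records with missing fields still fit
        columns = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue(), "text/csv"
    return json.dumps(records), "application/json"
```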

:wrench: configure MongoDB memory usage

Set max MongoDB memory usage. See the following links for reference:

:card_file_box: support variant sorting by locations

Sorting variants by location is a little tricky, since ordering with mongoengine or with a plain MongoDB query doesn't work as intended (it's not clear which element of the locations array is used). The only way to order by the desired location is an aggregation pipeline using the $filter operator, since $elemMatch can't be applied in projection. The query is then structured like this:

db.variantSheep.aggregate(
    [
        {
            $match: {
                locations: {
                    $elemMatch: {
                        imported_from: "SNPchiMp v.3",
                        version: "Oar_v3.1"
                    }
                }
            }
        },
        {
            $project:  {
                name: 1,
                rs_id:1,
                locations: {
                    $filter: {
                        input: "$locations",
                        as: "loc",
                        cond: {
                            $and: [
                                {$eq: ["$$loc.imported_from", "SNPchiMp v.3"]},
                                {$eq: ["$$loc.version", "Oar_v3.1"]}
                            ]
                        }
                    }
                }
            }
        },
        {
            $sort: {
                "locations.chrom": 1
            }
        }
    ], { "allowDiskUse" : true }
).pretty()

Moreover, this aggregation exceeds the default 100 MB of RAM reserved for sorting in an aggregation, and needs to use the disk to complete. Considering this, it is not clear how informative sorting variants by location would be, especially if these endpoints are used with async operations. Ordering by locations will not be supported

:loud_sound: configure logging

Configure logging by following this guide:

:zap: lower uwsgi workers CPU usage

uwsgi workers show a small but constant CPU usage even when they are not serving any request. See unbit/uwsgi/issues/1010 for more information

:sparkles: add country endpoint

Model the country endpoint

:sparkles: support parameters into GeoJSON endpoint

Support query with parameters for GeoJSON endpoints

Accept parameters for get endpoints (breed, breed_code, country, chip_name, dataset_id)
Accept multiple parameters for get endpoints
Model geo_within_polygon and geo_within_sphere in post endpoints

:sparkles: support multiple parameters in Breeds endpoint

Modify the Breeds endpoint to support multiple parameters

change parser.add_argument inputs
update get_queryset method
update swagger

:zap: limit variant locations in variants detail page

Display only SMARTER coordinates in the variant detail endpoint. This could be done by creating a new collection with only the required information. This collection will be served by the endpoint.

Create variant collections with only SMARTER coordinates
Replace the collection used by the variant endpoints
Test if there are improvement in speed or in fetching data

:bug: querying with expired tokens return 500 errors

Deal with expired tokens
Return a more user friendly error message
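
The intended behaviour can be sketched as a check on the already-decoded token claims, answering 401 with a readable message instead of letting an exception surface as a 500. `check_token` is a hypothetical helper; signature verification and the actual JWT library integration are out of scope here:

```python
import time


def check_token(claims, now=None):
    """Return (status, body) for already-decoded JWT claims.

    ``exp`` is the standard JWT expiration claim (seconds since the epoch).
    """
    now = time.time() if now is None else now
    if "exp" not in claims:
        return 401, {"message": "Token has no expiration claim"}
    if claims["exp"] <= now:
        return 401, {"message": "Token has expired, please log in again"}
    return 200, {}
```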

:construction_worker: update CI system

Travis CI seems to have issues with build:

Move to GitHub workflow
Setup readthedocs CI (and let's start documentation on backend)

:sparkles: add samples and variants endpoints

Add missing endpoint (and protect them with authentication)

Take a look at mongoengine-goodjson

:recycle: flat location data in variants endpoints

Since variant endpoints return a single location from the database, the location content can be flattened into the variant object. For example:

{
    "_id": {
        "$oid": "6151e7cbb031ca6359300237"
    },
    "chip_name": [
        "IlluminaGoatSNP50"
    ],
    "locations": [
        {
            "chrom": "1",
            "date": {
                "$date": 1610064000000
            },
            "illumina": "T/C",
            "illumina_strand": "BOT",
            "illumina_top": "A/G",
            "imported_from": "manifest",
            "position": 18960,
            "strand": "TOP",
            "version": "ARS1"
        }
    ],
    "name": "1_18960_AF-PAKI",
    "probeset_id": [],
    "sequence": {
        "manifest": "AGCTGGAGATGACACTGGCTCGGAGTGGAGCTGTGGCCACGCAGGCGGAATATGAAACGC[A/G]TTTTGGAGGCATAGGAGATCTGGGGCAAGGACCAAGGACCACTCAAAAAACTTAAGAGAC"
    }
}

Could become

{
    "_id": {
        "$oid": "6151e7cbb031ca6359300237"
    },
    "chip_name": [
        "IlluminaGoatSNP50"
    ],
    
    "chrom": "1",
    "date": {
        "$date": 1610064000000
    },
    "illumina": "T/C",
    "illumina_strand": "BOT",
    "illumina_top": "A/G",
    "imported_from": "manifest",
    "position": 18960,
    "strand": "TOP",
    "version": "ARS1",

    "name": "1_18960_AF-PAKI",
    "probeset_id": [],
    "sequence": {
        "manifest": "AGCTGGAGATGACACTGGCTCGGAGTGGAGCTGTGGCCACGCAGGCGGAATATGAAACGC[A/G]TTTTGGAGGCATAGGAGATCTGGGGCAAGGACCAAGGACCACTCAAAAAACTTAAGAGAC"
    }
}

And this will be simpler to parse with the smarterapi R library
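
The transformation shown above can be sketched as follows, assuming every variant carries exactly one location and that location keys never clash with variant fields. `flatten_variant` is a hypothetical helper:

```python
def flatten_variant(variant):
    """Return a copy of ``variant`` with its single location merged at top level."""
    locations = variant.get("locations", [])
    if len(locations) != 1:
        raise ValueError("expected exactly one location per variant")
    # copy everything except the locations array, leaving the input untouched
    flat = {key: value for key, value in variant.items() if key != "locations"}
    for key, value in locations[0].items():
        if key in flat:
            raise ValueError(
                f"location key {key!r} clashes with a variant field")
        flat[key] = value
    return flat
```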

:whale: try to install a newer version of docker compose

Try to install a more recent docker-compose version on Partner Queue Solution machines, by modifying .travis.yml (maybe by changing uname -s and uname -m outputs):

env:
  global:
  - DOCKER_COMPOSE_VERSION=1.27.4
# https://docs.travis-ci.com/user/docker/#using-docker-compose
before_install:
  - sudo rm /usr/local/bin/docker-compose
  - curl -L https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-`uname -s`-`uname -m` > docker-compose
  - chmod +x docker-compose
  - sudo mv docker-compose /usr/local/bin

:sparkles: search samples by aliases

Query samples endpoint by aliases
