cnr-ibba / smarter-backend
SMARTER Backend API
Home Page: https://webserver.ibba.cnr.it/smarter-api/docs/
License: GNU General Public License v3.0
As suggested in flask-restful/issues/883, there are better alternatives to flask-restful
for some functionalities:
original_id and smarter_id
Try to figure out how to reply in gzipped format. See:
When a request is made with a wrong parameter (one that is not implemented), the parameter is silently ignored. There is no clue that the parameter is wrong (for example, because of a typo), and more results than expected are returned. The allowed parameters need to be clearly stated when defining an endpoint
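A minimal sketch of how unknown parameters could be rejected instead of ignored. The names `ALLOWED_DATASET_PARAMS`, `find_unknown_params` and `validate_params` are hypothetical and not part of the current code base; the allowed set shown here is only an illustration.

```python
# Hypothetical set of parameters accepted by one endpoint (illustrative only)
ALLOWED_DATASET_PARAMS = {"type", "species", "breed"}


def find_unknown_params(params, allowed):
    """Return the sorted names in params that are not in the allowed set."""
    return sorted(set(params) - set(allowed))


def validate_params(params, allowed):
    """Raise ValueError when params contains a name that is not allowed."""
    unknown = find_unknown_params(params, allowed)
    if unknown:
        raise ValueError(
            f"Unknown parameter(s): {', '.join(unknown)}. "
            f"Allowed parameters: {', '.join(sorted(allowed))}"
        )
    return params
```

With a check like this, a typo such as `tipe=foreground` would produce an error that names both the offending parameter and the allowed ones, rather than returning the unfiltered collection.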
Support queries with multiple parameters, like /smarter-api/datasets?type=foreground&type=phenotypes, translated in R with:
get_smarter_datasets(token, query=list(type='foreground', type='phenotypes'))
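On the Python side, repeated keys in a query string can be collected with the standard library. The sketch below only illustrates the idea: `parse_multi_params` and `build_filter` are hypothetical helpers, and mapping repeated values to a MongoDB `$all` condition is an assumption about the desired behaviour, not what the backend currently does.

```python
from urllib.parse import parse_qs, urlsplit


def parse_multi_params(url):
    """Return a dict mapping each query parameter to the list of its values."""
    return parse_qs(urlsplit(url).query)


def build_filter(params):
    """Build a MongoDB-style filter (hypothetical helper).

    A single value becomes an equality match; repeated values become
    an $all condition (assumed semantics for array fields).
    """
    query = {}
    for key, values in params.items():
        query[key] = values[0] if len(values) == 1 else {"$all": values}
    return query


params = parse_multi_params(
    "/smarter-api/datasets?type=foreground&type=phenotypes")
mongo_filter = build_filter(params)
```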
When searching in arrays, the all condition needs to be applied. When searching in a single field, the or condition needs to be applied
Desired behaviour:
GeoJSON objects
Enforce token authentication to browse data:
bcrypt and jwt-extended flask module
Query for a region like 1:1000 (or any other term different from <chrom>:<start>-<end>) should raise a validation error. At least, search using only chromosomes could be supported
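A possible way to validate the region term, sketched with a regular expression. `REGION_PATTERN` and `parse_region` are hypothetical names, and the accepted chromosome format (any run of characters without a colon or whitespace) is an assumption.

```python
import re

# <chrom>:<start>-<end>, e.g. "1:1000-2000" (chromosome format is assumed)
REGION_PATTERN = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Return (chrom, start, end) or raise ValueError for a malformed region."""
    match = REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(
            f"Invalid region {region!r}: expected <chrom>:<start>-<end>")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"Invalid region {region!r}: start is after end")
    return match["chrom"], start, end
```

A term such as `1:1000` does not match the pattern, so it raises an error instead of being passed to the database. Supporting a chromosome-only search would need a second, looser pattern.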
Accept different request headers, like text/csv, in order to support data download in CSV format
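A rough sketch of how the response body could be chosen from the Accept header. `render_records` is a hypothetical helper; the substring test on the header is a deliberate simplification that ignores quality values, and nested fields would need flattening before being written as CSV.

```python
import csv
import io
import json


def render_records(records, accept="application/json"):
    """Return (body, content_type) for a list of flat dictionaries."""
    if "text/csv" in accept:
        # Use the union of all keys as CSV columns, in a stable order
        fieldnames = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue(), "text/csv"
    # Fall back to JSON for any other Accept value
    return json.dumps(records), "application/json"
```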
Set max MongoDB memory usage. See the following links for reference:
Sorting variants by location is a little tricky, since ordering with mongoengine or with a mongodb query doesn't work as intended (it's not clear which field in the locations array is used). The only way to order by the desired location is to use an aggregation pipeline with the $filter operator, since $elemMatch can't be applied in projection. The query is then structured like this:
db.variantSheep.aggregate(
  [
    {
      $match: {
        locations: {
          $elemMatch: {
            imported_from: "SNPchiMp v.3",
            version: "Oar_v3.1"
          }
        }
      }
    },
    {
      $project: {
        name: 1,
        rs_id: 1,
        locations: {
          $filter: {
            input: "$locations",
            as: "loc",
            cond: {
              $and: [
                {$eq: ["$$loc.imported_from", "SNPchiMp v.3"]},
                {$eq: ["$$loc.version", "Oar_v3.1"]}
              ]
            }
          }
        }
      }
    },
    {
      $sort: {
        "locations.chrom": 1
      }
    }
  ], { "allowDiskUse" : true }
).pretty()
Moreover, this aggregation exceeds the default 100 MB of RAM reserved for sorting in aggregation, and needs to use disk in order to complete. Considering this, it is not clear how informative sorting variants by location would be, especially if this endpoint will be used with async operations. Ordering by locations will not be supported
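For reference, the same pipeline can be written as plain Python data, which is the form pymongo expects. `build_location_pipeline` is a hypothetical helper shown only to document the query; it does not change the decision above.

```python
def build_location_pipeline(imported_from, version):
    """Return the aggregation stages that keep and sort one location."""
    return [
        # Keep only variants that have the requested location
        {"$match": {"locations": {"$elemMatch": {
            "imported_from": imported_from,
            "version": version,
        }}}},
        # Reduce the locations array to the requested location
        {"$project": {
            "name": 1,
            "rs_id": 1,
            "locations": {"$filter": {
                "input": "$locations",
                "as": "loc",
                "cond": {"$and": [
                    {"$eq": ["$$loc.imported_from", imported_from]},
                    {"$eq": ["$$loc.version", version]},
                ]},
            }},
        }},
        # Sort on the chromosome of the remaining location
        {"$sort": {"locations.chrom": 1}},
    ]


pipeline = build_location_pipeline("SNPchiMp v.3", "Oar_v3.1")
# With pymongo this would be run (not executed here) as:
# collection.aggregate(pipeline, allowDiskUse=True)
```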
Configure logging by following this guide:
uwsgi workers show a small amount of CPU usage even when they are not replying to any request. See unbit/uwsgi/issues/1010 for more information
Model the country endpoint
Support query with parameters for GeoJSON endpoints
breed, breed_code, country, chip_name, dataset_id
geo_within_polygon and geo_within_sphere in post endpoints
Modify Breeds endpoint to support multiple parameters
parser.add_argument inputs
get_queryset method
Display only smarter coordinates in variant detail endpoint. This could be done by creating a new collection with only the information I need. This collection will be served by the endpoint.
Travis CI seems to have issues with build:
Add missing endpoints (and protect them with authentication)
Take a look at mongoengine-goodjson
Since variation endpoints return a single location from the database, the location content can be flattened into the variation object. For example:
{
  "_id": {
    "$oid": "6151e7cbb031ca6359300237"
  },
  "chip_name": [
    "IlluminaGoatSNP50"
  ],
  "locations": [
    {
      "chrom": "1",
      "date": {
        "$date": 1610064000000
      },
      "illumina": "T/C",
      "illumina_strand": "BOT",
      "illumina_top": "A/G",
      "imported_from": "manifest",
      "position": 18960,
      "strand": "TOP",
      "version": "ARS1"
    }
  ],
  "name": "1_18960_AF-PAKI",
  "probeset_id": [],
  "sequence": {
    "manifest": "AGCTGGAGATGACACTGGCTCGGAGTGGAGCTGTGGCCACGCAGGCGGAATATGAAACGC[A/G]TTTTGGAGGCATAGGAGATCTGGGGCAAGGACCAAGGACCACTCAAAAAACTTAAGAGAC"
  }
}
could become:
{
  "_id": {
    "$oid": "6151e7cbb031ca6359300237"
  },
  "chip_name": [
    "IlluminaGoatSNP50"
  ],
  "chrom": "1",
  "date": {
    "$date": 1610064000000
  },
  "illumina": "T/C",
  "illumina_strand": "BOT",
  "illumina_top": "A/G",
  "imported_from": "manifest",
  "position": 18960,
  "strand": "TOP",
  "version": "ARS1",
  "name": "1_18960_AF-PAKI",
  "probeset_id": [],
  "sequence": {
    "manifest": "AGCTGGAGATGACACTGGCTCGGAGTGGAGCTGTGGCCACGCAGGCGGAATATGAAACGC[A/G]TTTTGGAGGCATAGGAGATCTGGGGCAAGGACCAAGGACCACTCAAAAAACTTAAGAGAC"
  }
}
This would be simpler to parse with the smarterapi R library
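The flattening could be sketched like this on the Python side. `flatten_variant` is a hypothetical helper, and it assumes the document carries exactly one location and that location keys never clash with top-level keys; both assumptions are checked explicitly.

```python
def flatten_variant(variant):
    """Return a copy of variant with its single location merged in."""
    locations = variant.get("locations", [])
    if len(locations) != 1:
        raise ValueError(
            f"Expected exactly one location, found {len(locations)}")

    # Copy every top-level field except the locations array
    flat = {key: value for key, value in variant.items()
            if key != "locations"}

    # Merge the location fields, refusing to overwrite existing keys
    for key, value in locations[0].items():
        if key in flat:
            raise KeyError(f"Location key {key!r} clashes with a variant key")
        flat[key] = value
    return flat
```

The input dictionary is left untouched, so the original nested document is still available if both representations are needed.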
Try to install a more recent docker-compose version on Partner Queue Solution machines, by modifying .travis.yml (maybe by changing uname -s and uname -m outputs):
env:
  global:
    - DOCKER_COMPOSE_VERSION=1.27.4
# https://docs.travis-ci.com/user/docker/#using-docker-compose
before_install:
  - sudo rm /usr/local/bin/docker-compose
  - curl -L https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-`uname -s`-`uname -m` > docker-compose
  - chmod +x docker-compose
  - sudo mv docker-compose /usr/local/bin
Deal with errors while working with endpoints
Filter out breeds and countries by filtering for countries and breeds endpoint respectively. This could be done relying on collection data (see cnr-ibba/SMARTER-database#93 for more info) or by executing an aggregation pipeline
Take a look at this and try to document API using some libraries or templates. Could we use swagger?