ddipy's Introduction

ddipy

A Python package to obtain data from the Omics Discovery Index (OmicsDI). It uses the RESTful web services at OmicsDI WS for that purpose.

Installation

Install ddipy with pip:

pip install ddipy

Examples

This example shows how to retrieve the details of one dataset with the Python package ddipy.

from ddipy.dataset_client import DatasetClient
    
if __name__ == '__main__':
   client = DatasetClient()
   res = client.get_dataset_details("pride", "PXD000210", False)
   

This example searches for human cancer datasets, sorted by publication date in ascending order.

from ddipy.dataset_client import DatasetClient
    
if __name__ == '__main__':
   client = DatasetClient()
   res = client.search("cancer human", "publication_date", "ascending")
   

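This example runs the same search for human cancer datasets and requests one specific page of the results. The last three arguments appear to be the start offset (1200), the page size (30), and the facet count (20).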

from ddipy.dataset_client import DatasetClient
    
if __name__ == '__main__':
   client = DatasetClient()
   res = client.search("cancer human", "publication_date", "ascending", 1200, 30, 20)
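
The call above fetches a single page. Looping over all pages could be sketched as follows. This is a minimal sketch, not part of ddipy: `iter_pages`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fake data source stands in for the web service so the snippet runs without network access. With ddipy, `fetch_page` would presumably wrap `client.search(...)` with the given start and size and pull the dataset list out of the response, whose layout is not shown in this README.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               page_size: int = 30,
               start: int = 0,
               max_pages: int = 100) -> Iterator[List[dict]]:
    """Yield successive pages until a page comes back short or empty."""
    for _ in range(max_pages):  # hard cap so a misbehaving source cannot loop forever
        page = fetch_page(start, page_size)
        if not page:
            break
        yield page
        if len(page) < page_size:
            break  # a partially filled page is the last one
        start += page_size


# Stand-in for the web service (assumption): 70 fake records, no network access.
FAKE_RECORDS = [{"id": f"DS{i:03d}"} for i in range(70)]


def fake_fetch(start: int, size: int) -> List[dict]:
    """Return one slice of the fake records, like one page of search results."""
    return FAKE_RECORDS[start:start + size]


pages = list(iter_pages(fake_fetch, page_size=30))
page_lengths = [len(p) for p in pages]
print(page_lengths)
```

Keeping the loop separate from the fetch function means the stop condition can be checked without touching the real service.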
   

This example is a query to retrieve all the datasets that reported the UniProt protein P21399 as identified.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("UNIPROT:P21399")
    

This example is a query to find all the datasets where the gene ENSG00000147251 is reported as differentially expressed.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("ENSEMBL:ENSG00000147251")
    

This example retrieves all the databases recorded in OmicsDI.

from ddipy.dataset_client import DatabaseClient

if __name__ == '__main__':
   client = DatabaseClient()
   res = client.get_database_all()

This example retrieves the JSON-LD for a dataset page.

from ddipy.dataset_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_dataset("pride", "PXD000210")

This example retrieves the JSON-LD for the home page.

from ddipy.dataset_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_home()

This example queries statistics about the number of datasets per tissue.

from ddipy.dataset_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_tissues(20)

This example queries statistics about the number of datasets per disease.

from ddipy.dataset_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_diseases(20)

This example searches the dictionary terms by pattern.

from ddipy.dataset_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    res = client.get_term_by_pattern("hom", 10)

This example retrieves the frequent terms from a repository.

from ddipy.dataset_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    res = client.get_term_by_pattern("pride", "description", 20)

Find out about us in our GitHub profiles:

Yasset Perez-Riverol
Pan Xu

CLI for downloading files

When ddipy is installed correctly, the command omicsdi should be available on your path. This command-line interface lists all the data links related to an accession number and can download the data itself. The tool takes one mandatory argument, the accession number, and several options:

omicsdi_fetcher [OPTIONS] ACC_NUMBER
Option           Type   Description
--version        FLAG   Show the version and exit.
-d, --download   FLAG   Download the files to the current directory or a specified output directory.
-v, --verbose    FLAG   Print identifiers and file extensions along with the URLs.
-i, --input      LIST   Download a selection of the files, given as a comma-separated list of identifiers.
-o, --output     PATH   Output directory when downloading files (default: CWD).
-h, --help       FLAG   Show this message and exit.

Examples

  • A microarray dataset with ftp links:

    omicsdi E-MTAB-5612
    
  • Downloading the microarray dataset with ftp links:

    omicsdi E-MTAB-5612 -d
    
  • A BioModels dataset with https links and exposing identifiers for each file link:

    omicsdi BIOMD0000000048 -v
    
  • Downloading a selection of the files belonging to an accession number based on a list of identifiers as input:

    omicsdi BIOMD0000000048 -d -i "8b52492888, d3144265ac"
    

ddipy's Issues

comments from manuscript review 1

  • the PyPI page is empty, thus setup.py has not been properly set up
  • there are no automated tests, and there is no continuous integration for this;
    I consider Travis CI builds to be state of the art
  • there is no API documentation; I would consider readthedocs.org documentation, with both examples and API documentation, as state of the art for Python packages
  • the lack of PyPI/setup.py configuration goes so far that not even the compatible Python versions are specified
  • there are some (copy and paste?) typos in the README.md file
  • licensing software under the MIT license actually requires you to ship the MIT license text with the software, but I think most people are not aware of this.

R Package

  • tests are lacking
  • there is no continuous integration for the R package either; there is a .travis.yml file but no corresponding Travis CI entry
  • documentation appears to consist largely of semi-autogenerated "man" pages
  • distribution via bioconductor or cran is missing, a bioconda package could be helpful as well

comments from manuscript reviewer 2

Errors vs unit test

Comments on implementation

  • the base URL https://www.omicsdi.org/ws should be written in one or a few variables and reused instead of being written out 53 times: baseUrl="https://www.omicsdi.org/ws/" and then baseDatasetUrl = self.baseUrl + "dataset/"
  • copy-pasting the User-Agent (16 times) is error-prone for future evolution; instead, please consider doing
tokened_header=self.headers.copy()
tokened_header.update({"x-auth-token": access_token})
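
Both suggestions can be combined in one small base class. The following is a sketch under stated assumptions, not ddipy's actual code: `BaseClient`, `url_for`, `auth_headers`, and the User-Agent value are hypothetical names chosen for illustration.

```python
class BaseClient:
    """Hypothetical base class; the names are illustrative, not ddipy's API."""

    # The base URL is defined once and reused by every endpoint.
    base_url = "https://www.omicsdi.org/ws/"
    # Shared headers are defined once; the value here is a placeholder.
    headers = {"User-Agent": "ddipy-example"}

    def url_for(self, *parts: str) -> str:
        """Join path segments onto the single base URL."""
        return self.base_url + "/".join(p.strip("/") for p in parts)

    def auth_headers(self, access_token: str) -> dict:
        """Return a copy of the shared headers with the token added."""
        tokened_header = self.headers.copy()  # never mutate the shared dict
        tokened_header.update({"x-auth-token": access_token})
        return tokened_header


client = BaseClient()
dataset_url = client.url_for("dataset", "pride", "PXD000210")
tokened = client.auth_headers("my-token")
print(dataset_url)
print(tokened)
```

Copying the headers before adding the token keeps the shared dictionary free of credentials, so unauthenticated requests never send a stale token.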

New comments from reviewers

  • After reviewing the codebase of https://github.com/OmicsDI/ddi-web-service/ it appears that there are almost no tests, but more importantly I did not find any trace of continuous integration. I can only recommend enriching the tests and enabling continuous integration as you did for the clients, even though it might be harder due to the dependencies.

  • https://www.omicsdi.org/ws/dataset/pridr/PXD000210/ is now indicating that the database is missing but still uses a 500 while it should be a 404

  • In the Python example of the client, providing a boolean argument without ever indicating what it means adds little to no information. Please either use a named argument or remove it:

res=client.get_dataset_details(domain="pride", accession="PXD000210")
# or
res=client.get_dataset_details(domain="pride", accession="PXD000210", debug=True)
  • https://www.omicsdi.org/ws/database/blabla/picture fails with a 500 error and a NullPointerException; it should indicate that the blabla database does not exist and return a status_code of 404

  • In dataset_client.py the search method refuses calls that omit sortfield or order, while the API allows it. The client should not introduce such a restriction without a reason. Below is a snippet where these arguments are optional (but still must not be an empty string).

class DatasetClient:

    # (...)

    @staticmethod
    def search(query, sortfield=None, order=None, start=0, size=20, face_count=20):
        params = {
            "start": start,
            "size": size,
            "faceCount": face_count
        }
        if not query:
            raise BadRequest("missing parameter query", MISSING_PARAMETER, payload=None)
        else:
            params.update(query=query)
        if sortfield:
            params.update(sortfield=sortfield)
        if order:
            params.update(order=order)
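
The reviewer's snippet stops before a request is made and relies on names defined elsewhere in the package. A self-contained variant of the same parameter handling might look like this. It is a sketch only: `build_search_params` is a hypothetical helper, and `ValueError` stands in for the package's own `BadRequest` exception.

```python
from typing import Optional


def build_search_params(query: str,
                        sortfield: Optional[str] = None,
                        order: Optional[str] = None,
                        start: int = 0,
                        size: int = 20,
                        face_count: int = 20) -> dict:
    """Build the query parameters, sending sortfield and order only when given."""
    if not query:
        # ValueError is a stand-in for the package's BadRequest (assumption).
        raise ValueError("missing parameter query")
    params = {
        "query": query,
        "start": start,
        "size": size,
        "faceCount": face_count,
    }
    if sortfield:
        params["sortfield"] = sortfield
    if order:
        params["order"] = order
    return params


minimal = build_search_params("UNIPROT:P21399")
sorted_params = build_search_params("cancer human", "publication_date", "ascending")
print(minimal)
print(sorted_params)
```

Leaving the optional keys out of the dictionary, instead of sending empty strings, lets the web service apply its own defaults.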

Can you provide some examples

Please can you provide some examples to retrieve datasets with search and filtering? Some queries:

This is a search for all the datasets for human. It would be nice to see how to navigate after a query across pages.

This is a query to retrieve all the datasets that reported the UniProt protein P21399 as identified.

This is a query to find all the datasets where the gene ENSG00000147251 is reported as differentially expressed.
