
modelhub's Introduction

We are building a collection of deep learning models. Check it out here.

Crowdsourced through contributions by the scientific research community, Modelhub is a repository of deep learning models pretrained for a wide variety of applications. Modelhub highlights recent trends in deep learning applications, enables transfer learning approaches and promotes reproducible science.

This repository is the index/registry of all models, and as such the point where all developments of the Modelhub Project come together.

Essential Information/Documentation:

Please refer to modelhub.readthedocs.io for the full documentation of the Modelhub project and infrastructure.

About Us:

We are the Computational Imaging and Bioinformatics Laboratory at the Harvard Medical School, Brigham and Women’s Hospital and Dana-Farber Cancer Institute. We are a data science lab focused on the development and application of novel Artificial Intelligence (AI) approaches to various types of medical data.

modelhub's People

Contributors

9zelle9, ahmedhosny, bees4ever, christophbrgr, christophereeles, michaelschwier, zhongyizhang18

modelhub's Issues

Make a modelhub pip installable package for startup

This would replace the current start script and should provide the following sub-commands (runnable commands, essentially; see the CLI sketch after this list):

  • start (start a model with various options)
  • list (list all models available online)
  • [to be extended]
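A rough sketch of what such a CLI entry point could look like, using argparse sub-commands. The command names follow the list above; all function bodies are placeholders, not the actual implementation:

  import argparse

  def start_model(args):
      # Placeholder: would download the model if needed and start its container.
      print("starting model:", args.model)

  def list_models(args):
      # Placeholder: would query the online model index and print all available models.
      print("listing all models available online")

  def main():
      parser = argparse.ArgumentParser(prog="modelhub")
      subparsers = parser.add_subparsers(dest="command", required=True)

      start_parser = subparsers.add_parser("start", help="start a model with various options")
      start_parser.add_argument("model", help="name of the model to start")
      start_parser.set_defaults(func=start_model)

      list_parser = subparsers.add_parser("list", help="list all models available online")
      list_parser.set_defaults(func=list_models)

      args = parser.parse_args()
      args.func(args)

  if __name__ == "__main__":
      main()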

Integration Test: Download JSON Schema

Download the schema from GitHub instead of expecting it to exist locally. This also ensures that the test always uses the newest schema.
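A minimal sketch of how the integration test could fetch the schema; the raw GitHub URL below is an assumption about where the schema lives, not the confirmed location:

  import json
  import urllib.request

  # Hypothetical location of the schema in this repository.
  SCHEMA_URL = ("https://raw.githubusercontent.com/modelhub-ai/modelhub/"
                "master/config/schema.json")

  def load_latest_schema(url=SCHEMA_URL):
      # Always download the schema so the test never validates against a stale local copy.
      with urllib.request.urlopen(url) as response:
          return json.loads(response.read().decode("utf-8"))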

Keep Nvidia-Docker 2 support or move on to native Docker?

With Docker version 19.03, the mechanism for enabling GPU acceleration in Docker containers changed: it is no longer necessary to install the full Nvidia-Docker 2 environment, as there is now native support for passing a GPU argument to docker run.
See here for the updated usage of Nvidia-Docker: https://github.com/NVIDIA/nvidia-docker#quickstart

Users still need to install the NVIDIA Container Toolkit (nvidia-container-toolkit), but that's it. The issue is that this breaks our current start script, as the commands to run containers are different:
Old: docker run --runtime=nvidia ...
New: docker run --gpus all ...

Should we still stick to the old syntax or require users to simply update to 19.03 and use the most recent version? Images built with either version are still compatible, so this is only about changing the start script and the requirements.
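One way to support both syntaxes, sketched here under the assumption that the start script can query the Docker daemon version via the docker CLI; the 19.03 threshold comes from the description above:

  import subprocess

  def gpu_docker_args():
      # Ask the Docker daemon for its version, e.g. "19.03.12".
      version = subprocess.check_output(
          ["docker", "version", "--format", "{{.Server.Version}}"]
      ).decode().strip()
      major, minor = (int(x) for x in version.split(".")[:2])
      if (major, minor) >= (19, 3):
          return ["--gpus", "all"]       # native GPU support in Docker >= 19.03
      return ["--runtime=nvidia"]        # legacy Nvidia-Docker 2 syntax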

Refactor start.py and test_integration.py

Currently these are very long script files. Refactoring steps:

  1. Try to achieve a better separation of functionality, possibly by splitting the code into classes. However, still keep everything in one file each, so that the user does not have to download several files for starting/testing.

  2. Turn the whole thing into an installable Python package. Then we can split functionality over several files to make the architecture even cleaner. Usage of the package would then look like (a packaging sketch follows the list):
    modelhub start
    modelhub list
    modelhub test_integration
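A minimal packaging sketch for step 2, assuming a hypothetical module layout (modelhub/cli.py with a main() dispatcher); the single console script would then provide the start/list/test_integration sub-commands:

  from setuptools import setup, find_packages

  setup(
      name="modelhub",
      version="0.1.0",
      packages=find_packages(),
      entry_points={
          "console_scripts": [
              # "modelhub.cli:main" is a hypothetical entry point that would
              # dispatch the start/list/test_integration sub-commands.
              "modelhub=modelhub.cli:main",
          ],
      },
  )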

Fix/Update model start up

  • Update the Docker images; currently they are outdated, hence the models don't run.
  • Remove the shell start scripts; only the central Python start script should be used now.
  • Update the instructions on the web frontend.
  • ...

Write schema for accepted prediction outputs

To clearly define how the outputs are supposed to be formatted, depending on the type of output. This will also make integration_test much easier, since we then just have to check the output against the schema.
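Purely illustrative: a hypothetical schema fragment for one output type, and how the integration test could validate a prediction against it with the jsonschema package. Field names here are assumptions, not the final specification:

  import jsonschema

  LABEL_LIST_SCHEMA = {
      "type": "object",
      "required": ["output_type", "prediction"],
      "properties": {
          "output_type": {"const": "label_list"},
          "prediction": {
              "type": "array",
              "items": {
                  "type": "object",
                  "required": ["label", "probability"],
                  "properties": {
                      "label": {"type": "string"},
                      "probability": {"type": "number", "minimum": 0, "maximum": 1},
                  },
              },
          },
      },
  }

  def check_output(output):
      # Raises jsonschema.ValidationError if the output does not match the schema.
      jsonschema.validate(instance=output, schema=LABEL_LIST_SCHEMA)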

Unable to test local models when they are not in the online index

Because the GPU mode check is performed against the online model index, local models (e.g. models still in development) are not able to start; the check for the GPU flag fails with the following error:
  (base) christophs-mbp:modelhub christoph$ python start.py brats-modelhub -e
  Model folder exists already. Skipping download.
  ERROR: Model startup failed.
  ERROR DETAIL: 'Model "brats-modelhub" not found in online model index'
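One possible direction for a fix, sketched with hypothetical function and field names: only consult the online index when the model is not already available locally, and otherwise read the flag from the local configuration.

  import json
  import os

  def model_uses_gpu(model_name, online_index, model_dir="./models"):
      if model_name in online_index:
          return online_index[model_name].get("gpu", False)
      # Local model still in development: fall back to its local config instead
      # of failing because it is missing from the online index. The "gpu" field
      # in init.json is an assumption for illustration.
      local_config = os.path.join(model_dir, model_name, "init.json")
      if os.path.exists(local_config):
          with open(local_config) as f:
              return json.load(f).get("gpu", False)
      raise KeyError('Model "%s" not found in online model index or locally' % model_name)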

AWS security group

The security group (SG) running our instances needs updating; it needs to point to specific IPs.

Slicer Plugin

This is a future TODO/TO THINK ABOUT.

Implement a Slicer module that provides a generic interface for using modelhub models via the REST API. We could even integrate model discovery (from the index) into the Slicer module, so that you can search for available models from within Slicer, and when you start one it is automatically downloaded and run.
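A very rough sketch of the Slicer-side client this could boil down to; the endpoint paths are assumptions about the modelhub REST API and may differ:

  import requests

  BASE_URL = "http://localhost:80"  # where the started model container is assumed to listen

  def get_model_config():
      # Fetch the model's metadata/configuration from the running container.
      return requests.get(BASE_URL + "/api/get_config").json()

  def predict_from_url(file_url):
      # Ask the model for a prediction on an image reachable via URL.
      return requests.get(BASE_URL + "/api/predict", params={"fileurl": file_url}).json()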

I think this would open modelhub to a large community and create additional impact.

Modelhub startup script: Find solution for GitHub API call limit

GitHub limits anonymous calls to their API to 60 per hour from the same IP (see here). Our start script requires several calls to the GitHub API when downloading a model (essentially one call per subdirectory in the model directory, plus one call to get the init.json first). This means at least 7 API calls for one model download, so a user cannot download more than about 8 models per hour. This limit will probably not be hit too often, but it would still be nice to avoid.

One attempt to solve this could be to go through the GitHub Git Trees API and retrieve the whole directory tree. Unfortunately the returned results are not as informative as the current solution and would require a bunch of string fiddling to get the download URLs. It would also still require 3 API calls (as far as I understand the API for now), because we'd have to go through "master root tree" -> "models sha tree" -> "actual model sha tree" to get the sha for the actual model dir tree we're looking for (we cannot just get the root tree, because results are limited to 1000 entries, which we will probably exceed at some point).

Another solution would be for users to provide their GitHub credentials and run authenticated API calls, which are limited to 5000/hour. But I don't think we should expect that all users have a GitHub account.
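If we did go down that route, the change to the start script would be small; sketched here with an optional token read from an environment variable (the variable name is chosen purely for illustration):

  import os
  import requests

  def github_api_get(url):
      headers = {}
      token = os.environ.get("GITHUB_TOKEN")  # optional; calls stay anonymous if unset
      if token:
          # Authenticated calls are limited to 5000/hour instead of 60/hour.
          headers["Authorization"] = "token " + token
      response = requests.get(url, headers=headers)
      response.raise_for_status()
      return response.json()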

Note that starting models which were downloaded before and already exist locally won't require any GitHub API calls (but updating them of course will).

Refactor models.json (remove some fields to keep it cleaner)

I'm mainly thinking about the url and ports. In the long run these should be assigned dynamically by some kind of service, so that we don't have to hardcode them in the index (with a larger number of models this will become infeasible). The index should really just be a list of models which are available via modelhub, with the essential information of how/where to find them.
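A purely hypothetical illustration of what a slimmed-down index entry could look like (the field names are not the actual schema): only the essential how/where-to-find-it information remains, while url and ports would be assigned dynamically at start-up.

  trimmed_entry = {
      "id": 42,                                                  # hypothetical
      "name": "example-model",
      "github": "https://github.com/modelhub-ai/example-model",  # hypothetical
      # no hardcoded "url" or "ports" anymore; assigned dynamically at start-up
  }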

In any case, I expect that the way we store and organize the index will evolve as we get more models.

ModelHub demo GIF?

I think it would be quite helpful to have a quick GIF demonstrating how modelhub works (not internals, just a summary of the website features), so that it can be featured in the README and included in presentation slides. It would be quite handy for me at the moment!

Clarify license for the models on the hub

When models are submitted, submitters should understand under what license the models will be distributed. That same license should probably be included in this repository and apply to all stored models. It is good to have this issue sorted out early in the process. The submission form says the default license is MIT, and individual submissions can use different licenses as needed.
