
modelhub's Introduction

We are building a collection of deep learning models. Check it out here.

Crowdsourced through contributions by the scientific research community, Modelhub is a repository of deep learning models pretrained for a wide variety of applications. Modelhub highlights recent trends in deep learning applications, enables transfer learning approaches and promotes reproducible science.

This repository is the index/registry of all models, and as such the point where all developments of the Modelhub Project come together.

Essential Information/Documentation:

Please refer to modelhub.readthedocs.io for the full documentation of the Modelhub project and infrastructure.

About Us:

We are the Computational Imaging and Bioinformatics Laboratory at the Harvard Medical School, Brigham and Women’s Hospital and Dana-Farber Cancer Institute. We are a data science lab focused on the development and application of novel Artificial Intelligence (AI) approaches to various types of medical data.

modelhub's People

Contributors

9zelle9, ahmedhosny, bees4ever, christophbrgr, christophereeles, michaelschwier, zhongyizhang18

modelhub's Issues

Make a modelhub pip installable package for startup

This would replace the current start script and should provide the following sub-commands (runnable commands, essentially; see the CLI sketch after this list):

  • start (start a model with various options)
  • list (list all models available online)
  • [to be extended]
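A rough sketch of what such a CLI entry point could look like, using argparse sub-commands. The command names follow the list above; all function bodies are placeholders, not the actual implementation:

  import argparse

  def start_model(args):
      # Placeholder: would download the model if needed and start its container.
      print("starting model:", args.model)

  def list_models(args):
      # Placeholder: would query the online model index and print all available models.
      print("listing all models available online")

  def main():
      parser = argparse.ArgumentParser(prog="modelhub")
      subparsers = parser.add_subparsers(dest="command", required=True)

      start_parser = subparsers.add_parser("start", help="start a model with various options")
      start_parser.add_argument("model", help="name of the model to start")
      start_parser.set_defaults(func=start_model)

      list_parser = subparsers.add_parser("list", help="list all models available online")
      list_parser.set_defaults(func=list_models)

      args = parser.parse_args()
      args.func(args)

  if __name__ == "__main__":
      main()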

Integration Test: Download JSON Schema

Download the schema from GitHub instead of expecting it to exist locally. This also ensures that the test always uses the newest schema.
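A minimal sketch of how the integration test could fetch the schema; the raw GitHub URL below is an assumption about where the schema lives, not the confirmed location:

  import json
  import urllib.request

  # Hypothetical location of the schema in this repository.
  SCHEMA_URL = ("https://raw.githubusercontent.com/modelhub-ai/modelhub/"
                "master/config/schema.json")

  def load_latest_schema(url=SCHEMA_URL):
      # Always download the schema so the test never validates against a stale local copy.
      with urllib.request.urlopen(url) as response:
          return json.loads(response.read().decode("utf-8"))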

Keep Nvidia-Docker 2 support or move on to native Docker?

With Docker version 19.03, the mechanism for enabling GPU acceleration in Docker containers changed: it is no longer necessary to install the full Nvidia-Docker 2 environment, as there is now native support for passing a GPU argument to docker run.
See here for the updated usage of Nvidia-Docker: https://github.com/NVIDIA/nvidia-docker#quickstart

Users still need to install the NVIDIA Container Toolkit (nvidia-container-toolkit), but that's it. The issue is that this breaks our current start script, as the commands to run containers are different:
Old: docker run --runtime=nvidia ...
New: docker run --gpus all ...

Should we still stick to the old syntax or require users to simply update to 19.03 and use the most recent version? Images built with either version are still compatible, so this is only about changing the start script and the requirements.
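One way to support both syntaxes, sketched here under the assumption that the start script can query the Docker daemon version via the docker CLI; the 19.03 threshold comes from the description above:

  import subprocess

  def gpu_docker_args():
      # Ask the Docker daemon for its version, e.g. "19.03.12".
      version = subprocess.check_output(
          ["docker", "version", "--format", "{{.Server.Version}}"]
      ).decode().strip()
      major, minor = (int(x) for x in version.split(".")[:2])
      if (major, minor) >= (19, 3):
          return ["--gpus", "all"]       # native GPU support in Docker >= 19.03
      return ["--runtime=nvidia"]        # legacy Nvidia-Docker 2 syntax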

Refactor start.py and test_integration.py

Currently these are very long script files. Refactoring steps:

  1. Try to achieve a better separation of functionality, possibly by splitting the code into classes. However, still keep everything in one file each, so that the user does not have to download several files for starting/testing.

  2. Turn the whole thing into an installable Python package. Then we can split functionality over several files to make the architecture even cleaner. Usage of the package would then look like (a packaging sketch follows the list):
    modelhub start
    modelhub list
    modelhub test_integration
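A minimal packaging sketch for step 2, assuming a hypothetical module layout (modelhub/cli.py with a main() dispatcher); the single console script would then provide the start/list/test_integration sub-commands:

  from setuptools import setup, find_packages

  setup(
      name="modelhub",
      version="0.1.0",
      packages=find_packages(),
      entry_points={
          "console_scripts": [
              # "modelhub.cli:main" is a hypothetical entry point that would
              # dispatch the start/list/test_integration sub-commands.
              "modelhub=modelhub.cli:main",
          ],
      },
  )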

Fix/Update model start up

  • Update the Docker images; currently they are outdated, hence the models don't run.
  • Remove the shell start scripts; only the central Python start script should be used now.
  • Update the instructions on the web frontend.
  • ...

Write schema for accepted prediction outputs

To clearly define how the outputs are supposed to be formatted, depending on the type of output. This will also make integration_test much easier, since we then just have to check the output against the schema.
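Purely illustrative: a hypothetical schema fragment for one output type, and how the integration test could validate a prediction against it with the jsonschema package. Field names here are assumptions, not the final specification:

  import jsonschema

  LABEL_LIST_SCHEMA = {
      "type": "object",
      "required": ["output_type", "prediction"],
      "properties": {
          "output_type": {"const": "label_list"},
          "prediction": {
              "type": "array",
              "items": {
                  "type": "object",
                  "required": ["label", "probability"],
                  "properties": {
                      "label": {"type": "string"},
                      "probability": {"type": "number", "minimum": 0, "maximum": 1},
                  },
              },
          },
      },
  }

  def check_output(output):
      # Raises jsonschema.ValidationError if the output does not match the schema.
      jsonschema.validate(instance=output, schema=LABEL_LIST_SCHEMA)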

Unable to test local models when they are not in the online index

Because the GPU mode check is performed against the online model index, local models (e.g. models still in development) are not able to start; the check for the GPU flag fails with the following error:
  (base) christophs-mbp:modelhub christoph$ python start.py brats-modelhub -e
  Model folder exists already. Skipping download.
  ERROR: Model startup failed.
  ERROR DETAIL: 'Model "brats-modelhub" not found in online model index'
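One possible direction for a fix, sketched with hypothetical function and field names: only consult the online index when the model is not already available locally, and otherwise read the flag from the local configuration.

  import json
  import os

  def model_uses_gpu(model_name, online_index, model_dir="./models"):
      if model_name in online_index:
          return online_index[model_name].get("gpu", False)
      # Local model still in development: fall back to its local config instead
      # of failing because it is missing from the online index. The "gpu" field
      # in init.json is an assumption for illustration.
      local_config = os.path.join(model_dir, model_name, "init.json")
      if os.path.exists(local_config):
          with open(local_config) as f:
              return json.load(f).get("gpu", False)
      raise KeyError('Model "%s" not found in online model index or locally' % model_name)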

AWS security group

The security group (SG) running our instances needs updating; it needs to point to specific IPs.

Slicer Plugin

This is a future TODO/TO THINK ABOUT.

Implement a Slicer module that provides a generic interface for using modelhub models via the REST API. We could even integrate model discovery (from the index) into the Slicer module, so that you can search for available models from within Slicer, and when you start one it is automatically downloaded and run.
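A very rough sketch of the Slicer-side client this could boil down to; the endpoint paths are assumptions about the modelhub REST API and may differ:

  import requests

  BASE_URL = "http://localhost:80"  # where the started model container is assumed to listen

  def get_model_config():
      # Fetch the model's metadata/configuration from the running container.
      return requests.get(BASE_URL + "/api/get_config").json()

  def predict_from_url(file_url):
      # Ask the model for a prediction on an image reachable via URL.
      return requests.get(BASE_URL + "/api/predict", params={"fileurl": file_url}).json()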

I think this would open modelhub to a large community and create additional impact.

Modelhub startup script: Find solution for GitHub API call limit

GitHub limits anonymous calls to their API to 60 per hour from the same IP (see here). Our start script requires several calls to the GitHub API when downloading a model (essentially one call per subdirectory in the model directory, plus one call to get the init.json first). This means at least 7 API calls for one model download, so a user cannot download more than about 8 models per hour. This limit will probably not be hit too often, but it would still be nice to avoid.

One attempt to solve this could be to go through the GitHub Git Trees API and retrieve the whole directory tree. Unfortunately the returned results are not as informative as the current solution and would require a bunch of string fiddling to get the download URLs. It would also still require 3 API calls (as far as I understand the API for now), because we'd have to go through "master root tree" -> "models sha tree" -> "actual model sha tree" to get the sha for the actual model dir tree we're looking for (we cannot just get the root tree, because results are limited to 1000 entries, which we will probably exceed at some point).

Another solution would be for users to provide their GitHub credentials and run authenticated API calls, which are limited to 5000/hour. But I don't think we should expect that all users have a GitHub account.
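If we did go down that route, the change to the start script would be small; sketched here with an optional token read from an environment variable (the variable name is chosen purely for illustration):

  import os
  import requests

  def github_api_get(url):
      headers = {}
      token = os.environ.get("GITHUB_TOKEN")  # optional; calls stay anonymous if unset
      if token:
          # Authenticated calls are limited to 5000/hour instead of 60/hour.
          headers["Authorization"] = "token " + token
      response = requests.get(url, headers=headers)
      response.raise_for_status()
      return response.json()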

Note that starting models which were downloaded before and already exist locally won't require any GitHub API calls (but updating them of course will).

Refactor models.json (remove some fields to keep it cleaner)

I'm mainly thinking about the url and ports. In the long run these should be assigned dynamically by some kind of service, so that we don't have to hardcode them in the index (with a larger number of models this will become infeasible). The index should really just be a list of models which are available via modelhub, with the essential information of how/where to find them.
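A purely hypothetical illustration of what a slimmed-down index entry could look like (the field names are not the actual schema): only the essential how/where-to-find-it information remains, while url and ports would be assigned dynamically at start-up.

  trimmed_entry = {
      "id": 42,                                                  # hypothetical
      "name": "example-model",
      "github": "https://github.com/modelhub-ai/example-model",  # hypothetical
      # no hardcoded "url" or "ports" anymore; assigned dynamically at start-up
  }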

In any case, I expect that the way we store and organize the index will evolve as we get more models.

ModelHub demo GIF?

I think it would be quite helpful to have a quick GIF demonstrating how modelhub works (not internals, just a summary of the website features), so that it can be featured in the README and included in presentation slides. It would be quite handy for me at the moment!

Clarify license for the models on the hub

When models are submitted, submitters should understand under what license the models will be distributed. That same license should probably be included in this repository and apply to all stored models. It is good to have this issue sorted out early in the process. The submission form says the default license is MIT, and individual submissions can use different licenses as needed.
