automated-sc-rna-seq-analysis-in-the-cloud's People

Contributors

allissadillman, aopisco, cattellj, mccalluc, shahsanjana

automated-sc-rna-seq-analysis-in-the-cloud's Issues

Do batch correction before OnClass

In the current version, we run OnClass without any batch correction between the new data and the data used to build the pre-trained model.

While this might work in some situations, the current approach is not general.

What needs to be done:

  1. Update onclass_annotations.py to do batch correction
  2. Allow the user to select from a couple of methods to perform batch correction
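
One way the method selection in step 2 could look is a small registry keyed by method name. This is only a sketch: `correct_batches`, `BATCH_CORRECTION_METHODS`, and the two methods are hypothetical names, and per-batch mean centering is a toy stand-in for a real batch correction method, not what onclass_annotations.py does today.

```python
from statistics import mean


def no_correction(values, batches):
    """Return the values unchanged (the current behaviour)."""
    return list(values)


def center_per_batch(values, batches):
    """Subtract each batch's mean from its values.

    Toy stand-in for a real batch correction method.
    """
    batch_means = {
        batch: mean(v for v, b in zip(values, batches) if b == batch)
        for batch in set(batches)
    }
    return [v - batch_means[b] for v, b in zip(values, batches)]


# Hypothetical registry: the keys are what the user would pick from.
BATCH_CORRECTION_METHODS = {
    "none": no_correction,
    "center": center_per_batch,
}


def correct_batches(values, batches, method="none"):
    """Apply the batch correction method chosen by the user."""
    try:
        func = BATCH_CORRECTION_METHODS[method]
    except KeyError:
        choices = ", ".join(sorted(BATCH_CORRECTION_METHODS))
        raise ValueError(f"Unknown method {method!r}; choose from: {choices}") from None
    return func(values, batches)
```

Adding another method would then mean adding one function and one registry entry, without touching the code that calls `correct_batches`.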

How to manage processed data?

This might be the same question as #19, or it might differ: the original data may only be needed temporarily, while the derived data needs to be kept longer.

Pre-trained models

Currently we use a fixed pre-trained model. Because that model was trained on Tabula Muris Senis, it is better suited to mouse datasets, so we want to let the user select a different pre-trained model.

What needs to be done:

  1. Collect annotated datasets from different species
  2. Run OnClass to build those pre-trained models
  3. Update the code so that the user can choose or upload different pre-trained models
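
For step 3, a rough sketch of how the choice between a bundled model and an uploaded one might be resolved. Everything here is an assumption for illustration: `resolve_model`, `PRETRAINED_MODELS`, the species key, and the model path are all hypothetical and do not reflect the current repository layout.

```python
from pathlib import Path

# Hypothetical registry of bundled models; the path is a placeholder.
PRETRAINED_MODELS = {
    "mouse": Path("models/tabula_muris_senis"),
}


def resolve_model(species=None, uploaded_path=None, registry=PRETRAINED_MODELS):
    """Pick the model to use: an uploaded model wins over a bundled one."""
    if uploaded_path is not None:
        return Path(uploaded_path)
    if species is None:
        raise ValueError("Give either a species or an uploaded model path")
    try:
        return registry[species]
    except KeyError:
        choices = ", ".join(sorted(registry))
        raise ValueError(f"No model for {species!r}; available: {choices}") from None
```

New species from steps 1 and 2 would become new entries in the registry.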

Clean up templates

  • base.html is currently unused. (It's from the microblog tutorial.)
  • index.html is not used either.
  • results.html and (stub) uploads.html should use a shared (new) base template.

Implement unit tests

For instance, you could mock subprocess and see if it starts what you think it should start.
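
A minimal sketch of that idea with `unittest.mock`. The wrapper `run_annotation` and the input file name are hypothetical; the real call in the app may take different arguments.

```python
import subprocess
import sys
from unittest import mock


def run_annotation(input_path):
    """Hypothetical wrapper around the call the web app makes."""
    return subprocess.run(
        [sys.executable, "onclass_annotations.py", input_path],
        check=True,
    )


def test_run_annotation_starts_expected_command():
    # Replace subprocess.run so nothing is actually started.
    with mock.patch("subprocess.run") as fake_run:
        run_annotation("data.h5ad")
    # Check that it would have started what we think it should start.
    fake_run.assert_called_once_with(
        [sys.executable, "onclass_annotations.py", "data.h5ad"],
        check=True,
    )
```

The test never launches a process, so it runs quickly and does not need OnClass or any data to be installed.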

How to manage user uploaded data?

  • Can the uploaded data be deleted after processing? Or does it form a set with the processed data?
  • Even if it's just temporary, storing it on the server's local hard drive might not be reasonable.
  • Conceivably, AWS EFS could be the target of uploads and be mounted into containers. Last I checked, it was a lot more expensive than S3, but I think a lot of what you're paying for is redundant edge servers, and that's not something we need.

Where should the data processing actually run?

Having a web application spawn a subprocess, much less a Docker container, is a bad idea. Even apart from security concerns, there's a mismatch between the needs of the web server (minimal, except for storage; see #19 and #20) and the needs of the processing (huge: at least 10 GB of RAM).

This might be appropriate for AWS Lambda, if the jobs can run quickly enough. Google or Azure might have their own offerings in this space.

Different compute platforms would support different storage possibilities.

How to use cellxgene inside a larger webapp?

From their README, it sounds monolithic: there is no obvious way to pull out the data processing and incorporate the visualizations within some other site. Simply spawning multiple instances and running them on multiple ports sounds like a really bad idea, but getting into their code to pull out reusable parts doesn't seem easy, either.

Validate HTML output

Run the pages from the site through a validator. Down the road, it would also be nice to have automated tests that validate the HTML.
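
As a starting point for the automated tests, a sketch using only the standard library's `html.parser`. It is not a real validator: it only checks that tags are opened and closed in matching order, and it is stricter than the HTML spec because it also flags end tags that HTML allows you to omit (for example `</p>` or `</li>`). The names `TagBalanceChecker` and `find_tag_errors` are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Track open tags and record end tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def find_tag_errors(html):
    """Return a list of tag-balance problems found in the given HTML."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.open_tags]
```

A test could render each page (for example results.html) and assert that `find_tag_errors` returns an empty list; a full validator would still be needed to catch everything else.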
