License: MIT License

Shell 20.35% Python 53.77% Dockerfile 11.06% HTML 14.81%

sc-nygc-jan2020 tabula-muris tabula-muris-senis sc-rna-seq sc-rna-seq-analysis

automated-sc-rna-seq-analysis-in-the-cloud's People

shahsanjana shahnirav1005 aopisco vickyzauner deepsystemspharmacology standardgalactic

automated-sc-rna-seq-analysis-in-the-cloud's Issues

IPython should probably not be used inside Docker

The Docker containers only do data processing, so they shouldn't need the IPython dependency.

Ask user for confirmation before overwriting data

Move CSS out to separate static file

Do batch correction before OnClass

In the current version, we run OnClass without any batch correction between the new data and the data used to build the pre-trained model.

While this might work in a couple of situations, the current approach is not general.

What needs to be done:

Update onclass_annotations.py to do batch correction
Allow the user to select from a couple of methods to perform batch correction
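
One way the method selection could be wired up is a small registry that maps a user-supplied name to a correction function. The sketch below is only an illustration: the function names, the `method` parameter, and the registry are hypothetical, and `center_per_batch` is a toy stand-in (per-batch mean centering on a flat list of numbers), not a real single-cell batch-correction algorithm.

```python
from collections import defaultdict
from statistics import mean


def no_correction(values, batches):
    """Return the values unchanged (the current behaviour)."""
    return list(values)


def center_per_batch(values, batches):
    """Toy stand-in for a real method: subtract each batch's mean from its values."""
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]


# Hypothetical registry; real methods would be registered here under their own names.
BATCH_CORRECTION_METHODS = {
    "none": no_correction,
    "center": center_per_batch,
}


def correct_batches(values, batches, method="none"):
    """Apply the batch-correction method the user selected by name."""
    try:
        correct = BATCH_CORRECTION_METHODS[method]
    except KeyError:
        choices = ", ".join(sorted(BATCH_CORRECTION_METHODS))
        raise ValueError(f"Unknown method {method!r}; choose from: {choices}") from None
    return correct(values, batches)
```

With this shape, adding another method means adding one entry to the registry, and the list of valid choices shown to the user can be generated from its keys.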

How to manage processed data?

This might be the same as #19. It might differ, though, if the original data is only needed temporarily while the derived data needs to be kept longer.

Currently we are using a fixed pre-trained model. We want to allow the user to select different pre-trained models, because the current model was trained on Tabula Muris Senis and is therefore better suited to mouse datasets.

What needs to be done:

Collect annotated datasets from different species
Run OnClass to build those pre-trained models
Update the code so that the user can choose or upload different pre-trained models.
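
The last step could start from a simple lookup from species to model location. This is a minimal sketch under assumptions: the registry, the function names, and every path are hypothetical; only the fact that the existing model was trained on Tabula Muris Senis (mouse) comes from the issue.

```python
from pathlib import Path

# Hypothetical registry: species name -> location of a pre-trained model.
# The path below is a placeholder, not the repository's real layout.
PRETRAINED_MODELS = {
    "mouse": Path("models/tabula_muris_senis"),
}


def register_model(species, path):
    """Add or replace the model for a species, e.g. after a user upload."""
    PRETRAINED_MODELS[species.lower()] = Path(path)


def resolve_model(species):
    """Return the model path for a species, or fail with the available choices."""
    try:
        return PRETRAINED_MODELS[species.lower()]
    except KeyError:
        known = ", ".join(sorted(PRETRAINED_MODELS))
        raise ValueError(
            f"No pre-trained model for {species!r}; available: {known}"
        ) from None
```

Failing loudly on an unknown species, rather than silently falling back to the mouse model, avoids annotating a dataset with a model that does not match it.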

Clean up templates

base.html is currently unused. (It's from the microblog tutorial.)
index.html is not used either.
results.html and (stub) uploads.html should use a shared (new) base template.

Implement unit tests

For instance, you could mock subprocess and see if it starts what you think it should start.
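
A minimal sketch of that idea using `unittest.mock` from the standard library. `start_analysis` and its command line are hypothetical stand-ins for whatever the web app actually launches; the script name is borrowed from another issue, and its arguments are an assumption.

```python
import subprocess
import unittest
from unittest import mock


def start_analysis(input_path):
    """Hypothetical launcher: run the annotation script on one input file."""
    cmd = ["python", "onclass_annotations.py", input_path]
    return subprocess.run(cmd, check=True)


class StartAnalysisTest(unittest.TestCase):
    def test_runs_expected_command(self):
        # Replace subprocess.run so that no real process is started.
        with mock.patch("subprocess.run") as fake_run:
            start_analysis("sample.h5ad")
        # The launcher should have been asked to run exactly this command.
        fake_run.assert_called_once_with(
            ["python", "onclass_annotations.py", "sample.h5ad"], check=True
        )
```

Saved in a test module, this runs with `python -m unittest` and needs neither Docker nor any input data.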

How to manage user uploaded data?

Can the uploaded data be deleted after processing, or does it form a set with the processed data?
Even if it's just temporary, storing it on the server's local hard drive might not be reasonable.
Conceivably, AWS EFS could be the target of uploads, and mounted into containers, but it's a lot more expensive than S3, last I checked... but I think a lot of what you're paying for is redundant edge servers, and that's not something we need.

Existing scrnaseq pipeline

Hi there,

I discovered your repository by chance.
Do you know of the https://github.com/nf-core/scrnaseq pipeline?
(part of nf-core)
Your goal description (QC and cloud computing for scrna) seems rather similar.

Just thought I should mention that here to avoid unnecessary work.

Rename routes2.py to routes.py

Where should the data processing actually run?

Having a web application spawn a subprocess, much less a Docker container, is a bad idea. Even apart from security concerns, there's a mismatch between the needs of the web server (minimal, except for storage; see #19 and #20) and the needs of the processing (huge: at least 10 GB of RAM).
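
Whatever platform ends up running the jobs, the web server can be decoupled from the processing by having it only record a job and return. The sketch below illustrates that with a spool directory on shared storage; it is one possible arrangement, not something the project implements, and all names (`enqueue_job`, `claim_next_job`, the file suffixes) are hypothetical.

```python
import json
import uuid
from pathlib import Path


def enqueue_job(spool_dir, input_path):
    """Called by the web app: record the job and return immediately."""
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)
    job_id = uuid.uuid4().hex
    tmp = spool / f"{job_id}.tmp"
    tmp.write_text(json.dumps({"id": job_id, "input": str(input_path)}))
    # Rename only after writing, so workers never see a half-written job file.
    tmp.rename(spool / f"{job_id}.json")
    return job_id


def claim_next_job(spool_dir):
    """Called by a separate worker process: take one pending job, or return None."""
    for job_file in sorted(Path(spool_dir).glob("*.json")):
        claimed = job_file.with_suffix(".claimed")
        try:
            job_file.rename(claimed)
        except FileNotFoundError:
            continue  # another worker claimed this job first
        return json.loads(claimed.read_text())
    return None
```

The worker that calls `claim_next_job` can then run on a machine sized for the processing, while the web server stays small.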

This might be appropriate for AWS Lambda, if the jobs can run quickly enough. Google or Azure might have their own offerings in this space.

Different compute platforms would support different storage possibilities.

How to use cellxgene inside a larger webapp?

From their README, it sounds monolithic: there is no obvious way to pull out the data processing and incorporate the visualizations within some other site. Simply spawning multiple instances and running them on multiple ports sounds like a really bad idea, but getting into their code to pull out reusable parts doesn't seem easy, either.

ncbi-codeathons / automated-sc-rna-seq-analysis-in-the-cloud
