Comments (15)
@sjfleming
Yes, I saw that and we are using this now. Also, we have a collaborator who is letting me run the samples on their GPU. I have been using this and the samples run fine :)
Thanks
Devika
from cellbender.
It's not clear if the v2 branch will help with this issue (yet), although once v2 is done, it will be a significant improvement in a number of ways.
We hope that cell calling is one of those improvements... but v2 is not complete yet. I think you may see some improvement in the current state of v2, but I am working on a few more ways to address this.
For now, what you can count on is: remove-background v1 does not leave cells out. All cells will be called cells. But it will pick up some empty droplets. This is worse in some datasets than others. Currently, the best practice is to filter those out based on other QC metrics downstream.
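Downstream QC filtering of the kind described here is usually done on simple per-barcode metrics such as total UMI counts and genes detected. A minimal numpy sketch on synthetic data — the thresholds (500 counts, 200 genes) are purely illustrative, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-barcode QC metrics: total UMI counts and genes detected.
total_counts = rng.integers(50, 5000, size=1000)
genes_detected = (total_counts * rng.uniform(0.1, 0.5, size=1000)).astype(int)

# Keep barcodes passing both illustrative thresholds.
keep = (total_counts >= 500) & (genes_detected >= 200)
print(f"kept {keep.sum()} of {keep.size} barcodes")
```

In a real analysis, the same boolean-mask pattern would be applied to metrics computed from the count matrix itself (e.g. via scanpy's QC helpers).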
#42 explains overparameterization; essentially, you have to keep lowering the dimensions until training is stable.
Thanks @cnk113 for your answer!
I actually have already read this issue, and I did not feel concerned, as I thought the learning curve looked stable for my first run with default parameters. I do agree that the second run was indeed overparameterized.
Following your suggestion I tried reducing --z-dim (keeping the other parameters at default), and it turns out that lowering it as far as 3 dimensions still does not lead to a proper separation. I also tried reducing --z-layers, without success:
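For reference, a run with a reduced latent dimension would look something like this — the paths are placeholders, and the specific values of --z-dim and --z-layers are just examples, not recommendations:

```shell
# Placeholder paths; --z-dim and --z-layers shrink the latent space.
cellbender remove-background \
    --input raw_feature_bc_matrix.h5 \
    --output output_filtered.h5 \
    --cuda \
    --epochs 200 \
    --z-dim 3 \
    --z-layers 100
```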
Hi @LouisFaure,
The algorithm really does seem to be struggling on this kind of dataset. I had never tested it with data that has such a high number of ambient counts. While it should work in principle, I see that it hasn't worked well.
I am adding several things in remove-background "version 2" (which should be out within a couple of weeks... hopefully) that should help in this case.
... as a matter of fact, if you wouldn't mind, and if you're running on Google Colab with your own CellBender install, could you try running the current v2 branch, sf_removebkg_v2? I wonder if the changes I've made so far address the issue.
Thanks!
Also, for a dataset like this, it is helpful to include the parameter --low-count-threshold. It is not completely necessary, but it is often helpful just so that the prior estimates don't get confused. --low-count-threshold just excludes all barcodes with counts below a given value from the very outset, as if they weren't even part of the dataset.
For this dataset, perhaps use --low-count-threshold 1000, which is well below the level of the empty droplet plateau. That long tail of empties with counts ~30 will get totally excluded.
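Concretely, adding the flag to a run might look like this (the input/output paths are placeholders):

```shell
# Exclude all barcodes with fewer than 1000 counts from the outset.
cellbender remove-background \
    --input raw_feature_bc_matrix.h5 \
    --output output_filtered.h5 \
    --cuda \
    --low-count-threshold 1000
```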
Take a look at the first post by @LouisFaure here. He mentions running CellBender on a Google Colab notebook, which gives you free access to a GPU. I wonder if this could be an option for you?
Hi @sjfleming,
Thanks for your answer. I think the default parameters already do a pretty good job of guessing the prior (it was excluding barcodes with counts below 1445), so the long tail of very-low-UMI barcodes should already be removed. I even manually raised the value further, to 1900, to see if anything changed, but unfortunately this does not seem to improve anything.
Fun fact: at some point I mistakenly stopped a run at the first epoch, and here the log plot looks much nicer!
But I guess one shouldn't trust such a short run, right?
Finally, I tried to quickly check version 2, but a module might be missing:
Traceback (most recent call last):
File "/usr/local/bin/cellbender", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/cellbender/base_cli.py", line 91, in main
cli_dict = generate_cli_dictionary()
File "/usr/local/lib/python3.6/dist-packages/cellbender/base_cli.py", line 52, in generate_cli_dictionary
module_cli = importlib.import_module('.'.join(module_cli_str_list))
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/cli.py", line 4, in <module>
from cellbender.remove_background.data.dataset import SingleCellRNACountsDataset
File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/data/dataset.py", line 11, in <module>
import cellbender.remove_background.model
File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/model.py", line 21, in <module>
from cellbender.remove_background.distributions.PoissonImportanceMarginalizedGamma \
ModuleNotFoundError: No module named 'cellbender.remove_background.distributions.PoissonImportanceMarginalizedGamma'
Oh, yes, that's a stale import from a file that's not committed. I just pushed a fix that should address that on the v2 branch.
As far as the run with 1 epoch: you are basically visualizing the initialization that the algorithm starts with. So I am quite happy about a good initialization. But something seems to be making it decide, over the course of training, that a lot of those droplets out between 30k - 50k are real... this is something I need to look into. There is a fine balancing act between wanting to (1) explain the data accurately by saying "there is a cell with this specific gene expression" and (2) obeying the prior that lower-count droplets are empty. Getting this balance exactly right and robust is something we are actively working on.
hi @sjfleming
I have had the same issue as Louis while running on a GPU using Google Colab with my own install of CellBender: cells between 20K and 30K are thought to be real, and there is no real separation when running default parameters (even though training has converged), nor when increasing the dims and layers. I haven't tried the other end, where I reduce the parameters. So running the v2 branch should help with this?
Hi @sjfleming. Looking forward to the completed v2 when it's ready. In the meanwhile I will give it a go on my data in its current state and post the results here. Thanks again.
Hi @sjfleming
I tested the removebkg_v2 branch on the same sample. While it did run, it failed at epoch 087 with the error "a wild NaN appeared".
This is the command I used:
! cellbender remove-background --input ./drive/My\ Drive/Thirst_cellranger/G1_rep1/raw_feature_bc_matrix.h5 --output ./G1_rep1_v14.h5 --cuda --expected-cells 12240 --total-droplets-included 30000 --epochs 200
See the top and bottom of the screenshot of the log file.
@deevdevil88 and @LouisFaure, following up on this thread: did you manage to get the result you were looking for? Also, it would help us understand the behavior of CellBender better if you could take a peek at what you believe to be empty droplets. For example, if you try embedding and clustering the presumed empties (e.g. barcodes ranking 10k-30k), do you see any reasonable biological structure?
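Selecting the suggested barcode-rank window can be sketched with numpy — synthetic counts here; in practice the counts would come from the raw matrix, and the selected barcodes would then go through the usual embedding and clustering steps:

```python
import numpy as np

rng = np.random.default_rng(1)
total_counts = rng.integers(10, 50_000, size=40_000)  # synthetic per-barcode UMIs

# Rank barcodes by descending total counts, then take ranks 10,000-30,000.
order = np.argsort(total_counts)[::-1]
presumed_empty = order[10_000:30_000]
print(presumed_empty.size)  # → 20000 barcodes to embed/cluster downstream
```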
@mbabadi thank you for the great suggestion! I did the following:
From the filtered output of remove-background, I applied only a filter on mitochondrial proportion (cells with less than 10% of counts from these genes are shown in blue):
The filtering kept some cells with around 1000 UMIs. If we look at the low-UMI cells that also passed the threshold (all TRUE cells after rank 25000 are shown in blue on the right plot), here is how they look on the UMAP embedding:
As we can see, the low-UMI cells actually co-localise with other high-quality, relevant cells; they also have a higher proportion of mitochondrial genes.
So it is clear that the plateau is not empty droplets, but rather a very high proportion of dying cells! I think this explains why remove-background is having trouble correcting them.
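The mitochondrial filter described above amounts to something like the following — pure numpy on synthetic data; a real pipeline would compute the fraction from genes whose names start with "MT-" (or "mt-" for mouse):

```python
import numpy as np

rng = np.random.default_rng(2)
total_counts = rng.integers(500, 20_000, size=5_000)
mito_counts = (total_counts * rng.uniform(0.0, 0.4, size=5_000)).astype(int)

# Fraction of counts from mitochondrial genes, per barcode.
pct_mito = mito_counts / total_counts

# Keep barcodes below the 10% threshold used above.
keep = pct_mito < 0.10
print(f"{keep.mean():.0%} of barcodes pass the mito filter")
```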
Very interesting @LouisFaure !
This brings up an interesting point... while we say that remove-background outputs a "cell probability", the more accurate statement would be that it outputs a "probability that a droplet is not empty". So in our model, there is no distinction between a good cell and a dead cell; there is only a distinction between "empty" and "non-empty". It's probably the case that, because the dying cells do not look like empty droplets (lots of mitochondrial reads, and probably distinct in some other ways as well), they are being (correctly) identified as "non-empty".