Comments (15)
@sjfleming
Yes, I saw that and we are using this now. Also, we have a collaborator who is letting me run the samples on their GPU. I have been using this and the samples run fine :)
Thanks
Devika
from cellbender.
It's not clear if the v2 branch will help with this issue (yet), although once v2 is done, it will be a significant improvement in a number of ways.
We hope that cell calling is one of those improvements... but v2 is not complete yet. I think you may see some improvement in the current state of v2, but I am working on a few more ways to address this.
For now, what you can count on is: remove-background v1 does not leave cells out. All cells will be called cells. But it will pick up some empty droplets. This is worse in some datasets than others. Currently, the best practice is to filter those out based on other QC metrics downstream.
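Downstream QC filtering of the kind described here is usually done on simple per-barcode metrics such as total UMI counts and genes detected. A minimal numpy sketch on synthetic data — the thresholds (500 counts, 200 genes) are purely illustrative, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-barcode QC metrics: total UMI counts and genes detected.
total_counts = rng.integers(50, 5000, size=1000)
genes_detected = (total_counts * rng.uniform(0.1, 0.5, size=1000)).astype(int)

# Keep barcodes passing both illustrative thresholds.
keep = (total_counts >= 500) & (genes_detected >= 200)
print(f"kept {keep.sum()} of {keep.size} barcodes")
```

In a real analysis, the same boolean-mask pattern would be applied to metrics computed from the count matrix itself (e.g. via scanpy's QC helpers).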
#42 explains overparameterization; essentially, you have to keep lowering the dimensions until training is stable.
Thanks @cnk113 for your answer!
I actually have already read this issue, and I did not feel concerned, as I thought the learning curve looked stable for my first run with default parameters. I do agree that the second run was indeed overparameterized.
Following your suggestion I tried reducing --z-dim (keeping the other parameters at default), and it turns out that lowering it as far as 3 dimensions still does not lead to a proper separation. I also tried reducing --z-layers, without success:
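For reference, a run with a reduced latent dimension would look something like this — the paths are placeholders, and the specific values of --z-dim and --z-layers are just examples, not recommendations:

```shell
# Placeholder paths; --z-dim and --z-layers shrink the latent space.
cellbender remove-background \
    --input raw_feature_bc_matrix.h5 \
    --output output_filtered.h5 \
    --cuda \
    --epochs 200 \
    --z-dim 3 \
    --z-layers 100
```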
Hi @LouisFaure,
The algorithm really does seem to be struggling on this kind of dataset. I had never tested it with data that has such a high number of ambient counts. While it should work in principle, I see that it hasn't worked well.
I am adding several things in remove-background "version 2" (which should be out within a couple of weeks... hopefully) that should help in this case.
... as a matter of fact, if you wouldn't mind, and if you're running on Google Colab with your own CellBender install, could you try running the current v2 branch, sf_removebkg_v2? I wonder if the changes I've made so far address the issue.
Thanks!
Also, for a dataset like this, it is helpful to include the parameter --low-count-threshold. It is not completely necessary, but it is often helpful just so that the prior estimates don't get confused. --low-count-threshold just excludes all barcodes with counts below a given value from the very outset, as if they weren't even part of the dataset.
For this dataset, perhaps use --low-count-threshold 1000, which is well below the level of the empty droplet plateau. That long tail of empties with counts ~30 will get totally excluded.
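Concretely, adding the flag to a run might look like this (the input/output paths are placeholders):

```shell
# Exclude all barcodes with fewer than 1000 counts from the outset.
cellbender remove-background \
    --input raw_feature_bc_matrix.h5 \
    --output output_filtered.h5 \
    --cuda \
    --low-count-threshold 1000
```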
Take a look at the first post by @LouisFaure here. He mentions running CellBender on a Google Colab notebook, which gives you free access to a GPU. I wonder if this could be an option for you?
Hi @sjfleming,
Thanks for your answer. I think the default parameters already do a pretty good job of guessing the prior (it was excluding barcodes with counts below 1445), so the long tail of very-low-UMI barcodes should already be removed. I even manually raised the value further, to 1900, to see if anything changed, but unfortunately this does not seem to improve anything.
Fun fact: at some point I mistakenly stopped a run at the first epoch, and here the log plot looks much nicer!
But I guess one shouldn't trust such a short run, right?
Finally, I tried to quickly check version 2, but a module might be missing:
Traceback (most recent call last):
File "/usr/local/bin/cellbender", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/cellbender/base_cli.py", line 91, in main
cli_dict = generate_cli_dictionary()
File "/usr/local/lib/python3.6/dist-packages/cellbender/base_cli.py", line 52, in generate_cli_dictionary
module_cli = importlib.import_module('.'.join(module_cli_str_list))
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/cli.py", line 4, in <module>
from cellbender.remove_background.data.dataset import SingleCellRNACountsDataset
File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/data/dataset.py", line 11, in <module>
import cellbender.remove_background.model
File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/model.py", line 21, in <module>
from cellbender.remove_background.distributions.PoissonImportanceMarginalizedGamma \
ModuleNotFoundError: No module named 'cellbender.remove_background.distributions.PoissonImportanceMarginalizedGamma'
Oh, yes, that's a stale import from a file that's not committed. I just pushed a fix that should address that on the v2 branch.
As far as the run with 1 epoch: you are basically visualizing the initialization that the algorithm starts with. So I am quite happy about a good initialization. But something seems to be making it decide, over the course of training, that a lot of those droplets out between 30k - 50k are real... this is something I need to look into. There is a fine balancing act between wanting to (1) explain the data accurately by saying "there is a cell with this specific gene expression" and (2) obeying the prior that lower-count droplets are empty. Getting this balance exactly right and robust is something we are actively working on.
hi @sjfleming
I have had the same issue as Louis while running on a GPU using Google Colab with my own install of CellBender: cells between 20K and 30K are thought to be real, and there is no real separation when running default parameters (even though training has converged), nor when increasing the dims and layers. I haven't tried the other end, where I reduce the parameters. So running the v2 branch should help with this?
Hi @sjfleming. Looking forward to the completed v2 when it's ready. In the meanwhile I will give it a go on my data in its current state and post the results here. Thanks again.
Hi @sjfleming
I tested the removebkg_v2 branch on the same sample. While it did run, it failed at epoch 087 with the error "a wild NaN appeared".
This is the command I used:
! cellbender remove-background --input ./drive/My\ Drive/Thirst_cellranger/G1_rep1/raw_feature_bc_matrix.h5 --output ./G1_rep1_v14.h5 --cuda --expected-cells 12240 --total-droplets-included 30000 --epochs 200
See the top and bottom of the screenshot of the log file.
@deevdevil88 and @LouisFaure, following up on this thread: did you manage to get the result you were looking for? Also, it would help us understand the behavior of CellBender better if you could take a peek at what you believe to be empty droplets. For example, if you try embedding and clustering the presumed empties (e.g. barcodes ranking 10k-30k), do you see any reasonable biological structure?
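Selecting the suggested barcode-rank window can be sketched with numpy — synthetic counts here; in practice the counts would come from the raw matrix, and the selected barcodes would then go through the usual embedding and clustering steps:

```python
import numpy as np

rng = np.random.default_rng(1)
total_counts = rng.integers(10, 50_000, size=40_000)  # synthetic per-barcode UMIs

# Rank barcodes by descending total counts, then take ranks 10,000-30,000.
order = np.argsort(total_counts)[::-1]
presumed_empty = order[10_000:30_000]
print(presumed_empty.size)  # → 20000 barcodes to embed/cluster downstream
```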
@mbabadi thank you for the great suggestion! I did the following:
From the filtered output of remove-background, I applied only a filter on mitochondrial proportion (cells with less than 10% of counts from these genes are shown in blue):
The filtering kept some cells with around 1000 UMIs. If we look at the low-UMI cells that also passed the threshold (all TRUE cells after rank 25000 are shown in blue on the right plot), here is how they look on the UMAP embedding:
As we can see, the low-UMI cells actually co-localise with other high-quality, relevant cells; they also have a higher proportion of mitochondrial genes.
So it is clear that the plateau is not empty droplets, but rather a very high proportion of dying cells! I think this explains why remove-background is having trouble correcting them.
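The mitochondrial filter described above amounts to something like the following — pure numpy on synthetic data; a real pipeline would compute the fraction from genes whose names start with "MT-" (or "mt-" for mouse):

```python
import numpy as np

rng = np.random.default_rng(2)
total_counts = rng.integers(500, 20_000, size=5_000)
mito_counts = (total_counts * rng.uniform(0.0, 0.4, size=5_000)).astype(int)

# Fraction of counts from mitochondrial genes, per barcode.
pct_mito = mito_counts / total_counts

# Keep barcodes below the 10% threshold used above.
keep = pct_mito < 0.10
print(f"{keep.mean():.0%} of barcodes pass the mito filter")
```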
Very interesting @LouisFaure !
This brings up an interesting point... while we say that remove-background outputs a "cell probability", the more accurate statement would be that it outputs a "probability that a droplet is not empty". So in our model, there is no distinction between a good cell and a dead cell; there is only a distinction between "empty" and "non-empty". It's probably the case that, because the dying cells do not look like empty droplets (lots of mitochondrial reads, and probably distinct in some other ways as well), they are being (correctly) identified as "non-empty".