
deevdevil88 commented on August 13, 2024

@sjfleming
Yes, I saw that and we are using it now. Also, we have a collaborator who is letting me run the samples on their GPU. I have been using it, and the samples run fine :)

Thanks
Devika

from cellbender.

sjfleming commented on August 13, 2024

It's not clear if the v2 branch will help with this issue (yet), although once v2 is done, it will be a significant improvement in a number of ways.

We hope that cell calling is one of those improvements... but v2 is not complete yet. I think you may see some improvement in the current state of v2, but I am working on a few more ways to address this.

For now, what you can count on is: remove-background v1 does not leave cells out. All cells will be called cells. But it will pick up some empty droplets. This is worse in some datasets than others. Currently, the best practice is to filter those out based on other QC metrics downstream.
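As a rough illustration of that downstream step, here is a minimal plain-Python sketch. It assumes a hypothetical dict mapping each barcode that CellBender called as a cell to its (total UMIs, genes detected) pair; the function name and the thresholds are illustrative assumptions, not CellBender defaults, and in practice they would be tuned per dataset.

```python
def qc_filter(called, min_umis=500, min_genes=200):
    """Return the called barcodes that pass simple downstream QC thresholds.

    `called` is a hypothetical dict: barcode -> (total_umis, genes_detected)
    for droplets that remove-background marked as non-empty. The default
    thresholds are illustrative only.
    """
    return [
        bc
        for bc, (umis, genes) in called.items()
        if umis >= min_umis and genes >= min_genes
    ]


called = {
    "AAAC-1": (12000, 3100),  # clearly a cell
    "AAAG-1": (350, 120),     # likely an empty droplet that was called anyway
    "AACT-1": (900, 410),     # low count, but passes both thresholds
}
print(qc_filter(called))  # ['AAAC-1', 'AACT-1']
```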

cnk113 commented on August 13, 2024

#42 explains overparameterization, but essentially you have to keep lowering the dimensions until training is stable.

LouisFaure commented on August 13, 2024

Thanks @cnk113 for your answer!

I had actually already read that issue, and I did not feel concerned, since I thought the learning curve of my first run with default parameters looked stable. I do agree that the second run was indeed overparameterized.

Following your suggestion, I tried reducing z-dims (keeping the other parameters at their defaults), and it turns out that even lowering it to 3 dims still does not lead to a proper separation. I also tried reducing zlayers, without success:
[Image: tests]

sjfleming commented on August 13, 2024

Hi @LouisFaure,

The algorithm really does seem to be struggling on this kind of dataset. I had never tested it with data that has such a high number of ambient counts. While it should work in principle, I see that it hasn't worked well.

I am adding several things in remove-background "version 2" (which should be out within a couple weeks... hopefully) that should help in this case.

As a matter of fact, if you're running on Google Colab with your own CellBender install, would you mind trying to run the current v2 branch, sf_removebkg_v2? I wonder whether the changes I've made so far address the issue.

Thanks!

sjfleming commented on August 13, 2024

Also, for a dataset like this, it is helpful to include the parameter --low-count-threshold. It is not completely necessary, but it is often helpful just so that the prior estimates don't get confused. --low-count-threshold just excludes all barcodes with counts below a given value from the very outset, as if they weren't even part of the dataset.

For this dataset, perhaps use --low-count-threshold 1000, which is well below the level of the empty droplet plateau. That long tail of empties with counts ~30 will get totally excluded.
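To make the described behaviour concrete, the sketch below mimics the effect of --low-count-threshold on a toy dict of per-barcode UMI totals. It illustrates the idea only and is not CellBender's implementation; the function name is hypothetical, and treating "below" as a strict inequality is an assumption.

```python
def apply_low_count_threshold(totals, threshold):
    """Drop every barcode whose total UMI count is below `threshold`.

    Hypothetical helper: the excluded barcodes are simply removed, as if
    they had never been part of the dataset.
    """
    return {bc: n for bc, n in totals.items() if n >= threshold}


totals = {
    "cell": 15000,          # a real cell
    "empty_plateau": 1800,  # an empty droplet on the ambient plateau
    "long_tail": 30,        # the long tail of near-zero barcodes
}
kept = apply_low_count_threshold(totals, 1000)
print(sorted(kept))  # ['cell', 'empty_plateau']
```

With a threshold of 1000, the plateau of empties is still available for estimating the ambient profile, while the ~30-count tail never enters the analysis.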

sjfleming commented on August 13, 2024

@deevdevil88

Take a look at the first post by @LouisFaure here. He mentions running CellBender on a Google Colab notebook, which gives you free access to a GPU. I wonder if this could be an option for you?

LouisFaure commented on August 13, 2024

Hi @sjfleming,

Thanks for your answer. I think the default parameters already do a pretty good job of guessing the prior (it was excluding barcodes with counts below 1445), so the whole long tail of very low-UMI barcodes should already be removed. I even raised the value manually to 1900 to see whether anything changed, but unfortunately this does not seem to improve anything.

Fun fact: at some point I mistakenly stopped a run after the first epoch, and there the log plot looks much nicer!
[Image]
But I guess one shouldn't trust such a short run, right?

Finally, I tried a quick check with version 2, but a module seems to be missing:

```
Traceback (most recent call last):
  File "/usr/local/bin/cellbender", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/cellbender/base_cli.py", line 91, in main
    cli_dict = generate_cli_dictionary()
  File "/usr/local/lib/python3.6/dist-packages/cellbender/base_cli.py", line 52, in generate_cli_dictionary
    module_cli = importlib.import_module('.'.join(module_cli_str_list))
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/cli.py", line 4, in <module>
    from cellbender.remove_background.data.dataset import SingleCellRNACountsDataset
  File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/data/dataset.py", line 11, in <module>
    import cellbender.remove_background.model
  File "/usr/local/lib/python3.6/dist-packages/cellbender/remove_background/model.py", line 21, in <module>
    from cellbender.remove_background.distributions.PoissonImportanceMarginalizedGamma \
ModuleNotFoundError: No module named 'cellbender.remove_background.distributions.PoissonImportanceMarginalizedGamma'
```

sjfleming commented on August 13, 2024

Oh, yes, that's a stale import from a file that's not committed. I just pushed a fix that should address that on the v2 branch.

As for the run with 1 epoch: you are basically visualizing the initialization that the algorithm starts with. So I am quite happy about a good initialization. But something seems to be making it decide, over the course of training, that a lot of those droplets between 30k and 50k are real... this is something I need to look into. There is a fine balancing act between wanting to (1) explain the data accurately by saying "there is a cell with this specific gene expression" and (2) obeying the prior that lower-count droplets are empty. Getting this balance exactly right and robust is something we are actively working on.

deevdevil88 commented on August 13, 2024

hi @sjfleming
I have had the same issue as Louis while running on a GPU in Google Colab with my own install of CellBender: droplets between 20K and 30K are thought to be real, and there is no clear separation, both with default parameters (even though training has converged) and after increasing the dims and layers. I haven't tried the other direction, where I reduce the parameters. So would running the v2 branch help with this?

[Image]

[Image]

deevdevil88 commented on August 13, 2024

Hi @sjfleming. I'm looking forward to the completed v2 when it's ready. In the meantime, I will give it a go on my data in its current state and post the results here. Thanks again.

deevdevil88 commented on August 13, 2024

Hi @sjfleming
I tested the removebkg_v2 branch on the same sample. It did run, but it failed at epoch 087 with the error "a wild NaN appeared".
This is the command I used:
```
! cellbender remove-background --input ./drive/My\ Drive/Thirst_cellranger/G1_rep1/raw_feature_bc_matrix.h5 --output ./G1_rep1_v14.h5 --cuda --expected-cells 12240 --total-droplets-included 30000 --epochs 200
```

See the top and bottom of the log file in the screenshots:
[Screenshot: top of the log file]
[Screenshot: bottom of the log file]

mbabadi commented on August 13, 2024

@deevdevil88 and @LouisFaure, following up on this thread: did you manage to get the result you were looking for? Also, it would help us understand the behavior of CellBender better if you could take a peek at what you believe to be empty droplets. For example, if you embed and cluster the presumed empties (e.g. barcodes ranked 10k-30k), do you see any reasonable biological structure?
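Selecting those barcodes by rank is straightforward once droplets are sorted by total UMI count. A minimal sketch, assuming a hypothetical dict of per-barcode totals and 1-based inclusive ranks (so the suggestion above would be `first_rank=10_000, last_rank=30_000`); the helper name is illustrative, and ties keep their input order because Python's `sorted` is stable.

```python
def barcodes_in_rank_range(totals, first_rank, last_rank):
    """Return barcodes whose 1-based rank, by descending total UMI count,
    lies in [first_rank, last_rank] (inclusive on both ends)."""
    if first_rank < 1 or last_rank < first_rank:
        raise ValueError("expected 1 <= first_rank <= last_rank")
    # Highest-count barcode gets rank 1; ties keep their original order.
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[first_rank - 1:last_rank]


toy_totals = {"a": 50, "b": 5000, "c": 900, "d": 20000}
print(barcodes_in_rank_range(toy_totals, 2, 3))  # ['b', 'c']
```

The selected barcodes could then be embedded and clustered with whatever single-cell toolkit is already in use.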

LouisFaure commented on August 13, 2024

@mbabadi thank you for the great suggestion! I did the following:

Starting from the filtered output of remove-background, I applied only a filter on mitochondrial proportion (cells with less than 10% mitochondrial reads are shown in blue):

[Image]
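The mitochondrial-proportion filter described above can be sketched as follows. This assumes a hypothetical per-barcode dict of gene counts and identifies mitochondrial genes by the "MT-" name prefix used in human annotations (mouse annotations typically use "mt-"); the function names are illustrative, and the 10% cutoff is the one used in this comment.

```python
def mito_fraction(gene_counts, prefix="MT-"):
    """Fraction of a barcode's UMIs that come from mitochondrial genes.

    Mitochondrial genes are identified by a gene-name prefix (an assumption
    that depends on the reference annotation).
    """
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return mito / total


def passes_mito_filter(gene_counts, max_fraction=0.10, prefix="MT-"):
    """True if the mitochondrial fraction is strictly below `max_fraction`."""
    return mito_fraction(gene_counts, prefix) < max_fraction


healthy = {"ACTB": 80, "GAPDH": 15, "MT-CO1": 5}  # 5% mitochondrial
dying = {"ACTB": 20, "MT-CO1": 50, "MT-ND1": 30}  # 80% mitochondrial
print(passes_mito_filter(healthy), passes_mito_filter(dying))  # True False
```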

The filtering kept some cells with around 1000 UMIs. Here is how the low-UMI cells that also passed the threshold (all TRUE cells after rank 25000, shown in blue on the right plot) look on the UMAP embedding:

[Image]

As we can see, all the low-UMI cells actually co-localize with other high-quality, relevant cells, and they also co-localize with cells that have a higher proportion of mitochondrial genes.

So it is clear that the plateau is not made of empty droplets, but rather of a very high proportion of dying cells! I think this explains why remove-background has trouble correcting them.

sjfleming commented on August 13, 2024

Very interesting @LouisFaure !

This brings up an interesting point... while we say that remove-background outputs a "cell probability", the more accurate statement would be that it outputs a "probability that a droplet is not empty". So in our model, there is no distinction between a good cell and a dead cell, there is only a distinction between "empty" and "non-empty". It's probably the case that, because the dying cells do not look like empty droplets (lots of mitochondrial reads, and probably distinct in some other ways as well), they are being (correctly) identified as "non-empty".
