Comments (9)
This run might not have totally converged, but this is the result of running
cellbender remove-background --input 10k_hgmm_v3_nextgem_raw_feature_bc_matrix.h5 --output 10k_hgmm_v3_nextgem_out.h5 --cuda --expected-cells 10000 --total-droplets-included 20000 --epochs 300 --z-dim 20 --z-layers 100
from cellbender.
Just wanted to add that the HgMm mix datasets are sometimes problematic for assessing ambient RNA: I've found that 0%-3% of UMIs (depending on cell type) end up in cells of the wrong species simply due to mismapping caused by genome, annotation, and sequencing errors.
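To make that number concrete, here is a minimal sketch of how a per-cell cross-species fraction could be computed. The function name and the example counts are made up for illustration and do not come from any real dataset or from CellBender itself.

```python
def minor_species_fraction(human_umis: int, mouse_umis: int) -> float:
    """Fraction of a cell's UMIs assigned to the less abundant species."""
    total = human_umis + mouse_umis
    if total == 0:
        raise ValueError("cell has no UMIs")
    return min(human_umis, mouse_umis) / total


# Hypothetical (human, mouse) UMI totals per cell barcode.
cells = {
    "cell_A": (9_800, 200),   # mostly human
    "cell_B": (150, 4_850),   # mostly mouse
}

fractions = {
    name: minor_species_fraction(human, mouse)
    for name, (human, mouse) in cells.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of UMIs map to the other species")
```

A nonzero fraction like this in a clean singlet can come from mismapping as well as from ambient RNA, so it is better treated as an upper bound on ambient contamination.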
Found it. It was coming from the use of the datatype uint16 to store gene indices during the creation of the output sparse count matrix...
I guess at some point way back, I thought, "There won't be transcriptomes with more than 65k genes, right?" Not right.
I will push a fix for this soon.
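For anyone curious what this kind of overflow looks like, here is a small standard-library sketch. It is not the CellBender code: the gene count of 70,000, the entries, and the helper names are all hypothetical. A uint16 can only hold values up to 65535, so any larger gene index wraps around and its counts land on a low-numbered gene.

```python
import ctypes
from collections import Counter

N_GENES = 70_000  # hypothetical combined two-genome feature count (> 65,536)


def to_uint16(index: int) -> int:
    """Store an index in an unsigned 16-bit integer, which wraps silently."""
    return ctypes.c_uint16(index).value


def accumulate(entries, cast):
    """Sum counts per gene index after passing each index through `cast`."""
    counts = Counter()
    for gene_index, count in entries:
        counts[cast(gene_index)] += count
    return counts


# Hypothetical (gene_index, count) entries for a single cell.
entries = [(10, 5), (65_535, 2), (65_536, 7), (N_GENES - 1, 3)]

wrapped = accumulate(entries, to_uint16)  # indices stored as uint16
correct = accumulate(entries, int)        # indices stored without truncation

print("uint16 indices:", dict(wrapped))
print("wide indices:  ", dict(correct))
```

The total count is preserved, but the counts for indices 65536 and 69999 are reassigned to genes 0 and 4463. Storing the indices in a wider unsigned integer type avoids the wraparound.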
I believe I have a parsing error in the newer v3 format HDF5 files from CellRanger that involve multiple genomes! I will track down this bug asap.
Thanks for reporting. I think what you're seeing there is essentially a garbled output due to an input parsing error.
For an urgent workaround, I believe you can input your data using the CellRanger mtx directory format, and then even the v3 multiple-genome data should be parsed correctly. But this is a hunch, and I still need to try it myself. Either way, I will be working on fixing that bug soon.
Thank you for the reply! If what I saw was simply garbled output, I would expect to see some cells with high mouse gene counts. The fact that 1) 90% of mouse gene counts are removed from all cells, 2) tens of thousands of human gene counts are added to cells that originally had only hundreds, and 3) the inferred priors and cutoffs look correct makes me suspect it might be due to something else.
Tried supplying mtx and got exactly the same result.
The code for loading/parsing input seems alright. Though it doesn't read "/matrix/features/genome", the inference shouldn't care about an extra gene label, should it?
CellBender/cellbender/remove_background/data/dataset.py
Lines 845 to 871 in d68bf9d
What I notice is that in hgmm5k_v3 and hgmm10k_v3, nUMI per cell is distinctly lower in mouse cells than in human cells, whereas in hgmm12k_v2 the difference is smaller. See plots below (blue: human, green: mouse, green: empty droplets)
Could this distribution have somehow confused the method into (partially) modeling mouse cells as empty droplets?
I will look into that, but the fact that you gave it the --expected-cells parameter should enable it to figure out a good prior on cell counts that can cover both human and mouse...
Thank you for the quick fix!