Comments (4)
I have looked through parts of Tangram's code, such as mapping_utils.py, and I am confused about how the prior density is computed from the RNA counts, because the code uses adata.X. Seurat and Scanpy advise scaling the data, and the scaled data is typically stored in adata.X. This is what most Seurat and Scanpy users do. In that case the prior density cannot be computed from adata.X, and that is why I got negative values. Furthermore, if log-transformed normalized data is stored in adata.X, Tangram still cannot obtain a proper prior density: the data has already been normalized, so the sums over the voxels are all approximately equal. That is also why the RNA density is almost the same everywhere in the results of the Tangram tutorial (https://tangram-sc.readthedocs.io/en/latest/tutorial_sq_link.html#). The following is how the density is computed according to the Tangram source code:
def pp_adatas(adata_sc, adata_sp, genes=None):
    """
    Pre-process AnnDatas so that they can be mapped. Specifically:
    - Remove genes that all entries are zero
    - Find the intersection between adata_sc, adata_sp and given marker gene list, save the intersected markers in two adatas
    - Calculate density priors and save it with adata_sp
    Args:
        adata_sc (AnnData): single cell data
        adata_sp (AnnData): spatial expression data
        genes (List): Optional. List of genes to use. If `None`, all genes are used.
    Returns:
        update adata_sc by creating `uns` `training_genes` `overlap_genes` fields
        update adata_sp by creating `uns` `training_genes` `overlap_genes` fields and creating `obs` `rna_count_based_density` & `uniform_density` field
    """
    # put all var index to lower case to align
    adata_sc.var.index = [g.lower() for g in adata_sc.var.index]
    adata_sp.var.index = [g.lower() for g in adata_sp.var.index]
    adata_sc.var_names_make_unique()
    adata_sp.var_names_make_unique()
    # remove all-zero-valued genes
    sc.pp.filter_genes(adata_sc, min_cells=1)
    sc.pp.filter_genes(adata_sp, min_cells=1)
    if genes is None:
        # Use all genes
        genes = [g.lower() for g in adata_sc.var.index]
    else:
        genes = list(g.lower() for g in genes)
    # Refine `marker_genes` so that they are shared by both adatas
    genes = list(set(genes) & set(adata_sc.var.index) & set(adata_sp.var.index))
    # logging.info(f"{len(genes)} shared marker genes.")
    adata_sc.uns["training_genes"] = genes
    adata_sp.uns["training_genes"] = genes
    logging.info(
        "{} training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.".format(
            len(genes)
        )
    )
    # Find overlap genes between two AnnDatas
    overlap_genes = list(set(adata_sc.var.index) & set(adata_sp.var.index))
    # logging.info(f"{len(overlap_genes)} shared genes.")
    adata_sc.uns["overlap_genes"] = overlap_genes
    adata_sp.uns["overlap_genes"] = overlap_genes
    logging.info(
        "{} overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.".format(
            len(overlap_genes)
        )
    )
    # Calculate uniform density prior as 1/number_of_spots
    adata_sp.obs["uniform_density"] = np.ones(adata_sp.X.shape[0]) / adata_sp.X.shape[0]
    logging.info(
        f"uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata."
    )
    # Calculate rna_count_based density prior as % of rna molecule count
    rna_count_per_spot = np.array(adata_sp.X.sum(axis=1)).squeeze()
    adata_sp.obs["rna_count_based_density"] = rna_count_per_spot / np.sum(rna_count_per_spot)
    logging.info(
        f"rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata."
    )
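To make the concern concrete, here is a minimal NumPy-only sketch that applies the same row-sum arithmetic as the last lines of the function to three versions of a toy matrix. The toy data, the helper name `density`, and the preprocessing steps (a plain library-size normalization with log1p, and a per-gene z-score standing in for scaling) are my own assumptions for illustration; they are not taken from Tangram or Scanpy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 spots x 50 genes, with very different depth per spot.
depth = np.array([200, 500, 1000, 2000, 4000])
raw = rng.poisson(lam=depth[:, None] / 50, size=(5, 50)).astype(float)


def density(X):
    """Row sums divided by the grand total, as in the rna_count_based_density lines."""
    per_spot = np.asarray(X.sum(axis=1)).squeeze()
    return per_spot / per_spot.sum()


# (a) Raw counts: the density follows the sequencing depth of each spot.
d_raw = density(raw)

# (b) Library-size normalized + log1p: every spot is rescaled to the same total
#     before the log, so the row sums end up close to each other.
lognorm = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)
d_lognorm = density(lognorm)

# (c) Per-gene z-scores (a stand-in for scaled data): each column sums to ~0,
#     so the row sums also sum to ~0 and some of them must be negative.
scaled = (lognorm - lognorm.mean(axis=0)) / lognorm.std(axis=0)
row_sums_scaled = scaled.sum(axis=1)

print("raw      max/min:", d_raw.max() / d_raw.min())
print("lognorm  max/min:", d_lognorm.max() / d_lognorm.min())
print("scaled   row sums:", row_sums_scaled)
```

On this toy example the raw-count density spans the full depth range, the log-normalized one is nearly flat, and the scaled matrix has negative row sums, which matches what I observed.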
Besides, I am not sure whether Tangram's cluster-level expression works on scaled data. It probably does, as long as the single-cell and spatial transcriptomics data are both scaled.
Sorry for the very late response. Since I changed jobs, I have very little time to respond to messages.
The "density_prior" is your best estimate of the cell density within the spatial voxels. For example, if you segment cells on histology for Visium, you know exactly how many cells you have per voxel, and that is your density. Note that you don't need the absolute number of cells, only the density, i.e. the cell counts up to an arbitrary multiplicative factor. If you work with MERFISH, the density is uniform because you have one cell per voxel.
Typically, on Visium, we cannot really segment cells, so we assume that the cell density is proportional to the number of RNA molecules (this works really well). Use rna_count_based as the density prior to pass the RNA counts as a proxy for cell density.
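As a rough illustration of the point about arbitrary factors, the sketch below turns per-voxel cell counts into a density that sums to 1. The helper name `density_from_counts` and the example numbers are hypothetical, not part of Tangram.

```python
import numpy as np


def density_from_counts(counts_per_voxel):
    """Normalize non-negative per-voxel counts so that they sum to 1."""
    counts = np.asarray(counts_per_voxel, dtype=float)
    if (counts < 0).any():
        raise ValueError("counts must be non-negative")
    total = counts.sum()
    if total == 0:
        raise ValueError("at least one voxel must have a non-zero count")
    return counts / total


# Hypothetical segmentation result: number of cells found in each of 4 voxels.
cells_per_voxel = np.array([3, 0, 7, 10])
d_cells = density_from_counts(cells_per_voxel)

# Multiplying all counts by any positive factor gives the same density.
d_rescaled = density_from_counts(cells_per_voxel * 2.5)

# One cell per voxel (the MERFISH case described above) gives a uniform density.
d_uniform = density_from_counts(np.ones(4))
```

The same helper applied to raw per-spot RNA totals reproduces the RNA-count-based proxy; as far as I understand, such a vector can also be supplied as the density prior directly, but please check the Tangram documentation for the accepted types.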
As for the negative values, let me ask somebody to review the code.
Hi @wangjiawen2013, thank you for using Tangram! You're right about the source of the negative values, namely the lack of raw counts.
@Hejin0701 and I reviewed your issue, and we recommend using raw counts for the spatial data in Tangram.
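A quick sanity check along these lines can catch the problem before preprocessing. This is only a heuristic sketch: the function name `looks_like_raw_counts` is hypothetical, and it assumes a dense array (for a SciPy sparse matrix, the same test could be run on its stored non-zero values).

```python
import numpy as np


def looks_like_raw_counts(X, tol=1e-6):
    """Heuristic: raw counts are non-negative and (numerically) integer-valued."""
    X = np.asarray(X, dtype=float)
    non_negative = bool((X >= 0).all())
    integer_valued = bool(np.allclose(X, np.round(X), rtol=0.0, atol=tol))
    return non_negative and integer_valued


# Toy matrices (made up for illustration).
raw = np.array([[0, 3], [5, 1]])
log_transformed = np.log1p(raw)                # non-integer values
centered = raw - raw.mean(axis=0)              # contains negative values

print(looks_like_raw_counts(raw))              # True
print(looks_like_raw_counts(log_transformed))  # False
print(looks_like_raw_counts(centered))         # False
```

If the check fails on the spatial matrix, the raw counts may still be available elsewhere in the object, depending on how the data was processed.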