Hi, Thanks for you great work. Now I'm trying do perform tangram with squidpy.

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

prior density about tangram HOT 4 CLOSED

broadinstitute commented on June 19, 2024

prior density

from tangram.

Comments (4)

wangjiawen2013 commented on June 19, 2024

I have checked some chunk of code of tangram such as mapping utils.py. I am confused about the algorithm to compute prior density based on rna count. Because in the code, the adata.X was used. Seurat and Scanpy advice to scale the data and the scaled data is always stored in adata.X. This is what most seurat and scanpy user's do. In this case, the prior desity cannot be computed based on adata.X and that's why I got negative values. Furthermore, if the logarithmized normalized data are stored in adata.X, tangram still cannot get proper prior density, because the data has been already normalized and the sum of each voxel will all approximately equal ! And that's why the rna density almost the same in the results of tangram tutorial (https://tangram-sc.readthedocs.io/en/latest/tutorial_sq_link.html#). The following the how the density is computed according to tangram source code:
def pp_adatas(adata_sc, adata_sp, genes=None):
"""
Pre-process AnnDatas so that they can be mapped. Specifically:
- Remove genes that all entries are zero
- Find the intersection between adata_sc, adata_sp and given marker gene list, save the intersected markers in two adatas
- Calculate density priors and save it with adata_sp

Args:
    adata_sc (AnnData): single cell data
    adata_sp (AnnData): spatial expression data
    genes (List): Optional. List of genes to use. If `None`, all genes are used.

Returns:
    update adata_sc by creating `uns` `training_genes` `overlap_genes` fields 
    update adata_sp by creating `uns` `training_genes` `overlap_genes` fields and creating `obs` `rna_count_based_density` & `uniform_density` field
"""

# put all var index to lower case to align
adata_sc.var.index = [g.lower() for g in adata_sc.var.index]
adata_sp.var.index = [g.lower() for g in adata_sp.var.index]

adata_sc.var_names_make_unique()
adata_sp.var_names_make_unique()

# remove all-zero-valued genes
sc.pp.filter_genes(adata_sc, min_cells=1)
sc.pp.filter_genes(adata_sp, min_cells=1)

if genes is None:
    # Use all genes
    genes = [g.lower() for g in adata_sc.var.index]
else:
    genes = list(g.lower() for g in genes)

# Refine `marker_genes` so that they are shared by both adatas
genes = list(set(genes) & set(adata_sc.var.index) & set(adata_sp.var.index))
# logging.info(f"{len(genes)} shared marker genes.")

adata_sc.uns["training_genes"] = genes
adata_sp.uns["training_genes"] = genes
logging.info(
    "{} training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.".format(
        len(genes)
    )
)

# Find overlap genes between two AnnDatas
overlap_genes = list(set(adata_sc.var.index) & set(adata_sp.var.index))
# logging.info(f"{len(overlap_genes)} shared genes.")

adata_sc.uns["overlap_genes"] = overlap_genes
adata_sp.uns["overlap_genes"] = overlap_genes
logging.info(
    "{} overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.".format(
        len(overlap_genes)
    )
)

# Calculate uniform density prior as 1/number_of_spots
adata_sp.obs["uniform_density"] = np.ones(adata_sp.X.shape[0]) / adata_sp.X.shape[0]
logging.info(
    f"uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata."
)

# Calculate rna_count_based density prior as % of rna molecule count
rna_count_per_spot = np.array(adata_sp.X.sum(axis=1)).squeeze()
adata_sp.obs["rna_count_based_density"] = rna_count_per_spot / np.sum(rna_count_per_spot)
logging.info(
    f"rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata."
)

from tangram.

wangjiawen2013 commented on June 19, 2024

Besides, I am not sure if tangram's cluster expression works on scaled data. probably it works, if the single cell data and spatial transcriptome both use the scaled data.

from tangram.

lewlin commented on June 19, 2024

Sorry for the super late response - since I changed job my time to respond messages is very small.

The "density_prior" is your best estimate for the cell density within spatial voxels. For example, if you segment cells on histology for Visium, you know exactly how many cells you have per voxel (and that's your density). Note that you don't need the absolute number of cells, but simply the density (so overall multiplied by an arbitrary factor). If you work with MERFISH, density is uniform as you have one cell per voxel.

Typically, on Visium, we cannot really segment cells. So we assume that the cell density is proportional to the number of RNA molecules (works great, really). Use the rna_count_based to pass the RNA counts as a proxy for cell density.

As per negative values, let me ask somebody to review the code.

from tangram.

gaddamshreya commented on June 19, 2024

Hi @wangjiawen2013 Thank you for using Tangram! You're right about the source of the negative values i.e., lack of raw counts.
@Hejin0701 and I reviewed your issue and it is recommended to use raw counts for spatial data in Tangram.

from tangram.

prior density about tangram HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent