Comments (4)

wangjiawen2013 commented on June 19, 2024

I have checked some of Tangram's code, such as mapping_utils.py, and I am confused about the algorithm that computes the prior density from RNA counts, because the code uses adata.X. Seurat and Scanpy advise scaling the data, and the scaled data is usually stored in adata.X. This is what most Seurat and Scanpy users do. In that case the prior density cannot be computed from adata.X, and that is why I got negative values. Furthermore, if log-normalized data are stored in adata.X, Tangram still cannot get a proper prior density, because the data have already been normalized and the sums over each voxel will all be approximately equal. That is why the RNA density is almost the same everywhere in the results of the Tangram tutorial (https://tangram-sc.readthedocs.io/en/latest/tutorial_sq_link.html#). The following is how the density is computed according to the Tangram source code:
# imports needed by this excerpt
import logging

import numpy as np
import scanpy as sc


def pp_adatas(adata_sc, adata_sp, genes=None):
    """
    Pre-process AnnDatas so that they can be mapped. Specifically:
    - Remove genes whose entries are all zero
    - Find the intersection between adata_sc, adata_sp and given marker gene list, save the intersected markers in two adatas
    - Calculate density priors and save it with adata_sp

    Args:
        adata_sc (AnnData): single cell data
        adata_sp (AnnData): spatial expression data
        genes (List): Optional. List of genes to use. If `None`, all genes are used.

    Returns:
        update adata_sc by creating `uns` `training_genes` `overlap_genes` fields
        update adata_sp by creating `uns` `training_genes` `overlap_genes` fields and creating `obs` `rna_count_based_density` & `uniform_density` field
    """

    # put all var index to lower case to align
    adata_sc.var.index = [g.lower() for g in adata_sc.var.index]
    adata_sp.var.index = [g.lower() for g in adata_sp.var.index]

    adata_sc.var_names_make_unique()
    adata_sp.var_names_make_unique()

    # remove all-zero-valued genes
    sc.pp.filter_genes(adata_sc, min_cells=1)
    sc.pp.filter_genes(adata_sp, min_cells=1)

    if genes is None:
        # Use all genes
        genes = [g.lower() for g in adata_sc.var.index]
    else:
        genes = list(g.lower() for g in genes)

    # Refine `marker_genes` so that they are shared by both adatas
    genes = list(set(genes) & set(adata_sc.var.index) & set(adata_sp.var.index))
    # logging.info(f"{len(genes)} shared marker genes.")

    adata_sc.uns["training_genes"] = genes
    adata_sp.uns["training_genes"] = genes
    logging.info(
        "{} training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.".format(
            len(genes)
        )
    )

    # Find overlap genes between two AnnDatas
    overlap_genes = list(set(adata_sc.var.index) & set(adata_sp.var.index))
    # logging.info(f"{len(overlap_genes)} shared genes.")

    adata_sc.uns["overlap_genes"] = overlap_genes
    adata_sp.uns["overlap_genes"] = overlap_genes
    logging.info(
        "{} overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.".format(
            len(overlap_genes)
        )
    )

    # Calculate uniform density prior as 1/number_of_spots
    adata_sp.obs["uniform_density"] = np.ones(adata_sp.X.shape[0]) / adata_sp.X.shape[0]
    logging.info(
        f"uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata."
    )

    # Calculate rna_count_based density prior as % of rna molecule count
    # (sums whatever is stored in adata_sp.X, so X is expected to hold raw counts)
    rna_count_per_spot = np.array(adata_sp.X.sum(axis=1)).squeeze()
    adata_sp.obs["rna_count_based_density"] = rna_count_per_spot / np.sum(rna_count_per_spot)
    logging.info(
        f"rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata."
    )
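
To make the concern concrete, here is a minimal NumPy-only sketch that applies the same two density lines to a toy matrix. The matrix values, the helper name `rna_count_based_density`, the per-gene z-scoring, and the 1e4 normalization target are all assumptions chosen for illustration. They are not taken from Tangram or from any real dataset.

```python
import numpy as np


def rna_count_based_density(X):
    """Hypothetical helper mirroring the last two density lines of pp_adatas."""
    rna_count_per_spot = np.asarray(X.sum(axis=1)).squeeze()
    return rna_count_per_spot / np.sum(rna_count_per_spot)


# Toy spots-by-genes raw count matrix (made-up numbers).
raw = np.array(
    [[10.0, 0.0, 5.0],
     [1.0, 2.0, 0.0],
     [40.0, 30.0, 20.0]]
)

# Case 1: raw counts give a valid prior (non-negative, sums to 1).
density_raw = rna_count_based_density(raw)

# Case 2: per-gene z-scoring centers every gene at zero, so some spots
# get a negative row sum. Dividing by a single total cannot repair the
# mixed signs, so the result is not a valid density.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)
spot_sums_scaled = scaled.sum(axis=1)

# Case 3: library-size normalization makes every spot sum to the same
# target, so the resulting prior is exactly uniform.
norm = raw / raw.sum(axis=1, keepdims=True) * 1e4
density_norm = rna_count_based_density(norm)

# Case 4: log1p on top of normalization still compresses the differences
# between spots compared with the raw-count prior.
density_lognorm = rna_count_based_density(np.log1p(norm))

print("raw:    ", density_raw)
print("scaled row sums:", spot_sums_scaled)
print("norm:   ", density_norm)
print("lognorm:", density_lognorm)
```

In this toy case the raw counts are the only input that preserves the differences in total RNA between spots, which is the information the rna_count_based prior relies on.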

wangjiawen2013 commented on June 19, 2024

Besides, I am not sure whether Tangram's cluster expression works on scaled data. It probably does, provided that both the single-cell data and the spatial transcriptomics data are scaled.

lewlin commented on June 19, 2024

Sorry for the super late response. Since I changed jobs, I have very little time to respond to messages.

The "density_prior" is your best estimate of the cell density within the spatial voxels. For example, if you segment cells on histology for Visium, you know exactly how many cells you have per voxel (and that's your density). Note that you don't need the absolute number of cells, only the density (that is, the cell counts up to an arbitrary multiplicative factor). If you work with MERFISH, the density is uniform, as you have one cell per voxel.

Typically, on Visium, we cannot really segment cells, so we assume that the cell density is proportional to the number of RNA molecules (works great, really). Use the rna_count_based option to pass the RNA counts as a proxy for cell density.
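
The three kinds of prior described above can be sketched in plain NumPy as follows. The helper name `to_density` and all the numbers are hypothetical and only illustrate the normalization. How the resulting vector is handed to Tangram is not shown here.

```python
import numpy as np


def to_density(values):
    """Hypothetical helper: turn non-negative per-voxel values into a density that sums to 1."""
    values = np.asarray(values, dtype=float)
    if (values < 0).any():
        raise ValueError("per-voxel values must be non-negative")
    total = values.sum()
    if total <= 0:
        raise ValueError("at least one voxel must have a positive value")
    return values / total


# Segmentation-based prior: cells counted per voxel (made-up counts).
cells_per_voxel = np.array([3, 0, 7, 10])
density_segmented = to_density(cells_per_voxel)

# Only relative values matter: rescaling by an arbitrary factor
# yields the same density.
density_rescaled = to_density(2.5 * cells_per_voxel)

# Uniform prior, e.g. one cell per voxel: every voxel gets 1 / n_voxels.
density_uniform = to_density(np.ones(5))

# RNA-count proxy: total raw molecule counts per voxel stand in for cell counts.
rna_counts_per_voxel = np.array([1200, 150, 3100, 4050])
density_rna = to_density(rna_counts_per_voxel)

print(density_segmented)
print(density_uniform)
print(density_rna)
```

The explicit non-negativity check makes the failure mode from this thread visible early: scaled values would raise an error instead of silently producing a negative "density".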

As for the negative values, let me ask somebody to review the code.

gaddamshreya commented on June 19, 2024

Hi @wangjiawen2013, thank you for using Tangram! You're right about the source of the negative values, i.e., the lack of raw counts.
@Hejin0701 and I reviewed your issue, and we recommend using raw counts for the spatial data in Tangram.
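
As a rough illustration of this recommendation, one could sanity-check a matrix before preprocessing. The sketch below is NumPy-only and works on a dense array. The helper `looks_like_raw_counts` is hypothetical and not part of Tangram, and the heuristic (non-negative, integer-valued entries) is an assumption about what raw counts look like.

```python
import numpy as np


def looks_like_raw_counts(X, atol=1e-8):
    """Hypothetical heuristic: raw counts are non-negative and integer-valued."""
    values = np.asarray(X, dtype=float)
    non_negative = bool((values >= 0).all())
    # rtol=0 so that large non-integer values are not accepted by accident.
    integer_valued = bool(np.allclose(values, np.round(values), rtol=0, atol=atol))
    return non_negative and integer_valued


# Toy spots-by-genes matrices (made-up numbers).
raw = np.array([[10.0, 0.0, 5.0], [1.0, 2.0, 0.0]])
lognorm = np.log1p(raw)             # log-transformed: non-integer values
centered = raw - raw.mean(axis=0)   # centered: contains negative values

print(looks_like_raw_counts(raw))       # True
print(looks_like_raw_counts(lognorm))   # False
print(looks_like_raw_counts(centered))  # False
```

If the check fails for the spatial matrix, the raw counts would need to be restored into adata_sp.X from wherever they were kept before computing the density prior.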
