
NeST

Analysis of nested hierarchical structure in spatial transcriptomic data

Please see our manuscript at https://www.nature.com/articles/s41467-023-42343-x.

Installation

For best results, or to run the rpy2-based functionality, we recommend installing NeST in an isolated conda environment. A full installation, including NeST and the provided examples, can be created with the following steps:

1. Clone the NeST repository.
2. Navigate inside the repository in a command prompt.
3. Create the conda environment from the provided environment file: conda env create -f environment.yml
4. Activate the conda environment: conda activate nest
5. Install the NeST package locally: pip install .
6. To run the examples, navigate into the examples/ directory and run jupyter notebook.

NeST can also be installed directly with pip: pip install nest-analysis.

Installation may take several minutes on a typical computer. If conda takes a long time, using mamba or micromamba instead may speed up the process (https://mamba.readthedocs.io/en/latest/index.html).

A Docker container with NeST and a fully configured environment is also available at https://hub.docker.com/r/blwalker/nest and can be pulled with docker pull blwalker/nest:latest.

Usage

Here we give an overview of the main functions available in NeST, with examples from the Slideseq (Stickels et al. 2021) and Seqfish (Moffitt et al. 2018) datasets. See /examples for further information and full running examples. An example analysis typically takes ~5-10 minutes on a typical computer, depending on the dataset.

Nested Hierarchical Structure

We load the AnnData object with nest.data.get_data, a wrapper around squidpy's dataset loading that can load a variety of datasets, including all of those used in the manuscript.

import nest

adata = nest.data.get_data(dataset)  # dataset: which dataset to load

Next, we compute single-gene hotspots, which represent enriched areas of individual genes, across the full transcriptome.

nest.compute_gene_hotspots(adata, verbose=True, eps=75, min_samples=5, min_size=50)
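The parameter names eps and min_samples match those of DBSCAN, so the sketch below illustrates a density-based view of a single-gene hotspot using scikit-learn's DBSCAN. As a simplifying assumption, a spot takes part only where the gene is detected (expression > 0), and min_size is treated as a minimum number of spots. NeST's actual spot selection and clustering may differ, and the function name gene_hotspots is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def gene_hotspots(coords, expression, eps=75.0, min_samples=5, min_size=50):
    """Hypothetical sketch: density-based hotspots for a single gene.

    coords: (n_spots, 2) array of spatial coordinates.
    expression: (n_spots,) expression of one gene.
    Returns a list of integer arrays, each holding the spot indices of one hotspot.
    """
    # Assumption: only spots where the gene is detected are clustered.
    expressing = np.flatnonzero(np.asarray(expression) > 0)
    if expressing.size == 0:
        return []
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords[expressing])
    hotspots = []
    for label in np.unique(labels):
        if label == -1:  # DBSCAN noise points belong to no hotspot
            continue
        members = expressing[labels == label]  # map back to original spot indices
        if members.size >= min_size:  # drop clusters with fewer than min_size spots
            hotspots.append(members)
    return hotspots
```

On toy data with one large dense block of expressing spots, one small block and a few isolated spots, the isolated spots become noise and raising min_size removes the small block.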

Next, we identify areas of coexpression. The threshold parameter sets the minimum Jaccard similarity two hotspots need in order to be connected in the hotspot similarity network, and resolution controls the Leiden clustering of that network. min_size and min_genes are used in post-processing to filter out very small coexpression hotspots.

nest.coexpression_hotspots(adata, threshold=0.35, min_size=30, min_genes=8, resolution=2.0)
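To make threshold concrete, the sketch below treats each single-gene hotspot as a set of spot indices, computes pairwise Jaccard similarity, and connects pairs at or above the threshold. Connected components stand in for Leiden clustering only to keep the example dependency-free; this is not what NeST uses, and resolution has no analogue here. All names (jaccard, hotspot_network, connected_groups, and the toy hotspot keys) are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two hotspots given as sets of spot indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def hotspot_network(hotspots, threshold=0.35):
    """Edges between hotspots whose Jaccard similarity is at least `threshold`."""
    names = list(hotspots)
    return [
        (u, v)
        for u, v in combinations(names, 2)
        if jaccard(hotspots[u], hotspots[v]) >= threshold
    ]


def connected_groups(names, edges):
    """Group hotspots by connected component (a simple stand-in for Leiden)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Two hotspots sharing 3 of 5 spots have a similarity of 0.6, so they are linked at threshold=0.35 but would stay separate at a threshold above 0.6.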

We then compute boundaries for the coexpression hotspots and plot them. The parameter alpha_max controls how tightly the boundary follows the spots: increasing it gives a boundary with greater curvature.

nest.compute_multi_boundaries(adata, alpha_max=0.005, alpha_min=0.00001)
nest.plot.multi_hotspots(adata)
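One common way to compute such a boundary is an alpha shape: keep the Delaunay triangles whose circumradius is below 1/alpha and trace the edges that belong to exactly one kept triangle. Under this convention a larger alpha removes more of the large triangles, so the boundary follows the spots more tightly, which matches the description of alpha_max above. This is an illustrative SciPy sketch; it is an assumption that NeST's boundaries use the same convention, alpha_min is not modeled, and alpha_boundary_edges is a hypothetical name.

```python
import numpy as np
from scipy.spatial import Delaunay


def alpha_boundary_edges(points, alpha):
    """Boundary edges of a 2D alpha shape, as sorted (i, j) index pairs.

    A Delaunay triangle is kept when its circumradius is below 1 / alpha;
    alpha == 0 keeps every triangle, which recovers the convex hull.
    """
    points = np.asarray(points, dtype=float)
    edge_count = {}
    for tri in Delaunay(points).simplices:
        a, b, c = points[tri]
        ab = np.linalg.norm(a - b)
        bc = np.linalg.norm(b - c)
        ca = np.linalg.norm(c - a)
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
        if area == 0:
            continue  # skip degenerate (collinear) triangles
        radius = ab * bc * ca / (4.0 * area)  # circumradius R = abc / (4 * area)
        if alpha > 0 and radius >= 1.0 / alpha:
            continue  # triangle too large for this alpha
        for i, j in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            edge = (int(min(i, j)), int(max(i, j)))
            edge_count[edge] = edge_count.get(edge, 0) + 1
    # Edges used by exactly one kept triangle lie on the boundary.
    return sorted(e for e, n in edge_count.items() if n == 1)
```

For a 2 x 2 square with a point at its center, every Delaunay triangle has circumradius 1, so any alpha below 1 returns the four outer edges and any alpha of 1 or more returns no boundary at all.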

The nested structure plot visualizes the nested hierarchical structure of the dataset. Here it shows two layers of overlapping hotspots in the hippocampal formation and a single layer everywhere else.

nest.plot.nested_structure_plot(adata, figsize=(5, 1.5), fontsize=8, legend_ncol=4, alpha_high=0.75, alpha_low=0.15, legend_kwargs={'loc':"upper left", 'bbox_to_anchor':(1, 1.03)})
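One way to read a "layer" is as the number of coexpression hotspots that overlap at a given spot. The sketch below counts, for each spot, how many hotspots contain it, so a spot at depth 2 lies inside two overlapping hotspots. It illustrates the idea only and is not how nest.plot.nested_structure_plot computes its layout; the input format (hotspot name mapped to a set of spot indices) and the name nesting_depth are assumptions.

```python
import numpy as np


def nesting_depth(n_spots, hotspots):
    """Number of hotspots covering each spot (0 means outside every hotspot).

    hotspots: dict mapping a hotspot name to a set of spot indices.
    """
    depth = np.zeros(n_spots, dtype=int)
    for members in hotspots.values():
        # Sets hold unique indices, so each hotspot adds at most 1 per spot.
        depth[np.fromiter(members, dtype=int)] += 1
    return depth
```

With hotspot "1" nested inside hotspot "0", the shared spots reach depth 2 while the rest of "0" stays at depth 1.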

NeST is highly explainable by design, because every coexpression hotspot derives directly from an ensemble of genes. We can confirm that the identified hotspots are meaningful by inspecting the markers of the five coexpression hotspots representing the hippocampal formation.

markers = nest.geometric_markers(adata, [3, 5, 7, 8, 15])

markers_sub = {k: v[:3] for k, v in markers.items()}  # keep the top 3 markers per hotspot

fig, ax = nest.plot.tracks_plot(adata, markers_sub, width=2.5, track_height=0.1, fontsize=6, marked_genes=[])
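Because each coexpression hotspot derives from a set of genes, markers can be ranked by how closely each gene's own hotspot matches the coexpression hotspot. The sketch below scores genes by Jaccard similarity and keeps the top three, mirroring the v[:3] truncation above. The scoring rule is an assumption for illustration: nest.geometric_markers may rank genes differently, and rank_markers and the gene names are placeholders.

```python
def rank_markers(coexpression_hotspot, gene_hotspots, top_n=3):
    """Hypothetical ranking of genes by overlap with one coexpression hotspot.

    coexpression_hotspot: set of spot indices.
    gene_hotspots: dict mapping gene name to the set of spot indices of its hotspot.
    """

    def jaccard(a, b):
        union = len(a | b)
        return len(a & b) / union if union else 0.0

    scores = {gene: jaccard(coexpression_hotspot, spots) for gene, spots in gene_hotspots.items()}
    # Highest similarity first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_n]
```

A gene whose hotspot matches the coexpression hotspot exactly scores 1.0 and comes first; genes whose hotspots lie elsewhere score 0 and drop out of the top three.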


nest's Issues

About custom dataset

Hello, this is great work for spatial transcriptomics analysis. I want to run your program on my own dataset. How should I do that? Thanks for your help.

Run the breast example

I have run the breast cancer example, but I get a different result starting from step 6. Is this normal, and why is it different? I would appreciate a reply.

nest.coexpression_hotspots() crashes with an error

Hi,
Thank you for developing the package!

I am running into an error when trying to run nest.coexpression_hotspots(adata, threshold=0.35, min_size=30, min_genes=8, resolution=2.0):

ValueError                                Traceback (most recent call last)
File <timed exec>:2

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/nest/hotspot/coexpression.py:133, in coexpression_hotspots(adata, jaccard_matrix, hotspot_lookup, threshold, min_size, min_genes, cutoff, verbose, processes, divisions, use_core, core_k, resolution)
    125 tmp_df = adata.obs.merge(
    126     pd.DataFrame(multi_arrays, index=adata.obs.index),
    127     how="outer",
   (...)
    130     suffixes=("_old", None),
    131 )
    132 tmp_df.drop(tmp_df.filter(regex="_old$").columns.tolist(), axis=1, inplace=True)
--> 133 adata.obs = tmp_df
    134 new_multigene_hotspots = {
    135     str(idx): v for idx, v in enumerate(new_multigene_hotspots)
    136 }
    137 adata.uns["multi_hotspots"] = new_multigene_hotspots

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/anndata/_core/anndata.py:913, in AnnData.obs(self, value)
    911 @obs.setter
    912 def obs(self, value: pd.DataFrame):
--> 913     self._set_dim_df(value, "obs")

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/anndata/_core/anndata.py:850, in AnnData._set_dim_df(self, value, attr)
    848 if not isinstance(value, pd.DataFrame):
    849     raise ValueError(f"Can only assign pd.DataFrame to {attr}.")
--> 850 value_idx = self._prep_dim_index(value.index, attr)
    851 if self.is_view:
    852     self._init_as_actual(self.copy())

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/anndata/_core/anndata.py:864, in AnnData._prep_dim_index(self, value, attr)
    859 """Prepares index to be uses as obs_names or var_names for AnnData object.AssertionError
    860 
    861 If a pd.Index is passed, this will use a reference, otherwise a new index object is created.
    862 """
    863 if self.shape[attr == "var"] != len(value):
--> 864     raise ValueError(
    865         f"Length of passed value for {attr}_names is {len(value)}, but this AnnData has shape: {self.shape}"
    866     )
    867 if isinstance(value, pd.Index) and not isinstance(
    868     value.name, (str, type(None))
    869 ):
    870     raise ValueError(
    871         f"AnnData expects .{attr}.index.name to be a string or None, "
    872         f"but you passed a name of type {type(value.name).__name__!r}"
    873     )

ValueError: Length of passed value for obs_names is 3310605, but this AnnData has shape: (1041175, 273)

I believe the problem here is that the tmp_df is created by outer merging, which results in more rows than adata.obs has, hence the error. Should it be how="left" or how="inner" instead?
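A quick pandas experiment helps check this reasoning. The merge arguments between how="outer" and suffixes are elided in the traceback, so this sketch assumes the merge joins on the index (left_index=True, right_index=True), since the new DataFrame is built with index=adata.obs.index. Under that assumption, an outer merge of two frames sharing a unique index returns exactly the original rows, whereas a shared index with repeated labels inflates the row count for outer, left and inner merges alike. The column names and merged_row_counts are hypothetical.

```python
import pandas as pd


def merged_row_counts(index):
    """Row counts after merging two frames that share `index`, for each `how`.

    Assumes (hypothetically) an index-on-index merge like the elided call.
    """
    obs = pd.DataFrame({"existing_col": range(len(index))}, index=index)
    new_cols = pd.DataFrame({"hotspot_0": [True] * len(index)}, index=index)
    return {
        how: len(obs.merge(new_cols, how=how, left_index=True, right_index=True))
        for how in ("outer", "left", "inner")
    }
```

If repeated obs names are the cause, changing how alone would not restore the row count; checking adata.obs_names.is_unique (and AnnData's obs_names_make_unique()) would be a reasonable first step. This is a hypothesis, not a confirmed diagnosis.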

Thank you for your help!
