
scanpy-tutorials's Introduction

Scanpy tutorials

See this page for more context.

scanpy-tutorials's People

Contributors

abearab, adamgayoso, dineshpalli, evanbiederstedt, falexwolf, fidelram, flying-sheep, giovp, gokceneraslan, grst, hrovatin, ilan-gold, ivirshup, jlause, koncopd, ktpolanski, mr-milk, mys721tx, pre-commit-ci[bot], raphaelbuzzi


scanpy-tutorials's Issues

KeyError: 'rank_genes_groups'

I was working through the 3k PBMC tutorial and ran into the following error:

[screenshot of the rank_genes_groups KeyError]

How can I solve this problem?

Extract Differentially Expressed Genes for each group

Quick question:

I can plot differentially expressed genes for a group of celltypes in a dataset, like this:

sc.tl.rank_genes_groups(adata, 'celltype', method='wilcoxon', key_added="wilcoxon", min_fold_change=3)

How can I access those genes for each celltype?

The data seems to be stored in adata.uns['wilcoxon'], but I have no idea how to extract it:


{'params': {'groupby': 'preds',
  'reference': 'rest',
  'method': 'wilcoxon',
  'use_raw': False,
  'layer': None,
  'corr_method': 'benjamini-hochberg'},
 'names': rec.array([('CD14', 'HAL', 'KRT19', 'VWF', 'GLUL', 'CD4', 'CYP2E1', 'MARCO', 'CD3E'),
            ('FLT4', 'COL1A1', 'SOX9', 'LHX6', 'CLEC10A', 'VWF', 'CD14', 'CSF1R', 'NKG7'),
            ('C5AR1', 'ADAMTSL2', 'SPP1', 'HTRA3', 'CD276', 'COL1A1', 'GLUL', 'VSIG4', 'IL7R'),
            ('MYH11', 'COLEC11', 'COL1A1', 'RSPO3', 'SIRPA', 'COLEC11', 'C5AR1', 'CD68', 'PTPRC'),
            ('HAL', 'CD14', 'IGFBP3', 'HAL', 'C5AR1', 'C5AR1', 'CSF1R', 'HAL', 'HAL'),
            ('SIRPA', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'COL1A1', 'CYP2E1', 'CYP2E1'),
(.......................)  ],     
           dtype=[('Unassigned', 'O'), ('b-cells', 'O'), ('cholangiocytes', 'O'), ('endothelial cells', 'O'), ('erythroid cells', 'O'), ('hepatic stellate cells', 'O'), ('hepatocytes', 'O'), ('kupffer cells', 'O'), ('t-cells', 'O')]),
 'scores': rec.array([( 1.1047919 ,  1.4223580e+01,  8.5403233e+00,  2.9836638e+00,  3.7443349e+00,  1.03904781e+01,  5.28960686e+01,  2.30597286e+01,  1.96362038e+01),
            ( 0.65801376,  8.8617716e+00,  5.2942681e+00,  1.9939163e+00,  2.1673870e+00,  1.02436619e+01,  4.85130644e+00,  1.95555763e+01,  9.89364815e+00),
            ( 0.6156045 ,  4.8951001e+00,  4.3029914e+00,  9.9042362e-01,  1.9944884e+00,  8.08185673e+00,  4.40766430e+00,  1.81397591e+01,  5.23387718e+00),
(.......................)  ],     
           dtype=[('Unassigned', '<f4'), ('b-cells', '<f4'), ('cholangiocytes', '<f4'), ('endothelial cells', '<f4'), ('erythroid cells', '<f4'), ('hepatic stellate cells', '<f4'), ('hepatocytes', '<f4'), ('kupffer cells', '<f4'), ('t-cells', '<f4')]),
 'pvals': rec.array([(0.26924979, 6.54181873e-46, 1.33846555e-17, 0.00284819, 1.80872154e-04, 2.73975710e-25, 0.00000000e+00, 1.17486974e-117, 7.58615165e-86),
            (0.51052929, 7.87526234e-19, 1.19494060e-07, 0.04616121, 3.02053529e-02, 1.26361078e-24, 1.22650871e-06, 3.69808420e-085, 4.43561476e-23),
            (0.53815556, 9.82557304e-07, 1.68507525e-05, 0.3219671 , 4.60987120e-02, 6.37880428e-16, 1.04491384e-05, 1.54705656e-073, 1.65990714e-07),
            (0.56701976, 3.78240049e-06, 2.15205603e-05, 0.32967762, 1.87330858e-01, 1.54989449e-11, 6.36140894e-05, 1.67436768e-069, 3.60759031e-06),
(.......................)  ],     
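A sketch of one way to pull these results into a table, assuming a reasonably recent scanpy that provides sc.get.rank_genes_groups_df (the group name below is just one of the categories visible in the dump above):

import scanpy as sc

# Extract the ranked genes for one group as a DataFrame; `key` must match the
# key_added used in sc.tl.rank_genes_groups (here "wilcoxon")
de_hepatocytes = sc.get.rank_genes_groups_df(adata, group="hepatocytes", key="wilcoxon")
print(de_hepatocytes.head())

# Or build a dict of DataFrames, one per group, using the group names stored in the result
groups = adata.uns["wilcoxon"]["names"].dtype.names
de_per_group = {g: sc.get.rank_genes_groups_df(adata, group=g, key="wilcoxon") for g in groups}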


Displayed plot size in tutorials

In the visualization tutorials, the plots are rendered into a box of predefined size/shape and are not displayed properly: they only get the correct size after clicking on them to open in a separate window. This is a problem for the tutorial, because we want to show how to properly align and size plots, yet the current layout makes it look as if the plots are not sized correctly.
See the example below, which is completely squashed:
[squashed screenshot]
While the same image opened in a separate tab looks normal:
[screenshot at the correct size]

General questions

Can you help me with a couple of general things?

My object looks like this:

AnnData object with n_obs × n_vars = 37814 × 1802
obs: 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden'
var: 'gene_ids', 'feature_types', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'leiden', 'rank_genes_groups', 'leiden_colors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances', 'connectivities'

As far as I am aware, 'leiden' contains the cluster ID of my cells. How can I copy this to another obs slot?

Also, I have sample IDs from the Loupe output file. It is a CSV with the cell barcode in one column and the sample ID in the other. In R, using Seurat, I would assign the IDs using the following code; is there a similar way I can do this in Python using scanpy?

aggr$sampleID = samples_ID[match(rownames(aggr@meta.data), samples_ID$Barcode), 2]

Finally, to perform DE in Seurat, I would combine the cluster ID with the sample ID using:
aggr$cluster_sampleID <- paste(Idents(aggr), aggr$sampleID, sep = "_")
What would be the best way to do this using scanpy?

Many thanks,
Chris
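For what it's worth, a minimal sketch of how those three steps might look with scanpy and pandas; the CSV file name and its column names ("Barcode", "sampleID") are assumptions based on the description above:

import pandas as pd

# 1) Copy the Leiden cluster IDs into another .obs column
adata.obs["cluster_id"] = adata.obs["leiden"].copy()

# 2) Map sample IDs from the Loupe CSV onto cells by barcode
samples_ID = pd.read_csv("samples_ID.csv").set_index("Barcode")
adata.obs["sampleID"] = samples_ID["sampleID"].reindex(adata.obs_names).values

# 3) Combine cluster ID and sample ID, analogous to paste(..., sep = "_") in R
adata.obs["cluster_sampleID"] = (
    adata.obs["cluster_id"].astype(str) + "_" + adata.obs["sampleID"].astype(str)
)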

Analyzing CITE-seq data contains internal comments

Hello there, thanks for the amazing Scanpy tutorials!

The CITE-seq tutorial contains a number of internal comments, which you might want to hide. Here are a few examples:

We still need to explain the function here. I’m happy if we add it to the first tutorial, too (I know you did it already at some point, but I didn’t want to let go of the simpler naming scheme back then; now I’d be happy to transition.)

TODO: I would like to include some justification for the change in normalization. It definitely has a much different distribution than transcripts. I think this could be shown through the QC plots, but it's a huge pain to move around these matplotlib plots. This might be more appropriate for the in-depth guide though.

Agreed! But, having an explanation of the two images below is also a good first step. Also nice to contrast it with the corresponding RNA distributions.

Discuss that this here is a different normalization.

Yes, would be great to the see the QC plots, here, too. Are cells with low counts etc. weird only on the RNA level or also on the protein level? It’d be nice to see the QC plots side-by-side between RNA and protein, I’d say.

pbmc3k tutorial issue

ModuleNotFoundError                       Traceback (most recent call last)
Input In [3], in <cell line: 3>()
      1 import numpy as np
      2 import pandas as pd
----> 3 import scanpy as sc

ModuleNotFoundError: No module named 'scanpy'

When trying to run the code in a Jupyter notebook I keep getting this error, even though I downloaded and installed scanpy. What can I do to fix this?
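This usually means the notebook kernel is not running in the environment where scanpy was installed. A quick, hedged check from inside the notebook:

import sys

# Show which Python interpreter the notebook kernel is actually using
print(sys.executable)

# If scanpy lives in a different environment, one option is to install it into the
# kernel's own interpreter (run this in a notebook cell):
# !{sys.executable} -m pip install scanpy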

The dataset of the scanpy workshop day2

Hi, is data_processed.h5ad available somewhere? Day 2 of the scanpy workshop analyzes it, but I can only find the GEO files. Should I create it myself? Thanks.

Wrong link? I could not connect the link to download pre-processed h5ad file.

In the tutorial "Integrating spatial data with scRNA-seq using scanorama", I could not download the pre-processed h5ad file, which is linked from the following sentence: "Conveniently, you can also download the pre-processed dataset in h5ad format from here".

(https://hmgubox.helmholtz-muenchen.de/f/4ef254675e2a41f89835/?dl=1).
https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html

Is it just a broken link?
I am not familiar with this community or with GitHub, so please let me know if I have posted in the wrong place. Thank you.

Spatial Transcriptomics tutorial

Hi, I'm running the Spatial Transcriptomics tutorial and encountering an error loading data.

https://nbviewer.ipython.org/github/giovp/scanpy-tutorials/blob/spatial/analysis-visualization-spatial.ipynb

These are installed and imported with no problem:

import scanpy as sc
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib import rcParams
import seaborn as sb

import SpatialDE

plt.rcParams['figure.figsize']=(8,8)

%load_ext autoreload
%autoreload 2

Then, with this command to import data:

adata = sc.datasets.visium_sge('V1_Human_Lymph_Node')

This happens:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-08de67716edc> in <module>
----> 1 adata = sc.datasets.visium_sge('V1_Human_Lymph_Node')

AttributeError: module 'scanpy.datasets' has no attribute 'visium_sge'

So, I tried loading an h5 file that I have downloaded:

adata=sc.read_visium('filtered_feature_bc_matrix.h5')

But this happened:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-a58c17f7f867> in <module>
----> 1 adata=sc.read_visium('filtered_feature_bc_matrix.h5')

AttributeError: module 'scanpy' has no attribute 'read_visium'

Version:

>>> sc.__version__
'1.4.5.1'

I thought it might have been my installation, so I tried this:

!pip install git+https://github.com/theislab/scanpy.git@spatial

But, this happened:

Collecting git+https://github.com/theislab/scanpy.git@spatial
  Cloning https://github.com/theislab/scanpy.git (to revision spatial) to /tmp/pip-req-build-t7upgber
  Running command git clone -q https://github.com/theislab/scanpy.git /tmp/pip-req-build-t7upgber
  WARNING: Did not find branch or tag 'spatial', assuming revision or ref.
  Running command git checkout -q spatial
  error: pathspec 'spatial' did not match any file(s) known to git.
ERROR: Command errored out with exit status 1: git checkout -q spatial Check the logs for full command output.

It seems like I don't have access to these functions. Is there something I should be doing differently? Any suggestions for things to try?

>>> sc.logging.print_versions() 
scanpy==1.4.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1

Array in 1-dimension error

Hi,
I am getting this error while adding a new column for MT genes.
What am I doing wrong?

adata = sc.read_10x_h5(dir_path + 'filtered_feature_bc_matrix.h5')
adata.var_names_make_unique() 
print(adata.X)
print(adata.obs['sample'].value_counts())
print(adata.obs['sample'].value_counts())
print(f'Number of cells before filter: {adata.n_obs}')

# Quality control - calculate QC covariates
adata.obs['n_counts'] = adata.X.sum(1)
adata.obs['log_counts'] = np.log(adata.obs['n_counts'])
adata.obs['n_genes'] = (adata.X > 0).sum(1)

mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]
adata.obs['mt_frac'] = adata.X[:, mt_gene_mask].sum(1)/adata.obs['n_counts']
The print statements produce:

AV_24    571
Name: sample, dtype: int64
Number of cells before filter: 571
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-26-5e7c1a502972> in <module>
     11 
     12 mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]
---> 13 adata.obs['mt_frac'] = adata.X[:, mt_gene_mask].sum(1)/adata.obs['n_counts']

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    634         # for binary ops, use our custom dunder methods
    635         result = ops.maybe_dispatch_ufunc_to_dunder_op(
--> 636             self, ufunc, method, *inputs, **kwargs
    637         )
    638         if result is not NotImplemented:

pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/common.py in new_method(self, other)
     62         other = item_from_zerodim(other)
     63 
---> 64         return method(self, other)
     65 
     66     return new_method

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    503         result = arithmetic_op(lvalues, rvalues, op, str_rep)
    504 
--> 505         return _construct_result(left, result, index=left.index, name=res_name)
    506 
    507     wrapper.__name__ = op_name

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
    476     # We do not pass dtype to ensure that the Series constructor
    477     #  does inference in the case where `result` has object-dtype.
--> 478     out = left._constructor(result, index=index)
    479     out = out.__finalize__(left)
    480 

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional
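The traceback ends with pandas refusing a 2-D array: .sum(1) on a sparse matrix returns a 2-D numpy matrix, which cannot be stored as an .obs column. A possible fix (a sketch, not the tutorial's official code) is to flatten the result before dividing:

import numpy as np

mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]

# Flatten the (n_cells, 1) matrix returned by the sparse sum into a 1-D array
mt_counts = np.asarray(adata.X[:, mt_gene_mask].sum(axis=1)).ravel()
adata.obs['mt_frac'] = mt_counts / adata.obs['n_counts'].values

Alternatively, sc.pp.calculate_qc_metrics can compute per-cell QC covariates, including mitochondrial fractions, in a single call.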

pbmc3k tutorial does weird stuff at the end

One of the last lines of the pbmc3k tutorial is:

pbmc.X = None

I'm not sure if this ever worked. The goal is to save on space in memory, so I think this could be changed to replacing the dense array with sparse data:

adata.X = adata.raw[:, adata.var_names].X

Spatial integration tutorial has complicated concatenation expression

@giovp, input 8 of the spatial integration tutorial currently looks like this:

adata_spatial = adata_spatial_anterior.concatenate(
    adata_spatial_posterior,
    batch_key="library_id",
    uns_merge="unique",
    batch_categories=[
        k
        for d in [
            adata_spatial_anterior.uns["spatial"],
            adata_spatial_posterior.uns["spatial"],
        ]
        for k, v in d.items()
    ],
)

That batch_categories expression is a little hard to read, and simplifies to: ['V1_Mouse_Brain_Sagittal_Anterior', 'V1_Mouse_Brain_Sagittal_Posterior'].

Is there a simpler way to do this? Maybe:

adata_spatial = adata_spatial_anterior.concatenate(
    adata_spatial_posterior,
    batch_key="library_id",
    uns_merge="unique",
    batch_categories=[
        'V1_Mouse_Brain_Sagittal_Anterior',
        'V1_Mouse_Brain_Sagittal_Posterior',
    ],
)

instead?

spatial tutorial about np.concatenate

When running the spatial tutorial I ran into trouble at the step below:
embedding_anterior = np.concatenate(integrated_anterior, axis=0)
When I run this code, Python just keeps running and produces no result; after ending it with Ctrl+C I get the traceback below:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    embedding = np.concatenate(integrated, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1088, in __getitem__
    return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 305, in __init__
    self._init_as_view(X, oidx, vidx)
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 358, in _init_as_view
    self._remove_unused_categories(adata_ref.var, var_sub, uns_new)
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1092, in _remove_unused_categories
    if not is_categorical_dtype(df_full[k]):
  File "/root/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2878, in __getitem__
    return self._get_item_cache(key)
  File "/root/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 3540, in _get_item_cache
    loc = self.columns.get_loc(item)
  File "/root/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
KeyboardInterrupt

File not found error

Hi, I am new to scanpy and I keep encountering this error: FileNotFoundError: Did not find file data/filtered_gene_bc_matrices/hg19/matrix.mtx.gz. I have unzipped the archive, but that did not help. Does anyone have advice on how to solve this?
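A hedged check that the extracted 10x files sit where the tutorial's read call expects them; the path below is the one from the error message:

from pathlib import Path
import scanpy as sc

# List what is actually inside the expected folder (relative to the notebook's working directory)
path = Path("data/filtered_gene_bc_matrices/hg19")
print(path.exists())
if path.exists():
    print(sorted(p.name for p in path.iterdir()))

# Read the extracted folder (not the .tar.gz archive itself)
adata = sc.read_10x_mtx(path, var_names="gene_symbols", cache=True)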

normalize_geometric scanpy version?

Hi, which scanpy version has the normalize_geometric function for CITE-seq data?

I'm running:

sc.logging.print_versions()
scanpy==1.5.2.dev24+g83844013 anndata==0.7.3 umap==0.4.4 numpy==1.18.5 scipy==1.4.1 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1

thanks!

How can I check directly which marker genes the adata contains?

I'd like to plot marker-gene score figures, but some marker genes do not appear in the adata, or I am not sure whether the spelling is consistent. Is there a way to check directly which marker genes are included in the adata? Thanks.
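A minimal way to check which of your markers are present; the gene list below is a hypothetical placeholder:

marker_genes = ["CD3E", "CD14", "NKG7"]  # replace with your own marker list

present = [g for g in marker_genes if g in adata.var_names]
missing = [g for g in marker_genes if g not in adata.var_names]
print("present:", present)
print("missing:", missing)

# A case-insensitive scan can catch spelling/case mismatches
wanted = {m.lower() for m in marker_genes}
print("case-insensitive matches:", [g for g in adata.var_names if g.lower() in wanted])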

tutorial example fails, AttributeError: module 'scanpy' has no attribute 'settings'

Python 3.6.7 (default, Dec 26 2018, 21:06:52)
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> import scanpy as sc
>>>
>>> sc.settings.verbosity = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'scanpy' has no attribute 'settings'
>>> sc.__version__
'1.3.6'

pip can't find branch @spatial

Hello! I'm trying to install the scanpy @spatial branch but keep bumping into this error:

pip install git+https://github.com/theislab/scanpy.git@spatial
Collecting git+https://github.com/theislab/scanpy.git@spatial
  Cloning https://github.com/theislab/scanpy.git (to revision spatial) to /tmp/pip-req-build-d8y5n7oc
  Running command git clone -q https://github.com/theislab/scanpy.git /tmp/pip-req-build-d8y5n7oc
  WARNING: Did not find branch or tag 'spatial', assuming revision or ref.
  Running command git checkout -q spatial
  error: pathspec 'spatial' did not match any file(s) known to git.
ERROR: Command errored out with exit status 1: git checkout -q spatial Check the logs for full command output.
...

Could you guys please help me out? Thanks in advance!

Run tutorial on Google Cloud VM instance

Hello, I'm a new user running the scanpy tutorials on a Google Cloud VM instance. I can't get any figures when I run them. For example, when I run sc.pl.highest_expr_genes(adata, n_top=20), it finishes, but no figure appears as expected. I know this is not a scanpy issue, but do you have any experience or suggestions? Thank you so much.
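On a headless VM there is typically no display attached, so interactive figure windows cannot open. One option (a sketch using the save argument that scanpy plotting functions accept) is to write plots to disk and view the files instead:

import scanpy as sc

sc.settings.figdir = "./figures"  # directory where saved plots are written

# `save` appends the given suffix to a default filename; `show=False` avoids opening a window
sc.pl.highest_expr_genes(adata, n_top=20, save=".png", show=False)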

key Error - paga_path

I am getting an error with sc.pl.paga_path. It does not plot genes that my data has: I double-checked, and the gene is in adata.var_names. Oddly, if I use a gene that is not in adata.var_names, it does plot. Any idea what is wrong with the code?

_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()

Errors:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Ascl1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-522-a191d9b34213> in <module>
     17         title='{} path'.format(descr),
     18         return_data=True,
---> 19         show=False)
     20     data.to_csv('./write/paga_path_{}.csv'.format(descr))
     21 pl.savefig('./figures/paga_path_paul15.pdf')

~/opt/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/paga.py in paga_path(adata, nodes, keys, use_raw, annotations, color_map, color_maps_annotations, palette_groups, n_avg, groups_key, xlim, title, left_margin, ytick_fontsize, title_fontsize, show_node_names, show_yticks, show_colorbar, legend_fontsize, legend_fontweight, normalize_to_zero_one, as_heatmap, return_data, show, save, ax)
   1090                 x += list(adata.obs[key].values[idcs])
   1091             else:
-> 1092                 x += list(adata_X[:, key].X[idcs])
   1093             if ikey == 0:
   1094                 groups += [group for i in range(len(idcs))]

~/opt/anaconda3/lib/python3.7/site-packages/anndata/_core/raw.py in __getitem__(self, index)
     99 
    100     def __getitem__(self, index):
--> 101         oidx, vidx = self._normalize_indices(index)
    102 
    103         # To preserve two dimensional shape

~/opt/anaconda3/lib/python3.7/site-packages/anndata/_core/raw.py in _normalize_indices(self, packed_index)
    159         obs, var = unpack_index(packed_index)
    160         obs = _normalize_index(obs, self._adata.obs_names)
--> 161         var = _normalize_index(var, self.var_names)
    162         return obs, var
    163 

~/opt/anaconda3/lib/python3.7/site-packages/anndata/_core/index.py in _normalize_index(indexer, index)
     72         return indexer
     73     elif isinstance(indexer, str):
---> 74         return index.get_loc(indexer)  # int
     75     elif isinstance(indexer, (Sequence, np.ndarray, pd.Index, spmatrix, np.matrix)):
     76         if hasattr(indexer, "shape") and (

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Ascl1'
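The traceback shows the lookup failing inside adata.raw (anndata/_core/raw.py), so a plausible explanation is that 'Ascl1' exists in adata.var_names but not in adata.raw.var_names, which paga_path reads from by default. A hedged check:

# Where does the gene actually live?
print('Ascl1' in adata.var_names)       # reported as True
print('Ascl1' in adata.raw.var_names)   # likely False, given the traceback

# If it is missing from .raw, adding use_raw=False to the sc.pl.paga_path call above
# makes the function read expression values from adata.X instead of adata.raw.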

(bug): `pbmc3k` tutorial is not reproducible

To Reproduce

Run https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html with the current scanpy master branch, and the results of the marker-gene plotting do not match the table. For example, IL7R is not ranked highly in the first group (notwithstanding the fact that it is filtered out by the highly-variable-gene selection). Even in the rendered version, this does not seem to hold.

I also get a different number of clusters by default.

From what I heard, this tutorial will be replaced anyway, but I figured I'd register an issue.

combining protein and RNA

Dear developers,

I'm trying to follow your tutorial on combining CITE-seq data, but I fail to recombine the two datasets, protein and RNA, back into one object.
This might be because each set was processed differently (filtering cells, genes) before trying to recombine them. How can this be done, or in what way should I process the protein and RNA data so that I can recombine both objects for visualization?

Kind Regards
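One possible pattern, sketched under the assumption that the two modalities are separate AnnData objects (here called adata_rna and adata_protein, hypothetical names), is to subset both to the shared barcodes and attach the protein data to the RNA object:

# Keep only cells that survived QC in both modalities
common = adata_rna.obs_names.intersection(adata_protein.obs_names)
adata_rna = adata_rna[common].copy()
adata_protein = adata_protein[common].copy()

# Store the protein measurements alongside the RNA object for joint visualization
adata_rna.obsm["protein_expression"] = adata_protein[adata_rna.obs_names].to_df()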

ImportError: Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`.

Hi,
I'm running the pbmc3k tutorial in a Jupyter notebook. When I ran sc.tl.leiden(adata), I encountered this error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File ~/anaconda3/envs/env4/lib/python3.9/site-packages/scanpy/tools/_leiden.py:108, in leiden(adata, resolution, restrict_to, random_state, key_added, adjacency, directed, use_weights, n_iterations, partition_type, neighbors_key, obsp, copy, **partition_kwargs)
    107 try:
--> 108     import leidenalg
    109 except ImportError:

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/leidenalg/__init__.py:35
      2 r""" This package implements the Leiden algorithm in ``C++`` and exposes it to
      3 python.  It relies on ``(python-)igraph`` for it to function. Besides the
      4 relative flexibility of the implementation, it also scales well, and can be run
   (...)
     33 not immediately available in :func:`leidenalg.find_partition`.
     34 """
---> 35 from .functions import ALL_COMMS
     36 from .functions import ALL_NEIGH_COMMS

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/leidenalg/functions.py:2
      1 import sys
----> 2 import igraph as _ig
      3 from . import _c_leiden

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/__init__.py:25
      6 __license__ = """
      7 Copyright (C) 2006- The igraph development team
      8 
   (...)
     22 02110-1301 USA
     23 """
---> 25 from igraph._igraph import (
     26     ADJ_DIRECTED,
     27     ADJ_LOWER,
     28     ADJ_MAX,
     29     ADJ_MIN,
     30     ADJ_PLUS,
     31     ADJ_UNDIRECTED,
     32     ADJ_UPPER,
     33     ALL,
     34     ARPACKOptions,
     35     BFSIter,
     36     BLISS_F,
     37     BLISS_FL,
     38     BLISS_FLM,
     39     BLISS_FM,
     40     BLISS_FS,
     41     BLISS_FSM,
     42     DFSIter,
     43     Edge,
     44     GET_ADJACENCY_BOTH,
     45     GET_ADJACENCY_LOWER,
     46     GET_ADJACENCY_UPPER,
     47     GraphBase,
     48     IN,
     49     InternalError,
     50     OUT,
     51     REWIRING_SIMPLE,
     52     REWIRING_SIMPLE_LOOPS,
     53     STAR_IN,
     54     STAR_MUTUAL,
     55     STAR_OUT,
     56     STAR_UNDIRECTED,
     57     STRONG,
     58     TRANSITIVITY_NAN,
     59     TRANSITIVITY_ZERO,
     60     TREE_IN,
     61     TREE_OUT,
     62     TREE_UNDIRECTED,
     63     Vertex,
     64     WEAK,
     65     arpack_options as default_arpack_options,
     66     community_to_membership,
     67     convex_hull,
     68     is_bigraphical,
     69     is_degree_sequence,
     70     is_graphical,
     71     is_graphical_degree_sequence,
     72     set_progress_handler,
     73     set_random_number_generator,
     74     set_status_handler,
     75     umap_compute_weights,
     76     __igraph_version__,
     77 )
     78 from igraph.adjacency import (
     79     _get_adjacency,
     80     _get_adjacency_sparse,
   (...)
     83     _get_inclist,
     84 )

ImportError: dlopen(/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so, 0x0002): Library not loaded: @rpath/libblas.3.dylib
  Referenced from: /Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so
  Reason: tried: '/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/../../../libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/../../../libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/bin/../lib/libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/bin/../lib/libblas.3.dylib' (no such file), '/usr/local/lib/libblas.3.dylib' (no such file), '/usr/lib/libblas.3.dylib' (no such file)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[60], line 1
----> 1 sc.tl.leiden(adata)

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/scanpy/tools/_leiden.py:110, in leiden(adata, resolution, restrict_to, random_state, key_added, adjacency, directed, use_weights, n_iterations, partition_type, neighbors_key, obsp, copy, **partition_kwargs)
    108     import leidenalg
    109 except ImportError:
--> 110     raise ImportError(
    111         'Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`.'
    112     )
    113 partition_kwargs = dict(partition_kwargs)
    115 start = logg.info('running Leiden clustering')

ImportError: Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`.

Then I tried import leidenalg directly, but got the same error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[61], line 1
----> 1 import leidenalg

   (...)   [same import chain through leidenalg/functions.py and igraph/__init__.py as in the traceback above]

ImportError: dlopen(/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so, 0x0002): Library not loaded: @rpath/libblas.3.dylib
  Referenced from: /Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so
  Reason: tried: '/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/../../../libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/bin/../lib/libblas.3.dylib' (no such file), '/usr/local/lib/libblas.3.dylib' (no such file), '/usr/lib/libblas.3.dylib' (no such file)

I've installed the leidenalg package. How can I solve this?
I am using Python 3.9.18, Jupyter Notebook 6.5.4, scanpy==1.9.5 anndata==0.10.0 umap==0.5.3 numpy==1.24.3 scipy==1.11.3 pandas==2.0.3 scikit-learn==1.3.0 statsmodels==0.14.0 pynndescent==0.5.10

Thanks a lot!

Pearson residual normalization: projection onto t-SNE coordinates vs. UMAP, and the shape of the embedding?

Hey, I don't really know whether this should be called an issue.
In the paper justifying square-root normalization for visualization and clustering, and for the Pearson residual normalization with selected HVGs, both sets of results are projected onto t-SNE space. Are you suggesting we use t-SNE as the main embedding method?

I tried it with my data, and with a UMAP projection my data seems less well separated than with other normalization methods. Looking at the Pearson residuals in Fig. 4d,e of that paper, I also see a more "linear", less separable distribution on the t-SNE embeddings (compared to Fig. 4a and c).
Actually, I personally like the simplicity of the linear structure, but I need some explanation of why this happens (e.g. to explain it to colleagues who use sctransform and get projections like Fig. 4c), and confirmation that t-SNE embeddings outperform UMAP with Pearson residuals.

Best,

cell type annotation problem

When I ran the clustering code in the tutorials, everything worked except sc.pl.umap(adata, legend_loc='on data', title='', frameon=False, save='.pdf', color='leiden'). It reports that 'Float64Index' object has no attribute 'add_categories'. How should I solve this problem?

missing code in Paul15?

Hi,

I'm following the script for Paul et al. (2015) (https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html). I'm not familiar with Python, but is something missing in cell [40]?
I get an error on line 1.

_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
    data.to_csv('./write/paga_path_{}.csv'.format(descr))
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()
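For what it's worth, cell [40] assumes that the earlier cells of the tutorial have already been run; a hedged summary of what line 1 needs:

# Line 1 only needs matplotlib.pyplot imported under the name `pl`,
# which the tutorial imports near the top of the notebook:
import matplotlib.pyplot as pl

# The loop additionally needs `paths` (a list of (description, node_path) tuples)
# and `gene_names` (a list of genes), both defined in earlier cells, plus existing
# ./write/ and ./figures/ directories for the CSV and PDF outputs.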

tutorial_pearson_residuals: why are residuals computed twice?

In the tutorial_pearson_residuals tutorial, the steps are:

# sets adata.var["highly_variable"]
sc.experimental.pp.highly_variable_genes(adata, flavor="pearson_residuals", n_top_genes=2000) 
adata = adata[:, adata.var["highly_variable"]]
sc.experimental.pp.normalize_pearson_residuals(adata)

sc.experimental.pp.highly_variable_genes(flavor="pearson_residuals") computes pearson residuals here:

https://github.com/scverse/scanpy/blob/2e98705347ea484c36caa9ba10de1987b09081bf/scanpy/experimental/pp/_highly_variable_genes.py#L120-L127

And sc.experimental.pp.normalize_pearson_residuals computes pearson residuals here:

https://github.com/scverse/scanpy/blob/2e98705347ea484c36caa9ba10de1987b09081bf/scanpy/experimental/pp/_normalization.py#L130

which refers to

https://github.com/scverse/scanpy/blob/2e98705347ea484c36caa9ba10de1987b09081bf/scanpy/experimental/pp/_normalization.py#L57-L64

So if I'm understanding correctly, the tutorial leads you to compute Pearson residuals twice, which is a little confusing. Why are there two different functions for computing those residuals anyhow? @jlause
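If the redundant computation is the main concern, scanpy's experimental module also ships a combined recipe (assuming a version that includes it) that performs HVG selection, subsetting, and Pearson residual normalization in one call; note that the recipe also computes a PCA on the residuals by default:

import scanpy as sc

# One call instead of highly_variable_genes + subsetting + normalize_pearson_residuals
sc.experimental.pp.recipe_pearson_residuals(adata, n_top_genes=2000)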

3k PBMCs tutorial: neighborhood graph scanpy vs. Seurat

Hello!

I have been trying to translate a colleague's Seurat-based R code to scanpy/Python and have been using the PBMC 3k guided tutorials from each as a reference for basic preprocessing workflow. Let me know if this question would be better suited for mainstream scanpy, but I was having trouble understanding the translation between Seurat FindNeighbors() and scanpy pp.neighbors() based on the docs.

In Seurat's 3k PBMCs tutorial, they use JackStraw to determine the dimensionality of the data (first 10 PCs) and then construct the KNN graph with that dimensionality as a param of the FindNeighbors() function:
pbmc <- FindNeighbors(pbmc, dims = 1:10)

Based on the sc.pp.neighbors() documentation, I assumed that scanpy's n_pcs param would be equivalent to the FindNeighbors() dims param, but in the scanpy 3k PBMCs tutorial it reads:

Let us compute the neighborhood graph of cells using the PCA representation of the data matrix. You might simply use default values here. For the sake of reproducing Seurat’s results, let’s take the following values.
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

If we wanted to reproduce Seurat's results from their tutorial, wouldn't we want to set n_pcs = 10 to include the first 10 PCs when creating the neighborhood graph? I just wanted to clarify that n_pcs is the correct scanpy parameter for setting the number of PCs included for the KNN graph.

Thank you!

Phoebe
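For reference, a sketch of the scanpy call that would mirror the Seurat tutorial's parameters, under the assumption that Seurat's FindNeighbors default of k.param = 20 applies:

import scanpy as sc

# dims = 1:10 in Seurat corresponds to the first 10 PCs; n_neighbors mirrors k.param
sc.pp.neighbors(adata, n_neighbors=20, n_pcs=10)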

pbmc3k - KeyError: 'base'

Following the pbmc3k tutorial I get an error: KeyError: 'base'
when executing the following command:

sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')

I run python 3.8 in a venv on Ubuntu 20.04 (VM)

scanpy==1.9.3

More error details (notebook cell's output):

ranking genes
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[47], line 1
----> 1 sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')
      2 sc.pl.rank_genes_groups(adata, groups=['0'], n_genes=20)

File .../lib/python3.8/site-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
    580 adata.uns[key_added] = {}
    581 adata.uns[key_added]['params'] = dict(
    582     groupby=groupby,
    583     reference=reference,
   (...)
    587     corr_method=corr_method,
    588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
    592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
    593     logg.warning(
    594         "It seems you use rank_genes_groups on the raw count data. "
    595         "Please logarithmize your data before calling rank_genes_groups."
    596     )

File .../lib/python3.8/site-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
     82 def __init__(
     83     self,
     84     adata,
   (...)
     90     comp_pts=False,
     91 ):
---> 93     if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
     94         self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
     95     else:

KeyError: 'base'
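A workaround that is often suggested for this KeyError (hedged, not an official fix): rank_genes_groups expects adata.uns['log1p'] to record the base used for the log transform, and that entry can be lost when subsetting or when reading files written by older versions. Setting it explicitly lets the call proceed:

# scanpy encodes the natural log (the sc.pp.log1p default) as base None
adata.uns['log1p'] = {'base': None}

sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')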

pbmc scanpy tutorial reproducibility error in sc.tl.leiden arguments

Hello, I am a PhD student working on single-cell analysis. I started two months ago and have to move our analysis from R to Python, so I downloaded the .ipynb file and the dataset to my laptop to check that it works. I got this error:

TypeError: RBConfigurationVertexPartition.__init__() got an unexpected keyword argument 'flavor'

In this part of the code:
sc.tl.leiden(
    adata,
    resolution=0.9,
    random_state=0,
    flavor="igraph",
    n_iterations=2,
    directed=False,
)

And it is indeed the case that my version of sc.tl.leiden has no "flavor" argument, so I commented out the flavor="igraph" line and it runs. However, the clustering is slightly different, and the gene assignments do not match the analysis reported here: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html.

I know it is hard work to create tutorials and pages that demonstrate workflows, so if possible I would like to know how to reproduce the same clusters, or whether there is a different argument I could use to get the same results. Sorry for my limited experience with scanpy; this is my first time trying this workflow in Python. Thank you for your availability.
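With an older scanpy that predates the flavor argument, the call reduces to the sketch below; note that exact cluster assignments may still differ slightly between scanpy versions and Leiden backends, so bit-for-bit reproduction of the rendered tutorial is not guaranteed:

import scanpy as sc

# Same parameters as the tutorial, minus the unsupported `flavor` argument
sc.tl.leiden(
    adata,
    resolution=0.9,
    random_state=0,
    n_iterations=2,
    directed=False,
)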

Cannot reproduce trajectory inference tutorial

Hi there,

I downloaded and ran the trajectory inference Jupyter notebook. It runs smoothly, but I cannot get the plot where the cells are overlaid on the trajectory.

(As you can see in the attached screenshot, the upper panel is fine, but the bottom ones are wrong.)

The results in the tutorial are: [tutorial screenshot]

I didn't make any change to the code.

I am using scanpy 1.9.3. Any suggestions?

Thanks a lot!
