
scanpy-tutorials's Introduction

Scanpy tutorials

See this page for more context.

scanpy-tutorials's People

Contributors

abearab, adamgayoso, dineshpalli, evanbiederstedt, falexwolf, fidelram, flying-sheep, giovp, gokceneraslan, grst, hrovatin, ilan-gold, ivirshup, jlause, koncopd, ktpolanski, mr-milk, mys721tx, pre-commit-ci[bot], raphaelbuzzi


scanpy-tutorials's Issues

KeyError: 'rank_genes_groups'

I was working through the 3k PBMC tutorial and ran into the following error:

[screenshot of the rank_genes_groups KeyError]

How can I solve this problem?

Extract Differentially Expressed Genes for each group

Quick question:

I can plot differentially expressed genes for a group of celltypes in a dataset, like this:

sc.tl.rank_genes_groups(adata, 'celltype', method='wilcoxon', key_added="wilcoxon", min_fold_change=3)

How can I access those genes for each celltype?

The data seems to be stored in adata.uns['wilcoxon'], but I have no idea how to extract it:


{'params': {'groupby': 'preds',
  'reference': 'rest',
  'method': 'wilcoxon',
  'use_raw': False,
  'layer': None,
  'corr_method': 'benjamini-hochberg'},
 'names': rec.array([('CD14', 'HAL', 'KRT19', 'VWF', 'GLUL', 'CD4', 'CYP2E1', 'MARCO', 'CD3E'),
            ('FLT4', 'COL1A1', 'SOX9', 'LHX6', 'CLEC10A', 'VWF', 'CD14', 'CSF1R', 'NKG7'),
            ('C5AR1', 'ADAMTSL2', 'SPP1', 'HTRA3', 'CD276', 'COL1A1', 'GLUL', 'VSIG4', 'IL7R'),
            ('MYH11', 'COLEC11', 'COL1A1', 'RSPO3', 'SIRPA', 'COLEC11', 'C5AR1', 'CD68', 'PTPRC'),
            ('HAL', 'CD14', 'IGFBP3', 'HAL', 'C5AR1', 'C5AR1', 'CSF1R', 'HAL', 'HAL'),
            ('SIRPA', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'COL1A1', 'CYP2E1', 'CYP2E1'),
(.......................)  ],     
           dtype=[('Unassigned', 'O'), ('b-cells', 'O'), ('cholangiocytes', 'O'), ('endothelial cells', 'O'), ('erythroid cells', 'O'), ('hepatic stellate cells', 'O'), ('hepatocytes', 'O'), ('kupffer cells', 'O'), ('t-cells', 'O')]),
 'scores': rec.array([( 1.1047919 ,  1.4223580e+01,  8.5403233e+00,  2.9836638e+00,  3.7443349e+00,  1.03904781e+01,  5.28960686e+01,  2.30597286e+01,  1.96362038e+01),
            ( 0.65801376,  8.8617716e+00,  5.2942681e+00,  1.9939163e+00,  2.1673870e+00,  1.02436619e+01,  4.85130644e+00,  1.95555763e+01,  9.89364815e+00),
            ( 0.6156045 ,  4.8951001e+00,  4.3029914e+00,  9.9042362e-01,  1.9944884e+00,  8.08185673e+00,  4.40766430e+00,  1.81397591e+01,  5.23387718e+00),
(.......................)  ],     
           dtype=[('Unassigned', '<f4'), ('b-cells', '<f4'), ('cholangiocytes', '<f4'), ('endothelial cells', '<f4'), ('erythroid cells', '<f4'), ('hepatic stellate cells', '<f4'), ('hepatocytes', '<f4'), ('kupffer cells', '<f4'), ('t-cells', '<f4')]),
 'pvals': rec.array([(0.26924979, 6.54181873e-46, 1.33846555e-17, 0.00284819, 1.80872154e-04, 2.73975710e-25, 0.00000000e+00, 1.17486974e-117, 7.58615165e-86),
            (0.51052929, 7.87526234e-19, 1.19494060e-07, 0.04616121, 3.02053529e-02, 1.26361078e-24, 1.22650871e-06, 3.69808420e-085, 4.43561476e-23),
            (0.53815556, 9.82557304e-07, 1.68507525e-05, 0.3219671 , 4.60987120e-02, 6.37880428e-16, 1.04491384e-05, 1.54705656e-073, 1.65990714e-07),
            (0.56701976, 3.78240049e-06, 2.15205603e-05, 0.32967762, 1.87330858e-01, 1.54989449e-11, 6.36140894e-05, 1.67436768e-069, 3.60759031e-06),
(.......................)  ],     
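A sketch of one way to pull these results into a table, assuming a reasonably recent scanpy that provides sc.get.rank_genes_groups_df (the group name below is just one of the categories visible in the dump above):

import scanpy as sc

# Extract the ranked genes for one group as a DataFrame; `key` must match the
# key_added used in sc.tl.rank_genes_groups (here "wilcoxon")
de_hepatocytes = sc.get.rank_genes_groups_df(adata, group="hepatocytes", key="wilcoxon")
print(de_hepatocytes.head())

# Or build a dict of DataFrames, one per group, using the group names stored in the result
groups = adata.uns["wilcoxon"]["names"].dtype.names
de_per_group = {g: sc.get.rank_genes_groups_df(adata, group=g, key="wilcoxon") for g in groups}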


Displayed plot size in tutorials

In the visualization tutorials, the plots are rendered into a box of predefined size/shape and are not displayed properly: they only get the correct size after clicking on them to open in a separate window. This is a problem for the tutorial, because we want to show how to properly align and size plots, yet the current layout makes it look as if the plots are not sized correctly.
See the example below, which is completely squashed:
[squashed screenshot]
While the same image opened in a separate tab looks normal:
[screenshot at the correct size]

General questions

Can you help me with a couple of general things?

My object looks like this:

AnnData object with n_obs × n_vars = 37814 × 1802
obs: 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden'
var: 'gene_ids', 'feature_types', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'leiden', 'rank_genes_groups', 'leiden_colors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances', 'connectivities'

As far as I am aware, 'leiden' contains the cluster ID of my cells. How can I copy this to another obs slot?

Also, I have sample IDs from the Loupe output file. It is a CSV with the cell barcode in one column and the sample ID in the other. In R, using Seurat, I would assign the IDs using the following code; is there a similar way I can do this in Python using scanpy?

aggr$sampleID = samples_ID[match(rownames(aggr@meta.data), samples_ID$Barcode), 2]

Finally, to perform DE in Seurat, I would combine the cluster ID with the sample ID using:
aggr$cluster_sampleID <- paste(Idents(aggr), aggr$sampleID, sep = "_")
What would be the best way to do this using scanpy?

Many thanks,
Chris
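For what it's worth, a minimal sketch of how those three steps might look with scanpy and pandas; the CSV file name and its column names ("Barcode", "sampleID") are assumptions based on the description above:

import pandas as pd

# 1) Copy the Leiden cluster IDs into another .obs column
adata.obs["cluster_id"] = adata.obs["leiden"].copy()

# 2) Map sample IDs from the Loupe CSV onto cells by barcode
samples_ID = pd.read_csv("samples_ID.csv").set_index("Barcode")
adata.obs["sampleID"] = samples_ID["sampleID"].reindex(adata.obs_names).values

# 3) Combine cluster ID and sample ID, analogous to paste(..., sep = "_") in R
adata.obs["cluster_sampleID"] = (
    adata.obs["cluster_id"].astype(str) + "_" + adata.obs["sampleID"].astype(str)
)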

Analyzing CITE-seq data contains internal comments

Hello there, thanks for the amazing Scanpy tutorials!

The CITE-seq tutorial contains a number of internal comments, which you might want to hide. Here are a few examples:

We still need to explain the function here. I’m happy if we add it to the first tutorial, too (I know you did it already at some point, but I didn’t want to let go of the simpler naming scheme back then; now I’d be happy to transition.)

TODO: I would like to include some justification for the change in normalization. It definitely has a much different distribution than transcripts. I think this could be shown through the QC plots, but it's a huge pain to move around these matplotlib plots. This might be more appropriate for the in-depth guide though.

Agreed! But, having an explanation of the two images below is also a good first step. Also nice to contrast it with the corresponding RNA distributions.

Discuss that this here is a different normalization.

Yes, would be great to the see the QC plots, here, too. Are cells with low counts etc. weird only on the RNA level or also on the protein level? It’d be nice to see the QC plots side-by-side between RNA and protein, I’d say.

pbmc3k tutorial issue

ModuleNotFoundError                       Traceback (most recent call last)
Input In [3], in <cell line: 3>()
      1 import numpy as np
      2 import pandas as pd
----> 3 import scanpy as sc

ModuleNotFoundError: No module named 'scanpy'

When trying to run the code in a Jupyter notebook I keep getting this error, even though I downloaded and installed scanpy. What can I do to fix this?
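This usually means the notebook kernel is not running in the environment where scanpy was installed. A quick, hedged check from inside the notebook:

import sys

# Show which Python interpreter the notebook kernel is actually using
print(sys.executable)

# If scanpy lives in a different environment, one option is to install it into the
# kernel's own interpreter (run this in a notebook cell):
# !{sys.executable} -m pip install scanpy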

The dataset of the scanpy workshop day2

Hi, is data_processed.h5ad available somewhere? Day 2 of the scanpy workshop analyzes it, but I can only find the GEO files. Should I create it myself? Thanks.

Wrong link? I could not connect the link to download pre-processed h5ad file.

In the tutorial "Integrating spatial data with scRNA-seq using scanorama", I could not download the pre-processed h5ad file, which is linked from the following sentence: "Conveniently, you can also download the pre-processed dataset in h5ad format from here".

(https://hmgubox.helmholtz-muenchen.de/f/4ef254675e2a41f89835/?dl=1).
https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html

Is it just a broken link?
I am not familiar with this community or with GitHub, so please let me know if I have posted in the wrong place. Thank you.

Spatial Transcriptomics tutorial

Hi, I'm running the Spatial Transcriptomics tutorial and encountering an error loading data.

https://nbviewer.ipython.org/github/giovp/scanpy-tutorials/blob/spatial/analysis-visualization-spatial.ipynb

These are installed and imported with no problem:

import scanpy as sc
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib import rcParams
import seaborn as sb

import SpatialDE

plt.rcParams['figure.figsize']=(8,8)

%load_ext autoreload
%autoreload 2

Then, with this command to import data:

adata = sc.datasets.visium_sge('V1_Human_Lymph_Node')

This happens:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-08de67716edc> in <module>
----> 1 adata = sc.datasets.visium_sge('V1_Human_Lymph_Node')

AttributeError: module 'scanpy.datasets' has no attribute 'visium_sge'

So, I tried loading an h5 file that I have downloaded:

adata=sc.read_visium('filtered_feature_bc_matrix.h5')

But this happened:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-a58c17f7f867> in <module>
----> 1 adata=sc.read_visium('filtered_feature_bc_matrix.h5')

AttributeError: module 'scanpy' has no attribute 'read_visium'

Version:

>>> sc.__version__
'1.4.5.1'

I thought it might have been my installation, so I tried this:

!pip install git+https://github.com/theislab/scanpy.git@spatial

But, this happened:

Collecting git+https://github.com/theislab/scanpy.git@spatial
  Cloning https://github.com/theislab/scanpy.git (to revision spatial) to /tmp/pip-req-build-t7upgber
  Running command git clone -q https://github.com/theislab/scanpy.git /tmp/pip-req-build-t7upgber
  WARNING: Did not find branch or tag 'spatial', assuming revision or ref.
  Running command git checkout -q spatial
  error: pathspec 'spatial' did not match any file(s) known to git.
ERROR: Command errored out with exit status 1: git checkout -q spatial Check the logs for full command output.

It seems like I don't have access to these functions. Is there something I should be doing differently? Any suggestions for things to try?

>>> sc.logging.print_versions() 
scanpy==1.4.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1

Array in 1-dimension error

Hi,
I am getting this error while adding a new column for MT genes.
What am I doing wrong?

adata = sc.read_10x_h5(dir_path + 'filtered_feature_bc_matrix.h5')
adata.var_names_make_unique() 
print(adata.X)
print(adata.obs['sample'].value_counts())
print(adata.obs['sample'].value_counts())
print(f'Number of cells before filter: {adata.n_obs}')

# Quality control - calculate QC covariates
adata.obs['n_counts'] = adata.X.sum(1)
adata.obs['log_counts'] = np.log(adata.obs['n_counts'])
adata.obs['n_genes'] = (adata.X > 0).sum(1)

mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]
adata.obs['mt_frac'] = adata.X[:, mt_gene_mask].sum(1)/adata.obs['n_counts']
The print statements produce:

AV_24    571
Name: sample, dtype: int64
Number of cells before filter: 571
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-26-5e7c1a502972> in <module>
     11 
     12 mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]
---> 13 adata.obs['mt_frac'] = adata.X[:, mt_gene_mask].sum(1)/adata.obs['n_counts']

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    634         # for binary ops, use our custom dunder methods
    635         result = ops.maybe_dispatch_ufunc_to_dunder_op(
--> 636             self, ufunc, method, *inputs, **kwargs
    637         )
    638         if result is not NotImplemented:

pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/common.py in new_method(self, other)
     62         other = item_from_zerodim(other)
     63 
---> 64         return method(self, other)
     65 
     66     return new_method

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    503         result = arithmetic_op(lvalues, rvalues, op, str_rep)
    504 
--> 505         return _construct_result(left, result, index=left.index, name=res_name)
    506 
    507     wrapper.__name__ = op_name

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
    476     # We do not pass dtype to ensure that the Series constructor
    477     #  does inference in the case where `result` has object-dtype.
--> 478     out = left._constructor(result, index=index)
    479     out = out.__finalize__(left)
    480 

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

~/my_virtualenv/lib/python3.7/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional
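The traceback ends with pandas refusing a 2-D array: .sum(1) on a sparse matrix returns a 2-D numpy matrix, which cannot be stored as an .obs column. A possible fix (a sketch, not the tutorial's official code) is to flatten the result before dividing:

import numpy as np

mt_gene_mask = [gene.startswith('MT-') for gene in adata.var_names]

# Flatten the (n_cells, 1) matrix returned by the sparse sum into a 1-D array
mt_counts = np.asarray(adata.X[:, mt_gene_mask].sum(axis=1)).ravel()
adata.obs['mt_frac'] = mt_counts / adata.obs['n_counts'].values

Alternatively, sc.pp.calculate_qc_metrics can compute per-cell QC covariates, including mitochondrial fractions, in a single call.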

pbmc3k tutorial does weird stuff at the end

One of the last lines of the pbmc3k tutorial is:

pbmc.X = None

I'm not sure if this ever worked. The goal is to save on space in memory, so I think this could be changed to replacing the dense array with sparse data:

adata.X = adata.raw[:, adata.var_names].X

Spatial integration tutorial has complicated concatenation expression

@giovp, input 8 of the spatial integration tutorial currently looks like this:

adata_spatial = adata_spatial_anterior.concatenate(
    adata_spatial_posterior,
    batch_key="library_id",
    uns_merge="unique",
    batch_categories=[
        k
        for d in [
            adata_spatial_anterior.uns["spatial"],
            adata_spatial_posterior.uns["spatial"],
        ]
        for k, v in d.items()
    ],
)

That batch_categories expression is a little hard to read, and simplifies to: ['V1_Mouse_Brain_Sagittal_Anterior', 'V1_Mouse_Brain_Sagittal_Posterior'].

Is there a simpler way to do this? Maybe:

adata_spatial = adata_spatial_anterior.concatenate(
    adata_spatial_posterior,
    batch_key="library_id",
    uns_merge="unique",
    batch_categories=[
        'V1_Mouse_Brain_Sagittal_Anterior',
        'V1_Mouse_Brain_Sagittal_Posterior',
    ],
)

instead?

spatial tutorial about np.concatenate

When running the spatial tutorial I ran into trouble at the step below:
embedding_anterior = np.concatenate(integrated_anterior, axis=0)
When I run this code, Python just keeps running and produces no result; after ending it with Ctrl+C I get the traceback below:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    embedding = np.concatenate(integrated, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1088, in __getitem__
    return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 305, in __init__
    self._init_as_view(X, oidx, vidx)
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 358, in _init_as_view
    self._remove_unused_categories(adata_ref.var, var_sub, uns_new)
  File "/root/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1092, in _remove_unused_categories
    if not is_categorical_dtype(df_full[k]):
  File "/root/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2878, in __getitem__
    return self._get_item_cache(key)
  File "/root/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 3540, in _get_item_cache
    loc = self.columns.get_loc(item)
  File "/root/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
KeyboardInterrupt

File not found error

Hi, I am new to scanpy and I keep encountering this error: FileNotFoundError: Did not find file data/filtered_gene_bc_matrices/hg19/matrix.mtx.gz. I have unzipped the archive, but that did not help. Does anyone have advice on how to solve this?
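A hedged check that the extracted 10x files sit where the tutorial's read call expects them; the path below is the one from the error message:

from pathlib import Path
import scanpy as sc

# List what is actually inside the expected folder (relative to the notebook's working directory)
path = Path("data/filtered_gene_bc_matrices/hg19")
print(path.exists())
if path.exists():
    print(sorted(p.name for p in path.iterdir()))

# Read the extracted folder (not the .tar.gz archive itself)
adata = sc.read_10x_mtx(path, var_names="gene_symbols", cache=True)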

normalize_geometric scanpy version?

Hi, which scanpy version has the normalize_geometric function for CITE-seq data?

I'm running:

sc.logging.print_versions()
scanpy==1.5.2.dev24+g83844013 anndata==0.7.3 umap==0.4.4 numpy==1.18.5 scipy==1.4.1 pandas==1.0.5 scikit-learn==0.23.1 statsmodels==0.11.1

thanks!

How can I check directly which marker genes the adata contains?

I'd like to plot marker-gene score figures, but some marker genes do not appear in the adata, or I am not sure whether the spelling is consistent. Is there a way to check directly which marker genes are included in the adata? Thanks.
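A minimal way to check which of your markers are present; the gene list below is a hypothetical placeholder:

marker_genes = ["CD3E", "CD14", "NKG7"]  # replace with your own marker list

present = [g for g in marker_genes if g in adata.var_names]
missing = [g for g in marker_genes if g not in adata.var_names]
print("present:", present)
print("missing:", missing)

# A case-insensitive scan can catch spelling/case mismatches
wanted = {m.lower() for m in marker_genes}
print("case-insensitive matches:", [g for g in adata.var_names if g.lower() in wanted])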

tutorial example fails, AttributeError: module 'scanpy' has no attribute 'settings'

Python 3.6.7 (default, Dec 26 2018, 21:06:52)
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pandas as pd
>>> import scanpy as sc
>>>
>>> sc.settings.verbosity = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'scanpy' has no attribute 'settings'
>>> sc.__version__
'1.3.6'

pip can't find branch @spatial

Hello! I'm trying to install the scanpy @spatial branch but keep bumping into this error:

pip install git+https://github.com/theislab/scanpy.git@spatial
Collecting git+https://github.com/theislab/scanpy.git@spatial
  Cloning https://github.com/theislab/scanpy.git (to revision spatial) to /tmp/pip-req-build-d8y5n7oc
  Running command git clone -q https://github.com/theislab/scanpy.git /tmp/pip-req-build-d8y5n7oc
  WARNING: Did not find branch or tag 'spatial', assuming revision or ref.
  Running command git checkout -q spatial
  error: pathspec 'spatial' did not match any file(s) known to git.
ERROR: Command errored out with exit status 1: git checkout -q spatial Check the logs for full command output.
...

Could you guys please help me out? Thanks in advance!

Run tutorial on Google Cloud VM instance

Hello, I'm a new user running the scanpy tutorials on a Google Cloud VM instance. I can't get any figures when I run them. For example, when I run sc.pl.highest_expr_genes(adata, n_top=20), it finishes, but no figure appears as expected. I know this is not a scanpy issue, but do you have any experience or suggestions? Thank you so much.
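On a headless VM there is typically no display attached, so interactive figure windows cannot open. One option (a sketch using the save argument that scanpy plotting functions accept) is to write plots to disk and view the files instead:

import scanpy as sc

sc.settings.figdir = "./figures"  # directory where saved plots are written

# `save` appends the given suffix to a default filename; `show=False` avoids opening a window
sc.pl.highest_expr_genes(adata, n_top=20, save=".png", show=False)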

key Error - paga_path

I am getting an error with sc.pl.paga_path. It does not plot genes that my data has: I double-checked, and the gene is in adata.var_names. Oddly, if I use a gene that is not in adata.var_names, it does plot. Any idea what is wrong with the code?

_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()

Errors:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Ascl1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-522-a191d9b34213> in <module>
     17         title='{} path'.format(descr),
     18         return_data=True,
---> 19         show=False)
     20     data.to_csv('./write/paga_path_{}.csv'.format(descr))
     21 pl.savefig('./figures/paga_path_paul15.pdf')

~/opt/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/paga.py in paga_path(adata, nodes, keys, use_raw, annotations, color_map, color_maps_annotations, palette_groups, n_avg, groups_key, xlim, title, left_margin, ytick_fontsize, title_fontsize, show_node_names, show_yticks, show_colorbar, legend_fontsize, legend_fontweight, normalize_to_zero_one, as_heatmap, return_data, show, save, ax)
   1090                 x += list(adata.obs[key].values[idcs])
   1091             else:
-> 1092                 x += list(adata_X[:, key].X[idcs])
   1093             if ikey == 0:
   1094                 groups += [group for i in range(len(idcs))]

~/opt/anaconda3/lib/python3.7/site-packages/anndata/_core/raw.py in __getitem__(self, index)
     99 
    100     def __getitem__(self, index):
--> 101         oidx, vidx = self._normalize_indices(index)
    102 
    103         # To preserve two dimensional shape

~/opt/anaconda3/lib/python3.7/site-packages/anndata/_core/raw.py in _normalize_indices(self, packed_index)
    159         obs, var = unpack_index(packed_index)
    160         obs = _normalize_index(obs, self._adata.obs_names)
--> 161         var = _normalize_index(var, self.var_names)
    162         return obs, var
    163 

~/opt/anaconda3/lib/python3.7/site-packages/anndata/_core/index.py in _normalize_index(indexer, index)
     72         return indexer
     73     elif isinstance(indexer, str):
---> 74         return index.get_loc(indexer)  # int
     75     elif isinstance(indexer, (Sequence, np.ndarray, pd.Index, spmatrix, np.matrix)):
     76         if hasattr(indexer, "shape") and (

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Ascl1'
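The traceback shows the lookup failing inside adata.raw (anndata/_core/raw.py), so a plausible explanation is that 'Ascl1' exists in adata.var_names but not in adata.raw.var_names, which paga_path reads from by default. A hedged check:

# Where does the gene actually live?
print('Ascl1' in adata.var_names)       # reported as True
print('Ascl1' in adata.raw.var_names)   # likely False, given the traceback

# If it is missing from .raw, adding use_raw=False to the sc.pl.paga_path call above
# makes the function read expression values from adata.X instead of adata.raw.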

(bug): `pbmc3k` tutorial is not reproducible

To Reproduce

Run https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html with the current scanpy master branch, and the results of the marker-gene plotting do not match the table. For example, IL7R is not ranked highly in the first group (notwithstanding the fact that it is filtered out by the highly-variable-gene selection). Even in the rendered version, this does not seem to hold.

I also get a different number of clusters by default.

From what I heard, this tutorial will be replaced anyway, but I figured I'd register an issue.

combining protein and RNA

Dear developers,

I'm trying to follow your tutorial on combining CITE-seq data, but I fail to recombine the two datasets, protein and RNA, back into one object.
This might be because each set was processed differently (filtering cells, genes) before trying to recombine them. How can this be done, or in what way should I process the protein and RNA data so that I can recombine both objects for visualization?

Kind Regards
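One possible pattern, sketched under the assumption that the two modalities are separate AnnData objects (here called adata_rna and adata_protein, hypothetical names), is to subset both to the shared barcodes and attach the protein data to the RNA object:

# Keep only cells that survived QC in both modalities
common = adata_rna.obs_names.intersection(adata_protein.obs_names)
adata_rna = adata_rna[common].copy()
adata_protein = adata_protein[common].copy()

# Store the protein measurements alongside the RNA object for joint visualization
adata_rna.obsm["protein_expression"] = adata_protein[adata_rna.obs_names].to_df()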

ImportError: Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`.

Hi,
I'm running the pbmc3k tutorial in a Jupyter notebook. When I ran sc.tl.leiden(adata), I encountered this error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File ~/anaconda3/envs/env4/lib/python3.9/site-packages/scanpy/tools/_leiden.py:108, in leiden(adata, resolution, restrict_to, random_state, key_added, adjacency, directed, use_weights, n_iterations, partition_type, neighbors_key, obsp, copy, **partition_kwargs)
    107 try:
--> 108     import leidenalg
    109 except ImportError:

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/leidenalg/__init__.py:35
      2 r""" This package implements the Leiden algorithm in ``C++`` and exposes it to
      3 python.  It relies on ``(python-)igraph`` for it to function. Besides the
      4 relative flexibility of the implementation, it also scales well, and can be run
   (...)
     33 not immediately available in :func:`leidenalg.find_partition`.
     34 """
---> 35 from .functions import ALL_COMMS
     36 from .functions import ALL_NEIGH_COMMS

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/leidenalg/functions.py:2
      1 import sys
----> 2 import igraph as _ig
      3 from . import _c_leiden

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/__init__.py:25
      6 __license__ = """
      7 Copyright (C) 2006- The igraph development team
      8 
   (...)
     22 02110-1301 USA
     23 """
---> 25 from igraph._igraph import (
     26     ADJ_DIRECTED,
     27     ADJ_LOWER,
     28     ADJ_MAX,
     29     ADJ_MIN,
     30     ADJ_PLUS,
     31     ADJ_UNDIRECTED,
     32     ADJ_UPPER,
     33     ALL,
     34     ARPACKOptions,
     35     BFSIter,
     36     BLISS_F,
     37     BLISS_FL,
     38     BLISS_FLM,
     39     BLISS_FM,
     40     BLISS_FS,
     41     BLISS_FSM,
     42     DFSIter,
     43     Edge,
     44     GET_ADJACENCY_BOTH,
     45     GET_ADJACENCY_LOWER,
     46     GET_ADJACENCY_UPPER,
     47     GraphBase,
     48     IN,
     49     InternalError,
     50     OUT,
     51     REWIRING_SIMPLE,
     52     REWIRING_SIMPLE_LOOPS,
     53     STAR_IN,
     54     STAR_MUTUAL,
     55     STAR_OUT,
     56     STAR_UNDIRECTED,
     57     STRONG,
     58     TRANSITIVITY_NAN,
     59     TRANSITIVITY_ZERO,
     60     TREE_IN,
     61     TREE_OUT,
     62     TREE_UNDIRECTED,
     63     Vertex,
     64     WEAK,
     65     arpack_options as default_arpack_options,
     66     community_to_membership,
     67     convex_hull,
     68     is_bigraphical,
     69     is_degree_sequence,
     70     is_graphical,
     71     is_graphical_degree_sequence,
     72     set_progress_handler,
     73     set_random_number_generator,
     74     set_status_handler,
     75     umap_compute_weights,
     76     __igraph_version__,
     77 )
     78 from igraph.adjacency import (
     79     _get_adjacency,
     80     _get_adjacency_sparse,
   (...)
     83     _get_inclist,
     84 )

ImportError: dlopen(/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so, 0x0002): Library not loaded: @rpath/libblas.3.dylib
  Referenced from: /Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so
  Reason: tried: '/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/../../../libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/../../../libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/bin/../lib/libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/bin/../lib/libblas.3.dylib' (no such file), '/usr/local/lib/libblas.3.dylib' (no such file), '/usr/lib/libblas.3.dylib' (no such file)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[60], line 1
----> 1 sc.tl.leiden(adata)

File ~/anaconda3/envs/env4/lib/python3.9/site-packages/scanpy/tools/_leiden.py:110, in leiden(adata, resolution, restrict_to, random_state, key_added, adjacency, directed, use_weights, n_iterations, partition_type, neighbors_key, obsp, copy, **partition_kwargs)
    108     import leidenalg
    109 except ImportError:
--> 110     raise ImportError(
    111         'Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`.'
    112     )
    113 partition_kwargs = dict(partition_kwargs)
    115 start = logg.info('running Leiden clustering')

ImportError: Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`.

Then I tried import leidenalg directly, but got the same error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[61], line 1
----> 1 import leidenalg

   (...)   [same import chain through leidenalg/functions.py and igraph/__init__.py as in the traceback above]

ImportError: dlopen(/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so, 0x0002): Library not loaded: @rpath/libblas.3.dylib
  Referenced from: /Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/_igraph.abi3.so
  Reason: tried: '/Users/cailinghui/anaconda3/envs/env4/lib/python3.9/site-packages/igraph/../../../libblas.3.dylib' (no such file), '/Users/cailinghui/anaconda3/envs/env4/bin/../lib/libblas.3.dylib' (no such file), '/usr/local/lib/libblas.3.dylib' (no such file), '/usr/lib/libblas.3.dylib' (no such file)

I've installed the leidenalg package. How can I solve this?
I am using Python 3.9.18, Jupyter Notebook 6.5.4, scanpy==1.9.5 anndata==0.10.0 umap==0.5.3 numpy==1.24.3 scipy==1.11.3 pandas==2.0.3 scikit-learn==1.3.0 statsmodels==0.14.0 pynndescent==0.5.10

Thanks a lot!

Pearson residual normalization: projection onto t-SNE coordinates vs. UMAP, and the shape of the embedding?

Hey, I don't really know whether this should be called an issue.
In the paper justifying square-root normalization for visualization and clustering, and for the Pearson residual normalization with selected HVGs, both sets of results are projected onto t-SNE space. Are you suggesting we use t-SNE as the main embedding method?

I tried it with my data, and with a UMAP projection my data seems less well separated than with other normalization methods. Looking at the Pearson residuals in Fig. 4d,e of that paper, I also see a more "linear", less separable distribution on the t-SNE embeddings (compared to Fig. 4a and c).
Actually, I personally like the simplicity of the linear structure, but I need some explanation of why this happens (e.g. to explain it to colleagues who use sctransform and get projections like Fig. 4c), and confirmation that t-SNE embeddings outperform UMAP with Pearson residuals.

Best,

cell type annotation problem

When I ran the clustering code in the tutorials, everything worked except sc.pl.umap(adata, legend_loc='on data', title='', frameon=False, save='.pdf', color='leiden'). It reports that 'Float64Index' object has no attribute 'add_categories'. How should I solve this problem?

missing code in Paul15?

Hi,

I'm following the script for Paul et al. (2015) (https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html). I'm not familiar with Python, but is something missing in cell [40]?
I get an error on line 1.

_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
    data.to_csv('./write/paga_path_{}.csv'.format(descr))
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()
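For what it's worth, cell [40] assumes that the earlier cells of the tutorial have already been run; a hedged summary of what line 1 needs:

# Line 1 only needs matplotlib.pyplot imported under the name `pl`,
# which the tutorial imports near the top of the notebook:
import matplotlib.pyplot as pl

# The loop additionally needs `paths` (a list of (description, node_path) tuples)
# and `gene_names` (a list of genes), both defined in earlier cells, plus existing
# ./write/ and ./figures/ directories for the CSV and PDF outputs.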

tutorial_pearson_residuals: why are residuals computed twice?

In the tutorial_pearson_residuals tutorial, the steps are:

# sets adata.var["highly_variable"]
sc.experimental.pp.highly_variable_genes(adata, flavor="pearson_residuals", n_top_genes=2000) 
adata = adata[:, adata.var["highly_variable"]]
sc.experimental.pp.normalize_pearson_residuals(adata)

sc.experimental.pp.highly_variable_genes(flavor="pearson_residuals") computes pearson residuals here:

https://github.com/scverse/scanpy/blob/2e98705347ea484c36caa9ba10de1987b09081bf/scanpy/experimental/pp/_highly_variable_genes.py#L120-L127

And sc.experimental.pp.normalize_pearson_residuals computes pearson residuals here:

https://github.com/scverse/scanpy/blob/2e98705347ea484c36caa9ba10de1987b09081bf/scanpy/experimental/pp/_normalization.py#L130

which refers to

https://github.com/scverse/scanpy/blob/2e98705347ea484c36caa9ba10de1987b09081bf/scanpy/experimental/pp/_normalization.py#L57-L64

So if I'm understanding correctly, the tutorial leads you to compute Pearson residuals twice, which is a little confusing. Why are there two different functions for computing those residuals anyhow? @jlause
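If the redundant computation is the main concern, scanpy's experimental module also ships a combined recipe (assuming a version that includes it) that performs HVG selection, subsetting, and Pearson residual normalization in one call; note that the recipe also computes a PCA on the residuals by default:

import scanpy as sc

# One call instead of highly_variable_genes + subsetting + normalize_pearson_residuals
sc.experimental.pp.recipe_pearson_residuals(adata, n_top_genes=2000)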

3k PBMCs tutorial: neighborhood graph scanpy vs. Seurat

Hello!

I have been trying to translate a colleague's Seurat-based R code to scanpy/Python and have been using the PBMC 3k guided tutorials from each as a reference for basic preprocessing workflow. Let me know if this question would be better suited for mainstream scanpy, but I was having trouble understanding the translation between Seurat FindNeighbors() and scanpy pp.neighbors() based on the docs.

In Seurat's 3k PBMCs tutorial, they use JackStraw to determine the dimensionality of the data (first 10 PCs) and then construct the KNN graph with that dimensionality as a param of the FindNeighbors() function:
pbmc <- FindNeighbors(pbmc, dims = 1:10)

Based on the sc.pp.neighbors() documentation, I assumed that scanpy's n_pcs param would be equivalent to the FindNeighbors() dims param, but in the scanpy 3k PBMCs tutorial it reads:

Let us compute the neighborhood graph of cells using the PCA representation of the data matrix. You might simply use default values here. For the sake of reproducing Seurat’s results, let’s take the following values.
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

If we wanted to reproduce Seurat's results from their tutorial, wouldn't we want to set n_pcs = 10 to include the first 10 PCs when creating the neighborhood graph? I just wanted to clarify that n_pcs is the correct scanpy parameter for setting the number of PCs included for the KNN graph.

Thank you!

Phoebe
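For reference, a sketch of the scanpy call that would mirror the Seurat tutorial's parameters, under the assumption that Seurat's FindNeighbors default of k.param = 20 applies:

import scanpy as sc

# dims = 1:10 in Seurat corresponds to the first 10 PCs; n_neighbors mirrors k.param
sc.pp.neighbors(adata, n_neighbors=20, n_pcs=10)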

pbmc3k - KeyError: 'base'

Following the pbmc3k tutorial I get an error: KeyError: 'base'
when executing the following command:

sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')

I run python 3.8 in a venv on Ubuntu 20.04 (VM)

scanpy==1.9.3

More error details (notebook cell's output):

ranking genes
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[47], line 1
----> 1 sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')
      2 sc.pl.rank_genes_groups(adata, groups=['0'], n_genes=20)

File .../lib/python3.8/site-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
    580 adata.uns[key_added] = {}
    581 adata.uns[key_added]['params'] = dict(
    582     groupby=groupby,
    583     reference=reference,
   (...)
    587     corr_method=corr_method,
    588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
    592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
    593     logg.warning(
    594         "It seems you use rank_genes_groups on the raw count data. "
    595         "Please logarithmize your data before calling rank_genes_groups."
    596     )

File .../lib/python3.8/site-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
     82 def __init__(
     83     self,
     84     adata,
   (...)
     90     comp_pts=False,
     91 ):
---> 93     if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
     94         self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
     95     else:

KeyError: 'base'
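A workaround that is often suggested for this KeyError (hedged, not an official fix): rank_genes_groups expects adata.uns['log1p'] to record the base used for the log transform, and that entry can be lost when subsetting or when reading files written by older versions. Setting it explicitly lets the call proceed:

# scanpy encodes the natural log (the sc.pp.log1p default) as base None
adata.uns['log1p'] = {'base': None}

sc.tl.rank_genes_groups(adata, 'leiden', groups=['0'], reference='1', method='wilcoxon')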

pbmc scanpy tutorial reproducibility error in sc.tl.leiden arguments

Hello, I am a PhD student working on single-cell analysis. I started two months ago and have to move our analysis from R to Python, so I downloaded the .ipynb file and the dataset to my laptop to check that it works. I got this error:

TypeError: RBConfigurationVertexPartition.__init__() got an unexpected keyword argument 'flavor'

In this part of the code:
sc.tl.leiden(
    adata,
    resolution=0.9,
    random_state=0,
    flavor="igraph",
    n_iterations=2,
    directed=False,
)

And it is indeed the case that my version of sc.tl.leiden has no "flavor" argument, so I commented out the flavor="igraph" line and it runs. However, the clustering is slightly different, and the gene assignments do not match the analysis reported here: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html.

I know it is hard work to create tutorials and pages that demonstrate workflows, so if possible I would like to know how to reproduce the same clusters, or whether there is a different argument I could use to get the same results. Sorry for my limited experience with scanpy; this is my first time trying this workflow in Python. Thank you for your availability.
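With an older scanpy that predates the flavor argument, the call reduces to the sketch below; note that exact cluster assignments may still differ slightly between scanpy versions and Leiden backends, so bit-for-bit reproduction of the rendered tutorial is not guaranteed:

import scanpy as sc

# Same parameters as the tutorial, minus the unsupported `flavor` argument
sc.tl.leiden(
    adata,
    resolution=0.9,
    random_state=0,
    n_iterations=2,
    directed=False,
)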

Cannot reproduce trajectory inference tutorial

Hi there,

I downloaded and ran the trajectory inference Jupyter notebook. It runs smoothly, but I cannot get the plot where the cells are overlaid on the trajectory.

(As you can see in the attached screenshot, the upper panel is fine, but the bottom ones are wrong.)

The results in the tutorial are: [tutorial screenshot]

I didn't make any change to the code.

I am using scanpy 1.9.3. Any suggestions?

Thanks a lot!
