
single-cell-tutorial's Introduction

Scripts for "Current best-practices in single-cell RNA-seq: a tutorial"

Note: The "current" best practices detailed in this workflow were set up in 2019; thus, they no longer necessarily reflect the latest best practices for scRNA-seq analysis. For an up-to-date version of the latest best practices for single-cell RNA-seq analysis (and more modalities), please see our consistently updated online book: https://www.sc-best-practices.org.

For more information and contribution guidelines, please visit the associated GitHub repository: https://github.com/theislab/single-cell-best-practices


This repository is complementary to the publication:

M.D. Luecken, F.J. Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial", Molecular Systems Biology 15(6) (2019): e8746

The paper was recommended on F1000Prime as being of special significance in the field.

Access the recommendation on F1000Prime

The repository contains:

  • scripts to generate the paper figures
  • a case study which complements the manuscript
  • the code for the marker gene detection study from the supplementary material

The main part of this repository is a case study in which the best practices established in the manuscript are applied to a mouse intestinal epithelium regions dataset from Haber et al., Nature 551 (2017), available from GEO under accession GSE92332. This case study can be found in different versions in the latest_notebook/ and old_releases/ directories.

The scripts in the plotting_scripts/ folder reproduce the figures that are shown in the manuscript and the supplementary materials. These scripts contain comments to explain each step. Each figure that does not have a corresponding script in the plotting_scripts/ folder was taken from the case study or the marker gene study.

In case of questions or issues, please get in touch by posting an issue in this repository.

If the materials in this repo are of use to you, please consider citing the above publication.

Environment set up

A Docker container with a working sc-tutorial environment is now available here, thanks to Leander Dony. If you would like to set up the environment via conda or manually outside of the Docker container, please follow the instructions below.

To run the tutorial case study, several packages must be installed. As both R and Python packages are required, we prefer using a conda environment. To facilitate setting up the conda environment, we have provided the sc_tutorial_environment.yml file, which contains all conda- and pip-installable dependencies. R dependencies that are not available as conda packages must be installed into the environment manually.

To set up the conda environment, follow the instructions below.

  1. Set up the conda environment from the sc_tutorial_environment.yml file.

    conda env create -f sc_tutorial_environment.yml
    
  2. Ensure that the environment can find the gsl libraries from R. This is done by setting the CFLAGS and LDFLAGS environment variables (see https://bit.ly/2CjJsgn). Here we set them so that they are correctly set every time the environment is activated.

    cd YOUR_CONDA_ENV_DIRECTORY
    mkdir -p ./etc/conda/activate.d
    mkdir -p ./etc/conda/deactivate.d
    touch ./etc/conda/activate.d/env_vars.sh
    touch ./etc/conda/deactivate.d/env_vars.sh
    

Where YOUR_CONDA_ENV_DIRECTORY can be found by running conda info --envs and using the directory that corresponds to your conda environment name (default: sc-tutorial).

WHILE NOT IN THE ENVIRONMENT, open the env_vars.sh file at ./etc/conda/activate.d/env_vars.sh and enter the following into the file:

    #!/bin/sh
    
    CFLAGS_OLD=$CFLAGS
    export CFLAGS_OLD
    export CFLAGS="`gsl-config --cflags` ${CFLAGS_OLD}"
     
    LDFLAGS_OLD=$LDFLAGS
    export LDFLAGS_OLD
    export LDFLAGS="`gsl-config --libs` ${LDFLAGS_OLD}"
    

    Also change the ./etc/conda/deactivate.d/env_vars.sh file to:

    #!/bin/sh
     
    CFLAGS=$CFLAGS_OLD
    export CFLAGS
    unset CFLAGS_OLD
     
    LDFLAGS=$LDFLAGS_OLD
    export LDFLAGS
    unset LDFLAGS_OLD
    

    Note again that these files should be written WHILE NOT IN THE ENVIRONMENT. Otherwise you may overwrite the CFLAGS and LDFLAGS environment variables in the base environment!

  3. Enter the environment via conda activate sc-tutorial, or conda activate ENV_NAME if you changed the environment name in the sc_tutorial_environment.yml file.

  4. Open R and install the dependencies via the commands:

    install.packages(c('devtools', 'gam', 'RColorBrewer', 'BiocManager'))
    update.packages(ask=F)
    BiocManager::install(c("scran","MAST","monocle","ComplexHeatmap","slingshot"), version = "3.8")
    

These steps should set up an environment to perform single-cell analysis with the tutorial workflow on a Linux system. Please note that we have encountered issues with conda environments on macOS. When using macOS, we recommend installing the packages without conda, using separately installed Python and R versions. Alternatively, you can try using the base conda environment and installing all packages as described in the conda_env_instructions_for_mac.txt file. In the base environment, R should be able to find the relevant gsl libraries, so LDFLAGS and CFLAGS should not need to be set.

Also note that conda and pip don't always play nicely together. Conda developers have suggested first installing all conda packages and then installing pip packages on top where conda packages are not available. Thus, installing further conda packages into the environment later may cause issues. Instead, start a new environment and reinstall all conda packages first.

If you prefer to set up an environment manually, a list of all package requirements is given at the end of this document.

Downloading the data

As mentioned above, the data for the case study comes from GSE92332. To run the case study as shown, you must download this data and place it in the correct folder. Unpacking the data requires tar and gunzip, which should already be available on most systems. If you are cloning the GitHub repository and have the case study script in a latest_notebook/ folder, then from the location where you store the case study ipynb file, this can be done via the following commands:

cd ../  # To get to the main GitHub repo folder
mkdir -p data/Haber-et-al_mouse-intestinal-epithelium/
cd data/Haber-et-al_mouse-intestinal-epithelium/
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE92nnn/GSE92332/suppl/GSE92332_RAW.tar
mkdir GSE92332_RAW
tar -C GSE92332_RAW -xvf GSE92332_RAW.tar
gunzip GSE92332_RAW/*_Regional_*

The annotated dataset, with which we briefly compare the results at the end of the notebook, is available from the same GEO accession (GSE92332). It can be obtained using the following commands:

cd data/Haber-et-al_mouse-intestinal-epithelium/
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE92nnn/GSE92332/suppl/GSE92332_Regional_UMIcounts.txt.gz
gunzip GSE92332_Regional_UMIcounts.txt.gz

Case study notes

We have noticed that outputs such as visualizations, dimensionality reductions, and clusterings (and hence all downstream results as well) can differ slightly between systems. This has to do with the numerical libraries that are used in the backend. Thus, we cannot guarantee that a rerun of the notebook will generate exactly the same clusters.

While all results are qualitatively similar, the assignment of cells to clusters, especially for stem cells, TA cells, and enterocyte progenitors, can differ between runs across systems. To show the diversity that can be expected, we have uploaded shortened case study notebooks to the alternative_clustering_results/ folder.

Note that running sc.pp.pca() with the parameter svd_solver='arpack' drastically reduces the variability between systems; however, the output is still not exactly the same.
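For illustration, a minimal sketch of the deterministic solver call (assuming adata is the AnnData object used in the case study notebook):

import scanpy as sc

# ARPACK is deterministic for a given input, which reduces (but does not
# fully remove) run-to-run differences in the PCA and in all downstream
# steps such as neighbors, clustering, and UMAP.
sc.pp.pca(adata, n_comps=50, svd_solver='arpack')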

Adapting the pipeline for other datasets:

The pipeline was designed to be easily adaptable to new datasets. However, there are several limitations to the general applicability of the current workflow. When adapting the pipeline for your own dataset please take into account the following:

  1. Sparse data formats are not supported by rpy2 and therefore do not work with any of the integrated R commands. Datasets can be converted to a dense format using adata.X = adata.X.toarray() (see the sketch after this list).

  2. The case study assumes that the input data is count data obtained from a single-cell protocol with UMIs. If the input data consists of full-length reads, one could consider replacing the normalization method with one that includes gene-length normalization (e.g., TPM).
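To make the first point concrete, here is a minimal sketch of densifying the data matrix before any rpy2-based step (assuming adata holds a scipy sparse matrix; note that densifying increases memory usage):

import scipy.sparse

# rpy2 cannot convert scipy sparse matrices, so densify the matrix
# before handing it to any of the integrated R commands.
if scipy.sparse.issparse(adata.X):
    adata.X = adata.X.toarray()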

Manual installation of package requirements

The following packages are required to run the first version of the case study notebook. For further versions see the README.md in the latest_notebook/ and old_releases/ folders.

General:

  • Jupyter notebook
  • IRKernel
  • rpy2
  • R >= 3.4.3
  • Python >= 3.5

Python:

  • scanpy
  • numpy
  • scipy
  • pandas
  • seaborn
  • louvain>=0.6
  • python-igraph
  • gprofiler-official (from Case study notebook 1906 version)
  • python-gprofiler from Valentine Svensson's github (vals/python-gprofiler)
    • only needed for notebooks before version 1906
  • ComBat python implementation from Maren Buettner's github (mbuttner/maren_codes/combat.py)
    • only needed for scanpy versions before 1.3.8 which don't include sc.pp.combat()

R:

  • scater
  • scran
  • MAST
  • gam
  • slingshot (change DESCRIPTION file for R version 3.4.3)
  • monocle 2
  • limma
  • ComplexHeatmap
  • RColorBrewer
  • clusterExperiment
  • ggplot2
  • IRkernel

Possible sources of error in the manual installation:

For R 3.4.3:

When using Slingshot in R 3.4.3, you must pull a local copy of slingshot from the GitHub repository and change the DESCRIPTION file to say R>=3.4.3 instead of R>=3.5.0.

For R >= 3.5 and bioconductor >= 3.7:

The clusterExperiment version that comes with Bioconductor 3.7 has a slightly changed naming convention: clusterExperiment() is now called ClusterExperiment(). The latest version of the notebook includes this change, but when using the original notebook, please note that this may throw an error.

For rpy2 < 3.0.0:

Pandas 0.24.0 is not compatible with rpy2 < 3.0.0. When using old versions of rpy2, please downgrade pandas to 0.23.X. Please also note that Pandas 0.24.0 requires anndata version 0.6.18 and scanpy version > 1.37.0.

For enrichment analysis with g:profiler:

Ensure that the correct g:profiler package is used for the notebook. Notebooks until version 1904 use python-gprofiler from Valentine Svensson's GitHub, and notebooks from version 1906 onwards use the gprofiler-official package from the g:profiler team.

If no R packages can be found:

Ensure that IRkernel has linked the correct version of R with your jupyter notebook. Check instructions at https://github.com/IRkernel/IRkernel.

single-cell-tutorial's People

Contributors

federicomarini, flying-sheep, herokoking, le-ander, luckymd, lzhus, ttriche, zethson


single-cell-tutorial's Issues

Unable to deploy the .yml - Docker enhancement request

Creating the environment with the provided yml file generates an error:

WARNING: The conda.compat module is deprecated and will be removed in a future release.
WARNING: The conda.compat module is deprecated and will be removed in a future release.

>>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

Traceback (most recent call last):
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1003, in __call__
    return func(*args, **kwargs)
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 73, in do_call
    exit_code = getattr(module, func_name)(args, parser)
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 77, in execute
    directory=os.getcwd())
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 40, in detect
    if spec.can_handle():
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
    self._environment = env.from_file(self.filename)
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda_env/env.py", line 143, in from_file
    return from_yaml(yamlstr, filename=filename)
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda_env/env.py", line 128, in from_yaml
    data = yaml_load_standard(yamlstr)
  File "/home/xma/anaconda3/lib/python3.7/site-packages/conda/common/serialize.py", line 76, in yaml_load_standard
    return yaml.load(string, Loader=yaml.Loader, version="1.2")
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/main.py", line 640, in load
    return loader._constructor.get_single_data()  # type: ignore
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/constructor.py", line 102, in get_single_data
    node = self.composer.get_single_node()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/composer.py", line 75, in get_single_node
    document = self.compose_document()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/composer.py", line 99, in compose_document
    self.parser.get_event()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/parser.py", line 166, in get_event
    self.current_event = self.state()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/parser.py", line 244, in parse_document_end
    token = self.scanner.peek_token()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/scanner.py", line 173, in peek_token
    self.fetch_more_tokens()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/scanner.py", line 273, in fetch_more_tokens
    return self.fetch_value()
  File "/home/xma/anaconda3/lib/python3.7/site-packages/ruamel_yaml/scanner.py", line 626, in fetch_value
    self.reader.get_mark())
ruamel_yaml.scanner.ScannerError: mapping values are not allowed here
  in "<unicode string>", line 32, column 187:
     ...  in single-cell RNA-seq analysis: a tutorial&quot;  - theislab/s ... 
                                         ^ (line: 32)

$ /home/xma/anaconda3/bin/conda-env create -f /home/xma/Downloads/sc_tutorial_environment.yml

environment variables:
CIO_TEST=
CONDA_AUTO_UPDATE_CONDA=false
CONDA_DEFAULT_ENV=base
CONDA_EXE=/home/xma/anaconda3/bin/conda
CONDA_PREFIX=/home/xma/anaconda3
CONDA_PROMPT_MODIFIER=(base)
CONDA_ROOT=/home/xma/anaconda3
CONDA_SHLVL=1
PATH=/home/xma/anaconda3/bin:/home/xma/anaconda3/condabin:/home/xma/anacond
a3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/u
sr/games:/usr/local/games:/snap/bin
REQUESTS_CA_BUNDLE=
SSL_CERT_FILE=
WINDOWPATH=2

 active environment : base
active env location : /home/xma/anaconda3
        shell level : 1
   user config file : /home/xma/.condarc

populated config files : /home/xma/.condarc
conda version : 4.6.11
conda-build version : 3.17.8
python version : 3.7.3.final.0
base environment : /home/xma/anaconda3 (read only)
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/free/linux-64
https://repo.anaconda.com/pkgs/free/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/xma/anaconda3/pkgs
/home/xma/.conda/pkgs
envs directories : /home/xma/.conda/envs
/home/xma/anaconda3/envs
platform : linux-64
user-agent : conda/4.6.11 requests/2.21.0 CPython/3.7.3 Linux/4.18.0-20-generic ubuntu/18.04.2 glibc/2.27
UID:GID : 1000:1000
netrc file : None
offline mode : False

An unexpected error has occurred. Conda has prepared the above report.

If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?

select highly-variable genes

Hi,
I have a question about selecting highly variable genes. In scanpy there seem to be two functions that can do this, filter_genes_dispersion and highly_variable_genes, and there seems to be a small difference between them: highly_variable_genes needs log-transformed input, while filter_genes_dispersion takes the log after filtering, correct? But I find the results from the two differ somewhat; which one should I use?
I am also a little confused about the pipeline of scanpy.api.pp.recipe_zheng17: it renormalizes after filtering to keep only the highly variable genes, but according to the scanpy clustering tutorial, normalization is only done once (before filtering). So why renormalize after filtering?

Thanks,
Jphe
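For reference, a minimal sketch of the two call orders being compared (assuming a raw-count adata and a scanpy version in which both functions exist; exact defaults differ between releases):

import scanpy as sc

# Newer API: highly_variable_genes expects normalized, log-transformed data.
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor='cell_ranger', n_top_genes=4000, subset=True)

# Older API: filter_genes_dispersion is applied to normalized (non-logged)
# counts, and the log transform comes after the filtering:
# filter_result = sc.pp.filter_genes_dispersion(adata.X, n_top_genes=4000)
# adata = adata[:, filter_result.gene_subset]
# sc.pp.log1p(adata)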

MAST returns natural log, not log2FC

I have recently checked with the MAST developers what log basis they use to report logfoldchanges in MAST because I could not find that information anywhere. (RGLab/MAST#112)

It turns out that, given we provide natural-log-normalised data as input to MAST, the reported logfoldchanges are also on a natural-log basis, i.e. they need to be divided by ln(2) to become log2FC.

I guess this will also be important for how those values are used in the notebook.
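As a quick illustration, the conversion is a single division (natural_log_fc is a hypothetical array holding the fold changes reported by MAST):

import numpy as np

# Fold changes reported on a natural-log scale are converted to
# log2 fold changes by dividing by ln(2).
natural_log_fc = np.array([0.693, 1.386, -0.693])
log2_fc = natural_log_fc / np.log(2)  # approx. [1.0, 2.0, -1.0]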

Get environment setup info

Hi Malte,

I am walking myself through the environment setup that you described. I'd like to suggest the following in Step 2:
In order to find out where the sc-tutorial environment is located, I suggest running the command conda info --envs. It lists all installed environments and also shows which one is active. That's probably more convenient than activating the sc-tutorial environment, checking the environment path with echo $CONDA_PREFIX, and then deactivating again.

Best,
Maren

Best practices for using regress_out?

I am adapting the current best practices workflow (epithelial cells) from @LuckyMD with my own data set, and am running into an issue/question. I am subsetting my data to include a few clusters of interest. Once I have those clusters isolated, I am selecting highly variable genes, regressing out effects of cell cycle, ribo genes and mito genes, scaling the data, and embedding a new UMAP, all in preparation for some downstream trajectory analysis (I opted not to regress out anything in my main data set, but want to regress out some confounding factors in my subset specifically for trajectory analysis). Is it better to first run pp.highly_variable_genes and then use pp.regress_out, or is better to run pp.regress_out followed by pp.highly_variable_genes? My code currently looks like the following:

# Subset to highly variable genes
sc.pp.highly_variable_genes(adata_sub, flavor='cell_ranger', n_top_genes=4000, subset=True)

# Regress out effects of cell cycle, mito genes, and ribo genes
sc.pp.regress_out(adata_sub, ['S_score', 'G2M_score', 'percent_mt', 'percent_ribo'])

# Scale
sc.pp.scale(adata_sub, max_value=10)

# Calculate the visualization
sc.pp.pca(adata_sub, n_comps=50, use_highly_variable=True, svd_solver='arpack')
sc.pp.neighbors(adata_sub)
sc.tl.umap(adata_sub)

If I run pp.regress_out before pp.highly_variable_genes, I have to include the line pp.filter_genes(adata_sub, min_counts=1) or else I get ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported. However, after doing some trial and error runs, I believe that including pp.filter_genes(adata_sub, min_counts=1) is excluding some genes of interest from my downstream trajectory analysis. I am able to recover these genes by reverting back to running pp.highly_variable_genes before pp.regress_out and excluding pp.filter_genes.

Intuitively I feel like it makes more sense to run pp.regress_out before pp.highly_variable_genes, but considering I am having issues using that order for downstream analysis, is it OK to run pp.regress_out after pp.highly_variable_genes? What is the best practice?
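For comparison, a minimal sketch of the alternative order described above (regress_out first), including the gene filter that avoids the NaN error; the variable and column names are taken from the question:

import scanpy as sc

# Genes with zero counts in all cells make the regression fit inside
# regress_out fail ("first guess on the deviance function returned a
# nan"), so filter them out before regressing.
sc.pp.filter_genes(adata_sub, min_counts=1)
sc.pp.regress_out(adata_sub, ['S_score', 'G2M_score', 'percent_mt', 'percent_ribo'])
sc.pp.highly_variable_genes(adata_sub, flavor='cell_ranger', n_top_genes=4000, subset=True)
sc.pp.scale(adata_sub, max_value=10)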

ComBat error "TypeError: data type not understood"

Issue report for the issue posted in #1:
ComBat gives the following error: TypeError: data type not understood.

@jphe Could you clarify whether you are still using a sparse data matrix? The current ComBat implementation does not work with the sparse matrix format.

The ComBat function from www.github.com/mbuttner/maren_codes/ was designed to take pandas DataFrames as input, so the pandas DataFrame is not the problem. The code does have issues when your data has zero variance in the expression values of a gene, so you should filter out genes with constant gene expression values (usually genes with zero expression).

It would also be good to know the output of type(data.T).
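A minimal sketch of the suggested pre-filtering (assuming a cells-by-genes AnnData adata with a dense or sparse matrix):

import numpy as np
import scipy.sparse

# Drop genes with zero variance across cells (for count data these are
# usually the all-zero genes) before running ComBat.
X = adata.X.toarray() if scipy.sparse.issparse(adata.X) else np.asarray(adata.X)
adata = adata[:, X.var(axis=0) > 0].copy()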

'scanpy.plotting.palettes' has no attribute 'default_64'

When running the following cell:

#Visualize the clustering and how this is reflected by different technical covariates
sc.pl.umap(adata, color=['louvain_r1', 'louvain_r0.5'], palette=sc.pl.palettes.default_64)
sc.pl.umap(adata, color=['region', 'n_counts'])
sc.pl.umap(adata, color=['log_counts', 'mt_frac'])

I get the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-b037a02f8aaa> in <module>
      1 #Visualize the clustering and how this is reflected by different technical covariates
----> 2 sc.pl.umap(adata, color=['louvain_r1', 'louvain_r0.5'], palette=sc.pl.palettes.default_64)
      3 sc.pl.umap(adata, color=['region', 'n_counts'])
      4 sc.pl.umap(adata, color=['log_counts', 'mt_frac'])

AttributeError: module 'scanpy.plotting.palettes' has no attribute 'default_64'

Which system information would be useful for troubleshooting?

Kind regards
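A possible workaround sketch, assuming a newer scanpy release in which default_64 was removed (the vega_20_scanpy palette mentioned in another issue below still exists there):

import scanpy as sc

# Fall back to a palette that still exists if default_64 is gone in
# this scanpy version; omitting palette= also lets scanpy pick one.
palette = getattr(sc.pl.palettes, 'default_64', sc.pl.palettes.vega_20_scanpy)
sc.pl.umap(adata, color=['louvain_r1', 'louvain_r0.5'], palette=palette)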

Get errors when performing sc.pp.highly_variable_genes!

Hi,
I am following this excellent workflow to analyze my single-cell sequencing data sets.
I calculated the size factors using the scran package and did not perform the batch correction step, as I have only one sample. Then I intended to extract highly variable genes using the function sc.pp.highly_variable_genes. Unfortunately, I got an error:

'LinAlgError: Last 2 dimensions of the array must be square'

LinAlgError Traceback (most recent call last)
in
----> 1 sc.pp.highly_variable_genes(adata)

~/miniconda3/lib/python3.6/site-packages/scanpy/preprocessing/highly_variable_genes.py in highly_variable_genes(adata, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor, subset, inplace)
94 X = np.expm1(adata.X) if flavor == 'seurat' else adata.X
95
---> 96 mean, var = materialize_as_ndarray(_get_mean_var(X))
97 # now actually compute the dispersion
98 mean[mean == 0] = 1e-12 # set entries equal to zero to small value

~/miniconda3/lib/python3.6/site-packages/scanpy/preprocessing/utils.py in _get_mean_var(X)
16 mean_sq = np.multiply(X, X).mean(axis=0)
17 # enforece R convention (unbiased estimator) for variance
---> 18 var = (mean_sq - mean**2) * (X.shape[0]/(X.shape[0]-1))
19 else:
20 from sklearn.preprocessing import StandardScaler

~/miniconda3/lib/python3.6/site-packages/numpy/matrixlib/defmatrix.py in pow(self, other)
226
227 def pow(self, other):
--> 228 return matrix_power(self, other)
229
230 def ipow(self, other):

~/miniconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in matrix_power(a, n)
600 a = asanyarray(a)
601 _assertRankAtLeast2(a)
--> 602 _assertNdSquareness(a)
603
604 try:

~/miniconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in _assertNdSquareness(*arrays)
213 m, n = a.shape[-2:]
214 if m != n:
--> 215 raise LinAlgError('Last 2 dimensions of the array must be square')
216
217 def _assertFinite(*arrays):

This is what my adata.X looks like right now:
matrix([[0. , 0. , 0. , ..., 0. , 0. , 0. ],
[0. , 0. , 1.203, ..., 0. , 0. , 0. ],
[0. , 1.096, 0. , ..., 0. , 0. , 0. ],
...,
[0. , 0. , 2.042, ..., 0. , 0. , 0. ],
[0. , 0. , 0. , ..., 0.926, 0. , 0. ],
[0. , 0. , 2.951, ..., 0. , 0. , 0. ]])

Also, the versions of my modules:
scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.2.0 pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

Looking forward to your response!
Thank you !

Dockerfile is not visible

Hello,

We are trying to run your pipeline. Unfortunately, the image is not working when run from Singularity, and we cannot look into it since the source code is not available.

Please help us get this running and provide the full source code.

Exception: Data must be 1-dimensional when plotting new marker genes in jupyter notebook

I am following the notebook to understand the steps of single-cell RNA seq.

I had the same issue as #21 with the scanpy version, so I followed what is said in the answer.
It worked almost perfectly, besides some problems that I solved and want to write down here so everyone who has the same issues can resolve them:

  1. also adata = adata.concatenate(adata_tmp, batch_key='sample_id') and adata.obs.drop(columns=['sample_id'], inplace=True) generated errors, so I commented out those lines as well

  2. I had errors when a graph is plotted using sc.pl.palettes.godsnot_64 or sc.pl.palettes.default_64, so I use sc.pl.palettes.vega_20_scanpy instead

  3. In the step Marker genes & cluster annotation I replaced:

adata.rename_categories('louvain_r0.5', ['TA', 'EP (early)', 'Stem', 'Goblet', 'EP (stress)', 'Enterocyte', 'Paneth', 'Enteroendocrine', 'Tuft']) 

with

adata.rename_categories('louvain_r0.5', ['TA', 'EP (early)', 'Stem', 'Goblet', 'EP (stress)', 'Enterocyte', 'Paneth'])

because the numbers of old and new categories don't match; that way it works.

  4. When I run
#Plot the new marker genes
sc.pl.rank_genes_groups(adata, key='rank_genes_r0.5_entero_sub', groups=['Enterocyte,0','Enterocyte,1','Enterocyte,2'], fontsize=12)

I get an error that there is no field named Enterocyte,2, so I commented out that group, and with the other two it works.

Now I get to my problem. I am running the notebook with the case study data and with scanpy==1.4.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==0.25.3 scikit-learn==0.22.1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1.
(Note: I use pandas 0.25.3 because previously, when I tried to run it with 1.0.1, there were incompatibility problems.)

Now I have a problem in the steps of subclustering, when I try to run:

entero_clusts = [clust for clust in adata.obs['louvain_r0.5_entero_sub'].cat.categories if clust.startswith('Enterocyte')]
 for clust in entero_clusts:
    sc.pl.rank_genes_groups_violin(adata, use_raw=True, key='rank_genes_r0.5_entero_sub', groups=[clust], gene_names=adata.uns['rank_genes_r0.5_entero_sub']['names'][clust][90:100]) 

the error I get is


 ---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-58-4453b5691a91> in <module>
      2 
      3 for clust in entero_clusts:
----> 4     sc.pl.rank_genes_groups_violin(adata, use_raw=True, key='rank_genes_r0.5_entero_sub', groups=[clust], gene_names=adata.uns['rank_genes_r0.5_entero_sub']['names'][clust][90:100])
      5 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py in rank_genes_groups_violin(adata, groups, n_genes, gene_names, gene_symbols, use_raw, key, split, scale, strip, jitter, size, ax, show, save)
    727             if issparse(X_col): X_col = X_col.toarray().flatten()
    728             new_gene_names.append(g)
--> 729             df[g] = X_col
    730         df['hue'] = adata.obs[groups_key].astype(str).values
    731         if reference == 'rest':

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3485         else:
   3486             # set column
-> 3487             self._set_item(key, value)
   3488 
   3489     def _setitem_slice(self, key, value):

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3561         """
   3562 
-> 3563         self._ensure_valid_index(value)
   3564         value = self._sanitize_column(key, value)
   3565         NDFrame._set_item(self, key, value)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/frame.py in _ensure_valid_index(self, value)
   3538         if not len(self.index) and is_list_like(value):
   3539             try:
-> 3540                 value = Series(value)
   3541             except (ValueError, NotImplementedError, TypeError):
   3542                 raise ValueError(

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    312                     data = data.copy()
    313             else:
--> 314                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    315 
    316                 data = SingleBlockManager(data, index, fastpath=True)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/internals/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    727     elif subarr.ndim > 1:
    728         if isinstance(data, np.ndarray):
--> 729             raise Exception("Data must be 1-dimensional")
    730         else:
    731             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

I thought it was a cache issue, but I cleaned it, restarted the kernel, and cleared all outputs, and the problem remains.
How can I fix this?

R libraries not accessible

When using R in Anaconda, a warning occurs that the libraries cannot be installed; these libraries can only be added locally. To overcome this, it helps to change the R_LIB path in the activate script.

Key error with gene_id-1

I am following the notebook to better understand the steps of single-cell RNA-seq. I am running the notebook on macOS 10.12.6; the version of R I am using is 3.6.1. I am running the original tutorial with the data you have kindly provided. I had to change the original line (below)

from gprofiler import GProfiler

to

from gprofiler import gprofiler

Everything was working great until I attempted to concatenate to the main adata object; a key error appears claiming that the key "gene_id-1" does not exist. Error message below. Thank you so much for your help!


KeyError                                  Traceback (most recent call last)
/Applications/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-a74b7db26376> in <module>
     30     # Concatenate to main adata object
     31     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
---> 32     adata.var['gene_id'] = adata.var['gene_id-1']
     33     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
     34     adata.obs.drop(columns=['sample_id'], inplace=True)

/Applications/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

/Applications/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

sc.tl.marker_gene_overlap error

Hi LuckyMD,

I was exploring the new sc tutorial and am having an issue with the marker_gene_overlap function. The error is: module 'scanpy.tools' has no attribute 'marker_gene_overlap'. And when I look at the scanpy API, it is not listed under tools. I tried updating scanpy, but that did not change anything. How can I implement marker_gene_overlap?

Cheers,

Jurgen
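For reference, sc.tl.marker_gene_overlap was only added in later scanpy releases (around version 1.4, to the best of my knowledge), so upgrading scanpy is usually the fix. A minimal usage sketch with a hypothetical marker dictionary:

import scanpy as sc

# marker_gene_overlap compares a reference marker-gene dict against the
# rank_genes_groups results stored in adata.uns.
marker_genes = {'Stem': ['Lgr5', 'Ascl2'], 'Enterocyte': ['Alpi', 'Apoa1']}
sc.tl.rank_genes_groups(adata, groupby='louvain_r0.5')
overlap = sc.tl.marker_gene_overlap(adata, marker_genes)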

Error in samples concatenation

Hi,

When I try to run the following line (in the case study), I receive an error:

adata.var['gene_id'] = adata.var['gene_id-1']

KeyError Traceback (most recent call last)
//anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in
31 # Concatenate to main adata object
32 adata = adata.concatenate(adata_tmp, batch_key='sample_id')
---> 33 adata.var['gene_id'] = adata.var['gene_id-1']
34 #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
35 #adata.obs.drop(columns=['sample_id'], inplace=True)

//anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in getitem(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]

//anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

It seems like the adata.concatenate function does not create any variables such as 'gene_id-1' or 'gene_id-0'. Can you please check this?

normalization: scran vs neg. binomial regression (sctransform)

Hi Malte,

in the single-cell tutorial, you recommend scran as the go-to solution for normalization; however, in the recent single-cell webinar, you said you found that scran + NB regression worked best. As far as I understand, sctransform is such an NB regression method, which is getting popular and being added to scanpy.


As of now, would you recommend using sctransform rather than scran, or even both? If you use both, do you first adjust the counts with scran size factors and then feed them into the regression model?

Best,
Gregor


To get an idea, I applied CPM normalization (top), scran, sctransform, and scran+sctransform (bottom) to the lung dataset from Laughney et al., all with default parameters. Just judging from these plots, I don't like the result of sctransform too much: the CD8 and CD4 T cells are no longer clearly separated after this normalization.
[image: UMAPs after CPM, scran, sctransform, and scran+sctransform normalization]

Problem correcting batches with Combat

Hi there!

I have been trying to apply the tutorial steps you described in the notebook to my own data. I can run ComBat with no errors. However, when I checked the data integration, each sample was completely separate from the others, a behavior that I don't expect and also do not see when using other tools such as Seurat (there are two or three cell types that should cluster together).

I do not know whether this issue is due to several warnings I get after running sc.pp.combat. Find the messages below.

Standardizing Data across genes.

Found 11 batches

Found 0 numerical variables:
	

Fitting L/S model and finding priors

Finding parametric adjustments

/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:269: NumbaWarning: 
Compilation is falling back to object mode WITH looplifting enabled because Function "_it_sol" failed type inference due to: Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'sum2', defined at /home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py (311)

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 311:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
        g_new = (t2*n*g_hat + d_old*g_bar) / (t2*n + d_old)
        sum2 = s_data - g_new.reshape((g_new.shape[0], 1)) @ np.ones((1, s_data.shape[1]))
        ^

[1] During: typing of assignment at /home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py (313)

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 313:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
        sum2 = sum2 ** 2
        sum2 = sum2.sum(axis=1)
        ^

  @numba.jit
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:269: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "_it_sol" failed type inference due to: cannot determine Numba type of <class 'numba.dispatcher.LiftedLoop'>

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 305:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
    change = 1
    count = 0
    ^

  @numba.jit
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/numba/compiler.py:742: NumbaWarning: Function "_it_sol" was compiled in object mode without forceobj=True, but has lifted loops.

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 270:
@numba.jit
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
^

  self.func_ir.loc))
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/numba/compiler.py:751: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 270:
@numba.jit
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
^

  warnings.warn(errors.NumbaDeprecationWarning(msg, self.func_ir.loc))
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:269: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "_it_sol" failed type inference due to: Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'sum2', defined at /home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py (311)

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 311:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
        g_new = (t2*n*g_hat + d_old*g_bar) / (t2*n + d_old)
        sum2 = s_data - g_new.reshape((g_new.shape[0], 1)) @ np.ones((1, s_data.shape[1]))
        ^

[1] During: typing of assignment at /home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py (313)

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 313:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
        sum2 = sum2 ** 2
        sum2 = sum2.sum(axis=1)
        ^

  @numba.jit
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/numba/compiler.py:742: NumbaWarning: Function "_it_sol" was compiled in object mode without forceobj=True.

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 305:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
    change = 1
    count = 0
    ^

  self.func_ir.loc))
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/numba/compiler.py:751: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "../../../../anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py", line 305:
def _it_sol(s_data, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001) -> Tuple[float, float]:
    <source elided>
    change = 1
    count = 0
    ^

  warnings.warn(errors.NumbaDeprecationWarning(msg, self.func_ir.loc))
/home/cruiz/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:235: RuntimeWarning: divide by zero encountered in true_divide
  b_prior[i],
Adjusting data

Do you think this might be the problem why I am not having a proper integration of the data?

Below are the UMAP generated by Seurat (without any integration method, only merging and clustering) and the UMAP I got from Scanpy after running the pipeline.

Thanks in advance!

[screenshots: Seurat UMAP and Scanpy UMAP]

Issue with rgl package when trying to load slingshot

Everything is working for me except for the slingshot package. When I try to load it via rpy2, I get the error shown below. The version of R that I have installed is 3.5.1, if that helps at all.

R[write to console]: Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/stor/home/ericb123/anaconda3/envs/sc-tutorial/lib/R/library/rgl/libs/rgl.so':
  dlopen: cannot load any more object with static TLS

R[write to console]: In addition: 
R[write to console]: Warning message:

R[write to console]: replacing previous import ‘SingleCellExperiment::weights’ by ‘stats::weights’ when loading ‘slingshot’ 

R[write to console]: Error: package or namespace load failed for ‘slingshot’:
 .onLoad failed in loadNamespace() for 'rgl', details:
  call: NULL
  error:        Loading rgl's DLL failed. 


Error: package or namespace load failed for ‘slingshot’:
 .onLoad failed in loadNamespace() for 'rgl', details:
  call: NULL
  error:        Loading rgl's DLL failed.

adata.concatenate

Hi,

# Loop to load rest of data sets
# Concatenate to main adata object
adata = adata.concatenate(adata_tmp, batch_key='sample_id')

When I ran that, I got the following:
[screenshot of the error]

Any insight on what could be causing this?

Thanks!
Ohad

TypeError: Callable[[arg, ...], result]: each arg must be a type. Got Ellipsis.

Hi,

I was following the latest notebook and successfully installed all the packages required to complete the tutorial. However, I get the following error when running the first chunk of code of section 2.5 Visualization:


TypeError Traceback (most recent call last)
in
3 sc.pp.neighbors(adata)
4
----> 5 sc.tl.tsne(adata, n_jobs=12) #Note n_jobs works for MulticoreTSNE, but not regular implementation)
6 sc.tl.umap(adata)
7 sc.tl.diffmap(adata)

~/anaconda3/lib/python3.7/site-packages/scanpy/tools/_tsne.py in tsne(adata, n_pcs, use_rep, perplexity, early_exaggeration, learning_rate, random_state, use_fast_tsne, n_jobs, copy)
108 if X_tsne is None:
109 from sklearn.manifold import TSNE
--> 110 from . import _tsne_fix # fix by D. DeTomaso for sklearn < 0.19
111
112 # unfortunately, sklearn does not allow to set a minimum number

~/anaconda3/lib/python3.7/site-packages/scanpy/tools/_tsne_fix.py in
32 verbose: int = 0,
33 args: Iterable[Any] = (),
---> 34 kwargs: Mapping[str, Any] = MappingProxyType({}),
35 ) -> Tuple[np.ndarray, float, int]:
36 """\

~/anaconda3/lib/python3.7/typing.py in getitem(self, params)
756 f" Got {args}")
757 params = (tuple(args), result)
--> 758 return self.getitem_inner(params)
759
760 @_tp_cache

~/anaconda3/lib/python3.7/typing.py in inner(*args, **kwds)
252 except TypeError:
253 pass # All real errors (not unhashable args) are raised below.
--> 254 return func(*args, **kwds)
255 return inner
256

~/anaconda3/lib/python3.7/typing.py in getitem_inner(self, params)
779 return self.copy_with((_TypingEllipsis, result))
780 msg = "Callable[[arg, ...], result]: each arg must be a type."
--> 781 args = tuple(_type_check(arg, msg) for arg in args)
782 params = args + (result,)
783 return self.copy_with(params)

~/anaconda3/lib/python3.7/typing.py in (.0)
779 return self.copy_with((_TypingEllipsis, result))
780 msg = "Callable[[arg, ...], result]: each arg must be a type."
--> 781 args = tuple(_type_check(arg, msg) for arg in args)
782 params = args + (result,)
783 return self.copy_with(params)

~/anaconda3/lib/python3.7/typing.py in _type_check(arg, msg, is_argument)
140 return arg
141 if not callable(arg):
--> 142 raise TypeError(f"{msg} Got {arg!r:.100}.")
143 return arg
144

TypeError: Callable[[arg, ...], result]: each arg must be a type. Got Ellipsis.

--

I couldn't find a possible solution. Could you provide any hints? Thanks in advance.

--
I'm working on the Google Cloud Platform (Ubuntu instance). Here is my IPython.sys_info() :

{'commit_hash': 'c233c25ab',
'commit_source': 'installation',
'default_encoding': 'UTF-8',
'ipython_path': 'home/user/anaconda3/lib/python3.7/site-packages/IPython',
'ipython_version': '7.8.0',
'os_name': 'posix',
'platform': 'Linux-5.0.0-1033-gcp-x86_64-with-debian-buster-sid',
'sys_executable': 'home/user/anaconda3/bin/python',
'sys_platform': 'linux',
'sys_version': '3.7.4 (default, Aug 13 2019, 20:35:49) \n[GCC 7.3.0]'}

ComBat Error

Hi there. After working through the tutorial, I tried to analyse my own data. Everything went well until the batch correction, and I couldn't understand what happened or how to debug it. Can anyone help me? Thx!

# ComBat batch correction
sc.pp.combat(adata, key='sample')

Standardizing Data across genes.


AssertionError Traceback (most recent call last)
in ()
1 # ComBat batch correction
----> 2 sc.pp.combat(adata, key='sample')

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/scanpy/preprocessing/_combat.py in combat(adata, key, covariates, inplace)
210 # standardize across genes using a pooled variance estimator
211 logg.info("Standardizing Data across genes.\n")
--> 212 s_data, design, var_pooled, stand_mean = _standardize_data(model, data, key)
213
214 # fitting the parameters on the standardized data

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/scanpy/preprocessing/_combat.py in _standardize_data(model, data, batch_key)
103 n_array = float(sum(n_batches))
104
--> 105 design = _design_matrix(model, batch_key, batch_levels)
106
107 # compute pooled variance estimator

/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/scanpy/preprocessing/_combat.py in _design_matrix(model, batch_key, batch_levels)
37 "~ 0 + C(Q('{}'), levels=batch_levels)".format(batch_key),
38 model,
---> 39 return_type="dataframe",
40 )
41 model = model.drop([batch_key], axis=1)

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/highlevel.py in dmatrix(formula_like, data, eval_env, NA_action, return_type)
289 eval_env = EvalEnvironment.capture(eval_env, reference=1)
290 (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
--> 291 NA_action, return_type)
292 if lhs.shape[1] != 0:
293 raise PatsyError("encountered outcome variables for a model "

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/highlevel.py in _do_highlevel_design(formula_like, data, eval_env, NA_action, return_type)
163 return iter([data])
164 design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
--> 165 NA_action)
166 if design_infos is not None:
167 return build_design_matrices(design_infos, data,

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/highlevel.py in _try_incr_builders(formula_like, data_iter_maker, eval_env, NA_action)
60 "ascii-only, or else upgrade to Python 3.")
61 if isinstance(formula_like, str):
---> 62 formula_like = ModelDesc.from_formula(formula_like)
63 # fallthrough
64 if isinstance(formula_like, ModelDesc):

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/desc.py in from_formula(cls, tree_or_string)
162 tree = tree_or_string
163 else:
--> 164 tree = parse_formula(tree_or_string)
165 value = Evaluator().eval(tree, require_evalexpr=False)
166 assert isinstance(value, cls)

/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/parse_formula.py in parse_formula(code, extra_operators)
146 tree = infix_parse(_tokenize_formula(code, operator_strings),
147 operators,
--> 148 _atomic_token_types)
149 if not isinstance(tree, ParseNode) or tree.type != "~":
150 tree = ParseNode("~", None, [tree], tree.origin)

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/infix_parser.py in infix_parse(tokens, operators, atomic_types, trace)
208
209 want_noun = True
--> 210 for token in token_source:
211 if c.trace:
212 print("Reading next token (want_noun=%r)" % (want_noun,))

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/parse_formula.py in _tokenize_formula(code, operator_strings)
92 else:
93 it.push_back((pytype, token_string, origin))
---> 94 yield _read_python_expr(it, end_tokens)
95
96 def test__tokenize_formula():

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/parse_formula.py in _read_python_expr(it, end_tokens)
42 origins = []
43 bracket_level = 0
---> 44 for pytype, token_string, origin in it:
45 assert bracket_level >= 0
46 if bracket_level == 0 and token_string in end_tokens:

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/util.py in next(self)
319 else:
320 # May raise StopIteration
--> 321 return six.advance_iterator(self._it)
322 next = next
323

~/anaconda2/envs/cnmf_env/lib/python3.6/site-packages/patsy/tokens.py in python_tokenize(code)
33 break
34 origin = Origin(code, start, end)
---> 35 assert pytype not in (tokenize.NL, tokenize.NEWLINE)
36 if pytype == tokenize.ERRORTOKEN:
37 raise PatsyError("error tokenizing input "

AssertionError:

%%R -i data_mat -i input_groups -o size_factors

Thanks for creating such an excellent tutorial.

I run into an error when running this cell.

%%R -i data_mat -i input_groups -o size_factors 
size_factors = computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1)

I get this error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘assay’ for signature ‘"matrix", "character"’

I am using the docker container here: https://hub.docker.com/r/leanderd/single-cell-analysis
I'm using a Mac.

I thought it might be an R version problem, so I also tried not using the docker and installing the package requirements manually with an earlier version of R, but as you might imagine package conflicts made it impossible to install the required packages using BiocManager.

Could not find function clusterExperiment()

I have encountered the following error when trying to execute the R code that sets up a clusterExperiment data structure for heatmap visualization in the Gene expression dynamics section (3.5.5):

Error in clusterExperiment(heatdata, heatclus, transformation = function(x) { : 
  could not find function "clusterExperiment"

Surprisingly, running library(clusterExperiment) does not produce any error messages; the package seems to be installed correctly.

I am on macOS Mojave and R version 3.5.1 with Bioconductor 3.7 and clusterExperiment version 2.0.2 and Anaconda Python 3.6

Furthermore: scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

Issues with sc.pp.highly_variable_genes following tutorial

This is related to closed issue #8, so I apologize for the repetition. Since the issue was closed I wasn't sure if I could comment or not.

Essentially, I am following your excellent and thorough Case-study_Mouse-intestinal-epithelium_1906.ipynb notebook, but using my own data. After the Normalization steps (and skipping the Batch Correction section, as I only have one sample), I am unable to run the sc.pp.highly_variable_genes command in In[26]. I receive the following error:
LinAlgError: Last 2 dimensions of the array must be square. However, if I omit the scran normalization process (In[18]-In[21]) and replace In[23] with:

sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

I can execute the rest of the code. Is this an OK practice, or is scran normalization much better? And if scran normalization is better, how can I fix my error?

Also, perhaps related (?), I am trying to run the R tradeSeq package (https://github.com/statOmics/tradeSeqPaper) downstream of the slingshot object generated following the steps in your tutorial. However, I keep erroring out on their fitGAM function, receiving the following error:
could not broadcast input array from shape (2) into shape (1136). This makes me feel like there is something wrong with my adata object somewhere along the way, although I was able to run other R packages like Slingshot with no issue. Any insight/idea on how to run tradeSeq downstream of Slingshot following the data processing outlined in your tutorial?

Thank you!

Monocle: Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : Polygon edge not found

I have encountered the following error when trying to execute the notebook cells which visualise the monocle2 output:

Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : Polygon edge not found

I am on macOS Mojave and R version 3.5.1 with Bioconductor 3.7 and Monocle version 2.8.0 and Anaconda Python 3.6

Furthermore: scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

Different outputs

Hi!
When I try to repeat the entire process, including QC, normalization, batch correction, dimensionality reduction, clustering, and differential analysis, I use the same input file and almost the same parameters; why do I get obviously different results? Among the differences between my runs and the tutorial are the software version updates I use.

The differences include cluster 0 containing 4000+ cells, and the top 20 differentially expressed genes containing many mitochondrial and ribosomal genes. Some differences can be seen in all output graphs, even if they are not particularly obvious in the QC plots.

Looking forward to getting a reply!
Thanks!

In vivo dataset comparisons

Hi Malte,

Have a question regarding comparing my dataset to public datasets (~11 sets) from the Allen brain atlas. I'd be interested in being able to "score" my dataset against the in vivo datasets and see how strongly it correlates with them. Sort of like an ageing approximation based on how closely it relates to one of the input in vivo datasets.
Following the tutorial, lines 38-39 come to mind, where the rank_genes_groups results for each cluster are compared to the input marker genes to look for overlap. Is there a way to do something similar with the described datasets (most of the datasets are gene sets with ~2000 genes each) and plot such correlations with a corresponding similarity score? It wouldn't necessarily need to come from the rank_genes results for each cluster; it could compare the system as a whole against the datasets.

Thanks!
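
For reference, scanpy ships a helper for exactly this kind of comparison: sc.tl.marker_gene_overlap scores a rank_genes_groups result against reference gene sets. A minimal sketch, where the reference dictionary is a hypothetical stand-in for the Allen atlas gene sets:

    import scanpy as sc

    # Hypothetical reference gene sets standing in for the Allen atlas data
    reference = {
        'allen_set_1': {'Gfap', 'Aqp4'},
        'allen_set_2': {'Snap25', 'Syt1'},
    }
    # Overlap of each cluster's ranked markers with each reference set
    overlap_df = sc.tl.marker_gene_overlap(adata, reference, key='rank_genes_groups')
    print(overlap_df)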

Kernel restarting when concatenating

Hi,

I used the docker container provided to set up the environment which was fast and easy, thanks!

However, when I start running the notebook, the cell starting with "# Loop to load rest of data sets" in the first section (reading the data) keeps crashing the kernel (tried several times with the same result). I added some printouts to narrow it down to the "# Concatenate to main adata object" section. Not sure why this is happening. Any insight on what could be causing this?

Thanks!
Ohad
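
One plausible explanation for a silent kernel death at this point is an out-of-memory kill: the loading loop densifies every sample with adata_tmp.X = adata_tmp.X.toarray() before concatenating. A hedged sketch of a sparse-friendly variant, assuming the loop variables from the notebook:

    import scanpy as sc

    # Keep X sparse; AnnData.concatenate handles sparse matrices, and memory
    # use stays far lower than with the densified version
    adata_tmp = sc.read(data_file, cache=True)
    adata_tmp = adata_tmp.transpose()
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')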

mt fraction calculation

In case you are planning to update the tutorial at some point: Isaac pointed me to a cleaner way to calculate the mt fraction.

adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)

It calculates both total_counts_mt and pct_counts_mt.

Just leaving it here as a reference.

Jupyter notebook kernel crashes silently: Intel OMP

When trying to call scran for the normalisation step in the notebook, the kernel of the notebook dies without producing any error in the notebook.

When checking the jupyter notebook server output, I can see the following messages:

OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
[I 18:24:28.130 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports

I am on macOS Mojave with R 3.5.1, Bioconductor 3.7, scran 1.8.4, and Anaconda Python 3.6.

Furthermore: scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1
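
As the OMP error text itself suggests, one unsafe but common workaround is to allow duplicate OpenMP runtimes before any of the conflicting libraries load. A minimal sketch, to be run at the very top of the notebook:

    import os

    # Unsafe workaround named in the OMP error message: tolerate multiple
    # OpenMP runtimes; must be set before importing scanpy/rpy2/numpy-MKL
    os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'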

Running Slingshot through rpy2 fails with pandas v0.24

After updating to pandas 0.24, I get the following error when trying to run slingshot through rpy2 in the notebook:

TypeError: Parameter 'categories' must be list-like, was <rpy2.rinterface.StrSexpVector - Python:0x1c26c34eb8 / R:0x7fa21b9ef388>

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-85-fa9723fda7cc> in <module>
----> 1 get_ipython().run_cell_magic('R', '-i data_mat -i obs -i var -i pca_x -i dm_x -o sce_start -o sce_startend -o sce_simple_startend', 'require(slingshot)\nrequire(RColorBrewer)\n\n#Create the SingleCellExperiment data structure\nsce <- SingleCellExperiment(assays=data_mat, rowData=var, colData=obs)\n\n#Load the dimensionality reduced data\nreducedDims(sce) <- SimpleList(PCA = pca_x, DM = dm_x)\n\n#Plot 1\ncolour_map = brewer.pal(20,\'Set1\')\npar(xpd=TRUE)\npar(mar=c(4.5,5.5,2,7))\nplot(pca_x[,1], pca_x[,2], col=colour_map[colData(sce)$louvain_final], bty=\'L\')\nlegend(x=12, y=12, legend=unique(colData(sce)$louvain_final), fill=colour_map[as.integer(unique(colData(sce)$louvain_final))])\n\nprint("1:")\nsce_start <- slingshot(sce, clusterLabels = \'louvain_final\', reducedDim = \'PCA\', start.clus=\'Stem\')\nprint(SlingshotDataSet(sce_start))\n\nprint("")\nprint("2:")\nsce_startend <- slingshot(sce, clusterLabels = \'louvain_final\', reducedDim = \'PCA\', start.clus=\'Stem\', end.clus=c(\'Enterocyte mat. (Proximal)\', \'Enterocyte mat. (Distal)\'))\nprint(SlingshotDataSet(sce_startend))\n\nprint("")\nprint("3:")\nsce_simple_startend <- slingshot(sce, clusterLabels = \'louvain_r0.5\', reducedDim = \'PCA\', start.clus=\'Stem\', end.clus=\'Enterocyte\')\nprint(SlingshotDataSet(sce_simple_startend))\n')

~/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2321             magic_arg_s = self.var_expand(line, stack_depth)
   2322             with self.builtin_trap:
-> 2323                 result = fn(magic_arg_s, cell)
   2324             return result
   2325 

<decorator-gen-797> in R(self, line, cell, local_ns)

~/anaconda3/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~/anaconda3/lib/python3.6/site-packages/rpy2/ipython/rmagic.py in R(self, line, cell, local_ns)
    688                         raise NameError("name '%s' is not defined" % input)
    689                 with localconverter(converter) as cv:
--> 690                     ro.r.assign(input, val)
    691 
    692         tmpd = self.setup_graphics(args)

~/anaconda3/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

~/anaconda3/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)
    105             new_kwargs[k] = conversion.py2ri(v)
    106         res = super(Function, self).__call__(*new_args, **new_kwargs)
--> 107         res = conversion.ri2ro(res)
    108         return res
    109 

~/anaconda3/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

~/anaconda3/lib/python3.6/site-packages/rpy2/ipython/rmagic.py in _(obj)
    147     if 'data.frame' in obj.rclass:
    148         # request to turn it to a pandas DataFrame
--> 149         res = converter.ri2py(obj)
    150     else:
    151         res = ro.sexpvector_to_ro(obj)

~/anaconda3/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

~/anaconda3/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in ri2py_listvector(obj)
    181 def ri2py_listvector(obj):
    182     if 'data.frame' in obj.rclass:
--> 183         res = ri2py(DataFrame(obj))
    184     else:
    185         res = numpy2ri.ri2py(obj)

~/anaconda3/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

~/anaconda3/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in ri2py_dataframe(obj)
    188 @ri2py.register(DataFrame)
    189 def ri2py_dataframe(obj):
--> 190     items = tuple((k, ri2py(v)) for k, v in obj.items())
    191     res = PandasDataFrame.from_items(items)
    192     return res

~/anaconda3/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in <genexpr>(.0)
    188 @ri2py.register(DataFrame)
    189 def ri2py_dataframe(obj):
--> 190     items = tuple((k, ri2py(v)) for k, v in obj.items())
    191     res = PandasDataFrame.from_items(items)
    192     return res

~/anaconda3/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

~/anaconda3/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py in ri2py_intvector(obj)
    146         res = pandas.Categorical.from_codes(numpy.asarray(obj) - 1,
    147                                             categories = obj.do_slot('levels'),
--> 148                                             ordered = 'ordered' in obj.rclass)
    149     else:
    150         res = numpy2ri.ri2py(obj)

~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/categorical.py in from_codes(cls, codes, categories, ordered, dtype)
    646         dtype = CategoricalDtype._from_values_or_dtype(categories=categories,
    647                                                        ordered=ordered,
--> 648                                                        dtype=dtype)
    649         if dtype.categories is None:
    650             msg = ("The categories must be provided in 'categories' or "

~/anaconda3/lib/python3.6/site-packages/pandas/core/dtypes/dtypes.py in _from_values_or_dtype(cls, values, categories, ordered, dtype)
    322             # Note: This could potentially have categories=None and
    323             # ordered=None.
--> 324             dtype = CategoricalDtype(categories, ordered)
    325 
    326         return dtype

~/anaconda3/lib/python3.6/site-packages/pandas/core/dtypes/dtypes.py in __init__(self, categories, ordered)
    224 
    225     def __init__(self, categories=None, ordered=None):
--> 226         self._finalize(categories, ordered, fastpath=False)
    227 
    228     @classmethod

~/anaconda3/lib/python3.6/site-packages/pandas/core/dtypes/dtypes.py in _finalize(self, categories, ordered, fastpath)
    333         if categories is not None:
    334             categories = self.validate_categories(categories,
--> 335                                                   fastpath=fastpath)
    336 
    337         self._categories = categories

~/anaconda3/lib/python3.6/site-packages/pandas/core/dtypes/dtypes.py in validate_categories(categories, fastpath)
    502         if not fastpath and not is_list_like(categories):
    503             msg = "Parameter 'categories' must be list-like, was {!r}"
--> 504             raise TypeError(msg.format(categories))
    505         elif not isinstance(categories, ABCIndexClass):
    506             categories = Index(categories, tupleize_cols=False)

TypeError: Parameter 'categories' must be list-like, was <rpy2.rinterface.StrSexpVector - Python:0x1c26c34eb8 / R:0x7fa21b9ef388>
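
The traceback ends in pandas' CategoricalDtype validation, which became stricter in 0.24, while rpy2's pandas2ri still hands it a raw R vector. Assuming the installed rpy2 predates the corresponding fix, a plausible workaround is to pin pandas below 0.24 (or to upgrade rpy2):

    pip install "pandas<0.24"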

Building container image fails due to missing files

There are at least 2 files missing.

I just did a clean checkout of the repository and wanted to build the image (yes, I'm on Windows -- not all of us have a choice here).

PS C:\Users\marcherm\src\single-cell-tutorial> git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
PS C:\Users\marcherm\src\single-cell-tutorial> git rev-parse --short HEAD
8168bcb

It fails with the following message:

> docker build -t single-cell-tutorial .
[+] Building 9.8s (40/41)
 => [internal] load .dockerignore                                                                                                                                                                                     0.1s
 => => transferring context: 2B                                                                                                                                                                                       0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                                  0.3s
 => => transferring dockerfile: 3.23kB                                                                                                                                                                                0.0s
 => [internal] load metadata for docker.io/library/debian:10                                                                                                                                                          9.2s
 => [internal] load build context                                                                                                                                                                                     0.1s
 => => transferring context: 2B                                                                                                                                                                                       0.0s
 => CANCELED [1/37] FROM docker.io/library/debian:10@sha256:e2cc6fb403be437ef8af68bdc3a89fd58e80b4e390c58f14c77c466002391193                                                                                          0.1s
 => => resolve docker.io/library/debian:10@sha256:e2cc6fb403be437ef8af68bdc3a89fd58e80b4e390c58f14c77c466002391193                                                                                                    0.0s
 => CACHED [2/37] RUN apt-get update && apt-get install -y --no-install-recommends build-essential apt-utils ca-certificates zlib1g-dev gfortran locales libxml2-dev libcurl4-openssl-dev libssl-dev libzmq3-dev lib  0.0s
 => CACHED [3/37] RUN apt-get update && apt-get install -y --no-install-recommends wget curl htop less nano vim emacs git                                                                                             0.0s
 => CACHED [4/37] RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen     && locale-gen en_US.utf8     && /usr/sbin/update-locale LANG=en_US.UTF-8                                                                        0.0s
 => CACHED [5/37] WORKDIR /opt/R                                                                                                                                                                                      0.0s
 => CACHED [6/37] RUN wget https://cran.rstudio.com/src/base/R-3/R-3.6.3.tar.gz                                                                                                                                       0.0s
 => CACHED [7/37] RUN tar xvfz R-3.6.3.tar.gz && rm R-3.6.3.tar.gz                                                                                                                                                    0.0s
 => CACHED [8/37] WORKDIR /opt/R/R-3.6.3                                                                                                                                                                              0.0s
 => CACHED [9/37] RUN ./configure --enable-R-shlib --with-cairo --with-libpng --prefix=/opt/R/                                                                                                                        0.0s
 => CACHED [10/37] RUN make && make install                                                                                                                                                                           0.0s
 => CACHED [11/37] WORKDIR /opt/R                                                                                                                                                                                     0.0s
 => CACHED [12/37] RUN rm -rf /opt/R/R-3.6.3                                                                                                                                                                          0.0s
 => CACHED [13/37] RUN Rscript -e "update.packages(ask=FALSE, repos='https://ftp.gwdg.de/pub/misc/cran/')"                                                                                                            0.0s
 => CACHED [14/37] RUN Rscript -e "install.packages(c('devtools', 'gam', 'RColorBrewer', 'BiocManager', 'IRkernel'), repos='https://ftp.gwdg.de/pub/misc/cran/')"                                                     0.0s
 => CACHED [15/37] RUN Rscript -e "devtools::install_github(c('patzaw/neo2R', 'patzaw/BED'))"                                                                                                                         0.0s
 => CACHED [16/37] RUN Rscript -e "BiocManager::install(c('scran','MAST','monocle','ComplexHeatmap','limma','slingshot','clusterExperiment','DropletUtils'), version = '3.10')"                                       0.0s
 => CACHED [17/37] RUN Rscript -e 'writeLines(capture.output(sessionInfo()), "../package_versions_r.txt")' --default-packages=scran,RColorBrewer,slingshot,monocle,gam,clusterExperiment,ggplot2,plyr,MAST,DropletUt  0.0s
 => CACHED [18/37] WORKDIR /opt/python                                                                                                                                                                                0.0s
 => CACHED [19/37] RUN wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tgz                                                                                                                                  0.0s
 => CACHED [20/37] RUN tar zxfv Python-3.7.7.tgz && rm Python-3.7.7.tgz                                                                                                                                               0.0s
 => CACHED [21/37] WORKDIR /opt/python/Python-3.7.7                                                                                                                                                                   0.0s
 => CACHED [22/37] RUN ./configure --enable-optimizations --with-lto --prefix=/opt/python/                                                                                                                            0.0s
 => CACHED [23/37] RUN make && make install                                                                                                                                                                           0.0s
 => CACHED [24/37] WORKDIR /opt/python                                                                                                                                                                                0.0s
 => CACHED [25/37] RUN rm -rf /opt/python/Python-3.7.7                                                                                                                                                                0.0s
 => CACHED [26/37] RUN ln -s /opt/python/bin/python3 /opt/python/bin/python                                                                                                                                           0.0s
 => CACHED [27/37] RUN ln -s /opt/python/bin/pip3 /opt/python/bin/pip                                                                                                                                                 0.0s
 => CACHED [28/37] RUN pip install --no-cache-dir -U pip wheel setuptools cmake                                                                                                                                       0.0s
 => CACHED [29/37] RUN pip install --no-cache-dir -U scanpy==1.4.6 python-igraph==0.8.0 louvain==0.6.1 jupyterlab=2.1.0 cellxgene==0.15.0 rpy2==3.2.7 anndata2ri==1.0.2 leidenalg==0.7.0 fa2==0.3.5 MulticoreTSNE==0  0.0s
 => CACHED [30/37] RUN pip install --no-cache-dir git+https://github.com/le-ander/epiScanpy.git                                                                                                                       0.0s
 => CACHED [31/37] RUN jupyter contrib nbextension install --system                                                                                                                                                   0.0s
 => CACHED [32/37] RUN jupyter nbextension enable --py widgetsnbextension                                                                                                                                             0.0s
 => CACHED [33/37] RUN pip freeze > ../package_versions_py.txt                                                                                                                                                        0.0s
 => CACHED [34/37] RUN Rscript -e "IRkernel::installspec(user = FALSE)"                                                                                                                                               0.0s
 => ERROR [35/37] COPY .bashrc_docker /root/.bashrc                                                                                                                                                                   0.0s
 => ERROR [36/37] COPY .profile_docker /root/.profile                                                                                                                                                                 0.0s
------
 > [35/37] COPY .bashrc_docker /root/.bashrc:
------
------
 > [36/37] COPY .profile_docker /root/.profile:
------
failed to solve with frontend dockerfile.v0: failed to build LLB: failed to compute cache key: "/.profile_docker" not found: not found

Concatenate files error

I'm following the notebook but I'm getting an error in concatenating the files:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-b1c46f8469f7> in <module>
     28 
     29     # Concatenate to main adata object
---> 30     adata = adata.concatenate(adata_tmp)
     31     adata.var['gene_id'] = adata.var['gene_id-1']
     32     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/anaconda3/envs/sc-tutorial/lib/python3.7/site-packages/anndata/base.py in concatenate(self, join, batch_key, batch_categories, index_unique, *adatas)
   2018             # var
   2019             for c in ad.var.columns:
-> 2020                 new_c = c + (index_unique if index_unique is not None else '-') + categories[i]
   2021                 var.loc[vars_intersect, new_c] = ad.var.loc[vars_intersect, c]
   2022 

TypeError: unsupported operand type(s) for +: 'int' and 'str'

I'm using these parameters:

#Data files
sample_strings = ['bystander', 'uninfected', 'infected']
sample_id_strings = ['1', '2', '3']
file_base = '/home/ec2-user/LSHIV/LSHIV8'
exp_string = '_Regional_'
data_file_end = '_matrix.mtx'
barcode_file_end = '_barcodes.tsv'
gene_file_end = '_genes.tsv'
cc_genes_file = '~/pipeline/Macosko_cell_cycle_genes.txt'
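
The failing line in anndata builds new column names as c + '-' + batch, which breaks when adata.var carries an integer-named column, for example an unrenamed third column from a 3-column genes/features file. A hedged sketch, to run on each object before concatenating:

    # Hypothetical fix: force string column names in .var so that
    # concatenate() can append the batch suffix to them
    adata.var.columns = adata.var.columns.astype(str)
    adata_tmp.var.columns = adata_tmp.var.columns.astype(str)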

ValueError: cannot reindex from a duplicate axis

Hi,

I followed the instructions step by step, and successfully set up the environment and loaded the data. But when I ran Case-study_Mouse-intestinal-epithelium_1904.ipynb on Linux, I got errors:

ValueError: cannot reindex from a duplicate axis

The full error message is like this:
==============================================================================
Traceback (most recent call last):
  File "/home/lwang/miniconda3/bin/jupyter-nbconvert", line 11, in <module>
    sys.exit(main())
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/jupyter_core/application.py", line 254, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.start()
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 350, in start
    self.convert_notebooks()
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 524, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 489, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 418, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 181, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 199, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/exporters/notebook.py", line 32, in from_notebook_node
    nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 143, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 318, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 79, in preprocess
    self.execute()
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "/home/lwang/miniconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/client.py", line 540, in async_execute
    await self.async_execute_cell(
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 123, in async_execute_cell
    cell, resources = self.preprocess_cell(cell, self.resources, cell_index)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 146, in preprocess_cell
    cell = run_sync(NotebookClient.async_execute_cell)(self, cell, index, store_history=self.store_history)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/home/lwang/miniconda3/lib/python3.8/asyncio/futures.py", line 178, in result
    raise self._exception
  File "/home/lwang/miniconda3/lib/python3.8/asyncio/tasks.py", line 280, in __step
    result = coro.send(None)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/client.py", line 832, in async_execute_cell
    self._check_raise_for_error(cell, exec_reply)
  File "/home/lwang/miniconda3/lib/python3.8/site-packages/nbclient/client.py", line 740, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply['content'])
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
# Loop to load rest of data sets
for i in range(len(sample_strings)):
    #Parse Filenames
    sample = sample_strings[i]
    sample_id = sample_id_strings[i]
    data_file = file_base+sample_id+exp_string+sample+data_file_end
    barcode_file = file_base+sample_id+exp_string+sample+barcode_file_end
    gene_file = file_base+sample_id+exp_string+sample+gene_file_end
    
    #Load data
    adata_tmp = sc.read(data_file, cache=True)
    adata_tmp = adata_tmp.transpose()
    adata_tmp.X = adata_tmp.X.toarray()

    barcodes_tmp = pd.read_csv(barcode_file, header=None, sep='\t')
    genes_tmp = pd.read_csv(gene_file, header=None, sep='\t')
    
    #Annotate data
    barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
    barcodes_tmp.set_index('barcode', inplace=True)
    adata_tmp.obs = barcodes_tmp
    adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
    adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
    adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs
    
    genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
    genes_tmp.set_index('gene_symbol', inplace=True)
    adata_tmp.var = genes_tmp
    adata_tmp.var_names_make_unique()

    # Concatenate to main adata object
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')
    adata.var['gene_id'] = adata.var['gene_id-1']
    adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
    adata.obs.drop(columns=['sample_id'], inplace=True)
    adata.obs_names = [c.split("-")[0] for c in adata.obs_names]
    adata.obs_names_make_unique(join='_')
------------------

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-01aaadebece1> in <module>
     30 
     31     # Concatenate to main adata object
---> 32     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     33     adata.var['gene_id'] = adata.var['gene_id-1']
     34     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/miniconda3/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1694         all_adatas = (self,) + tuple(adatas)
   1695 
-> 1696         out = concat(
   1697             all_adatas,
   1698             axis=0,

~/miniconda3/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    812 
    813     # Annotation for other axis
--> 814     alt_annot = merge_dataframes(
    815         [getattr(a, alt_dim) for a in adatas], alt_indices, merge
    816     )

~/miniconda3/lib/python3.8/site-packages/anndata/_core/merge.py in merge_dataframes(dfs, new_index, merge_strategy)
    524 
    525 def merge_dataframes(dfs, new_index, merge_strategy=merge_unique):
--> 526     dfs = [df.reindex(index=new_index) for df in dfs]
    527     # New dataframe with all shared data
    528     new_df = pd.DataFrame(merge_strategy(dfs), index=new_index)

~/miniconda3/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
    524 
    525 def merge_dataframes(dfs, new_index, merge_strategy=merge_unique):
--> 526     dfs = [df.reindex(index=new_index) for df in dfs]
    527     # New dataframe with all shared data
    528     new_df = pd.DataFrame(merge_strategy(dfs), index=new_index)

~/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    310         @wraps(func)
    311         def wrapper(*args, **kwargs) -> Callable[..., Any]:
--> 312             return func(*args, **kwargs)
    313 
    314         kind = inspect.Parameter.POSITIONAL_OR_KEYWORD

~/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in reindex(self, *args, **kwargs)
   4171         kwargs.pop("axis", None)
   4172         kwargs.pop("labels", None)
-> 4173         return super().reindex(**kwargs)
   4174 
   4175     def drop(

~/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4804 
   4805         # perform the reindex on the axes
-> 4806         return self._reindex_axes(
   4807             axes, level, limit, tolerance, method, fill_value, copy
   4808         ).__finalize__(self, method="reindex")

~/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4017         index = axes["index"]
   4018         if index is not None:
-> 4019             frame = frame._reindex_index(
   4020                 index, method, copy, level, fill_value, limit, tolerance
   4021             )

~/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
   4036             new_index, method=method, level=level, limit=limit, tolerance=tolerance
   4037         )
-> 4038         return self._reindex_with_indexers(
   4039             {0: [new_index, indexer]},
   4040             copy=copy,

~/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4870 
   4871             # TODO: speed up on homogeneous DataFrame objects
-> 4872             new_data = new_data.reindex_indexer(
   4873                 index,
   4874                 indexer,

~/miniconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice)
   1299         # some axes don't allow reindexing with dups
   1300         if not allow_dups:
-> 1301             self.axes[axis]._can_reindex(indexer)
   1302 
   1303         if axis >= self.ndim:

~/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
   3474         # trying to reindex on an axis with duplicates
   3475         if not self._index_as_unique and len(indexer):
-> 3476             raise ValueError("cannot reindex from a duplicate axis")
   3477 
   3478     def reindex(self, target, method=None, level=None, limit=None, tolerance=None):

ValueError: cannot reindex from a duplicate axis
ValueError: cannot reindex from a duplicate axis

===============================================================================

I didn't change anything (the file names, the file directory, and the code all stayed the same), so I'm not sure why it didn't work. Am I supposed to modify the code or anything? I'm very new to scRNA analysis, so I'm having a hard time figuring out where to start debugging; could you please give me some advice on this?

Thanks very very much!
Leran
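
Newer anndata versions reindex .var during concatenation and require a unique index on every object, so duplicated gene symbols trigger exactly this error even when the same code ran fine with older versions. A small check, assuming the variables from the notebook's loading loop:

    # Hedged debugging sketch: list the gene symbols that are duplicated and
    # therefore break the reindexing step in newer anndata versions
    dup = adata.var_names[adata.var_names.duplicated()]
    print(len(dup), list(dup[:10]))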

Converting between sparse and dense matrices

Hi Malte,

Question about sparse and dense matrices. I understand we need to work with dense matrices in order to use rpy2. Is there a way to convert back to a sparse matrix (at the end of the tutorial?) if needed for other functions downstream?
My understanding of how dense vs sparse formats are handled is limited.

Not sure I understand it, but do most of the scanpy notebooks run off sparse matrices?
I tried running bdata = sp.sparse.csr_matrix(adata) to convert to a sparse matrix but got the error ValueError: unrecognized csr_matrix constructor usage.

Previously, when working off the scanpy notebooks, I used to run a function to generate a gene regulatory network based on a list of provided transcripts. I get an error when trying to run that function in the current dense format after following the tutorial, and I'm wondering if it's to do with dense/sparse matrices. Happy to take this offline if you'd like more detail.

Thanks!
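
On the conversion question: scipy's constructor needs the matrix itself rather than the AnnData wrapper, which is what the "unrecognized csr_matrix constructor usage" error points at. A minimal sketch:

    import scipy.sparse as sp

    # Convert the dense data matrix back to a sparse CSR matrix in place
    adata.X = sp.csr_matrix(adata.X)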

no latest tag for Docker container

Dear @le-ander and @LuckyMD,

To get a head start on my SC analysis adventures, I wanted to complete the tutorial. However, when attempting to pull the container via:

docker pull leanderd/single-cell-analysis                                                                                                                              

I encountered

Error response from daemon: manifest for leanderd/single-cell-analysis:latest not found: manifest unknown: manifest unknown

The reason for this behaviour is that, by default, Docker looks for the latest tag if no tag is specified.
However, you only have an explicit 200402 tag. It is pretty much standard to always have a latest tag corresponding to the latest release, so I would like to suggest adding one.
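
Until a latest tag exists, pulling the container with the explicit tag works:

    docker pull leanderd/single-cell-analysis:200402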

Error on "Making variable names unique for controlled concatenation."

Hello,
First of all, thank you very much for all the details you have provided.

I am getting an error on "making variable names unique for controlled concatenation" in the code you've posted here: https://github.com/theislab/single-cell-tutorial/blob/master/latest_notebook/Case-study_Mouse-intestinal-epithelium_1906.ipynb

Making variable names unique for controlled concatenation.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Anaconda_python3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-22-01aaadebece1> in <module>
     31     # Concatenate to main adata object
     32     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
---> 33     adata.var['gene_id'] = adata.var['gene_id-1']
     34     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
     35     adata.obs.drop(columns=['sample_id'], inplace=True)

/Anaconda_python3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2978             if self.columns.nlevels > 1:
   2979                 return self._getitem_multilevel(key)
-> 2980             indexer = self.columns.get_loc(key)
   2981             if is_integer(indexer):
   2982                 indexer = [indexer]

/Anaconda_python3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'
  | gene_id
-- | --
mm10_Xkr4 | mm10_ENSMUSG00000051951
mm10_Gm1992 | mm10_ENSMUSG00000089699
mm10_Gm37381 | mm10_ENSMUSG00000102343
mm10_Rp1 | mm10_ENSMUSG00000025900
mm10_Rp1-1 | mm10_ENSMUSG00000109048
... | ...
mm10_AC168977.1 | mm10_ENSMUSG00000079808
mm10_PISD | mm10_ENSMUSG00000095041
mm10_DHRSX | mm10_ENSMUSG00000063897
mm10_Vmn2r122 | mm10_ENSMUSG00000096730
mm10_CAAA01147332.1 | mm10_ENSMUSG00000095742
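
One hedged explanation is that the installed anndata version merges identical .var columns without adding the '-0'/'-1' suffixes, so the cleanup line looks up a column that was never created. A defensive sketch for the loop:

    # Hypothetical guard: only clean up the suffixed columns if they exist
    if 'gene_id-1' in adata.var.columns:
        adata.var['gene_id'] = adata.var['gene_id-1']
        adata.var.drop(columns=['gene_id-1', 'gene_id-0'], inplace=True)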

Error in differential expression analysis

Hello, I was using this workflow to analyze my data and everything was working perfectly until I reached the differential expression analysis. I am running the notebook on macOS 10.12.6, and the version of R I am using is 3.6.1.
I have identified that the error occurs in the first line of this cell:

%%R -i adata_test -o ent_de 

sca <- SceToSingleCellAssay(adata_test, class = "SingleCellAssay")

The following error message is produced

Error in (function (..., along = N, rev.along = NULL, new.names = NULL,  : 
 arg 'counts' is non-atomic
Calls: <Anonymous> ... callGeneric -> eval -> eval -> do.call -> do.call -> <Anonymous>

I thought the error might be a result of the fact that, after dividing the measured counts by the size factors, adata.X is a numpy matrix instead of an array, so I attempted to convert the matrix to an array by adding np.squeeze(np.asarray(...)) in the second line:

adata_test = adata.copy()
adata_test.X = np.squeeze(np.asarray(adata.raw.X))
adata_test.obs['n_genes'] = (adata_test.X > 0).sum(1) 

Any help would be much appreciated
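
The "non-atomic" complaint usually means R received something other than a plain matrix. A slightly more defensive version of the conversion above, a sketch that densifies whether adata.raw.X is sparse or a numpy matrix:

    import numpy as np
    import scipy.sparse as sp

    adata_test = adata.copy()
    X = adata.raw.X
    # Hand MAST a plain dense ndarray regardless of the storage format
    adata_test.X = X.toarray() if sp.issparse(X) else np.squeeze(np.asarray(X))
    adata_test.obs['n_genes'] = (adata_test.X > 0).sum(1)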

sc.pp.neighbors(adata_pp)

When I ran the command sc.pp.neighbors(adata_pp), it reported:
computing neighbors
using 'X_pca' with n_pcs = 15

python3.6/site-packages/numba/core/typed_passes.py:314: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "python3.6/site-packages/umap/rp_tree.py", line 135:
@numba.njit(fastmath=True, nogil=True, parallel=True)
def euclidean_random_projection_split(data, indices, rng_state):
^

state.func_ir.loc))
python3.6/site-packages/umap/nndescent.py:92: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "python3.6/site-packages/umap/utils.py", line 409:
@numba.njit(parallel=True)
def build_candidates(current_graph, n_vertices, n_neighbors, max_candidates, rng_state):
^

current_graph, n_vertices, n_neighbors, max_candidates, rng_state
python3.6/site-packages/numba/core/typed_passes.py:314: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "python3.6/site-packages/umap/nndescent.py", line 47:
@numba.njit(parallel=True)
def nn_descent(
^

state.func_ir.loc))
finished: added to .uns['neighbors']
.obsp['distances'], distances for each pair of neighbors
.obsp['connectivities'], weighted adjacency matrix (0:00:06)


Could you help me solve the problem?
Thanks and best wishes
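
For what it's worth, these are NumbaPerformanceWarnings rather than errors; the final lines of the log show that the neighbor graph was computed and written successfully. If the noise is unwanted, a sketch for silencing it, assuming numba >= 0.49 (where the warning class lives in numba.core.errors):

    import warnings
    from numba.core.errors import NumbaPerformanceWarning

    # Silence the "parallel=True ... no transformation" performance warnings
    warnings.filterwarnings('ignore', category=NumbaPerformanceWarning)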

can't set up environment

Hi,
I was trying to follow your instructions to set up the environment on Windows 10 using
conda env create -f sc_tutorial_environment.yml
But an error pops up.
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:

  - bioconductor-rhdf5lib

Is it because of my platform? Any idea how to fix this?
Thanks.
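
bioconda builds packages such as bioconductor-rhdf5lib only for Linux and macOS, so the solve fails on native Windows. A plausible workaround is to create the environment inside WSL (or to use the Docker container instead):

    # From PowerShell: open a WSL shell, then create the environment in Linux
    wsl
    conda env create -f sc_tutorial_environment.yml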

%%R -i data_ent

Hi.
Thank you for a very thorough and excellent guide.
I've encountered a type error while running the cell in the slingshot section. The error starts at the first line of the cell:
%%R -i adata_ent

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-142-648349c1be8b> in <module>
----> 1 get_ipython().run_cell_magic('R', '-i adata_ent', '\n#Plot 1\ncolour_map = brewer.pal(20,\'Set1\')\npar(xpd=TRUE)\npar(mar=c(4.5,5.5,2,7))\nplot(reducedDims(adata_ent)$PCA[,1], reducedDims(adata_ent)$PCA[,2], col=colour_map[colData(adata_ent)$louvain_final], bty=\'L\', xlab=\'PC1\', ylab=\'PC2\')\nlegend(x=12, y=12, legend=unique(colData(adata_ent)$louvain_final), fill=colour_map[as.integer(unique(colData(adata_ent)$louvain_final))])\n\nprint("1:")\nadata_ent_start <- slingshot(adata_ent, clusterLabels = \'louvain_final\', reducedDim = \'PCA\', start.clus=\'Stem\')\nprint(SlingshotDataSet(adata_ent_start))\n\nprint("")\nprint("2:")\nadata_ent_startend <- slingshot(adata_ent, clusterLabels = \'louvain_final\', reducedDim = \'PCA\', start.clus=\'Stem\', end.clus=c(\'Enterocyte mat. (Proximal)\', \'Enterocyte mat. (Distal)\'))\nprint(SlingshotDataSet(adata_ent_startend))\n\nprint("")\nprint("3:")\nadata_ent_simple_startend <- slingshot(adata_ent, clusterLabels = \'louvain_r0.5\', reducedDim = \'PCA\', start.clus=\'Stem\', end.clus=\'Enterocyte\')\nprint(SlingshotDataSet(adata_ent_simple_startend))\n')

~\Anaconda3\envs\sc-tutorial\lib\site-packages\IPython\core\interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2379             with self.builtin_trap:
   2380                 args = (magic_arg_s, cell)
-> 2381                 result = fn(*args, **kwargs)
   2382             return result
   2383 

<decorator-gen-122> in R(self, line, cell, local_ns)

~\Anaconda3\envs\sc-tutorial\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~\Anaconda3\envs\sc-tutorial\lib\site-packages\rpy2\ipython\rmagic.py in R(self, line, cell, local_ns)
    735                         raise NameError("name '%s' is not defined" % input)
    736                 with localconverter(converter) as cv:
--> 737                     ro.r.assign(input, val)
    738 
    739         if args.display:

~\Anaconda3\envs\sc-tutorial\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    195                 v = kwargs.pop(k)
    196                 kwargs[r_k] = v
--> 197         return (super(SignatureTranslatedFunction, self)
    198                 .__call__(*args, **kwargs))
    199 

~\Anaconda3\envs\sc-tutorial\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    124                 new_kwargs[k] = conversion.py2rpy(v)
    125         res = super(Function, self).__call__(*new_args, **new_kwargs)
--> 126         res = conversion.rpy2py(res)
    127         return res
    128 

~\Anaconda3\envs\sc-tutorial\lib\functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~\Anaconda3\envs\sc-tutorial\lib\site-packages\anndata2ri\r2py.py in rpy2py_s4(obj)
     25         return rpy2py_data_frame(obj)
     26     elif "SingleCellExperiment" in r_classes:
---> 27         return rpy2py_single_cell_experiment(obj)
     28     elif supported_r_matrix_classes() & r_classes:
     29         return rmat_to_spmat(obj)

~\Anaconda3\envs\sc-tutorial\lib\site-packages\anndata2ri\r2py.py in rpy2py_single_cell_experiment(obj)
     82 
     83     # TODO: Once the AnnData bug is fixed, remove the “or None”
---> 84     return AnnData(exprs, obs, var, uns, obsm or None, layers=layers)

~\Anaconda3\envs\sc-tutorial\lib\site-packages\anndata\_core\anndata.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    305             self._init_as_view(X, oidx, vidx)
    306         else:
--> 307             self._init_as_actual(
    308                 X=X,
    309                 obs=obs,

~\Anaconda3\envs\sc-tutorial\lib\site-packages\anndata\_core\anndata.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    514 
    515         # Backwards compat for connectivities matrices in uns["neighbors"]
--> 516         _move_adj_mtx({"uns": self._uns, "obsp": self._obsp})
    517 
    518         self._check_dimensions()

~\Anaconda3\envs\sc-tutorial\lib\site-packages\anndata\compat\__init__.py in _move_adj_mtx(d)
    153         if (
    154             (k in n)
--> 155             and isinstance(n[k], (spmatrix, np.ndarray))
    156             and len(n[k].shape) == 2
    157         ):

~\Anaconda3\envs\sc-tutorial\lib\site-packages\rpy2\robjects\vectors.py in __getitem__(self, i)
    263 
    264     def __getitem__(self, i):
--> 265         res = super().__getitem__(i)
    266 
    267         if isinstance(res, Sexp):

~\Anaconda3\envs\sc-tutorial\lib\site-packages\rpy2\rinterface_lib\sexp.py in __getitem__(self, i)
    628             )
    629         else:
--> 630             raise TypeError(
    631                 'Indices must be integers or slices, not %s' % type(i))
    632         return res

TypeError: Indices must be integers or slices, not <class 'str'>

Here is my adata_ent before input to the cell:

adata_ent
[Out]
AnnData object with n_obs × n_vars = 5495 × 4000
    obs: 'sample', 'region', 'donor', 'n_counts', 'log_counts', 'n_genes', 'mt_frac', 'size_factors', 'S_score', 'G2M_score', 'phase', 'louvain_r1', 'louvain_r0.5', 'Enterocyte_marker_expr', 'Stem_marker_expr', 'louvain_r0.5_entero_sub', 'louvain_r0.5_entero_mat_sub', 'louvain_final', 'prox_dist', 'umap_density_prox_dist'
    var: 'gene_id', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'sample_colors', 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'diffmap_evals', 'draw_graph', 'phase_colors', 'louvain', 'louvain_r1_colors', 'louvain_r0.5_colors', 'region_colors', 'rank_genes_r0.5', 'louvain_r0.5_entero_sub_colors', 'rank_genes_r0.5_entero_sub', 'louvain_r0.5_entero_mat_sub_colors', 'rank_genes_r0.5_entero_mat_sub', 'louvain_final_colors', 'umap_density_prox_dist_params'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_diffmap', 'X_draw_graph_fa'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'distances', 'connectivities'

Thank you for looking into this.

Regards,
T
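
The traceback shows the conversion back from R failing while anndata inspects the returned .uns['neighbors'] entry, which is still an rpy2 vector at that point. A hedged, hypothetical workaround is to strip the graph slots before sending the object to R and recompute them afterwards:

    # Hypothetical workaround sketch: drop slots that round-trip badly via R
    adata_ent_r = adata_ent.copy()
    del adata_ent_r.uns['neighbors']
    # then pass adata_ent_r into the %%R cell (-i adata_ent_r) instead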

Problems when installing packages "Slingshot" and "Monocle"

I am following the scripts for "Current best-practices in single-cell RNA-seq: a tutorial". The environment was set up successfully, works for my single-cell data, and produces many useful results. But when I installed "Slingshot" or "Monocle", there was an error: Warning message: "dependency 'XML' is not available".
Some people told me that the R version in sc_tutorial_environment.yml may be too old; it is 3.5.1 (2018-07-02), which is not compatible with the latest versions of "Slingshot" and "Monocle". But if R is upgraded to 4.0, the Python package rpy2 will not be compatible with R 4.0 either. I am not sure how to solve the problem.

Thanks
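
The R 'XML' package needs libxml2 headers when compiled from source, which often fails inside conda environments. Installing the prebuilt conda binary into the environment is usually easier:

    conda install -c conda-forge r-xml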

Problem with adata concatenation

Hello, I am working with the jupyter notebook on macOS and followed the Environment set up instructions. I am aware that the conda build may be problematic on macOS, but as far as I can tell that was not an issue for me. To confirm, interactive cell 2 returns

scanpy==1.5.1 anndata==0.7.3 umap==0.4.3 numpy==1.18.4 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.8.2 louvain==0.6.1

When running the notebook (inside the conda environment) I encounter the following error message apparently triggered by the adata method concatenate:

 ... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-6-01aaadebece1> in <module>
     30 
     31     # Concatenate to main adata object
---> 32     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     33     adata.var['gene_id'] = adata.var['gene_id-1']
     34     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1696         all_adatas = (self,) + tuple(adatas)
   1697 
-> 1698         out = concat(
   1699             all_adatas,
   1700             join=join,

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, join, batch_key, batch_categories, uns_merge, index_unique, fill_value)
    454 
    455     var_names = resolve_index([a.var_names for a in adatas], join=join)
--> 456     reindexers = [
    457         gen_reindexer(var_names, a.var_names, fill_value=fill_value) for a in adatas
    458     ]

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
    455     var_names = resolve_index([a.var_names for a in adatas], join=join)
    456     reindexers = [
--> 457         gen_reindexer(var_names, a.var_names, fill_value=fill_value) for a in adatas
    458     ]
    459 

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var, fill_value)
    255     new_size = len(new_var)
    256     old_size = len(cur_var)
--> 257     new_pts = new_var.get_indexer(cur_var)
    258     cur_pts = np.arange(len(new_pts))
    259 

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2731 
   2732         if not self.is_unique:
-> 2733             raise InvalidIndexError(
   2734                 "Reindexing only valid with uniquely valued Index objects"
   2735             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

By its title, this seems potentially related to #25, which in turn took me to #21 and #28, where it is stated that commenting out adata = adata.concatenate(adata_tmp, batch_key='sample_id') and adata.obs.drop(columns=['sample_id'], inplace=True) may be required. However, this in turn generated the error message

... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-6ee374501f9d> in <module>
     31     # Concatenate to main adata object
     32     #adata = adata.concatenate(adata_tmp, batch_key='sample_id')
---> 33     adata.var['gene_id'] = adata.var['gene_id-1']
     34     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
     35     #adata.obs.drop(columns=['sample_id'], inplace=True)

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/opt/miniconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'gene_id-1'

That brought me back to commenting on #21, where I was asked to open this as a separate issue.

I think that even if there is an already established solution, it may be confusing to chase through the issues to find it, sorry!
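For anyone hitting the same pair of errors: the InvalidIndexError typically means that one of the objects being concatenated still carries duplicate var_names, and the KeyError follows because the suffixed 'gene_id-0'/'gene_id-1' columns are only created by concatenate() itself, so skipping the concatenation also means the cleanup lines that reference them must be skipped. A minimal sketch of the loop tail, assuming the duplicates come from repeated gene symbols and that the anndata version suffixes conflicting var columns as the tutorial expects:

    # Deduplicate variable names on both objects so the reindexer can
    # build a unique index during concatenation.
    adata.var_names_make_unique()
    adata_tmp.var_names_make_unique()

    # concatenate() suffixes conflicting var columns per batch
    # ('gene_id-0', 'gene_id-1'); the cleanup below therefore only makes
    # sense if the concatenation above is actually run.
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')
    adata.var['gene_id'] = adata.var['gene_id-1']
    adata.var.drop(columns=['gene_id-1', 'gene_id-0'], inplace=True)
    adata.obs.drop(columns=['sample_id'], inplace=True)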

Use scanpy combat implementation

I just saw that ComBat is now included in scanpy as of version 1.4.

Batch correction can therefore be performed with the one-liner sc.pp.combat(adata, key='sample').

This could be included in the updated version of the notebook.

It should save people an extra step, as they will no longer have to download the ComBat code from Maren's GitHub.
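A minimal sketch of the call, assuming the batch covariate is stored in adata.obs['sample'] as in the case study:

    import scanpy as sc

    # ComBat batch correction (available since scanpy 1.4); adjusts
    # adata.X in place using the covariate in adata.obs['sample'].
    sc.pp.combat(adata, key='sample')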

Error when assigning variable names and gene id columns

Hi,
Great tutorial! I'm trying to use it on a dataset comprising 2 samples that I've concatenated following the initial steps described in your tutorial.
My initial issue comes up when running line 6:

#Assign variable names and gene id columns
adata.var_names = [g.split("_")[1] for g in adata.var_names]
adata.var['gene_id'] = [g.split("_")[1] for g in adata.var['gene_id']]

where I get the following error:

IndexError                                Traceback (most recent call last)
in <module>
      1 #Assign variable names and gene id columns
----> 2 adata.var_names = [g.split("_")[1] for g in adata.var_names]
      3 adata.var['gene_id'] = [g.split("_")[1] for g in adata.var['gene_id']]

in <listcomp>(.0)
      1 #Assign variable names and gene id columns
----> 2 adata.var_names = [g.split("_")[1] for g in adata.var_names]
      3 adata.var['gene_id'] = [g.split("_")[1] for g in adata.var['gene_id']]

IndexError: list index out of range

If I change the [1] to [0] I can continue with the subsequent script lines, although I get "Variable names are not unique. To make them unique, call .var_names_make_unique." throughout. I'm not entirely sure what the [1] index represents; any help understanding the error would be appreciated.
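For context, the tutorial's gene names appear to carry an underscore-separated genome-build prefix (on the pattern 'mm10_Xkr4'), and split("_")[1] drops that prefix. If your own names contain no underscore, index 1 does not exist, which produces exactly this IndexError, while [0] simply keeps the full (possibly duplicated) name. A guarded version of the cell, as a hypothetical sketch to adapt to your naming scheme:

    # Strip an underscore-separated prefix only when one is present;
    # otherwise keep the gene name unchanged.
    adata.var_names = [g.split("_")[1] if "_" in g else g
                       for g in adata.var_names]
    adata.var['gene_id'] = [g.split("_")[1] if "_" in g else g
                            for g in adata.var['gene_id']]

    # Duplicate symbols still need to be made unique afterwards.
    adata.var_names_make_unique()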


An additional error comes up when running line 31 (after changing [1] to [0] on line 6) and attempting to do cell cycle scoring. I get the following error:

AxisError                                 Traceback (most recent call last)
in <module>
     12 g2m_genes_mm_ens = adata.var_names[np.in1d(adata.var_names, g2m_genes_mm)]
     13
---> 14 sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes_mm_ens, g2m_genes=g2m_genes_mm_ens)

~\Anaconda3\lib\site-packages\scanpy\tools\_score_genes.py in score_genes_cell_cycle(adata, s_genes, g2m_genes, copy, **kwargs)
    185     ctrl_size = min(len(s_genes), len(g2m_genes))
    186     # add s-score
--> 187     score_genes(adata, gene_list=s_genes, score_name='S_score', ctrl_size=ctrl_size, **kwargs)
    188     # add g2m-score
    189     score_genes(adata, gene_list=g2m_genes, score_name='G2M_score', ctrl_size=ctrl_size, **kwargs)

~\Anaconda3\lib\site-packages\scanpy\tools\_score_genes.py in score_genes(adata, gene_list, ctrl_size, gene_pool, n_bins, score_name, random_state, copy, use_raw)
    114     X_control = _adata[:, control_genes].X
    115     if issparse(X_control): X_control = X_control.toarray()
--> 116     X_control = np.nanmean(X_control, axis=1)
    117
    118     if len(gene_list) == 0:

~\Anaconda3\lib\site-packages\numpy\lib\nanfunctions.py in nanmean(a, axis, dtype, out, keepdims)
    861         raise TypeError("If a is inexact, then out must be inexact")
    862
--> 863     cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=keepdims)
    864     tot = np.sum(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
    865     avg = _divide_by_count(tot, cnt, out=out)

~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial)
   1928
   1929     return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
-> 1930                           initial=initial)
   1931
   1932

~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     77     # support a dtype.
     78     if dtype is not None:
---> 79         return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
     80     else:
     81         return reduction(axis=axis, out=out, **passkwargs)

~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims, initial)
     34 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
     35          initial=_NoValue):
---> 36     return umr_sum(a, axis, dtype, out, keepdims, initial)
     37
     38 def _prod(a, axis=None, dtype=None, out=None, keepdims=False,

AxisError: axis 1 is out of bounds for array of dimension 1

Any help understanding these errors would be greatly appreciated, thanks!

Currently running scanpy==1.4 anndata==0.6.18 numpy==1.15.4 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1
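Regarding the AxisError: np.nanmean(X_control, axis=1) only fails like this when X_control is one-dimensional, i.e. when (almost) no control genes were drawn, which typically means the cell cycle gene lists barely overlap adata.var_names (plausible here after the [1] to [0] change left the names in a different format). A quick diagnostic sketch, reusing the notebook's variable names:

    import numpy as np

    # How many of the annotated cell cycle genes are actually present in
    # the data? Counts near zero explain the downstream AxisError.
    print('S genes found:  ', np.sum(np.in1d(adata.var_names, s_genes_mm)))
    print('G2M genes found:', np.sum(np.in1d(adata.var_names, g2m_genes_mm)))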

Aesthetics must be either length 1 or the same as the data (87): fill

I have encountered the following error when trying to execute the first code cell in the Metastable States section (3.5.6):

Error: Aesthetics must be either length 1 or the same as the data (87): fill

Furthermore:

RRuntimeError: Error in dev.off() : 
  QuartzBitmap_Output - unable to open file '/var/folders/pp/bph3z1rx3zs9m5j97r9qslxctsft0g/T/tmp3u47ajsw/Rplots001.png'

I am on macOS Mojave and R version 3.5.1, with Bioconductor 3.7 and Anaconda Python 3.6.

Furthermore: scanpy==1.3.7 anndata==0.6.17 numpy==1.15.4 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.2 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1

concatenation issue persists after commenting out two lines as instructed in issue #41

I'm sorry to refer to this closed issue again, but commenting out the two lines as in #41 (#adata.var['gene_id'] = adata.var['gene_id-1'] and #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)) didn't seem to work for me. Could it be caused by the fact that I'm using scanpy==1.6.0 and anndata==0.7.4?

Here is my code for this cell:

   #Annotate data
   barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
   barcodes_tmp.set_index('barcode', inplace=True)
   adata_tmp.obs = barcodes_tmp
   adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
   adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
   adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs
#     adata_tmp.obs_names_make_unique()
   
   genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
   genes_tmp.set_index('gene_symbol', inplace=True)
   adata_tmp.var = genes_tmp
   adata_tmp.var_names_make_unique()

   # Concatenate to main adata object
   adata = adata.concatenate(adata_tmp, batch_key='sample_id')
   #adata.var['gene_id'] = adata.var['gene_id-1']
   #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
   adata.obs.drop(columns=['sample_id'], inplace=True)
   adata.obs_names = [c.split("-")[0] for c in adata.obs_names]
   adata.obs_names_make_unique(join='_')

And the error message persists like this:

... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-8-68186e73aaae> in <module>
     31 
     32     # Concatenate to main adata object
---> 33     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     34     #adata.var['gene_id'] = adata.var['gene_id-1']
     35     #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1696         all_adatas = (self,) + tuple(adatas)
   1697 
-> 1698         out = concat(
   1699             all_adatas,
   1700             axis=0,

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    799         [dim_indices(a, axis=1 - axis) for a in adatas], join=join
    800     )
--> 801     reindexers = [
    802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
    800     )
    801     reindexers = [
--> 802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]
    804 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var)
    393            [1., 0., 0.]], dtype=float32)
    394     """
--> 395     return Reindexer(cur_var, new_var)
    396 
    397 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in __init__(self, old_idx, new_idx)
    265         self.no_change = new_idx.equals(old_idx)
    266 
--> 267         new_pos = new_idx.get_indexer(old_idx)
    268         old_pos = np.arange(len(new_pos))
    269 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2978 
   2979         if not self.is_unique:
-> 2980             raise InvalidIndexError(
   2981                 "Reindexing only valid with uniquely valued Index objects"
   2982             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Version of packages:


-----
anndata     0.7.4
scanpy      1.6.0
sinfo       0.3.1
-----
PIL                 7.2.0
anndata             0.7.4
anndata2ri          1.0.4
attr                20.2.0
backcall            0.2.0
cairo               1.19.1
certifi             2020.06.20
cffi                1.14.1
chardet             3.0.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.1
decorator           4.4.2
get_version         2.1
gprofiler           1.0.0
h5py                2.10.0
idna                2.10
igraph              0.8.2
ipykernel           5.3.4
ipython_genutils    0.2.0
ipywidgets          7.5.1
jedi                0.17.2
jinja2              2.11.2
joblib              0.16.0
jsonschema          3.2.0
kiwisolver          1.2.0
legacy_api_wrap     1.2
llvmlite            0.34.0
louvain             0.6.1
markupsafe          1.1.1
matplotlib          3.3.1
mpl_toolkits        NA
natsort             7.0.1
nbformat            5.0.7
numba               0.51.2
numexpr             2.7.1
numpy               1.19.1
packaging           20.4
pandas              1.1.1
parso               0.7.1
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prometheus_client   NA
prompt_toolkit      3.0.7
ptyprocess          0.6.0
pvectorc            NA
pygments            2.6.1
pyparsing           2.4.7
pyrsistent          NA
pytz                2020.1
requests            2.24.0
rpy2                3.3.5
scanpy              1.6.0
scipy               1.5.2
seaborn             0.10.1
send2trash          NA
setuptools_scm      NA
sinfo               0.3.1
six                 1.15.0
sklearn             0.23.2
statsmodels         0.12.0
storemagic          NA
tables              3.6.1
terminado           0.8.3
texttable           1.6.3
tornado             6.0.4
traitlets           4.3.3
tzlocal             NA
urllib3             1.25.10
wcwidth             0.2.5
yaml                5.3.1
zmq                 19.0.2
-----
IPython             7.18.1
jupyter_client      6.1.7
jupyter_core        4.6.3
notebook            6.1.3
-----
Python 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49) [GCC 7.5.0]
Linux-4.13.0-36-generic-x86_64-with-glibc2.10
4 logical CPU cores, x86_64
-----
Session information updated at 2020-09-09 10:48;
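One thing worth checking in anndata 0.7.x: the Reindexer requires a unique var index on both inputs, and the loop above calls var_names_make_unique() only on adata_tmp, so the accumulated adata may still carry duplicates. A short diagnostic sketch, assuming the duplicates stem from repeated gene symbols:

    # Which of the two objects still has duplicate variable names?
    print(adata.var_names.is_unique, adata_tmp.var_names.is_unique)

    # Make both unique before merging, not only the per-sample object.
    adata.var_names_make_unique()
    adata_tmp.var_names_make_unique()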

computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1) gives error

I can run everything without problems until these lines:

%%R -i data_mat -i input_groups -o size_factors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1)

where I get this error:

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘assay’ for signature ‘"matrix", "character"’

Any idea what's going on here?

I'm using the original code, except for these two lines, which I commented out:
#adata.var['gene_id'] = adata.var['gene_id-1']
#adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
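This assay signature error usually indicates a newer scran in the environment: recent versions of computeSumFactors() expect a SingleCellExperiment rather than a bare matrix. A hedged sketch of the cell under that assumption, wrapping the matrix first:

    %%R -i data_mat -i input_groups -o size_factors
    # Wrap the count matrix in a SingleCellExperiment before calling
    # computeSumFactors(), then extract the estimated size factors.
    library(scran)
    library(SingleCellExperiment)
    sce = SingleCellExperiment(list(counts=data_mat))
    sce = computeSumFactors(sce, clusters=input_groups, min.mean=0.1)
    size_factors = sizeFactors(sce)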
