
scib-pipeline's Introduction

Pipeline for benchmarking atlas-level single-cell integration

This repository contains the snakemake pipeline for our benchmarking study of data integration tools. In this study, we benchmark 16 methods (see here) with 4 combinations of preprocessing steps, leading to 68 method combinations, on 85 batches of gene expression and chromatin accessibility data. The pipeline uses the scib package and allows for reproducible and automated analysis of the different steps and combinations of preprocessing and integration methods.

Workflow

Resources

  • On our website we visualise the results of the study.

  • The scib package that is used in this pipeline can be found here.

  • For reproducibility and visualisation we have a dedicated repository: scib-reproducibility.

  • The data used in the study is available on figshare

Please cite:

Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022). https://doi.org/10.1038/s41592-021-01336-8

Installation

To reproduce the results from this study, two separate conda environments are needed, one for python and one for R operations. Please make sure you have either mambaforge or conda installed on your system to be able to use the pipeline. We recommend mamba (also installable via conda) for faster package installation and a smaller memory footprint.

We provide python and R environment YAML files in envs/, together with an installation script that sets up the correct environments in a single command, based on the R version you want to use. The pipeline currently supports R 3.6 and R 4.0; we generally recommend R 4.0. Call the script as follows, e.g. for R 4.0:

bash envs/create_conda_environments.sh -r 4.0

Check the script's help output for the full list of arguments it accepts.

bash envs/create_conda_environments.sh -h 

Once installation is successful, you will have the python environment scib-pipeline-R<version> and the R environment scib-R<version>, which you must specify in the config file.

| R version | Python environment name | R environment name | Test data config YAML file |
|-----------|-------------------------|--------------------|-----------------------------|
| 4.0 | scib-pipeline-R4.0 | scib-R4.0 | configs/test_data-R4.0.yaml |
| 3.6 | scib-pipeline-R3.6 | scib-R3.6 | configs/test_data-R3.6.yaml |

Note: The installation script only works for the environments listed in the table above. The environments used in our study are included for reproducibility purposes and are described in envs/.

For a more detailed description of the environment files and how to install the different environments manually, please refer to the README in envs/.

Running the Pipeline

This repository contains a snakemake pipeline to run integration methods and metrics reproducibly for different data scenarios and preprocessing setups.

Generate Test data

A script in data/ can be used to generate test data. This is useful to ensure that the installation was successful before moving on to a larger dataset. The pipeline expects an anndata object with normalised and log-transformed counts in adata.X and counts in adata.layers['counts']; this layout is sketched below. More information on how to use the data generation script can be found in data/README.md.
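As a rough illustration of that layout (a minimal sketch, not the pipeline's own generation script; the dataset and output path are placeholders):

```python
# Minimal sketch of the expected input layout: raw counts preserved in
# .layers['counts'], normalised and log-transformed values in .X.
import scanpy as sc

adata = sc.datasets.pbmc3k()                  # placeholder dataset
adata.layers["counts"] = adata.X.copy()       # keep the raw counts
sc.pp.normalize_total(adata, target_sum=1e4)  # normalise ...
sc.pp.log1p(adata)                            # ... and log-transform .X
adata.write("data/my_input.h5ad")             # placeholder output path
```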

Setup Configuration File

The parameters and input files are specified in config files. A description of the config format and example files can be found in configs/. You can use the example config that uses the test data to get the pipeline running quickly, and then modify a copy of it to work with your own data.

Pipeline Commands

To call the pipeline on the test data, e.g. using R 4.0:

snakemake --configfile configs/test_data-R4.0.yaml -n

This gives you an overview of the jobs that will be run. In order to execute these jobs with up to 10 cores, call

snakemake --configfile configs/test_data-R4.0.yaml --cores 10

More snakemake commands can be found in the documentation.

Visualise the Workflow

A dependency graph of the workflow can be created at any time and is useful for gaining a general understanding of the pipeline. Snakemake can create a graphviz representation of the rules, which can be piped into an image file.

snakemake --configfile configs/test_data-R3.6.yaml --rulegraph | dot -Tpng -Grankdir=TB > dependency.png

Snakemake workflow

Tools

Tools that are compared include the python methods bbknn, combat, desc, mnn, saucie, scanorama, scanvi, scgen, scvi, trvae and trvaep, and the R methods conos, fastmnn, harmony, liger, seurat and seuratrpca (see the METHODS section of the example config files).

scib-pipeline's People

Contributors

danielstrobl, kridsadakorn, lazappi, lisasikkema, luckymd, martaint, mbuttner, mumichae, scottgigante, simonmfr


scib-pipeline's Issues

Error when running snakemake

Hi. As with my previous issue, this is probably an error linked to the system I work on, but I am hoping someone will have an idea as to why it is happening.

I have set up the R4.0 environments and wanted to run the pipeline on the test data generated with the generate_data.py script provided in the data directory. The dry run snakemake --configfile configs/test_data-R4.0.yaml -n --use-conda works fine, but I get the following error after snakemake --configfile configs/test_data-R4.0.yaml --cores 10 --use-conda:

ERROR conda.cli.main_run:execute(41): `conda run python scripts/preprocessing/runPP.py -i data/adata_norm.h5ad -o /SCRATCH-BIRD/users/tnoel/atlas/tools/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS -b batch --hvgs 0 -r -l` failed. (See above for error)
Save as RDS

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 51, in runPP
    scib.preprocessing.saveSeurat(adata, outPath, batch, hvgs)
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/_package_tools.py", line 20, in wrapper
    return func(*args, **kwargs)
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 707, in save_seurat
    ro.r('sobj = as.Seurat(adata, counts="counts", data = "X")')
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 438, in __call__
    res = self.eval(p)
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 199, in __call__
    .__call__(*args, **kwargs))
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/CONDAS/users/tnoel/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/rinterface.py", line 677, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in as.Seurat(adata, counts = "counts", data = "X") : 
 cannot find function "as.Seurat"


[Thu May 12 16:31:59 2022]
Error in rule integration_prepare:
    jobid: 10
    output: /SCRATCH-BIRD/users/tnoel/atlas/tools/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS
    shell:
        
        conda run -n scib-pipeline-R4.0 python scripts/preprocessing/runPP.py -i data/adata_norm.h5ad         -o /SCRATCH-BIRD/users/tnoel/atlas/tools/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS -b batch         --hvgs 0  -r -l
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I'm also getting a lot of warnings, but I am not sure they are linked to my problem, seeing as most of them are about deprecated names.

Has anyone encountered this kind of issue? I've checked and am able to use the as.Seurat function in R from both R4.0 environments.
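One thing worth checking (a minimal diagnostic sketch, not a confirmed fix): rpy2 embeds whichever R installation the python environment resolves on its path, which is not necessarily the R you tested interactively, so Seurat may be missing from the embedded R's library path.

```python
# Hedged diagnostic: run inside the scib-pipeline-R4.0 environment to see
# which R rpy2 actually embeds, and whether that R can find as.Seurat.
import rpy2.robjects as ro

print(ro.r("R.home()")[0])                  # path of the embedded R installation
ro.r("suppressMessages(library(Seurat))")   # errors here if Seurat is absent there
print(ro.r('exists("as.Seurat")')[0])       # True if the generic is visible
```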

from SingleCellExperiment to pipeline

Hello,

sorry to bother again.

I am trying to start the pipeline from an R dataset.

# Prepare the system

#system("conda install -c bioconda r-sceasy")
system("conda create --name sceasy")
system("conda activate sceasy")

library(sceasy)
library(reticulate)
use_condaenv('/home/users/allstaff/mangiola.s/.conda/envs/sceasy')
loompy <- reticulate::import('loompy')

sceasy::convertFormat("software/scib-pipeline/data/adata_norm.h5ad", from="anndata", to="seurat",  outFile='test_seurat.rds')


# Convert the dataset because of a problem of scib-pipeline with column-compressed matrices
counts_SCE = counts %>% as.SingleCellExperiment()
counts_SCE@assays@data$counts = as(counts_SCE@assays@data$counts, "RsparseMatrix")

# Save the dataset
sceasy::convertFormat(counts_SCE, from="sce", to="anndata",  outFile='software/scib-pipeline/data/pbmc_CD8.h5ad')

All of this works.
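For reference, the same row-compressed conversion can also be done on the Python side before handing the file to the pipeline (a rough sketch, assuming scanpy and scipy, with the path taken from my config below):

```python
# Rough sketch: store .X and the counts layer as CSR (row-compressed),
# mirroring the as(..., "RsparseMatrix") conversion done above in R.
import scanpy as sc
from scipy.sparse import csr_matrix

adata = sc.read_h5ad("software/scib-pipeline/data/pbmc_CD8.h5ad")
adata.X = csr_matrix(adata.X)
adata.layers["counts"] = csr_matrix(adata.layers["counts"])
adata.write("software/scib-pipeline/data/pbmc_CD8.h5ad")
```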

My pipeline config is:

ROOT: data_cd8
r_env : scib-R4
py_env : scib-pipeline-R4

timing: false
unintegrated_metrics: false

FEATURE_SELECTION:
  #hvg: 2000
  full_feature: 0

SCALING:
  - unscaled
  #- scaled

METHODS:
# python methods
  bbknn:
    output_type: knn
  combat:
    output_type: full
  #desc:
  #  output_type: embed
#  mnn:
#    output_type: full
  #saucie:
  #  output_type:
  #    - full
  #    - embed
  scanorama:
    output_type:
      - embed
      - full
  scanvi:
    output_type: embed
    no_scale: true
    use_celltype: true
  scgen:
    output_type: full
    use_celltype: true
  scvi:
    no_scale: true
    output_type: embed
  #trvae:
  #  no_scale: true
  # output_type:
  #    - embed
  #    - full
  # trvaep:
  #   no_scale: true
  #   output_type:
  #     - embed
  #     - full
# R methods
  #conos: # temporary directory issue
  #  R: true
  #  output_type: knn
  fastmnn:
    R: true
    output_type:
      - embed
      - full
  harmony:
    R: true
    output_type: embed
  liger:
    no_scale: true
    R: true
    output_type: embed
  seurat:
    R: true
    output_type: full
  seuratrpca:
      R: true
      output_type: full

DATA_SCENARIOS:
  run_cd8:
    batch_key: batch
    label_key: celltype
    organism: human
    assay: expression
    file: data/pbmc_CD8.h5ad


What I don't understand is the error:

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/preprocessing/runPP.py', '-i', 'data/pbmc_CD8.h5ad', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_cd8/run_cd8/prepare/unscaled/full_feature/adata_pre.RDS', '-b', 'batch', '--hvgs', '0', '-r', '-l']' command failed.  (See above for error)
Save as RDS

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 51, in runPP
    scib.preprocessing.saveSeurat(adata, outPath, batch, hvgs)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 496, in saveSeurat
    ro.r('sobj = as.Seurat(adata, counts="counts", data = "X")')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 438, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 199, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/rinterface.py", line 677, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error: No data in provided assay - counts


[Fri Apr 22 23:29:27 2022]
Error in rule integration_prepare:
    jobid: 10
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_cd8/run_cd8/prepare/unscaled/full_feature/adata_pre.RDS
    shell:

        conda run -n scib-pipeline-R4 python scripts/preprocessing/runPP.py -i data/pbmc_CD8.h5ad         -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_cd8/run_cd8/prepare/unscaled/full_feature/adata_pre.RDS -b batch         --hvgs 0  -r -l

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

No data found in assay "counts"? That seems strange.

This is my SingleCellExperiment object:

```r
> counts_SCE
class: SingleCellExperiment 
dim: 277 2597 
metadata(0):
assays(2): counts logcounts
rownames(277): CD80 CD86 ... CD4 CD340-erb-B2-HER-2
rowData names(0):
colnames(2597): 2_CTGATCCAGGCTGAAC-1 4_AATGCCATCCAAGCTA-1 ... 6_AGCGATTAGAAGCGCT-1 6_CAGTGCGAGTAGGAAG-1
colData names(78): nCount_RNA nFeature_RNA ... n_cells ident
```
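A quick way to verify that the counts layer actually survived the sceasy conversion (a minimal sketch, assuming scanpy; the path matches my config):

```python
# Sanity check: the saveSeurat step reads adata.layers['counts'], so a
# missing or empty layer would be consistent with the error above.
import scanpy as sc

adata = sc.read_h5ad("software/scib-pipeline/data/pbmc_CD8.h5ad")
print(list(adata.layers.keys()))   # should include 'counts'
print(adata.layers["counts"].nnz)  # should be > 0 for sparse counts
```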

error when running pipeline with test data

Hi, thanks for your excellent work! It really helps me because I'm new to bioinformatics.

There is one error when I run the pipeline with the test data. I cloned your git repository on my server. After generating the test data, the adata_norm.h5ad file is right under the scib-pipeline/data directory. Then, back in the scib-pipeline directory, I run

snakemake --configfile configs/test_data-R4.0.yaml -n

but it produces the following report. I don't know why it keeps reporting missing output files. Could you help me with it?

And by the way, could you tell me the standard outputs that would prove that what I'm doing with the test data is right? Thank you so much for your patience!

Building DAG of jobs...
Job stats:
job                       count    min threads    max threads
----------------------  -------  -------------  -------------
all                           1              1              1
convert_RDS_h5ad              5              1              1
embeddings                    1              1              1
embeddings_single            13              1              1
integration_prepare           2              1              1
integration_run_python        6              1              1
integration_run_r             5              1              1
metrics                       1              1              1
metrics_single               13              1              1
total                        47              1              1


[Sat May 28 23:35:27 2022]
Job 10: 
        Preparing adata
        wildcards: hvg=full_feature,prep=RDS,scaling=unscaled,scenario=test_data_r4
        parameters: batch 0  -r -l conda run -n scib-pipeline-R4.0
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS


[Sat May 28 23:35:27 2022]
Job 2: 
        Preparing adata
        wildcards: hvg=full_feature,prep=h5ad,scaling=unscaled,scenario=test_data_r4
        parameters: batch 0    conda run -n scib-pipeline-R4.0
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 4: 
        Run scanorama on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 3: 
        Run combat on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 12: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.RDS
        Run harmony on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-R4.0 Rscript
        hvgs: 
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.RDS; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS


[Sat May 28 23:35:27 2022]
Job 5: 
        Run scanvi on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: -c celltype
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 14: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.RDS
        Run liger on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-R4.0 Rscript
        hvgs: 
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.RDS; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS


[Sat May 28 23:35:27 2022]
Job 7: 
        Run scvi on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 16: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.RDS
        Run seurat on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-R4.0 Rscript
        hvgs: 
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.RDS; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS


[Sat May 28 23:35:27 2022]
Job 9: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.RDS
        Run fastmnn on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-R4.0 Rscript
        hvgs: 
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.RDS; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS


[Sat May 28 23:35:27 2022]
Job 18: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.RDS
        Run seuratrpca on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-R4.0 Rscript
        hvgs: 
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.RDS; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.RDS


[Sat May 28 23:35:27 2022]
Job 1: 
        Run bbknn on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: 
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 6: 
        Run scgen on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: -c celltype
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad


[Sat May 28 23:35:27 2022]
Job 24: 
        Metrics hvg=full_feature,method=scanvi,o_type=embed,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanvi_embed.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanvi_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad


[Sat May 28 23:35:27 2022]
Job 38: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: scanvi embed
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanvi_embed.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanvi_embed_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanvi_embed_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanvi_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad


[Sat May 28 23:35:27 2022]
Job 17: 
        Convert integrated data from seuratrpca into h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.RDS


[Sat May 28 23:35:27 2022]
Job 21: 
        Metrics hvg=full_feature,method=combat,o_type=full,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/combat_full.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/combat_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad


[Sat May 28 23:35:27 2022]
Job 35: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: combat full
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/combat_full.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/combat_full_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/combat_full_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/combat_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad


[Sat May 28 23:35:27 2022]
Job 20: 
        Metrics hvg=full_feature,method=bbknn,o_type=knn,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/bbknn_knn.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/bbknn_knn.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad


[Sat May 28 23:35:27 2022]
Job 11: 
        Convert integrated data from harmony into h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.RDS


[Sat May 28 23:35:27 2022]
Job 25: 
        Metrics hvg=full_feature,method=scgen,o_type=full,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scgen_full.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scgen_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad


[Sat May 28 23:35:27 2022]
Job 39: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: scgen full
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scgen_full.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scgen_full_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scgen_full_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scgen_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad


[Sat May 28 23:35:27 2022]
Job 15: 
        Convert integrated data from seurat into h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.RDS


[Sat May 28 23:35:27 2022]
Job 22: 
        Metrics hvg=full_feature,method=scanorama,o_type=full,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanorama_full.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanorama_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad


[Sat May 28 23:35:27 2022]
Job 8: 
        Convert integrated data from fastmnn into h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.RDS


[Sat May 28 23:35:27 2022]
Job 36: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: scanorama full
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_full.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_full_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_full_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad


[Sat May 28 23:35:27 2022]
Job 26: 
        Metrics hvg=full_feature,method=scvi,o_type=embed,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scvi_embed.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scvi_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad


[Sat May 28 23:35:27 2022]
Job 40: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: scvi embed
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scvi_embed.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scvi_embed_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scvi_embed_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scvi_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad


[Sat May 28 23:35:27 2022]
Job 23: 
        Metrics hvg=full_feature,method=scanorama,o_type=embed,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanorama_embed.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanorama_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad


[Sat May 28 23:35:27 2022]
Job 37: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: scanorama embed
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_embed.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_embed_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_embed_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad


[Sat May 28 23:35:27 2022]
Job 13: 
        Convert integrated data from liger into h5ad
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.h5ad; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.RDS


[Sat May 28 23:35:27 2022]
Job 34: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: bbknn knn
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/bbknn_knn.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/bbknn_knn_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/bbknn_knn_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/bbknn_knn.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad


[Sat May 28 23:35:27 2022]
Job 41: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: fastmnn full
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_full.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_full_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_full_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad


[Sat May 28 23:35:27 2022]
Job 46: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: seuratrpca full
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seuratrpca_full.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seuratrpca_full_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seuratrpca_full_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seuratrpca_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.h5ad


[Sat May 28 23:35:27 2022]
Job 31: 
        Metrics hvg=full_feature,method=seurat,o_type=full,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/seurat_full.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/seurat_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.h5ad


[Sat May 28 23:35:27 2022]
Job 45: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: seurat full
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seurat_full.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seurat_full_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seurat_full_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seurat_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.h5ad


[Sat May 28 23:35:27 2022]
Job 28: 
        Metrics hvg=full_feature,method=fastmnn,o_type=embed,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/fastmnn_embed.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/fastmnn_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad


[Sat May 28 23:35:27 2022]
Job 42: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: fastmnn embed
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_embed.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_embed_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_embed_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad


[Sat May 28 23:35:27 2022]
Job 30: 
        Metrics hvg=full_feature,method=liger,o_type=embed,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/liger_embed.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/liger_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.h5ad


[Sat May 28 23:35:27 2022]
Job 44: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: liger embed
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/liger_embed.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/liger_embed_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/liger_embed_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/liger_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.h5ad


[Sat May 28 23:35:27 2022]
Job 32: 
        Metrics hvg=full_feature,method=seuratrpca,o_type=full,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/seuratrpca_full.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/seuratrpca_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.h5ad


[Sat May 28 23:35:27 2022]
Job 43: 
        SAVE EMBEDDING
        Scenario: test_data_r4 unscaled full_feature
        Method: harmony embed
        Input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.h5ad
        Output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/harmony_embed.csv /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/harmony_embed_batch.png /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/harmony_embed_labels.png
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/harmony_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.h5ad


[Sat May 28 23:35:27 2022]
Job 27: 
        Metrics hvg=full_feature,method=fastmnn,o_type=full,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/fastmnn_full.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/fastmnn_full.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad


[Sat May 28 23:35:27 2022]
Job 29: 
        Metrics hvg=full_feature,method=harmony,o_type=embed,scaling=unscaled,scenario=test_data_r4
        output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/harmony_embed.csv
        
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/harmony_embed.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.h5ad


[Sat May 28 23:35:27 2022]
Job 33: Completed all embeddings
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/embeddings.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/bbknn_knn.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/harmony_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seuratrpca_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/combat_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scvi_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scgen_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/liger_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/seurat_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanorama_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/fastmnn_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/embeddings/unscaled/full_feature/scanvi_embed.csv


[Sat May 28 23:35:27 2022]
Job 19: Merge all metrics
Reason: Missing output files: /home/data/vip19/tools/snakemake/scib-pipeline/data/metrics.csv; Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/fastmnn_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanorama_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanorama_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/combat_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/fastmnn_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/seuratrpca_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanvi_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/liger_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/seurat_full.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/bbknn_knn.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scvi_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/harmony_embed.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scgen_full.csv


[Sat May 28 23:35:27 2022]
localrule all:
    input: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/metrics.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/embeddings.csv
    jobid: 0
    reason: Input files updated by another job: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seuratrpca.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/harmony.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/metrics.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/embeddings.csv, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/seurat.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/liger.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/R/fastmnn.h5ad, /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanorama.h5ad
    resources: tmpdir=/tmp

Job stats:
job                       count    min threads    max threads
----------------------  -------  -------------  -------------
all                           1              1              1
convert_RDS_h5ad              5              1              1
embeddings                    1              1              1
embeddings_single            13              1              1
integration_prepare           2              1              1
integration_run_python        6              1              1
integration_run_r             5              1              1
metrics                       1              1              1
metrics_single               13              1              1
total                        47              1              1


This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

failing to run the initial example

Hi,
I've been trying to build this package from source.
Because this is part of work on the EasyBuild HPC package manager, I am also taking care of dependencies manually, not using conda.

I'm using tag 0.2.0.

It seems to install fine, but currently, when I try to run the initial example snakemake -n --cores 1 --configfile config.yaml,
I am getting this error:

Building DAG of jobs...
MissingInputException in rule integration_prepare in file /kyukon/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/haswell-ib/software/scib-pipeline/0.2.0-foss-2022a/pipeline/Snakefile, line 32:
Missing input files for rule integration_prepare:
    output: /storage/groups/ml01/workspace/scIB/pancreas/prepare/unscaled/hvg/adata_pre.h5ad
    wildcards: scenario=pancreas, scaling=unscaled, hvg=hvg, prep=h5ad
    affected files:
        /storage/groups/ml01/workspace/maren.buettner/data_integration/data/human_pancreas/human_pancreas_norm.h5ad

Does it ring any bells?
Thank you very much.

Generated test data not readable

I generated test data but get the below error when trying to read it.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    176         try:
--> 177             return func(elem, *args, **kwargs)
    178         except Exception as e:

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_group(group)
    526     if encoding_type:
--> 527         EncodingVersions[encoding_type].check(
    528             group.name, group.attrs["encoding-version"]

~/miniconda3/envs/rpy2_3/lib/python3.8/enum.py in __getitem__(cls, name)
    348     def __getitem__(cls, name):
--> 349         return cls._member_map_[name]
    350 

KeyError: 'dict'

During handling of the above exception, another exception occurred:

AnnDataReadError                          Traceback (most recent call last)
<ipython-input-13-949d8123e8d8> in <module>
      1 import scanpy as sc
----> 2 sc.read("/lustre/groups/ml01/code/karin.hrovatin/scib-pipeline/data/adata_norm.h5ad")

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/scanpy/readwrite.py in read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, **kwargs)
    110     filename = Path(filename)  # allow passing strings
    111     if is_valid_filename(filename):
--> 112         return _read(
    113             filename,
    114             backed=backed,

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/scanpy/readwrite.py in _read(filename, backed, sheet, ext, delimiter, first_column_names, backup_url, cache, cache_compression, suppress_cache_warning, **kwargs)
    711     if ext in {'h5', 'h5ad'}:
    712         if sheet is None:
--> 713             return read_h5ad(filename, backed=backed)
    714         else:
    715             logg.debug(f'reading sheet {sheet} from file {filename}')

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    419                 d[k] = read_dataframe(f[k])
    420             else:  # Base case
--> 421                 d[k] = read_attribute(f[k])
    422 
    423         d["raw"] = _read_raw(f, as_sparse, rdasp)

~/miniconda3/envs/rpy2_3/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, *args, **kwargs)
    181             else:
    182                 parent = _get_parent(elem)
--> 183                 raise AnnDataReadError(
    184                     f"Above error raised while reading key {elem.name!r} of "
    185                     f"type {type(elem)} from {parent}."

AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.
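This KeyError: 'dict' usually indicates a version mismatch: the h5ad was written by a newer anndata (whose 'dict' encoding is unknown to older readers) than the one reading it. A minimal check, assuming the reading environment is rpy2_3:

```python
# Hedged check: compare anndata versions between the environment that wrote
# the file (the scib-pipeline env) and the one reading it here (rpy2_3).
# Upgrading the reader, or re-writing the file from the environment that
# generated it, typically resolves this kind of encoding error.
import anndata
print(anndata.__version__)
```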

running integration metric through scib-pipeline

Hello!

I have integrated two tissues via the Seurat integration pipeline. I would like to use scib-pipeline to evaluate the quality of this integration.

I have converted Seurat object to anndata and tried to run the pipeline via this script:

snakemake --configfile configs/test_data-R4.0.yaml --cores 1

However, the pipeline wants to preprocess and integrate the data again. I was wondering whether I have misinterpreted some steps of the pipeline, or whether it is not possible to output metrics for data that was already integrated with Seurat.

Thank you for your help!
Olha

Trouble running the scib-pipeline

Hello. Thank you for the great tools! I am trying to run the pipeline on the test dataset on a Slurm cluster. However, it errors out with the following:

(scib-pipeline-R4.0) [zhonh0b@login509-02-r scib-pipeline]$ snakemake --configfile configs/test_data-R4.0.yaml -c 1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                       count    min threads    max threads
----------------------  -------  -------------  -------------
all                           1              1              1
convert_RDS_h5ad              5              1              1
embeddings                    1              1              1
embeddings_single            13              1              1
integration_run_python        4              1              1
integration_run_r             5              1              1
metrics                       1              1              1
metrics_single               13              1              1
total                        43              1              1

Select jobs to execute...

[Tue Jun 21 19:37:25 2022]
Job 1: 
        Run bbknn on unscaled data
        feature selection: full_feature
        dataset: test_data_r4
        command: conda run --no-capture-output -n scib-pipeline-R4.0 python
        hvgs: 
        cell type option: 
        output: /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad
        
Reason: Missing output files: /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad

usage: conda [-h] [-V] command ...
conda: error: unrecognized arguments: --no-capture-output
[Tue Jun 21 19:37:26 2022]
Error in rule integration_run_python:
    jobid: 1
    output: /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad
    shell:
        
        conda run --no-capture-output -n scib-pipeline-R4.0 python scripts/integration/runIntegration.py           -i /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/bbknn.h5ad 	      -b batch --method bbknn   	      
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-06-21T193722.619820.snakemake.log

It seems like the system cannot recognize the conda run command. But when I run the same conda run command manually, the process completes without any errors.

(scib-pipeline-R4.0) [zhonh0b@login509-02-r scib-pipeline]$  conda run --no-capture-output -n scib-pipeline-R4.0 python scripts/integration/runIntegration.py           -i /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o /ibex/scratch/projects/c2101/Acropora_sc_analysis/multi_species/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad       -b batch --method combat   
During startup - Warning message:
Setting LC_CTYPE failed, using "C" 

Could you please help me fix the conda run problem in snakemake? Thank you for your help in advance!
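One plausible cause, assuming nothing beyond the logs above: to my knowledge the --no-capture-output flag was only added in conda 4.9, so snakemake may be picking up an older conda binary than the one the interactive shell uses. A quick check of which conda is actually invoked:

import shutil
import subprocess

# Path of the conda executable that subprocesses (such as snakemake's
# shell commands) will resolve, and its version.
print(shutil.which("conda"))
print(subprocess.run(["conda", "--version"], capture_output=True, text=True).stdout)

If the reported version is older than 4.9, updating the base conda (or adjusting PATH so the newer conda comes first) should make the flag available.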

Confusion about adata and adata_int

Hello! I am following scib's tutorial: https://scib.readthedocs.io/en/latest/api.html
But there are some points I am not sure about:

scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)

Does the adata here mean the object before integration, and adata_int the object after integration? For the object before integration, how can I generate it from two files? For example, I have two raw datasets, A and B. Should I merge them into one object before integration and then preprocess it? Or should I preprocess them separately and then merge them?
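For what it's worth, a minimal sketch of one plausible workflow, assuming two raw batches A.h5ad and B.h5ad with the same genes (the file names and keys are illustrative, not from the scib docs):

import scanpy as sc
import scib

adata_a = sc.read_h5ad("A.h5ad")
adata_b = sc.read_h5ad("B.h5ad")

# Merge first so that normalisation sees all cells; concatenate() records
# the source object of each cell in obs["batch"].
adata = adata_a.concatenate(adata_b, batch_key="batch")
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# adata is then the unintegrated object; running an integration method on
# it yields adata_int, and the two are compared on the same cells:
# scib.metrics.metrics(adata, adata_int, batch_key="batch",
#                      label_key="celltype", ari=True, nmi=True)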

Solving env with conda causes memory usage explosion

Hi!

I tried setting up scib-pipeline with conda, and solving the environment causes an OOM on my 32 GB RAM machine.


Is this something that you have experienced?
I am currently trying to run this on a machine with 125 GB RAM, but it is taking a really long time (more than 1 hour as of now).

Error using sparse matrix

Now I notice I am getting this error:

Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/metrics.py", line 362, in metrics
    organism=organism,
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/cell_cycle.py", line 83, in cell_cycle
    verbose=verbose,
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/cell_cycle.py", line 175, in get_pcr_before_after
    raw_sub.X, covariate, pca_var=None, n_comps=n_comps, verbose=verbose
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/pcr.py", line 167, in pc_regression
    copy=True,
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scanpy/preprocessing/_pca.py", line 196, in pca
    'svd_solver: {svd_solver} can not be used with sparse input.\n'
ValueError: svd_solver: {svd_solver} can not be used with sparse input.
Use "arpack" (the default) or "lobpcg" instead.

ERROR conda.cli.main_run:execute(49): `conda run python scripts/metrics/metrics.py -u data/gbmap_subset.h5ad -i /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/integration/scaled/hvg/combat.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/metrics/scaled/hvg/combat_full.csv -m combat -b batch -l celltype --type full --hvgs 2000 --organism human --assay expression -v` failed. (See above for error)
[Tue Nov  8 15:09:00 2022]
Error in rule metrics_single:
    jobid: 72
    output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/metrics/scaled/hvg/combat_full.csv
    shell:

        conda run -n scib-pipeline-R4.0 python scripts/metrics/metrics.py -u data/gbmap_subset.h5ad -i /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/integration/scaled/hvg/combat.h5ad          -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/metrics/scaled/hvg/combat_full.csv -m combat          -b batch -l celltype --type full          --hvgs 2000 --organism human --assay expression -v

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I noticed the test file contains a dense array in adata.X instead of a sparse matrix. I guess I also need to store my matrix as an array to avoid this error? (adata.X = adata.X.toarray())

Originally posted by @ccruizm in #49 (comment)
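A sketch of that workaround, assuming the densified matrices fit in memory (the input path is taken from the log above):

import scanpy as sc
import scipy.sparse as sp

adata = sc.read_h5ad("data/gbmap_subset.h5ad")

# Densify X and the counts layer if they are stored as sparse matrices.
if sp.issparse(adata.X):
    adata.X = adata.X.toarray()
if "counts" in adata.layers and sp.issparse(adata.layers["counts"]):
    adata.layers["counts"] = adata.layers["counts"].toarray()

adata.write("data/gbmap_subset.h5ad")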

scvi and other issues?

Hi all, thank you for your wonderful work; I am really keen to apply it to my own analysis. I did a fresh install of the pipeline today and successfully ran it with the test data and the small test config, but unfortunately I am running into a few issues with my own data.

The first error concerns the scvi module not being installed, and I am not entirely sure how to install it, as I get dependency issues with both pip and mamba. I see that you are already updating the code to use scvi-tools rather than scvi. Will this change fix the issue?

The second error is this:

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 51, in runPP
    scib.preprocessing.saveSeurat(adata, outPath, batch, hvgs)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/_package_tools.py", line 20, in wrapper
    return func(*args, **kwargs)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 732, in save_seurat
    ro.globalenv["adata"] = adata
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/environments.py", line 34, in __setitem__
    robj = conversion.converter.py2rpy(value)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/py2r.py", line 60, in py2rpy_anndata
    row_args = {k: pandas2ri.py2rpy(v) for k, v in obj.var.items()}
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/py2r.py", line 60, in <dictcomp>
    row_args = {k: pandas2ri.py2rpy(v) for k, v in obj.var.items()}
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py", line 191, in py2rpy_pandasseries
    res = func(obj)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/numpy2ri.py", line 93, in numpy2rpy
    res = unsignednumpyint_to_rint(o)
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/numpy2ri.py", line 68, in unsignednumpyint_to_rint
    if intarray.itemsize >= (RINT_SIZE / 8):
  File "/home/a/akurjan/obds_conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/generic.py", line 5487, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'itemsize'

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i [...my long redacted path...]/quads_labelled_scbi.h5ad -o [...my long redacted path...]/devquads/prepare/unscaled/full_feature/adata_pre.RDS -b sample --hvgs 0 -r -l` failed. (See above for error)

Any ideas what could be causing this? Thank you again!
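On the second error, a hedged workaround sketch: the traceback ends in rpy2's unsignednumpyint_to_rint, so unsigned-integer columns in .obs or .var are a plausible trigger, and casting them to signed int64 before handing the object to the pipeline may avoid that code path. The file name below is the basename from the redacted log path:

import numpy as np
import pandas as pd
import scanpy as sc

adata = sc.read_h5ad("quads_labelled_scbi.h5ad")

def cast_unsigned_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Convert uint8/uint16/... columns to int64, which rpy2 handles.
    for col in df.columns:
        if pd.api.types.is_unsigned_integer_dtype(df[col]):
            df[col] = df[col].astype(np.int64)
    return df

adata.obs = cast_unsigned_columns(adata.obs)
adata.var = cast_unsigned_columns(adata.var)
adata.write("quads_labelled_scbi.h5ad")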

Pipeline error with test data related to metrics.py

Hello,
I'm having trouble running the pipeline with the test data; it seems to get stuck at the rule metrics_single, which executes
conda run -n scIB-python python scripts/metrics/metrics.py -u data/adata_norm.h5ad -i /data/al862/scib_pipeline/data/test_data/integration/unscaled/full_feature/scanorama.h5ad -o /data/al862/scib_pipeline/data/test_data/metrics/unscaled/full_feature/scanorama_full.csv -m scanorama -b batch -l celltype --type full --hvgs 0 --organism mouse --assay expression -v

I've attached the progress/error messages below, any help with this would be much appreciated!
Best wishes,
Anika

Options
    type:       full
    batch_key:  batch
    label_key:  celltype
    assay:      expression
    organism:   mouse
    n_hvgs:     None
    setup:      scanorama_full
    optimised clustering results:       /data/al862/scib_pipeline/data/test_data/metrics/unscaled/full_feature/scanorama_full_nmi.txt
reading adata before integration
AnnData object with n_obs × n_vars = 2730 × 3451
    obs: 'paul15_clusters', 'celltype', 'batch', 'n_counts'
    var: 'n_counts'
    uns: 'iroot'
    layers: 'counts'
reading adata after integration
AnnData object with n_obs × n_vars = 2730 × 3451
    obs: 'paul15_clusters', 'celltype', 'batch', 'n_counts'
    var: 'n_counts'
    obsm: 'X_emb', 'X_scanorama'
reduce integrated data:
    HVG selection:      None
    compute neighbourhood graph:        True on X_pca
    precompute PCA:     True
PCA
Nearest Neigbours
computing metrics
type:   full
    ASW:        True
    NMI:        True
    ARI:        True
    PCR:        True
    cell cycle: True
    iso lab F1: True
    iso lab ASW:        True
    HVGs:       True
    kBET:       True
    LISI:       True
    Trajectory: False
clustering...
optimised clustering against celltype
optimal cluster resolution: 1.5
optimal score: 0.47700604961660414
saved clustering NMI values to /data/al862/scib_pipeline/data/test_data/metrics/unscaled/full_feature/scanorama_full_nmi.txt
NMI...
ARI...
silhouette score...
PC regression...
covariate: batch
compute PCA n_comps: 50
covariate: batch
compute PCA n_comps: 50
cell cycle effect...

/home/al862/anaconda3/envs/scIB-python/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py:14: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import Index as PandasIndex
Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 248, in <module>
    trajectory_=trajectory_
  File "/home/al862/anaconda3/envs/scIB-python/lib/python3.7/site-packages/scIB/metrics.py", line 1865, in metrics
    agg_func=np.mean, organism=organism)
  File "/home/al862/anaconda3/envs/scIB-python/lib/python3.7/site-packages/scIB/metrics.py", line 582, in cell_cycle
    raise ValueError(message)
ValueError: batch "3" of batch_key "batch" has unequal number of entries before and after integration.before: 564 after: 550
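A quick diagnostic sketch, assuming both files from the command above are at hand: compare the per-batch cell counts before and after integration to locate the mismatch (564 vs. 550 in batch "3"), and align the two objects on their shared cell names if cells were dropped or relabelled. The integrated file name here is illustrative:

import scanpy as sc

adata = sc.read_h5ad("data/adata_norm.h5ad")   # before integration
adata_int = sc.read_h5ad("scanorama.h5ad")     # after integration

print(adata.obs["batch"].value_counts().sort_index())
print(adata_int.obs["batch"].value_counts().sort_index())

# Subset both objects to the cells they have in common before re-running
# the metrics, so that the per-batch counts agree.
common = adata.obs_names.intersection(adata_int.obs_names)
adata, adata_int = adata[common].copy(), adata_int[common].copy()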

Issue with the pipeline

I am currently all set up according to the instructions. Should I run snakemake in the base environment or in scib-python?
However, in either environment, when I run the pipeline on either my own data or the test data, a syntax error occurs:

SyntaxError:
Input and output files have to be specified as strings or lists of strings.
xxxxxx/xxxxx/xxxxxx/scib-pipeline/Snakefile, line 10, in module
xxxxxx/xxxxx/xxxxxx/scib-pipeline/scripts/preprocessing/Snakefile, line 23, in module

Error in rule metrics_single.

Hi!
Is this pipeline still being maintained?
I encountered an error while running the sample data.
The specific error message is:

Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/metrics.py", line 414, in metrics
    verbose=verbose,
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/kbet.py", line 150, in kBET
    k0=k0,
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/kbet.py", line 215, in kBET_single
    import anndata2ri
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/__init__.py", line 20, in <module>
    from . import _py2r, _r2py  # noqa: F401
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/_py2r.py", line 14, in <module>
    from ._conv import converter, full_converter, mat_py2rpy
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/_conv.py", line 10, in <module>
    from . import scipy2ri
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/scipy2ri/__init__.py", line 27, in <module>
    from . import _py2r, _r2py  # noqa: F401
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/scipy2ri/_py2r.py", line 18, in <module>
    from anndata2ri._rpy2_ext import importr
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata2ri/_rpy2_ext.py", line 8, in <module>
    @lru_cache
  File "/home/u2204084007/.conda/envs/scib-pipeline-R4.0/lib/python3.7/functools.py", line 490, in lru_cache
    raise TypeError('Expected maxsize to be an integer or None')
TypeError: Expected maxsize to be an integer or None

ERROR conda.cli.main_run:execute(47): `conda run python scripts/metrics/metrics.py -u data/adata_norm.h5ad -i /home/u2204084007/biosoft/scib-pipeline-main/data/scib-R4.0_small/test_data_r4_small/integration/unscaled/full_feature/bbknn.h5ad -o /home/u2204084007/biosoft/scib-pipeline-main/data/scib-R4.0_small/test_data_r4_small/metrics/unscaled/full_feature/bbknn_knn.csv -m bbknn -b batch -l celltype --type knn --hvgs 0 --organism mouse --assay expression -v` failed. (See above for error)

Error in rule metrics_single:
    jobid: 8
    input: data/adata_norm.h5ad, /home/u2204084007/biosoft/scib-pipeline-main/data/scib-R4.0_small/test_data_r4_small/integration/unscaled/full_feature/bbknn.h5ad
    output: /home/u2204084007/biosoft/scib-pipeline-main/data/scib-R4.0_small/test_data_r4_small/metrics/unscaled/full_feature/bbknn_knn.csv
    shell:
        
        conda run -n scib-pipeline-R4.0 python scripts/metrics/metrics.py -u data/adata_norm.h5ad -i /home/u2204084007/biosoft/scib-pipeline-main/data/scib-R4.0_small/test_data_r4_small/integration/unscaled/full_feature/bbknn.h5ad          -o /home/u2204084007/biosoft/scib-pipeline-main/data/scib-R4.0_small/test_data_r4_small/metrics/unscaled/full_feature/bbknn_knn.csv -m bbknn          -b batch -l celltype --type knn          --hvgs 0 --organism mouse --assay expression -v
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!) 

Attached are my error log and configuration file
2023-08-04T115620.309217.snakemake.log
test_data-R4.0_small.txt
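For context, the root cause here is a Python version mismatch rather than a pipeline bug: on Python 3.7, functools.lru_cache must be called with parentheses, and the bare decorator form that anndata2ri uses in the traceback only works from Python 3.8 onwards. A minimal illustration:

from functools import lru_cache

@lru_cache()   # valid on Python 3.7 and later
def double(x):
    return x * 2

@lru_cache     # raises TypeError on Python 3.7, as in the traceback above
def triple(x):
    return x * 3

Pinning anndata2ri to an older release that still supports Python 3.7 is therefore a plausible workaround.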

Error in `[[<-`(`*tmp*`, assay, value = assay.data) : [[<- defined for objects of type "S4" only for subclasses of environment

Hello again!

Getting the following issue running R methods:

Error in `[[<-`(`*tmp*`, assay, value = assay.data) : 
  [[<- defined for objects of type "S4" only for subclasses of environment
Calls: runHarm -> ScaleData -> ScaleData.Seurat
Execution halted

Example errors:

Error in rule integration_run_r:
    jobid: 44
    output: [path...]/devcombined/integration/scaled/full_feature/R/harmony.RDS
    shell:
        conda run --no-capture-output -n scib-R4.0 Rscript scripts/integration/runMethods.R -i [path...]/devcombined/prepare/scaled/full_feature/adata_pre.RDS -o [path...]/devcombined/integration/scaled/full_feature/R/harmony.RDS -b libbatch --method harmony 

Error in rule integration_run_r:
    jobid: 50
    output: [path...]/devcombined/integration/unscaled/hvg/R/seurat.RDS
    shell:
        conda run --no-capture-output -n scib-R4.0 Rscript scripts/integration/runMethods.R -i [path...]/devcombined/prepare/unscaled$ method seurat -v "[path...]/devcombined/prepare/unscaled/hvg/adata_pre_hvg.RDS"

Any thoughts on what could be causing it?

R 4 support?

Are there plans to get this working with R 4+ anytime soon? Also, would it be possible to have snakemake generate the conda environments (e.g., as bollito does)?

Issue with running snakemake

I am currently all set up according to the instructions. Should we run snakemake in the base environment?
If so, when I run the pipeline on either my own data or the test data, a syntax error occurs:

SyntaxError:
Input and output files have to be specified as strings or lists of strings.
xxxxxx/xxxxx/xxxxxx/scib-pipeline/Snakefile, line 10, in module
xxxxxx/xxxxx/xxxxxx/scib-pipeline/scripts/preprocessing/Snakefile, line 23, in module

Buffer has wrong number of dimensions (expected 1, got 2)

Hello,

I uncommented

FEATURE_SELECTION:
  hvg: 2000

and

SCALING:
  - scaled

And I get this new error. Any idea of the reason?

Thanks.

Traceback (most recent call last):
  File "scripts/integration/runIntegration.py", line 81, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "scripts/integration/runIntegration.py", line 36, in runIntegration
    integrated = method(adata, batch)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/integration.py", line 33, in scanorama
    *corrected, batch_key=batch, batch_categories=categories, index_unique=None
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata/_core/anndata.py", line 1801, in concatenate
    partial(merge_outer, batch_keys=batch_categories, merge=merge_same),
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata/_core/merge.py", line 557, in merge_dataframes
    new_df = pd.DataFrame(merge_strategy(dfs), index=new_index)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/frame.py", line 614, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 465, in dict_to_mgr
    arrays, data_names, index, columns, dtype=dtype, typ=typ, consolidate=copy
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 124, in arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 590, in _homogenize
    val, index, dtype=dtype, copy=False, raise_cast_failure=False
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/construction.py", line 571, in sanitize_array
    subarr = maybe_convert_platform(data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 126, in maybe_convert_platform
    arr = lib.maybe_convert_objects(arr)
  File "pandas/_libs/lib.pyx", line 2385, in pandas._libs.lib.maybe_convert_objects
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

[Wed May  4 00:38:52 2022]
Error in rule integration_run_python:
    jobid: 16
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_pbmc/run_pbmc/integration/scaled/full_feature/scanorama.h5ad
    shell:

        conda run -n scib-pipeline-R4 python scripts/integration/runIntegration.py           -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_pbmc/run_pbmc/prepare/scaled/full_feature/adata_pre.h5ad -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_pbmc/run_pbmc/integration/scaled/full_feature/scanorama.h5ad              -b batch --method scanorama            

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Submitting SLURM jobs

[...] Could you advise me on how to submit a SLURM job with the pipeline, please? I have tried, but it does not find the libraries installed in the conda envs.

Originally posted by @ccruizm in #49 (comment)

Dependency errors with running scib-pipeline.

Hi!

Thank you so much for publishing the pipeline but I am having some errors getting the pipeline working properly.

Starting with the igraph version error, then adding louvain and cairocffi as dependencies, then the "cannot import name 'Protocol' from 'typing'" error, and now trying to find the right versions of optax, scvi, and scgen.

Here are some of the errors I have gotten:

The conflict is caused by:
    The user requested igraph<0.10
    scib 1.1.3 depends on igraph>=0.10

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ImportError: cannot import name 'Protocol' from 'typing'
ModuleNotFoundError: No module named 'louvain'
ImportError: cairo backend requires that pycairo>=1.11.0 or cairocffi is installed
Traceback (most recent call last):
  File "scripts/integration/runIntegration.py", line 67, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "scripts/integration/runIntegration.py", line 24, in runIntegration
    integrated = method(adata, batch)
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/integration.py", line 228, in scvi
    from scvi.model import SCVI
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scvi/__init__.py", line 10, in <module>
    from . import data, model, external, utils
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scvi/model/__init__.py", line 6, in <module>
    from ._jaxscvi import JaxSCVI
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scvi/model/_jaxscvi.py", line 7, in <module>
    import optax
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/optax/__init__.py", line 17, in <module>
    from optax import experimental
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/optax/experimental/__init__.py", line 20, in <module>
    from optax._src.experimental.complex_valued import split_real_and_imaginary
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/optax/_src/experimental/complex_valued.py", line 36, in <module>
    from optax._src import base
  File "/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/optax/_src/base.py", line 17, in <module>
    from typing import Any, Callable, NamedTuple, Optional, Protocol, Sequence, Tuple
ImportError: cannot import name 'Protocol' from 'typing' (/home/ubuntu/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/typing.py)
[Mon Jul 24 09:53:27 2023]
Error in rule integration_run_python:
    jobid: 5
    input: /repos/batch-effect-correction-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad
    output: /repos/batch-effect-correction-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad
    shell:
        
        conda run --no-capture-output -n scib-pipeline-R4.0 python scripts/integration/runIntegration.py           -i /repos/batch-effect-correction-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o /repos/batch-effect-correction-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad              -b batch --method scanvi  -c celltype
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Jul 24 09:53:28 2023]
Error in rule integration_run_python:
    jobid: 7
    input: /repos/batch-effect-correction-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad
    output: /repos/batch-effect-correction-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad
    shell:
        
        conda run --no-capture-output -n scib-pipeline-R4.0 python scripts/integration/runIntegration.py           -i /repos/batch-effect-correction-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o /repos/batch-effect-correction-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad        -b batch --method scvi  
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Jul 24 09:53:28 2023]
Error in rule integration_run_python:
    jobid: 6
    input: /repos/batch-effect-correction-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad
    output: /repos/batch-effect-correction-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad
    shell:
        
        conda run --no-capture-output -n scib-pipeline-R4.0 python scripts/integration/runIntegration.py           -i /repos/batch-effect-correction-pipeline/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o /repos/batch-effect-correction-pipeline/data/test_data_r4/integration/unscaled/full_feature/scgen.h5ad               -b batch --method scgen  -c celltype
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

There have been many roadblocks in getting the pipeline working with the information posted here. Is there a way to get an updated version of the pipeline, or an updated scib-pipeline-R4.0.yml file?

I really appreciate any help you can provide.

One tool = one environment?

Hi @LuckyMD and @danielStrobl,

I am interested in setting up one environment per R package (e.g. scIB-R-integration-liger, scIB-R-integration-fastmnn, etc.). This is to avoid multi-package conflicts with some tools, which I think could be solved by isolating each tool and handling it manually.

I think this is a reasonable way to avoid multi-tool conflicts within the same environment, and as far as I remember @danielStrobl also agreed with me last week (perhaps not anymore). @LuckyMD, may I start a branch for this? If you see caveats or would prefer more thought/discussion on this, let me know.

Thank you!

KeyError: 'connectivities'

Hi,

After installing both environments I've been trying to run the pipeline with the test data as suggested. However, one of the metrics rules breaks.

Error in rule metrics_single:
    jobid: 19
    output: /scib-pipeline/test_data/metrics/unscaled/full_feature/bbknn_knn.csv
    shell:
        
        conda run -n scIB-python python scripts/metrics/metrics.py -u data/adata_norm.h5ad -i /scib-pipeline/test_data/integration/unscaled/full_feature/bbknn.h5ad          -o /scib-pipeline/test_data/metrics/unscaled/full_feature/bbknn_knn.csv -m bbknn          -b batch -l celltype --type knn          --hvgs 0 --organism mouse --assay expression -v
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

This is the error inside that rule:

Job 19: 
        Metrics hvg=full_feature,method=bbknn,o_type=knn,scaling=unscaled,scenario=test_data
        output: /scib-pipeline/test_data/metrics/unscaled/full_feature/bbknn_knn.csv
        

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/metrics/metrics.py', '-u', 'data/adata_norm.h5ad', '-i', '/scib-pipeline/test_data/integration/unscaled/full_feature/bbknn.h5ad', '-o', '/scib-pipeline/test_data/metrics/unscaled/full_feature/bbknn_knn.csv', '-m', 'bbknn', '-b', 'batch', '-l', 'celltype', '--type', 'knn', '--hvgs', '0', '--organism', 'mouse', '--assay', 'expression', '-v']' command failed.  (See above for error)
Options
    type:       knn
    batch_key:  batch
    label_key:  celltype
    assay:      expression
    organism:   mouse
    n_hvgs:     None
    setup:      bbknn_knn
    optimised clustering results:       /scib-pipeline/test_data/metrics/unscaled/full_feature/bbknn_knn_nmi.txt
reading adata before integration
AnnData object with n_obs × n_vars = 2730 × 3451 
    obs: 'paul15_clusters', 'celltype', 'batch', 'n_counts'
    var: 'n_counts'
    uns: 'iroot'
    layers: 'counts'
reading adata after integration
AnnData object with n_obs × n_vars = 2730 × 3451 
    obs: 'paul15_clusters', 'celltype', 'batch', 'n_counts'
    var: 'n_counts'
    uns: 'iroot', 'neighbors', 'pca'
    obsm: 'X_pca'
    varm: 'PCs'
    layers: 'counts'
reduce integrated data:
    HVG selection:      None
    compute neighbourhood graph:        False
    precompute PCA:     False
computing metrics
type:   knn
    ASW:        False
    NMI:        True
    ARI:        True
    PCR:        False
    cell cycle: False
    iso lab F1: True
    iso lab ASW:        False
    HVGs:       False
    kBET:       True
    LISI:       True
    Trajectory: False
clustering...
optimised clustering against celltype
optimal cluster resolution: 1.4
optimal score: 0.4182767661269383
saved clustering NMI values to /scib-pipeline/test_data/metrics/unscaled/full_feature/bbknn_knn_nmi.txt
NMI...
ARI...
isolated labels...
Graph connectivity...

conda_envs/scIB-python/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 248, in <module>
    trajectory_=trajectory_
  File "conda_envs/scIB-python/lib/python3.7/site-packages/scIB/metrics.py", line 1926, in metrics
    graph_conn_score = graph_connectivity(adata_int, label_key=label_key)
  File "conda_envs/scIB-python/lib/python3.7/site-packages/scIB/metrics.py", line 1795, in graph_connectivity
    _,labs = connected_components(adata_post_sub.obsp['connectivities'], connection='strong')
  File "conda_envs/scIB-python/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py", line 112, in __getitem__
    _subset(self.parent_mapping[key], self.subset_idx),
  File "conda_envs/scIB-python/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py", line 147, in __getitem__
    return self._data[key]
KeyError: 'connectivities'

Any idea how to troubleshoot this?

Setting environment variables

Hi,
I'd like to start by saying I am relatively new to bioinformatics and might just be missing something obvious; thank you for your patience.

  1. I'd like to run this pipeline but I can't seem to install the additional R packages needed. I've set the environment variables manually as described in the README:
conda activate scib-pipeline
echo $CONDA_PREFIX  # referred to as <conda_prefix>
conda deactivate

cp envs/env_vars_activate.sh <conda_prefix>/etc/conda/activate.d/env_vars.sh
cp envs/env_vars_deactivate.sh <conda_prefix>/etc/conda/deactivate.d/env_vars.sh

But when I try to install packages I get the following message:

conda activate scib-R4
cd envs/
Rscript install_R_methods.R r4_dependencies.tsv
R dependencies from r4_dependencies.tsv
Dependencies:
                    package version    how
1:      kharchenkolab/conos   1.3.0 github
2: kharchenkolab/conosPanel    <NA> github
3:                  harmony    <NA>   cran
install kharchenkolab/conos
install kharchenkolab/[email protected] from Github
Error: .onLoad failed in loadNamespace() for 'pkgload', details:
  call: NULL
  error: package ‘rprojroot’ was installed before R 4.0.0: please re-install it
Execution halted
  2. I've checked in both the python and R environments, and .libPaths() doesn't seem to point to the conda environment's R library. Is this related to my problem?

Thank you for your help!

Pipeline cannot find data in adata.obs

Good day,

I have been trying to run the pipeline with my own data but have failed so far. I used the test file, and the pipeline works and runs with no issues. However, once I input my data I get the error below:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 24
Rules claiming more threads will be scaled down.
Job stats:
job                       count    min threads    max threads
----------------------  -------  -------------  -------------
all                           1              1              1
convert_RDS_h5ad             18              1              1
embeddings                    1              1              1
embeddings_single            46              1              1
integration_prepare           8              1              1
integration_run_python       20              1              1
integration_run_r            18              1              1
metrics                       1              1              1
metrics_single               46              1              1
total                       159              1              1

Select jobs to execute...

[Tue Nov  8 11:03:58 2022]
Job 2:
        Preparing adata
        wildcards: hvg=hvg,prep=h5ad,scaling=unscaled,scenario=gb_test
        parameters: batch 2000    conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.h5ad

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.h5ad


[Tue Nov  8 11:04:00 2022]
Job 36:
        Preparing adata
        wildcards: hvg=full_feature,prep=RDS,scaling=scaled,scenario=gb_test
        parameters: batch 0 -s -r -l conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.RDS

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.RDS


[Tue Nov  8 11:04:00 2022]
Job 4:
        Preparing adata
        wildcards: hvg=full_feature,prep=h5ad,scaling=unscaled,scenario=gb_test
        parameters: batch 0    conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/full_feature/adata_pre.h5ad

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/full_feature/adata_pre.h5ad


[Tue Nov  8 11:04:00 2022]
Job 6:
        Preparing adata
        wildcards: hvg=hvg,prep=h5ad,scaling=scaled,scenario=gb_test
        parameters: batch 2000 -s   conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/hvg/adata_pre.h5ad

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/hvg/adata_pre.h5ad


[Tue Nov  8 11:04:00 2022]
Job 33:
        Preparing adata
        wildcards: hvg=hvg,prep=RDS,scaling=scaled,scenario=gb_test
        parameters: batch 2000 -s -r -l conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/hvg/adata_pre.RDS

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/hvg/adata_pre.RDS


[Tue Nov  8 11:04:00 2022]
Job 8:
        Preparing adata
        wildcards: hvg=full_feature,prep=h5ad,scaling=scaled,scenario=gb_test
        parameters: batch 0 -s   conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.h5ad

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.h5ad


[Tue Nov  8 11:04:00 2022]
Job 30:
        Preparing adata
        wildcards: hvg=full_feature,prep=RDS,scaling=unscaled,scenario=gb_test
        parameters: batch 0  -r -l conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/full_feature/adata_pre.RDS

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/full_feature/adata_pre.RDS


[Tue Nov  8 11:04:00 2022]
Job 27:
        Preparing adata
        wildcards: hvg=hvg,prep=RDS,scaling=unscaled,scenario=gb_test
        parameters: batch 2000  -r -l conda run -n scib-pipeline-R4.0
        output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.RDS

Reason: Missing output files: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.RDS

Scaling data ...

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 47, in runPP
    adata = scib.preprocessing.scale_batch(adata, batch)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 368, in scale_batch
    utils.check_batch(batch, adata.obs)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/utils.py", line 12, in check_batch
    raise ValueError(f"column {batch} is not in obs")
ValueError: column batch is not in obs

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.RDS -b batch --hvgs 0 -s -r -l` failed. (See above for error)
Computing HVGs ...

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 43, in runPP
    adataOut=True
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 497, in hvg_batch
    utils.check_batch(batch_key, adata.obs)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/utils.py", line 12, in check_batch
    raise ValueError(f"column {batch} is not in obs")
ValueError: column batch is not in obs

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.h5ad -b batch --hvgs 2000` failed. (See above for error)
[Tue Nov  8 11:05:21 2022]
[Tue Nov  8 11:05:21 2022]
Error in rule integration_prepare:
    jobid: 2
    output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.h5ad
    shell:

        conda run -n scib-pipeline-R4.0 python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad         -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.h5ad -b batch         --hvgs 2000

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Error in rule integration_prepare:
    jobid: 36
    output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.RDS
    shell:

        conda run -n scib-pipeline-R4.0 python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad         -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.RDS -b batch         --hvgs 0 -s -r -l

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Computing HVGs ...

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 36, in runPP
    adataOut=False
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 497, in hvg_batch
    utils.check_batch(batch_key, adata.obs)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/utils.py", line 12, in check_batch
    raise ValueError(f"column {batch} is not in obs")
ValueError: column batch is not in obs

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.RDS -b batch --hvgs 2000 -r -l` failed. (See above for error)
Computing HVGs ...

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 36, in runPP
    adataOut=False
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 497, in hvg_batch
    utils.check_batch(batch_key, adata.obs)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/utils.py", line 12, in check_batch
    raise ValueError(f"column {batch} is not in obs")
ValueError: column batch is not in obs

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/hvg/adata_pre.RDS -b batch --hvgs 2000 -s -r -l` failed. (See above for error)
Scaling data ...

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 47, in runPP
    adata = scib.preprocessing.scale_batch(adata, batch)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 368, in scale_batch
    utils.check_batch(batch, adata.obs)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/utils.py", line 12, in check_batch
    raise ValueError(f"column {batch} is not in obs")
ValueError: column batch is not in obs

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/full_feature/adata_pre.h5ad -b batch --hvgs 0 -s` failed. (See above for error)
Computing HVGs ...

Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 80, in <module>
    runPP(file, out, hvg, batch, rout, scale, seurat)
  File "scripts/preprocessing/runPP.py", line 43, in runPP
    adataOut=True
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/preprocessing.py", line 497, in hvg_batch
    utils.check_batch(batch_key, adata.obs)
  File "/home/cruiz2/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/utils.py", line 12, in check_batch
    raise ValueError(f"column {batch} is not in obs")
ValueError: column batch is not in obs

ERROR conda.cli.main_run:execute(49): `conda run python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/scaled/hvg/adata_pre.h5ad -b batch --hvgs 2000 -s` failed. (See above for error)
[Tue Nov  8 11:05:21 2022]
[Tue Nov  8 11:05:21 2022]
Error in rule integration_prepare:
    jobid: 27
    output: /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.RDS
    shell:

        conda run -n scib-pipeline-R4.0 python scripts/preprocessing/runPP.py -i data/gbmap_subset.h5ad         -o /gpfs/work2/0/einf2548/cruiz/gbm/scib-pipeline/data/gb_test/prepare/unscaled/hvg/adata_pre.RDS -b batch         --hvgs 2000  -r -l

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I really do not understand why it does not find the batch key when it is in the object. My AnnData object is a downsampled version for testing, which I created using sc.pp.subsample(adata, fraction=0.01, copy=True). I am sharing the object here in case you want to take a look, and perhaps we can find the problem (https://we.tl/t-Ohdw4qk7LL).
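A first diagnostic sketch for the batch-key error, assuming the file from the failing command: list the obs columns to confirm how the batch annotation is actually named after subsampling.

import scanpy as sc

adata = sc.read_h5ad("data/gbmap_subset.h5ad")
print(adata.obs.columns.tolist())

# If the annotation is stored under a different name ("Batch", "sample",
# ... -- hypothetical examples), either rename the column or point the
# config's batch key at the existing name:
# adata.obs["batch"] = adata.obs["sample"]
# adata.write("data/gbmap_subset.h5ad")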

Btw, I ran this in interactive mode. I have not been able to make it work via SLURM. It is weird because it seems to activate the conda env where snakemake is installed, but later it does not find scanpy. I am attaching the script and the output I got. I would greatly appreciate any suggestion on how I could submit the job!

Thanks in advance for your help! Please let me know whether any additional information is needed.
Scripts.zip

Error Running Pipeline

TLDR: Error running pipeline, ModuleNotFoundError: No module named 'scIB'

I tried running this pipeline (using the included test data) and followed the instructions in the README. When I get to the section Running the Pipeline and run the command snakemake --configfile configs/test_data.yaml --cores 6, it crashes. Attached is the traceback. The command snakemake --configfile configs/test_data.yaml -n runs with no issue. What I did notice is that the package name is scib, but all the scripts and files use scIB.

Traceback

(scib-R) [user scib-pipeline]$ snakemake --configfile configs/test_data.yaml --cores 6
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	5	convert_RDS_h5ad
	1	embeddings
	14	embeddings_single
	2	integration_prepare
	7	integration_run_python
	5	integration_run_r
	1	metrics
	14	metrics_single
	50

[Sun Feb  6 13:30:57 2022]
Job 11:
        Preparing adata
        wildcards: hvg=full_feature,prep=RDS,scaling=unscaled,scenario=test_data
        parameters: batch 0  -r -l conda run -n scib-pipeline
        output: /local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.RDS



[Sun Feb  6 13:30:57 2022]
Job 2:
        Preparing adata
        wildcards: hvg=full_feature,prep=h5ad,scaling=unscaled,scenario=test_data
        parameters: batch 0    conda run -n scib-pipeline
        output: /local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.h5ad


ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/preprocessing/runPP.py', '-i', 'data/adata_norm.h5ad', '-o', '/local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.RDS', '-b', 'batch', '--hvgs', '0', '-r', '-l']' command failed.  (See above for error)
Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 5, in <module>
    import scIB
ModuleNotFoundError: No module named 'scIB'

[Sun Feb  6 13:31:19 2022]
Error in rule integration_prepare:
    jobid: 11
    output: /local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.RDS
    shell:

        conda run -n scib-pipeline python scripts/preprocessing/runPP.py -i data/adata_norm.h5ad         -o /local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.RDS -b batch         --hvgs 0  -r -l

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/preprocessing/runPP.py', '-i', 'data/adata_norm.h5ad', '-o', '/local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.h5ad', '-b', 'batch', '--hvgs', '0']' command failed.  (See above for error)
Traceback (most recent call last):
  File "scripts/preprocessing/runPP.py", line 5, in <module>
    import scIB
ModuleNotFoundError: No module named 'scIB'

[Sun Feb  6 13:31:19 2022]
Error in rule integration_prepare:
    jobid: 2
    output: /local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.h5ad
    shell:

        conda run -n scib-pipeline python scripts/preprocessing/runPP.py -i data/adata_norm.h5ad         -o /local/workdir/dt425/scib-pipeline/data/test_data/prepare/unscaled/full_feature/adata_pre.h5ad -b batch         --hvgs 0

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /local/workdir/dt425/scib-pipeline/.snakemake/log/2022-02-06T133057.142601.snakemake.log
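A quick sanity check, sketched under the assumption that the failure is a version skew between the pipeline scripts and the installed package (older pipeline revisions import scIB, while newer releases of the package install as scib):

import importlib.util

# Which of the two module names is actually installed in the active env?
print(importlib.util.find_spec("scIB"))
print(importlib.util.find_spec("scib"))

If only scib is found, checking out a pipeline revision that matches the installed package version (or vice versa) should resolve the import error.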

ImportError: cannot import name 'Protocol' from 'typing'

Hello,

Thanks for the nice pipeline!

I am encountering some issues when running it on your test data.

I cloned your GitHub main branch and then manually commented out the line

  - igraph<0.10

in envs/scib-pipeline-R4.0.yml, following an error during installation and the instructions here.

But when I ran the workflow, I got this error:

ImportError: cannot import name 'Protocol' from 'typing'
Traceback (most recent call last):
  File "scripts/integration/runIntegration.py", line 67, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "scripts/integration/runIntegration.py", line 22, in runIntegration
    integrated = method(adata, batch, celltype)
...

By looking into miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/typing.py, I found a mention of:

* The public counterpart of the generics API consists of two classes: Generic and Protocol
  (the latter is currently private, but will be made public after PEP 544 acceptance).

Then I found here that PEP 544 has only been supported since Python 3.8.

However, when I tried to change

- python>=3.7

to

- python>=3.8

in envs/scib-pipeline-R4.0.yml and update the environment, I got a conflict:

The following packages are incompatible
├─ bbknn 1.3.9**  is installable with the potential options
│  ├─ bbknn 1.3.9 would require
│  │  ├─ python >=2.7,<2.8.0a0 , which can be installed;
│  │  └─ python_abi 2.7.* *_cp27mu, which can be installed;
│  ├─ bbknn 1.3.9 would require
│  │  ├─ python >=3.6,<3.7.0a0 , which can be installed;
│  │  └─ python_abi 3.6.* *_cp36m, which can be installed;
│  └─ bbknn 1.3.9 would require
│     ├─ python >=3.7,<3.8.0a0 , which can be installed;
│     └─ python_abi 3.7.* *_cp37m, which can be installed;
└─ python >=3.8  is uninstallable because there are no viable options
   ├─ python [3.10.0|3.10.10|...|3.9.7] conflicts with any installable versions previously reported;
   ├─ python [3.10.0|3.10.1|...|3.10.9] would require
   │  └─ python_abi 3.10.* *_cp310, which conflicts with any installable versions previously reported;
   ├─ python [3.11.0|3.11.1|3.11.2|3.11.3|3.11.4] would require
   │  └─ python_abi 3.11.* *_cp311, which conflicts with any installable versions previously reported;
   ├─ python [3.8.0|3.8.1] would require
   │  └─ python_abi * *_cp38, which conflicts with any installable versions previously reported;
   ├─ python [3.8.10|3.8.12|...|3.8.8] would require
   │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
   └─ python [3.9.0|3.9.1|...|3.9.9] would require
      └─ python_abi 3.9.* *_cp39, which conflicts with any installable versions previously reported.

Do you know how I should solve this, please?
Also, I would suggest providing a pre-built Docker or Singularity image of this pipeline if possible, so that anyone can launch the benchmarking without installing packages.
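For background, this is a Python version issue: typing.Protocol only exists from Python 3.8, which is why packages that still support 3.7 typically fall back to typing_extensions, along these lines (an illustration, not pipeline code):

import sys

if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    from typing_extensions import Protocol  # backport package for Python 3.7

class HasName(Protocol):
    name: str

Since bbknn pins the environment to Python 3.7 (as the solver output shows), pinning optax and scvi-tools to releases that still declare Python 3.7 support is probably a more workable route than raising the Python version.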

Lost Snakefile, snakefile, workflow/Snakefile, workflow/snakefile

Hello,

Thank you for your pipeline.

The first question: the paul15 file is not available from the link you provided.

The second question: since I cannot find the paul15 file, I used my own dataset to run the code. When I run snakemake --configfile configs/test_data-R3.6.yaml, I encounter an error: "Error: no Snakefile found, tried Snakefile, snakefile, workflow/Snakefile, workflow/snakefile."

I am not sure where I went wrong. Could you please give me some hints? What I did was use scanpy to read all my objects and create an h5ad file. Then I ran the snakemake --configfile configs/test_data-R3.6.yaml command, making changes to the YAML file parameters based on my own requirements. That's when I got the error. I hope you can offer me some support with this.


Suspected issue with scanpy version

Hi @LuckyMD ,

I ran into an issue using the snakemake pipeline which I suspect might be due to a bug in scanpy.

I tried to run the pipeline on the test dataset using snakemake --configfile configs/test_data.yaml.
Running the rule metrics_single failed with

Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 248, in <module>
    trajectory_=trajectory_
  File "/home/wkopp/anaconda3/envs/scIB-python/lib/python3.7/site-packages/scIB/metrics.py", line 1926, in metrics
    graph_conn_score = graph_connectivity(adata_int, label_key=label_key)
  File "/home/wkopp/anaconda3/envs/scIB-python/lib/python3.7/site-packages/scIB/metrics.py", line 1795, in graph_connectivity
    _,labs = connected_components(adata_post_sub.obsp['connectivities'], connection='strong')
  File "/home/wkopp/anaconda3/envs/scIB-python/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py", line 112, in __getitem__
    _subset(self.parent_mapping[key], self.subset_idx),
  File "/home/wkopp/anaconda3/envs/scIB-python/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py", line 147, in __getitem__
    return self._data[key]
KeyError: 'connectivities'

This suggests that something failed when creating the neighborhood graph.

Then I did

import scanpy as sc
adata = sc.read_h5ad("data/adata_norm.h5ad")
sc.pp.pca(adata)
sc.pp.neighbors(adata)
adata

This results in the following AnnData object:

AnnData object with n_obs × n_vars = 2730 × 3451 
    obs: 'paul15_clusters', 'celltype', 'batch', 'n_counts'
    var: 'n_counts'
    uns: 'iroot', 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    layers: 'counts'

The 'connectivities' are missing here with scanpy==1.4.6.

I tried to do the same with scanpy==1.7.2 in which case I get the correct AnnData object:

>>> adata
AnnData object with n_obs × n_vars = 2730 × 3451
    obs: 'paul15_clusters', 'celltype', 'batch', 'n_counts'
    var: 'n_counts'
    uns: 'iroot', 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'distances', 'connectivities'

Can you reproduce this issue?
Best,
Wolfgang
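If the missing 'connectivities' really is a scanpy/anndata version difference, a quick check is to look at both storage locations, since older releases kept the kNN graph under .uns['neighbors'] rather than .obsp. A minimal diagnostic sketch, assuming the test file above:

import scanpy as sc

adata = sc.read_h5ad("data/adata_norm.h5ad")
sc.pp.pca(adata)
sc.pp.neighbors(adata)

# Older scanpy/anndata combinations store the graph under adata.uns['neighbors'];
# newer ones use adata.obsp. Check both locations.
if "connectivities" in getattr(adata, "obsp", {}):
    conn = adata.obsp["connectivities"]
else:
    conn = adata.uns["neighbors"]["connectivities"]
print(type(conn))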

error running pipeline

Hi! Thanks again for your excellent work!

I am running the pipeline on my data, which contains 4 studies of myeloid cells from different labs, and their cell types were labeled using different methods. For example, one study labels all cells as "myeloid cells", one uses "TYPE1, TYPE2, TYPE3", and one uses "TAM1(PD-L1), TAM2".

I have removed the "scanvi" and "scgen" methods from my config since they use celltype, but I kept the original celltype column in obs, because otherwise the embedding step breaks.
So can I still run the pipeline with my data?

I get errors like:
1.

Traceback (most recent call last):
  File "scripts/integration/runIntegration.py", line 81, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "scripts/integration/runIntegration.py", line 36, in runIntegration
    integrated = method(adata, batch)
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/integration.py", line 317, in mnn
    **kwargs,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/mnnpy/mnn.py", line 126, in mnn_correct
    svd_mode=svd_mode, do_concatenate=do_concatenate, **kwargs)
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/mnnpy/mnn.py", line 182, in mnn_correct
    new_batch_in, sigma)
IndexError: arrays used as indices must be of integer (or boolean) type
Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/metrics.py", line 340, in metrics
    verbose=False,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/silhouette.py", line 115, in silhouette_batch
    sil_means = sil_all.groupby("group").mean()
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1499, in mean
    numeric_only=numeric_only,
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1016, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count
  File "/data/msun/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 1121, in _cython_agg_blocks
    raise DataError("No numeric types to aggregate")
pandas.core.base.DataError: No numeric types to aggregate

And here is my config:

ROOT: /data/msun/01_integration
r_env : scib-R4.0
py_env : scib-pipeline-R4.0
timing: false

unintegrated_metrics: false

FEATURE_SELECTION:
  hvg: 2000
  full_feature: 0

SCALING:
  - unscaled
  - scaled

METHODS:
# python methods : bbknn, combat, desc, mnn, saucie, scanorama, scanvi, scgen, scvi, trvae, trvaep
  bbknn:
    output_type: knn
  combat:
    output_type: full
  desc:
    output_type: embed
  mnn:
    output_type: full
  saucie:
    output_type:
      - full
      - embed
  scanorama:
    output_type:
      - embed
      - full
  #scanvi:
  #  output_type: embed
  #  no_scale: true
  #  use_celltype: true
  #scgen:
  #  output_type: full
  #  use_celltype: true
  scvi:
    no_scale: true
    output_type: embed
  #trvae:
  #  no_scale: true
  #  output_type:
  #    - embed
  #    - full
  #trvaep:
  #  no_scale: true
  #  output_type:
  #    - embed
  #    - full
# R methods : conos, fastmnn, harmony, liger, seurat, seuratpca
  conos: 
    R: true
    output_type: knn
  fastmnn:
    R: true
    output_type:
      - embed
      - full
  harmony:
    R: true
    output_type: embed
  liger:
    no_scale: true
    R: true
    output_type: embed
  seurat:
    R: true
    output_type: full
  seuratrpca:
      R: true
      output_type: full

DATA_SCENARIOS:
  integrate_output:
    batch_key: batch # name of key on anndata.obs that annotates the batches
    label_key: celltype  # name of key on anndata.obs that annotates the cell identity labels
    organism: mouse
    assay: expression
    file: /data/msun/01_integration/ori_data/with_layers/pure_adatas.h5ad

Could you help me with this? Does this error happen because of the celltype issue or something else? Is it necessary to relabel the cell types?

Thank you for your time!!!!
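Before relabeling anything, it may be worth checking that the label and batch columns are what the metrics expect; silhouette_batch groups by labels, so a malformed or NA-containing celltype column can surface as the pandas "No numeric types to aggregate" error. A small diagnostic sketch, assuming the file path from the config above (this is a check, not a definitive fix):

import scanpy as sc

adata = sc.read_h5ad("/data/msun/01_integration/ori_data/with_layers/pure_adatas.h5ad")

# the label column should be a clean string/categorical without missing values
print(adata.obs["celltype"].dtype)
print(adata.obs["celltype"].isna().sum())

# how many distinct labels each study/batch contributes
print(adata.obs.groupby("batch")["celltype"].nunique())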

error goes with test data

Hi! Thank you so much for explaining the dry-run step. But it fails when I run this:

snakemake --configfile configs/test_data-R4.0.yaml --cores 10

The metrics step fails several times at different job IDs, as in other reported issues, but the key error seems different from theirs.

  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/metrics.py", line 414, in metrics
    verbose=verbose,
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/kbet.py", line 152, in kBET
    k0=k0,
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/metrics/kbet.py", line 201, in kBET_single
    "batch.estimate <- kBET("
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 438, in __call__
    res = self.eval(p)
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 199, in __call__
    .__call__(*args, **kwargs))
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/home/data/vip19/mambaforge/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/rpy2/rinterface.py", line 677, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in is(object, "list") : object 'object' not found


[Sun May 29 02:16:56 2022]
Error in rule metrics_single:
    jobid: 24
    output: /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanvi_embed.csv
    shell:
        
        conda run -n scib-pipeline-R4.0 python scripts/metrics/metrics.py -u data/adata_norm.h5ad -i /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad          -o /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scanvi_embed.csv -m scanvi          -b batch -l celltype --type embed          --hvgs 0 --organism mouse --assay expression -v
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

ERROR conda.cli.main_run:execute(41): `conda run python scripts/metrics/metrics.py -u data/adata_norm.h5ad -i /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/integration/unscaled/full_feature/scvi.h5ad -o /home/data/vip19/tools/snakemake/scib-pipeline/data/test_data_r4/metrics/unscaled/full_feature/scvi_embed.csv -m scvi -b batch -l celltype --type embed --hvgs 0 --organism mouse --assay expression -v` failed. (See above for error)
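The RRuntimeError is raised by R code that scib evaluates through rpy2, so a first check is whether the kBET R package loads at all inside the embedded R of the python environment. A minimal sketch, assuming the environment name from the log above:

conda run -n scib-pipeline-R4.0 python -c "import rpy2.robjects as ro; ro.r('library(kBET); print(sessionInfo())')"

If this fails, the kBET installation (rather than the pipeline) is the problem.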

Pandas version conflict in creating environments

Thank you for this awesome pipeline.

I am running into issues while creating the R and python environments required to run the snakemake pipeline. In particular, I am getting the following package conflicts:

Error: pip failed to install packages
The conflict is caused by:
trvaep 0.1.0 depends on pandas
scib 1.1.4 depends on pandas>=2


Could you please advise how to resolve this? I have tried removing the pandas requirement in the .yml files to let pip try to resolve this package conflict, but it still does not work.

Thank you!
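One workaround sketch, assuming the conflict comes from an unpinned scib being resolved to a release that requires pandas>=2: pin scib in the pip section of the python environment YAML to an older release whose pandas requirement is compatible with trvaep. The version below is only an example (the release mentioned elsewhere in these issues), not a verified fix:

  - pip:
      - scib==1.1.3  # example pin only; use the release the pipeline was tested with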

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

I receive this error (below my snakemake config)

I don't know how to debug this, or understand what is going on.

Thanks a lot.

(scib-pipeline-R4) slurm-login02 279 % snakemake --configfile configs/test_data.yaml --cores 5
Building DAG of jobs...
The params used to generate one or several output files has changed:
    To inspect which output files have changes, run 'snakemake --list-params-changes'.
    To trigger a re-run, use 'snakemake -R $(snakemake --list-params-changes)'.
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
all                      1              1              1
convert_RDS_h5ad         5              1              1
embeddings               1              1              1
embeddings_single        6              1              1
metrics                  1              1              1
metrics_single           6              1              1
total                   20              1              1

Select jobs to execute...

[Tue Apr 19 23:28:11 2022]
Job 11:
        Convert integrated data from harmony into h5ad



[Tue Apr 19 23:28:11 2022]
Job 13:
        Convert integrated data from liger into h5ad



[Tue Apr 19 23:28:11 2022]
Job 8:
        Convert integrated data from fastmnn into h5ad



[Tue Apr 19 23:28:11 2022]
Job 15:
        Convert integrated data from seurat into h5ad



[Tue Apr 19 23:28:11 2022]
Job 17:
        Convert integrated data from seuratrpca into h5ad


ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'scripts/integration/runPost.py', '-i', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.RDS', '-o', '/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad']' command failed.  (See above for error)

    WARNING: The R package "reticulate" does not
    consider that it could be called from a Python process. This
    results in a quasi-obligatory segfault when rpy2 is evaluating
    R code using it. On the hand, rpy2 is accounting for the
    fact that it might already be running embedded in a Python
    process. This is why:
    - Python -> rpy2 -> R -> reticulate: crashes
    - R -> reticulate -> Python -> rpy2: works

    The issue with reticulate is tracked here:
    https://github.com/rstudio/reticulate/issues/208

Traceback (most recent call last):
  File "scripts/integration/runPost.py", line 40, in <module>
    runPost(file, out, conos)
  File "scripts/integration/runPost.py", line 21, in runPost
    adata = scib.pp.read_seurat(inPath)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/preprocessing.py", line 517, in read_seurat
    adata = ro.r('as.SingleCellExperiment(sobj)')
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 451, in __call__
    res = self.eval(p)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 202, in __call__
    .__call__(*args, **kwargs))
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 125, in __call__
    res = conversion.rpy2py(res)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 28, in rpy2py_s4
    return rpy2py_single_cell_experiment(obj)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 95, in rpy2py_single_cell_experiment
    obs = rpy2py_data_frame(col_data)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in rpy2py_data_frame
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 58, in <dictcomp>
    columns = {k: rpy2py_vector(v) for k, v in slots["listData"].items()}
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/anndata2ri/r2py.py", line 47, in rpy2py_vector
    r[np.array(baseenv["is.na"](v), dtype=bool)] = pd.NA
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

[Tue Apr 19 23:29:19 2022]
Error in rule convert_RDS_h5ad:
    jobid: 13
    output: /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad
    shell:

        if [ liger == "conos" ]
        then
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad -c
        else
            conda run -n scib-pipeline-R4 python scripts/integration/runPost.py -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.RDS -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data/test_data/integration/unscaled/full_feature/R/liger.h5ad
        fi

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[The same reticulate warning, the identical "TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'" traceback, and the corresponding convert_RDS_h5ad error repeat verbatim for seuratrpca (jobid 17), harmony (jobid 11), seurat (jobid 15), and fastmnn (jobid 8); only the input/output paths differ.]

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
The params used to generate one or several output files has changed:
    To inspect which output files have changes, run 'snakemake --list-params-changes'.
    To trigger a re-run, use 'snakemake -R $(snakemake --list-params-changes)'.
Complete log: .snakemake/log/2022-04-19T232810.877193.snakemake.log

snakemake config

ROOT: data
r_env : scib-R4
py_env : scib-pipeline-R4

timing: false
unintegrated_metrics: false

FEATURE_SELECTION:
  #hvg: 2000
  full_feature: 0

SCALING:
  - unscaled
  #- scaled

METHODS:
# python methods
  bbknn:
    output_type: knn
  combat:
    output_type: full
  #desc:
  #  output_type: embed
#  mnn:
#    output_type: full
  #saucie:
  #  output_type:
  #    - full
  #    - embed
  scanorama:
    output_type:
      - embed
      - full
  scanvi:
    output_type: embed
    no_scale: true
    use_celltype: true
  scgen:
    output_type: full
    use_celltype: true
  scvi:
    no_scale: true
    output_type: embed
  #trvae:
  #  no_scale: true
  # output_type:
  #    - embed
  #    - full
  # trvaep:
  #   no_scale: true
  #   output_type:
  #     - embed
  #     - full
# R methods
  #conos: # temporary directory issue
  #  R: true
  #  output_type: knn
  fastmnn:
    R: true
    output_type:
      - embed
      - full
  harmony:
    R: true
    output_type: embed
  liger:
    no_scale: true
    R: true
    output_type: embed
  seurat:
    R: true
    output_type: full
  seuratrpca:
      R: true
      output_type: full

DATA_SCENARIOS:
  test_data:
    batch_key: batch
    label_key: celltype
    organism: mouse
    assay: expression
    file: data/adata_norm.h5ad
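The TypeError comes from anndata2ri: when an integer metadata column returned from R contains NA values, it tries to write pd.NA into an integer NumPy array, which NumPy cannot cast. A minimal reproduction of just that failing assignment (a sketch, not pipeline code):

import numpy as np
import pandas as pd

v = np.array([1, 2, 3])                # an integer colData column from R
mask = np.array([False, True, False])  # the is.na() mask
try:
    v[mask] = pd.NA                    # what anndata2ri attempts internally
except TypeError as e:
    print(e)  # int() argument must be a string, a bytes-like object or a number, not 'NAType'

If that is the cause here, casting NA-containing integer metadata columns to numeric/character in the Seurat object, or dropping unused obs columns before integration, may avoid the conversion failure.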

conda - /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found

Hello,

I reinstalled scib within my conda environment scib-pipeline-R4.

If I enter my environment

conda activate scib-pipeline-R4
cd /home/users/allstaff/mangiola.s/PostDoc/covid19pbmc/thirdparty_software/scib-pipeline

And execute

conda run -n scib-pipeline-R4 python scripts/metrics/metrics.py -u data/pbmc_simplified.h5ad -i /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/thirdparty_software/scib-pipeline/data_pbmc/run_pbmc/integration/scaled/full_feature/scgen.h5ad          -o /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/thirdparty_software/scib-pipeline/data_pbmc/run_pbmc/metrics/scaled/full_feature/scgen_full.csv -m scgen          -b batch -l celltype --type full          --hvgs 0 --organism human --assay expression -v

I get the error below. How can I make Python 3.7 in this environment find all the GLIBCXX versions it needs? Thanks!

(my IT support says it's all about the conda environment and not something they can fix from outside).

Isolated labels F1...
Isolated labels ASW...
Graph connectivity...
kBET...
cLISI score...
using precomputed kNN graph
Convert nearest neighbor matrix and distances for LISI.
Compute knn on shortest paths
call /home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/knn_graph/knn_graph.o /tmp/lisi_5z873agj/graph_lisi_input.mtx /tmp/lisi_5z873agj/graph_lisi 90 1 50
LISI score estimation
 
/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/knn_graph/knn_graph.o: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/knn_graph/knn_graph.o)
/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/knn_graph/knn_graph.o: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/knn_graph/knn_graph.o)
Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/metrics.py", line 433, in metrics
    verbose=verbose,
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/lisi.py", line 151, in clisi_graph
    verbose=verbose,
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/lisi.py", line 299, in lisi_graph_py
    n_neighbors=n_neighbors,
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/lisi.py", line 408, in compute_simpson_index_graph
    if os.stat(index_file).st_size == 0:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/lisi_5z873agj/graph_lisi_indices_0.txt'
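One common workaround sketch, assuming the conda environment ships a newer libstdc++ than the system /lib64 copy that knn_graph.o is resolving: put the environment's lib directory first on the library path before running the metrics step.

export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"

Installing a recent libstdcxx-ng from conda-forge into the environment is another option; both are environment-level fixes, consistent with what your IT support said.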

igraph version conflict

Hey! You might want to update the igraph version requirement; it seems that there is a conflict between what the scib package requires and what is specified here.

INFO: pip is looking at multiple versions of trvaep to determine which version is compatible with other requirements. This could take a while.

The conflict is caused by:
    The user requested igraph<0.10
    scib 1.1.3 depends on igraph>=0.10

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict


Pip subprocess error:
  Running command git clone --filter=blob:none --quiet https://github.com/theislab/scib.git /tmp/pip-req-build-tp8ds9mf
ERROR: Cannot install -r /home/carlo.dedonno/git_new/immunai-product/research/2023_02_scib_pipelines/scib-pipeline/envs/condaenv.2_jbkacp.requirements.txt (line 1) and igraph<0.10 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

failed

CondaEnvException: Pip failed
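A sketch of the fix this report suggests, assuming the igraph<0.10 pin sits in the pip section of the python environment YAML: loosen the pin so it matches what scib 1.1.3 declares.

  - pip:
      - igraph>=0.10  # was igraph<0.10; aligns with scib 1.1.3's requirement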

data sets available?

Great job on the scib pipeline. I imagine it was a big headache getting all of these methods to work. I was wondering if the benchmark dataset h5ad files are available for download? They seem to be hard-coded in config/reproduce_paper.yaml. Thanks!

from_scvi_model

Hello,
Thank you for the documentation and all the work you have done on integration.

I find that an error occurs when running scvi. The error appears right at the end of the process and seems to be related to "TypeError: from_scvi_model() missing 1 required positional argument: 'unlabeled_category'".
When I comment out the scvi from the config, the whole pipeline seems to work fine.
Have you encountered this error before?

Thank you in advance, Carmen

Traceback (most recent call last):
  File "scripts/integration/runIntegration.py", line 81, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "scripts/integration/runIntegration.py", line 34, in runIntegration
    integrated = method(adata, batch, celltype)
  File "/home/ubuntu/miniconda3/envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scib/integration.py", line 281, in scanvi
    unknown_category="UnknownUnknown",  # pick anything definitely not in a dataset
TypeError: from_scvi_model() missing 1 required positional argument: 'unlabeled_category'

conda run python scripts/integration/runIntegration.py -i integration_test/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o integration_test/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad -b batch --method scanvi -c celltype failed. (See above for error)
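The traceback shows scib passing unknown_category while newer scvi-tools requires an unlabeled_category argument to SCANVI.from_scvi_model(), so this looks like an scvi-tools API change rather than a pipeline bug. A sketch of the newer call, assuming a recent scvi-tools and a trained SCVI model vae (training omitted):

import scvi

# `vae` is a trained scvi.model.SCVI instance whose AnnData has a "celltype"
# column; "UnknownUnknown" is any value guaranteed to be absent from the data.
lvae = scvi.model.SCANVI.from_scvi_model(
    vae,
    unlabeled_category="UnknownUnknown",
    labels_key="celltype",
)

Pinning scvi-tools to the version specified in the pipeline's environment files should also make the original scib call work again.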

Benchmarking out of repo model integration metrics

I am working on integration models and wanted to test my models' integration metrics against those of the methods currently supported in the pipeline. I was wondering if there is a way to run the benchmarking on custom models? Is this currently supported?
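Custom methods are not wired into the Snakefile out of the box, but the metrics can be run directly with the scib package on any integrated AnnData. A minimal sketch, with hypothetical file names and a custom embedding assumed to live in .obsm["X_emb"]:

import scanpy as sc
import scib

adata_unint = sc.read_h5ad("adata_pre.h5ad")      # unintegrated input (hypothetical path)
adata_int = sc.read_h5ad("my_model_output.h5ad")  # your model's output (hypothetical path)

# build the kNN graph on the custom embedding, then score a subset of metrics
sc.pp.neighbors(adata_int, use_rep="X_emb")
results = scib.metrics.metrics(
    adata_unint,
    adata_int,
    batch_key="batch",
    label_key="celltype",
    embed="X_emb",
    nmi_=True,
    ari_=True,
    silhouette_=True,
    graph_conn_=True,
)
print(results)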

Pipeline works on your test dataset but fails for experimental dataset - ValueError: Bin edges must be unique: array

Hello,

I managed to complete the pipeline with your provided test dataset; however, I get errors with my experimental dataset. One of them, from the metrics step, is below:

ValueError: Bin edges must be unique: array

Options
    type:       full
    batch_key:  batch
    label_key:  celltype
    assay:      expression
    organism:   human
    n_hvgs:     None
    setup:      scgen_full
    optimised clustering results:       /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_pbmc/run_cd8/metrics/unscaled/full_feature/scgen_full_nmi.txt
reading adata before integration
AnnData object with n_obs × n_vars = 13500 × 23806
    obs: 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'Barcode', 'capture', 'hashedDrops.Total', 'hashedDrops.Best', 'hashedDrops.Second', 'hashedDrops.LogFC', 'hashedDrops.LogFC2', 'hashedDrops.Doublet', 'hashedDrops.Confident', 'HTO', 'vireo.donor_id', 'vireo.prob_max', 'vireo.prob_doublet', 'vireo.n_vars', 'vireo.best_singlet', 'vireo.best_doublet', 'donor', 'simple_timepoint', 'viability', 'comment', 'actual_timepoint', 'severity', 'days_since_symptom_onset', 'date_of_sample', 'COVID', 'colours.hto_colours', 'colours.donor_colours', 'colours.capture_colours', 'colours.simple_timepoint_colours', 'colours.actual_timepoint_colours', 'colours.severity_colours', 'sum', 'detected', 'subsets_Mito_sum', 'subsets_Mito_detected', 'subsets_Mito_percent', 'subsets_Ribo_sum', 'subsets_Ribo_detected', 'subsets_Ribo_percent', 'subsets_Hemo_sum', 'subsets_Hemo_detected', 'subsets_Hemo_percent', 'altexps_HTO_sum', 'altexps_HTO_detected', 'altexps_HTO_percent', 'altexps_ADT_sum', 'altexps_ADT_detected', 'altexps_ADT_percent', 'total', 'batch', 'sizeFactor', 'nCount_ADT', 'nFeature_ADT', 'RNA.weight', 'ADT.weight', 'wsnn_res.2', 'celltype', 'nCount_SCT', 'nFeature_SCT', 'predicted.celltype.l1.score', 'predicted.celltype.l1', 'predicted.celltype.l2.score', 'predicted.celltype.l2', 'doublets.proportion', 'doublets.known', 'doublets.predicted', 'cell_count', '.palette', '.color', 'has_covid', 'sample', 'n_cells', 'ident'
    uns: 'X_name'
    layers: 'counts', 'logcounts'
reading adata after integration
AnnData object with n_obs × n_vars = 13500 × 23806
    obs: 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'Barcode', 'capture', 'hashedDrops.Total', 'hashedDrops.Best', 'hashedDrops.Second', 'hashedDrops.LogFC', 'hashedDrops.LogFC2', 'hashedDrops.Doublet', 'hashedDrops.Confident', 'HTO', 'vireo.donor_id', 'vireo.prob_max', 'vireo.prob_doublet', 'vireo.n_vars', 'vireo.best_singlet', 'vireo.best_doublet', 'donor', 'simple_timepoint', 'viability', 'comment', 'actual_timepoint', 'severity', 'days_since_symptom_onset', 'date_of_sample', 'COVID', 'colours.hto_colours', 'colours.donor_colours', 'colours.capture_colours', 'colours.simple_timepoint_colours', 'colours.actual_timepoint_colours', 'colours.severity_colours', 'sum', 'detected', 'subsets_Mito_sum', 'subsets_Mito_detected', 'subsets_Mito_percent', 'subsets_Ribo_sum', 'subsets_Ribo_detected', 'subsets_Ribo_percent', 'subsets_Hemo_sum', 'subsets_Hemo_detected', 'subsets_Hemo_percent', 'altexps_HTO_sum', 'altexps_HTO_detected', 'altexps_HTO_percent', 'altexps_ADT_sum', 'altexps_ADT_detected', 'altexps_ADT_percent', 'total', 'batch', 'sizeFactor', 'nCount_ADT', 'nFeature_ADT', 'RNA.weight', 'ADT.weight', 'wsnn_res.2', 'celltype', 'nCount_SCT', 'nFeature_SCT', 'predicted.celltype.l1.score', 'predicted.celltype.l1', 'predicted.celltype.l2.score', 'predicted.celltype.l2', 'doublets.proportion', 'doublets.known', 'doublets.predicted', 'cell_count', '.palette', '.color', 'has_covid', 'sample', 'n_cells', 'ident'
    obsm: 'latent'
reduce integrated data:
    HVG selection:      None
    compute neighbourhood graph:        True on X_pca
    precompute PCA:     True
PCA
Nearest Neigbours
computing metrics
type:   full
    ASW:        True
    NMI:        True
    ARI:        True
    PCR:        True
    cell cycle: True
    iso lab F1: True
    iso lab ASW:        True
    HVGs:       True
    kBET:       True
    LISI:       True
    Trajectory: False
Clustering...
resolution: 0.1, nmi: 0.3694215076797655
resolution: 0.2, nmi: 0.3712330993289096
resolution: 0.3, nmi: 0.3751855221332878
resolution: 0.4, nmi: 0.39350369264205204
resolution: 0.5, nmi: 0.38384292382636226
resolution: 0.6, nmi: 0.39501282627712797
resolution: 0.7, nmi: 0.40047348749776385
resolution: 0.8, nmi: 0.401657168636766
resolution: 0.9, nmi: 0.399057260917533
resolution: 1.0, nmi: 0.40947313985181905
resolution: 1.1, nmi: 0.4082897771829039
resolution: 1.2, nmi: 0.41863900815331434
resolution: 1.3, nmi: 0.4146104228389089
resolution: 1.4, nmi: 0.4174159583303697
resolution: 1.5, nmi: 0.4163787317115119
resolution: 1.6, nmi: 0.4178321905271655
resolution: 1.7, nmi: 0.4180688344241309
resolution: 1.8, nmi: 0.41657580811531025
resolution: 1.9, nmi: 0.41601617917907485
resolution: 2.0, nmi: 0.4219354667274935
optimised clustering against celltype
optimal cluster resolution: 2.0
optimal score: 0.4219354667274935
saved clustering NMI values to /stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/mangiola.s/PostDoc/covid19pbmc/software/scib-pipeline/data_pbmc/run_cd8/metrics/unscaled/full_feature/scgen_full_nmi.txt
NMI...
ARI...
Silhouette score...
PC regression...
covariate: batch
compute PCA n_comps: 50
covariate: batch
compute PCA n_comps: 50
cell cycle effect...
Isolated labels F1...
Isolated labels ASW...
Graph connectivity...
kBET...
cLISI score...
using precomputed kNN graph
Convert nearest neighbor matrix and distances for LISI.
Compute knn on shortest paths
/tmp/lisi_f6fkjcl_/input.mtx /tmp/lisi_f6fkjcl_/
LISI score estimation
56 processes started.
iLISI score...
using precomputed kNN graph
Convert nearest neighbor matrix and distances for LISI.
Compute knn on shortest paths
/tmp/lisi_ow39cyvq/input.mtx /tmp/lisi_ow39cyvq/
LISI score estimation
56 processes started.

Traceback (most recent call last):
  File "scripts/metrics/metrics.py", line 263, in <module>
    trajectory_=trajectory_
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/metrics.py", line 364, in metrics
    hvg_score = hvg_overlap(adata, adata_int, batch_key)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/highly_variable_genes.py", line 35, in hvg_overlap
    hvg_pre_list = precompute_hvg_batch(adata_pre, batch, hvg_post)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scib/metrics/highly_variable_genes.py", line 16, in precompute_hvg_batch
    hvg = sc.pp.highly_variable_genes(i, flavor='cell_ranger', n_top_genes=n_hvg_tmp, inplace=False)
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 443, in highly_variable_genes
    flavor=flavor,
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 246, in _highly_variable_genes_single_batch
    np.r_[-np.inf, np.percentile(df['means'], np.arange(10, 105, 5)), np.inf],
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 296, in cut
    ordered=ordered,
  File "/home/users/allstaff/mangiola.s/.conda/envs/scib-pipeline-R4/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 414, in _bins_to_cuts
    f"Bin edges must be unique: {repr(bins)}.\n"
ValueError: Bin edges must be unique: array([       -inf, 2.00000e-03, 2.00000e-03, 4.00000e-03, 6.00000e-03,
       8.00000e-03, 1.00000e-02, 1.40000e-02, 2.00000e-02, 2.60000e-02,
       3.40000e-02, 4.40000e-02, 5.80000e-02, 7.80000e-02, 1.02000e-01,
       1.38000e-01, 1.94000e-01, 2.92000e-01, 5.60000e-01, 2.01934e+02,
               inf]).
You can drop duplicate edges by setting the 'duplicates' kwarg
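The non-unique bin edges point at scanpy's cell_ranger HVG flavor inside the hvg_overlap metric: with all 23806 genes kept, many per-batch gene means are identical near zero (note the duplicated 2.00000e-03 edge above), so the percentile bins collapse. A workaround sketch, assuming basic gene filtering is acceptable for your dataset:

import scanpy as sc

adata = sc.read_h5ad("my_experimental_data.h5ad")  # hypothetical input path

# drop genes detected in almost no cells; they produce the duplicated
# near-zero means that break the percentile binning
sc.pp.filter_genes(adata, min_cells=5)
adata.write("my_experimental_data_filtered.h5ad")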

error in running scanvi integration

Hello!

I have followed the documentation and tried to run the test dataset through the scib-pipeline, using the following steps:

########## scib-pipeline

# GENERATE TEST DATA
# Navigate to data directory
cd Desktop/scib-pipeline-main/data

# Execute code within data directory. It will output adata_norm.h5ad file
python3 generate_data.py 

# GENERATE OUTPUT FILE adata_pre.h5ad BASED ON adata_norm.h5ad
# Navigate to this path
/Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/prepare/unscaled/full_feature

# Run the script
python3 /Users/f006qpk/Desktop/scib-pipeline-main/scripts/preprocessing/runPP.py -i /Users/f006qpk/Desktop/scib-pipeline-main/data/adata_norm.h5ad -o /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -b batch --hvgs 0

# INTEGRATION STEP
# Navigate to this path
/Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/integration/unscaled/full_feature 

# Run the integration script
python3 /Users/f006qpk/Desktop/scib-pipeline-main/scripts/integration/runIntegration.py -i /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad  -o /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad -b batch --method combat

# GENERATE EMBEDDINGS
# Navigate to this path
/Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/embeddings/unscaled/full_feature 

# Run the embeddings script
python3 /Users/f006qpk/Desktop/scib-pipeline-main/scripts/visualization/save_embeddings.py --input /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/integration/unscaled/full_feature/combat.h5ad --outfile /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/embeddings/unscaled/full_feature/combat_full.csv --method combat --batch_key batch             --label_key celltype --result full

# scanvi 
# Navigate to this path
/Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/integration/unscaled/full_feature 

When I tried to run scanvi on my local machine:
python3 /Users/f006qpk/Desktop/scib-pipeline-main/scripts/integration/runIntegration.py -i /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/prepare/unscaled/full_feature/adata_pre.h5ad -o /Users/f006qpk/Desktop/scib-pipeline-main/data/test_data_r4/integration/unscaled/full_feature/scanvi.h5ad -b batch --method scanvi -c celltype

I have received the following error message:

Traceback (most recent call last):
  File "/Users/f006qpk/Desktop/scib-pipeline-main/scripts/integration/runIntegration.py", line 67, in <module>
    runIntegration(file, out, run, hvg, batch, celltype)
  File "/Users/f006qpk/Desktop/scib-pipeline-main/scripts/integration/runIntegration.py", line 22, in runIntegration
    integrated = method(adata, batch, celltype)
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/scib/integration.py", line 299, in scanvi
    vae = scvi(adata, batch, hvg, return_model=True, max_epochs=n_epochs_scVI)
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/scib/integration.py", line 261, in scvi
    vae.train(**train_kwargs)
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/scvi/model/base/_training_mixin.py", line 78, in train
    runner = self._train_runner_cls(
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/scvi/train/_trainrunner.py", line 85, in __init__
    self.trainer = self._trainer_cls(
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/scvi/train/_trainer.py", line 139, in __init__
    super().__init__(
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/lightning/pytorch/utilities/argparse.py", line 69, in insert_env_defaults
    return fn(self, **kwargs)
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 393, in __init__
    self._accelerator_connector = _AcceleratorConnector(
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 157, in __init__
    self._set_parallel_devices_and_init_accelerator()
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 391, in _set_parallel_devices_and_init_accelerator
    self._devices_flag = accelerator_cls.parse_devices(self._devices_flag)
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/lightning/pytorch/accelerators/cpu.py", line 47, in parse_devices
    devices = _parse_cpu_cores(devices)
  File "/Users/f006qpk/mambaforge/lib/python3.10/site-packages/lightning/fabric/accelerators/cpu.py", line 85, in _parse_cpu_cores
    raise TypeError("`devices` selected with `CPUAccelerator` should be an int > 0.")
TypeError: `devices` selected with `CPUAccelerator` should be an int > 0.

I was wondering if it's because I'm running the analysis on an M2 MacBook Pro.

Thank you for your help!

Olha
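The TypeError is raised by lightning's CPU accelerator rather than by the pipeline, and the paths in the traceback show the base mambaforge Python 3.10 instead of a scib-pipeline conda environment, so a version mismatch between scvi-tools and lightning is a plausible cause. Activating the pinned pipeline environment first may be the simplest fix; alternatively, recent scvi-tools lets you pass an explicit device count through train(). A sketch, assuming model is any scvi model instance:

# hedged sketch: request the CPU accelerator with an explicit device count
model.train(max_epochs=400, accelerator="cpu", devices=1)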

Installation issues that need fixing

Things to improve the installation experience:

  • pin rpy2 to 3.4.2
    • install specifically via conda (if possible)
  • scib: add a try-except around ro.r('library(Seurat)') to report that library installs were problematic (see the sketch below)
  • README:
    • Add "This is a 3-5 step installation" to give an overview of the long installation section -> add links to sections
    • remove details, improve flow
    • First give short, simple instructions, then go deeper into details
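A sketch of the suggested guard, assuming it would wrap the existing Seurat import in scib's R bridge:

import rpy2.robjects as ro
from rpy2.rinterface_lib.embedded import RRuntimeError

try:
    ro.r("library(Seurat)")
except RRuntimeError as e:
    raise RuntimeError(
        "Seurat could not be loaded in the embedded R; "
        "check that the R environment was installed correctly."
    ) from e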
