
scot's Introduction

Single-Cell alignment using Optimal Transport (SCOT)

SCOT is a Python tool for unsupervised alignment of single-cell multi-omics datasets. Its methodology is detailed in two papers (see Citation below).

For full documentation, please visit https://rsinghlab.github.io/SCOT/ (currently being updated).

SCOT v.1.0

Unsupervised single-cell multi-omic integration with Gromov-Wasserstein optimal transport & a self-tuning heuristic for hyperparameter selection.

THIS ARCHIVE CONTAINS BOTH SCOT v.1.0 AND SCOT v.2.0
Usage: All dependencies are listed in requirements.txt. You can install them together with pip install -r requirements.txt.
Jupyter notebooks to replicate the results from the manuscript are in the /replication folder. These also give examples of how to use SCOT. The /examples folder contains sample scripts for unsupervised and supervised hyperparameter selection.

If you have any questions, please e-mail [email protected], [email protected], [email protected] or [email protected].

Basic use:

from scotv1 import *
# or
from scotv2 import *

# Given two NumPy matrices, domain1 and domain2, where rows are cells and columns are genomic features:
scot = SCOT(domain1, domain2)
aligned_domain1, aligned_domain2 = scot.align(k=50, e=1e-3)

# If you are unsure how to pick k and e, you can try our unsupervised self-tuning heuristic:
scot = SCOT(domain1, domain2)
aligned_domain1, aligned_domain2 = scot.align(selfTune=True)

Required parameters for the align() method:

  • k: Number of neighbors used when constructing the kNN graphs. Default = min(n_1, n_2, 50), where n_i (i = 1, 2) is the number of samples in the i-th domain.
  • e: Regularization constant for the entropic regularization term in the entropic Gromov-Wasserstein optimal transport formulation. Default = 1e-3.
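To make the role of e more concrete, here is a minimal NumPy sketch of entropic Gromov-Wasserstein with the square loss: the GW objective is linearized around the current coupling and a log-domain Sinkhorn step is solved with regularization e, repeatedly. It is an illustration under stated assumptions, not SCOT's own code. The helper names are hypothetical, and plain Euclidean distance matrices stand in for the kNN-graph distances that SCOT builds using k.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def sinkhorn_log(p, q, cost, eps, n_iter=300):
    """Entropic OT plan for marginals p, q and a cost matrix (log-domain Sinkhorn)."""
    f, g = np.zeros(len(p)), np.zeros(len(q))
    for _ in range(n_iter):
        f = eps * (np.log(p) - logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (np.log(q) - logsumexp((f[:, None] - cost) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


def entropic_gw(C1, C2, p, q, e=1e-3, n_outer=50):
    """Entropic Gromov-Wasserstein coupling (square loss) between two domains.

    C1 (n x n) and C2 (m x m) are intra-domain distance matrices; e is the
    entropic regularization constant.
    """
    # (a - b)^2 = a^2 + b^2 - 2ab: the a^2 and b^2 parts depend only on the marginals.
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    T = np.outer(p, q)  # start from the independent coupling
    for _ in range(n_outer):
        cost = const - 2.0 * C1 @ T @ C2.T  # linearized GW cost at the current coupling
        T = sinkhorn_log(p, q, cost, e)
    return T


def pairwise_distances(X):
    """Euclidean distances between rows of X, scaled to a maximum of 1."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return D / D.max()


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))  # toy domain 1: 40 cells x 5 features
Y = rng.normal(size=(30, 8))  # toy domain 2: 30 cells x 8 features
p, q = np.full(40, 1 / 40), np.full(30, 1 / 30)  # uniform marginals
T = entropic_gw(pairwise_distances(X), pairwise_distances(Y), p, q, e=5e-3)
```

Because GW only compares distances within each domain, the two toy domains may have different numbers of cells and features. Smaller e gives a sharper coupling but needs more Sinkhorn iterations to converge.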

Optional parameters:

  • normalize: Whether to normalize the input data before alignment (boolean). Default = True.
  • norm: Type of normalization to apply: "l2", "l1", "max" or "zscore". Default = "l2".
  • mode: "connectivity" or "distance". Determines whether to use a connectivity graph (adjacency matrix of 1s/0s indicating whether nodes are connected) or a distance graph (adjacency matrix entries weighted by the distances between nodes). Default = "connectivity".
  • metric: Metric used to construct the nearest-neighbor graphs. Possible choices include "correlation" and "minkowski": "correlation" is based on Pearson's correlation, and "minkowski" is equivalent to Euclidean distance in its default form (p=2). Default = "correlation".
  • verbose: If True, prints the loss while solving the optimal transport problem. Default = True.
  • XontoY: Direction of the barycentric projection (boolean). If True, projects domain1 onto domain2; if False, projects domain2 onto domain1. Default = True.
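The normalize/norm, mode and metric options all shape how each domain's intra-domain distances are built. The NumPy sketch below shows one way this could look, assuming that the intra-domain distances are shortest-path distances on each domain's kNN graph. The function names are hypothetical, only the "l2" norm and the "correlation" metric are covered, and Floyd-Warshall stands in for whatever shortest-path routine the package actually uses.

```python
import numpy as np


def l2_normalize_rows(X):
    """Scale each cell (row) to unit Euclidean norm, as norm="l2" would."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def correlation_distance(X):
    """1 - Pearson correlation between every pair of cells (rows)."""
    return np.clip(1.0 - np.corrcoef(X), 0.0, None)


def knn_graph(D, k, mode="connectivity"):
    """Symmetric kNN adjacency built from a distance matrix; np.inf marks 'no edge'."""
    n = D.shape[0]
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    for i in range(n):
        neighbors = [j for j in np.argsort(D[i]) if j != i][:k]
        for j in neighbors:
            # "connectivity": every edge has length 1; "distance": edge length is D[i, j]
            W[i, j] = W[j, i] = 1.0 if mode == "connectivity" else D[i, j]
    return W


def shortest_paths(W):
    """All-pairs shortest-path lengths (Floyd-Warshall, O(n^3))."""
    G = W.copy()
    for m in range(G.shape[0]):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G


def intra_domain_distances(X, k=5, mode="connectivity"):
    """Graph distances for one domain, scaled to a maximum of 1."""
    D = correlation_distance(l2_normalize_rows(X))
    G = shortest_paths(knn_graph(D, k, mode))
    G[np.isinf(G)] = G[np.isfinite(G)].max()  # disconnected pairs get the largest finite distance
    return G / G.max()


rng = np.random.default_rng(0)
X = rng.normal(size=(25, 6))  # toy domain: 25 cells x 6 features
G_conn = intra_domain_distances(X, k=5, mode="connectivity")
G_dist = intra_domain_distances(X, k=5, mode="distance")
```

In "connectivity" mode the distances count graph hops, so they ignore how far apart neighbors are; "distance" mode keeps that information along each path.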

Note: To specify the marginal distributions of the input domains instead of using uniform distributions, set the attributes p and q (for domain 1 and domain 2, respectively) to the distributions of your choice after initializing a SCOT instance and before running the alignment, and pass init_marginals=False to .align().
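A small helper can check that custom marginals are valid probability vectors before they are assigned. The helper name and the per-cell weights are hypothetical; the commented lines only restate the note above (attributes p and q, init_marginals=False) and need the SCOT package to run.

```python
import numpy as np


def to_marginal(weights):
    """Turn nonnegative per-cell weights into a probability vector that sums to 1."""
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be a 1-D array of nonnegative values with a positive sum")
    return w / w.sum()


# Hypothetical usage (requires SCOT; weights_domain1/2 are your own per-cell weights):
# scot = SCOT(domain1, domain2)
# scot.p = to_marginal(weights_domain1)  # one entry per cell (row) of domain1
# scot.q = to_marginal(weights_domain2)  # one entry per cell (row) of domain2
# aligned_domain1, aligned_domain2 = scot.align(k=50, e=1e-3, init_marginals=False)

p_example = to_marginal([1, 1, 2])
```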

SCOT v.1.1

A naive extension to multi-modal alignment, where the first dataset in the input is treated as the anchor to align on.
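One reading of "the first dataset is treated as the anchor" is sketched below: each remaining dataset is coupled with the first one and then barycentrically projected into the anchor's feature space. This is an assumption about the v1.1 design, not its code. pairwise_coupling is a hypothetical callable (for example, an entropic GW solver), and the uniform coupling in the usage lines is only a placeholder.

```python
import numpy as np


def project_onto_anchor(anchor, coupling):
    """Barycentric projection of another domain's cells into the anchor's feature space.

    coupling has shape (n_anchor, n_other).
    """
    weights = coupling.sum(axis=0)  # total mass assigned to each cell of the other domain
    return (coupling.T @ anchor) / weights[:, None]


def align_to_first(datasets, pairwise_coupling):
    """Align datasets[1:] to datasets[0], which serves as the anchor domain."""
    anchor = datasets[0]
    aligned = [anchor]
    for other in datasets[1:]:
        coupling = pairwise_coupling(anchor, other)
        aligned.append(project_onto_anchor(anchor, coupling))
    return aligned


def uniform_coupling(a, b):
    """Placeholder coupling: spreads mass evenly over all cell pairs."""
    return np.full((a.shape[0], b.shape[0]), 1.0 / (a.shape[0] * b.shape[0]))


rng = np.random.default_rng(1)
datasets = [rng.normal(size=(8, 3)), rng.normal(size=(5, 4)), rng.normal(size=(6, 2))]
aligned = align_to_first(datasets, uniform_coupling)
```

After projection, every dataset lives in the anchor's feature space, which is why the choice of anchor matters; SCOT v2 (below) addresses this by choosing the anchor instead of always using the first input.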

SCOT v.2.0

A few extensions:

  1. Alignment with the unbalanced Gromov-Wasserstein optimal transport formulation to handle cell-type representation disparities (Séjourné et al., 2020)
  2. Multi-modal alignment by picking the anchor domain based on imputation potential of domain-specific nearest neighbor graphs
  3. Different choices for joint embedding/projection
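Extension 1 relaxes the requirement that the marginals of both domains be matched exactly. The sketch below shows that idea on plain (not Gromov) entropic OT, using the standard unbalanced Sinkhorn scaling with KL-penalized marginals. SCOT v2 applies the relaxation inside a Gromov-Wasserstein solver, which this sketch does not reproduce, and rho here is simply this sketch's name for the relaxation strength.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def unbalanced_sinkhorn(p, q, cost, eps, rho, n_iter=1000):
    """Entropic OT plan whose marginals are only softly (KL-)matched to p and q.

    Large rho approaches balanced transport; small rho lets the plan create or
    destroy mass instead of paying for expensive transport.
    """
    damp = rho / (rho + eps)  # exponent of the scaling updates
    f, g = np.zeros(len(p)), np.zeros(len(q))
    for _ in range(n_iter):
        f = damp * eps * (np.log(p) - logsumexp((g[None, :] - cost) / eps, axis=1))
        g = damp * eps * (np.log(q) - logsumexp((f[:, None] - cost) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


# Toy example: two cell types (0 and 1). Domain 1 is 50/50, domain 2 is 90/10.
x = np.array([0] * 5 + [1] * 5, dtype=float)
y = np.array([0] * 9 + [1] * 1, dtype=float)
p, q = np.full(10, 0.1), np.full(10, 0.1)
cost = (x[:, None] - y[None, :]) ** 2  # 0 within a cell type, 1 across cell types


def cross_type_share(T):
    """Fraction of the plan's mass that is moved between different cell types."""
    return T[x[:, None] != y[None, :]].sum() / T.sum()


share_balanced = cross_type_share(unbalanced_sinkhorn(p, q, cost, eps=0.05, rho=1e3))
share_unbalanced = cross_type_share(unbalanced_sinkhorn(p, q, cost, eps=0.05, rho=0.1))
```

With near-balanced transport, about 40% of the mass is forced across cell types because domain 2 has too few type-1 cells; with a small rho, almost none is.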

Citation:

We are excited to see any extensions of and improvements to our work! If you are using code from this repository, please kindly cite our work:

For SCOT v.1.0:
Demetci, P., Santorella, R., Sandstede, B., Noble, W. S., Singh, R. 2020. Gromov-Wasserstein based optimal transport for aligning single-cell multi-omics data. bioRxiv. 2020.04.28.066787; doi: https://doi.org/10.1101/2020.04.28.066787
BibTex Citation:

@article {Demetci2020.SCOT,  
	author = {Demetci, Pinar and Santorella, Rebecca and Sandstede, Bj{\"o}rn and Noble, William Stafford and Singh, Ritambhara},  
	title = {Gromov-Wasserstein optimal transport to align single-cell multi-omics data},  
	elocation-id = {2020.04.28.066787},  
	year = {2020},  
	doi = {10.1101/2020.04.28.066787},  
	publisher = {Cold Spring Harbor Laboratory},  
	URL = {https://www.biorxiv.org/content/early/2020/11/11/2020.04.28.066787},  
	eprint = {https://www.biorxiv.org/content/early/2020/11/11/2020.04.28.066787.full.pdf},  
	journal = {bioRxiv}
}

For SCOT v.2.0:
Demetci, P., Santorella, R., Sandstede, B., Noble, W. S., Singh, R. 2021. Unsupervised integration of single-cell multi-omics datasets with disparities in cell-type representation. bioRxiv. 2021.11.09.467903; doi: https://doi.org/10.1101/2021.11.09.467903
BibTex Citation:


@article{Demetci2021.SCOTv2,
	author = {Demetci, Pinar and Santorella, Rebecca and Sandstede, Bj{\"o}rn and Singh, Ritambhara},
	doi = {10.1101/2021.11.09.467903},
	elocation-id = {2021.11.09.467903},
	eprint = {https://www.biorxiv.org/content/early/2021/11/11/2021.11.09.467903.full.pdf},
	journal = {bioRxiv},
	publisher = {Cold Spring Harbor Laboratory},
	title = {Unsupervised integration of single-cell multi-omics datasets with disparities in cell-type representation},
	url = {https://www.biorxiv.org/content/early/2021/11/11/2021.11.09.467903},
	year = {2021},
	Bdsk-Url-1 = {https://www.biorxiv.org/content/early/2021/11/11/2021.11.09.467903},
	Bdsk-Url-2 = {https://doi.org/10.1101/2021.11.09.467903}}

scot's People

Contributors

clousilli, pinardemetci, rsantorella, zsteve


scot's Issues

Self-tuning in SCOT v2

Hi,

In the SCOT v2 code, I could not self-tune the model: the align method does not have any parameter that enables self-tuning. However, the paper itself says that self-tuning can be done with SCOT v2.

Can you help with this issue, please?
Thank you

Tuning function error

Hi,

In SCOT v1, line 152 (X_aligned, y_aligned= self.unsupervised_scot()) expects two return values, but unsupervised_scot() returns only one. That is why SCOT v1 cannot do self-tuning. How do we solve this issue?

Upgrading from scot v1 to scot v2 : Problem with integrating datasets with different number of samples

While upgrading, I get the following error when integrating datasets with different numbers of samples (3293 and 3164):

Traceback (most recent call last):
  File "scot_v2_try.py", line 22, in <module>
    aligned_X, aligned_y= scot_aligner.align(k=k, e=e, normalize=normalize)
  File "SCOTv2/src/scot.py", line 173, in align
    X_aligned, y_aligned = self.barycentric_projection(XontoY=XontoY)
  File "SCOTv2/src/scot.py", line 143, in barycentric_projection
    self.X_aligned=np.matmul(self.coupling, self.y) / weights[:, None]
ValueError: operands could not be broadcast together with shapes (3293,1300) (3164,1)

Here's my code

scot_aligner=SCOT(X, y)
aligned_X, aligned_y= scot_aligner.align(k=k, e=e, normalize=normalize)

Please let me know what I can change. The examples I could find all seem to use datasets with the same number of samples.
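The shapes in the error message, (3293,1300) and (3164,1), suggest that the normalizing weights were summed over the domain-2 axis of the coupling instead of the domain-1 axis; we have not checked this against the SCOT v2 source. As a sketch, a barycentric projection that works for datasets of different sizes only needs each weight vector to be summed over the matching axis. The function below is a hypothetical stand-alone version, not SCOT's method.

```python
import numpy as np


def barycentric_projection(X, y, coupling, XontoY=True):
    """Project one domain onto the other with an OT coupling of shape (n_x, n_y)."""
    if coupling.shape != (X.shape[0], y.shape[0]):
        raise ValueError("coupling must have shape (n_x, n_y)")
    if XontoY:
        weights = coupling.sum(axis=1)  # mass of each X cell, shape (n_x,)
        return (coupling @ y) / weights[:, None], y  # X cells now live in y's feature space
    weights = coupling.sum(axis=0)  # mass of each y cell, shape (n_y,)
    return X, (coupling.T @ X) / weights[:, None]  # y cells now live in X's feature space


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # 6 cells in domain 1
y = rng.normal(size=(5, 3))  # 5 cells in domain 2 (a different number of samples)
coupling = np.full((6, 5), 1.0 / 30)  # placeholder; in practice this comes from the OT solver
X_aligned, y_same = barycentric_projection(X, y, coupling, XontoY=True)
X_same, y_aligned = barycentric_projection(X, y, coupling, XontoY=False)
```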

Errors when running the coembed_datasets() method

Hi,

I've been trying to work with the co-embedding version of SCOT v2, and I think there are a few typos in the coembed_datasets() method of the scotv2 class.
Once corrected, it seems to work fine, but could you tell me whether the changes I made in the corresponding pull request are legit?

RESOLVES #9

Thanks for your help and for your very nice work

Barthelemy Caron, postdoc in Ivan Costa's lab at RWTH Aachen.

Confusion about the data preprocessing

As far as I understand, the preprocessing steps for SNARE-seq are:
ATAC-seq dataset -> cisTopic -> unit normalization
RNA-seq dataset -> unit normalization -> PCA (10 components)

Is that correct?

I know this is a general machine learning question, but what did you use to choose the number of components when doing PCA for a different dataset? Which tool/settings do you recommend?
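The RNA-seq pipeline described in the question (unit normalization, then PCA to 10 components) can be sketched in NumPy as below. It follows the poster's reading and is not confirmed by the SCOT authors. The cumulative explained-variance rule at the end is one common heuristic for choosing the number of components, not necessarily what was used in the paper, and the cisTopic step for ATAC-seq (an R tool) is not sketched.

```python
import numpy as np


def unit_normalize(X):
    """Scale each cell (row) to unit Euclidean norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def pca(X, n_components=10):
    """PCA via SVD: returns cell scores and the explained-variance ratio of every component."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained


rng = np.random.default_rng(0)
rna = rng.poisson(1.0, size=(100, 50)).astype(float)  # toy counts: 100 cells x 50 genes
rna_pcs, explained = pca(unit_normalize(rna), n_components=10)

# One common heuristic: the smallest number of components explaining ~90% of the variance.
cumulative = np.cumsum(explained)
n_for_90 = int(np.searchsorted(cumulative, 0.9) + 1)
```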

Errors when running hyperparameterTuning_example.py, missing hyperparameter XontoY

The XontoY parameter seems to be missing from the align function signature, although the example code passes it:

def align(self, k, e, balanced=True, rho=1e-3, verbose=True, normalize=True, norm="l2", init_coupling=True):

Here is the error:
Traceback (most recent call last):
  File "SCOT/hyperparameterTuning_example.py", line 41, in <module>
    X_aligned, y_aligned = scot.align(k, e, normalize = False, XontoY=True)
TypeError: align() got an unexpected keyword argument 'XontoY'

Similarly, the next line in hyperparameterTuning_example.py, shown below, probably also throws an error, because barycentric_projection() also seems to be missing the same parameter:

X_aligned2, y_aligned2 = scot.barycentric_projection(XontoY=False)

unbalanced transport

Hi there,

First, thanks for a great method and paper. I've recently been looking at integrating scATAC-seq and scRNA-seq datasets from identically prepared but distinct cell populations. In doing so, one issue that I am wary of is that cell populations may not be represented in identical proportions in both datasets due to sampling or batch-to-batch variation in abundance. As a result, using balanced transport can result in erroneous assignment of mass between domains.

To address this, I've experimented with using an unbalanced Gromov-Wasserstein formulation and demonstrated some examples of this in two notebooks (one simulated dataset and also in SNARE-seq example). I have implemented this in a fork of the repo here: https://github.com/zsteve/SCOT

Although these modifications were motivated by my own use case, I thought this extension might be useful more generally as part of the package. If you are interested, let me know and I can make a pull request.

Stephen

Preprocessing/Hyperparameters of MEC and RNA-Imaging Experiments for SCOTv2

Hi,

I'm hoping to reproduce the results of SCOTv2 on the MEC and RNA-Imaging datasets. I have a couple of questions:

(1) Where can I gain access to the MEC dataset (there were no links in the paper) and how did you preprocess this data? Was it conducted in the same way as SNARE-seq and scGEM?
(2) I tried replicating the results of the RNA-imaging dataset using SCOTv2's default parameters but was not able to achieve good metrics. Can you share the specific hyperparameters you used for this experiment? Additionally, was there any additional preprocessing that you took after the data loaders provided by https://github.com/uhlerlab/cross-modal-autoencoders/tree/master or did you directly use those tensor representations? For instance, did you do any of the same preprocessing steps you took in SNARE-seq/scGEM for the gene expression profiles of this dataset?

Thank you in advance for your guidance.
