

scME

scME: A Dual-Modality Factor Model for Single-Cell Multi-Omics Embedding

Installation

  1. Install PyTorch according to your computational platform.
  2. Clone the repository with git:
  git clone https://github.com/bucky527/scME.git
  cd scME/
  3. Install the dependencies.

    You can install them with pip:

    pip3 install numpy scipy pandas scikit-learn pyro-ppl matplotlib scanpy anndata scvi-tools

    or create a conda environment:

    conda env create -f environment.yaml

  4. Install the scME package:

  python setup.py install

Prepare data

scME takes as input an RNA gene count matrix and a raw protein ADT count matrix, each stored as a CSV file (".csv"). In each file, rows are cells and columns are features (genes or proteins), and the first column (column 0) must contain the cell IDs.
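A minimal sketch of the expected layout using pandas: the cell IDs become the DataFrame index (written out as column 0), and every other column holds the counts for one feature. The helper `read_counts_csv` and the toy gene names are illustrative only and are not part of scME.

```python
import io

import numpy as np
import pandas as pd


def read_counts_csv(path_or_buffer):
    """Load a cells x features count matrix; column 0 holds the cell IDs."""
    df = pd.read_csv(path_or_buffer, index_col=0)
    # Every non-ID column should be numeric; fail early on stray text columns.
    if not all(np.issubdtype(t, np.number) for t in df.dtypes):
        raise ValueError("all columns except the cell-ID column must be numeric")
    return df


# Toy RNA matrix: 3 cells (rows) x 2 genes (columns), cell IDs as the index.
rna = pd.DataFrame(
    [[0, 5], [2, 1], [7, 0]],
    index=pd.Index(["cell_1", "cell_2", "cell_3"], name="cell_id"),
    columns=["GeneA", "GeneB"],
)
buf = io.StringIO()
rna.to_csv(buf)  # the index is written as column 0
buf.seek(0)
loaded = read_counts_csv(buf)
```

The protein ADT file follows the same layout, with one column per antibody instead of per gene.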

Usage

usage: scme.py [-h] --rna RNA --protein PROTEIN --output-dir OUTPUT_DIR
               [--max-epochs MAX_EPOCHS] [--batch-size BATCH_SIZE] [--lr LR]
               [--lr_classify LR_CLASSIFY] [--latentdim LATENTDIM]
               [--aux-loss-multiplier AUX_LOSS_MULTIPLIER]
               [--rna-latentdim RNA_LATENTDIM]
               [--protein-latentdim PROTEIN_LATENTDIM]
               [--lr-step LR_STEP [LR_STEP ...]] [--cuda CUDA]
               [--use-mnb USE_MNB]

Argument descriptions

optional arguments:
  -h, --help            show this help message and exit
  --rna RNA             rna count data .csv data path
  --protein PROTEIN     protein count data .csv data path
  --output-dir OUTPUT_DIR
                        output directory to save cells embeddings
  --max-epochs MAX_EPOCHS
                        train max epochs
  --batch-size BATCH_SIZE
                        train dataset batch size
  --lr LR               learning rate
  --lr_classify LR_CLASSIFY
                        learning rate for classify loss
  --latentdim LATENTDIM
                        dimension for embedding
  --aux-loss-multiplier AUX_LOSS_MULTIPLIER
                        auxiliary loss multiplier
  --rna-latentdim RNA_LATENTDIM
                        rna latent dimension
  --protein-latentdim PROTEIN_LATENTDIM
                        protein latent dimension
  --lr-step LR_STEP [LR_STEP ...]
                        learning rate decay step
  --cuda CUDA           use cuda
  --use-mnb USE_MNB     use mixture negative binomial distribution or not for
                        proteindata
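`--use-mnb` selects a mixture of negative binomials (MNB) instead of a single negative binomial (NB) for the protein counts. The sketch below assumes the common two-component form, in which a weight `pi` mixes two NB components with means `mu1` and `mu2` and a shared inverse dispersion `theta`; scME's exact parameterization may differ, so the function names and parameters here are hypothetical. It uses SciPy's `nbinom`, whose `(n, p)` parameters map to a mean `mu` and inverse dispersion `theta` via `n = theta`, `p = theta / (theta + mu)`.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import nbinom


def nb_logpmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and inverse dispersion theta."""
    # SciPy's nbinom(n, p) has mean n * (1 - p) / p, which equals mu here.
    p = theta / (theta + mu)
    return nbinom.logpmf(y, theta, p)


def mnb_logpmf(y, mu1, mu2, theta, pi):
    """Log-pmf of pi * NB(mu1, theta) + (1 - pi) * NB(mu2, theta), 0 < pi < 1."""
    # Combine the weighted components in log space for numerical stability.
    return logsumexp(
        [np.log(pi) + nb_logpmf(y, mu1, theta),
         np.log1p(-pi) + nb_logpmf(y, mu2, theta)],
        axis=0,
    )


counts = np.arange(0, 2000)
total = np.exp(mnb_logpmf(counts, mu1=5.0, mu2=50.0, theta=2.0, pi=0.3)).sum()
```

With `mu1 == mu2` the mixture collapses to a single NB, which is one way to see why a mixture adds flexibility only when the two components separate (for example, background versus specific antibody binding).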

Get cell embedding for CITE-seq

Use scme.py to obtain cell embeddings for CITE-seq data:

python scme.py --rna [your rna gene counts csv file path] --protein [your protein ADTs counts csv file path] --output-dir [result save path] --batch-size 256

Building and training the model

# Prepare your data and build the scME model

scme = build_scme(rnadata=rna,
                  proteindata=protein,
                  protein_dist="NB",  # 'NB' or 'MNB'
                  rna_latent_dim=24,
                  protein_latent_dim=20,
                  latent_dim=32)

# Train scME
scme = train_model(scme,
                   max_epochs=200,
                   batchsize=256,
                   lr=1e-4,
                   lr_cla=1e-4,
                   milestones=[80],
                   save_model=False,
                   save_dir=None)

# Infer the joint cell embedding
# (rna_data and protein_data: float32 torch tensors on scme.device)

zm = scme.inference(rna_data, protein_data)
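`milestones=[80]` (and the CLI's `--lr-step`) is described only as a learning-rate decay step. If it behaves like PyTorch's `torch.optim.lr_scheduler.MultiStepLR` (an assumption; `gamma=0.1` below is PyTorch's default, not a value taken from scME), the learning rate is multiplied by `gamma` each time training reaches a milestone epoch. The hypothetical helper `lr_per_epoch` only traces that schedule with a dummy parameter.

```python
import torch


def lr_per_epoch(base_lr, milestones, epochs, gamma=0.1):
    """Return the learning rate used in each epoch under a MultiStepLR schedule."""
    # A single dummy parameter is enough to drive the optimizer and scheduler.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.Adam([param], lr=base_lr)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=milestones, gamma=gamma
    )
    lrs = []
    for _ in range(epochs):
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # a real training loop would process its batches here
        scheduler.step()   # advance the schedule once per epoch
    return lrs


lrs = lr_per_epoch(base_lr=1e-4, milestones=[80], epochs=200)
```

Under this assumption, the settings in the example above train at 1e-4 for epochs 0 to 79 and at 1e-5 from epoch 80 onward.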

scME model example

See the notebook tutorial.ipynb for a complete running example.

scme's People

Contributors

bucky527

Forkers

zengflab luyiyun

scme's Issues

Problem when I want to get the latent space representation after training

I want to integrate my paired RNA and protein data as described in the tutorial, but I get an error when I try to obtain the latent-space representation after training. The message is:

RuntimeError                              Traceback (most recent call last)
Cell In[15], line 7
      5 rnatorch,proteintorch=rnatorch.to(model.device),proteintorch.to(model.device)
      6 model.eval()
----> 7 zm=model.inference(rnatorch, proteintorch)

File ~/Workspace/Multiomics/benchmark/script/./scME/pyroMethod.py:653, in ScMESVI_2.inference(self, rna, protein)
    650 self.eval() 
    651 # zr_loc,zr_scale,l_loc,l_scale=self.zr_encoder((rna,yr))
    652 # zp_loc,zp_scale,c_loc,c_scale,pi=self.zp_encoder((protein,yp))
--> 653 zr_loc,zr_scale,l_loc,l_scale=self.zr_encoder(rna)
    654 zp_loc,zp_scale,c_loc,c_scale,pi=self.zp_encoder(protein)
    655 zm_loc,zm_scale=self.zm_encoder((zr_loc,zp_loc))

RuntimeError: mat1 and mat2 must have the same dtype

rnatorch.shape is torch.Size([16204, 2000]), and model.zr_encoder is:

MLP(
  (sequential_mlp): Sequential(
    (0): ConcatModule()
    (1): DataParallel(
      (module): Linear(in_features=2000, out_features=1000, bias=True)
    )
    (2): BatchNorm1d(1000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): ReLU()
    (4): DataParallel(
      (module): Linear(in_features=1000, out_features=256, bias=True)
    )
    (5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): DataParallel(
      (module): Linear(in_features=256, out_features=64, bias=True)
    )
    (8): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU()
    (10): ListOutModule(
      (0): Sequential(
        (0): Linear(in_features=64, out_features=24, bias=True)
      )
      (1): Sequential(
        (0): Linear(in_features=64, out_features=24, bias=True)
        (1): Softplus(beta=1, threshold=20)
      )
      (2): Sequential(
        (0): Linear(in_features=64, out_features=1, bias=True)
      )
      (3): Sequential(
        (0): Linear(in_features=64, out_features=1, bias=True)
        (1): Softplus(beta=1, threshold=20)
      )
    )
  )
)

I think the shapes of rnatorch and the MLP match, so I don't know where the problem is. How can I fix it? I'd appreciate any help!
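A likely cause (not confirmed by the scME authors): `nn.Linear` weights are float32 by default, while tensors created from NumPy arrays or pandas DataFrames keep NumPy's default float64 (or an integer dtype for raw counts). The error is a dtype check, not a shape check, so matching shapes do not prevent it. A minimal sketch of casting the input first; the helper `to_model_tensor` is hypothetical.

```python
import numpy as np
import pandas as pd
import torch


def to_model_tensor(data, device="cpu"):
    """Convert a cells x features matrix to a float32 tensor on `device`."""
    if isinstance(data, pd.DataFrame):
        data = data.to_numpy()
    # NumPy defaults to float64; cast to float32 to match nn.Linear weights.
    array = np.asarray(data, dtype=np.float32)
    return torch.from_numpy(array).to(device)


# Reproduce the mismatch with a layer shaped like the first zr_encoder layer.
layer = torch.nn.Linear(2000, 1000)
counts = np.random.default_rng(0).poisson(1.0, size=(8, 2000))  # integer counts
as_float64 = torch.from_numpy(counts.astype(np.float64))
try:
    layer(as_float64)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

out = layer(to_model_tensor(pd.DataFrame(counts)))
```

With the model from the issue, the call would become `model.inference(to_model_tensor(rna, model.device), to_model_tensor(protein, model.device))`. If `rnatorch` is already a tensor, `rnatorch.float()` performs the same cast.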
