CVAE-RNA-seq

GitHub repository for "Conditional Variational Autoencoder-based Generative Model for Gene Expression Data Augmentation" | Paper | Code

Overview

Gene expression data are used in many studies, including the prediction of disease prognosis. However, collecting enough data is difficult because of cost constraints. In this paper, we propose a gene expression data generation model based on a Conditional Variational Autoencoder (CVAE). Our results show that the proposed model generates higher-quality synthetic data than other state-of-the-art approaches to gene expression data generation: a model based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), and the structured data generation models CTGAN and TVAE.
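
For orientation, the standard CVAE training objective (the conditional evidence lower bound) is written out below. Here x denotes an expression profile, c a conditioning label (assumed to be something like tissue or sample type) and z the latent variable. This is the textbook formulation, not necessarily the exact loss used in the paper.

```latex
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\big[ \log p_\theta(x \mid z, c) \big]
  - D_{\mathrm{KL}}\big( q_\phi(z \mid x, c) \,\|\, p(z \mid c) \big)
```

The first term rewards accurate reconstruction of x given the latent code and the condition. The second term keeps the encoder's posterior close to the prior, which is commonly chosen as a standard normal distribution.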

Simple Result

  • Test set: 2,745 samples, 969 L1000 landmark genes.

    • Gamma score: 0.98

  • Comparison on datasets such as the one used in [Ramon Viñas, Helena Andrés-Terré, Pietro Liò, Kevin Bryson, Adversarial generation of gene expression data, Bioinformatics, Volume 38, Issue 3, February 2022, Pages 730–737]

    • Gamma score: 0.96
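
The README does not define the Gamma score. One plausible reading, in the spirit of the evaluation in Viñas et al. (2022), is the Pearson correlation between the upper-triangular entries of the gene-gene correlation-distance matrices of the real and synthetic data. The sketch below implements that reading under this assumption. The function names are hypothetical, and the toy arrays are random numbers rather than the reported results.

```python
import numpy as np


def correlation_distance(x):
    """Gene-by-gene distance matrix (1 - Pearson correlation).

    x has shape (n_samples, n_genes); columns are treated as genes.
    """
    return 1.0 - np.corrcoef(x, rowvar=False)


def gamma_distance_score(real, synthetic):
    """Pearson correlation between the upper triangles of two distance matrices.

    Assumption: this is one possible definition of a "Gamma score";
    the repository may compute it differently.
    """
    d_real = correlation_distance(real)
    d_syn = correlation_distance(synthetic)
    upper = np.triu_indices_from(d_real, k=1)  # strictly above the diagonal
    return float(np.corrcoef(d_real[upper], d_syn[upper])[0, 1])


# Toy data: correlated "genes" built from two latent factors (not real data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
real = latent @ mixing + 0.1 * rng.normal(size=(200, 6))
synthetic = real + 0.05 * rng.normal(size=(200, 6))  # stand-in for generated samples

score = gamma_distance_score(real, synthetic)
print(f"gamma (toy data): {score:.3f}")
```

A score close to 1 means the synthetic data preserve the pairwise gene-gene relationships of the real data.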

Dataset

This study uses GTEx and TCGA samples from 15 common tissues (lung, breast, kidney, thyroid, colon, stomach, prostate, salivary gland, liver, esophagus muscularis, esophagus mucosa, esophagus gastroesophageal junction, bladder, uterus, and cervix). We followed the pipeline described by Wang et al. (2018) to integrate the data and correct for batch effects. We then selected the 969 genes shared with the L1000 landmark gene set, yielding a dataset of 9,146 samples and 969 genes.

  • GTEx (Genotype-Tissue Expression) dataset
  • TCGA (The Cancer Genome Atlas) dataset
  • L1000 landmark gene set
  • RNA-seq (human transcriptomics) dataset (9,147 samples and 18,154 genes)
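
Restricting the full expression matrix to the landmark genes amounts to a column selection. The sketch below shows one way to do it with NumPy. The function name, the gene identifiers and the toy matrix are hypothetical placeholders, not the repository's code or data.

```python
import numpy as np


def select_common_genes(expr, gene_names, landmark_genes):
    """Keep only the columns of expr whose gene name is in landmark_genes.

    expr: array of shape (n_samples, n_genes)
    gene_names: one name per column of expr
    landmark_genes: iterable of gene names to retain
    """
    gene_names = np.asarray(gene_names)
    keep = np.isin(gene_names, list(landmark_genes))  # boolean column mask
    return expr[:, keep], gene_names[keep]


# Toy example with placeholder gene names.
expr = np.arange(12, dtype=float).reshape(3, 4)
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
landmark = {"GENE_B", "GENE_D", "GENE_X"}  # GENE_X is absent from expr

subset, kept = select_common_genes(expr, gene_names, landmark)
print(subset.shape, list(kept))
```

Landmark genes that are missing from the expression matrix are ignored, so the result contains only the genes common to both sources.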

Install dependencies

  • torch >= 1.12.1
  • python >= 3.7
  • Python packages
    • umap-learn >= 0.5.3
    • scikit-learn >= 1.1.1

Usage

The 969 landmark genes were preprocessed with log2(expression_value + 1) followed by standardization. Sample data for training and testing can be downloaded from the Google Drive link below.

npy_data - Google Drive
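
To apply the same preprocessing to your own expression matrix, a minimal sketch could look like the following. It assumes that "standardization" means per-gene z-scoring (zero mean and unit variance for each column), which the README does not spell out. The function name and the toy input are hypothetical.

```python
import numpy as np


def preprocess_expression(expr, eps=1e-8):
    """Apply log2(x + 1), then standardize each gene (column).

    expr: non-negative array of shape (n_samples, n_genes).
    Assumption: standardization is done per gene; eps avoids division by zero
    for genes with constant expression.
    """
    expr = np.asarray(expr, dtype=np.float64)
    logged = np.log2(expr + 1.0)
    mean = logged.mean(axis=0, keepdims=True)
    std = logged.std(axis=0, keepdims=True)
    return (logged - mean) / (std + eps)


# Toy non-negative values standing in for expression levels (not real data).
rng = np.random.default_rng(0)
toy = rng.gamma(shape=2.0, scale=50.0, size=(8, 5))
scaled = preprocess_expression(toy)
print(scaled.shape)
```

In a train/test setting, the mean and standard deviation would normally be computed on the training split only and reused for the test split.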

Model Train

python train.py

Evaluation Notebook

Please check the evaluation.ipynb file.

Contact

If you have any questions or problems, please send an email to [email protected]
