CVAE-RNA-seq

GitHub repository for "Conditional Variational Autoencoder-based Generative Model for Gene Expression Data Augmentation" | Paper | Code

Overview

Gene expression data are used in many kinds of studies, including the prediction of disease prognosis. However, collecting enough data is difficult because of cost constraints. In this paper, we propose a gene expression data generation model based on the Conditional Variational Autoencoder (CVAE). Our results show that the proposed model generates higher-quality synthetic data than two other state-of-the-art approaches: the model for gene expression data generation based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), and the structured data generation models CTGAN and TVAE.
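For orientation, the textbook conditional VAE objective is sketched below. The notation is introduced here and is not taken from the paper: x is an expression vector, c is a condition label (for example tissue type, which is an assumption), z is the latent variable, and the prior is assumed to be a standard normal that does not depend on c. The loss actually used in train.py may differ.

```latex
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x, c) \,\|\, p(z)\big),
\qquad p(z) = \mathcal{N}(0, I)
```

The first term rewards accurate reconstruction of x given z and c. The second keeps the approximate posterior close to the prior, so that new samples can be generated by drawing z from p(z) and decoding it together with a chosen c.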

Simple Result

Test set: 2,745 samples, 969 L1000 landmark genes.
- Gamma score: 0.98
For comparison, the model from [Ramon Viñas, Helena Andrés-Terré, Pietro Liò, Kevin Bryson, Adversarial generation of gene expression data, Bioinformatics, Volume 38, Issue 3, February 2022, Pages 730–737]:
- Gamma score: 0.96
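The Gamma score can be sketched in a few lines of NumPy. This assumes the score is the Pearson correlation between the upper-triangular entries of the gene-gene correlation matrices of the real and synthetic data, which is how we read the definition in Viñas et al.; the function name `gamma_score` and the toy data are hypothetical, and evaluation.ipynb may compute the score differently.

```python
import numpy as np


def gamma_score(real, synthetic):
    """Pearson correlation between the off-diagonal gene-gene correlations
    of two (samples x genes) arrays. Assumed definition, see the text above."""
    corr_real = np.corrcoef(real, rowvar=False)      # genes x genes
    corr_syn = np.corrcoef(synthetic, rowvar=False)  # genes x genes
    upper = np.triu_indices(corr_real.shape[0], k=1)  # entries above the diagonal
    return float(np.corrcoef(corr_real[upper], corr_syn[upper])[0, 1])


# Toy data only: two independent draws that share the same gene-gene structure.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
real = rng.normal(size=(200, 5)) @ mixing
synthetic = rng.normal(size=(200, 5)) @ mixing

print(gamma_score(real, real))       # identical inputs give a score of 1 (up to rounding)
print(gamma_score(real, synthetic))  # similar correlation structure gives a score near 1
```

A score close to 1 means the synthetic data reproduce the pairwise gene relationships of the real data; it says nothing about per-gene means or variances, so it is usually read alongside other checks.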

Dataset

In this study, we used GTEx and TCGA samples from 15 common tissues (lung, breast, kidney, thyroid, colon, stomach, prostate, salivary gland, liver, esophagus muscularis, esophagus mucosa, esophagus gastroesophageal junction, bladder, uterus, and cervix). We followed the pipeline described by Wang et al. (2018) to integrate the data and correct batch effects. We then selected the 969 genes shared with the L1000 landmark gene set, giving a dataset of 9,146 samples and 969 genes.

GTEx (Genotype-Tissue Expression) dataset
TCGA (The Cancer Genome Atlas) dataset
L1000 landmark genes
RNA-seq (human transcriptomics) dataset (9147 samples and 18154 genes)
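The gene selection step can be illustrated as a simple column filter. This is a minimal sketch, not the repository's code: `select_genes`, the gene names, and the toy matrix are all hypothetical, and it assumes the expression matrix is laid out as samples by genes with a parallel list of gene names.

```python
import numpy as np


def select_genes(expr, gene_names, landmark_genes):
    """Keep only the columns of expr whose gene name is in landmark_genes,
    preserving the column order of the expression matrix."""
    landmark = set(landmark_genes)
    keep = [i for i, name in enumerate(gene_names) if name in landmark]
    return expr[:, keep], [gene_names[i] for i in keep]


# Toy example with placeholder gene names (hypothetical).
expr = np.arange(12, dtype=float).reshape(3, 4)          # 3 samples x 4 genes
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
landmark_genes = ["GENE_D", "GENE_B", "GENE_X"]          # GENE_X is absent from expr

subset, kept = select_genes(expr, gene_names, landmark_genes)
print(kept, subset.shape)
```

Landmark genes that are missing from the expression matrix are silently skipped, which is why the shared set can be smaller than the full landmark list.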

Install dependencies

torch >= 1.12.1
python >= 3.7
Python packages
- umap-learn >= 0.5.3
- scikit-learn >= 1.1.1

Usage

The 969 landmark genes were preprocessed with a log2(expression_value + 1) transform followed by standardization. You can download sample data for training and testing from the Google Drive link below.

npy_data - Google Drive
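The preprocessing described above can be sketched as follows. This assumes standardization is applied per gene (column-wise) with statistics computed on the training data; the function name `preprocess` and the toy values are hypothetical, and the provided npy files are already preprocessed, so the snippet only illustrates the transform.

```python
import numpy as np


def preprocess(expr, mean=None, std=None):
    """Apply log2(x + 1), then standardize each gene (column).
    Pass the training mean/std to transform held-out data consistently."""
    logged = np.log2(expr + 1.0)
    if mean is None:
        mean = logged.mean(axis=0)
    if std is None:
        std = logged.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant genes
    return (logged - mean) / std, mean, std


# Toy expression values: 3 samples x 2 genes.
train = np.array([[0.0, 3.0], [1.0, 15.0], [3.0, 63.0]])
z, mean, std = preprocess(train)

# Reuse the training statistics for new samples.
test = np.array([[7.0, 31.0]])
z_test, _, _ = preprocess(test, mean=mean, std=std)
print(z)
print(z_test)
```

Reusing the training mean and standard deviation for test data keeps the two sets on the same scale and avoids leaking test statistics into training.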

Model Train

python train.py

Evaluation Notebook

Please check the evaluation.ipynb file.

Contact

If you have any questions or problems, please send an email to [email protected]

Repository: hyunsbong / cvae-rna-seq