TD-multiOmics

Tensor-based unsupervised feature extraction was applied to the multi-omics data that was used to evaluate the performance of the DIABLO algorithm implemented in the mixOmics package.

Introduction

Recently, I proposed a multi-view data analysis strategy using tensor decomposition based unsupervised feature extraction (paper) and applied it to, among other things, multi-omics data. However, because "multi-omics" does not appear in the title, I feel that the paper has lost out in appeal to the paper on the recent mixOmics package, which emphasizes this point. Therefore, both to compare performance and to promote my own method, I apply variable selection for multi-view data by unsupervised learning using tensor decomposition to the data set on which mixOmics was evaluated.

About mixOmics

As its name suggests, mixOmics is an R package specialized for analyzing multi-omics data. It is published on CRAN and can be installed in the usual way with the install.packages() command. It includes various methods, but here we compare performance with the method named DIABLO, which its authors regard as the most advanced. Figure 1 in the preprint is the easiest way to understand DIABLO. Roughly, it creates variables by multiplying the multi-omics data sets pairwise and summing the results, and then uses those variables as predictors in a supervised learning method that discriminates between multiple classes. A proper explanation using mathematical expressions is given in detail in Sec. 4 of Text S1 of the mixOmics paper. Since a detailed explanation of the DIABLO method is not the purpose of this document, please refer to the papers linked here for details.

DIABLO: A case study

The above-mentioned preprint and the DIABLO execution example page describe an integrated analysis of mRNA, miRNA, and proteomics data from TCGA.

The dimensions of this data are displayed on the DIABLO execution example page as

## $mRNA
## [1] 150 200
##
## $miRNA
## [1] 150 184
##
## $proteomics
## [1] 150 142

These show that there are 150 samples with 200 mRNA, 184 miRNA, and 142 proteomics measurements. The 150 samples consist of three classes,

## Basal Her2 LumA
## 45 30 75

as shown in the DIABLO execution example.

Figure 1

The figure above shows the classification performance as a function of the number of DIABLO components used. The vertical axis is the error rate; sufficient performance is obtained with the first two components. The embedding of all 150 samples in the space spanned by these two components is shown below.

Figure 2

It is rather obvious that the three subclasses are well separated on this plane. In addition, DIABLO has a function that selects important features.

Figure 3

The above figure is a heatmap of the variables selected by DIABLO. Rows are samples and columns are the (selected) omics features. Once variable selection has been performed on the omics data, hierarchical clustering alone is sufficient to distinguish the three classes, without having to carry out a multivariate analysis each time.

Applying multi-view data analysis using tensor decomposition based unsupervised feature extraction to multi-omics data

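Now, let us apply multi-view data analysis with variable selection by unsupervised learning using tensor decomposition to the above data set. In my paper, this corresponds to Type I Case I. Here, $x^{\mathrm{mRNA}}_{ij}$ is the expression level of the $i$th mRNA in the $j$th sample, $x^{\mathrm{miRNA}}_{kj}$ is the expression level of the $k$th miRNA in the $j$th sample, and $x^{\mathrm{prot}}_{mj}$ is the $m$th proteomics measurement in the $j$th sample; then the tensor is defined as

$$x_{jikm} = x^{\mathrm{mRNA}}_{ij} \, x^{\mathrm{miRNA}}_{kj} \, x^{\mathrm{prot}}_{mj}$$

After applying HOSVD to $x_{jikm}$, we get

$$x_{jikm} = \sum_{l_1, l_2, l_3, l_4} G(l_1, l_2, l_3, l_4) \, u^{(j)}_{l_1 j} \, u^{(i)}_{l_2 i} \, u^{(k)}_{l_3 k} \, u^{(m)}_{l_4 m}$$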
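The construction described here can be sketched in a few lines of NumPy. This is only an illustration on small random matrices, not the TCGA data and not the repository's tensor.R: each omics matrix is assumed to be features × samples, the product tensor is taken over the shared sample index, and HOSVD is computed from the SVD of each mode unfolding. All function names, variable names, and toy sizes are assumptions made for the example.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor`: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into axis `mode` of `tensor` (n-mode product)."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Return (core, factors); factors[n] holds the mode-n singular vectors."""
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
               for n in range(tensor.ndim)]
    core = tensor
    for n, u in enumerate(factors):
        core = mode_product(core, u.T, n)
    return core, factors

# Toy stand-ins for the omics matrices (features x samples); random, not TCGA.
rng = np.random.default_rng(0)
n_samples = 6
mrna = rng.normal(size=(4, n_samples))
mirna = rng.normal(size=(3, n_samples))
prot = rng.normal(size=(2, n_samples))

# Product tensor over the shared sample index j,
# with axes ordered (sample, mRNA, miRNA, proteomics).
x = np.einsum("ij,kj,mj->jikm", mrna, mirna, prot)

core, factors = hosvd(x)

# Rebuild the tensor from the core and the factor matrices as a sanity check.
rebuilt = core
for n, u in enumerate(factors):
    rebuilt = mode_product(rebuilt, u, n)

# Two sample singular vectors give a 2-D embedding of the samples.
embedding = factors[0][:, :2]
print(x.shape, core.shape, embedding.shape)
```

On the real data the same idea applies, but the dense tensor is far larger than this toy one, so memory becomes the limiting factor.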

plot.jpg

The above figure shows the embedding of the 150 samples, composed of three subclasses, into the plane spanned by two singular vectors attributed to the samples. The 150 samples are clearly well separated into the three subclasses. The confusion table (rows: prediction, columns: true classes) obtained by linear discriminant analysis using these two components is

      Basal Her2 LumA
Basal    42    4    0
Her2      2   25    2
LumA      1    1   73
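As a quick check, the overall accuracy and the per-class recall can be read off the confusion table above. The sketch below is in Python rather than the repository's R, and the variable names are illustrative only.

```python
# Confusion table from the text: rows are predictions, columns are true classes.
classes = ["Basal", "Her2", "LumA"]
confusion = [
    [42, 4, 0],   # predicted Basal
    [2, 25, 2],   # predicted Her2
    [1, 1, 73],   # predicted LumA
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
accuracy = correct / total

# Column sums give the number of true samples per class.
true_counts = [sum(row[j] for row in confusion) for j in range(len(classes))]
recall = {c: confusion[j][j] / true_counts[j] for j, c in enumerate(classes)}

print(f"accuracy = {correct}/{total} = {accuracy:.3f}")
for c in classes:
    print(f"recall[{c}] = {recall[c]:.3f}")
```

The column sums reproduce the class sizes reported for the data set (45 Basal, 30 Her2, 75 LumA).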

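The accuracy is as large as about 0.93 (140 of 150 samples are classified correctly). Feature selection can be done using these two selected components. The core tensor elements are sorted by their absolute values for these two components, and the singular vectors associated with the elements having larger absolute values are selected.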

rank l1 l2 l3 l4  G(l1,l2,l3,l4)
   1  1  1  1  1     -407857.582
   2  1  1  4  4     -209720.615
   3  2  1  1  4      -20452.480
   4  2  1  3  1      -11677.505
   5  2  1  4  1      -10428.742
   6  2  1  2  1       10157.467
   7  1  1  2  1       -8973.774
   8  1  2  1  4        8360.976
   9  2  1  5  4       -6628.467
  10  1  1  3  4        6623.046

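The table above lists the top 10 ranked elements, from which the singular vectors used for feature selection are chosen. Assuming as the null hypothesis that the components of the selected singular vectors obey a multivariate Gaussian distribution, P-values are attributed to the features. The top 10 features of each omics type are then selected.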
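The ranking in the table is simply a sort by absolute value. A minimal Python sketch, using the ten values from the table (deliberately listed out of order here) and treating the four integer columns as an opaque index tuple, could look like this. It illustrates only the ranking step, not the P-value computation.

```python
# (index tuple, value) pairs copied from the table, in arbitrary order.
entries = [
    ((2, 1, 2, 1), 10157.467),
    ((1, 1, 3, 4), 6623.046),
    ((1, 1, 1, 1), -407857.582),
    ((2, 1, 5, 4), -6628.467),
    ((2, 1, 1, 4), -20452.480),
    ((1, 2, 1, 4), 8360.976),
    ((1, 1, 4, 4), -209720.615),
    ((2, 1, 4, 1), -10428.742),
    ((1, 1, 2, 1), -8973.774),
    ((2, 1, 3, 1), -11677.505),
]

# Sort by magnitude, largest first; the sign is kept but does not affect rank.
ranked = sorted(entries, key=lambda e: abs(e[1]), reverse=True)

for rank, (idx, value) in enumerate(ranked, start=1):
    print(rank, *idx, value)
```

The first two entries are larger in magnitude than the rest by roughly an order of magnitude, which is what makes a cutoff by absolute value meaningful here.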

heatmap.jpg

Rows are samples (black: Basal, red: Her2, green: LumA) and columns are omics features (blue: mRNA, pink: miRNA, cyan: proteomics). Comparably to DIABLO, features that allow hierarchical clustering to classify the three subclasses are well selected.

Discussion

DIABLO involves a fairly complicated calculation: you must create a design matrix yourself that specifies how to combine the multi-omics data, and it uses label information, that is, it is supervised learning. On the other hand, tensor decomposition is unsupervised learning, and what it does is simple. In fact, I have uploaded the R code as tensor.R in this repository, and it is almost anticlimactically simple. I find it a little hard to believe that this achieves the same performance as DIABLO. However, DIABLO also considers products between multi-omics data, so if linear discrimination performs well on them, this may mean that the two approaches do not point in very different directions.

In the tensor decomposition, the tensor becomes huge (in the present case, there are 150 × 200 × 184 × 142 elements), so a drawback is the computation time required. In fact, although DIABLO can be run on a laptop, the tensor decomposition cannot be performed without a server machine with dozens of gigabytes of memory. Nevertheless, I expect that tensor decomposition will be frequently used for multi-omics data analysis in the future.
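To make the size concrete, the element count and the memory needed for one dense copy can be estimated as follows. The sketch is in Python and assumes 8-byte double-precision values, which is what R uses for numeric data; the variable names are illustrative.

```python
import math

# Tensor dimensions quoted in the text: samples x mRNA x miRNA x proteomics.
dims = (150, 200, 184, 142)

n_elements = math.prod(dims)
bytes_per_value = 8  # assumption: 64-bit floating point values
dense_bytes = n_elements * bytes_per_value

print(f"{n_elements:,} elements")
print(f"{dense_bytes / 1e9:.2f} GB for one dense copy")
```

A single dense copy is already several gigabytes, and the unfoldings and intermediate results used during the decomposition need additional copies, which is consistent with the need for a machine with dozens of gigabytes of memory.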
