An R-package implementing de novo deconvolution method that allows identification of individual cell types in the mixture without knowing either cell types proportions or their corresponding cell-specific markers.
The basic idea of ClusDec is to find clusters of genes with linear expression profiles that then will be used as putative signatures for Digital Sortiing Algorithm (DSA) described in (Zhong et al. BMC Bioinformatics 2013, 14:89).
The basic workflow consists several steps: preprocessing (including clustering), evaluating accuracies of combination of clusters and then using best combination as putative signatures for DSA.
library(devtools)
install_github("ctlab/ClusDec")
Loading library
library(clusdec)
Loading example data:
data("datasetLiverBrainLung")
data("proportionsLiverBrainLung")
We don't take first 9 samples: these are pure samples, and we only leave mixed samples.
mixedGed <- datasetLiverBrainLung[, 10:42]
mixedProportions <- proportionsLiverBrainLung[, 10:42]
head(mixedGed[, 1:6])
head(mixedProportions[, 1:6])
Data looks like
## GSM495218 GSM495219 GSM495220 GSM495221 GSM495222 GSM495223
## A1bg 10.267250 10.374572 10.322145 13.064351 13.016001 13.018780
## A1cf 4.059308 4.458606 4.359954 7.676283 7.696134 7.653000
## A2bp1 9.490740 9.566670 9.422953 7.792056 7.831022 7.892111
## A2ld1 7.655956 7.912126 7.948867 8.187795 8.454980 8.261322
## A2m 6.127628 6.046218 6.080013 6.539769 6.443252 6.373277
## A3galt2 5.126264 5.325947 5.224698 4.968571 4.663900 4.890523
## GSM495218 GSM495219 GSM495220 GSM495221 GSM495222 GSM495223
## Liver 0.05 0.05 0.05 0.70 0.70 0.70
## Brain 0.25 0.25 0.25 0.05 0.05 0.05
## Lung 0.70 0.70 0.70 0.25 0.25 0.25
Now lets run ClusDec and perform deconvolution:
set.seed(31)
lusteredGed <- preprocessDataset(mixedGed, k=5)
accuracy <- clusdecAccuracy(clusteredGed, 3)
results <- chooseBest(clusteredGed, accuracy)
head(results$H[, 1:6])
GSM495218 GSM495219 GSM495220 GSM495221 GSM495222 GSM495223
Cluster 1 0.1407633 0.1417054 0.1426791 0.5317571 0.5254577 0.5288620
Cluster 2 0.4741810 0.4675254 0.4708257 0.2650922 0.2688183 0.2654242
Cluster 3 0.3811201 0.3759724 0.3755612 0.1922302 0.1916823 0.1883313
ClusDec chooses first three clusters as putative signatures for performing deconvolution with DSA. Now we can compare these estimated results with acutal proporions.
plotProportions(results$H, mixedProportions[c(1, 3, 2), ],
pnames=c("Estimated", "Actual"))