methylmix's Issues

Stuck at Finding nonparametric adjustments

Hello, MethylMix is stuck at:

Done cluster 190189 
	Batch correction for the cancer samples.
Removing 4 samples because their batches are too small.
Reading Sample Information File
Reading Expression Data File
Found 18 batches
Found 0 covariate(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding nonparametric adjustments

for more than 30 hours. Is something wrong? My code is:

library(MethylMix)
library(doParallel)

cancerSite <- "LIHC"

targetDirectory <- getwd()


#Downloading methylation data
METdirectories <- Download_DNAmethylation(cancerSite, targetDirectory)

# Processing methylation data
METProcessedData <- Preprocess_DNAmethylation(cancerSite, METdirectories)

# Saving methylation processed data
saveRDS(METProcessedData, file = paste0(targetDirectory, "MET_", cancerSite, "_Processed.rds"))

# Downloading gene expression data
GEdirectories <- Download_GeneExpression(cancerSite, targetDirectory)

# Processing gene expression data
GEProcessedData <- Preprocess_GeneExpression(cancerSite, GEdirectories)

# Saving gene expression processed data
saveRDS(GEProcessedData, file = paste0(targetDirectory, "GE_", cancerSite, "_Processed.rds"))

# Clustering probes to genes methylation data
METProcessedData <- readRDS(paste0(targetDirectory, "MET_", cancerSite, "_Processed.rds"))
res <- ClusterProbes(METProcessedData[[1]], METProcessedData[[2]])

# Putting everything together in one file
toSave <- list(METcancer = res[[1]], METnormal = res[[2]], GEcancer = GEProcessedData[[1]], GEnormal = GEProcessedData[[2]], ProbeMapping = res$ProbeMapping)
saveRDS(toSave, file = paste0(targetDirectory, "data_", cancerSite, ".rds"))
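
A side note on the paths in the script above, separate from the hang itself: getwd() returns the working directory without a trailing separator, so paste0(targetDirectory, "MET_", ...) glues the file name directly onto the directory name. A minimal sketch of the difference, using file.path() from base R (the directory below is a made-up example, not taken from the report):

```r
# Hypothetical directory, used only for illustration; getwd() likewise
# returns a path without a trailing slash.
targetDirectory <- "/home/user/project"
cancerSite <- "LIHC"

# paste0() inserts no separator between the directory and the file name.
glued <- paste0(targetDirectory, "MET_", cancerSite, "_Processed.rds")

# file.path() joins its components with "/".
joined <- file.path(targetDirectory, paste0("MET_", cancerSite, "_Processed.rds"))

print(glued)   # "/home/user/projectMET_LIHC_Processed.rds"
print(joined)  # "/home/user/project/MET_LIHC_Processed.rds"
```

Setting targetDirectory <- paste0(getwd(), "/") would have the same effect while keeping the paste0() calls unchanged.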

Error with COAD data

I am running MethylMix as follows

cancerSite <- "COAD"
targetDirectory <- paste0(getwd(), "/")

library(doParallel)
no_cores <- detectCores()-2
cl <- makeCluster(no_cores)
registerDoParallel(cl)
GetData(cancerSite, targetDirectory)
stopCluster(cl)

I get the following error

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
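
For reference, svd() stops with this message whenever its input matrix contains NA, NaN, or Inf. A minimal sketch of how one might locate such values in a genes-by-samples matrix before any decomposition is run; the toy matrix and the helper name report_nonfinite are illustrative and not part of MethylMix, and the sketch does not show where in GetData() the offending values arise:

```r
# Illustrative helper (not a MethylMix function): summarise non-finite entries.
report_nonfinite <- function(x) {
  bad <- !is.finite(x)               # TRUE for NA, NaN, Inf and -Inf
  list(
    total = sum(bad),
    rows  = rownames(x)[rowSums(bad) > 0],
    cols  = colnames(x)[colSums(bad) > 0]
  )
}

# Toy matrix with a single missing value (geneA, sample s2).
x <- matrix(c(0.1, 0.5, NA, 0.9, 0.3, 0.7), nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

found <- report_nonfinite(x)
print(found)

# svd() refuses such input; capture the message instead of stopping.
msg <- tryCatch(svd(x), error = function(e) conditionMessage(e))
print(msg)
```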

How do you derive ProbeAnnotation in MethylMix packages?

Hi, I came across the MethylMix algorithm and want to use it to combine the methylation values of multiple probes mapping to one gene into a gene-level methylation value. MethylMix clusters the probe-level methylation values using a probe annotation data set. I have read the MethylMix citation articles, 1 and 2, but I could not find where the annotation data came from, so I checked it with the code below:

library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(MethylMix)
library(tidyverse)

probe_with_one_more_gene <- Other$UCSC_RefGene_Name %>%
  purrr::map( ~unique(str_split(., pattern = ";")[[1]]) ) %>%
  purrr::map_lgl(
    ~length(.) >= 2
  ) %>%
  {rownames(Other)[.]}


anno_diff <- inner_join(
  ProbeAnnotation,
  as_tibble(Other, rownames = "ILMNID") %>%
    dplyr::select(ILMNID, UCSC_RefGene_Name, UCSC_RefGene_Group),
  by = "ILMNID"
) %>%
  dplyr::filter(
    ILMNID %in% probe_with_one_more_gene
  )


head(anno_diff)

Here is the output:

      ILMNID   GENESYMBOL   UCSC_RefGene_Name UCSC_RefGene_Group
1 cg00050873        TSPY4      TSPY4;FAM197Y2       Body;TSS1500
2 cg00061679         DAZ1      DAZ1;DAZ4;DAZ4     Body;Body;Body
3 cg00311963 LOC100101121 LOC100101121;TTTY23    TSS1500;TSS1500
4 cg00335297       RBMY1F      RBMY1F;RBMY2FP    TSS1500;TSS1500
5 cg00576139 LOC100101115 LOC100101115;TTTY21          Body;Body
6 cg00903245        TSPY4      TSPY4;FAM197Y2          Body;Body

Could you explain how a single probe ID that maps to multiple gene names is handled? It seems ProbeAnnotation just takes the first one.
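
To make the two obvious options concrete, here is a minimal base-R sketch on a toy annotation with the same shape as the output above. Option 1 keeps only the first gene symbol per probe, which reproduces the pattern seen in the GENESYMBOL column; option 2 keeps one row per probe-gene pair. Whether ProbeAnnotation was actually built the first way is not stated anywhere in this thread, so treat that as an assumption.

```r
# Toy annotation in the same shape as the output above (two probes only).
anno <- data.frame(
  ILMNID = c("cg00050873", "cg00061679"),
  UCSC_RefGene_Name = c("TSPY4;FAM197Y2", "DAZ1;DAZ4;DAZ4"),
  stringsAsFactors = FALSE
)

# Split on ";" and drop repeated symbols within each probe.
genes <- lapply(strsplit(anno$UCSC_RefGene_Name, ";", fixed = TRUE), unique)

# Option 1: keep only the first symbol per probe.
first_gene <- vapply(genes, `[`, character(1), 1)

# Option 2: one row per probe-gene pair, so a probe counts for every gene.
long <- data.frame(
  ILMNID = rep(anno$ILMNID, lengths(genes)),
  GENESYMBOL = unlist(genes),
  stringsAsFactors = FALSE
)

print(first_gene)  # "TSPY4" "DAZ1"
print(long)        # 4 rows: TSPY4, FAM197Y2, DAZ1, DAZ4
```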

Error downloading the data

Hi, I followed the documentation to download the data from TCGA using the lines of code below

cancerSite <- "OV"
targetDirectory <- paste0(getwd(), "/")
GetData(cancerSite, targetDirectory)

But I get the error

Downloading methylation data for: OV 
Searching 27k MET data for: OV 
	There is no Merge_methylation__humanmethylation27 data for OV 
Searching 450k MET data for: OV 
	There is no Merge_methylation__humanmethylation450 data for OV 
Processing methylation data for: OV 
	Processing data for OV 
	Only 450k samples.
Saving methylation processed data for: OV 
Downloading gene expression data for: OV 
Searching MA data for: OV 
	There is no Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__unc_lowess_normalization_gene_level__data data for OV 
Processing gene expression data for: OV 
Error in dir(MAdirectories) : invalid 'path' argument

It seems like it does not find the data at the location it's pointing to.

Even downloading the data individually with the Download_Data, Download_DNAmethylation or Download_GeneExpression functions gives me the same error.

Do you know how to solve this issue?
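
The last line of the log is the error that dir() raises when its argument is not a character vector, for example NULL. Given that every search step above reports that no data was found, it seems plausible (this is an assumption, not something the log confirms) that the download step returned nothing and the processing step then tried to list a directory that was never set. A minimal sketch of a guard around dir(); the helper name safe_dir is illustrative and not part of MethylMix:

```r
# Illustrative helper: list a directory only if the path is usable.
safe_dir <- function(path) {
  if (!is.character(path) || length(path) == 0 || !all(dir.exists(path))) {
    return(character(0))
  }
  dir(path)
}

listing_null  <- safe_dir(NULL)        # no error, empty result
listing_tmp   <- safe_dir(tempdir())   # an existing directory is listed normally

print(length(listing_null))  # 0
```

A guard like this only makes the failure easier to read; it does not make the missing data available.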

The question of MethylMix

When I use MethylMix, I run into a problem. My data look like this:

head(GEcancer)
        TCGA.M7.A71Y.01A TCGA.EJ.5502.01A TCGA.EJ.7784.01A TCGA.VP.A879.01A
ELMO2           3057.710         3894.069         4035.237         4114.755
RPS11          92205.650        48892.200        65484.620        67174.120
CREB3L1        20421.450        20975.050        11748.700        18376.230
PNMA1           3558.358         2244.270         1224.140         5972.402
MMP2            7604.848         6796.484         3001.212         3926.834
head(METcancer[,1:4])
        TCGA.M7.A71Y.01A TCGA.EJ.5502.01A TCGA.EJ.7784.01A TCGA.VP.A879.01A
ELMO2           2.929522         2.483229         2.264610         2.313860
RPS11           2.929522         2.253174         1.721875         2.154875
CREB3L1         2.929522         7.082727         7.819387         6.668843
PNMA1           2.929522         1.151478         1.238564         1.212071
MMP2            2.929522         8.058874        10.746947         8.379230

head(METnormal[,1:4])
        TCGA.HC.7819.11A TCGA.G9.6356.11A TCGA.G9.6365.11A TCGA.EJ.7785.11A
ELMO2           2.683316        2.4848083        2.6970321        2.5897725
RPS11           2.278819        2.8515142        2.9640636        2.7447778
CREB3L1         5.304994        5.8204205        5.5262880        5.2994310
PNMA1           1.184480        0.8888437        0.7018168        0.9949657
MMP2            6.104484        6.4025733        6.5531913        6.7392151

When I run MethylMix, I get an error:

MethylMixResults <- MethylMix(METcancer,GEcancer,METnormal)

Found 179 samples with both methylation and expression data.
Correlating methylation data with gene expression...

Found 2 transcriptionally predictive genes.

Starting Beta mixture modeling.
Running Beta mixture model on 2 genes and on 179 samples.
ELMO2 : 1 component is best.
PNMA1 :  2  components are best.
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

What should I do about it? Thank you very much!
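
Two things stand out in the data printed above, independent of the dimnames error: the METcancer and METnormal values exceed 1, whereas a beta mixture model works on proportions between 0 and 1, and the first cancer sample has the identical value 2.929522 for every gene shown. A minimal sketch of checks for both, on a toy matrix that copies four of the values above; whether either point causes the error is not established here.

```r
# Toy matrix copied from the first two samples of METcancer above.
met <- matrix(c(2.929522, 2.929522, 2.483229, 7.082727), nrow = 2,
              dimnames = list(c("ELMO2", "CREB3L1"),
                              c("TCGA.M7.A71Y.01A", "TCGA.EJ.5502.01A")))

# Genes with at least one value outside [0, 1].
out_of_range <- rownames(met)[rowSums(met < 0 | met > 1, na.rm = TRUE) > 0]

# Samples whose value is identical for every gene.
constant_samples <- colnames(met)[apply(met, 2, function(v) length(unique(v)) == 1)]

print(out_of_range)      # "ELMO2" "CREB3L1"
print(constant_samples)  # "TCGA.M7.A71Y.01A"
```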

how to add batches to matrix?

Hello,

METcancer = matrix(data = methylation_data, nrow = nb_of_genes, ncol = nb_of_samples)
METnormal = matrix(data = methylation_data, nrow = nb_of_genes, ncol = nb_of_samples)
GEcancer = matrix(data = expression_data, nrow = nb_of_genes, ncol = nb_of_samples)
ClusterProbes(MET_Cancer, MET_Normal, CorThreshold = 0.4)

If the data contains batches, the user must provide numeric batch data within the matrices. MethylMix can be applied to all Illumina arrays, including the newly released EPIC platform, and to any array that outputs beta values. At the moment there are no restrictions on using sequencing-based methylation data as input, provided the data is formatted as proportions. However, as mixture modeling is computationally expensive, MethylMix will require more time to finish.

I don't understand this part. Can you explain how I am supposed to put the batch information into the matrix? Do you mean add a new row like this?

image

Thanks.

Task failed - subscript out of bounds

    library(MethylMix)
    library(doParallel)

    cl <- makeCluster(10)
    registerDoParallel(cl)
    MethylMixResults <- MethylMix(METcancer=KIRP.meth, GEcancer=KIRP.mrna, METnormal=normal.meth)
    stopCluster(cl)

Traceback:

    Found 173 samples with both methylation and expression data.
    Correlating methylation data with gene expression...
    Error in { : task 27 failed - "subscript out of bounds"

Input:

    > dim(KIRP.meth)
    [1] 15303   173
    > dim(KIRP.mrna)
    [1] 15303   173
    > dim(normal.meth)
    [1] 15303    45
    > dput(KIRP.meth[1:5,1:5])
    structure(c(0.600779057354644, 0.655777740723865, 0.815910290849851,
    0.910945855401136, 0.134190006466966, 0.37318608461713, 0.49246343888403,
    0.794749348441249, 0.881573960745319, 0.0964743163693759, 0.492598954289208,
    0.49246343888403, 0.640456271154751, 0.923089664076206, 0.0793144655277971,
    0.409032309692879, 0.49246343888403, 0.846665808683809, 0.887081977280946,
    0.0992061641604958, 0.248944186060665, 0.682773900167259, 0.862532922981447,
    0.931424455647269, 0.0554991951237877), dim = c(5L, 5L), dimnames = list(
        c("A1BG", "A1CF", "A2M", "A2ML1", "A4GALT"), c("TCGA.2K.A9WE.01",
        "TCGA.2Z.A9J1.01", "TCGA.2Z.A9J3.01", "TCGA.2Z.A9J6.01",
        "TCGA.2Z.A9J7.01")))
    > dput(KIRP.mrna[1:5,1:5])
    structure(c(2.59432624081245, 0.954242509439325, 4.60945801720877,
    0, 3.41229250932305, 1.84453930212901, 1.88081359228079, 4.04178850233385,
    0, 3.67403400043125, 1.86272752831797, 1.46239799789896, 3.50267151256675,
    0, 3.44232295574557, 1.85381984585676, 2.30102999566398, 3.92471233053849,
    0, 3.5585885831082, 2.37145578191302, 1, 3.57821043337005, 0.698970004336019,
    3.4679039465228), dim = c(5L, 5L), dimnames = list(c("A1BG",
    "A1CF", "A2M", "A2ML1", "A4GALT"), c("TCGA.2K.A9WE.01", "TCGA.2Z.A9J1.01",
    "TCGA.2Z.A9J3.01", "TCGA.2Z.A9J6.01", "TCGA.2Z.A9J7.01")))
    > dput(normal.meth[1:5,1:5])
    structure(c(0.349328212624544, 0.64070562208869, 0.827770814068351,
    0.905037869297127, 0.0927688497941891, 0.386231521358919, 0.825015074989356,
    0.772426622237963, 0.895876334199358, 0.151047955966834, 0.36338160144188,
    0.637276378504495, 0.806073475591881, 0.888988587975131, 0.0689215903518559,
    0.304882343721657, 0.650016439361786, 0.809938291034322, 0.880177371770518,
    0.0705074131312755, 0.452569562949796, 0.59748205175223, 0.852044340187524,
    0.904803008777971, 0.0872863750860763), dim = c(5L, 5L), dimnames = list(
        c("A1BG", "A1CF", "A2M", "A2ML1", "A4GALT"), c("TCGA.A4.7288.11",
        "TCGA.BQ.5880.11", "TCGA.BQ.5879.11", "TCGA.BQ.5883.11",
        "TCGA.BQ.7055.11")))

Thank you!
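
A minimal sketch of sanity checks one might run on the three inputs before calling MethylMix(). A "subscript out of bounds" inside a parallel task can arise when a row or column name is looked up in a matrix that does not contain it, but whether that is the cause here is not known. The helper check_inputs and the toy matrices are illustrative and not part of MethylMix.

```r
# Illustrative helper: compare dimnames across the three input matrices.
check_inputs <- function(METcancer, GEcancer, METnormal) {
  list(
    shared_genes        = length(intersect(rownames(METcancer), rownames(GEcancer))),
    shared_samples      = length(intersect(colnames(METcancer), colnames(GEcancer))),
    genes_not_in_normal = setdiff(rownames(METcancer), rownames(METnormal)),
    duplicated_genes    = rownames(METcancer)[duplicated(rownames(METcancer))],
    any_missing         = anyNA(METcancer) || anyNA(GEcancer) || anyNA(METnormal)
  )
}

# Toy inputs: two genes, two cancer samples, one normal sample.
genes <- c("A1BG", "A1CF")
met_c <- matrix(c(0.6, 0.7, 0.4, 0.5), 2, dimnames = list(genes, c("s1", "s2")))
ge_c  <- matrix(c(2.6, 1.0, 1.8, 1.9), 2, dimnames = list(genes, c("s1", "s2")))
met_n <- matrix(c(0.3, 0.6), 2, dimnames = list(c("A1BG", "A2M"), "n1"))

checks <- check_inputs(met_c, ge_c, met_n)
print(checks)
```

In the toy data, A1CF is reported under genes_not_in_normal because the normal matrix carries A2M instead.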

Development plans for MethylMix (and version difference between bioconductor- and github-repo)

Dear MethylMix-Team!

Thanks a lot for your great project! I'm currently using MethylMix in my thesis and am very interested in contributing (I'll open separate issues for that).

But I'm unsure which is the most current version, since the one on https://bioconductor.org/packages/release/bioc/html/MethylMix.html is 2.18.0. When I run

BiocManager::install("MethylMix")
library(MethylMix)
packageVersion("MethylMix")

the result is also 2.18.0. When I clone the repository from Bioconductor using git clone https://git.bioconductor.org/packages/MethylMix, I see that version 2.19.0 is mentioned in the DESCRIPTION file.

But when I clone this repository from https://github.com/gevaertlab/MethylMix and look into its DESCRIPTION file, it states Version: 2.11.1. I also noticed that the last update was in February 2019 and that the last commit is not part of the master branch on Bioconductor.

Therefore I'd like to ask whether I should use this repository (and version) for submitting pull requests and, if not, where development is currently happening.

Best wishes, Alex
(life science PhD student from Vienna)
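
For context, Bioconductor numbers packages as x.y.z, where an even y marks a release-branch version and an odd y marks the devel branch. That would be consistent with 2.18.0 coming from the release and 2.19.0 appearing in a fresh clone of the Bioconductor git repository. A minimal sketch for comparing the version recorded in a cloned DESCRIPTION file with another version; a temporary file stands in for the clone here, so the path is an assumption of the example:

```r
# Stand-in for the DESCRIPTION file of a cloned repository.
desc <- tempfile()
writeLines(c("Package: MethylMix", "Version: 2.11.1"), desc)

# read.dcf() returns a character matrix with one column per requested field.
cloned  <- package_version(read.dcf(desc, fields = "Version")[1, 1])
release <- package_version("2.18.0")

is_older <- cloned < release
print(is_older)  # TRUE
```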

file(name, "wb") : cannot open the connection

Hi, I tried to run the package as described in the documentation, but I got an error. Could you kindly advise me? Thank you.

Code: GetData(cancerSite, targetDirectory)

Error in file(name, "wb") : cannot open the connection
6. file(name, "wb")
5. untar2(tarfile, files, list, exdir)
4. untar(nameForDownloadedFileFullPath, exdir = saveDir)
3. get_firehoseData(downloadData, TargetDirectory, CancerSite, dataType, dataFileTag)
2. Download_DNAmethylation(cancerSite, targetDirectory, TRUE)
1. GetData(cancerSite, targetDirectory)

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252
[3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
[5] LC_TIME=English_Singapore.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] doParallel_1.0.14 iterators_1.0.10 foreach_1.4.4
[4] MethylMix_2.10.2 BiocInstaller_1.30.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 codetools_0.2-15 crayon_1.3.4 bitops_1.0-6
[5] grid_3.5.1 plyr_1.8.4 gtable_0.2.0 scales_1.0.0
[9] ggplot2_3.1.0 pillar_1.3.0 rlang_0.3.0.1 lazyeval_0.2.1
[13] limma_3.36.5 tools_3.5.1 RCurl_1.95-4.11 munsell_0.5.0
[17] compiler_3.5.1 colorspace_1.3-2 tibble_1.4.2
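
The call file(name, "wb") inside untar() fails when the file being extracted cannot be created, for example because the target directory is missing or not writable or, on Windows, because the full path exceeds the classic 260-character limit. Which of these applies here cannot be told from the traceback. A minimal sketch of pre-flight checks on the target directory; the helper check_target_dir and the example file name are illustrative and not part of MethylMix:

```r
# Illustrative helper: basic checks on a directory that files will be extracted into.
check_target_dir <- function(dir, example_file = "") {
  full <- file.path(dir, example_file)
  list(
    exists      = dir.exists(dir),
    writable    = unname(file.access(dir, mode = 2) == 0),  # 0 means access granted
    path_length = nchar(full),
    over_260    = nchar(full) > 260
  )
}

res <- check_target_dir(tempdir(), "some_extracted_file.txt")
print(res)
```

If path length turns out to be the issue, pointing targetDirectory at a short path close to the drive root is a common workaround.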
