Giter Club home page Giter Club logo

Comments (8)

sneumann avatar sneumann commented on June 16, 2024

Example script from Alan:

library(xcms)
library(faahKO)

cdfpath <- system.file("cdf", package = "faahKO")
list.files(cdfpath, recursive = TRUE)
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)

xset<-xcmsSet(cdffiles,profmethod = "bin", method="centWave", ppm=30, peakwidth=c(5,50), snthresh=10, prefilter=c(7,1000), integrate=1, mzdiff= -0.001, fitgauss=FALSE, scanrange= c(0,500),mzCenterFun="wMean",noise=300,sleep=0, verbose.columns=T,nSlaves=3);
xset2<-retcor(xset, method = "obiwarp",distFunc="cor_opt",profStep=1,plot="deviation",gapInit=0.8, gapExtend=3.3)
gxset2<-group(xset2,method="density", bw=8, minfrac = .5, minsamp = 2, mzwid = .01, max = 50, sleep = 0)
xset3<-fillPeaks(gxset2);

peaktab <-data.frame(ID=groupnames(xset3),peakTable(xset3))

a<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"))
b<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"),rt="raw")
pdf("plot.example.pdf")
plot(a)
plot(b)
dev.off()
sessionInfo()

from xcms.

sneumann avatar sneumann commented on June 16, 2024

So the raw RT EIC has :

> str(b)
Formal class 'xcmsEIC' [package "xcms"] with 5 slots
  ..@ eic       :List of 12
  .. ..$ ko15:List of 2
  .. .. ..$ : num [1:128, 1:2] 2689 2691 2692 2694 2695 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:2] "rt" "intensity"
  .. .. ..$ : num [1:128, 1:2] 2828 2830 2832 2833 2835 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:2] "rt" "intensity"
[...]
  .. ..$ wt22:List of 2
  .. .. ..$ : num [1:128, 1:2] 2689 2691 2692 2694 2695 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:2] "rt" "intensity"
  .. .. ..$ : num [1:128, 1:2] 2828 2830 2832 2833 2835 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : NULL
  .. .. .. .. ..$ : chr [1:2] "rt" "intensity"
  ..@ mzrange   : num [1:2, 1:2] 205 392 205 392
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "mzmin" "mzmax"
  ..@ rtrange   : num [1:2, 1:2] 2689 2827 2889 3027
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "rtmin" "rtmax"
  ..@ rt        : chr "raw"
  ..@ groupnames: chr [1:2] "M205T2790" "M392T2927"

i.e. the raw have 128 datapoints per EIC.

from xcms.

sneumann avatar sneumann commented on June 16, 2024

Hi Alan,
it would be a fantastic help if you could create a self-contained and standalone test here.
The we could try to isolate the relevant commit between 1.26.1 and 1.44.x with "git bisect"
(see https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git or https://robots.thoughtbot.com/git-bisect )
Yours,
Steffen

from xcms.

sneumann avatar sneumann commented on June 16, 2024

Alan Smith made some excellent progress:

Hi Steffen,

I picked this up again and found that the problems in EIC extraction are caused by the length of the vector of corrected retention times. The vector of corrected RTs corresponds to the length of the all of the scans in an xcmsRaw object in xcms version 1.26.1 and only the scan range used for peak detection in xcms version 1.46.0.

  1. When we perform peak detection using findPeaks.centWave we only evaluate a portion of the gradient that contains peaks of interest. This subset represents a subset of the total number of scans present in the mzData data file. We use the argument scanrange to reduce scans to regions of interest using centwave. This reduces computation time and focus downstream analysis on the regions of the gradient we are interested in.
    a) A typical LC-MS analysis contains 7007 scans.
    b) We perform peak detection over 5325 scans.

  2. We typically analyze EICs to assess performance of peak detection, grouping and specifically retention correction.
    a) To evaluate retention time alignment we plot two EICs for each feature. getEICs with "rt" argument set to "raw" and then set to "corrected"
    b) We found that accurate EICs could not be generated when the "rt" argument was set to "corrected" with newer versions of xcms, but that accurate EICs
    could be generated with "rt" set to raw.
    c) The line of code in getEICs which leads to this observations is below. The code came from getMethod("getEIC", "xcmsSet") for xcms version 1.46.0. lcraw@scantime <- object@rt$corrected[[sampidx[i]]]

  3. The root of the EIC extraction problem seems to come from how the xcms class object is initially created. The older libraries created an rt slot that had values of the entire scan range [7007 entries per file] while the newer versions of xcms create a rt slot with the truncated scan ranges [5325 entries per file ]. When getEICs is called the scantime vector for the raw lcms file is overwritten with the truncated range of the corrected retention time range. This overwriting of the scan range causes some problems with when the eics are created. I was not able to follow the code through getMethod("getEICNew", "xcmsRaw") or getMethod("getEICOld", "xcmsRaw") to find the exact line of code where an index is messed up.

  4. I was able to hack the centwave (xset) object by taking the following steps.

# hack to plot rt corrected EICs on samples analyzed using a subset of the retention time.
lcraw <- xcmsRaw(filepaths(xset)[1], profmethod = profinfo(xset)$method, profstep = 0)
xset@scanrange <- c(0,length(lcraw@scantime)) 
raw.dat <- sapply(filepaths(xset),xcmsRaw,profmethod = profinfo(xset)$method, profstep = 0)
names(raw.dat ) <- 1:length(raw.dat)
xset@rt$raw <- lapply(raw.dat,function(x) x@scantime)
xset@rt$corrected <- lapply(raw.dat,function(x) x@scantime)

Attach is a pdf with an EIC generated using data after filling peaks [no hack] and an EIC generated using the hack on a centwave prior to retention time correction and filling peaks.

Let me know if there is anything else I can do to help.
Alan

eic-hackexample-0
eic-hackexample-1

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Concerning Alan Smith's point 3) above:
for the getEICOld and getEICNew, I introduced that split as I had some problems with the "original" getEIC method (now referred to as getEICOld) on xcmsRaw objects. Just out of curiosity, do you see the same problem you're describing when using the getEICNew method? You can switch to getEICNew using:

library(xcms)
opts <- options()$BioC
opts$xcms$getEIC.method <- "getEICNew"
options(BioC=opts)

thanks, jo

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Actually, I repeated the code chunk above after changing xcms to use the "new" getEIC method:

library(xcms)
library(faahKO)
## change getEIC to use getEICNew
opts <- options()$BioC
opts$xcms$getEIC.method <- "getEICNew"
options(BioC=opts)

cdfpath <- system.file("cdf", package = "faahKO")
list.files(cdfpath, recursive = TRUE)
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)

xset<-xcmsSet(cdffiles,profmethod = "bin", method="centWave", ppm=30, peakwidth=c(5,50), snthresh=10, prefilter=c(7,1000), integrate=1, mzdiff= -0.001, fitgauss=FALSE, scanrange= c(0,500),mzCenterFun="wMean",noise=300,sleep=0, verbose.columns=T,nSlaves=3);
xset2<-retcor(xset, method = "obiwarp",distFunc="cor_opt",profStep=1,plot="deviation",gapInit=0.8, gapExtend=3.3)
gxset2<-group(xset2,method="density", bw=8, minfrac = .5, minsamp = 2, mzwid = .01, max = 50, sleep = 0)
xset3<-fillPeaks(gxset2);

peaktab <-data.frame(ID=groupnames(xset3),peakTable(xset3))

a<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"))
b<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"),rt="raw")
pdf("plot.example-new.pdf")
plot(a)
plot(b)
dev.off()

The results look actually pretty good to me, but please reconfirm.
Eventually, we should switch by default to getEICNew (i.e. change getEIC.method="getEICOld" in init.R to getEIC.method="getEICNew")? What I didn't like about the getEICOld was that an EIC "buffer" was created, apparently to speed things up, but I guess this caused ultimately the problem. In the getEICNew I kept the code as simple as possible, loosing eventually some milliseconds in the calculation but be more stable.

from xcms.

sneumann avatar sneumann commented on June 16, 2024

Merged and pushed as 1.147.3 with Jo's fixes 2f5e291 and 35be059 . If Alain confirms, this can be closed.

from xcms.

jorainer avatar jorainer commented on June 16, 2024

closing this issue now - should have been fixed.

from xcms.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.