Comments (8)
Example script from Alan:
library(xcms)
library(faahKO)
cdfpath <- system.file("cdf", package = "faahKO")
list.files(cdfpath, recursive = TRUE)
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xset<-xcmsSet(cdffiles,profmethod = "bin", method="centWave", ppm=30, peakwidth=c(5,50), snthresh=10, prefilter=c(7,1000), integrate=1, mzdiff= -0.001, fitgauss=FALSE, scanrange= c(0,500),mzCenterFun="wMean",noise=300,sleep=0, verbose.columns=T,nSlaves=3);
xset2<-retcor(xset, method = "obiwarp",distFunc="cor_opt",profStep=1,plot="deviation",gapInit=0.8, gapExtend=3.3)
gxset2<-group(xset2,method="density", bw=8, minfrac = .5, minsamp = 2, mzwid = .01, max = 50, sleep = 0)
xset3<-fillPeaks(gxset2);
peaktab <-data.frame(ID=groupnames(xset3),peakTable(xset3))
a<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"))
b<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"),rt="raw")
pdf("plot.example.pdf")
plot(a)
plot(b)
dev.off()
sessionInfo()
from xcms.
So the raw RT EIC has :
> str(b)
Formal class 'xcmsEIC' [package "xcms"] with 5 slots
..@ eic :List of 12
.. ..$ ko15:List of 2
.. .. ..$ : num [1:128, 1:2] 2689 2691 2692 2694 2695 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : NULL
.. .. .. .. ..$ : chr [1:2] "rt" "intensity"
.. .. ..$ : num [1:128, 1:2] 2828 2830 2832 2833 2835 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : NULL
.. .. .. .. ..$ : chr [1:2] "rt" "intensity"
[...]
.. ..$ wt22:List of 2
.. .. ..$ : num [1:128, 1:2] 2689 2691 2692 2694 2695 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : NULL
.. .. .. .. ..$ : chr [1:2] "rt" "intensity"
.. .. ..$ : num [1:128, 1:2] 2828 2830 2832 2833 2835 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : NULL
.. .. .. .. ..$ : chr [1:2] "rt" "intensity"
..@ mzrange : num [1:2, 1:2] 205 392 205 392
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "mzmin" "mzmax"
..@ rtrange : num [1:2, 1:2] 2689 2827 2889 3027
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "rtmin" "rtmax"
..@ rt : chr "raw"
..@ groupnames: chr [1:2] "M205T2790" "M392T2927"
i.e. the raw have 128 datapoints per EIC.
from xcms.
Hi Alan,
it would be a fantastic help if you could create a self-contained and standalone test here.
The we could try to isolate the relevant commit between 1.26.1 and 1.44.x with "git bisect"
(see https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git or https://robots.thoughtbot.com/git-bisect )
Yours,
Steffen
from xcms.
Alan Smith made some excellent progress:
Hi Steffen,
I picked this up again and found that the problems in EIC extraction are caused by the length of the vector of corrected retention times. The vector of corrected RTs corresponds to the length of the all of the scans in an xcmsRaw object in xcms version 1.26.1 and only the scan range used for peak detection in xcms version 1.46.0.
-
When we perform peak detection using findPeaks.centWave we only evaluate a portion of the gradient that contains peaks of interest. This subset represents a subset of the total number of scans present in the mzData data file. We use the argument scanrange to reduce scans to regions of interest using centwave. This reduces computation time and focus downstream analysis on the regions of the gradient we are interested in.
a) A typical LC-MS analysis contains 7007 scans.
b) We perform peak detection over 5325 scans. -
We typically analyze EICs to assess performance of peak detection, grouping and specifically retention correction.
a) To evaluate retention time alignment we plot two EICs for each feature. getEICs with "rt" argument set to "raw" and then set to "corrected"
b) We found that accurate EICs could not be generated when the "rt" argument was set to "corrected" with newer versions of xcms, but that accurate EICs
could be generated with "rt" set to raw.
c) The line of code in getEICs which leads to this observations is below. The code came fromgetMethod("getEIC", "xcmsSet")
for xcms version 1.46.0.lcraw@scantime <- object@rt$corrected[[sampidx[i]]]
-
The root of the EIC extraction problem seems to come from how the xcms class object is initially created. The older libraries created an rt slot that had values of the entire scan range [7007 entries per file] while the newer versions of xcms create a rt slot with the truncated scan ranges [5325 entries per file ]. When getEICs is called the scantime vector for the raw lcms file is overwritten with the truncated range of the corrected retention time range. This overwriting of the scan range causes some problems with when the eics are created. I was not able to follow the code through getMethod("getEICNew", "xcmsRaw") or getMethod("getEICOld", "xcmsRaw") to find the exact line of code where an index is messed up.
-
I was able to hack the centwave (xset) object by taking the following steps.
# hack to plot rt corrected EICs on samples analyzed using a subset of the retention time.
lcraw <- xcmsRaw(filepaths(xset)[1], profmethod = profinfo(xset)$method, profstep = 0)
xset@scanrange <- c(0,length(lcraw@scantime))
raw.dat <- sapply(filepaths(xset),xcmsRaw,profmethod = profinfo(xset)$method, profstep = 0)
names(raw.dat ) <- 1:length(raw.dat)
xset@rt$raw <- lapply(raw.dat,function(x) x@scantime)
xset@rt$corrected <- lapply(raw.dat,function(x) x@scantime)
Attach is a pdf with an EIC generated using data after filling peaks [no hack] and an EIC generated using the hack on a centwave prior to retention time correction and filling peaks.
Let me know if there is anything else I can do to help.
Alan
from xcms.
Concerning Alan Smith's point 3) above:
for the getEICOld
and getEICNew
, I introduced that split as I had some problems with the "original" getEIC
method (now referred to as getEICOld
) on xcmsRaw
objects. Just out of curiosity, do you see the same problem you're describing when using the getEICNew
method? You can switch to getEICNew
using:
library(xcms)
opts <- options()$BioC
opts$xcms$getEIC.method <- "getEICNew"
options(BioC=opts)
thanks, jo
from xcms.
Actually, I repeated the code chunk above after changing xcms to use the "new" getEIC
method:
library(xcms)
library(faahKO)
## change getEIC to use getEICNew
opts <- options()$BioC
opts$xcms$getEIC.method <- "getEICNew"
options(BioC=opts)
cdfpath <- system.file("cdf", package = "faahKO")
list.files(cdfpath, recursive = TRUE)
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xset<-xcmsSet(cdffiles,profmethod = "bin", method="centWave", ppm=30, peakwidth=c(5,50), snthresh=10, prefilter=c(7,1000), integrate=1, mzdiff= -0.001, fitgauss=FALSE, scanrange= c(0,500),mzCenterFun="wMean",noise=300,sleep=0, verbose.columns=T,nSlaves=3);
xset2<-retcor(xset, method = "obiwarp",distFunc="cor_opt",profStep=1,plot="deviation",gapInit=0.8, gapExtend=3.3)
gxset2<-group(xset2,method="density", bw=8, minfrac = .5, minsamp = 2, mzwid = .01, max = 50, sleep = 0)
xset3<-fillPeaks(gxset2);
peaktab <-data.frame(ID=groupnames(xset3),peakTable(xset3))
a<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"))
b<-getEIC(xset3,groupidx=c("M205T2790","M392T2927"),rt="raw")
pdf("plot.example-new.pdf")
plot(a)
plot(b)
dev.off()
The results look actually pretty good to me, but please reconfirm.
Eventually, we should switch by default to getEICNew
(i.e. change getEIC.method="getEICOld"
in init.R to getEIC.method="getEICNew"
)? What I didn't like about the getEICOld
was that an EIC "buffer" was created, apparently to speed things up, but I guess this caused ultimately the problem. In the getEICNew
I kept the code as simple as possible, loosing eventually some milliseconds in the calculation but be more stable.
from xcms.
Merged and pushed as 1.147.3 with Jo's fixes 2f5e291 and 35be059 . If Alain confirms, this can be closed.
from xcms.
closing this issue now - should have been fixed.
from xcms.
Related Issues (20)
- Need all chromatograms in particular sample HOT 1
- Build error invalid class "MsBackendMzR" object: slots not in object: "peaksVariables" HOT 2
- Build failure on arm/Linux HOT 2
- Properly serialize an xcms result object to disk/files HOT 21
- do_findChromPeaks_centWave returns wrong maxo and int* values HOT 3
- findChromPeaks finds duplicate peaks HOT 1
- No method to convert XcmsExperiment to xcmsSet - Version 3.99.5 HOT 1
- xcmsSet interface gives errors HOT 10
- Batch effect correction without QC sample HOT 1
- Definition of anchor peaks for subset-based peakGroups alignment not correct HOT 1
- Vignette: avoid "data" as object name HOT 2
- Add a function that calculates peak quality metrics for detected peaks HOT 10
- Add tests for some ion mobility files HOT 1
- negative subsetting removes chromPeaks
- How would a larger peak width vs smaller peak width affect m/z of features detected? HOT 1
- Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘findChromPeaks’ for signature ‘"XMLDocument", "MatchedFilterParam"’ HOT 2
- why do_groupChromPeaks_density processing peaks with fixed size of the overlapping slices in mz dimension HOT 17
- Retention time alignment using PeakGroupsParam's peakGroupsMatrix parameter renders error HOT 10
- Trouble extracting chromatogram plots after manualChromPeaks() HOT 5
- Seeking advices about data importing HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from xcms.