Comments (7)
What about defaulting to useOriginalCode=TRUE
until the release in October,
switch to useOriginalCode=FALSE
plus marking as deprecated first thing after the next release ?
While your analysis and fixes are really excellent, I am hesitant to change the default behaviour
with only 4 weeks notice.
Deprecation is documented in https://www.bioconductor.org/developers/how-to/deprecation/
from xcms.
Results for method = "bin"
in findPeaks.matchedFilter
. (Corresponds to impute = "none"
in do_detectFeatures_matchedFilter
): this performs a simple binning without any missing value imputation.
Results are identical for all tested files and all settings except step = 0.2
(binSize = 0.2
for do_detectFeatures_matchedFilter
):
300:108 400:338 500:533 600:839
A vs original OK
B vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
C vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
D vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
B vs C OK
B vs D OK
C vs D OK
Approaches "B"
, "C"
and "D"
yield identical results which are however different (albeit being highly similar) to the results from the original code. The reason for this is also discussed in issue #47: the separate bin definition in each individual iteration can, due to rounding errors of floating point number representation lead to small differences in the binning. All other approaches except "orig"
and "A"
calculate the breaks defining the bins once.
Thus, performing the binning on the whole matrix instead of iteratively binning smaller subsets results in more stable results.
from xcms.
Results for method = "binlin"
(i.e. impute = "lin"
for do_detectFeatures_matchedFilter
): this performs a binning followed by an imputation of missing bin values by linear interpolation with neighboring bins without missing values.
Using default settings on the test file:
A vs original OK
B vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.09036145 >
B vs original FAILED! Numeric: lengths (5976, 5436) differ
-----------------------------
| Peaks: a: 453 b: 498 common: 404
| 'into' comparison: 231 equal (57.17822%)
| 'intf' comparison: 56 equal (13.86139%)
| 'maxo' comparison: OK
| 'maxf' comparison: 65 equal (16.08911%)
| 'i' comparison: 391 equal (96.78218%)
| 'sn' comparison: 0 equal (0%)
-----------------------------
C vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.002008032 >
C vs original FAILED! Numeric: lengths (5976, 5964) differ
-----------------------------
| Peaks: a: 497 b: 498 common: 497
| 'into' comparison: 329 equal (66.19718%)
| 'intf' comparison: 136 equal (27.36419%)
| 'maxo' comparison: 407 equal (81.89135%)
| 'maxf' comparison: 128 equal (25.75453%)
| 'i' comparison: OK
| 'sn' comparison: 2 equal (0.4024145%)
-----------------------------
D vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.002008032 >
D vs original FAILED! Numeric: lengths (5976, 5988) differ
-----------------------------
| Peaks: a: 499 b: 498 common: 495
| 'into' comparison: 263 equal (53.13131%)
| 'intf' comparison: 63 equal (12.72727%)
| 'maxo' comparison: 331 equal (66.86869%)
| 'maxf' comparison: 77 equal (15.55556%)
| 'i' comparison: OK
| 'sn' comparison: 0 equal (0%)
-----------------------------
B vs C FAILED! Attributes: < Component “dim”: Mean relative difference: 0.09713024 >
B vs C FAILED! Numeric: lengths (5436, 5964) differ
-----------------------------
| Peaks: a: 453 b: 497 common: 404
| 'into' comparison: 231 equal (57.17822%)
| 'intf' comparison: 57 equal (14.10891%)
| 'maxo' comparison: OK
| 'maxf' comparison: 61 equal (15.09901%)
| 'i' comparison: 391 equal (96.78218%)
| 'sn' comparison: 0 equal (0%)
-----------------------------
B vs D FAILED! Attributes: < Component “dim”: Mean relative difference: 0.1015453 >
B vs D FAILED! Numeric: lengths (5436, 5988) differ
-----------------------------
| Peaks: a: 453 b: 499 common: 408
| 'into' comparison: 403 equal (98.77451%)
| 'intf' comparison: 252 equal (61.76471%)
| 'maxo' comparison: OK
| 'maxf' comparison: 205 equal (50.2451%)
| 'i' comparison: 395 equal (96.81373%)
| 'sn' comparison: 1 equal (0.245098%)
-----------------------------
C vs D FAILED! Attributes: < Component “dim”: Mean relative difference: 0.004024145 >
C vs D FAILED! Numeric: lengths (5964, 5988) differ
-----------------------------
| Peaks: a: 497 b: 499 common: 495
| 'into' comparison: OK
| 'intf' comparison: OK
| 'maxo' comparison: OK
| 'maxf' comparison: OK
| 'i' comparison: OK
| 'sn' comparison: 6 equal (1.212121%)
-----------------------------
Here we do expect to see differences between the original code and the newer implementations using binYonX
and imputeLinInterpol
since the latter fix the bugs reported in issues #46 and #49. Also, we expect to see differences between approaches with and without iterative buffering (issue #47). Thus, it is not surprising that we see differences in the results above. Despite not being identical, the results are however similar.
from xcms.
Results for method = "binlinbase"
(i.e. impute = "linbase"
for do_detectFeatures_matchedFilter
): performs binning followed by a missing value imputation that performs linear imputation for bins with missing values if they have neighboring bins up to a certain distance without missing values, setting the missing value to a baseValue
instead (by default half of the minimal intensity observed in the whole file).
We expect to see differences again between approaches with and without iterative buffering, if imputation is performed:
A vs original OK
B vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.3599847 >
B vs original FAILED! Numeric: lengths (31368, 20076) differ
-----------------------------
| Peaks: a: 1673 b: 2614 common: 1300
| 'into' comparison: 1229 equal (94.53846%)
| 'intf' comparison: 1073 equal (82.53846%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1066 equal (82%)
| 'i' comparison: 1260 equal (96.92308%)
| 'sn' comparison: 1053 equal (81%)
-----------------------------
C vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.3661056 >
C vs original FAILED! Numeric: lengths (31368, 19884) differ
-----------------------------
| Peaks: a: 1657 b: 2614 common: 1294
| 'into' comparison: 1221 equal (94.35858%)
| 'intf' comparison: 1075 equal (83.07573%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1072 equal (82.84389%)
| 'i' comparison: 1252 equal (96.75425%)
| 'sn' comparison: 1066 equal (82.38022%)
-----------------------------
D vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.3661056 >
D vs original FAILED! Numeric: lengths (31368, 19884) differ
-----------------------------
| Peaks: a: 1657 b: 2614 common: 1294
| 'into' comparison: 1221 equal (94.35858%)
| 'intf' comparison: 1075 equal (83.07573%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1072 equal (82.84389%)
| 'i' comparison: 1252 equal (96.75425%)
| 'sn' comparison: 1066 equal (82.38022%)
-----------------------------
B vs C FAILED! Attributes: < Component “dim”: Mean relative difference: 0.009563658 >
B vs C FAILED! Numeric: lengths (20076, 19884) differ
-----------------------------
| Peaks: a: 1673 b: 1657 common: 1646
| 'into' comparison: 1645 equal (99.93925%)
| 'intf' comparison: 1639 equal (99.57473%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1636 equal (99.39247%)
| 'i' comparison: 1645 equal (99.93925%)
| 'sn' comparison: 1625 equal (98.72418%)
-----------------------------
B vs D FAILED! Attributes: < Component “dim”: Mean relative difference: 0.009563658 >
B vs D FAILED! Numeric: lengths (20076, 19884) differ
-----------------------------
| Peaks: a: 1673 b: 1657 common: 1646
| 'into' comparison: 1645 equal (99.93925%)
| 'intf' comparison: 1639 equal (99.57473%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1636 equal (99.39247%)
| 'i' comparison: 1645 equal (99.93925%)
| 'sn' comparison: 1625 equal (98.72418%)
-----------------------------
C vs D OK
The results between the non-iterative approaches ("C"
and "D"
) are identical. For all other comparisons we can see differences, that are however relatively small.
from xcms.
Thus, summarizing, the approaches without the iterative buffering yield more reliable (and presumably correct) results. Given also that the binYonX
in combination with imputeLinInterpol
identify similar peaks than the non-iterative approaches using the original I would opt to change the code to use these former methods as default.
I would change findPeaks.matchedFilter
to use the new code (based on binYonX
and imputeLinInterpol
without iterative buffering) adding also a global option to allow the user to use the original code instead (something like options()$BioC$xcms$useOriginalCode
). This would fix issues #46, #47 and #49, results in some performance gain but would still allow the user to switch back to the old code.
@sneumann would you agree?
from xcms.
I agree, that would be a little too fast. OK, I'll set up using the original code as you suggested.
from xcms.
Closing this one since its in the xcms code.
from xcms.
Related Issues (20)
- Seeking advices about data importing HOT 5
- Suggested starting parameters for GC-MS data HOT 3
- reconstructChromPeakSpectra HOT 1
- Filter MsExperiment object by spectraData filterString HOT 16
- Bad practices with peak merging HOT 2
- msLevel=2L chromatograms (MRM or SRM) not working with MsExperiment interface HOT 1
- precusor charge for each features HOT 3
- fillChromPeaks error HOT 2
- Trouble plotting XChromatograms HOT 5
- R CMD check problems HOT 5
- xset crashes on MAC OS 14.3 R 4.3.3 , BiocManager 1.30.22 and xcms 4.0.2 HOT 6
- Update retention times to alkane retention time indices for GC-MS HOT 5
- The error message encountered when using the "IPO" package HOT 10
- Error for XIC plot of timsTof data HOT 3
- Best way to document additional XCMS object processing steps in the ProcessHistory slot? HOT 3
- After I perform chromatographic peak detection, how do I confirm the quality of my peaks? HOT 1
- Ways to extract all MS2 peaks? HOT 3
- Best way to deconvolute metabolite data HOT 1
- Waters data problems with Obiwarp HOT 9
- Support excluding samples for PeakDensity correspondence
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from xcms.