In the course of remodeling and updating the xcms pac

Results for method = "bin" in <code class="notranslat

Results for method = "binlin" (i.e. <code class="notr

Results for method = "binlinbase" (i.e. <code class="

Tests and validation results for findPeaks.matchedFilter replacement functions about xcms HOT 7 CLOSED

sneumann commented on June 16, 2024

Tests and validation results for findPeaks.matchedFilter replacement functions

from xcms.

Comments (7)

sneumann commented on June 16, 2024 1

What about defaulting to useOriginalCode=TRUE until the release in October,
switch to useOriginalCode=FALSE plus marking as deprecated first thing after the next release ?
While your analysis and fixes are really excellent, I am hesitant to change the default behaviour
with only 4 weeks notice.
Deprecation is documented in https://www.bioconductor.org/developers/how-to/deprecation/

from xcms.

jorainer commented on June 16, 2024

Results for method = "bin" in findPeaks.matchedFilter. (Corresponds to impute = "none" in do_detectFeatures_matchedFilter): this performs a simple binning without any missing value imputation.

Results are identical for all tested files and all settings except step = 0.2 (binSize = 0.2 for do_detectFeatures_matchedFilter):

300:108 400:338 500:533 600:839 
A vs original OK
B vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
C vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
D vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
B vs C OK
B vs D OK
C vs D OK

Approaches "B", "C" and "D" yield identical results which are however different (albeit being highly similar) to the results from the original code. The reason for this is also discussed in issue #47: the separate bin definition in each individual iteration can, due to rounding errors of floating point number representation lead to small differences in the binning. All other approaches except "orig" and "A" calculate the breaks defining the bins once.

Thus, performing the binning on the whole matrix instead of iteratively binning smaller subsets results in more stable results.

from xcms.

jorainer commented on June 16, 2024

Results for method = "binlin" (i.e. impute = "lin" for do_detectFeatures_matchedFilter): this performs a binning followed by an imputation of missing bin values by linear interpolation with neighboring bins without missing values.

Using default settings on the test file:

A vs original OK
B vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.09036145 >
 B vs original FAILED! Numeric: lengths (5976, 5436) differ
-----------------------------
| Peaks: a: 453 b: 498 common: 404
| 'into' comparison: 231 equal (57.17822%)
| 'intf' comparison: 56 equal (13.86139%)
| 'maxo' comparison: OK
| 'maxf' comparison: 65 equal (16.08911%)
| 'i' comparison: 391 equal (96.78218%)
| 'sn' comparison: 0 equal (0%)
-----------------------------
C vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.002008032 >
 C vs original FAILED! Numeric: lengths (5976, 5964) differ
-----------------------------
| Peaks: a: 497 b: 498 common: 497
| 'into' comparison: 329 equal (66.19718%)
| 'intf' comparison: 136 equal (27.36419%)
| 'maxo' comparison: 407 equal (81.89135%)
| 'maxf' comparison: 128 equal (25.75453%)
| 'i' comparison: OK
| 'sn' comparison: 2 equal (0.4024145%)
-----------------------------
D vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.002008032 >
 D vs original FAILED! Numeric: lengths (5976, 5988) differ
-----------------------------
| Peaks: a: 499 b: 498 common: 495
| 'into' comparison: 263 equal (53.13131%)
| 'intf' comparison: 63 equal (12.72727%)
| 'maxo' comparison: 331 equal (66.86869%)
| 'maxf' comparison: 77 equal (15.55556%)
| 'i' comparison: OK
| 'sn' comparison: 0 equal (0%)
-----------------------------
B vs C FAILED! Attributes: < Component “dim”: Mean relative difference: 0.09713024 >
 B vs C FAILED! Numeric: lengths (5436, 5964) differ
-----------------------------
| Peaks: a: 453 b: 497 common: 404
| 'into' comparison: 231 equal (57.17822%)
| 'intf' comparison: 57 equal (14.10891%)
| 'maxo' comparison: OK
| 'maxf' comparison: 61 equal (15.09901%)
| 'i' comparison: 391 equal (96.78218%)
| 'sn' comparison: 0 equal (0%)
-----------------------------
B vs D FAILED! Attributes: < Component “dim”: Mean relative difference: 0.1015453 >
 B vs D FAILED! Numeric: lengths (5436, 5988) differ
-----------------------------
| Peaks: a: 453 b: 499 common: 408
| 'into' comparison: 403 equal (98.77451%)
| 'intf' comparison: 252 equal (61.76471%)
| 'maxo' comparison: OK
| 'maxf' comparison: 205 equal (50.2451%)
| 'i' comparison: 395 equal (96.81373%)
| 'sn' comparison: 1 equal (0.245098%)
-----------------------------
C vs D FAILED! Attributes: < Component “dim”: Mean relative difference: 0.004024145 >
 C vs D FAILED! Numeric: lengths (5964, 5988) differ
-----------------------------
| Peaks: a: 497 b: 499 common: 495
| 'into' comparison: OK
| 'intf' comparison: OK
| 'maxo' comparison: OK
| 'maxf' comparison: OK
| 'i' comparison: OK
| 'sn' comparison: 6 equal (1.212121%)
-----------------------------

Here we do expect to see differences between the original code and the newer implementations using binYonX and imputeLinInterpol since the latter fix the bugs reported in issues #46 and #49. Also, we expect to see differences between approaches with and without iterative buffering (issue #47). Thus, it is not surprising that we see differences in the results above. Despite not being identical, the results are however similar.

from xcms.

jorainer commented on June 16, 2024

Results for method = "binlinbase" (i.e. impute = "linbase" for do_detectFeatures_matchedFilter): performs binning followed by a missing value imputation that performs linear imputation for bins with missing values if they have neighboring bins up to a certain distance without missing values, setting the missing value to a baseValue instead (by default half of the minimal intensity observed in the whole file).

We expect to see differences again between approaches with and without iterative buffering, if imputation is performed:

A vs original OK
B vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.3599847 >
 B vs original FAILED! Numeric: lengths (31368, 20076) differ
-----------------------------
| Peaks: a: 1673 b: 2614 common: 1300
| 'into' comparison: 1229 equal (94.53846%)
| 'intf' comparison: 1073 equal (82.53846%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1066 equal (82%)
| 'i' comparison: 1260 equal (96.92308%)
| 'sn' comparison: 1053 equal (81%)
-----------------------------
C vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.3661056 >
 C vs original FAILED! Numeric: lengths (31368, 19884) differ
-----------------------------
| Peaks: a: 1657 b: 2614 common: 1294
| 'into' comparison: 1221 equal (94.35858%)
| 'intf' comparison: 1075 equal (83.07573%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1072 equal (82.84389%)
| 'i' comparison: 1252 equal (96.75425%)
| 'sn' comparison: 1066 equal (82.38022%)
-----------------------------
D vs original FAILED! Attributes: < Component “dim”: Mean relative difference: 0.3661056 >
 D vs original FAILED! Numeric: lengths (31368, 19884) differ
-----------------------------
| Peaks: a: 1657 b: 2614 common: 1294
| 'into' comparison: 1221 equal (94.35858%)
| 'intf' comparison: 1075 equal (83.07573%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1072 equal (82.84389%)
| 'i' comparison: 1252 equal (96.75425%)
| 'sn' comparison: 1066 equal (82.38022%)
-----------------------------
B vs C FAILED! Attributes: < Component “dim”: Mean relative difference: 0.009563658 >
 B vs C FAILED! Numeric: lengths (20076, 19884) differ
-----------------------------
| Peaks: a: 1673 b: 1657 common: 1646
| 'into' comparison: 1645 equal (99.93925%)
| 'intf' comparison: 1639 equal (99.57473%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1636 equal (99.39247%)
| 'i' comparison: 1645 equal (99.93925%)
| 'sn' comparison: 1625 equal (98.72418%)
-----------------------------
B vs D FAILED! Attributes: < Component “dim”: Mean relative difference: 0.009563658 >
 B vs D FAILED! Numeric: lengths (20076, 19884) differ
-----------------------------
| Peaks: a: 1673 b: 1657 common: 1646
| 'into' comparison: 1645 equal (99.93925%)
| 'intf' comparison: 1639 equal (99.57473%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1636 equal (99.39247%)
| 'i' comparison: 1645 equal (99.93925%)
| 'sn' comparison: 1625 equal (98.72418%)
-----------------------------
C vs D OK

The results between the non-iterative approaches ("C" and "D") are identical. For all other comparisons we can see differences, that are however relatively small.

from xcms.

jorainer commented on June 16, 2024

Thus, summarizing, the approaches without the iterative buffering yield more reliable (and presumably correct) results. Given also that the binYonX in combination with imputeLinInterpol identify similar peaks than the non-iterative approaches using the original I would opt to change the code to use these former methods as default.

I would change findPeaks.matchedFilter to use the new code (based on binYonX and imputeLinInterpol without iterative buffering) adding also a global option to allow the user to use the original code instead (something like options()$BioC$xcms$useOriginalCode). This would fix issues #46, #47 and #49, results in some performance gain but would still allow the user to switch back to the old code.

@sneumann would you agree?

from xcms.

jorainer commented on June 16, 2024

I agree, that would be a little too fast. OK, I'll set up using the original code as you suggested.

from xcms.

jorainer commented on June 16, 2024

Closing this one since its in the xcms code.

from xcms.

Tests and validation results for findPeaks.matchedFilter replacement functions about xcms HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent