Giter Club home page Giter Club logo

Comments (7)

sneumann avatar sneumann commented on June 16, 2024 1

What about defaulting to useOriginalCode=TRUE until the release in October,
switch to useOriginalCode=FALSE plus marking as deprecated first thing after the next release ?
While your analysis and fixes are really excellent, I am hesitant to change the default behaviour
with only 4 weeks notice.
Deprecation is documented in https://www.bioconductor.org/developers/how-to/deprecation/

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Results for method = "bin" in findPeaks.matchedFilter. (Corresponds to impute = "none" in do_detectFeatures_matchedFilter): this performs a simple binning without any missing value imputation.

Results are identical for all tested files and all settings except step = 0.2 (binSize = 0.2 for do_detectFeatures_matchedFilter):

300:108 400:338 500:533 600:839 
A vs original OK
B vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
C vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
D vs original FAILED! Mean relative difference: 0.7609699
-----------------------------
| Peaks: a: 444 b: 444 common: 434
| 'into' comparison: OK
| 'intf' comparison: 405 equal (93.31797%)
| 'maxo' comparison: OK
| 'maxf' comparison: 400 equal (92.1659%)
| 'i' comparison: OK
| 'sn' comparison: 376 equal (86.63594%)
-----------------------------
B vs C OK
B vs D OK
C vs D OK

Approaches "B", "C" and "D" yield identical results which are however different (albeit being highly similar) to the results from the original code. The reason for this is also discussed in issue #47: the separate bin definition in each individual iteration can, due to rounding errors of floating point number representation lead to small differences in the binning. All other approaches except "orig" and "A" calculate the breaks defining the bins once.

Thus, performing the binning on the whole matrix instead of iteratively binning smaller subsets results in more stable results.

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Results for method = "binlin" (i.e. impute = "lin" for do_detectFeatures_matchedFilter): this performs a binning followed by an imputation of missing bin values by linear interpolation with neighboring bins without missing values.

Using default settings on the test file:

A vs original OK
B vs original FAILED! Attributes: < Componentdim: Mean relative difference: 0.09036145 >
 B vs original FAILED! Numeric: lengths (5976, 5436) differ
-----------------------------
| Peaks: a: 453 b: 498 common: 404
| 'into' comparison: 231 equal (57.17822%)
| 'intf' comparison: 56 equal (13.86139%)
| 'maxo' comparison: OK
| 'maxf' comparison: 65 equal (16.08911%)
| 'i' comparison: 391 equal (96.78218%)
| 'sn' comparison: 0 equal (0%)
-----------------------------
C vs original FAILED! Attributes: < Componentdim: Mean relative difference: 0.002008032 >
 C vs original FAILED! Numeric: lengths (5976, 5964) differ
-----------------------------
| Peaks: a: 497 b: 498 common: 497
| 'into' comparison: 329 equal (66.19718%)
| 'intf' comparison: 136 equal (27.36419%)
| 'maxo' comparison: 407 equal (81.89135%)
| 'maxf' comparison: 128 equal (25.75453%)
| 'i' comparison: OK
| 'sn' comparison: 2 equal (0.4024145%)
-----------------------------
D vs original FAILED! Attributes: < Componentdim: Mean relative difference: 0.002008032 >
 D vs original FAILED! Numeric: lengths (5976, 5988) differ
-----------------------------
| Peaks: a: 499 b: 498 common: 495
| 'into' comparison: 263 equal (53.13131%)
| 'intf' comparison: 63 equal (12.72727%)
| 'maxo' comparison: 331 equal (66.86869%)
| 'maxf' comparison: 77 equal (15.55556%)
| 'i' comparison: OK
| 'sn' comparison: 0 equal (0%)
-----------------------------
B vs C FAILED! Attributes: < Componentdim: Mean relative difference: 0.09713024 >
 B vs C FAILED! Numeric: lengths (5436, 5964) differ
-----------------------------
| Peaks: a: 453 b: 497 common: 404
| 'into' comparison: 231 equal (57.17822%)
| 'intf' comparison: 57 equal (14.10891%)
| 'maxo' comparison: OK
| 'maxf' comparison: 61 equal (15.09901%)
| 'i' comparison: 391 equal (96.78218%)
| 'sn' comparison: 0 equal (0%)
-----------------------------
B vs D FAILED! Attributes: < Componentdim: Mean relative difference: 0.1015453 >
 B vs D FAILED! Numeric: lengths (5436, 5988) differ
-----------------------------
| Peaks: a: 453 b: 499 common: 408
| 'into' comparison: 403 equal (98.77451%)
| 'intf' comparison: 252 equal (61.76471%)
| 'maxo' comparison: OK
| 'maxf' comparison: 205 equal (50.2451%)
| 'i' comparison: 395 equal (96.81373%)
| 'sn' comparison: 1 equal (0.245098%)
-----------------------------
C vs D FAILED! Attributes: < Componentdim: Mean relative difference: 0.004024145 >
 C vs D FAILED! Numeric: lengths (5964, 5988) differ
-----------------------------
| Peaks: a: 497 b: 499 common: 495
| 'into' comparison: OK
| 'intf' comparison: OK
| 'maxo' comparison: OK
| 'maxf' comparison: OK
| 'i' comparison: OK
| 'sn' comparison: 6 equal (1.212121%)
-----------------------------

Here we do expect to see differences between the original code and the newer implementations using binYonX and imputeLinInterpol since the latter fix the bugs reported in issues #46 and #49. Also, we expect to see differences between approaches with and without iterative buffering (issue #47). Thus, it is not surprising that we see differences in the results above. Despite not being identical, the results are however similar.

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Results for method = "binlinbase" (i.e. impute = "linbase" for do_detectFeatures_matchedFilter): performs binning followed by a missing value imputation that performs linear imputation for bins with missing values if they have neighboring bins up to a certain distance without missing values, setting the missing value to a baseValue instead (by default half of the minimal intensity observed in the whole file).

We expect to see differences again between approaches with and without iterative buffering, if imputation is performed:

A vs original OK
B vs original FAILED! Attributes: < Componentdim: Mean relative difference: 0.3599847 >
 B vs original FAILED! Numeric: lengths (31368, 20076) differ
-----------------------------
| Peaks: a: 1673 b: 2614 common: 1300
| 'into' comparison: 1229 equal (94.53846%)
| 'intf' comparison: 1073 equal (82.53846%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1066 equal (82%)
| 'i' comparison: 1260 equal (96.92308%)
| 'sn' comparison: 1053 equal (81%)
-----------------------------
C vs original FAILED! Attributes: < Componentdim: Mean relative difference: 0.3661056 >
 C vs original FAILED! Numeric: lengths (31368, 19884) differ
-----------------------------
| Peaks: a: 1657 b: 2614 common: 1294
| 'into' comparison: 1221 equal (94.35858%)
| 'intf' comparison: 1075 equal (83.07573%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1072 equal (82.84389%)
| 'i' comparison: 1252 equal (96.75425%)
| 'sn' comparison: 1066 equal (82.38022%)
-----------------------------
D vs original FAILED! Attributes: < Componentdim: Mean relative difference: 0.3661056 >
 D vs original FAILED! Numeric: lengths (31368, 19884) differ
-----------------------------
| Peaks: a: 1657 b: 2614 common: 1294
| 'into' comparison: 1221 equal (94.35858%)
| 'intf' comparison: 1075 equal (83.07573%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1072 equal (82.84389%)
| 'i' comparison: 1252 equal (96.75425%)
| 'sn' comparison: 1066 equal (82.38022%)
-----------------------------
B vs C FAILED! Attributes: < Componentdim: Mean relative difference: 0.009563658 >
 B vs C FAILED! Numeric: lengths (20076, 19884) differ
-----------------------------
| Peaks: a: 1673 b: 1657 common: 1646
| 'into' comparison: 1645 equal (99.93925%)
| 'intf' comparison: 1639 equal (99.57473%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1636 equal (99.39247%)
| 'i' comparison: 1645 equal (99.93925%)
| 'sn' comparison: 1625 equal (98.72418%)
-----------------------------
B vs D FAILED! Attributes: < Componentdim: Mean relative difference: 0.009563658 >
 B vs D FAILED! Numeric: lengths (20076, 19884) differ
-----------------------------
| Peaks: a: 1673 b: 1657 common: 1646
| 'into' comparison: 1645 equal (99.93925%)
| 'intf' comparison: 1639 equal (99.57473%)
| 'maxo' comparison: OK
| 'maxf' comparison: 1636 equal (99.39247%)
| 'i' comparison: 1645 equal (99.93925%)
| 'sn' comparison: 1625 equal (98.72418%)
-----------------------------
C vs D OK

The results between the non-iterative approaches ("C" and "D") are identical. For all other comparisons we can see differences, that are however relatively small.

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Thus, summarizing, the approaches without the iterative buffering yield more reliable (and presumably correct) results. Given also that the binYonX in combination with imputeLinInterpol identify similar peaks than the non-iterative approaches using the original I would opt to change the code to use these former methods as default.

I would change findPeaks.matchedFilter to use the new code (based on binYonX and imputeLinInterpol without iterative buffering) adding also a global option to allow the user to use the original code instead (something like options()$BioC$xcms$useOriginalCode). This would fix issues #46, #47 and #49, results in some performance gain but would still allow the user to switch back to the old code.

@sneumann would you agree?

from xcms.

jorainer avatar jorainer commented on June 16, 2024

I agree, that would be a little too fast. OK, I'll set up using the original code as you suggested.

from xcms.

jorainer avatar jorainer commented on June 16, 2024

Closing this one since its in the xcms code.

from xcms.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.