BOULS

BOULS (Bucketing Of Untargeted LCMS Spectra) is a data processing approach based on the xcms package [1-3]. It is particularly suitable for developing large machine learning models over time, because it allows untargeted LC-HRMS data obtained from different devices and at different times to be processed separately and analyzed jointly. This is achieved by summing the signal intensities within defined buckets. BOULS requires a Linux OS.

Installation

devtools::install_github("AGSeifert/BOULS")
library(BOULS)

Data

To demonstrate the functionality of this approach, example data is provided here. Please download and unzip the file. The phenodata folder contains a csv file with information about the samples, which can be customized individually (e.g. name, variety, geographical origin, instrument, harvesting year). The other folders provide mzML files (we recommend converting vendor-specific files to mzML format with MSConvert [4]). The example files have been shortened to one minute so that they can be processed on a normal laptop. For real data sets, we recommend using a workstation.

Developing large machine learning models over time requires one sample to be defined as the reference sample for the retention time alignment. For example, this could be the sample with the most detected peaks. The reference sample is added to each new data set to be processed, which makes the retention time alignment independent of the processing batch. The "Samples" folder contains the samples to be processed in the respective batch.

# Please insert your paths to the 'Samples', the 'Reference' and the 'phenodata' folders.
path_mzMLs <- '/home/hansen/example_data/Samples'
path_Ref <- '/home/hansen/example_data/Reference'
path_csv <- '/home/hansen/example_data/phenodata'
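
Before starting a long run, it can help to confirm that the three folders contain what the processing step expects. The helper below is a hypothetical convenience function and not part of BOULS. It assumes that the phenodata file is named phenodata.csv (as in the xcms workflow further down), that raw files carry the .mzML extension, and that the phenodata has one row per sample file.

```r
# Hypothetical helper (not part of BOULS): basic sanity checks on the input folders.
check_bouls_inputs <- function(path_mzMLs, path_Ref, path_csv) {
  # All three folders must exist
  paths <- c(Samples = path_mzMLs, Reference = path_Ref, phenodata = path_csv)
  missing <- names(paths)[!dir.exists(paths)]
  if (length(missing) > 0) {
    stop("Folder(s) not found: ", paste(missing, collapse = ", "))
  }

  # Collect mzML files recursively, as the example workflow does
  samples <- list.files(path_mzMLs, pattern = "\\.mzML$", full.names = TRUE,
                        recursive = TRUE, ignore.case = TRUE)
  refs <- list.files(path_Ref, pattern = "\\.mzML$", full.names = TRUE,
                     recursive = TRUE, ignore.case = TRUE)
  if (length(samples) == 0) stop("No mzML files found in the Samples folder")
  if (length(refs) == 0) stop("No mzML file found in the Reference folder")

  # Assumption: the phenodata file is called 'phenodata.csv'
  csv_file <- file.path(path_csv, "phenodata.csv")
  if (!file.exists(csv_file)) stop("phenodata.csv not found in ", path_csv)
  pd <- read.csv(csv_file)

  # Assumption: one phenodata row per sample file
  if (nrow(pd) != length(samples)) {
    warning("phenodata has ", nrow(pd), " rows but ", length(samples),
            " mzML files were found")
  }
  invisible(list(samples = samples, reference = refs[1], phenodata = pd))
}
```

Calling check_bouls_inputs(path_mzMLs, path_Ref, path_csv) with the paths defined above stops with an error if a folder or file is missing and otherwise returns the file lists and the phenodata invisibly.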

Data processing using BOULS

This package provides two functions. To simplify use, the first function, process_bouls(), covers all steps from data import to the bucketing of the spectra, with predefined settings for the xcms functions. To adjust the xcms settings, the second function, bouls(), can be included in a custom xcms workflow, where it replaces the correspondence step.

General processing using the BOULS approach

Here, the example data is processed using the predefined xcms settings. Two CPUs are used in parallel (num_workers parameter). In this case, a retention time range of 300 s to 360 s and a mass range of 250 Da to 750 Da are processed. The segment size is 10 s in the retention time dimension and 5 Da in the mass dimension. The result is a matrix with summed intensities for the respective buckets. The buckets are named via the row names: the first value is the start of the bucket in the retention time dimension and the second value is the start of the bucket in the mass dimension. For example, the bucket covering 300-310 s and 250-255 Da has the name "300.250".

bin.result <- process_bouls(path_mzMLs = path_mzMLs, 
                            path_Ref = path_Ref, 
                            path_csv = path_csv,
                            num_workers = 2,
                            RT_range = c(300, 360),
                            size_RT_bins = 10,
                            mz_range = c(250, 750),
                            size_mz_bins = 5)
head(bin.result)
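
To make the bucketing idea concrete, the following base R sketch sums intensities on a fixed retention time x m/z grid for a small made-up signal table. It is not the BOULS implementation, which operates on xcms objects. The function name bucket_intensities, the column names rt, mz and intensity, and the choice to exclude the upper edge of each bucket are assumptions for illustration. Only the naming scheme follows the description above.

```r
# Illustrative sketch only; not the BOULS implementation.
bucket_intensities <- function(peaks, RT_range, size_RT_bins, mz_range, size_mz_bins) {
  # Lower edges of all buckets in both dimensions
  rt_starts <- seq(RT_range[1], RT_range[2] - size_RT_bins, by = size_RT_bins)
  mz_starts <- seq(mz_range[1], mz_range[2] - size_mz_bins, by = size_mz_bins)

  # Keep only signals inside the requested window (assumption: upper edge excluded)
  keep <- peaks$rt >= RT_range[1] & peaks$rt < RT_range[2] &
    peaks$mz >= mz_range[1] & peaks$mz < mz_range[2]
  peaks <- peaks[keep, , drop = FALSE]

  # Map each signal to the lower edge of its bucket
  rt_start <- RT_range[1] + floor((peaks$rt - RT_range[1]) / size_RT_bins) * size_RT_bins
  mz_start <- mz_range[1] + floor((peaks$mz - mz_range[1]) / size_mz_bins) * size_mz_bins

  # Full grid of bucket names, "<RT start>.<m/z start>", initialised to zero
  grid <- expand.grid(rt = rt_starts, mz = mz_starts)
  result <- setNames(numeric(nrow(grid)), paste(grid$rt, grid$mz, sep = "."))

  # Sum intensities per bucket
  sums <- tapply(peaks$intensity, paste(rt_start, mz_start, sep = "."), sum)
  result[names(sums)] <- sums
  result
}

# Made-up signals for one sample
peaks <- data.frame(
  rt = c(301.2, 304.8, 315.0, 359.9),
  mz = c(251.1, 254.9, 250.2, 749.5),
  intensity = c(100, 50, 20, 5)
)
buckets <- bucket_intensities(peaks, c(300, 360), 10, c(250, 750), 5)
buckets["300.250"]  # 150: the first two signals fall into the same bucket
length(buckets)     # 600 buckets: 6 RT segments x 100 m/z segments
```

The sketch relies on integer bucket edges so that the pasted names match exactly. Non-integer segment sizes would need explicit rounding or formatting of the edge values.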

Individual processing using xcms and BOULS approach

For individual settings in peak picking and retention time alignment, the bouls() function can replace the correspondence step in the xcms workflow. The XCMSnExp object resulting from the retention time alignment should be passed as the object argument. The following example shows how LCMS data can be processed using xcms and BOULS.

## load mzMLs
mzMLs <- list.files(path_mzMLs, full.names = TRUE, recursive = TRUE)

## load mzML file of reference sample for retention time alignment
mzMLs_Ref <- list.files(path_Ref, full.names = TRUE, recursive = TRUE)

## assembling of sample and reference mzMLs (last sample)
mzMLs <- c(mzMLs, mzMLs_Ref[1])

## read phenodata from csv files
pd <- read.csv(paste0(path_csv, "/phenodata.csv"))
## add reference sample to the pd (last row)
pd <- rbind(pd, c("Ref.mzML", rep("Ref", times = ncol(pd) - 1)))

# read raw data
raw_data <- MSnbase::readMSData(
  files = mzMLs,
  pdata = new("NAnnotatedDataFrame", pd),
  mode = "onDisk"
)

# Peak picking settings
cwp <- xcms::CentWaveParam(
  peakwidth = c(5, 20),
  noise = 100,
  ppm = 5,
  snthresh = 5
)

# Peak detection
xdata <- xcms::findChromPeaks(raw_data, param = cwp)

# Retention time alignment with reference sample as center sample
index.ref <- which(pd$Name == "Ref.mzML")

xdata_adj <- xcms::adjustRtime(
  xdata,
  param = xcms::ObiwarpParam(
    binSize = 0.1,
    centerSample = index.ref,
    localAlignment = TRUE,
    distFun = "cor_opt"
  )
)

# Bucketing replaces the xcms correspondence step
bin.result <- bouls(object = xdata_adj,
                    num_workers = 2,
                    RT_range = c(300, 360),
                    size_RT_bins = 10,
                    mz_range = c(250, 750),
                    size_mz_bins = 5)
head(bin.result)
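
Because the bucket grid is fixed by the range and size parameters, results from separately processed batches should share the same bucket names and can be joined for a joint analysis. The sketch below shows one way to do this in base R. The helper combine_batches is hypothetical and not part of BOULS, and the example matrices are made up. It assumes buckets in rows (as described above), samples in columns, identical range and size settings across batches, and that a bucket missing from a batch can be treated as zero intensity.

```r
# Hypothetical helper (not part of BOULS): join bucket matrices from several batches.
# Assumes buckets in rows (row names such as "300.250") and samples in columns.
combine_batches <- function(...) {
  batches <- list(...)
  # Union of bucket names, in order of first appearance
  all_buckets <- unique(unlist(lapply(batches, rownames)))
  aligned <- lapply(batches, function(m) {
    # Assumption: buckets absent from a batch are filled with zero
    out <- matrix(0, nrow = length(all_buckets), ncol = ncol(m),
                  dimnames = list(all_buckets, colnames(m)))
    out[rownames(m), ] <- m
    out
  })
  do.call(cbind, aligned)
}

# Made-up matrices standing in for the results of two processing batches
batch_2022 <- matrix(c(10, 0, 5, 7), nrow = 2,
                     dimnames = list(c("300.250", "300.255"), c("S1", "S2")))
batch_2023 <- matrix(c(3, 8), nrow = 2,
                     dimnames = list(c("300.250", "310.250"), "S3"))
combined <- combine_batches(batch_2022, batch_2023)
dim(combined)  # 3 buckets x 3 samples

# Recover the bucket start values from the row names (integer edges assumed)
starts <- do.call(rbind, strsplit(rownames(combined), ".", fixed = TRUE))
bucket_info <- data.frame(rt_start = as.numeric(starts[, 1]),
                          mz_start = as.numeric(starts[, 2]))
```

Depending on the analysis, the column belonging to the reference sample may need to be dropped from each batch before joining, since the reference is processed with every batch.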

[1] Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G (2006). “XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification.” Analytical Chemistry, 78, 779–787.

[2] Tautenhahn R, Boettcher C, Neumann S (2008). “Highly sensitive feature detection for high resolution LC/MS.” BMC Bioinformatics, 9, 504.

[3] Benton HP, Want EJ, Ebbels TMD (2010). “Correction of mass calibration gaps in liquid chromatography-mass spectrometry metabolomics data.” Bioinformatics, 26, 2488.

[4] Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, et al. (2012). “A Cross-Platform Toolkit for Mass Spectrometry and Proteomics.” Nature Biotechnology, 30, 918–920. doi:10.1038/nbt.2377.
