kaizadp / nmrrr Goto Github PK

View Code? Open in Web Editor NEW

3.0 3.0 1.0 49.34 MB

Binning and visualizing NMR spectra from environmental samples

License: Other

R 100.00%

chemistry nmr-spectroscopy package

nmrrr's People

Contributors

Stargazers

Watchers

Forkers

compass-doe

nmrrr's Issues

KP_TODO

Spots that @kaizadp needs to look at are marked with KP_TODO.

Be wary of 'helpful' steps

In general I would suggest being wary of rounding numerical data, 'helpfully' dropping NA groups, etc. Let users make those decisions.

compute_relabund_treatments

compute_relabund_treatments seems trivial and not worth including in the package.

checks

@bpbond

Three notes in the most recent CMD check

size of tarball
This is probably because of the larger_datasets directory. This is staying only on the GitHub repo, so I will delete before I submit to CRAN. Not an issue.
non-standard directory found
same as above. Not an issue.
"use inherits() ..."
Need help, please. Not sure what the right syntax is for this.

https://github.com/kaizadp/nmrrr/actions/runs/4735866927/jobs/8406649483

relabund with topspin (peaks method)

method = "peaks" calculates relative abundance based on area under the curve. but topspin peaks files have Intensity, not Area. Need to update the function.

Temporary fix: rename Intensity to Area to get it to work 8266af0

In the gg_spectra() function, we have an argument to stagger the spectra, so we can see multiple spectra at once. Works fine when we have just a single panel in the graph. But when I use facet_wrap, the staggering is weird (see pics). Is there a way we can "reset" the staggering, so it starts at 0 for each facet?

... or is this asking for too much from our package? It would be a cool feature, but we also don't want to break/bog down the functions with too many customizations.

https://github.com/bpbond/nmrrr/blob/7c19c171deba6c697ccd32d02edaf76cf943181f/scratchpad.R#L32-L49

Bin assignment

> bins_Clemente2012
# A tibble: 6 × 5
  number group      start  stop description                                
   <int> <chr>      <dbl> <dbl> <chr>                                      
1      1 aliphatic1   0.3   1.3 aliphatic methyl and methylene             
2      2 aliphatic2   1.3   2.2 aliphatic methyl and methylene near O and N
3      3 oalkyl       2.9   4.1 O-alkyl, mainly from carbs and lignin      
4      4 alphah       4.1   4.8 alpha-H from proteins                      
5      5 aromatic     6.2   7.8 aromatic, from lignin and proteins         
6      6 amide        7.8   8.4 amide from proteins

Does a ppm value of 7.8 belong to bin 5 or 6?

data acquired from different programs

@bpbond

There are two major NMR programs, TopSpin and MestreNova (MNova), and we'd like our functions to be able to handle data from both. The problem - the data are in very different formats (see screenshots below). The way I see it, we have two options:

one single function (import_spectra, import_peaks, etc.) that takes into account the various formats. Maybe a kind of if_loop where the user enters the software name? If software == MNova, then do x. If software == TopSpin, then do y.
separate functions for each program (import_spectra_topspin, import_spectra_mnova)

Thoughts?

Spectra Files

TOPSPIN - spectra

MNOVA - spectra

Peaks Files

TOPSPIN - peaks

MNOVA - peaks, version 1

MNOVA - peaks, version 2

Update package README and citation()

Now that paper has been published 🎉 https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023JG007768

Checking vignettes and vignette input files

@bpbond

I've written the three vignettes, could you please review to make sure they look ok, and that I'm not missing anything? I'll also ask AMP and MEB to check the content of the vignettes. Two of the vignettes contain only text, no code running. The nmrrr vignette walks the user through the workflow.
Suggestions on vignette titles? I currently have this, but I'm not sure how these will be saved in the package, or how to call the vignette in R (e.g. vignette("nmrrr_binsets") or vignette("Bin sets included with nmrrr")).
https://github.com/bpbond/nmrrr/blob/84d4d14dfcc9a0278a209ace7eb400591a806f5b/vignettes/nmrrr_binsets.Rmd#L2-L5
I'm having some trouble with the file paths in the nmrrr vignette. The "folder/file" format gave me a CMD error, so I switched to the system.file("folder", "file") format, and CMD check passed. But when I try to knit the .Rmd, it gives me an error.

https://github.com/bpbond/nmrrr/blob/84d4d14dfcc9a0278a209ace7eb400591a806f5b/vignettes/nmrrr.Rmd#L59-L66

Package functions/structure

See original issue here:
#6

testing data

Three potential datasets:

kfp_hysteresis
amp_tempest
meb_incubations

kfp_hysteresis:

corekey
spectra data
peaks data - multi-column

amp_tempest:

corekey
spectra data
peaks data - single column

meb_incubation:

corekey
spectra data
peaks data - single column, but different format

Status

Code reformatted #10
All R CMD CHECK notes fixed except for inst/extdata size
Bin assignment code reworked #21
Bins are now package data #18
Documentation and function exporting greatly changed #14 #17
Code, test code, and test data for processing code reworked and expanded #20 #24
Relabund code, documentation, and test code reworked #27

I won't have much time this coming week but @kaizadp please test out the package, in particular
reading spectra and peaks files, and assigning bins. Feedback welcome.

issues with import functions

@bpbond

Our functions are now unable to import the files in correctly, and I'm not sure why.

This error occurs only in the shorten_data_files branch, and not in the main one. All I did was delete (move) a lot of the files. Where we previously had 9-10 spectra/peak files, we now have only 2. Files here.

It's throwing up CMD-check errors

Two problems:

peaks files: does not recognize the "9-column groups" for the MNova peaks with the "multiple columns" selection (see error log)

spectra files: because of the weird tab vs. comma delim nonsense of MNova, it is not correctly splitting the data into the columns (see below)

Tried testing the code in the scratchpad file, it imports the files wrong.
https://github.com/bpbond/nmrrr/blob/bfc997d3198be7eab6befbc64a8fd789bdbb5b7f/old/scratchpad.R#L19-L21

assign_compound_classes_v2

assign_compound_classes_v2 <- function(dat, BINSET) {
  # Quiet R CMD CHECK notes
  group <- start <- NULL

  # load binsets
  bins <- BINSET %>%
    select(group, start, stop) %>%
    arrange(start, stop)

  # identify gaps between bins
  gaps <- c(utils::head(bins$stop, -1) != bins$start[-1], TRUE)
  # create new gap bins
  gapbins <- dplyr::tibble(group = NA_character_, start = bins$stop[gaps])
  newbins <- rbind(bins[c("group", "start")], gapbins) %>% arrange(start)
  newbins[is.na(newbins)] <- "NANA"

  dat$group <- cut(dat$ppm, newbins$start, labels = head(newbins$group, -1), right = FALSE)
  dat %>% filter(group != "NANA")
}

Currently assign_compound_classes_v2 isn't tested or referenced anywhere in the code.
Removing (in e79720e) until @kaizadp tells me what to do with it :)

different programs and file types

@bpbond

Cleaning of NMR spectra:

baseline correction, phase correction, peak deconvolution, etc.
this is done manually and individually per sample using specific NMR software (MestreNova/MNova and Topspin)
NOT part of this package

Processing and analysis of NMR spectra/data (using nmrrr package):

take output from MNova or Topspin (.csv files) and process them using the package
there are two categories of files: spectra files and peaks files
see attached chart for different types/formats, and functions used to process them. (PDF as well as PNG versions)

nmrrr_file_structure.pdf

setting defaults for graph function?

@bpbond

nmrrr/R/1-functions_graphs.R

Line 30 in afef424

nmr_plot_spectra <- function(dat, binset, label_position, mapping, stagger) {

Thoughts on setting some default values for the graphing function? Just so it prints the graph with minimal input, and then users can modify the parameters as needed. Right now, the function won't run unless we provide five inputs.

nmr_plot_spectra <- function(dat, binset, label_position = 100, mapping = aes(x = ppm, y = intensity), stagger = 10) {

^ these defaults gives me this graph for my kfp_hysteresis data. Not very pretty, but it at least gives me a graph, which I can then modify.

Function naming

Some packages, e.g. googlesheets, start all their exported (user-facing) functions with a common prefix, e.g. "gs_".

nmrrr could start all of its functions with "nmr_"? Just a thought.

joining dataframes using a range

@bpbond

I need to bin my spectra based on a set of compound classes. The spectra datafile has a ppm column that goes from 0 to ~15, whereas the bins represent only small chunks across this overall range. The code I have assigns the bins, but only subsets the parts that were a match. So I have blank spots for regions that were not a match. How can I assign these bins while also keeping all the non-matches?

Essentially, a full_join, but where one of the data-frames has only ranges.

https://github.com/bpbond/nmrrr/blob/f5e4ae4488028ae03795b686f128a735bb168bef/R/1-functions_processing.R#L167-L175

`extdata` files

getting new example files (smaller!) for extdata:

~~- [ ] drydown DMSO~~

drydown D2O tests (process in MNova)
send drydown/hysteresis files to MEB for TopSpin

R CMD CHECK failures

Hi @kaizadp , a few questions:

Tests

Right now the tests fail on my system:

── Failure (test-processing.R:31:3): import-spectra works ──────────────────────
`spectra_test` (`actual`) not equal to `spectra_old` (`expected`).

Is this known? Do you know what's going on?

No visible binding

At the moment checking the package produces lots of CRAN notes of the form

  assign_compound_classes: no visible binding for global variable ‘group’

This is a consequence of using dplyr and more specifically its nonstandard evaluation. If we're going to stick to this approach, we should clean up the warnings.

Documentation

  prepare_Rd: assign_compound_classes.Rd:22-24: Dropping empty section \examples

Similarly, there lots of messages like this...

nmr_plot_spectra parameter names

nmr_plot_spectra <- function(dat, BINSET, LABEL_POSITION, mapping, STAGGER) {

The all-caps BINSET, LABEL_POSITION, and STAGGER are jarring and inconsistent with the naming convention used by the rest of the package functions. Could we change them?

testing

trying an old-new comparison, and I get an error.

https://github.com/bpbond/nmrrr/blob/12819e9c0d015fd5a192060962e764f1fa96949d/tests/testthat/test-processing.R#L3-L13

@bpbond

final (?) push!

for @kaizadp:

check ~~Preson and Baldock~~ Baldock2004 bins from AMP
check CadeMenun bins from MEB
check names for all datasets and bins
add contexts to bins
delete old directory when we're done with everything
startup message (R/zzz.R)
Wiki info for MNova and initial processing (see here)

for @bpbond:

final checks on package, documentation, etc.
vignette:
- review vignette, please
- vignette: we provide one example (kfp_hysteresis) that is run (eval = TRUE) in the markdown, and one example (amp_ss) where we just provide the workflow, but don't actually run it (eval = FALSE). the latter is mainly because the initial processing is in tidyverse and I didn't want to deal with converting to base-R. Is this ok? (PS - we also include both examples in our manuscript)
- sessionInfo for vignette: yay or nay?
- I've installed and loaded the package, but am unable to access the vignette through vignette("nmrrr") or browseVignettes("nmrrr")
there is currently no documentation for the datasets in inst/extdata. is this ok?
should we set up a GitHub Pages for this package? or just stick with a GitHub wiki?

finally,

submit to CRAN!

kaizadp / nmrrr Goto Github PK

nmrrr's People

Contributors

Stargazers

Watchers

Forkers

nmrrr's Issues

Spectra Files

Peaks Files

Tests

No visible binding

Documentation

Recommend Projects

Recommend Topics

Recommend Org