Giter Club home page Giter Club logo

nmrrr's People

Contributors

bpbond avatar kaizadp avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

compass-doe

nmrrr's Issues

Be wary of 'helpful' steps

In general I would suggest being wary of rounding numerical data, 'helpfully' dropping NA groups, etc. Let users make those decisions.

checks

@bpbond

Three notes in the most recent CMD check

  1. size of tarball
    This is probably because of the larger_datasets directory. This is staying only on the GitHub repo, so I will delete before I submit to CRAN. Not an issue.

  2. non-standard directory found
    same as above. Not an issue.

  3. "use inherits() ..."
    Need help, please. Not sure what the right syntax is for this.

image

https://github.com/kaizadp/nmrrr/actions/runs/4735866927/jobs/8406649483

CMD CHECK --as-cran

@bpbond

Got some errors and notes when running CMD CHECK --as-cran

* checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Kaizad Patel <[email protected]>’

^ this is OK

* checking top-level files ... NOTE
Files ‘README.md’ or ‘NEWS.md’ cannot be checked without ‘pandoc’ being installed.

^ do we need to include pandoc in the dependencies?

* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
* checking PDF version of manual without index ... ERROR
Re-running with no redirection of stdout/stderr.
* checking for non-standard things in the check directory ... NOTE
Found the following files/directories:
  ‘nmrrr-manual.tex’

^ I tried searching online, but can't find a solution!

Status: 1 ERROR, 1 WARNING, 3 NOTEs

This is the log:
00check.log

relabund with topspin (peaks method)

method = "peaks" calculates relative abundance based on area under the curve. but topspin peaks files have Intensity, not Area. Need to update the function.

Temporary fix: rename Intensity to Area to get it to work 8266af0

gg_spectra facets/stagger

@bpbond

In the gg_spectra() function, we have an argument to stagger the spectra, so we can see multiple spectra at once. Works fine when we have just a single panel in the graph. But when I use facet_wrap, the staggering is weird (see pics). Is there a way we can "reset" the staggering, so it starts at 0 for each facet?

... or is this asking for too much from our package? It would be a cool feature, but we also don't want to break/bog down the functions with too many customizations.

image
image

https://github.com/bpbond/nmrrr/blob/7c19c171deba6c697ccd32d02edaf76cf943181f/scratchpad.R#L32-L49

Bin assignment

> bins_Clemente2012
# A tibble: 6 × 5
  number group      start  stop description                                
   <int> <chr>      <dbl> <dbl> <chr>                                      
1      1 aliphatic1   0.3   1.3 aliphatic methyl and methylene             
2      2 aliphatic2   1.3   2.2 aliphatic methyl and methylene near O and N
3      3 oalkyl       2.9   4.1 O-alkyl, mainly from carbs and lignin      
4      4 alphah       4.1   4.8 alpha-H from proteins                      
5      5 aromatic     6.2   7.8 aromatic, from lignin and proteins         
6      6 amide        7.8   8.4 amide from proteins      

Does a ppm value of 7.8 belong to bin 5 or 6?

data acquired from different programs

@bpbond

There are two major NMR programs, TopSpin and MestreNova (MNova), and we'd like our functions to be able to handle data from both. The problem - the data are in very different formats (see screenshots below). The way I see it, we have two options:

  1. one single function (import_spectra, import_peaks, etc.) that takes into account the various formats. Maybe a kind of if_loop where the user enters the software name? If software == MNova, then do x. If software == TopSpin, then do y.
  2. separate functions for each program (import_spectra_topspin, import_spectra_mnova)

Thoughts?

Spectra Files

TOPSPIN - spectra
image

MNOVA - spectra
image

Peaks Files

TOPSPIN - peaks
image

MNOVA - peaks, version 1
image

MNOVA - peaks, version 2
image

Checking vignettes and vignette input files

@bpbond

  1. I've written the three vignettes, could you please review to make sure they look ok, and that I'm not missing anything? I'll also ask AMP and MEB to check the content of the vignettes. Two of the vignettes contain only text, no code running. The nmrrr vignette walks the user through the workflow.

  2. Suggestions on vignette titles? I currently have this, but I'm not sure how these will be saved in the package, or how to call the vignette in R (e.g. vignette("nmrrr_binsets") or vignette("Bin sets included with nmrrr")).
    https://github.com/bpbond/nmrrr/blob/84d4d14dfcc9a0278a209ace7eb400591a806f5b/vignettes/nmrrr_binsets.Rmd#L2-L5

  3. I'm having some trouble with the file paths in the nmrrr vignette. The "folder/file" format gave me a CMD error, so I switched to the system.file("folder", "file") format, and CMD check passed. But when I try to knit the .Rmd, it gives me an error.

https://github.com/bpbond/nmrrr/blob/84d4d14dfcc9a0278a209ace7eb400591a806f5b/vignettes/nmrrr.Rmd#L59-L66

image

testing data

Three potential datasets:

  1. kfp_hysteresis
  2. amp_tempest
  3. meb_incubations

kfp_hysteresis:

  • corekey
  • spectra data
  • peaks data - multi-column

amp_tempest:

  • corekey
  • spectra data
  • peaks data - single column

meb_incubation:

  • corekey
  • spectra data
  • peaks data - single column, but different format

Status

  • Code reformatted #10
  • All R CMD CHECK notes fixed except for inst/extdata size
  • Bin assignment code reworked #21
  • Bins are now package data #18
  • Documentation and function exporting greatly changed #14 #17
  • Code, test code, and test data for processing code reworked and expanded #20 #24
  • Relabund code, documentation, and test code reworked #27

I won't have much time this coming week but @kaizadp please test out the package, in particular
reading spectra and peaks files, and assigning bins. Feedback welcome.

issues with import functions

@bpbond

Our functions are now unable to import the files in correctly, and I'm not sure why.

This error occurs only in the shorten_data_files branch, and not in the main one. All I did was delete (move) a lot of the files. Where we previously had 9-10 spectra/peak files, we now have only 2. Files here.

It's throwing up CMD-check errors

Two problems:

  1. peaks files: does not recognize the "9-column groups" for the MNova peaks with the "multiple columns" selection (see error log)

image

  1. spectra files: because of the weird tab vs. comma delim nonsense of MNova, it is not correctly splitting the data into the columns (see below)

Tried testing the code in the scratchpad file, it imports the files wrong.
https://github.com/bpbond/nmrrr/blob/bfc997d3198be7eab6befbc64a8fd789bdbb5b7f/old/scratchpad.R#L19-L21

image

assign_compound_classes_v2

assign_compound_classes_v2 <- function(dat, BINSET) {
  # Quiet R CMD CHECK notes
  group <- start <- NULL

  # load binsets
  bins <- BINSET %>%
    select(group, start, stop) %>%
    arrange(start, stop)

  # identify gaps between bins
  gaps <- c(utils::head(bins$stop, -1) != bins$start[-1], TRUE)
  # create new gap bins
  gapbins <- dplyr::tibble(group = NA_character_, start = bins$stop[gaps])
  newbins <- rbind(bins[c("group", "start")], gapbins) %>% arrange(start)
  newbins[is.na(newbins)] <- "NANA"

  dat$group <- cut(dat$ppm, newbins$start, labels = head(newbins$group, -1), right = FALSE)
  dat %>% filter(group != "NANA")
}

Currently assign_compound_classes_v2 isn't tested or referenced anywhere in the code.
Removing (in e79720e) until @kaizadp tells me what to do with it :)

different programs and file types

@bpbond

Cleaning of NMR spectra:

  • baseline correction, phase correction, peak deconvolution, etc.
  • this is done manually and individually per sample using specific NMR software (MestreNova/MNova and Topspin)
  • NOT part of this package

Processing and analysis of NMR spectra/data (using nmrrr package):

  • take output from MNova or Topspin (.csv files) and process them using the package
  • there are two categories of files: spectra files and peaks files
  • see attached chart for different types/formats, and functions used to process them. (PDF as well as PNG versions)

nmrrr_file_structure.pdf

nmrrr_file_structure

setting defaults for graph function?

@bpbond

nmr_plot_spectra <- function(dat, binset, label_position, mapping, stagger) {

Thoughts on setting some default values for the graphing function? Just so it prints the graph with minimal input, and then users can modify the parameters as needed. Right now, the function won't run unless we provide five inputs.

nmr_plot_spectra <- function(dat, binset, label_position = 100, mapping = aes(x = ppm, y = intensity), stagger = 10) {

^ these defaults gives me this graph for my kfp_hysteresis data. Not very pretty, but it at least gives me a graph, which I can then modify.

image

Function naming

Some packages, e.g. googlesheets, start all their exported (user-facing) functions with a common prefix, e.g. "gs_".

nmrrr could start all of its functions with "nmr_"? Just a thought.

joining dataframes using a range

@bpbond

I need to bin my spectra based on a set of compound classes. The spectra datafile has a ppm column that goes from 0 to ~15, whereas the bins represent only small chunks across this overall range. The code I have assigns the bins, but only subsets the parts that were a match. So I have blank spots for regions that were not a match. How can I assign these bins while also keeping all the non-matches?

Essentially, a full_join, but where one of the data-frames has only ranges.

image

https://github.com/bpbond/nmrrr/blob/f5e4ae4488028ae03795b686f128a735bb168bef/R/1-functions_processing.R#L167-L175

`extdata` files

getting new example files (smaller!) for extdata:

- [ ] drydown DMSO

  • drydown D2O tests (process in MNova)
  • send drydown/hysteresis files to MEB for TopSpin

R CMD CHECK failures

Hi @kaizadp , a few questions:

Tests

Right now the tests fail on my system:

── Failure (test-processing.R:31:3): import-spectra works ──────────────────────
`spectra_test` (`actual`) not equal to `spectra_old` (`expected`).

Is this known? Do you know what's going on?

No visible binding

At the moment checking the package produces lots of CRAN notes of the form

  assign_compound_classes: no visible binding for global variable ‘group’

This is a consequence of using dplyr and more specifically its nonstandard evaluation. If we're going to stick to this approach, we should clean up the warnings.

Documentation

  prepare_Rd: assign_compound_classes.Rd:22-24: Dropping empty section \examples

Similarly, there lots of messages like this...

nmr_plot_spectra parameter names

nmr_plot_spectra <- function(dat, BINSET, LABEL_POSITION, mapping, STAGGER) {

The all-caps BINSET, LABEL_POSITION, and STAGGER are jarring and inconsistent with the naming convention used by the rest of the package functions. Could we change them?

final (?) push!

for @kaizadp:

  • check Preson and Baldock Baldock2004 bins from AMP
  • check CadeMenun bins from MEB
  • check names for all datasets and bins
  • add contexts to bins
  • delete old directory when we're done with everything
  • startup message (R/zzz.R)
  • Wiki info for MNova and initial processing (see here)

for @bpbond:

  • final checks on package, documentation, etc.
  • vignette:
    • review vignette, please
    • vignette: we provide one example (kfp_hysteresis) that is run (eval = TRUE) in the markdown, and one example (amp_ss) where we just provide the workflow, but don't actually run it (eval = FALSE). the latter is mainly because the initial processing is in tidyverse and I didn't want to deal with converting to base-R. Is this ok? (PS - we also include both examples in our manuscript)
    • sessionInfo for vignette: yay or nay?
    • I've installed and loaded the package, but am unable to access the vignette through vignette("nmrrr") or browseVignettes("nmrrr")
  • there is currently no documentation for the datasets in inst/extdata. is this ok?
  • should we set up a GitHub Pages for this package? or just stick with a GitHub wiki?

finally,

  • submit to CRAN!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.