kaizadp / nmrrr
Binning and visualizing NMR spectra from environmental samples
License: Other
Spots that @kaizadp needs to look at are marked with KP_TODO.
In general I would suggest being wary of rounding numerical data, 'helpfully' dropping NA groups, etc. Let users make those decisions.
compute_relabund_treatments seems trivial and not worth including in the package.
Three notes in the most recent CMD check:
size of tarball
This is probably because of the larger_datasets directory. It is staying only on the GitHub repo, so I will delete it before I submit to CRAN. Not an issue.
non-standard directory found
Same as above. Not an issue.
"use inherits() ..."
Need help, please. Not sure what the right syntax is for this.
https://github.com/kaizadp/nmrrr/actions/runs/4735866927/jobs/8406649483
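On the inherits() note: it usually points at code that compares class(x) to a string inside an if(). A minimal sketch of the replacement pattern follows. The helper name check_is_data_frame and the error message are made up for illustration and are not part of nmrrr.

```r
# Hypothetical helper (not from nmrrr): validate that the input is a data frame.
check_is_data_frame <- function(x) {
  # Fragile pattern that the check complains about:
  #   if (class(x) == "data.frame") ...
  # class(x) can have length > 1 (a tibble has three classes), so the
  # comparison yields a vector rather than a single TRUE/FALSE.

  # inherits() returns one logical value however many classes x has.
  if (!inherits(x, "data.frame")) {
    stop("`x` must be a data frame", call. = FALSE)
  }
  invisible(TRUE)
}

check_is_data_frame(data.frame(ppm = c(1.2, 3.4)))
```

Because a tibble also inherits from "data.frame", the same test accepts both plain data frames and tibbles.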
Got some errors and notes when running CMD CHECK --as-cran
* checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Kaizad Patel <[email protected]>’
^ this is OK
* checking top-level files ... NOTE
Files ‘README.md’ or ‘NEWS.md’ cannot be checked without ‘pandoc’ being installed.
^ do we need to include pandoc in the dependencies?
* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
* checking PDF version of manual without index ... ERROR
Re-running with no redirection of stdout/stderr.
* checking for non-standard things in the check directory ... NOTE
Found the following files/directories:
‘nmrrr-manual.tex’
^ I tried searching online, but can't find a solution!
Status: 1 ERROR, 1 WARNING, 3 NOTEs
This is the log:
00check.log
method = "peaks" calculates relative abundance based on area under the curve, but TopSpin peaks files have Intensity, not Area. Need to update the function.
Temporary fix: rename Intensity to Area to get it to work (8266af0).
In the gg_spectra() function, we have an argument to stagger the spectra, so we can see multiple spectra at once. It works fine when we have just a single panel in the graph. But when I use facet_wrap, the staggering is weird (see pics). Is there a way we can "reset" the staggering, so it starts at 0 for each facet?
... or is this asking for too much from our package? It would be a cool feature, but we also don't want to break/bog down the functions with too many customizations.
https://github.com/bpbond/nmrrr/blob/7c19c171deba6c697ccd32d02edaf76cf943181f/scratchpad.R#L32-L49
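One way to "reset" the stagger per facet is to number the samples within each facet and derive the offset from that within-facet index. The sketch below uses toy data and base R only. The column names site, sample, and intensity are assumptions, and it does not reflect how gg_spectra() computes its stagger internally.

```r
# Toy data: two facets ("site"), each holding two samples.
spectra <- data.frame(
  site = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sample = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  ppm = rep(c(1, 2), times = 4),
  intensity = c(5, 6, 5, 6, 5, 6, 5, 6),
  stringsAsFactors = FALSE
)
stagger <- 10

# Number the samples within each facet, restarting at 1 for every site.
spectra$sample_index <- ave(
  as.integer(factor(spectra$sample)),
  spectra$site,
  FUN = function(i) match(i, sort(unique(i)))
)

# The first sample in every facet gets offset 0, the second gets `stagger`, etc.
spectra$intensity_staggered <-
  spectra$intensity + (spectra$sample_index - 1) * stagger
```

The intensity_staggered column could then be mapped to y before facet_wrap, so every panel starts its stack at the same baseline.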
> bins_Clemente2012
# A tibble: 6 × 5
  number group      start  stop description
   <int> <chr>      <dbl> <dbl> <chr>
1      1 aliphatic1   0.3   1.3 aliphatic methyl and methylene
2      2 aliphatic2   1.3   2.2 aliphatic methyl and methylene near O and N
3      3 oalkyl       2.9   4.1 O-alkyl, mainly from carbs and lignin
4      4 alphah       4.1   4.8 alpha-H from proteins
5      5 aromatic     6.2   7.8 aromatic, from lignin and proteins
6      6 amide        7.8   8.4 amide from proteins
Does a ppm value of 7.8 belong to bin 5 or 6?
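The answer depends on which end of each interval is closed. The sketch below uses base cut() on the two bins that share the 7.8 boundary. It only illustrates the two conventions and does not say which one the package's bin assignment actually uses.

```r
# The two bins from bins_Clemente2012 that meet at 7.8 ppm.
breaks <- c(6.2, 7.8, 8.4)
labels <- c("aromatic", "amide")
ppm <- c(7.79, 7.8, 7.81)

# right = FALSE: intervals are [start, stop), so 7.8 falls in the upper bin.
left_closed <- as.character(cut(ppm, breaks, labels = labels, right = FALSE))

# right = TRUE: intervals are (start, stop], so 7.8 stays in the lower bin.
right_closed <- as.character(cut(ppm, breaks, labels = labels, right = TRUE))

data.frame(ppm, left_closed, right_closed)
```

Whichever convention is chosen, it is probably worth stating in the binset documentation so users know where boundary values end up.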
There are two major NMR programs, TopSpin and MestreNova (MNova), and we'd like our functions to be able to handle data from both. The problem: the data are in very different formats (see screenshots below). The way I see it, we have two options:
1. A single set of functions (import_spectra, import_peaks, etc.) that takes into account the various formats. Maybe a kind of if_loop where the user enters the software name? If software == MNova, then do x. If software == TopSpin, then do y.
2. Separate functions for each program (import_spectra_topspin, import_spectra_mnova).
Thoughts?
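For the first option, the "if software == ..." idea can be sketched as one user-facing function that validates the software name and dispatches to a per-program reader. Everything here is hypothetical: import_spectra_sketch, read_topspin, and read_mnova are placeholder names, and the readers return a string instead of parsing anything.

```r
# Placeholder readers, one per program; real parsing is omitted.
read_topspin <- function(path) paste("TopSpin parser for", path)
read_mnova <- function(path) paste("MNova parser for", path)

import_spectra_sketch <- function(path, software = c("topspin", "mnova")) {
  # match.arg() picks the first choice when the argument is omitted
  # and raises an error for anything outside the allowed values.
  software <- match.arg(software)
  switch(software,
    topspin = read_topspin(path),
    mnova = read_mnova(path)
  )
}

import_spectra_sketch("spectrum.csv", software = "mnova")
```

This keeps a single entry point for users while the format-specific code still lives in separate functions, so the two options are not mutually exclusive.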
Now that paper has been published 🎉 https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023JG007768
I've written the three vignettes, could you please review to make sure they look ok, and that I'm not missing anything? I'll also ask AMP and MEB to check the content of the vignettes. Two of the vignettes contain only text, no code running. The nmrrr vignette walks the user through the workflow.
Suggestions on vignette titles? I currently have this, but I'm not sure how these will be saved in the package, or how to call the vignette in R (e.g. vignette("nmrrr_binsets") or vignette("Bin sets included with nmrrr")).
https://github.com/bpbond/nmrrr/blob/84d4d14dfcc9a0278a209ace7eb400591a806f5b/vignettes/nmrrr_binsets.Rmd#L2-L5
I'm having some trouble with the file paths in the nmrrr vignette. The "folder/file" format gave me a CMD error, so I switched to the system.file("folder", "file") format, and CMD check passed. But when I try to knit the .Rmd, it gives me an error.
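One thing that might be worth checking, assuming the call in the vignette really has no package argument: system.file() searches the base package by default and returns an empty string when nothing is found, which then fails later when the path is read. The sketch below demonstrates that behaviour with the stats package, chosen only because it is always installed. For this package the call would presumably look like system.file("extdata", "some_file", package = "nmrrr"), where the folder and file names are placeholders.

```r
# Without `package =`, system.file() looks inside the base package,
# so a path that only exists in another package comes back as "".
missing_path <- system.file("extdata", "not_a_real_file.csv")

# With `package =`, the path is resolved inside that installed package.
found_path <- system.file("DESCRIPTION", package = "stats")

# mustWork = TRUE turns the silent "" into an error, which is easier to debug.
strict <- try(
  system.file("extdata", "not_a_real_file.csv",
              package = "stats", mustWork = TRUE),
  silent = TRUE
)
```

system.file() resolves paths against the installed copy of a package, so knitting may also fail if the installed version is missing or older than the source tree.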
See original issue here: #6
Three potential datasets:
kfp_hysteresis:
amp_tempest:
meb_incubation:
inst/extdata size
I won't have much time this coming week, but @kaizadp, please test out the package, in particular reading spectra and peaks files, and assigning bins. Feedback welcome.
Our functions are now unable to import the files correctly, and I'm not sure why.
This error occurs only in the shorten_data_files branch, and not in the main one. All I did was delete (move) a lot of the files. Where we previously had 9-10 spectra/peak files, we now have only 2. Files here.
It's throwing up CMD-check errors
Two problems:
I tried testing the code in the scratchpad file; it imports the files incorrectly.
https://github.com/bpbond/nmrrr/blob/bfc997d3198be7eab6befbc64a8fd789bdbb5b7f/old/scratchpad.R#L19-L21
assign_compound_classes_v2 <- function(dat, BINSET) {
  # Quiet R CMD CHECK notes
  group <- start <- NULL

  # load binsets
  bins <- BINSET %>%
    select(group, start, stop) %>%
    arrange(start, stop)

  # identify gaps between bins
  gaps <- c(utils::head(bins$stop, -1) != bins$start[-1], TRUE)

  # create new gap bins
  gapbins <- dplyr::tibble(group = NA_character_, start = bins$stop[gaps])
  newbins <- rbind(bins[c("group", "start")], gapbins) %>% arrange(start)
  newbins[is.na(newbins)] <- "NANA"

  dat$group <- cut(dat$ppm, newbins$start, labels = head(newbins$group, -1), right = FALSE)
  dat %>% filter(group != "NANA")
}
Currently assign_compound_classes_v2 isn't tested or referenced anywhere in the code.
Removing (in e79720e) until @kaizadp tells me what to do with it :)
Cleaning of NMR spectra:
Processing and analysis of NMR spectra/data (using nmrrr package):
Line 30 in afef424
Thoughts on setting some default values for the graphing function? Just so it prints the graph with minimal input, and then users can modify the parameters as needed. Right now, the function won't run unless we provide five inputs.
nmr_plot_spectra <- function(dat, binset, label_position = 100, mapping = aes(x = ppm, y = intensity), stagger = 10) {
^ These defaults give me this graph for my kfp_hysteresis data. Not very pretty, but it at least gives me a graph, which I can then modify.
Some packages, e.g. googlesheets, start all their exported (user-facing) functions with a common prefix, e.g. "gs_".
nmrrr could start all of its functions with "nmr_"? Just a thought.
I need to bin my spectra based on a set of compound classes. The spectra datafile has a ppm column that goes from 0 to ~15, whereas the bins represent only small chunks across this overall range. The code I have assigns the bins, but only subsets the parts that were a match. So I have blank spots for regions that were not a match. How can I assign these bins while also keeping all the non-matches?
Essentially, a full_join, but where one of the data-frames has only ranges.
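A minimal base-R sketch of that kind of range join: every row of the spectra is kept, and rows outside all bins get NA for the group. The bin values are the first three rows of bins_Clemente2012. The ppm values are invented, and treating each bin as [start, stop) is an assumption.

```r
bins <- data.frame(
  group = c("aliphatic1", "aliphatic2", "oalkyl"),
  start = c(0.3, 1.3, 2.9),
  stop = c(1.3, 2.2, 4.1),
  stringsAsFactors = FALSE
)
# Made-up ppm values, some inside a bin and some in the gaps.
spectra <- data.frame(ppm = c(0.1, 0.5, 1.5, 2.5, 3.0, 10))

# Start with NA everywhere, then fill in the rows that fall inside a bin.
spectra$group <- NA_character_
for (i in seq_len(nrow(bins))) {
  in_bin <- spectra$ppm >= bins$start[i] & spectra$ppm < bins$stop[i]
  spectra$group[in_bin] <- bins$group[i]
}

spectra
```

Nothing is dropped here, so callers can decide for themselves whether to filter out the NA rows afterwards.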
getting new example files (smaller!) for extdata:
- [ ] drydown DMSO
Hi @kaizadp , a few questions:
Right now the tests fail on my system:
── Failure (test-processing.R:31:3): import-spectra works ──────────────────────
`spectra_test` (`actual`) not equal to `spectra_old` (`expected`).
Is this known? Do you know what's going on?
At the moment checking the package produces lots of CRAN notes of the form
assign_compound_classes: no visible binding for global variable ‘group’
This is a consequence of using dplyr and more specifically its nonstandard evaluation. If we're going to stick to this approach, we should clean up the warnings.
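One common way to quiet these notes is to bind the column names to NULL at the top of the function, so the static check finds a local variable. The sketch below shows the idiom with base subset(), which is also non-standard evaluation, only to keep the example free of dependencies. The helper keep_groups is hypothetical. Other options that exist are utils::globalVariables() in a package-level file, or the .data pronoun with dplyr verbs.

```r
# Hypothetical helper that filters on a `group` column via
# non-standard evaluation.
keep_groups <- function(dat, keep) {
  # A local NULL binding gives the checker something to find;
  # subset() still looks up `group` in `dat` before this variable.
  group <- NULL
  subset(dat, group %in% keep)
}

dat <- data.frame(
  ppm = c(0.5, 3.0, 7.0),
  group = c("aliphatic1", "oalkyl", "aromatic"),
  stringsAsFactors = FALSE
)
keep_groups(dat, keep = c("oalkyl", "aromatic"))
```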
prepare_Rd: assign_compound_classes.Rd:22-24: Dropping empty section \examples
Similarly, there are lots of messages like this...
nmr_plot_spectra <- function(dat, BINSET, LABEL_POSITION, mapping, STAGGER) {
The all-caps BINSET, LABEL_POSITION, and STAGGER are jarring and inconsistent with the naming convention used by the rest of the package functions. Could we change them?
trying an old-new comparison, and I get an error.
for @kaizadp:
old directory when we're done with everything
R/zzz.R)
for @bpbond:
vignette("nmrrr") or browseVignettes("nmrrr")
inst/extdata. is this ok?
finally,