mi2datalab / memr
R package for Multisource Embeddings for Medical Records
Home Page: https://mi2datalab.github.io/memr/
License: Other
re: openjournals/joss-reviews#2482
State of the field: Do the authors describe how this software compares to other commonly-used packages?
What other packages have similar functionality to this one? Without memr, what would a researcher or doctor use to perform similar analysis on their data? What does memr add over someone using text2vec and performing cluster or PCA analysis themselves?
The following code from README.Rmd results in errors when executed in RStudio Version 1.4.1106 with R version 4.0.4 (64-bit):
embedding_size <- 5
interview_term_vectors <- embed_terms(merged_terms = interviews, embedding_size = embedding_size,
term_count_min = 1L)
Error in initialize(...) :
unused arguments (word_vectors_size = 5, vocabulary = list(c("fever", "rhinitis", "cough", "eye", "thyroid"), c(3, 3, 4, 4, 6), c(3, 3, 4, 4, 6)))
examination_term_vectors <- embed_terms(merged_terms = examinations, embedding_size = embedding_size,
term_count_min = 1L)
Error in initialize(...) :
unused arguments (word_vectors_size = 5, vocabulary = list(c("fever", "man", "mother", "cough", "heart", "patient", "thyroid", "eye", "rhinitis", "woman", "father"), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7)))
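The error message is consistent with a change in the text2vec API: in text2vec 0.6 and later, `GlobalVectors$new()` takes `rank` and `x_max` and no longer accepts `word_vectors_size` or `vocabulary`. How `embed_terms()` calls text2vec internally is an assumption here, since its source is not shown in this thread. A minimal sketch of fitting GloVe vectors with the newer interface, using a made-up token list rather than the README data:

```r
library(text2vec)

# Hypothetical toy input: one character vector of terms per record.
records <- list(
  c("fever", "cough", "rhinitis"),
  c("thyroid", "eye", "cough"),
  c("fever", "thyroid", "eye", "rhinitis")
)
embedding_size <- 5

# Build the vocabulary and the term co-occurrence matrix (TCM).
it <- itoken(records, progressbar = FALSE)
vocab <- create_vocabulary(it)
vocab <- prune_vocabulary(vocab, term_count_min = 1L)
vectorizer <- vocab_vectorizer(vocab)
tcm <- create_tcm(it, vectorizer, skip_grams_window = 5L)

# Newer interface: `rank` replaces `word_vectors_size`,
# and the vocabulary is no longer passed to the constructor.
glove <- GlobalVectors$new(rank = embedding_size, x_max = 10)
main_vectors <- glove$fit_transform(tcm, n_iter = 10)

# Sum main and context vectors, as the text2vec GloVe vignette does.
word_vectors <- main_vectors + t(glove$components)
dim(word_vectors)
```

If this is indeed the cause, either updating the call inside the package or pinning a text2vec version in DESCRIPTION would let the README example run again.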
Consider recommending remotes::install_git() instead of devtools::install_git() as the remotes package has fewer dependencies than devtools and is less likely to cause installation issues for people.
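For illustration, the suggested installation instructions might look like the following. The repository URL is an assumption derived from the project name shown on this page and should be checked against the README:

```r
# Install the lightweight remotes package from CRAN.
install.packages("remotes")

# Install memr from its Git repository (URL assumed, see note above).
remotes::install_git("https://github.com/mi2datalab/memr.git")
```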
Also consider submitting the package to CRAN or Bioconductor to make discovery and installation easier for more R users.
For this package to be useful for other researchers and to serve a purpose beyond capturing the method and code used for https://arxiv.org/pdf/1907.04152.pdf, it needs a vignette and more extensive documentation.
After reading the JOSS paper, the README here, and the documentation, I'm not clear on how a researcher or doctor would start to use this package.
The README references "medical free-text records written by doctors", but the example datasets are highly distilled and contain just a few terms. Given the description both here and in the arXiv paper, I expected a sample dataset that approximates the structure of the "dataset of free-text clinical records" referenced. I then expected to see documentation and examples showing how a user of the package would transform this raw data (or their own similar data) into the distilled inputs expected by the functions of this package.
From https://arxiv.org/pdf/1907.04152.pdf, it seems that memr is not focused on this data processing. If this is correct, I'd suggest 1) editing the description of the package to reflect what type of data it can be used with, and 2) documenting the structure the functions expect of their data inputs and the characteristics that data should have (e.g. should terms be lowercase? limited to certain parts of speech?). memr does not necessarily need to have all of the functionality to process medical free-text records into the format the package needs (although that would be helpful), but potential users need to know what type of data inputs they need to create. The sample datasets and vectors are insufficient to determine this.
re: openjournals/joss-reviews#2482
memr needs:
Community guidelines: Are there clear guidelines for third parties wishing to 1) contribute to the software, 2) report issues or problems with the software, and 3) seek support?
re: openjournals/joss-reviews#2482
The JOSS paper should cite the other R packages you use. It currently only cites text2vec.