lrnv / cort Goto Github PK
View Code? Open in Web Editor NEWClasses and Tools For Some Empirical and Non-parametrical Copula Models.
Home Page: https://lrnv.github.io/cort/
License: Other
Classes and Tools For Some Empirical and Non-parametrical Copula Models.
Home Page: https://lrnv.github.io/cort/
License: Other
The review from @agisga:
I'm deeply sorry for the delay in the review (the timing turned out to be very far from ideal for me unfortunately). Thank you for the patience.
I think that cort
is quality software and a substantial scholarly effort, and I would suggest acceptance subject to minor revisions correcting the issues outlined in this thread.
Some of the issues that I came across were already mentioned by @coatless above, most importantly, the lack of community guidelines, instructions for potential contributors, for issue reporting or other support. So, I will try not to repeat what has already been said.
Vignettes
In addition to what @coatless has mentioned, the second code block in vignette 4 (https://cran.r-project.org/web/packages/cort/vignettes/vignette04_bootstrap_varying_m.html) has the following redundant lines that should be removed:
test <-
train <-
Automated tests
I get some warnings when running the automated tests manually with devtools::test
(tested with R version 4.0.3 (2020-10-10) on Arch Linux):
test-CortForest.R:5: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".
test-CortForest.R:9: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".
While this is very minor issue, I think such warnings should be fixed (or maybe explicitly set to be ignored) unless there are reasons not to.
State of the field
State of the field: Do the authors describe how this software compares to other commonly-used packages?
I'm not very familiar with estimation of non-parametric copulas and so this may not be relevant, but I don't really see a comparison to other R or Python packages. A simple web search shows that other copula estimation packages exist. While the cort
paper mentions publications by Li et al. and Nagler et al. as alternative approaches, it is not clear whether these papers correspond to some commonly-used software packages or are mainly theoretical/methodological papers.
Quality of writing
There are some minor grammatical and spelling mistakes in the paper. So, a proofreading would be beneficial. Here are some of the mistakes I noticed:
Page 1, third paragraph:
There also exists several tree-structured piecewise constant density estimators,
"exists" should be "exist"
The new models that are implemented in this package tries to solve these issues.
"tries" should be 'try"
Page 2:
"pakcage" should be "package"
"was design" should be "was designed"
Page 2:
"Examples datasets are included in the package, and the many vignettes gives examples of usecases."
"Examples" should be "Example", "gives" should be "give", "usecases" should be "use cases".
Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)
Some more comments from @pdebuyl:
Typo, page 2, first line. "efficicent" -> "efficient"
Typo, page 2, acknowledgments. "meaningfull" -> "meaningful".
Grammar, page 2, second paragraph. "by statistician" -> "by statisticians" (if you pick singular instead, then conjugation of "need" should be changed as well).
Affiliation: please add "France" to the first affiliation and spell out the second affiliation (+ add city and country).
For specific issues such as the API naming, you can discuss them in cort issues and report the conclusion here. For changes to the paper, I prefer if they are done in this thread.
Originally posted by @pdebuyl in openjournals/joss-reviews#2653 (comment)
@coatless wrote:
Paper-wise, there is a missing note that the underlying non-parametric algorithms are written in C++. It would be great to link out to the RcppArmadillo
/Rcpp
paper here.
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
Prepare for release:
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
rhub::check(platform = 'ubuntu-rchk')
rhub::check_with_sanitizers()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
When using the r*()
functions, sometimes a code chunk has a set.seed()
component. On other chunks, this is missing. Thus, there may unintentionally be a hidden state that is present. Moreover, in one vignette, we have the advocation for a non-standard set_seed()
function (note: the _
instead of .
).
https://lrnv.github.io/cort/articles/vignette02_cort_clayton.html#dataset-1
At this point, it would be better to explicitly set a dependency on R >= 3.6
Depending on the Vignette, the R code shown either lacks space or has a lot of space.
For example:
Consistency here would be appreciated. Consider running styler::style_dir("vignettes")
.
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
I would also add a note that the estimation routines for copulas are written in C++ within the JOSS paper. At present, the use of C++ isn't stated as it only mentions S4 and R.
State of the field
State of the field: Do the authors describe how this software compares to other commonly-used packages?
I'm not very familiar with estimation of non-parametric copulas and so this may not be relevant, but I don't really see a comparison to other R or Python packages. A simple web search shows that other copula estimation packages exist. While the cort
paper mentions publications by Li et al. and Nagler et al. as alternative approaches, it is not clear whether these papers correspond to some commonly-used software packages or are mainly theoretical/methodological papers.
Quality of writing
There are some minor grammatical and spelling mistakes in the paper. So, a proofreading would be beneficial. Here are some of the mistakes I noticed:
Page 1, third paragraph:
There also exists several tree-structured piecewise constant density estimators,
"exists" should be "exist"
The new models that are implemented in this package tries to solve these issues.
"tries" should be 'try"
Page 2:
"pakcage" should be "package"
"was design" should be "was designed"
Page 2:
"Examples datasets are included in the package, and the many vignettes gives examples of usecases."
"Examples" should be "Example", "gives" should be "give", "usecases" should be "use cases".
Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)
Typo, page 2, first line. "efficicent" -> "efficient"
Typo, page 2, acknowledgments. "meaningfull" -> "meaningful".
Grammar, page 2, second paragraph. "by statistician" -> "by statisticians" (if you pick singular instead, then conjugation of "need" should be changed as well).
Affiliation: please add "France" to the first affiliation and spell out the second affiliation (+ add city and country).
For specific issues such as the API naming, you can discuss them in cort issues and report the conclusion here. For changes to the paper, I prefer if they are done in this thread.
Originally posted by @pdebuyl in openjournals/joss-reviews#2653 (comment)
Functions within the package switch back and forth between using snake_case and CamelCase. It would be ideal if there could be some consistency.
https://lrnv.github.io/cort/reference/index.html
Moreover, could the reference portion of the API on the pkgdown
website be split into subcategories? e.g. data/cort/copulas.
Minor note: Cort()
uses N=
whereas elsewhere all function parameters are lower-case.
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
Prepare for release:
usethis::use_cran_comments()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
rhub::check(platform = 'ubuntu-rchk')
rhub::check_with_sanitizers()
cran-comments.md
Submit to CRAN:
usethis::use_version('minor')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
The review from @coatless:
I'm a huge fan of the package. Long ago I stepped foot into copula estimation with R and was disappointed by the tools available. cort
fills portions of that void. Examples with the package are easily accessible outside of R thanks to a pkgdown
website. Having said this, I do have a few comments and suggestions around the package. I've listed them below.
Please indicate on the README that installation from GitHub will require the system to have a compiler.
I would also add a note that the estimation routines for copulas are written in C++ within the JOSS paper. At present, the use of C++ isn't stated as it only mentions S4 and R.
There isn't a Contributor Code of Conduct, contributor guidelines, or how to report a problem list. Perhaps this could be added to the end of the README?
When using the r*()
functions, sometimes a code chunk has a set.seed()
component. On other chunks, this is missing. Thus, there may unintentionally be a hidden state that is present. Moreover, in one vignette, we have the advocation for a non-standard set_seed()
function (note: the _
instead of .
).
https://lrnv.github.io/cort/articles/vignette02_cort_clayton.html#dataset-1
At this point, it would be better to explicitly set a dependency on R >= 3.6
Depending on the Vignette, the R code shown either lacks space or has a lot of space.
For example:
Consistency here would be appreciated. Consider running styler::style_dir("vignettes")
.
Functions within the package switch back and forth between using snake_case and CamelCase. It would be ideal if there could be some consistency.
https://lrnv.github.io/cort/reference/index.html
Moreover, could the reference portion of the API on the pkgdown
website be split into subcategories? e.g. data/cort/copulas.
Minor note: Cort()
uses N=
whereas elsewhere all function parameters are lower-case.
There is a documentation article for each function. However, I think some of the documentation entries should receive additional insight into the implementation, improved explanation of what is happening in the example, and perhaps a periodic change in which data set is used for the computation. (~4 ship with the package but only LifeCycleSavings
from datasets
is used.)
For instance, in the the titular function Cort()
for the cort
package, there is sparse details and no indication of what is contained in a Cort
object:
https://lrnv.github.io/cort/reference/Cort-Class.html
R> cop <- Cort(LifeCycleSavings[,1:3])
#Splitting...
#
# 1 leaves to split...
# 5 leaves to split...
# 10 leaves to split...
# 1 leaves to split...
#Enforcing constraints...
#Done !
R> str(cop)
#Formal class 'Cort' [package "cort"] with 11 slots
# ..@ p_value_for_dim_red: num 0.75
# ..@ number_max_dim : int 3
# ..@ min_node_size : num 1
# ..@ verbose_lvl : num 1
# ..@ vols : num [1:88] 0.05559 0.16964 0.04987 0.06598 0.00195 ...
# ..@ f : num [1:88] 0 0.02 0.02 0.02 0.02 0 0 0 0.02 0 ...
# ..@ p : num [1:88] 1.95e-09 5.75e-02 2.66e-02 7.37e-03 9.78e-04 ...
# ..@ a : num [1:88, 1:3] 0 0.227 0 0 0 ...
# ..@ b : num [1:88, 1:3] 0.227 1 0.227 0.227 0.227 ...
# ..@ data : num [1:50, 1:3] 0.627 0.686 0.824 0.235 0.804 ...
# ..@ dim : int 3
In contrast, the S4 documentation for Matrix()
in the Matrix
package looks like: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/Matrix-class.html
In another documentation entry, I think a numerical tolerance issue is present due to using ==
instead of all.equal()
:
https://lrnv.github.io/cort/reference/quad_prod-methods.html
R> quad_prod(cop,cop)
# [1] 10.81164
R> quad_norm(cop)
# [1] 10.81164
R> quad_norm(cop) == quad_prod(cop,cop)
# [1] FALSE
For the simulated data sets, there isn't a simulation script associated with the help file:
Though, the scripts can be found in:
https://github.com/lrnv/cort/tree/master/data-raw
I think it would be helpful to link to data-raw/
or directly embed it within the Rd file.
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
@coatless wrote
Regarding code, the documentation governing the object slots is still a bit sparse. It would be good to understand what is being returned within the Cort()
object explicitly.
That is, what do the following slot entries correspond to?
p_value_for_dim_red
number_max_dim
min_node_size
verbose_lvl
vols
f
p
a
b
data
dim
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
Please indicate on the README that installation from GitHub will require the system to have a compiler.
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
Some of the issues that I came across were already mentioned by @coatless above, most importantly, the lack of community guidelines, instructions for potential contributors, for issue reporting or other support. So, I will try not to repeat what has already been said.
Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)
@lrnv Generally the code changes look good. Here are a few comments based on my previous feedback and the new changes:
cort
package works flawlessly on my system. However, the first time I tried it, I was getting error messages, which went away after I upgraded furrr
to the newest version (from CRAN). I'm not sure what version of furrr
I had prior to upgrading. Maybe it would be good to provide version requirements for furrr
and all other dependencies in the DESCRIPTION
(currently, testthat
has a version requirement stated, but other dependencies do not)?I also had a very quick look at the new changes to the paper:
copula
R package is given at https://cran.r-project.org/web/packages/copula/citation.html I don't think it's appropriate to just give a link, when there are several peer-reviewed JSS publications.Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)
Prepare for release:
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
rhub::check(platform = 'ubuntu-rchk')
rhub::check_with_sanitizers()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Hi, are there any plans to update the packages to fix the issues that lead to the CRAN archival of the package? I'm asking because the package is listed in the Distributions task view: cran-task-views/Distributions#23
Thanks in advance for your consideration!
For the simulated data sets, there isn't a simulation script associated with the help file:
Though, the scripts can be found in:
https://github.com/lrnv/cort/tree/master/data-raw
I think it would be helpful to link to data-raw/
or directly embed it within the Rd file.
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
Automated tests
I get some warnings when running the automated tests manually with devtools::test
(tested with R version 4.0.3 (2020-10-10) on Arch Linux):
test-CortForest.R:5: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".
test-CortForest.R:9: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".
While this is very minor issue, I think such warnings should be fixed (or maybe explicitly set to be ignored) unless there are reasons not to.
Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)
There is a documentation article for each function. However, I think some of the documentation entries should receive additional insight into the implementation, improved explanation of what is happening in the example, and perhaps a periodic change in which data set is used for the computation. (~4 ship with the package but only LifeCycleSavings
from datasets
is used.)
For instance, in the the titular function Cort()
for the cort
package, there is sparse details and no indication of what is contained in a Cort
object:
https://lrnv.github.io/cort/reference/Cort-Class.html
R> cop <- Cort(LifeCycleSavings[,1:3])
#Splitting...
#
# 1 leaves to split...
# 5 leaves to split...
# 10 leaves to split...
# 1 leaves to split...
#Enforcing constraints...
#Done !
R> str(cop)
#Formal class 'Cort' [package "cort"] with 11 slots
# ..@ p_value_for_dim_red: num 0.75
# ..@ number_max_dim : int 3
# ..@ min_node_size : num 1
# ..@ verbose_lvl : num 1
# ..@ vols : num [1:88] 0.05559 0.16964 0.04987 0.06598 0.00195 ...
# ..@ f : num [1:88] 0 0.02 0.02 0.02 0.02 0 0 0 0.02 0 ...
# ..@ p : num [1:88] 1.95e-09 5.75e-02 2.66e-02 7.37e-03 9.78e-04 ...
# ..@ a : num [1:88, 1:3] 0 0.227 0 0 0 ...
# ..@ b : num [1:88, 1:3] 0.227 1 0.227 0.227 0.227 ...
# ..@ data : num [1:50, 1:3] 0.627 0.686 0.824 0.235 0.804 ...
# ..@ dim : int 3
In contrast, the S4 documentation for Matrix()
in the Matrix
package looks like: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/Matrix-class.html
In another documentation entry, I think a numerical tolerance issue is present due to using ==
instead of all.equal()
:
https://lrnv.github.io/cort/reference/quad_prod-methods.html
R> quad_prod(cop,cop)
# [1] 10.81164
R> quad_norm(cop)
# [1] 10.81164
R> quad_norm(cop) == quad_prod(cop,cop)
# [1] FALSE
Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.