cort,lrnv

@coatless review

The review from @coatless:

I'm a huge fan of the package. Long ago I stepped foot into copula estimation with R and was disappointed by the tools available. cort fills portions of that void. Examples with the package are easily accessible outside of R thanks to a pkgdown website. Having said this, I do have a few comments and suggestions around the package. I've listed them below.

README

Installation

Please indicate on the README that installation from GitHub will require the system to have a compiler.

Windows: Rtools.
macOS: Xcode CLI
Linux: r-base-dev (debian)

I would also add a note that the estimation routines for copulas are written in C++ within the JOSS paper. At present, the use of C++ isn't stated as it only mentions S4 and R.

Contributing

There isn't a Contributor Code of Conduct, contributor guidelines, or how to report a problem list. Perhaps this could be added to the end of the README?

Vignettes

Reproducibility

When using the r*() functions, sometimes a code chunk has a set.seed() component. On other chunks, this is missing. Thus, there may unintentionally be a hidden state that is present. Moreover, in one vignette, we have the advocation for a non-standard set_seed() function (note: the _ instead of .).

https://lrnv.github.io/cort/articles/vignette02_cort_clayton.html#dataset-1

At this point, it would be better to explicitly set a dependency on R >= 3.6

Styling

Depending on the Vignette, the R code shown either lacks space or has a lot of space.

For example:

Lacking white space: https://lrnv.github.io/cort/articles/vignette03_ecbkm.html#current-implementation-1
Spaced out: https://lrnv.github.io/cort/articles/vignette04_bootstrap_varying_m.html#fitting-function-1

Consistency here would be appreciated. Consider running styler::style_dir("vignettes").

Code

API

snake_case vs. camelCase

Functions within the package switch back and forth between using snake_case and CamelCase. It would be ideal if there could be some consistency.

https://lrnv.github.io/cort/reference/index.html

Moreover, could the reference portion of the API on the pkgdown website be split into subcategories? e.g. data/cort/copulas.

Minor note: Cort() uses N= whereas elsewhere all function parameters are lower-case.

Documentation

There is a documentation article for each function. However, I think some of the documentation entries should receive additional insight into the implementation, improved explanation of what is happening in the example, and perhaps a periodic change in which data set is used for the computation. (~4 ship with the package but only LifeCycleSavings from datasets is used.)

For instance, in the the titular function Cort() for the cort package, there is sparse details and no indication of what is contained in a Cort object:

https://lrnv.github.io/cort/reference/Cort-Class.html

R> cop <- Cort(LifeCycleSavings[,1:3])
#Splitting...
#
#     1 leaves to split...
#     5 leaves to split...
#     10 leaves to split...
#     1 leaves to split...
#Enforcing constraints...
#Done !
R> str(cop)
#Formal class 'Cort' [package "cort"] with 11 slots
#  ..@ p_value_for_dim_red: num 0.75
#  ..@ number_max_dim     : int 3
#  ..@ min_node_size      : num 1
#  ..@ verbose_lvl        : num 1
#  ..@ vols               : num [1:88] 0.05559 0.16964 0.04987 0.06598 0.00195 ...
#  ..@ f                  : num [1:88] 0 0.02 0.02 0.02 0.02 0 0 0 0.02 0 ...
#  ..@ p                  : num [1:88] 1.95e-09 5.75e-02 2.66e-02 7.37e-03 9.78e-04 ...
#  ..@ a                  : num [1:88, 1:3] 0 0.227 0 0 0 ...
#  ..@ b                  : num [1:88, 1:3] 0.227 1 0.227 0.227 0.227 ...
#  ..@ data               : num [1:50, 1:3] 0.627 0.686 0.824 0.235 0.804 ...
#  ..@ dim                : int 3

In contrast, the S4 documentation for Matrix() in the Matrix package looks like: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/Matrix-class.html

In another documentation entry, I think a numerical tolerance issue is present due to using == instead of all.equal():
https://lrnv.github.io/cort/reference/quad_prod-methods.html

R> quad_prod(cop,cop)
# [1] 10.81164
R> quad_norm(cop)
# [1] 10.81164
R> quad_norm(cop) == quad_prod(cop,cop)
# [1] FALSE

Simulated Data

For the simulated data sets, there isn't a simulation script associated with the help file:

Though, the scripts can be found in:

https://github.com/lrnv/cort/tree/master/data-raw

I think it would be helpful to link to data-raw/ or directly embed it within the Rd file.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Release cort 0.3.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

@coatless comments: Paper

@coatless wrote:

Paper-wise, there is a missing note that the underlying non-parametric algorithms are written in C++. It would be great to link out to the RcppArmadillo/Rcpp paper here.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

@agisga review

The review from @agisga:

I'm deeply sorry for the delay in the review (the timing turned out to be very far from ideal for me unfortunately). Thank you for the patience.

I think that cort is quality software and a substantial scholarly effort, and I would suggest acceptance subject to minor revisions correcting the issues outlined in this thread.
Some of the issues that I came across were already mentioned by @coatless above, most importantly, the lack of community guidelines, instructions for potential contributors, for issue reporting or other support. So, I will try not to repeat what has already been said.

Vignettes

In addition to what @coatless has mentioned, the second code block in vignette 4 (https://cran.r-project.org/web/packages/cort/vignettes/vignette04_bootstrap_varying_m.html) has the following redundant lines that should be removed:

        test <- 
        train <-

Automated tests

I get some warnings when running the automated tests manually with devtools::test (tested with R version 4.0.3 (2020-10-10) on Arch Linux):

test-CortForest.R:5: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

test-CortForest.R:9: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

While this is very minor issue, I think such warnings should be fixed (or maybe explicitly set to be ignored) unless there are reasons not to.

State of the field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I'm not very familiar with estimation of non-parametric copulas and so this may not be relevant, but I don't really see a comparison to other R or Python packages. A simple web search shows that other copula estimation packages exist. While the cort paper mentions publications by Li et al. and Nagler et al. as alternative approaches, it is not clear whether these papers correspond to some commonly-used software packages or are mainly theoretical/methodological papers.

Quality of writing

There are some minor grammatical and spelling mistakes in the paper. So, a proofreading would be beneficial. Here are some of the mistakes I noticed:

Page 1, third paragraph:

There also exists several tree-structured piecewise constant density estimators,

"exists" should be "exist"

The new models that are implemented in this package tries to solve these issues.

"tries" should be 'try"
Page 2:
- "pakcage" should be "package"
- "was design" should be "was designed"
Page 2:

"Examples datasets are included in the package, and the many vignettes gives examples of usecases."

"Examples" should be "Example", "gives" should be "give", "usecases" should be "use cases".

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

Readme fixes

README

Installation

Please indicate on the README that installation from GitHub will require the system to have a compiler.

Windows: Rtools.
macOS: Xcode CLI
Linux: r-base-dev (debian)

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Some of the issues that I came across were already mentioned by @coatless above, most importantly, the lack of community guidelines, instructions for potential contributors, for issue reporting or other support. So, I will try not to repeat what has already been said.

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

CRAN resurrection?

Hi, are there any plans to update the packages to fix the issues that lead to the CRAN archival of the package? I'm asking because the package is listed in the Distributions task view: cran-task-views/Distributions#23
Thanks in advance for your consideration!

Documentation fixes

Documentation

There is a documentation article for each function. However, I think some of the documentation entries should receive additional insight into the implementation, improved explanation of what is happening in the example, and perhaps a periodic change in which data set is used for the computation. (~4 ship with the package but only LifeCycleSavings from datasets is used.)

For instance, in the the titular function Cort() for the cort package, there is sparse details and no indication of what is contained in a Cort object:

https://lrnv.github.io/cort/reference/Cort-Class.html

R> cop <- Cort(LifeCycleSavings[,1:3])
#Splitting...
#
#     1 leaves to split...
#     5 leaves to split...
#     10 leaves to split...
#     1 leaves to split...
#Enforcing constraints...
#Done !
R> str(cop)
#Formal class 'Cort' [package "cort"] with 11 slots
#  ..@ p_value_for_dim_red: num 0.75
#  ..@ number_max_dim     : int 3
#  ..@ min_node_size      : num 1
#  ..@ verbose_lvl        : num 1
#  ..@ vols               : num [1:88] 0.05559 0.16964 0.04987 0.06598 0.00195 ...
#  ..@ f                  : num [1:88] 0 0.02 0.02 0.02 0.02 0 0 0 0.02 0 ...
#  ..@ p                  : num [1:88] 1.95e-09 5.75e-02 2.66e-02 7.37e-03 9.78e-04 ...
#  ..@ a                  : num [1:88, 1:3] 0 0.227 0 0 0 ...
#  ..@ b                  : num [1:88, 1:3] 0.227 1 0.227 0.227 0.227 ...
#  ..@ data               : num [1:50, 1:3] 0.627 0.686 0.824 0.235 0.804 ...
#  ..@ dim                : int 3

In contrast, the S4 documentation for Matrix() in the Matrix package looks like: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/Matrix-class.html

In another documentation entry, I think a numerical tolerance issue is present due to using == instead of all.equal():
https://lrnv.github.io/cort/reference/quad_prod-methods.html

R> quad_prod(cop,cop)
# [1] 10.81164
R> quad_norm(cop)
# [1] 10.81164
R> quad_norm(cop) == quad_prod(cop,cop)
# [1] FALSE

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

@pdebuyl review

Some more comments from @pdebuyl:

Typo, page 2, first line. "efficicent" -> "efficient"

Typo, page 2, acknowledgments. "meaningfull" -> "meaningful".

Grammar, page 2, second paragraph. "by statistician" -> "by statisticians" (if you pick singular instead, then conjugation of "need" should be changed as well).

Affiliation: please add "France" to the first affiliation and spell out the second affiliation (+ add city and country).

For specific issues such as the API naming, you can discuss them in cort issues and report the conclusion here. For changes to the paper, I prefer if they are done in this thread.

Originally posted by @pdebuyl in openjournals/joss-reviews#2653 (comment)

Simulated dataset fixes

Simulated Data

For the simulated data sets, there isn't a simulation script associated with the help file:

Though, the scripts can be found in:

https://github.com/lrnv/cort/tree/master/data-raw

I think it would be helpful to link to data-raw/ or directly embed it within the Rd file.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Vignettes fixes

Vignettes

Reproducibility

When using the r*() functions, sometimes a code chunk has a set.seed() component. On other chunks, this is missing. Thus, there may unintentionally be a hidden state that is present. Moreover, in one vignette, we have the advocation for a non-standard set_seed() function (note: the _ instead of .).

https://lrnv.github.io/cort/articles/vignette02_cort_clayton.html#dataset-1

At this point, it would be better to explicitly set a dependency on R >= 3.6

Styling

Depending on the Vignette, the R code shown either lacks space or has a lot of space.

For example:

Lacking white space: https://lrnv.github.io/cort/articles/vignette03_ecbkm.html#current-implementation-1
Spaced out: https://lrnv.github.io/cort/articles/vignette04_bootstrap_varying_m.html#fitting-function-1

Consistency here would be appreciated. Consider running styler::style_dir("vignettes").

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Paper fixes

I would also add a note that the estimation routines for copulas are written in C++ within the JOSS paper. At present, the use of C++ isn't stated as it only mentions S4 and R.

State of the field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I'm not very familiar with estimation of non-parametric copulas and so this may not be relevant, but I don't really see a comparison to other R or Python packages. A simple web search shows that other copula estimation packages exist. While the cort paper mentions publications by Li et al. and Nagler et al. as alternative approaches, it is not clear whether these papers correspond to some commonly-used software packages or are mainly theoretical/methodological papers.

Quality of writing

There are some minor grammatical and spelling mistakes in the paper. So, a proofreading would be beneficial. Here are some of the mistakes I noticed:

Page 1, third paragraph:

There also exists several tree-structured piecewise constant density estimators,

"exists" should be "exist"

The new models that are implemented in this package tries to solve these issues.

"tries" should be 'try"
Page 2:
- "pakcage" should be "package"
- "was design" should be "was designed"
Page 2:

"Examples datasets are included in the package, and the many vignettes gives examples of usecases."

"Examples" should be "Example", "gives" should be "give", "usecases" should be "use cases".

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

Typo, page 2, first line. "efficicent" -> "efficient"

Typo, page 2, acknowledgments. "meaningfull" -> "meaningful".

Grammar, page 2, second paragraph. "by statistician" -> "by statisticians" (if you pick singular instead, then conjugation of "need" should be changed as well).

Affiliation: please add "France" to the first affiliation and spell out the second affiliation (+ add city and country).

For specific issues such as the API naming, you can discuss them in cort issues and report the conclusion here. For changes to the paper, I prefer if they are done in this thread.

Originally posted by @pdebuyl in openjournals/joss-reviews#2653 (comment)

@coatless comments: Docs

@coatless wrote

Regarding code, the documentation governing the object slots is still a bit sparse. It would be good to understand what is being returned within the Cort() object explicitly.

That is, what do the following slot entries correspond to?

p_value_for_dim_red
number_max_dim 
min_node_size
verbose_lvl 
vols
f
p
a
b
data
dim

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

API fixes

API

snake_case vs. camelCase

Functions within the package switch back and forth between using snake_case and CamelCase. It would be ideal if there could be some consistency.

https://lrnv.github.io/cort/reference/index.html

Moreover, could the reference portion of the API on the pkgdown website be split into subcategories? e.g. data/cort/copulas.

Minor note: Cort() uses N= whereas elsewhere all function parameters are lower-case.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

@agisga 2nd round

@lrnv Generally the code changes look good. Here are a few comments based on my previous feedback and the new changes:

Vignette 4 still has syntactically wrong and redundant two lines of code here:

cort/vignettes/vignette04_bootstrap_varying_m.Rmd

Line 55 in dd6d5ab

test <-
In issue #11 you say that "the CamelCase and snake_case switches are to seperate constructors and standard (r/d/p/v)-functions from methods and the rest. Take a look at the new reference, you will see that the common uses of the two standards makes sense now that it's ordered." It would be good to mention these or other coding/syntax/style standard (if any) that you follow under the contribution guidelines in the README (or a separate file if it's too much information for the README).
The new version of the cort package works flawlessly on my system. However, the first time I tried it, I was getting error messages, which went away after I upgraded furrr to the newest version (from CRAN). I'm not sure what version of furrr I had prior to upgrading. Maybe it would be good to provide version requirements for furrr and all other dependencies in the DESCRIPTION (currently, testthat has a version requirement stated, but other dependencies do not)?

I also had a very quick look at the new changes to the paper:

The proper citation info for the copula R package is given at https://cran.r-project.org/web/packages/copula/citation.html I don't think it's appropriate to just give a link, when there are several peer-reviewed JSS publications.

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

Release cort 0.3.1

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version()

Release cort 0.3.2

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version()

Automatic tests fixes

Automated tests

I get some warnings when running the automated tests manually with devtools::test (tested with R version 4.0.3 (2020-10-10) on Arch Linux):

test-CortForest.R:5: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

test-CortForest.R:9: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

While this is very minor issue, I think such warnings should be fixed (or maybe explicitly set to be ignored) unless there are reasons not to.

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

lrnv / cort Goto Github PK

cort's People

Stargazers

Watchers

Forkers

cort's Issues

README

Installation

Contributing

Vignettes

Reproducibility

Styling

Code

API

snake_case vs. camelCase

Documentation

Simulated Data

README

Installation

Documentation

Simulated Data

Vignettes

Reproducibility

Styling

API

snake_case vs. camelCase

Recommend Projects

Recommend Topics

Recommend Org