Giter Club home page Giter Club logo

cort's People

Contributors

github-actions[bot] avatar lrnv avatar pdebuyl avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

cort's Issues

@agisga review

The review from @agisga:

I'm deeply sorry for the delay in the review (the timing turned out to be very far from ideal for me unfortunately). Thank you for the patience.

I think that cort is quality software and a substantial scholarly effort, and I would suggest acceptance subject to minor revisions correcting the issues outlined in this thread.
Some of the issues that I came across were already mentioned by @coatless above, most importantly, the lack of community guidelines, instructions for potential contributors, for issue reporting or other support. So, I will try not to repeat what has already been said.

Vignettes

In addition to what @coatless has mentioned, the second code block in vignette 4 (https://cran.r-project.org/web/packages/cort/vignettes/vignette04_bootstrap_varying_m.html) has the following redundant lines that should be removed:

        test <- 
        train <- 

Automated tests

I get some warnings when running the automated tests manually with devtools::test (tested with R version 4.0.3 (2020-10-10) on Arch Linux):

test-CortForest.R:5: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

test-CortForest.R:9: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

While this is very minor issue, I think such warnings should be fixed (or maybe explicitly set to be ignored) unless there are reasons not to.

State of the field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I'm not very familiar with estimation of non-parametric copulas and so this may not be relevant, but I don't really see a comparison to other R or Python packages. A simple web search shows that other copula estimation packages exist. While the cort paper mentions publications by Li et al. and Nagler et al. as alternative approaches, it is not clear whether these papers correspond to some commonly-used software packages or are mainly theoretical/methodological papers.

Quality of writing

There are some minor grammatical and spelling mistakes in the paper. So, a proofreading would be beneficial. Here are some of the mistakes I noticed:

  • Page 1, third paragraph:

    There also exists several tree-structured piecewise constant density estimators,

    "exists" should be "exist"

    The new models that are implemented in this package tries to solve these issues.

    "tries" should be 'try"

  • Page 2:

    • "pakcage" should be "package"

    • "was design" should be "was designed"

  • Page 2:

    "Examples datasets are included in the package, and the many vignettes gives examples of usecases."

    "Examples" should be "Example", "gives" should be "give", "usecases" should be "use cases".

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

@pdebuyl review

Some more comments from @pdebuyl:

Typo, page 2, first line. "efficicent" -> "efficient"

Typo, page 2, acknowledgments. "meaningfull" -> "meaningful".

Grammar, page 2, second paragraph. "by statistician" -> "by statisticians" (if you pick singular instead, then conjugation of "need" should be changed as well).

Affiliation: please add "France" to the first affiliation and spell out the second affiliation (+ add city and country).

For specific issues such as the API naming, you can discuss them in cort issues and report the conclusion here. For changes to the paper, I prefer if they are done in this thread.

Originally posted by @pdebuyl in openjournals/joss-reviews#2653 (comment)

Release cort 0.3.1

Prepare for release:

  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted ๐ŸŽ‰
  • usethis::use_github_release()
  • usethis::use_dev_version()

Vignettes fixes

Vignettes

Reproducibility

When using the r*() functions, sometimes a code chunk has a set.seed() component. On other chunks, this is missing. Thus, there may unintentionally be a hidden state that is present. Moreover, in one vignette, we have the advocation for a non-standard set_seed() function (note: the _ instead of .).

https://lrnv.github.io/cort/articles/vignette02_cort_clayton.html#dataset-1

At this point, it would be better to explicitly set a dependency on R >= 3.6

Styling

Depending on the Vignette, the R code shown either lacks space or has a lot of space.

For example:

Consistency here would be appreciated. Consider running styler::style_dir("vignettes").

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Paper fixes

I would also add a note that the estimation routines for copulas are written in C++ within the JOSS paper. At present, the use of C++ isn't stated as it only mentions S4 and R.

State of the field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I'm not very familiar with estimation of non-parametric copulas and so this may not be relevant, but I don't really see a comparison to other R or Python packages. A simple web search shows that other copula estimation packages exist. While the cort paper mentions publications by Li et al. and Nagler et al. as alternative approaches, it is not clear whether these papers correspond to some commonly-used software packages or are mainly theoretical/methodological papers.

Quality of writing

There are some minor grammatical and spelling mistakes in the paper. So, a proofreading would be beneficial. Here are some of the mistakes I noticed:

  • Page 1, third paragraph:

    There also exists several tree-structured piecewise constant density estimators,

    "exists" should be "exist"

    The new models that are implemented in this package tries to solve these issues.

    "tries" should be 'try"

  • Page 2:

    • "pakcage" should be "package"

    • "was design" should be "was designed"

  • Page 2:

    "Examples datasets are included in the package, and the many vignettes gives examples of usecases."

    "Examples" should be "Example", "gives" should be "give", "usecases" should be "use cases".

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

Typo, page 2, first line. "efficicent" -> "efficient"

Typo, page 2, acknowledgments. "meaningfull" -> "meaningful".

Grammar, page 2, second paragraph. "by statistician" -> "by statisticians" (if you pick singular instead, then conjugation of "need" should be changed as well).

Affiliation: please add "France" to the first affiliation and spell out the second affiliation (+ add city and country).

For specific issues such as the API naming, you can discuss them in cort issues and report the conclusion here. For changes to the paper, I prefer if they are done in this thread.

Originally posted by @pdebuyl in openjournals/joss-reviews#2653 (comment)

API fixes

API

snake_case vs. camelCase

Functions within the package switch back and forth between using snake_case and CamelCase. It would be ideal if there could be some consistency.

cort-api-reference

https://lrnv.github.io/cort/reference/index.html

Moreover, could the reference portion of the API on the pkgdown website be split into subcategories? e.g. data/cort/copulas.

Minor note: Cort() uses N= whereas elsewhere all function parameters are lower-case.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Release cort 0.3.0

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • usethis::use_cran_comments()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted ๐ŸŽ‰
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • merge the update PR
  • Update install instructions in README and add crna badge to it
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

@coatless review

The review from @coatless:

I'm a huge fan of the package. Long ago I stepped foot into copula estimation with R and was disappointed by the tools available. cort fills portions of that void. Examples with the package are easily accessible outside of R thanks to a pkgdown website. Having said this, I do have a few comments and suggestions around the package. I've listed them below.

README

Installation

Please indicate on the README that installation from GitHub will require the system to have a compiler.

  • Windows: Rtools.
  • macOS: Xcode CLI
  • Linux: r-base-dev (debian)

I would also add a note that the estimation routines for copulas are written in C++ within the JOSS paper. At present, the use of C++ isn't stated as it only mentions S4 and R.

Contributing

There isn't a Contributor Code of Conduct, contributor guidelines, or how to report a problem list. Perhaps this could be added to the end of the README?

Vignettes

Reproducibility

When using the r*() functions, sometimes a code chunk has a set.seed() component. On other chunks, this is missing. Thus, there may unintentionally be a hidden state that is present. Moreover, in one vignette, we have the advocation for a non-standard set_seed() function (note: the _ instead of .).

https://lrnv.github.io/cort/articles/vignette02_cort_clayton.html#dataset-1

At this point, it would be better to explicitly set a dependency on R >= 3.6

Styling

Depending on the Vignette, the R code shown either lacks space or has a lot of space.

For example:

Consistency here would be appreciated. Consider running styler::style_dir("vignettes").

Code

API

snake_case vs. camelCase

Functions within the package switch back and forth between using snake_case and CamelCase. It would be ideal if there could be some consistency.

cort-api-reference

https://lrnv.github.io/cort/reference/index.html

Moreover, could the reference portion of the API on the pkgdown website be split into subcategories? e.g. data/cort/copulas.

Minor note: Cort() uses N= whereas elsewhere all function parameters are lower-case.

Documentation

There is a documentation article for each function. However, I think some of the documentation entries should receive additional insight into the implementation, improved explanation of what is happening in the example, and perhaps a periodic change in which data set is used for the computation. (~4 ship with the package but only LifeCycleSavings from datasets is used.)

For instance, in the the titular function Cort() for the cort package, there is sparse details and no indication of what is contained in a Cort object:

https://lrnv.github.io/cort/reference/Cort-Class.html

R> cop <- Cort(LifeCycleSavings[,1:3])
#Splitting...
#
#     1 leaves to split...
#     5 leaves to split...
#     10 leaves to split...
#     1 leaves to split...
#Enforcing constraints...
#Done !
R> str(cop)
#Formal class 'Cort' [package "cort"] with 11 slots
#  ..@ p_value_for_dim_red: num 0.75
#  ..@ number_max_dim     : int 3
#  ..@ min_node_size      : num 1
#  ..@ verbose_lvl        : num 1
#  ..@ vols               : num [1:88] 0.05559 0.16964 0.04987 0.06598 0.00195 ...
#  ..@ f                  : num [1:88] 0 0.02 0.02 0.02 0.02 0 0 0 0.02 0 ...
#  ..@ p                  : num [1:88] 1.95e-09 5.75e-02 2.66e-02 7.37e-03 9.78e-04 ...
#  ..@ a                  : num [1:88, 1:3] 0 0.227 0 0 0 ...
#  ..@ b                  : num [1:88, 1:3] 0.227 1 0.227 0.227 0.227 ...
#  ..@ data               : num [1:50, 1:3] 0.627 0.686 0.824 0.235 0.804 ...
#  ..@ dim                : int 3

In contrast, the S4 documentation for Matrix() in the Matrix package looks like: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/Matrix-class.html

In another documentation entry, I think a numerical tolerance issue is present due to using == instead of all.equal():
https://lrnv.github.io/cort/reference/quad_prod-methods.html

R> quad_prod(cop,cop)
# [1] 10.81164
R> quad_norm(cop)
# [1] 10.81164
R> quad_norm(cop) == quad_prod(cop,cop)
# [1] FALSE

Simulated Data

For the simulated data sets, there isn't a simulation script associated with the help file:

Though, the scripts can be found in:

https://github.com/lrnv/cort/tree/master/data-raw

I think it would be helpful to link to data-raw/ or directly embed it within the Rd file.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

@coatless comments: Docs

@coatless wrote

Regarding code, the documentation governing the object slots is still a bit sparse. It would be good to understand what is being returned within the Cort() object explicitly.

That is, what do the following slot entries correspond to?

p_value_for_dim_red
number_max_dim 
min_node_size
verbose_lvl 
vols
f
p
a
b
data
dim

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Readme fixes

README

Installation

Please indicate on the README that installation from GitHub will require the system to have a compiler.

  • Windows: Rtools.
  • macOS: Xcode CLI
  • Linux: r-base-dev (debian)

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Some of the issues that I came across were already mentioned by @coatless above, most importantly, the lack of community guidelines, instructions for potential contributors, for issue reporting or other support. So, I will try not to repeat what has already been said.

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

@agisga 2nd round

@lrnv Generally the code changes look good. Here are a few comments based on my previous feedback and the new changes:

  • Vignette 4 still has syntactically wrong and redundant two lines of code here:
  • In issue #11 you say that "the CamelCase and snake_case switches are to seperate constructors and standard (r/d/p/v)-functions from methods and the rest. Take a look at the new reference, you will see that the common uses of the two standards makes sense now that it's ordered." It would be good to mention these or other coding/syntax/style standard (if any) that you follow under the contribution guidelines in the README (or a separate file if it's too much information for the README).
  • The new version of the cort package works flawlessly on my system. However, the first time I tried it, I was getting error messages, which went away after I upgraded furrr to the newest version (from CRAN). I'm not sure what version of furrr I had prior to upgrading. Maybe it would be good to provide version requirements for furrr and all other dependencies in the DESCRIPTION (currently, testthat has a version requirement stated, but other dependencies do not)?

I also had a very quick look at the new changes to the paper:

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

Release cort 0.3.2

Prepare for release:

  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • rhub::check(platform = 'ubuntu-rchk')
  • rhub::check_with_sanitizers()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Review pkgdown reference index for, e.g., missing topics

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted ๐ŸŽ‰
  • usethis::use_github_release()
  • usethis::use_dev_version()

Simulated dataset fixes

Simulated Data

For the simulated data sets, there isn't a simulation script associated with the help file:

Though, the scripts can be found in:

https://github.com/lrnv/cort/tree/master/data-raw

I think it would be helpful to link to data-raw/ or directly embed it within the Rd file.

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Automatic tests fixes

Automated tests

I get some warnings when running the automated tests manually with devtools::test (tested with R version 4.0.3 (2020-10-10) on Arch Linux):

test-CortForest.R:5: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

test-CortForest.R:9: warning: (unknown)
UNRELIABLE VALUE: Future ('<none>') unexpectedly generated random numbers without specifying argument '[future.]seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify argument '[future.]seed', e.g. 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use [future].seed=NULL, or set option 'future.rng.onMisuse' to "ignore".

While this is very minor issue, I think such warnings should be fixed (or maybe explicitly set to be ignored) unless there are reasons not to.

Originally posted by @agisga in openjournals/joss-reviews#2653 (comment)

Documentation fixes

Documentation

There is a documentation article for each function. However, I think some of the documentation entries should receive additional insight into the implementation, improved explanation of what is happening in the example, and perhaps a periodic change in which data set is used for the computation. (~4 ship with the package but only LifeCycleSavings from datasets is used.)

For instance, in the the titular function Cort() for the cort package, there is sparse details and no indication of what is contained in a Cort object:

https://lrnv.github.io/cort/reference/Cort-Class.html

R> cop <- Cort(LifeCycleSavings[,1:3])
#Splitting...
#
#     1 leaves to split...
#     5 leaves to split...
#     10 leaves to split...
#     1 leaves to split...
#Enforcing constraints...
#Done !
R> str(cop)
#Formal class 'Cort' [package "cort"] with 11 slots
#  ..@ p_value_for_dim_red: num 0.75
#  ..@ number_max_dim     : int 3
#  ..@ min_node_size      : num 1
#  ..@ verbose_lvl        : num 1
#  ..@ vols               : num [1:88] 0.05559 0.16964 0.04987 0.06598 0.00195 ...
#  ..@ f                  : num [1:88] 0 0.02 0.02 0.02 0.02 0 0 0 0.02 0 ...
#  ..@ p                  : num [1:88] 1.95e-09 5.75e-02 2.66e-02 7.37e-03 9.78e-04 ...
#  ..@ a                  : num [1:88, 1:3] 0 0.227 0 0 0 ...
#  ..@ b                  : num [1:88, 1:3] 0.227 1 0.227 0.227 0.227 ...
#  ..@ data               : num [1:50, 1:3] 0.627 0.686 0.824 0.235 0.804 ...
#  ..@ dim                : int 3

In contrast, the S4 documentation for Matrix() in the Matrix package looks like: https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/Matrix-class.html

In another documentation entry, I think a numerical tolerance issue is present due to using == instead of all.equal():
https://lrnv.github.io/cort/reference/quad_prod-methods.html

R> quad_prod(cop,cop)
# [1] 10.81164
R> quad_norm(cop)
# [1] 10.81164
R> quad_norm(cop) == quad_prod(cop,cop)
# [1] FALSE

Originally posted by @coatless in openjournals/joss-reviews#2653 (comment)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.