Giter Club home page Giter Club logo

rigr's People

Contributors

adw96 avatar bdwilliamson avatar cwolock avatar taylorokonek avatar yiqunchen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

rigr's Issues

Make styling consistent

Currently, function names and objects follow inconsistent styling. Ideally, we would follow a style guide (e.g., Hadley Wickham's or Google's) so that it is easier to read and debug the code.

I propose following Hadley's for the most part, and using underscores (_) in both function and object names.

Also, ideally the main user-facing functions will have their own files (e.g., descrip.R) and internal helper functions will live in a separate file (e.g., descrip_utils.R or possibly a separate file for each internal function).

Remove `version` parameter from functions, add version and/or package creation date when package is loaded

It will be far more maintainable to have the version listed in one spot (for example, when the package is loaded) as opposed to in individual functions. version parameter has already been removed from regress, descrip, and both of their utility functions, but the parameter likely still exists elsewhere.

At the suggestion of @bdwilliamson, we could consider is adding a print message when the package gets loaded that says something about the version and/or the package creation date.

This issue builds off of #33

online documentation

It would be wonderful to have online documentation. I haven't looked into this for many years but I recall this being pretty easy with github.io and pkgdown.

force users to write out the full fnctl argument in `regress`

We no longer want to support shorted fnctl strings in regress

Discussed in #53

Originally posted by taylorokonek August 9, 2021
Currently, users can enter a shortened character string for the fnctl argument in regress, and so long as it corresponds to a unique substring, the function will run. As an example, a user could simply set fnctl = "g", and the function would take this to mean fnctl = "geometric mean". Personally, I don't think there's a huge harm in forcing users to spell out the fnctl argument. Does anyone have any insight as to why this particular functionality was introduced (@bdwilliamson)?

interaction terms in regress

notice in the output of the following, the interaction term is not clearly labelled

also the overall F-statistic from regress does not agree with the F-statistic from lm

regress("mean",fev~ht*as.integer(sex))

devtools::check() fails

As a result of some of the change is associated with modernizing the package, devtools::check() fails.

`equal` function could be cleaner

In termTraverse.R (soon to be in regress_utils.R), there is a function defined called equal, and the value returned calls the function itself (not sure if there are any implications to this?). The function seems to be doing the same thing as checking if length(unique(x)) == 1, other than that they would handle vectors that include NAs differently. This is mostly a reminder to myself to edit this helper function to be a bit cleaner in the future.

equal <- function(x){
if(length(x)==1){
return(TRUE)
} else {
if(x[1]!=x[2]){
return(FALSE)
} else {
return(equal(x[-1]))
}
}
}

bplot axis label issue

the followiong produces a plot with double x axis labels

bplot(fev,as.integer(agecat),xlab="Age Category",ylab="FEV (liters)")

Does `ifelse1` need to exist?

I think this function has identical behavior and interface as ifelse - I'll look into it but it doesn't seem like we need this.

Incorrect label in ttest

I downloaded the rigr package on 7/23 and was using ttest() and got this result. Notice the incorrect label above the output that says "One-sample t-test"

with(hiv,ttest(cd4[art==0],cd4[art==1]))

Call:
ttest(var1 = cd4[art == 0], var2 = cd4[art == 1])

Two-sample t-test allowing for unequal variances :

One-sample t-test :

Summary:
Group Obs Missing Mean Std. Err. Std. Dev. 95% CI
cd4[art == 0] 78 14 388.5 32.5 260 [323.5, 454]
cd4[art == 1] 50 19 298.5 32.9 183 [231.3, 366]
Difference 128 33 90.1 46.3 [-2.02, 182]

Ho: difference in means = 0 ;
Ha: difference in means != 0
t = 1.946 , df = 80.7
Pr(|T| > t) = 0.0551195

salary dataset as dataset

Currently both some vignettes and some examples run the following to get the salary data

salary <- read.table("http://www.emersonstatistics.com/datasets/salary.txt", header = TRUE, stringsAsFactors = FALSE)

This should instead be a documented dataset released with the package.

This impact the vignette regress_intro amongst others.

CRAN

It would be great to be able to install this package from CRAN -- this would enormously help many of our students with using it for the first time. Target date: Sept 13.

why does `regress` no longer support proportional hazards regression, and what do we want moving forward?

The documentation for regress seems to suggest you can do proportional hazards regression using fnctl = "hazard", but the code will actually throw an error if you specify fnctl = "hazard", stating "proportional hazards regression no longer supported". That being said, the remainder of the code seems to still have if/else cases and other sections in case fnctl = "hazard", so is this error something that was thrown in recently? @bdwilliamson do you have any insight?

Another question is whether we want regress to be able to handle proportional hazards regression moving forward.

factors in bplot

bplot should be able to handle factors

bplot(fev,agecat,xlab="Age Category",ylab="FEV (liters)") # fails if agecat is a factor

`wilcoxon()` provides two (different) p-values when `exact = TRUE`

As currently written, wilcoxon() is (as far as I can tell) largely a copy-pasted version of wilcox.test() from the stats package. That copied code provides a p-value (variable PVAL in the code), either using an exact test, or Normal approximation.

One of the additions in wilcoxon() is the inf return object, which is described as "a formatted table of inference values, for printing." This object provides the PVAL mentioned above, but also includes the Z-score and corresponding Normal approximation p-value. This p-value will correspond to PVAL when exact = FALSE. However, when exact = TRUE is used, PVAL is an exact p-value, but the Normal approximation p-value is still included in inf.

Long story short: when exact = TRUE, wilcoxon() provides both the exact and approximate p-value, which I think would be confusing to a user. To see this run:

set.seed(2)
y <- rnorm(100)
wilcoxon(y, exact = TRUE)

The inf return object has p-value 0.70012 (exact) and 0.69762 (Normal approximation).

My feeling is that we shouldn't report two different p-values. Thoughts?

intercepts in linear models

If I recall correctly, you can use the argument intercept = FALSE in regress(). Is it possible to add the -1 functionality?

Make TRUE and FALSE consistent

Currently, T and F are sometimes used as shorthand for TRUE and FALSE. While this is possible, it isn't good practice -- TRUE and FALSE are protected, while T and F are valid object names and thus can be overwritten by the user.

We should use the full words!

changed behavior for descrp

The newest version of descrip prints, by default, the columns Max Restriction, FirstEvent, LastEvent, isDate. Those seem confusing for the average user, especially if none of the measures is a time-to-event. Can we have descrip not print those columns by default?

Also, at the bottom of the descrip output I see
attr(,"class")
[1] "uDescriptives"
which will be confusing to the naive user.

Finally, all statistics are printed in scientific notation which is different from before and harder to read.

Having said all this, I just figured out that I think the issue is simply that the print.uDescriptives function has not been included in the rigr package?

Streamline `ttest.R` code

The code for this function is much more verbose than it needs to be, including many statements like if (condition){}; else{do stuff}. For ease of future maintenance this should be cleaned up.

regress doesn't accept factors

Other regression commands in R accept factor covariates but
regress("mean", fev ~ ht+sex,data=fevdata) # fails if sex is a factor

Document helper functions

Many helper functions (i.e., non-exported functions) have no documentation. I think a good goal is to have all functions be documented in some way.

remove `version` parameter from `descrip()`?

Currently if you specify version = TRUE in descrip(), the function will return "20160730" and nothing else. I'd imagine we either want to to return a date that's a bit closer to today, or just remove this parameter entirely. Personally, I'm in favor of removing it entirely. Any objections or other thoughts?

descrip no longer works is intended on `mri` data

A previous line in test_cases vignette was descrip(mri). This no longer works. Possible reasons include that mri is now a tibble rather than a dataframe, but perhaps also because it contains variables of different types (not only numeric).

Fix `ttesti` documentation

ttesti.R documentation needs some changes - a few of the parameter descriptions aren't working properly.

Streamline `wilcoxon.R` code

wilcoxon.R has some redundancies, including calculating the p-value multiple times. The whole function should be simplified.

bug in ttesti

one-sided p-value in ttesti wrong for Ha: <. Fixed in the attached file (to find changes, search for "JPH")
my_ttesti.txt

what is 'stratify' supposed to do in bplot?

I can't seem to get an example working (that doesn't throw an error) where the 'strata' argument in bplot is used. Is this argument supposed to do something analogous to facet_wrap in ggplot, or something else?

coefficient names unclear for categorical variables in `regress`

Note that when the following code is run:

library(rigr); data(mri)
regress("mean", atrophy ~ sex, data=mri)

the coefficient names printed are "Intercept" and "sex", whereas from lm() we'd have "(Intercept)" and "sexMale", this telling us which value of the categorical variable the coefficient corresponds to. We should have something similar to this in our print method for regress. Maybe something like "sex: Male"?

`proptest`

Implement proptest function as a branch from ttest. Add continuity correction, which is not currently implemented.

Alternative hypothesis in ttest

The alternative hypothesis in test is specified with the argument "test.type". It would be better to be consistent with other R functions (particularly t.test) and use the argument "alternative". The attached version of ttest changes the
argument name everywhere.
my_ttest.txt

lincom

It would be very useful if lincom could handle lm objects as well as uRegress objects

`wilxcoxon()` returns incorrect `data.name`

This seems to me to be a scoping issue, since wilcoxon() calls wilcoxon.do(). I tried using sys.frame(1) in the substitute(), without success. Commented out failing unit tests for this - restore these when fixed.

updated functions from scott emerson

In an earlier email from Scott Emerson to me: "I have some updated functions that have not yet been put in the package."

Could the summer TA team please that reach out to him about this and incorporate those edits?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.