statcheck's Introduction

statcheck

What is statcheck?

statcheck is a “spellchecker” for statistics. It checks whether your p-values match their accompanying test statistic and degrees of freedom.

statcheck searches for null-hypothesis significance testing (NHST) results reported in APA style (e.g., t(28) = 2.2, p < .05). It recalculates the p-value from the reported test statistic and degrees of freedom. If the reported and computed p-values don't match, statcheck flags the result as an error.
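The recomputation for the example result can be sketched in base R. This is an illustration, not statcheck's internal code: it assumes a two-tailed test and ignores rounding of the reported test statistic.

```r
# Minimal sketch, not statcheck's internal code: recompute the two-tailed
# p-value for the APA-style result "t(28) = 2.2, p < .05".
t_value <- 2.2
df <- 28

# pt() is the t-distribution CDF; doubling the upper tail beyond |t|
# gives the two-tailed p-value.
computed_p <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# The reported comparison is "p < .05", so the result is consistent
# if the recomputed p-value is indeed below .05.
consistent <- computed_p < .05
cat(sprintf("computed p = %.4f, consistent = %s\n", computed_p, consistent))
```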

What can I use statcheck for?

statcheck is mainly useful for:

  1. Self-checks: you can use statcheck to make sure your manuscript doesn’t contain copy-paste errors or other inconsistencies before you submit it to a journal.
  2. Peer review: editors and reviewers can use statcheck to check submitted manuscripts for statistical inconsistencies. They can ask authors for a correction or clarification before publishing a manuscript.
  3. Research: statcheck can be used to automatically extract statistical test results from articles for further analysis. For instance, you can investigate whether statistical inconsistencies can be predicted (see e.g., Nuijten et al., 2017), or analyze p-value distributions (see e.g., Hartgerink et al., 2016).

How does statcheck work?

The algorithm behind statcheck consists of four basic steps:

  1. Convert pdf and html articles to plain text files.
  2. Search the text for instances of NHST results. Specifically, statcheck can recognize t-tests, F-tests, correlations, z-tests, $\chi^2$-tests, and Q-tests (from meta-analyses) if they are reported completely (test statistic, degrees of freedom, and p-value) and in APA style.
  3. Recompute the p-value using the reported test statistic and degrees of freedom.
  4. Compare the reported and recomputed p-values. If the reported p-value does not match the computed one, the result is marked as an inconsistency (Error in the output). If the reported p-value is significant and the computed one is not, or vice versa, the result is marked as a gross inconsistency (DecisionError in the output).
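Step 4 can be illustrated with a small classifier. The helper `check_result()` is hypothetical (it is not part of statcheck's API), handles only the "<" and "=" comparisons, and ignores rounding of the test statistic.

```r
# Hypothetical helper (not part of statcheck's API) sketching step 4:
# flag an inconsistency (Error) and a gross inconsistency (DecisionError).
check_result <- function(reported_p, comparison, computed_p,
                         digits = 2, alpha = .05) {
  consistent <- switch(comparison,
    "<" = computed_p < reported_p,
    # For "=", round the computed p to the reported precision first.
    "=" = abs(round(computed_p, digits) - reported_p) < 1e-9,
    stop("unsupported comparison: ", comparison)
  )
  # "p < .05" counts as significant at alpha = .05; "p = x" needs x < alpha.
  reported_sig <- if (comparison == "<") reported_p <= alpha else reported_p < alpha
  computed_sig <- computed_p < alpha
  list(Error = !consistent,
       DecisionError = !consistent && reported_sig != computed_sig)
}

# t(10) = 3, p = .009 at alpha = .01: the recomputed p is about .013,
# so the reported p is inconsistent and crosses the significance threshold.
res <- check_result(.009, "=", 2 * pt(3, 10, lower.tail = FALSE),
                    digits = 3, alpha = .01)
str(res)
```

A result is only counted as a DecisionError here when it is also an Error, mirroring the description above of a gross inconsistency as a special kind of inconsistency.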

statcheck takes into account correct rounding of the test statistic, and has the option to take into account one-tailed testing. See the manual for details.
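The idea behind accounting for rounding can be sketched as an interval check: a reported t of 2.2 could come from any value between 2.15 and 2.25, so a range of p-values is acceptable. The functions below are hypothetical and simplified; the manual describes statcheck's actual rules.

```r
# Hypothetical sketch of rounding-aware checking (not statcheck's code).
# Range of two-tailed p-values compatible with a rounded t-value.
p_range_t <- function(t_reported, df, decimals) {
  half_unit <- 0.5 * 10^(-decimals)
  t_bounds <- pmax(abs(t_reported) + c(-half_unit, half_unit), 0)
  # A larger |t| gives a smaller p, so sort to get c(lower, upper).
  sort(2 * pt(t_bounds, df, lower.tail = FALSE))
}

# A reported "p = x" counts as consistent if the interval of values that
# round to x overlaps the recomputed p range.
consistent_eq <- function(p_reported, p_decimals, p_range) {
  half_unit <- 0.5 * 10^(-p_decimals)
  p_reported + half_unit >= p_range[1] && p_reported - half_unit <= p_range[2]
}

rng <- p_range_t(2.2, df = 28, decimals = 1)
print(rng)
consistent_eq(.04, 2, rng)
consistent_eq(.02, 2, rng)
```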

Installation and use

For detailed information about installing and using statcheck, see the manual on RPubs.

statcheck.io is a web-based interface for statcheck.

statcheck's People

Contributors

chartgerink, krz, michelenuijten, sachaepskamp, seanrife, tjmahr

statcheck's Issues

reconsider ignoring minus signs followed by space

see this test:

weird encoding in minus sign followed by space

test_that("t-values with a weird minus sign and a space do not result in errors", {
txt1 <- " t(553) = − 4.46, p < .0001" # this is an em dash or something

expect_output(statcheck(txt1, messages = FALSE), "did not find any results")

})

Not sure why I decided that these cases should be ignored; it seems reasonable to include them.
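One way to include such cases is to normalize the character before extraction. The sketch below is a hypothetical pre-processing step, not statcheck's code; it assumes the character is a Unicode minus sign (U+2212), an en dash (U+2013), or an em dash (U+2014).

```r
# Hypothetical pre-processing step (not statcheck's code): replace a Unicode
# minus sign, en dash or em dash, plus any whitespace after it, with an
# ASCII hyphen-minus so the value is parsed as negative.
normalize_minus <- function(x) {
  gsub("[\u2212\u2013\u2014]\\s*", "-", x, perl = TRUE)
}

txt1 <- " t(553) = \u2212 4.46, p < .0001"
normalize_minus(txt1)
```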

Tables?

We are currently running an error-detection hackathon (related to the ERROR project) and were wondering whether you'd be interested in extending statcheck to (HTML) tables, or whether that would work better as a separate extension package. Would be great to hear your thoughts!

optional parentheses for DFs

It's reasonably common in HTML to write the degrees of freedom as a subscript without parentheses.

Examples:

  • T<sub>28</sub> = -2.20, p = .03
  • F<sub>2,28</sub> = 2.20, p = .03

When reading the text, the formatting is lost, so it appears without the parentheses:

  • T28 = -2.20, p = .03
  • F2,28 = 2.20, p = .03

Proposed solution:

Make the parentheses optional. Just adding a ? after the parentheses in the regexes is simple. But the code to determine the statistical test relies on the open parenthesis.

Test code

test_that("t-tests without parentheses are retrieved from text", {
  txt1 <- "t 28 = 2.20, p = .03"
  txt2 <- "t28 = 2.20, p = .03"

  result <- statcheck(c(txt1, txt2), messages = FALSE)

  expect_equal(nrow(result), 2)
})
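A pattern with optional parentheses around the degrees of freedom might look like the sketch below. The regex is an assumption for illustration and differs from statcheck's own regexes; the leading word boundary `\\b` keeps the final "t" of a word such as "test" from being read as a t-statistic.

```r
# Sketch of a t-test pattern with optional parentheses around the df
# (illustrative assumption; statcheck's own regexes differ).
rgx_t <- "\\b[tT]\\s*\\(?\\s*(\\d+(?:\\.\\d+)?)\\s*\\)?\\s*=\\s*(-?\\d*\\.?\\d+)"

txt <- c("t(28) = 2.20, p = .03", "t 28 = 2.20, p = .03", "T28 = -2.20, p = .03")
m <- regmatches(txt, regexec(rgx_t, txt, perl = TRUE))

# Each element holds the full match followed by the two capture groups.
dfs    <- as.numeric(vapply(m, `[`, character(1), 2))
values <- as.numeric(vapply(m, `[`, character(1), 3))

# No match for a generic "test (15)", thanks to the word boundary.
generic <- "Friedman's test (15) = 62.92"
no_hit <- regmatches(generic, regexec(rgx_t, generic, perl = TRUE))[[1]]
```

Since the test type is no longer tied to the opening parenthesis, code that identifies the statistic would have to read it from the matched letter instead.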

Issue with recognising string F = 0 or 1

Hi Michele,
Statcheck doesn't seem to be able to recognise the strings below. Could you please advise what I might be doing wrong? Thanks!

statcheck("F(1,210) = 0, p = 1")
statcheck("F(1,210) = 1, p = 0")

incorrect OneTail output?

When I run the code below, the output says that OneTail is FALSE, indicating that the result would be incorrect if it were a one-tailed test (which it is, in the example). That doesn't seem right, though.

statcheck("this is a one-tailed test: t(40)=1.80,p<.04")

When I change p<.04 to p<.05, OneTail becomes TRUE, yet when I change it to p<.06 it becomes FALSE again. If I write p=.04, it changes back to TRUE. I'm not sure, but the issue seems to be in line 1178 of statcheck.R, or am I missing something? Thanks.

Best,
Tom

"test (15)" scraped as "t(15)"

I saw something in the statcheck scrape that might be a bug. I think generic tests may get scraped as t-tests.

For example, in this paper, the authors write "Friedman's test (15) = 62.92", and statcheck scrapes it as "t(15) = 62.92".

I don't know much about Friedman's test (or nonparametric tests in general), but it seems to use its own Q-statistic that is closer to a chi-square distribution.

I don't think this is a common situation, of course, but if the regexp could be tweaked to avoid mistaking "...test (df)" for "t(df)" it would improve the specificity of the statcheck program.

don't run txt-to-file tests if there are no test materials/articles available

Statcheck checks if functions like checkHTML() work by scanning "test articles". These are not synced with git, because of copyright issues. That means that if you download statcheck from GitHub, you will fail a lot of tests, because there are no test articles. Skip these tests if the articles are not there (maybe with a printed message).

make failsafe if you feed checkPDF or checkHTML another article type

example:

checkPDF() # chose an HTML file
PDF error: May not be a PDF file (continuing anyway)
PDF error (2): Illegal character <21> in hex string
PDF error (4): Illegal character <4f> in hex string
PDF error (6): Illegal character <54> in hex string
PDF error (7): Illegal character <59> in hex string
PDF error (8): Illegal character <50> in hex string
PDF error (11): Illegal character <68> in hex string
PDF error (12): Illegal character <74> in hex string

(etc.)
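A failsafe could check the file extension before conversion. The helper `keep_extension()` is hypothetical (not statcheck's API); it relies on `tools::file_ext()` from base R and skips mismatched files with a warning rather than passing them to the PDF converter.

```r
# Hypothetical guard (not statcheck's API): keep only files whose extension
# matches the expected type and warn about the rest.
keep_extension <- function(files, expected = "pdf") {
  ext <- tolower(tools::file_ext(files))
  wrong <- ext != tolower(expected)
  if (any(wrong)) {
    warning("Skipping files that are not .", expected, ": ",
            paste(basename(files[wrong]), collapse = ", "), call. = FALSE)
  }
  files[!wrong]
}

keep_extension(c("a.pdf", "b.html", "C.PDF"))
```

Checking extensions is only a heuristic; a mislabeled file would still reach the converter.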

error not detected when reported p < alpha < computed p

str(statcheck("t(10) = 3, p = .009", alpha = .01))

Classes ‘statcheck’ and 'data.frame':   1 obs. of  15 variables:
 $ Source             : Factor w/ 1 level "1": 1
 $ Statistic          : Factor w/ 1 level "t": 1
 $ df1                : logi NA
 $ df2                : num 10
 $ Test.Comparison    : Factor w/ 1 level "=": 1
 $ Value              : num 3
 $ Reported.Comparison: Factor w/ 1 level "=": 1
 $ Reported.P.Value   : num 0.009
 $ Computed           : num 0.0133
 $ Raw                : Factor w/ 1 level "t(10) = 3, p = .009": 1
 $ Error              : logi FALSE
 $ DecisionError      : logi FALSE
 $ OneTail            : logi TRUE
 $ OneTailedInTxt     : logi FALSE
 $ APAfactor          : num 1

issues to add/update in the branch feature-pdftools

  • add DOIs of all test articles to the code, so that others can check which articles were used for testing
  • check the "manual" csv files in the test directory; these contain manually extracted stats for test articles but are not finished. These might be allowed to be uploaded.
  • when reading in the manual csv files, remove the last two rows (they only show the total number of extracted results)
  • rewrite tests to focus on pdftools as default method
  • write explicit tests to test difference in retrieval between pdftools and xpdf

Skip files with too long filename instead of throwing an error

When a filename is very long, it can't be opened (for some reason). This causes statcheck to throw an error. It would be better to throw an informative message instead, so that when you scan an entire folder of papers and one has a file name that's too long, you just skip the long file and still scan the rest.

Error message:

Importing HTML files...
|== | 2%
Error in readChar(con, file.info(fileName)$size, useBytes = TRUE) :
cannot open the connection
In addition: Warning message:
In readChar(con, file.info(fileName)$size, useBytes = TRUE) :
cannot open file 'C:/Users/mnuijten/surfdrive/UVT/Projects/EffectivenessStatcheck/effectiveness_statcheck/articles/PS/2013/A Longitudinal Cluster-Randomized Controlled Study on the Accumulating Effects of Individualized Literacy Instruction on Students’ Reading From First Through Third Grade.htm': No such file or directory
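Skipping unreadable files could be done by catching the failure per file. The helper `read_or_skip()` is hypothetical and uses `readLines()` rather than the `readChar()` call in the error above. Because opening a missing file signals a warning ("cannot open file ...") before the error, both conditions are treated as "skip this file".

```r
# Hypothetical helper (not statcheck's code): read one file, or print an
# informative message and return NULL so a folder scan can continue.
read_or_skip <- function(path) {
  skip <- function(cond) {
    message("Skipping ", basename(path), ": ", conditionMessage(cond))
    NULL
  }
  tryCatch(
    paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n"),
    warning = skip,
    error = skip
  )
}
```

In a folder scan, `Filter(Negate(is.null), lapply(files, read_or_skip))` would keep only the files that could be read.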

Warnings due (?) to file closing

After scanning a directory with 70 files in it, I got this message:
There were 50 or more warnings (use warnings() to see the first 50)

The output of warnings() is attached.
warnings.txt

Release plan 1.4 and stale develop branch

Thanks for this interesting project.

I'm wondering what the release plan is for version 1.4. There are pre-releases of 1.4 (https://github.com/MicheleNuijten/statcheck/releases), but no release yet.

Another thing I'm wondering about is the stale (?) develop branch (https://github.com/MicheleNuijten/statcheck/network). What is your plan with this branch? Merge it into master? There are interesting features in there regarding PDF parsing with pdftools. It might be better to abandon the develop branch after merging it into master.

Incorrect error diagnosis for one-sided t-tests

I think I found a bug in how statcheck() diagnoses errors in one-sided t-tests. Specifically, when the sample mean is in the "wrong" tail, it is inappropriate to calculate the one-tailed p-value by halving the two-sided p-value, as statcheck() does.

mean(sleep$extra[sleep$group == 1])
# [1] 0.75
mean(sleep$extra[sleep$group == 2])
# [1] 2.33

t.test(
  sleep$extra[sleep$group == 1]
  , sleep$extra[sleep$group == 2]
  , var.equal = TRUE
)
# Two Sample t-test
# 
# data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
# t = -1.8608, df = 18, p-value = 0.07919
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -3.363874  0.203874
# sample estimates:
#   mean of x mean of y 
# 0.75      2.33 

t.test(
  sleep$extra[sleep$group == 1]
  , sleep$extra[sleep$group == 2]
  , alternative = "greater"
  , var.equal = TRUE
)
# 	Two Sample t-test
# 
# data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
# t = -1.8608, df = 18, p-value = 0.9604
# alternative hypothesis: true difference in means is greater than 0
# 95 percent confidence interval:
#  -3.052378       Inf
# sample estimates:
# mean of x mean of y 
#      0.75      2.33 

Hence, statcheck flags the correctly reported one-tailed p-value (.960) as an error, whereas the p-value that is incorrect in this case (.039) is deemed correct.

statcheck:::statcheck("t(18) = -1.86, p = 0.960", OneTailedTests = TRUE)[, c("Reported.P.Value", "Computed", "Error", "OneTail")]
# Reported.P.Value   Computed Error OneTail
# 1           0.96 0.03965356  TRUE   FALSE

statcheck:::statcheck("t(18) = -1.86, p = 0.039", OneTailedTests = TRUE)[, c("Reported.P.Value", "Computed", "Error", "OneTail")]
# Reported.P.Value   Computed Error OneTail
# 1          0.039 0.03965356 FALSE   FALSE

Since statcheck() can't know what the tested hypothesis is, it should probably always consider both possibilities and err on the side of caution?
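Considering both possibilities could look like the sketch below: compute the one-tailed p-value for each direction of the alternative hypothesis and accept a reported value that matches either one. The function names are hypothetical and rounding of the t-value is ignored.

```r
# Sketch (not statcheck's implementation): without knowing the direction
# of H1, a one-tailed p-value for t can refer to either tail.
one_tailed_p <- function(t_value, df) {
  c(less    = pt(t_value, df),                     # H1: difference < 0
    greater = pt(t_value, df, lower.tail = FALSE)) # H1: difference > 0
}

# Accept a reported one-tailed p if it matches either tail after rounding.
matches_either_tail <- function(reported_p, t_value, df, digits) {
  p <- one_tailed_p(t_value, df)
  any(abs(round(p, digits) - reported_p) < 1e-9)
}

p <- one_tailed_p(-1.86, 18)
print(p)
matches_either_tail(.960, -1.86, 18, digits = 3)
```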

improve progress bar

Update the progress bar(s) so that the results are in as soon as the bar completes (currently, you still have to wait quite some time after the "Extracting stats" progress bar is full).

Error if there is a space between two numbers after a test statistic and before a decimal in a reported statistical test result

Errors are caused if there is a space between two numbers after a test statistic and before a decimal in a reported statistical test result. (Scanned several thousand papers and this only occurred once so it's unlikely to pop up too often!)

Examples:

statcheck::statcheck(" z = 1 1 .25, p = .806. ")
#> Extracting statistics...
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> Error in if (lower[i] < 0) {: missing value where TRUE/FALSE needed
statcheck::statcheck("t(123) = 1 0.25, p = .806")
#> Extracting statistics...
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> Error in if (lower[i] < 0) {: missing value where TRUE/FALSE needed

Created on 2019-11-28 by the reprex package (v0.2.1)
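One possible guard is to convert only strings that look like a single number and skip the rest, instead of letting an `NA` reach `if (lower[i] < 0)`. The helper below is hypothetical and not part of statcheck.

```r
# Hypothetical guard (not statcheck's code): values such as "1 1 .25"
# become NA so the result can be skipped instead of causing an error.
parse_stat_value <- function(x) {
  x <- trimws(x)
  ok <- grepl("^-?(\\d+\\.?\\d*|\\.\\d+)$", x, perl = TRUE)
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(x[ok])
  out
}

values <- parse_stat_value(c("1.25", "1 1 .25", "1 0.25", ".25"))
keep <- !is.na(values)  # results to keep; the others would be skipped
```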

Use of other characters as decimal separators

Some languages use the comma as a decimal separator. For instance, in some journals written in Spanish, it is recommended that results should be written as... "F(1, 19) = 4,44, p = 0,048". I was not able to extract results from such papers using statcheck. It would be nice if this could be somehow considered.
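A pre-processing step could convert decimal commas to points. The sketch below relies on an assumption: a comma counts as a decimal separator only when it sits between digits directly after "=", "<", or ">". This leaves degrees of freedom such as "F(1,19)" untouched.

```r
# Hypothetical pre-processing (not statcheck's code): "4,44" -> "4.44" only
# in values that follow a comparison sign, so df lists are not affected.
comma_to_point <- function(x) {
  gsub("([=<>]\\s*-?\\d+),(\\d+)", "\\1.\\2", x, perl = TRUE)
}

comma_to_point("F(1, 19) = 4,44, p = 0,048")
comma_to_point("F(1,19) = 4,44, p < 0,05")
```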

Displaying χ² for chisq tests

The UI/UX could use a quick boost by displaying the χ² symbol for parsed chi-squared tests. Currently, the output table displays something along the lines of "2 (1) = 3.3, p = 0.07". Thanks!

Statcheck logo

Is the statcheck logo available for reuse? If so, could you add it to the repo (.svg?) and specify the license under which you make it available (CC 0 please? :-)).

I'd like to use it for a statcheck extension I am just starting on.

Line breaks prevent reading of statistical results

Line breaks in a (badly?) converted PDF file prevent test results from being read. It may be worthwhile to remove newlines (\n) in addition to the space removal already used in the statcheck function.

A reproducible example is:
statcheck("F(1, 45) = .12, p = .58 and F(2, 165)\n = .001, p = .96")

Michèle, if you agree I can do this sometime soon.
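A minimal version of this clean-up collapses every run of whitespace, including line breaks, into a single space before extraction. The helper name is hypothetical.

```r
# Hypothetical pre-processing (not statcheck's code): collapse whitespace
# runs, including "\n", into one space before extracting results.
collapse_whitespace <- function(x) gsub("\\s+", " ", x, perl = TRUE)

collapse_whitespace("F(1, 45) = .12, p = .58 and F(2, 165)\n = .001, p = .96")
```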

Release statcheck 1.5.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • usethis::use_github_links()
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Add preemptive link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet

Problem with non-ASCII (?) characters in filenames

Error message:

Error in readChar(file(fileName), file.info(fileName)$size, useBytes = TRUE) :
cannot open the connection
In addition: Warning message:
In readChar(file(fileName), file.info(fileName)$size, useBytes = TRUE) :
cannot open file 'C:/Users/Nick/Desktop/html/ML1.12 Math = male, me = female, therefore math ? me - ProQuest.html': Invalid argument

Filename is "ML1.12 Math = male, me = female, therefore math ≠ me - ProQuest.html"
The character causing a problem seems to be "≠".
Original article: https://www.ncbi.nlm.nih.gov/pubmed/12088131

Correlations incorrectly extracted as Chi2

Based on the PubPeer reports, a bug was found in v1.0.1 where some correlations are incorrectly extracted as Chi2. It concerns this paper, and the PubPeer comments are available here.

It might be worthwhile to make these a use case for testing purposes. But definitely something worth looking into. Attached is a csv of the results from statcheck for this paper.
issue.txt

Test for validity of confidence intervals?

Greetings from the SIPS conference where we are having an error checking discussion - and all very appreciative of statcheck. Given that confidence intervals are usually produced from standard errors, they can be calculated based on p-value and sample size. Could statcheck add a test of those, given that they have become highly recommended parts of the APA guidelines?
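One way to link a confidence interval to a p-value is to back-calculate the standard error from the interval width and derive the implied two-tailed p. The sketch below assumes a normal approximation and a symmetric interval. It is not an existing statcheck feature.

```r
# Sketch under a normal-approximation assumption (not a statcheck feature):
# SE from a symmetric CI, then the two-tailed p implied by estimate / SE.
p_from_ci <- function(estimate, lower, upper, level = .95) {
  z_crit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
  se <- (upper - lower) / (2 * z_crit)
  z <- estimate / se
  2 * pnorm(-abs(z))
}

# Hypothetical example: estimate 1.0 with 95% CI [0.2, 1.8].
p_from_ci(1.0, 0.2, 1.8)
```

For small samples, a t-based version would use `qt()` and `pt()` with the appropriate degrees of freedom. A reported p-value far from the implied one would then be worth a closer look.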

DecisionErrorAlphas boolean absent?

I get an error that is new to me; I pasted it below. I also attached the text for which I got this error. You can read it into R with read.table; I was using statcheck() when this error occurred.

ERROR

Extracting statistics...
  |=========================================================================| 100%
Error in if (any(DecisionErrorAlphas)) { : 
  missing value where TRUE/FALSE needed

bug_report.txt

Report matching vector element

For a CLI program, it would be nice to know which element of the input vector (e.g., the line number) each statistic is extracted from.

I implemented an example program here (please forgive my poor R)

Example output format would be:

filename.org:27:23: info: F(1,132) = 5.59, p = 0.019
filename.org:28:24: info: F(1,132) = 8.96, p = 0.003
filename.org:38:8: info: F(1,130) = 4.86, p = 0.029
filename.org:39:9: error: The expected value is 0.043 (0.0426781658095173)
filename.org:40:2: info: F(1,130) = 7.41, p = 0.007
filename.org:54:2: error: The expected value is 0.019 (0.0189133318829514)
filename.org:54:42: error: The expected value is 0.007 (0.00737627921418102)
filename.org:56:26: error: The expected value is 0.011 (0.0112664797423938)
filename.org:60:16: info: F(1,132) = 5.59, p = 0.02

This can be used inside emacs with flycheck like this:

(flycheck-define-checker statscheck
  "A linter for statistics."
  :command ("statscheck" source)
  :error-patterns
  ((error line-start (file-name) ":" line ":" column ": error: "
	    (message) line-end)
   (info line-start (file-name) ":" line ":" column ": info: "
	    (message) line-end))
  :modes (text-mode markdown-mode org-mode))

(add-to-list 'flycheck-checkers 'statscheck)
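The requested `file:line:column` format can be produced from a character vector of lines, with the vector index as the line number. In this sketch, `lint_lines()` and its F-test-only regex are hypothetical illustrations, not statcheck's API.

```r
# Hypothetical sketch (not statcheck's API): report each F-test match as
# "file:line:column: info: <match>", using the vector index as line number.
lint_lines <- function(lines, file = "filename.org") {
  rgx <- "F\\s*\\(\\s*\\d+\\s*,\\s*\\d+\\s*\\)\\s*=\\s*\\d*\\.?\\d+,\\s*p\\s*[<>=]\\s*\\d*\\.?\\d+"
  pos <- regexpr(rgx, lines, perl = TRUE)  # first match per line, -1 if none
  hit <- which(pos > 0)
  sprintf("%s:%d:%d: info: %s", file, hit, pos[hit], regmatches(lines, pos))
}

lint_lines(c("no stats here", "We found F(1,132) = 5.59, p = 0.019 overall."))
```

`regexpr()` only returns the first match per line, so a line containing several results would need `gregexpr()` instead.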

Character recognition errors in older PDFs

When converting some older PDFs, I've encountered a couple of character recognition errors that I think could be addressed with some updated regex:

  • "F(1, X) = Y" gets converted by pdftotext to "F(l, X) = Y"; perhaps include 'l' in regex for F-tests' first degree of freedom (and record as 1)
  • "t(X) = Y" gets converted by pdftotext to "r(X) = Y"; perhaps r-tests with Y > 1 get converted to t-tests (and print warning)

PDF with chisq fails with unclear error message due to extraction issue (improve message / use pdftools?)

This PDF file
10.1111:apps.12362.pdf

fails with

Error in if (grepl(pattern = RGX_Q, x = test_raw)) { : 
  the condition has length > 1

This is because the chisq tests get read as follows:

a good model fit (2 (199) = 627.73, p < .001, CFI = .94, RMSEA = .07, SRMR = .05), and [...] loading on one factor (2 (206) = 2533.69, p < .001, CFI = .67, RMSEA = .15, SRMR = .15) and the one-factor model with all items loading on one common factor (2 (209) = 4489.05, p < .001, CFI = .40, RMSEA = .20, SRMR = .17).

This is really odd xpdf-behaviour because I can copy-paste them from the PDF without trouble, so they seem to be embedded as characters rather than images.

So, two questions here:

  • can the error message be clearer? If it said something like 'could not process "(2 (199) = 627.73"', troubleshooting would be much easier.
  • is xpdf the best choice? I have very limited knowledge of this, but the "pdftools" package is much easier to install (just with install.packages, with no separate installation of dependencies) and gets this correct:

a good model fit (χ 2 (199) = 627.73, p < .001, CFI = .94,\nRMSEA = .07, SRMR = .05), and [...] loading on one factor (χ 2 (206) = 2533.69, p < .001, CFI = .67, RMSEA = .15, SRMR = .15) and\nthe one-factor model with all items loading on one common factor (χ 2 (209) = 4489.05,\np < .001, CFI = .40, RMSEA = .20, SRMR = .17).

(Getting this to work requires two minor pre-processing steps:
pdftools::pdf_text(f) |> paste(collapse = "") |> gsub("\n", "", x = _) |> statcheck:::extract_stats("chisq")
)
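A clearer message could come from wrapping each per-result step and re-throwing any error together with the raw text being processed. `with_raw_context()` is a hypothetical wrapper, not statcheck's API.

```r
# Hypothetical wrapper (not statcheck's API): if processing a raw result
# fails, re-throw the error together with the offending text.
with_raw_context <- function(raw, fun) {
  tryCatch(fun(raw), error = function(e) {
    stop(sprintf('could not process "%s": %s', raw, conditionMessage(e)),
         call. = FALSE)
  })
}

msg <- tryCatch(
  with_raw_context("(2 (199) = 627.73",
                   function(x) stop("the condition has length > 1")),
  error = conditionMessage
)
print(msg)
```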

Report APA style validity

It would be nice to have a data frame variable saying whether the parsed "formula" is in valid APA style.
A simple way to achieve this would be to match $Raw against the valid-APA regex.
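For t-tests, this could look like the sketch below. The strict APA regex is a simplified assumption, not statcheck's own pattern.

```r
# Sketch (the regex is a simplified assumption covering t-tests only):
# add a logical column telling whether Raw follows the strict APA layout.
apa_t <- "^t\\(\\d+\\) = -?\\d*\\.?\\d+, p [<>=] \\d*\\.?\\d+$"
results <- data.frame(Raw = c("t(28) = 2.20, p = .03", "t28 = 2.20, p = .03"))
results$APA <- grepl(apa_t, results$Raw, perl = TRUE)
results
```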

Respect messages = FALSE fully?

One more wish: could messages = FALSE also suppress the "statcheck did not find any results" message? Alternatively, could this be delivered as a message() rather than with cat()? The cat() output is quite difficult to suppress in a loop ...
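The difference between the two mechanisms shows up in base R alone. The functions below are hypothetical stand-ins for the notice: output from `message()` can be silenced with `suppressMessages()`, whereas `cat()` writes directly to standard output and still gets through.

```r
# Two hypothetical ways of emitting the same "no results" notice.
notify_message <- function() message("statcheck did not find any results")
notify_cat     <- function() cat("statcheck did not find any results\n")

# message() output is silenced by suppressMessages() ...
from_message <- capture.output(suppressMessages(notify_message()), type = "message")
# ... but cat() output is not, so it still shows up inside a loop.
from_cat <- capture.output(suppressMessages(notify_cat()))
```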
