michelenuijten / statcheck

A spellchecker for statistics

R 100.00%

statistics cran nhst p-values r reproducibility

statcheck's Introduction

statcheck

What is statcheck?

statcheck is a “spellchecker” for statistics. It checks whether your p-values match their accompanying test statistic and degrees of freedom.

statcheck searches for the results of null-hypothesis significance tests (NHST) reported in APA style (e.g., t(28) = 2.2, p < .05). It recalculates the p-value from the reported test statistic and degrees of freedom. If the reported and recomputed p-values don't match, statcheck flags the result as an error.

What can I use statcheck for?

statcheck is mainly useful for:

Self-checks: you can use statcheck to make sure your manuscript doesn’t contain copy-paste errors or other inconsistencies before you submit it to a journal.
Peer review: editors and reviewers can use statcheck to check submitted manuscripts for statistical inconsistencies. They can ask authors for a correction or clarification before publishing a manuscript.
Research: statcheck can be used to automatically extract statistical test results from articles that can then be analyzed. You can for instance investigate whether you can predict statistical inconsistencies (see e.g., Nuijten et al., 2017), or use it to analyze p-value distributions (see e.g., Hartgerink et al., 2016).

How does statcheck work?

The algorithm behind statcheck consists of four basic steps:

Convert pdf and html articles to plain text files.
Search the text for instances of NHST results. Specifically, statcheck can recognize t-tests, F-tests, correlations, z-tests, χ²-tests, and Q-tests (from meta-analyses) if they are reported completely (test statistic, degrees of freedom, and p-value) and in APA style.
Recompute the p-value using the reported test statistic and degrees of freedom.
Compare the reported and recomputed p-values. If the reported p-value does not match the computed one, the result is marked as an inconsistency (Error in the output). If the reported p-value is significant and the computed one is not, or vice versa, the result is marked as a gross inconsistency (DecisionError in the output).
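Steps 3 and 4 can be sketched in a few lines of base R. This is a minimal illustration, not statcheck's implementation. `compute_p()` and `check_result()` are hypothetical helpers, and the argument layout (`df`, `df2`) is our own. The sketch also ignores the rounding of the test statistic and the one-tailed options that statcheck handles.

```r
# Hypothetical helper: two-tailed p-value (or upper-tail p for F, chi2, Q)
# from a test statistic and its degrees of freedom.
compute_p <- function(test, value, df = NA, df2 = NA) {
  switch(test,
    t = 2 * pt(-abs(value), df = df),
    # correlations are converted to t with df degrees of freedom
    r = {
      t_val <- value * sqrt(df / (1 - value^2))
      2 * pt(-abs(t_val), df = df)
    },
    F = pf(value, df1 = df, df2 = df2, lower.tail = FALSE),
    z = 2 * pnorm(-abs(value)),
    chisq = pchisq(value, df = df, lower.tail = FALSE),
    Q = pchisq(value, df = df, lower.tail = FALSE),
    stop("unsupported test: ", test)
  )
}

# Hypothetical helper: compare reported and computed p.
# comparison is the reported sign ("=" or "<"); digits is the number of
# decimals of the reported p-value.
check_result <- function(reported_p, comparison, computed_p,
                         digits, alpha = .05) {
  error <- switch(comparison,
    "=" = abs(round(computed_p, digits) - reported_p) > 1e-9,
    "<" = computed_p >= reported_p,
    stop("unsupported comparison: ", comparison)
  )
  reported_sig <- if (comparison == "<") reported_p <= alpha else reported_p < alpha
  computed_sig <- computed_p < alpha
  # a decision error is an error that also flips significance
  c(Error = error, DecisionError = error && (reported_sig != computed_sig))
}

p <- compute_p("t", 3, df = 10)
check_result(.013, "=", p, digits = 3)
check_result(.009, "=", p, digits = 3, alpha = .01)
```

With the t(10) = 3 example from the issues below, a reported p = .013 passes. A reported p = .009 is flagged as an error, and at alpha = .01 it also counts as a decision error.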

statcheck takes into account correct rounding of the test statistic, and has the option to take into account one-tailed testing. See the manual for details.

Installation and use

For detailed information about installing and using statcheck, see the manual on RPubs.

statcheck.io is a web-based interface for statcheck.

statcheck's Issues

reconsider ignoring minus signs followed by space

see this test:

# weird encoding in minus sign followed by space
test_that("t-values with a weird minus sign and a space do not result in errors", {
  txt1 <- " t(553) = − 4.46, p < .0001" # this is an em dash or something

  expect_output(statcheck(txt1, messages = FALSE), "did not find any results")
})

not sure why I decided that these cases should be ignored. It seems reasonable to include them.
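If these cases should be included, one option is to normalize the dash before extraction. This is a minimal sketch that assumes the character is U+2212 (MINUS SIGN) or an en/em dash; the scraped text doesn't show which one it is. `normalize_minus()` is a hypothetical helper, not part of statcheck. It only touches dashes that directly follow =, < or >, so number ranges written with a dash are left alone.

```r
# Hypothetical pre-processing step: after a comparison sign, replace a
# Unicode minus (U+2212), en dash (U+2013) or em dash (U+2014), plus any
# spaces that follow it, with an ASCII "-" attached to the number.
normalize_minus <- function(x) {
  gsub("([=<>]\\s*)[\u2212\u2013\u2014]\\s*(?=[0-9.])", "\\1-", x, perl = TRUE)
}

normalize_minus(" t(553) = \u2212 4.46, p < .0001")
# returns " t(553) = -4.46, p < .0001"
```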

deal with statcheck(NA)

if you run statcheck(NA), you'll get an error. Return a warning or message instead.
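A minimal sketch of such a guard, assuming it runs at the start of statcheck(). `validate_statcheck_input()` is a hypothetical name. NA elements are dropped. If nothing is left, the function warns and returns NULL instead of failing further down.

```r
# Hypothetical input guard for statcheck()-like functions.
validate_statcheck_input <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x)]          # drop NA elements
  if (length(x) == 0) {
    warning("statcheck() received no usable text (only NA); returning NULL.")
    return(NULL)
  }
  x
}

validate_statcheck_input(c(NA, "t(28) = 2.20, p = .03"))
```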

Tables?

We are currently doing an error-detection hackathon (related to the ERROR project) ... and were wondering whether you'd be interested in having statcheck extended to (HTML) tables ... or whether that would work better as a separate extension package? Would be great to hear your thoughts ...

optional parentheses for DFs

In HTML, it is reasonably common to write the degrees of freedom as a subscript without parentheses.

Examples:

T₂₈ = -2.20, p = .03
F_2,28 = 2.20, p = .03

When reading the text, the formatting is lost, so it appears without the parentheses:

T28 = -2.20, p = .03
F2,28 = 2.20, p = .03

Proposed solution:

Make the parentheses optional. Just adding a ? after the parentheses in the regexes is simple. But the code to determine the statistical test relies on the open parenthesis.

Test code

test_that("t-tests without parentheses are retrieved from text", {
  txt1 <- "t 28 = 2.20, p = .03"
  txt2 <- "t28 = 2.20, p = .03"

  result <- statcheck(c(txt1, txt2), messages = FALSE)

  expect_equal(nrow(result), 2)
})
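Here is a sketch of the proposed change. It uses a hypothetical pattern, not statcheck's own regexes. The parentheses become optional, and the test letter gets its own capture group, so the test type no longer depends on finding an open parenthesis. Dropping the parentheses makes false positives more likely, so a pattern like this would need testing on real articles.

```r
# Hypothetical regex: test letter (t or F), optional "(", one or two df,
# optional ")", then "=". \b keeps the letter from matching inside a word.
rgx <- "\\b([tF])\\s?\\(?\\s?(\\d+(?:\\s?,\\s?\\d+)?)\\s?\\)?\\s?="

x <- c("t(28) = 2.20, p = .03", "t 28 = 2.20, p = .03",
       "t28 = 2.20, p = .03", "F2,28 = 2.20, p = .03")
m <- regmatches(x, regexec(rgx, x, perl = TRUE))

test_type <- vapply(m, `[`, character(1), 2)   # "t" "t" "t" "F"
dfs <- vapply(m, `[`, character(1), 3)         # "28" "28" "28" "2,28"
```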

Issue with recognising string F = 0 or 1

Hi Michele,
Statcheck doesn't seem to be able to recognise the below string. Could you please advise what I might be doing wrong? Thanks!

statcheck("F(1,210) = 0, p = 1")
statcheck("F(1,210) = 1, p = 0")

count p < .000 as correct if pZeroError == FALSE

it seems as if p < .000 is always counted as an error, even when pZeroError == FALSE. This doesn't happen for p = .000.

The Source column is not sortable when unnamed sources are used

Sorting is wrong: source 10 appears before source 2.
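Source names are sorted as text, so "10" comes before "2". Below is a minimal sketch of a numeric-aware order. `sort_sources()` is hypothetical. If Source stays a factor column, the result could be passed as its `levels`.

```r
# Hypothetical helper: sort numerically when every source name is a
# number, otherwise fall back to ordinary text sorting.
sort_sources <- function(src) {
  num <- suppressWarnings(as.numeric(src))
  if (!anyNA(num)) src[order(num)] else sort(src)
}

sort(c("1", "2", "10"))          # text sort: "1" "10" "2"
sort_sources(c("10", "2", "1"))  # numeric sort: "1" "2" "10"
```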

incorrect OneTail output?

When I run the code below, the output says that OneTail is FALSE, indicating that the result is incorrect if it were a one-tailed test (which it is, in the example). That doesn't seem right, though.

statcheck("this is a one-tailed test: t(40)=1.80,p<.04")

When I change p<.04 to p<.05, OneTail becomes TRUE, yet when I change it to p<.06 it becomes FALSE again. If I say p=.04, it changes back to TRUE. I'm not sure, but the issue seems to be line 1178 of statcheck.R, or am I missing something? Thanks.

Best,
Tom

"test (15)" scraped as "t(15)"

I saw something in the statcheck scrape that might be a bug. I think generic tests may get scraped as t-tests.

For example, in this paper, the authors write "Friedman's test (15) = 62.92", and statcheck scrapes it as "t(15) = 62.92".

I don't know much about Friedman's test (or nonparametric tests in general), but it seems to use its own Q-statistic that is closer to a chi-square distribution.

I don't think this is a common situation, of course, but if the regexp could be tweaked to avoid mistaking "...test (df)" for "t(df)" it would improve the specificity of the statcheck program.
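One possible tweak, sketched with a hypothetical pattern rather than statcheck's actual regex: require a word boundary before the t. That way the last letter of a word such as "test" can no longer start a match.

```r
# Hypothetical pattern: "t" must be a standalone token (\b), optionally
# followed by a space, then "(df)" and "=".
rgx_t <- "\\bt\\s?\\(\\s?\\d+(\\.\\d+)?\\s?\\)\\s?="

x <- c("Friedman's test (15) = 62.92", "t(15) = 2.10", "t (15) = 2.10")
grepl(rgx_t, x, perl = TRUE)
# returns FALSE TRUE TRUE
```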

don't run txt-to-file tests if there are no test materials/articles available

Statcheck checks if functions like checkHTML() work by scanning "test articles". These are not synced with git, because of copyright issues. That means that if you download statcheck from GitHub, you will fail a lot of tests, because there are no test articles. Skip these tests if the articles are not there (maybe with a printed message).

Flag cases where reported p-value is only correct when taking rounding into account

Add a flag to the output for cases where statcheck "unrounded" the test stat.
Otherwise you get cases like this:

r(97) = .17, p = .084
recalculated p = .0925
consistent

This is confusing if people don't know that statcheck also counts the p-values belonging to any r between .165 and .175 as correct.
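Here is a sketch of how such a flag could be computed for correlations. It assumes the rounding correction recomputes p at both ends of the test statistic's rounding interval (r = .165 and .175 here) and accepts anything in between. All function names are hypothetical, and statcheck's own implementation may differ, for example by rounding the bounds.

```r
# Two-tailed p-value for a correlation, via t = r * sqrt(df / (1 - r^2)).
r_to_p <- function(r, df) {
  t_val <- r * sqrt(df / (1 - r^2))
  2 * pt(-abs(t_val), df = df)
}

# Range of p-values for all r that round to the reported value.
p_range_rounded_r <- function(r, df, decimals) {
  half_unit <- 0.5 * 10^(-decimals)
  sort(r_to_p(c(r - half_unit, r + half_unit), df))
}

# Hypothetical flag: the reported p does not match the exact p, but lies
# inside the rounding range, so it is only "consistent" thanks to rounding.
needs_rounding_flag <- function(reported_p, p_digits, r, df, decimals) {
  exact_match <- abs(round(r_to_p(r, df), p_digits) - reported_p) < 1e-9
  rng <- p_range_rounded_r(r, df, decimals)
  in_range <- reported_p >= rng[1] && reported_p <= rng[2]
  !exact_match && in_range
}

p_range_rounded_r(.17, 97, decimals = 2)
needs_rounding_flag(.084, p_digits = 3, r = .17, df = 97, decimals = 2)
```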

make failsafe if you feed checkPDF or checkHTML another article type

example:

checkPDF() # chose an HTML file
PDF error: May not be a PDF file (continuing anyway)
PDF error (2): Illegal character <21> in hex string
PDF error (4): Illegal character <4f> in hex string
PDF error (6): Illegal character <54> in hex string
PDF error (7): Illegal character <59> in hex string
PDF error (8): Illegal character <50> in hex string
PDF error (11): Illegal character <68> in hex string
PDF error (12): Illegal character <74> in hex string

(etc.)
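One way to fail safely is to check the file signature before calling the PDF converter: PDF files start with the bytes "%PDF-". `looks_like_pdf()` is a hypothetical helper, not part of statcheck.

```r
# Hypothetical guard: TRUE if the file starts with the PDF signature.
looks_like_pdf <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 5)
  identical(header, charToRaw("%PDF-"))
}
```

A checkPDF()-style loop could then skip other file types with a message, e.g. `if (!looks_like_pdf(f)) { message("Skipping ", f, ": not a PDF"); next }`.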

pdftools

Alternative to xpdf which does not require installation of separate software: https://ropensci.org/blog/2016/03/01/pdftools-and-jeroen

error not detected when reported p < alpha < computed p

str(statcheck("t(10) = 3, p = .009", alpha = .01))

Classes ‘statcheck’ and 'data.frame':   1 obs. of  15 variables:
 $ Source             : Factor w/ 1 level "1": 1
 $ Statistic          : Factor w/ 1 level "t": 1
 $ df1                : logi NA
 $ df2                : num 10
 $ Test.Comparison    : Factor w/ 1 level "=": 1
 $ Value              : num 3
 $ Reported.Comparison: Factor w/ 1 level "=": 1
 $ Reported.P.Value   : num 0.009
 $ Computed           : num 0.0133
 $ Raw                : Factor w/ 1 level "t(10) = 3, p = .009": 1
 $ Error              : logi FALSE
 $ DecisionError      : logi FALSE
 $ OneTail            : logi TRUE
 $ OneTailedInTxt     : logi FALSE
 $ APAfactor          : num 1
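The comparison in the title can be sketched in base R without any rounding correction (variable names are ours). The test statistic is reported as a whole number, so a rounding-aware check may accept any t from 2.5 to 3.5. At t = 3.5 the two-tailed p is already below .009. That could explain why no error is flagged, but this is a guess, not a diagnosis.

```r
reported_p <- .009
computed_p <- 2 * pt(-3, df = 10)   # two-tailed p for t(10) = 3, about .0133
alpha <- .01

# decision error: reported and recomputed p fall on different sides of alpha
decision_error <- (reported_p < alpha) != (computed_p < alpha)
decision_error   # TRUE

# two-tailed p at the upper end of the rounding interval of t = 3
p_at_upper <- 2 * pt(-3.5, df = 10)
```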

add additional html tags for math symbols

See https://dev.w3.org/html5/html-author/charref for a list. Some tags are already included in file-to-txt.R, but not all variations.

optional square brackets for parentheses

Should authors use parentheses for df? Yes

Do some authors use square brackets instead? Also yes

Change RGX_OPEN_BRACKET to "(.+?(?=[\\(\\[]))"
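A quick check of the proposed pattern in base R, applied to single, already isolated results. How statcheck combines it with its other regexes is not shown. The closing bracket would need the same treatment, e.g. `[\\)\\]]`.

```r
# The proposed regex: lazily match everything up to the first "(" or "[".
# The lookahead (?=...) requires perl = TRUE in R.
RGX_OPEN_BRACKET <- "(.+?(?=[\\(\\[]))"

x <- c("t(28) = 2.20, p = .03",
       "t[28] = 2.20, p = .03",
       "F[2, 28] = 2.20, p = .03")
regmatches(x, regexpr(RGX_OPEN_BRACKET, x, perl = TRUE))
# returns "t" "t" "F"
```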

Feature request: Provide option to return `AllPValues` with available tests

I would love to be able to consider all p-values from a paper and see for which ones tests could be extracted / which ones can be flagged as problematic. Currently, I can get the tests or AllPValues, but they are hard to match. Could there be an option to augment rather than replace the standard output?

issues to add/update in the branch feature-pdftools

add the DOIs of all test articles to the code, so that others can check which articles were used for testing
check the "manual" csv files in the test directory; these contain manually extracted stats for the test articles but are not finished. These might be allowed to be uploaded.
when reading in the manual csv files, remove the last two rows (they only show the total number of extracted results)
rewrite tests to focus on pdftools as the default method
write explicit tests for the difference in retrieval between pdftools and xpdf

Skip files with too long filename instead of throwing an error

When a filename is very long, it can't be opened (for some reason). This causes statcheck to throw an error. It would be better to throw an informative message instead, so that when you scan an entire folder of papers and one has a file name that's too long, you just skip the long file and still scan the rest.

Error message:

Importing HTML files...
|== | 2%
Error in readChar(con, file.info(fileName)$size, useBytes = TRUE) :
cannot open the connection
In addition: Warning message:
In readChar(con, file.info(fileName)$size, useBytes = TRUE) :
cannot open file 'C:/Users/mnuijten/surfdrive/UVT/Projects/EffectivenessStatcheck/effectiveness_statcheck/articles/PS/2013/A Longitudinal Cluster-Randomized Controlled Study on the Accumulating Effects of Individualized Literacy Instruction on Students’ Reading From First Through Third Grade.htm': No such file or directory
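Here is a minimal base R sketch of the skip-and-continue behaviour. Each file is read inside `tryCatch()`, and any file that cannot be opened produces a message and is dropped. `safe_read()` and `read_articles()` are hypothetical names; statcheck's importer is not reproduced here.

```r
# Read one file; on failure, report it and return NULL instead of stopping.
safe_read <- function(path) {
  tryCatch({
    size <- file.info(path)$size
    if (is.na(size)) stop("file not found or not accessible")
    readChar(path, size, useBytes = TRUE)
  }, error = function(e) {
    message("Skipping '", path, "': ", conditionMessage(e))
    NULL
  })
}

# Read all files and keep only the ones that could be opened.
read_articles <- function(paths) {
  texts <- lapply(paths, safe_read)
  names(texts) <- paths
  Filter(Negate(is.null), texts)
}
```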

Warnings due (?) to file closing

After scanning a directory with 70 files in it, I got this message:
There were 50 or more warnings (use warnings() to see the first 50)

The output of warnings() is attached.
warnings.txt

Release plan 1.4 and stale develop branch

Thanks for this interesting project.

I'm wondering what the release plan is for version 1.4. There are prereleases for 1.4 https://github.com/MicheleNuijten/statcheck/releases, but no release yet.

Another thing I'm wondering about is the stale (?) develop branch. What is your plan with this branch? Merge it into master? https://github.com/MicheleNuijten/statcheck/network. There are interesting features in there regarding PDF parsing with pdftools. It might be better to abandon the develop branch after merging it into master.

Incorrect error diagnosis for one-sided t-tests

I think I found a bug in how statcheck() diagnoses errors in one-sided t-tests. Specifically, when the sample mean is in the "wrong" tail, it is inappropriate to calculate the one-tailed p-value by halving the two-sided p-value, as statcheck() does.

mean(sleep$extra[sleep$group == 1])
# [1] 0.75
mean(sleep$extra[sleep$group == 2])
# [1] 2.33

t.test(
  sleep$extra[sleep$group == 1]
  , sleep$extra[sleep$group == 2]
  , var.equal = TRUE
)
# Two Sample t-test
# 
# data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
# t = -1.8608, df = 18, p-value = 0.07919
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -3.363874  0.203874
# sample estimates:
#   mean of x mean of y 
# 0.75      2.33 

t.test(
  sleep$extra[sleep$group == 1]
  , sleep$extra[sleep$group == 2]
  , alternative = "greater"
  , var.equal = TRUE
)
# 	Two Sample t-test
# 
# data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
# t = -1.8608, df = 18, p-value = 0.9604
# alternative hypothesis: true difference in means is greater than 0
# 95 percent confidence interval:
#  -3.052378       Inf
# sample estimates:
# mean of x mean of y 
#      0.75      2.33

Hence, statcheck flags the p-value that is correct for this one-sided test (p = .960) as an error, whereas it deems the incorrect p-value (p = .039) correct.

statcheck:::statcheck("t(18) = -1.86, p = 0.960", OneTailedTests = TRUE)[, c("Reported.P.Value", "Computed", "Error", "OneTail")]
# Reported.P.Value   Computed Error OneTail
# 1           0.96 0.03965356  TRUE   FALSE

statcheck:::statcheck("t(18) = -1.86, p = 0.039", OneTailedTests = TRUE)[, c("Reported.P.Value", "Computed", "Error", "OneTail")]
# Reported.P.Value   Computed Error OneTail
# 1          0.039 0.03965356 FALSE   FALSE

Since statcheck() can't know what the tested hypothesis is, it should probably always consider both possibilities and err on the side of caution?
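Here is a base R sketch of considering both directions, with hypothetical names. For t(18) = -1.86, the lower-tail p is the .0397 that statcheck reports as Computed, and the upper-tail p is its complement, about .960. Accepting a match with either tail avoids the false alarm above. The cost is that a one-tailed p computed in the wrong direction is no longer caught. Rounding of the test statistic is ignored here.

```r
t_val <- -1.86
df <- 18

# one-tailed p-values for the two possible directions of H1
p_less    <- pt(t_val, df = df)                      # H1: difference < 0
p_greater <- pt(t_val, df = df, lower.tail = FALSE)  # H1: difference > 0

# hypothetical check: does the reported p match either direction after
# rounding the computed p to the reported number of decimals?
matches_either_direction <- function(reported_p, digits, candidates) {
  any(abs(round(candidates, digits) - reported_p) < 1e-9)
}

matches_either_direction(.960, 3, c(p_less, p_greater))   # TRUE
```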

improve progress bar

update the progress bar(s) so that the results are available as soon as the progress bar completes (currently you still have to wait quite some time after the "Extracting stats" progress bar is full)

Add argument method = xpdf/pdftools to checkdir()

When scanning an entire folder of html & pdf articles, allow for choosing the pdf reader

Error if there is a space between two numbers after a test statistic and before a decimal in a reported statistical test result

Errors are caused if there is a space between two numbers after a test statistic and before a decimal in a reported statistical test result. (Scanned several thousand papers and this only occurred once so it's unlikely to pop up too often!)

Examples:

statcheck::statcheck(" z = 1 1 .25, p = .806. ")
#> Extracting statistics...
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> Error in if (lower[i] < 0) {: missing value where TRUE/FALSE needed

statcheck::statcheck("t(123) = 1 0.25, p = .806")
#> Extracting statistics...
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
#> Error in if (lower[i] < 0) {: missing value where TRUE/FALSE needed

Created on 2019-11-28 by the reprex package (v0.2.1)
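One defensive option is to accept a test statistic only if it is a single well-formed number, and skip the result otherwise. This is sketched with a hypothetical helper. Converting "1 0.25" with `as.numeric()` gives NA, and an NA in an `if ()` condition produces the "missing value where TRUE/FALSE needed" error shown above.

```r
# Accept an optional minus sign, optional leading digits, an optional
# decimal point and at least one trailing digit, with nothing else.
is_clean_number <- function(s) grepl("^-?\\d*\\.?\\d+$", s, perl = TRUE)

is_clean_number(c("0.25", ".25", "10", "1 0.25", "1 1 .25"))
# returns TRUE TRUE TRUE FALSE FALSE

suppressWarnings(as.numeric("1 0.25"))   # NA
```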

Use of other characters as decimal separators

Some languages use the comma as a decimal separator. For instance, in some journals written in Spanish, it is recommended that results should be written as... "F(1, 19) = 4,44, p = 0,048". I was not able to extract results from such papers using statcheck. It would be nice if this could be somehow considered.
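Here is a sketch of a pre-processing step for such papers, with a hypothetical helper. A comma between digits is read as a decimal separator only when it directly follows =, < or >, so the commas inside the df parentheses are left alone. Numbers with thousands separators, such as "= 1,234", would also be converted. A real implementation would therefore probably need to be switched on explicitly.

```r
# Hypothetical pre-processing: after a comparison sign (plus optional
# spaces, minus sign and leading digits), turn "d,d" into "d.d".
comma_to_point <- function(x) {
  gsub("([=<>]\\s*-?\\d*),(\\d)", "\\1.\\2", x, perl = TRUE)
}

comma_to_point("F(1, 19) = 4,44, p = 0,048")
# returns "F(1, 19) = 4.44, p = 0.048"
```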

Displaying χ² for chisq tests

The UI/UX could get a quick boost by displaying the χ² symbol for parsed chi-squared tests. Currently, the output table displays something along the lines of "2 (1) = 3.3, p = 0.07". Thanks!

Statcheck logo

Is the statcheck logo available for reuse? If so, could you add it to the repo (.svg?) and specify the license under which you make it available (CC 0 please? :-)).

I'd like to use it for a statcheck extension I am just starting on.

Line breaks prevent reading of statistical results

Line breaks in a (badly?) converted PDF file can prevent a test result from being read. It may be worthwhile to also remove newlines (\n), in addition to the space removal already used in the statcheck function.

A reproducible example is:
statcheck("F(1, 45) = .12, p = .58 and F(2, 165)\n = .001, p = .96")

Michèle, if you agree I can do this sometime soon.
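Here is a minimal sketch of the proposed pre-processing step, using a hypothetical helper. Each newline, together with the whitespace around it, is collapsed into a single space before extraction.

```r
# Hypothetical pre-processing: replace "\n" and surrounding whitespace
# with one space.
collapse_newlines <- function(x) gsub("\\s*\\n\\s*", " ", x, perl = TRUE)

collapse_newlines("F(1, 45) = .12, p = .58 and F(2, 165)\n = .001, p = .96")
# returns "F(1, 45) = .12, p = .58 and F(2, 165) = .001, p = .96"
```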

Release statcheck 1.5.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

Problem with non-ASCII (?) characters in filenames

Error message:

Error in readChar(file(fileName), file.info(fileName)$size, useBytes = TRUE) :
cannot open the connection
In addition: Warning message:
In readChar(file(fileName), file.info(fileName)$size, useBytes = TRUE) :
cannot open file 'C:/Users/Nick/Desktop/html/ML1.12 Math = male, me = female, therefore math ? me - ProQuest.html': Invalid argument

Filename is "ML1.12 Math = male, me = female, therefore math ≠ me - ProQuest.html"
The character causing a problem seems to be "≠".
Original article: https://www.ncbi.nlm.nih.gov/pubmed/12088131

Correlations incorrectly extracted as Chi2

Based on the PubPeer reports, a bug was found in v1.0.1 where some correlations are incorrectly extracted as Chi2. It is based on this paper, and the PubPeer comments are available here.

It might be worthwhile to make these a use case for testing purposes. But definitely something worth looking into. Attached is a csv of the results from statcheck for this paper.
issue.txt

allow semicolons instead of commas

Should there be a comma separating the value and the "p"? Yes.

Do some authors use a semicolon instead? Also yes.

Look at final character here
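Here is a sketch of a separator that accepts both, using a hypothetical pattern rather than statcheck's own: a digit, then a comma or semicolon, then p and a comparison sign.

```r
# Hypothetical pattern: value and p-value separated by "," or ";".
rgx_sep <- "\\d\\s?[,;]\\s?p\\s?[<>=]"

x <- c("t(28) = 2.20, p = .03", "t(28) = 2.20; p = .03", "t(28) = 2.20 p = .03")
grepl(rgx_sep, x, perl = TRUE)
# returns TRUE TRUE FALSE
```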

Control verbosity

It would be nice to be able to call statcheck with an optional verbosity = FALSE parameter that suppresses the following messages:

statcheck/R/statcheck.R, line 56 (commit f24a4f9):

message("Extracting statistics...")

statcheck/R/statcheck.R, line 57 (commit f24a4f9):

pb <- txtProgressBar(max = length(x), style = 3)
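Here is a minimal sketch of such a switch. `extract_all()` and its `verbose` argument are hypothetical, and `nchar()` stands in for the real per-text extraction. The tests elsewhere on this page already call statcheck(..., messages = FALSE), so newer versions seem to offer a similar option under that name.

```r
# Hypothetical sketch: the status message and progress bar are only
# shown when verbose = TRUE.
extract_all <- function(x, verbose = TRUE) {
  show_progress <- verbose && length(x) > 0   # txtProgressBar needs max > 0
  if (verbose) message("Extracting statistics...")
  if (show_progress) {
    pb <- utils::txtProgressBar(max = length(x), style = 3)
    on.exit(close(pb))
  }
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- nchar(x[i])   # placeholder for the real extraction
    if (show_progress) utils::setTxtProgressBar(pb, i)
  }
  out
}

extract_all(c("t(28) = 2.20, p = .03", "F(2, 28) = 2.20, p = .03"), verbose = FALSE)
```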

Instead of tcltk just use file.choose()

The tcltk library is sometimes problematic on Mac. Also, there doesn't seem to be a specific reason to choose this library over the base R functions.

reconsider removing file extension from source

if you scan a folder that has both a pdf version and html version of the same file, they will get the same source name in the final result. this seems undesirable.
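Here is a small illustration of the collision, using base R path helpers (`basename()` and `tools::file_path_sans_ext()`). How statcheck builds its Source column internally is not shown here.

```r
paths <- c("articles/paper1.pdf", "articles/paper1.html")

# Dropping the extension makes the two versions indistinguishable ...
without_ext <- tools::file_path_sans_ext(basename(paths))
without_ext   # "paper1" "paper1"

# ... while keeping it gives each file a unique source name.
with_ext <- basename(paths)
with_ext      # "paper1.pdf" "paper1.html"
```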

Test for validity of confidence intervals?

Greetings from the SIPS conference where we are having an error checking discussion - and all very appreciative of statcheck. Given that confidence intervals are usually produced from standard errors, they can be calculated based on p-value and sample size. Could statcheck add a test of those, given that they have become highly recommended parts of the APA guidelines?
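One common approach is to recover the standard error from the width of the interval and recompute p from it. The sketch below assumes a symmetric 95% CI based on a normal sampling distribution. For small samples, a t quantile would be needed instead. `p_from_ci()` is a hypothetical helper.

```r
# Hypothetical check: p-value implied by an estimate and its CI,
# assuming a symmetric, normal-theory interval.
p_from_ci <- function(estimate, lower, upper, level = .95) {
  z_crit <- qnorm(1 - (1 - level) / 2)       # 1.96 for a 95% CI
  se <- (upper - lower) / (2 * z_crit)       # half-width divided by z_crit
  z <- estimate / se
  2 * pnorm(-abs(z))
}

p_from_ci(0.50, 0.10, 0.90)
```

The implied p could then be compared with the reported p, in the same way statcheck compares reported and recomputed p-values.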

DecisionErrorAlphas boolean absent?

I get an error that is new to me. I pasted it below. I also attached the text for which I got this. You can read it in to R with read.table; I was using statcheck() when this error occurred.

ERROR

Extracting statistics...
  |=========================================================================| 100%
Error in if (any(DecisionErrorAlphas)) { : 
  missing value where TRUE/FALSE needed

bug_report.txt
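R gives this message when an `if ()` condition is NA. `any()` returns NA when a vector contains NAs and no TRUE. One plausible cause is therefore an NA somewhere in DecisionErrorAlphas, though this has not been checked against the attached text. A minimal illustration with made-up values:

```r
alphas_hit <- c(NA, FALSE, FALSE)   # made-up values containing an NA

any(alphas_hit)                 # NA, so if (any(alphas_hit)) fails
any(alphas_hit, na.rm = TRUE)   # FALSE
isTRUE(any(alphas_hit))         # FALSE, safe inside if ()
```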

Cannot install from GitHub - malformed package version

Currently, devtools::install_github("MicheleNuijten/statcheck") fails due to a Malformed package version. Apparently, 1.4.1-beta.1 is not acceptable there ...
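An R package version must be a sequence of at least two non-negative integers separated by "." or "-", so a label like "beta" is rejected. Here is a quick check with base R's `package_version()`. `is_valid_r_version()` is a hypothetical helper. The ".9000" development suffix is only a common convention, not something statcheck is known to use.

```r
# Hypothetical helper: TRUE if R accepts v as a package version.
is_valid_r_version <- function(v) {
  tryCatch({
    package_version(v)
    TRUE
  }, error = function(e) FALSE)
}

is_valid_r_version("1.4.1-beta.1")  # FALSE
is_valid_r_version("1.4.1.9000")    # TRUE
```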

Report matching vector element

To build a CLI program, it would be nice to know which element of the input vector (e.g., which line) each statistic was extracted from.

I implemented an example program here (please forgive my poor R)

Example output format would be:

filename.org:27:23: info: F(1,132) = 5.59, p = 0.019
filename.org:28:24: info: F(1,132) = 8.96, p = 0.003
filename.org:38:8: info: F(1,130) = 4.86, p = 0.029
filename.org:39:9: error: The expected value is 0.043 (0.0426781658095173)
filename.org:40:2: info: F(1,130) = 7.41, p = 0.007
filename.org:54:2: error: The expected value is 0.019 (0.0189133318829514)
filename.org:54:42: error: The expected value is 0.007 (0.00737627921418102)
filename.org:56:26: error: The expected value is 0.011 (0.0112664797423938)
filename.org:60:16: info: F(1,132) = 5.59, p = 0.02

This can be used inside emacs with flycheck like this:

(flycheck-define-checker statscheck
  "A linter for statistics."
  :command ("statscheck" source)
  :error-patterns
  ((error line-start (file-name) ":" line ":" column ": error: "
	    (message) line-end)
   (info line-start (file-name) ":" line ":" column ": info: "
	    (message) line-end))
  :modes (text-mode markdown-mode org-mode))

(add-to-list 'flycheck-checkers 'statscheck)
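Here is a base R sketch of the same idea that reports file:line:column for each match. `report_locations()` and its regex are hypothetical and only cover simple t and F results; they are not statcheck's extraction code. The output format mimics the example above.

```r
report_locations <- function(lines, filename) {
  # simple t and F results, e.g. "F(1,132) = 5.59, p = 0.019"
  num <- "-?[0-9]*\\.?[0-9]+"
  rgx <- paste0("\\b[tF]\\s?\\([0-9, ]+\\)\\s?=\\s?", num,
                ",\\s?p\\s?[<>=]\\s?", num)
  out <- character(0)
  for (i in seq_along(lines)) {
    m <- gregexpr(rgx, lines[i], perl = TRUE)
    starts <- as.integer(m[[1]])          # 1-based column of each match
    if (starts[1] == -1) next             # no match on this line
    hits <- regmatches(lines[i], m)[[1]]
    out <- c(out, sprintf("%s:%d:%d: info: %s", filename, i, starts, hits))
  }
  out
}

lines <- c("Intro text.",
           "We found F(1,132) = 5.59, p = 0.019 and t(40) = 1.80, p = .08.")
writeLines(report_locations(lines, "filename.org"))
```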

Bug in spearman rho?

StatCheck returns p = 1 for a spearman rho test:

https://pubpeer.com/publications/482004022406F33A920A732DC12DCC#fb99015

Apologies if this was fixed in the StatCheck update.

Use this paper as Chi2 test

In this paper a Spearman's rho was wrongly extracted as Chi2. Provides a useful testcase for future developments.

PubPeer
Paper

Character recognition errors in older PDFs

When converting some older PDFs, I've encountered a couple of character recognition errors that I think could be addressed with some updated regex:

"F(1, X) = Y" gets converted by pdftotext to "F(l, X) = Y"; perhaps include 'l' in regex for F-tests' first degree of freedom (and record as 1)
"t(X) = Y" gets converted by pdftotext to "r(X) = Y"; perhaps r-tests with Y > 1 get converted to t-tests (and print warning)

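Both suggestions can be sketched with hypothetical helpers that are not part of statcheck. The first rewrites a lowercase l in the first df of an F-test as 1. The second flags "r" values that cannot be correlations because their absolute value exceeds 1.

```r
# 1) "F(l, X)" -> "F(1, X)": a lowercase l as the first df is read as 1.
fix_ocr_df <- function(x) {
  gsub("\\bF\\s?\\(\\s?l\\s?,", "F(1,", x, perl = TRUE)
}

# 2) |r| > 1 is impossible, so such an "r" result is probably a misread t.
is_impossible_r <- function(value) abs(value) > 1

fix_ocr_df("F(l, 45) = 3.20, p = .08")   # returns "F(1, 45) = 3.20, p = .08"
is_impossible_r(c(0.35, 2.87))           # returns FALSE TRUE
```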
PDF with chisq fails with unclear error message due to extraction issue (improve message / use pdftools?)

This PDF file
10.1111:apps.12362.pdf

fails with

Error in if (grepl(pattern = RGX_Q, x = test_raw)) { : 
  the condition has length > 1

This is because the chisq tests get read as follows:

a good model fit (2 (199) = 627.73, p < .001, CFI = .94, RMSEA = .07, SRMR = .05), and [...] loading on one factor (2 (206) = 2533.69, p < .001, CFI = .67, RMSEA = .15, SRMR = .15) and the one-factor model with all items loading on one common factor (2 (209) = 4489.05, p < .001, CFI = .40, RMSEA = .20, SRMR = .17).

This is really odd xpdf-behaviour because I can copy-paste them from the PDF without trouble, so they seem to be embedded as characters rather than images.

So, two questions here:

can the error message be clearer? If it said something like: could not process "(2 (199) = 627.73", troubleshooting would be much easier.
is xpdf the best choice? I have very limited knowledge of this, but the "pdftools" package is much easier to install (just with install.packages, with no separate installation of dependencies) and gets this correct:

a good model fit (χ 2 (199) = 627.73, p < .001, CFI = .94,\nRMSEA = .07, SRMR = .05), and [...] loading on one factor (χ 2 (206) = 2533.69, p < .001, CFI = .67, RMSEA = .15, SRMR = .15) and\nthe one-factor model with all items loading on one common factor (χ 2 (209) = 4489.05,\np < .001, CFI = .40, RMSEA = .20, SRMR = .17).

(Getting this to work requires two minor pre-processing steps:
pdftools::pdf_text(f) |> paste(collapse = "") |> gsub("\n", "", x = _) |> statcheck:::extract_stats("chisq")
)
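Here is a sketch of a clearer error message: wrap the processing of each raw result and re-raise any error with the offending text attached. `with_context()` is hypothetical. Since R 4.2.0, an `if ()` condition of length greater than one is an error rather than a warning, which is what "the condition has length > 1" reports.

```r
# Hypothetical wrapper: run f(raw) and, on error, re-raise with raw text.
with_context <- function(raw, f) {
  tryCatch(
    f(raw),
    error = function(e) {
      stop(sprintf('could not process "%s": %s', raw, conditionMessage(e)),
           call. = FALSE)
    }
  )
}

# f here only simulates the failure from the issue
msg <- tryCatch(
  with_context("(2 (199) = 627.73",
               function(s) stop("the condition has length > 1")),
  error = conditionMessage
)
msg
```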
