merck / metalite

An R package to create metadata structure for ADaM data analysis and reporting

Home Page: https://merck.github.io/metalite/

License: GNU General Public License v3.0

R 87.73% CSS 0.26% Rich Text Format 12.01%

r-package clinical-trials cdisc metadata

metalite's Issues

Independent test of `adam_mapping.r`

Test plan of adam_mapping():

Run the following code to start:

x <- adam_mapping(
  name = "apat",
  id = "USUBJID",
  group = "TRT01A",
  subset = TRTFL == "Y",
  label = "All Participants as Treated"
)
x

The output x is an adam_mapping class object.
c("name", "id", "group", "subset", "label") is the default vector of element names of the list x.
name, id, group, subset, and label are mapped to their input or default values accordingly.

Test plan of validate_adam_mapping():

Expect an error when name is NULL: run validate_adam_mapping(new_adam_mapping(list(name = NULL))).
Run validate_adam_mapping(new_adam_mapping(list(name = "apat"))); the mapping should be generated without error.
If a required variable is not a character string, an error is thrown.
For example:

fun <- function(name, id){
  exprs <- rlang::enquos(
    name = name,
    id = id
  )
  exprs
}

validate_adam_mapping(new_adam_mapping(fun("apat", USUBJID), env = parent.frame()))

Running validate_adam_mapping(new_adam_mapping(list(name = "apat", id = c("USUBJID", "SUBJID")))) generates an error.

More flexible design of `collect_title`

The current version of collect_title is constrained to c(parameter, population, observation). We should consider a more flexible design for this function.

Independent testing for `label.R`

Test plan for get_label:

It returns the column names when the input data has no labels, i.e., the output of

tbl <- data.frame(a = c(1, 2, 3), b = c(-1, -2, -3))
get_label(tbl)

is "a", "b"

It returns the labels when the input data has labels, i.e., the output of

tbl <- data.frame(a = c(1, 2, 3), b = c(-1, -2, -3))
attr(tbl[[1]], "label") <- "variable 1"
attr(tbl[[2]], "label") <- "variable 2"
get_label(tbl)

is "variable 1", "variable 2"
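A minimal base R sketch of the behavior this test plan describes, which may help when writing the expectations. The helper name get_label_sketch is hypothetical and this is not metalite's implementation; it only assumes that labels live in each column's "label" attribute, as in the example above.

```r
# Hypothetical stand-in for get_label(): return each column's "label"
# attribute, falling back to the column name when no label is set.
get_label_sketch <- function(data) {
  out <- vapply(names(data), function(nm) {
    lab <- attr(data[[nm]], "label")
    if (is.null(lab)) nm else as.character(lab)
  }, character(1))
  unname(out)
}

tbl <- data.frame(a = c(1, 2, 3), b = c(-1, -2, -3))
no_label <- get_label_sketch(tbl)    # falls back to column names

attr(tbl[[1]], "label") <- "variable 1"
attr(tbl[[2]], "label") <- "variable 2"
with_label <- get_label_sketch(tbl)  # uses the label attributes
```

The same two calls on the real get_label() are what the test plan asks to compare against "a", "b" and "variable 1", "variable 2".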

Independent test of `meta_dummy.R`

Test that the output of x <- meta_dummy() satisfies:

class(x) = "meta_adam"
class(x$data_population) = "data.frame"
class(x$data_observation) = "data.frame"
class(x$population) = "list"
class(x$observation) = "list"
class(x$parameter) = "list"
class(x$analysis) = "list"

independent test of `define.R`

Test plan of define_plan():

The data frame plan is contained in the output list.

Test plan of define_population():

An error is thrown if any name is not in the plan data frame of meta.
The output is a meta_adam class object that contains the list population.
The list population contains a list whose name is the same as the input to the name argument. Also, c("name", "id", "group", "var", "subset", "label") is the default vector of element names of that list.
"name", "id", "group", "var", "subset", and "label" are mapped to their input or default values accordingly.

Test plan of define_observation():

An error is thrown if any name is not in the plan data frame of meta.
The output is a meta_adam class object that contains the list observation.
The list observation contains a list whose name is the same as the input to the name argument. Also, c("name", "id", "group", "var", "subset", "label") is the default vector of element names of that list.
"name", "id", "group", "var", "subset", and "label" are mapped to their input or default values accordingly.

Test plan of define_parameter():

An error is thrown if any name is not in the plan data frame of meta.
The output is a meta_adam class object that contains the list parameter.
The list parameter contains a list whose name is the same as the input to the name argument. Also, c("name", "subset", "label") is the default vector of element names of that list.
"name", "subset", and "label" are mapped to their input or default values accordingly.

Test plan of define_analysis():

An error is thrown if any name is not in the plan data frame of meta.
The output is a meta_adam class object that contains the list analysis.
The list analysis contains a list whose name is the same as the input to the name argument. Also, c("name", "label") is the default vector of element names of that list.
"name" and "label" are mapped to their input or default values accordingly.

Independent test of `update.R`

Test plan of update_adam_mapping():

Add additional variables to the existing adam mapping, such as relate -> "AEREL", and use collect_adam_mapping(meta, "ser") to collect the newly added mapping rule.

Independent testing of `meta_run.R`

Test plan of meta_run():
Create two dummy functions before your test case:

meta <- meta_dummy()
ae_summary <- function(...) {
  paste("results of", deparse(match.call(), nlines = 1))
}
ae_specific <- function(...) {
  paste("results of", deparse(match.call(), nlines = 1))
}

Run meta_run(meta): all analyses in the analysis plan meta$plan are executed (use snapshot testing).
Run meta_run(meta, i = 1): only the first analysis in the analysis plan meta$plan is executed (use snapshot testing).
Run meta_run(meta, i = c(1, 3, 5)): only the selected analyses in the analysis plan meta$plan are executed (use snapshot testing).
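The dummy functions suit snapshot testing because each one simply echoes the call it received as a string, so the snapshot records which calls were issued. A small standalone sketch of that echo behavior (the argument x = 1 is an arbitrary illustration, not a metalite argument):

```r
# Same body as the dummy functions in the test plan: echo the matched call.
ae_summary <- function(...) {
  paste("results of", deparse(match.call(), nlines = 1))
}

# The arguments are captured by match.call(), not evaluated.
res <- ae_summary(x = 1)
res
```

Because the call is only deparsed, the dummy functions do not need valid analysis inputs to produce deterministic output.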

Add `@return` and `@examples` to the exported functions

Search returned links not working

The links returned by the search box are broken.

To fix this, add the url field to _pkgdown.yml:

url: https://merck.github.io/metalite/

inherit `id` variable

When a user defines id in population, the same id variable should be used in observation.

Independent testing for `meta_split.R`

Test plan of meta_split():

Divide meta into groups by "SEX": run meta_dummy() |> meta_split("SEX"). (Check whether meta is split, and use snapshot testing.)

Independent testing of `meta_check`

Test plan of meta_check_var():

check whether variable RACE is in population and observation: meta_check_var(meta_dummy(), var = "RACE")
check whether variable AEDECOD is in population and observation: meta_check_var(meta_dummy(), var = "AEDECOD"), expect to throw an error.
check whether variable BMIBL is in population and observation: meta_check_var(meta_dummy(), var = "BMIBL"), expect to throw an error.

Independent testing of `meta_validate.R`

Test plan of meta_validate():

Use meta_dummy() as input. meta_validate() returns the same meta.
Check data type: set meta <- meta_dummy().
- Set meta$data_population to NULL. Running meta_validate(meta) throws an error.
- Set meta$data_observation to NULL. Running meta_validate(meta) throws an error.
- Set meta$plan to NULL. Running meta_validate(meta) throws an error.
Check plan variable name: set meta <- meta_dummy().
- Rename a plan variable with names(meta$plan)[5] <- "param". Running meta_validate(meta) throws an error.
Check id variable: set meta <- meta_dummy().
- Change the population id variable with meta$population$apat$id <- "id". Running meta_validate(meta) throws an error.
Check label variable: set meta <- meta_dummy().
- Remove the population label with meta$population$apat$label <- NULL. Running meta_validate(meta) throws a warning.
Check observation variables in the datasets:
- Set meta <- meta_dummy(). Change meta$observation$wk12$id to 'ID'. Running meta_validate(meta) throws an error.
- Set meta <- meta_dummy(). Change meta$observation$wk12$group to 'group'. Running meta_validate(meta) throws an error.
- Set meta <- meta_dummy(). Change meta$observation$wk12$var to 'var'. Running meta_validate(meta) throws an error.

remove `rtf.R` from `metalite`

rtf.R should belong to r2rtf or its extension instead of metalite.

`collect_n_subject` add total column

Add an argument to allow a total column in collect_n_subject.

keep label attribute for `collect_observation_record()`

We need to keep the label attribute of all variables for the output dataframe.

MVP for data exploration tool

By using metalite and reactable, we can consider building a data exploration tool with visualization and drill-down features.

Here is an MVP.

metalite can be used to define the metadata structure to prepare item1 to item5 as needed.
reactable can combine all components to create the result.

library(reactable)
library(r2rtf)
library(plotly)
library(dplyr) # select() and %>% are used below; load dplyr explicitly
# library(metalite)
adsl <- r2rtf_adsl

library(tibble)
library(ggplot2)

tbl <- tribble(
  ~ term,                   ~ group1,     ~ group2,     ~group3,      ~total,   ~display,      ~variable,
  "Age (Year)",                   "",           "",          "",          "",   "figure",          "AGE",
  "Subject in Population",       "5",          "6",         "7",        "18",   "listing",         "AGE",
  "Mean (SD)",            "40 (2.5)",   "40 (2.5)",  "40 (2.5)",  "40 (2.5)",   "",                "AGE",
  "[Min, Max]",           "[18, 60]",   "[18, 60]",  "[18, 60]",  "[18, 60]",   "",                "AGE", 
  "Missing",                     "1",          "1",          "1",        "3",   "listing",         "AGE",
)

item1 <- ggplot(adsl, mapping = aes(x = AGE, group = TRT01P)) + 
  geom_histogram() + 
  facet_wrap(~ TRT01P) + 
  theme_bw()

item2 <- adsl %>% select(USUBJID, AGE)

item3 <- NULL

item4 <- NULL

item5 <- adsl %>% subset(AGE == 51) %>% select(USUBJID, AGE)

detail <- list(
  "Age (Year)" = item1, 
  "Subject in Population" = item2, 
  "Mean (SD)" = item3, 
  "[Min, Max]" = item4, 
  "Missing" = item5
)


# Example 1: with ggplot2
details_ggplot2 <- function(index){
  name <- tbl[index,][["term"]]
  
  x <- detail[[name]]
  
  if("data.frame" %in% class(x)) return(reactable(x))
  if("ggplot" %in% class(x)) return(htmltools::plotTag(x, alt="plots", width = 400))
  
  NULL
}

reactable(tbl, 
          columns = list(
            display = colDef(show = FALSE),
            variable = colDef(show = FALSE)
          ),
          details = details_ggplot2)

# Example 2: with plotly
details_plotly <- function(index){
  name <- tbl[index,][["term"]]
  
  x <- detail[[name]]
  
  if("data.frame" %in% class(x)) return(reactable(x))
  if("ggplot" %in% class(x)) return(ggplotly(x, alt="plots"))
  
  NULL
}

reactable(tbl, 
          columns = list(
            display = colDef(show = FALSE),
            variable = colDef(show = FALSE)
          ),
          details = details_plotly)

Bug: `collect_title` does not process `parameter` input

When building titles, collect_title does not have the implementation needed to return the specified title for the parameter argument.

collect_title(meta_dummy(), population = "apat", observation = "wk12", parameter = "aeosi", analysis = "ae_summary")

[1] "Summary of Adverse Events"   "Weeks 0 to 12"               "All Participants as Treated"

It looks like we probably need parameter in the list passed to lapply. I'm not sure why we need the with(collect_adam_mapping... logic.

collect_title <- function(meta,
                          population,
                          observation,
                          parameter,
                          analysis) {
  x <- lapply(
    c(analysis, observation, population),
    function(x) {
      tmp <- omit_null(collect_adam_mapping(meta, x)[c("title", "label")])
      if (length(tmp) > 0) {
        with(collect_adam_mapping(meta, parameter), fmt_sentence(glue::glue(tmp[[1]])))
      } else {
        NULL
      }
    }
  )

  unlist(x)
}

Independent test of `fmt_quote()` and `fmt_sentence()`

Test plan of fmt_quote()
- It outputs "'b'" by running fmt_quote('"b"')
- It outputs "'a' and 'b'" by running fmt_quote('"a" and "b"')
Test plan of fmt_sentence()
- It outputs "a" by running fmt_sentence(" a ")
- It outputs "a and b" by running fmt_sentence(" a and b")

Updates metalite

metalite/default_parameter_ae.R needs the following updates for ae0summary:

The "with no adverse event" category is not present in the ae0summary R table; the adam_mapping in metalite needs to be updated to add it.
"grade 3-4 adverse event" is not present in adam_mapping.
The disc0ser0rel subset condition has to be updated in adam_mapping to add drug-related.
The dtc0rel subset condition needs to be updated to aesdth == "Y" instead of aesdtc == "Y".
Discontinued drug1 (ACN1) and drug2 (ACN2) categories need to be added.
Dose modification categories need to be added.

Fix `@return` and `@examples` fields

I went through the release checklist and found two minor issues that should be fixed before release:

meta_add_total() - this function is exported thus needs @return and @examples.
assign_label() - the code example has 20,000 rows of outputs and makes the page slow to load and render.

Test of `collect_n_subject.R`

Test plan of n_subject

An error with the message "n_subject: group variable must be a factor" is thrown when running metalite:::n_subject(id = 1:3, group = c("a", "b","b")).
No error is thrown when running metalite:::n_subject(id = 1:3, group = as.factor(c("a", "b","b"))).
The counts from metalite:::n_subject(r2rtf_adae$USUBJID, as.factor(r2rtf_adae$TRTA)) are the same as those from r2rtf_adae %>% group_by(TRTA) %>% summarize(n_distinct(USUBJID)).
The counts from r2rtf_adae %>% group_by(TRTA) %>% summarize(n_distinct(USUBJID)) match those from r2rtf_adae %>% group_by(TRTA, TRTEMFL) %>% summarize(n_distinct(USUBJID)).
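For orientation, here is a base R sketch of the two properties these tests check: a factor-only guard and a distinct-subject count per group. The function n_subject_sketch is hypothetical and is not the internal metalite:::n_subject(); only the error message is taken from the test plan above.

```r
# Hypothetical stand-in: count distinct ids per group, requiring a factor.
n_subject_sketch <- function(id, group) {
  if (!is.factor(group)) {
    stop("n_subject: group variable must be a factor")
  }
  vapply(split(id, group), function(x) length(unique(x)), integer(1))
}

# A character group triggers the guard; capture the message.
err <- tryCatch(
  n_subject_sketch(id = 1:3, group = c("a", "b", "b")),
  error = function(e) conditionMessage(e)
)

# Subject 1 has two records in group "a" but is counted once.
n <- n_subject_sketch(
  id = c(1, 1, 2, 3),
  group = factor(c("a", "a", "b", "b"))
)
```

In a testthat file, the first case would typically become an expect_error() and the second an expect_equal() against the dplyr summary.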

Independent test of `meta_adam.R`

Test plan of meta_adam.R
- The return objects are of "meta_adam" class
- The return objects have no analysis plan, observation, population, parameters, and analysis by default

Independent test of `collect.R`

Test plan of collect_adam_mapping():

throw an error when collect_adam_mapping(meta_dummy())
throw an error when collect_adam_mapping(meta = meta_dummy(), name = 1)
output NULL when collect_adam_mapping(meta = meta_dummy(), name = "itt")
the .location of collect_adam_mapping(meta = meta_dummy(), name = "apat") is "population"
the .location of collect_adam_mapping(meta = meta_dummy(), name = "wk12") is "observation"
the .location of collect_adam_mapping(meta = meta_dummy(), name = "any") is "parameter"
the .location of collect_adam_mapping(meta = meta_dummy(), name = "ae_summary") is "analysis"

Checks before CRAN release

assign_label() bug fix

metalite:::assign_label(r2rtf::r2rtf_adae, var = "USUBJID", label = "Unique subject id")

In this example, every variable in r2rtf_adae other than USUBJID has its label set to its own variable name. We should keep an existing label when a variable already has one.
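One possible shape of the fix, sketched in base R on a small made-up data frame. The function assign_label_sketch and the example data are hypothetical; this is not the package's code, only an illustration of "keep the existing label, fall back to the variable name otherwise".

```r
# Hypothetical sketch: set labels for `var`, keep existing labels elsewhere,
# and use the variable name only when no label exists yet.
assign_label_sketch <- function(data, var, label) {
  stopifnot(length(var) == length(label), all(var %in% names(data)))
  for (nm in names(data)) {
    if (nm %in% var) {
      attr(data[[nm]], "label") <- label[match(nm, var)]
    } else if (is.null(attr(data[[nm]], "label"))) {
      attr(data[[nm]], "label") <- nm
    }
  }
  data
}

df <- data.frame(USUBJID = c("01", "02"), AGE = c(40, 51), SEX = c("F", "M"))
attr(df$AGE, "label") <- "Age (years)" # pre-existing label to preserve

out <- assign_label_sketch(df, var = "USUBJID", label = "Unique subject id")
labels <- vapply(out, function(x) attr(x, "label"), character(1))
```

Here AGE keeps "Age (years)", USUBJID receives the new label, and SEX falls back to "SEX".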

Independent testing of `rtf.R`

Test plan of rtf_assemble():
Create two RTF files in tempdir() as input to rtf_assemble(). Use snapshot testing.

Independent test of `adam_mapping.R`

Hi @elong0527 , could you please list some test plans for the following functions?

Test plan of adam_mapping:
Test plan of new_mapping
Test plan of new_adam_mapping
Test plan of validate_adam_mapping
Test plan of print.adam_mapping
Test plan of as.data.frame.adam_mapping
Test plan of merge.adam_mapping

Validation and testing efforts

Hi, can we create a wish-list for validation/testing of this package? We have some potential contributors.

test of `meta_add_total.R`

Test plan of meta_add_total

Throw error when meta_add_total(meta = meta_dummy(), total = c("Total", "Sum"))
The output of

x <- meta_add_total(meta = meta_dummy(), total = "Total")
x$data_population$TRTA
x$data_observation$TRTA

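contains a "Total" group in both data_population and data_observation.

For example, starting from two subjects, the data with the total rows added is:

USUBJID TRTA
1 treatment
2 control
1 Total
2 Total

Grouping by TRTA and counting USUBJID then gives:

treatment 1
control 1
Total 2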
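The small illustration above can be reproduced in base R as follows. This is only a sketch of the idea (duplicate every row with the group set to "Total"), not the code of meta_add_total(); the two-subject data frame is the toy example from this issue.

```r
# Toy data from the illustration: one subject per treatment group.
df <- data.frame(USUBJID = c(1, 2), TRTA = c("treatment", "control"))

# Duplicate all rows with the group replaced by "Total", then stack.
total <- df
total$TRTA <- "Total"
df_total <- rbind(df, total)
df_total$TRTA <- factor(df_total$TRTA, levels = c("treatment", "control", "Total"))

# Distinct subjects per group.
n <- tapply(df_total$USUBJID, df_total$TRTA, function(id) length(unique(id)))
n
```

With this construction every subject is counted once in its own group and once in "Total".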

Release metalite 0.1.0

I'm putting this CRAN release checklist template here in case you are ready for release in the next months.

First release

Proof read Title: and Description: and ensure they are informative
Check that all exported functions have @returns and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Review extrachecks
usethis::use_cran_comments() (optional)

Prepare for release

Submit to CRAN

Draft GitHub release
Submit to CRAN via web form
Approve emails

Wait for CRAN

Accepted 🎉
Post on r-packages mailing list
Tweet

Independent test of `meta_build.R`

Test plan of meta_build():

throw an error if the input meta doesn't define the observation
throw an error if the input meta doesn't define the population
throw an error if the input meta doesn't have any analysis plan
check if the keywords are available as their default value

minor revision of the documentation

Several improvements can be made to the function specification documentation.

outdata: it is better not to export this function, since reference_group = ... and order = ... are numerical vectors used for indexing.
new_data: the documentation of the x argument is not correct. x is a list, not an outdata object.

Independent test of `utility.R`

Test plan of omit_null():
Create a list with at least one NULL element as input to omit_null(). The output omits the NULL elements.

Test plan of fmt_quote():
Pass string 'a ="n"' to the function

Test plan of check_duplicate_name():
Create list(a = 1, b = 2, b = 3) as input. Expect a warning for the duplicated name b.

Test plan of fmt_sentence():
Pass " leading, trailing space and extra internal space will be cleaned " as input of fmt_sentence()

Remove the use of triple colon operators

From the R CMD check, I see

* checking dependencies in R code ... NOTE
Unexported object imported by a ':::' call: ‘r2rtf:::as_rtf_new_page’
  See the note in ?`:::` about the use of this operator.

r2rtf:::as_rtf_new_page() is simply defined as:

function () paste("{\\pard\\fs2\\par}\\page{\\pard\\fs2\\par}")

To eliminate the note, you can either 1) export it in r2rtf or 2) copy it as an internal function to this package.
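Option 2 could look like the following sketch: an internal (non-exported) copy of the one-line definition quoted above, so the package no longer needs a `:::` call. Where the copy would live inside metalite is not specified here.

```r
# Internal copy of the r2rtf helper quoted in this issue; returns the RTF
# markup that starts a new page.
as_rtf_new_page <- function() {
  paste("{\\pard\\fs2\\par}\\page{\\pard\\fs2\\par}")
}

new_page <- as_rtf_new_page()
```

The trade-off is that the copy will not pick up future changes to the r2rtf definition, which is why exporting it from r2rtf is listed as the alternative.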

Improve documentation language

See the recommendations from Google developer documentation style guide.

Consider renaming the function and updating the documentation.

Listing a few possible alternatives here: sample, prototype, example, demo, mock, stub.

Probably in the next release?

Add a vignette as a tutorial for A&R grid and mockup generation

Recreate the flowchart

My goal is to recreate the flowchart following the tidy-text-mining style using draw.io (and add the source for future editing).

Bulletproofing for `validate_adam_mapping()` does not work

We should expect an error for this case:

adam_mapping(
  name = "apat",
  id = "USUBJID",
  group = "TRT01A",
  subset = "TRTFL == 'Y'",
  label = "All Participants as Treated"
)

Additional tests for `print.meta_adam`

This function is in the R/meta_adam.R file. Tests of print.meta_adam can use snapshot testing.

  adsl <- r2rtf::r2rtf_adsl
  adsl$TRTA <- adsl$TRT01A
  adsl$TRTA <- factor(adsl$TRTA,
    levels = c("Placebo", "Xanomeline Low Dose", "Xanomeline High Dose")
  )

  adae <- r2rtf::r2rtf_adae
  adae$TRTA <- factor(adae$TRTA,
    levels = c("Placebo", "Xanomeline Low Dose", "Xanomeline High Dose")
  )

  plan <- plan(
    analysis = "ae_summary", population = "apat",
    observation = c("wk12"), parameter = "rel"
  ) 

  meta <- meta_adam(
    population = adsl,
    observation = adae
  ) |>
    define_plan(plan = plan) |>
    define_population(
      name = "apat",
      group = "TRTA",
      subset = quote(SAFFL == "Y")
    ) |>
    define_observation(
      name = "wk12",
      group = "TRTA",
      subset = quote(SAFFL == "Y"),
      label = "Weeks 0 to 12"
    ) |>
    define_parameter(
      name = "aeosi",
      subset = quote(AEOSI == "Y"),
      var = "AEDECOD",
      soc = "AEBODSYS",
      term1 = "",
      term2 = "of special interest",
      label = "adverse events of special interest"
    ) |>
    define_analysis(
      name = "ae_summary",
      title = "Summary of Adverse Events"
    ) 
    meta |> print()

1st block:
The 1st line is "ADaM Meta Data:".
The 2nd line summarizes the population details, i.e., ".$data_population Population data with xxx subjects". The value of xxx should match the number of rows in r2rtf::r2rtf_adsl, which is the population data passed to meta_adam() above.
The 3rd line summarizes the observation details, i.e., ".$data_observation Observation data with xxx records". The value of xxx should match the number of rows in r2rtf::r2rtf_adae.
The 4th line summarizes the number of analysis plans, i.e., ".$plan Analysis plan with xxx plans". The value of xxx should match the number of rows in plan above.
2nd block:
The 1st line is "Analysis population type:".
The rest of the block should match as.data.frame(meta$population).
3rd block:
The 1st line is "Analysis observation type:".
The rest of the block should match as.data.frame(meta$observation).
4th block:
The 1st line is "Analysis parameter type:".
The rest of the block should match as.data.frame(meta$parameter) |> select(name, label, subset).
5th block:
The 1st line is "Analysis function:".
The rest of the block should match as.data.frame(meta$analysis) |> select(name, label).

Return first argument invisibly for side effect functions

See https://style.tidyverse.org/functions.html?q=invisible#return for the rationale.

In particular, print.meta_adam().

Probably in the next release.
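A minimal sketch of the convention, using a made-up class so it runs without metalite. The class name demo_meta is hypothetical; the printed header reuses the first line of the print.meta_adam output described in another issue.

```r
# Print method for a hypothetical class: do the side effect, then return
# the first argument invisibly so the call can be used in a pipeline.
print.demo_meta <- function(x, ...) {
  cat("ADaM Meta Data:\n")
  invisible(x)
}

x <- structure(list(plan = NULL), class = "demo_meta")

# withVisible() exposes both the returned value and its visibility flag.
res <- withVisible(print(x))
```

Returning x invisibly lets callers write print(x) |> some_next_step() without the object being auto-printed a second time at the console.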

Independent test of `outdata.R`

Test plan of outdata():

meta <- meta_dummy()
outdata(meta, "apat", "wk12", "rel", n = meta$data_population %>% group_by(TRTA) %>% summarize(n = n()), group = "TRTA", reference_group = 1, order = 1:3)

Its class is "outdata"
Its output is a list of length 8, including meta, population, observation, parameter, n, order, group, and reference_group.

test of `bind_rows2.R`

Test plan of is_named in bind_rows2.R:

return FALSE if is_named(matrix(1:2, 2))
return FALSE if is_named("")
return FALSE if is_named(NA)
return TRUE if is_named(data.frame(x = 1))

Independent testing of `default.R`

Test plan of default_apply():
Create an adam_mapping object for apat through adam_mapping(name = "apat"). The output should give the default values of the mapping for "apat". Check data-raw/default.R for details of the default mapping.

Independent test of `plan.R`

Test plan of plan():
- throw an error if analysis = ... is a string vector with length >= 2
Test plan of new_plan():
- Test if the number of rows of the following output is 1 * 1 * 2 * 3 = 6

metalite:::new_plan(
  analysis = "ae_specific",
  population = "apart",
  observation = c("wk12", "wk24"),
  parameter = c("any", "rel", "ser")
)
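The expected count of 6 is the size of the Cartesian product of the four inputs. As a cross-check that does not depend on metalite, the same product can be built in base R with expand.grid(); this only verifies the arithmetic and is not how new_plan() is implemented. The input values are copied from the call above.

```r
# All combinations of the plan inputs: 1 * 1 * 2 * 3 = 6 rows.
grid <- expand.grid(
  analysis = "ae_specific",
  population = "apart",
  observation = c("wk12", "wk24"),
  parameter = c("any", "rel", "ser"),
  stringsAsFactors = FALSE
)

n_rows <- nrow(grid)
```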

Test plan of validate_plan():
- Test that each of the following calls throws an error (5 errors in total)

metalite:::validate_plan(1)
metalite:::validate_plan(plan(analysis = "ae_summary", population = "apat", observation = "wk12"))
metalite:::validate_plan(plan(analysis = "ae_summary", observation = "apat", parameter = "any"))
metalite:::validate_plan(plan(analysis = "ae_summary", population = "apat", parameter = "any"))
metalite:::validate_plan(plan(population = "ae_summary", observation = "wk12", parameter = "any"))

Test plan of add_plan():
- Test the following two parts of code generate the same output

plan("ae_summary", population = "apat", observation = c("wk12", "wk24"), parameter = "any;rel") |>
  add_plan("ae_specific", population = "apat", observation = c("wk12", "wk24"), parameter = c("any", "rel"))

dplyr::bind_rows(
  plan("ae_summary", population = "apat", observation = c("wk12", "wk24"), parameter = "any;rel"), 
  plan("ae_specific", population = "apat", observation = c("wk12", "wk24"), parameter = c("any", "rel")))

independent testing of `spec.r`

Test plan of spec_filename():
snapshot testing for analysis output filename
Test plan of spec_analysis_population():
snapshot testing for population definition
Test plan of spec_call_program():
snapshot testing for analysis call program
Test plan of spec_title():
snapshot testing for analysis title