cud2v / pccc
Pediatric Complex Chronic Conditions: An R Package
@jamesfeinstein to loan NEDS CD to @tdbennett
Hi @magic-lantern and @jamesfeinstein,
Something isn't clear about this section of the overview vignette:
* All codes in all categories employ "starts with substring" matching logic. Because of this, if a code to be evaluated starts with a code listed in one of the CCC categories, a match will be found. This means that if a bad ICD code is provided (such as ICD-9-CM code 0492,25042) PCCC would indicate a match for the Neuromuscular CCC.
How are those codes "bad?" 0492 is used in an example above that section in a way that makes it seem acceptable.
I'll push up some text edits to a few files shortly.
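A minimal sketch of what "starts with substring" matching implies, for anyone following this thread. The function name `starts_with_any` and the single prefix `359` are hypothetical and chosen only for illustration; this is not the package's actual code or code list.

```cpp
#include <string>
#include <vector>

// Returns true when `code` begins with any entry in `prefixes`.
// Hypothetical helper: it only illustrates "starts with" matching.
bool starts_with_any(const std::string& code,
                     const std::vector<std::string>& prefixes) {
    for (const std::string& p : prefixes) {
        // compare() looks only at the first p.size() characters of `code`,
        // so whatever follows the prefix is never inspected.
        if (!p.empty() && code.compare(0, p.size(), p) == 0) {
            return true;
        }
    }
    return false;
}
```

With the prefix list `{"359"}`, both `"3591"` and a malformed string such as `"359ZZ"` match, while `"35"` does not. That would be the sense in which a "bad" code can still be flagged: nothing after the matched prefix is validated.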
Document from Chris Feudtner attached with his decisions on code discrepancies. He highlighted codes that ARE CCCs and wrote his comments in CAPS. We will need to discuss strategy for implementing code for some of the substring issues, for example, ICD-9-CM 359*.
CCC V2 Issues_dai.docx
December 1st target submission date
Do not combine the diagnostic codes and procedure codes together.
For example, 3321 procedure is 33.21, whereas 3321 diagnostic is 332.1.
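To make the ambiguity concrete, here is a small sketch that restores the decimal point for each code type. The function names are hypothetical, and E-codes (which carry four characters before the decimal) are deliberately not handled.

```cpp
#include <string>

// ICD-9-CM diagnosis codes: decimal point after the third character.
// Assumption: E-codes are out of scope for this sketch.
std::string format_icd9_dx(const std::string& code) {
    if (code.size() <= 3) return code;
    return code.substr(0, 3) + "." + code.substr(3);
}

// ICD-9-CM procedure codes: decimal point after the second character.
std::string format_icd9_pc(const std::string& code) {
    if (code.size() <= 2) return code;
    return code.substr(0, 2) + "." + code.substr(2);
}
```

The same four characters `3321` come back as `332.1` from the diagnosis formatter and `33.21` from the procedure formatter, which is why a pooled column of codes cannot be interpreted safely.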
From @ck2136 pull request (accepted with modifications) #33
"Argument that user can specify within the ccc() function to select which category of ccc that they want"
This is a good idea and would significantly improve performance for large data sets. Instead of requiring end user to filter results, only look for CCCs of interest.
* checking for code/documentation mismatches ... WARNING
Data codoc mismatches from documentation object 'pccc_icd10_dataset':
Variables in data frame 'pccc_icd10_dataset'
Code: dx1 dx10 dx2 dx3 dx4 dx5 dx6 dx7 dx8 dx9 g1 g10 g2 g3 g4 g5 g6
g7 g8 g9 id pc1 pc10 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9
Docs: dx1:dx10 g1:g10 id pc1:pc10
Data codoc mismatches from documentation object 'pccc_icd9_dataset':
Variables in data frame 'pccc_icd9_dataset'
Code: dx1 dx10 dx2 dx3 dx4 dx5 dx6 dx7 dx8 dx9 g1 g10 g2 g3 g4 g5 g6
g7 g8 g9 id pc1 pc10 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9
Docs: dx1:dx10 g1:g10 id pc1:pc10
I think I've noticed some errors in the ICD9 procedure codes for the metabolic CCC.
> pccc::get_codes(9)["metabolic", ]$pc
[1] "064" "0652" "0681" "073" "0764" "0765" "0768" "0769" "6241" "645" "6551" "6553"
[13] "6561" "6563" "6841" "6849" "6851" "6859" "6861" "6869" "6871" "6879" "8606"
Shouldn't '064' and '073' be '0064' and '0073', respectively? Otherwise I match 64.0/0640 = circumcision or 73.0 = procedures during delivery.
Thanks
Set up the C++ code to use RcppParallel to work on a subject.
Currently, the C++ is designed for one subject. A `Map` call in the `ccc` function sends each subject's data to the `ccc_rcpp` call. Redesign so that each subject is set up to go to its own core.
`pccc` will not build from source or install from GitHub on Windows or OSX currently. I believe this is related to how we are requiring `Rcpp` and `dplyr`. I'm exploring this and will update.
With the release of dplyr version 0.7, the underscored functions such as `dplyr::select_` have been deprecated. Read `vignette("programming", package = "dplyr")` for details.
The `ccc.data.frame` function, see the file `R/ccc.R`, needs to be updated.
@jamesfeinstein to ask Chris Feudtner/Dingwei Dai if they restricted the KID sample prior to conducting the validation analysis of the 2014 SAS and Stata code
Expect hundreds of thousands, if not millions, of individuals in the data sets that need to be searched. Design the `ccc` function to work on the rows of the input data. Avoid the `tidyr::gather` method that was used in the initial design.
Got this from CRAN maintainers:
There is a PROTECT bug in your package pccc (version 1.0.2). The bug manifests itself as "heap-use-after-free" with ASAN (see Additional issues in CRAN checks). One can provoke the problem also with gctorture (e.g. R_GCTORTURE=100) and using this small example which will segfault:
library(pccc); get_codes(10)
The problem is in get_codes.cpp, in calls like
Rcpp::List dx_fixed = Rcpp::List::create(
Rcpp::wrap(cds.get_dx_fixed_neuromusc()),
Rcpp::wrap(cds.get_dx_fixed_cvd()),
Rcpp::wrap(cds.get_dx_fixed_respiratory()),
Rcpp::CharacterVector::create(),
Rcpp::CharacterVector::create(), ...
Rcpp::wrap returns an (unprotected) newly allocated SEXP. All uses of Rcpp::wrap inside a call to Rcpp::List (many instances in get_codes.cpp) need to be protected via Rcpp::Shield(), otherwise these get destroyed by allocations when evaluating the other arguments - as in the attached patch.
I'm migrating the "Submit package to CRAN" issue originally tracked at magic-lantern#3 to this repo.
To see history of the issue, go to magic-lantern#3
Currently, main next steps are:
Right now the ICD codes for each CCC are hard coded into the file pccc.cpp
For ease of maintenance, pull the codes out into some external format - such as a CSV or other data structure that allows for users to inspect codes and modify if desired.
Would need to provide documentation on how to modify.
If there are issues or changes with the set of ICD codes that go with a particular CCC, this might make it easier for users to make a pull request as they wouldn't need to be familiar with the source code.
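One possible shape for this, sketched under assumptions: a three-column CSV (`category,code_type,code`) read into a lookup table. The file layout, the `CodeTable` struct, and `read_code_table` are all hypothetical; nothing here reflects a format the package has adopted.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: key is "<category>/<code_type>", e.g. "metabolic/pc".
struct CodeTable {
    std::map<std::string, std::vector<std::string> > codes;
};

// Reads lines of the assumed form "category,code_type,code".
// The first line is treated as a header and skipped.
CodeTable read_code_table(std::istream& in) {
    CodeTable table;
    std::string line;
    bool header = true;
    while (std::getline(in, line)) {
        if (header) { header = false; continue; }
        if (line.empty()) continue;
        std::istringstream fields(line);
        std::string category, type, code;
        if (std::getline(fields, category, ',') &&
            std::getline(fields, type, ',') &&
            std::getline(fields, code)) {
            table.codes[category + "/" + type].push_back(code);
        }
        // Lines with fewer than three fields are silently ignored.
    }
    return table;
}
```

A user could then review or edit the CSV without touching the C++ source, and a pull request that changes a code would be a one-line diff in a data file.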
@dewittpe to explore `icd` R package capabilities vs. Stata native `icd clean` functions.
SAS and Stata code in 2014 paper differ in this: Stata script includes pre-processing that removes leading and trailing blanks and leading 0's.
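A rough sketch of the kind of pre-processing described, so the effect on matching is easier to see. The function `clean_code` is hypothetical, and its handling of edge cases (for example, a code made only of zeros) is an assumption rather than a description of what the Stata script does.

```cpp
#include <string>

// Strips leading/trailing blanks, then leading zeros.
// Assumption: a string that is all blanks or all zeros becomes empty.
std::string clean_code(const std::string& raw) {
    const std::string blanks = " \t";
    const std::string::size_type first = raw.find_first_not_of(blanks);
    if (first == std::string::npos) return "";
    const std::string::size_type last = raw.find_last_not_of(blanks);
    const std::string code = raw.substr(first, last - first + 1);
    const std::string::size_type nonzero = code.find_first_not_of('0');
    if (nonzero == std::string::npos) return "";
    return code.substr(nonzero);
}
```

Under this sketch `"  0492 "` becomes `"492"`, so a prefix comparison run on cleaned codes and one run on raw codes need not agree.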
In file included from ccc.cpp:6:
pccc.h:43: error: a brace-enclosed initializer is not allowed here before ‘{’ token
pccc.h:43: error: ISO C++ forbids initialization of member ‘empty’
pccc.h:43: error: making ‘empty’ static
pccc.h:43: error: invalid in-class initialization of static data member of non-integral type ‘const std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >’
make: *** [ccc.o] Error 1
ERROR: compilation failed for package ‘pccc’
If this is not a bug, would you mind assisting me in installing the package? Thank you.
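These messages look like what an older compiler without C++11 support reports for an in-class member initializer. As an illustration only, the sketch below initializes an `empty` vector member in the constructor instead, which also compiles as C++98. The class name `code_lists_sketch` and the accessor are hypothetical; the actual contents of `pccc.h` line 43 are not shown in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a class holding an always-empty code list.
class code_lists_sketch {
public:
    // The member is default-constructed (empty) in the initializer list,
    // so no in-class "= {}" initializer is required.
    code_lists_sketch() : empty() {}

    const std::vector<std::string>& get_empty() const { return empty; }

private:
    const std::vector<std::string> empty;
};
```

The other route would be to build with a compiler and flags that enable C++11, in which case the original in-class initializer should be accepted.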
The `ccc` function uses the dx codes twice and ignores the px code inputs.
When fixing this also allow for `NULL` values to be set for either the dx or px codes.
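A sketch of the intended behavior, assuming hypothetical names throughout: diagnosis codes are checked only against diagnosis prefixes, procedure codes only against procedure prefixes, and either input may be absent. A null pointer stands in for R's `NULL` here; the real interface between R and C++ may look quite different.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> Codes;

// True when any code in `codes` starts with any entry of `prefixes`.
static bool any_prefix_match(const Codes& codes, const Codes& prefixes) {
    for (Codes::size_type i = 0; i < codes.size(); ++i) {
        for (Codes::size_type j = 0; j < prefixes.size(); ++j) {
            const std::string& p = prefixes[j];
            if (!p.empty() && codes[i].compare(0, p.size(), p) == 0) {
                return true;
            }
        }
    }
    return false;
}

// `dx` and `pc` are optional: pass a null pointer to skip either input.
// Each input is compared only against its own prefix list.
bool has_condition(const Codes* dx, const Codes* pc,
                   const Codes& dx_prefixes, const Codes& pc_prefixes) {
    const bool dx_hit = dx != 0 && any_prefix_match(*dx, dx_prefixes);
    const bool pc_hit = pc != 0 && any_prefix_match(*pc, pc_prefixes);
    return dx_hit || pc_hit;
}
```

Keeping the two prefix lists as separate arguments makes the reported bug hard to reintroduce, since passing the dx codes in the procedure slot would have to be done explicitly.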
Documentation for malignancy ICD10 codes include "C00-C96" (https://github.com/CUD2V/pccc/blob/master/inst/pccc_references/Categories_of_CCCv2_and_Corresponding_ICD.docx)
In the src code for the package, however, only "C" is defined.
Line 172 in f564b5c
Do we need to explicitly define C00, C01, C02, C03, C04, ..., C96? I think so for two reasons,
library(pccc)
packageVersion("pccc")
# [1] ‘1.0.5’
# id2 has a made up code "CB" which should not match anything, but returns true
# for malignancy
eg_data <- data.frame(id = c("id1", "id2", "id3"),
dx1 = c("NOTACODE", "NOTACODE", "notacode"),
dx2 = c("C00", "E75", "NOTACODE"),
dx3 = c("A", "CB", "C"))
ccc(eg_data, dx_cols = dplyr::starts_with("dx"), icdv = 10)
#   neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic malignancy neonatal tech_dep transplant ccc_flag
# 1         0   0           0     0  0           0         0               0          1        0        0          0        1
# 2         1   0           0     0  0           0         0               0          1        0        0          0        1
# 3         0   0           0     0  0           0         0               0          1        0        0          0        1
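For comparison, a sketch of a stricter check that accepts only codes whose first three characters fall in the numeric range C00 to C96. The function `in_c00_c96` is hypothetical. One caveat to weigh before adopting anything like it: ICD-10-CM also has categories with a letter in the third position (C7A, for example), which this numeric test would reject, so an explicit list of categories may be the safer route.

```cpp
#include <cctype>
#include <string>

// True when `code` starts with "C" followed by two digits in 00..96.
// Assumption: categories with a letter in the third position are out of scope.
bool in_c00_c96(const std::string& code) {
    if (code.size() < 3 || code[0] != 'C') return false;
    if (!std::isdigit(static_cast<unsigned char>(code[1])) ||
        !std::isdigit(static_cast<unsigned char>(code[2]))) {
        return false;
    }
    const int block = (code[1] - '0') * 10 + (code[2] - '0');
    return block <= 96;
}
```

Under this check the made-up `CB` and the bare `C` from the example above would no longer count as malignancy, while `C00` still would.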
Update the file for v0.3.0
Found some codes that appear to be invalid ICD10CM codes. Some appear to be typos, but others I'm not sure about.
Reached out to JF to get advice.
icd10cm | category | code_type | notes |
---|---|---|---|
D08 | malignancy | dx | invalid |
D85 | hemato_immu | dx | ?E85 (Amyloidosis) |
D87 | hemato_immu | dx | invalid? |
D88 | hemato_immu | dx | invalid |
G8290 | neuromusc | dx | invalid |
P2521 | neonatal | dx | ? P52.21 (Intraventricular nontraumatic hemorrhage) |
P2522 | neonatal | dx | invalid |
Z446 | tech_dep | dx | ?Z46.6 (Encounter for fitting and adjustment of urinary device) |
Z446 | renal | dx | ?Z46.6 |
Z45441 | tech_dep | dx | invalid |
Z45442 | tech_dep | dx | invalid |
@dewittpe - tagging you so you are aware I've filed this issue.
@jamesfeinstein, here are the hand sketches of tables from our meeting 5/18
This issue was originally opened as magic-lantern#1 - moving to new primary repo. Due to the number of potential issues with the ICD matching, I've copied the entire original post here.
Code fixes will be implemented by @magic-lantern, actual decisions to how each item will be resolved will be primarily made by @jamesfeinstein
ICD 9 duplicates (are any of these mistakes and should actually be a different code?)
ICD9 code discrepancies
There are several differences between SAS/R and Stata for ICD10 codes. Here are the mismatches that I found
There are also some ICD10 codes that could be problematic - this is looking at the SAS file:
General logic question: For the procedure codes, I noticed that only respiratory_ccc is using "in:" and all others are using "in". Doing substring matching for one group and not for the rest is quite a bit different than the ICD9 code logic. Is that what is wanted?
Hi @dewittpe, can you please add @jamesfeinstein to the repo?
@jamesfeinstein to get updated code lists from Chris Feudtner, add them to this issue
@jamesfeinstein and @tdbennett to edit the vignettes started by @dewittpe