detect_pilot_test_1y's Issues

Merge MedStar data with APS data

Goal: We want to measure the agreement between the results of DETECT screenings and the results of APS investigations.

Problem 1: Currently, the results of the DETECT screenings are in a dataset we received from MedStar Mobile Healthcare and the results of APS investigations are in a separate dataset we received from APS. We need to merge the two separate datasets into a single dataset that can be used for analysis.

Problem 2: There is no common identifier variable in both datasets that we can use to match records in the MedStar data with records in the APS data. Therefore, we will have to match based on name and date of birth, which we have in both datasets.

Problem 3: Although we have name and date of birth (dob) in both datasets, we can't match records across datasets in a deterministic way (i.e., IF first name = John in MedStar AND first name = John in APS THEN match, ELSE no match) because there are typos in the data. For example, "John" and "Jon" may clearly be the same person (i.e., same last name, dob, and address), but the names would not match exactly.

Solution: Therefore, we will need to link records across the datasets probabilistically. R has at least two packages that are designed for probabilistic record linking:

  1. RecordLinkage
  2. fastLink

Steps in the record linking process:

  • Prepare data for linking. First, standardize the string variables that will be used for matching. For example, convert all string values to lower case and remove extra spaces. Second, break the name, dob, and address variables into separate variables containing their component parts. For example, convert "name" to "name_first" and "name_last" and "dob" to "dob_month", "dob_day", and "dob_year". We did this step in separate files for each of the datasets: data_aps_02_variable_management.Rmd and data_medstar_epcr_02_variable_management.Rmd. A minimal sketch of this standardization step appears after this list.

-[ ] Next step...
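For reference, here is a minimal sketch of the standardization step, assuming a data frame named medstar with hypothetical columns name (full name as a single string) and dob (stored as a Date); the real work lives in the variable management files listed above:

```r
library(dplyr)
library(stringr)

medstar_clean <- medstar %>%
  mutate(
    # Lower-case and collapse extra whitespace in the string matching variables
    name = str_squish(str_to_lower(name)),
    # Break name into component parts (assumes "first last" ordering)
    name_first = word(name, 1),
    name_last  = word(name, -1),
    # Break dob into component parts
    dob_month = lubridate::month(dob),
    dob_day   = lubridate::day(dob),
    dob_year  = lubridate::year(dob)
  )
```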

Old stuff....

I copied "data_medstar_aps_merged_01.Rmd" from the 5-week analysis project to the 1-year analysis project. Before moving on to trying to get fastLink to work or writing your own matching algorithm, see if you can get this file to work using the new RecordLinkage big data classes.

https://cran.r-project.org/web/packages/RecordLinkage/vignettes/BigData.pdf
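Based on that vignette, a minimal sketch of the big-data linkage workflow might look like the following. It assumes the standardized data frames are named medstar_clean and aps_clean and share the columns name_first, name_last, dob_year, dob_month, and dob_day (the object names, column names, and threshold are assumptions, not the project's actual values):

```r
library(RecordLinkage)

# Build a big-data linkage object: block on exact dob, fuzzy-compare names
rpairs <- RLBigDataLinkage(
  dataset1 = medstar_clean,
  dataset2 = aps_clean,
  blockfld = c("dob_year", "dob_month", "dob_day"),
  strcmp   = c("name_first", "name_last")
)

rpairs <- epiWeights(rpairs)                          # compute EpiLink match weights
result <- epiClassify(rpairs, threshold.upper = 0.7)  # classify pairs as links / non-links
links  <- getPairs(result, filter.link = "link", single.rows = TRUE)  # review the linked pairs
```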

  • Remove the TOC stuff from the top of data_medstar_aps_merged_01_merge.Rmd
  • Check the really low weight matches too. I'm not sure how RecordLinkage handles missing data. Maybe start with a random sample just to quickly get an idea.
  • Save RecordLinkage objects to secure drive
  • Move drop investigation stage to data_aps_02_variable_management.Rmd, if you keep it
  • If we just reduce our search space to unique combinations, the entire section "Prepare APS data for record matching" may be unnecessary.
  • Move all the data management stuff in the "reduce search space" section to the appropriate variable management file.

After you finish matching, consider breaking this code up into 3 separate files:

  • Cleaning and merging
  • Filtering merge
  • Data checking merge

Incident Complaint v Symptoms Table

Hi @mbcann01 ,

I am having some trouble with this code. I just pushed what I have so far to the develop-medical-conditions branch. Starting at line 172, I am not sure how to create the table: we want cell frequencies, but the rows are incident complaints and the columns are the totals from each individual symptom column.
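One possible way to build a table like that, assuming each symptom is stored as its own 0/1 indicator column with a "symptom_" prefix and the complaint column is named incident_complaint (both names are assumptions for illustration):

```r
library(dplyr)

# One row per incident complaint; each symptom indicator column summed within complaint
complaint_symptom_table <- medstar %>%
  group_by(incident_complaint) %>%
  summarise(across(starts_with("symptom_"), ~ sum(.x, na.rm = TRUE)), .groups = "drop")
```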

Get Jared access to the data server

We need to get you direct access to the data server. This way we are always both working from the same data source and we can use common file paths in our code (i.e., one file path that works on both of our computers).

  • Brad needs to talk to Chris Harvey about adding Jared as an authorized user

  • Jared needs to learn how to access the data from his laptop once he's been authorized to do so.

Pre-clean the APS data: One row per case number

Right now, in detect_pilot_test_1y_refine_matches.Rmd, the APS data has a row for each reporter, as opposed to each case. This is causing some issues with merging the data (#27). Specifically, it affects our ability to identify the most proximal APS investigation to each DETECT screening. And we can't just arbitrarily pick one row from each case because we found differing investigation outcomes in some cases.

So, we want to retain the information related to multiple reporters, but we want to do it in a wide format. A sketch of this reshaping appears after the task list below.

  • Resolve within case discrepancies between rows
  • Widen reporter information by creating dummy variables
  • Create min and max investigation date variables
  • Create a number of reporters variable
  • Check to see what effect, if any, this has on the RecordLinkage results.
    • Changed ems dummy variable from "ems" to "reporter_ems". See how that affects the DDD code.
  • Push wide APS data frame through downstream code
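A minimal sketch of the reshaping, assuming the APS data frame is named aps with columns case_num, reporter, and investigation_date (assumed names for illustration):

```r
library(dplyr)
library(tidyr)

# One dummy column per reporter type (e.g., reporter_ems), one row per case
reporter_wide <- aps %>%
  distinct(case_num, reporter) %>%
  mutate(flag = 1L) %>%
  pivot_wider(
    names_from   = reporter,
    names_prefix = "reporter_",
    values_from  = flag,
    values_fill  = 0L
  )

# Min/max investigation dates and number of reporters, then join the dummies back on
aps_by_case <- aps %>%
  group_by(case_num) %>%
  summarise(
    inv_date_min = min(investigation_date, na.rm = TRUE),
    inv_date_max = max(investigation_date, na.rm = TRUE),
    n_reporters  = n_distinct(reporter),
    .groups = "drop"
  ) %>%
  left_join(reporter_wide, by = "case_num")
```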

Addressing in the bug-31-aps-one-row-per-case branch.

Merge the APS and MedStar datasets into a single dataset for analysis

Overview

  • MedStar provided us with data on all of the initial DETECT screenings their medics completed during the study period.
  • APS provided us with data on all of the investigations they completed during the study period.
  • We need to link DETECT screenings in the MedStar data with investigation outcomes in the APS data.

Complications

  • There is no unique person identifier in the datasets that we can use to link rows. Therefore, we will have to probabilistically link rows based on name, dob, and address.
  • Each person may have more than one row in the MedStar data.
  • Each person may have more than one row in the APS data. Additionally, each investigation may have more than one row in the APS data.
  • There are typos and misspellings within our matching variables (name, dob, and address).

Software

Here is a list of software packages we have tried already with mixed results.

  • R RecordLinkage package. This package has worked well for us in the past (see files in the DETECT 5-week pilot repo). However, with the larger 1-year dataset, we have repeatedly run into really slow run times and memory errors.
  • R fastLink package. We also experimented with this package. It seems to avoid the slow run times and memory errors that we had with RecordLinkage; however, we had trouble getting the output we needed from this package. See an issue we posted on fastLink's GitHub repo for details. A minimal usage sketch appears after this list.
  • Python Dedupe package. Patrick and Sydney from Meadow's have been experimenting with this package. However, we are not as familiar with Python as we are with R.
    • My understanding is that we are having trouble using the results of this package to create unique identifiers in the datasets.
    • Additionally, I believe there may be a point-and-click element to the training process. For example, I think the package may randomly select potential matches for the user to evaluate while the model is training. If that is the case, I'm concerned about the reproducibility of our results.
  • SAS Link King macro. Finally, somebody suggested this package. If it's the best choice, then so be it, but I would personally prefer not to use SAS if we don't have to. Additionally, it looks as though this macro is no longer under active development.
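For reference, a minimal sketch of the fastLink workflow we experimented with, assuming the standardized data frames are named medstar_clean and aps_clean and share the matching columns shown (names are assumptions):

```r
library(fastLink)

fl_out <- fastLink(
  dfA = medstar_clean,
  dfB = aps_clean,
  varnames         = c("name_first", "name_last", "dob_year", "dob_month", "dob_day"),
  stringdist.match = c("name_first", "name_last"),  # fuzzy comparison on names
  partial.match    = c("name_last")                 # allow partial agreement on last name
)

# Pull the rows of each data frame that fastLink declared matches
matched <- getMatches(dfA = medstar_clean, dfB = aps_clean, fl.out = fl_out)
```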

Tasks

Depending on how involved each of these tasks is, and on Morri's workflow, it may make sense to break some of these off into their own separate issues.

  • Reduce APS data to one row per investigation if it is possible to do so without the loss of information.
  • Create a unique person identifier in the APS data.
  • Create a unique person identifier in the MedStar data.
  • Reduce both data frames to one row per unique person.
  • Probabilistically link rows from both data frames.
  • Manually review probabilistic matches.
  • Create a unique match number that can be used to join the MedStar data and the APS data.
  • Add unique match numbers back to the full APS and MedStar data frames.
  • Join the full MedStar and APS data frames together by match id.
  • Filter matches by date.

Notes

  • Use topical branches for files that are in development.

  • Clear the environment at the bottom of every file

  • Start to incorporate pathfinder

  • Put all functions in R scripts with roxygen headers. At the end of the analysis add to bfuncs.

  • Use the built-in TOC for notebooks and explore different themes as described here: https://minimaxir.com/2017/06/r-notebooks/ (see the example header after this list)

  • Try versioning, bibliography, etc. in RStudio?

  • Learn how to save directly to Google Docs?

  • Try creating Word output using officer

  • Try making shaded diagram boxes where it makes sense
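An example R Notebook YAML header with a built-in floating TOC and a non-default theme, per the post linked above (the title and theme name are just placeholders):

```yaml
---
title: "Notebook title"
output:
  html_notebook:
    toc: true          # built-in table of contents
    toc_float: true    # keep the TOC visible while scrolling
    theme: flatly      # example theme; other Bootswatch theme names also work
---
```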

Add APS data to REDCap

Overview

If Morri decides to go this route, we will add the APS data to a REDCap database. We would do this to eliminate the need to share files through shared folders and reduce the risk of accidentally pushing data to GitHub.

Tasks

  • Sign up for a REDCap account. There are actually two choices here. UTHealth has a REDCap instance and the School of Public Health also has a REDCap instance. For either, you need to request access. I'm not aware of any big differences between the two instances. I think the UTHealth instance might be running a slightly newer version of REDCap. All else being equal, we may want to use that one.
  • Create a REDCap project that can store the data.
  • Add Brad to the project.
  • Import the raw data into REDCap.
  • Request an API token for the project.
  • Test importing data into R/Python via the API (see the sketch after this list).
  • Document the process in an SOP/Continuity Guide.
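A minimal sketch of the API import test in R, using the REDCapR package (the URI is a placeholder for the instance's real API URL, and the environment variable holding the token is an assumption):

```r
library(REDCapR)

aps_redcap <- redcap_read(
  redcap_uri = "https://redcap.example.edu/api/",     # placeholder; use the instance's API URL
  token      = Sys.getenv("DETECT_APS_REDCAP_TOKEN")  # keep the token out of the script
)$data
```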

Fix fmr_add_unique_id

Currently, fmr_add_unique_id assumes that the fastLink object is correct. In reality, however, we determined that we get more accurate results when we make some manual adjustments to the fastLink results. We need to add the ability to account for those manual adjustments.

Revise data for ITS analysis

  • Create data_aps_02_variable_management.Rmd
  • Renumber data_aps_02_process_for_its.Rmd
  • Clean character variables
  • Subset city for ITS
  • Add dummy for city
  • Add allegation outcome variables
  • Make available to Livingston

Add MedStar data to REDCap

Overview

If Morri decides to go this route, we will add the MedStar data to a REDCap database. We would do this to eliminate the need to share files through shared folders and reduce the risk of accidentally pushing data to GitHub.

Tasks

  • Sign up for a REDCap account. There are actually two choices here. UTHealth has a REDCap instance and the School of Public Health also has a REDCap instance. For either, you need to request access. I'm not aware of any big differences between the two instances. I think the UTHealth instance might be running a slightly newer version of REDCap. All else being equal, we may want to use that one.
  • Create a REDCap project that can store the data.
  • Add Brad to the project.
  • Import the raw data into REDCap.
  • Request an API token for the project.
  • Test importing data into R/Python via the API.
  • Document the process in an SOP/Continuity Guide.

Create codebooks

Overview

After merging and wrangling the APS and MedStar data (#33), we need to create a codebook (or codebooks) for the data using the codebookr package.
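A minimal sketch of the codebookr workflow, assuming the merged analysis data frame is named detect_1y and that we want a Word codebook (the object name, column name, titles, and file name are assumptions):

```r
library(codebookr)
library(dplyr)

# Optionally attach column-level metadata before building the codebook
detect_1y <- detect_1y %>%
  cb_add_col_attributes(name_first, description = "Patient first name (standardized)")

detect_codebook <- codebook(
  df          = detect_1y,
  title       = "DETECT 1-Year Pilot Data",
  description = "Merged MedStar DETECT screenings and APS investigation outcomes"
)

print(detect_codebook, "detect_1y_codebook.docx")  # write the codebook to a Word document
```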

Merge branches

The branch history is getting messy and hard to read. Merge and delete branches.
