cohhio / cohhio_hmis

Code for pulling in HMIS data, writing it out to reports

License: GNU Affero General Public License v3.0

R 100.00%

cohhio_hmis's Introduction

COHHIO HMIS

An open-source Homeless Management Information System (HMIS) custom reporting project

This repository would be helpful to any HUD-designated Continuum of Care looking to get more out of their HMIS data. While there are some reports here, this repository's most important function is getting data from HMIS to a tidy form that can be used in other projects (and repositories). Think of it like a staging area for other projects.

About

This is an open source project released under the GNU AGPLv3 license. See LICENSE for more details or visit the official GNU page at http://www.gnu.org/licenses/agpl-3.0.html.

All the code in this repository is written in R using RStudio. Please consult the book R for Data Science for help getting started with R and RStudio. The R for HMIS Admins training series is also an easy place to start.

Data Sources

The HUD CSV export - Every HMIS should be able to export this regardless of HMIS vendor.
Other data as necessary, to supplement what is available in the Export.

While your reporting needs may be different, this report contains data we need for our reporting that is not included in the HUD CSVs. These include but are not limited to:

SPDAT Scores
County (Enrollment level)
Entry Exit Type (specific to ServicePoint)
Users' Default Providers
Full Provider and Organization Names (beyond the 50 character limit)
Grant Type (specific to ServicePoint)
Veteran data for Coordinated Entry (Custom fields)
Housing Offers (Custom fields for Veteran Active List)
COVID-19 data

You may not need to supplement the data in your HUD CSV export, so whatever you use from this repository will need to be adjusted to leave it out. Or you may need a different set of miscellaneous data which will need to be incorporated to fit your needs. Other CoCs using this repository would need to work out what other data would be needed and how that data could be obtained.

If you want to know exactly what data I'm using, see below, and email me for more specifics if you need it.

Workflow

The workflow I'm currently using (which I expect to become less tedious eventually) is the following:

1. Be sure you have the following directories in your R project:
   - data (to keep your HUD CSV files and whatever other data you will use)
   - images (see step 4)
   - random_data
2. Download and unzip your HUD CSV export into the data directory of the RStudio project. Permanently delete the .zip once it has been extracted so the PII doesn't remain on your hard drive.
   You will probably want to build out your own data that's not available in the Export, or you may find that the Export has everything you need for now. Either way, download any other necessary data into the data directory.
3. Run 00_first_run.R, which will check that you have all the main packages installed, then run 00_daily_update.R, which will run all the other scripts that begin with a number. You can modify yours to fit your needs.
4. Doing step 3 will create an image file with the extension .RData for each script that runs, and each image file will be dropped into the images folder.
5. The 00_copy_images.R script will then copy the necessary files to Rminor and Rminor_elevated (two Shiny apps we use).
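The first few workflow steps can be scripted. This is a minimal sketch, assuming the project root is the working directory; the helper names setup_project_dirs(), extract_export() and run_daily_workflow() are hypothetical, and only the directory and script names come from this README.

```r
# Hypothetical helper: create the directories the workflow expects.
setup_project_dirs <- function(root = ".") {
  dirs <- file.path(root, c("data", "images", "random_data"))
  for (d in dirs) {
    # showWarnings = FALSE keeps this quiet when a directory already exists
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
  }
  invisible(dirs)
}

# Hypothetical helper: extract the HUD CSV export into data/, then delete
# the .zip. unlink() removes the file directly rather than moving it to a
# recycle bin.
extract_export <- function(zip_path, root = ".") {
  unzip(zip_path, exdir = file.path(root, "data"))
  unlink(zip_path)
}

# Hypothetical helper: check packages, then run all the numbered scripts.
run_daily_workflow <- function(root = ".") {
  setup_project_dirs(root)
  source(file.path(root, "00_first_run.R"))
  source(file.path(root, "00_daily_update.R"))
}
```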

I hope you can find useful code here. Please share feedback in the issues or email me at genelledenzin at cohhio dot org!

Security

No HMIS data is ever included in this repository. To make this code work, you will need to supply your own HMIS data. You are responsible for securing your HUD CSV export on your computer and for ensuring that it is not compromised, using whatever local security measures you have in place.

cohhio_hmis's People

Forkers

entomacy ellilock gwenbeebe yogat3ch kiadso

cohhio_hmis's Issues

Add Returns to Homelessness Calculations to QPR

This is a piece that was dropped out of the former QPR in the interest of time. We do still need it though.

Need to be able to easily calculate percentages of clients who are considered to have returned to homelessness and show that in a tab under the Quarterly Performance Report in both Rm & Rme.

Currently, the CoC team has to calculate this kind of manually thanks to a terrible ART report that needs totaling in Excel.

See the video titled "System Performance Measure #2: The Extent to Which Persons who Exit Homelessness Return to Homelessness" here: https://www.hudexchange.info/trainings/system-performance-measures/
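As a starting point, the core percentage can be computed from exits and later entries. This is a deliberately simplified sketch, not the full HUD SPM 2 logic (it ignores, for example, the look-back rules and how re-entries into permanent housing are treated); returns_rate() and its inputs are hypothetical. exits holds one row per person with their qualifying exit to a permanent destination, and entries holds later enrollments into homeless projects.

```r
# Simplified returns-to-homelessness rate; NOT the full HUD SPM 2 logic.
# exits:   one row per person (PersonalID, ExitDate)
# entries: later homeless-project enrollments (PersonalID, EntryDate)
returns_rate <- function(exits, entries, window_days = c(180, 365, 730)) {
  m <- merge(exits, entries, by = "PersonalID")
  # days from the exit to each entry; keep only entries after the exit
  m$gap <- as.numeric(m$EntryDate - m$ExitDate)
  m <- m[!is.na(m$gap) & m$gap > 0, ]
  # earliest return per person
  first_gap <- tapply(m$gap, m$PersonalID, min)
  pct <- sapply(window_days, function(w) 100 * sum(first_gap <= w) / nrow(exits))
  setNames(round(pct, 1), paste0(window_days, "_days"))
}
```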

Some SSNs coming out invalid that are

Awaiting example Client IDs from the user.

Participant Definition?

This scoring report is a beast! Especially with all the neat functions and modular pieces that have been added in the last few months. I'm starting to feel like I'm getting my bearings, and my first definition question is on the non-cash benefits points.

The 06 script is calculating it with pe_adults_moved_in_leavers, and that dataframe is limited to folks leaving during that time period who had a move-in date for housing programs. Why aren't we including folks who exited housing programs without a housing move-in date? I'm guessing it's part of the participant definition, but I'm not seeing that spelled out in the scoring documentation I have, so I wanted to double-check.

Add PATH-related Data Quality issues to Data_Quality.R

Missing PATH data at Entry
Missing PATH data at Exit
Incorrect PATH Contact Date
Missing PATH Contact End Date
Missing PATH Contacts

Questions about SinglyChronic Logic

I have two questions about this SinglyChronic logic:

At line 444 we're checking to see whether a year has passed between the homelessness date and their entry date. If they are in a literally homeless project, perhaps we should be comparing that homelessness date to today? I'm thinking of a person who enters the unsheltered provider when they're at 11 months, but remain on that list beyond the one year mark.
We might want to consider updating line 448 to
(TimesHomelessPastThreeYears == 1 | TimesHomelessPastThreeYears == 4) &
to capture the possibility of users entering the correct numbers of months but perhaps not the correct start date. It isn't a big change--I'm only seeing 14 clients that would be considered chronic after making this adjustment that are not currently considered chronic--but I wanted to note it.

437	singly_chronic <-
438	  active_list %>%
439	  left_join(smallEnrollment,
440				by = c("PersonalID",
441					   "EnrollmentID",
442					   "HouseholdID")) %>%
443	  mutate(SinglyChronic =
444			   if_else(((ymd(DateToStreetESSH) + days(365) <= ymd(EntryDate) &
445						   !is.na(DateToStreetESSH)) |
446						  (
447							MonthsHomelessPastThreeYears %in% c(112, 113) &
448							  TimesHomelessPastThreeYears == 4 &
449							  !is.na(MonthsHomelessPastThreeYears) &
450							  !is.na(TimesHomelessPastThreeYears)
451						  )
452			   ) &
453				 DisablingCondition == 1 &
454				 !is.na(DisablingCondition), 1, 0))
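Both proposals can be sketched together in base R. This is only an illustration of the two suggestions, not the current script: singly_chronic_flag() is hypothetical, and treating project types 1, 4 and 8 (ES, SO, SH) as "literally homeless" is an assumption.

```r
# Sketch of the proposed SinglyChronic logic (hypothetical function).
singly_chronic_flag <- function(df, today = Sys.Date(), lh_types = c(1, 4, 8)) {
  # Reference date: today for the assumed literally homeless project types,
  # the Entry Date for everything else
  ref_date <- df$EntryDate
  ref_date[df$ProjectType %in% lh_types] <- today

  # A continuous year of homelessness as of the reference date
  long_episode <- !is.na(df$DateToStreetESSH) &
    df$DateToStreetESSH + 365 <= ref_date

  # 12+ months in the past three years, entered as one time OR four+ times
  months_times <- df$MonthsHomelessPastThreeYears %in% c(112, 113) &
    df$TimesHomelessPastThreeYears %in% c(1, 4)

  disabled <- df$DisablingCondition %in% 1
  as.integer((long_episode | months_times) & disabled)
}
```

Using %in% instead of == makes the NA checks in the original implicit, since NA %in% c(112, 113) is FALSE.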

Ignore HOPWA data entirely

Adjust importing workflow so we're getting ALL Services and ALL Referrals

The HUD CSV Export only pulls in PATH Referrals and certain Services. We need all of them.

Question about client counts

In 06_Project_Evaluation, we're deduplicating people right off the bat when we create the data frames like pe_clients_served and pe_adults_moved_in_leavers by selecting the last enrollment for each person for each program.

If someone met a specific criterion in an earlier enrollment in that period, however, shouldn't the program still get that point? Homelessness history, for instance: if a client was enrolled twice during the period and had a higher-priority homelessness history at the first enrollment, would we still want to judge the program based on their re-entry?

I'm only seeing general standards like "% participants" in the scoring, not anything specifying that only their most recent entry in the period is considered. It may not matter that much--there isn't a ton of deduplication going on in these records--but I wanted to be sure that using the most recent one was a specific choice you had made.
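The difference between the two counting rules is easy to see on toy data. This base R sketch is illustrative only; MeetsCriterion stands in for any hypothetical per-enrollment flag, such as a qualifying homelessness history.

```r
# Toy data: one program, one reporting period
enrollments <- data.frame(
  PersonalID     = c(1, 1, 2, 3),
  EntryDate      = as.Date(c("2020-01-05", "2020-06-01", "2020-02-10", "2020-03-15")),
  MeetsCriterion = c(TRUE, FALSE, TRUE, FALSE)
)

# (a) Current approach: keep each person's most recent enrollment only
latest <- enrollments[order(enrollments$EntryDate, decreasing = TRUE), ]
latest <- latest[!duplicated(latest$PersonalID), ]
pct_latest <- mean(latest$MeetsCriterion)

# (b) Alternative: credit the person if ANY enrollment in the period qualifies
pct_any <- mean(tapply(enrollments$MeetsCriterion, enrollments$PersonalID, any))
```

With this data, (a) gives 1/3 and (b) gives 2/3, because client 1 qualified only on their earlier enrollment.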

Rme Utilization pulls in records by served_between() instead of stayed_between()

When you select a provider that has a client with an Entry Date in one month and a Move-In Date in another month, and then look at the month they entered (but didn't move in yet), the record pulls in with a null for "# of bed nights in [month]". This causes the infobox that's meant to calculate utilization to return "NA" when it should not.

To Reproduce

run it on Merici PSH, January 2020
check last record, client 266812
it should either not be there or it should show a 0 for # of bed nights

Expected behavior
When this happens, the record should either not be there or it should show a 0 for # of bed nights

Strategy
I can't just switch it to stayed_between() because there's no "EntryAdjust" in the utilizers_clients df. Could either add this or just overwrite the NA's with zeros and show the record, since they had entered that month.
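The "overwrite the NAs with zeros" option can be sketched as a bed-night calculation that never returns NA. served_between() and stayed_between() aren't shown here, so this doesn't reproduce them; bed_nights_in_month() is hypothetical, and it assumes StayStart is an EntryAdjust-style date (the Move-In Date for housing projects, NA if not moved in).

```r
# Bed nights a stay contributes to one month; a stay that hasn't started
# (no move-in yet) or starts after the month counts as 0 instead of NA.
bed_nights_in_month <- function(StayStart, ExitDate, month_start) {
  # first day of the following month, i.e. the month's exclusive end
  month_end <- seq(month_start, by = "month", length.out = 2)[2]
  stay_end <- ExitDate
  stay_end[is.na(stay_end)] <- month_end  # still enrolled: count to month end
  start <- pmax(StayStart, month_start)
  end   <- pmin(stay_end, month_end)
  nights <- as.numeric(end - start)
  nights[is.na(nights) | nights < 0] <- 0
  nights
}
```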

Add reactive map to CE Access Points tab in Rm

Replace the static image of the Homeless Planning Regions with an interactive map that reacts to the inputs at the top of the page.

Modify cohorts.R so that the summary_dfs are deduplicated

Description by GB: the household counts in 00_cohorts are duplicated, so if a client exits and re-enters a program multiple times in a reporting period, they will be counted each time.

Discussion on Slack: this should be fixed in the cohorts script.

Current image save / load image method loads a significant number of objects into memory which aren't used in server.R.

Describe the bug
As I've been taking a look at the memory allocation of the app and the objects loaded into the server environment, I see a fairly large number of variables for which I am unable to find any code that uses them in ui.R or server.R.
These are currently filling RAM unnecessarily on the server (duplicated as many times as there are sessions).

Recommendations
Here are the Memory Optimization analyses for Rminor and Rminor_elevated
Note: The drive preview will show the raw HTML. The file can be downloaded and opened in a browser to view as a typical webpage.

The results of the analysis are at the top of each of these files.

A save call is generated that can be used at the bottom of 00_daily_update.R for each app, saving only the data that app needs to a file named after the app in its data folder.

Maintenance
If new objects are used in either app, the objects in the save list will need to be updated accordingly.

The respective save calls are tested and working with Rminor_elevated and Rminor.
Once we come to a decision on this I can make a PR with the agreed upon changes.
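One way to generate such a save call automatically is to keep only the objects whose names appear as symbols in the app code. This is a sketch, not the analysis linked above: used_objects() and save_app_data() are hypothetical, and the symbol scan will miss objects referenced only by string (for example via get()) and can over-include objects that share a name with a local variable.

```r
# Names of objects in `env` that appear as symbols anywhere in the app files
used_objects <- function(env, app_files) {
  symbols <- unique(unlist(lapply(app_files, function(f) all.names(parse(f)))))
  intersect(ls(env), symbols)
}

# Load a full image, then save only what the app references
save_app_data <- function(image_file, app_files, out_file) {
  env <- new.env()
  load(image_file, envir = env)
  keep <- used_objects(env, app_files)
  save(list = keep, envir = env, file = out_file)
  invisible(keep)
}
```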

COVID Yes/No Duplication

All clients with a COVID assessment entered are currently being marked as Needs Isolation/Quarantine on the prioritization list because replace_yes_no() is applied to this assessment twice. The first application begins at line 417 in 00_get_Export_and_ART, and the second begins at line 202 in 08_Active_List.

The function replace_yes_no() can't translate the binary variable it creates, so when it runs a second time it changes all fields to 1. I commented out the code in the COVID section of 00_get_Export_and_ART as shown below, and all four possible levels appeared in the prioritization list in the correct order. Another option might be extending replace_yes_no() to accept binary variables.

# COVID-19 ----------------------------------------------------------------
  
covid19 <-
  read_xlsx(paste0(directory, "/RMisc2.xlsx"), sheet = 6) %>%
  mutate(
    COVID19AssessmentDate = ymd(as.Date(COVID19AssessmentDate,
                                        origin = "1899-12-30")),
    ContactWithConfirmedDate = ymd(as.Date(ContactWithConfirmedDate,
                                           origin = "1899-12-30")),
    ContactWithUnderInvestigationDate = ymd(
      as.Date(ContactWithUnderInvestigationDate,
              origin = "1899-12-30")
    ),
    TestDate = ymd(as.Date(TestDate,
                           origin = "1899-12-30")),
    DateUnderInvestigation = ymd(as.Date(DateUnderInvestigation,
                                         origin = "1899-12-30"))#,
    # Tested = replace_yes_no(Tested),
    # UnderInvestigation = replace_yes_no(UnderInvestigation),
    # ContactWithConfirmedCOVID19Patient = replace_yes_no(
    #   ContactWithConfirmedCOVID19Patient
    # ),
    # ContactWithUnderCOVID19Investigation = replace_yes_no(
    #   ContactWithUnderCOVID19Investigation
    # )
  ) #%>%
  # mutate_at(vars(matches("Symptom")), replace_yes_no) %>%
  # mutate_at(vars(matches("HealthRisk")), replace_yes_no)
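The "extend replace_yes_no()" option could look like the following. The body of replace_yes_no() isn't shown in this issue, so replace_yes_no_assumed() is only a guess at its behaviour (mapping "No" and NA to 0 and everything else to 1), chosen because it reproduces the symptom described; replace_yes_no_safe() is a hypothetical idempotent variant.

```r
# ASSUMED behaviour of the existing helper (its body isn't in this issue)
replace_yes_no_assumed <- function(x) ifelse(x == "No" | is.na(x), 0, 1)

# Hypothetical idempotent variant: input that is already 0/1 passes through
replace_yes_no_safe <- function(x) {
  if (is.numeric(x) && all(x %in% c(0, 1))) return(x)
  ifelse(x == "No" | is.na(x), 0, 1)
}
```

Under that assumption, a second pass of the original turns every 0 into 1 (0 == "No" is FALSE), while the safe variant returns its input unchanged.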

Whenever hh members have a different Move-In Date than the HoH, the Move-In Date doesn't show in the Client Counts rpt

Client IDs 224954-6 have a different Move-In Date than the HoH and show as having Exited No Move In Date on the Client Counts report in Rme. They shouldn't be counted that way even if their Move-In Dates are different than the HoH's.

Round out the Incorrect Entry Exit Type Issue in DataQuality.R

This is coded in a pretty crude way, needs way more nuance.

Community Need and DV

This is verging on pedantic, which I think says good things about this code? That we are picking up such small things? Anyway, in our appendix definitions we define a score for the "% head of households who entered the project during the date range and had a VI-SPDAT recorded in HMIS (excludes clients for whom a current episode of DV was reported or who reported as currently fleeing)," emphasis mine.

At line 1611 in the 06 script, we are using the "Non-DV HoHs Entering PH or TH without SPDAT" issue from the data quality script to identify which households do and do not meet this metric. The filter logic in the data quality report is

ProjectType %in% c(2, 3, 9, 13) &
  ymd(EntryDate) > ymd(hc_began_requiring_spdats) &
  # only looking at 1/1/2019 forward
  RelationshipToHoH == 1 &
  (CurrentlyFleeing != 1 |
     is.na(CurrentlyFleeing) |
     !WhenOccurred %in% c(1:3))

If "a current episode of DV" is the same as WhenOccurred %in% c(1:3), then our code is filtering out households where a current episode of DV was reported and who reported as currently fleeing. If they're marked as currently fleeing but don't have a recent timeframe listed or vice versa, they'll still get the flag about VI-SPDATs.

I think that's generally appropriate--we want those to be as universal as possible, so anytime there might be an error it should be flagged! But in this specific case, where the guidelines say they can meet either one of those conditions, I wanted to flag it as possibly including some households that should be excluded.
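The gap between the two readings shows up on a handful of code combinations. Both functions are hypothetical: dv_excluded_current() is the complement of the Data Quality filter, and dv_excluded() is the "either condition" reading of the appendix definition, assuming "a current episode of DV" means WhenOccurred in 1 to 3.

```r
# Complement of the current filter: excluded only when BOTH hold
dv_excluded_current <- function(CurrentlyFleeing, WhenOccurred) {
  !(CurrentlyFleeing != 1 | is.na(CurrentlyFleeing) | !WhenOccurred %in% c(1:3))
}

# "Or" reading of the definition: excluded when EITHER holds
dv_excluded <- function(CurrentlyFleeing, WhenOccurred) {
  CurrentlyFleeing %in% 1 | WhenOccurred %in% 1:3
}
```

For example, a household with CurrentlyFleeing = 1 and WhenOccurred = 4 is still flagged by the current filter but would be excluded under the "or" reading.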

Write replacement for Veterans Active List in ART

Since ART is going away soon, we need to move the Veterans Active List into Rme. It is used a lot, and users have been consulted about what in that report has been useful and not as important/helpful.

New report will have:

Instead of two separate reports for "current" vs "active list", there will be a single dataframe with a filter that lets users select which one they want to see.
Benchmark calculations, up-to-date daily results
Chronic and Exempted veterans data listed separately
Totals & Summaries of Active List
Active Inflow/Outflow
Chronic/Long-Term Totals
Homeless with No Offers of Housing
Average Days Homeless

Considered transitioning this report to a new software vendor, but current research suggests that no vendors have this specific report that is coded to the benchmarks put out by USICH.

Veteran Benchmarks: https://www.usich.gov/tools-for-action/criteria-for-ending-veteran-homelessness/

Break out Utilization by HH Type

Would it help to differentiate in the Utilization reporting on Rm/Rme between Adults-Only, Adults-Children, and Children-Only beds vs clients? I feel like all the utilization reporting in Rm/Rme is not that helpful for LSA purposes, which is fine if I'm the only one that cares about that aspect, but if it would help literally anyone else, I may as well do this for next year's LSA/PIT/HIC.

Break out Missing and DKR Living Situation Issue into more granular issues in DataQuality.R

One reason is the user will find it more helpful
Another reason is in the CoC-wide Data Quality data, that error accounts for a LOT of errors and kind of obscures other (more granular) issues.

Add Mahoning's SPM data to the QPR

The Mahoning Performance & Outcomes Committee requested that I create a page for them like the BoS has with all the latest SPM data.

I already have a directory to keep Mahoning SPM data in, just need to rework the SPMs script to capture it and make the outputs clear about what CoC it's for, then adjust Rm to read those correctly.

I still haven't landed on how to show it specifically in Rm, as I don't have a sense for how Mahoning wants their QPR shown there.

Household Size in Active_List

Right now, we're calculating household size after filtering down to a single row for each client. This means that if they originally had additional household members in that entry with them but those household members have a different entry selected to display on the list, those members will not be counted in their household size. This is causing roughly 120 clients to display a household size smaller than the number of people originally associated with their entry. I think it may make sense to move the household size calculation up into the co_currently_homeless pipes at the top.

To view the rows with differing household sizes, run the following code after all but the last two lines of the active_list script:

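# household size as originally entered, before deduplicating to one row per client
compare_hh_size <- co_currently_homeless %>%
  group_by(HouseholdID) %>%
  mutate(HouseholdSize = n()) %>%
  ungroup() %>%
  # compare with the size active_list reports; distinct() avoids one
  # joined row per household member in active_list
  inner_join(
    active_list %>%
      select(HouseholdID, HouseholdSize) %>%
      distinct(),
    by = "HouseholdID") %>%
  # keep only the rows where the two sizes disagree
  filter(HouseholdSize.x != HouseholdSize.y)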
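The proposed reordering (count members first, deduplicate second) can be demonstrated in base R on toy data so it runs on its own; in the script itself this would be the group_by()/mutate(n()) step moved into the co_currently_homeless pipe. The data here is hypothetical.

```r
# Toy data: client 3 is in household h1 and, later, household h2
co_currently_homeless <- data.frame(
  PersonalID  = c(1, 2, 3, 3),
  HouseholdID = c("h1", "h1", "h1", "h2"),
  EntryDate   = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-05-01"))
)

# 1. Size of each household as originally entered
co_currently_homeless$HouseholdSize <-
  ave(co_currently_homeless$PersonalID, co_currently_homeless$HouseholdID,
      FUN = length)

# 2. Only then collapse to one row per client (most recent entry)
active_list <- co_currently_homeless[order(co_currently_homeless$EntryDate,
                                           decreasing = TRUE), ]
active_list <- active_list[!duplicated(active_list$PersonalID), ]
```

Clients 1 and 2 keep a household size of 3 even though only two h1 rows remain on the list.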

Services and Referrals not tied to the exact Entry Exit

Currently the logic ties a Service or Referral to an EE if it falls within the Entry and Exit Dates. This causes a Service/Referral to be tied to sometimes multiple EEs, which is causing false positives in the Data Quality report for projects that didn't actually create the Service/Referral.

I need to be pulling in the Service/Referral Provider data to more neatly tie them to the EEs.
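Once provider data is available, the match could require the same client, the same provider and an in-range date. This is a sketch under assumptions: tie_services() and the ServiceID and ServiceProviderID columns are hypothetical names for whatever the Service/Referral provider data ends up being called.

```r
# Tie each service to the enrollment at the SAME provider whose dates cover
# it, instead of to every enrollment whose dates cover it.
tie_services <- function(services, enrollments) {
  m <- merge(services, enrollments,
             by.x = c("PersonalID", "ServiceProviderID"),
             by.y = c("PersonalID", "ProjectID"))
  in_range <- m$ServiceDate >= m$EntryDate &
    (is.na(m$ExitDate) | m$ServiceDate <= m$ExitDate)
  m[in_range, c("ServiceID", "EnrollmentID")]
}
```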

Notes on `pe_score`

I'm going through the scoring and the guidelines piece by piece, here are my potential discrepancies.

Rapid re-housing programs are currently getting full points for average stays of under 730 days, but the guidelines show four levels from 150 to 210 days
The guidelines don't account for possible homeless history indexes of 0 points, but they're possible in the code
The 34_40_10 structure is not used in the code
The PSH programs are currently being evaluated on the TH 24-30% scale for 0 income at entry instead of the 34-40% housing one
Some logic sets rely on order of evaluation, while others are mutually exclusive (e.g. value >= .34 & value < .37 vs. value >= .34 following a check for >= .37). This isn't a process issue, but I'd like to switch to order of evaluation all the way through for consistency.

I can fix all but the second if you'd like, just wanted to be sure they weren't related to guideline changes! Let me know what you think.
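The two styles give the same scores as long as the order-of-evaluation version checks thresholds from highest to lowest. The .34 and .37 cutoffs come from the note on consistency; the .40 cutoff and all point values are hypothetical placeholders, not the guideline values.

```r
# Order of evaluation: test thresholds from highest to lowest and return the
# first match, so each condition only needs a lower bound.
score_first_match <- function(value,
                              thresholds = c(0.40, 0.37, 0.34),
                              points = c(10, 7, 4)) {
  vapply(value, function(v) {
    if (is.na(v)) return(0)
    for (i in seq_along(thresholds)) {
      if (v >= thresholds[i]) return(points[i])
    }
    0
  }, numeric(1))
}

# Mutually exclusive: every band states both bounds explicitly
score_bands <- function(value) {
  ifelse(value >= 0.40, 10,
         ifelse(value >= 0.37 & value < 0.40, 7,
                ifelse(value >= 0.34 & value < 0.37, 4, 0)))
}
```

dplyr::case_when() follows the same first-match rule, so in the scoring script the order-of-evaluation style amounts to listing the conditions from highest to lowest and dropping the upper bounds.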

Add Side Door Warning to Data Quality

Side Door means a household was entered into a project without a Referral from an Access Point.
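A first pass at the warning could be an anti-join of enrollments against Access Point referrals. This sketch assumes hypothetical column names (ReferringProjectID, ReferredToProjectID) and a known list of Access Point project IDs; a fuller check would probably also require the referral to predate the entry.

```r
# Enrollments with no Access Point referral for that client and project
side_door <- function(enrollments, referrals, access_point_ids) {
  # keep only referrals made by Access Points
  ap_refs <- referrals[referrals$ReferringProjectID %in% access_point_ids, ]
  # match on client + receiving project (dates are not compared here)
  key_enr <- paste(enrollments$PersonalID, enrollments$ProjectID)
  key_ref <- paste(ap_refs$PersonalID, ap_refs$ReferredToProjectID)
  enrollments[!key_enr %in% key_ref, ]
}
```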

Add Service Area maps to Rm

Currently our Service Area maps are spread out across Tableau Public and the cohhio website and may or may not be up to date. We need a central place where we keep updated Service Area information and the maps that pull from that consistently and reliably updated data.

A single new tab in Rm named Service Areas.
Should display a "Last updated" date.
Offer embed code (which will also display the "Last Updated" date)
User can select a Service Area and the map reacts

Create tab in Rme that groups clients who are due for their 2nd dose

On Feb 5th, users began collecting vaccine data for a number of reasons, the main one being so that we can help them coordinate with their local health districts to get homeless households vaccinated if they want.

In Rme, a separate tab named COVID-19 Vaccine Distribution (or something like this) searchable by County or Organization. Once parameters are set, the user should be able to see groups of clients who are due in the next 2, 4, and 7 days. Each row should represent a separate client and be grouped by household. Columns should include veteran status, contact info, and whether they're currently in a given project and if not where they were last and what their Destination was.

Add Missing Destination as a Warning to DataQuality.R

Add Data Quality errors for Vaccine questions

On Feb 5th, users began collecting vaccine data, and for the BoS it is required, so I'm adding it to the Data Quality report.

Finding the exact cohort of households that should be checked will be the main issue. Here's how we're doing this:

client has been in a BoS provider since 2/5/2021 of project type 1, 2, 3, 4, 9, 13
client is in 3, 9, or 13 and either has no Move-In Date or the Move-In Date was prior to 2/5/2021 or the Move-In Date = the Entry Date.

Create Error:
Missing Vaccine Data

Error is thrown if the client is in the cohort described above and neither the consent question is answered nor do they have a vaccination sub.

Considered having 2 errors, one for if the client is current and one for if they aren't, but that will fluctuate, and would be kind of complicated to keep up with.

Creating Move-In Dates

In the 00_get_export_and_ART script, the second condition in the code creating the MoveInDateAdjust column sets additional household members' move-in dates to their entry date if it doesn't match the head of household's entry date and they are in a housing program (line 267).

Applying this logic to a hypothetical scenario where the head of household and another member have different program entry dates but have not moved into a unit yet seems like it could cause a false move-in date to be imputed to the non-head of household.

Enrollment <- Enrollment %>%
  left_join(small_project, by = "ProjectID") %>%
  left_join(HoHsEntry, by = "HouseholdID") %>%
  mutate(
    MoveInDateAdjust = case_when(
      EntryDate < mdy("10012017") &
        ProjectType %in% c(3, 9)
      ~ EntryDate,
      EntryDate != HoHsEntry &                        # <- starting on this line
        ProjectType %in% c(3, 9, 13) ~ EntryDate,
      EntryDate >= mdy("10012017") &
        ProjectType %in% c(3, 9) &
        ymd(EntryDate) <= ymd(MoveInDate) &
        ymd(MoveInDate) <= ExitAdjust
      ~ MoveInDate,
      ymd(EntryDate) <= ymd(MoveInDate) &
        ymd(MoveInDate) <= ExitAdjust &
        ProjectType == 13 ~ MoveInDate
    ),
    EntryAdjust = case_when(
      ProjectType %in% c(1, 2, 4, 8, 12) ~ EntryDate,
      ProjectType %in% c(3, 9, 13) &
        !is.na(MoveInDateAdjust) ~ MoveInDateAdjust
    )
  )
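The hypothetical scenario can be reproduced with a toy household in base R by isolating the second case_when() branch. The guard at the end is only one possible fix (impute the member's Entry Date only when the HoH has already moved in by then), not something the script currently does.

```r
# Toy household: HoH and a member with different entry dates, neither moved in
hh <- data.frame(
  PersonalID        = c(1, 2),
  RelationshipToHoH = c(1, 2),
  ProjectType       = c(3, 3),
  EntryDate         = as.Date(c("2021-01-04", "2021-01-11")),
  MoveInDate        = as.Date(c(NA, NA))
)
hh$HoHsEntry <- hh$EntryDate[hh$RelationshipToHoH == 1]

# Second branch of the current case_when(), in isolation
current_rule <- hh$EntryDate != hh$HoHsEntry & hh$ProjectType %in% c(3, 9, 13)
hh$MoveInDateAdjust <- hh$EntryDate
hh$MoveInDateAdjust[!current_rule] <- NA
# -> member 2 gets a move-in date of 2021-01-11 although nobody moved in

# Possible guard (hypothetical): require a HoH move-in on or before the entry
hoh_move_in <- hh$MoveInDate[hh$RelationshipToHoH == 1]
guarded_rule <- current_rule & !is.na(hoh_move_in) & hoh_move_in <= hh$EntryDate
```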

Round out the Non-HoHs with Services/Referrals logic in DataQuality.R

SSVF projects should be showing this as an Error, whereas non-SSVF projects should be showing it as a warning, and only back to Feb of 2018.

Add analysis of APs who have (and have not) created Diversion Services

Add to the CE Summary tab.

Correct Incomplete Living Situation (something is wrong) in DataQuality.R

Something in the code is causing false positives.

Account for future Exit Dates in Active List

Description from GB: When the active list filters for clients who are currently homeless, it checks for null exit dates. There are ~25 folks with future exit dates, so I went and spot-checked some of their audit records. They look mostly like year errors (e.g. they exited at the end of 2019, but the data was entered in 2020 and users just entered 2020 by default).

Discussion from Slack: Added a check for this in the Data Quality reporting but also we should include these clients since technically speaking, HMIS thinks those clients are still in that program. (Returning user-entered data back to them.)
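The filter change amounts to treating a future Exit Date like a missing one. still_enrolled() is a hypothetical helper; note that it treats an Exit Date of today as already exited.

```r
# Still enrolled if the Exit Date is missing OR in the future, since HMIS
# considers the client in the program until that date
still_enrolled <- function(ExitDate, today = Sys.Date()) {
  is.na(ExitDate) | ExitDate > today
}
```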

On the COVID-19 Vaccine Distribution tab, add counts of interested clients with age breakdown, veteran status

Since Feb 5th, users have been collecting vaccine data. One of the reasons was to help agencies coordinate with local health districts. One of the data elements they are collecting is whether each client would consent to a vaccine at no cost. So a count of all the clients who would consent, broken out by county, veteran status, and age, would be helpful for coordination purposes.

We're also collecting the reasons a client would not consent. We may just use this internally, but it could also be good to make this data available by county to help agencies with messaging and information gathering from health professionals.

Ohio map with a count in each county, optional filters to show Veterans only, and different age ranges.

Under this, maybe a word cloud that can show the most common words in the "concerns" data element.

The word cloud can wait, but the map with the county counts is needed for sure.

Another alternative considered was adding a way to filter by organization as well, but I think that's more difficult and not that relevant.
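The counts behind the county map could come from a simple cross-tabulation. consent_counts(), the ConsentToVaccine, VeteranStatus and Age column names, and the age bands are all hypothetical placeholders.

```r
# Clients who would consent, counted by county, veteran status and age band
consent_counts <- function(clients, breaks = c(0, 18, 50, 65, Inf)) {
  # %in% drops both "No" and unanswered (NA) responses
  consenting <- clients[clients$ConsentToVaccine %in% "Yes", ]
  consenting$AgeBand <- cut(consenting$Age, breaks = breaks, right = FALSE)
  counts <- as.data.frame(table(County  = consenting$County,
                                Veteran = consenting$VeteranStatus,
                                AgeBand = consenting$AgeBand))
  counts[counts$Freq > 0, ]  # drop empty combinations
}
```

Filtering to veterans only or a single age band is then a row filter on the result before summing Freq by County.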

Add Diversion Data Quality elements

Not currently checking anything to do with diversion data. (Need to work with VW on getting at what to check for, etc.)

Break up Utilization reporting by Household Type

Current Utilization looks at unit and bed utilization rates at the project level, which is great, but it seems, based on the LSA warnings they've built, that we're supposed to be looking at utilization at an even more micro level, by Household Type as well.

I'm thinking of adding the Household Type dimension to the analysis so that for a given provider, you can see their AC and AO bed and unit utilization across time (Rm) and at the client detail level (Rme).

I'm ranking this as medium priority because it is not a thing that needs to happen prior to vendor transition but it would be really helpful to have prior to the next LSA.
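A first step would be labelling each household by type from its members' ages. This sketch assumes age 18 as the adult cutoff and ignores members with unknown age; household_type() is hypothetical.

```r
# Label households AC (adults and children), AO (adults only),
# CO (children only) or Unknown (no member with a known age)
household_type <- function(HouseholdID, Age) {
  adult <- !is.na(Age) & Age >= 18
  child <- !is.na(Age) & Age < 18
  has_adult <- tapply(adult, HouseholdID, any)
  has_child <- tapply(child, HouseholdID, any)
  type <- ifelse(has_adult & has_child, "AC",
                 ifelse(has_adult, "AO",
                        ifelse(has_child, "CO", "Unknown")))
  # one label per input row, as a plain character vector
  as.vector(type[as.character(HouseholdID)])
}
```

Utilization by household type would then group bed and unit counts by this label as well as by project.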

Which `pe_` cohorts should be deduplicated? (and when?)

Right now, most of our pe_ cohorts are getting deduplicated by person and alternate project ID. However, pe_adults_entered contains a few duplicates (my data files aren't from today, but I'm showing five duplicates). If we deduplicate it, we see different results in our summary_pe_adults_entered dataframe because our summarizing is done with n() instead of distinct counts. This leaves me with two questions:

Should this be deduplicated, and if so, should we keep the first or most recent entry?
Are you interested in switching our aggregations to distinct counts instead of using n() as a de-duping failsafe, or does it make more sense to keep going by row counts?
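The row-count vs. distinct-count difference and the "keep most recent" option look like this on toy data. In dplyr terms, n() counts rows and n_distinct() counts unique values; AltProjectID is a hypothetical name for the alternate project ID.

```r
# Toy data with one person entering twice
pe_adults_entered <- data.frame(
  PersonalID   = c(1, 2, 2, 3),
  AltProjectID = c(10, 10, 10, 10),
  EntryDate    = as.Date(c("2020-01-01", "2020-02-01", "2020-05-01", "2020-03-01"))
)

row_count      <- nrow(pe_adults_entered)                     # like n()
distinct_count <- length(unique(pe_adults_entered$PersonalID)) # like n_distinct()

# Deduplicate by person and project, keeping the most recent entry
by_recent <- pe_adults_entered[order(pe_adults_entered$EntryDate, decreasing = TRUE), ]
deduped <- by_recent[!duplicated(by_recent[c("PersonalID", "AltProjectID")]), ]
```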

Minor DV Redundancy

On lines 601 and 602 in the active list script, we check whether any rows not yet classified have CurrentlyFleeing or WhenOccurred set to 8, 9, or 99. After the first two conditions, every remaining combination has CurrentlyFleeing set to 8 or 9, so we could just check for that.

583	dv <- active_list %>%
584	  left_join(
585	    HealthAndDV %>%
586	      filter(DataCollectionStage == 1) %>%
587	      select(EnrollmentID,
588	             PersonalID,
589	             CurrentlyFleeing,
590	             WhenOccurred),
591	    by = c("EnrollmentID", "PersonalID")
592	  ) %>%
593	  mutate(
594	    CurrentlyFleeing = if_else(is.na(CurrentlyFleeing), 99, CurrentlyFleeing),
595	    WhenOccurred = if_else(is.na(WhenOccurred), 99, WhenOccurred),
596	    CurrentlyFleeing = case_when(
597	      CurrentlyFleeing %in% c(0, 99) &
598	        WhenOccurred %in% c(4, 8, 9, 99) ~ "No",
599	      CurrentlyFleeing == 1 |
600	        WhenOccurred %in% c(1:3) ~ "Yes",
601	      CurrentlyFleeing %in% c(8, 9, 99) |
602	        WhenOccurred %in% c(8, 9, 99) ~ "Unknown"
603	    )
604	  ) %>%
605	  select(-WhenOccurred)
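The claim can be checked exhaustively by enumerating every code combination, assuming the HMIS response codes for CurrentlyFleeing (0, 1, 8, 9, 99) and WhenOccurred (1 to 4, 8, 9, 99), with NA already recoded to 99 as in the script.

```r
# Every combination of the assumed response codes
grid <- expand.grid(CurrentlyFleeing = c(0, 1, 8, 9, 99),
                    WhenOccurred     = c(1, 2, 3, 4, 8, 9, 99))

# Mirror the case_when() order: "No", then "Yes", then the "Unknown" test
classify <- function(cf, wo, unknown_test) {
  ifelse(cf %in% c(0, 99) & wo %in% c(4, 8, 9, 99), "No",
         ifelse(cf == 1 | wo %in% 1:3, "Yes",
                ifelse(unknown_test, "Unknown", NA)))
}

cf <- grid$CurrentlyFleeing
wo <- grid$WhenOccurred
current <- classify(cf, wo, cf %in% c(8, 9, 99) | wo %in% c(8, 9, 99))
short   <- classify(cf, wo, cf %in% c(8, 9))
identical(current, short)  # TRUE: the shorter check classifies every row the same
```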

Case Manager report

There's a need for users to be able to see things that help them on the day to day like noticing youth who are about to age out, households who need an annual assessment, stayer children becoming 18 who will need more data collected on them on their birthdays, etc.

Separate tab in Rme with an Organization drop list and then data frames (depending on the project type and funding source) will show that are relevant to that Org's projects.

Considered waiting for all this stuff to be just available in new software as dashboards.

Clarifying HoH Correction Logic

It looks like we might have an unnecessary line of code in this part of the 08_Active_List head of household correction logic?

168	# merging the "corrected" hohs back into the main dataset with a flag, then
169	# correcting the RelationshipToHoH
170	hohs <- active_list %>%
171	  left_join(Adjusted_HoHs,
172	            by = c("HouseholdID", "PersonalID", "EnrollmentID")) %>%
173	  mutate(RelationshipToHoH = if_else(correctedhoh == 1, 1, RelationshipToHoH)) %>%
174	  select(PersonalID, HouseholdID, correctedhoh)

At line 173 we're modifying the RelationshipToHoH column, but in line 174 we don't select RelationshipToHoH as one of our kept columns. If we don't want to keep it I don't think we need to modify it and might be able to drop line 173.

If we do want to keep it, though, I also want to flag that at a glance that line looks like it updates RelationshipToHoH to 1 if correctedhoh is 1 and keeps the original value otherwise. However, correctedhoh == 1 evaluates to NA when correctedhoh is NA, and if_else() returns NA for those rows, so we're actually setting RelationshipToHoH to NA for every row where correctedhoh is NA. If that isn't what we're trying to do, we could change that line to mutate(RelationshipToHoH = if_else(correctedhoh %in% 1, 1, RelationshipToHoH))
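A three-row example shows the difference. It uses base ifelse() to stay dependency-free; dplyr::if_else() treats an NA condition the same way unless its missing argument is supplied.

```r
correctedhoh      <- c(1, NA, NA)
RelationshipToHoH <- c(2, 1, 3)

# `==` gives NA when correctedhoh is NA, so the result is NA in those rows
before <- ifelse(correctedhoh == 1, 1, RelationshipToHoH)   # c(1, NA, NA)

# `%in%` gives FALSE for NA, so the original value is kept
after <- ifelse(correctedhoh %in% 1, 1, RelationshipToHoH)  # c(1, 1, 3)
```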

"Eligible For PSH" could be extended

The documentation for the "Eligible for PSH" column on the prioritization list says:

Eligible for PSH:
If you would like to filter for households eligible for PSH, enter "Yes" in the column search in the Eligible for PSH column. If there is a "Yes" there, it means someone in the household has a qualifying disability.

There are just over 200 folks on the priority list with differing disability statuses between 1.3 (subassessment) and 3.08 (yes/no dropdown), with about a hundred of those people who could be marked yes based on 1.3 but are not currently because of their 3.08 response. Based on that additional information, it could make sense to include them as potentially eligible for PSH.

Additionally, the HUD definition of chronic homelessness says that receiving disability income can be considered verification of a disability for the purposes of determining chronic homelessness. We know that not everyone who receives disability income self-identifies as disabled, and because HMIS data is largely self-reported, that can cause discrepancies. If we wanted to mark these people as potentially PSH eligible as well, that would bring us up to ~240 mismatches, including ~150 people who are not currently marked eligible but could be.

To look at all this, you can paste the following code at the end of the 08_ script. The D_Disability column looks at 1.3, and I_Disability looks at 4.02.

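disability_disagreement <- Disabilities %>%
  group_by(EnrollmentID) %>%
  # D_Disability (1.3): 1 if any disability on the enrollment is Yes with
  # IndefiniteAndImpairs other than 0 (No). Note: a Yes row with a null
  # IndefiniteAndImpairs makes the condition NA, and max() then returns NA
  # for the whole enrollment.
  mutate(D_Disability = if_else(DisabilityResponse == 1 &
                                  IndefiniteAndImpairs != 0, 1, 0),
         D_Disability = max(D_Disability)) %>%
  ungroup() %>%
  select(EnrollmentID, D_Disability) %>%
  distinct() %>%
  inner_join(Enrollment %>%
               select(EnrollmentID,
                      PersonalID,
                      RelationshipToHoH,
                      DisablingCondition,
                      ProjectName,
                      EntryDate,
                      ExitDate),
             by = "EnrollmentID") %>%
  # I_Disability (4.02): 1 if the enrollment has any disability income
  left_join(IncomeBenefits %>%
              filter(SSDI == 1 |
                       VADisabilityService == 1 |
                       VADisabilityNonService == 1 |
                       PrivateDisability == 1) %>%
              select(EnrollmentID) %>%
              distinct() %>%
              mutate(I_Disability = 1),
            by = "EnrollmentID") %>%
  # Keep enrollments where 1.3 disagrees with 3.08 (ignoring 1.3 = No vs.
  # 3.08 = Don't Know/Refused/Not Collected), or where there is disability
  # income but 3.08 is not Yes
  filter((D_Disability != DisablingCondition &
            !(D_Disability == 0 & DisablingCondition %in% c(8, 9, 99))) |
           (DisablingCondition != 1 & !is.na(I_Disability))) %>%
  select(EnrollmentID,
         PersonalID,
         DisablingCondition,
         D_Disability,
         I_Disability,
         ProjectName,
         RelationshipToHoH,
         EntryDate,
         ExitDate) %>%
  mutate(PersonalID = as.character(PersonalID))

# Limit to people currently on the active list, then count them
error_on_active <- disability_disagreement %>%
  inner_join(active_list, by = "PersonalID")

length(unique(error_on_active$PersonalID))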
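If we did extend the column, one option is to keep the current 3.08-based "Eligible for PSH" and add a broader column next to it. The sketch below runs on made-up rows and assumes D_Disability and I_Disability have already been computed per client as in the code above; the PotentiallyEligibleForPSH name is hypothetical.

```r
library(dplyr)

# Made-up clients: DisablingCondition is 3.08 (1 = Yes); D_Disability (1.3)
# and I_Disability (4.02 disability income) are NA where there's no evidence
clients <- tibble(
  HouseholdID        = c("h1", "h1", "h2", "h3", "h4"),
  DisablingCondition = c(0, 0, 1, 0, 0),
  D_Disability       = c(0, 1, NA, 0, 0),
  I_Disability       = c(NA, NA, NA, 1, NA)
)

psh <- clients %>%
  group_by(HouseholdID) %>%
  summarise(
    # Current definition: someone in the household answered Yes on 3.08
    EligibleForPSH = if_else(any(DisablingCondition %in% 1), "Yes", "No"),
    # Hypothetical extension: also count a 1.3 disability or disability income
    PotentiallyEligibleForPSH = if_else(
      any(DisablingCondition %in% 1 | D_Disability %in% 1 | I_Disability %in% 1),
      "Yes", "No"
    ),
    .groups = "drop"
  )

print(psh)
```

Keeping the two columns separate would let users filter on either the current definition or the broader one, and %in% keeps NA values from turning a whole household's result into NA.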

Give users a way to verify PDDEs

Currently, an agency can run an ART report that doubles as a printable form: it lists all of the agency's PDDEs, with a space to indicate whether there's been a change and when it happened, a place to sign, instructions on where to send the form, etc. That whole process will be interrupted when we lose ART in the move to a new vendor.

Ideally this functionality exists in Clarity, but assuming it does not, we need a way to communicate what the data is and give users space for feedback, plus a way for them to submit that feedback to us so the CoC can approve any requested changes and HMIS can actually make them.

One idea is to create a tab in Rme called "Project Admin Change Requests" (or something similar) where users select any number of providers they need to submit, click Download, and receive a Word file (generated from R Markdown?) showing the data with spaces for feedback. They can then send the Word doc to us the usual way (the CoC helpdesk).
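A rough sketch of how that tab could be wired up with Shiny's downloadHandler() and a parameterized R Markdown template. Everything named here is hypothetical: the provider list, the input/output IDs, and pdde_form.Rmd, which would need output: word_document and a providers param in its YAML header. Copying the template to tempdir() before rendering follows the pattern Shiny recommends, because the app's working directory may not be writable on the server.

```r
library(shiny)

# Hypothetical provider list; in Rme this would come from the project data
providers <- c("Provider A", "Provider B", "Provider C")

ui <- fluidPage(
  selectInput("providers", "Providers to review",
              choices = providers, multiple = TRUE),
  downloadButton("pdde_form", "Download")
)

server <- function(input, output, session) {
  output$pdde_form <- downloadHandler(
    filename = function() paste0("PDDE_review_", Sys.Date(), ".docx"),
    content = function(file) {
      # Copy the (hypothetical) template somewhere writable, then render it
      template <- file.path(tempdir(), "pdde_form.Rmd")
      file.copy("pdde_form.Rmd", template, overwrite = TRUE)
      rmarkdown::render(
        template,
        output_file = file,
        params = list(providers = input$providers),
        envir = new.env(parent = globalenv())
      )
    }
  )
}

# shinyApp(ui, server)
```

The Word doc is what would go to the CoC helpdesk, so the signature line and instructions can live in the template itself.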

Another idea is to explore Shiny's options for receiving user feedback in the app. I'm less interested in this one because the form really needs to go to the CoC, not to me.

Another idea, which I don't like, is to have it export a dataframe to Excel and have users send that back to us. The problem is that it won't be obvious to users what to do, and I don't see how there would be a signature line.

Prep for changing import start dates into Rm and Rme

Currently I'm running the HUD CSV Export back to 1/1/2018, but now that we're in 2021, I want to start pulling only back to 1/1/2019. One reason I can't do this today is that we still need data back to 10/1/2018 for the LSA. The other is that I'm not sure my code is ready for this change: I need to be sure that no dates have been hard coded anywhere and set up a non-fragile way of making sure dates are handled correctly.
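One way to find hard-coded dates is to scan the scripts for date-like strings. The helper below is a hypothetical base R sketch (find_hardcoded_dates() and export_start_date are not part of this repo). Its regular expression only catches m/d/y and yyyy-mm-dd literals, so dates written in other formats would slip through.

```r
# Hypothetical single source of truth: scripts would refer to this variable
# instead of typing the start date out themselves
export_start_date <- as.Date("2018-10-01")

# Hypothetical helper: list every line in a folder's R/Rmd scripts that
# contains something that looks like a hard-coded date
find_hardcoded_dates <- function(path = ".") {
  date_pattern <- "\\b\\d{1,2}/\\d{1,2}/\\d{2,4}\\b|\\b\\d{4}-\\d{2}-\\d{2}\\b"
  scripts <- list.files(path, pattern = "\\.(R|Rmd)$", recursive = TRUE,
                        full.names = TRUE, ignore.case = TRUE)
  hits <- lapply(scripts, function(script) {
    lines <- readLines(script, warn = FALSE)
    matched <- grep(date_pattern, lines, perl = TRUE)
    if (length(matched) == 0) return(NULL)
    data.frame(file = script, line = matched, code = lines[matched],
               stringsAsFactors = FALSE)
  })
  # Returns NULL when nothing matches
  do.call(rbind, hits)
}

# Demo on a throwaway folder containing one script with one date literal
demo_dir <- file.path(tempdir(), "date_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c('start <- as.Date("2018-10-01")', "n <- 12"),
           file.path(demo_dir, "00_example.R"))

hardcoded <- find_hardcoded_dates(demo_dir)
print(hardcoded)
```

Run against Rm and Rme after the change, the ideal result would be a single hit: the line that defines the start date.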

Possible Over-Inclusion of TAY

In the 00_cohorts script on line 58, the code that identifies TAY in the system sets an age ceiling but no floor. When I ran it with today's data, the ages of people labeled as members of TAY households ranged from -72 to 24. It's possible that households with very young people listed as heads of household are data quality issues rather than extremely young households.

# Transition Aged Youth

tay <- Enrollment %>%
  left_join(Client, by = "PersonalID") %>%
  select(all_of(vars_we_want)) %>%
  group_by(HouseholdID) %>%
  mutate(
    TAY = if_else(max(AgeAtEntry) < 25, 1, 0)
  ) %>%
  ungroup() %>%
  filter(TAY == 1, !is.na(ProjectName))
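A sketch of one way to add a floor, on made-up data. The floor of 16 (tay_age_floor) is an illustrative assumption, not a HUD definition. Households with impossible ages are flagged separately so they can be routed to data quality instead of silently dropping out.

```r
library(dplyr)

# Hypothetical floor for the oldest person in a TAY household (assumption)
tay_age_floor <- 16

# Made-up enrollments: household 3 has an impossible negative age
enrollments <- tibble(
  HouseholdID = c(1, 1, 2, 3),
  AgeAtEntry  = c(22, 1, 30, -72)
)

tay <- enrollments %>%
  group_by(HouseholdID) %>%
  mutate(
    oldest = max(AgeAtEntry),
    # Ceiling AND floor on the oldest member's age
    TAY = if_else(oldest < 25 & oldest >= tay_age_floor, 1, 0),
    # Flag households with negative ages as likely data quality issues
    ImplausibleAge = if_else(min(AgeAtEntry) < 0, 1, 0)
  ) %>%
  ungroup()

print(tay)
```

Household 1 (a 22-year-old with a 1-year-old) stays TAY, household 2 is over the ceiling, and household 3 is excluded from TAY but flagged.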

HUD CSV Export version FY2020 from SP: waiting on corrections from WS

The HUD CSV Export from ServicePoint has improved a lot over the past year, but some minor defects remain. Until they're fixed, be aware that the code in all of my repositories is written to work around these issues, and the code will be adjusted as they are fixed.

WellSky Ticket	The Problem	What I'm Doing While it's being Fixed
864582	Yes/No data (e.g. IncomeFromAnySource) differs from SP data because WS is overwriting nulls with whatever the subs are	WS will fix this. Since we're missing false negatives, at least it's not causing users to go hunt for something that's not a problem.
864459	LoSUnderThreshold should have 0, 1, or 99, but instead it has True/False data.	The HUD CSV specs say this data should return either a 0, a 1, a 99, or a null. The reason this matters is I can't tell if a null means it was irrelevant based on the other answers or if it's null because it was not answered. Asked them to re-open the case on 11/12/2019. Currently it's coded around.
877960	Last Permanent Address: State field not populating	submitted to DEV 10/18/2019
886040	CurrentLivingSituation.csv has some nulls in the CLS column	not using this yet anyway
886055	Users.csv only includes current users	Should be fixed because UserIDs are referred to throughout the other data objects, and IDs for former users will have nothing to tie back to. Using an ART report anyway because I also need Default Provider and that's the only way to get it.
807291	the export is double quoted	This only affects when I use the export to upload our HIC data to HDX. HDX rejects it. The fix for this (from my side) is located in the script in the HIC repository (https://github.com/COHHIO/HIC) until they fix it on their side.
~~873163~~	~~Export data not matching what is in SP, specifically with Residence Prior and Length of Stay~~	submitted to DEV 9/30. Not fixed yet BUT they did find the reason. They didn't map old answers to one of the LoTH questions correctly, so the old answers are just coming in as nulls. I asked them to add "(retired)" to the value so users can tell what's wrong, but they closed the case.
~~918016~~	~~Organization ID in Project.csv shouldn't have nulls~~	~~submitted ticket 2/14/2020. No need to adjust anything in my code, it hasn't affected my work (yet)~~
~~886053~~	~~InformationDate column in CurrentLivingSituation.csv formatted incorrectly~~	~~commented out that line of code so I can use it once it's fixed, copied it and "corrected" it so it works with the wrongness of this export. shouldn't affect anything I need currently.~~
~~886038~~	~~HMIS Participating in Projects.csv is all 1's~~	UPDATE: this has been fixed. ~~ORIGINAL: put in a case, overwriting data from ReportWriter~~
~~837944~~	~~subassessments are coming in based on their Start & End Dates as compared to the Report Start & End Dates instead of the Entry & Exit Dates of the Enrollment ID that they're attached to~~	UPDATE: fixed 2/13/2020 ~~filtering out any Data Quality issues to do with subassessments, being sure the Data Quality reports have "UNDER CONSTRUCTION" written in the header until this is corrected~~
~~858797~~	~~2 identical Effective Dates & different Values returns the wrong value in the export~~	UPDATE: This has been fixed. ORIGINAL COMMENT: this is the 3rd ticket about this issue. I have tried (again) to impress upon them that this is an export-wide thing they need to fix but they seem to be relying on me for examples of Client IDs where this problem is evident. Trying to escalate as much as possible.
~~833259~~	~~takes 12 hours to run 2 years of data~~	~~running the export prior to leaving work, downloading next morning.~~
~~804998~~	~~Incorrect Enrollment User ID. The User ID in the Enrollment file and in the Client file is actually the Project ID.~~	~~Importing this data with RMisc (created in ART) and replacing the bad data with data from ART.~~
~~810738~~	~~No Bed Types are coming through~~	~~Pulling it in from ReportWriter until WS fixes it.~~
~~818626~~	~~No Provider Addresses coming through~~	~~Pulling it in from the RMisc file until WS fixes it.~~ Resolved 10/30
~~818854~~	~~No Youth Beds coming through~~	~~Pulling it in from ReportWriter until WS fixes it.~~
~~873523~~	~~Move-In Date in Export returning Move-In Date of the HoH on non-HoHs even if that non-HoH hadn't joined the project stay yet.~~	See comments below for more on this one.
~~874938~~	~~FY2020 HUD CSV Export still not available~~	~~??? It was due October 1, it's October 23rd and still no updated export. The next promise is October 28th.~~ Update: October 30th, it's available.
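For ticket 864459, a workaround that maps True/False text back to the HUD CSV codes might look like the sketch below. It assumes the column arrives as text or logical and runs on made-up values. It cannot fix the underlying problem: a null could still mean either "not applicable given the other answers" or "not answered".

```r
library(dplyr)

# Made-up values mimicking the defect: True/False instead of 0/1/99
enrollment <- tibble(LoSUnderThreshold = c("True", "False", NA, "99"))

enrollment <- enrollment %>%
  mutate(
    # Normalize to upper-case text so "True", TRUE, and 1 all compare the same
    los = toupper(as.character(LoSUnderThreshold)),
    LoSUnderThreshold = case_when(
      los %in% c("TRUE", "1")  ~ 1L,
      los %in% c("FALSE", "0") ~ 0L,
      los == "99"              ~ 99L,
      TRUE                     ~ NA_integer_  # nulls stay null
    )
  ) %>%
  select(-los)

print(enrollment)
```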

Exclude the Union Senior Assistance HP project data

Add Totals infobox to the Rme RRH Spending report (to match other QPR tabs)

Users have asked for this data to be shown more like it is in the ART report, likely because they need the totals.

Add Percent Spent on RRH, Total HHs, $ Spent on RRH, $ Spent on HP

I've made the ART report available temporarily, but ART will be going away eventually, so that's only a stopgap.

Specifications for the BoS's QPR: https://cohhio.org/boscoc/performance-and-monitoring/
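The four totals could be computed with a single summarise(). This is a sketch on made-up spending rows with hypothetical column names. It also assumes "Percent Spent on RRH" means RRH dollars as a share of RRH plus HP dollars, which should be checked against the QPR specifications linked above.

```r
library(dplyr)

# Made-up spending rows (hypothetical columns)
spending <- tibble(
  HouseholdID = c(1, 2, 2, 3),
  ProjectType = c("RRH", "RRH", "RRH", "HP"),
  Amount      = c(500, 250, 250, 1000)
)

qpr_totals <- spending %>%
  summarise(
    TotalHHs   = n_distinct(HouseholdID),
    SpentOnRRH = sum(Amount[ProjectType == "RRH"]),
    SpentOnHP  = sum(Amount[ProjectType == "HP"]),
    # Assumption: share of RRH + HP dollars that went to RRH
    PercentSpentOnRRH = SpentOnRRH / (SpentOnRRH + SpentOnHP)
  )

print(qpr_totals)
```

Each value could then feed an infobox, for example shinydashboard::infoBox(), assuming the other QPR tabs are built with shinydashboard.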

Make System Performance Measures more visual

I created the Quarterly Performance Report section in a hurry and needed to get the System Performance Measures metrics out, but didn't have time to make them visual in any way.

I don't yet have a vision for how it should look; I just know that when I put it out, I meant to come back and make it look better.

Data Quality Comparison

At line 585, we calculate the data quality flags with this chunk:

data_quality_flags_detail <- pe_validation_summary %>%
  left_join(dq_flags_staging, by = "AltProjectName") %>%
  mutate(General_DQ = if_else(GeneralFlagTotal/ClientsServed >= .02, 1, 0),
         Benefits_DQ = if_else(BenefitsFlagTotal/AdultsEntered >= .02, 1, 0),
         Income_DQ = if_else(IncomeFlagTotal/AdultsEntered >= .02, 1, 0),
         LoTH_DQ = if_else(LoTHFlagTotal/HoHsServed >= .02, 1, 0))

but the dq_flags_staging dataframe doesn't filter the flags by when clients entered. If we create dq_flags_staging but restrict the benefits and income flags to clients who entered the program during that time period, we end up with fewer programs in the detail dataframe.

I think that means we can have entries in the numerator that aren't included in the denominator. Is that what we want? It seems like we might want to restrict our flags to the entries that are relevant to the time period for that flag type.
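A sketch of restricting the numerator to the same entries as the denominator, on made-up data. The reporting window, flag rows, and AdultsEntered values are all hypothetical. Joining the flag counts onto the full list of projects, then turning missing counts into 0, keeps projects with no in-window flags from dropping out of the detail dataframe.

```r
library(dplyr)

# Hypothetical reporting window
report_start <- as.Date("2020-01-01")
report_end   <- as.Date("2020-12-31")

# Made-up benefits flags: one row per flagged adult enrollment
benefits_flags <- tibble(
  AltProjectName = c("A", "A", "B"),
  EnrollmentID   = c(1, 2, 3),
  EntryDate      = as.Date(c("2019-06-01", "2020-03-01", "2020-05-01"))
)

# Made-up denominators: adults who entered each project during the window
adults_entered <- tibble(
  AltProjectName = c("A", "B", "C"),
  AdultsEntered  = c(10, 100, 20)
)

benefits_dq <- benefits_flags %>%
  # Only count flags on enrollments that are also in the denominator
  filter(EntryDate >= report_start, EntryDate <= report_end) %>%
  count(AltProjectName, name = "BenefitsFlagTotal") %>%
  # Keep every project, including ones with no in-window flags
  right_join(adults_entered, by = "AltProjectName") %>%
  mutate(
    BenefitsFlagTotal = coalesce(BenefitsFlagTotal, 0L),
    Benefits_DQ = if_else(BenefitsFlagTotal / AdultsEntered >= .02, 1, 0)
  ) %>%
  arrange(AltProjectName)

print(benefits_dq)
```

Without the filter, project A's 2019 entry would count in the numerator even though that enrollment isn't part of AdultsEntered.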

Household Aggregations

I'm looking at the active list from the user's perspective, and I have a couple of systemic questions about what is intended to be rolled up to the household and what isn't. If we decide to roll up either of these, those sections may also need to move higher in the code, depending on the disposition of Issue #106.

For instance, we have a column labelled Income, but it only looks at the income of the head of household. As a user, I would probably expect it to reflect whether anyone in the household has income, not just the person shown, but that could be a reflection of the setup I'm used to.

Another example is score: it's rare, but we have a few instances of someone who is not listed as head of household having a higher score than the head of household. It may make sense to display the highest score in the last year for all household members instead of just the head of household's. To see those households, run the following code after the scores are joined to the active_list data frame:

active_list_test <- active_list %>%
  group_by(HouseholdID) %>%
  mutate(highest_score = max(Score)) %>%
  ungroup() %>%
  filter(Score != highest_score & hoh == 1)
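If we did roll both up, a per-household summary could be joined back onto the head of household's row. This is a sketch on made-up rows; the IncomeFromAnySource coding (1 = Yes) follows the HUD CSV, but the column names on active_list are assumptions.

```r
library(dplyr)

# Made-up active list rows (column names assumed)
active_list <- tibble(
  HouseholdID         = c("h1", "h1", "h2"),
  hoh                 = c(1, 0, 1),
  IncomeFromAnySource = c(0, 1, 0),
  Score               = c(8, 11, NA)
)

household_rollup <- active_list %>%
  group_by(HouseholdID) %>%
  summarise(
    # Yes (1) if anyone in the household reports income
    HouseholdIncome = as.integer(any(IncomeFromAnySource == 1, na.rm = TRUE)),
    # Highest score among members; NA (not -Inf) when nobody has a score
    HouseholdScore = if (all(is.na(Score))) NA_real_ else max(Score, na.rm = TRUE),
    .groups = "drop"
  )

print(household_rollup)
```

Joining household_rollup back by HouseholdID and keeping only hoh == 1 rows would leave one row per household showing the rolled-up values.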