Comments (20)
@kimaina Thanks for the update on this! Just to ensure this is properly documented, I'm creating a little simulator for this data over in this repo: https://github.com/ibacher/hiv-test-data.
Is it possible to run the numbers over the same 5-ish year period? From the previous data set, it looked like June 2020 - November 2020 was a bit of an outlier period (I'm guessing it's related to the current pandemic).
from fhir-data-pipes.
Thanks, @jecihjoy for documenting this!
from fhir-data-pipes.
Please find attached monthly summary statistics for key metrics in the past 5 years (2015-2020). To improve granularity, I have provided monthly instead of yearly summaries so that when we generate the synthetic dataset it can reflect this distribution and pattern.
encounters_monthly_aggregates.xlsx
Here is the SQL query used to generate (FYI this is just a draft and not final):
SELECT
date_format(encounter_datetime,'%m-%Y') as month_year,
MONTH(encounter_datetime) as month,
YEAR(encounter_datetime) as year,
COUNT(DISTINCT `encounter_id`) AS distinct_encounters,
COUNT(DISTINCT `visit_id`) AS distinct_visits,
COUNT(DISTINCT if(prev_clinical_datetime_hiv is null,`person_id`,null)) AS newly_enrolled,
COUNT(DISTINCT if(encounter_datetime >= arv_start_date,`person_id`,null)) as patients_on_arvs,
COUNT(DISTINCT if((encounter_datetime < arv_start_date or arv_start_date is null),`person_id`,null)) as patients_not_on_arvs,
COUNT(DISTINCT if(vl_1 < 1000, `encounter_id` ,null)) as vl_suppressed_encounters,
COUNT( DISTINCT if(vl_1 >= 1000, `encounter_id`,null)) as vl_failure_encouunters,
COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_datetime, INTERVAL 30 DAY)), endDate) > 90, `person_id`, null)) as LTFU
FROM
etl.dates
JOIN etl.flat_hiv_summary_v15b on date(encounter_datetime) <= date(endDate)
WHERE is_clinical_encounter = 1 and (next_clinical_datetime_hiv is null or date(next_clinical_datetime_hiv) > endDate) and
encounter_datetime >= timestamp(date("2015-01-01")) #and encounter_datetime<= date('2020-03-01')
group by MONTH(encounter_datetime), YEAR(encounter_datetime)
order by encounter_datetime desc
from fhir-data-pipes.
@kimaina Thanks!
For generating VL values, from what I've understood, AMPATH pretty much consistently has actual VL values reported in the resulting Obs, but this isn't the case for all implementations. See this comment by Mike Seaton and just my general overview of the landscape I reviewed. So the question is: do we want to simulate data where we always have a numeric VL result or should we include some number of these non-numeric results? Opinions?
from fhir-data-pipes.
So the question is: do we want to simulate data where we always have a numeric VL result or should we include some number of these non-numeric results? Opinions?
I think we should simulate both quantitative and qualitative VL results since the majority of implementers have both. From my experience working on this, we normally have 2 different concepts representing either the quantitative value or the VL qualifier (LDL or <300).
from fhir-data-pipes.
from fhir-data-pipes.
@jonathandick CIEL:1305 (qualitative VL) is the base concept for qualitative VLs and at least CIEL:1302 (Not detected) and CIEL:1306 (Beyond detectable limit) are used to record suppressed VLs in KenyaEMR and UgandaEMR respectively. No idea how PIH are representing the "< 300" or "< 500" part. Theoretically, OpenMRS has a ComplexNumeric datatype to handle those, but AFAIK it's never actually been used.
from fhir-data-pipes.
@kimaina Could you get us an estimate of the # encounters and visits / year (or month) for suppressed versus non-suppressed patients? JJ had indicated that VL non-suppressed patients will be seen more regularly and I'd like a way to estimate that, if possible. Thanks!
from fhir-data-pipes.
@kimaina Could you get us an estimate of the # encounters and visits / year (or month) for suppressed versus non-suppressed patients? JJ had indicated that VL non-suppressed patients will be seen more regularly and I'd like a way to estimate that, if possible. Thanks!
Hey @ibacher,I have provided these 4 estimates in the above excel --> distinct_encounters, distinct_visits, vl_suppressed_encounters, and vl_failure_encounters. I'll rename the above columns to be more understandable, in the next version
from fhir-data-pipes.
@ibacher do you prefer monthly aggregates or yearly aggregates?
from fhir-data-pipes.
@kimaina Ah! I see... then do I have a way of estimating the percent of patients suppressed vs patients unsuppressed?
do you prefer monthly aggregates or yearly aggregates?
Monthly is generally more useful, I just wasn't sure if a single patient was likely to have multiple visits per month or not.
from fhir-data-pipes.
I'll also add-in
- the number of unique patients per months
- of these patients how many are suppressed
- and how many are not
@kimaina Ah! I see... then do I have a way of estimating the percent of patients suppressed vs patients unsuppressed?
from fhir-data-pipes.
Monthly is generally more useful, I just wasn't sure if a single patient was likely to have multiple visits per month or not.
We can also do yearly + monthly aggregates
from fhir-data-pipes.
I have updated the query to include a few more metrics as well as some of @jonathandick and @ibacher suggestions e.g :
- How likely is it that a patient will remain active in care this month?
- How likely is it that a patient will get a viral load test this month if the patient is eligible for a test?
- How likely is it that the test will be < 1000?
This is still a draft and I still need to add @ibacher suggestions on
- % of patients LTFU who return to care and, ideally, the meantime to return to care.
SELECT
date_format(endDate,'%m-%Y') as month_year,
# Visits
SUM(hmsd.visit_this_month) AS `visit_this_month`,
SUM(hmsd.scheduled_this_month) AS `scheduled_this_month`,
SUM(hmsd.unscheduled_this_month) AS `unscheduled_this_month`,
SUM(hmsd.enrolled_this_month) AS `enrolled_this_month`,
# LTFU and active in care
COUNT( DISTINCT `person_id` ) as distinct_patients,
SUM(hmsd.active_in_care_this_month) AS `active_in_care_this_month`,
AVG(hmsd.days_since_rtc_date) AS `avg_days_since_rtc_date`,
COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_date, INTERVAL 30 DAY)), endDate) > 90, `person_id`, null)) as LTFU_since_2015,
COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_date, INTERVAL 30 DAY)), endDate) > 90 AND TIMESTAMPDIFF(DAY, encounter_date,`endDate`) <365, `person_id`, null)) as LTFU_past_year,
COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_date, INTERVAL 30 DAY)), endDate) > 90 AND TIMESTAMPDIFF(DAY, encounter_date,`endDate`) <365/2, `person_id`, null)) as LTFU_past_6months,
COUNT( DISTINCT if( days_since_rtc_date > 90, `person_id`, null)) as LTFU_this_month,
COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, death_date,`endDate`) < 30,`person_id`, null)) as deaths_this_month,
COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, death_date,`endDate`) < 365,`person_id`, null)) as deaths_past_year,
COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, transfer_out_date,`endDate`) < 30,`person_id`, null)) as transfer_out_this_month,
COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, transfer_out_date,`endDate`) < 365,`person_id`, null)) as transfer_out_past_year,
# ART Status
SUM(hmsd.art_revisit_this_month) AS `art_revisit_this_month`,
SUM(hmsd.is_pre_art_this_month) AS `pre_art_this_month`,
SUM(hmsd.on_art_this_month) AS `on_art_this_month`,
SUM(hmsd.started_art_this_month) AS `started_art_this_month`,
SUM(had_med_change_this_month) AS `had_med_change_this_month`,
# ART Line
SUM(hmsd.on_original_first_line_this_month) AS `on_original_first_line_this_month`,
SUM(hmsd.on_alt_first_line_this_month) AS `on_alt_first_line_this_month`,
SUM(hmsd.on_second_line_or_higher_this_month) AS `on_second_line_or_higher_this_month`,
# Viral Load
COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `encounter_id`, null)) as vl_suppressed_encounters_this_month,
COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `encounter_id`, null)) as vl_unsuppressed_encounters_this_month,
COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `person_id`, null)) as vl_suppressed_patients_this_month,
COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `person_id`, null)) as vl_unsuppressed_patients_this_month,
COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 365, `person_id`, null)) as vl_suppressed_patients_past_year,
COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) <365, `person_id`, null)) as vl_unsuppressed_patients_past_year,
COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 365, `encounter_id`, null)) as vl_suppressed_encounters_past_year,
COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 365, `encounter_id`, null)) as vl_unsuppressed_encounters_past_year,
SUM(hmsd.due_for_vl_this_month) AS `due_for_vl_this_month`,
SUM(hmsd.qualifies_for_follow_up_vl) AS `qualifies_for_follow_up_vl`,
SUM(hmsd.got_follow_up_vl_this_month) AS `got_follow_up_vl_this_month`,
SUM(hmsd.follow_up_vl_suppressed_this_month) AS `follow_up_vl_suppressed_this_month`,
SUM(hmsd.follow_up_vl_unsuppressed_this_month) AS `follow_up_vl_unsuppressed_this_month`,
SUM(hmsd.follow_up_vl_suppressed_this_month) AS `follow_up_vl_unsuppressed`,
SUM(hmsd.follow_up_vl_suppressed_this_month) AS `follow_up_vl_suppressed`,
AVG(hmsd.num_days_to_follow_vl) AS `avg_num_days_to_follow_vl`
FROM
etl.hiv_monthly_report_dataset_frozen `hmsd`
WHERE
(endDate >= '2015-01-01')
GROUP BY endDate
order by endDate desc
here is a sample xlxs
2020_monthly_aggregates.xlsx
from fhir-data-pipes.
hey, @ibacher I have completed generating the necessary data elements for generating the required metrics for the simulation. Quick question, do you prefer proportions or percentages?
https://github.com/kimaina/vl-indicators/blob/main/2020_monthly_aggregates_v2.csv
from fhir-data-pipes.
I’ve been converting everything into percentages, apart from raw counts, but whatever is easiest!
from fhir-data-pipes.
you mean proportions, right! I'm happy to provide proportions and counts
from fhir-data-pipes.
I have completed creating the query with the recommendations, I'll be running it during off-peak to generate the dataset then get back to you
from fhir-data-pipes.
Hi @ibacher
Please see the PR: https://github.com/ibacher/hiv-test-data/pulls
You will find:
proportions.csv
containing key proportionsscript.Rmd
is the R script used to generate portions2020_monthly_aggregates_v2.csv
raw dataset containing over 55 indicators
from fhir-data-pipes.
@kimaina Thanks for the PRs and all the data!
from fhir-data-pipes.
Related Issues (20)
- Setup Sonar for developers to identify bugs through static code analysis
- Sonar detected bugs on metrics package of pipeline-controller
- Fix Code smells in pipeline-controller
- Investigate Cloud Build flakiness because of dockerised pipeline runs HOT 4
- compose-controller-spark-sql-single.yaml fails to launch HOT 2
- Remove `hiveJdbcDriver` configuration property and unify driver loading
- New recurring Thrift server errors HOT 3
- Upgrade HAPI FHIR version
- Fix issues with `compose-controller-spark-sql-single.yaml` HOT 6
- Do not display the `View Raw Logs` button in case of no logs
- In the HAPI JDBC mode, when resources are created with PUT, resource Id's mismatch between the original FHIR resource and the corresponding resource in the parquet file. HOT 4
- Missing extension in the parquet file compared to source FHIR resource. HOT 1
- How to handle DataFormatException while parsing JSON encoded FHIR content HOT 1
- Make recursive depth of Bunsen a configuration parameter. HOT 1
- The `answer` fields are dropped in QuestionnaireResponse due to recursive structure.
- Unable to create Individual columns of nested array of objects in a dataframe HOT 9
- Investigate high memory utilisation for the pipeline controller and provide configurations to control them HOT 3
- Cannot override fhirServerUserName or fhirServerPassword as it keeps defaulting HOT 6
- Help needed running a fhri pipeline controller HOT 6
- Gracefully handle resource types with no instances in the FHIR server. HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from fhir-data-pipes.