Giter Club home page Giter Club logo

Comments (20)

ibacher avatar ibacher commented on June 7, 2024 1

@kimaina Thanks for the update on this! Just to ensure this is properly documented, I'm creating a little simulator for this data over in this repo: https://github.com/ibacher/hiv-test-data.

Is it possible to run the numbers over the same 5-ish year period? From the previous data set, it looked like June 2020 - November 2020 was a bit of an outlier period (I'm guessing it's related to the current pandemic).

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

Thanks, @jecihjoy for documenting this!

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

Please find attached monthly summary statistics for key metrics in the past 5 years (2015-2020). To improve granularity, I have provided monthly instead of yearly summaries so that when we generate the synthetic dataset it can reflect this distribution and pattern.

encounters_monthly_aggregates.xlsx

Here is the SQL query used to generate (FYI this is just a draft and not final):

SELECT 
	date_format(encounter_datetime,'%m-%Y') as month_year,
	MONTH(encounter_datetime) as month,
	YEAR(encounter_datetime) as year,    
    COUNT(DISTINCT `encounter_id`) AS distinct_encounters,
	COUNT(DISTINCT `visit_id`) AS distinct_visits,
	COUNT(DISTINCT if(prev_clinical_datetime_hiv is null,`person_id`,null)) AS newly_enrolled,
    COUNT(DISTINCT if(encounter_datetime >= arv_start_date,`person_id`,null)) as patients_on_arvs,
    COUNT(DISTINCT if((encounter_datetime < arv_start_date  or arv_start_date is null),`person_id`,null)) as patients_not_on_arvs,
	COUNT(DISTINCT if(vl_1 < 1000, `encounter_id` ,null)) as vl_suppressed_encounters,
    COUNT( DISTINCT if(vl_1 >= 1000, `encounter_id`,null)) as vl_failure_encouunters,
    COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_datetime, INTERVAL 30 DAY)), endDate) > 90,  `person_id`, null)) as LTFU
    
FROM
    etl.dates 
JOIN etl.flat_hiv_summary_v15b  on date(encounter_datetime) <= date(endDate)
WHERE is_clinical_encounter = 1  and (next_clinical_datetime_hiv is null or date(next_clinical_datetime_hiv) > endDate) and
encounter_datetime >= timestamp(date("2015-01-01"))  #and encounter_datetime<= date('2020-03-01')
group by MONTH(encounter_datetime),  YEAR(encounter_datetime)
order by encounter_datetime desc

from fhir-data-pipes.

ibacher avatar ibacher commented on June 7, 2024

@kimaina Thanks!

For generating VL values, from what I've understood, AMPATH pretty much consistently has actual VL values reported in the resulting Obs, but this isn't the case for all implementations. See this comment by Mike Seaton and just my general overview of the landscape I reviewed. So the question is: do we want to simulate data where we always have a numeric VL result or should we include some number of these non-numeric results? Opinions?

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

So the question is: do we want to simulate data where we always have a numeric VL result or should we include some number of these non-numeric results? Opinions?

I think we should simulate both quantitative and qualitative VL results since the majority of implementers have both. From my experience working on this, we normally have 2 different concepts representing either the quantitative value or the VL qualifier (LDL or <300).

from fhir-data-pipes.

jonathandick avatar jonathandick commented on June 7, 2024

from fhir-data-pipes.

ibacher avatar ibacher commented on June 7, 2024

@jonathandick CIEL:1305 (qualitative VL) is the base concept for qualitative VLs and at least CIEL:1302 (Not detected) and CIEL:1306 (Beyond detectable limit) are used to record suppressed VLs in KenyaEMR and UgandaEMR respectively. No idea how PIH are representing the "< 300" or "< 500" part. Theoretically, OpenMRS has a ComplexNumeric datatype to handle those, but AFAIK it's never actually been used.

from fhir-data-pipes.

ibacher avatar ibacher commented on June 7, 2024

@kimaina Could you get us an estimate of the # encounters and visits / year (or month) for suppressed versus non-suppressed patients? JJ had indicated that VL non-suppressed patients will be seen more regularly and I'd like a way to estimate that, if possible. Thanks!

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

@kimaina Could you get us an estimate of the # encounters and visits / year (or month) for suppressed versus non-suppressed patients? JJ had indicated that VL non-suppressed patients will be seen more regularly and I'd like a way to estimate that, if possible. Thanks!

Hey @ibacher,I have provided these 4 estimates in the above excel --> distinct_encounters, distinct_visits, vl_suppressed_encounters, and vl_failure_encounters. I'll rename the above columns to be more understandable, in the next version

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

@ibacher do you prefer monthly aggregates or yearly aggregates?

from fhir-data-pipes.

ibacher avatar ibacher commented on June 7, 2024

@kimaina Ah! I see... then do I have a way of estimating the percent of patients suppressed vs patients unsuppressed?

do you prefer monthly aggregates or yearly aggregates?

Monthly is generally more useful, I just wasn't sure if a single patient was likely to have multiple visits per month or not.

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

I'll also add-in

  • the number of unique patients per months
  • of these patients how many are suppressed
  • and how many are not

@kimaina Ah! I see... then do I have a way of estimating the percent of patients suppressed vs patients unsuppressed?

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

Monthly is generally more useful, I just wasn't sure if a single patient was likely to have multiple visits per month or not.

We can also do yearly + monthly aggregates

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

I have updated the query to include a few more metrics as well as some of @jonathandick and @ibacher suggestions e.g :

  • How likely is it that a patient will remain active in care this month?
  • How likely is it that a patient will get a viral load test this month if the patient is eligible for a test?
  • How likely is it that the test will be < 1000?

This is still a draft and I still need to add @ibacher suggestions on

  • % of patients LTFU who return to care and, ideally, the meantime to return to care.
SELECT 
	date_format(endDate,'%m-%Y') as month_year,
    # Visits 
    SUM(hmsd.visit_this_month) AS `visit_this_month`,
    SUM(hmsd.scheduled_this_month) AS `scheduled_this_month`,
    SUM(hmsd.unscheduled_this_month) AS `unscheduled_this_month`,
    SUM(hmsd.enrolled_this_month) AS `enrolled_this_month`,
    
    # LTFU and active in care
    COUNT( DISTINCT `person_id` ) as distinct_patients,
    SUM(hmsd.active_in_care_this_month) AS `active_in_care_this_month`,
    AVG(hmsd.days_since_rtc_date) AS `avg_days_since_rtc_date`,
    COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_date, INTERVAL 30 DAY)), endDate) > 90,  `person_id`, null)) as LTFU_since_2015,
    COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_date, INTERVAL 30 DAY)), endDate) > 90  AND TIMESTAMPDIFF(DAY, encounter_date,`endDate`) <365,  `person_id`, null)) as LTFU_past_year,
    COUNT( DISTINCT if( timestampdiff(day, if(rtc_date, rtc_date, DATE_ADD(encounter_date, INTERVAL 30 DAY)), endDate) > 90  AND TIMESTAMPDIFF(DAY, encounter_date,`endDate`) <365/2,  `person_id`, null)) as LTFU_past_6months,
    COUNT( DISTINCT if( days_since_rtc_date > 90,  `person_id`, null)) as LTFU_this_month,
    COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, death_date,`endDate`) < 30,`person_id`, null)) as deaths_this_month,
    COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, death_date,`endDate`) < 365,`person_id`, null)) as deaths_past_year,
    COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, transfer_out_date,`endDate`) < 30,`person_id`, null)) as transfer_out_this_month,
    COUNT( DISTINCT if(TIMESTAMPDIFF(DAY, transfer_out_date,`endDate`) < 365,`person_id`, null)) as transfer_out_past_year,
    
	# ART Status
	SUM(hmsd.art_revisit_this_month) AS `art_revisit_this_month`,
        SUM(hmsd.is_pre_art_this_month) AS `pre_art_this_month`,
	SUM(hmsd.on_art_this_month) AS `on_art_this_month`,
         SUM(hmsd.started_art_this_month) AS `started_art_this_month`,
	SUM(had_med_change_this_month) AS `had_med_change_this_month`,
     
     # ART Line
     SUM(hmsd.on_original_first_line_this_month) AS `on_original_first_line_this_month`,
     SUM(hmsd.on_alt_first_line_this_month) AS `on_alt_first_line_this_month`,
     SUM(hmsd.on_second_line_or_higher_this_month) AS `on_second_line_or_higher_this_month`,

	# Viral Load
    COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `encounter_id`, null)) as vl_suppressed_encounters_this_month,
    COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `encounter_id`, null)) as vl_unsuppressed_encounters_this_month,
    COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `person_id`, null)) as vl_suppressed_patients_this_month,
    COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 30, `person_id`, null)) as vl_unsuppressed_patients_this_month,
    COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 365, `person_id`, null)) as vl_suppressed_patients_past_year,
    COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) <365, `person_id`, null)) as vl_unsuppressed_patients_past_year,
    COUNT( DISTINCT if(vl_1 < 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 365, `encounter_id`, null)) as vl_suppressed_encounters_past_year,
    COUNT( DISTINCT if(vl_1 >= 1000 AND TIMESTAMPDIFF(DAY, vl_1_date,`endDate`) < 365, `encounter_id`, null)) as vl_unsuppressed_encounters_past_year,
     
     
	SUM(hmsd.due_for_vl_this_month) AS `due_for_vl_this_month`,
	SUM(hmsd.qualifies_for_follow_up_vl) AS `qualifies_for_follow_up_vl`,
        SUM(hmsd.got_follow_up_vl_this_month) AS `got_follow_up_vl_this_month`,
        SUM(hmsd.follow_up_vl_suppressed_this_month) AS `follow_up_vl_suppressed_this_month`,
        SUM(hmsd.follow_up_vl_unsuppressed_this_month) AS `follow_up_vl_unsuppressed_this_month`,
        SUM(hmsd.follow_up_vl_suppressed_this_month) AS `follow_up_vl_unsuppressed`,
        SUM(hmsd.follow_up_vl_suppressed_this_month) AS `follow_up_vl_suppressed`,
        AVG(hmsd.num_days_to_follow_vl) AS `avg_num_days_to_follow_vl`
    
    
    

FROM
    etl.hiv_monthly_report_dataset_frozen `hmsd`
WHERE
    (endDate >= '2015-01-01')
GROUP BY endDate
order by endDate desc

here is a sample xlxs
2020_monthly_aggregates.xlsx

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

hey, @ibacher I have completed generating the necessary data elements for generating the required metrics for the simulation. Quick question, do you prefer proportions or percentages?

https://github.com/kimaina/vl-indicators/blob/main/2020_monthly_aggregates_v2.csv

from fhir-data-pipes.

ibacher avatar ibacher commented on June 7, 2024

I’ve been converting everything into percentages, apart from raw counts, but whatever is easiest!

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

you mean proportions, right! I'm happy to provide proportions and counts

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

I have completed creating the query with the recommendations, I'll be running it during off-peak to generate the dataset then get back to you

from fhir-data-pipes.

kimaina avatar kimaina commented on June 7, 2024

Hi @ibacher

Please see the PR: https://github.com/ibacher/hiv-test-data/pulls

You will find:

  • proportions.csv containing key proportions
  • script.Rmd is the R script used to generate portions
  • 2020_monthly_aggregates_v2.csv raw dataset containing over 55 indicators

from fhir-data-pipes.

ibacher avatar ibacher commented on June 7, 2024

@kimaina Thanks for the PRs and all the data!

from fhir-data-pipes.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.