phenotype_data_acquisition's Issues

Duplicate summary rows

Create table creates duplicate rows in the ACT and possibly the PCORnet scripts. It seems to create a row for each fact that satisfies the inclusion criteria. We probably need to add a subquery to grab only one fact.
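
A minimal sketch of the deduplication idea, run against an in-memory SQLite database rather than a real ACT or PCORnet instance. The table name qualifying_facts, its columns, and the code prefixes are hypothetical, and the sketch assumes the intent is one summary row per patient, keeping the earliest qualifying date.

```python
import sqlite3


def build_summary(conn: sqlite3.Connection) -> list:
    """Return one summary row per patient, however many facts qualify."""
    return conn.execute(
        """
        SELECT patient_num, MIN(start_date) AS first_qualifying_date
        FROM qualifying_facts          -- hypothetical staging table
        GROUP BY patient_num           -- collapses duplicate facts per patient
        ORDER BY patient_num
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qualifying_facts (patient_num INTEGER, concept_cd TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO qualifying_facts VALUES (?, ?, ?)",
    [
        (1, "LOINC:94306-8", "2020-04-02"),   # patient 1 has two qualifying facts
        (1, "ICD10CM:B97.29", "2020-04-03"),
        (2, "ICD10CM:B97.29", "2020-05-10"),
    ],
)
summary_rows = build_summary(conn)
print(summary_rows)
```

Grouping on the patient key is one option; a correlated subquery or ROW_NUMBER() filter would work as well if the summary needs other columns from the chosen fact.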

Additional LOINCs from PCORnet's list

I compared the N3C phenotype to the PCORnet phenotype. The following LOINCs were included in the PCORnet list but not N3C. Consider for inclusion.

94306-8 | SARS coronavirus 2 RNA panel - Unspecified specimen by NAA with probe detection
94503-0 | SARS coronavirus 2 IgG and IgM panel - Serum or Plasma Qualitative by Rapid immunoassay
94504-8 | SARS coronavirus 2 IgG and IgM panel - Serum or Plasma by Immunoassay
94531-1 | SARS coronavirus 2 RNA panel - Respiratory specimen by NAA with probe detection

Same encounter/Same day

Change phenotype to require two weak dx in the same encounter and/or the same day (instead of just encounter)
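
A rough sketch of the proposed rule in Python, to make the "same encounter and/or same day" logic concrete. The WeakDx record, the placeholder codes WEAK_A and WEAK_B, and the choice to require two distinct codes are all assumptions for illustration, not the phenotype's actual definition.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class WeakDx(NamedTuple):
    """Hypothetical record for one weak-positive diagnosis fact."""
    patient_id: int
    encounter_id: Optional[int]
    dx_date: date
    code: str


def patients_with_two_weak_dx(facts: Iterable[WeakDx]) -> Set[int]:
    """Patients with two distinct weak dx codes in one encounter OR on one day."""
    by_encounter: Dict[Tuple[int, int], Set[str]] = {}
    by_day: Dict[Tuple[int, date], Set[str]] = {}
    for fact in facts:
        if fact.encounter_id is not None:
            by_encounter.setdefault((fact.patient_id, fact.encounter_id), set()).add(fact.code)
        by_day.setdefault((fact.patient_id, fact.dx_date), set()).add(fact.code)

    qualifying: Set[int] = set()
    for groups in (by_encounter, by_day):
        for (patient_id, _), codes in groups.items():
            if len(codes) >= 2:
                qualifying.add(patient_id)
    return qualifying


facts = [
    WeakDx(1, 10, date(2020, 4, 1), "WEAK_A"),
    WeakDx(1, 10, date(2020, 4, 2), "WEAK_B"),  # same encounter, different days
    WeakDx(2, 20, date(2020, 4, 5), "WEAK_A"),
    WeakDx(2, 21, date(2020, 4, 5), "WEAK_B"),  # different encounters, same day
    WeakDx(3, 30, date(2020, 4, 1), "WEAK_A"),
    WeakDx(3, 31, date(2020, 4, 9), "WEAK_B"),  # neither condition holds
]
qualifying_patients = patients_with_two_weak_dx(facts)
print(sorted(qualifying_patients))
```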

Post ACT_COVID ontology concept dimension table

When V3 of the ACT_COVID ontology is released, post a subset of the concept_dimension table (name_char code, concept_path) for use with the ACT path-based code at sites that do not want to implement the entire ontology. This would also document the codes used.

Two ACT code changes

  1. Need to remove the variable declarations from the top of the script, as we discovered they may not work unless you're specifically using SQL Developer as your client-side tool.

  2. Need to add comments showing where multi-fact models should swap in their own table names in the NON-path-based queries.

The path-based query did not retrieve any patients at UNC, I suspect because we don't have the covid ontology installed? The non-path-based query worked perfectly.

Zip file structure

Need to change the Python extract script to structure output like so:

DATA_COUNTS and MANIFEST at the root level of the directory
All other tables in a subdirectory called "datafiles"
Parent directory named per the convention in the documentation.
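
The layout above could be sketched roughly like this. This is not the actual extract script: the function names, the assumption that each table is already a .csv file in one folder, and the archive name site_extract.zip are placeholders (the real parent directory name should follow the documentation).

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List

# Tables that stay at the root of the archive, per the layout described above.
ROOT_TABLES = {"DATA_COUNTS", "MANIFEST"}


def archive_name_for(csv_path: Path) -> str:
    """Root level for DATA_COUNTS/MANIFEST, datafiles/ for everything else."""
    if csv_path.stem.upper() in ROOT_TABLES:
        return csv_path.name
    return f"datafiles/{csv_path.name}"


def build_zip(csv_dir: Path, zip_path: Path) -> List[str]:
    """Zip every CSV in csv_dir using the required structure; return member names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for csv_path in sorted(csv_dir.glob("*.csv")):
            zf.write(csv_path, arcname=archive_name_for(csv_path))
        return zf.namelist()


# Small demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "csv"
    src.mkdir()
    for table in ("PERSON", "MANIFEST", "DATA_COUNTS"):
        (src / f"{table}.csv").write_text("placeholder\n")
    members = build_zip(src, Path(tmp) / "site_extract.zip")

print(members)
```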

Split out lab categories

This does not impact the inclusion criteria, but rather the categories for downstream analysis. There is likely a need to split out our "lab confirmed negative" and "lab confirmed positive" categories into two--one for PCR tests, and one for antibody tests. Having a positive or negative means different things for those two types of tests.

OMOP Exporter README

Please create a README (or Wiki page) describing how to use the OMOP Exporter.

B97.21

Suggestion to downgrade B97.21 ICD-10 code to a weak positive on or after 4/1/2020, when the "real" covid ICD-10 code was released for use. This would be the same rule that we applied to B97.29.
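
For illustration, the date rule might look like the following in Python. The function name and constants are made up for this sketch; only the two codes and the 4/1/2020 cutoff come from the suggestion above.

```python
from datetime import date

# Cutoff from the suggestion: the dedicated COVID ICD-10 code became usable on 4/1/2020.
DOWNGRADE_CUTOFF = date(2020, 4, 1)

# Codes treated the same way: B97.29 (existing rule) and B97.21 (proposed).
DATE_SENSITIVE_CODES = {"B97.21", "B97.29"}


def is_downgraded_to_weak(icd10_code: str, dx_date: date) -> bool:
    """True when the code should count only as a weak positive on this date."""
    return icd10_code in DATE_SENSITIVE_CODES and dx_date >= DOWNGRADE_CUTOFF


print(is_downgraded_to_weak("B97.21", date(2020, 3, 31)))
print(is_downgraded_to_weak("B97.21", date(2020, 4, 1)))
```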

OMOP Vocab Update Needed

@cukarthik and I need to coordinate with the OMOP Vocabulary team to add the following new source_concepts:
94547-7 SARS-CoV-2 IgG + IgM, serum or plasma
94558-4 SARS-CoV-2 Ag, respiratory specimen
94559-2 SARS-CoV-2 ORF1ab region, respiratory specimen
94562-6 SARS-CoV-2 IgA, serum or plasma
94563-4 SARS-CoV-2 IgG, serum or plasma
94564-2 SARS-CoV-2 IgM, serum or plasma
94565-9 SARS-CoV-2 RNA, nasopharyngeal specimen
94639-2 SARS-CoV-2 ORF1ab region, unspecified specimen
94640-0 SARS-CoV-2 S gene, respiratory specimen
94641-8 SARS-CoV-2 S gene, unspecified specimen
94642-6 SARS-CoV-2 S gene, respiratory specimen
94643-4 SARS-CoV-2 S gene, unspecified specimen
94644-2 SARS-CoV-2 ORF1ab region, respiratory specimen
94645-9 SARS-CoV-2 RdRp gene, unspecified specimen
94646-7 SARS-CoV-2 RdRp gene, respiratory specimen
94647-5 SARS-related CoV, unspecified specimen
94660-8 SARS-CoV-2 RNA, serum/plasma
94661-6 SARS-CoV-2 antibody interpretation

Current phenotype will be updated as these are included in OMOP vocabularies.

COVID-19 disease progression & category

The current 4-tier proposal may need an additional layer to capture the multi-faceted nature of COVID-19. Lagged coding in the EHR may also affect how we collect data for downstream research, so we need to think a bit outside the box for this brand new phenotype.

Based on observations thus far, the disease displays a spectrum of severity. For example, the Australian guidelines for the clinical care of people with COVID-19 (v2) (https://app.magicapp.org/app#/guideline/4179) classify severity as mild/moderate/severe/critical.

Many mild patients may not have symptoms. Those patients will probably not be recorded or coded in the EHR at first, but may be rediscovered by antibody tests and recorded later in the EHR or in different systems. Although we do not anticipate different ICD-10 codes being assigned across the severity spectrum at this early stage, I believe introducing this phenotype spectrum alongside the proposed tier concept will prepare us to handle both knowns and unknowns better down the line.

Possible additional codes

  1. We could incorporate the latest NLM SNOMED value sets (including COVID-19 and signs/symptoms, see attached).
  2. On the ICD-10 side, U072/U073 are also possible candidates, along with additional suspected codes such as J00-J06, J09-J11, R092, R918, etc.; also the new CMS HCPCS codes G2023 and G2024.
  3. Not sure whether we need to consider other severe COVID-19 complications, i.e. ARDS, sepsis (unique pathway), and ventilator-related codes?

2.16.840.1.113762.1.4.1114.7.xlsx
2.16.840.1.113762.1.4.1181.51.xlsx

GBQ errors with R package

From [email protected]:

BigQuery SqlRender issues for N3C R package execution:

  1. generate_cohort.sql:
    SqlRender translates the statement “create table #Codesets()” into “create table ttav56kacodesets()”. The translated statement is invalid because the dataset name is missing in front of the table name. The correct output should be “create table [dataset name].ttav56kacodesets()”. The dataset name in this project could be @resultsDatabaseSchema.
  2. source_extract_scripts.sql:
    a. Convert() is not a BigQuery function and has not been mapped in the SqlRender replacement patterns. The extract script has CONVERT(VARCHAR(20),OBSERVATION_PERIOD_START_DATE, 120); after SqlRender, it becomes convert(STRING,observation_period_start_date, 120).
    This is the expression we adjusted for BigQuery: format_datetime('%Y-%m-%d %H:%M:%S',datetime(observation_period_start_date))
    b. The date literal in the where clause is not transformed to a valid BigQuery date format. The extract script has visit_start_date >= '1/1/20', whereas BigQuery accepts visit_start_date >= '2018-01-01'.
    c. Convert() combined with date add/subtract. For the last query, the extract script has CONVERT(VARCHAR(20), GETDATE() -2, 120) as UPDATE_DATE. The first problem is the convert part; the second is the date add/subtract part. BigQuery accepts the following format: format_datetime('%Y-%m-%d %H:%M:%S',datetime( date_add(CURRENT_DATE(), interval -2 day))) as UPDATE_DATE
  3. May or may not be an issue: the output file combines all columns into one Excel column. For example, PERSON should have 15 columns from source_extract_scripts.sql, but the output csv file has only one column, which contains all 15 values separated by “|”.
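
On the last point, the single-column appearance is what a pipe-delimited file looks like when it is opened with a comma delimiter. A small Python check, using made-up sample content with three PERSON columns rather than the full 15, shows that the columns separate once the delimiter is set to "|":

```python
import csv
import io

# Made-up sample: a pipe-delimited extract with a header row and one data row.
sample = "PERSON_ID|GENDER_CONCEPT_ID|YEAR_OF_BIRTH\n1|8507|1980\n"

# Default (comma) parsing sees one column per row.
comma_rows = list(csv.reader(io.StringIO(sample)))

# Telling the reader about the pipe delimiter recovers the real columns.
pipe_rows = list(csv.reader(io.StringIO(sample), delimiter="|"))

print(len(comma_rows[0]), len(pipe_rows[0]))
```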

result coded values

For LOINC-encoded lab tests, there is also a need to standardize the result values, for example "Positive" vs. "Detected".
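
One possible shape for this, sketched in Python. The synonym table is entirely hypothetical (only "Positive" and "Detected" are mentioned above), and a real mapping would need to be agreed on and built from the values sites actually send.

```python
from typing import Optional

# Hypothetical synonym table; extend with the raw values sites actually report.
RESULT_SYNONYMS = {
    "positive": "POSITIVE",
    "detected": "POSITIVE",
    "negative": "NEGATIVE",
    "not detected": "NEGATIVE",
}


def standardize_result(raw: Optional[str]) -> Optional[str]:
    """Map a raw qualitative result to a standard value; nulls stay null."""
    if raw is None:
        return None
    return RESULT_SYNONYMS.get(raw.strip().lower(), "UNMAPPED")


print(standardize_result("Detected"), standardize_result(" POSITIVE "))
```

Unrecognized values are returned as "UNMAPPED" here so they can be reviewed rather than silently dropped.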

Add negative flu test as selection criterion

Suggestion from Ken Gersing:

Are we looking for influenza tests with negative results as part of the phenotype? We are being told that a lot of folks early on, before the COVID LOINC codes existed, were being tested for influenza, and many of the negative folks are probably COVID patients.

Add assumptions to OMOP documentation

Kristin noted a couple of assumptions for OMOP on the Colorado call: Updating vocab regularly and using ERA tables. Can you add these and any others to the OMOP documentation please?

Multi-fact model

For the ACT phenotype code--do we need to accommodate sites that use the multi-fact model? I know we can't know their table names, so no point in writing out code for it exactly, but maybe add comments indicating where sites would swap in their specific table names in the code?

R script uses NA instead of nulls

Can we prevent the R script from inserting its "NA" convention where nulls should be? We want the nulls to stay null in the final CSVs. Thanks!

Reformat phenotype documentation

Separate into two sections: inclusion criteria (without categories) and categories. This should clear up confusion around whether sites need to precalculate the categories in their phenotypes (they shouldn't).

Discuss changing ACT extraction

After looking at how the mapping is being approached, I want to change the ACT extraction to export multiple fact tables using the ontology; otherwise there is no way the Harmonization team is going to be able to back into the OMOP mapping. So there will be at least 4 OBSERVATION_FACT tables (diagnoses, procedures, labs, and meds), and demographics would change as well so that the Hispanic element is included in the patient table.

Feedback from UTH

"To be included as a "suspected positive"/"probable positive" (over 50% patients should have COVID), the use of CPT code 86318 (infectious agent antibody) is going to potentially greatly skew these numbers. This code is used generically for many rapid infectious agent antibody tests. CPT 86328 is created specifically for COVID-19 rapid antibody testing. Again, they might think there could be miscoding here, but I would worry that this code is used extensively for other infectious agents and could affect the numbers. "

Community Issue: Want to help us test?

If you are an ACT site using SQL Server, we would love to make sure the SQL Server versions of our ACT phenotype and extract code run with no errors. If you are able to run one or both of the ACT phenotype scripts (path-based and hardcoded) and the ACT extract script, drop us a comment with the results.

You'll have our gratitude!

Fix extract mistake

I realized a silly thing I did in the PCORnet extract scripts (which I have since fixed)--please check to see if my mistake trickled down to the other models! In the DATA_COUNTS table, I neglected to join to the N3C_COHORT table each time (and add the 1/1/2018 date range where it made sense), so I was just counting the number of rows in the base table rather than the N3C extract.

Please add to your code (you can use mine as an example) if you don't have this already. Thanks!
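
To show the difference between the two counts, here is a toy version against an in-memory SQLite database. The ENCOUNTER and N3C_COHORT column names follow PCORnet conventions but are simplified assumptions, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE N3C_COHORT (PATID TEXT)")
conn.execute("CREATE TABLE ENCOUNTER (ENCOUNTERID TEXT, PATID TEXT, ADMIT_DATE TEXT)")
conn.execute("INSERT INTO N3C_COHORT VALUES ('P1')")
conn.executemany(
    "INSERT INTO ENCOUNTER VALUES (?, ?, ?)",
    [
        ("E1", "P1", "2019-03-01"),  # in cohort, inside the date range
        ("E2", "P1", "2017-06-01"),  # in cohort, before 1/1/2018
        ("E3", "P2", "2020-04-01"),  # not in cohort
    ],
)

# The mistake: counts every row in the base table.
naive_count = conn.execute("SELECT COUNT(*) FROM ENCOUNTER").fetchone()[0]

# The fix: join to the cohort and apply the 1/1/2018 date range.
extract_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM ENCOUNTER e
    JOIN N3C_COHORT c ON c.PATID = e.PATID
    WHERE e.ADMIT_DATE >= '2018-01-01'
    """
).fetchone()[0]

print(naive_count, extract_count)
```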

Wording Consistency

The first section names the four tiers as “lab-confirmed positive,” “lab-confirmed negative,” “probable positive,” and “possible positive”; the next section, “Phenotype rules”, refers to “lab-confirmed positive cases,” “lab-confirmed negative cases,” “suspected positive cases,” and “possible positive cases”. One of the two could be changed for consistency.

TriNetX MANIFEST table

TriNetX to determine how to handle the MANIFEST table; this will need to be filled out by TriNetX when they run the script (which means they need to know the right information from the site), or a process is needed to have the site create their own MANIFEST table before submitting data.

Clean up stray OMOP files

  1. Clean up any unneeded files from the OMOPExporter folder.
  2. Within \OMOPExporter\inst\sql\sql_server, make sure that the "generate_cohort.sql" and "source_extract_scripts.sql" files are running the logic as you expect and contain all the queries. The individual SQL files should then be removed.

Phenotype picks up lots of children

Thought for future versions--we have reports from UNC and UTH that the phenotype picks up a disproportionate number of children. This may not be an issue, as we know there will be many false positives in the mix, but worth thinking about whether we need to add some criteria to narrow down the number of kids that qualify.

ACT acquisition script

For sites that have the ACT i2b2 phenotype, wouldn't it make sense to start with a query like:

-- All facts whose concept falls under the ACT phenotype path
SELECT * FROM observation_fact WHERE concept_cd IN
     (SELECT concept_cd FROM concept_dimension WHERE concept_path LIKE
      '\\ACT\\UMLS_C0031437\\SNOMED_3947185011\\%');

This would catch updates to the phenotype and automagically include synthetic types as well as local mapping. If we wanted to use the local mappings, it would also be necessary to add something of the form:

SELECT * from concept_dimension where concept_path like 
'\\ACT\\UMLS_C0031437\\SNOMED_3947185011\\%';

Antibody tests

On a different note: will antibody tests be classified in the lab-confirmed positive/negative tiers? Some past patient cases were reported as "lab-confirmed negative" but confirmed as COVID-19 pneumonia by imaging (i.e. CT/x-ray). Do we treat those as "probable positive"? Is there any value in including imaging diagnoses in the phenotype? A similar puzzle is the existence of L/H phenotypes during ICU treatment. There are many unknowns, so we need to design this unique phenotype definition so that those unknowns can be uncovered in downstream analysis.

Originally posted by @johnl8888 in #5 (comment)

SITE_NAME is missing in the manifest table

The manifest table only contains the SITE_ABBREV and it is missing SITE_NAME.
We use this field to set the site's full name in our data quality dashboard when generating the output files via OMOP's DQDashboard. Could you add a SITE_NAME field to the manifest table?
