generalized_data_model's People

Contributors

aguynamedryan, benhannel, marc-outins, markdanese

generalized_data_model's Issues

allow source value and source vocab to be added to clinical codes table

I think we need to give users the option, during the ETL, of including a "source value" and "source vocab" in the clinical codes table. It should not be there by default, but it will make the table easier to use for people who want it. I realize it is one join away via the vocab table, but I believe it will help make the model understandable at first. And our first order of business is to convince people that this is a better way forward.
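For comparison, a minimal sketch of that "one join away" lookup using Python's built-in sqlite3 module. The table and column names (concepts, clinical_codes, clinical_code_concept_id, vocabulary_id, concept_code) are hypothetical stand-ins chosen for illustration, not the model's actual DDL.

```python
import sqlite3

# Hypothetical minimal schema; names are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concepts (
    id INTEGER PRIMARY KEY,
    vocabulary_id TEXT NOT NULL,
    concept_code TEXT NOT NULL
);
CREATE TABLE clinical_codes (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    clinical_code_concept_id INTEGER NOT NULL REFERENCES concepts(id)
);
INSERT INTO concepts VALUES (1, 'ICD10CM', 'E11.9'), (2, 'CPT4', '99213');
INSERT INTO clinical_codes VALUES (10, 100, 1), (11, 100, 2);
""")

# Recover the source value and source vocabulary for each clinical code
# by joining to the vocabulary table instead of storing them inline.
rows = conn.execute("""
SELECT cc.id, c.concept_code AS source_value, c.vocabulary_id AS source_vocab
FROM clinical_codes AS cc
JOIN concepts AS c ON c.id = cc.clinical_code_concept_id
ORDER BY cc.id
""").fetchall()

for row in rows:
    print(row)
```

Storing the two columns directly would trade this join for duplicated text on every clinical code row.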

Change order of the tables in readme

Change the order of the tables in the Readme doc so that the flow makes more sense. Possibly providers first, then claims, then claims_providers, then lines, etc.

need a hospitalization details table

This will have admit, discharge, length of stay, admit source, and discharge source.

Need to consider whether this applies to outpatient facility claims, for example, observation stays.

Consider changing type_concept_id to prov_type_concept_id

Currently the column type_concept_id exists in the contexts, clinical_codes, and admission_details tables. The column name is not very descriptive about what goes in it. For the most part, this column describes the provenance of the record, so we should consider renaming it to prov_type_concept_id or something similar.

add collection_type_concept_id

We don't have a way of assigning a type to a collection record. This is useful for visit-based data like CPRD or other EHR data. It might contain "visit" or "claim" or consultation type (COT with 61 possible types) from CPRD.

Consider dividing the cost table into the cost details table and the collections costs table

Joining through the contexts table is an elegant way to have a single table, but I think it might be cleaner to have cost tables at the two levels where costs are generally represented. It also makes it clearer that costs can be duplicated (for example, at the claim and line levels).

The cost details table would be like the drug and measurement tables and would provide cost details for individual clinical codes records (generally procedures or prescriptions).

The collections costs table would be for summarized (claims) costs and visit costs.

line and claim table renaming

We should rename lines to something like "related_details" and claims to something like "grouped_records". It is possible, but not required, to group related details.

Clinical details and clinical conditions can be related to each other using related details. So, systolic and diastolic blood pressure could be connected together. Diagnoses and procedures can be connected together. And sets of clinical codes (diagnoses, procedures, etc.) and clinical details (labs and other observations) can all be linked together.

Rename claims and lines tables

Potentially rename claims and lines tables to something more generic to other types of data. Possibly encounters (lines) and grouped_encounters (claims).

add number_per_day to drug table

Days supplied is most commonly used in research, but the raw data might not include this information explicitly. CPRD does not report days supplied; instead it includes ndd, the numeric daily dose, which is the number of items to be used each day. Hence, if quantity is 60 and ndd is 2, then days supplied is 30.
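A minimal sketch of deriving days supplied from quantity and ndd as described above. The function name is hypothetical, and treating a missing or non-positive ndd as "cannot derive" is an assumption rather than a CPRD rule.

```python
from typing import Optional


def derive_days_supplied(quantity: float, ndd: Optional[float]) -> Optional[float]:
    """Days supplied = quantity / ndd (CPRD numeric daily dose).

    Returns None when ndd is missing or non-positive, since the division
    is undefined and no days supplied can be inferred.
    """
    if ndd is None or ndd <= 0:
        return None
    return quantity / ndd


print(derive_days_supplied(60, 2))  # 30.0
print(derive_days_supplied(60, 0))  # None
```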

Add details.line_id

Currently the details table only has a claim_id. Need to add line_id to details.

Change "type context id" to "record type context id"

This allows us to clarify that the items are linked together and connected as a single "record". Generally this is because they are all related in a domain-specific way. For example, they are all cancer diagnosis variables, claim diagnosis variables, laboratory values collected at the same time, etc.

consider changing brand_name and generic_name fields

These are generally going to require a mapping to RxNorm to identify. Part D data only uses NDC codes and there are no text fields. CPRD has drug substance (the ingredient, like Furosemide) and product name (the fully qualified name, like Furosemide 40mg tablets). These are also RxNorm concepts, so it may be better to use them in place of brand and generic. We may need to look at some other drug data to finalize this.

Add claim_id and line_id to exposures

In SEER-Medicare the DME file has NDC codes in it. If we want to add those NDCs here, we need the ability to link them to claims and lines.

Potentially move drugs to clinical_codes

Drugs are represented as different clinical codes (HCPCS, NDC, etc.) and could possibly live inside the clinical codes table. Then we would use a modifier table to represent other data about the code, such as quantity, days supplied, etc.

Add census tract information to address?

Should we add census tract information to the addresses table or create an auxiliary table that holds other information about the addresses? census tract, health service area, urban/rural, etc.

Should we add type_concept_id to facilities table?

We currently have specialty_concept_id. Is that where we would add a facility type, or is it meant for something else, so that we need another column? SEER-Medicare has a variable fac_type with the following values:

1 = Hospital
2 = Skilled nursing facility (SNF)
3 = Home health agency (HHA)
4 = Religious Nonmedical (Hospital) (eff. 8/1/00); prior to 8/00 referenced Christian Science (CS)
5 = Religious Nonmedical (Extended Care) (eff. 8/1/00); prior to 8/00 referenced CS (discontinued effective 10/1/05)
6 = Intermediate care
7 = Clinic or hospital-based renal dialysis facility
8 = Special facility or ASC surgery
9 = Reserved
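One way an ETL might translate fac_type before mapping it to a concept is a plain lookup, sketched below. It assumes raw values arrive as text, uses shortened labels from the list above, and leaves open which column (specialty_concept_id or a new type column) receives the result; the function name is hypothetical.

```python
from typing import Optional

# SEER-Medicare fac_type values from the list above (shortened labels).
FAC_TYPE_LABELS = {
    1: "Hospital",
    2: "Skilled nursing facility (SNF)",
    3: "Home health agency (HHA)",
    4: "Religious Nonmedical (Hospital)",
    5: "Religious Nonmedical (Extended Care)",
    6: "Intermediate care",
    7: "Clinic or hospital-based renal dialysis facility",
    8: "Special facility or ASC surgery",
    9: "Reserved",
}


def fac_type_label(raw_value: Optional[str]) -> Optional[str]:
    """Translate a raw fac_type value (e.g. "2" or " 2") into its label.

    Returns None for blank or unrecognized values so the ETL can flag
    them instead of silently assigning a facility type.
    """
    try:
        return FAC_TYPE_LABELS.get(int(raw_value))
    except (TypeError, ValueError):
        return None


print(fac_type_label("2"))  # Skilled nursing facility (SNF)
print(fac_type_label(""))   # None
```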

Consider flattening out the costs tables

Rather than wide tables, it may be easier to use tall tables with concept ids in place of the specific cost types. This gives flexibility to users to define their own concept ids for costs without having to change the data model.
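A minimal sketch of the wide-to-tall conversion, assuming hypothetical wide column names and hypothetical cost-type concept ids. In the tall form, adding a cost type means adding a concept id (here, a dictionary entry) rather than a column.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical wide-table columns mapped to hypothetical cost-type concept
# ids; users would supply their own concept ids here.
COST_TYPE_CONCEPTS: Dict[str, int] = {
    "charge_amount": 1001,
    "paid_amount": 1002,
    "patient_paid_amount": 1003,
}


@dataclass
class TallCost:
    context_id: int
    cost_type_concept_id: int
    value: float


def wide_to_tall(wide_row: dict) -> List[TallCost]:
    """Emit one tall row per non-null cost column in a wide row."""
    tall = []
    for column, concept_id in COST_TYPE_CONCEPTS.items():
        value = wide_row.get(column)
        if value is not None:  # missing cost types produce no row
            tall.append(TallCost(wide_row["context_id"], concept_id, value))
    return tall


wide = {"context_id": 7, "charge_amount": 250.0, "paid_amount": 180.0,
        "patient_paid_amount": None}
tall_rows = wide_to_tall(wide)
for row in tall_rows:
    print(row)
```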

create a location_period table

This will allow locations to be used in the study period section of the jigsaw user interface. Consider supporting this in the information_periods table.

Check claim adjudication codes

When processing the raw data, ignore lines that are duplicates or errors. For example, in SEER-Medicare the variable proindcd uses "M" to mark duplicate line items that we may want to drop.
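A minimal sketch of that filter over line records held as dictionaries. Only the proindcd = "M" rule comes from the issue; the function name, the record layout, and keeping blank or missing flags are assumptions.

```python
from typing import Dict, Iterable, List, Optional


def drop_flagged_lines(
    lines: Iterable[Dict[str, Optional[str]]],
    flag_column: str = "proindcd",
    drop_values: frozenset = frozenset({"M"}),
) -> List[Dict[str, Optional[str]]]:
    """Keep only line items whose adjudication flag is not in drop_values.

    Flags are stripped of padding first; blank or missing flags are kept.
    """
    kept = []
    for line in lines:
        flag = (line.get(flag_column) or "").strip()
        if flag not in drop_values:
            kept.append(line)
    return kept


raw_lines = [
    {"line_num": "1", "proindcd": ""},
    {"line_num": "2", "proindcd": "M"},  # duplicate line item
    {"line_num": "3", "proindcd": None},
]
clean = drop_flagged_lines(raw_lines)
print([line["line_num"] for line in clean])  # ['1', '3']
```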

how should we incorporate text records and quality of life instruments?

These should be indicated with an appropriate row in the clinical codes table and perhaps a details table to store the data.

For text records, we don't expect to get raw text. We will probably get specific terms mined from text data (e.g., "diabetes"). So, this will either need to be mapped to an existing vocab or put in as raw text.

For QOL data, we might get scale scores or answers to individual questions. Each instrument could be considered its own vocabulary with names for questions and scores. And the values could be in the clinical codes table.

Determine which columns should never be null

I'd like to add another column to the readme that indicates if a column should never be null. If someone could determine which columns are always required, I'd appreciate it.

consider ways of including time or duration for collection records

CPRD has a consultation (visit) duration. This could be handled using a datetime variable so that the start and end times represent the duration. It could also be handled with explicit time variables or a duration variable.

The use case is that costing studies in the UK can use this variable to estimate the cost of a visit.
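The start/end representation and the explicit duration representation are interchangeable, as this sketch shows. The function names and the per-minute unit cost are hypothetical, used only to illustrate the costing use case.

```python
from datetime import datetime, timedelta


def duration_minutes(start: datetime, end: datetime) -> float:
    """Duration implied by start/end datetimes, in minutes."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start).total_seconds() / 60


def end_from_duration(start: datetime, minutes: float) -> datetime:
    """Rebuild an end datetime when the source only reports a duration."""
    return start + timedelta(minutes=minutes)


start = datetime(2020, 3, 2, 9, 0)
end = end_from_duration(start, 12)
minutes = duration_minutes(start, end)

COST_PER_MINUTE = 3.0  # hypothetical unit cost, for illustration only
print(minutes)                    # 12.0
print(minutes * COST_PER_MINUTE)  # 36.0
```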

admission_details

Does admission_details only contain inpatient admissions and emergency department encounters?
What about outpatient encounters?

consider a modifier table

This would link to a specific procedure in the clinical codes table. There may be many records per procedure (usually up to 4 or 8).
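A minimal sketch of such a one-to-many modifier table, using Python's built-in sqlite3 module. The table layout, the position column, and the example codes are illustrative assumptions, not a settled design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clinical_codes (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL
);
-- One row per modifier; position keeps the order reported on the line.
CREATE TABLE modifiers (
    id INTEGER PRIMARY KEY,
    clinical_code_id INTEGER NOT NULL REFERENCES clinical_codes(id),
    position INTEGER NOT NULL,
    modifier TEXT NOT NULL,
    UNIQUE (clinical_code_id, position)
);
INSERT INTO clinical_codes VALUES (1, '27447'), (2, '99213');
INSERT INTO modifiers (clinical_code_id, position, modifier)
VALUES (1, 1, 'RT'), (1, 2, '59');
""")


def modifiers_for(clinical_code_id):
    """Return a procedure's modifiers in position order (empty if none)."""
    rows = conn.execute(
        "SELECT modifier FROM modifiers WHERE clinical_code_id = ? ORDER BY position",
        (clinical_code_id,),
    ).fetchall()
    return [modifier for (modifier,) in rows]


print(modifiers_for(1))  # ['RT', '59']
print(modifiers_for(2))  # []
```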

Drop position from lines table and move

Drop the position column from the lines table. Since there can be multiple codes per line, it makes sense to keep position only in the clinical_codes table.

Consider how to add a family group to patients

There are instances where patients are grouped, most likely by a family id as is the case with CPRD. There are a couple of ways to handle this:

  1. Add family_id_source_value variable to patients table that is the source value from the source data
  2. Add family_id to patients table and add families table to GDM
  3. Add the family id to the patient_details table somehow. Currently that table expects a vocabulary associated with the detail being added, so it seems odd to create a CPRD family vocab
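To make option 2 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names and sample values are hypothetical, and the sketch is not a recommendation over options 1 or 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE families (
    id INTEGER PRIMARY KEY,
    family_id_source_value TEXT NOT NULL
);
CREATE TABLE patients (
    id INTEGER PRIMARY KEY,
    family_id INTEGER REFERENCES families(id)  -- NULL when no family is recorded
);
INSERT INTO families VALUES (1, 'FAM-001');
INSERT INTO patients VALUES (100, 1), (101, 1), (102, NULL);
""")


def family_members(patient_id):
    """Other patients sharing this patient's family (empty if none)."""
    rows = conn.execute(
        """
        SELECT kin.id
        FROM patients AS pat
        JOIN patients AS kin ON kin.family_id = pat.family_id
        WHERE pat.id = ? AND kin.id <> pat.id
        ORDER BY kin.id
        """,
        (patient_id,),
    ).fetchall()
    return [pid for (pid,) in rows]


print(family_members(100))  # [101]
print(family_members(102))  # []
```

Because NULL never equals NULL in SQL, patients without a recorded family are not grouped with each other.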

Remove address id from contexts table

We can support a master (billing) address at the collections level if we have multiple different sites at the contexts level. This generally applies to a DME type of file where a site (e.g., CVS pharmacy) can have multiple locations but has one billing address.

Add npi as variable to providers and facilities table

Jen suggests either adding npi as a variable to the tables and renaming identifier and identifier_type to other_identifier and other_identifier_type, so that a record can hold both the NPI and one other identifier. Another suggestion is adding primary_identifier, primary_identifier_type, other_identifier, and other_identifier_type to keep the model less US-centric.
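A minimal sketch of the second suggestion, with the four column names taken from the issue. The identifier type strings and values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderIdentifiers:
    """Identifier columns from the second suggestion above."""
    provider_id: int
    primary_identifier: str
    primary_identifier_type: str
    other_identifier: Optional[str] = None
    other_identifier_type: Optional[str] = None


# US provider: NPI as primary, one additional identifier (made-up values).
us = ProviderIdentifiers(1, "1234567890", "NPI", "A12345", "state_license")
# Non-US provider: a local identifier as primary; no US-specific column sits empty.
uk = ProviderIdentifiers(2, "G0001", "practice_code")
print(us.primary_identifier_type, uk.primary_identifier_type)  # NPI practice_code
```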
