outcomesinsights / generalized_data_model

Outcomes Insights' Data Model for Clinical Research

License: MIT License

Shell 1.17% Ruby 98.83%

generalized_data_model's Issues

make "role type id" a concept id (contexts-practitioners table)

allow source value and source vocab to be added to clinical codes table

I think we need to give users the option of doing the ETL and including a "source value" and "source vocab" in the clinical codes table. It should not be there by default, but it will make the table easier to use for people who want it. I realize it is one join away by using the vocab table, but I believe it will help make the model understandable at first. And our first order of business is to convince people that this is a better way forward.

Exposures table has provider_id listed twice

Need to remove one of the provider_id's from the exposures table

add country to the address table

We may want to use this in the future: http://stackoverflow.com/questions/929684/is-there-common-street-addresses-database-design-for-all-addresses-of-the-world

Change costs.total_paid to costs.total_paid_drugs

costs.total_paid currently is just a sum of all the paid fields. Suggestion to change the field to total_paid_drugs where we store only total paid for drugs.

Change order of the tables in readme

Change the order of the tables in the Readme doc so that the flow makes more sense. Possibly providers first, then claims, then claims_providers, then lines, etc.

need a hospitalization details table

this will have admit, discharge, length of stay, admit source, and discharge source

need to consider whether this applies to outpatient facility claims -- for example, observation stays.

should Collections drop the FK to admission_details

this would require us to put a collections_id in the admission_details table and would be more consistent with how measurements and drug exposure are handled

Consider changing type_concept_id to prov_type_concept_id

Currently the column type_concept_id exists in the contexts, clinical_codes, and admission_details tables. The column name is not very descriptive about what goes in there. For the most part this column describes the provenance of the record, so we should consider changing it to prov_type_concept_id or something like that.

add collection_type_concept_id

We don't have a way of assigning a type to a collection record. This is useful for visit-based data like CPRD or other EHR data. It might contain "visit" or "claim" or consultation type (COT with 61 possible types) from CPRD.

Move file_type to claims table

file_type currently lives in clinical_codes table but should be moved to claims table

consider adding route to drug exposure details

Route (oral, intravenous, rectal, etc.) is commonly used and could be added to the drug table. We should check OMOP v5.1 to see what they are doing.

Consider dividing the cost table into the cost details table and the collections costs table

Joining through the contexts table is an elegant way to have a single table. But I think it might be cleaner to have cost tables at the two levels where costs are generally represented. It also makes it more clear that costs can be duplicated (for example at the claim and line levels).

The costs details table would be like the drug and measurement tables and would provide cost details for individual clinical codes records (generally procedures or prescriptions).

The collections costs table would be for summarized (claims) costs and visit costs.

line and claim table renaming

We should rename lines something like "related_details" and claims something like "grouped_records". it is possible, but not required, that we can group related details.

clinical details and clinical conditions can be related to each other using related details. So, systolic and diastolic blood pressure could be connected together. Diagnoses and procedures can be connected together. And sets of clinical codes (diagnoses, procedures, etc) and clinical details (labs, and other observations) can all be linked together.

Rename claims and lines tables

Potentially rename claims and lines tables to something more generic to other types of data. Possibly encounters (lines) and grouped_encounters (claims).

add number_per_day to drug table

Days supplied is most commonly used in research, but the raw data might not include this information explicitly. CPRD includes ndd which is the numeric daily dose, which is the number of items to be used each day. Hence, if quantity is 60 and ndd is 2 then the days supplied is 30. CPRD does not report days supplied.
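
The arithmetic above can be sketched as a small helper. This is only an illustration: the function name `days_supplied` and the decision to return `None` for a missing or non-positive daily dose are assumptions, not part of the data model.

```python
from typing import Optional


def days_supplied(quantity: float, number_per_day: Optional[float]) -> Optional[float]:
    """Derive days supplied from quantity and numeric daily dose (ndd).

    Returns None when the daily dose is missing or not positive; treating
    those cases as "unknown" is an assumption of this sketch.
    """
    if number_per_day is None or number_per_day <= 0:
        return None
    return quantity / number_per_day


# Example from the issue text: quantity 60 with ndd 2 gives 30 days.
print(days_supplied(60, 2))  # 30.0
```

Storing number_per_day alongside quantity keeps the raw CPRD value and lets days supplied be derived at query time instead of being fixed during ETL.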

create interpretation variable for measurement_details table

This would be "qualifier" from CPRD

Add details.line_id

Currently details table only has a claim_id. Need to add line_id to details.

Change "type context id" to "record type context id"

This allows us to clarify that the items are linked together and connected as a single "record". Generally this is because they are all related in a domain-specific way. For example, they are all cancer diagnosis variables, claim diagnosis variables, laboratory values collected at the same time, etc.

costs total_cost_type_concept_id exceeds maximum length of 256

Need to make comment shorter to work with loading data into impala

Apostrophes come out as garbage when opening in Windows RStudio

Need to fix apostrophe characters in Readme file.
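
One possible fix, sketched below, is to replace typographic quotes in the Readme with plain ASCII ones. This assumes the garbage comes from curly apostrophes (such as U+2019) being decoded with the wrong encoding on Windows; the issue does not confirm the cause, and the function name `asciify_quotes` is hypothetical.

```python
# Map typographic quotes to their plain ASCII equivalents.
# Assumption: these are the characters that render as garbage.
REPLACEMENTS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

# Build the translation table once so it can be reused for every call.
TRANSLATION_TABLE = str.maketrans(REPLACEMENTS)


def asciify_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(TRANSLATION_TABLE)


print(asciify_quotes("Outcomes Insights\u2019 Data Model"))
```

ASCII quotes read the same regardless of the encoding an editor guesses, which sidesteps the problem rather than depending on every tool being set to UTF-8.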

change "type concept id" in clinical codes table to "provenance concept id"

This clarifies that the concept id is geared toward provenance (e.g., admitting, discharge, primary, problem list, symptom list, etc.)

consider changing brand_name and generic_name fields

These are generally going to require a mapping to RxNorm to identify. Part D data only uses NDC codes and there are no text fields. CPRD has drug substance (the ingredient, like Furosemide) and product name (the fully qualified name, like Furosemide 40mg tablets). These are also RxNorm ideas, so it may be better to use these in place of brand and generic. We may need to look at some other drug data to finalize this.

Move clinical_codes.provider_id to lines.provider_id

We are assuming 1 provider per line so it makes sense for the provider_id to live in the lines table.

WWXXX HCPCS codes

According to https://www.cms.gov/Regulations-and-Guidance/Guidance/Transmittals/downloads/R136CP.pdf: "As new “WW” codes are established for oral anticancer drugs they will be communicated in a Recurring Update Notification."

This is an issue for the cancer drug code webpage at http://ndc_map.cohortjigsaw.com/ because these appear not to be included. However they seem to be created based on NDC codes and may only be in DME.

Add claim_id and line_id to exposures

In SEER-Medicare the DME file has ndc codes in it. If we want to add those ndc's here we need to have the ability to link it to claims and lines.

Potentially move drugs to clinical_codes

Drugs are represented as different clinical codes (HCPCS, NDC, etc.) and could possibly live inside the clinical codes table. Then we would use a modifier table to represent other data about the code such as quantity, days supplied, etc.

Add census tract information to address?

Should we add census tract information to the addresses table or create an auxiliary table that holds other information about the addresses? census tract, health service area, urban/rural, etc.

Should we add type_concept_id to facilities table?

We currently have specialty_concept_id which may be where we add a facility type or is that for something else and we need another column? SEER-Medicare has a variable fac_type which has values:

1 = Hospital
2 = Skilled nursing facility (SNF)
3 = Home health agency (HHA)
4 = Religious Nonmedical (Hospital) (eff. 8/1/00); prior to 8/00 referenced Christian Science (CS)
5 = Religious Nonmedical (Extended Care) (eff. 8/1/00); prior to 8/00 referenced CS (discontinued effective 10/1/05)
6 = Intermediate care
7 = Clinic or hospital-based renal dialysis facility
8 = Special facility or ASC surgery
9 = Reserved

Consider flattening out the costs tables

Rather than wide tables, it may be easier to use tall tables with concept ids in place of the specific cost types. This gives flexibility to users to define their own concept ids for costs without having to change the data model.
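
As a rough sketch of the wide-to-tall idea, the snippet below turns one wide cost record into one row per cost type. The column names (`paid_by_payer`, `paid_by_patient`), the concept ids, and the output field names are all hypothetical placeholders, not the actual GDM schema or vocabulary.

```python
# Hypothetical wide cost record; column names are illustrative only.
wide_row = {
    "context_id": 10,
    "paid_by_payer": 80.0,
    "paid_by_patient": 20.0,
    "total_paid": None,  # not reported in the source data
}

# Hypothetical mapping from wide column name to a cost-type concept id.
COST_CONCEPTS = {
    "paid_by_payer": 9001,
    "paid_by_patient": 9002,
    "total_paid": 9003,
}


def to_tall(row, concept_map):
    """Return one tall row per non-null cost column in the wide row."""
    tall = []
    for column, concept_id in concept_map.items():
        value = row.get(column)
        if value is None:
            continue  # missing costs produce no row instead of a NULL column
        tall.append(
            {
                "context_id": row["context_id"],
                "cost_concept_id": concept_id,
                "value": value,
            }
        )
    return tall


for tall_row in to_tall(wide_row, COST_CONCEPTS):
    print(tall_row)
```

With this shape, supporting a new cost type means adding a concept id to the mapping rather than a column to the table.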

discuss how to handle measurement_detail string and concept id use

Should we use concept ids for results? Should they be used only for string results (i.e., string is the source value that goes with concept id)?

create a location_period table

This will allow locations to be used in the study period section of the jigsaw user interface. Consider supporting this in the information_periods table

Remove provider_id from details

We decided to move provider_id from clinical_codes to lines so we should do the same for details.

admission details table -- clarify that admit and discharge types are concept ids

admission source concept id
discharge location concept id

Source values are included in costs and care_sites tables

None of the other tables include a source value. Do we need the source values for these tables?

Check claim adjudication codes

In processing of raw data ignore lines that are duplicates or errors. For example in SEER-Medicare there is the variable proindcd where "M" means duplicate line items that we may want to drop.
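
A minimal sketch of such a filter is shown below. Only the variable name proindcd and the meaning of "M" come from the issue text; the function name, the sample rows, and the non-"M" value "A" are hypothetical.

```python
def drop_duplicate_lines(lines, flag_field="proindcd", duplicate_flags=("M",)):
    """Keep only the lines whose adjudication flag is not a duplicate marker.

    Lines that lack the flag field entirely are kept; whether that is the
    right default is an assumption of this sketch.
    """
    return [line for line in lines if line.get(flag_field) not in duplicate_flags]


# Hypothetical raw line items.
raw_lines = [
    {"line_id": 1, "proindcd": "A"},
    {"line_id": 2, "proindcd": "M"},  # flagged as a duplicate line item
    {"line_id": 3},                   # no flag present
]

kept = drop_duplicate_lines(raw_lines)
print([line["line_id"] for line in kept])  # [1, 3]
```

Passing the flag values as a parameter leaves room for other error codes to be excluded once they have been reviewed.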

how should we incorporate text records and quality of life instruments?

These should be indicated with an appropriate row in the clinical codes table and perhaps a details table to store the data.

For text records, we don't expect to get raw text. We will probably get specific terms mined from text data (e.g., "diabetes"). So, this will either need to be mapped to an existing vocab or put in as raw text.

For QOL data, we might get scale scores or answers to individual questions. Each instrument could be considered its own vocabulary with names for questions and scores. And the values could be in the clinical codes table.

trying to get wiki to pull in the lucid chart graphic

I copied the embedded HTML but it doesn't show. Is it possible to do this?

see: https://lucidchart.zendesk.com/entries/21940385-Embed-a-diagram-into-a-blog-or-website?__hstc=215508872.830fc8e553fc4086382100025d3d2978.1441165172080.1441165172080.1441165172080.1&__hssc=215508872.3.1441165172080&__hsfp=2354141532

Determine which columns should never be null

I'd like to add another column to the readme that indicates if a column should never be null. If someone could determine which columns are always required, I'd appreciate it.

consider ways of including time or duration for collection records

CPRD has a consultation (visit) duration. This could be handled using a datetime variable so that the start and end times represent the duration. It could also be handled with explicit time variables or a duration variable.

The use case is that costing studies in the UK can use this variable to estimate the cost of a visit.
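
The two representations are interchangeable, as the sketch below illustrates: a duration can be derived from start and end datetimes, and an end datetime can be rebuilt from a start plus a duration. The function names and the sample timestamps are hypothetical, and the choice of minutes as the unit is an assumption.

```python
from datetime import datetime, timedelta


def duration_minutes(start: datetime, end: datetime) -> float:
    """Derive a visit duration in minutes from start and end datetimes."""
    return (end - start).total_seconds() / 60


def end_from_duration(start: datetime, minutes: float) -> datetime:
    """Rebuild the end datetime from a start datetime and a duration."""
    return start + timedelta(minutes=minutes)


# Hypothetical consultation lasting twelve minutes.
start = datetime(2015, 9, 1, 9, 0)
end = datetime(2015, 9, 1, 9, 12)

print(duration_minutes(start, end))  # 12.0
print(end_from_duration(start, 12) == end)  # True
```

The datetime option needs no extra column, but it only works when the source provides (or the ETL can assign) a real start time; an explicit duration variable avoids inventing one.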

admission_details

Does admission_details contain only inpatient admissions and emergency department encounters?
What about outpatient encounters?

consider a modifier table

This would link to a specific procedure in the clinical codes table. There may be many records per procedure (usually up to 4 or 8).

Drop position from lines table and move

Drop the position column from the lines table. Since there could be multiple codes per line, it makes sense to keep position in the clinical_codes table only.

change "position" in clinical_codes to "seq_num" (or similar)

could just be "seq" or "sequence number"

Consider how to add a family group to patients

There are instances where patients are grouped, most likely by a family id as is the case with CPRD. There are a couple of ways to handle this:

Add family_id_source_value variable to patients table that is the source value from the source data
Add family_id to patients table and add families table to GDM
Add the family id to the patient_details table somehow. Currently that table expects a vocabulary associated with the detail we are adding so it seems weird to create a CPRD family vocab

Consider adding ingredient and semantic clinical drug (see RXNORM) to drug_exposure_details

This works for CPRD and can be obtained from RXNORM. Both variables entered as text fields.

Remove address id from contexts table

We can support a master (billing) address at the collections level if we have multiple different sites at the contexts level. This generally applies to a DME type of file where a site (e.g., CVS pharmacy) can have multiple locations but has one billing address.

Add more information on how to use the costs table

Currently there is some ambiguity on what gets stored where in the costs table. Need to update to make it clearer.

Add npi as variable to providers and facilities table

Jen suggests adding npi as a variable to the tables and changing identifier and identifier_type to other_identifier and other_identifier_type, so we can hold both npi and one other identifier in the same record. Another suggestion is to add primary_identifier, primary_identifier_type, other_identifier, and other_identifier_type to keep it less US-centric.

consider multiple locations per patient (over time)

http://forums.ohdsi.org/t/person-location-over-time/1420/9