Comments (7)
In every clinical collection I have seen, there is a column that identifies the patient. In the clinical_meta table, this column is flagged by the case_col column. Just query:
SELECT collection_id, variable_label, variable_name FROM idc-dev-etl.clinical.clinical_meta
WHERE case_col = TRUE
This column is sometimes PatientID, sometimes subjectId, etc.
More problematic: sometimes the format of the PatientID in the clinical data differs from the one in the DICOM files. In some collections the letter case is different (i.e., all lowercase vs. a mix of upper and lower case), or '-' gets replaced by '_', etc.
from etl.
Yes, I understand. But I think sometimes there might be more substantive differences (e.g., I think I saw a case where the spreadsheet included only a suffix or number instead of the full PatientID). I think it will be important to do the checks, confirm that the values match what we have in DICOM, and if not, add a custom mapping, which hopefully should be feasible to define.
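To make the check described above concrete, here is a minimal Python sketch. It assumes the clinical identifiers and the DICOM PatientID values have already been pulled into plain lists. The function names, the `custom_mapping` argument, and the sample IDs are all hypothetical and are not part of the ETL code. The normalization covers only the two differences mentioned in this thread (letter case and '-' vs '_'); anything else, such as a bare suffix, has to go through the manually curated mapping.

```python
def normalize_id(value):
    """Collapse letter-case and '-' vs '_' differences (the cases named in the thread)."""
    return value.strip().lower().replace("_", "-")


def map_to_dicom_ids(clinical_ids, dicom_ids, custom_mapping=None):
    """Map clinical identifiers to DICOM PatientID values.

    Returns (matched, unmatched): `matched` maps each clinical ID to a DICOM
    PatientID, `unmatched` lists clinical IDs that still need manual curation.
    `custom_mapping` is a hypothetical hand-curated dict that takes priority.
    """
    custom_mapping = custom_mapping or {}

    by_normalized = {}
    for dicom_id in dicom_ids:
        key = normalize_id(dicom_id)
        # Refuse to guess if two distinct DICOM IDs collapse to the same key.
        if key in by_normalized and by_normalized[key] != dicom_id:
            raise ValueError(f"ambiguous normalized PatientID: {key!r}")
        by_normalized[key] = dicom_id

    matched, unmatched = {}, []
    for clinical_id in clinical_ids:
        if clinical_id in custom_mapping:
            matched[clinical_id] = custom_mapping[clinical_id]
        elif normalize_id(clinical_id) in by_normalized:
            matched[clinical_id] = by_normalized[normalize_id(clinical_id)]
        else:
            unmatched.append(clinical_id)
    return matched, unmatched


# Made-up identifiers, for illustration only.
matched, unmatched = map_to_dicom_ids(
    clinical_ids=["lung-dx-a0001", "A0002", "unknown-17"],
    dicom_ids=["Lung_Dx-A0001", "Lung_Dx-A0002"],
    custom_mapping={"A0002": "Lung_Dx-A0002"},
)
print(matched)
print(unmatched)
```

Anything left in `unmatched` is a candidate for a new entry in the curated mapping.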
@G-White-ISB I do not see this issue addressed in the current iteration of the tables. For example, in this table idc-dev-etl.idc_v9_clinical.lung_pet_ct_dx_clinical, there is tcia_case_id, but not PatientID.
Since this mapping will need to be curated manually, maybe we could add an additional attribute in this JSON https://github.com/fedorov/actcianable/blob/master/output/clinical_collections.json for this purpose? Of course, a complication is that the mapping will need to be (in the general case) from a column in a given sheet within a given file to PatientID. We could also introduce an additional mapping file, if this makes more sense. What are your thoughts?
I think without a PatientID column in each of the clinical tables, this data will be pretty much useless, so I would rather postpone release until v10 than not have it in v9.
tcia_case_id is the column I generated to match the case IDs as displayed in the IDC portal (and the TCIA portal). Do these case IDs match the PatientID in the dicom_all table?
I can rename tcia_case_id to PatientID.
Each table also retains the original case or patient identifier column. The identity of this column is recorded in the case_col column in the clinical_meta table:
SELECT table_name, variable_name FROM idc-dev-etl.idc_v9_clinical.clinical_meta
where case_col=TRUE
@G-White-ISB I do not see the tcia_case_id column - I was expecting to find those for each collection_id using the query below - perhaps I am missing something.
SELECT collection_id, variable_name FROM `idc-dev-etl.idc_v9_clinical.clinical_meta_column`
where variable_name like "%tcia%"
Also, we should not call it tcia_case_id - we will have data from sources other than TCIA, and also we should not add confusion between case and PatientID. I would suggest calling it dicom_PatientID. What do you think?
These were changed to dicom_patient_id. Every clinical table has this column:
SELECT collection_id, variable_name FROM idc-dev-etl.idc_v9_clinical.clinical_meta_column
where variable_name like '%dicom_patient_id%' order by collection_id
Excellent, thank you! I also confirmed that all 27 collections that have clinical data have the patient ID mapped:
WITH
  collections_with_ptid AS (
    SELECT DISTINCT
      collection_id,
      variable_name
    FROM
      idc-dev-etl.idc_v9_clinical.clinical_meta_column
    WHERE
      variable_name LIKE '%dicom_patient_id%'
    ORDER BY
      collection_id)
SELECT DISTINCT
  clinical_meta_column.collection_id
FROM
  idc-dev-etl.idc_v9_clinical.clinical_meta_column AS clinical_meta_column
LEFT JOIN
  collections_with_ptid
ON
  clinical_meta_column.collection_id = collections_with_ptid.collection_id
WHERE
  clinical_meta_column.collection_id IS NOT NULL
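A complementary way to run this kind of check is to list the collections for which no dicom_patient_id column is recorded; an empty result then means every collection is mapped. Below is a minimal Python sketch of that idea, assuming the rows of clinical_meta_column have already been fetched as (collection_id, variable_name) pairs. The function name and the sample rows are hypothetical.

```python
def collections_missing_column(rows, column="dicom_patient_id"):
    """Return sorted collection_ids that have no variable_name containing `column`.

    `rows` is an iterable of (collection_id, variable_name) pairs; the substring
    test mirrors the LIKE '%dicom_patient_id%' filter used in the thread.
    """
    rows = [(cid, name) for cid, name in rows if cid is not None]
    all_collections = {cid for cid, _ in rows}
    with_column = {cid for cid, name in rows if column in name}
    return sorted(all_collections - with_column)


# Made-up rows, for illustration only.
sample_rows = [
    ("collection_a", "dicom_patient_id"),
    ("collection_a", "age"),
    ("collection_b", "age"),
    (None, "dicom_patient_id"),
]
print(collections_missing_column(sample_rows))
```

With real data, the expectation stated in the comment above would correspond to this function returning an empty list.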
Related Issues (20)
- Map the patient identifier column in the clinical collections to the DICOM patientID
- [clinical] Support NLST clinical data
- Clinical data per-table metadata tracking
- Define BQ layout of clinical data tables
- [clinical] collection_id should not be an array
- variable_label should not be blank
- Consider renaming "variable" to "column" in "column_metadata"
- Inconsistencies identified for hnscc_3dct_rt_clinical table
- Add regression testing to confirm consistency of clinical table schemas with column_metadata
- Integrate 'Legacy' Clinical data into new clinical dataset.
- Inconsistencies identified for the ISPY1 clinical table
- dicom_patient_id appears to be missing in several clinical tables
- Use fully resolved versioned table names in all places
- table_metadata should indicate whether table dictionary was parsed from sources or derived
- acrin_6698 sbrgrade NAs are replaced with nulls
- Values for `dicom_patient_id` are invalid for the `acrin_6698` collection
- Duke-Breast-Cancer-MRI clinical data is missing
- Investigate ingestion of HTAN related data
- Ingest RMS clinical data