Comments (8)
We are copying the CPTAC BQ table into per-collection tables in our clinical dataset and recording the source BQ table in the table_metadata table. I suggest we close this issue.
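One way the copy step could look is sketched below. It assumes each per-collection copy is made with a BigQuery `CREATE OR REPLACE TABLE ... AS SELECT` statement filtered to one collection, and that the source is kept as a fully qualified `project.dataset.table` ID so it can be recorded in table_metadata. `TableRef`, `copy_collection_sql`, the filter column name, and the destination table name are hypothetical illustrations, not names from the etl code.

```python
import re
from dataclasses import dataclass

# Unquoted BigQuery column names: letters, digits, underscores; no leading digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Collection IDs are interpolated into a SQL string literal, so restrict them
# to a conservative alphabet (no quotes or backslashes).
_SAFE_LITERAL = re.compile(r"[A-Za-z0-9_-]+")


@dataclass(frozen=True)
class TableRef:
    """A fully qualified BigQuery table ID, split into its three parts."""
    project: str
    dataset: str
    table: str

    @classmethod
    def parse(cls, table_id: str) -> "TableRef":
        # Expect exactly "project.dataset.table".
        parts = table_id.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"not a fully qualified table ID: {table_id!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.project}.{self.dataset}.{self.table}"


def copy_collection_sql(source: TableRef, dest: TableRef,
                        filter_column: str, collection_id: str) -> str:
    """Return a CREATE TABLE ... AS SELECT statement copying the rows of
    one collection from `source` into `dest`."""
    if not _IDENTIFIER.fullmatch(filter_column):
        raise ValueError(f"unexpected column name: {filter_column!r}")
    if not _SAFE_LITERAL.fullmatch(collection_id):
        raise ValueError(f"unexpected collection_id: {collection_id!r}")
    # Backticks are required because project IDs may contain hyphens.
    return (
        f"CREATE OR REPLACE TABLE `{dest}` AS\n"
        f"SELECT * FROM `{source}`\n"
        f"WHERE {filter_column} = '{collection_id}'"
    )


if __name__ == "__main__":
    src = TableRef.parse("isb-cgc-bq.CPTAC.clinical_gdc_current")
    dst = TableRef.parse("example-project.idc_clinical.cptac_ccrcc")  # hypothetical
    # "collection_id" is a hypothetical name for the column identifying the collection.
    print(copy_collection_sql(src, dst, "collection_id", "CPTAC-CCRCC"))
```

Keeping the source as a parsed `TableRef` means the same object can supply the project, dataset, and table values written to the metadata tables.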
from etl.
The table name is now recorded literally in the clinical_meta and clinical_summary tables, i.e., no 'dereferencing' is needed. Project and dataset are now recorded literally in the clinical_meta table.
However, the project and dataset columns should be moved from clinical_meta to clinical_summary: clinical_summary holds table-level metadata about the clinical tables, while clinical_meta holds column-level metadata about them.
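The table-level versus column-level split can be sketched as two record types, assuming one clinical_summary row per clinical table (now carrying the source project and dataset) and one clinical_meta row per column. Apart from clinical_summary and clinical_meta themselves, the class names, field names, and `build_metadata` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalSummaryRow:
    """Table-level metadata: one row per clinical table (clinical_summary)."""
    table_name: str
    source_project: str  # table-level fact, so it lives here, not per column
    source_dataset: str
    source_table: str


@dataclass(frozen=True)
class ClinicalMetaRow:
    """Column-level metadata: one row per column (clinical_meta)."""
    table_name: str
    column: str
    column_label: str


def build_metadata(table_name: str, source_table_id: str,
                   columns: dict[str, str]) -> tuple[ClinicalSummaryRow, list[ClinicalMetaRow]]:
    """Split the metadata for one copied table into a single table-level row
    and one column-level row per entry in `columns` (name -> label)."""
    # Unpacking raises ValueError unless the ID has exactly three parts.
    project, dataset, table = source_table_id.split(".")
    summary = ClinicalSummaryRow(table_name, project, dataset, table)
    meta = [ClinicalMetaRow(table_name, col, label) for col, label in columns.items()]
    return summary, meta
```

With this layout the source project and dataset are stored once per table instead of being repeated on every column row.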
I am assuming the table names changed to clinical_meta_table and clinical_meta_column. However, it looks like we do not currently have any collections that rely on tables from other projects (i.e., neither the CPTAC nor the TCGA collections are included). Should we go through the steps to integrate at least one project that relies on external tables, to make sure the architecture can support external sources? Did we decide whether to replicate those external tables under the versioned dataset, or instead to include external references?
I don't think we have decided yet between referencing and duplicating external sources.
I propose duplicating external sources. Those tables should not be large, and if we do not duplicate them, they can disappear or change at any moment.
Sure. I expect ISB-CGC is the only other entity pulling relevant data into BigQuery. In addition to CPTAC, I know an ISB colleague will be gathering HTAN clinical data into BigQuery.
Just one clarifying question. Currently, the source of the CPTAC tables points to `current` (`isb-cgc-bq.CPTAC.clinical_gdc_current`). A few months from now, `current` may point to different data. Can you discuss with the ISB-CGC folks whether it makes sense to point to the actual numbered/versioned table instead of `current`, and note the response here?
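A small helper can make pinning explicit by rewriting a moving `_current` alias into a release-numbered table ID before it is copied and recorded. This sketch assumes ISB-CGC publishes versioned tables in a sibling dataset with a `_versioned` suffix and an `_r<N>` table suffix; those naming conventions, the `pin_release` name, and the release number in the example are hypothetical placeholders to be replaced with whatever ISB-CGC confirms.

```python
import re

# Matches "project.dataset.<base>_current", i.e. a moving "current" alias.
_CURRENT_ALIAS = re.compile(
    r"^(?P<project>[^.]+)\.(?P<dataset>[^.]+)\.(?P<base>[^.]+)_current$"
)


def pin_release(table_id: str, release: int,
                versioned_dataset_suffix: str = "_versioned") -> str:
    """Rewrite a `_current` alias into a release-numbered table ID.

    The "<dataset>_versioned" / "<base>_r<N>" layout is a hypothetical naming
    convention. IDs that are not a `_current` alias are returned unchanged.
    """
    if release < 1:
        raise ValueError(f"release must be a positive integer, got {release}")
    m = _CURRENT_ALIAS.match(table_id)
    if m is None:
        return table_id  # already pinned, or not an alias we recognize
    return (f"{m['project']}.{m['dataset']}{versioned_dataset_suffix}."
            f"{m['base']}_r{release}")


if __name__ == "__main__":
    # The release number here is arbitrary, for illustration only.
    print(pin_release("isb-cgc-bq.CPTAC.clinical_gdc_current", 22))
```

Recording the pinned ID, rather than the alias, in the metadata tables means a later change to `current` cannot silently alter what a given IDC release claims as its source.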
Superseded by #40