Comments (9)
I have not started on parsing the RTF files
from etl.
This issue is still outstanding
@G-White-ISB I made Excel spreadsheets for each of the dictionaries, see here:
For convenience, the original RTF files are here:
nlst780.idc.delivery.052821.zip
Can you please review them and let me know whether they are OK or whether you would organize them differently? If they are OK, I will send them to TCIA and ask them to post the files on the wiki, so you can ingest them from there and incorporate them into your workflow. Would it be possible to add this to v15?
I can definitely work with these Excel files. We'll get this in for v15. I don't know if we need to bother TCIA. After the release I'll see if I can do the RTF to Excel conversion programmatically.
> I don't know if we need to bother TCIA.
I do! If I put effort into this, and I believe it can help someone, I want it to be available, and ideally at a central place. I will take care of this.
> After the release I'll see if I can do the RTF to Excel conversion programmatically
I do not think this is worth the effort. I don't think we can expect those files to be updated regularly, and it is not a common representation, so we can do the conversion once and forget about it until the next time (if the next time ever comes).
@G-White-ISB I was reviewing this, and I have trouble understanding the BQ content.
I selected column metadata using this query:
SELECT
  *
FROM
  `bigquery-public-data.idc_v15_clinical.column_metadata`
WHERE
  collection_id = "nlst"
  AND table_name = "bigquery-public-data.idc_v15_clinical.nlst_prsn"
ORDER BY
  column_label
- I would expect each tab in the individual spreadsheet to be stored as a separate table, but I only see `nlst_prsn` (the corresponding Excel file contains 6 sheets).
- I do not see the variables below:
- I do see variables that are not in the spreadsheet (and option descriptions are missing):
The source data for nlst_prsn is all in ONE CSV file with all 30+ columns. The accompanying RTF document, which was used to create the Excel spreadsheet, explains different sets of columns on different pages.
Some columns were missed because the column name does not literally appear in the dictionary. Columns scr_iso0, scr_iso1, scr_iso2 are apparently covered by the single entry scr_iso0-2 in the dictionary.
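
To illustrate why a literal name match misses these columns, here is a minimal sketch that diffs the table's column names against the dictionary's variable names. The function name `compare_columns` and the input lists are assumptions made for illustration; in practice the lists would come from `column_metadata` and from the spreadsheet.

```python
def compare_columns(table_columns, dictionary_variables):
    """Return (missing_from_dictionary, missing_from_table) as sorted lists.

    Names are compared literally after trimming whitespace and lowercasing.
    """
    table = {name.strip().lower() for name in table_columns}
    dictionary = {name.strip().lower() for name in dictionary_variables}
    return sorted(table - dictionary), sorted(dictionary - table)


# Illustrative inputs only: three table columns, one range-style dictionary entry.
table_columns = ["scr_iso0", "scr_iso1", "scr_iso2"]
dictionary_variables = ["scr_iso0-2"]

missing_from_dictionary, missing_from_table = compare_columns(
    table_columns, dictionary_variables
)
print(missing_from_dictionary)  # table columns with no literal dictionary entry
print(missing_from_table)       # dictionary entries with no literal table column
```

With a literal comparison, all three columns are reported as undocumented and the range-style entry is reported as unused, which matches the symptom described above.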
This needs to be addressed in the custom parsing script. The dictionary should contain actual values for meaning/labels.
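
One possible way to handle this in a parsing script is to expand range-style dictionary entries into individual column names before matching. The sketch below is not the actual ETL code; it assumes the only range notation is a name ending in `N-M` (as in `scr_iso0-2`), and the helper names `expand_variable` and `expand_dictionary` are hypothetical.

```python
import re

# Assumed notation: a prefix followed by "<start>-<end>", e.g. "scr_iso0-2".
_RANGE = re.compile(r"^(?P<prefix>.*?)(?P<start>\d+)-(?P<end>\d+)$")


def expand_variable(name):
    """Expand "scr_iso0-2" into ["scr_iso0", "scr_iso1", "scr_iso2"].

    Names without a trailing numeric range are returned unchanged.
    """
    name = name.strip()
    match = _RANGE.match(name)
    if match is None:
        return [name]
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        # Not a sensible range; leave the entry as it is.
        return [name]
    return [f"{match['prefix']}{i}" for i in range(start, end + 1)]


def expand_dictionary(entries):
    """Map every expanded column name to the label of its dictionary entry."""
    expanded = {}
    for variable, label in entries.items():
        for column in expand_variable(variable):
            expanded[column] = label
    return expanded


# Placeholder label text; real labels would be parsed from the dictionary.
print(expand_dictionary({"scr_iso0-2": "<label from dictionary>"}))
```

Each expanded column inherits the label (and, by extension, the options) of the range entry, so the dictionary ends up with one record per actual column.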
The column_metadata table in the pdp_staging dataset has been updated to include the column labels and options for the scr_iso0..scr_iso2 and scr_days0..scr_days2 columns, as parsed from the dictionary. The table still needs to be updated in the public dataset.