
Comments (9)

G-White-ISB commented on August 10, 2024

I have not started parsing the RTF files.

from etl.

G-White-ISB commented on August 10, 2024

This issue is still outstanding.


fedorov commented on August 10, 2024

@G-White-ISB I made Excel spreadsheets for each of the dictionaries, see here:

NLST_dicts_xls.zip

For convenience, the original RTF files are here:

nlst780.idc.delivery.052821.zip

Can you please review them and let me know whether they are OK or you would organize them differently? If they are OK, I will send them to TCIA and ask them to post the files on the wiki, so you can ingest them from there and incorporate them into your workflow. Would it be possible to add this to v15?


G-White-ISB commented on August 10, 2024

I can definitely work with these Excel files. We'll get this in for v15. I don't know if we need to bother TCIA. After the release I'll see if I can do the RTF to Excel conversion programmatically.


fedorov commented on August 10, 2024

> I don't know if we need to bother TCIA.

I do! If I put effort into this, and I believe it can help someone, I want it to be available, and ideally at a central place. I will take care of this.

> After the release I'll see if I can do the RTF to Excel conversion programmatically.

I do not think this is worth the effort. I don't think we can expect those files to update dynamically, and RTF is not a common representation for dictionaries, so we do the conversion once and forget about it until the next time (if the next time ever comes).


fedorov commented on August 10, 2024

@G-White-ISB I was reviewing this, and I have trouble understanding the BigQuery (BQ) content.

I selected column metadata using this query:

SELECT
  *
FROM
  `bigquery-public-data.idc_v15_clinical.column_metadata`
WHERE
  collection_id="nlst"
  AND table_name="bigquery-public-data.idc_v15_clinical.nlst_prsn"
ORDER BY
  column_label
  • I would expect each tab in an individual spreadsheet to be stored as a separate table, but I only see nlst_prsn (the corresponding Excel file contains 6 sheets).
  • I do not see the variables shown in this screenshot: [image]
  • I do see variables that are not in the spreadsheet (and their option descriptions are missing): [image]


G-White-ISB commented on August 10, 2024

The source DATA for nlst_prsn is all in ONE CSV file containing all 30+ columns. The accompanying RTF document, which was used to create the Excel spreadsheet, describes different sets of those columns on different pages, which is why the spreadsheet has several sheets but there is only one table.

Some columns were not matched to the dictionary because the column name does not appear literally in it. For example, columns scr_iso0, scr_iso1, and scr_iso2 are apparently covered by the single entry scr_iso0-2 in the dictionary.
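
The mismatch described above can be illustrated with a minimal sketch. It assumes that a dictionary entry such as scr_iso0-2 is shorthand for a numeric range of column names, which is how the thread reads it but is not confirmed by the RTF itself. The function names `expand_dictionary_name` and `missing_columns` are hypothetical and are not taken from the actual ETL parsing script.

```python
import re

# Assumption: an entry like "scr_iso0-2" means "stem + every integer
# from start to end", i.e. scr_iso0, scr_iso1, scr_iso2.
RANGE_NAME = re.compile(r"^(?P<stem>.*?)(?P<start>\d+)-(?P<end>\d+)$")


def expand_dictionary_name(name):
    """Return the literal column names covered by one dictionary entry."""
    name = name.strip()
    match = RANGE_NAME.match(name)
    if match is None:
        # Plain entry: it covers exactly one column.
        return [name]
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        # Not a sensible range; treat the entry as a literal name.
        return [name]
    return [f"{match['stem']}{i}" for i in range(start, end + 1)]


def missing_columns(csv_columns, dictionary_names):
    """List CSV columns that no dictionary entry covers, in CSV order."""
    covered = set()
    for name in dictionary_names:
        covered.update(expand_dictionary_name(name))
    return [column for column in csv_columns if column not in covered]


# Hypothetical column and entry lists, for illustration only.
csv_columns = ["pid", "scr_iso0", "scr_iso1", "scr_iso2", "scr_days0"]
dictionary_names = ["pid", "scr_iso0-2"]
print(missing_columns(csv_columns, dictionary_names))
```

With range entries expanded first, only the columns that truly lack a dictionary entry remain in the result, which makes it easier to tell a notation mismatch from a genuinely undocumented column.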


fedorov commented on August 10, 2024

This needs to be addressed in the custom parsing script. The dictionary should contain actual values for meaning/labels.


G-White-ISB commented on August 10, 2024

The column_metadata table in the pdp_staging dataset has been updated to include the column labels and options for the scr_iso0..scr_iso2 and scr_days0..scr_days2 columns, as parsed from the dictionary. The table still needs to be updated in the public dataset.

