dlc-metadata-cleanup's Introduction

dlc-metadata-cleanup

Metadata remediation in preparation for ingest into DLC/Staff Viewer/etc. Primarily Omeka collections' metadata.

This GitHub repository will be used to store data, XSLT, and comments/issues found as the collections are ingested and reviewed by the metadata staff.

Omeka -> MODS workflow and documents:

Current Workflow (last updated June 2017):

  • Digital Projects Librarian prioritizes exhibitions for remediation based on current needs and notifies metadata staff.
  • Metadata staff download the bibliographic/descriptive metadata in bulk for each Omeka Collection.
  • Metadata staff upload the metadata into OpenRefine for remediation, following the guidelines discussed in the Workflow Meeting Notes.
  • Once the metadata is cleaned, metadata staff upload the CSV, Excel, and/or OpenRefine project files to the wiki page for that particular collection.
  • Once the metadata has been remediated and reviewed, metadata staff notify the Digital Projects Librarian that the metadata is ready for ingest into Hyacinth.
  • Digital Projects Librarian uploads the metadata into Hyacinth and publishes out to DLC, etc.

Use the wiki page for the list of prioritized collections, the mapping, the repository codes to be used, etc.: https://wiki.cul.columbia.edu/display/metadata/Omeka

dlc-metadata-cleanup's People

Contributors

alexanderjwhelan, amberbilley, blunalucero, cmharlow, erinpetrella, melaniewacker

dlc-metadata-cleanup's Issues

Jewels: Collection names

Collection names: We used to transcribe the form exactly as found in the archival collection portal, including capitalization. We have moved to using the form found in the archival collection portal in title case, UNLESS there are already many items (other than those in the Jewels project) in the SCV and DCV that use the lower-case form. Those will be cleaned up at a later point. For the time being we are trying to avoid split files, which can be caused by differences in capitalization. Please also consult CUL_Collection_Names on Google Drive and let me know if there are any discrepancies.
Could you change collection names to title case (where appropriate)?
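Because split files come from names that differ only in capitalization, a quick check before upload can flag them. A minimal Python sketch, assuming the collection names have been exported from the spreadsheet into a list; the sample names are illustrative, and the lower-case "Joseph Urban papers" variant is hypothetical:

```python
from collections import defaultdict


def find_case_variants(names):
    """Group collection names that differ only in capitalization.

    Returns a dict mapping a case-folded key to the sorted list of distinct
    spellings, keeping only keys that have more than one spelling.
    """
    groups = defaultdict(set)
    for name in names:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        groups[cleaned.casefold()].add(cleaned)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


names = [
    "Joseph Urban Papers",
    "Joseph Urban papers",  # hypothetical lower-case variant
    "Emilie Grace Briggs Papers",
]
print(find_case_variants(names))
```

Each flagged group can then be checked against CUL_Collection_Names to decide which form to keep.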

Remove accessCondition

Please remove this element. A decision was made to eliminate the accessCondition statement.
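If the records are edited as MODS XML rather than in the spreadsheet, the element can be stripped programmatically. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming each record is a single MODS document in the standard `http://www.loc.gov/mods/v3` namespace; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # keep the "mods:" prefix on output


def remove_access_condition(xml_text):
    """Return the MODS record with every mods:accessCondition element removed."""
    root = ET.fromstring(xml_text)
    tag = f"{{{MODS_NS}}}accessCondition"
    # ElementTree has no parent pointers, so snapshot every element first and
    # remove matching children from their direct parent (covers nested cases).
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```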

Jewels: Relinking assets to correct items

  1. Speculum Romanae Magnificentiae
    item pid: ldpd:113081
    asset pid: ldpd:112669

  2. Collection of Designs in Architecture, Containing New Plans and Elevations of Houses, for General Use
    item pid: ldpd:112738
    Title page: ldpd:112769
    Unnumbered page: ldpd:112887

Urban format change

Robbie, Melanie: I just noticed that if in DLC you select the Urban collection and then look at the formats available, there is exactly one format, "drawings," with exactly one hit, a set model.

https://dlc.library.columbia.edu/catalog?f[lib_collection_sim][]=Joseph+Urban+Papers&f[lib_format_sim][]=drawings

I assume there should probably be a metadata remediation ticket to remove the form element from that record?

And maybe a longer-term metadata enhancement to assign "set model" as a format for the 150+ set model images in the project, since set models were a key component of the project. It's worth keeping a metadata blue sky list more generally, since once Hyacinth is available we will have an easier path to metadata enhancement and remediation.

/Stephen

Jewels: Date

Treasures 072_001 and 072: The keyDateStart should read 1921-11.

Jewels: Collection name

The collection name for items 137 and 137_000 should be Emilie Grace Briggs Papers (i.e., "ie" instead of "y" in Emily).

Quran: CLIO Id in identifier

Format as CLIO_
See e-mail from 7/28/2014

  1. For identifier we use the mods identifier element, not relatedItem
  2. The relevant type attribute will be used, thus,
    <mods:identifier type="CLIO">
    <mods:identifier type="omeka">
  3. We limit the types that are used and maintain a list on the metadata wiki
  4. The type will also be included in the data value, e.g., <mods:identifier type="CLIO">CLIO_8750</mods:identifier>.
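The four rules above can be sketched in Python with `xml.etree.ElementTree`. This is a minimal illustration, not the production XSLT: the function name is hypothetical, and `APPROVED_TYPES` is a hypothetical subset standing in for the list maintained on the metadata wiki.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical subset of approved types; the maintained list is on the metadata wiki.
APPROVED_TYPES = {"CLIO", "omeka"}


def make_identifier(id_type, value):
    """Build <mods:identifier type="..."> whose value repeats the type as a prefix."""
    if id_type not in APPROVED_TYPES:
        raise ValueError(f"identifier type not on the approved list: {id_type}")
    prefix = f"{id_type}_"
    value = str(value).strip()
    el = ET.Element(f"{{{MODS_NS}}}identifier", {"type": id_type})
    # Avoid doubling the prefix if the source value already carries it.
    el.text = value if value.startswith(prefix) else prefix + value
    return el


print(ET.tostring(make_identifier("CLIO", "8750"), encoding="unicode"))
```

Rejecting unknown types up front enforces rule 3 at build time instead of during review.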

Jewels: drawings vs. architectural drawings

Wondering if maybe some of the drawings should be architectural drawings?
"Graphic delineations made for the design and construction (or documentation of design and construction) of sites, structures, details, fixtures, furnishings, and decorations, as well as other objects designed by an architect or architectural office."

Jewels: Subjects vs. Genre

Please take another look at the subjects. I think a few genres have crept into the subjects. E.g. the resource is a dissertation, but not about a dissertation.

Jewels: Subject Social Service

The broad categories coming from the original site were changed to LCSH, which works all right for "Music", for example. However, the items labeled "Social service" came from a category originally called "Philanthropy, Social Services, Human Rights". So by converting that to "social services" we ended up with a book about torture and political prosecution in that category. I've fixed that one, but the other items with that subject may need review as well.

Jewels: Chinese Paper Gods

This is related to 033 and 033_000: This is actually a compilation of two different images. Both are included in the "Chinese Paper Gods" digital project. I think I located the correct "gods" so that we can use that information (e.g. that they are from the Anne S. Goodrich Collection plus the names of the gods) -- but please take a look and double-check me on that.
https://dlc.library.columbia.edu/catalog/ldpd:115059
https://dlc.library.columbia.edu/catalog/ldpd:114770
https://dlc.library.columbia.edu/catalog/ldpd:115097

Jewels: location/physicalLocation

Check for correct MARCorg codes. Two items are listed in the wrong repository. For:

September 11th Oral History Narrative and Memory Project ldpd.treasures.052

Marshall, Thurgood, 1908-1993 / Transcript of Oral History Interview ldpd.treasures.118

Use: NyNyCOH, full repository name is Columbia Center for Oral History
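For the two records above, the code fix can be sketched in Python. This assumes the code is stored in a `mods:location/mods:physicalLocation` element carrying `authority="marcorg"` (a standard MODS authority value); the function name is hypothetical, and any separate element holding the full repository name is deliberately left alone.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def set_repository_code(record, marc_org_code):
    """Replace the MARC organization code in a parsed MODS record.

    Only physicalLocation elements with authority="marcorg" are changed, so a
    sibling element holding the full repository name stays untouched.
    Returns the number of elements updated.
    """
    ns = {"mods": MODS_NS}
    path = "mods:location/mods:physicalLocation[@authority='marcorg']"
    locations = record.findall(path, ns)
    for loc in locations:
        loc.text = marc_org_code
    return len(locations)
```

A return value of 0 signals a record that has no coded location at all and needs one added by hand.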

Add the free text date

Add the "free text date" which in the MODS record will be a date without any attributes. The “free text date” is displayed to end users while the keyDate will be used for searching/sorting. For example the free text date in jewels_aal_000 should be
circa 1830
(Note: The free text date in the DLC is ca. 1830 but circa should be spelled out.)

To formulate the free text date follow the instructions in the Omeka Data Dictionary:
"This is the display form of the date. Date may be entered in structured form, e.g., 1935-07-09 or textual form, e.g., July 9th, 1935. Use circa, approximately, or other appropriate term, when the date is uncertain, e.g., circa 1945. Spell out rather than abbreviate the term, thus circa not ca." https://wiki.cul.columbia.edu/display/metadata/Omeka+Data+Dictionary

Per the instructions on the Jewels in Her Crown MODS Conversion (2014) page, this element should always be populated.
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
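The "spell out rather than abbreviate" rule lends itself to a cleanup pass over the free text date column. A minimal Python sketch; the abbreviation table is a hypothetical starting point, not taken from the Data Dictionary, and should be extended as new forms turn up:

```python
import re

# Hypothetical abbreviation table; the Omeka Data Dictionary is the authority.
ABBREVIATIONS = {
    "ca.": "circa",
    "ca": "circa",
    "approx.": "approximately",
}

# Try longer forms first so "ca." wins over "ca"; require whitespace or the
# end of the string afterwards so words merely starting with "ca" are skipped.
_ABBREV_RE = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?=\s|$)",
    re.IGNORECASE,
)


def spell_out_qualifiers(free_text_date):
    """Spell out abbreviated uncertainty terms, e.g. 'ca. 1830' -> 'circa 1830'."""
    return _ABBREV_RE.sub(
        lambda m: ABBREVIATIONS[m.group(1).lower()], free_text_date.strip()
    )


print(spell_out_qualifiers("ca. 1830"))  # circa 1830
```

Structured values such as 1935-07-09 pass through unchanged.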

Quran: Burke [UTS] Union Rare

According to Alex T., this should be a sublocation rather than a collection:
"It was a sublocation when I was doing Burke stuff. Don't think it would have changed. It is a collection I suppose in the sense that no new acquisitions would be classed there. But there may still be odd pockets in the cage or at Burke of old materials already classed as Union Rare but not yet cataloged that might still be recon-ned." E-mail 11/26/2014

Jewels: More format changes (from Jenny)

  1. Audubon Birds of America turns up under Prints. It is NOT a print, but the full book.
  2. The Marco Polo is a book, not a print.
  3. The Joseph Urban Blue Nursery is a set model (3-D), not a drawing. Perhaps set model could be its own format (maybe the same as #2).
  4. The African Union Hymn Book is a printed book, not a manuscript.

Jewels: no. 29, subject title

I don't believe the correct title has been pulled into the subject_title for 029_000 and 029. It appears the version on this printing block is only in Tibetan. "Polyglot" means it's in more than two languages. If there is no matching authority feel free to delete from this column and to record the information in a note.

Jewels: Beggar's Opera

Items 224 and 224_000 link to the wrong subject title. The words and music depicted on the playing cards are actually those of The Beggar's Opera, written by John Gay, not John Christopher Pepusch.

Generalize Omeka XSLT?

Should we generalize the current Omeka XSLT for wider projects? Data dictionaries would then map columns to generalized MODS elements, and the source data would instead have its columns renamed as part of remediation.

This means the XSLT would not depend on Omeka data dump names like 'DublinCore_-Creator-1-_Source'; columns could instead be 'Name_1', 'Name_2', etc.
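The renaming step during remediation could look like the Python sketch below. It assumes that Omeka export headers follow the `DublinCore_-<Element>-<n>-_Source` shape seen in 'DublinCore_-Creator-1-_Source', which is inferred from that single example. The `GENERIC_NAMES` table and both function names are hypothetical; the real mapping would live in the data dictionary.

```python
import csv
import io
import re

# Assumed header shape, inferred from 'DublinCore_-Creator-1-_Source'.
OMEKA_HEADER = re.compile(r"^DublinCore_-(?P<element>[A-Za-z]+)-(?P<n>\d+)-_Source$")

# Hypothetical element-to-generic-name table.
GENERIC_NAMES = {"Creator": "Name", "Title": "Title", "Date": "Date"}


def generalize_header(header):
    """Map an Omeka column name to a generic one; leave unknown columns untouched."""
    m = OMEKA_HEADER.match(header)
    if not m or m.group("element") not in GENERIC_NAMES:
        return header
    return f"{GENERIC_NAMES[m.group('element')]}_{m.group('n')}"


def generalize_csv(text):
    """Rewrite only the header row of a CSV export; data rows pass through."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows:
        rows[0] = [generalize_header(h) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(generalize_header("DublinCore_-Creator-1-_Source"))  # Name_1
```

With headers normalized this way, the XSLT only needs to know the generic names.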
