melaniewacker / dlc-metadata-cleanup
Metadata remediation in preparation for ingest into DLC/Staff Viewer/etc. Primarily Omeka data.
Collection names: We used to transcribe the form exactly as found in the archival collection portal, including capitalization. We have moved to using the form as found in the archival collection portal with title case UNLESS there are already a lot of items (other than those in the Jewels project) that use the lower case form in the SCV and DCV. Those will get cleaned up at a later point. For the time being we are trying to avoid split files, which can be caused by different capitalization. Please also consult the CUL_Collection_Names on Google Drive and let me know if there are any discrepancies.
Could you change collection names to title case (where appropriate)?
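A minimal sketch of the title-case rule described above, assuming the collection names are available as plain strings. The function name, the list of minor words, and the `exceptions` parameter (for names that should keep their existing lower-case form for now) are hypothetical and would need to be checked against CUL_Collection_Names.

```python
# Hypothetical list of words left in lower case when not in first position.
MINOR_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to"}


def title_case(name: str, exceptions: frozenset = frozenset()) -> str:
    """Return a collection name in title case.

    Names listed in `exceptions` are returned unchanged, which covers
    collections whose lower-case form is kept to avoid split files.
    """
    if name in exceptions:
        return name
    out = []
    for i, word in enumerate(name.split()):
        lower = word.lower()
        if i != 0 and lower in MINOR_WORDS:
            out.append(lower)
        else:
            # Upper-case only the first letter; leave the rest untouched
            # so acronyms and internal capitals survive.
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


print(title_case("Emilie Grace Briggs papers"))
```

Passing a name in `exceptions` leaves it as transcribed, so the later cleanup can be done in one pass.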
move out of holdingExternal into location/url
check if correct NAF form
add authority source if applicable
add valueURI if applicable
This is related to 033 and 033_000: This is actually a compilation of two different images. Both are included in the "Chinese Paper Gods" digital project. I think I located the correct "gods" so that we can use that information (e.g. that they are from the Anne S. Goodrich Collection plus the names of the gods) -- but please take a look and double-check me on that.
https://dlc.library.columbia.edu/catalog/ldpd:115059
https://dlc.library.columbia.edu/catalog/ldpd:114770
https://dlc.library.columbia.edu/catalog/ldpd:115097
Change current phrasing to:
Created and edited in general conformance to MODS Guideline (Version 3).
See: Jewels in Her Crown MODS Conversion (2014)
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
I don't believe the correct title has been pulled into the subject_title for 029_000 and 029. It appears the version on this printing block is only in Tibetan. "Polyglot" means it's in more than two languages. If there is no matching authority feel free to delete from this column and to record the information in a note.
Please add authority source (FAST) and subject URI for the subject_title (http://id.worldcat.org/fast/1357316) for 025_000 and 025
Wondering if maybe some of the drawings should be architectural drawings?
"Graphic delineations made for the design and construction (or documentation of design and construction) of sites, structures, details, fixtures, furnishings, and decorations, as well as other objects designed by an architect or architectural office."
Please take another look at the subjects. I think a few genres have crept into the subjects. E.g. the resource is a dissertation, but not about a dissertation.
The broad categories coming from the original site were changed to LCSH -- which works alright for "Music" for example. However, the items labeled "Social service" came from a category that was originally called "Philanthropy, Social Services, Human Rights". So by converting that to "social services" we ended up with a book about torture and political prosecution in that category. I've fixed that one, but I think the other items with that subject may need review as well.
Add attribute usage="primary" to primary (or first listed name)
See instructions on Jewels in Her Crown MODS Conversion (2014) page.
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
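As a rough illustration of the usage="primary" request, here is a sketch that marks the first top-level name in a MODS record. It assumes the record is parsed with Python's standard `xml.etree.ElementTree` and uses the MODS v3 namespace; the function name `mark_primary_name` is hypothetical, and the actual conversion may be done in XSLT instead.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def mark_primary_name(mods: ET.Element) -> None:
    """Set usage="primary" on the first top-level <name>, if none has it yet."""
    # Only direct children are considered, so names nested under
    # <subject> are not touched.
    names = mods.findall(f"{{{MODS_NS}}}name")
    if names and not any(n.get("usage") == "primary" for n in names):
        names[0].set("usage", "primary")


# Illustrative record with placeholder name values.
record = ET.fromstring(
    f'<mods xmlns="{MODS_NS}">'
    "<name><namePart>First Name</namePart></name>"
    "<name><namePart>Second Name</namePart></name>"
    "</mods>"
)
mark_primary_name(record)
```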
Robbie, Melanie: I just noticed that if in DLC you select the Urban collection and then look at the formats available, there is exactly one format, "drawings," with exactly one hit, a set model.
I assume there should probably be a metadata remediation ticket to remove the form element from that record?
And maybe a longer-term metadata enhancement to assign "set model" as a format for the 150+ set model images in the project -- since set models were a key component of the project. It's worth keeping a metadata blue sky list more generally, since once Hyacinth is available we will have an easier path to metadata enhancement and remediation.
/Stephen
The collection name for items 137 and 137_00 should be Emilie Grace Briggs Papers (i.e., "ie" instead of the "y" in Emily).
Format as CLIO_
See e-mail from 7/28/2014
Check for correct MARCorg codes. Two items are listed in the wrong repository. For:
September 11th Oral History Narrative and Memory Project ldpd.treasures.052
Marshall, Thurgood, 1908-1993 / Transcript of Oral History Interview ldpd.treasures.118
Use: NyNyCOH, full repository name is Columbia Center for Oral History
Treasures 072_001 and 072: The keyDateStart should read: 1921-11
According to instructions on Jewels in Her Crown MODS Conversion (2014) page.
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
Extent information sometimes needs to be moved out of extent and into a public note.
This has been done in the spreadsheet, but I think the physicalNote column has been accidentally mapped to physicalDescription/extent instead of note.
The column item - itemType - text currently maps into physicalDescription/form. However, it is a real mixed bag of phrases, for example "Gouache sent as a letter". Wondering if we should map it into physicalDescription/note instead, including the preceding text "Original format:"?
http://www.loc.gov/standards/mods/userguide/physicaldescription.html#note
Please remove this element. A decision was made to eliminate the accessCondition statement.
Add the "free text date" which in the MODS record will be a date without any attributes. The “free text date” is displayed to end users while the keyDate will be used for searching/sorting. For example the free text date in jewels_aal_000 should be
circa 1830
(Note: The free text date in the DLC is ca. 1830 but circa should be spelled out.)
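A small sketch of how the "ca." to "circa" cleanup could be automated, assuming the display dates are plain strings in a spreadsheet column. The helper name `spell_out_circa` is hypothetical, and only the abbreviation "ca." is handled; other uncertainty markers would need their own rules.

```python
import re

# Matches "ca." (any capitalization) plus any following whitespace.
_CA_ABBREVIATION = re.compile(r"\bca\.\s*", re.IGNORECASE)


def spell_out_circa(display_date: str) -> str:
    """Replace the abbreviation 'ca.' in a free text date with 'circa'."""
    return _CA_ABBREVIATION.sub("circa ", display_date.strip()).strip()


print(spell_out_circa("ca. 1830"))
```

Dates that are already structured or spelled out pass through unchanged.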
To formulate the free text date follow the instructions in the Omeka Data Dictionary:
"This is the display form of the date. Date may be entered in structured form, e.g., 1935-07-09 or textual form, e.g., July 9th, 1935. Use circa, approximately, or other appropriate term, when the date is uncertain, e.g., circa 1945. Spell out rather than abbreviate the term, thus circa not ca." https://wiki.cul.columbia.edu/display/metadata/Omeka+Data+Dictionary
Per instructions on Jewels in Her Crown MODS Conversion (2014) this element should always be populated.
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
Original Jewels identifier lacking. Column A in 2010 spreadsheet. See instructions on Jewels in Her Crown MODS Conversion (2014) page.
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
Should be a sublocation instead of a collection according to Alex T.
"It was a sublocation when I was doing Burke stuff. Don't think it would have changed. It is a collection I suppose in the sense that no new acquisitions would be classed there. But there may still be odd pockets in the cage or at Burke of old materials already classed as Union Rare but not yet cataloged that might still be recon-ned." E-mail 11/26/2014
Please add. See instructions on Jewels in Her Crown MODS Conversion (2014) page.
https://wiki.cul.columbia.edu/pages/viewpage.action?pageId=24904851
Items 224 and 224_000 link to the wrong subject title. The words and music depicted on the playing cards are actually those of the Beggar's Opera written by John Gay, not John Christopher Pepusch.
For treasures_128 and 128.001 please move initial article "Der" from title into nonSort
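A sketch of splitting an initial article into nonSort when building a MODS titleInfo, assuming Python's `xml.etree.ElementTree`. The article list, the function name, and the sample title are placeholders (the real title of treasures_128 is not reproduced here), and whether nonSort should carry a trailing space is a local convention this sketch does not settle.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Hypothetical, non-exhaustive list of initial articles.
ARTICLES = ("Der", "Die", "Das", "The", "A", "An")


def build_title_info(title: str) -> ET.Element:
    """Build <titleInfo>, moving a leading article into <nonSort>."""
    title_info = ET.Element(f"{{{MODS_NS}}}titleInfo")
    first, _, rest = title.partition(" ")
    if rest and first in ARTICLES:
        ET.SubElement(title_info, f"{{{MODS_NS}}}nonSort").text = first
        title = rest
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return title_info


# Placeholder title, used only to show the split.
element = build_title_info("Der Beispieltitel")
```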
Speculum Romanae Magnificentiae
item pid: ldpd:113081
asset pid: ldpd:112669
Collection of Designs in Architecture, Containing New Plans and Elevations of Houses, for General Use
item pid: ldpd:112738
Title page: ldpd:112769
Unnumbered page: ldpd:112887
Should we generalize current Omeka XSLT for wider projects? Data dictionaries would then map columns to generalized MODS elements, and data source instead would have columns changed as part of remediation.
This means it isn't dependent on Omeka data dump names like 'DublinCore_-Creator-1-_Source' but could be 'Name_1', 'Name_2', etc...
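One way to picture the proposed remediation step is a column-renaming pass that runs before the generalized transform. The sketch below is in Python rather than XSLT, and the mapping itself is an assumption: the issue names 'DublinCore_-Creator-1-_Source' and 'Name_1' as examples but does not say that one maps to the other.

```python
# Hypothetical mapping from Omeka dump column names to generalized names.
COLUMN_MAP = {
    "DublinCore_-Creator-1-_Source": "Name_1",  # assumed pairing, for illustration
}


def generalize_row(row: dict, column_map: dict = COLUMN_MAP) -> dict:
    """Rename known source columns; leave unrecognized columns as they are."""
    return {column_map.get(key, key): value for key, value in row.items()}


row = {"DublinCore_-Creator-1-_Source": "example value", "Other": "kept"}
print(generalize_row(row))
```

With this split, each project would maintain only its own mapping table, while the transform reads the generalized column names.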
Please remove date range from subject_name for item treasures_119 and 119_036
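If the same cleanup is needed in bulk, a sketch along these lines might help. It assumes the date range is a trailing four-digit year span such as "1908-1993" (or an open range such as "1950-"), which may not match the actual values in these two records; the function name and sample strings are illustrative only.

```python
import re

# Trailing year range, optionally preceded by a comma: ", 1908-1993" or ", 1950-".
_TRAILING_RANGE = re.compile(r",?\s*\d{4}\s*-\s*(?:\d{4})?\s*$")


def strip_trailing_date_range(value: str) -> str:
    """Remove a trailing year range from a subject_name value."""
    return _TRAILING_RANGE.sub("", value).rstrip()


print(strip_trailing_date_range("Marshall, Thurgood, 1908-1993"))
```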