A program to process and validate metadata spreadsheets, pulling additional data from MARC records using pymarc.

## Data Paths

    ~/Box\ Sync/PrangeMetadataStuff/CSV-data-conversion/csv/
    ~/Box\ Sync/PrangeMetadataStuff/CSV-data-conversion/excel/
    ~/Box\ Sync/PrangeMetadataStuff/CSV-data-conversion/marc/
    ~/Box\ Sync/PrangeMetadataStuff/CSV-data-conversion/tsv/

- Read Spreadsheet Data.
- Load MARC file into array using pymarc.
- Check Spreadsheet Header Rows Against One Another.
- Search for matching MARC records.
- Report on possible matches.
- Pull data over from MARC to main array.
- Output main array into single CSV file for ingest into Fedora.
- Remove brackets from author names (were used to indicate supplied names, but not needed for Digital Collections).
- Separate the term for "editor" (編, 編纂, 編集, 編輯) from the editor's name; likewise, remove the term for "author" (著) from the author column.
- Separate page count info from dimensions; create sum of page counts where multiple page counts have been listed.
- Remove Japanese dates from publication date field.
- Remove the Y abbreviation for Yen.
- Compare the call no. from the spreadsheet against MARC field 852h (where Aleph lists multiple call nos., check each one); if exactly one match is found, trust the match.
- If no match found, try matching 852i or combined 852h + i.
- If no match found, try matching after removing volume info from call no.
- If no match found, try matching on Author/Title.
- If multiple matches found, flag record for follow up.
- Output a report of all matches by each of the various methods, as well as a list of unmatched records from the spreadsheets.
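
The cleanup steps listed above (bracket removal, role terms, page counts, Yen) might look roughly like the following. This is only a sketch, not the program's actual code: the function names are hypothetical, and it assumes that role terms appear as a suffix of the name, that a semicolon separates page counts from dimensions, and that every number before the semicolon is a page count.

```python
import re

# Role terms named in the list above.
EDITOR_TERMS = ("編纂", "編集", "編輯", "編")
AUTHOR_TERM = "著"


def strip_brackets(name):
    """Drop the square brackets that marked supplied names."""
    return re.sub(r"[\[\]]", "", name).strip()


def split_role(name):
    """Return (name, role_term); assumes the term is a suffix of the name."""
    name = name.strip()
    for term in EDITOR_TERMS + (AUTHOR_TERM,):
        if name.endswith(term):
            return name[: -len(term)].strip(), term
    return name, ""


def split_extent(extent):
    """Return (total_pages, dimensions).

    Assumes a cell such as '4, 62p ; 21cm': page counts before the
    semicolon, dimensions after it.
    """
    pages_part, _, dimensions = extent.partition(";")
    total = sum(int(n) for n in re.findall(r"\d+", pages_part))
    return total, dimensions.strip()


def strip_yen(price):
    """Remove the 'Y' abbreviation for Yen from a price value."""
    return price.replace("Y", "").strip()
```

For example, under these assumptions `split_extent("4, 62p ; 21cm")` gives `(66, "21cm")` and `split_role("山田太郎編集")` gives `("山田太郎", "編集")`.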
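
The matching steps above form a cascade: each method is tried only if the previous one found nothing, and a result with more than one hit is flagged. A minimal sketch of that logic follows. It uses plain dictionaries in place of pymarc records; the field layout (`calls` as a list of 852 `(h, i)` pairs), the volume pattern, and joining h and i with a space are all assumptions made for illustration.

```python
import re


def strip_volume(call_no):
    """Remove a trailing volume marker such as 'v.2' (hypothetical pattern)."""
    return re.sub(r"\s*\b(?:v|vol|no)\.?\s*\d+\s*$", "", call_no,
                  flags=re.IGNORECASE).strip()


def candidates(rec, mode):
    """Call number strings a record offers under one matching mode."""
    if mode == "852h":
        return [h for h, i in rec["calls"]]
    if mode == "852i or 852h+i":
        return [i for h, i in rec["calls"]] + [f"{h} {i}" for h, i in rec["calls"]]
    if mode == "volume removed":
        return [strip_volume(h) for h, i in rec["calls"]]
    return []


def find_match(row, records):
    """Return (method, hits, needs_follow_up) for one spreadsheet row."""
    call_no = row["call_no"].strip()
    attempts = [
        ("852h", call_no),
        ("852i or 852h+i", call_no),
        ("volume removed", strip_volume(call_no)),
    ]
    for mode, wanted in attempts:
        hits = [r for r in records if wanted and wanted in candidates(r, mode)]
        if hits:
            return mode, hits, len(hits) > 1
    # Last resort: exact author/title comparison.
    hits = [r for r in records
            if r["author"] == row["author"] and r["title"] == row["title"]]
    if hits:
        return "author/title", hits, len(hits) > 1
    return "unmatched", [], False


# Made-up sample data, for illustration only.
records = [
    {"id": "a", "calls": [("PN-0001", "")], "author": "山田太郎", "title": "春"},
    {"id": "b", "calls": [("PN-0002", "c.1"), ("PN-0003", "")],
     "author": "佐藤花子", "title": "夏"},
    {"id": "c", "calls": [("PN-0004", "")], "author": "鈴木一郎", "title": "秋"},
    {"id": "d", "calls": [("PN-0004", "")], "author": "鈴木一郎", "title": "冬"},
]
row = {"call_no": "PN-0001 v.2", "author": "", "title": ""}
method, hits, needs_follow_up = find_match(row, records)
```

Returning the method name alongside the hits makes it straightforward to build the per-method report and the list of unmatched rows described in the last step.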