However, this file is not used anywhere in the scripts, and I am not sure where it comes from, although its fields match at least a subset of those listed in the data dictionary. I am using it because it is the only copy of the source data I can find, and it is a nice dataset for the Disparate Impact Analysis demos I want to build.
The source data used in hmda_sample_for_paper.py is read from a hardcoded path that is not available, so I think there is a disconnect here. That source data also contains many more fields than hmda_lar_2018_orig_mtg_sample.csv, with more refined information: instead of just derived_race (a summary of the race of the applicant and co-applicant), it has 5 race fields for each applicant (primary and co-).
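To make the mismatch concrete, here is a minimal sketch of how one might check which expected fields a CSV header actually contains. The detailed race field names below are hypothetical placeholders of my own, not taken from the scripts or the data dictionary, so substitute the real names before relying on the result.

```python
import csv

# Hypothetical names for the 5 race fields per applicant (primary and co-).
# These are an assumption for illustration; use the data dictionary's names.
DETAILED_RACE_FIELDS = (
    [f"applicant_race_{i}" for i in range(1, 6)]
    + [f"co_applicant_race_{i}" for i in range(1, 6)]
)


def read_header(lines):
    """Return column names from the first CSV row.

    `lines` can be any iterable of text lines, such as an open file.
    An empty input yields an empty list.
    """
    return [name.strip() for name in next(csv.reader(lines), [])]


def split_fields(header, expected):
    """Split `expected` into (present, missing) relative to `header`."""
    columns = set(header)
    present = [name for name in expected if name in columns]
    missing = [name for name in expected if name not in columns]
    return present, missing
```

To run it against the sample file, open it with `open("hmda_lar_2018_orig_mtg_sample.csv", newline="")`, pass the file object to `read_header`, and then call `split_fields(header, ["derived_race"] + DETAILED_RACE_FIELDS)`. If the sample really only carries the summary field, `derived_race` should land in the first list and the detailed fields in the second.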