This repository contains the data and notebook used to train a named entity recognition (NER) model with Google AutoML. The goal of the project was to train a model that could help analyze policy documents related to climate change and the environment. The source documents are hearing transcripts from the House Select Committee on the Climate Crisis, obtained from govinfo.
- 0_Deprecated: Old documents
- 1_Raw_Climate_Crisis_Text: Hearing transcript txt files from the House Select Committee on the Climate Crisis
- 2_Processed_Data: Final annotated txt files in jsonl format, plus the accompanying csv files used to create the dataset in Google Cloud Platform
- document_data_prep.ipynb: Notebook used to process the raw txt files into the final jsonl-formatted txt files
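
As a rough illustration of the kind of txt-to-jsonl conversion the notebook performs, here is a minimal sketch. It is not the notebook's actual code. It assumes the AutoML Natural Language entity extraction layout (a `text_snippet` plus `text_extraction` annotations with character offsets, where `end_offset` is exclusive); the function names, the sample sentence, and the `ORGANIZATION` label are hypothetical.

```python
import json


def make_example(text, spans):
    """Build one JSONL record from text and (start, end, label) character spans.

    Assumed layout: AutoML Natural Language entity extraction, with
    end offsets pointing one past the last character of the entity.
    """
    return {
        "text_snippet": {"content": text},
        "annotations": [
            {
                "text_extraction": {
                    "text_segment": {"start_offset": start, "end_offset": end}
                },
                "display_name": label,
            }
            for start, end, label in spans
        ],
    }


def write_jsonl(examples, out_path):
    """Write one JSON object per line, which is what makes the file JSONL."""
    with open(out_path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")


# Hypothetical sample: characters 4..6 ("EPA") carry a made-up label.
sample_text = "The EPA testified on emissions."
sample = make_example(sample_text, [(4, 7, "ORGANIZATION")])
```

Calling `write_jsonl([sample], "sample.jsonl")` would produce a one-line file. The accompanying csv files would then point the dataset import at the uploaded jsonl files; their exact contents are not reproduced here.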
By: Jack Seagrist