Record Linkage Exercise Using PySpark
This exercise analyzes data from the UC Irvine Machine Learning Repository. The data was fetched from their database and comes from a record linkage study in which patient records were matched on several features.
Using functions from the PySpark API, the data was transformed and used to build a scorer that predicts whether a patient record is a match, based on the features considered significant.
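To make the idea of a scorer concrete, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark session. It is not the exercise's actual implementation: the feature names, the treatment of missing values as 0, and the threshold of 4.0 are all assumptions chosen for illustration, since the text does not say which features were considered significant.

```python
from typing import Mapping, Optional, Sequence

# Assumed feature names; the original text does not list which columns it uses.
SIGNIFICANT_FEATURES = ("cmp_lname_c1", "cmp_plz", "cmp_by", "cmp_bd", "cmp_bm")


def score(record: Mapping[str, Optional[float]],
          features: Sequence[str] = SIGNIFICANT_FEATURES) -> float:
    """Sum the comparison values of the chosen features.

    Missing values (None or an absent key) count as 0, which is an assumption.
    """
    total = 0.0
    for name in features:
        value = record.get(name)
        total += 0.0 if value is None else float(value)
    return total


def predict_match(record: Mapping[str, Optional[float]],
                  threshold: float = 4.0,
                  features: Sequence[str] = SIGNIFICANT_FEATURES) -> bool:
    """Predict a match when the score reaches the (assumed) threshold."""
    return score(record, features) >= threshold


# Two hypothetical comparison records, made up for this example.
likely_match = {"cmp_lname_c1": 1.0, "cmp_plz": 1.0, "cmp_by": 1.0,
                "cmp_bd": 1.0, "cmp_bm": 1.0}
likely_non_match = {"cmp_lname_c1": 0.2, "cmp_plz": None, "cmp_by": 0.0,
                    "cmp_bd": 1.0, "cmp_bm": 0.0}

print(predict_match(likely_match))
print(predict_match(likely_non_match))
```

In a Spark job the same sum would typically be written as a column expression over a DataFrame instead of a per-record loop, and the threshold would be tuned against the labelled matches in the data.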