- A video presentation gives an overview of this project.
Dealing with incomplete observations is one of the challenging tasks in machine learning. Incomplete observations can lead to erroneous predictions, and the consequences of erroneous predictions can sometimes be disastrous. Missing data usually arises when values are omitted in the collection process, when they do not meet quality control criteria, or when they do not exist in the first place.
This dissertation starts from one question: "What if the user has a short amount of time to give incomplete observations, but still wants valuable results?" Many studies handle missing observations in training sets alone. However, background research shows that few studies cover how to deal with missing observations in the testing set when the training set is completely filled.
This dissertation compares four methods for dealing with missing observations: Deletion, Mean Imputation, Regression Imputation, and a custom K-nearest-neighbor Imputation. Two housing property datasets are used to see how the methods react to different dataset structures. The goal of this project is to provide an in-depth comparison of different methods for handling missing observations. The dissertation assumes that the training set has no missing observations.
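
To make the setting concrete, the sketch below shows generic textbook versions of the four approaches, applied to a test set with missing values while the training set is complete. It is only an illustration under stated assumptions: the function names, the toy arrays, and the plain unweighted K-nearest-neighbor variant are hypothetical and are not the dissertation's custom implementation. Missing values are assumed to be encoded as `NaN` in NumPy arrays.

```python
import numpy as np

def delete_incomplete(X_test):
    """Deletion: keep only the test rows that have no missing values."""
    return X_test[~np.isnan(X_test).any(axis=1)]

def mean_impute(X_train, X_test):
    """Mean imputation: fill each gap with the training mean of its column."""
    filled = X_test.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = X_train.mean(axis=0)[cols]
    return filled

def regression_impute(X_train, X_test):
    """Regression imputation: predict the missing features of a test row from
    its observed features, using a linear model fitted on the training set."""
    filled = X_test.copy()
    for i, row in enumerate(X_test):
        missing = np.isnan(row)
        if not missing.any() or missing.all():
            continue  # nothing to fill, or nothing observed to predict from
        # Design matrix: intercept column plus the observed features.
        A = np.column_stack([np.ones(len(X_train)), X_train[:, ~missing]])
        coef, *_ = np.linalg.lstsq(A, X_train[:, missing], rcond=None)
        filled[i, missing] = np.concatenate([[1.0], row[~missing]]) @ coef
    return filled

def knn_impute(X_train, X_test, k=2):
    """KNN imputation (plain variant): average the missing features over the
    k training rows closest to the test row on its observed features."""
    filled = X_test.copy()
    for i, row in enumerate(X_test):
        missing = np.isnan(row)
        if not missing.any() or missing.all():
            continue  # nothing to fill, or no observed features to compare
        dist = np.linalg.norm(X_train[:, ~missing] - row[~missing], axis=1)
        nearest = np.argsort(dist)[:k]
        filled[i, missing] = X_train[nearest][:, missing].mean(axis=0)
    return filled

# Hypothetical toy data: a complete training set and a test set with one gap.
X_train = np.array([[1.0, 1.0, 3.0],
                    [2.0, 1.0, 5.0],
                    [3.0, 2.0, 8.0],
                    [4.0, 3.0, 11.0]])
X_test = np.array([[2.0, 1.0, np.nan],
                   [3.0, 2.0, 8.0]])

print(delete_incomplete(X_test))
print(mean_impute(X_train, X_test))
print(regression_impute(X_train, X_test))
print(knn_impute(X_train, X_test, k=2))
```

Note that deletion on the testing set means no prediction is produced for the dropped rows, whereas the three imputation methods return a filled-in row that a model trained on complete data can still score.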