In this project, I'm going to walk through an example of machine learning with the goal of predicting the sale price of bulldozers.

1. Problem definition
How well can we predict the future sale price of a bulldozer, given its characteristics and previous examples of how much similar bulldozers have sold for?

2. Data
The data was provided by the client.
There are 3 main datasets:
Train.csv is the training set, which contains data through the end of 2019.
Valid.csv is the validation set, which contains data from January 1, 2020 to April 30, 2020. You make predictions on this set throughout the majority of the competition, and your score on this set is used to create the public leaderboard.
Test.csv is the test set, which contains data from May 1, 2020 to November 2020.

3. Evaluation
The evaluation metric is the RMSLE (root mean squared log error) between the actual and predicted auction prices.
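To make the metric concrete, here is a minimal sketch of how RMSLE could be computed. It assumes the common log(1 + x) convention, which keeps the metric defined when a price is 0; the text above does not say which variant is used. The function name `rmsle` and the sample prices are hypothetical and are not taken from the client's data.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared log error between actual and predicted values.

    Assumes the log(1 + x) convention and non-negative inputs.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Squared difference of log-transformed values for each prediction/actual pair
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(actual)) ** 2
        for actual, pred in zip(y_true, y_pred)
    ]
    # Mean of the squared log errors, then the square root
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up auction prices, for illustration only
actual_prices = [10000.0, 25000.0, 60000.0]
predicted_prices = [12000.0, 24000.0, 50000.0]
score = rmsle(actual_prices, predicted_prices)
print(f"RMSLE: {score:.4f}")
```

Because the errors are taken on a log scale, the metric reflects the ratio between predicted and actual price rather than the absolute difference, so being off by 10% on a cheap bulldozer counts about as much as being off by 10% on an expensive one. Lower values are better, and a perfect set of predictions scores 0.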