This project aims to predict Formula 1 race winners using machine learning.
Formula 1 dataset: https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020
Scala
Spark
Python 3.8+
Navigate to the streamlit folder and run the application with the command:
streamlit run app.py
The F1 race winner prediction project uses machine learning to predict Formula One race winners. Historical data from past Formula One seasons is used to train and test the models, and the trained model is then used to predict the winners of upcoming races.
The data files are loaded with Spark and Scala, which are also used for exploratory data analysis and data preprocessing. Once the data has been processed, Spark ML is used to build and train the machine learning models.
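As a rough illustration of the cleaning step, the sketch below uses plain Python rather than the project's Scala and Spark code. It assumes the Kaggle `results.csv` layout, where `\N` marks a missing value and a `positionOrder` of 1 identifies the race winner. The function name `clean_results` and the `is_winner` column are hypothetical.

```python
import csv
import io

MISSING = r"\N"  # assumption: the Kaggle CSVs use \N for missing values


def clean_results(csv_text):
    """Parse results rows, map missing markers to None, and add a winner label."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {key: (None if value == MISSING else value) for key, value in row.items()}
        # Skip rows that cannot be tied to a race or a finishing order.
        if record.get("raceId") is None or record.get("positionOrder") is None:
            continue
        # Hypothetical label column: 1 for the race winner, 0 otherwise.
        record["is_winner"] = 1 if int(record["positionOrder"]) == 1 else 0
        cleaned.append(record)
    return cleaned


# Tiny made-up sample in the assumed column layout.
sample = "raceId,driverId,grid,positionOrder\n18,1,1,1\n18,2,5,2\n18,3,\\N,3\n"
rows = clean_results(sample)
```

The same idea carries over to Spark: missing markers become nulls and the label is derived from the finishing order before the data is handed to Spark ML.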
Streamlit is used to create interactive data visualizations that help analyze the data and communicate the findings. Finally, the trained model is used to predict Formula One race winners.
The project is divided into several phases: data collection, data preprocessing, model development, model training and evaluation, and prediction. Completing the project should provide insight into the factors that determine Formula One race winners. The outcomes may be useful to race enthusiasts, betting companies, and F1 teams.
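One way to score the evaluation phase is per-race accuracy: for each race, take the driver with the highest predicted win probability and check that pick against the actual winner. The sketch below is an assumption about how this could look, not the project's own metric. The tuple layout and the name `race_winner_accuracy` are hypothetical.

```python
from collections import defaultdict


def race_winner_accuracy(predictions):
    """predictions: iterable of (race_id, driver_id, win_probability, is_winner)."""
    by_race = defaultdict(list)
    for race_id, driver_id, probability, is_winner in predictions:
        by_race[race_id].append((probability, driver_id, is_winner))

    if not by_race:
        return 0.0

    correct = 0
    for entries in by_race.values():
        # The predicted winner is the driver with the highest probability in the race.
        _, _, is_winner = max(entries, key=lambda entry: entry[0])
        correct += is_winner
    return correct / len(by_race)


# Made-up predictions for two races.
example = [
    (1, "A", 0.7, 1), (1, "B", 0.3, 0),  # race 1: top pick A won
    (2, "A", 0.6, 0), (2, "B", 0.4, 1),  # race 2: top pick A did not win
]
accuracy = race_winner_accuracy(example)
```

Scoring per race rather than per row avoids rewarding a model that simply predicts "not a winner" for every driver, since most rows in a race belong to non-winners.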
Data preprocessing and cleaning: Available in the data-processing-scala folder
ML prediction: Available in the prediction folder
Visualization: Available in the streamlit folder
Data from Kaggle is preprocessed and cleaned using Scala and Spark, then stored in the cleaned_data folder. The visualization reads its data from the cleaned_data folder.
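To show how the visualization might consume the cleaned output, the sketch below reads every CSV file in a folder and counts wins per driver. The column names `driverId` and `positionOrder` follow the Kaggle `results.csv` layout and are an assumption about the cleaned files. `wins_per_driver` is a hypothetical helper, not part of the project.

```python
import csv
from collections import Counter
from pathlib import Path


def wins_per_driver(folder):
    """Count first-place finishes per driver across all CSV files in `folder`."""
    wins = Counter()
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Assumed columns: driverId and positionOrder.
                if row.get("positionOrder") == "1":
                    wins[row["driverId"]] += 1
    return dict(wins)
```

Calling `wins_per_driver("cleaned_data")` from the project root would return a dictionary that the Streamlit app could turn into a table or chart, assuming the cleaned files are CSVs with those columns.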