This repository showcases an analysis using the Decision Tree Classifier algorithm on a banking dataset. The goal is to predict whether a customer will subscribe to a term deposit, which is a common classification problem in the banking sector. The analysis involves data preprocessing, training a decision tree model, and evaluating its performance.
The preprocessing steps ensure that our data is clean and suitable for modeling:
- Handling missing values
- Encoding categorical variables
- Splitting the dataset into training and testing sets
We train a Decision Tree Classifier with the following configuration:
DecisionTreeClassifier(random_state=42)
Post-training, we evaluate the model using accuracy and a detailed classification report:
- Accuracy is used as a measure of the overall performance.
- Precision, recall, and F1-score provide insights into the performance for each class.
The Decision Tree Classifier yields an accuracy of approximately 87%, with the following classification report:
precision recall f1-score support
0 0.94 0.92 0.93 700
1 0.36 0.42 0.39 73
accuracy 0.87 773
macro avg 0.65 0.67 0.66 773
weighted avg 0.88 0.87 0.88 773
Decision Tree Classifier.ipynb
: The Jupyter notebook containing the entire process of data preprocessing, model training, and evaluation.
The analysis demonstrates the potential of decision tree algorithms in classification tasks within the banking sector. The results highlight areas for further improvement and refinement, such as feature engineering and model tuning, to enhance the predictive performance.