Feature importance is a key tool in machine learning for identifying and removing uninformative features from a feature set. The goal of this report is to better understand the features in our dataset and to decide which features to select, using multiple strategies. The questions we answer include:
- What is feature importance and why do we need it?
- How do we judge whether a feature is important, in both relative and absolute terms?
- How do we rank the features based on feature importance?
- What strategies are available for computing feature importances?
- What is the best strategy to select the top k features?
- How do we rank the strategies for selecting our features?
- Is there a way we can automate our feature selection algorithm?
- How reliable are our feature importance values?
- How can we select features by comparing their importances against a null distribution of our response variable?
Please check the notebook named featimp.ipynb for more details and the implementation.
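
As a rough illustration of the last question in the list, the sketch below builds a null distribution by repeatedly shuffling the response variable and checks how often a feature's shuffled-response importance matches or exceeds its actual importance. It is not taken from the notebook: the function names (`abs_corr`, `null_importance_pvalues`), the use of absolute Pearson correlation as the importance score, the synthetic data, and the 0.05 cutoff are all assumptions chosen to keep the example small and dependency-free.

```python
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def null_importance_pvalues(features, y, n_shuffles=200, seed=0):
    """Empirical p-value per feature from a shuffled-response null distribution.

    `features` maps a (hypothetical) feature name to a list of values.
    """
    rng = random.Random(seed)
    actual = {name: abs_corr(col, y) for name, col in features.items()}
    exceed = {name: 0 for name in features}
    y_shuffled = list(y)
    for _ in range(n_shuffles):
        # Shuffling y breaks any real feature/response relationship,
        # so the resulting scores describe importance under "no signal".
        rng.shuffle(y_shuffled)
        for name, col in features.items():
            if abs_corr(col, y_shuffled) >= actual[name]:
                exceed[name] += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return {name: (exceed[name] + 1) / (n_shuffles + 1) for name in features}


# Synthetic data (assumed for illustration): one informative feature, one pure noise.
data_rng = random.Random(42)
n = 300
signal = [data_rng.gauss(0, 1) for _ in range(n)]
noise = [data_rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * s + data_rng.gauss(0, 1) for s in signal]

pvalues = null_importance_pvalues({"signal": signal, "noise": noise}, y)
selected = [name for name, p in pvalues.items() if p < 0.05]
print(pvalues)
print(selected)
```

The same idea carries over to other importance measures: replace `abs_corr` with the score of interest (for example, a model-based importance) and keep the shuffle-and-compare loop unchanged.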