convxgb_phishing_url_detection's Introduction

ConvXGB_Phishing_URL_detection

Hello! This repository contains a step-by-step implementation of the ConvXGB model for detecting phishing URLs on the Mendeley phishing-URL datasets. The datasets can be downloaded freely at https://data.mendeley.com/datasets/n96ncsr5g4

1.Data Gathering

Download all the files provided in the Mendeley dataset and extract them into a single folder. Then run the "1.txt2image_transformation.py" Python script, which needs only the rec_id and url fields from the Mendeley phishing-URL datasets as input. The script uses the Python Selenium library to revisit each URL and save the rendered webpage as a PNG image on local storage.
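The capture step described above can be sketched as follows. This is a minimal illustration, not the contents of "1.txt2image_transformation.py": the function names, the headless Chrome setup, the 30-second timeout, and the `<rec_id>.png` naming scheme are all assumptions made for the example. It assumes Selenium 4 and a local Chrome installation.

```python
from pathlib import Path


def make_driver(width=1366, height=768):
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome are installed)."""
    # Imported lazily so the helpers below can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument(f"--window-size={width},{height}")
    driver = webdriver.Chrome(options=options)
    driver.set_page_load_timeout(30)  # give up on slow or dead pages
    return driver


def screenshot_path(out_dir, rec_id):
    """Name each PNG after its rec_id so it can be joined back to the dataset."""
    return Path(out_dir) / f"{rec_id}.png"


def capture_all(rows, driver, out_dir):
    """Visit each (rec_id, url) pair and save a PNG; return the rec_ids that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for rec_id, url in rows:
        try:
            driver.get(url)
            saved = driver.save_screenshot(str(screenshot_path(out_dir, rec_id)))
            if not saved:
                failed.append(rec_id)
        except Exception:
            # Many phishing pages go offline quickly; record the failure and move on.
            failed.append(rec_id)
    return failed
```

A typical call would be `driver = make_driver()`, then `capture_all(rows, driver, "screenshots")`, then `driver.quit()`. Returning the failed rec_ids keeps a record of URLs that could not be revisited, so they can be excluded from later steps.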

2.EDA

Exploratory data analysis techniques are applied to the Mendeley phishing-URL datasets.
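As an illustration of the kind of summary such an analysis might start with, the sketch below computes simple descriptive statistics over the url field. The function name and the chosen statistics are assumptions for the example; the analyses actually performed in this repository may differ.

```python
from collections import Counter
from statistics import mean, median
from urllib.parse import urlparse


def summarize_urls(urls, top_n=3):
    """Return basic descriptive statistics for a non-empty list of URL strings."""
    if not urls:
        raise ValueError("urls must not be empty")
    lengths = [len(u) for u in urls]
    parsed = [urlparse(u) for u in urls]
    hosts = Counter(p.hostname or "" for p in parsed)
    schemes = Counter(p.scheme for p in parsed)
    return {
        "count": len(urls),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "max_length": max(lengths),
        "schemes": dict(schemes),
        "top_hosts": hosts.most_common(top_n),
    }
```

Looking at URL length and hostname frequency is a cheap first check for skew, for example a few hosts contributing most of the records.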

3.Data Pre-processing

Several tasks are performed in this data pre-processing section, including but not limited to screening each image for a completely white background and purging such images, a data similarity check, and the train-test split.
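Two of the tasks above, white-background screening and a similarity check, can be sketched as follows. This is a simplified illustration under stated assumptions: the thresholds (`white_level`, `min_fraction`) and function names are hypothetical, and the similarity check here only catches pixel-identical images, which may be narrower than the check used in this repository. The train-test split is not covered. The sketch assumes NumPy, plus Pillow for loading files.

```python
import hashlib

import numpy as np


def load_grayscale(path):
    """Load an image file as a 2-D uint8 array (requires Pillow)."""
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("L"))


def is_blank_white(pixels, white_level=250, min_fraction=0.999):
    """True if at least min_fraction of the pixels are at or above white_level."""
    return float(np.mean(pixels >= white_level)) >= min_fraction


def content_hash(pixels):
    """Hash the shape and raw bytes so that only pixel-identical images collide."""
    h = hashlib.sha256()
    h.update(str(pixels.shape).encode())
    h.update(np.ascontiguousarray(pixels).tobytes())
    return h.hexdigest()


def filter_images(images):
    """Drop blank and exact-duplicate images from a {rec_id: pixels} mapping."""
    kept, seen = [], set()
    for rec_id, pixels in images.items():
        if is_blank_white(pixels):
            continue  # page rendered as an empty white screen
        digest = content_hash(pixels)
        if digest in seen:
            continue  # same screenshot already kept under another rec_id
        seen.add(digest)
        kept.append(rec_id)
    return kept
```

Using a fraction of near-white pixels rather than requiring every pixel to be exactly 255 tolerates small rendering artifacts such as a scrollbar or antialiased edge.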

4.Data Validation

The data is validated for quality, missing values, and so on.
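A minimal sketch of such checks is shown below, assuming each record is a dictionary with rec_id and url keys and that each screenshot is stored as `<rec_id>.png` in one folder. The function name, the issue categories, and the file naming are assumptions for the example, not the repository's actual validation code.

```python
from pathlib import Path


def _clean(value):
    """Normalize a field to a stripped string; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def validate_records(rows, image_dir):
    """Collect basic quality problems in rows of {'rec_id': ..., 'url': ...}."""
    issues = {"missing_field": [], "duplicate_rec_id": [], "missing_image": []}
    seen = set()
    for row in rows:
        rec_id = _clean(row.get("rec_id"))
        url = _clean(row.get("url"))
        if not rec_id or not url:
            issues["missing_field"].append(row)
            continue
        if rec_id in seen:
            issues["duplicate_rec_id"].append(rec_id)
            continue
        seen.add(rec_id)
        # Every valid record should have a matching screenshot on disk.
        if not (Path(image_dir) / f"{rec_id}.png").is_file():
            issues["missing_image"].append(rec_id)
    return issues
```

Returning the problems grouped by type, instead of raising on the first one, makes it easier to decide whether to drop, repair, or re-crawl the affected records.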

5.Hypertuning experiments

Several hyperparameters are tuned to build the final_model. Each folder contains a Jupyter notebook of the experiment with its outputs.
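One common way to organize such experiments is an exhaustive grid search, sketched below. Whether the notebooks use grid search is not stated here, and the hyperparameter names in `EXAMPLE_GRID` are hypothetical placeholders. The `evaluate` callable stands in for training a model with the given settings and returning a validation score.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid.

    Returns (best_params, best_score, results), where results is a list of
    (params, score) pairs and a higher score is treated as better.
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


# Hypothetical search space; the real experiments may tune different settings.
EXAMPLE_GRID = {
    "conv_filters": [16, 32],
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6],
}
```

Keeping the full `results` list, not just the winner, makes it possible to compare runs afterwards in the same way the per-experiment notebooks keep their outputs.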

6.Final_model

The final model is the best-performing configuration from the tuning experiments.

Repository: yangguang-v / convxgb_phishing_url_detection