The dataset is the SpamAssassin dataset from Kaggle, provided as emails.csv. For preprocessing I used word tokenization, stopword removal, Porter stemming, and count vectorization. After training several models, I settled on Random Forest, XGBoost, and a Multilayer Perceptron, which gave me 98% accuracy.
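The preprocessing steps named above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the repository's actual code: the regex tokenizer, the tiny `STOPWORDS` set, and the helper names `preprocess` and `count_vectorize` are all assumptions made for the example. Stemming is left as a pluggable `stem` callable that defaults to the identity function; passing `nltk.stem.PorterStemmer().stem` would apply Porter stemming if NLTK is installed.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); NLTK's English list is much longer.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "you", "your", "for", "in", "at"}


def preprocess(text, stem=lambda w: w):
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def count_vectorize(docs, stem=lambda w: w):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in docs[i]."""
    processed = [preprocess(d, stem) for d in docs]
    # Sorted vocabulary gives every token a stable column index.
    vocabulary = sorted({tok for doc in processed for tok in doc})
    index = {tok: j for j, tok in enumerate(vocabulary)}
    matrix = []
    for doc in processed:
        row = [0] * len(vocabulary)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocabulary, matrix


# Two made-up messages, used only to show the shape of the output.
docs = [
    "Win a free prize, claim your free prize now",
    "Meeting at noon to review the report",
]
vocabulary, matrix = count_vectorize(docs)
print(vocabulary)
print(matrix)
```

Each row of `matrix` is the bag-of-words count vector for one email, and rows like these are what a classifier such as Random Forest, XGBoost, or a Multilayer Perceptron would be trained on.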
ashutosh-4485 / spam-email-classification