This project is an archive of a spam classifier that I built for my students, based on the Naive Bayes model.
WARNING: SHIT CODE. Use at your own risk. I may or may not be able to offer limited help, as I am not experienced.
- scikit-learn
- jieba
- joblib
The data was collected online and has the following format:
<label> <text>
...
By default, it should be named labeled.txt and placed in the ./data directory. You can change this as you wish.
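A minimal sketch of reading that format, not taken from main.py. It assumes the label and the text are separated by whitespace and that the label itself contains no spaces; the function name parse_labeled_lines and the "spam"/"ham" label values are made up for illustration.

```python
def parse_labeled_lines(lines):
    """Split each '<label> <text>' line into a (label, text) pair."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first run of whitespace so the text keeps its spaces.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip lines that have a label but no text
        pairs.append((parts[0], parts[1]))
    return pairs


# Hypothetical sample lines; the real file may use different label values.
sample = ["spam Win a free prize now", "", "ham See you at lunch"]
pairs = parse_labeled_lines(sample)
```

To use it on the real file, pass an open file object instead of the list, for example open("./data/labeled.txt", encoding="utf-8"). The encoding is an assumption.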
The list of stop words is also placed in ./data, under the name stop.txt.
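As a rough sketch of how a stop word list is typically applied (again, not the code in main.py): it assumes stop.txt holds one stop word per line and that the text has already been split into tokens, for instance by jieba. The helper names are hypothetical.

```python
def load_stop_words(lines):
    """Build a set of stop words from an iterable of lines, ignoring blanks."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop word set."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical stop words and tokens, for illustration only.
stop_words = load_stop_words(["the\n", "a\n", "\n", "of\n"])
tokens = ["the", "price", "of", "a", "ticket"]
filtered = remove_stop_words(tokens, stop_words)
```

Using a set keeps each membership check cheap, which matters when the stop word list is long.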
Read the code yourself; it should be easy to get this running.
Run
python main.py
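For orientation, here is a minimal sketch of a Naive Bayes text classifier built with scikit-learn. It is not the code in main.py: the bag-of-words features, the build_model helper, the whitespace tokenizer, and the toy English data are all assumptions chosen to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_model(tokenizer=str.split):
    """Word counts fed into a multinomial Naive Bayes classifier."""
    # token_pattern=None because the custom tokenizer replaces the default regex.
    vectorizer = CountVectorizer(tokenizer=tokenizer, token_pattern=None)
    return make_pipeline(vectorizer, MultinomialNB())


# Toy data, invented for illustration only.
texts = [
    "win free money now",
    "free prize click now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = build_model()
model.fit(texts, labels)
prediction = model.predict(["free money prize"])[0]
```

For Chinese text, a tokenizer such as jieba.lcut could be passed to build_model in place of str.split, and a fitted model can be saved and reloaded with joblib.dump and joblib.load.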