This experiment tests how well clustering can group news articles. It uses articles from the IEEE Spectrum website because every article there is clearly categorized, so the algorithm's output can be compared against the real classes.
article_df.csv contains the article texts used in this experiment; despite the .csv extension, it is tab-delimited. 13 clusters.csv and 26 clusters.csv are cross-tabulations of the clustering results against the articles' original categories.
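As a rough illustration of how a tab-delimited file can be read and how a cross-tabulation of this kind can be built with pandas, here is a minimal sketch. It uses a tiny in-memory stand-in instead of the real article_df.csv; the column names "category", "text" and "cluster" and the cluster labels are assumptions made for the example, not taken from the actual files or from the clustering used in this experiment.

```python
import io

import pandas as pd

# Toy stand-in for article_df.csv (tab-delimited despite the .csv extension).
# Column names "category" and "text" are hypothetical.
toy_tsv = (
    "category\ttext\n"
    "Robotics\tA new robot arm learns to grasp objects\n"
    "Robotics\tHumanoid robots walk on rough terrain\n"
    "Energy\tGrid-scale batteries store solar power\n"
    "Energy\tOffshore wind turbines grow larger\n"
)
articles = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Hypothetical cluster ids, one per article, standing in for a clustering result.
articles["cluster"] = [0, 0, 1, 0]

# Rows are original categories, columns are cluster ids, cells are article counts.
table = pd.crosstab(articles["category"], articles["cluster"])
print(table)
```

For the real file, the same call would presumably be pd.read_csv("article_df.csv", sep="\t"). A category whose articles mostly fall into a single column of the table was recovered well by the clustering; a row spread across many columns was not.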
Packages used: newspaper, pandas, numpy, bs4 (BeautifulSoup), tqdm, sklearn (scikit-learn)