- ThinkStat: statistical analysis
- knn_classification.ipynb: Iris, kNN, splitting data
- regression.ipynb: LinearRegression, MSE, RMSE, R2, Lasso, RidgeRegression, ElasticNet, KFold
- apriori.ipynb: histogram, Apriori, minimum support, itemsets, lift
- stanford_segmenter.ipynb: Stanford Chinese word segmenter
- text_clustering.ipynb: TF-IDF, Vectorizer, L2, KMeans
- topic_model.ipynb: English text preprocessing, topic model
- tutorial_matrix_factorization.ipynb: matrix factorization code example (optimized with gradient descent)
- tutorial_libfm: matrix factorization with libFM
- explore_user_data.ipynb: exploring the MovieLens data with pyspark
- pyspark_regression.ipynb: building a regression model with pyspark
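
For readers unfamiliar with the matrix factorization entry above, here is a minimal pure-Python sketch of the general technique: factorizing a sparse rating matrix with stochastic gradient descent. It is not the code from `tutorial_matrix_factorization.ipynb`. The function names (`factorize`, `rmse`), the toy ratings, and every hyperparameter value are assumptions chosen for illustration.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ r.

    ratings is a list of (user, item, rating) triples for observed entries.
    All default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Small random initial factors.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]

    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


# Toy data (made up for this sketch): 5 users, 4 items, 13 observed ratings.
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]

P, Q = factorize(ratings, n_users=5, n_items=4)
print("training RMSE:", rmse(ratings, P, Q))
```

The unobserved entries are never visited during training; after fitting, a prediction for any user and item pair is the dot product of the corresponding rows of `P` and `Q`.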