This repository contains Python examples and notes on developing and implementing various machine learning algorithms and data preparation techniques. It compiles notes from books, bootcamps, and research, put together for myself and others as a quick reference for implementations. Please feel free to contribute to the repository!
-- Jayson!
Training Machine Learning Algorithms for Classification
- Perceptron - Step by step implementation in Python
- Adaline - Working with the basics of optimizations, stochastic/batch gradient descent
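
As a quick orientation for the notebooks above, the perceptron learning rule can be sketched in a few lines of plain Python. This is a minimal sketch, not the code from the notebooks: the function names `predict` and `train_perceptron`, the learning rate `eta`, and the toy data are all assumptions made for illustration.

```python
def predict(x, w, b):
    """Return the class label (1 or -1) from the sign of the net input."""
    net_input = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if net_input >= 0.0 else -1


def train_perceptron(X, y, eta=0.1, n_epochs=10):
    """Fit weights with the perceptron rule; labels must be 1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(n_epochs):
        errors = 0
        for xi, target in zip(X, y):
            # The update is zero when the sample is already classified correctly.
            update = eta * (target - predict(xi, w, b))
            if update != 0.0:
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
                errors += 1
        if errors == 0:  # converged: every sample classified correctly
            break
    return w, b


# Hypothetical, linearly separable toy data.
X = [[2.0, 1.0], [3.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]

w, b = train_perceptron(X, y)
predictions = [predict(xi, w, b) for xi in X]
print(predictions)
```

Adaline differs mainly in that the update uses the continuous net input rather than the thresholded prediction, which is what makes gradient descent applicable.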
Training Artificial Neural Networks for Image Recognition
- Multi-Layer Perceptron (MLP) - Part 1 - Implementation of MLP algorithm & analysis
Training Machine Learning Algorithms for Regression
- Overview Regression - Examples on linear regression, polynomial regression, random forest regressor
Machine Learning Classifiers Using Scikit-learn
- Perceptron with sklearn - Demonstrating the perceptron algorithm in sklearn on the iris dataset
- Logistic Regression with sklearn - Working with logistic regression in sklearn & visualizing regularization
- Support Vector Machines (SVM) with sklearn - Using the linear and kernel SVM
- Decision tree learning, Random Forest - Impurity measures, such as Gini, Entropy & Classification Error
- K-nearest neighbors classifier (KNN) - Visualizing the lazy learning algorithm
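
The three impurity measures named in the decision tree entry can be written directly from their textbook definitions. The sketch below assumes each function receives a list `p` of class proportions at a node that sums to 1; the function names are illustrative and not taken from the notebooks.

```python
import math


def gini(p):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(pi ** 2 for pi in p)


def entropy(p):
    """Entropy in bits; classes with zero proportion contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def classification_error(p):
    """Classification error: 1 minus the largest class proportion."""
    return 1.0 - max(p)


# A perfectly mixed two-class node versus a pure node.
mixed = [0.5, 0.5]
pure = [1.0, 0.0]

for name, measure in [("gini", gini), ("entropy", entropy),
                      ("error", classification_error)]:
    print(name, measure(mixed), measure(pure))
```

All three measures are zero for a pure node and reach their maximum when the classes are evenly mixed.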
Clustering Analysis
- Clustering Analysis Part 1 - K-means/++, Elbow method, Silhouette plots
- Clustering Analysis Part 2 - Hierarchical trees, Distance matrix, Dendrograms
Building good datasets
- Data Preprocessing 1 - Handling missing, nominal and ordinal values
- Data Preprocessing 2 - Dataset partitioning, feature scaling & selecting
- Data Preprocessing 3 - Sequential backward selection (SBS), Feature importance with Random Forests
Compressing Data via Dimensionality Reduction
- Principal component analysis (PCA) - Unsupervised data compression
- Linear discriminant analysis (LDA) - Supervised dimensionality reduction
- Kernel principal component analysis (K-PCA) - Nonlinear dimensionality reduction
Model Evaluation and Hyper-parameter Tuning
- Model Evaluation 1 - Using pipelines and cross validation techniques
- Model Evaluation 2 - Learning curves, grid search, nested cross-validation
- Model Evaluation 3 - Precision, recall, F1-scores, ROC curves
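
The precision, recall, and F1 metrics listed above follow from counts of true positives, false positives, and false negatives. The sketch below is a plain-Python illustration under assumed names (`precision_recall_f1`, `positive`) and hypothetical labels; in practice scikit-learn's `sklearn.metrics` module provides equivalent functions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```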
Introduction
- The Classifier Interface - LinearSVC, RandomForest, Classifier Comparison
- The Regressor Interface - Ridge, RandomForestRegressor
- The Transformer Interface - StandardScaler, PCA, Dimensionality Reduction
- The Cluster Interface - KMeans, SpectralClustering, Overview of visuals
- The Manifold Interface - Unsupervised fitting with PCA, Isomap. Non-linear dimensionality reduction for visualization.
- Using Cross Validation - Splitting training/test and using cross validation to iterate scoring of classifiers
- Grid Searches - Recommend hyper-parameters (e.g., C, kernel, gamma) to be passed when building an estimator.
- Scikit Interface Summary - Quick recap on scikit-learn's interface
Model Complexity, Overfitting and Underfitting
- Model Complexity - Overfitting, Underfitting visuals
- Linear models with Scikit - Linear regression, linear classification, regularization
- Kernel SVMs with Scikit - Support vector machines, kernel SVMs, hyperparameters
- Random Forests Preview - Decision tree classification, random forest classifier
- Learning Curves - Learning curves for analyzing model complexity
- Validation Curves - For Analyzing Model Parameters
- Hyperparameter CV Objects - Efficient Parameter Search with EstimatorCV Objects
Using Pipelines in Scikit Learn
- Motivation for using pipelines - Why pipelines, how not to do grid-searches.
- Defining a pipeline and basic usage - Examples with and without pipelines.
- Cross-validation with pipelines - Cross-validation with/without pipelines
- Parameter selection with pipelines - Feature selection, grid-search using pipelines
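
To illustrate why the entries above favor pipelines, the sketch below wraps scaling and a classifier in one scikit-learn pipeline so that the scaler is refit on the training portion of every cross-validation fold, which avoids leaking test-fold statistics into preprocessing. The choice of the iris dataset, `StandardScaler`, `LogisticRegression`, and the `C` grid are assumptions for illustration, not the notebooks' own setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline behaves like a single estimator: fit() scales, then trains.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation refits the whole pipeline (scaler included) on each fold.
scores = cross_val_score(pipe, X, y, cv=5)
mean_accuracy = scores.mean()

# make_pipeline names each step after its lowercased class name, so the
# classifier's C parameter is addressed as "logisticregression__C".
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
best_c = grid.best_params_["logisticregression__C"]

print(mean_accuracy, best_c)
```

Scaling the full dataset before calling the grid search would let every fold see statistics computed from its own held-out samples, which is the pattern the pipeline avoids.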
- Python Machine Learning, Sebastian Raschka
- Applied Predictive Analytics, Dean Abbott
- Advanced Machine Learning with scikit-learn, Andreas Mueller
- Machine Learning, Stanford University (Andrew Ng) - Using Octave
- [140 Machine Learning Formulas](docs/140%20Machine%20Learning%20Formulas.pdf)