
machine_learning_module's Introduction

Here, I will add Machine Learning models, their usage, and explanations.

Types of ML

Based on human supervision:

  1. Supervised
  2. Unsupervised
  3. Semi-supervised
  4. Reinforcement

Based on data ingestion:

  1. Batch processing
  2. Online processing

Based on content:

  1. Discriminative: a discriminative model learns the decision boundary between the classes (the conditional probability distribution p(y|x)).
    • Logistic regression, SVMs, CNNs, RNNs, nearest neighbours.
  2. Generative: a generative model explicitly models the actual distribution of each class (the joint probability distribution p(x,y)).
    • Uses Bayes' rule to calculate p(y|x).
    • Naïve Bayes, Bayesian networks, Markov random fields, autoencoders, GANs.
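A minimal sketch of the generative idea above: model p(x|y) per class (here a 1-D Gaussian per class, i.e. a tiny Gaussian naive Bayes), then use Bayes' rule to obtain p(y|x). The data, means, and seed are illustrative toy values, not from the source.

```python
import numpy as np

# Toy 1-D data: two classes drawn from different Gaussians (illustrative only).
rng = np.random.default_rng(0)
x0 = rng.normal(loc=-2.0, scale=1.0, size=100)  # class 0 samples
x1 = rng.normal(loc=2.0, scale=1.0, size=100)   # class 1 samples

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generative step: estimate p(x|y) for each class, plus the prior p(y).
mu0, s0 = x0.mean(), x0.std()
mu1, s1 = x1.mean(), x1.std()
prior0 = prior1 = 0.5

def posterior_class1(x):
    # Bayes' rule: p(y=1|x) = p(x|y=1) p(y=1) / sum_y p(x|y) p(y)
    num = gaussian_pdf(x, mu1, s1) * prior1
    den = num + gaussian_pdf(x, mu0, s0) * prior0
    return num / den

print(posterior_class1(2.0))   # close to 1: x = 2 looks like class 1
print(posterior_class1(-2.0))  # close to 0: x = -2 looks like class 0
```

A discriminative model such as logistic regression would instead fit p(y|x) directly, without modelling how x is generated.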

Based on parameters:

  1. Parametric: a parametric model summarizes the data with a fixed number of parameters, independent of the number of training instances. E.g. linear regression, logistic regression, linear SVM ($w^{T}x + b = 0$), Linear Discriminant Analysis, the perceptron, Naive Bayes, simple neural networks.
  2. Non-parametric: models that do not make specific assumptions about the form of the mapping function. "Non-parametric" does not mean the model has no parameters, but that the number of parameters is not fixed in advance and can grow with the training data. E.g. k-Nearest Neighbors, decision trees, kernel SVMs.

A parameter is something that is estimated (learned) from the training data while training a model. Parameters can be weights, coefficients, support vectors, etc.
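The contrast can be sketched in a few lines: a parametric model (linear fit) compresses any amount of data into two numbers, while a non-parametric model (1-nearest-neighbour) keeps the whole training set. The toy data below is illustrative only.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0  # underlying linear relation, y = 2x + 1

# Parametric: linear regression summarizes the data with a fixed-size
# parameter set (slope and intercept), however many training points exist.
slope, intercept = np.polyfit(x, y, deg=1)

# Non-parametric: 1-NN stores the entire training set; its effective
# "parameters" grow with the data.
def one_nn_predict(x_new):
    idx = np.argmin(np.abs(x - x_new))
    return y[idx]

print(slope, intercept)      # ~2.0, ~1.0
print(one_nn_predict(2.2))   # prediction copied from the nearest stored point
```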

Other techniques

  1. Ensemble Learning Algorithms: Bagging, Boosting, stacking
  2. Deep Learning Algorithms: CNN, RNN
  3. Bayesian Learning Algorithms : Naive Bayes, Bayesian Networks
  4. Instance-Based Learning Algorithms: k-Nearest Neighbors (k-NN), Locally Weighted Regression (LWR)
  5. Clustering Algorithms : K-Means, Hierarchical clustering
  6. Dimensionality Reduction Algorithms: PCA (linear), t-SNE(non linear)
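As one concrete example from the list above, PCA (linear dimensionality reduction) can be sketched via the SVD of centred data; the principal axes are the right singular vectors. The correlated toy data and seed are assumptions for illustration.

```python
import numpy as np

# Toy 2-D data that mostly varies along one direction (columns are correlated).
rng = np.random.default_rng(1)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t + 0.1 * rng.normal(size=(200, 1))])

# PCA via SVD of the centred data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance_ratio = (S ** 2) / np.sum(S ** 2)

# Project onto the first principal component: 2-D -> 1-D.
X_1d = Xc @ Vt[0]

print(explained_variance_ratio)  # first component carries almost all variance
```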

Concepts

Maximum Likelihood Estimation (MLE) is a method that chooses the values of a model's parameters that maximise the likelihood of the observed data under an assumed probability distribution.
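A worked MLE sketch, assuming a Bernoulli model for illustrative coin-flip data: scanning candidate parameter values shows the log-likelihood peaks at the sample mean, which is the known closed-form MLE for this distribution.

```python
import numpy as np

# Observed coin flips: 7 heads out of 10 (illustrative data).
data = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

def log_likelihood(p):
    # Bernoulli log-likelihood of the observed data for parameter p.
    return np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

# Scan candidate parameters on a grid; the maximiser should be the
# sample mean (0.7), matching the analytical Bernoulli MLE.
candidates = np.linspace(0.01, 0.99, 99)
p_hat = candidates[np.argmax([log_likelihood(p) for p in candidates])]
print(p_hat)  # ~0.7
```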

Supervised ML Table

Regression

Here, the model learns the relationship between input features (independent variables) and a continuous output variable.

Machine Learning Models, Concepts and Usecases

Linear Regression

There are four assumptions:

  1. Linearity: the relationship between X and the mean of Y is linear. $Y=\beta_{0}+\beta_{1}X+\epsilon\text{ (error term)}$
     Detection: residual plots (against X) should show points evenly spread with no pattern.
  2. Homoscedasticity: the variance of the error terms is similar across the values of the independent variables. A plot of standardized residuals versus predicted values can show whether points are equally distributed across all values of the independent variables.
  3. Little to no multicollinearity: independent variables are not highly correlated with each other. This assumption is tested using Variance Inflation Factor (VIF) values. One way to reduce multicollinearity is mean-centering the variables.
  4. Normality: residuals should be normally distributed. This can be checked with a histogram of the residuals.

  Notes:
    • Feature scaling is required.
    • Sensitive to missing values.
    • Good for sparse, high-dimensional data.

  Usecases:
    1. Advanced House Price Prediction
    2. Flight Price Prediction

Polynomial Regression

The higher the degree of the polynomial, the better the fit, but the greater the risk of overfitting.
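A minimal ordinary-least-squares sketch on synthetic data, showing the residuals that the assumptions above are stated in terms of. The data-generating values (intercept 1, slope 2, noise scale, seed) are illustrative assumptions.

```python
import numpy as np

# Synthetic data: y = 1 + 2x plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1]
residuals = y - X @ beta

print(beta)              # close to [1.0, 2.0]
print(residuals.mean())  # ~0: OLS residuals average to zero when an intercept is fit
```

In practice one would plot `residuals` against `x` and against the fitted values to check the linearity and homoscedasticity assumptions visually.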

Evaluation Metrics: Regression models are typically evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²), among others.
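The regression metrics above computed from their definitions on small illustrative arrays:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.5])

mae = np.mean(np.abs(y_true - y_pred))            # Mean Absolute Error
mse = np.mean((y_true - y_pred) ** 2)             # Mean Squared Error
rmse = np.sqrt(mse)                               # Root Mean Squared Error
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                          # R-squared

print(mae, mse, rmse, r2)  # 0.375 0.1875 0.433... 0.9625
```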

Classification

Machine Learning Models, Concepts and Usecases

Logistic Regression

Assumptions:

  1. The outcome is binary.
  2. Linear relationship between the logit of the outcome and the predictors.
     Logit function: $\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$, where $p$ is the probability of the outcome.
  3. No outliers/extreme values in the continuous predictors.
  4. No multicollinearity among the predictors.

Sigmoid/logistic function: an S-shaped curve that maps any real number into the interval (0, 1): $f(x)=\frac{1}{1+e^{-x}}$

  Notes:
    • Feature scaling is required.
    • Sensitive to missing values.

Decision Tree

Support Vector Machines
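A quick numerical check of the sigmoid and logit functions defined above (the logit is the inverse of the sigmoid); plain Python with illustrative inputs:

```python
import math

def sigmoid(z):
    # Maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Log-odds of probability p; inverse of the sigmoid.
    return math.log(p / (1.0 - p))

print(sigmoid(0.0))           # 0.5: the curve's midpoint
print(logit(sigmoid(2.0)))    # ~2.0: logit undoes sigmoid
```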

Evaluation Metrics: Classification models are evaluated using metrics such as accuracy, precision, recall, F1-score, and the confusion matrix, depending on the problem and class distribution.
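The classification metrics above computed by hand from confusion-matrix counts on toy predictions (illustrative labels only):

```python
# Toy binary labels and predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)                       # of predicted positives, how many are right
recall = tp / (tp + fn)                          # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```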

Structuring ML

(image not included)

ML Algos

(image not included)

Sources:
  • https://github.com/nvmcr/DataScience_HandBook/tree/main/Machine_Learning
  • https://github.com/dhirajmahato/Machine-Learning-Notebooks

