
Ground-Up Machine Learning

Fast-Track to Machine Learning:
A Curriculum Crafted for Newbies and Busy Bees


Welcome to Ground-Up Machine Learning (GML) - a personal project that sprang from my desire to demystify the world of machine learning (ML) for friends and colleagues who expressed a keen interest but found themselves overwhelmed by the existing resources. This course is my attempt to break down ML into digestible, engaging, and accessible segments, ensuring that anyone, regardless of their background, can grasp the fundamentals of machine learning and see its beauty and utility in the modern world.

Table of Contents

  1. Introduction & Overview
  2. Week 1: Introduction to ML & Supervised Learning - Regression
  3. Week 2: Supervised Learning / Classification - From Basics to Deep Dive into Logistic Regression
  4. Week 3: Unsupervised Learning & Clustering

Introduction & Overview

The Genesis of GML

Last month, my brother came to me and said that he and some of his friends were interested in learning about Machine Learning, and asked whether I would be willing to hold a series of introductory lessons for them. Since I love teaching, and since my favorite crowd is people who willingly put themselves out there to learn, I jumped at the chance and excitedly agreed.

While discussing the time requirements and how much work I could actually ask of them for an extracurricular series (they are all already working hard on other things), we realized that there is little time and so much to learn.

This led me to research how a busy person with no background can get exposed to Machine Learning as a whole without getting lost in the details. That pushed most of the math out the door, and we settled on a commitment of 8 hours a week (2 hours of lecture + 6 hours of self-study).

In my research I couldn't find any resource that satisfied our criteria:

  • Offers a structured, short introduction to ML that explores its various types.
  • Doesn't expect students to have prior knowledge beyond basic Python and high-school math.
  • Focuses on only the necessary ML components and theory before jumping into a practical example.
  • Emphasizes simple implementations and from-scratch homework to enhance comprehension.
  • Expects students to commit only 5-8 hours a week.

So I created one for my class, hoping that I can use it in the future as well!

Target Audience

This curriculum is mainly designed for:

  • Absolute beginners curious about machine learning.
  • Busy people looking for a streamlined yet meaningful overview of ML.
  • Educators looking for a structured guide to introduce ML concepts to students (it could also serve as a starting point for a longer course).

I think it's important to mention that the topics are not really designed for full self-study, so a teacher or a study group might be necessary. See Course Structure for more information.

Prerequisites

  • A foundational knowledge of Python is expected; the course briefly touches on Python basics but primarily focuses on applying it in an ML context.
  • While not math-heavy, a basic understanding of calculus and linear algebra can help your learning experience.

Note

Nonetheless, none are dealbreakers; with a willingness to learn and explore these topics on your own, you'll find this curriculum both manageable and rewarding. It might just take a little longer and a little more self-study to understand some parts.

Tools and Libraries

Throughout the lectures we leverage several Python tools and libraries:

  • Python: The primary language of instruction.
  • NumPy and Pandas: For numerical computing and data manipulation.
  • Matplotlib/Seaborn: For visualizing data.
  • Scikit-learn: For easy-to-use ML models.

That said, prior familiarity with these libraries is not required for the course.

Course Structure

GML unfolds over four weeks, each dedicated to a different part of machine learning:

  1. Introduction to ML & Supervised Learning - Regression
  2. Supervised Learning - Classification
  3. Unsupervised Learning & Clustering
  4. Introduction to Reinforcement Learning

Important

Each week includes a lecture guide and lab materials designed not as exhaustive resources but as interactive guides akin to slides that a teacher can use to lead discussions, demonstrations, and hands-on projects. The course is crafted with the classroom in mind, requiring an instructor to breathe life into the content. As such, it's perfect for educators or study groups seeking a structured path to explore ML together.

Expected Effort

To fully benefit from this curriculum, students/participants are expected to commit to approximately 2-3 hours of lecture time (really depends on the teacher) and 6-7 hours of self-study each week. This self-study time includes going through lab materials, completing assignments, and additional reading or practice as needed. The weekly structure is flexible and can be adjusted based on individual or group needs, making the course adaptable to different learning environments.

Note

I originally envisioned the labs as a second part of the class that we would go through together with students, but that unfortunately wasn't possible, since the lectures already ran a little over 2 hours and my students couldn't stay longer due to time differences.

Ideal Way to Follow the Curriculum

GML is structured to maximize learning through interaction, hands-on practice, and experimentation. To get the most out of this course, we recommend the following approach:

  1. Engage with the Lecture Content: Whether you're learning solo with an online community, in a classroom, or part of a study group, start by digesting the lecture content together. This collaborative approach allows for discussion, clarification of concepts, and sharing of insights, making the learning experience richer and more comprehensive.

  2. Hands-on Practice with Lab Material: After the lecture, dive into the lab material. This is where you'll get practical experience with the topics and methods introduced during the lecture. The labs are designed to be interactive, allowing you to apply what you've learned in a guided environment. Ideally, this should be done with the support of a teacher or within your study group, providing a collaborative space to explore and learn from each other.

  3. Weekly Assignments: Each week, you'll be tasked with an assignment that encourages you to implement and experiment with the week's topics. These assignments are crucial for deepening your understanding, as they require you to build concepts from scratch. Writing and debugging your code, and reflecting on your approach, will ensure you have a solid grasp of the material. These assignments are not only about reinforcing what you've learned but also about fostering creativity and problem-solving skills in real-world scenarios.

Why This Approach?

Learning machine learning, or any complex subject, is best achieved through a blend of theoretical understanding and practical application. By following this structured approach, you're not just passively absorbing information; you're actively engaging with the material, applying it in practical contexts, and solidifying your understanding through creation and collaboration. This methodology is designed to cater to diverse learning styles and to accommodate varying schedules, making ML accessible to everyone interested in embarking on this journey.

Join the Journey

GML is more than just a course; it's an invitation to explore the world of machine learning in a way that's engaging, accessible, and, most importantly, grounded in real-world application. Whether you're teaching a class, learning with peers, or guiding yourself through the fundamentals of ML, GML offers the tools, insights, and inspiration to embark on this exciting journey.

Welcome to Ground-Up Machine Learning. Let's discover the power of machines that learn, together.

Week 1: Introduction to ML & Supervised Learning - Regression

This week, we start with the basics of Machine Learning, then move on to the fundamentals of Supervised Learning, focusing on Regression. We also introduce the tools we will be using throughout the course, and discuss the importance of understanding data in Machine Learning.

I found that the best way to learn is by doing, so when teaching or studying this course I emphasize asking lots of questions and having plenty of discussion. In the notebooks I also tried to add plenty of interactive examples to make the learning process more fun and engaging.

Lecture Table of Contents

Topics, details, and resources (Lecture Notebook / Lab Notebook):
Introduction to Machine Learning and Its Evolution
  • History and Evolution of AI and Machine Learning: Quick overview, from perceptrons to deep learning.
  • Distinction between ML/AI/DS/DL: Clarifying the terms Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Data Science (DS), and how they relate to each other.
  • Types of ML: Explaining the different Machine Learning methods: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
Introduction to the Tools We Will Utilize
  • Interactive Notebooks: Showing how we make use of interactive Python notebooks, why they are a good choice for us, and how to use them.
  • Python: A very quick and brief reminder of some Python components.
  • NumPy: Mentioning the basic functions of NumPy and how to use it.
  • Pandas: A tad bit of this as well!
Understanding Data in Machine Learning
  • Features and Targets: Introduction to the concepts of features (independent variables) and targets (dependent variables).
  • Data Visualization: Demonstrating the use of plots to explore relationships in data.
Fundamentals of Supervised Learning: Regression
  • Introduction to Supervised Learning: Basic introduction to what supervised learning is, and specifically regression.
  • Real-world examples: Discussing some real world scenarios and how regression is used in the real world.
  • Classification vs. Regression: Highlighting the differences, focusing on the predictive nature of regression for continuous outcomes.
  • Linear Regression Basics: Introduction to the simplest form of predictive modeling.
  • Loss Functions: Discussion on different ways to measure model accuracy, emphasizing Mean Squared Error (MSE) and Mean Absolute Error (MAE) for their simplicity and interpretability.
  • Overfitting and Underfitting: Concepts of overfitting and underfitting, and how model complexity affects them.
  • Train-Test-Val Split: Importance of splitting data into training, test and validation sets to evaluate model performance realistically.
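To make the topics above concrete, here is a minimal sketch of a linear-regression fit with a train-test split and MSE evaluation in NumPy. The data is a toy synthetic set invented for illustration (not course data), and the fit uses the least-squares normal equations via `np.linalg.lstsq`:

```python
import numpy as np

# Toy synthetic data: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3 * X + 2 + rng.normal(0, 1, size=100)

# Train-test split (80/20) by shuffling indices
idx = rng.permutation(len(X))
train, test = idx[:80], idx[80:]

# Fit y = w*x + b by least squares: stack a column of ones for the intercept
A = np.column_stack([X[train], np.ones(len(train))])
w, b = np.linalg.lstsq(A, y[train], rcond=None)[0]

# Evaluate with Mean Squared Error on the held-out test set
pred = w * X[test] + b
mse = np.mean((y[test] - pred) ** 2)
print(f"w={w:.2f}, b={b:.2f}, test MSE={mse:.2f}")
```

Because the model is evaluated on points it never saw during fitting, the test MSE gives a more honest picture of performance than the training error — the motivation behind the train-test-val split above.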

Homework Assignment: Building a Linear Regression Model from Scratch

Task(s):

  • Build the linear regression model from the ground up, along with all the bits and pieces we discussed in class, using only Python and NumPy.
  • Use the model you built to train a regression model on a dataset you select. Use https://www.kaggle.com/datasets to find a dataset and a question you want to answer!
  • Then, using your model and some data cleaning, answer your question.
  • Apply all the concepts you learned in class (and maybe even more) to solidify what we learned!

Week 2: Supervised Learning / Classification - From Basics to Deep Dive into Logistic Regression

In the second week, we move towards understanding Classification. We will start by covering the foundational elements of classification problems, including decision boundaries and the difference between binary and multiclass classification. We will then take a deep dive into Logistic Regression, exploring its mathematical foundations and practical applications.

The week will be rich with hands-on exercises to solidify these concepts, culminating in a homework assignment that challenges students to explore a new classification algorithm.

Lecture Table of Contents

Topics, details, and resources (Lecture Notebook / Lab Notebook):
Introduction and Why Classification Matters
  • Week 1 Recap: Brief recap of Week 1, emphasizing the transition from regression to classification.
  • The Importance of Classification and Real-World Applications: Brief introduction to classification problems and their significance in various domains such as healthcare, finance, and technology.
Building Intuition for Classification
  • Understanding Decision Boundaries: Introduction to the concept with visual aids.
  • Binary vs. Multiclass Classification: Discussing the differences with examples.
  • Different Classification Algorithms: High-level overview of popular algorithms.
Deep Dive into Logistic Regression
  • Introduction to Logistic Regression: Detailed explanation focusing on its probabilistic nature.
  • The Mathematics of Logistic Regression: Including logistic function, odds, and log-odds.
  • Maximum Likelihood Estimation (MLE): Explaining the concept and its implementation in logistic regression.
Gradient Descent for Optimization
  • Explanation of gradient descent as an optimization technique.
  • Detailing the cost function in logistic regression.
Model Evaluation Metrics
  • Starting with Accuracy: Explanation and significance.
  • Beyond Accuracy: Introduction to more advanced metrics like Precision, Recall, and ROC Curve.
Feature Handling and Preprocessing (Time Permitting)
  • Importance of feature scaling and handling categorical variables in logistic regression.
Regularization and Avoiding Overfitting (Time Permitting)
  • Discussion on overfitting and introduction to L1 and L2 regularization techniques.
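As a rough sketch of how the pieces above fit together — the sigmoid, the log-loss gradient, and batch gradient descent — here is a minimal from-scratch logistic regression in NumPy on a toy two-blob dataset (invented for illustration, not the course dataset):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean cross-entropy (log-loss)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)  # gradient of mean log-loss w.r.t. w
        grad_b = np.mean(p - y)          # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, well-separated binary data: two Gaussian blobs in 2D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),
               rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = fit_logistic(X, y)
acc = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

The gradient `X.T @ (p - y) / n` is exactly the derivative of the mean log-loss covered in the MLE and gradient-descent sections, which is why the same update rule reappears in the lab implementation.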

Hands-On Lab Session: Multi-variable Logistic Regression Implementation

Objective: Apply logistic regression to a multi-variable dataset to deepen understanding of feature handling, model implementation, and evaluation.

Tasks:

  1. Feature Preprocessing: Discuss and implement preprocessing steps including feature scaling and handling categorical variables.
  2. Logistic Regression Implementation: Complete guided lab notebook code for logistic regression and gradient descent using vectorized implementations with NumPy.
  3. Training and Evaluation: Train the model on the training set and evaluate its performance on the test set using accuracy and at least one other metric (e.g., F1 score).
  4. Visualization: Visualize the decision boundary and model predictions on the test set to understand the model's performance. Also:
    • Plot the ROC curve and calculate the AUC score.
    • Investigate the learning curve to understand model performance.
  5. Compare with Scikit-learn: Implement logistic regression using Scikit-learn and compare the results with your implementation.
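For the final comparison step, a minimal scikit-learn pipeline might look like the following. Note that `make_classification` is a stand-in for the lab dataset, which is not specified here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the lab dataset
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Feature scaling (mirrors the preprocessing in task 1); fit on train only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

clf = LogisticRegression().fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 score:", f1_score(y_test, pred))
```

Comparing these numbers (and the learned coefficients in `clf.coef_`) against your own vectorized implementation is a quick sanity check that your gradient descent converged to a similar solution.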

Homework Assignment: Exploring a New Classification Algorithm

Objective: Implement and evaluate a classification algorithm not covered in detail in class, such as k-Nearest Neighbors (k-NN) or a Decision Tree, on a multi-variable dataset. This assignment aims to deepen your understanding of classification algorithms and their practical applications, along with research and reflection on the learning process.

Tasks:

  1. Research and Implementation:

    • Select a classification algorithm not covered in the lecture (e.g., SVM) and research its working principles, advantages, and limitations.
    • Implement the selected algorithm and fit it on the same dataset used in the lab, following a similar structure to the logistic regression implementation.
    • Train the model on the training set and evaluate its performance on the test set using accuracy and at least one other metric.
    • Visualize the decision boundary of the model on the dataset.
  2. Comparison and Reflection:

    • Compare the performance of your model to the logistic regression model from the lab assignment. This comparison should include aspects like accuracy, computational efficiency, and ease of interpretation.
    • Write a short reflection on your learning experience, discussing the challenges you faced, how the chosen algorithm differs from logistic regression, and in what scenarios you would prefer one over the other.
  3. Advanced Exploration (Optional):

    • Here are some optional tasks to explore advanced features of your chosen algorithm:
      • Implement a hyperparameter tuning strategy (e.g., grid search or random search) to optimize the model's performance.
      • Investigate the impact of different hyperparameters on the model's performance and visualize the results.
      • Research and implement a more complex variant of the algorithm (e.g., kernel SVM) and compare its performance to the basic implementation.
      • Try a different dataset and compare the performance of the two algorithms on this new dataset (you can use a dataset from sklearn.datasets or kaggle.com).
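To give a feel for the scale of this assignment, here is a minimal from-scratch sketch of one of the candidate algorithms, k-NN (majority vote over Euclidean nearest neighbors; no tie-breaking or distance weighting). Treat it as a starting point, not a complete solution:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=5):
    """Classify each query point by majority vote of its k nearest
    training points under Euclidean distance."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every train point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
        preds.append(Counter(nearest).most_common(1)[0][0])
    return np.array(preds)

# Toy usage: two well-separated blobs
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0, 1, size=(30, 2)),
                     rng.normal(6, 1, size=(30, 2))])
y_train = np.array([0] * 30 + [1] * 30)
preds = knn_predict(X_train, y_train, np.array([[0.0, 0.0], [6.0, 6.0]]))
print(preds)  # one query near each blob center
```

Unlike logistic regression, k-NN has no training phase at all — the "model" is the dataset itself — which is exactly the kind of contrast the reflection in task 2 asks you to discuss.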

Week 3: Unsupervised Learning & Clustering

In the third week, we look at how to approach Machine Learning when the data comes without answers (labels). As in the previous weeks, we start with the basics of Unsupervised Learning, focusing on Clustering, and then take a deep dive into K-Means Clustering - one of the most popular clustering algorithms.

If time permits, we will also discuss the concept of Dimensionality Reduction and how it can be used to reduce the complexity of the data.

I wanted to keep this week shorter and use a mostly learn-and-apply style of lecture: we cover a section, then students are sent off to rooms/groups to complete some tasks (coding for the most part).

Lecture Table of Contents

Topics, details, and resources (Lecture Notebook / Lab Notebook):
Introduction to Unsupervised Learning
  • Basics of Unsupervised Learning: Introduction to ML without labeled data.
  • Different Types of Unsupervised Learning: Talking a bit about Clustering, Dimensionality Reduction, and Association Rule Learning.
  • Real-world Applications: Discussing the significance of unsupervised learning in various domains.
Exploring Data with Clustering
  • Introduction to Clustering: Basics of clustering and its applications.
  • Types of Clustering: Overview of Hierarchical Clustering, K-Means Clustering, and DBSCAN.
  • Real-world Examples: Discussing how clustering is used in customer segmentation, anomaly detection, and more.
Deep Dive into K-Means Clustering
  • Introduction to K-Means: Detailed explanation of the algorithm and its working principles.
  • Choosing the Right Number of Clusters: Discussing the elbow method and silhouette score for optimal cluster selection.
  • Implementation: Basic implementation using plain Python and NumPy.
  • Model Evaluation: Exploring metrics like inertia and silhouette score to evaluate model performance.
Dimensionality Reduction Explained
  • Introduction to the Idea: Discussion of how dimensionality reduction works.
  • PCA, t-SNE and UMAP basics: Introducing some very common dimensionality reduction techniques and explaining how they work.
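The K-Means deep dive above can be sketched in a few lines of NumPy. This is Lloyd's algorithm in its simplest form (random data points as initial centroids, no k-means++ initialization), shown with inertia as the evaluation metric mentioned above:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm): assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # (n, k) distance matrix -> nearest-centroid assignment per point
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute centroids; keep the old one if a cluster goes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # converged: assignments no longer move the centroids
        centroids = new
    # Inertia: total squared distance of points to their assigned centroid
    inertia = np.sum((X - centroids[labels]) ** 2)
    return labels, centroids, inertia

# Toy data: three well-separated 2D blobs
rng = np.random.default_rng(3)
X = np.vstack([rng.normal((0, 0), 0.5, size=(40, 2)),
               rng.normal((5, 5), 0.5, size=(40, 2)),
               rng.normal((0, 5), 0.5, size=(40, 2))])
labels, centroids, inertia = kmeans(X, k=3)
print("inertia:", inertia)
```

Running this for several values of `k` and plotting the resulting inertias is precisely the elbow method from the "Choosing the Right Number of Clusters" section.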

Hands-On Lab Session: ...

TBD

Homework Assignment: ...

TBD

Contributions

We welcome contributions to the Ground-Up Machine Learning (GML) curriculum! Whether you're interested in adding new content, suggesting improvements, or fixing bugs, your input is valuable in making this an even better resource for everyone.

To contribute, please follow these steps:

  1. Fork the Repository: Start by forking the GML repository to your own GitHub account.
  2. Make Your Changes: Whether it's adding new materials, correcting typos, or suggesting enhancements, make your changes in your forked version.
  3. Submit a Pull Request: Once you're happy with your updates, submit a pull request back to the main GML repository. Please provide a clear description of your changes and the reasons for them.
  4. Review Process: Your pull request will be reviewed by me or the maintainers of the repo. We may engage with you for discussions or request modifications before merging your contributions.

For significant changes or new content, we recommend opening an issue to discuss your ideas with us before proceeding. This collaborative approach ensures that we maintain the integrity and coherence of the curriculum while incorporating the community's valuable insights.

License

The Ground-Up Machine Learning (GML) course is made available under the GNU General Public License v3.0. You are free to use, share, and modify the course materials for educational purposes, provided you adhere to the terms of the license.

Please review the full license for more details. This open license is part of our commitment to supporting and contributing to the open-source community, making learning accessible to as many people as possible.
