AnIML: Another Introduction to Machine Learning

Author: Hunter Schafer

This repository defines the source code for the AnIML book. This book corresponds with the course content of University of Washington's CSE/STAT 416.

Feedback or Spot a Bug?

If you have any feedback about the book text or structure, or you spot a bug somewhere in the book, please let us know! The best way to contact us is to open a GitHub Issue or to contact Hunter Schafer directly.

Contributing

This book is built with the Sphinx Book Theme to generate HTML.

Setup

Confusingly, we have a separate set of dependencies for building the animations. They are not part of the main setup because they currently don't work on our CI build. So we first have to develop the animations locally and then commit the video files before building a new version of the book.

Install Publishing Build Dependencies

Create a virtual environment with Python 3.9 or higher. For example, if you use Anaconda you can write:

conda create --name animl-book python=3.9
conda activate animl-book

Install the book theme dependencies.

All of these are libraries used for themes/templating in the book. Sphinx is the documentation templating tool, sphinx-book-theme is the specific book theme, myst-nb changes the Sphinx language from rST to MyST (more similar to Markdown), and sphinx-thebe allows interactive notebooks in the browser.

pip install -r requirements.txt

Install Developer Dependencies

  1. To install the library for generating animations, follow the instructions here.
  2. Also install LaTeX on your system with whatever method is best.
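Before working on animations, it can help to confirm that both tools are actually reachable from your shell. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the command names `manim` and `latex` are assumptions that depend on how you installed each tool.

```shell
# Report which of the given commands are not available on PATH.
# Prints one missing command per line; returns non-zero if any are missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (command names are assumptions; adjust to your install):
#   missing_tools manim latex
```

If the call prints nothing, every listed command was found.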

Editing the book

The book text is stored in book_source/source. Each MyST file (.md) corresponds to a single page of the book. Some pages, like the index.md files for the Modules, don't contain any useful information other than links. Some of the book pages are Jupyter notebooks, which also get converted to HTML.

Edit the book text by editing the appropriate MyST file. See MyST's documentation for syntax examples (note: it is incredibly similar to plain Markdown, with some extra macros available).

The practice problem starter code and tests live in book_source/coding_problems.

Rebuilding the book

Build the new book HTML by running:

# From the top-most directory
jupyter-book build book_source/source

# Or with the make command
make all

This will rebuild the whole book into the book_source/source/_build directory, which might take some time depending on the change.
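To confirm that a build actually produced a site, you can check for the landing page. This is a minimal sketch under one assumption: that the HTML output lands in an `html` subdirectory of `_build`, which is Jupyter Book's usual layout but is not stated above. `book_is_built` is a hypothetical helper name.

```shell
# Succeed if a built HTML landing page exists for the given book source directory.
# Assumes the HTML output is written to <source>/_build/html.
book_is_built() {
    [ -f "$1/_build/html/index.html" ]
}

# Example, from the top-most directory:
#   book_is_built book_source/source && echo "Book HTML is present"
```

If the check passes, opening `book_source/source/_build/html/index.html` in a browser should show the rebuilt book.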

Committing and pushing changes

Stage any changes under book_source, commit them, and push. We do not stage any changes to build files. Whenever we push to main, GitHub Actions will build the site again and deploy it to the gh-pages branch.
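One way to double-check that no build output slipped into a commit is to filter the list of staged paths. This is a sketch only: `only_build_paths` is a hypothetical helper, and it assumes build output always lives under a directory named `_build`.

```shell
# Read file paths on stdin and print only those inside a _build directory.
# Always returns 0, even when nothing matches.
only_build_paths() {
    grep -E '(^|/)_build/' || true
}

# Example, after staging with `git add book_source`:
#   git diff --cached --name-only | only_build_paths
```

Any path this prints is a staged build file, which you can unstage with `git restore --staged <path>` before committing.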

Special note about deploying:

This will likely not matter, but it is a bug we ran into a few times when setting up the book, so I thought we should document it. There must be a file called .nojekyll in the directory from which GitHub Pages is deployed. This file exists on the gh-pages branch and should stay there on its own. If something weird happens, though, check to make sure it is still there.
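The check can be done without switching branches by listing the top-level files of the deployed branch. A minimal sketch, assuming the remote is named `origin`; `deploy_has_nojekyll` is a hypothetical helper name.

```shell
# Read file names on stdin (one per line) and succeed only if one of them
# is exactly ".nojekyll".
deploy_has_nojekyll() {
    grep -Fxq '.nojekyll'
}

# Example (assumes the remote is called "origin"):
#   git fetch origin gh-pages
#   git ls-tree --name-only origin/gh-pages | deploy_has_nojekyll \
#       && echo ".nojekyll is present" || echo ".nojekyll is MISSING"
```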

animl's Issues

Build manim animations on GitHub Actions

Building the manim animations on GitHub Actions currently takes too long and breaks too often, so right now we have to build them manually before pushing. It would be great to fix that eventually.

Write Fairness chapter

  • Examples of ML systems gone wrong
  • COMPAS
  • Sources of Bias
  • Definitions of fairness
    • College Admissions Example
    • "Shape blind"
    • Statistical Parity
    • Equal Opportunity
    • Predictive equality

Ridge Feedback

Thread for feedback about the Ridge Regression chapter

Write Ridge Chapter

Draft Notes

  • Reminder of overfitting, and recap from last section (how to find best model with validation)
  • Interpreting coefficients
  • Overfitting + coefficients
  • Regularization
  • Measuring the magnitude
  • Ridge regression
  • Choosing $\lambda$
  • Regularization details (scaling + intercept)

Write Naive Bayes and Decision Trees chapters

  • Naive Bayes
    • Idea
    • Computing Probabilities
    • Practicalities (Laplace Smoothing)
    • Compare models + Generative vs Discriminative models
  • Decision Trees
    • Flow chart idea
    • Parametric vs. non-parametric
    • XOR
    • Context: Loans
      • Bias/Fairness
    • Decision Tree idea
    • Decision Stump
    • How to select best split (classification error)
    • Algorithm
    • Worked example
    • How to handle missing data
    • Real valued features
    • Decision boundaries
    • Overfitting
    • In practice

Write Intro Classification chapter

  • Regression Overview
  • Classification overview
  • Bag of Words
  • Simple Threshold
  • Linear classifier
  • Decision boundaries
  • Complex boundaries
  • Evaluating classifier
    • Interpreting Accuracy
    • Types of error / Confusion matrix
    • Measures of error
  • A bit on learning theory

Draft Feedback for TAs

Instead of making a bunch of individual issues for each section of the book, I'm just going to have one mega-issue for all of the feedback gathered from the CSE/STAT 416 TAs. Please cite which chapter and which section your feedback is about!

If you have a list of things, it helps to make them a checkbox list like:

  • Chapter 1
    • Typo in the sentence "And the cow jumped over the moon"

Write LASSO chapter

  • Recap overfitting + regularization
  • Feature selection overview (benefits, interpretability, sparsity)
  • All subsets (and efficiency)
  • Greedy algorithms
    • Forward stepwise
    • Allude to other forms
  • Regularization with Ridge for feature selection
  • LASSO
  • Demo
  • Sparsity (why?)
  • Choosing lambda (same)
  • Practicalities (debias lasso/feature selection)

Write Logistic Regression chapter

  • Why minimizing error doesn't work
  • Probability predictions
  • Scores -> Probabilities
  • Logistic Demo?
  • MLE
  • Gradient Ascent + Step Size
  • Logistic Regression + Overfitting

Make better sidenote numbering scheme

Right now, we have to manually label all the sidenote numbers and their references. This is quite a pain to update.

The built-in sidenote functionality doesn't work because it only allows inline content. We may need a new text role with an auto-incrementing counter.
