
ml-projects's Introduction

Machine Learning projects

This repository contains an assortment of my projects in machine learning. Here's a brief overview of some of what you'll find here:

ML from Scratch

The ml_from_scratch folder contains notebooks where I implement various machine learning algorithms from scratch using only NumPy. Each notebook contains detailed documentation, mathematical explanations, and demos to experiment with the algorithms.

I built these projects to practice turning mathematical formulas into vectorized code. For that reason, I did not review anyone else's code for these algorithms; I studied only their mathematical formulations.

Currently, the algorithms I have implemented are:

Neural network from scratch

A configurable fully-connected neural network that can be used for regression or classification tasks. This is an all-purpose algorithm that can be applied to many tasks like word embeddings, time series predictions, or multiclass classification (especially when using tabular data).

Key algorithms used include: backpropagation and gradient descent, cross entropy loss, L2 regularization, sigmoid and ReLU activations, forward propagation, normalization, parameter initialization, and a training and validation loop.
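As a rough illustration of how several of these pieces fit together, here is a minimal sketch of one training step for a one-hidden-layer binary classifier: forward propagation with ReLU and sigmoid, cross entropy loss with an L2 penalty, backpropagation, and a gradient descent update. The function names, layer sizes, learning rate, and toy data are assumptions made for this example and are not taken from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, y, params, lr=0.1, l2=1e-3):
    """Run one gradient descent step; return the loss measured before the update."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    m = X.shape[0]

    # Forward propagation
    Z1 = X @ W1 + b1            # (m, hidden)
    A1 = np.maximum(Z1, 0.0)    # ReLU activation
    Z2 = A1 @ W2 + b2           # (m, 1)
    A2 = sigmoid(Z2)            # predicted probabilities

    # Binary cross entropy plus an L2 penalty on the weights (not the biases)
    eps = 1e-12                 # avoids log(0)
    loss = -np.mean(y * np.log(A2 + eps) + (1 - y) * np.log(1 - A2 + eps))
    loss += (l2 / (2 * m)) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # Backpropagation (sigmoid + cross entropy gives dL/dZ2 = (A2 - y) / m)
    dZ2 = (A2 - y) / m
    dW2 = A1.T @ dZ2 + (l2 / m) * W2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)   # ReLU derivative is 1 where Z1 > 0
    dW1 = X.T @ dZ1 + (l2 / m) * W1
    db1 = dZ1.sum(axis=0)

    # Gradient descent update
    params["W1"] = W1 - lr * dW1
    params["b1"] = b1 - lr * db1
    params["W2"] = W2 - lr * dW2
    params["b2"] = b2 - lr * db2
    return loss

# Toy data (hypothetical): label is 1 when the two features sum to more than 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, :1] + X[:, 1:] > 0).astype(float)   # shape (200, 1)

params = {
    "W1": rng.normal(size=(2, 8)) * np.sqrt(2.0 / 2),   # He-style initialization
    "b1": np.zeros(8),
    "W2": rng.normal(size=(8, 1)) * np.sqrt(1.0 / 8),
    "b2": np.zeros(1),
}
losses = [train_step(X, y, params) for _ in range(500)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A configurable version would loop over a list of layers instead of hard-coding two; the gradient expressions keep the same shape per layer.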

Results on sample data:

[Figures: classification results; regression results]

K-Means clustering from scratch

A vectorized implementation of the K-Means clustering algorithm that groups an unlabeled dataset into $k$ clusters (labels) and can then classify new data based on proximity to the cluster centers. Uses include: finding related items in a dataset, segmenting customers, creating color palettes from images, grouping text documents, quantizing a dataset, categorizing transactions.

Key techniques used include: Euclidean distance, cosine similarity, vector normalization (L2), and a lot of plotting code for animated visualization.

Generate color palettes from images
[Figure: image with color palette generated by the k-means algorithm]

Fit to any numerical data
[Figure: animation of k-means algorithm training process]
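The core K-Means loop described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: it uses squared Euclidean distance only (not cosine similarity), initializes centroids from randomly chosen data points, and the names `kmeans` and `assign` are hypothetical rather than the notebook's.

```python
import numpy as np

def assign(X, centroids):
    """Return the index of the nearest centroid for each row of X."""
    # Broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d), summed to (n, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = assign(X, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):                 # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                            # converged
        centroids = new_centroids
    return centroids, assign(X, centroids)

# Two well-separated synthetic blobs (hypothetical demo data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)

# Classify new points by proximity to the learned cluster centers
new_labels = assign(np.array([[0.2, -0.1], [4.8, 5.3]]), centroids)
```

Computing all point-to-centroid distances with one broadcasted expression avoids a Python loop over data points; the remaining loop runs only over the `k` clusters.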

Logistic regression from scratch

A logistic regression model for binary classification using gradient descent to optimize the model's parameters. Uses any number of input features.

Key algorithms include: binary cross entropy loss (log loss), sigmoid, and threshold selection using the inverse of sigmoid.
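One way to read "threshold selection using the inverse of sigmoid" is the following: because sigmoid is strictly increasing, the test sigmoid(z) >= p is equivalent to z >= logit(p), where logit(p) = log(p / (1 - p)) is the inverse of sigmoid. The sketch below assumes that interpretation; the weights, bias, and inputs are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Inverse of sigmoid: maps a probability in (0, 1) back to a raw score."""
    return np.log(p / (1.0 - p))

def predict(X, w, b, threshold=0.5):
    """Classify by comparing raw scores to the threshold mapped into score space."""
    z = X @ w + b
    return (z >= logit(threshold)).astype(int)

# Hypothetical parameters and inputs
w = np.array([1.5, -2.0])
b = 0.25
X = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.5, 0.6]])

preds = predict(X, w, b, threshold=0.8)
# Same decisions as thresholding the probabilities directly
same = np.array_equal(preds, (sigmoid(X @ w + b) >= 0.8).astype(int))
print(preds, same)   # [1 0 1 0] True
```

Working in score space means the sigmoid never has to be evaluated at prediction time, and a threshold of 0.5 reduces to the familiar check z >= 0.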

Training progress on a sample dataset, showing how the decision boundary is updated through gradient descent:

Linear regression from scratch

A linear regression model trained using gradient descent with optional early stopping. Uses any number of input features (i.e., multilinear regression).

Key algorithms used include: mean squared error loss and model training through gradient descent.
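A minimal sketch of gradient descent on mean squared error with early stopping follows. It assumes early stopping means "stop once the training loss has not improved by more than `tol` for `patience` consecutive epochs"; the notebook's actual criterion may differ, and the function name and hyperparameters are hypothetical.

```python
import numpy as np

def fit_linear(X, y, lr=0.05, max_epochs=5000, tol=1e-10, patience=10):
    """Fit y ~ X @ w + b by gradient descent on MSE; patience=None disables early stopping."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    best_loss, stale = np.inf, 0
    for _ in range(max_epochs):
        err = X @ w + b - y              # residuals, shape (m,)
        loss = np.mean(err ** 2)         # mean squared error

        # Early stopping on lack of improvement (assumed criterion)
        if best_loss - loss > tol:
            best_loss, stale = loss, 0
        else:
            stale += 1
            if patience is not None and stale >= patience:
                break

        # Gradients of MSE with respect to w and b
        w = w - lr * (2.0 / m) * (X.T @ err)
        b = b - lr * 2.0 * err.mean()
    return w, b, loss

# Synthetic data with three input features (hypothetical true parameters)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_b

w, b, final_loss = fit_linear(X, y)
```

Because the update is written with matrix products, the same code handles one input feature or many without changes.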

Results from a multilinear regression demo: notice that y_hat (color gradient) closely matches y (vertical axis).

Dominant color extraction

In my dominant color extraction notebook, I use the K-Means algorithm to estimate the main color for an image, which I then apply to the task of detecting a vehicle's color given an image of that vehicle. I used this technique in my vehicle specs project as part of a feature engineering pipeline to augment a dataset with vehicle color information.
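The idea can be sketched as follows: treat every pixel as a point in RGB space, cluster the pixels with K-Means, and report the center of the largest cluster. This is an illustrative sketch, not the notebook's code; the function name `dominant_color`, the default `k`, and the synthetic test image are all assumptions.

```python
import numpy as np

def dominant_color(image, k=3, n_iters=20, seed=0):
    """Return the centroid of the most populated K-Means cluster of pixels."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)   # (n_pixels, channels)
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]

    def nearest(points):
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iters):
        labels = nearest(pixels)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centroids[j] = pixels[labels == j].mean(axis=0)

    counts = np.bincount(nearest(pixels), minlength=k)
    return centroids[counts.argmax()]

# Synthetic 20x20 RGB image: 14 reddish rows and 6 bluish rows, plus noise
rng = np.random.default_rng(42)
image = np.empty((20, 20, 3))
image[:14] = [200, 30, 30]
image[14:] = [30, 30, 200]
image += rng.normal(0, 5, image.shape)

color = dominant_color(image, k=2)
```

On a real photo, the largest cluster is often the background, so a practical pipeline would likely crop or mask the vehicle before clustering.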

GBM comparison

A comparison of different Gradient Boosting Machine (GBM) implementations on a classification dataset, including visualizations.

[Figure: chart showing feature rankings for four different ML models]

Extractive text summarizing

In this notebook, adapted from a tutorial by Usman Malik, I implement a method for extractive text summarizing and use it to summarize Wikipedia articles.

Craiyon Text2Image

I created a sort-of API to generate images using the Craiyon text-to-image model, available at: https://www.craiyon.com/

Note: I last updated this notebook in fall 2022; the website may have changed since then and the web requests might not function as intended.

Text analytics assignments

I took a computational linguistics course and compiled my work on course assignments into a single notebook. In this notebook, I compute precision and recall for various NLP algorithms, compare categories and word frequency distributions across a corpus, classify sentiment, use part-of-speech tagging, and predict categories using synsets from WordNet. I also use LDA to visualize topics and choose the number of topics using perplexity and coherence. Throughout, I make heavy use of RegEx, web scraping, NLTK, and topic modeling with Gensim.
