fsdl_2022_course_project

Our project is an augmented ML approach that course creators can use to streamline the generation of lecture summaries and chapter markers from lesson videos.

Functionality of Course Co-Pilot

The basic workflow is:

  1. User opens a link to a YouTube video lecture in our application and asks Course Co-Pilot to process it.
  2. User can view the status of requests via the “Get Predictions” button.
  3. User can view predicted topic boundaries, headlines, and content summaries for processed videos.
  4. User can correct and save generated content (planned for later use in the data flywheel).
  5. User will be able to export results as chapter markers to use in YouTube (planned for later).
  6. User will be able to export results in a Quarto-friendly format for posting to a web page or blog (planned for later).
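
The chapter-marker export in step 5 is not implemented yet. As a rough sketch of what it could look like, the snippet below turns predicted topic segments into the `timestamp title` lines that YouTube reads from a video description. The `Segment` class and both function names are hypothetical and are not part of `course_copilot`.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical shape for one predicted topic segment."""
    start_seconds: float
    headline: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once the hour mark is passed."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def to_chapter_markers(segments: list[Segment]) -> str:
    """Build one 'timestamp headline' line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start_seconds)
    return "\n".join(
        f"{format_timestamp(s.start_seconds)} {s.headline}" for s in ordered
    )
```

YouTube only recognizes chapters when the first timestamp is 0:00, so a real exporter would likely need to check that the first predicted segment starts at the beginning of the video.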

Why

In our own experience, such content either doesn’t get done, is time-consuming, or requires work from outside parties. In particular, we noticed this in the following courses we’ve been part of:

  1. Fast.ai course - During the course, students manually create YouTube chapter markers, lesson transcripts, and summaries on the forums.

  2. FSDL course - The chapter markers and lesson notes are created manually and then shared on the FSDL website, usually one week after each lesson.

How is our application structured?

System Diagram

What have we done so far?

Let’s look at the dataset, ML library, API, and web application we created for our prototype system.

Dataset

Since we had to train summarization and topic segmentation models, we manually created our dataset from a range of YouTube videos, including fast.ai lessons, FSDL lessons, and assorted other instructional videos.

Dataset Link

Dataset Schema

ML library: course_copilot

We used the nbdev framework to create a Python package that serves as our framework for model training and model serving. We integrated Weights & Biases (W&B) for experiment tracking and for fine-tuning models with sweeps. We created model trainers for the topic segmentation and summarization tasks. The timing of our project coincided with the release of Whisper, which we use to transcribe the YouTube video whose URL you pass in. The transcript provides the data required for creating topic segments and summaries.
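
As an illustration of the transcription step, here is a minimal sketch using the open-source `openai-whisper` package. It assumes the video's audio has already been downloaded to a local file, since Whisper works on audio files rather than URLs. The function names `transcribe` and `to_sentences` are made up for this example and are not the `course_copilot` API.

```python
def transcribe(audio_path: str, model_name: str = "base") -> list[dict]:
    """Transcribe a local audio file and return Whisper's timed segments."""
    # Imported lazily; needs `pip install openai-whisper` and ffmpeg on PATH.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    # Each segment is a dict that includes "start", "end", and "text".
    return result["segments"]


def to_sentences(segments: list[dict]) -> list[dict]:
    """Keep only the fields later stages need and drop empty segments."""
    return [
        {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
        for s in segments
        if s["text"].strip()
    ]
```

The start and end times on each segment are what make it possible to map predicted topic boundaries back to positions in the video.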

fsdl_2022_course_project

nbdev based Model Trainer for Topic Segmentation, Experiment tracking with W&B

Backend API

For the backend, we used FastAPI to create the APIs. The API uses Dagster as the workflow engine to create tasks for the inference jobs: transcribing the video with Whisper, running topic segmentation, and running the summarization models.
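
The order of the inference stages can be sketched in plain Python. This is only an illustration of the data flow (transcribe, then segment, then summarize). It does not use Dagster or FastAPI, and `run_pipeline` and the type aliases are hypothetical names rather than code from the backend repo.

```python
from typing import Callable

Transcript = list[dict]    # e.g. [{"start": 0.0, "text": "..."}, ...]
Topics = list[list[dict]]  # transcript entries grouped by topic


def run_pipeline(
    video_url: str,
    transcribe: Callable[[str], Transcript],
    segment: Callable[[Transcript], Topics],
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Run the three stages in order and return one record per topic."""
    transcript = transcribe(video_url)
    topics = segment(transcript)
    return [
        {"start": topic[0]["start"], "summary": summarize(topic)}
        for topic in topics
        if topic  # skip empty groups defensively
    ]
```

Passing each stage in as a function keeps the sketch independent of any particular model. In a workflow engine, each of these callables would typically become its own task so that a failed stage can be retried without redoing the earlier ones.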

fsdl-2022-group-007-app

Course Copilot APIs

Web Application

We created our front-end web application using Vue 3 and Quasar. It is deployed to GitHub Pages from our repo.

fsdl-2022-group-007-web

Topic summaries and chapter summaries generated

Future Plans

  • Improve quality of training data
  • Allow users to save their corrected headlines and summaries
  • Add ability for users to update topic spans
  • Implement data flywheel
  • Implement chapter marker and quarto export features
  • Add authentication/authorization
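
For the planned Quarto export, one possible shape is a `.qmd` document with a YAML front-matter title followed by one section per topic. The sketch below is an assumption about how that could work. `to_quarto` and the `headline`/`summary` keys are hypothetical and not part of the current codebase.

```python
import json


def to_quarto(title: str, topics: list[dict]) -> str:
    """Render topics as a Quarto document: front matter, then one section each."""
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    lines = ["---", f"title: {json.dumps(title)}", "---", ""]
    for topic in topics:
        lines += [f"## {topic['headline']}", "", topic["summary"], ""]
    return "\n".join(lines)
```

Writing the returned string to a file ending in `.qmd` would give something that `quarto render` can turn into a web page or blog post.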

Install

pip install course_copilot

Setting up your development environment

Please take some time to read up on nbdev (how it works, directives, etc.) by checking out the walkthroughs and tutorials on the nbdev website.

Step 1: Create conda environment

After cloning the repo, create a conda environment. This will install nbdev alongside other libraries likely required for this project.

mamba env create -f environment.yml

Step 2: Install Quarto

nbdev_install_quarto

Step 3: Install hooks

nbdev_install_hooks

Step 4: Add pre-commit hooks (optional)

If using VSCode, you can install pre-commit hooks “to catch and fix uncleaned and unexported notebooks” before pushing to git. See the instructions in the nbdev documentation if you want to use this feature: https://nbdev.fast.ai/tutorials/pre_commit.html

Step 5: Install our library

pip install -e '.[dev]'

Contributors

ohmeow, kurianbenoy, suvash
