DubsBot | The Unofficial UW Course Schedule Chatbot 🐾

This is the home for DubsBot, a chatbot designed to assist University of Washington students in learning about available classes and planning their course schedules. This project was created for the Microsoft Azure AI Chat Hackathon.

This assistant is a RAG (Retrieval-Augmented Generation) chat application that has been specially optimized to retrieve and use the available UW Time Schedules.

Features 💫

  • Chat about class offerings for Spring Quarter 2024
  • Compare topics, times, and instructors for courses
  • Find course offerings that fit your interests or build off of previous classes
  • See how course offerings fit with general degree requirements

Assistant Strengths 💪

DubsBot excels at digesting course information and providing recommendations based on user interests or goals. Because it is built as a conversational experience, it can ask clarifying questions and give context-based answers.

Example queries

  • Are there music theory classes available for non-majors?
  • What are some CSE classes about AI?
  • I just took ____ and want to continue learning ____, what should I take?

How it works ⚙️

Data Ingestion

There are multiple steps for proper data ingestion:

Web Scraping

Two Bash web scraping scripts are provided, and both use Lynx:

  • /scripts/scrapeschedules.sh for scraping the courses on offer for Spring 2024
  • /scripts/scrapecatalog.sh for scraping the course catalogs for each department

IMPORTANT: To produce the correct data structure, scrapeschedules.sh must be run to completion before running scrapecatalog.sh.

These scripts populate the /data/ directory with all necessary data.

The script /data/prune.sh is a utility script that removes every time schedule and course catalog missing its complement. Run it after all data has been scraped to get rid of unnecessary files.
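
The pruning rule can be illustrated with a short Python sketch. It is not the contents of prune.sh. It assumes a hypothetical layout in which schedules live in /data/schedules/ and catalogs in /data/catalogs/, with each file named after its department code (for example cse.html). Under that layout, a file is an orphan when no file with the same name exists on the other side.

```python
from pathlib import Path


def find_orphans(schedule_files, catalog_files):
    """Return schedule and catalog files whose department has no complement.

    Hypothetical layout: each file is named after its department code
    (e.g. "cse.html") in both the schedule and the catalog directory.
    """
    schedule_depts = {Path(f).stem.lower() for f in schedule_files}
    catalog_depts = {Path(f).stem.lower() for f in catalog_files}
    # The HTML parser pairs each schedule with its department's catalog,
    # so files without a partner are treated as unnecessary.
    orphan_schedules = sorted(
        f for f in schedule_files if Path(f).stem.lower() not in catalog_depts
    )
    orphan_catalogs = sorted(
        f for f in catalog_files if Path(f).stem.lower() not in schedule_depts
    )
    return orphan_schedules + orphan_catalogs


def prune(data_dir, dry_run=True):
    """Delete orphaned files under data_dir (only reports them when dry_run is True)."""
    data_dir = Path(data_dir)
    schedules = list((data_dir / "schedules").glob("*.html"))
    catalogs = list((data_dir / "catalogs").glob("*.html"))
    orphans = find_orphans(schedules, catalogs)
    if not dry_run:
        for path in orphans:
            path.unlink()
    return orphans
```

For example, find_orphans(["schedules/cse.html", "schedules/math.html"], ["catalogs/cse.html", "catalogs/chem.html"]) returns ["schedules/math.html", "catalogs/chem.html"], because neither MATH nor CHEM has a complement. Defaulting to a dry run lets you review the list before anything is deleted.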

HTML Parsing

Upon running prepdocs.sh, all HTML documents in the /data folder are parsed with a custom local parser. The HTML time schedules are translated into simple readable strings: abbreviations are expanded, the different class sessions are labeled, and other course information is refined. For each department's time schedule, the parser also opens the associated course catalog and inserts a full course description for every course offered that quarter.
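
The following minimal Python sketch illustrates the kind of abbreviation expansion and session labeling described above. The day codes (M, T, W, Th, F, Sa) are assumed to follow the compact style of the time schedules, for example "MWF" or "TTh". The session-type codes in SESSION_TYPES and the output format of describe_session are hypothetical and do not come from the project's actual parser.

```python
import re

# Compact day codes, assumed to follow the time schedule style ("MWF", "TTh").
DAY_NAMES = {
    "M": "Monday",
    "T": "Tuesday",
    "W": "Wednesday",
    "Th": "Thursday",
    "F": "Friday",
    "Sa": "Saturday",
}
# Two-letter codes come first in the alternation so "TTh" splits as T + Th.
DAY_TOKEN = re.compile(r"Th|Sa|M|T|W|F")

# Hypothetical session-type codes; the real parser's table may differ.
SESSION_TYPES = {"QZ": "Quiz section", "LB": "Lab"}


def expand_days(code: str) -> str:
    """Expand a compact day code such as "TTh" into full day names."""
    tokens = DAY_TOKEN.findall(code)
    # findall silently skips unknown characters, so check nothing was lost.
    if "".join(tokens) != code:
        raise ValueError(f"unrecognized day code: {code!r}")
    return ", ".join(DAY_NAMES[t] for t in tokens)


def describe_session(section: str, kind: str, days: str, time: str) -> str:
    """Render one class session as a labeled, readable line."""
    # Assumption: a session with no type code is a lecture; unknown codes pass through.
    label = SESSION_TYPES.get(kind, kind or "Lecture")
    return f"Section {section} ({label}): {expand_days(days)}, {time}"
```

With these assumptions, describe_session("AA", "QZ", "TTh", "930-1020") produces "Section AA (Quiz section): Tuesday, Thursday, 930-1020". Strings in this form are easier for the LLM to read than the original abbreviated table rows.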

Search Index Chunking

Time schedule length can vary greatly between departments, and as a result all parsed time schedules must be chunked intelligently before being uploaded to the Azure Search Index. The custom text splitter will split the time schedules into chunks for each course that is offered, ensuring that all information for a single class stays together in the index.
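
A per-course splitter can be sketched in Python as follows. It assumes, hypothetically, that every course in the parsed text begins on its own line with a department code followed by a three-digit course number (such as "CSE 142"). Everything up to the next such line stays in the same chunk, and any preamble before the first course is dropped. The real splitter may use a different boundary rule.

```python
import re

# Hypothetical course header at the start of a line: a department code of up
# to six characters (capital letters and spaces), a space, and a three-digit number.
COURSE_HEADER = re.compile(r"^[A-Z][A-Z ]{0,5} \d{3}\b", re.MULTILINE)


def split_by_course(schedule_text: str) -> list[str]:
    """Split a parsed time schedule into one chunk per course.

    Each chunk runs from a course header up to (not including) the next
    header, so every session of a course stays together. Text before the
    first header, such as a department banner, is dropped.
    """
    starts = [m.start() for m in COURSE_HEADER.finditer(schedule_text)]
    ends = starts[1:] + [len(schedule_text)]
    return [schedule_text[b:e].strip() for b, e in zip(starts, ends)]
```

Splitting on course boundaries rather than on a fixed character count keeps a course's lecture and quiz sections in one chunk. A search hit then returns the whole course instead of a fragment.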

Azure Search Index Structure

A search index is created to store the parsed HTML files, with additional fields that support accurate data retrieval. Within the index, every individual class is associated with a course level and a department code. This enables structured queries during chat and makes it possible to filter down to the appropriate classes.
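
The following Python sketch shows how a chunk might be tagged with a department code and course level, and how those fields support filtering. The field names department and course_level are assumptions, as is rounding a course number down to its hundred-level. The filter strings use the OData $filter syntax that Azure AI Search accepts, for example department eq 'CSE'.

```python
import re

# Hypothetical course identifier at the start of a chunk:
# department code, a space, then a three-digit course number.
COURSE_ID = re.compile(r"^([A-Z][A-Z ]{0,5}) (\d{3})\b")


def to_index_document(chunk: str, doc_id: str) -> dict:
    """Attach filterable metadata to one course chunk (field names are assumed)."""
    match = COURSE_ID.match(chunk)
    if match is None:
        raise ValueError("chunk does not start with a course identifier")
    department, number = match.group(1), int(match.group(2))
    return {
        "id": doc_id,
        "content": chunk,
        "department": department,
        "course_level": number // 100 * 100,  # e.g. 473 -> 400
    }


def build_filter(department=None, course_level=None):
    """Build an OData $filter string, or None when no constraint applies."""
    clauses = []
    if department is not None:
        # OData string literals escape a single quote by doubling it.
        escaped = department.replace("'", "''")
        clauses.append(f"department eq '{escaped}'")
    if course_level is not None:
        clauses.append(f"course_level eq {int(course_level)}")
    return " and ".join(clauses) or None
```

For example, build_filter("CSE", 400) returns "department eq 'CSE' and course_level eq 400". In the azure-search-documents Python package, a string like this is passed as the filter keyword of SearchClient.search. Both fields would need to be marked filterable in the index schema.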

Chat Approach

This application implements the read-retrieve-read approach to interacting with GPT and the Azure Search Index. The original user query is first sent to the LLM to extract specific fields and choose a query type for the search index. The generated search query is then normalized and executed against the Azure Search Index with appropriate filtering. Finally, the results are sent back to the LLM for answer synthesis and displayed to the user.
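
The read-retrieve-read flow can be outlined with a small Python sketch. The two LLM calls and the index query are passed in as plain functions, so the control flow is visible without any Azure SDK code. The SearchPlan fields, the query_type values, and the normalization steps are illustrative assumptions rather than the application's actual schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SearchPlan:
    """Fields the first LLM call is asked to extract (hypothetical schema)."""
    query_text: str
    query_type: str  # assumed values, e.g. "keyword" or "semantic"
    department: Optional[str] = None
    course_level: Optional[int] = None


def normalize(plan: SearchPlan) -> SearchPlan:
    """Tidy LLM output before it is sent to the search index."""
    return SearchPlan(
        query_text=" ".join(plan.query_text.split()),  # collapse whitespace
        query_type=plan.query_type.strip().lower(),
        department=plan.department.strip().upper() if plan.department else None,
        course_level=plan.course_level,
    )


def read_retrieve_read(
    question: str,
    plan_query: Callable[[str], SearchPlan],      # read: LLM extracts fields
    search: Callable[[SearchPlan], list[str]],    # retrieve: filtered index query
    synthesize: Callable[[str, list[str]], str],  # read: LLM writes the answer
) -> str:
    plan = normalize(plan_query(question))
    sources = search(plan)
    return synthesize(question, sources)
```

In a real deployment, plan_query would prompt GPT to return these fields in a structured form. The search function would turn department and course_level into an index filter. Keeping the three steps as separate functions also makes it easy to test each one with stubs.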

Prompt Engineering

DubsBot largely follows standard assistant behavior, save for its predisposition to bark.

Limitations 🙅‍♂️

This application currently has some limitations that may affect its ability to help with every question.

  • Only course data for Spring 2024 is included in the index, so the assistant cannot answer questions about past or future quarters, e.g. "How often is CSE 333 offered?"
  • The assistant cannot access student-specific documents or information. It cannot base recommendations on a student's previous course load unless the student explicitly states during the chat which courses they have taken, e.g. "What classes should I take to fulfill my Natural Sciences credits?"
  • The assistant does not have access to internal UW course data, so it cannot provide information on course trends, real-time enrollment numbers, or unexpected changes to course offerings, e.g. "How often is CSE 143 completely full?"
  • The assistant cannot register students for courses; registration must be done through MyPlan.
  • The assistant cannot see the entire search index at once, so it cannot answer questions about how many classes there are.

Future Additions 🔮

  • Allow student-supplied documents such as transcripts or existing course plans. This would let the assistant give personalized recommendations based on the classes a student already plans to take or the requirements they still need to fulfill.
  • Make data from past and future quarters available. This would allow the assistant to give information on class offerings in the past or planned for the near future.
