UCLA Course Explorer

The primary goal is to create a web page that displays a graph of courses with requisite relations determining connections between nodes.

A secondary goal is a command-line tool, similar to git log, that shows the same relations, although it is unclear whether this can be done well.

Requirements

This was written on (Arch) Linux. I have not tested it on Windows, so I cannot guarantee that it works there.

Python

  • Beautiful Soup
  • PyMongo
  • Selenium

Browser

  • WebDriver (e.g., geckodriver for Firefox)

MongoDB

Usage

scrape.py

In scrape.py, edit the following line to use your browser of choice (the default is Firefox):

driver = webdriver.Firefox() # Edit to use preferred browser.

On slower internet connections, also increase the time given for pages to load:

time.sleep(1) # Give web page time to load. Not 100% reliable.
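
Because a fixed sleep is not fully reliable, one alternative is to poll until the page is ready, up to a timeout. The sketch below is a generic standard-library helper and is not taken from scrape.py; wait_until, its parameters, and the readiness check in the comment are hypothetical. Selenium also provides its own explicit-wait helper, WebDriverWait, which serves the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Hypothetical helper; not part of scrape.py.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Example use with a Selenium driver (assumption: the page counts as
# "ready" once the document has finished loading):
#
#   wait_until(lambda: driver.execute_script(
#       "return document.readyState") == "complete")
```

A readiness check on the document alone may still miss content loaded later by scripts, so a check for the specific element being scraped is likely more robust.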

You may also need to create the directory "course-descriptions-data" if it does not exist. Course descriptions are saved there.
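
If you would rather not create the directory by hand, a few lines of Python can do it. This is a minimal sketch, assuming the directory sits in the current working directory; ensure_data_dir and DATA_DIR are hypothetical names and are not part of scrape.py.

```python
from pathlib import Path

# Directory name taken from this README; creating it from code is an assumption.
DATA_DIR = Path("course-descriptions-data")


def ensure_data_dir(path=DATA_DIR):
    """Create the directory (and any parents) if missing; do nothing otherwise."""
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    print(f"Saving course descriptions under {ensure_data_dir()}")
```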

By default, scrape.py scrapes course descriptions for all subject areas, which may take several minutes.
scrape.py also takes optional arguments (the line numbers of subjects in subjects.txt) for manually selecting the subjects to scrape. This is useful for updating or fixing the scraped course descriptions of specific subjects.
For example, to scrape the Computer Science and Mathematics courses, execute:

$ python scrape.py -l 43 121

scrape.py uses subjects.txt to determine subject area names. Edit this file if subject area names change.
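
The selection of subjects by line number can be sketched as follows. This is an illustration, not the code in scrape.py: it assumes subjects.txt holds one subject area name per line and that the numbers after -l are 1-indexed. The function names and the long option --lines are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional list of subjects.txt line numbers given after -l."""
    parser = argparse.ArgumentParser(description="Scrape course descriptions.")
    parser.add_argument(
        "-l", "--lines", type=int, nargs="*", default=[],
        help="line numbers in subjects.txt to scrape (default: all subjects)",
    )
    return parser.parse_args(argv)


def load_subjects(path="subjects.txt"):
    """Read one subject area name per line, keeping every line so numbering holds."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def select_subjects(subjects, line_numbers):
    """Return the subjects on the given 1-indexed lines, or all if none are given."""
    if not line_numbers:
        return list(subjects)
    selected = []
    for n in line_numbers:
        if not 1 <= n <= len(subjects):
            raise ValueError(f"subjects.txt has no line {n}")
        selected.append(subjects[n - 1])
    return selected
```

Under these assumptions, parse_args(["-l", "43", "121"]).lines yields the two line numbers, which select_subjects then maps to the names on those lines.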

Sources for course descriptions can be found at: https://registrar.ucla.edu/academics/course-descriptions

parse.py

parse.py parses the saved course descriptions generated by scrape.py to determine the subject area, number, name, textual description, and requisites of each course. These are held in temporary variables. Currently, all course data except requisites is upserted into a courses collection, and edges representing requisite relations are inserted into a requisites collection.
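
The two write paths described above might look roughly like this. It is a sketch under stated assumptions, not the code in parse.py: the field names ("subject", "number", "from", "to") and the function names are hypothetical. The functions accept any object that offers update_one and insert_many, which a PyMongo collection does.

```python
def upsert_course(courses, course):
    """Insert the course, or update it if the same subject and number exist.

    `courses` is expected to behave like a PyMongo collection.
    The field names used here are hypothetical.
    """
    key = {"subject": course["subject"], "number": course["number"]}
    return courses.update_one(key, {"$set": course}, upsert=True)


def requisite_edges(course, requisites):
    """Build one edge document per requisite, pointing requisite -> course."""
    target = {"subject": course["subject"], "number": course["number"]}
    return [{"from": dict(req), "to": dict(target)} for req in requisites]


def insert_requisites(requisites_collection, course, requisites):
    """Insert the edges for one course and return them."""
    edges = requisite_edges(course, requisites)
    if edges:  # PyMongo's insert_many rejects an empty list
        requisites_collection.insert_many(edges)
    return edges
```

Keying the upsert on subject and number means that re-running the parser updates existing course documents instead of duplicating them. Plain inserts for edges would create duplicates on a re-run unless the requisites collection is cleared first or protected by a unique index.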

credentials.py

parse.py relies on credentials.py to provide sensitive information such as the connection string to MongoDB. You must create this file:

MONGO_CONNECTION_STRING = "[connection_string]"

If you wish, and if the driver versions allow, you may store the connection strings for multiple MongoDB drivers for this project in the same file.

subjects_syn.txt

parse.py uses subjects_syn.txt to determine synonyms for subject area names in course descriptions. These are used to determine the subject areas of requisite courses.

If the last word (Word1) of a potential synonym uniquely determines a subject area, add a line of the form:

Word1:Subject

Note that there are no spaces around the colon.
If Word1 cannot uniquely determine a subject area, use:

Word1:INCONCLUSIVE

If Word1 is the full name of a subject area but Word1 itself is inconclusive, an additional line is needed:

*Word1:Subject

If Word1 is inconclusive, look at the previous word (Word2) and apply the same logic as with Word1:

Word2 Word1:Subject // OR
Word2 Word1:INCONCLUSIVE // OR
*Word2 Word1:Subject

Extend to more words if needed.
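
One way to read these rules is as a lookup that starts at the last word and prepends earlier words while the result is INCONCLUSIVE. The sketch below illustrates that reading and is not the code in parse.py. In particular, it assumes that a line starting with * is the fallback used when no longer phrase matches. The function names and the sample entries are hypothetical.

```python
INCONCLUSIVE = "INCONCLUSIVE"


def load_synonyms(lines):
    """Parse "Key:Subject" lines into a dict; keys keep any leading "*"."""
    table = {}
    for raw in lines:
        key, sep, value = raw.strip().partition(":")
        if sep:  # skip blank or malformed lines
            table[key] = value
    return table


def resolve_subject(words, table):
    """Resolve the subject named by the words that precede a course number.

    Starts from the last word and prepends earlier words while the lookup
    is INCONCLUSIVE. Assumption: a "*" entry is the fallback used when the
    longer phrase is not listed. Returns None if nothing matches.
    """
    fallback = None
    for n in range(1, len(words) + 1):
        key = " ".join(words[-n:])
        value = table.get(key)
        if value is None:
            break  # the longer phrase is not a known synonym
        if value != INCONCLUSIVE:
            return value
        fallback = table.get("*" + key, fallback)
    return fallback


# Hypothetical entries, for illustration only.
SAMPLE = load_synonyms([
    "Mathematics:Mathematics",
    "Engineering:INCONCLUSIVE",
    "*Engineering:Engineering",
    "Electrical Engineering:Electrical and Computer Engineering",
])
```

With these sample entries, the words "Electrical", "Engineering" resolve through the two-word line, while "Engineering" preceded by an unlisted word falls back to the * entry.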

Notes

Taking course data from the UCLA API would be better, because the parser is imperfect: the language of course descriptions is only somewhat standardized. However, API access is limited to authorized staff, so for now scraping course descriptions should be good enough.

Scraping and parsing may be improved by using CSS selectors or by finding elements by class name. The current code was written before I learned about these techniques, but it works well enough that a change is unnecessary.

Progress Demo

2023-04-17.22-01-10.mp4
