UCLA Course Explorer

The primary goal is to create a web page that displays a graph of courses, in which nodes are courses and edges are requisite relations between them.

A possible secondary goal is to create a command-line tool, similar to git log, that shows the same relations, although it is unclear whether this can be done well.
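As a rough illustration of what such a tool might print, the sketch below renders the requisites of one course as an indented tree. It assumes requisite relations are available as hypothetical `(course, requisite)` pairs; the helper names and the placeholder course names in the usage example are illustrative, not taken from the project.

```python
from collections import defaultdict


def build_requisite_map(edges):
    """Map each course to the list of its direct requisites."""
    requisites = defaultdict(list)
    for course, requisite in edges:
        requisites[course].append(requisite)
    return requisites


def render_tree(course, requisites, depth=0, seen=None):
    """Return lines showing `course` and its requisites, indented by depth."""
    if seen is None:
        seen = set()
    lines = ["  " * depth + course]
    if course in seen:
        # The course already appears on the current path (a cycle): stop here.
        return lines
    seen = seen | {course}  # new set per path, so siblings do not affect each other
    for requisite in requisites.get(course, []):
        lines.extend(render_tree(requisite, requisites, depth + 1, seen))
    return lines
```

For example, `print("\n".join(render_tree("C", build_requisite_map([("C", "B"), ("B", "A"), ("C", "A")]))))` prints `C`, then `B` and `A` indented beneath it, with `A` also listed under `B`. Shared requisites are repeated rather than merged, which keeps the output simple but is one reason a text view is harder to do well than a graph.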

Requirements

This was written on Arch Linux. I have not tested it on Windows, so I cannot guarantee that it works there.

Python

Beautiful Soup
PyMongo
Selenium

Browser

WebDriver (e.g. geckodriver for Firefox)

MongoDB

Usage

scrape.py

In scrape.py, edit the following line to use your browser of choice (the default is Firefox):

driver = webdriver.Firefox() # Edit to use preferred browser.

On slower internet connections, also increase the time given for pages to load:

time.sleep(1) # Give web page time to load. Not 100% reliable.
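A fixed sleep is either longer than needed or, on a slow connection, too short. A common alternative is to poll until the page is ready, up to a timeout; Selenium provides this as `WebDriverWait` together with `expected_conditions`. The sketch below is a plain-Python version of the same polling loop, using a hypothetical `wait_until` helper, so the idea can be seen without a browser.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError once the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

In scrape.py the condition might be something like `lambda: driver.find_elements(By.CSS_SELECTOR, selector)`, where `selector` is a hypothetical CSS selector for an element that only exists once the page has loaded; `find_elements` returns an empty, falsy list until then.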

You may also need to create the directory "course-descriptions-data" if it does not exist. This is where scraped course descriptions are saved.
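Instead of creating the directory by hand, a script can create it on demand. A minimal sketch, assuming descriptions are saved one file per subject; the `save_descriptions` helper and the `<subject>.html` file naming are hypothetical and not necessarily what scrape.py does:

```python
import os

DATA_DIR = "course-descriptions-data"


def save_descriptions(subject, html, data_dir=DATA_DIR):
    """Write one subject's scraped HTML to <data_dir>/<subject>.html and return the path."""
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, f"{subject}.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```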

scrape.py scrapes course descriptions for all subject areas by default, which may take several minutes.
scrape.py also takes optional arguments (line numbers of subjects in subjects.txt) for manually selecting which subjects to scrape. This is useful for updating or fixing the scraped course descriptions of specific subjects.
For example, to scrape the Computer Science and Mathematics courses, execute:

$ python scrape.py -l 43 121

scrape.py uses subjects.txt to determine subject area names. This file can be edited if subject area names change.
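The `-l` option might be handled roughly as follows. This is a sketch, not scrape.py's actual code: it assumes line numbers in subjects.txt are 1-based (so `43` means the 43rd line), and the `parse_args`, `select_subjects`, and `load_subjects` helpers are hypothetical.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Scrape UCLA course descriptions.")
    parser.add_argument(
        "-l", dest="lines", type=int, nargs="+",
        help="line numbers in subjects.txt of the subjects to scrape (default: all)",
    )
    return parser.parse_args(argv)


def load_subjects(path="subjects.txt"):
    """Read subject area names, one per line, keeping line positions intact."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def select_subjects(subject_lines, line_numbers=None):
    """Pick subjects by 1-based line number; return every non-blank subject if none given."""
    subjects = [line.strip() for line in subject_lines]
    if not line_numbers:
        return [s for s in subjects if s]
    selected = []
    for n in line_numbers:
        if not 1 <= n <= len(subjects):
            raise ValueError(f"line {n} is outside subjects.txt (1-{len(subjects)})")
        selected.append(subjects[n - 1])
    return selected
```

With this sketch, `python scrape.py -l 43 121` would yield `parse_args().lines == [43, 121]`, and `select_subjects(load_subjects(), parse_args().lines)` would return the names on those two lines.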

The source pages for course descriptions can be found at: https://registrar.ucla.edu/academics/course-descriptions

parse.py

parse.py parses the course descriptions saved by scrape.py to determine each course's subject area, number and name, textual description, and requisites. These are held in temporary variables. Currently, all course data except requisites is upserted into a courses collection, and edges representing requisite relations are inserted into a requisites collection.
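Writing to the two collections could look roughly like the sketch below. It assumes a hypothetical document shape (a course identified by subject and number, and edges stored as `{"course": ..., "requisite": ...}`); `update_one(..., upsert=True)` and `insert_many` are standard PyMongo `Collection` methods, but the field names are illustrative, not parse.py's actual schema. Since the function only calls those two methods, it can be exercised with any stand-in object.

```python
def store_course(courses, requisites, course):
    """Upsert one parsed course and insert its requisite edges; return the edge count.

    `courses` and `requisites` are expected to be PyMongo collections (or objects
    with the same update_one/insert_many methods). `course` is a dict with the
    hypothetical keys subject, number, name, description, and requisites.
    """
    key = {"subject": course["subject"], "number": course["number"]}
    # Everything except the requisites goes into the courses collection;
    # upsert=True inserts the course if it is missing, otherwise updates it.
    fields = {k: v for k, v in course.items() if k != "requisites"}
    courses.update_one(key, {"$set": fields}, upsert=True)

    # Each requisite becomes one edge document from the course to the requisite.
    edges = [{"course": dict(key), "requisite": r} for r in course.get("requisites", [])]
    # PyMongo's insert_many rejects an empty list, so skip courses without requisites.
    if edges:
        requisites.insert_many(edges)
    return len(edges)
```

Because edges in this sketch are inserted rather than upserted, running it twice on the same data would duplicate them unless the requisites collection is cleared first.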

credentials.py

parse.py relies on credentials.py to provide sensitive information such as the connection string to MongoDB. You must create this file:

MONGO_CONNECTION_STRING = "[connection_string]"

If you wish, and if driver versions allow, you may use the same file to store the connection string for multiple MongoDB drivers used in this project.

subjects_syn.txt

parse.py uses subjects_syn.txt to determine synonyms for subject area names in course descriptions. These are used to determine the subject areas of requisite courses.

If the last word (Word1) of a potential synonym uniquely determines a subject area, there should be a line like so:

Word1:Subject

Note that there are no spaces around the colon.
If Word1 cannot uniquely determine a subject area:

Word1:INCONCLUSIVE

If Word1 is the full name of a subject area but Word1 itself is inconclusive, an additional line is needed:

*Word1:Subject

If Word1 is inconclusive, look at the previous word (Word2) and apply the same logic as with Word1:

Word2 Word1:Subject // OR
Word2 Word1:INCONCLUSIVE // OR
*Word2 Word1:Subject

Extend to more words if needed.
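One way to read these rules as code is sketched below. It is an interpretation of the format above, not parse.py's implementation: it assumes a phrase is resolved by growing it one word at a time from the right, and that a starred entry applies only when the next longer phrase has no entry of its own. The function names are hypothetical.

```python
INCONCLUSIVE = "INCONCLUSIVE"


def load_synonyms(lines):
    """Split subjects_syn.txt lines into plain entries and starred full-name entries."""
    table, full_names = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        key, _, subject = line.partition(":")  # no spaces around the colon
        if key.startswith("*"):
            full_names[key[1:]] = subject
        else:
            table[key] = subject
    return table, full_names


def resolve_subject(words, table, full_names):
    """Return the subject named by the phrase ending at words[-1], or None."""
    fallback = None
    for k in range(1, len(words) + 1):
        key = " ".join(words[-k:])  # Word1, then Word2 Word1, and so on
        if key not in table:
            break  # the shorter phrase was the full name, if it has a starred entry
        fallback = full_names.get(key)
        if table[key] != INCONCLUSIVE:
            return table[key]
    return fallback
```

For example, with `Science:INCONCLUSIVE` and `Computer Science:Computer Science` loaded, the words `["Computer", "Science"]` are first checked as `Science` (inconclusive) and then resolved as `Computer Science`.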

Notes

It would be better to take course data from the UCLA API, since the parser is imperfect: course description language is only somewhat standardized. However, API access is limited to authorized staff, so for now scraping course descriptions should be good enough.

Scraping and parsing could be improved by using CSS selectors or finding elements by class name. The current code was written before I learned about these, but it works well enough that a change is unnecessary.

Progress Demo

2023-04-17.22-01-10.mp4

edwinyyyu / ucla-course-explorer