owlapi's Introduction

OpenCourseAPI

An open-source API to scrape, process, and serve college course and class data

(Also, a fresh take on OwlAPI =D)

Features

  • Scrapes terms, departments, courses, and classes
  • Campus-agnostic architecture allows easy extensibility
  • Currently serves 10+ years of just FHDA data
  • Scrapes a total of 4 colleges (adding more is just a matter of time)

Coming Soon:

  • Scrape course catalogs to get more course info, such as descriptions
  • Link classes to professors for advanced analytics
  • GraphQL API (proof-of-concept finished)

Data Explorer => opencourse.dev

The URL is subject to change, but for now, the frontend and the API are hosted on the same domain.

API Docs

Data is currently available for the following campuses:

id    name
fh    Foothill College
da    De Anza College
wv    West Valley College
mc    Mission College

All endpoints (except for /:campus) support the following query parameters:

name      format                             default
year      a valid year                       2020
quarter   fall, winter, spring, or summer    fall

Example: /da/depts?year=2019&quarter=fall
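
As a minimal sketch of how these query parameters combine with an endpoint path, the helper below builds a request URL. The name `build_url` is hypothetical, and the base URL is assumed from the Data Explorer domain above, which may change.

```python
from urllib.parse import urlencode

API_URL = "https://opencourse.dev"  # assumption: the domain is subject to change
QUARTERS = ("fall", "winter", "spring", "summer")


def build_url(campus, path="", year=None, quarter=None):
    """Build an API URL; year and quarter are optional query parameters."""
    if quarter is not None and quarter not in QUARTERS:
        raise ValueError(f"quarter must be one of {QUARTERS}")

    # Only include the parameters the caller actually supplied,
    # so the server-side defaults apply otherwise.
    params = {}
    if year is not None:
        params["year"] = year
    if quarter is not None:
        params["quarter"] = quarter

    url = f"{API_URL}/{campus}"
    if path:
        url += "/" + path.strip("/")
    if params:
        url += "?" + urlencode(params)
    return url


print(build_url("da", "depts", year=2019, quarter="fall"))
```

Leaving both parameters out yields a bare URL such as `/fh/courses`, which relies on the defaults in the table above.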

GET /:campus

Example: /fh

Get campus metadata information.

Sample Response
{
  "id": "fh",
  "terms": [
    {
      "year": 2020,
      "term": "summer",
      "code": "202111"
    },
    {
      "year": 2020,
      "term": "spring",
      "code": "202041"
    },
    {
      "year": 2020,
      "term": "winter",
      "code": "202031"
    },
    {
      "year": 2019,
      "term": "fall",
      "code": "202021"
    },
    {
      "year": 2020,
      "term": "fall",
      "code": "202121"
    }
  ]
}

GET /:campus/courses

Example: /fh/courses

Get all courses at a campus for the selected term and year.

Sample Response
[
  {
    "dept": "ACTG",
    "course": "1A",
    "title": "Financial Accounting I"
  },
  {
    "dept": "ACTG",
    "course": "1B",
    "title": "Financial Accounting II"
  },
  {
    "dept": "MATH",
    "course": "1A",
    "title": "Calculus"
  }
]

GET /:campus/classes

Example: /fh/classes

Get all classes at a campus for the selected term and year.

Sample Response
[
  {
    "CRN": 20238,
    "raw_course": "ACTG F001A01W",
    "dept": "ACTG",
    "course": "1A",
    "section": "01W",
    "title": "Financial Accounting I",
    "units": 5,
    "start": "10/19/2020",
    "end": "12/11/2020",
    "seats": 2,
    "wait_seats": 15,
    "status": "open",
    "times": [
      {
        "days": "TBA",
        "start_time": "TBA",
        "end_time": "TBA",
        "instructor": [
          "Joe L  Mayer (P)"
        ],
        "location": "FC ONLINE"
      }
    ]
  },
  { "...": "..." }
]

POST /:campus/classes

Get selected classes in a batch request.

Body Schema
{
  resources: Resource[]
}
Resource Schema

Valid combinations: CRN alone, dept alone, or dept together with course.

{
  CRN: number,
  dept: string,
  course: string
}
Response Schema
{
  resources: [
    {
      status: 'success' | 'error',
      error: string,
      data: Class | Class[] | null
    }
  ]
}
Sample Body & Response

Body:

{
  resources: [
    { dept: 'CS', course: '1A' },
    { dept: 'CS', course: '2B' },
    { CRN: 20211 },
    {} // Invalid
  ]
}

Response:

{
  "resources": [
    {
      "status": "success",
      "data": [
        {
          "CRN": 30239,
          "raw_course": "C S F001A01Z",
          "dept": "CS",
          "course": "1A",
          "section": "01Z",
          "title": "Object-Oriented Programming Methodologies in Java",
          "units": 4.5,
          "start": "01/04/2021",
          "end": "03/26/2021",
          "times": ["..."],
          "status": "open",
          "seats": 40,
          "wait_seats": 10,
          "wait_cap": 10
        },
        {"...": "..."}
      ]
    },
    {
      "status": "success",
      "data": [
        {
          "CRN": 30239,
          "raw_course": "C S F001A01Z",
          "dept": "CS",
          "course": "1A",
          "section": "01Z",
          "title": "Object-Oriented Programming Methodologies in Java",
          "units": 4.5,
          "start": "01/04/2021",
          "end": "03/26/2021",
          "times": ["..."],
          "status": "open",
          "seats": 40,
          "wait_seats": 10,
          "wait_cap": 10
        },
        {"...": "..."}
      ]
    },
    {
      "status": "success",
      "data": {
        "CRN": 30324,
        "raw_course": "ACTG F001C03W",
        "dept": "ACTG",
        "course": "1C",
        "section": "03W",
        "title": "Managerial Accounting",
        "units": 5,
        "start": "01/04/2021",
        "end": "03/26/2021",
        "times": ["..."],
        "status": "open",
        "seats": 40,
        "wait_seats": 15,
        "wait_cap": 15
      }
    },
    {
      "status": "error",
      "error": "At least \"CRN\" or \"dept\" have to be specified."
    }
  ]
}
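
The batch request above could be sent from Python roughly as follows. This is a sketch using only the standard library: the helper names are hypothetical, the base URL is assumed from the Data Explorer domain, and sending the body as JSON with a `Content-Type: application/json` header is an assumption rather than something the docs above state.

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://opencourse.dev"  # assumption: the domain is subject to change


def is_valid_resource(resource):
    """Client-side check mirroring the error message: CRN or dept is required."""
    return "CRN" in resource or "dept" in resource


def build_batch_request(campus, resources):
    """Prepare (but do not send) a POST /:campus/classes request."""
    body = json.dumps({"resources": resources}).encode("utf-8")
    return Request(
        f"{API_URL}/{campus}/classes",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def fetch_batch(campus, resources):
    """Send the batch request and return the list under "resources"."""
    with urlopen(build_batch_request(campus, resources)) as response:
        return json.load(response)["resources"]


if __name__ == "__main__":
    wanted = [{"dept": "CS", "course": "1A"}, {"CRN": 20211}]
    for item in fetch_batch("fh", wanted):
        # Each entry carries its own status, so one bad resource
        # does not fail the whole batch.
        print(item["status"], item.get("error", ""))
```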

GET /:campus/classes/:crn

Example: /fh/classes/20238

Get a class by its CRN (in a given term and year).

Sample Response
{
  "CRN": 20238,
  "raw_course": "ACTG F001A01W",
  "dept": "ACTG",
  "course": "1A",
  "section": "01W",
  "title": "Financial Accounting I",
  "units": 5,
  "start": "10/19/2020",
  "end": "12/11/2020",
  "seats": 2,
  "wait_seats": 15,
  "status": "open",
  "times": [
    {
      "days": "TBA",
      "start_time": "TBA",
      "end_time": "TBA",
      "instructor": [
        "Joe L  Mayer (P)"
      ],
      "location": "FC ONLINE"
    }
  ]
}

GET /:campus/depts

Example: /fh/depts

Get all departments at a campus for the selected term and year.

Sample Response
[
  {
    "id": "GIST",
    "name": "Geospatial Tech & Data Sci"
  },
  {
    "id": "GLST",
    "name": "Global Studies"
  },
  {
    "id": "GID",
    "name": "Graphic and Interact Desig"
  }
]

GET /:campus/depts/:dept

Example: /fh/depts/GLST

Get a department at a campus for the selected term and year.

Sample Response
{
  "id": "GLST",
  "name": "Global Studies"
}

GET /:campus/depts/:dept/classes

Example: /fh/depts/MATH/classes

Get all classes in a department for the selected term and year.

Sample Response
[
  {
    "CRN": 20086,
    "raw_course": "MATH F001A01V",
    "dept": "MATH",
    "course": "1A",
    "section": "01V",
    "title": "Calculus",
    "units": 5,
    "start": "09/21/2020",
    "end": "12/11/2020",
    "seats": 0,
    "wait_seats": 7,
    "status": "waitlist",
    "times": [
      {
        "days": "MW",
        "start_time": "07:30 AM",
        "end_time": "09:45 AM",
        "instructor": [
          "Diana Monica  Uilecan (P)"
        ],
        "location": "FH ONLINE"
      }
    ]
  },
  { "...": "..." }
]

GET /:campus/depts/:dept/courses

Example: /fh/depts/MATH/courses

Get all courses in a department for the selected term and year.

Sample Response
[
  {
    "dept": "MATH",
    "course": "1A",
    "title": "Calculus",
    "classes": [
      20086,
      20087,
      20088,
      20257,
      20761,
      20762,
      20305,
      21310
    ]
  },
  {
    "dept": "MATH",
    "course": "1B",
    "title": "Calculus",
    "classes": [
      20672,
      20089,
      20769,
      20773
    ]
  }
]

GET /:campus/depts/:dept/courses/:course

Example: /fh/depts/MATH/courses/1A

Get a course in a department for the selected term and year.

Sample Response
{
  "dept": "MATH",
  "course": "1A",
  "title": "Calculus",
  "classes": [
    20086,
    20087,
    20088,
    20257,
    20761,
    20762,
    20305,
    21310
  ]
}

GET /:campus/depts/:dept/courses/:course/classes

Example: /fh/depts/MATH/courses/1A/classes

Get all classes for a course in the selected term and year.

Sample Response
[
  {
    "CRN": 20086,
    "raw_course": "MATH F001A01V",
    "dept": "MATH",
    "course": "1A",
    "section": "01V",
    "title": "Calculus",
    "units": 5,
    "start": "09/21/2020",
    "end": "12/11/2020",
    "seats": 0,
    "wait_seats": 7,
    "status": "waitlist",
    "times": [
      {
        "days": "MW",
        "start_time": "07:30 AM",
        "end_time": "09:45 AM",
        "instructor": [
          "Diana Monica  Uilecan (P)"
        ],
        "location": "FH ONLINE"
      }
    ]
  },
  { "...": "..." }
]

Examples

Python

Install requests, and use the API as follows:

import requests

API_URL = 'https://opencourse.dev'
req = requests.get(f'{API_URL}/fh/courses')

if req.ok:
    courses = req.json()

    for course in courses:
        print(course['dept'], course['course'], course['title'])

Output:
ACTG 1A Financial Accounting I
ACTG 1B Financial Accounting II
ACTG 1C Managerial Accounting
ACTG 51A Intermediate Accounting I
ACTG 52 Advanced Accounting
ACTG 53 Financial Statement Analysis
ACTG 54 Accounting Information Systems
ACTG 58 Auditing
ACTG 59 Fraud Examination
ACTG 60 Accounting for Small Business
ACTG 64A Computerized Accounting Practice Using Quickbooks
ACTG 64B Computerized Accounting Practice Using Excel
ACTG 65 Payroll & Business Tax Accounting
ACTG 66 Cost Accounting
ACTG 67 Tax Accounting
...

Development

To run the API server, install Python 3.8, pip, and pipenv.

Note: the following is subject to change (especially the scraper modules)

git clone https://github.com/OpenCourseAPI/OpenCourseAPI.git
cd OpenCourseAPI

pipenv install # install all python dependencies

python -m campus.fhda.fhda_scrape # scrape Foothill / De Anza College
python -m campus.wvm.wvm_scrape # scrape West Valley / Mission College

python server.py # start the server

To run the frontend, install Node.js (preferably v12+) and yarn. Afterwards, run the following:

cd frontend
yarn install # install NPM packages
yarn start # start dev server

To build for production, use:

yarn run build

The generated static files are in frontend/build. To run a static server to serve the built files, use:

yarn run start:static

To run tests, use the following:

yarn test # run all tests
yarn test:e2e # only run end-to-end tests

Contribute

All contributions are welcome! A contribution guide is TBA, but you can start by opening an issue and looking at the development guide above. Thanks!

License

MIT License

owlapi's People

Contributors

boomsyrup, cwjoshuak, dependabot[bot], joshuaptfan, madhavarshney, phi-line, tryexceptelse

owlapi's Issues

Write a new data scraper for getting previous quarter's data

This would involve going to each previous quarter's data and using Beautiful Soup to gather all the departments, all of their child courses, and all of the child sections. Then, using the same format as the current data scraper, make a database for every quarter.

The resulting database should be named using this schema:

[year][quarter][campus]_database.json

201812_database.json


    201812, first four digits (2018): The 2017-2018 academic year runs from July 1, 2017 to June 30, 2018. Therefore the ending year, 2018, is used to refer to the entire academic year.

    201812, fifth digit (1): The fifth digit represents the quarter:
        1 for Summer
        2 for Fall
        3 for Winter
        4 for Spring

    201812, last digit (2): The last digit refers to the college, 1 for Foothill and 2 for De Anza.

This would make our local database very consistent with the one found on MyPortal, as the GET request made for the tabular course data follows this same format.
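
To make the naming schema concrete, here is a small sketch that decodes a six-digit term code. The function name and the returned dictionary shape are hypothetical. The calendar-year rule (summer and fall fall in the year before the academic year's ending year) is inferred from the term codes in the sample `/fh` response, not stated by MyPortal.

```python
QUARTERS = {"1": "summer", "2": "fall", "3": "winter", "4": "spring"}
CAMPUSES = {"1": "fh", "2": "da"}  # 1 = Foothill, 2 = De Anza


def parse_term_code(code):
    """Decode a [year][quarter][campus] code such as "201812"."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    if code[4] not in QUARTERS or code[5] not in CAMPUSES:
        raise ValueError(f"unknown quarter or campus digit in {code!r}")

    academic_year = int(code[:4])  # ending year of the academic year
    quarter = QUARTERS[code[4]]
    # Inferred rule: summer and fall take place in the calendar year
    # before the academic year's ending year.
    calendar_year = academic_year - 1 if quarter in ("summer", "fall") else academic_year
    return {
        "academic_year": academic_year,
        "quarter": quarter,
        "campus": CAMPUSES[code[5]],
        "calendar_year": calendar_year,
    }


print(parse_term_code("201812"))
```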

API Endpoint for autocomplete

This is an enhanced version of /list that should list all departments, all courses, and a URL for hitting each course. It can be structured in any way, as long as all of this info is present.

Fix frontend sidebar menu

Waypoints.js is not triggering when sidebar links are clicked.
In this example, the mouse has just clicked "single", but the active status did not change on the menu item. This behaviour is controlled by Waypoints.js, but it does not work for some movements through the menu.

[Screenshot, 2018-06-04 1:26:41 PM]

Fix scrolling on resp modal

When the modal for the API response is opened, scrolling in the modal will also trigger a scroll on the body. We need to find a way to freeze the scrolling on body while the modal is triggered.

Add Docstrings to all functions

As this project becomes more public, the code needs to be easier to understand.
Inline comments on confusing lines would help too.

Migration to new data pattern

As discussed with @TryExceptElse, we will move to an easier-to-traverse schema for the database. As we move away from a course-first system, the section data will no longer be buried under department and course keys.

Comparison between old and new format:

Old:

.
├── dept
│   ├── course
│   │   ├── section

New:

.
├── section

This switch allows for the easy filtering of section data like teacher, time, and room data. Advanced multi-step queries can be constructed using this method as well. Additionally, the new data format will not require tables to be used for each department.

Duplicate IDs on frontend

In this example, each interact element has an ID of modal, but IDs must be unique within a page, so only one element should have the ID modal.

<span class="modal" id="modal" aria-hidden="true">
  <span class="modal-background" onclick="toggleModal(this.parentElement, false)"></span>
  <span class="modal-content"></span>
</span>
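
One way to catch this kind of problem automatically is to scan the rendered HTML for repeated `id` values. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical and not part of the project.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Count how often each id attribute value appears."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def find_duplicate_ids(html):
    """Return a sorted list of id values used more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(id_ for id_, count in collector.ids.items() if count > 1)


page = '<span class="modal" id="modal"></span><span class="modal" id="modal"></span>'
print(find_duplicate_ids(page))
```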

Owl API needs an architecture rebuff

While working on #46, I realized that we need to move away from keeping all the API scripts in root. This caused some difficulty while trying to get Travis set up to run automatically.

Specifically the modules found in root should be moved to a folder named OwlAPI.
With an __init__.py in that folder, it should be easier to call OwlAPI modules without the need of an __init__.py in the root.

This issue should be tackled after @TryExceptElse's PR is merged.

Feature request - density plot of class times

It would be useful to visualize/quantify when most classes take place, and when most people are free.

A way to filter for the most common classes would help as well.

This would be useful for planning the best times for events, workshops, etc. We would be able to see when people are most likely available without having to do surveys.

Graphs

I'll update this as more ideas come up. It can be used later when making graphs.

Data Sets

  • S_all : set of all classes
  • S_common : set of "common" classes, i.e. those required for a major and GE

X-axis

  • X_axis = minute 0 -> minute max (maybe 1 week? 5-days?)
  • x = t = iterator of 5 minute increments

Y-axis

  • Y_classes_in_session : number of classes in set S that are 'in session' at time t
  • Y_students_in_session : sum of (number of students currently enrolled in (each class that is currently in session))

let x_axis be an iterator where t_next = t + 5
let class[start] be the start time of the class in minutes since t = 0
let class[end] be the end time of the class in minutes since t = 0

for each value t in x_axis:
    classes_in_session[t] = {class ∈ (Set of all classes) such that (class[start] < t < class[end])}
    y[t] ← 0
    for each class in classes_in_session[t]:
        y[t] ← y[t] + class[students_enrolled]

Note that this does not account for students who didn't show up to class ;)
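
The pseudocode above translates fairly directly into Python. This is only a sketch: the dictionary keys `start`, `end`, and `students_enrolled` follow the pseudocode rather than the actual API fields, and the function name is hypothetical.

```python
def students_in_session(classes, t_max, step=5):
    """Map each time t (minutes since t = 0) to the number of enrolled
    students in classes that are in session at that moment."""
    y = {}
    for t in range(0, t_max + 1, step):
        # Strict inequalities, as in the pseudocode: a class does not
        # count at the exact minute it starts or ends.
        y[t] = sum(
            c["students_enrolled"]
            for c in classes
            if c["start"] < t < c["end"]
        )
    return y


# Made-up data: two overlapping classes.
classes = [
    {"start": 0, "end": 60, "students_enrolled": 30},
    {"start": 30, "end": 90, "students_enrolled": 20},
]
counts = students_in_session(classes, t_max=90)
print(counts[35])  # both classes are in session here
```

Swapping the sum for a simple count of matching classes gives Y_classes_in_session instead.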

programs

There could be an Updater, which fetches the data from the API and saves it into a JSON file, with a filename based on the date/time, in a folder called cache.

There could be another program, called the data preprocessor, which does any intermediate calculations. It outputs a new file with the data arranged so that it can be used directly by the grapher.

File_renamer could be a program that generates a list of all the cache and data filenames, which could be used by the HTML file to determine where all the data is. This might be helpful because filenames would be based on date/times, and it will be hard to predict what the next one is called. (Alternatively, make the filenames easily predictable, like data1, data2, data3 ... , and save the timestamp in the content)

The graph.html can then simply load the prepared data, and display it.
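
As a rough sketch of the Updater and File_renamer ideas, the snippet below writes a timestamped JSON snapshot into a cache folder and then writes an index of all snapshot filenames. Every name here (`save_snapshot`, `write_index`, `index.json`, the timestamp format) is an assumption made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(data, cache_dir, now=None):
    """Write data to <cache_dir>/<UTC timestamp>.json and return the path."""
    now = now or datetime.now(timezone.utc)
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    target = cache_path / f"{now:%Y%m%dT%H%M%SZ}.json"
    target.write_text(json.dumps(data))
    return target


def write_index(cache_dir):
    """List all snapshot files in index.json so a page can discover them."""
    cache_path = Path(cache_dir)
    names = sorted(
        p.name for p in cache_path.glob("*.json") if p.name != "index.json"
    )
    (cache_path / "index.json").write_text(json.dumps(names))
    return names
```

Because the timestamp format sorts lexicographically, the last entry in the index is always the newest snapshot.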

how to use

One possibility is to just have the updater call the data preprocessor directly.

./program <cache directory> <processed data directory> 

Externally, you would just call the program and give the directory names where you want to store the output files (the filenames will be automatically generated).

Returned data formatting

For https://floof.li/fh/single?dept=POLI
The returned data contains spaces before the number in units:

[
  {
    "1" : {
      "42082" : [
        {
          "room" : "ONLINE",
          "units" : "  5.00",
          "end" : "06\/29\/2018",
          "start" : "05\/21\/2018",
          "campus" : "FH",
          "desc" : "POLI SCI:INTRO AMERICAN GOVN\/P",
          "CRN" : "42082",
          "course" : "POLI F001.14W",
          "status" : "Waitlist",
          "wait_cap" : "20",
          "instructor" : "Callow",
          "days" : "TBA",
          "time" : "TBA",
          "seats" : "0",
          "wait_seats" : "15"
        }
      ]
    }
  }
]

It's also prevalent in other calls https://floof.li/da/single?dept=JOUR
Not that it really matters but letting you know.
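
Until the API strips these values itself, a client can normalize them on its side. The sketch below trims string fields and converts the numeric ones. The choice of which fields are numeric is an assumption based on the sample above, and a non-numeric value in one of them would raise a ValueError.

```python
# Assumed numeric fields and their target types, based on the sample response.
NUMERIC_FIELDS = {"units": float, "seats": int, "wait_seats": int, "wait_cap": int}


def normalize_section(section):
    """Return a copy with whitespace trimmed and numeric strings converted."""
    cleaned = {}
    for key, value in section.items():
        if isinstance(value, str):
            value = value.strip()
        if key in NUMERIC_FIELDS:
            value = NUMERIC_FIELDS[key](value)
        cleaned[key] = value
    return cleaned


raw = {"CRN": "42082", "units": "  5.00", "seats": "0", "wait_seats": "15", "wait_cap": "20"}
print(normalize_section(raw))
```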

Also, something new I discovered, in case you didn't know already:
https://floof.li/da/single?dept=NURS
Apparently there are locations off campus: DO refers to DO HOSP in MyPortal under Location, and it crashes my application because I hardcoded locations. wuu. ;_;

Fix python version in Pipfile

Resolve this: Warning: Your Pipfile requires python_version 3.x, but you are using 3.8.2

This seems to be caused by defining the Python version as 3.x in the Pipfile.

Tests for Archival Data

With such a massive amount of data (100,000 courses) being added with the #59 archive scraper, we need a way to check if all the correct keys for section data exist. An automated test should be made with pytest. You can find some of the examples that we've made in tests/test_server.py.

Here's a snippet to get you started:

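from unittest import TestCase

# test_data and generate_url come from the project; their imports are omitted here.

class TestGenerateURL(TestCase):
    def test_sample_url_can_be_generated(self):
        self.assertEqual(
            test_data.test_sample_url_can_be_generated_data,
            generate_url('test_dept', 'test_course')
        )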

This class calls the generate_url() function with two example parameters and checks whether the result is equal to the copy stored in the test_data file. assertEqual passes silently when the values match and raises an AssertionError when they differ. pytest flags every failed test and exits with a non-zero status if any test fails.

Post here if you have any questions regarding how to start on this task. The archival data will not be ready until #59 has been merged.
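
A key-presence check for section data might look roughly like the following. This is a sketch under assumptions: the required keys are taken from the sample section shown in the "Returned data formatting" issue, the archival data is assumed to be available as a flat list of section dictionaries, and all names are hypothetical.

```python
# Assumed key set, copied from a sample section in the current data format.
REQUIRED_KEYS = {
    "CRN", "campus", "course", "days", "desc", "end", "instructor", "room",
    "seats", "start", "status", "time", "units", "wait_cap", "wait_seats",
}


def missing_keys(section):
    """Return the required keys that a section dictionary lacks."""
    return REQUIRED_KEYS - set(section)


def find_incomplete_sections(sections):
    """Map each incomplete section's CRN (or list index) to its missing keys."""
    problems = {}
    for index, section in enumerate(sections):
        missing = missing_keys(section)
        if missing:
            problems[section.get("CRN", index)] = sorted(missing)
    return problems


def test_all_sections_have_required_keys():
    # Placeholder data; a real test would load the archived sections instead.
    sections = [dict.fromkeys(REQUIRED_KEYS, "")]
    assert find_incomplete_sections(sections) == {}
```

Reporting the CRN alongside the missing keys keeps the failure message useful when the check runs over a large archive.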

Random Idea - visualization of prerequisites

Miscellaneous, for future, still brewing

Basic concept

  • Each course is a node.
  • Each node contains some logic about its prerequisites, in the form of relations to other nodes.
  • Create graph automatically (should look something like a web)

How to interact with it

  • Highlight what you've finished
  • Pick a target node to see the paths to get there.
  • Map out class plan, or discover new classes
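
The "pick a target node" interaction could be sketched as a depth-first walk over a prerequisite map. The course data below is made up for illustration, the function name is hypothetical, and every prerequisite is treated as required, which ignores any "one of these" logic a real catalog may have.

```python
def remaining_path(prereqs, target, finished=frozenset()):
    """Return the courses still needed to reach target, prerequisites first."""
    order = []
    seen = set(finished)  # finished courses are treated as already visited

    def visit(course):
        if course in seen:
            return
        seen.add(course)  # mark before recursing so cycles cannot loop forever
        for prerequisite in prereqs.get(course, []):
            visit(prerequisite)
        order.append(course)

    visit(target)
    return order


# Made-up prerequisite data, not taken from any real catalog.
prereqs = {
    "MATH 1A": [],
    "MATH 1B": ["MATH 1A"],
    "MATH 1C": ["MATH 1B"],
}
print(remaining_path(prereqs, "MATH 1C", finished={"MATH 1A"}))
```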

How it COULD be used

  • Add in courses from other sources
  • Create learning plans that extend outside of regular courses
  • Basically a skill tree like in a game, but more practical, and has real data from real courses

API Response Consistency

Currently, some of the API responses are not consistent. For example, fetching a non-existent course such as /fh/single?dept=MATH&course=2D returns a response as plaintext, instead of being valid JSON:

Error! Could not find given selectors in database

These instances should be fixed so that the API consistently returns valid JSON.
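
One possible shape for a consistent error body is sketched below. The web framework used by the server is not shown here, so this helper returns a plain (body, status, headers) tuple. The function name is hypothetical, and the `status`/`error` fields are an assumption borrowed from the batch response format in the API docs above.

```python
import json


def json_error(message, status=404):
    """Build a JSON error body instead of a plaintext message."""
    body = json.dumps({"status": "error", "error": message})
    headers = {"Content-Type": "application/json"}
    return body, status, headers


body, status, headers = json_error("Could not find given selectors in database")
print(status, body)
```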

Querying with course modifiers

UPDATE: Our course key parsing logic is indeed broken. Fixing that will fix this issue as well.

Querying with course modifiers included does not currently work. Searching for courses with certain modifiers, such as H for Honors, is a common use case and should be accounted for. A spike to investigate which modifiers should be queryable and which shouldn't, along with how feasible it is to do with or without creating special cases, is necessary.

For example, the ENGL 1BH only shows up at /fh/single?dept=ENGL&course=1B but not at /fh/single?dept=ENGL&course=1BH.

[
  {
    "CRN": "21132",
    "campus": "FH",
    "course": "ENGL F01BH1HW",
    "days": "TBA",
    "desc": "HONORS COMP,CRITICAL READ & TH",
    "end": "12/11/2020",
    "instructor": "Armerding",
    "room": "ONLINE",
    "seats": "29",
    "start": "09/21/2020",
    "status": "Open",
    "time": "TBA",
    "units": "5.00",
    "wait_cap": "25",
    "wait_seats": "25"
  }
]
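
For discussion, here is one way the course key could be parsed so that modifiers such as H are kept. It assumes a fixed-width layout inferred only from the samples on this page: department, a space, one campus letter, a four-character course field, then the section. That layout is a guess, not MyPortal's documented format, and the function name is hypothetical.

```python
def parse_raw_course(raw):
    """Split a raw course key such as "ENGL F01BH1HW" into its parts."""
    dept_part, _, rest = raw.rpartition(" ")
    if not dept_part or len(rest) < 6:
        raise ValueError(f"unexpected course key: {raw!r}")

    # rest[0] is the campus letter; the next four characters are assumed
    # to hold the zero-padded course number plus any modifier letters.
    course_field, section = rest[1:5], rest[5:]
    return {
        "dept": dept_part.replace(" ", ""),  # "C S" becomes "CS"
        "course": course_field.lstrip("0").rstrip("."),
        "section": section,
    }


print(parse_raw_course("ENGL F01BH1HW")["course"])
```

Under this assumption, 1B and 1BH stay distinct course keys, so a query for course=1BH would have something exact to match against.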

Spike to nullify the need for '5 minute'

With the introduction of the advanced scraper in #59, we may not need the original '5 minute scraper'. This issue has been brought up in #60 as we try and consolidate to a streamlined data model.

The new data from the advanced scraper could be served up with the API, giving us a consistent format for accessing the data. Right now, the two different data formats make accessing new and old data a pain to manage.

For this research to be done, we need to see if MyPortal records incremental seat changes in the advanced dataset. If we can monitor those changes over time, then we can verify that the advanced scraper can replace the old '5 minute scraper'.
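
To check whether seat counts change between two scrapes, the snapshots could be compared by CRN. This sketch assumes each snapshot is a list of section dictionaries with `CRN` and `seats` keys; the function name is hypothetical.

```python
def seat_changes(old_sections, new_sections):
    """Return {CRN: (old_seats, new_seats)} for sections whose seats changed."""
    old_by_crn = {section["CRN"]: section for section in old_sections}
    changes = {}
    for section in new_sections:
        previous = old_by_crn.get(section["CRN"])
        # Sections that only appear in the new snapshot are skipped.
        if previous is not None and previous["seats"] != section["seats"]:
            changes[section["CRN"]] = (previous["seats"], section["seats"])
    return changes


# Made-up snapshots taken a few minutes apart.
earlier = [{"CRN": "21132", "seats": "29"}, {"CRN": "42082", "seats": "0"}]
later = [{"CRN": "21132", "seats": "27"}, {"CRN": "42082", "seats": "0"}]
print(seat_changes(earlier, later))
```

An empty result across many consecutive scrapes during registration would suggest the advanced dataset does not record incremental changes.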

Add a style guide for contributors

A style guide would be a helpful aid for people contributing to the project, in order to help maintain a consistent style, and therefore readability, throughout the project.
