pittcsc / pittapi Goto Github PK

View Code? Open in Web Editor NEW

101.0 32.0 24.0 863 KB

An API to easily get data from the University of Pittsburgh

Home Page: https://pittapi.pittcsc.org

License: GNU General Public License v2.0

Python 60.32% HTML 39.68%

pittsburgh university universityofpittsburgh open-data python scraper

pittapi's People

Contributors

Stargazers

Watchers

pittapi's Issues

Integrate Travis CI

Once tests are more developed, I'll integrate Travis CI into this project.

I noticed your tests are making actual calls to Pitt's website each time they are run. While this does shield against API changes from Pitt's side, it does so at the expense of slowing down your test runs and increases the amount of traffic that Pitt's website may have.

Ideally, your unit tests should not make any network, disk, etc. calls. The use of a mocking framework such as Python's mock, will allow you to substitute mock objects which allow you to verify that they were called, what parameters they were called with, etc. Your unit tests should only test your code (within reason).

Create a Library API

We can get JSON results from our library catalog via the following URL: http://pitt.summon.serialssolutions.com/api/search?q=water

q is a URL-encoded parameter that is the search string.

This returns JSON already, just clean up the results somewhat and return it as is!

write unit tests for tests

License - MIT or GPL?

The readme says this is released under the MIT license but the pittAPI.py file notes it is released under GPLv2.

Write unit tests for PeopleAPI

As stated, also add documentation.

Any API using m.pitt.edu are dead, new scrapers needed!

Current APIs that need people:

People API
News API
Dining API

Enforce Unicode for all strings and results

Write now the string handling is very hacky so you'll see Unicode characters being expanded to \u0blah in the responses. Please ensure that proper Unicode encoding is enforced throughout so we don't have this issue. UTF-8 is what we should stick with.

add branch protections for dev

fix failing tests for courses

Create a DiningAPI

The Pitt legacy mobile website shows if the dining halls are currently open or not: https://m.pitt.edu/dining/. Find the place where they're getting this info from, hook into it directly, and create an API to return the statuses for all dining places.

Parallelize People calls

The PeopleAPI currently synchronously loops in chunks of 10 to get data. The Requests API lets you make a pool of connections that not only keep one connection alive for re-use, but can be made asynchronously! Switch the PeopleAPI to use this async feature.

ValueError when using BeautifulSoup in lab.py

Experiencing a ValueError when I attempt to use the PittAPI to retrieve a dictionary from lab.py for computer lab availability. I've narrowed it down to a potential issue with BeautifulSoup in _fetch_labs():
soup = BeautifulSoup(page.text, 'lxml')
I added a print statement before and after this line. Print statement before checks out, the one after never prints.

Steps to reproduce:
-Activate the virtual environment, and run server.py
-Make a request to the API for a lab's status.

It will reach the appropriate endpoint, find a match in locations, and attempt to fetch the lab info, but will encounter a ValueError when attempting to parse the span section of the page response's text.

set up gh actions for automated testing

People API is broken

People API's endpoint has changed and the response is different. @dzheng256 is on this.

Looking over the data results of sports api

This is just looking if the output of each function gives the data in a usable form and makes it simple for people to extract information out

Documentation

So that we know what APIs have what methods with what parameters and what return values.

Without having to pry through the code.

Improving documentation

This issue will deal with making consistent documentation within the project's code and with the docs. The plans below are subject to change and will include more issues/enhancements to deal with as they come up. Each issue will have the pull request or issue that they are resolved in. If there are any other documentation features you would like to be included please comment below.

Documentation Structure

Moving from DOCS.md to docs/ ( #35 )
Spliting DOCS.md into seperate API DOCS ( #35 )
Create template for adding functions to docs

Doc Improvements

Adding usage example to each function in the API ( #35, #43, #48, #55, #56, 6862373)
Adding descriptions of parameters and returns of each function ( #35, #43, #48, #55, #56)

Code Documentation

All functions have docstrings ( #43, #55, #56 )
All docstrings comply with PEP 257 ( #43, #55, #56 )

Extend the PeopleAPI (WIP) to actually work and return all results

There is currently a PeopleAPI that is partially developed that utilizes find.pitt.edu to scrape information about a given query. Finish the rest and return all results!

write unit tests for people

make PittAPI actually an API

Currently PittAPI is essentially a collection of functions in classes. Gold!

However, it would be more useful if it was hosted on, say, an AWS lambda.

Making PittAPI into a proper package

Proposal:

Essentially the change I proposing is moving all of PittAPI code into a folder and modularizing the project. This includes a big move away from OOP programming since there isn't much use of it in this library (this will get explained towards the end). So I'm going to show how the API will look after the change, how it will get structured, and why to get rid of the OOP usage.

API Usage Change:

So let's look at an example from of the current usage from the README.md of how to use the course API...

from pittAPI import CourseAPI

### Courses
course = CourseAPI()
# Will return a list of dictionaries containing courses in subject
big_dict = course.get_courses(term="2161", subject="CS")
# Will return a list of dictionaries containing courses that fulfill req
big_dict = course.get_courses_by_req(term="2161", req="Q")
# Will return a string, which is the description of the course
description = course.get_class_description(class_number="10163", term="2161")

and here is what it will look like after the change.

from PittAPI import course

# Will return a list of dictionaries containing courses in subject
big_dict = course.get_courses(term="2161", subject="CS")
# Will return a list of dictionaries containing courses that fulfill req
big_dict = course.get_courses_by_req(term="2161", req="Q")
# Will return a string, which is the description of the course
description = course.get_class_description(class_number="10163", term="2161")

Details to Notice

Notice how the naming is changed especially with CourseAPI turning into course. Thought behind this is that the PittAPI contains different modules that have to access various data from Pitt. Saying API repeatedly is a bit redundant since it's all coming from the PittAPI, instead it's shorten down to just course which is imported from PittAPI.
Next, notice how you no longer have to initialize the course API. Instead, you can use it's function directly, making the API even more friendly. This is a change to all the APIs where initialization is no longer required. (I will go into more in-depth why in a later explanation)

Package Structure:

The way the repository is structured is there is a pittAPI.py that has the whole library. First, change is to move this into a folder a.k.a. a package. This will come in handy later if this package gets put on the Python Package Index since, in the setup of submitting, you should already have the code all in a folder (many python packages contain their code into packages like this).

Before

__init__.py
pittAPI.py

After

PittAPI/
  - __init__.py
  - pittAPI.py

Very simple change, but in the current state shown above it makes the usage worse but there is one more change I would like to make.

Moving away from OOP (Making them into modules):

So as I was looking throughout this library I couldn't help but notice the overuses of classes when none are needed. This might seem weird but here me out.

Let's look at that CourseAPI class. One issue is why is __init__ declared since it's not used? Ok that's a bit nitpicky.

class CourseAPI:
    def __init__(self):
        pass

But there is a much bigger observation to make with this class. Besides the functions annotated with @staticmethod. Such as...

@staticmethod
def _retrieve_from_url(url):
    page = s.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    courses = soup.findAll("tr", {"class": "odd"})
    courses_even = soup.findAll("tr", {"class": "even"})
    courses.extend(courses_even)
    return courses

Aren't all the functions static? Take a look at this function below notice how there isn't really any usage of self (besides using it to call a static method) and that this function is very self-contained. What's the point of forcing all these static functions into a class, in python, there is a better way of doing this.

def get_courses(self, term, subject):
      subject = subject.upper()

      url = 'http://www.courses.as.pitt.edu/results-subja.asp?TERM={}&SUBJ={}'.format(term, subject)
      courses = self._retrieve_from_url(url)

      course_details = []

      for course in courses:
          details = [course_detail.string.replace('&nbsp;', '').strip()
                     for course_detail in course
                     if course_detail.string is not None]

          # Only append details if the list is not empty
          # If the subject code is incorrect, details will be NoneType
          if details:
              course_details.append(
                  {
                      'subject': details[1] if details[1] else "Not Decided",
                      'catalog_number': details[3] if details[3] else "Not Decided",
                      'term': details[5].replace('\r\n\t', '') if details[5] else "Not Decided",
                      'class_number': course.find('a').contents[0] if course.find('a').contents[0] else "Not Decided",
                      'title': details[8] if details[8] else "Not Decided",
                      'instructor': details[10] if details[10] else "Not Decided",
                      'credits': details[12] if details[12] else "Not Decided"
                  }
              )

      if not course_details:
          raise InvalidParameterException("The TERM or SUBJECT is invalid")

      return course_details

This better way is to turn these classes into modules. Essentially all that is being done is dropping the class and moving the all the functions into their own python file. So the structure will go from this ...

Before

PittAPI/
  - __init__.py
  - pittAPI.py

to this with each API have its own python file.

After

PittAPI/
  - __init__.py
  - course.py
  - laundry.py
  - people.py
  - lab.py
  - dining.py

In an instant, this removes all the OOP and makes the project more manageable, with each API having its own module to contain all its functions and any other specifics that apply to that API. This makes calling the API even simpler since there is no longer the need to initialize a class. It also makes importing the API very simple as all that needs to be done is to import the module.

I have already done this and tested it a bit (take a look at my fork of this project), but I would love to hear what your opinion of this idea is. If it all sounds good I would be more than glad to do a pull request of all these changes.

Who am I?

This is all who wonder why someone outside of Pitt is contributing to this project. Hello, I'm Alex Zharichenko and I'm a senior in high school who is currently in his 4th year of computer science (in my school this is basically independent study). I just recently after more than 6 months of waiting (applied on 7/21/16) finally got accepted into Pitt. I'm already fully committed to going to Pitt with my major being Computer Science. Hence the sudden interest into this repository.

(Completely irrelevant) Alex, join the CSC Slack

@azharichenko Join the Pitt Computer Science Club Slack here: http://pitt-csc-slackin.herokuapp.com/
If you can't join comment your email and I'll add you manually.
Join #clubproject and anything you find interesting.

I don't have your contact info so this is my only way of communicating

Switch from unittest to pytest

Fixing up the docstrings in sports api

Exception of course caused by the edge case

Value Error Exception due to the different date format.
Reproduction:
Try to get subject of ADMJ for the term 2197.
This cased happened on the date part of
https://psmobile.pitt.edu/app/catalog/classsection/UPITT/2197/19872

Write unit tests for all functions

There is currently 0% coverage and we don't know what breaks. All functions have to be manually tested. Write a test gradle for the repo.

README.md issues

Hey, I was messing around with your example use cases in README.md and noticed a few things. First, there's a call for a deprecated function in there. The line
big_dict = course.get_courses(term="2177", code="CS")
should be
big_dict = course.get_classes(term="2177", code="CS") since get_courses() is deprecated.
You also make a call to course.get_course() when I believe you mean to call course.get_courses(), since course.get_course() doesn't exist (sorry, but it's a little confusing when there's a get_class()).

I'm just working with the course module right now, but you should probably take a look through the examples in README.md. In particular, all the examples related to dining don't seem to be working right now (for me, the first 3 return empty lists, and get_location_by_name isn't a valid function name)--not sure if that's still a work in progress. I just installed and upgraded the package with Pip, so maybe the issues with dining.py have something to do with that version not being updated.

Anyway, sorry for the lengthy thing. Awesome job on this API overall.

update python version of API to 3.12 (currently 3.7)

Issues with Shuttle API's get_vehicle_route_stop_estimates

Recently the builds are showing an issue with Shuttle API's get_vehicle_route_stop_estimates and this should be investigated.

The issue comming off is....

======================================================================
ERROR: test_vehicle_route_stop_estimates (tests.shuttle_test.ShuttleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<string>", line 3, in wrapper
  File "/home/travis/build/Pitt-CSC/PittAPI/tests/shuttle_test.py", line 55, in test_vehicle_route_stop_estimates
    stop_estimates = shuttle.get_vehicle_route_stop_estimates(25, 4)
  File "/home/travis/build/Pitt-CSC/PittAPI/PittAPI/shuttle.py", line 43, in get_vehicle_route_stop_estimates
    response = sess.get("http://www.pittshuttle.com/Services/JSONPRelay.svc/GetVehicleRouteStopEstimates", params=payload)
  File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/responses.py", line 625, in unbound_on_send
    return self._on_request(adapter, request, *a, **kwargs)
  File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/responses.py", line 600, in _on_request
    raise response
requests.exceptions.ConnectionError: Connection refused: GET http://www.pittshuttle.com/Services/JSONPRelay.svc/GetVehicleRouteStopEstimates?vehicleIdStrings=25&quantity=4

Introduce static type checking into the API

MyPy is a terrific project that lets us introduce static typing into parts of our code. Since there are certain parts of the code which follow a specific type, use MyPy to type check those portions.

MyPy: http://mypy-lang.org/

Identify parts of the code that can be type checked (all private functions)
Add static typing to them

Making the package name lowercase

I find that the current usage is not very intuitive since the package name is case sensitive and makes typing it a hassle, as shown below.

from PittAPI import course

Instead, let's just make it all lowercase so that typing and adding the package is a breeze, and follows the convention most other packages follow.

from pittapi import course

Note: This is just effecting the users use of the package and don't reflect a change of the name you use to install through pip. Essentially this will still hold pip install PittAPI

Rewrite all urllib2 content in requests

Seems like requests has matured a lot since the last time I looked at it. Most importantly, it allows us to use a connection pool and parallel async requests. Re-write every urllib2 call with an equivalent requests call.

Adding typing information into sports api

Moving docs onto readthedocs

Use LaundryView's JSON API instead of BeautifulSoup

I found those two APIs from LaundryView. The first returns university data and the second returns room data.
https://www.laundryview.com/api/c_room?loc=197
https://www.laundryview.com/api/currentRoomData?school_desc_key=197&location=2430137

I'm new to Python but I guess that JSON would be a bit more efficient than BeautifulSoup?

Drop Python 2 support

It's getting weird and complicated trying to support both 2 and 3 in one library.

To embrace the future (technically the present), I move to drop Python 2 support.

As of this term, the Course API is broken

The Dietrich site that was traditionally used is now gone forever and we need to switch it over to using peoplesoft mobile which might have the advantage of being more up to date information than Dietrich.

CourseAPI: Valid subjects/terms are labeled invalid if no courses exist

I caught this while unit testing.
The specific example is subject="ARABIC" and term="2171" or term="2177". For these two terms, no Arabic courses are offered, so the array course_details is empty, and the code then assumes that because the array is empty, then there must be an invalid parameter and it throws an InvalidParameteException

Courses HTML Page is updated so scraping is incorrect

https://github.com/Pitt-CSC/PittAPI/blob/99bef258517ae539bdec8fb67cad13cecbbb58a3/PittAPI/course.py#L265

I think the section HTML page changed, because data[2] no longer contains units. So I am pretty sure a couple of the other ones are incorrect as well

fix failing tests for lab

Course Information is incorrect for detail lists of length 6

For some subjects,

    details = [course_detail.string.replace('&nbsp;', '').strip()
                       for course_detail in course
                       if course_detail.string is not None
                       and len(course_detail.string.strip()) > 2]

grabs the subject at index 0. This causes the returned list of course_details to be incorrect, since with the current conditionals, it's setup to have the catalog_number be set to what would then be the subject. This was found for every subject except 4. For now, an easy fix is to check lists of length 6, and accommodate, but the underlying reason for why certain subjects (CS, JS, SA, FP) return lists of length 5 instead of 6 (ie not including the subject in the parsed result).

feeds = (
    "main_news",  # pitt.com main news
    "cssd",  # student announcements, stuff that goes on My Pitt
    "news_chronicle",  # the Pitt Chronicle news
    "news_alerts"  # crime alerts
)

IndexError when getting extra details for section

The function at line 230 of course.py could throw an IndexError if the parameter text does not contain the string ' :'

The function should look like:

def __extract_data_past_colon(text: str) -> str:
        if ': ' in text:
            return text.split(': ')[1]
        else:
            return text

I was getting this issue for some reason and this fixed it

People api seems to be dead.

Looks like we are gonna need to find an alternative.

set up pre-commit

just adding some hooks for basic formatting
tests are already handled with gh action

pittcsc / pittapi Goto Github PK

pittapi's People

Contributors

Stargazers

Watchers

Forkers

pittapi's Issues

Documentation Structure

Doc Improvements

Code Documentation

Proposal:

API Usage Change:

Details to Notice

Package Structure:

Before

After

Moving away from OOP (Making them into modules):

Before

After

Who am I?

Recommend Projects

Recommend Topics

Recommend Org