pittcsc / pittapi Goto Github PK
View Code? Open in Web Editor NEWAn API to easily get data from the University of Pittsburgh
Home Page: https://pittapi.pittcsc.org
License: GNU General Public License v2.0
An API to easily get data from the University of Pittsburgh
Home Page: https://pittapi.pittcsc.org
License: GNU General Public License v2.0
Once tests are more developed, I'll integrate Travis CI into this project.
I noticed your tests are making actual calls to Pitt's website each time they are run. While this does shield against API changes from Pitt's side, it does so at the expense of slowing down your test runs and increases the amount of traffic that Pitt's website may have.
Ideally, your unit tests should not make any network, disk, etc. calls. The use of a mocking framework such as Python's mock
, will allow you to substitute mock objects which allow you to verify that they were called, what parameters they were called with, etc. Your unit tests should only test your code (within reason).
We can get JSON results from our library catalog via the following URL: http://pitt.summon.serialssolutions.com/api/search?q=water
q
is a URL-encoded parameter that is the search string.
This returns JSON already, just clean up the results somewhat and return it as is!
The readme says this is released under the MIT license but the pittAPI.py file notes it is released under GPLv2.
As stated, also add documentation.
Current APIs that need people:
Write now the string handling is very hacky so you'll see Unicode characters being expanded to \u0blah
in the responses. Please ensure that proper Unicode encoding is enforced throughout so we don't have this issue. UTF-8
is what we should stick with.
The Pitt legacy mobile website shows if the dining halls are currently open or not: https://m.pitt.edu/dining/. Find the place where they're getting this info from, hook into it directly, and create an API to return the statuses for all dining places.
The PeopleAPI currently synchronously loops in chunks of 10 to get data. The Requests API lets you make a pool of connections that not only keep one connection alive for re-use, but can be made asynchronously! Switch the PeopleAPI to use this async feature.
Experiencing a ValueError
when I attempt to use the PittAPI to retrieve a dictionary from lab.py
for computer lab availability. I've narrowed it down to a potential issue with BeautifulSoup
in _fetch_labs()
:
soup = BeautifulSoup(page.text, 'lxml')
I added a print statement before and after this line. Print statement before checks out, the one after never prints.
Steps to reproduce:
-Activate the virtual environment, and run server.py
-Make a request to the API for a lab's status.
It will reach the appropriate endpoint, find a match in locations, and attempt to fetch the lab info, but will encounter a ValueError
when attempting to parse the span
section of the page response's text.
People API's endpoint has changed and the response is different. @dzheng256 is on this.
This is just looking if the output of each function gives the data in a usable form and makes it simple for people to extract information out
So that we know what APIs have what methods with what parameters and what return values.
Without having to pry through the code.
This issue will deal with making consistent documentation within the project's code and with the docs. The plans below are subject to change and will include more issues/enhancements to deal with as they come up. Each issue will have the pull request or issue that they are resolved in. If there are any other documentation features you would like to be included please comment below.
DOCS.md
to docs/
( #35 )DOCS.md
into seperate API DOCS ( #35 )There is currently a PeopleAPI that is partially developed that utilizes find.pitt.edu to scrape information about a given query. Finish the rest and return all results!
Currently PittAPI is essentially a collection of functions in classes. Gold!
However, it would be more useful if it was hosted on, say, an AWS lambda.
Essentially the change I proposing is moving all of PittAPI code into a folder and modularizing the project. This includes a big move away from OOP programming since there isn't much use of it in this library (this will get explained towards the end). So I'm going to show how the API will look after the change, how it will get structured, and why to get rid of the OOP usage.
So let's look at an example from of the current usage from the README.md of how to use the course API...
from pittAPI import CourseAPI
### Courses
course = CourseAPI()
# Will return a list of dictionaries containing courses in subject
big_dict = course.get_courses(term="2161", subject="CS")
# Will return a list of dictionaries containing courses that fulfill req
big_dict = course.get_courses_by_req(term="2161", req="Q")
# Will return a string, which is the description of the course
description = course.get_class_description(class_number="10163", term="2161")
and here is what it will look like after the change.
from PittAPI import course
# Will return a list of dictionaries containing courses in subject
big_dict = course.get_courses(term="2161", subject="CS")
# Will return a list of dictionaries containing courses that fulfill req
big_dict = course.get_courses_by_req(term="2161", req="Q")
# Will return a string, which is the description of the course
description = course.get_class_description(class_number="10163", term="2161")
CourseAPI
turning into course
. Thought behind this is that the PittAPI
contains different modules that have to access various data from Pitt. Saying API
repeatedly is a bit redundant since it's all coming from the PittAPI
, instead it's shorten down to just course
which is imported from PittAPI
.course
API. Instead, you can use it's function directly, making the API even more friendly. This is a change to all the APIs where initialization is no longer required. (I will go into more in-depth why in a later explanation)The way the repository is structured is there is a pittAPI.py
that has the whole library. First, change is to move this into a folder a.k.a. a package. This will come in handy later if this package gets put on the Python Package Index since, in the setup of submitting, you should already have the code all in a folder (many python packages contain their code into packages like this).
__init__.py
pittAPI.py
PittAPI/
- __init__.py
- pittAPI.py
Very simple change, but in the current state shown above it makes the usage worse but there is one more change I would like to make.
So as I was looking throughout this library I couldn't help but notice the overuses of classes when none are needed. This might seem weird but here me out.
Let's look at that CourseAPI
class. One issue is why is __init__
declared since it's not used? Ok that's a bit nitpicky.
class CourseAPI:
def __init__(self):
pass
But there is a much bigger observation to make with this class. Besides the functions annotated with @staticmethod
. Such as...
@staticmethod
def _retrieve_from_url(url):
page = s.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
courses = soup.findAll("tr", {"class": "odd"})
courses_even = soup.findAll("tr", {"class": "even"})
courses.extend(courses_even)
return courses
Aren't all the functions static? Take a look at this function below notice how there isn't really any usage of self
(besides using it to call a static method) and that this function is very self-contained. What's the point of forcing all these static functions into a class, in python, there is a better way of doing this.
def get_courses(self, term, subject):
subject = subject.upper()
url = 'http://www.courses.as.pitt.edu/results-subja.asp?TERM={}&SUBJ={}'.format(term, subject)
courses = self._retrieve_from_url(url)
course_details = []
for course in courses:
details = [course_detail.string.replace(' ', '').strip()
for course_detail in course
if course_detail.string is not None]
# Only append details if the list is not empty
# If the subject code is incorrect, details will be NoneType
if details:
course_details.append(
{
'subject': details[1] if details[1] else "Not Decided",
'catalog_number': details[3] if details[3] else "Not Decided",
'term': details[5].replace('\r\n\t', '') if details[5] else "Not Decided",
'class_number': course.find('a').contents[0] if course.find('a').contents[0] else "Not Decided",
'title': details[8] if details[8] else "Not Decided",
'instructor': details[10] if details[10] else "Not Decided",
'credits': details[12] if details[12] else "Not Decided"
}
)
if not course_details:
raise InvalidParameterException("The TERM or SUBJECT is invalid")
return course_details
This better way is to turn these classes into modules. Essentially all that is being done is dropping the class
and moving the all the functions into their own python file. So the structure will go from this ...
PittAPI/
- __init__.py
- pittAPI.py
to this with each API have its own python file.
PittAPI/
- __init__.py
- course.py
- laundry.py
- people.py
- lab.py
- dining.py
In an instant, this removes all the OOP and makes the project more manageable, with each API having its own module to contain all its functions and any other specifics that apply to that API. This makes calling the API even simpler since there is no longer the need to initialize a class. It also makes importing the API very simple as all that needs to be done is to import the module.
I have already done this and tested it a bit (take a look at my fork of this project), but I would love to hear what your opinion of this idea is. If it all sounds good I would be more than glad to do a pull request of all these changes.
This is all who wonder why someone outside of Pitt is contributing to this project. Hello, I'm Alex Zharichenko and I'm a senior in high school who is currently in his 4th year of computer science (in my school this is basically independent study). I just recently after more than 6 months of waiting (applied on 7/21/16) finally got accepted into Pitt. I'm already fully committed to going to Pitt with my major being Computer Science. Hence the sudden interest into this repository.
@azharichenko Join the Pitt Computer Science Club Slack here: http://pitt-csc-slackin.herokuapp.com/
If you can't join comment your email and I'll add you manually.
Join #clubproject and anything you find interesting.
I don't have your contact info so this is my only way of communicating
Value Error Exception due to the different date format.
Reproduction:
Try to get subject of ADMJ for the term 2197.
This cased happened on the date part of
https://psmobile.pitt.edu/app/catalog/classsection/UPITT/2197/19872
There is currently 0% coverage and we don't know what breaks. All functions have to be manually tested. Write a test gradle for the repo.
Hey, I was messing around with your example use cases in README.md and noticed a few things. First, there's a call for a deprecated function in there. The line
big_dict = course.get_courses(term="2177", code="CS")
should be
big_dict = course.get_classes(term="2177", code="CS")
since get_courses()
is deprecated.
You also make a call to course.get_course()
when I believe you mean to call course.get_courses()
, since course.get_course()
doesn't exist (sorry, but it's a little confusing when there's a get_class()
).
I'm just working with the course module right now, but you should probably take a look through the examples in README.md. In particular, all the examples related to dining don't seem to be working right now (for me, the first 3 return empty lists, and get_location_by_name isn't a valid function name)--not sure if that's still a work in progress. I just installed and upgraded the package with Pip, so maybe the issues with dining.py have something to do with that version not being updated.
Anyway, sorry for the lengthy thing. Awesome job on this API overall.
Recently the builds are showing an issue with Shuttle API's get_vehicle_route_stop_estimates
and this should be investigated.
The issue comming off is....
======================================================================
ERROR: test_vehicle_route_stop_estimates (tests.shuttle_test.ShuttleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 3, in wrapper
File "/home/travis/build/Pitt-CSC/PittAPI/tests/shuttle_test.py", line 55, in test_vehicle_route_stop_estimates
stop_estimates = shuttle.get_vehicle_route_stop_estimates(25, 4)
File "/home/travis/build/Pitt-CSC/PittAPI/PittAPI/shuttle.py", line 43, in get_vehicle_route_stop_estimates
response = sess.get("http://www.pittshuttle.com/Services/JSONPRelay.svc/GetVehicleRouteStopEstimates", params=payload)
File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/responses.py", line 625, in unbound_on_send
return self._on_request(adapter, request, *a, **kwargs)
File "/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/responses.py", line 600, in _on_request
raise response
requests.exceptions.ConnectionError: Connection refused: GET http://www.pittshuttle.com/Services/JSONPRelay.svc/GetVehicleRouteStopEstimates?vehicleIdStrings=25&quantity=4
MyPy is a terrific project that lets us introduce static typing into parts of our code. Since there are certain parts of the code which follow a specific type, use MyPy to type check those portions.
MyPy: http://mypy-lang.org/
I find that the current usage is not very intuitive since the package name is case sensitive and makes typing it a hassle, as shown below.
from PittAPI import course
Instead, let's just make it all lowercase so that typing and adding the package is a breeze, and follows the convention most other packages follow.
from pittapi import course
Note: This is just effecting the users use of the package and don't reflect a change of the name you use to install through pip. Essentially this will still hold pip install PittAPI
Seems like requests has matured a lot since the last time I looked at it. Most importantly, it allows us to use a connection pool and parallel async requests. Re-write every urllib2 call with an equivalent requests call.
I found those two APIs from LaundryView. The first returns university data and the second returns room data.
https://www.laundryview.com/api/c_room?loc=197
https://www.laundryview.com/api/currentRoomData?school_desc_key=197&location=2430137
I'm new to Python but I guess that JSON would be a bit more efficient than BeautifulSoup?
It's getting weird and complicated trying to support both 2 and 3 in one library.
To embrace the future (technically the present), I move to drop Python 2 support.
The Dietrich site that was traditionally used is now gone forever and we need to switch it over to using peoplesoft mobile which might have the advantage of being more up to date information than Dietrich.
I caught this while unit testing.
The specific example is subject="ARABIC" and term="2171" or term="2177". For these two terms, no Arabic courses are offered, so the array course_details
is empty, and the code then assumes that because the array is empty, then there must be an invalid parameter and it throws an InvalidParameteException
I think the section HTML page changed, because data[2]
no longer contains units. So I am pretty sure a couple of the other ones are incorrect as well
For some subjects,
details = [course_detail.string.replace(' ', '').strip()
for course_detail in course
if course_detail.string is not None
and len(course_detail.string.strip()) > 2]
grabs the subject at index 0. This causes the returned list of course_details
to be incorrect, since with the current conditionals, it's setup to have the catalog_number
be set to what would then be the subject. This was found for every subject except 4. For now, an easy fix is to check lists of length 6, and accommodate, but the underlying reason for why certain subjects (CS, JS, SA, FP) return lists of length 5 instead of 6 (ie not including the subject in the parsed result).
Encoding and decoding u'Bookstore Caf\xe9'
fails. Please check!
I found this URL: https://m.pitt.edu/news/index.json?feed=main_news&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1
Gets us all the Pitt news. There are different feeds that go into the feed
URL path paramater:
feeds = (
"main_news", # pitt.com main news
"cssd", # student announcements, stuff that goes on My Pitt
"news_chronicle", # the Pitt Chronicle news
"news_alerts" # crime alerts
)
The function at line 230 of course.py could throw an IndexError if the parameter text
does not contain the string ' :'
The function should look like:
def __extract_data_past_colon(text: str) -> str:
if ': ' in text:
return text.split(': ')[1]
else:
return text
I was getting this issue for some reason and this fixed it
Looks like we are gonna need to find an alternative.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.