uwflow / rmc

Flow is a website that lets you plan courses with friends.

Home Page: https://uwflow.com

License: MIT License

Makefile 0.14% Python 14.61% Shell 1.18% JavaScript 70.99% Ruby 0.04% HTML 3.99% CSS 9.03% Dockerfile 0.02%

rmc's Introduction

Backend

This is a collection of services comprising the UWFlow backend.

Architecture

The UWFlow backend is composed of 5 components that will be explained in detail below. Each of these components runs as a separate Docker container, orchestrated by docker-compose.

Postgres: Our Postgres database stores all of the data for UWFlow.
Hasura: Hasura is a GraphQL engine that sits on top of our Postgres database. It provides a GraphQL API for our frontend to interact with and is generally used for CRUD operations. We also use Hasura to enforce permissions and relationships between tables and manage DB migrations. For more details on using Hasura and creating new DB migrations, see the [Hasura README](./hasura/README.md).
API: Our API is a Go server that provides custom endpoints for our frontend to interact with. It is generally used for more complex operations that cannot be done with Hasura alone. This includes authentication, parsing for transcripts and calendars, webcal generation, and dumping raw search data for the frontend to use for autocomplete.
UW Importer: This is a cron job that runs on a schedule to import data from the UW API. We use this to fetch updates for courses, instructors, and term schedules.
Email: This is a service that watches a "queue" in our Postgres database for emails to send. It sends emails by generating HTML documents and sending them using the Google SMTP service.

In production, we run an Nginx reverse proxy in front of Hasura, the API, and the frontend to route requests to the correct service. Hasura is exposed via /graphql, the API via /api, and the frontend via /.

Requirements

The following packages are required for core functionality:

docker
docker-compose

The following packages are required by optional components:

hasura-cli: Hasura web interface

Exact package names may vary across distributions; for example, Ubuntu refers to docker as docker.io. The above list is intended as an unambiguous guideline for humans and is not necessarily consistent with any single distribution.

First-time setup

To find out what is really expected, peruse scripts/sanity-check.sh and apply common sense, as the following docs may be outdated.

Ensure the required packages are installed (see above).
Download and decrypt the database dump:

Download the file located in Google Drive at Flow/Data/pg_backup.gpg.
Run gpg2 --decrypt pg_backup.gpg > pg_backup. Use the password from the shared Bitwarden vault.

Copy .env.sample to .env and edit the latter as needed. In particular:

POSTGRES_DUMP_PATH should point to pg_backup obtained at the end of (2)
UW_API_KEY_V3 should be set as instructed in the uwapi-importer README
POSTGRES_HOST should be set to postgres on *NIX systems and 0.0.0.0 on Windows (which is incidentally otherwise unsupported)

How to run this

If you have not run the backend before, refer to the preceding section first. That being done, simply run script/start.sh.

As dependencies between containers exist that cannot be explicitly specified, the system will take a while to reach a stable (all services up) state. The script will wait as this happens, but it should not take more than a minute. If it does, then something went wrong. Ping #backend-dev.

It is instructive to study the script, as it often does not need to be re-run in its entirety. For example, when developing api, it is not necessary to clear database state, so the following command suffices:

docker-compose up -d --build

Interacting with the backend

When docker-compose is active, services may be accessed at their published ports, as declared in docker-compose.yml.

To illustrate, the postgres service publishes port 5432, so

psql -h localhost -p 5432 -U flow

will spawn a Postgres shell connected to the database container. If you do not happen to have postgres-client installed, this also works:

$ docker exec -it postgres sh
(docker) # psql -U postgres flow

rmc's Issues

Mongodb sometimes takes longer to start and crashes Flask on Linux

Sometimes when mongod is run for the first time with our config, it takes a while to start, and Flask crashes when make local is run.

We can monitor a mongo log file to ensure it starts properly, or perhaps just "warm up" mongo after a fresh installation.

This only seems to affect Linux and not Mac.

More details in discussions of #64
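
One way to "warm up" the database before starting Flask is to poll its port until it accepts connections. The following is a minimal sketch in Python 3, not code from this repository; the function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeds within `timeout` seconds.
    Hypothetical helper, not part of the rmc codebase.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused); wait a bit and retry.
            time.sleep(interval)
    return False
```

A startup script could call `wait_for_port("localhost", 27017)` (27017 is mongod's default port; the project's config may use a different one) and only launch Flask once it returns True.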

Add JS unit tests for schedule and transcripts parsers

Whenever we fix bugs with the schedule or transcript parser, we want to make sure we don't introduce new bugs. So having unit tests would be nice.

Sadface if the connection to facebook server is refused

Steps to reproduce:
Edit the hosts file to block facebook.com
go on uwflow.com
This screen shows up

Make passing JS tests a deploy blocker

Thanks to @JGulbronson, we now have JS unit tests set up! Woohoo!

As I commented on #151, tests can be thrown to the wayside if you never have to look at them, and can easily get broken long-term and therefore useless. My favorite way of dealing with this problem is making them a deploy blocker (i.e. you can't deploy unless the tests pass).

In the past, I've rolled my own systems to do this, but it looks like the community has matured in some areas.

Two major contenders at the moment: Karma and Testem.

The behaviour we're looking for here is:

I run a shell command.
A browser opens either in the foreground or background or completely headless, and runs the JS tests (just running the tests in node is not a viable option IMO, because browser-specific JS is insanely difficult to mock out).
The shell command terminates, and returns 0 if all the tests pass, or some other return code otherwise.

Once this is done, we can make the tests a deploy blocker like this:

diff --git i/Makefile w/Makefile
index bb90595..d7e621a 100644
--- i/Makefile
+++ w/Makefile
@@ -81,7 +81,7 @@ deploy_skiptest:
                cat deploy.sh | ssh rmc DEPLOYER=`whoami` sh; \
        fi

-deploy: test deploy_skiptest
+deploy: js-test test deploy_skiptest

 pip_install: require_virtualenv_in_dev
        pip install -r requirements.txt

You can see that running python tests is already a deploy blocker.

Do not link course reviews by someone that is not your friend

To repro, go to https://uwflow.com/course/psych101.

The course review by Steven Da Costa was made publicly, which is why you can see the reviewer's name. However, his profile page should not be linked since I am not his friend on Facebook.

Email logins

Let people create an account and sign in with their email address.

Use OpenData terms data for class schedules

Right now we parse the user's class schedule to get date and time information for their courses. This is undesirable because:

The class schedule format can vary across students, and is even ambiguous in some cases (#107)
Too much regex :(

Instead, we should simply parse out the current school term, along with the class number and section number of each course, and look it up in SectionMeeting (which is backed by OpenData terms data)

data/processor.py shows how the section data is processed (fetched by data/crawler.py)
server/static/js/schedule.js is where the schedule gets parsed.

Allow Facebook login for users who signed up with email

We should retrieve additional information from their Facebook account (e.g. friends) and merge it into their existing Flow account. There are at least two cases:

User signs up with an email. Logs out. Clicks "Sign in with Facebook" on the front page, while logged in to a Facebook account with the same email.
User signs up with an email. Clicks a "Connect with Facebook" banner.

The first case currently can happen and it results in a duplicate keys error when saving the user. We should write a test to reproduce this and fix it by merging info from Facebook instead of making a new user.

There's no UI that facilitates the second case right now, so there's no problem. We can reuse the banners that we already show to logged out users.
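
The "merge instead of creating a new user" idea for the first case can be sketched roughly as follows. This uses a plain dict in place of the real user collection; `login_with_facebook`, `users_by_email`, and all field names are hypothetical and only illustrate the lookup-then-merge flow.

```python
def login_with_facebook(users_by_email, fb_profile):
    """Return the account for fb_profile, merging into an existing one.

    users_by_email: dict mapping email -> user dict (stand-in for the DB).
    fb_profile: dict with assumed keys "id", "email", optional "friend_ids".
    """
    email = fb_profile["email"]
    user = users_by_email.get(email)
    if user is None:
        # No email account yet: create a fresh record.
        user = {"email": email}
        users_by_email[email] = user
    # Merge Facebook-derived fields rather than inserting a second record,
    # which is what would trigger a duplicate key error.
    user["fbid"] = fb_profile["id"]
    user["friend_fbids"] = list(fb_profile.get("friend_ids", []))
    return user
```

A regression test for the duplicate keys error could follow the same shape: create an email user, log in via Facebook with the same email, and assert that only one user exists afterwards.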

Unified search bar in header

It would be awesome to have a unified search bar in the header that would let me jump to course pages or user pages.

Right now, searching for my friends means I load my profile page, wait for all their names to load, then use Ctrl/Cmd-F to find their names. This could definitely be improved.

Major UI component on this, so please post mockups before starting code for this.

Privacy controls for profile page

Currently, your profile page is completely visible to all of your Facebook friends who signed up on Flow. Allow users to make their profile page only visible to themselves.

Add whitespace linting for all files

Currently, we only have whitespace linting for python files. However, we should enforce whitespace rules for all source files.

Some things to lint for include:

disallowing tabs
disallowing trailing whitespace
enforcing 80-char line length
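
As a rough illustration of the three checks listed above, here is a minimal sketch of a language-agnostic whitespace linter. The function name `lint_whitespace` and its return format are assumptions, not an existing tool in this repo.

```python
def lint_whitespace(text, max_len=80):
    """Return a list of (line_number, message) tuples for violations.

    Checks for tabs, trailing whitespace, and lines over max_len characters.
    """
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            problems.append((lineno, "tab character"))
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        if len(line) > max_len:
            problems.append(
                (lineno, "line longer than %d characters" % max_len))
    return problems
```

A wrapper script could run this over every tracked source file and exit non-zero if any problems are found, so it can be wired into the existing lint step.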

Ordering of terms in sections table is not consistent

For example, https://uwflow.com/course/cs136 currently shows:

... while most other courses have Winter 2014 before Spring 2014.

Make it more clear how to get to the course details page from a course box

When we were boothing at the SE symposium, we noticed that new users had trouble finding out that you can click "more info" to go to the course details page:

One possibility is to add a link with text like "See reviews and professors" under the rating bars (on the right column).

Create a label for "starter issues"

This is more of a repo maintenance thing, but this has worked well for CocoaPods. Creating a label for starter issues makes contributing less intimidating for those that want to be involved in contributing to the project. It helped me get involved in CocoaPods and I think it'll help foster a strong community around Flow as well.

Online classes produce a warning message on profile page

Under the calendar, there is a warning message that says that the meeting times are TBA even though there are no meeting times since it is an online class.

Make professor pages

Right now professors are only shown in the context of course pages.

There have been a few professors that were so good, I'd take pretty much anything they taught. It'd be cool to have a page that lists all the courses a prof teaches with associated info and reviews.

This would have a major UI component, so please post mockups before you dive into the code on this one.

Cluster together sections with same location+time in exam schedule

Since switching to Open Data v2 to get data for the exam schedule in #43, all sections are listed separately now.

It'd be great to group together sections that have the same time slot and location, so as to avoid creating like 9 entries all for the same time in the user's calendar (say, if they export to GCal). I believe this was the behaviour under the v1 API (which was a more direct mapping from the PDF, which does the clustering for brevity).

So, instead of

... we'd have

Add unit testing framework for JS

We don't currently have anything for unit testing our JavaScript code. Having this would be useful for testing things like the transcript and schedule parsing logic (see #105).

I don't really have any strong feeling for any specific unit testing framework, but one I have encountered in the past is Jasmine.

Improve performance of single course page by using work queue

For a single course page (e.g. https://uwflow.com/course/econ101), components like the friend sidebar, list of course reviews, and list of professors can each be loaded asynchronously using a work queue. This will reduce the perception of how long the page "hangs" before things start rendering.

See #27 for an example of how this was done for the profile page.

Link to import transcript page (/onboarding) from profile page

Currently, the only way to upload your previous terms' courses is by uploading your transcript. The only chance to do that is at initial account creation, when the user is taken to this page: https://uwflow.com/onboarding

We never link to that page anywhere else. Consequently, users may not know how to upload previous terms.

An easy fix would be to add an "Add previous terms" button to your profile page linking to /onboarding, similar to the "Add term" button already there.

Support Co-ops

Most students would love to be able to rate employers, get realistic salary data coming into an interview (CECS is notoriously bad at salary transparency), and read reviews from past co-ops.

Improve /api/v1/programs efficiency

Use a query to get the program name counts, instead of doing it in code. See discussion in #78.

Add email course alerts to notify when a spot opens up

Take this: http://uw.coursealerts.ca/ and put it into Flow.

Re-enable serving minified JS files on production

Minifying JS was disabled twice (yes, how is that possible you ask) on two different occasions:

We use Airbrake to email us details of uncaught JS exceptions in production. However, in order to get intelligible error messages that we can actually act on, we disabled serving minified files on prod in 52fae50b. However, it seems like Airbrake now supports source maps, so we can get our deploy build process to generate source maps and configure Airbrake to use them.
#161 broke the JS compile step in deploy.sh due to changes to server/static/js/main.js. So, we disabled compiling JS files during deploy in ad83530 (see that commit for more details).

We'll need to resolve both of these issues to be able to serve minified JS in prod.

Allow filtering courses by this term or next term

We now have course offerings (sections) data, which tells us whether a course is offered this term or next term. Use that as an additional filter option in the search courses page.

Add login button to the nav bar

Allow users to log into their account from any page by adding it to the nav bar. Clicking the button should show the login modal. Possibly add a link to the signup modal as well.

We sometimes get negative ratings

Occasionally, when a user makes a rating, we'll get an error on the server like this:

  File "./rmc/server/server.py", line 1094, in user_course
    uc.save()
  File "./rmc/models/user_course.py", line 226, in save
    cur_professor.save()
  File "./rmc/models/professor.py", line 73, in save
    super(Professor, self).save(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongoengine/document.py", line 208, in save
    self.validate()
  File "/usr/local/lib/python2.7/dist-packages/mongoengine/base.py", line 1025, in validate
    raise ValidationError('ValidationError', errors=errors)
ValidationError: ValidationError(sorting_score_negative.Float value is too small: ['passion'])

This has been happening for more than 1.5 years, but we never seriously looked into it. I just now looked at the code in rating.py and it was evident what the bug was. Here's a repro: https://gist.github.com/divad12/5fd40c6278548fd43ac5

For more context: rating.py defines the AggregateRating class. An AggregateRating is something like "75% of 100 users thought this course was easy" — ie. an aggregate rating on a single metric. In this example, AggregateRating#rating would be 0.75 and AggregateRating#count would be 100. Note that AggregateRating#rating should always be in [0, 1], but the error raised here is that it's not.

Also note that users rate with binary metrics — yes or no. So the rating parameter in AggregateRating#add_rating(self, rating) is always 0 or 1.

See if you can figure it out! 😛
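
For readers unfamiliar with the invariant described above, here is a minimal sketch of an aggregate over binary ratings that stays in [0, 1] by construction. It is not the code in rating.py and does not reproduce or diagnose the bug; the class name `BinaryAggregate` and the `remove_rating` method are hypothetical.

```python
class BinaryAggregate(object):
    """Hypothetical stand-in for AggregateRating using integer tallies."""

    def __init__(self):
        self.yes = 0    # number of 1 ("yes") ratings
        self.count = 0  # total number of ratings

    @property
    def rating(self):
        # Derived from two integers with 0 <= yes <= count,
        # so the result is always within [0, 1].
        return float(self.yes) / self.count if self.count else 0.0

    def add_rating(self, rating):
        if rating not in (0, 1):
            raise ValueError("rating must be 0 or 1")
        self.yes += rating
        self.count += 1

    def remove_rating(self, rating):
        if rating not in (0, 1):
            raise ValueError("rating must be 0 or 1")
        no_votes = self.count - self.yes
        # Refuse to remove a vote that was never recorded.
        if (rating == 1 and self.yes == 0) or (rating == 0 and no_votes == 0):
            raise ValueError("no such rating to remove")
        self.yes -= rating
        self.count -= 1
```

Storing tallies rather than a running float average is one design choice that makes the range invariant easy to check; whether it fits the existing schema is a separate question.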

Don't link professor names in sections on Explore Courses page

The links on professor names in the sections table, added in #98, do not go anywhere:

Exams with multiple sections overflow container in calendar

For CS 341 in the calendar below, the string "EXAM 001, 002, 003" overflows the width of the container.

Calling get_ratings() for a prof doesn't return "easiness"

When calling get_ratings() in professor.py, an array containing four objects is returned, but the last one, "easiness", is always 0.

As an example, if I call get_ratings() on Eddie Dupont, it returns this:

{'overall': {'count': 50, 'rating': 1.0}, 'clarity': {'count': 50, 'rating': 1.0}, 'passion': {'count': 49, 'rating': 1.0}, 'easiness': {'count': 0, 'rating': 0.0}}

Note that both count and rating are 0

Migrate to OpenData v2

There's code in crawler.py that still uses the now-deprecated version 1 of uWaterloo/OpenData.

Exclude /api/v1/login/facebook from CSRF protection

Due to turning on CSRF protection in #63, POSTing to /api/v1/login/facebook from our Android app returns a 403 Forbidden. We need to exclude this endpoint from CSRF protection and have it return the CSRF token.

Also need to ensure this endpoint is a no-op if the user is already logged in, else a CSRF attack on this endpoint would allow an attacker to log a user's browser in to the attacker's account.

Add spinner during transcript and schedule import

Since it takes time to parse and then upload a transcript or schedule, it'd be nice to give the user feedback while this is happening by having a spinner.

To test importing transcript visit https://uwflow.com/onboarding.

To test importing schedule visit https://uwflow.com/profile?import-schedule=1.

Background does not extend to bottom of short course pages

For course pages that are short (e.g. no reviews yet), the background does not extend to the bottom of the page. See screenshot below:

Refactor get_current_term_exams to group sections together

#96 fixed a grouping issue on the front-end, but there are other consumers of this exam data. For example, if a user exports their schedule to their Google Calendar, they'll get all the exams added too. We need to make the fix in the Python backend, so those improvements can propagate to calendar exports too.

A good place to make that fix in the backend code would be here:
https://github.com/UWFlow/rmc/blob/master/models/user.py#L469
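
The grouping itself could look something like the sketch below. It is not the existing get_current_term_exams code: the function name `group_exam_sections` and the dict keys (`course_id`, `section`, `start`, `end`, `location`) are assumptions about the shape of the exam data.

```python
from collections import OrderedDict


def group_exam_sections(exams):
    """Merge exam entries that share course, time slot, and location.

    exams: list of dicts with assumed keys "course_id", "section",
    "start", "end", "location". Returns one dict per distinct slot,
    with the merged section numbers under "sections".
    """
    groups = OrderedDict()  # preserves first-seen order of slots
    for exam in exams:
        key = (exam["course_id"], exam["start"],
               exam["end"], exam["location"])
        if key not in groups:
            groups[key] = {
                "course_id": exam["course_id"],
                "start": exam["start"],
                "end": exam["end"],
                "location": exam["location"],
                "sections": [],
            }
        groups[key]["sections"].append(exam["section"])
    return list(groups.values())
```

With this shape, a calendar export would emit one event per returned dict and could join `sections` into a label such as "001, 002, 003".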

Combine transcript and schedule parsing

Users have been trying to import their transcripts with the "Add Term" button (see screenshot). However, this accepts only schedules. This leads users to believe that our transcript parsing is broken and they are unable to import their previous courses.

We should make the copy & paste boxes compatible with both transcript and schedule pastes from Quest.

Broken professor name links in sections table on course detail page

Eg.

... links to the non-existent professor review https://uwflow.com/course/math135#j_huang

I see two ways to fix this:

Don't link to profs w/o a corresponding review. This would only involve JS changes.
Ensure there's always a corresponding prof listed under a course for all the profs in the sections table. This has the added bonus of keeping the "profs who teach this course" data up to date (which is used in the select2 dropdown for reviewing a prof). If you're going for this approach, you'd have to use the same heuristic (as used in #98) on the server to do approximate name matching on profs. The best place to fix this would be in the data scraping: https://github.com/UWFlow/rmc/blob/master/data/processor.py#L536

Inconsistent order of 'ratings' array contents in course info

The issue is that the ordering of the usefulness, interest, and easiness ratings is inconsistent between the overall "ratings" array and the "ratings" arrays included in the individual course reviews.

Shown below is a snippet from the response at /api/v1/courses/:id
The overall ratings array is shown at the top, and another ratings array (attached to a course review) is shown further down. Notice that the order of the easiness and interest array elements is reversed.

"ratings": [
    {
      "count": 162, 
      "rating": 0.6790123456790125, 
      "name": "usefulness"
    }, 
    {
      "count": 302, 
      "rating": 0.8178807947019866, 
      "name": "interest"
    }, 
    {
      "count": 242, 
      "rating": 0.7272727272727272, 
      "name": "easiness"
    }
  ], 
  "code": "PSYCH 101", 
  "name": "Introductory Psychology", 
  "overall": {
    "count": 311, 
    "rating": 0.8842443729903537
  }, 
  "reviews": [
    {
      "comment": "Rote memorization.", 
      "ratings": [
        {
          "rating": 0.0, 
          "name": "usefulness"
        }, 
        {
          "rating": 1.0, 
          "name": "easiness"
        }, 
        {
          "rating": 0.0, 
          "name": "interest"
        }
      ], 
      "comment_date": 1354093463046, 
      "author": {

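One possible fix is to sort every "ratings" array by a fixed order of names before serializing it. A minimal sketch, assuming the canonical order is the one used by the overall array in the snippet above; `RATING_ORDER` and `sort_ratings` are hypothetical names.

```python
# Assumed canonical order, taken from the overall "ratings" array above.
RATING_ORDER = ["usefulness", "interest", "easiness"]


def sort_ratings(ratings):
    """Return ratings (list of dicts with a "name" key) in canonical order.

    Names not in RATING_ORDER sort last, keeping their relative order
    because sorted() is stable.
    """
    rank = {name: i for i, name in enumerate(RATING_ORDER)}
    return sorted(ratings,
                  key=lambda r: rank.get(r["name"], len(RATING_ORDER)))
```

Applying the same helper to both the course-level and review-level arrays would give API clients one ordering to rely on.
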
Display only one validation message for missing name (during sign up)

Currently:

Desired:

Add a "smart app banner" for Android browser visitors linking to Android app

On iOS, you sometimes get this banner when you visit a website indicating there's a mobile app available:

Since we now have an Android app, and our website is not mobile-optimized, it would be good to indicate that to visitors from Android. Last month we had about 130 visits from an Android browser.

Perhaps we could integrate one of these plugins:

Send deploy notifications to the public UW Flow HipChat room

It's really helpful to see deploy notifications in the same chat window as the GitHub notifications, like we used to (before we had our public HipChat room). Send notifications to both the internal Operations room as well as the public "UW Flow" room.

In course reviews, show the professor the user took the course with

For example, in a review like this, knowing who the user took the course with would be helpful:

Course page load before clientside rendering looks ugly

On loading a course page, it briefly looks like this before clientside rendering finishes

Get course information from opendata as cronjob

Currently, we are manually crawling the undergrad calendar to get course descriptions. This should be ported to use the OpenData API. Furthermore, we should have a cronjob that automatically fetches updated course descriptions periodically.

Handle parsing ambiguous dates from Quest schedules

When parsing schedules from Quest, we assume date range information for courses comes in MM/DD/YYYY format (which is the case for most users).

However, for a small subset of users, Quest produces schedules with dates in DD/MM/YYYY format. We will need to use heuristics to guess which format the dates are in.

Below is a link to a schedule with the unsupported DD/MM/YYYY format: https://gist.github.com/mduan/9334117.
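
One simple heuristic is to look at all dates in a schedule together: a component greater than 12 cannot be a month. The sketch below illustrates that idea in Python for brevity (the actual parser lives in JS); `guess_date_format` is a hypothetical name, and this is only one of several possible heuristics.

```python
def guess_date_format(date_strings):
    """Guess whether slash-separated dates are MM/DD/YYYY or DD/MM/YYYY.

    Returns "MM/DD/YYYY", "DD/MM/YYYY", or None when the dates are
    ambiguous (or contradictory) under the greater-than-12 rule.
    """
    first_over_12 = False
    second_over_12 = False
    for s in date_strings:
        first, second, _year = s.split("/")
        if int(first) > 12:
            first_over_12 = True   # first field cannot be a month
        if int(second) > 12:
            second_over_12 = True  # second field cannot be a month
    if first_over_12 and not second_over_12:
        return "DD/MM/YYYY"
    if second_over_12 and not first_over_12:
        return "MM/DD/YYYY"
    return None
```

When the result is None, the parser could fall back to MM/DD/YYYY, since that is the format most users have, or use other hints such as which ordering puts the dates inside the current term.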

Schedule Building

I would like the ability to add and remove classes based on the scheduling information available. At the current moment you can only update classes based on your quest schedule, but if I am trying to decide how to schedule my classes such that everything fits, I need the ability to add/remove classes manually.

Aggregate shortlist

It would be useful to know the most popular courses that your friends plan to take in the future (ie. have in their shortlist).

People post questions like this in class Facebook groups:

Clicking "hide reviews" on professor reviews should scroll back up

Suppose you're on https://uwflow.com/course/psych101 and you're reading reviews on Richard Ennis. You click "See 71 more reviews" and read to the bottom. Now you're here:

You click "Hide reviews", and you get scrolled down, sometimes to the end of the page, because the document height shrinks. It would be better if the viewport scrolled back up to where you left off:

Thanks @AliceYuan for reporting.

Correct mistyped emails during sign up

Use something like this https://github.com/Kicksend/mailcheck

Filter out phantomjs requests from analytics stats

Recently, we added a weekly cronjob that has phantom.js create snapshots of our dynamic course pages (so they can be indexed by search engine spiders). See 2c675ec.

However, for phantom.js to create the snapshots for the course pages, it has to make a request to each course page. This screws up our analytic stats on Mixpanel and Google Analytics. For example, the spikes in the # of visitors per day graph (Google Analytics) below are caused by phantom.js:

So we need to either filter phantom.js out of the Mixpanel and Google Analytics stats, or find a way to generate the snapshots without making requests to the production site.