rmc's Introduction

Backend

This is a collection of services comprising the UWFlow backend.

Architecture

The UWFlow backend is composed of 5 components that will be explained in detail below. Each of these components runs as a separate Docker container, orchestrated by docker-compose.

  1. Postgres: Our Postgres database stores all of the data for UWFlow.

  2. Hasura: Hasura is a GraphQL engine that sits on top of our Postgres database. It provides a GraphQL API for our frontend to interact with and is generally used for CRUD operations. We also use Hasura to enforce permissions and relationships between tables and manage DB migrations. For more details on using Hasura and creating new DB migrations, see the [Hasura README](./hasura/README.md).

  3. API: Our API is a Go server that provides custom endpoints for our frontend to interact with. It is generally used for more complex operations that cannot be done with Hasura alone. This includes authentication, parsing for transcripts and calendars, webcal generation, and dumping raw search data for the frontend to use for autocomplete.

  4. UW Importer: This is a cron job that runs on a schedule to import data from the UW API. We use this to fetch updates for courses, instructors, and term schedules.

  5. Email: This is a service that watches a "queue" in our Postgres database for emails to send. It sends emails by generating HTML documents and sending them using the Google SMTP service.

In production, we run an Nginx reverse proxy in front of Hasura, the API, and the frontend to route requests to the correct service. Hasura is exposed via /graphql, the API via /api, and the frontend via /.
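As a rough illustration of the routing described above, the prefix-to-service mapping can be sketched as follows. This is only a model of the behaviour, not the actual Nginx configuration (which is not shown here); the `ROUTES` table and `route` function are hypothetical names, and matching on path-segment boundaries is an assumption.

```python
# Hypothetical model of the reverse-proxy routing; the real rules live in the
# Nginx configuration, which this sketch does not reproduce.
ROUTES = [
    ("/graphql", "hasura"),
    ("/api", "api"),
    ("/", "frontend"),  # catch-all
]


def route(path: str) -> str:
    """Return the name of the service that would handle `path`."""
    for prefix, service in ROUTES:
        # Assumption: a prefix matches itself or any sub-path beneath it.
        if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
            return service
    return "frontend"
```

For example, `route("/api/v1/x")` yields `"api"`, while any path that matches neither prefix falls through to the frontend.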

Requirements

The following packages are required for core functionality:

  • docker
  • docker-compose

The following packages are required by optional components:

Exact package names may vary across distributions; for example, Ubuntu refers to docker as docker.io. The above list is intended as an unambiguous guideline for humans and is not necessarily consistent with any single distribution.

First-time setup

To find out what is really expected, peruse scripts/sanity-check.sh and apply common sense, as the following docs may be outdated.

  1. Ensure the required packages are installed (see above).
  2. Download and decrypt the database dump:
     • Download the file located in Google Drive at Flow/Data/pg_backup.gpg.
     • Run gpg2 --decrypt pg_backup.gpg > pg_backup. Use the password from the shared Bitwarden vault.
  3. Copy .env.sample to .env and edit the latter as needed. In particular:
     • POSTGRES_DUMP_PATH should point to pg_backup obtained at the end of (2)
     • UW_API_KEY_V3 should be set as instructed in the uwapi-importer README
     • POSTGRES_HOST should be set to postgres on *NIX systems and 0.0.0.0 on Windows (which is incidentally otherwise unsupported)
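A quick way to catch a forgotten setting is to check the .env file for the three variables called out above. The following is a minimal sketch, not a replacement for scripts/sanity-check.sh; the helper names `parse_env` and `missing_keys` are hypothetical, and the parser assumes simple KEY=VALUE lines.

```python
# The three variables named in the setup steps above.
REQUIRED_KEYS = ("POSTGRES_DUMP_PATH", "UW_API_KEY_V3", "POSTGRES_HOST")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def missing_keys(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys(parse_env(open(".env").read()))` from the repository root would then list anything still left to fill in.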

How to run this

If you have not run the backend before, refer to the preceding section first. That being done, simply run script/start.sh.

As dependencies between containers exist that cannot be explicitly specified, the system will take a while to reach a stable (all services up) state. The script will wait as this happens, but it should not take more than a minute. If it does, then something went wrong. Ping #backend-dev.
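When starting containers by hand rather than through the script, a similar wait can be approximated by polling a published port until it accepts connections. This is a sketch under assumptions: `wait_for_port` is a hypothetical helper, not something the repository provides, and an open port does not guarantee the service behind it has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For instance, `wait_for_port("localhost", 5432)` would wait up to a minute for the Postgres port mentioned later in this document.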

It is instructive to study the script, as it often does not need to be re-run in its entirety. For example, when developing api, it is not necessary to clear database state, so the following command suffices:

docker-compose up -d --build

Interacting with the backend

When docker-compose is active, services may be accessed at their published ports, as declared in docker-compose.yml.

To illustrate, the postgres service publishes port 5432, so

psql -h localhost -p 5432 -U flow

will spawn a Postgres shell connected to the database container. If you do not happen to have postgres-client installed, this also works:

$ docker exec -it postgres sh
(docker) # psql -U postgres flow

rmc's People

Contributors

ayushk1, bobqywei, catastrophiclam, ccqi, derrekchow, divad12, dshynkev, gabrielwong, gautamgupta, georgeke, jakesi, jgulbronson, jlfwong, joshkalpin, jswu, kartiktalwar, kimyousee, klistwan, mario54, mduan, michaelmior, noryb009, psobot, pushrax, ryandv, s-stripe, sachdevs, sophiebits, tbelaire, yingted

rmc's Issues

Mongodb sometimes takes longer to start and crashes Flask on Linux

Sometimes when mongod is run for the first time with our config, it takes a while to start, and Flask crashes when make local is run.

We can monitor a mongo log file to ensure it starts properly, or perhaps just "warm up" mongo after a fresh installation.

This only seems to affect Linux and not Mac.

More details in discussions of #64

Make passing JS tests a deploy blocker

Thanks to @JGulbronson, we now have JS unit tests set up! Woohoo!

As I commented on #151, tests can get thrown to the wayside if you never have to look at them, and can easily get broken long-term and therefore become useless. My favorite way of dealing with this problem is making them a deploy blocker (i.e. you can't deploy unless the tests pass).

In the past, I've rolled my own systems to do this, but it looks like the community has matured in some areas.

Two major contenders at the moment: Karma and Testem.

The behaviour we're looking for here is:

  1. I run a shell command.
  2. A browser opens either in the foreground or background or completely headless, and runs the JS tests (just running the tests in node is not a viable option IMO, because browser-specific JS is insanely difficult to mock out).
  3. The shell command terminates, and returns 0 if all the tests pass, or some other return code otherwise.

Once this is done, we can make the tests a deploy blocker like this:

diff --git i/Makefile w/Makefile
index bb90595..d7e621a 100644
--- i/Makefile
+++ w/Makefile
@@ -81,7 +81,7 @@ deploy_skiptest:
                cat deploy.sh | ssh rmc DEPLOYER=`whoami` sh; \
        fi

-deploy: test deploy_skiptest
+deploy: js-test test deploy_skiptest

 pip_install: require_virtualenv_in_dev
        pip install -r requirements.txt

You can see that running python tests is already a deploy blocker.

Email logins

Let people create an account and sign in with their email address.

Use OpenData terms data for class schedules

Right now we parse the user's class schedule to get date and time information for their courses. This is undesirable because:

  1. The class schedule format can vary across students, and is even ambiguous in some cases (#107)
  2. Too much regex :(

Instead, we should simply parse out the current school term, along with the class number and section number of each course, and look it up in SectionMeeting (which is backed by OpenData terms data)

data/processor.py shows how the section data is processed (fetched by data/crawler.py)
server/static/js/schedule.js is where the schedule gets parsed.

Allow Facebook login for users who signed up with email

We should retrieve additional information from their Facebook account (e.g. friends) and merge it into their existing Flow account. There are at least two cases:

  1. User signs up with an email. Logs out. Clicks "Sign in with Facebook" on the front page, while logged in to a Facebook account with the same email.
  2. User signs up with an email. Clicks a "Connect with Facebook" banner.

The first case currently can happen and it results in a duplicate keys error when saving the user. We should write a test to reproduce this and fix it by merging info from Facebook instead of making a new user.

There's no UI that facilitates the second case right now, so there's no problem. We can reuse the banners that we already show to logged out users.
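The merge-instead-of-create behaviour for the first case could look roughly like the sketch below. It uses a plain dict keyed by email in place of the real user model, and the function name and all field names (`fbid`, `friend_fbids`, the shape of `fb_profile`) are assumptions made for illustration only.

```python
def upsert_facebook_user(users_by_email: dict, fb_profile: dict) -> dict:
    """Merge a Facebook profile into an existing account, or create one.

    `users_by_email` stands in for the user collection; field names are
    hypothetical.
    """
    email = fb_profile["email"].strip().lower()
    user = users_by_email.get(email)
    if user is None:
        # No account with this email yet: create a fresh one.
        user = {"email": email}
        users_by_email[email] = user
    # In both cases, attach the Facebook data to the single account.
    user["fbid"] = fb_profile["id"]
    user["friend_fbids"] = list(fb_profile.get("friends", []))
    return user
```

The point of the sketch is that the lookup by email happens before any insert, so an existing email account is updated rather than duplicated.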

Unified search bar in header

It would be awesome to have a unified search bar in the header that would let me jump to course pages or user pages.

Right now, searching for my friends means I load my profile page, wait for all their names to load, then use Ctrl/Cmd-F to find their names. This could definitely be improved.

Major UI component on this, so please post mockups before starting code for this.

Privacy controls for profile page

Currently, your profile page is completely visible to all of your Facebook friends who signed up on Flow. Allow users to make their profile page only visible to themselves.

Add whitespace linting for all files

Currently, we only have whitespace linting for python files. However, we should enforce whitespace rules for all source files.

Some things to lint for include:

  • disallowing tabs
  • disallowing trailing whitespace
  • enforcing 80-char line length
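The three rules above are simple enough to check line by line. Here is a minimal sketch of such a check; `lint_whitespace` is a hypothetical name, and how it would be wired into the existing Python lint step is left open.

```python
MAX_LINE_LENGTH = 80


def lint_whitespace(text: str, max_len: int = MAX_LINE_LENGTH) -> list:
    """Return (line_number, message) tuples, one per violation found."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((lineno, "tab character"))
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        if len(line) > max_len:
            problems.append(
                (lineno, "line longer than %d characters" % max_len))
    return problems
```

A wrapper script could run this over every source file and exit non-zero when the returned list is not empty.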

Create a label for "starter issues"

This is more of a repo maintenance thing, but this has worked well for CocoaPods. Creating a label for starter issues makes contributing less intimidating for those that want to be involved in contributing to the project. It helped me get involved in CocoaPods and I think it'll help foster a strong community around Flow as well.

Make professor pages

Right now professors are only shown in the context of course pages.

There have been a few professors that were so good, I'd take pretty much anything they taught. It'd be cool to have a page that lists all the courses a prof teaches with associated info and reviews.

This would have a major UI component, so please post mockups before you dive into the code on this one.

Cluster together sections with same location+time in exam schedule

Since switching to Open Data v2 to get data for the exam schedule in #43, all sections are listed separately now.

It'd be great to group together sections that have the same time slot and location, so as to avoid creating like 9 entries all for the same time in the user's calendar (say, if they export to GCal). I believe this was the behaviour under the v1 API (which was a more direct mapping from the PDF, which does the clustering for brevity).

So, instead of

image

... we'd have

image
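The grouping itself could be sketched as below: sections that share a course, time slot and location collapse into one entry. The record shape (`course`, `section`, `start`, `end`, `location`) is hypothetical and does not reflect the actual exam model or the Open Data response format.

```python
from collections import defaultdict


def cluster_exams(exams: list) -> list:
    """Group exam entries that share course, time slot and location."""
    groups = defaultdict(list)
    for exam in exams:
        key = (exam["course"], exam["start"], exam["end"], exam["location"])
        groups[key].append(exam["section"])
    # One output entry per distinct (course, start, end, location).
    return [
        {"course": course, "start": start, "end": end,
         "location": location, "sections": sorted(sections)}
        for (course, start, end, location), sections in groups.items()
    ]
```

Each clustered entry would then map to a single calendar event listing all of its sections.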

Add unit testing framework for JS

We don't currently have anything for unit testing our JavaScript code. Having this would be useful for testing things like the transcript and schedule parsing logic (see #105).

I don't really have any strong feeling for any specific unit testing framework, but one I have encountered in the past is Jasmine.

Link to import transcript page (/onboarding) from profile page

Currently, the only way to upload your previous terms' courses is by uploading your transcript. The only time you get a chance to do that is on initial account creation, where the user is taken to this page initially: https://uwflow.com/onboarding

We never link to that page anywhere else. Consequently, users may not know how to upload previous terms.

An easy fix would be to add an "Add previous terms" button to your profile page linking to /onboarding, similar to the "Add term" button already there.

Support Co-ops

Most students would love to be able to rate employers, get realistic salary data coming into an interview (CECS is notoriously bad at salary transparency), and read reviews from past co-ops.

Re-enable serving minified JS files on production

Minifying JS was disabled twice (yes, how is that possible you ask) on two different occasions:

  1. We use Airbrake to email us details of uncaught JS exceptions in production. However, in order to get intelligible error messages that we can actually act on, we disabled serving minified files on prod in 52fae50b. However, it seems like Airbrake now supports source maps, so we can get our deploy build process to generate source maps and configure Airbrake to use them.
  2. #161 broke the JS compile step in deploy.sh due to changes to server/static/js/main.js. So, we disabled compiling JS files during deploy in ad83530 (see that commit for more details).

We'll need to resolve both of these issues to be able to serve minified JS in prod.

Add login button to the nav bar

Allow users to log into their account from any page by adding it to the nav bar. Clicking the button should show the login modal. Possibly add a link to the signup modal as well.

screen shot 2014-03-23 at 1 34 54 pm

We sometimes get negative ratings

Occasionally, when a user makes a rating, we'll get an error on the server like this:

  File "./rmc/server/server.py", line 1094, in user_course
    uc.save()
  File "./rmc/models/user_course.py", line 226, in save
    cur_professor.save()
  File "./rmc/models/professor.py", line 73, in save
    super(Professor, self).save(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mongoengine/document.py", line 208, in save
    self.validate()
  File "/usr/local/lib/python2.7/dist-packages/mongoengine/base.py", line 1025, in validate
    raise ValidationError('ValidationError', errors=errors)
ValidationError: ValidationError(sorting_score_negative.Float value is too small: ['passion'])

This has been happening for more than 1.5 years, but we never seriously looked into it. I just now looked at the code in rating.py and it was evident what the bug was. Here's a repro: https://gist.github.com/divad12/5fd40c6278548fd43ac5

For more context: rating.py defines the AggregateRating class. An AggregateRating is something like "75% of 100 users thought this course was easy" — ie. an aggregate rating on a single metric. In this example, AggregateRating#rating would be 0.75 and AggregateRating#count would be 100. Note that AggregateRating#rating should always be in [0, 1], but the error raised here is that it's not.

Also note that users rate with binary metrics — yes or no. So the rating parameter in AggregateRating#add_rating(self, rating) is always 0 or 1.

See if you can figure it out! 😛

Calling get_ratings() for a prof doesn't return "easiness"

When calling get_ratings() in professor.py, an array containing four objects is returned, but the last one, "easiness", is always 0.

As an example, if I call get_ratings() on Eddie Dupont, it returns this:

{'overall': {'count': 50, 'rating': 1.0}, 'clarity': {'count': 50, 'rating': 1.0}, 'passion': {'count': 49, 'rating': 1.0}, 'easiness': {'count': 0, 'rating': 0.0}}

Note that both count and rating are 0

Exclude /api/v1/login/facebook from CSRF protection

Due to turning on CSRF protection in #63, POSTing to /api/v1/login/facebook from our Android app returns a 403 Forbidden. We need to exclude this endpoint from CSRF protection and have it return the CSRF token.

Also need to ensure this endpoint is a no-op if user is already logged in, else a CSRF attack on this endpoint would allow an attacker to login a user's browser to the attacker's account.

Combine transcript and schedule parsing

Users have been trying to import their transcripts with the "Add Term" button (see screenshot). However, this accepts only schedules. This leads users to believe that our transcript parsing is broken and they are unable to import their previous courses.

screen shot 2014-03-21 at 3 24 46 pm

We should make the copy & paste boxes compatible with both transcript and schedule pastes from Quest.

Broken professor name links in sections table on course detail page

Eg.

image

... links to the non-existent professor review https://uwflow.com/course/math135#j_huang

There are two ways I see to fix this:

  1. Don't link to profs w/o a corresponding review. This would only involve JS changes.
  2. Ensure there's always a corresponding prof listed under a course for all the profs in the sections table. This has the added bonus of keeping the "profs who teach this course" data up to date (which is used in the select2 dropdown for reviewing a prof). If you're going for this approach, you'd have to use the same heuristic (as used in #98) on the server to do approximate name matching on profs. The best place to fix this would be in the data scraping: https://github.com/UWFlow/rmc/blob/master/data/processor.py#L536

Inconsistent order of 'ratings' array contents in course info

The issue is that the ordering of the usefulness, interest, and easiness ratings is inconsistent between the overall "ratings" array and the "ratings" arrays included in the individual course reviews.

Shown below is a snippet from the response at /api/v1/courses/:id
The overall ratings array is shown at the top, and another ratings array (attached to a course review) is shown further down. Notice that the order of the easiness and interest array elements is reversed.

"ratings": [
    {
      "count": 162, 
      "rating": 0.6790123456790125, 
      "name": "usefulness"
    }, 
    {
      "count": 302, 
      "rating": 0.8178807947019866, 
      "name": "interest"
    }, 
    {
      "count": 242, 
      "rating": 0.7272727272727272, 
      "name": "easiness"
    }
  ], 
  "code": "PSYCH 101", 
  "name": "Introductory Psychology", 
  "overall": {
    "count": 311, 
    "rating": 0.8842443729903537
  }, 
  "reviews": [
    {
      "comment": "Rote memorization.", 
      "ratings": [
        {
          "rating": 0.0, 
          "name": "usefulness"
        }, 
        {
          "rating": 1.0, 
          "name": "easiness"
        }, 
        {
          "rating": 0.0, 
          "name": "interest"
        }
      ], 
      "comment_date": 1354093463046, 
      "author": {
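One possible fix is to sort every ratings array by a fixed name order before serializing it. The sketch below uses the order from the overall array in the snippet above (usefulness, interest, easiness); the function name and the choice to put unknown names last are assumptions.

```python
# Canonical order, taken from the overall "ratings" array shown above.
RATING_ORDER = ("usefulness", "interest", "easiness")


def order_ratings(ratings: list) -> list:
    """Return the ratings sorted into the canonical name order."""
    rank = {name: index for index, name in enumerate(RATING_ORDER)}
    # Names not in RATING_ORDER sort after the known ones.
    return sorted(ratings, key=lambda r: rank.get(r["name"], len(rank)))
```

Applying the same helper to both the overall array and each review's array would make the two orderings agree.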

Add a "smart app banner" for Android browser visitors linking to Android app

On iOS, you sometimes get this banner when you visit a website indicating there's a mobile app available:

image

Since we now have an Android app, and our website is not mobile-optimized, it would be good to indicate that to visitors from Android. Last month we had about 130 visits from an Android browser.

Perhaps we could integrate one of these plugins:

Send deploy notifications to the public UW Flow HipChat room

It's really helpful to see deploy notifications in the same chat window as the GitHub notifications, like we used to (before we had our public HipChat room). Send notifications to both the internal Operations room as well as the public "UW Flow" room.

Get course information from opendata as cronjob

Currently, we are manually crawling the undergrad calendar to get course descriptions. This should be ported to use the OpenData API. Furthermore, we should have a cronjob that periodically fetches updated course descriptions.

Handle parsing ambiguous dates from Quest schedules

When parsing schedules from Quest, we assume date range information for courses comes in MM/DD/YYYY format (which is the case for most users).

However, for a small subset of users, Quest produces schedules with dates in DD/MM/YYYY format. We will need to use heuristics to guess which format the dates are in.

Below is a link to a schedule with the unsupported DD/MM/YYYY format: https://gist.github.com/mduan/9334117.
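One simple heuristic is to look for a component greater than 12 across all dates in the schedule, since such a value can only be a day. The sketch below shows that idea only; the function name is hypothetical, and a schedule where no date disambiguates (or where the dates disagree) yields None so the caller can fall back to the MM/DD/YYYY default.

```python
def guess_date_format(dates):
    """Guess the format of 'XX/YY/YYYY' strings taken from one schedule.

    Returns "MM/DD/YYYY", "DD/MM/YYYY", or None when the dates are
    ambiguous or contradictory.
    """
    first_over_12 = second_over_12 = False
    for date in dates:
        first, second, _year = date.split("/")
        first_over_12 = first_over_12 or int(first) > 12
        second_over_12 = second_over_12 or int(second) > 12
    if first_over_12 and not second_over_12:
        return "DD/MM/YYYY"  # a first component above 12 must be a day
    if second_over_12 and not first_over_12:
        return "MM/DD/YYYY"
    return None
```

Looking at all dates together matters because a single date such as 01/06/2014 is valid in both formats.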

Schedule Building

I would like the ability to add and remove classes based on the scheduling information available. At the current moment you can only update classes based on your quest schedule, but if I am trying to decide how to schedule my classes such that everything fits, I need the ability to add/remove classes manually.

Aggregate shortlist

It would be useful to know the most popular courses that your friends plan to take in the future (ie. have in their shortlist).

People post questions like this in class Facebook groups:

image
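The aggregation could be as simple as counting how many friends have each course shortlisted. The following is a sketch with a hypothetical input shape (a dict mapping each friend to a list of course IDs), not the actual user model.

```python
from collections import Counter


def popular_shortlisted(friend_shortlists: dict, top_n: int = 5) -> list:
    """Return (course_id, friend_count) pairs, most popular first."""
    counts = Counter()
    for shortlist in friend_shortlists.values():
        # set() ensures a friend counts once per course.
        counts.update(set(shortlist))
    # Sort by count descending, then course ID for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```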

Filter out phantomjs requests from analytics stats

Recently, we added a weekly cronjob that has phantom.js create snapshots of our dynamic course pages (so they can be indexed by search engine spiders). See 2c675ec.

However, for phantom.js to create the snapshots for the course pages, it has to make a request to each course page. This screws up our analytic stats on Mixpanel and Google Analytics. For example, the spikes in the # of visitors per day graph (Google Analytics) below are caused by phantom.js:

image

So we need to either filter phantom.js out of the Mixpanel and Google Analytics stats or find a way to generate the snapshots without making requests to the production site.
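For the filtering option, one approach is to detect the snapshot requests on the server and skip rendering the analytics snippets for them. The sketch below assumes the snapshot job keeps PhantomJS's default User-Agent, which contains the token "PhantomJS"; the function name is hypothetical, and how the result is used in the templates is left open.

```python
def is_snapshot_request(user_agent) -> bool:
    """Return True if the User-Agent looks like PhantomJS.

    Assumes the snapshot cronjob does not override the default User-Agent.
    """
    return "phantomjs" in (user_agent or "").lower()
```

A page handler could pass this flag to the template so that the Mixpanel and Google Analytics scripts are omitted when it is true.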
