varjo-sport-lambda-scraper's Introduction

Scrapes the basic info (mainly business hours) of all Unisport gyms.

Why, you ask? Because as of now you can't view them all on one page, and they are two clicks away from the front page. Who designed such a moronic system, I don't know, but to me it's annoying, since I have no interest in anything other than seeing how long a gym is open today.

How to run the scraper locally

Requires Python >=3.6, preferably with a virtualenv. I use virtualenvwrapper; some have said pyenv is pretty good too.

  1. Activate your virtualenv, e.g. workon varjos
  2. Install dependencies: pip install -r requirements.txt (Scrapy and Twisted)
  3. Load the dev commands: . cmds.sh
  4. Run the spider: crawl

You should get a ./frontend/unisport_gyms.json file with the scraped data.
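To check that the crawl produced something usable, a small script along these lines can load the output file and describe its top-level shape. This is only a sketch: the helper name summarize_scrape is hypothetical, and since the exact schema of the file is not documented here, the code deliberately makes no assumptions about field names.

```python
import json
from pathlib import Path


def summarize_scrape(path="frontend/unisport_gyms.json"):
    """Load the scraped JSON file and describe its top-level shape.

    The schema is not assumed; only the top-level type is inspected.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return f"list with {len(data)} entries"
    if isinstance(data, dict):
        return f"object with keys: {', '.join(sorted(data))}"
    return f"unexpected top-level type: {type(data).__name__}"
```

Calling print(summarize_scrape()) from the repository root after running crawl should show whether the file holds a list of entries or a single object.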

Use shell to open an interactive Scrapy shell, where you can test CSS selectors without having to run the spider.

How to run the test server

Requires Node.js >=10.

  1. Run: node server.js
  2. The server should run at http://localhost:4040/

Reload the page after making changes to the files inside the frontend folder.

How to deploy the frontend

Requires an AWS account and one S3 bucket.

  1. Set the bucket's permissions to allow public bucket access and enable static website hosting
  2. Configure your local AWS user with access to that bucket
  3. Replace the bucket name in my cmds.sh script with your own
  4. Deploy the code with AWS_PROFILE=varjosport.net-ci deploy_front, where AWS_PROFILE is your local AWS profile

Go to the bucket's website URL to see the app running, e.g. http://varjosport.net.s3-website.eu-north-1.amazonaws.com
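For step 1, public access to a static website bucket is usually granted with a bucket policy that allows anonymous s3:GetObject on every object. The sketch below only generates that policy document as JSON so you can paste it into the bucket's permissions. The function name public_read_policy and the bucket name my-example-bucket are placeholders, not something taken from this repository.

```python
import json


def public_read_policy(bucket_name):
    """Build a bucket policy that lets anyone read objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The trailing /* applies the rule to every object in the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "my-example-bucket" is a placeholder; use your own bucket name.
print(json.dumps(public_read_policy("my-example-bucket"), indent=2))
```

The policy only takes effect if the bucket's Block Public Access settings allow public policies.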

If you want to deploy it to your own domain in case this one dies out for some reason, you have to configure Route 53, Certificate Manager, and CloudFront too. It's pretty basic configuration, so I didn't bother writing it down as a template.

How to run the serverless lambda

Requires Node.js >=10, plus Docker if you want to deploy it. You also need an AWS account with one local AWS user with admin privileges (I'm lazy) and one S3 bucket (the same bucket you use to host the frontend).

  1. Install dependencies: npm i
  2. Run npm run invoke to execute the lambda. It will most probably fail because I hard-coded the profile and the bucket, so change them to your own
  3. Similarly, npm run deploy deploys the lambda, but the parameters are hard-coded there too

Once deployed, instead of waiting 24 hours for the lambda to run, you can trigger it manually by going to your AWS console's Lambda page for this function and creating & sending a test event.
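If the lambda is triggered by a schedule (an assumption based on the 24-hour interval mentioned above), the console test event can mimic the scheduled event that AWS sends. The sketch below builds such a payload for pasting into the test event editor. The function name, account ID, rule name, and event ID are placeholders, and an empty {} event may work just as well if the handler ignores its input.

```python
import json
from datetime import datetime, timezone


def sample_scheduled_event(region="eu-north-1",
                           account="123456789012",
                           rule="hypothetical-daily-rule"):
    """Build a payload shaped like a scheduled event sent to a lambda.

    The account, rule, and id values are placeholders, not real resources.
    """
    return {
        "version": "0",
        "id": "00000000-0000-0000-0000-000000000000",
        "detail-type": "Scheduled Event",
        "source": "aws.events",
        "account": account,
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "region": region,
        "resources": [f"arn:aws:events:{region}:{account}:rule/{rule}"],
        "detail": {},
    }


print(json.dumps(sample_scheduled_event(), indent=2))
```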

varjo-sport-lambda-scraper's People

Contributors

teemukoivisto

varjo-sport-lambda-scraper's Issues

Vulnerability report

We are a group of researchers from Leiden University, and we conduct research on vulnerabilities in open-source software. We have discovered and verified a high-severity vulnerability in your project (TeemuKoivisto/varjo-sport-lambda-scraper). Explaining the vulnerability further in this issue could allow malicious users to access details, so we recommend enabling private vulnerability reporting on GitHub to discuss this matter confidentially.
After you have enabled this feature, please add a comment to this issue so we can continue our discussion. If you have any questions, feel free to leave a reply here or send an email to: j.akhoundali [at] liacs.leidenuniv.nl
