Deprecation Notice

This project has been deprecated. It was written when I first started learning to code, so the codebase and implementation are messy af, which makes rebuilding the project from scratch far more feasible than fixing it.

I have decided to revamp this project with a cleaner codebase and faster performance using Node.js. You may find it here.

You are still welcome to make changes to this project, even though I'm not maintaining it anymore. Currently, the main scraper no longer works and the project's dependencies have vulnerabilities.

Past Year Scrapper

Developed with Ruby on Rails, using Mechanize for web scraping, Rubyzip for zipping on the fly, and Parallel for concurrent downloading in the background.
The app is available for free under the MIT Licence.

Current development status: ABANDONED (previously a BETA release with basic functionality working). The codebase is available for public use, but it might need some tweaking to get it working again.

Special features:

  • Search and download past year papers from MMU's Vlib online library!
  • Zip files on the fly
  • Download multiple past year papers at once
  • Single-file download API at "/subject_id"
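Downloading several papers at once comes down to running one fetch per paper on a small pool of workers. The project uses the Parallel gem for this. The sketch below is not the project's code: it swaps in plain Ruby threads and a `Queue` so it runs without any gems. The method name `fetch_all`, the `workers` parameter, and the fake download block are all hypothetical.

```ruby
# Hypothetical helper (not from the project): run the given block once per id
# on a fixed pool of worker threads and return the results in input order.
def fetch_all(ids, workers: 4, &fetch)
  queue = Queue.new
  ids.each_with_index { |id, index| queue << [id, index] }
  queue.close # once closed and drained, Queue#pop returns nil

  results = Array.new(ids.size)
  pool = Array.new(workers) do
    Thread.new do
      while (job = queue.pop)
        id, index = job
        results[index] = fetch.call(id) # each worker writes its own slot
      end
    end
  end
  pool.each(&:join)
  results
end

# Fake "download" so the sketch runs without network access.
# The ids are placeholders, not real subject codes.
papers = fetch_all(%w[subject_a subject_b subject_c]) { |id| "#{id}.pdf" }
puts papers.inspect
```

Storing each result by index keeps the output in the same order as the input, whichever worker finishes first.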

Note for developers:

  • This project requires background workers and write access to the tmp folder (to store the past year papers temporarily). Hence, it will not run on Heroku, which uses ephemeral storage.
  • Set up crontab using the whenever gem to automatically clear the temp files.
  • Set up student_id and student_password in environment variables to access Vlib. The app automatically updates and maintains the session.
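As a rough illustration of the credentials note above, the sketch below reads the two values from the environment and fails early if either is missing. It is not the project's code: the helper name `vlib_credentials` is hypothetical, and the exact variable names and casing the app expects are assumed to match the ones written above.

```ruby
# Hypothetical helper (not from the project): read the Vlib credentials from
# the environment. Variable names are assumed to be exactly "student_id" and
# "student_password", as written in the note above.
def vlib_credentials(env = ENV)
  id = env.fetch("student_id") { raise KeyError, "student_id is not set" }
  password = env.fetch("student_password") do
    raise KeyError, "student_password is not set"
  end
  { id: id, password: password }
end
```

Calling `vlib_credentials` at boot would surface a missing variable immediately, rather than at the first login attempt.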

How to run:

  • Deploy to a server, or
  • Run locally with rails s. Be sure to clear your tmp folder regularly, or set up crontab with the whenever --update-crontab command.
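Clearing the tmp folder can be sketched as a small Ruby routine that removes files older than a cutoff. This is an illustration, not the project's actual cleanup task: the method name `clear_stale_files`, the one-hour default, and the idea of calling it from a scheduled job are all assumptions.

```ruby
require "fileutils"

# Hypothetical helper (not from the project): delete regular files under `dir`
# last modified more than `max_age` seconds ago. Returns the removed paths.
def clear_stale_files(dir, max_age: 3600, now: Time.now)
  stale = Dir.glob(File.join(dir, "**", "*")).select do |path|
    File.file?(path) && (now - File.mtime(path)) > max_age
  end
  stale.each { |path| FileUtils.rm_f(path) }
  stale
end
```

A scheduled job (for example, one installed through the crontab mentioned above) could call something like `clear_stale_files("tmp")` so that temporary zips do not accumulate.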

To-do list (contributions welcome)

  • Exception handling (for session errors)
  • UI improvements
  • Switch from polling to WebSockets
  • Any other relevant pull requests are welcome.

Contributors

dannyongtey