
paste-corral's Introduction

Paste Corral crawls pastebin.com to collect, clean, and store PasteBin posts. At the same time, it exposes a REST API endpoint so that developers can consume the cleaned PasteBin data for analytics.

See pastecorral.com/api for a live version of Paste Corral.

  • At the moment, the API supports only simple GET requests.
  • You can test it with curl: curl -i -X GET http://pastecorral.com/api
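The same GET request can also be issued from Python. The sketch below uses only the standard library and assumes the endpoint returns a JSON body; the exact shape of that body (a single object or a list of objects) is not stated in this README. The names `fetch_pastes` and `opener` are illustrative and not part of Paste Corral.

```python
import json
import urllib.request

API_URL = "http://pastecorral.com/api"  # URL taken from this README


def fetch_pastes(url=API_URL, opener=urllib.request.urlopen, timeout=10):
    """Send a GET request and return the decoded JSON payload.

    `opener` is injectable (hypothetical parameter) so the function can be
    exercised without network access; it defaults to urllib.request.urlopen.
    """
    request = urllib.request.Request(url, method="GET")
    with opener(request, timeout=timeout) as response:
        # Assumption: the body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_pastes()` with no arguments queries the live endpoint; passing a different `url` points it at your own Heroku deployment.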

API JSON Response

| Field Name | Definition                             |
|------------|----------------------------------------|
| author     | The paste author                       |
| title      | The paste title                        |
| content    | The paste content                      |
| pdate      | The date the paste was created (UTC)   |
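On the consuming side, the four fields map naturally onto a small typed record. This is a minimal sketch, not Paste Corral's own code: the `Paste` class and `from_record` method are hypothetical names, the sample values are made up, and `pdate` is kept as plain text because the README does not document its date format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paste:
    author: str
    title: str
    content: str
    pdate: str  # creation date (UTC); left as text, format not documented

    @classmethod
    def from_record(cls, record):
        """Build a Paste from one decoded JSON object.

        Missing fields default to an empty string; unknown keys are ignored.
        """
        return cls(
            author=record.get("author", ""),
            title=record.get("title", ""),
            content=record.get("content", ""),
            pdate=record.get("pdate", ""),
        )


# Made-up sample record, for illustration only.
sample = {"author": "anon", "title": "demo", "content": "hello", "pdate": "2020-01-01"}
paste = Paste.from_record(sample)
print(paste.title)
```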

Web Crawler and ETL Overview


Setup

Step 1: Fork and then clone this GitHub repo.

Step 2: Create an account on Heroku

Step 3: Install Heroku CLI

Step 4: Run heroku create

  • This creates a new empty application on Heroku, along with an associated empty Git repository.
  • Run this command from your app’s root directory, so the empty Heroku Git repository is automatically set as a remote for your local repository.

Step 5: Add a free Heroku Postgres Starter Tier dev database to your app:

  • heroku addons:create heroku-postgresql:hobby-dev

Create a .env file in your app's root directory.

  • Note: this file is intentionally listed in .gitignore.
  • Add PORT=8080 to the file.

Show the DATABASE_URL config variable:

  • heroku config
  • Copy the DATABASE_URL value into the .env file as a DATABASE_URL=<value> line.
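At this point the .env file holds two KEY=VALUE lines. The sketch below shows one way a script could read them; `parse_env` is an illustrative helper and not necessarily how Paste Corral itself loads the file, and the DATABASE_URL shown is a placeholder rather than a real credential.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since the URL may contain "=" itself.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Placeholder contents; substitute the value printed by `heroku config`.
example = """
PORT=8080
DATABASE_URL=postgres://user:password@host:5432/dbname
"""
config = parse_env(example)
print(config["PORT"])
```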

Step 6: Connect to the Heroku PostgreSQL instance and run the following scripts in order:

  1. data/ddl/setup.sql
  2. data/ddl/paste_data_etl.sql

You can view your credentials using the heroku config command.

You can connect using any PostgreSQL admin tool. If you're using VSCode, the PostgreSQL Explorer extension works great.
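If you prefer the command line to a graphical tool, the two scripts can also be applied with the psql client. The sketch below assumes psql is installed locally and that you pass in the DATABASE_URL value from heroku config; `build_psql_commands` and `run_ddl` are hypothetical helper names, not part of this repo.

```python
import subprocess

# Script paths as listed in Step 6, in the order they should run.
DDL_SCRIPTS = ["data/ddl/setup.sql", "data/ddl/paste_data_etl.sql"]


def build_psql_commands(database_url, scripts=DDL_SCRIPTS):
    """Return one psql invocation per script, in the order listed.

    -d accepts a connection URI, -f names the script file, and
    -v ON_ERROR_STOP=1 makes psql exit on the first SQL error.
    """
    return [
        ["psql", "-d", database_url, "-v", "ON_ERROR_STOP=1", "-f", script]
        for script in scripts
    ]


def run_ddl(database_url):
    """Run each script; check=True raises if psql exits with an error."""
    for command in build_psql_commands(database_url):
        subprocess.run(command, check=True)
```

Run it from the repo root so the relative script paths resolve, for example `run_ddl(os.environ["DATABASE_URL"])` after importing `os` and exporting the variable in your shell.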


General Notes

If you make any code changes, remember to:

  • Commit and push the changes to your GitHub repo
  • Then push to Heroku as well: git push heroku master

To view information about your running Heroku app:

  • heroku logs --tail

To open your Heroku app (in this case a REST API endpoint):

  • heroku open

paste-corral's People

Contributors

ylazerson
