Next Headlines Archiver

Firebase Functions / Node.js / Puppeteer / FaunaDB / Next.js / Auth0 / Tailwind CSS

A full-stack data aggregation app that archives the headline content from CNN and Fox News at scheduled intervals and lets users scroll through a news timeline to see how stories are reported on each site.


How is data collected and displayed?

Firebase Cloud Function - Puppeteer - FaunaDB

A Firebase Pub/Sub Cloud Function is set up to run a background job every hour. The function uses Node.js and Puppeteer to scrape the headline content from both CNN and Fox News, then saves it to FaunaDB using the FaunaDB JavaScript driver.
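The flow of the hourly job can be sketched as a small orchestration function. This is a minimal sketch and not the project's actual code: `archiveHeadlines`, `scrapeHeadline`, and `saveHeadline` are hypothetical names, and the scraper and database writer are passed in as parameters so the example runs without Puppeteer, Firebase, or FaunaDB installed.

```javascript
// Hypothetical orchestration of the hourly job: scrape each provider,
// then save whatever was scraped. The real scraper (Puppeteer) and writer
// (FaunaDB driver) would be injected; the names used here are assumptions.
async function archiveHeadlines({ providers, scrapeHeadline, saveHeadline }) {
  const results = [];
  for (const provider of providers) {
    try {
      const headline = await scrapeHeadline(provider);
      await saveHeadline({ provider, ...headline });
      results.push({ provider, ok: true });
    } catch (err) {
      // One provider failing should not stop the other from being archived.
      results.push({ provider, ok: false, error: String(err.message || err) });
    }
  }
  return results;
}

// Example run with stand-in functions instead of Puppeteer and FaunaDB.
const saved = [];
const demo = archiveHeadlines({
  providers: ["CNN", "Fox"],
  scrapeHeadline: async (provider) => {
    if (provider === "Fox") throw new Error("selector not found");
    return { headLineTitle: "Example title", headLineUrl: "https://example.com" };
  },
  saveHeadline: async (doc) => {
    saved.push(doc);
  },
});
demo.then((results) => console.log(results));
```

In a Firebase project, a function like this would typically be called from a scheduled handler, for example `functions.pubsub.schedule(...).onRun(...)` in the firebase-functions v1 API. The exact wiring used by this repo lives in the functions folder.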

Data Management with FaunaDB

FaunaDB, the database chosen for this project, is a transactional cloud database with a fast, developer-friendly API.

The data collected from the web scraper is saved as documents in the news Collection with the following shape:

{
  "ref": Ref(Collection("news"), "1"), // Fauna specific
  "ts": 1617917180920000, // Fauna specific
  "data": {
    "provider": "CNN or Fox",
    "headLineUrl": "headLineUrl",
    "headLineTitle": "headLineTitle",
    "headLineImg": "headLineImg",
    "headLineTxt": "headLineTxt",
    "headLineTs": 1617917167766,
    "headLineUTCDate": "Thu, 08 Apr 2021 21:26:07 GMT",
    "likes": 0
  }
}
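As an illustration of how the `data` object above could be assembled from a scraped headline, here is a minimal sketch. The field names follow the document shape shown above. The function name `buildNewsData` and the shape of the `scraped` input are assumptions, not taken from the project.

```javascript
// Builds the "data" part of a news document from a scraped headline.
// `buildNewsData` and the `scraped` object are hypothetical; only the
// output field names come from the document shape in this README.
function buildNewsData(provider, scraped, timestampMs = Date.now()) {
  return {
    provider,
    headLineUrl: scraped.url,
    headLineTitle: scraped.title,
    headLineImg: scraped.img,
    headLineTxt: scraped.txt,
    headLineTs: timestampMs, // milliseconds since the Unix epoch
    headLineUTCDate: new Date(timestampMs).toUTCString(),
    likes: 0, // every new headline starts with no likes
  };
}

const example = buildNewsData(
  "CNN",
  { url: "headLineUrl", title: "headLineTitle", img: "headLineImg", txt: "headLineTxt" },
  1617917167766
);
console.log(example.headLineUTCDate); // "Thu, 08 Apr 2021 21:26:07 GMT"
```

Deriving `headLineUTCDate` from `headLineTs` keeps the two fields consistent, which matches the sample values in the document above.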

Several indexes were created to enforce uniqueness and to support searching and sorting. Custom functions pull the data for different scenarios and return paged results.
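To make the idea of sorted, paged results concrete, here is an in-memory sketch of cursor-based paging. It is not FaunaDB code: in the project this work is done by indexes and custom functions, while here a plain array stands in for the collection. `getNewsPage` and its options are hypothetical, and the sketch assumes `headLineTs` values are distinct.

```javascript
// In-memory stand-in for a "newest first, paged" query. All names are
// hypothetical; a real implementation would rely on a database index.
function getNewsPage(docs, { size = 2, after = null, provider = null } = {}) {
  const sorted = docs
    .filter((d) => provider === null || d.provider === provider)
    .sort((a, b) => b.headLineTs - a.headLineTs); // newest first

  // `after` is the headLineTs of the last item on the previous page.
  const start = after === null ? 0 : sorted.findIndex((d) => d.headLineTs < after);
  const from = start === -1 ? sorted.length : start;

  const data = sorted.slice(from, from + size);
  const hasMore = from + size < sorted.length;
  // Return a cursor only when another page exists.
  return { data, after: hasMore ? data[data.length - 1].headLineTs : null };
}

const docs = [
  { provider: "CNN", headLineTs: 1 },
  { provider: "Fox", headLineTs: 2 },
  { provider: "CNN", headLineTs: 3 },
];
const page1 = getNewsPage(docs, { size: 2 });
const page2 = getNewsPage(docs, { size: 2, after: page1.after });
console.log(page1.data.length, page2.data.length);
```

The cursor returned with each page is passed back to fetch the next one, so the client never needs to know the total number of documents.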

Displaying Data with Next.js

The client side is built with Next.js, a framework built on top of React that supports hybrid static and server rendering. The data is displayed as a timeline with cards shown side by side.
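One way a statically rendered timeline page can stay fresh is Incremental Static Regeneration, which Next.js enables through the `revalidate` field returned by `getStaticProps`. The sketch below is an assumption about how such a page could look, not the project's code. `fetchLatestHeadlines` is a hypothetical stand-in for the real data access, and the one-hour interval is chosen only to match the hourly crawl.

```javascript
// Hypothetical data access; the real app would query FaunaDB here.
async function fetchLatestHeadlines() {
  return [{ provider: "CNN", headLineTitle: "Example title", likes: 0 }];
}

// In a Next.js page file this function would be exported
// (`export async function getStaticProps`). It is left unexported here
// so the sketch can run as a plain Node.js script.
async function getStaticProps() {
  const headlines = await fetchLatestHeadlines();
  return {
    props: { headlines },
    // Ask Next.js to regenerate the page in the background at most once
    // every 3600 seconds (an assumed interval, see the lead-in).
    revalidate: 3600,
  };
}

getStaticProps().then((result) => console.log(result.revalidate));
```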


Client-side Features

  • Hybrid pages featuring both Static & Server Side Rendering and SEO
  • Static pages also feature Incremental Static Regeneration
  • Server and Client side rendering for search and filters
  • Optimized images
  • Data is displayed in cards over a timeline
  • Uses Tailwind CSS framework for styling
  • Responsive design
  • Progress bar indicator shown on page transitions
  • Search news
  • Filters by provider or by date range
  • Like system for cards
  • Only logged-in users may like a card, and only once
  • Auth0 for user authentication
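The "one like per logged-in user" rule from the list above can be sketched with a small in-memory tracker. This is only an illustration. The project stores likes in FaunaDB and authenticates users with Auth0, whereas `createLikeTracker` and its methods are hypothetical names that keep state in memory.

```javascript
// Hypothetical in-memory like tracker: one like per user per card.
function createLikeTracker() {
  const seen = new Set(); // serialized [userId, cardId] pairs already liked
  const counts = new Map(); // cardId -> like count

  return {
    like(userId, cardId) {
      // Anonymous visitors (no user id) cannot like cards.
      if (!userId) return { ok: false, reason: "login required" };

      const key = JSON.stringify([userId, cardId]);
      if (seen.has(key)) return { ok: false, reason: "already liked" };

      seen.add(key);
      counts.set(cardId, (counts.get(cardId) || 0) + 1);
      return { ok: true, likes: counts.get(cardId) };
    },
    count(cardId) {
      return counts.get(cardId) || 0;
    },
  };
}

const tracker = createLikeTracker();
console.log(tracker.like("user-1", "card-1")); // first like is accepted
console.log(tracker.like("user-1", "card-1")); // second like is rejected
console.log(tracker.like(null, "card-1")); // anonymous like is rejected
```

In a database-backed version, a uniqueness constraint on the user and card pair would play the role of the `seen` set.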

Cloning this repo

If you'd like to clone this repo, first set up a Firebase Pub/Sub Cloud Function for the web scraper background job (code found in the functions folder). Then open a FaunaDB account, set up a database with two collections (news and likes), and get a server key. Note that further tweaks are needed to pull the correct data for the various API endpoints, search and sort, and likes. Finally, set up the necessary environment variables.

git clone https://github.com/luvagu/next-headlines-archiver.git

cd next-headlines-archiver

npm install

npm run dev

Deploy the scheduled crawler after setting up and linking to your Firebase project

cd functions

npm run deploy
