pngnewsR

Project Status: WIP. Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Introduction

pngnewsR is an open-source web scraper package that scrapes news articles from three news websites in Papua New Guinea: Loop PNG, Post-Courier and The National. The data returned by the package's functions is intended as a first step in data collection for news sentiment analysis. pngnewsR returns its output in tabular form so that it can be processed further with functions from other R packages.

Installation

To use this package, first install R and RStudio on your machine.

Once both are installed, run the following script:

library(devtools) 
install_github("charlieikosi/pngnewsR")

Load Package

Load the package using the script:

library(pngnewsR)

This allows you to utilize several of pngnewsR's webscraping functions.

Available Functions
  • article_content() scrapes only the content of a news article
  • business_lp() scrapes only business news articles from the Loop PNG news website
  • business_na() scrapes only business news articles from The National news website
  • business_pc() scrapes only business news articles from the Post-Courier news website
  • feature() scrapes only featured news articles from the Post-Courier news website
  • topstories_pc() scrapes only top story news articles from the Post-Courier news website
  • national_lp() scrapes only national news articles from the Loop PNG news website
  • national_na() scrapes only national news articles from The National news website
  • national_pc() scrapes only national news articles from the Post-Courier news website
  • world_pc() scrapes only world news articles from the Post-Courier news website
  • sport_lp() scrapes only sport news articles from the Loop PNG news website
  • sport_na() scrapes only sport news articles from The National news website
  • sport_pc() scrapes only sport news articles from the Post-Courier news website
  • scrape_news() is a general scraper function that can be used to call the other functions listed above

Usage

All functions except scrape_news() and article_content() take only one argument, pages, which must be an integer.

scrape_news() takes three arguments: pages, news and agent. The news argument is a character string and must be one of "business", "sport", "world", "national", "topstories" or "feature". The agent argument is a character string and must be one of "postcourier", "looppng" or "thenational".
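
The constraints above can be sketched as a small validation step. This is only an illustration in base R, not the package's own code: `check_scrape_args()` is a hypothetical helper, and the requirement that `pages` be positive is an assumption (the text only says it must be an integer).

```r
# Hypothetical helper, not part of pngnewsR: checks the three arguments
# described above before any scraping is attempted.
check_scrape_args <- function(pages, news, agent) {
  valid_news  <- c("business", "sport", "world", "national", "topstories", "feature")
  valid_agent <- c("postcourier", "looppng", "thenational")

  # pages must be a single whole number; "positive" is an assumption
  if (!is.numeric(pages) || length(pages) != 1 || is.na(pages) ||
      pages < 1 || pages != round(pages)) {
    stop("`pages` must be a single positive whole number")
  }

  # match.arg() stops with an error that lists the accepted values
  news  <- match.arg(news, valid_news)
  agent <- match.arg(agent, valid_agent)

  list(pages = as.integer(pages), news = news, agent = agent)
}

args <- check_scrape_args(pages = 2, news = "topstories", agent = "postcourier")
str(args)
```

Note that `match.arg()` also accepts unambiguous prefixes such as "top"; a stricter check could use `%in%` instead.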

article_content() takes only a url argument of class character. The URL's hostname must belong to one of the Post-Courier, Loop PNG or The National news websites.
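
A URL can be checked against this rule before it is passed on. The sketch below is a hypothetical base R helper, not part of pngnewsR. The Loop PNG and Post-Courier domains are taken from URLs shown in this README; the domain for The National is an assumption.

```r
# Supported domains. The first two appear in this README's URLs;
# "thenational.com.pg" is an assumption.
allowed_hosts <- c("looppng.com", "postcourier.com.pg", "thenational.com.pg")

# Hypothetical helper: TRUE when the URL's hostname is one of the supported sites.
is_supported_url <- function(url) {
  stopifnot(is.character(url), length(url) == 1)
  # Drop the scheme (e.g. "https://")
  host <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
  # Drop everything from the first "/", "?", "#" or ":" onward
  host <- sub("[/?#:].*$", "", host)
  # Compare case-insensitively and ignore a leading "www."
  host <- sub("^www\\.", "", tolower(host))
  host %in% allowed_hosts
}

is_supported_url("https://www.looppng.com/business/fiscal-stability-agreement-p%E2%80%99nyang-signed-125267")
is_supported_url("https://example.com/news/some-article")
```

This simple pattern match does not handle every URL form (for example, URLs with embedded credentials), so it is only a first filter.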

Examples

# Content of a single article, given its URL
url <- "https://www.looppng.com/business/fiscal-stability-agreement-p%E2%80%99nyang-signed-125267"
news_content <- article_content(url)

# Business news from Loop PNG
business_df <- business_lp(page=1)

# Top stories from Post-Courier, argument passed by position
topstories_df <- topstories_pc(1)

# The argument can also be passed as a variable
num <- 1
sport_df <- sport_lp(num)

# Post-Courier top stories through scrape_news()
topstories_df2 <- scrape_news(page=2, news="topstories", "postcourier")

Outputs

pngnewsR functions aim to structure all scraped data in tabular form as a tibble. The table has three columns:

  • Pub.Date - date of publication
  • Top.Stories - title of the news article
  • URL - URL of the article from which it was scraped

# A tibble: 10 × 3
   Pub.Date        Top.Stories                                        URL                             
   <chr>           <chr>                                              <chr>                           
 1 August 21, 2023 Garap: Kerevat jail needs urgent govt intervention https://www.postcourier.com.pg/…
 2 August 21, 2023 OSICA calls to support locally produced rice       https://www.postcourier.com.pg/…
 3 August 21, 2023 Bands sign contract with Big Records               https://www.postcourier.com.pg/…
 4 August 21, 2023 Danny’s Travel – My Experience with Traffic in POM https://www.postcourier.com.pg/…
 5 August 18, 2023 Chasing My Dream                                   https://www.postcourier.com.pg/…
 6 August 18, 2023 Bilum sales add extra income to a Mum              https://www.postcourier.com.pg/…
 7 August 18, 2023 UPNG Geology Student helped develop his community  https://www.postcourier.com.pg/…
 8 August 18, 2023 Cleaner by day, taxi driver at night               https://www.postcourier.com.pg/…
 9 August 18, 2023 Perfect taxi condition for my customers’ safety    https://www.postcourier.com.pg/…
10 August 18, 2023 Frangipani Festival set                            https://www.postcourier.com.pg/…
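
Because Pub.Date is returned as character, a typical next step is to convert it to the Date class. The sketch below assumes the "Month day, year" format seen in the sample output and uses a plain data.frame as a stand-in for the tibble. `parse_pub_date()` is a hypothetical helper, and the URL column is left out because the sample output truncates it.

```r
# Stand-in for a scraped result: two rows copied from the sample output.
news <- data.frame(
  Pub.Date    = c("August 21, 2023", "August 18, 2023"),
  Top.Stories = c("Bands sign contract with Big Records", "Chasing My Dream"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parses "August 21, 2023" style strings into Date.
# It matches month names against base R's month.name, so it does not
# depend on the session locale. Strings in any other format become NA.
parse_pub_date <- function(x) {
  pattern <- "^([A-Za-z]+) +([0-9]{1,2}), +([0-9]{4})$"
  ok  <- grepl(pattern, x)
  out <- rep(as.Date(NA), length(x))
  month <- match(sub(pattern, "\\1", x[ok]), month.name)
  day   <- as.integer(sub(pattern, "\\2", x[ok]))
  year  <- as.integer(sub(pattern, "\\3", x[ok]))
  out[ok] <- as.Date(ISOdate(year, month, day))
  out
}

news$Pub.Date <- parse_pub_date(news$Pub.Date)
str(news)
```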

Demonstration

To demonstrate the pngnewsR package functions, we have created a basic R Shiny app that gives non-coders a user-friendly front end. It also includes an option to download the scraped data as a .csv file.
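
Outside the app, a similar export can be done directly in R. The sketch below is not the app's code: it uses base R's `write.csv()` on a small stand-in data frame, and the file name is hypothetical.

```r
# Stand-in for a scraped result (one row from the sample output in Outputs).
news <- data.frame(
  Pub.Date    = "August 18, 2023",
  Top.Stories = "Frangipani Festival set",
  stringsAsFactors = FALSE
)

# Write to a temporary location; the file name is hypothetical.
csv_path <- file.path(tempdir(), "pngnews.csv")
write.csv(news, csv_path, row.names = FALSE)

# Read the file back to confirm the export round-trips.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
print(round_trip)
```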
