rScielo

rScielo provides a set of functions to scrape meta-data from scientific articles hosted on the Scientific Electronic Library Online platform (Scielo.br). The meta-data includes authors' names, article titles, and year of publication, among others. The package also provides functions to summarize the scraped data.

How does it work?

Getting a journal's ID

The rScielo package scrapes data based on a journal ID (or pid). For example, consider the link to the Brazilian Political Science Review homepage on Scielo:

http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso

The ID is located between &pid= and &lng (i.e., 1981-3821). Most rScielo functions depend on this argument. To extract the ID automatically from the URL of a journal hosted on Scielo, use the get_id_journal() function:

get_id_journal("http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso")
#> [1] "1981-3821"
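
To illustrate what this extraction amounts to, here is a minimal base-R sketch that pulls the value of the pid parameter out of a URL. The helper name extract_pid is hypothetical and not part of rScielo; get_id_journal() may well be implemented differently.

```r
# Hypothetical helper (not part of rScielo): extract the value of the
# `pid` query parameter from a Scielo URL with a base-R regular expression.
extract_pid <- function(url) {
  # Match "pid=" preceded by "?" or "&", then capture everything up to the next "&"
  m <- regmatches(url, regexec("[?&]pid=([^&]+)", url))[[1]]
  if (length(m) < 2) {
    return(NA_character_)  # no pid parameter found
  }
  m[2]
}

journal_url <- "http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso"
extract_pid(journal_url)
#> [1] "1981-3821"
```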

Scraping data

To scrape meta-data from all articles of a journal hosted on Scielo, use the get_journal() function:

df <- get_journal("1981-3821")

Then summarize the scraped data with summary:

summary(df)
#> 
#> ### JOURNAL SUMMARY: Brazilian Political Science Review (2012 - 2016)
#> 
#> 
#>  Total number of articles:  98 
#>  Total number of articles (reviews excluded):  67
#> 
#>  Mean number of authors per article:  1.61 
#>  Mean number of pages per article:  29.38
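
Scraping depends on network access, so a request for a given journal can fail. The sketch below shows one way to collect data for several journal IDs while skipping the ones that fail. The helper scrape_many is hypothetical and not part of rScielo; the scraping function is passed in as an argument, so the only assumption is that it takes a journal ID as its first argument, as get_journal() does above.

```r
# Hypothetical helper (not part of rScielo): apply a scraping function to
# several journal IDs and keep going when one of them fails.
scrape_many <- function(ids, scraper) {
  results <- lapply(ids, function(id) {
    tryCatch(
      scraper(id),
      error = function(e) {
        warning("Failed to scrape ", id, ": ", conditionMessage(e), call. = FALSE)
        NULL  # mark the failure so it can be dropped below
      }
    )
  })
  names(results) <- ids
  # Drop the journals that could not be scraped
  Filter(Negate(is.null), results)
}

# Intended usage (requires rScielo and a network connection):
# journals_data <- scrape_many(c("1981-3821"), rScielo::get_journal)
```

The result is a named list with one element per journal that was scraped successfully, so each element can be passed to summary() on its own.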

The rScielo package also provides a function to scrape meta-data from a single article:

# The article's URL on Scielo
url <- "http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1981-38212016000200201&lng=en&nrm=iso&tlng=en"

# Scrape the data
article <- get_article(url)
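
The pid in an article URL appears to embed the journal ID and the publication year. The following sketch recovers both from the URL alone, assuming the PID has the form "S", then the journal ID, then a four-digit year, then further digits. The helper parse_article_pid is hypothetical and not part of rScielo, and the assumed PID layout is not confirmed by this document.

```r
# Hypothetical helper (not part of rScielo). Assumes article PIDs look like
# "S" + journal ID (e.g. 1981-3821) + four-digit year + further digits.
parse_article_pid <- function(url) {
  pattern <- "[?&]pid=S([0-9]{4}-[0-9]{3}[0-9Xx])([0-9]{4})"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  if (length(m) < 3) {
    return(NULL)  # URL does not match the assumed layout
  }
  list(journal_id = m[2], year = as.integer(m[3]))
}

article_url <- "http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1981-38212016000200201&lng=en&nrm=iso&tlng=en"
parsed <- parse_article_pid(article_url)
parsed$journal_id
#> [1] "1981-3821"
parsed$year
#> [1] 2016
```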

Finally, get_journal_info() and get_journal_list() scrape a journal's meta-information (publisher, ISSN, and mission) and a list of all journals hosted on Scielo, respectively:

# Get a journal's meta-information
meta_info <- get_journal_info("1981-3821")

# Get a list with all journals' names, URLs and IDs
journals <- get_journal_list()

Scraping metrics

With rScielo, it is also possible to scrape several publication and citation metrics of a journal hosted on Scielo:

# Gets citation metrics
cit <- get_journal_metrics("1981-3821")

# Plots the data for a quick visualization
plot(cit)

Functions

Here is a description of the rScielo functions:

  • get_id_journal(): Gets a journal's ID from its URL.
  • get_journal(): Gets meta-data from all articles published by a journal.
  • get_article(): Gets meta-data from a single article.
  • get_journal_info(): Gets a journal's description.
  • get_journal_list(): Gets a list with all journals' names, URLs and IDs.
  • get_journal_metrics(): Gets publication and citation metrics of a journal.

Installation

Install the latest stable release from CRAN via:

install.packages("rScielo")

Alternatively, install the latest pre-release version from GitHub via:

# Install devtools only if it is missing, without attaching it
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("meirelesff/rScielo")

Author

Fernando Meireles

License

GPL (>= 2)
