rScielo

rScielo provides a set of functions to scrape meta-data from scientific articles hosted on the Scientific Electronic Library Online platform (Scielo.br). The meta-data include the authors' names, article titles and publication years, among other fields. The package also provides functions to summarize the scraped data.

How does it work?

Getting a journal's ID

The rScielo package scrapes data based on a journal ID (or pid). For example, consider the link to the Brazilian Political Science Review homepage on Scielo:

http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso

The ID is located between &pid= and &lng (i.e., 1981-3821). Most rScielo functions take this ID as their main argument. To extract the ID from a journal's Scielo URL automatically, use the get_id_journal() function:

get_id_journal("http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso")
#> [1] "1981-3821"
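For illustration, the extraction rule described above can be sketched in base R. In this version, the ID is the value of the pid query parameter, read up to the next &. Note that extract_pid() is a hypothetical helper and not part of rScielo, and get_id_journal() may be implemented differently internally:

```r
# Hypothetical helper (not part of rScielo): returns the value of the
# `pid` query parameter of a single Scielo URL, or NA if there is none.
extract_pid <- function(url) {
  # regexec() captures the text after "pid=" up to the next "&";
  # regmatches() returns the full match followed by the captured group.
  m <- regmatches(url, regexec("[?&]pid=([^&]+)", url))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

extract_pid("http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso")
#> [1] "1981-3821"
```

The helper does not check what kind of page the URL points to. Applied to an article URL, it returns that article's pid rather than the journal ID.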

Scraping data

To scrape meta-data from all articles of a journal hosted on Scielo, use the get_journal() function:

df <- get_journal("1981-3821")

Then summarize the scraped data with summary():

summary(df)
#> 
#> ### JOURNAL SUMMARY: Brazilian Political Science Review (2012 - 2016)
#> 
#> 
#>  Total number of articles:  98 
#>  Total number of articles (reviews excluded):  67
#> 
#>  Mean number of authors per article:  1.61 
#>  Mean number of pages per article:  29.38

The rScielo package also provides a function to scrape meta-data from a single article:

# The article's URL on Scielo
url <- "http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1981-38212016000200201&lng=en&nrm=iso&tlng=en"

# Scrape the data
article <- get_article(url)

Finally, get_journal_info() and get_journal_list() scrape a journal's meta-information (publisher, ISSN and mission) and a list of all journals hosted on Scielo, respectively:

# Get a journal's meta-information
meta_info <- get_journal_info("1981-3821")

# Get a list with all journals' names, URLs and IDs
journals <- get_journal_list()

Scraping metrics

With rScielo, it is possible to scrape several publication and citation metrics of a journal hosted on Scielo:

# Get publication and citation metrics
cit <- get_journal_metrics("1981-3821")

# Plot the data for a quick visualization
plot(cit)

Functions

Here is a description of the rScielo functions:

get_id_journal(): Gets a journal's ID from its URL.
get_journal(): Gets meta-data from all articles published by a journal.
get_article(): Gets meta-data from a single article.
get_journal_info(): Gets a journal's description (publisher, ISSN and mission).
get_journal_list(): Gets a list with all journals' names, URLs and IDs.
get_journal_metrics(): Gets publication and citation metrics of a journal.

Installation

Install the latest stable release from CRAN via:

install.packages("rScielo")

Alternatively, install the latest pre-release version from GitHub via:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("meirelesff/rScielo")

Author

Fernando Meireles

License

GPL (>= 2)
