rScielo provides a set of functions to scrape meta-data from scientific articles hosted on the Scientific Electronic Library Online platform (Scielo.br). The meta-data include authors' names, article titles, and year of publication, among others. The package also provides functions to summarize the scraped data.
The rScielo package scrapes data based on a journal ID (or pid). For example, consider the link to the Brazilian Political Science Review homepage on Scielo:
http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso
The ID is located between &pid= and &lng (i.e., 1981-3821). Most rScielo functions depend on this argument. To extract the ID automatically from the URL of a journal hosted on Scielo, use the get_id_journal() function:
get_id_journal("http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso")
#> [1] "1981-3821"
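To make the position of the ID in the URL concrete, here is a minimal base-R sketch that pulls the value of the pid query parameter with a regular expression. The helper extract_pid() is hypothetical and not part of rScielo; it is not necessarily how get_id_journal() works internally.

```r
# Hypothetical helper (not part of rScielo): return the value of the
# `pid` query parameter from a Scielo URL, or NA if it is absent.
extract_pid <- function(url) {
  # regexec() yields the full match plus the captured group
  m <- regmatches(url, regexec("[?&]pid=([^&]+)", url))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

extract_pid("http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-3821&lng=en&nrm=iso")
#> [1] "1981-3821"
```

The sketch needs no network access, so it can be used to check a URL before passing it to the package.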
To scrape meta-data from all articles of a journal hosted on Scielo, use the get_journal() function:
df <- get_journal("1981-3821")
Then summarize the scraped data with summary():
summary(df)
#>
#> ### JOURNAL SUMMARY: Brazilian Political Science Review (2012 - 2016)
#>
#>
#> Total number of articles: 98
#> Total number of articles (reviews excluded): 67
#>
#> Mean number of authors per article: 1.61
#> Mean number of pages per article: 29.38
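When several journals are needed, the same call can be repeated for each ID. The following is only a sketch under stated assumptions: scrape_many() is a hypothetical wrapper, not an rScielo function, and it takes the scraping function as an argument so that a failed request for one journal (for example, a network error) does not abort the whole run.

```r
# Hypothetical wrapper (not part of rScielo): apply a scraping function
# to several journal IDs, skipping any ID whose request fails.
scrape_many <- function(ids, scraper) {
  results <- lapply(ids, function(id) {
    tryCatch(
      scraper(id),
      error = function(e) {
        warning(sprintf("Skipping %s: %s", id, conditionMessage(e)),
                call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- ids
  # Drop the journals that could not be scraped
  Filter(Negate(is.null), results)
}

# Usage (requires rScielo and network access):
# journals <- scrape_many(c("1981-3821"), get_journal)
# summary(journals[["1981-3821"]])
```

The result is a named list with one element per journal that was scraped successfully.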
The rScielo package also provides a function to scrape meta-data from a single article:
# The article's URL on Scielo
url <- "http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1981-38212016000200201&lng=en&nrm=iso&tlng=en"
# Scrape the data
article <- get_article(url)
Finally, get_journal_info() and get_journal_list() scrape a journal's meta-information (publisher, ISSN, and mission) and a list of all journals hosted on Scielo, respectively:
# Get a journal's meta-information
meta_info <- get_journal_info("1981-3821")
# Get a list with all journals' names, URLs and IDs
journals <- get_journal_list()
With rScielo, it is also possible to scrape several publication and citation metrics of a journal hosted on Scielo:
# Gets citation metrics
cit <- get_journal_metrics("1981-3821")
# Plots the data for a quick visualization
plot(cit)
Here is a description of the rScielo functions:
get_id_journal(): Gets a journal's ID from its URL.
get_journal(): Gets meta-data from all articles published by a journal.
get_article(): Gets meta-data from a single article.
get_journal_info(): Gets a journal's description.
get_journal_list(): Gets a list with all journals' names, URLs and IDs.
get_journal_metrics(): Gets publication and citation metrics of a journal.
Install the latest stable release from CRAN via:
install.packages("rScielo")
Alternatively, install the latest pre-release version from GitHub via:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("meirelesff/rScielo")
License: GPL (>= 2)