citecorp's Introduction

citecorp

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

Client for the Open Citations Corpus (OCC), http://opencitations.net/

OCC created their own identifiers called Open Citation Identifiers (oci), e.g.,

020010009033611182421271436182433010601-02001030701361924302723102137251614233701000005090307

You are probably not going to be using oci identifiers, but rather DOIs and/or PMIDs and/or PMCIDs. See ?oc_lookup for methods for cross-walking among identifier types.

If you'd like to use the OpenCitations SPARQL endpoint yourself, you can find it at http://opencitations.net/sparql

Install

CRAN version

install.packages("citecorp")

Development version

remotes::install_github("ropensci/citecorp")
library("citecorp")

Methods for converting IDs

oc_doi2ids("10.1097/igc.0000000000000609")
#>                            doi                           paper      pmcid
#> 1 10.1097/igc.0000000000000609 https://w3id.org/oc/corpus/br/1 PMC4679344
#>       pmid
#> 1 26645990
oc_pmid2ids("26645990")
#>                            doi                           paper      pmcid
#> 1 10.1097/igc.0000000000000609 https://w3id.org/oc/corpus/br/1 PMC4679344
#>       pmid
#> 1 26645990
oc_pmcid2ids("PMC4679344")
#>                            doi                           paper      pmcid
#> 1 10.1097/igc.0000000000000609 https://w3id.org/oc/corpus/br/1 PMC4679344
#>       pmid
#> 1 26645990

You can pass more than one identifier to each of the above functions:

oc_doi2ids(oc_dois[1:6])
#>                                  doi                                 paper
#> 1               10.1128/jvi.00758-10 https://w3id.org/oc/corpus/br/5357460
#> 2 10.1111/j.2042-3306.1989.tb02167.x  https://w3id.org/oc/corpus/br/589891
#> 3       10.1097/rli.0b013e31821eea45 https://w3id.org/oc/corpus/br/3931705
#> 4           10.1177/0148607114529597 https://w3id.org/oc/corpus/br/5016780
#> 5            10.1111/1567-1364.12217 https://w3id.org/oc/corpus/br/3819297
#> 6      10.1016/s0168-9525(99)01798-9 https://w3id.org/oc/corpus/br/4606537
#>        pmcid     pmid
#> 1 PMC2953162 20702630
#> 2       <NA>  2670542
#> 3       <NA> 21577119
#> 4       <NA> 24711119
#> 5       <NA> 25263709
#> 6       <NA> 10461200

COCI methods

COCI is the OpenCitations Index of Crossref open DOI-to-DOI references.

If you don't load tibble, you get ordinary data.frames.

library(tibble)
doi1 <- "10.1108/jd-12-2013-0166"
# references
oc_coci_refs(doi1)
#> # A tibble: 37 x 7
#>    journal_sc author_sc timespan citing    oci             cited        creation
#>  * <chr>      <chr>     <chr>    <chr>     <chr>           <chr>        <chr>   
#>  1 no         no        P9Y2M5D  10.1108/… 02001010008361… 10.1001/jam… 2015-03…
#>  2 no         no        P41Y8M   10.1108/… 02001010008361… 10.1002/asi… 2015-03…
#>  3 no         no        P25Y6M   10.1108/… 02001010008361… 10.1002/(si… 2015-03…
#>  4 no         no        P17Y2M   10.1108/… 02001010008361… 10.1007/bf0… 2015-03…
#>  5 no         no        P2Y2M3D  10.1108/… 02001010008361… 10.1007/s10… 2015-03…
#>  6 no         no        P5Y8M27D 10.1108/… 02001010008361… 10.1007/s11… 2015-03…
#>  7 no         no        P2Y3M    10.1108/… 02001010008361… 10.1016/j.w… 2015-03…
#>  8 no         no        P1Y10M   10.1108/… 02001010008361… 10.1016/j.w… 2015-03…
#>  9 no         no        P12Y     10.1108/… 02001010008361… 10.1023/a:1… 2015-03…
#> 10 no         no        P13Y10M  10.1108/… 02001010008361… 10.1038/350… 2015-03…
#> # … with 27 more rows
# citations
oc_coci_cites(doi1)
#> # A tibble: 23 x 7
#>    journal_sc author_sc timespan  citing     oci               cited    creation
#>  * <chr>      <chr>     <chr>     <chr>      <chr>             <chr>    <chr>   
#>  1 no         no        P3Y       10.1145/3… 0200101040536030… 10.1108… 2018    
#>  2 no         no        P2Y5M     10.1057/s… 0200100050736280… 10.1108… 2017-08 
#>  3 no         no        P4Y1M1D   10.3233/d… 0200302030336132… 10.1108… 2019-04…
#>  4 no         no        P4Y5M10D  10.3233/d… 0200302030336132… 10.1108… 2019-08…
#>  5 no         no        P1Y0M14D  10.3233/s… 0200302030336283… 10.1108… 2016-03…
#>  6 no         no        P3Y10M12D 10.3233/s… 0200302030336283… 10.1108… 2019-01…
#>  7 no         no        P3Y6M     10.1142/s… 0200101040236280… 10.1108… 2018-09 
#>  8 no         no        P2Y11M20D 10.7554/e… 0200705050436142… 10.1108… 2018-03…
#>  9 no         no        P0Y       10.3346/j… 0200303040636192… 10.1108… 2015    
#> 10 no         no        P3Y       10.1007/9… 0200100000736090… 10.1108… 2018    
#> # … with 13 more rows
# metadata
oc_coci_meta(doi1)
#> # A tibble: 1 x 13
#>   doi   reference issue source_id citation page  volume author citation_count
#> * <chr> <chr>     <chr> <chr>     <chr>    <chr> <chr>  <chr>  <chr>         
#> 1 10.1… 10.1001/… 2     issn:002… 10.1145… 253-… 71     Peron… 23            
#> # … with 4 more variables: year <chr>, source_title <chr>, title <chr>,
#> #   oa_link <chr>

Meta

  • Please report any issues or bugs
  • License: MIT
  • Get citation information for citecorp in R by running citation(package = 'citecorp')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

citecorp's People

Contributors

sckott, selbosh

citecorp's Issues

Error in doi2ids - arguments imply differing number of rows: 0, 1

I try to convert DOIs to IDs, without success. And I know for a fact that these DOIs are on the Open Citations Corpus, because that's where I found them!

Minimal working example

library(citecorp)
oc_doi2ids('10.1093/biomet/79.3.531')
oc_doi2ids('10.1093/biomet/80.3.527')

Error returned in each case:

Error in data.frame(type = gsub("\\.type", "", names(tmp[, grep("\\.type",  : 
  arguments imply differing number of rows: 0, 1

The problem

This is a common bug due to an unexpected behaviour in how subsetting data frames works. That is, if you subset a data frame and the result is one column, it is automatically collapsed to a vector (not a one-column data frame) unless you specify drop = FALSE.

Here is the culprit
https://github.com/ropenscilabs/citecorp/blob/32929e8a8504652c8f767343166dc1c7ee7f5537/R/oc_lookup.R#L9

And the bug is triggered whenever the preceding tmp variable contains exactly one column with the suffix .type, for example

  paper.type                           paper.value
1        uri https://w3id.org/oc/corpus/br/5902173

because if you run the above code on this, you get character(0) as a result, which is not what you want.
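The collapsing behaviour described above can be reproduced in base R without calling the package. This is a minimal sketch: the one-row data frame mimics the example shown in this issue, and the variable names type_cols, collapsed and kept are illustrative only.

```r
# One-row result with a single ".type" column, as in the example above
tmp <- data.frame(
  paper.type  = "uri",
  paper.value = "https://w3id.org/oc/corpus/br/5902173",
  stringsAsFactors = FALSE
)

type_cols <- grep("\\.type", names(tmp))

# Default subsetting collapses the single column to a bare vector,
# so there are no names left to clean up
collapsed <- tmp[, type_cols]
names(collapsed)
#> NULL
gsub("\\.type", "", names(collapsed))
#> character(0)

# drop = FALSE keeps a one-column data frame, so the column name survives
kept <- tmp[, type_cols, drop = FALSE]
gsub("\\.type", "", names(kept))
#> [1] "paper"
```

The zero-length result is what then clashes with the one-element value column inside data.frame(), which matches the "differing number of rows: 0, 1" message.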

Solution

Whilst I could add drop = FALSE, I would take the opportunity to simplify the code instead. The following works on the examples above.

    tmp <- data.frame(
      type  = gsub('\\.type', '', grep('\\.type', names(tmp), value = TRUE)),
      value = unname(unlist(tmp[, grep('\\.value', names(tmp))])),
      stringsAsFactors = FALSE
    )
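To check the proposed replacement against the failing case, it can be wrapped in a small function and run on the one-row example from this issue. This is only a sketch: the wrapper name reshape_ids is hypothetical and not part of citecorp, and the two-identifier input uses made-up placeholder values.

```r
# Hypothetical wrapper around the replacement proposed in this issue
reshape_ids <- function(tmp) {
  data.frame(
    type  = gsub("\\.type", "", grep("\\.type", names(tmp), value = TRUE)),
    value = unname(unlist(tmp[, grep("\\.value", names(tmp))])),
    stringsAsFactors = FALSE
  )
}

# The case that triggered the error: exactly one ".type" column
one <- data.frame(
  paper.type  = "uri",
  paper.value = "https://w3id.org/oc/corpus/br/5902173",
  stringsAsFactors = FALSE
)
res_one <- reshape_ids(one)

# A case with two identifier pairs (placeholder values, not real records)
two <- data.frame(
  doi.type    = "literal",
  doi.value   = "10.0000/example",
  paper.type  = "uri",
  paper.value = "https://example.org/br/0",
  stringsAsFactors = FALSE
)
res_two <- reshape_ids(two)
```

Because grep(..., value = TRUE) works on the column names directly, the type column no longer depends on how the data frame is subset.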

oc_coci_cites() fails with multiple DOIs

Hi, just starting to play with this package that looks very cool:

library(citecorp)
library(tibble)

pavo1_doi <- "10.1111/2041-210X.12069"
pavo2_doi <- "10.1111/2041-210X.13174"

oc_coci_cites(pavo1_doi)
#> # A tibble: 67 x 7
#>    cited     timespan citing     journal_sc creation oci               author_sc
#>  * <chr>     <chr>    <chr>      <chr>      <chr>    <chr>             <chr>    
#>  1 10.1111/… P2Y2M    10.1111/b… no         2015-09… 0200101010136111… no       
#>  2 10.1111/… P0Y4M    10.1636/b… no         2013-11  0200106030636110… no       
#>  3 10.1111/… P1Y10M   10.1650/c… no         2015-05  0200106050036122… no       
#>  4 10.1111/… P3Y7M    10.1186/s… no         2017-02… 0200101080636280… no       
#>  5 10.1111/… P5Y5M    10.1155/2… no         2018-12… 0200101050536020… no       
#>  6 10.1111/… P5Y4M    10.1111/e… no         2018-11… 0200101010136142… no       
#>  7 10.1111/… P5Y7M    10.1111/e… no         2019-02… 0200101010136142… no       
#>  8 10.1111/… P2Y6M    10.1002/e… no         2016-01… 0200100000236141… no       
#>  9 10.1111/… P3Y10M   10.1101/1… no         2017-05… 0200101000136010… no       
#> 10 10.1111/… P4Y1M    10.1101/1… no         2017-08… 0200101000136010… no       
#> # … with 57 more rows

oc_coci_cites(pavo2_doi)
#> # A tibble: 4 x 7
#>   cited     timespan  citing    journal_sc creation oci                author_sc
#> * <chr>     <chr>     <chr>     <chr>      <chr>    <chr>              <chr>    
#> 1 10.1111/… P0Y4M6D   10.7717/… no         2019-08… 02007070107362514… no       
#> 2 10.1111/… P0Y2M3D   10.1007/… no         2019-06… 02001000007362801… no       
#> 3 10.1111/… -P0Y0M13D 10.1101/… no         2019-03… 02001010001360508… yes      
#> 4 10.1111/… P0Y4M7D   10.1101/… no         2019-08… 02001010001360703… no

oc_coci_cites(c(pavo1_doi, pavo2_doi))
#> # A tibble: 0 x 0

Created on 2020-04-08 by the reprex package (v0.3.0)

According to the documentation, it should work:

doi (character) one or more Digital Object Identifiers

but maybe this just applies to oc_coci_meta() (which does work with multiple DOIs) and not oc_coci_cites()?
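As a possible workaround while the vectorised call returns an empty tibble, the function can be called once per DOI and the results bound together. This is a sketch under assumptions: the helper name cites_for_each and its fetch argument are hypothetical, and it assumes every single-DOI call returns a data frame with the same columns, as in the output above.

```r
# Hypothetical helper: one request per DOI, results stacked row-wise.
# `fetch` defaults to citecorp::oc_coci_cites but can be swapped for testing.
cites_for_each <- function(dois, fetch = citecorp::oc_coci_cites) {
  results <- lapply(dois, fetch)
  # Drop empty results so rbind() only sees data frames with matching columns
  results <- Filter(function(x) nrow(x) > 0, results)
  if (length(results) == 0) {
    return(data.frame())
  }
  do.call(rbind, results)
}

# Intended use (requires citecorp and network access):
# cites_for_each(c(pavo1_doi, pavo2_doi))
```

The cited column of the combined result identifies which input DOI each row belongs to.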

Maintenance status / help needed?

👋 @Selbosh!

Do you still intend to become this package's maintainer?

If so do you need any help? For instance an aspect where you'd appreciate some tips, contributions, a PR review? Do you need an invitation to our friendly Slack workspace?

Package not working anymore?

It seems that citecorp is not working anymore?

See transcript below.

Rainer

> library("citecorp")
> oc_doi2ids("10.1097/igc.0000000000000609")
data frame with 0 columns and 0 rows
> devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       macOS Ventura 13.4.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Zurich
 date     2023-06-26
 pandoc   3.1.3 @ /opt/homebrew/bin/pandoc

─ Packages ──────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cachem        1.0.7   2023-02-24 [1] CRAN (R 4.3.0)
 callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
 citecorp    * 0.3.0   2020-04-16 [1] CRAN (R 4.3.0)
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
 crul          1.4.0   2023-05-17 [1] CRAN (R 4.3.0)
 curl          5.0.1   2023-06-07 [1] CRAN (R 4.3.0)
 data.table    1.14.8  2023-02-17 [1] CRAN (R 4.3.0)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
 digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 fauxpas       0.5.2   2023-05-03 [1] CRAN (R 4.3.0)
 fs            1.6.1   2023-02-06 [1] CRAN (R 4.3.0)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.3.0)
 httpuv        1.6.9   2023-02-14 [1] CRAN (R 4.3.0)
 jsonlite      1.8.5   2023-06-05 [1] CRAN (R 4.3.0)
 later         1.3.0   2021-08-18 [1] CRAN (R 4.3.0)
 lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
 pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.3.0)
 pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.3.0)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.3.0)
 processx      3.8.1   2023-04-18 [1] CRAN (R 4.3.0)
 profvis       0.3.7   2020-11-02 [1] CRAN (R 4.3.0)
 promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
 ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
 purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
 remotes       2.4.2   2021-11-30 [1] CRAN (R 4.3.0)
 rlang         1.1.0   2023-03-14 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 shiny         1.7.4   2022-12-15 [1] CRAN (R 4.3.0)
 stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
 stringr       1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
 triebeard     0.4.1   2023-03-04 [1] CRAN (R 4.3.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
 urltools      1.7.3   2019-04-14 [1] CRAN (R 4.3.0)
 usethis       2.1.6   2022-05-25 [1] CRAN (R 4.3.0)
 vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
 whisker       0.4.1   2022-12-05 [1] CRAN (R 4.3.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)

 [1] /Users/rainerkrug/R/library/aarch64-apple-darwin20/4.3
 [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

─────────────────────────────────────────────────────────────────────────────────────────
>

Make sure examples run only if the service is okay

via cran checks

  > if (crul::ok('http://opencitations.net/sparql')) {
    + oc_doi2ids("10.1097/igc.0000000000000609")
... removed
    Error: No description found for code: 520

and

 > if (crul::ok("http://opencitations.net/index/coci/api/v1")) {
    + # references
    + oc_coci_refs(doi1)
... removed
    Error in fauxpas::find_error_class(x$status_code) :
     no method found for 520
    Calls: oc_coci_refs -> oc_coci_stub -> oc_GET -> errs -> <Anonymous>
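In the second traceback the error is raised inside oc_coci_refs(), after the crul::ok() guard has already passed, so checking the URL alone may not be enough. One possible extra safeguard is to wrap the example call itself in tryCatch(). This is a sketch only; the helper name run_if_available is hypothetical and not part of citecorp or crul.

```r
# Hypothetical helper: run a call, but turn any error into a message and NULL
# so that an unavailable service does not fail the example
run_if_available <- function(fun, ...) {
  tryCatch(
    fun(...),
    error = function(e) {
      message("Skipping example: ", conditionMessage(e))
      NULL
    }
  )
}

# Intended use in an example (requires citecorp and network access):
# run_if_available(citecorp::oc_coci_refs, "10.1108/jd-12-2013-0166")
```

The trade-off is that genuine bugs in the wrapped function are also silenced, so this suits examples better than tests.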

Paginate oc_coci_meta() or warn when there are too many DOIs?

Thanks for this great package!

I am trying to scrape and augment the citations of a bunch of articles, so I am obtaining them with oc_coci_cites and then passing the result into oc_coci_meta. However, with more than about 120 citations, it fails after quite a long time with Request Header Fields Too Large (HTTP 431).

Maybe oc_coci_meta could split the request automatically when too many DOIs are requested? Or alternatively, just issue an explicit warning when there are more than, say, 100? As it stands, it took me rather long to figure out what the problem was (even though the error is already rather suggestive in hindsight).
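The splitting suggested here can also be done on the caller's side. The following is a minimal sketch, not the package's behaviour: the helper name meta_in_batches, its fetch argument and the default batch size of 50 are all assumptions, and it assumes each batch returns a data frame with the same columns.

```r
# Hypothetical helper: request metadata in batches and stack the results.
# `fetch` defaults to citecorp::oc_coci_meta but can be swapped for testing.
meta_in_batches <- function(dois, batch_size = 50, fetch = citecorp::oc_coci_meta) {
  stopifnot(batch_size >= 1)
  if (length(dois) == 0) {
    return(data.frame())
  }
  # Assign each DOI a batch number: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(dois) / batch_size)
  batches <- split(dois, batch_id)
  # One request per batch keeps each request smaller than the one that failed
  results <- unname(lapply(batches, fetch))
  do.call(rbind, results)
}

# Intended use (requires citecorp and network access):
# cites <- citecorp::oc_coci_cites(doi1)
# meta_in_batches(cites$citing)
```

A batch size well under the roughly 120 DOIs that triggered the error leaves some margin, since DOI lengths vary.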
