ropensci / citecorp
Client for the Open Citations Corpus
Home Page: https://docs.ropensci.org/citecorp
License: Other
@Selbosh!
Do you still intend to become this package's maintainer?
If so do you need any help? For instance an aspect where you'd appreciate some tips, contributions, a PR review? Do you need an invitation to our friendly Slack workspace?
It seems that citecorp is not working anymore? See the transcript below.
Rainer
> library("citecorp")
> oc_doi2ids("10.1097/igc.0000000000000609")
data frame with 0 columns and 0 rows
> devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Ventura 13.4.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Zurich
date 2023-06-26
pandoc 3.1.3 @ /opt/homebrew/bin/pandoc
─ Packages ──────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cachem 1.0.7 2023-02-24 [1] CRAN (R 4.3.0)
callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0)
citecorp * 0.3.0 2020-04-16 [1] CRAN (R 4.3.0)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
crul 1.4.0 2023-05-17 [1] CRAN (R 4.3.0)
curl 5.0.1 2023-06-07 [1] CRAN (R 4.3.0)
data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0)
digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
fauxpas 0.5.2 2023-05-03 [1] CRAN (R 4.3.0)
fs 1.6.1 2023-02-06 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
httpcode 0.3.0 2020-04-10 [1] CRAN (R 4.3.0)
httpuv 1.6.9 2023-02-14 [1] CRAN (R 4.3.0)
jsonlite 1.8.5 2023-06-05 [1] CRAN (R 4.3.0)
later 1.3.0 2021-08-18 [1] CRAN (R 4.3.0)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.3.0)
pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.3.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.0)
processx 3.8.1 2023-04-18 [1] CRAN (R 4.3.0)
profvis 0.3.7 2020-11-02 [1] CRAN (R 4.3.0)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0)
purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.3.0)
rlang 1.1.0 2023-03-14 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
shiny 1.7.4 2022-12-15 [1] CRAN (R 4.3.0)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
triebeard 0.4.1 2023-03-04 [1] CRAN (R 4.3.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
urltools 1.7.3 2019-04-14 [1] CRAN (R 4.3.0)
usethis 2.1.6 2022-05-25 [1] CRAN (R 4.3.0)
vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0)
whisker 0.4.1 2022-12-05 [1] CRAN (R 4.3.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
[1] /Users/rainerkrug/R/library/aarch64-apple-darwin20/4.3
[2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
─────────────────────────────────────────────────────────────────────────
>
Hi, just starting to play with this package that looks very cool:
library(citecorp)
library(tibble)
pavo1_doi <- "10.1111/2041-210X.12069"
pavo2_doi <- "10.1111/2041-210X.13174"
oc_coci_cites(pavo1_doi)
#> # A tibble: 67 x 7
#> cited timespan citing journal_sc creation oci author_sc
#> * <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 10.1111/… P2Y2M 10.1111/b… no 2015-09… 0200101010136111… no
#> 2 10.1111/… P0Y4M 10.1636/b… no 2013-11 0200106030636110… no
#> 3 10.1111/… P1Y10M 10.1650/c… no 2015-05 0200106050036122… no
#> 4 10.1111/… P3Y7M 10.1186/s… no 2017-02… 0200101080636280… no
#> 5 10.1111/… P5Y5M 10.1155/2… no 2018-12… 0200101050536020… no
#> 6 10.1111/… P5Y4M 10.1111/e… no 2018-11… 0200101010136142… no
#> 7 10.1111/… P5Y7M 10.1111/e… no 2019-02… 0200101010136142… no
#> 8 10.1111/… P2Y6M 10.1002/e… no 2016-01… 0200100000236141… no
#> 9 10.1111/… P3Y10M 10.1101/1… no 2017-05… 0200101000136010… no
#> 10 10.1111/… P4Y1M 10.1101/1… no 2017-08… 0200101000136010… no
#> # … with 57 more rows
oc_coci_cites(pavo2_doi)
#> # A tibble: 4 x 7
#> cited timespan citing journal_sc creation oci author_sc
#> * <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 10.1111/… P0Y4M6D 10.7717/… no 2019-08… 02007070107362514… no
#> 2 10.1111/… P0Y2M3D 10.1007/… no 2019-06… 02001000007362801… no
#> 3 10.1111/… -P0Y0M13D 10.1101/… no 2019-03… 02001010001360508… yes
#> 4 10.1111/… P0Y4M7D 10.1101/… no 2019-08… 02001010001360703… no
oc_coci_cites(c(pavo1_doi, pavo2_doi))
#> # A tibble: 0 x 0
Created on 2020-04-08 by the reprex package (v0.3.0)
According to the documentation, it should work:
doi (character) one or more Digital Object Identifiers
but maybe this just applies to oc_coci_meta() (which does work with multiple DOIs) and not oc_coci_cites()?
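Until oc_coci_cites() accepts several DOIs, one possible workaround is to query each DOI on its own and stack the results. The sketch below assumes that single-DOI calls behave as in the reprex above. coci_cites_each and its fetch argument are hypothetical names introduced here for illustration, not part of citecorp.

```r
# Hypothetical helper, not part of citecorp: look up each DOI separately
# and stack whatever comes back.
coci_cites_each <- function(dois, fetch = citecorp::oc_coci_cites) {
  pieces <- lapply(unique(dois), fetch)
  # Drop empty results so a DOI without recorded citations cannot break rbind()
  pieces <- Filter(function(x) nrow(x) > 0, pieces)
  if (length(pieces) == 0) return(data.frame())
  do.call(rbind, pieces)
}
```

With the DOIs from the reprex, this would be called as coci_cites_each(c(pavo1_doi, pavo2_doi)). It issues one request per DOI, so it is slower than a true multi-DOI call would be.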
For oc_doi2ids, oc_pmid2ids, and oc_pmcid2ids
The examples in the documentation for oc_pmid2ids and oc_pmcid2ids no longer seem to work, not even if you paste the corresponding queries into the SPARQL sandbox on the OC website. They possibly need redoing.
Originally posted by @Selbosh in #10 (comment)
Thanks for this great package!
I am trying to scrape and augment the citations of a bunch of articles, so I am obtaining them with oc_coci_cites and then passing the result into oc_coci_meta. However, with more than about 120 citations, it fails after quite a long time with Request Header Fields Too Large (HTTP 431).
Maybe oc_coci_meta could split the request automatically when too many DOIs are requested? Or alternatively, just issue an explicit warning when there are more than, say, 100? As it stands, it took me rather long to figure out what the problem was (even though the error is already rather suggestive in hindsight).
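The splitting suggested above can also be done on the caller's side. This is a minimal sketch, assuming that oc_coci_meta returns a data frame with the same columns for every batch. coci_meta_chunked, chunk_size and fetch are hypothetical names, and the default of 50 DOIs per request is a guess that sits below the roughly 120 reported to fail, not a documented limit.

```r
# Hypothetical helper, not part of citecorp: request metadata in batches
# so that no single request carries too many DOIs.
coci_meta_chunked <- function(dois, chunk_size = 50,
                              fetch = citecorp::oc_coci_meta) {
  dois <- unique(dois)
  if (length(dois) == 0) return(data.frame())
  # Assign consecutive DOIs to groups of at most chunk_size elements
  groups <- ceiling(seq_along(dois) / chunk_size)
  chunks <- unname(split(dois, groups))
  pieces <- lapply(chunks, fetch)
  # Assumes every batch comes back with identical columns
  do.call(rbind, pieces)
}
```

A call such as coci_meta_chunked(cites$citing), where cites is the result of oc_coci_cites, would then replace the single large request.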
via cran checks
> if (crul::ok('http://opencitations.net/sparql')) {
+ oc_doi2ids("10.1097/igc.0000000000000609")
... removed
Error: No description found for code: 520
and
> if (crul::ok("http://opencitations.net/index/coci/api/v1")) {
+ # references
+ oc_coci_refs(doi1)
... removed
Error in fauxpas::find_error_class(x$status_code) :
no method found for 520
Calls: oc_coci_refs -> oc_coci_stub -> oc_GET -> errs -> <Anonymous>
I am trying to convert DOIs to IDs, without success. And I know for a fact that these DOIs are in the Open Citations Corpus, because that's where I found them!
library(citecorp)
oc_doi2ids('10.1093/biomet/79.3.531')
oc_doi2ids('10.1093/biomet/80.3.527')
Error returned in each case:
Error in data.frame(type = gsub("\\.type", "", names(tmp[, grep("\\.type", :
arguments imply differing number of rows: 0, 1
This is a common bug caused by an unexpected behaviour of data frame subsetting. That is, if you subset a data frame and the result is one column, it is automatically collapsed to a vector (not a one-column data frame) unless you specify drop = FALSE.
Here is the culprit
https://github.com/ropenscilabs/citecorp/blob/32929e8a8504652c8f767343166dc1c7ee7f5537/R/oc_lookup.R#L9
And the bug is triggered whenever the preceding tmp variable contains exactly one column with the suffix .type, for example
paper.type paper.value
1 uri https://w3id.org/oc/corpus/br/5902173
because if you run the above code on this, you get character(0) as a result, which is not what you want.
Whilst I could add , drop = FALSE, I would take the opportunity to simplify the code instead. The following works on the examples above.
tmp <- data.frame(
type = gsub('\\.type', '', grep('\\.type', names(tmp), value = TRUE)),
value = unname(unlist(tmp[, grep('\\.value', names(tmp))])),
stringsAsFactors = FALSE
)
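To make the difference concrete, here is a small self-contained sketch that rebuilds the one-pair tmp example shown earlier and compares the three variants. The names type_cols, collapsed, kept and simplified are introduced only for this illustration.

```r
# One .type/.value pair, as in the failing lookups
tmp <- data.frame(
  paper.type = "uri",
  paper.value = "https://w3id.org/oc/corpus/br/5902173",
  stringsAsFactors = FALSE
)
type_cols <- grep("\\.type", names(tmp))

# A single-column subset collapses to a character vector, so names() is NULL
collapsed <- names(tmp[, type_cols])

# drop = FALSE keeps the one-column data frame and therefore its column name
kept <- names(tmp[, type_cols, drop = FALSE])

# Taking the names straight from tmp avoids the subsetting step entirely
simplified <- data.frame(
  type = gsub("\\.type", "", grep("\\.type", names(tmp), value = TRUE)),
  value = unname(unlist(tmp[, grep("\\.value", names(tmp))])),
  stringsAsFactors = FALSE
)
```

Here collapsed is NULL, which is why gsub() returns character(0) and data.frame() complains about differing numbers of rows. kept holds the column name "paper.type", and simplified is a one-row data frame.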