biocpkgtools's Introduction

BiocPkgTools

Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R in a tidy data format. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.

Functionality includes access to computable versions of:

  • Download statistics
  • Detailed package information
  • Package dependencies (and reverse dependencies)
  • Package BiocViews categories
  • Build reports
  • Vignettes (including examples of text mining)

biocpkgtools's People

Contributors

csoneson, felixernst, grimbough, hpages, jwokaty, link-ny, lshep, mtmorgan, nturaga, rcastelo, seandavi, shians, vjcitn, vobencha

biocpkgtools's Issues

Error when stats page is not available

Many thanks for this great package. I found a problem with biocDownloadStats:

BiocManager::valid()
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#>     BioCsoft: https://bioconductor.org/packages/3.18/bioc
#>     CRAN: https://ftp.cixug.es/CRAN
#> [1] TRUE
BiocManager::version()
#> [1] '3.18'
packageVersion("BiocPkgTools")
#> [1] '1.20.0'
BiocPkgTools::biocDownloadStats()
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#>     BioCsoft: https://bioconductor.org/packages/3.18/bioc
#>     CRAN: https://ftp.cixug.es/CRAN
#> Error in if (identical(nrow(bquery), 1L) && bfcneedsupdate(bfc, bquery[["rid"]])) tryCatch({: missing value where TRUE/FALSE needed

Created on 2024-01-22 with reprex v2.1.0

It fails because https://bioconductor.org/packages/stats/data-experiment/experiment_pkg_stats.tab is not (currently) available.
I am not sure if this is due to the recent problems with the Bioconductor stats or something else.
But in these cases bfcneedsupdate(bfc, bquery[["rid"]]) from .cache_read returns NA in the condition.

I am not sure if I should report that to BiocFileCache.
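
The failure mode described above can be reproduced without network access, because `if` stops whenever its condition evaluates to `NA`. The following is a minimal sketch, assuming a hypothetical stand-in `needs_update()` that returns `NA` in the way `bfcneedsupdate()` is reported to do here. Wrapping the check in `isTRUE()` is one possible guard, not necessarily the fix the package adopted.

```r
# Hypothetical stand-in for a check that cannot reach the remote file.
needs_update <- function() NA

one_row <- TRUE  # stands in for identical(nrow(bquery), 1L)

# The unguarded condition evaluates to NA, which `if` rejects with an error.
unguarded <- tryCatch(
    if (one_row && needs_update()) "refresh" else "use cache",
    error = function(e) conditionMessage(e)
)

# isTRUE() maps NA to FALSE, so the cached copy is used instead.
guarded <- if (one_row && isTRUE(needs_update())) "refresh" else "use cache"
```

With this guard an unreachable stats page falls back to whatever is already cached; whether that is the desired behaviour is a design decision for the maintainers.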

Push to Bioconductor?

@vjcitn, @lshep, do the two of you think it is worth pushing this into Bioconductor as a package? Right now, it is "unofficial", but I think it could be useful and might get a little input from other users.

edges function

@Shians @seandavi
Can you specify what edges function you are using?
Thanks,
Marcel

library(BiocPkgTools)
#> Loading required package: htmlwidgets
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
ll <- biocPkgList()
#> Error in edges(biocViewsTC): could not find function "edges"
sessionInfo()
#> R version 3.6.0 RC (2019-04-19 r76406)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] BiocPkgTools_1.1.9 htmlwidgets_1.3   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1          compiler_3.6.0      pillar_1.3.1       
#>  [4] BiocManager_1.30.4  highr_0.8           tools_3.6.0        
#>  [7] digest_0.6.18       jsonlite_1.6        evaluate_0.13      
#> [10] tibble_2.1.1        pkgconfig_2.0.2     rlang_0.3.4        
#> [13] graph_1.61.1        rex_1.1.2           igraph_1.2.4.1     
#> [16] yaml_2.2.0          parallel_3.6.0      xfun_0.6           
#> [19] dplyr_0.8.0.1       stringr_1.4.0       httr_1.4.0         
#> [22] xml2_1.2.0          knitr_1.22          hms_0.4.2          
#> [25] stats4_3.6.0        DT_0.5              tidyselect_0.2.5   
#> [28] glue_1.3.1          R6_2.4.0            gh_1.0.1           
#> [31] RBGL_1.59.5         rmarkdown_1.12      tidyr_0.8.3        
#> [34] purrr_0.3.2         readr_1.3.1         magrittr_1.5       
#> [37] htmltools_0.3.6     BiocGenerics_0.29.2 assertthat_0.2.1   
#> [40] rvest_0.3.3         stringi_1.4.3       crayon_1.3.4

Created on 2019-04-22 by the reprex package (v0.2.1)

biocBuildReport("3.9") fails

biocBuildReport("3.9")
Error in [[<-.data.frame(*tmp*, "bioc_version", value = "3.9") :
replacement has 1 row, data has 0

Enter a frame number, or 0 to exit

1: biocBuildReport("3.9")
2: [[<-(*tmp*, "bioc_version", value = "3.9")
3: [[<-.data.frame(*tmp*, "bioc_version", value = "3.9")

works for "3.8"

R Under development (unstable) (2018-11-16 r75612)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocPkgTools_1.1.3 htmlwidgets_1.3    rmarkdown_1.11    

loaded via a namespace (and not attached):
 [1] igraph_1.2.2       rex_1.1.2          Rcpp_1.0.0         knitr_1.21        
 [5] bindr_0.1.1        xml2_1.2.0         magrittr_1.5       hms_0.4.2         
 [9] rvest_0.3.2        tidyselect_0.2.5   R6_2.3.0           rlang_0.3.1       
[13] stringr_1.3.1      httr_1.4.0         dplyr_0.7.8        tools_3.6.0       
[17] DT_0.5             xfun_0.4           htmltools_0.3.6    lazyeval_0.2.1    
[21] digest_0.6.18      assertthat_0.2.0   tibble_2.0.1       crayon_1.3.4      
[25] startup_0.11.0     bindrcpp_0.2.2     tidyr_0.8.2        readr_1.3.1       
[29] purrr_0.2.5        BiocManager_1.30.4 curl_3.2           glue_1.3.0        
[33] evaluate_0.12      stringi_1.2.4      compiler_3.6.0     pillar_1.3.1      
[37] jsonlite_1.6       pkgconfig_2.0.2   
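
The error message above suggests that a length-one value is being assigned to a data frame with zero rows. The following sketch reproduces that situation in base R; the `report` object and its `pkg` column are made up for illustration, and recycling the value to `nrow()` is only one possible way to avoid the error.

```r
# An empty result, as might come back when a report page cannot be parsed.
report <- data.frame(pkg = character(0), stringsAsFactors = FALSE)

# Assigning a length-1 value to a zero-row data frame is an error.
msg <- tryCatch(
    {
        report[["bioc_version"]] <- "3.9"
        "ok"
    },
    error = function(e) conditionMessage(e)
)

# Recycling the value to the number of rows works for any row count.
report[["bioc_version"]] <- rep("3.9", nrow(report))
```

This only hides the symptom; an empty table for "3.9" would still need to be explained separately.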

Author names a bit of a mess

Unfortunately the authors field is a bit of a Wild West in terms of what people put in there. At the moment the Author field is a list of character vectors, which is the way to go; however, each token in the list should try to represent a single author. This is currently not the case, since tokens are split by "\\s?,\\s?", which gets tripped up by [aut, ctb] or the word "and", and retains all sorts of weird things like emails, ORCID profile links and whatever else people have decided to insert.

This leads to a curious lineup of Bioconductor's most prolific authors.

library(BiocPkgTools)

x <- biocPkgList()

x$Author %>%
    unlist() %>%
    table() %>%
    sort(decreasing = TRUE) %>%
    head()

## cre] aut] cph]  cre ths] ctb] 
## 298   39   27   26   21   16 
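
The stray tokens in the table above can be reproduced on a single string. This is a small base R sketch using a made-up Author field; it shows how splitting on commas cuts through a role annotation, and how removing bracketed roles first avoids that for simple cases.

```r
# Made-up Author field in the style found in DESCRIPTION files.
author <- "Jane Doe [aut, cre], John Roe [ctb]"

# Splitting on commas alone cuts through the role annotation.
naive <- strsplit(author, "\\s?,\\s?", perl = TRUE)[[1]]
naive
# "Jane Doe [aut" "cre]" "John Roe [ctb]"

# Removing bracketed roles before splitting keeps one token per person.
no_roles <- gsub("\\[.*?\\]", "", author, perl = TRUE)
cleaned <- trimws(strsplit(no_roles, ",", fixed = TRUE)[[1]])
cleaned
# "Jane Doe" "John Roe"
```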

I'm hoping to clean this up a bit with an upcoming pull request. It's essentially impossible to do this well in general, since there are packages that have decided to split the author list by periods and many contain sentences explaining an author's contribution. But at the very least it's not too hard to take care of the simple cases. At the moment this is my prototype:

library(BiocPkgTools)
library(magrittr)
library(purrr)
library(stringr)

x <- biocPkgList()

# reassemble authors for re-processing
authors <- x$Author %>%
    map_chr(function(x) paste(x, collapse = ", "))

authors_cleaned <- authors %>%
    str_replace_all("\n", " ") %>%
    str_remove_all("\\[.*?\\]") %>%
    str_remove_all("<.*?>") %>%
    str_remove_all("\\(.*?\\)") %>%
    str_squish() %>%
    str_replace_all("\\w* contributions ?\\w*", ", ") %>%
    str_replace_all("\\sand\\s", ", ") %>%
    str_replace_all(",\\s+,", ",") %>%
    str_replace_all(",+", ",")

authors_cleaned %>%
    sample(200)

authors_cleaned will behave a bit better with str_split

authors_list <- map(
    authors_cleaned,
    function(x) {
        str_trim(str_split(x, ",", simplify = TRUE))
    }
)

authors_list %>% unlist() %>% table() %>% sort(decreasing = TRUE) %>% head()
##   Martin Morgan   Wolfgang Huber Robert Gentleman    Laurent Gatto     R. Gentleman      Hervé Pagès 
##              31               21               20               18               17               16 

Cache can lead to stale results

I don't have an example, but I was seeing 0's for downloads for recent months and then realized I needed to remove the rows in BiocFileCache (manually with bfcremove) in order to re-download the stats.

build_report returns an empty table

Don't know what's funky about my (current) connection or setting, but pkgnames at https://github.com/seandavi/BiocPkgTools/blob/master/R/build_status.R#L46 looks like

Browse[2]> pkgnames[1]
[1] "a4 1.26.0Tobias VerbekeLast Commit: e6af2cbLast Changed Date: 2017-10-30 12:39:33 -0500"

(no spaces between fields) so the regex parsing doesn't work. A solution (and better practice anyway?) would be to extract each element using xpath queries, e.g.,

  xpath = '/html/body/table[@class="mainrep"]/tr/td[@rowspan="3"]/b/a'
  pkgnames = html_text(html_nodes(dat, xpath=xpath))
> sessionInfo()
R Under development (unstable) (2017-11-06 r73681)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /home/mtmorgan/bin/R-devel/lib/libRblas.so
LAPACK: /home/mtmorgan/bin/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocPkgTools_0.1.5   BiocInstaller_1.29.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13     dplyr_0.7.4      assertthat_0.2.0 R6_2.2.2        
 [5] magrittr_1.5     httr_1.3.1       stringi_1.1.6    rlang_0.1.4     
 [9] bindrcpp_0.2     rex_1.1.2        xml2_1.1.1       tools_3.5.0     
[13] stringr_1.2.0    readr_1.1.1      glue_1.2.0       hms_0.3         
[17] compiler_3.5.0   pkgconfig_2.0.1  rvest_0.3.2      bindr_0.1       
[21] tibble_1.3.4    

question about pubyear

Just thinking ahead... right now the pubyear is 2017. When we change to 2018, will all the DOIs be regenerated for pubyear 2018, or will the ones already generated for 2017 be skipped?

Could problemPage also check workflow packages?

Hi,

Would it be possible for problemPage() to also check workflow packages?

As of today (2018-11-16), problemPage() doesn't detect any errors with my packages, but I just got an email from @lshep about the recountWorkflow package that I maintain:

Hello workflow maintainer, 


Your current workflow is failing in Bioconductor 

http://bioconductor.org/checkResults/3.9/workflows-LATEST/recountWorkflow/malbec2-buildsrc.html


Please investigate this ERROR.  

Thanks,
Leo

Info today

> library('BiocPkgTools')
Loading required package: htmlwidgets
> problemPage('Collado', ver = '3.8')
Error in problemPage("Collado", ver = "3.8") : all packages fine
> problemPage('Collado', ver = '3.9')
Error in problemPage("Collado", ver = "3.9") : all packages fine
> options(width = 120)
> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.1 (2018-07-02)
 os       macOS Mojave 10.14.1
 system   x86_64, darwin15.6.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2018-11-16

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version date       lib source
 assertthat     0.2.0   2017-04-11 [1] CRAN (R 3.5.0)
 bindr          0.1.1   2018-03-13 [1] CRAN (R 3.5.0)
 bindrcpp       0.2.2   2018-03-29 [1] CRAN (R 3.5.0)
 BiocManager    1.30.3  2018-10-10 [1] CRAN (R 3.5.0)
 BiocPkgTools * 1.0.1   2018-11-05 [1] Bioconductor
 cli            1.0.1   2018-09-25 [1] CRAN (R 3.5.0)
 colorout     * 1.2-0   2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.5.0)
 curl           3.2     2018-03-28 [1] CRAN (R 3.5.0)
 digest         0.6.18  2018-10-10 [1] CRAN (R 3.5.0)
 dplyr          0.7.8   2018-11-10 [1] CRAN (R 3.5.0)
 DT             0.5     2018-11-05 [1] CRAN (R 3.5.0)
 glue           1.3.0   2018-07-17 [1] CRAN (R 3.5.0)
 hms            0.4.2   2018-03-10 [1] CRAN (R 3.5.0)
 htmltools      0.3.6   2017-04-28 [1] CRAN (R 3.5.0)
 htmlwidgets  * 1.3     2018-09-30 [1] CRAN (R 3.5.0)
 httr           1.3.1   2017-08-20 [1] CRAN (R 3.5.0)
 igraph         1.2.2   2018-07-27 [1] CRAN (R 3.5.0)
 jsonlite       1.5     2017-06-01 [1] CRAN (R 3.5.0)
 lazyeval       0.2.1   2017-10-29 [1] CRAN (R 3.5.0)
 magrittr       1.5     2014-11-22 [1] CRAN (R 3.5.0)
 pillar         1.3.0   2018-07-14 [1] CRAN (R 3.5.0)
 pkgconfig      2.0.2   2018-08-16 [1] CRAN (R 3.5.0)
 purrr          0.2.5   2018-05-29 [1] CRAN (R 3.5.0)
 R6             2.3.0   2018-10-04 [1] CRAN (R 3.5.0)
 Rcpp           1.0.0   2018-11-07 [1] CRAN (R 3.5.0)
 readr          1.1.1   2017-05-16 [1] CRAN (R 3.5.0)
 rex            1.1.2   2017-10-19 [1] CRAN (R 3.5.0)
 rlang          0.3.0.1 2018-10-25 [1] CRAN (R 3.5.0)
 rvest          0.3.2   2016-06-17 [1] CRAN (R 3.5.0)
 sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.5.0)
 stringi        1.2.4   2018-07-20 [1] CRAN (R 3.5.0)
 stringr        1.3.1   2018-05-10 [1] CRAN (R 3.5.0)
 tibble         1.4.2   2018-01-22 [1] CRAN (R 3.5.0)
 tidyr          0.8.2   2018-10-28 [1] CRAN (R 3.5.0)
 tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.5.0)
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.5.0)
 xml2           1.2.0   2018-01-24 [1] CRAN (R 3.5.0)

[1] /Library/Frameworks/R.framework/Versions/3.5devel/Resources/library
>

Use of deprecated symbols from igraph

I am writing on behalf of the igraph project. This package uses the from / to symbols when indexing vertex or edge sequences. These names have been soft-deprecated for a long time, and the next version of igraph (to be released soon) will issue an explicit deprecation warning for their use. To fix this, replace them with .from / .to.

unable to run buildPkgDependencyDataFrame

I'm having an issue when building the vignette during R CMD build BiocPkgTools:

```{r}
library(BiocPkgTools)
depdf <- buildPkgDependencyDataFrame(repo=c("BioCsoft", "CRAN"),
dependencies=c("Depends", "Imports"))
depdf
```

I'm using this version of R:

sessionInfo()
R Under development (unstable) (2020-01-03 r77629)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.10

Error

Quitting from lines 274-279 (BiocPkgTools.Rmd) 
Error: processing vignette 'BiocPkgTools.Rmd' failed with diagnostics:
ReadItem: unknown type 50, perhaps written by later version of R
--- failed re-building BiocPkgTools.Rmd

SUMMARY: processing the following file failed:
  BiocPkgTools.Rmd

Error: Vignette re-building failed.
Execution halted

Perhaps this is an issue with a serialized dataset that is too old for R devel?
See here:

cranFileUrl <- sprintf("%s/web/packages/packages.rds",
repos[r])

@seandavi @rcastelo

mystery attribution of package "authorship" to me

Browse[5]> y[grep("arey", y$author),]
                 pkg        author version last_commit   last_changed_date
26        affyContam      V. Carey  1.50.0     bb3b0e7 2021-05-19 11:42:08
74      AnVILBilling   Vince Carey   1.2.0     c339a21 2021-05-19 12:57:15
83        arrayMvout      V. Carey  1.50.0     c59319e 2021-05-19 11:42:11
166       BiocOncoTK      VJ Carey  1.12.1     5b4a389 2021-06-28 15:44:05
172      BiocSklearn   Vince Carey  1.14.1     df73d5c 2021-07-28 15:03:53
592    erccdashboard      VJ Carey  1.26.0     cea65f1 2021-05-19 12:06:57
832       GWAS.BAYES      VJ Carey   1.2.0     008101f 2021-05-19 12:57:41
869             HiTC      VJ Carey  1.36.0     012dec8 2021-05-19 11:52:11
879           HubPub      VJ Carey   1.0.0     52515fb 2021-05-19 13:03:37
960             IVAS      VJ Carey  2.12.0     72bbdb6 2021-05-19 12:10:52
968        KEGGlincs      VJ Carey  1.18.0     899d372 2021-05-19 12:23:35
977              LBE      VJ Carey  1.60.0     7b28f58 2021-05-19 11:38:52
1161        mixOmics      V. Carey  6.16.3     759d581 2021-07-27 22:15:58
1310       onlineFDR      VJ Carey   2.0.0     94f9a83 2021-05-19 12:36:15
1343 PanVizGenerator      VJ Carey  1.20.0     8eb2920 2021-05-19 12:17:29
1344         parglms   Vince Carey  1.24.1     0ffcdf6 2021-07-28 17:54:45
1350       PathoStat   Vince Carey  1.18.0     9d6981f 2021-05-19 12:23:43
1410          podkat      VJ Carey  1.24.0     01fa5e3 2021-05-19 12:12:22
1504           rawrr   Vince Carey   1.0.2     be251cd 2021-06-17 06:53:42
1573           rhdf5 Vincent Carey  2.36.0     4dc527f 2021-05-19 11:51:13
1616            roar   Vince Carey  1.28.0     34c7fa7 2021-05-19 12:02:13
1849           ssrch      VJ Carey   1.8.1     104d1ae 2021-07-28 10:35:44
1904       tenXplore      VJ Carey  1.14.1     3c0a0bd 2021-07-28 05:09:43
1911         TFutils Vincent Carey  1.12.2     df013ae 2021-08-03 17:29:35
2012          vtpnet      VJ Carey  0.32.0     adfd187 2021-05-19 11:59:05

GWAS.BAYES, roar, rhdf5, PathoStat and others are ... not "mine"??

Testing new DOI creation by Bioc core team

Hi, @lshep.

library(BiocPkgTools)
pl = getBiocPkgList()
z = mapply(FUN = generateBiocPkgDOI,pl$Package,pl$Author,2017,testing=TRUE)

Run it with testing=TRUE. Then, you can login to https://ezid.lib.purdue.edu/ with the username/password combo of apitest/apitest to list the DOIs that are created. You can get a sense of how things go for you.

Ideally, when running for "real", DOIs are updated if they exist and created if not. I need to give you the credentials for running for real, which we can do offline.

firstInBioc(): Error in desc(Date) : could not find function "desc"

Issue

> library(BiocPkgTools)
Loading required package: htmlwidgets
> dlstats <- biocDownloadStats()
> firstInBioc(dlstats)
Error in desc(Date) : could not find function "desc"

Workaround

This is probably because dplyr::desc is not imported into the package namespace, since defining it manually works:

> desc <- dplyr::desc
> firstInBioc(dlstats)
# A tibble: 4,997 x 7
# Groups:   Package [4,980]
   Package        Year Month Nb_of_distinct_I… Nb_of_downloads repo   Date      
   <chr>         <int> <chr>             <int>           <int> <chr>  <date>    
 1 ABarray        2009 Jan                 254             367 Softw… 2009-01-01
 2 ACME           2009 Jan                 169             270 Softw… 2009-01-01
 3 AffyCompatib…  2009 Jan                 195             287 Softw… 2009-01-01
 4 AffyExpress    2009 Jan                 176             248 Softw… 2009-01-01
 5 AffyTiling     2009 Jan                  14              22 Softw… 2009-01-01
 6 Agi4x44PrePr…  2009 Jan                 131             193 Softw… 2009-01-01
 7 AgiMicroRna    2009 Jan                   0               0 Softw… 2009-01-01
 8 AnnBuilder     2009 Jan                 270             389 Softw… 2009-01-01
 9 AnnotationDbi  2009 Jan                2923            4307 Softw… 2009-01-01
10 ArrayExpress   2009 Jan                 186             281 Softw… 2009-01-01
# … with 4,987 more rows

Troubleshooting

R CMD check --as-cran reports on this and other issues suggesting that, say, 'igraph' is assumed to be attached:

* checking R code for possible problems ... NOTE
inducedSubgraphByPkgs: warning in induced_subgraph(g, v = pkgs):
  partial argument match of 'v' to 'vids'
subgraphByDegree: warning in induced_subgraph(g, v = names(d2)):
  partial argument match of 'v' to 'vids'
.computeBiocViewsTransitiveClosure: no visible binding for global
  variable ‘biocViewsVocab’
biocBuildReport: no visible binding for global variable ‘start’
biocBuildReport: no visible global function definition for ‘capture’
biocBuildReport: no visible global function definition for
  ‘except_any_of’
biocBuildReport: no visible binding for global variable ‘blank’
biocBuildReport: no visible binding for global variable ‘anything’
biocBuildReport: no visible binding for global variable ‘any_alnums’
biocBuildReport: no visible global function definition for ‘maybe’
biocBuildReport: no visible binding for global variable ‘any_blanks’
biocBuildReport: no visible binding for global variable ‘any_alphas’
biocBuildReport: no visible binding for global variable
  ‘any_non_alnums’
biocBuildReport: no visible global function definition for ‘any_of’
biocBuildReport: no visible binding for global variable ‘digit’
biocDownloadStats: no visible binding for global variable ‘Year’
biocDownloadStats: no visible binding for global variable ‘Month’
firstInBioc: no visible binding for global variable ‘Month’
firstInBioc: no visible binding for global variable ‘Package’
firstInBioc: no visible global function definition for ‘desc’  <========
firstInBioc: no visible binding for global variable ‘Date’
get_bioc_data: no visible binding for global variable ‘tags’
inducedSubgraphByPkgs: no visible global function definition for ‘V<-’
process_data: no visible binding for global variable ‘Author’
process_data: no visible binding for global variable ‘Package’
process_data: no visible binding for global variable ‘License’
process_data: no visible binding for global variable ‘biocViews’
process_data: no visible binding for global variable ‘Description’
process_data: no visible binding for global variable ‘downloads_month’
process_data: no visible binding for global variable ‘downloads_total’
summarise_dl_stats: no visible binding for global variable ‘Package’
summarise_dl_stats: no visible binding for global variable
  ‘Nb_of_downloads’
Undefined global functions or variables:
  Author Date Description License Month Nb_of_downloads Package V<-
  Year any_alnums any_alphas any_blanks any_non_alnums any_of anything
  biocViews biocViewsVocab blank capture desc digit downloads_month
  downloads_total except_any_of maybe start tags
> sessionInfo()
R version 3.6.0 Patched (2019-05-31 r76629)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocPkgTools_1.2.0 htmlwidgets_1.3   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1          compiler_3.6.0      pillar_1.4.1       
 [4] BiocManager_1.30.4  bitops_1.0-6        tools_3.6.0        
 [7] digest_0.6.19       jsonlite_1.6        tibble_2.1.3       
[10] pkgconfig_2.0.2     rlang_0.3.99.9003   graph_1.62.0       
[13] rex_1.1.2           igraph_1.2.4.1      parallel_3.6.0     
[16] dplyr_0.8.1         httr_1.4.0          stringr_1.4.0      
[19] xml2_1.2.0          hms_0.4.2           stats4_3.6.0       
[22] DT_0.7              tidyselect_0.2.5    glue_1.3.1         
[25] Biobase_2.44.0      R6_2.4.0            gh_1.0.1           
[28] XML_3.98-1.20       RBGL_1.60.0         purrr_0.3.2        
[31] readr_1.3.1         tidyr_0.8.3         magrittr_1.5       
[34] htmltools_0.3.6     biocViews_1.52.2    BiocGenerics_0.30.0
[37] RUnit_0.4.32        assertthat_0.2.1    rvest_0.3.4        
[40] stringi_1.4.3       RCurl_1.95-4.12     crayon_1.3.4

Function to identify all packages not using Authors@R

In a quest to provide better functionality, particularly around ORCID, converting to the Authors@R format in DESCRIPTION is a valuable step. A function to identify packages that need to change would facilitate:

  1. Developing metrics of usage
  2. Supporting regular emails to maintainers to make the change
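
As a rough illustration of the check requested above, the sketch below tests a single DESCRIPTION file for an Authors@R field using base R's `read.dcf()`. The helper name `uses_authors_at_r()` and the two example files are hypothetical; applying this across all Bioconductor packages would need the DESCRIPTION files or equivalent metadata, which is not shown here.

```r
# Hypothetical helper: TRUE when a DESCRIPTION file has an Authors@R field.
uses_authors_at_r <- function(description_file) {
    fields <- read.dcf(description_file, fields = "Authors@R")
    # read.dcf() fills a requested but absent field with NA.
    !anyNA(fields[, "Authors@R"])
}

# Two made-up DESCRIPTION files for illustration.
old_style <- tempfile()
writeLines(c("Package: oldpkg", "Author: Jane Doe"), old_style)

new_style <- tempfile()
writeLines(c(
    "Package: newpkg",
    "Authors@R: person(\"Jane\", \"Doe\", role = c(\"aut\", \"cre\"))"
), new_style)

old_result <- uses_authors_at_r(old_style)
new_result <- uses_authors_at_r(new_style)
```

Here `old_result` flags the package that would need converting, which is the kind of per-package signal both listed goals depend on.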

F1000 REVIEW: Things I spotted while reviewing

I'm reviewing the F1000 submission. While doing this I spotted a few "random" things related to the package per se. I figured it's more convenient to just post these things here rather than via the review process. If I spot more, I'll post them to this issue.

Help pages

  • A few \title{}:s are in all lower case, e.g. "get bioconductor download stats". Easy to spot if you look at the HTML index page.

  • The help pages use all lower case for the value sections, e.g. "a character string of the email" in help("biocBuildEmail"). Is that intentional?

  • Bioconductor is sometimes referred to as 'bioconductor' (lower case) or just 'bioc' and 'Bioc'.

  • xml -> XML (for the initialism XML)

  • github -> GitHub

  • The help pages could use some more cross-links. For example, help(dataciteXMLGenerate) says the value is "an xml element", but what exactly is an "xml element"? A link to the corresponding \pkg{XML} or \pkg{xml2} documentation would be helpful.

Vignette

  • Broken URL: https://bioconductor.org/packages/3.9/BiocViews in Section 6.3 'Integration with [BiocViews]' (in the header) - automatically detected by R CMD check --as-cran (awesome flag; hint, hint, nudge, nudge to the Bioconductor community)

closing unused connection 3 (https://bioconductor.org/packages/3.6/bioc/VIEWS)

I get the warning in the title some time after running getBiocPkgList(). Since it's an automatic R cleanup, I don't have a good way to track down the source of the problem.

My best guess is that the following line doesn't close its connection properly
https://github.com/seandavi/BiocPkgTools/blob/master/R/getBiocPkgList.R#L23

Thanks for the neat package, very helpful for my little widget rather than having to web scrape Bioconductor.

strip email from authors?

Should getPackageInfo() strip email from authors? The output is submitted to generateBiocPkgDOI(). E.g., if I have

Authors: Martin Morgan <[email protected]>

Should the function return my name and email, or just my name?
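
If the answer is "just the name", the stripping itself is simple. This is a minimal sketch with a hypothetical helper `strip_email()` and made-up addresses; it is not taken from getPackageInfo().

```r
# Hypothetical helper: drop "<...>" email parts from an author string.
strip_email <- function(author) {
    trimws(gsub("\\s*<[^>]*>", "", author, perl = TRUE))
}

single <- strip_email("Jane Doe <jane.doe@example.org>")
single
# "Jane Doe"

several <- strip_email("Jane Doe <jane.doe@example.org>, John Roe")
several
# "Jane Doe, John Roe"
```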

Fix DOI to bib

Hi,

Thanks for the package DOIs. I have a question about citation formats
from these DOIs.

When using the Bioconductor DOIs to programmatically pull a citation from
DOI providers in bibtex format, the author field seems to be formatted
incorrectly. I don't really know how the information is given to the
provider, or how that is formatted and parsed, but there seems to be a
hiccup somewhere. For example, if you take the AnnotationHub DOI:

10.18129/B9.bioc.AnnotationHub
https://doi.org/doi:10.18129/B9.bioc.AnnotationHub

And paste this into the DOI citation formatter at crosscite
(https://citation.crosscite.org/), with bibtex formatting style, the
result is:

@Article{Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum [Ctb],
Sonali Arora [Ctb]_2017, title={AnnotationHub},
DOI={10.18129/b9.bioc.annotationhub}, publisher={Bioconductor},
author={Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum [Ctb],
Sonali Arora [Ctb]}, year={2017}}

When using the jabref DOI puller, I get the same bibtex:

@misc{[Cre]2017,
author = {Martin Morgan [Cre], Marc Carlson [Ctb], Dan Tenenbaum
[Ctb], Sonali Arora [Ctb]},
title = {AnnotationHub},
year = {2017},
doi = {10.18129/b9.bioc.annotationhub},
pages = {-},
publisher = {Bioconductor},
timestamp = {2018-01-05},
}

Jabref doesn't correctly parse this bibtex because the author field is
not formatted correctly in bibtex format. See this page for an
explanation: http://www.tex.ac.uk/FAQ-manyauthor.html

This also leads to the really strange default bibtex keys. This
indicates that however the metadata is getting sent to the provider may
be incorrect, because it's just treating that author field as a single
string so it's not getting parsed correctly into alternative citation
formats. It strikes me that the [Cre]/[Ctb] flags would probably need to
be passed in a separate field, and the authors seem to be not passed in
correctly as individuals but rather as a concatenated string, somehow.

This could either be a problem with the way bioconductor is passing
metadata along, or perhaps it's a problem with crosscite or something?
I'm not sure. Any thoughts?

Nathan Sheffield, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
www.databio.org
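
To illustrate the formatting point raised above: BibTeX expects individual authors to be separated by the word "and". The sketch below converts the author string quoted in the report into that form using base R. It only shows the string transformation; how the metadata is actually submitted to the DOI provider is not covered and is not assumed here.

```r
# Author string as it appears in the BibTeX entries quoted above.
authors <- paste(
    "Martin Morgan [Cre], Marc Carlson [Ctb],",
    "Dan Tenenbaum [Ctb], Sonali Arora [Ctb]"
)

# Drop the bracketed role flags, then split into individual names.
no_roles <- gsub("\\s*\\[[^\\]]*\\]", "", authors, perl = TRUE)
people <- trimws(strsplit(no_roles, ",", fixed = TRUE)[[1]])

# BibTeX separates authors with the word "and".
bibtex_author <- paste(people, collapse = " and ")
bibtex_author
# "Martin Morgan and Marc Carlson and Dan Tenenbaum and Sonali Arora"
```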

biocExplorer not embedding in vignette

I don't know what the problem is, but at least on my Mac Chrome browser, including biocExplorer() as an example in the vignette leads to empty plots. Might be a conflict with visNetwork javascript?

parse from DESCRIPTION

I added a pull request for some code I started. I'm not sure if there is a better way. If eventually we want to use this when a package is accepted after review, it will not be in the repo yet. This will allow the needed author and package name to be pulled from a DESCRIPTION file to be used with newBiocPkgDOI
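
For readers unfamiliar with the approach, pulling fields from a DESCRIPTION file can be done with base R's `read.dcf()`. The sketch below is not the code from the pull request; the file content and variable names are made up for illustration.

```r
# Made-up DESCRIPTION content for a package not yet in any repository.
desc_file <- tempfile()
writeLines(c(
    "Package: examplePkg",
    "Version: 0.99.0",
    "Author: Jane Doe, John Roe"
), desc_file)

# read.dcf() returns a character matrix with one column per requested field.
info <- read.dcf(desc_file, fields = c("Package", "Author"))
pkg_name <- unname(info[1, "Package"])
pkg_author <- unname(info[1, "Author"])
```

The two values could then be passed on to whatever function creates the DOI.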

Software BiocViews missing from many packages

pkg_list <- BiocPkgTools::biocPkgList()
has_software_tag <- sapply(pkg_list$biocViews, function(b_view) "Software" %in% b_view)
table(has_software_tag)
## FALSE  TRUE 
## 1301   348 

So only 348 packages have the Software biocView. But on bioconductor.org almost every package has the Software biocView.
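
One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that packages list only specific terms while the website also counts their ancestors in the biocViews hierarchy. The sketch below uses a tiny hand-written parent map and a hypothetical `ancestors()` helper to show how a check that walks up the hierarchy differs from a direct membership test.

```r
# Toy parent map (child -> parent); the real vocabulary is far larger.
parent <- c(
    Sequencing = "Technology",
    Technology = "Software",
    Software = "BiocViews"
)

# Hypothetical helper: walk up the hierarchy from a term to the root.
ancestors <- function(term) {
    out <- character(0)
    while (term %in% names(parent)) {
        term <- parent[[term]]
        out <- c(out, term)
    }
    out
}

views <- c("Sequencing")  # terms a package might declare

direct <- "Software" %in% views
with_ancestors <- "Software" %in% c(views, unlist(lapply(views, ancestors)))
```

In this toy case `direct` is FALSE and `with_ancestors` is TRUE, which would match the discrepancy described above if the assumption holds.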
