bsve's Introduction

bsve is an R package for developers to access the BSVE API

### Installation

The package requires the httr and plyr packages, plus a recent version of openssl to create the hash key.

library(devtools)
install_github("cstubben/bsve")

### Connecting

You will need to get the API key and secret key from the developer site under My Account -> Manage API Credentials and substitute them below.

api_key    <- "API key"
secret_key <- "SECRET key"
email      <- "and your@email"

The bsve_sha1 function creates the authentication header using the two keys and a valid email.


token <- bsve_sha1(api_key, secret_key, email)
token
[1] "apikey=AKcfef08f1-c852-43b9-bce7-8923880e3b68;timestamp=1457989545992;nonce=535514;signature=f6b90ed483b37..."
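
To inspect what the header contains, the string can be split into its parts. The following is a minimal base R sketch: `parse_token` is a hypothetical helper (not part of bsve), the token is a placeholder rather than real credentials, and reading the timestamp as milliseconds since the Unix epoch is an assumption based on the 13-digit value shown above.

```r
# Hypothetical helper: split "key=value;key=value" into a named character vector
parse_token <- function(token) {
  parts <- strsplit(strsplit(token, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  # Re-join anything after the first "=" in case a value itself contains "="
  values <- vapply(parts, function(p) paste(p[-1], collapse = "="), character(1))
  names(values) <- vapply(parts, `[`, character(1), 1)
  values
}

# Placeholder token with made-up key and signature values
example_token <- "apikey=AK-placeholder;timestamp=1457989545992;nonce=535514;signature=abc123"
fields <- parse_token(example_token)
names(fields)

# Assumption: the timestamp is in milliseconds, so divide by 1000 for seconds
as.POSIXct(as.numeric(fields[["timestamp"]]) / 1000, origin = "1970-01-01", tz = "UTC")
```

A timestamp far from the current time may be one thing to check if requests are rejected, although the server's actual tolerance is not documented here.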

### 4.4 Find All DataSources

If you are not familiar with httr, be sure to check the quickstart guide.

I may add functions to simplify some queries, but for now I will go through each step and include the URL (/api/data/list) and authentication header to GET.

url <- "http://search.bsvecosystem.net/api/data/list"
r <- GET(url, add_headers("harbinger-authentication" = token))
r
Response [http://search.bsvecosystem.net/api/data/list]
  Date: 2016-03-16 21:44
  Status: 200
  Content-Type: application/json;charset=UTF-8
  Size: 72.1 kB
http_status(r)
$category
[1] "success"

Parse the results into a nested list (content uses fromJSON in jsonlite by default).

x <- content(r)

sapply(x, length)
   status   message requestId    result    errors 
        1         1         0        15         0 
names(x$result[[1]])
[1] "name"        "description" "fileTypes"   "label"       "shortLabel"  "fields"     
[7] "dataSources"

Both fields and dataSources are arrays with 0 to many elements, so just grab the first 5 keys and count the number of fields and dataSources below.

b1 <- ldply(lapply( x$result, function(y) y[1:5] ), "data.frame",
stringsAsFactors=FALSE)
b1$fields <- sapply( x$result, function(y) length( y$fields) )
b1$dataSources <- sapply( x$result, function(y) length( y$dataSources) )

b1[, c(1:2,6:7)]
              name                             description fields dataSources
1           AFCENT                            Clinic Visit     39           0
2   Demo Data Type Testing addition of data to repository.      2           0
3            EIDSS                         EIDSS Flat File     22           0
4  HydraSourceType          Test dataSource type for hydra      4           2
5        HydraType          Test dataSource type for hydra      4           1
6        Leidos Wx          Weather for use by Leidos apps      2           0
7         LeidosWx          Weather for use by Leidos apps      2           0
8              NDD                           NDD Flat File     20           0
9              PON                           PON Flat File     31           1
10             RSS                                RSS Feed     18          65
11    RSS_FLATFILE                           RSS Flat File      0           0
12              SD                     Syndromic Flat File     15           0
13            SODA                          SODA Flat File      0          15
14         TWITTER                       Twitter Flat File      0           1
15       WEBSEARCH                              Web Search      0           0
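
The same summary table can be built without plyr. The sketch below uses base R only and runs on a small mock of the parsed response, so it can be tried without credentials; the mock values (including the `fileTypes` strings) are illustrative and not taken from the API.

```r
# Mock of x$result with two entries; the values are made up for illustration
result <- list(
  list(name = "RSS", description = "RSS Feed", fileTypes = "xml",
       label = "RSS", shortLabel = "RSS",
       fields = list(list(name = "cases", type = "String")),
       dataSources = list(list(name = "A"), list(name = "B"))),
  list(name = "SD", description = "Syndromic Flat File", fileTypes = "csv",
       label = "SD", shortLabel = "SD",
       fields = list(), dataSources = list())
)

# Keep only the keys that hold a single value, one data frame row per entry
scalar_keys <- c("name", "description", "fileTypes", "label", "shortLabel")
rows <- lapply(result, function(y) as.data.frame(y[scalar_keys], stringsAsFactors = FALSE))
b <- do.call(rbind, rows)

# Count the array-valued keys instead of expanding them
b$fields <- vapply(result, function(y) length(y$fields), integer(1))
b$dataSources <- vapply(result, function(y) length(y$dataSources), integer(1))
b[, c("name", "description", "fields", "dataSources")]
```

Selecting keys by name rather than by position (`y[1:5]`) keeps the code working if the API ever returns the keys in a different order.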

### Notes on parsing lists with NULLs

If you want to list fields or dataSources, ldply will return an error if NULLs are present. I have included a few options below to avoid NULLs and parse the RSS fields in element 10.

OPTION 1. Remove nulls before combining

If you don't know where NULLs are found, you can remove them using Filter

y <- lapply(x$result[[10]]$fields, Filter, f = Negate(is.null))
ldply(y, "data.frame")
        name    type
1      cases  String
2     deaths  String
3       when  String
4       link  String
5      where  String
6  longitude   Float
7  simulated Boolean
8   latitude   Float
9         id  String
10    source  String
...
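
To see what the `Filter` call does to each element, here is a minimal sketch on a single mock field (the values mirror the first RSS field shown above, with `format` and `description` set to NULL as an assumption about the raw list).

```r
# One mock field as it might look in the nested list, with two NULL keys
field <- list(name = "cases", type = "String", format = NULL, description = NULL)

# Filter keeps the elements for which the predicate is TRUE;
# Negate(is.null) is TRUE for everything that is not NULL
kept <- Filter(Negate(is.null), field)
names(kept)

# In lapply(fields, Filter, f = Negate(is.null)) the predicate is passed by name,
# so each list element is matched to Filter's remaining argument, x
fields <- list(field, list(name = "longitude", type = "Float", format = NULL, description = NULL))
cleaned <- lapply(fields, Filter, f = Negate(is.null))
```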

OPTION 2. Skip NULL fields

When you figure out where NULLs are located, you can skip those keys.

ldply( lapply(x$result[[10]]$fields, "[",  1:2), "data.frame")

OPTION 3. USE do.call

There are two problems with do.call. First, if a tag is missing, I think it will fill the row by silently repeating values. Second, while the table looks correct, each column is actually a list.

as.data.frame(do.call("rbind", x$result[[10]]$fields ))
    name   type format description
1  cases String   NULL        NULL
2 deaths String   NULL        NULL
3   when String   NULL        NULL
...
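
The list-column issue can be checked, and worked around for columns that contain no NULLs, with a short base R sketch on mock data (the two fields below are illustrative).

```r
# Two mock fields without NULL values
fields <- list(
  list(name = "cases", type = "String"),
  list(name = "longitude", type = "Float")
)

df <- as.data.frame(do.call("rbind", fields))
sapply(df, class)   # inspect the column classes after rbind

# Flatten each list column into an ordinary character vector
df$name <- unlist(df$name)
df$type <- unlist(df$type)
sapply(df, class)
```

Note that `unlist` drops NULL entries, so this fix only works for columns where every row has a value; a column such as `format` above would come back shorter than the table.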

OPTION 4. Use RJSONIO and replace NULLs with NAs

The RJSONIO package has a nullValue option that lets you replace NULLs with NAs. This returns a different nested list than jsonlite with named vectors instead of lists, so use rbind.

x1 <- RJSONIO::fromJSON(content(r, "text"), nullValue=NA)
str(x1$result[[10]]$fields[[1]])
       name        type      format description 
    "cases"    "String"          NA          NA 
ldply( x1$result[[10]]$fields, "rbind")  
    name   type format description
1  cases String   <NA>        <NA>
2 deaths String   <NA>        <NA>
3   when String   <NA>        <NA>
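
If adding RJSONIO is not desirable, a similar NULL-to-NA replacement can be done on the list that `content` already returned. This is a base R sketch on mock data; `null_to_na` is a hypothetical helper, not a function from bsve, httr or jsonlite.

```r
# Mock fields shaped like x$result[[10]]$fields, with NULL format and description
fields <- list(
  list(name = "cases", type = "String", format = NULL, description = NULL),
  list(name = "longitude", type = "Float", format = NULL, description = NULL)
)

# Hypothetical helper: replace every NULL element of a list with a character NA
null_to_na <- function(x) lapply(x, function(v) if (is.null(v)) NA_character_ else v)

# Every element now has a value for all four keys, so rows can be bound directly
rows <- lapply(fields, function(f) as.data.frame(null_to_na(f), stringsAsFactors = FALSE))
tab <- do.call(rbind, rows)
tab
```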

You can list the 65 RSS dataSources below. Only the first 3 keys have non-NULL values and fields is an empty list except in SODA, which has an array with 4 elements like the result fields above.

names(x$result[[10]]$dataSources[[1]])
[1] "name"        "category"    "description" "feedType"    "selected"    "status"     
[7] "fields"     

ldply( lapply(x$result[[10]]$dataSources, "[",  1:2), "data.frame")
                                           name        category
1                            Agriculture Canada   Expert Domain
2         Agrifeeds Animal Diseases and Control   Expert Domain
3 AgriFeeds News on Phytosanitary Measures IPPC   Expert Domain
4 AgriFeeds News on Plant Pathology and Disease   Expert Domain
5                        Agrifeeds Pest Control   Expert Domain
6                           AP Top Science News Non-Domain News
...

### 4.6 Querying the Datasource API

Use api/data/query/{data} to query RSS or other datasets. I'm not familiar with all the query options, so please send me examples or post them to issues.

url1 <- "http://search.bsvecosystem.net/api/data/query/RSS"

r1 <- GET(url1,  add_headers("harbinger-authentication" = token),
  query = list(`$source` = "CDC MMWR Reports",  `$filter`="pubDate ge 2016-02-01") )

x1 <- content(r1)
x1[!sapply(x1, is.null)]
$status
[1] 0

$message
[1] "In Progress"

$requestId
[1] "abfe96a8-1ce8-4336-8abb-4ce89550e204"

$query
$query$type
[1] "RSS"

$query$sources
[1] "CDC MMWR Reports"

$query$filter
[1] "pubDate ge 2016-02-01"
...
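
Because the query is reported as "In Progress", a client may need to wait before the results are ready. The sketch below shows one possible polling loop in base R. It assumes that the results response also carries a `message` element that stops being "In Progress" once the query has finished, which is not confirmed by the API documentation; `poll_result` and `fake_fetch` are hypothetical names, and the "Done" message and status value are made up for the stand-in.

```r
# fetch: a function with no arguments that returns the parsed response as a list
poll_result <- function(fetch, max_tries = 10, wait = 1) {
  for (i in seq_len(max_tries)) {
    res <- fetch()
    # Assumption: any message other than "In Progress" means the query finished
    if (!identical(res$message, "In Progress")) return(res)
    Sys.sleep(wait)
  }
  stop("result still in progress after ", max_tries, " tries")
}

# Stand-in for the real request: reports "In Progress" twice, then finishes
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) {
    list(status = 0, message = "In Progress")
  } else {
    list(status = 1, message = "Done", result = list())
  }
}

res <- poll_result(fake_fetch, wait = 0)
res$message
```

In real use, `fetch` would wrap the GET and `content` calls for the results URL with the requestId.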

### 4.7 Getting Datasource Results

You need the requestId above to download the results. I have not looked at this carefully, but I did figure out how to find titles and dates (and these nested lists are way too complicated). The dates can be converted using as.POSIXct.

url2 <- "http://search.bsvecosystem.net/api/data/result/"
r2 <-  GET( paste0(url2, x1$requestId), add_headers("harbinger-authentication" = token))
x2 <- content(r2)

sapply(x2, length)

names(x2$result[[1]]$hits)
[1] "found"                "start"                "hit"                 
[4] "additionalProperties"

sapply(x2$result[[1]]$hits$hit, function(y) y$data$title[[1]])
[1] "EARLY RELEASE: Vital Signs: Preventing Antibiotic-Resistant Infections in Hospitals - United States, 2014" 
[2] "SUPPLEMENTS: Development of the Community Health Improvement Navigator Database of Interventions" 
[3] "RECOMMENDATIONS AND REPORTS: CDC Guideline for Prescribing Opioids for Chronic Pain - United States, 2016" 
[4] "EARLY RELEASE: Transmission of Zika Virus Through Sexual Contact with Travelers to Areas of Ongoing Transmission - Continental United States, 2016"
...

z <- sapply(x2$result[[1]]$hits$hit, function(y) y$data$pubdate[[1]])
z
[1] "1457112660000" "1456509505000" "1457112600000" "1456511400000" "1456507345000"
[6] "1456500085000" "1456511400000"

as.POSIXct( round(as.numeric(z)/1000), origin = "1970-01-01")
[1] "2016-03-04 10:31:00 MST" "2016-02-25 10:58:25 MST" "2016-03-04 10:30:00 MST" ...
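
The times above are printed in the session's local time zone. As a small sketch, passing `tz = "UTC"` makes the output independent of where the code runs; the two values are copied from the `pubdate` strings shown above.

```r
# Two pubdate values from the output above, in milliseconds since the Unix epoch
z <- c("1457112660000", "1457112600000")

# Convert to seconds and fix the time zone so results do not depend on the locale
d <- as.POSIXct(as.numeric(z) / 1000, origin = "1970-01-01", tz = "UTC")
format(d, "%Y-%m-%d %H:%M:%S")
```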

These last two steps are combined in the get_bsve function. Without a filter, all 794 titles are returned. The function includes API options for top, skip and orderby, but currently they do not seem to change the results!

x1 <- get_bsve(token, "RSS", source ="CDC MMWR Reports")
x2 <- get_bsve(token, "RSS", source ="CDC MMWR Reports", filter="pubdate ge 2016-02-01")
## same 7 unsorted titles as x2
x3 <- get_bsve(token, "RSS", source ="CDC MMWR Reports", filter="pubdate ge 2016-02-01", orderby="pubdate DESC", top=5)
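
Until orderby and top take effect on the server, the same result can be approximated on the client. The sketch below assumes the titles and dates have already been collected into a data frame with `title` and `pubdate` columns; that layout and the titles are hypothetical, since the exact structure returned by get_bsve is not shown here.

```r
# Hypothetical table of hits; pubdate values are milliseconds since the Unix epoch
hits <- data.frame(
  title = c("report A", "report B", "report C"),
  pubdate = c("1457112660000", "1456509505000", "1457112600000"),
  stringsAsFactors = FALSE
)

# Convert to date-times, then sort newest first (orderby = "pubdate DESC")
hits$date <- as.POSIXct(as.numeric(hits$pubdate) / 1000, origin = "1970-01-01", tz = "UTC")
sorted <- hits[order(hits$date, decreasing = TRUE), ]

# Keep only the first rows (top = 2)
newest <- head(sorted, 2)
newest$title
```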
