cscheid / rgithub

R bindings for the github API

License: MIT License

rgithub's Introduction

R Bindings for the Github v3 API

This project is essentially abandoned

Hello there - unfortunately I don't have time to maintain this project anymore. If you're interested in taking over, send me an email and I'm happy to help with the transition.

If you're a user of this library, consider switching to gh or ghapi3, which are both newer and more actively maintained.

Intro

These are bindings for the Github web service API as of version 3. For full documentation, please refer to the Github developer API.

Quickie

> ctx = interactive.login(client.id, client.secret)
> me = get.myself(ctx)
> me$content$public_repos
[1] 17
> star.repository("cscheid", "guitar", ctx)
> unstar.repository("cscheid", "guitar", ctx)

Setting up

In order to use most of their API, Github requires you to register an application. You'll need to do this if you want to change information, repositories, post gists, etc.

Interactive use

If you will authenticate your Github account interactively (using the OAuth dance, for example), you should point the Callback URL to "http://localhost:1410". This is so that interactive.login works: the interactive OAuth login dancing involves redirection to a URL that is specified in advance. httr creates a web server on port 1410 to catch the redirection request and the token that comes with it.

To log in, you will need the Application Client ID and Secret, and you will call the following function in your R script:

> ctx = interactive.login(client.id, client.secret, scopes=c("gist"))

This will open a web browser where the caller of the R script (presumably, you) will be asked to type their Github username and password. Request more scopes to allow the bindings to do more things (see Github's documentation on OAuth scopes).

Noninteractive use

If you have obtained an OAuth access token by some other method, you can pass it directly to create.github.context. In fact, interactive.login does little else than securing access to a token interactively, and then calling create.github.context.

In other words, you should use create.github.context if some other part of your application is in charge of obtaining the OAuth access token.

You can also use create.github.context to create anonymous access contexts. You will be able to make some calls into the API, but not all of them, and you will be much more severely rate-limited. See the documentation for details.

About the Github client secret

DO NOT PUT THE CLIENT SECRET ON A PUBLICLY VISIBLE LOCATION. This is equivalent to an application-wide password. It is used by Github to track API usage, and could be used to impersonate your app. Store it in a safe location (like a locally-encrypted file in your local filesystem), ask the user to type it, or pass it via environment variables from the shell.
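
As one way to follow the environment-variable advice above, the sketch below reads the credentials at run time. The variable names GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET and the helper read.github.credentials are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of rgithub): read the app credentials from the
# environment so that they never appear in a script or a repository.
read.github.credentials <- function(id.var = "GITHUB_CLIENT_ID",
                                    secret.var = "GITHUB_CLIENT_SECRET") {
  id <- Sys.getenv(id.var, unset = "")
  secret <- Sys.getenv(secret.var, unset = "")
  if (!nzchar(id) || !nzchar(secret)) {
    stop("Set ", id.var, " and ", secret.var, " before starting R", call. = FALSE)
  }
  list(client.id = id, client.secret = secret)
}

# Placeholder values, set here only so that the example runs on its own;
# normally the shell would export the real values.
Sys.setenv(GITHUB_CLIENT_ID = "example-id", GITHUB_CLIENT_SECRET = "example-secret")
creds <- read.github.credentials()
# ctx = interactive.login(creds$client.id, creds$client.secret)
```

Failing early with a clear message when a variable is missing tends to be easier to debug than a rejected OAuth request later on.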

API naming and style

All API entry points are very shallow wrappers around Github's API. Once you have logged in, all calls to the API take the context returned by the login call (ctx in the examples here). Examples:

> repos = get.my.repositories(ctx)
> repos = get.my.repositories(ctx, type="owner")

The result of all calls to the API is the response from httr. Most of the time you will want the response contents, accessible through httr's content(). This is automatically parsed from JSON, the message format used throughout the API:

> me = get.myself(ctx)
> me$content$public_repos
[1] 17

The naming of the function calls is always meant to evoke an order. It starts with a verb and follows with the appropriate objects:

> evts = get.public.events(ctx)
> result = is.repository.starred.by.me("cscheid", "facet")
> star.repository("cscheid", "guitar", ctx)
> unstar.repository("cscheid", "guitar", ctx)

When the Github API specifies "inputs", such as with blob creation, you will send the JSON content as the last parameter, named content:

> create.blob(ctx, "cscheid", "guitar", list(content="This is a new blob", encoding="utf-8"))

When the Github API specifies extra parameters, such as when listing repositories, you will pass parameters directly to the API call, like you've already seen above:

# Get all my repositories, sorted by when I last pushed to them
> repos = get.my.repositories(ctx, type="owner", sort="pushed")

TODO

Keep track of rate limits
Honor event API X-Poll-Interval query rate limits automatically

rgithub's Issues

Policy re: examples and tests

API wrapping packages present headaches in terms of runnable examples and tests. They can be slow and there are myriad reasons why they might not work, that don't necessarily reveal a fundamental problem.

Do you have any Big Picture "policy thoughts" before I even contemplate adding examples to function docs or putting some real tests in?

Getting the tree of a commit?

Is it currently possible to get the tree of a commit (e.g. https://developer.github.com/v3/git/trees/#get-a-tree)? Based on my reading of the code it seems like this is not possible, but it would be a very useful feature.

Output formats

What do you think about outputting an R list by default instead of the httr response object? So give the output of content(GET(<params>)) instead of GET(<params>)

There is a tradeoff - beginners will be lost with JSON, whereas advanced users probably do want the raw response back.

Since beginner R users perhaps may not use this package (?), returning JSON is perhaps fine.

Code search

We should incorporate this.

https://github.com/blog/1564-code-search-api

The only issue is that it's currently an API in a "preview period" with a mandatory custom media type. It could be done with a one-off, but what would the general mechanism look like in the code? Do we expose a mode that enables these preview period calls? Since the github API seems to be moving rapidly, does it even make sense to have a stable version?

Differences between rgithub and octokit.rb?

I have a question about the rgithub functionality. Using ruby and octokit.rb I can do things like the following:

client = Octokit::Client.new :login => 'username', :password => 'password'
pull = client.pull_request(repo, number)
output = [pull.merged_at, pull.additions, pull.deletions, pull.commits, pull.changed_files]

Meaning that I can directly extract different fields from the JSON that I pull down using the API library. Is something similar possible using rgithub? Whenever I try to extract specific data from the output I end up using either something using grepl() or very complex subsetting of lists e.g. output[[1]][2]$'filename'[[1]][1] etc.
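
A rough sketch of how named fields can be pulled out of the parsed content without grepl() or deep positional indexing. The pull and files objects below are mock data with invented values, shaped like a parsed JSON response; only the field names are taken from the Ruby snippet above.

```r
# Mock of a parsed pull request response (values are invented for illustration)
pull <- list(content = list(merged_at = "2014-01-01T00:00:00Z",
                            additions = 10L,
                            deletions = 2L,
                            commits = 3L,
                            changed_files = 4L))

# Select several fields at once by name
fields <- c("merged_at", "additions", "deletions", "commits", "changed_files")
output <- pull$content[fields]
output$additions

# For a list of items (for example, the files of a pull request),
# extract one named field per item
files <- list(list(filename = "a.R", additions = 1L),
              list(filename = "b.R", additions = 5L))
filenames <- vapply(files, function(f) f$filename, character(1))
filenames
```

Indexing by name keeps the code readable and does not depend on the order in which the API happens to return fields.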

ead1c0d broke the package because GitHub doesn't accept vectors instead of scalars

By default jsonlite encodes scalars as vectors, so the resulting requests are invalid and thus GitHub rejects the requests. Example:

> cat(rjson::toJSON(list(a="foo")),"\n")
{"a":"foo"} 
> cat(jsonlite::toJSON(list(a="foo")),"\n")
{"a":["foo"]}

Either that change has to be reverted or auto_unbox=TRUE must be used with jsonlite.
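
For reference, a small sketch of the second option, assuming the jsonlite package is installed. With auto_unbox=TRUE, length-one vectors are written as JSON scalars:

```r
library(jsonlite)

body <- list(a = "foo")

# Default behaviour: the length-one vector becomes a JSON array
boxed <- as.character(toJSON(body))

# With auto_unbox = TRUE the length-one vector becomes a JSON scalar
unboxed <- as.character(toJSON(body, auto_unbox = TRUE))

cat(boxed, "\n")
cat(unboxed, "\n")
```

Note that auto_unbox also unboxes vectors that happen to have length one but are meant to be arrays, so those values would need wrapping in I() or a list.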

rgithub should validate JSON content

Different API calls have different requirements for its JSON contents. Currently we just forward the JSON content without any validation. If the JSON is not what github expects, though, sometimes github will answer back with a 500 response and little extra information. So it would be nice to be able to validate the JSON before sending it along, if for no other reason than user convenience.
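
A minimal sketch of what such a check could look like, assuming each call knows the names of its required fields. The helper validate.content is hypothetical, and the required field used in the example is illustrative rather than an authoritative list from the Github documentation.

```r
# Hypothetical helper (not part of rgithub): fail before sending the request
# if any required top-level field is missing from the content list.
validate.content <- function(content, required) {
  missing.fields <- setdiff(required, names(content))
  if (length(missing.fields) > 0) {
    stop("Missing required field(s): ", paste(missing.fields, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Passes: the field assumed to be required is present
ok <- validate.content(list(content = "This is a new blob", encoding = "utf-8"),
                       required = "content")

# Fails with a readable message instead of an opaque server error
msg <- tryCatch(validate.content(list(encoding = "utf-8"), required = "content"),
                error = function(e) conditionMessage(e))
msg
```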

delete.gist.comment not working

The current implementation of delete.gist.comment is not deleting comments for a particular gist

Pagination?

library(dplyr)
library(github)
ctx <- interactive.login("56b637a5baffac62cad9", "8e107541ae1791259e9987d544ca568633da2ebf")

issues <- get.repository.issues("hadley", "dplyr")
length(issues$content)

I only get 30 issues - how do I get the remaining pages? (Obviously I could just crank up the per_page parameter which solves this problem, but it would be nice to have a general solution)
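
One possible general approach is a loop that requests successive pages until a short page comes back. This is a sketch only: fetch.all.pages and the fake.fetch stand-in are hypothetical, and it assumes the wrapped call forwards Github's page and per_page parameters.

```r
# Hypothetical helper: call fetch.page(page, per_page) repeatedly and
# concatenate the results until a page returns fewer than per.page items.
fetch.all.pages <- function(fetch.page, per.page = 30, max.pages = 100) {
  results <- list()
  for (page in seq_len(max.pages)) {
    items <- fetch.page(page = page, per_page = per.page)
    results <- c(results, items)
    if (length(items) < per.page) break
  }
  results
}

# Stand-in for a real API call so that the example runs offline: 75 fake items
fake.items <- as.list(seq_len(75))
fake.fetch <- function(page, per_page) {
  start <- (page - 1) * per_page + 1
  if (start > length(fake.items)) return(list())
  fake.items[start:min(page * per_page, length(fake.items))]
}

all.items <- fetch.all.pages(fake.fetch)
length(all.items)

# With the package, fetch.page might look like this for calls that accept
# extra parameters (an assumption, not verified against every call):
# function(page, per_page) get.my.repositories(ctx, page = page, per_page = per_page)$content
```

The max.pages cap is there so that a misbehaving fetch function cannot loop forever or exhaust the rate limit.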

Managing rate limits?

It would be helpful to have automatic management of rate limits.

I don't know how to implement this across the board, but I have some ideas for a "manual" solution. $x-ratelimit-remaining tracks how many requests you have left and $x-ratelimit-reset is a timestamp formatted as seconds-since-the-epoch. I suppose that if the query is wrapped in a loop one can check when $x-ratelimit-remaining is approaching zero, and then simply implement Sys.sleep() until the rate limit has been reset, and then resume the looping of your queries.

For example:

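if (as.integer(downloaded_api_response$headers$`x-ratelimit-remaining`) < 5) {
  # Sleep until the reset time, which is given in seconds since the epoch
  Sys.sleep(max(0, as.integer(downloaded_api_response$headers$`x-ratelimit-reset`) - as.integer(Sys.time())))
} else {
  # do stuff
}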

What do you think of this (as a manual solution)?
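
The same idea can be written as a small pure function that is easy to test. The name seconds.until.reset and the threshold default are assumptions; the header names are the ones Github sends.

```r
# Hypothetical helper: given the response headers, return how many seconds to
# wait before the next request (0 if enough requests remain).
seconds.until.reset <- function(headers, threshold = 5, now = Sys.time()) {
  remaining <- as.integer(headers[["x-ratelimit-remaining"]])
  reset <- as.numeric(headers[["x-ratelimit-reset"]])
  if (remaining >= threshold) return(0)
  max(0, reset - as.numeric(now))
}

# Mock headers; the clock is fixed so that the example is reproducible
fixed.now <- as.POSIXct(1456398100, origin = "1970-01-01", tz = "UTC")
plenty <- list(`x-ratelimit-remaining` = "4997", `x-ratelimit-reset` = "1456398194")
nearly.out <- list(`x-ratelimit-remaining` = "3", `x-ratelimit-reset` = "1456398194")

wait.plenty <- seconds.until.reset(plenty, now = fixed.now)
wait.nearly.out <- seconds.until.reset(nearly.out, now = fixed.now)

# In a query loop: Sys.sleep(seconds.until.reset(response$headers))
```

Passing the current time as an argument keeps the function deterministic under test, and Sys.sleep(0) is harmless when no wait is needed.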

Seeking future maintainer

It's clear that I don't have the time to update this project to my satisfaction anymore. If you're interested in taking over, please reach out to me.

Permission to host github package on our repo

Thanks for your work on this package, we started using it to reach out and gather some information from github, and it is very helpful. The question is related to #67 in that in order to put our package into an R repo (we have a CRAN-like repo here) the dependencies also need to be in a similar repo. Until this is renamed and put on CRAN, would you be ok with us hosting it in our repo?

Thanks again for the work on this.

Define class and print methods for return objects?

Thanks for an excellent implementation of the github API for R here.

Just thinking that the output would be a little prettier if you defined a class for your standard return list (that has elements "ok", "content", "headers", and "code"). Then one might add a few convenience methods for such an object so that it provides somewhat tidier output for the user.

For instance, one might then have a print method for the object, such as this:

print.github <- function(x, ...) sapply(x$content, function(x) cat(as.yaml(x[names(x) != "body"]), cat(x$body)))

(e.g. using things like cat and yaml formatting to make the output print more cleanly to the console for inspection).

Likewise a summary method could offer a more condensed view.
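
A base-R sketch of the idea, using only the four elements named above. The class name github_response and the constructor are hypothetical; the package does not necessarily define its return value this way.

```r
# Hypothetical constructor: tag the standard return list with an S3 class
new.github.response <- function(ok, content, headers, code) {
  structure(list(ok = ok, content = content, headers = headers, code = code),
            class = "github_response")
}

# Print a one-line status followed by a compact view of the content
print.github_response <- function(x, ...) {
  cat("<github response> code:", x$code, "ok:", x$ok, "\n")
  utils::str(x$content, max.level = 1)
  invisible(x)
}

# A condensed view: the status and the names of the content fields
summary.github_response <- function(object, ...) {
  list(code = object$code, ok = object$ok, fields = names(object$content))
}

resp <- new.github.response(ok = FALSE,
                            content = list(message = "Not Found"),
                            headers = list(status = "404 Not Found"),
                            code = 404)
print(resp)
resp.summary <- summary(resp)
```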

Create an actual test suite

I've been reluctant to create a test suite because we can't test the entry points that require authentication (and the things that modify state on GitHub itself are even uglier to test). But that shouldn't stop us from doing basic tests.

ellipsis vs content parameters

We're going to need to break some behavior in rgithub to fix this, but there are inconsistencies across the API. Some calls use an ellipsis for the body of a request, like create.repository:

create.repository <- function (..., ctx = get.github.context()) 
  .api.post.request(ctx, c("user", "repos"), body = list(...))

But some calls use a content parameter:

modify.repository <- function(owner, repo, content, ctx = get.github.context())
  .api.patch.request(ctx, c("repos", owner, repo), body=content)

We should pick one of these and go with it.
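
To make the difference concrete, here is a sketch with a stand-in for the internal request helper. The function fake.post and both wrappers are hypothetical and make no network calls; they only show that the two styles build the same body from different call sites.

```r
# Stand-in for the package's internal request helper (hypothetical stub)
fake.post <- function(path, body) {
  list(path = paste(path, collapse = "/"), body = body)
}

# Ellipsis style: named arguments are collected into the body
create.with.dots <- function(...) {
  fake.post(c("user", "repos"), body = list(...))
}

# Content style: the caller builds the body list explicitly
create.with.content <- function(content) {
  fake.post(c("user", "repos"), body = content)
}

a <- create.with.dots(name = "my_new_repo", private = FALSE)
b <- create.with.content(list(name = "my_new_repo", private = FALSE))
identical(a, b)
```

The ellipsis style is shorter to call, while the content style leaves the ellipsis free for query parameters such as those used for pagination.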

confusing that GitHub repo `rgithub` = R package `github`?

Perhaps I am the only one to get tripped by this?

But since the repo is named rgithub and the package one will immediately want to load is github, maybe you should mention this in the README.md.

My install and explore sequence started like so:

> library(devtools)
> install_github("cscheid/rgithub")
Installing github repo cscheid/rgithub@master from cscheid
<blah blah blah>
> library(rgithub)
Error in library(rgithub) : there is no package called ‘rgithub’
<much puzzlement … listing my libraries … AHA!>
> library(github)

Looking forward to using the package!

Pagination on starred repositories

Hello,

Thank you for this library.

When I run get.repositories.starred.by.me(ctx) I seem to get only 30 of my starred repositories.
How can I expand this to get all my repositories? Are there pagination parameters?

Thank you.

Allowing for "git" to be an optional argument

Currently the get.commit function reads:

get.commit <- function(owner, repo, sha, ctx = get.github.context())
.api.get.request(ctx, c("repos", owner, repo, "git", "commits", sha))

However, "git" should be an optional argument, since:

https://api.github.com/repos/rubinius/rubinius/git/commits/2d762ce473a0a8a104d80e838a647eb67fa035d7

and

https://api.github.com/repos/rubinius/rubinius/commits/2d762ce473a0a8a104d80e838a647eb67fa035d7

return different things - the latter query gives a list of all the files attached to that commit which is very practical.

My dirty solution is to rewrite the function as such:

get.commit <- function(owner, repo, sha, git, ctx = get.github.context())
.api.get.request(ctx, c("repos", owner, repo, git, "commits", sha))

Which allows you to call either:

get.commit(git = NULL, ctx = get.github.context(), owner = "rubinius", repo = "rubinius", sha = "7ebdd693f1f3f9f194f4ee103f39795eb16b3610")

get.commit(git = "git", ctx = get.github.context(), owner = "rubinius", repo = "rubinius", sha = "7ebdd693f1f3f9f194f4ee103f39795eb16b3610")

This is not a very elegant solution, but I currently don't know a better option - does anyone have a suggestion of a neater way to write it?
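
One possibly neater variant is a logical flag with a default, so that existing calls keep working. The sketch below only builds the request path; commit.path is a hypothetical helper, and the real function would pass its result to the package's request helper.

```r
# Hypothetical helper: build the path for either endpoint.
# `if (git) "git"` yields NULL when git is FALSE, and c() drops NULL elements.
commit.path <- function(owner, repo, sha, git = TRUE) {
  c("repos", owner, repo, if (git) "git", "commits", sha)
}

sha <- "7ebdd693f1f3f9f194f4ee103f39795eb16b3610"
git.path <- paste(commit.path("rubinius", "rubinius", sha), collapse = "/")
plain.path <- paste(commit.path("rubinius", "rubinius", sha, git = FALSE), collapse = "/")
git.path
plain.path
```

Defaulting git to TRUE preserves the current behaviour of get.commit, so only callers who want the file listing need to change anything.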

Rename to grithub

We're going to rename the library grithub, and push it to CRAN.

get.pull.request.files does not seem to work

When I run:

get.pull.request.files(owner = "rubinius", repo = "rubinius", id = 1860, ctx = get.github.context())

everything is fine. However, when I try to get the commits:

get.pull.request.commits(owner = "rubinius", repo = "rubinius", id = 1860, ctx = get.github.context())

I get the following error:

Error in get.pull.request.commits(owner = "rubinius", repo = "rubinius", :
could not find function ".api.get.request"

Is this user error on my part, or is there something wrong with the function?

Plans for this repo

Nice work. I wonder what your plans are for this repo. Is it mostly for internal use? Or are you planning to maintain long-term and put on CRAN, etc.?

We started to wrap the Github API as well, but we just started. See here.

I'm thinking that if you are planning on maintaining this long-term we should stop development of ours.

Management of NAMESPACE?

How do you feel about using @import, @importFrom, @export from roxygen to build the NAMESPACE file? And about NOT importing the entire namespace of httr, Rook, stringr, and rjson?

Asking if you would be receptive to a PR that uses the roxygen approach more fully to manage NAMESPACE and to be more conservative/selective about the imports.

Error: All components of query must be named

rgithub had been working for me fine but now gives me an error when I try to get my repositories. Bad interaction with another package? Any advice?

> ctx <- create.github.context(access_token = Sys.getenv("GITHUB_TOKEN"),verbose=F)
> my.repos <- get.my.repositories(ctx)
Error: All components of query must be named

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] github_0.9.7    devtools_1.10.0 plyr_1.8.3     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3     withr_1.0.1     digest_0.6.9    R6_2.1.2        jsonlite_0.9.19 git2r_0.13.1   
 [7] magrittr_1.5    httr_1.1.0      stringi_1.0-1   curl_0.9.5      rstudioapi_0.5  brew_1.0-6     
[13] rjson_0.2.15    tools_3.2.3     stringr_1.0.0   Rook_1.1-1      memoise_1.0.0  
>

Create a set of examples that illustrates how the library works

(This is one half of #36)

create.repository() problems

It is possible that I am not understanding something, but I am having problems with create.repository():

based on the help file I thought that I should run it like this:

create.repository(name="my_new_repo",ctx)

But I get an error

Error: is.null(body) is not TRUE

Same error if I try

create.repository2(content=list(name="my_new_repos"),ctx=ctx)

However I can fix it by modifying the function:

create.repository2 <- function (content, ctx = get.github.context()) 
.api.post.request(ctx, c("user", "repos"), body = content)

environment(create.repository2) <- asNamespace("github")

create.repository2(content=list(name="my_new_repos"),ctx=ctx)
#works!

(original function is):

function (..., ctx = get.github.context()) 
.api.post.request(ctx, c("user", "repos"), body = content)

Is there something I don't understand about how to use the original function, or is this a bug?

Thanks,

Julin

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] github_0.9.6

loaded via a namespace (and not attached):
 [1] bitops_1.0-6    brew_1.0-6      httr_0.6.1      jsonlite_0.9.14 mime_0.2        RCurl_1.95-4.5  rjson_0.2.15    Rook_1.1-1     
 [9] stringr_0.6.2   tools_3.1.2

Support github releases api

I'm trying to add github release support to r-travis (see craigcitro/r-travis#124 for details). Currently, I'm implementing it in ruby, since rgithub doesn't support that yet. I see that supporting the releases API is on the TODO list. Is there any chance to add the support in the near future? Thanks!

rgithub broken on httr 0.3

Register github app for this package

You don't need to have each user register their own app, and there's no problem with including the app key/secret in the package (the secret isn't actually a secret)

Notebooks creation fails due to bad body schema in github.R/api.request.with.body

Creating a notebook fails due to bad body schema. The github API expects a schema like -

{
  "description": "the description for this gist",
  "public": true,
  "files": {
    "file1.txt": {
      "content": "String file contents"
    }
  }
}

But in the api.request.with.body function in github.R, the schema we use is -

{
   "description":"Notebook 1",
   "public":false,
   "files":{
      "scratch.R":"
   }
}

This leads to a 500 Internal Server Error. This is probably a bug in the github API but sending a schema as expected -

{
   "description":"Notebook 1",
   "public":false,
   "files":{
      "scratch.R":{
         "content":"Lorem Ipsum"
      }
   }
}

solves the problem.
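
A sketch of how the files element could be normalized on the R side before the body is encoded. The helper as.gist.files is hypothetical; it wraps any bare string in a list with a content field and leaves already well-formed entries alone.

```r
# Hypothetical helper: make sure every file entry has the
# list(content = "...") shape that the gist API expects.
as.gist.files <- function(files) {
  lapply(files, function(f) if (is.list(f)) f else list(content = f))
}

body <- list(description = "Notebook 1",
             public = FALSE,
             files = as.gist.files(list(scratch.R = "Lorem Ipsum",
                                        notes.md = list(content = "Already wrapped"))))
str(body$files)
```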

add ellipsis for extra parameters in calls that need it (for pagination, for example)

Whatever we end up doing for #30 will require bugfixes from this issue.

bump version when API changes

The version of rgithub doesn't change despite large incompatible changes in the API, so there is no way to identify a working version. Please bump the version when changing the API so dependent packages can require the corresponding version.

Use underscores instead of dots

Otherwise it looks like you're defining S3 methods

Creating a repository

This works

me = get.myself(ctx)
me$content$login
[1] "csgillespie"

When I try to create a repo I get:

R> create.repository(name="my_new_repo1",ctx = ctx)
-> POST /user/repos?XXXXX
-> User-Agent: R (3.2.3 x86_64-pc-linux-gnu x86_64 linux-gnu)
-> Host: api.github.com
-> Accept-Encoding: gzip, deflate
-> Accept: application/vnd.github.beta+json
-> Content-Length: 23
-> 
>> {"name":"my_new_repo1"}

<- HTTP/1.1 404 Not Found
<- Server: GitHub.com
<- Date: Thu, 25 Feb 2016 10:04:50 GMT
<- Content-Type: application/json; charset=utf-8
<- Transfer-Encoding: chunked
<- Status: 404 Not Found
<- X-RateLimit-Limit: 5000
<- X-RateLimit-Remaining: 4997
<- X-RateLimit-Reset: 1456398194
<- X-OAuth-Scopes: 
<- X-Accepted-OAuth-Scopes: public_repo, repo
<- X-OAuth-Client-Id: 62db374708d74cfdafa5
<- X-GitHub-Media-Type: github.beta; format=json
<- Access-Control-Allow-Credentials: true
<- Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
<- Access-Control-Allow-Origin: *
<- Content-Security-Policy: default-src 'none'
<- Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: deny
<- X-XSS-Protection: 1; mode=block
<- Content-Encoding: gzip
<- X-GitHub-Request-Id: 80F0E120:C8D0:ECB53EA:56CED1C2
<- 
$ok
[1] FALSE

$content
$content$message
[1] "Not Found"

$content$documentation_url
[1] "https://developer.github.com/v3"


$headers
$server
[1] "GitHub.com"

$date
[1] "Thu, 25 Feb 2016 10:04:50 GMT"

$`content-type`
[1] "application/json; charset=utf-8"

$`transfer-encoding`
[1] "chunked"

$status
[1] "404 Not Found"

$`x-ratelimit-limit`
[1] "5000"

$`x-ratelimit-remaining`
[1] "4997"

$`x-ratelimit-reset`
[1] "1456398194"

$`x-oauth-scopes`
[1] ""

$`x-accepted-oauth-scopes`
[1] "public_repo, repo"

$`x-oauth-client-id`
[1] "62db374708d74cfdafa5"

$`x-github-media-type`
[1] "github.beta; format=json"

$`access-control-allow-credentials`
[1] "true"

$`access-control-expose-headers`
[1] "ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval"

$`access-control-allow-origin`
[1] "*"

$`content-security-policy`
[1] "default-src 'none'"

$`strict-transport-security`
[1] "max-age=31536000; includeSubdomains; preload"

$`x-content-type-options`
[1] "nosniff"

$`x-frame-options`
[1] "deny"

$`x-xss-protection`
[1] "1; mode=block"

$`content-encoding`
[1] "gzip"

$`x-github-request-id`
[1] "80F0E120:C8D0:ECB53EA:56CED1C2"

attr(,"class")
[1] "insensitive" "list"       

$code
[1] 404

R> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] github_0.9.7   httr_1.1.0     colorout_1.1-1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3     mime_0.4        R6_2.1.2        jsonlite_0.9.19 magrittr_1.5    stringi_1.0-1   curl_0.9.5     
 [8] xml2_0.1.2      fortunes_1.5-2  brew_1.0-6      tools_3.2.3     stringr_1.0.0   Rook_1.1-1      httpuv_1.3.3   
[15] openssl_0.9.1

Error handling

Github started sending errors like 422. Rgithub should check and translate these back sensibly.
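
As a rough sketch, a check like the following could turn a failed response into an R error. It assumes the return list has the ok, code and content$message elements seen in the package's output; check.response is a hypothetical name.

```r
# Hypothetical helper: stop with a readable message when the request failed,
# otherwise return the response invisibly so that calls can be chained.
check.response <- function(resp) {
  if (isTRUE(resp$ok)) return(invisible(resp))
  msg <- resp$content$message
  if (is.null(msg)) msg <- "unknown error"
  stop(sprintf("GitHub API request failed [%s]: %s", resp$code, msg), call. = FALSE)
}

# Mock of a failed response
resp <- list(ok = FALSE, content = list(message = "Not Found"), code = 404)
err <- tryCatch(check.response(resp), error = function(e) conditionMessage(e))
err
```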

Look at jsonlite instead of rjson

It has been designed specifically for APIs:

https://public.opencpu.org/posts/jsonlite-a-smarter-json-encoder/

Alternative access mechanism

Using personal access token instead of oauth. devtools uses env var GITHUB_PAT

Pagination

Have you thought about pagination at all?

Requests that return multiple items will be paginated to 30 items by default.

https://developer.github.com/v3/#pagination

I'm not sure what the conventions are for API wrappers and assisting the user to fetch multiple or all pages. But the Ruby wrapper for the GitHub API talks about this in its README:

For smallish resource lists, Octokit provides auto pagination. When this is enabled, calls for paginated resources will fetch and concatenate the results from every page into a single array.

I'm thinking about this because I'm using get.my.repositories() and have realized that by default I'm getting only the first 30. Before I start playing around with explicit requests for specific pages, I'm wondering if you are contemplating adding any auto pagination to your package.
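
Github advertises the neighbouring pages in the Link response header, so auto pagination could follow the rel="next" URL until none is left. Below is a sketch of the parsing step only; parse.link.header is a hypothetical helper, and the header value is an invented example in the documented format.

```r
# Hypothetical helper: turn a Link header into a named list of URLs,
# keyed by their rel value ("next", "last", ...).
parse.link.header <- function(link) {
  parts <- strsplit(link, ",\\s*")[[1]]
  urls <- sub("^\\s*<([^>]*)>.*$", "\\1", parts)
  rels <- sub("^.*rel=\"([^\"]*)\".*$", "\\1", parts)
  names(urls) <- rels
  as.list(urls)
}

link <- paste0("<https://api.github.com/user/repos?page=2&per_page=30>; rel=\"next\", ",
               "<https://api.github.com/user/repos?page=5&per_page=30>; rel=\"last\"")
links <- parse.link.header(link)
links[["next"]]
```

A caller would keep requesting links[["next"]] until the header no longer contains a next entry. The entry is read with [[ ]] because next is a reserved word in R.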