hrbrmstr / sergeant
:guardsman: Tools to Transform and Query Data with 'Apache' 'Drill'
Home Page: https://hrbrmstr.github.io/sergeant/
License: Other
current error reporting on failed queries — in a word — sucks. must improve.
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252 LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C LC_TIME=English_Singapore.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.3.0 stringr_1.3.1 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1 tibble_1.4.2
[7] ggplot2_3.0.0 tidyverse_1.2.1 sergeant_0.5.2 dbplyr_1.2.2 dplyr_0.7.6 DBI_1.0.0
[13] rJava_0.9-10
ds <- src_drill(drill_ip)
ds
>src: DrillConnection
>tbls: cp.default, dfs.default, dfs.root, dfs.tmp, INFORMATION_SCHEMA, postgres.information_schema,
postgres.pg_catalog, postgres.postgres, postgres.public, postgres, sys
db <- tbl(ds, "cp.`employee.json`")
db
store_id gender department_id birth_date supervisor_id last_name position_title hire_date
<int> <chr> <int> <date> <int> <chr> <chr> <dttm>
1 0 F 1 1961-08-26 0 Nowmer President 1994-12-01 00:00:00
2 0 M 1 1915-07-03 1 Whelply VP Country Ma~ 1994-12-01 00:00:00
3 0 M 1 1969-06-20 1 Spence VP Country Ma~ 1998-01-01 00:00:00
4 0 F 1 1951-05-10 1 Gutierrez VP Country Ma~ 1998-01-01 00:00:00
5 0 F 2 1942-10-08 1 Damstra VP Informatio~ 1994-12-01 00:00:00
6 0 F 3 1949-03-27 1 Kanagaki VP Human Reso~ 1994-12-01 00:00:00
7 9 F 11 1922-08-10 5 Brunner Store Manager 1998-01-01 00:00:00
8 21 F 11 1979-06-23 5 Blumberg Store Manager 1998-01-01 00:00:00
9 0 M 5 1949-08-26 1 Stanz VP Finance 1994-12-01 00:00:00
10 1 M 11 1967-06-20 5 Murraiin Store Manager 1998-01-01 00:00:00
# ... with more rows, and 8 more variables: management_role <chr>, salary <dbl>, marital_status <chr>,
# full_name <chr>, employee_id <int>, education_level <chr>, first_name <chr>, position_id <int>
drill_version(ds)
Error in is.url(url) : length(url) == 1 is not TRUE
sergeant:::make_server
function (drill_con)
{
sprintf("%s://%s:%s", ifelse(drill_con$ssl[1], "https", "http"),
drill_con$host, drill_con$port)
}
<bytecode: 0x0000000022b58b98>
<environment: namespace:sergeant>
str(ds)
List of 1
$ con:Formal class 'DrillConnection' [package "sergeant"] with 5 slots
.. ..@ host : chr "172.27.141.128"
.. ..@ port : int 8047
.. ..@ ssl : logi FALSE
.. ..@ username: chr(0)
.. ..@ password: chr(0)
- attr(*, "class")= chr [1:3] "src_drill" "src_sql" "src"
Accessing the ssl value the way make_server does fails:
ds$ssl[1]
NULL
# Correct way
ds$con@ssl
FALSE
This code in the make_server function needs to be fixed, e.g.:
function (drill_con)
{
sprintf("%s://%s:%s", ifelse(drill_con$con@ssl, "https", "http"),
drill_con$con@host, drill_con$con@port)
}
Hi,
I am encountering problems installing sergeant.
Upon installing
devtools::install_github("hrbrmstr/sergeant")
I am getting the error below:
Error: package or namespace load failed for 'sergeant':
.onLoad failed in loadNamespace() for 'rJava', details:
call: inDL(x, as.logical(local), as.logical(now), ...)
error: unable to load shared object 'C:/Users/herma/Documents/R/win-library/3.5/rJava/libs/i386/rJava.dll':
LoadLibrary failure: %1 is not a valid Win32 application.
Error: loading failed
Execution halted
*** arch - x64
ERROR: loading failed for 'i386'
version output:
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 5.0
year 2018
month 04
day 23
svn rev 74626
language R
version.string R version 3.5.0 (2018-04-23)
nickname Joy in Playing
As of Drill 1.15.0 we now know the data types of things returned so we can use this to have proper 64-bit integers as well as proper types all 'round.
To translate a query like:
select `Year`,`Month` from dfs.`..../AIRLINE/` LIMIT 5;
I would use something like:
airline %>% select(Year)
which fails with:
Warning message:
In .local(res, ...) : PARSE ERROR: Encountered "Year" at line 1, column 9.
Was expecting one of:
...
...
due to missing quotes.
The same happens for
airline %>% select(`Year`)
Hi Bob,
I tried
drill_jdbc("localhost")
and it returns error:
Error in stop("Package `", package, "` required for `", fun, "`.\n", "Please install and try again.", :
argument "fun" is missing, with no default
Looking at the code in utils.r
try_require <- function(package, fun) {
if (requireNamespace(package, quietly = TRUE)) {
library(package, character.only = TRUE)
return(invisible())
}
stop("Package `", package, "` required for `", fun , "`.\n", # nocov start
"Please install and try again.", call. = FALSE) # nocov end
}
the argument fun is required but has no default, and the failing call to try_require omits it.
I suggest giving fun a default of "" in the function definition.
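A minimal sketch of that suggestion applied to the function above:

```r
# Sketch of the suggested fix: default `fun` to "" so the error
# message can still be assembled when the second argument is omitted.
try_require <- function(package, fun = "") {
  if (requireNamespace(package, quietly = TRUE)) {
    library(package, character.only = TRUE)
    return(invisible())
  }
  stop("Package `", package, "` required for `", fun, "`.\n",
       "Please install and try again.", call. = FALSE)
}
```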
Regards,
Herman
Hi! Maybe I just lack experience with SQL APIs, but I find it odd that a query passed to drill_query
without a terminating semicolon will work but with one will error out:
library(sergeant)
dc <- drill_connection()
dc %>% drill_query('SELECT full_name, salary FROM cp.`employee.json` LIMIT 3')
#> # A tibble: 3 x 2
#> full_name salary
#> * <chr> <dbl>
#> 1 Sheri Nowmer 80000
#> 2 Derrick Whelply 40000
#> 3 Michael Spence 40000
dc %>% drill_query('SELECT full_name, salary FROM cp.`employee.json` LIMIT 3;')
#> Query ==> SELECT full_name, salary FROM cp.`employee.json` LIMIT 3;
#> PARSE ERROR: Encountered ";" at line 1, column 57.
#> Was expecting one of:
#> <EOF>
#> "OFFSET" ...
#> "FETCH" ...
#>
#>
#> SQL Query SELECT full_name, salary FROM cp.`employee.json` LIMIT 3;
#> ^
#>
#>
#> [Error Id: a172e1a9-833c-4d89-9b83-de4e250cd1c5 on 192.168.1.3:31010]
whereas the Drill shell only works with a semicolon (without one, it expects a continuation of the query):
0: jdbc:drill:zk=local> SELECT full_name, salary FROM cp.`employee.json` LIMIT 3;
+--------------------+----------+
| full_name | salary |
+--------------------+----------+
| Sheri Nowmer | 80000.0 |
| Derrick Whelply | 40000.0 |
| Michael Spence | 40000.0 |
+--------------------+----------+
3 rows selected (0.102 seconds)
and when run through the web interface both are fine.
Looking at query.r, the whole query is passed via httr::POST, so the error seems to be a product of how the API interprets queries. I'm not sure if this is intended behavior (the Drill API docs don't use a semicolon, but the SQL docs do), but from an R perspective it seems like a good idea to either note this in ?drill_query or add query <- sub(";$", "", query) or the like to its code so it works regardless. Thoughts?
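A sketch of the semicolon-stripping option (its placement inside drill_query() before the POST call is my assumption, not how the package currently works):

```r
# Strip one trailing semicolon (plus any trailing whitespace) so queries
# work whether or not the user terminates them SQL-shell style.
query <- sub(";\\s*$", "", query)
```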
I am not sure if this is a bug or not yet. But I have a parquet table with columns of datatype BIGINT.
When querying using drill_connection(), they show up as col_integer(). I am not sure what the range of col_integer is, but if it is indeed 4 bytes then there will be data loss when querying BIGINT columns whose values exceed the integer min/max.
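For reference, col_integer() maps to R's native 32-bit signed integer, so values beyond ±(2^31 - 1) cannot survive that conversion; a quick check:

```r
# R's native integer is 32-bit signed
.Machine$integer.max
#> [1] 2147483647

# Coercing past that range yields NA with a warning
as.integer(2^31)
#> [1] NA

# Even doubles only represent integers exactly up to 2^53
2^53 == 2^53 + 1
#> [1] TRUE
```

So even falling back to numeric (double) only helps up to 2^53; true 64-bit support needs something like the bit64 package's integer64 class.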
Hi, I am testing sergeant against a MapR cluster. To my knowledge we should be using either 31010 or the zk port; 8047 is the drillbit management page in MapR.
It looks like sergeant can only use 8047, not 31010. Is this correct?
Hi,
I tried to use the dplyr interface and it seems to fail with "Error in readr::type_convert(out$rows): is.data.frame(df) is not TRUE"
The reprex is below:
#install.packages("sergeant")
library(sergeant)
#> Loading required package: DBI
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: dbplyr
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
library(tidyverse)
# drill server IP Address
drill_ip <- "xxx.xxx.xxx.xxx"
drill_jdbc_port <- "31010"
# Connecting to drill server
# src_drill: dplyr connection
# drill_connection: REST API connection
# drill_jdbc: JDBC connection (if remote, needs zk etc)
# Connect using dplyr
ds_dp <- src_drill(drill_ip)
tbl(ds_dp, "cp.`employee.json`")
#> # Source: table<cp.`employee.json`> [?? x 16]
#> # Database: DrillConnection
#> store_id gender department_id birth_date supervisor_id last_name
#> <dbl> <chr> <dbl> <date> <dbl> <chr>
#> 1 0 F 1 1961-08-26 0 Nowmer
#> 2 0 M 1 1915-07-03 1 Whelply
#> 3 0 M 1 1969-06-20 1 Spence
#> 4 0 F 1 1951-05-10 1 Gutierrez
#> 5 0 F 2 1942-10-08 1 Damstra
#> 6 0 F 3 1949-03-27 1 Kanagaki
#> 7 9 F 11 1922-08-10 5 Brunner
#> 8 21 F 11 1979-06-23 5 Blumberg
#> 9 0 M 5 1949-08-26 1 Stanz
#> 10 1 M 11 1967-06-20 5 Murraiin
#> # ... with more rows, and 10 more variables: position_title <chr>,
#> # hire_date <dttm>, management_role <chr>, salary <dbl>,
#> # marital_status <chr>, full_name <chr>, employee_id <dbl>,
#> # education_level <chr>, first_name <chr>, position_id <dbl>
tbl(ds_dp, "SELECT * FROM dfs.`c:/apps/apache-drill-1.14.0/sample-data/nation.parquet` LIMIT 5")
#> Error in readr::type_convert(out$rows): is.data.frame(df) is not TRUE
ds_rest <- drill_connection(drill_ip)
drill_query(ds_rest,"SELECT * FROM dfs.`c:/apps/apache-drill-1.14.0/sample-data/nation.parquet` LIMIT 5")
#> Parsed with column specification:
#> cols(
#> N_COMMENT = col_character(),
#> N_NAME = col_character(),
#> N_NATIONKEY = col_double(),
#> N_REGIONKEY = col_double()
#> )
#> # A tibble: 5 x 4
#> N_COMMENT N_NAME N_NATIONKEY N_REGIONKEY
#> * <chr> <chr> <dbl> <dbl>
#> 1 haggle. carefully f ALGERIA 0 0
#> 2 al foxes promise sly ARGENTINA 1 1
#> 3 y alongside of the p BRAZIL 2 1
#> 4 eas hang ironic, sil CANADA 3 1
#> 5 y above the carefull EGYPT 4 4
Created on 2019-01-09 by the reprex package (v0.2.1)
Hi! I read in your README that sergeant tools were not designed with authentication/encryption in mind due to the lack of need for support. We have been using sergeant at my work to run drill queries against our hdfs for a while, and are currently in the process of migrating to a new drill endpoint that requires authentication. We would love to continue using the sergeant library to run these queries, so if you have the ability to implement authentication we would greatly appreciate it!
Hi
I'm running Apache Drill in a docker container linked to a rocker/tidyverse container, which doesn't include Java.
Is there any reason why you use RJDBC rather than DBI?
https://issues.apache.org/jira/browse/DRILL-6662 makes it possible to use non-hardcoded creds so it finally makes sense to add some examples of how to query S3 data.
Add a schema parameter to src_drill() so folks can just use table identifiers in tbl()?
Hi,
In going through your code in
https://rud.is/rpubs/yelp.html#
I am unable to run this code:
tbl(db, "(SELECT b.name AS name, b.hours[5] AS sat_hrs FROM dfs.d.`/yelp/yelp_academic_dataset_business.json` b)") %>%
filter(!is.na(sat_hrs)) %>%
filter(grepl("Saturday", sat_hrs)) %>%
select(name, sat_hrs) %>%
collect() %>%
mutate(sat_hrs = gsub("Saturday ", "", sat_hrs)) %>%
tidyr::separate(sat_hrs, c("open", "close"), "-") %>%
filter(open != "0:0" & close != "0:0")
tbl(db, "(SELECT b.name AS name, b.hours[5] AS sat_hrs FROM dfs.d.`/yelp/yelp_academic_dataset_business.json` b)")
returns an error
Error: is.data.frame(df) is not TRUE
tbl does not work with a SQL statement.
Herman
I am eager to try out what is described in the blog post about the 0.8.0 version, but after having installed it as instructed
devtools::install_git("https://git.sr.ht/~hrbrmstr/sergeant", ref="0.8.0")
and then running the code below, I got a failure:
> library(sergeant)
> library(tidyverse)
── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0 ✔ purrr 0.2.5
✔ tibble 2.0.1 ✔ dplyr 0.7.8
✔ tidyr 0.8.2 ✔ stringr 1.3.1
✔ readr 1.3.1 ✔ forcats 0.3.0
── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
> # start Docker in terminal, then open
> dr <- drill_up(data_dir = here::here("data-raw"))
Drill container started. Waiting for the service to become active (this may take up to 30s).
Error in drill_up(data_dir = here::here("data-raw")) :
Could not connect to Drill container.
>
I have been able to get Drill running manually, mapping the local file system to my directory of interest.
On the Docker side I have an old MBP, so I have Docker Toolkit.
(That may be the reason for the failure.)
If I run the drill image from the command line I get the following:
> docker run drill/apache-drill (master|✚2✭)
Jan 27, 2019 2:25:49 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
Apache Drill 1.15.0
"The only truly happy people are children, the creative minority and Drill users."
0: jdbc:drill:zk=local> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
I tried the hello_world Docker example using the stevedore package and it works...so Docker (at least the easy bit) seems to work.
I then decided to dig into the drill_up implementation and executed it line by line to debug.
> docker <- stevedore::docker_client()
> (drill <- docker$container$run(image = "drill/apache-drill:1.15.0", name = "drill", cmd = "/usr/local/bin/bash", volumes = sprintf("%s:/data", here::here("data-raw"))))
E> Jan 27, 2019 3:01:03 PM org.jline.utils.Log logr
E> WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
E> Apache Drill 1.15.0
E> "Say hello to my little Drill."
E> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
O> 0: jdbc:drill:zk=local> <docker_run_output>
$container:
<docker_container>
id: 2db3d96f27f55d35017dd0361531b37e25c80fbbe9e01cf731654c15adb1a773
name: drill
$logs:
Jan 27, 2019 3:01:03 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
Apache Drill 1.15.0
"Say hello to my little Drill."
Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
0: jdbc:drill:zk=local>
>
Then the 30 attempts of drill_active fail...(here I just show one manual try :-) )
> (drill_con <- drill_connection("localhost"))
<Drill REST API Direct Connection to localhost:8047>
> drill_active(drill_con)
[1] FALSE
I am out of ideas for making it work...any direction is welcome.
You write
If there is sufficient desire for an automated downloader and a way to start the drill-embedded server from within R, please file an issue.
Consider the issue filed!
I think this would be great. I've been impressed by how RStudio's packages (blogdown, tinytex, keras, etc) ship with these mini-installers, and it's amazing how much time is saved by sidestepping external-to-R installation steps when I've introduced those tools to students.
I'm thinking through how the drill_up() interactive experience could be made a bit easier. That function returns a stevedore object, which the user is responsible for terminating (as it says in the fine documentation 😉). If the user doesn't capture the stevedore object coming back from the drill_up() call, they have to get() and then stop() the container manually, which could be further complicated if they also failed to capture the container ID from the drill_up() call.
Any thoughts on the best way of documenting a virtuous workflow? Can't protect the user completely here, but I want to make it a bit clearer for a user who doesn't have previous experience with stevedore. Perhaps a more verbose message when drill_up() is in interactive mode? A vignette? (To be clear, I'm considering writing the PR myself...not just adding things to your queue!)
This may or may not be motivated by this author's first stumblings through the stevedore interface. 😜
The column order for my table reference seems not to be respected.
The column order in the CSV files is
snapshot_id,altitude,heading,latitude,longitude,radar_id,speed,squawk
But I would like to see them as:
snapshot_id longitude latitude altitude heading speed radar_id squawk
I tried the following SELECT
in the SQL prompt (it worked):
0: jdbc:drill:> SELECT snapshot_id, longitude, latitude, altitude, heading, speed, CAST(REGEXP_REPLACE(`squawk`, '\r', '') AS int) AS `squawk`, radar_id FROM dfs.fr24.`/*.csv` LIMIT 20;
+--------------+------------+-----------+-----------+----------+--------+---------+-----------+
| snapshot_id | longitude | latitude | altitude | heading | speed | squawk | radar_id |
+--------------+------------+-----------+-----------+----------+--------+---------+-----------+
| 1486340102 | 14.27592 | 50.10523 | 0 | 40 | 0 | 0 | 8520 |
| 1486340568 | 14.27612 | 50.10521 | 0 | 58 | 0 | 0 | 8520 |
| 1486340668 | 14.27592 | 50.10512 | 0 | 247 | 0 | 0 | 8520 |
| 1486340773 | 14.27569 | 50.10515 | 0 | 232 | 0 | 0 | 8520 |
| 1486340828 | 14.27578 | 50.10507 | 0 | 77 | 0 | 0 | 8520 |
| 1486340903 | 14.27590 | 50.10528 | 0 | 41 | 0 | 0 | 8520 |
| 1486341363 | 14.27594 | 50.10523 | 0 | 48 | 0 | 0 | 8520 |
| 1486341828 | 14.27573 | 50.10522 | 0 | 241 | 0 | 0 | 8520 |
| 1486342643 | 14.27581 | 50.10522 | 0 | 120 | 0 | 0 | 8520 |
| 1486342870 | 14.27571 | 50.10522 | 0 | 255 | 0 | 0 | 8520 |
| 1486343421 | 14.27588 | 50.10524 | 0 | 63 | 0 | 0 | 8520 |
| 1486343966 | 14.27568 | 50.10509 | 0 | 312 | 0 | 0 | 8520 |
| 1486344251 | 14.27590 | 50.10522 | 0 | 119 | 0 | 0 | 8520 |
| 1486345301 | 14.27586 | 50.10524 | 0 | 294 | 0 | 0 | 8520 |
| 1486345875 | 14.27606 | 50.10507 | 0 | 123 | 0 | 0 | 8520 |
| 1486345931 | 14.27627 | 50.10509 | 0 | 132 | 0 | 0 | 8520 |
| 1486345961 | 14.27619 | 50.10529 | 0 | 339 | 0 | 0 | 8520 |
| 1486345981 | 14.27595 | 50.10538 | 0 | 297 | 0 | 0 | 8520 |
| 1486346091 | 14.27594 | 50.10522 | 0 | 120 | 0 | 0 | 8520 |
| 1486347131 | 14.27592 | 50.10543 | 0 | 12 | 0 | 0 | 8520 |
+--------------+------------+-----------+-----------+----------+--------+---------+-----------+
20 rows selected (9.943 seconds)
But when I use the same SELECT as the definition of the table reference with sergeant/dplyr, I only get the column order of the original CSV files (even if I later try to reorder them with dplyr).
Here is the sequence of commands in R:
> db <- src_drill("localhost")
> fr24pos <- tbl(db, "(SELECT snapshot_id, longitude, latitude, altitude, heading, speed, CAST(REGEXP_REPLACE(`squawk`, '\r', '') AS int) AS `squawk`, radar_id FROM dfs.fr24.`/*.csv`)")
> fr24pos
# Source: table<(SELECT snapshot_id, longitude, latitude, altitude, heading, speed,
', '') AS int) AS `squawk`, radar_id FROM
# dfs.fr24.`/*.csv`)> [?? x 8]
# Database: DrillConnection
altitude squawk snapshot_id heading latitude radar_id speed longitude
<int> <int> <int> <int> <dbl> <int> <int> <dbl>
1 0 0 1486340102 40 50.1 8520 0 14.3
2 0 0 1486340568 58 50.1 8520 0 14.3
3 0 0 1486340668 247 50.1 8520 0 14.3
4 0 0 1486340773 232 50.1 8520 0 14.3
5 0 0 1486340828 77 50.1 8520 0 14.3
6 0 0 1486340903 41 50.1 8520 0 14.3
7 0 0 1486341363 48 50.1 8520 0 14.3
8 0 0 1486341828 241 50.1 8520 0 14.3
9 0 0 1486342643 120 50.1 8520 0 14.3
10 0 0 1486342870 255 50.1 8520 0 14.3
# ... with more rows
> fr24pos %>% filter(altitude > 1500) %>% head()
# Source: lazy query [?? x 8]
# Database: DrillConnection
altitude squawk snapshot_id heading latitude radar_id speed longitude
<int> <int> <int> <int> <dbl> <int> <int> <dbl>
1 39000 30276 1486339252 114 37.9 230 506 44.0
2 39000 30276 1486339315 114 37.8 230 509 44.2
3 38975 20836 1486339327 114 37.8 230 509 44.2
4 39025 20836 1486339361 114 37.8 230 508 44.2
5 39000 20836 1486339452 114 37.7 230 505 44.6
6 39000 20836 1486339513 131 37.6 230 503 44.7
> fr24pos %>% select(snapshot_id, longitude, latitude, heading, speed, everything()) %>% filter(altitude > 1500) %>% head()
# Source: lazy query [?? x 8]
# Database: DrillConnection
altitude squawk snapshot_id heading latitude radar_id speed longitude
<int> <int> <int> <int> <dbl> <int> <int> <dbl>
1 39000 30276 1486339252 114 37.9 230 506 44.0
2 39000 30276 1486339315 114 37.8 230 509 44.2
3 38975 20836 1486339327 114 37.8 230 509 44.2
4 39025 20836 1486339361 114 37.8 230 508 44.2
5 39000 20836 1486339452 114 37.7 230 505 44.6
6 39000 20836 1486339513 131 37.6 230 503 44.7
I file it here, but maybe it could be an issue with dplyr or Apache Drill...
Example CSV file in attachment (extension in .txt to get into the issue):
20170206_207761844.csv.txt
PS: I have to explicitly remove "\r" from squawk (the last field in the CSV) to deal with line endings...probably there is a config for that...
PPS: fr24 is a workspace of dfs with CSV files with headers; I followed the "Using Apache Drill with R" recipes to get up and running: THANKS!
Current plan is to use the new metadata field, deliberately convert result sets using the specified field types, and provide the following warning (once per R session) whenever BIGINT is encountered:
Warning message:
One or more columns are of type BIGINT. The sergeant package is in the process of switching to the use of the rapidjsonr package in an effort to provide support for this data type. Until then, BIGINT columns will still be converted to numeric since that's how jsonlite::fromJSON() works. If you really need BIGINT/integer64 support, consider using the R ODBC interface to Apache Drill with the MapR ODBC drivers.
This informational warning will only be shown once per R session.
This will go away once #26 is done.
I could PR a fix for this, but I'm not sure how much it has to do with package version compatibility and how much other refactoring is required related to dbplyr/dplyr updates. Currently using CRAN version of everything.
library(sergeant)
ncdr <- src_drill(host = "localhost", port=22023L)
# The connection appears to work
ncdr
#> src: DrillConnection
#> tbls: INFORMATION_SCHEMA, cp.default, dfs.default, dfs.root, dfs.tmp, sys
#But I can't do a query
snaps <- drill_query(ncdr, "SELECT * FROM INFORMATION_SCHEMA")
#> Error: length(url) == 1 is not TRUE
# It looks like there's no URL to send the query to
sergeant:::make_server(ncdr)
#> character(0)
#This seems to be because `make_server` doesn't get the host url correctly
sergeant:::make_server
#> function (drill_con)
#> {
#> sprintf("%s://%s:%s", ifelse(drill_con$ssl[1], "https", "http"),
#> drill_con$host, drill_con$port)
#> }
#> <environment: namespace:sergeant>
ncdr$host
#> NULL
ncdr$con@host
#> [1] "localhost"
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.3.2 (2016-10-31)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz <NA>
#> date 2017-07-23
#> Packages -----------------------------------------------------------------
#> package * version date source
#> assertthat 0.2.0 2017-04-11 cran (@0.2.0)
#> backports 1.1.0 2017-05-22 cran (@1.1.0)
#> base * 3.3.2 2016-11-01 local
#> bindr 0.1 2016-11-13 CRAN (R 3.3.2)
#> bindrcpp 0.2 2017-06-17 CRAN (R 3.3.2)
#> colorspace 1.3-2 2016-12-14 cran (@1.3-2)
#> curl 2.8.1 2017-07-21 CRAN (R 3.3.2)
#> datasets * 3.3.2 2016-11-01 local
#> DBI * 0.7 2017-06-18 CRAN (R 3.3.2)
#> dbplyr * 1.1.0 2017-06-27 CRAN (R 3.3.2)
#> devtools 1.13.2 2017-06-02 CRAN (R 3.3.2)
#> digest 0.6.12 2017-01-27 CRAN (R 3.3.1)
#> dplyr * 0.7.2 2017-07-20 CRAN (R 3.3.2)
#> evaluate 0.10 2016-10-11 CRAN (R 3.3.1)
#> glue 1.1.1 2017-06-21 CRAN (R 3.3.2)
#> graphics * 3.3.2 2016-11-01 local
#> grDevices * 3.3.2 2016-11-01 local
#> hms 0.3 2016-11-22 cran (@0.3)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.3.2)
#> httr 1.2.1 2016-07-03 CRAN (R 3.3.2)
#> jsonlite 1.5 2017-06-01 CRAN (R 3.3.2)
#> knitr 1.16 2017-05-18 cran (@1.16)
#> magrittr 1.5 2014-11-22 CRAN (R 3.1.2)
#> memoise 1.0.0.9001 2016-12-13 Github (hadley/memoise@e392c7b)
#> methods * 3.3.2 2016-11-01 local
#> munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.3.2)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.3.0)
#> purrr 0.2.2.2 2017-05-11 CRAN (R 3.3.2)
#> R6 2.2.2 2017-06-17 CRAN (R 3.3.2)
#> Rcpp 0.12.12 2017-07-15 CRAN (R 3.3.2)
#> readr 1.1.1 2017-05-16 CRAN (R 3.3.2)
#> rlang 0.1.1 2017-05-18 cran (@0.1.1)
#> rmarkdown 1.3.9002 2017-02-06 Github (rstudio/rmarkdown@c4908dc)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.3.1)
#> scales 0.4.1 2016-11-09 cran (@0.4.1)
#> sergeant * 0.5.2 2017-07-17 CRAN (R 3.3.2)
#> stats * 3.3.2 2016-11-01 local
#> stringi 1.1.5 2017-04-07 cran (@1.1.5)
#> stringr 1.2.0 2017-02-18 cran (@1.2.0)
#> tibble 1.3.3 2017-05-28 CRAN (R 3.3.2)
#> tools 3.3.2 2016-11-01 local
#> utils * 3.3.2 2016-11-01 local
#> withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
#> yaml 2.1.14 2016-11-12 CRAN (R 3.3.1)
Drill Jira: https://issues.apache.org/jira/browse/DRILL-6847
Package Issue Ref: #18 (comment)
Tested under 0.8.0 branch
It looks like the src_drill() class doesn't support the dbQuoteIdentifier() method. This keeps the super nifty glue::glue_sql() function from being as useful as it could be.
library(sergeant)
con <- src_drill()
fields_to_select <- c("first_name", "last_name")
glue::glue_sql("SELECT {`fields_to_select`*} FROM cp.`employee.json` LIMIT 20", .con = con)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dbQuoteIdentifier' for signature '"src_drill", "character"'
Created on 2019-01-10 by the reprex package (v0.2.1)
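One possible shape for such a method, purely a sketch on my part (it assumes the DrillConnection S4 class shown in str() output elsewhere in this tracker, and quotes with backticks since that is what Drill uses):

```r
library(DBI)

# Hypothetical method: quote Drill identifiers with backticks,
# escaping any embedded backticks by doubling them.
setMethod(
  "dbQuoteIdentifier",
  signature("DrillConnection", "character"),
  function(conn, x, ...) {
    DBI::SQL(paste0("`", gsub("`", "``", x, fixed = TRUE), "`"))
  }
)
```

Note the error signature mentions "src_drill", so glue_sql() would likely also need to be handed the underlying connection (con$con) rather than the src_drill wrapper.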
Hi! I saw your posts about this on Twitter and it looked really cool, so I thought I'd try it out. However, that involved another bout in the battle to get rJava to compile from source (which I appear to have won this time!), which led me to a sergeant installation error:
Error : object ‘build_sql’ is not exported by 'namespace:dplyr'
which is true for devel-dplyr, as it's been moved to dbplyr. (It's actually the case for most of the database-focused dplyr imports; build_sql is just the first it runs into.) Reinstalling CRAN-dplyr allows sergeant to be installed, but it still can't be used with devel-dplyr, which is a shame, as the update contains a lot of useful features.
According to the blog post about the dplyr 0.6 release candidate, it was to be submitted to CRAN yesterday. I'm not sure if that happened or quite when it will arrive on the mirrors, but for sergeant to be usable at that point, sergeant/R/sergeant-package.r will need to point to dbplyr.
If you want to update it now, I'm happy to make a PR if you want, though it's not a very complicated change, really.
Are dplyr idioms expected to work if the src of a tbl() call is a Drill SELECT statement instead of a bare table name? It looks like column name evaluation isn't working as expected in these cases. A reprex of the behavior I see under v0.8.0 is below:
library(sergeant)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
ds <- src_drill()
emp_full <- tbl(ds, "cp.`employee.json`")
emp_full %>% select(full_name)
#> # Source: lazy query [?? x 1]
#> # Database: DrillConnection
#> full_name
#> <chr>
#> 1 Sheri Nowmer
#> 2 Derrick Whelply
#> 3 Michael Spence
#> 4 Maya Gutierrez
#> 5 Roberta Damstra
#> 6 Rebecca Kanagaki
#> 7 Kim Brunner
#> 8 Brenda Blumberg
#> 9 Darren Stanz
#> 10 Jonathan Murraiin
#> # ... with more rows
emp_partial <- tbl(ds, "SELECT full_name from cp.`employee.json`") #edits? we don't need no stinkin' edits!
emp_partial %>% select(full_name)
#> Error in .f(.x[[i]], ...): object 'full_name' not found
Created on 2019-01-18 by the reprex package (v0.2.1)
I had installed the sergeant package but could not figure out how to use it.
Meanwhile, strange errors started appearing, mostly when I am inside the debugger and press Q: I was getting sergeant-related errors even though my source file contained no sergeant code at all. So I decided to uninstall it.
Even after uninstalling the package following warning comes on sourcing my R file.
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘sergeant’
And to top it off, now on entering the Q command in the RStudio debugger the sergeant package is loaded, twice. Where is the package coming from? And how is it called by just the Q command?
I'm running into an issue where session settings don't seem to stick when they're submitted through sergeant. The problem seems to be with the REST API itself, not with sergeant.
Here's an example:
# Packages
library(sergeant)
#> Loading required package: DBI
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: dbplyr
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
# Connect to Drill.
con <- drill_connection()
# Alternatively with drill_use.
drill_use(con, "dfs.tmp")
#> Parsed with column specification:
#> cols(
#> summary = col_character(),
#> ok = col_character()
#> )
#> Default schema changed to [dfs.tmp]true
drill_query(con, "CREATE TABLE employee as SELECT * FROM cp.`employee.json`")
#> Query ==> CREATE TABLE employee as SELECT * FROM cp.`employee.json`
#> VALIDATION ERROR: Root schema is immutable. Creating or dropping tables/views is not allowed in root schema.Select a schema using 'USE schema' command.
#>
#>
#> [Error Id: d4d341b3-07da-4562-813e-67f0ef89c028 on DESKTOP-73M5KKL.mshome.net:31010]
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.3 (2017-11-30)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.1252
#> tz America/New_York
#> date 2018-03-01
#> Packages -----------------------------------------------------------------
#> package * version date source
#> assertthat 0.2.0 2017-04-11 CRAN (R 3.4.3)
#> backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
#> base * 3.4.3 2017-12-06 local
#> bindr 0.1 2016-11-13 CRAN (R 3.4.3)
#> bindrcpp 0.2 2017-06-17 CRAN (R 3.4.3)
#> colorspace 1.3-2 2016-12-14 CRAN (R 3.4.3)
#> compiler 3.4.3 2017-12-06 local
#> curl 3.1 2017-12-12 CRAN (R 3.4.3)
#> datasets * 3.4.3 2017-12-06 local
#> DBI * 0.7 2017-06-18 CRAN (R 3.4.3)
#> dbplyr * 1.2.1 2018-02-19 CRAN (R 3.4.3)
#> devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
#> digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
#> dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.3)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.3)
#> glue 1.2.0 2017-10-29 CRAN (R 3.4.3)
#> graphics * 3.4.3 2017-12-06 local
#> grDevices * 3.4.3 2017-12-06 local
#> hms 0.4.1 2018-01-24 CRAN (R 3.4.3)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.3)
#> httr 1.3.1 2017-08-20 CRAN (R 3.4.3)
#> jsonlite 1.5 2017-06-01 CRAN (R 3.4.3)
#> knitr 1.20 2018-02-20 CRAN (R 3.4.3)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.3)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.3)
#> methods * 3.4.3 2017-12-06 local
#> munsell 0.4.3 2016-02-13 CRAN (R 3.4.3)
#> pillar 1.1.0 2018-01-14 CRAN (R 3.4.3)
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.3)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.4.3)
#> purrr 0.2.4 2017-10-18 CRAN (R 3.4.3)
#> R6 2.2.2 2017-06-17 CRAN (R 3.4.3)
#> Rcpp 0.12.15 2018-01-20 CRAN (R 3.4.3)
#> readr 1.1.1 2017-05-16 CRAN (R 3.4.3)
#> rlang 0.2.0 2018-02-20 CRAN (R 3.4.3)
#> rmarkdown 1.8 2017-11-17 CRAN (R 3.4.3)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
#> scales 0.5.0 2017-08-24 CRAN (R 3.4.3)
#> sergeant * 0.5.2 2017-07-17 CRAN (R 3.4.3)
#> stats * 3.4.3 2017-12-06 local
#> stringi 1.1.6 2017-11-17 CRAN (R 3.4.2)
#> stringr 1.3.0 2018-02-19 CRAN (R 3.4.3)
#> tibble 1.4.2 2018-01-22 CRAN (R 3.4.3)
#> tools 3.4.3 2017-12-06 local
#> utils * 3.4.3 2017-12-06 local
#> withr 2.1.1 2017-12-19 CRAN (R 3.4.3)
#> yaml 2.1.16 2017-12-12 CRAN (R 3.4.3)
One workaround is to reference `cp.tmp.employee` directly when creating the table, but this makes `drill_use()` not very useful.
Created on 2018-03-01 by the reprex package (v0.2.0).
It would be nice if the sergeant package could create new tables, as I can do with other database drivers. Something like this:
library(sergeant)
library(dplyr)
conn <- dbConnect(Drill(), "localhost")
# Load original data
original_data <- tbl(conn, "`dfs.downloads`.`original_data`")
# Do some operation
edited_data <- original_data %>%
  filter(name == "John")
# Write the result to a table without loading it into R
dbWriteTable(conn, name = "`dfs.downloads`.`edited_data`", value = edited_data)
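In the meantime, since Apache Drill itself supports `CREATE TABLE AS` (CTAS) into writable workspaces such as `dfs.tmp`, one possible stopgap is to issue the CTAS statement yourself through `drill_query()`. This is only a sketch under stated assumptions: it assumes the target workspace is writable and that the `SELECT` body matches what the dplyr pipeline would generate (the table and workspace names here are illustrative):

```r
library(sergeant)

con <- drill_connection("localhost")

# Sketch only: hand-write the SELECT that the dplyr pipeline represents,
# then wrap it in a CTAS. Assumes dfs.tmp is a writable workspace.
sub_query <- "SELECT * FROM dfs.downloads.`original_data` WHERE name = 'John'"
drill_query(con, sprintf(
  "CREATE TABLE dfs.tmp.`edited_data` AS %s",
  sub_query
))
```

This keeps the data inside Drill rather than round-tripping it through R, which is the behaviour the proposed `dbWriteTable()` support would presumably formalize.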
I'm not sure exactly what changed, but some version bump somewhere means I'm now getting errors when using the dplyr interface, because it's putting double quotes around the table identifier. Reprex:
> library(sergeant)
> tbl(src_drill('localhost'), 'cp.`employee.json`')
Request failed [500]. Retrying in 1 seconds...
Request failed [500]. Retrying in 2 seconds...
Request failed [500]. Retrying in 1.8 seconds...
Request failed [500]. Retrying in 1.7 seconds...
# Source: table<cp.`employee.json`> [?? x 8]
# Database: DrillConnection
Warning message:
PARSE ERROR: Encountered "FROM \"" at line 1, column 1.
SQL Query: SELECT *
FROM "cp.`employee.json`"
^
LIMIT 11
Original Query:
1: SELECT *
2: FROM "cp.`employee.json`"
3: LIMIT 11
Query Profile Error Link:
http://localhost:8047/profiles/772b3c22-8d04-43d5-b733-9e2ae0bcbb50
> con <- drill_connection('localhost')
If I try to run that query directly, it fails:
> drill_query(
+ con,
+ 'SELECT *
+ FROM "cp.`employee.json`"
+ LIMIT 11'
+ )
|======================================================================| 100%
Request failed [500]. Retrying in 1.4 seconds...
|======================================================================| 100%
Request failed [500]. Retrying in 1.9 seconds...
|======================================================================| 100%
Query ==> SELECT * FROM "cp.`employee.json`" LIMIT 11
PARSE ERROR: Encountered "FROM \"" at line 1, column 6.
SQL Query: SELECT *
FROM "cp.`employee.json`"
^
LIMIT 11
[Error Id: dc495019-3219-4b88-b5f8-9a4fd657dc0f ]
whereas if I remove the double quotes around cp.`employee.json`, it works:
> drill_query(
+ con,
+ 'SELECT *
+ FROM cp.`employee.json`
+ LIMIT 11'
+ )
|======================================================================| 100%
# A tibble: 11 x 16
employee_id full_name first_name last_name position_id position_title
<dbl> <chr> <chr> <chr> <dbl> <chr>
1 1 Sheri No… Sheri Nowmer 1 President
2 2 Derrick … Derrick Whelply 2 VP Country Ma…
3 4 Michael … Michael Spence 2 VP Country Ma…
4 5 Maya Gut… Maya Gutierrez 2 VP Country Ma…
5 6 Roberta … Roberta Damstra 3 VP Informatio…
6 7 Rebecca … Rebecca Kanagaki 4 VP Human Reso…
7 8 Kim Brun… Kim Brunner 11 Store Manager
8 9 Brenda B… Brenda Blumberg 11 Store Manager
9 10 Darren S… Darren Stanz 5 VP Finance
10 11 Jonathan… Jonathan Murraiin 11 Store Manager
11 12 Jewel Cr… Jewel Creek 11 Store Manager
# … with 10 more variables: store_id <dbl>, department_id <dbl>,
# birth_date <chr>, hire_date <chr>, salary <dbl>, supervisor_id <dbl>,
# education_level <chr>, marital_status <chr>, gender <chr>,
# management_role <chr>
Results are the same in the Drill shell. I assume this is a result of a version or setting changing somewhere, but so far I can't locate it. What I'm running:
| software     | version |
|--------------|---------|
| macOS        | 10.14.6 |
| Apache Drill | 1.18.0  |
| R            | 4.0.3   |
| sergeant     | 0.9.0   |
| dbplyr       | 2.0.0   |
| dplyr        | 1.0.2   |
Happy to write a PR to fix this if this is something that can be fixed here, but so far I can't figure out what has to change.
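One possible culprit, given the versions above, is that dbplyr 2.x changed how identifiers are quoted, falling back to ANSI double quotes unless the backend says otherwise, while Drill expects backticks. Purely as a speculative, untested sketch of where a fix might live (the method dispatch on `DrillConnection` is an assumption about sergeant's internals):

```r
library(DBI)

# Speculative sketch: Drill quotes identifiers with backticks, not ANSI
# double quotes. A dbQuoteIdentifier method for DrillConnection along
# these lines might restore the old behaviour. NOTE: table references
# like cp.`employee.json` already contain backticks, so a real fix
# would need to handle pre-quoted, multi-part identifiers too.
setMethod(
  "dbQuoteIdentifier",
  signature("DrillConnection", "character"),
  function(conn, x, ...) {
    x <- gsub("`", "``", x, fixed = TRUE)  # escape embedded backticks
    DBI::SQL(paste0("`", x, "`"))
  }
)
```

If the real issue is in dbplyr's generated SQL rather than sergeant's DBI methods, the fix would instead belong in sergeant's dbplyr backend translations.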
Hello, and thanks for the package! However, I can't find a way to install it on my system from GitHub:
> devtools::install_github("hrbrmstr/sergeant")
Downloading GitHub repo hrbrmstr/sergeant@master
Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
[tcl] bad pad value "2m": must be positive screen distance.
I can use it from CRAN, however:
> installed.packages() %>% as_tibble() %>% filter(Package == "sergeant") %>% flatten() %>% .$Version
[1] "0.5.2"
> library(sergeant)
> db <- src_drill("localhost")
> drill_connection() %>% drill_version()
[1] "1.16.0"
This is my sessionInfo:
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nvimcom_0.9-82 colorout_1.2-1 usethis_1.5.0 devtools_2.0.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 magrittr_1.5 pkgload_1.0.2 R6_2.4.0
[5] rlang_0.3.4 tcltk_3.6.0 tools_3.6.0 pkgbuild_1.0.3
[9] parallel_3.6.0 sessioninfo_1.1.1 cli_1.1.0 withr_2.1.2
[13] remotes_2.0.4 assertthat_0.2.1 rprojroot_1.3-2 digest_0.6.18
[17] crayon_1.3.4 processx_3.3.1 callr_3.2.0 fs_1.3.1
[21] ps_1.3.0 curl_3.3 testthat_2.1.1 memoise_1.1.0
[25] glue_1.3.1 compiler_3.6.0 desc_1.2.0 backports_1.1.4
[29] prettyunits_1.0.2
Thanks for providing the sergeant package!
My use case is reading a single column of data from a fairly large Parquet file. The column I want to read is called occurrenceId, and the whole column fits into a character vector of length 70M in R, where each value is a string of 41 characters (it is a unique identifier, and I'd like to check it for uniqueness and presence in R). In R the whole column occupies about 700M in memory when I inspect it with `ls()`.
I can do this with sparklyr, but sergeant's approach is appealing, being more lightweight. I am struggling with an out-of-memory issue, though. I have 16 GB available, which I think should suffice given the size of the vector in R, so now I'm wondering if this use case is supported or if I'm doing it wrong.
The dataset I'm using is public and can be viewed here https://www.gbif.org/dataset/38b4c89f-584c-41bb-bd8f-cd1def33e92f and it can be downloaded from here in .zip format: http://www.gbif.se/ipt/archive.do?r=artdata
I first tried using vroom and reading directly from the compressed .zip file (details here: tidyverse/vroom#116), but the promise of reading from a comparatively smaller Parquet file, and of reading only the columns I need, made me turn to sergeant. In my attempt to read the Parquet file, I first converted the .zip to Parquet using sparklyr, like so:
library(sparklyr)
library(dplyr)
# first install Spark 2.4.0 / Hadoop 2.7 with sparklyr::spark_install()
Sys.setenv("SPARK_MEM" = "12g")
config <- spark_config()
config$`sparklyr.shell.driver-memory` <- '12G'
config$`sparklyr.shell.executor-memory` <- '4G'
config$sparklyr.defaultPackages <- "com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3"
config$spark.cassandra.cassandra.host <- "localhost"
config$spark.driver.maxResultSize <- "4G"
config$spark.executor.cores <- 3
# is pushdown option TRUE?
sc <- spark_connect(master = "local", config = config)
# for this connection, load all records
system.time(
spark_read_csv(sc, memory = FALSE,
name = "artdata", path = "file:///home/roger/artdata/artdata.tsv", delimiter = "\t")
)
#user system elapsed
#6.154 7.060 1559.874
# generate a parquet file based on the dataframe above
system.time(
spark_write_parquet(
tbl(sc, "artdata"),
"file:///home/roger/artdata/artdata.parquet")
)
#user system elapsed
#14.634 16.586 3816.375
# the parquet-file is 3.8 GB on disk, smaller than the zip
spark_tbl_handle <- spark_read_parquet(sc, memory = FALSE,
"artdata", "file:///home/roger/artdata/artdata.parquet")
has_valid_bor <- function() {
bor <-
spark_tbl_handle %>%
count(basisOfRecord) %>%
collect() %>%
mutate(is_ok = basisOfRecord %in% c(
"humanobservation",
"machineobservation"
)
)
bor %>% pull(is_ok) %>% all
}
n_rowcount <- function() {
spark_tbl_handle %>%
summarise(n = n()) %>%
pull(n)
}
has_valid_id <- function() {
ids <-
spark_tbl_handle %>%
count(occurrenceID) %>%
filter(n > 1 | is.na(occurrenceID)) %>%  # duplicated OR missing ids
collect()
nrow(ids) == 0
}
system.time(
has_valid_bor()
)
system.time(
has_valid_id()
)
system.time(
n_rowcount()
)
sort_artdata <- function() {
spark_tbl_handle %>%
arrange(occurrenceID) %>%
head(10) %>%
collect()
}
system.time(
sort_artdata()
)
# sorting in spark takes about 5 minutes...
#user system elapsed
#3.182 1.370 282.698
This gives me a Parquet file on disk. I then attempt to use sergeant to read the occurrenceId column, like so:
library(sergeant)
library(tidyverse)
if (Sys.which("docker") == "")
stop("Please install docker first - see https://docs.docker.com/install/")
# install and run official Apache Drill software
system("docker stop drill; docker rm drill; docker run -i --name drill -e DRILL_HEAP=10G -v /home/markus/tmp/artdata:/tmp -p 8047:8047 --detach drill/apache-drill:1.16.0 /bin/bash")
dc <- drill_connection("localhost")
df <- drill_query(dc, "SELECT occurrenceId FROM dfs.`/tmp/artdata.parquet`")
The error message that I get is the following:
|===========================================================================| 100%
Query ==> SELECT occurrenceId FROM dfs.`/tmp/artdata.parquet`
RESOURCE ERROR: There is not enough heap memory to run this query using the web interface.
Please try a query with fewer columns or with a filter or limit condition to limit the data returned.
You can also try an ODBC/JDBC client.
[Error Id: f42e398d-0afb-4931-a800-2a56baaa074c ]
I tried setting DRILL_HEAP to 10G. Is this use case supported by sergeant? Any advice on how I should proceed?
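For what it's worth, the error itself suggests a filter or limit condition: the REST interface has to hold the whole result set in Drill's web-server heap, so pulling the column in slices may sidestep the problem. A rough sketch (chunk size is an illustrative assumption, and `LIMIT`/`OFFSET` without an `ORDER BY` is not guaranteed to be deterministic across queries):

```r
library(sergeant)

dc <- drill_connection("localhost")

# Sketch: fetch the column in slices so no single REST response has to
# hold all 70M values in Drill's web-server heap at once.
chunk_size <- 1e6
offset <- 0
ids <- character(0)
repeat {
  chunk <- drill_query(dc, sprintf(
    "SELECT occurrenceId FROM dfs.`/tmp/artdata.parquet` LIMIT %d OFFSET %d",
    chunk_size, offset
  ))
  if (nrow(chunk) == 0) break
  ids <- c(ids, chunk$occurrenceId)
  offset <- offset + chunk_size
}
```

The error message's other suggestion, an ODBC/JDBC client, avoids the REST heap limit entirely and may be the more robust route for a 70M-row column.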
Under sergeant master (20190106), csvh files read over DBI (`src_drill()`) are coming across as character columns, while the same file parsed under CRAN v0.5.0 sergeant is properly coerced into R data types.
Apologies for the lack of a reprex here. I'm in the thick of a Drill-powered project and wanted to throw this early report up in case it is related to the rapidjsonr conversion progress and the new dev branches. If previously unknown, I'll generate a reprex when my hair is a bit less on fire. ;)
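Until this is tracked down, one possible stopgap (an assumption about a workaround, not the package's intended behaviour; the file path here is hypothetical) is to coerce the collected character columns with `readr::type_convert()`:

```r
library(sergeant)
library(dplyr)
library(readr)

# Hypothetical csvh source; substitute your own workspace/path.
db <- tbl(src_drill("localhost"), "dfs.`/data/example.csvh`")

res <- db %>%
  collect() %>%          # bring the character columns into R
  type_convert()         # re-guess column types from the values
```

This restores numeric/date columns after the fact, at the cost of a second parsing pass over the collected data.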
Hopefully, if you are running into issues with this, you'll see this issue and add your comments here.
Sufficient interest in moving the JDBC bits to a sibling package will make that happen.