Comments (9)
New version on CRAN has this function.
from qs.
Update: I was able to speed up my function, in case anyone is interested:
load_qs <- function(url) qs::qdeserialize(curl::curl_fetch_memory(url)$content)
from qs.
It is a good idea, but CRAN doesn't allow using R connections directly within C code. Glad you found a workaround!
from qs.
Ah, dang CRAN. Before you replied, I found what readRDS is actually doing. It should be the code block given below.
I assume it's a CRAN exception for base R
from qs.
Is there any update on allowing qs::qread to read URLs? Wrapping load_qs inside qs::qread would help a lot.
from qs.
@zecojls Sure it could be put in for a next update, just would like to think about how it looks.
Could you help me prototype this? Here are my thoughts:
I'd prefer not to have curl as a strict dependency (just to keep requirements at an absolute minimum). Is there a base-R option that's just as performant?
I'm thinking it should be in a separate function such as qread_url, because qread is auto-generated by Rcpp (linking to the C++ code).
from qs.
I was just googling about it and found this qs_from_url function in the nflverse package. I agree that avoiding dependencies is good, but I think curl is pretty active and well-maintained.
from qs.
curl is great, but it has a system libcurl-dev requirement, which presents a challenge, e.g. if you're on a Linux workstation where you don't have admin privileges.
So I'm considering two options. One is to use curl and add it as a suggested dependency:
qread_url <- function(url, ...) {
  # curl is only a suggested dependency, so check for it at call time
  if (requireNamespace("curl", quietly = TRUE)) {
    qs::qdeserialize(curl::curl_fetch_memory(url)$content, ...)
  } else {
    stop("qread_url requires curl installed")
  }
}
Or some base R solution such as:
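qread_url <- function(url, ...) {
  con <- base::url(url, open = "rb")
  on.exit(close(con))
  buffer_size <- 10000
  chunks <- list()
  repeat {
    # read up to buffer_size bytes; an empty result means end of stream
    x <- readBin(con, what = "raw", n = buffer_size)
    if (length(x) == 0) break
    chunks[[length(chunks) + 1]] <- x
  }
  qs::qdeserialize(do.call(c, chunks), ...)
}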
from qs.
Well, they are pretty much the same, I think (it depends on the internet connection). Reading a 13 MB file from Google Cloud Storage took me around 3 sec in both modes. I think that sticking to base R is great, but I'm not sure how it deals with larger files that exceed the chunk size. Unfortunately, I have no idea how to iteratively download the chunks and append them.
library("qs")
library("curl")
library("tictoc")
options(timeout=240)
qread_url_curl <- function(url, ...) {
  # requireNamespace() checks for curl without attaching it
  if (!requireNamespace("curl", quietly = TRUE)) {
    stop("qread_url requires curl installed")
  } else {
    qs::qdeserialize(curl::curl_fetch_memory(url)$content, ...)
  }
}
qread_url_base <- function(url, ...) {
  con <- file(url, "rb", raw = TRUE)
  on.exit(close(con))
  buffer_size <- 2^31-1 # limit from readBin help; the whole file is read in one call
  x <- readBin(con, what = "raw", n = buffer_size)
  qs::qdeserialize(x, ...) # forward ... as the curl version does
}
target.url <- "https://storage.googleapis.com/soilspec4gg-test/test.qs"
# 2.993 sec
tic()
test1 <- qread_url_curl(target.url)
toc()
# 2.991
tic()
test2 <- qread_url_base(target.url)
toc()
from qs.
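On the open question of how to download the chunks and append them: below is a minimal sketch, not the package's implementation. The helper name read_all_raw and its chunk_size argument are hypothetical. To keep the example runnable offline, a temp file stands in for the URL connection and base R's serialize/unserialize stand in for qs; the assumption is that the same loop would work on a blocking connection opened with url(..., open = "rb"), with qs::qdeserialize applied to the result.

```r
# Hypothetical helper: read every byte from an open binary connection
# in fixed-size chunks and return them as one raw vector.
read_all_raw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    x <- readBin(con, what = "raw", n = chunk_size)
    if (length(x) == 0L) break            # zero bytes back means end of stream
    chunks[[length(chunks) + 1L]] <- x    # collect chunks, combine once at the end
  }
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Offline check: a temp file stands in for the remote file (assumption).
obj <- list(a = rnorm(1000), b = letters)
path <- tempfile(fileext = ".bin")
writeBin(serialize(obj, NULL), path)

con <- file(path, open = "rb")
bytes <- read_all_raw(con, chunk_size = 100L)  # chunk far smaller than the file
close(con)

restored <- unserialize(bytes)
stopifnot(identical(restored, obj))
```

Collecting the chunks in a list and combining them once avoids repeatedly copying a growing vector, which matters more as the file gets larger than the chunk size.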