truecluster / bit64
An R package with an S3 Class for Vectors of 64bit Integers
I just ran into the "problem" described here.
A dedicated method would be really convenient.
Example:
d <- 2.5
i <- as.integer64(5)
d/i
#0.4
The result should have been 0.5; the current implementation coerces d into integer64 before the division.
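Until a dedicated method exists, one possible workaround is to convert the integer64 operand to double explicitly before dividing. This is only a sketch: it assumes the integer64 values are small enough (below 2^53 in magnitude) to be represented exactly as doubles.

```r
library(bit64)

d <- 2.5
i <- as.integer64(5)

# Convert the integer64 side to double first, so the division is done
# in floating point and d is never truncated to an integer.
ratio <- d / as.double(i)
print(ratio)
```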
I see in the examples for bit64 that roundtripping with dput() is supported. But I noticed that if I use the deparse option hexNumeric (to get the best fidelity for floats in roundtrips, https://stat.ethz.ch/R-manual/R-devel/library/base/html/deparseOpts.html), I'm getting 0 as the return value for many bit64 integers:
library(bit64)
d_file <- tempfile()
int32 <- 10
int64 <- as.integer64(10)
# for a standard integer, we can roundtrip with dput
dput(int32, d_file, control = c("all", "hexNumeric"))
back <- source(d_file)$value
identical(int32, back)
#> [1] TRUE
# for bit64, hexNumeric doesn't roundtrip
dput(int64, d_file, control = c("all", "hexNumeric"))
back <- source(d_file)$value
identical(int64, back)
#> [1] FALSE
int64
#> integer64
#> [1] 10
back
#> integer64
#> [1] 0
# though digits17 does work
dput(int64, d_file, control = c("all", "digits17"))
back <- source(d_file)$value
identical(int64, back)
#> [1] TRUE
Created on 2022-12-30 with reprex v2.0.2
This might ultimately be a limitation or issue with R's representation of binary fractions with floats that are lower than .Machine$double.xmin (which apparently can occur, see https://stat.ethz.ch/R-manual/R-devel/library/base/html/zMachine.html), since a large enough integer64 does roundtrip (see below).
library(bit64)
d_file <- tempfile()
# but a large integer64 does roundtrip
int64_large <- as.integer64(.Machine$integer.max) * .Machine$integer.max
dput(int64_large, d_file, control = c("all", "hexNumeric"))
back <- source(d_file)$value
identical(int64_large, back)
#> [1] TRUE
Created on 2022-12-30 with reprex v2.0.2
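A possible way to sidestep the issue, sketched here under the assumption that a text representation is acceptable in the dumped file, is to deparse the values as character strings and convert them back after sourcing. The variable names mirror the reprex above.

```r
library(bit64)

d_file <- tempfile()
int64 <- as.integer64(10)

# Deparse the decimal string instead of the underlying double payload,
# so the hexNumeric option never sees a subnormal double.
dput(as.character(int64), d_file, control = c("all", "hexNumeric"))

# Convert back to integer64 after reading the string in again.
back <- as.integer64(source(d_file)$value)
print(back)
```

The trade-off is that the file no longer records the integer64 class itself, so the reader has to know to call as.integer64() on the result.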
Example:
i <- as.integer64(2)
d <- 3.5
i*d
#integer64
#[1] 7
d*i
#integer64
#[1] 6
Native functions such as as.vector and unlist remove the integer64 class.
For example,
> x <- bit64::as.integer64(1:9)
> as.vector(x)
[1] 4.940656e-324 9.881313e-324 1.482197e-323 1.976263e-323
[5] 2.470328e-323 2.964394e-323 3.458460e-323 3.952525e-323
[9] 4.446591e-323
I suggest adding S3 methods for these functions. Here are my naive implementations:
as.vector.integer64 <- function(x, mode = "any") {
  result <- NextMethod("as.vector")
  if (mode %in% c("any", "integer64")) {
    class(result) <- "integer64"
  }
  result
}
unlist.integer64 <- function(x, recursive = TRUE, use.names = TRUE) {
  structure(NextMethod("unlist"), class = "integer64")
}
as.list.integer64 <- function(x, ...) {
  # similar to as.list.factor
  res <- vector("list", length(x))
  for (i in seq_along(x)) res[[i]] <- x[[i]]
  if (is.null(names(x)))
    res
  else `names<-`(res, names(x))
}
x <- bit64::as.integer64(1:9)
as.vector(x)
unlist(x)
lapply(x, I)
simplify2array(x)
array(x, dim = c(3,3))
Results:
> as.vector(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> unlist(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> lapply(x, I)
[[1]]
integer64
[1] 1
[[2]]
integer64
[1] 2
[[3]]
integer64
[1] 3
[[4]]
integer64
[1] 4
[[5]]
integer64
[1] 5
[[6]]
integer64
[1] 6
[[7]]
integer64
[1] 7
[[8]]
integer64
[1] 8
[[9]]
integer64
[1] 9
> simplify2array(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> array(x, dim = c(3,3))
integer64
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Old bit64 versions had bitOr and bitAnd. The GitHub versions don't have them, but NEWS doesn't have any mention of dropping them deliberately in any prior version. Could they have been accidentally deleted?
Thank you for bringing 64 bit integers to R!
When calling max.integer64 on an empty vector, an unclear warning message is given and a strange negative number is returned (see reprex below).
data <- numeric(0)
data64 <- bit64::integer64(0)
max(data)
#> Warning in max(data): no non-missing arguments to max; returning -Inf
#> [1] -Inf
max(data64)
#> Warning in max.integer64(structure(numeric(0), class = "integer64"), na.rm =
#> FALSE): no non-NA value, returning -9223372036854775807
#> integer64
#> [1] -9223372036854775807
Created on 2024-05-03 with reprex v2.0.2
When calling max on a double vector, -Inf is returned. So I presume that the strange number is standing in for -Inf, which cannot be represented as integer64 in R. I don't think this is desired behaviour, but I'm not sure what the best fix would be. It would be helpful if at least the warning message were more comprehensible.
cheers,
Pepijn
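One way callers could guard against this in the meantime is sketched below. The helper name max_or_na is hypothetical (it is not part of bit64), and returning NA for an empty input is just one possible convention.

```r
library(bit64)

# Hypothetical helper, not part of bit64: return NA_integer64_ for an
# empty input instead of the sentinel value and its warning.
max_or_na <- function(x) {
  if (length(x) == 0L) {
    return(NA_integer64_)
  }
  max(x)
}

print(max_or_na(integer64(0)))
print(max_or_na(as.integer64(c(3, 9))))
```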
Hi, I noticed you are using basically the same definition for all.equal.integer64 as base::all.equal.numeric(); I just filed this bug & patch to base R:
https://bugs.r-project.org/show_bug.cgi?id=18272
Happy to file the same PR against bit64, but for now, waiting to see what r-devel wants to do about the bug.
Hello,
Base R mean on integers will always return numeric. Would you consider aligning this behavior in bit64?
It seems that bit64 does not support long vectors, I'm getting
Error in is.na.integer64(res) :
long vectors not supported yet: memory.c:3888
on R 4.3.0, bit64 4.0.5. As far as I can tell, fixing this requires replacing LENGTH calls with XLENGTH calls, but I'm not sure if this is the only change that's needed for full long vector compatibility.
When one tries to implicitly coerce an integer64 to a numeric one gets a tiny numeric.
Also it's odd that if the order is changed the results change. I assume this is because of the single dispatch.
Since numeric is the base R type, a numeric should probably be returned, and the coercion should be the same as explicit coercion, i.e. as.numeric().
> c(1,as.integer64(1))
[1] 1.000000e+00 4.940656e-324
> c(as.integer64(1),1)
integer64
[1] 1 1
> as.vector(as.integer64(1))
[1] 4.940656e-324
Probably related to this
Session Info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 32 (Workstation Edition)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.9.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bit64_4.0.5 bit_4.0.4
loaded via a namespace (and not attached):
[1] compiler_4.0.2 data.table_1.13.2
Hi Jens,
Thanks for bit64 package.
It seems that [<- is not well handled when called in its functional form. It would be nice to have it working, as it is sometimes useful.
I am on 0.9.7.
library(bit64)
x <- as.integer64(1:3)
v <- as.integer64(4)
x[1L] <- v
x
#integer64
#[1] 4 2 3
`[<-`(x, 1L, v)
#Error in as.integer64(value) :
# argument "value" is missing, with no default
It could work just like an integer
x <- 1:3
v <- 4L
x[1L] <- v
x
#[1] 4 2 3
`[<-`(x, 1L, v)
#[1] 4 2 3
bit64 is not returning bit64::NA_integer64_ when indexing out of range:
> str(list(
+ ints = (1:3)[4:5],
+ char = letters[30:31],
+ logi = c(T,F,T)[4:5],
+ bit64 = bit64::as.integer64(1:3)[4:5]
+ ))
List of 4
$ ints : int [1:2] NA NA
$ char : chr [1:2] NA NA
$ logi : logi [1:2] NA NA
$ bit64:integer64 [1:2] 9218868437227407266 9218868437227407266
Seen with bit64 4.0.5 and older on Windows and macOS.
Bitwise AND/OR and left/right shifts would be very useful when working with integers that are encoding multiple sub-values based on bit position.
A trivial example:
> setdiff(bit64::as.integer64(c(1, 2, 3)), bit64::as.integer64(c(1, 2)))
[1] 1.482197e-323
> # Expected result
> setdiff(c(1L, 2L, 3L), c(1L, 2L))
[1] 3
My sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8 LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8
time zone: Australia/Sydney
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bit64_4.0.5 bit_4.0.5
loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0 rstudioapi_0.15.0
I tested bit64::as.integer64(2^63) on x86 Ubuntu vs. ARM M1 macOS:
# Ubuntu
> bit64::as.integer64(2^63)
integer64
[1] <NA>
# M1, OSX
> bit64::as.integer64(2^63)
integer64
[1] 9223372036854775807
I was wondering if this is expected behavior?
I tested the following script:
print_bit <- Rcpp::cppFunction(r"(
SEXP print_bit(SEXP obj){
int64_t tmp1 = *REAL0(obj);
printf("%lld ", tmp1);
return(R_NilValue);
}
)")
print_bit(2^63)
On the x64 Ubuntu server, print_bit(2^63) prints -9223372036854775808, but on the M1 Mac it prints 9223372036854775807.
Hi @truecluster, I noticed something inconsistent about the way NA_integer64_ is handled by the is.na function of base R:
> is.na(list(NA_integer64_, NA_real_, NA_integer_))
[1] FALSE TRUE TRUE
I would have expected TRUE rather than FALSE above (same as below)
> sapply(list(NA_integer64_, NA_real_, NA_integer_), is.na)
[1] TRUE TRUE TRUE
The reason for this discrepancy is that base R is.na has a special method for lists which is hard coded in C: https://github.com/wch/r-source/blob/b560647e74459fa2f40262dcaf1abf171c197efc/src/main/coerce.c#L2247-L2271 (only for five types: logical/int, string, complex, real).
Do you think it would be possible to patch base R to fix this inconsistency? The fix would be to first check if there exists a method for is.na, and if so then use it (here the method is is.na.integer64) instead of the hard coded logic.
For reference, I discovered this issue while fixing a bug in data.table::melt: Rdatatable/data.table#5044 (comment)
For the primitive classes, R does a neat job of automatically converting between types where possible, e.g.
as.numeric(2) %in% as.integer(c(1, 2, 3))
# [1] TRUE
as.integer(2) %in% as.numeric(c(1, 2, 3))
# [1] TRUE
However, this does not work for integer64, where the answer for %in% is just always FALSE:
bit64::as.integer64(2) %in% as.numeric(c(1, 2, 3))
# [1] FALSE
bit64::as.integer64(2) %in% as.integer(c(1, 2, 3))
# [1] FALSE
as.numeric(2) %in% bit64::as.integer64(c(1, 2, 3))
# [1] FALSE
as.integer(2) %in% bit64::as.integer64(c(1, 2, 3))
# [1] FALSE
If possible, could this behavior be changed? This would be more consistent with for example direct comparisons, where the automatic conversion does take place, e.g.:
bit64::as.integer64(2) == as.numeric(2)
# [1] TRUE
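As a stopgap, both operands can be converted to integer64 explicitly before the membership test. The sketch below assumes bit64 is attached with library(), so that its own %in% method is found rather than the base version, and that the numeric values are whole numbers.

```r
library(bit64)

x <- as.integer64(2)
tbl <- c(1, 2, 3)

# Coerce the plain numeric side so that both operands are integer64.
found <- x %in% as.integer64(tbl)
print(found)

# The same idea in the other direction: coerce the left-hand side.
found_rev <- as.integer64(2L) %in% as.integer64(1:3)
print(found_rev)
```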
In the case of strings that represent values that are out of the range of representable values by a long long, strtoll() will return LLONG_MAX or LLONG_MIN and set errno to ERANGE.
bit64::as.integer64("12312312423432842390482390482348328992382930482093842384092842834238904823908423904230423908924300")
#> integer64
#> [1] 9223372036854775807
See the Return Value section of:
http://www.cplusplus.com/reference/cstdlib/strtoll/
I imagine that after calling strtoll() here you could check errno, and return NA if it is set to ERANGE (possibly with a warning), which seems like it might be better behavior for R.
Line 205 in e428535
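Until the C code checks errno, an R-level guard is possible. The sketch below uses a hypothetical helper, safe_as_integer64 (not part of bit64), which converts the strings and then compares the printed result with the input. It assumes the input uses canonical decimal notation: strings with a leading "+", leading zeros, or surrounding whitespace would be wrongly flagged.

```r
library(bit64)

# Hypothetical helper, not part of bit64: treat any string that does not
# survive a round trip through integer64 as out of range and return NA.
safe_as_integer64 <- function(s) {
  out <- as.integer64(s)
  # A clamped value prints differently from the string it came from.
  clamped <- !is.na(out) & as.character(out) != s
  out[clamped] <- NA_integer64_
  out
}

print(safe_as_integer64(c("42", "99999999999999999999")))
```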
Consider a vector of integer64:
x <- c(30000, 30000, 250000, 93500, 102900)
x_int64 <- bit64::as.integer64(x)
These data do not coerce to matrix correctly, as noted in this StackOverflow question.
> as.matrix(x_int64)
# [,1]
# [1,] 1.482197e-319
# [2,] 1.482197e-319
# [3,] 1.235164e-318
# [4,] 4.619514e-319
# [5,] 5.083935e-319
Not entirely sure if this is a bug in base R or in this package, but there seems to be a bug here somewhere.
I'm getting some strange results when I try using very large integers from MS SQL Server.
# min -9223372036854775808 = NA
bit64::as.integer64(-9223372036854775808)
#> integer64
#> [1] <NA>
# max 9223372036854775807 = NA
bit64::as.integer64(9223372036854775807)
#> integer64
#> [1] <NA>
# very strange numbers
bit64::as.integer64(-9223372036854775295)
#> integer64
#> [1] -9223372036854774784
# very strange numbers
bit64::as.integer64(9223372036854775295)
#> integer64
#> [1] 9223372036854774784
# start of NA
bit64::as.integer64(-9223372036854775296)
#> integer64
#> [1] <NA>
# start of NA
bit64::as.integer64(9223372036854775296)
#> integer64
#> [1] <NA>
# above 9 999 999 999 999 998 starts rounding by 1:
bit64::as.integer64(-9999999999999999)
#> integer64
#> [1] -10000000000000000
bit64::as.integer64(9999999999999999)
#> integer64
#> [1] 10000000000000000
bit64::as.integer64(-9999999999999998)
#> integer64
#> [1] -9999999999999998
bit64::as.integer64(9999999999999998)
#> integer64
#> [1] 9999999999999998
bit64::as.integer64(9.999999999999999e+15)
#> integer64
#> [1] 10000000000000000
bit64::as.integer64(9.999999999999998e+15)
#> integer64
#> [1] 9999999999999998
Created on 2021-01-22 by the reprex package (v0.3.0)
I was reviewing some code and came across this odd result. If you have a dataframe with one value of type integer and you coerce it to integer you get what I think you would expect:
library(dplyr)
tibble(x = as.integer(c(1))) %>% as.integer()
[1] 1
But if it's of type int64, you get something weird:
library(bit64)
tibble(x = as.integer64(c(1))) %>% as.integer()
[1] 0
What gives? I assume it has something to do with the int64 class. But why would I get zero? Is this just bad error handling?
OK, there's a hint to what's going on when you call dput on the int64 dataframe:
structure(list(x = structure(4.94065645841247e-324,
class = "integer64")),
row.names = c(NA, -1L),
class = c("tbl_df", "tbl", "data.frame"))
So as.integer() is rightly converting 4.94065645841247e-324 to zero. But why is that what's stored in the DF?
Also, to see that this is not a bit64 issue, I get a very similar structure on the actual df I get back from my database:
structure(list(max = structure(2.78554211125295e-320,
class = "integer64")),
class = "data.frame",
row.names = c(NA, -1L))
This should either produce the expected result or throw an error. Happy to work on this myself . . .
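The tiny number in the dput output is consistent with integer64 storing its 64-bit integer in the eight bytes of a double: the bit pattern of the integer 1, read as a double, is the smallest subnormal number. A minimal sketch of that effect, and of converting the column rather than the whole data frame, follows; it uses a plain data.frame instead of a tibble to stay self-contained.

```r
library(bit64)

x <- as.integer64(1)

# Dropping the class exposes the underlying double payload.
payload <- unclass(x)
print(payload)

# Without the class, as.integer() sees a tiny double and truncates it.
from_payload <- as.integer(payload)

# With the class intact, the integer64 method does the conversion.
from_class <- as.integer(x)

# Converting the column keeps the class in play.
df <- data.frame(x = x)
from_column <- as.integer(df$x)
print(c(from_payload, from_class, from_column))
```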
While constructing a data-frame, columns are replicated if lengths differ.
> data.frame(x = c(1,2), y = NA_integer_)
x y
1 1 NA
2 2 NA
However, when I try to do this with bit64::NA_integer64_, I get an error. Does anyone know what could be happening? rep() works if it is called separately on bit64::NA_integer64_.
> data.frame(x = c(1,2), y = bit64::NA_integer64_)
Error in data.frame(x = c(1, 2), y = bit64::NA_integer64_) :
arguments imply differing number of rows: 2, 1
> rep(bit64::NA_integer64_, 2)
integer64
[1] <NA> <NA>
seq.integer64() doesn't match seq() when from and to are identical:
> seq(1, 1)
[1] 1
> library(bit64)
> seq(as.integer64(1), as.integer64(1))
integer64(0)
Warning message:
In `%/%.integer64`((to - from), by) : NAs produced due to division by zero
sessionInfo():
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bit64_4.0.5 bit_4.0.4
loaded via a namespace (and not attached):
[1] compiler_4.2.1
Example:
> a <- as.integer64(c(6,5)); a
integer64
[1] 6 5
> b <- as.integer64(); b
integer64(0)
> a*b
integer64
[1] 567545821093824 21474836490