truecluster / bit64

An R package with an S3 Class for Vectors of 64bit Integers

R 56.46% Perl 0.18% Shell 0.07% C 41.47% C++ 1.83%

bit64's Issues

FR: integer64 as.list method

I just ran into the "problem" described here.

A dedicated method would be really convenient.

Double divided by integer64 gives wrong results

Example:

d <- 2.5
i <- as.integer64(5)
d/i
#0.4

The result should have been 0.5; the current implementation coerces d to integer64 before the division.
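The arithmetic behind the two results can be checked outside R. The following is a minimal Python sketch, used only for illustration: the function names are hypothetical, this is not bit64's code, and it assumes the double is truncated toward zero when it is coerced to a 64-bit integer.

```python
def divide_coercing_first(d: float, i: int) -> float:
    """Mimic the reported behaviour: truncate d to an integer, then divide."""
    return int(d) / i


def divide_as_double(d: float, i: int) -> float:
    """Mimic the expected behaviour: promote the integer to double, then divide."""
    return d / float(i)


print(divide_coercing_first(2.5, 5))  # 2 / 5
print(divide_as_double(2.5, 5))       # 2.5 / 5.0
```

Under that assumption the first function reproduces the reported 0.4 and the second gives the expected 0.5.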

Round trip with `dput()` and `hexNumeric`?

I see in the examples for bit64 that roundtripping with dput() is supported. But I noticed that if I use the deparse option hexNumeric (to get the best fidelity for floats in roundtrips, https://stat.ethz.ch/R-manual/R-devel/library/base/html/deparseOpts.html), I'm getting 0 as the return value for many bit64 integers:

library(bit64)

d_file <- tempfile()

int32 <- 10
int64 <- as.integer64(10)

# for a standard integer, we can roundtrip with dput
dput(int32, d_file, control = c("all", "hexNumeric"))
back <- source(d_file)$value

identical(int32, back)
#> [1] TRUE

# for bit64, hexNumeric doesn't roundtrip
dput(int64, d_file, control = c("all", "hexNumeric"))

back <- source(d_file)$value

identical(int64, back)
#> [1] FALSE

int64
#> integer64
#> [1] 10
back
#> integer64
#> [1] 0

# though digits17 does work
dput(int64, d_file, control = c("all", "digits17"))

back <- source(d_file)$value

identical(int64, back)
#> [1] TRUE

Created on 2022-12-30 with reprex v2.0.2

This might ultimately be a limitation or issue with R's representation of binary fractions with floats that are lower than .Machine$double.xmin (which apparently can occur https://stat.ethz.ch/R-manual/R-devel/library/base/html/zMachine.html), since a large enough integer64 does roundtrip (see below).

library(bit64)

d_file <- tempfile()

# but a large integer64 does roundtrip
int64_large <- as.integer64(.Machine$integer.max) * .Machine$integer.max

dput(int64_large, d_file, control = c("all", "hexNumeric"))

back <- source(d_file)$value

identical(int64_large, back)
#> [1] TRUE

Created on 2022-12-30 with reprex v2.0.2
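The hypothesis about values below .Machine$double.xmin can be checked at the bit level. bit64 stores each integer64 value in the 8 bytes of a double, so a small integer corresponds to a subnormal double while a large one corresponds to a normal double. The Python sketch below only illustrates that reinterpretation; the helper name is hypothetical, and it does not show where in the dput()/source() round trip the value is lost.

```python
import struct
import sys


def int64_bits_as_double(n: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]


small = int64_bits_as_double(10)                 # payload of as.integer64(10)
large = int64_bits_as_double((2**31 - 1) ** 2)   # payload of integer.max squared

# sys.float_info.min is the smallest normal double, like .Machine$double.xmin
print(small, small.hex(), small < sys.float_info.min)
print(large, large.hex(), large < sys.float_info.min)
```

The small value falls below the smallest normal double and the large one does not, which is consistent with only the large value surviving the hexNumeric round trip.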

integer64/double multiplication is not commutative

Example:

i <- as.integer64(2)
d <- 3.5
i*d
#integer64
#[1] 7
d*i
#integer64
#[1] 6

Functions remove `integer64` classes

Base functions such as as.vector and unlist drop the integer64 class.

For example,

> x <- bit64::as.integer64(1:9)
> as.vector(x)
[1] 4.940656e-324 9.881313e-324 1.482197e-323 1.976263e-323
[5] 2.470328e-323 2.964394e-323 3.458460e-323 3.952525e-323
[9] 4.446591e-323

I suggest adding S3 methods for these functions. Here are my naive implementations:

as.vector.integer64 <- function(x, mode = "any") {
  result <- NextMethod("as.vector")
  if(mode %in% c("any", "integer64")) {
    class(result) <- "integer64"
  }
  result
}

unlist.integer64 <- function(x, recursive = TRUE, use.names = TRUE) {
  structure(NextMethod("unlist"), class = "integer64")
}

as.list.integer64 <- function(x, ...) {
  # similar to as.list.factor
  res <- vector("list", length(x))
  for (i in seq_along(x)) res[[i]] <- x[[i]]
  if (is.null(names(x))) 
    res
  else `names<-`(res, names(x))
}

x <- bit64::as.integer64(1:9)
as.vector(x)
unlist(x)
lapply(x, I)
simplify2array(x)
array(x, dim = c(3,3))

Results:

> as.vector(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> unlist(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> lapply(x, I)
[[1]]
integer64
[1] 1

[[2]]
integer64
[1] 2

[[3]]
integer64
[1] 3

[[4]]
integer64
[1] 4

[[5]]
integer64
[1] 5

[[6]]
integer64
[1] 6

[[7]]
integer64
[1] 7

[[8]]
integer64
[1] 8

[[9]]
integer64
[1] 9

> simplify2array(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> array(x, dim = c(3,3))
integer64
     [,1] [,2] [,3]
[1,] 1    4    7   
[2,] 2    5    8   
[3,] 3    6    9

Bitwise operations missing

Old bit64 versions had bitOr and bitAnd. The GitHub versions don't have them, but NEWS doesn't mention dropping them deliberately in any prior version. Could they have been deleted accidentally?

Unexpected behaviour when calling `max.integer64`?

Thank you for bringing 64 bit integers to R!

When calling max.integer64 on an empty vector an unclear warning message is given and a strange negative number is returned (see reprex below).

data   <- numeric(0)
data64 <- bit64::integer64(0)
max(data)
#> Warning in max(data): no non-missing arguments to max; returning -Inf
#> [1] -Inf
max(data64)
#> Warning in max.integer64(structure(numeric(0), class = "integer64"), na.rm =
#> FALSE): no non-NA value, returning -9223372036854775807
#> integer64
#> [1] -9223372036854775807

Created on 2024-05-03 with reprex v2.0.2

When calling max.double, -Inf is returned. So I presume that the strange number is actually a representation of -Inf but cannot be handled as integer64 in R. I don't think this is desired behaviour, but I'm not sure what the best fix would be. It would be helpful if at least the warning message is more comprehensible.

cheers,

Pepijn

bug in all.equal.integer64 for length(scale)>1

Hi, I noticed you are using basically the same definition for all.equal.integer64 as base::all.equal.numeric(); I just filed this bug & patch to base R:

https://bugs.r-project.org/show_bug.cgi?id=18272

Happy to file the same PR against bit64, but for now, waiting to see what r-devel wants to do about the bug.

mean.integer64 method could return numeric

Hello,
Base R mean on integers will always return numeric. Would you consider aligning this behavior in bit64?

long vector support

It seems that bit64 does not support long vectors, I'm getting

Error in is.na.integer64(res) : 
  long vectors not supported yet: memory.c:3888

on R 4.3.0, bit64 4.0.5. As far as I can tell, fixing this requires replacing LENGTH calls with XLENGTH calls, but I'm not sure if this is the only change that's needed for full long vector compatibility.

Implicit coercion to double/numeric

When one tries to implicitly coerce an integer64 to a numeric, one gets a tiny numeric.

It is also odd that the result changes when the order is changed. I assume this is because of single dispatch.
Since numeric is the base R type, the result should probably be numeric, and the coercion should match explicit coercion, i.e. as.numeric().

> c(1,as.integer64(1))
[1]  1.000000e+00 4.940656e-324
> c(as.integer64(1),1)
integer64
[1] 1 1

> as.vector(as.integer64(1))
[1] 4.940656e-324

Probably related to this

Session Info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 32 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.9.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bit64_4.0.5 bit_4.0.4

loaded via a namespace (and not attached):
[1] compiler_4.0.2 data.table_1.13.2

sub-assign to integer64 does not handle function form

Hi Jens,
Thanks for bit64 package.
It seems that the function form of [<- is not handled well. It would be nice to have it working, as it is sometimes useful.
I am on 0.9.7.

library(bit64)
x <- as.integer64(1:3)
v <- as.integer64(4)
x[1L] <- v
x
#integer64
#[1] 4 2 3
`[<-`(x, 1L, v)
#Error in as.integer64(value) : 
#  argument "value" is missing, with no default

It could work just like an integer

x <- 1:3
v <- 4L
x[1L] <- v
x
#[1] 4 2 3
`[<-`(x, 1L, v)
#[1] 4 2 3

bit64 is not returning `bit64::NA_integer64_` when indexing out of range

bit64 is not returning bit64::NA_integer64_ when indexing out of range

> str(list(
+ ints = (1:3)[4:5],
+ char = letters[30:31],
+ logi = c(T,F,T)[4:5],
+ bit64 = bit64::as.integer64(1:3)[4:5]
+ ))
List of 4
 $ ints : int [1:2] NA NA
 $ char : chr [1:2] NA NA
 $ logi : logi [1:2] NA NA
 $ bit64:integer64 [1:2] 9218868437227407266 9218868437227407266

Seen with bit64 4.0.5 and older on Windows and macOS.
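The odd value in the output above looks like base R's double NA read as a 64-bit integer. The Python sketch below checks that arithmetic. It assumes that R's NA_real_ is a NaN whose low 32-bit word is 1954 (bit pattern 0x7FF00000000007A2) and that NA_integer64_ is the minimum 64-bit integer; the helper name is hypothetical, and the sketch does not show where in the subsetting code the base NA is introduced.

```python
import struct

# Assumed bit pattern of R's NA_real_: a NaN whose low 32-bit word is 1954.
R_NA_REAL_BITS = 0x7FF00000000007A2

# Assumed representation of bit64's NA_integer64_: the minimum 64-bit integer.
NA_INTEGER64 = -(2**63)


def double_bits_as_int64(bits: int) -> int:
    """Reinterpret a 64-bit pattern as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<Q", bits))[0]


as_int64 = double_bits_as_int64(R_NA_REAL_BITS)
print(as_int64)
print(as_int64 == NA_INTEGER64)
```

Under these assumptions the pattern decodes to 9218868437227407266, the value shown in the str() output, rather than to the integer64 NA.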

Feature request: Add bitwise operators on integer64

Bitwise AND/OR and left/right shifts would be very useful when working with integers that are encoding multiple sub-values based on bit position.

`setdiff` does not seem to like integer64 class vectors

A trivial example:

> setdiff(bit64::as.integer64(c(1, 2, 3)), bit64::as.integer64(c(1, 2)))
[1] 1.482197e-323

> # Expected result
> setdiff(c(1L, 2L, 3L), c(1L, 2L))
[1] 3

My sessionInfo()

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8    LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

time zone: Australia/Sydney
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bit64_4.0.5 bit_4.0.5  

loaded via a namespace (and not attached):
[1] compiler_4.3.0    tools_4.3.0       rstudioapi_0.15.0

Inconsistent results on different machine

I tested bit64::as.integer64(2^63) on x86 Ubuntu and on an ARM M1 Mac running macOS:

# Ubuntu
> bit64::as.integer64(2^63)
integer64
[1] <NA>

# M1, OSX
> bit64::as.integer64(2^63)
integer64
[1] 9223372036854775807

I was wondering if this is expected behavior?

I tested the following script:

print_bit <- Rcpp::cppFunction(r"(
SEXP print_bit(SEXP obj){

  int64_t tmp1 = *REAL0(obj);
  printf("%lld ", tmp1);

  return(R_NilValue);
}
)")
print_bit(2^63)

On the x64 Ubuntu server, print_bit(2^63) prints -9223372036854775808, but on the M1 Mac it prints 9223372036854775807.

is.na returns FALSE for list element with scalar NA_integer64_

Hi @truecluster, I noticed something inconsistent about the way NA_integer64_ is handled by the is.na function of base R:

> is.na(list(NA_integer64_, NA_real_, NA_integer_))
[1] FALSE  TRUE  TRUE

I would have expected TRUE rather than FALSE above (same as below)

> sapply(list(NA_integer64_, NA_real_, NA_integer_), is.na)
[1] TRUE TRUE TRUE

The reason for this discrepancy is that base R is.na has a special method for lists which is hard coded in C: https://github.com/wch/r-source/blob/b560647e74459fa2f40262dcaf1abf171c197efc/src/main/coerce.c#L2247-L2271 (only for five types: logical/int, string, complex, real).
Do you think it would be possible to patch base R to fix this inconsistency? The fix would be to first check if there exists a method for is.na, and if so then use it (here the method is is.na.integer64) instead of the hard coded logic.
For reference I discovered this issue while fixing a bug in data.table::melt Rdatatable/data.table#5044 (comment)

Automatic type conversion when using %in%?

For the primitive classes, R does a neat job of automatically converting between types where possible, e.g.

as.numeric(2) %in% as.integer(c(1, 2, 3))
# [1] TRUE
as.integer(2) %in% as.numeric(c(1, 2, 3))
# [1] TRUE

However, this does not work for integer64, where the answer for %in% is just always FALSE:

bit64::as.integer64(2) %in% as.numeric(c(1, 2, 3))
# [1] FALSE
bit64::as.integer64(2) %in% as.integer(c(1, 2, 3))
# [1] FALSE
as.numeric(2) %in% bit64::as.integer64(c(1, 2, 3))
# [1] FALSE
as.integer(2) %in% bit64::as.integer64(c(1, 2, 3))
# [1] FALSE

If possible, could this behavior be changed? This would be more consistent with for example direct comparisons, where the automatic conversion does take place, e.g.:

bit64::as.integer64(2) == as.numeric(2)
# [1] TRUE

`as.integer64.character()` should check `errno` at the C level

In the case of strings that represent values that are out of the range of representable values by a long long, strtoll() will return LLONG_MAX or LLONG_MIN and set errno to ERANGE.

bit64::as.integer64("12312312423432842390482390482348328992382930482093842384092842834238904823908423904230423908924300")
#> integer64
#> [1] 9223372036854775807

See the Return Value section of:
http://www.cplusplus.com/reference/cstdlib/strtoll/

I imagine that after calling strtoll() here you could check errno and return NA if it is set to ERANGE (possibly with a warning), which seems like better behavior for R.

bit64/src/integer64.c

Line 205 in e428535

ret[i] = strtoll(str, &endpointer, 10);
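The proposed behavior can be sketched as a range check around the parse. The Python below is only an analogue of the suggested C change, not bit64's code: Python integers do not overflow, so the explicit bounds check stands in for testing errno == ERANGE after strtoll(). The function name is hypothetical and None stands in for NA.

```python
from typing import Optional

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def parse_int64_or_na(text: str) -> Optional[int]:
    """Parse a base-10 string; return None (standing in for NA) when out of range."""
    value = int(text, 10)
    if value < INT64_MIN or value > INT64_MAX:
        # The C version would detect this case via errno == ERANGE.
        return None
    return value


print(parse_int64_or_na("9223372036854775807"))
print(parse_int64_or_na("9223372036854775808"))
```

A real fix would also need to decide whether to emit a warning and how to treat the minimum 64-bit value, which bit64 reserves for NA.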

integer64 does not coerce to matrix correctly

Consider a vector of integer64:

x <- c(30000, 30000, 250000, 93500, 102900)
x_int64 <- bit64::as.integer64(x)

These data do not coerce to matrix correctly, as noted in this StackOverflow question.

> as.matrix(x_int64)
#               [,1]
# [1,] 1.482197e-319
# [2,] 1.482197e-319
# [3,] 1.235164e-318
# [4,] 4.619514e-319
# [5,] 5.083935e-319

Not entirely sure if this is a bug in base R or in this package, but there seems to be a bug here somewhere.

Not able to handle big integer

I'm getting some strange results when I try using very large integers from MS SQL Server.

# min -9223372036854775808 = NA
bit64::as.integer64(-9223372036854775808)
#> integer64
#> [1] <NA>
# max 9223372036854775807 = NA
bit64::as.integer64(9223372036854775807)
#> integer64
#> [1] <NA>
# very strange numbers
bit64::as.integer64(-9223372036854775295)
#> integer64
#> [1] -9223372036854774784
# very strange numbers
bit64::as.integer64(9223372036854775295)
#> integer64
#> [1] 9223372036854774784
# start of NA
bit64::as.integer64(-9223372036854775296)
#> integer64
#> [1] <NA>
# start of NA
bit64::as.integer64(9223372036854775296)
#> integer64
#> [1] <NA>
# above 9 999 999 999 999 998 starts rounding by 1:
bit64::as.integer64(-9999999999999999)
#> integer64
#> [1] -10000000000000000
bit64::as.integer64(9999999999999999)
#> integer64
#> [1] 10000000000000000
bit64::as.integer64(-9999999999999998)
#> integer64
#> [1] -9999999999999998
bit64::as.integer64(9999999999999998)
#> integer64
#> [1] 9999999999999998
bit64::as.integer64(9.999999999999999e+15)
#> integer64
#> [1] 10000000000000000
bit64::as.integer64(9.999999999999998e+15)
#> integer64
#> [1] 9999999999999998

Created on 2021-01-22 by the reprex package (v0.3.0)
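These results match what happens when the literal is first parsed as a double, before as.integer64() ever sees it. A double has a 53-bit significand, so above 2^53 not every integer is representable, and near 2^63 adjacent doubles are 1024 apart. The Python sketch below reproduces the rounding; the helper name is hypothetical, and it assumes round-to-nearest, ties-to-even conversion, which both Python and R's number parser use as far as I know.

```python
def nearest_double_as_int(n: int) -> int:
    """Round n to the nearest double, then convert back to an exact integer."""
    return int(float(n))


for n in (
    9999999999999998,     # representable exactly
    9999999999999999,     # tie, rounds to 10^16
    9223372036854775295,  # rounds down to 2^63 - 1024
    9223372036854775296,  # tie, rounds up to 2^63 (outside the int64 range)
    9223372036854775807,  # rounds up to 2^63 (outside the int64 range)
):
    print(n, "->", nearest_double_as_int(n))
```

Values that round to 2^63 are outside the signed 64-bit range, which is consistent with the NA results. Passing the number as a string, for example as.integer64("9223372036854775807"), avoids the intermediate double.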

as.integer() on an int64 dataframe produces unexpected result

I was reviewing some code and came across this odd result. If you have a dataframe with one value of type integer and you coerce it to integer you get what I think you would expect:

library(dplyr)

tibble(x = as.integer(c(1))) %>% as.integer()

[1] 1

But if it's of type int64, you get something weird:

library(bit64)

tibble(x = as.integer64(c(1))) %>% as.integer()

[1] 0

What gives? I assume it has something to do with the int64 class. But why would I get zero? Is this just bad error handling?

Update

OK, there's a hint to what's going on when you call dput on the int64 dataframe:

structure(list(x = structure(4.94065645841247e-324, 
                             class = "integer64")), 
          row.names = c(NA, -1L), 
          class = c("tbl_df", "tbl", "data.frame"))

So as.integer() is rightly converting 4.94065645841247e-324 to zero. But why is that what's stored in the DF?

Also, to see that this is not a bit64 issue, I get a very similar structure on the actual df I get back from my database:

structure(list(max = structure(2.78554211125295e-320,
                               class = "integer64")),
          class = "data.frame", 
          row.names = c(NA, -1L))

Bottom line

This should either produce the expected result or throw an error. Happy to work on this myself . . .

bit64 NA doesn't replicate in data.frame constructor

While constructing a data-frame, columns are replicated if lengths differ.

> data.frame(x = c(1,2), y = NA_integer_)
  x  y
1 1 NA
2 2 NA

However, when I try to do this with bit64::NA_integer64_, I get an error. Does anyone know what could be happening? rep() works if it is called separately on bit64::NA_integer64_.

> data.frame(x = c(1,2), y = bit64::NA_integer64_)
Error in data.frame(x = c(1, 2), y = bit64::NA_integer64_) : 
  arguments imply differing number of rows: 2, 1
> rep(bit64::NA_integer64_, 2)
integer64
[1] <NA> <NA>

seq.integer64() doesn't match base seq() when from and to are identical

seq.integer64() doesn't match seq() when from and to are identical:

> seq(1, 1)
[1] 1
> library(bit64)
> seq(as.integer64(1), as.integer64(1))
integer64(0)                                                                                                                                                                                                  
Warning message:                                                                                                                                                                                              
In `%/%.integer64`((to - from), by) : NAs produced due to division by zero

sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bit64_4.0.5 bit_4.0.4  

loaded via a namespace (and not attached):
[1] compiler_4.2.1

Arithmetic operators don't handle zero-length vectors well

Example:

> a <- as.integer64(c(6,5)); a
integer64
[1] 6 5
> b <- as.integer64(); b
integer64(0)
> a*b
integer64
[1] 567545821093824 21474836490
