bit64's People

Contributors: qulogic, truecluster

bit64's Issues

Round trip with `dput()` and `hexNumeric`?

I see in the examples for bit64 that roundtripping with dput() is supported. But I noticed that if I use the deparse option hexNumeric (to get the best fidelity for floats in roundtrips, https://stat.ethz.ch/R-manual/R-devel/library/base/html/deparseOpts.html), I'm getting 0 as the return value for many bit64 integers:

library(bit64)

d_file <- tempfile()

int32 <- 10
int64 <- as.integer64(10)

# for a standard integer, we can roundtrip with dput
dput(int32, d_file, control = c("all", "hexNumeric"))
back <- source(d_file)$value

identical(int32, back)
#> [1] TRUE

# for bit64, hexNumeric doesn't roundtrip
dput(int64, d_file, control = c("all", "hexNumeric"))

back <- source(d_file)$value

identical(int64, back)
#> [1] FALSE

int64
#> integer64
#> [1] 10
back
#> integer64
#> [1] 0

# though digits17 does work
dput(int64, d_file, control = c("all", "digits17"))

back <- source(d_file)$value

identical(int64, back)
#> [1] TRUE

Created on 2022-12-30 with reprex v2.0.2

This might ultimately be a limitation or issue with R's representation of binary fractions with floats that are lower than .Machine$double.xmin (which apparently can occur https://stat.ethz.ch/R-manual/R-devel/library/base/html/zMachine.html), since a large enough integer64 does roundtrip (see below).

library(bit64)

d_file <- tempfile()

# but a large integer64 does roundtrip
int64_large <- as.integer64(.Machine$integer.max) * .Machine$integer.max

dput(int64_large, d_file, control = c("all", "hexNumeric"))

back <- source(d_file)$value

identical(int64_large, back)
#> [1] TRUE

Created on 2022-12-30 with reprex v2.0.2
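
The two reprexes fit that hypothesis, assuming an integer64 is stored as the raw 64-bit pattern inside a double (which the `dput()` output of an integer64 suggests). Under that assumption, every non-negative integer64 below 2^52 has all exponent bits zero and reads as a subnormal double, smaller than `.Machine$double.xmin`, whereas `as.integer64(.Machine$integer.max) * .Machine$integer.max` reads as a normal double. A rough sketch in Python (used only because it can reinterpret bytes in a few lines; the helper names are hypothetical and none of this is bit64 code):

```python
import struct
import sys

def int64_bits_as_double(n):
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]

def is_subnormal(x):
    """True if x is nonzero and smaller in magnitude than the smallest normal double."""
    return x != 0.0 and abs(x) < sys.float_info.min

small = 10                  # the value that fails to round-trip above
large = (2**31 - 1) ** 2    # .Machine$integer.max squared, which does round-trip

for n in (small, large):
    x = int64_bits_as_double(n)
    print(n, x.hex(), "subnormal" if is_subnormal(x) else "normal")
```

This only shows that the failing value is subnormal and the passing one is not; it does not show whether the loss happens when R deparses or when it parses the hexadecimal form.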

Functions remove `integer64` classes

Base functions such as as.vector and unlist drop the integer64 class.

For example,

> x <- bit64::as.integer64(1:9)
> as.vector(x)
[1] 4.940656e-324 9.881313e-324 1.482197e-323 1.976263e-323
[5] 2.470328e-323 2.964394e-323 3.458460e-323 3.952525e-323
[9] 4.446591e-323
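
The tiny values are what appears when the 64-bit integer payload is read back as an IEEE 754 double: an integer64 vector is a double vector whose bits are interpreted through the class attribute, so dropping the class exposes the raw bits. A minimal sketch in Python (chosen only because it can reinterpret bytes easily; the helper name is hypothetical and this is not bit64 code) reproduces the numbers above:

```python
import struct

def int64_bits_as_double(n: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]

# Each integer n maps to n times the smallest subnormal double (2**-1074),
# which R prints as 4.940656e-324, 9.881313e-324, and so on.
for n in range(1, 10):
    print(n, int64_bits_as_double(n))
```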

I suggest adding S3 methods for these functions. Here are my naive implementations:

as.vector.integer64 <- function(x, mode = "any") {
  result <- NextMethod("as.vector")
  if(mode %in% c("any", "integer64")) {
    class(result) <- "integer64"
  }
  result
}

unlist.integer64 <- function(x, recursive = TRUE, use.names = TRUE) {
  structure(NextMethod("unlist"), class = "integer64")
}

as.list.integer64 <- function(x, ...) {
  # similar to as.list.factor
  res <- vector("list", length(x))
  for (i in seq_along(x)) res[[i]] <- x[[i]]
  if (is.null(names(x))) 
    res
  else `names<-`(res, names(x))
}

x <- bit64::as.integer64(1:9)
as.vector(x)
unlist(x)
lapply(x, I)
simplify2array(x)
array(x, dim = c(3,3))

Results:

> as.vector(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> unlist(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> lapply(x, I)
[[1]]
integer64
[1] 1

[[2]]
integer64
[1] 2

[[3]]
integer64
[1] 3

[[4]]
integer64
[1] 4

[[5]]
integer64
[1] 5

[[6]]
integer64
[1] 6

[[7]]
integer64
[1] 7

[[8]]
integer64
[1] 8

[[9]]
integer64
[1] 9

> simplify2array(x)
integer64
[1] 1 2 3 4 5 6 7 8 9
> array(x, dim = c(3,3))
integer64
     [,1] [,2] [,3]
[1,] 1    4    7   
[2,] 2    5    8   
[3,] 3    6    9   

Bitwise operations missing

Old bit64 versions had bitOr and bitAnd. The GitHub versions don't have them, but NEWS doesn't mention dropping them deliberately in any prior version. Could they have been accidentally deleted?

Unexpected behaviour when calling `max.integer64`?

Thank you for bringing 64 bit integers to R!

When calling max.integer64 on an empty vector, an unclear warning message is given and a strange negative number is returned (see reprex below).

data   <- numeric(0)
data64 <- bit64::integer64(0)
max(data)
#> Warning in max(data): no non-missing arguments to max; returning -Inf
#> [1] -Inf
max(data64)
#> Warning in max.integer64(structure(numeric(0), class = "integer64"), na.rm =
#> FALSE): no non-NA value, returning -9223372036854775807
#> integer64
#> [1] -9223372036854775807

Created on 2024-05-03 with reprex v2.0.2

When calling max.double, -Inf is returned. So I presume that the strange number is actually a representation of -Inf but cannot be handled as integer64 in R. I don't think this is desired behaviour, but I'm not sure what the best fix would be. It would be helpful if at least the warning message is more comprehensible.

cheers,

Pepijn

long vector support

It seems that bit64 does not support long vectors. I'm getting

Error in is.na.integer64(res) : 
  long vectors not supported yet: memory.c:3888

on R 4.3.0, bit64 4.0.5. As far as I can tell, fixing this requires replacing LENGTH calls with XLENGTH calls, but I'm not sure if this is the only change that's needed for full long vector compatibility.

Implicit coercion to double/numeric

When one tries to implicitly coerce an integer64 to a numeric, one gets a tiny numeric.

It is also odd that the result changes when the order of the arguments changes. I assume this is because of single dispatch.
Since numeric is the base R type, a numeric should probably be returned, and the coercion should be the same as explicit coercion, i.e. as.numeric().

> c(1,as.integer64(1))
[1]  1.000000e+00 4.940656e-324
> c(as.integer64(1),1)
integer64
[1] 1 1

> as.vector(as.integer64(1))
[1] 4.940656e-324

Probably related to this

Session Info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 32 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.9.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bit64_4.0.5 bit_4.0.4

loaded via a namespace (and not attached):
[1] compiler_4.0.2 data.table_1.13.2

sub-assign to integer64 does not handle function form

Hi Jens,
Thanks for bit64 package.
It seems that the function form of [<- is not handled well. It would be nice to have it working, as it is sometimes useful.
I am on 0.9.7.

library(bit64)
x <- as.integer64(1:3)
v <- as.integer64(4)
x[1L] <- v
x
#integer64
#[1] 4 2 3
`[<-`(x, 1L, v)
#Error in as.integer64(value) : 
#  argument "value" is missing, with no default

It could work just like an integer

x <- 1:3
v <- 4L
x[1L] <- v
x
#[1] 4 2 3
`[<-`(x, 1L, v)
#[1] 4 2 3

bit64 is not returning `bit64::NA_integer64_` when indexing out of range

bit64 is not returning bit64::NA_integer64_ when indexing out of range

> str(list(
+ ints = (1:3)[4:5],
+ char = letters[30:31],
+ logi = c(T,F,T)[4:5],
+ bit64 = bit64::as.integer64(1:3)[4:5]
+ ))
List of 4
 $ ints : int [1:2] NA NA
 $ char : chr [1:2] NA NA
 $ logi : logi [1:2] NA NA
 $ bit64:integer64 [1:2] 9218868437227407266 9218868437227407266 

Seen with bit64 4.0.5 and older on Windows and macOS.
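
The odd value is recognisable: 9218868437227407266 is 0x7FF00000000007A2, the bit pattern R uses for NA_real_ (a NaN whose low word is 1954). That suggests, though it does not prove, that the out-of-range elements are filled with NA_real_ by the default double subsetting and then displayed as integer64. A quick check of the arithmetic in Python (used here only for the byte reinterpretation; the variable names are illustrative and this is not bit64 code):

```python
import math
import struct

observed = 9218868437227407266  # value printed for the out-of-range elements

# Assumption: R's NA_real_ is a NaN with high word 0x7FF00000 and low word 1954.
na_real_bits = (0x7FF00000 << 32) | 1954

print(hex(observed))
print(observed == na_real_bits)

# Reading the same 8 bytes as a double gives a NaN.
as_double = struct.unpack("<d", struct.pack("<q", observed))[0]
print(math.isnan(as_double))
```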

`setdiff` does not seem to like integer64 class vectors

A trivial example:

> setdiff(bit64::as.integer64(c(1, 2, 3)), bit64::as.integer64(c(1, 2)))
[1] 1.482197e-323

> # Expected result
> setdiff(c(1L, 2L, 3L), c(1L, 2L))
[1] 3

My sessionInfo()

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8    LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

time zone: Australia/Sydney
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bit64_4.0.5 bit_4.0.5  

loaded via a namespace (and not attached):
[1] compiler_4.3.0    tools_4.3.0       rstudioapi_0.15.0

Inconsistent results on different machine

I tested bit64::as.integer64(2^63) on x86 Ubuntu and on an ARM (M1) Mac:

# Ubuntu
> bit64::as.integer64(2^63)
integer64
[1] <NA>

# M1, OSX
> bit64::as.integer64(2^63)
integer64
[1] 9223372036854775807

I was wondering if this is expected behavior?

I tested the following script:

print_bit <- Rcpp::cppFunction(r"(
SEXP print_bit(SEXP obj){

  int64_t tmp1 = *REAL0(obj);
  printf("%lld ", tmp1);

  return(R_NilValue);
}
)")
print_bit(2^63)

On the x64 Ubuntu server, print_bit(2^63) prints -9223372036854775808, but on the M1 Mac it prints 9223372036854775807.
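
In C, converting a double whose value lies outside the range of int64_t is undefined behaviour, so x86 and ARM can legitimately disagree here. A range check before the cast would give the same answer on every platform. A minimal sketch of such a check, written in Python for brevity (the function name and the use of None to stand in for NA are illustrative assumptions, not bit64's implementation):

```python
INT64_MIN = -(2**63)

def double_to_int64_or_none(x: float):
    """Truncate x to a 64-bit integer, or return None (standing in for NA)
    when the value cannot be represented."""
    # 2.0**63 is exactly representable as a double, so it is a safe bound.
    # The chained comparison is False for NaN, so NaN and +/-Inf also yield None.
    if not (-(2.0**63) <= x < 2.0**63):
        return None
    n = int(x)  # truncates toward zero, like a C cast
    # Assumption: the most negative value is reserved for NA, as the
    # as.integer64(-9223372036854775808) result elsewhere on this page suggests.
    return None if n == INT64_MIN else n

print(double_to_int64_or_none(2.0**63))         # out of range on every platform
print(double_to_int64_or_none(2.0**63 - 1024))  # largest double below 2**63
```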

is.na returns FALSE for list element with scalar NA_integer64_

Hi @truecluster, I noticed something inconsistent about the way NA_integer64_ is handled by the is.na function of base R:

> is.na(list(NA_integer64_, NA_real_, NA_integer_))
[1] FALSE  TRUE  TRUE

I would have expected TRUE rather than FALSE above (same as below)

> sapply(list(NA_integer64_, NA_real_, NA_integer_), is.na)
[1] TRUE TRUE TRUE

The reason for this discrepancy is that base R is.na has a special method for lists which is hard coded in C: https://github.com/wch/r-source/blob/b560647e74459fa2f40262dcaf1abf171c197efc/src/main/coerce.c#L2247-L2271 (only for five types: logical/int, string, complex, real).
Do you think it would be possible to patch base R to fix this inconsistency? The fix would be to first check if there exists a method for is.na, and if so then use it (here the method is is.na.integer64) instead of the hard coded logic.
For reference, I discovered this issue while fixing a bug in data.table::melt (Rdatatable/data.table#5044).

Automatic type conversion when using %in%?

For the primitive classes, R does a neat job of automatically converting between types where possible, e.g.

as.numeric(2) %in% as.integer(c(1, 2, 3))
# [1] TRUE
as.integer(2) %in% as.numeric(c(1, 2, 3))
# [1] TRUE

However, this does not work for integer64, where the answer for %in% is just always FALSE:

bit64::as.integer64(2) %in% as.numeric(c(1, 2, 3))
# [1] FALSE
bit64::as.integer64(2) %in% as.integer(c(1, 2, 3))
# [1] FALSE
as.numeric(2) %in% bit64::as.integer64(c(1, 2, 3))
# [1] FALSE
as.integer(2) %in% bit64::as.integer64(c(1, 2, 3))
# [1] FALSE

If possible, could this behavior be changed? This would be more consistent with for example direct comparisons, where the automatic conversion does take place, e.g.:

bit64::as.integer64(2) == as.numeric(2)
# [1] TRUE

`as.integer64.character()` should check `errno` at the C level

In the case of strings that represent values that are out of the range of representable values by a long long, strtoll() will return LLONG_MAX or LLONG_MIN and set errno to ERANGE.

bit64::as.integer64("12312312423432842390482390482348328992382930482093842384092842834238904823908423904230423908924300")
#> integer64
#> [1] 9223372036854775807

See the Return Value section of:
http://www.cplusplus.com/reference/cstdlib/strtoll/

I imagine that after calling strtoll() here you could check errno and return NA if it is set to ERANGE (possibly with a warning), which seems like it might be better behavior for R:

ret[i] = strtoll(str, &endpointer, 10);

integer64 does not coerce to matrix correctly

Consider a vector of integer64:

x <- c(30000, 30000, 250000, 93500, 102900)
x_int64 <- bit64::as.integer64(x)

These data do not coerce to matrix correctly, as noted in this StackOverflow question.

> as.matrix(x_int64)
#               [,1]
# [1,] 1.482197e-319
# [2,] 1.482197e-319
# [3,] 1.235164e-318
# [4,] 4.619514e-319
# [5,] 5.083935e-319

Not entirely sure if this is a bug in base R or in this package, but there seems to be a bug here somewhere.

Not able to handle big integer

I'm getting some strange results when I try using very large integers from MS SQL Server.

# min -9223372036854775808 = NA
bit64::as.integer64(-9223372036854775808)
#> integer64
#> [1] <NA>
# max 9223372036854775807 = NA
bit64::as.integer64(9223372036854775807)
#> integer64
#> [1] <NA>
# very strange numbers
bit64::as.integer64(-9223372036854775295)
#> integer64
#> [1] -9223372036854774784
# very strange numbers
bit64::as.integer64(9223372036854775295)
#> integer64
#> [1] 9223372036854774784
# start of NA
bit64::as.integer64(-9223372036854775296)
#> integer64
#> [1] <NA>
# start of NA
bit64::as.integer64(9223372036854775296)
#> integer64
#> [1] <NA>
# above 9 999 999 999 999 998 starts rounding by 1:
bit64::as.integer64(-9999999999999999)
#> integer64
#> [1] -10000000000000000
bit64::as.integer64(9999999999999999)
#> integer64
#> [1] 10000000000000000
bit64::as.integer64(-9999999999999998)
#> integer64
#> [1] -9999999999999998
bit64::as.integer64(9999999999999998)
#> integer64
#> [1] 9999999999999998
bit64::as.integer64(9.999999999999999e+15)
#> integer64
#> [1] 10000000000000000
bit64::as.integer64(9.999999999999998e+15)
#> integer64
#> [1] 9999999999999998

Created on 2021-01-22 by the reprex package (v0.3.0)
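
These results are what double precision predicts: an unquoted literal such as 9223372036854775807 is parsed by R as a double before as.integer64() ever sees it, and a double carries only 53 significant bits. Just below 2^63 adjacent doubles are 1024 apart, and above 2^53 they are 2 apart, so the literal is rounded first. A small sketch in Python, whose float is the same IEEE 754 double (the helper name is hypothetical, and only positive values are handled), reproduces the boundary cases:

```python
def via_double(n: int):
    """Round n to the nearest double, then report the integer that double holds.
    Returns None when the rounded value no longer fits below 2**63."""
    x = float(n)          # round-to-nearest-even, as for a numeric literal
    if x >= 2.0**63:
        return None       # out of the signed 64-bit range after rounding
    return int(x)

for n in (9223372036854775807, 9223372036854775296,
          9223372036854775295, 9999999999999999, 9999999999999998):
    print(n, "->", via_double(n))
```

Passing the value as a string, for example as.integer64("9223372036854775807"), avoids the intermediate double.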

as.integer() on an int64 dataframe produces unexpected result

I was reviewing some code and came across this odd result. If you have a dataframe with one value of type integer and you coerce it to integer you get what I think you would expect:

library(dplyr)

tibble(x = as.integer(c(1))) %>% as.integer()

[1] 1

But if it's of type int64, you get something weird:

library(bit64)

tibble(x = as.integer64(c(1))) %>% as.integer()

[1] 0

What gives? I assume it has something to do with the int64 class. But why would I get zero? Is this just bad error handling?

Update

OK, there's a hint to what's going on when you call dput on the int64 dataframe:

structure(list(x = structure(4.94065645841247e-324, 
                             class = "integer64")), 
          row.names = c(NA, -1L), 
          class = c("tbl_df", "tbl", "data.frame"))

So as.integer() is rightly converting 4.94065645841247e-324 to zero. But why is that what's stored in the DF?

Also, to see that this is not a bit64 issue, I get a very similar structure on the actual df I get back from my database:

structure(list(max = structure(2.78554211125295e-320,
                               class = "integer64")),
          class = "data.frame", 
          row.names = c(NA, -1L))

Bottom line

This should either produce the expected result or throw an error. Happy to work on this myself . . .

bit64 NA doesn't replicate in data.frame constructor

While constructing a data frame, columns are recycled if their lengths differ.

> data.frame(x = c(1,2), y = NA_integer_)
  x  y
1 1 NA
2 2 NA

However, when I try to do this with bit64::NA_integer64_, I get an error. Does anyone know what could be happening? rep() works if it is called separately on bit64::NA_integer64_.

> data.frame(x = c(1,2), y = bit64::NA_integer64_)
Error in data.frame(x = c(1, 2), y = bit64::NA_integer64_) : 
  arguments imply differing number of rows: 2, 1
> rep(bit64::NA_integer64_, 2)
integer64
[1] <NA> <NA>

seq.integer64() doesn't match base seq() when from and to are identical

seq.integer64() doesn't match seq() when from and to are identical:

> seq(1, 1)
[1] 1
> library(bit64)
> seq(as.integer64(1), as.integer64(1))
integer64(0)
Warning message:
In `%/%.integer64`((to - from), by) : NAs produced due to division by zero

sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bit64_4.0.5 bit_4.0.4  

loaded via a namespace (and not attached):
[1] compiler_4.2.1
