
gdsfmt's People

Contributors

nathanweeks, zhengxwen

gdsfmt's Issues

'val' and 'valdim' are both missing in 'add.gdsn' when a compression method is used

library(gdsfmt)

f <- createfn.gds("test.gds")
var.snp <- add.gdsn(f, "snp.id", storage="int", compress="ZIP")
append.gdsn(var.snp, 1:300000)
closefn.gds(f)

(z <- openfn.gds("test.gds"))
head(read.gdsn(index.gdsn(z, "snp.id")))

showing:

Error in read.gdsn(index.gdsn(z, "snp.id")) : Stream read error

Versions up to v1.3.5 require specifying 'valdim' in 'add.gdsn' when the argument 'val' is missing; the following code works:

f <- createfn.gds("test.gds")
var.snp <- add.gdsn(f, "snp.id", valdim=0, storage="int", compress="ZIP")
append.gdsn(var.snp, 1:300000)
closefn.gds(f)

(z <- openfn.gds("test.gds"))
head(read.gdsn(index.gdsn(z, "snp.id")))
closefn.gds(z)

Question: creating and accessing attributes to gds object

Hi,

Thanks for this data structure. I'm having difficulty creating attributes on a gds object and accessing them.

file.gdsdosage<-paste("dosage","_CHR",chr,".gds",sep="")
gdsfile <- createfn.gds(file.gdsdosage)

##pre-existing matrix 
##add them to .gds 
matrix.afr<-as.matrix(df.afr[,-c(1:3)])
matrix.nat<-as.matrix(df.nat[,-c(1:3)])
matrix.ceu<-as.matrix(df.ceu[,-c(1:3)])
add.gdsn(gdsfile , "dosage_eur",matrix.ceu)
add.gdsn(gdsfile , "dosage_nat",matrix.nat)
add.gdsn(gdsfile , "dosage_afr",matrix.afr)

### add SNP ids
add.gdsn(gdsfile,"snp.id" ,df.ceu[,c(1)])
add.gdsn(gdsfile,"snp.position",jointed_rschrpos$POS) ##add position
add.gdsn(gdsfile,"snp.chromosome",chr) ##add chromosome to structure object
closefn.gds(gdsfile)

##subset for ceu, afr and nat
tempgds <- openfn.gds(file.gdsdosage) ##open
genoDataList <- list()

I get an error in the following loop when accessing elements:

for (anc in c("nat","afr")){ ##iterate using anc variable
  ##get dosage, e.g. dosage_afr
  gdsr <- GdsGenotypeReader(tempgds, genotypeVar=paste0("dosage_", anc))
  genoDataList[[anc]] <- GenotypeData(gdsr, scanAnnot=scanAnnot)
}

closefn.gds(tempgds) ##close

The error is:

Error in validObject(.Object) :
invalid class “GdsGenotypeReader” object: variable snp.chromosome has incorrect dimension

Also, I don't know how to get the snp.id or chromosome attributes from the object once it is created.

print(gdsfile, attribute=TRUE)
This line prints all SNP IDs but does not print the position or chromosome.

Thanks!
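The GWASTools error most likely arises because snp.chromosome is stored as a single value (chr) while snp.id holds one entry per SNP; this sketch assumes GdsGenotypeReader expects snp.chromosome to have the same length as snp.id. It also separates node values (read with read.gdsn) from node attributes (written with put.attr.gdsn, read with get.attr.gdsn). The SNP IDs, positions, chromosome and the attribute "note" are hypothetical toy values.

```r
library(gdsfmt)

# Hypothetical toy values standing in for df.ceu[, 1],
# jointed_rschrpos$POS and chr from the question
snp.id  <- c("rs101", "rs102", "rs103")
snp.pos <- c(10177L, 10235L, 10352L)
chr     <- 22L

fn <- tempfile(fileext = ".gds")
f <- createfn.gds(fn)
add.gdsn(f, "snp.id", snp.id)
add.gdsn(f, "snp.position", snp.pos)
# Store one chromosome value per SNP rather than a single scalar
add.gdsn(f, "snp.chromosome", rep(chr, length(snp.id)))
# Attributes are metadata attached to a node, separate from its values
put.attr.gdsn(index.gdsn(f, "snp.chromosome"), "note", "example attribute")
closefn.gds(f)

# Reopen read-only and pull values and attributes back out
f <- openfn.gds(fn)
ids   <- read.gdsn(index.gdsn(f, "snp.id"))
chrs  <- read.gdsn(index.gdsn(f, "snp.chromosome"))
attrs <- get.attr.gdsn(index.gdsn(f, "snp.chromosome"))
closefn.gds(f)
```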

assign.gdsn does not work in gdsfmt_1.5.4

library(gdsfmt)

f <- createfn.gds("test.gds")

n1 <- add.gdsn(f, "n1", 1:100)
n2 <- add.gdsn(f, "n2", storage="int", valdim=c(20, 0))

assign.gdsn(n2, n1, append=TRUE)

read.gdsn(n1)
read.gdsn(n2)

shows:

integer(0)

Cannot create a GDS file that is already open

Thanks for your work; gdsfmt is great for handling large data.
I am writing a program to efficiently store a very large BED file (a >19G gzipped file holding 3 billion site scores across the whole genome). In GDS format (storage=packedreal16, compress=LZ4_RA) it shrinks to 3G, which is great.

However, how can I close a file whose handle I have lost, without restarting the R session? For example:

foo <- function(){
createfn.gds("test.gds")
}

foo()

a <- createfn.gds("test.gds")
I got this:

The file '/home/liqg/rpkg/gdsfmt/test.gds' has been created or opened.

This also happens when using openfn.gds.

Being able to close a lost file handle is also important when an error occurs.

Thanks.
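A minimal sketch of closing lost handles without restarting R, assuming a gdsfmt version whose showfile.gds() accepts the closeall argument: it lists the files the session still has open and, with closeall=TRUE, closes all of them. The function foo and the node "x" are illustrative only.

```r
library(gdsfmt)

fn <- tempfile(fileext = ".gds")

foo <- function() {
  # The handle returned by createfn.gds() is dropped here
  createfn.gds(fn)
  invisible(NULL)
}
foo()

# List the GDS files this R session still has open
showfile.gds()

# Close every open GDS file, including ones whose handles were lost
showfile.gds(closeall = TRUE)

# Creating the file again should no longer report it as already open
a <- createfn.gds(fn)
add.gdsn(a, "x", 1:3)
closefn.gds(a)
```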

Stream read error bug

Dear Xiuwen,

I identified a bug that leads to the error "Stream read error" when manipulating GDS objects. I have to open and close a GDS file several times in succession, sometimes adding new nodes and sometimes modifying existing ones (using the replace = TRUE option in add.gdsn). At some point, I get "Stream read error" when I try to re-open the GDS file: Error in openfn.gds(gds_file, FALSE, TRUE, TRUE) : Stream read error. I tried using delete.gdsn and add.gdsn instead of the replace option, but that did not fix the error. I also tried calling sync.gds right after adding or modifying a node, without success.

A reproducible example, using the official Docker R image and the latest gdsfmt version from GitHub, is in my repository https://github.com/claudiaQB/gdsfmt_bugstream.git.
To run this example, just type:

git clone https://github.com/claudiaQB/gdsfmt_bugstream.git
cd gdsfmt_bugstream
make

Thank you for your help.

R crashes when creating a gds in forked process

Rscript -e 'library(parallel);library(gdsfmt);fn <- tempfile(); res <- mccollect(mcparallel({gds <- createfn.gds(fn); add.gdsn(gds, "i", 1) })); print(res)'

produces:

$`695`

 *** caught segfault ***
address 0x66, cause 'memory not mapped'

Traceback:
 1: .Call(gdsNodeValid, x)
 2: print.gdsn.class(c(31598528L, 0L, 0L, 0L))
 3: print(c(31598528L, 0L, 0L, 0L))
 4: print.default(res)
 5: print(res)
aborting ...
Segmentation fault (core dumped)

Tested with R 3.1.1 and R 3.0.2 on Ubuntu and Debian, with gdsfmt 1.1.0.1.

INSTALL fails on F34 with R 4.1.0

The package fails to install on Fedora 34 with gcc/g++ 11.1.1 and R 4.1.0. Attached check log and zipped install outfile:
00check.log
00install.zip
I do not use or need the package, but it is used by a package that I check regularly in a reverse dependency chain (I haven't identified which). I simply ran BiocManager::install(version="3.13", force=TRUE), and gdsfmt was the only failure. I wonder whether bundling libraries is a good idea when a simple configure.ac could find installed copies that would probably work.

uncompressing node results in loss of data

I had a GDS file with a node compressed with ZIP_RA. When I uncompressed the node to write new values to it, the original data was lost. An example:

> gdsfile <- tempfile()
> gds <- createfn.gds(gdsfile)
> add.gdsn(gds, "x", letters, compress="ZIP_RA", closezip=TRUE)
> closefn.gds(gds)
> gds <- openfn.gds(gdsfile, readonly=FALSE)
> gds
File: /private/var/folders/sr/1znj0x853fb_hg5yyvwpdwg0000136/T/RtmpsKFeb2/filee46463420c84 (372 bytes)
+    [  ]
|--+ x   { VStr8 26 ZIP_RA(123.08%), 64 bytes }
> node <- index.gdsn(gds, "x")
> compression.gdsn(node, "")
+ x   { VStr8 26, 0 byte }
> read.gdsn(node)
Error in read.gdsn(node) : Stream read error
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gdsfmt_1.5.16

loaded via a namespace (and not attached):
[1] compiler_3.2.2 tools_3.2.2   

A bug in add.gdsn (gdsfmt_1.1.0)

We have observed that adding a very large dataset with add.gdsn can cause the following error:

Error in add.gdsn(gfile, "snp.id", snp.id, compress = compress.annotation,  : 
  Invalid Zip Deflate Stream operation 'Seek'!

Immediate workaround: downgrade gdsfmt from v1.1.0 to v1.0.4:

install.packages("gdsfmt", repos="http://bioconductor.org/packages/release/extra")

Patched version: v1.1.1.

STEAM: Error: is.character(gds.fn) is not TRUE

Dear authors,

I'm running into an error with the gdsfmt library's data structure. I'd like to use the STEAM library to estimate the threshold for local admixture analysis.
I've opened an issue on their repository, but I think the issue is relevant to this library. Link to the issue: GrindeLab/STEAM#1

Any help would be appreciated. Thanks.

install error: version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference

Hi,
When I install gdsfmt, an error occurred as below:
installing to /home/LL/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-gdsfmt/00new/gdsfmt/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘gdsfmt’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/LL/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-gdsfmt/00new/gdsfmt/libs/gdsfmt.so':
/home/LL/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-gdsfmt/00new/gdsfmt/libs/gdsfmt.so: symbol _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev, version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference
Error: loading failed
Execution halted
ERROR: loading failed

How can I solve this problem? I have tried many things, without success.
Thank you very much

In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, : Missing characters are converted to "".

Hi, when I try to add annotations to my gds file, I get the warnings below.
Could you help me solve this?

Warning messages:
1: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
2: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
3: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
4: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
5: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
6: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
7: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
8: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".
9: In add.gdsn(ans, nm[i], val[[i]], compress = compress, closezip = closezip, :
Missing characters are converted to "".

My command is:
Rscript 0.2.3gds2agds.R 1

and the R script is:
##########################################################################

# Input

##########################################################################

## gds file

dir_geno <- "/xxx/variants/rare_variant/GDS_file/"
gds_file_name_1 <- "phenotype.chr"
gds_file_name_2 <- ".2802.mac1.filt.gds"

## annotation file (output of Annotate.R)

dir_anno <- "/xxx/variants/rare_variant/Anno/"
anno_file_name_1 <- "Anno_chr"
anno_file_name_2 <- "_STAARpipeline.csv"

chr <- as.numeric(commandArgs(TRUE)[1])

###########################################################################

# Main Function

###########################################################################

## load required packages

library(gdsfmt)
library(SeqArray)
library(SeqVarTools)
library(readr)

## read annotation data

FunctionalAnnotation <- read_csv(paste0(dir_anno,"chr",chr,"/",anno_file_name_1,chr,anno_file_name_2),
col_types=list(col_character(),col_double(),col_double(),col_double(),col_double(),
col_double(),col_double(),col_double(),col_double(),col_double(),
col_character(),col_character(),col_character(),col_double(),col_character(),
col_character(),col_character(),col_character(),col_character(),col_double(),
col_double(),col_character()))

dim(FunctionalAnnotation)

## rename colnames

colnames(FunctionalAnnotation)[2] <- "apc_conservation"
colnames(FunctionalAnnotation)[7] <- "apc_local_nucleotide_diversity"
colnames(FunctionalAnnotation)[9] <- "apc_protein_function"

## open GDS

gds.path <- paste0(dir_geno,gds_file_name_1,chr,gds_file_name_2)
genofile <- seqOpen(gds.path, readonly = FALSE)

Anno.folder <- index.gdsn(genofile, "annotation/info")
add.gdsn(Anno.folder, "FunctionalAnnotation", val=FunctionalAnnotation, compress="LZMA_ra", closezip=TRUE)

seqClose(genofile)

Error if a factor variable has no levels

library(gdsfmt)

f <- createfn.gds("t.gds")

val <- factor(rep(NA, 10))
add.gdsn(f, "val", val)
Error in put.attr.gdsn(ans, "R.levels", levels(val)) : 
  The length of values should be > 0.
Calls: add.gdsn -> put.attr.gdsn
Execution halted

gdsSubset and other GDS object interactions are not practical for loops/parallel code

To run gdsSubset on a file, the file cannot be open, which is very impractical downstream when I need information from it, for example sample IDs.

# prune to the relevant biomarker overlap
gds <- GdsGenotypeReader(gds_fn)
sample.sel <- getScanID(gds)

# for each biomarker make grm
lapply(bios, function(i) {
  # intersection for variable of interest
  both <- intersect(a[DS1V == 0 & !is.na(get(i)), PIDN], sample.sel)
  x <- tempfile()
  gdsSubset(gds_fn, x, sample.include = both)

  # run PC-Relate
  x <- GenotypeBlockIterator(GenotypeData(GdsGenotypeReader(x)))
  mypcrelate <- pcrelate(x, 
    pcs = mypcair$vectors[both, 1:8], 
    training.set = intersect(mypcair$unrels, both),
    BPPARAM=BiocParallel::SerialParam()
  )

  # write pcrelate RDS and GRM to file
  saveRDS(mypcrelate, file = paste("mypcrelate", i, "rds", sep = "."))
})

This fails unless I close the object first, and it would be a major bottleneck if I had to access elements inside the gds object within the lapply, since the file would be opened and closed on every iteration. It also makes parallelizing the loop with something like future_lapply impossible, because the file can only be open once. Is there a reason the file has to be locked?

Trouble installing on HPC cluster

Update 03/29/2021: I was able to install the package with the following commands:
wget --no-check-certificate https://github.com/zhengxwen/gdsfmt/tarball/master -O gdsfmt_latest.tar.gz
R CMD INSTALL gdsfmt_latest.tar.gz
I hope this helps anyone having the same issue.

Dear Dr. Zheng,

I was able to install the package "gdsfmt" on my own computer but was not able to do so on our university's computing cluster. I have enclosed the error messages:

    * installing *source* package 'gdsfmt' ...
    ** using staged installation
    ** libs
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c R_CoreArray.cpp -o R_CoreArray.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c gdsfmt.cpp -o gdsfmt.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c digest.cpp -o digest.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/CoreArray.cpp -o CoreArray/CoreArray.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dAllocator.cpp -o CoreArray/dAllocator.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dAny.cpp -o CoreArray/dAny.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dBase.cpp -o CoreArray/dBase.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dBitGDS.cpp -o CoreArray/dBitGDS.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dEndian.cpp -o CoreArray/dEndian.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dFile.cpp -o CoreArray/dFile.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dParallel.cpp -o CoreArray/dParallel.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dParallel_Ext.c -o CoreArray/dParallel_Ext.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dPlatform.cpp -o CoreArray/dPlatform.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dRealGDS.cpp -o CoreArray/dRealGDS.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dSerial.cpp -o CoreArray/dSerial.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dStrGDS.cpp -o CoreArray/dStrGDS.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dStream.cpp -o CoreArray/dStream.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dStruct.cpp -o CoreArray/dStruct.o
    g++ -std=gnu++11 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/dVLIntGDS.cpp -o CoreArray/dVLIntGDS.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/adler32.c -o ZLIB/adler32.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/compress.c -o ZLIB/compress.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/crc32.c -o ZLIB/crc32.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/deflate.c -o ZLIB/deflate.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/infback.c -o ZLIB/infback.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/inffast.c -o ZLIB/inffast.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/inflate.c -o ZLIB/inflate.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/inftrees.c -o ZLIB/inftrees.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/trees.c -o ZLIB/trees.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/uncompr.c -o ZLIB/uncompr.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c ZLIB/zutil.c -o ZLIB/zutil.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c LZ4/lz4.c -o LZ4/lz4.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c LZ4/lz4hc.c -o LZ4/lz4hc.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c LZ4/lz4frame.c -o LZ4/lz4frame.o
    gcc -std=gnu99 -I"/nfs/apps/R/3.6.0/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c LZ4/xxhash.c -o LZ4/xxhash.o
    cd XZ && tar -xzf xz-5.2.3.tar.gz
    cd XZ/xz-5.2.3 &&
    ./configure CC="gcc -std=gnu99" CPP="gcc -std=gnu99 -E" CXX="g++ -std=gnu++11" CXXCPP="g++ -std=gnu++11 -E" --build=""
    --with-pic --enable-silent-rules --quiet --disable-xz > /dev/null
    cd XZ/xz-5.2.3/src/liblzma && make
    make[1]: Entering directory `/tmp/269292.1.int.q/RtmpZXKb9P/R.INSTALL85751e980379/gdsfmt/src/XZ/xz-5.2.3/src/liblzma'
    Making all in api
    make[2]: Entering directory `/tmp/269292.1.int.q/RtmpZXKb9P/R.INSTALL85751e980379/gdsfmt/src/XZ/xz-5.2.3/src/liblzma/api'
    make[2]: Nothing to be done for `all'.
    make[2]: Leaving directory `/tmp/269292.1.int.q/RtmpZXKb9P/R.INSTALL85751e980379/gdsfmt/src/XZ/xz-5.2.3/src/liblzma/api'
    make[2]: Entering directory `/tmp/269292.1.int.q/RtmpZXKb9P/R.INSTALL85751e980379/gdsfmt/src/XZ/xz-5.2.3/src/liblzma'
    CC liblzma_la-tuklib_physmem.lo
    CC liblzma_la-tuklib_cpucores.lo
    CC liblzma_la-common.lo
    CC liblzma_la-block_util.lo
    CC liblzma_la-easy_preset.lo
    CC liblzma_la-filter_common.lo
    CC liblzma_la-hardware_physmem.lo
    CC liblzma_la-index.lo
    CC liblzma_la-stream_flags_common.lo
    CC liblzma_la-vli_size.lo
    CC liblzma_la-hardware_cputhreads.lo
    CC liblzma_la-alone_encoder.lo
    CC liblzma_la-block_buffer_encoder.lo
    CC liblzma_la-block_encoder.lo
    CC liblzma_la-block_header_encoder.lo
    CC liblzma_la-easy_buffer_encoder.lo
    CC liblzma_la-easy_encoder.lo
    CC liblzma_la-easy_encoder_memusage.lo
    CC liblzma_la-filter_buffer_encoder.lo
    CC liblzma_la-filter_encoder.lo
    CC liblzma_la-filter_flags_encoder.lo
    CC liblzma_la-index_encoder.lo
    CC liblzma_la-stream_buffer_encoder.lo
    CC liblzma_la-stream_encoder.lo
    CC liblzma_la-stream_flags_encoder.lo
    Hangup
    make[2]: *** wait: No child processes. Stop.
    make[2]: *** Waiting for unfinished jobs....
    make[2]: *** wait: No child processes. Stop.
    make[1]: *** wait: No child processes. Stop.
    make[1]: *** Waiting for unfinished jobs....
    make[1]: *** wait: No child processes. Stop.
    make: *** wait: No child processes. Stop.
    make: *** Waiting for unfinished jobs....
    make: *** wait: No child processes. Stop.

Update 03/29/2021:
I switched to R 4.0.3 and still could not install the package. The error messages are as follows:

    * installing *source* package 'gdsfmt' ...
    ** using staged installation
    ** libs
    g++ -std=gnu++11 -I"/nfs/apps/R/4.0.3/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c R_CoreArray.cpp -o R_CoreArray.o
    g++ -std=gnu++11 -I"/nfs/apps/R/4.0.3/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c gdsfmt.cpp -o gdsfmt.o
    g++ -std=gnu++11 -I"/nfs/apps/R/4.0.3/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c digest.cpp -o digest.o
    g++ -std=gnu++11 -I"/nfs/apps/R/4.0.3/lib64/R/include" -DNDEBUG -DUSING_R -D_FILE_OFFSET_BITS=64 -I../inst/include -ICoreArray -I/usr/local/include -fpic -g -O2 -c CoreArray/CoreArray.cpp -o CoreArray/CoreArray.o
    make: *** wait: No child processes. Stop.
    make: *** Waiting for unfinished jobs....
    make: *** wait: No child processes. Stop.

Do you happen to know what these messages mean, and how I might install this package?

Thanks a lot in advance!

Sincerely,

Ian

Check for local libs prior to building duplicates

The build system straightaway builds its own zlib etc., this is suboptimal: instead, configure should check for existing libs and build its own only when there are none found (or versions are too old, if there is a particular requirement).

How to combine two GDS files

If I create a gds file like so:

gfile <- createfn.gds("test1.gds")
add.gdsn(gfile,  input_stuff1_here...)
gfile2 <- createfn.gds("test2.gds")
add.gdsn(gfile2,  input_stuff2_here....)

How do I combine the two so that I have one GDS object/file containing both input_stuff1 and input_stuff2, i.e. the contents of both files?

Thanks!
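One way to merge, sketched under assumptions: copy each top-level node of the second file into the root folder of the first with copyto.gdsn(), assuming the node names do not clash and that folders are copied together with their children. The node names "a" and "b" are hypothetical stand-ins for input_stuff1 and input_stuff2.

```r
library(gdsfmt)

fn1 <- tempfile(fileext = ".gds")
fn2 <- tempfile(fileext = ".gds")

# Hypothetical contents standing in for input_stuff1 / input_stuff2
gfile <- createfn.gds(fn1)
add.gdsn(gfile, "a", 1:5)
gfile2 <- createfn.gds(fn2)
add.gdsn(gfile2, "b", c("x", "y", "z"))

# Copy each top-level node of the second file into the root of the first;
# a node with the same name must not already exist in gfile
for (nm in ls.gdsn(gfile2$root)) {
  copyto.gdsn(gfile$root, index.gdsn(gfile2, nm))
}

merged <- ls.gdsn(gfile$root)
b <- read.gdsn(index.gdsn(gfile, "b"))
closefn.gds(gfile)
closefn.gds(gfile2)
```

Copying into one of the existing files keeps the code short; writing both sets of nodes into a third, freshly created file works the same way if the originals must stay untouched.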

How to share a GDS file

Hi,
I have a GDS file on an HPC cluster. Two people (or processes) will analyze this file with GWASTools at the same time. It seems that a GDS file cannot be accessed while someone else has it open. How can I solve this? Thanks.
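A minimal sketch for concurrent reading, assuming an openfn.gds() that supports allow.duplicate: the "already opened" check appears to apply to files open in the same R session, so two read-only handles can coexist when the second is opened with allow.duplicate=TRUE. This assumes nobody writes to the file while it is being read; the file and node "x" are hypothetical.

```r
library(gdsfmt)

# Build a small file to share (hypothetical content)
fn <- tempfile(fileext = ".gds")
f <- createfn.gds(fn)
add.gdsn(f, "x", 1:10)
closefn.gds(f)

# Two read-only handles on the same file in one session
h1 <- openfn.gds(fn, readonly = TRUE)
h2 <- openfn.gds(fn, readonly = TRUE, allow.duplicate = TRUE)
x1 <- read.gdsn(index.gdsn(h1, "x"))
x2 <- read.gdsn(index.gdsn(h2, "x"))
closefn.gds(h1)
closefn.gds(h2)
```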

R crashes if print() is called on a previously deleted node

The call .Call(gdsNodeValid, x) in print.gdsn.class crashes the R session if x is a previously deleted node.

library(gdsfmt)
f <- createfn.gds("test.gds")
n <- add.gdsn(f, "vec", 1:10)
delete.gdsn(n, TRUE)
## THIS WILL CRASH YOUR R SESSION!!!
# print(n)
sessionInfo()
# R version 3.2.1 (2015-06-18)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Linux Mint LMDE
# 
# locale:
#     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#     
# attached base packages:
#     [1] stats     graphics  grDevices utils     datasets  methods   base     
#     
# other attached packages:
#     [1] gdsfmt_1.5.8
# 
# loaded via a namespace (and not attached):
#     [1] tools_3.2.1

append data to the variable in a compressed manner (has to decompress data first)

I have a GDS file created like this (with gdsfmt v1.4.0):

testfile <- tempfile()
x <- createfn.gds(testfile)
add.gdsn(x, "sample.id", storage="integer", valdim=0, compress="ZIP.max")
append.gdsn(index.gdsn(x, "sample.id"), val=1)
readmode.gdsn(index.gdsn(x, "sample.id"))
closefn.gds(x)

Now I want to append more data to "sample.id". So I tried opening the file and uncompressing the node before writing, as recommended in the man page for "readmode.gdsn":

x <- openfn.gds(testfile, readonly=FALSE)
compression.gdsn(index.gdsn(x, "sample.id"), "")
append.gdsn(index.gdsn(x, "sample.id"), val=2)

I get this error:

Error in append.gdsn(index.gdsn(x, "sample.id"), val = 2) :
 Read-only and please call 'compression.gdsn(node, "")' before writing.

Do I need to do something else to switch this node out of read-only mode?

thanks,
Stephanie
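A workaround sketch that avoids appending to a compressed node altogether: read the existing values, then rewrite the whole node with add.gdsn(..., replace=TRUE). It assumes the node fits in memory and uses ZIP_RA in place of ZIP.max; whether newer gdsfmt versions allow appending directly after compression.gdsn(node, "") is not settled here.

```r
library(gdsfmt)

testfile <- tempfile(fileext = ".gds")
x <- createfn.gds(testfile)
add.gdsn(x, "sample.id", 1L, compress = "ZIP_RA", closezip = TRUE)
closefn.gds(x)

# Reopen for writing, read the old values, then rewrite the whole node
x <- openfn.gds(testfile, readonly = FALSE)
old <- read.gdsn(index.gdsn(x, "sample.id"))
add.gdsn(x, "sample.id", c(old, 2L), compress = "ZIP_RA",
         closezip = TRUE, replace = TRUE)
closefn.gds(x)

# Read back to confirm both values are present
x <- openfn.gds(testfile)
ids <- read.gdsn(index.gdsn(x, "sample.id"))
closefn.gds(x)
```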

adding metadata

Hi
could you please help me add metadata to a GDS file created from a VCF?
I have cryptic sample IDs and want to add a new node holding the real names, for example.
These names are stored in a vector of the same length as the 'sample.id' node.
I have not yet found example code for adding to an existing dataset.
Thanks in advance!

print(genofile, all=TRUE)
File: /data.gds (835.7M)
+    [  ] *
|--+ sample.id   { Str8 1135 ZIP_ra(40.0%), 2.2K }
|--+ snp.id   { Int32 3050272 ZIP_ra(34.6%), 4.0M }
|--+ snp.rs.id   { Str8 3050272 ZIP_ra(0.10%), 2.9K }
|--+ snp.position   { Int32 3050272 ZIP_ra(38.1%), 4.4M }
|--+ snp.chromosome   { Str8 3050272 ZIP_ra(0.10%), 5.8K }
|--+ snp.allele   { Str8 3050272 ZIP_ra(15.5%), 1.8M }
|--+ genotype   { Bit2 1135x3050272, 825.4M } *
\--+ snp.annot   [  ]
    |--+ qual   { Float32 3050272 ZIP_ra(0.32%), 38.1K }
    \--+ filter   { Str8 3050272 ZIP_ra(0.15%), 21.7K }
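A minimal sketch of adding such a node: open the file with readonly=FALSE, check that the new vector matches the length of sample.id (read from objdesp.gdsn()$dim), and add it at the root next to sample.id. The node name "sample.name", the sample IDs and the names are hypothetical; a small temporary file stands in for the VCF-derived one.

```r
library(gdsfmt)

# Stand-in for the VCF-derived file (hypothetical sample IDs)
fn <- tempfile(fileext = ".gds")
f <- createfn.gds(fn)
add.gdsn(f, "sample.id", c("S01", "S02", "S03"), compress = "ZIP_RA", closezip = TRUE)
closefn.gds(f)

# Hypothetical real names, in the same order as sample.id
real.names <- c("Alice", "Bob", "Carol")

genofile <- openfn.gds(fn, readonly = FALSE)
n.sample <- objdesp.gdsn(index.gdsn(genofile, "sample.id"))$dim
stopifnot(length(real.names) == n.sample)
# New node at the root, next to sample.id
add.gdsn(genofile, "sample.name", real.names, compress = "ZIP_RA", closezip = TRUE)
closefn.gds(genofile)

# Reopen read-only and read the new node back
genofile <- openfn.gds(fn)
stored <- read.gdsn(index.gdsn(genofile, "sample.name"))
closefn.gds(genofile)
```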

Opened GDS file can be garbage collected

Since gdsfmt_v1.23.9:

library(gdsfmt)

fc <- function()
{
    f <- createfn.gds("test.gds")
    node <- add.gdsn(f, name="list", val=list(x=c(1,2), y=c("T","B","C"), z=TRUE))
    # forget to close the file
}

fc()
showfile.gds()
#                      FileName ReadOnly State
#  1 /Documents/GitHub/test.gds    FALSE  open

gc()
showfile.gds() # no opened file
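Rather than relying on garbage collection, a function that opens a GDS file can close it deterministically with base R's on.exit(), which also runs if a later line throws an error. A minimal sketch reusing the list node from the example above; the function argument fn is illustrative.

```r
library(gdsfmt)

fc <- function(fn) {
  f <- createfn.gds(fn)
  # Registered immediately, so the file is closed even if a later line fails
  on.exit(closefn.gds(f))
  add.gdsn(f, name = "list", val = list(x = c(1, 2), y = c("T", "B", "C"), z = TRUE))
  invisible(NULL)
}

fn <- tempfile(fileext = ".gds")
fc(fn)

# The file was closed on exit, so it can be reopened straight away
g <- openfn.gds(fn)
top <- ls.gdsn(g$root)
closefn.gds(g)
```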

Bug in gdsfmt v1.1.1

When converting a very large dataset from .vcf to .gds using snpgdsVCF2GDS(), the following error was encountered:

Error:
FILE: eciton_SNPs_no_rep_10000min.vcf
LINE: 26729, COLUMN: 7, PASS
Invalid Zip Deflate Stream operation 'Seek'!

Thanks,

Max

Bug in unload.gdsn()

This segfaults or shows "Invalid stream header" with random access.

library(gdsfmt)
nm <- "genotype/@data"

fn <- system.file("extdata", "CEU_Exon.gds", package = "SeqArray")
f <- openfn.gds(fn)
(n <- index.gdsn(f, nm))

unload.gdsn(n); n

(n <- index.gdsn(f, nm))

Parse gds files in python

(This message has also been posted as an issue on the pygds repo)

Hello Dr. Zheng,

I was wondering whether pygds is still supported and whether it's compatible with the GDS file format and with modern versions of Python, NumPy, and gcc. I've been having trouble installing pygds. In particular, the command 'pip install git+git://github.com/CoreArray/pygds.git' generates the following error:

src/PyCoreArray.cpp: In function ‘bool pygds_init()’:
src/PyCoreArray.cpp:899:25: error: ‘NUMPY_IMPORT_ARRAY_RETVAL’ was not declared in this scope
899 | if (init() == NUMPY_IMPORT_ARRAY_RETVAL) return false;
| ^~~~~~~~~~~~~~~~~~~~~~~~~
error: command '/usr/lib64/ccache/gcc' failed with exit code 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-ffkgdaz9/setup.py'"'"'; file='"'"'/tmp/pip-req-build-ffkgdaz9/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-btac33gq/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/kyle/.local/include/python3.9/pygds Check the logs for full command output.

Is there any chance you know what may cause this?

Specifications:
OS: Mac 10.15.7
Python: 3.8.5
numpy: 1.21.2

the node name should not contain "/"

library(gdsfmt)

f <- createfn.gds("test.gds")
add.gdsn(f, "zzz/ff", 1:1000)
f

showing an error:

File: test.gds (145B)
+    [  ]
Error in inherits(node, "gdsn.class") : No such GDS node "zzz/ff"!
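Until names containing "/" are rejected, a hierarchy can be built explicitly: create the folder with addfolder.gdsn() and add the variable inside it, after which "/" works as a path separator in index.gdsn(). A minimal sketch reusing the names from the report:

```r
library(gdsfmt)

f <- createfn.gds(tempfile(fileext = ".gds"))
# Create the folder first, then add the variable inside it
folder <- addfolder.gdsn(f$root, "zzz")
add.gdsn(folder, "ff", 1:1000)

# "/" now works as a path separator when looking the node up
v <- read.gdsn(index.gdsn(f, "zzz/ff"))
closefn.gds(f)
```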

LZ4_RA.max does not work

library(gdsfmt)

v <- sample(c(0,1), 10000, replace=T)

f <- createfn.gds("t.gds")

n1 <- add.gdsn(f, "t1", v, compress="LZ4_RA", closezip=T)
n2 <- add.gdsn(f, "t2", v, compress="LZ4_RA.max", closezip=T)

f

showing:

File: /work/tmp/t.gds (84.3K)
+    [  ]
|--+ t1   { Float64 10000 LZ4_ra(6.90%), 5.4K }
\--+ t2   { Float64 10000 LZ4_ra(100.4%), 78.5K }

closefn.gds(f)

caught segfault when recompressing zero-length data

library(gdsfmt)

f <- createfn.gds("test.gds")
n <- add.gdsn(f, "i1", integer(), compress="ZIP_RA", closezip=TRUE)
closefn.gds(f)

f <- openfn.gds("test.gds", FALSE)
compression.gdsn(index.gdsn(f, "i1"), "LZMA_RA")
closefn.gds(f)

openfn.gds("test.gds")

showing:

 *** caught segfault ***
address 0xd00000062, cause 'memory not mapped'

Traceback:
 1: ls.gdsn(node, include.hidden = all)

Transpose an array by permuting its dimensions

Here is an example using apply.gdsn with a target GDS node:

library(gdsfmt)

# create a GDS file
f <- createfn.gds("test.gds")

(n1 <- add.gdsn(f, "array", val=array(1:120, dim=c(5,4,3,2))))
read.gdsn(n1)

showing:

, , 1, 1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

, , 2, 1
     [,1] [,2] [,3] [,4]
[1,]   21   26   31   36
[2,]   22   27   32   37
[3,]   23   28   33   38
[4,]   24   29   34   39
[5,]   25   30   35   40

, , 3, 1
     [,1] [,2] [,3] [,4]
[1,]   41   46   51   56
[2,]   42   47   52   57
[3,]   43   48   53   58
[4,]   44   49   54   59
[5,]   45   50   55   60

, , 1, 2
     [,1] [,2] [,3] [,4]
[1,]   61   66   71   76
[2,]   62   67   72   77
[3,]   63   68   73   78
[4,]   64   69   74   79
[5,]   65   70   75   80

, , 2, 2
     [,1] [,2] [,3] [,4]
[1,]   81   86   91   96
[2,]   82   87   92   97
[3,]   83   88   93   98
[4,]   84   89   94   99
[5,]   85   90   95  100

, , 3, 2
     [,1] [,2] [,3] [,4]
[1,]  101  106  111  116
[2,]  102  107  112  117
[3,]  103  108  113  118
[4,]  104  109  114  119
[5,]  105  110  115  120

Then apply aperm with perm=c(2,1,3) to each slice along the fourth dimension, which swaps the first two dimensions:

n1.1 <- add.gdsn(f, "permuting", storage="int", valdim=c(4,5,3,0))
apply.gdsn(n1, margin=4, FUN=aperm, as.is="gdsnode", target.node=n1.1,
    perm=c(2,1,3))

read.gdsn(n1.1)
closefn.gds(f)

showing:

, , 1, 1
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20

, , 2, 1
     [,1] [,2] [,3] [,4] [,5]
[1,]   21   22   23   24   25
[2,]   26   27   28   29   30
[3,]   31   32   33   34   35
[4,]   36   37   38   39   40

, , 3, 1
     [,1] [,2] [,3] [,4] [,5]
[1,]   41   42   43   44   45
[2,]   46   47   48   49   50
[3,]   51   52   53   54   55
[4,]   56   57   58   59   60

, , 1, 2
     [,1] [,2] [,3] [,4] [,5]
[1,]   61   62   63   64   65
[2,]   66   67   68   69   70
[3,]   71   72   73   74   75
[4,]   76   77   78   79   80

, , 2, 2
     [,1] [,2] [,3] [,4] [,5]
[1,]   81   82   83   84   85
[2,]   86   87   88   89   90
[3,]   91   92   93   94   95
[4,]   96   97   98   99  100

, , 3, 2
     [,1] [,2] [,3] [,4] [,5]
[1,]  101  102  103  104  105
[2,]  106  107  108  109  110
[3,]  111  112  113  114  115
[4,]  116  117  118  119  120
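As a sanity check that needs no GDS file, the same transposition can be reproduced in base R. The sketch below assumes nothing beyond base R: applying aperm(perm = c(2, 1, 3)) to every 3-D slice along the fourth margin, as apply.gdsn() does above, gives the same result as a single aperm() call with perm = c(2, 1, 3, 4) on the whole array.

```r
a <- array(1:120, dim = c(5, 4, 3, 2))

# Slice-by-slice version, mirroring apply.gdsn(margin=4, FUN=aperm, perm=c(2,1,3))
slices <- lapply(seq_len(dim(a)[4]),
                 function(l) aperm(a[, , , l], perm = c(2, 1, 3)))
b.slices <- array(unlist(slices), dim = c(4, 5, 3, 2))

# One-shot version on the full 4-D array
b.full <- aperm(a, perm = c(2, 1, 3, 4))

stopifnot(all(b.slices == b.full))

# Same values as the ", , 1, 1" block printed by read.gdsn(n1.1)
b.full[, , 1, 1]
```

Writing the slices into a target node, as in the apply.gdsn() call, avoids holding the whole permuted array in memory; the single aperm() call is only practical when the array fits in RAM.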

"Stream read error" occurs when storage="character" (variable-length string)

gdsfmt_1.3.1 has a recently identified bug.
It is fixed in gdsfmt_1.3.2; please install the latest version of gdsfmt.

library(gdsfmt)

gfile <- createfn.gds("test.gds")
node <- add.gdsn(gfile, "data", as.character(1:1000))
read.gdsn(node, start=300, count=1)

closefn.gds(gfile)

showing:

Error in read.gdsn(node, start = 300, count = 1) : Stream read error

However, the following code, which reads the whole node before the partial read, works with gdsfmt_1.3.1:

library(gdsfmt)

gfile <- createfn.gds("test.gds")
node <- add.gdsn(gfile, "data", as.character(1:1000))
read.gdsn(node)
read.gdsn(node, start=300, count=1)

closefn.gds(gfile)
