
echotabix's People

Contributors

bschilder

echotabix's Issues

Add tabix CLI interface

Originally used a wrapper for tabix via the CLI, which was faster but required user installation.

Rhtslib installs tabix for you automatically by compiling the C source code during R package installation.

The `sys` package may also be helpful here for creating more robust CLI wrappers.
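Below is a minimal sketch of such a CLI wrapper in base R. It assumes a `tabix` executable is already on the `PATH`. `tabix_cli_query()` is a hypothetical helper and is not part of echotabix. If stricter error handling is wanted, `sys::exec_internal()` could replace `system2()`.

```r
# Hypothetical helper: query a bgzipped, tabix-indexed file via the tabix CLI.
# Assumes a `tabix` executable (from htslib) is available on the PATH.
tabix_cli_query <- function(path, region, tabix = Sys.which("tabix")) {
  # Sys.which() returns "" when the executable cannot be found
  if (!nzchar(tabix)) {
    stop("tabix executable not found on the PATH.", call. = FALSE)
  }
  # `tabix -h <file> <region>` prints the header lines plus matching rows;
  # system2() does not quote arguments, so quote the path and region here
  out <- system2(tabix, args = c("-h", shQuote(path), shQuote(region)),
                 stdout = TRUE, stderr = TRUE)
  # With stdout = TRUE, a non-zero exit code is stored in attr(, "status")
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("tabix failed: ", paste(out, collapse = "\n"), call. = FALSE)
  }
  out
}
```

For example, `tabix_cli_query("IMPACT707_EUR_chr12.annot.bgz", "12:40000000-40100000")` would return the header and the rows in that region as a character vector. The file name and region are illustrative.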

Handle multiple chromosomes in one `query_granges`

It would be useful to automatically handle situations where `query_granges` spans multiple chromosomes. This could be done by adding an extra loop at the level of `query()` or `query_vcf()`/`query_table()`.

Otherwise, stuff like this can happen:

query_dat <- rbind(echodata::BST1[1:50, ],
                   echodata::LRRK2[1:50, ], fill = TRUE)
annot_dt <- echoannot::IMPACT_query(query_dat = query_dat,
                                    populations = "EUR")
testthat::expect_equal(dim(annot_dt), c(13, 1419))
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
 Error: scanTabix: '4' not present in tabix index
path: https://zenodo.org/record/7062238/files/IMPACT707_EUR_chr12.annot.bgz?download=1
index: https://zenodo.org/record/7062238/files/IMPACT707_EUR_chr12.annot.bgz.tbi?download=1
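In the example above, BST1 (chromosome 4) and LRRK2 (chromosome 12) are combined, but only the chr12 file is queried. This is why `'4'` is reported as missing from the index. The extra loop could look something like the minimal sketch below: split the query by chromosome, run one query per chromosome, and then row-bind the results. `query_by_chrom()` and the `query_one_chrom` callback are hypothetical. The callback stands in for `query_vcf()`/`query_table()`, and the chromosome column is assumed to be called `CHR`.

```r
# Hypothetical helper: run a per-chromosome query for each chromosome present
# in `query_dat`, then row-bind the results into a single table.
# `query_one_chrom(sub_dat, chrom)` is an assumed callback standing in for
# query_vcf()/query_table(); it is not an echotabix function.
query_by_chrom <- function(query_dat, query_one_chrom, chrom_col = "CHR") {
  # One data subset per chromosome, named by chromosome
  pieces <- split(query_dat, query_dat[[chrom_col]])
  # Query each chromosome separately (e.g. against its own tabix file)
  results <- Map(query_one_chrom, pieces, names(pieces))
  # unname() keeps the list names from leaking into row names
  do.call(rbind, unname(results))
}
```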

`convert()`: `force_new=FALSE` being ignored

Checked and the file it's being written to does indeed already exist:

x <- "IMPACT707/Annotations/IMPACT707_EAS_chr1.annot.gz"
file.exists(echotabix::construct_tabix_path(target_path = x))
out <- echotabix::convert(target_path = x,
                          chrom_col = "CHR",
                          start_col = "BP",
                          comment_char = "CHR",
                          force_new = FALSE)

and yet the file is still being reprocessed:

========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with zgrep.
( zgrep ^'CHR' .../IMPACT707_EAS_chr1.annot.gz; zgrep
    -v ^'CHR' .../IMPACT707_EAS_chr1.annot.gz | sort
    -k1,1n
    -k2,2n ) > .../file3ef1cb2ba03_sorted.tsv

Error in rm_tbi() : could not find function "rm_tbi"
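For reference, the behaviour expected from `force_new = FALSE` can be sketched as an early-return guard: if the tabix output already exists, return its path without reprocessing. `convert_if_needed()` and `convert_fn` are hypothetical names, and this is not echotabix's actual implementation.

```r
# Hypothetical guard: skip conversion when the tabix output already exists,
# unless the caller explicitly asks for a fresh file with force_new = TRUE.
convert_if_needed <- function(tabix_path, convert_fn, force_new = FALSE) {
  if (file.exists(tabix_path) && !isTRUE(force_new)) {
    message("Using existing tabix file: ", tabix_path)
    return(tabix_path)
  }
  # Otherwise (re)build the file with the supplied conversion function
  convert_fn(tabix_path)
}
```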

# Solution

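A very small but important fix was editing this line in `.Rbuildignore`. The regex was wrong. Because the `.` in `.*.tbi` is unescaped, it matches any character, so the pattern matched the `rm_tbi.R` file as well as `.tbi` files (which I do want to ignore). As a result, `rm_tbi.R` was left out of the built package, and echotabix was blind to the `rm_tbi()` function.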

`.*.tbi` --> `.*\.tbi$`
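You can check the difference between the two patterns directly in R. According to *Writing R Extensions*, `.Rbuildignore` entries are Perl-like regular expressions matched case-insensitively against paths relative to the package root. The `.tbi` path below is only an illustrative example.

```r
# Illustrative paths relative to the package root (the .tbi path is made up)
files <- c("R/rm_tbi.R", "inst/extdata/example.tsv.gz.tbi")

old_pattern <- ".*.tbi"     # unescaped "." matches any character, e.g. "_"
new_pattern <- ".*\\.tbi$"  # R string for .*\.tbi$ : literal ".tbi" at the end

# .Rbuildignore entries are Perl-like regexes matched case-insensitively
grepl(old_pattern, files, perl = TRUE, ignore.case = TRUE)
#> [1] TRUE TRUE
grepl(new_pattern, files, perl = TRUE, ignore.case = TRUE)
#> [1] FALSE  TRUE
```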

Reprex

query_dat <- echodata::BST1[seq(1, 50), ] 
locus_dir <- file.path(tempdir(), echodata::locus_dir)  
LD_list <- echoLD::get_LD(
    locus_dir = locus_dir,
    query_dat = query_dat,
    LD_reference = "1KGphase1")
 
Here is what I got:
LD_reference identified as: 1kg.
Using 1000Genomes as LD reference panel.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
LD Reference Panel = 1KGphase1
Querying 1KG remote server.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Explicit format: 'vcf'
Querying VCF tabix file.
Importing existing VCF file: /tmp/RtmpdpQfkr/VCF/RtmpdpQfkr.chr4-14884541-16649679.ALL.chr4.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.bgz

Session info

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] echotabix_0.99.10

loaded via a namespace (and not attached):
  [1] tidyselect_1.2.0            dplyr_1.1.3                 blob_1.2.4                 
  [4] filelock_1.0.2              R.utils_2.12.2              Biostrings_2.70.1          
  [7] bitops_1.0-7                fastmap_1.1.1               RCurl_1.98-1.13            
 [10] BiocFileCache_2.10.1        VariantAnnotation_1.48.0    GenomicAlignments_1.38.0   
 [13] XML_3.99-0.15               digest_0.6.33               lifecycle_1.0.4            
 [16] KEGGREST_1.42.0             RSQLite_2.3.3               magrittr_2.0.3             
 [19] compiler_4.3.1              rlang_1.1.2                 progress_1.2.2             
 [22] tools_4.3.1                 utf8_1.2.4                  yaml_2.3.7                 
 [25] data.table_1.14.8           rtracklayer_1.62.0          htmlwidgets_1.6.2          
 [28] prettyunits_1.2.0           S4Arrays_1.2.0              bit_4.0.5                  
 [31] curl_5.1.0                  reticulate_1.34.0           DelayedArray_0.28.0        
 [34] xml2_1.3.5                  pkgload_1.3.3               abind_1.4-5                
 [37] BiocParallel_1.36.0         purrr_1.0.2                 BiocGenerics_0.48.1        
 [40] R.oo_1.25.0                 grid_4.3.1                  stats4_4.3.1               
 [43] echoconda_0.99.9            fansi_1.0.5                 biomaRt_2.58.0             
 [46] SummarizedExperiment_1.32.0 cli_3.6.1                   crayon_1.5.2               
 [49] generics_0.1.3              rstudioapi_0.15.0           tzdb_0.4.0                 
 [52] httr_1.4.7                  rjson_0.2.21                piggyback_0.1.5            
 [55] DBI_1.1.3                   cachem_1.0.8                stringr_1.5.1              
 [58] zlibbioc_1.48.0             parallel_4.3.1              AnnotationDbi_1.64.1       
 [61] BiocManager_1.30.22         XVector_0.42.0              restfulr_0.0.15            
 [64] matrixStats_1.1.0           basilisk_1.14.0             vctrs_0.6.4                
 [67] Matrix_1.6-1.1              jsonlite_1.8.7              dir.expiry_1.10.0          
 [70] IRanges_2.36.0              hms_1.1.3                   S4Vectors_0.40.1           
 [73] bit64_4.0.5                 GenomicFeatures_1.54.1      tidyr_1.3.0                
 [76] glue_1.6.2                  codetools_0.2-19            DT_0.30                    
 [79] stringi_1.8.1               GenomeInfoDb_1.38.1         GenomicRanges_1.54.1       
 [82] BiocIO_1.12.0               tibble_3.2.1                pillar_1.9.0               
 [85] htmltools_0.5.7             basilisk.utils_1.14.0       rappdirs_0.3.3             
 [88] GenomeInfoDbData_1.2.11     BSgenome_1.70.1             R6_2.5.1          

GHA MacOS: `bgzip executable could be identified.`

Missing a system dependency? Might be able to circumvent this with one of the echotabix alternatives:
https://github.com/RajLabMSSM/echolocatoR/actions/runs/3357812148/jobs/5563951744#step:21:1

Run options(crayon.enabled = TRUE)
Loading required package: sessioninfo
── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file ‘.../DESCRIPTION’ ... OK
* preparing ‘echolocatoR’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Error: --- re-building ‘BD_GWAS.Rmd’ using rmarkdown
--- finished re-building ‘BD_GWAS.Rmd’
--- re-building ‘echolocatoR.Rmd’ using rmarkdown
Quitting from lines 85-95 (echolocatoR.Rmd) 
Error: Error: processing vignette 'echolocatoR.Rmd' failed with diagnostics:
bgzip executable could be identified.
--- failed re-building ‘echolocatoR.Rmd’
--- re-building ‘finemapping_portal.Rmd’ using rmarkdown
Downloading: https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/ASXL3/multi_finemap/ASXL3.UKB.multi_finemap.csv.gz
trying URL 'https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/ASXL3/multi_finemap/ASXL3.UKB.multi_finemap.csv.gz'
Content type 'application/octet-stream' length 157348 bytes (153 KB)
==================================================
downloaded 153 KB
Downloading: https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/BIN3/multi_finemap/BIN3.UKB.multi_finemap.csv.gz
trying URL 'https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/BIN3/multi_finemap/BIN3.UKB.multi_finemap.csv.gz'
Content type 'application/octet-stream' length 230370 bytes (224 KB)
==================================================
downloaded 224 KB
Downloading: https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/ASXL3/LD/ASXL3.UKB.LD.csv.gz
trying URL 'https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/ASXL3/LD/ASXL3.UKB.LD.csv.gz'
Content type 'application/octet-stream' length 66098 bytes (64 KB)
==================================================
downloaded 64 KB
Downloading: https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/BIN3/LD/BIN3.UKB.LD.csv.gz
trying URL 'https://github.com/RajLabMSSM/Fine_Mapping_Shiny/raw/master/www/data/GWAS/Nalls23andMe_2019/BIN3/LD/BIN3.UKB.LD.csv.gz'
Content type 'application/octet-stream' length 100237 bytes (97 KB)
==================================================
downloaded 97 KB
--- finished re-building ‘finemapping_portal.Rmd’
--- re-building ‘plot_locus.Rmd’ using rmarkdown
Failed with error:  'there is no package called 'pals''
Failed with error:  'there is no package called 'pals''
The magick package is required to crop "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/RtmpBvAYS5/Rbuildabab4d92368a/echolocatoR/vignettes/plot_locus_files/figure-html/trk_plot-1.png" but not available.
Failed with error:  'there is no package called 'pals''
Failed with error:  'there is no package called 'pals''
The magick package is required to crop "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/RtmpBvAYS5/Rbuildabab4d92368a/echolocatoR/vignettes/plot_locus_files/figure-html/modify track-1.png" but not available.
Failed with error:  'there is no package called 'pals''
Failed with error:  'there is no package called 'pals''
The magick package is required to crop "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/RtmpBvAYS5/Rbuildabab4d92368a/echolocatoR/vignettes/plot_locus_files/figure-html/trk_plot.xgr-1.png" but not available.
Failed with error:  'there is no package called 'pals''
Failed with error:  'there is no package called 'pals''
The magick package is required to crop "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/RtmpBvAYS5/Rbuildabab4d92368a/echolocatoR/vignettes/plot_locus_files/figure-html/trk_plot.QTL-1.png" but not available.
--- finished re-building ‘plot_locus.Rmd’
--- re-building ‘QTLs.Rmd’ using rmarkdown
[tabix] the index file exists. Please use '-f' to overwrite.
Failed with error:  'there is no package called 'seqminer''
Quitting from lines 74-83 (QTLs.Rmd) 
Error: Error: processing vignette 'QTLs.Rmd' failed with diagnostics:
there is no package called 'seqminer'
--- failed re-building ‘QTLs.Rmd’
--- re-building ‘summarise.Rmd’ using rmarkdown
The magick package is required to crop "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/RtmpBvAYS5/Rbuildabab4d92368a/echolocatoR/vignettes/summarise_files/figure-html/super_summary_plot()-1.png" but not available.
--- finished re-building ‘summarise.Rmd’
SUMMARY: processing the following files failed:
  ‘echolocatoR.Rmd’ ‘QTLs.Rmd’
Error: Error: Vignette re-building failed.
Execution halted
Error: Error in proc$get_built_file() : Build process failed
Calls: <Anonymous> ... build_package -> with_envvar -> force -> <Anonymous>
Execution halted
Error: Process completed with exit code 1.
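One way to work around a missing system `bgzip` is to look up the executable with `Sys.which()` and fall back to `Rsamtools::bgzip()` when it is absent. That function compresses using the htslib code bundled through Bioconductor (Rhtslib) rather than a system binary. This is a minimal sketch, not echotabix's actual code, and `bgzip_file()` is a hypothetical helper.

```r
# Hypothetical helper: bgzip-compress `path`, preferring the system bgzip and
# falling back to Rsamtools::bgzip() when no executable is found.
bgzip_file <- function(path, dest = paste0(path, ".gz"),
                       bgzip = Sys.which("bgzip")) {
  if (nzchar(bgzip)) {
    # `bgzip -c` writes compressed output to stdout, redirected to `dest`
    status <- system2(bgzip, args = c("-c", shQuote(path)), stdout = dest)
    if (status != 0) {
      stop("bgzip CLI failed with exit status ", status, call. = FALSE)
    }
  } else if (requireNamespace("Rsamtools", quietly = TRUE)) {
    # No system binary: use the htslib-based implementation in Rsamtools
    Rsamtools::bgzip(path, dest = dest, overwrite = TRUE)
  } else {
    stop("bgzip executable could not be identified and Rsamtools is not ",
         "installed.", call. = FALSE)
  }
  dest
}
```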
