Comments (10)

williambrandler commented on June 26, 2024

Hey @edg1983, reading BGEN files that are indexed with bgenix should work fine on cloud object storage.
You do not need to read the index itself, just the BGEN file as you did before, but make sure the index sits in the same directory as the BGEN file.

Please provide more details about the environment you are running Glow in.
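
As a rough illustration of the point about the index location, here is a minimal Python sketch. It assumes the bgenix convention of naming the index `<file>.bgen.bgi` and that Glow is already on the Spark classpath so the `bgen` data source resolves. The helper names (`expected_index_path`, `has_local_index`, `read_bgen`) are hypothetical and not part of Glow, and the existence check only covers local filesystems, not cloud object storage.

```python
import os

# bgenix writes its index next to the BGEN file, as <file>.bgen.bgi
BGI_SUFFIX = ".bgi"


def expected_index_path(bgen_path: str) -> str:
    """Return the path where a bgenix index for bgen_path is expected."""
    return bgen_path + BGI_SUFFIX


def has_local_index(bgen_path: str) -> bool:
    """Check whether the index sits next to a BGEN file on a local filesystem."""
    return os.path.isfile(expected_index_path(bgen_path))


def read_bgen(spark, bgen_path: str):
    """Read the BGEN file itself; the index path is never passed explicitly."""
    # Assumption: Glow is on the classpath, so format("bgen") is available.
    return spark.read.format("bgen").load(bgen_path)
```

Calling `has_local_index("/input/chr1.bgen")` before `read_bgen(spark, "/input/chr1.bgen")` is one way to confirm whether the indexed code path is expected to be taken; the file path here is only a placeholder.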

from glow.

edg1983 commented on June 26, 2024

Hi,
I'm testing Glow on a local Spark stand-alone installation (we are especially interested in the GWAS pipeline), and everything else has worked fine so far. Essentially, I initialize a SparkSession with pyspark using local[24] as master and additional packages for Delta and Glow.

I'm reading the BGEN file directly as you suggested, but when a BGI index is present I get the error reported above, from which it seems that there is some issue reading the index file.
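
For readers who want to reproduce a setup like the one described above, the following is a hedged Python sketch, not the exact configuration used in this thread. The artifact names `glow-spark3_2.12` and `delta-core_2.12` and the two Delta settings are the commonly documented ones; the versions are deliberately left as parameters because they must match your Spark and Scala build. The function names and the app name are hypothetical.

```python
from typing import Dict, List


def package_coordinates(glow_version: str, delta_version: str,
                        scala: str = "2.12") -> List[str]:
    """Build Maven coordinates for Glow and Delta (versions supplied by the caller)."""
    return [
        f"io.projectglow:glow-spark3_{scala}:{glow_version}",
        f"io.delta:delta-core_{scala}:{delta_version}",
    ]


def local_session_conf(cores: int, packages: List[str]) -> Dict[str, str]:
    """Spark settings for a local[N] master with extra packages and Delta enabled."""
    return {
        "spark.master": f"local[{cores}]",
        "spark.jars.packages": ",".join(packages),
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(conf: Dict[str, str]):
    """Create the SparkSession and register Glow's functions on it."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession
    import glow

    builder = SparkSession.builder.appName("glow-local")  # hypothetical app name
    for key, value in conf.items():
        builder = builder.config(key, value)
    return glow.register(builder.getOrCreate())
```

A call such as `build_session(local_session_conf(24, package_coordinates(glow_version, delta_version)))` would correspond to the `local[24]` master mentioned above, with the two version strings chosen to suit your installation.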

williambrandler commented on June 26, 2024

ah ok thanks,

Here is the offending line of code:

val dbi = new DBI(s"jdbc:sqlite:$localIdxPath")

It uses SQLite to access the index, but it cannot find the SQLite classes and driver on the classpath. Do you have the sqlite-jdbc jar in your stand-alone Spark installation? Here is a similar issue on Stack Overflow:

https://stackoverflow.com/questions/16725377/no-suitable-driver-found-sqlite
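
One quick way to answer the question about the jar is to look for it in Spark's bundled jars directory. The sketch below assumes a conventional layout where jars live under `$SPARK_HOME/jars` and the file is named `sqlite-jdbc-<version>.jar`. Jars resolved at runtime through `spark.jars.packages` are stored elsewhere and are not covered by this check. The function name is hypothetical.

```python
import glob
import os
from typing import List


def find_sqlite_jdbc_jars(jars_dir: str) -> List[str]:
    """Return sqlite-jdbc jar files found directly in jars_dir, sorted by name."""
    pattern = os.path.join(jars_dir, "sqlite-jdbc-*.jar")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at the stand-alone Spark installation.
    spark_home = os.environ.get("SPARK_HOME", "")
    jars = find_sqlite_jdbc_jars(os.path.join(spark_home, "jars")) if spark_home else []
    print(jars or "no sqlite-jdbc jar found under $SPARK_HOME/jars")
```

An empty result does not prove the driver is missing, but it is a useful first hint when the "no suitable driver" error appears.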

williambrandler commented on June 26, 2024

@edg1983 were you able to resolve this?

Glow depends on sqlite-jdbc 3.20.1
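
If the driver is not being picked up transitively, one possible workaround is to request it explicitly alongside the other packages. This is a sketch under that assumption, not a confirmed fix for this issue. `org.xerial:sqlite-jdbc` is the driver's Maven coordinate, the default version mirrors the one stated above, and the helper name `with_sqlite_jdbc` is hypothetical.

```python
SQLITE_JDBC_GROUP_ARTIFACT = "org.xerial:sqlite-jdbc"


def with_sqlite_jdbc(packages: str, version: str = "3.20.1") -> str:
    """Append the sqlite-jdbc coordinate to a comma-separated package list.

    The list is returned unchanged if a sqlite-jdbc entry is already present.
    """
    existing = [p for p in packages.split(",") if p]
    if any(p.startswith(SQLITE_JDBC_GROUP_ARTIFACT + ":") for p in existing):
        return ",".join(existing)
    return ",".join(existing + [f"{SQLITE_JDBC_GROUP_ARTIFACT}:{version}"])
```

The returned string can be used as the value of `spark.jars.packages` (or passed to `--packages`) when the session is created.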

edg1983 commented on June 26, 2024

Hi!
Apologies for the late reply... In the end I built a container with all the Spark and Python dependencies, and it works now!

Thanks!

williambrandler commented on June 26, 2024

thanks @edg1983,

Can we work together to contribute this container back to Glow?

It should be straightforward as we already have a container for running Glow in Databricks and a dockerhub subscription.

This would benefit the community as we have had requests to make it easier to run Glow in a container

edg1983 commented on June 26, 2024

Hi, our main interest is using Glow to run the regenie GWAS algorithm at scale using the Spark implementation provided in the GloWGR pipeline. So I've made a container based on the datamechanics Docker image for Spark (gcr.io/datamechanics/spark:3.1.2-hadoop-3.2.0-java-11-scala-2.12-python-3.8-dm16), extended with the additional jar and Python dependencies for Glow.

The idea is to use this Docker image to deploy the system at scale using Kubernetes, so that we can easily adapt it for local runs on our HPC as well as cloud runs on UKBB RAP or other cloud platforms.
The image I've optimized so far is available on DockerHub as edg1983/glowgr-spark:v1, and it packages Spark and Glow. Essentially, you can run any Python script containing a Glow analysis in stand-alone mode using something like the following command (it uses Singularity because we cannot run Docker on our HPC, but it's the same image) with a test.py script that then initializes the Spark config.

singularity run \
    --bind /your/output/path:/output \
    --bind /your/input/path:/input \
    --bind /path/to/python/script:/opt/application \
    --bind /path/to/tmp/dir:/spark_tmp \
    glowgr-spark_v1.sif \
    driver --driver-memory 120G \
    local:///opt/application/test.py

It worked fine in my tests so far and we are now working to make it run on kubernetes. Feel free to test it more and let me know if this can be of interest.

williambrandler commented on June 26, 2024

This is great, thanks. I would like to turn this into something anyone can use.

Do you have the Dockerfiles in a repo that I could look at, to see if we can contribute them back to Glow?

Thanks

edg1983 commented on June 26, 2024

This is the Dockerfile I'm using right now. Feel free to improve and/or re-distribute this as long as my contribution is properly acknowledged.
Dockerfile_glow.zip

williambrandler commented on June 26, 2024

thanks @edg1983, I'm working on a container here, #503

please could you test projectglow/open-source-glow:1.1.2
https://hub.docker.com/r/projectglow/open-source-glow/tags

to see if it works the same as your container?

Acknowledged you in the documentation

Thanks!
