Comments (10)
Hey @edg1983, reading BGEN files that are indexed with bgenix should work fine on cloud object storage.
You do not need to read the index itself; just read the BGEN file as you did before, but make sure the index file sits alongside the BGEN file at the same path.
Please provide more details about the environment you are running Glow in.
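A minimal local sketch of the convention described above. It assumes the bgenix naming scheme, in which the index for `data.bgen` is written as `data.bgen.bgi` next to it. The helper names are hypothetical, and `os.path` only works for local or mounted paths, not object-store URIs such as `s3://`.

```python
import os


def bgi_index_path(bgen_path: str) -> str:
    """Return where a bgenix index is expected for a BGEN file (assumed naming: <file>.bgi)."""
    return bgen_path + ".bgi"


def has_bgi_index(bgen_path: str) -> bool:
    """True if the expected .bgi file sits alongside the BGEN file (local paths only)."""
    return os.path.isfile(bgi_index_path(bgen_path))
```

For example, `has_bgi_index("/input/chr22.bgen")` (a hypothetical path) checks for `/input/chr22.bgen.bgi`.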
Hi,
I'm testing Glow on a local Spark standalone installation (we are especially interested in the GWAS pipeline), and everything else has worked fine so far. Essentially, I initialize a SparkSession with PySpark using local[24] as the master, plus additional packages for Delta and Glow.
I'm reading the BGEN file directly, as you suggested, but when a BGI index is present I get the error reported above, which suggests there is some issue reading the index file.
Ah, OK, thanks.
Here is the offending line of code:
It uses SQLite to access the index, but it cannot find the SQLite classes and driver on the classpath. Do you have the sqlite-jdbc JAR in your standalone Spark installation? Here is a similar issue on Stack Overflow:
https://stackoverflow.com/questions/16725377/no-suitable-driver-found-sqlite
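Because a bgenix .bgi index is itself a SQLite database, one way to separate "the index file is broken" from "the JVM cannot load the SQLite JDBC driver" is to open the index with Python's built-in `sqlite3` module. This sketch assumes the bgenix schema, which stores one row per variant in a table named `Variant`; the function name is hypothetical.

```python
import sqlite3
from pathlib import Path


def count_indexed_variants(bgi_path: str) -> int:
    """Open a .bgi file read-only and count rows in its (assumed) Variant table."""
    # mode=ro avoids silently creating an empty database if the path is wrong.
    uri = Path(bgi_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        (n,) = conn.execute("SELECT COUNT(*) FROM Variant").fetchone()
        return n
    finally:
        conn.close()
```

If this succeeds, the index itself is readable and the problem is on the Spark side; a `sqlite3.DatabaseError` or "no such table" error instead points at the index file.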
@edg1983, were you able to resolve this?
Glow depends on sqlite-jdbc 3.20.1.
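If the driver is missing, one option is to let Spark resolve it from Maven together with Glow when the session is built. This is a minimal sketch, assuming the Maven coordinates `org.xerial:sqlite-jdbc:3.20.1` (the version mentioned above) and `io.projectglow:glow-spark3_2.12:1.1.2` (the Glow artifact and version are assumptions; match them to your Spark and Scala versions). The helper names are hypothetical; `spark.jars.packages` is a standard Spark setting.

```python
from typing import Dict, Iterable


def glow_spark_conf(glow_version: str = "1.1.2",
                    sqlite_jdbc_version: str = "3.20.1",
                    extra_packages: Iterable[str] = ()) -> Dict[str, str]:
    """Build Spark settings that pull Glow and the SQLite JDBC driver from Maven."""
    packages = [
        f"io.projectglow:glow-spark3_2.12:{glow_version}",  # assumed artifact name
        f"org.xerial:sqlite-jdbc:{sqlite_jdbc_version}",
        *extra_packages,  # e.g. a Delta package, if you use one
    ]
    # spark.jars.packages takes comma-separated Maven coordinates.
    return {"spark.jars.packages": ",".join(packages)}


def apply_conf(builder, conf: Dict[str, str]):
    """Apply each setting to a SparkSession.builder-like object and return it."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this would be used roughly as `spark = apply_conf(SparkSession.builder.master("local[24]"), glow_spark_conf()).getOrCreate()` followed by `spark = glow.register(spark)`. `spark.jars.packages` only takes effect if it is set before the JVM starts, so it has to be applied before the first session is created.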
Hi!
Apologies for the late reply. In the end, I built a container with all the Spark and Python dependencies, and it works now!
Thanks!
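When dependencies are baked into an image rather than resolved at startup, a quick pre-flight check is whether the driver JAR actually landed on Spark's classpath. A minimal sketch, assuming extra JARs are copied into `$SPARK_HOME/jars` and the driver JAR keeps its Maven Central file name `sqlite-jdbc-<version>.jar`; the function is hypothetical and only inspects a local directory.

```python
import glob
import os
from typing import List, Optional


def find_sqlite_jdbc_jars(spark_home: Optional[str] = None) -> List[str]:
    """Return sqlite-jdbc JARs found in <spark_home>/jars (defaults to $SPARK_HOME)."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        return []
    pattern = os.path.join(spark_home, "jars", "sqlite-jdbc-*.jar")
    return sorted(glob.glob(pattern))
```

An empty result inside the container suggests the SQLite driver was not included in the image.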
Thanks @edg1983,
Can we work together to contribute this container back to Glow?
It should be straightforward, as we already have a container for running Glow on Databricks and a Docker Hub subscription.
This would benefit the community, as we have had requests to make it easier to run Glow in a container.
Hi, our main interest is using Glow to run the regenie GWAS algorithm at scale, using the Spark implementation provided in the GloWGR pipeline. So I've built a container based on the Data Mechanics Docker image for Spark (gcr.io/datamechanics/spark:3.1.2-hadoop-3.2.0-java-11-scala-2.12-python-3.8-dm16), with the additional JAR and Python dependencies for Glow added.
The idea is to use this Docker image to deploy the system at scale with Kubernetes, so that we can easily adapt it for local runs on our HPC as well as cloud runs on the UKBB RAP or other cloud platforms.
The image I've optimized so far is available on Docker Hub as edg1983/glowgr-spark:v1, and it packages Spark and Glow. Essentially, you can run any Python script containing a Glow analysis in standalone mode with something like the following command, where test.py is a script that then initializes the Spark config. (The command uses Singularity because we cannot run Docker on our HPC, but it uses the same image.)
singularity run \
--bind /your/output/path:/output \
--bind /your/input/path:/input \
--bind /path/to/python/script:/opt/application \
--bind /path/to/tmp/dir:/spark_tmp \
glowgr-spark_v1.sif \
driver --driver-memory 120G \
local:///opt/application/test.py
It has worked fine in my tests so far, and we are now working on making it run on Kubernetes. Feel free to test it further and let me know if this is of interest.
This is great, thanks. We would like to turn this into something anyone can use.
Do you have the Dockerfiles in a repo that I could look at, to see if we can contribute them back to Glow?
Thanks
This is the Dockerfile I'm using right now. Feel free to improve and/or redistribute it as long as my contribution is properly acknowledged.
Dockerfile_glow.zip
Thanks @edg1983, I'm working on a container here: #503
Could you please test projectglow/open-source-glow:1.1.2
https://hub.docker.com/r/projectglow/open-source-glow/tags
to see if it works the same as your container?
I've acknowledged you in the documentation.
Thanks!