
terra-docker's Introduction

This repo provides Docker images for running Jupyter notebooks in Terra.

Contributing

Make sure to go through the contributing guide as you make changes to this repo.

Terra Base Images

terra-jupyter-base

terra-jupyter-python

terra-jupyter-r

terra-jupyter-hail

terra-jupyter-gatk

terra-jupyter-bioconductor

How to create your own Custom image to use with notebooks on Terra

Custom docker images need to use a Terra base image (see above) in order to work with the service that runs notebooks on Terra.

  • You can use any of the base images above.
  • Here is an example of how to build off of a base image: add FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:0.0.1 to your Dockerfile (terra-jupyter-base is the smallest image you can extend from).
  • Customize your image (see the terra-jupyter-python Dockerfile for an example of how to extend from one of our base images).
  • Publish the image to either GCR or Docker Hub; the image must be public to be used (a minimal build-and-publish sketch follows this list).
  • Use the published container image location when creating a notebook runtime.
  • Docker Hub image example: [image name]:[tag]
  • GCR image example: us.gcr.io/repository/[image name]:[tag]
  • On 6/28/2021, we introduced a few changes that may impact building custom images:
    • The home directory of new images is /home/jupyter. If your Dockerfile references the /home/jupyter-user directory, update it to $HOME (recommended) or /home/jupyter.
    • Creating VMs with custom images takes much longer than with Terra-supported images because docker pull takes a few minutes. If the custom image ends up being too large, VM creation may time out; note that the new base images are much larger than previous versions.
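As a minimal end-to-end sketch (the account, image name, tag, and extra package below are placeholders and examples, not required values; the USER/PIP_USER pattern mirrors what the existing Terra Dockerfiles in this repo do):

# 1. Write a Dockerfile that extends a Terra base image
cat > Dockerfile <<'EOF'
FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:0.0.1
USER root
ENV PIP_USER=false
# example customization; install whatever your analysis needs
RUN pip3 install --no-cache-dir plotnine
ENV PIP_USER=true
USER $USER
EOF

# 2. Build and publish it somewhere public, e.g. Docker Hub (example names only)
docker build . -t your-dockerhub-account/my-terra-image:0.0.1
docker push your-dockerhub-account/my-terra-image:0.0.1

# 3. Use your-dockerhub-account/my-terra-image:0.0.1 as the custom image when creating the runtime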

Development

Using git secrets

Make sure git-secrets is installed:

brew install git-secrets

Ensure the git-secrets hooks are installed (if you use the rsync script to run locally, you can skip this step):

cp -r hooks/ .git/hooks/
chmod 755 .git/hooks/apply-git-secrets.sh
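
Optionally, you can sanity-check the setup by scanning the working tree with git-secrets (this assumes your prohibited patterns are already registered):

git secrets --scan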

Running/developing a smoke_test.ipynb file locally

Run your image locally with the repo directory mounted into the container. For example:

docker run -d -p <port_number>:8000 -v <your_local_path_to_the_repo>/terra-docker:/home/jupyter -it us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:test

Once the container is running, you should be able to access Jupyter at http://localhost:<port_number>/notebooks, navigate to the smoke test .ipynb file you're interested in, and run a cell. After you modify a smoke test .ipynb file, go to Cell -> All Output -> Clear to clear all outputs and keep the .ipynb files small.
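
If you prefer to clear outputs from the command line rather than through the Jupyter menu, nbconvert (if available in your environment) can do it in place; the notebook path below is a placeholder:

jupyter nbconvert --clear-output --inplace path/to/smoke_test.ipynb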

Generate New Image or Update Existing Image

Detailed documentation on how to integrate the terra-docker image with Leonardo can be found here

If you are adding a new image:

  • Create a new directory with the Dockerfile and a CHANGELOG.md.
  • Add the directory name (also referred to as the image name) as an entry to the image_data array in config/conf.json. For more info on what is needed for a new image, see the Config section below.
  • If you wish the image to be baked into our custom image, which makes the runtime load significantly faster (recommended), make a PR into the leonardo repo doing the following within the jenkins folder:
    • Add the image to the parameter list in the Jenkinsfile
    • Update the relevant prepare script in each subdirectory. Currently there is a prepare script for gce and dataproc.
    • It is recommended to add a test in the automation directory (automation/src/test/resources/reference.conf)
    • Add your image to the reference.conf in the automation directory. This will be the only place any future version updates to your image happen. This ensures, along with the test in the previous step, that any changes to the image are tested.
    • Run the GHA to generate the image, and add it to reference.conf in the http directory (http/src/main/resources/reference.conf)

If you are updating an existing image:

Testing your image manually

Build the image: run docker build [your_dir] -t [name].

docker build terra-jupyter-base -t terra-jupyter-base

If you're on an Apple Silicon (M1) machine and building an image from a locally built base image, replace the current FROM command with:

FROM --platform=linux/amd64 terra-jupyter-base
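
Alternatively, you can leave the Dockerfile unchanged and pass the target platform at build time instead (requires BuildKit/buildx):

docker build --platform=linux/amd64 terra-jupyter-base -t terra-jupyter-base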

It is not advised to run build.sh locally, as this will push to the remote docker repo and delete the image locally upon completion.

All images can be run locally. For example:

docker run --rm -it -p 8000:8000 us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:0.0.7

Then navigate a browser to http://localhost:8000/notebooks to access the Jupyter UI.

You can gain root access and open a bash terminal as follows:

docker run --rm -it -u root -p 8000:8000 --entrypoint /bin/bash us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:0.0.7

Running locally is convenient for quick development and exploring the image. However, it has some limitations compared to running through Terra. Namely:

  • there are no service account credentials when run locally
  • there are no environment variables like GOOGLE_PROJECT, WORKSPACE_NAME, WORKSPACE_BUCKET, etc when running locally
  • there is no workspace-syncing when run locally
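
If the code you're testing expects those environment variables, one partial workaround is to pass placeholder values yourself when running locally (the values below are illustrative only; this still does not provide real credentials or workspace syncing):

docker run --rm -it -p 8000:8000 \
  -e GOOGLE_PROJECT=my-test-project \
  -e WORKSPACE_NAME=my-test-workspace \
  -e WORKSPACE_BUCKET=gs://my-test-bucket \
  us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:0.0.7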

To launch an image through Terra, navigate to https://app.terra.bio or your BEE's UI, select a workspace, enter your new image in the "Custom Image" field, and click Create.

Automation Tests

There are automation tests for the various Docker images; please update the image hash for the relevant tests. You can run the build-terra-docker job to automatically create a PR from your branch if you manually specify versions.

Config

There is a config file located at config/conf.json that contains the configuration used by all automated jobs and build scripts that interface with this repo.

There is a top-level field, "spark_version", which must be updated if we update the Debian version used in the custom image. Currently it assumes 1.4.x: https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-1.4

There are some constants included, such as the tools supported by this repo. Of particular interest is the image_data array.

Each time you update or add an image, you will need to update the appropriate entry in this array:

{
    "name": "terra-jupyter-base", // the name of the image; it should correspond to the directory it is located in

    "base_label": "Minimal",      // the base name used in the UI for this image. This is appended with some information about the packages in this image.

    "tools": ["python"],          // the tools present in this image; see the top-level "tools" array for valid entries.
                                  // The significance of 'tools' is that there is expected to be an entry in the documentation specifying the version of each tool.
                                  // If you wish to add a tool, you will need to add a handler to the function get_doc_builder in generate_package_documentation.py.

    "packages": { "python": ["pandas"] }, // the packages that we wish to single out to display to the user at a later date.
                                  // The difference between a package and a tool is that a tool can have a set of packages associated with it (i.e. pip packages for python).
                                  // A package must have a tool associated with it.

    "version": "0.0.4",           // the current version of the image

    "automated_flags": {          // flags used as control flow for scripts

        "generate_docs": true,    // whether documentation should be auto-generated for this image. This is superseded by the build flag (i.e. if build=false, this flag is ignored).

        "build": true,            // whether the Jenkins job that builds the Docker images in this repo should build this image

        "include_in_custom_dataproc": true, // whether the Jenkins job that builds the custom Dataproc image should include this image.
                                            // This is superseded by the build flag.

        "include_in_ui": true,    // whether this image should be included in the .json file that powers the Terra UI dropdown of available images

        "include_in_custom_gce": true, // whether the Jenkins job that builds the custom GCE image should include this image.
                                       // This is superseded by the build flag.

        "requires_spark": true    // whether this image requires a Dataproc cluster to run (as opposed to most, which just need a GCE VM)
    }
},
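
For example, to bump an image's version you can edit config/conf.json by hand, or script it with a tool such as jq; this one-liner is only a sketch that assumes the structure shown above, with the image name and new version as placeholders:

jq '(.image_data[] | select(.name == "terra-jupyter-base") | .version) = "0.0.5"' config/conf.json > conf.json.tmp \
  && mv conf.json.tmp config/conf.json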

Scripts

The scripts folder has scripts used for building.

  • generate_package_docs.py This script is run once by build.sh each time an image is built. It is used to generate a .json with the versions for the packages in the image.
  • generate_version_docs.py This script is run each time an image is built. It builds a new master version file for the UI to look up the current versions to reference.

Image dependencies

Note that this dependency graph needs to be updated! [Image dependencies diagram]

Push Images to GCR

To push images to the Broad-managed Google Container Registry (gcr.io/broad-dsp-gcr-public), manually trigger the "Publish image to GCR" GitHub Action and choose the image to push.


terra-docker's Issues

Adding PLINKv1.9 and PLINKv2.0

Hello,

I'm requesting that the following two smaller software packages, PLINK v1.9 and PLINK v2.0, be added by default to an image available on Terra.

I believe installing both has merit: PLINK2 has functionality that PLINK1.9 does not, and multiple existing pipelines rely on PLINK1.9 formatting.

Thank you in advance!

Easy method to shrink size of `terra-jupyter-r`

The following line in the terra-jupyter-r Dockerfile is a problem:

&& chown -R $USER:users /usr/local/lib/R/site-library /home/jupyter

The chown operation itself is fine, but since it occurs at the end of the build, it actually copies the whole layer with all the R installs over again. This ends up being several GB.
For example, on a machine that has built terra-jupyter-r, try running docker history terra-jupyter-r and you will likely see something like

IMAGE          CREATED             CREATED BY                                      SIZE      COMMENT
47c09bb5034d   42 minutes ago      /bin/sh -c #(nop)  USER jupyter                 0B        
754d26234c45   42 minutes ago      |1 NCPU=16 /bin/sh -c R -e 'IRkernel::instal…   3.76GB    
...
250ffdaacc93   42 minutes ago      |1 NCPU=16 /bin/sh -c R -e 'BiocManager::ins…   3.76GB   
...

Changing the permissions on /usr/local/lib/R/site-library after all the installs ends up copying all the massive layers that went into that folder (all the package installs).

Proposal:
Move the chown on line 177

&& chown -R $USER:users /usr/local/lib/R/site-library /home/jupyter

up above all the R installs... perhaps to line 111.

terra-jupyter-r question about multi-core operations

Hello! This is a question that's going to be a little bit incomplete... but here it goes:

I have been basing an image off of the terra-jupyter-r:1.0.4 base image for use in Terra notebooks.

I just recently realized that, when I try to run commands like lmFit() in the limma package (for differential expression analysis), only one processor is being used. When I have set up my own R installations in the past, limma's differential expression testing has always automatically parallelized itself over all the available cores. Unfortunately I do not understand much about multi-core computing in R... but I have come to rely on that factor of 16 speedup I get on a 16-cpu machine.

My question:
Is there something about the R installation in terra-jupyter-r that changes a default somewhere for how multi-core operations happen? Something that might have broken the default behavior, which is to parallelize operations over all available cores?

(A bit more information: when I watched processes in top in the past, there were two steps where I would see multi-core operations: one would show a single process using 1600% CPU, and the other would show 16 processes each using 100% CPU. I am guessing they used different means of parallelizing their operations. But they both work on my own R installation, and they both fail (showing one process using 100% CPU) on terra-jupyter-r.
The working install shows

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS/LAPACK: /home/sfleming/miniconda3/envs/scanpy15/lib/libopenblasp-r0.3.7.so

terra-jupyter-r:1.0.4 shows

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

)
I am not sure what else to look for!

Any tips would be much appreciated!

terra-jupyter-base should preserve /home/jupyter contents and environment variables such as PATH

Currently, in images derived from terra-jupyter-base, it appears that the /home/jupyter directory is completely blown away after docker build and overwritten with a Terra persistent disk mount. Likewise, environment variables such as PATH set during docker build are not preserved. At least, this has been my experience while testing https://github.com/broadinstitute/gatk-workshop-terra-jupyter-image/blob/main/Dockerfile

The consequence of this is that there's no way to properly set up the jupyter user's environment and configuration during docker build. I had to resort to asking users of our image to run a script live in a terminal in their Terra workspace before loading their notebooks, in order to restore some necessary configuration to /home/jupyter

OpenVINO image can't detect GPUs

Hi @ravi9 @cavusmustafa,

A user brought this to my attention, thought I would run it by you. The current terra-jupyter-gatk-ovtf image downgrades tensorflow to 2.5.0 (the base image has 2.6.0 installed). It seems like tf 2.5.0 is not able to detect GPUs. Below is output from a runtime with 1 nvidia-tesla-t4 running the terra-jupyter-gatk-ovtf image:

[screenshot: GPU detection output from the terra-jupyter-gatk-ovtf runtime]

For comparison, here is output from a runtime running the terra-jupyter-gatk image:

[screenshot: GPU detection output from the terra-jupyter-gatk runtime]

Question: can we safely use tensorflow 2.6.0 in terra-jupyter-gatk-ovtf? If not, we might need to use a different base image.

Let me know what you think. Thanks!

update of Cromshell causes failed build in terra-jupyter-base

It looks like Cromshell was updated from alpha to beta, but this isn't changed yet in the Dockerfile of terra-jupyter-base.

This line of code will cause the failed build:
RUN cromshell-alpha version

If you change cromshell-alpha to cromshell-beta the build will succeed.

FR: Include igv-jupyter in terra-jupyter-base.

Please install igv-jupyter to render the IGV browser inside Terra notebook cells.

Currently, we require the user to do the following in a trusted notebook (igv-jupyter runs JS on the front end):

Cell 0:

! pip install igv-jupyter

Cell 1:

!jupyter serverextension enable --py igv
!jupyter nbextension install --py igv --user
!jupyter nbextension enable --py igv

Cell 2:

import igv
b = igv.Browser({"genome": "hg38"})
b.show()

For some reason, it's not possible to run the contents of cell 0 and cell 1 in a startup script; cloud environment creation throws an error saying that Jupyter did not start within 10 minutes.

Thanks very much!

FR: install the firecloud python library in the R image

Unless there is a recommended alternative method of accessing the firecloud API from R, please install the firecloud Python library. For R notebooks that access workspace data, the approach we have used in the past is to use reticulate to call into the Python library.

I can see that the Python Dockerfile includes:

&& pip3 install --ignore-installed firecloud==0.16.25 \

It would be great if the R Dockerfile did the same.

Installing reticulate in the R image would be helpful here as well.

Python: making it easier for users to override installed packages

When we try to override a system package (i.e. pip3 install --user pysam --force-reinstall) and import it, the current structure of sys.path doesn't allow us to import the new package directly, because the system folder ranks higher than the user folder:

['/home/jupyter-user/notebooks/whatever/edit',
 '/etc/jupyter/custom',
 '/usr/lib/python37.zip',
 '/usr/lib/python3.7',                                          <-------
 '/usr/lib/python3.7/lib-dynload', 
 '',
 '/home/jupyter-user/.local/lib/python3.7/site-packages',       <-------
 '/usr/local/lib/python3.7/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.7/dist-packages/IPython/extensions',
 '/home/jupyter-user/.ipython']

Therefore we have to prepend user site-packages folder to sys.path in order to import the new package:

import sys
sys.path.insert(0, '/home/jupyter-user/.local/lib/python3.7')

which leads to

['/home/jupyter-user/.local/lib/python3.7',
 '/home/jupyter-user/notebooks/whatever/edit',
 '/etc/jupyter/custom',
 '/usr/lib/python37.zip',
 '/usr/lib/python3.7',
 '/usr/lib/python3.7/lib-dynload',
 '',
 '/home/jupyter-user/.local/lib/python3.7/site-packages',
 '/usr/local/lib/python3.7/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.7/dist-packages/IPython/extensions',
 '/home/jupyter-user/.ipython']

Would it make sense to add the following line to the base image Dockerfile:

ENV PYTHONPATH $HOME/.local/lib/python3.7:${PYTHONPATH:+$PYTHONPATH}

to make it easier to override system packages?

As a use case, you can try to upgrade the pysam package, which is already installed in the default runtime, and import the updated version to reproduce the issue.

Setting default R package installation location

Hi,

I am a current user of Terra dockers for notebooks on the Terra website. I saw that R packages installed by users are by default placed in /home/jupyter/packages. This is a smart way to have user-installed packages remain on the persistent disk and also be readable by R.

I'm curious how this was done. Sifting through the base and R Dockerfiles, I can't seem to figure out where you all change the R installation directory.

Would you mind sharing how that was done?

All the best,
Sabrina
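
(Not an official answer, but for illustration: one common way to get this kind of behavior is to point R's user library at a directory on the persistent disk via an Renviron file, which, assuming the directory exists, makes it the first writable entry in .libPaths() and therefore the default install location. The file path below is hypothetical and may not match what the Terra images actually do.)

# hypothetical sketch only; run as root during the image build
echo 'R_LIBS_USER=/home/jupyter/packages' >> /usr/lib/R/etc/Renviron.site
# after this, install.packages() inside R would default to /home/jupyter/packages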

issues with `pip install` when building library from source for `terra-jupyter-gatk` img

We're seeing issues with pip install for the terra-jupyter-gatk image when it's necessary to build the library from source.

It appears that PIP_TARGET is set when docker exec is run on the GCE instance: https://github.com/DataBiosphere/leonardo/search?q=PIP_TARGET
Whenever we run pip, it interprets the environment variable PIP_TARGET as the value for the --home flag. pypa/pip#8438 (comment)
But whenever pip tries to install any packages from source, it passes the --prefix flag, and those flags cannot be used together.

By unsetting PIP_TARGET for an installation, we can work around the issue. e.g.:

!unset PIP_TARGET ; pip install -U docstring-parser==0.13

Nicole and I created a notebook to demo the issue: https://app.terra.bio/#workspaces/fc-product-demo/try-saturn/notebooks/launch/pip_install_issues.ipynb
The notebook html is attached in case it's useful.
pip_install_issues.html.zip

Other images may have this same issue, but we haven't tested yet.
Given the workaround, it's not urgent, but other customers are presumably hitting this as well. (I'll send an email with a bit more after the holidays).

/cc @deflaux

Running terra-jupyter-r locally on gcp vm

Hello Terra team,

I'm testing running the terra-jupyter images (particularly the r and python images) on my own gcp VM, as an alternative to running the images on Terra.

It all works reasonably well, with some additional entrypoint parameters. Here's what I use to run an image on a VM with Debian 11:

docker run -it -p 8000:8000 -v /home/$USER:/home/jupyter/data --entrypoint "/opt/conda/bin/jupyter" \
us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:2.1.10 lab --port=8000 --allow-root --ip=0.0.0.0 --no-browser

Jupyter lab then gets launched at port 8000, where I can log in with a token.

However, I am not able to write any file into my mounted volume data/. There's a permission issue, and attempts to sudo chown the directory run into the issue of requiring a password for root access, which I can't seem to figure out.

Any idea on how to proceed? The already-installed package list on the Terra images is very convenient for me, so getting this to work would be great.

New docker images not successfully launching as 'cloud environments'?

I'm test driving the new docker images: I tried us.gcr.io/broad-dsp-gcr-public/terra-jupyter-gatk:2.0.2 and us.gcr.io/broad-dsp-gcr-public/terra-jupyter-python:1.0.1 . When they're used, it doesn't seem that the 'Cloud Environment'/VM comes up successfully. (They error out after a long delay). I tested both with and without use of GPUs.

cmake not available but present in the Dockerfile?

Hi,

So I have this R pkg that has a dependency that needs cmake installed in the VM so that installation completes.

Essentially, I have been trying to make use of the Docker image provided by the Terra team that installs R, and I also checked that cmake is actually being installed in the VM here.

However, when I go to Terra and I init my Cloud Environment making use of the R image

[screenshot: Cloud Environment configuration using the R image]

In my workspace, making use of this Cloud Environment, I then face this annoying issue telling me that cmake is not available, and the installation of my package fails.

Thank you

Using Hail with dsub

Hi all, I am trying to use Hail via dsub to extract a subset of variants on the All of Us server. I think this is the most relevant image I can use: https://github.com/DataBiosphere/terra-docker/tree/master/terra-jupyter-hail

But it results in an error that pyspark is not found. I tried to install pyspark from https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop3.tgz. Now it says No FileSystem for scheme "gs".

May I ask, do you have any idea how to use Hail via dsub?
Your help is really appreciated!

Custom docker images a lot larger than expected

I have to preface this by saying I'm not very experienced with docker, and most of what I'm doing is stemming from this Terra community guide.

I am trying to update my custom Terra images (in particular, extending the r and bioconductor versions). I have done this successfully in the past, but haven't updated them in a while. So I decided to start from the latest images available, and just added some additional packages to the Bioconductor install list (or from CRAN) in the Dockerfiles.

My built images ended up being a lot larger than I expected (>15 GB), whereas my old versions were sitting at ~1 GB. I tried building one of these images directly from the Dockerfile, with no modification, and my image still ended up almost as large. Curiously, I noticed it's much larger than the corresponding image listed on the Broad GCR (for example: us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:2.0.0).

Am I missing something here? Was there a compression step that I might've missed? With images this large, it's obviously very difficult to push to Docker Hub, which was the way I had been accessing my images.

Inconsistent file permissions for /home/jupyter-user

It seems like all files under /home/jupyter-user should belong to jupyter-user, but some are owned by root

[1] "total 68"                                                               
 [2] "drwxr-xr-x  1 jupyter-user users 4096 Nov 19 11:01 ."                   
 [3] "drwxr-xr-x  1 root         root  4096 Oct 24 17:28 .."                  
 [4] "-rw-r--r--  1 jupyter-user users  220 Jan  1  2000 .bash_logout"        
 [5] "-rw-r--r--  1 jupyter-user users 3836 Oct 24 17:33 .bashrc"             
 [6] "drwxr-xr-x  1 root         root  4096 Oct 24 17:32 .cache"              
 [7] "drwxr-xr-x  2 root         root  4096 Oct 24 17:33 .conda"              
 [8] "drwxrwxr-x  3 jupyter-user users 4096 Nov 19 11:01 .config"             
 [9] "drwxr-xr-x  2 root         root  4096 Oct 24 17:34 .cookiecutter_replay"
[10] "drwxr-xr-x  3 root         root  4096 Oct 24 17:34 .cookiecutters"      
[11] "drwxr-xr-x  1 jupyter-user users 4096 Oct 24 17:34 .jupyter"            
[12] "drwxr-xr-x  1 jupyter-user users 4096 Oct 24 17:34 .local"              
[13] "drwxr-xr-x 16 root         root  4096 Oct 24 17:33 miniconda"           
[14] "drwxrwxrwx  6 root         root  4096 Nov 19 11:00 notebooks"           
[15] "drwxr-xr-x  3 root         root  4096 Oct 24 17:34 .npm"                
[16] "-rw-r--r--  1 jupyter-user users  807 Jan  1  2000 .profile"            
[17] "-rw-r--r--  1 jupyter-user users   37 Oct 24 17:59 .Renviron"           
[18] "drwxr-xr-x  1 jupyter-user users 4096 Nov 19 11:12 .rpackages"  

This causes problems when, for instance, an application tries to use a standard location ~/.cache/... as a file cache.

Issue installing hdf5r in images based on terra-jupyter-r:2.0.1

Hi all,

I am trying to get Seurat working in a Docker image derived from terra-jupyter-r:2.0.1. If I run the terra-jupyter-r:2.0.1 image

sudo docker run -it --rm --entrypoint /bin/bash -v ~/:/home/jupyter/drive:rw terra-jupyter-r:2.0.1

and try to do the following in R,

> library(Seurat)
Attaching SeuratObject
> Read10X_h5('drive/testing/hgmm_12k_raw_gene_bc_matrices_h5.h5')

then I get the error

Error in Read10X_h5("drive/testing/hgmm_12k_raw_gene_bc_matrices_h5.h5") : 
  Please install hdf5r to read HDF5 files

Seems simple enough, so I create the following Dockerfile:

FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:2.0.1

# as root
USER root
ENV PIP_USER=false

# additional R packages
RUN R -e "install.packages('hdf5r')"

But here I run into a big mess trying to build this image. Ultimately I get this error message from my docker build:

Error: package or namespace load failed for ‘hdf5r’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/site-library/00LOCK-hdf5r/00new/hdf5r/libs/hdf5r.so':
  libhdf5_hl.so.100: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/local/lib/R/site-library/hdf5r’

The downloaded source packages are in
	‘/tmp/RtmpVlLx2Z/downloaded_packages’
Warning message:
In install.packages("hdf5r") :
  installation of package ‘hdf5r’ had non-zero exit status

Similarly, if I try

FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:2.0.1

# as root
USER root
ENV PIP_USER=false

# additional R packages
RUN R -e "install.packages('hdf5r', configure.args = '--with-hdf5=/opt/conda/bin/h5cc')"

I get a similar error

> install.packages('hdf5r', configure.args = '--with-hdf5=/opt/conda/bin/h5cc')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/hdf5r_1.3.4.tar.gz'
Content type 'application/x-gzip' length 2218150 bytes (2.1 MB)
==================================================
downloaded 2.1 MB

* installing *source* package ‘hdf5r’ ...
** package ‘hdf5r’ successfully unpacked and MD5 sums checked
** using staged installation
checking for a sed that does not truncate output... /bin/sed
checking for gawk... no
checking for mawk... mawk
checking for grep that handles long lines and -e... /bin/grep
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking Using provided HDF5 C wrapper... /opt/conda/bin/h5cc
checking for HDF5 libraries... yes (version 1.10.6)
checking hdf5.h usability... no
checking hdf5.h presence... configure: WARNING: hdf5.h: present but cannot be compiled
configure: WARNING: hdf5.h:     check for missing prerequisite headers?
configure: WARNING: hdf5.h: see the Autoconf documentation
yes
configure: WARNING: hdf5.h:     section "Present But Cannot Be Compiled"
configure: WARNING: hdf5.h: proceeding with the compiler's result
configure: WARNING:     ## --------------------------------- ##
configure: WARNING:     ## Report this to [email protected] ##
configure: WARNING:     ## --------------------------------- ##
checking for hdf5.h... no
checking for H5Fcreate in -lhdf5... no
configure: WARNING: Unable to compile HDF5 test program
checking for hdf5_hl.h... no
checking for H5LTpath_valid in -lhdf5_hl... no
configure: WARNING: Unable to compile HDF5_HL test program
checking for main in -lhdf5_hl... no
checking for matching HDF5 Fortran wrapper... /opt/conda/bin/h5fc
Found hdf5 with version: 1.10.6
checking for ggrep... /bin/grep
checking whether /bin/grep accepts -o... yes
checking for ggrep... (cached) /bin/grep
checking whether /bin/grep accepts -o... yes
configure: creating ./config.status
config.status: creating src/Makevars
** libs
cp 1_10_3/*.c 1_10_3/*.h .
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c const_export.c -o const_export.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c datatype_export.c -o datatype_export.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5A.c -o Wrapper_auto_H5A.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5.c -o Wrapper_auto_H5.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5D.c -o Wrapper_auto_H5D.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5DS.c -o Wrapper_auto_H5DS.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5E.c -o Wrapper_auto_H5E.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5F.c -o Wrapper_auto_H5F.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5G.c -o Wrapper_auto_H5G.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5I.c -o Wrapper_auto_H5I.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5IM.c -o Wrapper_auto_H5IM.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5L.c -o Wrapper_auto_H5L.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5LT.c -o Wrapper_auto_H5LT.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5O.c -o Wrapper_auto_H5O.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5P.c -o Wrapper_auto_H5P.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5R.c -o Wrapper_auto_H5R.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5S.c -o Wrapper_auto_H5S.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5TB.c -o Wrapper_auto_H5TB.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5T.c -o Wrapper_auto_H5T.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5Z.c -o Wrapper_auto_H5Z.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5FDcore.c -o Wrapper_auto_H5FDcore.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5FDfamily.c -o Wrapper_auto_H5FDfamily.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5FDlog.c -o Wrapper_auto_H5FDlog.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5FDsec2.c -o Wrapper_auto_H5FDsec2.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_auto_H5FDstdio.c -o Wrapper_auto_H5FDstdio.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c convert.c -o convert.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c hdf5r_init.c -o hdf5r_init.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c H5Error.c -o H5Error.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c H5ls.c -o H5ls.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/opt/conda/include -I/opt/conda/include      -D__USE_MINGW_ANSI_STDIO   -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5XUBcI/r-base-4.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c Wrapper_manual_H5T.c -o Wrapper_manual_H5T.o
gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o hdf5r.so const_export.o datatype_export.o Wrapper_auto_H5A.o Wrapper_auto_H5.o Wrapper_auto_H5D.o Wrapper_auto_H5DS.o Wrapper_auto_H5E.o Wrapper_auto_H5F.o Wrapper_auto_H5G.o Wrapper_auto_H5I.o Wrapper_auto_H5IM.o Wrapper_auto_H5L.o Wrapper_auto_H5LT.o Wrapper_auto_H5O.o Wrapper_auto_H5P.o Wrapper_auto_H5R.o Wrapper_auto_H5S.o Wrapper_auto_H5TB.o Wrapper_auto_H5T.o Wrapper_auto_H5Z.o Wrapper_auto_H5FDcore.o Wrapper_auto_H5FDfamily.o Wrapper_auto_H5FDlog.o Wrapper_auto_H5FDsec2.o Wrapper_auto_H5FDstdio.o convert.o hdf5r_init.o H5Error.o H5ls.o Wrapper_manual_H5T.o -L/opt/conda/lib -L/opt/conda/lib -L/opt/conda/lib -lcrypto -lcurl -lrt -lpthread -lz -ldl -lm -L. -lhdf5_hl -lhdf5 -lz -lm -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/00LOCK-hdf5r/00new/hdf5r/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘hdf5r’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/site-library/00LOCK-hdf5r/00new/hdf5r/libs/hdf5r.so':
  libhdf5_hl.so.100: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/local/lib/R/site-library/hdf5r’

The downloaded source packages are in
	‘/tmp/RtmpvxOqhO/downloaded_packages’
Warning message:
In install.packages("hdf5r", configure.args = "--with-hdf5=/opt/conda/bin/h5cc") :
  installation of package ‘hdf5r’ had non-zero exit status

Can anyone give me some pointers as to how to successfully install hdf5r so that Seurat can use it?

The terra-jupyter-gatk image is not compatible with GATK

The Python environment in the latest terra-jupyter-gatk image is not compatible with most of the Python-based GATK tools, which is causing problems for downstream GATK users and Terra/GATK workshop attendees.

Would it be possible to make older images available as an option?

The official GATK conda environment, reflecting current tool requirements, is here: https://github.com/broadinstitute/gatk/blob/master/scripts/gatkcondaenv.yml.template

Unable to build terra-docker-r image

Hi,

I am unable to build the Docker image using the latest version of the Dockerfile you have in this remote.
The issue seems to be related to this Dockerfile actually getting R 3.6 and not R 4.1 as specified in the README.

Dockerfile

FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:1.1.3

USER root

COPY scripts $JUPYTER_HOME/scripts

# Add env vars to identify binary package installation
ENV TERRA_R_PLATFORM="terra-jupyter-r-2.2.4"
ENV TERRA_R_PLATFORM_BINARY_VERSION=4.3

# Install protobuf 3.20.3. Note this version comes from base deep learning image. Use `conda list` to see what's installed
RUN cd /tmp \
  && wget https://github.com/protocolbuffers/protobuf/releases/download/v3.20.3/protobuf-all-3.20.3.tar.gz \
	&& tar -xvzf protobuf-all-3.20.3.tar.gz \
	&& cd protobuf-3.20.3/ \
	&& ./configure \
	&& make \
	&& make check \
	&& sudo make install \
	&& sudo ldconfig \
	&& rm -rf /tmp/protobuf-* \
	&& cd ~

# Add R kernel
RUN find $JUPYTER_HOME/scripts -name '*.sh' -type f | xargs chmod +x \
 && $JUPYTER_HOME/scripts/kernel/kernelspec.sh $JUPYTER_HOME/scripts/kernel /opt/conda/share/jupyter/kernels

# https://cran.r-project.org/bin/linux/ubuntu/README.html
RUN apt-get update \
    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
    && add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' \
    && apt-get install -yq --no-install-recommends apt-transport-https \
    && apt update \
    && apt install -yq --no-install-recommends \
	apt-utils \
	libssh2-1-dev \
	libssl-dev \
	libcurl4-gnutls-dev \
	libgit2-dev \
	libxml2-dev \
	libgfortran-7-dev \
	r-base-dev \
	r-base-core \
	# This section installs libraries
	libnetcdf-dev \
	libhdf5-serial-dev \
	libfftw3-dev \
	libopenbabel-dev \
	libopenmpi-dev \
	libexempi3 \
	libgdal-dev \
	libcairo2-dev \
	libtiff5-dev \
	libgsl0-dev \
	libgtk2.0-dev \
	libgl1-mesa-dev \
	libglu1-mesa-dev \
	libgmp3-dev \
	libhdf5-dev \
	libncurses-dev \
	libxpm-dev \
	libv8-3.14-dev \
	libgtkmm-2.4-dev \
	libmpfr-dev \
	libudunits2-dev \
	libmodule-build-perl \
	libapparmor-dev \
	libgeos-dev \
	librdf0-dev \
	libmagick++-dev \
	libsasl2-dev \
	libpoppler-cpp-dev \
	libpq-dev \
	libperl-dev \
	libgfortran5 \
	libarchive-extract-perl \
	libfile-copy-recursive-perl \
	libcgi-pm-perl \
	libdbi-perl \
	libdbd-mysql-perl \
	libxml-simple-perl \
	sqlite \
	mpi-default-bin \
	openmpi-common \
	tcl8.5-dev \
	imagemagick \
	tabix \
	ggobi \
	graphviz \
	jags \
	xfonts-100dpi \
	xfonts-75dpi \
	biber \
	libzmq3-dev \
	libsbml5-dev \
	biber \
    ocl-icd-opencl-dev \
    libeigen3-dev \
    mono-runtime \
	cmake \
 	libarchive-dev \
    && ln -s /usr/lib/gcc/x86_64-linux-gnu/7/libgfortran.so /usr/lib/x86_64-linux-gnu/libgfortran.so \
    && ln -s /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so /usr/lib/x86_64-linux-gnu/libstdc++.so \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# DEVEL: Add sys env variables to DEVEL image
# Variables in Renviron.site are made available inside of R.
# Add libsbml CFLAGS
ENV LIBSBML_CFLAGS="-I/usr/include"
ENV LIBSBML_LIBS="-lsbml"
RUN echo 'export LIBSBML_CFLAGS="-I/usr/include"' >> /etc/profile \
    && echo 'export LIBSBML_LIBS="-lsbml"' >> /etc/profile

## set pip3 to run as root, not as jupyter user
ENV PIP_USER=false

## Install python packages needed for a few Bioc packages
RUN pip3 -V \
    && pip3 install --upgrade pip \
    && pip3 install cwltool \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN R -e 'install.packages("BiocManager")' \
    ## check version
    && R -e 'BiocManager::install(version="3.18", ask=FALSE)' \
    && R -e 'BiocManager::install(c( \
    "boot", \
    "class", \
    "cluster", \
    "codetools", \
    "foreign", \
    "kernsmooth", \
    "lattice", \
    "mass", \
    "Matrix", \
    "mgcv", \
    "nlme", \
    "nnet", \
    "rpart", \
    "spatial", \
    "survival", \
    # Jupyter notebook essentials
    "IRdisplay",  \
    "IRkernel", \
    # User oriented packages
    "reticulate", \
    "remotes", \
    "devtools", \
    "pbdZMQ", \
    "uuid", \
    "lme4", \
    "lmerTest", \
    "data.table", \
    "tidyr", \
    "dplyr", \
    "ggplot2"))' \
    && R -e 'BiocManager::install("DataBiosphere/Ronaldo")'


## pip runs as jupyter user
ENV PIP_USER=true

RUN R -e 'IRkernel::installspec(user=FALSE)' \
    && chown -R $USER:users /usr/local/lib/R/site-library /home/jupyter

USER $USER

Command to run

docker build -t test  .

Output

[+] Building 1131.8s (6/13)                                                                                                                  
[+] Building 1132.0s (6/13)                                                                                                                  
[+] Building 1132.1s (6/13)                                                                                                                  
 => => transferring dockerfile: 4.29kB                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                       0.0s
[+] Building 1132.3s (6/13)                                                                                                                  
 => [internal] load metadata for us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:1.1.3                                                0.4s
 => [internal] load build context                                                                                                       0.0s
[+] Building 1520.2s (12/13)                                                                                                                 
 => [internal] load build definition from Dockerfile_w_Rinstalls                                                                        0.0s
 => => transferring dockerfile: 4.29kB                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                       0.0s
 => => transferring context: 2B                                                                                                         0.0s
 => [internal] load metadata for us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:1.1.3                                                0.4s
 => [internal] load build context                                                                                                       0.0s
 => => transferring context: 854B                                                                                                       0.0s
 => CACHED [1/9] FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:1.1.3@sha256:d1758af3b27f9ec97e37855538acd0ac074120a1b4a3cbed0  0.0s
 => [2/9] COPY scripts /etc/jupyter/scripts                                                                                             0.0s 
 => [3/9] RUN cd /tmp   && wget https://github.com/protocolbuffers/protobuf/releases/download/v3.20.3/protobuf-all-3.20.3.tar.gz  &  1179.4s 
 => [4/9] RUN find /etc/jupyter/scripts -name '*.sh' -type f | xargs chmod +x  && /etc/jupyter/scripts/kernel/kernelspec.sh /etc/jupyt  0.4s 
 => [5/9] RUN apt-get update     && apt-get install -yq --no-install-recommends apt-transport-https     && apt update     && apt ins  318.0s 
 => [6/9] RUN echo 'export LIBSBML_CFLAGS="-I/usr/include"' >> /etc/profile     && echo 'export LIBSBML_LIBS="-lsbml"' >> /etc/profile  0.4s
 => [7/9] RUN pip3 -V     && pip3 install --upgrade pip     && pip3 install cwltool     && apt-get clean     && rm -rf /var/lib/apt/l  17.1s
 => ERROR [8/9] RUN R -e 'install.packages("BiocManager")'     && R -e 'BiocManager::install(version="3.18", ask=FALSE)'     && R -e '  4.1s
------
 > [8/9] RUN R -e 'install.packages("BiocManager")'     && R -e 'BiocManager::install(version="3.18", ask=FALSE)'     && R -e 'BiocManager::install(c(     "boot",     "class",     "cluster",     "codetools",     "foreign",     "kernsmooth",     "lattice",     "mass",     "Matrix",     "mgcv",     "nlme",     "nnet",     "rpart",     "spatial",     "survival",     "IRdisplay",      "IRkernel",     "reticulate",     "remotes",     "devtools",     "pbdZMQ",     "uuid",     "lme4",     "lmerTest",     "data.table",     "tidyr",     "dplyr",     "ggplot2"))'     && R -e 'BiocManager::install("DataBiosphere/Ronaldo")':
#0 0.388 
#0 0.388 R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
#0 0.388 Copyright (C) 2020 The R Foundation for Statistical Computing
#0 0.388 Platform: x86_64-pc-linux-gnu (64-bit)
#0 0.388 
#0 0.389 R is free software and comes with ABSOLUTELY NO WARRANTY.
#0 0.389 You are welcome to redistribute it under certain conditions.
#0 0.389 Type 'license()' or 'licence()' for distribution details.
#0 0.389 
#0 0.389   Natural language support but running in an English locale
#0 0.389 
#0 0.389 R is a collaborative project with many contributors.
#0 0.389 Type 'contributors()' for more information and
#0 0.389 'citation()' on how to cite R or R packages in publications.
#0 0.389 
#0 0.389 Type 'demo()' for some demos, 'help()' for on-line help, or
#0 0.389 'help.start()' for an HTML browser interface to help.
#0 0.389 Type 'q()' to quit R.
#0 0.389 
#0 0.465 > install.packages("BiocManager")
#0 0.468 Installing package into ‘/usr/local/lib/R/site-library’
#0 0.468 (as ‘lib’ is unspecified)
#0 1.672 trying URL 'https://cloud.r-project.org/src/contrib/BiocManager_1.30.22.tar.gz'
#0 1.967 Content type 'application/x-gzip' length 582690 bytes (569 KB)
#0 1.967 ==================================================
#0 2.189 downloaded 569 KB
#0 2.189 
#0 2.351 * installing *source* package ‘BiocManager’ ...
#0 2.354 ** package ‘BiocManager’ successfully unpacked and MD5 sums checked
#0 2.354 ** using staged installation
#0 2.364 ** R
#0 2.369 ** inst
#0 2.370 ** byte-compile and prepare package for lazy loading
#0 2.755 ** help
#0 2.776 *** installing help indices
#0 2.790 ** building package indices
#0 2.800 ** installing vignettes
#0 2.802 ** testing if installed package can be loaded from temporary location
#0 3.231 ** testing if installed package can be loaded from final location
#0 3.676 ** testing if installed package keeps a record of temporary installation path
#0 3.677 * DONE (BiocManager)
#0 3.683 
#0 3.683 The downloaded source packages are in
#0 3.683        ‘/tmp/RtmpkJT8fH/downloaded_packages’
#0 3.683 > 
#0 3.683 > 
#0 3.731 
#0 3.731 R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
#0 3.731 Copyright (C) 2020 The R Foundation for Statistical Computing
#0 3.731 Platform: x86_64-pc-linux-gnu (64-bit)
#0 3.731 
#0 3.731 R is free software and comes with ABSOLUTELY NO WARRANTY.
#0 3.731 You are welcome to redistribute it under certain conditions.
#0 3.731 Type 'license()' or 'licence()' for distribution details.
#0 3.731 
#0 3.731   Natural language support but running in an English locale
#0 3.731 
#0 3.731 R is a collaborative project with many contributors.
#0 3.731 Type 'contributors()' for more information and
#0 3.731 'citation()' on how to cite R or R packages in publications.
#0 3.731 
#0 3.731 Type 'demo()' for some demos, 'help()' for on-line help, or
#0 3.731 'help.start()' for an HTML browser interface to help.
#0 3.731 Type 'q()' to quit R.
#0 3.731 
#0 3.791 > BiocManager::install(version="3.18", ask=FALSE)
#0 4.107 Error: Bioconductor version '3.18' requires R version '4.3'; use `version = '3.10'`
#0 4.107   with R version 3.6; see https://bioconductor.org/install
#0 4.107 Execution halted
------
Dockerfile_w_Rinstalls:125
--------------------
 124 |     
 125 | >>> RUN R -e 'install.packages("BiocManager")' \
 126 | >>>     ## check version
 127 | >>>     && R -e 'BiocManager::install(version="3.18", ask=FALSE)' \
 128 | >>>     && R -e 'BiocManager::install(c( \
 129 | >>>     "boot", \
 130 | >>>     "class", \
 131 | >>>     "cluster", \
 132 | >>>     "codetools", \
 133 | >>>     "foreign", \
 134 | >>>     "kernsmooth", \
 135 | >>>     "lattice", \
 136 | >>>     "mass", \
 137 | >>>     "Matrix", \
 138 | >>>     "mgcv", \
 139 | >>>     "nlme", \
 140 | >>>     "nnet", \
 141 | >>>     "rpart", \
 142 | >>>     "spatial", \
 143 | >>>     "survival", \
 144 | >>>     # Jupyter notebook essentials
 145 | >>>     "IRdisplay",  \
 146 | >>>     "IRkernel", \
 147 | >>>     # User oriented packages
 148 | >>>     "reticulate", \
 149 | >>>     "remotes", \
 150 | >>>     "devtools", \
 151 | >>>     "pbdZMQ", \
 152 | >>>     "uuid", \
 153 | >>>     "lme4", \
 154 | >>>     "lmerTest", \
 155 | >>>     "data.table", \
 156 | >>>     "tidyr", \
 157 | >>>     "dplyr", \
 158 | >>>     "ggplot2"))' \
 159 | >>>     && R -e 'BiocManager::install("DataBiosphere/Ronaldo")'
 160 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c R -e 'install.packages(\"BiocManager\")'     && R -e 'BiocManager::install(version=\"3.18\", ask=FALSE)'     && R -e 'BiocManager::install(c(     \"boot\",     \"class\",     \"cluster\",     \"codetools\",     \"foreign\",     \"kernsmooth\",     \"lattice\",     \"mass\",     \"Matrix\",     \"mgcv\",     \"nlme\",     \"nnet\",     \"rpart\",     \"spatial\",     \"survival\",     \"IRdisplay\",      \"IRkernel\",     \"reticulate\",     \"remotes\",     \"devtools\",     \"pbdZMQ\",     \"uuid\",     \"lme4\",     \"lmerTest\",     \"data.table\",     \"tidyr\",     \"dplyr\",     \"ggplot2\"))'     && R -e 'BiocManager::install(\"DataBiosphere/Ronaldo\")'" did not complete successfully: exit code: 1

Add plotnine to terra-jupyter-python

plotnine is an implementation of a grammar of graphics in Python; it is based on ggplot2.

Having this available in terra-jupyter-python would make it such that the code for R and Python notebooks in Terra can look similar and decrease the cognitive load on researchers when going back-and-forth between the two languages.

Note that the AoU Dockerfile adds plotnine.

terra-jupyter-base should allow non-sudo/root use of conda

terra-jupyter-base should allow the jupyter user to create conda environments in its home directory without having to sudo or run as root. As conda itself tells us, "In general, it's not advisable to use 'sudo conda'", and it can cause problems with libraries that default to storing configuration in the user home. Indeed, I was unable to get my nb_conda_kernels-generated GATK kernel working in https://github.com/broadinstitute/gatk-workshop-terra-jupyter-image/blob/main/Dockerfile (which is derived from terra-jupyter-base) until I switched to creating the conda environment as the jupyter user.

The main obstacle to enabling this functionality is the /opt/conda/pkgs/cache directory -- this directory needs to be writable by the jupyter user (or the users group), even if the jupyter user is just trying to create a conda environment in their own home directory. In https://github.com/broadinstitute/gatk-workshop-terra-jupyter-image/blob/main/Dockerfile I use a terrible hack to allow this directory to be written by the users group, but there should be a more principled approach involving setting the setgid bit on the directory and changing the default umask to allow group write permission (or perhaps by making use of the CONDA_PKGS_DIRS environment variable).
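
For illustration, the setgid/group-write idea described above might look roughly like this inside a derived image's build (a sketch only, not the base image's actual setup; the cache path and group are the ones named in this issue):

# run as root in the Dockerfile (e.g. inside a RUN step)
chgrp -R users /opt/conda/pkgs \
  && chmod -R g+w /opt/conda/pkgs \
  && find /opt/conda/pkgs -type d -exec chmod g+s {} +
# or, instead, point conda at a per-user package cache:
# export CONDA_PKGS_DIRS=$HOME/.conda/pkgs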

GPUs, Terra image builds, and TF >=2.5

Creating this issue for discussion. (I’m on the Terra solutions team at Verily).
It’s a known issue that the Terra images (using the gcr.io/deeplearning-platform-release/base-cu110:latest base image) don’t support TF >=2.5 with GPUs due to a CUDA library mismatch.
I noticed that GCP notebooks, which use the same base image, do support TF >=2.5 with GPUs where the Terra builds do not.

I did a bit of digging and I believe the reason for this is that the gcr.io/deeplearning-platform-release/tf2-gpu.2-6 images etc. (which use the same base image above) do their own builds of TF from source, via bazel, using the CUDA 11.0 libs, rather than doing a pip install.

The TF release notes indicate that from 2.5 onwards, the TF team’s builds (as you’d get from pip) are built using the CUDA 11.2 libs. So, the GCP notebook team’s build from source, using the 11.0 libs, is key to why their images work for TF >=2.5.
The GCP notebook team does have a bug in to update to using the 11.2 libs, but it’s just a placeholder and may not see movement soon.

We could just document this well and wait for the GCP notebook team to support CUDA 11.2.
However, I suspect that many people won’t read the docs, will want to use TF >=2.5 and will probably attempt to update via pip install without realizing that it will remove their use of GPUs.

It might be worth considering using the GCP Notebooks images as Terra base images. These images have many useful ML and data science-related libs installed in addition to the ML frameworks.
If we could get this working it would let us piggyback on the GCP Notebook team’s maintenance and testing of all the libraries that they install on these images, and their dependencies, basically offloading all that work.
To do this would presumably require overriding their Jupyter config with the Terra Jupyter setup as the last steps in a Terra Dockerfile, probably terra-jupyter-python.

Would this approach (using the GCP notebook images rather than their base image as a starting point) be worth trying?
(A related question, that I haven’t checked yet, is whether there are similar issues with other ML frameworks like PyTorch, but I suspect there are. We could provide a set of images, based on the GCP notebook images, to choose from.)

/cc @deflaux
