
GenoCAE's Issues

Extracting feature importance

Hello!

How would I go about extracting feature importance scores from a trained GenoCAE model? Some methods I'd like to try out:

  1. Simply extract feature weights
  2. Compute SHAP scores (or something akin to it)

I think part of my issue is not being sure how to reconstruct the model from the GenoCAE outputs: I see the weights are all stored in the weights subfolder, but I'm not sure how to import them.

As a side note, I'm more familiar with the format where the entire model (architecture, weights) is saved as one .h5 file. Is there a way to save GenoCAE models in this way?
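
To make the question concrete, here is the kind of thing I imagine; build_model is a stand-in for however run_gcae.py actually assembles the architecture, and the paths and genotype arrays are made up by me:

import tensorflow as tf
import shap  # optional, only needed for method 2

# Rebuild the architecture (hypothetical helper), then load stored weights;
# the weights path mirrors the layout of the weights subfolder
autoencoder = build_model(model_id="M1", n_markers=9259)
autoencoder.load_weights("ae_out/ae.M1.ex3.b_0_4.HumanOrigins249_tiny/weights/20")

# Method 1: simply extract feature weights from a layer
first_layer_weights = autoencoder.layers[0].get_weights()

# Method 2: SHAP attributions of model outputs w.r.t. input genotypes
explainer = shap.GradientExplainer(autoencoder, background_genotypes)
shap_values = explainer.shap_values(genotypes_to_explain)

# Saving everything as one .h5 works for Sequential/Functional models only;
# a subclassed model needs the SavedModel format instead
autoencoder.save("genocae_model.h5")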

Thanks so much!
Brian

Suggest + volunteer: add CONTRIBUTING.md

From, for example, 'Best Practices for Maintainers' one can learn that it is a good idea to have guidelines for contributors (see, e.g., one of my own CONTRIBUTING.md documents). One of the many benefits of this is that it makes it easier to say 'no' to undesired features.

I suggest adding such a CONTRIBUTING.md document and volunteer to create a first sketch of one. Of course, the current maintainers are boss, so I do not expect the rules I put in to become the actual rules :-)

Good idea?

Error while building docker container

The error occurred at the end of the building procedure.
Judging by the messages below, I think the stray word and in the Dockerfile's pip install line is the problem: pip tries to install a package literally named and.

 => ERROR [6/6] RUN python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt   1.2s
------
 > [6/6] RUN python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt:
#8 1.023 ERROR: Could not find a version that satisfies the requirement and (from versions: none)
#8 1.024 ERROR: No matching distribution found for and
------
executor failed running [/bin/bash -c python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt]: exit code: 1
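
For what it's worth, dropping the stray word should at least let pip parse its arguments; my guess (untested) at the corrected RUN line:

RUN python3 -m pip install -r /workspace/requirements.txt && rm /workspace/requirements.txt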

`2022-06-23 09:25:59.298713: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 19113388800 exceeds 10% of free system memory.`

I'm currently trying to train GenoCAE on a dataset of 15 million SNPs across 67 individuals, but I seem to be running into memory issues, despite using an AMD Threadripper workstation with 252 GB of memory and 64 cores (128 threads).

I suspect this may be due to the large number of SNPs I'm including, since the example data (which runs fine) only contains 9,259 SNPs, and the original paper used about 161k (2,067 individuals typed at 160,858 SNPs).

From my limited experience with these models, the number of input features drastically affects memory usage (much more so than sample size). So I think my first step will be to filter the variants I'm training the model on, based on some of the guidelines provided in the paper (a sketch of matching PLINK commands follows the list):

  1. Remove sex chromosomes
  2. Set missing genotypes "to the most frequent value per SNP so as to avoid their influence over dimensionality reduction results".
  3. Remove SNPs with MAF < 1%.
  4. Perform LD pruning by "removing one of each pair of SNPs in windows of 1.0 centimorgan that had an allelic R2 value greater than 0.2." Though eventually I'd like to find a way to avoid this last step because I'm interested in identifying causal variants.
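
As a sketch, roughly matching PLINK 1.9 commands for steps 1-4, with hypothetical file names: --indep-pairwise here uses a 50-variant window as a stand-in for the paper's 1.0 cM window (PLINK 1.9 windows are given in variant counts or kb, not cM), and --fill-missing-a2 approximates step 2 by filling missing calls with the major allele:

plink --bfile mydata --not-chr X Y XY MT --maf 0.01 --indep-pairwise 50 5 0.2 --out pruned
plink --bfile mydata --extract pruned.prune.in --fill-missing-a2 --make-bed --out mydata_filtered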

Suggest: allow to work with GenoCAE without changing the working directory

Dear GenoCAE maintainer,

Here I suggest allowing a user to run GCAE from any folder, instead of forcing them to work from the GenoCAE folder.

When running the 'training' example code from the GenoCAE folder, the training works awesome:

Here I run the command:

richel@N141CU:~/.local/share/gcaer/gcae_v1_0$ /home/richel/.local/share/r-miniconda/envs/r-reticulate/bin/python \
  ~/.local/share/gcaer/gcae_v1_0/run_gcae.py train --datadir ~/.local/share/gcaer/gcae_v1_0/example_tiny/ \
  --data HumanOrigins249_tiny --model_id M1 --epochs 20 --save_interval 2 --train_opts_id ex3 --data_opts_id b_0_4

Here is part of the result:

2021-06-28 14:50:13.776150: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 14:50:13.776180: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3

______________________________ arguments ______________________________
train : True
datadir : /home/richel/.local/share/gcaer/gcae_v1_0/example_tiny/
data : HumanOrigins249_tiny
model_id : M1
...

However, when I work from another folder, say, one folder up ...

richel@N141CU:~/.local/share/gcaer$ /home/richel/.local/share/r-miniconda/envs/r-reticulate/bin/python \
  ~/.local/share/gcaer/gcae_v1_0/run_gcae.py train --datadir ~/.local/share/gcaer/gcae_v1_0/example_tiny/  \
  --data HumanOrigins249_tiny --model_id M1 --epochs 20 --save_interval 2 --train_opts_id ex3 --data_opts_id b_0_4

I get an error message that "data_opts/" + data_opts_id + ".json" cannot be found, at this point in the code:

2021-06-28 14:50:53.728916: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 14:50:53.728947: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

The problem here is the hardcoded "data_opts/" part, which forces me to work in the same folder as GenoCAE. It feels clumsy, as I have to change the working directory when calling GenoCAE. Note that, looking at the code, the same applies to train_opts and models.

I would enjoy a way to do one of the following (my favorites come first :-) ):

  • (1) Being able to set the path where the JSON file is expected, e.g. via a --data_opts_folder CLI argument; the code becomes data_opts_folder + "data_opts/" + data_opts_id + ".json", or
  • (2) Specifying the full path to the JSON file instead, e.g. --data_opts myfolder/b_0_4.json; the code becomes data_opts (which is now a filename), or
  • (3) Expecting the JSON files to be in the folder specified by --datadir (data_dir + "/" + data_opts_id + ".json").

Would one of these options be doable?
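
For completeness, a fourth approach that needs no new CLI argument would be to resolve the path relative to run_gcae.py itself; a minimal sketch (GCAE_DIR is a name I made up):

import json
import os

# Directory containing run_gcae.py, regardless of the caller's working directory
GCAE_DIR = os.path.dirname(os.path.abspath(__file__))

with open(os.path.join(GCAE_DIR, "data_opts", data_opts_id + ".json")) as data_opts_def_file:
    data_opts = json.load(data_opts_def_file)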

Suggest: cite paper

Hi @kausmees,

GCAE seems awesome to me! What I feel is missing is a reference to the paper on bioRxiv. I suggest adding it as a reference, something like I do below. Sure, I volunteer to do so myself via a Pull Request :-)

References

  • [1] Ausmees, Kristiina, and Carl Nettelblad. "A deep learning framework for characterization of genotype data." bioRxiv (2020).

Suggest + volunteer: rename HumanOrigins249_tiny.eigenstratgeno to HumanOrigins249_tiny.bed

Dear GenoCAE maintainer,

Thanks so much for having example files and example code: I find those very useful!

I did find something unexpected: the file extension of HumanOrigins249_tiny.eigenstratgeno. This appears to be a PLINK .bed file, as it follows the same structure as described in the PLINK .bed file format doc. Also, genio (an R package to read PLINK files) cannot read .bed files if they do not have that extension.

I suggest to rename the file to what any PLINK user would expect for a .bed file, which is HumanOrigins249_tiny.bed

I volunteer to do so.

Projecting non-population metadata

I'm trying to project sample metadata onto a trained GenoCAE model.

I'm wondering if this is a formatting issue with my --superpops input file (e.g. I have more than two columns, and some rows have NAs for missing metadata), or if GenoCAE is hard-coded to only accept superpopulation-type metadata.

I'm attaching here an example of my metadata.
merged.metadata.csv

Command

python3 run_gcae.py project --datadir /shared/bms20/projects/MND_ALS/SNP_VCFs/merged/ --data merged.SNPS_filtered.plink --trainedmodeldir als_out --model_id M1 --train_opts_id ex3  --data_opts_id b_0_4 --superpops /home/bms20/projects/MND_ALS/SNP_VCFs/merged/merged.metadata.csv

Output

...
...
Projecting epochs: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Already projected: []
In DG.get_train_set: number of -1.0 genotypes in train: 5689140
In DG.get_train_set: number of -9 genotypes in train: 0
In DG.get_train_set: number of 0 values in train mask: 0
2022-06-23 12:39:48.063862: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 621496}
Adding layer: Reshape: {'target_shape': (77687, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (77687, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (155374, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}
########################### epoch 2 ###########################
Reading weights from /shared/bms20/projects/GenoCAE/als_out/ae.M1.ex3.b_0_4.merged.SNPS_filtered.plink/weights/2
Traceback (most recent call last):
  File "/shared/bms20/projects/GenoCAE/run_gcae.py", line 1011, in <module>
    plot_coords_by_superpop(coords_by_pop,"{0}/dimred_e_{1}_by_superpop".format(results_directory, epoch), superpopulations_file, plot_legend = epoch == epochs[0])
  File "/shared/bms20/projects/GenoCAE/utils/visualization.py", line 203, in plot_coords_by_superpop
    max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])
ValueError: max() arg is an empty sequence
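
If the expected format is a plain two-column mapping from population to superpopulation (which is what the example superpops file shipped with GenoCAE appears to be), a pandas sketch for trimming a richer metadata table down to that shape; the column names here are hypothetical:

import pandas as pd

# Keep only the population -> superpopulation mapping, drop incomplete rows
meta = pd.read_csv("merged.metadata.csv")
mapping = meta[["population", "superpopulation"]].dropna().drop_duplicates()
mapping.to_csv("superpops.csv", header=False, index=False)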

GenoCAE build fails due to upstream update

Dear GenoCAE maintainers, hi Carl and Kristiina,

Thanks for GenoCAE and its tests using GitHub Actions, showing off how awesome it is!

However, something has happened upstream that causes the builds of all of my Python-dependent work to fail. Sadly, it hit GenoCAE as well. As you are far better with Python than I am, I hope you will help me/us :-)

The repo's last GitHub Actions run, which was (as of today) 5 days ago, passed. That seems great! However, a build triggered today fails. I figured this out by simply forking this repo and triggering a rebuild. From the GitHub Actions log one can read:

Run python3 run_gcae.py --help
2022-02-07 14:01:16.923333: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-07 14:01:16.923388: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Traceback (most recent call last):
  File "run_gcae.py", line 32, in <module>
    import tensorflow as tf
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 35, in <module>
    from tensorflow.python.client import pywrap_tf_session
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/python/client/pywrap_tf_session.py", line 19, in <module>
    from tensorflow.python.client._pywrap_tf_session import *
ImportError: SystemError: <built-in method __contains__ of dict object at 0x7f9dbba31580> returned a result with an error set

The problem is obviously:

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

I have been trying all day to fix this, but I did not dare to meddle with requirements.txt. I will continue trying, yet I hope you will beat me to fixing this 😇
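
For the record: numpy's C API version 0xd corresponds to numpy 1.19 and earlier, and 0xe to numpy 1.20 and later, so the installed numpy is older than the one this TensorFlow was compiled against. One remedy that might work without meddling with requirements.txt (untested):

pip3 install --upgrade "numpy>=1.20"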

Suggest + volunteer: text files are not executable

Dear GenoCAE maintainers,

Thanks for the example code and examples, I find these very useful!

What is unexpected, however, is that the genetic input files in the folder example_tiny are somehow marked as executable, as can be seen in the screenshot of my terminal (below, green indicates an executable) and by the File Manager asking me to run a text file when I open it (below, at the right-hand side):

[Screenshot from 2021-06-29 11-52-12]

I guess a chmod +x was messed up somewhere :-)

I suggest to remove the executable flag of these simple text files.

I volunteer to do so.
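
The whole fix should be a single command from the GenoCAE root folder, assuming everything in example_tiny is plain data:

chmod -x example_tiny/*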

Docker script does not build anymore?

Dear GenoCAE maintainer, hi Carl and Kristiina,

Thanks for GenoCAE as well as the Docker container script: It's great for running GenoCAE on a computer cluster :-)

This issue is related to #26 and is probably also caused by an upstream update: the Docker file does not build anymore. The installation instructions at https://github.com/kausmees/GenoCAE#docker-installation are great! Running the suggested command, i.e. (note that I added sudo) ...

sudo docker build -t gcae/genocae:build -f docker/build.dockerfile .

... results in a failed build, with a full error log below.

I have been trying the whole day (for example, there are 6 failed attempts here), but could not fix this.

Does the Docker build work for you? Do you have an idea how to fix the Docker file?

A temporary workaround could be to upload an existing Docker container to Docker hub. Do you happen to have one? Would be awesome!

I hope it will be easy for you to help me solve this. I am not very experienced with Docker or Python, so I can imagine an easy fix being possible (on the other hand, the 6 Stack Overflow 'solutions' hint that the problem is not trivial).

To reproduce, I have created a script to build the Docker container, together with a GitHub Actions script with an error log here.

I hope you can help me out here! Thanks and cheers, Richel

Full error log

Sending build context to Docker daemon  186.6MB
Step 1/15 : ARG CUDA_VERSION=11.1.1
Step 2/15 : ARG OS_VERSION=20.04
Step 3/15 : FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu${OS_VERSION}
 ---> 1189781af5ec
Step 4/15 : LABEL maintainer="Dong Wang"
 ---> Using cache
 ---> 9ae2635141d3
Step 5/15 : ENV PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> f907151c27bd
Step 6/15 : ARG PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> 76b31f23bd5e
Step 7/15 : SHELL ["/bin/bash", "-c"]
 ---> Using cache
 ---> e52cb6a4a70e
Step 8/15 : RUN apt-get update && apt-get upgrade -y &&     apt-get install -y wget
 ---> Using cache
 ---> f779a2c5021a
Step 9/15 : RUN wget     https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh     && mkdir /root/.conda     && bash Miniconda3-latest-Linux-x86_64.sh -b     && rm -f Miniconda3-latest-Linux-x86_64.sh
 ---> Using cache
 ---> e86837b5b18e
Step 10/15 : RUN pip3 install --upgrade pip
 ---> Running in 7a97102a4336
Requirement already satisfied: pip in /root/miniconda3/lib/python3.9/site-packages (21.1.3)
Collecting pip
  Downloading pip-22.0.3-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-22.0.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container 7a97102a4336
 ---> b48bafcfa623
Step 11/15 : RUN pip3 install --upgrade setuptools
 ---> Running in ea564922654f
Requirement already satisfied: setuptools in /root/miniconda3/lib/python3.9/site-packages (52.0.0.post20210125)
Collecting setuptools
  Downloading setuptools-60.8.1-py3-none-any.whl (1.1 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 1.1/1.1 MB 13.7 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 52.0.0.post20210125
    Uninstalling setuptools-52.0.0.post20210125:
      Successfully uninstalled setuptools-52.0.0.post20210125
Successfully installed setuptools-60.8.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container ea564922654f
 ---> 874907ea03a5
Step 12/15 : WORKDIR /workspace
 ---> Running in ca79d4cb1457
Removing intermediate container ca79d4cb1457
 ---> a96a63e990ee
Step 13/15 : ADD ./requirements.txt /workspace
 ---> ad981a4056bd
Step 14/15 : RUN pip3 install -r /workspace/requirements.txt and &&	rm /workspace/requirements.txt
 ---> Running in 47c7e4f5b30c
Collecting and
  Downloading and-0.1.1-py3-none-any.whl (2.0 kB)
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting grpcio
  Downloading grpcio-1.43.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 4.1/4.1 MB 24.1 MB/s eta 0:00:00
Collecting setuptools==47.1.1
  Downloading setuptools-47.1.1-py3-none-any.whl (583 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 583.2/583.2 KB 32.3 MB/s eta 0:00:00
Collecting tensorflow>=2.2.0
  Downloading tensorflow-2.8.0-cp39-cp39-manylinux2010_x86_64.whl (497.6 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 497.6/497.6 MB 4.1 MB/s eta 0:00:00
Collecting numpy==1.18.4
  Downloading numpy-1.18.4.zip (5.4 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 5.4/5.4 MB 26.8 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting scikit-learn
  Downloading scikit_learn-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.4 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 26.4/26.4 MB 25.0 MB/s eta 0:00:00
Collecting matplotlib==3.2.1
  Downloading matplotlib-3.2.1.tar.gz (40.3 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 40.3/40.3 MB 20.9 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting seaborn
  Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 292.8/292.8 KB 26.8 MB/s eta 0:00:00
Collecting scipy==1.4.1
  Downloading scipy-1.4.1.tar.gz (24.6 MB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 24.6/24.6 MB 24.9 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [171 lines of output]
      setup.py:418: UserWarning: Unrecognized setuptools command ('dist_info --egg-base /tmp/pip-modern-metadata-jxp98lbc'), proceeding with generating Cython sources and expanding templates
        warnings.warn("Unrecognized setuptools command ('{}'), proceeding with "
      Running from scipy source directory.
      lapack_opt_info:
      lapack_mkl_info:
      customize UnixCCompiler
        libraries mkl_rt not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      openblas_lapack_info:
      customize UnixCCompiler
      customize UnixCCompiler
        libraries openblas not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      openblas_clapack_info:
      customize UnixCCompiler
      customize UnixCCompiler
        libraries openblas,lapack not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      flame_info:
      customize UnixCCompiler
        libraries flame not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      atlas_3_10_threads_info:
      Setting PTATLAS=ATLAS
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
        NOT AVAILABLE
      
      atlas_3_10_info:
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries satlas,satlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_3_10_info'>
        NOT AVAILABLE
      
      atlas_threads_info:
      Setting PTATLAS=ATLAS
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_threads_info'>
        NOT AVAILABLE
      
      atlas_info:
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_info'>
        NOT AVAILABLE
      
      accelerate_info:
        NOT AVAILABLE
      
      lapack_info:
      customize UnixCCompiler
        libraries lapack not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      /tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1712: UserWarning:
          Lapack (http://www.netlib.org/lapack/) libraries not found.
          Directories to search for the libraries can be specified in the
          numpy/distutils/site.cfg file (section [lapack]) or by setting
          the LAPACK environment variable.
        if getattr(self, '_calc_info_{}'.format(lapack))():
      lapack_src_info:
        NOT AVAILABLE
      
      /tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1712: UserWarning:
          Lapack (http://www.netlib.org/lapack/) sources not found.
          Directories to search for the sources can be specified in the
          numpy/distutils/site.cfg file (section [lapack_src]) or by setting
          the LAPACK_SRC environment variable.
        if getattr(self, '_calc_info_{}'.format(lapack))():
        NOT AVAILABLE
      
      Traceback (most recent call last):
        File "/root/miniconda3/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/root/miniconda3/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/root/miniconda3/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 173, in prepare_metadata_for_build_wheel
          self.run_setup()
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 266, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 157, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 540, in <module>
          setup_package()
        File "setup.py", line 536, in setup_package
          setup(**metadata)
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/numpy/distutils/core.py", line 137, in setup
          config = configuration()
        File "setup.py", line 435, in configuration
          raise NotFoundError(msg)
      numpy.distutils.system_info.NotFoundError: No lapack/blas resources found.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
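
My current reading of the tail of the log: pip falls back to building scipy 1.4.1 from source, because that pinned version predates the Python 3.9 that Miniconda ships in the image (there is no prebuilt wheel for that combination), and the source build then dies for lack of LAPACK/BLAS. One untested workaround would be to install the build dependencies in the Dockerfile before the pip step (or to relax the scipy pin in requirements.txt):

RUN apt-get update && apt-get install -y gfortran libopenblas-dev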

Conversion to PLINK format failed for .bed file

Dear GenoCAE maintainer,

Thanks for the conversion of the example files to PLINK format! I checked the .bim and .fam file and they match the PLINK doc (this time, I checked more carefully :-) ).

Sadly, this conversion resulted in files that cannot be read by PLINK (note that I ran into the same problems myself :-) ; I also found out that convertf is available as a .deb package on Ubuntu). I can get PLINK2 to do something, but this does not result in PLINK-readable files either. Below are some notes, mostly reminders to self.

Would you try again?

  • If you have the data in a human-readable format, I could handcraft the PLINK text/non-binary files and let PLINK convert it to the binary version.
  • If you'd enjoy this, I could add a script and a test to the build, to confirm that the example data files can be read by PLINK, in a new folder called, for example, scripts

Cheers, Richel

PLINK v1.07

./plink --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --assoc --out ~/test --noweb
@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Skipping web check... [ --noweb ] 
Writing this text to log file [ /home/richel/test.log ]
Analysis started: Wed Jun 30 07:47:48 2021

Options in effect:
	--bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
	--assoc
	--out /home/richel/test
	--noweb

Reading map (extended format) from [ /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim ] 

ERROR: Problem reading BIM file, line 1

PLINK v1.9

./plink --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --assoc --out ~/test
PLINK v1.90b6.22 64-bit (16 Apr 2021)          www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/test.log.
Options in effect:
  --assoc
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --out /home/richel/test

7652 MB RAM detected; reserving 3826 MB for main workspace.

Error: Invalid chromosome code 'rs6515824' on line 1 of .bim file.
(Use --allow-extra-chr to force it to be accepted.)

PLINK v2.0

./plink2 --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --glm --out ~/test
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/test.log.
Options in effect:
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --glm
  --out /home/richel/test

Start time: Wed Jun 30 07:49:08 2021
7652 MiB RAM detected; reserving 3826 MiB for main workspace.
Using up to 8 compute threads.
249 samples (0 females, 0 males, 249 ambiguous; 249 founders) loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.fam.

Error: Invalid chromosome code 'rs6515824' on line 1 of .pvar file.
(Use --allow-extra-chr to force it to be accepted.)
End time: Wed Jun 30 07:49:08 2021

Following the suggestion results in:

./plink2 --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --glm --allow-extra-chr --out ~/test
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/test.log.
Options in effect:
  --allow-extra-chr
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --glm
  --out /home/richel/test

Start time: Wed Jun 30 07:49:35 2021
7652 MiB RAM detected; reserving 3826 MiB for main workspace.
Using up to 8 compute threads.
249 samples (0 females, 0 males, 249 ambiguous; 249 founders) loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.fam.
9259 variants loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim.
1 binary phenotype loaded (0 cases, 249 controls).
Calculating allele frequencies... done.
--glm: Skipping case/control phenotype 'PHENO1' since all samples are controls.
End time: Wed Jun 30 07:49:35 2021

Aha, so the .bim file can be read! Let's re-create it:

./plink2 --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --allow-extra-chr --make-bpgen  --out ~/HumanOrigins249_tiny

Something is successfully created:

PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/HumanOrigins249_tiny.log.
Options in effect:
  --allow-extra-chr
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --make-bpgen
  --out /home/richel/HumanOrigins249_tiny

Start time: Wed Jun 30 07:53:16 2021
7652 MiB RAM detected; reserving 3826 MiB for main workspace.
Using up to 8 compute threads.
249 samples (0 females, 0 males, 249 ambiguous; 249 founders) loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.fam.
9259 variants loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim.
1 binary phenotype loaded (0 cases, 249 controls).
Writing /home/richel/HumanOrigins249_tiny.fam ... done.
Writing /home/richel/HumanOrigins249_tiny.bim ... done.
Writing /home/richel/HumanOrigins249_tiny.pgen ... done.
End time: Wed Jun 30 07:53:16 2021

Sadly, in R, the files cannot be read.

Here is genio's response:

genio::read_bed(
  bed_filename,
  names_loci = bim_table$id,
  names_ind = fam_table$id
 )
Reading: /home/richel/.local/share/gcaer/gcae_v1_0/example_tiny//HumanOrigins249_tiny.bed
Error in read_bed_cpp(file, m_loci, n_ind) : 
  Row 1 padding was non-zero.  Either the specified number of individuals is incorrect or the input file is corrupt!

Here is ARTP2's response:

ARTP2::read.bed(bed = bed_filename, bim = bim_filename, fam = fam_filename)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got 'rs6515824'
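
Note to self: the PLINK errors all point the same way: line 1 of the .bim starts with the variant ID rs6515824 where PLINK expects a chromosome code, so the .bim seems to be missing its chromosome column (or has shifted columns). A quick check:

# a valid .bim line has 6 columns: chromosome, variant ID, cM position,
# bp position, allele 1, allele 2
head -n 1 ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim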

Suggest: improve CLI error messages

Dear GenoCAE maintainers,

I enjoy GenoCAE quite a bit and especially the examples are great!

What would make me like GenoCAE even better is clearer error messages from the CLI. I think redirecting the user to the help is friendly, but an error message that guides the user to the next step would be even better.

Some examples:

Example 1

This is not something a user will blame you for, it is more of an opening to the next example.

python run_gcae.py train

I get:

2021-07-02 11:35:47.399470: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
tensorflow version 2.3.3
Invalid command. Run 'python run_gcae.py --help' for more information.

I expected something like:

`datadir` is missing. Please specify the data folder using `--datadir [data dir]`, e.g. `--datadir example_tiny/`

Example 2

This is what I had myself:

python run_gcae.py train --datadir example_tiny/ --data HumanOrigins249_tiny --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4

I got:


2021-07-02 11:35:25.100815: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
tensorflow version 2.3.3
Invalid command. Run 'python run_gcae.py --help' for more information.

I expected something like:

`epochs` is missing. Please specify the number of epochs using `--epochs [number]`, e.g. `--epochs 20`
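
Since the usage string has the docopt shape (and docopt is in requirements.txt), here is a sketch of how such a message could be produced. All of this is hypothetical code written by me, not existing GenoCAE code:

import sys
from docopt import docopt, DocoptExit

REQUIRED_FOR_TRAIN = ["--datadir", "--data", "--model_id", "--train_opts_id",
                      "--data_opts_id", "--epochs", "--save_interval"]

try:
    arguments = docopt(__doc__)
except DocoptExit:
    # docopt only reports the usage text; inspect argv to give a specific hint
    if len(sys.argv) > 1 and sys.argv[1] == "train":
        missing = [opt for opt in REQUIRED_FOR_TRAIN
                   if not any(arg == opt or arg.startswith(opt + "=")
                              for arg in sys.argv)]
        if missing:
            sys.exit("{} missing. Please specify it, e.g. {}=<value>. "
                     "Run 'python run_gcae.py --help' for more information."
                     .format(", ".join(missing), missing[0]))
    raise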

Idea: use Swish instead of ELU?

GCAE uses the exponential linear unit ('ELU') as an activation function. In [1] it is claimed that 'the Swish activation function would be better in all cases [over ELU]'.

I am unsure whether you would consider it worth trying out Swish; the improvements in accuracy shown in [1] are only minor.

  • [1] Ramachandran, Prajit, Barret Zoph, and Quoc V. Le. "Searching for activation functions." arXiv preprint arXiv:1710.05941 (2017). https://arxiv.org/abs/1710.05941
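
If anyone wants to try it: recent tf.keras versions accept "swish" as an activation string, so (assuming the model JSON passes activation strings straight to Keras) the experiment could be as small as swapping the activation in the model definition. A Keras-level illustration:

import tensorflow as tf

# Same layer, two activations: ELU (current) vs. Swish (proposed),
# where swish(x) = x * sigmoid(x)
elu_layer = tf.keras.layers.Dense(75, activation="elu")
swish_layer = tf.keras.layers.Dense(75, activation="swish")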

Suggest: release a version

Dear GenoCAE maintainers,

I quote from the GitHub docs:

Releases are deployable software iterations you can package and make available for a wider audience to download and use.

I would love to have a named release version, e.g. v1.0 (as that is the version shown in the help), that I can use (rather than a commit hash) for the gcaer R package I am writing to install this cool tool.

It's easy to make one; the GitHub documentation describes the steps.

Sure, I volunteer to do this, but for that I need more access rights than you may want to give (which I'd understand :-) ).

Suggest: give an error message when an invalid CLI argument is given

When I run GCAE from the command line with a nonsense argument, e.g. nonsense:

python run_gcae.py nonsense

I get to see the help file:

2021-06-24 06:34:30.118604: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-24 06:34:30.118629: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Usage:
  run_gcae.py train --datadir=<name> --data=<name> --model_id=<name> --train_opts_id=<name> --data_opts_id=<name> --save_interval=<num> --epochs=<num> [--resume_from=<num> --trainedmodeldir=<name> ]
  run_gcae.py project --datadir=<name>   [ --data=<name> --model_id=<name>  --train_opts_id=<name> --data_opts_id=<name> --superpops=<name> --epoch=<num> --trainedmodeldir=<name>   --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py plot --datadir=<name> [  --data=<name>  --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name>  --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py animate --datadir=<name>   [ --data=<name>   --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name> --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py evaluate --datadir=<name> --metrics=<name>  [  --data=<name>  --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name>  --pdata=<name> --trainedmodelname=<name>]

I think it is already friendly to show the help, yet I would not expect the output to be exactly the same as when doing python run_gcae.py --help. I suggest adding an error message (and a nonzero exit code) when an invalid CLI argument is given.
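
Something as small as the following, placed before the argument parsing, would already give an explicit message and a nonzero exit code (hypothetical code written by me, not existing GenoCAE code):

import sys

KNOWN_COMMANDS = {"train", "project", "plot", "animate", "evaluate"}

if len(sys.argv) > 1 and sys.argv[1] not in KNOWN_COMMANDS \
        and not sys.argv[1].startswith("-"):
    # sys.exit with a string prints it to stderr and exits with status 1
    sys.exit("Error: unknown command '{}'. "
             "Run 'python run_gcae.py --help' for usage.".format(sys.argv[1]))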

Docker building error due to python requirements

Hi @kausmees, I seem to be having some issues setting up the Docker container, which I think stems from installing the python requirements.

Reprex

git clone https://github.com/kausmees/GenoCAE.git
cd GenoCAE
docker build -t gcae/genocae:build -f docker/build.dockerfile .

Output

Here is the full output, but the main error comes at the very end.

Sending build context to Docker daemon  5.337MB
Step 1/14 : ARG CUDA_VERSION=11.1.1
Step 2/14 : ARG OS_VERSION=20.04
Step 3/14 : FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu${OS_VERSION}
 ---> 75f53d2b5da8
Step 4/14 : LABEL maintainer="Dong Wang"
 ---> Using cache
 ---> 05c68a023e26
Step 5/14 : ENV PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> 84aafea13cc7
Step 6/14 : ARG PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> d5d84110b3c8
Step 7/14 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Running in 79a5ae18bf31
Removing intermediate container 79a5ae18bf31
 ---> a9ec0c06e8ee
Step 8/14 : SHELL ["/bin/bash", "-c"]
 ---> Running in 07a8435b31cd
Removing intermediate container 07a8435b31cd
 ---> bb8387371d5a
Step 9/14 : RUN apt-get update && apt-get upgrade -y &&apt-get install -y wget python3-pip
 ---> Running in 07b6d6f53352
Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease [1581 B]
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:9 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.3 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1161 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2415 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1404 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.1 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [54.2 kB]
Get:16 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages [579 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1324 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [881 kB]
Get:20 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1974 kB]
Fetched 23.3 MB in 2s (13.5 MB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Calculating upgrade...
The following packages have been kept back:
  libcudnn8 libcudnn8-dev libnccl-dev libnccl2
The following packages will be upgraded:
  apt ca-certificates dpkg dpkg-dev e2fsprogs libapt-pkg6.0 libc-bin
  libcom-err2 libdpkg-perl libext2fs2 libpcre3 libsepol1 libss2 libssl1.1
  libsystemd0 libudev1 linux-libc-dev login logsave openssl passwd
21 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
Need to get 10.6 MB of archives.
After this operation, 22.5 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 dpkg amd64 1.19.7ubuntu3.2 [1128 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 login amd64 1:4.8.1-1ubuntu5.20.04.2 [220 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libc-bin amd64 2.31-0ubuntu9.9 [633 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libsystemd0 amd64 245.4-4ubuntu3.17 [269 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libudev1 amd64 245.4-4ubuntu3.17 [76.5 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libapt-pkg6.0 amd64 2.0.9 [839 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 apt amd64 2.0.9 [1294 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 logsave amd64 1.45.5-2ubuntu1.1 [10.2 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libext2fs2 amd64 1.45.5-2ubuntu1.1 [183 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 e2fsprogs amd64 1.45.5-2ubuntu1.1 [527 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpcre3 amd64 2:8.39-12ubuntu0.1 [232 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libsepol1 amd64 3.0-1ubuntu0.1 [252 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 passwd amd64 1:4.8.1-1ubuntu5.20.04.2 [797 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libcom-err2 amd64 1.45.5-2ubuntu1.1 [9548 B]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libss2 amd64 1.45.5-2ubuntu1.1 [11.3 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libssl1.1 amd64 1.1.1f-1ubuntu2.15 [1321 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 openssl amd64 1.1.1f-1ubuntu2.15 [623 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 ca-certificates all 20211016~20.04.1 [144 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 dpkg-dev all 1.19.7ubuntu3.2 [679 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libdpkg-perl all 1.19.7ubuntu3.2 [231 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-libc-dev amd64 5.4.0-120.136 [1113 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 10.6 MB in 0s (55.0 MB/s)
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../dpkg_1.19.7ubuntu3.2_amd64.deb ...
Unpacking dpkg (1.19.7ubuntu3.2) over (1.19.7ubuntu3) ...
Setting up dpkg (1.19.7ubuntu3.2) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../login_1%3a4.8.1-1ubuntu5.20.04.2_amd64.deb ...
Unpacking login (1:4.8.1-1ubuntu5.20.04.2) over (1:4.8.1-1ubuntu5.20.04.1) ...
Setting up login (1:4.8.1-1ubuntu5.20.04.2) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libc-bin_2.31-0ubuntu9.9_amd64.deb ...
Unpacking libc-bin (2.31-0ubuntu9.9) over (2.31-0ubuntu9.7) ...
Setting up libc-bin (2.31-0ubuntu9.9) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libsystemd0_245.4-4ubuntu3.17_amd64.deb ...
Unpacking libsystemd0:amd64 (245.4-4ubuntu3.17) over (245.4-4ubuntu3.16) ...
Setting up libsystemd0:amd64 (245.4-4ubuntu3.17) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libudev1_245.4-4ubuntu3.17_amd64.deb ...
Unpacking libudev1:amd64 (245.4-4ubuntu3.17) over (245.4-4ubuntu3.16) ...
Setting up libudev1:amd64 (245.4-4ubuntu3.17) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libapt-pkg6.0_2.0.9_amd64.deb ...
Unpacking libapt-pkg6.0:amd64 (2.0.9) over (2.0.6) ...
Setting up libapt-pkg6.0:amd64 (2.0.9) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../archives/apt_2.0.9_amd64.deb ...
Unpacking apt (2.0.9) over (2.0.6) ...
Setting up apt (2.0.9) ...
Removing obsolete conffile /etc/kernel/postinst.d/apt-auto-removal ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../logsave_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking logsave (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../libext2fs2_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking libext2fs2:amd64 (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Setting up libext2fs2:amd64 (1.45.5-2ubuntu1.1) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../e2fsprogs_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking e2fsprogs (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../libpcre3_2%3a8.39-12ubuntu0.1_amd64.deb ...
Unpacking libpcre3:amd64 (2:8.39-12ubuntu0.1) over (2:8.39-12build1) ...
Setting up libpcre3:amd64 (2:8.39-12ubuntu0.1) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../libsepol1_3.0-1ubuntu0.1_amd64.deb ...
Unpacking libsepol1:amd64 (3.0-1ubuntu0.1) over (3.0-1) ...
Setting up libsepol1:amd64 (3.0-1ubuntu0.1) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../passwd_1%3a4.8.1-1ubuntu5.20.04.2_amd64.deb ...
Unpacking passwd (1:4.8.1-1ubuntu5.20.04.2) over (1:4.8.1-1ubuntu5.20.04.1) ...
Setting up passwd (1:4.8.1-1ubuntu5.20.04.2) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../0-libcom-err2_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking libcom-err2:amd64 (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../1-libss2_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking libss2:amd64 (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../2-libssl1.1_1.1.1f-1ubuntu2.15_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1f-1ubuntu2.15) over (1.1.1f-1ubuntu2.13) ...
Preparing to unpack .../3-openssl_1.1.1f-1ubuntu2.15_amd64.deb ...
Unpacking openssl (1.1.1f-1ubuntu2.15) over (1.1.1f-1ubuntu2.13) ...
Preparing to unpack .../4-ca-certificates_20211016~20.04.1_all.deb ...
Unpacking ca-certificates (20211016~20.04.1) over (20210119~20.04.2) ...
Preparing to unpack .../5-dpkg-dev_1.19.7ubuntu3.2_all.deb ...
Unpacking dpkg-dev (1.19.7ubuntu3.2) over (1.19.7ubuntu3) ...
Preparing to unpack .../6-libdpkg-perl_1.19.7ubuntu3.2_all.deb ...
Unpacking libdpkg-perl (1.19.7ubuntu3.2) over (1.19.7ubuntu3) ...
Preparing to unpack .../7-linux-libc-dev_5.4.0-120.136_amd64.deb ...
Unpacking linux-libc-dev:amd64 (5.4.0-120.136) over (5.4.0-113.127) ...
Setting up libssl1.1:amd64 (1.1.1f-1ubuntu2.15) ...
Setting up linux-libc-dev:amd64 (5.4.0-120.136) ...
Setting up libcom-err2:amd64 (1.45.5-2ubuntu1.1) ...
Setting up libss2:amd64 (1.45.5-2ubuntu1.1) ...
Setting up libdpkg-perl (1.19.7ubuntu3.2) ...
Setting up logsave (1.45.5-2ubuntu1.1) ...
Setting up openssl (1.1.1f-1ubuntu2.15) ...
Setting up e2fsprogs (1.45.5-2ubuntu1.1) ...
Setting up dpkg-dev (1.19.7ubuntu3.2) ...
Setting up ca-certificates (20211016~20.04.1) ...
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
7 added, 8 removed; done.
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for ca-certificates (20211016~20.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  file libexpat1 libexpat1-dev libmagic-mgc libmagic1 libmpdec2 libpsl5
  libpython3-dev libpython3-stdlib libpython3.8 libpython3.8-dev
  libpython3.8-minimal libpython3.8-stdlib mime-support publicsuffix
  python-pip-whl python3 python3-dev python3-distutils python3-lib2to3
  python3-minimal python3-pkg-resources python3-setuptools python3-wheel
  python3.8 python3.8-dev python3.8-minimal zlib1g-dev
Suggested packages:
  python3-doc python3-tk python3-venv python-setuptools-doc python3.8-venv
  python3.8-doc binfmt-support
The following NEW packages will be installed:
  file libexpat1 libexpat1-dev libmagic-mgc libmagic1 libmpdec2 libpsl5
  libpython3-dev libpython3-stdlib libpython3.8 libpython3.8-dev
  libpython3.8-minimal libpython3.8-stdlib mime-support publicsuffix
  python-pip-whl python3 python3-dev python3-distutils python3-lib2to3
  python3-minimal python3-pip python3-pkg-resources python3-setuptools
  python3-wheel python3.8 python3.8-dev python3.8-minimal wget zlib1g-dev
0 upgraded, 30 newly installed, 0 to remove and 4 not upgraded.
Need to get 14.9 MB of archives.
After this operation, 63.1 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8-minimal amd64 3.8.10-0ubuntu1~20.04.4 [717 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libexpat1 amd64 2.2.9-1ubuntu0.4 [74.4 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3.8-minimal amd64 3.8.10-0ubuntu1~20.04.4 [1899 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-minimal amd64 3.8.2-0ubuntu2 [23.6 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal/main amd64 mime-support all 3.64ubuntu1 [30.6 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal/main amd64 libmpdec2 amd64 2.4.2-3 [81.1 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8-stdlib amd64 3.8.10-0ubuntu1~20.04.4 [1675 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3.8 amd64 3.8.10-0ubuntu1~20.04.4 [387 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal/main amd64 libpython3-stdlib amd64 3.8.2-0ubuntu2 [7068 B]
Get:10 http://archive.ubuntu.com/ubuntu focal/main amd64 python3 amd64 3.8.2-0ubuntu2 [47.6 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/main amd64 libmagic-mgc amd64 1:5.38-4 [218 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/main amd64 libmagic1 amd64 1:5.38-4 [75.9 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/main amd64 file amd64 1:5.38-4 [23.3 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-pkg-resources all 45.2.0-1 [130 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal/main amd64 libpsl5 amd64 0.21.0-1ubuntu1 [51.5 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal/main amd64 publicsuffix all 20200303.0012-1 [111 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 wget amd64 1.20.3-1ubuntu2 [348 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libexpat1-dev amd64 2.2.9-1ubuntu0.4 [117 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8 amd64 3.8.10-0ubuntu1~20.04.4 [1625 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8-dev amd64 3.8.10-0ubuntu1~20.04.4 [3952 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal/main amd64 libpython3-dev amd64 3.8.2-0ubuntu2 [7236 B]
Get:22 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 python-pip-whl all 20.0.2-5ubuntu1.6 [1805 kB]
Get:23 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 zlib1g-dev amd64 1:1.2.11.dfsg-2ubuntu1.3 [155 kB]
Get:24 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3.8-dev amd64 3.8.10-0ubuntu1~20.04.4 [514 kB]
Get:25 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3-lib2to3 all 3.8.10-0ubuntu1~20.04 [76.3 kB]
Get:26 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3-distutils all 3.8.10-0ubuntu1~20.04 [141 kB]
Get:27 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-dev amd64 3.8.2-0ubuntu2 [1212 B]
Get:28 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-setuptools all 45.2.0-1 [330 kB]
Get:29 http://archive.ubuntu.com/ubuntu focal/universe amd64 python3-wheel all 0.34.2-1 [23.8 kB]
Get:30 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3-pip all 20.0.2-5ubuntu1.6 [231 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 14.9 MB in 0s (81.4 MB/s)
Selecting previously unselected package libpython3.8-minimal:amd64.
(Reading database ... 12624 files and directories currently installed.)
Preparing to unpack .../libpython3.8-minimal_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libexpat1:amd64.
Preparing to unpack .../libexpat1_2.2.9-1ubuntu0.4_amd64.deb ...
Unpacking libexpat1:amd64 (2.2.9-1ubuntu0.4) ...
Selecting previously unselected package python3.8-minimal.
Preparing to unpack .../python3.8-minimal_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking python3.8-minimal (3.8.10-0ubuntu1~20.04.4) ...
Setting up libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up libexpat1:amd64 (2.2.9-1ubuntu0.4) ...
Setting up python3.8-minimal (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package python3-minimal.
(Reading database ... 12915 files and directories currently installed.)
Preparing to unpack .../0-python3-minimal_3.8.2-0ubuntu2_amd64.deb ...
Unpacking python3-minimal (3.8.2-0ubuntu2) ...
Selecting previously unselected package mime-support.
Preparing to unpack .../1-mime-support_3.64ubuntu1_all.deb ...
Unpacking mime-support (3.64ubuntu1) ...
Selecting previously unselected package libmpdec2:amd64.
Preparing to unpack .../2-libmpdec2_2.4.2-3_amd64.deb ...
Unpacking libmpdec2:amd64 (2.4.2-3) ...
Selecting previously unselected package libpython3.8-stdlib:amd64.
Preparing to unpack .../3-libpython3.8-stdlib_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8-stdlib:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package python3.8.
Preparing to unpack .../4-python3.8_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking python3.8 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libpython3-stdlib:amd64.
Preparing to unpack .../5-libpython3-stdlib_3.8.2-0ubuntu2_amd64.deb ...
Unpacking libpython3-stdlib:amd64 (3.8.2-0ubuntu2) ...
Setting up python3-minimal (3.8.2-0ubuntu2) ...
Selecting previously unselected package python3.
(Reading database ... 13317 files and directories currently installed.)
Preparing to unpack .../00-python3_3.8.2-0ubuntu2_amd64.deb ...
Unpacking python3 (3.8.2-0ubuntu2) ...
Selecting previously unselected package libmagic-mgc.
Preparing to unpack .../01-libmagic-mgc_1%3a5.38-4_amd64.deb ...
Unpacking libmagic-mgc (1:5.38-4) ...
Selecting previously unselected package libmagic1:amd64.
Preparing to unpack .../02-libmagic1_1%3a5.38-4_amd64.deb ...
Unpacking libmagic1:amd64 (1:5.38-4) ...
Selecting previously unselected package file.
Preparing to unpack .../03-file_1%3a5.38-4_amd64.deb ...
Unpacking file (1:5.38-4) ...
Selecting previously unselected package python3-pkg-resources.
Preparing to unpack .../04-python3-pkg-resources_45.2.0-1_all.deb ...
Unpacking python3-pkg-resources (45.2.0-1) ...
Selecting previously unselected package libpsl5:amd64.
Preparing to unpack .../05-libpsl5_0.21.0-1ubuntu1_amd64.deb ...
Unpacking libpsl5:amd64 (0.21.0-1ubuntu1) ...
Selecting previously unselected package publicsuffix.
Preparing to unpack .../06-publicsuffix_20200303.0012-1_all.deb ...
Unpacking publicsuffix (20200303.0012-1) ...
Selecting previously unselected package wget.
Preparing to unpack .../07-wget_1.20.3-1ubuntu2_amd64.deb ...
Unpacking wget (1.20.3-1ubuntu2) ...
Selecting previously unselected package libexpat1-dev:amd64.
Preparing to unpack .../08-libexpat1-dev_2.2.9-1ubuntu0.4_amd64.deb ...
Unpacking libexpat1-dev:amd64 (2.2.9-1ubuntu0.4) ...
Selecting previously unselected package libpython3.8:amd64.
Preparing to unpack .../09-libpython3.8_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libpython3.8-dev:amd64.
Preparing to unpack .../10-libpython3.8-dev_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8-dev:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libpython3-dev:amd64.
Preparing to unpack .../11-libpython3-dev_3.8.2-0ubuntu2_amd64.deb ...
Unpacking libpython3-dev:amd64 (3.8.2-0ubuntu2) ...
Selecting previously unselected package python-pip-whl.
Preparing to unpack .../12-python-pip-whl_20.0.2-5ubuntu1.6_all.deb ...
Unpacking python-pip-whl (20.0.2-5ubuntu1.6) ...
Selecting previously unselected package zlib1g-dev:amd64.
Preparing to unpack .../13-zlib1g-dev_1%3a1.2.11.dfsg-2ubuntu1.3_amd64.deb ...
Unpacking zlib1g-dev:amd64 (1:1.2.11.dfsg-2ubuntu1.3) ...
Selecting previously unselected package python3.8-dev.
Preparing to unpack .../14-python3.8-dev_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking python3.8-dev (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package python3-lib2to3.
Preparing to unpack .../15-python3-lib2to3_3.8.10-0ubuntu1~20.04_all.deb ...
Unpacking python3-lib2to3 (3.8.10-0ubuntu1~20.04) ...
Selecting previously unselected package python3-distutils.
Preparing to unpack .../16-python3-distutils_3.8.10-0ubuntu1~20.04_all.deb ...
Unpacking python3-distutils (3.8.10-0ubuntu1~20.04) ...
Selecting previously unselected package python3-dev.
Preparing to unpack .../17-python3-dev_3.8.2-0ubuntu2_amd64.deb ...
Unpacking python3-dev (3.8.2-0ubuntu2) ...
Selecting previously unselected package python3-setuptools.
Preparing to unpack .../18-python3-setuptools_45.2.0-1_all.deb ...
Unpacking python3-setuptools (45.2.0-1) ...
Selecting previously unselected package python3-wheel.
Preparing to unpack .../19-python3-wheel_0.34.2-1_all.deb ...
Unpacking python3-wheel (0.34.2-1) ...
Selecting previously unselected package python3-pip.
Preparing to unpack .../20-python3-pip_20.0.2-5ubuntu1.6_all.deb ...
Unpacking python3-pip (20.0.2-5ubuntu1.6) ...
Setting up libpsl5:amd64 (0.21.0-1ubuntu1) ...
Setting up mime-support (3.64ubuntu1) ...
Setting up wget (1.20.3-1ubuntu2) ...
Setting up libmagic-mgc (1:5.38-4) ...
Setting up libmagic1:amd64 (1:5.38-4) ...
Setting up file (1:5.38-4) ...
Setting up libexpat1-dev:amd64 (2.2.9-1ubuntu0.4) ...
Setting up zlib1g-dev:amd64 (1:1.2.11.dfsg-2ubuntu1.3) ...
Setting up python-pip-whl (20.0.2-5ubuntu1.6) ...
Setting up libmpdec2:amd64 (2.4.2-3) ...
Setting up libpython3.8-stdlib:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up python3.8 (3.8.10-0ubuntu1~20.04.4) ...
Setting up publicsuffix (20200303.0012-1) ...
Setting up libpython3-stdlib:amd64 (3.8.2-0ubuntu2) ...
Setting up python3 (3.8.2-0ubuntu2) ...
Setting up python3-wheel (0.34.2-1) ...
Setting up libpython3.8:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up python3-lib2to3 (3.8.10-0ubuntu1~20.04) ...
Setting up python3-pkg-resources (45.2.0-1) ...
Setting up python3-distutils (3.8.10-0ubuntu1~20.04) ...
Setting up python3-setuptools (45.2.0-1) ...
Setting up libpython3.8-dev:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up python3-pip (20.0.2-5ubuntu1.6) ...
Setting up python3.8-dev (3.8.10-0ubuntu1~20.04.4) ...
Setting up libpython3-dev:amd64 (3.8.2-0ubuntu2) ...
Setting up python3-dev (3.8.2-0ubuntu2) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Removing intermediate container 07b6d6f53352
 ---> bdad80779900
Step 10/14 : RUN python3 -m pip install --no-cache-dir --upgrade pip
 ---> Running in a7851101a995
Collecting pip
  Downloading pip-22.1.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-22.1.2
Removing intermediate container a7851101a995
 ---> fdd991e1849d
Step 11/14 : WORKDIR /workspace
 ---> Running in 69aa6a53c3e4
Removing intermediate container 69aa6a53c3e4
 ---> 325c1b83c37e
Step 12/14 : ADD ./requirements.txt /workspace
 ---> 1757a4388050
Step 13/14 : RUN python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt
 ---> Running in df9be31afa55
ERROR: Could not find a version that satisfies the requirement and (from versions: none)
ERROR: No matching distribution found for and
The command '/bin/bash -c python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt' returned a non-zero code: 1
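For what it's worth, the error lines above suggest the problem may not be the version format in requirements.txt after all: pip is being asked to install a package literally named 'and', which matches the stray word 'and' in the RUN command. A possible fix for that Dockerfile line (a sketch, untested) would be:

RUN python3 -m pip install -r /workspace/requirements.txt && rm /workspace/requirements.txt

i.e. dropping the stray 'and' and adding the missing space in '&&rm'.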

Potential solutions

One way to avoid this might be to use conda environments with less restrictive version requirements. I've created a yaml file which can be used to set up all the dependencies, though I haven't tried running GenoCAE with it yet.
Perhaps this could be used when setting up the Docker container, instead of the requirements.txt file? (PS: I only added the .txt suffix to allow the file to be uploaded to GH Issues.)
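For instance, something along these lines might work in the Dockerfile (a sketch only, untested; it assumes a base image with conda available and the attached file renamed back to env.yml):

# Replace the requirements.txt steps with the conda environment file
ADD ./env.yml /workspace/env.yml
RUN conda env create -f /workspace/env.yml && rm /workspace/env.yml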

Thanks! Really looking forward to using GenoCAE!
Attachment: env.yml.txt

conda env create -f env.yml.txt

Best,
Brian

GCAE gives error when running 'project' with a phenotype

Dear GenoCAE maintainers,

Thanks for GenoCAE and its Continuous Integration (GitHub Actions) script!

When I run GenoCAE with the added/experimental phenotype, I can now (thanks to #19) train the neural network. Great!

However, when I want to project the genotypes, I can get it to run (after fixing #21), but in the end it fails.

Training goes great, as confirmed by this example GitHub Actions log:

python3 run_gcae.py train --datadir example_tiny --data issue_6_bin --model_id M1  --epochs 3 --save_interval 1  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1

The last line of the output is also clear:

Done training. Wrote to /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_6_bin.p1

When I start using the project option (after fixing #21), it starts running, but fails in the visualization:

When I run on GHA like this:

python3 run_gcae.py project --datadir example_tiny --data issue_6_bin --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4 --superpops example_tiny/HO_superpopulations --pheno_model_id=p1

It starts the projection but fails at the visualization, giving the following error (copied from the GHA log; the full error message is at the bottom of this Issue):

Traceback (most recent call last):
  File "run_gcae.py", line 1616, in <module>
    main()
  File "run_gcae.py", line 1365, in main
    plot_coords_by_superpop(coords_by_pop,"{0}/dimred_e_{1}_by_superpop".format(results_directory, epoch), superpopulations_file, plot_legend = epoch == epochs[0])
  File "/home/runner/work/GenoCAE/GenoCAE/utils/visualization.py", line 222, in plot_coords_by_superpop
    max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])
ValueError: max() arg is an empty sequence
Error: Process completed with exit code 1.
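From the traceback, the superpops list appears to be empty, so max() receives an empty sequence; presumably none of the populations in my .fam file were matched in the superpopulations file. A defensive sketch for utils/visualization.py (my guess, not necessarily the maintainers' intended fix) that would at least fail with a clearer message:

# Sketch only: guard against an empty superpops list before taking the max
if not superpops:
    raise ValueError(
        "No superpopulations found; check that the --superpops file "
        "matches the populations in the data's .fam file.")
max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])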

I expected the projection to work the same with or without the phenotype, as it gives a useful visualization of the dimensionality reduction, as shown in the Ausmees & Nettelblad paper [1].

How do I get this to work?

Thanks and cheers, Richel

References

  • [1] Ausmees, Kristiina and Nettelblad, Carl. "A deep learning framework for characterization of genotype data." bioRxiv (2020).

Full error message

Copied from a GHA log:

2021-12-07 12:55:17.080095: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-07 12:55:17.080130: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-12-07 12:55:19.686377: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-12-07 12:55:19.686415: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-12-07 12:55:19.686436: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fv-az74-543): /proc/driver/nvidia/version does not exist
2021-12-07 12:55:19.686793: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tensorflow version 2.7.0


______________________________ arguments ______________________________
train : False
datadir : example_tiny
data : issue_6_bin
model_id : M1
train_opts_id : ex3
data_opts_id : b_0_4
save_interval : None
epochs : None
resume_from : None
trainedmodeldir : None
pheno_model_id : p1
project : True
superpops : example_tiny/HO_superpopulations
epoch : None
pdata : None
trainedmodelname : None
plot : False
animate : False
evaluate : False
metrics : None

______________________________ data opts ______________________________
sparsifies : [0.0, 0.1, 0.2, 0.3, 0.4]
norm_opts : {'flip': False, 'missing_val': -1.0}
norm_mode : genotypewise01
impute_missing : True
validation_split : 0.2

______________________________ train opts ______________________________
learning_rate : 0.00032
batch_size : 10
noise_std : 0.0032
n_samples : -1
loss : {'module': 'tf.keras.losses', 'class': 'CategoricalCrossentropy', 'args': {'from_logits': False}}
regularizer : {'reg_factor': 1e-07, 'module': 'tf.keras.regularizers', 'class': 'l2'}
lr_scheme : {'module': 'tf.keras.optimizers.schedules', 'class': 'ExponentialDecay', 'args': {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}}
______________________________
Imputing originally missing genotypes to most common value.
Reading ind pop list from /home/runner/work/GenoCAE/GenoCAE/example_tiny/issue_6_bin.fam
Reading ind pop list from /home/runner/work/GenoCAE/GenoCAE/example_tiny/issue_6_bin.fam
Mapping files:   0%|          | 0/3 [00:00<?, ?it/s]
Mapping files: 100%|██████████| 3/3 [00:00<00:00, 227.43it/s]
array([[ 0.10205683, -0.50682646, -0.7242572 , -0.41514382],
       [ 0.08256383, -0.4811394 , -0.66083604, -0.38157633],
       [ 0.04545861, -0.38441584, -0.52315164, -0.36843315],
       [ 0.10497452, -0.50911087, -0.73314375, -0.42168617],
       [ 0.06072002, -0.40441108, -0.57263076, -0.39334926]],
      dtype=float32)
[[0.5 0.5 0 0.5]
 [0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5]
 [0.5 0.5 0 0.5]
 [0.5 0.5 0.5 0.5]] array([[1. , 1. , 0.5, 1. ],
       [1. , 0.5, 1. , 0.5],
       [1. , 0. , 0.5, 1. ],
       [0.5, 0.5, 0.5, 0.5],
       [1. , 0.5, 0. , 0.5]])

Encoded data file not found: /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_6_bin.p1/issue_6_bin/encoded_data.h5 
Projecting epochs: [1, 2, 3]
Already projected: []
In DG.get_train_set: number of -1.0 genotypes in train: 0
In DG.get_train_set: number of -9 genotypes in train: 0
In DG.get_train_set: number of 0 values in train mask: 0

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 16}
Adding layer: Reshape: {'target_shape': (2, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (2, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (4, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}

______________________________ Building model ______________________________
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 1}
No marker specific variable.
########################### epoch 1 ###########################
Reading weights from /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_6_bin.p1/weights/1
tf.Tensor(
[0.23319791 0.19988029 0.19377704 0.23756738 0.21141517 0.19966617
 0.18867628 0.22778517 0.22095694 0.23621069 0.24232931 0.23751874
 0.21052796 0.2478469  0.22860327 0.24310602 0.2248713  0.22607517
 0.21316327 0.24836378 0.24003232 0.2017554  0.2420473  0.25501102
 0.236629   0.22140262 0.20744076 0.20671275 0.22881663 0.19617875], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.2380066  0.19335003 0.23319791 0.23863165 0.23831964 0.18697073
 0.23600692 0.21887155 0.1588867  0.1949499  0.21993566 0.25195104
 0.18325783 0.24391052 0.18994804 0.23802298 0.20401272 0.22448331
 0.2229189  0.21869858 0.23501374 0.22200371 0.23621069 0.20525226
 0.2003792  0.24328303 0.24873887 0.23385888 0.24009213 0.2101993 ], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.21908474 0.22722876 0.2339882  0.23049714 0.20231368 0.2011057
 0.21599561 0.17069077 0.23896992 0.2420473  0.24056283 0.20440042
 0.24473031 0.23399888 0.2080764  0.21408066 0.23621069 0.20678237
 0.20441745 0.19976138 0.200915   0.22420047 0.21946532 0.2136519
 0.23442683 0.2120275  0.23950182 0.21574992 0.23319791 0.2304342 ], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.24093255 0.23797578 0.23640822 0.19804403 0.21179074 0.24339268
 0.20556012 0.24795108 0.22094505 0.25103822 0.2339133  0.18515958
 0.23047343 0.23206557 0.20824197 0.23773867 0.22685748 0.18689398
 0.21542913 0.23442683 0.24944112 0.24592474 0.22365497 0.22963423
 0.19812118 0.24454156 0.23143443 0.21166426 0.21157375 0.2214748 ], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.2144528  0.23790193 0.20847306 0.17789963 0.22853369 0.22519661
 0.22575805 0.23663682 0.2309236  0.21082726 0.19669218 0.1876471
 0.18697073 0.22914475 0.20111184 0.2027495  0.22810453 0.24159692
 0.24206525 0.19896871 0.22794227 0.21941704 0.21471435 0.19822605
 0.20103996 0.23831964 0.18830639 0.20552994 0.23621069 0.23235762], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.24529332 0.2289039  0.23621069 0.19376357 0.23415062 0.22575805
 0.21179074 0.21793306 0.23040852 0.21893027 0.24770258 0.19905189
 0.21635695 0.2532666  0.24553452 0.1958462 ], shape=(16,), dtype=float32)
(16,)
Traceback (most recent call last):
  File "run_gcae.py", line 1616, in <module>
    main()
  File "run_gcae.py", line 1365, in main
    plot_coords_by_superpop(coords_by_pop,"{0}/dimred_e_{1}_by_superpop".format(results_directory, epoch), superpopulations_file, plot_legend = epoch == epochs[0])
  File "/home/runner/work/GenoCAE/GenoCAE/utils/visualization.py", line 222, in plot_coords_by_superpop
    max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])
ValueError: max() arg is an empty sequence
Error: Process completed with exit code 1.

Suggest: shorter error message when --datadir is not found

Dear GCAE maintainer,

Here I try to convince you to give a shorter error message when --datadir is absent.

Thanks for the GCAE examples provided; these are very helpful!

When I run the example code of the first GCAE training example ...

python3 run_gcae.py train --datadir example_tiny/ --data HumanOrigins249_tiny --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4

I get a clear-but-long error message:

2021-06-28 13:48:01.293305: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 13:48:01.293338: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

The drawback is that this is too long an error message for R to display (here I use the gcaer R package):

Screenshot from 2021-06-28 13-39-44

Also, one could argue that initializing TensorFlow and looking for CUDA should be done after checking whether the CLI arguments are valid.

In that way, the error message would shorten to the lines below and I would be happy:

Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

An alternative would be a CLI argument to suppress these TensorFlow warnings.

What I suggest is one of these options (see the sketch below):

  • Load TensorFlow after checking the CLI arguments
  • Add the ubiquitous --verbose argument and only show the TensorFlow output when it is enabled
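A minimal sketch of what I mean (not the actual run_gcae.py structure; the hard-coded file path and the --verbose flag are just illustrations):

import os
import sys

def main():
    # 1. Validate the CLI arguments and required files first.
    data_opts_path = "data_opts/b_0_4.json"  # derived from the CLI arguments in the real script
    if not os.path.isfile(data_opts_path):
        sys.exit("No such file or directory: '{}'".format(data_opts_path))
    # 2. Silence TensorFlow's C++ logging unless --verbose is given.
    if "--verbose" not in sys.argv:  # hypothetical flag
        os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"  # must be set before importing tensorflow
    # 3. Only now pay the TensorFlow startup cost.
    import tensorflow as tf
    print("tensorflow version", tf.__version__)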

What do you think about this idea?

GCAE cannot run 'project' with a phenotype, how to fix?

Dear GenoCAE maintainers,

Thanks for GenoCAE and its Continuous Integration (GitHub Actions) script!

When I run GenoCAE with the added/experimental phenotype, I can now (thanks to #19) train the neural network. Great!

However, when I want to project the genotypes, I get error messages that are wrong and/or appear too early.

Training goes great, as confirmed by this example GitHub Actions log:

 python3 run_gcae.py train --datadir example_tiny --data issue_2_bin --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1

The last line of the output is also clear:

Done training. Wrote to /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_2_bin.p1

Note the .p1 addition to the folder name, which is not there when not working with a phenotype.

When I start using the project option, which I copied from the doc, I get unexpected and/or premature error messages:

When I run on GHA like this:

python3 run_gcae.py project --datadir example_tiny --data issue_2_bin --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4 --superpops example_tiny/HO_superpopulations --pheno_model_id=p1

I get the error:

Invalid command. Run 'python run_gcae.py --help' for more information.

as if the --pheno_model_id=p1 is not supported yet.

Sure, I can delete that flag altogether, but then I get:

FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_2_bin/weights'

Note the absence of .p1 in the folder name.
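A possible workaround I have not tested: run_gcae.py does list a trainedmodelname argument, so perhaps pointing project directly at the phenotype-suffixed folder bypasses the derived name (just a guess):

python3 run_gcae.py project --datadir example_tiny --data issue_2_bin --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4 --superpops example_tiny/HO_superpopulations --trainedmodelname ae.M1.ex3.b_0_4.issue_2_bin.p1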

The error I expect would be that the dataset used (issue_2_bin) would not work with the file specified with --superpops example_tiny/HO_superpopulations (although it might work by sheer luck).

How can I use project on a neural net that can also do a phenotype?

What is the cause for 'ValueError: Dimensions must be equal'?

Dear GenoCAE maintainers, hi @kausmees and @cnettel,

When I run the GenoCAE experimental Pheno branch, I get an error that I have no idea what to do with. Below is the reprex.

Currently, the GitHub Actions script runs GenoCAE with the --help flag, showing the help successfully.

On my fork of GenoCAE in the GitHub Actions 'check.yaml' script, I added the following command to run:

python3 run_gcae.py train --datadir example_tiny --data issue_2_bin --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1

The --data issue_2_bin files are the data I supplied to Carl in this Issue; they are already in the example_tiny folder of my 'GenoCAE' fork.

GitHub Actions gives the following error:

ValueError: in user code:

    File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 424, in run_optimization  *
        loss_value += tf.math.reduce_sum(((-y_pred) * y_true)) * 1e-6

    ValueError: Dimensions must be equal, but are 2 and 4 for '{{node mul_21}} = Mul[T=DT_FLOAT](Neg_2, one_hot_2)' with input shapes: [2,4], [2,4,3].
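If it helps, the mismatch can be reproduced in isolation (my guess at the tensors involved, taken from the shapes in the error message):

# Sketch: [2, 4] vs [2, 4, 3] do not broadcast, hence
# "Dimensions must be equal, but are 2 and 4"
import tensorflow as tf

y_pred = tf.zeros([2, 4])                                 # no per-class dimension
y_true = tf.one_hot(tf.zeros([2, 4], dtype=tf.int32), 3)  # one-hot, shape [2, 4, 3]
loss = tf.math.reduce_sum((-y_pred) * y_true)             # raises the shape error

So it looks like y_pred lacks the per-class (depth 3) dimension that the one-hot y_true has, though I do not know which of the two is wrong here.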

Below is the full error log, which can also be found in this GitHub Actions log.

What does the error mean?

Thanks and cheers, Richel

Full error log

richel@N141CU:~/GitHubs/GenoCAE$ python3 run_gcae.py train --datadir example_tiny --data issue_2_bin --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1
2021-11-30 13:41:10.460286: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-30 13:41:10.460312: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-30 13:41:13.244831: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-30 13:41:13.244903: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (N141CU): /proc/driver/nvidia/version does not exist
2021-11-30 13:41:13.245186: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tensorflow version 2.7.0

______________________________ arguments ______________________________
train : True
datadir : example_tiny
data : issue_2_bin
model_id : M1
train_opts_id : ex3
data_opts_id : b_0_4
save_interval : 2
epochs : 20
resume_from : None
trainedmodeldir : None
pheno_model_id : p1
project : False
superpops : None
epoch : None
pdata : None
trainedmodelname : None
plot : False
animate : False
evaluate : False
metrics : None

______________________________ data opts ______________________________
sparsifies : [0.0, 0.1, 0.2, 0.3, 0.4]
norm_opts : {'flip': False, 'missing_val': -1.0}
norm_mode : genotypewise01
impute_missing : True
validation_split : 0.2

______________________________ train opts ______________________________
learning_rate : 0.00032
batch_size : 10
noise_std : 0.0032
n_samples : -1
loss : {'module': 'tf.keras.losses', 'class': 'CategoricalCrossentropy', 'args': {'from_logits': False}}
regularizer : {'reg_factor': 1e-07, 'module': 'tf.keras.regularizers', 'class': 'l2'}
lr_scheme : {'module': 'tf.keras.optimizers.schedules', 'class': 'ExponentialDecay', 'args': {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}}
______________________________
Imputing originally missing genotypes to most common value.
Reading ind pop list from /home/richel/GitHubs/GenoCAE/example_tiny/issue_2_bin.fam
Reading ind pop list from /home/richel/GitHubs/GenoCAE/example_tiny/issue_2_bin.fam
Mapping files: 100%|██████████| 3/3 [00:00<00:00, 362.20it/s]
Using learning rate schedule tf.keras.optimizers.schedules.ExponentialDecay with {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}

______________________________ Data ______________________________
N unique train samples: 800
--- training on : 800
N valid samples: 200
N markers: 4


______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 16}
Adding layer: Reshape: {'target_shape': (2, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (2, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (4, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 16}
Adding layer: Reshape: {'target_shape': (2, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (2, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (4, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}

______________________________ Building model ______________________________
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 1}
No marker specific variable.
ALLVARS [<tf.Variable 'autoencoder/conv1d/kernel:0' shape=(5, 3, 8) dtype=float32>, <tf.Variable 'autoencoder/conv1d/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_1/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_1/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_1/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_1/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_2/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_2/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_2/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_2/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/conv1d_3/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/conv1d_3/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_3/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_3/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/dense/kernel:0' shape=(16, 75) dtype=float32>, <tf.Variable 'autoencoder/dense/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/dense_1/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder/dense_1/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/encoded/kernel:0' shape=(75, 2) dtype=float32>, <tf.Variable 'autoencoder/encoded/bias:0' shape=(2,) dtype=float32>, <tf.Variable 'autoencoder/dense_2/kernel:0' shape=(2, 75) dtype=float32>, <tf.Variable 'autoencoder/dense_2/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/dense_3/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder/dense_3/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/dense_4/kernel:0' shape=(75, 16) dtype=float32>, <tf.Variable 'autoencoder/dense_4/bias:0' shape=(16,) dtype=float32>, <tf.Variable 'autoencoder/conv1d_4/kernel:0' shape=(5, 10, 8) dtype=float32>, <tf.Variable 'autoencoder/conv1d_4/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_4/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_4/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_5/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_5/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_5/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_5/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_6/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_6/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_6/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_6/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/nms/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/nms/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_7/gamma:0' shape=(9,) 
dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_7/beta:0' shape=(9,) dtype=float32>, <tf.Variable 'autoencoder/conv1d_7/kernel:0' shape=(1, 9, 1) dtype=float32>, <tf.Variable 'autoencoder/conv1d_7/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_8/kernel:0' shape=(5, 3, 8) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_8/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_8/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_8/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_9/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_9/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_9/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_9/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_10/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_10/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_10/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_10/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_11/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_11/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_11/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_11/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_5/kernel:0' shape=(16, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_5/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_6/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_6/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/encoded/kernel:0' shape=(75, 2) dtype=float32>, <tf.Variable 'autoencoder_1/encoded/bias:0' shape=(2,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_7/kernel:0' shape=(2, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_7/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_8/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_8/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_9/kernel:0' shape=(75, 16) dtype=float32>, <tf.Variable 'autoencoder_1/dense_9/bias:0' shape=(16,) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_12/kernel:0' shape=(5, 10, 8) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_12/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_12/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_12/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/conv1d_13/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/conv1d_13/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_13/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_13/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/conv1d_14/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 
'autoencoder_1/residual_block2_3/conv1d_14/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_14/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_14/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/nms/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/nms/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_15/gamma:0' shape=(9,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_15/beta:0' shape=(9,) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_15/kernel:0' shape=(1, 9, 1) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_15/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'autoencoder_2/dense_10/kernel:0' shape=(2, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_10/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_11/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_11/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_12/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_12/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_13/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_13/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_14/kernel:0' shape=(75, 1) dtype=float32>, <tf.Variable 'autoencoder_2/dense_14/bias:0' shape=(1,) dtype=float32>] ###
Traceback (most recent call last):
  File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 1616, in <module>
    main()
  File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 1014, in main
    run_optimization(autoencoder, autoencoder2, optimizer, optimizer2, loss_func, input_init, targets_init, True, phenomodel=pheno_model, phenotargets=phenotargets_init)
  File "/home/richel/miniconda3/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/richel/miniconda3/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 424, in run_optimization  *
        loss_value += tf.math.reduce_sum(((-y_pred) * y_true)) * 1e-6

    ValueError: Dimensions must be equal, but are 2 and 4 for '{{node mul_21}} = Mul[T=DT_FLOAT](Neg_2, one_hot_2)' with input shapes: [2,4], [2,4,3].

Request: add a toy model setup

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

Thanks for GenoCAE and the experimental Pheno branch!

What I would enjoy is a toy Mx model (e.g. M0) and a toy px model (e.g. p0) that would be the smallest neural networks possible, respecting the dimensions of the input and output (or: they 'just work', even though their predictions will be bad).

I have tried modifying the /models/M1.json and /models/p2.json files (the latter only available on the Pheno branch), but I feel this would take you only seconds to create.

I would enjoy this as it would speed up my GitHub Actions test suite: training alone now takes 150 seconds, whereas I am (usually) only interested in whether it creates some files, not in whether the output is useful (for useful output I would use the regular models).

Would it be easy to add toy models Mx (e.g. models/M0.json) and toy model px (e.g. models/p0.json)?

If I underestimate how hard this is, just let me know, and I will try harder :-)

Thanks and cheers, Richel

evaluate with superpops: how is the average calculated?

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

As you are back, here I submit something I found unexpected (discussed from my point of view). If you also did not expect this, I'd happily create a minimally reproducible example.

When using evaluate with a superpops file, in one of my cases I got the following:

Population    num samples   f1_score_3   f1_score_5
C             333           0.0000       0.0000
B             334           0.2431       0.0000
A             333           0.4400       0.4996
avg (micro)   1000          0.3100       0.3330

The unexpected part is the last line, which suggests an average is calculated, but appears to do different things per column (and I understand using a sum for the first column, num samples :-) ).

I would expect the averages to be:

Population    num samples   f1_score_3   f1_score_5
C             333           0.0000       0.0000
B             334           0.2431       0.0000
A             333           0.4400       0.4996
avg (micro)   333           0.2277       0.1665

I checked: these 'averages' are also neither the harmonic nor geometric mean.
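My current guess is that 'micro' refers to micro-averaging, where the F1 score is computed globally over all samples (pooling true/false positives and negatives across populations) rather than by averaging the per-population scores, which would explain why the last line is not a column-wise mean. For illustration (using scikit-learn, which may not be what GCAE uses internally):

from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(f1_score(y_true, y_pred, average="macro"))  # mean of the per-class F1 scores
print(f1_score(y_true, y_pred, average="micro"))  # global computation, generally differs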

What are those values?

If you think these are weird as well, I will happily create a reproducible example. Else, I am happy to learn what these values are :-)

Suggest + volunteer: rename HumanOrigins249_tiny.snp to HumanOrigins249_tiny.bim

Dear GenoCAE maintainer,

Thanks so much for having example files and example code: I find those very useful!

I did find something unexpected: the file extension of HumanOrigins249_tiny.snp. This appears to be a PLINK .bim file, as it follows the same structure as described in the PLINK .bim file format doc:

Screenshot from 2021-06-29 11-44-56

I suggest renaming the file to what any PLINK user would expect for a .bim file: HumanOrigins249_tiny.bim.
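Concretely, the rename would be (assuming the file sits in example_tiny/ like the other example files):

git mv example_tiny/HumanOrigins249_tiny.snp example_tiny/HumanOrigins249_tiny.bim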

I volunteer to do so.

Could AasaJohanssonUU be added as a Collaborator?

Hi @kausmees,

Currently, when I create an Issue that my supervisor needs to be informed about, I cannot tag her (i.e. use @AasaJohanssonUU), as Åsa is not a Collaborator.

Could Åsa be added as a Collaborator (AasaJohanssonUU) so I can tag her in Issues, allowing her to stay in the loop better? Would be great!

Suggest + volunteer: add GitHub Actions continuous integration

Continuous integration is the workflow in which, after every git push (among other events), the project is tested to still work. Not only is this helpful to speed up development, it also allows one to see whether code from contributors (via a Pull Request) keeps the build intact.

I suggest adding a minimal GitHub Actions script that simply does the steps in the README.md, something like the sketch below.
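For instance (a sketch only; the exact Python version and install steps would follow the README.md):

# .github/workflows/check.yaml
name: check
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - run: python3 -m pip install -r requirements.txt
      - run: python3 run_gcae.py --help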

I volunteer to write and maintain it, as I have plenty of experience with that (e.g. plinkr, but there are dozens if not hundreds).

Good idea?
