genocae's Introduction

GenoCAE

Convolutional autoencoder for genotype data, as described in [1]

An interactive version of the dimensionality reduction visualization from the paper can be found here.

Installation

Manual Installation

(examples for Linux)

Requirements:

python >= 3.6

python3-dev

pip3

Install Python packages:

$ cd GenoCAE/
$ pip3 install -r requirements.txt

Docker Installation

Build Docker image

docker build -t gcae/genocae:build -f docker/build.dockerfile .

CLI

$ docker run -it --rm -v ${PWD}:/workspace gcae/genocae:build python3 run_gcae.py --help

If you have Docker with GPU support:

$ docker run -it  --gpus=all --rm -v ${PWD}:/workspace gcae/genocae:build python3 run_gcae.py --help

CLI

The training and evaluation of models is wrapped by a command-line interface (CLI):

$ cd GenoCAE/
$ python3 run_gcae.py --help

GenoCAE.

Usage:
  run_gcae.py train --datadir=<name> --data=<name> --model_id=<name> --train_opts_id=<name> --data_opts_id=<name> --epochs=<num> [--resume_from=<num> --trainedmodeldir=<name> --patience=<num> --save_interval=<num> --start_saving_from=<num> ]
  run_gcae.py project --datadir=<name>   [ --data=<name> --model_id=<name>  --train_opts_id=<name> --data_opts_id=<name> --superpops=<name> --epoch=<num> --trainedmodeldir=<name>   --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py plot --datadir=<name> [  --data=<name>  --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name>  --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py animate --datadir=<name>   [ --data=<name>   --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name> --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py evaluate --datadir=<name> --metrics=<name>  [  --data=<name>  --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name>  --pdata=<name> --trainedmodelname=<name>]

Options:
  -h --help             show this screen
  --datadir=<name>      directory where sample data is stored. if not absolute: assumed relative to GenoCAE/ directory. DEFAULT: data/
  --data=<name>         file prefix, not including path, of the data files (EIGENSTRAT or PLINK format)
  --trainedmodeldir=<name>     base path where to save model training directories. if not absolute: assumed relative to GenoCAE/ directory. DEFAULT: ae_out/
  --model_id=<name>     model id, corresponding to a file models/model_id.json
  --train_opts_id=<name>  train options id, corresponding to a file train_opts/train_opts_id.json
  --data_opts_id=<name>   data options id, corresponding to a file data_opts/data_opts_id.json
  --epochs=<num>          number of epochs to train
  --resume_from=<num>     saved epoch to resume training from. set to -1 for latest saved epoch. DEFAULT: None (don't resume)
  --save_interval=<num>   epoch intervals at which to save state of model. DEFAULT: None (don't save)
  --start_saving_from=<num>  number of epochs to train before starting to save model state. DEFAULT: 0.
  --trainedmodelname=<name>  name of the model training directory to fetch saved model state from when projecting/plotting/evaluating
  --pdata=<name>          file prefix, not including path, of data to project/plot/evaluate. if not specified, assumed to be the same as the model was trained on.
  --epoch=<num>           epoch at which to project/plot/evaluate data. DEFAULT: all saved epochs
  --superpops=<name>      path+filename of file mapping populations to superpopulations. used to color populations of the same superpopulation in similar colors when plotting. if not an absolute path: assumed relative to GenoCAE/ directory.
  --metrics=<name>        the metric(s) to evaluate, e.g. hull_error or f1 score. can pass a list with multiple metrics, e.g. "f1_score_3,f1_score_5". DEFAULT: f1_score_3
  --patience=<num>        stop training after this number of epochs without improving the lowest validation loss. DEFAULT: None

The main commands are:

  1. train: train model, and save its state at certain epochs to disk.
  2. project: load saved model state, run data through it and save the projected data to disk.

Once projected data has been written to disk, the following commands can be used:

  1. plot: create plots of the projected data
  2. animate: create an animation visualizing the projected data at different epochs
  3. evaluate: calculate specified metrics for the projected data

See the Examples section below for examples of running these commands.

Setup training

Setup requires defining the following (using the CLI options):

  1. data
  2. data options
  3. model
  4. training options

data

Defines the actual samples and genotype data to use. Passed to the CLI with option --data

Accepted data formats are

  • EIGENSTRAT (eigenstratgeno/snp/ind). Details here
  • PLINK (bed/bim/fam). Details here

A small example data set HumanOrigins249_tiny is in example_tiny/ with 249 samples and 9259 SNPs. This can be used for local testing.
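For a quick look at such data, the sample list can be inspected directly. A minimal pandas sketch (assuming the standard PLINK .fam layout of family ID, sample ID, father, mother, sex, phenotype; not part of GenoCAE itself):

import pandas as pd

# Peek at the example samples (sketch; assumes the standard PLINK .fam layout).
fam = pd.read_csv("example_tiny/HumanOrigins249_tiny.fam", sep=r"\s+", header=None,
                  names=["fid", "iid", "father", "mother", "sex", "pheno"])
print(len(fam), "samples")  # expect 249
print(fam.head())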

data options

These options affect how data is input to the model. Passed to the CLI with option --data_opts_id, specifying a json file in the directory data_opts/

Example: (data_opts/b_0_4.json)

{ "norm_mode" : "genotypewise01",
  "norm_opts" : {"flip": false, "missing_val":-1.0},
  "impute_missing" : true,
  "validation_split" : 0.2,
  "sparsifies" : [0.0, 0.1, 0.2, 0.3, 0.4]
 }
  • norm_mode: what normalization mode to use (genotypewise01, smartPCAstyle or standard)

  • norm_opts: additional normalization options:

    • missing_val: value to use to encode missing data in model input (only applicable if either sparsifies is specified, or the original data contains missing genotypes and impute_missing = false)
    • flip: whether or not to flip genotype labels 0-1-2 → 2-1-0
  • impute_missing: if true, genotypes that are missing in the original data are set to the most frequent genotype per marker. if false, the genotypes that were originally missing are ignored when calculating the loss and genotype concordance. NOTE: if this is set to false, missing_val should be given a value that cannot occur in the data after normalization, so the model can correctly identify which genotypes were originally missing.

  • validation_split: fraction of samples to use as validation set

Optional:

  • sparsifies: list of fractions of data to remove artificially during the training process, for regularization
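As a rough illustration of what sparsifying could look like, here is a minimal numpy sketch of the idea (an illustration, not the project's implementation), masking genotypes with the missing_val from norm_opts:

import numpy as np

def sparsify(batch, fraction, missing_val=-1.0, seed=None):
    """Set a random fraction of genotypes to missing_val (conceptual sketch)."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch, dtype=np.float32).copy()
    batch[rng.random(batch.shape) < fraction] = missing_val
    return batch

# e.g. remove 20% of the genotypes in a (samples x markers) batch:
# noisy_batch = sparsify(normalized_batch, fraction=0.2)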

normalization methods

genotypewise01: normalize genotypes to the range [0,1] by mapping 0,1,2 → 0.0,0.5,1.0

smartPCAstyle: subtract the mean and divide by an estimate of the std of the population allele frequency (more info in the EIGENSTRAT paper). this results in data that is centered and has close to unit variance

standard: subtract the mean and divide by the std
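As a concrete example of the simplest mode, a minimal numpy sketch of the genotypewise01 mapping (an illustration, not the project's code):

import numpy as np

def genotypewise01(genotypes):
    """Sketch of the genotypewise01 idea: map 0,1,2 -> 0.0,0.5,1.0."""
    return np.asarray(genotypes, dtype=np.float32) / 2.0

print(genotypewise01([0, 1, 2]))  # [0.  0.5 1. ]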

model

This defines a model architecture. Passed to the CLI with option --model_id, specifying a json file in the directory models/. The json file defines which layers the model should have, in what order, and their parameters (convolution filters, strides, pool sizes etc.).

Example model: (models/M1.json)

{"layers":
[
  {"class": "Conv1D", "module":"tf.keras.layers", "args": {"filters":8, "kernel_size":5, "padding":"same", "activation":"elu", "strides":1}},
  {"class": "BatchNormalization", "module":"tf.keras.layers", "args": {}},
  {"class": "ResidualBlock2", "module":"utils.layers", "args": {"filters":8, "kernel_size":5}},
  {"class": "MaxPool1D", "module":"tf.keras.layers", "args": {"pool_size":5, "strides":2, "padding":"same"}},
  {"class": "Conv1D", "module":"tf.keras.layers", "args": {"filters":8, "kernel_size":5, "padding":"same", "activation":"elu"}},
  {"class": "BatchNormalization", "module":"tf.keras.layers", "args": {}},
  {"class": "Flatten", "module":"tf.keras.layers", "args": {}},
  {"class": "Dropout", "module":"tf.keras.layers", "args": {"rate":0.01}},
  {"class": "Dense", "module":"tf.keras.layers", "args": {"units":75}},
  {"class": "Dropout", "module":"tf.keras.layers", "args": {"rate":0.01}},
  {"class": "Dense", "module":"tf.keras.layers", "args": {"units":75, "activation":"elu"}},
  {"class": "Dense", "module":"tf.keras.layers", "encoding" : true, "args": {"units":2, "name":"encoded"}},
  {"class": "Dense", "module":"tf.keras.layers", "args": {"units":75, "activation":"elu"}},
  {"class": "Dropout", "module":"tf.keras.layers", "args": {"rate":0.01}},
  {"class": "Dense", "module":"tf.keras.layers", "args": {"units":75, "activation":"elu"}},
  {"class": "Dropout", "module":"tf.keras.layers", "args": {"rate":0.01}},
  {"class": "Dense", "module":"tf.keras.layers", "args": {"units":"ns[1]*8"}},
  {"class": "Reshape", "module":"tf.keras.layers", "args": {"target_shape":"(ns[1],8)", "name":"i_msvar"}},
  {"class": "Conv1D", "module":"tf.keras.layers", "args": {"filters":8, "kernel_size":5, "padding":"same", "activation":"elu"}},
  {"class": "BatchNormalization", "module":"tf.keras.layers", "args": {}},
  {"class": "Reshape", "module":"tf.keras.layers", "args": {"target_shape":"(ns[1],1,8)"}},
  {"class": "UpSampling2D", "module":"tf.keras.layers", "args": {"size":"(2,1)"}},
  {"class": "Reshape", "module":"tf.keras.layers", "args": {"target_shape":"(ns[1]*2,8)"}},
  {"class": "ResidualBlock2", "module":"utils.layers", "args": {"filters":8, "kernel_size":5}},
  {"class": "Conv1D", "module":"tf.keras.layers", "args": {"filters":8, "kernel_size":5, "padding":"same", "activation":"elu", "name":"nms"}},
  {"class": "BatchNormalization", "module":"tf.keras.layers", "args": {}},
  {"class": "Conv1D", "module":"tf.keras.layers", "args": {"filters":1, "kernel_size":1, "padding":"same"}},
  {"class": "Flatten", "module":"tf.keras.layers", "args": {"name":"logits"}}
]

}

This corresponds to the architecture below:

Architecture

  • The layers are ordered top-bottom as they are left-right in the autoencoder.

  • The middle layer should be given the name "encoded"; this indicates which layer is the latent representation, or encoding.
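Each layer entry names a module, a class and keyword arguments, which suggests the layers are instantiated dynamically. A minimal sketch of that pattern (an assumption about the mechanism, not GenoCAE's actual code; note that the "tf." prefix used in the json would have to be resolved to the "tensorflow" package first):

import importlib

def build_layer(spec):
    """Instantiate a layer from a {"class", "module", "args"} spec (sketch)."""
    module = importlib.import_module(spec["module"])
    return getattr(module, spec["class"])(**spec.get("args", {}))

layer = build_layer({"class": "Dense",
                     "module": "tensorflow.keras.layers",
                     "args": {"units": 75, "activation": "elu"}})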

Marker-specific variables: In order to facilitate the learning of global patterns in the input data, there is the option to add so-called marker-specific variables. These consist of one variable per marker that is updated during the optimization process, allowing the model to capture marker-specific behavior. The two sets of marker-specific variables are illustrated in the figure in red and green. The red set of variables is also concatenated to the model input at every stage of the training process.

  • Giving a layer the name "i_msvar" denotes where the marker-specific variable that is red in the figure should be concatenated

  • Giving a layer the name "nms" denotes where the marker-specific variable that is green in the figure should be concatenated

  • The variable "ns" is a list that contains the size of the length dimension of the data (the dimension that corresponds to the length of the genetic sequence). When a layer in the encoder that modifies this (e.g. max pooling) is added, the new length is added to ns, so that it can be used to reconstruct the lengths in the decoder (e.g. when upsampling). The values in ns will thus depend on the length of the input data. For the data HumanOrigins249_tiny which is used in the Examples section below, ns=[9259,4630]. When a layer argument such as "units" or "target_shape" is specified as a string, it will be evaluated as a python expression (and should therefore be a valid expression that evaluates to a value of the expected type, e.g. an int for "units" or a tuple for "target_shape").
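As an illustration of this string-evaluation behavior (a sketch of the idea, not the actual code), with ns as for HumanOrigins249_tiny:

ns = [9259, 4630]  # sequence lengths recorded while building the encoder

def resolve_arg(value):
    """Evaluate string-valued layer arguments as python expressions (sketch)."""
    return eval(value, {"ns": ns}) if isinstance(value, str) else value

print(resolve_arg("ns[1]*8"))    # 37040, e.g. for "units"
print(resolve_arg("(ns[1],8)"))  # (4630, 8), e.g. for "target_shape"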

train options

These options affect the training of the model. Passed to the CLI with option --train_opts_id, specifying a json file in the directory train_opts/

Example with mean squared error loss: (train_opts/ex1.json)

{
  "learning_rate": 0.000275,
  "batch_size": 100,
  "noise_std": 0.005,
  "n_samples": -1,
  "loss": {
    "module": "tf.keras.losses",
    "class": "MeanSquaredError"
  },
  "regularizer": {
    "reg_factor": 5e-08,
    "module": "tf.keras.regularizers",
    "class": "l2"
  }
}

Example with binary cross-entropy loss: (train_opts/ex2.json)

{
  "learning_rate": 0.000275,
  "batch_size": 100,
  "noise_std": 0.01,
  "n_samples": -1,
  "loss": {
    "module": "tf.keras.losses",
    "class": "BinaryCrossentropy",
    "args": {
      "from_logits": true
    }
  },
  "regularizer": {
    "reg_factor": 1.5e-07,
    "module": "tf.keras.regularizers",
    "class": "l2"
  }
}

Example with categorical cross-entropy loss and a learning rate scheme: (train_opts/ex3.json)

{
  "learning_rate": 3.2e-02,
  "batch_size": 100,
  "noise_std": 0.0032,
  "n_samples": -1,
  "loss": {
    "module": "tf.keras.losses",
    "class": "CategoricalCrossentropy",
    "args": {
    "from_logits": false}},
  "regularizer": {
    "reg_factor": 1.0e-07,
    "module": "tf.keras.regularizers",
    "class": "l2"
  },
"lr_scheme": {
    "module": "tf.keras.optimizers.schedules",
    "class": "ExponentialDecay",
    "args": {
         "decay_rate": 0.98,
         "decay_steps": 100,
         "staircase": false}}
}
  • learning_rate: the learning rate of the optimizer
  • batch_size: number of samples per training batch
  • noise_std: std of noise to add to the encoding layer, for regularization
  • n_samples: how many samples to use in training. if more than the specified data set contains, samples are repeated. makes most sense to use in combination with data augmentation (sparsifies). -1 means use all training samples (after the validation set has been removed)
  • loss: the loss function to use for reconstructed genotypes. specify module, class and arguments of the loss object. the given examples correspond to tensorflow loss classes. specifying one other than these 3, or making a custom class, may require additional code changes. (a sketch of how these specs map to tensorflow objects follows after the notes below)
  • regularizer: how to regularize the encoding so the values don't grow uncontrollably. also specifies an existing tensorflow class to use.
    • reg_factor: the regularization factor.

optional:

  • lr_scheme: scheme to apply to learning rate. the example above is to use exponential decay, see tf documentation.

if the loss is sigmoid_cross_entropy_with_logits, the normalized genotypes should be in range (0,1) - so normalized using e.g. genotypewise01

if the loss is mean_squared_error, the normalized genotypes can be in any range, can use smartPCAstyle normalization

normalization mode is specified in the data_opts file
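As referenced in the option list above, the pieces of the ex3 example could map to tensorflow objects roughly as follows (an illustrative sketch of the spec; the choice of Adam as the optimizer is an assumption, and GenoCAE's actual wiring may differ):

import tensorflow as tf

# How the ex3 train_opts pieces map to tensorflow objects (sketch):
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
regularizer = tf.keras.regularizers.l2(1.0e-07)            # reg_factor
lr_scheme = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=3.2e-02,                         # learning_rate
    decay_steps=100, decay_rate=0.98, staircase=False)     # lr_scheme args
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_scheme)  # optimizer assumed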

On saving model state

The training procedure saves the current state of the model in the directory weights/ in the training directory. Since these files can take up a significant amount of space, there are arguments to control how this is done:

  • If specified, the model state is saved every save_interval epochs.

  • If a validation set is used, whenever a minimum validation loss is encountered, the weights are saved, and the weights of the previously saved lowest validation epoch are deleted.

  • For both of the above: the argument start_saving_from specifies a number of epochs to train before starting to save model state.

  • When stopping training (either because the specified number of epochs has been reached, or because patience number of epochs have passed without reaching a new lowest validation loss), the current model state is saved.
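Taken together, the saving rules amount to bookkeeping along these lines (a minimal sketch for illustration, not GenoCAE's code):

def should_save(epoch, val_loss, best_val, save_interval, start_saving_from):
    """Illustration of the saving rules above. Returns (save_now, new_best_val)."""
    if epoch < start_saving_from:
        return False, best_val
    periodic = save_interval is not None and epoch % save_interval == 0
    new_best = val_loss is not None and (best_val is None or val_loss < best_val)
    if new_best:
        best_val = val_loss  # the caller would also delete the previous best checkpoint
    return periodic or new_best, best_val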

Examples

Training

Command to train a model on the example data :

$ cd GenoCAE/
$ python3 run_gcae.py train --datadir example_tiny/ --data HumanOrigins249_tiny --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4

This creates a model training directory: ae_out/ae.M1.ex3.b_0_4.HumanOrigins249_tiny/ with subdirectories

  • train/: tensorboard statistics for the train set
  • valid/: tensorboard statistics for the valid set
  • weights/: files containing saved model states

The following files are also created:

  • train_times.csv: time in seconds to train each epoch
  • losses_from_train_t.csv: loss function value on the training set per epoch
  • losses_from_train_v.csv: loss function value on the validation set per epoch

You can install tensorboard to use its suite of web tools for inspecting TensorFlow runs.

Tensorboard can be started using:

$ tensorboard --logdir ae_out/ae.M1.ex3.b_0_4.HumanOrigins249_tiny/

It will be served at localhost:6006 in the browser.

Projecting

The saved model weights in a model training directory are used to reload the model at each epoch, and project a given data set (= calculate the encoding / latent representation). The entire given data set is projected; no validation set is defined.

The data set to project is specified using the --pdata argument. If not specified, the same set as was used for training is assumed.

$ cd GenoCAE/
$ python3 run_gcae.py project --datadir example_tiny/ --data HumanOrigins249_tiny --model_id M1 --train_opts_id ex3  --data_opts_id b_0_4 --superpops example_tiny/HO_superpopulations

This creates a directory named after the projected data containing:

  1. a file encoded_data.h5 containing the projected data (= the encoded data) for all samples at each epoch. this file is used by the plot, evaluate and animate commands.
  2. for each saved epoch: a plot of the projected samples colored according to population, and if specified, superpopulation. a legend is written to a separate file.
  3. a plot and csv file of the loss function value of the model per epoch. note that this is the loss for the entire data set.
  4. a plot and csv file of the genotype concordance of the model per epoch (this is the rate at which the model output equals the model input; see the sketch after this list). the black line shows the baseline genotype concordance, given by guessing the most frequently occurring genotype per marker.
  5. a plot true_genotypes.pdf showing a histogram of the true (input) genotypes that the model is trained on
  6. a plot output_as_genotypes.pdf showing a histogram of the model output interpreted as genotypes, for the last epoch
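For reference, genotype concordance and its baseline can be sketched as follows (a minimal numpy illustration of the definitions above, assuming genotypes coded as integers 0/1/2 without missing values; not the project's code):

import numpy as np

def genotype_concordance(true_g, out_g):
    """Rate at which model output equals model input (sketch)."""
    return float(np.mean(true_g == out_g))

def baseline_concordance(true_g):
    """Baseline: guess the most frequent genotype per marker (sketch).
    Assumes true_g is (samples x markers) with genotypes coded 0/1/2."""
    per_marker = [np.bincount(col, minlength=3).max() / len(col)
                  for col in true_g.T]
    return float(np.mean(per_marker))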

When projecting/plotting/evaluating: the location to look for a trained model can either be specified with the same options given to the train command (model_id, data_opts_id, etc.) OR by giving the entire directory name with the --trainedmodelname argument. e.g.

$ cd GenoCAE/
$ python3 run_gcae.py project --datadir example_tiny/  --trainedmodelname ae.M1.ex3.b_0_4.HumanOrigins249_tiny --superpops example_tiny/HO_superpopulations

Plotting

The encoded data per epoch that is stored in the file encoded_data.h5 created by the project command is plotted (generating the same plots as the project command).

$ cd GenoCAE/
$ python3 run_gcae.py plot --datadir example_tiny/ --trainedmodelname ae.M1.ex3.b_0_4.HumanOrigins249_tiny --superpops example_tiny/HO_superpopulations

Animating

An animation visualizing the dimensionality reduction over the saved epochs stored in encoded_data.h5 is created.

$ cd GenoCAE/
$ python3 run_gcae.py animate --datadir example_tiny/ --trainedmodelname ae.M1.ex3.b_0_4.HumanOrigins249_tiny --superpops example_tiny/HO_superpopulations

Evaluating

The encoded data per epoch that is stored in the file encoded_data.h5 created by the project command is used to calculate various metrics. The metrics are passed using the --metrics option. Currently implemented metrics are hull_error and f1_score_k, both of which are a measure of how well the encoding clusters populations.

$ cd GenoCAE/
$ python3 run_gcae.py evaluate --metrics "hull_error,f1_score_3" --datadir example_tiny/ --trainedmodelname ae.M1.ex3.b_0_4.HumanOrigins249_tiny  --superpops example_tiny/HO_superpopulations

For each metric, a csv and pdf plot of the metric (averaged over all populations) per epoch is created.

For f1-scores, a csv file with the per-population f1-score is also created, one file per epoch.

metrics:

  • hull_error: for every population p: define the convex hull created by the points of samples of p. calculate the fraction that other populations' samples make up of all the points inside the hull. the hull error is the average of this over populations.

  • f1_score_k: define a k-NN model for population classification based on the dimensionality reduction. get the f1-score of the classification model, micro-averaged over populations.
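As an illustration of the f1_score_k idea, a minimal scikit-learn sketch (simplified in that it classifies the training points themselves; GenoCAE's exact procedure may differ):

from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier

def f1_score_k(coords, populations, k):
    """Sketch of f1_score_k: k-NN population classification on the encoding,
    micro-averaged f1 over populations."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(coords, populations)
    return f1_score(populations, knn.predict(coords), average="micro")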

Example results

Dimensionality reduction:

Dimensionality reduction

Legend:

Legend

Getting started

See tips for some rules of thumb for model architecture settings and hyperparameter tuning that might be a good place to start for using GCAE on your own data.

References

genocae's People

Contributors

kausmees, nanguoyu, cnettel


genocae's Issues

GCAE cannot run 'project' with a phenotype, how to fix?

Dear GenoCAE maintainers,

Thanks for GenoCAE and its Continuous Integration (GitHub Actions) script!

When I run GenoCAE with the added/experimental phenotype, I can now (thanks to #19) train the neural network. Great!

However, when I want to project the genotypes, I get error messages that are wrong and/or appear too early.

Training goes great, as confirmed by this example GitHub Actions log:

 python3 run_gcae.py train --datadir example_tiny --data issue_2_bin --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1

The last line of the output is also clear:

Done training. Wrote to /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_2_bin.p1

Note the .p1 addition to the folder name, which is not there when not working with a phenotype.

When I run the project option, which I copied from the doc, I get unexpected and/or premature error messages:

When I run on GHA like this:

python3 run_gcae.py project --datadir example_tiny --data issue_2_bin --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4 --superpops example_tiny/HO_superpopulations --pheno_model_id=p1

I get the error:

Invalid command. Run 'python run_gcae.py --help' for more information.

as if the --pheno_model_id=p1 is not supported yet.

Sure, I can delete that flag altogether, but then I get:

FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_2_bin/weights'

Note the absence of .p1 in the folder name.

The error I expect would be that the dataset used (issue_2_bin) would not work with the file specified with --superpops example_tiny/HO_superpopulations (although it might work by sheer luck).

How can I use project on a neural net that can also do a phenotype?

Could AasaJohanssonUU be added as a Collaborator?

Hi @kausmees,

Currently, when I create an Issue that my supervisor needs to be informed about, I cannot tag her, (i.e. use @AasaJohanssonUU), as Åsa is not a Collaborator.

Could Åsa be added as a Collaborator (AasaJohanssonUU) so I can tag her in Issues, allowing her to stay in the loop better? Would be great!

Suggest + volunteer: rename HumanOrigins249_tiny.snp to HumanOrigins249_tiny.bim

Dear GenoCAE maintainer,

Thanks so much for having example files and example code: I find those very useful!

I did find something unexpected: the file extension of HumanOrigins249_tiny.snp. This appears to be a PLINK .bim file, as it follows the same structure as described in the PLINK .bim file format doc:

Screenshot from 2021-06-29 11-44-56

I suggest renaming the file to what any PLINK user would expect for a .bim file, which is HumanOrigins249_tiny.bim

I volunteer to do so.

Suggest: shorter error message when --datadir is not found

Dear GCAE maintainer,

Here I try to convince you to give a shorter error message when --datadir is absent.

Thanks for the GCAE examples provided; these are very helpful!

When I run the example code of the first GCAE training example ...

python3 run_gcae.py train --datadir example_tiny/ --data HumanOrigins249_tiny --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4

I get a clear-but-long error message:

2021-06-28 13:48:01.293305: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 13:48:01.293338: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

The drawback is that this is too long of an error message for R to display (here I use the gcaer R package):

Screenshot from 2021-06-28 13-39-44

Also, one could argue that initializing Tensorflow and looking for CUDA should be done after checking if the CLI arguments are valid.

In that way, the error message would shorten to the lines below and I would be happy:

Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

An alternative would be to be able to suppress these Tensorflow warnings via a CLI argument.

What I suggest is one of these options:

  • Load Tensorflow after checking the CLI arguments (see the sketch after this list)
  • Add the ubiquitous --verbose argument and only show the Tensorflow things when it is enabled
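A minimal sketch of the first option, assuming run_gcae.py parses its usage text with docopt (which the help output suggests):

from docopt import docopt

if __name__ == "__main__":
    arguments = docopt(__doc__)  # exits with the usage text on invalid arguments

    # ... validate files/directories here ...

    # only now pay the tensorflow startup cost (and print its CUDA messages)
    import tensorflow as tf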

What do you think about this idea?

Conversion to PLINK format failed for .bed file

Dear GenoCAE maintainer,

Thanks for the conversion of the example files to PLINK format! I checked the .bim and .fam file and they match the PLINK doc (this time, I checked more carefully :-) ).

Sadly, this conversion resulted in files that cannot be read by PLINK (note that I ran into the same problems myself :-) ; I also found out that convertf is available as a .deb package on Ubuntu). I can let PLINK2 do something, but this does not result in PLINK-readable files either. Below are some notes, mostly reminders to self.

Would you try again?

  • If you have the data in a human-readable format, I could handcraft the PLINK text/non-binary files and let PLINK convert it to the binary version.
  • If you'd enjoy this, I could add a script and a test to the build, to confirm that the example data files can be read by PLINK, e.g. in a new folder called -for example- scripts

Cheers, Richel

PLINK v1.07

./plink --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --assoc --out ~/test --noweb
@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Skipping web check... [ --noweb ] 
Writing this text to log file [ /home/richel/test.log ]
Analysis started: Wed Jun 30 07:47:48 2021

Options in effect:
	--bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
	--assoc
	--out /home/richel/test
	--noweb

Reading map (extended format) from [ /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim ] 

ERROR: Problem reading BIM file, line 1

PLINK v1.9

./plink --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --assoc --out ~/test
PLINK v1.90b6.22 64-bit (16 Apr 2021)          www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/test.log.
Options in effect:
  --assoc
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --out /home/richel/test

7652 MB RAM detected; reserving 3826 MB for main workspace.

Error: Invalid chromosome code 'rs6515824' on line 1 of .bim file.
(Use --allow-extra-chr to force it to be accepted.)

PLINK v2.0

./plink2 --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --glm --out ~/test
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/test.log.
Options in effect:
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --glm
  --out /home/richel/test

Start time: Wed Jun 30 07:49:08 2021
7652 MiB RAM detected; reserving 3826 MiB for main workspace.
Using up to 8 compute threads.
249 samples (0 females, 0 males, 249 ambiguous; 249 founders) loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.fam.

Error: Invalid chromosome code 'rs6515824' on line 1 of .pvar file.
(Use --allow-extra-chr to force it to be accepted.)
End time: Wed Jun 30 07:49:08 2021

Following the suggestion results in:

./plink2 --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --glm --allow-extra-chr --out ~/test
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/test.log.
Options in effect:
  --allow-extra-chr
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --glm
  --out /home/richel/test

Start time: Wed Jun 30 07:49:35 2021
7652 MiB RAM detected; reserving 3826 MiB for main workspace.
Using up to 8 compute threads.
249 samples (0 females, 0 males, 249 ambiguous; 249 founders) loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.fam.
9259 variants loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim.
1 binary phenotype loaded (0 cases, 249 controls).
Calculating allele frequencies... done.
--glm: Skipping case/control phenotype 'PHENO1' since all samples are controls.
End time: Wed Jun 30 07:49:35 2021

Aha, so the .bim file can be read! Let's re-create it:

./plink2 --bfile ~/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny --allow-extra-chr --make-bpgen  --out ~/HumanOrigins249_tiny

Something is successfully created:

PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/richel/HumanOrigins249_tiny.log.
Options in effect:
  --allow-extra-chr
  --bfile /home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny
  --make-bpgen
  --out /home/richel/HumanOrigins249_tiny

Start time: Wed Jun 30 07:53:16 2021
7652 MiB RAM detected; reserving 3826 MiB for main workspace.
Using up to 8 compute threads.
249 samples (0 females, 0 males, 249 ambiguous; 249 founders) loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.fam.
9259 variants loaded from
/home/richel/GitHubs/GenoCAE/example_tiny/HumanOrigins249_tiny.bim.
1 binary phenotype loaded (0 cases, 249 controls).
Writing /home/richel/HumanOrigins249_tiny.fam ... done.
Writing /home/richel/HumanOrigins249_tiny.bim ... done.
Writing /home/richel/HumanOrigins249_tiny.pgen ... done.
End time: Wed Jun 30 07:53:16 2021

Sadly, in R, the files cannot be read.

Here is genio's response:

genio::read_bed(
  bed_filename,
  names_loci = bim_table$id,
  names_ind = fam_table$id
 )
Reading: /home/richel/.local/share/gcaer/gcae_v1_0/example_tiny//HumanOrigins249_tiny.bed
Error in read_bed_cpp(file, m_loci, n_ind) : 
  Row 1 padding was non-zero.  Either the specified number of individuals is incorrect or the input file is corrupt!

Here is ARTP2's response:

ARTP2::read.bed(bed = bed_filename, bim = bim_filename, fam = fam_filename)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got 'rs6515824'

GenoCAE build fails due to upstream update

Dear GenoCAE maintainers, hi Carl and Kristiina,

Thanks for GenoCAE and its tests using GitHub Actions, showing off how awesome it is!

However, something has happened upstream that causes the builds of all of my Python-dependent work to fail. Sadly, it happened to GenoCAE as well. As you are superior with Python, I hope you will help me/us :-)

Currently, the last GitHub Actions trigger of the repo passed, which was (as of today) 5 days ago. That seems great! However, today this build fails. I figured this out by simply forking this repo and triggering a rebuild. From the GitHub Actions log one can read:

Run python3 run_gcae.py --help
2022-02-07 14:01:16.923333: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-07 14:01:16.923388: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Traceback (most recent call last):
  File "run_gcae.py", line 32, in <module>
    import tensorflow as tf
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/python/eager/context.py", line 35, in <module>
    from tensorflow.python.client import pywrap_tf_session
  File "/home/runner/.local/lib/python3.8/site-packages/tensorflow/python/client/pywrap_tf_session.py", line 19, in <module>
    from tensorflow.python.client._pywrap_tf_session import *
ImportError: SystemError: <built-in method __contains__ of dict object at 0x7f9dbba31580> returned a result with an error set

The problem is obviously:

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

I have been trying all day to fix this, but I did not dare to meddle with requirements.txt. I will continue trying, yet I hope you will beat me to fixing it 😇

Suggest + volunteer: add GitHub Actions continuous integration

Continuous integration is the workflow in which, after every git push (among other events), the project is tested to 'still work'. Not only is this helpful for speeding up development, it also allows one to see if code from contributors (via a Pull Request) keeps the build intact.

I suggest to add a minimal GitHub Actions script that simply does the steps in the README.md.

I volunteer to write it and maintain it, as I have plenty of experience with that (e.g. plinkr, but there are dozens if not hundreds)

Good idea?

Suggest: improve CLI error messages

Dear GenoCAE maintainers,

I enjoy GenoCAE quite a bit and especially the examples are great!

What would make me like GenoCAE even better is to have clearer error messages from the CLI. I think redirecting the user to the help is great, but a clearer error message guiding the user to the next step would be even better.

Some examples:

Example 1

This is not something a user will blame you for; it is more of an opening to the next example.

python run_gcae.py train

I get:

2021-07-02 11:35:47.399470: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
tensorflow version 2.3.3
Invalid command. Run 'python run_gcae.py --help' for more information.

I expected something like:

`datadir` is missing. Please specify the data folder using `--datadir [data dir]`, e.g. `--datadir example_tiny/`

Example 2

This is what I had myself:

python run_gcae.py train --datadir example_tiny/ --data HumanOrigins249_tiny --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4

I got:


2021-07-02 11:35:25.100815: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
tensorflow version 2.3.3
Invalid command. Run 'python run_gcae.py --help' for more information.

I expected something like:

`epochs` is missing. Please specify the number of epochs using `--epochs [number]`, e.g. `--epochs 20`
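A minimal sketch of the kind of check I mean (hypothetical code, not from run_gcae.py; the option names are copied from the usage text):

# Required options for the train command, per the usage text:
REQUIRED_FOR_TRAIN = ["--datadir", "--data", "--model_id",
                      "--train_opts_id", "--data_opts_id", "--epochs"]

def check_train_args(arguments):
    """Print a targeted hint instead of the generic 'Invalid command' line."""
    for opt in REQUIRED_FOR_TRAIN:
        if not arguments.get(opt):
            raise SystemExit(f"`{opt.lstrip('-')}` is missing. "
                             f"Please specify it using `{opt} <value>`.")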

Docker building error due to python requirements

Hi @kausmees, I seem to be having some issues setting up the Docker container, which I think stem from installing the python requirements.

Reprex

git clone https://github.com/kausmees/GenoCAE.git
cd GenoCAE
docker build -t gcae/genocae:build -f docker/build.dockerfile .

Output

Here is the full output, but the main error comes at the very end.

Sending build context to Docker daemon  5.337MB
Step 1/14 : ARG CUDA_VERSION=11.1.1
Step 2/14 : ARG OS_VERSION=20.04
Step 3/14 : FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu${OS_VERSION}
 ---> 75f53d2b5da8
Step 4/14 : LABEL maintainer="Dong Wang"
 ---> Using cache
 ---> 05c68a023e26
Step 5/14 : ENV PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> 84aafea13cc7
Step 6/14 : ARG PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> d5d84110b3c8
Step 7/14 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Running in 79a5ae18bf31
Removing intermediate container 79a5ae18bf31
 ---> a9ec0c06e8ee
Step 8/14 : SHELL ["/bin/bash", "-c"]
 ---> Running in 07a8435b31cd
Removing intermediate container 07a8435b31cd
 ---> bb8387371d5a
Step 9/14 : RUN apt-get update && apt-get upgrade -y &&apt-get install -y wget python3-pip
 ---> Running in 07b6d6f53352
Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease [1581 B]
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:9 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.3 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1161 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2415 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1404 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.1 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [54.2 kB]
Get:16 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages [579 kB]
Get:17 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
Get:18 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1324 kB]
Get:19 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [881 kB]
Get:20 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1974 kB]
Fetched 23.3 MB in 2s (13.5 MB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Calculating upgrade...
The following packages have been kept back:
  libcudnn8 libcudnn8-dev libnccl-dev libnccl2
The following packages will be upgraded:
  apt ca-certificates dpkg dpkg-dev e2fsprogs libapt-pkg6.0 libc-bin
  libcom-err2 libdpkg-perl libext2fs2 libpcre3 libsepol1 libss2 libssl1.1
  libsystemd0 libudev1 linux-libc-dev login logsave openssl passwd
21 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
Need to get 10.6 MB of archives.
After this operation, 22.5 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 dpkg amd64 1.19.7ubuntu3.2 [1128 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 login amd64 1:4.8.1-1ubuntu5.20.04.2 [220 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libc-bin amd64 2.31-0ubuntu9.9 [633 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libsystemd0 amd64 245.4-4ubuntu3.17 [269 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libudev1 amd64 245.4-4ubuntu3.17 [76.5 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libapt-pkg6.0 amd64 2.0.9 [839 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 apt amd64 2.0.9 [1294 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 logsave amd64 1.45.5-2ubuntu1.1 [10.2 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libext2fs2 amd64 1.45.5-2ubuntu1.1 [183 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 e2fsprogs amd64 1.45.5-2ubuntu1.1 [527 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpcre3 amd64 2:8.39-12ubuntu0.1 [232 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libsepol1 amd64 3.0-1ubuntu0.1 [252 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 passwd amd64 1:4.8.1-1ubuntu5.20.04.2 [797 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libcom-err2 amd64 1.45.5-2ubuntu1.1 [9548 B]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libss2 amd64 1.45.5-2ubuntu1.1 [11.3 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libssl1.1 amd64 1.1.1f-1ubuntu2.15 [1321 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 openssl amd64 1.1.1f-1ubuntu2.15 [623 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 ca-certificates all 20211016~20.04.1 [144 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 dpkg-dev all 1.19.7ubuntu3.2 [679 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libdpkg-perl all 1.19.7ubuntu3.2 [231 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-libc-dev amd64 5.4.0-120.136 [1113 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 10.6 MB in 0s (55.0 MB/s)
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../dpkg_1.19.7ubuntu3.2_amd64.deb ...
Unpacking dpkg (1.19.7ubuntu3.2) over (1.19.7ubuntu3) ...
Setting up dpkg (1.19.7ubuntu3.2) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../login_1%3a4.8.1-1ubuntu5.20.04.2_amd64.deb ...
Unpacking login (1:4.8.1-1ubuntu5.20.04.2) over (1:4.8.1-1ubuntu5.20.04.1) ...
Setting up login (1:4.8.1-1ubuntu5.20.04.2) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libc-bin_2.31-0ubuntu9.9_amd64.deb ...
Unpacking libc-bin (2.31-0ubuntu9.9) over (2.31-0ubuntu9.7) ...
Setting up libc-bin (2.31-0ubuntu9.9) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libsystemd0_245.4-4ubuntu3.17_amd64.deb ...
Unpacking libsystemd0:amd64 (245.4-4ubuntu3.17) over (245.4-4ubuntu3.16) ...
Setting up libsystemd0:amd64 (245.4-4ubuntu3.17) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libudev1_245.4-4ubuntu3.17_amd64.deb ...
Unpacking libudev1:amd64 (245.4-4ubuntu3.17) over (245.4-4ubuntu3.16) ...
Setting up libudev1:amd64 (245.4-4ubuntu3.17) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../libapt-pkg6.0_2.0.9_amd64.deb ...
Unpacking libapt-pkg6.0:amd64 (2.0.9) over (2.0.6) ...
Setting up libapt-pkg6.0:amd64 (2.0.9) ...
(Reading database ... 12626 files and directories currently installed.)
Preparing to unpack .../archives/apt_2.0.9_amd64.deb ...
Unpacking apt (2.0.9) over (2.0.6) ...
Setting up apt (2.0.9) ...
Removing obsolete conffile /etc/kernel/postinst.d/apt-auto-removal ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../logsave_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking logsave (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../libext2fs2_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking libext2fs2:amd64 (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Setting up libext2fs2:amd64 (1.45.5-2ubuntu1.1) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../e2fsprogs_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking e2fsprogs (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../libpcre3_2%3a8.39-12ubuntu0.1_amd64.deb ...
Unpacking libpcre3:amd64 (2:8.39-12ubuntu0.1) over (2:8.39-12build1) ...
Setting up libpcre3:amd64 (2:8.39-12ubuntu0.1) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../libsepol1_3.0-1ubuntu0.1_amd64.deb ...
Unpacking libsepol1:amd64 (3.0-1ubuntu0.1) over (3.0-1) ...
Setting up libsepol1:amd64 (3.0-1ubuntu0.1) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../passwd_1%3a4.8.1-1ubuntu5.20.04.2_amd64.deb ...
Unpacking passwd (1:4.8.1-1ubuntu5.20.04.2) over (1:4.8.1-1ubuntu5.20.04.1) ...
Setting up passwd (1:4.8.1-1ubuntu5.20.04.2) ...
(Reading database ... 12625 files and directories currently installed.)
Preparing to unpack .../0-libcom-err2_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking libcom-err2:amd64 (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../1-libss2_1.45.5-2ubuntu1.1_amd64.deb ...
Unpacking libss2:amd64 (1.45.5-2ubuntu1.1) over (1.45.5-2ubuntu1) ...
Preparing to unpack .../2-libssl1.1_1.1.1f-1ubuntu2.15_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1f-1ubuntu2.15) over (1.1.1f-1ubuntu2.13) ...
Preparing to unpack .../3-openssl_1.1.1f-1ubuntu2.15_amd64.deb ...
Unpacking openssl (1.1.1f-1ubuntu2.15) over (1.1.1f-1ubuntu2.13) ...
Preparing to unpack .../4-ca-certificates_20211016~20.04.1_all.deb ...
Unpacking ca-certificates (20211016~20.04.1) over (20210119~20.04.2) ...
Preparing to unpack .../5-dpkg-dev_1.19.7ubuntu3.2_all.deb ...
Unpacking dpkg-dev (1.19.7ubuntu3.2) over (1.19.7ubuntu3) ...
Preparing to unpack .../6-libdpkg-perl_1.19.7ubuntu3.2_all.deb ...
Unpacking libdpkg-perl (1.19.7ubuntu3.2) over (1.19.7ubuntu3) ...
Preparing to unpack .../7-linux-libc-dev_5.4.0-120.136_amd64.deb ...
Unpacking linux-libc-dev:amd64 (5.4.0-120.136) over (5.4.0-113.127) ...
Setting up libssl1.1:amd64 (1.1.1f-1ubuntu2.15) ...
Setting up linux-libc-dev:amd64 (5.4.0-120.136) ...
Setting up libcom-err2:amd64 (1.45.5-2ubuntu1.1) ...
Setting up libss2:amd64 (1.45.5-2ubuntu1.1) ...
Setting up libdpkg-perl (1.19.7ubuntu3.2) ...
Setting up logsave (1.45.5-2ubuntu1.1) ...
Setting up openssl (1.1.1f-1ubuntu2.15) ...
Setting up e2fsprogs (1.45.5-2ubuntu1.1) ...
Setting up dpkg-dev (1.19.7ubuntu3.2) ...
Setting up ca-certificates (20211016~20.04.1) ...
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
7 added, 8 removed; done.
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for ca-certificates (20211016~20.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  file libexpat1 libexpat1-dev libmagic-mgc libmagic1 libmpdec2 libpsl5
  libpython3-dev libpython3-stdlib libpython3.8 libpython3.8-dev
  libpython3.8-minimal libpython3.8-stdlib mime-support publicsuffix
  python-pip-whl python3 python3-dev python3-distutils python3-lib2to3
  python3-minimal python3-pkg-resources python3-setuptools python3-wheel
  python3.8 python3.8-dev python3.8-minimal zlib1g-dev
Suggested packages:
  python3-doc python3-tk python3-venv python-setuptools-doc python3.8-venv
  python3.8-doc binfmt-support
The following NEW packages will be installed:
  file libexpat1 libexpat1-dev libmagic-mgc libmagic1 libmpdec2 libpsl5
  libpython3-dev libpython3-stdlib libpython3.8 libpython3.8-dev
  libpython3.8-minimal libpython3.8-stdlib mime-support publicsuffix
  python-pip-whl python3 python3-dev python3-distutils python3-lib2to3
  python3-minimal python3-pip python3-pkg-resources python3-setuptools
  python3-wheel python3.8 python3.8-dev python3.8-minimal wget zlib1g-dev
0 upgraded, 30 newly installed, 0 to remove and 4 not upgraded.
Need to get 14.9 MB of archives.
After this operation, 63.1 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8-minimal amd64 3.8.10-0ubuntu1~20.04.4 [717 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libexpat1 amd64 2.2.9-1ubuntu0.4 [74.4 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3.8-minimal amd64 3.8.10-0ubuntu1~20.04.4 [1899 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-minimal amd64 3.8.2-0ubuntu2 [23.6 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal/main amd64 mime-support all 3.64ubuntu1 [30.6 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal/main amd64 libmpdec2 amd64 2.4.2-3 [81.1 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8-stdlib amd64 3.8.10-0ubuntu1~20.04.4 [1675 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3.8 amd64 3.8.10-0ubuntu1~20.04.4 [387 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal/main amd64 libpython3-stdlib amd64 3.8.2-0ubuntu2 [7068 B]
Get:10 http://archive.ubuntu.com/ubuntu focal/main amd64 python3 amd64 3.8.2-0ubuntu2 [47.6 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/main amd64 libmagic-mgc amd64 1:5.38-4 [218 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/main amd64 libmagic1 amd64 1:5.38-4 [75.9 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/main amd64 file amd64 1:5.38-4 [23.3 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-pkg-resources all 45.2.0-1 [130 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal/main amd64 libpsl5 amd64 0.21.0-1ubuntu1 [51.5 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal/main amd64 publicsuffix all 20200303.0012-1 [111 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 wget amd64 1.20.3-1ubuntu2 [348 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libexpat1-dev amd64 2.2.9-1ubuntu0.4 [117 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8 amd64 3.8.10-0ubuntu1~20.04.4 [1625 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libpython3.8-dev amd64 3.8.10-0ubuntu1~20.04.4 [3952 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal/main amd64 libpython3-dev amd64 3.8.2-0ubuntu2 [7236 B]
Get:22 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 python-pip-whl all 20.0.2-5ubuntu1.6 [1805 kB]
Get:23 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 zlib1g-dev amd64 1:1.2.11.dfsg-2ubuntu1.3 [155 kB]
Get:24 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3.8-dev amd64 3.8.10-0ubuntu1~20.04.4 [514 kB]
Get:25 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3-lib2to3 all 3.8.10-0ubuntu1~20.04 [76.3 kB]
Get:26 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3-distutils all 3.8.10-0ubuntu1~20.04 [141 kB]
Get:27 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-dev amd64 3.8.2-0ubuntu2 [1212 B]
Get:28 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-setuptools all 45.2.0-1 [330 kB]
Get:29 http://archive.ubuntu.com/ubuntu focal/universe amd64 python3-wheel all 0.34.2-1 [23.8 kB]
Get:30 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3-pip all 20.0.2-5ubuntu1.6 [231 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 14.9 MB in 0s (81.4 MB/s)
Selecting previously unselected package libpython3.8-minimal:amd64.
(Reading database ... 12624 files and directories currently installed.)
Preparing to unpack .../libpython3.8-minimal_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libexpat1:amd64.
Preparing to unpack .../libexpat1_2.2.9-1ubuntu0.4_amd64.deb ...
Unpacking libexpat1:amd64 (2.2.9-1ubuntu0.4) ...
Selecting previously unselected package python3.8-minimal.
Preparing to unpack .../python3.8-minimal_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking python3.8-minimal (3.8.10-0ubuntu1~20.04.4) ...
Setting up libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up libexpat1:amd64 (2.2.9-1ubuntu0.4) ...
Setting up python3.8-minimal (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package python3-minimal.
(Reading database ... 12915 files and directories currently installed.)
Preparing to unpack .../0-python3-minimal_3.8.2-0ubuntu2_amd64.deb ...
Unpacking python3-minimal (3.8.2-0ubuntu2) ...
Selecting previously unselected package mime-support.
Preparing to unpack .../1-mime-support_3.64ubuntu1_all.deb ...
Unpacking mime-support (3.64ubuntu1) ...
Selecting previously unselected package libmpdec2:amd64.
Preparing to unpack .../2-libmpdec2_2.4.2-3_amd64.deb ...
Unpacking libmpdec2:amd64 (2.4.2-3) ...
Selecting previously unselected package libpython3.8-stdlib:amd64.
Preparing to unpack .../3-libpython3.8-stdlib_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8-stdlib:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package python3.8.
Preparing to unpack .../4-python3.8_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking python3.8 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libpython3-stdlib:amd64.
Preparing to unpack .../5-libpython3-stdlib_3.8.2-0ubuntu2_amd64.deb ...
Unpacking libpython3-stdlib:amd64 (3.8.2-0ubuntu2) ...
Setting up python3-minimal (3.8.2-0ubuntu2) ...
Selecting previously unselected package python3.
(Reading database ... 13317 files and directories currently installed.)
Preparing to unpack .../00-python3_3.8.2-0ubuntu2_amd64.deb ...
Unpacking python3 (3.8.2-0ubuntu2) ...
Selecting previously unselected package libmagic-mgc.
Preparing to unpack .../01-libmagic-mgc_1%3a5.38-4_amd64.deb ...
Unpacking libmagic-mgc (1:5.38-4) ...
Selecting previously unselected package libmagic1:amd64.
Preparing to unpack .../02-libmagic1_1%3a5.38-4_amd64.deb ...
Unpacking libmagic1:amd64 (1:5.38-4) ...
Selecting previously unselected package file.
Preparing to unpack .../03-file_1%3a5.38-4_amd64.deb ...
Unpacking file (1:5.38-4) ...
Selecting previously unselected package python3-pkg-resources.
Preparing to unpack .../04-python3-pkg-resources_45.2.0-1_all.deb ...
Unpacking python3-pkg-resources (45.2.0-1) ...
Selecting previously unselected package libpsl5:amd64.
Preparing to unpack .../05-libpsl5_0.21.0-1ubuntu1_amd64.deb ...
Unpacking libpsl5:amd64 (0.21.0-1ubuntu1) ...
Selecting previously unselected package publicsuffix.
Preparing to unpack .../06-publicsuffix_20200303.0012-1_all.deb ...
Unpacking publicsuffix (20200303.0012-1) ...
Selecting previously unselected package wget.
Preparing to unpack .../07-wget_1.20.3-1ubuntu2_amd64.deb ...
Unpacking wget (1.20.3-1ubuntu2) ...
Selecting previously unselected package libexpat1-dev:amd64.
Preparing to unpack .../08-libexpat1-dev_2.2.9-1ubuntu0.4_amd64.deb ...
Unpacking libexpat1-dev:amd64 (2.2.9-1ubuntu0.4) ...
Selecting previously unselected package libpython3.8:amd64.
Preparing to unpack .../09-libpython3.8_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libpython3.8-dev:amd64.
Preparing to unpack .../10-libpython3.8-dev_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking libpython3.8-dev:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package libpython3-dev:amd64.
Preparing to unpack .../11-libpython3-dev_3.8.2-0ubuntu2_amd64.deb ...
Unpacking libpython3-dev:amd64 (3.8.2-0ubuntu2) ...
Selecting previously unselected package python-pip-whl.
Preparing to unpack .../12-python-pip-whl_20.0.2-5ubuntu1.6_all.deb ...
Unpacking python-pip-whl (20.0.2-5ubuntu1.6) ...
Selecting previously unselected package zlib1g-dev:amd64.
Preparing to unpack .../13-zlib1g-dev_1%3a1.2.11.dfsg-2ubuntu1.3_amd64.deb ...
Unpacking zlib1g-dev:amd64 (1:1.2.11.dfsg-2ubuntu1.3) ...
Selecting previously unselected package python3.8-dev.
Preparing to unpack .../14-python3.8-dev_3.8.10-0ubuntu1~20.04.4_amd64.deb ...
Unpacking python3.8-dev (3.8.10-0ubuntu1~20.04.4) ...
Selecting previously unselected package python3-lib2to3.
Preparing to unpack .../15-python3-lib2to3_3.8.10-0ubuntu1~20.04_all.deb ...
Unpacking python3-lib2to3 (3.8.10-0ubuntu1~20.04) ...
Selecting previously unselected package python3-distutils.
Preparing to unpack .../16-python3-distutils_3.8.10-0ubuntu1~20.04_all.deb ...
Unpacking python3-distutils (3.8.10-0ubuntu1~20.04) ...
Selecting previously unselected package python3-dev.
Preparing to unpack .../17-python3-dev_3.8.2-0ubuntu2_amd64.deb ...
Unpacking python3-dev (3.8.2-0ubuntu2) ...
Selecting previously unselected package python3-setuptools.
Preparing to unpack .../18-python3-setuptools_45.2.0-1_all.deb ...
Unpacking python3-setuptools (45.2.0-1) ...
Selecting previously unselected package python3-wheel.
Preparing to unpack .../19-python3-wheel_0.34.2-1_all.deb ...
Unpacking python3-wheel (0.34.2-1) ...
Selecting previously unselected package python3-pip.
Preparing to unpack .../20-python3-pip_20.0.2-5ubuntu1.6_all.deb ...
Unpacking python3-pip (20.0.2-5ubuntu1.6) ...
Setting up libpsl5:amd64 (0.21.0-1ubuntu1) ...
Setting up mime-support (3.64ubuntu1) ...
Setting up wget (1.20.3-1ubuntu2) ...
Setting up libmagic-mgc (1:5.38-4) ...
Setting up libmagic1:amd64 (1:5.38-4) ...
Setting up file (1:5.38-4) ...
Setting up libexpat1-dev:amd64 (2.2.9-1ubuntu0.4) ...
Setting up zlib1g-dev:amd64 (1:1.2.11.dfsg-2ubuntu1.3) ...
Setting up python-pip-whl (20.0.2-5ubuntu1.6) ...
Setting up libmpdec2:amd64 (2.4.2-3) ...
Setting up libpython3.8-stdlib:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up python3.8 (3.8.10-0ubuntu1~20.04.4) ...
Setting up publicsuffix (20200303.0012-1) ...
Setting up libpython3-stdlib:amd64 (3.8.2-0ubuntu2) ...
Setting up python3 (3.8.2-0ubuntu2) ...
Setting up python3-wheel (0.34.2-1) ...
Setting up libpython3.8:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up python3-lib2to3 (3.8.10-0ubuntu1~20.04) ...
Setting up python3-pkg-resources (45.2.0-1) ...
Setting up python3-distutils (3.8.10-0ubuntu1~20.04) ...
Setting up python3-setuptools (45.2.0-1) ...
Setting up libpython3.8-dev:amd64 (3.8.10-0ubuntu1~20.04.4) ...
Setting up python3-pip (20.0.2-5ubuntu1.6) ...
Setting up python3.8-dev (3.8.10-0ubuntu1~20.04.4) ...
Setting up libpython3-dev:amd64 (3.8.2-0ubuntu2) ...
Setting up python3-dev (3.8.2-0ubuntu2) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Removing intermediate container 07b6d6f53352
 ---> bdad80779900
Step 10/14 : RUN python3 -m pip install --no-cache-dir --upgrade pip
 ---> Running in a7851101a995
Collecting pip
  Downloading pip-22.1.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-22.1.2
Removing intermediate container a7851101a995
 ---> fdd991e1849d
Step 11/14 : WORKDIR /workspace
 ---> Running in 69aa6a53c3e4
Removing intermediate container 69aa6a53c3e4
 ---> 325c1b83c37e
Step 12/14 : ADD ./requirements.txt /workspace
 ---> 1757a4388050
Step 13/14 : RUN python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt
 ---> Running in df9be31afa55
ERROR: Could not find a version that satisfies the requirement and (from versions: none)
ERROR: No matching distribution found for and
The command '/bin/bash -c python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt' returned a non-zero code: 1
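The error log above shows pip trying to install a package literally named and (the same failure appears in the other Docker reports further down this page), which points at the stray word and in the Dockerfile's RUN line rather than at any of the real requirements. A minimal sketch of the likely fix, assuming the rest of docker/build.dockerfile is left unchanged:

# drop the stray "and" so pip only receives the requirements file
RUN python3 -m pip install -r /workspace/requirements.txt && rm /workspace/requirements.txt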

Potential solutions

One way to avoid this might be to make use of conda environments with less restrictive version requirements. I've created a YAML file (attached below) which can be used to set up all the dependencies, though I haven't tried running GenoCAE with it yet.
Perhaps this could be used when setting up the Docker container, instead of the requirements.txt file? (PS: I only added the .txt suffix to allow it to be uploaded to GH Issues.)

env.yml.txt

conda env create -f env.yml.txt

Thanks! Really looking forward to using GenoCAE!

Best,
Brian
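For readers who want to try the conda route: a minimal sketch of what such an env.yml could contain, based only on the packages that pip resolves from requirements.txt elsewhere in these logs (the names and loose pins here are assumptions, not a tested file):

name: genocae
channels:
  - conda-forge
dependencies:
  - python>=3.8
  - pip
  - pip:
      - docopt
      - tensorflow
      - numpy
      - scipy
      - scikit-learn
      - matplotlib
      - seaborn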

Docker script does not build anymore?

Dear GenoCAE maintainer, hi Carl and Kristiina,

Thanks for GenoCAE as well as the Docker container script: It's great for running GenoCAE on a computer cluster :-)

This Issue is related to #26, which is probably also caused by an upstream update: the Dockerfile does not work anymore. The installation instructions at https://github.com/kausmees/GenoCAE#docker-installation are great! Running the suggested command, i.e. (note that I added sudo) ...

sudo docker build -t gcae/genocae:build -f docker/build.dockerfile .

... results in a failed build, with a full error log below.

I have been trying the whole day (for example, there are 6 failed attempts here), but could not fix this.

Does the Docker build work for you? Do you have an idea how to fix the Docker file?

A temporary workaround could be to upload an existing Docker container to Docker hub. Do you happen to have one? Would be awesome!

I hope it will be easy for you to help me solve this. I am not very experienced with Docker or Python, so I can imagine an easy fix being possible (on the other hand, the 6 Stack Overflow 'solutions' hint that the problem is there).

To reproduce, I have created a script to build the Docker container, together with a GitHub Actions script with an error log here.

I hope you can help me out here! Thanks and cheers, Richel

Full error log

Sending build context to Docker daemon  186.6MB
Step 1/15 : ARG CUDA_VERSION=11.1.1
Step 2/15 : ARG OS_VERSION=20.04
Step 3/15 : FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-devel-ubuntu${OS_VERSION}
 ---> 1189781af5ec
Step 4/15 : LABEL maintainer="Dong Wang"
 ---> Using cache
 ---> 9ae2635141d3
Step 5/15 : ENV PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> f907151c27bd
Step 6/15 : ARG PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> 76b31f23bd5e
Step 7/15 : SHELL ["/bin/bash", "-c"]
 ---> Using cache
 ---> e52cb6a4a70e
Step 8/15 : RUN apt-get update && apt-get upgrade -y &&     apt-get install -y wget
 ---> Using cache
 ---> f779a2c5021a
Step 9/15 : RUN wget     https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh     && mkdir /root/.conda     && bash Miniconda3-latest-Linux-x86_64.sh -b     && rm -f Miniconda3-latest-Linux-x86_64.sh
 ---> Using cache
 ---> e86837b5b18e
Step 10/15 : RUN pip3 install --upgrade pip
 ---> Running in 7a97102a4336
Requirement already satisfied: pip in /root/miniconda3/lib/python3.9/site-packages (21.1.3)
Collecting pip
  Downloading pip-22.0.3-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-22.0.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container 7a97102a4336
 ---> b48bafcfa623
Step 11/15 : RUN pip3 install --upgrade setuptools
 ---> Running in ea564922654f
Requirement already satisfied: setuptools in /root/miniconda3/lib/python3.9/site-packages (52.0.0.post20210125)
Collecting setuptools
  Downloading setuptools-60.8.1-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 13.7 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 52.0.0.post20210125
    Uninstalling setuptools-52.0.0.post20210125:
      Successfully uninstalled setuptools-52.0.0.post20210125
Successfully installed setuptools-60.8.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container ea564922654f
 ---> 874907ea03a5
Step 12/15 : WORKDIR /workspace
 ---> Running in ca79d4cb1457
Removing intermediate container ca79d4cb1457
 ---> a96a63e990ee
Step 13/15 : ADD ./requirements.txt /workspace
 ---> ad981a4056bd
Step 14/15 : RUN pip3 install -r /workspace/requirements.txt and &&	rm /workspace/requirements.txt
 ---> Running in 47c7e4f5b30c
Collecting and
  Downloading and-0.1.1-py3-none-any.whl (2.0 kB)
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting grpcio
  Downloading grpcio-1.43.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 24.1 MB/s eta 0:00:00
Collecting setuptools==47.1.1
  Downloading setuptools-47.1.1-py3-none-any.whl (583 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 583.2/583.2 KB 32.3 MB/s eta 0:00:00
Collecting tensorflow>=2.2.0
  Downloading tensorflow-2.8.0-cp39-cp39-manylinux2010_x86_64.whl (497.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 497.6/497.6 MB 4.1 MB/s eta 0:00:00
Collecting numpy==1.18.4
  Downloading numpy-1.18.4.zip (5.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 26.8 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting scikit-learn
  Downloading scikit_learn-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.4/26.4 MB 25.0 MB/s eta 0:00:00
Collecting matplotlib==3.2.1
  Downloading matplotlib-3.2.1.tar.gz (40.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.3/40.3 MB 20.9 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting seaborn
  Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.8/292.8 KB 26.8 MB/s eta 0:00:00
Collecting scipy==1.4.1
  Downloading scipy-1.4.1.tar.gz (24.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.6/24.6 MB 24.9 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [171 lines of output]
      setup.py:418: UserWarning: Unrecognized setuptools command ('dist_info --egg-base /tmp/pip-modern-metadata-jxp98lbc'), proceeding with generating Cython sources and expanding templates
        warnings.warn("Unrecognized setuptools command ('{}'), proceeding with "
      Running from scipy source directory.
      lapack_opt_info:
      lapack_mkl_info:
      customize UnixCCompiler
        libraries mkl_rt not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      openblas_lapack_info:
      customize UnixCCompiler
      customize UnixCCompiler
        libraries openblas not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      openblas_clapack_info:
      customize UnixCCompiler
      customize UnixCCompiler
        libraries openblas,lapack not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      flame_info:
      customize UnixCCompiler
        libraries flame not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      atlas_3_10_threads_info:
      Setting PTATLAS=ATLAS
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries tatlas,tatlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
        NOT AVAILABLE
      
      atlas_3_10_info:
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries satlas,satlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries satlas,satlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_3_10_info'>
        NOT AVAILABLE
      
      atlas_threads_info:
      Setting PTATLAS=ATLAS
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries ptf77blas,ptcblas,atlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_threads_info'>
        NOT AVAILABLE
      
      atlas_info:
      customize UnixCCompiler
        libraries lapack_atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /root/miniconda3/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/local/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/lib64
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/lib
      customize UnixCCompiler
        libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu
      customize UnixCCompiler
        libraries f77blas,cblas,atlas not found in /usr/lib/x86_64-linux-gnu
      <class 'numpy.distutils.system_info.atlas_info'>
        NOT AVAILABLE
      
      accelerate_info:
        NOT AVAILABLE
      
      lapack_info:
      customize UnixCCompiler
        libraries lapack not found in ['/root/miniconda3/lib', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
        NOT AVAILABLE
      
      /tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1712: UserWarning:
          Lapack (http://www.netlib.org/lapack/) libraries not found.
          Directories to search for the libraries can be specified in the
          numpy/distutils/site.cfg file (section [lapack]) or by setting
          the LAPACK environment variable.
        if getattr(self, '_calc_info_{}'.format(lapack))():
      lapack_src_info:
        NOT AVAILABLE
      
      /tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1712: UserWarning:
          Lapack (http://www.netlib.org/lapack/) sources not found.
          Directories to search for the sources can be specified in the
          numpy/distutils/site.cfg file (section [lapack_src]) or by setting
          the LAPACK_SRC environment variable.
        if getattr(self, '_calc_info_{}'.format(lapack))():
        NOT AVAILABLE
      
      Traceback (most recent call last):
        File "/root/miniconda3/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/root/miniconda3/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/root/miniconda3/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 173, in prepare_metadata_for_build_wheel
          self.run_setup()
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 266, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 157, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 540, in <module>
          setup_package()
        File "setup.py", line 536, in setup_package
          setup(**metadata)
        File "/tmp/pip-build-env-jf9lnjy9/overlay/lib/python3.9/site-packages/numpy/distutils/core.py", line 137, in setup
          config = configuration()
        File "setup.py", line 435, in configuration
          raise NotFoundError(msg)
      numpy.distutils.system_info.NotFoundError: No lapack/blas resources found.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Suggest + volunteer: text files are not executable

Dear GenoCAE maintainers,

Thanks for the example code and examples, I find these very useful!

What is unexpected, however, is that somehow the genetic input files in the folder example_tiny are set to be executables, as can be seen in the screenshot of my terminal (see below, green indicates an executable) and by the File Manager asking me to run a text file when I open it (see below, at the right-hand side):

[Screenshot from 2021-06-29 11-52-12: terminal listing and File Manager showing the example_tiny files marked as executable]

I guess a chmod +x was messed up somewhere :-)

I suggest removing the executable flag from these simple text files.

I volunteer to do so.
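For reference, removing the flag would be a one-liner from the GenoCAE/ directory (assuming every file in example_tiny should be a plain, non-executable file):

$ chmod -x example_tiny/*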

Suggest + volunteer: add CONTRIBUTING.md

From, for example, 'Best Practices for Maintainers', one can learn that it is a good idea to have guidelines on the rules for contributors, e.g. one of my own CONTRIBUTING.md documents. One of the many benefits of this is that it makes it easier to say 'no' to undesired features.

I suggest to add such a CONTRIBUTING.md document and volunteer to create a first sketch of one. Of course, the current maintainers are boss, so I do not expect the rules I put in to become the actual rules :-)

Good idea?

Idea: use Swish instead of ELU?

GCAE uses the exponential linear unit ('ELU') as an activation function. In [1] it is claimed that 'the Swish activation function would be better in all cases [over ELU]'.

I am unsure if you think it would be worth it to try out Swish? The improvements in accuracy shown in [1] are only minor.

  • [1] Ramachandran, Prajit, Barret Zoph, and Quoc V. Le. "Searching for activation functions." arXiv preprint arXiv:1710.05941 (2017). https://arxiv.org/abs/1710.05941
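If it helps: Keras accepts 'swish' as an activation string in recent TensorFlow versions (2.2 and later, I believe), so trying this should only require changing the 'activation' entries in the model definition. A quick standalone check, using the same layer arguments that appear in the model-building logs on this page (this is not GenoCAE code):

# check that Keras resolves the 'swish' activation string
import tensorflow as tf

layer = tf.keras.layers.Conv1D(filters=8, kernel_size=5, padding="same", activation="swish")
print(layer.activation)  # <function swish at 0x...>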

Suggest + volunteer: rename HumanOrigins249_tiny.eigenstratgeno to HumanOrigins249_tiny.bed

Dear GenoCAE maintainer,

Thanks so much for having example files and example code: I find those very useful!

I did find something unexpected: the file extension of HumanOrigins249_tiny.eigenstratgeno. This appears to be a PLINK .bed file, as it follows the same structure as described in the PLINK .bed file format doc. Also, genio (an R package to read PLINK files) cannot read .bed files if they do not have that extension.

I suggest renaming the file to what any PLINK user would expect for a .bed file, which is HumanOrigins249_tiny.bed

I volunteer to do so.

evaluate with superpops: how is the average calculated?

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

Now that you are back, here is something I found unexpected (discussed from my point of view). If you also did not expect this, I'd happily create a minimal reproducible example.

When using evaluate with a superpops file, in one of my cases I got the following:

Population    num samples   f1_score_3   f1_score_5
C             333           0.0000       0.0000
B             334           0.2431       0.0000
A             333           0.4400       0.4996
avg (micro)   1000          0.3100       0.3330

The unexpected part is the last line, which is labelled as an average but appears to do different things per column (and I understand that the first column (num samples) uses a sum there :-) ).

I would expect the averages to be:

Population    num samples   f1_score_3   f1_score_5
C             333           0.0000       0.0000
B             334           0.2431       0.0000
A             333           0.4400       0.4996
avg (micro)   333           0.2277       0.1665

I checked: these 'averages' are neither the harmonic nor the geometric mean.

What are those values?

If you think these are weird as well, I will happily create a reproducible example. Else, I am happy to learn what these values are :-)
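If I had to guess: 'avg (micro)' is a micro-average, i.e. the F1 score computed from the global true/false positive counts over all samples pooled together, rather than the mean of the per-population scores, which would explain why the last line matches none of the means above. The difference is easy to see with scikit-learn (toy labels, not GenoCAE output):

from sklearn.metrics import f1_score

# toy ground-truth and predicted labels for three populations 0, 1, 2
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 0, 1]

print(f1_score(y_true, y_pred, average="macro"))  # mean of the per-class F1 scores
print(f1_score(y_true, y_pred, average="micro"))  # F1 from pooled TP/FP/FN counts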

`2022-06-23 09:25:59.298713: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 19113388800 exceeds 10% of free system memory.`

I'm currently trying to train GenoCAE on a dataset of 15 million SNPs across 67 individuals, but seem to be running into memory issues, despite the fact that I'm using an AMD Threadripper workstation with 252 GB of memory and 64 cores (128 threads).

I suspect this may be due to the large number of SNPs I'm including, since the example data (which runs fine) only contains 9,259 SNPs, and in the original paper 161k were used:

2,067 individuals typed at 160,858 SNPs

From my limited experience with these models, the number of input features drastically affects memory usage (much more so than sample size). So I think my first step will be to filter the number of variants I'm training the model on, based on some of the guidelines provided in the paper (a rough PLINK sketch follows the list):

  1. Remove sex chromosomes
  2. Set missing genotypes "to the most frequent value per SNP so as to avoid their influence over dimensionality reduction results".
  3. Remove SNPS with MAF <1%.
  4. Perform LD pruning by "removing one of each pair of SNPs in windows of 1.0 centimorgan that had an allelic R2 value greater than 0.2." Though eventually I'd like to find a way to avoid this last step because I'm interested in identifying causal variants.
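For steps 1, 3 and 4, a rough sketch using PLINK 1.9 (mydata is a hypothetical fileset name, and the --indep-pairwise window/step values are common defaults, not the paper's exact 1.0 cM / R2 > 0.2 settings); step 2 appears to be handled by GenoCAE itself when impute_missing is set in the data options:

$ plink --bfile mydata --autosome --maf 0.01 --indep-pairwise 50 5 0.2 --out mydata_prune
$ plink --bfile mydata --autosome --maf 0.01 --extract mydata_prune.prune.in --make-bed --out mydata_filtered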

Extracting feature importance

Hello!

How would I go about extracting feature importance scores from a trained GenoCAE model? Some methods I'd like to try out:

  1. Simply extract feature weights
  2. Compute SHAP scores (or something akin to it)

I think part of my issue is not being sure how to reconstruct the model from the GenoCAE outputs: I see the weights are all stored in the weights subfolder, but I'm not sure how to import them (a rough sketch of option 2 follows below).

As a side note, I'm more familiar with the format where the entire model (architecture, weights) is saved as one .h5 file. Is there a way to save GenoCAE models in this way?
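A hypothetical sketch of option 2, using the shap package's GradientExplainer on a stand-in Keras encoder (the real model would have to be rebuilt with GenoCAE's own model-building code before reading the weights subfolder; nothing below is GenoCAE code):

import numpy as np
import shap
import tensorflow as tf

# stand-in encoder: any tf.keras.Model mapping markers to the 2-D latent space
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="elu", input_shape=(100,)),
    tf.keras.layers.Dense(2, name="encoded"),
])
X = np.random.rand(50, 100).astype("float32")  # stand-in for normalized genotypes

explainer = shap.GradientExplainer(encoder, X)  # background distribution = training data
shap_values = explainer.shap_values(X[:5])      # per-marker attributions for 5 samples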

Thanks so much!
Brian

Suggest: cite paper

Hi @kausmees,

GCAE seems awesome to me! What I feel is missing is a reference to the paper at bioRxiv. I suggest adding it as a reference, something like I do below. Sure, I volunteer to do so myself via a Pull Request :-)

References

  • [1] Ausmees, Kristiina, and Carl Nettelblad. "A deep learning framework for characterization of genotype data." bioRxiv (2020). here

Request: add a toy model setup

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

Thanks for GenoCAE and the experimental Pheno branch!

What I would enjoy is a toy Mx model (e.g. M0) and a toy px model (e.g. p0) that would be the smallest neural networks possible, respecting the dimensions of the input and output (or: 'they just work', although their predictions will be bad).

I have tried modifying the /models/M1.json and /models/p2.json files (the latter only available on the Pheno branch), but I feel this will take you seconds to create.

I would enjoy this as it would speed up my GitHub Actions test suite: right now training alone takes 150 seconds, whereas I (usually) only check that it creates some files, not that the output is useful (for useful output I would use the regular models).

Would it be easy to add a toy model Mx (e.g. models/M0.json) and a toy model px (e.g. models/p0.json)?

If I underestimate how hard this is, just let me know, and I will try harder :-)

Thanks and cheers, Richel

Suggest: release a version

Dear GenoCAE maintainers,

I quote from the GitHub docs:

Releases are deployable software iterations you can package and make available for a wider audience to download and use.

I would love to have a named release version, e.g. v1.0 (as that is the version shown in the help), that I can use (over the commit hash) for the gcaer R package I am writing to install this cool tool.

It's easy to make one: there.

Sure, I volunteer to do this, but for that I need more access rights than you may want to give (which I'd understand :-) ).

Suggest: allow to work with GenoCAE without changing the working directory

Dear GenoCAE maintainer,

Here I suggest allowing a user to run GCAE from any folder, instead of forcing him/her to work from the GenoCAE folder.

When running the 'training' example code from the GenoCAE folder, the training works great:

Here I run the command:

richel@N141CU:~/.local/share/gcaer/gcae_v1_0$ /home/richel/.local/share/r-miniconda/envs/r-reticulate/bin/python \
  ~/.local/share/gcaer/gcae_v1_0/run_gcae.py train --datadir ~/.local/share/gcaer/gcae_v1_0/example_tiny/ \
  --data HumanOrigins249_tiny --model_id M1 --epochs 20 --save_interval 2 --train_opts_id ex3 --data_opts_id b_0_4

Here is part of the result:

2021-06-28 14:50:13.776150: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 14:50:13.776180: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3

______________________________ arguments ______________________________
train : True
datadir : /home/richel/.local/share/gcaer/gcae_v1_0/example_tiny/
data : HumanOrigins249_tiny
model_id : M1
...

However, when I work from another folder, say, one folder up ...

richel@N141CU:~/.local/share/gcaer$ /home/richel/.local/share/r-miniconda/envs/r-reticulate/bin/python \
  ~/.local/share/gcaer/gcae_v1_0/run_gcae.py train --datadir ~/.local/share/gcaer/gcae_v1_0/example_tiny/  \
  --data HumanOrigins249_tiny --model_id M1 --epochs 20 --save_interval 2 --train_opts_id ex3 --data_opts_id b_0_4

I get an error message that "data_opts/" + data_opts_id + ".json" cannot be found, here in the code:

2021-06-28 14:50:53.728916: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-28 14:50:53.728947: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Traceback (most recent call last):
  File "/home/richel/.local/share/gcaer/gcae_v1_0/run_gcae.py", line 396, in <module>
    with open("data_opts/" + data_opts_id+".json") as data_opts_def_file:
FileNotFoundError: [Errno 2] No such file or directory: 'data_opts/b_0_4.json'

The problem here is the hardcoded "data_opts/" part, which forces me to work in the same folder as GenoCAE. It feels clumsy, as I have to change the working directory when calling GenoCAE. Note that, looking at the code, the same applies to train_opts and models.

I would enjoy a way to either (my favorites are first :-) ):

  • (1) being able to set the path where the JSON file is expected to be, e.g. via a --data_opts_folder CLI argument; the code becomes data_opts_folder + "data_opts/" + data_opts_id + ".json", or
  • (2) specifying the full path to the JSON file instead, --data_opts myfolder/b_0_4.json; the code becomes data_opts (which is now a filename), or
  • (3) expecting the JSON files to be in the folder specified by --datadir (data_dir + "/" + data_opts_id + ".json").

Would one of these options be doable?
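For completeness, a fourth option that would need no new CLI arguments: resolving the hardcoded paths relative to run_gcae.py itself instead of the working directory. A minimal sketch of the line quoted in the traceback above (GCAE_DIR is my name for it, not in the current code):

import json
import os

# resolve bundled config files relative to run_gcae.py, not the working directory
GCAE_DIR = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(GCAE_DIR, "data_opts", data_opts_id + ".json")) as data_opts_def_file:
    data_opts = json.load(data_opts_def_file)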

Suggest: give an error message when an invalid CLI argument is given

When I run GCAE from the command line with a nonsense argument, e.g. nonsense:

python run_gcae.py nonsense

I get to see the help file:

2021-06-24 06:34:30.118604: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-06-24 06:34:30.118629: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
tensorflow version 2.3.3
Usage:
  run_gcae.py train --datadir=<name> --data=<name> --model_id=<name> --train_opts_id=<name> --data_opts_id=<name> --save_interval=<num> --epochs=<num> [--resume_from=<num> --trainedmodeldir=<name> ]
  run_gcae.py project --datadir=<name>   [ --data=<name> --model_id=<name>  --train_opts_id=<name> --data_opts_id=<name> --superpops=<name> --epoch=<num> --trainedmodeldir=<name>   --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py plot --datadir=<name> [  --data=<name>  --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name>  --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py animate --datadir=<name>   [ --data=<name>   --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name> --pdata=<name> --trainedmodelname=<name>]
  run_gcae.py evaluate --datadir=<name> --metrics=<name>  [  --data=<name>  --model_id=<name> --train_opts_id=<name> --data_opts_id=<name>  --superpops=<name> --epoch=<num> --trainedmodeldir=<name>  --pdata=<name> --trainedmodelname=<name>]

I think it is already friendly to show the help, yet I would not expect the output to be exactly the same as when doing python run_gcae.py --help. I suggest adding an error message (and a nonzero error code) if an invalid CLI argument is given.
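This behaviour presumably comes from docopt, which requirements.txt lists and which prints the usage string and exits when it cannot parse the arguments. A hedged sketch of how an explicit message could be added around the parsing call (assuming run_gcae.py calls docopt(__doc__)):

from docopt import docopt, DocoptExit

try:
    arguments = docopt(__doc__)
except DocoptExit as e:
    # print an explicit error before docopt's usage text, and exit nonzero
    print("Error: invalid command-line arguments.\n")
    print(e)
    raise SystemExit(2)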

Projecting non-population metadata

I'm trying to project sample metadata onto a trained GenoCAE model.

I'm wondering if this is a formatting issue with my --superpops input file (e.g. I have >2 columns, and some rows have NAs for missing metadata), or if GenoCAE is hard-coded to only accept superpopulation-type metadata.

I'm attaching here an example of my metadata.
merged.metadata.csv

Command

python3 run_gcae.py project --datadir /shared/bms20/projects/MND_ALS/SNP_VCFs/merged/ --data merged.SNPS_filtered.plink --trainedmodeldir als_out --model_id M1 --train_opts_id ex3  --data_opts_id b_0_4 --superpops /home/bms20/projects/MND_ALS/SNP_VCFs/merged/merged.metadata.csv

Output

...
...
Projecting epochs: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Already projected: []
In DG.get_train_set: number of -1.0 genotypes in train: 5689140
In DG.get_train_set: number of -9 genotypes in train: 0
In DG.get_train_set: number of 0 values in train mask: 0
2022-06-23 12:39:48.063862: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 621496}
Adding layer: Reshape: {'target_shape': (77687, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (77687, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (155374, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}
########################### epoch 2 ###########################
Reading weights from /shared/bms20/projects/GenoCAE/als_out/ae.M1.ex3.b_0_4.merged.SNPS_filtered.plink/weights/2
Traceback (most recent call last):
  File "/shared/bms20/projects/GenoCAE/run_gcae.py", line 1011, in <module>
    plot_coords_by_superpop(coords_by_pop,"{0}/dimred_e_{1}_by_superpop".format(results_directory, epoch), superpopulations_file, plot_legend = epoch == epochs[0])
  File "/shared/bms20/projects/GenoCAE/utils/visualization.py", line 203, in plot_coords_by_superpop
    max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])
ValueError: max() arg is an empty sequence
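If it helps: the empty-sequence error suggests that none of the populations in the projected data matched a population listed in the metadata file, so the superpops collection ends up empty. As far as I can tell, the --superpops file is expected to look like the bundled example_tiny/HO_superpopulations: one population per line mapped to a superpopulation, with no header and no extra columns. A hypothetical file in that shape (values are made up):

Yoruba,Africa
Mbuti,Africa
French,Europe

Extra columns or NA rows would then plausibly break the superpop_dict lookup in utils/visualization.py.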

GCAE gives error when running 'project' with a phenotype

Dear GenoCAE maintainers,

Thanks for GenoCAE and its Continuous Integration (GitHub Actions) script!

When I run GenoCAE with the added/experimental phenotype, I can now (thanks to #19) train the neural network. Great!

However, when I want to project the genotypes, I can get it to run (after fixing #21), but in the end it fails.

Training goes great, as confirmed by this example GitHub Actions log:

python3 run_gcae.py train --datadir example_tiny --data issue_6_bin --model_id M1  --epochs 3 --save_interval 1  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1

The last line of the output is also clear:

Done training. Wrote to /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_6_bin.p1

When I start using the project option (after fixing #21), it starts running, but fails in the visualization:

When I run on GHA like this:

python3 run_gcae.py project --datadir example_tiny --data issue_6_bin --model_id M1 --train_opts_id ex3 --data_opts_id b_0_4 --superpops example_tiny/HO_superpopulations --pheno_model_id=p1

It starts doing the projection until the visualization, then gives the following error (as copied from the GHA log; the full error message is at the bottom of this Issue):

Traceback (most recent call last):
  File "run_gcae.py", line 1616, in <module>
    main()
  File "run_gcae.py", line 1365, in main
    plot_coords_by_superpop(coords_by_pop,"{0}/dimred_e_{1}_by_superpop".format(results_directory, epoch), superpopulations_file, plot_legend = epoch == epochs[0])
  File "/home/runner/work/GenoCAE/GenoCAE/utils/visualization.py", line 222, in plot_coords_by_superpop
    max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])
ValueError: max() arg is an empty sequence
Error: Process completed with exit code 1.

I expected the projection to work the same with or without the phenotype, as it gives a useful visualization of the dimensionality reduction, as shown in the Ausmees & Nettelblad paper [1].

How do I get this to work?

Thanks and cheers, Richel

References

  • [1] Ausmees, Kristiina, and Carl Nettelblad. "A deep learning framework for characterization of genotype data." bioRxiv (2020).

Full error message

Copied from a GHA log:

2021-12-07 12:55:17.080095: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-07 12:55:17.080130: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-12-07 12:55:19.686377: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-12-07 12:55:19.686415: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-12-07 12:55:19.686436: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fv-az74-543): /proc/driver/nvidia/version does not exist
2021-12-07 12:55:19.686793: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tensorflow version 2.7.0


______________________________ arguments ______________________________
train : False
datadir : example_tiny
data : issue_6_bin
model_id : M1
train_opts_id : ex3
data_opts_id : b_0_4
save_interval : None
epochs : None
resume_from : None
trainedmodeldir : None
pheno_model_id : p1
project : True
superpops : example_tiny/HO_superpopulations
epoch : None
pdata : None
trainedmodelname : None
plot : False
animate : False
evaluate : False
metrics : None

______________________________ data opts ______________________________
sparsifies : [0.0, 0.1, 0.2, 0.3, 0.4]
norm_opts : {'flip': False, 'missing_val': -1.0}
norm_mode : genotypewise01
impute_missing : True
validation_split : 0.2

______________________________ train opts ______________________________
learning_rate : 0.00032
batch_size : 10
noise_std : 0.0032
n_samples : -1
loss : {'module': 'tf.keras.losses', 'class': 'CategoricalCrossentropy', 'args': {'from_logits': False}}
regularizer : {'reg_factor': 1e-07, 'module': 'tf.keras.regularizers', 'class': 'l2'}
lr_scheme : {'module': 'tf.keras.optimizers.schedules', 'class': 'ExponentialDecay', 'args': {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}}
______________________________
Imputing originally missing genotypes to most common value.
Reading ind pop list from /home/runner/work/GenoCAE/GenoCAE/example_tiny/issue_6_bin.fam
Reading ind pop list from /home/runner/work/GenoCAE/GenoCAE/example_tiny/issue_6_bin.fam
Mapping files:   0%|          | 0/3 [00:00<?, ?it/s]
Mapping files: 100%|██████████| 3/3 [00:00<00:00, 227.43it/s]array([[ 0.10205683, -0.50682646, -0.7242572 , -0.41514382],
       [ 0.08256383, -0.4811394 , -0.66083604, -0.38157633],
       [ 0.04545861, -0.38441584, -0.52315164, -0.36843315],
       [ 0.10497452, -0.50911087, -0.73314375, -0.42168617],
       [ 0.06072002, -0.40441108, -0.57263076, -0.39334926]],
      dtype=float32)
[[0.5 0.5 0 0.5]
 [0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5]
 [0.5 0.5 0 0.5]
 [0.5 0.5 0.5 0.5]] array([[1. , 1. , 0.5, 1. ],
       [1. , 0.5, 1. , 0.5],
       [1. , 0. , 0.5, 1. ],
       [0.5, 0.5, 0.5, 0.5],
       [1. , 0.5, 0. , 0.5]])

Encoded data file not found: /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_6_bin.p1/issue_6_bin/encoded_data.h5 
Projecting epochs: [1, 2, 3]
Already projected: []
In DG.get_train_set: number of -1.0 genotypes in train: 0
In DG.get_train_set: number of -9 genotypes in train: 0
In DG.get_train_set: number of 0 values in train mask: 0

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 16}
Adding layer: Reshape: {'target_shape': (2, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (2, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (4, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}

______________________________ Building model ______________________________
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 1}
No marker specific variable.
########################### epoch 1 ###########################
Reading weights from /home/runner/work/GenoCAE/GenoCAE/ae_out/ae.M1.ex3.b_0_4.issue_6_bin.p1/weights/1
tf.Tensor(
[0.23319791 0.19988029 0.19377704 0.23756738 0.21141517 0.19966617
 0.18867628 0.22778517 0.22095694 0.23621069 0.24232931 0.23751874
 0.21052796 0.2478469  0.22860327 0.24310602 0.2248713  0.22607517
 0.21316327 0.24836378 0.24003232 0.2017554  0.2420473  0.25501102
 0.236629   0.22140262 0.20744076 0.20671275 0.22881663 0.19617875], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.2380066  0.19335003 0.23319791 0.23863165 0.23831964 0.18697073
 0.23600692 0.21887155 0.1588867  0.1949499  0.21993566 0.25195104
 0.18325783 0.24391052 0.18994804 0.23802298 0.20401272 0.22448331
 0.2229189  0.21869858 0.23501374 0.22200371 0.23621069 0.20525226
 0.2003792  0.24328303 0.24873887 0.23385888 0.24009213 0.2101993 ], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.21908474 0.22722876 0.2339882  0.23049714 0.20231368 0.2011057
 0.21599561 0.17069077 0.23896992 0.2420473  0.24056283 0.20440042
 0.24473031 0.23399888 0.2080764  0.21408066 0.23621069 0.20678237
 0.20441745 0.19976138 0.200915   0.22420047 0.21946532 0.2136519
 0.23442683 0.2120275  0.23950182 0.21574992 0.23319791 0.2304342 ], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.24093255 0.23797578 0.23640822 0.19804403 0.21179074 0.24339268
 0.20556012 0.24795108 0.22094505 0.25103822 0.2339133  0.18515958
 0.23047343 0.23206557 0.20824197 0.23773867 0.22685748 0.18689398
 0.21542913 0.23442683 0.24944112 0.24592474 0.22365497 0.22963423
 0.19812118 0.24454156 0.23143443 0.21166426 0.21157375 0.2214748 ], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.2144528  0.23790193 0.20847306 0.17789963 0.22853369 0.22519661
 0.22575805 0.23663682 0.2309236  0.21082726 0.19669218 0.1876471
 0.18697073 0.22914475 0.20111184 0.2027495  0.22810453 0.24159692
 0.24206525 0.19896871 0.22794227 0.21941704 0.21471435 0.19822605
 0.20103996 0.23831964 0.18830639 0.20552994 0.23621069 0.23235762], shape=(30,), dtype=float32)
(30,)
tf.Tensor(
[0.24529332 0.2289039  0.23621069 0.19376357 0.23415062 0.22575805
 0.21179074 0.21793306 0.23040852 0.21893027 0.24770258 0.19905189
 0.21635695 0.2532666  0.24553452 0.1958462 ], shape=(16,), dtype=float32)
(16,)
Traceback (most recent call last):
  File "run_gcae.py", line 1616, in <module>
    main()
  File "run_gcae.py", line 1365, in main
    plot_coords_by_superpop(coords_by_pop,"{0}/dimred_e_{1}_by_superpop".format(results_directory, epoch), superpopulations_file, plot_legend = epoch == epochs[0])
  File "/home/runner/work/GenoCAE/GenoCAE/utils/visualization.py", line 222, in plot_coords_by_superpop
    max_num_pops = max([len(superpop_dict[spop]) for spop in superpops])
ValueError: max() arg is an empty sequence
Error: Process completed with exit code 1.

Error while building docker container

The error occurred at the end of the building procedure.
I think the version format in the requirements.txt file is the problem.

 => ERROR [6/6] RUN python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt   1.2s
------
 > [6/6] RUN python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt:
#8 1.023 ERROR: Could not find a version that satisfies the requirement and (from versions: none)
#8 1.024 ERROR: No matching distribution found for and
------
executor failed running [/bin/bash -c python3 -m pip install -r /workspace/requirements.txt and &&rm /workspace/requirements.txt]: exit code: 1

What is the cause for 'ValueError: Dimensions must be equal'?

Dear GenoCAE maintainers, hi @kausmees and @cnettel,

When I run the GenoCAE experimental Pheno branch, I get an error that I have no idea what to do with. Below is the reprex.

Currently, the GitHub Actions script runs GenoCAE with the --help flag, showing the help successfully.

On my fork of GenoCAE in the GitHub Actions 'check.yaml' script, I added the following command to run:

python3 run_gcae.py train --datadir example_tiny --data issue_2_bin --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1

The --data issue_2_bin refers to the data files I supplied to Carl at this Issue; they are already put in the example_tiny folder of my 'GenoCAE' fork.

GitHub Actions gives the following error:

ValueError: in user code:

    File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 424, in run_optimization  *
        loss_value += tf.math.reduce_sum(((-y_pred) * y_true)) * 1e-6

    ValueError: Dimensions must be equal, but are 2 and 4 for '{{node mul_21}} = Mul[T=DT_FLOAT](Neg_2, one_hot_2)' with input shapes: [2,4], [2,4,3].
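If I read the shapes correctly, y_pred here is rank 2 ([batch, markers]) while y_true is a rank-3 one-hot tensor ([batch, markers, 3 genotype classes]), so the elementwise product cannot broadcast. A minimal standalone reproduction of the same clash (shapes assumed from the message above; the exact error text differs outside a @tf.function):

import tensorflow as tf

y_pred = tf.zeros([2, 4])                                 # [batch, markers]
y_true = tf.one_hot(tf.zeros([2, 4], dtype=tf.int32), 3)  # [batch, markers, 3]
loss = tf.math.reduce_sum((-y_pred) * y_true)             # raises: incompatible shapes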

Below is the full error log, which can also be found in this GitHub Actions log.

What does the error mean?

Thanks and cheers, Richel

Full error log

richel@N141CU:~/GitHubs/GenoCAE$ python3 run_gcae.py train --datadir example_tiny --data issue_2_bin --model_id M1  --epochs 20 --save_interval 2  --train_opts_id ex3  --data_opts_id b_0_4 --pheno_model_id=p1
2021-11-30 13:41:10.460286: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-30 13:41:10.460312: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-30 13:41:13.244831: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-30 13:41:13.244903: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (N141CU): /proc/driver/nvidia/version does not exist
2021-11-30 13:41:13.245186: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tensorflow version 2.7.0

______________________________ arguments ______________________________
train : True
datadir : example_tiny
data : issue_2_bin
model_id : M1
train_opts_id : ex3
data_opts_id : b_0_4
save_interval : 2
epochs : 20
resume_from : None
trainedmodeldir : None
pheno_model_id : p1
project : False
superpops : None
epoch : None
pdata : None
trainedmodelname : None
plot : False
animate : False
evaluate : False
metrics : None

______________________________ data opts ______________________________
sparsifies : [0.0, 0.1, 0.2, 0.3, 0.4]
norm_opts : {'flip': False, 'missing_val': -1.0}
norm_mode : genotypewise01
impute_missing : True
validation_split : 0.2

______________________________ train opts ______________________________
learning_rate : 0.00032
batch_size : 10
noise_std : 0.0032
n_samples : -1
loss : {'module': 'tf.keras.losses', 'class': 'CategoricalCrossentropy', 'args': {'from_logits': False}}
regularizer : {'reg_factor': 1e-07, 'module': 'tf.keras.regularizers', 'class': 'l2'}
lr_scheme : {'module': 'tf.keras.optimizers.schedules', 'class': 'ExponentialDecay', 'args': {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}}
______________________________
Imputing originally missing genotypes to most common value.
Reading ind pop list from /home/richel/GitHubs/GenoCAE/example_tiny/issue_2_bin.fam
Reading ind pop list from /home/richel/GitHubs/GenoCAE/example_tiny/issue_2_bin.fam
Mapping files: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 362.20it/s]
Using learning rate schedule tf.keras.optimizers.schedules.ExponentialDecay with {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}

______________________________ Data ______________________________
N unique train samples: 800
--- training on : 800
N valid samples: 200
N markers: 4


______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 16}
Adding layer: Reshape: {'target_shape': (2, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (2, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (4, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dense: {'units': 2, 'name': 'encoded'}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 16}
Adding layer: Reshape: {'target_shape': (2, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (2, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (4, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}

______________________________ Building model ______________________________
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: LeakyReLU: {}
Adding layer: Dense: {'units': 1}
No marker specific variable.
ALLVARS [<tf.Variable 'autoencoder/conv1d/kernel:0' shape=(5, 3, 8) dtype=float32>, <tf.Variable 'autoencoder/conv1d/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_1/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_1/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_1/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_1/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_2/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/conv1d_2/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_2/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2/batch_normalization_2/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/conv1d_3/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/conv1d_3/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_3/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_3/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/dense/kernel:0' shape=(16, 75) dtype=float32>, <tf.Variable 'autoencoder/dense/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/dense_1/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder/dense_1/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/encoded/kernel:0' shape=(75, 2) dtype=float32>, <tf.Variable 'autoencoder/encoded/bias:0' shape=(2,) dtype=float32>, <tf.Variable 'autoencoder/dense_2/kernel:0' shape=(2, 75) dtype=float32>, <tf.Variable 'autoencoder/dense_2/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/dense_3/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder/dense_3/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder/dense_4/kernel:0' shape=(75, 16) dtype=float32>, <tf.Variable 'autoencoder/dense_4/bias:0' shape=(16,) dtype=float32>, <tf.Variable 'autoencoder/conv1d_4/kernel:0' shape=(5, 10, 8) dtype=float32>, <tf.Variable 'autoencoder/conv1d_4/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_4/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_4/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_5/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_5/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_5/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_5/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_6/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/conv1d_6/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_6/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/residual_block2_1/batch_normalization_6/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/nms/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder/nms/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_7/gamma:0' shape=(9,) 
dtype=float32>, <tf.Variable 'autoencoder/batch_normalization_7/beta:0' shape=(9,) dtype=float32>, <tf.Variable 'autoencoder/conv1d_7/kernel:0' shape=(1, 9, 1) dtype=float32>, <tf.Variable 'autoencoder/conv1d_7/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_8/kernel:0' shape=(5, 3, 8) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_8/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_8/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_8/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_9/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_9/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_9/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_9/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_10/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/conv1d_10/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_10/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_2/batch_normalization_10/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_11/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_11/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_11/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_11/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_5/kernel:0' shape=(16, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_5/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_6/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_6/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/encoded/kernel:0' shape=(75, 2) dtype=float32>, <tf.Variable 'autoencoder_1/encoded/bias:0' shape=(2,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_7/kernel:0' shape=(2, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_7/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_8/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_1/dense_8/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_1/dense_9/kernel:0' shape=(75, 16) dtype=float32>, <tf.Variable 'autoencoder_1/dense_9/bias:0' shape=(16,) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_12/kernel:0' shape=(5, 10, 8) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_12/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_12/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_12/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/conv1d_13/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/conv1d_13/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_13/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_13/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/conv1d_14/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 
'autoencoder_1/residual_block2_3/conv1d_14/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_14/gamma:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/residual_block2_3/batch_normalization_14/beta:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/nms/kernel:0' shape=(5, 8, 8) dtype=float32>, <tf.Variable 'autoencoder_1/nms/bias:0' shape=(8,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_15/gamma:0' shape=(9,) dtype=float32>, <tf.Variable 'autoencoder_1/batch_normalization_15/beta:0' shape=(9,) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_15/kernel:0' shape=(1, 9, 1) dtype=float32>, <tf.Variable 'autoencoder_1/conv1d_15/bias:0' shape=(1,) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32>, <tf.Variable 'autoencoder_2/dense_10/kernel:0' shape=(2, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_10/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_11/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_11/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_12/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_12/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_13/kernel:0' shape=(75, 75) dtype=float32>, <tf.Variable 'autoencoder_2/dense_13/bias:0' shape=(75,) dtype=float32>, <tf.Variable 'autoencoder_2/dense_14/kernel:0' shape=(75, 1) dtype=float32>, <tf.Variable 'autoencoder_2/dense_14/bias:0' shape=(1,) dtype=float32>] ###
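
The third build is the phenotype model: a small fully-connected network. Below is a minimal sketch of it, assuming it maps the 2-dimensional encoding to a single phenotype value (the (2, 75) kernel of autoencoder_2/dense_10 in the variable dump above suggests a 2-D input); this is an illustration, not GenoCAE's actual code.

import tensorflow as tf
from tensorflow.keras import layers

pheno_model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),  # assumed: the 2-D 'encoded' output
    layers.Dense(75),
    layers.LeakyReLU(),
    layers.Dropout(0.01),
    layers.Dense(75),
    layers.LeakyReLU(),
    layers.Dense(75),
    layers.LeakyReLU(),
    layers.Dropout(0.01),
    layers.Dense(75),
    layers.LeakyReLU(),
    layers.Dense(1),  # single phenotype value
])
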
Traceback (most recent call last):
  File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 1616, in <module>
    main()
  File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 1014, in main
    run_optimization(autoencoder, autoencoder2, optimizer, optimizer2, loss_func, input_init, targets_init, True, phenomodel=pheno_model, phenotargets=phenotargets_init)
  File "/home/richel/miniconda3/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/richel/miniconda3/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/home/richel/GitHubs/GenoCAE/run_gcae.py", line 424, in run_optimization  *
        loss_value += tf.math.reduce_sum(((-y_pred) * y_true)) * 1e-6

    ValueError: Dimensions must be equal, but are 2 and 4 for '{{node mul_21}} = Mul[T=DT_FLOAT](Neg_2, one_hot_2)' with input shapes: [2,4], [2,4,3].
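
The ValueError is a broadcasting failure in the loss term: y_pred (shape [2, 4], one value per marker from the flattened 'logits' output) cannot be multiplied elementwise with the one-hot y_true (shape [2, 4, 3], three genotype classes per marker). The following sketch reproduces the mismatch under the assumption of those shapes; it illustrates the shape problem only, and is not a fix for the underlying model or loss configuration.

import tensorflow as tf

y_pred = tf.zeros((2, 4))                                  # one value per marker
y_true = tf.one_hot(tf.zeros((2, 4), tf.int32), depth=3)   # shape (2, 4, 3)

# Reproduces the error: shapes (2, 4) and (2, 4, 3) do not broadcast.
# loss = tf.math.reduce_sum((-y_pred) * y_true) * 1e-6

# The shapes become compatible once y_pred carries a class axis, e.g. a
# trailing axis of size 1 (or, properly, three logits per marker):
loss = tf.math.reduce_sum((-y_pred[..., tf.newaxis]) * y_true) * 1e-6
print(loss)  # tf.Tensor(0.0, shape=(), dtype=float32)

The variable dump suggests why the shapes disagree: the network ends in a Conv1D with filters=1 followed by Flatten, so it emits one value per marker, while this loss term expects per-class logits matching the one-hot depth of 3.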
