grahamgower / genomatnn

Predicts adaptive introgression using a CNN trained on genotype matrices.

License: Other

Python 99.47% Cython 0.53%

genomatnn's Issues

training on arbitrary sims

I tried to update the .toml config to use my own simulated .trees files. I changed it to:

[sim.tranche]
# The labels and modelspec(s) for each tranche. The network will be trained to
# classify data as coming from one of these tranches. Each tranche consists of
# a list of simulation modelspecs.
# Only two tranches are supported.
"constant_2pop" = [
	"genomatnn_data/trees/constant_2pop",

	# Skip this for now, as it's too computationally intensive
	# to do many replicates for training. :-(
	#"HomSap/HomininComposite_4G20/DFE",
]

single_pulse_uni_AB = [
	"genomatnn_data/trees/single_pulse_uni_AB",
]

But, I get the error:

Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/bin/genomatnn", line 33, in <module>
    sys.exit(load_entry_point('genomatnn==0.1.dev115+g4e4a918', 'console_scripts', 'genomatnn')())
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/cli.py", line 587, in main
    args = parse_args(args_list)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/cli.py", line 540, in parse_args
    args.conf = config.Config(args.conf)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/config.py", line 79, in __init__
    self._getcfg_sim()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/config.py", line 142, in _getcfg_sim
    self._getcfg_tranche(self.sim["tranche"])
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/config.py", line 150, in _getcfg_tranche
    model = sim.get_demog_model(modelspec)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/genomatnn-0.1.dev115+g4e4a918-py3.6-linux-x86_64.egg/genomatnn/sim.py", line 841, in get_demog_model
    raise ValueError(f"{modelspec} not found")
ValueError: genomatnn_data/trees/constant_2pop not found

So, am I right in reading this as meaning that genomatnn is not currently set up to train on arbitrary simulations referenced in the TOML config? Or am I missing something about how to define them?
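For what it's worth, the traceback suggests that each tranche entry is passed to sim.get_demog_model() and looked up by name, so a filesystem path would fail in the same way as any unknown name. Below is a minimal sketch of that kind of lookup, not genomatnn's actual code: KNOWN_MODELSPECS, lookup_modelspec and check_tranches are hypothetical names, and the registry contents are just the modelspec strings that appear elsewhere in these issues.

```python
# Hypothetical registry of modelspec names (an assumption for illustration;
# the real list is defined inside genomatnn).
KNOWN_MODELSPECS = {
    "HomSap/HomininComposite_4G20/Neutral/slim",
    "HomSap/HomininComposite_4G20/Sweep/CEU",
    "HomSap/HomininComposite_4G20/AI/Nea_to_CEU",
}


def lookup_modelspec(modelspec, known=KNOWN_MODELSPECS):
    """Return the modelspec if it is a registered name, else raise ValueError."""
    if modelspec not in known:
        # Same message format as the error in the traceback above.
        raise ValueError(f"{modelspec} not found")
    return modelspec


def check_tranches(tranches, known=KNOWN_MODELSPECS):
    """Collect (label, modelspec) pairs that are not registered names."""
    unknown = []
    for label, modelspecs in tranches.items():
        for modelspec in modelspecs:
            if modelspec not in known:
                unknown.append((label, modelspec))
    return unknown


# The two tranches from the config above, written as a plain dict.
tranches = {
    "constant_2pop": ["genomatnn_data/trees/constant_2pop"],
    "single_pulse_uni_AB": ["genomatnn_data/trees/single_pulse_uni_AB"],
}
unknown = check_tranches(tranches)
for label, modelspec in unknown:
    print(f"tranche {label!r}: {modelspec!r} is not a registered modelspec")
```

Under this reading, both entries are reported as unknown because they are paths rather than registered names.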

specify modelspec proportions to be used for calibration

E.g. something like:

[calibrate.weights]
"HomSap/HomininComposite_4G20/Neutral/slim" = 1
"HomSap/HomininComposite_4G20/Sweep/CEU" = 1
"HomSap/HomininComposite_4G20/AI/Nea_to_CEU" = 0.01

Or maybe per-tranche?

[calibrate.weights]
"AI" = 1
"not AI" = 0.01
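One way such weights could be applied is to resample the calibration set so that each label's share of the draws is proportional to its weight. The following is only a sketch under that assumption, using the standard library; weighted_resample and its arguments are hypothetical names, not part of genomatnn.

```python
import random


def weighted_resample(labels, weights, n, seed=1):
    """Draw n indexes so that each label's expected share of the draws
    is proportional to weights[label]."""
    rng = random.Random(seed)
    # Count how many samples carry each label.
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # Per-sample weight: the label's weight split evenly among its samples,
    # so a label with many samples is not favoured over a label with few.
    per_sample = [weights[label] / counts[label] for label in labels]
    return rng.choices(range(len(labels)), weights=per_sample, k=n)


# Per-tranche weights as in the second proposal above.
labels = ["AI"] * 100 + ["not AI"] * 100
weights = {"AI": 1, "not AI": 0.01}
indexes = weighted_resample(labels, weights, n=1000)
n_ai = sum(1 for i in indexes if labels[i] == "AI")
print(f"{n_ai} of {len(indexes)} draws are labelled AI")
```

The same function would work with per-modelspec labels, as in the first proposal, since it only needs one label per sample and one weight per label.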

Typo in demographic_models

Hi @grahamgower

It looks like the population size of YRI should be 27600 (Gower et al. 2021, Appendix 3, Table 1), but it is set to 27000 in the demographic models:
https://github.com/grahamgower/genomatnn/blob/main/demographic_models/HomininComposite_4G20.yaml#L30
https://github.com/grahamgower/genomatnn/blob/main/demographic_models/HomininComposite2_4G20.yaml#L45

move calibration configuration out of the cli

make project installable via pip

Move code under a subdirectory, add setup.py, etc.

update requirements

Update requirements.txt to lock down the numpy and tensorflow versions. Also, add nose.

change licence to GPL due to msprime/stdpopsim dependence

Clarify that all bits not using GPL imports can be freely used under a permissive licence.

tests failing

I followed the instructions in the README, but got some errors when I ran the tests.

Here is what I did:

module load anaconda5 ## necessary for the HPC
conda create -n genomatnn gsl cudnn "numpy<1.19" "blas=*=mkl"
source activate genomatnn
pip install git+https://github.com/grahamgower/stdpopsim.git@selection --user
git clone https://github.com/grahamgower/genomatnn.git
cd genomatnn
conda update numpy ## fixes the numpy-tensorflow conflict
python setup.py install
pip install nose
python setup.py build_ext -i
nosetests -v tests

Here is the output from the tests:

2020-11-24 14:08:34.157097: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/itac/2020.3.036/intel64/slib:/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib:/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib/release:/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin:/opt/intel/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8:/opt/intel/debugger_2020/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin
2020-11-24 14:08:34.157205: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
test_upsample_indexes (tests.test_calibrate.TestUpsample) ... ok
test_upsample_indexes_weighted (tests.test_calibrate.TestUpsample) ... ok
2020-11-24 14:08:40.758042: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-24 14:08:40.758587: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/itac/2020.3.036/intel64/slib:/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib:/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib/release:/opt/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2020.4.304/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin:/opt/intel/compilers_and_libraries_2020.4.304/linux/tbb/lib/intel64/gcc4.8:/opt/intel/debugger_2020/libipt/intel64/lib:/opt/intel/compilers_and_libraries_2020.4.304/linux/daal/lib/intel64_lin
2020-11-24 14:08:40.758626: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2020-11-24 14:08:40.758665: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (br006.pvt.bridges.psc.edu): /proc/driver/nvidia/version does not exist
2020-11-24 14:08:40.761023: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-24 14:08:41.002596: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_410"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:08:41.011092: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-24 14:08:41.011665: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2297105000 Hz
2020-11-24 14:08:45.722282: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_2355"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ERROR
2020-11-24 14:08:50.673550: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_3263"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:08:54.761367: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_5208"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ERROR
2020-11-24 14:08:59.817739: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_6116"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:09:03.661341: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_8061"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ERROR
2020-11-24 14:09:08.947115: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_8969"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:09:12.812368: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_10914"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ERROR
test_list_modelspecs (tests.test_cli.TestSim) ... ok
test_missing_config_file (tests.test_cli.TestSim) ... ok
test_sim (tests.test_cli.TestSim) ... ok
test_missing_config_file (tests.test_cli.TestTrain) ... ok
test_train (tests.test_cli.TestTrain) ... 2020-11-24 14:09:19.118835: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_11822"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:09:23.264315: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_13767"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ok
test_train_from_cache (tests.test_cli.TestTrain) ... 2020-11-24 14:09:25.810186: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_14675"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:09:28.671301: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_16620"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ok
2020-11-24 14:09:32.904075: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_17528"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-11-24 14:09:36.717616: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:654] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_9"
op: "FlatMapDataset"
input: "PrefetchDataset/_8"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_slice_batch_indices_19473"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
ERROR
test_min_coalescent_time_ancient_samples (tests.test_contact.TestContact) ... ok
test_split_time (tests.test_contact.TestContact) ... ok
test_tmcra (tests.test_contact.TestContact) ... ok
test_exclusion_of_drawn_mutation (tests.test_convert.TestDrawnMutation) ... ERROR
test_compare_ts_vcf_genotype_matrixes (tests.test_convert.TestGenotypeMatrixes) ... ERROR
test_ts_genotype_matrix (tests.test_convert.TestGenotypeMatrixes) ... ok
test_reorder (tests.test_convert.TestSorting) ... ok
test_sort_similarity (tests.test_convert.TestSorting) ... ok
test_ts_pop_counts_indices (tests.test_convert.TestSorting) ... ok
test_verify_partition (tests.test_convert.TestSorting) ... ok
test_gt_bytes2vec (tests.test_misc.TestMisc) ... ok
test_basic_cnn (tests.test_tfstuff.TestModelConstruction) ... ok
test_per_population_permutation_invariant_cnn (tests.test_tfstuff.TestModelConstruction) ... ok
test_permutation_invariant_cnn (tests.test_tfstuff.TestModelConstruction) ... ok
test_accumulate_matrices (tests.test_vcf.TestVCF) ... ERROR
test_bcftools_query (tests.test_vcf.TestVCF) ... ERROR
test_genotypes (tests.test_vcf.TestVCF) ... ERROR

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestApply'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 588, in main
    args.func(args.conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/save.py", line 207, in load_model
    compile)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestApply_phased'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 588, in main
    args.func(args.conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/save.py", line 207, in load_model
    compile)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestEval'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 588, in main
    args.func(args.conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/save.py", line 207, in load_model
    compile)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestEval_phased'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 588, in main
    args.func(args.conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/save.py", line 207, in load_model
    compile)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestVcfplot'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 588, in main
    args.func(args.conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/save.py", line 207, in load_model
    compile)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/site-packages/tensorflow-2.4.0rc2-py3.6-linux-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test_exclusion_of_drawn_mutation (tests.test_convert.TestDrawnMutation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_convert.py", line 260, in test_exclusion_of_drawn_mutation
    seed=1,
  File "/home/aglad/.local/lib/python3.6/site-packages/stdpopsim/slim_engine.py", line 1022, in simulate
    dry_run=dry_run)
  File "/home/aglad/.local/lib/python3.6/site-packages/stdpopsim/slim_engine.py", line 1058, in _run_slim
    stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc:
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'slim': 'slim'

======================================================================
ERROR: test_compare_ts_vcf_genotype_matrixes (tests.test_convert.TestGenotypeMatrixes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_convert.py", line 167, in test_compare_ts_vcf_genotype_matrixes
    subprocess.run(["bgzip", vcf_file])
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bgzip': 'bgzip'

======================================================================
ERROR: test_accumulate_matrices (tests.test_vcf.TestVCF)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_vcf.py", line 71, in test_accumulate_matrices
    max_missing_thres=max_missing_thres,
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/vcf.py", line 178, in accumulate_matrices
    rng=rng,
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/vcf.py", line 110, in genotypes
    regions=regions,
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/vcf.py", line 29, in bcftools_query
    stderr=subprocess.PIPE,
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bcftools': 'bcftools'

======================================================================
ERROR: test_bcftools_query (tests.test_vcf.TestVCF)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_vcf.py", line 16, in test_bcftools_query
    for line in vcf.bcftools_query("%CHROM\t%POS\n", self.vcf_file):
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/vcf.py", line 29, in bcftools_query
    stderr=subprocess.PIPE,
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bcftools': 'bcftools'

======================================================================
ERROR: test_genotypes (tests.test_vcf.TestVCF)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pylon5/mcz3c2p/aglad/genomatnn/tests/test_vcf.py", line 33, in test_genotypes
    self.vcf_file, maf_thres=maf_thres, max_missing_thres=max_missing_thres,
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/vcf.py", line 110, in genotypes
    regions=regions,
  File "/pylon5/mcz3c2p/aglad/genomatnn/genomatnn/vcf.py", line 29, in bcftools_query
    stderr=subprocess.PIPE,
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/aglad/.conda/envs/genomatnn/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bcftools': 'bcftools'

----------------------------------------------------------------------
Ran 25 tests in 61.832s

FAILED (errors=10)

make it easier to share simulations between analyses

E.g. it should be easy to share Sweep/CHB between a Nea->CHB analysis and a Den->CHB analysis.

make eval plots at the end of model training

It's inconvenient when testing hyperparameters to have to manually run eval on each trained model.

remove main blocks from scripts

Some of the *.py files have __main__ blocks I used for testing, etc. Useful functionality should be moved into genomatnn.py subcommands, or converted into unit tests, and the rest deleted.

incorporate additional validation simulations in evaluation plots

E.g. DFE simulations.

need test for extra_sims in do_eval

failed test during installation

Hi! I was following the installation instructions and got 6 errors when running the tests. Could I get some help with this? The errors are "AttributeError: 'str' object has no attribute 'decode'" and "FileNotFoundError: [Errno 2] No such file or directory: 'bgzip'". (I tried to fix the bgzip error by uninstalling and reinstalling the bgzip package, but I still got the same error when running the tests again.) I've attached the full error output below.

Thank you in advance for your help!

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestApply'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/Users/yiningliu/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 608, in main
    args.func(args.conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestApply_phased'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/Users/yiningliu/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 608, in main
    args.func(args.conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestEval'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/Users/yiningliu/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 608, in main
    args.func(args.conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestEval_phased'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/Users/yiningliu/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 608, in main
    args.func(args.conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test suite for <class 'tests.test_cli.TestVcfplot'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/Users/yiningliu/genomatnn/tests/test_cli.py", line 52, in setUpClass
    cli.main(f"train --seed 1 {cls.config_file}".split())
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 608, in main
    args.func(args.conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 120, in do_train
    do_eval(conf)
  File "/Users/yiningliu/genomatnn/genomatnn/cli.py", line 171, in do_eval
    model = models.load_model(conf.nn_hdf5_file)
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/site-packages/tensorflow-2.4.1-py3.8-macosx-10.9-x86_64.egg/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
    model_config = json_utils.decode(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

======================================================================
ERROR: test_compare_ts_vcf_genotype_matrixes (tests.test_convert.TestGenotypeMatrixes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yiningliu/genomatnn/tests/test_convert.py", line 167, in test_compare_ts_vcf_genotype_matrixes
    subprocess.run(["bgzip", vcf_file])
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/yiningliu/opt/anaconda3/envs/genomatnn/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'bgzip'

----------------------------------------------------------------------
Ran 25 tests in 1053.544s

FAILED (errors=6)
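
The `FileNotFoundError` entries in reports like this one usually mean an external executable is not on `PATH`. A minimal pre-flight sketch follows, assuming the tests need the executables named in the tracebacks on this page (`bgzip`, `bcftools`, `slim`). The helper name `missing_tools` and the tool list are hypothetical and not part of genomatnn.

```python
import shutil

# Executables named in the tracebacks on this page. This list is an
# assumption, not an official statement of genomatnn's requirements.
REQUIRED_TOOLS = ("bgzip", "bcftools", "slim")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("not found on PATH:", ", ".join(missing))
else:
    print("all external tools found")
```

Note that the `bgzip` executable is distributed with htslib, so a Python package named `bgzip` may not provide it. Separately, the `'str' object has no attribute 'decode'` error is commonly reported when TensorFlow 2.4 is combined with h5py 3.x; whether that applies to this particular environment is an assumption worth checking.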

ERROR (EidosScript::Match): unexpected token '#0.'

Hi @grahamgower

I encountered another issue while trying to run the tutorial on human data. I got an error after running this command: `genomatnn sim -n 1000 Nea_to_CEU.toml`. Here is the error message:

Thanks for your help!

cache vcf genotype matrices

convert sim tree sequences to vcf

Hi Graham! Is there an easy way to convert the simulated tree sequences to VCF-like files (preferably with an annotation of which mutation is the AI beneficial allele)? I'm asking for the purpose of cross-comparison between methods. I see that there's a genotype matrix conversion option (train -c), which I thought could work too, but I don't see the matrices being written out as actual files in the zarrcache directory. Maybe I'm missing something here? Thanks!

change pyproject.toml to require oldest-supported-numpy

https://github.com/scipy/oldest-supported-numpy

add Nea_to_Papuans adaptive introgression model

add docs

some simulations run forever

A small proportion of simulation jobs run forever, maybe due to AF conditioning where the condition is very unlikely. Should figure out what's going on and if there's a simple fix. E.g. for Nea -> CEU:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                
205154 srx907    20   0   87012  69876   3252 R 100.0  0.0   1254:50 slim -s 795194818 /tmp/tmpyu7uzdva.slim                                                                
215055 srx907    20   0   87472  68476   3224 R 100.0  0.0   1150:49 slim -s 1675221105 /tmp/tmpptuoy1cw.slim                                                               
218061 srx907    20   0   96640  80084   3252 R 100.0  0.0   1118:18 slim -s 1390539401 /tmp/tmpg7ao8pck.slim                                                               
237002 srx907    20   0   89860  64548   3252 R 100.0  0.0 910:50.10 slim -s 558990568 /tmp/tmpf1ywxpsb.slim                                                                
253361 srx907    20   0   87268  56436   3252 R 100.0  0.0 728:48.05 slim -s 2920547319 /tmp/tmpy5911jdq.slim                                                               
212630 srx907    20   0   86828  64724   3224 R 100.0  0.0   1176:05 slim -s 1042124329 /tmp/tmpamxyj_g8.slim                                                               
222823 srx907    20   0  103012  77716   3252 R 100.0  0.0   1066:16 slim -s 3986576135 /tmp/tmpq5i7e47m.slim                                                               
238159 srx907    20   0   80780  53624   3224 R 100.0  0.0 898:03.37 slim -s 442496812 /tmp/tmpb1w1ss13.slim                                                                
242698 srx907    20   0   85988  58664   3224 R 100.0  0.0 847:44.36 slim -s 3650201765 /tmp/tmpkoh9pv5q.slim

tmpamxyj_g8.slim.txt
tmpb1w1ss13.slim.txt
tmpf1ywxpsb.slim.txt
tmpg7ao8pck.slim.txt
tmpkoh9pv5q.slim.txt
tmpptuoy1cw.slim.txt
tmpq5i7e47m.slim.txt
tmpy5911jdq.slim.txt
tmpyu7uzdva.slim.txt
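
One possible workaround, sketched under the assumption that each simulation is launched as a separate subprocess, is to put a wall-clock limit on the job and treat a timeout as a failed replicate. The helper `run_with_timeout` is hypothetical and not part of genomatnn or stdpopsim. This does not explain why the jobs hang; it only bounds the cost.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run `cmd`; return the CompletedProcess, or None if it ran too long.

    subprocess.run() kills the child process when the timeout expires,
    so no stray job is left spinning at 100% CPU.
    """
    try:
        return subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None


# Stand-in for a simulation command; a real call would run slim instead.
result = run_with_timeout([sys.executable, "-c", "print('done')"], timeout=30)
print("timed out" if result is None else result.stdout.decode().strip())
```

A caller could then draw a new seed and retry when `None` is returned, which resembles rejecting the unlikely draw.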

Nea/Den population labels overlap in genotype matrix plots

Somehow detect overlap and adjust vertical coordinates?
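
A minimal sketch of one way to do this, assuming each label has a single vertical coordinate and that a fixed minimum gap (in the same units) is enough to keep the text apart. The function `spread_labels` is hypothetical; in a real plot the gap would probably be derived from the rendered text height.

```python
def spread_labels(positions, min_gap):
    """Return adjusted coordinates, in the original order, so that
    neighbouring labels are at least `min_gap` apart.

    Greedy single pass over the labels sorted by coordinate: a label
    that is too close to the previous one is pushed upward.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    adjusted = list(positions)
    previous = None
    for i in order:
        if previous is not None and adjusted[i] - previous < min_gap:
            adjusted[i] = previous + min_gap
        previous = adjusted[i]
    return adjusted


# Two labels nearly on top of each other, one far away.
print(spread_labels([10, 10.5, 30], min_gap=2))
```

Labels only ever move in one direction here, which keeps the logic simple but can push a dense cluster away from its data; a symmetric adjustment would need a second pass.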

update for recent github actions changes

GitHub recently disabled the add-path command in their Actions scripts, so the GitHub Actions continuous integration tests no longer run. E.g.
https://github.com/grahamgower/genomatnn/pull/34/checks?check_run_id=1454407206

add "all chromosomes" predictions plot

rename `apply` subcommand to `predict`

contigs can be sampled from regions with recombination_rate=0

Should probably reject these and sample again.
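
The reject-and-resample idea could look roughly like the sketch below. It assumes the recombination map is available as sorted, non-overlapping `(start, end, rate)` intervals and that a window is rejected if it overlaps any zero-rate interval. The names `overlaps_zero_rate` and `sample_contig` are hypothetical and do not refer to existing genomatnn functions.

```python
import random


def overlaps_zero_rate(rate_map, start, end):
    """True if [start, end) overlaps an interval with rate == 0."""
    return any(
        rate == 0 and seg_start < end and start < seg_end
        for seg_start, seg_end, rate in rate_map
    )


def sample_contig(rate_map, length, rng, max_tries=1000):
    """Draw a window uniformly; resample while it overlaps a zero-rate interval."""
    seq_len = rate_map[-1][1]
    for _ in range(max_tries):
        start = rng.randrange(0, seq_len - length + 1)
        if not overlaps_zero_rate(rate_map, start, start + length):
            return start, start + length
    raise RuntimeError(f"no acceptable window found in {max_tries} tries")


# Toy map: the first 100 bp have zero recombination.
toy_map = [(0, 100, 0.0), (100, 200, 1e-8)]
print(sample_contig(toy_map, length=50, rng=random.Random(1)))
```

The `max_tries` cap is there so that a map consisting mostly of zero-rate intervals fails loudly instead of looping forever.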

restructure vcf handling

The current structure is awkward and inefficient. There are too many wrapper functions in cli.py, and the vcf is queried multiple times for overlapping queries. It's also not really sensible to cache the matrices after processing (#19) with this structure.

vcf2mat should be changed to yield genotype vectors for a single site, which are then accumulated into matrices for potentially overlapping windows. The matrices can then be resized/sorted after accumulation. This will avoid querying the vcf multiple times, which should give a nice speedup and hopefully avoid some of the wrappers at the higher level.
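
The proposed single-pass accumulation could be sketched as follows, assuming sites arrive sorted by position as `(pos, genotype_vector)` pairs with 0-based integer positions, and that windows start at multiples of `step`. The generator `accumulate_windows` is hypothetical; it illustrates the idea rather than the eventual implementation.

```python
def accumulate_windows(sites, window, step):
    """Yield (start, end, vectors) for windows [start, start + window).

    Each site is read once and appended to every open window that
    contains it. Windows that contain no sites are never emitted.
    """
    open_windows = {}  # window start -> list of genotype vectors
    for pos, vector in sites:
        # Windows ending at or before this position are complete.
        for start in sorted(s for s in open_windows if s + window <= pos):
            yield start, start + window, open_windows.pop(start)
        # Indices k of the windows [k*step, k*step + window) containing pos.
        first = max(0, (pos - window) // step + 1)
        last = pos // step
        for k in range(first, last + 1):
            open_windows.setdefault(k * step, []).append(vector)
    # Flush the windows still open after the last site.
    for start in sorted(open_windows):
        yield start, start + window, open_windows.pop(start)


sites = [(1, "a"), (6, "b"), (12, "c")]
for start, end, vectors in accumulate_windows(sites, window=10, step=5):
    print(start, end, vectors)
```

Resizing and sorting would then happen on each yielded list of vectors, after accumulation, as described above.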

AttributeError: can't set attribute

Hi,
Having successfully installed the package in a conda environment on CentOS Linux, I encountered this error while running the tests. It seems to me that this is a problem with Python, but I don't really know how to debug it.
Thanks in advance
Sincerely,
Apdulai

$ nosetests -v tests
test_upsample_indexes (tests.test_calibrate.TestUpsample) ... ok
test_upsample_indexes_weighted (tests.test_calibrate.TestUpsample) ... ok
test_apply_creates_predictions (tests.test_cli.TestApply) ... ok
test_missing_config_file (tests.test_cli.TestApply) ... ok
test_missing_nn_hdf5_file (tests.test_cli.TestApply) ... ok
test_apply_creates_predictions (tests.test_cli.TestApply_phased) ... ok
test_missing_config_file (tests.test_cli.TestApply_phased) ... ok
test_missing_nn_hdf5_file (tests.test_cli.TestApply_phased) ... ok
test_eval_creates_plots (tests.test_cli.TestEval) ... ok
test_missing_config_file (tests.test_cli.TestEval) ... ok
test_missing_nn_hdf5_file (tests.test_cli.TestEval) ... ok
test_eval_creates_plots (tests.test_cli.TestEval_phased) ... ok
test_missing_config_file (tests.test_cli.TestEval_phased) ... ok
test_missing_nn_hdf5_file (tests.test_cli.TestEval_phased) ... ok
test_list_modelspecs (tests.test_cli.TestSim) ... ok
test_missing_config_file (tests.test_cli.TestSim) ... ok
test_sim (tests.test_cli.TestSim) ... ok
test_missing_config_file (tests.test_cli.TestTrain) ... ok
test_train (tests.test_cli.TestTrain) ... ok
test_train_from_cache (tests.test_cli.TestTrain) ... ok
test_missing_config_file (tests.test_cli.TestVcfplot) ... ok
test_vcfplot_creates_plot (tests.test_cli.TestVcfplot) ... ok
test_min_coalescent_time_ancient_samples (tests.test_contact.TestContact) ... ok
test_split_time (tests.test_contact.TestContact) ... ok
test_tmcra (tests.test_contact.TestContact) ... ok
test_exclusion_of_drawn_mutation (tests.test_convert.TestDrawnMutation) ... ERROR
test_compare_ts_vcf_genotype_matrixes (tests.test_convert.TestGenotypeMatrixes) ... ok
test_ts_genotype_matrix (tests.test_convert.TestGenotypeMatrixes) ... ok
test_reorder (tests.test_convert.TestSorting) ... ok
test_sort_similarity (tests.test_convert.TestSorting) ... ok
test_ts_pop_counts_indices (tests.test_convert.TestSorting) ... ok
test_verify_partition (tests.test_convert.TestSorting) ... ok
test_gt_bytes2vec (tests.test_misc.TestMisc) ... ok
test_basic_cnn (tests.test_tfstuff.TestModelConstruction) ... ok
test_per_population_permutation_invariant_cnn (tests.test_tfstuff.TestModelConstruction) ... ok
test_permutation_invariant_cnn (tests.test_tfstuff.TestModelConstruction) ... ok
test_accumulate_matrices (tests.test_vcf.TestVCF) ... ok
test_bcftools_query (tests.test_vcf.TestVCF) ... ok
test_genotypes (tests.test_vcf.TestVCF) ... ok

======================================================================
ERROR: test_exclusion_of_drawn_mutation (tests.test_convert.TestDrawnMutation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/shared/ifbstor1/home/$USER/genomatnn/tests/test_convert.py", line 253, in test_exclusion_of_drawn_mutation
    ts = slim.simulate(
  File "/shared/home/$USER/.local/lib/python3.8/site-packages/stdpopsim/slim_engine.py", line 1027, in simulate
    ts = pyslim.load(ts_file.name)
  File "/shared/ifbstor1/home/$USER/genomatnn/.conda/envs/lib/python3.8/site-packages/pyslim/slim_tree_sequence.py", line 38, in load
    ts = SlimTreeSequence.load(path, legacy_metadata=legacy_metadata)
  File "/shared/ifbstor1/home/$USER/genomatnn/.conda/envs/lib/python3.8/site-packages/pyslim/slim_tree_sequence.py", line 225, in load
    return cls(ts, reference_sequence=reference_sequence, legacy_metadata=legacy_metadata)
  File "/shared/ifbstor1/home/$USER/genomatnn/.conda/envs/lib/python3.8/site-packages/pyslim/slim_tree_sequence.py", line 170, in __init__
    self.reference_sequence = reference_sequence
AttributeError: can't set attribute

README typo?

Is `conda activate genomatnn` correct?

For me, only `source activate genomatnn` works.

add README

Issues with using pre-trained model

Hi Graham, I was trying to apply the pre-trained model to the empirical data included in the /example/ directory using the command

genomatnn apply Nea_to_CEU.toml Nea_to_CEU_af-0.05_2250018620.hdf5

and I got the error message below, saying that the cache data is missing. Is it possible to directly use one of the pre-trained models (for AI in modern humans), or do I have to train my own CNN with new simulations for it to work properly? Thanks so much!!

WARNING: There are non-GPU devices in tf.distribute.Strategy, not using nccl allreduce.
Traceback (most recent call last):
  File "/opt/anaconda3/bin/genomatnn", line 33, in <module>
    sys.exit(load_entry_point('genomatnn==0.1.dev138+g7a51abd', 'console_scripts', 'genomatnn')())
  File "/opt/anaconda3/lib/python3.8/site-packages/genomatnn-0.1.dev138+g7a51abd-py3.8-macosx-10.9-x86_64.egg/genomatnn/cli.py", line 625, in main
    args.func(args.conf)
  File "/opt/anaconda3/lib/python3.8/site-packages/genomatnn-0.1.dev138+g7a51abd-py3.8-macosx-10.9-x86_64.egg/genomatnn/cli.py", line 360, in do_apply
    get_predictions(conf, pred_file, samples_file)
  File "/opt/anaconda3/lib/python3.8/site-packages/genomatnn-0.1.dev138+g7a51abd-py3.8-macosx-10.9-x86_64.egg/genomatnn/cli.py", line 305, in get_predictions
    data = convert.load_data_cache(cache)
  File "/opt/anaconda3/lib/python3.8/site-packages/genomatnn-0.1.dev138+g7a51abd-py3.8-macosx-10.9-x86_64.egg/genomatnn/convert.py", line 383, in load_data_cache
    raise RuntimeError(f"{cache} doesn't exist")
RuntimeError: zarrcache_256-rows doesn't exist

filter logging events from third party packages

Debug-level logging from matplotlib is very noisy and unhelpful.
stdpopsim/slim warnings about small population sizes are also unhelpful.
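
A minimal sketch of both kinds of filtering, using only the standard library. It assumes the noisy matplotlib output arrives through the `logging` module under the `matplotlib` logger name, and that the population-size messages are raised through the `warnings` module; the second assumption has not been checked against stdpopsim. The helper names and the message pattern are hypothetical.

```python
import logging
import warnings


def quiet_third_party(names=("matplotlib",), level=logging.WARNING):
    """Raise the threshold of the named loggers (and their children)."""
    for name in names:
        logging.getLogger(name).setLevel(level)


def ignore_warnings_matching(pattern):
    """Ignore warnings whose message starts with the regex `pattern`."""
    warnings.filterwarnings("ignore", message=pattern)


logging.basicConfig(level=logging.DEBUG)
quiet_third_party()
# Hypothetical pattern; the real warning text should be copied from the output.
ignore_warnings_matching("small population")
logging.getLogger("matplotlib.font_manager").debug("this is now suppressed")
logging.getLogger("genomatnn").debug("application debug output still appears")
```

Setting the level on the parent logger is enough because child loggers such as `matplotlib.font_manager` inherit their effective level from it.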

use keyword-only arguments in convert.py

It's too easy to mix up parameter order otherwise.
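
For illustration, a bare `*` in a Python signature makes every following parameter keyword-only. The function `describe_window` below is a hypothetical stand-in, not a function from convert.py.

```python
def describe_window(*, chrom, start, end):
    """Format a genomic window; all arguments must be passed by name."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return f"{chrom}:{start}-{end}"


print(describe_window(chrom="1", start=10, end=20))

try:
    # Positional use is rejected, so start and end cannot be swapped silently.
    describe_window("1", 20, 10)
except TypeError as err:
    print("rejected:", err)
```

The call site becomes slightly longer, but an accidental swap of two integer arguments turns into an immediate `TypeError` or a readable mistake.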

sort vcf genotype matrices before doing prediction

fix tf function retracing for genomatnn `apply`

predict_on_batch outputs the warning below. Possibly need to change the generator to produce a fixed size per batch.

5 out of the last 27 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f012446a050> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
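
One way to get a fixed size per batch is to pad the final, short batch and remember how many entries are real. The sketch below uses plain Python lists to stay self-contained; the real generator presumably yields arrays, and `fixed_size_batches` is a hypothetical name. Whether this removes the warning in genomatnn has not been tested here.

```python
def fixed_size_batches(items, batch_size):
    """Yield (batch, n_real) where every batch has exactly batch_size entries.

    The last batch is padded by repeating its final item; n_real says how
    many leading entries are genuine, so padded predictions can be dropped.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch, batch_size
            batch = []
    if batch:
        n_real = len(batch)
        batch.extend([batch[-1]] * (batch_size - n_real))
        yield batch, n_real


for batch, n_real in fixed_size_batches(range(5), batch_size=2):
    print(batch, "real entries:", n_real)
```

After prediction, the caller would keep only the first `n_real` outputs of each batch.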

Problem installing genomatnn conda environment from .yml file

Hello,

I'm trying to install the conda environment from the yml file on GitHub, but it doesn't seem to work: the "Solving environment" step doesn't resolve.

I guess some package versions don't seem to be available anymore in 2023, for example what version of stdpopsim are you using? Or do you have an updated version of the environment?

I tried with stdpopsim version 0.1 but after installing genomatnn the tool doesn't seem to recognize msprime even though it exists in the environment (the package loads with python on the command terminal).
genomatnn.txt

The txt file contains the conda environment that I tried to build to run genomatnn (with stdpopsim=0.1), but it doesn't work either.

Thanks in advance!

P.S. I don't understand why this fails now: I managed to install genomatnn in 2022 just by replacing the numpy version with 1.20.0.

add Nea_to_CHB adaptive introgression model
