
cosmopower's Introduction

Overview

CosmoPower is a library for Machine Learning-accelerated Bayesian inference. While the emphasis is on building algorithms to accelerate Bayesian inference in cosmology, the interdisciplinary nature of the methodologies implemented in the package allows for their application across a wide range of scientific fields. The ultimate goal of CosmoPower is to solve inverse problems in science by developing Bayesian inference pipelines that leverage the computational power of Machine Learning to accelerate the inference process. This approach represents a principled application of Machine Learning to scientific research, with the Machine Learning component embedded within a rigorous framework for uncertainty quantification.

In cosmology, CosmoPower aims to become a fully differentiable library for cosmological analyses. Currently, CosmoPower provides neural-network emulators of matter and Cosmic Microwave Background power spectra. These emulators can replace Boltzmann codes such as CAMB or CLASS in cosmological inference pipelines, sourcing the power spectra needed for two-point statistics analyses. This provides orders-of-magnitude acceleration of the inference pipeline and integrates naturally with efficient techniques for sampling very high-dimensional parameter spaces. The power spectra emulators implemented in CosmoPower, first presented in its release paper, have been applied to the analysis of real cosmological data from experiments, and have been tested against the accuracy requirements for the analysis of next-generation cosmological surveys.

CosmoPower is written entirely in Python. Neural networks are implemented using the TensorFlow library.

Documentation

Comprehensive documentation is available here.

Installation

We recommend installing CosmoPower within a Conda virtual environment. For example, to create and activate an environment called cp_env, use:

conda create -n cp_env python=3.11 pip && conda activate cp_env

Once inside the environment, you can install CosmoPower:

  • from PyPI

      pip install cosmopower
    

    To test the installation, you can use

      python3 -c 'import cosmopower as cp'
    

    If you do not have a GPU on your machine, you will see a warning message about it which you can safely ignore.

  • from source

      git clone https://github.com/alessiospuriomancini/cosmopower
      cd cosmopower
      pip install .
    

    To test the installation, you can use

      pytest
    

Getting Started

CosmoPower currently provides two ways to emulate power spectra, implemented in the classes cosmopower_NN and cosmopower_PCAplusNN:

cosmopower_NN: a neural network mapping cosmological parameters directly to (log)-power spectra

cosmopower_PCAplusNN: a neural network mapping cosmological parameters to coefficients of a Principal Component Analysis (PCA) of the (log)-power spectra

Below are minimal working examples that use pre-trained CosmoPower models from the code release paper, shared in the trained_models folder (see the Trained Models section for details), to predict power spectra for a given set of input parameters. To run them, clone the repository and replace /path/to/cosmopower with the location of the cloned repository. Further examples are available as demo notebooks in the getting_started_notebooks folder, for both cosmopower_NN and cosmopower_PCAplusNN.

Note that, whenever possible, we recommend working with models trained on log-power spectra, to reduce the dynamic range. Both cosmopower_NN and cosmopower_PCAplusNN have methods to provide predictions (cf. cp_pca_nn.predictions_np in the example below) as well as "10^predictions" (cf. cp_nn.ten_to_predictions_np in the example below).

Using cosmopower_NN:

import cosmopower as cp

# load pre-trained NN model: maps cosmological parameters to CMB TT log-C_ell
cp_nn = cp.cosmopower_NN(restore=True, 
                         restore_filename='/path/to/cosmopower'\
                         +'/cosmopower/trained_models/CP_paper/CMB/cmb_TT_NN')

# create a dict of cosmological parameters
params = {'omega_b': [0.0225],
          'omega_cdm': [0.113],
          'h': [0.7],
          'tau_reio': [0.055],
          'n_s': [0.96],
          'ln10^{10}A_s': [3.07],
          }

# predictions (= forward pass through the network) -> 10^predictions
spectra = cp_nn.ten_to_predictions_np(params)

Using cosmopower_PCAplusNN:

import cosmopower as cp

# load pre-trained PCA+NN model: maps cosmological parameters to CMB TE C_ell
cp_pca_nn = cp.cosmopower_PCAplusNN(restore=True, 
                                    restore_filename='/path/to/cosmopower'\
                                    +'/cosmopower/trained_models/CP_paper/CMB/cmb_TE_PCAplusNN')

# create a dict of cosmological parameters
params = {'omega_b': [0.0225],
          'omega_cdm': [0.113],
          'h': [0.7],
          'tau_reio': [0.055],
          'n_s': [0.96],
          'ln10^{10}A_s': [3.07],
          }

# predictions (= forward pass through the network)
spectra = cp_pca_nn.predictions_np(params)

Note that the suffix _np of the predictions_np and ten_to_predictions_np functions refers to their implementation in NumPy. These functions are best suited to standard analysis pipelines implemented fully in pure Python, typically run on Central Processing Units. For pipelines built using the TensorFlow library, highly optimised to run on Graphics Processing Units, we recommend the corresponding _tf functions (i.e. predictions_tf and ten_to_predictions_tf), available in both cosmopower_NN and cosmopower_PCAplusNN (see Likelihoods for further details and examples).
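As a rough illustration, here is a minimal sketch of calling the TensorFlow predictions on a batch of parameter points. It assumes the _tf methods accept a 2D tensor whose columns follow the network's parameter ordering; check the Likelihoods examples for the exact calling convention.

import tensorflow as tf
import cosmopower as cp

# restore the pre-trained TT emulator used in the example above
cp_nn = cp.cosmopower_NN(restore=True,
                         restore_filename='/path/to/cosmopower'
                         + '/cosmopower/trained_models/CP_paper/CMB/cmb_TT_NN')

# batch of 3 parameter points; columns assumed ordered as
# [omega_b, omega_cdm, h, tau_reio, n_s, ln10^{10}A_s]
params_tf = tf.constant([[0.0225, 0.113, 0.67, 0.055, 0.96, 3.07],
                         [0.0225, 0.113, 0.70, 0.055, 0.96, 3.07],
                         [0.0225, 0.113, 0.73, 0.055, 0.96, 3.07]],
                        dtype=tf.float32)

# forward pass entirely in TensorFlow -> 10^predictions;
# gradients with respect to params_tf are available via tf.GradientTape
spectra_tf = cp_nn.ten_to_predictions_tf(params_tf)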

Training

The training_notebooks folder contains examples of how to:

  • train cosmopower_NN

  • train cosmopower_PCAplusNN

These notebooks implement emulation of CMB temperature (TT) and lensing potential (ϕϕ) power spectra as practical examples; the procedure is completely analogous for the matter power spectrum.
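For orientation, here is a minimal sketch of the training workflow. The file names and layer sizes are hypothetical, and the constructor and train() arguments mirror the cooling-schedule settings quoted in the issues further below; check the notebooks for the exact signatures.

import numpy as np
import cosmopower as cp

# hypothetical training set: sampled parameters and matching log-spectra,
# e.g. produced with CAMB or CLASS
training_params = np.load('tt_training_params.npz')        # dict-like archive
training_log_spectra = np.load('tt_training_log_spectra.npy')
ell_range = np.arange(2, 2509)

cp_nn = cp.cosmopower_NN(parameters=['omega_b', 'omega_cdm', 'h',
                                     'tau_reio', 'n_s', 'ln10^{10}A_s'],
                         modes=ell_range,
                         n_hidden=[512, 512, 512, 512])

# cooling schedule: the learning rate drops while the batch size grows
cp_nn.train(training_parameters={k: training_params[k]
                                 for k in training_params.files},
            training_features=training_log_spectra,
            filename_saved_model='cmb_TT_NN',
            validation_split=0.1,
            learning_rates=[1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
            batch_sizes=[1024, 2048, 4096, 8192, 16384],
            gradient_accumulation_steps=[1, 1, 1, 1, 1],
            patience_values=[100, 100, 100, 100, 100],
            max_epochs=[1000, 1000, 1000, 1000, 1000])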

Trained Models

Trained models are available in the trained_models folder. The folder contains all of the emulators used in the CosmoPower release paper; as new models are trained, they will be shared in this folder, along with a description and a BibTeX entry for the relevant paper to be cited when using these models. Please consider sharing your own models in this folder via a pull request!

Please refer to the README file within the trained_models folder for all of the details on the models contained there.

Likelihoods

The likelihoods folder contains examples of likelihood codes sourcing power spectra from CosmoPower. Some of these likelihoods are written in pure TensorFlow, hence they can be run with highly optimised TensorFlow-based samplers, such as those from TensorFlow Probability. Being written entirely in TensorFlow, these codes can be massively accelerated by running on Graphics or Tensor Processing Units. We recommend the use of the predictions_tf and ten_to_predictions_tf functions within these pipelines to compute (log)-power spectra predictions for input parameters. The likelihoods_notebooks folder contains an example of how to run a pure-TensorFlow likelihood, the Planck-lite 2018 TTTEEE likelihood.
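Schematically (this is not the actual Planck-lite implementation), a pure-TensorFlow Gaussian likelihood sourcing spectra from an emulator might look like the sketch below; the data vector and inverse covariance are placeholders, and the emulated multipole grid is assumed to be exposed as cp_nn.modes.

import tensorflow as tf
import cosmopower as cp

# restore the TT emulator (path as in the Getting Started examples)
cp_nn = cp.cosmopower_NN(restore=True,
                         restore_filename='/path/to/cosmopower'
                         + '/cosmopower/trained_models/CP_paper/CMB/cmb_TT_NN')

n_ell = len(cp_nn.modes)        # assumed attribute: emulated multipoles
cl_data = tf.zeros(n_ell)       # placeholder data vector
inv_cov = tf.eye(n_ell)         # placeholder inverse covariance

@tf.function
def log_likelihood(params_tf):
    # emulated theory spectra for a batch of parameter points
    cl_theory = cp_nn.ten_to_predictions_tf(params_tf)
    diff = cl_theory - cl_data
    # -0.5 * chi^2 for each point in the batch, computed entirely on-device
    return -0.5 * tf.einsum('bi,ij,bj->b', diff, inv_cov, diff)

Such a likelihood can be handed directly to TensorFlow Probability samplers, keeping the whole pipeline on the GPU.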

Contributing, Support, Community

For bugs and feature requests consider using the issue tracker.

Contributions to the code via pull requests are most welcome!

For general support, please send an email to a dot spuriomancini at ucl dot ac dot uk, or post on GitHub discussions.

Users of CosmoPower are strongly encouraged to join the GitHub discussions forum to follow the latest news on the code as well as to discuss all things Machine Learning / Bayesian Inference in cosmology!

Citation

If you use CosmoPower at any point in your work please cite its release paper:

@article{SpurioMancini2022,
         title={CosmoPower: emulating cosmological power spectra for accelerated Bayesian inference from next-generation surveys},
         volume={511},
         ISSN={1365-2966},
         url={http://dx.doi.org/10.1093/mnras/stac064},
         DOI={10.1093/mnras/stac064},
         number={2},
         journal={Monthly Notices of the Royal Astronomical Society},
         publisher={Oxford University Press (OUP)},
         author={Spurio Mancini, Alessio and Piras, Davide and Alsing, Justin and Joachimi, Benjamin and Hobson, Michael P},
         year={2022},
         month={Jan},
         pages={1771–1788}
         }

If you use a specific likelihood or trained model then in addition to the release paper please also cite their relevant papers (always listed in the corresponding directory).

License

CosmoPower is released under the GPL-3 license (see LICENSE) subject to the non-commercial use condition (see LICENSE_EXT).

CosmoPower
Copyright (C) 2021 A. Spurio Mancini & contributors

This program is released under the GPL-3 license (see LICENSE), 
subject to a non-commercial use condition (see LICENSE_EXT).

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

cosmopower's People

Contributors

alessiospuriomancini, dpiras, htjense, itrharrison, pburger112


cosmopower's Issues

Problem with ell_range

Hello Alessio,
I am working with CosmoSIS; I took some time to understand your CosmoSIS pipeline and the code itself, but I have a problem with how you generate the ell_range. Can you please help me understand the line ell_range = training_features["mode"]?

tensorflow_probability version requirement < 0.22

Hi,

The latest version of tensorflow_probability does not seem to be compatible; version 0.21.0 is working, however. So, as well as having tensorflow < 2.14 as a requirement, I think tensorflow_probability < 0.22 should be one too.

Cheers
Matt

Please speed me up!

Hi guys,
I was on the brink of installing CLASS when I stumbled upon your paper yesterday.

I would appreciate using your pre-trained NN for the spectra coming out of this (and eventually this). I hope people asking for early access doesn't cause you too much trouble.

Specification for cosmopower network packaging

From a discussion with @alessiospuriomancini and @HTJense, we came up with a proposal for a specification for a yaml file which packages a cosmopower network.

The aims are for this packaging to:

  • Enable replicability/reusability and distribution of networks
  • Ensure 'safe' use of networks (e.g. only within trained parameter ranges)
  • Allow for fallback to the code being emulated (e.g. by including the full list of settings used in the code during training).
  • Allow automated enhancement of the training set (e.g. with reinforcement learning)

Note that the aim for this is to be flexible enough to work for things other than Boltzmann codes, and (I think) the interface with inference codes such as cobaya and cosmosis should be managed within those packages.

A fuzzy proposal for this specification is below (inspired by the one for camb from @HTJense, attached):

network_name: 

emulated_code:
  name:
  version:

samples:
  N_training: 
  
  xmin:
  xmax:
  xbinning:
  
  extra_args:
    {non-default arguments that were used in the emulated code}

  full_args_file: {file containing the full arguments used in the emulated code}

networks:
  {observable_name}:
    type: NN
    log: True
    n_traits:
      n_hidden: [ ]
    training:
      validation_split: 
      learning_rates: [  ]
      batch_sizes: [ ]
      gradient_accumulation_steps: [ ]
      patience_values: [ ]
      max_epochs: [ ]
  

sampled_parameters:
  {par1}: [ ,  ]
  {par2}: "lambda par1: 1e-10 * np.exp(par1)"
  
  drop: [ par1 ]

derived: [  ]

lcdm.yaml.txt
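To make the 'safe use' aim concrete, here is a minimal hypothetical sketch (not part of the proposal itself) of how a consumer of such a yaml file could reject parameter points outside the trained ranges:

import yaml
import numpy as np

def check_in_range(spec_file, params):
    # load the network packaging file
    with open(spec_file) as f:
        spec = yaml.safe_load(f)
    for name, bounds in spec['sampled_parameters'].items():
        # entries may also be strings (lambdas); only check [min, max] pairs
        if isinstance(bounds, list) and len(bounds) == 2:
            lo, hi = bounds
            values = np.atleast_1d(params.get(name, []))
            if np.any(values < lo) or np.any(values > hi):
                raise ValueError(f"{name} outside trained range [{lo}, {hi}]")

A call such as check_in_range('lcdm.yaml', {'par1': [0.3]}) would then raise before an out-of-range point ever reaches the emulator.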

Planck low-py

Dear all,

Do you already have an implementation of the Planck low-ell bins? If not, I have added planck-low-py (lognormal bins from https://github.com/heatherprince/planck-low-py) into the TF likelihood you have for Planck-lite high-ell TTTEEE, by rewriting the functions in TF format and allowing them to take multiple Cl inputs, if you would be interested in me sharing it.

Best wishes,

Alex

More documentation of accuracy

When implementing CosmoPower for SOLikeT (see here), we evaluated its accuracy against CAMB and CLASS runs, finding O(1e-2) accuracy.

It would be great if this expectation could be compared against something documented here, in order to understand whether our implementation had succeeded.

I am aware of the accuracy plots in the paper, but because they require the sigma_ell corresponding to SO, they are not as intuitive or easy to generate.

New tensorflow version breaks restoring from pickle

The recent version of tensorflow (2.14.0) moves (or removes? I can't actually find the functions in the new version) the tensorflow.python.training.tracking sub-module.

This seems to break restoring networks from pickle files:

(Pdb) tf.__version__
'2.14.0'
(Pdb) filename
'/-----/CosmoPower/CP_paper/CMB/cmb_TT_NN'
(Pdb) pickle.load(open(filename + ".pkl", 'rb'))
*** ModuleNotFoundError: No module named 'tensorflow.python.training.tracking'

I guess the dependency could be pinned tensorflow<2.14.0.
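A possible interim workaround, consistent with the pin suggested here and in the tensorflow_probability issue above, is to constrain both versions at install time:

      pip install "tensorflow<2.14" "tensorflow-probability<0.22"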

Replace deprecated sklearn requirement

When running the tests for SOLikeT, an (intermittent!) failure happens because of the inherited dependency on sklearn, with the following message:

      The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
      rather than 'sklearn' for pip commands.
      
      Here is how to fix this error in the main use cases:
      - use 'pip install scikit-learn' rather than 'pip install sklearn'
      - replace 'sklearn' by 'scikit-learn' in your pip requirements files
        (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
      - if the 'sklearn' package is used by one of your dependencies,
        it would be great if you take some time to track which package uses
        'sklearn' instead of 'scikit-learn' and report it to their issue tracker
      - as a last resort, set the environment variable
        SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
      
      More information is available at
      https://github.com/scikit-learn/sklearn-pypi-package
      
      If the previous advice does not cover your use case, feel free to report it at
      https://github.com/scikit-learn/sklearn-pypi-package/issues/new
      [end of output]

Here is me doing the third bullet point ;-)

TE emulation accuracy issues

Dear Dr Spurio Mancini,

Firstly, many thanks for making cosmopower so easy to use; it's a really nice package! I am having some trouble getting good accuracy for a CMB TE emulator and was wondering if you could give me some pointers on how to improve it. The outline of the notebook I am using to train is as follows:

  1. Load in training/test data produced with CLASS: ~600,000 points from a Sobol sequence in the parameter space [h, omega_m, omega_b, n_s, sigma8, tau_reio, A_lens, m_nu].
  2. Define the CosmoPower PCA with 512 components (following the prescription in the paper) using cosmopower_PCAplusNN.
  3. Train the model with the following specifications (again, I tried to match the batch sizes and learning rates to the description in the paper, but maybe there is something I am missing here?):
    # cooling schedule
    validation_split=0.1,
    learning_rates=[1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
    batch_sizes=[1024, 2048, 4096, 10000, 50000],
    gradient_accumulation_steps=[1, 1, 1, 1, 1],
    # early stopping set-up
    patience_values=[100, 100, 100, 100, 100],
    max_epochs=[1000, 1000, 1000, 1000, 1000],
    )
  4. Test the trained model against the test data. At this stage it is clear that, whilst the NN behaves well for some input parameters, it is way off for others (see the attached plot, in the style of your example notebooks).

I am aware that introducing A_lens and m_nu means I would probably need more training data compared to what you have in the LCDM set-up, so I am currently producing this, but I am wondering if you can see anything else I could change to improve the accuracy.

Best wishes and many thanks in advance,

Alex Reeves
examples_reconstruction_PP.pdf

KeyError: 'obch2 is not a file in the archive'

import numpy as np
import pyDOE as pyDOE

# number of parameters and samples
n_params = 7
n_samples = 400000

# parameter ranges
obh2 = np.linspace(0.019, 0.026, n_samples)
omch2 = np.linspace(0.051, 0.255, n_samples)
h0 = np.linspace(0.64, 0.82, n_samples)
n_s = np.linspace(0.84, 1.1, n_samples)
s_8_input = np.linspace(0.1, 1.3, n_samples)
logt_agn = np.linspace(7.6, 8.0, n_samples)
A = np.linspace(-6.0, 6.0, n_samples)

# LHS grid
AllParams = np.vstack([obh2, omch2, h0, n_s, s_8_input, logt_agn, A])
lhd = pyDOE.lhs(n_params, samples=n_samples, criterion=None)
idx = (lhd * n_samples).astype(int)

AllCombinations = np.zeros((n_samples, n_params))
for i in range(n_params):
    AllCombinations[:, i] = AllParams[i][idx[:, i]]

# saving
params = {'obh2': AllCombinations[:, 0],
          'omch2': AllCombinations[:, 1],
          'h0': AllCombinations[:, 2],
          'n_s': AllCombinations[:, 3],
          's_8_input': AllCombinations[:, 4],
          'logt_agn': AllCombinations[:, 5],
          'A': AllCombinations[:, 6]}

np.savez('your_LHS_parameter_file.npz', **params)

/Users/apple/base/lib/python3.9/site-packages/numpy/lib/npyio.py:232: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if key in self._files:
/Users/apple/base/lib/python3.9/site-packages/numpy/lib/npyio.py:234: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  elif key in self.files:

KeyError                                  Traceback (most recent call last)
Input In [18], in <cell line: 4>()
      1 np.savez_compressed('your_LHS_parameter_file.npz', obh2=obh2, omch2=omch2, h0=h0, n_s=n_s, s_8_input=s_8_input,
      2                     logt_agn=logt_agn, A=A)
      3 b = np.load('your_LHS_parameter_file.npz')
----> 4 print(b[A])

File ~/base/lib/python3.9/site-packages/numpy/lib/npyio.py:249, in NpzFile.__getitem__(self, key)
    247     return self.zip.read(key)
    248 else:
--> 249     raise KeyError("%s is not a file in the archive" % key)

KeyError: '[-6. -5.99997 -5.99994 ... 5.99994 5.99997 6. ] is not a file in the archive'
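For reference, the KeyError arises because b[A] indexes the npz archive with the array A rather than with its key string; a minimal fix:

import numpy as np

b = np.load('your_LHS_parameter_file.npz')
print(b['A'])   # index the archive by the key string 'A', not the array A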

Request of Dataset

Hello Alessio, I really appreciate your responses so far. May I please request the dataset you used, ideally in CSV or DAT format?
