AltumAge

AltumAge is a pan-tissue DNA methylation epigenetic clock based on deep learning. Our paper is published in npj Aging; see the Citation section for the full reference.

[New] AltumAge is available on pyaging

The easiest way to use AltumAge with your methylation data is through pyaging, our newly released aging clock package. It is available on PyPI and can be installed via `pip install pyaging`. pyaging also provides a tutorial for DNA methylation age prediction.

Usage

In order to use AltumAge for age prediction, please follow the steps in example.ipynb. The example file also contains simple instructions to use Horvath's 2013 model for ease of comparison.

The main instructions to use AltumAge are as follows:

(1) Load required Python packages:

The following packages must be installed. Note that the model was trained with tensorflow 2.5.0, so beware of possible compatibility issues with other versions.

import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn import linear_model, preprocessing
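Because the model was trained with tensorflow 2.5.0, it may help to check the installed version before loading the model. The sketch below uses only the standard library. The helper names `same_minor_version` and `installed_version` are hypothetical and not part of this repository, and treating a matching major.minor pair as "compatible" is only a rule of thumb, not a guarantee.

```python
from importlib import metadata

TRAINED_WITH = "2.5.0"  # tensorflow version stated in this README


def same_minor_version(installed, expected=TRAINED_WITH):
    """Return True if the major.minor parts of two version strings match."""
    return installed.split(".")[:2] == expected.split(".")[:2]


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("tensorflow")
    if version is None:
        print("tensorflow is not installed")
    elif not same_minor_version(version):
        print(f"tensorflow {version} found; the model was trained with {TRAINED_WITH}")
    else:
        print(f"tensorflow {version} matches the training version")
```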

(2) Load list of CpGs, methylation data, scaler, and AltumAge model:

From your Illumina 27k, 450k or EPIC array data, select the 20318 CpG sites from the file "CpGsites.csv" in the correct order.

cpgs = np.array(pd.read_pickle('example_dependencies/multi_platform_cpgs.pkl'))

Load the BMIQCalibration-normalized methylation data. It is crucial that the methylation beta values are normalized with BMIQCalibration in R, as described in Horvath, S. "DNA methylation age of human tissues and cell types." Genome Biol 14, 3156 (2013). https://doi.org/10.1186/gb-2013-14-10-r115. Note also that reading the pickled example data only works with Python >= 3.8.

data = pd.read_pickle('example_dependencies/example_data.pkl')
real_age = data.age
methylation_data = data[cpgs]
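The column selection `data[cpgs]` raises a KeyError in current pandas versions when any listed CpG is absent from the data, so it can be useful to check for missing sites first. The following is a minimal sketch on toy data. The CpG identifiers, the values, and the helper name `select_cpgs` are made up for illustration and are not part of this repository.

```python
import numpy as np
import pandas as pd


def select_cpgs(data, cpgs):
    """Return the columns listed in `cpgs`, in that order, plus any missing names."""
    missing = [cpg for cpg in cpgs if cpg not in data.columns]
    if missing:
        # reindex keeps the requested order and fills absent columns with NaN
        return data.reindex(columns=cpgs), missing
    return data[cpgs], missing


# Toy beta values: two samples, columns deliberately out of order
toy = pd.DataFrame(
    {"cg03": [0.30, 0.35], "cg01": [0.10, 0.15], "age": [40, 62]},
    index=["sample_a", "sample_b"],
)
cpgs = np.array(["cg01", "cg02", "cg03"])  # hypothetical CpG list

methylation_data, missing = select_cpgs(toy, cpgs)
print(list(methylation_data.columns))  # columns follow the order of `cpgs`
print(missing)                         # CpGs that still need to be handled
```

How to fill the resulting NaN columns is a separate decision that this README does not specify.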

Load the scaler, which transforms the distribution of beta values of each CpG site to mean = 0 and variance = 1.

scaler = pd.read_pickle('example_dependencies/scaler.pkl')

Finally, load AltumAge:

AltumAge = tf.keras.models.load_model('example_dependencies/AltumAge.h5')

(3) Scale the methylation data:

Scale the beta values of each CpG with the sklearn robust scaler loaded above.

methylation_data_scaled = scaler.transform(methylation_data)
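A scikit-learn scaler's `transform` applies a per-column shift and rescale. As a rough illustration only, the sketch below reproduces the two kinds of scaling mentioned in this README (standardization to mean 0 and variance 1, and robust scaling) with plain numpy on made-up beta values. It does not inspect `scaler.pkl`, whose stored statistics were fit by the authors and are not recomputed from your data.

```python
import numpy as np

# Toy beta values: rows are samples, columns are CpG sites (made-up numbers)
betas = np.array([
    [0.10, 0.80],
    [0.20, 0.70],
    [0.30, 0.90],
    [0.40, 0.60],
])

# Standardization: subtract the column mean, divide by the column standard deviation
standardized = (betas - betas.mean(axis=0)) / betas.std(axis=0)

# Robust scaling: subtract the column median, divide by the interquartile range
q1, median, q3 = np.percentile(betas, [25, 50, 75], axis=0)
robust = (betas - median) / (q3 - q1)

print(standardized.mean(axis=0), standardized.var(axis=0))  # about 0 and about 1
print(np.median(robust, axis=0))                            # about 0
```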

(4) Age prediction:

Finally, to predict age, simply use the following. The .flatten() command might be needed to transform the output into a 1D array.

pred_age_AltumAge = AltumAge.predict(methylation_data_scaled).flatten()
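To show why `.flatten()` can be needed, and how the predictions might then be compared with `real_age`, here is a small numpy sketch. The array `raw_predictions` is a made-up stand-in for the model output, assuming a single-output model that returns one column per prediction. The ages and the choice of median absolute error as the summary metric are also illustrative assumptions.

```python
import numpy as np

# Stand-in for the model output: one predicted age per sample, shape (n_samples, 1)
raw_predictions = np.array([[34.2], [51.7], [68.9]])
real_age = np.array([36.0, 50.0, 70.0])  # made-up chronological ages

pred_age = raw_predictions.flatten()
print(raw_predictions.shape, pred_age.shape)  # (3, 1) (3,)

# Median absolute error between predicted and chronological age
median_abs_error = np.median(np.abs(pred_age - real_age))
print(median_abs_error)
```

Without flattening, subtracting a `(3,)` array from a `(3, 1)` array would broadcast to a `(3, 3)` matrix, which silently gives the wrong error values.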

Voilà!

PyTorch compatibility

AltumAge's h5 tensorflow model has also been converted to PyTorch (version 2.1). To use it, load the AltumAge.pt file under the dependencies folder with torch.load, follow all of the preprocessing steps above, and use the loaded model as usual.

Supplementary Results

The summary files are CSVs containing detailed information regarding the performance of AltumAge and Horvath's 2013 model by data set in the test set.

Data availability

The raw data and metadata from ArrayExpress and Gene Expression Omnibus (GEO), as well as the organized, non-normalized methylation data, are available in our Google Drive folder.

Citation

To cite our study, please use the following:

de Lima Camillo, L.P., Lapierre, L.R. & Singh, R. A pan-tissue DNA-methylation epigenetic clock based on deep learning. npj Aging 8, 4 (2022). https://doi.org/10.1038/s41514-022-00085-y

BibTeX citation:

@article {de_Lima_Camillo_AltumAge,
	author = {de Lima Camillo, Lucas Paulo and Lapierre, Louis R and Singh, Ritambhara},
	title = {A pan-tissue DNA-methylation epigenetic clock based on deep learning},
	year = {2022},
	doi = {10.1038/s41514-022-00085-y},
	publisher = {Springer Nature},
	URL = {https://doi.org/10.1038/s41514-022-00085-y},
	eprint = {https://www.nature.com/articles/s41514-022-00085-y.pdf},
	journal = {npj Aging}
}

License

Permission to use, copy, modify, and distribute this software and its documentation for any purpose other than its incorporation into a commercial product or service is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Brown University not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

BROWN UNIVERSITY DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT SHALL BROWN UNIVERSITY BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

altumage's Issues

The function `transform_age` is missing

Hi,

Thanks for sharing this codebase. However, the module my_functions could not be found in this repo. Could you please add it?

Model missing

Is it possible you forgot to commit the 'AltumAge' model?

When trying to load:

AltumAge = tf.keras.models.load_model('AltumAge')

The following error gets thrown:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in _get_op_def(self, type)
   3928     try:
-> 3929       return self._op_def_cache[type]
   3930     except KeyError:

KeyError: 'MLCMatMul'

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, options, loader_cls, filters)
    888       try:
--> 889         loader = loader_cls(object_graph_proto, saved_model_proto, export_dir,
    890                             ckpt_options, filters)

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir, ckpt_options, filters)
    130     self._concrete_functions = (
--> 131         function_deserialization.load_function_def_library(
    132             meta_graph.graph_def.library))

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/saved_model/function_deserialization.py in load_function_def_library(library, load_shared_name_suffix)
    339     with graph.as_default():
--> 340       func_graph = function_def_lib.function_def_to_graph(copy)
    341     _restore_gradient_functions(func_graph, renamed_functions)

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/framework/function_def_to_graph.py in function_def_to_graph(fdef, input_shapes)
     57       input_shapes = input_shapes_attr.list.shape
---> 58   graph_def, nested_to_flat_tensor_name = function_def_to_graph_def(
     59       fdef, input_shapes)

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/framework/function_def_to_graph.py in function_def_to_graph_def(fdef, input_shapes)
    219     else:
--> 220       op_def = default_graph._get_op_def(node_def.op)  # pylint: disable=protected-access
    221 

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in _get_op_def(self, type)
   3932         # pylint: disable=protected-access
-> 3933         pywrap_tf_session.TF_GraphGetOpDef(self._c_graph, compat.as_bytes(type),
   3934                                            buf)

NotFoundError: Op type not registered 'MLCMatMul' in binary running on MacBook-Pro.local. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-f42698f734c6> in <module>
     10 
     11 #load AltumAge model
---> 12 AltumAge = tf.keras.models.load_model('AltumAge')

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile, options)
    210       if isinstance(filepath, six.string_types):
    211         loader_impl.parse_saved_model(filepath)
--> 212         return saved_model_load.load(filepath, compile, options)
    213 
    214   raise IOError(

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/load.py in load(path, compile, options)
    142   for node_id, loaded_node in keras_loader.loaded_nodes.items():
    143     nodes_to_load[keras_loader.get_path(node_id)] = loaded_node
--> 144   loaded = tf_load.load_partial(path, nodes_to_load, options=options)
    145 
    146   # Finalize the loaded layers and remove the extra tracked dependencies.

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in load_partial(export_dir, filters, tags, options)
    763     A dictionary mapping node paths from the filter to loaded objects.
    764   """
--> 765   return load_internal(export_dir, tags, options, filters=filters)
    766 
    767 

~/anaconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, options, loader_cls, filters)
    890                             ckpt_options, filters)
    891       except errors.NotFoundError as err:
--> 892         raise FileNotFoundError(
    893             str(err) + "\n If trying to load on a different device from the "
    894             "computational device, consider using setting the "

FileNotFoundError: Op type not registered 'MLCMatMul' in binary running on MacBook-Pro.local. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
 If trying to load on a different device from the computational device, consider using setting the `experimental_io_device` option on tf.saved_model.LoadOptions to the io_device such as '/job:localhost'.

I would like to reproduce your results

Hi!

Great work on the paper. Also thanks for sharing your code! I would like to reproduce your results. Could you please share a requirements.txt to indicate the versions of your dependencies? Also will you be sharing your training code as well? I would like to train my own similar models but it might be helpful to first reproduce your model. I'm sure others will be curious about this as well.

Thanks!

BMIQcalibration

Hi!
Great work, cheers!

You recommend using BMIQCalibration-normalized methylation data. To do this, we need to use the BMIQcalibration() function, which requires, as a SummarizedExperiment, a clean beta-matrix created with the clean_beta() function. This function reduces a beta-matrix to the 18747 CpGs used to calibrate methylation profiles in MEAT2.0, so the resulting beta-matrix must be missing some (at least 1571) of the 20318 CpG sites listed in the file "CpGsites.csv". Is that correct, or am I missing something?

Thank you!

Questions about missing CpGs

Hi,
I can reproduce your results using the example dataset, but I have some problems running AltumAge on our own dataset. I have two basic questions:

Q1: Does AltumAge need exactly 20318 CpGs to run the prediction?
Q2: The author recommends using "BMIQcalibration"-normalized beta values as input. However, it seems BMIQcalibration reduces the "beta-matrix to the 18747 CpGs". How can we get the 20318 CpGs after BMIQcalibration?

Thanks
Liguo

How do you handle missing CpGs?

Hello,
I am trying to use your work on a data set, but I end up with a few missing CpGs after processing (even without processing, 5 of the 20318 CpGs you use in your model are missing from my data set). Is there an imputation method you would recommend, or a way of telling the model to ignore missing data?