This repository contains the code for the reproducibility of the experiments presented in the paper "Object-Centric Relational Representations for Image Generation" (TMLR). We propose GraPhOSE: a framework, based on graph neural networks, to condition the generation of images with an arbitrary number of objects. This framework allows for the joint conditioning of each object's structural arrangement and semantic properties.
Authors: Luca Butera, Andrea Cini, Alberto Ferrante, Cesare Alippi
Conditioning image generation on specific features of the desired output is a key ingredient of modern generative models. However, existing approaches lack a general and unified way of representing structural and semantic conditioning at diverse granularity levels. This paper explores a novel method to condition image generation, based on object-centric relational representations. In particular, we propose a methodology to condition the generation of objects in an image on the attributed graph representing their structure and the associated semantic information. We show that such architectural biases entail properties that facilitate the manipulation and conditioning of the generative process and allow for regularizing the training procedure. The proposed conditioning framework is implemented by means of a neural network that learns to generate a 2D, multi-channel, layout mask of the objects, which can be used as a soft inductive bias in the downstream generative task. We also propose a novel benchmark for image generation consisting of a synthetic dataset of images paired with their relational representation.
Our framework, in the figure, consists of two networks: the mask generator, which takes the pose-graph (containing only geometric properties) as input, and the encoder, which takes the pose-graph with semantic attributes. These two models' outputs are then combined to produce a 2D multi-channel layout mask that can be used to condition a downstream generative model. In this case it is a Generative Adversarial Network.
Sample of the different objects that can be generated in our Pose-Representable Objects (PRO) dataset. Each object can be rendered starting from a corresponding attributed pose-graph, which in turn is randomly generated given a set of object specific constraints. Synthetic images can contain more than one object at once.
All commands are meant to be run from the project's root folder.
Install the conda package manager and run
conda env create -f environment_<device>.yml
where device
is either cpu
(Linux no GPU or Intel MacOS), cuda
(Linux with GPU) or mps
(Apple Silicon MacOS) depending on your platform.
First activate the environment with
conda activate graphose
then run
python main.py <options>
Mask Generator Pre-train
python main.py +datamodule=pose_mask +model=mask +predictor=mask
Checkpoint paths can be retrieved under outputs/<ID>/checkpoints
.
GraPhOSE
python main.py +datamodule=objects +model=graphose_gan +predictor=gan \
+predictor.low_lr_mul=0.5 trainer.max_epochs=300 monitor=val/fid \
+predictor.pretrain=<pretrained-checkpoint-absolute-path>
No Pre-train GraPhOSE
python main.py +datamodule=objects +model=graphose_gan +predictor=gan \
+predictor.low_lr_mul=1.0 trainer.max_epochs=300 monitor=val/fid
No Mask Learning GraPhOSE
python main.py +datamodule=objects +model=graphose_nomask_gan +predictor=gan \
trainer.max_epochs=300 monitor=val/fid
Non Relational Baseline
python main.py +datamodule=objects +model=baseline_gan +predictor=gan \
trainer.max_epochs=300 monitor=val/fid
- Prepend
GRAPHOSE_BIG_ARCHITECTURES=1
and replace+model=<model>_gan
with+model=<model>_biggan
to train the alternative downstream generator. - Add option
trainer.gpus=1
to train on a single GPU. - Set
datamodule.num_workers
to select the number of workers for data loading. - Add
+predictor.accumulate_gradient=<n>
anddatamodule.batch_size=<64/n>
to train on batch sizes that do not fit in memory. - Use
datamodule.resolution=64
to train for 64x64 resolution (works also for pre-training). - In generale, options can be added with
+<option>=<value>
. See theconfig
folder to inspect config files and the classes they refer to. Possible options correspond to parameter names in the respective target classes. See hydra for more information.
If you find this code useful please consider to cite our paper:
@article{
butera2024objectcentric,
title={Object-Centric Relational Representations for Image Generation},
author={Luca Butera and Andrea Cini and Alberto Ferrante and Cesare Alippi},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2024},
url={https://openreview.net/forum?id=7kWjB9zW90},
note={}
}