matfusion's Introduction

MatFusion [Paper] [Website]

This repository provides SVBRDF estimation from photographs under three different lighting conditions (directional, natural, and flash/no-flash illumination), achieved by refining a novel SVBRDF diffusion backbone model named MatFusion.

Citation

@conference{Sartor:2023:MFA,
    author    = {Sartor, Sam and Peers, Pieter},
    title     = {MatFusion: a Generative Diffusion Model for SVBRDF Capture},
    month     = {December},
    year      = {2023},
    booktitle = {ACM SIGGRAPH Asia Conference Proceedings},
    url       = {https://doi.org/10.1145/3610548.3618194},
}

Installation

You will need Git, Python, and Conda on a system with at least CUDA 11 and cuDNN 8.2. This code has only been tested on Linux, but Windows and macOS may be usable with some tweaks.

git clone 'https://github.com/samsartor/matfusion' && cd matfusion
git submodule update --init --recursive
conda env create -f environment.yml
conda activate matfusion
pip install -e matfusion_diffusers
ipython kernel install --user --name=matfusion

Before running a notebook in Jupyter, make sure to set your kernel to "matfusion".
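
To confirm that JAX can see your GPU before opening a notebook, a quick sanity check (with the matfusion environment active) is:

python -c 'import jax; print(jax.devices())'

This should list one or more CUDA devices rather than only a CPU device.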

Optional Dependencies

When running more complicated evaluation and training jobs you may need additional dependencies.

To render SVBRDFs under environment lighting or with global illumination, you should install Blender version 3.6 or higher.

To use the matfusion_jax.data module, install Rust version 1.65 or higher and compile the dataloader_rs package.

cargo build --manifest-path dataloader_rs/Cargo.toml --release

In order to finetune the flash/no-flash model, you will also need to pass --features ndarray-linalg to cargo; this downloads and compiles OpenBLAS so that the dataloader can simulate imperfect camera alignment between pairs of rendered images.
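
For example, a build with that feature enabled might look like:

cargo build --manifest-path dataloader_rs/Cargo.toml --release --features ndarray-linalg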

You may also need a Julia environment compatible with language version 1.7. Install the required Julia packages by running the julia command and entering:

import Pkg
Pkg.add(name="Images", version="0.23.3")
Pkg.add(name="FFTW", version="1.6.0")
Pkg.add(name="ProgressMeter", version="1.7.2")

These dependencies are NOT needed if you only use the demo notebooks or otherwise write code to assemble batches yourself. They are only needed when using our dataset processing code through scripts like train.py and eval.py.

Pretrained Models

The following pretrained models are available:

| Model / Finetuning | Framework | Version       | Download                                     |
|--------------------|-----------|---------------|----------------------------------------------|
| Backbone           | Jax       | 1 (corrected) | unconditional_v1_corrected_jax.tar.lz4       |
| Flash              | Jax       | 1             | flash_v1_jax.tar.lz4                         |
| Environment        | Jax       | 1             | env_v1_jax.tar.lz4                           |
| Flash/No-flash     | Jax       | 1             | fnf_v1_jax.tar.lz4                           |
| Backbone           | Diffusers | 1 (corrected) | unconditional_v1_corrected_diffusers.tar.lz4 |
| Flash              | Diffusers | 1             | flash_v1_diffusers.tar.lz4                   |
| Environment        | Diffusers | 1             | env_v1_diffusers.tar.lz4                     |
| Flash/No-flash     | Diffusers | 1             | fnf_v1_diffusers.tar.lz4                     |

To use any of the pretrained models above, untar the downloaded archive into the checkpoints folder. For example, before running the flash finetuning for Jax, your directory should look like

checkpoints
├── flash_v1_jax.tar.lz4
└── flash_v1_jax
    ├── checkpoint.msgpack
    ├── LICENSE.txt
    └── mode.json
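
For example, assuming the lz4 command-line tool is installed, one way to extract the archive is:

cd checkpoints
lz4 -dc flash_v1_jax.tar.lz4 | tar -xf -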

Inference

The easiest way to run MatFusion on your own photographs is with the matfusion_jax_demo.ipynb and matfusion_diffusers_demo.ipynb Jupyter notebooks.

Alternatively, you can create an !Images dataset for your photographs. For example, the Lookahead SVBRDF paper by Zhou and Kalantari provides a selection of real flash-lit photographs, which you can download from https://drive.google.com/file/d/1kzJicyd9Dn-cGNWJCDqJ4fuh5b_NDajW/view. Unzip it into the datasets directory and rename the OurReal directory to datasets/lookahead_real_inputs; that zip file also contains the MaterialGAN eval dataset in the MGReal directory, which should be renamed to datasets/materialgan_real_inputs. Then you can batch-process the test images with eval.py as shown below.
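
Assuming the zip unpacks into top-level OurReal and MGReal folders inside the datasets directory, the renames might look like:

mv datasets/OurReal datasets/lookahead_real_inputs
mv datasets/MGReal datasets/materialgan_real_inputs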

python eval.py \
    --dataset datasets/real_test_materialgan.yml \
    --checkpoint checkpoints/flash_v1_jax \
    --output results/flash_v1_on_materialgan

Datasets

MatFusion was trained on three different datasets of SVBRDFs, each with a slightly different download process. Some finetunings also require path-traced images of those SVBRDFs.

INRIA

The INRIA dataset can be downloaded from https://team.inria.fr/graphdeco/projects/deep-materials/. Unzip it into the datasets directory of this repo so that datasets/DeepMaterialsData is populated with png files, then run python scripts/convert_inria.py. The script should create a datasets/inria_svbrdfs folder.

These SVBRDFs are distributed under a CC BY-NC-ND 2.0 licence.

CC0

Download and untar cc0_svbrdfs.tar.lz4 into the datasets directory so that it contains a datasets/cc0_svbrdfs folder.

These SVBRDFs are collected from PolyHaven and AmbientCG, and are distributed under the CC0 licence.

Mixed

Download and untar mixed_svbrdfs.tar.lz4 into the datasets directory so that it contains a datasets/mixed_svbrdfs folder.

These SVBRDFs are derived from the above INRIA and CC0 datasets, and so are distributed under a combination of the two licences. This usage of the INRIA dataset has been permitted by Adobe.

Rendering

Rendering the SVBRDFs is very CPU-intensive and requires about 1 TB of free storage (since the renders are stored in OpenEXR format). This data is too large to distribute online, but feel free to email us if rendering it yourself proves impossible.

First, integrate the normal maps to produce displacement maps by running julia scripts/integrate-normals.jl datasets/*_svbrdfs; expect this to take a few hours.

To render all the maps over a few days using several background Blender workers, run dataloader_rs/target/release/renderall datasets/train_rendered_env.yml. You may need to adjust the worker and thread counts for optimal performance, since the defaults assume a machine with 128 cores.
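
Putting the two steps together (assuming Blender, Julia, and the compiled dataloader_rs binaries are available), the rendering pipeline is roughly:

julia scripts/integrate-normals.jl datasets/*_svbrdfs
dataloader_rs/target/release/renderall datasets/train_rendered_env.yml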

Test Data

Our paper also presents a new test set, made up of diverse SVBRDFs from a variety of sources. Download and untar test_svbrdf.tar.lz4 into the datasets directory so that it contains a test_svbrdfs folder.

Compute Error Statistics

You can use the eval.py and score.py scripts to run rigorous evaluations of MatFusion. For example, to evaluate the flash model on our test set (once downloaded), use:

python eval.py \
    --dataset ./datasets/test_rasterized.yml \
    --checkpoint ./checkpoints/flash_v1_jax \
    --output ./results/flash_v1_on_test
python score.py --output ./results/flash_v1_on_test

The error numbers can be viewed with the view_eval.ipynb Jupyter notebook.

Training & Finetuning

All of the various training routines can be run with the train.py script. For our training we generally used 4x NVIDIA A40 GPUs, each with 45GB of memory. If you are using more or less compute, you should probably adjust the batch size and learning rate by passing -O batch_size={BATCH_SIZE} and -O lr={LEARNING_RATE}, or by passing a custom --mode_json {PATH_TO_JSON_FILE}. Alternatively, we provide gradient accumulation with the --accumulation option, but it is not very well tested.
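
For example, a reduced-memory training run on fewer GPUs might override the defaults like this (the specific batch size and learning rate here are only illustrative, not the settings used in the paper):

python train.py \
    --mode CONVNEXT_V1_UNCONDITIONAL_MODE \
    --epocs 50 \
    --dataset ./datasets/train_rasterized.yml \
    -O batch_size=8 \
    -O lr=5e-6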

Unconditional Backbone Model

python train.py \
    --mode CONVNEXT_V1_UNCONDITIONAL_MODE \
    --epocs 50 \
    --dataset ./datasets/train_rasterized.yml

Flash Finetuning

python train.py \
    --finetune_checkpoint checkpoints/unconditional_v1_jax \
    --mode CONVNEXT_V1_DIRECT_RAST_IMAGES_MODE \
    --epocs 19 \
    --dataset ./datasets/train_rasterized.yml

Environment-Lit Finetuning

python train.py \
    --finetune_checkpoint checkpoints/unconditional_v1_jax \
    --mode CONVNEXT_V1_DIRECT_RENDERED_IMAGES_MODE \
    --epocs 19 \
    --dataset ./datasets/train_rendered_env.yml

Flash/No-Flash Finetuning

python train.py \
    --finetune_checkpoint checkpoints/unconditional_v1_jax \
    --mode CONVNEXT_V1_DIRECT_RENDERED_OTHER_IMAGES_MODE \
    --epocs 19 \
    --dataset ./datasets/train_rendered_fnf.yml

Project Structure

  • train.py is our training script
  • eval.py is our evaluation script
  • score.py does error number computation for evaluations
  • matfusion_jax/ is our from-scratch diffusion implementation in Jax/Flax
    • model.py contains logic for saving/loading/initing/training models
    • net/resnet.py implements fundamental model layers
    • net/mine.py implements our actual diffusion backbone
    • pipeline.py defines diffusion schedule and samplers
    • config.py has all the default model configuration modes
    • nprast.py implements the Cook-Torrance SVBRDF rasterizer
    • vis.py has lots of utils for displaying results during training and evaluation
  • matfusion_diffusers/ is our fork of huggingface diffusers needed to run the MatFusion model in PyTorch
  • dataloader_rs/ is our Rust codebase for managing and generating datasets
    • src/lib.rs exposes the API
    • src/gen.rs actually does the dataloading in a Python-accessible way
    • src/loaders.rs has the logic for loading different dataset formats and modes
    • src/ids.rs identifies training samples and the textures that make them up
    • src/form.rs turns raw image files on disk into actual SVBRDFs and tonemapped renderings
    • src/warp.rs is used for transforming images and SVBRDFs, including to pixel-align pairs of flash/no-flash renderings
    • src/bin/ has executable Rust scripts
      • renderall.rs dispatches bulk Blender render jobs as defined by the render_synthetic scripts
      • mixsvbrdfs.rs implements our mixture augmentation as used to make the mixed dataset
  • scripts/ contains misc utilities
    • compress_exrs.jl is a Julia script for compressing large numbers of EXR files
    • integrate_normals.jl is a Julia script for integrating normal maps to produce heightmaps
    • convert_to_diffusers.py can convert our checkpoint files into huggingface-compatible checkpoint files
    • prune_checkpoint.py removes checkpointed parameters that are not needed for inference
    • render_synthetic/ has our synthetic blender scenes and related python scripts
  • datasets/ contains the various datasets and corresponding YAML specifications
  • checkpoints/ can contain various pretrained models


matfusion's Issues

mode.json unavailable

Hello,
When I try to run model = Model.from_checkpoint('./checkpoints/flash_v1_jax/') from matfusion_jax_flash_demo.ipynb I encounter the following error: FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/flash_v1_jax/mode.json'

Could you tell me how to solve this error, please? The checkpoints don't seem to have mode.json, and it doesn't work if we replace it by config.json.

Unconditional generation using pretrained backbone checkpoint

Hi @samsartor,

Thanks for your amazing work! I'm pretty interested in your code, and want to use the pretrained backbone (unconditional_v1_diffusers.tar.lz4) to generate random BRDFs without conditions. However, the generated output does not make sense. Could you please help me find out what is wrong?

Here is my attempt by modifying the matfusion_diffusers_flash_demo.ipynb:

#!/usr/bin/env python
# coding: utf-8

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3'

from diffusers import EulerAncestralDiscreteScheduler, DDIMScheduler, UNet2DModel, DDPMScheduler
from matfusion_jax.vis import display_image, display_svbrdf, show_svbrdf

import imageio.v3 as iio
import cv2
import numpy as np
import torch
from pathlib import Path
from tqdm import tqdm

# Model Loading

device = 'cuda'
model = UNet2DModel.from_pretrained('./checkpoints/unconditional_v1_diffusers/').to(device)

# Run MatFusion using Huggingface Diffusers

euler_a_schedule = EulerAncestralDiscreteScheduler(
    beta_schedule='linear',
    prediction_type='v_prediction',
    timestep_spacing="linspace",
)
ddim_schedule = DDIMScheduler(
    beta_schedule='linear',
    prediction_type='v_prediction',
    rescale_betas_zero_snr=True,
    clip_sample=False,
    timestep_spacing="linspace",
)
ddpm_schedule = DDPMScheduler(
    beta_schedule='linear',
    prediction_type='v_prediction',
    clip_sample=False,
    timestep_spacing="linspace",
)
schedule = euler_a_schedule
schedule.set_timesteps(20)
timestep_mult = model.config.get('timestep_mult', 1/1000)
vis_freq = 1

diffusion_frames = []

with torch.no_grad():
    y = torch.randn(1, 10, 256, 256, device=device, dtype=torch.float32)
    y = y * schedule.init_noise_sigma

    for t in tqdm(schedule.timesteps):
        noisy_svbrdf = schedule.scale_model_input(y, t)
        model_output = model(
            noisy_svbrdf,
            t*timestep_mult,
        ).sample

        step_output = schedule.step(model_output, t, y)
        y = step_output.prev_sample

        svbrdf_est = (step_output.pred_original_sample * 0.5 + 0.5).clamp(0, 1).permute(0, 2, 3, 1).cpu().numpy()
        if int(t) % vis_freq == 0:
            diffusion_frames.append(svbrdf_est)

show_svbrdf(np.concatenate(diffusion_frames), horizontal=True, gamma=2.2)

svbrdf_img = display_svbrdf(svbrdf_est[0], horizontal=True, format='png', gamma=2.2)
Path('./demo/pink_svbrdf.png').write_bytes(svbrdf_img.data)  # optional: save the svbrdf to disk
svbrdf_img

Here is the generated result (image attachment omitted).

Also, I would really appreciate it if there were an official demo using the pretrained backbone; I think many people would find the backbone interesting.

Thank you!

Training/finetuning code in diffusers

Thank you for this amazing work!

I notice that there are pretrained checkpoints and demos in both the Jax and Diffusers frameworks, but the training code only supports Jax. Would it be possible to also release a Diffusers version of the training code? I'm trying to finetune your model on a customized dataset and would need the resulting model to be a Diffusers checkpoint.

ValueError: ConvnextDownBlock2D does not exist

Hello,
When I tried to run this line model = UNet2DModel.from_pretrained('./checkpoints/env_v1_diffusers/').to(device) from matfusion_diffusers_env_demo.ipynb, I encountered the following error. It seems that ConvnextDownBlock2D isn't recognized.

The config attributes {'convnext_channels_mult': 4, 'convnext_time_embedding_activation': False, 'mid_act_fn': 'gelu', 'mid_block_type': 'ConvnextMidBlock2D', 'wrong_heads': True} were passed to UNet2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 model = UNet2DModel.from_pretrained('./checkpoints/env_v1_diffusers/').to(device)

File .../anaconda3/envs/matfusion/lib/python3.10/site-packages/diffusers/models/modeling_utils.py:611, in ModelMixin.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    608 if low_cpu_mem_usage:
    609     # Instantiate model with empty weights
    610     with accelerate.init_empty_weights():
--> 611         model = cls.from_config(config, **unused_kwargs)
    613     # if device_map is None, load the state dict and move the params from meta device to the cpu
    614     if device_map is None:

File .../anaconda3/envs/matfusion/lib/python3.10/site-packages/diffusers/configuration_utils.py:254, in ConfigMixin.from_config(cls, config, return_unused_kwargs, **kwargs)
    251         init_dict[deprecated_kwarg] = unused_kwargs.pop(deprecated_kwarg)
    253 # Return model and optionally state and/or unused_kwargs
--> 254 model = cls(**init_dict)
    256 # make sure to also save config parameters that might be used for compatible classes
    257 model.register_to_config(**hidden_dict)

File .../anaconda3/envs/matfusion/lib/python3.10/site-packages/diffusers/configuration_utils.py:636, in register_to_config.<locals>.inner_init(self, *args, **kwargs)
    634 new_kwargs = {**config_init_kwargs, **new_kwargs}
    635 getattr(self, "register_to_config")(**new_kwargs)
--> 636 init(self, *args, **init_kwargs)

File .../anaconda3/envs/matfusion/lib/python3.10/site-packages/diffusers/models/unet_2d.py:164, in UNet2DModel.__init__(self, sample_size, in_channels, out_channels, center_input_sample, time_embedding_type, freq_shift, flip_sin_to_cos, down_block_types, up_block_types, block_out_channels, layers_per_block, mid_block_scale_factor, downsample_padding, downsample_type, upsample_type, act_fn, attention_head_dim, norm_num_groups, norm_eps, resnet_time_scale_shift, add_attention, class_embed_type, num_class_embeds)
    161     output_channel = block_out_channels[i]
    162     is_final_block = i == len(block_out_channels) - 1
--> 164     down_block = get_down_block(
    165         down_block_type,
    166         num_layers=layers_per_block,
    167         in_channels=input_channel,
    168         out_channels=output_channel,
    169         temb_channels=time_embed_dim,
    170         add_downsample=not is_final_block,
    171         resnet_eps=norm_eps,
    172         resnet_act_fn=act_fn,
    173         resnet_groups=norm_num_groups,
    174         attention_head_dim=attention_head_dim if attention_head_dim is not None else output_channel,
    175         downsample_padding=downsample_padding,
    176         resnet_time_scale_shift=resnet_time_scale_shift,
    177         downsample_type=downsample_type,
    178     )
    179     self.down_blocks.append(down_block)
    181 # mid

File .../anaconda3/envs/matfusion/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:227, in get_down_block(down_block_type, num_layers, in_channels, out_channels, temb_channels, add_downsample, resnet_eps, resnet_act_fn, transformer_layers_per_block, num_attention_heads, resnet_groups, cross_attention_dim, downsample_padding, dual_cross_attention, use_linear_projection, only_cross_attention, upcast_attention, resnet_time_scale_shift, attention_type, resnet_skip_time_act, resnet_out_scale_factor, cross_attention_norm, attention_head_dim, downsample_type)
    214 elif down_block_type == "KCrossAttnDownBlock2D":
    215     return KCrossAttnDownBlock2D(
    216         num_layers=num_layers,
    217         in_channels=in_channels,
   (...)
    225         add_self_attention=True if not add_downsample else False,
    226     )
--> 227 raise ValueError(f"{down_block_type} does not exist.")

ValueError: ConvnextDownBlock2D does not exist.

Can't start Unconditional Backbone Model training

Hello, I set up my dataset in the exact same format that train_rasterized.yml expects for the premade CC0 dataset example you provide, with the same folder layout:

8xsorted
|
|----diffuse
|----|--0002_particle_board_rot0_diffuse.png
|----height
|----|--0002_particle_board_rot0_height.png
|----normals
|----|--0002_particle_board_rot0_normals.png
|----roughness
|----|--0002_particle_board_rot0_roughness.png
|----specular
|----|--0002_particle_board_rot0_specular.png

all with the exact same file names, plus the corresponding map suffix (e.g. _diffuse, _height) for each map, across the dataset.

I set up the environment and dependencies, then ran train.py:

python train.py --mode CONVNEXT_V1_UNCONDITIONAL_MODE --epocs 50 --dataset ./datasets/train_rasterized.yml --workers 8 --wandb --run_name brdf1

and got:

python train.py --mode CONVNEXT_V1_UNCONDITIONAL_MODE --epocs 50 --dataset ./datasets/train_rasterized.yml --workers 8 --wandb --run_name brdf16
{
  "inputs": [],
  "input_channels": 0,
  "svbrdf_geo": "normals",
  "condition": "none",
  "lr": 2e-05,
  "lr_warmup": 10000,
  "batch_size": 32,
  "use_ema": true,
  "ema_decay": 0.9999,
  "timestep_mult": 0.001,
  "noise_model": {
    "block": "convnext",
    "inputs": 0,
    "channels": 10,
    "cond_mlp_inputs": 128,
    "mid_activation": "gelu",
    "wrong_heads": true,
    "cond_activation": false,
    "features": [
      128,
      256,
      512,
      512,
      1024,
      1024
    ]
  },
  "opt": "adamw",
  "timestep_channels": 128,
  "channels": 10,
  "zero_snr": true,
  "name": "CONVNEXT_V1_UNCONDITIONAL_MODE"
}
wandb: Currently logged in as: ben10gregg. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.17.0 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.14.1
wandb: Run data is saved locally in /workspace/matfusion/wandb/run-20240514_074442-i7xmxhrl
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run tough-snowflake-12
wandb: ⭐️ View project at https://wandb.ai/ben10gregg/svbrdf-diffusion
wandb: 🚀 View run at https://wandb.ai/ben10gregg/svbrdf-diffusion/runs/i7xmxhrl
saving checkpoints to checkpoints/brdf16
saving test results to Weights & Biases
creating noise model
initing niose model
creating noise model train state
replicating train state
  0%|                                                                                                                                                                | 0/7154800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/workspace/matfusion/train.py", line 178, in <module>
    train.training_step(batch, train_vis)
  File "/workspace/matfusion/matfusion_jax/model.py", line 743, in training_step
    self.gen_state, loss_info = generator_step_impl(
  File "/workspace/matfusion/matfusion_jax/model.py", line 185, in generator_step_impl
    loss_keys = jax.random.split(key, num=x.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
  File "/workspace/matfusion/train.py", line 178, in <module>
    train.training_step(batch, train_vis)
  File "/workspace/matfusion/matfusion_jax/model.py", line 743, in training_step
    self.gen_state, loss_info = generator_step_impl(
  File "/workspace/matfusion/matfusion_jax/model.py", line 185, in generator_step_impl
    loss_keys = jax.random.split(key, num=x.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'

This is my train_rasterized.yml file:

resolution: 256
dirs:
  - svbrdfs: 8xsorted
    svbrdf_mode: "part/name_part"
    texture_gamma: 1.0
    count: 143096
loader: !Rasterized
  # we randomize FOV with the camera distance sampled from a gamma distribution
  distance: !gamma_sampled [2.0, 2.0]

with datasets/8xsorted and datasets/test_svbrdfs in place.
