Project Page | Paper | Video
This repository contains the official implementation of the paper Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations by Vadim Tschernezki, Iro Laina, Diane Larlus and Andrea Vedaldi. Published at 3DV 2022 as an oral presentation.
We provide the code for the experiments in the NeRF-N3F setting. NOTE: The repository currently contains the settings for the flower dataset (see the flower video above). We will add the rendering settings for the other scenes in the coming days.
Abstract: We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene. Given an image feature extractor, for example pre-trained using self-supervision, N3F uses it as a teacher to learn a student network defined in 3D space. The 3D student network is similar to a neural radiance field that distills said features and can be trained with the usual differentiable rendering machinery. As a consequence, N3F is readily applicable to most neural rendering formulations, including vanilla NeRF and its extensions to complex dynamic scenes. We show that our method not only enables semantic understanding in the context of scene-specific neural fields without the use of manual labels, but also consistently improves over the self-supervised 2D baselines. This is demonstrated by considering various tasks, such as 2D object retrieval, 3D segmentation, and scene editing, in diverse sequences, including long egocentric videos in the EPIC-KITCHENS benchmark.
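The core mechanism described above — volume-rendering the 3D student's features into a 2D view and regressing them onto the frozen 2D teacher's features — can be sketched as follows. This is a minimal NumPy illustration of the idea only, not the paper's implementation (which trains the field end-to-end with PyTorch autograd); all names and shapes are illustrative.

```python
import numpy as np

def render_ray_features(feats_along_ray, weights):
    """Volume-render per-sample features into a single per-ray feature,
    using the same accumulation weights NeRF uses for colour.
    feats_along_ray: (num_samples, feat_dim); weights: (num_samples,)."""
    return (weights[:, None] * feats_along_ray).sum(axis=0)

def distillation_loss(rendered, teacher):
    """L2 distillation target: rendered student features vs. the frozen
    2D teacher's features (e.g. DINO) at the corresponding pixel."""
    return float(((rendered - teacher) ** 2).mean())

# Toy usage: 8 samples along one ray, 4-dimensional features.
feats = np.ones((8, 4))
weights = np.full(8, 1.0 / 8)   # accumulation weights summing to 1
rendered = render_ray_features(feats, weights)
loss = distillation_loss(rendered, np.ones(4))  # 0.0 for a perfect match
```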
We suggest setting up the environment through conda and pip.
- Create and activate the specified conda environment.
- Install the required packages from `requirements.txt`.
```
conda create -n n3f python=3.8
conda activate n3f
pip install -r requirements.txt
```
Since we demonstrate the experiments through a Jupyter notebook, you'll have to install the Jupyter kernel:
```
conda install -c anaconda ipykernel
python -m ipykernel install --user --name=n3f
```
If you are getting the error `CUDA error: no kernel image is available for execution on the device`, try installing PyTorch built against a different CUDA version, e.g.:

```
pip install torch==1.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
```
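To see which wheel you actually have, you can print the CUDA version PyTorch was built with and your GPU's compute capability; a mismatch between the two is the usual cause of the error above. (This only prints the relevant versions — which wheel you need still depends on your driver and GPU.)

```python
import torch

# The "no kernel image" error usually means the installed PyTorch wheel
# was not compiled for this GPU's architecture.
print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    # e.g. (8, 6) for an RTX 30-series GPU
    print("GPU compute capability:", torch.cuda.get_device_capability(0))
else:
    print("no CUDA device visible to PyTorch")
```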
The dataset and pretrained models can be found on Google Drive.
Download both files `logs.tar.gz` and `data.tar.gz` and extract them into the main directory. The checkpoints are located in the `logs` directory. The `data` directory contains the flower scene along with the DINO features extracted for this scene and for the remaining scenes shown in the paper. This allows you to train your own models once you have downloaded the NeRF checkpoints and datasets for the remaining scenes.
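As a quick sanity check after extraction, you can verify the expected top-level layout. A small helper sketch — `logs` and `data` are the directory names described above, everything else is illustrative:

```python
import os

def missing_dirs(root="."):
    """Return which of the expected top-level directories ("logs" for
    checkpoints, "data" for scenes and DINO features) are missing."""
    expected = ("logs", "data")
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]

# Example: warn if the archives were not extracted into the main directory.
for d in missing_dirs():
    print(f"warning: '{d}/' not found - did you extract the archive here?")
```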
We provide a notebook that contains the code to reproduce the results for the flower scene. The other scenes will be added in the coming days.
First, you can visualise the selected patch and calculate a histogram for the query feature vector vs. the retrieval vectors. This allows you to select a threshold for the scene decomposition in the next step.
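The threshold-selection step can be illustrated as follows: compute the distances between the query feature vector and all retrieval features, then inspect their histogram for a gap separating the object mode from the background mode. A NumPy sketch under assumed shapes, not the notebook's actual code:

```python
import numpy as np

def distance_histogram(query, feats, bins=50):
    """Histogram of L2 distances between one query feature vector and a
    set of retrieval features. A valley between two modes in this
    histogram suggests a threshold for the scene decomposition.
    query: (feat_dim,); feats: (num_feats, feat_dim)."""
    dists = np.linalg.norm(feats - query, axis=-1)
    hist, edges = np.histogram(dists, bins=bins)
    return dists, hist, edges

# Toy usage: two well-separated feature clusters of dimension 8.
feats = np.zeros((100, 8))
feats[50:] = 1.0                       # second cluster, far from the query
query = np.zeros(8)
dists, hist, edges = distance_histogram(query, feats, bins=10)
```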
After that, you can render the source view and the decomposed target view, which shows the complete image, a version that includes only the queried object, and another version that excludes it.
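A minimal sketch of that decomposition, assuming you already have per-pixel feature distances to the query; the masking and compositing here are illustrative, the notebook may composite differently:

```python
import numpy as np

def decompose(rgb, feat_dists, threshold):
    """Split a rendered image into object-only and background-only views
    by thresholding each pixel's feature distance to the query.
    rgb: (H, W, 3) in [0, 1]; feat_dists: (H, W)."""
    mask = feat_dists < threshold                        # pixels matching the query
    object_only = np.where(mask[..., None], rgb, 1.0)    # background -> white
    background_only = np.where(mask[..., None], 1.0, rgb)  # object -> white
    return mask, object_only, background_only
```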
Finally, we can also compare the PCA-reduced features and feature distance maps of NeRF-N3F + DINO vs. vanilla DINO:
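PCA visualisation of this kind boils down to projecting each pixel's high-dimensional feature onto the top three principal components and mapping them to RGB. An illustrative NumPy version; the notebook's normalisation may differ:

```python
import numpy as np

def pca_to_rgb(feats):
    """Project (H, W, D) features onto their top-3 principal components
    and rescale each channel to [0, 1] for visualisation."""
    h, w, d = feats.shape
    flat = feats.reshape(-1, d)
    flat = flat - flat.mean(axis=0)          # centre before PCA
    # Principal directions via SVD of the centred feature matrix.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T                   # (H*W, 3)
    proj = (proj - proj.min(axis=0)) / (np.ptp(proj, axis=0) + 1e-8)
    return proj.reshape(h, w, 3)
```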
If you found our code or paper useful, please cite our work as follows:
```
@inproceedings{tschernezki22neural,
  author    = {Vadim Tschernezki and Iro Laina and
               Diane Larlus and Andrea Vedaldi},
  booktitle = {Proceedings of the International Conference
               on {3D} Vision (3DV)},
  title     = {Neural Feature Fusion Fields: {3D} Distillation
               of Self-Supervised {2D} Image Representations},
  year      = {2022}
}
```
We suggest checking out the concurrent work by Kobayashi et al. They propose fusing features in the same manner; the two works mainly differ in the example applications, including the use of multiple modalities, such as text, image patches and point-and-click seeds, to generate queries for segmentation and, in particular, scene editing.
Our implementation is based on this (unofficial pytorch-NeRF) repository.