GazeTransformer: Egocentric Gaze Forecasting with Transformers

This repository contains the source code for my master's thesis. The checkpoints used for the comparison in the evaluation can be found in this folder.

Citation

Tim Rolff, H. Matthias Harms, Frank Steinicke, Simone Frintrop: GazeTransformer: Gaze Forecasting for Virtual Reality Using Transformer Networks. In: Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer. [PDF]

Abstract

During the last decade, convolutional neural networks have become the state-of-the-art approach for many computer vision problems. Recent publications in natural language processing boost the state-of-the-art performance for sequence-to-sequence models significantly by applying a novel Transformer architecture based on self-attention. Recently, researchers applied Transformers to computer vision tasks, such as object detection, image completion, and saliency prediction, competing with the state-of-the-art.

Human gaze information in virtual reality is essential for many applications, such as gaze-contingent rendering or eye movement-based interactions. By defining gaze forecasting as a time-series prediction problem, we propose a novel Transformer-based architecture, called GazeTransformer, forecasting users' gaze in dynamic virtual reality environments. Based on provided raw data, we generated an unfiltered dataset containing all gaze behavior and compared GazeTransformer to two state-of-the-art methods for gaze forecasting. Further, we evaluated different image encodings, enabling us to combine data from different sources in virtual reality, building a time-dependent sequence. As a result, GazeTransformer improved the baseline, using the current gaze for the prediction, by 8.2% (from a mean error of 3.67° to 3.37°). Further, GazeTransformer beat the prior state-of-the-art significantly (3.37° vs. 7.04° mean error), tested on the generated dataset containing all gaze behavior.
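The 8.2% figure in the abstract is a relative reduction of the mean angular error. As a quick sanity check, the arithmetic can be sketched as follows; the function name `relative_improvement` is hypothetical and not part of this repository.

```python
def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Return the fractional error reduction of the model relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# Mean angular errors in degrees, as reported in the abstract:
# 3.67 for the current-gaze baseline, 3.37 for GazeTransformer.
improvement = relative_improvement(3.67, 3.37)
print(f"{improvement:.1%}")  # prints "8.2%"
```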

Usage

Requirements

The requirements are listed in requirements.txt.
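One common way to install them is through pip, using the same interpreter that will run the scripts. The sketch below only builds and prints the command; the helper name `pip_install_command` is hypothetical, and it assumes requirements.txt is a standard pip requirements file in the current directory.

```python
import sys
from pathlib import Path


def pip_install_command(requirements_file="requirements.txt"):
    """Build a pip command that installs from the given requirements file."""
    # Calling pip via `python -m pip` ties the install to this interpreter.
    return [sys.executable, "-m", "pip", "install", "-r", str(Path(requirements_file))]


command = pip_install_command()
print(" ".join(command))
```

To actually perform the installation, the list can be passed to `subprocess.run(command, check=True)`.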

Dataset

Step 1: Download the dataset from the FixationNet project homepage: https://cranehzm.github.io/FixationNet

Step 2: Place the dataset in the ./dataset folder, for example as ./dataset/rawData and ./dataset/dataset.

Step 3: Generate our unfiltered dataset. Either run ./dataloader/generate.py or run the individual scripts in ./dataloader/generation/.
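Before running the generation in Step 3, it can help to confirm that the folders from Step 2 are in place. The following is a minimal sketch, assuming the two sub-folder names given above; `missing_dataset_dirs` is a hypothetical helper, not a function from this repository, and it does not inspect the contents of the folders.

```python
from pathlib import Path

# Sub-folders named in Step 2; their internal layout is not checked here.
EXPECTED_DIRS = ("rawData", "dataset")


def missing_dataset_dirs(dataset_root="./dataset"):
    """Return the expected dataset sub-folders that do not exist yet."""
    root = Path(dataset_root)
    return [str(root / name) for name in EXPECTED_DIRS if not (root / name).is_dir()]


missing = missing_dataset_dirs()
if missing:
    print("Missing dataset folders:", ", ".join(missing))
else:
    print("Dataset folders found; ./dataloader/generate.py can be run next.")
```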

Training and Evaluation

Run the train.*.py scripts for training and the test.*.py scripts for evaluation. The models can be found in the ./model folder, and the checkpoints are stored in the same folder. ./scripts contains all scripts used during the evaluation of the thesis. ./eval_video.py generates videos for qualitative analysis.
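To see which training and test scripts are available, the naming patterns above can be used as glob patterns. This is a small sketch under the assumption that the scripts sit directly in the repository root; `find_scripts` is a hypothetical helper, and any command-line arguments the scripts may expect are not covered here.

```python
from pathlib import Path


def find_scripts(repo_root=".", prefixes=("train", "test")):
    """Map each prefix to the sorted script names matching <prefix>.*.py."""
    root = Path(repo_root)
    return {
        prefix: sorted(path.name for path in root.glob(f"{prefix}.*.py"))
        for prefix in prefixes
    }


# List the scripts found in the current directory, grouped by prefix.
for prefix, names in find_scripts().items():
    print(prefix, names)
```

Each listed script can then be started with the Python interpreter from the environment in which the requirements were installed.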
