
GazeTransformer: Egocentric Gaze Forecasting with Transformers

This repository contains the source code for my master's thesis. The checkpoints used for the comparison in the evaluation can be found in this folder.

Citation

Tim Rolff, H. Matthias Harms, Frank Steinicke, Simone Frintrop: GazeTransformer: Gaze Forecasting for Virtual Reality Using Transformer Networks. In: Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer. [PDF]

Abstract

During the last decade, convolutional neural networks have become the state-of-the-art approach for many computer vision problems. Recent publications in natural language processing boost the state-of-the-art performance for sequence-to-sequence models significantly by applying a novel Transformer architecture based on self-attention. Recently, researchers applied Transformers to computer vision tasks, such as object detection, image completion, and saliency prediction, competing with the state-of-the-art.

Human gaze information in virtual reality is essential for many applications, such as gaze-contingent rendering or eye movement-based interactions. By defining gaze forecasting as a time-series prediction problem, we propose a novel Transformer-based architecture, called GazeTransformer, forecasting users' gaze in dynamic virtual reality environments. Based on provided raw data, we generated an unfiltered dataset containing all gaze behavior and compared GazeTransformer to two state-of-the-art methods for gaze forecasting. Further, we evaluated different image encodings, enabling us to combine data from different sources in virtual reality, building a time-dependent sequence. As a result, GazeTransformer improved the baseline, using the current gaze for the prediction, by 8.2% (from a mean error of 3.67° to 3.37°). Further, GazeTransformer beat the prior state-of-the-art significantly (3.37° vs. 7.04° mean error), tested on the generated dataset containing all gaze behavior.
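As a rough illustration of the numbers in the abstract, the following sketch shows a current-gaze baseline, an angular error measure, and the relative improvement computed from the two reported mean errors. The function names are hypothetical, and representing gaze as 3D direction vectors is an assumption made for this example; the repository and dataset may use a different representation (for instance, angular screen coordinates).

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors (assumed representation)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def current_gaze_baseline(gaze_history):
    """Baseline: predict that the future gaze equals the most recent sample."""
    return gaze_history[-1]


def relative_improvement(baseline_error, model_error):
    """Fraction by which the model error is lower than the baseline error."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical gaze samples, used only to exercise the helpers.
history = [(0.0, 0.0, 1.0), (0.05, 0.0, 1.0)]
future = (0.1, 0.0, 1.0)
baseline_error = angular_error_deg(current_gaze_baseline(history), future)
print(f"baseline error on toy sample: {baseline_error:.2f} degrees")

# Mean errors reported in the abstract: 3.67 degrees (baseline) vs. 3.37 degrees.
print(f"relative improvement: {relative_improvement(3.67, 3.37):.1%}")
```

The last line reproduces the 8.2% figure from the two mean errors stated above; the toy samples are invented and say nothing about the real dataset.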

Usage

Requirements

The requirements are listed in requirements.txt.

Dataset

Step 1: Download the dataset from the FixationNet project homepage: https://cranehzm.github.io/FixationNet

Step 2: Place the dataset in the ./dataset folder, e.g. ./dataset/rawData and ./dataset/dataset.

Step 3: Generate our unfiltered dataset. Either run ./dataloader/generate.py or run specific scripts in ./dataloader/generation/.
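Before running the generation step, it may help to confirm that the folders from Step 2 are in place. The sketch below is a hypothetical helper, not part of the repository; it assumes only the two subfolder names mentioned above (rawData and dataset) and does not check their contents.

```python
from pathlib import Path

# Subfolder names taken from Step 2; anything deeper is not checked here.
EXPECTED_SUBFOLDERS = ("rawData", "dataset")


def missing_dataset_folders(root="./dataset"):
    """Return the expected subfolders that do not exist under root."""
    root_path = Path(root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root_path / name).is_dir()]


missing = missing_dataset_folders()
if missing:
    print("Missing dataset folders:", ", ".join(missing))
else:
    print("Dataset folders found; ./dataloader/generate.py can be run next.")
```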

Training and Evaluation

Run the train.*.py and test.*.py scripts. The models can be found in the ./model folder. The checkpoints are stored in the same folder. ./scripts contains all scripts used during the evaluation of the thesis. ./eval_video.py generates videos for qualitative analysis.
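The script names follow the patterns train.*.py and test.*.py, so a small helper can list what is available. This is only a sketch: it assumes the scripts sit in the repository root and can be started without further arguments, neither of which is confirmed here, and `find_scripts` is a hypothetical name.

```python
from pathlib import Path


def find_scripts(repo_root=".", patterns=("train.*.py", "test.*.py")):
    """Return the sorted names of scripts in repo_root that match the given patterns."""
    root = Path(repo_root)
    return sorted(path.name for pattern in patterns for path in root.glob(pattern))


# Print one candidate command per discovered script (assumes no required arguments).
for script_name in find_scripts():
    print(f"python {script_name}")
```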
