
ISL Mobile Gaze Experiment

This is a replication of the work published here, carried out by the RPI Intelligent Systems Lab.

The goal of this research is to produce a practical system for tracking the 2D position of the user's gaze on a mobile device screen. It uses an appearance-based method that employs a CNN operating on images of the user's eyes and face taken by the device's front-facing camera. Nominally, it achieves ~2 cm net accuracy, as measured on the GazeCapture validation dataset.
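The "~2 cm net accuracy" figure is a mean Euclidean distance between predicted and true on-screen gaze points, the metric used for GazeCapture-style evaluation. As a hedged sketch (the function name and the flat tuple representation are illustrative, not from this codebase), the metric can be computed like this:

```python
import math

def mean_gaze_error_cm(predictions, ground_truth):
    """Mean Euclidean distance (cm) between predicted and true 2D gaze points.

    Each point is an (x, y) tuple in centimeters, following the
    camera-relative coordinate convention of the GazeCapture dataset.
    """
    assert len(predictions) == len(ground_truth)
    total = 0.0
    for (px, py), (tx, ty) in zip(predictions, ground_truth):
        total += math.hypot(px - tx, py - ty)
    return total / len(predictions)

# Example: two predictions, each 2 cm off along one axis.
print(mean_gaze_error_cm([(0.0, 2.0), (3.0, 0.0)],
                         [(0.0, 0.0), (1.0, 0.0)]))  # 2.0
```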

Using this Code

The code available here can be used for two basic functions. The first is constructing and training the CNN model. The second is running a simple demo using an already-trained model. The demo works by creating a simple server that clients can pass images to, and which sends back the estimated gaze point. There is also a demonstration Android app that acts as a client, which we intend to make available in the future.
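The client/server exchange described above can be sketched in miniature. This is NOT the demo's actual protocol: the wire format (one JSON message per connection, image payload abstracted to a string) and every name below are assumptions for illustration, with the trained CNN replaced by a stub.

```python
import json
import socket
import threading

def handle_client(conn, estimate_gaze):
    # Hypothetical wire format: one JSON request per connection carrying the
    # image, one JSON reply with the estimated (x, y) gaze point.
    with conn:
        request = json.loads(conn.recv(65536).decode())
        x, y = estimate_gaze(request["image"])
        conn.sendall(json.dumps({"x": x, "y": y}).encode())

# Stand-in for the trained CNN: always predicts a fixed screen point.
fake_model = lambda image: (2.5, 4.0)

server_sock = socket.socket()
server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server_sock.listen(1)
port = server_sock.getsockname()[1]

worker = threading.Thread(
    target=lambda: handle_client(server_sock.accept()[0], fake_model))
worker.start()

# Client side: send an "image" and read back the estimated gaze point.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(json.dumps({"image": "raw-image-bytes"}).encode())
    reply = json.loads(client.recv(1024).decode())
worker.join()
server_sock.close()
print(reply)  # {'x': 2.5, 'y': 4.0}
```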

Training the Model

Training the model requires TensorFlow, Keras, and Rhodopsin. Practically, it also requires a GPU, preferably one with at least 8 GB of VRAM.

Building the Dataset

Training the model requires the GazeCapture dataset, which must be converted to the TFRecords format before it can be used for training. Once the dataset has been downloaded, there is an included script to perform this conversion:

~$ ./process_gazecap.py dataset_dir output_dir

The first argument is the path to the root directory of the GazeCapture dataset that you downloaded. The second argument is the path to the output directory where you want the TFRecords files to be located.

This script will create three TFRecords files in the output directory: One for the training data, one for the testing data, and one for the validation data.
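A quick way to confirm the conversion succeeded is to check that all three split files were produced. The exact filenames are chosen by process_gazecap.py, so the `train`/`test`/`val` substrings and the `.tfrecord` extension assumed below are illustrative only:

```python
import os
import tempfile

def find_split_files(output_dir):
    """Group the TFRecords files in output_dir by dataset split.

    Assumes (hypothetically) that each filename contains its split name
    and ends in .tfrecord; adjust to whatever process_gazecap.py emits.
    """
    splits = {"train": None, "test": None, "val": None}
    for name in os.listdir(output_dir):
        if not name.endswith(".tfrecord"):
            continue
        for split in splits:
            if split in name:
                splits[split] = name
    missing = [s for s, f in splits.items() if f is None]
    if missing:
        raise FileNotFoundError("missing TFRecords for splits: %s" % missing)
    return splits

# Example with a mocked-up output directory.
with tempfile.TemporaryDirectory() as out:
    for name in ("gazecap_train.tfrecord", "gazecap_test.tfrecord",
                 "gazecap_val.tfrecord"):
        open(os.path.join(out, name), "w").close()
    print(find_split_files(out))
```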

For convenience, this script can be run in the same Docker container as the actual training. (See below.)

Performing the Training

By far, the easiest way to train the model is to use the pre-built Docker container. There is a script included that will automatically pull this container and open an interactive shell:

~$ ./start_train_container.sh

Note that this requires both Docker and nvidia-docker to be installed on your local machine.

The script automatically bind-mounts the repository directory to the isl_gazecapture directory inside the container. Once inside this directory, training can be initiated as follows:

~$ ./train_gazecap.py train_dataset test_dataset

The first argument is the path to the TFRecords file containing the training dataset. Likewise, the second argument is the path to the file containing the testing dataset.

Configuring Training

Many of the attributes that pertain to training are set in constants defined at the top of the train_gazecap.py file. These can be modified at will, and have comments documenting their functions.

Additionally, settings that are common to both the training procedure and the demo server are located in itracker/common/config.py. Most likely, the only parameter here that you might want to modify is NET_ARCH. This parameter points to the class of the network to train. (Different classes are defined in network.py for different network architectures.) This can be changed to any of the classes specified in network.py in order to experiment with alternative architectures.
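The NET_ARCH mechanism amounts to a config constant that holds a class rather than an instance. The sketch below shows the pattern only; the class names are hypothetical, not the actual ones defined in network.py:

```python
# Hypothetical sketch of the NET_ARCH pattern: the config module exposes a
# constant that points to a network class, and the training code instantiates
# whatever class the constant currently names.

class LargeVggNetwork:
    """Illustrative stand-in for one architecture class in network.py."""
    def build(self):
        return "large-vgg graph"

class MobileNetwork:
    """Illustrative stand-in for an alternative architecture class."""
    def build(self):
        return "mobile graph"

# Equivalent of editing NET_ARCH in itracker/common/config.py:
NET_ARCH = MobileNetwork

# The training script only references the constant, so swapping
# architectures requires no other code changes.
model = NET_ARCH().build()
print(model)  # mobile graph
```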

Training a Saved Model

You can start training from a pre-trained model if you wish. This is done by passing the HDF5 file containing the saved model weights to the training script via the -m parameter.

Note that the architecture of a saved model cannot be detected automatically, so make sure that the current value of NET_ARCH matches the saved model's architecture.

Validating a Trained Model

You may wish to evaluate the accuracy of the model on the GazeCapture validation dataset. This dataset is generated automatically by the conversion script, but is not used during the training procedure.

In order to validate a saved model, specify the location of the validation dataset using the -v option. (Note that you will also have to specify the saved model with the -m option for this to work.)
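Putting the options together, the command line described above can be modeled with argparse. The positional arguments and the -m/-v flags follow this README; everything else (long option names, the dependency check) is an assumption about how train_gazecap.py might wire them up:

```python
import argparse

# Hedged sketch of the train_gazecap.py command line; only the positional
# arguments and the -m/-v flags come from the README.
parser = argparse.ArgumentParser(description="Train or validate the gaze model.")
parser.add_argument("train_dataset", help="TFRecords file with training data")
parser.add_argument("test_dataset", help="TFRecords file with testing data")
parser.add_argument("-m", "--model", help="saved HDF5 weights to start from")
parser.add_argument("-v", "--valid", help="TFRecords file with validation data")

# A validation run needs both the saved model (-m) and the validation set (-v).
args = parser.parse_args(
    ["train.tfrecord", "test.tfrecord", "-m", "weights.h5", "-v", "val.tfrecord"])
if args.valid and not args.model:
    parser.error("-v requires a saved model specified via -m")
print(args.model, args.valid)  # weights.h5 val.tfrecord
```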

Running the Demo

TODO (djpetti): Write this section.

isl-gazecapture's People

Contributors

djpetti, moutaigua8183


isl-gazecapture's Issues

An error occurred during compilation: "incompatible types: void cannot be converted to int"

Hi Daniel,

I ran into an error while compiling the Android app. Could you please advise? Thank you!

====================================
Android Studio: "error: incompatible types: void cannot be converted to int"

GameMainActivity.java
// expects an int return value
gridPos = drawHandler.draw22ClassifiedResult();

DrawHandler.java
// but the return type is void
public void draw22ClassifiedResult()

draw33ClassifiedResult() and draw34ClassifiedResult() also have the same problem.

Build command failed.
Error while executing process D:\software\sdk\cmake\3.6.4111459\bin\cmake.exe with arguments {--build D:\isl\isl-gazecapture\android_app\openCVLibrary.externalNativeBuild\cmake\release\x86_64 --target MobileGazeNative}

ninja: error: '/Users/mou/Documents/AndroidStudioProjects/ISL_Android_Gaze/app/src/main/jniLibs/x86_64/libopencv_java3.so', needed by 'D:/isl/isl-gazecapture/android_app/openCVLibrary/build/intermediates/cmake/release/obj/x86_64/libMobileGazeNative.so', missing and no known rule to make it
