
emip-toolkit's Introduction


πŸ‘€ Eye Movement In Programming Toolkit (EMTK)

EMIP-Toolkit (EMTK): A Python Library for Processing Eye Movement in Programming Data

The use of eye tracking in the study of program comprehension in software engineering allows researchers to gain a better understanding of the strategies and processes applied by programmers. Despite the large number of eye tracking studies in software engineering, very few datasets are publicly available.

πŸ’Ύ Datasets:

The toolkit has evolved to include the following datasets:

  1. EMIP2020: Bednarik, Roman, et al. "EMIP: The eye movements in programming dataset." Science of Computer Programming 198 (2020): 102520.
  2. AlMadi2018: Al Madi, Naser, and Javed Khan. "Constructing semantic networks of comprehension from eye-movement during reading." 2018 IEEE 12th International Conference on Semantic Computing (ICSC). IEEE, 2018.
  3. McChesney2021: McChesney, Ian, and Raymond Bond. "Eye Tracking Analysis of Code Layout, Crowding and Dyslexia-An Open Data Set." ACM Symposium on Eye Tracking Research and Applications. 2021.
  4. AlMadi2021: Al Madi, Naser, et al. "EMIP Toolkit: A Python Library for Customized Post-processing of the Eye Movements in Programming Dataset." ACM Symposium on Eye Tracking Research and Applications. 2021.

We would be happy to include more eye movement datasets. If you have any suggestions, please contact us.

πŸŽ₯ Presentation:

Watch the video and read our paper.

βš™οΈ Features:

The toolkit is designed to make using and processing eye movement in programming datasets easier and more accessible by providing the following functions (a usage sketch follows this list):

  • Parsing raw data files from existing datasets into pandas dataframes.

  • Customizable fixation detection algorithms.

  • Raw data and filtered data visualizations for each trial.

  • Hit testing between fixations and AOIs to determine the fixations over each AOI.

  • Customizable offset-based fixation correction implementation for each trial.

  • Customizable Areas of Interest (AOIs) mapping implementation at the line level or token level in source code for each trial.

  • Visualizing AOIs before and after overlaying fixations on the code stimulus.

  • Mapping source code tokens to generated AOIs and eye movement data.

  • Adding source code lexical category tags to eye movement data using srcML. srcML is a static analysis tool and data format that provides very accurate syntactic categories (method signatures, parameters, function names, method calls, declarations and so on) for source code. We use it to enhance the eye movements dataset to enable better querying capabilities.
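For example, a typical session looks roughly like this (a minimal sketch: the EMIP_dataset call and the dataset paths are assumptions based on the tutorial notebooks, while the draw_trial call mirrors the one quoted in the issues below):

    import emip_toolkit as emtk

    # Parse raw eye tracking files into Experiment/Trial objects (hypothetical path).
    EMIP = emtk.EMIP_dataset('./datasets/EMIP2020/rawdata/', 10)

    subject_ID = '106'
    trial_num = 2
    image_path = './datasets/EMIP2020/stimuli/'

    # Draw filtered fixations and auto-detected AOIs over the code stimulus.
    EMIP[subject_ID].trial[trial_num].draw_trial(image_path, draw_raw_data=False,
                                                 draw_fixation=True, draw_saccade=False,
                                                 draw_number=True, draw_aoi=True)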

✍️ Examples and tutorial:

The Jupyter Notebook files contain examples and a tutorial on using the EMTK with each dataset.

πŸ“ Please Cite This Paper:

Naser Al Madi, Drew T. Guarnera, Bonita Sharif, and Jonathan I. Maletic. 2021. EMIP Toolkit: A Python Library for Customized Post-processing of the Eye Movements in Programming Dataset. In ETRA ’21: 2021 Symposium on Eye Tracking Research and Applications (ETRA ’21 Short Papers), May 25–27, 2021, Virtual Event, Germany. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3448018.3457425


emip-toolkit's Issues

Add a dynamic integration of srcML into the add_srcML function

The add_srcML function currently uses pre-generated srcML files for the EMIP dataset code; it does not generate srcML tags for arbitrary pieces of code.

It would be great to integrate srcML into the tool so that it is called automatically (behind the scenes) to generate the srcML tags for any code, then add the tags to the dataframe.

This means we would add srcML as a dependency, so let's see if we can do this in an easy way. It is not clear whether srcML is installable through pip or similar; if it is not, this might create problems for our automated Actions testing.

A good starting point is the srcML website, to understand the tool and how it works: https://www.srcml.org/
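As a starting point for the implementation, one possible direction (a rough sketch, not a committed design) is to shell out to the srcml command-line tool, which prints the srcML XML for a source file to standard output:

    import subprocess

    def generate_srcml(source_file):
        """Return the srcML (XML) annotation for a source code file.
        Assumes the srcml binary is installed and on PATH, which is
        exactly the packaging/CI concern raised above."""
        result = subprocess.run(["srcml", source_file],
                                capture_output=True, text=True, check=True)
        return result.stdout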

Add datasets in a directory called "datasets"

Initially, there was only the EMIP dataset. After we added a dataset from the EyeLink 1000, we found out that the organization of each dataset is different. We now want to keep all datasets in one directory called "datasets", with a separate folder for each.

In future development, we will write up instructions for importing datasets.

Parse samples into dataframe / a list of objects instead of list

The samples field of the Trial class stores the raw samples from datasets. The field is currently a list of samples, with each sample represented by another list. It should instead be a dataframe, with each row corresponding to one sample, or a list of objects, with each object corresponding to one sample. That way, it is clearer what features each sample has.

Representing each sample as a list can lead to the use of magic numbers to access a sample's information. An example can be seen below:

if self.eye_tracker == "SMIRed250":
    for sample in self.samples:
        # Invalid records
        if len(sample) > 5:
            x_cord = float(sample[23])
            y_cord = float(sample[24])  # - 150
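For reference, a minimal sketch of the proposed direction, assuming the raw samples are a list of lists and that columns 23 and 24 hold the coordinates (all other column names below are placeholders):

    import pandas as pd

    # Name the columns once, then access them by name instead of magic indices.
    columns = ["field_%d" % i for i in range(26)]    # placeholder names
    columns[23], columns[24] = "x_cord", "y_cord"

    samples_df = pd.DataFrame(raw_samples, columns=columns)
    x_cord = samples_df["x_cord"].astype(float)
    y_cord = samples_df["y_cord"].astype(float)

Here raw_samples stands in for the current self.samples list.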

Generate a synthetic set of fixations and eye movements

For the sake of testing ideas like fixation correction, it would be great to be able to generate "synthetic" eye movements, either according to some model or completely at random.

This can be used for testing code and demonstrating features as well.
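A minimal sketch of the completely random variant (the (onset_ms, duration_ms, x, y) tuple layout is an assumption, not an existing EMTK format):

    import random

    def synthetic_fixations(n=50, width=1024, height=768, seed=None):
        """Generate n random fixations as (onset_ms, duration_ms, x, y) tuples."""
        rng = random.Random(seed)
        t, fixations = 0, []
        for _ in range(n):
            duration = rng.randint(50, 600)      # plausible fixation durations in ms
            fixations.append((t, duration, rng.uniform(0, width), rng.uniform(0, height)))
            t += duration + rng.randint(10, 80)  # saccade gap before the next fixation
        return fixations

Model-based generators (e.g., line-by-line reading over AOIs) could follow the same interface.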

Name Issue for Parser

We wanted to be more specific with the names of two parsers, so we decided to change the names from read_FileType to read_EyeTrackerName and add the file type as a parameter.

Enhancement - Save image background size as a field of Experiment class

In the draw_trial method of the Trial class:

def draw_trial(self, image_path, draw_raw_data=False, draw_fixation=True, draw_saccade=False, draw_number=False,
               draw_aoi=None, save_image=None):
    """Draws the trial image and raw data/fixations over the image.
    Circle size indicates fixation duration.

    image_path : str
        path for trial image file.
    draw_raw_data : bool, optional
        whether user wants raw data drawn.
    draw_fixation : bool, optional
        whether user wants filtered fixations drawn.
    draw_saccade : bool, optional
        whether user wants saccades drawn.
    draw_number : bool, optional
        whether user wants to draw eye movement numbers.
    draw_aoi : pandas.DataFrame, optional
        Areas of Interest.
    save_image : str, optional
        path to save the image; the image is saved to this path if the parameter exists.
    """
    im = Image.open(image_path + self.image)

    if self.eye_tracker == "EyeLink1000":
        background_size = (1024, 768)
        background = Image.new('RGB', background_size, color='black')
        *_, width, _ = im.getbbox()
        # offset = int((1024 - width) / 2) - 10
        trial_location = (10, 375)
        background.paste(im, trial_location, im.convert('RGBA'))
        im = background.copy()

    bg_color = find_background_color(im.copy().convert('1'))
    draw = ImageDraw.Draw(im, 'RGBA')

    if draw_aoi and isinstance(draw_aoi, bool):
        aoi = find_aoi(image=self.image, img=im)
        self.__draw_aoi(draw, aoi, bg_color)

    if isinstance(draw_aoi, pd.DataFrame):
        self.__draw_aoi(draw, draw_aoi, bg_color)

    if draw_raw_data:
        self.__draw_raw_data(draw)

    if draw_fixation:
        self.__draw_fixation(draw, draw_number)

    if draw_saccade:
        self.__draw_saccade(draw, draw_number)

    plt.figure(figsize=(17, 15))
    plt.imshow(np.asarray(im), interpolation='nearest')

    if save_image is not None:
        # Save the image with applied offset
        image_name = save_image + \
                     str(self.participant_id) + \
                     "-t" + \
                     str(self.trial_id) + \
                     "-offsetx" + \
                     str(self.get_offset()[0]) + \
                     "y" + \
                     str(self.get_offset()[1]) + \
                     ".png"
        plt.savefig(image_name)
        print(image_name, "saved!")

With the background size:

background_size = (1024, 768)

and the trial location:

trial_location = (10, 375)

I suggest saving them as fields of the Experiment class instead of declaring them arbitrarily, without any context, because the coordinates of the fixations depend on the background size and the trial location.
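Concretely, something along these lines (a sketch only; the constructor shown is not the toolkit's actual signature):

    class Experiment:
        def __init__(self, trials, eye_tracker, filetype,
                     background_size=(1024, 768), trial_location=(10, 375)):
            self.trials = trials
            self.eye_tracker = eye_tracker
            self.filetype = filetype
            self.background_size = background_size  # stimulus canvas in pixels
            self.trial_location = trial_location    # where the stimulus is pasted

draw_trial (and any offset-based fixation correction) could then read these fields instead of hard-coding the values.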

Getter issue for sample number and eye movement number in Trial class

Initially, we had a function get_sample_number that returns the total number of eye movements. Later, we decided to store the raw samples in the Trial class. Thus, this function should now return the number of raw samples, while a new function called get_eye_movement_number can take over the original job.

add_srml_to_AOIs, add_tokens_to_AOIs are not extendable for future datasets

These functions manually match the name of each stimulus with the name of the original code file from which the stimulus was adapted. An example can be seen below (taken from add_tokens_to_AOIs):

EMIP-Toolkit/emip_toolkit.py, lines 1222 to 1245 in d1a7eab:

if image_name == "rectangle_java.jpg":
file_name = "Rectangle.java"
if image_name == "rectangle_java2.jpg":
file_name = "Rectangle.java"
if image_name == "rectangle_python.jpg":
file_name = "Rectangle.py"
if image_name == "rectangle_scala.jpg":
file_name = "Rectangle.scala"
# vehicle files
if image_name == "vehicle_java.jpg":
file_name = "Vehicle.java"
if image_name == "vehicle_java2.jpg":
file_name = "Vehicle.java"
if image_name == "vehicle_python.jpg":
file_name = "vehicle.py"
if image_name == "vehicle_scala.jpg":
file_name = "Vehicle.scala"

This needs to be refactored to make the two functions extendable for future datasets.
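One straightforward refactor (a sketch; STIMULUS_TO_SOURCE is a hypothetical name) is a per-dataset lookup table, so supporting a new dataset only means registering its entries:

    STIMULUS_TO_SOURCE = {
        "rectangle_java.jpg": "Rectangle.java",
        "rectangle_java2.jpg": "Rectangle.java",
        "rectangle_python.jpg": "Rectangle.py",
        "rectangle_scala.jpg": "Rectangle.scala",
        "vehicle_java.jpg": "Vehicle.java",
        "vehicle_java2.jpg": "Vehicle.java",
        "vehicle_python.jpg": "vehicle.py",
        "vehicle_scala.jpg": "Vehicle.scala",
    }

    file_name = STIMULUS_TO_SOURCE.get(image_name)
    if file_name is None:
        raise ValueError("no source file registered for stimulus " + image_name)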

Create web documentation for EMTK

Automated web documentation for EMTK would make it easier to understand its methods and functions. It would also provide a helpful reference for tool users.

draw_trial method - How to paste an image with a transparent background onto a larger black background image

In the draw_trial method of the Trial class (quoted in full in the enhancement issue above), the relevant part is:

im = Image.open(image_path + self.image)
if self.eye_tracker == "EyeLink1000":
    background_size = (1024, 768)
    background = Image.new('RGB', background_size, color='black')
    *_, width, _ = im.getbbox()
    # offset = int((1024 - width) / 2) - 10
    trial_location = (10, 375)
    background.paste(im, trial_location, im.convert('RGBA'))
    im = background.copy()

This line of code pastes an image from the AlMadi 2018 runtime dataset (an image with white text and a transparent background, hereafter referred to as "the image") onto a black background; this behavior is hereafter referred to as "this feature":

background.paste(im, trial_location, im.convert('RGBA'))

1st Question: How does converting the image into RGBA and using it as a mask image manage to achieve this feature?

My expectation is that to achieve this feature, we only need to paste the image on top of the black background without having to use any mask image:

background.paste(im.convert('RGBA'), trial_location)

Because the background of the image is already transparent, and the text is white, contrasting with the black background. However, what I get is a completely white box on a black background.

2nd Question: Why does the line of code I wrote fail to achieve this feature?

Here is the full code I used to test both ways:

    image_path = "EMIP-Toolkit/datasets/AlMadi2018/runtime/images/5667346413132987794.png"
    im = Image.open(image_path)

    background_size = (1024, 768)
    background = Image.new( 'RGBA', background_size, color='black' )

    trial_location = (10, 375)

    # background.paste( im, trial_location, im.convert('RGBA') )
    background.paste( im.convert('RGBA'), trial_location )
    background.save("result.png")
    
    im = background.copy()
    im.show()

Add new dataset - Eye Tracking Analysis of Code Layout, Crowding and Dyslexia - An Open Data Set

Add a parser (or use an existing one, if possible) for reading data from the following dataset: https://dl.acm.org/doi/fullHtml/10.1145/3448018.3457420

Requirements:

  • Create a Jupyter Notebook to show that all functions and methods work with the new dataset.
  • Add the dataset and its reference to the dataset dictionary in the code.
  • Make sure the dataset can be downloaded automatically and unzipped using existing methods.

Complete community profile

EMTK doesn't have a community profile yet, so it is not clear how people can contribute to the open-source project. To help build a community around the tool, we need a few well-written documents. You can contribute these documents by looking up tutorials and checking the repositories of popular open-source projects. The documents we need are:

  1. Code of conduct
  2. Contributing
  3. License
  4. Issue templates
  5. Pull request template

Eliminate inheritance in class design

Initially, we chose inheritance for the class design: every eye movement element was modeled as a superclass, with subclasses for the specific eye movements from various types of eye trackers. However, we think this would make it difficult to add support for more types of eye trackers in future development. Now, @sdotpeng will eliminate inheritance from our program, creating a universal class of eye movements for various eye trackers.

(Bug) In the Jupyter notebook for the EyeLink1000

im = EMIP[subject_ID].trial[trial_num].draw_trial(image_path, draw_raw_data=False, draw_fixation=True, draw_saccade=False, draw_number=True, draw_aoi=True)

When draw_saccade is set to True, it gives an error claiming a font is missing.

Add unit tests

So far we have been using the example notebooks as tests, but it would be much better to develop unit tests for every method in the toolkit. Maybe consider automating the testing process on GitHub to make collaboration and onboarding easier.
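For example, a self-contained pytest sketch of the intended pattern (dispersion here is a stand-in re-implementing the standard I-DT dispersion measure, not the toolkit's actual function):

    def dispersion(window_x, window_y):
        """I-DT dispersion: (max_x - min_x) + (max_y - min_y)."""
        return (max(window_x) - min(window_x)) + (max(window_y) - min(window_y))

    def test_dispersion_of_tight_cluster_is_small():
        xs = [100.0, 100.5, 99.8]
        ys = [200.0, 200.2, 199.9]
        assert dispersion(xs, ys) < 2.0  # a tight cluster stays under a typical threshold

Running pytest in a GitHub Actions workflow would then automate this on every pull request.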

Adapt eye movement classes to empty attributes

Since we decided to remove inheritance and use a universal class for eye movements (#7), we have to adapt the code to allow empty attributes, for situations where one type of eye tracker doesn't record one or more types of eye movement. @sdotpeng is in charge.
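A sketch of the direction (the field names mirror the Fixation constructor quoted in the samples issue below; making every field optional is the assumption):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Fixation:
        trial_id: int
        participant_id: str
        timestamp: int
        duration: int
        x_cord: float
        y_cord: float
        token: str = ""
        pupil: Optional[int] = None  # not recorded by every eye tracker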

Visualization: video reconstruction of a trial

Add a new visualization that generates a video of a trial based on the stimulus image and fixation (and possibly saccade) timestamps. The fixation position should appear as a circle, and the video should play in real time (not faster or slower than the recording).
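A rough sketch with matplotlib.animation, assuming fixations arrive as (onset_ms, duration_ms, x, y) tuples sorted by onset (saving to mp4 requires ffmpeg):

    import matplotlib.pyplot as plt
    from matplotlib import animation
    from PIL import Image

    def reconstruct_trial(image_file, fixations, fps=30, out="trial.mp4"):
        """Render a real-time video; a circle marks the currently active fixation."""
        fig, ax = plt.subplots()
        ax.imshow(Image.open(image_file))
        dot = ax.scatter([], [], s=120, facecolors='none', edgecolors='red')

        def update(frame):
            t = frame * 1000.0 / fps  # playback time in ms
            active = [(x, y) for onset, dur, x, y in fixations
                      if onset <= t < onset + dur]
            dot.set_offsets(active if active else [(float('nan'), float('nan'))])
            return (dot,)

        total_ms = fixations[-1][0] + fixations[-1][1]
        frames = int(total_ms * fps / 1000) + 1
        anim = animation.FuncAnimation(fig, update, frames=frames, blit=True)
        anim.save(out, fps=fps)  # frame count matches recording length, so playback is real time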

Merge draw_trial implementations into one simple function

Since multiple classes are being merged into one, the implementation of draw_trial should not make assumptions about the specific trial it is drawing. Initially, we wanted to create a unified visualization style, but that might not work for every trial, since variations in background colors and style are possible. Instead, we want the draw_trial method to allow the user to customize the visualization with various color and style options.

error in emtk/util/_get_stimuli.py

The dimensions used for pasting the stimulus onto the background are incorrect. Instead of (100, 375), they should be (0, 375), to allow users to see the correct positions of fixations on the text.

Samples variable holds fixations, saccades, and blinks instead of raw samples in Al Madi 2018 dataset

In the read_EyeLink1000 function, fixations, saccades, and blinks were parsed into the samples variable. An example can be seen below:

if token[0] == "EFIX":
    timestamp = int(token[2])
    duration = int(token[4])
    x_cord = float(token[5])
    y_cord = float(token[6])
    pupil = int(token[7])

    fixations[count] = Fixation(trial_id=trial_id,
                                participant_id=participant_id,
                                timestamp=timestamp,
                                duration=duration,
                                x_cord=x_cord,
                                y_cord=y_cord,
                                token="",
                                pupil=pupil)

    samples.append('EFIX' + ' '.join(token))

The same token used to populate the fields of the Fixation object was also appended to samples. If the Al Madi 2018 dataset does not have raw samples, the samples variable should be kept empty to avoid any confusion.

not using the variable "sample_duration"

At line 69 of idt_classifier.py, it should be

[timestamp, len(window_x) * sample_duration, statistics.mean(window_x), statistics.mean(window_y)])

not

[timestamp, len(window_x) * 4, statistics.mean(window_x), statistics.mean(window_y)])

The hard-coded 4 assumes a 4 ms sample period (i.e., a 250 Hz tracker) and is wrong for any other sampling rate.
