
emip-toolkit's Introduction


πŸ‘€ Eye Movement In Programming Toolkit (EMTK)

EMIP-Toolkit (EMTK): A Python Library for Processing Eye Movement in Programming Data

The use of eye tracking in the study of program comprehension in software engineering allows researchers to gain a better understanding of the strategies and processes applied by programmers. Despite the large number of eye tracking studies in software engineering, very few datasets are publicly available.

πŸ’Ύ Datasets:

The toolkit has evolved to include the following datasets:

  1. EMIP2020: Bednarik, Roman, et al. "EMIP: The eye movements in programming dataset." Science of Computer Programming 198 (2020): 102520.
  2. AlMadi2018: Al Madi, Naser, and Javed Khan. "Constructing semantic networks of comprehension from eye-movement during reading." 2018 IEEE 12th International Conference on Semantic Computing (ICSC). IEEE, 2018.
  3. McChesney2021: McChesney, Ian, and Raymond Bond. "Eye Tracking Analysis of Code Layout, Crowding and Dyslexia-An Open Data Set." ACM Symposium on Eye Tracking Research and Applications. 2021.
  4. AlMadi2021: Al Madi, Naser, et al. "EMIP Toolkit: A Python Library for Customized Post-processing of the Eye Movements in Programming Dataset." ACM Symposium on Eye Tracking Research and Applications. 2021.

We would be happy to include more eye movement datasets. If you have any suggestions, please contact us.

πŸŽ₯ Presentation:

Watch the video and read our paper.

βš™οΈ Features:

The toolkit is designed to make using and processing eye movement in programming datasets easier and more accessible by providing the following functions (a usage sketch follows this list):

  • Parsing raw data files from existing datasets into pandas dataframes.

  • Customizable fixation detection algorithms.

  • Raw data and filtered data visualizations for each trial.

  • Hit testing between fixations and AOIs to determine the fixations over each AOI.

  • Customizable offset-based fixation correction implementation for each trial.

  • Customizable Areas of Interest (AOIs) mapping implementation at the line level or token level in source code for each trial.

  • Visualizing AOIs before and after overlaying fixations on the code stimulus.

  • Mapping source code tokens to generated AOIs and eye movement data.

  • Adding source code lexical category tags to eye movement data using srcML. srcML is a static analysis tool and data format that provides very accurate syntactic categories (method signatures, parameters, function names, method calls, declarations and so on) for source code. We use it to enhance the eye movements dataset to enable better querying capabilities.
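For example, a typical session looks roughly like this (a minimal sketch: the EMIP_dataset call and the dataset paths are assumptions based on the tutorial notebooks, while the draw_trial call mirrors the one quoted in the issues below):

    import emip_toolkit as emtk

    # Parse raw eye tracking files into Experiment/Trial objects (hypothetical path).
    EMIP = emtk.EMIP_dataset('./datasets/EMIP2020/rawdata/', 10)

    subject_ID = '106'
    trial_num = 2
    image_path = './datasets/EMIP2020/stimuli/'

    # Draw filtered fixations and auto-detected AOIs over the code stimulus.
    EMIP[subject_ID].trial[trial_num].draw_trial(image_path, draw_raw_data=False,
                                                 draw_fixation=True, draw_saccade=False,
                                                 draw_number=True, draw_aoi=True)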

✍️ Examples and tutorial:

The Jupyter Notebook files contain examples and a tutorial on using the EMTK with each dataset.

πŸ“ Please Cite This Paper:

Naser Al Madi, Drew T. Guarnera, Bonita Sharif, and Jonathan I. Maletic. 2021. EMIP Toolkit: A Python Library for Customized Post-processing of the Eye Movements in Programming Dataset. In ETRA ’21: 2021 Symposium on Eye Tracking Research and Applications (ETRA ’21 Short Papers), May 25–27, 2021, Virtual Event, Germany. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3448018.3457425


emip-toolkit's Issues

Add a dynamic integration of srcML into the add_srcML function

The add_srcML function currently uses pre-generated srcML files for the EMIP dataset code; it does not generate srcML tags for arbitrary pieces of code.

It would be great to integrate srcML into the tool so that it is called automatically (behind the scenes) to generate the srcML tags for any code, then add the tags to the dataframe.

This means we would add srcML as a dependency, so let's see if we can do this in an easy way. It is not clear whether srcML is installable through pip or similar; if it is not, this might create problems for our automated Actions testing.

A good starting point is the srcML website, to understand the tool and how it works: https://www.srcml.org/
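As a starting point for the implementation, one possible direction (a rough sketch, not a committed design) is to shell out to the srcml command-line tool, which prints the srcML XML for a source file to standard output:

    import subprocess

    def generate_srcml(source_file):
        """Return the srcML (XML) annotation for a source code file.
        Assumes the srcml binary is installed and on PATH, which is
        exactly the packaging/CI concern raised above."""
        result = subprocess.run(["srcml", source_file],
                                capture_output=True, text=True, check=True)
        return result.stdout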

Add datasets in a directory called "datasets"

Initially, there was only the EMIP dataset. After we added a dataset from the EyeLink 1000, we found out that the organization of each dataset is different. We now want to keep all datasets in one directory called "datasets", with a separate folder for each.

In future development, we will write up instructions for importing datasets.

Parse samples into dataframe / a list of objects instead of list

The samples field of the Trial class stores the raw samples from datasets. The field is currently a list of samples, with each sample represented by another list. It should instead be a dataframe, with each row corresponding to one sample, or a list of objects, with each object corresponding to one sample. That way, it is clearer what features each sample has.

Representing each sample as a list can lead to the use of magic numbers to access a sample's information. An example can be seen below:

if self.eye_tracker == "SMIRed250":
    for sample in self.samples:
        # Invalid records
        if len(sample) > 5:
            x_cord = float(sample[23])
            y_cord = float(sample[24])  # - 150
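For reference, a minimal sketch of the proposed direction, assuming the raw samples are a list of lists and that columns 23 and 24 hold the coordinates (all other column names below are placeholders):

    import pandas as pd

    # Name the columns once, then access them by name instead of magic indices.
    columns = ["field_%d" % i for i in range(26)]    # placeholder names
    columns[23], columns[24] = "x_cord", "y_cord"

    samples_df = pd.DataFrame(raw_samples, columns=columns)
    x_cord = samples_df["x_cord"].astype(float)
    y_cord = samples_df["y_cord"].astype(float)

Here raw_samples stands in for the current self.samples list.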

Generate a synthetic set of fixations and eye movements

For the sake of testing ideas like fixation correction, it would be great to be able to generate "synthetic" eye movements, either according to some model or completely at random.

This can be used for testing code and demonstrating features as well.
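A minimal sketch of the completely random variant (the (onset_ms, duration_ms, x, y) tuple layout is an assumption, not an existing EMTK format):

    import random

    def synthetic_fixations(n=50, width=1024, height=768, seed=None):
        """Generate n random fixations as (onset_ms, duration_ms, x, y) tuples."""
        rng = random.Random(seed)
        t, fixations = 0, []
        for _ in range(n):
            duration = rng.randint(50, 600)      # plausible fixation durations in ms
            fixations.append((t, duration, rng.uniform(0, width), rng.uniform(0, height)))
            t += duration + rng.randint(10, 80)  # saccade gap before the next fixation
        return fixations

Model-based generators (e.g., line-by-line reading over AOIs) could follow the same interface.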

Name Issue for Parser

We wanted to be more specific with the names of two parsers, so we decided to change the names from read_FileType to read_EyeTrackerName and add the file type as a parameter.

Enhancement - Save image background size as a field of Experiment class

In the draw_trial method of the Trial class:

def draw_trial(self, image_path, draw_raw_data=False, draw_fixation=True, draw_saccade=False, draw_number=False,
               draw_aoi=None, save_image=None):
    """Draws the trial image and raw data/fixations over the image.
    Circle size indicates fixation duration.

    image_path : str
        path for trial image file.
    draw_raw_data : bool, optional
        whether user wants raw data drawn.
    draw_fixation : bool, optional
        whether user wants filtered fixations drawn.
    draw_saccade : bool, optional
        whether user wants saccades drawn.
    draw_number : bool, optional
        whether user wants to draw eye movement numbers.
    draw_aoi : pandas.DataFrame, optional
        Areas of Interest.
    save_image : str, optional
        path to save the image; the image is saved to this path if the parameter exists.
    """
    im = Image.open(image_path + self.image)

    if self.eye_tracker == "EyeLink1000":
        background_size = (1024, 768)
        background = Image.new('RGB', background_size, color='black')
        *_, width, _ = im.getbbox()
        # offset = int((1024 - width) / 2) - 10
        trial_location = (10, 375)
        background.paste(im, trial_location, im.convert('RGBA'))
        im = background.copy()

    bg_color = find_background_color(im.copy().convert('1'))
    draw = ImageDraw.Draw(im, 'RGBA')

    if draw_aoi and isinstance(draw_aoi, bool):
        aoi = find_aoi(image=self.image, img=im)
        self.__draw_aoi(draw, aoi, bg_color)

    if isinstance(draw_aoi, pd.DataFrame):
        self.__draw_aoi(draw, draw_aoi, bg_color)

    if draw_raw_data:
        self.__draw_raw_data(draw)

    if draw_fixation:
        self.__draw_fixation(draw, draw_number)

    if draw_saccade:
        self.__draw_saccade(draw, draw_number)

    plt.figure(figsize=(17, 15))
    plt.imshow(np.asarray(im), interpolation='nearest')

    if save_image is not None:
        # Save the image with applied offset
        image_name = save_image + \
                     str(self.participant_id) + \
                     "-t" + \
                     str(self.trial_id) + \
                     "-offsetx" + \
                     str(self.get_offset()[0]) + \
                     "y" + \
                     str(self.get_offset()[1]) + \
                     ".png"
        plt.savefig(image_name)
        print(image_name, "saved!")

With the background size:

background_size = (1024, 768)

and the trial location:

trial_location = (10, 375)

I suggest saving them as fields of the Experiment class instead of declaring them arbitrarily, without any context, because the coordinates of the fixations depend on the background size and the trial location.
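Concretely, something along these lines (a sketch only; the constructor shown is not the toolkit's actual signature):

    class Experiment:
        def __init__(self, trials, eye_tracker, filetype,
                     background_size=(1024, 768), trial_location=(10, 375)):
            self.trials = trials
            self.eye_tracker = eye_tracker
            self.filetype = filetype
            self.background_size = background_size  # stimulus canvas in pixels
            self.trial_location = trial_location    # where the stimulus is pasted

draw_trial (and any offset-based fixation correction) could then read these fields instead of hard-coding the values.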

Getter issue for sample number and eye movement number in Trial class

Initially, we had a function get_sample_number that returns the total number of eye movements. Later, we decided to store the raw samples in the Trial class. Thus, this function should now return the number of raw samples, while a new function called get_eye_movement_number can take over the original job.

add_srml_to_AOIs, add_tokens_to_AOIs are not extendable for future datasets

These functions manually match the name of each stimulus with the name of the original code file from which the stimulus was adapted. An example can be seen below (taken from add_tokens_to_AOIs):

EMIP-Toolkit/emip_toolkit.py, lines 1222 to 1245 in d1a7eab:

if image_name == "rectangle_java.jpg":
file_name = "Rectangle.java"
if image_name == "rectangle_java2.jpg":
file_name = "Rectangle.java"
if image_name == "rectangle_python.jpg":
file_name = "Rectangle.py"
if image_name == "rectangle_scala.jpg":
file_name = "Rectangle.scala"
# vehicle files
if image_name == "vehicle_java.jpg":
file_name = "Vehicle.java"
if image_name == "vehicle_java2.jpg":
file_name = "Vehicle.java"
if image_name == "vehicle_python.jpg":
file_name = "vehicle.py"
if image_name == "vehicle_scala.jpg":
file_name = "Vehicle.scala"

This needs to be refactored to make the two functions extendable for future datasets.
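One straightforward refactor (a sketch; STIMULUS_TO_SOURCE is a hypothetical name) is a per-dataset lookup table, so supporting a new dataset only means registering its entries:

    STIMULUS_TO_SOURCE = {
        "rectangle_java.jpg": "Rectangle.java",
        "rectangle_java2.jpg": "Rectangle.java",
        "rectangle_python.jpg": "Rectangle.py",
        "rectangle_scala.jpg": "Rectangle.scala",
        "vehicle_java.jpg": "Vehicle.java",
        "vehicle_java2.jpg": "Vehicle.java",
        "vehicle_python.jpg": "vehicle.py",
        "vehicle_scala.jpg": "Vehicle.scala",
    }

    file_name = STIMULUS_TO_SOURCE.get(image_name)
    if file_name is None:
        raise ValueError("no source file registered for stimulus " + image_name)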

Create web documentation for EMTK

Automated web documentation for EMTK would make it easier to understand its methods and functions. It would also provide a helpful reference for tool users.

draw_trial method - How to paste an image with a transparent background onto a larger black background image

In the draw_trial method of the Trial class (quoted in full in the enhancement issue above), the relevant part is:

im = Image.open(image_path + self.image)
if self.eye_tracker == "EyeLink1000":
    background_size = (1024, 768)
    background = Image.new('RGB', background_size, color='black')
    *_, width, _ = im.getbbox()
    # offset = int((1024 - width) / 2) - 10
    trial_location = (10, 375)
    background.paste(im, trial_location, im.convert('RGBA'))
    im = background.copy()

This line of code pastes an image from the AlMadi 2018 runtime dataset (an image with white text and a transparent background, hereafter referred to as "the image") onto a black background; this behavior is hereafter referred to as "this feature":

background.paste(im, trial_location, im.convert('RGBA'))

1st Question: How does converting the image into RGBA and using it as a mask image manage to achieve this feature?

My expectation is that to achieve this feature, we only need to paste the image on top of the black background without having to use any mask image:

background.paste(im.convert('RGBA'), trial_location)

Because the background of the image is already transparent, and the text is white, contrasting with the black background. However, what I get is a completely white box on a black background.

2nd Question: Why does the line of code I wrote fail to achieve this feature?

Here is the full code I used to test both ways:

    image_path = "EMIP-Toolkit/datasets/AlMadi2018/runtime/images/5667346413132987794.png"
    im = Image.open(image_path)

    background_size = (1024, 768)
    background = Image.new( 'RGBA', background_size, color='black' )

    trial_location = (10, 375)

    # background.paste( im, trial_location, im.convert('RGBA') )
    background.paste( im.convert('RGBA'), trial_location )
    background.save("result.png")
    
    im = background.copy()
    im.show()

Add new dataset - Eye Tracking Analysis of Code Layout, Crowding and Dyslexia - An Open Data Set

Add a parser (or use an existing one, if possible) for reading data from the following dataset: https://dl.acm.org/doi/fullHtml/10.1145/3448018.3457420

Requirements:

  • Create a Jupyter Notebook to show that all functions and methods work with the new dataset.
  • Add the dataset and its reference to the dataset dictionary in the code.
  • Make sure the dataset can be downloaded automatically and unzipped using existing methods.

Complete community profile

EMTK doesn't have a community profile yet, so it is not clear how people can contribute to the open-source project. To help build a community around the tool, we need a few well-written documents. You can contribute these documents by looking up tutorials and checking the repositories of popular open-source projects. The documents we need are:

  1. Code of conduct
  2. Contributing
  3. License
  4. Issue templates
  5. Pull request template

Eliminate inheritance in class design

Initially, we chose inheritance for the class design: every eye movement element was modeled as a superclass, with subclasses for the specific eye movements from various types of eye trackers. However, we think this would make it difficult to add support for more types of eye trackers in future development. Now, @sdotpeng will eliminate inheritance from our program, creating a universal class of eye movements for various eye trackers.

(Bug) In the Jupyter notebook for the EyeLink1000

im = EMIP[subject_ID].trial[trial_num].draw_trial(image_path, draw_raw_data=False, draw_fixation=True, draw_saccade=False, draw_number=True, draw_aoi=True)

When draw_saccade is set to True, it gives an error claiming a font is missing.

Add unit tests

So far we have been using the example notebooks as tests, but it would be much better to develop unit tests for every method in the toolkit. Maybe consider automating the testing process on GitHub to make collaboration and onboarding easier.
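For example, a self-contained pytest sketch of the intended pattern (dispersion here is a stand-in re-implementing the standard I-DT dispersion measure, not the toolkit's actual function):

    def dispersion(window_x, window_y):
        """I-DT dispersion: (max_x - min_x) + (max_y - min_y)."""
        return (max(window_x) - min(window_x)) + (max(window_y) - min(window_y))

    def test_dispersion_of_tight_cluster_is_small():
        xs = [100.0, 100.5, 99.8]
        ys = [200.0, 200.2, 199.9]
        assert dispersion(xs, ys) < 2.0  # a tight cluster stays under a typical threshold

Running pytest in a GitHub Actions workflow would then automate this on every pull request.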

Adapt eye movement classes to empty attributes

Since we decided to remove inheritance and use a universal class for eye movements (#7), we have to adapt the code to allow empty attributes, for situations where one type of eye tracker doesn't record one or more types of eye movement. @sdotpeng is in charge.
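A sketch of the direction (the field names mirror the Fixation constructor quoted in the samples issue below; making every field optional is the assumption):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Fixation:
        trial_id: int
        participant_id: str
        timestamp: int
        duration: int
        x_cord: float
        y_cord: float
        token: str = ""
        pupil: Optional[int] = None  # not recorded by every eye tracker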

Visualization: video reconstruction of a trial

Add a new visualization that generates a video of a trial based on the stimulus image and fixation (and possibly saccade) timestamps. The fixation position should appear as a circle, and the video should play in real time (not faster or slower than the recording).
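A rough sketch with matplotlib.animation, assuming fixations arrive as (onset_ms, duration_ms, x, y) tuples sorted by onset (saving to mp4 requires ffmpeg):

    import matplotlib.pyplot as plt
    from matplotlib import animation
    from PIL import Image

    def reconstruct_trial(image_file, fixations, fps=30, out="trial.mp4"):
        """Render a real-time video; a circle marks the currently active fixation."""
        fig, ax = plt.subplots()
        ax.imshow(Image.open(image_file))
        dot = ax.scatter([], [], s=120, facecolors='none', edgecolors='red')

        def update(frame):
            t = frame * 1000.0 / fps  # playback time in ms
            active = [(x, y) for onset, dur, x, y in fixations
                      if onset <= t < onset + dur]
            dot.set_offsets(active if active else [(float('nan'), float('nan'))])
            return (dot,)

        total_ms = fixations[-1][0] + fixations[-1][1]
        frames = int(total_ms * fps / 1000) + 1
        anim = animation.FuncAnimation(fig, update, frames=frames, blit=True)
        anim.save(out, fps=fps)  # frame count matches recording length, so playback is real time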

Merge draw_trial implementations into one simple function

Since multiple classes are being merged into one, the implementation of draw_trial should not make assumptions about the specific trial it is drawing. Initially, we wanted to create a unified visualization style, but that might not work for every trial, since variations in background colors and style are possible. Instead, we want the draw_trial method to allow the user to customize the visualization with various color and style options.

error in emtk/util/_get_stimuli.py

The dimensions used for pasting the stimulus onto the background are incorrect. Instead of (100, 375), they should be (0, 375), to allow users to see the correct positions of fixations on the text.

Samples variable holds fixations, saccades, and blinks instead of raw samples in Al Madi 2018 dataset

In the read_EyeLink1000 function, fixations, saccades, and blinks were parsed into the samples variable. An example can be seen below:

if token[0] == "EFIX":
    timestamp = int(token[2])
    duration = int(token[4])
    x_cord = float(token[5])
    y_cord = float(token[6])
    pupil = int(token[7])

    fixations[count] = Fixation(trial_id=trial_id,
                                participant_id=participant_id,
                                timestamp=timestamp,
                                duration=duration,
                                x_cord=x_cord,
                                y_cord=y_cord,
                                token="",
                                pupil=pupil)

    samples.append('EFIX' + ' '.join(token))

The same token used to populate the fields of the Fixation object was also appended to samples. If the Al Madi 2018 dataset does not have raw samples, the samples variable should be kept empty to avoid any confusion.

not using the variable "sample_duration"

At line 69 of idt_classifier.py, it should be

[timestamp, len(window_x) * sample_duration, statistics.mean(window_x), statistics.mean(window_y)])

not

[timestamp, len(window_x) * 4, statistics.mean(window_x), statistics.mean(window_y)])

The hard-coded 4 assumes a 4 ms sample period (i.e., a 250 Hz tracker) and is wrong for any other sampling rate.
