lookit / lookit-api

Codebase for Lookit v2 and Experimenter v2. Includes an API. Docs: http://lookit.readthedocs.io/
Home Page: https://lookit.mit.edu/
License: MIT License
Pain point: Complying with GDPR requests requires manual work from an admin to collect and/or delete data from Lookit.
Acceptance criteria: Lookit complies with the GDPR "right to be forgotten," "right to access," and "right to portability" by giving participants a way to request deletion of all their data, a copy of all their data, and information about how their data is being used.
Implementation notes/Suggestions: We will likely need to differentiate between the "Lookit copy" of data and what researchers have already been given access to; when planning this task, we should schedule a quick conversation with OGC. When deleting data, we may be allowed to retain some record that it existed at all, for researchers' use in reporting the rate of these events. We are already in compliance in that we can and will respond to requests appropriately, but for scaling we should make it easier to do this without manually handling requests.
Rather than downloading ALL videos or ALL CONSENT videos, allow researchers to select which sessions to download videos from by checking/unchecking a list of sessions or giving a date range. Display both session dates and the number of videos associated.
Also consider putting the download link in Experimenter instead of sending it by email.
Review the entire codebase (lookit-api, ember-lookit-frameplayer, exp-addons) for overlooked security vulnerabilities, opportunities to apply best practices, and inefficiencies. Add tasks as needed. The idea is to catch major/obvious issues.
This would both make the collected data much easier for researchers to interpret (instead of having a separate "record" every time someone refreshed the setup page) and behave more in line with what a participant might expect. It would also mean we'd only have conditions assigned in cases where the user proceeded through consent, so that relative counts ("we have enough of condition A but need more of condition B") would be more accurate. Plus, participants who'd refreshed the setup page a lot wouldn't run into increasingly long load times as the system worked to look up every record and send it in groups of 10. In principle, this could be done either by not saving the session data at all until consent, or by granting access only to admins unless consent was completed. The former would likely need to be handled at the level of generating the experiment from the JSON spec: wait for a frame that has an isConsent designation and have every frame thereafter send data, OR have a consent frame set some property of the session data that's then checked before sending data on each transition. A minimal sketch of the latter option follows.
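A minimal sketch of the access-gating option, assuming a hypothetical `Response` model with a `completed_consent_frame` boolean that the frame player sets once a frame designated `isConsent` finishes:

```python
from django.db import models


class ResponseQuerySet(models.QuerySet):
    def visible_to(self, user):
        """Hide pre-consent responses from everyone except admins."""
        if user.is_superuser:
            return self
        return self.filter(completed_consent_frame=True)


class Response(models.Model):
    completed_consent_frame = models.BooleanField(default=False)  # hypothetical field

    objects = ResponseQuerySet.as_manager()
```

If researcher-facing views and API endpoints all go through `visible_to()`, abandoned setup-page sessions never surface in downloads or condition counts.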
Pain point: Email announcements of new studies for a child are currently handled by a collection of python scripts run on Kim's laptop, writing to / checking from text files of email addresses already notified. This will not scale well and allows no independence for researchers in turning on/off notifications, specifying what to say, etc. Researchers cannot send these emails themselves because they (appropriately) do not have the ability to email Lookit users who have not already participated in their studies.
Acceptance criteria:
Implementation notes/Suggestions: Centralizing email announcements of new studies allows us to prevent participants from receiving overwhelming numbers of announcements. Sending study announcements will allow us to better leverage the existing participant userbase for recruitment as the platform grows, and to keep families engaged with research.
I.e., generate the object from the text, but additionally store the actual text. Researchers find it unintuitive that the JSON document describing their study is rearranged upon saving; structure that makes it easier to read (e.g., grouping field names in an object by function, even though technically they're unordered) is erased. Although working from a local copy is sound practice anyway, and researchers will likely just be pasting in their JSON, it would be nicer to store their own text (and just use the parsed JSON internally).
Pain Point: The distinctions between confirmatory and exploratory work and between pilot and "real" data collection are critical for planning studies and interpreting results, but researchers face challenges in learning new practices as scientific norms change rapidly to reflect our better understanding of their impact.
Acceptance Criteria: The standard workflow in designing and starting a study on Lookit includes clear delineation of whether the work is exploratory or confirmatory, and when "real" data collection begins. Researchers can link to preregistration in their study details. Researchers state whether data collection is in "pilot" mode and that information is included in session data.
Implementation notes/Suggestions:
Everyone wants this! It'd expand the measures we could collect from kids (natural search/exploration, interactive games, etc.) greatly. My impression is that it'd be a bit complicated to make sure everything worked smoothly across mobile OS/browsers, plus we'd need to start designating some studies as computer and some as tablet (or phone?) since these devices won't be so good in most cases for collecting infant looking data. This should happen after phase 2 for a variety of reasons. Because this is a separate and substantial block of work, ideally we might be able to find a collaborator to provide funding specifically for this piece.
Pain point: Onboarding new researchers to use the staging and/or production Lookit sites requires a substantial number of manual steps and some awkward workarounds. For instance, researchers have to try to log in, let Kim know they've done so, and then Kim grants access to the site (she is not automatically notified as an admin that someone has requested access). Kim also grants them access to existing example studies manually. There is no way for closely associated researchers (i.e., those in the same lab) to automatically get access to all of their lab's studies, and no way to associate studies with a particular research group beyond the PI contact info provided. There is no distinction between access to study details (which we might want to share broadly, e.g. to allow easy replication) and access to study data.
Planned functionality/changes:
|  | Preview | Design | Analysis | Submission-processor | Researcher | Manager | Admin |
|---|---|---|---|---|---|---|---|
| Read study details (protocol, etc) | x | x | x | x | x | x | x |
| Write study details |  | x |  |  |  | x | x |
| Change study status (incl. write changes that would reject study) |  | x |  | x |  | x | x |
| Manage study researchers (grant/change permissions) |  |  |  |  |  | x | x |
| View/download study data |  |  | x |  | x |  | x |
| Code consent and create/edit feedback |  |  |  | x | x |  | x |
| Change study lab |  |  |  |  |  |  | x |
Create new Lab model to encompass the following functionality:
Joining a lab:
Editing/managing a lab:
Creating a lab:
Lab permissions:
|  | Lab researcher | Lab read | Lab admin |
|---|---|---|---|
| Create studies associated with this Lab, and can be added manually to any of this Lab's studies | x | x | x |
| Preview role for all Lab studies |  | x | x |
| Manager role for all Lab studies |  |  | x |
| Manage permissions for Lab (add new researchers, etc.) |  |  | x |
| Edit lab metadata (description, website, etc.) |  |  | x |
Note that new Lookit-wide permissions are expected to eventually replace the "MIT Org Read" and "MIT Org Admin" roles (see #459), but for now (given the staff of two...) we will just rely on superuser permissions.
Create a list of other systems for building or deploying web-based experiments and/or participant management (e.g. labJS, pavlovia, jsPsych, opensesame, expfactory, prolific).
The goal is then to decide:
(a) which if any we want to eventually support using on Lookit in place of ember-lookit-frameplayer, and when
(b) whether it makes sense to pursue collaborative development at this point and when to revisit if not
Pain Point: Available studies are currently displayed to families in a grid in a fixed order; with either complex inclusion criteria or more than a few studies, it is difficult for a parent to tell which studies are appropriate for their own children.
High-level acceptance criteria: A parent can see a list of studies appropriate for a given child without reading through individual study criteria. Parents and users not logged in can still browse all "discoverable" studies. Parents can view a list of studies by child under "past studies."
Tasks:
Card design:
Finding studies (Participant-facing study list view):
A study counts as participated in for a child once a session's `completedConsentFrame` is true.
Navigation:
Tests:
Pain point: The first step in coding data from a Lookit session is to check that the associated consent video shows a parent making a statement of informed consent. It is possible for data to be collected in the absence of an informed consent statement – for instance, a parent might not read the statement because he or she does not understand written English. Currently, it is up to each lab to come up with a consent coding workflow and avoid viewing any session video until consent has been confirmed. Researchers can download individual videos, all consent videos, or all videos. This manual task is potentially error-prone and represents a lot of duplication of work across labs. A simple GUI for coding consent and central storage of consent information would enforce a clear consent coding process, reduce potential for dangerous human error, and reduce the burden on individual labs.
Acceptance criteria: Researchers can use a view on the experimenter interface to see consent videos, which they can filter by new, marked as non-consent, and marked as consent. Each video is displayed on the page along with some basic information about the associated session, including existing consent information if available. Next to the video, the researcher selects whether the consent video is valid or not (possibly from a short list of reasons why not, to distinguish tech issues from non-reading), and can enter a brief string with any notes. The consent value, note, and identity of the researcher who provided them are stored with the session record. Only videos from sessions where consent has been confirmed may be downloaded.
Implementation notes/Suggestions: Consent judgments should be possible to overwrite, e.g. in case an RA thinks something isn't valid but it is or vice versa. Storing history of such judgments would be nice to have but not critical.
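If judgment history is stored, one possible shape is append-only rulings where the latest row wins; a sketch with illustrative names (these are not the actual Lookit models):

```python
from django.conf import settings
from django.db import models


class ConsentRuling(models.Model):
    response = models.ForeignKey("Response", on_delete=models.CASCADE)  # illustrative
    action = models.CharField(max_length=16)  # e.g. "accepted" / "rejected"
    comments = models.TextField(blank=True)   # brief researcher note
    arbiter = models.ForeignKey(
        settings.AUTH_USER_MODEL, null=True, on_delete=models.SET_NULL
    )
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ["-created_at"]  # .first() on the related set is the current ruling
```

Overwriting then just means adding a new row, and the history comes for free.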
Pain point: Upon launch, we will need to work to increase the Lookit userbase via outreach and advertising, but we don't currently have a way to evaluate such efforts (e.g., to see how many people registered in the past week).
Acceptance criteria: The family outreach specialist can easily monitor and evaluate advertising efforts by answering questions like the following, using the Lookit admin or experimenter interface without doing any programming:
Implementation notes/Suggestions: This can possibly be part of either the experimenter or the admin apps in Django. It seems like it might build on existing functionality in admin, except that we don't want usage to be limited to people who are actually admins (able to see/manipulate all data).
We've discussed building a dashboard and essentially fetching a bunch of data, then allowing filtering down from that using sliders/etc. (e.g. for age range, demographics). It could show things like new participants registered per week, a bar chart of the age distribution, tables of demographic form responses, and a plot of # unique study participants / week (one line for total unique participants, lines for individual studies).
It might turn out that there's nothing preventing us from allowing all researchers to use this from an ethics/privacy standpoint (if there's no way for them to get identifying info, just composite stats we could share with them anyway), which would be great someday; BUT the primary intended users are still a couple of people at MIT, for the purpose of deciding whether we need to engineer database access for many users.
Pain point: Sending an email to a participant - e.g., to verify consent, answer a question raised in the exit survey, or provide compensation - is cumbersome; the researcher needs to note the account ID or UUID from the response view or CSV, then go to the "email participants" view, select what type of email, and select the account by ID or UUID from a list of accounts that accept that type of email. Emails sent by researchers through the Lookit email interface then disappear forever, with no record to allow coordination among multiple lab members sending emails, documentation of participant compensation or responses to questions, etc.
Acceptance criteria:
Implementation notes/Suggestions:
Studies often generate many short clips, and it's much easier for coding to have these put together into one video per session (with embedded labels). My own coding workflow does involve doing this automatically (see https://github.com/kimberscott/lookit-data-processing); it would be helpful to provide concatenated clips directly to researchers, but this will be a substantial task, as the video processing will have to happen on one of the servers, triggered by finishing the study, and it is fairly computationally intensive. Also, we may want to set up to allow frames to create video labels as a special type of event record, and then put those labels on the videos at those times as we do for Molly's videos.
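For reference, a sketch of the lossless concatenation step using ffmpeg's concat demuxer (this assumes all clips from a session share codec parameters, as same-session webcam recordings should; burning in labels would additionally require a re-encode, e.g. with a drawtext filter):

```python
import subprocess
import tempfile


def concat_clips(clip_paths, out_path):
    """Join same-codec clips into a single per-session video without re-encoding."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.writelines(f"file '{path}'\n" for path in clip_paths)
        list_file = f.name
    subprocess.check_call(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", out_path]
    )
```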
Researchers can leave feedback on studies, but few parents currently ever find this feedback on the website. It would be helpful to have a little "flag" shown to users to indicate that they have new feedback.
Pain point: Before studies are actually deployed on Lookit, they have to be approved by an admin, which gives us an opportunity to check for compliance with terms of use, help researchers ensure instructions are clear, etc. Studies have to be re-approved after changes are made. But there's currently no way for an admin to tell what has changed; seeing the changes would allow vastly expedited review in cases where the researcher fixed a typo or changed the age range, allowing us to focus energy on cases where new code has been introduced, etc.
Acceptance criteria: When reviewing a submitted study, a Lookit admin can see what has changed since the last approved version (if there is one) and see a history of actions taken on the study (e.g. edits/state changes). Either when saving changes to a study or when submitting, a researcher can provide a note about the purpose of the changes (like a commit message).
Implementation notes/Suggestions: Changes may have been made to any of the fields on the study model - e.g. purpose, description, title, eligibility criteria, JSON doc, commit SHAs. For all except the JSON doc, simply displaying the previous and current versions of any changed fields would be fine. For the JSON doc, some sort of actual diff output would be helpful if possible, since changes will often be confined to a few lines.
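For the JSON diff, the standard library may well be enough; a minimal sketch, assuming we have the stored protocol dicts for the last approved and newly submitted versions:

```python
import difflib
import json


def study_json_diff(approved_doc, submitted_doc):
    """Unified diff of the pretty-printed study JSON."""
    old = json.dumps(approved_doc, indent=2, sort_keys=True).splitlines(keepends=True)
    new = json.dumps(submitted_doc, indent=2, sort_keys=True).splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, "approved", "submitted"))
```

Note that sort_keys also hides reordering-only changes, which is what a reviewer wants here; if we end up storing the researcher's text as entered (see above), we could diff that text directly instead.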
Make it easy to transfer exactly the data where we've confirmed consent & participant allowed Databrary use to Databrary. What's easy to do has a big influence on what's done!
Pain point: The details of viewing and editing studies on the experimenter interface are unintuitive.
Acceptance criteria:
Implementation notes/Suggestions:
Collecting minor notes/ideas:
There are ~23 entries in this file with:
https://storage.googleapis.com/io-osf-lookit-staging2/static/images/nsf.gif
I believe these would work if updated to /static/images/nsf.gif.
Pain Point: Once parents participate in a study, there is no immediate confirmation that it "worked" and Lookit has their data, or any later confirmation that it has been used to do cool science. They also do not have automatic access to their own data, although they often want to see their videos; instead, a researcher has to provide it if desired, which is labor-intensive and introduces unnecessary possibilities for human error (e.g. sending the wrong child's video).
Acceptance Criteria:
Implementation notes/Suggestions:
Because all the code for a particular study is deployed on Google Cloud storage, previewing a study takes some time - about 15-20 minutes. This is an extremely impractical delay for iterative testing of an experiment - if you realize you made a slight mistake on a frame specification, or want to change the colors a little and see how it looks, etc., every single change is a 15-20 minute delay. I think researchers are going to get frustrated with this very quickly and/or be less independent (less willing/able to do their own troubleshooting) in this situation, and that it's worth figuring out how to re-use the most recent deployment when (as in most cases!) the underlying code used won't have changed.
Also, the link for preview (which doesn't change anyway) should be shown on the build study page, indicating whether the most recent request has been completed, so that researchers don't have to go through their email trying to find the link.
Success: Study preview or deployment should take <= 20s when no changes have been made to the exp-addons code used.
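A sketch of the reuse check, assuming each bundle is stored under a key derived from the exact code versions used and that `storage` is a Django storage backend (names are hypothetical):

```python
import hashlib


def build_cache_key(frameplayer_sha, addons_sha):
    """Builds from identical commits can share one deployed bundle."""
    digest = hashlib.sha256(f"{frameplayer_sha}:{addons_sha}".encode()).hexdigest()
    return f"builds/{digest[:16]}.zip"


def needs_rebuild(storage, frameplayer_sha, addons_sha):
    """Only kick off the 15-20 minute build if no bundle exists for this exact code."""
    return not storage.exists(build_cache_key(frameplayer_sha, addons_sha))
```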
Pain point: When a child has already participated in a study at least once, the parent sees a standard warning upon participating again, even if it is designed to be a longitudinal study with multiple sessions.
Acceptance criteria: Researchers can specify whether participants should see a warning when participating multiple times.
Implementation notes/Suggestions: Eventually we want to support particular schedules for longitudinal designs; this is a first step that will cover the immediate problem.
This is an important long-term priority, but there's not a lot we could do with it yet (before good automated gaze detection). It could potentially be supported separately, e.g. as part of a grant to work on the automated coding.
Pain point: There is no way for a Lookit admin to track cumulative time that particular studies have been active, how many active studies a particular user is responsible for, etc. This will make setting up to allow researchers to eventually pay for "study slots," or enforcing fair usage of the site, difficult.
Acceptance criteria:
Implementation notes/Suggestions: This will require additional scoping. More flexible limits such as total study-days may be needed (e.g., researcher X has 100 study-days; after running two studies for 40 days each and another for 20 days, data collection is cut off).
Arrange basic expert accessibility review via MIT Accessibility & Usability. Decide which things to fix when, and which practices to adopt more generally. Add tasks per results.
Initial meeting Thursday 5/6.
It can be difficult to find a particular participant's previous sessions on the Individual Responses page, or to look up a particular session when responding to a question from a participant. Adding a search box (probably by parent ID, child ID, and possibly child/parent nickname) would be helpful in a variety of cases.
e.g., jsPsych, labJS.
Placeholder for work to come after #177
Pain point: Fetching data via the API, e.g. "all sessions of this study completed by this child" as fetched at the start of any study, is quite slow.
Acceptance criteria: Fetching all user data (or say 3000 records) takes <15 seconds, OR some more reasonable amount of time if that's unrealistic. (It seems like not a huge amount of actual data but I don't deal with databases much.) Fetching records of all 50 sessions of a study completed by one child takes <5 seconds.
Implementation notes/Suggestions: Maybe we don't need data paginated in groups of 10 by default in the API? Is the database properly indexed?
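Two cheap first steps, sketched with Django REST framework settings and an illustrative index (the real model and field names may differ):

```python
# settings.py -- stop paginating in groups of 10
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 100,
}

# models.py -- make "all sessions of this study by this child" an index lookup
from django.db import models


class Session(models.Model):  # illustrative stand-in for the real response model
    study = models.ForeignKey("Study", on_delete=models.CASCADE)
    child = models.ForeignKey("Child", on_delete=models.CASCADE)

    class Meta:
        indexes = [models.Index(fields=["study", "child"])]
```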
The demo should show the data collected, and should be easy for the researcher to store and host.
Right now the code for a particular study (a snapshot of all the code RIGHT NOW, so that future changes don't affect operation of the study) is already bundled up and stored on Google Cloud storage. We need a way for researchers to host this themselves and run it in preview mode, without having to also run ember-lookit-frameplayer / lookit-api, and with some dummy functions overriding usual video recording. This is important to support eventually but can be added at any point.
Pain point: At the conclusion of any study, parents select a privacy level for the video (in-lab use only, scientific/educational sharing allowed, or publicity use allowed) and whether data may be shared on Databrary. Parents also have the option to withdraw video data entirely. This option is rarely used, but is important because we are filming in families’ homes and sensitive information could be accidentally disclosed during recording. When a parent selects that they would like to withdraw video, this information is included in information given to researchers, but the video is not automatically deleted - that is left to the researcher. This introduces unnecessary potential for human error.
Acceptance criteria: Video from any session that a parent withdraws video from is deleted from all Lookit storage automatically, and is not available to the researcher running the study. Other data from the session is still available.
Implementation notes/Suggestions: Ideally we might introduce a slight delay (e.g. 1-7 days) before deleting video from Lookit servers, in case the parent withdrew accidentally and asks us to restore the video; however, this is likely much more trouble than it's worth.
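If we did want the grace period, the mechanics are cheap with celery; a sketch assuming a hypothetical `Session` model and `delete_session_videos` helper:

```python
from celery import shared_task

from studies.models import Session, delete_session_videos  # hypothetical imports


@shared_task
def purge_withdrawn_videos(session_id):
    session = Session.objects.get(pk=session_id)
    if session.video_withdrawn:  # re-check: the parent may have asked us to restore it
        delete_session_videos(session)  # removes every stored copy


def handle_withdrawal(session):
    """Hide the video from researchers immediately; purge it for real after 7 days."""
    purge_withdrawn_videos.apply_async((session.pk,), countdown=7 * 24 * 3600)
```

The immediate effect for researchers is the same either way; the countdown only changes when the bytes actually disappear.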
Update Django to >= 1.11.19 (see GitHub vulnerability alerts).
We may want to allow researchers to specify a location for their videos to be stored rather than downloading from Lookit, and/or separate videos into locations based on studies.
This has been requested due to several IRBs' restrictions on storing participant data. However, it would substantially complicate e.g. showing participants their own videos.
Pain point: We need to increase the userbase but don't have information about how many people are visiting the site from what sources in order to evaluate advertising efforts.
Acceptance criteria: We can see traffic to the site over time, broken down by audience and how they got there if possible.
Implementation notes/Suggestions: Setting up Google Search Console & Analytics would probably work well for our purposes, but we can consider other options.
Any concern with the delay in cache invalidation when new assets are deployed via collectstatic? Worth fingerprinting? Or would it be preferable to use a CDN and issue invalidations?
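Fingerprinting is built into Django and sidesteps invalidation entirely: with the manifest storage backend, collectstatic writes hashed names like styles.55e7cbb9.css, so stale caches can never serve a new deploy's assets under an old name.

```python
# settings.py (Django 1.11-era setting)
STATICFILES_STORAGE = "django.contrib.staticfiles.storage.ManifestStaticFilesStorage"
```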
Security or code structure issues identified during review.
Current list of small stuff to address or defer at this point:
Pain point: Many researchers are interested in working with specific populations, but families do not have a standardized way to indicate whether they fall into various groups in a way that would support use of this information for recruitment/eligibility.
Acceptance criteria:
Implementation notes/Suggestions: MIT OGC advises us that including medically-relevant options is fine from a legal standpoint (MIT is not a covered entity under HIPAA).
Allow researchers to generate their own tokens. Currently an admin has to create a token for a researcher and provide it to them in order for them to use the API. Ideally anyone could try out the API on their own, without this manual step; this would encourage more development (not by us!) of coding tools that make use of automated data retrieval.
Considerations:
Pain point: Currently researchers need to use the API to leave feedback to parents about particular sessions. Feedback is one of the better ways we have to build engagement (let parents know a human watches & appreciates their video!) and we want it to be easy for researchers to leave.
Acceptance criteria:
Implementation notes/Suggestions:
Feedback is currently implemented sort of like consent rulings, as a separate model; there can be multiple feedbacks for the same response, perhaps left by different researchers. This is fine, but if it complicates things, condensing down to a single string feedback field on the response, or making it so there is at most one feedback associated with a response, would also work fine. (Migrations would be a little annoying but could just join together multiple feedbacks if needed without messing up any existing feedback.)
MIT is the only group currently using the feedback API, but we do use it and will need to update code to accommodate changes (see https://github.com/kimberscott/lookit-data-processing/blob/coding-workflow-multilab/scripts/experimenter.py)
Feedback would ideally be possible to add/edit via the consent manager, "individual responses" view, and by uploading a copy of the study CSV they downloaded with a "feedback" column updated. It should be possible to add feedback on sessions where the consent ruling is 'rejected' or 'pending.'
The arbitrary data should be cleaned to protect the database, and multiple attributes can be stored in a single standard JSON field if that's helpful. This could also be somehow joined with the "feedback" model if easier, e.g. have a "researcher-generated session data" model that's associated with a session and includes feedback and also this other data.
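A sketch of that combined shape, with hypothetical names (the JSONField import matches a Django 1.11 + Postgres stack):

```python
from django.contrib.postgres.fields import JSONField
from django.db import models


class ResearcherSessionData(models.Model):
    """Feedback plus arbitrary researcher-set attributes, one row per response."""
    response = models.OneToOneField("Response", on_delete=models.CASCADE)  # illustrative
    feedback = models.TextField(blank=True)
    extra = JSONField(default=dict)  # cleaned/validated arbitrary attributes
```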
Planned as part of testing & acceptance before launch: interview, select, and hire outside firm to conduct security audit. Includes time to fix any issues found.
Pain point: Researchers can enter a custom repo/branch/commit to use for frame definitions. Right now, if the SHA isn't valid, the default is used (i.e., the attempt to select a particular commit silently fails). Also, it's not clear to a researcher whether the "new" default (latest commit of the lookit repo's master branch) will be used each time a build is initiated or whether the commit SHA will be left unchanged unless they take action, so they may experience unexpected behavior that changes how their studies work.
Acceptance criteria:
Implementation notes/Suggestions:
Join the baby-gaze-coding listserv to stay up-to-date on efforts in this direction: https://mailman.mit.edu:444/mailman/listinfo/baby-gaze-coding
Plan out additional tools for email to participants and create separate issues per feature.
These tools will allow researchers to use the email functionality more easily, and to more flexibly communicate with families. Preliminary list:
Automatically email new feedback to participants if they've opted in. (Make a new email preference about this.)
A researcher would ideally be able to specify messages to send on a particular schedule for longitudinal participants, e.g. sending up to N total automatic messages per study of the following types:
A user would ideally be able to opt out of emails from a particular RESEARCHER as well as by type of email. This is so that if e.g. they've decided not to continue a particular study and don't want to hear about it anymore, they could opt out of those emails without opting out of all reminder emails. Or so that if one research group sends much more email they don't affect the opt-out rate as much for everyone else.
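One possible shape for those opt-outs, sketched with hypothetical names: one row per exclusion, where a null researcher means "this email type from anyone" and a blank type means "anything from this researcher":

```python
from django.conf import settings
from django.db import models


class EmailOptOut(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="email_opt_outs"
    )
    email_type = models.CharField(max_length=32, blank=True)  # e.g. "reminder"
    researcher = models.ForeignKey(
        settings.AUTH_USER_MODEL, null=True, blank=True,
        on_delete=models.CASCADE, related_name="+",
    )
```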
Pain point: Compensating participants is necessary both for treating their time/effort fairly and for rapid recruitment. Currently labs are individually responsible for identifying participants who should receive compensation and sending gift card codes by email through the experimenter interface, which is time-consuming and introduces unnecessary possibility of human error. Setting up to give "points" per study, which families can collect across studies/children and flexibly redeem, would give us a variety of ways to improve recruitment & user engagement.
Acceptance criteria:
Implementation notes/Suggestions: Allocation of points could be done in a GUI analogous to that for consent coding, or actually in the same interface (although storing consent information should be possible without also determining point allocation), and/or by uploading a file. Note that tracking points allocated by organizations could also be done by having them "pre-pay" for some amount of points and only be allowed to "spend" up to that amount, as on MTurk; due to MIT's billing practices we will likely prefer just to let them allocate whatever points they want (clearly indicating the total) and bill them later.
E.g., have users enter sequence and frames separately; possibly provide a way to enter each frame definition separately so they can see which frames are available in a given repo and then arrange them.
A CSV file generated automatically shows the session data collected during the study, but the data is highly nested and is not flattened before turning into a CSV. Simple flattening would go a long way to making this data more usable; it would be even better if researchers could select which fields they wanted to include. A (partial) data dictionary could be generated and available for download alongside the CSV. The only problem with doing this after launch is that researchers relying on this data may have scripts that rely on the old version. @kimberscott may be able to work on it along with frames/documentation.
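Simple flattening is only a few lines; a sketch that joins nested keys with dots (list indices become numeric segments):

```python
def flatten(data, prefix=""):
    """{'exp_data': {'1-video': {'event': 'x'}}} -> {'exp_data.1-video.event': 'x'}"""
    flat = {}
    items = data.items() if isinstance(data, dict) else enumerate(data)
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat
```

The union of keys across a study's sessions then gives the CSV header, and doubles as a starting point for the data dictionary.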
Pain point: Currently the primary IDs (study, user, child) are in the form "907bf070-3f2e-4667-9aa8-4d2aa5c72e46", which makes checking any code or manually finding correct videos (a key part of coding!) very difficult. Realistically, we don't check the whole thing - we use the last 3-5 characters plus context as a poor man's hash. Also, families log in with their email addresses, but those aren't shown directly to researchers, so a researcher who gets an email from a family generally won't be able to identify their records for them.
Acceptance criteria: Researchers use reasonably short, human-readable ID codes for session, child, demographic, and user records. These IDs are specific to the researcher's organization rather than uniquely associated with that user/child/data, so that if two researchers publish data including IDs a reader cannot link the data in an unanticipated way. A family can indicate which account is theirs when contacting researchers directly, e.g. to ask a question about or report a problem with a study.
Implementation notes/Suggestions: Currently, researchers are just not allowed to publish the actual IDs and are supposed to generate their own random IDs for publication purposes, but that duplicates a bunch of work (and is subject to human error). We should consider whether to implement this by actually generating IDs per researcher-datum (or, more likely, organization-datum) pair, or whether to provide random IDs as a convenience while also giving researchers access to the raw IDs (to avoid making future collaborations etc. unnecessarily difficult). I.e., this could be handled at the level of the data structure or at the level of the UI and norms for interacting with the data. To allow parents to indicate which account is theirs, we could provide parents with a contact box on Lookit or show their IDs on their account or past-studies page (depending on whether IDs are researcher-specific).
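If we go the generated-ID route, a deterministic per-organization code is straightforward; a sketch where each organization holds a secret key, so codes are stable within an org but can't be linked across orgs:

```python
import hashlib
import hmac


def short_id(org_secret: bytes, record_uuid: str, length: int = 8) -> str:
    """Stable, human-checkable per-org code, e.g. 'a3f9c21b'."""
    return hmac.new(org_secret, record_uuid.encode(), hashlib.sha256).hexdigest()[:length]
```

Codes this short can eventually collide, so a uniqueness check with a longer fallback would be worth adding.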
Consider using https://github.com/celery/django-celery-beat so we do not have to mount a "special" persistent disk for the database file beat generates by default.
http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#using-custom-scheduler-classes - Item #4
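The switch itself is small, assuming the usual namespace='CELERY' Django integration:

```python
# settings.py
INSTALLED_APPS = [
    # ...
    "django_celery_beat",
]

# Keep the periodic-task schedule in the database rather than beat's
# default local shelve file, so no persistent disk is needed.
CELERY_BEAT_SCHEDULER = "django_celery_beat.schedulers:DatabaseScheduler"
```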
Nice fake creds, wouldn't it be better if these were project settings?
Change subprocess.call -> subprocess.check_call and catch errors so things don't fail silently and mysteriously.
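E.g., around the build step (the command shown is illustrative):

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

try:
    # check_call raises CalledProcessError on a nonzero exit status,
    # where subprocess.call would silently return it.
    subprocess.check_call(["ember", "build", "--environment", "production"])
except subprocess.CalledProcessError:
    logger.exception("Build step failed")
    raise
```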
Some records of related problems are in the COS Lookit slack channel.