w3c / aria-at
Assistive Technology ARIA Experience Assessment
Home Page: https://aria-at.netlify.app
License: Other
Objective: Determine how granular to make assertions when testing screen reader support for an element or widget pattern, including things such as reading mode/interactive mode and their names in different screen readers.
Right now, all of the data to map tasks to a list of specific AT commands is in one folder, tests/resources/at-commands. We need to re-architect this so that this data is in the design pattern's folder.
Assigned to Matt until it can be assigned to Jon and Yohta
Question from Jon Gunderson: How are the test cases authored for the prototype?
Check out: https://github.com/w3c/aria-at/wiki/Test-Harness-and-Test-vocabulary
Are there other key components of a test that should be added, or explanations that need to be more clear, or better terms for these ideas, or other parts of this project that need terminology?
There is a lot of background in issue #14 and some in #15.
Currently, as a proof of concept, I added the AT-specific commands for checkbox into a file called tests/resources/at-commands.js. The actual encoding is not super important right now (as this is a prototype); the goal is mostly to get feedback on the API.
getATCommands(screenReader, screenReaderMode, userTask)
Inputs:
Returns:
An example of the command "instructions" for checkbox:
navigate to the checkbox using
or when the cursor is on the checkbox, read it using
navigate through group boundaries using
or when the cursor is on a checkbox, read the group using
when the cursor is on the checkbox, change the state using
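Pulling the pieces above together, here is a rough usage sketch of getATCommands; the file path comes from this thread, but the export style and the shape of the returned data are my assumptions, not the committed API:

```js
// Rough sketch only: assumes at-commands.mjs exports getATCommands and that it
// returns an array of command/instruction pairs like the checkbox examples above.
import { getATCommands } from './tests/resources/at-commands.mjs';

const commands = getATCommands('JAWS', 'reading', 'read checkbox');
// Hypothetical shape of one entry:
// { command: 'X / Shift+X', instruction: 'navigate to the checkbox using' }
for (const { command, instruction } of commands) {
  console.log(`${instruction} ${command}`);
}
```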
The questions I have are:
tests/resources/at-commands.mjs
because I assumed there are multiple widgets that will include the same "user task" in their tests (for example, the tri-state checkbox will have all of the same user tasks). For this reason, it seems like the task and the command list should not be specific to an example. However, @mcking65 pointed out that some of the commands are specific to the JavaScript in the example -- specifically, that Space checks a checkbox. Do we need to handle these cases separately? Or, as in the case of Space, every checkbox that is designed according to the APG should be operable with Space, so it is more specific to the design pattern than to the example?

I made a prototype of the "runner" -- the goal of the runner was to be as simple as the CSS WG's test runner and to give us the ability to test "performing a test run". The working group should eventually build something with designers, user testing and more features :) Tomorrow I'll be working on the features to show summaries of past test runs and download JSON files with test results.
Soon I'll push the runner to gh-pages on this repo (instead of my fork) and have the runner rebuild and publish on merges into master.
https://spectranaut.github.io/aria-at/runner/
You will still be able to visit any tests committed to this repository individually, exactly as they are committed under the test directory, for example:
https://spectranaut.github.io/aria-at/tests/checkbox/read-checkbox.html
I made a wiki page to outline the decisions that need to be made by October 28th in order to build a successful prototype.
Please feel free to review and comment here, I'll address comments when I'm back from vacation on Sept 3rd.
From the telecon today I think there was some confusion about how the tests will work. @spectranaut said that the tests try to automatically put the test page in a particular state, and the user is then instructed to do two things:
and then compare the actual result with the expected result
I think we should clarify this in documentation so it's clearer for everyone.
We got some feedback that "nice to have" is unclear as to its meaning. We might want to change to "must have" and "optional".
APG design pattern: https://w3c.github.io/aria-practices/#checkbox
group
role and name.

@spectranaut, others at Bocoup, and I, drawing in part on information from @mcking65, have put together a document that could be used as a starting point for a plan for test authorship and test runner development.
This is a Google Doc for now, which is public and anyone can comment and make suggestions. For archive purposes, I've sent an HTML copy to [email protected]. If we want to have a document like this for the CG, I think we should move it to the wiki for this repository.
The document is here:
https://docs.google.com/document/d/1nm4bIZf-lUnSon6LeUMxor1fjXk9TsZdcsxmCTOyIFM/edit?usp=sharing
This is an example of the test results view with random data: https://w3c.github.io/aria-at/results/test-run/1
What changes do we want to make before we upload tests we will be linking publicly?
There are some updates that need to be made to the test design that I noticed last week.
Hey y'all!
Please provide feedback on the "displaying" of the following tests. To make it clear what text the test harness is providing and what is supplied via the API, I added <em> tags to all of the text that is specific to the tests (i.e., provided by the test author or anything AT-specific).
Here are three different "tests":
reading checkbox
operating checkbox
reading checkbox group
A few things to consider:
Soon to come:
After reviewing #73, the following questions have come up:
Should files such as commands.json that have been generated via script be committed? If this repo will be moving to a pattern where files are generated, I envision a test runner or automation invoking the creation of such files.
APG design pattern: https://w3c.github.io/aria-practices/#combobox
This is the first draft of 3 different use cases for the Aria AT Test Runner that I was able to identify while conducting a series of conversations with @spectranaut, @mfairchild365 and @mcking65. These include basic user flows as well as alternative flows for some scenarios. I would also like to include Use Cases for the Report itself in the near future.
My intention is to post this in the form of a Wiki page as soon as I get write access to the repo (I've requested this and I'm waiting for the invite). In the meantime, I would love to get feedback and discuss what folks think about this document.
A Use Case is defined as a written description of a series of tasks performed by a user. It begins with the user’s goal and ends with said goal being met. It outlines, from the user’s perspective, how the system should respond or react to certain requests and interactions.
The benefit of use cases is the added value of identifying how an application should behave and what could go wrong. They also provide a better understanding of the users’ goals, helping define the complexity of the application and better identify requirements.
This document is the result of a series of conversations conducted with ARIA-AT contributors and stakeholders. It will serve as the foundation for defining the requirements and user interface for the project.
Use Case 1 | Admin adds tests to Test Runner |
---|---|
Actor | Admin (Aria AT member) |
Use Case Overview | After tests have been designed and reviewed, the Admin prioritizes and adds them to the system for testers to execute. After that, the Admin will review them and later publish them |
Trigger | Contributors have designed new tests that have been reviewed and are ready to be executed. |
Precondition | test contributions went through a review process. |
Use Case 1 - Basic flow | Add tests to Test Runner |
---|---|
Description | This is the main success scenario. It describes the situation where only adding and assigning tests to be executed are required. |
1 | Admin prioritizes which patterns need to be tested in what order |
2 | Admin adds tests to the system |
3 | Admin submits/assigns tests to testers |
Use Case 2 | Tester executes test |
---|---|
Actor | Tester (Accessibility QA contractors, Aria-AT community members) |
Use Case Overview | Once tests have been prioritized and added to the system, at least two testers will execute each of these and submit them for review. |
Trigger | The Admin has added tests to the pipeline that need to be executed. |
Precondition 1 | Tests have been prioritized. |
Precondition 2 | Tests have been added to the pipeline. |
Use Case 2 - Basic flow | Execute test |
---|---|
Description | This is the main success scenario. It describes the situation where only executing and submitting a test are required. |
1 | Tester provides information about what browser and screen reader combination they will be using. |
2 | Tester gets a set of tests according to the browser and screen reader combination they will be using. |
3 | Tester opens the test. |
4 | Tester reads the instructions. |
5 | Tester follows the steps to execute the test. |
6 | Tester submits the test for review. |
Use Case 2 - Alternative Flow 4A | Instructions are not clear to tester |
---|---|
Description | This scenario describes the situation where the tester doesn’t understand the instructions. |
4A1 | Tester reads the instructions. |
4A2 | Tester is confused about the instructions. |
4A3 | Tester submits a question to the Admin with regard to the instructions of the test that needs to be executed. |
4A4 | Tester receives answer from the Admin. |
Use Case 2 - Alternative Flow 5A | Tester doesn’t have enough time to finish |
---|---|
Description | This scenario describes the situation where the tester, for whatever reason, doesn’t have enough time to finish executing a test. |
5A1 | Tester follows the steps to execute the test |
5A2 | Tester needs to pause for whatever reason
5A3 | Tester saves their work in progress |
Use Case 2 - Alternative Flow 5B | Tester returns to the application to finish the execution of a test |
---|---|
Description | This scenario describes the situation where the tester has returned to the application to finish the execution of a test that is in progress. |
5B1 | Tester opens test that has been partially executed. |
5B2 | Tester continues following the steps to execute the test. |
5B3 | Tester submits the test for review. |
Use case 3 | Admin Publishes Test Results |
---|---|
Actor | Admin (Aria AT member) |
Use Case Overview | Once at least two testers have executed a given test, its results go to draft mode, where the Admin reviews them and later publishes them |
Trigger | At least two testers have executed a test and its results are ready to be reviewed. |
Precondition 1 | At least two testers have executed the test. |
Precondition 2 | The test results are in draft mode. |
Use Case 3 - Basic flow | Publish test results |
---|---|
Description | This is the main success scenario. It describes the situation where only minimal review and publishing the results of a test are required. |
1 | Admin reviews and compares the results of a test that was executed. |
2 | Admin chooses the correct results. |
3 | Admin publishes the results. |
Use Case 3 - Alternative flow 1A | Test results are wrong |
---|---|
Description | This scenario describes the situation where the results of the execution of a test are incorrect and need to be executed again. |
1A1 | Admin reviews the results of a test that was executed. |
1A2 | Admin finds out that the results are incorrect. |
1A3 | Admin removes test results from draft mode. |
1A4 | Admin adds test to the pipeline to be executed again. |
For example, what does "C / Shift+C" do?
We should have a cool name for the production runner!
Consider using https://github.com/dbaron/wgmeeting-github-ircbot for #aria-at teleconferences, so we can get minutes posted as comments in the relevant GitHub issues.
APG design pattern: https://w3c.github.io/aria-practices/#menu
I’ve created a wiki page called 'High level use cases'.
I'm opening this issue as a place to discuss modifications to that wiki page.
Here are some thoughts and a small design exploration from me. This is based on the assertion model that @mcking65 shared in a separate issue, and the conversation that we had reviewing it.
@mfairchild365 This week I ran the test for aria-details on a11ysupport.io to familiarise myself with the data model and interface.
I struggled to figure out exactly how to perform the test. In particular:
From that perspective, I think that the conversation we're having now about how to simplify things for users is very helpful and needed.
I also realise that the test pages and instructions on a11ysupport.io are generated efficiently at scale, and that clearer and more granular instructions might not be viable.
Here's the link again to the assertion model that @mcking65 shared in a separate issue. My comments here focus not so much on the data model itself, but on how testing instructions can be presented.
I think that it's important to tell testers what they need to do and how, rather than asking them what they did.
I think that assertions should be written to help people who are not fully confident in their knowledge of accessibility.
Seeing the different tests in different rows, with instructions on the left and results on the right, felt very intuitive to me. It instantly gave me a clear mental model of how to use the interface (i.e. the table).
I like how a tester is expected to test how a checkbox is announced, when reached in different ways, in one go. I imagine that following this idea will help us simplify the interface, without making things harder for users. (Caveat: I don't yet fully understand the downsides of such a simplification for our data).
Here are a few suggestions:
The 'Importance' column is currently placed between 'Test name' and 'Assertion'. I believe that we can make test instructions clearer by purposefully ordering columns in a way that reads naturally.
I personally like using the Gherkin syntax. I think that it makes test instructions and assertions much easier to understand and nicer to read.
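For instance (hypothetical wording, not taken from the current test format), a checkbox assertion in Gherkin style might read: "Given a checkbox that is not checked, when the tester presses Space, then the screen reader conveys that the state has changed to checked."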
E.g. The 'Screen Reader mode' cells could be merged so that it's immediately obvious that the top half of the table is for 'Reading mode', and the bottom half for 'Interaction mode'
Caveat: I don't know whether merging cells makes things harder to understand or operate for screen reader users.
I put together a simplified table in HTML following this direction.
The table is not interactive, and was done very quickly with barely any styling. (I initially had this in an Excel file but I wanted to have better screen reader support for the double headers and merged cells).
I imagine that testers will find a table like this easier to understand at a glance, and more inviting. Please let me know your thoughts.
I am imagining that our interface for contributing test results could be just a table per test case. (Of course this table would need to be dynamic based on what browser and assistive technology users have selected).
Please tell me what I'm not seeing / considering, so that we can make things better together.
My intuition is that we might be able to make things easier both for testers and for us by splitting the broad assertions in Matt's table into more granular ones. (eg. "role", "name", "state" into separate rows).
If we keep cells under "Given that" and "When" merged as is, I believe that we might be able to afford the extra granularity without making things look too complicated. (I'll give it a try).
Reading the review posts, I think they could be classified into several categories:
Based on this classification I made a very quick rundown that summarizes the review posts by Matt, Michael, and me (attached), hoping to facilitate the upcoming meeting.
*The document doesn't include obvious issues such as missing commands or things already agreed on in the checkbox issue between Matt and Michael.
Link to each review issue
Menubar review
Checkbox review
Combobox review
Looking over the issues we have open currently, I can group them into these labels:
tests
process
documentation
test-runner
test-report
feedback
prototype-test-runner
revisit later (maybe these should be closed, until revisited)
Agenda+ for adding something to the teleconference agenda
Agenda+ F2F for adding something to the next face-to-face meeting agenda

Do we want to use only labels, or also projects, milestones?
Currently, the process for converting spreadsheets to tests is pretty wonky, and it took me hours to convert Matt's combobox tests using Jon's scripts. The process as we have it right now:
npm review tests (a node script) to create test plans from the test files.

I'd like to propose we do the following instead:
I didn't do this yet because I wanted people to have experience writing tests before we locked down a test writing process. So my main question is: Do we want to write tests in excel sheets that Jon and Yohta designed, and Matt has now used? If we do this, we will have to be very strict with the excel sheet format, but the test conversion script can also tell you if your excel sheet is formatted incorrectly.
I wrote a document that we could use as a starting point for discussion for how we work and handle disagreements in ARIA-AT CG. Hopefully, there won't be any disagreements, and we'll sail smoothly to our end goal, but if that doesn't happen I think it can be good to have an agreed upon framework.
This is a Google Doc for now, which is public and anyone can comment and make suggestions. For archive purposes, I have sent a plain text copy to [email protected]. If we want to have a document like this for the CG, I think we should move it to the wiki for this repository.
The document is here:
https://docs.google.com/document/d/1CXNRSDj1rFsVPtCu4YFww7I0OUEGRIXY6i_s453uaak/edit?usp=sharing
Originally verifyATBehavior could be called multiple times with different tests. Additionally, mode accepted a list of screen reader modes. We should change this to:
It's possible that open source management software could be used to provide a web interface for tests instead of rolling our own. I'd like to keep track of a list of requirements while I research some, so please chime in!
List of requirements, as I understand them now:
This is continuation of the discussion in this comment: #12 (comment)
@mfairchild365 wrote:
Using a11ysupport.io and HTML links as an example:
- an HTML link feature that references many tests
- The assertions are defined on the feature object (JSON)
- And referenced by the tests (JSON)
Note that abstract versions of common assertions such as 'convey name', and 'convey role' are defined further left in the build process. This helps with consistency and reduces redundancies.
My approach is slightly different than the approach we are taking here, but I think we can apply similar concepts. We will be testing the same ARIA features across many tests.
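To make that structure concrete, here is a minimal sketch of the idea of defining assertions once on a feature and referencing them from tests; every field name below is an illustrative assumption, not the actual a11ysupport.io schema:

```json
{
  "feature": "html-link",
  "assertions": {
    "convey-role": "The role 'link' is conveyed",
    "convey-name": "The link's accessible name is conveyed"
  },
  "tests": [
    { "name": "navigate to the link", "assertionRefs": ["convey-role", "convey-name"] },
    { "name": "read the link at the cursor", "assertionRefs": ["convey-role", "convey-name"] }
  ]
}
```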
When displaying the test to a human tester, we want the fail and success cases to be extremely clear and leave no need for interpretation. So as the tests are designed now, the "assertions" are not generic because they describe exactly how the AT should respond to the specific example widget being tested against. However, I think there might be a real compelling reason to use more abstract assertions as @mfairchild365 designed on a11ysupport.io. Also the generalizations in a11ysupport.io are pretty good and we could probably directly import the lot of them.
Similar to this problem of designing assertions is the problem of designing user instructions. For the user instructions, I think we have two different goals:
So far, I've concluded that when writing the test we need to include both things separately: (1) the "abstract operating instruction" in order to programmatically map an abstract operation to a set of AT commands the tester should use during the test, and (2) a human understandable instruction for what exactly you should be doing with the widget. The first I'm calling the "task" and the second I'm calling "specific_user_instructions".
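For example (the values below are invented for illustration, not taken from an existing test), a single test would then carry both fields:

```json
{
  "task": "operate checkbox",
  "specific_user_instructions": "check the 'Lettuce' checkbox in the Sandwich Condiments group"
}
```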
For the assertions, on the other hand:
Moving forward: Should we simply supply both generic and specific assertions for every test, like I'm suggesting we do for user instructions? Will this make it easier to apply the tests to other ATs moving forward? Should we just put this on the list of things "to look into abstracting later" after we have written more tests?
This first issue serves as a summary of the thread.
This is the issue to track feedback on https://github.com/w3c/aria-at/wiki/Test-Harness-Feature-Requirements
I think it would be useful to consider testing the behavior of native widgets in HTML, in addition to APG design patterns. It usually is easier for web developers to use a standard element and get accessibility for free, compared to rolling their own widgets with ARIA. There are also some widgets in HTML that are not in APG (e.g. color well).
For example:
<input type=checkbox> (checkbox)
<input type=range> (slider)
<input type=color> (color well)
<input type=file> (file upload control)
<select> (select-only combobox)

Some challenges:
In PR #76, I added the new key setup_script_description to verifyATBehavior, and I added a new column to the test spreadsheet @jongund and @yohta89 designed to record this string while test writing. I did this because Matt included the description of the script in a column (whereas Jon and Yohta left it in a comment in the setupScript itself), so we needed to norm on one way or the other. Additionally, I added the setup test page script description to the test plans.
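As a rough sketch of where the new key sits (the surrounding field names and values are assumptions pieced together from other examples in this thread, not the exact committed format):

```js
// Illustrative only: apart from setup_script_description, the fields and their
// shapes are guesses based on discussion elsewhere in this repo.
verifyATBehavior({
  setup_script_description: "sets focus on the menubar and opens the first submenu",
  mode: "interaction",
  task: "navigate to menu item",
  specific_user_instruction: "navigate to the first item in the open submenu",
  output_assertions: ["The role 'menuitem' is conveyed"]
});
```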
I tried putting it into the tests to motivate discussion, here is a menubar editor test as an example.
I think we should surface this summary to the testers themselves. Testers will need to be able to know when a setup script DID NOT work, and mark the test as "uncompletable" or "broken". If they are used to seeing a widget over and over again, they should be alerted when the state should be different.
Right now, they all say "check the checkbox" or "open the submenu". Maybe we should change it to a description of the state after the script executes, like "the first checkbox will be checked" and "the submenu will be opened". From the tester's perspective, I think the imperative will be confusing because they will be looking for imperative statements to execute.
This came up on the call today, and I don't want to lose track of the conversation.
Question: Should we test the single-key navigation in VoiceOver's Quick Nav Mode?
Notes:
My take: Since this functionality is not enabled by default, we should not expend time testing it.
We have 14 days to build a prototype. The goal of the prototype is to test and iterate on the test design. The following prototype will not have some required features of the ultimately desired test harness (such as the ability to log in or show past test results or queue tests based on past test results). It will cover the basic requirements for displaying tests and test result recording.
Stage 1: ariaatharness.js (as described in this PR):
displayATTest: Display test assertions and record test results in a JSON HTML tag (following the convention of WPT tests). The API will be designed to allow for structured data collection on test failures.
openTestPage: Open a browser window with the test page.
executeScriptInTestPage: Execute JavaScript in the test page.
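As a rough sketch of how those three Stage 1 functions might fit together (all signatures, parameters, and file paths below are assumptions, not the actual ariaatharness.js API):

```js
// Hypothetical usage only: none of these signatures are confirmed; they just
// illustrate the division of responsibilities described above.
const testWindow = openTestPage('reference/two-state-checkbox.html'); // path is made up

// Put the test page into the desired state before the tester starts.
executeScriptInTestPage(testWindow, "document.querySelector('[role=checkbox]').focus();");

// Present the assertions to the tester and record structured results.
displayATTest({
  name: 'read checkbox in reading mode',
  assertions: ["The role 'checkbox' is conveyed", "The accessible name is conveyed"]
});
```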
Stage 2: Write a simple test runner for the tests. Files within sub-directories of the results directory will be combined into a single table. Test results from different ATs will be put into different tables, not shown side by side.

In the telecon today we discussed that there was inconsistency between "label" and "name". We wanted to avoid saying "label" (to align with APG), and agreed that "name" is adequate. No need to say "accessible name".
I think we should document this. In https://github.com/w3c/aria-at/wiki/Test-Harness-and-Test-vocabulary ?
Hey all! This is a somewhat urgent issue because I need to get something in by Wednesday the 27th, so I'm hoping to spend all of Tuesday the 26th at the latest implementing a summary page. Each test has a lot of information recorded in it, and I am not sure which information is the most relevant to show. I'm hoping to get feedback in this issue to simply make a first draft of a test result summary.
Here is an example html page with the summary after you complete the "read-checkbox.html" test, which I used because it is the longest test:
That is the result for just one test. What we need to discuss here is:
Therefore, any failing assertion or undesirable behavior after any key command will result in the test failing. We could have a different state for a test result if all assertions pass but some additional undesirable behavior occurs.
An assertion will fail if any screen reader command results in incorrect information (in this case, the tester will have marked "incorrect output"). In the checkbox case, if the accessible name or role or state is actually wrong for any tested key command, then the assertion fails.
Additionally, an assertion will fail if the test author includes an additional assertion that is not related to the output of the screen reader and that assertion fails. So far we only have one example of this kind of assertion, namely the assertion about JAWS and NVDA changing modes when you use Tab to read a checkbox.
An assertion will be considered to pass if there is only "missing" information (for example, if a test marks "no output" because the role "checkbox" is not announced).
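A small sketch of that pass/fail logic, assuming (my assumption, not the repo's format) that each command's result is recorded as 'correct output', 'no output', or 'incorrect output':

```js
// Hypothetical helper: an assertion fails if any tested command produced
// incorrect information; missing information ("no output") alone does not fail it.
function assertionPasses(commandResults) {
  return commandResults.every(result => result !== 'incorrect output');
}

assertionPasses(['correct output', 'no output']);        // true: only missing info
assertionPasses(['correct output', 'incorrect output']); // false: wrong info for one command
```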
The test design so far does not have a way to record the necessity of an assertion (for example, "mandatory" or "nice to have"). This will take some thinking to fit into the test design (as the tests are already quite complicated) so I do not think that the ability to mark some assertions as necessary for passing and others are not necessary will make it into the prototype for this phase of the project.
I'm opening this issue just so that we have a place to share some design research.
If you find useful references for how similar features are designed or implemented somewhere else, please share these examples here. Even when there are fundamental differences between the projects, having a wide range of examples and references will help us a lot when we think about design.
This for example is how the UK Government Digital Service uses Github issues to share useful design references for different features. alphagov/govuk-design-system-backlog#133
This issue will continue to be updated with feedback and notes on the prototype. I'll edit this paragraph when I'm done with my review.
The version of the prototype used for this feedback: https://w3c.github.io/aria-at/
Tests are in /tests/ and there are folders for each APG example: https://github.com/w3c/aria-at/tree/master/tests
results: https://github.com/w3c/aria-at/tree/master/results
Example verifyATBehavior() call: https://github.com/w3c/aria-at/blob/master/tests/checkbox/navigate-through-checkbox-group-interaction.html
Should we validate that each verifyATBehavior() call contains valid modes (and possibly assertions) and that all required properties are present? This could save time debugging down the road.
Tests are written by calling verifyATBehavior() and defining an array of output_assertions. What if output assertions are shared between tests? What if AT behavior is shared between tests? For example, the assertions of role, name, and boundaries should always be met everywhere that a named group is present, and it may be present in many APG examples (and therefore tests). Pre-defining possible assertions and AT behavior will help save time during authoring and improve consistency. Answer from 2019-12-11: Seems like a good idea, let's look into this further.
{
"results": [
{
"test": "Navigating through checkbox group in interaction mode.",
"details": [
{
"name": "Navigating through checkbox group in interaction mode.",
"specific_user_instruction": "navigate through boundaries of checkbox group",
"task": "navigate to checkbox group",
"commands": [
{
"command": "Tab / Shift+Tab",
"output": "test",
"unexpected_behaviors": [],
"support": "FULL",
"assertions": [
{
"assertion": "The role 'group' is spoken",
"priority": 1,
"pass": "Good Output "
},
{
"assertion": "The group's name 'Sandwich Condiments' is spoken",
"priority": 1,
"pass": "Good Output "
},
{
"assertion": "The boundaries of the group (before the first checkbox and after the last checkbox) are conveyed",
"priority": 1,
"pass": "Good Output "
}
]
}
],
"summary": {
"1": {
"pass": 3,
"fail": 0
},
"2": {
"pass": 0,
"fail": 0
},
"3": {
"pass": 0,
"fail": 0
},
"unexpectedCount": 0
}
}
],
"status": "PASS"
}
],
"assistiveTechnology": {
"name": "JAWS",
"version": ""
},
"browser": {
"name": "Chrome",
"version": "78.0.3904.108"
},
"designPattern": "checkbox"
}
Break into smaller pieces, and expect it to be an iterative process with lots of opportunity for feedback from AT developers.
Potentially: Write tests for three examples, get feedback, iterate on designs, then work on next three examples and repeat.
After the tests for an APG pattern are written in HTML and JavaScript in the WPT format, we need a way for people to easily read through the tests. Without a report of what is in the HTML files that hold the WPT format, you would have to read the HTML source or run the tests, neither of which is a practical solution for reviewing the test plan for a pattern or validating whether the encoded expectations are correct.
Uses:
When generating the report, we should be able to choose the assistive technologies that are included. Uses:
APG design pattern: https://w3c.github.io/aria-practices/#grid