w3c / aria-at
Assistive Technology ARIA Experience Assessment
Home Page: https://aria-at.netlify.app
License: Other
Objective: Determine how granular to make assertions when testing screen reader support for an element or widget pattern, including things such as reading mode/interactive mode and their names in different screen readers.
Right now, all of the data to map tasks to a list of specific AT commands is in one folder, tests/resources/at-commands. We need to re-architect this so that this data is in the design pattern's folder.
Assigned to Matt until it can be assigned to Jon and Yohta
Question from Jon Gunderson: How are the test cases authored for the prototype?
Check out: https://github.com/w3c/aria-at/wiki/Test-Harness-and-Test-vocabulary
Are there other key components of a test that should be added, or explanations that need to be more clear, or better terms for these ideas, or other parts of this project that need terminology?
There is a lot of background in issue #14 and some in #15.
Currently, as a proof of concept, I added the AT-specific commands for checkbox into a file called tests/resources/at-commands.js. The actual encoding is not super important right now (as this is a prototype); the goal is mostly to get feedback on the API.
getATCommands(screenReader, screenReaderMode, userTask)
Inputs:
Returns:
An example of the command "instructions" for checkbox:
navigate to the checkbox using
or when the cursor is on the checkbox, read it using
navigate through group boundaries using
or when the cursor is on a checkbox, read the group using
when the cursor is on the checkbox, change the state using
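Pulling the pieces above together, here is a rough usage sketch of getATCommands; the file path comes from this thread, but the export style and the shape of the returned data are my assumptions, not the committed API:

```js
// Rough sketch only: assumes at-commands.mjs exports getATCommands and that it
// returns an array of command/instruction pairs like the checkbox examples above.
import { getATCommands } from './tests/resources/at-commands.mjs';

const commands = getATCommands('JAWS', 'reading', 'read checkbox');
// Hypothetical shape of one entry:
// { command: 'X / Shift+X', instruction: 'navigate to the checkbox using' }
for (const { command, instruction } of commands) {
  console.log(`${instruction} ${command}`);
}
```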
The questions I have are:
tests/resources/at-commands.mjs
because I assumed there are multiple widgets that will include the same "user task" in their tests (for example, the tri-state checkbox will have all of the same user tasks). For this reason, it seems like the task and the command list should not be specific to an example. However, @mcking65 pointed out that some of the commands are specific to the JavaScript in the example -- specifically, that Space checks a checkbox. Do we need to handle these cases separately? Or, as in the case of Space, every checkbox that is designed according to the APG should be operable with Space, so it is more specific to the design pattern than to the example?

I made a prototype of the "runner" -- the goal of the runner was to be as simple as the CSS WG's test runner and to give us the ability to test "performing a test run". The working group should eventually build something with designers, user testing and more features :) Tomorrow I'll be working on the features to show summaries of past test runs and download JSON files with test results.
Soon I'll push the runner to gh-pages on this repo (instead of my fork) and have the runner rebuild and publish on merges into master.
https://spectranaut.github.io/aria-at/runner/
You will still be able to visit any tests committed to this repository individually, exactly as they are committed under the test directory, for example:
https://spectranaut.github.io/aria-at/tests/checkbox/read-checkbox.html
I made a wiki page to outline the decisions that need to be made by October 28th in order to build a successful prototype.
Please feel free to review and comment here, I'll address comments when I'm back from vacation on Sept 3rd.
From the telecon today I think there was some confusion about how the tests will work. @spectranaut said that the tests try to automatically put the test page in a particular state, and the user is then instructed to do two things:
and then compare the actual result with the expected result
I think we should clarify this in documentation so it's clearer for everyone.
We got some feedback that "nice to have" is unclear as to its meaning. We might want to change to "must have" and "optional".
APG design pattern: https://w3c.github.io/aria-practices/#checkbox
group
role and name.

@spectranaut, others at Bocoup, and I, drawing in part on information from @mcking65, have put together a document that could be used as a starting point for a plan for test authorship and test runner development.
This is a Google Doc for now, which is public and anyone can comment and make suggestions. For archive purposes, I've sent an HTML copy to [email protected]. If we want to have a document like this for the CG, I think we should move it to the wiki for this repository.
The document is here:
https://docs.google.com/document/d/1nm4bIZf-lUnSon6LeUMxor1fjXk9TsZdcsxmCTOyIFM/edit?usp=sharing
This is an example of the test results view with random data: https://w3c.github.io/aria-at/results/test-run/1
What changes do we want to make before we upload tests we will be linking publicly?
There are some updates that need to be made to the test design that I noticed last week.
Hey y'all!
Please provide feedback on the "displaying" of the following tests. To make it clear what text the test harness is providing and what is supplied via the API, I added <em> tags to all of the text that is specific to the tests (i.e., provided by the test author or anything AT-specific).
Here are three different "tests":
reading checkbox
operating checkbox
reading checkbox group
A few things to consider:
Soon to come:
After reviewing #73, the following questions have come up:
Should files such as commands.json that have been generated via script be committed? If this repo will be moving to a pattern where files are generated, I envision a test runner or automation invoking the creation of such files.
APG design pattern: https://w3c.github.io/aria-practices/#combobox
This is the first draft of 3 different use cases for the Aria AT Test Runner that I was able to identify while conducting a series of conversations with @spectranaut, @mfairchild365 and @mcking65. These include basic user flows as well as alternative flows for some scenarios. I would also like to include Use Cases for the Report itself in the near future.
My intention is to post this in the form of a Wiki page as soon as I get write access to the repo (I've requested this and I'm waiting for the invite). In the meantime, I would love to get feedback and discuss what folks think about this document.
A Use Case is defined as a written description of a series of tasks performed by a user. It begins with the user’s goal and ends with said goal being met. It outlines, from the user’s perspective, how the system should respond or react to certain requests and interactions.
The benefit of use cases is the added value of identifying how an application should behave and what could go wrong. They also provide a better understanding of the users’ goals, helping define the complexity of the application and better identify requirements.
This document is the result of a series of conversations conducted with ARIA-AT contributors and stakeholders. It will serve as the foundation for defining the requirements and user interface for the project.
Use Case 1 | Admin adds tests to Test Runner |
---|---|
Actor | Admin (Aria AT member) |
Use Case Overview | After tests have been designed and reviewed, the Admin prioritizes and adds them to the system for testers to execute. After that, the Admin will review them and later publish them |
Trigger | Contributors have designed new tests that have been reviewed and are ready to be executed. |
Precondition | test contributions went through a review process. |
Use Case 1 - Basic flow | Add tests to Test Runner |
---|---|
Description | This is the main success scenario. It describes the situation where only adding and assigning tests to be executed are required. |
1 | Admin prioritizes which patterns need to be tested in what order |
2 | Admin adds tests to the system |
3 | Admin submits/assigns tests to testers |
Use Case 2 | Tester executes test |
---|---|
Actor | Tester (Accessibility QA contractors, Aria-AT community members) |
Use Case Overview | Once tests have been prioritized and added to the system, at least two testers will execute each of these and submit them for review. |
Trigger | The Admin has added tests to the pipeline that need to be executed. |
Precondition 1 | Tests have been prioritized. |
Precondition 2 | Tests have been added to the pipeline. |
Use Case 2 - Basic flow | Execute test |
---|---|
Description | This is the main success scenario. It describes the situation where only executing and submitting a test are required. |
1 | Tester provides information about what browser and screen reader combination they will be using. |
2 | Tester gets a set of tests according to the browser and screen reader combination they will be using. |
3 | Tester opens the test. |
4 | Tester reads the instructions. |
5 | Tester follows the steps to execute the test. |
6 | Tester submits the test for review. |
Use Case 2 - Alternative Flow 4A | Instructions are not clear to tester |
---|---|
Description | This scenario describes the situation where the tester doesn’t understand the instructions. |
4A1 | Tester reads the instructions. |
4A2 | Tester is confused about the instructions. |
4A3 | Tester submits a question to the Admin with regard to the instructions of the test that needs to be executed. |
4A4 | Tester receives answer from the Admin. |
Use Case 2 - Alternative Flow 5A | Tester doesn’t have enough time to finish |
---|---|
Description | This scenario describes the situation where the tester, for whatever reason, doesn’t have enough time to finish executing a test. |
5A1 | Tester follows the steps to execute the test |
5A2 | Tester needs to pause for whatever reason
5A3 | Tester saves their work in progress |
Use Case 2 - Alternative Flow 5B | Tester returns to the application to finish the execution of a test |
---|---|
Description | This scenario describes the situation where the tester has returned to the application to finish the execution of a test that is in progress. |
5B1 | Tester opens test that has been partially executed. |
5B2 | Tester continues following the steps to execute the test. |
5B3 | Tester submits the test for review. |
Use case 3 | Admin Publishes Test Results |
---|---|
Actor | Admin (Aria AT member) |
Use Case Overview | Once at least two testers have executed a given test, its results go to draft mode, where the Admin reviews them and later publishes them |
Trigger | At least two testers have executed a test and its results are ready to be reviewed. |
Precondition 1 | At least two testers have executed the test. |
Precondition 2 | The test results are in draft mode. |
Use Case 3 - Basic flow | Publish test results |
---|---|
Description | This is the main success scenario. It describes the situation where only minimal review and publishing the results of a test are required. |
1 | Admin reviews and compares the results of a test that was executed. |
2 | Admin chooses the correct results. |
3 | Admin publishes the results. |
Use Case 3 - Alternative flow 1A | Test results are wrong |
---|---|
Description | This scenario describes the situation where the results of the execution of a test are incorrect and need to be executed again. |
1A1 | Admin reviews the results of a test that was executed. |
1A2 | Admin finds out that the results are incorrect. |
1A3 | Admin removes test results from draft mode. |
1A4 | Admin adds test to the pipeline to be executed again. |
For example, what does "C / Shift+C" do?
We should have a cool name for the production runner!
Consider using https://github.com/dbaron/wgmeeting-github-ircbot for #aria-at teleconferences, so we can get minutes posted as comments in the relevant GitHub issues.
APG design pattern: https://w3c.github.io/aria-practices/#menu
I’ve created a wiki page called 'High level use cases'.
I'm opening this issue as a place to discuss modifications to that wiki page.
Here are some thoughts and a small design exploration from me. This is based on the assertion model that @mcking65 shared in a separate issue, and the conversation that we had reviewing it.
@mfairchild365 This week I ran the test for aria-details on a11ysupport.io to familiarise myself with the data model and interface.
I struggled to figure out exactly how to perform the test. In particular:
From that perspective, I think that the conversation we're having now about how to simplify things for users is very helpful and needed.
I also realise that the test pages and instructions on a11ysupport.io are generated efficiently at scale, and that clearer and more granular instructions might not be viable.
Here's the link again to the assertion model that @mcking65 shared in a separate issue. My comments here focus not so much on the data model itself, but on how testing instructions can be presented.
I think that it's important to tell testers what they need to do and how, rather than asking them what they did.
I think that assertions should be written to help people who are not fully confident in their knowledge of accessibility.
Seeing the different tests in different rows, with instructions on the left and results on the right, felt very intuitive to me. It instantly gave me a clear mental model of how to use the interface (i.e. the table).
I like how a tester is expected to test how a checkbox is announced, when reached in different ways, in one go. I imagine that following this idea will help us simplify the interface, without making things harder for users. (Caveat: I don't yet fully understand the downsides of such a simplification for our data).
Here are a few suggestions:
The 'Importance' column is currently placed between 'Test name' and 'Assertion'. I believe that we can make test instructions clearer by purposefully ordering columns in a way that reads naturally.
I personally like using the Gherkin syntax. I think that it makes test instructions and assertions much easier to understand and nicer to read.
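For instance (hypothetical wording, not taken from the current test format), a checkbox assertion in Gherkin style might read: "Given a checkbox that is not checked, when the tester presses Space, then the screen reader conveys that the state has changed to checked."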
E.g. The 'Screen Reader mode' cells could be merged so that it's immediately obvious that the top half of the table is for 'Reading mode', and the bottom half for 'Interaction mode'
Caveat: I don't know whether merging cells makes things harder to understand or operate for screen reader users.
I put together a simplified table in HTML following this direction.
The table is not interactive, and was done very quickly with barely any styling. (I initially had this in an Excel file but I wanted to have better screen reader support for the double headers and merged cells).
I imagine that testers will find a table like this easier to understand at a glance, and more inviting. Please let me know your thoughts.
I am imagining that our interface for contributing test results could be just a table per test case. (Of course this table would need to be dynamic based on what browser and assistive technology users have selected).
Please tell me what I'm not seeing / considering, so that we can make things better together.
My intuition is that we might be able to make things easier both for testers and for us by splitting the broad assertions in Matt's table into more granular ones. (eg. "role", "name", "state" into separate rows).
If we keep cells under "Given that" and "When" merged as is, I believe that we might be able to afford the extra granularity without making things look too complicated. (I'll give it a try).
Reading the review posts, I think they could be classified into several categories:
Based on this classification I made a very quick rundown that summarizes the review posts by Matt, Michael, and me (attached), hoping to facilitate the upcoming meeting.
*The document doesn't include obvious issues such as missing commands or things already agreed on in the checkbox issue between Matt and Michael.
Link to each review issue
Menubar review
Checkbox review
Combobox review
Looking over the issues we have open currently, I can group them into these labels:
tests
process
documentation
test-runner
test-report
feedback
prototype-test-runner
revisit later (maybe these should be closed, until revisited)
Agenda+ for adding something to the teleconference agenda
Agenda+ F2F for adding something to the next face-to-face meeting agenda

Do we want to use only labels, or also projects, milestones?
Currently, the process for converting spreadsheets to tests is pretty wonky, and it took me hours to convert Matt's combobox tests using Jon's scripts. The process as we have it right now:
npm review tests (a node script) to create test plans from the test files.

I'd like to propose we do the following instead:
I didn't do this yet because I wanted people to have experience writing tests before we locked down a test writing process. So my main question is: Do we want to write tests in excel sheets that Jon and Yohta designed, and Matt has now used? If we do this, we will have to be very strict with the excel sheet format, but the test conversion script can also tell you if your excel sheet is formatted incorrectly.
I wrote a document that we could use as a starting point for discussion for how we work and handle disagreements in ARIA-AT CG. Hopefully, there won't be any disagreements, and we'll sail smoothly to our end goal, but if that doesn't happen I think it can be good to have an agreed upon framework.
This is a Google Doc for now, which is public and anyone can comment and make suggestions. For archive purposes, I have sent a plain text copy to [email protected]. If we want to have a document like this for the CG, I think we should move it to the wiki for this repository.
The document is here:
https://docs.google.com/document/d/1CXNRSDj1rFsVPtCu4YFww7I0OUEGRIXY6i_s453uaak/edit?usp=sharing
Originally verifyATBehavior could be called multiple times with different tests. Additionally, mode accepted a list of screen reader modes. We should change this to:
It's possible that open source management software could be used to provide a web interface for tests instead of rolling our own. I'd like to keep track of a list of requirements while I research some, so please chime in!
List of requirements, as I understand them now:
This is continuation of the discussion in this comment: #12 (comment)
@mfairchild365 wrote:
Using a11ysupport.io and HTML links as an example:
- an HTML link feature that references many tests
- The assertions are defined on the feature object (JSON)
- And referenced by the tests (JSON)
Note that abstract versions of common assertions such as 'convey name', and 'convey role' are defined further left in the build process. This helps with consistency and reduces redundancies.
My approach is slightly different than the approach we are taking here, but I think we can apply similar concepts. We will be testing the same ARIA features across many tests.
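To make that structure concrete, here is a minimal sketch of the idea of defining assertions once on a feature and referencing them from tests; every field name below is an illustrative assumption, not the actual a11ysupport.io schema:

```json
{
  "feature": "html-link",
  "assertions": {
    "convey-role": "The role 'link' is conveyed",
    "convey-name": "The link's accessible name is conveyed"
  },
  "tests": [
    { "name": "navigate to the link", "assertionRefs": ["convey-role", "convey-name"] },
    { "name": "read the link at the cursor", "assertionRefs": ["convey-role", "convey-name"] }
  ]
}
```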
When displaying the test to a human tester, we want the fail and success cases to be extremely clear and leave no need for interpretation. So as the tests are designed now, the "assertions" are not generic because they describe exactly how the AT should respond to the specific example widget being tested against. However, I think there might be a real compelling reason to use more abstract assertions as @mfairchild365 designed on a11ysupport.io. Also the generalizations in a11ysupport.io are pretty good and we could probably directly import the lot of them.
Similar to this problem of designing assertions is the problem of designing user instructions. For the user instructions, I think we have two different goals:
So far, I've concluded that when writing the test we need to include both things separately: (1) the "abstract operating instruction" in order to programmatically map an abstract operation to a set of AT commands the tester should use during the test, and (2) a human understandable instruction for what exactly you should be doing with the widget. The first I'm calling the "task" and the second I'm calling "specific_user_instructions".
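For example (the values below are invented for illustration, not taken from an existing test), a single test would then carry both fields:

```json
{
  "task": "operate checkbox",
  "specific_user_instructions": "check the 'Lettuce' checkbox in the Sandwich Condiments group"
}
```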
For the assertions, on the other hand:
Moving forward: Should we simply supply both generic and specific assertions for every test, like I'm suggesting we do for user instructions? Will this make it easier to apply the tests to other ATs moving forward? Should we just put this on the list of things "to look into abstracting later" after we have written more tests?
This first issue serves as a summary of the thread.
This is the issue to track feedback on https://github.com/w3c/aria-at/wiki/Test-Harness-Feature-Requirements
I think it would be useful to consider testing the behavior of native widgets in HTML, in addition to APG design patterns. It usually is easier for web developers to use a standard element and get accessibility for free, compared to rolling their own widgets with ARIA. There are also some widgets in HTML that are not in APG (e.g. color well).
For example:
<input type=checkbox> (checkbox)
<input type=range> (slider)
<input type=color> (color well)
<input type=file> (file upload control)
<select> (select-only combobox)

Some challenges:
In PR #76, I added the new key setup_script_description to verifyATBehavior, and I added a new column to the test spreadsheet @jongund and @yohta89 designed to record this string while test writing. I did this because Matt included the description of the script in a column (whereas Jon and Yohta left it in a comment in the setupScript itself), so we needed to norm on one way or the other. Additionally, I added the setup test page script description to the test plans.
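As a rough sketch of where the new key sits (the surrounding field names and values are assumptions pieced together from other examples in this thread, not the exact committed format):

```js
// Illustrative only: apart from setup_script_description, the fields and their
// shapes are guesses based on discussion elsewhere in this repo.
verifyATBehavior({
  setup_script_description: "sets focus on the menubar and opens the first submenu",
  mode: "interaction",
  task: "navigate to menu item",
  specific_user_instruction: "navigate to the first item in the open submenu",
  output_assertions: ["The role 'menuitem' is conveyed"]
});
```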
I tried putting it into the tests to motivate discussion, here is a menubar editor test as an example.
I think we should surface this summary to the testers themselves. Testers will need to be able to know when a setup script DID NOT work, and mark the test as "uncompletable" or "broken". If they are used to seeing a widget over and over again, they should be alerted when the state should be different.
Right now, they all say "check the checkbox" or "open the submenu". Maybe we should change it to a description of the state after the script executes, like "the first checkbox will be checked" and "the submenu will be opened". From the tester's perspective, I think the imperative will be confusing because they will be looking for imperative statements to execute.
This came up on the call today, and I don't want to lose track of the conversation.
Question: Should we test the single-key navigation in VoiceOver's Quick Nav Mode?
Notes:
My take: Since this functionality is not enabled by default, we should not expend time testing it.
We have 14 days to build a prototype. The goal of the prototype is to test and iterate on the test design. The following prototype will not have some required features of the ultimately desired test harness (such as the ability to log in or show past test results or queue tests based on past test results). It will cover the basic requirements for displaying tests and test result recording.
Stage 1: ariaatharness.js (as described in this PR):
displayATTest: Display test assertions and record test results in a JSON HTML tag (following the convention of WPT tests). The API will be designed to allow for structured data collection on test failures.
openTestPage: Open a browser window with the test page.
executeScriptInTestPage: Execute JavaScript in the test page.
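As a rough sketch of how those three Stage 1 functions might fit together (all signatures, parameters, and file paths below are assumptions, not the actual ariaatharness.js API):

```js
// Hypothetical usage only: none of these signatures are confirmed; they just
// illustrate the division of responsibilities described above.
const testWindow = openTestPage('reference/two-state-checkbox.html'); // path is made up

// Put the test page into the desired state before the tester starts.
executeScriptInTestPage(testWindow, "document.querySelector('[role=checkbox]').focus();");

// Present the assertions to the tester and record structured results.
displayATTest({
  name: 'read checkbox in reading mode',
  assertions: ["The role 'checkbox' is conveyed", "The accessible name is conveyed"]
});
```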
Stage 2: Write a simple test runner for the tests. Files within sub-directories of the results directory will be combined into a single table. Test results from different ATs will be put into different tables, not shown side by side.

In the telecon today we discussed that there was inconsistency between "label" and "name". We wanted to avoid saying "label" (to align with APG), and agreed that "name" is adequate. No need to say "accessible name".
I think we should document this. In https://github.com/w3c/aria-at/wiki/Test-Harness-and-Test-vocabulary ?
Hey all! This is a somewhat urgent issue because I need to get something in by Wednesday the 27th, so I'm hoping to spend all of Tuesday the 26th at the latest implementing a summary page. Each test has a lot of information recorded in it, and I am not sure which information is the most relevant to show. I'm hoping to get feedback in this issue to simply make a first draft of a test result summary.
Here is an example html page with the summary after you complete the "read-checkbox.html" test, which I used because it is the longest test:
That is the result for just one test. What we need to discuss here is:
Therefore, any failing assertion or undesirable behavior after any key command will result in the test failing. We could have a different state for a test result if all assertions pass but some additional undesirable behavior occurs.
An assertion will fail if any screen reader command results in incorrect information (in this case, the tester will have marked "incorrect output"). In the checkbox case, if the accessible name or role or state is actually wrong for any tested key command, then the assertion fails.
Additionally, an assertion will fail if the test author includes an additional assertion that is not related to the output of the screen reader and that assertion fails. So far we only have one example of this kind of assertion, namely the assertion about JAWS and NVDA changing modes when you use Tab to read a checkbox.
An assertion will be considered to pass if there is only "missing" information (for example, if a test marks "no output" because the role "checkbox" is not announced).
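A small sketch of that pass/fail logic, assuming (my assumption, not the repo's format) that each command's result is recorded as 'correct output', 'no output', or 'incorrect output':

```js
// Hypothetical helper: an assertion fails if any tested command produced
// incorrect information; missing information ("no output") alone does not fail it.
function assertionPasses(commandResults) {
  return commandResults.every(result => result !== 'incorrect output');
}

assertionPasses(['correct output', 'no output']);        // true: only missing info
assertionPasses(['correct output', 'incorrect output']); // false: wrong info for one command
```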
The test design so far does not have a way to record the necessity of an assertion (for example, "mandatory" or "nice to have"). This will take some thinking to fit into the test design (as the tests are already quite complicated) so I do not think that the ability to mark some assertions as necessary for passing and others are not necessary will make it into the prototype for this phase of the project.
I'm opening this issue just so that we have a place to share some design research.
If you find useful references for how similar features are designed or implemented somewhere else, please share these examples here. Even when there are fundamental differences between the projects, having a wide range of examples and references will help us a lot when we think about design.
This for example is how the UK Government Digital Service uses Github issues to share useful design references for different features. alphagov/govuk-design-system-backlog#133
This issue will continue to be updated with feedback and notes on the prototype. I'll edit this paragraph when I'm done with my review.
The version of the prototype used for this feedback: https://w3c.github.io/aria-at/
Tests are in /tests/ and there are folders for each APG example: https://github.com/w3c/aria-at/tree/master/tests
results: https://github.com/w3c/aria-at/tree/master/results
Example verifyATBehavior() call: https://github.com/w3c/aria-at/blob/master/tests/checkbox/navigate-through-checkbox-group-interaction.html
Should we validate that each verifyATBehavior() call contains valid modes (and possibly assertions) and that all required properties are present? This could save time debugging down the road.
Tests are written by calling verifyATBehavior() and defining an array of output_assertions. What if output assertions are shared between tests? What if AT behavior is shared between tests? For example, the assertions of role, name, and boundaries should always be met everywhere that a named group is present, and it may be present in many APG examples (and therefore tests). Pre-defining possible assertions and AT behavior will help save time during authoring and improve consistency. Answer from 2019-12-11: Seems like a good idea, let's look into this further.
{
"results": [
{
"test": "Navigating through checkbox group in interaction mode.",
"details": [
{
"name": "Navigating through checkbox group in interaction mode.",
"specific_user_instruction": "navigate through boundaries of checkbox group",
"task": "navigate to checkbox group",
"commands": [
{
"command": "Tab / Shift+Tab",
"output": "test",
"unexpected_behaviors": [],
"support": "FULL",
"assertions": [
{
"assertion": "The role 'group' is spoken",
"priority": 1,
"pass": "Good Output "
},
{
"assertion": "The group's name 'Sandwich Condiments' is spoken",
"priority": 1,
"pass": "Good Output "
},
{
"assertion": "The boundaries of the group (before the first checkbox and after the last checkbox) are conveyed",
"priority": 1,
"pass": "Good Output "
}
]
}
],
"summary": {
"1": {
"pass": 3,
"fail": 0
},
"2": {
"pass": 0,
"fail": 0
},
"3": {
"pass": 0,
"fail": 0
},
"unexpectedCount": 0
}
}
],
"status": "PASS"
}
],
"assistiveTechnology": {
"name": "JAWS",
"version": ""
},
"browser": {
"name": "Chrome",
"version": "78.0.3904.108"
},
"designPattern": "checkbox"
}
Break into smaller pieces, and expect it to be an iterative process with lots of opportunity for feedback from AT developers.
Potentially: Write tests for three examples, get feedback, iterate on designs, then work on next three examples and repeat.
After the tests for an APG pattern are written in HTML and JavaScript in the WPT format, we need a way for people to easily read through the tests. Without a report of what is in the HTML files that hold the WPT format, you would have to read the HTML source or run the tests, neither of which is a practical solution for reviewing the test plan for a pattern or validating whether the encoded expectations are correct.
Uses:
When generating the report, we should be able to choose the assistive technologies that are included. Uses:
APG design pattern: https://w3c.github.io/aria-practices/#grid