Comments (6)
It looks like I'll need more than "just a moment" to create that event. I hope to post it here within 24 hours.
from aria-at.
At the end of today's meeting, @mcking65 expressed a desire for AT responses to be collected by a human "at least once." I'd like to nail that down a little further (here in this discussion thread if possible, or during next week's one-off meeting if not).
First: does a manual collection for one AT satisfy the requirement for ALL ATs? In other words, do we want to require that AT responses are manually collected "at least once [for each AT]" or "at least once [for all ATs, collectively]"? (I have a similar question for web browsers.)
Second: should test modification refresh the need for manual validation?
Maybe another way to think about this is to consider which component (or group of components) the manual collection is intended to validate: the tests, the automation system, the browsers, or the ATs.
The meeting minutes are available on w3.org.
The full IRC log of that discussion follows:
<jugglinmike> MEETING: ARIA-AT Community Group Automation Workstream
<jugglinmike> present+ jugglinmike
<jugglinmike> scribe+ jugglinmike
<jugglinmike> Topic: Considering the direction of automation
<jugglinmike> present+ Matt_King
<jugglinmike> present+: Michael_Fairchild
<jugglinmike> present+ Michael_Fairchild
<mzgoddard> +present
<jugglinmike> jugglinmike: When do we want to allow/disallow automation in the collection of AT responses?
<jugglinmike> Matt_King: I'm going to make some proposals to change the glossary to separate "test results" into "AT responses" from "response analysis"
<jugglinmike> Matt_King: The "test case" (the thing you're testing) and the assertions are designed to be AT-agnostic (for the set of ATs which are in-scope for the command)
<jugglinmike> Matt_King: Then you have commands or events which generate responses. Right now, we use the word "output", but we might want to use the word "response" instead.
<jugglinmike> Matt_King: I think those responses should be called "command responses" because they are tied to a command. That's one proposal
<jugglinmike> Matt_King: Command responses aren't part of the test. Running the test generates command responses
<jugglinmike> Matt_King: Should the analysis be considered part of "running the test"?
<jugglinmike> Matt_King: Another proposal I have for the glossary is to label the analysis of command responses "verdicts"
<jugglinmike> present+ mzgoddard
<jugglinmike> mzgoddard: Does that include unexpected responses?
<jugglinmike> Matt_King: No
<jugglinmike> Matt_King: Unexpected behaviors are an attribute of the response
<jugglinmike> Matt_King: I think there are two aspects of unexpected behavior: a token which documents if it's present and what kind it is, and a textual description
<jugglinmike> Matt_King: When it comes to assertion verdicts, I think the realm of automation is pretty limited
<jugglinmike> Matt_King: Having automation simply detect congruence with previously-interpreted responses is reasonable
<jugglinmike> Michael_Fairchild: in the future, we might be able to train an AI model to perform analysis, but that seems far off right now
<jugglinmike> github: #909
<jugglinmike> github #909
<jugglinmike> jugglinmike: Verdicts need to be tied to the content of the tests. A change to the test would mean the system should not "trust" or "reuse" verdicts previously reported by a human
<jugglinmike> Matt_King: I was anticipating that almost everything here is versioned
<jugglinmike> Matt_King: If we get feedback from an AT vendor that they want some assertion to change, that drives a change to the test plan
<jugglinmike> Matt_King: If a screen reader command changes, that also drives a change to the test plan
<jugglinmike> Matt_King: What if we change a test plan and we want to generate new results for the updated test plan? We would expect the automation to collect all the responses, but in the cases where the assertions have changed, then it would say "I don't have a verdict"
<jugglinmike> Matt_King: We need to be able to insert a test between "test 1" and "test 2" without invalidating either of their verdicts
<jugglinmike> Matt_King: Could changes to instructions ever invalidate the verdict?
<jugglinmike> Matt_King: The setup scripts could definitely. Ditto with the commands and the assertions
<jugglinmike> mzgoddard: Whatever we land on, we can agree that there are a group of traits which we can use to look up prior verdicts and reuse.
<jugglinmike> Matt_King: Exactly
<jugglinmike> Matt_King: Automation is allowed to run a test, execute commands, generate responses. In the event that human-approved verdicts exist, and IF those verdicts are still applicable, then assign those verdicts to the responses.
<jugglinmike> s/generate/collect/
<jugglinmike> Michael_Fairchild: I hesitate to think that any change to the APG would invalidate the verdicts
<jugglinmike> Michael_Fairchild: It ought to be enough to look at the AT response, because a meaningful change to the APG example would change the AT response
<jugglinmike> jugglinmike: But if the source changes meaningfully and the AT incorrectly does not change its output, then the system would be "fooled" into re-using a prior verdict which is actually no longer valid
<jugglinmike> jugglinmike: It seems like these considerations need to be addressed in the working mode. Is that right?
<jugglinmike> Matt_King: I'm worried about the working mode becoming too opaque. I think the working mode is more for humans to understand the high level. Maybe certain sections of the work could link to separate documents which get into mechanics
<jugglinmike> Matt_King: These considerations definitely need to be documented, though I'm not sure how yet
<jugglinmike> Matt_King: The first thing I want to do is update the glossary and then use that to inform how we want to update the working mode
<jugglinmike> Matt_King: There is a need for stakeholders (e.g. AT implementers) to understand the working mode. Like, "what's the appeals process?"
<jugglinmike> Matt_King: Complexity will be the enemy of clarity there
<jugglinmike> Zakim, end the meeting
@mcking65 as per our conversation on 2023-03-27, I've updated the name of this issue and added a checklist describing the next steps. Does that reflect your understanding of the work ahead?
@mcking65 In addition to the above, I'm wondering if the following change makes sense for the app.
Currently, the app includes "unexpected behaviors" in its description of test results. If I understand the new terminology correctly, it would be appropriate to call these "unexpected responses." That would be an improvement in my mind for a couple of reasons:
- it would avoid insinuating the presence of some other classification of data (instead of documenting both "responses" and "behaviors", we only consider "responses" and denote some of them as "unexpected")
- it would reinforce the relationship between these two pieces of information (the two have identical data types)
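For illustration, the two points above could be modeled roughly as follows. This is a hypothetical sketch, not the app's actual data model: there is a single collection of responses, and "unexpected" is an attribute of a response (with a kind token and description, per the meeting discussion) rather than a separate category of data.

```javascript
// Every item has the same shape; only the "unexpected" flag (and its
// optional kind/description attributes) differs.
function classifyResponses(responses) {
  return {
    expected: responses.filter((r) => !r.unexpected),
    unexpected: responses.filter((r) => r.unexpected),
  };
}

// Hypothetical example data:
const responses = [
  { text: 'menu button, collapsed', unexpected: false },
  { text: 'blank blank blank', unexpected: true, kind: 'excessively verbose' },
];
```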
What do you think?