Accessibility Conformance Testing Rules Format

1. Introduction

There are currently many products available which aid their users in testing web content for conformance to accessibility standards such as WCAG 2.0. As the web develops and grows in both size and complexity, these tools are essential for managing the accessibility of resources available on the web.

This format is intended to enable a consistent interpretation of how to test for accessibility requirements, so as to avoid conflicting results of accessibility tests. It is intended to describe both manual accessibility tests and automated testing done through accessibility test tools (ATTs).

Describing how to test certain accessibility requirements will result in accessibility tests that are transparent with test results that are reproducible. The Accessibility Conformance Testing (ACT) Rules Format defines the requirements of these test descriptions, known as Accessibility Conformance Testing Rules (ACT Rules).

2. Scope

The ACT Rules Format defined in this specification is focused on the description of rules applicable to content created using web technologies, such as HTML, CSS, WAI-ARIA, SVG and more, including digital publishing. The ACT Rules Format, however, is designed to be technology agnostic, meaning that it can conceivably be used for other technologies.

For instance, the ACT Rules Format can be used to describe ACT Rules dedicated to testing the accessibility requirements defined in the Web Content Accessibility Guidelines, which are specifically designed for web content.

Other accessibility requirements applicable to web technologies can also be tested with ACT Rules. For example, ACT Rules could be developed to test the conformance of web-based user agents to the User Agent Accessibility Guidelines. However, the ACT Rules Format would not necessarily be suitable to describe tests for the conformance of a non-web-based user agent.

3. ACT Rule Structure

3.1. Rule Outline

An ACT Rule MUST be written in plain language. It MUST consist of the following items:

Descriptive title
Unique identifier
Rule Description
Accessibility Requirements
Limitations, Assumptions or Exceptions, if any
Accessibility Support (optional)
Aspects Under Test
Test Procedure

Note: The unique identifier can be any unique text value, such as plain text, URL or a database identifier.
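As an illustration only, the outline above could be modelled as a simple record. The class name `ActRule`, its field names, and the `missing_items` helper are hypothetical and not defined by this specification; the sketch merely shows which items are required and which are optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActRule:
    """Hypothetical container with one field per item of the rule outline."""

    title: str
    identifier: str
    description: str
    accessibility_requirements: List[str]
    aspects_under_test: List[str]
    test_procedure: str
    limitations: List[str] = field(default_factory=list)  # listed only "if any"
    accessibility_support: Optional[str] = None  # optional item

    def missing_items(self) -> List[str]:
        """Return the names of required items that are empty."""
        required = {
            "title": self.title,
            "identifier": self.identifier,
            "description": self.description,
            "accessibility_requirements": self.accessibility_requirements,
            "aspects_under_test": self.aspects_under_test,
            "test_procedure": self.test_procedure,
        }
        return [name for name, value in required.items() if not value]


# Example values are invented for illustration.
rule = ActRule(
    title="Image has a text alternative",
    identifier="example-rule-001",
    description="Checks that each img element has a text alternative.",
    accessibility_requirements=["WCAG 2.0 success criterion 1.1.1"],
    aspects_under_test=["DOM Tree"],
    test_procedure="Applicability: every img element. Expectation: it has a text alternative.",
)
print(rule.missing_items())  # prints []
```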

3.2. Rule Description

An ACT Rule MUST have a description that is in plain language and provides a brief explanation of what the rule does.

3.3. Accessibility Requirements

An ACT Rule MUST identify the accessibility requirements to which the rule maps (for example, WCAG 2.0 success criterion 1.1.1). An ACT Rule is a complete or partial test for one or more accessibility requirements.

Outcomes from an ACT Rule MUST be consistent with the accessibility requirement, e.g. a rule only returns the outcome Fail when the content fails the accessibility requirement. This means that the rule maps to the accessibility requirement, as opposed to it merely being related to the requirement, thematically or otherwise.

The actual definition of specific accessibility requirements is beyond the scope of ACT Rules and of this document. For WCAG 2, Success Criteria are considered to be accessibility requirements. Some organizations have additional accessibility requirements, such as specific implementation techniques to meet WCAG 2 Success Criteria.

3.4. Limitations, Assumptions or Exceptions

An ACT Rule MUST list any limitations, assumptions or exceptions for the test, the test environment, the technologies being used or the subject being tested. For example, a rule for Success Criterion 1.4.1: Use of Color has to make an assumption that CSS is used to make a link visually evident, typically by using CSS background, border, color, font, or text-decoration properties.

4. Accessibility Support

ACT Rules are designed to test accessible uses of a web technology. However, not every part of a web technology is implemented in all assistive technologies a website may need to support. The concept of accessibility supported use of a Web technology is defined in [WCAG20]. Because of this, ACT Rules are not necessarily applicable in all test scenarios. For instance, a web page that has to work in assistive technologies that have no WAI-ARIA support, wouldn’t be tested with an ACT Rule that relies on WAI-ARIA support, since this could lead to false positive results.

Even within a rule group, certain individual rules are not always applicable. This leaves one fewer solution for passing that particular rule group. Because of this, ACT Rules have to be atomic, and so an ACT Rule MUST NOT rely on other rules to be part of the same test scenario.

To support users of ACT Rules in properly defining their test scenarios, an ACT Rule SHOULD include a warning if there are significant accessibility support concerns known about a rule.

5. Aspects Under Test

ACT Rules can operate on different aspects of the subject being tested. An aspect is a distinct part of the test subject itself or its underlying implementation. For example, rendering a web page to an end user involves multiple different technologies, some or all of which may be of interest to an ACT Rule. Some rules may need to operate directly on the HTTP messages exchanged between a server and a client, while others need to operate on the DOM tree exposed by a web browser. Some rules may even need to operate on several aspects simultaneously such as both the HTTP messages and the DOM tree.

An ACT Rule MUST include a description of all the aspects under test by the rule. Each aspect MUST be discrete with no overlap between the aspects. Some aspects are already well defined within the context of web content, such as HTTP messages, DOM tree, and CSS styling, and do not warrant a detailed description. Other aspects may not be well defined or even specific to web content. In these cases, an ACT Rule SHOULD include a detailed description, or a reference to one, of the aspect in question.

5.1. Common Aspects

5.1.1. HTTP Messages

The HTTP messages exchanged between a client and a server as part of requesting a web page may be of interest to ACT Rules, for example to perform validation of HTTP headers or of unparsed HTML and CSS.

5.1.2. DOM Tree

The DOM tree constructed from parsing HTML, and optionally executing DOM manipulating JavaScript, may be of interest to ACT Rules to test the structure of web pages. In the DOM tree, information about individual elements of a web page, and their relations, becomes available.

The means by which the DOM tree is constructed, be it by a web browser or not, is not of importance as long as the construction follows any applicable specifications that might exist, such as [DOM].

5.1.3. CSS Styling

The computed CSS styling resulting from parsing CSS and applying it to the DOM may be of interest to ACT Rules that wish to test the web page as presented to the user. Through CSS styling, information about the position, the foreground and background colors, the visibility, and more, of elements becomes available.

The means by which the CSS styling is computed, be it by a web browser or not, is not of importance as long as the computation follows any applicable specifications that might exist, such as [CSSOM].

5.1.4. Accessibility Tree

The accessibility tree constructed from extracting information from both the DOM tree and the CSS styling may be of interest to ACT Rules. This can be used to test the web page as presented to assistive technologies such as screen readers. Through the accessibility tree, information about the semantic roles, accessible names and descriptions, and more, of elements becomes available.

The means by which the accessibility tree is constructed, be it by a web browser or not, is not of importance as long as the construction follows any applicable specifications that might exist, such as [CORE-AAM-1.1].

5.1.5. Language

Language, either written or spoken, as available in nodes of the DOM or accessibility trees may be of interest to ACT Rules that wish to test things like complexity or intention of the language. For example, an ACT Rule might wish to test that certain paragraphs of text within the DOM tree do not exceed a certain readability score or that the text alternative of an image provides a sufficient description thereof.

The means by which the language is assessed, be it by a person or a machine, is not of importance as long as the assessment meets the criteria defined in Requirements for WCAG 2.0 Checklists and Techniques §humantestable.

6. ACT Test Procedure

6.1. Applicability

The applicability section is a required part of an ACT Rule. It MUST contain a precise description of the parts of the test subject to which the rule applies, for example specific nodes in the DOM tree, or tags that are incorrectly closed in an HTML document. These are known as the test targets.

Applicability MUST be described objectively, unambiguously and in plain language. When a formal syntax (such as a CSS selector or XPath) can be used, that (part of the) applicability MAY use that syntax in addition to the plain language description. While testing, if nothing within the test subject matches the applicability of the rule, the result is inapplicable.

An objective description is one that can be resolved without uncertainty in a given technology. Examples of objective properties in HTML are element names, their computed role, the spacing between two elements, etc. Subjective properties, on the other hand, are concepts like decorative, navigation mechanism and pre-recorded. Even concepts like headings and images can be misunderstood: does the term refer to the tag name, the accessibility role, or its purpose in the web page? The latter is almost impossible to define with objectivity. When used in applicability, these concepts MUST have an objective definition. This definition can be part of a larger glossary shared between rules.

6.2. Expectations

An ACT Rule MUST contain one or more expectations. An expectation is a statement that must be true about each test target (see Applicability). Each expectation MUST be unambiguous and be written in plain language. Unlike in applicability, a certain level of subjectivity is allowed in expectations, meaning that the expectation has only one possible meaning, but that meaning isn’t fully quantifiable.

Editor note - The task force is looking for feedback on whether expectations should be allowed to reference each other, or whether each must be testable independently of the others. For details, see Github issue #173.

When all expectations are true for a test target, it passed the rule. If one or more expectations are false, the test target failed the rule. If the rule is part of a rule group, a test target that failed a rule may still meet the requirement (see rule grouping for details).

A rule for labels of HTML input elements may have the following expectations:

The test target has an accessible name (as described in Accessible Name and Description: Computation and API Mappings 1.1).
The accessible name describes the purpose of the test target.
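The two steps of a test procedure (find the test targets through the applicability, then check the expectations for each target) can be sketched as follows. This is a minimal illustration, not a conforming implementation: the applicability is reduced to "every input element", and the `has_name` helper is a deliberately simplified stand-in that looks only at the `aria-label` and `title` attributes instead of performing the full accessible name computation. The second expectation above (the name describes the purpose) is subjective and is not automated here. All class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class InputCollector(HTMLParser):
    """Applicability step: collect the attributes of every input element."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.targets.append(dict(attrs))


def has_name(attrs: Dict[str, Optional[str]]) -> bool:
    """Simplified expectation: a non-empty aria-label or title attribute.

    Assumption: this ignores label elements, aria-labelledby and the rest
    of the real accessible name computation.
    """
    return any((attrs.get(key) or "").strip() for key in ("aria-label", "title"))


def run_rule(html: str) -> List[Tuple[str, str]]:
    """Return (pointer, outcome) pairs; an empty list means inapplicable."""
    collector = InputCollector()
    collector.feed(html)
    collector.close()
    return [
        (f"input[{index}]", "Passed" if has_name(attrs) else "Failed")
        for index, attrs in enumerate(collector.targets)
    ]


results = run_rule('<form><input aria-label="Search"><input type="text"></form>')
print(results)
```

An empty result list corresponds to the inapplicable case described in the Applicability section, since nothing in the test subject matched.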

7. Rule Grouping

In accessibility testing, there are often multiple ways to solve the same problem. For instance, in HTML tables, header cells can be indicated through the scope attribute, by using the headers attribute, or by using ARIA labels. All of these separate techniques could be described in one big rule, but this creates a large rule that is often difficult to maintain. To ensure rules are kept small and atomic, they SHOULD be put into a rule group instead.

To meet the accessibility requirement, only one rule in a rule group has to pass. This way, for our table example, one could write three separate rules: one for scope, one for headers+id and one for ARIA labels. If a table meets any of these rules, it automatically passes the group, and the failed results of the other rules can be ignored.

Editor note - The ACT Task Force is looking for feedback about the use of rule groups. We are considering whether a group should be allowed to have a single rule because, although it adds some complexity, it minimizes change if rules are added in the future. Additionally, we are considering allowing a group to require more than one pass before the group passes. Particularly in an example of WCAG 2.0’s Multiple Ways success criterion, this may be useful. For details, see Github issue #161.

A test target passes a rule group if at least one rule within that group was applicable to it and returned a passed result. Likewise, a test target fails the rule group if at least one rule in the group was applicable to it but none of them passed it. In the example of HTML tables, passing the table scope rule and failing the ARIA label and headers+id rules would mean the test target passed this group of rules.

Note that rules in a group MAY have different applicability. Because of this, not every element applicable within the group is tested by every rule. Rules MAY also be disabled during a test run due to accessibility support concerns. See Accessibility support for details.
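The pass and fail conditions for a rule group can be sketched as a small function. The function name, the rule identifiers, and the use of `None` to mark a rule that was not applicable to the test target (or was disabled for the test run) are assumptions made for this illustration.

```python
from typing import Dict, Optional


def group_outcome(rule_outcomes: Dict[str, Optional[str]]) -> str:
    """Combine per-rule outcomes for ONE test target into a group outcome.

    rule_outcomes maps a rule identifier to "Passed" or "Failed", or to
    None when the rule was not applicable to the target or was disabled.
    """
    applicable = [outcome for outcome in rule_outcomes.values() if outcome is not None]
    if not applicable:
        return "Inapplicable"
    # A single passing rule is enough; other failures are ignored.
    return "Passed" if "Passed" in applicable else "Failed"


# The HTML table example: scope passes, the other two rules fail.
table_result = group_outcome({
    "table-scope": "Passed",
    "table-headers-id": "Failed",
    "table-aria-label": "Failed",
})
print(table_result)  # prints Passed
```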

8. ACT Data Format (Output Data)

With ACT Rules, it becomes important that data coming from different sources can be compared. Only by having a shared vocabulary can accessibility data that is produced by different auditors be compared and, where necessary, aggregated. Every ACT Rule MUST therefore require that the output is expressed in a format that has all of the features described in the ACT Data Format.

Rules are tested in two steps. First, the applicability is used to find a list of test targets (elements, tags or other "components") within the web page or other test subject. Following this, each test target is tested to see if all expectations are true. This will give the outcome for each test target. For contextual information, the output data must also include the test subject and the rule identifier.

Editor note - The ACT Task Force is investigating to what extent a shared data format may or may not be necessary for the rules format, and if it is necessary, how in depth it should be. For details, see Github issue #162.

This will mean that every time a rule is executed on a page, it will return a set with zero or more results, each of which MUST have at least the following properties:

test (Rule ID)
subject (Web page)
pointer (Test target)
outcome (Passed or Failed)

Output data using EARL and JSON-LD

{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": "https://example.org/",
  "test": "auto-wcag:rules/SC1-1-1-css-image.html",
  "result": {
    "outcome": "Failed",
    "pointer": "html > body > h1:first-child"
  }
}
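As a rough sketch, a result with the four required properties could be serialized into the structure shown above as follows. The function name `make_assertion` is hypothetical; the context URL and property names are copied from the example, and nothing here is a normative API.

```python
import json
from typing import Any, Dict

# Context URL taken from the example above.
EARL_CONTEXT = "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json"


def make_assertion(test: str, subject: str, pointer: str, outcome: str) -> Dict[str, Any]:
    """Build one result with the test, subject, pointer and outcome properties."""
    if outcome not in ("Passed", "Failed"):
        raise ValueError("a result for a test target is either Passed or Failed")
    return {
        "@context": EARL_CONTEXT,
        "@type": "Assertion",
        "subject": subject,
        "test": test,
        "result": {"outcome": outcome, "pointer": pointer},
    }


assertion = make_assertion(
    test="auto-wcag:rules/SC1-1-1-css-image.html",
    subject="https://example.org/",
    pointer="html > body > h1:first-child",
    outcome="Failed",
)
print(json.dumps(assertion, indent=2))
```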

8.1. Test Target

When representing the test target in the output data, it is often impractical or impossible to serialize the test target as a whole. Instead of this, a pointer can be used to indicate where the test target exists within the web page or other test subject. There are a variety of pointer methods available, such as those defined in Pointer Methods in RDF 1.0.

The pointer method used in the output data of an ACT Rule MUST include the pointer method used in Implementation Validation.

8.2. Outcome

The procedure of a rule MUST always return one of the following outcomes:

Passed: All expectations for the Test Target were true
Failed: One or more expectations for the Test Target was false
Inapplicable: There were no Test Targets in the Test Subject

8.3. Ensure Comparable Results

In addition to the Test Target and the Outcome, ACT Rule output data MUST include the following contextual information:

the Web page, file or other test subject the rule was applied to and
an identifier of the rule itself.

8.4. Test Subject

When a single URL can be used to reference the web page, or other test subject, this URL MUST be used. In scenarios where more complex actions are required to obtain the test subject (in the state that it is to be tested), it is left to ATT developers to determine which method is best used to express the test subject.

9. Rule Quality Assurance

9.1. Implementation Validation

Implementation tests are tests for accessibility test tools. These tests enable the developers and users of ATTs to validate the implementation of an ACT Rule. An ACT Rule MUST have implementation tests for the applicability, as well as for each expectation in the rule.

An implementation test consists of two parts: a piece of input data and an expected result. When applying the test, the piece of input data, for instance an HTML code snippet, is evaluated by following the rule’s test procedure. The result is then compared to the expected result of the test. The expected result consists of a list of pointers and the expected outcome (Passed, Failed, Inapplicable) of the evaluation.
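The comparison between actual and expected results can be sketched as below. The `ImplementationTest` type, the `validate` function, and the toy procedure (which treats every line containing `<img` as a test target and uses "line N" as the pointer) are all invented for this illustration; a real test procedure and pointer method would be considerably more involved.

```python
from typing import Callable, Dict, List, NamedTuple

# pointer -> "Passed" or "Failed"; an empty mapping stands for Inapplicable
Results = Dict[str, str]


class ImplementationTest(NamedTuple):
    input_data: str    # for instance an HTML code snippet
    expected: Results  # expected pointers and outcomes


def toy_procedure(snippet: str) -> Results:
    """Toy rule: each line containing '<img' is a target; it passes with alt=."""
    results: Results = {}
    for number, line in enumerate(snippet.splitlines(), start=1):
        if "<img" in line:
            results[f"line {number}"] = "Passed" if "alt=" in line else "Failed"
    return results


def validate(procedure: Callable[[str], Results],
             tests: List[ImplementationTest]) -> List[bool]:
    """Run the procedure on each input and compare with the expected result."""
    return [procedure(test.input_data) == test.expected for test in tests]


tests = [
    ImplementationTest('<img src="a.png" alt="Logo">', {"line 1": "Passed"}),
    ImplementationTest('<img src="a.png">', {"line 1": "Failed"}),
    ImplementationTest("<p>No images here</p>", {}),  # Inapplicable
]
print(validate(toy_procedure, tests))  # prints [True, True, True]
```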

9.2. Accuracy Benchmarking

The web is ever changing, and technologies are used in such diverse and creative ways that it is impossible to predict in advance all the ways that accessibility issues can occur and all the ways they can be solved. When writing ACT Rules, it is almost inevitable that exceptions will be overlooked during the design of a rule, or that new technologies will emerge that introduce new exceptions.

This makes it important to be able to regularly test if the rule has the accuracy that is expected of it. This can be done by benchmark testing. In benchmark testing, the accuracy of a rule is measured by comparing its results to those obtained through accessibility expert testing.

The accuracy of a rule is the average between the false positives and false negatives, which are in turn calculated as follows:

False positives: This is the percentage of test targets that were failed by the rule, but were not failed by an accessibility expert.
False negatives: This is the percentage of test targets that were passed by the rule, but were failed by an accessibility expert.

There are several ways this can be done. For instance, accessibility test tools can implement a feature which lets users indicate that a result is in error, or pages for which accessibility results are known can be tested using an ATT and the results compared. To compare results from ACT Rules to those of expert evaluations, data aggregation may be necessary.
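The false positive and false negative percentages defined above can be computed along these lines. The text does not state the denominator of the percentages; this sketch assumes it is the number of test targets judged by both the rule and the expert. The function name and result keys are hypothetical, and the "accuracy" value simply follows the definition given above (the average of the two percentages).

```python
from typing import Dict


def benchmark(rule_results: Dict[str, str], expert_results: Dict[str, str]) -> Dict[str, float]:
    """Compare rule outcomes with expert outcomes, keyed by test target pointer.

    Assumption: percentages are taken over the targets present in both inputs.
    """
    shared = rule_results.keys() & expert_results.keys()
    if not shared:
        raise ValueError("no test targets in common to compare")
    false_positives = sum(
        1 for p in shared
        if rule_results[p] == "Failed" and expert_results[p] != "Failed"
    )
    false_negatives = sum(
        1 for p in shared
        if rule_results[p] == "Passed" and expert_results[p] == "Failed"
    )
    fp_pct = 100 * false_positives / len(shared)
    fn_pct = 100 * false_negatives / len(shared)
    return {
        "false_positives": fp_pct,
        "false_negatives": fn_pct,
        "accuracy": (fp_pct + fn_pct) / 2,  # average, as defined in the text
    }


# Invented data: the rule wrongly fails "b" and wrongly passes "c".
rule = {"a": "Failed", "b": "Failed", "c": "Passed", "d": "Passed"}
expert = {"a": "Failed", "b": "Passed", "c": "Failed", "d": "Passed"}
print(benchmark(rule, expert))
```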

9.3. Rule Aggregation

As described in section §8 ACT Data Format (Output Data), a rule will return a list of results, each of which contains 1) an outcome (Passed or Failed), 2) the test target, 3) the Rule ID and 4) the test subject. Data expressed this way has a great deal of detail, as it gives multiple pass / fail results for each rule.

Most expert evaluations do not report results at this level of detail. Often reports are limited to giving a single outcome (Passed, Failed, Inapplicable) per page, for each success criterion (or other accessibility requirement). To compare the data, results from rules should be combined, so that they are at the same level.

When all rules pass, that does not mean that all accessibility requirements are met. Only if the rules can test 100% of what should be tested can this claim be made. Otherwise the outcome for a criterion is inconclusive.

Example: An expert evaluates a success criterion to fail on a specific page. When testing that page using ACT Rules, there are two rules that map to this criterion. The first rule returns no results. The second rule finds 2 test targets that pass, and a 3rd test target that fails.

In this example, the first rule is inapplicable (0 results), and the second rule has failed (1 fail, 2 pass). Combining this inapplicable outcome and this failed outcome means the success criterion has failed.

See Appendix 1: Aggregation examples, using JSON-LD and EARL for how this could be expressed.
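The combination used in this example can be sketched in two small steps: test target outcomes are combined into a rule outcome, and rule outcomes into an outcome for the success criterion. The function names, the `complete_coverage` flag, and the "Inconclusive" label are assumptions made for this illustration, and rule groups are ignored for simplicity.

```python
from typing import List


def rule_outcome(target_outcomes: List[str]) -> str:
    """Combine the outcomes of all test targets found by one rule on a page."""
    if not target_outcomes:
        return "Inapplicable"  # the rule returned no results
    return "Failed" if "Failed" in target_outcomes else "Passed"


def criterion_outcome(rule_outcomes: List[str], complete_coverage: bool = False) -> str:
    """Combine rule outcomes into one outcome for a success criterion.

    complete_coverage is a hypothetical flag stating that the rules test
    everything that should be tested for this criterion.
    """
    if "Failed" in rule_outcomes:
        return "Failed"
    if not complete_coverage:
        return "Inconclusive"  # no failure found, but coverage is partial
    return "Passed" if "Passed" in rule_outcomes else "Inapplicable"


first_rule = rule_outcome([])                               # no results
second_rule = rule_outcome(["Passed", "Passed", "Failed"])  # 2 pass, 1 fail
print(criterion_outcome([first_rule, second_rule]))         # prints Failed
```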

9.4. Update Management

9.4.1. Change Log

It is important to keep track of changes to the ACT rules so that users of the rules can understand if changes in test results are due to changes in the rules used when performing the tests, rather than changes in the content itself. All changes to an ACT Rule that can change the outcome of a test MUST be recorded in a change log. The change log can either be part of the rule document itself or be referenced from it.

Each new release of an ACT Rule MUST be identifiable with either a date or a version number. Additionally, a reference to the previous version of that rule MUST be available. It is recommended that for extensive changes, a new rule is created and the old rule is deprecated.

An example of when a new rule should be created would be when going from a rule that tests the use of a blink element, to a rule that looks for animated style changes. This potentially adds lots of new issues that were previously out of scope. But for that same rule, adding an exception to allow blink elements positioned off screen, should be done by updating the existing rule.

9.4.2. Issues List

An ACT Rule MAY include an issues list. This list must be used to record cases in which the ACT Rule might return a failure where it should have returned a pass or vice versa. There may be several reasons why this might occur, including:

Certain scenarios or uses of technologies are very rare and were not included in the rule for that reason.
Certain accessibility features are impossible to test within the test environment. They might for instance only be testable by accessing the accessibility API, require screen capturing, etc.
The scenario did not exist (due to changing technologies) or was overlooked during the initial design of the rule.

The issues list serves two purposes. For users of ACT Rules, it gives insight into why an inaccurate result might have occurred, and provides confidence in the result of that rule. For the designer of the rule, the issues list is also useful to plan future updates to the rule. A new version of the rule might see issues resolved and the item moved to the change log.

Appendix 1: Aggregation examples, using JSON-LD and EARL

Example:

{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": "https://example.org/",
  "test": "auto-wcag:SC1-1-1-css-image.html",
  "result": {
    "outcome": "Failed",
    "source": [{
      "test": "auto-wcag:SC1-1-1-css-image.html",
      "result": {
        "outcome": "Failed",
        "pointer": "html > body > h1:first-child"
      }
    }, {
      "test": "auto-wcag:SC1-1-1-css-image.html",
      "result": {
        "outcome": "Passed",
        "pointer": "html > body > h1:nth-child(2)"
      }
    }]
  }
}

Example: Aggregate rules to a WCAG success criterion

{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": "https://example.org/",
  "test": {
    "@id": "wcag20:#text-equiv-all",
    "title": "1.1.1 Non-text Content"
  },
  "result": {
    "outcome": "Failed",
    "source": [{
      "test": "auto-wcag:SC1-1-1-css-image.html",
      "result": {
        "outcome": "Failed",
        "pointer": "html > body > h1:first-child"
      }
    }, {
      "test": "auto-wcag:SC1-1-1-longdesc.html",
      "result": {
        "outcome": "Passed",
        "pointer": "html > body > img:nth-child(2)"
      }
    }]
  }
}

Example: Aggregate a list of results to a result for the website

{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": {
    "@type": ["WebSite", "TestSubject"],
    "@value": "https://example.org/"
  },
  "test": "http://www.w3.org/WAI/WCAG2A-Conformance",
  "result": {
    "outcome": "Failed",
    "source": [{
      "test": "wcag20:text-equiv-all",
      "result": {
        "outcome": "Failed",
        "source": [ … ]
      }
    }, {
      "test": "wcag20:media-equiv-av-only-alt",
      "result": {
        "outcome": "Passed",
        "source": [ … ]
      }
    }, {
      "test": "wcag20:media-equiv-captions",
      "result" : {
        "outcome": "Inapplicable",
        "source": [ … ]
      }
    }, … ]
  }
}