Challenges with Accessibility Guidelines Conformance and Testing, and Approaches for Mitigating Them

Abstract

This document explores the page-based conformance verification approach used by WCAG 2.0 and 2.1 accessibility guidelines. It explains how this approach is challenging to apply to certain websites and web applications. It also explores ideas on how future versions of guidelines might address these challenges. This document focuses primarily on challenges to large, highly complex, dynamic sites. Other efforts in WAI are looking at different aspects of conformance for other types of sites.

The challenges covered broadly fall into five main areas:

Numerous provisions need human involvement to test and verify conformance, which is especially challenging to scale for large websites and for dynamic websites;
Large and dynamic sites with their changing permutations may be difficult to validate;
Third parties frequently add and change content on large and dynamic sites;
Applying a web-based and page-based conformance model can be challenging for non-web Information and Communications Technologies (ICT);
The centrality of Accessibility Supported in the many provisions tied to use with assistive technologies and platform accessibility features, combined with the lack of definition of what constitutes Accessibility Supported, further exacerbates the need for expert human judgement (#1 above), as well as the difficulty of including 3rd party content that may rely on a different and non-overlapping set of these features (#3 above).

The purpose of this document is to help understand those challenges more holistically, and explore approaches for mitigating them so that we can address such challenges more fully in future accessibility guidelines including the forthcoming W3C Accessibility Guidelines (WCAG 3.0) (now in early development) where the W3C Working Group Charter expressly anticipates a new conformance model.

Introduction

Problem Statement

Assessing the accessibility of a website is of critical importance. Website authors want to have website accessibility assessments in order to understand the places where visitors with disabilities may be unable to use a site in order to alleviate those difficulties. External parties who have an interest in the accessibility of a website likewise want to have website assessments so they can understand whether the site meets their accessibility fitness criteria. To aid in this assessment, the Accessibility Guidelines Working Group (AGWG) of the World Wide Web Consortium (W3C) developed the Web Content Accessibility Guidelines (WCAG), containing both a rich set of success criteria to meet the needs of people with disabilities and conformance requirements for the same. Assessing conformance of a website to all of the success criteria is how accessibility assessments have been done to date, either through assessing every individual page, or through a page sampling approach.

While the challenges discussed in this document apply to websites and web applications broadly, this document focuses particularly on situations involving large, dynamic, and complex websites. There are important reasons WCAG 2 and related resources have the guidelines and success criteria they do. Failure to conform to these is likely to erect a barrier for people with disabilities. The issues raised in this document do not mean sites should not strive to conform to WCAG 2. But it is also vital to consider aspects that commonly lead to accessibility challenges found during late stage testing, such as a lack of accessibility prioritization, training, and integration throughout the design, development, and maintenance process. A new version of accessibility guidelines, W3C Accessibility Guidelines 3.0 (WCAG 3.0), rethinks all aspects of accessibility guidance. It is also expressly chartered to develop a new conformance model which should help address the challenges explored in this document.

Large websites are often highly complex, with substantial dynamic content, including content updates, new content, and user interface changes that happen almost continuously, perhaps at the rate of hundreds or even thousands of page updates per second. This is especially the case where third parties are actively populating and changing site content, such as website users contributing content. Ensuring that every one of these page updates fully satisfies all success criteria (as appropriate), especially where expert human review is required for some criteria, presents a problem for scaling conformance assessments. Further, where pages are generated programmatically, finding every bug related to that generation may prove challenging, especially when bugs arise only from uncommon content scenarios or combinations (and updates to the generating algorithms and code happen multiple times per week). It is incumbent on websites, especially large, complex, dynamic websites, to do everything they can to conform. However, to date, no large, complex software has been bug free. Similarly, authors of large, dynamic, and complex sites have struggled to claim conformance with no accessibility defects on any page.

Assessing conformance of such large, highly complex, dynamic sites to the Web Content Accessibility Guidelines (WCAG) 2.0 [wcag20] or 2.1 [wcag21] has proved difficult. The Web Content Accessibility Guidelines 2.0 include a set of normative requirements in order for a web page to conform to WCAG 2.0, including that conformance is for full web page(s) only, and cannot be achieved if part of a web page is excluded, along with a Note that states: "If a page cannot conform (for example, a conformance test page or an example page), it cannot be included in the scope of conformance or in a conformance claim." The conformance requirements also state what is allowed in any optional Conformance Claims, starting with: "Conformance is defined only for web pages. However, a conformance claim may be made to cover one page, a series of pages, or multiple related web pages." For the purposes of this document, we use the term "WCAG 2.x conformance model" to refer to the normative text in the Conformance section of WCAG 2.0 and WCAG 2.1.

This WCAG 2.x conformance model contains a mitigation related to partial conformance for 3rd party content (see Sec. 3.1: Treatment of 3rd party content and Statements of Partial Conformance below). Further, in recognition of these challenges, the W3C Note Website Accessibility Conformance Evaluation Methodology (WCAG-EM) 1.0 [wcag-em] was published in 2014 to provide guidance on evaluating how well websites conform to the Web Content Accessibility Guidelines. This W3C document describes a procedure to evaluate websites and includes considerations to guide evaluators and to promote good practice. It can help organizations to make a conformance claim, while acknowledging that there may be errors on pages not in the sample set, or errors that automated evaluation tools did not pick up on pages outside the human evaluation sample. While WCAG-EM provides a practical method for claiming conformance for a website, it doesn't fully address the challenges in making every part of every page in a large, dynamic website conform to every success criterion.

Also, the Authoring Tool Accessibility Guidelines 2.0 (ATAG) [atag20] provides guidelines for designing web content authoring tools that are both more accessible to authors with disabilities and designed to enable, support, and promote the production of more accessible web content by all authors. Leveraging ATAG-conformant authoring tools can significantly help with preventing accessibility issues from occurring on a website. Moreover, several challenges with conformance could better be addressed by authoring tools or user agents, both of which are scoped for nonnormative WCAG 3.0 guidance.

Further, the Research Report on Web Accessibility Metrics [accessibility-metrics-report] from the Research and Development Working Group (RDWG) explores the main qualities that website accessibility metrics need to consider in order to communicate accessibility in a simple form such as a number. Of particular interest, the report explored qualities such as the severity of an accessibility barrier and the time it takes for a site visitor to conduct a task, as alternative measures that could complement conformance-based metrics. Unfortunately, this research report was last updated in May 2014, and has not progressed to being a published W3C Note. The Research and Development Working Group was not renewed by W3C Membership in 2015, though a Research Questions Task Force, under the Accessible Platform Architecture (APA) Working Group, is able to look at similar issues.

Finally, the Accessibility Conformance Testing Task Force and ACT Rules Community Group (ACT-R) are working to standardize accessibility conformance testing methodologies. They are doing so by defining a W3C specification, Accessibility Conformance Testing (ACT) Rules Format 1.0 [act-rules-format-1.0], published as a W3C Recommendation in 2019, as well as by considering ways to output metrics around what the tests find. This could contribute to an alternative to the current conformance model, which requires a site to have no defects on any page to claim conformance. Whenever possible, they are turning WCAG success criteria into automated test rules to enable accessibility testing on a large scale. When effective automated tests are not possible, they are writing semi-automated or manual accessibility tests that are both informative and valuable. This work does not, however, speak to scaling tests that require human involvement, address the challenges of third-party content, or solve the problem of programmatically generated web pages. While they are all substantial contributions to the field of web accessibility, WCAG-EM, ATAG, and the ACT efforts address different types of challenges, and do not fully address the challenges described in this document.

Separately, the phrase substantially conforms to WCAG is a way of conveying that a website is broadly accessible, but that not every page conforms to every requirement. Unfortunately, that phrase has no W3C definition today and there is no definition or mechanism that sets a level for a site to meet that would qualify as substantial conformance.

Silver Research Findings

Now known as W3C Accessibility Guidelines (WCAG 3.0), this next iteration of W3C accessibility guidance was conceived and designed to be research-based, coming out of the Silver Research effort. A summary of the conformance-related findings from Silver research contributed to this document, and is provided in its entirety as Appendix C below. Findings that expand on the main challenges enumerated in this document are integrated throughout. Additional enumerated challenges identified by Silver research are also articulated as independent, specific challenges.

Working over many years, the Silver Task Force of the Accessibility Guidelines Working Group (AGWG) and the Silver Community Group collaborated with researchers on questions that the Silver Groups identified. This research was used to develop 11 problem statements that needed to be solved for Silver. The conformance-related Silver generated Problem Statements are included as originally submitted for this document below in Appendix C.

Mitigation Approaches

In addition to describing in detail the challenges of assessing conformance, this document also explores some approaches to mitigating those challenges. Each of the main challenges sections below describes one or more significant mitigation approaches.

While some approaches may be more applicable to a particular website design than others, and not all approaches may be appropriate or practical for a particular website, it is likely that many websites can utilize at least some of these approaches. They are proposed as ways by which these conformance challenges may be mitigated, while maximizing the likelihood that all website visitors will be able to use the site effectively. Though the challenges described in this document illustrate that it is difficult for large, complex, and/or dynamic websites to ensure a site has no defects, mitigation strategies may be able to help create a website that people with disabilities can use even though some conformance issues may persist.

Goals

This document has two key goals:

To develop, catalog, and characterize the challenges with accessibility guidelines conformance and conformance verification that have arisen both through the multi-year research process preceding work on W3C Accessibility Guidelines (WCAG 3.0), which has been in development under the name Silver, and through active discussion in the Silver Conformance Options Subgroup.
To develop, catalog, and characterize mitigation approaches to these challenges, so that websites can be as accessible as possible and better assessed for accessibility to visitors with disabilities.

A better understanding of the situations in which the WCAG 2.x conformance model may be difficult to apply could lead to more effective conformance models and testing approaches in the future.

It is important to recognize that success criteria in WCAG 2.x are quite distinct from the conformance model. These criteria describe approaches to content accessibility that are thoughtfully designed to enable people with a broad range of disabilities to effectively consume and interact with web content. Challenges with the conformance model do not in any way invalidate the criteria. For example, while requiring human judgment to validate a page limits testing to sampling of templates, flows, and top tasks, etc. (see Challenge #1 below), without that human judgement it may not be possible to deliver a page that makes sense to someone with a disability. Similarly, while it may not be possible to know that all third party content is fully accessible (see Challenge #3 below), without review of that content by someone sufficiently versed in accessibility it may not be possible to be sure that pages containing third party content fully conform to WCAG 2.x. Human judgement is a core part of much of WCAG 2.x for good reasons, and the challenges that arise from it are important to successfully grapple with.

Additional Background

This document is published to seek additional contributions from the wider web community on:

Any additional challenges, or further illustration of challenges in the existing identified areas below;
Contributions to the mitigation approaches, and questions or concerns about the mitigation approaches.

We seek to gain a thorough understanding of the challenges faced by large, complex, and dynamic websites that are attempting to provide accessible services to their website users. It is expected that a more thorough understanding of these challenges can lead to either a new conformance model, or an alternative model that is more appropriate for large, complex, and/or dynamic websites (in WCAG 3.0).

This document also includes previously published research from the Silver Task Force and Community Group that is related to Challenges with Accessibility Guidelines Conformance and Testing. There is some overlap between the challenges captured in this published research and the challenges enumerated in the first 4 sections of this document. The research findings have been folded into other sections of this document as appropriate.

Also present in this document is an introductory discussion of approaches to mitigate the impact of the challenges cited that have been suggested by various stakeholders. We are publishing this updated draft now to continue seeking wide review to further catalogue and characterize the challenges and mitigation approaches, so that this work can become input into W3C accessibility guidelines (WCAG 3.0).

1. Challenge #1: Scaling Conformance Verification

A challenge common to many success criteria is the inability of automated testing to fully validate conformance, and the subsequent time, cost, and expertise needed to perform the manual testing necessary to cover the full range of the requirements. HTML markup can be automatically validated to confirm that it is used according to specification, but a human is required to verify whether the HTML elements used correctly reflect the meaning of the content. For example, text on a web page marked as contained in a paragraph element may not trigger any failure in an automated test, nor would an image with alternative text equal to "red, white, and blue bird", but a human will identify that the text needs to be enclosed in a heading element to reflect the actual use on the page, and also that the proper alternative text for the image is "American Airlines logo". Many existing accessibility success criteria require an informed human evaluation to ensure that the human end-users benefit from conformance. The challenge is compounded for very large web-based applications that are developed in an agile manner with updates delivered in rapid succession, often on an hourly basis.

We can think of this as the distinction between quantitative and qualitative analysis. We know how to automatically test for and count the occurrences of relevant markup. However, we do not yet know how to automatically verify the quality of what that markup conveys to the user. In the case of adjudging appropriate quality, informed human review is still required.
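The quantitative side of this distinction can be illustrated with a minimal sketch. It assumes a hypothetical page fragment modeled on the example above, and uses only the Python standard library; the class name MarkupCounter and the function count_markup are illustrative and do not come from any accessibility tool.

```python
from html.parser import HTMLParser


class MarkupCounter(HTMLParser):
    """Counts markup facts a machine can check without understanding the content."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.images_missing_alt = 0
        self.headings = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
            # Presence of the attribute is testable; its quality is not.
            if "alt" not in dict(attrs):
                self.images_missing_alt += 1
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.headings += 1


def count_markup(html):
    counter = MarkupCounter()
    counter.feed(html)
    counter.close()
    return {
        "images": counter.images,
        "images_missing_alt": counter.images_missing_alt,
        "headings": counter.headings,
    }


# Hypothetical fragment: the first paragraph is visually a heading, and the
# alternative text describes the picture rather than its purpose.
SAMPLE_PAGE = """
<p>Flight Deals</p>
<p>Book by Friday to save on fares.</p>
<img src="logo.png" alt="red, white, and blue bird">
"""

report = count_markup(SAMPLE_PAGE)
print(report)
```

The count reports no image with a missing alt attribute, yet it says nothing about whether the first paragraph should have been a heading or whether the alternative text conveys the image's purpose. Those judgments remain with a human reviewer.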

Appendix A describes challenges with applying the WCAG 2.x conformance model to specific Guidelines and Success Criteria, primarily based on required human involvement in evaluation of conformance to them. The list is not exhaustive, but it covers the preponderance of known challenges with all A and AA Success Criteria.

1.1 Silver Research Findings

Silver research identified two further challenges related to scaling conformance verification:

In Constraints on What is Strictly Testable Silver finds that The requirement for valid and reliable testability for WCAG success criteria presents a structural barrier to including the needs of people with disabilities whose needs are not strictly testable. User needs such as those articulated by the W3C's Cognitive and Learning Disabilities (COGA) Task Force in their extensive W3C Note publication Making content usable for people with cognitive and learning disabilities [coga-usable] only expand and exacerbate the need for expert human testing. As Silver also notes: The entire principle of understandable is critical for people with cognitive disabilities, yet success criteria intended to support the principle are not easy to test for or clear on how to measure.
Silver also finds that Human evaluation does not yield consistent conclusions: Regardless of proficiency, there is a significant gap in how any two human auditors will identify a success or fail of criteria. … Ultimately, there is variance between: any two auditors; … Because there's so much room for human error, an individual may believe they've met a specific conformance model when, in reality, that's not the case. … There isn't a standardized approach to how the conformance model applies to success criteria at the organizational level and in specific test case scenarios.

1.2 Mitigations

There are a number of approaches for mitigating scaling challenges. For example, if pages can be built using a small number of page templates that are fully vetted for conformance to criteria relating to structure, heading use, and layout, then pages generated with those templates are much more likely to have well defined headings and structure. Further, if pages are limited to rendering images that come from a fully vetted library of images that have well defined ALT text, and these images are always used in similar contexts (e.g., a store inventory), then issues with poor ALT text can be minimized if not entirely eliminated. Another approach that can be used in some situations is to encode website content in a higher-level abstraction from HTML (e.g. a wiki-based website, where content authors can specify that a particular piece of text is to be emphasized strongly [which would be rendered within a <strong> </strong> block], but they cannot specify that a particular piece of text is to be made boldface [so that <b> </b> markup is never part of the website]). While none of these approaches can mitigate every challenge in conformance and testing with every success criterion, they are powerful approaches where applicable to help minimize accessibility issues in websites.
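The higher-level abstraction approach can be sketched as follows. This is a minimal illustration that assumes a hypothetical wiki syntax in which *asterisks* mark strong emphasis; the function name render_paragraph and the syntax itself are assumptions made for this example and are not taken from any particular wiki engine.

```python
import html
import re

# Hypothetical authoring syntax: *text* marks strong emphasis.
STRONG_PATTERN = re.compile(r"\*(.+?)\*")


def render_paragraph(source):
    """Render one paragraph of author text to HTML.

    Author input is escaped first, so any raw tags an author types
    (including <b>) are shown as literal text rather than emitted as markup.
    The only inline element this renderer can produce is <strong>.
    """
    escaped = html.escape(source)
    body = STRONG_PATTERN.sub(r"<strong>\1</strong>", escaped)
    return "<p>" + body + "</p>"


rendered = render_paragraph("Orders ship *today*. <b>Raw tags</b> are not honored.")
print(rendered)
```

Because authors never write HTML directly, vetting the renderer once covers every page it produces for this particular markup choice. It does not, of course, verify that authors use emphasis where emphasis is actually meant.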

5. Challenge #5: Accessibility Supported

The Conformance section of WCAG 2.1 states that only accessibility-supported ways of using technologies can be relied upon for conformance. However, the definition of Accessibility Supported, together with the relevant section of Understanding WCAG, leaves this conformance requirement under-specified by providing insufficient guidance on how it is to be realized.

The first Note under the definition of Accessibility Supported states that: The WCAG Working group and the W3C do not specify which or how much support by assistive technologies there must be for a particular use of a Web technology in order for it to be classified as accessibility supported. This is further expanded upon in the section Level of Assistive Technology Support Needed for Accessibility Support: This topic raises the question of how many or which assistive technologies must support a Web technology in order for that Web technology to be considered accessibility supported. The WCAG Working group and the W3C do not specify which or how many assistive technologies must support a Web technology in order for it to be classified as accessibility supported. This is a complex topic and one that varies both by environment and by language.

The centrality of Accessibility Supported in the many WCAG 2 success criteria that are tied to use with assistive technologies and platform accessibility features, combined with the lack of definition of what constitutes Accessibility Supported, means that expert human judgement is required to evaluate whether there is sufficient accessibility support for the specific technical implementations used to meet WCAG 2 success criteria. The lack of a broad base of accessibility expertise in the web development field, combined with the challenges mentioned above related to Challenge #1: Scaling Conformance Verification (when human evaluation is required), makes the Accessibility Supported requirement a further challenge unto itself. Understanding Conformance 2.0 further notes that: There is a need for an external and international dialogue on this topic. Meanwhile, the nonnormative WCAG evaluation methodology (WCAG-EM) advises approaching this conformance requirement by determining: the minimum set of combinations of operating systems, web browsers, assistive technologies, and other user agents that the website is expected to work with, and that is in-line with the WCAG 2.0 guidance on accessibility support.

Not only does this conformance requirement of WCAG 2.x ask the content provider to check their content markup with commonly used browsers, it also asks that they further check usability with an undefined range of assistive technologies on those same commonly used operating environments in order to make a conformance claim. This requirement greatly exacerbates Challenge #2: Large, complex, and dynamic websites may have too many changing permutations to validate effectively. It also exacerbates Challenge #3: 3rd party content, where third party content may have been shown to be Accessibility Supported with a set of browsers, access features, and assistive technologies that has no overlap with the set of browsers, access features, and assistive technologies that constitute Accessibility Support for the site hosting that third party content.

5.1 Silver Research Findings

Silver concluded that Accessibility Supported is a conformance requirement of WCAG 2 that is poorly understood and incompletely implemented, … i.e. the role of AT in assessing conformance. See additional details in Appendix C below.

5.2 Mitigations

We know of no useable mitigations to achieve the Accessibility Supported conformance requirement for public facing websites. WCAG-EM's Second Note suggests that: For some websites in closed networks, such as an intranet website, where both the users and the computers used to access the website are known, this baseline may be limited to the operating systems, web browsers and assistive technologies used within this closed network. It continues saying: However, in most cases this baseline is ideally broader to cover the majority of current user agents used by people with disabilities in any applicable particular geographic region and language community. Beyond placing the responsibility on the evaluator to establish this baseline, Note 5 in Understanding Conformance 2.0 suggests that: One way for authors to locate uses of a technology that are accessibility supported would be to consult compilations of uses that are documented to be accessibility supported. … Authors, companies, technology vendors, or others may document accessibility-supported ways of using Web content technologies. Unfortunately, we know of no such public repository.

A. Detailed Challenges with Scaling Conformance Verification for the Success Criteria in WCAG 2.1 A and AA

This appendix describes challenges with applying the WCAG 2.x conformance model to specific Guidelines and Success Criteria, primarily based on required human involvement in evaluation of conformance to them. The list is not exhaustive, but it covers the preponderance of known challenges with all A and AA Success Criteria. The purpose of this list is not to critique WCAG 2 nor to imply that sites and policies should not do their best, and strive to conform to it, but rather to indicate known areas for which it may not be possible to conform, and which a new conformance model would hopefully address.

We have seen the market respond to the increased demand for accessibility professionals, driven in part by the amount of required human involvement in the evaluation of conformance, with many international efforts such as the International Association of Accessibility Professionals (IAAP) that train and/or certify accessibility professionals. While this is resulting in downward pressure on costs of testing and remediation with more accessibility professionals becoming available to meet the need, it doesn't in and of itself eliminate the challenges noted below. Furthermore, for the near term, it appears the demand will be greater than the supply of this type of specialized expertise.

Also, the Website Accessibility Conformance Evaluation Methodology (WCAG-EM) 1.0 [wcag-em] lays out a strategy to combine human testing and automated testing. In the model, automation is used for a large number of pages (or all pages) and sampling is used for human testing. WCAG-EM suggests that the human evaluation sample might include template pages, component libraries, key flows (such as choosing a product and purchasing it, or signing up for a newsletter, etc.), and random pages. While WCAG-EM provides a practical method for claiming conformance for a website, it doesn't fully address the challenges in making every part of every page in a large, dynamic website conform to every success criterion.
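The combined strategy described above can be sketched in a few lines. This is only an illustration under stated assumptions: the page paths, the function name build_human_sample, and the choice of three random pages are all hypothetical, and WCAG-EM itself does not prescribe code or a particular sample size.

```python
import random


def build_human_sample(all_pages, template_pages, key_flow_pages, random_count, seed=0):
    """Return an ordered, de-duplicated list of pages for human evaluation."""
    sample = []
    # Structured part of the sample: templates and key flows chosen by the evaluator.
    for page in list(template_pages) + list(key_flow_pages):
        if page not in sample:
            sample.append(page)
    # Random part of the sample: drawn from pages not already selected.
    remaining = [page for page in all_pages if page not in sample]
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced
    sample.extend(rng.sample(remaining, min(random_count, len(remaining))))
    return sample


# Hypothetical site inventory: five fixed pages plus twenty product pages.
ALL_PAGES = ["/", "/search", "/cart", "/checkout", "/newsletter"] + [
    "/product/%d" % n for n in range(1, 21)
]

human_sample = build_human_sample(
    all_pages=ALL_PAGES,
    template_pages=["/", "/product/1"],
    key_flow_pages=["/product/1", "/cart", "/checkout", "/newsletter"],
    random_count=3,
)

# Pages outside the human sample receive automated checks only.
automated_only = [page for page in ALL_PAGES if page not in human_sample]
print(len(human_sample), "pages for human review;", len(automated_only), "automated only")
```

In this hypothetical inventory, most pages are never seen by a human evaluator, which is exactly the residual risk the paragraph above describes.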

Text Alternatives for Non-Text Content

Guideline 1.1

Text alternatives for images are an early, and still widely used, accessibility enhancement to HTML. Yet text alternatives remain one of the more intractable accessibility guidelines to assess with automated accessibility checking. While testing for the presence of alternative text is straightforward, and a collection of specific errors (such as labeling a graphic spacer.gif) can be identified by automated testing, human judgment remains necessary to evaluate whether or not any particular text alternative for a graphic is correct and conveys the true meaning of the image. Image recognition techniques are not mature enough to fully discern the underlying meaning of an image and the intent of the author in its inclusion. As a simple example, an image or icon of a paper clip would likely be identified by image recognition simply as a paper clip. However, when a paper clip appears in content often its meaning is to show there is an attachment. In this specific example, the alternative text should be attachment, not paper clip. Similarly, the image of a globe (or any graphical object representing planet Earth) can be used for a multiplicity of reasons, and the appropriate alternative text should indicate the reason for that use and not descriptive wording such as globe or Planet Earth. One not uncommon use of a globe today expands to allow users to select their preferred language, but there may be many other reasonable uses of such an icon.
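The kinds of errors that automation can catch here are illustrated by the following minimal sketch. The heuristics (alternative text that ends in an image file extension, or that is a single generic word) and the names AltTextChecker and PLACEHOLDER_WORDS are assumptions chosen for illustration, not rules taken from WCAG or from any specific testing tool.

```python
import re
from html.parser import HTMLParser

# Illustrative heuristics only; real tools use their own, larger rule sets.
FILENAME_PATTERN = re.compile(r"\.(gif|png|jpe?g|svg|webp)$", re.IGNORECASE)
PLACEHOLDER_WORDS = {"image", "picture", "graphic", "photo", "spacer"}


class AltTextChecker(HTMLParser):
    """Flags alt attributes that are missing or mechanically suspicious."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if "alt" not in attributes:
            self.findings.append((src, "missing alt attribute"))
            return
        # An empty alt is left alone: it is the usual way to mark decorative images.
        alt = (attributes["alt"] or "").strip()
        if FILENAME_PATTERN.search(alt):
            self.findings.append((src, "alt text looks like a file name"))
        elif alt.lower() in PLACEHOLDER_WORDS:
            self.findings.append((src, "alt text is a generic placeholder"))


def check_alt_text(html):
    checker = AltTextChecker()
    checker.feed(html)
    checker.close()
    return checker.findings


SAMPLE = """
<img src="a.gif" alt="spacer.gif">
<img src="clip.png" alt="paper clip">
<img src="chart.png">
"""

findings = check_alt_text(SAMPLE)
for src, problem in findings:
    print(src, "->", problem)
```

The paper clip image passes every check in this sketch, even though in context its alternative text should be attachment. Detecting that mismatch requires knowing what the icon means on the page, which is the human judgment described above.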

Time-Based Media

Guideline 1.2

Practices for creating alternatives to spoken dialog, and to describe visual content, were established in motion picture and TV content well before the world wide web came into existence. These practices formed the basis of the Media Accessibility User Requirements (MAUR) [media-accessibility-reqs] for time-based streaming media on the web in HTML5, which now supports both captioning and descriptions of video.

Yet, just as with text alternatives, automated techniques and testing aren't sufficient for creating and validating accessible alternatives to time-based media. For example, Automatic Speech Recognition (ASR) often fails when the speech portion of the audio is low quality, isn’t clear, or has background noise or sound-effects. In addition, current automated transcript creation software doesn't perform speaker identification, meaningful sound identification, or correct punctuation, all of which are necessary for accurate captioning. Work on automatically generated descriptions of video is in its infancy, and like image recognition techniques, doesn’t provide usable alternatives to video.

Similarly, while there is well articulated guidance on how to create text transcripts or captions for audio-only media (such as radio programs and audio books), automated techniques and testing again aren't sufficient for creating and validating these accessible alternatives. Knowing what is important in an audio program to describe to someone who cannot hear is beyond the state of the art. There are several success criteria under this Guideline that all share these challenges of manual testing being required to ensure alternatives accurately reflect the content in the media. These include:

Audio-only and video-only (Prerecorded) (Success Criterion 1.2.1)
Captions (Prerecorded) (Success Criterion 1.2.2)
Audio description or media alternative (Prerecorded) (Success Criterion 1.2.3)
Captions (Live) (Success Criterion 1.2.4)
Audio description (Prerecorded) (Success Criterion 1.2.5)

Info and Relationships

Success Criterion 1.3.1

Whether in print or online, the presentation of content is often structured in a manner intended to aid comprehension. Sighted users perceive structure and relationships through various visual cues. Beyond simple sentences and paragraphs, the sighted user may see headings with nested subheadings. There may be sidebars and inset boxes of related content. Tables may be used to show data relationships. Comprehending how content is organized is a critical component of understanding the content.

As with media above, automated testing can determine the presence of structural markup, and can flag certain visual presentations as likely needing that structural markup. But such automated techniques remain unable to decipher if that markup usefully organizes the page content in a way that a user relying on assistive technology can examine the page systematically and readily understand its content.

Meaningful Sequence

Success Criterion 1.3.2

Often the sequence in which content is presented affects its meaning. In some content there may even be more than one meaningful way of ordering that content. However, as with Info and Relationships above, automated techniques are unable to determine whether content will be presented to screen reader users in a meaningful sequence. For example, the placement of a button used to add something to a virtual shopping cart is very important for screen reader users, as improper placement can lead to confusion about which item is being added.

Sensory Characteristics

Success Criterion 1.3.3

Ensuring that no instructions rely on references to sensory characteristics presents similar challenges to ensuring that color isn't the sole indicator of meaning (Success Criterion 1.4.1) – it is testing for a negative, and requires a deep understanding of meaning conveyed by the text to discern a failure programmatically. For example, while instructions such as "select the red button" reference a sensory characteristic, "select the red button which is also the first button on the screen" may provide sufficient non-sensory context to not cause a problem (and multi-modal, multi-sensory guidance is often better for users with cognitive impairments or non-typical learning styles).

Orientation

Success Criterion 1.3.4

While an automated test can determine that the orientation is locked, full evaluation of conformance to this criterion is tied to whether it is essential for the content to be locked to one specific orientation (e.g. portrait or landscape views of an interface rendered to a cell phone). Determining conformance therefore requires human judgment to confirm that, any time the orientation is locked, that orientation is essential to the content. This evaluation is not yet fully automatable.

Identify Input Purpose

Success Criterion 1.3.5

An automated test can easily determine that input fields use HTML markup to indicate the input purpose. However, manual verification is needed to determine that the correct markup was used to match the intent for the field. For example, for a name input field, there are 10 variations of HTML name purpose attributes with different meanings, and using the incorrect markup would be confusing to the user.
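A minimal Python sketch of the automatable half of this check follows. It assumes the form markup is available as an HTML string. `KNOWN_TOKENS` is an illustrative subset of the HTML `autocomplete` field names, not the full list, and `check_input_purpose` is a hypothetical helper name. The sketch looks at every `input` element, so a reviewer must still decide which fields are in scope for this criterion.

```python
from html.parser import HTMLParser

# Illustrative subset of HTML autocomplete field names; not the full list.
KNOWN_TOKENS = {
    "name", "honorific-prefix", "given-name", "additional-name",
    "family-name", "honorific-suffix", "nickname", "email", "tel",
}


class InputPurposeChecker(HTMLParser):
    """Records inputs whose autocomplete value is missing or unrecognized."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        field = attributes.get("name") or attributes.get("id") or "(unnamed)"
        value = (attributes.get("autocomplete") or "").strip().lower()
        if not value:
            self.findings.append((field, "missing autocomplete"))
        elif value.split()[-1] not in KNOWN_TOKENS:
            self.findings.append((field, f"unrecognized token: {value}"))
        # A recognized token is not reported, yet it may still be the wrong one.


def check_input_purpose(html):
    checker = InputPurposeChecker()
    checker.feed(html)
    return checker.findings
```

For instance, `check_input_purpose('<input name="first" autocomplete="family-name">')` returns an empty list even though a field named "first" probably collects a given name. That mismatch between token and intent is exactly what still needs manual verification.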

Use of Color

Success Criterion 1.4.1

This poses the same challenges as Sensory Characteristics (Success Criterion 1.3.3). To discern whether a page fails this criterion programmatically requires understanding the full meaning of the related content on the page and whether any meaning conveyed by color is somehow also conveyed in another fashion (e.g. whether the meaning of the colors in a bar chart is conveyed in the body of associated text or with a striping/stippling pattern as well on the bars, or perhaps some other fashion).

Audio Control

Success Criterion 1.4.2

An automated test tool would be able to identify media/audio content in a website, identify whether auto-play is turned on in the code, and also determine the duration. However, an automated test tool cannot determine whether there is a mechanism to pause, stop the audio, or adjust the volume of the audio independent of the overall system volume level. This still requires manual validation.

Contrast (Minimum)

Success Criterion 1.4.3

Automated tools can check the color of text against the background in most cases. However, current state-of-the-art automated tools face several challenges with this success criterion, including (1) when background images are used, automated tests can't reliably check the minimum contrast of text against the image, especially if the image is a photograph or drawing with the text placed over it, and (2) situations in which context determines whether the requirement applies at all, such as text that is incidental because it is part of an inactive user interface component, is purely decorative, or is part of a logo. These cases require human intervention to sample the text and its background and determine whether the contrast meets the minimum requirement.
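For the simple case of solid text on a solid background, the calculation itself is mechanical. The sketch below, in Python, follows the relative luminance and contrast ratio definitions in WCAG 2.x, with the 4.5:1 threshold for normal text and 3:1 for large text. The function names are hypothetical, and colors are assumed to be opaque sRGB triples of integers from 0 to 255.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter color."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum_contrast(foreground, background, large_text=False):
    """Apply the 4.5:1 threshold, or 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

Black on white yields a ratio of 21:1. What this arithmetic cannot supply is the input: which pixels count as the background when text sits over a photograph, and whether the text is exempt because it is inactive, decorative, or part of a logo.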

Resize Text

Success Criterion 1.4.4

While automated tools can test whether it is possible to resize text on a webpage, it takes human evaluation to determine whether there has been a loss of content or functionality as a result of the text resizing.

Images of Text

Success Criterion 1.4.5

This poses the same challenge as Orientation (Success Criterion 1.3.4) - it is tied to whether it is essential for text to be part of an image. This requires human judgment, making this criterion not readily automatable. Additionally, methods of employing OCR on images will not accurately discern text of different fonts that overlap each other, or be able to recognize unusual characters or text with poor contrast with the background of the image.

Reflow

Success Criterion 1.4.10

While automated tests can detect the presence of vertical and horizontal scroll bars, there are currently no reliable tests to automate validating that there has been no loss in content or functionality. Human evaluation is also still needed to determine when two-dimensional scrolling is needed for content that requires two-dimensional layout for usage or meaning.

Non-text Contrast

Success Criterion 1.4.11

This success criterion requires several levels of checks that are difficult or impossible to automate as it allows for exceptions which require human intervention to examine the intent and potentially employ exceptions to comply with the guideline. Automated checks would have to include:

A way to identify UI components, which is easy for standard HTML elements, but more difficult for non-standard custom scripted components.
Whether the default user agent visual treatments for identifying UI components and states are being used (so that the exception can be utilized)
Where default treatments are not employed, a way to identify changes in state and then compare the two states for sufficient contrast. This requires human evaluation to test for differences in the portions of graphics used to show the different states or provide meaning (e.g. checked, unchecked, radio button selected, radio button unselected, toggle button selected vs. unselected and so on).
For graphical objects, a way to identify what part of the graphics are required to understand the content. Once identified, checks to determine whether the presentation of the graphics is essential to utilize the exception which requires human intervention.

Text Spacing

Success Criterion 1.4.12

This success criterion involves using a tool or method to modify text spacing and then checking to ensure no content is truncated or overlapping. There is currently no way to reliably automate validating that no loss of content or functionality has occurred when text spacing has been modified.

Content on Hover or Focus

Success Criterion 1.4.13

Because the additional content must first be surfaced by hovering with a mouse pointer or moving keyboard focus, and then checked against the following three conditions, this test currently requires human evaluation.

Dismissible: A mechanism is available to dismiss the additional content without moving pointer hover or keyboard focus, unless the additional content communicates an input error or does not obscure or replace other content;
Hoverable: If pointer hover can trigger the additional content, then the pointer can be moved over the additional content without the additional content disappearing;
Persistent: The additional content remains visible until the hover or focus trigger is removed, the user dismisses it, or its information is no longer valid.

Keyboard Operable

Success Criterion 2.1.1

While an automated test can evaluate whether a page can be tabbed through in its entirety, ensuring keyboard operability of all functionality currently requires a human to manually navigate through content to ensure all interactive elements are not only in the tab order, but can be fully operated using keyboard controls.

Character Key Shortcuts

Success Criterion 2.1.4

Character key shortcuts can be applied to content via scripting but whether and what these shortcut key presses trigger can only be determined by additional human evaluation.

Timing Adjustable

Success Criterion 2.2.1

There is currently no easy way to automate checking whether timing is adjustable. Ways of controlling differ in naming, position, and approach (including dialogs/popups before the time-out). This can also be affected by how the server registers user interactions (e.g. for automatically extending the time-out).

Pause, Stop, Hide

Success Criterion 2.2.2

Typically the requirement to control moving content is provided by interactive controls placed in the vicinity of moving content, or occasionally at the beginning of content. Since position and naming vary, this assessment cannot currently be automated (this involves checking that the function works as expected).

Three Flashes or Below Threshold

Success Criterion 2.3.1

There are currently no known automated tests that are accurately able to assess areas of flashing on a webpage to ensure that the flashing happens less than three times per second.

Bypass Blocks

Success Criterion 2.4.1

While it can be determined that native elements or landmark roles are used, there is currently no automated way to determine whether they are used to adequately structure content (for example, whether they leave out sections that should be included). The same assessment would be needed when other Techniques are used (structure by headings, skip links).

Page Titled

Success Criterion 2.4.2

Automating a check for whether the page has a title is simple; ensuring that the title is meaningful and provides adequate context as to the purpose of the page is not currently possible.
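As a minimal illustration in Python, the following sketch performs the part that is simple to automate. It assumes the page source is an HTML string and only considers the first `title` element it encounters. `PLACEHOLDER_TITLES` is a hypothetical list of obviously uninformative titles, and `title_status` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Hypothetical placeholder titles; a real list would be project specific.
PLACEHOLDER_TITLES = {"", "untitled", "untitled document", "home", "new page"}


class TitleCollector(HTMLParser):
    """Captures the text of the first title element in the document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def title_status(html):
    collector = TitleCollector()
    collector.feed(html)
    collector.close()
    if collector.title is None:
        return "fail: no title element"
    if " ".join(collector.title.split()).lower() in PLACEHOLDER_TITLES:
        return "fail: empty or placeholder title"
    # Presence is all a tool can confirm; meaning is left to a person.
    return "needs human review: title present"
```

The best outcome this sketch can report is "needs human review", which reflects the limit described above: a tool can confirm that a title exists, but not that it describes the purpose of the page.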

Focus Order

Success Criterion 2.4.3

There is currently no known way to automate ensuring that focus handling with dynamic content (e.g. moving focus to a custom dialog, keep focus in dialog, return to trigger) follows a logical order.

Link Purpose (In Context)

Success Criterion 2.4.4

Automated tests can check for the existence of links with the same name, as well as check whether links are qualified programmatically, but checking whether the link text adequately describes the link purpose still involves human judgment.
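The first of these automated checks can be sketched in Python as follows. The sketch assumes the page source is an HTML string and compares only the visible text of each link, ignoring `aria-label`, image alternatives, and surrounding context, all of which a fuller tool would need to consider. `ambiguous_link_texts` is a hypothetical helper name.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (normalized visible text, href) pairs for links with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace and ignore case when comparing link names.
            text = " ".join("".join(self._text).split()).lower()
            self.links.append((text, self._href))
            self._href = None


def ambiguous_link_texts(html):
    """Map each link text that points at more than one destination to its hrefs."""
    collector = LinkCollector()
    collector.feed(html)
    targets = defaultdict(set)
    for text, href in collector.links:
        targets[text].add(href)
    return {text: sorted(hrefs) for text, hrefs in targets.items() if len(hrefs) > 1}
```

Two "Read more" links with different destinations would be reported, but a single link whose text is "Click here" would not, even though it says nothing about its purpose. Judging adequacy is still left to a person.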

Multiple Ways

Success Criterion 2.4.5

Automated tests can validate whether pages can be reached in multiple ways (e.g. nav and search), but they will miss cases where exceptions hold (all pages can be reached from anywhere), so human validation is still required.

Headings and Labels

Success Criterion 2.4.6

Automated tests can detect the existence of headings and labels, however, there is currently no way to automate determining whether the heading or label provides adequate context for the content that follows.

Pointer Gestures

Success Criterion 2.5.1

There are currently no known automated checks that would accurately detect complex gestures - even when a script indicates the presence of particular events like touch-start, the event called would need to be checked in human evaluation.

Pointer Cancellation

Success Criterion 2.5.2

When mouse-down events are used (their presence can be detected automatically), checking for one of the following four options that ensure the functionality is accessible requires human evaluation:

No Down-Event: The down-event of the pointer is not used to execute any part of the function;
Abort or Undo: Completion of the function is on the up-event, and a mechanism is available to abort the function before completion or to undo the function after completion;
Up Reversal: The up-event reverses any outcome of the preceding down-event;
Essential: Completing the function on the down-event is essential.

Motion Actuation

Success Criterion 2.5.4

Motion activated events may be detected automatically but whether there are equivalents for achieving the same thing with user interface components currently requires human evaluation.

On Focus

Success Criterion 3.2.1

There is currently no reliable way to accurately automate checking whether a change caused by moving focus should be considered a change of content or context.

On Input

Success Criterion 3.2.2

There is currently no reliable way to accurately automate checking whether changing the setting of any user interface component should be considered a change of content or context, or to automatically detect whether relevant advice exists before using the component in question.

Error Identification

Success Criterion 3.3.1

Ensuring that an error message correctly identifies and describes the error, and does so in a way that provides adequate context, currently requires human evaluation.

Labels or Instructions

Success Criterion 3.3.2

Edge cases (labels close enough to a component to be perceived as a visible label) will require a human check. Some labels may be programmatically linked but hidden or visually separated from the element to which they are linked. Whether instructions are necessary and need to be provided will hinge on the content, so a human check is needed.

Error Suggestion

Success Criterion 3.3.3

Whether an error suggestion is helpful or correct currently requires human evaluation.

Name, Role, Value

Success Criterion 4.1.2

Incorrect use of ARIA constructs can be detected automatically but constructs that appear correct may still not work, and widgets that have no ARIA (but need it to be understood) can go undetected. Human post-check of automatic checks is still necessary.
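The sketch below, in Python, illustrates one heuristic of this kind and its blind spot. It assumes the page source is an HTML string and flags `div` and `span` elements that carry an inline `onclick` attribute but no `role` or `tabindex`. The heuristic and the name `clickable_findings` are illustrative assumptions, not an established test.

```python
from html.parser import HTMLParser

# Elements with no built-in role or keyboard behavior.
NON_INTERACTIVE = {"div", "span"}


class ClickableChecker(HTMLParser):
    """Flags non-interactive elements that have an inline click handler."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in NON_INTERACTIVE and "onclick" in attributes:
            if "role" not in attributes:
                self.findings.append(f"<{tag}> has onclick but no role")
            if "tabindex" not in attributes:
                self.findings.append(f"<{tag}> has onclick but no tabindex")


def clickable_findings(html):
    checker = ClickableChecker()
    checker.feed(html)
    return checker.findings
```

A widget whose handler is attached from a script, rather than through an inline attribute, produces no findings at all, and a widget with a syntactically valid role may still not behave as that role promises. Both cases are why a human post-check remains necessary.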

B. Detailed Challenges with Conformance Verification and Testing for Non-Web ICT

As noted in Challenge #4 Non-Web Information and Communications Technologies above, 18 success criteria out of the 38 A and AA criteria in WCAG 2.0 could be applied to non-web ICT only after replacing specific terms or phrases. 4 of those 18 (2.4.1, 2.4.5, 3.2.3, and 3.2.4) relate to either a set of web pages or multiple web pages, which is more difficult to characterize for non-web ICT. Another 4 are the non-interference set (1.4.2, 2.1.2, 2.2.2, and 2.3.1), which need further special consideration as they would apply to an entire software application. The remaining 10 were more straightforward to apply to non-web ICT, but still required some text changes.

Since publication of WCAG2ICT, [wcag2ict] WCAG 2.1 was published introducing a number of additional success criteria at the A and AA levels. Some of these may also pose specific challenges for conformance verification and testing in the non-web ICT context.

The 18 success criteria noted in WCAG2ICT are discussed below in four sections, the last of which addresses the 14 of the 38 A and AA criteria in WCAG 2.0 which relate to an Accessibility Supported interface, which may not be possible for software running in a closed environment (e.g. an airplane ticket kiosk).

B.1 Set of Web Pages Success Criteria

These four success criteria include either the term set of pages or multiple pages, which in the non-web ICT context becomes either a Set of Documents or a Set of Software Programs. In either case (document or software), whether the criterion applies is dependent upon whether such a set exists, which may require human judgment. Where that set is determined to exist, it may be difficult to employ programmatic testing techniques to verify compliance with the specific criterion.

Bypass Blocks

Success Criterion 2.4.1

To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must be searched for blocks of content that are repeated across all of those documents, and a mechanism to skip those repeated blocks. Since the blocks aren't necessarily completely identical (e.g. a repeated listing of all other documents in a set might not include the document containing that list), a tool to do this may not be straightforward, and in any case, no such tool is known to exist today to do this with non-web documents.

Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must be searched for blocks of content that are repeated across all of those applications, and a mechanism to skip those repeated blocks. Since the blocks aren't necessarily completely identical (e.g. a repeated listing of all other software in a set might not include the software application containing that list), a tool to do this may not be straightforward, and in any case, no such tool is known to exist today to do this with non-web software.
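The near-identical matching problem described above can be sketched in Python with fuzzy string comparison. This is an illustration under strong assumptions, not a known tool: each document or application is assumed to have already been reduced to a list of text blocks (itself a hard, format-specific step), and the 0.8 similarity threshold and the name `repeated_blocks` are arbitrary, hypothetical choices.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True if two text blocks are close enough to count as the same block."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def repeated_blocks(documents, threshold=0.8):
    """Return blocks of the first document that have a near match in all others.

    documents: list of documents, each given as a list of text blocks.
    """
    if len(documents) < 2:
        return []
    first, others = documents[0], documents[1:]
    return [
        block
        for block in first
        if all(any(similar(block, other, threshold) for other in doc) for doc in others)
    ]
```

Given three documents whose listings each omit the document itself, an exact comparison finds no repeated block while this fuzzy comparison does. Even so, the sketch only locates candidate blocks. Whether a mechanism exists to skip them would still need to be checked by a person.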

Multiple Ways

Success Criterion 2.4.5

To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must provide multiple mechanisms for locating every other document in the set. As noted by WCAG2ICT, if the documents are on a file system, it may be possible to browse through the files or programs that make up a set, or search within members of the set for the names of other members. A file directory would be the equivalent of a site map for documents in a set, and a search function in a file system would be equivalent to a web search function for web pages. However, if this is not the case, then the set of documents must expose at least 2 ways of locating every other document in the set. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.

Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must provide multiple mechanisms for locating every other application in the set. As noted by WCAG2ICT, if the software applications are on a file system, it may be possible to browse through the files or programs that make up a set, or search within members of the set for the names of other members. A file directory would be the equivalent of a site map for documents in a set, and a search function in a file system would be equivalent to a web search function for web pages. However, if this is not the case, then the set of software applications must expose at least 2 ways of locating every other application in the set. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.

Consistent Navigation

Success Criterion 3.2.3

To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must be searched for the navigation mechanisms it contains (e.g. a table of contents, an index). Every document in that set must then be inspected to verify it contains the same navigation mechanisms as every other document, in the same relative order. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.

Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must be searched for the navigation mechanisms it contains (e.g. the way keyboard commands are implemented should be the same for every application). Every application in that set must then be inspected to verify it contains the same navigation mechanisms as every other software application. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.
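Once a person has inventoried the navigation mechanisms in each member of the set, the comparison step itself is easy to script. The Python sketch below assumes such a hand-built inventory, given as a mapping from a document or application name to an ordered list of mechanism names. The function names and the inventory format are hypothetical, and the sketch checks only the relative order of mechanisms that two members share.

```python
from itertools import combinations


def same_relative_order(first, second):
    """True if mechanisms present in both lists appear in the same order."""
    shared = set(first) & set(second)
    return [m for m in first if m in shared] == [m for m in second if m in shared]


def inconsistent_pairs(navigation_by_document):
    """Return pairs of set members whose shared mechanisms are ordered differently.

    navigation_by_document: dict of member name -> ordered mechanism names.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(navigation_by_document), 2)
        if not same_relative_order(navigation_by_document[a], navigation_by_document[b])
    ]
```

The difficult part remains upstream of this code: recognizing what counts as a navigation mechanism inside an arbitrary document or application is the step for which no testing tool is known.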

Consistent Identification

Success Criterion 3.2.4

To ensure this criterion is met for non-web documents, once the set of documents is defined, every document in the set must be searched for all of the functional components (e.g. tables, figures, graphs, indices), noting how those components are identified. Every document in that set must then be inspected to verify that where they contain the same components as every other document in the set, they are identified in a consistent fashion. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.

Similarly, to ensure this criterion is met for non-web software, once the set of software is defined, every software application in the set must be searched for all of the functional components (e.g. menus, dialog boxes, other user interface elements and patterns), noting how those components are identified. Every application in that set must then be inspected to verify that where they contain the same components as every other software application in the set, they are identified in a consistent fashion. Determining if this is the case is not possible today with any testing tool we are aware of, and so would require human inspection.

B.2 Non-Interference Success Criteria

The non-interference success criteria are things that apply to all areas of the page. As explained in WCAG2ICT in the section Comments on Conformance, it wasn't possible to unambiguously carve up software into discrete pieces, and so the unit of evaluation for non-web software is the whole software program. As with any software testing this can be a very large unit of evaluation, and methods similar to standard software testing might be used. Standard software testing employs both programmatic testing and manual testing – automating what can be automated, and using human inspection otherwise. In the cases below, some level of human inspection or involvement would normally be part of the software testing strategy to verify compliance with these four criteria.

Audio Control

Success Criterion 1.4.2

Where non-web documents contain audio, especially audio that automatically plays in certain circumstances (e.g. a slide in a slide deck starts playing a video when that slide is shown), this criterion is typically met through the user agent or software application or operating system the user is using to interact with the document (rather than through an affordance in the static document itself). Because of this, compliance with this success criterion may be software application or operating system dependent, and therefore difficult to assess compliance for outside of a specific, named application or operating system.

Where non-web software contains audio, especially audio that automatically plays in certain circumstances (e.g. making a ringing sound to indicate an incoming call) …

EDITOR'S NOTE
Section content yet to be written.

No Keyboard Trap

Success Criterion 2.1.2

Non-web documents rarely if ever include code for responding to keyboard focus. This criterion is typically met through the user agent or software application the user is using to interact with the document (rather than through an affordance in the static document itself). Because of this, compliance with this success criterion may be software application or operating system dependent, and therefore difficult to assess compliance for outside of a specific, named application or operating system. Even then, programmatic testing for this may not be possible.

Where non-web software contains a user interface that can be interacted with from a keyboard, it may be possible to test for this programmatically, though we are not aware of any such test today. Where interaction with the user interface is supported from a keyboard interface provided by an assistive technology (e.g. a Bluetooth keyboard driving a screen reader for a tablet or phone UI), programmatic testing may be especially challenging.

Pause, Stop, Hide

Success Criterion 2.2.2

As with audio, where non-web documents contain animation — especially animation that automatically plays in certain circumstances (e.g. a slide in a slide deck starts an animation when that slide is shown) — this criterion is typically met through the user agent or software application the user is using to interact with the document (rather than through an affordance in the static document itself). Because of this, compliance with this success criterion may be software application dependent, and therefore difficult to assess compliance for outside of a specific, named application. Even then, programmatic testing for this may not be possible.

Where non-web software contains animation, especially animation that automatically plays in certain circumstances (e.g. showing a trailer for a movie when the user selects a movie title), this criterion is typically met through some setting in the application to suppress such animations, or perhaps in the operating system. Because it can be difficult to tell when the animation is not desired by the user and when it is (did the user ask to play a trailer?), this may not be possible to discern programmatically.

Three Flashes or Below Threshold

Success Criterion 2.3.1

While this success criterion may be difficult to programmatically test for in all situations (especially for software applications), there is nothing in this criterion that is otherwise challenging to apply in the non-web ICT context.

B.3 Remaining WCAG 2.0 A/AA Success Criteria Mentioned in WCAG2ICT as Needing Additional Text Changes

Reflow

Success Criterion 1.4.10

EDITOR'S NOTE
Section content yet to be written.

Timing Adjustable

Success Criterion 2.2.1

EDITOR'S NOTE
Section content yet to be written.

Pointer Gestures

Success Criterion 2.5.1

EDITOR'S NOTE
Section content yet to be written.

Pointer Cancellation

Success Criterion 2.5.2

EDITOR'S NOTE
Section content yet to be written.

Language of Page

Success Criterion 3.1.1

EDITOR'S NOTE
Section content yet to be written.

Language of Parts

Success Criterion 3.1.2

The purpose of this success criterion is to enable assistive technologies like screen readers to determine the language used for different passages of text on a web page. While some software environments like Java and GNOME/GTK+ support this both for text substrings within a block of text as well as for individual user interface elements, others do not. Therefore, it may not be possible for some software to meet this success criterion. Separately, programmatic testing for this may not be possible, as expert human judgment is needed to determine what the correct language is for some text passages.

Error Prevention

Success Criterion 3.3.4

EDITOR'S NOTE
Section content yet to be written.

Parsing

Success Criterion 4.1.1

EDITOR'S NOTE
Section content yet to be written.

Name, Role, Value

Success Criterion 4.1.2

EDITOR'S NOTE
Section content yet to be written.

B.4 New A/AA Success Criteria in WCAG 2.1

Text Spacing

Success Criterion 1.4.12

EDITOR'S NOTE
Section content yet to be written.

B.5 Success Criteria Needing Special Treatment in Non-Accessibility Supported Environments

15 of the 38 A and AA criteria in WCAG 2.0 relate to an accessibility supported interface — they are designed with interoperability with assistive technologies in mind. Such interaction may not be possible for many types of software (e.g. software running in a closed environment like an airplane ticket kiosk). Thus, in those environments, the only way to address the needs articulated in these criteria may be for the software to be self-voicing for blind users who can hear, and otherwise self-accessible to the needs of people with other disabilities which are commonly supported via assistive technologies. It may not be feasible to support all disability user needs (e.g. including a refreshable braille display in the device to support deaf-blind users, and then maintaining those braille displays to ensure their mechanisms don't get damaged).

Non-Text Content (Success Criterion 1.1.1)
Audio-only and video-only (Prerecorded) (Success Criterion 1.2.1)
Audio description or media alternative (Prerecorded) (Success Criterion 1.2.3)
Info and Relationships (Success Criterion 1.3.1)
Meaningful Sequence (Success Criterion 1.3.2)
Resize Text (Success Criterion 1.4.4)
Images of Text (Success Criterion 1.4.5)
Reflow (Success Criterion 1.4.10)
Keyboard (Success Criterion 2.1.1)
Character Key Shortcuts (Success Criterion 2.1.4)
Language of Page (Success Criterion 3.1.1)
Language of Parts (Success Criterion 3.1.2)
Error Identification (Success Criterion 3.3.1)
Parsing (Success Criterion 4.1.1)
Name, Role, Value (Success Criterion 4.1.2)

C. Challenges of Conformance as identified from Silver Research

Now known as W3C Accessibility Guidelines (WCAG 3.0), this iteration of W3C accessibility guidance was conceived and designed to be research-based. Working over many years, the Silver Task Force of the Accessibility Guidelines Working Group (AGWG) and the Silver Community Group collaborated with researchers on questions that the Silver Groups identified. This research was used to develop 11 problem statements that needed to be solved for Silver. The detailed problem statements include the specific problem, the result of the problem, the situation and priority, and the opportunity presented by the problem. The problem statements were organized into three main areas: Usability, Conformance, and Maintenance. The section following is taken from the Conformance sections of the Silver Design Sprint Final Report and the Silver Problem Statements. Details of the research questions and the individual reports are in Research Archive of Silver wiki.

The following is shown as originally presented by the Silver Task Force. Key conclusions have been folded into specific enumerated challenges as appropriate.

C.1 Silver Research Problem Statements

Originally published as the Silver Design Sprint Final Report (2018). These problem statements were presented to the Silver Design Sprint participants.

Constraints on What is Strictly Testable provide an obstacle to including guidance that meets the needs of people with disabilities but is not conducive to a pass/fail test.
Human Testable (related to Ambiguity) also relates to differences in the knowledge and priorities of different testers, which lead them to different results.
Accessibility Supported is a conformance requirement of WCAG 2 that is poorly understood and incompletely implemented.
Evolving Technology of the rapidly changing web must constantly be evaluated against the capabilities of assistive technology and evolving assistive technology must be evaluated against the backward compatibility of existing websites.

C.2 Details of Problem Statements

Originally published as Silver Problem Statements, this was a detailed analysis of the research results behind the above list.

C.2.1 Definition of Conformance

Conformance to a standard means that you meet or satisfy the requirements of the standard. In WCAG 2.0 the requirements are the Success Criteria. To conform to WCAG 2.0, you need to satisfy the Success Criteria, that is, there is no content which violates the Success Criteria.

WCAG 2.0 Conformance Requirements:

Conformance Level (A to AAA)

Conformance Scope (For full web pages only, not partial)

Complete Process

Only "Accessibility-supported" ways of using technologies

Non-Interference: Technologies that are not accessibility supported can be used, as long as all the information is also available using technologies that are accessibility supported and as long as the non-accessibility-supported material does not interfere.

C.2.2 Themes from Research

No monitoring process to test the accuracy of WCAG compliance claims (Keith et al., 2012)
Difficulties for conformance (Keith et al., 2012)
- Third parties documents, applications and services
- Know-how of IT personnel
- Tension between accessibility and design
Specific success criteria for failure - 1.1.1 , 2.2., 4.1.2 (Keith et al., 2012)
Reliably human testable, not reliably testable: "Is Accessibility Conformance an Elusive Property?" (Brajnik et al., 2012) found that the average agreement was at the 70-75% mark, while the error rate was around 29%.
- Expertise appears to improve (by 19%) the ability to avoid false positives. Finally, pooling the results of two independent experienced evaluators would be the best option, capturing at most 76% of the true problems and producing only 24% of false positives. Any other independent combination of audits would achieve worse results.
- This means that an 80% target for agreement, when audits are conducted without communication between evaluators, is not attainable, even with experienced evaluators.
Challenges and Recommendations (Alonso et al., 2010)
- Accessibility-supported ways of using technologies
- Testability of Success Criteria
- Openness of Techniques and Failures
- Aggregation of Partial Results
Silver needs to expand the scope beyond web to include apps, documents, authoring, user agents, wearables, kiosks, IoT, VR, etc. and be inclusive of more disabilities. (UX Professionals Use of WCAG: Analysis)
Accessibility Supported allows inadequate assistive technologies to be claimed for conformance, particularly in non-native English speaking countries. (Interviews on Conformance)
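The agreement figures cited above from Brajnik et al. can be made more concrete with a small sketch of how agreement between two auditors might be quantified. This is an illustration only, not the method used in the cited study: the verdict lists are hypothetical, and the use of raw percent agreement plus Cohen's kappa (agreement corrected for chance) is an assumption chosen for simplicity.

```python
def percent_agreement(a, b):
    """Share of checks on which two auditors gave the same verdict."""
    if len(a) != len(b) or not a:
        raise ValueError("verdict lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(a)
    observed = percent_agreement(a, b)
    labels = set(a) | set(b)
    # Chance agreement: both auditors independently pick the same label.
    expected = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if expected == 1:
        return 1.0  # both auditors used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts from two auditors on the same eight checks.
auditor_a = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass"]
auditor_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

agreement = percent_agreement(auditor_a, auditor_b)
kappa = cohens_kappa(auditor_a, auditor_b)
```

In this made-up data the auditors agree on six of eight checks, yet the chance-corrected figure is noticeably lower, which is one reason a raw agreement target can be misleading.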

C.2.3 Constraints on What is Strictly Testable

Specific problem: Certain success criteria are quite clear and measurable, like color contrast. Others are far less so. The entire Understandable principle is critical for people with cognitive disabilities, yet the success criteria intended to support it are neither easy to test nor clear about how to measure. As a simple example, there is no clear, recent, or consistent definition, within any locale or language, of what "lower secondary education level" means in regard to web content. Language and text content are also not the only challenges faced by people with cognitive and learning disabilities. Compounding this, most of the existing criteria in support of understanding are designated as AAA, which relatively few organizations attempt to conform to.
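To illustrate why color contrast counts as clear and measurable, the following minimal sketch computes the contrast ratio using the relative luminance formula published with WCAG 2.x and compares it with the 4.5:1 threshold for normal-size text at Level AA. The function names are illustrative assumptions, not part of any standard API, and the sketch ignores factors such as transparency, text size, and background images that a real check would have to handle.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    # 0.03928 is the threshold as printed in WCAG 2.0 and 2.1.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple of 8-bit values."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(foreground, background):
    """True if the pair reaches the 4.5:1 ratio for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


WHITE = (255, 255, 255)
gray_passes = meets_aa_normal_text((0x76, 0x76, 0x76), WHITE)
gray_fails = meets_aa_normal_text((0x77, 0x77, 0x77), WHITE)
```

No comparable deterministic function exists for a criterion such as reading level, which is the contrast this section draws.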

Result of problem: The requirement for valid and reliable testability for WCAG success criteria presents a structural barrier to including the needs of people with disabilities whose needs are not strictly testable. Guidance that WCAG working group members would like to include cannot be included. The needs of people with disabilities – especially intellectual and cognitive disabilities – are not being met.

Situation and Priority: Of the 70 new success criteria proposed by the Cognitive Accessibility Task Force to support the needs of people with cognitive and intellectual disabilities, only four to six (depending on interpretation) were added to WCAG 2.1, and only one is at Level AA. The remainder are at Level AAA, which is rarely implemented. This means user needs are not met.

Opportunity: Multiple research projects and audience feedback have concluded that simpler language is desired and needed for audiences of the guidelines. Clear but flexible criteria with considerations for a wider spectrum of disabilities help ensure more needs are met.

C.2.4 Human Testable

Specific problem: Regardless of proficiency, there is a significant gap in how any two human auditors judge whether a criterion passes or fails. Various audiences have competing priorities when assessing the success criteria of any given digital property. Knowledge of accessibility standards, and of how people with disabilities use assistive technology tools, varies. Ultimately, there is variance between any two auditors and between any two authors of test cases, compounded by human bias. Some needs of people with disabilities are difficult to measure in a quantifiable way.

Result of problem: Success criteria are measured by different standards and by people who often make subjective observations. Because there is so much room for human error, individuals may believe they have satisfied a specific conformance model when, in reality, they have not. The ultimate impact is on an end user with a disability who cannot complete a given task because the success criteria were not properly identified, tested, and understood.

Situation and Priority: There isn't a standardized approach to how the conformance model applies to success criteria at the organizational level and in specific test case scenarios.

Opportunity: There's an opportunity to improve the success criteria such that human auditors and testers find the success criteria more understandable. Educating business leaders on how the varying levels of conformance apply to their organization may be useful as well. We can educate about the ways that people with disabilities use their assistive technology.

C.2.5 Accessibility Supported

Specific problem: Accessibility supported was implemented in a way that did not facilitate consistent adoption by developers and testers. It also requires a harmonious relationship and persistent interoperability between content technologies and requesting technologies, which must be continuously re-evaluated as either is updated. Further, the Working Group defers to the community the judgment of how much, how many, or which assistive technologies must support a technology. The concept is poorly understood, even by experts.

Result of problem: Among the results are: difficulty understanding what qualifies as a content technology or an assistive technology; difficulty quantifying assistive technologies or features of user agents; claiming conformance with inadequate assistive technology; and difficulty claiming conformance.

Situation and Priority: Any claim or assertion that a web page conforms to the guidelines may require an explicit statement defining which assistive technology and user agent(s) the contained technologies rely upon, presumably including specific versions and/or release dates of each. One could then infer that a conformance claim depends upon a software compatibility claim naming browsers and assistive technologies and their respective versions. Authoring and governing such claims would be a burden. Additionally, no one can predict new technologies or their rates of adoption by people with disabilities.

Opportunity: As the technologies in this equation evolve, interoperability may be affected by any number of factors outside the control of the author and publisher of a web page. Either accessibility supported should not be a component of conformance requirements, or it should clearly, concisely, and explicitly define and quantify the technologies or classes of technologies, and set any resulting update or expiry criteria for governance.
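As a rough sketch of what a conformance claim tied to named user agents, assistive technologies, versions, and an expiry rule might look like as data, consider the following. Every name here is a hypothetical assumption made for illustration: the `SupportPairing` and `ConformanceClaim` types, the `review_after_days` field, and the product names "ExampleBrowser" and "ExampleReader". Nothing in WCAG prescribes this structure.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class SupportPairing:
    """One user agent and assistive technology combination that was tested."""
    user_agent: str
    user_agent_version: str
    assistive_technology: str
    assistive_technology_version: str


@dataclass
class ConformanceClaim:
    """Hypothetical claim record with an explicit review (expiry) period."""
    page_url: str
    level: str
    claimed_on: date
    review_after_days: int
    pairings: List[SupportPairing] = field(default_factory=list)

    def review_due(self) -> date:
        return self.claimed_on + timedelta(days=self.review_after_days)

    def needs_review(self, today: date, current_versions: Dict[str, str]) -> List[str]:
        """Return the reasons, if any, why the claim should be re-verified."""
        reasons = []
        if today > self.review_due():
            reasons.append("review period elapsed")
        for p in self.pairings:
            tested = (
                (p.user_agent, p.user_agent_version),
                (p.assistive_technology, p.assistive_technology_version),
            )
            for name, tested_version in tested:
                current = current_versions.get(name)
                if current is not None and current != tested_version:
                    reasons.append(f"{name} changed from {tested_version} to {current}")
        return reasons


claim = ConformanceClaim(
    page_url="https://example.com/checkout",
    level="AA",
    claimed_on=date(2025, 1, 15),
    review_after_days=180,
    pairings=[SupportPairing("ExampleBrowser", "120", "ExampleReader", "2024.3")],
)
```

Even this small record shows the governance burden described above: each new browser or assistive technology release can invalidate the claim, so someone has to track versions and re-test.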

C.2.6 Evolving Technology

Specific problem: As content technology evolves, it must be re-evaluated against assistive technology for compatibility. Likewise, as assistive technology evolves or emerges, it must be evaluated against the backward compatibility of various content technologies.

Result of problem: There is no versioning consideration for updates to user agents and assistive technology. Strict conformance then typically has an expiry.

Situation and Priority: There is no clear and universal understanding of the conformance model or its longevity. Some will infer that there is always a conformance debt when any technology changes.

Opportunity: Consider having conformance statements include an explicit qualifier of the time of release or the versions of technology. Alternatively, consider a more general approach that is not explicit and is flexible to the differences in technologies as they evolve, identifying the feature of the assistive technology rather than its version. A further alternative is a model that quantifies conformance as a degree of criteria met.
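The last option, conformance as a degree of criteria met, could be sketched as follows. This is a deliberately naive illustration under stated assumptions: every applicable criterion carries equal weight, the `Outcome` type and `degree_of_conformance` function are hypothetical names, and the sample verdicts are invented. A real model would likely need weighting by severity or user impact.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "not applicable"


def degree_of_conformance(results):
    """Fraction of applicable criteria that passed (unweighted)."""
    applicable = [o for o in results.values() if o is not Outcome.NOT_APPLICABLE]
    if not applicable:
        raise ValueError("no applicable criteria to score")
    passed = sum(o is Outcome.PASS for o in applicable)
    return passed / len(applicable)


# Hypothetical verdicts for one page, keyed by success criterion number.
sample_results = {
    "1.1.1": Outcome.PASS,
    "1.2.2": Outcome.NOT_APPLICABLE,  # no prerecorded video on the page
    "1.4.3": Outcome.PASS,
    "2.1.1": Outcome.FAIL,
    "2.4.2": Outcome.PASS,
}

score = degree_of_conformance(sample_results)
```

Under the binary model the page above simply fails; under a degree-based model it reports three of four applicable criteria met, which conveys more but also hides which failure blocks which users.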

W3C Editor's Draft 15 July 2025
