Inaccessibility of CAPTCHA

Alternatives to Visual Turing Tests on the Web

W3C Editor's Draft

More details about this document
This version:
https://w3c.github.io/captcha-accessibility/
Latest published version:
https://www.w3.org/TR/turingtest/
Latest editor's draft:
https://w3c.github.io/captcha-accessibility/
History:
https://www.w3.org/standards/history/turingtest
Editors:
Janina Sajka
Jason White
Michael Cooper (W3C)
Former editor:
Matt May (W3C)

Abstract

Various approaches have been employed over many years to distinguish human users of web sites from robots. The traditional CAPTCHA approach asking users to identify obscured text in an image remains common, but other approaches have emerged. All interactive approaches require users to perform a task believed to be relatively easy for humans but difficult for robots. Unfortunately, the very nature of the interactive task inherently excludes many people with disabilities, resulting in a denial of service to these users. Research findings also indicate that many popular CAPTCHA techniques are no longer particularly effective or secure, further complicating the challenge of providing services secured from robotic intrusion yet accessible to people with disabilities. This document examines a number of approaches that allow systems to test for human users and the extent to which these approaches adequately accommodate people with disabilities, including recent non-interactive and tokenized approaches. We have grouped these approaches into two principal categories: legacy approaches that typically have significant accessibility and security-related limitations, and state-of-the-art approaches that, individually or in combination, meet security objectives without compromising access for people with disabilities.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Accessible Platform Architectures Working Group as an Editor's Draft.

Publication as an Editor's Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

1. Introduction

1.1 The CAPTCHA Context

Web sites which provide interactive services have long sought to limit their services only to human users. They seek to avoid exposing their collected data and content publishing services to ever more cleverly articulated web robots. Whether the service be travel and event ticketing, email, blogging, or calendaring services, social media services, or some combination of these and many more, experience has demonstrated that even authenticated login provides inadequate protection from malicious actors. Such sites still need to know their interacting user is a human individual, and not a software robot. Arguably the industry's need for reliable Turing testing is only growing more critical.

An early (and still widespread) solution relies on the use of graphical representations of text in registration or comment areas of a web site. The site attempts to verify that the user is in fact a human by requiring the user to complete a task referred to as a "Completely Automated Public Turing Test, to Tell Computers and Humans Apart," or CAPTCHA. The assumption is that humans find this task relatively easy, while robots find it nearly impossible to perform.

The CAPTCHA was initially developed by researchers at Carnegie Mellon University and has been primarily associated with a technique whereby an individual identifies a distorted set of characters in a bit-mapped image, then enters those characters into a web form. This approach is widely familiar to users of the web, though the term CAPTCHA is generally recognized only by web professionals.

In recent times the types of CAPTCHA that appear on web sites and mobile apps have changed significantly. Since our concern here is the accessibility of systems that seek to distinguish human users from their robotic impersonators, the term “CAPTCHA” is used in this document generically to refer to all approaches which are specifically designed to differentiate a human from a computer, including fully non-interactive approaches.

It will surprise no one that we applaud the recent emergence of non-interactive approaches because functional non-interactive approaches pose no accessibility challenge to users. Unfortunately, some current non-interactive approaches come at the price of exposing to the host analysis engine much data about the individual user that the user might prefer to keep confidential. We are further heartened by the even more recent development of tokenized approaches that promise trustable Turing testing requiring only minimal interaction with users.

1.2 The Accessibility Challenge

While online users continue broadly to report finding traditional CAPTCHAs frustrating to complete, it is generally assumed that an interactive CAPTCHA can be resolved within a few incorrect attempts. The point of distinction for people with disabilities is that a CAPTCHA not only separates computers from humans, but also often prevents people with disabilities from performing the requested procedure. For example, asking users who are blind, visually impaired or dyslexic to identify textual characters in a distorted graphic is asking them to perform a task they are intrinsically least able to accomplish. Similarly, asking users who are deaf, hard of hearing, or living with auditory processing disorder to identify and transcribe in writing the content of an audio CAPTCHA is asking them to perform a task they’re intrinsically least likely to accomplish. Furthermore, traditional CAPTCHAs have generally presumed that all web users can read and transcribe English-based words and characters, thus making the test inaccessible to a large number of non-English speaking web users worldwide. Frankly, a design pattern that expects multiple attempts from users as a matter of course is arguably inaccessible by design to persons living with an anxiety disorder as well as to many living with a range of other cognitive and learning disabilities.

As software developers at Cloudflare have observed [attestation], considerable time that could be devoted to more productive tasks is lost as a result of responding to CAPTCHA challenges. According to the authors' data, an average of 32 seconds is required for a user to complete a CAPTCHA. Applied to a large user population, the total time lost becomes significant. Although reliable estimates are not readily available, it is reasonable to hypothesize that, even for people with disabilities who can complete CAPTCHA challenges successfully, albeit with some difficulty or inconvenience, the typical time devoted to the task is likely to be longer than the average for the population as a whole. Hence the cost can be assumed to be disproportionately large compared with that incurred by the general population.
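The scaling argument can be made concrete with a small calculation. The following is only an illustrative sketch: the 32-second figure comes from the text above, while the number of challenges and the doubling factor for slower completion are assumed values chosen for the example, not measured data.

```python
# Figure from the text: average time to complete one CAPTCHA, in seconds.
SECONDS_PER_CAPTCHA = 32


def total_hours_lost(challenges: int, seconds_each: float = SECONDS_PER_CAPTCHA) -> float:
    """Return the aggregate time, in hours, spent completing `challenges` CAPTCHAs."""
    return challenges * seconds_each / 3600


# Hypothetical workload: one million challenges (assumed, not from the source).
baseline = total_hours_lost(1_000_000)

# Hypothetical: users who need twice as long per challenge (assumed multiplier).
slower = total_hours_lost(1_000_000, seconds_each=2 * SECONDS_PER_CAPTCHA)

print(f"{baseline:.0f} hours vs. {slower:.0f} hours")
```

Under these assumptions, the aggregate cost grows linearly with both the number of challenges and the time each one takes, which is why any increase in per-challenge time falls disproportionately on the users affected.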

While Accessibility best practices require, and assistive technologies expect, substantive graphical images to be authored with text equivalents, alternative text in CAPTCHA images would clearly be self-defeating. CAPTCHAs are, consequently, allowed under the W3C's Web Content Accessibility Guidelines (WCAG) provided that "text alternatives that identify and describe the purpose of the non-text content are provided, and alternative forms of CAPTCHA using output modes for different types of sensory perception are provided to accommodate different disabilities."

It is important to understand the limitation of the WCAG CAPTCHA exemption. It applies only to the content of the CAPTCHA. WCAG still requires that alternative text identify the graphical object as a CAPTCHA. Conformance with all other WCAG guidelines also remains critical for web accessibility.

The rationale for this highly specific exemption in WCAG is simple. A CAPTCHA without an accessible and usable alternative makes it impossible for users with certain disabilities to create accounts, write comments, or make purchases on such sites. In essence, such CAPTCHAs fail to properly recognize users with disabilities as human, obstructing their participation in contemporary society. Such issues also extend to situational disabilities whereby a user may not be able to effectively view a traditional CAPTCHA on a mobile device due to the small screen size, or hear an audio-based CAPTCHA in a noisy environment.

1.3 CAPTCHA Effectiveness

Malicious activity on the web has only grown over the years to comprise an alarmingly high percentage of all Internet traffic. While we would certainly not suggest the web's woes arise from sloppy or ill-considered CAPTCHA implementations, we do suggest current conditions only reinforce the importance of well considered and closely monitored security and privacy strategies consistent with appropriate user support that includes people with disabilities. Getting CAPTCHA right needs to be part of the solution.

It is important to acknowledge that using a CAPTCHA as a security solution is becoming increasingly ineffective. Current CAPTCHA methods that rely primarily on traditional image-based approaches, logic problems, or audio CAPTCHA alternatives can be largely cracked using both complex and simple computer algorithms. Research suggests that as character-based CAPTCHAs become increasingly vulnerable to defeat by advancing optical character recognition technologies, more severe distortion of the characters is introduced to resist these attacks. However, such enhanced distortion techniques also make it progressively less feasible even for humans who are well endowed with sensory and cognitive capacity to solve CAPTCHA challenges reliably, ultimately making character-based CAPTCHAs impractical [captcha-ocr].

Pattern-matching algorithms can achieve an even higher success rate of cracking CAPTCHAs in some instances, as demonstrated in CAPTCHA Security: A Case Study [captcha-security] and HMM-based Attacks on Google’s reCAPTCHA with Continuous Visual and Audio Symbols [recaptcha-attacks]. While efforts are being made to strengthen traditional CAPTCHA security, more robust security solutions risk reducing the typical user’s ability to understand the CAPTCHA that needs to be resolved, e.g., Defeating line-noise CAPTCHAs with multiple quadratic snakes [defeat-line-noise]. A recent study at the University of Maryland has demonstrated a 90% success rate in cracking Google's audio reCAPTCHA using Google's own speech recognition service. Indeed, as noted below, Google's reCAPTCHA v2 service has recently begun declining to actually provide the audio CAPTCHA alternative clearly proffered onscreen.

In fact, it is arguable that online services which offer the content developer a ready solution for distinguishing human users from robots may well be helping defeat that very function. For example, Google's reCAPTCHA proclaims: "Hundreds of millions of CAPTCHAs are solved by people every day. reCAPTCHA makes positive use of this human effort by channeling the time spent solving CAPTCHAs into annotating images and building machine learning datasets. This in turn helps improve maps and solve hard AI problems." It is legitimate to consider whether this also describes a classic vicious cycle which is helping defeat the effectiveness of visual and auditory CAPTCHA deployments.

It is therefore highly recommended that the purpose and effectiveness of any deployed solution be carefully considered and evaluated across multiple browser and operating system environments before adoption, and then closely monitored for effective performance. Alternative security methods, such as two-step or multi-device verification, along with emerging protocols for identifying human users with high reliability should also be carefully considered in preference to traditional image-based CAPTCHA methods for both security and accessibility reasons.

2. Legacy Approaches

Many techniques are available to web sites to discourage or eliminate fraudulent activities such as inappropriate account creation. Several of them may be as effective as the visual verification technique while being somewhat more accessible to people with disabilities. Others may be overlaid as an accommodation for the purposes of accessibility. All of these techniques are interactive, in that they require input from the user which is intended to establish human agency, and to exclude automated processes from further accessing the protected Web resource. To achieve their security objective, these techniques challenge the user with tasks that are supposed to be amenable to correct performance by humans, but not by non-interactive computer programs. However, tasks that reliably establish human interaction without disadvantaging people with disabilities are difficult to design, and this is all the more true as the capacity of algorithms to perform tasks previously requiring human intelligence continues to improve.

2.1 Traditional Character-Based CAPTCHA

The traditional character-based CAPTCHA, as previously discussed, is largely inaccessible and insecure. It focuses on the presentation of letters or words presented in an image and designed to be difficult for robots to identify. The user is then asked to enter the CAPTCHA information into a form.

The use of a traditional CAPTCHA is obviously problematic for people who are blind, as the screen readers they rely on to use web content cannot process the image, thus preventing them from uncovering the information required by the form. Because the characters embedded in a CAPTCHA are often distorted or have other characters in close proximity to each other in order to foil technological solution by robots, they are also very difficult for users with other visual disabilities. This common CAPTCHA technique is also less reliably solved by users with cognitive and learning disabilities, see The Effect of CAPTCHA on User Experience among Users with and without Learning Disabilities [captcha-ld]. Because they’re intentionally distorted to foil robots, they also foil users who are more easily confused by surrealistic images or who do not possess sufficiently acute vision to “see” beyond the presented distortion and uncover the text the site requires in order to proceed.

While some sites have begun providing CAPTCHAs utilizing languages other than English, an assumption that all web users can understand and reproduce English predominates. Clearly, this is not the case. Research has demonstrated how CAPTCHAs based on written English impose a significant barrier to many on the web; see Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness [captcha-robustness]. This problem is likely to increase when using Latin-script characters beyond the ASCII range, with accents and diacritics, or shapes not included in the set used for English. For example, speakers of Arabic or Thai may not have enough knowledge to identify a distorted version of such characters. Furthermore, users may not have the necessary keys available on their local keyboard.

2.2 Sound Output

To re-frame the problem, text is easy to manipulate, which is good for assistive technologies, but just as good for robots. One logical solution to this problem is to offer another non-textual method of using the same content. To achieve this, audio is played that contains a series of characters, words, or phrases being read out which the user then needs to enter into a form. As with visual CAPTCHA however, robots are also capable of recognizing spoken content—as Amazon’s Alexa and Android’s Google Assistant, among other spoken dialog systems, have so ably demonstrated. Consequently, the characters, words, or phrases the user is to uncover and transcribe in the form are also distorted in an audio CAPTCHA and are usually played over a sonic environment of obfuscating sounds.

The industry recognized this problem early. CNet reported in Spam-bot tests flunk the blind [newscom] that “Hotmail’s sound output, which is itself distorted to avoid the same programmatic abuse, was unintelligible to all four test subjects, all of whom had good hearing.”

The sound output, which is itself distorted to avoid the same programmatic abuse, can render the CAPTCHA difficult to hear. There can also be confusion in understanding whether a number is to be entered as a numerical value or as a word, e.g., ‘7’ or ‘seven’. Often the audio CAPTCHA user will hear sounds which seem to be words or numerical values that should be entered, but turn out to be just background noise.
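One way to reduce the ‘7’ versus ‘seven’ ambiguity is to normalize both forms to the same value before comparing the response with the expected answer. The following is a minimal sketch of that idea; the function names and the small English-only word table are assumptions made for illustration and do not describe any particular CAPTCHA product.

```python
# Hypothetical mapping; a real deployment would cover more words and languages.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_answer(text: str) -> str:
    """Lower-case the answer, map number words to digits, and drop separators."""
    tokens = text.lower().replace(",", " ").replace("-", " ").split()
    return "".join(NUMBER_WORDS.get(token, token) for token in tokens)


def answers_match(expected: str, submitted: str) -> bool:
    """Compare two answers after normalization, so '7' and 'seven' are equal."""
    return normalize_answer(expected) == normalize_answer(submitted)


print(answers_match("7 3 9", "seven three nine"))
```

Accepting either form removes one source of failed attempts without making the challenge easier for an automated solver, since the solver must still recognize the spoken content.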

Sound is also intrinsically temporal, but the import of this unavoidable fact is too often underappreciated, perhaps because the world we live in as seen through the eyes is also temporal. Unlike the real world seen through the eyes, however, the traditional CAPTCHA is a still image that can be stared at until comprehension dawns. Sound has no analog to the visual still image.

Whenever any portion of an audio CAPTCHA is not understood, at least some part of the CAPTCHA must be replayed, usually several times. Currently, few audio CAPTCHAs provide an easily invoked and reliable replay feature, let alone an independent volume control or a pause, rewind, and fast-forward feature. Consequently, an entirely new audio CAPTCHA is often played should any part of one audio CAPTCHA prove difficult to understand.

Some audio CAPTCHAs tacitly admit this failure by offering a link allowing the user to download the audio CAPTCHA, typically as an MP3 file. The implicit assumption is that the user will use a favorite audio player (which does provide for independent volume control and pause, play, rewind, and fast-forward capabilities) to play the audio CAPTCHA MP3 file again and again until comprehension dawns, perhaps pausing and rewinding the playback and perhaps writing down on the side the text destined for the web form. Clearly this is very inconvenient and subject to web site timeouts. It also illustrates why simply providing an audio CAPTCHA alternative to the traditional visual CAPTCHA does not provide equivalent access to the user.

Furthermore, just as not all web users should be presumed proficient with English in visual CAPTCHA, they should not be presumed capable of understanding and transcribing aural English in an audio CAPTCHA. Unfortunately, non-English audio CAPTCHAs appear to be very rare indeed. As of this writing, we are aware of only one stand-alone multilingual CAPTCHA solution provider with support for a significant number of the world's languages.

Users who are deaf-blind, don’t have or use a sound card, find themselves in noisy environments, or don’t have required sound plugins properly configured and functioning, are thus also prevented from proceeding. Furthermore, relatively few audio CAPTCHAs properly support all the various browsers and operating systems in use today. Similarly, users of browsers which do not support easy direction of sound output to a particular audio device, or to all available audio devices on the system, are also hampered.

Users who live with some form of cognitive disability may also find audio CAPTCHAs even more difficult to solve than character-based visual CAPTCHAs. Audio CAPTCHAs are known to impose a cognitive overload on all human users in comparison to the cognitive load necessary to understand normal human speech [information-security]. Further, studies of CAPTCHAs requiring human recognition of distorted or obscured speech have shown that they are more difficult for all users to solve and more demanding in terms of time and effort compared to text or image-based CAPTCHAs [solving-captchas]. These facts make audio CAPTCHAs a poor choice for users with cognitive disabilities.

Although auditory forms of CAPTCHA that present distorted speech create recognition difficulties for screen reader users, the accuracy with which such users can complete the CAPTCHA tasks is increased if the user interface is carefully designed to prevent screen reader audio and CAPTCHA audio from being intermixed. This can be achieved by implementing functions for controlling the audio that do not require the user to move focus away from the text response field; see Evaluating existing audio CAPTCHAs and an interface optimized for non-visual use [eval-audio].

Experiments with a combined auditory and visual CAPTCHA requiring users to identify well-known objects by recognizing either images or sounds suggest that this technique is highly usable by screen reader users. However, its security-related properties remain to be explored, as mentioned in Towards a universally usable human interaction proof: evaluation of task completion strategies [task-completion].

2.3 Logic Puzzles

The goal of visual verification is to separate human from machine. One reasonable way to do this is to test for logic. Simple mathematical or word puzzles, trivia, spatial tasks, or similar logic tests may raise the bar for robots, at least to the point where it becomes more attractive to use them elsewhere.

The use of logic puzzles as a CAPTCHA technique, however, introduces substantial barriers to access for people with language, learning or cognitive disabilities. An individual living with dyscalculia will understandably find even simple arithmetic puzzles challenging. A blind individual will be unable to identify the hammer from among graphical depictions of common tools. When puzzles are used, therefore, it is advisable to support a variety of puzzles so that someone unable to solve a given puzzle can obtain a different kind of puzzle when requesting another challenge.
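The recommendation to offer a different kind of puzzle on request can be sketched as follows. This is a minimal illustration only: the three puzzle kinds, their generators, and the `next_challenge` function are hypothetical names invented for the example, and the puzzles shown are far too simple to resist automated solvers.

```python
import random


# Hypothetical puzzle kinds; each generator returns (prompt, expected_answer).
def arithmetic_puzzle(rng):
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} plus {b}?", str(a + b)


def word_puzzle(rng):
    word = rng.choice(["apple", "river", "window"])
    return f"Type the first letter of the word '{word}'.", word[0]


def sequence_puzzle(rng):
    start = rng.randint(1, 5)
    return f"Which number comes next: {start}, {start + 1}, {start + 2}?", str(start + 3)


PUZZLE_KINDS = {
    "arithmetic": arithmetic_puzzle,
    "word": word_puzzle,
    "sequence": sequence_puzzle,
}


def next_challenge(previous_kind=None, rng=None):
    """Pick a puzzle whose kind differs from the one the user just declined."""
    rng = rng or random.Random()
    candidates = [kind for kind in PUZZLE_KINDS if kind != previous_kind]
    kind = rng.choice(candidates)
    prompt, answer = PUZZLE_KINDS[kind](rng)
    return kind, prompt, answer


# A user who could not solve an arithmetic puzzle asks for another challenge.
kind, prompt, answer = next_challenge(previous_kind="arithmetic")
print(kind, "|", prompt)
```

Excluding the previous kind, rather than drawing again at random, guarantees that a user with dyscalculia, for example, is not handed a second arithmetic puzzle after requesting a new challenge.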

Any development of CAPTCHA challenges in this direction should be accompanied by thorough usability research involving people with a variety of language, learning, and cognitive disabilities, as such an approach remains largely unexplored in practice and in the research literature. It should also be noted that answers may need to be handled flexibly, if they require free-form text. Also, a system would have to maintain a vast number of questions, or shift them around programmatically, in order to keep spiders from capturing them all for use by web robots. Puzzle-based CAPTCHA challenges are also readily subject to defeat by human operators engaged in crowd-sourcing activity on behalf of attackers.

2.4 Image and Video

2.4.1 Visual Comparison CAPTCHAs

There are a number of CAPTCHA techniques based on the identification of still images. This can include requiring the user to identify whether an image depicts a man or a woman, or whether an image is human-shaped or avatar-shaped, among other comparison approaches, such as CAPTCHAStar! A novel CAPTCHA based on interactive shape discovery [captchastar], FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers [facecaptcha], and Social and egocentric image classification for scientific and privacy applications [social-classification].

While alternative audio comparison CAPTCHAs might be explored, such as using similar or different sounds for comparison, the reliance on visual comparison alone makes these approaches difficult, if not impossible, for people with vision-related disabilities. They're also very difficult for people living with visual processing disorders, among other cognitive and learning disabilities.

2.4.2 3D CAPTCHA

A 3D representation of letters and numbers can make it more difficult for OCR software to identify them, in turn increasing the security of the CAPTCHA, described in On the security of text-based 3D CAPTCHAs [3d-captcha-security]. However, this solution raises similar accessibility issues to traditional CAPTCHAs.

2.4.3 Movement-Based and Video Game CAPTCHA

This process is based on the movement of interactive elements such as a slider or the completion of a basic video game as a CAPTCHA, like Game-based image semantic CAPTCHA on handset devices [game-captcha]. The benefits include removal of language barriers, and the removal of CAPTCHA frustration due to the presumed intuitiveness of the associated task and the enjoyment of playing video games.

Importantly, the implementation of this CAPTCHA would need to support multiple input interfaces as different devices may lack some input methods such as a keyboard or touchscreen. Another potential issue is that screen reader support for interface elements may unintentionally provide a backdoor for the CAPTCHA to be bypassed by allowing a bot to play the game.

2.5 The Google reCAPTCHA v2

2.5.1 Background

Acquired in 2009 from Carnegie Mellon University, Google's reCAPTCHA overwhelmingly dominates CAPTCHA deployment on the web today. However, reCAPTCHA Version 1 is no longer supported.

2.5.2 Version 2: Are you a robot?

reCAPTCHA v2 provided an API that was most effectively marketed as the "no CAPTCHA reCAPTCHA," and its checkbox proclaiming "I'm not a robot" became a cultural icon, spawning various cultural offshoots in art, theater, and popular music.

The checkbox was, of course, never a checkbox in the traditional HTML sense. The pseudo-checkbox process became a prodigious collector of user data well beyond mouse movement and keyboard navigation, including the date, the language the browser is set to, all cookies placed by Google over the last 6 months, CSS information for that page, an inventory of mouse clicks made on that screen (or touches if on a touch device), an inventory of plugins installed on the browser, and an itemization of all JavaScript objects, all to determine whether the user is human or robot. Of course Google also generally knows much about individual users, including their customary IP addresses, the telephone numbers and email addresses of their friends, family and colleagues, where they have been at every moment of every day, as well as their web search and YouTube habits. This is why the simple checkbox could keep the CAPTCHA process disarmingly simple, though it also explains why a link to Google's privacy policy has always accompanied the "no CAPTCHA reCAPTCHA". Disclosure and certain provisions of the Privacy Policy are required to satisfy legal requirements in California and in the E.U.

Even though specific WCAG failures were often noted, Google's reCAPTCHA v2 was for a time regarded as the most accessible CAPTCHA solution for one simple reason: it was capable of being comfortably completed using a variety of assistive technologies. More recently it has been widely observed that utilizing keyboard navigation, as many assistive technology users do, no longer works. Instead, users are presented with a traditional inaccessible CAPTCHA as a fall-back mechanism. Our own tests with various browsers on various operating environments have been generally successful with Google's own reCAPTCHA test page. However, browsing in incognito mode, clearing or blocking cookies, and additional factors can apparently trigger a fallback to traditional CAPTCHA these days for many assistive technology users.

One reCAPTCHA v2 innovation seems most promising. Rather than reproduce characters, users are asked to type the words they see (or hear). It even appears unnecessary to spell these correctly or to enter all the words presented in order to be adjudged human.

Most disappointingly, it now appears that audio CAPTCHAs previously available with reCAPTCHA v2 implementations are now sometimes no longer being provided. Instead users see a message that reads: "Your computer or network may be sending automated queries. To protect our users, we can't process your request right now." Users who have depended on audio CAPTCHA alternatives, who have previously been able to function with reCAPTCHA v2, are thus suddenly and seemingly capriciously locked out and denied service on sites still using reCAPTCHA v2.

3. State-of-the-art Approaches

These techniques are more effective in avoiding disadvantage to people with disabilities, while achieving appropriate security goals to protect Web sites and applications from malicious actors. They may be interactive (e.g., requiring the use of biometrics), but they also include non-interactive methods of excluding hostile automated processes, or of limiting their damaging effects.

3.1 Interactive State-of-the-Art Approaches

3.1.1 Biometrics

Biometric identifiers have become a very popular authentication mechanism, especially on mobile platforms which routinely now provide the requisite hardware. Some physical characteristic of the user, such as a fingerprint or a facial profile, is first acquired and then recognized to verify the individual’s identity. This process effectively limits the ability of web robots to create a large number of false identities.

However, biometric authentication mechanisms also need to be carefully designed to avoid introducing accessibility barriers. Individuals who lack the biological characteristics required by a particular authentication method, e.g., fingers, or who are unable to perform the enrollment procedures, e.g., senior citizens whose fingerprints can no longer be reliably sensed due to aging, are effectively precluded from using a fingerprint biometric. This can result in denial of access to certain users with disabilities and explains why reliance on a single biometric identifier is insufficient to satisfy public sector procurement standards in the European Union, EN 301 549, section 5.3 [en-301-549], and regulations under Section 508 of the Rehabilitation Act and Section 255 of the Communications Act, 36 CFR 1194, Appendix C, section 403, in the United States [36-cfr-1194]. As a further example, the use of voice recognition (i.e., speaker identification) as a biometric identifier can be problematic for those with speech or hearing-related disabilities.

For this reason, biometric identification systems should be designed to allow users to choose among multiple and unrelated biometric identifiers. With that sole caveat, properly designed biometric identification systems are particularly attractive in situations where it is necessary to identify a particular human user. Their reliability is high, the cognitive load placed on the user is low, and they are particularly difficult to foil. However, conventional applications of biometric authentication verify, and therefore disclose, the user's identity. They are thus unsuitable under circumstances in which it is desirable to preserve the user's anonymity for reasons of privacy, while nevertheless establishing that the entity attempting to access an online service is human. The scheme described in the next section is designed to solve this problem.

3.1.2 Cryptographic Attestation of Personhood

An approach designed to verify that the user is a person, while preserving individual privacy, has recently been proposed by Cloudflare [attestation]. It is built upon the Web Authentication (WebAuthn) API [webauthn-1]. The WebAuthn registration process is invoked to establish that the user is in control of a hardware authentication device produced by a known and trusted manufacturer, as determined by a valid chain of digital certificates. If biometric authentication occurs in this procedure, as it typically does, then it is used only to unlock the private cryptographic key of the authentication device, and hence the user's identity is never explicitly disclosed to the party requesting evidence of personhood. A variant of this scheme has also been developed which offers stronger protection of privacy by not revealing the identity of the device manufacturer, which could be exploited in combination with other information to infer the user's identity. (Even if biometric authentication is not involved, the user is generally required to interact with the authentication device, for instance by pressing a button.) This version of the approach requires the implementation of a protocol based on zero-knowledge proofs [attestation-zero-knowledge].
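As a rough illustration of the first step in such a scheme, a server might build WebAuthn registration options that request an attestation statement. The sketch below is not Cloudflare's implementation: the function names, the throwaway "anonymous" user handle, and the choice of ES256 are assumptions made for the example. The field names themselves (`rp`, `user`, `challenge`, `pubKeyCredParams`, `attestation`, `authenticatorSelection`, `timeout`) follow the WebAuthn creation options dictionary.

```python
import base64
import secrets


def b64url(data: bytes) -> str:
    """Encode bytes as unpadded base64url, a common JSON transport encoding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def personhood_registration_options(rp_id: str, rp_name: str) -> dict:
    """Build WebAuthn creation options that request an attestation statement."""
    return {
        "rp": {"id": rp_id, "name": rp_name},
        # Assumption: a throwaway random handle, so no account is tied to the check.
        "user": {
            "id": b64url(secrets.token_bytes(16)),
            "name": "anonymous",
            "displayName": "anonymous",
        },
        # Fresh random challenge so a recorded response cannot be replayed.
        "challenge": b64url(secrets.token_bytes(32)),
        "pubKeyCredParams": [{"type": "public-key", "alg": -7}],  # -7 = ES256
        # "direct" asks for the authenticator's own attestation statement.
        "attestation": "direct",
        "authenticatorSelection": {"userVerification": "preferred"},
        "timeout": 60000,  # milliseconds
    }


options = personhood_registration_options("example.org", "Example Site")
print(options["attestation"], len(options["challenge"]))
```

The page script would decode the base64url fields back to binary before passing the options to the browser's credential creation call, and the server would then have to validate the returned attestation certificate chain against a list of trusted manufacturers. Neither step is shown here.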

Since the user is free to choose among a variety of authentication devices from reliable manufacturers, the user can select the hardware that best satisfies their accessibility-related needs and preferences. The inherent flexibility of the proposed approach is clearly advantageous to both security and accessibility.

3.1.3 Leveraging the Multi-Device Environment

It has become common for many, though by no means all, users to access various online services through multiple devices such as desktop and mobile computers, smart phones, tablets, and wearables such as smart watches. This proliferation has led to online services delivering identification solutions that take into account a combination of multi-device and multi-platform vectors for simple and effective user authentication, including for persons with disabilities [auth-mult]. We note that several major service providers (such as Facebook) now support cross-site user authentication. However, in relation to the specific ability to tell a human and a bot apart, it appears that only Google's reCAPTCHA v3 API provides cross-site CAPTCHA services without actually passing specific identifying data.

We would expect Google's reCAPTCHA v3 service to determine that no CAPTCHA need be presented whenever another browser tab on any of the user's registered devices is already properly logged in to a Google product such as Calendar. However, while this may prevent the third party site from collecting personal data, it does assist Google in acquiring more user data. This constitutes a significant cost to the user's privacy in an industry so capable of cross-referencing massive amounts of data in the absence of meaningful regulations and controls on where and how that data may be used. This is a very strong accessibility concern, as people with disabilities are generally reluctant to disclose any information about their disability on the web except when they expressly choose to reveal that information themselves for their own particular reasons.

The multi-device environment is widely used to authenticate a human user by requiring some action on a registered second device, most often a smart cellular telephone. Known as "dual factor authentication," this process is mandatory at each of the three largest email service providers, Gmail, Yahoo, and Outlook, which accept outbound mail only after the user has authenticated through a telephone number they are required to provide. Similarly, should Twitter spot activity it considers suspicious, it will hold tweets until the user revalidates through both a reCAPTCHA challenge and a telephone call. Yet increasingly, as telemarketing calls proliferate, web users are reluctant to provide data aggregators their personal telephone numbers. Clearly, a voice-only authentication approach also cannot properly serve deaf and hard of hearing users.

Access a Google account service through a new browser or laptop and Google will hold off granting access until the user responds to a pop-up "toast" message on the registered telephone device showing the user's photo and asking: "Was that you?" The user must verify that it was indeed they before access can continue. A variant previously common at Google and still in use elsewhere places a voice call or sends a text message with a short code the user is required to input into a form field to continue.
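The short-code variant described above can be sketched as follows. This is only an illustrative outline, not any provider's actual implementation: the function names, the six-digit length, and the five-minute validity window are all assumptions, and delivery of the code by text message or voice call is left out entirely.

```python
import hmac
import secrets
import time
from typing import Optional, Tuple

CODE_TTL_SECONDS = 300  # assumed validity window of five minutes


def issue_code(digits: int = 6) -> Tuple[str, float]:
    """Return a random numeric code and the time at which it expires."""
    code = "".join(secrets.choice("0123456789") for _ in range(digits))
    return code, time.time() + CODE_TTL_SECONDS


def verify_code(submitted: str, issued: str, expires_at: float,
                now: Optional[float] = None) -> bool:
    """Accept the code only if it matches and has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(submitted.strip().encode(), issued.encode())
```

From an accessibility standpoint, a generous validity window matters: users of assistive technology may need more time to switch between devices and enter the code.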

Another variant of this approach, one employed by Cisco's Webex teleconference service, asks the user to press a particular key on their telephone to continue. This is easy enough on a desk phone, but it becomes problematic for the smart phone user who depends on text-to-speech (TTS) and who must now hear the phone's TTS voice in order to get the dial pad to pop up, and then find the appropriate touch tone key, all at the same time as the Webex service voice is also speaking, repeating: "Welcome to Webex. Press 1 to be connected to your meeting. ... Welcome to Webex. Press 1 to be connected to your meeting. ..." It is important to recognize that both of these voices are routed through the same physical device speaker, even on units equipped with dual speakers for playing music in stereo.

Providing the user the option of contact via voice and/or text is good, but some services offer only text. This disadvantages the user without an accessible text-capable device, and there are many such users. Similarly, services offering only a voice call option disadvantage deaf and hard of hearing users. As ever, the rule must be to provide options for the user to choose among, including fallback options.

3.2 Non-Interactive State-of-the-Art Approaches

While traditional CAPTCHA and other interactive approaches to limiting the activities of web robots are sometimes effective, they do make using a site more cumbersome. This is often unnecessary, as non-interactive mechanisms exist to check for spam or other invalid content typically introduced by robots.

The approaches described in this section can be regarded as alternatives or complementary approaches to traditional CAPTCHA.

A CAPTCHA can sometimes be circumvented by an attacker (e.g., by using crowd-sourcing techniques), and cryptographic keys can in some circumstances be hijacked. Detecting and responding to web robots that have successfully and unexpectedly gained access to a web resource is thus desirable even in the presence of other measures. The advantage of limiting sensory and cognitive demands on people with disabilities only accrues when these non-interactive strategies are used alone, or when they are combined with other approaches that avoid traditional CAPTCHA.

3.2.1 Spam Filtering

Applications that use continuous authentication and “hot words” to flag spam content, or Bayesian filtering to detect other patterns consistent with spam, are very popular and quite effective. While such risk analysis systems may experience false negatives from time to time, properly tuned systems can achieve results comparable to a traditional visual CAPTCHA, while also removing the added cognitive burden on the user and eliminating access barriers.

Most major blogging software contains spam filtering capabilities, or can be fitted with a plug-in for this functionality. Many of these filters can automatically delete messages that reach a certain spam threshold, and mark questionable messages for manual moderation. More advanced systems can control attacks based on posting frequency, filter content sent using the Trackback protocol, and ban users by IP address range, temporarily or permanently.
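To make the Bayesian filtering and threshold-based handling described above more concrete, here is a minimal sketch of a naive Bayes classifier with add-one smoothing, followed by a triage step. The class and function names, the tokenizer, and the two thresholds are assumptions chosen for illustration; they are not drawn from any particular blogging platform or plug-in.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labeled message; label is 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) using add-one (Laplace) smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # The extra +1 in the denominator reserves mass for unseen words.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Clamp the difference so math.exp cannot overflow on long messages.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def triage(probability, delete_at=0.95, moderate_at=0.6):
    """Map a spam probability to an action; both thresholds are assumed values."""
    if probability >= delete_at:
        return "delete"
    if probability >= moderate_at:
        return "moderate"
    return "publish"
```

Routing the uncertain middle band to manual moderation, rather than to an interactive challenge, keeps the burden on the site operator instead of the user.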

3.2.2 Proof-Of-Work

One strategy for thwarting misuse of web resources is to load suspicious clients with significant computational workloads, thus slowing down the interaction and hopefully deterring malicious parties by reducing their ability to engage in undesirable activities such as disseminating spam. This approach has been explored in the development of proof-of-work challenges [kaPoW-plugins]. These can be made arbitrarily expensive computationally based on an associated reputation score for each client that attempts to access a resource. Clients that are adjudged more likely to be malicious are required to solve more computationally expensive problems. Less resource-consumptive problems are provided to clients that are adjudged more likely to be web browsers operated by human users.

The proof-of-work approach should have a negligible effect on the human user's interactive experience, provided that the reputation scoring is relatively accurate. However, it is designed to impose substantial cost on the operators of web robots, perhaps even greater than the cost of hiring human workers as CAPTCHA solvers.

Implementing this approach is straightforward. It requires the client to execute JavaScript code to solve the computational problem, and the solution is then verified by a server to establish that the work has been performed. It poses no direct accessibility problems, though it may slow performance for users of older hardware.
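The solve-then-verify exchange described above can be illustrated with a hash-based puzzle. The sketch below is a generic, hashcash-style construction and is not the specific scheme of [kaPoW-plugins]; the function names, the use of SHA-256, and the mapping from reputation to difficulty are assumptions. In a deployment the `solve` step would run as JavaScript in the browser, while `verify` would run on the server; both are shown in Python here only for brevity.

```python
import hashlib
import secrets


def difficulty_for(reputation: float, min_bits: int = 8, max_bits: int = 20) -> int:
    """Map a reputation in [0, 1] (1 = most trusted) to required leading zero bits."""
    reputation = min(max(reputation, 0.0), 1.0)
    return round(max_bits - reputation * (max_bits - min_bits))


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: a single hash checks that the client did the work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    # The top `bits` bits of the digest must all be zero.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def solve(challenge: bytes, bits: int) -> int:
    """Client side: brute-force search, about 2**bits hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, bits):
        nonce += 1
    return nonce


# The server issues a fresh random challenge for each request.
challenge = secrets.token_bytes(16)
bits = difficulty_for(reputation=1.0)  # a trusted client gets the easy puzzle
nonce = solve(challenge, bits)
```

Each additional required bit doubles the expected work for the client, while verification always costs one hash. This asymmetry is what allows the cost to be scaled per client.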

3.2.3 Heuristics

Heuristics are discoveries in a process that seem to indicate a given result. It may be possible to detect the presence of a robotic user based on the volume of data the user requests, series of common pages visited, IP addresses, data entry methods, or other signature data that can be collected.

Again, this requires a careful examination of site data. If pattern-matching algorithms can’t find good heuristics, then this is not a good solution. Also, polymorphism, or the creation of changing footprints, is apt to appear in robots, if it hasn’t already, just as polymorphic (“stealth”) viruses appeared to get around virus checkers looking for known viral footprints.
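As a rough illustration of how such signals might be combined, the following sketch scores a client on request rate, timing regularity, and a missing User-Agent header. Every signal, weight, and threshold here is an assumption made up for the example; real values would have to come from the careful examination of site data mentioned above.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List


@dataclass
class ClientActivity:
    request_times: List[float]  # request timestamps in seconds, ascending
    has_user_agent: bool        # whether a User-Agent header was sent


def robot_score(activity: ClientActivity, max_rate_per_minute: float = 60.0) -> float:
    """Return a score from 0.0 to about 1.0; higher suggests a robot."""
    score = 0.0
    times = activity.request_times
    if len(times) >= 2:
        duration = max(times[-1] - times[0], 1e-9)
        rate = (len(times) - 1) / duration * 60.0
        if rate > max_rate_per_minute:
            score += 0.4  # far more requests than a person would make
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if len(gaps) >= 3 and pstdev(gaps) < 0.05:
            score += 0.3  # metronome-like timing is unusual for humans
    if not activity.has_user_agent:
        score += 0.3
    return score
```

A score of this kind is best treated as one input to a decision rather than a verdict, since assistive technologies and unusual browsing habits can make legitimate human traffic look atypical.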

Another heuristic approach identified in Botz-4-Sale: Surviving DDos Attacks that Mimic Flash Crowds [killbots] involves the use of CAPTCHA images, with a twist: how the user reacts to the test is as important as whether or not it was solved. This system, which was designed to thwart distributed denial of service (DDoS) attacks, bans automated attackers which make repeated attempts to retrieve a certain page, while protecting against marking humans incorrectly as automated traffic. When the server’s load drops below a certain level, the CAPTCHA-based authentication process is removed entirely.

3.2.4 Honeypots

Providing a CAPTCHA visible to robots but not to humans appears to be sufficiently successful to be supported in several content management systems, such as Drupal through its Honeypot module, and in several commercial WordPress plugins. The form field is created to attract robots and then hidden from the user, for instance with CSS. It's an approach that is easily implemented even in hand authored markup and should be considered. The Hilton Hotel Corporation has used a honeypot CAPTCHA on the Sign In page for Hilton Honors, its loyalty program website, where a prominent focusable field is labeled: "This field is for robots only. Please leave blank."
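A honeypot of this kind needs only a decoy field in the form and a one-line check on the server. The sketch below is a minimal, framework-independent illustration; the field name `website`, the markup, and the function names are hypothetical. The decoy is hidden with `display: none`, which also hides it from screen readers, and it carries a plain-language label in case it is exposed anyway.

```python
HONEYPOT_FIELD = "website"  # hypothetical decoy field name


def render_form() -> str:
    """Return form markup containing a decoy field that people never see."""
    return f"""
<form method="post" action="/comment">
  <label for="comment">Comment</label>
  <textarea id="comment" name="comment"></textarea>
  <div style="display: none">
    <label for="{HONEYPOT_FIELD}">This field is for robots only. Please leave blank.</label>
    <input id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}" type="text"
           tabindex="-1" autocomplete="off">
  </div>
  <button type="submit">Post</button>
</form>
"""


def is_probable_robot(form_data: dict) -> bool:
    """A submission that fills in the decoy field was probably automated."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

Setting `autocomplete="off"` is intended to reduce the chance that a browser or password manager fills the decoy on behalf of a human user, which would otherwise cause a false positive.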

3.2.5 Limited-Use Accounts

Users of free accounts very rarely need full and immediate access to a site’s resources. For example, users who are searching for concert tickets may need to conduct only three searches a day, and new email users may only need to send the same notification of their new address to their friends. Sites may create policies that limit the frequency of interaction explicitly (that is, by disabling an account for the rest of the day) or implicitly (by slowing the response times incrementally). Creating limits for new users can be an effective means of making high-value sites unattractive targets to robots.

Drawbacks to this approach include the need to perform sufficient testing and data collection to determine useful limits that will serve human users yet frustrate robots. It requires site designers to look at statistics of normal and exceptional users, and determine whether clear demarcation exists between them.
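Both the explicit and the implicit forms of limiting can be expressed in a few lines. In this sketch the limit of three requests per day echoes the concert ticket example above, while the class name, the doubling delay, and the half-second base delay are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountLimiter:
    daily_limit: int = 3     # explicit limit: requests allowed per day
    base_delay: float = 0.5  # implicit limit: seconds of delay for the first request
    used_today: int = 0

    def request(self) -> Optional[float]:
        """Return the delay to apply before responding, or None once the
        account is disabled for the rest of the day."""
        if self.used_today >= self.daily_limit:
            return None
        # The delay doubles with each request, slowing responses incrementally.
        delay = self.base_delay * (2 ** self.used_today)
        self.used_today += 1
        return delay

    def reset_day(self) -> None:
        """Called at the start of each day to restore the allowance."""
        self.used_today = 0
```

For example, a new account making four requests in one day would see delays of 0.5, 1.0, and 2.0 seconds and would then be refused until the next reset.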

3.2.6 Public-Key Infrastructure (PKI)

Another approach is to use certificates for individuals who wish to verify their identity. A party relying on a certificate offered by a user attempting to access online services can assess the trustworthiness of the certificate's issuer, and the likelihood that the private key has been compromised, in evaluating the risk that the offerer is actually a web robot rather than a human agent. Highly trusted certification authorities such as governments, as in Estonia's e-Residency Program, require evidence of an individual's identity as a basis for issuing a certificate. Provided that the private key is not compromised and cannot be misused by an attacker, there is a high degree of assurance that messages cryptographically signed with it, which could serve to establish the user's identity to web-based services, have genuinely been authorized by the certificate holder.

The use of certificates as an indicator that an access attempt has been authorized by a human discloses the user's identity to the web service provider, and thus should not be deployed in circumstances in which anonymity is necessary. In addition, Transport Layer Security (TLS) client certificate authentication, as defined in TLS 1.2 and earlier versions of the protocol, gives rise to privacy concerns [tls-tracking].

A variant of this concept, in which only people with disabilities who are affected by other verification systems would register, is sometimes proposed. Such approaches raise significant privacy and stigmatization concerns and are usually opposed strongly by people with disabilities themselves and by organizations that serve them. Such approaches should not be confused with situations where people voluntarily self-identify as individuals with disabilities. An example is the U.S.-based Bookshare, whose services are only available to persons with documented print disabilities under the terms of an international copyright treaty administered by the United Nations' World Intellectual Property Organization (WIPO) and known as the Marrakesh Treaty [marrakesh].

3.2.7 The Non-interactive Google reCAPTCHA Version 3

Late in 2018 Google released reCAPTCHA v3 promising to eliminate "the need to interrupt users with challenges at all." Google also informed us that their goals with reCAPTCHA v3 included increasing "the accessibility of the web by removing traditional CAPTCHAs" entirely. Obviously, fully non-interactive Turing testing is a most welcome development direction for accessibility. When the non-interactive Turing test returns a score indicating high confidence that the user is human, or indeed a score indicating high confidence that the user is a robot, and experience has demonstrated the non-interactive engine is reliable, we can only offer praise and gratitude for technological progress that more effectively supports persons with disabilities.

Of course no approach will always return unambiguous results. In such situations Google advises that content providers "use a secondary challenge that makes sense in the context of their site such as two-factor authentication, send the post to moderators, or combine the score with signals specific to their site to make a more informed judgment." Google intends that traditional CAPTCHA no longer be used as a fallback mechanism and has dropped it from reCAPTCHA v3, though it remains in their slightly older, 2017 reCAPTCHA v2 Invisible service.

The reality is that what action is taken in response to an ambiguous score returned by reCAPTCHA v3 is in the hands of the content provider. Services like reCAPTCHA gain their market share by offering to relieve the content provider of the hard work inherent in mounting effective and accessible Turing testing. Sadly this leaves the door open to any fallback approach a content provider might choose. Meanwhile, Google's reCAPTCHA FAQ declares that reCAPTCHA Version 2 is not going away. It is therefore imperative that methods for disambiguating an ambiguous non-interactive score be well documented and easily implementable in order to better overcome the tendency to simply adopt the old familiar approach.
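One way a content provider might handle the three cases (confident human, confident robot, ambiguous) without falling back on a traditional CAPTCHA is sketched below. reCAPTCHA v3 reports a score between 0.0 and 1.0, with higher values indicating a more likely legitimate interaction. The function name, the two thresholds, and the returned action labels are assumptions for illustration; the sketch does not call the reCAPTCHA service itself.

```python
def action_for_score(score: float, human_at: float = 0.7, robot_at: float = 0.3,
                     has_second_factor: bool = False) -> str:
    """Choose a response to a non-interactive score; thresholds are assumed values."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= human_at:
        return "allow"
    if score <= robot_at:
        return "block"
    # Ambiguous band: use one of the secondary measures Google suggests,
    # neither of which requires the user to solve a traditional CAPTCHA.
    return "two_factor" if has_second_factor else "moderate"
```

Sending ambiguous submissions to moderators shifts the cost of uncertainty onto the site operator, whereas two-factor authentication is only appropriate when the user has already registered a second device they can use accessibly.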

3.2.8 Turing Tokens from the Cloud?

The "cloud" has become a well-known term among computer users. It describes the growing concentration of web content and software service delivery in content delivery networks (CDNs) such as Akamai, Cloudflare, and Amazon CloudFront. These CDNs provide the added value of localized, last-mile cached content delivery and the ability to effectively deflect various malicious activities such as denial of service (DoS) attacks. As almost two-thirds of Internet content is now delivered by CDNs, they have also unintentionally been forced to become Turing test arbiters. This in turn has resulted in the development of fresh, innovative approaches to CAPTCHA such as Privacy Pass [privacy-pass], now available as a browser extension on Cloudflare.

While Privacy Pass still begins with a CAPTCHA challenge, it does provide the user a trove of cryptographically blinded tokens which can satisfy further challenges in the background and dramatically reduce interactive CAPTCHA challenges. Most refreshingly it offers meaningful privacy protection, even anonymity, while reliably validating the user is human. Essentially, the CDN is functioning as a trust broker on the user's behalf. When a user "spends" a token, they're saying to the site they're accessing: "You don't trust me, but you do trust the entity that issued this token, and they're vouching for me." As this approach is developed further, we can reasonably hope the onus of the initial challenge can be further mitigated with robust support for web accessibility, perhaps by expanding available initial CAPTCHA validation approaches, e.g., adding support for biometrics.
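The idea of a blinded token can be illustrated with a textbook RSA blind signature. This is a deliberate stand-in: Privacy Pass itself is built on a different primitive (an elliptic-curve verifiable oblivious pseudorandom function), and the key below is a toy far too small for real use. All names and parameters are assumptions chosen for the example. What the sketch does show is the essential property: the issuer signs a value it cannot read, so it cannot later link the redeemed token to the session in which the challenge was solved.

```python
import hashlib
import math
import secrets

# Toy issuer key built from two small Mersenne primes. NOT secure.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent, known only to the issuer


def token_to_int(token: bytes) -> int:
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(token: bytes):
    """User: hide the token behind a random factor r before sending it."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (token_to_int(token) * pow(r, E, N)) % N, r


def issuer_sign(blinded: int) -> int:
    """Issuer: sign after the user passes the initial challenge."""
    return pow(blinded, D, N)


def unblind(blind_signature: int, r: int) -> int:
    """User: strip the random factor, leaving a signature on the token."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(token: bytes, signature: int) -> bool:
    return pow(signature, E, N) == token_to_int(token)


def redeem(token: bytes, signature: int, spent: set) -> bool:
    """Accept each valid token once, so tokens cannot be replayed."""
    if token in spent or not verify(token, signature):
        return False
    spent.add(token)
    return True
```

In use, the user would generate many random tokens, have them all signed in blinded form after a single challenge, and then "spend" one unblinded token each time a later challenge would otherwise appear.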

3.2.9 Why not Federated Turing Tokens?

A host of varied identity management systems with varying features and varying levels of accessibility are in use on the web today. Some, like LastPass, offer to securely store authentication credentials and frequently used form data that individual users and variously defined groups can invoke across a range of personal devices. Services like Amazon's Cognito allow service providers to support a range of users' existing authentications to facilitate accessing a range of cross-platform services, in essence contracting out the task of login authentication and access control. Others, like Facebook's Account Kit, seek to leverage login authentication on a highly popular web service to grow the ecosystem developed using user data collected on that platform. It has been unclear to us at first blush which, if any, of these services prioritize restricting account services to human users only. It is, however, clear that virtually none provides their authenticated users, about whom they know a great deal, support for non-interactively authenticating their humanity with third parties. The only exceptions of which we're aware are reCAPTCHA v3 and Privacy Pass.

We believe adding third party CAPTCHA support to identity management services is needed. Any service offering to say to a third party: "You don't know this user so you don't trust them, but you know and trust me, therefore you can trust this user" could only exist after earning trust across the industry. Such a service would indeed need to be careful to sign up only human users. A service that leveraged both user authentication and trusted anonymity would, we believe, quickly prove its viability. All users, but most certainly persons with disabilities, need not only simplified authentication but also the ability to interact across the web using widely accepted trust credentials with minimal interruption for validating their humanity. Such an accessible, enhanced-quality VPN service, providing not just cross-site login but also Turing authentication, perhaps for a monthly fee, could conceivably even validate and broker all the user's registered devices across the Internet, anonymizing even financial transactions and shipping data as a full-service online trust broker. Letters of credit had a similar beginning in international finance some centuries ago, so why not online Turing Tokens and privacy protection today?

4. Conclusion

Editor's note

This section is to be rewritten in light of emerging technologies and recent developments. The Task Force is reviewing the current range of alternatives to CAPTCHA for the purpose of revising this document to offer up to date and informed advice that serves the needs of people with disabilities, while maintaining a high level of security for application and service providers. The conclusions formerly in this section have been removed to make way for the new material, which will be a complete rewrite of the conclusions.

A. Terms

The following terms are used in this document:

AI
Artificial Intelligence
alternative text
Text that is associated with, and provides a brief description or label of, non-text content.
assistive technology
Hardware and / or software that acts as a stand-alone user agent, or alongside a mainstream user agent to meet the functional requirements of users with disabilities that go beyond those provided by mainstream user agents alone.
Bayesian filter
Recursive probabilistic heuristic to categorize content, typically used in spam filtering.
CAPTCHA
“Completely Automated Public Turing test to tell Computers and Humans Apart”: a test relying on a challenge believed to be difficult for machines to satisfy correctly but relatively easy for humans.
continuous authentication
Mechanism to determine that a user is still the one previously verified without requiring interactive re-authentication.
heuristic
Way to solve a problem with high reliability though not perfection.
honeypot
A decoy service intended to elicit interaction from web robots.
non-interactive
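Describes an approach that establishes whether a user is human without requiring the user to perform a task.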
public-key infrastructure
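System of certificates, issued by certification authorities, that binds public keys to the identities of their holders, so that content signed with the corresponding private key can be attributed to the certificate holder.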
robot
Software application that performs automated tasks on web content.
screen reader
Assistive technology that renders content as speech or Braille.
spam filter
Software that processes email messages to separate undesired, usually automated, messages from desired messages.
spider
Robot that processes web content and recursively follows links to process the content at the link target.
Turing test
A test to determine whether responses provided by a software application are distinguishable from the responses of human individuals.
user agent
Any software that retrieves, renders and facilitates end user interaction with web content.
VPN
Virtual Private Network

B. Acknowledgments

B.1 Contributors to This Version:

B.2 Contributors to the Previous Version:

B.3 Enabling Funders

This publication has been funded in part with U.S. Federal funds from the Health and Human Services, National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), initially under contract number ED-OSE-10-C-0067, then under contract number HHSP23301500054C, and now under HHS75P00120P00168. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Health and Human Services or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

C. References

C.1 Informative references

[36-cfr-1194]
36 CFR Appendix C to Part 1194, Functional Performance Criteria and Technical Requirements. Legal Information Institute. URL: https://www.law.cornell.edu/cfr/text/36/appendix-C_to_part_1194
[3d-captcha-security]
On the security of text-based 3D CAPTCHAs. Nguyen, V. D.; Chow, Y.-W.; Susilo, W. Computers & Security, 45. 2014.
[attestation]
Humanity wastes about 500 years per day on CAPTCHAs. It’s time to end this madness. Thibault Meunier. Cloudflare. 13 May 2021. URL: https://blog.cloudflare.com/introducing-cryptographic-attestation-of-personhood/
[attestation-zero-knowledge]
Introducing Zero-Knowledge Proofs for Private Web Attestation with Cross/Multi-Vendor Hardware. Watson Ladd. Cloudflare. 12 August 2021. URL: https://blog.cloudflare.com/introducing-zero-knowledge-proofs-for-private-web-attestation-with-cross-multi-vendor-hardware/
[auth-mult]
Design, Testing and Implementation of a New Authentication Method Using Multiple Devices. Cetin, C.. J. Ligatti, D. Goldgof, & Y. Liu (Eds.): ProQuest Dissertations Publishing. 2015.
[captcha-ld]
The Effect of CAPTCHA on User Experience among Users with and without Learning Disabilities. Gafni, R.; Nagar, I.
[captcha-ocr]
CAPTCHA: Attacks and Weaknesses against OCR technology. Silky Azad; Kiran Jain. Journal of Computer science and Technology. 2013. URL: https://computerresearch.org/index.php/computer/article/download/368/368
[captcha-robustness]
Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness. Tangmanee, C. Journal of Applied Security Research, 11(3). 2016.
[captcha-security]
CAPTCHA Security: A Case Study. Yan, J.; El Ahmad, A. S. Security & Privacy, IEEE, 7(4). 2009.
[captchastar]
CAPTCHaStar! A novel CAPTCHA based on interactive shape discovery. Conti, M.; Guarisco, C.; Spolaor, R. 2015.
[defeat-line-noise]
Defeating line-noise CAPTCHAs with multiple quadratic snakes. Nakaguro, Y.; Dailey, M. N.; Marukatat, S.; Makhanov, S. S. Computers & Security, 37. 2013.
[en-301-549]
EN 301 549 v3.2.1: Harmonised European Standard - Accessibility requirements for ICT products and services. CEN/CENELEC/ETSI. 2021-03. URL: https://www.etsi.org/deliver/etsi_en/301500_301599/301549/03.02.01_60/en_301549v030201p.pdf
[eval-audio]
Evaluating existing audio CAPTCHAs and an interface optimized for non-visual use. Bigham, J. P.; Cavender, A. C. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. April 2009.
[facecaptcha]
FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers. Kim, J.; Kim, S.; Yang, J.; Ryu, J.-h.; Wohn, K. An International Journal, 72(2). 2014.
[game-captcha]
Game-based image semantic CAPTCHA on handset devices. Yang, T.-I.; Koong, C.-S.; Tseng, C.-C. An International Journal, 74(14). 2015.
[information-security]
Handbook of Information and Communication Security. Peter Stavroulakis; Mark Stamp. Springer Science & Business Media. 2010.
[kaPoW-plugins]
kaPoW plugins: protecting web applications using reputation-based proof-of-work. Tien Le; Akshay Dua; Wu-chang Feng. Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality. 2012.
[killbots]
Botz-4-Sale: Surviving DDos Attacks that Mimic Flash Crowds. Srikanth Kandula; Dina Katabi; Matthias Jacob; Arthur Burger. URL: https://www.usenix.org/legacy/events/nsdi05/tech/kandula/kandula_html/
[marrakesh]
Marrakesh Treaty to Facilitate Access to Published Works for Persons Who Are Blind, Visually Impaired or Otherwise Print Disabled. World Intellectual Property Organization. 27 June 2013. URL: https://www.wipo.int/treaties/en/ip/marrakesh
[newscom]
Spam-bot tests flunk the blind. Paul Festa. News.com. 2 July 2003. URL: https://web.archive.org/web/20030707210529/http://news.com.com/2100-1032-1022814.html
[privacy-pass]
Privacy Pass: Bypassing Internet Challenges Anonymously. Alex Davidson; Ian Goldberg; Nick Sullivan; George Tankersley; Filippo Valsorda. Proceedings on Privacy Enhancing technologies; 2018 (3):164-180. 2018. URL: https://www.petsymposium.org/2018/files/papers/issue3/popets-2018-0026.pdf
[recaptcha-attacks]
HMM-based Attacks on Google's ReCAPTCHA with Continuous Visual and Audio Symbols. Sano, S.; Otsuka, T.; Itoyama, K.; Okuno, H. G. Journal of Information Processing, 23(6). 2015.
[social-classification]
Social and egocentric image classification for scientific and privacy applications. Korayem, M.. D. Crandall, J. Bollen, A. Kapadia, & P. Radivojac (Eds.): ProQuest Dissertations Publishing. 2015.
[solving-captchas]
How good are humans at solving CAPTCHAs? A large scale evaluation. Elie Bursztein et al. 2010 IEEE symposium on security and privacy. 2010.
[task-completion]
Towards a universally usable human interaction proof: evaluation of task completion strategies. Sauer, G.; Lazar, J.; Hochheiser, H.; Feng, J. ACM Transactions on Accessible Computing (TACCESS), 2(4). 2010.
[tls-tracking]
Exploiting TLS Client Authentication for Widespread User Tracking. Lucas Foppe; Jeremy Martin; Travis Mayberry; Eric C. Rye; Lamont Brown. Proceedings on Privacy Enhancing Technologies. 2018. URL: https://www.petsymposium.org/2018/files/papers/issue4/popets-2018-0031.pdf
[webauthn-1]
Web Authentication: An API for accessing Public Key Credentials Level 1. Dirk Balfanz; Alexei Czeskis; Jeff Hodges; J.C. Jones; Michael Jones; Akshay Kumar; Huakai Liao; Rolf Lindemann; Emil Lundberg. W3C. 4 March 2019. W3C Recommendation. URL: https://www.w3.org/TR/webauthn-1/