Inaccessibility of CAPTCHA

Alternatives to Visual Turing Tests on the Web

W3C Editor's Draft

This version:
https://w3c.github.io/apa/captcha/
Latest published version:
https://www.w3.org/TR/turingtest/
Latest editor's draft:
https://w3c.github.io/apa/captcha/
Editors:
(W3C)
Former editor:
Matt May (W3C)

Abstract

Various approaches have been employed over many years to distinguish human users of web sites from robots. The traditional CAPTCHA approach asking users to identify obscured text in an image remains common, but other approaches have emerged. All interactive approaches require users to perform a task believed to be relatively easy for humans but difficult for robots. Unfortunately the very nature of the interactive task inherently excludes many people with disabilities, resulting in a denial of service to these users. Research findings also indicate that many popular CAPTCHA techniques are no longer particularly effective or secure, further complicating the challenge of providing services secured from robotic intrusion yet accessible to people with disabilities. This document examines a number of approaches that allow systems to test for human users and the extent to which these approaches adequately accommodate people with disabilities, including recent noninteractive and tokenized approaches. We have grouped these approaches by two category classifications: Stand-Alone Approaches that can be deployed on a web host without engaging the services of unrelated third parties and Multi-Party Approaches that engage the services of an unrelated third party.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Accessible Platform Architectures Working Group as an Editor's Draft.

Comments regarding this document are welcome. Please send them to public-apa@w3.org (archives).

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2019 W3C Process Document.

1. Introduction

1.1 The CAPTCHA Context

Both large and small web sites which provide interactive services have long sought to limit their services only to human users. They seek to avoid exposing their collected data and content publishing services to ever more cleverly articulated web robots. Whether the service be travel and event ticketing, email, blogging, or calendaring services, social media services, or some combination of these and many more, experience has demonstrated that even authenticated login provides inadequate protection from malicious actors. Such sites still need to know their interacting user is a human individual, and not a software robot. Arguably the industry's need for reliable Turing testing is only growing more critical.

An early (and still widespread) solution relies on the use of graphical representations of text in registration or comment areas of a web site. The site attempts to verify that the user is in fact a human by requiring the user to complete a task referred to as a "Completely Automated Public Turing Test, to Tell Computers and Humans Apart," or CAPTCHA. The assumption is that humans find this task relatively easy, while robots find it nearly impossible to perform.

The CAPTCHA was initially developed by researchers at Carnegie Mellon University and has been primarily associated with a technique whereby an individual identifies a distorted set of characters in a bit-mapped image, then enters those characters into a web form. This approach is widely familiar to users of the web, though the term CAPTCHA is generally recognized only by web professionals.

In recent times the types of CAPTCHA that appear on web sites and mobile apps have changed significantly. Since our concern here is the accessibility of systems that seek to distinguish human users from their robotic impersonators, the term “CAPTCHA” is used in this document generically to refer to all approaches which are specifically designed to differentiate a human from a computer, including fully noninteractive approaches.

It will surprise no one that we applaud the recent emergence of noninteractive approaches because functional noninteractive approaches pose no accessibility challenge to users. Unfortunately, some current noninteractive approaches come at the price of exposing to the host's analysis engine much data about the individual user that the user might prefer to keep confidential. We are further heartened by the even more recent development of tokenized approaches that promise trustable Turing testing requiring only minimal interaction with users.

1.2 The Accessibility Challenge

While online users continue broadly to report finding traditional CAPTCHAs frustrating to complete, it is generally assumed that an interactive CAPTCHA can be resolved within a few incorrect attempts. The point of distinction for people with disabilities is that a CAPTCHA not only separates computers from humans, but also often prevents people with disabilities from performing the requested procedure. For example, asking users who are blind, visually impaired or dyslexic to identify textual characters in a distorted graphic is asking them to perform a task they are intrinsically least able to accomplish. Similarly, asking users who are deaf, hard of hearing, or living with auditory processing disorder to identify and transcribe in writing the content of an audio CAPTCHA is asking them to perform a task they’re intrinsically least likely to accomplish. Furthermore, traditional CAPTCHAs have generally presumed that all web users can read and transcribe English-based words and characters, thus making the test inaccessible to a large number of non-English speaking web users worldwide. Frankly, a design pattern that expects multiple attempts from users as a matter of course is arguably inaccessible by design to persons living with an anxiety disorder as well as to many living with a range of other cognitive and learning disabilities.

While Accessibility best practices require, and assistive technologies expect, substantive graphical images to be authored with text equivalents, alternative text in CAPTCHA images would clearly be self-defeating. CAPTCHAs are, consequently, allowed under the W3C's Web Content Accessibility Guidelines (WCAG) provided that "text alternatives that identify and describe the purpose of the non-text content are provided, and alternative forms of CAPTCHA using output modes for different types of sensory perception are provided to accommodate different disabilities."

It is important to understand the limitation of the WCAG CAPTCHA exemption. It applies only to the content of the CAPTCHA. WCAG still requires that alternative text identify the graphical object as a CAPTCHA. Conformance with all other WCAG guidelines also remains critical for web accessibility.

The rationale for this highly specific exemption in WCAG is simple. A CAPTCHA without an accessible and usable alternative makes it impossible for users with certain disabilities to create accounts, write comments, or make purchases on such sites. In essence, such CAPTCHAs fail to properly recognize users with disabilities as human, obstructing their participation in contemporary society. Such issues also extend to situational disabilities whereby a user may not be able to effectively view a traditional CAPTCHA on a mobile device due to the small screen size, or hear an audio-based CAPTCHA in a noisy environment.

1.3 CAPTCHA Effectiveness

Malicious activity on the web has only grown over the years to comprise an alarmingly high percentage of all Internet traffic. While we would certainly not suggest the web's woes arise from sloppy or ill-considered CAPTCHA implementations, we do suggest current conditions only reinforce the importance of well considered and closely monitored security and privacy strategies consistent with appropriate user support that includes people with disabilities. Getting CAPTCHA right needs to be part of the solution.

It is important to acknowledge that using a CAPTCHA as a security solution is becoming increasingly ineffective. Current CAPTCHA methods that rely primarily on traditional image-based approaches, logic problems, or audio CAPTCHA alternatives can be largely cracked using both complex and simple computer algorithms. Research suggests that as character-based CAPTCHAs become increasingly vulnerable to defeat by advancing optical character recognition technologies, more severe distortion of the characters is introduced to resist these attacks. However, such enhanced distortion techniques also make it progressively less feasible even for humans who are well endowed with sensory and cognitive capacity to solve CAPTCHA challenges reliably, ultimately making character-based CAPTCHAs impracticable [captcha-ocr].

Pattern-matching algorithms can achieve an even higher success rate of cracking CAPTCHAs in some instances, as demonstrated in CAPTCHA Security: A Case Study [captcha-security] and HMM-based Attacks on Google’s reCAPTCHA with Continuous Visual and Audio Symbols [recaptcha-attacks]. While efforts are being made to strengthen traditional CAPTCHA security, more robust security solutions risk reducing the typical user’s ability to understand the CAPTCHA that needs to be resolved, e.g., Defeating line-noise CAPTCHAs with multiple quadratic snakes [defeat-line-noise]. A recent study at the University of Maryland has demonstrated a 90% success rate cracking Google's audio reCAPTCHA using Google's own speech recognition service. Indeed, as noted below, Google's V. 2 reCAPTCHA service has recently begun declining to actually provide the audio CAPTCHA alternative clearly proffered onscreen.

In fact it is arguable that online services which offer the content developer a ready solution for distinguishing human users from robots may well be helping defeat that very function. For example, Google's reCAPTCHA proclaims: "Hundreds of millions of CAPTCHAs are solved by people every day. reCAPTCHA makes positive use of this human effort by channeling the time spent solving CAPTCHAs into annotating images and building machine learning datasets. This in turn helps improve maps and solve hard AI problems." It is legitimate to consider whether this also describes a classic vicious cycle which is helping defeat the effectiveness of visual and auditory CAPTCHA deployments.

It is therefore highly recommended that the purpose and effectiveness of any deployed solution be carefully considered and evaluated across multiple browser and operating system environments before adoption, and then closely monitored for effective performance. Alternative security methods, such as two-step or multi-device verification, along with emerging protocols for identifying human users with high reliability should also be carefully considered in preference to traditional image-based CAPTCHA methods for both security and accessibility reasons.

2. Stand-Alone Approaches

Many techniques are available to web sites to discourage or eliminate fraudulent activities such as inappropriate account creation. Several of them may be as effective as the visual verification technique while being more accessible to people with disabilities. Others may be overlaid as an accommodation for the purposes of accessibility. We group our review by interactive and non-interactive categories.

2.1 Interactive Stand-Alone Approaches

2.1.1 Traditional Character-Based CAPTCHA

The traditional character-based CAPTCHA, as previously discussed, is largely inaccessible and insecure. It focuses on the presentation of letters or words presented in an image and designed to be difficult for robots to identify. The user is then asked to enter the CAPTCHA information into a form.

The use of a traditional CAPTCHA is obviously problematic for people who are blind, as the screen readers they rely on to use web content cannot process the image, thus preventing them from uncovering the information required by the form. Because the characters embedded in a CAPTCHA are often distorted or have other characters in close proximity to each other in order to foil technological solution by robots, they are also very difficult for users with other visual disabilities. This common CAPTCHA technique is also less reliably solved by users with cognitive and learning disabilities, see The Effect of CAPTCHA on User Experience among Users with and without Learning Disabilities [captcha-ld]. Because they’re intentionally distorted to foil robots, they also foil users who are more easily confused by surrealistic images or who do not possess sufficiently acute vision to “see” beyond the presented distortion and uncover the text the site requires in order to proceed.

While some sites have begun providing CAPTCHAs utilizing languages other than English, an assumption that all web users can understand and reproduce English predominates. Clearly, this is not the case. Users of Arabic or Thai character sets, for example, may not be familiar with the English alphabet or may not have enough knowledge to identify a distorted version of such characters. Furthermore, the default keyboard is likely to be localized, potentially making it difficult to enter an alternative character set unless specifically set up to do so. Research has demonstrated how CAPTCHAs based on written English impose a significant barrier to many on the web; see Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness [captcha-robustness].

2.1.2 Sound Output

To re-frame the problem, text is easy to manipulate, which is good for assistive technologies, but just as good for robots. One logical solution to this problem is to offer another non-textual method of using the same content. To achieve this, audio is played that contains a series of characters, words, or phrases being read out which the user then needs to enter into a form. As with visual CAPTCHA however, robots are also capable of recognizing spoken content—as Amazon’s Alexa and Android’s Google Assistant, among other spoken dialog systems, have so ably demonstrated. Consequently, the characters, words, or phrases the user is to uncover and transcribe in the form are also distorted in an audio CAPTCHA and are usually played over a sonic environment of obfuscating sounds.

The industry recognized this problem early. CNet reported in Spam-bot tests flunk the blind [newscom] that “Hotmail’s sound output, which is itself distorted to avoid the same programmatic abuse, was unintelligible to all four test subjects, all of whom had good hearing.”

Not only can the distortion of the sound output render the CAPTCHA difficult to hear; there can also be confusion in understanding whether a number is to be entered as a numerical value or as a word, e.g., ‘7’ or ‘seven’. Often the audio CAPTCHA user will hear sounds which seem to be words or numerical values that should be entered, but turn out to be just background noise.

Sound is also intrinsically temporal, but the import of this unavoidable fact is too often underappreciated, perhaps because the world we live in as seen through the eyes is also temporal. Unlike the real world seen through the eyes however, the traditional CAPTCHA is a still image that can be stared at until comprehension dawns. Sound has no analog to the visual still image.

Whenever any portion of an audio CAPTCHA is not understood, at least some part of the CAPTCHA must be replayed, usually several times. Currently, few audio CAPTCHAs provide an easily invoked and reliable replay feature, let alone an independent volume control or a pause, rewind, and fast-forward feature. Consequently, an entirely new audio CAPTCHA is often played should any part of one audio CAPTCHA prove difficult to understand.

Some audio CAPTCHAs tacitly admit this failure by offering a link allowing the user to download the audio CAPTCHA, typically as an MP3 file. The implicit assumption is that the user will use a favorite audio player (which does provide independent volume control and pause, play, rewind, and fast-forward capabilities) to play the audio CAPTCHA MP3 file again and again until comprehension dawns, perhaps pausing and rewinding the playback and perhaps writing down on the side the text destined for the web form. Clearly this is very inconvenient and subject to web site timeouts. It also illustrates why simply providing an audio CAPTCHA alternative to the traditional visual CAPTCHA does not provide equivalent access to the user.

Furthermore, just as not all web users should be presumed proficient with English in visual CAPTCHA, they should not be presumed capable of understanding and transcribing aural English in an audio CAPTCHA. Unfortunately, non-English audio CAPTCHAs appear to be very rare indeed. We are aware of only one multilingual CAPTCHA solution provider with support for a significant number of the world's languages.

Users who are deaf-blind, don’t have or use a sound card, find themselves in noisy environments, or don’t have required sound plugins properly configured and functioning, are thus also prevented from proceeding. Furthermore, relatively few audio CAPTCHAs properly support all the various browsers and operating systems in use today. Similarly, users of browsers which do not support easy direction of sound output to a particular audio device, or to all available audio devices on the system, are also hampered.

Users who live with some form of cognitive disability may also find audio CAPTCHAs even more difficult to solve than character-based visual CAPTCHAs. Audio CAPTCHAs are known to impose a cognitive overload on all human users in comparison to the cognitive load necessary to understand normal human speech [information-security]. Further, studies of CAPTCHAs requiring human recognition of distorted or obscured speech have shown that they are more difficult for all users to solve and more demanding in terms of time and effort compared to text or image-based CAPTCHAs [solving-captchas]. These facts make audio CAPTCHAs a poor choice for users with cognitive disabilities.

Although auditory forms of CAPTCHA that present distorted speech create recognition difficulties for screen reader users, the accuracy with which such users can complete the CAPTCHA tasks is increased if the user interface is carefully designed to prevent screen reader audio and CAPTCHA audio from being intermixed. This can be achieved by implementing functions for controlling the audio that do not require the user to move focus away from the text response field; see Evaluating existing audio CAPTCHAs and an interface optimized for non-visual use [eval-audio].

Experiments with a combined auditory and visual CAPTCHA requiring users to identify well known objects by recognizing either images or sounds suggest that this technique is highly usable by screen reader users. However, its security-related properties remain to be explored, as mentioned in Towards a universally usable human interaction proof: evaluation of task completion strategies [task-completion].

2.1.3 Logic Puzzles

The goal of visual verification is to separate human from machine. One reasonable way to do this is to test for logic. Simple mathematical or word puzzles, trivia, spatial tasks, or similar logic tests may raise the bar for robots, at least to the point where it becomes more attractive to deploy them elsewhere.

The use of logic puzzles as a CAPTCHA technique, however, introduces substantial barriers to access for people with language, learning or cognitive disabilities. An individual living with dyscalculia will understandably find even simple arithmetic puzzles challenging. A blind individual will be unable to identify the hammer from among graphical depictions of common tools. When puzzles are used, therefore, it is advisable to support a variety of puzzles so that someone unable to solve a given puzzle can obtain a different kind of puzzle when requesting another challenge.

Any development of CAPTCHA challenges in this direction should be accompanied by thorough usability research involving people with a variety of language, learning, and cognitive disabilities, as such an approach remains largely unexplored in practice and in the research literature. It should also be noted that answers may need to be handled flexibly, if they require free-form text. Also, a system would have to maintain a vast number of questions, or shift them around programmatically, in order to keep spiders from capturing them all for use by web robots. Puzzle-based CAPTCHA challenges are also readily subject to defeat by human operators engaged in crowd-sourcing activity on behalf of attackers.
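The remark that free-form answers may need to be handled flexibly can be illustrated with a minimal sketch in Python. The function names and the small number-word table are hypothetical and chosen only for illustration; a real deployment would need a fuller, localized vocabulary. The sketch treats answers such as "7", "seven", and " Seven. " as equivalent.

```python
import string
import unicodedata

# Hypothetical lookup table, English only, for illustration.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}


def normalize_answer(text: str) -> str:
    """Reduce a free-form answer to a canonical form for comparison."""
    # Fold compatibility characters (e.g. full-width digits) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Ignore letter case and surrounding whitespace.
    text = text.casefold().strip()
    # Ignore ASCII punctuation such as a trailing full stop.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse any remaining runs of whitespace.
    text = " ".join(text.split())
    # Treat a spelled-out number and its digits as the same answer.
    return NUMBER_WORDS.get(text, text)


def answers_match(given: str, expected: str) -> bool:
    """Compare a user's answer with the expected one after normalization."""
    return normalize_answer(given) == normalize_answer(expected)
```

With this sketch, `answers_match(" Seven. ", "7")` accepts the answer, while `answers_match("eight", "7")` rejects it.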

2.1.4 Image and Video

2.1.4.1 Visual Comparison CAPTCHAs

There are a number of CAPTCHA techniques based on the identification of still images. This can include requiring the user to identify whether an image shows a man or a woman, or whether an image is human-shaped or avatar-shaped, among other comparison approaches, such as CAPTCHAStar! A novel CAPTCHA based on interactive shape discovery [captchastar], FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers [facecaptcha], and Social and egocentric image classification for scientific and privacy applications [social-classification].

While alternative audio comparison CAPTCHAs might be explored such as using similar or different sounds for comparison, the reliance on visual comparison alone makes these approaches difficult, if not impossible for people with vision-related disabilities. They're also very difficult for people living with visual processing disorders, among other cognitive and learning disabilities.

2.1.4.2 3D CAPTCHA

A 3D representation of letters and numbers can make it more difficult for OCR software to identify them, in turn increasing the security of the CAPTCHA, described in On the security of text-based 3D CAPTCHAs [3d-captcha-security]. However, this solution raises similar accessibility issues to traditional CAPTCHAs.

2.1.4.3 Movement-Based and Video Game CAPTCHA

This process is based on the movement of interactive elements such as a slider or the completion of a basic video game as a CAPTCHA, like Game-based image semantic CAPTCHA on handset devices [game-captcha]. The benefits include removal of language barriers, and the removal of CAPTCHA frustration due to the presumed intuitiveness of the associated task and the enjoyment of playing video games.

Importantly, the implementation of this CAPTCHA would need to support multiple input interfaces as different devices may lack some input methods such as a keyboard or touchscreen. Another potential issue is that screen reader support for interface elements may unintentionally provide a backdoor for the CAPTCHA to be bypassed by allowing a bot to play the game.

2.1.5 Biometrics

Biometric identifiers have become a very popular authentication mechanism, especially on mobile platforms which routinely now provide the requisite hardware. Some physical characteristic of the user, such as a fingerprint or a facial profile, is first acquired and then recognized to verify the individual’s identity. This process effectively limits the ability of web robots to create a large number of false identities.

However, biometric authentication mechanisms also need to be carefully designed to avoid introducing accessibility barriers. Individuals who lack the biological characteristics required by a particular authentication method, e.g., fingers, or who are unable to perform the enrollment procedures, e.g., senior citizens whose fingerprints can no longer be reliably sensed due to aging, are effectively precluded from using a fingerprint biometric. This can result in denial of access to certain users with disabilities and explains why reliance on a single biometric identifier is insufficient to satisfy public sector procurement standards in the European Union EN 301 549, section 5.3 [en-301-549] and regulations under section 508 of the Rehabilitation Act and Section 255 of the Communications Act, 36 CFR 1194, Appendix C, section 403 in the United States [36-cfr-1194].

For this reason, biometric identification systems should be designed to allow users to choose among multiple and unrelated biometric identifiers. With that sole caveat, properly designed biometric identification systems are particularly attractive in situations where it is necessary to identify a particular human user. Their reliability is high, the cognitive load placed on the user low, and they are particularly difficult to foil. They have not yet been rendered suitable, however, in circumstances when it is necessary to preserve the user’s anonymity (i.e., the task is verifying that the user is human, without providing identifying information).

2.2 Non-Interactive Stand-Alone Approaches

While traditional CAPTCHA and other interactive approaches to limiting the activities of web robots are sometimes effective, they do make using a site more cumbersome. This is often unnecessary, as non-interactive mechanisms exist to check for spam or other invalid content typically introduced by robots.

The approaches described in this section can be regarded as alternatives or complementary approaches to traditional CAPTCHA.

Just as a CAPTCHA can sometimes be circumvented by an attacker (e.g., by using crowd-sourcing techniques), cryptographic keys can in some circumstances be hijacked. Detecting and responding to web robots that have successfully and unexpectedly gained access to a web resource is thus desirable even in the presence of other measures. The advantage of limiting sensory and cognitive demands on people with disabilities only accrues when these non-interactive strategies are used alone, or when they are combined with other traditional CAPTCHA-avoidance approaches.

2.2.1 Spam Filtering

Applications that use continuous authentication and “hot words” to flag spam content, or Bayesian filtering to detect other patterns consistent with spam, are very popular, and quite effective. While such risk analysis systems may experience false negatives from time to time, properly-tuned systems can achieve results comparable to a traditional visual CAPTCHA, while also removing the added cognitive burden on the user and eliminating access barriers.

Most major blogging software contains spam filtering capabilities, or can be fitted with a plug-in for this functionality. Many of these filters can automatically delete messages that reach a certain spam threshold, and mark questionable messages for manual moderation. More advanced systems can control attacks based on posting frequency, filter content sent using the Trackback protocol, and ban users by IP address range, temporarily or permanently.
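The combination of Bayesian filtering with a deletion threshold and a moderation queue can be sketched as follows. This is a minimal naive Bayes illustration in Python, not the algorithm of any particular blogging package; the class name, the whitespace tokenizer, and both threshold values are assumptions made for the example.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy word-based naive Bayes classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one message labeled 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that a message is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior for the class.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Smoothed likelihood; the extra 1 reserves mass for unseen words.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Convert the two log scores into a probability, avoiding overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text: str, delete_at: float = 0.95, moderate_at: float = 0.6) -> str:
        """Return 'delete', 'moderate', or 'accept' (thresholds are illustrative)."""
        probability = self.spam_probability(text)
        if probability >= delete_at:
            return "delete"
        if probability >= moderate_at:
            return "moderate"
        return "accept"
```

In use, the filter would be trained on previously labeled comments, and only messages in the middle band would be shown to a human moderator. The thresholds would need tuning against real site data.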

2.2.2 Proof-Of-Work Techniques

One strategy for thwarting misuse of web resources is to load suspicious clients with significant computational workloads, thus slowing down the interaction and hopefully deterring malicious parties by reducing their ability to engage in undesirable activities such as disseminating spam. This approach has been explored in the development of proof-of-work challenges [kaPoW-plugins]. These can be made arbitrarily expensive computationally based on an associated reputation score for each client that attempts to access a resource. Clients that are adjudged more likely to be malicious are required to solve more computationally expensive problems. Less resource-consumptive problems are provided to clients that are adjudged more likely to be web browsers operated by human users.

The proof of work approach should have a negligible effect on the human user's interactive experience, provided that the reputation scoring is relatively accurate. However, it is designed to impose substantial cost on the operators of web robots—perhaps even greater than the cost of hiring human workers as CAPTCHA solvers.

Implementing this approach is straightforward. It requires the client to execute JavaScript code to solve the computational problem, and the solution is then verified by a server to establish that the work has been performed. It poses no direct accessibility problems, though it may slow performance for users of older hardware.
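The exchange described above can be sketched with a hash-based puzzle in the style of Hashcash. This is an illustrative assumption, not the scheme used by the cited plug-ins. The sketch is written in Python for both sides, although in a browser the solving step would run as JavaScript. The mapping from reputation score to difficulty, the bit ranges, and all function names are hypothetical.

```python
import hashlib
import secrets


def difficulty_for(reputation: float, min_bits: int = 8, max_bits: int = 20) -> int:
    """Map a reputation score in [0, 1] (1 = most trusted) to puzzle difficulty."""
    reputation = min(max(reputation, 0.0), 1.0)
    return round(max_bits - reputation * (max_bits - min_bits))


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def issue_challenge() -> str:
    """Server side: create an unpredictable challenge string."""
    return secrets.token_hex(16)


def solve(challenge: str, bits: int) -> int:
    """Client side: search for a counter whose hash has enough leading zero bits."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return counter
        counter += 1


def verify(challenge: str, counter: int, bits: int) -> bool:
    """Server side: a single hash confirms that the work was performed."""
    digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

Each additional bit of difficulty roughly doubles the expected work for the client, while verification on the server remains a single hash computation. This asymmetry is what lets the cost be raised for clients with a poor reputation score.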

2.2.3 Heuristic Approaches

Heuristics are discoveries in a process that seem to indicate a given result. It may be possible to detect the presence of a robotic user based on the volume of data the user requests, series of common pages visited, IP addresses, data entry methods, or other signature data that can be collected.

Again, this requires a careful examination of site data. If pattern-matching algorithms can’t find good heuristics, then this is not a good solution. Also, polymorphism, or the creation of changing footprints, is apt to emerge in robots, if it hasn’t already, just as polymorphic (“stealth”) viruses appeared in order to get around virus checkers looking for known viral footprints.
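A minimal sketch of how several such signals might be combined into a single score follows. The signals, weights, and thresholds are hypothetical placeholders; as noted above, useful values can only come from examining a site's own data.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Hypothetical per-session measurements collected by the site."""
    requests_per_minute: float
    seconds_to_submit_form: float
    on_ip_blocklist: bool


# Illustrative score at or above which a session is treated as suspicious.
REVIEW_THRESHOLD = 5


def robot_score(signals: SessionSignals) -> int:
    """Add up integer weights for each signal that looks automated."""
    score = 0
    if signals.requests_per_minute > 60:
        score += 4  # unusually high request volume
    if signals.seconds_to_submit_form < 2.0:
        score += 3  # form completed implausibly fast
    if signals.on_ip_blocklist:
        score += 3  # address previously associated with abuse
    return score


def needs_review(signals: SessionSignals) -> bool:
    """Flag the session only when the combined evidence is strong enough."""
    return robot_score(signals) >= REVIEW_THRESHOLD
```

Requiring more than one signal before flagging a session is a deliberate choice in this sketch: a single unusual measurement, such as very fast form completion, is not treated as proof of automation on its own.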

Another heuristic approach identified in Botz-4-Sale: Surviving DDoS Attacks that Mimic Flash Crowds [killbots] involves the use of CAPTCHA images, with a twist: how the user reacts to the test is as important as whether or not it was solved. This system, which was designed to thwart distributed denial of service (DDoS) attacks, bans automated attackers which make repeated attempts to retrieve a certain page, while protecting against marking humans incorrectly as automated traffic. When the server’s load drops below a certain level, the CAPTCHA-based authentication process is removed entirely.

2.2.4 Honeypots

Providing a CAPTCHA visible to robots but not to humans appears to be sufficiently successful to be supported in several content management systems such as Drupal Honeypots and in several commercial WordPress plugins. The form is created to attract robots and then hidden from the user by markup such as CSS-Hidden. It's an approach that is easily implemented even in hand-authored markup and should be considered. The Hilton Hotel Corporation has used a honeypot CAPTCHA on the Sign In page for Hilton Honors, its loyalty program website, where a prominent focusable field is labeled: "This field is for robots only. Please leave blank."
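The server-side half of a honeypot can be sketched in a few lines of Python. The field name and function names are hypothetical, and the sketch follows the labeled-field variant described above rather than any specific plug-in. Any CSS used to hide the field is left to the site.

```python
import html

# Hypothetical name for the decoy field.
HONEYPOT_FIELD = "robots_only"


def render_honeypot_field() -> str:
    """Return markup for a decoy field with an explicit instruction for humans."""
    label = "This field is for robots only. Please leave blank."
    return (
        f'<label for="{HONEYPOT_FIELD}">{html.escape(label)}</label>\n'
        f'<input type="text" id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}" '
        f'autocomplete="off">'
    )


def looks_like_robot(form: dict) -> bool:
    """Treat any non-blank value in the decoy field as a sign of automation."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(str(value).strip())
```

A submission such as `{"email": "user@example.org", "robots_only": ""}` passes the check, while one that fills the decoy field is rejected. A robot that omits the field entirely is not caught by this check alone.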

2.2.5 Limited-Use Accounts

Users of free accounts very rarely need full and immediate access to a site’s resources. For example, users who are searching for concert tickets may need to conduct only three searches a day, and new email users may only need to send the same notification of their new address to their friends. Sites may create policies that limit the frequency of interaction explicitly (that is, by disabling an account for the rest of the day) or implicitly (by slowing the response times incrementally). Creating limits for new users can be an effective means of making high-value sites unattractive targets to robots.

Drawbacks to this approach include the need to perform sufficient testing and data collection to determine useful limits that will serve human users yet frustrate robots. It requires site designers to look at statistics of normal and exceptional users, and determine whether clear demarcation exists between them.
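The explicit and implicit limits described in this section can be combined in a short sketch. The class name, the limit of three actions per day (borrowed from the ticket-search example), and the doubling delay are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LimitedUseAccount:
    """Tracks how many rate-limited actions a new account has performed today."""
    daily_limit: int = 3
    base_delay_seconds: float = 0.5
    actions_today: int = 0

    def request_action(self) -> Tuple[bool, float]:
        """Return (allowed, delay_seconds) for the next action."""
        # Explicit limit: refuse further actions for the rest of the day.
        if self.actions_today >= self.daily_limit:
            return False, 0.0
        # Implicit limit: each action is answered more slowly than the last.
        delay = self.base_delay_seconds * (2 ** self.actions_today)
        self.actions_today += 1
        return True, delay

    def reset_for_new_day(self) -> None:
        """Clear the counter when a new day begins."""
        self.actions_today = 0
```

A caller would wait `delay_seconds` before responding when the action is allowed. With the defaults shown, the first three actions are delayed by 0.5, 1.0, and 2.0 seconds, and a fourth is refused until the counter is reset.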

3. Multi-Party Approaches

3.1 Public-Key Infrastructure (PKI) Approaches

Another approach is to use certificates for individuals who wish to verify their identity. A party relying on a certificate offered by a user attempting to access online services can assess the trustworthiness of the certificate's issuer, and the likelihood that the private key has been compromised, in evaluating the risk that the offerer is actually a web robot rather than a human agent. Highly trusted certification authorities such as governments, as in Estonia's e-Residency Program, require evidence of an individual's identity as a basis for issuing a certificate. Provided that the private key is not compromised and cannot be misused by an attacker, there is a high degree of assurance that messages cryptographically signed with it, which could serve to establish the user's identity to web-based services, have genuinely been authorized by the certificate holder.

The use of certificates as an indicator that an access attempt has been authorized by a human discloses the user's identity to the web service provider, and thus should not be deployed in circumstances in which anonymity is necessary. In addition, Transport Layer Security (TLS) client certificate authentication, as defined in TLS 1.2 and earlier versions of the protocol, gives rise to privacy concerns [tls-tracking].

A variant of this concept, in which only people with disabilities who are affected by other verification systems would register, is sometimes proposed. Such approaches raise significant privacy and stigmatization concerns and are usually opposed strongly by people with disabilities themselves and by organizations that serve them. Such approaches should not be confused with situations where people voluntarily self-identify as individuals with disabilities. An example is the U.S.-based Bookshare, whose services are available only to persons with documented print disabilities under the terms of an international copyright treaty administered by the United Nations' World Intellectual Property Organization (WIPO) and known as the Marrakesh Treaty [marrakesh].

3.2 The Google reCAPTCHA

Acquired in 2009 from Carnegie Mellon University, Google's reCAPTCHA overwhelmingly dominates CAPTCHA deployment on the web today. However, reCAPTCHA Version 1 is no longer supported.

3.2.1 Version 2: Are you a robot?

reCAPTCHA Version 2 provided an API that was most effectively marketed as the "no CAPTCHA reCAPTCHA," and its checkbox proclaiming: "I'm not a robot" became a cultural icon, spawning various cultural offshoots in art, theater, and popular music.

The checkbox was, of course, never a checkbox in the traditional HTML sense. The pseudo-checkbox process became a prodigious collector of user data well beyond mouse movement and keyboard navigation, including the date, the language the browser is set to, all cookies placed by Google over the last 6 months, CSS information for that page, an inventory of mouse clicks made on that screen (or touches if on a touch device), an inventory of plugins installed on the browser, and an itemization of all JavaScript objects, all to determine whether the user is human or robot. Of course, Google also generally knows much about individual users, including their customary IP addresses, the telephone numbers and email addresses of their friends, family and colleagues, where they have been at every moment of every day, as well as their web search and YouTube habits. This is why the checkbox could keep the CAPTCHA process disarmingly simple, though it also explains why a link to Google's privacy policy has always accompanied the "no CAPTCHA reCAPTCHA". Disclosure and certain provisions of the Privacy Policy are required to satisfy legal requirements in California and in the E.U.

Even though specific WCAG failures were often noted, Google's reCAPTCHA V2 was for a time regarded as the most accessible CAPTCHA solution for one simple reason: it could be comfortably completed using a variety of assistive technologies. More recently it has been widely observed that keyboard navigation, on which many assistive technology users rely, no longer works. Instead, users are presented with a traditional inaccessible CAPTCHA as a fallback mechanism. Our own tests with various browsers on various operating environments have been generally successful with Google's own reCAPTCHA test page. However, browsing in incognito mode, clearing or blocking cookies, and additional factors can apparently now trigger a fallback to traditional CAPTCHA for many assistive technology users.

One reCAPTCHA V2 innovation seems most promising. Rather than reproduce characters, users are asked to type the words they see (or hear). It even appears unnecessary to spell these correctly or to enter all the words presented in order to be adjudged human.

Most disappointingly, it now appears that the audio CAPTCHAs previously available with V2 implementations are sometimes no longer being provided. Instead users see a message that reads: "Your computer or network may be sending automated queries. To protect our users, we can't process your request right now." Users who have depended on audio CAPTCHA alternatives, and who were previously able to function with reCAPTCHA V2, are thus suddenly and seemingly capriciously locked out and denied service on sites still using V2.

3.2.2 The Noninteractive Version 3

Late in 2018 Google released reCAPTCHA V3 promising to eliminate "the need to interrupt users with challenges at all." Google also informed us that their goals with V3 included increasing "the accessibility of the web by removing traditional CAPTCHAs" entirely. Obviously, fully noninteractive Turing testing is a most welcome development direction for accessibility. When the noninteractive Turing test returns a score indicating high confidence that the user is human, or indeed a score indicating high confidence that the user is a robot, and experience has demonstrated the noninteractive engine is reliable, we can only offer praise and gratitude for technological progress that more effectively supports persons with disabilities.

Of course, no approach will always return unambiguous results. In such situations Google advises that content providers "use a secondary challenge that makes sense in the context of their site such as two-factor authentication, send the post to moderators, or combine the score with signals specific to their site to make a more informed judgment." Google intends that traditional CAPTCHA no longer be used as a fallback mechanism and has dropped it from reCAPTCHA V3, though it remains in their slightly older, 2017 reCAPTCHA V2 Invisible service.

The reality is that the action taken in response to an ambiguous score returned by V3 is in the hands of the content provider. Services like reCAPTCHA gain their market share by offering to relieve the content provider of the hard work inherent in mounting effective and accessible Turing testing. Sadly this leaves the door open to any fallback approach a content provider might choose. Meanwhile, Google's reCAPTCHA FAQ declares that reCAPTCHA Version 2 is not going away. It is therefore imperative that methods for disambiguating an ambiguous noninteractive score be well documented and easily implementable in order to better overcome the tendency to simply adopt the old familiar approach.
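
One way a content provider might act on a noninteractive score without falling back to a traditional CAPTCHA is sketched below. It assumes the score is a number between 0.0 and 1.0 in which higher values mean greater confidence that the user is human; the function name, the two thresholds, and the action labels are hypothetical and would need tuning against a site's own traffic.

```python
def choose_action(score, low=0.3, high=0.7):
    """Map a 0.0-1.0 humanness score to an action (thresholds are illustrative)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= high:
        return "allow"   # high confidence that the user is human
    if score <= low:
        return "block"   # high confidence that the user is a robot
    # Ambiguous result: escalate to an accessible secondary check such as
    # two-factor authentication or moderator review, not an image CAPTCHA.
    return "secondary_check"
```

Keeping the ambiguous case as an explicit third outcome makes the fallback a deliberate design decision that can be reviewed for accessibility.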

3.3 Leveraging the Multi-Device Environment

It has become common for many, though by no means all, users to access various online services through multiple devices such as desktop and mobile computers, smart phones, tablets, and wearables such as smart watches. This proliferation has led to online services delivering identification solutions that take into account a combination of multi-device and multi-platform vectors for simple and effective user authentication, including for persons with disabilities [auth-mult]. We note that several major service providers (such as Facebook) now support cross-site user authentication. However, in relation to the specific ability to tell a human and a bot apart, it appears only Google's reCAPTCHA V3 API provides cross-site CAPTCHA services without actually passing specific identifying data.

We would expect Google's reCAPTCHA V3 system to return a score indicating no need for a challenge whenever another browser tab is already properly logged in to a Google product such as Calendar on any registered user device. However, while this may prevent the third party site from collecting personal data, it does assist Google in acquiring more user data. This constitutes a significant cost to the user's privacy in an industry so capable of cross-referencing massive amounts of data in the absence of meaningful regulations and controls on where and how that data may be used. This is a very strong accessibility concern, as people with disabilities are generally reluctant to disclose any information about their disability on the web except when, and only when, they expressly choose to reveal that information themselves for their own particular reasons.

The multi-device environment is widely used to authenticate a human user by requiring some action on a registered second device, most often a smart cellular telephone. Known as "dual factor authentication," this process is mandatory at each of the three largest email service providers, Gmail, Yahoo, and Outlook, which accept outbound mail only after the user has authenticated through a telephone number they are required to provide. Similarly, should Twitter spot activity it considers suspicious, it will hold tweets until the user revalidates through both a reCAPTCHA challenge and a telephone call. Yet increasingly, as telemarketing calls proliferate, web users are reluctant to provide data aggregators with their personal telephone numbers. Clearly, a voice-only authentication approach also cannot properly serve deaf and hard of hearing users.

Access a Google account service through a new browser or laptop and Google will hold off granting access until the user responds to a pop-up "toast" message on the registered telephone device showing the user's photo and asking: "Was that you?" The user must verify that it was indeed they before access can continue. A variant previously common at Google and still in use elsewhere places a voice call or sends a text message with a short code the user is required to input into a form field to continue.
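
The short code variant can be sketched as follows. This is a minimal illustration and not any provider's implementation: the class name, the six-digit length, and the five-minute lifetime are assumptions, and delivery of the code by voice call or text message is left outside the example.

```python
import hmac
import secrets
import time


class OneTimeCodes:
    """Toy issuer of short, single-use codes sent to a registered device."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be tested
        self.pending = {}     # user -> (code, expiry time)

    def issue(self, user):
        """Create a six-digit code; the caller delivers it by voice or text."""
        code = f"{secrets.randbelow(10**6):06d}"
        self.pending[user] = (code, self.clock() + self.ttl)
        return code

    def verify(self, user, submitted):
        """Accept the code once, before it expires; any attempt consumes it."""
        entry = self.pending.pop(user, None)
        if entry is None:
            return False
        code, expiry = entry
        if self.clock() > expiry:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        return hmac.compare_digest(code.encode(), submitted.encode())
```

A generous lifetime matters for accessibility: users who rely on a screen reader or a second device may need considerably longer than others to hear or read the code and enter it.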

Another variant of this approach, one employed by Cisco's Webex teleconference service, asks the user to press a particular key on their telephone to continue. This is easy enough on a desk phone, but it becomes problematic for the text to speech (TTS) dependent smart phone user who must now hear the phone's TTS voice in order to get the dialpad to pop up, and then find the appropriate touch tone key all at the same time as the Webex service voice is also speaking, repeating: "Welcome to Webex. Press 1 to be connected to your meeting. ... Welcome to Webex. Press 1 to be connected to your meeting. ..." It is important to recognize that both of these voices are routed through the same physical device speaker, even on units equipped with dual speakers for playing music in stereo.

Providing the user the option of contact via voice and/or text is good, but some services offer only text. This disadvantages the user without an accessible text capable device, and there are many such users. Similarly, services offering only a voice call option disadvantage deaf and hearing impaired users. As ever, the rule must be to provide options for the user to choose among, including fallback options.

3.4 Turing Tokens from the Cloud?

The "cloud" has become a well-known term among computer users. It describes the growing concentration of web content and software service delivery in content delivery networks (CDN) such as Akamai, Cloudflare, and Amazon CloudFront. These CDNs provide the added value of localized last mile cached content delivery and the ability to effectively deflect various malicious activity such as denial of service (DoS) attacks. As almost two-thirds of Internet content is now delivered by CDNs, they are now also unintentionally forced to become Turing test arbiters. This in turn has resulted in the development of fresh innovative approaches to CAPTCHA such as Privacy Pass [privacy-pass], now available as a browser extension on Cloudflare.

While Privacy Pass still begins with a CAPTCHA challenge, it does provide the user a trove of cryptographically blinded tokens which can satisfy further challenges in the background and dramatically reduce interactive CAPTCHA challenges. Most refreshingly it offers meaningful privacy protection, even anonymity, while reliably validating the user is human. Essentially, the CDN is functioning as a trust broker on the user's behalf. When a user "spends" a token, they're saying to the site they're accessing: "You don't trust me, but you do trust the entity that issued this token, and they're vouching for me." As this approach is developed further, we can reasonably hope the onus of the initial challenge can be further mitigated with robust support for web accessibility, perhaps by expanding available initial CAPTCHA validation approaches, e.g., adding support for biometrics.
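
The idea of a blinded token can be illustrated with a toy sketch. It is not the Privacy Pass protocol described in [privacy-pass]: it uses modular exponentiation in a deliberately tiny group, omits any proof that the issuer used a consistent key, and presents the unblinded value directly at redemption. All names and parameters are assumptions chosen for illustration, and nothing here is secure.

```python
import hashlib
import secrets

P = 1019  # toy safe prime (P = 2*Q + 1); far too small for real use
Q = 509   # prime order of the subgroup of squares modulo P


def hash_to_group(token):
    """Map token bytes to a non-identity element of the order-Q subgroup."""
    h = int.from_bytes(hashlib.sha256(token).digest(), "big") % (P - 3) + 2
    return pow(h, 2, P)


class Issuer:
    """Signs blinded values after a challenge is passed; redeems tokens once."""

    def __init__(self):
        self.key = secrets.randbelow(Q - 2) + 2  # secret exponent in [2, Q-1]
        self.spent = set()

    def sign_blinded(self, blinded):
        """Issuance: sign without learning which token is inside."""
        return pow(blinded, self.key, P)

    def redeem(self, token, signature):
        """Redemption: accept each validly signed token exactly once."""
        if token in self.spent:
            return False
        if pow(hash_to_group(token), self.key, P) != signature:
            return False
        self.spent.add(token)
        return True


def request_token(issuer):
    """Client side: blind a fresh token, have it signed, then unblind it."""
    token = secrets.token_bytes(16)
    r = secrets.randbelow(Q - 2) + 2  # blinding factor in [2, Q-1]
    blinded = pow(hash_to_group(token), r, P)
    signed = issuer.sign_blinded(blinded)
    signature = pow(signed, pow(r, -1, Q), P)  # remove the blinding factor
    return token, signature
```

Because the issuer only ever sees the blinded value during issuance, it cannot later link a redeemed token to the session in which the original challenge was solved, which is the property that allows the token to vouch for the user anonymously.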

3.5 Why not Federated Turing Tokens?

Even more valuable than federated identity or CDN Turing Tokens would be the service that leveraged both user authentication and trusted anonymity. The more useful service is the service that can deliver content with minimal user interruption, but also allow the user to interact across the Internet with reliable trust credentials. A provider might leverage this new protocol and its trustworthy status across the industry to offer individual users an enhanced quality VPN service providing not just cross-site login, but cross-site Turing authentication, perhaps for a monthly fee. Such a service could conceivably even validate and broker all the user's registered devices across the Internet, anonymizing even financial transactions and shipping data as a full-service online trust broker. Letters of credit had a similar beginning in international finance some centuries ago, so why not online Turing Tokens and privacy protection today?

4. Conclusion

CAPTCHA development has certainly become more sophisticated over time. This has included the development of several alternatives to text-based characters contained in bitmapped images, some of which have served to support access for persons with disabilities. However, it has also become clear not only that traditional CAPTCHA continues to be challenging for people with disabilities, but also that it is increasingly insecure and arguably now ill suited to the purpose of distinguishing human individuals from their robotic impersonators.

Yet the need for reliable and accessible solutions persists. In fact the need has arguably become more urgent as the limits of authenticated login alone have become more and more evident in misuse of major services around the globe.

It is therefore highly recommended that the purpose and effectiveness of any deployed CAPTCHA solution be carefully considered before adoption, and then closely monitored for effective performance. As with all good software and online content provisioning, analysis should begin with a careful consideration of system requirements and a thorough understanding of user needs, including the needs of persons with disabilities.

Clearly, some approaches such as Google's reCAPTCHA, two-step or multi-device verification can be easily and affordably deployed. Yet problems persist even in these systems, especially for non-English speakers. Furthermore, deployers of such approaches should be aware that they are participating in exposing their users to a massive collection of personal data across multiple trans-national data profiling systems, quite apart from any societal governance.

It is important, therefore, also to consider available stand-alone approaches such as honeypots and heuristics, along with current image and aural CAPTCHA libraries that support multiple languages. As always, testing and system monitoring for effectiveness should supply the ultimate determination, even as we recognize that an effective system today may prove ineffective a few years from now.

We summarize our conclusions in the following points:


  1. Risk analyses of attempts to access a resource are generally desirable. Some online resources are simply greater targets than others. It is critical that analyses include an evidence-based determination of how challenging a CAPTCHA needs to be. Users should not be forced beyond what is strictly necessary to keep a site secure, e.g. if a honeypot suffices, use a honeypot until evidence of robotic attacks dictates something else.
  2. Whenever an interactive CAPTCHA is to be implemented in order to obviate security and privacy considerations, it is important to minimize how often users are subjected to interactive CAPTCHA challenges. With CAPTCHA, less interaction clearly means more accessibility. As noted above, we're encouraged by the development of approaches such as Privacy Pass.
  3. Whenever an interactive CAPTCHA is implemented, a variety of alternative challenges must be made available to engage different sensory and cognitive capabilities of the user in order that the user can choose an approach that best fits their abilities. We humans possess a variety of intellectual strengths and weaknesses. To fail to offer a variety of challenges is to ignore this simple truth.
  4. All else being equal, we prefer non-interactive approaches because these pose no accessibility challenges. However, they may expose the user to the collection of personal data.
  5. Third parties may be engaged to verify the authenticity of an access attempt. However, such solutions may give rise to privacy trade-offs.

In other words, while some CAPTCHA approaches are better than others, and while more recent approaches offer clear advantage over older approaches, there is still no single, ideal solution. It is important to exercise care that any implemented CAPTCHA technology correctly allow people with disabilities to identify themselves as human.

A. Terms

The following terms are used in this document:

AI
Artificial Intelligence
alternative text
Text that is associated with, and provides a brief description or label of, non-text content.
assistive technology
Hardware and / or software that acts as a stand-alone user agent, or alongside a mainstream user agent to meet the functional requirements of users with disabilities that go beyond those provided by mainstream user agents alone.
Bayesian filter
Recursive probabilistic heuristic to categorize content, typically used in spam filtering.
CAPTCHA
“Completely Automated Public Turing test to tell Computers and Humans Apart,” relying on a challenge believed to be difficult for machines to satisfy correctly but relatively easy for humans.
continuous authentication
Mechanism to determine that a user is still the one previously verified without requiring interactive re-authentication.
heuristic
Way to solve a problem with high reliability though not perfection.
honeypot
A decoy service intended to elicit interaction from web robots.
non-interactive
Not requiring the user to perform a task in order to be judged human.
public-key infrastructure
Authentication of the entity which has encrypted content via a registered decryption key.
robot
Software application that performs automated tasks on web content.
screen reader
Assistive technology that renders content as speech or Braille.
spam filter
Software that processes email messages to separate undesired, usually automated, messages from desired messages.
spider
Robot that processes web content and recursively follows links to process the content at the link target.
Turing test
A test to determine whether responses provided by a software application are distinguishable from the responses of human individuals.
user agent
Any software that retrieves, renders and facilitates end user interaction with web content.
VPN
Virtual Private Network

B. Acknowledgments

B.1 Contributors to This Version:

B.2 Contributors to the Previous Version:

B.3 Enabling Funders

This publication has been funded in part with U.S. Federal funds from the Health and Human Services, National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) under contract number HHSP23301500054C. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

C. References

C.1 Informative references

[36-cfr-1194]
36 CFR Appendix C to Part 1194, Functional Performance Criteria and Technical Requirements. Legal Information Institute. URL: https://www.law.cornell.edu/cfr/text/36/appendix-C_to_part_1194
[3d-captcha-security]
On the security of text-based 3D CAPTCHAs. Nguyen, V. D.; Chow, Y.-W.; Susilo, W. Computers & Security, 45. 2014.
[auth-mult]
Design, Testing and Implementation of a New Authentication Method Using Multiple Devices. Cetin, C.. J. Ligatti, D. Goldgof, & Y. Liu (Eds.): ProQuest Dissertations Publishing. 2015.
[captcha-ld]
The Effect of CAPTCHA on User Experience among Users with and without Learning Disabilities. Gafni, R.; Nagar, I.
[captcha-ocr]
CAPTCHA: Attacks and Weaknesses against OCR technology. Silky Azad; Kiran Jain. Journal of Computer Science and Technology. URL: https://computerresearch.org/index.php/computer/article/download/368/368
[captcha-robustness]
Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness. Tangmanee, C. Journal of Applied Security Research, 11(3). 2016.
[captcha-security]
CAPTCHA Security: A Case Study. Yan, J.; El Ahmad, A. S. Security & Privacy, IEEE, 7(4). 2009.
[captchastar]
CAPTCHaStar! A novel CAPTCHA based on interactive shape discovery. Conti, M.; Guarisco, C.; Spolaor, R. 2015.
[defeat-line-noise]
Defeating line-noise CAPTCHAs with multiple quadratic snakes. Nakaguro, Y.; Dailey, M. N.; Marukatat, S.; Makhanov, S. S. Computers & Security, 37. 2013.
[en-301-549]
EN 301 549: Accessibility requirements suitable for public procurement of ICT products and services in Europe. CEN/CENELEC/ETSI. URL: http://mandate376.standards.eu/standard
[eval-audio]
Evaluating existing audio CAPTCHAs and an interface optimized for non-visual use. Bigham, J. P.; Cavender, A. C. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. April 2009.
[facecaptcha]
FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers. Kim, J.; Kim, S.; Yang, J.; Ryu, J.-h.; Wohn, K. An International Journal, 72(2). 2014.
[game-captcha]
Game-based image semantic CAPTCHA on handset devices. Yang, T.-I.; Koong, C.-S.; Tseng, C.-C. An International Journal, 74(14). 2015.
[kaPoW-plugins]
kaPoW plugins: protecting web applications using reputation-based proof-of-work. Tien Le; Akshay Dua; Wu-chang Feng. Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality.
[killbots]
Botz-4-Sale: Surviving DDos Attacks that Mimic Flash Crowds. Srikanth Kandula; Dina Katabi; Matthias Jacob; Arthur Burger. URL: https://www.usenix.org/legacy/events/nsdi05/tech/kandula/kandula_html/
[marrakesh]
Marrakesh Treaty to Facilitate Access to Published Works for Persons Who Are Blind, Visually Impaired or Otherwise Print Disabled. World Intellectual Property Organization. 27 June 2013. URL: https://www.wipo.int/treaties/en/ip/marrakesh
[newscom]
Spam-bot tests flunk the blind. Paul Festa. News.com. URL: https://web.archive.org/web/20030707210529/http://news.com.com/2100-1032-1022814.html
[privacy-pass]
Privacy Pass: Bypassing Internet Challenges Anonymously. Alex Davidson; Ian Goldberg; Nick Sullivan; George Tankersley; Filippo Valsorda. Proceedings on Privacy Enhancing Technologies; 2018 (3):164-180. URL: https://www.petsymposium.org/2018/files/papers/issue3/popets-2018-0026.pdf
[recaptcha-attacks]
HMM-based Attacks on Google's ReCAPTCHA with Continuous Visual and Audio Symbols. Sano, S.; Otsuka, T.; Itoyama, K.; Okuno, H. G. Journal of Information Processing, 23(6). 2015.
[social-classification]
Social and egocentric image classification for scientific and privacy applications. Korayem, M.. D. Crandall, J. Bollen, A. Kapadia, & P. Radivojac (Eds.): ProQuest Dissertations Publishing. 2015.
[task-completion]
Towards a universally usable human interaction proof: evaluation of task completion strategies. Sauer, G.; Lazar, J.; Hochheiser, H.; Feng, J. ACM Transactions on Accessible Computing (TACCESS), 2(4). 2010.
[tls-tracking]
Exploiting TLS Client Authentication for Widespread User Tracking. Lucas Foppe; Jeremy Martin; Travis Mayberry; Eric C. Rye; Lamont Brown. Proceedings on Privacy Enhancing Technologies. URL: https://www.petsymposium.org/2018/files/papers/issue4/popets-2018-0031.pdf