The original IME API included the APIs described in this document. To let the standardization process proceed smoothly, the parts on which there is agreement were extracted first as a “main” part, so that it can become a public standard sooner.
This document currently covers all the APIs, including those already in the main part, but the editor plans to reduce it to the differences only (i.e. all interfaces will be partial).
The content of this document may (or may not) be merged into the main API spec in the future.

This specification defines an “IME API” that provides Web applications with scripted access to an IME (input-method editor) associated with a hosting user agent. This IME API includes:

An InputMethodContext interface, which provides methods to retrieve detailed data from an in-progress IME composition.
A CompositionEvent interface, which represents read-only attributes about the current composition, such as actual text and its associated information.

This API is designed to be used in conjunction with DOM events [[DOM-LEVEL-3-EVENTS]].

Introduction

Even though existing Web-platform APIs allow developers to implement very complicated Web applications, such as visual chat applications or WYSIWYG presentation editors, developers have difficulties when implementing Web applications that involve input-method editors. To mitigate the difficulties, the DOM Level 3 Events specification [[DOM-LEVEL-3-EVENTS]] introduces composition events to retrieve composition text while it is being composed in an associated IME.

However, Web applications can still run into difficulties when they want text input on non-editable elements, such as a <div> element whose contenteditable attribute is not set to true, or when they want to interact with IMEs. In particular, a Web application cannot do the following:

get input outside any editable element even with focus
indicate to the user whether the Web application renders composition text by itself, or needs to ask user agents to render it
determine the place where user agents render composition text
determine the place where user agents render candidate windows

To solve these IME-related problems, this specification introduces an IME API that allows Web applications to interact with the IME. This specification introduces interfaces for compositions, so Web applications can read detailed composition data. A CompositionEvent object provides a reference to an ongoing IME composition, so Web applications can retrieve the composition text and its attributes. This API also lets Web applications give a hint as to where to position a composition window.

There are also proposed standards related to IME:

Changing IME mode via CSS (ime-mode property in CSS3 UI)
Controlling generic input modality (inputmode attribute in HTML spec)

They can work independently of this API.

Please also refer to the separate “Use Cases for Input Method Editor API” document for more information about possible use cases.

Consider the following examples.

This example is a simple Web search page that gives the user search query suggestions while the user is composing text. The example code hides the suggestion box while the IME candidate window is shown, because the two may overlap.

<!DOCTYPE html>
<html>
<head>
<style type="text/css">
#search0 {
    max-width: 400px;
}

#input0 {
    width: 100%;
}

#suggest0 {
    width: 100%;
    list-style: none;
    margin: 0;
    padding: 0;
    border-style: solid;
    border-width: 1px;
    border-color: #000;
}
</style>
<script language="javascript" type="text/javascript">
function init() {
    var node = document.getElementById('input0');
    // This code only handles the compositionupdate event for brevity of the
    // example, but of course other input field changes should also be handled.
    node.addEventListener('compositionupdate', onCompositionUpdate, false);

    // Register handlers for candidate window appearance change.
    var ctx = node.inputMethodContext;
    ctx.addEventListener('candidatewindowshow', onCandidateWindowShow, false);
    ctx.addEventListener('candidatewindowhide', onCandidateWindowHide, false);
}

// Sends an XHR request to get search suggestions.
// Upon receiving the result, expandSuggest() is called back.
function getSuggests(query) {
    // For brevity, implementation of this function is omitted.
}

function expandSuggest(candidates) {
    // Callback after getting search suggestions.
    var suggest = document.getElementById('suggest0');
    var i;
    // Clear old suggestions.
    while (suggest.firstChild) {
        suggest.removeChild(suggest.firstChild);
    }
    // Render new suggestion list.
    for (i = 0; i < candidates.length; i++) {
        suggest.appendChild(document.createElement('li'));
        suggest.childNodes[i].textContent = candidates[i];
    }
}

function onCompositionUpdate(event) {
    var query = document.getElementById('input0').value;
    getSuggests(query);
}

// Hides suggest window once IME candidate window is shown.
function onCandidateWindowShow(event) {
    var suggest = document.getElementById('suggest0');
    suggest.style.display = 'none';
}

// Unhides suggest window once IME candidate window is closed.
function onCandidateWindowHide(event) {
    var suggest = document.getElementById('suggest0');
    suggest.style.display = '';
}
</script>
</head>
<body onload="init();">
<div id="search0">
  <input type="text" id="input0" placeholder="search here">
  <ul id="suggest0"></ul>
</div>
</body>
</html>

This example shows source code that draws composition text in its own style.

For the usage of the compositionupdate event, see also the note in the Drawing Composition Text section.

<!DOCTYPE html>
<html>
<head>
<style>
.clause {
  margin-right: 2px;
  border-style: solid;
  border-width: 0px 0px 1px 0px;
  border-color: #333333;
}
.selected {
  border-width: 0px 0px 2px 0px;
  border-color: #000000;
}
</style>
<script language="javascript" type="text/javascript">
// compositionupdate event handler
function onCompositionUpdate(event) {
    var target = event.target;

    // Remove all children.
    while (target.childNodes.length > 0) {
        target.removeChild(target.childNodes[0]);
    }

    // Get the IME context.
    var ctx = target.inputMethodContext;

    // Create clause spans.
    var text = event.data;
    var segments = event.getSegments();
    for (var i = 0; i < segments.length; i++) {
        var span = document.createElement('span');
        span.classList.add('clause');
        var selected = false;
        if (segments[i] == event.activeSegmentStart) {
            selected = true;
            span.classList.add('selected');
        }
        var end = text.length;
        if (i < segments.length - 1) {
             end = segments[i + 1];
        }
        span.textContent = text.substring(segments[i], end);
        target.appendChild(span);
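        if (selected) {
            // Report where the active clause is drawn, as offsets from the
            // top-left corner of the anchor node (document.body).
            var rect = span.getBoundingClientRect();
            var anchorRect = document.body.getBoundingClientRect();
            ctx.setCaretRectangle(document.body,
                                  Math.round(rect.left - anchorRect.left),
                                  Math.round(rect.top - anchorRect.top),
                                  Math.round(rect.width),
                                  Math.round(rect.height));
        }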
    }

    // Prevent the browser from drawing composition text.
    event.preventDefault();
}

function init() {
    var composition = document.getElementById('composition');
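    // enableEditingEvents() is a method of InputMethodContext, not of the element.
    composition.inputMethodContext.enableEditingEvents();
    composition.focus();
    composition.addEventListener('compositionupdate', onCompositionUpdate, false);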
}
</script>
</head>
<body onload="init();">
<div id="composition"></div>
</body>
</html>

The CompositionEvent Interface

This interface represents an ongoing IME composition by extending the CompositionEvent interface defined in [[DOM-LEVEL-3-EVENTS]]. It provides attributes representing the text being composed by an IME and its associated information.

readonly attribute long activeSegmentStart

Represents the 0-based index to the first character of the active segment in the current composition text.

An author can assume that activeSegmentStart and activeSegmentEnd each coincide with a segment boundary, and that activeSegmentStart is less than or equal to activeSegmentEnd.

readonly attribute long activeSegmentEnd

Represents the 0-based index to the next position of the last character of the active segment in the current composition text. If there is no active segment, activeSegmentStart and activeSegmentEnd MUST be the same value and they indicate the position of the caret.

For example, if “DEF” is selected in composition text “abcDEFghi”, activeSegmentStart is 3 and activeSegmentEnd is 6. If the caret is between “B” and “C” in composition text “ABCD”, both activeSegmentStart and activeSegmentEnd are 2.
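The index arithmetic in this example can be sketched in plain JavaScript. The helper name sliceActiveSegment is hypothetical and not part of this API; it only assumes the three values a CompositionEvent would expose (data, activeSegmentStart and activeSegmentEnd).

```javascript
// Hypothetical helper (not part of this API): splits composition text around
// the active segment described by activeSegmentStart and activeSegmentEnd.
function sliceActiveSegment(text, activeSegmentStart, activeSegmentEnd) {
    return {
        before: text.substring(0, activeSegmentStart),
        active: text.substring(activeSegmentStart, activeSegmentEnd),
        after: text.substring(activeSegmentEnd),
        // Equal indices mean there is no active segment, only a caret.
        caretOnly: activeSegmentStart === activeSegmentEnd
    };
}
```

Calling sliceActiveSegment('abcDEFghi', 3, 6) yields 'DEF' as the active part, matching the first case described above.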

sequence<unsigned long> getSegments()

If composition text is segmented into clauses by an IME, this array contains 0-based indices of the starting character of each clause in increasing order. If composition text is not segmented, it contains one '0' element.
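As a rough illustration of how the returned indices map to clauses, the following sketch converts them into clause strings. The helper toClauses is hypothetical; it takes plain values rather than a real event object, and it marks a clause as active when it starts at activeSegmentStart, which is a simplification.

```javascript
// Hypothetical helper: turns the start indices returned by getSegments() into
// clause strings, marking the clause that begins at activeSegmentStart.
function toClauses(text, segments, activeSegmentStart) {
    var clauses = [];
    for (var i = 0; i < segments.length; i++) {
        // A clause ends where the next one starts, or at the end of the text.
        var end = (i < segments.length - 1) ? segments[i + 1] : text.length;
        clauses.push({
            text: text.substring(segments[i], end),
            active: segments[i] === activeSegmentStart
        });
    }
    return clauses;
}
```

For unsegmented text, segments is [0] and the result is a single clause covering the whole composition text.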

This can be used to render composition text as segmented clauses, but how to visualize them is not specified here. It is recommended to follow the platform's standard way of visualizing the segmentation (e.g. using an underline or a reversed background).

The way the platform renders composition text by default might be customized by a user (e.g. for accessibility). A CSS property that indicates the current default style for rendering the composition is desired.

The InputMethodContext Interface

attribute EventHandler oncandidatewindowshow

This event should be fired immediately after the IME candidate window is set to appear, which means immediately after the position information for the candidate window has been identified.

Common things among oncandidatewindowshow/update/hide events:

To get a better performance, these events are of the generic "Event" interface and fired directly to the InputMethodContext object. They do not bubble or capture through the DOM tree.
Event handlers for these events will be able to use the "this" value of the handler, or event.target / event.currentTarget, to refer to the InputMethodContext object upon which the event is firing.
These events are not cancellable, meaning that the appearance of the candidate window cannot be controlled by cancelling them.
Web applications need only register for these events once per element (the handlers will remain valid for as long as the element is alive).

attribute EventHandler oncandidatewindowupdate

This event should be fired after either:

The IME candidate window has been identified as needing to change size (but before animating to the new position) as a result of displaying new/changed alternatives or predictions.
The IME candidate window has been identified as needing to change size (but before animating to the new position) due to user-zoom, browser frame resize, or other action that changes the candidate window placement.

In either case, the event should be fired after the new size/location of the candidate window is known by the user agent.

attribute EventHandler oncandidatewindowhide

This event should be fired after the candidate window is fully hidden (after the dismissal animation has ended, if there is any). The event handler code will see that no ClientRect can be obtained inside this handler.
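One way an application might consume the three candidate window events is sketched below. The helper trackCandidateWindow and its state object are hypothetical; the sketch assumes only that the context exposes addEventListener, as used in the first example of this document.

```javascript
// Hypothetical helper: tracks whether the candidate window is showing by
// listening on an InputMethodContext-like event target.
function trackCandidateWindow(ctx, onChange) {
    var state = { visible: false, updates: 0 };
    ctx.addEventListener('candidatewindowshow', function () {
        state.visible = true;
        onChange(state);
    }, false);
    ctx.addEventListener('candidatewindowupdate', function () {
        // Size or placement changed; callers may want to re-read the rectangle.
        state.updates++;
        onChange(state);
    }, false);
    ctx.addEventListener('candidatewindowhide', function () {
        state.visible = false;
        onChange(state);
    }, false);
    return state;
}
```

Because the handlers stay registered for the lifetime of the element, a helper like this would be called once per element.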

readonly attribute DOMString locale

Represents the locale of the current input method as a BCP-47 tag (e.g. "en-US"). The locale MAY be the empty string when inapplicable or unknown.

More clarification should be written on how IME locale should be described using BCP-47.
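Until that clarification exists, an application might only rely on the primary language subtag. The following is a minimal sketch; the helper primaryLanguage is hypothetical, and it assumes the tag is either empty or a well-formed BCP-47 tag whose first subtag is the language.

```javascript
// Hypothetical helper: returns the lower-cased primary language subtag of a
// BCP-47 tag such as "en-US", or null when the locale is the empty string.
function primaryLanguage(locale) {
    if (locale === '') {
        return null;
    }
    return locale.split('-')[0].toLowerCase();
}
```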

readonly attribute HTMLElement? target

Represents the element associated with the InputMethodContext object.

Once a target element gets deleted or modified not to accept any input, any access to the InputMethodContext interface through the object has no effect. Any method calls will just return, accesses to composition and target will return null, and accesses to locale will return the empty string.

readonly attribute unsigned long compositionStartOffset

Represents the starting offset of the composition relative to the target if a composition is occurring, or 0 if there is no composition in progress. For a composition occurring on an <input> or <textarea> element, compositionStartOffset is the starting offset of the composition within the target's value string. For a composition occurring on an element with the contentEditable flag set, it is the starting offset relative to the target's textContent property (textContent is a linear view of all the text under an element).

readonly attribute unsigned long compositionEndOffset

Represents the ending offset of the composition, in a similar way to compositionStartOffset.
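The two offsets could be used to separate the text being composed from the surrounding text. The sketch below uses a hypothetical helper, splitAtComposition, and assumes that compositionEndOffset is exclusive (it points just past the last composed character) and is 0 when no composition is in progress; this document does not state either point explicitly.

```javascript
// Hypothetical helper: splits the target's text (value or textContent) around
// the composition. Assumes compositionEndOffset is an exclusive end index.
function splitAtComposition(text, compositionStartOffset, compositionEndOffset) {
    return {
        before: text.substring(0, compositionStartOffset),
        composing: text.substring(compositionStartOffset, compositionEndOffset),
        after: text.substring(compositionEndOffset)
    };
}
```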

ClientRect getCandidateWindowClientRect()

A Web application may use this information to explicitly control the position of its own input-related UI elements, such as search suggestions.

Note: client coordinates are in document pixels and have origin at the upper-left corner of the client area.
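For instance, an application could compare the returned rectangle with the client rectangle of its own suggestion box and hide the box only when the two overlap. The helper rectsOverlap is hypothetical and works on any objects with left, top, right and bottom members in client coordinates.

```javascript
// Hypothetical helper: returns true when two ClientRect-like objects
// (left, top, right, bottom in client coordinates) overlap.
function rectsOverlap(a, b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}
```

Rectangles that only touch along an edge are treated as not overlapping.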

void confirmComposition()

Finishes the ongoing composition of the hosting user agent.

When a Web application calls this function, a user agent sends a compositionend event and a textInput event, as if a user had typed an ‘Accept’ key, as written in the “Input Method Editors” section of the DOM Level 3 Events specification [[!DOM-LEVEL-3-EVENTS]].

This may cause a race condition: when a script calls confirmComposition() at the same time as the system IME receives an accept key, a user agent may confirm the ongoing composition twice. As an alternative, introducing cancelComposition() is proposed; when a script needs the behavior of confirmComposition(), it can commit the current composition and then call cancelComposition().

void setCaretRectangle(Node anchor, long x, long y, long w, long h)

Notifies a user agent of the rectangle occupied by the composition text. When a user agent renders a candidate window or a composition window, it uses this rectangle to draw the window next to the composition text, or to avoid rendering the window over this rectangle.

If a Web application wants to render composition text itself, it has to tell a user agent where the composition text will be located whenever it gets a compositionstart event or a compositionend event.

On Windows, this rectangle is used as a parameter for ImmSetCandidateWindow(). On Mac, this rectangle is sent when it calls [firstRectForCharacterRange:]. On Linux (GTK), this rectangle is used as a parameter for gtk_im_context_set_cursor_location().

The anchor parameter represents the DOM node against which the rectangle is positioned. This MAY be a different node from the focused node that listens to composition events, so a Web application can draw composition text where it does not have focus.
The x and y parameters are the offsets of the top-left corner of the rectangle relative to the top-left corner of the anchor node.
The w and h parameters are the width and height of the rectangle.

These set values apply only to the current composition session (i.e. until compositionend event is sent).

A user agent MAY need to convert these coordinates to the screen coordinates when it shows a candidate window.
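The anchor-relative arguments can be derived from two client rectangles, for example ones obtained with getBoundingClientRect(). The helper toAnchorRelative below is hypothetical; it rounds the results because the parameters are declared as long.

```javascript
// Hypothetical helper: converts a rectangle in client coordinates into the
// (x, y, w, h) arguments of setCaretRectangle(), relative to the top-left
// corner of the anchor. Both arguments are ClientRect-like objects.
function toAnchorRelative(rect, anchorRect) {
    return {
        x: Math.round(rect.left - anchorRect.left),
        y: Math.round(rect.top - anchorRect.top),
        w: Math.round(rect.width),
        h: Math.round(rect.height)
    };
}
```

The result would then be passed along as ctx.setCaretRectangle(anchor, r.x, r.y, r.w, r.h).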

void enableEditingEvents()

Causes the target element of this context to get editing events (e.g. input, compositionupdate) delivered when the element is focused.

If this method is called on an editable element (<input>, <textarea>, contenteditable=true), it has no effect.

void disableEditingEvents()

Causes the target element of this context to turn off receiving editing events (e.g. input, compositionupdate) after enableEditingEvents() is called.

If this method is called on an editable element (<input>, <textarea>, contenteditable=true), it has no effect.

Best Practices

This specification provides an interface for developing IME-aware Web applications.

This section describes practices for some use-cases.

Life of InputMethodContext

Once an InputMethodContext interface is obtained, it should be valid for the lifetime of its target element, as long as the element is editable or focusable. Once the target gets disabled, authors MAY NOT access an IME through the interface, even after the target gets enabled again. Once the target is deleted, any access to the interface is void.

Any access to the InputMethodContext interface makes sense mostly when the target element has focus. In other words, it makes little sense if you access the interface when the target element doesn't have focus.
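A defensive check along these lines might look as follows. The helper isContextUsable is hypothetical; it assumes that ctx.target becomes null once the target is gone, and that the caller passes in whatever it considers the focused element (for example document.activeElement).

```javascript
// Hypothetical helper: reports whether it is worth calling into an
// InputMethodContext. Assumes ctx.target is null once the target is gone.
function isContextUsable(ctx, focusedElement) {
    return ctx.target !== null && ctx.target === focusedElement;
}
```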

Drawing Composition Text

As of this writing, the order of the compositionupdate event and the DOM update is being discussed (see Bug 18931). If the compositionupdate event happens after DOM modification, this section has to be rewritten using the beforeInput event, which is also being discussed. The code in Example 2 should be updated as well.

If a Web application wants to draw composition text by itself, it SHOULD handle the compositionupdate event to be notified by the IME that the composition text has changed, then use the interface described in this document to retrieve the composition, and let the IME know where the composition text is drawn by calling the setCaretRectangle() method. If setCaretRectangle() is not called, the IME will not have information about where to show its UI, and it may show the UI at an obtrusive position. To avoid this situation, a user agent may set a reasonable default position in the vicinity of the focused input field.

When a Web application draws composition text, it MUST call preventDefault() in the compositionupdate handler so that the user agent will not draw the text.

When a Web application wants to handle DOM Level 3 composition events on a node that is not an <input>, <textarea>, or contenteditable node, it MUST call the enableEditingEvents() method beforehand and focus the node to get composition events.
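The required order of calls can be sketched as a small setup function. The helper enableCustomComposition is hypothetical; it assumes the element exposes its context through the inputMethodContext property used in the examples of this document. Registering the listener last is a choice made for this sketch, not a requirement stated here.

```javascript
// Hypothetical helper: prepares a non-editable element to receive composition
// events: enable editing events, focus the element, then listen for
// compositionupdate.
function enableCustomComposition(element, onCompositionUpdate) {
    var ctx = element.inputMethodContext;
    ctx.enableEditingEvents();
    element.focus();
    element.addEventListener('compositionupdate', onCompositionUpdate, false);
    return ctx;
}
```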

The following diagram shows the flow of events among the keyboard, the IME, the user agent, and the Web application when a user types ‘kyouha’ to convert to ‘今日は’.

Event flow of IME and an IME-aware Web application.
