[I18N-ACTION-612]

Recommendations for language and direction attributes in data formats


The Problem

JSON and related vocabularies such as Web IDL are increasingly used in the Web platform as formats for data interchange, yet there is no standard way to indicate language and base direction in them. Document formats based on JavaScript use the String data type for textual data, including natural language text, and a String carries no language or direction metadata of its own.

The Internationalization Working Group's recommendation for best practice is to allow the language and base direction of text to be declared for each string. In addition, we usually recommend that document formats allow default values to be declared at the document level. (Additional interior markup or tagging is also desirable in certain contexts, but it is either beyond the scope of this document or not applicable to plain-text exchange of string data.)

Most markup languages, such as XML and HTML, provide a ready means of identifying language (in HTML, the lang attribute) or base direction (in HTML, the dir attribute). Data interchange formats need the same information if data consumers are to present, process, or display the data correctly.

We have created documents outlining the specific problems associated with the failure to identify language[1] [2] and base direction[3]. See these for specific illustrations of the need for language/direction metadata, as well as an examination of some of the possible work-arounds or solutions.

Proposal

The I18N WG proposes the following long-term solutions to this problem:

Solution 1: New Data Type

Create a new data type whose serialization optionally includes language and direction. Examples:

	myLocalizedString: "Hello World!"@en^ltr
	myLocalizedString_fr: "Bonjour monde !"@fr
	myLocalizedString_ar: "مرحبا بالعالم!"@ar-EG^rtl
	myLocalizedString_und: "שלום עולם!"^rtl
	myLanguageNeutralString: "978-0-123-4567-X" // no language or direction for this non-natural-language string
Pros
Cons

Solution 2: Common Dictionary Pattern

Specify a common dictionary pattern for specifications. Define guidelines for document-level and item-level language/direction identification. Require new document formats and revisions of existing document formats to adopt these guidelines. This would take a form similar to that proposed for the Payments API [4], which would address both needs. Here's how a document fragment might look:

{
  displayItems: [{
    label: "البند الخاص (للبيع!)",
    dir: "rtl",
    lang: "ar-AE",
    amount: { }
  }]
}
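
The interplay between document-level defaults and item-level values can be sketched in TypeScript as follows. This is a sketch under stated assumptions, not part of any specification: the ExampleDocument and DisplayItem shapes and the resolveMetadata helper are hypothetical, the rule that an item value overrides a document value is one plausible reading of the guidelines, and the fallbacks ("und", the BCP 47 tag for undetermined language, and "auto") are illustrative choices.

```typescript
// Hypothetical shapes modelled on the Localizable dictionary pattern.
type TextDirection = "auto" | "ltr" | "rtl";

interface Localizable {
  lang?: string;        // language tag
  dir?: TextDirection;  // base direction
}

interface DisplayItem extends Localizable {
  label: string;
}

interface ExampleDocument extends Localizable {
  displayItems: DisplayItem[];
}

// Assumed rule: an item-level value wins over the document-level default,
// and a final fallback applies when neither level declares a value.
function resolveMetadata(
  doc: Localizable,
  item: Localizable
): { lang: string; dir: TextDirection } {
  return {
    lang: item.lang ?? doc.lang ?? "und",
    dir: item.dir ?? doc.dir ?? "auto",
  };
}

const exampleDoc: ExampleDocument = {
  lang: "en",
  dir: "ltr",
  displayItems: [
    { label: "البند الخاص (للبيع!)", dir: "rtl", lang: "ar-AE" },
    { label: "Total" }, // inherits the document-level values
  ],
};

const resolvedItems = exampleDoc.displayItems.map((item) => ({
  label: item.label,
  ...resolveMetadata(exampleDoc, item),
}));
```

With this reading, a producer declares language and direction once for the common case and repeats them only on items that differ.
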
Pros
Cons

What if we did nothing?

Document formats based on JSON have existed for some years now. Much of the Web utilizes document formats based on JSON, using mechanisms such as XHR to populate Web pages dynamically or using formats such as XXX to exchange data between processes. Why aren't the issues outlined above more prevalent?

Pros

No effort.

Cons

Recreates or preserves the problems we've identified. Requires specifications to consider natural language text requirements on a case-by-case basis.

Appendices

Appendix A. Text of dictionary example from Payments API

Localizable dictionary

dictionary Localizable {
    DOMString     lang;
    TextDirection dir = "auto";
};
lang member
A [BCP47] language tag that specifies the primary language for the values of the human-readable members of the inheriting dictionary.
dir member
Specifies the base direction for the human-readable members of an inheriting dictionary.

TextDirection enum

enum TextDirection {
    "auto",
    "ltr",
    "rtl"
};

The text-direction values are as follows. Each indicates the base direction that the values of the human-readable members have by default:

auto
Directionality is determined by the [BIDI] algorithm.
ltr
Left-to-right text.
rtl
Right-to-left text.