This document describes requirements for the layout and presentation of text in languages that use the Bengali script when they are used by Web standards and technologies, such as HTML, CSS, Mobile Web, Digital Publications, and Unicode.

This early draft has not yet been through any review process. Please do not rely on the contents.

This document describes the basic requirements for Bengali script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and digital publications about how to support users of the Bengali script. Currently the document focuses on Bengali as used for the Bangla language. The information here is developed in conjunction with a document that summarises gaps in support on the Web for Bengali.

The editor's draft of this document is being developed by the India International Program Task Force, part of the W3C Internationalization Interest Group. It will be published by the Internationalization Working Group. The end target for this document is a Working Group Note.

Sending comments on this document

If you wish to make comments regarding this document, please raise them as GitHub issues. Only send comments by email if you are unable to raise issues on GitHub (see links below). All comments are welcome.

To make it easier to track comments, please raise a separate issue or email for each comment, and point to the section you are commenting on using a URL.

Introduction

About this document

Some text goes here.

Gap analysis

This document is pointed to by a separate document, Bengali Gap Analysis, which describes gaps in support for Bengali on the Web, and prioritises and describes the impact of those gaps on the user.

Wherever an unsupported feature is identified through the gap analysis process, the requirements for that feature need to be documented. This document is where those requirements are described.

This document should contain no reference to a particular technology. For example, it should not say "CSS does/doesn't do such and such", and it should not describe how a technology, such as CSS, should implement the requirements. It is technology agnostic, so that it will be evergreen, and it simply describes how the script works. The gap analysis document is the appropriate place for all kinds of technology-specific information.

Other related resources

The document International text layout and typography index (known informally as the text layout index) points to this document and others, and provides a central location for developers and implementers to find information related to various scripts.

The W3C also maintains a tracking system that has links to GitHub issues in W3C repositories. There are separate links for (a) requests from developers to the user community for information about how scripts/languages work, (b) issues raised against a spec, and (c) browser bugs. For example, you can find out what information developers are currently seeking, and the resulting list can also be filtered by script.

Bengali Script Overview

Bengali is an abugida. Consonant letters have an inherent vowel sound. Combining vowel-signs are attached to the consonant to indicate that a different vowel follows the consonant.

The orthographic syllable is the unit for various aspects of the behaviour of the script. The alphabet is split into vowels and consonants. With one exception (ɔ-kar), each vowel is represented by both an independent version and a combining vowel sign.

Text runs horizontally, left to right, and lines typically break at the spaces between words.

The script has no upper-/lowercase distinction.

The basic unit for text segmentation is the syllable. Unicode grapheme clusters don't cover consonant clusters, so some additional processing is needed to identify text unit boundaries.

The Bengali script summary provides a high-level overview of the characters used for the script and some of its basic features. Text from the latter part of that page was used for the initial version of this document.

Text direction

Bengali is written horizontally, left to right.

Structural boundaries & markers

Grapheme boundaries

The basic unit for working with Bengali text is the orthographic syllable, i.e. one consonant or a sequence of consonants with hasant between them, plus optional additional combining characters (such as vowel-signs).

In Bengali an orthographic syllable that forms a conjunct should be treated as an indivisible unit of text for most editing operations. The example below shows a Bengali word with a conjunct at the end, and the expected segmentation.

ঝিল্লি  →  ঝি+ল্লি

Expected minimal units (right) during segmentation of the word ঝিল্লি jhilli.

If, however, a conjunct is not formed and the hasant is visible, the first consonant plus hasant would be treated as separate from the second consonant, and the vowel-sign would appear to the left of the second consonant (see the example below).

ঝিল্‌লি  →  ঝি+ল্+লি

Expected segmentation of the word ঝিল্লি jhilli when there is no conjunct.

Note that in Bengali an orthographic syllable may be longer than a Unicode grapheme cluster, if it forms a conjunct. The example below shows a Bengali word with a conjunct at the end, and the segmentation that would result from applying Unicode grapheme clusters only.

ঝিল্লি  → ঝি+ল্+লি

Segmentation of the word ঝিল্লি jhilli with a conjunct when using Unicode grapheme clusters.

For Bengali, applications need to provide tailored extensions to correctly segment the text. Such tailoring needs to be able to distinguish between sequences that are displayed as conjuncts, and those where the hasant is visible.
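The tailoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes that a visible hasant is signalled in the text by U+200C ZERO WIDTH NON-JOINER directly after the hasant, and that every other consonant + hasant + consonant sequence is displayed as a conjunct. In practice, whether a conjunct is formed also depends on the font. The names `is_consonant` and `orthographic_syllables` are hypothetical.

```python
import unicodedata

HASANT = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # assumed marker for a visible hasant
ZWJ = "\u200D"


def is_consonant(ch: str) -> bool:
    """Rough test for Bengali consonant letters (a simplification)."""
    cp = ord(ch)
    # ka..ha, plus rra, rha, yya and the Assamese ra and wa
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09DC, 0x09DD, 0x09DF, 0x09F0, 0x09F1)


def is_extender(ch: str) -> bool:
    """Combining marks (vowel-signs, hasant, nukta, etc.) and joiner controls."""
    return ch in (ZWNJ, ZWJ) or unicodedata.category(ch) in ("Mn", "Mc")


def orthographic_syllables(text: str) -> list[str]:
    """Split text into units, keeping consonant clusters joined by hasant together."""
    units = []
    i = 0
    while i < len(text):
        j = i + 1
        while j < len(text):
            ch = text[j]
            if is_extender(ch):
                j += 1
            elif is_consonant(ch) and text[j - 1] == HASANT:
                # The hasant is immediately followed by a consonant: assume a conjunct.
                j += 1
            else:
                break
        units.append(text[i:j])
        i = j
    return units
```

With this sketch, the word jhilli written as consonant + hasant + consonant yields two units, while the same word with ZWNJ after the hasant yields three, matching the expected segmentations shown above.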

Word boundaries

Words are separated by spaces.

Phrase boundaries: Danda & double danda

[U+0964 DEVANAGARI DANDA] is used as sentence-final punctuation.

There are two alternative approaches to the use of spaces with danda:

  1. No space character appears between the end of the phrase and the danda glyph, but the advance width of the danda in a font should open a small gap before it. The danda is then typically followed by a single space.
  2. A space is allowed before and after the danda in order to balance the space before and after it. In this case, the danda must still be kept from wrapping to a new line on its own; it should wrap with the previous word and space together.

These same principles apply to [U+0965 DEVANAGARI DOUBLE DANDA].

The double danda should be written using the dedicated Unicode character, and not by combining two single dandas.

The double danda is sometimes used to set apart section or verse numbering, in which the number is placed between pairs of double dandas. To obtain the correct spacing, the character sequence is usually <double danda, space, numeral(s), double danda>.
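As an illustration of the two points above, the following Python sketch replaces a pair of adjacent single dandas with the dedicated double danda character, and builds a verse-number marker using the character sequence just described. The function names are hypothetical, and the use of Bengali digits for the numeral is an assumption made for the example. Only non-negative numbers are handled.

```python
DANDA = "\u0964"         # single danda
DOUBLE_DANDA = "\u0965"  # double danda
BENGALI_DIGITS = "০১২৩৪৫৬৭৮৯"


def normalize_double_danda(text: str) -> str:
    """Replace two adjacent single dandas with the dedicated double danda."""
    return text.replace(DANDA + DANDA, DOUBLE_DANDA)


def verse_marker(number: int) -> str:
    """Build <double danda, space, numeral(s), double danda> for a verse number."""
    # Assumption: the numeral is written with Bengali digits.
    digits = "".join(BENGALI_DIGITS[int(d)] for d in str(number))
    return f"{DOUBLE_DANDA} {digits}{DOUBLE_DANDA}"
```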

Quotations

The default quote marks for Bengali should be [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.

When an additional quote is embedded within the first, the quote marks should be [U+2018 LEFT SINGLE QUOTATION MARK] and [U+2019 RIGHT SINGLE QUOTATION MARK]. This is according to CLDR, and still needs to be checked.

Font styles

Italics and bold are not traditional features of Bengali text.

Text decoration

Underlining is not a traditional feature of Bengali text.

Line & paragraph layout

Line breaking

The primary break opportunities for line breaking are at inter-word spaces.

If a line is broken inside a word, any consonant clusters should be kept intact unless they are separated by visible hasant characters (see Grapheme boundaries).

Line breaking should not move a danda or double danda to the beginning of a new line, even if they are preceded by a space character. These punctuation characters should behave in the same way as a full stop does in English text.
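One way to picture this rule is to compute the units between which a line may break. The Python sketch below splits text at inter-word spaces, then attaches any free-standing danda or double danda to the preceding word, together with the space before it. It is a simplified illustration under stated assumptions: the name `break_units` is hypothetical, only U+0020 SPACE is treated as a break opportunity, and an opening double danda of a verse-number marker would need separate handling.

```python
DANDA = "\u0964"
DOUBLE_DANDA = "\u0965"


def break_units(text: str) -> list[str]:
    """Return the unbreakable units of a line; breaks may occur only between them."""
    units = []
    for token in text.split(" "):
        if not token:
            continue  # ignore runs of multiple spaces
        if units and token in (DANDA, DOUBLE_DANDA):
            # Keep the danda with the previous word and the space before it.
            units[-1] += " " + token
        else:
            units.append(token)
    return units
```

A line-wrapping routine that only breaks between these units can never place a danda at the start of a line, whichever of the two spacing conventions the text uses.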

Counters

Counters are used to number lists, chapter headings, etc.

Bengali uses a numeric counter style based on the decimal model, using the standard Bengali digits '০' '১' '২' '৩' '৪' '৫' '৬' '৭' '৮' '৯'.

1 ⇨ ১ 2 ⇨ ২ 3 ⇨ ৩ 4 ⇨ ৪
11 ⇨ ১১ 22 ⇨ ২২ 33 ⇨ ৩৩ 44 ⇨ ৪৪
111 ⇨ ১১১ 222 ⇨ ২২২

Examples of counter values using the Bengali numeric counter style.
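The decimal pattern amounts to a digit-for-digit substitution, which the following Python sketch illustrates. The function name `bengali_counter` is hypothetical, and rejecting negative values is an assumption of the sketch rather than a requirement stated here.

```python
BENGALI_DIGITS = "০১২৩৪৫৬৭৮৯"
_TO_BENGALI = str.maketrans("0123456789", BENGALI_DIGITS)


def bengali_counter(value: int) -> str:
    """Format a non-negative integer using Bengali digits in a decimal pattern."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative counter values")
    # Each ASCII digit maps to the Bengali digit with the same value.
    return str(value).translate(_TO_BENGALI)
```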

Acknowledgements

Special thanks to the following people who contributed to this document (contributors' names listed in alphabetical order).

Akshat Joshi, Hai Liang, John Hudson, Vivek Pani.

The latest information about contributors can be found in the GitHub contributors list.