CSS Text Module Level 4

Editor’s Draft,

This version:
https://drafts.csswg.org/css-text-4/
Latest published version:
https://www.w3.org/TR/css-text-4/
Previous Versions:
Feedback:
CSSWG Issues Repository
Inline In Spec
Editors:
Elika J. Etemad / fantasai (Apple)
(Invited Expert)
(Adobe Systems)
Florian Rivoal (Invited Expert)
Test Suite:
https://wpt.fyi/results/css/css-text/

Abstract

This CSS module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.

CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, etc.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

Please send feedback by filing issues in GitHub (preferred), including the spec code “css-text” in the title, like this: “[css-text] …summary of comment…”. All issues and comments are archived. Alternately, feedback can be sent to the (archived) public mailing list www-style@w3.org.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

Tests

The test coverage information in this specification covers wpt/css/css-text/ and subdirectories, as well as those tests in wpt/css/CSS2/ and subdirectories that relate to this specification.


This module describes the typesetting controls of CSS; that is, the features of CSS that control the translation of source text to formatted, line-wrapped text. Various CSS properties provide control over case transformation, white space collapsing, text wrapping, line breaking rules and hyphenation, alignment and justification, spacing, and indentation. See Additions Since Level 3 for additions since Level 3.

Note: Font selection is covered in the CSS Fonts Module. [CSS-FONTS-3]

Features for decorating text, such as underlines, emphasis marks, and shadows, (previously part of this module) are covered in the CSS Text Decoration Module. [CSS-TEXT-DECOR-3]

Bidirectional and vertical text are addressed in the CSS Writing Modes Module. [CSS-WRITING-MODES-4]

Further information about the typesetting requirements of various languages and writing systems around the world can be found in the Internationalization Working Group’s Language Enablement Index. [TYPOGRAPHY]

Tests

The following tests are crash tests that relate to general usage of the features described in this specification but are not tied to any particular normative statement.


1.1. Module Interactions

Tests

Tests not needed for this section.


This module, together with the CSS Text Decoration Module, replaces and extends the text-level features defined in Cascading Style Sheets Level 2 chapter 16. [CSS-TEXT-DECOR-3] [CSS2]

In addition to the terms defined below, other terminology and concepts used in this specification are defined in Cascading Style Sheets Level 2 and the CSS Writing Modes Module. [CSS2] [CSS-WRITING-MODES-4]

1.2. Value Definitions

Tests

Tests not really needed for this section; could possibly test that css-wide keywords apply to every property.


This specification follows the CSS property definition conventions from [CSS2] using the value definition syntax from [CSS-VALUES-3]. Value types not defined in this specification are defined in CSS Values & Units [CSS-VALUES-3]. Combination with other CSS modules may expand the definitions of these value types.

In addition to the property-specific values listed in their definitions, all properties defined in this specification also accept the CSS-wide keywords as their property value. For readability they have not been repeated explicitly.

1.3. Languages and Typesetting

Tests

Tests not needed for this section: these are definitions, they get tested through their application, not by themselves.


Authors should accurately language-tag their content for the best typographic behavior.

Many typographic effects vary by linguistic context. Language and writing system conventions can affect line breaking, hyphenation, justification, glyph selection, and many other typographic effects. In CSS, language-specific typographic tailorings are only applied when the content language is known (declared). Therefore, higher quality typography requires authors to communicate to the UA the correct linguistic context of the text in the document.

The content language of an element is the (human) language the element is declared to be in, according to the rules of the document language. Note that it is possible for the content language of an element to be unknown—​e.g. untagged content, or content in a document language that does not have a language-tagging facility, is considered to have an unknown content language.

Note: Authors can declare the content language using the global lang attribute in HTML or the universal xml:lang attribute in XML. See the rules for determining the content language of an HTML element in HTML, and the rules for determining the content language of an XML element in XML 1.0. [HTML] [XML10]

The content language an element is declared to be in also identifies the specific written form of that language used in that element, known as the content writing system. Depending on the document language’s facilities for identifying the content language, this information can be explicit or implied. See the normative Appendix F: Identifying the Content Writing System.

Note: Some languages have more than one writing system tradition; in other cases a language can be transliterated into a foreign writing system. Authors should subtag such cases so that the UA can adapt appropriately.

For example, Korean (ko) can be written in Hangul (-Hang), Hanja (-Hani), or a combination (-Kore). Historical documents written solely in Hanja do not use word spaces and are formatted more like modern Chinese than modern Korean. In other words, for typographic purposes ko-Hani behaves more like zh-Hant than ko (ko-Kore).

As another example, Japanese (ja) is typically written in a combination (-Japn) of Hiragana (-Hira), Katakana (-Kana), and Kanji (-Hani). However, it can also be “romanized” into Latin (-Latn) for special purposes like language-learning textbooks, in which case it should be formatted more like English than Japanese.

As a third example, contemporary Mongolian is written in two scripts: Cyrillic (-Cyrl, officially used in Mongolia) and Mongolian (-Mong, more common in Inner Mongolia, part of China). These have very different formatting requirements, with Cyrillic behaving similarly to Latin and Greek, and Mongolian deriving from both Arabic and Chinese writing conventions.

1.4. Characters and Letters

Tests

For the most part, tests not really needed for this section: these are definitions, they get tested through their applications, not by themselves. The few testable assertions that are made have coverage.

The basic unit of typesetting is the character. However, because writing systems are not always as simple as the basic English alphabet, what a character actually is depends on the context in which the term is used. For example, in Hangul (the Korean writing system), each square representation of a syllable (e.g. 한=Han) can be considered a character. However, the square symbol is really composed of multiple letters each representing a phoneme (e.g. ㅎ=h, ㅏ=a, ㄴ=n) and these also could each be considered a character.

A basic unit of computer text encoding, for any given encoding, is also called a character, and depending on the encoding, a single encoding character might correspond to the entire pre-composed syllabic character (e.g. 한), to the individual phonemic character (e.g. ㅎ), or to smaller units such as a base letterform (e.g. ㅇ) and any combining marks that vary it (e.g. extra strokes that represent aspiration).

In turn, a single encoding character can be represented in the data stream as one or more bytes; and in programming environments one byte is sometimes also called a character.

Therefore the term character is fairly ambiguous where technical precision is required.

For text layout, we will refer to the typographic character unit as the basic unit of text. Even within the realm of text layout, the relevant character unit depends on the operation. For example, line-breaking and letter-spacing will segment a sequence of Thai characters that include U+0E33  ำ THAI CHARACTER SARA AM differently; or the behavior of a conjunct consonant in a script such as Devanagari may depend on the font in use. So the typographic character represents a unit of the writing system—​such as a Latin alphabetic letter (including its diacritics), Hangul syllable, Chinese ideographic character, Myanmar syllable cluster—​that is indivisible with respect to a particular typographic operation (line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).

Unicode Standard Annex #29: Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. [UAX29] A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in UAX29, as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—​and is expected to tailor them differently depending on the operation as needed.

Note: The rules for such tailorings are out of scope for CSS.

The following are some examples of typographic character unit tailorings required by standard typesetting practice:

A typographic letter unit (or letter for the purpose of this specification) is a typographic character unit belonging to one of the Letter or Number general categories. See Appendix E: Characters and Properties for how to determine the Unicode properties of a typographic character unit.

The rendering characteristics of a typographic character unit divided by an element boundary are undefined. Ideally each component should be rendered according to the formatting requirements of its respective element’s properties while maintaining correct shaping and positioning of the typographic character unit as a whole. However, depending on the nature of the formatting differences between its parts and the capabilities of the font technology in use, this is not always possible. Therefore such a typographic character unit may be rendered as belonging to either side of the boundary, or as some approximation of belonging to both. Authors are forewarned that dividing grapheme clusters or ligatures by element boundaries may give inconsistent or undesired results.

1.5. Text Processing

Tests

This section has adequate coverage. Exhaustive coverage is unrealistic, since this section is effectively a dependency on all of Unicode. Some tests are nonetheless provided for key functionality (such as the effect of certain control characters on Arabic shaping).


CSS is built on Unicode. [UNICODE] UAs that support Unicode must adhere to all normative requirements of the Unicode Core Standard, except where explicitly overridden by CSS. UAs implemented on the basis of a non-Unicode text encoding model are still expected to fulfill the same text handling requirements by assuming an appropriate mapping and analogous behavior.

For the purpose of determining adjacency for text processing (such as white space processing, text transformation, line-breaking, etc.), and thus in general within this specification, intervening inline box boundaries and out-of-flow elements must be ignored. With respect to text shaping, however, see § 8.7 Shaping Across Element Boundaries.

2. Transforming Text

2.1. Case Transforms: the text-transform property

Tests

This section has good test coverage overall, and very good i18n coverage in particular.


Name: text-transform
Value: none | [capitalize | uppercase | lowercase ] || full-width || full-size-kana | math-auto
Initial: none
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: specified keyword
Canonical order: n/a
Animation type: discrete

This property transforms text for styling purposes. It has no effect on the underlying content, and must not affect the content of a plain text copy & paste operation.

Authors must not rely on text-transform for semantic purposes; rather the correct casing and semantics should be encoded in the source document text and markup.

Values have the following meanings:

none
No effects.
capitalize
Puts the first typographic letter unit of each word, if lowercase, in titlecase; other characters are unaffected.
uppercase
Puts all letters in uppercase.
lowercase
Puts all letters in lowercase.
full-width
Puts all typographic character units in full-width form. If a character does not have a corresponding full-width form, it is left as is. This value is typically used to typeset Latin letters and digits as if they were ideographic characters.
full-size-kana
Converts all small Kana characters to the equivalent full-size Kana. This value is typically used for ruby annotation text, where authors may want all small Kana to be drawn as large Kana to compensate for legibility issues at the small font sizes typically used in ruby.
math-auto
See MathML Core § 4.2 New text-transform value.

The following example converts the ASCII characters used in abbreviations in Japanese text to their full-width variants so that they lay out and line break like ideographs:
abbr:lang(ja) { text-transform: full-width; }
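
As a minimal sketch of the text-transform value grammar, the rules below show which keyword combinations are accepted. The class names are hypothetical and are not defined by this specification.

```css
/* Hypothetical class names, for illustration only. */

/* One case keyword combined with full-width: valid per the grammar. */
.label { text-transform: uppercase full-width; }

/* full-width and full-size-kana combined, with no case keyword: also valid. */
.ruby-note { text-transform: full-width full-size-kana; }

/* Invalid: capitalize, uppercase, and lowercase are mutually exclusive,
   so this declaration fails to parse and is ignored. */
.broken { text-transform: capitalize uppercase; }
```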

Note: The purpose of text-transform is to allow for presentational casing transformations without affecting the semantics of the document. Note in particular that text-transform casing operations are lossy, and can distort the meaning of a text. While accessibility interfaces may wish to convey the apparent casing of the rendered text to the user, the transformed text cannot be relied on to accurately represent the underlying meaning of the document.

In this example, the first line of text is set in uppercase as a visual effect.
section > p:first-of-type::first-line {
  text-transform: uppercase;
}

This effect cannot be written into the source document because the position of the line break depends on layout. But also, the capitalization is not reflecting a semantic distinction and is not intended to affect the paragraph’s reading; therefore it belongs in the presentation layer.

In this example, the ruby annotations, which are half the size of the main paragraph text, are transformed to use regular-size kana in place of small kana.
rt { font-size: 50%; text-transform: full-size-kana; }
:is(h1, h2, h3, h4) rt { text-transform: none; /* unset for large text */ }

Note that while this makes such letters easier to see at small type sizes, the transformation distorts the text: the reader needs to mentally substitute small kana in the appropriate places—​not unlike reading a Latin inscription where all “U”s look like “V”s.

For example, if text-transform: full-size-kana were applied to the following source, the annotation would read “じゆう” (jiyū), which means “liberty”, instead of “じゅう” (jū), which means “ten”, the correct reading and meaning for the annotated “十”.

<ruby>十<rt>じゅう</ruby>

2.1.1. Mapping Rules

Tests

This section has adequate test coverage.


For capitalize, what constitutes a “word” is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Out-of-flow elements and inline element boundaries must not introduce a text-transform word boundary and must be ignored when determining such word boundaries.

Note: Authors cannot depend on capitalize to follow language-specific titlecasing conventions (such as skipping articles in English).
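
The rule that inline element boundaries do not create word boundaries can be sketched as follows. The markup in the comment and the class name are assumptions made for illustration, and the rendering described assumes a UA that follows UAX29 word boundaries, which is suggested but not required.

```css
/* Assumed markup (hypothetical):
     <p class="title"><b>h</b>ello <i>wor</i>ld</p> */
.title { text-transform: capitalize; }

/* The <b> and <i> boundaries are ignored when finding words, so only the
   "h" of "hello" and the "w" of "world" are candidates for titlecasing.
   A UA following UAX29 word boundaries would render "Hello World";
   the "e" after </b> and the "l" after </i> are not treated as word starts. */
```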

The UA must use the full case mappings for Unicode characters, including any conditional casing rules, as defined in the Default Case Algorithms section of The Unicode Standard. [UNICODE] If (and only if) the content language of the element is, according to the rules of the document language, known, then any appropriate language-specific rules must be applied as well. These minimally include, but are not limited to, the language-specific rules in Unicode’s SpecialCasing.txt.

For example, in Turkish there are two “i”s, one with a dot—​“İ” and “i”—​and one without—​“I” and “ı”. Thus the usual case mappings between “I” and “i” are replaced with a different set of mappings to their respective dotless/dotted counterparts, which do not exist in English. This mapping must only take effect if the content language is Turkish written in its modern Latin-based writing system (or another Turkic language that uses Turkish casing rules); in other languages, the usual mapping of “I” and “i” is required. This rule is thus conditionally defined in Unicode’s SpecialCasing.txt file.
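
The effect of the declared content language on case mapping can be sketched as follows. The markup in the comment and the class name are hypothetical; the mappings described are those of the Turkish rules in SpecialCasing.txt.

```css
/* Assumed markup (hypothetical):
     <p class="shout" lang="tr">diyarbakır</p>
     <p class="shout" lang="en">titanic</p> */
.shout { text-transform: uppercase; }

/* With lang="tr": "i" maps to "İ" (U+0130) and "ı" maps to "I",
   giving "DİYARBAKIR".
   With lang="en": the usual mapping applies, giving "TITANIC".
   If the content language is unknown, no language-specific
   rules are applied. */
```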

The definition of full-width and half-width forms can be found in Unicode Standard Annex #11: East Asian Width. [UAX11] The mapping to full-width form is defined by taking code points with the <wide> or the <narrow> tag in their Decomposition_Mapping in Unicode Standard Annex #44: Unicode Character Database. [UAX44] For the <narrow> tag, the mapping is from the code point to the decomposition (minus <narrow> tag), and for the <wide> tag, the mapping is from the decomposition (minus the <wide> tag) back to the original code point.

The mappings for small Kana to full-size Kana are defined in Appendix G: Small Kana Mappings.

2.2. Expanding Between Words: the word-space-transform property

Tests

This section has generally good coverage.


Name: word-space-transform
Value: none | [ space | ideographic-space ] && auto-phrase?
Initial: none
Applies to: text
Inherited: yes
Percentages: N/A
Computed value: as specified
Canonical order: per grammar
Animation type: discrete

Some languages and writing systems have alternative ways of delimiting words, either using different separating characters, or sometimes no visible character at all. This property allows authors to change the rendering from one style to another without needing to change the markup.

none
This property has no effect.
space
Expandable separators within the child text of this element are replaced by U+0020 SPACE.
ideographic-space
Expandable separators within the child text of this element are replaced by U+3000 IDEOGRAPHIC SPACE.
auto-phrase
If the content language is known and the user agent supports linguistic analysis for this language, the user agent must detect phrase boundaries. If a word-separator character, other space separator, or U+200B ZERO WIDTH SPACE character does not already occur at that boundary, then the UA must insert a virtual expandable separator.

If this value is omitted, or if the content language is unknown, or if the user agent does not support detecting phrase boundaries for that language, there are no virtual expandable separators.

For the purpose of this property, expandable separators are any of:

A virtual expandable separator is a UA-detected syntactic boundary in the text that represents an expandable separator not otherwise occurring in the source document. It has no effect other than for this property.

The user agent must not replace expandable separators immediately preceding or following a forced line break (ignoring any intervening inline box boundaries, and associated margin/border/padding).

Note: Because virtual expandable separators are placed in the outermost element that participates in an inline box boundary, if one would coincide with the boundary of an inline box whose parent box has a used value of word-space-transform: none, that particular virtual expandable separator is not expanded, since it would be placed in the parent box.

Like text-transform, this property transforms text for styling purposes only. It has no effect on the underlying content, and must not affect the content of a plain text copy & paste operation.
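
A minimal sketch of the auto-phrase keyword, assuming the paragraph is tagged as Japanese in the markup and that the UA supports phrase detection for that language:

```css
/* Sketch only: assumes the paragraph is tagged lang="ja" in the markup. */
p:lang(ja) {
  /* Per the grammar, auto-phrase is valid only together with
     space or ideographic-space. */
  word-space-transform: ideographic-space auto-phrase;
}

/* If the UA cannot detect phrase boundaries for Japanese, no virtual
   expandable separators are created; only expandable separators
   already present in the content are replaced. */
```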

Unlike books for adults, Japanese books for young children often feature spaces between sentence segments, to facilitate reading. People with dyslexia also tend to find this style easier to read.

Absent any particular styling, the following sentence would be rendered as depicted below.

<p>むかしむかし、<wbr>あるところに、<wbr>おじいさんと<wbr>おばあさんが<wbr>すんでいました。

むかしむかし、あるところに、おじいさんとおばあさんがすんでいました。


Phrase-based spacing can be achieved with the following CSS:

p {
  word-space-transform: ideographic-space;
}

むかしむかし、 あるところに、 おじいさんと おばあさんが すんでいました。


Another common variant additionally restricts the allowable line breaks to these phrase boundaries. Using the same markup, this is easily achieved with the following CSS:

p {
  word-break: keep-all;
  word-space-transform: ideographic-space;
}

むかしむかし、 あるところに、 おじいさんと おばあさんが すんでいました。

In addition to making the source code more readable, using wbr rather than U+200B in the markup also allows authors to classify the delimiters into different groups.

In the following example, wbr elements are either unmarked when they delimit a word, or marked with class p when they also delimit a phrase.

<p>らいしゅう<wbr>の<wbr>じゅぎょう<wbr>に<wbr class=p
>たいこ<wbr>と<wbr>ばち<wbr>を<wbr class=p
>もって<wbr>きて<wbr>ください。

Using this, it is possible not only to enable the rather common phrase-based spacing, but also word-by-word spacing that is likely to be preferred by people with dyslexia to reduce ambiguities, or other variants such as a combination of phrase-based spacing and of word-based wrapping.

Usual rendering

らいしゅうのじゅぎょうにたいことばちをもってきてください。


Phrase spacing
p wbr.p {
  word-space-transform: ideographic-space;
}

らいしゅうのじゅぎょうに たいことばちを もってきてください。


Word spacing
p wbr {
  word-space-transform: ideographic-space;
}

らいしゅう の じゅぎょう に たいこ と ばち を もって きて ください。


Phrase spacing, word wrapping
p {
  word-break: keep-all;
}
p wbr.p {
  word-space-transform: ideographic-space;
}

らいしゅうのじゅぎょうに たいことばちを もってきてください。


Word spacing and wrapping
p {
  word-break: keep-all;
}
p wbr {
  word-space-transform: ideographic-space;
}

らいしゅう の じゅぎょう に たいこ と ばち を もって きて ください。

2.3. Order of Operations

Tests

This section has adequate test coverage.


When multiple transformations need to be applied, they are applied in the following order:

  1. word-space-transform
  2. capitalize, uppercase, and lowercase
  3. full-width
  4. full-size-kana

Word space transformation and text transformation happen after § 4.3.1 Phase I: Collapsing and Transformation but before § 4.3.2 Phase II: Trimming and Positioning. This means for instance that full-width only transforms spaces (U+0020) to U+3000 IDEOGRAPHIC SPACE within preserved white space.
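
The ordering above can be sketched with two rules. The class names are hypothetical; the results in the comments follow from the order of operations and from the note on preserved white space.

```css
/* Hypothetical class names. */

/* Case mapping runs before full-width conversion:
   "abc" -> "ABC" -> "ＡＢＣ" (U+FF21 to U+FF23). */
.heading { text-transform: uppercase full-width; }

/* full-width converts spaces only within preserved white space.
   Here white space is preserved, so each U+0020 becomes
   U+3000 IDEOGRAPHIC SPACE. */
.preformatted {
  white-space: pre-wrap;
  text-transform: full-width;
}
```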

Note: As defined in Appendix A: Text Processing Order of Operations, transforming affects line-breaking and other formatting operations.

3. White Space and Wrapping: the white-space property

Tests

This section has good overall test coverage, particularly through tests for § 4 White Space Processing & Control Characters and subsections, and longhand properties.


Name: white-space
Value: normal | pre | pre-wrap | pre-line | <'white-space-collapse'> || <'text-wrap-mode'> || <'white-space-trim'>
Initial: normal
Applies to: text
Inherited: see individual properties
Percentages: n/a
Computed value: specified keyword
Canonical order: n/a
Animation type: discrete

This property is a shorthand for white-space-collapse, text-wrap-mode, and white-space-trim. It specifies two things:

Note: This shorthand combines both inheritable and non-inheritable properties. If this is a problem, please inform the CSSWG.

Unless otherwise specified, any omitted longhand is set to its initial value.

The following table gives the normative mapping of the values of the shorthand’s special keywords to their equivalent longhand values.

white-space   white-space-collapse   text-wrap-mode   white-space-trim
normal        collapse               wrap             none
pre           preserve               nowrap           none
pre-wrap      preserve               wrap             none
pre-line      preserve-breaks        wrap             none
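
The mapping in the table can be sketched as pairs of rules that are intended to be equivalent, assuming a UA that implements the longhands as defined in this draft. The class names are hypothetical.

```css
/* Hypothetical class names. Each shorthand rule is intended to set the
   same longhand values as the longhand rule that follows it. */

.a { white-space: pre-wrap; }
.a-longhand {
  white-space-collapse: preserve;
  text-wrap-mode: wrap;
  white-space-trim: none;
}

.b { white-space: pre-line; }
.b-longhand {
  white-space-collapse: preserve-breaks;
  text-wrap-mode: wrap;
  white-space-trim: none;
}

/* Longhand keywords can also be given directly; omitted longhands
   are set to their initial values. */
.c { white-space: preserve nowrap; } /* same longhand values as "pre" */
```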

Note: In some cases, preserved white space and other space separators can hang when at the end of the line; this can affect whether they are measured for intrinsic sizing.

The following informative table summarizes the behavior of various white-space values:

white-space    New Lines   Spaces and Tabs   Text Wrapping   End-of-line spaces   End-of-line other space separators
normal         Collapse    Collapse          Wrap            Remove               Hang
pre            Preserve    Preserve          No wrap         Preserve             No wrap
nowrap         Collapse    Collapse          No wrap         Remove               Hang
pre-wrap       Preserve    Preserve          Wrap            Hang                 Hang
break-spaces   Preserve    Preserve          Wrap            Wrap                 Wrap
pre-line       Preserve    Collapse          Wrap            Remove               Hang

4. White Space Processing & Control Characters

Tests

This section has reasonably good test coverage.


The source text of a document often contains formatting that is not relevant to the final rendering: for example, breaking the source into segments (lines) for ease of editing or adding white space characters such as tabs and spaces to indent the source code. CSS white space processing allows the author to control interpretation of such formatting: to preserve or collapse it away when rendering the document. White space processing in CSS (which is controlled with the white-space-collapse and white-space-trim properties) interprets white space characters only for rendering: it has no effect on the underlying document data.

Note: Depending on the document language, segments can be separated by a particular newline sequence (such as a line feed or CRLF pair), or delimited by some other mechanism, such as the SGML RECORD-START and RECORD-END tokens.

For CSS processing, each document language–defined “segment break” or “newline sequence”—​or if none are defined, each line feed (U+000A)—​in the text is treated as a segment break, which is then interpreted for rendering as specified by the white-space property.

In the case of HTML, newlines are normalized to line feed characters (U+000A) for representation in the DOM, so when an HTML document is represented as a DOM tree each line feed (U+000A) is treated as a segment break. [HTML] [DOM]

Note: In most common CSS implementations, HTML does not get styled directly. Instead, it is processed into a DOM tree, which is then styled. Unlike HTML, the DOM does not give any particular meaning to carriage returns (U+000D), so they are not treated as segment breaks. If carriage returns (U+000D) are inserted into the DOM by means other than HTML parsing, they then get treated as defined below.

Note: A document parser might not only normalize any segment breaks, but also collapse other space characters or otherwise process white space according to markup rules. Because CSS processing occurs after the parsing stage, it is not possible to restore these characters for styling. Therefore, some of the behavior specified below can be affected by these limitations and may be user agent dependent.

Note: Anonymous blocks consisting entirely of collapsible white space are removed from the rendering tree. Thus any such white space surrounding a block-level element is collapsed away. See CSS 2.1 § 9.2.2.1 Anonymous inline boxes. [CSS2]

Control characters (Unicode category Cc)—​other than tabs (U+0009), line feeds (U+000A), carriage returns (U+000D) and sequences that form a segment break—​must be rendered as a visible glyph which the UA must synthesize if the glyphs found in the font are not visible, and must be otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its code point value, or use some other method to provide an appropriate visible glyph. As required by Unicode, unsupported Default_ignorable characters must be ignored for text rendering. [UNICODE]

Carriage returns (U+000D) are treated identically to spaces (U+0020) in all respects.

Note: For HTML documents, carriage returns present in the source code are converted to line feeds at the parsing stage (see HTML § 13.2.3.5 Preprocessing the input stream and the definition of normalize newlines in Infra) and therefore do not appear as U+000D CARRIAGE RETURN to CSS. [HTML] [INFRA] However, when encoded using an escape sequence (&#x0d;), the character is preserved and the above rule is observable.

4.1. White Space Collapsing: the white-space-collapse property

Tests

This section has limited direct coverage, but extensive coverage through the white-space shorthand.


This section is still under discussion and may change in future drafts.

Name: white-space-collapse
Value: collapse | discard | preserve | preserve-breaks | preserve-spaces | break-spaces
Initial: collapse
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: specified keyword
Canonical order: per grammar
Animation type: discrete

This property specifies whether and how white space is collapsed. Values have the following meanings, which must be interpreted according to the White Space Processing Rules:

collapse
This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character).
preserve
This value prevents user agents from collapsing sequences of white space. Segment breaks such as line feeds are preserved as forced line breaks.
preserve-breaks
Like collapse, this value collapses consecutive white space characters, but preserves segment breaks in the source as forced line breaks.
preserve-spaces
This value prevents user agents from collapsing sequences of white space, and converts tabs and segment breaks to spaces. (This value is intended to represent the behavior of xml:space="preserve" in SVG.)
break-spaces
The behavior is identical to that of preserve, except that:

Note: This value does not guarantee that there will never be any overflow due to white space: for example, if the line length is so short that even a single white space character does not fit, overflow is unavoidable.

discard
This value directs user agents to “discard” all white space in the element.

Does this preserve line break opportunities or no? Do we need a distinct "hide" value? If it preserves line break opportunities, maybe it should be replaced with a word-space-transform value?

White space that was not removed or collapsed due to white space processing is called preserved white space.
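
As a rough sketch of how some of these values might be used, assuming a UA that implements white-space-collapse as a longhand (the class names are hypothetical):

```css
/* Hypothetical class names; a sketch of the intent of each value. */

/* Consecutive white space collapses, but segment breaks in the source
   are kept as forced line breaks. */
.address { white-space-collapse: preserve-breaks; }

/* Sequences of white space are not collapsed, and segment breaks
   are preserved as forced line breaks. */
.listing { white-space-collapse: preserve; }

/* Sequences of white space are not collapsed, but tabs and
   segment breaks are converted to spaces. */
.svg-like { white-space-collapse: preserve-spaces; }
```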

The following style rules implement MathML’s white space processing:

@namespace m "http://www.w3.org/1998/Math/MathML";
m|* {
  white-space-collapse: discard;
}
m|mi, m|mn, m|mo, m|ms, m|mtext {
  white-space-trim: discard-inner;
}

      4.2. White Space Trimming: the white-space-trim property

      Tests

      This section mostly lacks tests


      Name: white-space-trim
      Value: none | discard-before || discard-after || discard-inner
      Initial: none
      Applies to: inline boxes and block containers
      Inherited: no
      Percentages: n/a
      Computed value: specified keyword(s)
      Canonical order: per grammar
      Animation type: discrete

      This property allows authors to specify trimming behavior at the beginning and end of a box. Values have the following meanings:

      discard-before
      This value directs the UA to collapse all collapsible white space immediately before the start of the element.
      discard-after
      This value directs the UA to collapse all collapsible white space immediately after the end of the element.
      discard-inner
      For block containers, this value directs UAs to discard all white space at the beginning of the element up to and including the last segment break before the first non-white-space character in the element, and to discard all white space at the end of the element starting with the first segment break after the last non-white-space character in the element. For other elements, this value directs UAs to discard all white space at the beginning and end of the element.

      Note: Discarding document white space using white-space-trim can change where soft wrap opportunities occur in the text.

      The following style rules render DT elements as a comma-separated list, even if they are coded on separate lines of the source document:

      dt { display: inline; }
      dt + dt:before { content: ", "; white-space-trim: discard-before; }
      

      The following style rule removes source-formatting white space adjacent to the opening/closing tags of a preformatted block, but not any indentation or interleaved white space applied to the actual contents of the element:

      pre { white-space: pre; white-space-trim: discard-inner; }
      

      This results in the following two source-code snippets:

      <pre>
      
        some
      preformatted
      
        text
      
      </pre>
      
      <pre>  some
      preformatted
      
        text</pre>
      
      rendering identically as:
        some
      preformatted
      
        text

      If instead we apply it to an inline element:

      span { white-space: normal; white-space-trim: discard-inner; }
      
      start[<span>
      
        some
      inline
        text
      
      </span>]end
      
      start[<span>  some
      inline
        text</span>]end
      
      this directs the UA to discard all of the leading and trailing white space around the actual contents of the element:
      start[some inline text]end
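The two discard-inner examples above can be reproduced with a small Python sketch. It is only an approximation under stated assumptions: the function name trim_discard_inner is hypothetical, the element's content is modeled as a single string, only U+000A is treated as a segment break, and the later collapsing phase is not applied.

```python
import re


def trim_discard_inner(text: str, block_container: bool) -> str:
    """Toy model of white-space-trim: discard-inner on one element's text (hypothetical helper)."""
    if not block_container:
        # Other elements: discard all white space at the beginning and end.
        return text.strip(" \t\n")
    # Block containers, leading edge: discard up to and including the last
    # segment break before the first non-white-space character.
    text = re.sub(r"\A[ \t\n]*\n", "", text)
    # Trailing edge: discard from the first segment break after the last
    # non-white-space character through the end.
    return re.sub(r"\n[ \t\n]*\Z", "", text)


padded = "\n\n  some\npreformatted\n\n  text\n\n"
compact = "  some\npreformatted\n\n  text"

# Both <pre> sources end up with the same content, indentation intact.
print(repr(trim_discard_inner(padded, block_container=True)))
print(repr(trim_discard_inner(compact, block_container=True)))

# For the inline case only the edges are trimmed here; interior line feeds
# and indentation are left for the collapsing phase to handle.
print(repr(trim_discard_inner("\n\n  some\ninline\n  text\n\n", block_container=False)))
```

In the block container case the indentation before "some" survives because no segment break precedes it once the leading blank lines are gone, which matches the rendering shown for the pre example.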
      

      White space processing for white-space-trim takes place before § 4.3.1 Phase I: Collapsing and Transformation.

      4.3. The White Space Processing Rules

      Tests

      This section has good test coverage. Most tests are found in the subsections.


      Except where specified otherwise, white space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

      Note: The set of characters considered document white space (part of the document content) and those considered syntactic white space (part of the CSS syntax) are not necessarily identical. However, since both include spaces (U+0020), tabs (U+0009), and line feeds (U+000A), most authors won’t notice any differences.

      Besides space (U+0020) and no-break space (U+00A0), Unicode defines a number of additional space separator characters. [UNICODE] In this specification all characters in the Unicode general category Zs except space (U+0020) and no-break space (U+00A0) are collectively referred to as other space separators.
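For readers who want to see which characters fall under this definition, the following Python sketch enumerates them using the standard unicodedata module. The helper name is_other_space_separator is hypothetical, and the exact list depends on the Unicode version bundled with the Python interpreter in use.

```python
import sys
import unicodedata


def is_other_space_separator(ch: str) -> bool:
    """True for characters in general category Zs other than U+0020 and U+00A0."""
    return unicodedata.category(ch) == "Zs" and ch not in ("\u0020", "\u00A0")


# Scan every code point and keep those matching the definition.
other_space_separators = [
    chr(cp) for cp in range(sys.maxunicode + 1) if is_other_space_separator(chr(cp))
]

for ch in other_space_separators:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The output includes, for example, U+1680 OGHAM SPACE MARK and U+3000 IDEOGRAPHIC SPACE. Tabs and line feeds are not listed because they are control characters rather than members of Zs.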

      4.3.1. Phase I: Collapsing and Transformation

      Tests

      This section has good test coverage; most parts are well exercised.

      Missing tests:


      Note: white-space-trim is taken into account prior to this phase.

      For each inline (including anonymous inlines; see CSS 2.1 § 9.2.2.1 Anonymous inline boxes [CSS2]) within an inline formatting context, white space characters are processed as follows prior to line breaking and bidi reordering, ignoring bidi formatting characters (characters with the Bidi_Control property [UAX9]) as if they were not there:

      The following example illustrates the interaction of white-space collapsing and bidirectionality. Consider the following markup fragment, taking special note of spaces (with varied backgrounds and borders for emphasis and identification):
      <ltr>A <rtl> B </rtl> C</ltr>

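      where the <ltr> element represents a left-to-right embedding and the <rtl> element represents a right-to-left embedding. If the white-space property is set to normal, the white-space processing model will result in the following: the space before the B collapses with the space after the A, and the space before the C collapses with the space after the B.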

      This will leave two spaces, one after the A in the left-to-right embedding level, and one after the B in the right-to-left embedding level. The text will then be ordered according to the Unicode bidirectional algorithm, with the end result being:

      A  BC

      Note that there will be two spaces between A and B, and none between B and C. This is best avoided by putting spaces outside the element instead of just inside the opening and closing tags and, where practical, by relying on implicit bidirectionality instead of explicit embedding levels.
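The outcome of this example can be checked with a toy Python model. It rests on two loudly stated assumptions: a collapsible space that immediately follows another collapsible space is collapsed even across inline boundaries, and bidi reordering is approximated by reversing each right-to-left run in place within a left-to-right paragraph. The function names are hypothetical, and this is not an implementation of the Unicode bidirectional algorithm.

```python
def collapse_across_inlines(runs):
    """Drop any space that immediately follows another space, even across run boundaries."""
    result = []
    previous_was_space = False
    for direction, text in runs:
        kept = []
        for ch in text:
            if ch == " " and previous_was_space:
                continue  # collapsed: this space follows a space that was kept
            kept.append(ch)
            previous_was_space = ch == " "
        result.append((direction, "".join(kept)))
    return result


def visual_order(runs):
    """Toy reordering: right-to-left runs are reversed, left-to-right runs are kept as-is."""
    return "".join(text[::-1] if direction == "rtl" else text for direction, text in runs)


# <ltr>A <rtl> B </rtl> C</ltr>
runs = [("ltr", "A "), ("rtl", " B "), ("ltr", " C")]
collapsed = collapse_across_inlines(runs)
print(collapsed)
print(repr(visual_order(collapsed)))
```

After collapsing, the runs hold "A ", "B ", and "C". Reversing the right-to-left run moves its trailing space in front of the B, which gives two spaces between A and B and none between B and C.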

      4.3.2. Phase II: Trimming and Positioning

      Tests

      This section has good test coverage; all parts are well exercised.


      Then, the entire block is rendered. Inlines are laid out, taking bidi reordering into account, and wrapping as specified by the text-wrap-mode and text-wrap-style properties. As each line is laid out,

      1. A sequence of collapsible spaces at the beginning of a line is removed.
      2. If the tab size is zero, preserved tabs are not rendered. Otherwise, each preserved tab is rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. If this distance is less than 0.5ch, then the subsequent tab stop is used instead. Tab stops occur at points that are multiples of the tab size from the starting content edge of the preserved tab’s nearest block container ancestor. The tab size is given by the tab-size property.

        Note: See the Unicode rules on how tabulation (U+0009) interacts with bidi. [UAX9]

      3. A sequence of collapsible spaces at the end of a line is removed, as well as any trailing U+1680   OGHAM SPACE MARK whose white-space-collapse property is collapse or preserve-breaks.

        Note: Due to Unicode Bidirectional Algorithm rule L1, a sequence of collapsible spaces located at the end of the line prior to bidi reordering will also be at the end of the line after reordering. [UAX9] [CSS-WRITING-MODES-4]

      4. If there remains any sequence of white space, other space separators, and/or preserved tabs at the end of a line (after bidi reordering [CSS-WRITING-MODES-4]):