Requirements for Latin Text Layout and Pagination

Editor’s Draft,

This version:
http://w3c.github.io/dpub-pagination/
Latest version:
http://www.w3.org/TR/dpub-latinreq/
Previous Versions:
http://www.w3.org/TR/2014/WD-dpub-latinreq-20140313/
Feedback:
public-digipub@w3.org with subject line “[dpub-latinreq] … message topic …” (archives)
Issue Tracking:
GitHub
Inline In Spec
Editor:
(Hachette Livre)

Abstract

This document describes requirements for pagination and layout of books that use the Latin script, based on the tradition of print book design and composition. It is hoped that these principles can inform the pagination of digital content as well, and serve as a reference for the CSS Working Group and other interested parties. This work was inspired by [JLREQ].

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is an Editor’s Draft.

Publication as an Editor’s Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The (archived) public mailing list public-digipub@w3.org (see instructions) is preferred for discussion of this specification. When sending e-mail, please put the text “dpub-latinreq” in the subject, preferably like this: “[dpub-latinreq] …summary of comment…”

This document was produced by the Digital Publishing Interest Group (part of the Digital Publishing Activity).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.

This is a work in progress. No section should be considered final, and the absence of any content does not imply that such content is out of scope, or may not appear in the future. If you feel something should be covered here, tell us! The initial version of this document will focus on books, and at this time will not include requirements specific to magazines or newspapers. The scope will depend heavily on the willingness of people to contribute to this document. Please contact the Digital Publishing Interest Group if you would like to help. Once the document is stable, the group will publish it as an Interest Group Note.

This document was published by the Digital Publishing Interest Group as an Editor’s Draft. If you wish to make comments regarding this document, please send them to public-digipub@w3.org (subscribe, archives). All comments are welcome.

1. Introduction

Not all stories worth telling can fit in a tweet, on a computer screen, or on a single piece of paper. Ever since the codex replaced the scroll, we have divided our stories into pages. Pagination is the art and the craft of turning that scroll of content into discrete pieces, whether destined for book pages or screens. Pagination requires us to think about the document at all levels, from the total number of pages to the tiny spaces between letters. Along with graphic design and typography, it determines the look of the page.

Typography is the craft of endowing human language with a durable visual form, and thus with an independent existence.

—Robert Bringhurst, The Elements of Typographic Style

Good pagination, like good typography, aims to be invisible. As the reader turns the page, the stream of words and images in her mind should not be interrupted. Two thousand years of experience have taught us how best to do this. The goal of this document is to describe those rules as clearly as possible, so they can be implemented in the Open Web Platform. We hope for a day when the pagination of digital books will be as beautiful and transparent as that of the best printed books.

Note: Our goal is for this document to describe layout and pagination for all languages that use the Latin script.

2. Fundamentals

Makeup is a highly skilled procedure. If the text is merely divided mechanically into portions of equal length, without regard to where the divisions fall, some of the pages that result are bound to be unacceptable logically or aesthetically: they will incorporate bad breaks.

Chicago Manual of Style, 14th Edition, 19.40.

What therefore God hath joined together, let not man put asunder.

The Bible, Matthew 19:6

Every rule of pagination boils down to a single principle: break pages with as little disruption to the reading experience as possible. A widow leaves the last line of a paragraph isolated from the rest of the thought. A recto hyphen means a word is interrupted by a page turn. A heading at the bottom of a page removes the title from the section, and the section from the title.

2.1. Tradeoffs

Pagination involves tradeoffs. Fixing a widow may result in a misaligned spread. Fixing that may result in a loose line or paragraph. What is acceptable in one book, or for one publisher, may be unacceptable to another. What is acceptable in one country, or language, may be unacceptable elsewhere.

2.2. Untangling the Vertical and the Horizontal

Page breaks cannot be separated from line breaks. The tiniest change in kerning can make a paragraph one line longer or shorter, and thus create a widow or an orphan. The work of pagination, whether done by human or machine typesetters, inevitably involves considering the individual lines of text. So we will not try too hard to avoid talking about line breaks when they may influence pagination.

3. The Spatial Geometry of Pages: Spreads and Bleeds

Open a printed book and what you see isn’t a single page, but two pages side by side—a spread.

diagram of facing text pages
Text spread

Books set in the Latin script typically share some common features.

3.1. Crossing the Gutter

Large images, tables, or sidebars may extend across both pages of a spread.

image crossing gutter
Image extends across both pages in spread

3.2. Page size, orientation, and arrangement

The size, orientation, and arrangement of pages might vary even in a single book:

3.3. Bleeds

Books are printed on large sheets of paper, which are then folded and cut. Since the cutting is not infinitely precise, any object that should extend to the very edge of the page in fact needs to extend a bit outside the page boundary. A bleed is the part of an object that extends outside the page, generally by a small amount such as 9 points.
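The bleed arithmetic can be sketched in Python. This is a minimal illustration, not a prepress tool: it assumes coordinates in points measured from the top-left corner of the trimmed page, the 9-point bleed mentioned above, and that only the top, bottom, and outside edges are trimmed, so an object touching the gutter is not extended there. `Rect` and `add_bleed` are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A box in points; (x, y) is measured from the top-left of the trimmed page."""
    x: float
    y: float
    w: float
    h: float


def add_bleed(obj: Rect, page_w: float, page_h: float, recto: bool,
              bleed: float = 9.0) -> Rect:
    """Extend each trimmed page edge that `obj` touches by `bleed` points.

    The gutter edge (left on a recto, right on a verso) is never extended.
    """
    x0, y0 = obj.x, obj.y
    x1, y1 = obj.x + obj.w, obj.y + obj.h
    if y0 <= 0:
        y0 -= bleed                 # top edge
    if y1 >= page_h:
        y1 += bleed                 # bottom edge
    if recto and x1 >= page_w:
        x1 += bleed                 # outside edge of a recto is the right
    if not recto and x0 <= 0:
        x0 -= bleed                 # outside edge of a verso is the left
    return Rect(x0, y0, x1 - x0, y1 - y0)


# A 6 x 9 in (432 x 648 pt) recto with an image touching the top and outside edges.
print(add_bleed(Rect(216, 0, 216, 300), 432, 648, recto=True))
# Rect(x=216, y=-9.0, w=225.0, h=309.0)
```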

bleed
Image bleeds to top and outside

Note: Imposition and related topics are out-of-scope for this document.

4. Hyphenation and Justification

Good hyphenation and justification is critically important to the appearance and readability of text. Print typesetting systems can often achieve very good results, but most online reading systems do this very poorly.

4.1. Hyphenation

Text is often easier to read when words are allowed to break at the end of lines, thus avoiding massive variations in word-spacing or margins. But determining acceptable places to break words is a difficult problem:

All of the following are the results of automated hyphenation algorithms:

pre-ached
wee-knights
read-just
leg-ends
ex-acting
co-inage

Words hyphenate differently based on pronunciation or meaning:

photo-graph
pho-togra-pher
 
re-cord (verb)
rec-ord (noun)
 
pre-sent (verb)
pres-ent (noun)
 
cre-ator
crea-ture

4.1.1. Parameters for Hyphenation

The following choices need to be made when considering hyphenation of text.

  1. Should this text be hyphenated at all? Hyphenation is generally suppressed in headings. [CSS3TEXT] includes a hyphens property, to enable or disable hyphenation.
  2. What’s the shortest word that can be hyphenated? Five or six letters is typical.
  3. What’s the minimum number of characters allowed before a hyphen? Two is typical, and is sometimes stated as “two-up.” PrinceXML has the prince-hyphenate-before property, but this is not in any current CSS draft.
  4. What’s the minimum number of characters allowed after a hyphen? Three is typical, and can be stated as “three-down.” PrinceXML has the prince-hyphenate-after property, but this is not in any current CSS draft.
  5. How many consecutive lines can end with a hyphen (known as a “ladder”)? Two or three is typical. PrinceXML has the prince-hyphenate-lines property, but this is not in any current CSS draft.
  6. Should capitalized words be hyphenated?
  7. Can the last word of a paragraph be hyphenated?
  8. Can the last word in a column, page, or spread be hyphenated?
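Items 2 through 6 translate naturally into a filter applied to candidate break points. The Python sketch below assumes the candidates already come from a dictionary or pattern engine, and uses the typical values listed above as defaults (taking six as the shortest hyphenatable word and two as the maximum ladder). `HyphenationPolicy` and `break_allowed` are hypothetical names. Items 7 and 8 need information about the paragraph and page, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyphenationPolicy:
    min_word: int = 6              # shortest word that may be hyphenated
    min_before: int = 2            # "two-up": letters required before the hyphen
    min_after: int = 3             # "three-down": letters required after the hyphen
    max_ladder: int = 2            # consecutive lines allowed to end in a hyphen
    hyphenate_capitalized: bool = False


def break_allowed(word: str, pos: int, hyphenated_lines_above: int,
                  policy: HyphenationPolicy = HyphenationPolicy()) -> bool:
    """Decide whether `word` may break after its first `pos` letters.

    `hyphenated_lines_above` counts the immediately preceding lines that
    already end with a hyphen.
    """
    if len(word) < policy.min_word:
        return False
    if pos < policy.min_before or len(word) - pos < policy.min_after:
        return False
    if hyphenated_lines_above >= policy.max_ladder:
        return False
    if word[:1].isupper() and not policy.hyphenate_capitalized:
        return False
    return True
```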

4.1.2. Choosing hyphenation points

A key question is, “who decides what is acceptable?” The answer depends on the language, the culture, the subject matter, and the material being typeset.

4.1.2.1. Language

Each language has its own conventions about hyphenation. U.S. English hyphenates differently than U.K. English. In some European languages, words may be spelled differently when hyphenated.

TK

Of course, the same text may include words from many different languages.

4.1.2.2. Culture

Even within the same language, authorities differ on the proper hyphenation of words.

in-de-pen-dent (American Heritage Dictionary)
in-de-pend-ent (Webster’s)

Copyeditors will often specify a canonical reference for hyphenation, which is usually a particular edition of a particular dictionary.

4.1.2.3. Subject Matter

Specialized subject matter may require additional hyphenation dictionaries. This is common in medicine, law, and science.

4.1.2.4. Exceptions

Authors should be able to provide a list of exceptions, which add to or override what the system would normally do. The format for doing so should be easily understood.

TeX uses the following format. Possible hyphenation positions are indicated with (surprise!) hyphens. Hyphenation should be prevented where hyphens are absent.

\hyphenation { 
sur-pris-ingly 
tan-ta-liz-ing-ly 
these
}
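Parsing such a list is straightforward. This Python sketch takes the whitespace-separated words from inside the braces and records, for each word, the offsets at which a break is allowed. Lowercasing the lookup key is an assumption made for illustration, and `parse_exceptions` is a hypothetical name.

```python
def parse_exceptions(source: str) -> dict[str, frozenset[int]]:
    """Parse an exception list in the TeX style shown above.

    Hyphens mark allowed breaks. Each bare word maps to the set of offsets
    (letters before the break) where breaking is allowed; an empty set means
    the word must never be hyphenated.
    """
    exceptions = {}
    for token in source.split():
        breaks = set()
        letters = []
        for ch in token:
            if ch == "-":
                breaks.add(len(letters))
            else:
                letters.append(ch)
        exceptions["".join(letters).lower()] = frozenset(breaks)
    return exceptions


rules = parse_exceptions("""
sur-pris-ingly
tan-ta-liz-ing-ly
these
""")
print(sorted(rules["surprisingly"]))  # [3, 7]
```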

PrinceXML uses a prefixed property prince-hyphenate-patterns: url('en_US.dic'); to load a hyphenation dictionary. No current CSS specification includes support for this idea.

4.2. Justification

4.2.1. Algorithms

4.2.1.1. Greedy
4.2.1.2. Knuth-Plass (TeX)
4.2.1.3. Adobe (InDesign)

5. Paginating Single-Column Text

The simplest situation, which is very common, is when the content is only text, in a single column. Aside from chapter and book optimizations (to be discussed later) and line-breaking, the biggest issue is likely to be widows (see this figure for an example).

5.1. Widows

A widow occurs when the last line or lines of a paragraph fall at the top of a page. Publishers have different standards. Most frown on a single line at the top of the page, although some accept it if that line spans at least three-quarters of the measure (the full line width). Others require at least two lines of a paragraph at the top of a page.

Note: [css3-break] includes the widows and orphans properties, with integer values.

[css3-break] does not provide for fractional values of the widows and orphans properties.
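Because the `widows` property takes an integer, the three-quarters rule cannot be expressed with it. A hypothetical Python check covering both publisher standards might look like this; widths can be in any consistent unit, and the 0.75 default comes from the text rather than from any specification.

```python
from typing import Optional


def widow_acceptable(lines_at_top: int, top_line_width: float, measure: float,
                     min_lines: int = 2,
                     long_line_fraction: Optional[float] = 0.75) -> bool:
    """Check the lines of a paragraph carried to the top of a new page.

    `min_lines` carried lines always suffice. A single carried line is also
    accepted when `long_line_fraction` is set and the line fills at least
    that fraction of the measure; pass None to forbid single-line widows.
    """
    if lines_at_top == 0 or lines_at_top >= min_lines:
        return True
    if long_line_fraction is None:
        return False
    return lines_at_top == 1 and top_line_width >= long_line_fraction * measure
```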

diagram of facing text pages with widow at top of verso page
Text spread with widow

Many typesetting systems have settings to prevent widows. CSS discusses these issues in [css3-break]. Unfortunately, these systems usually create another problem when they fix the widow. In this figure, there’s no longer a widow at the top of the page, but since the system merely moved a line from the left page to the right, it left behind an empty line, and the pages no longer align at the bottom.

diagram of facing text pages where left page is one line shorter than right page
Widow fixed, but pages don’t align

More needs to be done. Removing one line of text from each page of the spread, shown in this figure, solves the problem.

diagram of facing text pages where both pages are one line shorter than normal
Widow fixed by “running short.”

5.2. Orphans

An orphan has two possible meanings in typesetting. It can refer to the minimum number of lines required before a page break (as in [css3-break]). It can also refer to the last line of a paragraph in any context. In the former context, many publishers now accept a single line of a paragraph before a page break. For the latter, standards vary widely. Some publishers want the last line to be longer than the paragraph indent. Some require one or two full words, or a certain number of characters. Most avoid having only a fragment of a word as the last line.

CSS does not currently address the second meaning of orphan.
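The second meaning of orphan can be sketched as a check on a paragraph’s final line. This Python version counts characters rather than measuring glyphs and approximates the paragraph indent as a character count; those simplifications, the defaults, and the name `last_line_acceptable` are all assumptions.

```python
def last_line_acceptable(last_line: str, previous_line_hyphenated: bool,
                         indent_chars: int = 3, min_words: int = 1,
                         min_chars: int = 0) -> bool:
    """Check the final line of a paragraph against common house rules."""
    text = last_line.strip()
    words = text.split()
    # The line holds nothing but the tail of a word broken on the line above.
    if previous_line_hyphenated and len(words) == 1:
        return False
    # The line should be longer than the paragraph indent.
    if len(text) <= indent_chars:
        return False
    return len(words) >= min_words and len(text) >= min_chars
```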

5.3. Constraints on page depth

In traditional typesetting, the first defense against bad breaks is to change the depth of the page. “Running long” or “running short” means including one more (or one less) line of text on each page of the spread, which fixes a bad break such as a widow without leaving the facing pages misaligned.

A typical book design includes instructions on whether it’s acceptable to run short, long, or (more rarely) both. Often there are also constraints on how many consecutive spreads (or pages) may be altered in this way. If running both long and short, it’s usually forbidden to go from one to another without an intervening normal spread.
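These constraints lend themselves to a simple validator. In this Python sketch, each spread is marked +1 (run long), -1 (run short), or 0 (normal depth); the limit of two consecutive adjusted spreads is an illustrative default, not a standard, and the function name is hypothetical.

```python
def depth_plan_acceptable(adjustments, allow_long=True, allow_short=True,
                          max_consecutive=2):
    """Validate a sequence of per-spread depth changes (+1, -1, or 0)."""
    previous = 0
    run = 0
    for adj in adjustments:
        if adj not in (-1, 0, 1):
            raise ValueError(f"unexpected adjustment {adj!r}")
        if (adj == 1 and not allow_long) or (adj == -1 and not allow_short):
            return False
        # Switching between long and short requires a normal spread in between.
        if adj != 0 and previous != 0 and adj != previous:
            return False
        run = run + 1 if adj != 0 else 0
        if run > max_consecutive:
            return False
        previous = adj
    return True
```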

Running long or short may affect the space between the last line of text and a page footer or folio. Most publishers prefer footers to be in a fixed position. If, instead, the space between the last line of text and the footer is fixed, the footer is said to "bounce."

5.4. Facing Pages

If a document has facing pages, the publisher usually requires that they align top and bottom. Exceptions include:

  1. The page is the last page of a chapter.
  2. The page contains no text, only images or tables.
  3. Aligning the facing pages would make some other issue worse.

5.5. Recto and Verso Hyphens

Publishers sometimes constrain what characters may appear before a page break. Most commonly, the right-hand page of a spread may not end with a word fragment, as the reader must turn the page before reading the rest of the word. Less common is a prohibition on the verso page ending with a hyphen.
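A minimal Python sketch of the recto rule, assuming the usual convention that odd-numbered pages are rectos; the less common verso prohibition is an optional flag. The function name is hypothetical, and detecting whether a line ends in a word fragment is left to the line breaker.

```python
def page_end_acceptable(page_number: int, ends_with_fragment: bool,
                        forbid_verso_fragment: bool = False) -> bool:
    """Check a page's last line against recto/verso hyphen rules."""
    if not ends_with_fragment:
        return True
    if page_number % 2 == 1:
        # Recto: the reader would have to turn the page mid-word.
        return False
    return not forbid_verso_fragment
```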

5.6. Space Breaks and Ornaments

Many novels, and some narrative non-fiction books, include small breaks in the text. These are usually represented by one to three blank lines, or by a small ornament or dingbat. Problems arise when these breaks fall at the top or bottom of a page.

Space break

If, however, the space break falls at the bottom of the page, confusion can result. In the figure below, it’s hard to tell there’s a space break, as it just looks like the page is a few lines short.

Incorrect: Space break at bottom of page

In that case, asterisks or some other ornament is added to the top or bottom of a page, as a visual reminder of the break. To get everything to work out, the spread was run short, and the space break (now with ornament) pushed to the top of the second page:

Space break at top of page with asterisks

This is an example where the page position of an element determines its content as well as design. A ::page-top or ::page-bottom pseudo-element might prove useful.
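The rule can be sketched as rendering logic that depends on position, roughly what a page-top or page-bottom pseudo-element would make possible. In this Python illustration a space break is a single marker item rendered as one blank line, and the ornament string is an arbitrary choice; both are assumptions.

```python
SPACE_BREAK = object()   # marker for a space break in a page's list of items


def render_page(items, ornament="*   *   *"):
    """Render one page's items, choosing how each space break appears.

    Mid-page, a space break stays blank; at the very top or bottom of the
    page it becomes an ornament so the break remains visible.
    """
    last = len(items) - 1
    return [
        (ornament if i in (0, last) else "") if item is SPACE_BREAK else item
        for i, item in enumerate(items)
    ]
```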

6. Paragraphs and indentation

7. Fonts

Texts are built from letters. Modern typesetting systems must be able to choose from hundreds or thousands of glyphs depending on the circumstances.

7.1. Ligatures

Two or more letters may be better displayed as a single glyph:

Example TK

7.2. Numbers and math

7.2.1. Lining, oldstyle, and tabular figures

Traditionally, text set in mixed-case type should use old-style figures. Text set in all caps should use lining figures. Columns of numbers (such as in tables) are clearer using tabular figures, which are of uniform width.

excerpt from Moby-Dick showing old-style figures
Old-style figures in text
excerpt from Moby-Dick showing lining figures
Lining figures in headline

7.3. Alternate forms

7.3.1. Caps

Text may be set in a mix of uppercase and lowercase, exclusively in upper- or lowercase, in small caps, in caps/small caps, etc. In many cases this is purely a design choice, and the displayed text may use a different case than the source document.

7.3.2. Swashes and Stylistic Alternates

two identical headings, the top without swashes and the bottom with swashes
Bickham Script Pro without (top) and with (bottom) swashes

8. Initial Capitals

Large, decorative letters have been used to start new sections of text since long before printing. In fact, their use predates lowercase letters entirely.

8.1. Drop caps

A drop cap is a larger-than-usual letter at the start of a paragraph, with a baseline at least one line lower than the first baseline of the paragraph. The size of drop caps is usually indicated by how many lines they occupy—two-line and three-line drop caps are the most common.

Two-line drop cap

Aligning the letter vertically is a challenge. The cap height of the letter should align with the cap height of the first line of text. The baseline of the letter should fall on the baseline of one of the following lines (the second for a 2-line drop cap, etc.).
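The vertical alignment rule fixes the size of the initial: its cap height must span from the cap height of line 1 down to the baseline of line N, a distance of (N - 1) × leading plus the cap height of the text. A Python sketch follows. The cap-height ratios (cap height as a fraction of font size) are font-specific inputs, and the 0.7 values below are made up for illustration.

```python
def drop_cap_size(lines, leading, text_size, text_cap_ratio, initial_cap_ratio):
    """Font size for an initial spanning `lines` lines (all lengths in points).

    Top of the initial: cap height of line 1.
    Baseline of the initial: baseline of line `lines`.
    """
    span = (lines - 1) * leading + text_size * text_cap_ratio
    return span / initial_cap_ratio


# 10 pt text on 12 pt leading; both faces assumed to have a cap height of 0.7 em.
print(round(drop_cap_size(2, 12, 10, 0.7, 0.7), 1))  # 27.1
print(round(drop_cap_size(3, 12, 10, 0.7, 0.7), 1))  # 44.3
```

Optical corrections such as overshoot and the horizontal adjustments described below are not modeled.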

Three-line drop cap

The horizontal position of the drop cap and the surrounding text is also an issue, as variations in glyph shapes may require increasing or decreasing space to the right of the drop cap, and in some cases separate adjustments may be required for each line adjacent to the drop cap.

Drop cap without runaround
Drop cap with runaround

The position of a drop cap in relation to the left margin may also need to be adjusted. Letters like "C" may need to move left slightly to visually align with the left margin.

A drop cap may be desired on a paragraph which starts with a punctuation mark, most often a quotation mark. In this case, one option is to delete the quotation mark entirely.

Input on techniques for coping with initial punctuation on drop caps would be appreciated.

8.2. Raised caps and sunken caps

A raised cap is a large letter used to start a paragraph that sits on the same baseline as the rest of the first line. A sunken cap both sinks below the text baseline and extends above the first line.

raised cap
Raised cap. The initial letter is the size of a 3-line initial, but does not drop.
sunken drop initial
Sunken cap. The letter drops to the second line, but is the size of a three-line initial letter.

Note: The CSS Working Group has proposed an initial-letter property to allow for properly-aligned drop caps. See dev.w3.org/csswg/css-inline/#DropInitial.

9. Running headers and footers

Books often have material printed at the top and/or bottom of each page, outside the normal content area. These headers or footers may serve as guideposts for readers, fodder for designers, low-tech DRM, or merely a way to know what book your fellow train passenger is reading. There’s more to running headers than is dreamt of in the open web platform…

9.1. Content

Running heads and footers may contain:

In some cases the content of running heads may have an internal structure—a chapter title might have an italic word—or may require different text styles or fonts.

Running head with text ornament

In this example, the running header contains the author name, the page number, and an ornament. This seemingly simple case was quite complex, using [css3-gcpm]-like features implemented by PrinceXML.

@page body:left {
  @top-center {
    content: flow(verso);
  }
}

p.verso-cus {
  flow: static(verso);
  content: prince-glyph-index(80);
  font-family: 'Type Embellishments One';
  font-size: 10pt;
  text-align: center;
}

p.verso-cus:before {
  content: counter(page);
  display: inline;
  padding-right: 15pt;
  font-family: 'Garamond 3 LT Std';
  font-style: italic;
  font-size: 10pt;
}

p.verso-cus:after {
  content: string(flow-header-left-rw);
  display: inline;
  padding-left: 15pt;
  font-family: 'Garamond 3 LT Std';
  font-style: italic;
  font-size: 10pt;
}

9.2. Which content?

An element whose content is used in running heads may appear many times on a page. Authors must be able to specify which instance is used. [css3-gcpm] provides the start, first, last, and first-except keywords to accomplish this:

first
The value of the first assignment on the page is used. If there is no assignment on the page, the "entry value" is used.
start
If the element is the first element on the page, the value of the first assignment is used. Otherwise the "entry value" is used. The "entry value" may be empty if the element hasn’t yet appeared.
last
The "exit value" of the named string is used.
first-except
This is identical to first, except that the empty string is used on the page where the value is assigned.
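The four keywords can be modeled in a few lines, which may help when testing use cases such as dictionaries. This Python sketch is an interpretation of the definitions above, not code from [css3-gcpm]: a page is reduced to the values assigned on it and whether an assigning element comes first, and the names `Page` and `running_values` are hypothetical.

```python
from typing import List, NamedTuple


class Page(NamedTuple):
    assignments: List[str]        # values assigned to the named string, in order
    starts_with_assignment: bool  # is an assigning element first on the page?


def running_values(pages: List[Page], keyword: str) -> List[str]:
    """Value of one named string on each page under the given keyword."""
    entry = ""                    # empty until the element has first appeared
    values = []
    for page in pages:
        exit_value = page.assignments[-1] if page.assignments else entry
        if keyword == "first":
            value = page.assignments[0] if page.assignments else entry
        elif keyword == "start":
            if page.starts_with_assignment and page.assignments:
                value = page.assignments[0]
            else:
                value = entry
        elif keyword == "last":
            value = exit_value
        elif keyword == "first-except":
            value = "" if page.assignments else entry
        else:
            raise ValueError(f"unknown keyword {keyword!r}")
        values.append(value)
        entry = exit_value        # the next page enters with this page's exit value
    return values
```

For dictionary guide words, a running head could combine a page’s `first` and `last` values, for example “Aardvark–Abacus”.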

Are these values enough to handle indexes, dictionaries, and other use cases?

9.3. Placement

Running headers and footers may appear in almost any position on a page.

Running headers are addressed by [css3-gcpm].

EPUB 3.0 has now deprecated support for headers and footers using oeb-page-head and oeb-page-foot.

10. Heads

10.1. General Considerations

TK

10.2. Heads at the top of a page

When a head falls at the top of a page, a spacing adjustment is often necessary. Here’s a typical arrangement, with a line and a half of space above the head, and a half-line-space below, so that the text stays on the proper baselines.

Level One Head in Text

If that head appears at the top of the page, the subsequent text will be off by a half-line.

Head at top of page

Everything works out if we add a half-line-space back.

Head sunk to get back on lead
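The adjustment generalizes: the space a head consumes should be a whole number of lines. This Python sketch computes how much to sink the head, assuming the head itself occupies one line and that the space above a head is discarded at the top of a page (as when a top margin is truncated at a page break). `Fraction` keeps half-line values exact; the function name is hypothetical.

```python
from fractions import Fraction


def sink_to_grid(leading, space_above, head_depth, space_below, at_top_of_page):
    """Extra space needed below a head so the following text is back on lead."""
    above = 0 if at_top_of_page else space_above
    used = above + head_depth + space_below
    return (leading - used % leading) % leading


# 12 pt leading, a one-line head, 1.5 lines above and 0.5 line below.
leading = Fraction(12)
print(sink_to_grid(leading, leading * 3 / 2, leading, leading / 2, False))  # 0
print(sink_to_grid(leading, leading * 3 / 2, leading, leading / 2, True))   # 6
```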

10.3. Heads at the bottom of a page

A head should never be the last thing on a page; it must be followed by at least two or three lines of text.

10.4. Bridge heads, side heads, and run-in heads

TK

11. Images

11.1. TK

Figure with caption and runaround

Some things to note about this image:

  1. The caption and image are treated as a unit.
  2. Text runs around the image and caption.
  3. The image runs right up to the gutter of the page (i.e., it extends beyond the usual content area).

11.2. Inline images

TK

11.3. Bleeds

TK

Images that cross spread

image before callout?

placing multiple images on page… inside/outside, top/bottom, stagger

broadside

placement of caption/title

12. Tables

12.1. Alignment

Many tables have specialized requirements for the alignment of cells in a given column.

12.1.1. Align on character

All entries in a given column may need to align to a predetermined character, most commonly a decimal point. Typically, the longest entry in the column should be centered, and then the other entries should align to that entry.

In some cases, a composite “longest entry” needs to be constructed:

|   445.85  |
| 12345.6   |
|     1.234 |
|      .1   |

In this case, the user agent should act as if 12345.234 were the longest entry, so that the margin to the left of 12345.6 equals the margin to the right of 1.234.

When a column of whole numbers with no decimal points is to be aligned, the longest whole number should be centered in the column, and the other numbers should be right-aligned with the right edge of the longest one.

If the content of a table cell is being aligned to a character, that content should not have wrapping applied by the rendering system.
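The composite-entry rule can be sketched in Python. Widths are counted in characters, which assumes monospaced rendering or figures of equal width (such as tabular figures); a real engine would measure glyph advances instead. Whole numbers are treated as if the decimal point followed their last digit, which yields the right alignment described above. `align_on_char` is a hypothetical name.

```python
def align_on_char(entries, column_width, char="."):
    """Align entries on `char`, centering the composite longest entry."""
    lefts, rights = [], []
    for entry in entries:
        i = entry.find(char)
        if i == -1:                    # whole number: align where the char would be
            lefts.append(entry)
            rights.append("")
        else:
            lefts.append(entry[:i])
            rights.append(entry[i:])
    left_w = max(len(s) for s in lefts)
    right_w = max(len(s) for s in rights)
    composite = left_w + right_w       # e.g. 12345.234 in the example above
    pad_left = (column_width - composite) // 2
    pad_right = column_width - composite - pad_left
    return [" " * pad_left + left.rjust(left_w) + right.ljust(right_w) + " " * pad_right
            for left, right in zip(lefts, rights)]


for row in align_on_char(["445.85", "12345.6", "1.234", ".1"], 11):
    print(f"|{row}|")
```

With a column 11 characters wide, this reproduces the four rows shown above.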

12.1.2. Flush left center alignment

What should we call this?

Also known as centering on the longest line, the longest line in a column is found and centered, and other entries in the column are aligned to the left edge of the longest line.

As before, header and footer cells are ignored, and the author should be able to exclude specified cells from the alignment process.

This type of alignment is often used in text, for poetry or prose extracts.

User agents should not break single-word cells.
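A Python sketch of centering on the longest line, counting widths in characters (which assumes monospaced rendering). The `exclude` parameter stands in for header, footer, or author-excluded cells, and the function name is hypothetical.

```python
def flush_left_center(lines, column_width, exclude=frozenset()):
    """Center the longest line and align the others to its left edge.

    Lines whose indices are in `exclude` are returned unchanged and are
    ignored when finding the longest line.
    """
    candidates = [s for i, s in enumerate(lines) if i not in exclude]
    longest = max(len(s) for s in candidates)
    indent = " " * ((column_width - longest) // 2)
    return [s if i in exclude else indent + s for i, s in enumerate(lines)]
```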

12.2. Table widths

In print, tables are not sized arbitrarily but are typically set to one of a few fixed widths. A composition engine therefore needs to know how to “snap” a table to one of the permitted widths. Consistent widths may also help show relationships between separate tables.
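Snapping can be as simple as choosing the narrowest permitted width that fits. In this Python sketch the permitted widths are hypothetical house values, and falling back to the widest width (letting cells wrap) when nothing fits is an assumption.

```python
def snap_width(natural_width, allowed_widths):
    """Return the narrowest allowed width that fits the table's natural width."""
    fitting = [w for w in allowed_widths if w >= natural_width]
    return min(fitting) if fitting else max(allowed_widths)


# Hypothetical house widths in points: half, two-thirds, and full measure.
print(snap_width(200, [162, 216, 324]))  # 216
```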

broadside

placement of caption/title

spread

multi-page

continued lines

13. Lists, Indexes, and Tables of Contents

13.1. Indexes

13.1.1. Collapsing Page Ranges

When generating indexes or referring to page ranges, one often ends up with duplicated or sequential numbers.

1, 3, 3, 7, 8, 9, 10, 16

This should be formatted as:

1, 3, 7–10, 16

with duplicates removed and consecutive numbers replaced by ranges.
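A Python sketch of this collapsing. It sorts the numbers first and turns any run of two or more consecutive pages into a range (so 8, 9 becomes 8–9), which is an assumption; house styles that abbreviate the second number of a range (such as 107–9) would need an extra step.

```python
def collapse_pages(pages, dash="\u2013"):
    """Deduplicate page numbers and join consecutive runs with an en dash."""
    nums = sorted(set(pages))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        parts.append(str(nums[i]) if i == j else f"{nums[i]}{dash}{nums[j]}")
        i = j + 1
    return ", ".join(parts)


print(collapse_pages([1, 3, 3, 7, 8, 9, 10, 16]))  # 1, 3, 7–10, 16
```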

14. Footnotes

Having to read footnotes resembles having to go downstairs to answer the door while in the midst of making love.

—Noël Coward

In print publishing, a footnote consists of two parts: a reference (often rendered as an asterisk or superscripted number) and the footnote body.

Footnotes themselves can be quite complicated. Footnotes can contain multiple paragraphs, block quotes, poems, lists, and tables. Footnotes can contain other footnotes (an edge case, admittedly, but David Foster Wallace was notorious for this). Footnotes can extend across multiple pages. In short, a footnote is a container that can hold almost anything.

In order to describe footnotes in HTML, one must separate the footnote reference (which is an inline element) from the footnote itself, as HTML frowns on placing complex block structures inside paragraphs. This is quite different from something like DocBook, where the content model allows a footnote element inside a paragraph, and that footnote can itself contain multiple paragraphs, etc.

<p>It was the best of times<span class="ref-footnote-rw">*</span>, it was the blurst of times.</p>

<div class="block-rw footnotes-rw">
<p><span class="num-footnote-rw">*</span>Oh yes, but the telephone is so impersonal.</p>
<p>I prefer the hands-on touch you only get with hired goons.</p>
</div>

There may also be more than one reference to the same footnote.

Footnote handling as described in [css3-gcpm] assumes the footnote is coded inline at the point of reference. This situation is under discussion on the www-style list.

14.1. Inline footnotes and multiple footnote regions

Some types of footnotes may be displayed inline, as in the top figure. Other books (see below) may have two separate streams of footnotes, requiring two footnote regions.

Page of Hamlet showing inline footnotes above block footnotes
Inline footnotes
Page of Hamlet showing inline footnotes above block footnotes
Multiple footnote regions
Multiple footnote regions #2

14.2. At the foot of what?

Footnotes usually fall at the bottom of the page, but may need to be placed at the end of a column, table, sidebar, or other document structure.

14.3. Breaking footnotes across pages

Some footnotes can extend across more than one page. Limits on the size of the footnote area(s) may be required, so that a page containing only footnotes is avoided.

Note: Sometimes, footnotes may require so much space that they cannot all be placed before the end of a document section. In this case, it’s acceptable to have pages that consist only of footnotes.

14.4. Numbering

Three questions must be answered when numbering footnotes. First, which numbering scheme should be used? Second, what are we actually numbering? Third, is the numbering system reset at some point in the document?

14.4.1. Numbering schemes

Footnotes are most commonly numbered with arabic numerals, lowercase letters, or a sequence of symbols: *, †, ‡, §, ‖, and #. When symbols are used, they may be doubled or tripled once the sequence is exhausted (**, ††, and so on), but long before ‖‖‖ is reached, the choice of numbering scheme should be re-evaluated.
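The doubling scheme is easy to generate. This Python sketch assumes the six-symbol sequence *, †, ‡, §, ‖, #, with the fifth symbol taken to be the double vertical line ‖ (U+2016); `footnote_mark` is a hypothetical name.

```python
SYMBOLS = ["*", "\u2020", "\u2021", "\u00a7", "\u2016", "#"]   # * † ‡ § ‖ #


def footnote_mark(n: int) -> str:
    """Reference mark for the n-th footnote (1-based), doubling after the sequence."""
    if n < 1:
        raise ValueError("footnotes are numbered from 1")
    symbol = SYMBOLS[(n - 1) % len(SYMBOLS)]
    return symbol * ((n - 1) // len(SYMBOLS) + 1)


print(" ".join(footnote_mark(n) for n in range(1, 9)))  # * † ‡ § ‖ # ** ††
```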

14.4.2. What are we counting?

Usually, footnote numbers count footnotes. But in some cases, the reference may be a line number, paragraph number, or section number.

14.4.3. Resetting numbers

Footnote numbering may restart with each new chapter, or each new page. The former is common with numeric footnotes, the latter with footnotes using symbols.
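Restarting is a matter of noticing when the chapter or page changes. In this Python sketch each footnote is described by the chapter and page it lands on, in document order; in practice the page is only known after layout, which is one reason per-page numbering is harder to implement. The function name and the `reset` values are hypothetical.

```python
def footnote_numbers(locations, reset="chapter"):
    """Number footnotes from their (chapter, page) locations in document order.

    `reset` is "chapter", "page", or "none".
    """
    key_for = {
        "chapter": lambda chapter, page: chapter,
        "page": lambda chapter, page: (chapter, page),
        "none": lambda chapter, page: None,
    }[reset]
    numbers, previous_key, n = [], object(), 0   # sentinel forces a start at 1
    for chapter, page in locations:
        key = key_for(chapter, page)
        n = 1 if key != previous_key else n + 1
        numbers.append(n)
        previous_key = key
    return numbers
```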

Note: Footnotes are addressed by [css3-gcpm].

Note: Digital publications often render footnotes differently from print. Footnotes may become pop-ups, or move to the end of the section or to the end of the document. We are not currently attempting to document digital best practices around footnotes.

15. Cross-references

Books often contain text that refers to other components of the same book. Such text commonly consists of the name or title of that component, along with a number used to identify that component.

as described in Chapter 14
From the Aiguille du Midi, follow A16 to 2950m
Make Anchovy Mayonnaise (page 762), with 6 or 8 anchovies.
For another example, see Figure 1.4.
and equation (31.3) shows it does not!

Many typesetting systems allow authors to generate numbers for such components. A cross-reference needs to be able to access such generated content from another location in the document.

Note: CSS provides counters to number things; creating cross-references would require a mechanism to access the value of a counter at a particular location. The target-counter and target-counters functions in [css3-content] are designed to do this.

[css3-content] does not have a mechanism to customize the content of a cross-reference based on the type of element being referred to. See https://lists.w3.org/Archives/Public/public-digipub/2015Aug/0079.html and subsequent discussion.
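The two-pass approach, and the per-type customization that [css3-content] lacks, can be sketched outside CSS. In this Python illustration elements are numbered per kind within each chapter; the `{ref:id}` placeholder syntax and the label formats are hypothetical. Page-number references such as “page 762” depend on layout and are not covered.

```python
import re

# Hypothetical house formats, keyed by the kind of element referred to.
FORMATS = {
    "chapter": "Chapter {chapter}",
    "figure": "Figure {chapter}.{n}",
    "table": "Table {chapter}.{n}",
    "equation": "({chapter}.{n})",
}


def build_labels(elements):
    """First pass: number (id, kind, chapter) entries per kind within each chapter."""
    counters, labels = {}, {}
    for el_id, kind, chapter in elements:
        key = (kind, chapter)
        counters[key] = counters.get(key, 0) + 1
        labels[el_id] = FORMATS[kind].format(chapter=chapter, n=counters[key])
    return labels


def resolve(text, labels):
    """Second pass: replace {ref:id} placeholders with the target's label."""
    return re.sub(r"\{ref:([\w-]+)\}", lambda m: labels[m.group(1)], text)


labels = build_labels([("intro", "chapter", 1), ("f1", "figure", 1),
                       ("f2", "figure", 1), ("e1", "equation", 2)])
print(resolve("For another example, see {ref:f2}.", labels))
```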

Sidebar

Some things to notice:

  1. The image floats to the top of the column inside the sidebar
  2. The columns themselves base-align
  3. The sidebar title and “supertitle” are on the same line.

17. Marginalia

alignment with reference

18. Equations

Mathematics is a critical part of many books, from learning materials for kindergartners to monographs on physics.

18.1. Breaking equations

TK

18.2. Numbering equations

Equations are often numbered. In the figure below, note that the equations are centered horizontally, the equation number is flush right, and the equation number is centered vertically relative to the equation.

Excerpt from Wikipedia article on differential forms, showing Maxwell’s equations in two lines, centered horizontally, with equation number aligned to the right margin and vertically centered
Equation Number

Note: The alignment in this figure was implemented with [css-flexbox-1].

18.3. Aligning equations

Some publishers require that all equations on a page align on the equals sign.

    x + 3z = 7 + 2y
2x + y + z = 4

Intervening text which may
extend for several lines

   10 + 2y = 3x + 2z

Note: The alignment is generally scoped to a page or (more likely) a defined set of equations.

Note: This is similar to how numbers in a table column may align on a decimal point or other character.
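Aligning a set of equations works like decimal alignment: find the rightmost equals sign and pad the other equations to match. A Python sketch for plain-text equations, assuming one `=` per equation and monospaced rendering; `align_equations` is a hypothetical name.

```python
def align_equations(equations, char="="):
    """Left-pad each equation so that its first `char` lines up with the others."""
    positions = [eq.index(char) for eq in equations]
    target = max(positions)
    return [" " * (target - p) + eq for eq, p in zip(equations, positions)]


for line in align_equations(["x + 3z = 7 + 2y", "2x + y + z = 4", "10 + 2y = 3x + 2z"]):
    print(line)
```

Applied to the three equations above, this reproduces the alignment shown.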

18.4. Annotating equations

Annotated Equation from XKCD’s What if?

19. Columns

Often the first page of a chapter or article will be set in a single column, and subsequent pages set in multiple columns.

diagram of facing text pages with one column on first page, and two on second page
One column text flows to two columns

20. Punctuation

Spacing around punctuation marks is a known obsession of typographers.

20.1. Language-specific spacing rules

Punctuation        English  French          Spanish
Exclamation Point  !        [thin space]!   ¡text!
Colon              :        [thin space]:   :
Question Mark      ?        [thin space]?   ¿text?
Open Quote                  «[thin space]
Close Quote                 [thin space]»
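The French column of the table can be applied mechanically. The following is a minimal Python sketch, not a production rule set. It assumes that U+202F NARROW NO-BREAK SPACE stands in for the table's "thin space", because a plain thin space (U+2009) would allow a line break before the mark. The function name `french_spacing` is hypothetical.

```python
import re

# Assumption: U+202F NARROW NO-BREAK SPACE stands in for the table's "thin space",
# because an ordinary thin space (U+2009) would let a line break before the mark.
NNBSP = "\u202f"


def french_spacing(text: str, space: str = NNBSP) -> str:
    """Apply the French column above: a thin space before ! ? : and inside « »."""
    # Replace whatever whitespace (possibly none) precedes a closing mark.
    text = re.sub(r"\s*([!?:»])", lambda m: space + m.group(1), text)
    # Replace whatever whitespace (possibly none) follows an opening guillemet.
    text = re.sub(r"«\s*", lambda m: "«" + space, text)
    return text


print(french_spacing("Quoi ? « Bonjour » ! Attention : danger"))
```

Any whitespace the author typed is replaced rather than added to, so the function can safely be run twice. It is deliberately naive, however. It would also insert a space into a URL such as `http://` or a time such as `12:30`, so a real implementation would need to skip those contexts.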

20.2. Em-dashes and en-dashes

To space or not to space? That is the question. Even within publishing houses, arguments continue over the proper display of em-dashes. Some imprints at Hachette use closed em-dashes; others insist on thin spaces around them. If the same book is published in the United Kingdom, the em-dashes would be replaced with en-dashes, with wider spaces around them.

Given the subtlety of many of these rules, it’s helpful to use CSS to generate typographically sophisticated output from material written by lay authors, or to adapt content to varying publisher or language requirements.

Older drafts of [css3-gcpm] contained a text-replace property, which PrinceXML implements as prince-text-replace.

body {
  /* Replace each em-dash with hair space (U+200A), em-dash, hair space */
  prince-text-replace: "—" "\200A—\200A";
}

In this example, we’re adding hair spaces around em-dashes.
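Outside a CSS formatter, the same per-imprint choice can be sketched as a text transformation. The following Python function is a minimal illustration under stated assumptions. The style names in `DASH_STYLES` are hypothetical. The "uk" entry assumes an ordinary word space for the "wider spaces" described above.

```python
import re

EM, EN = "\u2014", "\u2013"      # em-dash, en-dash
HAIR, THIN = "\u200a", "\u2009"  # hair space, thin space

# Hypothetical house styles, named for illustration only. The "uk" entry assumes
# an ordinary word space stands in for the wider spaces used in British style.
DASH_STYLES = {
    "closed": EM,
    "thin": THIN + EM + THIN,
    "hair": HAIR + EM + HAIR,  # the hair-spaced form of the CSS rule above
    "uk": " " + EN + " ",
}


def restyle_dashes(text: str, style: str) -> str:
    """Rewrite every em-dash, and any whitespace around it, in the given house style."""
    replacement = DASH_STYLES[style]
    return re.sub(r"\s*\u2014\s*", lambda m: replacement, text)


print(restyle_dashes("Wait\u2014no \u2014 really", "uk"))
```

This function first removes whatever spacing the author typed around the dash, so one manuscript can be restyled for each imprint or market. The prince-text-replace rule, by contrast, only adds spaces around each dash.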

20.3. Number formatting

Different languages have different conventions for formatting numbers. Punctuation marks are inserted at specified points in numbers to aid readability. For example, in English commas are used to separate groups of digits, and a period to denote the decimal point.

299,792,458.0

However, in many other languages, such as German and (traditionally) Spanish, the roles are reversed: the period groups digits and the comma marks the decimal point:

299.792.458,0

Grouping separator  Decimal separator  Countries
space               comma              Austria, Belgium, Brazil, Canada (fr), Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Italy, Netherlands, Norway, Peru, Poland, Portugal, Romania, South Africa, Spain, Sweden, Switzerland
period              comma              Argentina, Austria, Brazil, Denmark, Germany, Italy, Portugal, Romania, Slovenia, Spain (older)
comma               period             Great Britain, United States

Some countries appear in more than one row because both conventions are in use there.
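Each row of the table reduces to a pair of separators, as this minimal Python sketch shows. It passes the separators explicitly because Python's `locale` module depends on which locales happen to be installed on the system. When the table calls for a "space", a narrow no-break space (U+202F) is one plausible choice; that choice is an assumption, not something the table specifies. The function name `format_number` is hypothetical.

```python
def format_number(value: float, grouping: str, decimal: str, places: int = 1) -> str:
    """Format `value` with the given digit-grouping and decimal separators."""
    # Start from Python's fixed convention: "," groups digits, "." marks the decimal.
    text = f"{value:,.{places}f}"
    # Swap via a placeholder so the two replacements cannot interfere.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", grouping)


print(format_number(299792458.0, ",", "."))  # 299,792,458.0
print(format_number(299792458.0, ".", ","))  # 299.792.458,0
```

The placeholder step is necessary. Without it, replacing commas with periods and then periods with commas would undo the first replacement.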

Note: The CSSWG has informally proposed a method for formatting numbers.

21. Special Considerations for Genres

21.1. Education

21.2. Trade

21.3. STEM

21.4. Reference

22. Digital Issues

23. Large-Scale Issues in Pagination

23.1. Book optimization

In trade publishing, we often know how many pages will be in a book before it is written. The nature of printing and binding also mandates that the number of pages in a book be a multiple of eight, sixteen, or thirty-two. Publishers often limit how many blank pages are allowed at the end of a book.
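The page-count constraint amounts to simple arithmetic, sketched here in Python. The default signature of 16 pages and the limit of 4 blank pages are hypothetical values chosen for illustration; actual limits vary by publisher and printer. The function name `plan_book` is also hypothetical.

```python
def plan_book(content_pages: int, signature: int = 16, max_blank: int = 4) -> tuple[int, int, bool]:
    """Round the page count up to a whole number of signatures.

    Returns (total pages, blank pages at the end, whether the blanks are within the
    limit). The defaults for `signature` and `max_blank` are illustrative assumptions.
    """
    if content_pages <= 0 or signature <= 0:
        raise ValueError("page counts must be positive")
    signatures = -(-content_pages // signature)  # ceiling division
    total = signatures * signature
    blank = total - content_pages
    return total, blank, blank <= max_blank


print(plan_book(250))  # (256, 6, False)
```

When the last value is False, the fix has to come from earlier in the book: the text must be tightened or loosened across chapters until the blank pages fall within the limit. This is why book optimization is a whole-book problem rather than a page-by-page one.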

23.2. Chapter optimization

A chapter whose last page contains only a few lines of text looks like a mistake and wastes paper (or electrons!). Generally, the last page of a chapter should contain at least five lines of text.
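The five-line rule can be checked with a little arithmetic, as in this minimal Python sketch. It assumes every page holds the same number of lines, which ignores headings, figures and chapter-opening sinkage. The function names are hypothetical.

```python
def last_page_lines(total_lines: int, lines_per_page: int) -> int:
    """Lines on the final page of a chapter (a full page if the text divides evenly)."""
    return total_lines % lines_per_page or lines_per_page


def runt_remedies(total_lines: int, lines_per_page: int, minimum: int = 5) -> tuple[int, int]:
    """How to fix a short last page: (lines to save so the page disappears,
    lines to gain so it reaches `minimum`). Both are 0 if the page is long enough."""
    last = last_page_lines(total_lines, lines_per_page)
    if last >= minimum:
        return 0, 0
    return last, minimum - last


print(runt_remedies(203, 40))  # (3, 2)
```

The result shows the two options for a 203-line chapter set at 40 lines per page. Tightening earlier pages by 3 lines removes the short last page. Alternatively, loosening by 2 lines brings that page up to the 5-line minimum.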

Appendix A: Baseline Grids

A baseline grid is a series of evenly spaced horizontal alignment lines. It is used to give a design vertical rhythm, to align adjacent content (text or graphics), and to align baselines on facing pages in printed material.

The grid lines can be spaced at line-height intervals or a factor of line-height.

Content can be aligned to the grid in various ways. Roman body text typically sets the baseline on a grid line. Graphics might have their top, bottom or both set on grid lines, or be centered between grid lines. Text blocks (consider a multi-line heading with line-height at 1.4x grid height) might have their last baseline or first baseline on a grid line, or have the block’s combined height centered between grid lines. Centering is much more important in ideographic type systems.

If normal layout would result in a misalignment, content shifts down to the next available grid line.
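The shift-down rule reduces to rounding a position up to the next multiple of the grid interval. The following Python sketch assumes a single grid described by an origin and an interval, with positions measured down the page. It shows only the snapping step. The choice of which point to snap (first baseline, last baseline, or a centered block) is made by the caller. The function name `snap_to_grid` and the 12pt grid in the example are hypothetical.

```python
import math


def snap_to_grid(y: float, grid: float, origin: float = 0.0, eps: float = 1e-9) -> float:
    """Return the first grid line at or below position y (y grows down the page).

    y, grid and origin share one unit, e.g. points; origin is where the grid starts.
    """
    steps = (y - origin) / grid
    # eps absorbs floating-point noise so a baseline already on a line stays put.
    return origin + math.ceil(steps - eps) * grid


# A heading with line-height at 1.4x a 12pt grid puts its baseline at 16.8pt,
# so the heading shifts down to the next grid line.
print(snap_to_grid(16.8, 12.0))  # 24.0
```

Content that opts out of the grid would simply skip this call. Content that belongs to a different grid, such as side content, would call it with that grid's own origin and interval.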

Sometimes it’s necessary to have particular content opt out of aligning to a grid.

There can be one or more grids per document. Multiple grids can overlap (body grid and side content grid) or run in series (a vertical stack of pages). Grids can be nested (think of a document being represented as a graphic inside another document). A particular piece of content only aligns to a single grid.

Appendix B: Of Leading and Sinkage: The Language of Print

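Translating print designs to the open web platform can be tricky, partly because print and CSS often use different terms for the same thing.

Leading: corresponds roughly to line-height in CSS
Recto: the right-hand page of a spread
Verso: the left-hand page of a spread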

The Classical Rules of Hyphenation and Pagination

  1. At hyphenated line-ends, leave at least two characters behind, and take at least three forward.
  2. Avoid leaving the stub-end of a hyphenated word, or any word shorter than four letters, as the last line of a paragraph.
  3. Avoid more than three consecutive hyphenated lines.
  4. Hyphenate proper names only as a last resort unless they occur with the frequency of common nouns.
  5. Hyphenate according to the conventions of the language.
  6. Link short numerical and mathematical expressions with hard spaces.
  7. Avoid beginning more than two consecutive lines with the same word.
  8. Never begin a page with the last line of a multi-line paragraph.
  9. Balance facing pages by moving single lines.
  10. Avoid hyphenated breaks where the text is interrupted.
  11. Abandon any and all rules of hyphenation and pagination that fail to serve the needs of the text.
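Rules 1 and 3 are mechanical enough to check in code. The following is a minimal Python sketch. It assumes the candidate hyphenation points come from a dictionary or pattern file, as in the prince-hyphenate-patterns issue; here they are supplied by hand. It also treats any line ending in "-" as hyphenated, which counts explicit hyphens in compounds too. The function names are hypothetical.

```python
def allowed_breaks(word: str, points: list[int], before: int = 2, after: int = 3) -> list[int]:
    """Rule 1: keep hyphenation points that leave at least `before` letters behind
    and take at least `after` letters forward. `points` are indexes into `word`
    where a hyphen could go (supplied by a dictionary; here, by hand)."""
    return [p for p in points if p >= before and len(word) - p >= after]


def too_many_hyphens(lines: list[str], limit: int = 3) -> bool:
    """Rule 3: True if more than `limit` consecutive lines end with a hyphen."""
    run = 0
    for line in lines:
        run = run + 1 if line.rstrip().endswith("-") else 0
        if run > limit:
            return True
    return False


# "hy-phen-at-ed": the last point would carry only "ed" forward, so rule 1 drops it.
print(allowed_breaks("hyphenated", [2, 6, 8]))  # [2, 6]
```

Rule 11 is a reminder that a formatter should treat checks like these as penalties to weigh against one another, not as absolute prohibitions.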

Further Reading

Bringhurst, Robert. The Elements of Typographic Style

Felici, Jim. The Complete Manual of Typography

Haralambous, Yannis. Fonts & Encodings: From Advanced Typography to Unicode and Everything in Between

Haslam, Andrew. Book Design

Highsmith, Cyrus. Inside Paragraphs

Kane, John. A Type Primer

Knuth, Donald. Digital Typography

Lawson, Alexander. Anatomy of a Typeface

Mitchell, Michael; Wightman, Susan. Book Typography

Nickel, Kristina. Ready to Print

Steer, Vincent. Printing Design and Layout (1948)

Tracy, Walter. Letters of Credit: A View of Type Design

Tschichold, Jan. The Form of the Book: Essays on the Morality of Good Design

Acknowledgments

Eric Aubourg, Luc Audrain, Bert Bos, Tom Byrer, James Clark, Brady Duga, Ivan Herman, Tony Graham, Bill Kasdorf, Jean Kaplansky, Sanders Kleinfeld, Liam Quin, Alan Stearns, Tzviya Siegman

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Informative References

[CSS3TEXT]
Elika Etemad; Koji Ishii. CSS Text Module Level 3. 10 October 2013. LCWD. URL: http://www.w3.org/TR/css-text-3/
[CSS-FLEXBOX-1]
Tab Atkins Jr.; Elika Etemad; Rossen Atanassov. CSS Flexible Box Layout Module Level 1. 14 May 2015. LCWD. URL: http://www.w3.org/TR/css-flexbox-1/
[CSS3-BREAK]
Rossen Atanassov; Elika Etemad. CSS Fragmentation Module Level 3. 29 January 2015. WD. URL: http://www.w3.org/TR/css3-break/
[CSS3-CONTENT]
Ian Hickson. CSS3 Generated and Replaced Content Module. 14 May 2003. WD. URL: http://www.w3.org/TR/css3-content
[CSS3-GCPM]
Dave Cramer. CSS Generated Content for Paged Media Module. 13 May 2014. WD. URL: http://www.w3.org/TR/css-gcpm-3/
[CSS3-PAGE]
Melinda Grant; et al. CSS Paged Media Module Level 3. 14 March 2013. WD. URL: http://www.w3.org/TR/css3-page/
[JLREQ]
Yasuhiro Anan; et al. Requirements for Japanese Text Layout. 3 April 2012. NOTE. URL: http://www.w3.org/TR/jlreq/

Issues Index

PrinceXML has the prince-hyphenate-before property, but this is not in any current CSS draft.
PrinceXML has the prince-hyphenate-after property, but this is not in any current CSS draft.
PrinceXML has the prince-hyphenate-lines property, but this is not in any current CSS draft.
PrinceXML uses a prefixed property prince-hyphenate-patterns: url('en_US.dic'); to load a hyphenation dictionary. No current CSS specification includes support for this idea.
[css3-break] does not consider a fractional value for the widows or orphans properties.
CSS does not currently address the second meaning of orphan.
This is an example where the page position of an element determines its content as well as design. A ::page-top or ::page-bottom pseudo-element might prove useful.
Input on techniques for coping with initial punctuation on drop caps would be appreciated.
Are these values enough to handle indexes, dictionaries, and other use cases?
What should we call this?
[css3-content] does not have a mechanism to customize the content of a cross-reference based on the type of element being referred to. See https://lists.w3.org/Archives/Public/public-digipub/2015Aug/0079.html and subsequent discussion.