Internationalization tips for linking to headings & figures

It can be very helpful, especially for longer documents, to use scripting so that an HTML page automatically generates the content of a link that points to a heading, a figure, or a table. For example, the script might pull the actual heading text into the link and, if the headings are automatically numbered using CSS, also pull in the number of the heading. Then, as the document evolves and the heading text or its position in the document changes, there is no need to update all the links pointing to that heading.

This article looks at what you need to bear in mind when linking to headings and figures in a page that will be translated into another language, or in a multilingual page, and provides markup templates that will be helpful.

This article doesn't provide a complete script to do the job; it focuses instead on the markup you should aim to produce. Most of the markup examples in the article represent the outcome of the link generation process, rather than the markup that the content author actually writes.

Summary

Here we summarise the bare bones of the solution. Read the rest of the article for more detail.

The markup used by the content author can vary, but should allow them to easily see what the link will eventually point to. The following is just an example.

<p>A description of <a>#myFigure</a> is provided in <a>#myHeading</a>.</p>

The remaining examples in this summary show the outcome of the link generation process, rather than the markup that the content author actually writes.

Links to unnumbered headings would typically benefit from a structure such as the following.

<a class="secref" href="#myHeading" lang="cy" dir="ltr">Gwneud y we fyd-eang yn <em>wirioneddol</em> fyd-eang!</a>

However, for numbered headings, the number should be isolated from the rest of the heading text, and the lang and dir attributes should be applied to the latter, as in this example.

<a class="secref" href="#myHeading4">2. <span lang="ar" dir="rtl">نشاط التدويل، W3C</span></a>

For the link text, include all the markup inside the heading tag, not just the characters.

Also identify the type of link using semantic class names, so that, for example, links to headings can be styled differently from links to figures.

The lang attribute indicates the computed language of the heading itself.

The dir attribute indicates the computed direction of the heading itself.

If your code generates text, such as "Fig." to be included in the link text, ensure that it is easily localised for translated content. Also, don't assume that spacing and punctuation will be the same for all languages: ensure that the punctuation and spacing are part of the localised string. For example, a link to Fig. 12 would become 図12 in Japanese.

The rest of the article provides additional details.

Extracting heading text

Make sure that you extract the markup inside the heading, rather than just the characters.

Avoid using .textContent to access the text of a heading. Instead use a method that copies all the nodes inside the heading tag.

Apart from the fact that it often makes the link text more readable when you include the inline markup that the heading contained, this is particularly important if the heading text contains markup that sets the language or direction of the text it contains.

For example, the following heading uses two important spans. The first makes it possible to apply a font capable of handling the diacritics in the transcription, and the second both supplies language (enabling use of an appropriate Syriac Estrangelo font) and direction information. It's important to carry that markup into the link text.

<h2>Using <span class="transcription">maǧlīyānā</span> (<span lang="syr-Syre" dir="rtl">ܡܓ̰ܠܝܢܐ‎</span>) to reproduce classical phonemes</h2>
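As a minimal sketch of the node-copying approach, the function below deep-clones every child node of a heading into a link element, so that inline markup such as the two spans above is carried across. The function name copyHeadingContent is hypothetical, and the sketch assumes that the heading and link are ordinary DOM elements.

```javascript
// Hypothetical helper: copy a heading's content, markup included,
// into a link element. Each child node is deep-cloned so that inline
// elements (for example, spans carrying lang and dir) are preserved.
function copyHeadingContent(heading, link) {
  for (const node of heading.childNodes) {
    // cloneNode(true) copies the node together with its descendants.
    link.appendChild(node.cloneNode(true));
  }
  return link;
}
```

A script might call this as copyHeadingContent(document.getElementById('myHeading'), link). If elements inside the heading carry id attributes, the clones will duplicate them, so a real script would probably need to remove those from the copies.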

Differentiating heading links, figure links, etc.

In some languages it may be desirable to style links containing heading text differently from links containing figure pointers.

For this reason it is useful to distinguish the different types of link from each other, using class names that indicate the semantics of the markup.

Here's an example of class names that distinguish a figure link from a heading link.

<p>A description of <a class="figref" href="#myFigure">Fig. 12</a> is provided in <a class="secref" href="#myHeading">Description of the figure</a>.</p>

Generated text

In a multilingual document or a translation of a document it may be necessary to adapt any text generated by your script.

For instance, the example in the previous section generates the text "Fig. " and inserts it into the link text alongside the number of that figure. If the same script is used for a Japanese translation of the document, or is applied to a section of the document that is in Japanese rather than in English, it would be necessary to replace the generated text with "図".

Note, significantly, that no period or space is needed for the Japanese version. It's important not to assume in your script that all languages will use punctuation and spacing in the same way. Those characters should be part of the localised replacement string, rather than added by a separate line of code.

The box below shows what the generated link markup could look like. (Note also that there is no space inserted around the link text "図12", either.)

<p><a class="figref" href="#myFigure">図12</a>の説明は<a class="secref" href="#myHeading">図の説明</a>にあります。</p>
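One way to keep punctuation and spacing inside the localised string is to store a whole template per language, rather than a bare word. The following is only a sketch under that assumption: the names FIGURE_LABELS and figureLabel are hypothetical, the table holds just the two languages discussed here, and falling back to English for unlisted languages is an arbitrary choice.

```javascript
// Hypothetical table of localised figure-label templates. Each template
// owns its punctuation and spacing: English needs ". " between the label
// and the number, Japanese needs nothing.
const FIGURE_LABELS = new Map([
  ['en', (n) => `Fig. ${n}`],
  ['ja', (n) => `図${n}`],
]);

// Returns the link text for a figure reference in the given language.
// Only the primary language subtag is used for the lookup, so "en-GB"
// matches "en". Unlisted languages fall back to English (an assumption).
function figureLabel(number, lang) {
  const primary = (lang || '').toLowerCase().split('-')[0];
  const template = FIGURE_LABELS.get(primary) || FIGURE_LABELS.get('en');
  return template(number);
}
```

With this table, figureLabel(12, 'ja') produces the link text used in the Japanese example above, and no separate line of code adds a period or a space. The sketch does not attempt to localise the digits themselves.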

Applying important contextual information

The following guidelines are particularly important when dealing with a heading in a different language from the paragraph containing the link, but some aspects may also be important in monolingual situations. Including this markup should not cause any problems for the general case.

The language of the link text

Take an example of an English paragraph that refers to a section heading in Japanese. Here's the section heading:

国際化活動 W3C

A Japanese heading.

The markup for the heading is as follows. The language is indicated in the h4 tag in this case, but it may be indicated in a previous tag and inherited by the h4 tag.

<h4 id="myHeading" lang="ja">国際化活動 W3C</h4>

It's important to ensure that a Japanese font gets associated with the Japanese text in the link, rather than a Chinese font, which is the default for ideographic characters in some browsers. However, if we only pull out the nodes inside the heading tag, we lose the information that the language is Japanese.

Incorrect result.

This is a paragraph that points to the section <a class="secref" href="#myHeading">国際化活動 W3C</a>.

The code inserting the reference link into the paragraph needs to get the computed language of the heading (as a whole) and apply that to the anchor tag. The end result we are seeking is like this.

This is a paragraph that points to the section <a class="secref" href="#myHeading" lang="ja">国際化活動 W3C</a>.
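A script could find the computed language along the following lines. This is a sketch rather than a definitive implementation: the function name computedLang is hypothetical, and it only considers lang attributes in the DOM, ignoring xml:lang and any language declared outside the markup (for example, in HTTP headers).

```javascript
// Hypothetical helper: find the language that applies to an element by
// looking for the nearest lang attribute, starting with the element
// itself and then walking up through its ancestors.
// Returns an empty string when no lang attribute is found.
function computedLang(element) {
  const holder = element.closest('[lang]');
  return holder ? holder.getAttribute('lang') : '';
}
```

The result can then be copied to the link, for example with link.setAttribute('lang', computedLang(heading)) when the returned value is not empty.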

The base direction

There are similar considerations for documents containing text that runs in multiple directions; for example, when embedding Arabic, Hebrew, Dhivehi, or other right-to-left text into a Latin context. It is essential to apply the correct base direction to a sequence of characters that is inserted into a paragraph, and this requires finding the computed direction of the original heading.

For example, take a heading such as:

<h4 id="myHeading" lang="ar" dir="rtl">نشاط التدويل، W3C</h4>

In this case the direction is declared on the heading tag itself, but it could be declared on an element further up the hierarchy. This code displays as shown just below. Note the location of the text "W3C".

نشاط التدويل، W3C

An Arabic heading with Latin text at the end.

Drop just the inner HTML of the heading tag into our previous paragraph, and you get the "W3C" on the wrong side of the Arabic text:

Incorrect result.

This is a paragraph that points to نشاط التدويل، W3C.

If, on the other hand, we apply a dir attribute with the appropriate direction value around the inserted text ...

This is a paragraph that points to <a class="secref" href="#myHeading" lang="ar" dir="rtl">نشاط التدويل، W3C</a>.

... then this produces the expected result:

This is a paragraph that points to نشاط التدويل، W3C.
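One possible way to find the computed direction of the heading is to ask the browser for the element's resolved CSS direction, as in the sketch below. The function name computedDir is hypothetical. The sketch assumes that the heading is attached to the document, and note that it would also reflect a direction set by an author style sheet rather than by a dir attribute. The optional view parameter exists only so that the function can be exercised outside a browser.

```javascript
// Hypothetical helper: return the resolved base direction of an element,
// either "ltr" or "rtl", as reported by the browser.
// In a page, `view` defaults to the global window object.
function computedDir(element, view = globalThis) {
  return view.getComputedStyle(element).direction;
}
```

The value can then be applied to the link, for example with link.setAttribute('dir', computedDir(heading)).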

Bidirectional isolation

In bidirectional text, it generally makes sense to isolate strings or markup fragments that are dropped into a location in an HTML file, so that the text inside the string doesn't interact directionally with text outside it. In HTML5 the dir attribute isolates the content of the tag it is attached to from the surrounding text, so we have been isolating the reference from the surrounding paragraph already.

However, let's suppose that we want the reference link to include the number of the heading, as well as the heading text.

Suppose our heading looks like this in its original RTL context.

2. نشاط التدويل، W3C

A numbered Arabic heading.

We would expect to see something like this in an English paragraph:

This is a paragraph that points to 2. نشاط التدويل، W3C.

Unfortunately, if we simply follow the advice given so far (to put dir on the anchor tag), we would see the following:

Incorrect result.

This is a paragraph that points to 2. نشاط التدويل، W3C.

The "W3C" is in the right place, but the section number looks wrong. To make this work as expected, we need to also isolate the section number from the heading text, and add the directional information to the heading text, rather than to the anchor tag as a whole. The following code would do the job.

<p>This is a paragraph that points to <a class="secref" href="#myHeading4">2. <span lang="ar" dir="rtl">نشاط التدويل، W3C</span></a>.</p>

You could also, if you wanted, move the "2" out of the link altogether, like this:

<p>This is a paragraph that points to 2. <a class="secref" href="#myHeading4" lang="ar" dir="rtl">نشاط التدويل، W3C</a>.</p>

This approach would still produce the expected results if the text were dropped into an RTL paragraph. In that case we do want the section number to appear to the right. There is no need to change the markup for this to happen.

هذه فقرة تشير إلى 2. نشاط التدويل، W3C.

It also works with alphabetic section numbers.
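The markup with the isolated section number could be generated by something like the following sketch. The function name buildNumberedSecref and its parameters are hypothetical: it assumes that the caller has already worked out the formatted section number (including its punctuation and spacing) and the computed language and direction of the heading. The doc parameter would be the page's document object.

```javascript
// Hypothetical helper: build a link to a numbered heading, keeping the
// section number outside the span that carries lang and dir.
//   doc     - the document used to create nodes
//   heading - the heading element being pointed to (must have an id)
//   number  - the formatted section number, e.g. "2. "
//   lang    - the computed language of the heading, e.g. "ar"
//   dir     - the computed direction of the heading, "ltr" or "rtl"
function buildNumberedSecref(doc, heading, number, lang, dir) {
  const link = doc.createElement('a');
  link.className = 'secref';
  link.setAttribute('href', '#' + heading.id);

  // The number sits directly inside the link, outside the isolated span.
  link.appendChild(doc.createTextNode(number));

  // The heading content goes in a span with its own language and direction.
  const span = doc.createElement('span');
  if (lang) span.setAttribute('lang', lang);
  span.setAttribute('dir', dir);
  for (const node of heading.childNodes) {
    span.appendChild(node.cloneNode(true));
  }
  link.appendChild(span);
  return link;
}
```

Because the dir attribute is on the span rather than on the anchor, the number takes its position from the surrounding paragraph, which is what allows the same markup to work in both LTR and RTL contexts.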

Should I use dir=auto?

The auto value of the dir attribute looks at the incoming data, finds the first strong directional character (in the examples just above, an Arabic letter), and sets the base direction for the contents of the anchor tag to that direction (for the examples above, RTL).

If, however, the heading begins with ASCII characters (on the right), like this ...

HTML و CSS: تصميم و إنشاء مواقع الويب

An Arabic heading with Latin text at the start.

... we have a problem.

Incorrect result.

This is a paragraph that points to HTML و CSS: تصميم و إنشاء مواقع الويب.

Because the first strong bidirectional character happens to be "H" in this case, the base direction for the inserted text is set to LTR.

In summary, since first-strong heuristics can be fooled, it's better to ascertain the base direction of the heading by looking at the DOM, and apply that to the embedded link.