Ruby content model

This is a draft intended to promote discussion and to work through ideas, and may be updated at any time.

Problem statement

The Ruby Annotation spec and HTML5 contain different content models for ruby markup. Other specs, such as TTML and WebVTT, are also developing content models. CSS Ruby, as it develops, will place some additional requirements on ruby content models.

These notes will hopefully lead us to document a basic content model for ruby that can be used for future ruby-related developments, and that provides a common core promoting consistency between existing models.

Note carefully that what follows talks about the content model, not the markup model. For example, where you see rb, it doesn't necessarily mean that there are rb markup tags. It represents the concept of base text. The content model shows the relationships between the various items that make up ruby. Specifications using this content model may (though it's not necessarily recommended) use very different markup tag names for the elements. They may also extend the model in some way, and they may introduce rules about whether closing or opening tags are needed.

Existing models

All of these need further work; this is just a first stab at describing them.

HTML5

A simplified view of the HTML5 model (ignoring rp and ignoring nesting) is:

ruby
:	(rb rt)+
|	rb+ rt+
|	rb+ rtc rtc?

rtc
:	rt+

rb
:	(text|span)+

An rb may or may not have markup, and may contain spans as well as text. There are certain additional points that need to be made, such as how to map rbs and rts.
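
As a rough illustration, the simplified model above can be checked mechanically by treating the children of a ruby as a sequence of names and matching that sequence against a regular expression. This is only a sketch: the function name and the space-separated encoding are made up for this example, and it ignores rp, nesting and the content of rb.

```python
import re

# Simplified HTML5 ruby model, written as a regular expression over a
# space-terminated list of child names. The encoding is an assumption of
# this sketch, not something defined by HTML5.
HTML5_RUBY = re.compile(
    r"(?:rb rt )+"              # (rb rt)+
    r"|(?:rb )+(?:rt )+"        # rb+ rt+
    r"|(?:rb )+rtc (?:rtc )?"   # rb+ rtc rtc?
)

def matches_html5_ruby(children):
    """Return True if the list of child names fits the simplified model."""
    encoded = "".join(name + " " for name in children)
    return HTML5_RUBY.fullmatch(encoded) is not None

print(matches_html5_ruby(["rb", "rt", "rb", "rt"]))   # interleaved pairs
print(matches_html5_ruby(["rb", "rb", "rtc", "rtc"]))  # bases plus two containers
print(matches_html5_ruby(["rt", "rb"]))                # annotation before base
```

The three alternatives in the expression correspond one to one with the three lines of the ruby production, so a change to the model only needs a change to the matching line.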

XHTML (Ruby Annotation)

The XHTML model is:

ruby
:	rb rt
|	rbc rtc rtc?

rbc
:	rb+

rtc
:	rt+
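
Because this model has container items (rbc and rtc) with their own rules, a check has to descend into the children. The following is a minimal sketch of that idea, assuming a hypothetical representation where each item is a (name, children) tuple and rb and rt are treated as leaves. None of these names come from the Ruby Annotation spec.

```python
import re

# One rule per production of the XHTML model, expressed over a
# space-terminated list of child names (an encoding assumed for this sketch).
XHTML_MODEL = {
    "ruby": re.compile(r"rb rt |rbc rtc (?:rtc )?"),  # rb rt | rbc rtc rtc?
    "rbc": re.compile(r"(?:rb )+"),                   # rb+
    "rtc": re.compile(r"(?:rt )+"),                   # rt+
}

def is_valid(item):
    """Check an item (name, children) and, recursively, its children."""
    name, children = item
    pattern = XHTML_MODEL.get(name)
    if pattern is None:
        # rb and rt (and anything else) have no structural rule here.
        return True
    encoded = "".join(child[0] + " " for child in children)
    if pattern.fullmatch(encoded) is None:
        return False
    return all(is_valid(child) for child in children)

simple = ("ruby", [("rb", []), ("rt", [])])
complex_ruby = ("ruby", [
    ("rbc", [("rb", []), ("rb", [])]),
    ("rtc", [("rt", []), ("rt", [])]),
    ("rtc", [("rt", [])]),
])
print(is_valid(simple), is_valid(complex_ruby))
```

Note that an interleaved rb rt rb rt sequence is rejected by this check, since the XHTML model allows only a single rb rt pair outside the container form.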

TTML

The TTML model is the same as the XHTML model.

WebVTT

ruby
:	(rb rt)+

rb
:	WebVTT_cue_internal_text

rt
:	WebVTT_cue_internal_text

The WebVTT_cue_internal_text can include just text, or text with certain marked up inline items.
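
WebVTT is a case where the rb in the model has no tag of its own: the base is simply the text that precedes each rt inside a ruby span. The sketch below pulls (rb rt) pairs out of a cue under strong simplifying assumptions: it only looks at the first ruby span, it expects every rt to be explicitly closed, and it treats both base and annotation as plain text with no inline markup. The function name is made up for this example.

```python
import re

# A base is any run of text up to the next tag; its annotation is the
# plain-text content of the rt that follows it.
PAIR = re.compile(r"([^<]+)<rt>([^<]*)</rt>")

def ruby_pairs(cue_text):
    """Return [(base, annotation), ...] for the first ruby span in a cue."""
    span = re.search(r"<ruby>(.*?)</ruby>", cue_text, re.S)
    if span is None:
        return []
    return PAIR.findall(span.group(1))

print(ruby_pairs("<ruby>漢<rt>kan</rt>字<rt>ji</rt></ruby>"))
```

A real implementation would need a proper cue parser, since the internal text can carry other marked-up inline items.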

More stuff to consider

Use cases

Observations

On the whole, it’s better to use elements to identify base text, since the base text is then accessible for manipulation or styling.

Other things to discuss

Separate out rp. Do we need it? It makes everything more complicated if we do, since we need to allow its use in more places than RA allows.

On the other hand, BaBaBa should either stay the same or become BBBaaa when inline.

RA should be extended to allow BaBaBa arrangements (?)

HTML nested arrangements should probably not be in this model

rbc may not be needed

There must be rules about how to deal with mismatched numbers of rb and rt.
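
To make two of these points concrete, here is a small sketch. The first function shows the BaBaBa to BBBaaa regrouping, reading B as a base and a as an annotation. The second shows one possible way of handling mismatched numbers of rb and rt, by padding the shorter side with an empty slot. That padding policy is purely an assumption for illustration, not a proposal, and both function names are hypothetical.

```python
from itertools import zip_longest

def to_grouped(pairs):
    """BaBaBa -> BBBaaa: all bases first, then all annotations."""
    bases = [base for base, _ in pairs]
    annotations = [annotation for _, annotation in pairs]
    return bases + annotations

def pair_up(bases, annotations):
    """Pair rb and rt in order; None marks a missing partner (assumed policy)."""
    return list(zip_longest(bases, annotations))

print(to_grouped([("東", "とう"), ("京", "きょう")]))
print(pair_up(["a", "b", "c"], ["x"]))
```

Other policies are equally plausible, such as letting the last rt span all remaining rbs, which is why the rules need to be written down.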

The model

This is not even close to being useful. It's just preliminary sketches. Don't use or refer to it.

ruby
:	(rb rt)+
|	rb+ rt+
|	rb+ rtc rtc?

rtc
:	rt+

rb
:	(text|span)+

Each rtc or rtc+rtc must correspond to one rb.

Author: Richard Ishida