Incremental Font Transfer

Editor’s Draft,

This version:
https://w3c.github.io/PFE/Overview.html
Latest published version:
https://www.w3.org/TR/example/
Feedback:
public-webfonts-wg@w3.org with subject line “[PFE] … message topic …” (archives)
Issue Tracking:
GitHub
Inline In Spec
Editors:
Chris Lilley (W3C)
(Apple Inc.)
(Google Inc.)

Abstract

Example example

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

If you wish to make comments regarding this document, please file an issue on the specification repository.

This document was produced by the Web Fonts Working Group

This document was produced by groups operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 15 September 2020 W3C Process Document.

This is a largely empty document because we have just started working on it.

1. Introduction

This section is not normative.

The success of WebFonts is unevenly distributed. This specification allows WebFonts to be used where slow networks, very large fonts, or complex subsetting requirements currently preclude their use. For example, even using WOFF 2 [WOFF2], fonts for CJK languages are too large to be practical.

See the Progressive Font Enrichment: Evaluation Report [PFE-report] for the investigation which led to this specification.

2. Patch Based Incremental Transfer

TODO(garretrieger): Describe the high level version of how this version operates.

2.1. Data Types

This section lists all of the data types that are used to form the request and response messages sent between the client and server.

2.1.1. Encoding

All data types defined here are encoded into a byte representation for transport using CBOR (Concise Binary Object Representation) [rfc8949]. More information on how each data types should be encoded by CBOR are given in the definition of those data types.

2.1.2. Primitives

Data Type Description CBOR Major Type
Integer An integer value range [-2^64 - 1, 2^64 - 1] inclusive 0 or 1
ByteString Variable number of bytes 2
ArrayOf<Type> Array of a variable number of items of Type 4

2.1.3. SparseBitSet

A data structure which compactly stores a set of distinct unsigned integers. The set is represented as a tree where each node has a fixed number of children that recursively sub-divides an interval into equal partitions. A tree of height H with branching factor B can store set membership for integers in the interval [0 to BH-1] inclusive. The tree is encoded into a ByteString for transport.

To construct the tree T which encodes set S first select the branching factor B (how many children each node has). B can be 4, 8, 16, or 32.

Note: the encoder can use any of the possible branching factors, but it is recommended to use 4 as that has been shown to give the smallest encodings for most sets typically encountered.

Next, determine the height, H, of the tree:

H = ceil(logB(max(S) + 1))

Next create a tree of height H where all non-leaf nodes have B children. Each node in the tree has a single value composed of B bits. Given a node p which has B children: c0 ... cB - 1 and is in a tree, T, of height H, then:

The tree is encoded into a bit string. The bits of the bit string are numbered from 0 to M. When appending multi bit values to the bit string, bits are added in order from least significant bit to most significant bit.

First append 2 bits which encode the branching factor:

Bits Branching Factor
00 4
01 8
10 16
11 32

Then append the value H - 1 as a 6 bit unsigned integer.

Next the nodes are encoded into the bit string by traversing the nodes of the T in level order and appending the value for each non-zero node to the bit string.If all set values covered by a node’s interval are present within set S that node can instead be encoded in the bit string as B bits all set to zero. All children of that node must not be encoded.

Lastly the bit string is converted into a ByteString by converting each consecutive group of 8 bits into the next byte of the string. The bit with the smallest index in the bit string is the least significant bit in the byte and the bit with the largest index is the most significant bit.

The set {2, 33, 323} in a tree with a branching factor of 8 is encoded as the bit string:
  BitString:
  |- header |- lvl 0 |---- level 1 ----|------- level 2 -----------|
  |         |   n0   |   n1       n2   |   n3       n4       n5    |
  [ 10010000 10000100 10001000 10000000 00100000 01000000 00010000 ]

  Which then becomes the ByteString:
  [
    0b00001001,
    0b00100001,
    0b00010001,
    0b00000001,
    0b00000100,
    0b00000010,
    0b00001000
  ]

First determine the height of the tree:

H = ceil(log8(323 + 1)) = 3

Then append

Level 0:

Level 1:

Level 2:

The set {0, 1, 2, ..., 17} can be encoded with a branching factor of 4 as:
  BitString:
  |- header | l0 |- lvl 1 -| l2  |
  |         | n0 | n1 | n2 | n3  |
  [ 00010000 1100 0000 1000 1100 ]

  ByteString:
  [
    0b00001000,
    0b00000011,
    0b00110001
  ]

First determine the height of the tree:

H = ceil(log4(17 + 1)) = 3

Then append

Level 0:

Level 1:

Level 2:

2.1.4. Objects

Objects are data structures comprised of key and value pairs. Objects are encoded via CBOR as maps (major type 5). Each key and value pair is encoded as a single map entry. Keys are always unsigned integers and are encoded using major type 0. Values are encoded using the encoding specified by the type of the value.

All fields in an object are optional and do not need to have an associated value. Conversely when decoding and object fields may be present which are not specified in the schema. The decoder must ignore without error any key and value pairs where the key is not recognized.

There are several types of object used, each type is defined by a schema in § 2.1.5 Object Schemas. The schema for a type specifies for each field:

2.1.5. Object Schemas

2.1.5.1. CompressedList
ID Field Name Value Type
0 value_deltas ArrayOf<Integer>

Encodes a list of unsigned integers. The list is ordered and allows duplicate values. Given a list L to be encoded the array value_deltas is calculated:

value_deltas = []
if len(value_deltas) > 0:
  value_deltas[0] = L[0]
  for i in range(1, len(value_deltas)):
    value_deltas[i] = L[i] - L[i-1]
The list [2, 2, 5, 1, 3, 7] would be encoded as [2, 0, 3, -4, 2, 4].
2.1.5.2. CompressedSet

Encodes a set of unsigned integers. The set is not ordered and does not allow duplicates. Members of the set are encoded into either a sparse bit set or a list of ranges. To obtain the final set the members of the sparse bit set and the list of ranges are unioned together.

The list of ranges is encoded as a series of deltas. For example the ranges

[3, 10], [13, 15], [17, 17] would be encoded as [3, 7, 3, 2, 2, 0].

ID Field Name Type
0 sparse_bit_set SparseBitSet
1 range_deltas ArrayOf<Integer>
2.1.5.3. PatchRequest
ID Field Name Value Type
0 protocol_version Integer
1 original_font_checksum Integer
2 base_checksum Integer
3 patch_format ArrayOf<Integer>
4 codepoints_have CompressedSet
5 codepoints_needed CompressedSet
6 ordering_checksum Integer
7 indices_have CompressedSet
8 indices_needed CompressedSet

patch_format can include be any of the values specified in § 2.8.3 Patch and Compression Formats.

2.1.5.4. PatchResponse
ID Field Name Value Type
0 response_type Integer
1 original_font_checksum Integer
2 patch_format Integer
3 patch ByteString
4 patched_checksum Integer
4 codepoint_ordering CompressedList
5 ordering_checksum Integer

response_type can be one of the following values:

Value Response Type
0 PATCH
1 REBASE
2 REINDEX

2.2. New Font Request

A new font request is sent by the client to get data for a font that it has not previously requested. In response the server will give the client a subset of the requested font that covers the codepoints listed in the request. A new font request is made via HTTP and may use either the GET or POST method.

TODO(garretrieger): mention how the url identifies the specific font being requested.

2.2.1. POST

If sent as a POST request the post body will be a single PatchRequest object encoded via CBOR. All fields of PatchRequest should be left unset except for:

2.2.2. GET

If sent as a GET request the client will include a single query parameter:

2.2.3. Standard Response

The server will response to a new font request with a New Font Response.

2.2.4. Errors

The following errors may occur as a result of a new font request:

2.3. Patch Font Request

A patch font request is sent by a client to add data for additional codepoints to its font. Patch font requests are sent via HTTP and can only use the POST method. The request body is a single PatchRequest object encoded via CBOR. The fields of should be set as follows:

If the server has previously provided a codepoint_ordering for this font the client should set:

If the server has not previously provided a codepoint_ordering for this font then the codepoints_have and codepoints_needed fields should be used instead:

2.3.1. Standard Response

The server will response to a new font request with a Patch Font Response.

2.3.2. Recoverable Errors

2.3.2.1. Client’s Original Font does not Match Server’s

Over time servers may upgrade the original copies of a font with newer versions. After such an upgrade, clients who have subsets built from the previous versions may contact the server and request a patch against the previous version of the font.

The server will detect this case by checking that PatchRequest.original_font_checksum matches the checksum of the current version of the font. If a mismatch is detected there are two possible resolutions:

2.3.2.2. Client’s Base does not Match Server’s

Over time servers may upgrade or change the way they compute subsets of fonts. This could result in the base font that a server computes not matching the base font that a client has. This case can be detected by the server by comparing PatchRequest.base_checksum to the checksum for the base that the server computed. If there is a mismatch the server should respond with a § 2.4 New Font Response instead of the usual § 2.5 Patch Font Response. The replacement font must cover at minimum the codepoints specified in the codepoints_have/indices_have and codepoints_needed/indices_needed fields.

2.3.2.3. Client Codepoint Reordering does not Match Servers

The codepoint mapping used by the client may not be recognized by the server. This case can be detected by comparing PatchRequest.ordering_checksum to a checksum of the server’s codepoint reordering. If there is a mismatch the server should respond with a § 2.6 Update Codepoint Ordering Response. After receiving a § 2.6 Update Codepoint Ordering Response the client should resend their § 2.3 Patch Font Request using the new code point reordering.

If a reordering mismatch is detected it must be resolved prior to attempting to resolve any of the other recoverable error scenarios.

2.3.2.4. Client Side Patched Base Checksum Mismatch
2.3.2.5. Cmap Format 4 Overflow
2.3.2.6. Offset Overflow during Subsetting

2.3.3. Errors

The following errors may occur as a result of a patch font request:

2.4. New Font Response

If the client asks for a new font the server will respond with a single PatchResponse object encoded via COBR:

If the client receives a new font response it should replace any version of the font it currently has with the decompressed contents of the PatchResponse.patch Also PatchResponse.original_font_checksum, PatchResponse.codepoint_ordering, and PatchResponse.codepoint_ordering_checksum should be saved as they are needed for future requests.

2.5. Patch Font Response

2.6. Update Codepoint Ordering Response

2.7. Error Response

2.7.1. Font Not Found

2.7.2. Bad Request

2.7.3. Internal Error

2.8. Procedures

2.8.1. Computing Checksums

2.8.2. Codepoint reodering

2.8.2.1. Computing Checksum
2.8.2.2. Recommended algorithm

2.8.3. Patch and Compression Formats

3. Range Request Incremental Transfer

4. Negotiating Incremental Transfer Type

Privacy and Security Considerations

Note any issues that have been raised about privacy and security.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Conformance Classes

A conformant user agent must implement all the requirements listed in this specification that are applicable to user agents.

A conformant server must implement all the requirements listed in this specification that are applicable to servers.

Index

Terms defined by this specification

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC8949]
C. Bormann; P. Hoffman. Concise Binary Object Representation (CBOR). December 2020. Internet Standard. URL: https://datatracker.ietf.org/doc/html/rfc8949

Informative References

[PFE-report]
Chris Lilley. Progressive Font Enrichment: Evaluation Report. 15 October 2020. Note. URL: https://www.w3.org/TR/PFE-evaluation/
[RFC4648]
S. Josefsson. The Base16, Base32, and Base64 Data Encodings. October 2006. Proposed Standard. URL: https://datatracker.ietf.org/doc/html/rfc4648
[WOFF2]
Vladimir Levantovsky; Raph Levien. WOFF File Format 2.0. 1 March 2018. REC. URL: https://www.w3.org/TR/WOFF2/

Issues Index

Note any issues that have been raised about privacy and security.