Selecting part of a resource on the Web is a ubiquitous action. Over the years several selection techniques have been developed, usually in conjunction with the media type of the resource. Often these selections are expressed as fragment identifiers [[?url]], but that is not always the case.

This document relies on existing selection techniques, providing a common model and syntax defined by the Web Annotation Data Model [[?annotation-model]]. That specification developed a JSON-based approach to select targets or bodies of various types of Web Resources. This foundational model has been extended by adding selector types applicable to collective resources and a new model component for describing positions in text and byte streams.

Due to the lack of practical business cases for Web Publications, and the consequent lack of commitment to implement the technology, the Publishing Working Group has chosen to discontinue the work on Web Publications, archive the work in the form of a Working Group Note, and focus on other areas of interest. As a consequence, the present document has also been discontinued and is being published as a Working Group Note. The public record of the group's discussions is available in the group's archive of meeting minutes.

This document was still a work in progress at the time of its publication. As a result, anyone seeking to extend the Web Annotation Data Model [[?annotation-model]] to select targets and bodies of various types of Web resources should read the approach and proposals outlined in this document with an abundance of caution. It is being published to archive the work and allow incubation, should interest emerge in the future to resume its development.

Handling Undefined JSON Properties

This specification relies on a subset of JSON terms originally defined as part of the Web Annotation Data Model [[?annotation-model]] and Vocabulary [[?annotation-vocab]]. This specification extends the definitions of some of these terms and defines additional terms in order to satisfy additional use cases, but all uses conforming to original definitions remain valid. In order to ensure backward compatibility, implementations of this specification MAY ignore any JSON terms not defined in this specification (directly or by reference to the Web Annotation Data Model and Vocabulary) and MUST NOT treat as invalid any JSON term encountered that is not defined in this specification.
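The rule above can be illustrated with a minimal sketch in Python. The set of known terms here is a hypothetical, abbreviated list chosen for illustration; a real implementation would enumerate every term defined by this specification and by the Web Annotation Data Model and Vocabulary. Unknown terms are silently skipped and never cause an error.

```python
import json

# Hypothetical, abbreviated list of known terms (an assumption for this sketch).
KNOWN_TERMS = {"id", "type", "source", "scope", "selector", "position",
               "value", "refinedBy", "exact", "prefix", "suffix"}

def known_terms_only(node):
    """Return a copy of a parsed JSON value with unknown terms dropped."""
    if isinstance(node, dict):
        return {key: known_terms_only(val)
                for key, val in node.items() if key in KNOWN_TERMS}
    if isinstance(node, list):
        return [known_terms_only(item) for item in node]
    return node

# "x-vendor-note" and "x-rank" are made-up terms that this sketch ignores.
locator = json.loads("""
{
  "source": "http://example.org/page1",
  "x-vendor-note": "not defined by this specification",
  "selector": {"type": "TextQuoteSelector", "exact": "annotation", "x-rank": 3}
}
""")
cleaned = known_terms_only(locator)
```

Dropping the unknown terms is only one way of ignoring them; an implementation could equally keep them in memory and simply not act on them.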

Conformance requirements related to specific selectors

Not all Selectors defined in this specification are relevant for all resource media types. A conforming implementation MAY therefore ignore a certain type of Selector when the corresponding media type(s) associated with the Selector are not handled by that particular implementation.

Introduction

The Web Annotation Selection Model

Selecting a part of a resource on the Web is a ubiquitous action. Similarly, referencing a position in a resource representation is often necessary to support curation of and discourse about Web resources. Interactive editing of a resource, highlighting an area on the screen, adding an annotation to a specific point in a resource, or defining a bookmark to a location or a section of a long document are all examples that involve selection or positioning within a resource.

Over the years several techniques for selection have been developed, usually in conjunction with the media type of the resource. These include referring to a unique identifier within a resource, defining a time interval for an audio or video track, identifying an element within the DOM tree for an XML source, or using CSS style elements to locate and select content. Often these selections are expressed as fragment identifiers [[?url]], but that is not always the case.

This document relies on a selection technique defined by the Web Annotation Data Model [[?annotation-model]] providing a common model and syntax for such selections. That specification developed a formalism based on JSON [[?json]] to select targets or bodies of various types of Web Resources. The model relies on the concepts of Selectors, encapsulating in a JSON object the various ways selections have been defined for different media types, and States, encapsulating selections based on HTTP requests and responses. The model also includes a way to combine and/or refine selections, a feature that may greatly improve the efficiency of applications relying on complex selections. A selection or a state specifier, as described in the original model and used in this document, may also have its own unique identity in the form of a URL. This URL SHOULD be dereferencable and return the selection/state specifier definition itself.

Using the URL of the selection definition, instead of the reference to the “complete” resource, could be seen as akin to a server-side redirection that returns part of a resource.

The Web Annotation Working Group has also published a separate Working Group Note entitled “Selectors and States” [[?selectors-states]]. That note extracts the selector model from the full Web Annotation Data Model [[?annotation-model]] to make it more palatable for users who are not necessarily interested in other aspects of the Annotation Model. Although normatively this specification refers to the [[?annotation-model]], readers should probably consult the [[?selectors-states]] note for a better understanding of the underlying concepts. Note, however, that the [[?selectors-states]] Working Group Note also includes a proposal for a fragment identifier syntax; that syntax has not been used in the current specification.

Examples

The example below shows the usage of a Text Quote Selector referring to a specific portion of text in a resource:

{
  "source": "http://example.org/page1",
  "selector": {
    "type": "TextQuoteSelector",
    "exact": "annotation",
    "prefix": "this is an ",
    "suffix": " that has some"
  }
}
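As a rough illustration of how such a selector might be resolved, the Python sketch below searches a plain text string for the quoted text surrounded by its prefix and suffix. The function name and the sample text are assumptions made for this sketch, and the text normalization required by the Web Annotation Data Model is deliberately left out.

```python
def find_text_quote(text, exact, prefix="", suffix=""):
    """Return (start, end) offsets of the first contextual match, or None."""
    needle = prefix + exact + suffix
    at = text.find(needle)
    if at < 0:
        return None
    start = at + len(prefix)
    return (start, start + len(exact))

# Hypothetical page content, already reduced to plain text.
page_text = "We know that this is an annotation that has some text in it."
match = find_text_quote(page_text, "annotation", "this is an ", " that has some")
```

The prefix and suffix serve only to disambiguate; the returned offsets cover the exact text alone.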

The next example shows the usage of refinement: a specific portion of text is selected from a paragraph; the latter is identified via a “traditional” fragment identifier.

{
  "source": "http://example.org/page1",
  "selector": {
    "type": "FragmentSelector",
    "value": "para5",
    "refinedBy": {
      "type": "TextQuoteSelector",
      "exact": "Selected Text",
      "prefix": "text before the ",
      "suffix": " and text after it"
    }
  }
}

See the [[?selectors-states]] Note for more examples for the usage of the Web Annotation Model for selection.

Extensions of the Web Annotation Approach

The approach defined by the [[?annotation-model]] has been extended in this specification by adding selector types applicable to collective resources and a new model component for describing positions in text and byte streams. It provides methods for selecting a segment of a collective resource (e.g., a “Web Publication” [[?wpub]]) that itself contains or is composed of other discrete and individually identifiable resources, even when the segment of interest spans parts of more than one included resource. The common model for selection as described in this specification makes it easier to provide generic and interoperable tools and APIs to handle selections in various applications.

More specifically, this document extends the Web Annotation Data Model by adding three new selectors, namely:

  • Embedded Resource Selector
  • Span Selector
  • Multi Resource Selector

These changes aim at addressing the particular requirements of resource collections on the Web, like Web Applications or Web Publications [[?wpub]].

Additionally, the current document augments the Web Annotation Data Model of selectors and states with a new class of specifier, Positions. Two position specifiers are defined:

  • Text Stream Position
  • Data Stream Position

Although defined in conjunction with Web Publications, the techniques described in this document can be used for any type of Web Resource.

Terminology

This section is normative

Wherever appropriate, this document relies on terminology defined by the note on “Publishing and Linking on the Web” [[?publishing-linking]], including, in particular, user, user agent, browser, and address. Furthermore, the document also relies on some additional terms defined by the “Web Publication” [[?wpub]], including a URL.

Resource
An item of interest that MAY be identified by a URL.
Web Resource
A Resource that MUST be identified by a URL, as described in the Web Architecture [[?webarch]]. Web Resources MAY be dereferencable via their URL.
Locator
A Resource that specifies a position in or a portion of another Web Resource. A Locator expresses its relationship to the relevant Web Resource (the Source) through a source term, and MAY express a contextual relationship to an additional Web Resource through a scope term. The original [[?annotation-model]] document used the term Specific Resource as a generic term encompassing usages that go beyond selection. This specification uses the term “Locator” as an alias.
Source
The overall Web Resource whose selection is refined through the usage of Selector, Position, or State specifier(s).
Segment (of Interest)
The part of the Web Resource that is specified in a Locator using a Selector specifier.
Locus (of Interest)
The location in the Web Resource that is specified in a Locator using a Position specifier.
External Web Resource
A Web Resource which is not part of the representation of the selection, such as a web page, image, or video. External Web Resources are dereferencable from their URL.
Property
A feature of a Resource, that often has a particular data type. In the model sections, the term “Property” is used to refer to only those features which are not Relationships and instead have a literal value such as a string, integer, or date. The valid values for a Property are thus any data type other than object, or an array containing members of that data type if more than one is allowed.
Relationship
In the model sections, the term “Relationship” is used to distinguish those features that refer to other Resources, either by reference to the Resource's URL or by including a description of the Resource in the representation. The valid values for a Relationship are: a quoted string containing a URL, an object that has the “id” property, or an array containing either of these if more than one is allowed.
Type
A feature of a Resource whose valid values are predefined strings (defined in this document) denoting the particular type of Selector, Position, or State specifier.

Locators

This term is formally defined in the [[?annotation-model]]; this (somewhat shortened) specification is provided as a convenient reference only. Note, however, that the Position Specifier is not part of the original specification, and has been added by this specification.

A Locator is a Resource that specifies a location in or a portion of another Web Resource. It does this using Specifiers, each of which can be any of:

  • a Selector
  • a State
  • a Position

Specifiers MAY be External Web Resources with their own URLs, such as in the example for the Selector construction; however, it is RECOMMENDED that they be included in full within the Locator representation to avoid requiring unnecessary network interactions to retrieve all of the information.

Model

Term Type Description
id Property The identity of the Locator
A Locator SHOULD have exactly 1 URL that identifies it.
source Relationship The relationship between a Locator and the resource that it is a more specific representation of, i.e., the Source.
There MUST be exactly 1 source relationship associated with a Locator. The source resource MAY be described in detail as discussed in the Web Annotation Data Model [[?annotation-model]] or it MAY simply be identified by the resource’s URL.
scope Relationship The relationship between a Locator and an additional resource other than the source that provides scope or context for the Locator.
There MAY be 0 or more scope relationships for each Locator. When the source is part of a group or collection of resources that has its own URL, scope MAY be used to record this URL.

Selectors

This term is formally defined in the [[?annotation-model]]; this (somewhat shortened) specification is provided as a convenient reference only.

The definition of 'Selector' as used in this specification differs from the normative definition of the term in [[?css3-selectors]]. The text / terminology in this specification regarding Selectors may need to be revised.

Selection of part of a Web Resource requires two distinct entities:

  1. the URL of the overall resource; we will refer to this as the Source (see Locator).
  2. the identification for the part of that resource; we will refer to this as the Segment (of Interest).

A Selector specifies how to determine the Segment from within the Source resource. The nature of the Selector is dependent on the selection technique chosen (which determines the class of the Selector) and the media-type of the Source, as the methods to describe Segments from various media-types differ. The Source and the Selector(s) are encapsulated in a Locator.

Example Use Case: Qitara wants to associate a selection of text in a web page with a slice of a dataset. She selects both using her client, and creates Locators with Selectors for both entities before associating them with one another.

Model

Term Type Description
selector Relationship The relationship between a Locator and a Selector.
There MAY be 0 or more selector relationships associated with a Locator. Multiple Selectors SHOULD select the same content; however, some Selectors will not have the same precision as others. User Agents MUST pick one of the described segments if they differ.

Example

{
  "source": "http://example.org/page1",
  "selector": "http://example.org/paraselector1"
}

Embedded Resource Selector

This section is normative

For some use cases it is required to identify a resource that is part of a collection or group of resources, where that collection has its own identity on the Web (and can be identified via its own URL). An example is selecting a resource that is a chapter of a Web Publication [[?wpub]] or Packaged Web Publication [[?pwpub]]. Given the URL of such a collective resource as the value of source, an Embedded Resource Selector can be used to select and identify an item within the collection, e.g., the chapter, through its value relationship. This Selector is usually used in conjunction with additional Selectors, e.g., through refinement.

Example Use Case: Janine wants to select the cover image of a Web Publication, which is linked to the Web Publication as a whole. She uses an Embedded Resource Selector to designate the image, with the Web Publication’s address as the Source for the selector.

Model

Term Type Description
type Relationship The class of the Selector.
Embedded Resource Selectors MUST have exactly 1 type and the value MUST be EmbeddedResourceSelector.
value Relationship The URL [[?url]] of the resource within the collection or group of resources identified by the Source.
An EmbeddedResourceSelector MUST have exactly 1 value property.
The URL MAY be a relative URL, with the value of Source serving as a base URL.

Example

{
  "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "selector": {
    "type": "EmbeddedResourceSelector",
    "value": "https://dauwhe.github.io/html-first/MobyDickNav/images/book-cover.jpg"
  }
}				

A frequent usage of refinement is in combination with an Embedded Resource Selector to denote the fact that a particular selection is related to, e.g., a Web Publication. For example:

{
  "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "selector": {
    "type": "EmbeddedResourceSelector",
    "value": "MobyDickNav/html/c001.html",
    "refinedBy": {
      "type": "CssSelector",
      "value": "#elemid > .elemclass + p"
    }		
  }
}	

Note the usage of a relative URL in the example; it is considered good practice to use a relative URL when applicable.
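Resolving a relative value against the Source can be sketched with Python's standard `urllib.parse.urljoin`. The function name is an assumption made for this illustration, and only the Embedded Resource Selector itself is handled; any refinement is ignored.

```python
from urllib.parse import urljoin

def resolve_embedded_resource(locator):
    """Return the absolute URL of the resource an Embedded Resource Selector picks."""
    selector = locator["selector"]
    if selector.get("type") != "EmbeddedResourceSelector":
        raise ValueError("expected an EmbeddedResourceSelector")
    # The Source serves as the base URL; absolute values pass through unchanged.
    return urljoin(locator["source"], selector["value"])

chapter_locator = {
    "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "EmbeddedResourceSelector",
        "value": "MobyDickNav/html/c001.html",
    },
}
chapter_url = resolve_embedded_resource(chapter_locator)
```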

Embedded Resource Selector serialized as a fragment identifier

This section is normative

This section is predicated on the assumption that Packaged Web Publications (PWPs) rely on a packaging format of a media type intended exclusively for packaging PWPs or only for packaging PWPs and EPUBs. If the Working Group decides that PWPs should be packaged using a more generic packaging format with a more generally used media type, or if the Group for any other reason decides not to pursue registering this fragment identifier scheme, then this section (including subsection) should be removed.

For some simple use cases involving Packaged Web Publications, it may be more convenient or more consistent with past practice to express a simple Embedded Resource selection as a fragment identifier [[?url]] that can be appended to the URL of the collective resource, i.e., the source associated with the Embedded Resource selection. (An informative precedent for this approach is the International Digital Publishing Forum Recommended Specification, EPUB Canonical Fragment Identifiers 1.1 [[?cfi]], which defines a fragment identifier serialized model for selecting and positioning within resources of the application/epub+zip media type.)

A mapping for serializing simple Embedded Resource selections as fragment identifiers is defined below. This mapping allows the Segment (of interest) to be expressed in a single URL. Note that this mapping is valid only if the URL of the Source is the URL of a Packaged Web Publication and does not already include a fragment identifier.

An Embedded Resource Selector is serialized as a fragment identifier using a function-like syntax, i.e.:

  • The source for the selection is the base URL to which a # character is appended.
  • Next comes the fixed string ERS (in lieu of a function name), followed by a single 'parameter' enclosed in parentheses.
  • The single 'parameter' of the function-like notation is a URL [[?url]], i.e., the value from the JSON serialization of the Embedded Resource Selector. The parameter MAY be an absolute URL or relative to the base URL.

The value of the URL 'parameter' appearing in an ERS fragment identifier SHOULD be percent-encoded [[?rfc3986]]. The following characters, which may make the URL ambiguous, MUST be percent-encoded:

character code
space %20
= %3D
, %2C
# %23
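The serialization steps and the encoding table above can be sketched in Python as follows. The function names are assumptions made for this illustration, and the choice of which characters to leave unencoded (`:`, `/`, `?`, `&`) is one possible reading of the SHOULD rather than something this specification prescribes. Parentheses inside the value are also encoded by this sketch so that they cannot be confused with the closing parenthesis of the ERS notation.

```python
from urllib.parse import quote, unquote

def to_ers_url(source, value):
    """Serialize an Embedded Resource selection as a single URL."""
    if "#" in source:
        raise ValueError("the Source must not already include a fragment identifier")
    # Space, '=', ',' and '#' are outside the safe set, so they are
    # emitted as %20, %3D, %2C and %23 respectively.
    return f"{source}#ERS({quote(value, safe=':/?&')})"

def from_ers_url(url):
    """Split an ERS URL back into (source, value)."""
    source, marker, rest = url.partition("#ERS(")
    if not marker or not rest.endswith(")"):
        raise ValueError("not an ERS fragment identifier")
    return source, unquote(rest[:-1])

ers_url = to_ers_url(
    "https://dauwhe.github.io/html-first/MobyDick.pwpub",
    "MobyDickNav/images/cover.jpg#xywh=50,50,640,480",
)
```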

A fragment identifier is defined for a specific media type. This means that, formally, the fragment identifier syntax and semantics defined in this section must be registered for any PWP media type(s) by IANA. Until such a registration is done, these fragment identifiers have the potential to conflict with other fragment identifier schemes specified by media type registrations.

Example

The example below is semantically equivalent to the example on the usage of an Embedded Resource Selector:

https://dauwhe.github.io/html-first/MobyDick.pwpub#ERS(
  https://dauwhe.github.io/html-first/MobyDickNav/images/cover.jpg)

(A new line character has been introduced into the Example above to facilitate readability; in real usage such new line characters are not allowed in a URL.)

The usage of a fragment identifier may also make the usage of explicit refinement unnecessary. The example below, which incidentally uses a relative rather than absolute URL, is semantically equivalent to the example combining an embedded resource selector with refinement:

{
  "source": "https://dauwhe.github.io/html-first/MobyDick.pwpub#ERS(MobyDickNav/html/c001.html)",
  "selector": {
    "type": "CssSelector",
    "value": "#elemid > .elemclass + p"
  }
}	
					

Refinement of ERS fragment identifiers

The fragment identifier serialization mapping of Embedded Resource Selectors generally does not support refinement, except that the URL 'parameter' may include its own fragment identifier, appropriate to the media type of the resource identified by the URL. The following example illustrates such a pattern. (To increase readability, the percent encoding has been omitted from the example.)

https://dauwhe.github.io/html-first/MobyDick.pwpub#ERS(MobyDickNav/images/cover.jpg#xywh=50,50,640,480)

The URL above is the result of mapping the JSON-serialized Embedded Resource Selector below; note that the link to the Media Fragments URI 1.0 Recommendation [[?media-frags]] cannot be mapped to the fragment identifier serialization, so the mapping is not entirely lossless.

{
  "source": "https://dauwhe.github.io/html-first/MobyDick.pwpub",
  "selector": {
    "type": "EmbeddedResourceSelector",
    "value": "https://dauwhe.github.io/html-first/MobyDickNav/images/cover.jpg",
    "refinedBy": {
      "type": "FragmentSelector",
      "conformsTo": "http://www.w3.org/TR/media-frags/",
      "value": "xywh=50,50,640,480"
    }		
  }
}   

Span Selector

This section is normative

Selections from a group of resources (e.g., the group of resources which comprise a Web Publication [[?wpub]]) may be extensive and may span member resource boundaries. For resource-spanning selections that are continuous in some ordering of the group of resources, a Span Selector can be used to identify the beginning and the end of the selection using Embedded Resource Selectors, refined as appropriate. Embedded Resource Selectors (without refinement) are also used in enumerating any intervening resources between the beginning and end of the selection that are included in the selection. A Span Selection MUST span at least two resources. (For continuous selections wholly contained within a single resource, use a Range Selector.)

In the absence of refinement, the selection consists of the member resource identified by the startSelector property (the first resource in the selection), the member resource identified by the endSelector (the last resource in the selection), and the intervening member resource(s) (in some ordering of the group) between the starting and ending member resources as enumerated by the selectors property. If the startSelector is refined with another selector, then only the part of the first resource from the start of the refined selection to that resource's end (i.e., including what is identified by refinement) is included in the span selection. If the endSelector is refined with another selector, then only the part of the last resource prior to the start of the refined selection (i.e., excluding what is identified by refinement) is included in the selection.

The ordering of resources does not make use of external features, like the default reading order in a Web Publication [[?wpub]]. The order is exclusively established via the selectors property.

Example Use Case: Misha wants to comment on text in a Web Publication that spreads over several constituent resources. He selects the start and the end of the selection in different resources; his User Agent calculates the Span Selector as a series of Embedded Resource Selectors, using the first selection as the start and the last selection as the end, to provide a continuous span.

Model

Term Type Description
type Relationship The class of the Selector.
Span Selectors MUST have exactly 1 type and the value MUST be SpanSelector.
startSelector Relationship The Selector which describes the inclusive starting point of the span.
There MUST be exactly 1 startSelector associated with a Span Selector and it MUST be an Embedded Resource Selector, which MAY be refined with other selectors.
selectors Relationship Provides an ordered, possibly empty, list of Embedded Resource Selectors, which identify intermediate resources subsumed in the full selection. These Embedded Resource Selectors MUST NOT be refined with other selectors.
There MAY be at most 1 selectors relationship associated with a Span Selector. In the absence of a selectors relationship, a user agent SHOULD assume that the start and end resources are contiguous.
endSelector Relationship The Selector which describes the exclusive ending point of the span.
There MUST be exactly 1 endSelector associated with a Span Selector and it MUST be an Embedded Resource Selector, which MAY be refined with other selectors.

Example

{
  "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "selector": {
    "type": "SpanSelector",
    "startSelector": {
      "type": "EmbeddedResourceSelector",
      "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
      "refinedBy" : {
        "type": "TextQuoteSelector",
        "exact": "Call me Ishmael.",
        "suffix": "Some years ago"	
      }
    },
    "selectors": [{
      "type": "EmbeddedResourceSelector",
      "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html"
    },{
      "type": "EmbeddedResourceSelector",
      "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c003.html"
    }],
    "endSelector": {
      "type": "EmbeddedResourceSelector",
      "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c004.html",
      "refinedBy": {
        "type": "TextQuoteSelector",
        "exact": "He commenced dressing",
        "suffix": " at top"	
      }
    }
  }
}
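The ordering rule, under which the order of resources is established exclusively through the selectors property, can be sketched in Python as follows. The function name and the shortened relative values are assumptions made for this illustration; refinements on the start and end selectors are left for the caller to apply, and relative URLs are not resolved here.

```python
def span_resources(span):
    """List the member resources covered by a Span Selector, in order."""
    if span.get("type") != "SpanSelector":
        raise ValueError("expected a SpanSelector")
    middle = span.get("selectors", [])
    if any("refinedBy" in member for member in middle):
        raise ValueError("intermediate selectors must not be refined")
    ordered = [span["startSelector"], *middle, span["endSelector"]]
    if any(member.get("type") != "EmbeddedResourceSelector" for member in ordered):
        raise ValueError("every member must be an EmbeddedResourceSelector")
    urls = [member["value"] for member in ordered]
    if len(set(urls)) < 2:
        raise ValueError("a span must cover at least two resources")
    return urls

# Shortened, hypothetical values standing in for the chapter URLs.
span = {
    "type": "SpanSelector",
    "startSelector": {"type": "EmbeddedResourceSelector", "value": "html/c001.html"},
    "selectors": [
        {"type": "EmbeddedResourceSelector", "value": "html/c002.html"},
        {"type": "EmbeddedResourceSelector", "value": "html/c003.html"},
    ],
    "endSelector": {"type": "EmbeddedResourceSelector", "value": "html/c004.html"},
}
covered = span_resources(span)
```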
				

Multi Resource Selector

This section is normative

For some use cases it is required to identify a segment (of interest) that spans multiple selections, possibly over multiple members of a group of resources (e.g., spanning a subset of the resources which comprise a Web Publication [[?wpub]]). A Multi Resource Selection can be used to identify such a segment of interest by creating an ordered list of selectors. A Multi Resource Selection identifies a collection of discrete selections, whether within a single resource or spread over several resources included in a single Source. If the segment of interest spans more than one resource, these selectors MUST all be Embedded Resource Selectors, each of which MAY be refined.

Example Use Case: Rachel is writing a summative assessment question with hints pointing back to the textbook. The question draws on material presented in Chapter 2, a-head 3; Chapter 4, a-head 6; and Chapter 7, a-head 8. She uses a Multi Resource Selector to define a single link, added to the hints section of her assessment question, that references Sections 2.3, 4.6, and 7.8, but nothing in between them.

Model

Term Type Description
type Relationship The class of the Selector.
Multi Resource Selectors MUST have exactly 1 type and the value MUST be MultiResourceSelector.
selectors Relationship A list of Selectors.
There MUST be exactly 1 selectors list associated with a Multi Resource Selector.
The list MUST have at least 2 elements.

Example

{
  "source": "https://textbook.example.org/",
  "selector": {
    "type": "MultiResourceSelector",
    "selectors": [{
      "type" : "EmbeddedResourceSelector",
      "value": "https://textbook.example.org/section2.html",
      "refinedBy": {
        "type": "CssSelector",
        "value": "body>section:nth-of-type(3)"
      }
    },{
      "type": "EmbeddedResourceSelector",
      "value": "https://textbook.example.org/section4.html",
      "refinedBy": {
        "type": "CssSelector",
        "value": "body>section:nth-of-type(6)"
      }
    },{
      "type": "EmbeddedResourceSelector",
      "value": "https://textbook.example.org/section7.html",
      "refinedBy": {
        "type": "CssSelector",
        "value": "body>section:nth-of-type(8)"
      }
    }]
  }    
}
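The model constraints for a Multi Resource Selector can be checked with a short Python sketch such as the one below. The function name is an assumption, as is the heuristic used for the multi-resource rule: the sketch treats the presence of any Embedded Resource Selector in the list as a sign that the segment spans several resources, and then expects every member to be one.

```python
def check_multi_resource_selector(selector):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if selector.get("type") != "MultiResourceSelector":
        problems.append("type must be MultiResourceSelector")
    members = selector.get("selectors")
    if not isinstance(members, list):
        problems.append("exactly 1 selectors list is required")
    elif len(members) < 2:
        problems.append("the selectors list must have at least 2 elements")
    else:
        kinds = {member.get("type") for member in members}
        # Assumption of this sketch: mixing Embedded Resource Selectors with
        # other selector types indicates a malformed multi-resource selection.
        if "EmbeddedResourceSelector" in kinds and len(kinds) > 1:
            problems.append("all members must be Embedded Resource Selectors")
    return problems

# Hypothetical, shortened selector for illustration.
hints = {
    "type": "MultiResourceSelector",
    "selectors": [
        {"type": "EmbeddedResourceSelector", "value": "section2.html"},
        {"type": "EmbeddedResourceSelector", "value": "section4.html"},
    ],
}
found = check_multi_resource_selector(hints)
```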
				

Positions

A Position object describes a Locus (of Interest) within a stream representation of a Web Resource.

A Position specifier requires knowledge of two distinct entities:

  1. the URL of the overall resource; this is the same Source as used for Selectors (see Selectors).
  2. an integer representing the count of bytes or characters in the stream preceding the locus of interest.

Example Use Case: Allen, while reading chapter 1 of a digital edition of Moby Dick, generates (as a separate resource with its own URL) a Position specifier to note the position in the digital text stream representation where the first page break in chapter 1 of a Moby Dick print edition occurred.

Model

Term Type Description
position Relationship The relationship between the Locator and a Position specifier.
A Locator MAY have 0 or 1 position relationships.

When processing a Locator that includes Selector and/or State specifier(s), the Position specifier (if present) MUST be processed last.

Example

{
  "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
  "position": {
    "id": "http://example.org/printPageBreak-c1.1"
  }
}
			

Text Stream Position

The TextStreamPosition specifier describes an inter-character position in a text stream by recording the number of characters that precede the position. The property value is used to record this character count. A value of 0 would describe the position immediately before the first character, a value of 1 would describe the position immediately after the first character and before the second character (i.e., the position between the first two characters of the stream), and so on. For example, if the text stream was “abcdefghijklmnopqrstuvwxyz” and the value was 7, then the position referenced would be the position between "g" and "h". If n is the length of the text stream, then a value equal to n denotes the position immediately following the last character in the text stream.

In some situations, it is important to preserve which side of a position a location reference points to. For example, when resolving a text stream position in a dynamically paginated environment, it could make a difference if a position is attached to the content before or after the location being referenced (e.g., to determine whether to display the verso or recto side at a page break). In a TextStreamPosition object, the bias property MAY be used to attach a position reference to the character preceding the position identified by the value property ("bias": "before") or to the character following the position ("bias": "after"). For example, if the text stream was “abcdefghijklmnopqrstuvwxyz”, the value was 7, and the bias was before, then the position referenced would be the position between "g" and "h" and would be attached to "g".
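The counting and bias rules can be made concrete with a small Python sketch. The function name is an assumption made for this illustration, and the sketch counts Python string characters (Unicode code points) without applying the selection and normalization steps that a conforming implementation would perform first.

```python
def describe_text_position(text, value, bias=None):
    """Return (char_before, char_after, attached_char) for a text stream position."""
    if not 0 <= value <= len(text):
        raise ValueError("value must be between 0 and the length of the text")
    if bias not in (None, "before", "after"):
        raise ValueError("bias must be 'before' or 'after' when present")
    char_before = text[value - 1] if value > 0 else None
    char_after = text[value] if value < len(text) else None
    # With no bias the position is not attached to either neighbour.
    attached = {"before": char_before, "after": char_after}.get(bias)
    return char_before, char_after, attached

alphabet = "abcdefghijklmnopqrstuvwxyz"
position = describe_text_position(alphabet, 7, bias="before")
```

Here `position` is `("g", "h", "g")`: the position lies between "g" and "h" and is attached to "g".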

The property bias is only meaningful when some type of break (e.g., a page break or line break) falls or might fall at the position specified by the TextStreamPosition specifier.

Example Use Case: George notices that a letter is missing between characters 322 and 323 in the text of an HTML file he is reading and decides he wants to mark the position of the missing letter by generating a TextStreamPosition specifier so that the letter can be inserted later during editing. He also wants to ensure that if a hyphen and line break should be dynamically inserted in this position the missing character would follow the break, and so he uses the bias property to associate the position reference with the character that follows the position being referenced (i.e., character 323).

Is this a valid example as regards bias? Or is this kind of situation better handled by an application, rather than by conflating a side-bias property with the position reference? If it is not compelling enough, we need a better, more concrete, real-world use case for side-bias; otherwise we should drop side-bias.

Model

Term Type Description
type Relationship The class of the Position specifier.
A Text Stream Position specifier MUST have exactly 1 type and the value MUST be TextStreamPosition.
value Property The count of characters in the text stream preceding the Locus (of interest).
Each TextStreamPosition MUST have exactly 1 value property, and it MUST be a non-negative integer less than or equal to the number of characters in the text stream.
bias Property This property is used to associate a position reference with either the character that precedes it or the character that follows it.
Each TextStreamPosition MAY include 0 or 1 bias properties, and the value of bias MUST be either before or after.

The text MUST be selected and normalized in the same way as for the Text Quote Selector before counting the number of characters to determine the value to be used.

Example

{
  "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
  "position": {
    "type": "TextStreamPosition",
    "value": 322,
    "bias": "after"
  }
}
				

Data Stream Position

Similar to the TextStreamPosition specifier, the DataStreamPosition specifier describes a position between two bytes in a byte stream representation of a resource by recording the number of bytes that precede the position. The property value is used to record this byte count. A value of 0 would describe the position immediately before the first byte, a value of 1 would describe the position immediately after the first byte and before the second byte (i.e., the position between the first two bytes of the stream), and so on. If n is the length of the byte stream, then a value equal to n denotes the position immediately following the last byte in the stream.

Example Use Case: Paul's data processing application fails after processing the first 401 bytes of a resource's byte stream representation. Before exiting, the application generates a DataStreamPosition object to record the position where processing was interrupted. This will facilitate resumption of processing after the processing bug is resolved.

Model

Term Type Description
type Relationship The class of the Position specifier.
A Data Stream Position specifier MUST have exactly 1 type and the value MUST be DataStreamPosition.
value Property The count of bytes in the byte stream preceding the Locus (of interest).
Each DataStreamPosition MUST have exactly 1 value property, and it MUST be a non-negative integer less than or equal to the number of bytes in the stream.

Example

{
  "source": "https://example.org/MyData.json",
  "position": {
    "type": "DataStreamPosition",
    "value": 401
  }
}
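In the spirit of the use case above, a DataStreamPosition can be applied to a byte stream with a sketch like the following. The function name and the synthetic payload are assumptions made for this illustration.

```python
def split_at_position(data: bytes, value: int):
    """Split a byte stream into the bytes before and after a DataStreamPosition."""
    if not 0 <= value <= len(data):
        raise ValueError("value must be between 0 and the length of the stream")
    return data[:value], data[value:]

# Synthetic 512-byte payload standing in for the resource's byte stream.
payload = bytes(range(256)) * 2
processed, remaining = split_at_position(payload, 401)
```

Processing would resume with `remaining`, whose first byte is the 402nd byte of the stream.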
				

Using Position to Refine

Unlike Selector and State specifiers, a Position specifier cannot be refined (since it identifies a point in a stream rather than a part of a resource or a representation of a resource). However, it may be easier, more reliable, more accurate, or less brittle to resolve a Position specifier in the context of a part of a resource defined by a Selector and/or a representation of a resource described by a State. Thus a Position specifier may be used to refine a Selector or State (including refined Selectors or States), as long as the Position specifier is the final refinement step processed.

Example Use Case: Deren is one of several people who are collaboratively editing an HTML file. He needs to generate a TextStreamPosition specifier identifying a position where the word "Mister" needs to be inserted in one of the paragraphs he alone has been assigned to edit. He doesn't want to specify a text character count within the text stream representation of the entire HTML file, since he knows other edits are in progress that could affect that character count. So instead he uses a CssSelector to select the paragraph of interest and then refines this selection with a TextStreamPosition specifier to reference the position within the paragraph where the insertion is needed.

Example

{
  "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
  "selector": {
    "type": "CssSelector",
    "value": "p:nth-child(2)",
    "refinedBy": {
      "type": "TextStreamPosition",
      "value": 8
    }		
  }
}
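The effect of refining a Selector with a Position can be sketched in Python, assuming the enclosing Selector has already been resolved to a character range within the text of the Source. The function name and the sample text are assumptions; resolving the CssSelector itself is outside the scope of this sketch.

```python
def resolve_refining_position(segment_start, segment_end, value):
    """Map a position counted inside a selected segment to an offset in the full text."""
    if not 0 <= value <= segment_end - segment_start:
        raise ValueError("value must fall within the selected segment")
    return segment_start + value

# Hypothetical text of the Source; the second line plays the selected paragraph.
full_text = "Loomings.\nCall me Ishmael."
paragraph_start = full_text.index("Call me")
paragraph_end = len(full_text)

offset = resolve_refining_position(paragraph_start, paragraph_end, 8)
edited = full_text[:offset] + "Mister " + full_text[offset:]
```

Because the count is relative to the paragraph, edits made earlier in the file shift `paragraph_start` but leave the value of 8 valid.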
				

Changes Relative to the Web Annotation Model

Editorial Changes

Non-editorial Changes