Selectors and States

Abstract

Selecting part of a resource on the Web is a ubiquitous action. Over the years several selection techniques have been developed, usually in conjunction with the media type of the resource. Many of these approaches are also expressed in terms of fragment identifiers [url], but that is not always the case.

This document does not define any new approach to selection. Instead, it relies on existing techniques, providing a common model and syntax to express and possibly combine selections. The formal specification and the semantics originate from a separate Recommendation, namely the Web Annotation Data Model [annotation-model], where it is used to select targets of annotations. The current document only “extracts” Selectors and States from that data model; by doing so, it makes their usage easier for application developers whose concerns are not related to annotations.

1. Introduction

Selecting part of a resource on the Web is a ubiquitous action. Interactive editing of a resource, highlighting an area on the screen, adding an annotation to a specific point in a resource, or using a section of a larger dataset for visualization are all examples that involve selection within a resource. Over the years several selection techniques have been developed, usually in conjunction with the media type of the resource. These include referring to a unique identifier within a resource, defining a time interval for an audio or video track, identifying an element within the DOM tree for an XML source, or using CSS style elements to locate content. Many of these approaches are also expressed in terms of fragment identifiers [url], but that is not always the case.

This document does not define any new approach to selection. Instead, it relies on existing techniques, providing a common model and syntax to express selections. Furthermore, the model also includes a way to combine selections via refinements, a feature that may greatly improve the efficiency of applications relying on complex selections. Such a common model makes it easier to provide generic and interoperable tools and APIs to handle selections in various applications.

A selection or state, as described in this document, may have its own unique identity in the form of an IRI. This IRI SHOULD be dereferenceable and return the selection/state definition itself.

Note

Using the IRI of the selection definition, instead of the reference to the “complete” resource, could be seen as akin to a server-side redirection that returns part of a resource.

The data model is defined in [json], in the form of JSON objects and keys. The formal specification and the semantics of these originate from a larger model, namely the Web Annotation Data Model [annotation-model], where it is used to select targets of annotations. The current document only “extracts” Selectors and States from that data model; by doing so, it makes their usage easier for application developers whose concerns are not related to annotations. The only new feature in this document is the section on selector fragments. Although this document aims to be self-consistent, in case of inconsistencies or errors the [annotation-model] remains the authoritative source.

1.1 Relationships to RDF

The definitions in this document also follow the extra syntactic rules of JSON-LD [json-ld]. This means that, through the usage of the JSON-LD Context defined in Appendix A of the Web Annotation Vocabulary [annotation-vocab], also available online as http://www.w3.org/ns/anno.jsonld, Selectors and States can be seen as RDF Graphs [rdf11-concepts] that can be linked to, or used from within, other RDF graphs. Using the mapping defined in that Context, Selectors and States can be expressed in other RDF serializations like Turtle [turtle]. The formal specification of the corresponding vocabulary is provided by [annotation-vocab]. To make the equivalences easier for the reader, section 6, Examples in Turtle, also contains all JSON-LD examples of this document serialized in Turtle.

It must be noted that applications unrelated to RDF and/or Linked Data can safely ignore the context as well as the vocabulary mapping, and work exclusively with the JSON objects and keys as defined below.

1.2 Terminology

IRI: An IRI, or Internationalized Resource Identifier, is an extension to the URI specification to allow characters from Unicode, whereas URIs must be made up of a subset of ASCII characters. There is a mapping algorithm for translating between IRIs and the equivalent encoded URI form. IRIs are defined by [rfc3987].
Resource: An item of interest that MAY be identified by an IRI.
Web Resource: A Resource that MUST be identified by an IRI, as described in the Web Architecture [webarch]. Web Resources MAY be dereferenceable via their IRI.
Specific Resource: A Resource that serves as a wrapper around the selection of part of another Web Resource. A Specific Resource identifies the relevant Web Resource (the Source) through a source term, and MAY contain other terms to refine the selection.
Source: The overall Web Resource whose selection is refined through the usage of Selectors or States.
Segment (of Interest): The part of the Resource that is selected using a Selector.
External Web Resource: A Web Resource which is not part of the representation of the selection, such as a web page, image, or video. External Web Resources are dereferenceable from their IRI.
Property: A feature of a Resource, that often has a particular data type. In the model sections, the term "Property" is used to refer to only those features which are not Relationships and instead have a literal value such as a string, integer, or date. The valid values for a Property are thus any data type other than object, or an array containing members of that data type if more than one is allowed.
Relationship: In the model sections, the term "Relationship" is used to distinguish those features that refer to other Resources, either by reference to the Resource's IRI or by including a description of the Resource in the representation. The valid values for a Relationship are: a quoted string containing an IRI, an object that has the "id" property, or an array containing either of these if more than one is allowed.
Class: Resources may be divided, conceptually, into groups called "classes"; members of a class are known as Instances of that class. Resources are associated with a particular class through typing. Classes are identified by IRIs, i.e., they are also Web Resources themselves.
Type: A special Relationship that associates an Instance of a class to the Class it belongs to.
Instance: An element of a group of Resources represented by a particular Class.

Term	Type	Description
@context	Property	This term is only necessary if the Specific Resource is to be considered as an RDF graph; it determines the meaning of the JSON in RDF. If used, the Specific Resource MUST have 1 or more `@context` values and `http://www.w3.org/ns/anno.jsonld` MUST be one of them. If there is only one value, then it MUST be provided as a string.
id	Property	The identity of the Specific Resource. A Specific Resource SHOULD have exactly 1 IRI that identifies it.
type	Relationship	The class of the Specific Resource. The Specific Resource SHOULD have the `ResourceSelection` class. This term is only necessary if the Specific Resource is to be considered as an RDF graph.
ResourceSelection	Class	The class of Specific Resources used for selectors. The `ResourceSelection` class SHOULD be associated with a Specific Resource to be clear as to its role as a more specific region or state of another resource.
source	Relationship	The relationship between a Specific Resource and the resource that it is a more specific representation of, i.e., the Source. There MUST be exactly 1 `source` relationship associated with a Specific Resource. The source resource MAY be described in detail as in the core data model or be just the resource's IRI.

3. Selectors

Selection of part of a Web Resource requires two distinct entities:

the IRI of the overall resource; we will refer to this as the Source.
the identification for the part of that resource; we will refer to this as the Segment (of Interest).

A Selector object is used to describe how to determine the Segment from within the Source resource. The nature of the Selector is dependent on the type of resource, as the methods to describe Segments from various media-types differ. These two entities are encapsulated in a Specific Resource.

Example Use Case: Qitara wants to associate a selection of text in a web page with a slice of a dataset. She selects both using her client, and creates Specific Resources with Selectors for both entities before associating them with one another.

Model

Term	Type	Description
selector	Relationship	The relationship between a Specific Resource and a Selector. There MAY be 0 or more `selector` relationships associated with a Specific Resource. Multiple Selectors SHOULD select the same content; however, some Selectors will not have the same precision as others. Consuming user agents MUST pick one of the described segments, if they are different.

Example

Example 1: Selectors

{
    "source": "http://example.org/page1",
    "selector": "http://example.org/paraselector1"
}
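
The cardinality rules above can be illustrated with a minimal Python sketch. It assumes that the Specific Resource has already been parsed into a dictionary; the function name `selectors_of` is hypothetical and is not defined by this document. The sketch only checks that a `source` is present and returns the selectors as a list, whether none, one, or several were given.

```python
def selectors_of(specific_resource):
    """Return the selectors of a Specific Resource as a list (possibly empty)."""
    # Exactly one `source` is required for a Specific Resource.
    if "source" not in specific_resource:
        raise ValueError("a Specific Resource needs exactly 1 'source'")
    value = specific_resource.get("selector", [])
    # A single selector may be given directly; several are given as an array.
    return value if isinstance(value, list) else [value]


example_1 = {
    "source": "http://example.org/page1",
    "selector": "http://example.org/paraselector1",
}
print(selectors_of(example_1))
```

A consuming application would then pick one of the returned selectors, for instance the first one it knows how to process.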

3.1 Fragment Selector

As the most well understood mechanism for selecting a Segment is to use the fragment part of an IRI defined by the representation's media type, it is useful to allow this as a description mechanism via a Selector. This allows existing and future fragment specifications to be used with Specific Resources in a consistent way. To be clear about which fragment type is being used, the Selector may refer to the specification that defines it.

Example Use Case: Ramona wants to associate part of a video as the description of an image. She selects the time range within the video and indicates that it describes the target. Her client then creates the Annotation using a SpecificResource with a FragmentSelector.

Model

Term	Type	Description
type	Relationship	The class of the Selector. FragmentSelectors MUST have exactly 1 `type` and the value MUST be `FragmentSelector`.
FragmentSelector	Class	A resource which describes the Segment through the use of the fragment component of an IRI.
value	Property	The contents of the fragment component of an IRI that describes the Segment. The FragmentSelector MUST have exactly 1 `value` property.
conformsTo	Relationship	The relationship between the FragmentSelector and the specification that defines the syntax of the IRI fragment in the `value` property. The Fragment Selector SHOULD have exactly 1 `conformsTo` link to the specification that defines the syntax of the fragment and MUST NOT have more than 1.

It is RECOMMENDED to use FragmentSelector as a consistent method compatible with other means of describing SpecificResources, rather than using the IRI with a fragment directly. Consuming applications SHOULD be aware of both.

The following IRIs identify some of the specifications that define the semantics of fragments, and hence may be used with the conformsTo relationship. Other IRIs MAY also be used.

Name	Fragment Specification	Description
HTML	http://tools.ietf.org/rfc/rfc3236	[rfc3236] Example: `namedSection`
PDF	http://tools.ietf.org/rfc/rfc3778	[rfc3778] Example: `page=10&viewrect=50,50,640,480`
Plain Text	http://tools.ietf.org/rfc/rfc5147	[rfc5147] Example: `char=0,10`
XML	http://tools.ietf.org/rfc/rfc3023	[rfc3023] Example: `xpointer(/a/b/c)`
RDF/XML	http://tools.ietf.org/rfc/rfc3870	[rfc3870] Example: `namedResource`
CSV	http://tools.ietf.org/rfc/rfc7111	[rfc7111] Example: `row=5-7`
Media	http://www.w3.org/TR/media-frags/	[media-frags] Example: `xywh=50,50,640,480`
SVG	http://www.w3.org/TR/SVG/	[svg11] Example: `svgView(viewBox(50,50,640,480))`
EPUB3	http://www.idpf.org/epub/linking/cfi/epub-cfi.html	[cfi] Example: `epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)`

Note

The IRI that uses the fragment may be reconstructed by concatenating the source, a #, and the value. For example, the IRI from the example below would be http://example.org/video1#t=30,60.

Example

Example 2: Fragment Selector

{
    "source": "http://example.org/video1",
    "selector": {
      "type": "FragmentSelector",
      "conformsTo": "http://www.w3.org/TR/media-frags/",
      "value": "t=30,60"
    }
}
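
The reconstruction described in the note above (source, then `#`, then value) could be sketched in Python as follows. The function name `fragment_iri` is hypothetical, and the handling of a `source` given as an object with an `id` is an assumption based on the definition of the `source` term.

```python
def fragment_iri(specific_resource):
    """Rebuild the fragment IRI from a Specific Resource with a FragmentSelector."""
    selector = specific_resource["selector"]
    if selector.get("type") != "FragmentSelector":
        raise ValueError("expected a FragmentSelector")
    source = specific_resource["source"]
    # The source may be a plain IRI or an object carrying the IRI in `id`.
    if isinstance(source, dict):
        source = source["id"]
    return source + "#" + selector["value"]


example_2 = {
    "source": "http://example.org/video1",
    "selector": {
        "type": "FragmentSelector",
        "conformsTo": "http://www.w3.org/TR/media-frags/",
        "value": "t=30,60",
    },
}
print(fragment_iri(example_2))  # http://example.org/video1#t=30,60
```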

3.2 CSS Selector

One of the most common ways to select elements in the HTML Document Object Model is to use CSS Selectors [css3-selectors]. CSS Selectors allow for a wide variety of well supported ways to describe the path to an element in a web page, and thus cover many of the basic use cases for selection. Results are not defined for when a CSS Selector is applied to a representation that does not conform to the Document Object Model.

Example Use Case: Sally selects a paragraph in a web page that she wishes to remove. Her client calculates a CSS path that cleanly identifies that element to be deleted.

Model

Term	Type	Description
type	Relationship	The class of the Selector. CssSelectors MUST have exactly 1 `type` and the value MUST be `CssSelector`.
CssSelector	Class	The type of the CSS Selector resource. CSS Selectors MUST have this class associated with them.
value	Property	The CSS selection path to the Segment. There MUST be exactly 1 `value` associated with a CSS Selector.

Note

Implementers SHOULD use only commonly supported features of CSS that directly contribute to selection of an element or content, rather than styling or transformation, in order to maximize interoperability between systems.

Example

Example 3: CSS Selector

{
    "source": "http://example.org/page1.html",
    "selector": {
      "type": "CssSelector",
      "value": "#elemid > .elemclass + p"
    }
}

3.3 XPath Selector

Another common method of selecting elements and content within a resource that supports the Document Object Model (DOM), such as documents in XML or HTML, is to use an XPath selection [dom-level-3-xpath]. XPath allows a great deal of flexibility when describing the path through the structure to the selected content. Results are not defined for when an XPath Selector is applied to a representation that does not conform to the DOM.

Note

Implementers should note that the HTML5 specification allows parsers to add elements into the DOM that are considered to be missing. XPaths SHOULD be constructed to include these elements, rather than from the element structure in the document.

Example Use Case: Teynika selects a span within a table in an HTML page and writes a note about the content. To refer explicitly to this element, her client carefully constructs an XPath to identify the relevant element.

Model

Term	Type	Description
type	Relationship	The class of the Selector. XPath Selectors MUST have exactly 1 `type` and the value MUST be `XPathSelector`.
XPathSelector	Class	The type of the XPath Selector resource. XPath Selectors MUST have this class associated with them.
value	Property	The XPath to the selected segment. There MUST be exactly 1 `value` associated with an XPath Selector.

Note

Implementers SHOULD use only commonly supported features of XPath that directly contribute to selection of an element or content in order to maximize interoperability between systems.

Example

Example 4: XPath Selector

{
    "source": "http://example.org/page1.html",
    "selector": {
      "type": "XPathSelector",
      "value": "/html/body/p[2]/table/tr[2]/td[3]/span"
    }
}

3.4 Text Quote Selector

This Selector describes a range of text by copying it, and including some of the text immediately before (a prefix) and after (a suffix) it to distinguish between multiple copies of the same sequence of characters.

For example, if the document was "abcdefghijklmnopqrstuvwxyz", one could select "efg" by a prefix of "abcd", the match of "efg" and a suffix of "hijk".

Example Use Case: Ulrika selects a typo ('anotation') in a web page and adds a comment that it should be replaced with the correct spelling ('annotation').

Model

Term	Type	Description
type	Relationship	The class of the Selector. Text Quote Selectors MUST have exactly 1 `type` and the value MUST be `TextQuoteSelector`.
TextQuoteSelector	Class	The class for a Selector that describes a textual segment by means of quoting it, plus passages before or after it. The TextQuoteSelector MUST have this class associated with it.
exact	Property	A copy of the text which is being selected, after normalization. Each TextQuoteSelector MUST have exactly 1 `exact` property.
prefix	Property	A snippet of text that occurs immediately before the text which is being selected. Each TextQuoteSelector SHOULD have exactly 1 `prefix` property, and MUST NOT have more than 1.
suffix	Property	The snippet of text that occurs immediately after the text which is being selected. Each TextQuoteSelector SHOULD have exactly 1 `suffix` property, and MUST NOT have more than 1.

The selection of the text MUST be in terms of Unicode code points (the "character number"), not in terms of code units (that number expressed using a selected data type). Selections SHOULD NOT start or end in the middle of a grapheme cluster. The selection MUST be based on the logical order of the text, rather than the visual order, especially for bidirectional text. For more information about the character model of text used on the web, see [charmod].

The text MUST be normalized before recording in the Annotation. Thus HTML/XML tags SHOULD be removed, and character entities SHOULD be replaced with the character that they encode. Note that this does not affect the state of the content of the document being annotated, only the way that the content is recorded in the Annotation document.

If, after processing the prefix, exact, and suffix, the user agent discovers multiple matching text sequences, then the selection SHOULD be treated as matching all of the matches.

Note

If the content is under copyright or has other rights asserted on its use, then this method of selecting text is potentially dangerous. For example, a user might select the entire text of the document to annotate, which would not be desirable to copy into the Annotation and share. For static texts with access and/or distribution restrictions, the use of the Text Position Selector is perhaps more appropriate.

Example

Example 5: Text Quote Selector

{
    "source": "http://example.org/page1",
    "selector": {
      "type": "TextQuoteSelector",
      "exact": "anotation",
      "prefix": "this is an ",
      "suffix": " that has some"
    }
}
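
As an illustration of the matching rules above, the following Python sketch looks for every place where the prefix, the exact text, and the suffix occur in sequence. It assumes the text has already been normalized and that the three parts must match exactly; real clients may well use fuzzier matching. The function name `find_text_quote` is hypothetical. Python strings are indexed by Unicode code point, which is consistent with the requirement above.

```python
def find_text_quote(text, selector):
    """Return (start, end) code point offsets of every match of the selector."""
    exact = selector["exact"]
    prefix = selector.get("prefix", "")
    suffix = selector.get("suffix", "")
    needle = prefix + exact + suffix
    matches = []
    pos = text.find(needle)
    while pos != -1:
        start = pos + len(prefix)  # skip over the prefix to the quoted text
        matches.append((start, start + len(exact)))
        pos = text.find(needle, pos + 1)
    return matches


document = "abcdefghijklmnopqrstuvwxyz"
selector = {"type": "TextQuoteSelector", "exact": "efg", "prefix": "abcd", "suffix": "hijk"}
print(find_text_quote(document, selector))  # [(4, 7)]
```

If the list holds more than one entry, the selection is treated as matching all of them, as stated above.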

3.5 Text Position Selector

This Selector describes a range of text by recording the start and end positions of the selection in the stream. Position 0 would be immediately before the first character, position 1 would be immediately before the second character, and so on. The start character is thus included in the selection, but the end character is not.

For example, if the document was "abcdefghijklmnopqrstuvwxyz", the start was 4, and the end was 7, then the selection would be "efg".

Example Use Case: Valeria writes a review of an ebook that does not allow its content to be extracted and copied. Her client describes the selection using its start and end position in the content.

Model

Term	Type	Description
type	Relationship	The class of the Selector. Text Position Selectors MUST have exactly 1 `type` and the value MUST be `TextPositionSelector`.
TextPositionSelector	Class	The class for a Selector which describes a range of text based on its start and end positions. The TextPositionSelector MUST have this class associated with it.
start	Property	The starting position of the segment of text. The first character in the full text is character position 0, and the character is included within the segment. Each TextPositionSelector MUST have exactly 1 `start` property, and the value MUST be a non-negative integer.
end	Property	The end position of the segment of text. The character is not included within the segment. Each TextPositionSelector MUST have exactly 1 `end` property, and the value MUST be a non-negative integer.

The text MUST be selected and normalized in the same way as for the Text Quote Selector before counting the number of characters to determine the start and end positions.

Note

The use of this Selector does not require text to be copied from the Source document into the Annotation graph, unlike the Text Quote Selector, but is very brittle with regard to changes to the resource. Any edits or dynamically transcluded content may change the selection, and thus it is RECOMMENDED that a State be additionally used to help identify the correct representation.

Example

Example 6: Text Position Selector

{
    "source": "http://example.org/ebook1",
    "selector": {
      "type": "TextPositionSelector",
      "start": 412,
      "end": 795
    }
}
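
The half-open interval described above maps directly onto string slicing in Python, whose strings are indexed by Unicode code point. The sketch below assumes normalized text and uses a hypothetical function name, `select_text_position`.

```python
def select_text_position(text, selector):
    """Return the text between `start` (inclusive) and `end` (exclusive)."""
    start, end = selector["start"], selector["end"]
    for position in (start, end):
        # Both positions must be non-negative integers.
        if not isinstance(position, int) or position < 0:
            raise ValueError("positions must be non-negative integers")
    return text[start:end]


document = "abcdefghijklmnopqrstuvwxyz"
selector = {"type": "TextPositionSelector", "start": 4, "end": 7}
print(select_text_position(document, selector))  # efg
```

Positions count code points, not code units: in the string "😀abc" the character at position 1 is "a", although the emoji occupies two UTF-16 code units.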

3.6 Data Position Selector

Similar to the Text Position Selector, the Data Position Selector uses the same properties but operates on the bytes of a bitstream rather than on the characters of a text.

Example Use Case: Wendy produces visualizations of regions of online disk images for forensic purposes. Her client generates the start and end positions from the binary stream, rather than the more human readable display she is using.

Model

Term	Type	Description
type	Relationship	The class of the Selector. Data Position Selectors MUST have exactly 1 `type` and the value MUST be `DataPositionSelector`.
DataPositionSelector	Class	The class for a Selector which describes a range of data based on its start and end positions within the byte stream. The DataPositionSelector MUST have this class associated with it.
start	Property	The starting position of the segment of data. The first byte is at position 0. Each DataPositionSelector MUST have exactly 1 `start` property.
end	Property	The end position of the segment of data. The byte at the end position is not included within the segment. Each DataPositionSelector MUST have exactly 1 `end` property.

Example

Example 7: Data Position Selector

{
    "source": "http://example.org/diskimg1",
    "selector": {
      "type": "DataPositionSelector",
      "start": 4096,
      "end": 4104
    }
}
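
A byte-level counterpart of the text case can be sketched in Python by slicing a `bytes` object. The function name `select_data_position` and the sample data are hypothetical; the representation is assumed to be already available in memory.

```python
def select_data_position(data, selector):
    """Return the bytes between `start` (inclusive) and `end` (exclusive)."""
    return data[selector["start"]:selector["end"]]


# Hypothetical 8 KiB "disk image" standing in for the retrieved representation.
disk_image = bytes(range(256)) * 32
selector = {"type": "DataPositionSelector", "start": 4096, "end": 4104}
segment = select_data_position(disk_image, selector)
print(len(segment))  # 8
```

The difference from the Text Position Selector shows up with non-ASCII text: in the UTF-8 encoding of "héllo", the character "é" occupies byte positions 1 and 2, so the data positions 1 to 3 select it, whereas the text positions would be 1 to 2.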

3.7 SVG Selector

An SvgSelector defines an area through the use of the Scalable Vector Graphics [svg11] standard. This allows the user to select a non-rectangular area of the content, such as a circle or polygon, by describing the region using SVG. The SVG may be either embedded or referenced as an External Web Resource.

Note that the SvgSelector uses SVG to select an area of a resource. Segments of an SVG representation may also be selected using selectors, including the FragmentSelector or even an SvgSelector.

Example Use Case: Xena is tagging an old map online with a diagonal region for a historical road. Her client creates an SVG polygon to highlight the region by overlaying a transparent area with a different color.

Model

Term	Type	Description
type	Relationship	The class of the Selector. SVG Selectors MUST have exactly 1 `type` and the value MUST include `SvgSelector`.
SvgSelector	Class	The class for a Selector which defines a shape for the selected area using the SVG standard. The Selector MUST have this class associated with it.
value	Property	The character sequence of the SVG content. There MAY be exactly 1 `value` property associated with the Selector, and if so the value of the property MUST be well-formed SVG XML.

The dimensions of the SVG shape or canvas MUST be relative to the dimensions of the Source, such that scaling the shape's size to the full size of the image correctly describes the desired area.

Note

Implementers SHOULD use only commonly supported features of SVG that directly contribute to describing a region, rather than styling or transformation, in order to maximize interoperability between systems. It is NOT RECOMMENDED to include style information within the SVG element, nor JavaScript, animation, text or other non-shape oriented information. Clients SHOULD ignore such information if present.

Example

Example 8: SVG Selector

{
    "source": "http://example.org/map1",
    "selector": {
      "type": "SvgSelector",
      "id": "http://example.org/svg1"
    }
}

Example 9: SVG Selector, embedded

{
    "source": "http://example.org/map1",
    "selector": {
      "type": "SvgSelector",
      "value": "<svg:svg> ... </svg:svg>"
    }
}

3.8 Range Selector

Selections made by users may be extensive and/or cross over internal boundaries in the representation, making it difficult to construct a single selector that robustly describes the correct content. A Range Selector can be used to identify the beginning and the end of the selection by using other Selectors. In this way, two points can be accurately identified using the most appropriate selection mechanisms, and then linked together to form the selection. The selection consists of everything from the beginning of the starting selector through to the beginning of the ending selector, but not including it.

Example Use Case: Yadira wants to comment on two adjacent cells in a table that is part of a web page. She selects the two cells and her client constructs XPaths to the first cell, and the cell that immediately follows the second. Her client then creates a Range Selector with the first XPath Selector as the start, and the second XPath selector as the end.

Model

Term	Type	Description
type	Relationship	The class of the Selector. Range Selectors MUST have exactly 1 `type` and the value MUST be `RangeSelector`.
RangeSelector	Class	The type of the Range Selector resource. Range Selectors MUST have this class associated with them.
startSelector	Relationship	The Selector which describes the inclusive starting point of the range. There MUST be exactly 1 `startSelector` associated with a Range Selector.
endSelector	Relationship	The Selector which describes the exclusive ending point of the range. There MUST be exactly 1 `endSelector` associated with a Range Selector. Both `startSelector` and `endSelector` SHOULD be of the same class.

Example

Example 10: Range Selector

{
    "source": "http://example.org/page1.html",
    "selector": {
      "type": "RangeSelector",
      "startSelector": {
        "type": "XPathSelector",
        "value": "//table[1]/tr[1]/td[2]"
      },
      "endSelector": {
        "type": "XPathSelector",
        "value": "//table[1]/tr[1]/td[4]"
      }
    }
}
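
The range semantics (from the beginning of the start selector up to, but not including, the beginning of the end selector) can be sketched in Python for plain text. To stay within the standard library, this sketch uses text-based selectors for the two end points instead of the XPath Selectors of Example 10; the function names and the sample text are hypothetical.

```python
def segment_start(text, selector):
    """Return the offset at which the segment described by `selector` begins."""
    kind = selector["type"]
    if kind == "TextPositionSelector":
        return selector["start"]
    if kind == "TextQuoteSelector":
        prefix = selector.get("prefix", "")
        needle = prefix + selector["exact"] + selector.get("suffix", "")
        pos = text.find(needle)
        if pos == -1:
            raise LookupError("quoted text not found")
        return pos + len(prefix)
    raise NotImplementedError(kind)


def select_range(text, range_selector):
    """Everything from the start point up to, but excluding, the end point."""
    begin = segment_start(text, range_selector["startSelector"])
    end = segment_start(text, range_selector["endSelector"])
    return text[begin:end]


range_selector = {
    "type": "RangeSelector",
    "startSelector": {"type": "TextQuoteSelector", "exact": "two"},
    "endSelector": {"type": "TextQuoteSelector", "exact": "four"},
}
print(repr(select_range("one two three four", range_selector)))  # 'two three '
```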

3.9 Refinement of Selection

It may be easier, more reliable or more accurate to specify the segment of interest of a resource as a selection of a selection, rather than as a selection of the complete resource. Particularly for resources that contain other resources, such as various packaging formats, this also allows decomposition of the selection mechanisms when the components do not have unique identifiers. This is accomplished by having selectors chained together, where each refines the results of the previous one.

Example Use Case: Zara selects a paragraph of text and then a short phrase within it to remove. Her client records the phrase as a TextQuoteSelector that further modifies a FragmentSelector used to identify the paragraph that the phrase is part of.

Model

Term	Type	Description
refinedBy	Relationship	The relationship between a broader selector and the more specific selector that should be applied to the results of the first. A Selector MAY be `refinedBy` 1 or more other Selectors. If more than 1 is given, then they are considered to be alternatives that will result in the same selection.

Example

Example 11: Selector Refinement

{
    "source": "http://example.org/page1",
    "selector": {
      "type": "FragmentSelector",
      "value": "para5",
      "refinedBy": {
        "type": "TextQuoteSelector",
        "exact": "Selected Text",
        "prefix": "text before the ",
        "suffix": " and text after it"
      }
    }
}
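
Chained refinement can be pictured as applying each selector to the result of the previous one. The Python sketch below does this for plain text, using a TextPositionSelector as the outer selector in place of the FragmentSelector of Example 11, since resolving a fragment depends on the media type. The function name `resolve` and the sample text are hypothetical. When several `refinedBy` values are present, the sketch simply takes the first, relying on the statement above that they are alternatives for the same selection.

```python
def resolve(text, selector):
    """Return (start, end) offsets into `text`, following `refinedBy` chains."""
    kind = selector["type"]
    if kind == "TextPositionSelector":
        start, end = selector["start"], selector["end"]
    elif kind == "TextQuoteSelector":
        prefix = selector.get("prefix", "")
        exact = selector["exact"]
        pos = text.find(prefix + exact + selector.get("suffix", ""))
        if pos == -1:
            raise LookupError("quoted text not found")
        start = pos + len(prefix)
        end = start + len(exact)
    else:
        raise NotImplementedError(kind)

    refinement = selector.get("refinedBy")
    if refinement is None:
        return start, end
    if isinstance(refinement, list):
        refinement = refinement[0]  # alternatives: any one of them will do
    # The refining selector only sees the segment chosen so far.
    inner_start, inner_end = resolve(text[start:end], refinement)
    return start + inner_start, start + inner_end


text = "Intro. The quick brown fox. Outro."
selector = {
    "type": "TextPositionSelector",
    "start": 7,
    "end": 27,
    "refinedBy": {
        "type": "TextQuoteSelector",
        "exact": "brown",
        "prefix": "quick ",
        "suffix": " fox",
    },
}
start, end = resolve(text, selector)
print(text[start:end])  # brown
```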

4. States

A State describes the intended state of a resource when selected, and thus provides the information needed to retrieve the correct representation of that resource. Web resources change over time, and a State might be used to describe how to recover the intended previous version. Web resources also have multiple formats, and a State might equally be used to describe how to retrieve that particular format.

The state aspect of a Web Resource requires two distinct entities:

the IRI of the overall resource; this is the same Source as used for Selectors (see 3. Selectors).
the identification for the state of the resource.

A State object is used to describe how to determine the state of interest from within the Source resource. These two entities are encapsulated in a Specific Resource.

Example Use Case: Alexandra visualizes data on a web page that changes frequently. Her client records information to allow other clients to hopefully reconstruct the original visualization.

Model

Term	Type	Description
state	Relationship	The relationship between the Specific Resource and the State. There MAY be 0 or more `state` relationships for each Specific Resource. Multiple States SHOULD select the same content; however, some States will not have the same precision as others. Consuming user agents MUST pick one of the described segments, if they are different.

States MUST be processed before processing Selector information.

Example

Example 12: State

{
    "source": "http://example.org/page1",
    "state": {
      "id": "http://example.org/state1"
    }
}

4.1 Time State

A Time State resource records the time at which the resource has the state intended for the selection, typically the time that the resource was created and/or a link to a persistent copy of the current version. The timestamp for the resource could be resolved via the Memento protocol, described in RFC 7089 [rfc7089].

Example Use Case: Britney makes a note about the current state of the front page of a news website, and flags that the page is likely to change often. Her client adds in a State with the current time to describe the version of the page.

Model

Term	Type	Description
type	Relationship	The class of the State. Time States MUST have exactly 1 `type` and the value MUST be `TimeState`.
TimeState	Class	A description of how to retrieve a representation of the Source resource that is temporally appropriate for the Annotation. The State MUST have this class associated with it.
sourceDate	Property	The timestamp at which the Source resource should be interpreted. There MAY be 0 or more `sourceDate` properties per TimeState. If there is more than 1, each gives an alternative timestamp at which the Source may be interpreted. The timestamp MUST be expressed in the `xsd:dateTime` format, and MUST use the UTC timezone expressed as "Z". If `sourceDate` is provided, then `sourceDateStart` and `sourceDateEnd` MUST NOT be provided.
sourceDateStart	Property	The timestamp that begins the interval over which the Source resource should be interpreted. There MAY be exactly 1 `sourceDateStart` property per TimeState. The timestamp MUST be expressed in the `xsd:dateTime` format, and MUST use the UTC timezone expressed as "Z". If `sourceDateStart` is provided then `sourceDateEnd` MUST also be provided.
sourceDateEnd	Property	The timestamp that ends the interval over which the Source resource should be interpreted. There MAY be exactly 1 `sourceDateEnd` property per TimeState. The timestamp MUST be expressed in the `xsd:dateTime` format, and MUST use the UTC timezone expressed as "Z". If `sourceDateEnd` is provided then `sourceDateStart` MUST also be provided.
cached	Relationship	A link to a copy of the Source resource's representation, appropriate for the application. There MAY be 0 or more `cached` relationships per TimeState. If there is more than 1, each gives an alternative copy of the representation.

Example

Example 13: Time State

{
    "source": "http://example.org/page1",
    "state": {
      "type": "TimeState",
      "cached": "http://archive.example.org/copy1",
      "sourceDate": "2015-07-20T13:30:00Z"
    }
}
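The constraints in the model table can be checked mechanically. The following Python sketch is illustrative only and not part of the model: the names `is_utc_timestamp` and `validate_time_state` are hypothetical, and the timestamp check accepts only the common `YYYY-MM-DDThh:mm:ss[.fff]Z` layouts rather than the full `xsd:dateTime` lexical space.

```python
from datetime import datetime

# Assumption: only these two layouts are accepted; xsd:dateTime allows more.
_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ")


def is_utc_timestamp(value: str) -> bool:
    """Return True if value parses as a timestamp written with a trailing "Z"."""
    for fmt in _FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def validate_time_state(state: dict) -> list:
    """Return a list of human-readable violations; empty if none were found."""
    errors = []
    if state.get("type") != "TimeState":
        errors.append("type MUST be TimeState")

    has_date = "sourceDate" in state
    has_start = "sourceDateStart" in state
    has_end = "sourceDateEnd" in state

    # sourceDate and the start/end interval are mutually exclusive.
    if has_date and (has_start or has_end):
        errors.append("sourceDate excludes sourceDateStart/sourceDateEnd")
    # The interval needs both ends.
    if has_start != has_end:
        errors.append("sourceDateStart and sourceDateEnd MUST be given together")

    for key in ("sourceDate", "sourceDateStart", "sourceDateEnd"):
        values = state.get(key, [])
        if isinstance(values, str):
            values = [values]  # a single timestamp; sourceDate may also be a list
        for value in values:
            if not is_utc_timestamp(value):
                errors.append(f"{key} is not a UTC timestamp: {value}")
    return errors
```

Applied to the `state` object of Example 13, `validate_time_state` returns an empty list.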

4.2 Request Header State

As there are potentially many representations that can be delivered from a resource with a single IRI, and a selection may only apply to one of them, it is important to be able to record the HTTP Request headers that need to be sent to retrieve the correct representation. The HttpRequestState resource maintains a copy of the headers to be replayed when obtaining the representation.

Example Use Case: Carla retrieves a PDF representation of a web resource that can deliver HTML, PDF or plain text and then writes a description about it. She signals that her description is only about the PDF representation. Her client then includes a State to describe how to retrieve the target representation.

Model

Term	Type	Description
type	Relationship	The class of the State. Request Header States MUST have exactly 1 `type` and the value MUST be `HttpRequestState`.
HttpRequestState	Class	A description of how to retrieve an appropriate representation of the Source resource, based on the HTTP Request headers to send on the request. The State MUST have this class associated with it.
value	Property	The HTTP request headers to send as a single, complete string. An HttpRequestState MUST have exactly 1 `value` property.

Note

The representation retrieved from the server by the original annotator's client might not be completely determined by request headers alone. For example, the IP address of the client might also determine the language of the representation, based on the language of the country the user was present in at the time. If the server returns a Content-Location header, then the client might instead use it as the target of the Annotation, rather than the IRI that was requested.

Example

Example 14: HTTP Request State

{
    "source": "http://example.org/resource1",
    "state": {
      "type": "HttpRequestState",
      "value": "Accept: application/pdf"
    }
}
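As a rough illustration of how a client might replay the recorded headers, the Python sketch below turns the `value` string into a header mapping and attaches it to a request object. The function names are hypothetical. The model only says that `value` is a single, complete string; treating line breaks as the separator between multiple headers is an assumption of this sketch, not something the model defines.

```python
from urllib.request import Request


def headers_from_state(state: dict) -> dict:
    """Split the HttpRequestState value into a {name: value} mapping.

    Assumption: if several headers are present, they are separated by
    line breaks, as in a raw HTTP request.
    """
    if state.get("type") != "HttpRequestState":
        raise ValueError("not an HttpRequestState")
    headers = {}
    for line in state["value"].splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip()] = value.strip()
    return headers


def build_request(specific_resource: dict) -> Request:
    """Prepare (but do not send) a request for the Source with the recorded headers."""
    return Request(
        specific_resource["source"],
        headers=headers_from_state(specific_resource["state"]),
    )
```

For Example 14, `build_request` yields a request for `http://example.org/resource1` carrying `Accept: application/pdf`; sending it, for instance with `urllib.request.urlopen`, is left to the caller.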

4.3 Refinement of State

Similar to the refinement of selection, it may be easier, more reliable or more accurate to specify the appropriate state of the resource as a hierarchy of atomic State resources. This is particularly appropriate for representing the combination of a State that reflects an internal transformation along with the results of a State that describes an external request. This decomposition is accomplished by having the states chained together in the same way as Selectors.

Further, given that the State(s) will likely result in a specific representation, there may be specific Selectors that are appropriate for describing the segment of the representation. In order to accommodate this, States may also be refined by Selectors.

Example Use Case: Devina writes a comment about a travel e-book which has many versions available over time, and is available in different formats. She is particularly commenting on a specific version and format, so her client adds both a TimeState to capture the time and an HttpRequestState to capture the format.

Model

Term	Type	Description
refinedBy	Relationship	The relationship between a broader State and either a more specific State or a Selector that SHOULD be applied to the results of the first. Each State MAY be `refinedBy` 1 or more other States or Selectors. If more than 1 is given, then they are considered to be alternatives that produce the same result.

Example

Example 15: Refinement of States

{
    "source": "http://example.org/ebook1",
    "state": {
      "type": "TimeState",
      "sourceDate": "2016-02-01T12:05:23Z",
      "refinedBy": {
        "type": "HttpRequestState",
        "value": "Accept: application/epub+zip"
      }
    }
}
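A consumer typically needs the chain of States in the order in which they apply. The Python sketch below is one possible way to flatten the `refinedBy` nesting; the function name is hypothetical, and picking the first entry when `refinedBy` holds several alternatives is a simplifying assumption of this sketch.

```python
def flatten_refinements(state: dict) -> list:
    """Return the States (or Selectors) of a refinedBy chain, outermost first."""
    chain = []
    current = state
    while current is not None:
        if isinstance(current, list):
            # Assumption: alternatives are equivalent, so take the first one.
            current = current[0]
        # Copy every key except the nested refinement itself.
        chain.append({k: v for k, v in current.items() if k != "refinedBy"})
        current = current.get("refinedBy")
    return chain
```

For Example 15 this gives the TimeState first and the HttpRequestState second, which matches the order in which a client would resolve them.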

5. Selectors and States as Fragment Identifiers

Although Selectors and States provide a flexible way of identifying, e.g., a suitable Segment of a Resource, the fact that this is defined through an indirection using a Specific Resource may be an obstacle for some applications. For example, many RDF tools rely on a single IRI to identify and dereference a given resource, and the extra indirection introduced by Selectors and States would be considered to be a problem.

To mitigate this issue, a mapping of Selectors and States onto IRI fragments [url] is defined below. As a result of this mapping the targeted Segment, or the relevant state, is expressed in a single (albeit complex) IRI. In that IRI the Selector, respectively the State, is expressed as a single string and serves as a fragment combined with the IRI of the Source. Note that this representation is valid only if the IRI for the Source does not contain a fragment identifier of its own (an IRI may contain at most one fragment identifier).

The syntax for mapping a Selector, respectively a State, follows the same, “functional” syntax as used, for example, by the XPointer Framework [xptr-framework]:

- The fragment uses the selector(…), respectively the state(…), functional syntax.
- The (comma separated) “parameters” of the functional notation are:
  - for the keys refinedBy, startSelector, and endSelector, the syntax is key=selector(…), respectively key=state(…) when appropriate, with the value following, recursively, the same syntax as the full fragment;
  - otherwise the key and the corresponding value follow the simple key=value syntax, e.g., type=FragmentSelector.
- For types and properties other than those specified in the Web Annotation model, the full IRI MUST be used.

(see the examples below.)

The values SHOULD be percent encoded [rfc3986]; characters that may make the fragment ambiguous MUST be percent encoded, namely:

character	code
space	%20
`=`	%3D
`,`	%2C
`#`	%23
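The mapping rules and the encoding table above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the names `encode_value`, `to_fragment`, and `to_iri` are hypothetical; the set of characters left unencoded was chosen only to match the examples in this document; parentheses inside values are encoded as a precaution; and a nested object is serialized with `state(…)` when its `type` ends in "State". Arrays of alternatives are not handled.

```python
from urllib.parse import quote

NESTED_KEYS = {"refinedBy", "startSelector", "endSelector"}

# Assumption: these characters stay readable, as in the examples of this
# document. Space, "=", "," and "#" are not listed, so quote() encodes them.
SAFE = "/:+[]<>"


def encode_value(value) -> str:
    """Percent-encode a simple value (string or number)."""
    return quote(str(value), safe=SAFE)


def to_fragment(obj: dict, kind: str = "selector") -> str:
    """Serialize a Selector or State object using the functional syntax."""
    parts = []
    for key, value in obj.items():
        if key in NESTED_KEYS:
            # Assumption: type names ending in "State" denote States.
            nested = "state" if str(value.get("type", "")).endswith("State") else "selector"
            parts.append(f"{key}={to_fragment(value, nested)}")
        else:
            parts.append(f"{key}={encode_value(value)}")
    return f"{kind}({','.join(parts)})"


def to_iri(specific_resource: dict) -> str:
    """Combine the Source with its Selector or State into a single IRI."""
    source = specific_resource["source"]
    if "#" in source:
        raise ValueError("the Source IRI already has a fragment identifier")
    if "selector" in specific_resource:
        return source + "#" + to_fragment(specific_resource["selector"], "selector")
    return source + "#" + to_fragment(specific_resource["state"], "state")
```

Given the JSON of Example 14, `to_iri` returns `http://example.org/resource1#state(type=HttpRequestState,value=Accept:%20application/pdf)`.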

Note

A fragment identifier is defined for a specific media type. This means that, formally, the fragment identifier syntax and semantics defined in this section should be registered for each media type separately by IANA. Until such a registration is done, these fragment identifiers have the potential to conflict with other fragments possibly specified by the media type registrations. Consequently, this pattern should only be used when the implementation cannot produce or manage the full representation described above.

5.1 JSON examples converted to fragment identifiers

This section contains a mapping of all examples used in the definition of Selectors and States onto full IRIs with fragment identifiers. Note that the examples below have been, in some cases, broken into several lines for greater readability; in real usage such line breaks are not allowed in an IRI.

Note

A simple converter tool is also available to test the conversion of the JSON format to fragment and back.

Example for a 3.1 Fragment Selector

Example 16: Fragment Selector as Fragment

http://example.org/video1
    #selector(type=FragmentSelector,conformsTo=http://www.w3.org/TR/media-frags/,
              value=t%3D30%2C60)

Example for a 3.2 CSS Selector

Example 17: CSS Selector as Fragment

http://example.org/page1.html
    #selector(type=CssSelector,value=%23elemid%20>%20.elemclass%20+%20p)

Example for a 3.3 XPath Selector

Example 18: XPath Selector as Fragment

http://example.org/page1.html
    #selector(type=XPathSelector,value=/html/body/p[2]/table/tr[2]/td[3]/span)

Example for a 3.4 Text Quote Selector

Example 19: Text Quote Selector as Fragment

http://example.org/page1
    #selector(type=TextQuoteSelector,exact=annotation,prefix=this%20is%20an%20,
              suffix=%20that%20has%20some)

Example for a 3.5 Text Position Selector

Example 20: Text Position Selector as Fragment

http://example.org/ebook1
    #selector(type=TextPositionSelector,start=412,end=795)

Example for a 3.6 Data Position Selector

Example 21: Data Position Selector as Fragment

http://example.org/diskimg1
    #selector(type=DataPositionSelector,start=4096,end=4104)

First example for a 3.7 SVG Selector

Example 22: SVG Selector as Fragment, referring to an external SVG

http://example.org/map1
    #selector(type=SvgSelector,id=http://example.org/svg1)

Second example for a 3.7 SVG Selector

Example 23: SVG Selector as Fragment, using embedded SVG

http://example.org/map1
    #selector(type=SvgSelector,
              value=<svg:svg>%20...%20</svg:svg>)

Please note that long SVG representations will produce very long URLs when produced according to this pattern. Care should be taken in environments where there is a character limit to URLs, and implementers should consider publishing the SVG as a separate resource and using its IRI as shown in Example 22.

Example for a 3.8 Range Selector

Example 24: Range Selector as Fragment

http://example.org/page1.html
    #selector(type=RangeSelector,
              startSelector=selector(type=XPathSelector,value=//table[1]/tr[1]/td[2]),
              endSelector=selector(type=XPathSelector,value=//table[1]/tr[1]/td[4]))

Example for a 3.9 Refinement of Selection

Example 25: Selector Refinement as Fragment

http://example.org/page1
    #selector(type=FragmentSelector,value=para5,
              refinedBy=selector(type=TextQuoteSelector,exact=Selected%20Text,
                        prefix=text%20before%20the%20,suffix=%20and%20text%20after%20it))

Example for a 4.1 Time State

Example 26: Time State as Fragment

http://example.org/page1
    #state(type=TimeState,cached=http://archive.example.org/copy1,
           sourceDate=2015-07-20T13:30:00Z)

Example for a 4.2 Request Header State

Example 27: HTTP Request State as Fragment

http://example.org/resource1
    #state(type=HttpRequestState,value=Accept:%20application/pdf)

Example for a 4.3 Refinement of State

Example 28: Refinement of States as Fragment

http://example.org/ebook1
    #state(type=TimeState,sourceDate=2016-02-01T12:05:23Z,
           refinedBy=state(type=HttpRequestState,value=Accept:%20application/epub+zip))
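The reverse direction, from a fragment back to a JSON-like structure, can be sketched along the same lines. The Python code below is a hedged illustration with hypothetical names (`split_top_level`, `parse_fragment`). It assumes that any literal parenthesis inside a value has been percent encoded, which the encoding table does not strictly require, and it returns every simple value as a string (so `start=412` comes back as `"412"`).

```python
from urllib.parse import unquote

NESTED_KEYS = {"refinedBy", "startSelector", "endSelector"}


def split_top_level(body: str) -> list:
    """Split on commas that are not inside a nested selector(...)/state(...)."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_fragment(fragment: str) -> dict:
    """Parse selector(...) or state(...) into a dictionary."""
    # The examples are wrapped for readability; drop that whitespace first.
    fragment = "".join(fragment.split())
    name, _, rest = fragment.partition("(")
    if name not in ("selector", "state") or not rest.endswith(")"):
        raise ValueError(f"not a selector/state fragment: {fragment!r}")
    result = {}
    for part in split_top_level(rest[:-1]):
        key, _, raw = part.partition("=")
        if key in NESTED_KEYS:
            result[key] = parse_fragment(raw)  # recurse into nested syntax
        else:
            result[key] = unquote(raw)
    return result
```

Parsing the fragment of Example 25 recovers the nested `refinedBy` structure, including the leading and trailing spaces of `prefix` and `suffix`.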

5.2 Serializing IRI to URL

Care should be taken when using Selectors and States IRIs as URLs (i.e., not only as identifiers but also as locators): each segment of the IRI must be mapped to a corresponding URL segment following [rfc3987]. Applying percent encoding to the entire IRI string at once may also cause unnecessary trouble. Some examples:

Example 29: Text Quote Selector in Japanese as an IRI and a URL, respectively

http://jp.example.org/page1
    #selector(type=TextQuoteSelector,
	      exact=ペンを,
	      prefix=私は、,
	      suffix=持っています)

http://jp.example.org/page1
    #selector(type=TextQuoteSelector,
	      exact=%E3%83%9A%E3%83%B3%E3%82%92,
	      prefix=%E7%A7%81%E3%81%AF%E3%80%81,
	      suffix=%E6%8C%81%E3%81%A3%E3%81%A6%E3%81%84%E3%81%BE%E3%81%99)

Example 30: Percent Encoded Text Quote Selector as an IRI and a URL, respectively

http://example.org/page1
    #selector(type=TextQuoteSelector,exact=annotation,
	      prefix=this%20is%20an%20,suffix=%20that%20has%20some)

http://example.org/page1
    #selector(type=TextQuoteSelector,exact=annotation,
	      prefix=this%2520is%2520an%2520,suffix=%2520that%2520has%2520some)

Note that the IRI may also contain an internationalized domain name, which must be encoded as well (see [rfc3490]).
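The two examples above can be reproduced with a few lines of Python. This sketch only covers the fragment part; the function name `iri_to_url` and the choice of characters left untouched are assumptions made for illustration, and the mapping of an internationalized domain name or a non-ASCII path is deliberately out of scope.

```python
from urllib.parse import quote

# Assumption: the delimiters of the functional syntax and a few common
# value characters are kept as they are; everything else outside the
# unreserved set, including "%", is percent encoded as UTF-8 bytes.
SAFE = "()=,:/+"


def iri_to_url(iri: str) -> str:
    """Percent-encode the fragment of a Selector/State IRI, leaving the rest as is."""
    base, sep, fragment = iri.partition("#")
    return base + sep + quote(fragment, safe=SAFE)
```

With this sketch, `exact=ペンを` becomes `exact=%E3%83%9A%E3%83%B3%E3%82%92` as in Example 29, and an already encoded `%20` becomes `%2520` as in Example 30.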

6. Examples in Turtle

This section contains all the examples used in the definition of Selectors and States expressed in [turtle], using the RDF vocabulary terms as defined in [annotation-vocab]. Note that, in contrast to the JSON examples, all examples below include the type definition, as is accepted practice in Linked Data environments. The namespaces used in the examples are:

Prefix	Namespace	Description
oa	http://www.w3.org/ns/oa#	[annotation-model]
dcterms	http://purl.org/dc/terms/	[dcterms]
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#	[rdf-schema]

Example for a 3.1 Fragment Selector

Example 31: Fragment Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/video1> ;
     oa:hasSelector [
        a oa:FragmentSelector ;
        dcterms:conformsTo <http://www.w3.org/TR/media-frags/> ;
        rdf:value "t=30,60"
    ]  .

Example for a 3.2 CSS Selector

Example 32: CSS Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/page1.html> ;
     oa:hasSelector [
        a oa:CssSelector ;
        rdf:value "#elemid > .elemclass + p"
     ] .

Example for a 3.3 XPath Selector

Example 33: XPath Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/page1.html> ;
     oa:hasSelector [
        a oa:XPathSelector ;
        rdf:value "/html/body/p[2]/table/tr[2]/td[3]/span"
     ] .

Example for a 3.4 Text Quote Selector

Example 34: Text Quote Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/page1> ;
     oa:hasSelector [
        a oa:TextQuoteSelector ;
        oa:exact "annotation" ;
        oa:prefix "this is an " ;
        oa:suffix " that has some"
     ].

Example for a 3.5 Text Position Selector

Example 35: Text Position Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/ebook1> ;
     oa:hasSelector [
        a oa:TextPositionSelector ;
        oa:start 412 ;
        oa:end 795
     ].

Example for a 3.6 Data Position Selector

Example 36: Data Position Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/diskimg1> ;
     oa:hasSelector [
        a oa:DataPositionSelector ;
        oa:start 4096 ;
        oa:end 4104
    ].

First example for a 3.7 SVG Selector

Example 37: SVG Selector in Turtle, referring to an external SVG

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/map1> ;
     oa:hasSelector <http://example.org/svg1> .
<http://example.org/svg1> a oa:SvgSelector.

Second example for a 3.7 SVG Selector

Example 38: SVG Selector in Turtle, using embedded SVG

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/map1> ;
     oa:hasSelector [
        a oa:SvgSelector ;
        rdf:value "<svg:svg> ... </svg:svg>"
     ] .

Example for a 3.8 Range Selector

Example 39: Range Selector in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/page1> ;
     oa:hasSelector [
        a oa:RangeSelector ;
        oa:hasStartSelector [
           a oa:XPathSelector ;
           rdf:value "//table[1]/tr[1]/td[2]"
        ] ;
        oa:hasEndSelector [
           a oa:XPathSelector ;
           rdf:value "//table[1]/tr[1]/td[4]"
        ]
     ] .

Example for a 3.9 Refinement of Selection

Example 40: Selector Refinement in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/page1> ;
     oa:hasSelector [
        a oa:FragmentSelector ;
        rdf:value "para5" ;
        oa:refinedBy [
            a oa:TextQuoteSelector ;
            oa:exact "Selected Text" ;
            oa:prefix "text before the " ;
            oa:suffix " and text after it"
        ]
     ] .

Example for a 4.1 Time State

Example 41: Time State in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/page1> ;
     oa:hasState [
        a oa:TimeState ;
        oa:cachedSource <http://archive.example.org/copy1> ;
        oa:sourceDate "2015-07-20T13:30:00Z"
    ] .

Example for a 4.2 Request Header State

Example 42: HTTP Request State in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/resource1> ;
     oa:hasState [
        a oa:HttpRequestState ;
        rdf:value "Accept: application/pdf"
     ] .

Example for a 4.3 Refinement of State

Example 43: Refinement of States in Turtle

[] a oa:ResourceSelection;
     oa:hasSource <http://example.org/ebook1> ;
     oa:hasState [
        a oa:TimeState ;
        oa:sourceDate "2016-02-01T12:05:23Z" ;
        oa:refinedBy [
           a oa:HttpRequestState ;
           rdf:value "Accept: application/epub+zip"
        ]
     ] .

A. Correspondence Among Media Types and Selectors

Media Type	Fragment	CSS	XPath	Text Quote	Text Position	Data Position	SVG
HTML (text/html)	✔︎	✔︎	✔︎	✔︎	✔︎	✘	✘
CSV (text/csv)	✔︎	✘	✘	✔︎	✔︎	✘	✘
Plain Text (text/plain)	✔︎	✘	✘	✔︎	✔︎	✘	✘
Other text files (text/*)	?	✘	✘	✔︎	✔︎	✘	✘
EPUB2, EPUB3 (application/epub+zip)	✔︎	✘	✘	✔︎	✘	✘	✘
PDF (application/pdf)	✔︎	✘	✘	✔︎	✔︎	✘	✘
XML (application/xml, application/*+xml)	✔︎	✔︎	✔︎	✔︎	✔︎	✘	✘
SVG (image/svg+xml)	✔︎	✔︎	✔︎	✔︎	✔︎	✘	✔︎
Image, other than SVG (image/gif, image/jpeg, image/png, image/tiff)	✔︎	✘	✘	✘	✘	?	✔︎
Video (video/*)	✔︎	✘	✘	✘	✘	?	✔︎
Binary Data Files	?	✘	✘	✘	✘	✔︎	✘

A.1 Additional Media Types/Selector Combination

Media Type	Fragment	CSS	XPath	Text Quote	Text Position	Data Position	SVG
CSS (text/css)	✘	✘	✘	✔︎	✔︎	✘	✘
TSV (text/tab-separated-values)	✔︎^✝	✘	✘	✔︎	✔︎	✘	✘
RDF/Turtle (text/turtle)	✔︎^✝	✘	✘	?	?	✘	✘
JSON (application/json, application/*+json)	✘	✘	✘	✔︎	?	✘	✘
Programming languages (application/javascript, python files, etc.)	✘	✘	✘	✔︎	?	✘	✘
^✝Fragments are not formally defined through IETF, though there are well-known connections to existing fragments or practices

B. Index of JSON Terms

Term	Usage
cached	Time State
conformsTo	Fragment Selector
end	Text Position Selector, Data Position Selector
endSelector	Range Selector
exact	Text Quote Selector
prefix	Text Quote Selector
refinedBy	Selector, State
selector	Specific Resource
source	Specific Resource
sourceDate	Time State
sourceDateEnd	Time State
sourceDateStart	Time State
start	Text Position Selector, Data Position Selector
startSelector	Range Selector
state	Specific Resource
suffix	Text Quote Selector
type	Specific Resource, Fragment Selector, CSS Selector, XPath Selector, Text Quote Selector, Text Position Selector, Data Position Selector, SVG Selector, Time State, Request Header State. Note: every object MAY have a `type`.
value	Fragment Selector, CSS Selector, SVG Selector, XPath Selector, Request Header State
