This document defines the procedures and rules to be applied when converting tabular data into JSON. Tabular data may be complemented with metadata annotations that describe its structure, the meaning of its content and how it may form part of a collection of interrelated tabular data. This document specifies the effect of this metadata on the resulting JSON.

The CSV on the Web Working Group was chartered to produce a recommendation "Access methods for CSV Metadata" as well as recommendations for "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various formats (e.g., RDF, JSON, or XML)". This document aims to satisfy the JSON variant of the mapping recommendation.

Introduction

This document describes the processing of tabular data to create a set of nested objects that MUST be serialized as JSON [[!RFC7159]].

The conversion of CSV content to JSON is intended for web developers who need not care about the complexities of RDF [[!rdf11-concepts]]. Where the formality of RDF is required, [[!csv2rdf]] provides the procedures for mapping from CSV content to RDF which may be serialized to [[json-ld]].

The [[!tabular-data-model]] defines an annotated tabular data model consisting of tables, columns, rows, and cells, enriched with annotations that describe the structure of the tabular data and the meaning of its content. A group of tables is a collection of tables published as a single atomic unit.

The conversion procedure described in this specification operates on the annotated tabular data model. This specification does not specify the processes needed to convert CSV-encoded data into tabular data form. Please refer to [[!tabular-data-model]] for details of parsing tabular data.

Conversion applications MUST provide at least two modes of operation: standard and minimal.

Standard mode conversion frames the information gleaned from the cells of the tabular data with details of the rows, tables, and a group of tables within which that information is provided.

Minimal mode conversion includes only the information gleaned from the cells of the tabular data within the output.

Standard and minimal conversion are described normatively below.

Conversion applications MAY offer additional implementation specific conversion modes.

Transformation definitions, as defined in [[!tabular-metadata]] MAY be used to specify how tabular data can be transformed into another format using a script or template. Such transformation definitions MAY use the JSON output described in this specification as input.

Tabular data MUST conform to the description from [[!tabular-data-model]]. In particular note that each row MUST contain the same number of cells (although some of these cells may be empty).

Not all CSV-encoded data can be parsed into a tabular data model. An algorithm for parsing CSV-based files is described in [[!tabular-data-model]].

Typographical conventions

Converting Tabular Data to JSON

The procedures for converting tabular data into JSON are described below for both standard and minimal modes.

Algorithm terms

about URL
The about URL annotation on the current cell. As defined in [[!tabular-data-model]].
annotated table
The annotated table is defined in [[!tabular-data-model]] as describing a particular table and its annotations.
array
An array is defined in JSON ([[!RFC7159]]) as an ordered sequence of zero or more values, where a value is a string, number, boolean, null, object, or array.
cell
A cell is defined in [[!tabular-data-model]] as the intersection of a row and a column within a table.
cell errors
Cell errors are defined in [[!tabular-data-model]] as a (possibly empty) list of validation errors generated while parsing the literal content of a cell to generate the semantic value.
cell value
A cell value is defined in [[!tabular-data-model]] as the semantic value of the cell; this MAY be null or a sequence of values.
column
A column is defined in [[!tabular-data-model]] as a vertical arrangement of cells within a table.
group of tables
A group of tables is defined in [[!tabular-data-model]] as comprising a set of annotated tables and a set of annotations that relate to that group.
group of tables identifier
The group of tables identifier is the id annotation on a group of tables. As defined in [[!tabular-data-model]].
name
In the context of this specification, name is used as defined in JSON ([[!RFC7159]]); that is, that name is a string that provides a unique key within a set of name-value pairs within a JSON object.
non-core annotations
Core annotations are listed in [[!tabular-data-model]]; groups of tables and tables may also have other annotations that are not defined in that specification; these are known as non-core annotations.
notes
A list of notes, as defined in [[!tabular-data-model]], attached to an annotated table or group of tables using the notes property. This may be an empty list.
object
An object is defined in JSON ([[!RFC7159]]) as an unordered collection of zero or more name-value pairs, where name is a string and value is a string, number, boolean, null, object, or array.
property URL
The property URL annotation on the current cell. As defined in [[!tabular-data-model]].
row
The row is defined in [[!tabular-data-model]] as a horizontal arrangement of cells within a table.
row number
A row number is defined in [[!tabular-data-model]] as the position of the row within the table, starting from 1.
row source number
A row source number is defined in [[!tabular-data-model]] as the position of the row within the source tabular data file. Provision of the row source number is dependent on parsing applications and may be reported as null.
subject
Within this algorithm, a subject is the resource that the value of a given cell refers to. This may be specified using about URL.
table identifier
The table identifier is the id annotation on an annotated table. As defined in [[!tabular-data-model]].
value URL
The value URL annotation on the current cell. As defined in [[!tabular-data-model]].

Generating JSON

A conformant JSON conversion application MUST produce output conforming to this algorithm according to the chosen mode of conversion: standard or minimal.

Where an annotated table is defined in isolation (e.g. in the absence of a group of tables), a default group of tables is provided with a single tables annotation that refers to the given table.

The [[!tabular-data-model]] specifies that string values within tabular data (such as column titles or cell string values) MUST contain only Unicode characters. No Unicode normalization (as specified in [[UAX15]]) is applied to these string values during the conversion to JSON.

Minimal mode

The steps in the algorithm defined here apply to minimal mode.

  1. Insert an empty array A into the JSON output. The objects containing the name-value pairs associated with the cell values will be subsequently inserted into this array.

  2. Each table is processed sequentially in the order they are referenced in the group of tables. For each table where the suppress output annotation is false:

    1. Each row within the table is processed sequentially in order. For each row in the current table:

      1. Generate a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row, as described in .

        The subject(s) described by each row are determined according to the about URL annotation for each cell in the current row. Where about URL is undefined, a default subject for the row is used.

      2. As described in , process the sequence of objects, S1 to Sn, to produce a new sequence of root objects, SR1 to SRm, that MAY include nested objects.

        A row MAY describe multiple interrelated subjects; where the value URL annotation on one cell matches the about URL annotation on another cell in the same row.

      3. Insert each root object, SR1 to SRm, into array A.

Standard mode

The steps in the algorithm defined here apply to standard mode.

  1. Insert an empty object G into the JSON output which is associated with the group of tables.

  2. If the group of tables has an identifier IG; insert the following name-value pair into object G:

    name
    @id
    value
    IG
  3. Insert any notes and non-core annotations specified for the group of tables into object G according to the rules provided in .

  4. Insert the following name-value pair into object G:

    name
    tables
    value
    AT

    where AT is an array into which the objects describing the annotated tables will be subsequently inserted.

  5. Each table is processed sequentially in the order they are referenced in the group of tables.

    For each table where the suppress output annotation is false:

    1. Insert an empty object T into the array AT to represent the table.

    2. If the table has an identifier IT; insert the following name-value pair into object T:

      name
      @id
      value
      IT
    3. Specify the source tabular data file URL for the current table based on the url annotation; insert the following name-value pair into object T:

      name
      url
      value
      URL
    4. Insert any notes and non-core annotations specified for the table into object T according to the rules provided in .

      All other core annotations for the table are ignored during the conversion; including information about table schemas and their columns, foreign keys, table direction, transformations, etc.

    5. Insert the following name-value pair into object T:

      name
      row
      value
      AR

      where AR is an array into which the objects describing the rows will be subsequently inserted.

    6. Each row within the table is processed sequentially in order. For each row in the current table:

      1. Insert an empty object R into the array AR to represent the row.

      2. Specify the row number n for the row; insert the following name-value pair into object R:

        name
        rownum
        value
        n
      3. Specify the row source number nsource for the row within the source tabular data file URL using a fragment-identifier as specified in [[RFC7111]]; if row source number is not null, insert the following name-value pair into object R:

        name
        url
        value
        URL#row=nsource
      4. Specify any titles for the row; if row titles is not null, insert the following name-value pair into object R:

        name
        titles
        value
        t

        where t is the single value or array of values provided by the row titles annotation.

        JSON has no native support for expressing language information; therefore any such information associated with the row titles is ignored.

      5. Insert any non-core annotations specified for the row into object R according to the rules provided in .

      6. Insert the following name-value pair into object R:

        name
        describes
        value
        A

        where A is an array. The objects containing the name-value pairs associated with the cell values will be subsequently inserted into this array.

      7. Generate a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row, as described in .

        The subject(s) described by each row are determined according to the about URL annotation for each cell in the current row. Where about URL is undefined, a default subject for the row is used.

      8. As described in , process the sequence of objects, S1 to Sn, to produce a new sequence of root objects, SR1 to SRm, that MAY include nested objects.

        A row MAY describe multiple interrelated subjects; where the value URL annotation on one cell matches the about URL annotation on another cell in the same row.

      9. Insert each root object, SR1 to SRm, into array A.

Generating Objects

The steps in the algorithm defined here apply to both standard and minimal modes.

This algorithm generates a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row. The algorithm inserts name-value pairs into Si depending on the cell values as outlined in the following steps.

  1. Determine the unique subjects for the current row. The subject(s) described by each row are determined according to the about URL annotation for each cell in the current row. A default subject for the row is used for any cells where about URL is undefined.

  2. For each subject that the current row describes where at least one of the cells that refers to that subject has a value or value URL that is not null, and is associated with a column where suppress output annotation is false:

    1. Create an empty object Si to represent the subject i.

      (i is the index number with values from 1 to n, where n is the number of subjects for the row)

      Subject i is identified according to the about URL annotation of its associated cells: IS. For a default subject where about URL is not specified by its cells, IS is null.

    2. If the identifier for subject i, IS, is not null, then insert the following name-value pair into object Si:

      name
      @id
      value
      IS
    3. Each cell referring to subject i is then processed sequentially according to the order of the columns.

      For each cell referring to subject i, where the suppress output annotation for the column associated with that cell is false, insert a name-value pair into object Si as described below:

      1. If the value of property URL for the cell is not null, then name N takes the value of property URL compacted according to the rules as defined in URL Compaction in [[!tabular-metadata]].

        Else, name N takes the URI decoded value of the name annotation for the column associated with the cell. (URI decoding is necessary as name may have been encoded if it was taken from a supplied title.)

      2. If the value URL for the current cell is not null, then insert the following name-value pair into object Si:

        name
        N
        value
        Vurl

        where Vurl is the value of value URL annotation for the current cell expressed as a string in the JSON output. If N is @type, compact Vurl according to the rules as defined in URL Compaction in [[!tabular-metadata]].

      3. Else, if the cell value is a list that is not empty, then the cell value provides a sequence of values for inclusion within the JSON output; insert an array Av containing each value V of the sequence into object Si:

        name
        N
        value
        Av

        Each of the values V derived from the sequence MUST be expressed in the JSON output according to the datatype of V as defined below in .

        Since arrays are implicitly ordered in JSON, the ordered annotation has no effect on the JSON output.

      4. Else, if the cell value is not null, then the cell value provides a single value V for inclusion within the JSON output; insert the following name-value pair into object Si:

        name
        N
        value
        V

        Value V derived from the cell values MUST be expressed in the JSON output according to the datatype of the value as defined in .

    4. If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N MUST be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs. Where the value from one or more contributing name-value pairs is of type array, the values from contributing arrays are included directly to the resulting array (i.e. arrays of values are flattened).

Generating Nested Objects

The steps in the algorithm defined herein apply to both standard and minimal modes.

Where the current row describes multiple subjects, it MAY be possible to organize the objects associated with those subjects such that some objects are nested within others; e.g. where the value URL annotation for one cell matches the about URL annotation for another cell in the same row. This algorithm considers a sequence of objects generated according to , S1 to Sn, each of which corresponds to a subject described by the current row. It generates a new sequence of root objects, SR1 to SRm, that MAY include nested objects.

Where the current row describes only a single subject, this algorithm may be bypassed as no nesting is possible. In such a case, the root object SR1 is identical to the original object S1.

This nesting algorithm is based on the interrelationships between subjects described within a given row that are specified using the value URL annotation. Cell values expressing the identity of a subject in the current row (i.e., as a simple literal) will be ignored by this algorithm.

The algorithm uses the following terms:

child
If two vertices are connected in a tree, the one which is further away from the root of the tree is referred to as the child of the other.
descendant
A vertex N is a descendant of a vertex M if either N is the child M, or there are vertices V1,…,Vk, such that V1=M, Vk=N, and Vk+1 is a child of Vk.
edge
One of the main constituents of graphs; edges are used to establish relationships among vertices. In the context of this algorithm, edges are expressed in JSON using a name-value pair whose value is another object or an array of objects.
forest
A collection of disjoint trees. For the purpose of this algorithm, the order of trees is important, i.e., forests can also be viewed as an ordered sequence of roots.
graph
Data structure consisting of vertices (or "nodes") and edges. See, for example, [[Knuth]] for further details.
node
Synonym of vertex.
root
A dedicated vertex in a tree; a root is not the child of any vertex.
tree
A tree (or rooted tree) is a connected, acyclic graph where one vertex has been designated as the root, in which case the edges have a natural orientation towards or away from the root.
vertex
One of the main constituents of graphs; a vertex usually holds further information or data. In the context of this algorithm, vertices are used to represent the JSON objects.

The nesting algorithm is defined as follows:

  1. For all cells in the current row, determine the value URLs, Vurl, that occur only once. The list of these uniquely occurring value URLs is referred to as the URL-list.

  2. Create an empty forest F. Vertices in the trees of this forest represent the subjects described by the current row.

  3. For each object Si in the sequence S1 to Sn:

    1. Determine the identity of object Si: IS. If present in object Si, the name-value pair with name @id provides the value of IS. Else, object Si is not explicitly identified and IS is null.

    2. Check whether there is a vertex N in forest F that represents object Si. If none of the existing vertices in forest F represent object Si, then insert a new tree into forest F whose root is a vertex N that represents object Si and has identity IS.

    3. For all cells associated with the current object Si (e.g. whose about URL annotation matches IS):

      1. If the value URL annotation of the current cell is defined and its value, Vurl, appears in the URL-list, then check each of the other objects in the sequence S1 to Sn to determine if Vurl identifies one of those objects.

        For object Sj, if the name-value pair with name @id is present and its value matches Vurl, then:

        1. If the root of the tree containing vertex N is a vertex that represents object Sj, then object Si is already a descendant of object Sj; no further action SHOULD be taken for this instance of Vurl.

          This clause in the algorithm prevents circular loops from being created.

          Furthermore, because the URL-list contains value URLs that occur only once for the current row, object Si cannot be a descendant of intermediate vertices in the tree.

        2. Else, if there is a root vertex M in forest F that represents object Sj, then set vertex M as a child of vertex N and remove vertex M from the list of roots in forest F (i.e., the tree rooted by M becomes a sub-tree of N).

        3. Else, create a new vertex M that represents object Sj as a child of vertex N.

  4. Each vertex in forest F represents an object in the original sequence of objects S1 to Sn and is associated with a subject described by the current row. Rearrange objects S1 to Sn such that they mirror the structure of the trees in forest F as follows: If vertex M, representing object Si, is a child of vertex N, representing object Sj, then the name-value pair in object Sj associated with the edge relating M and N MUST be modified such that the (literal) value, Vurl, from that name-value pair is replaced by object Si thus creating a nested object.

    Objects represented by root vertices are referred to as root objects.

  5. Return the sequence of root objects, SR1 to SRm.

An implementation may be able to optimize the algorithm by skipping branches (e.g. if URL-list is empty) or by other means.

Interpreting datatypes

Cell values are expressed in the JSON output according to the cell value's datatype. The relationship between the base annotation value of the cell value's datatype and the primitive types supported by JSON (as specified in [[!RFC7159]]) is provided below:

datatypeJSON primitive type
anyAtomicTypestring
anyURIstring
base64Binarystring
booleanboolean
datestring
dateTimestring
dateTimeStampstring
decimalnumber
integernumber
longnumber
intnumber
shortnumber
bytenumber
nonNegativeIntegernumber
positiveIntegernumber
unsignedLongnumber
unsignedIntnumber
unsignedShortnumber
unsignedBytenumber
nonPositiveIntegernumber
negativeIntegernumber
doublenumber
durationstring
dayTimeDurationstring
yearMonthDurationstring
floatnumber
gDaystring
gMonthstring
gMonthDaystring
gYearstring
gYearMonthstring
hexBinarystring
QNamestring
stringstring
normalizedStringstring
tokenstring
languagestring
Namestring
NMTOKENstring
xmlstring
htmlstring
jsonstring
timestring

Instances of JSON reserved characters within string values MUST be escaped as defined in [[!RFC7159]].

JSON has no native support for expressing language information; therefore the language of a value has no effect on the JSON output.

Only the base annotation value is used to determine the primitive type used within the JSON output. Additional restrictions to the cell value's datatype, such as the id annotation, are ignored for the purposes of conversion to JSON.

A datatype's format is irrelevant to the conversion procedure defined in this specification; the cell value has already been parsed from the contents of the cell according to the format annotation.

Cell errors MUST be recorded by applications where the contents of a cell cannot be parsed or validated (see Parsing Cells and Validating Tables in [[!tabular-data-model]] respectively). In cases where cell errors are recorded, applications may attempt to determine the appropriate JSON primitive type during the subsequent conversion process according to local rules.

JSON-LD to JSON

This section defines a mechanism for transforming the [[json-ld]] dialect used for non-core annotations and notes originating from the processing of metadata (as defined in [[!tabular-metadata]]) into JSON.

Conversion applications may have other means to create annotated tables, e.g. through some application specific APIs. In such cases the exact format for non-core annotations or notes may be different. Specifications for such annotation processes should specify how these annotations are converted into JSON.

Name-value pairs from notes and non-core annotations annotations are generally copied verbatim from the metadata description subject to the exceptions below:

  1. Name-value pairs whose value is an object using the [[json-ld]] keyword @value, for example:

    name
    N
    value
    { "@value": "V" }

    are transformed to:

    name
    N
    value
    V

    Name-value pairs occurring within the value object that use [[json-ld]] keywords @language and @type are ignored.

  2. Name-value pairs whose value is an object using the [[json-ld]] keyword @id to coerce a string-value to be interpreted as an IRI, for example:

    name
    N
    value
    { "@id": "Vurl" }

    are transformed to:

    name
    N
    value
    Vurl

In addition to compacting values of property URLs, URLs which were the value of objects using the [[json-ld]] keyword @type are compacted according to the rules as defined in URL Compaction in [[!tabular-metadata]].

Examples

Each of the examples expresses more complex conversions - it is recommended that readers of this specification work through the examples in sequential order.

Simple example

This example comprises a single annotated table containing information attributes about countries; country code, position (latitude, longitude) and name. Whilst the input tabular data file, published at http://example.org/countries.csv, includes a header line, no further metadata annotations are given. The tabular data file is provided below:

        

The annotated table generated from parsing the tabular data file is shown below and provides the basis for the conversion to JSON.

Annotations for the resulting table T, with 4 columns and 3 rows, are shown below:

idcore annotations
urlcolumnsrows
Thttp://example.org/countries.csvC1, C2, C3, C4R1, R2, R3

Annotations for the columns, rows and cells in table T are shown in the tables below:

Column annotations:

idcore annotations
tablenumbersource numbercellsnametitles
C1T11C1.1, C2.1, C3.1countryCodecountryCode
C2T22C1.2, C2.2, C3.2latitudelatitude
C3T33C1.3, C2.3, C3.3longitudelongitude
C4T44C1.4, C2.4, C3.4namename

Row annotations:

idcore annotations
tablenumbersource numbercells
R1T12C1.1, C1.2, C1.3, C1.4
R2T23C2.1, C2.2, C2.3, C2.4
R3T34C3.1, C3.2, C3.3, C3.4

Cell annotations:

idcore annotations
tablecolumnrowstring valuevalueproperty URL
C1.1TC1R1"AD""AD"null
C1.2TC2R1"42.5""42.5"null
C1.3TC3R1"1.6""1.6"null
C1.4TC4R1"Andorra""Andorra"null
C2.1TC1R2"AE""AE"null
C2.2TC2R2"23.4""23.4"null
C2.3TC3R2"53.8""53.8"null
C2.4TC4R2"United Arab Emirates""United Arab Emirates"null
C3.1TC1R3"AF""AF"null
C3.2TC2R3"33.9""33.9"null
C3.3TC3R3"67.7""67.7"null
C3.4TC4R3"Afghanistan""Afghanistan"null

Minimal mode output for this example is provided below:

        

The about URL annotation has not been set for cells in table T ({ "url": "http://example.org/countries.csv"}); cells in a given row where about URL has not been specified are assumed to refer to the same subject and so the name-value pairs associated with the cell values of that row occur within the same object.

Given that the property URL is null for cells in table T ({ "url": "http://example.org/countries.csv"}), the simplified name is used in the name-value pairs; e.g. countryCode rather than http://example.org/countries.csv#countryCode

Standard mode output for this example is provided below:

        

Even though the table was defined in isolation, the annotated table is wrapped in a group of tables.

The name-value pair with name url provides reference to the original tabular data file and to specific rows therein.

The row number is provided for each row using name-value pair with name rownum.

The object containing the name-values pairs associated with the cell values of a row are related to the object for that row using the name-value pair with name describes.

Example with single table and rich annotations

This example is based on Use Case #11 - City of Palo Alto Tree Data and comprises a single annotated table describing an inventory of tree maintenance operations. The input tabular data file, published at http://example.org/tree-ops-ext.csv, and the associated metadata description http://example.org/tree-ops-ext.csv-metadata.json are provided below:

        
        

The notes annotation in the metadata description uses the Open Annotation data model currently under development within the Web Annotations Working Group. This is purely illustrative; no constraints are placed on the value of the notes annotation.

The annotated table generated from parsing the tabular data file and associated metadata is shown below and provides the basis for the conversion to JSON.

Core annotations for the resulting table T, with 9 columns and 3 rows, are shown below:

idcore annotations
idurlcolumnsrowsnotes
T<http://example.org/tree-ops-ext>http://example.org/tree-ops-ext.csvC1, C2, C3, C4, C5, C6, C7, C8, C9R1, R2, R3[{ "@type": "oa:Annotation", ... }]

Non-core annotations for the table T are:

dc:title
"Tree Operations"
dcat:keyword
["tree", "street", "maintenance"]
dc:publisher
[{ "schema:name": "Example Municipality", "schema:url": { "@id": "http://example.org" } }]
dc:license
{ "@id": "http://opendefinition.org/licenses/cc-by/" }
dc:modified
"2010-12-31"

The value of the notes annotation has been shortened for clarity in the table above.

Annotations for the columns, rows and cells in table T are shown in the tables below:

Column annotations:

idcore annotationsannotations
tablenumbersource numbercellsnametitlesrequiredsuppress outputdc:description
C1T11C1.1, C2.1, C3.1GIDGID, Generic IdentifiertruetrueAn identifier for the operation on a tree.
C2T22C1.2, C2.2, C3.2on_streetOn StreetThe street that the tree is on.
C3T33C1.3, C2.3, C3.3speciesSpeciesThe species of the tree.
C4T44C1.4, C2.4, C3.4trim_cycleTrim CycleThe operation performed on the tree.
C5T55C1.5, C2.5, C3.5dbhDiameter at Breast HtDiameter at Breast Height (DBH) of the tree (in feet), measured 4.5ft above ground.
C6T66C1.6, C2.6, C3.6inventory_dateInventory DateThe date of the operation that was performed.
C7T77C1.7, C2.7, C3.7commentsCommentsSupplementary comments relating to the operation or tree.
C8T88C1.8, C2.8, C3.8protectedProtectedIndication (YES / NO) whether the tree is subject to a protection order.
C9T99C1.9, C2.9, C3.9kmlKMLKML-encoded description of tree location.

In this example, output for column C1 (GID) is not required; note the suppress output annotation on this column.

Row annotations:

idcore annotations
tablenumbersource numbercellsprimary key
R1T12C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9C1.1
R2T23C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9C2.1
R3T34C3.1, C3.2, C3.3, C3.4, C3.5, C3.6, C3.7, C3.8, C3.9C3.1

Cell annotations:

idcore annotations
tablecolumnrowstring valuevalueabout URL
C1.1TC1R1"1""1"http://example.org/tree-ops-ext#gid-1
C1.2TC2R1"ADDISON AV""ADDISON AV"<http://example.org/tree-ops-ext#gid-1>
C1.3TC3R1"Celtis australis""Celtis australis"<http://example.org/tree-ops-ext#gid-1>
C1.4TC4R1"Large Tree Routine Prune""Large Tree Routine Prune" (English)<http://example.org/tree-ops-ext#gid-1>
C1.5TC5R1"11"11<http://example.org/tree-ops-ext#gid-1>
C1.6TC6R1"10/18/2010"2010-10-18<http://example.org/tree-ops-ext#gid-1>
C1.7TC7R1""null<http://example.org/tree-ops-ext#gid-1>
C1.8TC8R1""false<http://example.org/tree-ops-ext#gid-1>
C1.9TC9R1"<Point><coordinates>-122.156485,37.440963</coordinates></Point>""<Point><coordinates>-122.156485,37.440963</coordinates></Point>" (XML)<http://example.org/tree-ops-ext#gid-1>
C2.1TC1R2"2""2"<http://example.org/tree-ops-ext#gid-2>
C2.2TC2R2"EMERSON ST""EMERSON ST"<http://example.org/tree-ops-ext#gid-2>
C2.3TC3R2"Liquidambar styraciflua""Liquidambar styraciflua"<http://example.org/tree-ops-ext#gid-2>
C2.4TC4R2"Large Tree Routine Prune""Large Tree Routine Prune" (English)<http://example.org/tree-ops-ext#gid-2>
C2.5TC5R2"11"11<http://example.org/tree-ops-ext#gid-2>
C2.6TC6R2"6/2/2010"2010-06-02<http://example.org/tree-ops-ext#gid-2>
C2.7TC7R2""null<http://example.org/tree-ops-ext#gid-2>
C2.8TC8R2""false<http://example.org/tree-ops-ext#gid-2>
C2.9TC9R2"<Point><coordinates>-122.156749,37.440958</coordinates></Point>""<Point><coordinates>-122.156749,37.440958</coordinates></Point>" (XML)<http://example.org/tree-ops-ext#gid-2>
C3.1TC1R3"6""6"<http://example.org/tree-ops-ext#gid-6>
C3.2TC2R3"ADDISON AV""ADDISON AV"<http://example.org/tree-ops-ext#gid-6>
C3.3TC3R3"Robinia pseudoacacia""Robinia pseudoacacia"<http://example.org/tree-ops-ext#gid-6>
C3.4TC4R3"Large Tree Routine Prune""Large Tree Routine Prune" (English)<http://example.org/tree-ops-ext#gid-6>
C3.5TC5R3"29"29<http://example.org/tree-ops-ext#gid-6>
C3.6TC6R3"6/1/2010"2010-06-01<http://example.org/tree-ops-ext#gid-6>
C3.7TC7R3"cavity or decay; trunk decay; codominant leaders; included bark; large leader or limb decay; previous failure root damage; root decay; beware of BEES""cavity or decay", "trunk decay", "codominant leaders", "included bark", "large leader or limb decay", "previous failure root damage", "root decay", "beware of BEES"<http://example.org/tree-ops-ext#gid-6>
C3.8TC8R3"YES"true<http://example.org/tree-ops-ext#gid-6>
C3.9TC9R3"<Point><coordinates>-122.156299,37.441151</coordinates></Point>""<Point><coordinates>-122.156299,37.441151</coordinates></Point>" (XML)<http://example.org/tree-ops-ext#gid-6>

The lists of values from cells in column C7 ("name": "comments") are assumed to be unordered as the boolean ordered annotation, which defaults to false, has not be set within the metadata description.

Minimal mode output for this example is provided below:

        

The subject described by each row is explicitly defined using the about URL annotation; e.g. the subject of row R1 is http://example.org/tree-ops-ext#gid-1.

Output for column C1 ({ "name": "GID" }) is not included as column suppress output annotation is true.

Cells C1.7 and C2.7 (rows R1 and R2; column, { "name": "comments" }) have null values - no output is included for these cells.

Cell C3.7 (row R3; column, { "name": "comments" }) contains a sequence of values; the set of values are included in an array.

Standard mode output for this example is provided below:

        

Table T ({ "url": "http://example.org/tree-ops-ext.csv"}) has been explicitly identified: { "@id": "<http://exmple.org/tree-ops-ext>"}.

Non-core annotations and notes specified for table T ({ "url": "http://example.org/tree-ops-ext.csv"}) are included in the output.

Example with single table and using virtual columns to produce multiple subjects per row

This example uses a single annotated table describing a listing of music events. Each row from the tabular data file corresponds to three resources; the music event itself, the location where that event occurs and the offer to sell tickets for that event. The goal is to convert the CSV content into schema.org markup that a search engine such as Google can use to index music events. Details of how Google expects this information to be structured can be found here.

The input tabular data file, published at http://example.org/events-listing.csv, and the associated metadata description http://example.org/events-listing.csv-metadata.json are provided below:

        
        

The CSV to JSON translation is limited to providing one statement, or triple, per column in the table. The target schema.org markup requires 10 statements to describe each event. As the base tabular data file contains 5 columns, an additional 5 virtual columns have been added in order to provide for the full complement of statements—including the relationships between the 3 resources (event, location, and offer) described by each row of the table. Note that the virtual annotation is true for these virtual columns.

Furthermore, note that no attempt is made to reconcile between locations or offers that may be associated with more than one event; every row in the table will create both a new location resource and offer resource in addition to the event resource. If considered necessary, applications such as OpenRefine may be used to identify and reconcile duplicate location resources once the JSON output has been generated.

The annotated table generated from parsing the tabular data file and associated metadata is shown below and provides the basis for the conversion to JSON.

Annotations for the resulting table T, with 10 columns and 2 rows, are shown below:

idcore annotations
urlcolumnsrows
Thttp://example.org/events-listing.csvC1, C2, C3, C4, C5, C6, C7, C8, C9, C10R1, R2

Annotations for the columns, rows and cells in table T are shown in the tables below:

Column annotations:

idcore annotations
tablenumbersource numbercellsnametitlesvirtual
C1T11C1.1, C2.1nameName
C2T22C1.2, C2.2start_dateStart Date
C3T33C1.3, C2.3location_nameLocation Name
C4T44C1.4, C2.4location_addressLocation Address
C5T55C1.5, C2.5ticket_urlTicket Url
C6T66C1.6, C2.6type_eventtrue
C7T77C1.7, C2.7type_placetrue
C8T88C1.8, C2.8type_offertrue
C9T99C1.9, C2.9locationtrue
C10T1010C1.10, C2.10offerstrue

Row annotations:

idcore annotations
tablenumbersource numbercells
R1T12C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9, C1.10
R2T23C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9, C2.10

Cell annotations:

idcore annotations
tablecolumnrowstring valuevalueabout URLproperty URLvalue URL
C1.1TC1R1"B.B. King""B.B. King"<http://example.org/events-listing.csv#event-1>schema:name
C1.2TC2R1"2014-04-12T19:30"2014-04-12T19:30:00<http://example.org/events-listing.csv#event-1>schema:startDate
C1.3TC3R1"Lupo’s Heartbreak Hotel""Lupo’s Heartbreak Hotel"<http://example.org/events-listing.csv#place-1>schema:name
C1.4TC4R1"79 Washington St., Providence, RI""79 Washington St., Providence, RI"<http://example.org/events-listing.csv#place-1>schema:address
C1.5TC5R1"https://www.etix.com/ticket/1771656"<https://www.etix.com/ticket/1771656><http://example.org/events-listing.csv#offer-1>schema:url
C1.6TC6R1""null<http://example.org/events-listing.csv#event-1>rdf:typeschema:MusicEvent
C1.7TC7R1""null<http://example.org/events-listing.csv#place-1>rdf:typeschema:Place
C1.8TC8R1""null<http://example.org/events-listing.csv#offer-1>rdf:typeschema:Offer
C1.9TC9R1""null<http://example.org/events-listing.csv#event-1>schema:location<http://example.org/events-listing.csv#place-1>
C1.10TC10R1""null<http://example.org/events-listing.csv#event-1>schema:offers<http://example.org/events-listing.csv#offer-1>
C2.1TC1R2"B.B. King""B.B. King"<http://example.org/events-listing.csv#event-2>schema:name
C2.2TC2R2"2014-04-13T20:00"2014-04-13T20:00:00<http://example.org/events-listing.csv#event-2>schema:startDate
C2.3TC3R2"Lynn Auditorium""Lynn Auditorium"<http://example.org/events-listing.csv#place-2>schema:name
C2.4TC4R2"Lynn, MA, 01901""Lynn, MA, 01901"<http://example.org/events-listing.csv#place-2>schema:address
C2.5TC5R2"http://frontgatetickets.com/venue.php?id=11766"<http://frontgatetickets.com/venue.php?id=11766><http://example.org/events-listing.csv#offer-2>schema:url
C2.6TC6R2""null<http://example.org/events-listing.csv#event-2>rdf:typeschema:MusicEvent
C2.7TC7R2""null<http://example.org/events-listing.csv#place-2>rdf:typeschema:Place
C2.8TC8R2""null<http://example.org/events-listing.csv#offer-2>rdf:typeschema:Offer
C2.9TC9R2""null<http://example.org/events-listing.csv#event-2>schema:location<http://example.org/events-listing.csv#place-2>
C2.10TC10R2""null<http://example.org/events-listing.csv#event-2>schema:offers<http://example.org/events-listing.csv#offer-2>

Minimal mode output for this example is provided below:

        

Three resources are defined for each row within the table; event, location and offer. Therefore three objects are created for each row.

Each column description explicitly defines both aboutUrl and propertyUrl properties which are used to create the about URL and property URL annotations on the column's cells.

Columns C6, C7 and C8 ({ "name": "type_event"}, { "name": "type_place"} and { "name": "type_offer"}) define the semantic types of the resources described by each row: schema:MusicEvent, schema:Place and schema:Offer respectively—noting that the use of rdf:type is converted to the name @type (as used in [[json-ld]]) by this conversion application.

Column C9 ({ "name": "location"}) uses the about URL, property URL and value URL to assert the relationship between the event and location resources.

Column C10 ({ "name": "offer"}) uses the about URL, property URL and value URL to assert the relationship between the event and offer resources.

Standard mode output for this example is provided below:

        

The resources described by each row are explicitly defined using the about URL annotation—in this case three resources per row (event, location, and offer). The objects containing the name-values pairs associated with the cell values of a row are related to the object for each subject in that row using the name-value pair with name describes.

Example with table group comprising four interrelated tables

This example is based on Use Case #4 - Publication of public sector roles and salaries and uses four annotated tables published as a group of tables. Information about senior roles and junior roles within a government department or organization are published in CSV format by each department. These are validated against a centrally published schema to ensure that all the data published by departments is consistent. Additionally, lists of organizations and professions are also published centrally, providing controlled vocabularies against which departmental submissions are validated.

Information published about junior and senior roles provides summary information for each post within the government department or organization. Whilst the junior role information is anonymous, providing only an indication of the number of full-time-equivalent (FTE) staff occupying a given post, the senior role information specifies the named individual occupying each post. As such, each row from the tabular data file describing senior roles corresponds to two resources; the post and the person occupying that post.

This example is concerned only with converting the information provided by each government department or organization rather than the centrally published information listing organizations and professions.

The input tabular data files and associated metadata descriptions are provided below:

        
        
        
        
        
        
        
        
        

This example makes extensive use of the example.org domain. As described in [[RFC6761]], this domain is used for illustrative examples within documentation. In reality, the resources described here with the URL path /gov.uk would be centrally published by the UK Government at, say, the domain data.gov.uk.

Given that these resources are centrally published with an aspiration for reuse, the schema descriptions have been factored out into separate resources. As such, the top-level metadata description resource metadata.json simply provides the list of tables and binds each of them to the appropriate schema that is defined elsewhere.

Finally, note that because the centrally published metadata descriptions are intended to be reused across many government departments and organizations, extra consideration has been given to defining URIs for the person and post resources defined in each row of the senior roles tabular data and subsequently referenced from the junior roles tabular data. To ensure that naming clashes are avoided, the unique reference for the organization to which the person or post belongs has been included in a path segment of the identifier. For example, the URI template property aboutUrl used to identify the senior post is specified as http://example.org/organization/{organizationRef}/post/{ref}, thus yielding the URI http://example.org/organization/hefce.ac.uk/post/90115 for the post described in the first row of the senior roles tabular data.

The group of tables generated from parsing the tabular data files and associated metadata is shown below and provides the basis for the conversion to JSON.

Annotations for the group of tables G and the four tables Ta, Tb, Tc, and Td are shown below:

Group of Tables annotations:

idcore annotations
tables
GTa, Tb, Tc, Td

Table annotations:

idcore annotations
urlcolumnsrowssuppress outputforeign keys
Tahttp://example.org/gov.uk/data/organizations.csvCa1, Ca2, Ca3Ra1, Ra2trueFa1
Tbhttp://example.org/gov.uk/professions.csvCb1Rb1, Rb2, Rb3, Rb4true
Tchttp://example.org/senior-roles.csvCc1, Cc2, Cc3, Cc4, Cc5, Cc6Rc1, Rc2falseFc1, Fc2, Fc3
Tdhttp://example.org/junior-roles.csvCd1, Cd2, Cd3, Cd4, Cd5, Cd6, Cd7Rd1, Rd2falseFd1, Fd2, Fd3

In this example, output for the centrally published lists of organizations and professions, tables Ta and Tb (http://example.org/gov.uk/data/organizations.csv and http://example.org/gov.uk/data/professions.csv respectively), are not required; only information from the departmental submissions is to be translated to RDF. Note the suppress output annotation on this table.

The following foreign keys are defined:

idcolumns in tablecolumns in referenced table
Fa1Ca3Ca1
Fc1Cc5Cc1
Fc2Cc6Cb1
Fc3Cc7Ca1
Fd1Cd1Cc1
Fd2Cd7Cb1
Fd3Cd8Ca1

Annotations for the columns, rows and cells in table T are shown in the tables below:

Column annotations:

idcore annotations
tablenumbersource numbercellsnametitlesrequiredvirtual
Ca1Ta11Ca1.1, Ca2.1refOrganization Unique Referencetrue
Ca2Ta11Ca1.2, Ca2.2nameOrganization Name
Ca3Ta11Ca1.3, Ca2.3departmentDepartment Reference
Cb1Tb11Cb1.1, Cb2.1, Cb3.1, Cb4.1nameProfessiontrue
Cc1Tc11Cc1.1, Cc2.1refPost Unique Referencetrue
Cc2Tc22Cc1.2, Cc2.2nameName
Cc3Tc33Cc1.3, Cc2.3gradeGrade
Cc4Tc44Cc1.4, Cc2.4jobJob Title
Cc5Tc55Cc1.5, Cc2.5reportsToReports to Senior Post
Cc6Tc66Cc1.6, Cc2.6professionProfession
Cc7Tc77Cc1.7, Cc2.7organizationRefOrganization Referencetrue
Cc8Tc88Cc1.8, Cc2.8post_holdertrue
Cd1Td11Cd1.1, Cd2.1reportsToSeniorReporting Senior Posttrue
Cd2Td22Cd1.2, Cd2.2gradeGrade
Cd3Td33Cd1.3, Cd2.3min_payPayscale Minimum (£)
Cd4Td44Cd1.4, Cd2.4max_payPayscale Maximum (£)
Cd5Td55Cd1.5, Cd2.5jobGeneric Job Title
Cd6Td66Cd1.6, Cd2.6numberNumber of Posts (FTE)
Cd7Td77Cd1.7, Cd2.7professionProfession
Cd8Td88Cd1.8, Cd2.8organizationRefOrganization Referencetrue

Column Cc8, with the virtual annotation specified as true, is used to relate the person resource, whose name is provided in column Cc2, to the associated post resource within the current row of table Tc ({ "url": "http://example.org/senior-roles.csv" }).

Row annotations:

idcore annotations
tablenumbersource numbercells
Ra1Ta12Ca1.1, Ca1.2, Ca1.3
Ra2Ta23Ca2.1, Ca2.2, Ca2.3
Rb1Tb12Cb1.1
Rb2Tb23Cb2.1
Rb3Tb34Cb3.1
Rb4Tb45Cb4.1
Rc1Tc12Cc1.1, Cc1.2, Cc1.3, Cc1.4, Cc1.5, Cc1.6, Cc1.7, Cc1.8
Rc2Tc23Cc2.1, Cc2.2, Cc2.3, Cc2.4, Cc2.5, Cc2.6, Cc2.7, Cc2.8
Rd1Td12Cd1.1, Cd1.2, Cd1.3, Cd1.4, Cd1.5, Cd1.6, Cd1.7, Cd1.8
Rd2Td23Cd2.1, Cd2.2, Cd2.3, Cd2.4, Cd2.5, Cd2.6, Cd2.7, Cd2.8

Cell annotations:

idcore annotations
tablecolumnrowstring valuevalueabout URLproperty URLvalue URL
Ca1.1TaCa1Ra1"hefce.ac.uk""hefce.ac.uk"<http://example.org/organization/hefce.ac.uk>dc:identifier
Ca1.2TaCa2Ra1"Higher Education Funding Council for England""Higher Education Funding Council for England"<http://example.org/organization/hefce.ac.uk>foaf:name
Ca1.3TaCa3Ra1"bis.gov.uk""bis.gov.uk"<http://example.org/organization/hefce.ac.uk>org:subOrganizationOf<http://example.org/organization/bis.gov.uk>
Ca2.1TaCa1Ra2"bis.gov.uk""bis.gov.uk"<http://example.org/organization/bis.gov.uk>dc:identifier
Ca2.2TaCa2Ra2"Department for Business, Innovation and Skills""Department for Business, Innovation and Skills"<http://example.org/organization/bis.gov.uk>foaf:name
Ca2.3TaCa3Ra2"xx"null<http://example.org/organization/bis.gov.uk>org:subOrganizationOf
Cb1.1TbCb1Rb1"Finance""Finance"
Cb2.1TbCb1Rb2"Information Technology""Information Techology"
Cb3.1TbCb1Rb3"Operational Delivery""Operational Delivery"
Cb4.1TbCb1Rb4"Policy""Policy"
Cc1.1TcCc1Rc1"90115""90115"<http://example.org/organization/hefce.ac.uk/post/90115>dc:identifier
Cc1.2TcCc2Rc1"Steve Egan""Steve Egan"<http://example.org/organization/hefce.ac.uk/person/1>foaf:name
Cc1.3TcCc3Rc1"SCS1A""SCS1A"<http://example.org/organization/hefce.ac.uk/post/90115><http://example.org/gov.uk/def/grade>
Cc1.4TcCc4Rc1"Deputy Chief Executive""Deputy Chief Executive"<http://example.org/organization/hefce.ac.uk/post/90115><http://example.org/gov.uk/def/job>
Cc1.5TcCc5Rc1"90334""90334"<http://example.org/organization/hefce.ac.uk/post/90115>org:reportsTo<http://example.org/organization/hefce.ac.uk/post/90334>
Cc1.6TcCc6Rc1"Finance""Finance"<http://example.org/organization/hefce.ac.uk/post/90115><http://example.org/gov.uk/def/profession>
Cc1.7TcCc7Rc1"hefce.ac.uk""hefce.ac.uk"<http://example.org/organization/hefce.ac.uk/post/90115>org:postIn<http://example.org/organization/hefce.ac.uk>
Cc1.8TcCc8Rc1""null<http://example.org/organization/hefce.ac.uk/post/90115>org:heldBy<http://example.org/organization/hefce.ac.uk/person/1>
Cc2.1TcCc1Rc2"90334""90334"<http://example.org/organization/hefce.ac.uk/post/90334>dc:identifier
Cc2.2TcCc2Rc2"Sir Alan Langlands""Sir Alan Langlands"<http://example.org/organization/hefce.ac.uk/person/2>foaf:name
Cc2.3TcCc3Rc2"SCS4""SCS4"<http://example.org/organization/hefce.ac.uk/post/90334><http://example.org/gov.uk/def/grade>
Cc2.4TcCc4Rc2"Chief Executive""Chief Executive"<http://example.org/organization/hefce.ac.uk/post/90334><http://example.org/gov.uk/def/job>
Cc2.5TcCc5Rc2"xx"null<http://example.org/organization/hefce.ac.uk/post/90334>org:reportsTo
Cc2.6TcCc6Rc2"Policy""Policy"<http://example.org/organization/hefce.ac.uk/post/90334><http://example.org/gov.uk/def/profession>
Cc2.7TcCc7Rc2"hefce.ac.uk""hefce.ac.uk"<http://example.org/organization/hefce.ac.uk/post/90334>org:postIn<http://example.org/organization/hefce.ac.uk>
Cc2.8TcCc8Rc2""null<http://example.org/organization/hefce.ac.uk/post/90334>org:heldBy<http://example.org/organization/hefce.ac.uk/person/2>
Cd1.1TdCd1Rd1"90115""90115"org:reportsTo<http://example.org/organization/hefce.ac.uk/post/90115>
Cd1.2TdCd2Rd1"4""4"<http://example.org/gov.uk/def/grade>
Cd1.3TdCd3Rd1"17426"17426<http://example.org/gov.uk/def/min_pay>
Cd1.4TdCd4Rd1"20002"20002<http://example.org/gov.uk/def/max_pay>
Cd1.5TdCd5Rd1"Administrator""Administrator"<http://example.org/gov.uk/def/job>
Cd1.6TdCd6Rd1"8.67"8.67<http://example.org/gov.uk/def/number_of_posts>
Cd1.7TdCd7Rd1"Operational Delivery""Operational Delivery"<http://example.org/gov.uk/def/profession>
Cd1.8TdCd8Rd1"hefce.ac.uk""hefce.ac.uk"org:postIn<http://example.org/organization/hefce.ac.uk>
Cd2.1TdCd1Rd2"90115""90115"org:reportsTo<http://example.org/organization/hefce.ac.uk/post/90115>
Cd2.2TdCd2Rd2"5""5"<http://example.org/gov.uk/def/grade>
Cd2.3TdCd3Rd2"19546"19546<http://example.org/gov.uk/def/min_pay>
Cd2.4TdCd4Rd2"22478"22478<http://example.org/gov.uk/def/max_pay>
Cd2.5TdCd5Rd2"Administrator""Administrator"<http://example.org/gov.uk/def/job>
Cd2.6TdCd6Rd2"0.5"0.5<http://example.org/gov.uk/def/number_of_posts>
Cd2.7TdCd7Rd2"Operational Delivery""Operational Delivery"<http://example.org/gov.uk/def/profession>
Cd2.8TdCd8Rd2"hefce.ac.uk""hefce.ac.uk"org:postIn<http://example.org/organization/hefce.ac.uk>

Notice that value URL is not specified for cells Ca2.3 and Cc2.5 because in each case the cell value is null and the virtual annotation of column Cb5 is not defined.

Minimal mode output for this example is provided below:

        

Prefixes defined within the RDFa 1.1 Initial Context ([[rdfa-core]]) are not expanded; e.g. dc: for <http://purl.org/dc/terms/>.

Output for tables Ta and Tb ({ "url": "http://example.org/gov.uk/data/organizations.csv" } and { "url": "http://example.org/gov.uk/data/professions.csv" }) are not included as the suppress output annotation has the value true for each of the tables.

The property URL is specified for all cells in tables Tc and Td.

Columns Cc5 and Cd1 ({ "name": "reportsTo" } and { "name": "reportsToSenior" }) use the about URL, property URL and value URL annotations to assert the relationship between the post described by a given row and the senior post to which it reports. However, since senior posts and junior posts are described in different tables it is not possible to create nested objects for this particular case.

Similarly, columns Cc7 and Cd8 (both with { "name": "organizationRef" }) use the about URL, property URL and value URL annotations to assert the relationship between the post described by a given row and the organization to which it belongs.

Finally, note that two resources are created for each row within table Tc ({ "url": "http://example.org/senior-roles.csv" }): the person and the post they occupy. The relationship between these resources is specified via virtual column Cc8 ({ "name": "post_holder" }) using the about URL, property URL and value URL annotations. The person object provides the value of the name-value pair with corresponding name org:heldBy, thus nesting the person object within the post object.

Standard mode output for this example is provided below:

        

Acknowledgements

Changes since previous versions

Changes since candidate recommendation of 16 July 2015

Changes since the working draft of 16 April 2015

Changes since the first public working draft of 08 January 2015

The document has undergone substantial changes since the first public working draft. Below are some of the changes made: