This document defines the procedures and rules to be applied when converting tabular data into RDF. Tabular data may be complemented with metadata annotations that describe its structure, the meaning of its content and how it may form part of a collection of interrelated tabular data. This document specifies the effect of this metadata on the resulting RDF.

The CSV on the Web Working Group was chartered to produce Recommendations for "Access methods for CSV Metadata", "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various Formats (e.g., RDF, JSON, or XML)". This document aims to satisfy the RDF variant of the mapping Recommendation.

Introduction

This document describes the processing of tabular data to create RDF subject-predicate-object triples [[!rdf11-concepts]]. Since RDF is an abstract syntax, these triples MAY be serialized in a concrete RDF syntax such as N-Triples [[n-triples]], Turtle [[turtle]], RDFa [[rdfa-primer]], JSON-LD [[json-ld]] or TriG [[trig]]. The RDF serializations offered by a conversion application are implementation dependent.

The [[!tabular-data-model]] defines an annotated tabular data model consisting of tables, columns, rows and cells, enriched with annotations that describe the structure of the tabular data and the meaning of its content. A table group is a collection of tables published as a single atomic unit.

The conversion procedure described in this specification operates on the tabular data. This specification does not specify the processes needed to convert CSV-encoded data into tabular data form. Please refer to [[!tabular-data-model]] for details of parsing tabular data.

Conversion applications MUST provide at least two modes of operation: standard and minimal.

Standard mode conversion frames the information gleaned from the cells of the tabular data with details of the rows, tables and a table group within which that information is provided.

Minimal mode conversion includes only the information gleaned from the cells of the tabular data.

Standard and minimal conversion are described normatively below.

Conversion applications MAY offer additional implementation specific conversion modes.

Conversion specifications, as defined in [[!tabular-metadata]], MAY be used to specify how tabular data can be transformed into another format using a script or template. Such a conversion specification MAY use the RDF output described in this specification as input.

The conversion procedure described in this specification is considered to be entirely textual. There is no requirement on conversion applications to check the semantic consistency of the data during the conversion, nor validate the triples against RDF syntax rules. Downstream applications SHOULD be aware of the potential for syntax errors and take appropriate action.

Tabular data MUST conform to the description from [[!tabular-data-model]]. In particular note that each row MUST contain the same number of cells (although some of these cells may be empty). Given this constraint, not all CSV-encoded data can be considered to be tabular data. As such, the conversion procedure described in this specification cannot be applied to all CSV files.
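The equal-width constraint can be checked before conversion is attempted. The following is a minimal sketch only: it assumes Python's standard csv module with its default dialect as a stand-in for the parsing rules of [[!tabular-data-model]], and the function name is_rectangular is hypothetical.

```python
import csv
import io


def is_rectangular(csv_text: str) -> bool:
    """Return True if every parsed row has the same number of cells.

    Assumption: Python's default CSV dialect approximates the parsing
    defined in the tabular data model; real parsers may differ.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return True
    width = len(rows[0])
    return all(len(row) == width for row in rows)
```

A file for which this check fails would not qualify as tabular data in the sense used here, so the conversion procedure would not apply to it.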

This specification makes use of the compact IRI Syntax; please refer to the Compact IRIs from [[json-ld]].

This specification makes use of the following namespaces:

csvw:
http://www.w3.org/ns/csvw#
rdf:
http://www.w3.org/1999/02/22-rdf-syntax-ns#
xsd:
http://www.w3.org/2001/XMLSchema#
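As an illustration of how the compact IRIs used in this document relate to full IRIs, the sketch below expands a prefixed form against the three namespaces listed above. The names NAMESPACES and expand are hypothetical and are not defined by any specification.

```python
# Namespaces used throughout this specification.
NAMESPACES = {
    "csvw": "http://www.w3.org/ns/csvw#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(compact_iri: str) -> str:
    """Expand prefix:name to a full IRI; return the input unchanged
    when the prefix is not one of the known namespaces."""
    prefix, sep, local = compact_iri.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix] + local
    return compact_iri
```

For example, expand("csvw:Table") yields http://www.w3.org/ns/csvw#Table, while an absolute URL such as http://example.org/x is returned unchanged.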

Converting Tabular Data to RDF

The procedures for converting tabular data into RDF are described below for both standard and minimal modes.

Algorithm terms

aboutUrl
aboutUrl is the evaluation of the URI template property aboutUrl for the current cell.
annotated table
The annotated table is defined in [[!tabular-data-model]] as describing a particular table and its metadata.
blank node
A blank node is defined in [[!rdf11-concepts]] as an RDF Term disjoint from IRIs or literals.
cell
A cell is defined in [[!tabular-data-model]] as the intersection of a row and a column within a table.
cell errors
Cell errors are defined in [[!tabular-data-model]] as a (possibly empty) list of validation errors generated while parsing the literal content of a cell to generate the semantic value.
cell value
A cell value is defined in [[!tabular-data-model]] as the semantic value of the cell; this MAY be null or, in the case that the cell specifies a separator property, a sequence of values.
column
A column is defined in [[!tabular-data-model]] as a vertical arrangement of cells within a table.
common properties
The common properties of a metadata resource are defined in Section 3.3 Common Properties of [[!tabular-metadata]]. The RDF triples corresponding to these properties are the result of running the algorithm specified in JSON-LD to RDF, or an equivalent, over the common properties defined within the metadata description.
identifier
The identifier is the evaluation of the @id property for the current resource. As defined in [[!tabular-data-model]], the identifier is null if the @id property is undefined. The identifier MAY be applied to either a table group or a table.
literal node
A literal node is defined in [[!rdf11-concepts]] as a node within an RDF graph that provides values such as strings, numbers, and dates.
node
A node is defined in [[!rdf11-concepts]] as a subject or an object of an RDF triple. When in subject position, it can be either a blank node or identified with a URL; when in object position, it can be a blank node, a literal, or identified with a URL.
notes
A list of notes, as defined in [[!tabular-data-model]], attached to an annotated table or table group using the notes property. This may be an empty list.
predicate
A predicate is defined in [[!rdf11-concepts]] as an IRI that denotes the property used to relate nodes within an RDF triple.
prefixed name
A prefixed name is an abbreviation for a URI, in the syntax prefix:name. See Names of Common Properties in [[!tabular-metadata]] for information on expansion.
propertyUrl
The propertyUrl is the evaluation of the URI template property propertyUrl for the current cell.
row
The row is defined in [[!tabular-data-model]] as a horizontal arrangement of cells within a table.
row number
A row number is defined in [[!tabular-data-model]] as the position of the row within the table, starting from 1.
row source number
A row source number is defined in [[!tabular-data-model]] as the position of the row within the source tabular data file. Provision of the row source number is dependent on parsing applications and may be reported as null.
subject
Within this algorithm, a subject is the resource that the value of a given cell refers to. This may be specified using the aboutUrl property.
table group
The table group is defined in [[!tabular-data-model]] as comprising a set of annotated tables and a set of annotations that relate to those tables.
table group description
The table group description object as defined in [[!tabular-data-model]].
valueUrl
The valueUrl is the evaluation of the URI template property valueUrl for the current cell.

Generating RDF

A conformant RDF conversion application MUST emit triples conforming to those described in this algorithm according to the chosen mode of conversion: standard or minimal.

Unless specified otherwise, the steps in the algorithm defined herein apply to both standard and minimal modes.

Where an annotated table is defined in isolation (e.g. in the absence of a table group description), a default table group description is provided with a single resources annotation that refers to that table.

  1. In standard mode only, establish a new node G. If the table group has an identifier then node G MUST be identified accordingly; else if identifier is null, then node G MUST be a blank node.
  2. In standard mode only, specify the type of node G as csvw:TableGroup; emit the following triple:

    subject
    node G
    predicate
    rdf:type
    object
    csvw:TableGroup
  3. In standard mode only, emit the triples generated by running the algorithm specified in JSON-LD to RDF over any notes and common properties specified for the table group, with node G as an initial subject, the notes or common property as property, and the value of the notes or common property as value.

  4. For each table where the value of property suppressOutput is false:

    1. In standard mode only, establish a new node T which represents the current table.

      If the table has an identifier then node T MUST be identified accordingly; else if identifier is null, then node T MUST be a blank node.

    2. In standard mode only, relate the table to the table group; emit the following triple:

      subject
      node G
      predicate
      csvw:table
      object
      node T
    3. In standard mode only, specify the type of node T as csvw:Table; emit the following triple:

      subject
      node T
      predicate
      rdf:type
      object
      csvw:Table
    4. In standard mode only, specify the source tabular data file URL for the current table based on the value of property url; emit the following triple:

      subject
      node T
      predicate
      csvw:url
      object
      a node identified by URL
    5. In standard mode only, emit the triples generated by running the algorithm specified in JSON-LD to RDF over any notes and common properties specified for the table, with node T as an initial subject, the notes or common property as property, and the value of the notes or common property as value.

      All other annotations for the table are ignored during the conversion; including information about table schemas and column descriptions specified therein, dialect descriptions, foreign-key-definitions etc.

    6. For each row in the current table:

      1. In standard mode only, establish a new blank node R which represents the current row.

      2. In standard mode only, relate the row to the table; emit the following triple:

        subject
        node T
        predicate
        csvw:row
        object
        node R
      3. In standard mode only, specify the row number n for the row; emit the following triple:

        subject
        node R
        predicate
        csvw:rownum
        object
        a literal n; specified with datatype IRI xsd:integer
      4. In standard mode only, specify the row source number nsource for the row within the source tabular data file URL using a fragment-identifier as specified in [[RFC7111]]; if row source number is not null, emit the following triple:

        subject
        node R
        predicate
        csvw:url
        object
        a node identified by URL#row=nsource
      5. Establish a new blank node Sdef to be used as the default subject for cells where aboutUrl is undefined.

        A row MAY describe multiple interrelated subjects; where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.

        For each cell in the current row where the value of property suppressOutput for the column associated with that cell is false:

        1. Establish a node S from the aboutUrl property if set, or from Sdef otherwise as the current subject.

        2. In standard mode only, relate the current subject to the current row; emit the following triple:

          subject
          node R
          predicate
          csvw:describes
          object
          node S
        3. If the value of propertyUrl for the cell is not null, then predicate P takes the value of propertyUrl.

          Else, predicate P is constructed by appending the value of the name property for the column associated with the cell to the tabular data file URL as a fragment identifier.

        4. If the valueUrl for the current cell is not null, then valueUrl identifies a node Vurl that is related to the current subject using the predicate P; emit the following triple:
          subject
          node S
          predicate
          P
          object
          node Vurl
        5. Else, if the cell specifies a separator property and the cell value is not an empty sequence and the cell specifies that boolean property ordered is true, then the cell value provides an ordered sequence of literal nodes for inclusion within the RDF output using an instance of rdf:List Vlist as defined in [[rdf-schema]]. This instance is related to the subject using the predicate P; emit the triples defining list Vlist plus the following triple:
          subject
          node S
          predicate
          P
          object
          node Vlist
        6. Else, if the cell specifies a separator property and the cell value is not an empty sequence, then the cell value provides an unordered sequence of literal nodes for inclusion within the RDF output, each of which is related to the subject using the predicate P. For each value provided in the sequence, add a literal node Vliteral; emit the following triple:
          subject
          node S
          predicate
          P
          object
          literal node Vliteral
        7. Else, if the cell value is not null and the cell does not specify a separator property, then the cell value provides a single literal node Vliteral for inclusion within the RDF output that is related to the current subject using the predicate P; emit the following triple:
          subject
          node S
          predicate
          P
          object
          literal node Vliteral

          The literal nodes derived from the cell values MUST be expressed according to the datatype property of the cell, as defined below in Interpreting datatypes.

          In the case where a sequence of values is provided, the datatype applies to all members of the sequence.
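The steps above can be sketched in code. The following is a non-normative illustration under several assumptions: the table group is supplied as nested dictionaries whose keys (id, tables, url, columns, rows, cells and so on) are hypothetical; triples are plain tuples; literals are ("literal", value, datatype name) tuples; and blank nodes are strings of the form _:bN. Notes, common properties and ordered sequences (rdf:List) are deliberately left out.

```python
from itertools import count
from urllib.parse import urljoin


def convert(group, standard=True):
    """Return a list of (subject, predicate, object) tuples.

    standard=False gives minimal mode: only cell-derived triples.
    """
    ids = count(1)

    def bnode():
        return f"_:b{next(ids)}"

    triples = []
    if standard:
        g = group.get("id") or bnode()                      # steps 1-2
        triples.append((g, "rdf:type", "csvw:TableGroup"))
    for table in group["tables"]:                           # step 4
        if table.get("suppressOutput", False):
            continue
        url = table["url"]
        if standard:
            t = table.get("id") or bnode()
            triples += [(g, "csvw:table", t),
                        (t, "rdf:type", "csvw:Table"),
                        (t, "csvw:url", url)]
        for row in table["rows"]:
            if standard:
                r = bnode()
                triples.append((t, "csvw:row", r))
                triples.append((r, "csvw:rownum",
                                ("literal", str(row["number"]), "integer")))
                if row.get("sourceNumber") is not None:
                    triples.append(
                        (r, "csvw:url", f"{url}#row={row['sourceNumber']}"))
            s_def = bnode()  # default subject for cells without aboutUrl
            for column, cell in zip(table["columns"], row["cells"]):
                if column.get("suppressOutput", False):
                    continue
                s = cell.get("aboutUrl") or s_def
                if standard:
                    triples.append((r, "csvw:describes", s))
                p = cell.get("propertyUrl") or urljoin(url, "#" + column["name"])
                value = cell.get("value")
                if cell.get("valueUrl") is not None:
                    triples.append((s, p, cell["valueUrl"]))
                elif value is not None:
                    # A list stands for an unordered sequence of values.
                    for v in value if isinstance(value, list) else [value]:
                        triples.append(
                            (s, p, ("literal", v, cell.get("datatype", "string"))))
    # An RDF graph is a set: drop duplicates (e.g. repeated csvw:describes).
    return list(dict.fromkeys(triples))
```

Calling convert(group, standard=False) on the same input gives the minimal-mode output, which omits the table group, table and row framing.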

Interpreting datatypes

Cell values are expressed in the RDF output according to the cell's datatype property. The relationship between the value of the datatype property and the datatype IRI used in the RDF is provided in the table below.

A cell's format property is irrelevant to the conversion procedure defined in this specification; the cell value has already been parsed from the contents of the cell according to the format property.

Where the contents of the cell cannot be parsed, or other validation errors occur, cell errors will be provided. It is an implementation decision to determine how conversion applications should proceed in the event that cell errors are encountered.

datatype | RDF datatype IRI | Remarks
any | xsd:anyAtomicType | any is considered to be equivalent to anyAtomicType
anyAtomicType | xsd:anyAtomicType |
anyURI | xsd:anyURI |
base64Binary | xsd:base64Binary |
binary | xsd:base64Binary | binary is considered to be equivalent to base64Binary
boolean | xsd:boolean |
date | xsd:date |
dateTime | xsd:dateTime |
datetime | xsd:dateTime | datetime is considered to be equivalent to dateTime
dateTimeStamp | xsd:dateTimeStamp |
decimal | xsd:decimal |
integer | xsd:integer |
long | xsd:long |
int | xsd:int |
short | xsd:short |
byte | xsd:byte |
nonNegativeInteger | xsd:nonNegativeInteger |
positiveInteger | xsd:positiveInteger |
unsignedLong | xsd:unsignedLong |
unsignedInt | xsd:unsignedInt |
unsignedShort | xsd:unsignedShort |
unsignedByte | xsd:unsignedByte |
nonPositiveInteger | xsd:nonPositiveInteger |
negativeInteger | xsd:negativeInteger |
double | xsd:double |
number | xsd:double | number is considered to be equivalent to double
duration | xsd:duration |
dayTimeDuration | xsd:dayTimeDuration |
yearMonthDuration | xsd:yearMonthDuration |
float | xsd:float |
gDay | xsd:gDay |
gMonth | xsd:gMonth |
gMonthDay | xsd:gMonthDay |
gYear | xsd:gYear |
gYearMonth | xsd:gYearMonth |
hexBinary | xsd:hexBinary |
QName | xsd:QName |
string | xsd:string | Where the lang property is defined for a cell, the appropriate language tag (as defined in [[!rdf11-concepts]]) MUST be provided for the string.
normalizedString | xsd:normalizedString | (as for string)
token | xsd:token | (as for string)
language | xsd:language | (as for string)
Name | xsd:Name | (as for string)
NMTOKEN | xsd:NMTOKEN | (as for string)
xml | rdf:XMLLiteral |
html | rdf:HTML |
json | csvw:JSON | csvw:JSON is a sub-class of xsd:string
time | xsd:time |
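The mapping in the table above is mostly the xsd: namespace followed by the datatype name, with a handful of exceptions. The sketch below records only those exceptions and formats a literal in N-Triples syntax. It is illustrative only: the names SPECIAL, datatype_iri and literal are hypothetical, the datatype name is assumed to be one listed in the table, and language tags are applied only for the plain string datatype (the string-derived types are not handled).

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CSVW = "http://www.w3.org/ns/csvw#"

# Datatype names whose IRI is not simply xsd:<name>.
SPECIAL = {
    "any": XSD + "anyAtomicType",
    "binary": XSD + "base64Binary",
    "datetime": XSD + "dateTime",
    "number": XSD + "double",
    "xml": RDF + "XMLLiteral",
    "html": RDF + "HTML",
    "json": CSVW + "JSON",
}


def datatype_iri(name: str) -> str:
    """Map a datatype name from the table to its RDF datatype IRI.

    Assumption: name is one of the names listed in the table.
    """
    return SPECIAL.get(name, XSD + name)


def literal(value: str, datatype: str = "string", lang=None) -> str:
    """Format a cell value as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    if lang is not None and datatype == "string":
        return f'"{escaped}"@{lang}'
    return f'"{escaped}"^^<{datatype_iri(datatype)}>'
```

With these definitions, literal("11", "integer") produces "11"^^<http://www.w3.org/2001/XMLSchema#integer>, and literal("Large Tree Routine Prune", lang="en") produces "Large Tree Routine Prune"@en.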

Inclusion of provenance information

In addition to the namespaces defined above, the following namespace is used in this section:

prov:
http://www.w3.org/ns/prov#

Conversion applications MAY include provenance information in the RDF output describing how and when the output was created; e.g. using terms from the PROV Ontology [[prov-o]]. Information that may be of interest to downstream applications includes:

In order to facilitate the provision of such information, this specification introduces two instances of prov:Role:

csvw:csvEncodedTabularData
defines the role of the source tabular data file
csvw:tabularMetadata
defines the role of the metadata description file
An illustrative example of provenance information is provided below in Turtle [[turtle]] syntax; the conversion application used is identified as http://example.org/my-csv2rdf-application:
       

JSON-LD to RDF

This section defines a mechanism for transforming the [[json-ld]] Dialect used for common properties and notes into RDF in a manner consistent with the Deserialize JSON-LD to RDF Algorithm defined in [[!json-ld-api]]. Converters MAY use any algorithm which results in equivalent triples.

Given a subject, property and value in normalized form:

  1. Property is a term defined in the [[csvw-context]], a prefixed name, or an absolute URL; expand it to an absolute URL by replacing a term with the URI from its term definition in [[csvw-context]], or by expanding a prefixed name as described in Names of Common Properties in [[!tabular-metadata]].
  2. If value is an array, generate RDF by running this algorithm once for each array member, using subject and property unchanged and the array member as value.
  3. If value is an object containing @value, create an RDF Literal lit using the string value of @value and language from @language, or datatype from @type if present, expanding @type as necessary using the procedure outlined for property, and emit the following triple:
    subject
    node subject
    predicate
    property
    object
    literal node lit
  4. Else, if value is an object:
    1. Establish a new node S from the value of @id, if it exists, or a new blank node otherwise, and emit the following triple:
      subject
      node subject
      predicate
      node property
      object
      node S
    2. For every value of @type, either a term defined in the [[csvw-context]], a prefixed name, or an absolute URL; establish a new node Ti by expanding the value to an absolute URL by replacing a term with the URI from the term definition in [[csvw-context]] or a prefixed name with its expanded value. For each Ti, emit the following triple:
      subject
      node S
      predicate
      rdf:type
      object
      node Ti
    3. For every key and val from value that does not start with @ (U+0040) generate RDF by running this algorithm using S for subject, key for property and val for value.
  5. Else, establish lit as an RDF Literal as follows:
    1. If value is true or false, create an RDF Literal lit using the strings "true" or "false", accordingly with datatype xsd:boolean
    2. Else, if value is a JSON number with a non-zero fractional part, create an RDF Literal lit using the canonical representation for value with datatype xsd:double.
    3. Else, if value is a JSON number with no non-zero fractional part, create an RDF Literal lit using the canonical representation for value with datatype xsd:integer.

    Emit the following triple:

    subject
    node subject
    predicate
    property
    object
    literal node lit
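The algorithm above can be sketched as a short recursive function. This is a non-normative illustration: TERMS is a hypothetical excerpt standing in for the definitions in [[csvw-context]]; literals are represented as ("literal", lexical form, language, datatype IRI) tuples; the lexical form used for xsd:double is Python's repr and is not claimed to be the canonical representation; and the final branch for bare strings is an assumption, since normalized values are expected to carry strings in @value objects.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical excerpt; real definitions come from the CSVW context.
TERMS = {"dc": "http://purl.org/dc/terms/", "schema": "http://schema.org/"}


def expand(name, terms=TERMS):
    """Expand a term or prefixed name to an absolute URL (step 1)."""
    if name in terms:
        return terms[name]
    prefix, sep, local = name.partition(":")
    if sep and prefix in terms:
        return terms[prefix] + local
    return name


def to_rdf(subject, prop, value, bnodes=None):
    """Return the triples for one subject/property/value combination."""
    bnodes = bnodes if bnodes is not None else count(1)
    p = expand(prop)
    if isinstance(value, list):                              # step 2
        triples = []
        for member in value:
            triples += to_rdf(subject, prop, member, bnodes)
        return triples
    if isinstance(value, dict) and "@value" in value:        # step 3
        datatype = expand(value["@type"]) if "@type" in value else None
        lit = ("literal", str(value["@value"]), value.get("@language"), datatype)
        return [(subject, p, lit)]
    if isinstance(value, dict):                              # step 4
        s = value.get("@id") or f"_:b{next(bnodes)}"
        triples = [(subject, p, s)]
        types = value.get("@type", [])
        for t in [types] if isinstance(types, str) else types:
            triples.append((s, RDF_TYPE, expand(t)))
        for key, val in value.items():
            if not key.startswith("@"):
                triples += to_rdf(s, key, val, bnodes)
        return triples
    if isinstance(value, bool):                              # step 5
        lit = ("literal", "true" if value else "false", None, XSD + "boolean")
    elif isinstance(value, float) and not value.is_integer():
        lit = ("literal", repr(value), None, XSD + "double")  # not canonical
    elif isinstance(value, (int, float)):
        lit = ("literal", str(int(value)), None, XSD + "integer")
    else:
        lit = ("literal", str(value), None, None)  # assumption: bare string
    return [(subject, p, lit)]
```

Note that the boolean test precedes the numeric tests, because Python treats bool as a subclass of int.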

Examples

In addition to the namespaces defined above, the examples provided here make use of the following namespaces:

dc:
http://purl.org/dc/terms/
foaf:
http://xmlns.com/foaf/0.1/
oa:
http://www.w3.org/ns/oa#
schema:
http://schema.org/

Furthermore, these examples also make use of the Turtle syntax @base declaration (as defined in [[turtle]]). Where a single tabular data file is used in the example, the @base declaration is set to the URL of that tabular data file.

Each successive example expresses a more complex conversion; readers of this specification are advised to work through the examples in sequential order.

Simple example

This example comprises a single annotated table containing attributes of countries: country code, position (latitude, longitude) and name. Whilst the input tabular data file, published at http://example.org/countries.csv, includes a header line, no further metadata annotations are given. The tabular data file is provided below:

        

The annotated table generated from parsing the tabular data file is shown below and provides the basis for the conversion to RDF.

Annotations for the resulting table T, with 4 columns and 3 rows, are shown below:

id | core annotations | annotations
| url | columns | rows |
T | http://example.org/countries.csv | C1, C2, C3, C4 | R1, R2, R3 |

Annotations for the columns, rows and cells in table T are shown in the tables below.

Column annotations:

id | core annotations | annotations
| table | number | source number | cells | name | title
C1 | T | 1 | 1 | C1.1, C2.1, C3.1 | countryCode | countryCode
C2 | T | 2 | 2 | C1.2, C2.2, C3.2 | latitude | latitude
C3 | T | 3 | 3 | C1.3, C2.3, C3.3 | longitude | longitude
C4 | T | 4 | 4 | C1.4, C2.4, C3.4 | name | name

Row annotations:

id | core annotations
| table | number | source number | cells
R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4
R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4
R3 | T | 3 | 4 | C3.1, C3.2, C3.3, C3.4

Cell annotations:

id | core annotations | annotations
| table | column | row | string value | value | errors | propertyUrl
C1.1 | T | C1 | R1 | "AD" | "AD" | | <http://example.org/countries.csv#countryCode>
C1.2 | T | C2 | R1 | "42.546245" | "42.546245" | | <http://example.org/countries.csv#latitude>
C1.3 | T | C3 | R1 | "1.601554" | "1.601554" | | <http://example.org/countries.csv#longitude>
C1.4 | T | C4 | R1 | "Andorra" | "Andorra" | | <http://example.org/countries.csv#name>
C2.1 | T | C1 | R2 | "AE" | "AE" | | <http://example.org/countries.csv#countryCode>
C2.2 | T | C2 | R2 | "23.424076" | "23.424076" | | <http://example.org/countries.csv#latitude>
C2.3 | T | C3 | R2 | "53.847818" | "53.847818" | | <http://example.org/countries.csv#longitude>
C2.4 | T | C4 | R2 | "United Arab Emirates" | "United Arab Emirates" | | <http://example.org/countries.csv#name>
C3.1 | T | C1 | R3 | "AF" | "AF" | | <http://example.org/countries.csv#countryCode>
C3.2 | T | C2 | R3 | "33.93911" | "33.93911" | | <http://example.org/countries.csv#latitude>
C3.3 | T | C3 | R3 | "67.709953" | "67.709953" | | <http://example.org/countries.csv#longitude>
C3.4 | T | C4 | R3 | "Afghanistan" | "Afghanistan" | | <http://example.org/countries.csv#name>

As the value of propertyUrl has not been set within the metadata description it defaults to the URI Template (see [[RFC6570]]) #{[column-name]}, where [column-name] is the value of the name property for the column associated with the cell. For example, the value of propertyUrl for all cells in column C1 ("name": "countryCode") is http://example.org/countries.csv#countryCode.
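The default described above amounts to resolving a fragment-only reference against the URL of the tabular data file. A minimal sketch, assuming the column name is percent-encoded before use (the function name default_property_url is hypothetical):

```python
from urllib.parse import quote, urljoin


def default_property_url(table_url: str, column_name: str) -> str:
    """Resolve '#<column name>' against the tabular data file URL.

    Assumption: characters outside the unreserved set are
    percent-encoded in the fragment.
    """
    return urljoin(table_url, "#" + quote(column_name, safe=""))
```

For column C1, default_property_url("http://example.org/countries.csv", "countryCode") gives http://example.org/countries.csv#countryCode.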

Minimal mode output for this example is provided in Turtle [[turtle]] syntax below:

        

The aboutUrl property has not been set for cells in table T ({ "url": "http://example.org/countries.csv"}) - cells in a given row where aboutUrl has not been specified are assumed to refer to the same subject. This unspecified subject is treated as a blank node.

Standard mode output for this example is provided in Turtle [[turtle]] syntax below:

        

Even though the table was defined in isolation, the table is wrapped in a table group.

The types of the table group and table resources are explicitly stated: csvw:TableGroup and csvw:Table respectively.

The csvw:url property provides reference to the original tabular data file and to specific rows therein - noting the need to escape the Turtle-syntax reserved character = (U+003D) within the fragment identifier.

The row number is provided for each row using the csvw:rownum property.

A subject and row are related using the csvw:describes property.

Example with single table and rich annotations

This example is based on Use Case #11 - City of Palo Alto Tree Data and comprises a single annotated table describing an inventory of tree maintenance operations. The input tabular data file, published at http://example.org/tree-ops-ext.csv, and the associated metadata description http://example.org/tree-ops-ext.csv-metadata.json are provided below:

        
        

The notes annotation in the metadata description uses the Open Annotation data model currently under development within the Web Annotations Working Group. This is purely illustrative; no constraints are placed on the value of the notes annotation.

The annotated table generated from parsing the tabular data file and associated metadata is shown below and provides the basis for the conversion to RDF.

Annotations for the resulting table T, with 9 columns and 3 rows, are shown below:

Core annotations:

id | url | columns | rows
T | http://example.org/tree-ops-ext.csv | C1, C2, C3, C4, C5, C6, C7, C8, C9 | R1, R2, R3

Annotations:

@id | <http://example.org/tree-ops-ext>
dc:title | "Tree Operations"
dc:keywords | ["tree", "street", "maintenance"]
dc:publisher | [{ "schema:name": "Example Municipality", "schema:url": { "@id": "http://example.org" } }]
dc:license | <http://opendefinition.org/licenses/cc-by/>
dc:modified | "2010-12-31"
notes | [{ "@type": "oa:Annotation", ... }]
primaryKey | C1

The value of the notes annotation has been shortened for clarity in the table above.

Annotations for the columns, rows and cells in table T are shown in the tables below.

Column annotations:

id | core annotations | annotations
| table | number | source number | cells | name | title | required | suppressOutput | dc:description
C1 | T | 1 | 1 | C1.1, C2.1, C3.1 | GID | GID, Generic Identifier | true | true | An identifier for the operation on a tree.
C2 | T | 2 | 2 | C1.2, C2.2, C3.2 | on_street | On Street | | | The street that the tree is on.
C3 | T | 3 | 3 | C1.3, C2.3, C3.3 | species | Species | | | The species of the tree.
C4 | T | 4 | 4 | C1.4, C2.4, C3.4 | trim_cycle | Trim Cycle | | | The operation performed on the tree.
C5 | T | 5 | 5 | C1.5, C2.5, C3.5 | dbh | Diameter at Breast Ht | | | Diameter at Breast Height (DBH) of the tree (in feet), measured 4.5ft above ground.
C6 | T | 6 | 6 | C1.6, C2.6, C3.6 | inventory_date | Inventory Date | | | The date of the operation that was performed.
C7 | T | 7 | 7 | C1.7, C2.7, C3.7 | comments | Comments | | | Supplementary comments relating to the operation or tree.
C8 | T | 8 | 8 | C1.8, C2.8, C3.8 | protected | Protected | | | Indication (YES / NO) whether the tree is subject to a protection order.
C9 | T | 9 | 9 | C1.9, C2.9, C3.9 | kml | KML | | | KML-encoded description of tree location.

In this example, output for column C1 (GID) is not required; note the suppressOutput annotation on this column.

Row annotations:

id | core annotations
| table | number | source number | cells
R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9
R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9
R3 | T | 3 | 4 | C3.1, C3.2, C3.3, C3.4, C3.5, C3.6, C3.7, C3.8, C3.9

Cell annotations:

id | core annotations | annotations
| table | column | row | string value | value | errors | lang | datatype | format | default | aboutUrl
C1.1 | T | C1 | R1 | "1" | "1" | | | string | | | <http://example.org/tree-ops-ext#gid-1>
C1.2 | T | C2 | R1 | "ADDISON AV" | "ADDISON AV" | | | string | | | <http://example.org/tree-ops-ext#gid-1>
C1.3 | T | C3 | R1 | "Celtis australis" | "Celtis australis" | | | string | | | <http://example.org/tree-ops-ext#gid-1>
C1.4 | T | C4 | R1 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | | en | string | | | <http://example.org/tree-ops-ext#gid-1>
C1.5 | T | C5 | R1 | "11" | 11 | | | integer | | | <http://example.org/tree-ops-ext#gid-1>
C1.6 | T | C6 | R1 | "10/18/2010" | 2010-10-18 | | | date | M/d/yyyy | | <http://example.org/tree-ops-ext#gid-1>
C1.7 | T | C7 | R1 | "" | null | | | string | | | <http://example.org/tree-ops-ext#gid-1>
C1.8 | T | C8 | R1 | "" | false | | | boolean | YES|NO | "NO" | <http://example.org/tree-ops-ext#gid-1>
C1.9 | T | C9 | R1 | "<Point><coordinates>-122.156485,37.440963</coordinates></Point>" | "<Point><coordinates>-122.156485,37.440963</coordinates></Point>" | | | xml | | | <http://example.org/tree-ops-ext#gid-1>
C2.1 | T | C1 | R2 | "2" | "2" | | | string | | | <http://example.org/tree-ops-ext#gid-2>
C2.2 | T | C2 | R2 | "EMERSON ST" | "EMERSON ST" | | | string | | | <http://example.org/tree-ops-ext#gid-2>
C2.3 | T | C3 | R2 | "Liquidambar styraciflua" | "Liquidambar styraciflua" | | | string | | | <http://example.org/tree-ops-ext#gid-2>
C2.4 | T | C4 | R2 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | | en | string | | | <http://example.org/tree-ops-ext#gid-2>
C2.5 | T | C5 | R2 | "11" | 11 | | | integer | | | <http://example.org/tree-ops-ext#gid-2>
C2.6 | T | C6 | R2 | "6/2/2010" | 2010-06-02 | | | date | M/d/yyyy | | <http://example.org/tree-ops-ext#gid-2>
C2.7 | T | C7 | R2 | "" | null | | | string | | | <http://example.org/tree-ops-ext#gid-2>
C2.8 | T | C8 | R2 | "" | false | | | boolean | YES|NO | "NO" | <http://example.org/tree-ops-ext#gid-2>
C2.9 | T | C9 | R2 | "<Point><coordinates>-122.156749,37.440958</coordinates></Point>" | "<Point><coordinates>-122.156749,37.440958</coordinates></Point>" | | | xml | | | <http://example.org/tree-ops-ext#gid-2>
C3.1 | T | C1 | R3 | "6" | "6" | | | string | | | <http://example.org/tree-ops-ext#gid-6>
C3.2 | T | C2 | R3 | "ADDISON AV" | "ADDISON AV" | | | string | | | <http://example.org/tree-ops-ext#gid-6>
C3.3 | T | C3 | R3 | "Robinia pseudoacacia" | "Robinia pseudoacacia" | | | string | | | <http://example.org/tree-ops-ext#gid-6>
C3.4 | T | C4 | R3 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | | en | string | | | <http://example.org/tree-ops-ext#gid-6>
C3.5 | T | C5 | R3 | "29" | 29 | | | integer | | | <http://example.org/tree-ops-ext#gid-6>
C3.6 | T | C6 | R3 | "6/1/2010" | 2010-06-01 | | | date | M/d/yyyy | | <http://example.org/tree-ops-ext#gid-6>
C3.7 | T | C7 | R3 | "cavity or decay; trunk decay; codominant leaders; included bark; large leader or limb decay; previous failure root damage; root decay; beware of BEES" | "cavity or decay", "trunk decay", "codominant leaders", "included bark", "large leader or limb decay", "previous failure root damage", "root decay", "beware of BEES" | | | string | | | <http://example.org/tree-ops-ext#gid-6>
C3.8 | T | C8 | R3 | "YES" | true | | | boolean | YES|NO | "NO" | <http://example.org/tree-ops-ext#gid-6>
C3.9 | T | C9 | R3 | "<Point><coordinates>-122.156299,37.441151</coordinates></Point>" | "<Point><coordinates>-122.156299,37.441151</coordinates></Point>" | | | xml | | | <http://example.org/tree-ops-ext#gid-6>

For brevity, the propertyUrl is not shown in the table of cell annotations. Where not explicitly set, the value of propertyUrl defaults to the URI Template (see [[RFC6570]]) #{[column-name]}, where [column-name] is the value of the name property for the column associated with the cell. For example, the value of propertyUrl for all cells in column C2 ("name": "on_street") is http://example.org/tree-ops-ext.csv#on_street.

The lists of values from cells in column C7 ("name": "comments") are assumed to be unordered as the boolean property ordered, with default value false, has not been set within the metadata description.

Minimal mode output for this example is provided in Turtle [[turtle]] syntax below:

        

The subject described by each row is explicitly defined using the aboutUrl property; e.g. the subject of row R1 is http://example.org/tree-ops-ext#gid-1.

Output for column C1 ({ "name": "GID" }) is not included as column property suppressOutput has value true.

A language tag is specified for values of column C4 ({ "name": "trim_cycle" }) as cell property lang is specified with value en.

The datatype property is set on columns C5, C6, C8 and C9 ({ "name": "dbh"}, { "name": "inventory_date" }, { "name": "protected" } and { "name": "kml" }); integer, date, boolean and xml respectively. The datatype property is inherited by all cells in each of those columns, therefore the RDF output for those cells includes the appropriate datatype IRI.

Cells C1.7 and C2.7 (rows R1 and R2; column, { "name": "comments" }) have null values - no output is included for these cells.

Cell C3.7 (row R3; column, { "name": "comments" }) contains an unordered sequence of values; the set of values is included as a simple set of triples as opposed to an instance of rdf:List as the ordered property has not been specified (the default is unordered).
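The difference between the two treatments of a sequence of values can be sketched as follows. This is illustrative only: values are passed through as plain strings rather than typed literal nodes, list nodes are labelled _:lN, and the function name sequence_triples is hypothetical.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def sequence_triples(subject, predicate, values, ordered=False, bnodes=None):
    """Return triples for a cell whose value is a sequence.

    Unordered: one triple per value. Ordered: a single triple pointing
    at an rdf:List built from rdf:first / rdf:rest links.
    """
    bnodes = bnodes if bnodes is not None else count(1)
    if not values:
        return []  # an empty sequence produces no output
    if not ordered:
        return [(subject, predicate, v) for v in values]
    nodes = [f"_:l{next(bnodes)}" for _ in values]
    triples = [(subject, predicate, nodes[0])]
    for i, (node, v) in enumerate(zip(nodes, values)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", v))
        triples.append((node, RDF + "rest", rest))
    return triples
```

For cell C3.7 the unordered form applies, so each of the eight comment values would be attached directly to the subject with the same predicate.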

Standard mode output for this example is provided in Turtle [[turtle]] syntax below:

        

Table T ({ "url": "http://example.org/tree-ops-ext.csv"}) has been explicitly identified: { "@id": "http://example.org/tree-ops-ext"}.

Common properties and notes specified for table T ({ "url": "http://example.org/tree-ops-ext.csv"}) are included in the output.

As the metadata description file http://example.org/tree-ops-ext.csv-metadata.json defines a default language within the context ("@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}]), all common properties of type string (e.g. dc:title, dc:keywords, dc:publisher, dc:license and dc:modified) are expressed in the RDF output using the appropriate language tag.

Example with single table and using virtual columns to produce multiple subjects per row

This example uses a single annotated table describing a listing of music events. Each row from the tabular data file corresponds to three resources: the music event itself, the location where that event occurs and the offer to sell tickets for that event. The goal is to convert the CSV content into schema.org markup that a search engine such as Google can use to index music events. Details of how Google expects this information to be structured can be found here.

The input tabular data file, published at http://example.org/events-listing.csv, and the associated metadata description http://example.org/events-listing.csv-metadata.json are provided below:

        
        

The CSV to RDF translation is limited to providing one statement, or triple, per column in the table. The target schema.org markup requires 10 statements to describe each event. As the base tabular data file contains 5 columns, an additional 5 virtual columns have been added in order to provide for the full complement of statements - including the relationships between the 3 resources (event, location and offer) described by each row of the table. Note that the virtual property is set to true for these virtual columns.

Furthermore, note that no attempt is made to reconcile between locations or offers that may be associated with more than one event; every row in the table will create both a new location resource and offer resource in addition to the event resource. If considered necessary, applications such as OpenRefine may be used to identify and reconcile duplicate location resources once the RDF output has been generated.

The annotated table generated from parsing the tabular data file and associated metadata is shown below and provides the basis for the conversion to RDF.

Annotations for the resulting table T, with 10 columns and 2 rows, are shown below:

| id | url | columns | rows |
|---|---|---|---|
| T | http://example.org/events-listing.csv | C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 | R1, R2 |

Annotations for the columns, rows and cells in table T are shown in the tables below.

Column annotations:

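| id | table | number | source number | cells | name | title | virtual |
|---|---|---|---|---|---|---|---|
| C1 | T | 1 | 1 | C1.1, C2.1 | name | Name | |
| C2 | T | 2 | 2 | C1.2, C2.2 | start_date | Start Date | |
| C3 | T | 3 | 3 | C1.3, C2.3 | location_name | Location Name | |
| C4 | T | 4 | 4 | C1.4, C2.4 | location_address | Location Address | |
| C5 | T | 5 | 5 | C1.5, C2.5 | ticket_url | Ticket Url | |
| C6 | T | 6 | 6 | C1.6, C2.6 | type_event | | true |
| C7 | T | 7 | 7 | C1.7, C2.7 | type_place | | true |
| C8 | T | 8 | 8 | C1.8, C2.8 | type_offer | | true |
| C9 | T | 9 | 9 | C1.9, C2.9 | location | | true |
| C10 | T | 10 | 10 | C1.10, C2.10 | offers | | true |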

Row annotations:

| id | table | number | source number | cells |
|---|---|---|---|---|
| R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9, C1.10 |
| R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9, C2.10 |

Cell annotations:

| id | table | column | row | string value | value | errors | datatype | format | aboutUrl | propertyUrl | valueUrl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C1.1 | T | C1 | R1 | "B.B. King" | "B.B. King" | | string | | <http://example.org/events-listing.csv#event-1> | schema:name | |
| C1.2 | T | C2 | R1 | "2014-04-12T19:30" | 2014-04-12T19:30:00 | | datetime | yyyy-MM-ddTHH:mm:ss | <http://example.org/events-listing.csv#event-1> | schema:startDate | |
| C1.3 | T | C3 | R1 | "Lupo’s Heartbreak Hotel" | "Lupo’s Heartbreak Hotel" | | string | | <http://example.org/events-listing.csv#place-1> | schema:name | |
| C1.4 | T | C4 | R1 | "79 Washington St., Providence, RI" | "79 Washington St., Providence, RI" | | string | | <http://example.org/events-listing.csv#place-1> | schema:address | |
| C1.5 | T | C5 | R1 | "https://www.etix.com/ticket/1771656" | <https://www.etix.com/ticket/1771656> | | anyURI | | <http://example.org/events-listing.csv#offer-1> | schema:url | |
| C1.6 | T | C6 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#event-1> | rdf:type | schema:MusicEvent |
| C1.7 | T | C7 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#place-1> | rdf:type | schema:Place |
| C1.8 | T | C8 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#offer-1> | rdf:type | schema:Offer |
| C1.9 | T | C9 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#event-1> | schema:location | <http://example.org/events-listing.csv#place-1> |
| C1.10 | T | C10 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#event-1> | schema:offers | <http://example.org/events-listing.csv#offer-1> |
| C2.1 | T | C1 | R2 | "B.B. King" | "B.B. King" | | string | | <http://example.org/events-listing.csv#event-2> | schema:name | |
| C2.2 | T | C2 | R2 | "2014-04-13T20:00" | 2014-04-13T20:00:00 | | datetime | yyyy-MM-ddTHH:mm:ss | <http://example.org/events-listing.csv#event-2> | schema:startDate | |
| C2.3 | T | C3 | R2 | "Lynn Auditorium" | "Lynn Auditorium" | | string | | <http://example.org/events-listing.csv#place-2> | schema:name | |
| C2.4 | T | C4 | R2 | "Lynn, MA, 01901" | "Lynn, MA, 01901" | | string | | <http://example.org/events-listing.csv#place-2> | schema:address | |
| C2.5 | T | C5 | R2 | "http://frontgatetickets.com/venue.php?id=11766" | <http://frontgatetickets.com/venue.php?id=11766> | | anyURI | | <http://example.org/events-listing.csv#offer-2> | schema:url | |
| C2.6 | T | C6 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#event-2> | rdf:type | schema:MusicEvent |
| C2.7 | T | C7 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#place-2> | rdf:type | schema:Place |
| C2.8 | T | C8 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#offer-2> | rdf:type | schema:Offer |
| C2.9 | T | C9 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#event-2> | schema:location | <http://example.org/events-listing.csv#place-2> |
| C2.10 | T | C10 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#event-2> | schema:offers | <http://example.org/events-listing.csv#offer-2> |

Minimal mode output for this example is provided in Turtle [[turtle]] syntax below:

        

Three resources are defined for each row within the table; event, location and offer.

Each column explicitly defines both aboutUrl and propertyUrl properties which are inherited by the column's cells.

Columns C6, C7 and C8 ({ "name": "type_event"}, { "name": "type_place"} and { "name": "type_offer"}) define the semantic types of the resources described by each row: schema:MusicEvent, schema:Place and schema:Offer respectively.

Column C9 ({ "name": "location"}) uses the aboutUrl and valueUrl to assert the relationship between the event and location resources.

Column C10 ({ "name": "offers"}) uses the aboutUrl and valueUrl to assert the relationship between the event and offer resources.
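As a rough illustration of the notes above, the Python sketch below derives one triple per cell for row R1 from the aboutUrl, propertyUrl, value and valueUrl annotations. It is a simplification and not the normative procedure: `minimal_triples` is a hypothetical helper, prefixed names are kept as plain strings, and datatypes and the distinction between literals and IRIs are ignored.

```python
BASE = "http://example.org/events-listing.csv#"

# (aboutUrl, propertyUrl, value, valueUrl) for the ten cells of row R1,
# transcribed from the cell annotations table; None marks an absent entry.
ROW_R1 = [
    (BASE + "event-1", "schema:name", "B.B. King", None),
    (BASE + "event-1", "schema:startDate", "2014-04-12T19:30:00", None),
    (BASE + "place-1", "schema:name", "Lupo’s Heartbreak Hotel", None),
    (BASE + "place-1", "schema:address", "79 Washington St., Providence, RI", None),
    (BASE + "offer-1", "schema:url", "https://www.etix.com/ticket/1771656", None),
    (BASE + "event-1", "rdf:type", None, "schema:MusicEvent"),
    (BASE + "place-1", "rdf:type", None, "schema:Place"),
    (BASE + "offer-1", "rdf:type", None, "schema:Offer"),
    (BASE + "event-1", "schema:location", None, BASE + "place-1"),
    (BASE + "event-1", "schema:offers", None, BASE + "offer-1"),
]


def minimal_triples(cells):
    """Return one (subject, predicate, object) tuple per cell with output."""
    triples = []
    for about_url, property_url, value, value_url in cells:
        # Prefer valueUrl when present; otherwise fall back to the cell value.
        obj = value_url if value_url is not None else value
        if obj is None:
            continue  # null value and no valueUrl: nothing to emit
        triples.append((about_url, property_url, obj))
    return triples


triples = minimal_triples(ROW_R1)
print(len(triples), "triples about", len({s for s, _, _ in triples}), "subjects")
```

The five source columns and five virtual columns each contribute one triple, giving the ten statements per event that the target markup requires, spread over the three subjects event, place and offer.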

Standard mode output for this example is provided in Turtle [[turtle]] syntax below:

        

The resources described by each row are explicitly defined using the aboutUrl property; in this case there are three resources per row (event, location and offer). The relationship between the row and each subject resource is asserted using the csvw:describes property; e.g. for row R1 we state [] csvw:describes t1:event-1, t1:place-1, t1:offer-1 .
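One way to picture how the objects of csvw:describes are gathered is sketched below in Python: the distinct aboutUrl values of a row's cells are collected in order of first appearance. The helper names are hypothetical, and full IRIs are printed in place of the t1: prefix used in the output above.

```python
BASE = "http://example.org/events-listing.csv#"

# aboutUrl of cells C1.1 to C1.10 (row R1), in column order.
ROW_R1_ABOUT_URLS = [BASE + name for name in (
    "event-1", "event-1", "place-1", "place-1", "offer-1",
    "event-1", "place-1", "offer-1", "event-1", "event-1",
)]


def described_subjects(about_urls):
    """Return the distinct subjects of a row, in order of first appearance."""
    return list(dict.fromkeys(about_urls))


def describes_statement(about_urls):
    """Render a Turtle-like csvw:describes statement for one row."""
    objects = ", ".join("<{}>".format(u) for u in described_subjects(about_urls))
    return "[] csvw:describes {} .".format(objects)


print(describes_statement(ROW_R1_ABOUT_URLS))
```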

Example with table group comprising three interrelated tables

This example is based on Use Case #4 - Publication of public sector roles and salaries and uses three annotated tables published within a table group. Information about senior roles and junior roles within a government department is published in CSV format by each department. These files are validated against a centrally published schema to ensure that all the data published by departments is consistent. A list of professions is also published centrally, providing a controlled vocabulary against which departmental submissions are validated.

The input tabular data files and associated metadata descriptions are provided below:

        
        
        
        
        
        

In this example, the resource gov.uk/professions.csv is identified using a relative URL to host http://example.org. In reality this resource would be published centrally by government and served from some remote host. Similarly, the metadata description resource metadata.json would also be centrally published. Government departments seeking to validate their role and salary data would download a copy of this metadata description and place it, without modification, in the same directory as their tabular data files, whose names MUST match those specified in the metadata description: senior-roles.csv and junior-roles.csv.

The table group generated from parsing the tabular data files and associated metadata is shown below and provides the basis for the conversion to RDF.

Annotations for the table group G and the three tables Ta, Tb, and Tc are shown below.

Table Group annotations:

| id | resources | @type |
|---|---|---|
| G | Ta, Tb, Tc | TableGroup |

Table annotations:

| id | url | columns | rows | primaryKey | suppressOutput | foreignKeys: columns | foreignKeys: reference |
|---|---|---|---|---|---|---|---|
| Ta | http://example.org/gov.uk/professions.csv | Ca1 | Ra1, Ra2, Ra3, Ra4 | Ca1 | true | | |
| Tb | http://example.org/senior-roles.csv | Cb1, Cb2, Cb3, Cb4, Cb5, Cb6 | Rb1, Rb2 | Cb1 | | Cb5 | Cb1 |
| | | | | | | Cb6 | Ca1 |
| Tc | http://example.org/junior-roles.csv | Cc1, Cc2, Cc3, Cc4, Cc5, Cc6, Cc7 | Rc1, Rc2 | | | Cc1 | Cb1 |
| | | | | | | Cc7 | Ca1 |

In this example, output for the centrally published list of professions, table Ta (http://example.org/gov.uk/professions.csv), is not required; only information from the departmental submissions is to be translated to RDF. Note the suppressOutput annotation on this table.

Annotations for the columns, rows and cells in tables Ta, Tb and Tc are shown in the tables below.

Column annotations:

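| id | table | number | source number | cells | name | title | required |
|---|---|---|---|---|---|---|---|
| Ca1 | Ta | 1 | 1 | Ca1.1, Ca2.1, Ca3.1, Ca4.1 | name | Profession | true |
| Cb1 | Tb | 1 | 1 | Cb1.1, Cb2.1 | ref | Post Unique Reference | true |
| Cb2 | Tb | 2 | 2 | Cb1.2, Cb2.2 | name | Name | |
| Cb3 | Tb | 3 | 3 | Cb1.3, Cb2.3 | grade | Grade | |
| Cb4 | Tb | 4 | 4 | Cb1.4, Cb2.4 | job | Job Title | |
| Cb5 | Tb | 5 | 5 | Cb1.5, Cb2.5 | reportsTo | Reports to Senior Post | |
| Cb6 | Tb | 6 | 6 | Cb1.6, Cb2.6 | profession | Profession | |
| Cc1 | Tc | 1 | 1 | Cc1.1, Cc2.1 | reportsToSenior | Reporting Senior Post | |
| Cc2 | Tc | 2 | 2 | Cc1.2, Cc2.2 | grade | Grade | |
| Cc3 | Tc | 3 | 3 | Cc1.3, Cc2.3 | min_pay | Payscale Minimum (£) | |
| Cc4 | Tc | 4 | 4 | Cc1.4, Cc2.4 | max_pay | Payscale Maximum (£) | |
| Cc5 | Tc | 5 | 5 | Cc1.5, Cc2.5 | job | Generic Job Title | |
| Cc6 | Tc | 6 | 6 | Cc1.6, Cc2.6 | number | Number of Posts (FTE) | |
| Cc7 | Tc | 7 | 7 | Cc1.7, Cc2.7 | profession | Profession | |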

Row annotations:

| id | table | number | source number | cells |
|---|---|---|---|---|
| Ra1 | Ta | 1 | 2 | Ca1.1 |
| Ra2 | Ta | 2 | 3 | Ca2.1 |
| Ra3 | Ta | 3 | 4 | Ca3.1 |
| Ra4 | Ta | 4 | 5 | Ca4.1 |
| Rb1 | Tb | 1 | 2 | Cb1.1, Cb1.2, Cb1.3, Cb1.4, Cb1.5, Cb1.6 |
| Rb2 | Tb | 2 | 3 | Cb2.1, Cb2.2, Cb2.3, Cb2.4, Cb2.5, Cb2.6 |
| Rc1 | Tc | 1 | 2 | Cc1.1, Cc1.2, Cc1.3, Cc1.4, Cc1.5, Cc1.6, Cc1.7 |
| Rc2 | Tc | 2 | 3 | Cc2.1, Cc2.2, Cc2.3, Cc2.4, Cc2.5, Cc2.6, Cc2.7 |

Cell annotations:

| id | table | column | row | string value | value | errors | datatype | aboutUrl | propertyUrl | valueUrl |
|---|---|---|---|---|---|---|---|---|---|---|
| Ca1.1 | Ta | Ca1 | Ra1 | "Finance" | "Finance" | | string | | | |
| Ca2.1 | Ta | Ca1 | Ra2 | "Information Technology" | "Information Technology" | | string | | | |
| Ca3.1 | Ta | Ca1 | Ra3 | "Operational Delivery" | "Operational Delivery" | | string | | | |
| Ca4.1 | Ta | Ca1 | Ra4 | "Policy" | "Policy" | | string | | | |
| Cb1.1 | Tb | Cb1 | Rb1 | "90115" | "90115" | | string | <http://example.org/senior-roles.csv#post-90115> | dc:identifier | |
| Cb1.2 | Tb | Cb2 | Rb1 | "Steve Egan" | "Steve Egan" | | string | <http://example.org/senior-roles.csv#post-90115> | foaf:name | |
| Cb1.3 | Tb | Cb3 | Rb1 | "SCS1A" | "SCS1A" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/grade> | |
| Cb1.4 | Tb | Cb4 | Rb1 | "Deputy Chief Executive" | "Deputy Chief Executive" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/job> | |
| Cb1.5 | Tb | Cb5 | Rb1 | "90334" | "90334" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90334> |
| Cb1.6 | Tb | Cb6 | Rb1 | "Finance" | "Finance" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/profession> | |
| Cb2.1 | Tb | Cb1 | Rb2 | "90334" | "90334" | | string | <http://example.org/senior-roles.csv#post-90334> | dc:identifier | |
| Cb2.2 | Tb | Cb2 | Rb2 | "Sir Alan Langlands" | "Sir Alan Langlands" | | string | <http://example.org/senior-roles.csv#post-90334> | foaf:name | |
| Cb2.3 | Tb | Cb3 | Rb2 | "SCS4" | "SCS4" | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/grade> | |
| Cb2.4 | Tb | Cb4 | Rb2 | "Chief Executive" | "Chief Executive" | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/job> | |
| Cb2.5 | Tb | Cb5 | Rb2 | "xx" | null | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/reportsTo> | |
| Cb2.6 | Tb | Cb6 | Rb2 | "Policy" | "Policy" | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/profession> | |
| Cc1.1 | Tc | Cc1 | Rc1 | "90115" | "90115" | | string | | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90115> |
| Cc1.2 | Tc | Cc2 | Rc1 | "4" | "4" | | string | | <http://example.org/def/grade> | |
| Cc1.3 | Tc | Cc3 | Rc1 | "17426" | 17426 | | integer | | <http://example.org/def/min_pay> | |
| Cc1.4 | Tc | Cc4 | Rc1 | "20002" | 20002 | | integer | | <http://example.org/def/max_pay> | |
| Cc1.5 | Tc | Cc5 | Rc1 | "Administrator" | "Administrator" | | string | | <http://example.org/def/job> | |
| Cc1.6 | Tc | Cc6 | Rc1 | "8.67" | 8.67 | | number | | <http://example.org/def/number-of-posts> | |
| Cc1.7 | Tc | Cc7 | Rc1 | "Operational Delivery" | "Operational Delivery" | | string | | <http://example.org/def/profession> | |
| Cc2.1 | Tc | Cc1 | Rc2 | "90115" | "90115" | | string | | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90115> |
| Cc2.2 | Tc | Cc2 | Rc2 | "5" | "5" | | string | | <http://example.org/def/grade> | |
| Cc2.3 | Tc | Cc3 | Rc2 | "19546" | 19546 | | integer | | <http://example.org/def/min_pay> | |
| Cc2.4 | Tc | Cc4 | Rc2 | "22478" | 22478 | | integer | | <http://example.org/def/max_pay> | |
| Cc2.5 | Tc | Cc5 | Rc2 | "Administrator" | "Administrator" | | string | | <http://example.org/def/job> | |
| Cc2.6 | Tc | Cc6 | Rc2 | "0.5" | 0.5 | | number | | <http://example.org/def/number-of-posts> | |
| Cc2.7 | Tc | Cc7 | Rc2 | "Operational Delivery" | "Operational Delivery" | | string | | <http://example.org/def/profession> | |

Notice that valueUrl is not specified for cell Cb2.5 because the cell value is null and the virtual property of column Cb5 is not specified.
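The rule in the preceding note can be sketched in Python as below. Several things here are assumptions made for illustration: the `cell_value_url` helper is hypothetical, the valueUrl template for column Cb5 is written as an absolute URL rather than as it appears in the metadata, and a single string substitution stands in for full URI Template expansion.

```python
# Assumed valueUrl template for column Cb5, written as an absolute URL.
REPORTS_TO_TEMPLATE = "http://example.org/senior-roles.csv#post-{reportsTo}"


def cell_value_url(template, column_name, value, virtual=False):
    """Return the valueUrl for a cell, or None when it is left unspecified."""
    if value is None:
        # A null cell only keeps its valueUrl when the column is virtual.
        return template if virtual else None
    # Simplification: substitute a single variable instead of implementing
    # full URI Template expansion.
    return template.replace("{" + column_name + "}", value)


print(cell_value_url(REPORTS_TO_TEMPLATE, "reportsTo", "90334"))  # cell Cb1.5
print(cell_value_url(REPORTS_TO_TEMPLATE, "reportsTo", None))     # cell Cb2.5
```

Cell Cb1.5 therefore receives a valueUrl pointing at post 90334, whereas cell Cb2.5, whose value is null in a non-virtual column, receives none and produces no output.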

Minimal mode output for this example is provided in Turtle [[turtle]] syntax below:

        

Output for table Ta ({ "url": "http://example.org/gov.uk/professions.csv" }) is not included as property suppressOutput has value true.
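A minimal Python sketch of this filtering step is shown below. The dictionary layout and the `tables_with_output` helper are hypothetical and only mirror the url and suppressOutput annotations listed in the table annotations above.

```python
# The three tables of the group, reduced to the annotations that matter here.
TABLES = [
    {"url": "http://example.org/gov.uk/professions.csv", "suppressOutput": True},
    {"url": "http://example.org/senior-roles.csv"},
    {"url": "http://example.org/junior-roles.csv"},
]


def tables_with_output(tables):
    """Keep only the tables whose suppressOutput annotation is not true."""
    return [t for t in tables if not t.get("suppressOutput", False)]


for table in tables_with_output(TABLES):
    print(table["url"])
```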

The propertyUrl is specified for all cells in tables Tb and Tc.

Columns Cb5 and Cc1 ({ "name": "reportsTo" } and { "name": "reportsToSenior" }) use the aboutUrl, propertyUrl and valueUrl properties to assert the relationship between the given post and the senior post it reports to for the cells therein.

Standard mode output for this example is provided in Turtle [[turtle]] syntax below:

        

Table group G was explicitly defined, but has not been explicitly identified; the table group resource is therefore treated as a blank node.

The resources described by each row of table Tb ({ "url": "http://example.org/senior-roles.csv"}) are explicitly defined using the aboutUrl property; for row Rb1, for example, we state [] csvw:describes t2:post-90115 . The aboutUrl property has not been defined for the resources described by each row of table Tc ({ "url": "http://example.org/junior-roles.csv"}), so blank nodes are used; e.g. for row Rc1 we state [] csvw:describes _:d8b8e40c-8c74-458b-99f7-64d1cf5c65f2 .
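The choice between a named subject and a blank node can be sketched in Python as follows. The `row_subject` helper is hypothetical, and generating the blank node label from a random UUID is an assumption suggested by the label shown above; any scheme that yields a fresh label per row would serve.

```python
import uuid


def row_subject(about_url=None):
    """Return the Turtle term for the resource described by a row."""
    if about_url is not None:
        # aboutUrl defined: the subject is an IRI.
        return "<{}>".format(about_url)
    # aboutUrl not defined: mint a fresh blank node for this row.
    return "_:" + str(uuid.uuid4())


# Row Rb1 of table Tb has an aboutUrl; row Rc1 of table Tc does not.
print("[] csvw:describes", row_subject("http://example.org/senior-roles.csv#post-90115"), ".")
print("[] csvw:describes", row_subject(), ".")
```

Calling `row_subject()` once per row, and reusing the result for every cell in that row, keeps all of a row's triples attached to the same blank node.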

Acknowledgements