This document defines a textual syntax for RDF called TriG that allows an RDF dataset to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. TriG is an extension of the Turtle [[RDF12-TURTLE]] format.

RDF 1.2 TriG shares triple terms with [[RDF12-TURTLE]] as a fourth kind of RDF term which can be used as the object of another triple, making it possible to make statements about other statements. RDF 1.2 TriG also shares directional language-tagged strings with [[RDF12-TURTLE]].

In addition, RDF 1.2 TriG shares the reifying triples and annotation syntax extensions with [[RDF12-TURTLE]], which allow triple terms to also be asserted.

This document is part of the RDF 1.2 document suite. TriG is intended to meet the charter requirement of the RDF Working Group to define an RDF syntax for multiple graphs. TriG is an extension of the Turtle syntax for RDF [[RDF12-TURTLE]]. The current document is based on the original proposal by Chris Bizer and Richard Cyganiak.

Introduction

This document defines TriG, a concrete syntax for RDF as defined in the RDF Concepts and Abstract Syntax document [[!RDF12-CONCEPTS]]. TriG is an extension of Turtle [[RDF12-TURTLE]], extended to support representing a complete RDF dataset.

TriG Language

A TriG document allows writing down an RDF dataset in a compact textual form. It consists of a sequence of directives, triple statements, graph statements (which contain triple-generating statements), and optional blank lines. Comments may be given after a # that is not part of another lexical token and continue to the end of the line.

Graph statements are a pair of an IRI or blank node and a group of triple statements surrounded by braces ({}). The IRI or blank node of the graph statement may be used in another graph statement, which implies taking the union of the triples generated by each graph statement. An IRI or blank node used as a graph label may also reoccur as part of any triple statement. Optionally, a graph statement may not be labeled with an IRI. Such a graph statement corresponds to the default graph of an RDF dataset.

The construction of an RDF dataset from a TriG document is defined in the TriG Grammar and Parsing sections.

Triple Statements

As TriG is an extension of the Turtle language, it allows for any constructs from the Turtle language. Simple Triples, Predicate Lists, and Object Lists can all be used either inside a graph statement, or on their own as in a Turtle document. When outside a graph statement, the triples are considered to be part of the default graph of the RDF dataset.

Graph Statements

A graph statement pairs an IRI or blank node with an RDF graph. The triple statements that make up the graph are enclosed in {}.

In a TriG document, a graph IRI or blank node may be used as the label for more than one graph statement. The graph label of a graph statement may be omitted. In this case the graph is considered the default graph of the RDF dataset.

An RDF dataset might contain only a single graph.
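As an illustration, such a dataset might be written as follows. This is a minimal sketch; the graph label :G1 and every IRI under www.example.org are assumptions chosen for the example.

```trig
# This document encodes one named graph; the default graph is empty.
PREFIX ex: <http://www.example.org/vocabulary#>
PREFIX :   <http://www.example.org/exampleDocument#>

:G1 {
    :Monica a ex:Person ;
        ex:name "Monica Murphy" ;
        ex:email <mailto:monica@example.org> ;
        ex:hasSkill ex:Management ,
                    ex:Programming .
}
```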

      
    

An RDF dataset may contain a default graph and/or zero or more named graphs.
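A possible rendering of a dataset with a default graph and two named graphs is sketched below. The IRIs and blank node labels are illustrative assumptions; distinct labels (_:a, _:b) are used on purpose, because a label denotes the same blank node throughout the document.

```trig
PREFIX dc:   <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Default graph: a graph statement without a label.
{
    <http://example.org/bob>   dc:publisher "Bob" .
    <http://example.org/alice> dc:publisher "Alice" .
}

# Named graph labeled with an IRI.
<http://example.org/bob>
{
    _:a foaf:name "Bob" .
    _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
}

# A second named graph.
<http://example.org/alice>
{
    _:b foaf:name "Alice" .
    _:b foaf:mbox <mailto:alice@work.example.org> .
}
```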

      
    

TriG provides various alternative ways to write graphs and triples, giving the data writer choices for clarity:
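The sketch below shows some of these choices side by side: default-graph triples written without braces, the optional GRAPH keyword, and the Turtle abbreviations for predicate lists and anonymous blank nodes. All IRIs are assumptions made for the example.

```trig
PREFIX dc:   <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Default graph: triples outside any graph statement, no {} needed.
<http://example.org/bob>   dc:publisher "Bob" .
<http://example.org/alice> dc:publisher "Alice" .

# The GRAPH keyword highlights a named graph; ';' abbreviates
# triples that share a subject, and [] is an anonymous blank node.
GRAPH <http://example.org/bob>
{
    [] foaf:name "Bob" ;
       foaf:mbox <mailto:bob@oldcorp.example.org> ;
       foaf:knows _:b .
}

GRAPH <http://example.org/alice>
{
    _:b foaf:name "Alice" ;
        foaf:mbox <mailto:alice@work.example.org> .
}
```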

      
    

Triple Terms

TriG shares the same syntax for representing triple terms as Turtle, including Reifying Triples and the Annotation Syntax.
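The following sketch illustrates the three forms inside a graph statement. The reifier names (:claim1, :claim2, :claim3) and the vocabulary terms are hypothetical; only the syntax is taken from Turtle.

```trig
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX :    <http://www.example.org/>

:G {
    # Triple term used directly as an object.
    # The triple :alice :knows :bob is not asserted by this statement.
    :claim1 rdf:reifies <<( :alice :knows :bob )>> .

    # Reified triple with an explicit reifier :claim2.
    # Produces :claim2 rdf:reifies <<( :alice :age 42 )>> and
    # :claim2 :statedBy :carol, without asserting :alice :age 42.
    << :alice :age 42 ~ :claim2 >> :statedBy :carol .

    # Annotation syntax: asserts :alice :name "Alice" and
    # annotates it through the reifier :claim3.
    :alice :name "Alice" ~ :claim3 {| :recorded "2021-07-07"^^xsd:date |} .
}
```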

      
    

Other Terms

All other terms and directives come from Turtle.

Special Considerations for Blank Nodes

Blank nodes sharing the same identifier in differently labeled graph statements are considered to be the same blank node.
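For instance, in the sketch below (graph labels and properties are assumptions), both graph statements describe one and the same blank node, because the label _:x is scoped to the whole document rather than to a single graph.

```trig
PREFIX : <http://example.org/>

# _:x denotes the same blank node in :G1 and in :G2.
:G1 { _:x :name "Alice" . }
:G2 { _:x :mbox <mailto:alice@example.org> . }
```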

This specification defines conformance criteria for TriG documents and TriG parsers.

A conforming TriG document is a Unicode string that conforms to the grammar and additional constraints defined in the TriG Grammar section, starting with the trigDoc production. A TriG document serializes an RDF dataset.

A conforming TriG parser is a system capable of reading TriG documents on behalf of an application. It makes the serialized RDF dataset, as defined in the Parsing section, available to the application, usually through some form of API.

The IRI that identifies the TriG language is: http://www.w3.org/ns/formats/TriG

This specification does not define how TriG parsers handle non-conforming input documents.

Media Type and Content Encoding

The media type of TriG is application/trig. The content encoding of TriG content is always UTF-8.

TriG Grammar

A TriG document is a Unicode [[!UNICODE]] character string encoded in UTF-8. Only Unicode characters in the range U+0000 to U+10FFFF inclusive are allowed.

White Space

White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a TriG parser.

White space is significant in the production String.

Comments

Comments in TriG start with a # outside an IRIREF or String, and continue to the end of line (marked by LF or CR), or to the end of file if there is no end of line after the comment marker. Comments are treated as white space.
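A short sketch of where a # does and does not start a comment (the IRIs are assumptions for illustration):

```trig
PREFIX : <http://example.org/>   # a comment may follow a directive

# A comment may occupy a whole line.
:G {
    # The '#' inside the IRIREF and inside the string below is ordinary
    # content, not a comment marker.
    <http://example.org/doc#section> :label "issue #42" .   # trailing comment
}
```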

IRI References

Relative IRI references are resolved with base IRIs as per [[[RFC3986]]] [[RFC3986]] using only the basic algorithm in section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) is performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of [[[RFC3987]]] [[RFC3987]].

The @base or BASE directive defines the Base IRI used to resolve relative IRI references per [[RFC3986]] section 5.1.1, "Base URI Embedded in Content". Section 5.1.2, "Base URI from the Encapsulating Entity", defines how the In-Scope Base IRI may come from an encapsulating document, such as a SOAP envelope with an `xml:base` directive or a MIME multipart document with a `Content-Location` header. The "Retrieval URI" identified in section 5.1.3, "Base URI from the Retrieval URI", is the URL from which a particular TriG document was retrieved. If none of the above specifies the Base URI, the default Base URI (section 5.1.4, "Default Base URI") is used. Each @base or BASE directive sets a new In-Scope Base URI, relative to the previous one.
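The effect of successive base directives can be sketched as follows. The IRIs are assumptions; the resolved forms in the comments follow the basic algorithm of RFC 3986 section 5.2. Note that the same relative reference <p> resolves to a different IRI each time the base changes.

```trig
BASE <http://example.org/a/>
# <s1> is http://example.org/a/s1, <p> is http://example.org/a/p
<s1> <p> <o> .

# A relative base is resolved against the previous base:
# the in-scope base becomes http://example.org/a/b/
BASE <b/>
# <s2> is http://example.org/a/b/s2, <../o> is http://example.org/a/o
<s2> <p> <../o> .

# The Turtle-style directive needs a trailing '.'
@base <http://other.example/> .
# <G> is http://other.example/G, <#frag> is http://other.example/#frag,
# </root> is http://other.example/root
<G> { <#frag> <p> </root> . }
```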

Escape Sequences

There are three forms of escapes used in TriG documents: numeric escapes, string escapes, and reserved character escapes.

Context where each kind of escape sequence can be used

Context                                                        numeric escapes   string escapes   reserved character escapes
IRIs, used as RDF terms or as in PREFIX or BASE declarations   yes               no               no
local names                                                    no                no               yes
Strings                                                        yes               yes              no

%-encoded sequences are in the character range for IRIs and are explicitly allowed in local names. These appear as a '%' followed by two hex characters and represent that same sequence of three characters. These sequences are not decoded during processing. A term written as <http://a.example/%66oo-bar> in TriG designates the IRI http://a.example/%66oo-bar and not IRI http://a.example/foo-bar. A term written as ex:%66oo-bar with a prefix PREFIX ex: <http://a.example/> also designates the IRI http://a.example/%66oo-bar.
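The sketch below gathers one instance of each kind of escape, together with the %-encoding case described above. The graph label and the ex:note and ex:sameAs properties are assumptions made for the example.

```trig
PREFIX ex: <http://a.example/>

ex:G {
    # Numeric escape in an IRI: \u0066 is "f", so the subject is
    # the IRI http://a.example/foo
    <http://a.example/\u0066oo> ex:note "numeric escape in an IRI" .

    # %-encoded sequences are not decoded: subject and object are both
    # the IRI http://a.example/%66oo-bar (not http://a.example/foo-bar)
    <http://a.example/%66oo-bar> ex:sameAs ex:%66oo-bar .

    # Reserved character escape in a local name: the subject is
    # the IRI http://a.example/path/to
    ex:path\/to ex:note "reserved character escape in a local name" .

    # String escape (\t, a tab) and numeric escape (\u00E9, "é") in a string.
    ex:s ex:note "caf\u00E9\tmenu" .
}
```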

Grammar

The EBNF used here is defined in XML 1.0 [[!EBNF-NOTATION]].

Notes:

  1. A blank node identifier represents the same blank node throughout the TriG document.
  2. Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in quotation marks ("BASE", "PREFIX", "GRAPH") are case-insensitive.
  3. Escape sequences UCHAR and ECHAR are case-sensitive.
  4. When tokenizing the input and choosing grammar rules, the longest match is chosen.
  5. The TriG grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
  6. The entry point into the grammar is trigDoc.
  7. In signed numbers, no white space is allowed between the sign and the number.
  8. The strings '@prefix' and '@base' match the pattern for LANG_DIR, though neither prefix nor base are registered language subtags. This specification does not define whether a quoted literal followed by either of these tokens (e.g., "A"@base) is in the TriG language.
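Notes 2 and 7 can be illustrated with a small sketch; the prefixes and terms are assumptions chosen for the example.

```trig
# '@prefix' must be lowercase and ends with '.'
@prefix ex: <http://example.org/> .

# "PREFIX" and "BASE" may be written in any case and take no trailing '.'
prefix foaf: <http://xmlns.com/foaf/0.1/>
Base <http://example.org/base/>

# "GRAPH" may be written in any case.
graph ex:G {
    ex:s a foaf:Person ;       # 'a' must be lowercase
         ex:active true ;      # 'true' and 'false' must be lowercase
         ex:delta -5 .         # no white space between the sign and the digits
}
```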

A text version of this grammar is available here.

Selected Terminal Literal Strings

This document uses some specific terminal literal strings [[EBNF-NOTATION]]. To clarify the Unicode code points used for these terminal literal strings, the following table describes specific characters and sequences used throughout this document.

Code    Glyph  Description
U+000A  LF     Line feed
U+000D  CR     Carriage return
U+0022  "      Quotation mark
U+0023  #      Number sign
U+0027  '      Apostrophe
U+002D  -      Hyphen
U+003A  :      Colon
U+0040  @      At sign
U+005C  \      Backslash
U+005F  _      Underscore
U+0061  a      Latin small letter A
U+007B  {      Left curly bracket
U+007D  }      Right curly bracket

Other short terminal literal strings are composed of specific sequences of Unicode characters:

space
U+0020
"""
three concatenated quotation mark characters, each having the code point U+0022
'''
three concatenated apostrophe characters, each having the code point U+0027
--
two concatenated - characters

Parsing

The RDF Concepts and Abstract Syntax [[!RDF12-CONCEPTS]] specification defines four types of RDF term: IRIs, literals, blank nodes, and triple terms. Literals are composed of a lexical form and an optional language tag [[!BCP47]] (possibly including a base direction) or an optional datatype IRI. An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs. This section maps a string conforming to the TriG grammar to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g., language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.

Parser State

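Parsing TriG requires a state of nine items: the base IRI, the namespaces map, the bnodeLabels map, |curSubject|, |curPredicate|, |curObject|, |curGraph|, |curTripleTerm|, and |curReifier|.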

Term Constructors can create a stack of these values indicated by using language such as "records the |curSubject| and |curPredicate|."

RDF Term Constructors

This table maps productions and lexical tokens to RDF terms or components of RDF terms listed above:

production type procedure
IRIREF IRI The characters between "<" and ">" are taken, after the numeric escape sequences are processed, to form the Unicode string of the IRI. Relative IRI reference resolution is performed per the IRI References section.
PNAME_NS prefix When used in a prefixID or sparqlPrefix production, the prefix is the potentially empty Unicode string matching the first argument of the rule, which is a key into the namespaces map.
PNAME_NS IRI When used in a PrefixedName production, the iri is the value in the namespaces map corresponding to the first argument of the rule.
PNAME_LN IRI A potentially empty prefix is identified by the first sequence, PNAME_NS. The namespaces map MUST have a corresponding namespace. The Unicode string of the IRI is formed by unescaping the reserved characters in the second argument, PN_LOCAL, and concatenating this onto the namespace.
STRING_LITERAL_SINGLE_QUOTE lexical form The characters between the outermost 's are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
STRING_LITERAL_QUOTE lexical form The characters between the outermost "s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
STRING_LITERAL_LONG_SINGLE_QUOTE lexical form The characters between the outermost '''s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
STRING_LITERAL_LONG_QUOTE lexical form The characters between the outermost """s are taken, with numeric and string escape sequences unescaped, to form the Unicode string of a lexical form.
LANG_DIR language tag The characters following the @ form the language tag and optionally the base direction, if the matched characters include --.
RDFLiteral literal The literal has a lexical form of the first rule argument, String. If the '^^' iri rule is matched, the datatype IRI is derived from the iri, and the literal has no language tag. If the LANG_DIR rule is matched, the language tag and base direction are taken from LANG_DIR. If there is no base direction, the datatype is rdf:langString. If there is a base direction, the datatype is rdf:dirLangString. If neither matched, the datatype is xsd:string, and the literal has no language tag.
INTEGER literal The literal has a lexical form of the input string, and a datatype of xsd:integer.
DECIMAL literal The literal has a lexical form of the input string, and a datatype of xsd:decimal.
DOUBLE literal The literal has a lexical form of the input string, and a datatype of xsd:double.
BooleanLiteral literal The literal has a lexical form of true or false, depending on which matched the input, and a datatype of xsd:boolean.
BLANK_NODE_LABEL blank node The string matching the second argument, PN_LOCAL, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated.
ANON blank node A blank node is generated.
blankNodePropertyList blank node A blank node is generated. Note the rules for blankNodePropertyList in the next section.
collection blank node For non-empty lists, a blank node is generated. Note the rules for collection in the next section.
collection IRI For empty lists, the resulting IRI is rdf:nil. Note the rules for collection in the next section.
reifier IRI | blank node The |curReifier| is taken from term, which is taken from the matched iri production or BlankNode production, if any. If no such production is matched, term is taken from a fresh RDF blank node.
tripleTerm triple term The triple term is composed of the terms constructed from the ttSubject, predicate, and ttObject productions.
reifiedTriple IRI | blank node The term is taken from the matched reifier, if any, or from a fresh RDF blank node.
annotationBlock IRI | blank node The term is taken from a previously matched reifier, if any, or from a fresh RDF blank node.
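To make the literal and blank node rows of the table concrete, the sketch below annotates each object with the term the constructors above would yield. The ex: terms are assumptions for the example.

```trig
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/>

ex:G {
    ex:s ex:p1  "plain" ;                  # datatype xsd:string, no language tag
         ex:p2  'chat'@fr ;                # datatype rdf:langString, language tag fr
         ex:p3  "Hello"@en--ltr ;          # datatype rdf:dirLangString, tag en, base direction ltr
         ex:p4  "2021-07-07"^^xsd:date ;   # datatype taken from the iri after ^^
         ex:p5  42 ;                       # INTEGER: datatype xsd:integer
         ex:p6  4.2 ;                      # DECIMAL: datatype xsd:decimal
         ex:p7  4.2e1 ;                    # DOUBLE: datatype xsd:double
         ex:p8  true ;                     # BooleanLiteral: datatype xsd:boolean
         ex:p9  _:b1 ;                     # BLANK_NODE_LABEL: blank node keyed by its label
         ex:p10 [] ;                       # ANON: a freshly generated blank node
         ex:p11 () .                       # empty collection: the IRI rdf:nil
}
```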

RDF Triples Construction

A TriG document defines an RDF Dataset composed of one default graph and zero or more named graphs. Each graph is composed of a set of RDF triples.

Output Graph

The state |curGraph| is initially unset. It records the label of the graph for triples produced during parsing. If undefined, the default graph is used.

The rule labelOrSubject sets both |curGraph| and |curSubject| (only one of these will be used).

The following grammar production clauses set |curGraph| to be undefined, indicating the default graph:

  • The grammar production clause wrappedGraph in rule block
  • The grammar production in rule triples2.

The grammar production labelOrSubject predicateObjectList '.' unsets |curGraph| before handling the predicateObjectList production in rule triplesOrGraph.
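The sketch below shows, for several kinds of top-level statement, which graph receives the resulting triples under the rules above. The names are assumptions for illustration.

```trig
PREFIX : <http://example.org/>

# labelOrSubject followed by a predicateObjectList:
# :s1 is used as the subject, and the triple goes to the default graph.
:s1 :p :o1 .

# labelOrSubject followed by a wrappedGraph:
# :G is used as the graph label, and the triple goes to graph :G.
:G { :s2 :p :o2 . }

# wrappedGraph without a label: default graph.
{ :s3 :p :o3 . }

# triples2 (here a blankNodePropertyList as subject): default graph.
[ :p :o4 ] :q :o5 .

# A second graph statement with the same label adds to graph :G.
GRAPH :G { :s6 :p :o6 . }
```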

Triple Output

Each RDF triple produced is added to |curGraph|, or the default graph if |curGraph| is not set at that point in the parsing process.

The subject production sets the |curSubject|. The verb production sets the |curPredicate|.

Triples are produced at the following points in the parsing process and each RDF triple produced is added to the graph identified by |curGraph|.

Triple Production

Each object N in the document produces an RDF triple: |curSubject| |curPredicate| N.

Reifiers

Beginning the reifier production, the |curReifier| is taken from the reifier term constructor. Then yield the RDF triple |curReifier| rdf:reifies |curTripleTerm|.

Reified Triples

Beginning the reifiedTriple production records the |curTripleTerm|. A new tripleTerm instance |curTripleTerm| is created using the rtSubject, verb, and rtObject productions. Finishing the reifiedTriple production, if the |curReifier| is not set, it is assigned a fresh RDF blank node; it next yields the RDF triple |curReifier| rdf:reifies |curTripleTerm|, and then restores the recorded value of the |curTripleTerm|. The node produced by matching reifiedTriple is the |curReifier|.

Annotations

Beginning the annotation production records the |curSubject| and |curPredicate|. A new tripleTerm instance |curTripleTerm| is created using the |curSubject| |curPredicate| |curObject|, and the value of the |curReifier| is cleared. Finishing the annotation production restores the recorded values of the |curSubject| and |curPredicate|.

Annotation Blocks

Beginning the annotationBlock production records the |curTripleTerm|. If the |curReifier| is not set, then it is assigned a fresh RDF blank node and the production yields the RDF triple |curReifier| rdf:reifies |curTripleTerm|. The |curSubject| is taken from the |curReifier|. Finishing the annotationBlock production clears the value of the |curReifier| and restores the |curTripleTerm|.

If the |curReifier| was already set, the reifying triple |curReifier| rdf:reifies |curTripleTerm| was already emitted by the reifier production.
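As a sketch of how these rules combine, the two graph statements below should contain the same three triples: the first uses the annotation syntax with an explicit reifier, the second writes the produced triples out by hand. The names :r1 and :since are hypothetical. Had the reifier been omitted, a fresh blank node would take the place of :r1.

```trig
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX :    <http://example.org/>

:G1 {
    # Annotation syntax: reifier (~ :r1) followed by an annotation block.
    :alice :knows :bob ~ :r1 {| :since 2020 |} .
}

:G2 {
    # The asserted triple itself.
    :alice :knows :bob .
    # Emitted by the reifier production.
    :r1 rdf:reifies <<( :alice :knows :bob )>> .
    # Emitted inside the annotation block, with :r1 as the subject.
    :r1 :since 2020 .
}
```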

Property Lists

Beginning the blankNodePropertyList production records the |curSubject| and |curPredicate|, and sets |curSubject| to a novel blank node B. Finishing the blankNodePropertyList production restores |curSubject| and |curPredicate|. The node produced by matching blankNodePropertyList is the blank node B.
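For example, the two graph statements below should yield isomorphic graphs: the first uses a blankNodePropertyList, the second spells out the triples with a labeled blank node standing in for the novel blank node B. The IRIs and the label _:b are assumptions.

```trig
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX :     <http://example.org/>

:G1 {
    :alice foaf:knows [ foaf:name "Bob" ; foaf:mbox <mailto:bob@example.org> ] .
}

:G2 {
    :alice foaf:knows _:b .
    _:b foaf:name "Bob" ;
        foaf:mbox <mailto:bob@example.org> .
}
```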

Collections

Beginning the collection production records the |curSubject| and |curPredicate|. Each object in the collection production has a |curSubject| set to a novel blank node B and a |curPredicate| set to rdf:first. Each object object_n after the first produces a triple: object_(n-1) rdf:rest object_n . Finishing the collection production creates an additional triple |curSubject| rdf:rest rdf:nil . and restores |curSubject| and |curPredicate|. The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
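The expansion can be sketched as follows: the two graph statements should yield isomorphic graphs, with the labels _:l1 and _:l2 (assumptions for the example) standing in for the novel blank nodes created for each list element.

```trig
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX :    <http://example.org/>

:G1 {
    :s :p ( 1 2 ) ;
       :q () .
}

:G2 {
    # The non-empty list is represented by its first blank node;
    # the empty list is rdf:nil.
    :s :p _:l1 ;
       :q rdf:nil .
    _:l1 rdf:first 1 ;
         rdf:rest  _:l2 .
    _:l2 rdf:first 2 ;
         rdf:rest  rdf:nil .
}
```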

Privacy Considerations

The TriG format is used to express arbitrary application data, which may include the expression of personally identifiable information (PII) or other information which could be considered sensitive. Authors publishing such information are advised to carefully consider the needs and use of publishing such information, as well as the applicable regulations for the regions where the data is expected to be consumed and potentially revealed (e.g., GDPR, CCPA, others), particularly whether authorization measures are needed for access to the data.

Security Considerations

The STRING_LITERAL_SINGLE_QUOTE, STRING_LITERAL_QUOTE, STRING_LITERAL_LONG_SINGLE_QUOTE, and STRING_LITERAL_LONG_QUOTE productions allow the use of unescaped control characters. Although this specification does not directly expose this content to an end user, it might be presented through a user agent, which may cause the presented text to be obfuscated due to presentation of such characters.

TriG is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference IRIs, invoking the security considerations of the scheme for that IRI. Note in particular, the privacy issues in [[RFC3023]] section 10 for HTTP IRIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended IRIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.

The TriG language is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (for example, PGP encryption, checksum validation, password-protected compression) may also be used on TriG documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.

TriG can express data which is presented to the user, such as RDF Schema labels. Applications rendering strings retrieved from untrusted TriG documents, or using unescaped characters, SHOULD use warnings and other appropriate means to limit the possibility that malignant strings might be used to mislead the reader. The security considerations in the media type registration for XML ([[RFC3023]] section 10) provide additional guidance around the expression of arbitrary data and markup.

TriG uses IRIs as term identifiers. Applications interpreting data expressed in TriG SHOULD address the security issues of [[[RFC3987]]] [[RFC3987]] Section 8, as well as [[[RFC3986]]] [[RFC3986]] Section 7.

Multiple IRIs may have the same appearance. Characters in different scripts may look similar (for instance, a Cyrillic "о" (code point U+043E) may appear similar to a Latin "o" (code point U+006F)). A character followed by combining characters may have the same visual representation as another character (for example, LATIN SMALL LETTER "E" (code point U+0065) followed by COMBINING ACUTE ACCENT (code point U+0301) has the same visual representation as LATIN SMALL LETTER "E" WITH ACUTE (U+00E9)). Any person or application that is writing or interpreting data in TriG must take care to use the IRI that matches the intended semantics, and avoid IRIs that may look similar. Further information about matching visually similar characters can be found in [[[UNICODE-SECURITY]]] [[UNICODE-SECURITY]] and [[[RFC3987]]] [[RFC3987]] Section 8.

Internet Media Type, File Extension and Macintosh File Type

The Internet Media Type (formerly known as MIME Type) for TriG is "application/trig".

It is recommended that TriG files have the extension ".trig" (all lowercase) on all platforms.

The information that follows has been submitted to the Internet Engineering Steering Group (IESG) for review, approval, and registration with IANA.

Type name:
application
Subtype name:
trig
Required parameters:
None
Optional parameters:
None
Encoding considerations:
The syntax of TriG is expressed over code points in Unicode [[!UNICODE]]. The encoding is always UTF-8 [[!UTF-8]].
Unicode code points may also be expressed using an \uXXXX (U+0000 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-Fa-f].
Security considerations:
See the Security Considerations section.
Interoperability considerations:
There are no known interoperability issues.
Published specification:
This specification.
Applications which use this media type:
TriG is used widely for representing RDF data. There are implementations available in most common programming languages.
Additional information:
Magic number(s):
TriG documents may have the strings 'prefix' or 'base' (case independent) near the beginning of the document.
File extension(s):
.trig
Macintosh file type code(s):
TEXT
Base URI:
The TriG base directive can change the current base URI for relative IRI references in the language that are used sequentially later in the document.
Person & email address to contact for further information:
W3C RDF-star Working Group <public-rdf-star-wg@w3.org>
Intended usage:
Common
Restrictions on usage:
None
Author(s):
The TriG specification is the product of the RDF-star WG. The W3C reserves change control over this specification.

Acknowledgments

Acknowledgments for RDF 1.1

The editors gratefully acknowledge the work of Chris Bizer and Richard Cyganiak in creating the original TriG specification. Valuable contributions to this version were made by Gregg Kellogg, Eric Prud'hommeaux and Sandro Hawke.

The document was improved through the review process by the wider community.

Acknowledgments for RDF 1.2

In addition to the editors, the following people have contributed to this specification:

Recognize members of the Task Force? Not an easy to find list of contributors.

Changes between RDF 1.1 and RDF 1.2

This section describes the main differences from the RDF 1.1 Recommendation.