RDF 1.2 N-Quads

N-Quads is a line-based, plain text format for encoding an RDF dataset.

RDF 1.2 N-Quads introduces triple terms as a fourth kind of RDF term which can be used as the object of another triple, making it possible to make statements about other statements. RDF 1.2 N-Quads also adds support for directional language-tagged strings.

N-Quads Language

An N-Quads document allows writing down an RDF dataset in a textual form. It is made up of simple statements, each consisting of a subject, predicate, object, and an optional graph name, together with optional blank lines. Comments may be given after a # that is not part of another lexical token and continue to the end of the line.

Simple Statements

A simple statement extends the definition of simple triple in [[RDF12-N-TRIPLES]] with an optional named graph.

The simplest statement is a sequence of (subject, predicate, object) terms forming an RDF triple, followed by an optional graph name (a blank node identifier or IRI) denoting the named graph in the dataset to which the triple belongs, and terminated by a period (.). White space (spaces and/or tabs) may surround terms, except where significant as noted in the grammar.

Comments are treated as white space, and may be given after a # that is not part of another lexical token and continue to the end of the line.

The graph name can be omitted, in which case the triples are considered part of the default graph of the RDF dataset.

Version Announcement

The N-Quads language has evolved since its origin, and RDF 1.2 adds new syntax. RDF 1.2 N-Quads introduces the VERSION directive along with an optional `version` Media Type parameter. Authors serializing N-Quads that uses new features, such as initial text directions or triple terms, can use the directive to announce the new syntax forms, and parsers can use it to detect them; similarly, HTTP clients and servers can use the `version` Media Type parameter.

As with N-Triples, the version declaration is case-sensitive.

When providing content over HTTP, servers can announce the version using the optional `version` Media Type parameter, for example `Content-Type: application/n-quads; version=1.2`.

See Version Announcement in [[RDF12-TURTLE]] for more considerations on using the version declaration.
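As an illustration of how a client might read the `version` Media Type parameter, the following Python sketch reuses the MIME header parser from the standard library (`email.message.Message`). The function name `media_type_version` is hypothetical, and returning `None` when the parameter is absent is an assumption of this sketch, not behavior required by this specification.

```python
from email.message import Message
from typing import Optional


def media_type_version(content_type: str) -> Optional[str]:
    """Return the `version` parameter of an N-Quads Content-Type value, or None.

    Hypothetical helper: it borrows the MIME header parser from Python's
    standard library instead of a dedicated HTTP parser.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() returns the lowercased type/subtype.
    if msg.get_content_type() != "application/n-quads":
        raise ValueError(f"not an N-Quads media type: {content_type!r}")
    # get_param() unquotes by default, so version="1.2" and version=1.2 agree.
    version = msg.get_param("version")
    # RFC 2231-encoded values come back as tuples; this sketch ignores them.
    return version if isinstance(version, str) else None


print(media_type_version("application/n-quads; version=1.2"))  # 1.2
print(media_type_version("application/n-quads"))  # None
```

Because the version announcement is only a hint, a parser using such a helper would at most warn when the document's features do not match the announced version.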

Triple Terms

A triple term may be the object of an RDF triple.

A triple term is represented as a tripleTerm with subject, predicate, and object preceded by <<(, and followed by )>>. Note that triple terms may be nested.
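To show how nesting works, here is a minimal Python sketch that writes a triple term with its `<<(` and `)>>` delimiters. The classes `IRI` and `TripleTerm` are hypothetical and deliberately cover only IRIs and triple terms; blank nodes and literals are omitted, and the single-space layout is one possible choice rather than a claim about canonical form.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str

    def to_nquads(self) -> str:
        # Assumes the IRI contains no characters that need escaping.
        return f"<{self.value}>"


@dataclass(frozen=True)
class TripleTerm:
    subject: IRI
    predicate: IRI
    object: Union[IRI, TripleTerm]  # objects may themselves be triple terms

    def to_nquads(self) -> str:
        # Delimit with <<( and )>>; nesting falls out of the recursive call.
        return (
            f"<<( {self.subject.to_nquads()} {self.predicate.to_nquads()} "
            f"{self.object.to_nquads()} )>>"
        )


EX = "http://example.org/"
claim = TripleTerm(IRI(EX + "alice"), IRI(EX + "knows"), IRI(EX + "bob"))
report = TripleTerm(IRI(EX + "carol"), IRI(EX + "reports"), claim)
print(report.to_nquads())
```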

IRIs

As in N-Triples, IRIs may be written only as resolved IRIs. IRIs are preceded by < and followed by >, and may contain numeric escape sequences. For example, <http://example.org/#green-goblin>.
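The lexical shape of an IRI token can be checked with a regular expression built from the IRIREF and UCHAR productions. The Python sketch below is an approximation: the name `is_resolved_iriref` is hypothetical, and the "resolved" test only looks for a leading scheme in the escaped text, so it does not perform full IRI validation.

```python
import re

HEX = "[0-9A-Fa-f]"
UCHAR = rf"\\u{HEX}{{4}}|\\U{HEX}{{8}}"
# IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
IRIREF = re.compile(rf'<(?:[^\x00-\x20<>"{{}}|^`\\]|{UCHAR})*>')
# Rough absoluteness test: an RFC 3986 scheme followed by ':'.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def is_resolved_iriref(token: str) -> bool:
    """Check that `token` is a lexically valid IRIREF that starts with a scheme."""
    if IRIREF.fullmatch(token) is None:
        return False
    # Start matching just after the opening '<'.
    return SCHEME.match(token, 1) is not None
```

A relative reference such as `<#green-goblin>` passes the IRIREF pattern but fails the scheme check, which mirrors the rule that only resolved IRIs may be written.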

RDF Literals

As in N-Triples, literals are used to identify values such as strings, numbers, and dates.

Literals (Grammar production Literal) have a lexical form followed by either a language tag (possibly including initial text direction), a datatype IRI, or neither.

The representation of the lexical form consists of an initial delimiter ", a sequence of permitted characters or numeric escape sequences or string escape sequences, and a final delimiter.

Literals may not contain the characters ", LF, or CR except in their escaped forms. In addition, \ may not appear in any quoted literal except as part of an escape sequence, and a " character can only be included in a quoted literal using an escape sequence.
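The following Python sketch shows one way to turn the text between a literal's quotes into its lexical form. It assumes the ECHAR production `'\' [tbnrf"'\]` and the UCHAR forms `\uXXXX` and `\UXXXXXXXX`. The function `unescape_literal` is hypothetical. It also rejects escapes that do not denote Unicode scalar values, in line with the grammar's exclusion of surrogates.

```python
import re

# ECHAR ::= '\' [tbnrf"'\]
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f", '"': '"', "'": "'", "\\": "\\"}
ESCAPE = re.compile(r"""\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"'\\]))""")


def _decode(m: re.Match) -> str:
    if m.group(3) is not None:
        return ECHAR[m.group(3)]
    cp = int(m.group(1) or m.group(2), 16)
    # Only Unicode scalar values may be escaped: no surrogates, nothing past U+10FFFF.
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"{m.group(0)!r} does not denote a Unicode scalar value")
    return chr(cp)


def unescape_literal(body: str) -> str:
    """Turn the text between a literal's quotes into its lexical form."""
    # Whatever remains once valid escapes are removed must not contain
    # a bare quote, a stray backslash, LF, or CR.
    leftover = ESCAPE.sub("", body)
    if any(ch in leftover for ch in '"\\\n\r'):
        raise ValueError("literal contains an unescaped quote, backslash, LF, or CR")
    return ESCAPE.sub(_decode, body)
```

Scanning left to right with a single pattern means that `\\` is consumed as one escape, so a following `u0041` is read as plain text rather than as a second escape.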

The corresponding lexical form is the characters between the delimiters, after processing any escape sequences. If present, the LANG_DIR terminal matches the language tag and optionally the initial text direction. The language tag is preceded by an @, and, if present, the initial text direction is separated from the language tag by --. If there is no language tag, there may be a datatype IRI, preceded by ^^. If there is no datatype IRI and no language tag, then it is a simple literal and the datatype is http://www.w3.org/2001/XMLSchema#string.
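The choice of datatype described above can be sketched in Python as follows. The regular expression follows the LANG_DIR production as we read it (`'@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)* ('--' [a-zA-Z]+)?`). The names `Literal` and `make_literal` are hypothetical, and the sketch assumes there is no white space between the closing quote and the suffix. It also does not validate the datatype IRI or BCP 47 well-formedness.

```python
import re
from typing import NamedTuple, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# LANG_DIR ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)* ('--' [a-zA-Z]+)?
LANG_DIR = re.compile(r"@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)(?:--([a-zA-Z]+))?")


class Literal(NamedTuple):
    lexical: str
    datatype: str
    language: Optional[str] = None
    direction: Optional[str] = None


def make_literal(lexical: str, suffix: str = "") -> Literal:
    """Build a literal from its unescaped lexical form and the text after the closing quote."""
    if not suffix:
        return Literal(lexical, XSD + "string")  # simple literal
    if suffix.startswith("^^<") and suffix.endswith(">"):
        return Literal(lexical, suffix[3:-1])  # IRI validity not checked here
    m = LANG_DIR.fullmatch(suffix)
    if m is None:
        raise ValueError(f"unrecognized literal suffix: {suffix!r}")
    language, direction = m.group(1), m.group(2)
    if direction is None:
        return Literal(lexical, RDF + "langString", language)
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"initial text direction must be ltr or rtl, not {direction!r}")
    return Literal(lexical, RDF + "dirLangString", language, direction)
```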

RDF Blank Nodes

As in N-Triples, RDF blank nodes are expressed as _: followed by a blank node label matching the BLANK_NODE_LABEL production.

Informally, the first character after _: is either a character matched by PN_CHARS_U or a digit. Any following characters, if present, are matched by PN_CHARS or by ., except that . is not permitted as the last character.
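The informal description above can be made precise with a regular expression assembled from the PN_CHARS_BASE, PN_CHARS_U, and PN_CHARS character ranges. The Python sketch below assumes that these ranges match those shared with Turtle. The constant names mirror the grammar productions, but the code itself is illustrative only.

```python
import re

# Character ranges from the PN_CHARS_BASE, PN_CHARS_U, and PN_CHARS productions.
PN_CHARS_BASE = (
    r"A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
PN_CHARS_U = PN_CHARS_BASE + "_"
PN_CHARS = PN_CHARS_U + r"\-0-9\u00B7\u0300-\u036F\u203F-\u2040"

# BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
BLANK_NODE_LABEL = re.compile(
    rf"_:[{PN_CHARS_U}0-9](?:[{PN_CHARS}.]*[{PN_CHARS}])?"
)
```

Requiring the optional tail to end in a PN_CHARS character is what forbids a trailing `.`, which would otherwise be confused with the statement terminator.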

A fresh RDF blank node is allocated for each unique blank node identifier in a document. Repeated use of the same blank node identifier identifies the same blank node.
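A per-document mapping from labels to blank nodes is enough to implement this rule. In the Python sketch below, `BlankNode` and `BlankNodeLabels` are hypothetical names, and a blank node is modeled as an object whose identity, not its label, distinguishes it.

```python
import itertools
from typing import Dict


class BlankNode:
    """A blank node: identity, not the label, tells nodes apart."""

    _counter = itertools.count()

    def __init__(self) -> None:
        self.id = next(BlankNode._counter)

    def __repr__(self) -> str:
        return f"BlankNode({self.id})"


class BlankNodeLabels:
    """Per-document map from blank node label to blank node."""

    def __init__(self) -> None:
        self._labels: Dict[str, BlankNode] = {}

    def lookup(self, label: str) -> BlankNode:
        node = self._labels.get(label)
        if node is None:  # first occurrence of this label: allocate a fresh node
            node = BlankNode()
            self._labels[label] = node
        return node


doc1 = BlankNodeLabels()
a = doc1.lookup("b0")
assert doc1.lookup("b0") is a       # same label, same document: same node
assert doc1.lookup("b1") is not a   # different label: different node
doc2 = BlankNodeLabels()
assert doc2.lookup("b0") is not a   # same label, another document: fresh node
```

Creating a new mapping for each document is what keeps `_:b0` in two different documents from denoting the same blank node.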

A Canonical form of N-Quads

This section defines a canonical form of N-Quads which has a completely specified layout. The grammar for the language is unchanged.

Canonical N-Quads extends Canonical N-Triples in [[RDF12-N-TRIPLES]] to include graphLabel.

While the N-Quads syntax allows choices for the representation and layout of RDF data, the canonical form of N-Quads provides a unique syntactic representation of any quad. Each code point can be represented by only one of UCHAR, ECHAR, or unencoded character, where the relevant production allows for a choice in representation. Each quad is represented entirely on a single line with specified white space.

Canonical N-Quads has the following additional constraints on layout:

- White space MUST NOT be used except after subject, predicate, object, and graphLabel, where it MUST be a single space.
- A Canonical N-Quads document MUST NOT include a VERSION directive.
- Literals with the datatype http://www.w3.org/2001/XMLSchema#string MUST NOT use the datatype IRI part of the literal, and are represented using only STRING_LITERAL_QUOTE.
- HEX MUST use only digits ([0-9]) and uppercase letters ([A-F]).
- Alphabetic characters in LANG_DIR MUST use only lowercase letters ([a-z]), with any uppercase letters case-mapped to lowercase.
- Within STRING_LITERAL_QUOTE:
  - Characters BS, HT, LF, FF, CR, ", and \ MUST be encoded using ECHAR.
  - Characters in the range from U+0000 to U+0007, VT, characters in the range from U+000E to U+001F, DEL, and characters not matching the Char production from [[XML11]] MUST be represented by UCHAR using a lowercase \u with 4 HEXes.
  - All characters not required to be represented by ECHAR or UCHAR MUST be represented by their native [[UNICODE]] representation.
- The token EOL MUST be a single LF.
- The final EOL MUST be provided.
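The constraints above can be followed mechanically when writing literals and quads. The Python sketch below is one reading of them. The function names are hypothetical, the term strings passed to `canonical_quad` are assumed to be canonical already, and datatype IRIs are assumed to need no escaping. Code points outside the XML 1.1 Char production that can occur in a valid document are U+0000, U+FFFE, and U+FFFF, because surrogates are excluded.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Characters that canonical N-Quads writes with ECHAR.
_ECHAR = {"\b": "\\b", "\t": "\\t", "\n": "\\n", "\f": "\\f", "\r": "\\r", '"': '\\"', "\\": "\\\\"}


def _needs_uchar(cp: int) -> bool:
    # U+0000-U+0007, VT, U+000E-U+001F, DEL, and non-XML-1.1-Char code points
    # (U+FFFE and U+FFFF; surrogates never occur in a valid document).
    return cp <= 0x07 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F or cp in (0xFFFE, 0xFFFF)


def canonical_string(lexical: str) -> str:
    out = []
    for ch in lexical:
        cp = ord(ch)
        if ch in _ECHAR:
            out.append(_ECHAR[ch])
        elif _needs_uchar(cp):
            out.append(f"\\u{cp:04X}")  # lowercase \u, uppercase HEX
        else:
            out.append(ch)  # native representation
    return '"' + "".join(out) + '"'


def canonical_literal(lexical: str, datatype: str = XSD_STRING, lang_dir: Optional[str] = None) -> str:
    text = canonical_string(lexical)
    if lang_dir is not None:
        return f"{text}@{lang_dir.lower()}"  # LANG_DIR in lowercase
    if datatype == XSD_STRING:
        return text  # xsd:string literals omit the datatype IRI
    return f"{text}^^<{datatype}>"


def canonical_quad(s: str, p: str, o: str, g: Optional[str] = None) -> str:
    # One space after each term, then '.', then a single LF as EOL.
    parts = [s, p, o] + ([g] if g is not None else [])
    return " ".join(parts) + " .\n"
```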

N-Quads Grammar

An N-Quads document is an RDF string encoded in UTF-8 [[!RFC3629]]. Only Unicode scalar values, in the ranges U+0000 to U+D7FF and U+E000 to U+10FFFF, are allowed. This excludes surrogate code points (U+D800 to U+DFFF).

White Space

White space (spaces and/or tabs) is allowed outside of terminals. Rule names in capitals below indicate where white space is significant.

White space is significant in the production STRING_LITERAL_QUOTE.

A blank line, consisting of only white space and/or a comment, may appear wherever a statement production is allowed, and is treated as white space.

As with N-Triples [[RDF12-N-TRIPLES]], N-Quads allows only horizontal white space (spaces or tabs).

Comments

Comments in N-Quads start at # outside an IRIREF or STRING_LITERAL_QUOTE, and continue to the end of the line (marked by the character CR or LF) or to the end of the file if there is no end of line after the comment marker. Comments are treated as white space.
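Because `#` may appear inside IRIs and literals, a comment cannot be found with a plain string search. The Python sketch below tracks whether the scanner is inside an IRIREF or a STRING_LITERAL_QUOTE. The name `strip_comment` is hypothetical, and the function assumes the input has already been split at EOL. It also treats `<<(` as a delimiter so that the opening of a triple term is not mistaken for an IRI.

```python
def strip_comment(line: str) -> str:
    """Return `line` without a trailing comment.

    A '#' starts a comment only outside IRIREF (<...>) and
    STRING_LITERAL_QUOTE ("...") tokens.
    """
    in_iri = False
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif in_iri:
            if ch == ">":
                in_iri = False
        elif line.startswith("<<(", i):
            i += 3  # triple term opener, not an IRI
            continue
        elif ch == '"':
            in_string = True
        elif ch == "<":
            in_iri = True
        elif ch == "#":
            return line[:i]
        i += 1
    return line
```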

Grammar

The EBNF used here is defined in XML 1.0 [[EBNF-NOTATION]].

Escape sequence rules are the same as in N-Triples [[RDF12-N-TRIPLES]] and Turtle [[RDF12-TURTLE]]. However, as only the STRING_LITERAL_QUOTE production is allowed, new lines in literals MUST be escaped.

The 'VERSION' terminal is in single quotes to indicate that it is case-sensitive.

A text version of this grammar is available here.

Selected Terminal Literal Strings

This document uses some specific terminal literal strings [[EBNF-NOTATION]]. To clarify the Unicode code points used for these terminal literal strings, the following table describes specific characters and sequences used throughout this document.

Code	Glyph	Description
`U+0008`	`BS`	Backspace
`U+0009`	`HT`	Horizontal tab
`U+000A`	`LF`	Line feed
`U+000B`	`VT`	Vertical tab
`U+000C`	`FF`	Form feed
`U+000D`	`CR`	Carriage return
`U+0022`	`"`	Quotation mark
`U+0023`	`#`	Number sign
`U+002D`	`-`	Hyphen
`U+002E`	`.`	Full stop
`U+0030`	`0`	Digit zero
`U+0039`	`9`	Digit nine
`U+003A`	`:`	Colon
`U+003C`	`<`	Less-than sign
`U+003E`	`>`	Greater-than sign
`U+0040`	`@`	At sign
`U+0041`	`A`	Latin capital letter A
`U+0046`	`F`	Latin capital letter F
`U+005C`	`\`	Backslash
`U+005F`	`_`	Underscore
`U+0061`	`a`	Latin small letter A
`U+007A`	`z`	Latin small letter Z
`U+007F`	`DEL`	Delete
`U+00B7`	`·`	Middle dot
`U+203F`	`‿`	Undertie
`U+2040`	`⁀`	Character tie

Other short terminal literal strings are composed of specific sequences of Unicode characters:

space: U+0020
<<(: two concatenated less-than sign characters, each having the code point U+003C, followed by a left parenthesis character, having the code point U+0028
)>>: a right parenthesis character, having the code point U+0029, followed by two concatenated greater-than sign characters, each having the code point U+003E
^^: two concatenated circumflex accent characters, each having the code point U+005E
_:: _ followed by :
--: two concatenated - characters

Parsing

Parsing N-Quads requires a state of two items:

Map[string -> blank node] bnodeLabels — A mapping from string to blank node.
`xsd:string` |curVersion| – The RDF version used for parsing the document into Quads. If specified as part of a Media Type, the default value for |curVersion| is taken from the `version` parameter. Acceptable values for `version` are defined in 2.1 Version Labels in [[RDF12-CONCEPTS]]. The version announcement is only a hint; this specification does not mandate any parser behavior based on |curVersion|, but a parser MAY signal an error or a warning when it encounters a feature that does not match the value of |curVersion|.

RDF Term Constructors

This table maps productions and lexical tokens to RDF terms or components of RDF terms listed in :

production	type	procedure
versionSpecifier	literal	The |curVersion| is taken from a literal using the matched RDF string lexical form and the `xsd:string` datatype.
BLANK_NODE_LABEL	blank node	The string after `_:` is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated.
IRIREF	IRI	The characters between `<` and `>` are taken, with escape sequences unescaped, to form the IRI. The resulting IRI MUST comply with the syntactic restrictions of generic IRI syntax, and SHOULD conform to section 3.3 of [[RFC3986]] and comply with any narrower restrictions imposed by the corresponding IRI scheme specification.
LANG_DIR	language tag	The characters following the `@` form the language tag and optionally the initial text direction, if the matched characters include `--`. The language tag MUST be well-formed according to section 2.2.9 of [[!BCP47]]. If present, the initial text direction MUST be either `ltr` or `rtl`.
STRING_LITERAL_QUOTE	RDF lexical form	The characters between the outermost quotation marks (`"`) are taken, with escape sequences unescaped, to form the string of the lexical form.
literal	literal	The literal has a lexical form of the first rule argument, `STRING_LITERAL_QUOTE`, and either a language tag with optional initial text direction from `LANG_DIR` or a datatype IRI of `iri`, depending on which rule matched the input. If the `LANG_DIR` rule matched, the language tag and initial text direction are taken from LANG_DIR. If there is no initial text direction, the datatype is `rdf:langString`. If there is an initial text direction, the datatype is `rdf:dirLangString`. If neither `LANG_DIR` nor datatype IRI match, the literal has a datatype of `xsd:string`.
tripleTerm	triple term	The triple term is composed of the terms constructed from the `subject`, `predicate`, and `object` productions.

Because processors that detect errors in the input may produce datasets containing fewer triples than the input describes (possibly none at all), consumers should take any signaled errors into account when using the resulting dataset, which may be incomplete and/or include ill-typed or ill-formed terms.

RDF Dataset Construction

An N-Quads document defines an RDF dataset composed of RDF graphs, each of which is a set of RDF triples. The statement production produces a triple defined by the terms constructed for subject, predicate, and object. This RDF triple is added to the graph named by the term constructed for graphLabel; if no graphLabel is present, the triple is added to the RDF dataset's default graph.
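A minimal Python sketch of dataset construction groups the parsed statements by graph name. It assumes that terms have already been constructed as hashable values and uses `None` as the key for the default graph. Both of these choices belong to this sketch, and `build_dataset` is a hypothetical name. Because each graph is a set, repeated statements collapse into a single triple.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Optional, Set, Tuple

Term = Hashable  # placeholder for constructed RDF terms
Triple = Tuple[Term, Term, Term]
Quad = Tuple[Term, Term, Term, Optional[Term]]


def build_dataset(quads: Iterable[Quad]) -> DefaultDict[Optional[Term], Set[Triple]]:
    """Group parsed statements into graphs; key None is the default graph."""
    dataset: DefaultDict[Optional[Term], Set[Triple]] = defaultdict(set)
    for s, p, o, g in quads:
        dataset[g].add((s, p, o))  # a set: repeated statements collapse
    return dataset
```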

Security Considerations

The STRING_LITERAL_QUOTE production allows the use of unescaped control characters. Although this specification does not directly expose this content to an end user, it might be presented through a user agent, which may cause the presented text to be obfuscated due to presentation of such characters.

N-Quads is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference IRIs, invoking the security considerations of the scheme for that IRI. Note in particular, the privacy issues in [[RFC3023]] section 10 for HTTP IRIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended IRIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.

The N-Quads language is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (for example, PGP encryption, checksum validation, password-protected compression) may also be used on N-Quads documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.

N-Quads can express data which is presented to the user, such as RDF Schema labels. Applications rendering strings retrieved from untrusted N-Quads documents, or using unescaped characters, SHOULD use warnings and other appropriate means to limit the possibility that malignant strings might be used to mislead the reader. The security considerations in the media type registration for XML ([[RFC3023]] section 10) provide additional guidance around the expression of arbitrary data and markup.

N-Quads uses IRIs as term identifiers. Applications interpreting data expressed in N-Quads SHOULD address the security issues of [[[RFC3987]]] [[RFC3987]] Section 8, as well as [[[RFC3986]]] [[RFC3986]] Section 7.

Multiple IRIs may have the same appearance. Characters in different scripts may look similar (for instance, a Cyrillic "о" may appear similar to a Latin "o"). A character followed by combining characters may have the same visual representation as another character (for example, LATIN SMALL LETTER "E" followed by COMBINING ACUTE ACCENT has the same visual representation as LATIN SMALL LETTER "E" WITH ACUTE). Any person or application that is writing or interpreting data in N-Quads must take care to use the IRI that matches the intended semantics, and avoid IRIs that may look similar. Further information about matching visually similar characters can be found in [[[UNICODE-SECURITY]]] [[UNICODE-SECURITY]] and [[[RFC3987]]] [[RFC3987]] Section 8.
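Some of these look-alikes can be detected mechanically. The Python sketch below uses `unicodedata.normalize` to catch IRIs that differ only in Unicode normalization, such as a precomposed é compared with e followed by a combining acute accent. It also applies a rough heuristic that reports which scripts the letters of an IRI come from. Both helper names are hypothetical. NFC does not catch cross-script confusables such as a Cyrillic "о", and the script heuristic is not a substitute for the confusable data described in [[UNICODE-SECURITY]].

```python
import unicodedata
from typing import Set


def differ_only_by_normalization(a: str, b: str) -> bool:
    """True if two distinct IRI strings become equal under NFC normalization."""
    return a != b and unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def letter_scripts(text: str) -> Set[str]:
    """Rough heuristic: the first word of each letter's Unicode name (e.g. LATIN, CYRILLIC)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


composed = "http://example.org/caf\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "http://example.org/cafe\u0301"   # e + COMBINING ACUTE ACCENT
spoof = "http://example.org/d\u043eg"          # CYRILLIC SMALL LETTER O
print(differ_only_by_normalization(composed, decomposed))  # True
print(sorted(letter_scripts(spoof)))  # ['CYRILLIC', 'LATIN']
```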

Internet Media Type and File Extension

The Internet Media Type (formerly known as MIME Type) for N-Quads is "application/n-quads".

The information that follows has been submitted to the Internet Engineering Steering Group (IESG) for review, approval, and registration with IANA.

Type name:

application

Subtype name:

n-quads

Required parameters:

None

Optional parameters:

version: This parameter is optional. If present, acceptable values of version are defined in 2.1 Version Labels in [[RDF12-CONCEPTS]].
profile: This parameter is optional and is used to include additional information. It does not change the semantics of the resource representation when processed without knowledge of the profile. The value of a profile parameter is a non-empty list of space-separated URIs. For more information and background, please refer to [[RFC6906]].

Encoding considerations:

The syntax of N-Quads is expressed over code points in Unicode [[!UNICODE]]. The encoding is always UTF-8 [[!UTF-8]].

Unicode code points may also be expressed using an \uXXXX (U+0000 to U+FFFF) or \UXXXXXXXX syntax (for code points up to U+10FFFF) where `X` is a hexadecimal digit `[0-9A-F]`.

Security considerations:

See .

Interoperability considerations:

There are no known interoperability issues.

Published specification:

This specification.

Applications which use this media type:

N-Quads is used widely for representing RDF data. There are implementations available in most common programming languages.

Additional information:

Magic number(s): None.
File extension(s): .nq

Person & email address to contact for further information:

RDF & SPARQL Working Group <public-rdf-star-wg@w3.org>

Intended usage:

Common

Restrictions on usage:

None

Author(s):

The N-Quads specification is the product of the RDF & SPARQL WG. The W3C reserves change control over this specification.

The `profile` parameter may be used by clients to express their preferences in the content negotiation process and by a server to indicate additional information about the response.

If the `profile` parameter is given by a client, a server should return a document that honors all the profiles in the list that are recognized by the server. A server should not respond with an error solely based upon the profile value.

If the `profile` parameter is given by a server, a client may choose to ignore it.

It is recommended that profile URIs be dereferenceable and provide useful documentation at that URI.

When used as a media type parameter [[RFC4288]] in an HTTP Content-Type header or an HTTP Accept header [[RFC7231]], the value of the `profile` parameter will need to be enclosed in quotes (ASCII `"`) if it contains special characters such as white space, including any space used to separate multiple profile URIs.

It is important to note that the value of the `profile` parameter contains one or more URIs and not IRIs. It might therefore be necessary to convert between IRIs and URIs, as specified in section 3 Relationship between IRIs and URIs of [[RFC3987]].
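The quoting and IRI-to-URI requirements can be combined when a client builds a header value. The Python sketch below is illustrative only. `iri_to_uri` applies a simplified form of the RFC 3987 section 3.1 mapping: it percent-encodes the UTF-8 bytes of non-ASCII characters and ignores options such as IDN host conversion. `profile_parameter` always quotes the list and assumes the URIs contain no `"` or backslash. Both names are hypothetical.

```python
from typing import List
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Simplified RFC 3987 section 3.1 mapping: percent-encode UTF-8 bytes of non-ASCII characters."""
    return "".join(ch if ord(ch) < 0x80 else quote(ch, safe="") for ch in iri)


def profile_parameter(media_type: str, profiles: List[str]) -> str:
    """Append a quoted, space-separated `profile` list to a media type."""
    if not profiles:
        raise ValueError("profile must be a non-empty list of URIs")
    uris = " ".join(iri_to_uri(p) for p in profiles)
    # Quoting is required once the list contains a space; this sketch always quotes.
    return f'{media_type}; profile="{uris}"'


print(profile_parameter("application/n-quads", ["http://example.org/p1", "http://example.org/café"]))
```

The resulting string could be used as the value of an Accept header by a client, or of a Content-Type header by a server.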

Changes between RDF 1.1 and RDF 1.2

This specification extends the original N-Quads syntax, as defined in [[[N-QUADS]]] [[N-QUADS]], to support the new features introduced by [[[RDF12-CONCEPTS]]] [[RDF12-CONCEPTS]]. This extension is fully backward compatible: any document complying with the old version complies with the new version, and parses to the same dataset. Furthermore, any document complying with the new version and containing only RDF 1.1 features is also compliant with the older version (with the exception of the `VERSION` directive; see ). Finally, none of the new syntactic constructs are valid in the old syntax. This means that any N-Quads document using RDF 1.2 features does not comply with the previous version of this specification and cannot be interpreted as a different dataset under it.

More specifically, the following changes have been made:

N-Quads is now described as an extension of N-Triples [[RDF12-N-TRIPLES]].
Add to define the canonical form of N-Quads.
Better align the use of white space and comments with [[RDF12-TURTLE]].
Removed language about white space use between terminals that would otherwise be (mis-)recognized, as this cannot happen in N-Triples.
Clarify the use of blank lines, including those composed of white space and/or comments. Comments can appear at the end of a triple before the newline as was already evident from .
Create separate subsections of for White space and Comments, better mirroring [[RDF12-TURTLE]].
Updated the PN_CHARS_U grammar production to be consistent with Turtle. Formerly, PN_CHARS_U included "`:`" in N-Triples and N-Quads, but not in Turtle nor TriG. PN_CHARS_U is a component of BLANK_NODE_LABEL.
Adds support for triple terms as described in with updates to .
Separated from and updated language.
Changes the `LANGTAG` terminal production to LANG_DIR to include an optional initial text direction.
Added and parser state to announce the RDF version associated with the input document.
Clarify that Unicode surrogates are not legal in N-Quads documents, nor as the value of a Unicode escape sequence.
Added the optional profile media type parameter to allow clients and servers to convey additional profile information for N-Quads representations without changing their semantics.

Introduction