N-Quads is a line-based, plain text format for encoding an RDF dataset.
RDF 1.2 N-Quads introduces triple terms as a fourth kind of RDF term which can be used as the subject or object of another triple, making it possible to make statements about other statements. RDF 1.2 N-Quads also adds support for directional language-tagged strings.
This document is part of the RDF 1.2 document suite. The N-Quads format is a line-based RDF syntax, which is an extension of N-Triples [[RDF12-N-TRIPLES]]. The main distinction is that N-Quads allows the encoding of multiple graphs in a single document representing an RDF Dataset.
This document defines N-Quads, a concrete syntax for RDF [[RDF12-CONCEPTS]], and an extension of N-Triples [[RDF12-N-TRIPLES]]. N-Quads is an easy-to-parse, line-based, concrete syntax for RDF Datasets [[RDF12-CONCEPTS]].
As with N-Triples, an N-Quads document contains no parsing directives.
N-Quads statements are a sequence of RDF terms representing the subject, predicate, and object of an RDF Triple and an optional graph name identifying a named graph associated with the triple within an RDF dataset, also known as a quad. These may be separated by white space (spaces and/or tabs). This sequence is terminated by a `.` (optionally followed by white space and/or a comment), and a new line (optional at the end of a document).
The RDF dataset represented by an N-Quads document contains exactly each quad matching the N-Quads statement production.
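The shape of a statement can be illustrated with a small serializer. This is a minimal sketch, not part of the specification: the function name `format_quad` is hypothetical, the terms are assumed to be already serialized (IRIs in angle brackets, blank nodes as `_:label`, quoted literals), and the example.org IRIs are illustrative only.

```python
from typing import Optional


def format_quad(subject: str, predicate: str, obj: str,
                graph: Optional[str] = None) -> str:
    """Join already-serialized terms into a single N-Quads statement line.

    When graph is None the graph name is omitted, so the triple belongs
    to the default graph.
    """
    terms = [subject, predicate, obj]
    if graph is not None:
        terms.append(graph)
    # Single spaces between terms, then " ." and a line feed.
    return " ".join(terms) + " .\n"
```

For example, `format_quad("_:b0", "<http://example.org/p>", "<http://example.org/o>", "<http://example.org/g>")` yields one line with four terms, while omitting the last argument yields a three-term statement for the default graph.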
An N-Quads document allows writing down an RDF dataset in a textual form. The document is made up of simple statements, each consisting of a subject, predicate, object, and an optional graph name, together with optional blank lines.
Comments may be given after a `#` that is not part of another lexical token and continue to the end of the line.
A simple statement extends the definition of simple triple in [[RDF12-N-TRIPLES]] with an optional named graph.
The simplest statement is a sequence of (subject, predicate, object) terms forming an RDF triple and an optional graph name (a blank node identifier or IRI) identifying which named graph in a dataset the triple belongs to.
White space (spaces and/or tabs) may surround terms, except where significant as noted in the grammar.
Comments are treated as white space, and may be given after a `#` that is not part of another lexical token and continue to the end of the line.
The graph name can be omitted, in which case the triples are considered part of the default graph of the RDF dataset.
A triple term may be the subject or object of an RDF triple. A triple term is represented as a tripleTerm with subject, predicate, and object, preceded by `<<(` and followed by `)>>`. Note that triple terms may be nested.
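Because triple terms may be nested, a serializer naturally handles them recursively. The sketch below is illustrative only: it assumes a hypothetical representation in which an already-serialized term is a string and a triple term is a Python `(subject, predicate, object)` tuple, and the name `serialize_term` is not from the specification.

```python
from typing import Union

# A term is either an already-serialized string (IRI, blank node, literal)
# or a (subject, predicate, object) tuple standing for a triple term.
Term = Union[str, tuple]


def serialize_term(term: Term) -> str:
    if isinstance(term, tuple):
        s, p, o = term
        # Recurse into each component so nested triple terms work too.
        inner = " ".join(serialize_term(t) for t in (s, p, o))
        return "<<( " + inner + " )>>"
    return term
```

A triple term nested inside another one is written with its own `<<(` and `)>>` delimiters inside the outer pair.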
As in N-Triples, IRIs may be written only as resolved IRIs. IRIs are preceded by `<` and followed by `>`, and may contain numeric escape sequences (described below). For example, `<http://example.org/#green-goblin>`.
As in N-Triples, literals are used to identify values such as strings, numbers, and dates.
Literals (grammar production literal) have a lexical form followed by either a language tag (possibly including a base direction), a datatype IRI, or neither.
The representation of the lexical form consists of an initial delimiter `"`, a sequence of permitted characters, numeric escape sequences, or string escape sequences, and a final delimiter `"`. Literals may not contain the characters `"`, LF, or CR except in their escaped forms.
In addition, `\` may not appear in any quoted literal except as part of an escape sequence, and a `"` character can only be included in a quoted literal using an escape sequence.
The corresponding lexical form is the characters between the delimiters, after processing any escape sequences.
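Processing escape sequences can be sketched as a single left-to-right substitution. This is a minimal sketch, assuming the N-Triples escape rules (string escapes `\t`, `\b`, `\n`, `\r`, `\f`, `\"`, `\'`, `\\`, and numeric escapes `\uXXXX` and `\UXXXXXXXX`); the function name `unescape_lexical` is hypothetical. An explicit pattern is used rather than Python's `unicode_escape` codec, which accepts other escape forms and does not handle non-ASCII text in the same way.

```python
import re

# String escapes (ECHAR): backslash followed by one of t, b, n, r, f, ", ', \.
_ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
          '"': '"', "'": "'", "\\": "\\"}

# One pattern matches either a numeric escape (\uXXXX or \UXXXXXXXX)
# or a string escape; re.sub scans left to right without overlaps.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"\'\\]))')


def unescape_lexical(body: str) -> str:
    """Return the lexical form for the text between a literal's quotes."""
    def replace(m):
        hex_digits = m.group(1) or m.group(2)
        if hex_digits is not None:
            # chr() raises ValueError for values above U+10FFFF.
            return chr(int(hex_digits, 16))
        return _ECHAR[m.group(3)]
    return _ESCAPE.sub(replace, body)
```

Scanning left to right matters: in the input `\\u0041`, the escaped backslash is consumed first, so the remaining `u0041` stays as plain text.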
If present, the LANG_DIR terminal matches the language tag and, optionally, the base direction. The language tag is preceded by an `@`, and, if present, the base direction is separated from the language tag by `--`.
If there is no language tag, there may be a datatype IRI, preceded by `^^`.
If there is no datatype IRI and no language tag, then it is a simple literal and the datatype is http://www.w3.org/2001/XMLSchema#string.
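The three possible endings of a literal can be told apart by their first characters. The sketch below is illustrative: it assumes the caller has already split off the quoted lexical form and passes only the remaining suffix (empty, `@tag`, `@tag--dir`, or `^^<iri>`); the names `interpret_suffix` and `LiteralInfo` are hypothetical, and escape sequences inside the datatype IRI are not processed here.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
RDF_DIR_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString"


class LiteralInfo(NamedTuple):
    language: Optional[str]
    direction: Optional[str]
    datatype: str


def interpret_suffix(suffix: str) -> LiteralInfo:
    """Interpret the text after a literal's closing quote."""
    if suffix.startswith("@"):
        # LANG_DIR: language tag, optionally "--" and a base direction.
        # Subtags use single hyphens, so the first "--" is the separator.
        lang, sep, direction = suffix[1:].partition("--")
        if sep:
            return LiteralInfo(lang, direction, RDF_DIR_LANG_STRING)
        return LiteralInfo(lang, None, RDF_LANG_STRING)
    if suffix.startswith("^^<") and suffix.endswith(">"):
        return LiteralInfo(None, None, suffix[3:-1])
    if suffix == "":
        # Simple literal: neither a language tag nor a datatype IRI.
        return LiteralInfo(None, None, XSD_STRING)
    raise ValueError(f"unexpected literal suffix: {suffix!r}")
```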
As in N-Triples, RDF blank nodes are expressed as `_:` followed by a blank node label, which is a series of name characters. The characters in the label are built upon PN_CHARS_BASE, liberalized as follows:
- `_` and the digit characters `0`–`9`, inclusive, may appear anywhere in a blank node label.
- `.` may appear anywhere except the first or last character.
- `-`, `·`, `‿`, `⁀`, and the combining diacritical marks (U+0300 to U+036F) are permitted anywhere except the first character.

A fresh RDF blank node is allocated for each unique blank node identifier in a document. Repeated use of the same blank node identifier identifies the same blank node.
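The allocation rule can be pictured as a per-document dictionary from label to node, in the spirit of the bnodeLabels state used during parsing. This is a minimal sketch under stated assumptions: the classes `BlankNode` and `BlankNodeLabels` are hypothetical, and a blank node is modeled as an opaque object whose identity, not its label, is what matters.

```python
import itertools
from typing import Dict


class BlankNode:
    """An opaque blank node; identity is object identity, not its label."""

    _counter = itertools.count()

    def __init__(self) -> None:
        # The number only makes repr() readable; it carries no RDF meaning.
        self.number = next(BlankNode._counter)

    def __repr__(self) -> str:
        return f"BlankNode({self.number})"


class BlankNodeLabels:
    """Per-document state mapping blank node labels to blank nodes."""

    def __init__(self) -> None:
        self._labels: Dict[str, BlankNode] = {}

    def lookup(self, label: str) -> BlankNode:
        # Reuse the node for a label seen before; otherwise allocate one.
        if label not in self._labels:
            self._labels[label] = BlankNode()
        return self._labels[label]
```

Creating a new `BlankNodeLabels` for each document means that the same label used in two different documents yields two distinct blank nodes.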
This section defines a canonical form of N-Quads which has a completely specified layout. The grammar for the language is unchanged.
Canonical N-Quads extends Canonical N-Triples in [[RDF12-N-TRIPLES]] to include graphLabel.
While the N-Quads syntax allows choices for the representation and layout of RDF data, the canonical form of N-Quads provides a unique syntactic representation of any quad. Each code point can be represented by only one of UCHAR, ECHAR, or an unencoded character, where the relevant production allows for a choice in representation.
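The canonical choice between ECHAR, UCHAR, and an unencoded character for the contents of STRING_LITERAL_QUOTE can be sketched as a per-character decision, following the constraints on layout listed in this section. This is an illustrative sketch, not a reference implementation: the names `canonical_string_literal` and `_needs_uchar` are hypothetical, and only the quoted string is produced (language tags and datatype IRIs are out of scope). Characters outside XML 1.1 Char that are not otherwise listed (surrogates, U+FFFE, U+FFFF) all lie in the Basic Multilingual Plane, so four HEX digits suffice.

```python
# ECHAR forms required in canonical output.
_CANONICAL_ECHAR = {
    "\b": "\\b", "\t": "\\t", "\n": "\\n", "\f": "\\f",
    "\r": "\\r", '"': '\\"', "\\": "\\\\",
}


def _needs_uchar(cp: int) -> bool:
    # U+0000 to U+0007, VT, U+000E to U+001F, and DEL.
    if cp <= 0x07 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F:
        return True
    # Not matching XML 1.1 Char: surrogates, U+FFFE and U+FFFF.
    return 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF)


def canonical_string_literal(lexical_form: str) -> str:
    out = []
    for ch in lexical_form:
        cp = ord(ch)
        if ch in _CANONICAL_ECHAR:
            out.append(_CANONICAL_ECHAR[ch])
        elif _needs_uchar(cp):
            # Lowercase \u followed by exactly four uppercase HEX digits.
            out.append(f"\\u{cp:04X}")
        else:
            # Everything else is written as the character itself.
            out.append(ch)
    return '"' + "".join(out) + '"'
```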
Each quad is represented entirely on a single line with specified white space.
Canonical N-Quads has the following additional constraints on layout:
- White space following subject, predicate, object, and graphLabel MUST be a single space (U+0020).
- Literals with the datatype http://www.w3.org/2001/XMLSchema#string MUST NOT use the datatype IRI part of the literal, and are represented using only STRING_LITERAL_QUOTE.
- HEX MUST use only digits ([`0`–`9`]) and uppercase letters ([`A`–`F`]).
- LANG_DIR MUST use only lowercase letters ([`a`–`z`]), with any uppercase letters case-mapped to lowercase.
- Within STRING_LITERAL_QUOTE, the characters BS, HT, LF, FF, CR, `"`, and `\` MUST be encoded using ECHAR.
- The characters U+0000 to U+0007, VT, characters in the range from U+000E to U+001F, DEL, and characters not matching the Char production from [[XML11]] MUST be represented by UCHAR using a lowercase `\u` with 4 HEXes.
- Characters not required to be represented by ECHAR or UCHAR MUST be represented by their native [[UNICODE]] representation.
- EOL MUST be a single LF.
- The final EOL MUST be provided.

This specification defines conformance criteria for:
A conforming N-Quads document is an RDF string that conforms to the grammar and additional constraints defined in this specification, starting with the nquadsDoc production.
An N-Quads document serializes an RDF dataset.
N-Quads documents do not provide a way of serializing empty graphs that may be part of an RDF dataset.
A conforming Canonical N-Quads document is an N-Quads document that follows the additional constraints of Canonical N-Quads.
A conforming N-Quads parser is a system capable of reading N-Quads documents on behalf of an application. It makes the serialized RDF dataset, as defined in this specification, available to the application, usually through some form of API.
The IRI that identifies the N-Quads language is: http://www.w3.org/ns/formats/N-Quads
The media type of N-Quads is application/n-quads.
The content encoding of N-Quads is always UTF-8.
See N-Quads Media Type for the media type registration form.
The original specification, N-Quads: Extending N-Triples with Context, proposed the use of media type text/x-nquads with an encoding using 7-bit US-ASCII.
An N-Quads document is an RDF string encoded in UTF-8 [[!RFC3629]].
White space (spaces and/or tabs) is allowed outside of terminals. Rule names in capitals below indicate where white space is significant. White space is significant in the production STRING_LITERAL_QUOTE.
A blank line, consisting of only white space and/or a comment, may appear wherever a statement production is allowed, and is treated as white space.
As with N-Triples [[RDF12-N-TRIPLES]], N-Quads allows only horizontal white space (spaces or tabs).
Comments in N-Quads start at `#` outside an IRIREF or STRING_LITERAL_QUOTE, and continue to the end of line (marked by the character CR or LF) or to the end of file, if there is no end of line after the comment marker. Comments are treated as white space.
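Because `#` may legitimately occur inside an IRI or a quoted literal, finding a comment requires tracking whether the scanner is inside one of those tokens. The sketch below is a simplified illustration that assumes a single, otherwise well-formed line; the function name `strip_comment` is hypothetical. It treats `<<(` as a triple-term opener rather than the start of an IRI, and skips the character after each backslash inside a literal so that an escaped `"` does not end the string.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing comment: a '#' outside any IRIREF or STRING_LITERAL_QUOTE."""
    i = 0
    in_iri = in_string = False
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if ch == '"':
                in_string = False
        elif in_iri:
            if ch == ">":
                in_iri = False
        elif line.startswith("<<(", i):
            i += 3  # triple-term opener, not the start of an IRI
            continue
        elif ch == "<":
            in_iri = True
        elif ch == '"':
            in_string = True
        elif ch == "#":
            return line[:i]
        i += 1
    return line
```

Since comments are treated as white space, a caller would typically strip the result and ignore lines that become empty.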
The EBNF used here is defined in XML 1.0 [[EBNF-NOTATION]].
Escape sequence rules are the same as N-Triples [[RDF12-N-TRIPLES]] and Turtle [[RDF12-TURTLE]].
However, as only the STRING_LITERAL_QUOTE production is allowed, new lines in literals MUST be escaped.
A text version of this grammar is available here.
This document uses some specific terminal literal strings [[EBNF-NOTATION]]. To clarify the Unicode code points used for these terminal literal strings, the following table describes specific characters and sequences used throughout this document.
Code | Glyph | Description |
---|---|---|
U+0008 | BS | Backspace |
U+0009 | HT | Horizontal tab |
U+000A | LF | Line feed |
U+000B | VT | Vertical tab |
U+000C | FF | Form feed |
U+000D | CR | Carriage return |
U+0022 | " | Quotation mark |
U+0023 | # | Number sign |
U+002D | - | Hyphen-minus |
U+002E | . | Full stop |
U+0030 | 0 | Digit zero |
U+0039 | 9 | Digit nine |
U+003A | : | Colon |
U+003C | < | Less-than sign |
U+003E | > | Greater-than sign |
U+0040 | @ | At sign |
U+0041 | A | Latin capital letter A |
U+0046 | F | Latin capital letter F |
U+005C | \ | Backslash |
U+005F | _ | Underscore |
U+0061 | a | Latin small letter A |
U+007A | z | Latin small letter Z |
U+007F | DEL | Delete |
U+00B7 | · | Middle dot |
U+203F | ‿ | Undertie |
U+2040 | ⁀ | Character tie |
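Other short terminal literal strings are composed of specific sequences of Unicode characters:
- space: the character with code point U+0020
- `<<(`: two concatenated less-than sign characters, each having the code point U+003C, followed by a left parenthesis character, having the code point U+0028
- `)>>`: a right parenthesis character, having the code point U+0029, followed by two concatenated greater-than sign characters, each having the code point U+003E
- `^^`: two concatenated circumflex accent characters, each having the code point U+005E
- `_:`: an underscore `_` followed by a colon `:`
- `--`: two concatenated hyphen-minus `-` characters

Parsing N-Quads requires a state of one item:
- bnodeLabels: a mapping from string to blank node.

This table maps productions and lexical tokens to RDF terms or components of RDF terms: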
production | type | procedure |
---|---|---|
BLANK_NODE_LABEL | blank node | The string after `_:` is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated. |
IRIREF | IRI | The characters between `<` and `>` are taken, with escape sequences unescaped, to form the IRI. |
LANG_DIR | language tag | The characters following the `@` form the language tag and, if the matched characters include `--`, the base direction. |
STRING_LITERAL_QUOTE | RDF lexical form | The characters between the outermost quotation marks (`"`) are taken, with escape sequences unescaped, to form the string of the lexical form. |
literal | literal | The literal has a lexical form of the first rule argument, STRING_LITERAL_QUOTE, and either a language tag with optional base direction from LANG_DIR or a datatype IRI of iri, depending on which rule matched the input. If the LANG_DIR rule matched, the language tag and base direction are taken from LANG_DIR. If there is no base direction, the datatype is rdf:langString. If there is a base direction, the datatype is rdf:dirLangString. If neither LANG_DIR nor a datatype IRI matches, the literal has a datatype of xsd:string. |
tripleTerm | triple term | The triple term is composed of the terms constructed from the subject, predicate, and object productions. |
An N-Quads document defines an RDF dataset composed of RDF graphs, each composed of a set of RDF triples.
The statement production produces a triple defined by the terms constructed for subject, predicate, and object. This RDF triple is added to the graph labeled by the graphLabel production; if no graphLabel is present, the triple is added to the RDF dataset's default graph.
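Assembling the dataset from parsed statements amounts to grouping triples by graph label. This is a minimal sketch assuming a hypothetical representation in which terms are strings, each parsed statement is a `(subject, predicate, object, graph)` tuple, and `None` stands for an absent graphLabel, that is, the default graph; the name `build_dataset` is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]


def build_dataset(quads: Iterable[Quad]) -> Dict[Optional[str], Set[Triple]]:
    """Group parsed statements into graphs; the key None is the default graph."""
    dataset: Dict[Optional[str], Set[Triple]] = defaultdict(set)
    for s, p, o, g in quads:
        # Sets make repeated statements collapse into a single triple.
        dataset[g].add((s, p, o))
    return dict(dataset)
```

A graph only appears in the result once some statement names it, which mirrors the fact that N-Quads documents cannot serialize empty graphs.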
The N-Quads format is used to express arbitrary application data, which may include the expression of personally identifiable information (PII) or other information which could be considered sensitive. Authors publishing such information are advised to carefully consider the needs and use of publishing such information, as well as the applicable regulations for the regions where the data is expected to be consumed and potentially revealed (e.g., GDPR, CCPA, others), particularly whether authorization measures are needed for access to the data.
The STRING_LITERAL_QUOTE production allows the use of unescaped control characters. Although this specification does not directly expose this content to an end user, it might be presented through a user agent, which may cause the presented text to be obfuscated due to presentation of such characters.
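One possible mitigation is to flag such characters before presenting a string to a user. The sketch below is an illustrative heuristic, not a requirement of this specification: the function name `risky_characters` is hypothetical, and it flags every character in the Unicode general categories Cc (controls, including tab and line feed) and Cf (format characters such as bidirectional overrides), leaving it to the application to decide which ones to allow.

```python
import unicodedata
from typing import List, Tuple


def risky_characters(text: str) -> List[Tuple[int, str]]:
    """List (index, name) for control and format characters in text."""
    found = []
    for index, ch in enumerate(text):
        # Cc: control characters; Cf: format characters (e.g. bidi overrides).
        if unicodedata.category(ch) in ("Cc", "Cf"):
            # Control characters have no Unicode name, so fall back to U+XXXX.
            found.append((index, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found
```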
N-Quads is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference IRIs, invoking the security considerations of the scheme for that IRI. Note, in particular, the privacy issues in [[RFC3023]] section 10 for HTTP IRIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended IRIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.
The N-Quads language is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (for example, PGP encryption, checksum validation, password-protected compression) may also be used on N-Quads documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.
N-Quads can express data which is presented to the user, such as RDF Schema labels. Applications rendering strings retrieved from untrusted N-Quads documents, or using unescaped characters, SHOULD use warnings and other appropriate means to limit the possibility that malignant strings might be used to mislead the reader. The security considerations in the media type registration for XML ([[RFC3023]] section 10) provide additional guidance around the expression of arbitrary data and markup.
N-Quads uses IRIs as term identifiers. Applications interpreting data expressed in N-Quads SHOULD address the security issues of [[[RFC3987]]] [[RFC3987]] Section 8, as well as [[[RFC3986]]] [[RFC3986]] Section 7.
Multiple IRIs may have the same appearance. Characters in different scripts may look similar (for instance, a Cyrillic "о" may appear similar to a Latin "o"). A character followed by combining characters may have the same visual representation as another character (for example, LATIN SMALL LETTER "E" followed by COMBINING ACUTE ACCENT has the same visual representation as LATIN SMALL LETTER "E" WITH ACUTE). Any person or application that is writing or interpreting data in N-Quads must take care to use the IRI that matches the intended semantics, and avoid IRIs that may look similar. Further information about matching visually similar characters can be found in [[[UNICODE-SECURITY]]] [[UNICODE-SECURITY]] and [[[RFC3987]]] [[RFC3987]] Section 8.
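Two simple checks can help flag look-alike IRIs. RDF compares IRIs code point by code point, so the two spellings below remain different IRIs; the helpers only warn and do not establish equivalence. This is a sketch with hypothetical function names: `same_after_nfc` compares strings after Unicode NFC normalization, and `letter_scripts` is a rough heuristic that takes the first word of each letter's Unicode character name, not a substitute for the confusable detection described in [[UNICODE-SECURITY]].

```python
import unicodedata
from typing import Set


def same_after_nfc(a: str, b: str) -> bool:
    """True if two strings become identical under Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def letter_scripts(text: str) -> Set[str]:
    """Rough script guess: first word of each letter's Unicode character name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in text if ch.isalpha()}
```

For instance, an IRI that mixes Latin letters with a Cyrillic "о" reports both `LATIN` and `CYRILLIC`, which an application might surface as a warning.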
The Internet Media Type (formerly known as MIME Type) for N-Quads is "application/n-quads".
It is recommended that N-Quads files have the extension ".nq" (all lowercase) on all platforms.
It is recommended that N-Quads files stored on Macintosh HFS file systems be given a file type of "TEXT".
The information that follows will be submitted to the IESG for review, approval, and registration with IANA.
`\uXXXX` (U+0000 to U+FFFF) or `\UXXXXXXXX` syntax (for code points up to U+10FFFF), where `X` is a hexadecimal digit `[0-9A-F]`.
The editor of the RDF 1.1 edition acknowledges valuable contributions from Gregg Kellogg, Andy Seaborne, Eric Prud'hommeaux, Dave Beckett, David Robillard, Gregory Williams, Antoine Zimmermann, Sandro Hawke, Richard Cyganiak, Pat Hayes, Henry S. Thompson, Bob Ferris, Henry Story, Andreas Harth, Lee Feigenbaum, Peter Ansell, Evan Patton and David Booth.
This specification is a product of extensive deliberations by the members of the RDF Working Group chaired by Guus Schreiber and David Wood. It draws upon the earlier specification in N-Quads: Extending N-Triples with Context, edited by Richard Cyganiak, Andreas Harth, and Aidan Hogan.
The editors of the RDF 1.2 edition acknowledge valuable contributions from Andy Seaborne.
In addition to the editors, the following people have contributed to this specification:
Updated the PN_CHARS_U grammar production to be consistent with Turtle. Formerly, PN_CHARS_U included "`:`" in N-Triples and N-Quads, but not in Turtle or TriG. PN_CHARS_U is a component of BLANK_NODE_LABEL.
Updated LANG_DIR to include an optional base direction.