yml2vocab

Generate RDFS vocabulary files from YAML

The script in this module converts a simple RDF vocabulary, described in YAML, into a formal RDFS vocabulary serialized in JSON-LD, Turtle, and HTML+RDFa. Optionally, a simple JSON-LD @context is also generated for the vocabulary. Neither the script nor the YAML format is designed for complex vocabularies; the primary goal is to simplify the generation of simple, straightforward RDFS vocabularies that do not require, for instance, sophisticated OWL statements.

When running, the script relies on two files:

  1. The vocabulary.yml file, containing the definition for the vocabulary entries. (It is also possible to use a different name for the YAML file, see below.)
  2. The template.html file, used to create the HTML file version of the vocabulary. (It is also possible to use a different name for the template file, see below.)

Definition of the vocabulary in the YAML file

The vocabulary is defined in a YAML file, which contains several block sequences with the following keys: vocab, prefix, ontology, class, property, individual, and datatype. Only the vocab and ontology blocks are required; all others are optional.

Each block sequence consists of blocks with the following keys: id, property, value, label, upper_value, domain, range, deprecated, comment, status, defined_by, context, and see_also. The interpretation of these key/value pairs may depend on the top level block where they reside, but some have a common interpretation.

For classes, properties, individuals, and datatypes, the id key, and either the comment or the defined_by keys, are required. All the others are optional.
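The required-key rules above can be sketched as a small check. This is only an illustration, not the script's own validation code: the TermEntry interface, the checkTermEntry and checkTopLevel helpers, and the value types are assumptions; only the key names come from the text above.

```typescript
// Hypothetical shape of one entry in a class, property, individual, or datatype block.
// Only keys named in the text are used; the value types are assumptions.
interface TermEntry {
    id?: string;
    label?: string;
    comment?: string;
    defined_by?: string;
}

// Returns a list of problems; an empty list means the required keys are present.
function checkTermEntry(entry: TermEntry): string[] {
    const problems: string[] = [];
    if (!entry.id) {
        problems.push("missing id");
    }
    // Either comment or defined_by is enough.
    if (!entry.comment && !entry.defined_by) {
        problems.push("missing comment or defined_by");
    }
    return problems;
}

// Hypothetical top level check: vocab and ontology are the only required blocks.
function checkTopLevel(doc: Record<string, unknown>): string[] {
    return ["vocab", "ontology"]
        .filter((key) => !(key in doc))
        .map((key) => `missing ${key} block`);
}

console.log(checkTermEntry({ id: "Person", comment: "A human being." }));
console.log(checkTermEntry({ label: "Orphan" }));
console.log(checkTopLevel({ vocab: [], class: [] }));
```

The first call reports no problems, the second reports both missing keys, and the third reports the missing ontology block.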

There are some examples in the example directory on GitHub that illustrate all of these terms.

External terms

The value of the id key is usually a simple reference identifying the class, property, etc., as part of the vocabulary. It is also possible to use a CURIE instead of a simple reference. Such terms are considered to be external terms: terms that are formally defined in another vocabulary and are listed only to increase the readability of the vocabulary specification. (Typical cases are schema.org or Dublin Core terms, which are frequently used in combination with other vocabularies.)

External terms, while they appear in the HTML document generated by the tool, do not bear formal RDF statements in Turtle, JSON-LD, or RDFa; they appear as information-only items in the generated document.

The prefix part of the curie must be defined through the prefix top level block.
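The distinction between local and external terms can be sketched as follows. This is a minimal illustration, not the script's actual logic: the classifyId function, the TermKind type, and the prefix values are assumptions made for the example.

```typescript
// Prefixes as they might be declared in the prefix top level block (illustrative values).
const prefixes = new Map<string, string>([
    ["schema", "https://schema.org/"],
    ["dc", "http://purl.org/dc/terms/"],
]);

// A term is either local to the vocabulary or external (identified by a CURIE).
type TermKind =
    | { kind: "local"; reference: string }
    | { kind: "external"; prefix: string; reference: string; url: string };

// Hypothetical helper: an id without a colon is a simple reference,
// an id with a colon is treated as a CURIE whose prefix must be declared.
function classifyId(id: string, declared: Map<string, string>): TermKind {
    const colon = id.indexOf(":");
    if (colon === -1) {
        return { kind: "local", reference: id };
    }
    const prefix = id.slice(0, colon);
    const reference = id.slice(colon + 1);
    const base = declared.get(prefix);
    if (base === undefined) {
        throw new Error(`Prefix "${prefix}" is not declared in the prefix block`);
    }
    return { kind: "external", prefix, reference, url: base + reference };
}

console.log(classifyId("Person", prefixes));
console.log(classifyId("schema:name", prefixes));
```

An id such as Person is classified as local, while schema:name is classified as external and expanded using the declared prefix; an undeclared prefix raises an error.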

Installation and use

The script is written in TypeScript (version 5.0.2 and beyond) running on top of Node.js (version 21 and beyond). It can also run with Deno (version 2.1 and beyond).

Beyond the YAML file itself, the script relies on an HTML template file, i.e., a skeleton file in HTML that is completed by the vocabulary entries. The example template file on GitHub provides a good starting point for a template that also makes use of respec. The script relies on the id values and section structure already present in the template, which it modifies and extends. Unused subsections (e.g., when there are no deprecated classes) are removed from the final HTML file.

Installation from npm

The script can be used as a standard npm module via:

npm install yml2vocab

Running on a command line

NPM/Node.js

The npm installation installs the node_modules/.bin/yml2vocab script. The script can be used as:

yml2vocab [-v vocab_file_name] [-t template_file_name] [-c]

Deno

If deno is installed globally, one can also run the script directly (without any further installation) by

deno run --allow-read --allow-write --allow-env /a/b/c/main_.ts [-v vocab_file_name] [-t template_file_name] [-c]

in the top level. To make it simpler, a binary, compiled version of the program can be generated by

deno compile --allow-read --allow-write --allow-env main.ts

which results in an executable file, called yml2vocab, that can be stored anywhere in the user’s $PATH.

Command line arguments

The script generates the vocab_file_name.ttl, vocab_file_name.jsonld, and vocab_file_name.html files for the Turtle, JSON-LD, and HTML+RDFa versions, respectively. The script relies on the vocab_file_name.yml file for the vocabulary specification in YAML and a template_file_name file for a template file. The defaults are vocabulary and template.html, respectively.

If the -c flag is set, the vocab_file_name.context.jsonld file is also generated, containing a JSON-LD file that can be used as a separate @context reference in a JSON-LD file. Note that this JSON-LD file does not necessarily use all the sophistication that JSON-LD defines for @context; such features may have to be added manually.
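The file naming described above can be summarized in a short sketch. The outputFiles function is hypothetical and is not part of the package; it only mirrors the naming rules and the default vocabulary name stated in the text.

```typescript
// Hypothetical helper listing the files the script is described as generating.
// vocabFileName defaults to "vocabulary"; withContext corresponds to the -c flag.
function outputFiles(vocabFileName: string = "vocabulary", withContext: boolean = false): string[] {
    const files = [
        `${vocabFileName}.ttl`,     // Turtle
        `${vocabFileName}.jsonld`,  // JSON-LD
        `${vocabFileName}.html`,    // HTML+RDFa
    ];
    if (withContext) {
        files.push(`${vocabFileName}.context.jsonld`); // separate @context file
    }
    return files;
}

console.log(outputFiles());
console.log(outputFiles("my_vocab", true));
```

With no arguments the sketch lists vocabulary.ttl, vocabulary.jsonld, and vocabulary.html; with a name and the context option enabled, the .context.jsonld file is added to the list.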

Running from a JavaScript/TypeScript program

The simplest way of using the module from JavaScript is:

const yml2vocab = require('yml2vocab');
async function main() {
    await yml2vocab.generateVocabularyFiles("vocabulary","template.html",false);
}
main();

This reads (asynchronously) the YAML and template files and stores the generated vocabulary representations (see the command line interface for details) in the directory alongside the YAML file. Setting the last argument to true also generates a @context file.

The somewhat lower level yml2vocab.VocabGeneration class can also be used:

const yml2vocab = require('yml2vocab');
const vocabGeneration = new yml2vocab.VocabGeneration(yml_content); // yml_content: the YAML content in text form, before parsing
const turtle  = vocabGeneration.getTurtle();                    // returns the Turtle content as a string
const jsonld  = vocabGeneration.getJSONLD();                    // returns the JSON-LD content as a string
const html    = vocabGeneration.getHTML(template_file_content); // returns the HTML+RDFa content as a string
const context = vocabGeneration.getContext();                   // returns the minimal @context file for the vocabulary

If TypeScript is used instead of JavaScript, the same works, except that the require must be replaced by:

import yml2vocab from 'yml2vocab';

There is no need to install any extra typing; it is included in the package. The interfaces simply use strings; no extra TypeScript type definitions have been defined.

Cloning the repository

The repository may also be cloned.

Content of the directory

The following files and directories are generated/modified by either the script or npm; better not to touch these directly:

Acknowledgement

The original idea, structure, and script (in Ruby) were created by Gregg Kellogg for v1 of the Credentials Vocabulary, with a vocabulary definition using CSV. The CSV definitions have since been changed to YAML, and the script itself has been rewritten in TypeScript and developed further.

Many features are the result of further discussions with Manu Sporny and Benjamin Young.