Structure for this document has been debated by the Dataset Exchange Working Group, principally at the following locations:

Abstract

A profile is a named set of constraints on one or more identified base specifications, necessary to accomplish a particular function.

This document aims to provide guidance on how to create, describe and publish profiles.

Overview of DXWG documents on profiles

This document is part of a set of documents on profiles, edited by the W3C Dataset Exchange Working Group (DXWG). Some of the documents are general while some are technology-specific:

Introduction

This section is non-normative.

A profile can be understood as an outline of some "thing" when seen from a specific point of view. Profiling is also the task of distilling the essential aspects or character of something, such as a person, from a specific angle. In the craft domain, a profile is taken by a tool that matches itself in detail to the contours of a 3-dimensional object and returns an accurate 2-dimensional representation from which other formable materials can be constrained and fashioned to that profile, or matched with it to determine how accurately it portrays the original 3-dimensional object from which the profile was taken.

In the same sense, information entities can be viewed from different perspectives. To prepare them for specific uses, they are frequently tested for their goodness of fit to some pattern. Alternatively, the pattern can be provided before the information is gathered, as a constraint that ensures the adequacy and appropriateness of that information asset to the job in hand.

In information technology, profiles may also support the data needs of specific applications. Profiling is often the work of a community interested in interoperability and data exchange. We define a profile generally as a named set of constraints on one or more identified base specifications.

Good data practice generally begins with vocabulary and ontology designers who are encouraged to make their work as broadly applicable as possible so as to maximize future adoption. As a result, vocabularies and ontologies typically define a data model using minimal semantics. For example, DCAT [[vocab-dcat-2]] defines the concept of a dataset as an abstract entity with distributions and data services as means of accessing data; there is no mention of whether a distribution should be in a particular serialization, or set of serializations, nor of how data services should be configured. While it states that the value of dcat:theme should be a SKOS [[skos-reference]] concept, it does not specify a particular SKOS concept scheme, and so on. Other vocabularies, such as Dublin Core Terms [[DCTERMS]], are similarly for general use. This means that data models and methods of working can be applied in different circumstances than those in which the original definition work was carried out and, in that sense, these promote broad interoperability.

In addition to addressing the needs of a specific community, a profile may also apply to a single system. Any individual system will be designed to meet a specific set of needs; that is, it will operate in a specific context. It is that context, and the individual choices made by the engineers working within it, that will determine how a vocabulary or set of vocabularies will be used. For example, a system ingesting data may require that a specific subset of properties from a range of vocabularies is used and that only terms from a defined code list are used as values for specified properties. In other words, where the 'base vocabulary' might say "the value of this property SHOULD be a value from a managed code list", a specialized profile will say "the value of this property MUST be from this specific code list".
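The tightening from "SHOULD be a value from a managed code list" to "MUST be from this specific code list" can be illustrated with a minimal sketch. The code list URIs and the function name below are hypothetical placeholders chosen for illustration; they do not come from any DXWG document.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical code list fixed by a specialized profile (placeholder URIs).
ALLOWED_THEMES: FrozenSet[str] = frozenset({
    "https://example.org/themes/agriculture",
    "https://example.org/themes/energy",
})


def check_code_list(values: Iterable[str], allowed: FrozenSet[str]) -> List[str]:
    """Return the values that are not members of the profile's code list."""
    return [value for value in values if value not in allowed]


violations = check_code_list(
    ["https://example.org/themes/energy", "https://example.org/themes/sport"],
    ALLOWED_THEMES,
)
print(violations)
```

A system applying the base vocabulary alone would accept both values; a system applying the specialized profile would reject the second one.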

This document is about how to formulate and communicate profiles and the ways in which profiles can be identified and related to each other.

Definitions

We recognise that the term 'profile' occurs in several domains, and that there are a range of definitions available from different communities. We have taken this variability into account but for the purpose of this document we are using the following definitions:

profile

A named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.

This definition includes what are often called "application profiles", "metadata application profiles", or "metadata profiles".

Source: deliberations of the DXWG. See ProfileContext wiki page.

profiling

The act of creating a profile - an activity that has been undertaken by many communities with a range of formalisms.

The motivation for profiles

Communities create and use data standards to ensure interoperability for information exchange. Although members of a community may use the same basic standard schema, it is very common for different subsets within the larger community to need some further specification of the data they create to meet their own needs. To continue to support interoperability of their data with others, these community members need to express the specifics of their implementation of the data schema. Profiles serve this purpose. Profiles enumerate vocabulary terms, cardinality, and validation rules, and can also include descriptions of the rules used by creators to make decisions regarding their data elements.

Examples of profiles and related work

Profiles can take a number of forms and can have a variety of relationships to existing vocabularies, standards, and other profiles. We recognise this variety, but for the purposes of this document we are focusing on the most general forms of profile and profiling. Although it is not possible to list all of the types of profiles, some illustrations of frequently-used profiles include:

This section lists work that has been influential in the development of this document.

See comment 408916364

Profiles defined

This section builds on the formal definition for profile and the requirements DXWG has identified, and lays down a conceptual model for profiles, their components and their functions.

A Profile is based on an existing Specification, which can have the status of a Standard and which can be itself a Profile of another Specification. A profile can have various Components/Manifestations in various expression languages for profiles. These Components/Manifestations play specific Functions (roles) for the implementation of the services that motivate the creation of the Profile. These Components are serialized and published as Distributions in various formats, which are the concrete input for performing these services.
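As a rough illustration of the conceptual model described above, the relationships can be sketched as plain data classes. All class names, field names and example values are assumptions made for this sketch; the final names in the model are still to be settled by the Working Group.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """A concrete serialization of a component (hypothetical fields)."""
    media_type: str
    url: str


@dataclass
class Component:
    """A manifestation of a profile that plays a function (role)."""
    function: str  # illustrative values: "validation", "guidance"
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Specification:
    """Any base specification; it may have the status of a standard."""
    name: str
    is_standard: bool = False


@dataclass
class Profile(Specification):
    """A profile is itself a specification, so it can be profiled in turn."""
    based_on: List[Specification] = field(default_factory=list)
    components: List[Component] = field(default_factory=list)


# Placeholder instances; the URL is not a real resource.
base = Specification(name="BaseVocabulary", is_standard=True)
example_profile = Profile(
    name="ExampleProfile",
    based_on=[base],
    components=[
        Component(
            function="validation",
            distributions=[
                Distribution("text/turtle", "https://example.org/profile/shapes.ttl")
            ],
        )
    ],
)
```

Making `Profile` a subclass of `Specification` is one way to reflect the statement that a specification "can be itself a Profile of another Specification".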

Profiles are named collections of properties

Profiles SHOULD be made up of a collection of properties or terms, and that collection SHOULD have a name. Properties are selected from existing vocabularies. Because profiles as described here are not limited to RDF, collections could be of metadata terms from any type of metadata schema.

It could also be in roles/functions if there is nothing there about defining terms. Antoine Isaac, 2018-11-11

Multiple base specifications [RPFMBSPEC]

A profile can have multiple base specifications.

github:issue/268

Profile of profiles [RPFPP]

One can create a profile of profiles, with elements potentially inherited on several levels.

Profile inheritance [RPFINHER]

Profiles may add to or specialise clauses from one or more base specifications. Such profiles inherit all the constraints from base specifications.
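A minimal sketch of this inheritance, assuming a hypothetical in-memory hierarchy with no cycles: the effective constraints of a profile are its own constraints plus those of every specification it is based on, at any depth. The profile names and constraint strings are placeholders.

```python
from typing import Dict, List, Set

# Hypothetical hierarchy: each entry lists the specifications it is based on.
BASED_ON: Dict[str, List[str]] = {
    "base-spec": [],
    "community-profile": ["base-spec"],
    "system-profile": ["community-profile"],
}

# Constraints that each specification declares itself (placeholder text).
OWN_CONSTRAINTS: Dict[str, Set[str]] = {
    "base-spec": {"title is required"},
    "community-profile": {"theme comes from a fixed code list"},
    "system-profile": {"at most one licence"},
}


def effective_constraints(name: str) -> Set[str]:
    """Collect a profile's own constraints plus everything it inherits."""
    collected = set(OWN_CONSTRAINTS.get(name, set()))
    for parent in BASED_ON.get(name, []):
        collected |= effective_constraints(parent)
    return collected


print(sorted(effective_constraints("system-profile")))
```

Because inherited constraints are only ever added, a profile in this sketch can specialise its bases but cannot relax them.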

It was approved in minutes but there was no GitHub issue for it. (This requirement is wrongly referred to as GitHub #238 and I've removed that reference). Antoine Isaac, 2018-11-11

Data publication according to different profiles-1 [RPFDAPUB-1]

Some data may conform to one or more profiles at once.

This requirement overlaps with the following requirement and there is a lot of uncertainty on where this requirement should be categorized. Antoine Isaac, 2018-11-25

Data publication according to different profiles-2 [RPFDAPUB-2]

Data publishers may publish data according to different profiles, either simultaneously (e.g. in the same data "distribution") or in parallel (e.g. via content negotiation).

(KC:) This might be better placed in the publication section
There is a lot of uncertainty on where this requirement should be categorized. There is a big 'profile negotiation' flavour to it, but it's not only about it. We are keeping it here for now as we are not sure where it should be. It was listed as a general requirement, but it could be categorized instead as a requirement for profile negotiation, or DCAT distribution, or both. In addition, it is highly overlapping with the previous requirement (and we've edited the heading to reflect it). Antoine Isaac, 2018-11-25
(Antoine:) Obviously at this stage some names in this model have to be finalized. We will also provide a diagram that shows our 'big picture' on profiles and their possible (or recommended) components. Something at the level of the diagram at https://docs.google.com/drawings/d/1dHkpwKwUwMgS1RqSCTPO3uOoRiY_qNk0z5bhXJlYi4Y/, which should work as an abstracted version of the prof-o model https://w3c.github.io/dxwg/profilesont/ (i.e. one without classes and properties that come from a specific namespace/vocabulary).

(Lars:) Probably we should not just use the DCAT view (distributions) but we should rather use more generic Web architecture terms as resources and representations.

(Riccardo:) We should talk about 'distribution of a manifestation' not 'distribution' alone.

The W3C DXWG has identified general requirements about profiles, which we list here and are going to expand upon (in more detailed requirements and recommendations) in the following section:

(Antoine:) Once the structure is stabilized this section should list the other sections it 'introduces'. Maybe these sections will be moved as sub-sections of this one.

Best Practices

BP13 Use locale-neutral data representations

Because profiles are intended to convey information both within a community and at times between communities, wide use would be facilitated by following BP13: the profile should use locale-neutral data representations where possible. Some data communities have deep and historical practices that use terminology that is specific to the community. The creation of a profile is an opportunity to transform that practice to widely known standards.

BP15 Reuse vocabularies, preferably standardized ones

BP15 is one of the cornerstones of the profile practice, which is to reuse vocabularies, in particular standardized ones. One would need to define "standardized" in this context, but perhaps a better solution is to define the qualities of preferred vocabularies: they have a stable URI; they are supported by an organization or community; (more??).

The Functions of a Profile

Requirements covering aspects of how profiles are used, i.e. what functionality they may express or support, for example validation or documentation of data.

Human-readable definitions [RPFHRDEF]

Profiles can have human-readable definitions of terms and input instructions.

Global rules for descriptive content [RPGRDC]

There needs to be a property in the profile where the rules for the descriptive content can be provided. This would apply to the entire profile.

Data validation [RPFVALID]

A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation.

This requirement, which is a bit redundant with github:issue/279, has been shifted towards the function of data validation instead of focusing on the representations that enable it.

External specifications for individual properties [RPFESIP]

Profiles should be able to indicate which external specifications are expected to be applied/have been applied to values of individual properties.

Validity rules [RPFVALIDR]

Profiles may provide rules governing value validity.

Value lists for data elements [RPFVLIST]

Profiles may provide lists of values to pick from in order to populate data elements.

Cardinality rules [RPFCARDR]

Profiles may provide rules on cardinality of terms (including "recommended").
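As an illustration only, cardinality rules can be represented as minimum and maximum counts per property and checked against a record. The property names and rule table below are hypothetical, and the sketch does not cover "recommended" cardinality, which would need a severity level alongside each rule.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: property name -> (minimum, maximum); None means unbounded.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "title": (1, 1),
    "keyword": (0, None),
    "licence": (1, None),
}


def check_cardinality(record: Dict[str, List[str]]) -> List[str]:
    """Return one message per property whose value count breaks its rule."""
    problems = []
    for prop, (minimum, maximum) in CARDINALITY.items():
        count = len(record.get(prop, []))
        if count < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {count}")
        elif maximum is not None and count > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {count}")
    return problems


print(check_cardinality({"title": ["A", "B"], "keyword": []}))
```

In practice such rules would more likely be expressed in a validation language such as SHACL or ShEx; the sketch only shows the kind of check those languages perform.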

Dependency rules [RPFDEPR]

Profiles may express dependencies between elements of the vocabulary (if A then not B, etc.).

github:issue/278
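A dependency rule of the form "if A then not B" can be sketched as a pair of property names that must not occur together. The property names and the rule list are hypothetical and serve only to show the shape of such a check.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: if the first property is present, the second must be absent.
EXCLUSIONS: List[Tuple[str, str]] = [
    ("embargoDate", "openLicence"),
]


def check_exclusions(record: Dict[str, object]) -> List[str]:
    """Return a message for every "if A then not B" rule the record breaks."""
    return [
        f"{first} and {second} must not both be present"
        for first, second in EXCLUSIONS
        if first in record and second in record
    ]


print(check_exclusions({"embargoDate": "2030-01-01", "openLicence": "yes"}))
```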

User interface support [RPFUI]

Profiles can have what is needed to drive forms for data input or for user display.

We could reference a list of profile expression languages and the functions they are typically expected to enable, similar to what has been started in Andrea's sheet: https://docs.google.com/spreadsheets/d/1zty4jtzhg0_1xojldomq1xeheliwvp2-stw6_-_zxr4/

Best Practices

BP16 Choose the right formalization level

The recommendation in BP16 is to choose the right formalization level. This is a useful recommendation for all data and metadata, and would naturally apply to profiles. Profiles should be suited to the tasks they are designed to support, no less and no more. In particular we should caution against overly strict use of constraints, which make it harder for others either to make direct use of the profile or to create a profile of the profile where their needs vary only a small amount.

Profile publication

Profiles are published in the form of Components/Manifestations that are serialized in Distributions.

The W3C DXWG has identified the following requirement for such publication:

Profile documentation [RPFDOCU]

A profile should have human-readable documentation that expresses for humans the main components of a profile, which can also be available as machine-readable resources (ontology or schema files, SHACL files, etc). This includes listing of elements in the profile, instructions and recommendations on how to use them, constraints that determine what data is valid according to the profile, etc.

Schema implementations [RPFSCHEMAS]

Profiles may be written in or may link to a document or schema in a validation language (ShEx, SHACL, XML Schema).

This requirement, which is a bit redundant with github:issue/273, has been shifted towards the notion of distributions of schemas, instead of focusing on the general validation function.

(Antoine:) From https://github.com/w3c/dxwg/issues/242#issuecomment-408916364:
The document should indicate that a motivation to have description of profiles is to help clients find out what profiles are available that could match their needs and ask for it. This motivation applies for the Profiles Vocabulary and the metadata presented in the sections below.

Best Practices

BP9 Use persistent URIs as identifiers of datasets

Use persistent URIs as identifiers of profiles. This seems to be a non-controversial requirement because it applies to any web resource. There is a related best practice (BP10) which is stated as "Use persistent URIs as identifiers within datasets." This can be well-suited to the aspect of a profile that consists of the reuse of vocabulary terms that have been previously defined, but also to the definition of new terms within the scope of the profile itself. Each element of a profile's vocabulary should have a URI (IRI?) that identifies the term and information about the term (such as labels, definitions, etc.).

BP14 Provide data in multiple formats

The BPs also recommend that data be provided in multiple formats, and this is a good recommendation for profiles as well. If we take the point of view that profiles, like DCAT, have an abstract essence that can be made manifest in more than one way, we already have a good basis for satisfaction of BP14. (This will bring up the question that has already come up in the context of DCAT and conneg - are all of the forms equivalent? We may not be able to resolve that question.)

BP12 Use machine-readable standardized data formats

Profiles should be published in standard data formats (BP12). It should also be stated that profiles should be published in and make use of technologies that are appropriate to the community which is expected to use them. A profile using RDF and OWL will not well serve a community that has only an XML/XSD-based skill set. This also relates to the above recommendation relating to providing data in multiple formats. In many communities the skills and data history can vary, so providing profiles with as many as possible of the commonly used technologies will increase the utility of the profile (as well as the profiled instance data).

Profile metadata

Profile metadata [RPFMD]

Profiles must be discoverable through machine-readable metadata that describes what is offered and how to invoke the offered profiles.

Documenting ecosystems of profiles [RPFDECOS]

From the perspective of management of profiles, and guidance to users and data experts, ecosystems of profiles should be properly described (e.g. in profile catalogues/repositories), especially documenting the relationships between profiles and what they are based on, and between profiles that are based on other profiles.
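One question a profile catalogue might answer is which profiles depend, directly or indirectly, on a given specification. The sketch below assumes a hypothetical in-memory catalogue that maps each profile to the specifications it is based on; the names are placeholders and no particular catalogue format is implied.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical catalogue: profile -> specifications it is directly based on.
CATALOGUE: Dict[str, List[str]] = {
    "profile-a": ["standard-x"],
    "profile-b": ["profile-a"],
    "profile-c": ["profile-a", "standard-y"],
}


def derived_from(spec: str) -> Set[str]:
    """Return every catalogued profile based, directly or indirectly, on spec."""
    found: Set[str] = set()
    queue = deque([spec])
    while queue:
        current = queue.popleft()
        for profile, bases in CATALOGUE.items():
            if current in bases and profile not in found:
                found.add(profile)
                queue.append(profile)
    return found


print(sorted(derived_from("standard-x")))
```

A lookup of this kind lets maintainers see which profiles may be affected when a base specification changes.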

(Antoine:) From https://github.com/w3c/dxwg/issues/242#issuecomment-408916364, Administrative metadata:
(Antoine:) I've kept it here, but I do agree with the comment at https://github.com/w3c/dxwg/issues/242#issuecomment-408916364 (and above) that URI should be in a specific section on publication of profiles.
(Antoine:) From https://github.com/w3c/dxwg/issues/242#issuecomment-408916364, Descriptive metadata:
(Antoine:) From the 24 Oct discussion in Lyon reminding some approved requirements, which could fit here:

Best Practices

BP1 Provide metadata

In the DWBP this is defined as metadata about the dataset; with profiles this can be descriptive metadata within the profile or a separate metadata statement about the profile. (Since profiles themselves are generally forms of metadata they may be able to incorporate description and other administrative information about the profile within itself, if desired.) In addition there is a common set of administrative data that is recommended for many information resources, such as dates and version designators for each version, and provenance (who or what agency created the digital file).

BP2 Provide descriptive metadata

The DWBP limits its recommendation of metadata covering the topic of the dataset to general keywords, themes and categories (BP2). It may be desirable to provide more specific topical information to satisfy the DXWG requirement that profiles should be discoverable by search engines. The quality of discoverability will vary based on the depth of description of the topic and/or community area that it serves. (We may wish to recommend some particulars.)

BP30 Make feedback available

Ideally, the profile would have a management cycle for maintenance and updates. This should involve the community of users, as noted in BP29 (Gather feedback from data consumers) and BP30 (Make feedback available). The strength and value of a profile will depend on the involvement of the community of users.

BP27 Preserve identifiers

One aspect of the management cycle that is often ignored is the de-commissioning of datasets or of superseded versions. For users it is key that previously used identifiers always point to a useful document or message.

The Profiles ontology

This section is non-normative.

(Antoine:) What I have in mind for this section is to introduce the DXWG Profiles ontology. Say how it can support the requirements we identified, point to it, and suggest to use in contexts where an RDF / Linked Data description of the profiles and their associated resources can be made. If it's non-normative, it should be presented as a possible approach to implementing our recommendations, for example using a sentence like The machine-readable version of the metadata on profiles and their links to associated resources may be provided using the Profiles ontology developed by the DXWG working group (shamelessly inspired from how we've done it in the DWBP doc at https://www.w3.org/TR/dwbp/#quality). I guess it should also reprise some points made at https://github.com/w3c/dxwg/issues/323.

Examples of application

This section is non-normative.

(Antoine:) The idea here would be to point to examples that follow our recommendations, made up to illustrate specific features, or complete 'real' implementations. Ideally these would be using prof-o, on the condition that we still refer to prof-o as a suggestion not a formal recommendation (see the way it's been mentioned above). I reckon that some may find it more natural to have fully fledged

Existing profiles

DCAT Application Profiles

Asset Description Metadata Schema

From the Asset Description Metadata Schema (ADMS) specification [[vocab-adms]]: "ADMS is a profile of DCAT [[vocab-dcat]], used to describe semantic assets".

Dublin Core Application Profile & The Singapore Framework

The Singapore Framework for Dublin Core Application Profiles [[DCSF]]

The Guidelines for Dublin Core Application Profiles [[DCAP]]

Description Set Profiles: A constraint language for Dublin Core Application Profiles [[DCDSP]]

Open Geospatial Consortium Profiling

Profiling of ISO standards

ODRL profiles

BIBFRA.ME (one namespace)

(Antoine) This comes from https://github.com/w3c/dxwg/issues/242#issuecomment-408916364, I'm ok having it but if we don't have time to write it up I can live with it...

Examples of new profiles

(Nick:) Following on from existing profiles above, we could have here new profiles, made since the introduction of this document. To act as exemplars.

Security and Privacy

Appendices

Definitions

formalism

definition not finalised

Source: deliberations of the DXWG. See GitHub Issue 194.

specification

An act of identifying something precisely or of stating a precise requirement.

Source: Oxford English Dictionary.

standard

A basis for comparison; a reference point against which other things can be evaluated.

Source: [[DCTERMS]].

Comparison with definitions in related work

(Antoine:) I'm keeping Nick's idea so that the WG can make its mind on whether it is worth pursuing. I personally think it's a noble endeavour but one that would consume too much of our time now.

The following term definitions, taken from the DC AP & SF specifications ([[DCAP]], [[DCSF]] & [[DCDSP]]) are included to indicate how this document's definitions contrast with those of previous work.

profile
The term profile is widely used to refer to a document that describes how standards or specifications are deployed to support the requirements of a particular application, function, community, or context. In the metadata community, the term "application profile" has been applied to describe the tailoring of standards for specific applications.
[[DCSF]]

Requirements

The placement of these Requirements within the main body of this document is being discussed on the DXWG wiki:

Listed here are the individual requirements addressed by the Working Group in the formulation of this document.

238: An approved profile-guidance Use Case

Additional Issues

This section will be removed in a later version of this document.

Additional Issues related to this document and not yet placed within it are listed at the: