Structure for this document has been debated by the Dataset Exchange Working Group, principally at the following locations:

Abstract

A profile is a named set of constraints on one or more identified base specifications, necessary to accomplish a particular function.

This document aims to provide guidance on how to create, describe and publish profiles.

Overview of DXWG documents on profiles

This document is part of a set of documents on profiles, edited by the W3C Dataset Exchange Working Group (DXWG). Some of the documents are general while some are technology-specific:

Introduction

This section is non-normative.

A profile can be understood as an outline of some "thing" when seen from a specific point of view. Profiling is the task of distilling the essential aspects or character of something, such as a person, from a specific angle. In the craft domain, a profile is taken by a tool that matches itself in detail to the contours of a three-dimensional object and returns an accurate two-dimensional representation. Other formable materials can then be constrained and fashioned to that profile, or matched with it to determine how accurately it portrays the original three-dimensional object from which the profile was taken.

In the same sense, information entities can be viewed from different perspectives. To prepare them for specific uses, they are frequently tested for their goodness of fit to some pattern. Alternatively, the pattern can be provided before the information is gathered, as a constraint that ensures the adequacy and appropriateness of that information asset to the job in hand.

In information technology, profiles may also support the data needs of specific applications. Profiling is often the work of a community interested in interoperability and data exchange. We define a profile generally as a named set of constraints on one or more identified base specifications.

Good data practice generally begins with vocabulary and ontology designers who are encouraged to make their work as broadly applicable as possible so as to maximize future adoption. As a result, vocabularies and ontologies typically define a data model using minimal semantics. For example, DCAT [[vocab-dcat-2]] defines the concept of a dataset as an abstract entity with distributions and data services as means of accessing data; there is no mention of whether a distribution should be in a particular serialization, or set of serializations, nor of how data services should be configured. While it states that the value of dcat:theme should be a SKOS [[skos-reference]] concept, it does not specify a particular SKOS concept scheme, and so on. Other vocabularies, such as Dublin Core Terms [[DCTERMS]], are similarly for general use. This means that data models and methods of working can be applied in different circumstances than those in which the original definition work was carried out and, in that sense, these promote broad interoperability.

In addition to addressing the needs of a specific community, a profile may also apply to a single system. Any individual system will be designed to meet a specific set of needs; that is, it will operate in a specific context. It is that context, and the individual choices made by the engineers working within it, that will determine how a vocabulary or set of vocabularies will be used. For example, a system ingesting data may require that a specific subset of properties from a range of vocabularies is used and that only terms from a defined code list are used as values for specified properties. In other words, where the 'base vocabulary' might say "the value of this property SHOULD be a value from a managed code list", a specialized profile will say "the value of this property MUST be from this specific code list".
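The difference between a base vocabulary's "SHOULD" and a profile's "MUST" can be illustrated with a minimal Python sketch. The code list values and the `check_theme` function are hypothetical and only serve to show how the same value might be tolerated by one specification and rejected by another.

```python
# Hypothetical code list mandated by a profile; the values are illustrative only.
THEME_CODE_LIST = {"agriculture", "energy", "transport"}


def check_theme(value, code_list, strict):
    """Return a list of messages for one property value.

    strict=False models a base vocabulary ("SHOULD"): a miss is a warning.
    strict=True models a profile ("MUST"): a miss is an error.
    """
    if value in code_list:
        return []
    level = "error" if strict else "warning"
    return [f"{level}: {value!r} is not in the code list"]


# The same value is tolerated by the base vocabulary but rejected by the profile.
base_result = check_theme("astronomy", THEME_CODE_LIST, strict=False)
profile_result = check_theme("astronomy", THEME_CODE_LIST, strict=True)
print(base_result)
print(profile_result)
```

In this sketch the two specifications share the same rule and differ only in how a violation is reported, which is one possible reading of how a profile tightens its base.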

This document is about how to formulate and communicate profiles and the ways in which profiles can be identified and related to each other.

Definitions

We recognise that the term 'profile' occurs in several domains, and that there are a range of definitions available from different communities. We have taken this variability into account but for the purpose of this document we are using the following definition:

profile

A named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.

This definition includes what are often called "application profiles", "metadata application profiles", or "metadata profiles".

Source: deliberations of the DXWG. See ProfileContext wiki page.

The motivation for profiles

Communities create and use data standards to ensure interoperability for information exchange. Although members of a community may use the same basic standard schema, it is very common for different subsets within the larger community to need some further specification of the data they create to meet their own needs. To continue to support interoperability of their data with others, these community members need to express the specifics of their implementation of the data schema. Profiles serve this purpose. Profiles enumerate vocabulary terms, cardinality, and validation rules, and can also include descriptions of the rules used by creators to make decisions regarding their data elements.

In the DXWG use case and requirements document [[DCAT-UCR]], our group has identified typical use cases for profiles, such as:

Examples of profiles and related work

Profiles can take a number of forms and can have a variety of relationships to existing vocabularies, standards, and other profiles. We recognise this variety, but for the purposes of this document we are focusing on the most general forms of profiles and profiling. Although it is not possible to list all of the types of profiles, some illustrations of frequently-used profiles include:

This section lists work that has been influential in the development of this document.

See comment 408916364

Diving into profiles

This section builds on the formal definition for profile as well as the requirements the W3C Dataset Exchange Working Group (DXWG) has identified, and lays down a conceptual model for profiles, their components and their functions.

Its goal is to establish a general view of profiles that supports the many potentially more specific examples in practice. Usages of the term 'profile' in other contexts may not be consistent with these requirements, and are thus out of scope.

A Profile is based on an existing Specification, which can have the status of a Standard and which can itself be a Profile of another Specification. A profile can have various Components/Manifestations in various expression languages for profiles. These Components/Manifestations play specific Functions (roles) for the implementation of the services that motivate the creation of the Profile. These Components are serialized and published as Distributions in various formats, which are the concrete input for performing these services.

The idea of the previous paragraph is to have a set of links to various sections, as done in the SKOS Primer at https://www.w3.org/TR/skos-primer/#secsimple. It needs to be re-written considering the final structure of the document. Also, the terminology in this section is still under discussion, e.g. here or here.

The profile may be expressed in a single specification, or it may consist of more than one component. For example, a profile may have a vocabulary in the form of a schema (such as OWL or XML schema), as well as a text file containing documentation for persons doing data creation, and actionable validation code, such as a ShEx or SHACL document. Components may be added or removed without changing the fact that the profile meets the definition of a profile as stated in this document. A profile can exist independently of the set of resources that is used to describe it.
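As a rough illustration of a profile that consists of several components, the following Python sketch models a profile as a named object holding a list of components. The class names, the `role` and `format` labels, and the example.org locations are assumptions made for this example and are not terms defined by this document.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    # "role" and "format" are illustrative labels, not terms defined by this document.
    role: str      # e.g. "schema", "guidance", "validation"
    format: str    # e.g. "OWL", "text", "SHACL"
    location: str  # where the component can be retrieved (placeholder URLs below)


@dataclass
class Profile:
    name: str
    components: list = field(default_factory=list)


profile = Profile(name="Example Profile")
profile.components.append(Component("schema", "OWL", "https://example.org/profile/schema.owl"))
profile.components.append(Component("guidance", "text", "https://example.org/profile/guide.txt"))
profile.components.append(Component("validation", "SHACL", "https://example.org/profile/shapes.ttl"))

# Removing a component leaves the profile itself, and its name, unchanged.
profile.components.pop()
print(profile.name, [c.role for c in profile.components])
```

Keeping the name on the profile rather than on any single component reflects the statement that a profile can exist independently of the resources used to describe it.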

Profiles are uniquely named sets of data constraints, such as data elements (e.g. classes, properties, value domains) that describe (meta)data.

When building a profile you SHOULD give it a name and you SHOULD build it from properties or terms that you have collected from pre-existing sources. Because we are considering a wide range of profiles, not just those in RDF, collections could be of metadata terms from any type of metadata schema, but they might also include collections of identifiers, value lists and enumerations, other data elements, and anything else that might be helpful in ensuring that the data matches the purpose intended by you and your community.

In our context, 'constraint' is a general notion, which includes the specification of optional data elements.

It could also be in roles/functions if there is nothing there about defining terms. Antoine Isaac, 2018-11-11

Multiple base specifications [RPFMBSPEC]

A profile MAY be based on multiple base specifications. For example, a profile MAY be based on several data models and vocabularies at the same time. In profiles using XML schema or RDF technology, using multiple base specifications generally means using elements from multiple namespaces. A profile can constrain all or some elements from its base specifications.

Diagram showing an example of a profile with multiple base specifications
An example of a profile with multiple base specifications (only two of them shown here): the Europeana Data Model, presented at https://w3c.github.io/dxwg/ucr/#ID37
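For profiles built with RDF technology, the observation that multiple base specifications generally mean multiple namespaces can be sketched in Python. The term IRIs below are real vocabulary terms, but their selection is made up for illustration, and splitting an IRI at its last '#' or '/' is a simplifying assumption rather than a general rule.

```python
def namespace_of(iri):
    """Return the namespace part of an IRI: everything up to the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


# Terms a hypothetical profile selects from its base specifications.
profile_terms = [
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/creator",
    "http://www.w3.org/ns/dcat#theme",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
]

# Each distinct namespace hints at one base specification.
base_namespaces = sorted({namespace_of(term) for term in profile_terms})
print(base_namespaces)
```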

Profile of profiles [RPFPP]

One can create a profile of profiles, where conforming to a profile lower in the hierarchy means conforming to all the profiles above it. For example (see figure), conforming to the DPLA Metadata Application Profile means conforming to the Europeana Data Model, Dublin Core terms and Open Archives Initiative Object Reuse and Exchange.

Diagram showing an example of a profile of a profile
The example above, continued: the Europeana Data Model has been further refined by the Digital Public Library of America to create its own profile.
This example may be updated in the future, so as to reflect the decision on how much re-use must be observed in a specification so that it can be counted as a profile of another.

This guidance document leaves open which level profile creators should indicate as their base. Indeed, the granularity of the work of profile creators varies: some may profile an existing profile as a 'ready-made', others may dive into the details of each original specification. In any case, the mechanism that we recommend for describing profiles allows data consumers to "follow their nose" through arbitrary networks of profiles.
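The idea of following one's nose through a network of profiles can be sketched as a simple graph traversal in Python. The link table below is a hypothetical simplification of the example in the figure, and the labels are shorthand rather than real identifiers.

```python
# Hypothetical "is profile of" links, loosely modelled on the figure's example.
IS_PROFILE_OF = {
    "DPLA MAP": ["EDM"],
    "EDM": ["DCTERMS", "OAI-ORE"],
    "DCTERMS": [],
    "OAI-ORE": [],
}


def all_bases(profile, links):
    """Collect every specification reachable by following 'is profile of' links."""
    seen = set()
    stack = list(links.get(profile, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(links.get(current, []))
    return seen


print(sorted(all_bases("DPLA MAP", IS_PROFILE_OF)))
```

Whatever level a profile creator indicates as the direct base, a consumer that follows the links in this way ends up with the same full set of specifications to conform to.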

Profile inheritance [RPFINHER]

Profiles may add to or specialise clauses from one or more base specifications. Such profiles inherit all the constraints from base specifications.
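One way to picture inheritance is the following Python sketch, which combines the constraints of a base specification with those a profile adds or specialises. Representing a constraint as a cardinality pair, and treating a specialisation as something that may only narrow the inherited range, are assumptions of this example and not statements made by this document.

```python
def effective_constraints(base, own):
    """Combine inherited constraints with a profile's own additions and specialisations.

    Each constraint is a (min_count, max_count) pair; max_count None means unbounded.
    A specialisation may only narrow the inherited range, never widen it.
    """
    merged = dict(base)  # inherit everything from the base specification
    for prop, (new_min, new_max) in own.items():
        if prop in base:
            old_min, old_max = base[prop]
            widens_max = old_max is not None and (new_max is None or new_max > old_max)
            if new_min < old_min or widens_max:
                raise ValueError(f"{prop}: profile would relax an inherited constraint")
        merged[prop] = (new_min, new_max)
    return merged


base_spec = {"title": (0, None), "theme": (0, None)}
profile_own = {"title": (1, 1), "licence": (1, None)}  # specialise one, add one
result = effective_constraints(base_spec, profile_own)
print(result)
```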

It was approved in minutes but there was no GitHub issue for it. (This requirement is wrongly referred to as GitHub #238 and I've removed that reference). Antoine Isaac, 2018-11-11

Data publication according to different profiles-1 [RPFDAPUB-1]

Some data may conform to one or more profiles at once.

This requirement overlaps with the following requirement and there is a lot of uncertainty on where this requirement should be categorized. Antoine Isaac, 2018-11-25

Data publication according to different profiles-2 [RPFDAPUB-2]

Data publishers may publish data according to different profiles, either simultaneously (e.g. in the same data "distribution") or in parallel (e.g. via content negotiation).
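Publication in parallel can be pictured with a small Python sketch in which a publisher holds one representation per profile and a client states its preferences in order. The profile identifiers, file names and the `select_representation` function are hypothetical, and the sketch does not model any particular HTTP negotiation mechanism.

```python
# Hypothetical profile identifiers mapped to the representation held for each.
AVAILABLE = {
    "https://example.org/profile/a": "dataset-a.ttl",
    "https://example.org/profile/b": "dataset-b.xml",
}


def select_representation(preferred_profiles, available):
    """Return (profile, representation) for the first preferred profile on offer."""
    for profile in preferred_profiles:
        if profile in available:
            return profile, available[profile]
    return None  # nothing the client asked for is available


choice = select_representation(
    ["https://example.org/profile/c", "https://example.org/profile/b"], AVAILABLE
)
print(choice)
```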

(KC:) This might be better placed in the publication section
There is a lot of uncertainty on where this requirement should be categorized. There is a big 'profile negotiation' flavour to it, but it's not only about it. We are keeping it here for now as we are not sure where it should be. It was listed as a general requirement, but it could be categorized instead as a requirement for profile negotiation, or DCAT distribution, or both. In addition, it is highly overlapping with the previous requirement (and we've edited the heading to reflect it). Antoine Isaac, 2018-11-25
(Antoine:) Obviously at this stage some names in this model have to be finalized. We will also provide a diagram that shows our 'big picture' on profiles and their possible (or recommended) components. Something at the level of the diagram at https://docs.google.com/drawings/d/1dHkpwKwUwMgS1RqSCTPO3uOoRiY_qNk0z5bhXJlYi4Y/, which should work as an abstracted version of the prof-o model https://w3c.github.io/dxwg/profilesont/ (i.e. one without classes and properties that come from a specific namespace/vocabulary).

(Lars:) Probably we should not just use the DCAT view (distributions) but we should rather use more generic Web architecture terms as resources and representations.

(Riccardo:) We should talk about 'distribution of a manifestation' not 'distribution' alone.

The W3C DXWG has identified general requirements about profiles, which we list here and are going to expand upon (in more detailed requirements and recommendations) in the following section:

(Antoine:) Once the structure is stabilized this section should list the other sections it 'introduces'. Maybe these sections will be moved as sub-sections of this one.

Best Practices

BP13 Use locale-neutral data representations

Because profiles are intended to convey information both within a community and at times between communities, wide use would be facilitated by BP13, which is that the profile should use locale-neutral data representations where possible. Some data communities have deep and historical practices that use terminology that is specific to the community. The creation of a profile is an opportunity to transform that practice to widely known standards.

BP15 Reuse vocabularies, preferably standardized ones

BP15 is one of the cornerstones of the profile practice, which is to reuse vocabularies, in particular standardized ones. One would need to define "standardized" in this context, but perhaps a better solution is to define the qualities of preferred vocabularies: have a stable URI; are supported by an organization or community; (more??).

The Functions of a Profile

Requirements covering aspects of how profiles are being used, i.e. what functionality they may express or support, for example validation, or documentation of data.

Human-readable definitions [RPFHRDEF]

Profiles can have human-readable definitions of terms and input instructions.

Global rules for descriptive content [RPGRDC]

There needs to be a property in the profile where the rules for the descriptive content can be provided. This would apply to the entire profile.

Data validation [RPFVALID]

A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation.

This requirement, which is a bit redundant with github:issue/279, has been shifted towards the function of data validation instead of focusing on the representations that enable it.

External specifications for individual properties [RPFESIP]

Profiles should be able to indicate which external specifications are expected to be applied/have been applied to values of individual properties.

Validity rules [RPFVALIDR]

Profiles may provide rules governing value validity.

Value lists for data elements [RPFVLIST]

Profiles may provide lists of values to pick from in order to populate data elements.

Cardinality rules [RPFCARDR]

Profiles may provide rules on cardinality of terms (including "recommended").
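A minimal Python sketch of cardinality rules, including a "recommended" level, is given here. The rule table, the obligation labels and the choice to report a missing recommended property as a warning rather than an error are assumptions of this example.

```python
# Hypothetical cardinality rules: property -> (obligation, max occurrences or None).
RULES = {
    "title": ("mandatory", 1),
    "description": ("recommended", None),
    "keyword": ("optional", None),
}


def check_cardinality(record, rules):
    """Return (errors, warnings) for a record mapping property names to value lists."""
    errors, warnings = [], []
    for prop, (obligation, max_count) in rules.items():
        count = len(record.get(prop, []))
        if count == 0 and obligation == "mandatory":
            errors.append(f"{prop} is mandatory but missing")
        elif count == 0 and obligation == "recommended":
            warnings.append(f"{prop} is recommended but missing")
        if max_count is not None and count > max_count:
            errors.append(f"{prop} occurs {count} times, at most {max_count} allowed")
    return errors, warnings


errors, warnings = check_cardinality({"title": ["A", "B"]}, RULES)
print(errors)
print(warnings)
```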

Dependency rules [RPFDEPR]

User interface support [RPFUI]

Profiles can have what is needed to drive forms for data input or for user display.

We could reference a list of profile expression languages and the functions they are typically expected to enable, similar to what has been started in Andrea's sheet: https://docs.google.com/spreadsheets/d/1zty4jtzhg0_1xojldomq1xeheliwvp2-stw6_-_zxr4/

Best Practices

BP16 Choose the right formalization level

The recommendation in BP16 is to choose the right formalization level. This is a useful recommendation for all data and metadata, and would naturally apply to profiles. Profiles should be suited to the tasks they are designed to support, no less and no more. In particular we should caution against overly strict use of constraints, which makes it harder for others either to make direct use of the profile or to create a profile of the profile where their needs vary only a small amount.

Profile publication

Profiles are published in the form of Components/Manifestations that are serialized in Distributions.

The W3C DXWG has identified the following requirement for such publication:

Profile documentation [RPFDOCU]

A profile should have human-readable documentation that expresses for humans the main components of a profile, which can also be available as machine-readable resources (ontology or schema files, SHACL files, etc). This includes listing of elements in the profile, instructions and recommendations on how to use them, constraints that determine what data is valid according to the profile, etc.

Schema implementations [RPFSCHEMAS]

Profiles may be written in or may link to a document or schema in a validation language (ShEx, SHACL, XML Schema).

This requirement, which is a bit redundant with github:issue/273, has been shifted towards the notion of distributions of schemas, instead of focusing on the general validation function.

(Antoine:) From https://github.com/w3c/dxwg/issues/242#issuecomment-408916364:
The document should indicate that a motivation to have description of profiles is to help clients find out what profiles are available that could match their needs and ask for it. This motivation applies for the Profiles Vocabulary and the metadata presented in the sections below.

Best Practices

BP9 Use persistent URIs as identifiers of datasets

Use persistent URIs as identifiers of profiles. This seems to be a non-controversial requirement because it applies to any web resource. There is a related best practice (BP10) which is stated as "Use persistent URIs as identifiers within datasets." This can be well-suited to the aspect of a profile that consists of the reuse of vocabulary terms that have been previously defined, but also to the definition of new terms within the scope of the profile itself. Each element of a profile's vocabulary should have a URI (IRI?) that identifies the term and information about the term (such as labels, definitions, etc.).

BP14 Provide data in multiple formats

The BPs also recommend that data be provided in multiple formats, and this is a good recommendation for profiles as well. If we take the point of view that profiles, like DCAT, have an abstract essence that can be made manifest in more than one way, we already have a good basis for satisfaction of BP14. (This will bring up the question that has already come up in the context of DCAT and conneg - are all of the forms equivalent? We may not be able to resolve that question.)

BP12 Use machine-readable standardized data formats

Profiles should be published in standard data formats (BP12). It should also be stated that profiles should be published in and make use of technologies that are appropriate to the community which is expected to use them. A profile using RDF and OWL will not serve well a community that has only an XML/XSD-based skill set. This also relates to the above recommendation on providing data in multiple formats. In many communities the skills and data history can vary, so providing profiles in as many as possible of the commonly used technologies will increase the utility of the profile (as well as the profiled instance data).

Profile metadata

Profile metadata [RPFMD]

Profiles must be discoverable through machine-readable metadata that describes what is offered and how to invoke the offered profiles.

Documenting ecosystems of profiles [RPFDECOS]

From the perspective of management of profiles, and guidance to users and data experts, ecosystems of profiles should be properly described (e.g. in profile catalogues/repositories), especially documenting the relationships between profiles and what they are based on, and between profiles that are based on other profiles.
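What a catalogue entry documenting such relationships might look like is sketched here in Python. The keys, titles and example.org identifiers are hypothetical and do not correspond to any vocabulary defined by this document; the point is only that recording what each profile is based on makes the ecosystem queryable.

```python
import json

# Hypothetical catalogue entries; keys and identifiers are illustrative only.
catalogue = [
    {
        "id": "https://example.org/profile/base",
        "title": "Base profile",
        "is_profile_of": [],
    },
    {
        "id": "https://example.org/profile/local",
        "title": "Local profile",
        "is_profile_of": ["https://example.org/profile/base"],
    },
]


def profiles_based_on(target, entries):
    """List the identifiers of profiles that declare `target` as a base."""
    return [entry["id"] for entry in entries if target in entry["is_profile_of"]]


# A machine-readable serialization of the catalogue, and one query over it.
print(json.dumps(catalogue, indent=2))
print(profiles_based_on("https://example.org/profile/base", catalogue))
```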

(Antoine:) From https://github.com/w3c/dxwg/issues/242#issuecomment-408916364, Administrative metadata:
(Antoine:) I've kept it here, but I do agree with the comment at https://github.com/w3c/dxwg/issues/242#issuecomment-408916364 (and above) that URI should be in a specific section on publication of profiles.
(Antoine:) From https://github.com/w3c/dxwg/issues/242#issuecomment-408916364, Descriptive metadata:
(Antoine:) From the 24 Oct discussion in Lyon reminding some approved requirements, which could fit here:

Best Practices

BP1 Provide metadata

In the Data on the Web Best Practices (DWBP) this is defined as metadata about the dataset; with profiles this can be descriptive metadata within the profile or a separate metadata statement about the profile. (Since profiles themselves are generally forms of metadata they may be able to incorporate description and other administrative information about the profile within itself, if desired.) In addition there is a common set of administrative data that is recommended for many information resources, such as dates and version designators for each version, and provenance (who or what agency created the digital file).

BP2 Provide descriptive metadata

The DWBP limits its recommendation of metadata covering the topic of the dataset to general keywords and themes and categories (BP2). It may be desirable to provide more specific topical information to satisfy the DXWG requirement that profiles should be discoverable by search engines. The quality of discoverability will vary based on the depth of description of the topic and/or community area that it serves. (We may wish to recommend some particulars.)

BP30 Make feedback available

Ideally, the profile would have a management cycle for maintenance and updates. This should involve the community of users, as noted in BP29 (Gather feedback from data consumers) and BP30 (Make feedback available). The strength and value of a profile will depend on the involvement of the community of users.

BP27 Preserve identifiers

The aspect of the management cycle that is often ignored is that of the de-commissioning of datasets or of superseded versions. For users it is key that previously used identifiers always point to a useful document or message.

The Profiles ontology

This section is non-normative.

(Antoine:) What I have in mind for this section is to introduce the DXWG Profiles ontology. Say how it can support the requirements we identified, point to it, and suggest to use in contexts where an RDF / Linked Data description of the profiles and their associated resources can be made. If it's non-normative, it should be presented as a possible approach to implementing our recommendations, for example using a sentence like The machine-readable version of the metadata on profiles and their links to associated resources may be provided using the Profiles ontology developed by the DXWG working group (shamelessly inspired from how we've done it in the DWBP doc at https://www.w3.org/TR/dwbp/#quality). I guess it should also reprise some points made at https://github.com/w3c/dxwg/issues/323.

Examples of application

This section is non-normative.

(Antoine:) The idea here would be to point to examples that follow our recommendations, made up to illustrate specific features, or complete 'real' implementations. Ideally these would be using prof-o, on the condition that we still refer to prof-o as a suggestion not a formal recommendation (see the way it's been mentioned above). I reckon that some may find it more natural to have fully fledged

Existing profiles

DCAT Application Profiles

Asset Description Metadata Schema

From the Asset Description Metadata Schema (ADMS) specification [[vocab-adms]]: "ADMS is a profile of DCAT [[vocab-dcat]], used to describe semantic assets".

Dublin Core Application Profile & The Singapore Framework

The Singapore Framework for Dublin Core Application Profiles [[DCSF]]

The Guidelines for Dublin Core Application Profiles [[DCAP]]

Description Set Profiles: A constraint language for Dublin Core Application Profiles [[DCDSP]]

Open Geospatial Consortium Profiling

Profiling of ISO standards

ODRL profiles

BIBFRA.ME (one namespace)

(Antoine) This comes from https://github.com/w3c/dxwg/issues/242#issuecomment-408916364, I'm ok having it but if we don't have time to write it up I can live with it...

Examples of new profiles

(Nick:) Following on from existing profiles above, we could have here new profiles, made since the introduction of this document. To act as exemplars.

Security and Privacy

Appendices

Definitions

profiling

The act of creating a profile - an activity that has been undertaken by many communities with a range of formalisms.

formalism

definition not finalised

Source: deliberations of the DXWG. See GitHub Issue 194.

specification

An act of identifying something precisely or of stating a precise requirement.

Source: Oxford English Dictionary.

standard

A basis for comparison; a reference point against which other things can be evaluated.

Source: [[DCTERMS]].

Base specification

A specification MAY assume a role of being a base specification, which is the foundation of the profile.

Constraint

Data element

Comparison with definitions in related work

(Antoine:) I'm keeping Nick's idea so that the WG can make its mind on whether it is worth pursuing. I personally think it's a noble endeavour but one that would consume too much of our time now.

The following term definitions, taken from the DC AP & SF specifications ([[DCAP]], [[DCSF]] & [[DCDSP]]) are included to indicate how this document's definitions contrast with those of previous work.

profile
The term profile is widely used to refer to a document that describes how standards or specifications are deployed to support the requirements of a particular application, function, community, or context. In the metadata community, the term application profile has been applied to describe the tailoring of standards for specific applications.
[[DCSF]]

Requirements

The placement of these Requirements within the main body of this document is being discussed on the DXWG wiki:

Here listed are the individual requirements addressed by the Working Group in the formulation of this document.

238: An approved profile-guidance Use Case

Additional Issues

This section will be removed in a later version of this document.

Additional Issues related to this document and not yet placed within it are listed at the: