The structure of this document has been debated by the Dataset Exchange Working Group, principally at the following locations:
A profile is a named set of constraints on one or more identified base specifications, necessary to accomplish a particular function.
This document aims to provide guidance on how to create, describe and publish profiles.
This document is part of a set of documents on profiles, edited by the W3C Dataset Exchange Working Group (DXWG). Some of the documents are general, while others are technology-specific:
This section is non-normative.
A profile can be understood as an outline of some "thing" when seen from a specific point of view. Similarly, profiling is the task of distilling the essential aspects or character of something, such as a person, from a specific angle. In crafts, a profile is taken with a tool that conforms in detail to the contours of a three-dimensional object and yields an accurate two-dimensional representation. Other formable materials can then be constrained and fashioned to that profile, or matched against it to determine how accurately they reproduce the original object.
In the same sense, information entities can be viewed from different perspectives. To prepare them for specific uses, they are frequently tested for their goodness of fit to some pattern; alternatively, the pattern can be provided before the information is gathered, constraining it so that the information asset is adequate and appropriate for the job in hand.
In information technology, profiles may also support the data needs of specific applications. Profiling is often the work of a community interested in interoperability and data exchange. We define a profile generally as a named set of constraints on one or more identified base specifications.
Good data practice generally begins with vocabulary and ontology designers who are encouraged to make their work as broadly applicable as possible so as to maximize future adoption. As a result, vocabularies and ontologies typically define a data model using minimal semantics. For example, DCAT [[vocab-dcat-2]] defines the concept of a dataset as an abstract entity with distributions and data services as means of accessing data; there is no mention of whether a distribution should be in a particular serialization, or set of serializations, nor of how data services should be configured. While it states that the value of dcat:theme should be a SKOS [[skos-reference]] concept, it does not specify a particular SKOS concept scheme, and so on. Other vocabularies, such as Dublin Core Terms [[DCTERMS]], are similarly intended for general use. This means that data models and methods of working can be applied in circumstances different from those in which the original definition work was carried out and, in that sense, they promote broad interoperability.
In addition to addressing the needs of a specific community, a profile may also apply to a single system. Any individual system will be designed to meet a specific set of needs; that is, it will operate in a specific context. It is that context, and the individual choices made by the engineers working within it, that will determine how a vocabulary or set of vocabularies will be used. For example, a system ingesting data may require that a specific subset of properties from a range of vocabularies is used and that only terms from a defined code list are used as values for specified properties. In other words, where the 'base vocabulary' might say "the value of this property SHOULD be a value from a managed code list", a specialized profile will say "the value of this property MUST be from this specific code list".
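The difference between the base vocabulary's recommendation and the profile's requirement can be made concrete with a small check. The following Python sketch is illustrative only: the rule structure, the code list and its example.org URIs are hypothetical, and reporting SHOULD as a warning and MUST as an error is an assumption about how a validating system might treat them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodeListRule:
    prop: str            # property the rule applies to
    allowed: frozenset   # terms of the managed code list
    level: str           # "SHOULD" -> warning, "MUST" -> error (assumed policy)

def check(record: dict, rule: CodeListRule) -> list:
    """Report a value of rule.prop that is not in rule.allowed."""
    value = record.get(rule.prop)
    if value is None or value in rule.allowed:
        return []
    severity = "error" if rule.level == "MUST" else "warning"
    return [f"{severity}: {rule.prop}={value} not in code list"]

# Hypothetical code list; the URIs are placeholders.
THEMES = frozenset({
    "http://example.org/themes/agriculture",
    "http://example.org/themes/energy",
})

base_rule = CodeListRule("dcat:theme", THEMES, "SHOULD")   # base vocabulary
profile_rule = CodeListRule("dcat:theme", THEMES, "MUST")  # specialized profile

record = {"dcat:theme": "http://example.org/other/weather"}
print(check(record, base_rule))     # a warning: data still acceptable
print(check(record, profile_rule))  # an error: data does not conform
```

The same record passes the base vocabulary with a warning but fails the profile, which is exactly the tightening described above.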
This document is about how to formulate and communicate profiles and the ways in which profiles can be identified and related to each other.
We recognise that the term 'profile' occurs in several domains, and that there is a range of definitions available from different communities. We have taken this variability into account, but for the purpose of this document we are using the following definitions:
A named set of constraints on one or more identified base specifications, including the identification of any implementing subclasses of datatypes, semantic interpretations, vocabularies, options and parameters of those base specifications necessary to accomplish a particular function.
This definition includes what are often called "application profiles", "metadata application profiles", or "metadata profiles".
Source: deliberations of the DXWG. See ProfileContext wiki page.
The act of creating a profile, an activity that has been undertaken by many communities with a range of formalisms.
Communities create and use data standards to ensure interoperability for information exchange. Although members of a community may use the same basic standard schema, it is very common for different subsets within the larger community to need some further specification of the data they create to meet their own needs. To continue to support interoperability of their data with others, these community members need to express the specifics of their implementation of the data schema. Profiles serve this purpose. Profiles enumerate vocabulary terms, specify cardinality and validation rules, and can also describe the rules that creators use to make decisions about their data elements.
This section builds on the formal definition of profile and on the requirements DXWG has identified, and lays down a conceptual model for profiles, their components and their functions.
A Profile is based on an existing Specification, which can have the status of a Standard and which can itself be a Profile of another Specification. A Profile can have various Components/Manifestations in various expression languages for profiles. These Components/Manifestations fulfil specific Functions (roles) in implementing the services that motivate the creation of the Profile. The Components are serialized and published as Distributions in various formats, which are the concrete input for performing these services.
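One way to read this conceptual model is as a set of linked data structures. The Python sketch below is an illustration, not a normative model: the class and attribute names (`Specification`, `Profile`, `Component`, `Distribution`, `components_for`) and the example.org URIs are hypothetical. Making `Profile` a subclass of `Specification` captures the point that a profile can itself be the base of another profile.

```python
from dataclasses import dataclass, field

@dataclass
class Specification:
    uri: str
    title: str
    is_standard: bool = False        # a specification may have Standard status

@dataclass
class Distribution:
    media_type: str                  # serialization format
    access_url: str

@dataclass
class Component:
    function: str                    # role played, e.g. "validation", "guidance"
    language: str                    # expression language, e.g. "SHACL", "HTML"
    distributions: list[Distribution] = field(default_factory=list)

@dataclass
class Profile(Specification):
    # A Profile is itself a Specification, so it can serve as the base
    # of another profile.
    base_specifications: list[Specification] = field(default_factory=list)
    components: list[Component] = field(default_factory=list)

    def components_for(self, function: str) -> list[Component]:
        """Components that play the given function (role)."""
        return [c for c in self.components if c.function == function]

base = Specification("http://example.org/spec/base", "Base vocabulary", is_standard=True)
ap = Profile(
    "http://example.org/profile/ap", "Example profile",
    base_specifications=[base],
    components=[
        Component("validation", "SHACL",
                  [Distribution("text/turtle", "http://example.org/profile/ap/shapes.ttl")]),
        Component("guidance", "HTML",
                  [Distribution("text/html", "http://example.org/profile/ap/guide.html")]),
    ],
)
print([c.language for c in ap.components_for("validation")])
```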
Profiles SHOULD be made up of a collection of properties or terms, and that collection SHOULD have a name. Properties are selected from existing vocabularies. Because the profiles described here are not limited to RDF, collections could be of metadata terms from any type of metadata schema.
It could also be in roles/functions if there is nothing there about defining terms. Antoine Isaac, 2018-11-11
A profile can have multiple base specifications. github:issue/268
One can create a profile of profiles, with elements potentially inherited on several levels.
Profiles may add to or specialise clauses from one or more base specifications. Such profiles inherit all the constraints from base specifications.
It was approved in minutes but there was no GitHub issue for it. (This requirement was wrongly referred to as GitHub #238, and I've removed that reference.) Antoine Isaac, 2018-11-11
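Inheritance across multiple base specifications, and across several levels of profiles of profiles, can be sketched as a recursive merge. In this hypothetical Python sketch, each specification carries constraints keyed by a clause identifier, and a profile specialises a clause by reusing its identifier; that keying scheme, the names and the example.org URIs are all assumptions. A real specialisation should narrow the inherited constraint, which this sketch does not check.

```python
from dataclasses import dataclass, field

@dataclass
class Spec:
    uri: str
    # clause identifier -> constraint text; a profile reuses a base clause's
    # identifier when it specialises that clause (assumption of this sketch)
    clauses: dict[str, str] = field(default_factory=dict)
    bases: list["Spec"] = field(default_factory=list)

def effective_clauses(spec: Spec) -> dict[str, str]:
    """All clauses that apply to spec, inherited transitively from its bases.

    Bases are merged in list order and the spec's own clauses are applied
    last, so a specialised clause replaces the inherited one with the same
    identifier. Conflicts between bases are not detected, and the base
    graph is assumed to be acyclic.
    """
    merged: dict[str, str] = {}
    for base in spec.bases:
        merged.update(effective_clauses(base))
    merged.update(spec.clauses)
    return merged

base = Spec("http://example.org/spec/base", {
    "theme": "dcat:theme SHOULD be a SKOS concept",
    "title": "dct:title MAY be given",
})
other = Spec("http://example.org/spec/other", {"license": "dct:license SHOULD be given"})
# p1 profiles two base specifications; p2 is a profile of the profile p1.
p1 = Spec("http://example.org/profile/p1",
          {"theme": "dcat:theme MUST be from http://example.org/themes"}, [base, other])
p2 = Spec("http://example.org/profile/p2", {"title": "dct:title MUST be given"}, [p1])

for clause_id, text in effective_clauses(p2).items():
    print(clause_id, "->", text)
```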
Some data may conform to one or more profiles at once.
This requirement overlaps with the following requirement and there is a lot of uncertainty on where this requirement should be categorized. Antoine Isaac, 2018-11-25
Data publishers may publish data according to different profiles, either simultaneously (e.g. in the same data "distribution") or in parallel (e.g. via content negotiation).
(Lars:) We should probably not just use the DCAT view (distributions) but rather more generic Web architecture terms, such as resources and representations.
(Riccardo:) We should talk about 'distribution of a manifestation' not 'distribution' alone.
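The two publication modes can be illustrated by matching a client's preferred profiles against the profiles each representation conforms to: one representation conforming to several profiles covers the "simultaneous" case, and separate representations cover the "parallel" case. This Python sketch deliberately leaves the transport out; how the requested profiles reach the server (for example, through an HTTP request header) is not modelled, and the file names and profile URIs are hypothetical.

```python
from typing import Optional

# Each representation lists every profile it conforms to.
REPRESENTATIONS: list[tuple[str, set[str]]] = [
    # one representation conforming to two profiles at once
    ("dataset-ap1-ap2.ttl", {"http://example.org/profile/ap1",
                             "http://example.org/profile/ap2"}),
    # a parallel representation for a third profile
    ("dataset-ap3.ttl", {"http://example.org/profile/ap3"}),
]

def select_representation(
    requested: list[str],
    representations: list[tuple[str, set[str]]] = REPRESENTATIONS,
) -> Optional[tuple[str, str]]:
    """Return (representation, profile) for the first requested profile,
    in the client's order of preference, that some representation
    conforms to; None if no requested profile is available."""
    for profile in requested:
        for name, conforms_to in representations:
            if profile in conforms_to:
                return name, profile
    return None

print(select_representation(["http://example.org/profile/unknown",
                             "http://example.org/profile/ap3"]))
```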
The W3C DXWG has identified general requirements about profiles, which we list here and are going to expand upon (in more detailed requirements and recommendations) in the following section:
(Antoine:) Once the structure is stabilized this section should list the other sections it 'introduces'. Maybe these sections will be moved as sub-sections of this one.
BP13 Use locale-neutral data representations
Because profiles are intended to convey information both within a community and, at times, between communities, wide use would be facilitated by following BP13: profiles should use locale-neutral data representations where possible. Some data communities have deep, historical practices that use terminology specific to the community. The creation of a profile is an opportunity to move that practice toward widely known standards.
BP15 Reuse vocabularies, preferably standardized ones
BP15, reusing vocabularies, in particular standardized ones, is one of the cornerstones of profiling practice. One would need to define "standardized" in this context, but perhaps a better solution is to define the qualities of preferred vocabularies: they have a stable URI; they are supported by an organization or community; (more??).
Requirements covering aspects of how profiles are used, i.e. what functionality they may express or support, for example validation or documentation of data.
Profiles can have human-readable definitions of terms and input instructions.
There needs to be a property in the profile where the rules for the descriptive content can be provided. This would apply to the entire profile.
A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation
This requirement, which is a bit redundant with github:issue/279, has been shifted towards the function of data validation instead of focusing on the representations that enable it.
Profiles should be able to indicate which external specifications are expected to be applied/have been applied to values of individual properties.
Profiles may provide rules governing value validity.
Profiles may provide lists of values to pick from in order to populate data elements.
Profiles may provide rules on cardinality of terms (including "recommended").
Profiles may express dependencies between elements of the vocabulary (if A then not B, etc.). github:issue/278
Profiles can have what is needed to drive forms for data input or for user display.
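The validation-related functions listed above (rules on value validity, value lists, cardinality including "recommended", and dependencies between elements) can be sketched in a language-neutral way. In practice a profile would typically express these in a validation language; SHACL, for instance, offers constraints such as `sh:minCount`, `sh:maxCount`, `sh:in` and `sh:pattern`. The Python below is only an illustration: the rule classes, the record shape (each term mapped to a list of values), the regular-expression form of value rules and the `ex:` terms are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TermRule:
    term: str
    min_count: int = 0
    max_count: Optional[int] = None             # None: no upper bound
    recommended: bool = False                   # absence reported as a warning
    allowed_values: Optional[frozenset] = None  # pick list, if the profile gives one
    pattern: Optional[str] = None               # value validity rule as a regex

@dataclass(frozen=True)
class Exclusion:
    if_present: str    # "if A ..."
    then_absent: str   # "... then not B"

def validate(record: dict, rules: list, exclusions: list) -> list:
    """Return (severity, message) pairs; record maps term -> list of values."""
    issues = []
    for r in rules:
        values = record.get(r.term, [])
        n = len(values)
        if n < r.min_count:
            issues.append(("error", f"{r.term}: {n} value(s), at least {r.min_count} required"))
        if r.max_count is not None and n > r.max_count:
            issues.append(("error", f"{r.term}: {n} value(s), at most {r.max_count} allowed"))
        if n == 0 and r.recommended:
            issues.append(("warning", f"{r.term}: recommended but absent"))
        for v in values:
            if r.allowed_values is not None and v not in r.allowed_values:
                issues.append(("error", f"{r.term}: {v!r} is not in the value list"))
            if r.pattern is not None and not re.fullmatch(r.pattern, v):
                issues.append(("error", f"{r.term}: {v!r} does not match {r.pattern}"))
    for e in exclusions:
        if record.get(e.if_present) and record.get(e.then_absent):
            issues.append(("error", f"{e.then_absent} not allowed when {e.if_present} is present"))
    return issues

rules = [
    TermRule("dct:title", min_count=1, max_count=1),
    TermRule("dct:language", allowed_values=frozenset({"en", "fr"})),
    TermRule("dcat:contactPoint", recommended=True),
    TermRule("dct:issued", max_count=1, pattern=r"\d{4}-\d{2}-\d{2}"),
]
exclusions = [Exclusion("ex:embargoed", "ex:downloadURL")]  # hypothetical terms

record = {
    "dct:title": ["A", "B"],
    "dct:language": ["de"],
    "dct:issued": ["2019"],
    "ex:embargoed": ["true"],
    "ex:downloadURL": ["http://example.org/data.csv"],
}
for severity, message in validate(record, rules, exclusions):
    print(severity, message)
```

Keeping the rules as data rather than code is also what would let the same profile drive input forms or user display.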
BP16 Choose the right formalization level
BP16 recommends choosing the right formalization level. This is useful advice for all data and metadata, and it naturally applies to profiles. Profiles should be suited to the tasks they are designed to support, neither less nor more. In particular, we should caution against overly strict use of constraints, which makes it harder for others either to use the profile directly or to create a profile of the profile when their needs differ only slightly.
Profiles are published in the form of Components/Manifestations that are serialized in Distributions.
The W3C DXWG has identified the following requirement for such publication:
A profile should have human-readable documentation that presents its main components, which can also be available as machine-readable resources (ontology or schema files, SHACL files, etc.). This includes a listing of the elements in the profile, instructions and recommendations on how to use them, constraints that determine what data is valid according to the profile, etc.
Profiles may be written in, or may link to, a document or schema in a validation language (ShEx, SHACL, XML Schema).
This requirement, which is a bit redundant with github:issue/273, has been shifted towards the notion of distributions of schemas, instead of focusing on the general validation function.
BP9 Use persistent URIs as identifiers of datasets
Use persistent URIs as identifiers of profiles. This seems to be a non-controversial requirement because it applies to any web resource. There is a related best practice (BP10), stated as "Use persistent URIs as identifiers within datasets." This is well suited to the aspect of a profile that consists of reusing previously defined vocabulary terms, but also to the definition of new terms within the scope of the profile itself. Each element of a profile's vocabulary should have a URI (IRI?) that identifies the term and information about the term (such as labels, definitions, etc.).
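Whether a profile gives every element of its vocabulary an identifier and descriptive information can be partly checked by software, although persistence itself is an organizational commitment that no code can verify. In this Python sketch, the profile namespace, the `mint` helper and the `fundingProgramme` term are hypothetical; `http://purl.org/dc/terms/title` is the Dublin Core Terms IRI for title, used here as an example of a reused term.

```python
from urllib.parse import urlparse

PROFILE_NS = "http://example.org/profile/ap/terms/"   # hypothetical namespace

def mint(local_name: str) -> str:
    """IRI for a term newly defined within the scope of the profile."""
    return PROFILE_NS + local_name

# A reused term (Dublin Core) and a term newly defined by the profile.
terms = {
    "http://purl.org/dc/terms/title": {
        "label": "Title",
        "definition": "A name given to the resource.",
    },
    mint("fundingProgramme"): {
        "label": "Funding programme",
        "definition": "Programme that funded the creation of the dataset.",
    },
}

def check_terms(terms: dict) -> list:
    """Report terms whose identifier is not an absolute HTTP(S) URI, or that
    lack a label or definition. Persistence cannot be checked here."""
    problems = []
    for iri, info in terms.items():
        parsed = urlparse(iri)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{iri}: not an absolute HTTP(S) URI")
        for key in ("label", "definition"):
            if not info.get(key):
                problems.append(f"{iri}: missing {key}")
    return problems

print(check_terms(terms))
```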
BP14 Provide data in multiple formats
The BPs also recommend that data be provided in multiple formats, and this is a good recommendation for profiles as well. If we take the point of view that profiles, like DCAT, have an abstract essence that can be made manifest in more than one way, we already have a good basis for satisfaction of BP14. (This will bring up the question that has already come up in the context of DCAT and conneg - are all of the forms equivalent? We may not be able to resolve that question.)
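The idea of one abstract profile made manifest in several formats, and the question of whether those forms are equivalent, can be illustrated with a round trip. This Python sketch uses only the standard library; the JSON keys and the XML element and attribute names are hypothetical and follow no published profile schema. "Equivalent" here means only that both serializations parse back to the same abstract description.

```python
import json
import xml.etree.ElementTree as ET

# Abstract description of a profile (hypothetical structure).
profile = {
    "uri": "http://example.org/profile/ap",
    "title": "Example application profile",
    "terms": ["http://purl.org/dc/terms/title", "http://www.w3.org/ns/dcat#theme"],
}

def to_json(p: dict) -> str:
    return json.dumps(p, indent=2)

def to_xml(p: dict) -> str:
    root = ET.Element("profile", uri=p["uri"])
    ET.SubElement(root, "title").text = p["title"]
    terms = ET.SubElement(root, "terms")
    for t in p["terms"]:
        ET.SubElement(terms, "term", ref=t)
    return ET.tostring(root, encoding="unicode")

def from_xml(s: str) -> dict:
    root = ET.fromstring(s)
    return {
        "uri": root.get("uri"),
        "title": root.findtext("title"),
        "terms": [t.get("ref") for t in root.iter("term")],
    }

# Both serializations carry the same abstract content.
print(json.loads(to_json(profile)) == from_xml(to_xml(profile)))
```

Richer formats (for example RDF alongside XML Schema) may not map onto each other this cleanly, which is why the equivalence question above may remain open.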
BP12 Use machine-readable standardized data formats
Profiles should be published in standard data formats (BP12). They should also be published in, and make use of, technologies that are appropriate to the community expected to use them: a profile using RDF and OWL will not serve well a community that has only an XML/XSD-based skill set. This also relates to the recommendation above on providing data in multiple formats. In many communities skills and data history vary, so providing profiles in as many of the commonly used technologies as possible will increase the utility of the profile (as well as of the profiled instance data).
Profiles must be discoverable through machine-readable metadata that describes what is offered and how to invoke the offered profiles.
From the perspective of management of profiles, and guidance to users and data experts, ecosystems of profiles should be properly described (e.g. in profile catalogues/repositories), especially documenting the relationships between profiles and what they are based on, and between profiles that are based on other profiles.
BP1 Provide metadata
In the BPoW this is defined as metadata about the dataset; with profiles this can be descriptive metadata within the profile or a separate metadata statement about the profile. (Since profiles themselves are generally forms of metadata they may be able to incorporate description and other administrative information about the profile within itself, if desired.) In addition there is a common set of administrative data that is recommended for many information resources such as dates and version designators for each version, and provenance (who or what agency created the digital file).
BP2 Provide descriptive metadata
The BPDoW limits its recommendation on metadata covering the topic of the dataset to general keywords, themes and categories (BP2). It may be desirable to provide more specific topical information to satisfy the DXWG requirement that profiles be discoverable by search engines. Discoverability will vary with the depth of description of the topic and/or the community area that the profile serves. (We may wish to recommend some particulars.)
BP30 Make feedback available
Ideally, the profile would have a management cycle for maintenance and updates. This should involve the community of users, as noted in BP29 (Gather feedback from data consumers) and BP30 (Make feedback available). The strength and value of a profile will depend on the involvement of its community of users.
BP27 Preserve identifiers
The aspect of the management cycle that is often ignored is that of the de-commissioning of datasets or of superseded versions. For users it is key that previously used identifiers always point to a useful document or message.
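One way to keep previously used identifiers useful is never to delete their registry entries: a superseded version keeps resolving to its document with a pointer to its successor, and a decommissioned one resolves to an explanatory message. The Python sketch below illustrates that policy only; the registry structure, version URIs and messages are hypothetical, and actual resolution would normally be handled by a web server or a persistent-identifier service.

```python
from typing import NamedTuple, Optional

class Record(NamedTuple):
    document: Optional[str]              # None once decommissioned
    superseded_by: Optional[str] = None
    note: str = ""

# Entries are retained for every identifier ever issued (hypothetical data).
REGISTRY = {
    "http://example.org/profile/ap/0.9": Record(
        None, superseded_by="http://example.org/profile/ap/2.0",
        note="Draft withdrawn in favour of 2.0."),
    "http://example.org/profile/ap/1.0": Record(
        "ap-1.0.html", superseded_by="http://example.org/profile/ap/2.0"),
    "http://example.org/profile/ap/2.0": Record("ap-2.0.html"),
}

def resolve(iri: str, registry: dict = REGISTRY) -> str:
    """Return a useful document or message for any issued identifier."""
    rec = registry.get(iri)
    if rec is None:
        return f"No record of {iri}"
    parts = [rec.document or f"{iri} has been decommissioned."]
    if rec.superseded_by:
        parts.append(f"Superseded by {rec.superseded_by}.")
    if rec.note:
        parts.append(rec.note)
    return " ".join(parts)

for version in ("0.9", "1.0", "2.0"):
    print(resolve(f"http://example.org/profile/ap/{version}"))
```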
This section is non-normative.
This section is non-normative.
definition not finalised
Source: deliberations of the DXWG. See GitHub Issue 194.
An act of identifying something precisely or of stating a precise requirement.
Source: Oxford English Dictionary.
A basis for comparison; a reference point against which other things can be evaluated.
Here listed are the individual requirements addressed by the Working Group in the formulation of this document.
This section will be removed in a later version of this document.
Additional Issues related to this document and not yet placed within it are listed at the: