This document advises on best practices related to the publication and usage of spatial data on the Web; the use of Web technologies as they may be applied to location. The best practices are intended for practitioners, including Web developers and geospatial experts, and are compiled based on evidence of real-world application. These best practices suggest a significant change of emphasis from traditional Spatial Data Infrastructures by adopting a Linked Data approach. As location is often the common factor across multiple datasets, spatial data is an especially useful addition to the Linked Data cloud; the 5 Stars of Linked Data paradigm is promoted where relevant.
As a First Public Working Draft, this document is incomplete. The editors seek to illustrate the full scope of the best practices, albeit with the details missing at this stage. In particular, the examples for each best practice are largely incomplete. The editors intend to compile a much richer set of examples in the period leading up to publication of the next Working Draft. Feedback is requested on the scope of this document and the best practices herein. The editors are particularly keen for reviewers to cite examples that may be used to further illustrate these best practices.
The charter for this deliverable states that the best practices will include "an agreed spatial ontology conformant to the ISO 19107 abstract model and based on existing available ontologies [...]". Rather than creating a new spatial ontology, the editors aim to provide a methodology that will help data publishers choose which existing spatial ontology is relevant for their application. If deemed necessary to meet the stated requirements (see [[SDW-UCR]]), a new spatial ontology, or elements that extend the existing spatial ontologies, will be established in subsequent Working Draft releases.
The editors also intend to provide supplementary methods to navigate through the best practices in order to increase the utility of this document. This will be addressed for the next Working Draft release.
For OGC: This is a Public Draft of a document prepared by the Spatial Data on the Web Working Group (SDWWG), a joint W3C-OGC project (see charter). The document is prepared following W3C conventions. The document is released at this time to solicit public comment.
Data on the Web Best Practices ([[DWBP]]) outlines a growing interest in publishing and consuming data on the Web. Very often the common factor across multiple datasets is the location data. Spatial data, or data related to a location, is what this Best Practice document is all about.
Definition of "spatial data" is required.
It's not that there is a lack of spatial data on the Web; the maps, satellite and street level images offered by search engines are familiar and there are many more examples of spatial data being used in Web applications. However, the data that has been published is difficult to find and often problematic to access for non-specialist users. The key problems we are trying to solve in this document are discoverability and accessibility, and our overarching goal is to bring spatial data publishing into the Web mainstream as a mechanism for solving these twin problems.
Is "interoperability" also a top-level problem (alongside discoverability and accessibility)?
Different groups of people have a need for spatial data on the Web.
Commercial operators, including search engine operators, invest a great deal of time and effort in generating geographical databases which mirror Web content, with the geographical context often added manually or at best semi-automatically. This process would be much more efficient if data were published on the Web with the appropriate geographic information at the source, so it can be found and accessed using the standard mechanisms of the Web.
Geospatial experts who try to find and use data published in Spatial Data Infrastructures (SDI) can get frustrated by the fact that most web services available for spatial data are in fact Web Map Services (WMS), which serve static pictures, not data. Web Feature Services (WFS), which allow you to get data, also exist, but are far less common. One could ask: do we really have a Spatial Data Infrastructure or is it mostly just a 'spatial picture infrastructure'?
If you're not a geospatial expert, this may be the first time you've heard terms like SDI, WMS and WFS. But in fact, there is a whole world of geospatial standards, maintained by the Open Geospatial Consortium (OGC), aimed at publishing geospatial data in a related series of standardized Web services and processing it with specialized tools. These technologies have a steep learning curve.
The intended users of the SDI are experts in the geospatial domain. The OGC standards cover the full range of geospatial use cases – some of which are very complex. Because of this, it requires significant expertise in geospatial information technology to be able to use the SDI.
For Web developers, who come from outside the geospatial domain, the data behind the OGC services is part of the "Deep Web" - the data is published behind specialized Web services and is not easy to get at, unless you're an expert. But Web developers are increasingly creating and using data related to locations, e.g. obtained from GPS enabled mobile devices and sensors, so they are important participants in the business of geospatial data.
The public sector creates a lot of geospatial data, much of which is open and can be useful to others who may or may not be geospatial data experts. This includes statistical data, for example regions identified by NUTS codes used as territorial units for statistics by the European Union.
Spatial data often has a temporal component; things move, and boundaries change over time. Geospatial data is becoming more and more important for the Web, not least because of the rise of the Internet of Things. Its importance extends far beyond serving static 'pixels': meaningful information about objects is required, with clearly expressed semantics.
In short, there is a large demand for spatial data on the Web. But there are some questions around publishing spatial data on the Web that keep popping up and need to be answered. One complication is that many relevant standards already exist. These include informal 'community standards' - geospatial formats and/or vocabularies that enjoy widespread adoption ([[GeoJSON]] being a prime example) - and others for which the formal standardization process has not been completed. Where standards have been completed there are competing ideas, and it is often unclear which one you should use. With these factors in mind, this Best Practice document aims to clarify and formalize the relevant standards landscape.
Analysis of the requirements derived from scenarios that describe how spatial data is commonly published and used on the Web (as documented in [[SDW-UCR]]) indicates that, in contrast to the workings of a typical SDI, the Linked Data approach is most appropriate for publishing and using spatial data on the Web. Linked Data provides a foundation to many of the best practices in this document.
If you are not a Web developer, Linked Data may be one of those buzz words that doesn't mean much to you. Essentially it is about publishing bite-sized information resources about real world or conceptual things, using URIs to identify those things, and publishing well described relationships between them, all in a machine readable way that enables data from different sources to be connected and queried.
Questions answered in this document include:
The best practices in this document are based on what is already there. It is not meant to be a 'best theories' document: it takes its examples and solutions from what is being done in practice right now. The examples point to publicly available real-world datasets. In other words, this document is as much as possible evidence-based. And where real-world practice is missing, it provides clearly identified recommended practices.
The best practices in this document are designed to be testable by humans and/or machines in order to determine whether you are following them or not. They are based on the idea of using the Web as a data sharing platform, as described in Architecture of the World Wide Web, Volume One [[webarch]].
This document complies as much as possible with the principles of the Best Practices for Publishing Linked Data [[LD-BP]] and the (developing) Data on the Web Best Practices [[DWBP]]. Where it does not, this will be identified and explained.
Devise a way to make best versus emerging practices clearly recognizable in this document.
Need to describe how the proposed best practices differ from typical SDI approaches; content to be added.
Details of "Audience" section overlaps with "Introduction"; redraft to avoid duplication.
The audience is the broadest community of Web users possible, three important groups of which are described below. Application and tool builders addressing the needs of the mass consumer market should find value and guidance in the document.
These are people who already have a spatial dataset (or more than one) as part of existing SDIs and they want to publish it as "spatial data on the web" so that data becomes "mashable". They are vital in liberating existing spatial data, already published but not visible or linkable.
These are people who just want to work with (find, publish, use) spatial data on the Web and are not necessarily experts in spatial technology. Spatial data is just one facet of the information space they work with, and they want to use Web technologies to work with spatial data. These web developers will be writing Web-based applications that either use spatial data directly or help non-technical users publish spatial information on the Web, for example, people who are publishing content about their village fête or local festival including relevant spatial information.
The first two audience groups mentioned are user types; but spatial data publishers are a third important audience for this document. They want to know how to publish their spatial data so that it can be used to its full potential.
Are "content publishers" sufficiently different from the other defined audience categories? Do we need this category?
This document extends the scope of the [[DWBP]] to advise on best practices related to the publication and usage of spatial data on the Web, including sensor data. Spatial data concerns resources that have physical extent, from buildings and landmarks to cells on a microscope slide. Where [[DWBP]] is largely concerned with datasets and their distributions (as defined in [[vocab-dcat]]), this document focuses on the content of the spatial datasets: how to describe and relate the individual resources and entities (e.g. SpatialThings) themselves. The best practices included in this document are intended for practitioners, encouraging publication and/or re-use of spatial data (datasets and data streams) on the Web.
Best practices described in this document fulfill the requirements derived from scenarios that describe how spatial data is commonly published and used on the Web (use cases and requirements are documented in [[SDW-UCR]]). In line with the charter, this document provides advice on:
Location is often the common factor across multiple datasets, which makes spatial data an especially useful addition to the Linked Data cloud. Departing from the typical approach used in Spatial Data Infrastructures, these best practices promote a Linked Data approach.
Given our focus on spatial data, best practices for publishing any kind of data on the Web are deemed to be out of scope. Where relevant to discussion, such best practices will be referenced from other publications including [[DWBP]] and [[LD-BP]]. Other aspects that are out of scope include best practices for:
Compliance with each best practice in this document can be tested programmatically and/or by human inspection. However, note that determining whether a best practice has been followed or not should be judged based on the intended outcome rather than the possible approach to implementation, which is offered as guidance; implementation approaches are subject to change as technology and practices evolve and improve.
This section presents the template used to describe Spatial Data on the Web Best Practices.
Best Practice Template
Short description of the BP
Why
This section answers crucial questions:
Intended Outcome
What it should be possible to do when a data publisher follows the best practice.
Possible Approach to Implementation
A description of a possible implementation strategy is provided. This represents the best advice available at the time of writing but specific circumstances and future developments may mean that alternative implementation methods are more appropriate to achieve the intended outcome.
How to Test
Information on how to test whether the BP has been met. This might or might not be machine testable.
Evidence
Information about the relevance of the BP, demonstrated by one or more relevant requirements as documented in the Spatial Data on the Web Use Cases & Requirements document.
In the geospatial community, the entities within datasets are usually information resources, representing areas on a map, which in turn represent real world things. These information resources are usually called 'features'. A feature is a representation of a real world thing, and there can be, and often is, more than one feature representing the same real world thing. A feature often has, as a property, a geometry which describes the location of the feature.
For example, a lighthouse standing somewhere on the coast is a real world thing. In some dataset, an information record about this lighthouse exists: a 'feature'. In current practice, this feature will often have properties that are about the real thing (e.g. the height of the lighthouse) and properties about the information record (e.g. when it was last modified). There could be several features, in different datasets, that refer to the same lighthouse; one dataset has its location and date it was built, another has data about its ownership or about shipwrecks near the same lighthouse.
Mostly, people looking for information are interested in real world things, not in information resources. This means the real world things should get global identifiers so they can be found and referenced. The features and their map representations - geometries and topologies - should have global identifiers too so they can be referenced as well; and must have them if they are managed elsewhere.
Discussion on Features, information resources and real-world Things is unclear and needs redrafting
Use globally unique HTTP identifiers for entity-level resources
Entities within datasets SHOULD have unique, persistent HTTP(S) URIs as identifiers.
The term "entity-level resources" is confusing and needs to be clarified or replaced.
Why
A lot of spatial data is available 'via the Web' - but not really 'on the web': you can download datasets, or view, query and download data via web services, but it is usually not possible to reference an entity within a dataset, like you would a web page. If this were possible, spatial data would be much easier to reuse and to integrate with other data on the Web.
Intended Outcome
Entities (SpatialThings) described in a dataset will each be identified using a globally unique HTTP URI so that a given entity can be unambiguously identified in statements that refer to that entity.
Possible Approach to Implementation
In order for identifiers to be useful, people should be comfortable creating them themselves without needing to refer to some top-level naming authority, much like how Twitter's hashtags are created dynamically. Good identifiers for data on the web should be dereferenceable/resolvable, which makes it a good idea to use HTTP URIs as identifiers. There is no top down authority that you have to go to in order to create such identifiers for spatial objects. So just make them up yourself if you need them and they don't exist. Best Practice 2: Reuse existing (authoritative) identifiers when available explains how to find already existing identifiers you can reuse.
Read [[DWBP]] Best Practice 11: Use persistent URIs as identifiers within datasets for general discussion on why persistent URIs should be used as identifiers for data entities. Using URIs means the data can be referenced using standard Web mechanisms; making them persistent means the links won't get broken. Note that ideally the URIs should resolve; however, they have value as globally scoped identifiers whether they resolve or not.
For guidance about how to create persistent URIs, read [[DWBP]] Best Practice 10: Use persistent URIs as identifiers. Keep in mind not to use service endpoint URLs, as these are usually dependent on technology, implementation, and/or API version and will probably change when the implementation changes.
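For illustration, a newly minted HTTP URI for the lighthouse discussed earlier might be described as follows (a minimal Turtle sketch; the example.org namespace, coordinates and property choices are illustrative, not prescriptive):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .

# A globally unique, persistent HTTP URI minted by the data publisher
# for the SpatialThing itself (not for a service endpoint).
<http://example.org/id/lighthouse/42>
    a geo:SpatialThing ;
    rdfs:label "Example lighthouse" ;
    geo:lat 52.10 ;
    geo:long 4.27 .

Because the URI is independent of any particular service endpoint or implementation, it can remain stable even if the underlying data platform changes.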
update reference to DWBP BP 11 #identifiersWithinDatasets
Complete this section and How to Test section
How to Test
...
Evidence
Relevant requirements: R-Linkability.
Reuse existing (authoritative) identifiers when available
Avoiding the creation of lots of identifiers for the same resource
Why
In general, re-using identifiers for well-known resources is a good idea, because it discourages proliferation of disparate copies with uncertain provenance. Linking your own resources to well-known or authoritative resources also makes relationships between your data and other data, which refers to the same well-known resource, discoverable. The result is a network of related resources using the identifiers for the SpatialThings.
In the case of SpatialThings, a simple way of indicating a location is by referencing an already existing named place resource on the Web. For example, DBpedia and GeoNames are existing datasets with well-known spatial resources, i.e. besides place names and a lot of other information, a set of coordinates is also available for the resources in these datasets. The advantage of referring to these named place resources is that it makes clear that different resources which refer to, for example, http://dbpedia.org/page/Utrecht are all referring to the same city. If these resources did not use a URI reference but a literal value "Utrecht", this could mean the province of Utrecht, the city of Utrecht (both places in the Netherlands), the South African town called Utrecht, or maybe something else entirely.
See also Best Practice 22: Link to resources with well-known or authoritative identifiers.
Some content of this BP may be moved to BP link-to-auth-identifiers.
Intended Outcome
Already existing identifiers for spatial resources are reused instead of new ones being created.
Possible Approach to Implementation
If you've got feature data and want to publish that as linked data, the first step is to see if there's an authoritative URI already available that you can reuse. If so, do that; else refer to Best Practice 1: Use globally unique identifiers for entity-level resources.
DBpedia and GeoNames are examples of popular, community-driven resource collections. Another good source of resource collections is often found in public government data, such as national registers of addresses, buildings, and so on. Mapping and cadastral authorities maintain datasets that provide geospatial reference data.
See Appendix B. Authoritative sources of geographic identifiers for a list of good sources of geographic identifiers.
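For example, rather than minting a new identifier for the city of Utrecht, a dataset can refer to the existing DBpedia resource directly, or relate a locally managed resource to it (a Turtle sketch; the example.org URIs are illustrative):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dct: <http://purl.org/dc/terms/> .

# Refer to the well-known resource directly, e.g. as the spatial coverage of a record.
<http://example.org/id/air-quality/measurement/1234>
    dct:spatial <http://dbpedia.org/resource/Utrecht> .

# Or keep a locally managed resource but state that it denotes the same city.
<http://example.org/id/place/utrecht>
    owl:sameAs <http://dbpedia.org/resource/Utrecht> .

Note the use of the /resource/ form of the DBpedia identifier, which denotes the city itself rather than the page describing it.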
How to Test
An automatic check is possible to determine if any of the good sources of geographic identifiers are referenced.
Evidence
Relevant requirements: R-GeoReferencedData, R-IndependenceOnReferenceSystems.
Working with data that lacks globally unique identifiers for entity-level resources
Spatial reconciliation across datasets
Why
There are many mechanisms to reconcile (i.e. find related, map or link) objects from different datasets. When two spatial datasets contain geometries, you can use spatial functions to find out which objects overlap, touch, etc. Based on this spatial correlation you might determine that two datasets are talking about the same places, but this is often not enough. For reasons of efficiency, or simply to be able to use these spatial correlations in a context where spatial functions are not available, it is a good idea to express these spatial relationships explicitly in your data. There is also danger in relying on spatial correlation alone; you might conclude that two resources represent the same thing when in reality they represent, for example, a shop at ground level and the apartment above it.
Intended Outcome
Links between resources in datasets created from spatial correspondences.
Possible Approach to Implementation
If you want to link two spatial datasets, find out if they have corresponding geometries using spatial functions and then express these correspondences as explicit relationships.
In this best practice we only give guidance on spatial reconciliation (e.g. two mentions of Paris are talking about the same place). We do not address thematic reconciliation.
If the spatial datasets you want to reconcile are managed in a Geographic Information System (GIS), you can use the GIS spatial functions to find related spatial things. If your spatial things are expressed as Linked Data, you can use [[GeoSPARQL]], which has a set of spatial query functions available.
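As an illustrative sketch, once a GIS or [[GeoSPARQL]] query has established that two geometries overlap, that correspondence can be recorded explicitly, for example using the GeoSPARQL simple-feature relation properties (the example.org URIs are illustrative):

@prefix gsp: <http://www.opengis.net/ont/geosparql#> .

# A correspondence computed once from the geometries, then asserted explicitly
# so that consumers without spatial query support can still make use of it.
<http://example.org/id/flood-zone/7>
    gsp:sfOverlaps <http://example.org/id/parish/stow> .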
How to express discovered relationships is discussed in Best Practice 13: Assert known relationships.
This Best Practice needs more content.
So far we have discussed the shortcomings of using names as identifiers (and the subsequent need for reconciliation). We also need to discuss assigning URIs based on local identifiers; for example, row numbers from tabular data or Feature identifiers from geo-databases.
How to Test
...
Evidence
Relevant requirements: R-Linkability.
Provide stable identifiers for Things (resources) that change over time
Even though resources change, it helps when they have a stable, unchanging identifier.
Why
Spatial things can change over time, but as explained in Assigning identifiers to real world things and information resources, their identifiers should be persistent.
Should we reference the paradox of the Ship of Theseus to highlight there is no rigorous notion of persistent identity?
Intended Outcome
Even if a spatial thing has changed, its identifier should stay the same so that links to it don't get broken.
Possible Approach to Implementation
[[DWBP]] Best Practice 8: Provide versioning information explains how to provide versioning info for datasets. It doesn't provide information about versioning individual resources.
Spatial things can change in different ways, and sometimes the change is such that it's debatable if it's still the same thing. Think carefully about the life cycle of the spatial things in your care, and be reluctant to assign new identifiers. A lake that became smaller or bigger is generally still the same lake.
If your resources are versioned, a good solution is to provide a canonical, versionless URI for your resource, as well as date-stamped versions.
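One possible pattern, sketched below in Turtle with illustrative example.org URIs and Dublin Core terms, is to keep a canonical, versionless URI for the resource and relate date-stamped versions to it:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Canonical, versionless identifier that links should point to.
<http://example.org/id/lake/ullswater>
    dct:hasVersion <http://example.org/id/lake/ullswater/2016-01-01> .

# A date-stamped version capturing the state of the resource at a given time.
<http://example.org/id/lake/ullswater/2016-01-01>
    dct:isVersionOf <http://example.org/id/lake/ullswater> ;
    dct:issued "2016-01-01"^^xsd:date .

Links from other datasets point at the canonical URI and therefore survive the publication of new versions.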
How to Test
Check the identifier for any version-dependent components.
Evidence
Relevant requirements: R-Linkability
Provide identifiers for parts of larger information resources
Identify subsets of large information resources that are a convenient size for applications to work with
Is the term "subset" correct?
Why
Some datasets, particularly coverages such as satellite imagery, sensor measurement timeseries and climate prediction data, can be very large. It is difficult for Web applications to work with large datasets: it can take considerable time to download the data and requires sufficient local storage to be available. To remedy this challenge, it is often useful to provide identifiers for conveniently sized subsets of large datasets that Web applications can work with.
Intended Outcome
Being able to refer to subsets of a large information resource that are sized for convenient usage in Web applications.
Possible Approach to Implementation
Two possible approaches are described below:
Web service URLs are in general not good URIs for a resource as they are unlikely to be persistent. A Web service URL is often technology and implementation dependent and both are very likely to change with time; for example, consider oft-used parameters such as ?version=. Good practice is to use URIs that will resolve as long as the resource is relevant and may be referenced by others; identifiers for subsets should therefore be protocol independent.
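For example, a large sensor timeseries might expose protocol-independent URIs for conveniently sized subsets and relate them to the whole, as in this sketch (illustrative example.org URIs; Dublin Core terms used for the part/whole relation):

@prefix dct: <http://purl.org/dc/terms/> .

# The complete (large) information resource.
<http://example.org/id/timeseries/river-gauge-17>
    dct:hasPart <http://example.org/id/timeseries/river-gauge-17/2016-03> .

# A month-sized subset that a Web application can retrieve on its own.
<http://example.org/id/timeseries/river-gauge-17/2016-03>
    dct:isPartOf <http://example.org/id/timeseries/river-gauge-17> .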
How to Test
...
Evidence
Relevant requirements: R-Compatibility, R-Linkability, R-Provenance, R-ReferenceDataChunks.
More content needed for this BP.
It is important to publish your spatial data with clear semantics. The primary use case for this is that you already have a database of assets and you want to publish the semantics of this data. Another use case is that someone wants to publish information which has a spatial component on the Web, in a form that search engines will understand.
The spatial thing itself as well as its spatial properties have semantics. There are several vocabularies which cover spatial things and spatial properties. If you need extra semantics not available in an existing vocabulary, you should create your own.
How to publish your vocabulary, which describes the meaning of your data, is explained in [[LD-BP]]. We recommend that you link your own vocabulary to commonly used existing ones because this increases its usefulness. How to do this is out of scope for this document; however, we give some examples of mapping relations you can use from OWL, SKOS, RDFS. And we do the mapping between some commonly used spatial vocabularies.
The current list of RDF vocabularies / OWL ontologies for spatial data being considered by the SDW WG is provided below. Some of these will be used in examples. Full details, including mappings between vocabularies, pointers about inconsistencies in vocabularies (if any are evident), and recommendations on avoiding usage that may lead to confusion, will be published in a complementary NOTE: Comparison of geospatial vocabularies.
Vocabularies can be discovered from Linked Open Vocabularies (LOV) using search terms like 'location', or tags such as Place, Geography, Geometry and Time.
http://statistics.data.gov.uk/def/statistical-geography# and http://statistics.data.gov.uk/def/statistical-entity# (URIs do not resolve).
No attempts have yet been made to rank these vocabularies; e.g. in terms of expressiveness, adoption etc.
The motivation behind the ISA Programme Location Core Vocabulary was establishing a minimal core common to existing spatial vocabularies. However, experience suggests that such a minimal common core is not very useful as one quickly needs to employ specific semantics to meet one's application needs.
This entire subsection is concerned with helping data publishers choose the right spatial data format or vocabulary. Collectively this section provides a methodology for making that choice. We do this rather than recommending one vocabulary because this recommendation would not be durable as vocabularies are released or amended.
Do we need a subclass of SpatialThing for entities that do not have a clearly defined spatial extent; or a property that expresses the fuzziness of the extent?
Provide a minimum set of information for your intended application
When someone looks up a URI for a SpatialThing, provide useful information, using the common representation formats
Why
This allows SpatialThings to be distinguished from one another by looking at their properties, e.g. type and label. It also allows basic information about a SpatialThing to be obtained by referring to its URI.
Intended Outcome
This requirement ensures that a minimum set of information about a SpatialThing is served against its URI. In general, this allows the properties and features of a SpatialThing to be looked up, and information to be obtained from machine-interpretable and/or human-readable descriptions.
Possible Approach to Implementation
This requirement specifies that useful information should be returned when a resource is referenced. This can include:
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-MultilingualSupport, R-SpatialVagueness
How to describe geometry
Geometry data should be expressed in a way that allows its publication and use on the Web.
Why
This best practice helps with choosing the right format for describing geometry based on aspects like performance and tool support. It also helps when deciding whether or not using literals for geometric representations is a good idea.
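To give a flavour of the choices involved, the same point location can be expressed with the W3C Basic Geo vocabulary or as a [[GeoSPARQL]] WKT literal attached to a separately identified geometry object (a sketch; coordinates and example.org URIs are illustrative):

@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix gsp: <http://www.opengis.net/ont/geosparql#> .

# Simple latitude/longitude properties from the W3C Basic Geo vocabulary.
<http://example.org/id/lighthouse/42>
    geo:lat 52.10 ;
    geo:long 4.27 .

# The same position as a GeoSPARQL geometry carrying a WKT literal
# (note the longitude-first axis order of the default CRS84).
<http://example.org/id/lighthouse/42>
    gsp:hasGeometry <http://example.org/id/lighthouse/42/geometry> .
<http://example.org/id/lighthouse/42/geometry>
    gsp:asWKT "POINT(4.27 52.10)"^^gsp:wktLiteral .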
Intended Outcome
The format chosen to express geometry data should:
Possible Approach to Implementation
Steps to follow:
geo:lat and geo:long that are used extensively for describing geo:Point objects.
How to Test
...
Evidence
Relevant requirements: R-BoundingBoxCentroid, R-Compressible, R-CRSDefinition, R-EncodingForVectorGeometry, R-IndependenceOnReferenceSystems, R-MachineToMachine, R-SpatialMetadata, R-3DSupport, R-TimeDependentCRS, R-TilingSupport.
Specify Coordinate Reference System for high-precision applications
A coordinate reference system should be specified for high-precision applications to locate geospatial entities.
Why
The choice of CRS is sensitive to the intended domain of application for the geospatial data. For the majority of applications a common global CRS (WGS84) is fine, but high precision applications (such as precision agriculture and defence) require spatial referencing to be accurate to a few meters or even centimeters.
Add explanation of why there are so many CRSs.
Need to clarify when and why people use different CRS's
The misuse of spatial data because of confusion about the CRS can have catastrophic consequences; e.g. both the bombing of the Chinese Embassy in Belgrade during the Balkan conflict and fatal incidents along the East Timor border are generally attributed to spatial referencing problems.
Intended Outcome
A Coordinate Reference System (CRS) sensitive to the intended domain of application (e.g. high precision applications) for the geospatial data should be chosen.
Possible Approach to Implementation
Recommendations about CRS referencing should consider:
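Whatever CRS is chosen, [[GeoSPARQL]] allows it to be stated explicitly by prefixing a WKT literal with the CRS URI, as in this sketch (the coordinates and example.org URI are illustrative; EPSG:27700 is the British National Grid):

@prefix gsp: <http://www.opengis.net/ont/geosparql#> .

# Geometry expressed in an explicitly identified projected CRS
# (EPSG:27700); the coordinates are easting/northing values.
<http://example.org/id/survey-point/9>
    gsp:hasGeometry [
        gsp:asWKT "<http://www.opengis.net/def/crs/EPSG/0/27700> POINT(530124 179915)"^^gsp:wktLiteral
    ] .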
How to Test
...
Evidence
Relevant requirements: R-DefaultCRS
How to describe relative positions
Provide a relative positioning capability in which the entities can be linked to a specific position.
Why
In some cases it is necessary to describe the location of an entity in relation to another location or to the location of another entity; for example, 'south-west of Guildford' or 'close to London Bridge'.
Intended Outcome
It should be possible to describe the location of an entity in relation to another entity or in relation to a specific location, instead of specifying a geometry.
The relative positioning descriptions should be machine-interpretable and/or human-readable.
Possible Approach to Implementation
The relative positioning should be provided as:
Do we need this as a best practice; if yes, this BP needs more content
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-SamplingTopology.
How to describe positional (in)accuracy
Accuracy and precision of spatial data should be specified in machine-interpretable and human-readable form.
Why
The amount of detail that is provided in spatial data and the resolution of the data can vary. No measurement system is infinitely precise and in some cases the spatial data can be intentionally generalized (e.g. merging entities, reducing the details, and aggregation of the data) [[Veregin]].
Intended Outcome
When known, the resolution and precision of spatial data should be specified in a way that allows consumers of the data to be aware of the resolution and level of detail provided.
Possible Approach to Implementation
...
We need some explanations for the approaches to describe positional (in)accuracy.
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-QualityMetadata.
How to describe properties that change over time
Entities and their data should have versioning with time/location references
Why
Entities and their properties can change over time and this change can be related to spatial properties, for example when a spatial thing moves from one location to another location, or when it becomes bigger or smaller. For some use cases you need to be able to explicitly refer to a particular version of information that describes a SpatialThing, or to infer which geometry is appropriate at a specific time, based on the versioning. To make this possible, the properties that are described for an entity should have references to the time and location that the information describing a SpatialThing was captured and should retain a version history. This allows you to reference the most recent data as well as previous versions and to also follow the changes of the properties.
Intended Outcome
Properties described in a dataset will include a time (and/or location) stamp and also versioning information to allow tracking of the changes and accessing the most up-to-date properties data.
Possible Approach to Implementation
Need to include guidance on when a lightweight approach (ignoring the change aspects) is appropriate
When entities and their properties can change over time, or are valid only at a given time, and this needs to be captured, it is important to specify a clear relationship between property data and its versioning information. How properties are versioned should be explained in the specification or schema that defines those properties. Temporal and/or spatial metadata should be provided to indicate when and where the information about the SpatialThing was captured.
For an example of how to version information about entities and their properties and retain a version history, see version:VersionedThing and version:Version at https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#versioned-types.
It is also useful to incorporate information on how often the information might change, i.e. the frequency of update.
Data publishers must decide how much change is acceptable before a SpatialThing cannot be considered equivalent. At this point, a new identifier should be used as the subject for all properties about the changed SpatialThing. Also see Best Practice 4: Provide stable identifiers for Things (resources) that change over time.
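A lightweight sketch of this approach, using Dublin Core terms and illustrative example.org URIs rather than any particular versioning ontology, might look like:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix gsp: <http://www.opengis.net/ont/geosparql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The SpatialThing keeps its stable identifier; the geometry that describes it
# is versioned, with a time stamp recording when the information was captured.
<http://example.org/id/flood-zone/7>
    gsp:hasGeometry <http://example.org/id/flood-zone/7/geometry/2016-02> .

<http://example.org/id/flood-zone/7/geometry/2016-02>
    dct:issued "2016-02-10"^^xsd:date ;
    dct:replaces <http://example.org/id/flood-zone/7/geometry/2014-11> .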
How to work with data that is such high volume (e.g. sensor data streams) that the data is discarded after a period of time?
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-MovingFeatures, R-Streamable
In most cases, the effective use of information resources requires understanding thematic concepts in addition to the spatial ones; "spatial" is just a facet of the broader information space. For example, when the Dutch Fire Service responded to an incident at a day care center, they needed to evacuate the children. In this case, the 2nd closest alternative day care center was preferred because it was operated by the same organization as the one that was subject of the incident, and they knew who all the children were.
This best practice document provides mechanisms for determining how places and locations are related - but determining the compatibility or validity of thematic data elements is beyond our scope; we're not attempting to solve the problem of different views on the same/similar resources.
Thematic semantics are out of scope for this best practice document. For associated best practices, please refer to [[DWBP]] Metadata, Best Practice 4 Provide structural metadata; and [[DWBP]] Vocabularies, Best Practice 15 Use standardized terms, Best Practice 16 Re-use vocabularies and Best Practice 17 Choose the right formalization level.
See also [[LD-BP]] Vocabularies.
Use spatial semantics for spatial Things
The best vocabulary should be chosen to describe the available spatial things.
Why
Spatial things can be described using several available vocabularies. A robust methodology or an informed decision making process should be adopted to choose the best available vocabulary to describe the entities.
Intended Outcome
Entities and their properties are described using common and reusable vocabularies to increase and improve the interoperability of the descriptions.
Possible Approach to Implementation
There are various vocabularies, such as the Basic Geo vocabulary, [[GeoSPARQL]] and schema.org, that provide common information (semantics) about spatial things. This best practice helps you decide which vocabulary to use. The semantic description of entities and their properties should use existing common vocabularies to increase the interoperability with other descriptions that may refer to the same vocabularies. For this it is required to:
There are different vocabularies that are related to spatial things. This best practice will provide a method for selecting the right vocabulary for your task, in the form of a durable methodology or an actionable selection list.
The Basic Geo vocabulary has a class SpatialThing which has a very broad definition. This can be applicable (as a generic concept) to most of the common use-cases.
For some use cases we might need something more specific.
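For instance, the same place can be typed with the broad Basic Geo class or with a more specific vocabulary, depending on the application (a sketch with illustrative example.org URIs):

@prefix geo:    <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix schema: <http://schema.org/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

# Generic typing that fits almost any spatial use case.
<http://example.org/id/lighthouse/42>
    a geo:SpatialThing ;
    rdfs:label "Example lighthouse" .

# More specific typing, useful when search engines are a target consumer;
# schema.org offers subtypes of schema:Place (e.g. schema:CivicStructure) when needed.
<http://example.org/id/lighthouse/42>
    a schema:Place ;
    schema:name "Example lighthouse" .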
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-MobileSensors, R-MovingFeatures.
We might publish in the BP or a complementary note a set of statements mapping the set of available vocabularies about spatial things. There are mappings available, e.g. GeoNames has a mapping with schema.org: http://www.geonames.org/ontology/mappings_v3.01.rdf
Assert known relationships
Spatial relationships between Things should be specified in forms of geographical, topological and hierarchical links.
Why
It is often more efficient to rely on relationships asserted between SpatialThings rather than rely on solutions such as analysis of geometries to find out that two Things are, for example, at the same place, near each other, or one is inside the other. Describing the spatial relationships between SpatialThings can be based on relationships such as topological, geographical and hierarchical (e.g. partOf) links.
Relating SpatialThings to other spatial data enables, for example, digital personal assistants (e.g. Siri, Cortana) to make helpful suggestions or infer useful information, such as the 'address' and 'description' attributes added to extend the data model of the Geolocation API. See also W3C EMMA 2.0; devices provide location and time-stamp data, and this helps us, for example:
Intended Outcome
This requirement allows explicit spatial relationships between Things to be expressed in the form of geographical, topological and hierarchical links, without the need for subsequent geometric processing and inference to find the spatial links.
Possible Approach to Implementation
How to use spatial functions to find out if spatial things have corresponding geometries is described in Best Practice 3: Working with data that lacks globally unique identifiers for entity-level resources. This best practice describes how to express these discovered relationships between resources about physical and conceptual spatial things.
The asserted spatial semantics can include relationships such as nearby, contains, etc. This best practice requires specifying geometric, topological and social spatial relationships. It is also important to determine which relationships are appropriate for a given case (this is beyond the scope of this BP). This best practice requires:
Social relationships can be defined based on perception, e.g. 'samePlaceAs', nearby, south of. These relationships can also be defined based on temporal concepts such as after, around, etc. In current practice, there is no such property as samePlaceAs to express the social notion of place; such a property would enable communities to unambiguously indicate that they are referring to the same place without getting hung up on the semantic complexities inherent in the use of owl:sameAs or similar.
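A sketch of such assertions, combining a topological relation from [[GeoSPARQL]], a hierarchical relation from schema.org, and a hypothetical 'social' relation in an example namespace (since, as noted above, no standard samePlaceAs property currently exists; all example.org URIs are illustrative):

@prefix gsp:    <http://www.opengis.net/ont/geosparql#> .
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/def/> .

<http://example.org/id/lighthouse/42>
    # Topological: the lighthouse lies within the harbour area.
    gsp:sfWithin <http://example.org/id/harbour/1> ;
    # Hierarchical: administrative containment.
    schema:containedInPlace <http://example.org/id/district/coastal-north> ;
    # 'Social' relation expressed with a hypothetical property in an example vocabulary.
    ex:samePlaceAs <http://example.org/id/the-old-light> .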
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-SamplingTopology, R-SpatialRelationships, R-SpatialOperators.
Which vocabularies out there have social spatial relationships? FOAF, GeoNames, ...
Temporal relationship types will be described here and eventually entered as link relation types into the IANA Link Relations registry, just like the spatial relationships.
In the same sense as with spatial data, temporal data can be fuzzy.
Retain section; point to where temporal data is discussed in detail elsewhere in this document.
The best practices described in this document will incorporate practice from both Observations and Measurements [[OandM]] and W3C Semantic Sensor Network Ontology [[SSN]].
See also W3C Generic Sensor API and OGC Sensor Things API. These are more about interacting with sensor devices.
Provide context required to interpret observation data values
Observation data should be linked to spatial, temporal and thematic information that describes the data.
Why
Processing and interpreting observation and measurement data in many use cases will require other contextual information, including spatial, temporal and thematic information. This information should be specified as explicit semantic data and/or be provided as links to other resources.
Intended Outcome
The contextual data will specify spatial, temporal and thematic data and other information that can assist in interpreting the observation data; this can include information related to quality, observed property, location, time, topic, type, etc.
Possible Approach to Implementation
The context required to interpret observation values will require:
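As a sketch of the kind of contextual links involved, an observation might be described using terms from the [[SSN]] ontology roughly as follows (the example.org URIs are illustrative and the exact terms depend on the version of the ontology used):

@prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#> .

<http://example.org/id/observation/20160310-0900>
    a ssn:Observation ;
    # What was observed, of which feature of interest, and by which sensor.
    ssn:observedProperty  <http://example.org/id/property/air-temperature> ;
    ssn:featureOfInterest <http://example.org/id/lighthouse/42> ;
    ssn:observedBy        <http://example.org/id/sensor/thermometer-3> ;
    ssn:observationResultTime <http://example.org/id/time/20160310T0900Z> .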
How to Test
...
Evidence
Relevant requirements: R-ObservedPropertyInCoverage, R-QualityMetadata, R-SensorMetadata, R-SensingProcedure, R-UncertaintyInObservations.
Describe sensor data processing workflows
Processing steps that are used in the collection and publication of sensor data should be specified as semantic data associated with the sensor observations.
Why
Sensor data often goes through different pre-processing steps before it is made available to end-users. Providing information about the processes and workflows undertaken in the collection and preparation of sensor data helps users understand how the data has been modified and decide whether it is appropriate for a given application/purpose.
Intended Outcome
Explicit semantic descriptions and/or links to external resources that describe the processing workflows that are used in collection and preparation of the sensor data.
Possible Approach to Implementation
Processing workflows are often employed to transform raw observation data into more usable forms. For example, satellite data often undergoes multiple processing steps (e.g. pixel classification, georeferencing) to create usable products. It is important to understand the provenance of the data and how it has been modified in order to determine whether the resulting data product can be used for a given purpose. This will require:
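For instance, the derivation of a processed product from the raw data and the processing activity itself can be linked using the W3C PROV ontology (PROV-O), as in this sketch with illustrative example.org URIs:

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The published, processed product and its derivation from the raw scene.
<http://example.org/id/satellite-product/ndvi-tile-31>
    prov:wasDerivedFrom <http://example.org/id/satellite-scene/raw-31> ;
    prov:wasGeneratedBy <http://example.org/id/activity/georeferencing-run-87> .

# The processing step that produced it.
<http://example.org/id/activity/georeferencing-run-87>
    a prov:Activity ;
    prov:used <http://example.org/id/satellite-scene/raw-31> ;
    prov:endedAtTime "2016-03-10T09:00:00Z"^^xsd:dateTime .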
How to Test
...
Evidence
Relevant requirements: R-ObservationAggregations, R-Provenance.
Relate observation data to the real world
Provide links between the observation and measurement data and the real world objects and/or subject of interest.
Why
Observation and measurement data usually represents a feature of interest related to Things: some thing or phenomenon in the real world that is being observed and measured. A link between the observation and measurement data and real world concepts and their feature of interest helps the data to be interpreted and used more effectively, and makes its relationship with concepts in the real world explicit.
Intended Outcome
It should be possible for data consumers to interpret the meaning of data by referring to real world concepts and features of interest related to Things that are represented by the data.
Possible Approach to Implementation
Real world concept description metadata should include the following information:
How to Test
...
Evidence
Relevant requirements: R-SamplingTopology.
How to work with crowd-sourced observations
Crowd-sourced data should be published as structured data with metadata that allows processing and interpreting it.
Why
Some social media channels do not allow use of structured data. Crowd-sourced data should be published as structured data with explicit semantics and also links to other related data (wherever applicable).
Human-readable and machine-readable metadata should be provided with the crowd-sourced data.
Contextual data related to crowd-sourced data should be available. Quality, trust and density levels of crowd-sourced data vary, and it is important that the data is provided with contextual information that helps people judge the probable completeness and accuracy of the observations.
Intended Outcome
It should be possible for humans to have access to contextual information that describes the quality, completeness and trust levels of crowd-sourced observations. It should be possible for machines to automatically process the contextual information that describes the quality, completeness and trust levels of crowd-sourced observations.
Possible Approach to Implementation
The crowd-sourced data should be published as structured data with metadata that allows processing and interpreting it. The contextual information related to crowd-sourced data may be provided according to the vocabulary that is being developed by the DWBP working group (see [[DWBP]] Best Practice 7: Provide data quality information).
How to Test
...
Evidence
Relevant requirements: R-HumansAsSensors.
How to publish (and consume) sensor data streams
The overall (and common) features of a sensor data stream must be described by metadata
Why
Providing explicit metadata and semantic descriptions about the common features of a sensor data stream allows user agents to avoid adding repetitive information to individual data items, allows sensor data streams to be discovered automatically on the Web, and allows the common features of sensor data streams to be understood (by human users) and interpreted (by machine agents).
Intended Outcome
Possible Approach to Implementation
The sensor data stream metadata should include the following overall features of a dataset:
The information above should be included in both human-understandable and machine-interpretable forms of metadata.
The machine readable version of the discovery metadata may be provided according to models such as the Stream Annotation Ontology (SAO).
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-Streamable, R-TemporalReferenceSystem, R-TimeSeries.
For data to be on the web the resources it describes need to be connected, or linked, to other resources. The connectedness of data is one of the fundamentals of the Linked Data approach that these best practices build upon. The 5-star rating for Linked Open Data asserts that to achieve the fifth star you must "link your data to other data to provide context". The benefits for consumers and publishers of linking to other data are listed as:
Just like any type of data, spatial data benefits massively from linking when publishing on the web.
The widespread use of links within data is regarded as one of the most significant departures from contemporary practices used within SDIs.
Crucially, the use of links is predicated on the ability to identify the origin and target, or beginning and end, of the link. Best Practice 1: Use globally unique identifiers for entity-level resources is a prerequisite.
This section extends [[DWBP]] by providing best practices that are concerned with creating links between the resources described inside datasets. Best practices detailing the use of links to support discovery are provided in the section on discovery of spatial data below.
[[DWBP]] identifies Linkability as one of the benefits gained from implementing the Data on the Web best practices (see [[DWBP]] Data Identification Best Practice 11 Use persistent URIs as identifiers of datasets and Best Practice 12 Use persistent URIs as identifiers within datasets). However, no discussion is provided about how to create the links that use those persistent URIs.
Make your entity-level links visible on the web
The data should be published with explicit links to other resources.
Why
Exposing entity-level links to web applications, user-agents and web crawlers allows the relationships between resources to be found without the data user needing to download the entire dataset for local analysis. Entity-level links provide explicit description of the relationships between resources and enable users to find related data and determine whether the related data is worth accessing. Entity-level links can be used to combine information from different sources; for example, to determine correlations in statistical data relating to the same location.
Intended Outcome
Possible Approach to Implementation
To provide explicit entity-level links:
The use of Linksets needs further discussion as evidence indicates that it is not yet a widely adopted best practice. It may be appropriate to publish such details in a Note co-authored with the DWBP WG.
[[gml]] adopted the [[xlink11]] standard to represent links between resources. At the time of adoption, XLink was the only W3C-endorsed standard mechanism for describing links between resources within XML documents. The Open Geospatial Consortium anticipated broad adoption of XLink over time - and, with that adoption, provision of support within software tooling. While XML Schema, XPath, XSLT and XQuery etc. have seen good software support over the years, this never happened with XLink. The authors of GML note that, given the lack of widespread support, use of XLink within GML provided no significant advantage over and above use of a bespoke mechanism tailored to the needs of GML.
[[void]] provides guidance on how to discover VoID descriptions (including Linksets), both by browsing the VoID dataset descriptions themselves to find related datasets and by using /.well-known/void (as described in [[RFC5785]]).
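For illustration, a third-party set of entity-level links between two spatial datasets could be described as a [[void]] Linkset along these lines (a sketch; the example.org URIs are illustrative):

@prefix void: <http://rdfs.org/ns/void#> .
@prefix gsp:  <http://www.opengis.net/ont/geosparql#> .

<http://example.org/id/linkset/floodzones-parishes>
    a void:Linkset ;
    # The datasets whose resources appear as subjects and objects of the links.
    void:subjectsTarget <http://example.org/id/dataset/flood-zones> ;
    void:objectsTarget  <http://example.org/id/dataset/parishes> ;
    # The predicate used for the entity-level links themselves.
    void:linkPredicate gsp:sfOverlaps .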
How would a (user) agent discover that these 3rd-party link-sets existed? Is there evidence of usage in the wild?
Does the [[beacon]] link dump format allow the use of wild cards / regex in URLs (e.g. URI Templates as defined in [[RFC6570]])?
The examples contain only outline information; further details must be added.
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
Provide meaningful links
When providing a link, a data publisher should opt for a level of formal and meaningful semantics that helps data consumers to decide if the target resource is relevant to them.
Why
Formal and meaningful semantics may help to provide explicit specifications that describe the intended meaning of the relationships between the resources.
Providing details of the semantic relationship inferred by a link enables a data user to evaluate whether or not the target resource is relevant to them. Describing the affordances of the target resource (e.g. what that resource can do or be used for) helps the data user to determine whether it is worth following the link.
Intended Outcome
The links provided for the data should allow different data consumers and applications to determine the relevance of a target resource to them. The links should be precise and explicit.
How do we know what is at the end of a link - and what can I do with it / can it do for me (e.g. the 'affordances' of the target resource).
How to describe the 'affordances' of the target resource?
Possible Approach to Implementation
Ensure that the type of relationship used to link between resources is explicitly identified. Provide resolvable definitions of those relationship types.
Please refer to Best Practice 13: Assert known relationships for details of relationship types that may be used to describe spatial links (e.g. geographical, hierarchical, topological etc.). [[DWBP]] Section 9.9 Data Vocabularies provides further information on use of relationship types described in well-defined vocabularies (see [[DWBP]] Best Practice 16: Use standardized terms and Best Practice 17: Reuse vocabularies).
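The difference between an untyped link and one carrying explicit, resolvable semantics can be sketched as follows (illustrative example.org URIs):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gsp:  <http://www.opengis.net/ont/geosparql#> .

# Weak: the consumer learns only that some related resource exists.
<http://example.org/id/parish/stow>
    rdfs:seeAlso <http://example.org/id/flood-zone/7> .

# Meaningful: the relationship type is explicit and its definition is resolvable.
<http://example.org/id/parish/stow>
    gsp:sfOverlaps <http://example.org/id/flood-zone/7> .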
How to Test
...
Evidence
Relevant requirements: R-Linkability
Link to spatial Things
Create durable links by connecting Spatial Things.
Why
Links enable a network of related resources to be connected together. For those connections to remain useful over a long period of time, both origin and target resources need to have durable identifiers. Typically, it is the SpatialThings that are given durable identifiers (see Best Practice 1: Use globally unique identifiers for entity-level resources) whereas the information resources that describe them (e.g. geometry objects) may be replaced by new versions.
When describing the relationships between related spatial resources, the links should connect SpatialThings.
Intended Outcome
Providing machine-interpretable and/or human-readable durable links between SpatialThings.
This best practice is concerned with the connections between SpatialThings. When describing an individual SpatialThing itself, it is often desirable to decompose the information into several uniquely identified objects. For example, the geometry of an administrative area may be managed as a separate resource due to the large number of geometric points required to describe the boundary.
Also note that in many cases, different identifiers are used for the SpatialThing and the information resource that describes that SpatialThing. For example, within DBpedia, the city of Sapporo, Japan, is identified as http://dbpedia.org/resource/Sapporo, while the information resource that describes this resource is identified as http://dbpedia.org/page/Sapporo. Care should be taken to select the identifier for the SpatialThing rather than the information resource that describes it; in the example above, this is http://dbpedia.org/resource/Sapporo.
Possible Approach to Implementation
Refer to Best Practice 20: Provide meaningful links for further information on providing the semantics for links.
How to link to a resource as it was at a particular time?
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
Link to resources with well-known or authoritative identifiers
Link your spatial resources to others that are commonly used.
Why
In Linked Data, commonly used resources behave like hubs in the network of interlinked resources. By linking your spatial resources to those in common usage it will be easier to discover your resources. For example, a data user interested in air quality data about the place they live might begin by searching for that place in popular data repositories such as GeoNames, Wikidata or DBpedia. Once the user finds the resource that describes the correct place, they can search for data that refers to the identified resource that, in this case, relates to air quality.
Furthermore, by referring to resources in common usage, it becomes even easier to find those resources as search engines will prioritize resources that are linked to more often.
Refer to Best Practice 24: Use links to find related data for more details about how a user might use links to discover data.
Intended Outcome
Data publishers relate their data to commonly used spatial resources using links. Data users can quickly find the data they are interested in by browsing information that is related to commonly used spatial resources.
Possible Approach to Implementation
The link must convey the semantics appropriate to the application (see Best Practice 13: Assert known relationships and Best Practice 20: Provide meaningful links for more information).
A list of sources of commonly used spatial resources is provided in Appendix B. Authoritative sources of geographic identifiers.
How to Test
...
Evidence
Relevant requirements: R-Crawlability, R-Discoverability.
Link your spatial resources to other related resources
Why
Relationships between resources with spatial extent (i.e. size, shape, or position; SpatialThings) can often be inferred from their spatial properties. For example, two resources might occur at the same location, suggesting that they may be the same resource, or one resource might exist entirely within the bounds of another, suggesting some kind of hierarchical containment relationship. However, reconciliation of such resources is complex: it requires some degree of understanding about the semantics of the two, potentially related, resources in order to determine how they are related, if at all.
Rather than expecting that data consumers will have sufficient context to relate resources, it is better for data publishers to assert the relationships that they know about. Not only does this provide data users with clear information about how resources are related, it removes the need for complex spatial processing (e.g. region connection calculus) to determine potential relationships between resources with spatial extent because those relations are already made explicit.
Where possible, existing identifiers should be reused when referring to resources (see Best Practice 2: Reuse existing (authoritative) identifiers when available). However, the use of multiple identifiers for the same resource is commonplace; for example, where data publishers from different jurisdictions refer to the same SpatialThing. In this special case, properties such as owl:sameAs can be used to declare that multiple identifiers refer to the same resource. Note that data published from different sources about the same physical or conceptual resource may provide different viewpoints.
Intended Outcome
A data user can browse between (information about) related resources using the explicitly defined links to discover more information.
In the special case that the property owl:sameAs is used to relate identifiers, information whose subject is one of the respective identifiers can be combined.
A data user should always exercise some discretion when working with data from different sources; for example, to determine whether the data is timely, accurate or trustworthy. Further discussion on this issue is beyond the scope of these best practices.
Possible Approach to Implementation
Given their in-depth understanding of the content they publish, data publishers are well placed to determine the relationships between related resources. Data publishers should analyze their data to identify related resources.
The mechanics of how to decide when two resources are the same are beyond the scope of this best practice. Tools (e.g. OpenRefine and Silk Linked Data Integration Framework) are available to assist with such reconciliation and may provide further insight.
The link must convey the semantics appropriate to the application (see Best Practice 13: Assert known relationships and Best Practice 20: Provide meaningful links for more information).
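A minimal Turtle sketch, assuming hypothetical identifiers, of how such relationships might be asserted explicitly (owl:sameAs for co-referring identifiers; a GeoSPARQL topological relation for containment):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

# Two publishers use different (hypothetical) identifiers for the same SpatialThing
<http://data.example.org/id/river/rhine>
    owl:sameAs <http://data.example.com/waterways/0042> .

# A known containment relationship asserted explicitly, so that consumers do not
# need spatial processing to discover it (GeoSPARQL simple-feature relation)
<http://data.example.org/id/bridge/17>
    geo:sfWithin <http://data.example.org/id/city/bonn> .
```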
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
[[DWBP]] provides best practices discussing the provision of metadata to support discovery of data at the dataset level (see [[DWBP]] section 9.2 Metadata for more details). This mode of discovery is well aligned with the practices used in Spatial Data Infrastructure (SDI) where a user begins their search for spatial data by submitting a query to a catalog. Once the appropriate dataset has been located, the information provided by the catalog enables the user to find a service end-point from which to access the data itself - which may be as simple as providing a mechanism to download the entire dataset for local usage or may provide a rich API enabling the users to request only the required parts for their needs. The dataset-level metadata is used by the catalog to match the appropriate dataset(s) with the user's query.
This section includes a best practice for including spatial information in the dataset metadata, for example, the spatial extent of the dataset.
However, one of the criteria for exposing data on the Web is that it can be discovered directly using search engines such as Google, Bing and Yandex. Current SDI approaches treat spatial data much like books in a library where you must first use the librarian's card catalog index to find the book on the shelf. As for other types of data on the Web, we want to be able to find spatial resources directly; we want to move beyond the two-step discovery approach of contemporary SDIs and find the words, sentences and chapters in a book without needing to check the card catalog first. Not only will this make spatial data far more accessible, it mitigates the problems caused when catalogs have only stale dataset metadata and removes the need for data users to be proficient with the query protocol of the dataset's service end-point in order to acquire the data they need.
In the wider Web, it is links that enable this direct discovery: from user-agents following a hyperlink to find related information to search engines using links to prioritise and refine search results. Whereas discusses the creation of links, this section is largely concerned with the use of those links to support discovery of the SpatialThings described in spatial datasets.
Data related to a spatial dataset, and to its individual data items, should be discoverable by browsing links
Why
In much the same way as the document Web allows one to find related content by following hyperlinks, the links between spatial datasets, SpatialThings described in those datasets and other resources on the Web enable humans and software agents to explore rich and diverse content without the need to download a collection of datasets for local processing in order to determine the relationships between resources.
Spatial data is typically well structured; datasets contain SpatialThings that can be uniquely identified. This means that spatial data is well suited to the use of links to find related content.
The emergency response to natural disasters is often delayed by the need to download and correlate spatial datasets before effective planning can begin. Not only is the initial response hampered, but often the correlations between resources in datasets are discarded once the emergency response is complete because participants have not been able to capture and republish those correlations for subsequent usage.
Intended Outcome
It should be possible for humans to explore the links between a spatial dataset (or its individual items) and other related data on the Web.
It should be possible for software agents to automatically discover related data by exploring the links between a spatial dataset (or its individual items) and other resources on the Web.
It should be possible for a human or software agent to determine which links are relevant to them and which links can be considered trustworthy.
What do we expect user-agents to do with a multitude of links from a single resource? A document hyperlink has just one target; but in data, a resource may be related to many things.
Possible Approach to Implementation
"Back-links" (links from other resources that point at your data) can be traversed to find related information; they also help a publisher assess the value of their content by making it possible to see who is using (or referencing) their data.
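For example (purely illustrative, with a hypothetical third-party dataset URI), an air-quality dataset that refers to the GeoNames identifier for Edinburgh creates exactly the kind of link that can be followed in reverse:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# A third party's (hypothetical) air-quality dataset links to the commonly used
# GeoNames identifier for Edinburgh; following that link in reverse (a "back-link")
# leads from the place to data about it.
<http://air.example.com/dataset/2016/edinburgh>
    a dcat:Dataset ;
    dct:spatial <http://sws.geonames.org/2650225/> .
```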
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
Make your entity-level data indexable by search engines
Search engines should receive a metadata response to an HTTP GET when dereferencing the link target URI.
Why
Current SDI approaches require a 2-step approach for discovery, beginning with a catalog query and then accessing the appropriate data service end-point.
Exposing data on the Web means that it can be discovered directly using search engines. This provides a number of benefits:
Search engines use links and URIs to discover content to index and to prioritize that content within search results; publishing spatial data with crawlable links allows spatial data collections to benefit from the same mechanisms.
Intended Outcome
Spatial data should be discoverable directly using a search engine query.
Spatial data is indexable by search engines; a search engine Web crawler should be able to obtain a descriptive, machine-interpretable metadata response to an HTTP GET when dereferencing the URL of a SpatialThing, and to determine links to related data for the Web crawler to follow.
We make the assertion that data is not really 'on the Web' until it is crawlable.
Possible Approach to Implementation
To make your entity-level data indexable by search engines:
More discussion is required on how to structure meaningful (spatial) queries with search engines (e.g. based on identifier, location, time etc.).
The more spatial datasets are published with structured markup that enables search engine Web crawlers to index their content, the more likely it is that search engines will provide richer and more sophisticated search mechanisms to exploit that markup, further improving users' ability to find spatial data.
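The sketch below is indicative only: it describes a hypothetical entity using schema.org terms, shown here in Turtle for brevity; in practice such markup is usually embedded in the HTML page as JSON-LD, RDFa or Microdata so that crawlers can pick it up:

```turtle
@prefix schema: <http://schema.org/> .

# Hypothetical entity-level description using schema.org terms; in an HTML page
# this would typically be embedded as JSON-LD, RDFa or Microdata for crawlers.
<http://example.org/id/place/edinburgh>
    a schema:Place ;
    schema:name "Edinburgh" ;
    schema:geo [
        a schema:GeoCoordinates ;
        schema:latitude 55.953 ;
        schema:longitude -3.189
    ] ;
    schema:sameAs <http://dbpedia.org/resource/Edinburgh> .
```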
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
Include spatial information in dataset metadata
The description of datasets that have spatial features should include explicit metadata about the spatial information
Why
It is often useful to provide metadata at the dataset level. The dataset is the unit of governance for information, which means that details such as license, ownership and maintenance regime need only be stated once, rather than for every resource description it contains. Data that is not directly accessible due to commercial or privacy arrangements can also be publicized using summary metadata provided for the dataset. [[DWBP]] section 9.2 Metadata provides more details.
For spatial data, it is often necessary to describe the spatial details of the dataset - such as extent and resolution. This information is used by SDI catalog services that offer spatial query to find data.
Intended Outcome
Dataset metadata should include the information necessary to enable spatial queries within catalog services such as those provided by SDIs.
Dataset metadata should include the information required for a user to evaluate whether the spatial data is suitable for their intended application.
Possible Approach to Implementation
To include spatial information in dataset metadata one can, for example, describe the dataset's spatial extent, as sketched below:
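For instance, a dataset's spatial extent could be described along the lines of GeoDCAT-AP, combining DCAT, Dublin Core and the W3C Location Core (locn) vocabulary; the sketch below uses hypothetical URIs and an illustrative bounding box:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .

# Hypothetical dataset description whose spatial extent is given as a bounding box
<http://example.org/dataset/air-quality-2016>
    a dcat:Dataset ;
    dct:title "Air quality measurements 2016 (example)"@en ;
    dct:spatial [
        a dct:Location ;
        # illustrative bounding box expressed as a WKT polygon
        locn:geometry "POLYGON((-3.45 55.85, -3.05 55.85, -3.05 56.00, -3.45 56.00, -3.45 55.85))"^^geo:wktLiteral
    ] .
```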
How to Test
...
Evidence
Relevant requirements: R-Discoverability, R-Compatibility, R-BoundingBoxCentroid, R-Crawlability, R-SpatialMetadata and R-Provenance.
Should content from this section be moved to [[DWBP]] section 9.11 Data Access?
SDIs have long been used to provide access to spatial data via web services; typically using open standard specifications from the Open Geospatial Consortium (OGC). With the exception of the Web Map Service, these OGC Web service specifications have not seen widespread adoption beyond the geospatial expert community. In parallel, we have seen widespread emergence of Web applications that use spatial data - albeit focusing largely on point-based data.
This section seeks to capture the best practices that have emerged from the wider Web community for accessing spatial data via the Web. While [[DWBP]] provides best practices discussing access to data using Web infrastructure (see [[DWBP]] section 9.11 Data Access), this section provides additional insight for publishers of spatial data. In particular, we look at how Application Programming Interfaces (APIs) may be used to make it easy to work with spatial data.
The term API as used here refers to the combination of the set of operations provided and the data content exposed by a particular Web service end-point.
Publish data at the granularity you can support
The granularity of the mechanisms provided to access a dataset should be decided based on available resources
Why
Making data available on the Web requires data publishers to provide some form of access to the data. There are numerous mechanisms available, each providing varying levels of utility and incurring differing levels of effort and cost to implement and maintain. Publishers of spatial data should make their data available on the Web using affordable mechanisms in order to ensure long-term, sustainable access to their data.
Intended Outcome
Data is published on the Web using a mechanism that the data publisher can afford to implement and support throughout the anticipated lifetime of the data.
Possible Approach to Implementation
When determining the mechanism to be used to provide Web access to data, publishers need to assess utility against cost. In order of increasing usefulness and cost:
[[DWBP]] indicates that when data is logically organized as one container but distributed across many URLs, accessing the data in bulk is useful. As a minimum, it should be possible for users to download data on the Web as a single resource; e.g. through bulk file formats.
A data publisher need not incur all the costs alone. Given motivation, third parties (such as the open data community or commercial providers) may provide value-added services that build on simple bulk-download or generalized query interfaces. While establishing such arrangements is beyond the scope of this document, it is important to note that the data publisher should consider the end-to-end data publication chain. For example, one may need to consider including conditions in the usage license about the timeliness of frequently changing data.
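For example (hypothetical URIs; DCAT terms), a dataset description might advertise both a low-cost bulk download and a richer API end-point as alternative distributions:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical dataset offering a simple bulk download alongside a query API
<http://example.org/dataset/addresses>
    a dcat:Dataset ;
    dcat:distribution [
        a dcat:Distribution ;
        dct:title "Bulk download (GeoJSON)"@en ;
        dcat:downloadURL <http://example.org/downloads/addresses.geojson>
    ] , [
        a dcat:Distribution ;
        dct:title "Query API"@en ;
        dcat:accessURL <http://example.org/api/addresses>
    ] .
```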
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
Expose entity-level data through 'convenience APIs'
If you have a specific application in mind for publishing your data, tailor the spatial data API to meet that goal.
Why
When access to spatial data is provided by bulk download or through a generalized query service, users need to understand how the data is structured in order to work effectively with that data. Given that spatial data may be arbitrarily complex, this burdens the data user with significant effort before they can even perform simple queries. Convenience APIs are tailored to meet a specific goal; enabling a user to engage with arbitrarily complex data structures using (a set of) simple queries. As stated in [[DWBP]], an API offers the greatest flexibility and processability for consumers of data; for example, enabling real-time data usage, filtering on request, and the ability to work with the data at an atomic level. If your dataset is large, frequently updated, or highly complex, a convenience API is likely to be helpful.
Intended Outcome
Possible Approach to Implementation
This best practice extends [[DWBP]] Best Practice 26: Use an API.
The API should be targeted to deliver a coherent set of functions that are designed to meet the needs of common tasks. Work with your developer community to determine the tasks they want to perform with the data, or use your experience to infer those tasks. Design your API to help developers achieve those tasks. API operations may be one of:
Include light-weight queries and/or operations in the API that help users start working with your data quickly. The complexity of the data should be hidden from inexperienced users.
The API should offer both machine readable data and human readable HTML that includes the structured metadata required by search engines seeking to index content (see Best Practice 25: Make your entity-level data indexable by search engines for more details).
When designing the API, each operation or query should be focused on achieving a single outcome; this should ensure that the API remains light-weight. Groups of API operations may be chained together (like Unix pipes) in order to complete complex tasks.
When designing APIs, data publishers must be aware of the constraints of operating in a Web environment. Providing access to large datasets, such as coverages, is a particular challenge. The API should provide mechanisms to request subsets of the dataset that are a convenient size for client applications to manage.
APIs will often evolve as more data is added or usage changes; queries or operations may be changed or new ones added. APIs should be versioned in order to insulate downstream client applications from these changes.
Regarding API design, also see [[DWBP]] Best Practice 21: Use Web Standardized Interfaces.
In the geospatial domain there are many WFS services providing data. A RESTful API could be created as a wrapper or shim layer around a WFS service, so that GML content from the WFS is provided as Linked Data or in another Web-friendly format. This approach is similar to the use of Z39.50 in the library community: that protocol is still used, but 'modern' Web sites and Web services are wrapped around it. Adding URIs to the (GML) data exposed by a WFS is straightforward, but making the data 'webby' is harder. There are examples of this approach of creating a convenience API on top of a WFS, but rather than adapting the WFS (GML) output, it may be more effective to provide an alternative 'Linked Data friendly' access path to the data source; that is, creating a new, complementary service end-point, e.g. exposing the underpinning PostGIS database via a SPARQL endpoint (using something like Ontop-spatial) and a Linked Data API.
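As a purely indicative sketch of the kind of 'Linked Data friendly' output such a complementary access path might return for a single feature (hypothetical URIs; GeoSPARQL vocabulary; illustrative coordinates):

```turtle
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# A single feature from a (hypothetical) WFS-backed dataset, re-exposed with a
# dereferenceable URI and a GeoSPARQL geometry instead of raw GML
<http://example.org/api/buildings/4711>
    a geo:Feature ;
    rdfs:label "Building 4711 (example)"@en ;
    geo:hasGeometry [
        a geo:Geometry ;
        geo:asWKT "POINT(5.1214 52.0907)"^^geo:wktLiteral
    ] .
```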
How to Test
...
Evidence
Relevant requirements: R-Compatibility, R-LightweightAPI.
APIs should be self-describing
APIs should provide a discoverable description of their content and of how to interact with them. Ideally this should be a machine-readable description.
Why
Good information about an API lets potential users determine if the API is a good resource to use for a given task and how to use it, as well as letting machines find out how to interact with it.
Intended Outcome
The API description enables a user to construct meaningful queries against the API for data that the API can actually provide.
A user can find the API description; e.g. via referral from the API or via a search engine query.
Possible Approach to Implementation
This best practice extends [[DWBP]] Best Practice 25: Document your API.
API documentation should describe the data(set) it exposes, the API's operations and parameters, the format(s) / payload the API offers, and API versioning.
As a minimum, you should provide a human-readable description of your API so that developers can read about how it works. We recommend providing machine-readable API documentation that can be used by software development tools to help developers build API client software. API documentation should be crawlable by search engines.
The API documentation should be generated from the API code so that the documentation can easily be kept up to date.
Where a parameter domain is bound to a set of values (e.g. value range, spatial or temporal extent, controlled vocabulary etc.), the API documentation or the API itself should indicate the set of values that may be used in order to help users request data that is actually available.
The API documentation should be discoverable from the API itself.
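One possible way to provide a machine-readable description is the Hydra Core Vocabulary (a W3C Community Group draft); the sketch below is indicative only and uses hypothetical URIs:

```turtle
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .

# Hypothetical machine-readable API documentation using the Hydra Core Vocabulary
<http://example.org/api/doc>
    a hydra:ApiDocumentation ;
    hydra:title "Example spatial data API" ;
    hydra:description "Read access to example SpatialThings and their geometries." ;
    hydra:entrypoint <http://example.org/api/> ;
    hydra:supportedClass [
        a hydra:Class ;
        hydra:title "SpatialThing" ;
        hydra:supportedOperation [
            a hydra:Operation ;
            hydra:method "GET" ;
            hydra:description "Retrieve the description of a single SpatialThing."
        ]
    ] .
```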
How to Test
...
Evidence
Relevant requirements: R-Discoverability ... others to be added.
Include search capability in your data access API
If you publish an API to access your data, make sure it allows users to search for specific data.
Should BP "Include search capability in your data access API" move to ?
Why
It can be hard to find a particular resource within a dataset; doing so requires either prior knowledge of the identifier for that resource and/or some intelligent manual guesswork. Users are likely not to know the URI of the resource they are looking for, but may know (at least part of) the name of the resource or some other details. A search capability helps a user determine the identifier for the resource(s) they need using the limited information they have.
Intended Outcome
A user can do a text search on the name, label or other property of an entity that they are interested in to help them find the URI of the related resource.
Possible Approach to Implementation
to be added
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
There are several Best Practices in this document dealing with large datasets and coverages:
Should we discuss scalability issues here?
The Spatial Data on the Web working group is working on recommendations about the use of formats for publishing spatial data on the web, specifically about selecting the most appropriate format. There may not be one most appropriate format: which format is best may depend on many things. This section gives two tables that both aim to be helpful in selecting the right format in a given situation. These tables may in future be merged or reworked in other ways.
The first table is a matrix of the common formats, showing in general terms how well these formats help achieve goals such as discoverability, granularity etc.
Format | Openness | Binary/text | Usage | Discoverability | Granular links | CRS Support | Verbosity | Semantics vocab? | Streamable | 3D Support |
---|---|---|---|---|---|---|---|---|---|---|
ESRI Shape | Open'ish | Binary | Geometry only; attributes and metadata in linked DB files | Poor | In Theory? | Yes | Lightweight | No | No | Yes |
GeoJSON | Open | Text | Geometry and attributes inline array | Good ? | In Theory? | No | Lightweight | No | No | No |
DXF | Proprietary | Binary | Geometry only; attributes and metadata in linked DB files | Poor | Poor | No | Lightweight | No | No | Yes |
GML | Open | Text | Geometry and attributes inline or xlinked | Good ? | In Theory ? | Yes | Verbose | No | No | Yes |
KML | Open | Text | Geometry and attributes inline or xlinked | Good ? | In Theory ? | No | Lightweight | No | Yes? | Yes |
The second table is much more detailed, listing the currently much-used formats for spatial data, and scoring each format on a lot of detailed aspects.
 | GML | GML-SF0 | JSON-LD | GeoSPARQL (vocabulary) | schema.org | GeoJSON | KML | GeoPackage | Shapefile | GeoServices / Esri JSON | Mapbox Vector Tiles |
---|---|---|---|---|---|---|---|---|---|---|---|
Governing Body | OGC, ISO | OGC | W3C | OGC | Google, Microsoft, Yahoo, Yandex | Authors (now in IETF process) | OGC | OGC | Esri | Esri | Mapbox |
Based on | XML | GML | JSON | RDF | HTML with RDFa, Microdata, JSON-LD | JSON | XML | SQLite, SF SQL | dBASE | JSON | Google protocol buffers |
Requires authoring of a vocabulary/schema for my data (or use of existing ones) | Yes (using XML Schema) | Yes (using XML Schema) | Yes (using @context) | Yes (using RDF schema) | No, schema.org specifies a vocabulary that should be used | No | No | Implicitly (SQLite tables) | Implicitly (dBASE table) | No | No |
Supports reuse of third party vocabularies for features and properties | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
Supports extensions (geometry types, metadata, etc.) | Yes | No | Yes | Yes | Yes | No (under discussion in IETF) | Yes (rarely used except by Google) | Yes | No | No | No |
Supports non-simple property values | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No |
Supports multiple values per property | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No |
Supports multiple geometries per feature | Yes | Yes | n/a | Yes | Yes (but probably not in practice?) | No | Yes | No | No | No | No |
Support for Coordinate Reference Systems | any | any | n/a | many | WGS84 latitude, longitude | WGS84 longitude, latitude with optional elevation | WGS84 longitude, latitude with optional elevation | many | many | many | WGS84 spherical mercator projection |
Support for non-linear interpolations in curves | Yes | Only arcs | n/a | Yes (using GML) | No | No | No | Yes, in an extension | No | No | No |
Support for non-planar interpolations in surfaces | Yes | No | n/a | Yes (using GML) | No | No | No | No | No | No | No |
Support for solids (3D) | Yes | Yes | n/a | Yes (using GML) | No | No | No | No | No | No | No |
Feature in a feature collection document has URI (required for ★★★★) | Yes, via XML ID | Yes, via XML ID | Yes, via @id keyword | Yes | Yes, via HTML ID | No | Yes, via XML ID | No | No | No | No |
Support for hyperlinks (required for ★★★★★) | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
Media type | application/gml+xml | application/gml+xml with profile parameter | application/ld+json | application/rdf+xml, application/ld+json, etc. | text/html | application/vnd.geo+json | application/vnd.google-earth.kml+xml, application/vnd.google-earth.kmz | - | - | - | - |
Remarks | comprehensive and supporting many use cases, but requires strong XML skills | simplified profile of GML | no support for spatial data; a GeoJSON-LD is under discussion | GeoSPARQL also specifies related extension functions for SPARQL; other geospatial vocabularies exist, see ??? | schema.org markup is indexed by major search engines | supported by many mapping APIs | focussed on visualisation of and interaction with spatial data, typically in Earth browsers like Google Earth | used to support "native" access to geospatial data across all enterprise and personal computing environments, including mobile devices | supported by almost all GIS | mainly used via the GeoServices REST API | used for sharing geospatial data in tiles, mainly for display in maps |
As per http://www.w3.org/DesignIssues/LinkedData.html item 4, it is useful for people to link their data to other related data. In this context we are most frequently talking about SpatialThings and/or their geometries.
There are many useful sets of identifiers for spatial things and which ones are most useful will depend on context. This involves discovering relevant URIs that you might want to connect to.
Relevant URIs for spatial things can be found in many places. This list gives the first places you should check:
Finding out which national open spatial datasets are available and how they can be accessed currently requires prior knowledge in most cases, because these datasets are often not easily discoverable. Look for national data portals / geoportals such as the Nationaal Georegister (Dutch national register of geospatial datasets) or Dataportaal van de Nederlandse overheid (Dutch national governmental data portal).
As an example, let's take Edinburgh. In some recent work with the Scottish Government, we have an identifier for the City of Edinburgh Council Area - i.e. the geographical area that Edinburgh City Council is responsible for:
http://statistics.gov.scot/id/statistical-geography/S12000036
(note that this URI doesn't resolve yet but it will in the next couple of months once the system goes properly live)
The UK government provides an identifier for Edinburgh and/or information about it that we might want to link to:
http://statistics.data.gov.uk/id/statistical-geography/S12000036
The Scottish identifier is directly based on this one, but the Scottish Government wanted the ability to create something dereferenceable, potentially with additional or different info to the data.gov.uk one. These two are owl:sameAs.
DBpedia also includes a resource about Edinburgh. Relationship: "more or less the same as" but probably not the strict semantics of owl:sameAs.
http://data.ordnancesurvey.co.uk/id/50kGazetteer/81482
This Edinburgh resource is found by querying the OS gazetteer search service for 'Edinburgh' and then checking the labels of the results. OS gives it a type of 'NamedPlace' and provides some coordinates.
http://data.ordnancesurvey.co.uk/id/50kGazetteer/81483
This Edinburgh airport resource was also found by the same OS gazetteer search service for 'Edinburgh'. This is clearly not the same as the original spatial thing, but you might want to say something like 'within' or 'hasAirport'.
http://data.ordnancesurvey.co.uk/id/7000000000030505
This resource is in the OS 'Boundary Line' service that contains administrative and statistical geography areas in the UK. It's probably safe to say the original identifier is owl:sameAs this one.
http://sws.geonames.org/2650225/
This is the Geonames resource for Edinburgh found using the search service: http://api.geonames.org/search?name=Edinburgh&type=rdf&username=demo
Once you have found a place in GeoNames, there are other useful services to find things that are nearby.
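Pulling the Edinburgh example together, the links that might be asserted for the Scottish identifier could look like the following Turtle sketch; the choice of properties follows the narrative above, and geo:sfContains is only one option for expressing the relationship to the airport:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .

<http://statistics.gov.scot/id/statistical-geography/S12000036>
    # same statistical geography, published by different authorities
    owl:sameAs <http://statistics.data.gov.uk/id/statistical-geography/S12000036> ,
               <http://data.ordnancesurvey.co.uk/id/7000000000030505> ;
    # "more or less the same as", but probably not strict owl:sameAs
    skos:closeMatch <http://dbpedia.org/resource/Edinburgh> ,
                    <http://sws.geonames.org/2650225/> ,
                    <http://data.ordnancesurvey.co.uk/id/50kGazetteer/81482> ;
    # the airport is a different SpatialThing; a containment relation is one option
    geo:sfContains <http://data.ordnancesurvey.co.uk/id/50kGazetteer/81483> .
```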
TO DO: cross reference the Best Practices to determine if any do not have an associated Requirement
UC Requirements | Best Practice |
---|---|
R-BoundingBoxCentroid | Best Practice 7: How to describe geometry |
R-Compatibility | Best Practice 28: Expose entity-level data through 'convenience APIs' |
R-Compressible | Best Practice 7: How to describe geometry |
R-CoverageTemporalExtent | Not a Best Practice deliverable |
R-Crawlability | Best Practice 25: Make your entity-level data indexable by search engines |
R-CRSDefinition | Best Practice 7: How to describe geometry |
R-DateTimeDuration | Not a Best Practice deliverable |
R-DefaultCRS | Best Practice 8: Specify Coordinate Reference System for high-precision applications |
R-DifferentTimeModels | Not a Best Practice deliverable |
R-Discoverability | Best Practice 24: Use links to find related data; Best Practice 25: Make your entity-level data indexable by search engines; Best Practice 26: Include spatial information in dataset metadata |
R-DynamicSensorData | Not a Best Practice deliverable |
R-EncodingForVectorGeometry | Best Practice 7: How to describe geometry |
R-ExSituSampling | Not a Best Practice deliverable |
R-Georectification | Not a Best Practice deliverable |
R-GeoreferencedData | Not a Best Practice deliverable |
R-HumansAsSensors | Not a Best Practice deliverable |
R-IndependenceOnReferenceSystems | Best Practice 7: How to describe geometry |
R-LightweightAPI | Not a Best Practice deliverable, but referenced in Best Practice 28: Expose entity-level data through 'convenience APIs' |
R-Linkability | Best Practice 1: Use globally unique identifiers for entity-level resources; Best Practice 19: Make your entity-level links visible on the web; Best Practice 20: Provide meaningful links |
R-MachineToMachine | Best Practice 6: Provide a minimum set of information for your intended application; Best Practice 7: How to describe geometry; Best Practice 9: How to describe relative positions; Best Practice 10: How to describe positional (in)accuracy; Best Practice 11: How to describe properties that change over time; Best Practice 12: Use spatial semantics for spatial Things; Best Practice 13: Assert known relationships; Best Practice 18: How to publish (and consume) sensor data streams; Best Practice 25: Make your entity-level data indexable by search engines |
R-MobileSensors | Not a Best Practice deliverable, but referenced in Best Practice 12: Use spatial semantics for spatial Things |
R-4DModelSpaceTime | Not a Best Practice deliverable |
R-ModelReuse | Not a Best Practice deliverable |
R-MovingFeatures | Not a Best Practice deliverable, but referenced in Best Practice 11: How to describe properties that change over time; Best Practice 12: Use spatial semantics for spatial Things |
R-MultilingualSupport | Best Practice 6: Provide a minimum set of information for your intended application (provision of multi-lingual labels) |
R-MultipleTypesOfCoverage | Not a Best Practice deliverable |
R-NominalObservations | Not a Best Practice deliverable |
R-NominalTemporalReferences | Not a Best Practice deliverable |
R-NonGeographicReferenceSystem | Not a Best Practice deliverable |
R-ObservationAggregations | Not a Best Practice deliverable, but referenced in Best Practice 15: How to describe sensor data processing workflows |
R-ObservedPropertyInCoverage | Not a Best Practice deliverable, but referenced in Best Practice 14: How to provide context required to interpret observation data values |
R-Provenance | Best Practice 15: How to describe sensor data processing workflows; Best Practice 26: Include spatial information in dataset metadata |
R-QualityMetadata | Not a Best Practice deliverable, but referenced in Best Practice 10: How to describe positional (in)accuracy; Best Practice 14: How to provide context required to interpret observation data values |
R-ReferenceDataChunks | Not a Best Practice deliverable, but referenced in Best Practice 27: Publish spatial data at the level of granularity you can support |
R-ReferenceExternalVocabularies | Not a Best Practice deliverable; also see [[DWBP]] Best Practice 16: Re-use vocabularies |
R-SamplingTopology | Not a Best Practice deliverable, but referenced in Best Practice 9: How to describe relative positions; Best Practice 13: Assert known relationships; Best Practice 16: How to relate observation data to the real world |
R-SensorMetadata | Not a Best Practice deliverable, but referenced in Best Practice 14: How to provide context required to interpret observation data values |
R-SensingProcedure | Not a Best Practice deliverable, but referenced in Best Practice 14: How to provide context required to interpret observation data values |
R-SpaceTimeMultiScale | Not a Best Practice deliverable |
R-SpatialMetadata | Best Practice 7: How to describe geometry; Best Practice 26: Include spatial information in dataset metadata |
R-SpatialRelationships | Best Practice 13: Assert known relationships |
R-SpatialOperators | Best Practice 13: Assert known relationships |
R-SpatialVagueness | Not a Best Practice deliverable, but referenced in Best Practice 6: Provide a minimum set of information for your intended application |
R-SSNLikeRepresentation | Not a Best Practice deliverable |
R-Streamable | Best Practice 11: How to describe properties that change over time; Best Practice 18: How to publish (and consume) sensor data streams; Best Practice 27: Publish spatial data at the level of granularity you can support |
R-3DSupport | Best Practice 7: How to describe geometry |
R-TimeDependentCRS | Best Practice 7: How to describe geometry |
R-TemporalReferenceSystem | Not a Best Practice deliverable, but referenced in Best Practice 18: How to publish (and consume) sensor data streams |
R-TemporalVagueness | Not a Best Practice deliverable |
R-TilingSupport | Best Practice 7: How to describe geometry (performance considerations) |
R-TimeSeries | Not a Best Practice deliverable, but referenced in Best Practice 18: How to publish (and consume) sensor data streams |
R-UncertaintyInObservations | Not a Best Practice deliverable, but referenced in Best Practice 14: How to provide context required to interpret observation data values |
R-UpdateDatatypes | Not a Best Practice deliverable |
R-UseInComputationalModels | Not a Best Practice deliverable |
R-Validation | CLARIFICATION required; is this in scope? |
R-ValidTime | Not a Best Practice deliverable |
R-VirtualObservations | Not a Best Practice deliverable |
Coverage: A property of a SpatialThing whose value varies according to location and/or time.
Commercial operator: Search engine or similar company that operates on the Web and generates indexes from the information found in web pages and data published on the Web.
CRS: Coordinate Reference System, a coordinate-based local, regional or global system used to locate geographical entities.
GIS: Geographic information system or geographical information system, a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. (ref. Geographic information system)
Geospatial expert: A person with a high degree of knowledge about SDIs.
Public sector: The part of the economy concerned with providing various government services. (ref. Public sector)
IoT: Internet of Things, the network of physical objects or "things" embedded with electronics, software, sensors, and network connectivity, which enables these objects to be controlled remotely and to collect and exchange data.
SDI: Spatial Data Infrastructure, a data infrastructure implementing a framework of geographic data, metadata, users and tools that are interactively connected in order to use spatial data in an efficient and flexible way. (ref. Spatial Data Infrastructure (SDI))
SpatialThing: Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract areas like cubes (ref. W3C WGS84 Geo Positioning vocabulary (geo))
TemporalThing: Anything with temporal extent, i.e. duration. e.g. the taking of a photograph, a scheduled meeting, a GPS time-stamped trackpoint (ref. W3C WGS84 Geo Positioning vocabulary (geo))
Web developer: A programmer who specializes in, or is specifically engaged in, the development of World Wide Web applications, or distributed network applications that are run over HTTP from a Web server to a Web browser. (ref. Web developer)
WCS: Web Coverage Service, a service offering multi-dimensional coverage data for access over the Internet. (ref. OGC WCS)
WFS: Web Feature Service, a standardized HTTP interface allowing requests for geographical features across the web using platform-independent calls. (ref. OGC WFS)
WMS: Web Map Service, a standardized HTTP interface for requesting geo-registered map images from one or more distributed geospatial databases. (ref. OGC WMS)
WKT: Well Known Text, a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems. (ref. Well-known text)
WPS: Web Processing Service, an interface standard which provides rules for standardizing inputs and outputs (requests and responses) for invoking geospatial processing services, such as polygon overlay, as a Web service. (ref. OGC WPS)
The editors gratefully acknowledge the contributions made to this document by all members of the working group and the chairs: Kerry Taylor and Ed Parsons.