Copyright © 2020 OGC & W3C ® (MIT, ERCIM, Keio, Beihang), W3C liability, trademark, W3C and OGC document use rules apply.
This document advises on best practices related to the publication of spatial data on the Web; the use of Web technologies as they may be applied to location. The best practices presented here are intended for practitioners, including Web developers and geospatial experts, and are compiled based on evidence of real-world application. These best practices suggest a significant change of emphasis from traditional Spatial Data Infrastructures by adopting an approach based on general Web standards. As location is often the common factor across multiple datasets, spatial data is an especially useful addition to the Web of data.
This document is considered to be complete and is expected to be the final release by the Spatial Data on the Web Working Group. The editors would like to thank everyone for their feedback. Comments received during final review triggered a couple of updates since the previous release on 11 May 2017 (see for details). This document is published as a W3C Working Group Note and as an OGC Best Practice in accordance with W3C Policy section 6.8 Publishing a Working Group or Interest Group Note and OGC Policies and Procedures section 8.6 Best Practices Documents.
For OGC: This document defines an OGC Best Practice on a particular technology or approach related to an OGC standard. This document is not an OGC Standard and may not be referred to as an OGC Standard. However, this document is an official position of the OGC membership on this particular technology topic. This document was prepared by the Spatial Data on the Web Working Group (SDWWG) — a joint W3C-OGC project (see charter) — following W3C conventions.
Increasing numbers of Web applications provide a means of accessing data. From simple visualizations to sophisticated interactive tools, there is a growing reliance on data. The open data movement has led to many national, regional and local governments publishing their data through portals. Scientific and cultural heritage data is increasingly published on the Web for reuse by others. Crowd-sourced and social media data are abundant on the Web. Sensors, connected devices and services from domains such as energy, transport, manufacturing and healthcare are now routinely integrated using the Web as a common data sharing platform.
The Data on the Web Best Practices [[DWBP]] provide a set of recommendations that are applicable to the publication of all types of data on the Web. Those best practices cover aspects including data formats, data access, data identifiers, metadata, licensing and provenance.
Within this document, we are concerned with spatial data: data that describes anything with spatial extent (i.e. size, shape or position). Spatial data is also known as location information.
As with the challenges identified in [[DWBP]] for publishing data on the Web in general, there is a lack of consistency in how people publish spatial data; as a result, the full potential of the Web as a data sharing platform is not being realized.
It is not that there is a lack of spatial data on the Web; the maps, satellite and street level images offered by search engines are familiar and there are many more examples of spatial data being used in Web applications.
However, the data that has been published is difficult to find and often problematic to access for non-specialist users. The key problems we are trying to solve in this document are discoverability, accessibility and interoperability. Our overarching goal is to enable spatial data to be integrated within the wider Web of data by providing standard patterns and solutions that help solve these problems.
Our goal in writing this best practice document is to support the practitioners who are responsible for publishing their spatial data on the Web or developing tools to make it easy for others to work with spatial data.
We expect readers to be familiar both with the fundamental concepts of the architecture of the Web [[WEBARCH]] and the generalized best practices related to the publication and usage of data on the Web [[DWBP]].
We aim to provide two primary pathways into these best practices:
In each case, we aim to help them provide incremental value to their data through application of these best practices.
This document provides a wide range of examples that illustrate how these best practices may be applied using specific technologies. We do not expect readers to be familiar with all the technologies used herein; rather that readers can identify with the activities being undertaken in the various examples and, in doing so, find relevant technologies that they are already aware of or discover technologies that are new to them.
All the best practices described in [[DWBP]] are relevant to the publication of spatial data on the Web. Some, such as [[DWBP]] Best Practice 4: Provide data license information, need no further elaboration in the context of spatial data. However, other best practices from [[DWBP]] are further refined in this document to provide more specific guidance for spatial data.
The best practices described below are intended to meet requirements derived from the scenarios in [[SDW-UCR]] that describe how spatial data is commonly published and used on the Web. However, working with spatial data can rapidly become complex, especially for critical decision making where misuse of data can present risks. These best practices are intended to make it easier to work with spatial data on the Web, but do not attempt to cover all aspects of spatial data usage.
In line with the charter, this document provides advice on:
As stated in the charter, discussion of activities relating to rendering spatial data as maps is explicitly out of scope.
The original intent of these best practices was to cover aspects relating to all types of spatial data, for example: the arrangement of cells on a microscope slide; the position of things on the surface of the Earth, the Moon, Mars or other celestial bodies; and the position of planets in the solar system. However, due to resource limitations these best practices deal almost exclusively with geospatial data: data about things that are implicitly or explicitly located relative to the Earth. That said, many of the best practices are applicable to wider spatial data concerns. In the remainder of the document, we simply refer to spatial data for brevity.
We extend [[DWBP]] to cover aspects specifically relating to spatial data, introducing new best practices only where necessary. In particular, we consider the individual resources, or Spatial Things, that are described within a dataset.
In this document, we focus on the needs of data publishers and the developers that provide tools for them. That said, we recognize that value can only be gained from publishing the spatial data when people use it! Although we do not directly address the needs of those users, we ask that data publishers and developers reading this document do not forget about them; moreover, that they always consider the needs of users when publishing spatial data or developing the supporting tools. All our best practices are intended to provide guidance about publishing spatial data to improve ease of use.
Neither the wider topic of spatial data management nor Spatial Data Infrastructures are covered. We assume that your spatial data already exists and will be available from one of the following places:
If your spatial data is managed within a software system, it is likely that you will be able to access that data through one or more of the methods identified above: as structured data from a bulk extract (e.g. a “data dump”), via direct access to the underpinning data repository, or through a bespoke or standards-compliant API provided by the system.
Each of the four starting points outlined above has its own challenges, but working with plain text documents can be particularly tricky, as you will need to parse the natural language to identify the Spatial Things and their properties before you can proceed any further. Natural Language Processing (NLP) is a complex topic in its own right and is beyond the scope of this best practice document. We will assume that you have already completed this step and have parsed any plain text documents into structured data records of some kind.
The best practices described in this document are compiled based on evidence of real-world application in production environments. By ‘production environment’ we mean a case where spatial data has been delivered on the Web with the intention of being used by end users and with a quality level expected from such data. Where the Working Group has identified issues that inhibit the use or interoperability of spatial data on the Web, yet no evidence of real-world application is available, the editors present these issues to the reader for consideration, along with any approaches recommended by the Working Group. Please see for further details. Such recommendations are clearly distinguished as such to ensure that they are not confused with evidence-based best practice.
The normative element of each best practice is the intended outcome. Possible implementations are suggested and, where appropriate, these recommend the use of a particular technology.
We intend these best practices to be durable, that is, to remain relevant for many years to come as specific technologies change. However, to provide actionable guidance (i.e. to give readers the technical information they need to get their spatial data on the Web), we try to strike a balance between durable advice, which is necessarily general, and examples using currently available technologies that illustrate how these best practices can be implemented. We expect that readers will continue to be able to derive insight from the examples even when the specific technologies mentioned are no longer in common use, understanding that technology ‘y’ has replaced technology ‘x’.
There are many situations where the location of a person is very useful, from using a taxi hailing service to geocoding a selfie. Technology makes this location information easy to collect and share. However, spatial data has particular characteristics that make its use potentially more complex. For example, a single location of an anonymous tracked mobile phone may raise few privacy concerns, but the same phone tracked over a few days could provide enough information to identify its user. As with all personally identifiable information, great care must be taken, because the collection, management and security of such information are subject to legal frameworks. We do not attempt to provide guidance on the legal aspects of storing potentially personally identifiable spatial information; expert legal advice should be obtained. In summary: legal and privacy considerations relating to spatial data are out of scope.
This document contains a variety of best practices related to the publication and usage of spatial data on the Web. First, it continues with several more in-depth introductions on Spatial Things and geometry, coverages, spatial relations, coordinate reference systems, linked data, and Spatial Data Infrastructures. After that, the best practices themselves are described.
The following best practices can be found in this document:
This document uses a unique abbreviation ("prefix") for each RDF namespace and XML namespace listed in this section. The namespace IRI can always be determined from the declaration of the namespace abbreviation.
The following RDF namespace prefixes are used within this document. Use of a namespace does not imply endorsement of the associated data platform or vocabulary.
||http://data.ordnancesurvey.co.uk/ontology/admingeo/||Ordnance Survey's Administrative geography and civil voting area ontology|
||http://www.w3.org/ns/adms#||Asset Description Metadata Schema (ADMS) [[VOCAB-ADMS]]|
||http://bag.basisregistraties.overheid.nl/def/bag#||Dutch Government Base Registry Adressen en Gebouwen (BAG)|
||http://www.w3.org/ns/dcat#||Data Catalog Vocabulary (DCAT) [[VOCAB-DCAT]]|
||http://purl.org/dc/terms/||Dublin Core Metadata Initiative (DCMI) Metadata Terms [[DCTERMS]]|
||http://www.w3.org/ns/dqv#||DWBP Data Quality Vocabulary (DQV) [[VOCAB-DQV]]|
||http://xmlns.com/foaf/0.1/||FOAF Vocabulary Specification|
||http://data.europa.eu/930/||GeoDCAT-AP: A geospatial extension for the DCAT application profile for data portals in Europe [[GeoDCAT-AP]]|
||http://data.ordnancesurvey.co.uk/ontology/geometry/||Ordnance Survey's Geometry Ontology|
||http://www.georss.org/georss/||GeoRSS :: Geographically Encoded Objects for RSS feeds [[GeoRSS]], Geo OWL encoding|
||http://www.opengis.net/ont/geosparql#||GeoSPARQL — A Geographic Query Language for RDF Data [[GeoSPARQL]]|
||http://www.opengis.net/ont/gml#||GeoSPARQL — A Geographic Query Language for RDF Data [[GeoSPARQL]]|
||http://www.w3.org/2016/05/ldqd#||DWBP Data Quality Vocabulary (DQV) [[VOCAB-DQV]]: Data quality categories and dimensions|
||http://www.w3.org/ns/locn#||ISA Location Core Vocabulary [[LOCN]]|
||http://data.ordnancesurvey.co.uk/id/||Ordnance Survey Linked Data Platform|
||http://www.w3.org/2002/07/owl#||Web Ontology Language (OWL) [[OWL2-OVERVIEW]]|
||http://data.pdok.nl/def/pdok#||PDOK Data Platform|
||http://qudt.org/schema/qudt#||Quantities, Units, Dimensions and Data Types Ontologies (QUDT)|
||http://www.w3.org/1999/02/22-rdf-syntax-ns#||Resource Description Framework (RDF) [[RDF11-PRIMER]]|
||http://www.w3.org/2000/01/rdf-schema#||RDF Schema vocabulary (RDFS) [[RDF-SCHEMA]]|
||http://statistics.gov.scot/id/statistical-geography/||STATISTICS.GOV.SCOT Geography Linked Data|
||http://purl.org/linked-data/sdmx/2009/attribute#||The RDF Data Cube Vocabulary [[VOCAB-DATA-CUBE]]: Attribute properties|
||http://www.opengis.net/ont/sf#||GeoSPARQL — A Geographic Query Language for RDF Data [[GeoSPARQL]]|
||http://www.w3.org/2004/02/skos/core#||Simple Knowledge Organization System (SKOS) [[SKOS-PRIMER]]|
||http://statistics.data.gov.uk/id/statistical-geography/||Office for National Statistics Geography Linked Data|
||http://www.w3.org/2006/vcard/ns#||vCard Ontology — for describing People and Organizations [[VCARD-RDF]]|
||http://rdfs.org/ns/void#||Describing Linked Datasets with the VoID Vocabulary [[VoID]]|
||http://www.w3.org/2003/01/geo/wgs84_pos#||Basic Geo (WGS 84 lat/long) Vocabulary [[W3C-BASIC-GEO]]|
The following XML namespace prefixes are used within this document. Use of a namespace does not imply endorsement of the associated XML schema.
||http://bag.geonovum.nl||XML schema for the Dutch Government Base Registry Adressen en Gebouwen (BAG)|
||http://www.opengis.net/gml/3.2||Geography Markup Language (GML) Encoding Standard [[GML]]|
||http://www.opengis.net/sampling/2.0||Observations and Measurements — XML Implementation [[OM-XML]]|
||http://www.opengis.net/samplingSpatial/2.0||Observations and Measurements — XML Implementation [[OM-XML]]|
||http://www.opengis.net/waterml/2.0||WaterML 2.0 Encoding Standard [[WaterML]]|
||http://www.w3.org/1999/xlink||XML Linking Language (XLink) Version 1.1 [[XLINK11]]|
In spatial data standards from the Open Geospatial Consortium (OGC) and the 19100 series of ISO geographic information standards from ISO/TC 211, the primary entity is the feature. [[ISO-19101-1-2014]] defines a feature as an “abstraction of real world phenomena”.
This terse definition is a little confusing, so let’s unpack it.
Firstly, it talks about “real world phenomena”; that’s everything from highways to helicopters, parking meters to postcode areas, water bodies to weather fronts and more. These can be physical things that you can touch (e.g. a phone box) or an abstract concept that has spatial extent (e.g. a postcode area). Features can even be fictional (e.g. “Dickensian London”) and may even lack any concrete location information such as the mythical Atlantis.
The key point is that these “features” are things that one talks about in the universe of discourse — which is defined in [[ISO-19101-1-2014]] as the “view of the real or hypothetical world that includes everything of interest”.
Secondly, the definition of feature talks about “abstraction”. Take the example of Eddystone Lighthouse. A helicopter pilot might see it as a “vertical obstruction” and be interested in attributes such as its height and precise location, whereas a sailor may see it as a “maritime navigation aid” and need information about its light characteristic and general location. Depending on one’s set of concerns, only a subset of the attributes of a given “real world phenomenon” are relevant. In the case of Eddystone Lighthouse, we have defined two separate “abstractions”. As is common practice in many information modelling activities, the common sets of attributes for a given “abstraction” are used to define classes. In the parlance of [[ISO-19101-1-2014]], such a class is known as a “feature type”.
However, the term “feature” is also commonly used to mean a capability of a system, application or component. Also, in some domains and/or applications no distinction is made between "feature" and the corresponding real-world phenomena.
To avoid confusion, we adopt the term “Spatial Thing” throughout the remainder of this best practice document. “Spatial thing” is defined in [[W3C-BASIC-GEO]] as “Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract areas like cubes”.
The concept of “Spatial Thing” is considered to include both "real-world phenomena" and their abstractions (e.g. “feature” as defined in [[ISO-19101-1-2014]]). Furthermore, we treat it as inclusive of other commonly used definitions; e.g. Feature from [[NeoGeo]], described as “A geographical feature, capable of holding spatial relations”.
Looking more closely, it is important to note that geometry is typically a property of a Spatial Thing.
In fact, this is only one geometry that may be used to describe Eddystone Lighthouse. Other geometries might include a 2D polygon that defines the footprint of the lighthouse in a horizontal plane and a 3D solid describing the volumetric shape of the lighthouse.
Furthermore, these geometries may be subject to change due to, say, a resurvey of the lighthouse. In such a situation, the geometry object would be updated, but the Spatial Thing that we are talking about is still Eddystone Lighthouse. Following the best practices presented below, we use an HTTP URI to unambiguously identify Eddystone Lighthouse.
We say that the Spatial Thing is disjoint from the geometry object. The Spatial Thing, Eddystone Lighthouse (https://www.trinityhouse.co.uk/lighthouses-and-lightvessels/eddystone-lighthouse), is the “real world phenomenon” about which we want to state facts (such as that the height of its light is 41 meters above sea level) and link to other real world phenomena (for example, that it is located at Eddystone Rocks, Cornwall, another Spatial Thing, identified by GeoNames as http://sws.geonames.org/2650253/).
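To make the separation between a Spatial Thing and its geometry concrete, here is a minimal Python sketch. The `SpatialThing` and `Geometry` classes, their field names and the property name `lightHeightAboveSeaLevel_m` are hypothetical and not taken from any vocabulary discussed in this document; the coordinates are approximate and for illustration only. What it shows is that replacing the geometry object leaves the Spatial Thing, and the URI that identifies it, untouched.

```python
from dataclasses import dataclass, field

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"  # WGS 84 longitude, latitude


@dataclass
class Geometry:
    """A geometry object: a shape expressed in a coordinate reference system."""
    kind: str          # e.g. "Point", "Polygon" or "Solid"
    coordinates: list  # interpretation depends on `kind` and `crs`
    crs: str = CRS84


@dataclass
class SpatialThing:
    """The real-world phenomenon itself, identified by an HTTP URI."""
    uri: str
    label: str
    properties: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)       # relation name -> URI of another resource
    geometries: list = field(default_factory=list)  # zero or more Geometry objects


lighthouse = SpatialThing(
    uri="https://www.trinityhouse.co.uk/lighthouses-and-lightvessels/eddystone-lighthouse",
    label="Eddystone Lighthouse",
    properties={"lightHeightAboveSeaLevel_m": 41},   # hypothetical property name
    links={"locatedAt": "http://sws.geonames.org/2650253/"},
)

# Approximate point position, for illustration only.
original = Geometry("Point", [-4.265, 50.180])
lighthouse.geometries.append(original)

# A resurvey replaces the geometry object (hypothetical new values) ...
lighthouse.geometries = [Geometry("Point", [-4.2651, 50.1801])]
# ... but the Spatial Thing, and the URI that identifies it, are unchanged.
print(lighthouse.uri, lighthouse.geometries[0].coordinates)
```

Keeping geometries as separate objects also lets one Spatial Thing carry several of them, for example a point, a 2D footprint and a 3D solid.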
Sometimes Spatial Things, such as the Sahara, have imprecisely defined locations. These are still considered to be Spatial Things as they have spatial extent; it's just that we can't define a crisp vector boundary for them because there's no consensus about where the edges are. In such cases, a single point is often given that provides the notional center point of the Spatial Thing.
Many aspects of Spatial Things can be described with single-valued, static properties. However, in some applications it is more useful to describe the variation of property values in space and time. Such descriptions are formalized as coverages. Users of spatial information may employ both viewpoints.
So what is a coverage? As defined by [[ISO-19123]] it is simply a data structure that maps points in space and time to property values. For example, an aerial photograph can be thought of as a coverage that maps positions on the ground to colors. A river gauge maps points in time to flow values. A weather forecast maps points in space and time to values of temperature, wind speed, humidity and so forth. One way to think of a coverage is as a mathematical function, where data values are a function of coordinates in space and time.
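The "coverage as a function" view can be sketched directly in Python: each coverage below is a closure that takes coordinates (a grid position or an instant) and returns a value. This is an illustrative sketch under simple assumptions (a regular grid of square cells, and step interpolation between time samples); the helper names and all sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def grid_coverage(origin_x, origin_y, cell_size, values):
    """Coverage over a regular grid: maps (x, y) to the value of the cell that contains it.

    values[row][col] holds one value per square cell; row 0 starts at origin_y.
    """
    def evaluate(x, y):
        col = int((x - origin_x) // cell_size)
        row = int((y - origin_y) // cell_size)
        if 0 <= row < len(values) and 0 <= col < len(values[0]):
            return values[row][col]
        return None  # the position lies outside the coverage's domain
    return evaluate


def time_series_coverage(times, values):
    """Coverage over time: maps an instant to the latest sample at or before it."""
    def evaluate(t):
        i = bisect_right(times, t) - 1
        return values[i] if i >= 0 else None  # None before the first sample
    return evaluate


# Hypothetical rainfall grid (mm) with 10 m cells, and a hypothetical river-gauge flow series (m3/s).
rainfall = grid_coverage(0.0, 0.0, 10.0, [[0.0, 1.2], [3.4, 0.5]])
flow = time_series_coverage(
    [datetime(2017, 5, 1, 0, 0), datetime(2017, 5, 1, 1, 0)], [12.5, 13.1]
)
print(rainfall(15.0, 5.0), flow(datetime(2017, 5, 1, 0, 30)))
```

Returning `None` outside the domain is a design choice for the sketch; real coverage encodings describe the domain explicitly so that clients know where values are defined.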
Although the definition above presents a coverage as a data structure, conceptually it still has spatial extent. For example, the distribution of rainfall measured by a weather radar can be thought of as a coverage whose spatial extent is defined by the limit of the weather radar's range. Similarly, in the hydrology example, where a river gauge measures flow values at regular sampling times, the spatial extent is the monitoring point at which the river gauge is positioned.
We say that a coverage is really just a special type of Spatial Thing with some particular properties. Often, a coverage can be a property of another Spatial Thing; referring back to hydrology, a "river segment" may have a property “flow rate” that is expressed as a coverage.
Spatial Things and coverages may be related in several ways:
A coverage can be defined using three main pieces of information:
Usually, the most complex piece of information in the coverage is the definition of the domain. This can vary quite widely from coverage type to coverage type, as the list above shows. For this reason, coverages are often defined by the spatiotemporal geometry of their domain. You will hear people talking about “multidimensional grid coverages” or “time-series coverages” or “vertical profile coverages” for example.
A spatial relation specifies how an object is located in space in relation to a reference object. Commonly used types of spatial relations are: topological, directional and distance relations.
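Each of these three kinds of spatial relation can be computed from coordinates. The sketch below, assuming simple (longitude, latitude) tuples, uses a planar ray-casting test for a topological relation (containment), an initial-bearing calculation for a directional relation and the haversine formula on a spherical Earth for a distance relation. The function names and the example square are hypothetical; production systems typically rely on a spatial library or spatial database, and on ellipsoidal rather than spherical calculations.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def contains(polygon, point):
    """Topological relation: is `point` inside `polygon`? Planar ray casting on (x, y) pairs."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_m(p, q):
    """Distance relation: great-circle (haversine) distance between (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cardinal_direction(p, q):
    """Directional relation: initial bearing from p to q, reduced to one of eight compass points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return ["N", "NE", "E", "SE", "S", "SW", "W", "NW"][int((bearing + 22.5) // 45) % 8]


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # hypothetical area
print(contains(square, (5.0, 5.0)), distance_m((0.0, 0.0), (0.0, 1.0)),
      cardinal_direction((0.0, 0.0), (1.0, 0.0)))
```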
One of the most fundamental aspects of publishing spatial data, data about location, is how to express and share the location in a consistent way. In many cases where you are publishing data for use by the wider Web community the use of latitude and longitude coordinates (Lat and Long) is most appropriate. As latitude and longitude coordinates are global they are well suited to many applications: perfect for locating your favorite coffee shop, geocoding a photograph or capturing an augmented reality Pokemon hiding in your local park.
Users of spatial data are often interested in the third dimension too: vertical elevation (or altitude). For most situations, we can consider elevation to be the vertical distance above (or below) mean sea level. The elevation is most often expressed in meters (but this can vary between CRS definitions) and is provided as a third value in a coordinate position.
As with everything to do with spatial data, things can get more complicated. One of the most common problems occurs because not all Coordinate Reference Systems (CRS) agree on how to express latitude and longitude coordinates. Some CRS order the coordinates Lat/Long while others use Long/Lat; some use decimal degrees while others use degrees, minutes and seconds (dms). Axis order mistakes can mean the difference between, say, a position in the Netherlands or somewhere in Somalia, while encoding coordinates in decimal degrees when dms is expected can lead to positional errors on the kilometer scale.
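Both kinds of mistake are easy to reproduce. The Python sketch below, a minimal illustration with hypothetical values, converts degrees, minutes and seconds to decimal degrees, estimates the error when a compact DMS value such as `51.2532` (meaning 51°25'32") is misread as decimal degrees, and shows that an axis-order swap produces coordinates that still look perfectly valid. The figure of 111.195 km per degree of latitude is a spherical approximation.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds plus a hemisphere letter to signed decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


KM_PER_DEGREE_LAT = 111.195  # approximate length of one degree of latitude (spherical Earth)

# 51°25'32"N written compactly as "51.2532" and then wrongly read as decimal degrees.
correct = dms_to_decimal(51, 25, 32, "N")
misread = 51.2532
error_km = abs(correct - misread) * KM_PER_DEGREE_LAT  # an error of tens of kilometers

# Axis-order mix-up: an approximate (lat, lon) pair for Amsterdam read as (lon, lat).
amsterdam = (52.37, 4.89)
lon, lat = amsterdam  # wrong assumption about the order
# Both values remain within valid ranges, so no range check can catch the mistake.
still_valid = -90 <= lat <= 90 and -180 <= lon <= 180
print(round(correct, 6), round(error_km, 1), still_valid)
```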
Therefore, it is very important to provide explicit information to your users about how coordinates are encoded. For example, this snippet of results from the Google Geocoding API makes explicit which is the latitude and which is the longitude coordinate.
Other mechanisms include using a data format that specifies how the coordinates are included (such as GeoJSON [[RFC7946]] where section 4. Coordinate Reference System specifies coordinate order of longitude and latitude using units of decimal degrees) or by having your data explicitly reference the CRS definition you're using. See for more information.
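For example, a small helper that builds a GeoJSON Point Feature can take latitude and longitude as keyword-only arguments, so callers cannot confuse their order, and then write the position in the longitude, latitude order that RFC 7946 requires. This is a sketch: `geojson_point` is a hypothetical helper, and the position is the WGS 84 example 51.425708, -0.331841 used elsewhere in this section.

```python
import json


def geojson_point(*, lat, lon, properties=None):
    """Build a GeoJSON (RFC 7946) Point Feature; positions are written as [longitude, latitude]."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties or {},
    }


feature = geojson_point(lat=51.425708, lon=-0.331841, properties={"name": "example"})
print(json.dumps(feature))
```

Naming the arguments at the call site moves the axis-order decision into one place instead of relying on every caller to remember it.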
Now let's get a little more technical and discuss coordinate reference systems themselves.
Latitude, longitude and elevation measurements express a position on the surface of the Earth. But to define this position we need to state where we are making the measurements from (e.g. the equator, the prime meridian and the approximated surface of the Earth, or geoid) and consider the shape of the earth (a flattened sphere with lumps and bumps, but for convenient mathematical operations, usually approximated to an ellipsoid). This information is used to define the geodetic datum which provides the basis of every coordinate reference system.
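The reference ellipsoid is fixed by two defining numbers: the semi-major axis a and the flattening f (usually given as 1/f). The Python sketch below derives the semi-minor axis and the first eccentricity squared for the WGS 84 ellipsoid and converts latitude, longitude and ellipsoidal height into Earth-centred Cartesian coordinates. It is illustrative only: a geodetic datum involves more than an ellipsoid (for example, how it is positioned and oriented relative to the Earth), and the function names are hypothetical.

```python
import math

# Defining parameters of the WGS 84 ellipsoid: semi-major axis (m) and inverse flattening.
WGS84_A = 6378137.0
WGS84_INV_F = 298.257223563


def ellipsoid(a, inv_f):
    """Derive the semi-minor axis b and the first eccentricity squared e2 from a and 1/f."""
    f = 1.0 / inv_f
    b = a * (1.0 - f)
    e2 = f * (2.0 - f)
    return b, e2


def geodetic_to_ecef(lat_deg, lon_deg, h, a=WGS84_A, inv_f=WGS84_INV_F):
    """Convert latitude, longitude (degrees) and ellipsoidal height (m) to Earth-centred x, y, z (m)."""
    _, e2 = ellipsoid(a, inv_f)
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    nu = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)  # prime vertical radius of curvature
    x = (nu + h) * math.cos(lat) * math.cos(lon)
    y = (nu + h) * math.cos(lat) * math.sin(lon)
    z = (nu * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


b, e2 = ellipsoid(WGS84_A, WGS84_INV_F)
print(b, e2)
```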
Where your geospatial data has geometries defined as points, lines and polygons (i.e. vector data), publishing in the World Geodetic System 1984 (WGS 84) Coordinate Reference System will help people to integrate data with mass-market Web applications, tools and libraries, thereby increasing the usefulness of that data for a large community of potential users. And since WGS 84 is also used by GPS, it's handy for all those mobile apps!
Most people can stop reading now, but of course there are cases where WGS 84 is not appropriate — for example, when working with geo-referenced imagery.
In many parts of the world location data has been collected using local coordinate systems that are specific to particular countries or regions. These local coordinate systems may use projected measurements defined on a flat, two-dimensional surface (which are easier to use for calculating distances than angular measurements and are essential when making topographic maps).
So, it may be that you have information in a projected CRS, rather than global latitude and longitude — what should you do? You can publish data as is in one of these many projected CRS, but you need to tell users which particular CRS is being used. A good directory of Coordinate Reference Systems is maintained by the International Association of Oil and Gas Producers: the EPSG Geodetic Parameter Dataset.
You can re-project your coordinates to WGS 84 using one of the many tools available online. For example, the location 516076, 170953 in British National Grid (EPSG:27700) coordinates is -0.331841, 51.425708 in WGS 84 Long/Lat. This conversion is a useful step as it makes your data more accessible to global users. So, if you can, it is helpful to publish data in both local (projected) and global coordinates.
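The British National Grid conversion (516076, 170953 to roughly -0.331841, 51.425708) can be reproduced without a GIS package. The sketch below follows the formulas published by Ordnance Survey: an inverse transverse Mercator projection on the Airy 1830 ellipsoid gives OSGB36 latitude and longitude, and a single seven-parameter Helmert transformation then moves the point to WGS 84. That Helmert step is only accurate to a few meters; precise work uses the Ordnance Survey OSTN15 transformation instead. Function names are hypothetical, and heights are ignored.

```python
import math

# Ordnance Survey National Grid: Airy 1830 ellipsoid and projection constants.
AIRY_A, AIRY_B = 6377563.396, 6356256.909
F0 = 0.9996012717                                    # scale factor on the central meridian
LAT0, LON0 = math.radians(49.0), math.radians(-2.0)  # true origin
E0, N0 = 400000.0, -100000.0                         # false origin offsets (m)
WGS84_A, WGS84_B = 6378137.0, 6356752.314245
# Helmert OSGB36 -> WGS 84: tx, ty, tz (m), scale (ppm), rx, ry, rz (arc-seconds).
HELMERT = (446.448, -125.157, 542.060, -20.4894, 0.1502, 0.2470, 0.8421)


def _meridional_arc(lat, n):
    """Meridian arc length from the true-origin latitude to `lat`, scaled by F0."""
    dl, sl = lat - LAT0, lat + LAT0
    return AIRY_B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * dl
        - (3 * n + 3 * n**2 + 21 / 8 * n**3) * math.sin(dl) * math.cos(sl)
        + (15 / 8 * n**2 + 15 / 8 * n**3) * math.sin(2 * dl) * math.cos(2 * sl)
        - 35 / 24 * n**3 * math.sin(3 * dl) * math.cos(3 * sl)
    )


def bng_to_osgb36(easting, northing):
    """Inverse transverse Mercator: National Grid (E, N) -> OSGB36 (lat, lon) in radians."""
    e2 = 1 - AIRY_B**2 / AIRY_A**2
    n = (AIRY_A - AIRY_B) / (AIRY_A + AIRY_B)
    lat = (northing - N0) / (AIRY_A * F0) + LAT0
    m = _meridional_arc(lat, n)
    while abs(northing - N0 - m) >= 1e-5:  # iterate until within 0.01 mm
        lat += (northing - N0 - m) / (AIRY_A * F0)
        m = _meridional_arc(lat, n)
    s, c, t = math.sin(lat), math.cos(lat), math.tan(lat)
    nu = AIRY_A * F0 / math.sqrt(1 - e2 * s * s)
    rho = AIRY_A * F0 * (1 - e2) / (1 - e2 * s * s) ** 1.5
    eta2 = nu / rho - 1
    de = easting - E0
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu**3) * (5 + 3 * t**2 + eta2 - 9 * t**2 * eta2)
    ix = t / (720 * rho * nu**5) * (61 + 90 * t**2 + 45 * t**4)
    x_term = 1 / (c * nu)
    xi = 1 / (6 * c * nu**3) * (nu / rho + 2 * t**2)
    xii = 1 / (120 * c * nu**5) * (5 + 28 * t**2 + 24 * t**4)
    xiia = 1 / (5040 * c * nu**7) * (61 + 662 * t**2 + 1320 * t**4 + 720 * t**6)
    lat_out = lat - vii * de**2 + viii * de**4 - ix * de**6
    lon_out = LON0 + x_term * de - xi * de**3 + xii * de**5 - xiia * de**7
    return lat_out, lon_out


def _to_cartesian(lat, lon, a, b):
    e2 = 1 - b**2 / a**2
    nu = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    return (nu * math.cos(lat) * math.cos(lon), nu * math.cos(lat) * math.sin(lon),
            nu * (1 - e2) * math.sin(lat))


def _from_cartesian(x, y, z, a, b):
    e2 = 1 - b**2 / a**2
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))
    for _ in range(10):  # fixed-point iteration; converges in a few steps
        nu = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        lat = math.atan2(z + e2 * nu * math.sin(lat), p)
    return lat, math.atan2(y, x)


def bng_to_wgs84(easting, northing):
    """Return (longitude, latitude) in decimal degrees on WGS 84 (height taken as zero)."""
    lat, lon = bng_to_osgb36(easting, northing)
    x, y, z = _to_cartesian(lat, lon, AIRY_A, AIRY_B)
    tx, ty, tz, s_ppm, rx, ry, rz = HELMERT
    s = 1 + s_ppm * 1e-6
    rx, ry, rz = (math.radians(r / 3600) for r in (rx, ry, rz))
    x2 = tx + s * x - rz * y + ry * z
    y2 = ty + rz * x + s * y - rx * z
    z2 = tz - ry * x + rx * y + s * z
    lat_w, lon_w = _from_cartesian(x2, y2, z2, WGS84_A, WGS84_B)
    return math.degrees(lon_w), math.degrees(lat_w)


lon, lat = bng_to_wgs84(516076, 170953)
print(f"{lon:.6f}, {lat:.6f}")  # longitude, latitude on WGS 84
```

When publishing both sets of coordinates, label each one with its CRS (for example EPSG:27700 for the grid coordinates and WGS 84 for the longitude, latitude pair) so that users never have to guess.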
However, satellite imagery consists of data pixels projected onto a flat surface (i.e. raster data), so it is commonplace for raster-type spatial data to be expressed in a projected coordinate reference system to avoid the unnecessary (and potentially costly) conversion of pixel positions to angular measurements. Web Mercator (EPSG:3857), a global projected CRS, is used in the majority of Web-mapping applications and has therefore become the de facto Web-standard CRS for publishing raster data.
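Web Mercator treats the Earth as a sphere whose radius is the WGS 84 semi-major axis, which keeps the projection formulas short. The sketch below converts between longitude, latitude and EPSG:3857 meters, and maps a pixel of a global square tile pyramid directly to projected meters without any angular calculation. The 256-pixel tile size and top-left pixel origin are assumptions typical of Web-mapping tile schemes rather than requirements of EPSG:3857, and the function names are hypothetical.

```python
import math

R = 6378137.0                # sphere radius: the WGS 84 semi-major axis
MAX_LAT = 85.0511287798066   # latitude where the square Web Mercator world is cut off


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Mercator as used by EPSG:3857: degrees -> meters."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # the poles would map to infinity, so clamp
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse projection: meters -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def pixel_to_web_mercator(col, row, zoom, tile_size=256):
    """Map a global pixel position at a zoom level to EPSG:3857 meters (origin at top-left)."""
    world = 2 * math.pi * R
    res = world / (tile_size * 2 ** zoom)  # meters per pixel at this zoom level
    return -world / 2 + col * res, world / 2 - row * res


print(lonlat_to_web_mercator(-0.331841, 51.425708))
```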
Re-projecting to a better-known CRS is often a necessary step if you are publishing data in the form of engineering or Computer Aided Design (CAD) drawings of a new building or road layout for example. Usually these drawings are made using a very local coordinate reference system for the site itself, so the data will need to be reprojected to “fit” with existing data.
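Fitting a site drawing to a national grid is often done with a 2D similarity (conformal) transformation: a translation, a rotation and a uniform scale. With two control points whose site and grid coordinates are both known, that transformation is fully determined; the sketch below solves it using Python complex numbers. The control-point values and the function name are hypothetical, and real workflows usually use more control points with a least-squares fit and report the residuals.

```python
import cmath
import math


def fit_similarity(local1, grid1, local2, grid2):
    """Solve a 2D similarity transform exactly from two control points.

    Points are treated as complex numbers, so grid = t + k * local, where
    k = scale * exp(i * rotation) and t is the translation.
    """
    l1, l2 = complex(*local1), complex(*local2)
    g1, g2 = complex(*grid1), complex(*grid2)
    k = (g2 - g1) / (l2 - l1)
    t = g1 - k * l1

    def to_grid(x, y):
        g = t + k * complex(x, y)
        return g.real, g.imag

    return to_grid, abs(k), math.degrees(cmath.phase(k))


# Hypothetical control points: two site markers whose national-grid positions were surveyed.
to_grid, scale, rotation_deg = fit_similarity(
    (0.0, 0.0), (516000.0, 170900.0),
    (100.0, 0.0), (516086.6025, 170950.0),
)
print(scale, rotation_deg, to_grid(0.0, 100.0))
```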
So, we are now at the point where almost everyone publishing spatial data on the Web can stop reading. But for those with specific requirements concerning high precision locations, there are a few more topics that need to be mentioned.
If you need to be able to measure in terms of a few centimeters or less then things are more complicated. With this level of precision required you need to consider a more sophisticated model of the shape of the Earth and consider plate tectonics.
For these more complex use cases other reference systems with alternative geodetic datums are used. The geodetic datum can be thought of as the model of the Earth's surface over which the coordinate reference system is applied. Different datums use different models for the precise shape and size of the Earth to provide more accurate horizontal or vertical measurements at different positions on the globe (because depending on your location, different ellipsoids will provide a better approximation of the local Earth's surface — but this is at the expense of a poorer match elsewhere).
While WGS 84 provides a reasonable fit at all points on the Earth's surface, many other datums are defined for improved fit within a regional or national area. For example, in Europe a system called ETRS89 (EPSG:4258) can be used instead of WGS 84, while in North America a similar system called NAD-83 (EPSG:4269) is used. So, it might be that you have measurements made using these reference systems. Here the best practice is once more to be explicit in describing the CRS used, but also to be careful re-projecting to different systems as required accuracy may be lost.
Finally, another issue is that points on the surface of the earth are actually moving relative to the coordinate system, due to geologic processes. You may think this is of interest only to geologists, but when I tell you that Australia has moved around 1.5m since the framework was last reset 20 years ago, and remind you that we are entering the age of self-driving cars, then you will probably think again. Re-calculating the datum from time to time, or maybe continuously such as in the case of the dynamic New Zealand Geodetic Datum (NZGD2000), really does matter for some applications. See for more information.
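The arithmetic behind this concern is simple. The sketch below takes the figures from the text (about 1.5 m in about 20 years, roughly 7.5 cm per year), uses a constant-velocity model to propagate a coordinate from one epoch to another, and estimates how quickly a static coordinate drifts beyond a centimeter-level tolerance. The east/north split of the velocity and the 2 cm tolerance are hypothetical, and real dynamic datums such as NZGD2000 use deformation models rather than a single constant velocity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochCoordinate:
    """A position in a local east/north frame (m) that is valid at `epoch` (decimal year).

    Velocities are in meters per year; a constant velocity is a simplification.
    """
    east: float
    north: float
    epoch: float
    v_east: float
    v_north: float

    def at(self, epoch):
        """Propagate the position to another epoch using the linear velocity model."""
        dt = epoch - self.epoch
        return EpochCoordinate(self.east + self.v_east * dt, self.north + self.v_north * dt,
                               epoch, self.v_east, self.v_north)


def years_until_drift_exceeds(tolerance_m, speed_m_per_year):
    """How long a static coordinate stays within `tolerance_m` of the moving point."""
    return tolerance_m / speed_m_per_year


speed = 1.5 / 20  # about 1.5 m over about 20 years, from the text
# Hypothetical split of that speed into east and north components (a 3-4-5 triangle).
p = EpochCoordinate(east=0.0, north=0.0, epoch=2000.0, v_east=0.6 * speed, v_north=0.8 * speed)
later = p.at(2020.0)
months = years_until_drift_exceeds(0.02, speed) * 12  # hypothetical 2 cm tolerance
print(later, round(months, 1))
```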
Finding, accessing and using data disseminated through spatial data infrastructures (SDI) based on OGC Web services is difficult for non-expert users. There are several reasons, including:
However, spatial data infrastructures are a key component of the broader spatial data ecosystem. Such infrastructures typically include policies, workflows and tools related to the management and curation of spatial datasets, and provide mechanisms to support the rich set of capabilities required by the expert community. Our goal is to help spatial data publishers build on these foundations to enable the spatial data from SDIs to be fully integrated with the Web of data.
When your starting point is a spatial data infrastructure, you should at least read the following best practices. These provide the most important extra steps that should be taken to bring spatial data from spatial data infrastructures to the Web:
The rest of the best practices provide more detail on specific aspects of publishing spatial data on the Web, such as metadata, geometries, CRS information, versioned data, and so on.
Spatial data, like any other data, should be published on the Web. By this we mean more than providing spatial data file downloads or services; for data to be on the Web, the resources it describes need to be identified using HTTP URIs, be published in such a way that they are indexable by search engines, and be connected, or linked, to other resources. This makes the data easy to find and easy to access for non-specialist users: the spatial data becomes integrated within the wider Web of data.
As a first step in publishing your spatial data on the Web, you should assign a URI to each of your datasets (see [[DWBP]] Best Practice 9: Use persistent URIs as identifiers of datasets).
However, we need to look inside the datasets at the resources described within your data. If you want these resources to be visible within the Web’s information space, by which we mean that others can refer to or talk about those resources, then they must also be assigned URIs (see [[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets). These URIs are like 'Web-scale foreign keys' that enable information from different sources to be stitched together.
The primary topics of any spatial dataset are Spatial Things — anything from physical things like people, places and post boxes to abstractions such as administrative areas. Each Spatial Thing will be described by a set of attributes and usually at least one geometry. How your spatial data is structured will depend on the vocabulary or data model you use (see for further details on vocabulary choice). This will determine the types of entities that, along with the Spatial Things themselves, are important enough to be given identifiers so that statements can be made about them. Geometry objects are an example of an entity that is often assigned a unique identifier so that they can be referenced or reused.
Given the widespread use of the Hypertext Transfer Protocol (HTTP) on the Web, we SHOULD use HTTP URIs to identify resources in spatial data.
This is a fundamentally different approach from that of typical data publication today, where the dataset is (often) globally identified but individual Spatial Things ("features" in SDI parlance) are assigned local identifiers which may or may not be persistent.
Use globally unique persistent HTTP URIs for Spatial Things
Use stable HTTP URIs to identify Spatial Things, re-using commonly used URIs where they exist and it is appropriate to do so.
To publish spatial data on the Web, we need to stitch the Spatial Things and their corresponding entities into the Web’s information space; contributing to the Web of data. First: [[WEBARCH]] Good Practice: Identify with URIs states that "agents should provide URIs as identifiers for resources". Second: the 5 Star Data scheme states: "★★★★ use URIs to denote things, so that people can point at your stuff".
Resources identified with HTTP URIs can be specified as the target of links within the Web’s global information space, enabling information to be related, combined and referred to. This is the fundamental basis of 5★ Linked Data: "★★★★★ link your data to other data to provide context".
The HTTP URIs used to identify Spatial Things need to be stable or persistent so that relationships that link them to other resources don’t break.
Spatial Things become part of the Web’s global information space, enabling them to be linked with other Spatial Things and other resources, and enabling those links to be durable. In other words, spatial data becomes part of the Web of Data.
[[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets provides directly applicable guidance when identifying resources. It advises:
However, we need to look a little more closely at how and where to apply that guidance.
The Web of data is made up of subjects and objects; the things we talk about and the things we refer to. For example, we could say that Anne Frank's House (the subject) is within the Municipality of Amsterdam (the object). In RDF [[RDF11-PRIMER]], this looks like:
When considering HTTP URIs for objects (i.e. the targets of our hyperlinks), it makes sense to reuse existing identifiers. After all, you are trying to stitch your spatial data into the Web so that you can "link your data to other data" and achieve a ★★★★★ rating! Organizations such as DBpedia, GeoNames and government mapping and cadastral authorities (which publish national registers of addresses, buildings, etc.) are good sources of stable, authoritative URIs. The steps described for discovering existing vocabularies [[LD-BP]] can be readily adapted to find more. For more details about how you might link to these authoritative identifiers, see the guidance on linking later in this document.
However, HTTP URIs for subjects (i.e. the resources that we want to make statements about) can be trickier. If you are working purely with data, then you can reuse existing URIs minted by other authorities for your subject URIs. But publishing spatial data on the Web means that the URIs for each Spatial Thing should dereference to Web pages or data resources that provide useful information. An HTTP request will be directed to a host Web server, identified by the internet domain name (or IP address) in the requested URI. If you use a URI with an internet domain name where you have no control over how the Web server behaves, then there is no way for your statements to be included in the Web server's response.
To take control of how information about Spatial Things is presented, data publishers need to assign their subject Spatial Things HTTP URIs from an internet domain name where they have authority over how the Web server responds. Typically, this means minting new HTTP URIs. It is also worth considering that the use of a particular internet domain may reinforce the authority of the information served. For example, a URI for Anne Frank's House is https://monumentenregister.cultureelerfgoed.nl/monuments?MonumentId=4296. The use of the internet domain registered to the Cultural Heritage Agency of the Netherlands gives the definition authenticity.
When minting your own URIs, [[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets cites the advice from GS1's SmartSearch Implementation Guideline [[GS1]] which suggests that your URIs should include the type of resource that is being identified to help human readability. Also, given the need for the HTTP URIs for Spatial Things to be used throughout their lifetime (and perhaps beyond) you should give some thought to designing a URI that is persistent.
[[DWBP]] Best Practice 9: Use persistent URIs as identifiers of datasets cites the European Commission's Study on Persistent URIs [[PURI]] as a good source from which to gain insights about designing persistent URIs.
When an HTTP URI is dereferenced, the server will respond with a sequence of bytes: by its nature, HTTP can only serve information resources such as Web pages or JSON [[RFC7159]] documents. Yet a Spatial Thing is actually a real or conceptual phenomenon — a lake is made from water not information! Using a single URI to refer to both the Spatial Thing and the page/document that describes the Spatial Thing introduces a URI collision. This can impose a cost in communication due to the effort required to resolve ambiguities. [[URLs-in-data]] has more to say on this subject, including recommending URI design patterns that enable differentiation between the Spatial Thing and the page/document that describes it.
However, in most cases using a single URI for both Spatial Thing and the page/document is simpler to implement and meets the expectations of most end-users. As stated in [[WEBARCH]] section 2.2.3 Indirect Identification, identifiers are commonly used in this way. There is no obligation to distinguish between the Spatial Thing and the page/document unless your application requires this.
HTTP URIs for Spatial Things should not include any indication of the data format used to encode the page/document, as this may change as your systems evolve. That said, you may wish to provide a set of complementary resources that specify a particular format as part of your content negotiation strategy. For example, the URI http://sws.geonames.org/7645281/about.rdf dereferences to provide an RDF/XML encoding of the information about Uluru in the Northern Territory of Australia (http://sws.geonames.org/7645281/).
[[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets notes that URIs can be long. You may need to define identifiers that are locally unique within your spatial dataset and provide a mechanism to programmatically convert each local identifier to a URI. For example, the Metadata Vocabulary for Tabular Data [[TABULAR-METADATA]] achieves this using URI Templates as described in [[RFC6570]].
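As a minimal sketch, assuming Python and only the level-1 (simple string expansion) expressions of [[RFC6570]], the following shows how a local identifier could be converted to a URI using a template. The template `http://data.example.org/id/building/{id}` and the identifiers are hypothetical; a complete RFC 6570 implementation also handles operators such as `{+var}` and `{?var}`.

```python
import re
from urllib.parse import quote

# Level-1 RFC 6570 expressions only, e.g. {id}; operators such as {+id} or {?q} are not handled.
_EXPRESSION = re.compile(r"\{([A-Za-z0-9_]+)\}")


def expand(template: str, variables: dict) -> str:
    """Expand {name} expressions, percent-encoding all but unreserved characters."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            return ""  # undefined variables expand to the empty string in RFC 6570
        # safe="" means '/', ' ', '?', etc. inside a local identifier are percent-encoded
        return quote(str(variables[name]), safe="")

    return _EXPRESSION.sub(substitute, template)


# Hypothetical template: a resource-type segment ("building") plus the local identifier.
BUILDING_TEMPLATE = "http://data.example.org/id/building/{id}"

print(expand(BUILDING_TEMPLATE, {"id": "0363100012165490"}))
print(expand(BUILDING_TEMPLATE, {"id": "B 12/3"}))
```

Publishing the template alongside the data, as [[TABULAR-METADATA]] does with its `aboutUrl` property, lets consumers reconstruct the full URIs from the short identifiers themselves.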
It is also good practice to use a redirection service to hide complex and potentially changing service end-point URLs, such as those of a Web Feature Service [[WFS]], behind well-designed URIs. This means that users don’t need to be aware of the complexities of the API, or of changes in endpoint URIs or API versions, to request information about a particular Spatial Thing. For example, the URI http://data.example.org/aan/id/perceel/aan.2528 could be used as a proxy for the corresponding WFS GetFeature request.
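A redirection service of this kind could be sketched as follows in Python. The WFS endpoint, the feature type name `aan:perceel`, the use of WFS 2.0 key-value parameters and the assumption that the local identifier (`aan.2528`) is also the WFS resource identifier are all hypothetical. A real deployment would wire `resolve` into a Web server or reverse proxy.

```python
import re
from urllib.parse import urlencode

# Hypothetical WFS endpoint and feature type name: substitute your own service details.
WFS_ENDPOINT = "https://wfs.example.org/geoserver/wfs"
FEATURE_TYPE = "aan:perceel"

# Stable URI pattern, following the example URI http://data.example.org/aan/id/perceel/aan.2528
STABLE_PATH = re.compile(r"^/aan/id/perceel/(?P<local_id>[A-Za-z0-9._-]+)$")


def resolve(path: str):
    """Return an (HTTP status, Location header) pair for a request path on data.example.org."""
    match = STABLE_PATH.match(path)
    if match is None:
        return 404, None
    query = urlencode({
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": FEATURE_TYPE,
        "RESOURCEID": match.group("local_id"),  # assumes local id == WFS resource id
    })
    # 303 See Other: the stable URI names the parcel; the WFS response describes it.
    return 303, f"{WFS_ENDPOINT}?{query}"


status, location = resolve("/aan/id/perceel/aan.2528")
print(status, location)
```

The sketch answers with 303 See Other, the status commonly used in linked data when a URI identifies a thing rather than a document. A 302 or 307 redirect may be equally suitable if you use a single URI for both the Spatial Thing and its description, as discussed above. Because only the redirect target changes when the service is upgraded, the published URI stays stable.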
Finally, while it is simple to use a query-pattern URL, e.g. http://example.org/museums?q=http://sws.geonames.org/6618987/, to serve information about a resource identified with a URI from a third-party internet domain, such URLs are unsuitable as persistent identifiers. More often than not, your intended users will dereference the "official" URI, e.g. http://sws.geonames.org/6618987/. That said, this kind of search operation does provide a useful mechanism to find particular Spatial Things.
Check that, within the data, Spatial Things such as countries, regions and people are referred to by HTTP URIs or by short identifiers that can be converted to HTTP URIs. Ideally, dereferencing the URIs should return information about the Spatial Thing; however, the URIs have value as globally scoped variables whether they dereference or not.
Search engines are the common starting point for people looking for content on the Web. However, as far as search engines are concerned, something is only 'on the Web' if it has an HTTP URI and when this URI is dereferenced, information is returned (usually in the form of a Web page).
Make your spatial data indexable by search engines
Search engines should be able to crawl spatial data on the Web and index Spatial Things for direct discovery by users.
In SDIs, information about spatial datasets is published as authoritative metadata records and collated in Web-based catalogues. This approach causes several problems:
It is widely understood that search engines are the common starting point for people looking for content on the Web. By publishing spatial data in a way that enables search engine crawlers to index spatial datasets, including each Spatial Thing, the fidelity of search results should improve. Users will be able to search directly for specific entities rather than having to look for a dataset and then search through it; e.g. to search for "Anne Frank’s House" (https://g.co/kg/m/02s5hd) rather than looking for a dataset about "Cultural Heritage in Amsterdam" and hoping that it contains a reference to what you’re interested in.
Information about spatial datasets and things is indexed by search engines.
Users can find Spatial Things using common search engines.
In general, you need to:
The Web page for the dataset is an entry point for humans to browse and for search engines to crawl your data. This landing page should provide descriptive metadata that helps users evaluate whether the dataset meets their needs (see [[DWBP]] Best Practice 2: Provide descriptive metadata), and may provide links to other service end-points, APIs or tools that will help a user work with the dataset. When metadata for datasets has already been created, e.g. to create a record in a metadata catalogue or to describe the data available from a service end-point, this information should be reused, publishing it in a Web-friendly way that humans and Web crawlers can consume. The landing page should be indexable by the search engines so that it can be discovered too!
To enable humans and Web-crawlers to find HTML pages for the Spatial Things, the "landing page" needs to include hyperlinks that can be followed. Where you have a larger collection of Spatial Things, you should support paging through the collection.
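Paging could be supported along the following lines. This is a minimal Python sketch. The `?page=` query parameter, the page size of 100 and the collection URL are assumptions. The links are keyed by the IANA relation types `self`, `first`, `prev`, `next` and `last`, and rendered as plain HTML anchors that people and crawlers can follow.

```python
from math import ceil


def page_links(collection_url, total_items, page, page_size=100):
    """Return hyperlinks for one page of a collection, keyed by IANA relation type."""
    last = max(1, ceil(total_items / page_size))
    if not 1 <= page <= last:
        raise ValueError(f"page must be between 1 and {last}")
    links = {
        "self": f"{collection_url}?page={page}",
        "first": f"{collection_url}?page=1",
        "last": f"{collection_url}?page={last}",
    }
    if page > 1:
        links["prev"] = f"{collection_url}?page={page - 1}"
    if page < last:
        links["next"] = f"{collection_url}?page={page + 1}"
    return links


def page_items(items, page, page_size=100):
    """Return the items listed on a 1-based page."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


def render_links(links):
    """Render the links as HTML anchors that people and crawlers can follow."""
    return "\n".join(f'<a rel="{rel}" href="{href}">{rel}</a>' for rel, href in links.items())


# Hypothetical collection of 250 buildings, listed 100 per page.
print(render_links(page_links("http://data.example.org/id/building/", 250, 2)))
```

Each listed item would itself be a hyperlink to the Spatial Thing's own page, so that following `next` repeatedly reaches every Spatial Thing in the collection.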
You may also consider using Sitemaps to direct the Web crawler. For larger datasets, multiple sitemaps can be provided and grouped by a sitemap index file. If a dataset contains millions of Spatial Things (e.g. a building dataset with national coverage), generating and maintaining the sitemaps may require a custom implementation to keep them synchronized with the set of Spatial Things.
For very large datasets, paging through thousands of pages is not useful for a human either. Consider supporting filtering and/or organizing the Spatial Things into subsets.
A pre-condition for this best practice is the best practice "Use globally unique persistent HTTP URIs for Spatial Things", as persistent identifiers are essential to support reliable indexing and linking. Traditionally, spatial datasets have not been maintained with stable identifiers for Spatial Things, but to share spatial data on the Web, stable identifiers are a must. Sharing spatial data is more than "just" making the dataset available on the Web.
Each Web-page, and the hyperlinks used to relate the Spatial Things to the dataset landing page, can likely be generated programmatically from the data you hold about the Spatial Thing, either directly from the data or by using an API that makes the data available on the Web.
It is important to keep in mind that the HTML representations should not be designed mainly for search engines; they should present the data in a clear and understandable way to human users. The page about the Spatial Thing should be useful to a user and encourage others to link to the page when they share other information about the Spatial Thing. This typically will also improve the ranking of these pages in search results.
In addition to exposing the spatial data as linked HTML Web pages, indexing by search engines can be further enhanced by incorporating a description of the Spatial Thing as structured markup (in particular [[MICRODATA]] or [[JSON-LD]] annotations using [[SCHEMA-ORG]]), as this enables the search engines to draw more detailed conclusions about your resource. It is important to note that this is helpful not only to search engines, but also to other tools that want to understand more about the semantics of the resource, for example, its location.
Location information about a Spatial Thing is typically provided using a geometry (GeoCoordinates or GeoShape) or a PostalAddress. [[SCHEMA-ORG]] coordinates are restricted to WGS 84 with longitude and latitude. Supported geometry types are points, line strings, polygons, boxes and circles.
By using [[SCHEMA-ORG]] annotations, search engines and others can connect location information with other information, e.g. about the nature of the Spatial Thing, opening hours, contact details, etc.
The use of [[SCHEMA-ORG]] for spatial data is in its early days and should be understood as an "emerging practice".
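As an illustration, the following Python sketch generates [[JSON-LD]] markup using the [[SCHEMA-ORG]] types `Place`, `GeoCoordinates` and `PostalAddress`, for embedding in a Spatial Thing's Web page. The helper functions are hypothetical, and the coordinates given for Anne Frank's House are approximate and for illustration only.

```python
import json


def place_jsonld(uri, name, latitude, longitude, street=None, locality=None, country=None):
    """Build a schema.org Place description; coordinates are WGS 84 decimal degrees."""
    place = {
        "@context": "https://schema.org",
        "@type": "Place",
        "@id": uri,
        "name": name,
        "geo": {"@type": "GeoCoordinates", "latitude": latitude, "longitude": longitude},
    }
    address = {"streetAddress": street, "addressLocality": locality, "addressCountry": country}
    address = {key: value for key, value in address.items() if value is not None}
    if address:
        place["address"] = {"@type": "PostalAddress", **address}
    return place


def as_script_tag(data):
    """Serialize for a <script type="application/ld+json"> element.

    Escaping "</" as "<\\/" keeps a value such as "</script>" from closing the element early;
    "\\/" is a valid JSON escape for "/".
    """
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


anne_frank_house = place_jsonld(
    "https://monumentenregister.cultureelerfgoed.nl/monuments?MonumentId=4296",
    "Anne Frank House",
    52.3752,  # approximate latitude, illustration only
    4.8840,   # approximate longitude, illustration only
    street="Prinsengracht 263",
    locality="Amsterdam",
    country="NL",
)
print(as_script_tag(anne_frank_house))
```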
The Web-pages should also provide a mechanism to download data in the formats you decide to support. [[DWBP]] Best Practice 14: Provide data in multiple formats provides guidance.
Typically, multiple formats for a resource are supported using two mechanisms: HTTP content negotiation, and format-specific file extensions added to the resource URI, such as ".xml" or ".ttl". Content negotiation is the standard mechanism of HTTP, while the format-specific URIs enable the use of clickable links to the resource in a specific format.
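The two mechanisms can be combined as in the following minimal Python sketch. A request to the format-neutral URI is negotiated from the `Accept` header, while a URI ending in a known extension selects that format directly. The set of offered formats, the extensions other than ".xml" and ".ttl", and the simplified precedence handling (the most specific matching media range determines the q-value) are assumptions. A production server would normally rely on its Web framework's negotiation support.

```python
# Offered representations of each Spatial Thing, in order of preference on ties.
# The ".html" and ".geojson" extensions are assumptions for illustration.
OFFERED = {
    "text/html": ".html",
    "text/turtle": ".ttl",
    "application/xml": ".xml",
    "application/geo+json": ".geojson",
}


def parse_accept(header):
    """Return (media range, q-value) pairs from an Accept header; other parameters are ignored."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges


def specificity(media_range, media_type):
    """2 = exact match, 1 = type/* match, 0 = */* match, -1 = no match."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_type.split("/")[0] == media_range[:-2]:
        return 1
    return 0 if media_range == "*/*" else -1


def negotiate(accept_header, default="text/html"):
    """Pick the offered media type with the highest q-value; None means 406 Not Acceptable."""
    if not accept_header:
        return default
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in OFFERED:
        matches = [(specificity(r, media_type), q) for r, q in ranges]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range determines the q-value
        if q > best_q:
            best, best_q = media_type, q
    return best


def format_for_path(path):
    """Map a format-specific URI such as /id/building/42.ttl to its media type."""
    for media_type, extension in OFFERED.items():
        if path.endswith(extension):
            return media_type
    return None
```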
Search engines may also index resource representations in formats other than HTML.
Using a Web browser,
Monitor the search engines' search consoles for progress in indexing your Web pages and their structured data. If any errors are reported, try to fix them.
Bind Spatial Things into the Web of data using links to other resources, providing sufficient information for a user to determine whether the target resource specified in a link will be of use.
The 5★ rating for Linked Open Data asserts that to achieve the fifth star you must "link your data to other data to provide context". The benefits for consumers and publishers of linking to other data are listed as:
There is always a cost to traversal of a link, even if it is just a few milliseconds delay and the need to parse a few hundred or thousand bytes returned in response to an HTTP request. In many cases, such as when dealing with large datasets and complex queries, the costs incurred from traversing a link may be significant in terms of time and data volumes. Before a user or software agent decides to traverse a link, they should be able to determine whether acquisition of the target resource, or data about the target resource, will support their application goals. For example, what format can one expect the response in, what type of resource is the target and how is that target related to the source resource?
Links can be identified and traversed by humans and software agents.
Sufficient information is provided to help humans and software agents determine whether traversal of a given link meets their goals.
The ground-rules for linking spatial data are the same as for any type of data.
Use formats that support Web linking (as defined in [[WEBARCH]] section 4.4 Hypertext)
Earlier in this document we explained that linked data requires only that the formats used to publish data support Web linking. In other words, linking spatial data does not automatically mean the use of RDF [[RDF11-PRIMER]]; links can also be created, for example, using [[GML]], HTML or [[JSON-LD]]. The two key points from [[WEBARCH]] are:
The examples used in this best practice illustrate some of the data formats and mechanisms that support Web linking.
Follow the principles for 4★ — Linked [[WEB-DATA]]
Always use global identifiers when linking between documents, so that link identifiers can be taken out of context and shared globally.
Links should be typed (explicitly or implicitly), so that clients can decide which link to follow when they are traversing a Web of interlinked resources to reach application goals.
This example, using HTTP Link headers (as defined in [[RFC5988]]), illustrates the use of the IANA link relations registry [[IANA-RELATIONS]] to define the link type. According to the IANA registry, predecessor-version points to a resource containing the predecessor version in the version history (as defined in [[RFC5829]] "Link Relation Types for Simple Version Navigation between Web Resources").
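For illustration, here is a minimal Python sketch that writes and reads HTTP Link header values of the kind described above, using the RFC 5829 relation types `predecessor-version` and `latest-version`. The version URIs are hypothetical. The parser only handles the simple `<URI>; rel="type"` form that the writer produces; a general parser must also cope with multiple space-separated relation types and additional parameters.

```python
import re


def format_link_header(links):
    """Serialize (target URI, relation type) pairs as a Link header value."""
    return ", ".join(f'<{uri}>; rel="{rel}"' for uri, rel in links)


# Matches the simple form <URI>; rel="type" (quotes optional); other parameters are not supported.
_LINK = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(value):
    """Return a {relation type: target URI} mapping."""
    return {rel: uri for uri, rel in _LINK.findall(value)}


# Hypothetical URIs for the current and previous versions of a Spatial Thing description.
header = format_link_header([
    ("http://data.example.org/doc/building/42/v1", "predecessor-version"),
    ("http://data.example.org/doc/building/42", "latest-version"),
])
print("Link:", header)
```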
Make links as specific as possible. If the linked resource supports fragment identification, and the link logically should be to a fragment of the resource (and not just the resource as a whole), try to use fragment identifiers when possible.
Check that hyperlinks are distinguishable within the data — a string-literal that happens to contain a URL is insufficient.
Check that hyperlinks use global identifiers, preferably HTTP URIs, to identify the link target.
Check that hyperlinks use typed relationships, and that the definition of the link relation type can be located in order to determine how to interpret the hyperlink.
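The checks above could be partly automated. The following Python sketch assumes links have already been extracted from the data as dictionaries with `href` and `rel` keys (a hypothetical structure), so it cannot by itself detect a URL hidden in a plain string literal. It knows only a small excerpt of the IANA registry, and it accepts unregistered relation types when they are absolute URIs, which is how [[RFC5988]] expects extension relation types to be expressed.

```python
from urllib.parse import urlparse

# A small excerpt of the IANA link relations registry; extend as needed.
REGISTERED_RELATIONS = {
    "alternate", "describedby", "first", "last", "latest-version",
    "next", "predecessor-version", "prev", "self", "successor-version",
}


def check_link(link):
    """Return a list of problems with a link given as {"href": ..., "rel": ...}."""
    problems = []
    href = link.get("href")
    if not isinstance(href, str) or not href:
        problems.append("link target is missing")
    else:
        target = urlparse(href)
        if target.scheme not in ("http", "https") or not target.netloc:
            problems.append(f"link target is not a global HTTP URI: {href!r}")
    rel = link.get("rel")
    if not rel:
        problems.append("link relation type is missing")
    elif rel not in REGISTERED_RELATIONS and not urlparse(rel).scheme:
        problems.append(f"relation type {rel!r} is neither registered nor a URI")
    return problems
```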
The best practices in this section take [[DWBP]] as a basis and further refine them to provide more specific guidance for spatial data.
This section does not elaborate on formats for publishing spatial data on the Web. The formats are basically the same as for publishing any other data on the Web: XML [[XML11]], JSON [[RFC7159]], CSV [[RFC4180]], RDF [[RDF11-PRIMER]], etc. Refer to [[DWBP]] section 8.8 Data Formats for more information and best practices. A comparison of spatial data formats for the Web is also provided in this document.
That being said, it is important to publish your spatial data with clear semantics, i.e. to provide information about the contents of your data. The primary use case for this is where you have information about a collection of Spatial Things and you want to publish precise information about their attributes and how they are inter-related. Another use case is the publication on the Web of a dataset that has a spatial component in a form that search engines will understand.
Depending on the format you use, the semantics may already be described in some form. For example, in GeoJSON [[RFC7946]] this description is present in the specification. When using JSON, it is possible to add semantics using a [[JSON-LD]] @context object. For providing semantics to search engines, using [[SCHEMA-ORG]] is a good option, as explained earlier in this document. In a linked data setting, the attributes of a Spatial Thing can be described using existing vocabularies, where each term has a published definition. If you can't find a suitable existing vocabulary term, you should create your own and publish a clear definition for the new term, linking it to commonly used existing ones if possible, because this increases its usefulness. An overview and high-level comparison of RDF vocabularies / OWL ontologies for spatial data is provided elsewhere in this document. We do not recommend one vocabulary, because such a recommendation would not remain durable as vocabularies are released or amended.
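To illustrate the @context approach mentioned above, the following Python sketch attaches a hypothetical context to a plain JSON record so that its keys map to [[SCHEMA-ORG]] and RDF Schema terms. The record, its URI and the `term_iri` helper are illustrative assumptions. `term_iri` only mimics how a flat context maps top-level keys to IRIs; it is no substitute for a JSON-LD processor such as the PyLD library.

```python
import json

# Hypothetical context: unprefixed keys resolve against schema.org via @vocab,
# "id"/"type" alias the JSON-LD keywords, and "seeAlso" holds a link rather than a string.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "id": "@id",
    "type": "@type",
    "seeAlso": {"@id": "http://www.w3.org/2000/01/rdf-schema#seeAlso", "@type": "@id"},
}

# A plain JSON description of a Spatial Thing (URI and values are illustrative).
record = {
    "id": "http://data.example.org/id/landmark/uluru",
    "type": "Place",
    "name": "Uluru",
    "seeAlso": "http://sws.geonames.org/7645281/",
}

document = {"@context": CONTEXT, **record}


def term_iri(key, context):
    """Return the IRI (or keyword) a top-level key maps to under a flat context."""
    definition = context.get(key)
    if isinstance(definition, dict):
        return definition["@id"]
    if isinstance(definition, str):
        return definition
    return context.get("@vocab", "") + key


print(json.dumps(document, indent=2))
for key in record:
    print(key, "->", term_iri(key, CONTEXT))
```

The JSON stays usable by consumers that ignore `@context`, while linked data tools can interpret each key as a published vocabulary term.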
[[DWBP]] section 8.9 Data Vocabularies provides guidance on the topic of data modelling; determining which concepts and relationships should be used to describe your area of interest, something usually done by domain experts. Data publishers should not attempt to guess all the purposes for which someone might use or reference their data — ending up with a super-complex data model that tries to cover every possible use case. Instead, data publishers should try to help data consumers make informed decisions about the best way to use the data by providing good metadata.
In most cases, the effective use of information resources requires understanding thematic concepts in addition to the spatial ones; "spatial" is just a facet of the broader information space. For example, when the Dutch Fire Service responded to an incident at a day care center, they needed to evacuate the children. In this case, the second-closest alternative day care center was preferred because it was operated by the same organization as the one that was the subject of the incident, and that organization knew who all the children were.
This best practice document provides mechanisms for determining how places and locations are related — but determining the compatibility or validity of thematic data elements is beyond our scope; we're not attempting to solve the problem of different views on the same/similar resources.
That said, there is one aspect of thematic semantics that must be mentioned. The most important semantic statement you can make when publishing spatial data, or any data, is to specify the type of a resource. For Spatial Things, there are several types that define "spatialness" (for examples in a linked data context, see the vocabularies table elsewhere in this document). But you should also consider non-spatial aspects when designating the type of a Spatial Thing. For example, should a fire incident occur at Amsterdam Central railway station, it might seem sensible for the Municipal Fire Department to designate a type such as Building or Station (the Dutch Government Base Registry's description of Amsterdam Central railway station, identified as https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209, designates both of these types). However, the Fire Department is concerned with a fire incident, not the railway station itself. The fire incident is a Spatial Thing (it has spatial extent) but it is not the station; for example, the fire may spread to adjacent buildings. The Fire Department might designate their Spatial Thing as having type FireIncident or similar. Advice on how to assign a persistent identifier to the fire incident is provided in the best practice "Use globally unique persistent HTTP URIs for Spatial Things"; other guidance in this document covers how one might relate the fire incident to other coincident Spatial Things such as Amsterdam Central railway station.
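As a sketch of this typing advice, the following Python code writes N-Triples statements that give the fire incident its own URI and type, distinct from the station. The incident URI and the `http://vocab.example.org/fire#` terms (`FireIncident`, `occurredAt`) are invented for illustration; only the station URI comes from the example above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FIRE = "http://vocab.example.org/fire#"  # hypothetical vocabulary namespace


def iri(value):
    return f"<{value}>"


def literal(text, lang=None):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def triple(subject, predicate, obj):
    return f"{subject} {predicate} {obj} ."


incident = iri("http://data.example.org/id/incident/2017-0042")  # hypothetical identifier
station = iri("https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209")

statements = [
    triple(incident, iri(RDF_TYPE), iri(FIRE + "FireIncident")),
    triple(incident, iri(RDFS_LABEL), literal("Fire at Amsterdam Central railway station", "en")),
    triple(incident, iri(FIRE + "occurredAt"), station),
]
print("\n".join(statements))
```

Because the incident and the station are separate resources, the incident can later be related to adjacent buildings as well, without asserting anything new about the station's own type.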
Use spatial data encodings that match your target audience
Represent spatial data in a way that matches the needs of the target audiences.
Spatial data is used by a range of user communities, each with their own purposes, knowledge and preferred tools. Data publishers should consider which communities and purposes they want to serve and make appropriate choices for the approach to encoding data. In general terms, data usefulness is increased when it can be used for more purposes. This might involve providing data in several different formats. (See [[DWBP]] Best Practice 14: Provide data in multiple formats.)
Spatial data can be used easily and reliably by the target users.
A high-level objective of these best practices is to highlight approaches that data publishers can take to maximize the ease of use of their spatial data via the Web and hence present data in a way that meets the needs of as wide a range of users and applications as possible.
One way of classifying the applications of spatial data is as follows:
Each of these has different needs: often it will be possible or desirable to support several of these application groups.
The main objective is to encode data in a way that recipients can easily decode and understand. To decide how to do this, you need to consider which purpose(s) and which audience(s) you are aiming to serve and the characteristics of the data that you want to share. For example:
In the best practice "Use globally unique persistent HTTP URIs for Spatial Things" we recommend the use of HTTP URIs as a way of assigning identifiers to Spatial Things. The data publisher should offer the ability to look up ('dereference') such a URI to find out useful information about that Spatial Thing in human-readable form (as well as in machine-readable formats; see the discussion below on data integration). Each Spatial Thing therefore gets its own Web page. In addition, it might be useful to have Web pages about groups of Spatial Things, but the 'page per thing' approach enables fine-grained linking of information.
To promote discovery of such Web pages in search engines, each page should contain a clear text description of what it is, ideally in a way that distinguishes it from pages about other similar Spatial Things. Including metadata using the [[SCHEMA-ORG]] vocabulary, embedded as [[MICRODATA]] or [[HTML-RDFa]] markup, or as [[JSON-LD]] in the <head> section of the page, can provide additional information to search engines to support more precise indexing. See the best practice "Make your spatial data indexable by search engines" for a more detailed discussion.
A common way of specifying the location of a building is to use its postal address. Most spatial applications require an address to be converted into spatial coordinates, a process known as geocoding, so that its location can be marked on a map or compared with the locations of other things. Although a publisher could leave geocoding to the data user, ideally the publisher should take responsibility for it, as they are in a better position to check the accuracy of the results. Different ways of specifying addresses can sometimes lead to errors in the geocoding process.
Other approaches can be taken to specifying location. What3words is an example of a service that assigns an alternative kind of address to a location: in this case, a sequence of three common words associated with a 3m by 3m square on the ground. Every location can be given such an address, and what3words also provides a means to relate the address to latitude and longitude coordinates. As with conventional addresses, converting to coordinates is necessary for many spatial data applications (e.g. to calculate the distance between points or whether a point is inside a region), but the conversion is more reliable and precise than geocoding a conventional address.
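Once locations are expressed as coordinates, the two operations mentioned above, the distance between points and whether a point lies inside a region, can be sketched in Python as follows. This assumes WGS 84 longitude/latitude and a spherical Earth (the haversine formula, so distances are approximate). It also uses a planar even-odd ray-casting test, which is adequate for small regions that neither cross the antimeridian nor enclose a pole. Real applications would typically use a spatial library or database instead.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS 84 points, treating the Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_in_polygon(lon, lat, ring):
    """Even-odd ray casting on a ring of (lon, lat) vertices, treating coordinates as planar."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray through the point
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
```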
A common application of spatial data on the Web is delivering map data in a tiled form, suitable for display in zoomable 'slippy maps'. The OGC's Web Map Tile Service [[WMTS]] is an established standard for doing this. Other approaches in common use include MBTiles or 'Tile layers' in the Google Maps APIs.
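Tiled maps address each tile by zoom level, column and row. The following Python sketch computes the tile containing a WGS 84 point in the XYZ Web Mercator scheme commonly used by slippy-map libraries. WMTS describes tiles through its own tile matrix sets, so treat the correspondence with any particular WMTS service as an assumption to verify. The tile URL template is hypothetical.

```python
import math

MAX_LATITUDE = 85.0511287798  # Web Mercator cannot represent latitudes beyond about ±85.05°


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the XYZ tile containing a WGS 84 point."""
    lat = max(min(lat, MAX_LATITUDE), -MAX_LATITUDE)
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # clamp so that lon = 180 (or the latitude limit) stays on the last tile
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Hypothetical tile server URL template.
TILE_URL = "https://tiles.example.org/{z}/{x}/{y}.png"

x, y = lonlat_to_tile(4.8840, 52.3752, 15)
print(TILE_URL.format(z=15, x=x, y=y))
```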
See this comparison of different spatial data formats to help guide the choice of which approach is best suited to your purpose.
Many important applications of spatial data involve combining it with other kinds of data: for example, opening times of nearby supermarkets, or statistical information on the economy of a town. Often one or more Spatial Things are at the center of the data analysis process.
Other applications involve distinguishing or selecting Spatial Things according to their non-spatial characteristics: hospitals with an emergency department, or restaurants that serve Japanese food.
To enable such questions to be answered using data from different sources, it is important to describe Spatial Things using shared identifiers and vocabularies. This is described in [[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets and [[DWBP]] Best Practice 15: Reuse vocabularies, preferably standardized ones.
A common approach to encoding data to enable data integration is Linked Data [[LD-BP]] and RDF [[RDF11-PRIMER]]. The spatial aspects of the data can either be included in the RDF data model, or the entity in question can link to an external Web resource containing the geometry in one of the standard spatial data formats. Although RDF is well-suited to important aspects of best practice, including use of URIs as identifiers and re-use of vocabularies, other data formats are also consistent with this approach. Most spatial data formats enable associating attributes of an entity alongside its geometry.
The publisher's choice of data model to represent the data will depend on what data is available and which audiences and purposes it seems most important to support. However, a reasonable general rule is that it is always useful to provide a label and a type for each entity in the data collection. (See [[DWBP]] Best Practice 16: Choose the right formalization level.)
Common vocabularies for describing the address or location of a Spatial Thing include: [[SCHEMA-ORG]], [[VCARD-RDF]] and [[LOCN]]. See this comparison of different vocabularies for describing Spatial Things to help decide which is best for your application.
Publishing explicit relationships between the Spatial Thing of interest and other related Spatial Things helps support data integration applications: for example providing hierarchical relationships between different kinds of administrative area.
Spatial analytics (or spatial analysis) is about deriving new insights by applying formal techniques to study Spatial Things using their topological or geometric properties. Combining spatial data with other data (see item 3 above) is a typical preparatory step before analyzing the one or more datasets using spatial operators, statistical algorithms, etc.
For spatial analytics on the Web, the data should be accessible via an API, and results should be shared using the best practices described in this document. Current spatial data infrastructures have some limitations with respect to sharing spatial data on the Web. Nonetheless, this approach is a well-established and powerful way of distributing spatial data, based on open standards and suited to a community of expert users. It is thus one of the options a data publisher should consider when deciding how to encode their spatial data.
In addition to publishing the data that represents the results of the analysis, maps and other forms of visualization (see item 2 above) are typically used to communicate the results.
The four main classes of application above have a wide range of requirements. Supporting such a wide range may require a lot of effort and cost on the part of a data publisher. There are many aspects to the 'quality' of a spatial data publishing approach, but in general terms it relates to how well the data and the approach to data delivery meet the needs of the target audience. By choosing to concentrate on only some kinds of application, the publisher can keep costs down. Other factors to consider include performance (the speed with which data is delivered); timeliness of updates, which can be a significant consideration if the underlying data changes frequently; and software complexity or maintenance.
In many cases a mixture of technologies can be used together to find a good compromise between quality or performance and cost. The strengths of various approaches can be applied to the part of the publishing 'spectrum' that suits them best. For example, if using a Linked Data approach, one option is to keep all data in a triple store; but hybrid approaches are also possible, for example where geometrical information is stored and served from flat files, or where non-geometrical data and metadata is stored in a triple store and used to generate Web pages and machine readable descriptions of Spatial Things, while geometrical data is indexed by software such as Lucene Spatial, PostGIS or Elasticsearch. Use of shared Web-accessible identifiers for Spatial Things can help support the interconnections between a range of diverse information systems.
[[EO-QB]] describes a 'spectrum of linkiness' for coverage data. At one end of the spectrum, you can assign each individual data point or pixel within a coverage (such as a satellite image) an individual identifier and web page. At the other, you can link just to an entire dataset and provide metadata for that. An intermediate approach involves dividing the data into tiles, each of which can have its own identifier and metadata. The balance of quality and cost in this example corresponds to the size of tiles that can be individually referenced, described and retrieved.
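The intermediate 'tiles' approach can be sketched in Python as follows: a raster of `width` by `height` cells is divided into square tiles, each of which is given its own URI. The coverage URI and the tile URI pattern are hypothetical. Including the tile size in the pattern is a design choice that keeps tilings of different granularity from clashing.

```python
from math import ceil


def coverage_tiles(coverage_uri, width, height, tile_size):
    """Yield (tile URI, (column offset, row offset, tile width, tile height)) for every tile.

    Edge tiles are truncated so that the tiles exactly cover the width x height grid.
    """
    for row in range(ceil(height / tile_size)):
        for col in range(ceil(width / tile_size)):
            x0, y0 = col * tile_size, row * tile_size
            window = (x0, y0, min(tile_size, width - x0), min(tile_size, height - y0))
            # the tile size is part of the URI so tilings of different granularity do not clash
            yield f"{coverage_uri}/tiles/{tile_size}/{row}/{col}", window


# Hypothetical 1000 x 600 cell image split into 256-cell tiles.
tiles = list(coverage_tiles("http://data.example.org/id/coverage/scene-1", 1000, 600, 256))
print(len(tiles), tiles[0][0])
```

Smaller tiles allow finer-grained referencing and retrieval but multiply the number of identifiers and metadata records to maintain, which is the quality-versus-cost balance described above.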
Check whether spatial data is encoded in a way that allows it to be understood and reused reliably.
Consider the main target audience or audiences of a web page or service, and check if spatial information is provided in a way appropriate for that audience.
Location information is a common constituent of spatial data and can be an important 'hook' for finding information and for integrating different datasets. There are different ways of describing the location of Spatial Things. You can use and/or refer to the name of a well-known named place, provide position coordinates in a geometry, or describe one location relative to another location. Providing multiple representations (i.e., several geometries for one Spatial Thing) can also be helpful, allowing data users to choose the one that fits their use case. This generally requires each geometry to be represented as a structured object. That object includes the coordinates of the positions defining the geometry, and also an identifier and other properties that describe its specific characteristics. It is especially important to choose the coordinate reference system with care and indicate it clearly for each geometry.
Provide geometries on the Web in a usable way
Geometry data should be expressed in a way that allows its publication and use on the Web.
The geospatial, Linked Data, and Web communities use different geometry formats and tools, which reflect different requirements with respect to data complexity and manipulation.
When deciding how a geometry should be described, it is therefore necessary to consider the intended uses and the related user communities. This may also imply providing alternative geometry descriptions.
This best practice helps with choosing the right format for describing geometries, based on aspects like intended use(s), performance, and tool support. It also helps with deciding when encoding geometries as literals rather than as structured objects is a useful simplification.
The format chosen to express geometry data should:
Ideally, to enable their widest reuse, geometries should be described with the geospatial, Linked Data and Web communities in mind. This may not always be feasible, but the objective should at least be to describe geometries (also) for Web consumption.
Steps to follow:
It is important to note that the steps outlined above are interrelated. For instance, the dimensionality of a geometry determines the set of coordinate reference systems that can be used, as well as the geometry encodings / representations.
Another issue to consider when choosing the geometry format is whether the coordinate axis order is unambiguous, i.e., whether the position coordinates defining each geometry are ordered, e.g., longitude/latitude or latitude/longitude. This topic is covered by the best practice "State how coordinate values are encoded".
Currently, two reference geometry formats are widely used in the geospatial and Web communities respectively: [[GML]] and GeoJSON [[RFC7946]].
GeoJSON [[RFC7946]] supports only one coordinate reference system (CRS84 — i.e., WGS 84 longitude/latitude), and geometries up to 2 dimensions (points, lines, surfaces) but is serialized in JSON [[RFC7159]], which is often easier for browser-based Web applications to process.
To facilitate the use of geometry data on the Web as well in GIS, it is desirable that complex [[GML]]-encoded geometries be made available also in simplified form as GeoJSON [[RFC7946]], by applying any required coordinate reference system transformation, as well as simplifying and generalizing the original geometry as needed (e.g., by transforming a 3D geometry into a 2D one). Simplified geometries may of course also be published in [[GML]], for example by conforming to the GML Simple Feature profile [[GML-SF]]. (On this topic, see ).
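The Python function below is a minimal sketch of the simplification step. It drops the third (height) value from nested coordinate arrays, so that a 3D geometry can be published as a 2D GeoJSON geometry. It assumes the input positions are already expressed in CRS84 (longitude, latitude, height). Any coordinate reference system transformation would have to happen beforehand, typically with a dedicated library, and is not shown. The coordinate values are illustrative.

```python
import json


def drop_z(coords):
    """Recursively strip the third (height) value from nested coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: keep only longitude and latitude
        return list(coords[:2])
    # A nested array (ring, line, polygon...): recurse into each member
    return [drop_z(c) for c in coords]


def to_geojson_2d(geom_type, coords_3d):
    # Assumes coords_3d are already CRS84 (lon, lat, h); no CRS transform here
    return {"type": geom_type, "coordinates": drop_z(coords_3d)}


polygon = to_geojson_2d("Polygon", [[[0, 0, 5], [1, 0, 5], [1, 1, 5], [0, 0, 5]]])
print(json.dumps(polygon))
```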
Finally, RDF-based representations of geometries are used in the Linked Data community. These rely on specific vocabularies, such as [[W3C-BASIC-GEO]] (only for points), [[GeoRSS]] (points, lines, boxes, circles, polygons) or [[GeoSPARQL]] (for any simple features geometry). For a high-level comparison of common spatial data vocabularies, see .
These geometry representations are either stored with the related data, or are maintained separately, and possibly denoted with HTTP URIs (see ).
RDF representations of geometries can support most geometry types and dimensions (up to at least 2 dimensions), with any level of complexity, in any coordinate reference system. On the other hand, many existing Semantic Web tools, such as triple stores, are currently not efficient enough to perform complex spatial queries, or queries on complex geometries. It may therefore be preferable to maintain geometries separately, in software platforms designed for these specific tasks.
It is nonetheless still desirable to make simplified geometries available for Web consumption in GeoJSON [[RFC7946]] or embedded in Web pages.
The following [[TURTLE]] snippet shows the [[GeoDCAT-AP]] representation of the dataset in . Here the bounding box is provided in multiple literal encodings (WKT, [[GML]], GeoJSON [[RFC7946]]), by using property
In the above example, the coordinate reference system used for the bounding box is CRS84 (equivalent to WGS 84, but with coordinate axis-order longitude/latitude). It is explicitly specified in the [[GML]] encoding via attribute @srsName, using the relevant HTTP URI from the OGC CRS registry. The coordinate reference system is not specified for the WKT encoding: CRS84 is the default coordinate reference system for WKT in [[GeoSPARQL]], so it can be omitted. Nor is it specified in the GeoJSON [[RFC7946]] encoding, because CRS84 is the only coordinate reference system GeoJSON [[RFC7946]] supports.
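The Python sketch below shows how a bounding box could be serialized in the three literal encodings mentioned above. It follows the conventions just described. The CRS is stated explicitly only in the GML encoding, here via a gml:Envelope, which is one possible GML representation of a bounding box. The CRS is omitted from the WKT and GeoJSON encodings, where CRS84 is implied. The function name and the example extent are hypothetical.

```python
import json

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
GML_NS = "http://www.opengis.net/gml/3.2"


def bbox_literals(min_lon, min_lat, max_lon, max_lat):
    """Encode a CRS84 bounding box as WKT, GML and GeoJSON literals."""
    # Closed exterior ring, counterclockwise (first position == last)
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    # WKT: no CRS IRI prefix, so the GeoSPARQL default (CRS84) applies
    wkt = "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"
    # GML: the CRS is stated explicitly via srsName
    gml = (f'<gml:Envelope xmlns:gml="{GML_NS}" srsName="{CRS84}">'
           f"<gml:lowerCorner>{min_lon} {min_lat}</gml:lowerCorner>"
           f"<gml:upperCorner>{max_lon} {max_lat}</gml:upperCorner>"
           "</gml:Envelope>")
    # GeoJSON: no CRS member, since CRS84 is the only CRS allowed
    geojson = json.dumps({"type": "Polygon",
                          "coordinates": [[list(p) for p in ring]]})
    return {"wkt": wkt, "gml": gml, "geojson": geojson}


for name, literal in bbox_literals(3.0, 50.5, 7.5, 53.5).items():
    print(name, literal)
```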
Again with reference to , the following snippet shows the [[GML]] and the RDF [[RDF11-PRIMER]] representations of the entry in the BAG Dutch register concerning the building where Anne Frank's house is located. For the corresponding GeoJSON [[RFC7946]] representation, see the relevant example in .
The corresponding RDF representation is provided in the following [[TURTLE]] snippet (taken from the BAG Linked Data service). NB: The RDF representation below has been complemented with additional properties (marked with # Added) for demonstration purposes.
The two instances of property geosparql:asWKT follow the syntax recommended in [[GeoSPARQL]], where the specification of the coordinate reference system is required only if different from CRS84. By contrast, property pdok:asWKT-RD implies the use of a specific coordinate reference system, namely, EPSG:28992 ("Amersfoort / RD New"). The coordinate axis-order used is determined here by the coordinate reference system, and in both cases, it is longitude / latitude (more precisely, east/north for EPSG:28992).
also shows how geometries for Spatial Things can be published as separate Web resources. This approach can be particularly suitable for giving access to very large geometries, consisting of hundreds of vertex positions (such as the detailed geometry of the boundaries of a geographical region), without attaching them to the relevant Spatial Things. Moreover, this allows the same geometry to be linked from (i.e., reused by) different Spatial Things. Finally, mechanisms such as HTTP content negotiation can provide access to different representations/encodings of the geometry ([[GML]], WKT, GeoJSON [[RFC7946]], etc.), identified by their media types, thus addressing different use cases. (On this topic, see also ).
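HTTP content negotiation is one way to serve several encodings of a geometry from a single URI. The Python sketch below shows a simplified server-side selection. It parses the client's Accept header and weights each offered media type by the q value of the most specific matching media range. It then picks the best representation the server offers. The list of offered media types is an assumption; WKT has no registered media type, so it is left out. Parameters other than q are ignored. A production service would normally rely on the negotiation support of its Web framework.

```python
# Assumed representations offered for one geometry resource, in server
# preference order (earlier entries win ties)
OFFERED = ["application/geo+json", "application/gml+xml", "text/html"]


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(ranges, media_type):
    """Return the q of the most specific media range matching media_type."""
    best_specificity, q = -1, 0.0
    for media_range, range_q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range.endswith("/*") and media_type.startswith(media_range[:-1]):
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q = specificity, range_q
    return q


def negotiate(header, offered):
    """Return the offered media type with the highest q > 0, or None."""
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in offered:
        q = quality(ranges, media_type)
        if q > best_q:
            best, best_q = media_type, q
    return best


print(negotiate("application/gml+xml;q=0.9, application/geo+json", OFFERED))
```

If negotiate returns None, the server would typically answer 406 Not Acceptable or fall back to a default representation.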
Relevant requirements: R-MultipleCRSs, R-BoundingBoxCentroid, R-Compressible, R-CRSDefinition, R-EncodingForVectorGeometry, R-IndependenceOnReferenceSystems, R-MachineToMachine, R-SpatialMetadata, R-3DSupport, R-TimeDependentCRS, R-TilingSupport.
Provide geometries at the right level of accuracy, precision, and size
Geometry data should be provided at levels of accuracy, precision, and size fit for their use on the Web.
Geometry data always provide an approximate description of the shape and extent of Spatial Things, which is fit for specific uses. For instance, portraying a geometry on a Web map would typically not require the level of detail that is needed for using the same geometry for spatial analysis. Moreover, even when a 3D geometry of a building is available, a Web map would typically portray only its 2-dimensional footprint.
Other issues to take into account are network bandwidth and the processing capabilities of the target tools. For instance, a geometry with a total size of 1GB or more could be transmitted more efficiently after being compressed. On the other hand, a tool with limited processing capabilities (such as a Web browser) may not be able to handle such a geometry efficiently (e.g., for displaying it on a Web map).
This best practice complements the best practice "Provide geometries on the Web in a usable way". It outlines some of the approaches that can be used to publish alternative versions of geometry data, differing in accuracy, precision, and size, to fit the most general use cases and the reference target communities.
Geometry data should be made available at (possibly different) levels of accuracy, precision, and size, taking into account:
As noted in the best practice "Provide geometries on the Web in a usable way", the accuracy, precision, and size of geometry data should ideally also take into account the requirements of the geospatial, Linked Data and Web communities. Whenever this is not feasible, Web consumption requirements should at least be addressed.
A number of techniques can be used to deliver representations of geometries at an accuracy, precision, and size fitting the requirements of a given use case.
The following list, although not exhaustive, outlines the approaches most widely used, especially for the Web delivery and consumption of geometry data.
Choosing the right technique primarily requires considering whether the derived geometry is fit for the target use case. Technical limits, such as network bandwidth and processing capabilities, are of course important, but secondary. Ideally, you will find the technique offering the right trade-off between these two types of requirements.
Whatever option is used, the key requirement is that the derived geometry data do not replace the original data, but are made available as alternative representations.
, and provide general guidelines that can be used for the publication of alternative representations of geometries, providing at the same time information on their characteristics. These include, but are not limited to, the use of different URIs for different representations, and HTTP content negotiation. Moreover, whenever geometry data are made available in RDF [[RDF11-PRIMER]], specific properties can be used to specify the geometry type and the level of accuracy and precision. More specific examples are included in the approaches described below.
Compress geometry data
Using standard compression algorithms, such as zip and gzip, addresses the issue of efficient transmission of geometry data without information loss. Notably, some formats come with alternative compressed encodings; e.g., KMZ is used to deliver compressed [[KML]] data.
Compression can be easily carried out on the fly, and it is also supported by the HTTP protocol via content negotiation — see [[RFC2616]], section 3.5: Content Codings.
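The Python sketch below illustrates lossless compression. It gzips a GeoJSON feature with the standard library and checks that decompression restores it exactly. The long, regular line geometry is made up for the example. In an HTTP exchange, the server would typically send such a payload with a Content-Encoding: gzip header when the client lists gzip in its Accept-Encoding header.

```python
import gzip
import json


def gzip_geojson(obj):
    """Serialize a GeoJSON object compactly and gzip-compress it."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def gunzip_geojson(data):
    """Decompress and parse a gzip-compressed GeoJSON payload."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


# Illustrative feature: a 1,000-vertex line with regular coordinates
line = {"type": "Feature", "properties": {},
        "geometry": {"type": "LineString",
                     "coordinates": [[round(i * 0.001, 3), 0.0] for i in range(1000)]}}

packed = gzip_geojson(line)
raw_size = len(json.dumps(line, separators=(",", ":")).encode("utf-8"))
print(raw_size, len(packed))
```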
Use formats optimizing access to and processing of geometry data
Some formats support a more compact description of geometry data, which potentially results in reducing network bandwidth consumption and/or more efficient client-side processing.
This is the case, for instance, of TopoJSON, an extension to GeoJSON [[RFC7946]] that reduces redundancy in the description of geometries by splitting them into segments (referred to as "arcs") that can be reused.
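The Python sketch below shows how shared arcs remove redundancy by rebuilding polygon rings from a TopoJSON-style arc table. Two adjacent squares share one edge, which is stored only once, as arc 0. Following the TopoJSON convention, a negative index refers to the arc at the ones' complement of the index (~i), traversed in reverse. The geometry is a made-up example, and the sketch ignores TopoJSON's optional quantization and delta encoding.

```python
# Illustrative arc table: arc 0 is the edge shared by both squares
ARCS = [
    [[1, 0], [1, 1]],                  # shared edge, stored only once
    [[1, 1], [0, 1], [0, 0], [1, 0]],  # remainder of the left square
    [[1, 0], [2, 0], [2, 1], [1, 1]],  # remainder of the right square
]


def arc_positions(arcs, index):
    """Return an arc's positions; a negative index means arc ~index reversed."""
    if index < 0:
        return list(reversed(arcs[~index]))
    return list(arcs[index])


def decode_ring(arcs, arc_indexes):
    """Concatenate arcs into a ring, dropping duplicated junction positions."""
    ring = []
    for index in arc_indexes:
        positions = arc_positions(arcs, index)
        if ring:
            ring.pop()  # the previous arc ends where this one starts
        ring.extend(positions)
    return ring


left = decode_ring(ARCS, [0, 1])    # shared edge traversed forwards
right = decode_ring(ARCS, [~0, 2])  # shared edge traversed backwards
print(left)
print(right)
```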
To achieve the same results, other formats are designed to enable the stream-based delivery of geometry data. For instance, GeoJSON Text Sequences [[RFC8142]] is a format designed to optimize access and processing of GeoJSON [[RFC7946]] data, by enabling a client application to use the received data even before the transmission is completed.
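GeoJSON Text Sequences [[RFC8142]] build on JSON text sequences (RFC 7464). Each GeoJSON text is preceded by an ASCII record separator (0x1E) and followed by a line feed. The Python sketch below writes such a sequence and reads it back incrementally. Each feature is yielded as soon as the following separator has arrived, and the last one is yielded at the end of the stream. This is what lets a client start using data before the transfer ends. The chunking in the usage example is arbitrary and stands in for network reads.

```python
import json

RS = "\x1e"  # ASCII record separator that starts every record (RFC 7464)


def write_geojson_seq(features):
    """Serialize features as a GeoJSON text sequence."""
    return "".join(RS + json.dumps(f) + "\n" for f in features)


def read_geojson_seq(chunks):
    """Yield GeoJSON texts from an iterable of str fragments as they complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last separator is a complete record
        *complete, buffer = buffer.split(RS)
        for record in complete:
            if record.strip():
                yield json.loads(record)
    # The final record is complete once the stream ends
    if buffer.strip():
        yield json.loads(buffer)


features = [{"type": "Feature", "properties": {"id": i},
             "geometry": {"type": "Point", "coordinates": [i, i]}}
            for i in range(3)]
text = write_geojson_seq(features)
chunks = [text[i:i + 16] for i in range(0, len(text), 16)]
for feature in read_geojson_seq(chunks):
    print(feature["properties"]["id"])
```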
Another approach, focused on efficient client-side processing, is GeoJSON-VT, a library which enables a client to create on the fly vector tiles from GeoJSON [[RFC7946]] data.
Provide geometries at different levels of generalization
Generalization is a traditional technique used in spatial data — first of all, in cartography — to reduce the precision and/or accuracy of a geometry for specific purposes. A typical example is provided by how geometries are portrayed in maps of different scales: for instance, a large-scale map can depict the width of a road (2-dimensional geometry), whereas, at lower scales, the same road can be shown as a line with zero width (1-dimensional geometry).
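Line simplification is a common way of generalizing a geometry. The Python sketch below implements the Douglas-Peucker algorithm, one widely used simplification method, chosen here for illustration rather than as the only option. The algorithm finds the vertex farthest from the straight line joining the end points. If that distance exceeds a tolerance, it keeps the vertex and recurses on both halves; otherwise it drops all intermediate vertices. The sketch assumes planar coordinates, so the tolerance is in coordinate units. Applying it directly to longitude/latitude values gives only an approximation.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices that deviate more than tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist > tolerance:
        left = douglas_peucker(points[: index + 1], tolerance)
        right = douglas_peucker(points[index:], tolerance)
        return left[:-1] + right  # avoid duplicating the split vertex
    return [first, last]


print(douglas_peucker([(0, 0), (1, 0.05), (2, 0)], 0.1))
```

A larger tolerance yields a coarser geometry, so running the function with several tolerances is one way to produce the multiple levels of generalization discussed here.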
Providing geometries at different scales or resolutions is actually one of the first criteria to be considered for addressing different use cases. This is common practice in the geospatial domain, especially, but not only, for reference data. For instance, the dataset of the Nomenclature of Territorial Units for Statistics (NUTS) of the European Union is made available at five different scales — ranging from 1:1,000,000 to 1:60,000,000.
Scale reduction uses a number of generalization techniques that can be used also outside this specific use case in order to provide geometries at different levels of accuracy and precision.
These techniques include the following:
Provide the centroid and bounding box of a geometry
Centroids and bounding boxes are another example of how a geometry can be generalized, although they serve different purposes. More precisely, a centroid specifies the position of a Spatial Thing by reducing its actual geometry to a single point corresponding to its center. A bounding box, on the other hand, provides a simplified description of the maximum extent of a Spatial Thing.
Although both these generalization methods result in a high degree of information loss with respect to the original geometry, they play an important role in spatial analysis because of the topological information they provide. Moreover, centroids and bounding boxes can describe a geometry accurately enough for use cases where, respectively, the extent or the precise shape of a Spatial Thing is not relevant. Finally, they are also widely used outside the geospatial domain.
Computation of centroids and bounding boxes is supported by virtually all GIS tools and Web mapping libraries, which makes it possible to compute them on the fly. However, performing this operation client-side can be extremely inefficient if the target tool has limited processing capabilities.
This issue can be addressed by providing access to centroids and bounding boxes as alternative representations of a given geometry.
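The Python sketch below computes both alternative representations for a simple polygon. The bounding box is the minimum and maximum coordinate values, and the centroid uses the standard area-weighted (shoelace) formula. The sketch assumes a single closed exterior ring in planar coordinates, without holes. For longitude/latitude data this is only an approximation, and rings crossing the antimeridian are not handled. Note also that the centroid of a concave polygon can fall outside the polygon itself.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) positions."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def centroid(ring):
    """Area-weighted centroid of a closed ring (first position == last)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross  # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("ring has zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


rectangle = [(0, 0), (4, 0), (4, 2), (0, 2), (0, 0)]
print(bounding_box(rectangle), centroid(rectangle))
```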
Choose coordinate reference systems to suit your user's applications
A multitude of coordinate reference systems exist because there is no perfect solution to meet all requirements:
The Earth is a complicated shape (neither spherical nor flat!):
For each (Earth-based) coordinate reference system, the topographical surface of the Earth is approximated by a geodetic datum that is described using an ellipsoid. The trouble with approximation is that nothing is perfect everywhere, which means that compromise is inevitable. Some datums, like WGS 84, provide a reasonable (but not highly accurate) fit everywhere on the Earth. Others, such as the European Terrestrial Reference System 1989 (ETRS89, used by EPSG:4258), provide a better fit in a given region at the expense of accuracy elsewhere.
Spatial data is often projected from the curved surface of the Earth onto a flat plane (e.g. a computer screen or a topographical map) to make it easier to compute distances between positions and calculate areas. There are many choices of projection (e.g. equirectangular, Mercator, stereographic, orthographic), each of which is designed for particular tasks. As with datums, projections are often chosen to better support regional, national or local needs.
It is also worth noting that the Earth, as a living planet, continues to change shape. For example, continental drift moves Australia north-eastwards by several centimeters each year, and New Zealand shifts in multiple directions. To retain accuracy, datums need to be adjusted from time to time. This is the case for the New Zealand Geodetic Datum 2000 (NZGD2000), which is frequently revised to take account of Earth deformations.
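The Python sketch below gives an example of a projection. It applies the spherical Mercator formulas used by Web Mercator (EPSG:3857) to WGS 84 longitude/latitude values, using the WGS 84 semi-major axis (6,378,137 m) as the sphere radius. Because y grows without bound towards the poles, Web Mercator is conventionally cut off at about 85.0511 degrees latitude, where the projected world becomes a square.

```python
import math

WGS84_A = 6378137.0  # WGS 84 semi-major axis (metres), used as sphere radius
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))  # about 85.0511 degrees


def to_web_mercator(lon, lat):
    """Project WGS 84 lon/lat (degrees) to Web Mercator x/y (metres)."""
    if abs(lat) > MAX_LAT:
        raise ValueError("latitude outside the Web Mercator range")
    x = WGS84_A * math.radians(lon)
    y = WGS84_A * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(to_web_mercator(180.0, 0.0))
print(to_web_mercator(0.0, MAX_LAT))
```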
Sometimes we don't want to measure relative to the surface of the Earth at all:
Spatial data such as descriptions of the built environment, geological surveys, satellite imagery, etc. are often captured and stored in an engineering coordinate reference system as measurements from a local datum. Examples include X/Y survey coordinates relative to a building corner, pixel positions within the image swath of a satellite camera, or distance along a line from a fixed origin point.
Although it is possible to convert coordinates from one CRS to another, many users will be put off by the need to do so. Furthermore, each such transformation is a point at which errors can be introduced into the spatial data, especially where users have limited expertise with spatial data.
When publishing spatial data, it is best to spare users the need to transform spatial data between coordinate reference systems themselves, by providing data in a form, or forms, that they can use directly. To determine which coordinate reference system(s) are needed, data publishers must consider the intended applications of their user community.
Most of a publisher's anticipated user community do not need to transform coordinate values prior to using the spatial data.
The first thing that publishers of spatial data need to do is consider their audience.
When publishing spatial data on the Web, the largest community of potential users will be unknown: anyone might find and use data published on the Web! To support this unanticipated reuse, we recommend always publishing your spatial data using a global coordinate reference system which allows spatial data from multiple sources to be readily combined for display or computation. For geospatial data with point, line or polygon geometries (i.e. vector data), WGS 84 Lat/Long (EPSG:4326) or WGS 84 Lat/Long/Elevation (EPSG:4979) are good choices as many of the tools and applications used by Web developers are set up to use data from GPS-enabled mobile devices that all use WGS 84. Where you have geo-imagery (i.e. raster data, comprised of a rectangular pattern of pixels on a flat plane) it is best to use Web Mercator (EPSG:3857) which has global coverage.
Where considerations of the known user community (or communities) call for different coordinate reference systems, we recommend publishing spatial data in multiple representations: one for each of the prioritized coordinate reference systems. Clearly, the number of representations provided needs to be determined with respect to the associated effort. However, remember that a decision not to publish data in a priority CRS will result in each member of your user community needing to do that task — or them not using your data.
Common reasons for needing to publish in additional coordinate reference systems include:
publication through government data portals that require use of a projected CRS defined by the national mapping agency — and similar legislative requirements;
Check that geospatial data (i.e. data about things located relative to the Earth) is available, as a minimum, in a global coordinate reference system: for vector data, this should be WGS 84 Lat/Long (EPSG:4326) or WGS 84 Lat/Long/Elevation (EPSG:4979); for raster data this should be Web Mercator (EPSG:3857).
State how coordinate values are encoded
Provide enough information for users to determine how coordinate values are encoded.
The geometry of Spatial Things is described using position coordinates, for example latitude and longitude. Coordinates describe a position relative to a datum (e.g. zero latitude is the equator and zero longitude is the prime meridian, often the Greenwich Meridian). It is therefore important to understand the datum, the units used for coordinates, and the order in which the coordinate axes are defined. Together, these make up the coordinate reference system (CRS). Spatial data is published in a wide variety of CRSs. This variety can create confusion and inconsistencies in using and interpreting spatial data. Unless the CRS is known, errors are likely to be introduced when determining the position and extent of a Spatial Thing on the Earth. This makes comparing or combining spatial data from different sources extremely problematic.
Sufficient information is provided to enable coordinates to be related to the correct position, thereby enabling spatial data to be correctly interpreted by humans and software agents.
Spatial data from different sources can be combined without introducing unwarranted positional errors.
A user of spatial data will need to know:
There are four common ways that this information can be provided:
Describe the coordinate reference system in the dataset metadata.
The example above illustrates how to describe the coordinate reference system used for a dataset within [[GeoDCAT-AP]] metadata. The conformsTo property from [[DCTERMS]] is used to assert the relationship between dataset and CRS in the same way that conformance with a standard is expressed in [[VOCAB-DQV]].
Provide each coordinate value with explicit labels and provide metadata to indicate what each label means.
The labels (or terms) w3cgeo:long are provided by the [[W3C-BASIC-GEO]] vocabulary which states that it is:
A vocabulary for representing latitude, longitude and altitude information in the WGS 84 geodetic reference datum.
The terms themselves (plus w3cgeo:alt) are defined with all the necessary information as follows:
In the example above, the labels longitude are defined in [[SCHEMA-ORG]], as indicated by the [[JSON-LD]] key @vocab. The associated definitions in [[SCHEMA-ORG]] are:
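The Python sketch below builds a small [[JSON-LD]] document of the kind described above. The @vocab key maps plain keys onto [[SCHEMA-ORG]] terms, and the coordinate values are carried by the explicitly labelled latitude and longitude properties of a GeoCoordinates object. The place name and coordinate values are illustrative only.

```python
import json


def place_jsonld(name, latitude, longitude):
    """Build a schema.org Place whose coordinate values carry explicit labels."""
    return {
        # @vocab maps unprefixed keys such as "latitude" onto schema.org terms
        "@context": {"@vocab": "http://schema.org/"},
        "@type": "Place",
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            # Explicit labels leave no doubt about which value is which axis
            "latitude": latitude,
            "longitude": longitude,
        },
    }


doc = place_jsonld("Example place", 52.0, 5.0)
print(json.dumps(doc, indent=2))
```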
The metadata for axis labels may also be provided in the documentation for an API from which the spatial data is accessed. For more information on documenting APIs, please refer to [[DWBP]] Best Practice 25: Provide complete documentation for your API.
In this example (adapted from the City of Palo Alto tree operations database and published as tabular data and as an interactive map) the coordinate position of each tree is specified using separate columns (
We see the definitions of those Lat columns provided in the dataset metadata, in this case a tabular metadata document, as per approach (1) above.
Lat are mapped onto the definitions provided by [[W3C-BASIC-GEO]] to ensure that the meaning of the data values in those columns is clear:
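The Python sketch below shows what such a mapping achieves. It reads a CSV table and re-keys each row using column descriptions in the style of the W3C tabular metadata vocabulary: titles are matched against the CSV header, and propertyUrl points to the [[W3C-BASIC-GEO]] terms. It is not a full CSV on the Web processor. The column names Tree_ID, Long and Lat and the data values are hypothetical.

```python
import csv
import io

W3CGEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical column descriptions in the style of tabular metadata
METADATA = {
    "tableSchema": {
        "columns": [
            {"name": "Tree_ID", "titles": "Tree_ID"},
            {"name": "Long", "titles": "Long", "propertyUrl": W3CGEO + "long"},
            {"name": "Lat", "titles": "Lat", "propertyUrl": W3CGEO + "lat"},
        ]
    }
}


def rows_with_properties(csv_text, metadata):
    """Yield each row keyed by property URL where the metadata defines one."""
    columns = metadata["tableSchema"]["columns"]
    mapping = {c["titles"]: c.get("propertyUrl", c["name"]) for c in columns}
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {mapping.get(k, k): v for k, v in row.items()}


csv_text = "Tree_ID,Long,Lat\n1,-122.1,37.4\n"
rows = list(rows_with_properties(csv_text, METADATA))
print(rows)
```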
Use a data format that specifies axes, their order, datum and unit of measurement for coordinates.
The media type application/geo+json is used to designate that content is provided in GeoJSON format, as specified in [[RFC7946]].
[[RFC7946]] Section 4. Coordinate Reference System provides all the necessary information to interpret the coordinates, stating that:
The coordinate reference system for all GeoJSON [[RFC7946]] coordinates is a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84) [WGS84] datum, with longitude and latitude units of decimal degrees. This is equivalent to the coordinate reference system identified by the Open Geospatial Consortium (OGC) URN urn:ogc:def:crs:OGC::CRS84. An OPTIONAL third-position element SHALL be the height in meters above or below the WGS 84 reference ellipsoid. In the absence of elevation values, applications sensitive to height or depth SHOULD interpret positions as being at local ground or sea level.
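Because [[RFC7946]] fixes the axis order, datum and units, a consumer can apply simple checks. The Python sketch below reads a GeoJSON position as longitude, latitude and optional height. It rejects values outside the valid ranges in decimal degrees. Such range checks catch gross errors, such as a "latitude" of 120. However, they cannot detect swapped axes when both values lie within ±90 degrees, which is why the CRS must be known.

```python
def read_position(position):
    """Read a GeoJSON position as (lon, lat, height_or_None).

    Raises ValueError if the values cannot be longitude/latitude in
    decimal degrees.
    """
    if len(position) < 2:
        raise ValueError("a GeoJSON position needs at least two elements")
    lon, lat = position[0], position[1]  # RFC 7946: longitude first
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Optional third element: height in metres relative to the WGS 84 ellipsoid
    height = position[2] if len(position) > 2 else None
    return lon, lat, height


print(read_position([4.88, 52.37]))
```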
The [[SCHEMA-ORG]] definition of
State within the data itself which coordinate reference system is used.
The example above encodes the polygon for Anne Frank's House in [[GML]]. The XML [[XML11]] attribute srsName (srs meaning "spatial reference system") refers to the Amersfoort / RD CRS (EPSG:28992) used in the Netherlands. Also note that additional useful information (axisLabels) is provided within the document for easy reference.
The "Well Known Text" (WKT) encoding, itself defined in [[SIMPLE-FEATURES]], is extended by [[GeoSPARQL]] to include designation of the coordinate reference system used, which in turn determines the coordinate axis-order. The example above encodes the polygon as a [[GeoSPARQL]] wktLiteral data type, designating the coordinate reference system as <http://www.opengis.net/def/crs/EPSG/0/4326> (EPSG:4326), i.e. WGS 84 Lat/Long.
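The Python sketch below splits a [[GeoSPARQL]] wktLiteral into its optional CRS IRI and the WKT text, falling back to CRS84 when no IRI is given. It then looks up the axis order. The small axis-order table covers only the CRSs discussed in this section. It is an illustrative assumption, not a registry lookup; real software would resolve the full CRS definition. The coordinates are illustrative.

```python
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
EPSG_4326 = "http://www.opengis.net/def/crs/EPSG/0/4326"
EPSG_28992 = "http://www.opengis.net/def/crs/EPSG/0/28992"

# Illustrative axis-order table for the CRSs mentioned in the text only
AXIS_ORDER = {
    CRS84: ("longitude", "latitude"),
    EPSG_4326: ("latitude", "longitude"),
    EPSG_28992: ("easting", "northing"),
}


def parse_wkt_literal(literal):
    """Split a wktLiteral into (crs_iri, wkt); no IRI means CRS84."""
    text = literal.strip()
    if text.startswith("<"):
        end = text.index(">")
        return text[1:end], text[end + 1:].strip()
    return CRS84, text


def axis_order(crs_iri):
    # Raises KeyError for CRSs not covered by the illustrative table
    return AXIS_ORDER[crs_iri]


crs, wkt = parse_wkt_literal(f"<{EPSG_4326}> POINT(52.37 4.88)")
print(crs, wkt, axis_order(crs))
```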
Sometimes instead of using geometry and coordinates to describe a location, we want or need to describe it in relation to another location. In that case relative positioning can be used.
Describe relative positioning
Provide a relative positioning capability in which one entity can be positioned relative to another entity.
Geocentric coordinate reference systems describe position relative to the earth itself. It can also be valuable or even necessary to describe the position of an entity relative to a second entity. In some cases, this is a navigation convenience, for example a tour kiosk might be described as located between the Boston Common Frog Pond and the Park Street T entrance, or in one's lower left view when looking up at the Statehouse. In other cases of moving or generalized entities, it may be that the entity can only usefully be given a relative position. For example, a package is reported left on seat 32L1 on the #59 bus, or part number PRG5460 is always located at position (51, 73, 3) in Acme warehouses.
It should be possible to describe the location of an entity in relation to one or more other entities or places, instead of specifying its own geocentric position or geometry.
The relative positioning descriptions should be machine-interpretable and/or human-readable as required by the intended application. The positions and/or geometries of reference entities, if available, should be retrievable through their link relations.
Positioning of one entity (A) relative to another referenced entity (B) is a combination of two factors: the referencing target, and the means of relative positioning. "Geocentric" referencing targets the planet itself or at least a fixed point on it. "Allocentric" referencing targets another entity. "Egocentric" referencing targets a particular field of view of an observer or camera. Positioning can take the form of a complete coordinate reference system (e.g. an engineering CRS), a qualitative relation such as "beside", or a quantitative relation such as "30m northwest".
| | Engineering CRS | Qualitative Relation | Quantitative Relation |
|---|---|---|---|
| Geocentric | Coordinate position A relative to a fixed earth datum | Not Applicable | Not Applicable |
| Allocentric | Coordinate position A relative to a fixed, mobile, or generic entity B | A "next to" B | A "20m south" of B |
| Egocentric | Coordinate position A within field of view B | A in "lower left corner" of field of view B | A "30 deg right of center" in field of view B |
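A quantitative allocentric relation such as A "20m south" of B can be resolved to coordinates once B's position has been retrieved. The Python sketch below does this with the standard great-circle "destination point" formula on a spherical Earth (mean radius 6,371,008.8 m). This is an approximation, adequate for short distances. The position used for B is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def offset_position(lat, lon, distance_m, bearing_deg):
    """Return (lat, lon) at distance_m along bearing_deg from (lat, lon)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)  # 0 = north, 90 = east, 180 = south
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalize longitude to the range [-180, 180)
    return math.degrees(phi2), (math.degrees(lam2) + 540) % 360 - 180


# Hypothetical position of entity B; A is "20m south" of B
b_lat, b_lon = 42.0, -71.0
a_lat, a_lon = offset_position(b_lat, b_lon, 20, 180)
print(a_lat, a_lon)
```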
Check that, when positions of entities are described as relative to other entities, these descriptions can be interpreted by a machine as well as humans, and the positions of the reference entities can be retrieved through their link relations.
Spatial things and their attributes can change over time. For example, a lake may grow or shrink due to changes in climate, water extraction or any number of reasons. For many applications, it is important that information about Spatial Things is kept up to date. When new information is available, the data publisher may make this available on the Web according to their update schedule and policies. [[DWBP]] section 8.6 Data Versioning and Best Practice 21: Provide data up to date provide directly applicable guidance.
When dealing with change to a Spatial Thing, you should consider its lifecycle; in particular, how much change is acceptable before a Spatial Thing can no longer be considered the same resource. Consider Eddystone Lighthouse, for example: the “Eddystone Light”, a maritime navigation aid, has existed in (more or less) the same place on Eddystone Rocks since 1698. A single HTTP URI (such as http://dbpedia.org/resource/Eddystone_Lighthouse) is used to identify “the lighthouse on Eddystone rocks” for that entire period. The lighthouse's attributes (such as its focal height, visible range and light characteristic) have changed over that period, but we still consider it to be the same lighthouse. However, if our interest is historic buildings, we would identify each of the four different structures that have stood on that site as a different Spatial Thing. These range from Winstanley's Eddystone Lighthouse (the first incarnation) to Douglass' Eddystone Lighthouse (the 4th and current incarnation). In that context, it is not appropriate to treat the site as one structure that has changed incrementally since 1698. One structure replaces another, and so each structure should be assigned a unique identifier. In summary, different things are important to different people!
All that said, if you consider that the change affects the fundamental nature of the Spatial Thing, then you should assign a new identifier. See for more details. Otherwise, read on for guidance on how to describe properties that change over time.
Provide information on the changing nature of spatial things
Spatial data should include metadata that allows a user to determine the time period for which it is valid.
Spatial things and their attributes change over time. When it comes to Spatial Things, or any resource that changes over time, it is important to provide metadata about the life cycle of those entities and the resources used to describe them. Given that information, data consumers can make considered choices about which resource they want to link to. Mostly, they are interested in current information. They need to be able to determine whether the published description of a Spatial Thing meets their needs. For example, is the published geographic extent of the City of Amsterdam relevant for a land-usage study of the nineteenth century? (Gemeentegeschiedenis.nl, "Municipality History", illustrates in HTML and GeoJSON how the extent of Amsterdam has changed during the past 200 years.) Where the information is available, a user may want to browse older versions of the published information to understand the nature of any changes or to find historical information.
Users are provided with the most recent version of information about a Spatial Thing and its attributes by default.
Users can determine the time-period for which data is applicable.
If a version history of changes is available, users can browse through a set of changes to see how a Spatial Thing and its attributes have changed over time.
When publishing information about a Spatial Thing that is subject to change there are four approaches to consider in response to a change:
Whichever approach is chosen, publishers of spatial data should consider how dataset metadata plays an important part in helping users determine whether a dataset is fit for their use. Particularly where the contents of a dataset change with time, statements about the (most recent) publication date, the frequency of update and the time-period for which the dataset is relevant (i.e. temporal extent) should be provided. Please refer to [[DWBP]] section 8.2 Metadata for more details about dataset metadata.
A description of the lifecycle of the Spatial Things (e.g. what triggers a change and whether those changes are versioned etc.) should also be provided in either the dataset's metadata, schema or specification. For example, the UK's Digital National Framework policy states that data publishers must provide these lifecycle rules.
Approach (1) is lightweight and should only be used where users do not need access to older descriptions of the Spatial Things. Data publishers simply replace the old description of the Spatial Thing with the amended description and keep users informed about updates by providing the appropriate metadata (e.g. when the data was changed). This may be achieved using dataset metadata (as outlined above) or by including the metadata attributes in the description of each Spatial Thing.
Where users are anticipated to need to understand how a Spatial Thing has changed over time, approaches (2), (3) and (4) should be considered.
Approach (2) is a simple variant on approach (1), the difference being that the entire dataset is assigned a new URI when changes are made, thereby enabling older versions of the dataset to be addressed separately. See [[DWBP]] Best Practice 11: Assign URIs to dataset versions and series for further details. Using this approach, a user should be able to compare two versions of the dataset to determine what has changed. This approach is simple for data publishers, but its downside is that the effort of determining what has changed is passed on to the users.
Approach (3) requires the data publisher to publish immutable resources that describe the Spatial Thing at specific points in time (i.e. "snapshots") and to provide a mechanism for users to browse between those snapshots. Effectively, the dataset becomes an accumulation of these snapshots that users can browse through. Because each snapshot of the Spatial Thing is published as a separate resource, this approach is best suited to infrequent changes, so that the number of snapshots does not become unwieldy.
The URI for the Spatial Thing, the base URI, should dereference to provide the current information and a link to its version history of snapshots. [[DWBP]] Best Practice 8: Provide version history describes how a version history may be implemented. Each snapshot resource within the version history must be uniquely identified; a common approach is to append a date/time stamp to the base URI as a version indicator. [[DWBP]] Best Practice 7: Provide a version indicator provides relevant guidance.
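To illustrate the version-indicator pattern, here is a minimal Python sketch. It assumes a hypothetical base URI and uses a compact UTC time stamp (`YYYYMMDDThhmmssZ`) as the version indicator; neither choice is prescribed by [[DWBP]]. It derives one snapshot URI per change and a version history listing them oldest first, as the description behind the base URI might link to them. The field names `versions` and `latestVersion` are illustrative only.

```python
from datetime import datetime, timezone

# Hypothetical base URI of a Spatial Thing.
BASE_URI = "http://example.org/id/municipality/amsterdam"


def snapshot_uri(base_uri: str, valid_from: datetime) -> str:
    """Append a UTC date/time stamp to the base URI as a version indicator.

    Pass timezone-aware datetimes; naive ones would be read as local time.
    """
    stamp = valid_from.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base_uri}/{stamp}"


def version_history(base_uri: str, change_times: list) -> dict:
    """Links from the base URI to every snapshot, oldest first."""
    versions = [snapshot_uri(base_uri, t) for t in sorted(change_times)]
    return {
        "@id": base_uri,
        "latestVersion": versions[-1] if versions else None,
        "versions": versions,
    }


# Usage: two changes, supplied out of order.
history = version_history(BASE_URI, [
    datetime(2014, 1, 1, tzinfo=timezone.utc),
    datetime(2001, 1, 1, tzinfo=timezone.utc),
])
```

Each snapshot URI would then dereference to an immutable description, while the base URI keeps returning the current state.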
Approach (4) is suitable where a Spatial Thing has a small number of attributes that are frequently updated, for example the GPS position of a runner, or streaming data from a sensor such as the water level from a stream gauge.
With this approach, the description of the Spatial Thing must include a property that contains a sequentially-ordered set of data-points, each of which defines a time-stamp and the values for the time-varying attribute(s). By definition, this property can be considered as a time-series coverage. Standard data encodings are available for time-series data, including: [[TIMESERIESML]] for [[GML]], plus [[COVJSON-OVERVIEW]] and the SensorThings API [[SENSORTHINGS]] for JSON [[RFC7159]]. [[VOCAB-DATA-CUBE]] provides a generic mechanism to express well-structured data, such as time series, in RDF [[RDF11-PRIMER]]. Although not yet widely used enough to be considered best practices, [[EO-QB]] and [[QB4ST]] (developed alongside this best practice Note within the Spatial Data on the Web Working Group) illustrate how [[VOCAB-DATA-CUBE]] may be used in this way.
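The following Python sketch shows the shape of such a property as a plain data structure: a sequentially ordered list of (time stamp, value) data points for one time-varying attribute, such as a stream gauge's water level. It is not an encoding of [[TIMESERIESML]], [[COVJSON-OVERVIEW]] or [[SENSORTHINGS]]; the class name, its fields and the step-wise lookup rule (the latest value at or before the requested time) are assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class TimeSeries:
    """Sequentially ordered (time stamp, value) data points for one attribute."""
    attribute: str
    unit: str
    points: List[Tuple[datetime, float]] = field(default_factory=list)

    def add(self, t: datetime, value: float) -> None:
        # Insert so that the points stay ordered by time stamp,
        # even when readings arrive out of order.
        times = [p[0] for p in self.points]
        self.points.insert(bisect_right(times, t), (t, value))

    def value_at(self, t: datetime) -> Optional[float]:
        """Latest value at or before t, or None if t precedes the series."""
        times = [p[0] for p in self.points]
        i = bisect_right(times, t)
        return self.points[i - 1][1] if i > 0 else None


# Usage: water level (metres) from a stream gauge, added out of order.
gauge = TimeSeries(attribute="waterLevel", unit="m")
gauge.add(datetime(2017, 5, 11, 12, 0), 1.5)
gauge.add(datetime(2017, 5, 11, 10, 0), 1.2)
```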
Information about a given Spatial Thing, or set of Spatial Things, will be relevant for a particular time or time-period. Check that this information is stated.
Check that dataset metadata provides details about how often the dataset is updated; e.g. date of most recent publication, frequency of update.
If a version history of changes is available, check that links to previous versions are available.
If the Spatial Thing contains an attribute that varies with time, check that those attribute values are provided as a time-series.
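As a rough illustration of the metadata checks above, this Python sketch looks for change-related properties in a dataset description held as a dictionary. The property names follow common DCAT and Dublin Core usage (`dct:modified`, `dct:accrualPeriodicity`, `dct:temporal`); the dictionary layout and the decision to treat all three as required are assumptions.

```python
# DCAT / Dublin Core property names; treating all three as required is an assumption.
CHANGE_METADATA = {
    "dct:modified": "date of most recent change",
    "dct:accrualPeriodicity": "frequency of update",
    "dct:temporal": "time period for which the data is applicable",
}


def missing_change_metadata(metadata: dict) -> list:
    """Describe each change-related metadata item that is absent or empty."""
    return [label for prop, label in CHANGE_METADATA.items() if not metadata.get(prop)]


# Usage: a dataset description that states no temporal extent.
record = {
    "dct:title": "Municipal boundaries",
    "dct:modified": "2017-05-11",
    "dct:accrualPeriodicity": "http://purl.org/cld/freq/annual",
}
print(missing_change_metadata(record))
```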
In recent years, we have seen widespread emergence of Web applications that use spatial data. Often these applications do not access all the spatial data they use via the Web. While there are good reasons for this, e.g. licensing restrictions, it is often the case, too, that the spatial data is not available via the Web at all, or in ways that application developers find too complex to use, or with insufficient or unclear quality-of-service commitments.
[[DWBP]] provides best practices discussing access to data using Web infrastructure (see [[DWBP]] section 8.10 Data Access). This section provides additional insight for publishers of spatial data.
Making data available on the Web requires data publishers to provide some form of access to the data. There are numerous mechanisms available, each providing varying levels of utility and incurring differing levels of effort and cost to implement and maintain. Publishers of spatial data should make their data available on the Web using affordable mechanisms to ensure long-term, sustainable access to their data.
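When determining the mechanism to be used to provide Web access to data, publishers need to assess utility against cost. In order of increasing usefulness and cost:
1. Bulk download or streaming of data.
2. Generalized spatial data access APIs.
3. Bespoke APIs designed to support a particular type of use.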
Let's take a closer look at these options.
The download of a dataset — or a pre-defined subset of it — via a single HTTP request is mainly covered by these [[DWBP]] best practices:
Providing bulk-download or streaming access to data is useful in any case, and it is relatively inexpensive to support for datasets that can be published as downloadable files stored on a server, because it relies on standard Web server capabilities. However, this option is more complex for frequently changing datasets or real-time data.
[[DWBP]] Best Practice 18: Provide Subsets for Large Datasets explains why providing subsets is important and how this could be implemented. Spatial datasets, particularly coverages such as satellite imagery, sensor measurement time-series and climate prediction data, are often very large. In these cases, it is useful to provide subsets by having identifiers for conveniently sized subsets of large datasets that Web applications can work with.
Effectively, breaking up a large coverage into pre-defined lumps that you can access via HTTP GET requests is a very simple API.
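A minimal sketch of such a "very simple API", assuming a large grid coverage split into fixed-size square subsets addressed by a hypothetical URI template (`.../subset/{row}/{col}`). Each subset gets a stable identifier that a Web application can fetch with a plain HTTP GET, and any grid cell can be mapped to the subset that holds it.

```python
import math

# Hypothetical URI template for pre-defined subsets of a large grid coverage.
SUBSET_TEMPLATE = "http://example.org/coverage/{name}/subset/{row}/{col}"


def subsets(name: str, width: int, height: int, tile: int):
    """Yield (uri, (row_start, row_end, col_start, col_end)) for every subset.

    Subsets along the right and bottom edges are smaller when the grid size
    is not a multiple of the tile size.
    """
    for r in range(math.ceil(height / tile)):
        for c in range(math.ceil(width / tile)):
            window = (r * tile, min((r + 1) * tile, height),
                      c * tile, min((c + 1) * tile, width))
            yield SUBSET_TEMPLATE.format(name=name, row=r, col=c), window


def subset_for_cell(name: str, row: int, col: int, tile: int) -> str:
    """URI of the subset that holds a given grid cell."""
    return SUBSET_TEMPLATE.format(name=name, row=row // tile, col=col // tile)


# Usage: a 2500 x 1200 cell elevation grid in 1000 x 1000 cell subsets.
tiles = list(subsets("dem", width=2500, height=1200, tile=1000))
```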
When a subset is provided, this should include information about its relationship to the complete dataset. In HTML, this could be descriptive text, or the relationship may be implicitly clear to humans from the way the subset is presented. In [[SCHEMA-ORG]], the schema:isPartOf property could be used. In RDF [[RDF11-PRIMER]], PROV-O could be used to describe the relationship between the subset and the complete dataset, as well as the mechanism used to derive the subset. In ISO 19115 metadata, the LI_Lineage element may be used for a similar purpose. Other metadata vocabularies offer comparable mechanisms.
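As a sketch only, the following Python function builds a JSON-LD description of a subset that states its relationship to the complete dataset with schema:isPartOf and PROV-O's prov:wasDerivedFrom. The URIs are hypothetical, and recording the derivation mechanism as a free-text comment on a prov:Activity is a deliberate simplification; PROV-O can describe the deriving activity in much more detail.

```python
import json


def describe_subset(subset_uri: str, dataset_uri: str, how: str) -> dict:
    """JSON-LD stating that a subset is part of, and derived from, a complete dataset."""
    return {
        "@context": {
            "schema": "http://schema.org/",
            "prov": "http://www.w3.org/ns/prov#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@id": subset_uri,
        # schema.org: the subset is part of the complete dataset.
        "schema:isPartOf": {"@id": dataset_uri},
        # PROV-O: the subset was derived from the complete dataset ...
        "prov:wasDerivedFrom": {"@id": dataset_uri},
        # ... by an activity, described here only with a free-text comment.
        "prov:wasGeneratedBy": {"@type": "prov:Activity", "rdfs:comment": how},
    }


doc = describe_subset(
    "http://example.org/coverage/dem/subset/1/2",
    "http://example.org/coverage/dem",
    "Rows 1000-1199 and columns 2000-2499 of the full grid",
)
print(json.dumps(doc, indent=2))
```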
The use of APIs to access data is covered in [[DWBP]] by the following best practices:
For spatial data, SDIs have long been used to provide generalized access to spatial data via Web services, typically using open standard specifications from the Open Geospatial Consortium (OGC). The main examples are Web Feature Service [[WFS]], Web Coverage Service [[WCS]], Sensor Observation Service, SensorThings [[SENSORTHINGS]] or [[GeoSPARQL]] for access to data, or Web Map Service [[WMS]] and Web Map Tile Service [[WMTS]] for access to data rendered as maps. Apart from the Web Map Service, the OGC standards have not seen widespread adoption beyond the geospatial expert community.
The list of options above includes a third option because sharing spatial data on the Web using only the first two options (bulk download or generalized APIs) may not be sufficient to reach application developers. Reasons for this include:
A useful 'bespoke API' mentioned in the third option provides convenience to developers of the targeted applications, because the API designer has thought about the needs of those developers when consuming the spatial data shared via the API.
Expose spatial data through 'convenience APIs'
If you have a specific type of application in mind for your data, tailor a spatial data access API to meet that goal.
Providing access to spatial data via bulk download or generalized spatial data access APIs may be too complex for application developers with relatively simple requirements, if the spatial data or the API is complex to understand, or if the data is too large to handle in a Web application. Convenience APIs are tailored to meet a specific goal: enabling a user to engage with complex data structures using (a set of) simple queries, including spatial search.
The API provides a coherent set of queries and operations, including spatial ones, that help users start working with the data quickly to achieve common tasks. The API provides both machine-readable data and human-readable HTML markup. The human-readable markup also supports search engines' Web crawlers, enabling indexing of the spatial data.
The API should:
In a White Paper about open geospatial APIs [[OGC-API-WP]], the Open Geospatial Consortium (OGC) has defined the concept of the "OGC API Essentials" — a set of items defined in OGC standards and other open standards that are reusable modules for use in geospatial APIs. The White Paper provides an initial list and many of the identified standards are mentioned in this document. Reuse of standardized building blocks improves consistency and interoperability across APIs. It is recommended to consider the OGC API Essentials when defining an API to access spatial data.
One such essential is a set of well-known spatial predicates for use in queries to select Spatial Things based on their geometry. Most commonly supported is the following set: equal, disjoint, touches, within, overlaps, crosses, intersects, contains. These predicates were originally defined in [[SIMPLE-FEATURES]], but are also supported by [[GeoSPARQL]], WFS [[WFS]] and others. For more information about the definition of the predicates, see [[SIMPLE-FEATURES]].
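To make the semantics of these predicates concrete, the Python sketch below evaluates them for the simplest possible case: axis-aligned rectangles with positive width and height, treated as closed sets. This is a deliberate simplification, not the general algorithm; real implementations evaluate the DE-9IM intersection matrix of [[SIMPLE-FEATURES]] on arbitrary geometries. The `Box` type and function names are illustrative. Note that `crosses` is not meaningful for two areas, so here it always returns false.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle (closed set) with positive width and height."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(a, b):
    # The closed rectangles share at least one point (boundaries count).
    return a.minx <= b.maxx and b.minx <= a.maxx and a.miny <= b.maxy and b.miny <= a.maxy


def disjoint(a, b):
    return not intersects(a, b)


def _interiors_meet(a, b):
    # The open interiors overlap: strict inequalities.
    return a.minx < b.maxx and b.minx < a.maxx and a.miny < b.maxy and b.miny < a.maxy


def touches(a, b):
    # Boundaries meet but interiors do not.
    return intersects(a, b) and not _interiors_meet(a, b)


def within(a, b):
    return b.minx <= a.minx and a.maxx <= b.maxx and b.miny <= a.miny and a.maxy <= b.maxy


def contains(a, b):
    return within(b, a)


def equal(a, b):
    return a == b


def overlaps(a, b):
    # Interiors meet and neither rectangle lies within the other.
    return _interiors_meet(a, b) and not within(a, b) and not within(b, a)


def crosses(a, b):
    # Defined for line/line, point/line, line/area, etc.; never true for two areas.
    return False
```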
If the data is already published in a Spatial Data Infrastructure, there are basically two options to publish the data via an additional convenience API.
Reuse your existing spatial data infrastructure
A RESTful API can be created as a wrapper, proxy or shim layer around SDI services. This aims to expose the 'generalized APIs' through 'convenience APIs' that make the data easier to use. For example, in the geospatial domain there are many WFS services providing spatial data. Content from a WFS service can be provided in this way as linked data, JSON [[RFC7159]] or another Web-friendly format using simple, navigable resources. This approach is similar to the use of Z39.50 in the library community: that protocol is still used, but 'modern' Web sites and Web services are wrapped around it.
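A minimal sketch of the wrapper idea, assuming features have already been retrieved from a WFS and parsed into dictionaries with an `id` and `properties`. Each feature becomes a simple JSON resource with `self` and `collection` links, and the collection is paged with a `next` link. The base URI, the paging parameters and the JSON layout are hypothetical; only the link relation names (`self`, `collection`, `next`) are registered IANA relation types.

```python
# Hypothetical public base URI of the convenience API.
API_BASE = "http://example.org/api/buildings"


def feature_resource(feature: dict) -> dict:
    """Wrap one feature (e.g. parsed from a WFS response) as a navigable JSON resource."""
    fid = feature["id"]
    return {
        "id": fid,
        "properties": feature.get("properties", {}),
        "links": [
            {"rel": "self", "href": f"{API_BASE}/{fid}"},
            {"rel": "collection", "href": API_BASE},
        ],
    }


def collection_page(features: list, offset: int = 0, limit: int = 10) -> dict:
    """One page of the collection, with a 'next' link while more features remain."""
    page = features[offset:offset + limit]
    links = [{"rel": "self", "href": f"{API_BASE}?offset={offset}&limit={limit}"}]
    if offset + limit < len(features):
        links.append({"rel": "next",
                      "href": f"{API_BASE}?offset={offset + limit}&limit={limit}"})
    return {"items": [feature_resource(f) for f in page], "links": links}
```

The expert-facing WFS stays unchanged; the shim only reshapes its output into resources that a Web developer can follow link by link.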
Provide parallel Web-friendly access to the data as an alternative
A more effective route to providing an alternative 'Web friendly' access path to the spatial data may be to create a new, complementary service endpoint on top of the native storage of the dataset. Compared to the first option, this limits the load on your SDI, which may matter because the data access APIs of the SDI will continue to be used by expert users for their complex data management tasks.
See the "How to test" sections in [[DWBP]] Best Practice 23: Make data available through an API, [[DWBP]] Best Practice 24: Use Web Standards as the foundation of APIs and [[DWBP]] Best Practice 25: Provide complete documentation for your API.
[[DWBP]] provides best practices discussing the provision of metadata to support discovery and reuse of data (see [[DWBP]] section 8.2 Metadata for more details). Providing metadata at the dataset level supports a mode of discovery well aligned with the practices used in Spatial Data Infrastructure (SDI) where a user begins their search for spatial data by submitting a query to a catalog. Once the appropriate dataset has been located, the information provided by the catalog enables the user to find a service end-point from which to access the data itself — which may be as simple as providing a mechanism to download the entire dataset for local usage or may provide a rich API enabling the users to request only the required parts for their needs. The dataset-level metadata is used by the catalog to match the appropriate dataset(s) with the user's query.
This section includes best practices for including the spatial extent, CRS, and other spatial details of the dataset in the metadata. These are the extra metadata items needed to make spatial datasets both discoverable and usable. A third best practice in this section goes a step further in granularity: exposing spatial data on the Web in such a way that individual entities or "granules" within a dataset can be discovered, evaluated, and utilized.
Quality information is also an important part of spatial metadata, especially for asserting if data is fit for a certain purpose. [[DWBP]] provides a best practice discussing how the quality of data on the Web should be described (see [[DWBP]] section 8.5 Data Quality for more details). This section is based on the Data Quality section from [[DWBP]] and adds a best practice specific for spatial data, which concentrates on the accuracy of the positions in the data — how close are they to the actual positions of the real world things?
In the Spatial Metadata section, we provided a Best Practice on how to deal with CRS in spatial data on the Web. There is also a clear link between CRS and data quality, because the accuracy of spatial data depends to a large extent on the CRS used. This can be seen as conformance of data with a "standard", in this case a (spatial or temporal) reference system. Different vocabularies can be used to describe spatial data quality; we will provide an example in this section.
For some uses, it may be sufficient to simply state conformance to a published specification:
However, that specification makes no statement about the positional accuracy of the data, so on its own, it is only a useful quality statement for users to whom positional accuracy is not that important.
Include spatial metadata in dataset metadata
Since location is such a powerful organizing principle, it is usually necessary to specifically describe the spatial details and nature of a dataset to discover it as well as to determine its fitness for use. This information is used, for example, by SDI catalog services that offer spatial querying to find data, but also by users to understand the nature of the dataset. In some cases, for example when dealing with crowd-sourced data, provenance information (how the dataset came to be in its published form, and with what quality) is important as well.
The first level of spatial description is the spatial extent of the dataset, the area of the world that the dataset describes. This often suffices for initial discovery, but further levels of description are needed to evaluate a dataset for use. These include the dataset spatial coverage (continuity, resolution, properties) as well as the spatial representation or geometric model (for example, grid coverage, discrete coverage, point cloud, linear network).
Dataset quality measures such as positional accuracy are also important for determining applicability. In the case of datasets whose spatial characteristics vary over their temporal coverage, spatial descriptions must include an explicit temporal aspect.
When publishing a dataset, provide as much spatial metadata as necessary, but at least the spatial extent, coverage, and representation. Other examples of spatial metadata include:
In Spatial Data Infrastructures, the accepted standard for describing metadata is [[ISO-19115]] or profiles thereof.
To provide information about the spatial attributes of the dataset on the Web one can:
Check if the spatial metadata for the dataset itself includes the overall features of the dataset in a human-readable format.
Check if the descriptive spatial metadata is available in a valid machine-readable format.
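A rough sketch of the machine-readable check, assuming the dataset metadata has been parsed into a JSON-LD-like dictionary. It looks for a human-readable description, a spatial extent given as a WKT polygon via `dct:spatial`/`dcat:bbox` (as in [[VOCAB-DCAT]]), and a reference system stated with `dct:conformsTo`, which to our understanding is how [[GeoDCAT-AP]] expresses the CRS; check the profile you actually use. Which items count as mandatory is an assumption.

```python
import re

# A WKT polygon, optionally preceded by a CRS IRI as in GeoSPARQL WKT literals.
WKT_POLYGON = re.compile(r"^\s*(<[^>]+>\s*)?POLYGON\s*\(\(", re.IGNORECASE)


def check_spatial_metadata(record: dict) -> list:
    """List problems with the spatial metadata of one dataset record."""
    problems = []
    if not record.get("dct:description"):
        problems.append("no human-readable description")
    bbox = (record.get("dct:spatial") or {}).get("dcat:bbox")
    if not bbox:
        problems.append("no machine-readable spatial extent")
    elif not WKT_POLYGON.match(bbox):
        problems.append("spatial extent is not a WKT polygon")
    if not record.get("dct:conformsTo"):
        problems.append("no coordinate reference system stated")
    return problems


# Usage: a record for a (hypothetical) buildings dataset in the Dutch national grid.
record = {
    "dct:description": "Building footprints in Amsterdam",
    "dct:spatial": {"dcat:bbox": "POLYGON((4.73 52.28, 5.07 52.28, 5.07 52.43, 4.73 52.43, 4.73 52.28))"},
    "dct:conformsTo": "http://www.opengis.net/def/crs/EPSG/0/28992",
}
```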
Describe the positional accuracy of spatial data
Accuracy of spatial data should be specified in machine-interpretable and human-readable form.
The amount of detail that is provided in spatial data and the resolution of the data can vary. No measurement system is infinitely precise, and in some cases the spatial data can be intentionally generalized (e.g. by merging entities, reducing detail, or aggregating data) [[Veregin]]. Some spatial data applications, such as aircraft navigation, require highly accurate data. For others, such as human navigation, a horizontal accuracy of a few meters is good enough. For yet others, such as overlaying weather forecasts on a map, the map only gives a general indication of place. If the positional accuracy is published together with the data, the user can determine whether it is appropriate to use for their application. Potentially, this makes existing data more re-usable.
For many uses, the positional accuracy of the data is an important aspect of assessing its fitness for purpose (quality). As with other data quality statements, this can be a quantitative measure, a statement of conformance to a standard or policy, or an assertion or report of fitness for a particular purpose.
Describe the accuracy of spatial data in a way that is understandable for humans.
In addition, describe the accuracy of spatial data in a machine-readable format. [[VOCAB-DQV]] is such a format. It is a vocabulary for describing data quality, including the details of quality metrics and measurements.
For observed (measured) datasets, it is possible to make specific quantitative statements about positional accuracy, based on knowledge of the equipment used to make the observations, and any processing carried out.
For coverages, the sampling distance is an effective way of indicating the amount of detail in the dataset — this is one of the meanings of the term "resolution". Alternatively, samples of the data could be independently checked against the real world, and the results of that check reported. Either way, this is usually a statement of absolute positional accuracy, but for some uses, relative positional accuracy is more important.
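For a regular grid the arithmetic is simple: the sampling distance along each axis is the extent divided by the number of samples. The Python sketch below also converts a cell size given in degrees to an approximate ground distance, assuming a spherical figure of about 111,320 m per degree of latitude and scaling longitude by the cosine of the latitude. That approximation is an assumption for illustration and is not accurate enough for precise work.

```python
import math

# Rough spherical figure for the ground length of one degree of latitude (assumption).
METRES_PER_DEGREE = 111_320.0


def grid_resolution(minx, miny, maxx, maxy, ncols, nrows):
    """Cell size of a regular grid along x and y, in the units of its CRS."""
    return (maxx - minx) / ncols, (maxy - miny) / nrows


def degrees_to_metres(dx_deg, dy_deg, latitude_deg):
    """Approximate ground size of a lon/lat cell at the given latitude."""
    return (dx_deg * METRES_PER_DEGREE * math.cos(math.radians(latitude_deg)),
            dy_deg * METRES_PER_DEGREE)
```

For example, a 10 km by 5 km projected grid with 1000 by 500 cells has a sampling distance of 10 m along both axes.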
Positional accuracy measurements, whether observed or asserted based on process, can be given using QualityMeasurement.
For modelled datasets, for example in planning and construction, there is no 'real world' against which to assess the positional accuracy — but relative positional accuracy can still be stated.
For many uses, a statement of the amount of detail provided is sufficient to assess fitness for purpose; examples include "level of detail" (building models), "navigational purpose" (marine navigation), "equivalent scale" or "zoom level" (cartography). Sometimes, this is expressed as if it were a statement of positional accuracy.
These can be expressed in the same way as for non-spatial data, for example using the QualityPolicy statements of [[VOCAB-DQV]].
The following example shows how [[VOCAB-DQV]] can express conformance to a specified positional accuracy:
The following example shows how [[VOCAB-DQV]] can express the amount of detail in a coverage dataset:
The [[VOCAB-DQV]] approach is also recommended in [[VOCAB-DCAT]] as a general solution to specify precision and accuracy. However, in order to address the most common case of spatial resolution (i.e., as horizontal ground distance), [[VOCAB-DCAT]] also defines a specific property, dcat:spatialResolutionInMeters. By using this property, can be re-written as follows:
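A small Python sketch of the DCAT form, assuming a hypothetical dataset URI. It emits the resolution as an `xsd:decimal` literal, which is the range [[VOCAB-DCAT]] gives for dcat:spatialResolutionInMeters. The value is formatted through `Decimal` so that small values are written as plain decimals rather than in exponent notation, which `xsd:decimal` does not allow.

```python
from decimal import Decimal


def dataset_with_resolution(dataset_uri: str, title: str, resolution_m: float) -> dict:
    """JSON-LD for a dataset whose spatial resolution is a horizontal ground distance."""
    # Format via Decimal so that e.g. 1e-05 is written as 0.00001 (valid xsd:decimal).
    value = format(Decimal(str(resolution_m)), "f")
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:spatialResolutionInMeters": {"@value": value, "@type": "xsd:decimal"},
    }


dem = dataset_with_resolution("http://example.org/dataset/dem", "Elevation model", 10.0)
```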
Finally, [[GeoDCAT-AP]], building upon the [[VOCAB-DQV]] approach, defines specific individuals for the different types of spatial resolution in [[ISO-19115]] and [[ISO-19115-1-2014]] — namely:
The use of these individuals is illustrated in the following example:
Check if the metadata contains at least one human and machine readable statement regarding positional accuracy
Check that the kind of statement is relevant to the kind of data, e.g. not an absolute positional accuracy measure for Atlantis
Checking whether the accuracy statement is actually correct is beyond the scope of this best practice.
The best practices described in this document are compiled based on evidence of real-world application, as described in . However, there are several issues that inhibit the use or interoperability of spatial data on the Web for which no evidence of real-world applied solutions is available. These issues are denoted “gaps in current practice”. In the case of gaps, there might be emerging practice, i.e. a solution that has been theorized for a certain issue and possibly tested in beta settings, but not in production environments. Gaps and emerging practices in the area of publishing spatial data on the Web are discussed in this section.
There are several aspects to representing geometries on the Web. First, there is the question of different serialization formats to choose from. The different formats are described in , but the Best Practice does not make a definitive choice as to which format is best. Although having only one format for geometries on the Web would reduce complexity, the currently available formats reflect different requirements with respect to data complexity and manipulation.
A second, related aspect is whether to publish geometries on the Web in self-contained files such as [[GML]] or GeoJSON [[RFC7946]], or rather to embed geometries as structured data markup in HTML, or in an RDF [[RDF11-PRIMER]] based way i.e. as Linked Data. Choosing between these approaches — or not choosing but rather offering a combination of these — depends largely on the intended audience.
Another issue is that different use cases may require geometries at different levels of accuracy, precision, and size. outlines some of the approaches to address this requirement, considering general application scenarios and providing guidance on the criteria to be taken into account for choosing the appropriate technique (e.g., compress geometry data, use compact formats, apply geometry generalization mechanisms). The overall recommendation is to make available multiple representations of geometry data, and to give data consumers the ability to identify those most fit for purpose. A variety of mechanisms can be used to achieve this, such as publishing different geometry representations at different URIs and accompanying them with a human- and/or machine-readable description of their characteristics (e.g., format, spatial resolution, scale, level of generalization). However, the lack of common practices in this area makes it difficult to provide consistent guidelines on how to publish and access different geometry representations.
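One widely used generalization technique, named here as an example rather than as a recommendation of this document, is Ramer-Douglas-Peucker line simplification. The Python sketch below assumes planar coordinates and a tolerance in CRS units, and publishes one generalized representation per tolerance at a hypothetical URI. This is one way to offer multiple representations for consumers to choose from.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of a line; keeps both end points."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord between the end points.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep that vertex and simplify both halves recursively.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


def representations(points, tolerances, base_uri):
    """One generalized geometry per tolerance, each at its own (hypothetical) URI."""
    return {f"{base_uri}/generalized/{tol}": simplify(points, tol) for tol in tolerances}
```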
Finally, there is a question of how to make geometries available in different CRSs. describes why this is a good idea as well as how to decide which CRSs to provide. explains how the CRS of geometries should be made known. It follows that users should be able to find out which CRSs are available and access geometries in the CRS of their choice. In an OGC WFS [[WFS]] request, users can specify the CRS they wish to use by specifying the srsName parameter. In [[GeoSPARQL]] the getSRID function returns the spatial reference system of a geometry, thus making it possible to request a specific CRS at a (Geo)SPARQL endpoint. However, these options require the user to be proficient in either Geospatial Web services or Linked Data. A best practice for requesting and returning geometries in a specific requested CRS has not yet emerged. Many options can be found in current practice, including creating CRS-specific geometry properties (for example, the Dutch Land Registry does this), and supporting an option for requesting a specific CRS in a convenience API; but one best practice cannot yet be identified.
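For comparison, this is how a WFS 2.0 GetFeature request asking for geometries in a particular CRS might be built in Python. The endpoint and feature type name are hypothetical; `service`, `version`, `request`, `typeNames`, `srsName`, `bbox` and `count` are WFS 2.0 key-value-pair parameters. Which CRSs a server accepts in `srsName`, and which CRS and axis order it applies to `bbox`, depend on the server, so check its capabilities document.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with a real WFS 2.0 service.
WFS_ENDPOINT = "http://example.org/wfs"


def get_feature_url(type_name, srs_name, bbox=None, count=None):
    """Build a WFS 2.0 GetFeature URL (key-value-pair encoding) for a chosen CRS."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        # CRS in which the server is asked to return the geometries.
        "srsName": srs_name,
    }
    if bbox is not None:
        # minx,miny,maxx,maxy; the CRS and axis order assumed for bbox vary by server.
        params["bbox"] = ",".join(str(v) for v in bbox)
    if count is not None:
        params["count"] = str(count)  # WFS 2.0 name; WFS 1.x used maxFeatures
    return f"{WFS_ENDPOINT}?{urlencode(params)}"


# Usage: buildings in the Dutch national grid (EPSG:28992).
url = get_feature_url("ns:buildings", "urn:ogc:def:crs:EPSG::28992",
                      bbox=(120000, 480000, 125000, 490000), count=10)
```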
An option we considered was the use of content negotiation, i.e. negotiating the CRS as part of the content format for the geometry. A concrete proposal for this suggests the following:
However, providing different CRSs might be too complicated to handle in the HTTP protocol. For example, multidimensional datasets will in general use multiple CRSs (e.g. horizontal, vertical and temporal, maybe more), and conversion between CRSs will in general introduce errors, so data in one CRS are not exactly the same as data in another CRS. It might therefore be more appropriate to handle this in the application layer. More generally, using HTTP content negotiation to choose among the various parameter options of data objects such as geometries would overload the protocol.
Although a large amount of spatial data has been published on the Web, so far there are few authoritative datasets containing geometrical descriptions of their boundaries. Their number is growing (e.g. at the time of writing there are three authoritative spatial datasets publicly available as linked data in the Netherlands containing topographic, cadastral, and address data), but currently there is no common practice in the sense of the same spatial vocabulary being used by most spatial data publishers. Direct georeferencing of data implies representing coordinates or geometries and associating them with a CRS. This requires vocabularies for geometries and CRSs. The consequence is the lack of a baseline during the mapping process for application developers trying to consume specific incoming data. Datasets describing administrative units, points of interest or postal addresses with their labels and geometries, and identifying these Spatial Things with URIs, could be beneficial not only for georeferencing other datasets, but also for interlinking datasets georeferenced by direct and indirect location information.
Currently, no single standardized vocabulary is available that covers all needs. A possible way forward is an update to the [[GeoSPARQL]] spatial ontology. This would provide an agreed spatial ontology, i.e. a bridge or common ground between geographical and non-geographical spatial data and between W3C and OGC standards, conformant to the [[ISO-19107]] abstract model and based on existing available ontologies such as [[GeoSPARQL]], the [[W3C-BASIC-GEO]] vocabulary, [[NeoGeo]] and the ISA Programme Location Core Vocabulary [[LOCN]].
This vocabulary would define basic semantics for the concept of a reference system for spatial coordinates, a basic datatype, or basic datatypes for geometry, how geometry and real world objects are related and how different versions of geometries for a single real world object can be distinguished. For example, it makes sense to publish different geometric representations of a spatial object that can be used for different purposes. The same object could be modelled as a point, a 2D polygon or a 3D polygon. The polygons could have different versions with different resolutions (generalization levels). And all those different geometries could be published with different coordinate reference systems. Thus, the vocabulary would provide a foundation for harmonization of the many different geometry encodings that exist today.
Even if all spatial data should become findable directly through search engines, data portals would still remain important hubs for data discovery, for example because the metadata records registered there can be made crawlable. In addition, different data portals can harvest each other's information, provided there is consistency in the types and meaning of the included information, even if structures and technologies vary. In the eGovernment sector, [[VOCAB-DCAT]] is a standard for dataset metadata publication and harvesting implemented by these portals. Because DCAT lacks the means to describe some specific characteristics of spatial datasets, an application profile for spatial data, [[GeoDCAT-AP]], has been developed in the framework of the ISA Programme of the European Union, with the primary purpose of enabling the sharing of spatial metadata across domains and catalogue platforms. [[GeoDCAT-AP]] is mentioned in the "Possible approach" section of several Best Practices in this document.
With the goal of sharing spatial metadata, [[GeoDCAT-AP]] defined RDF bindings covering the core profile of [[ISO-19115]] and the INSPIRE metadata schema [[INSPIRE-MD]], enabling the harmonized RDF representation of existing spatial metadata. The reason for this choice was to focus first on the most used metadata elements; additional mappings could be defined in future versions of the specification, based on user and implementation feedback.
The next step is evolution towards a single standard for metadata as it is used in data portals, without loss of relevant metadata, while still understandable and not too complicated. A working group in the Open Geospatial Consortium is currently working on this.
Large and complex datasets, for example data gathered using automated sensors, may be impossible to download in their entirety due to their dynamic nature and potential volumes. It is therefore necessary in these cases to be able to adequately describe the structure of such data and how services interact to expose subsets of it — even individual records in a Linked Data context. Currently, there is no established Best Practice for dealing with this, especially when taking the spatial and temporal dimensions into account.
Large, complex datasets are common in the information processing world, and are commonly organized in “hypercubes”, where “data dimensions” are used to locate values holding results. A standard based on this dimensional model of data is the RDF Data Cube vocabulary [[VOCAB-DATA-CUBE]]. It has been used to publish sensor data, but RDF Data Cube lacks the means to describe the spatio-temporal aspects of data, which are very important for observations. One of the work items of the Spatial Data on the Web Working Group has been to extend the existing RDF Data Cube ontology to support specification of the key metadata required to interpret spatio-temporal data; the result is called [[QB4ST]].
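To illustrate the dimensional model itself (not the RDF Data Cube encoding), the toy Python class below locates each value by a tuple of dimension values and supports slicing, for example taking all observations for one time step. In [[VOCAB-DATA-CUBE]], each cell would instead be a qb:Observation carrying dimension and measure properties; the class and method names here are hypothetical.

```python
class Hypercube:
    """Toy dimensional model: each value is located by one value per dimension."""

    def __init__(self, dimensions):
        self.dimensions = list(dimensions)  # e.g. ["lat", "lon", "time"]
        self.cells = {}

    def _key(self, coords):
        return tuple(coords[d] for d in self.dimensions)

    def put(self, value, **coords):
        self.cells[self._key(coords)] = value

    def get(self, **coords):
        return self.cells.get(self._key(coords))

    def slice(self, **fixed):
        """All cells whose dimension values match the fixed ones (e.g. one time step)."""
        positions = {self.dimensions.index(d): v for d, v in fixed.items()}
        return {key: value for key, value in self.cells.items()
                if all(key[i] == v for i, v in positions.items())}


# Usage: air temperatures located by latitude, longitude and time.
cube = Hypercube(["lat", "lon", "time"])
cube.put(12.5, lat=52.0, lon=4.9, time="2017-05-11T00:00Z")
cube.put(13.1, lat=52.0, lon=4.9, time="2017-05-11T01:00Z")
cube.put(11.8, lat=52.1, lon=4.9, time="2017-05-11T00:00Z")
```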
QB4ST is an extension to RDF Data Cube to provide mechanisms for defining spatio-temporal aspects of dimension and measure descriptions. It is intended to enable the development of semantic descriptions of specific spatio-temporal data elements by appropriate communities of interest, rather than to enumerate a static list of such definitions. It provides a minimal ontology of spatio-temporal properties and defines abstract classes for data cube components (i.e. dimensions and measures) that use these, to allow classification and discovery of specialized component definitions using general terms.
QB4ST is designed to support the publication of consistently described re-usable and comparable definitions of spatial and temporal data elements by appropriate communities of practice. One obvious such case is the use of GPS coordinates described as decimal latitude and longitude measures. Another example is the intended publication of a register of Discrete Global Grid Systems (DGGS) by the OGC DGGS Working Group. QB4ST is intended to support publication of descriptions of such data using a common set of attributes that can be attached to a property description (extending the available RDF-QB mechanisms for attributes of observations).
Spatial data is often concerned with measurements (distance, angles etc.) — for example, when specifying the position of a feature according to a Coordinate Reference System or the accuracy of that position.
For measurement values to be correctly interpreted, a unit of measurement must also be specified. The challenge here is specifying units of measurement in a way that can be widely understood.
As humans, we’re usually quite good at guessing. For example, given a discussion about the accuracy of a position, the assertion ±3.1 m probably means 3.1 meters. That seems reasonable, but it might also be 3.1 miles. Unfortunately, software systems mostly lack the human ability to guess. So we need to unambiguously express which unit of measure is being used, and this is where the problems exist.
There are essentially two mechanisms that can be used:
Use a named serialization scheme that provides string-literal notation for both base units and derived units. Given that there is an infinite number of derived units, such a serialization should specify a formal grammar that software applications can use to interpret those strings. This enables automated conversion between units and other useful functions, such as verifying that two measured quantities can be combined based on their dimensionality (e.g. you can’t combine a length with an area and get a sensible answer!).
Use a URI, such as those provided by the Quantities, Units, Dimensions and Data Types Ontologies (QUDT) and the Ontology of units of Measure (OM). For example, the unit of measure meter has the URIs
http://qudt.org/vocab/unit/M (QUDT) and
Earlier versions of [[GML]] required every unit to be specified using a URI. In practice, however, many publishers used symbols like "m" instead of a URI anyway, as they are shorter and often better understood. As a result, the [[GML]] clause on MeasureType and UomIdentifier now allows the Unified Code for Units of Measure (UCUM) serialization in addition to URIs.
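To illustrate what a serialization scheme with a formal grammar makes possible, the following Python sketch converts values between a handful of UCUM codes and refuses to combine quantities of different dimensionality. It is a minimal sketch, not a UCUM implementation: the lookup table and the `Unit`, `convert` and `can_combine` names are hypothetical, only four codes are covered, and only the length dimension is tracked. A real application would parse the full UCUM grammar with a dedicated library instead of a hard-coded table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    factor: float    # multiply by this to express the value in the SI unit
    length_exp: int  # dimension as a power of length: 1 = length, 2 = area


# Hypothetical lookup table covering a tiny subset of UCUM codes.
UNITS = {
    "m": Unit(1.0, 1),
    "km": Unit(1000.0, 1),
    "[mi_i]": Unit(1609.344, 1),  # international mile
    "m2": Unit(1.0, 2),
}


def can_combine(code_a: str, code_b: str) -> bool:
    """True if quantities in the two units share the same dimension."""
    return UNITS[code_a].length_exp == UNITS[code_b].length_exp


def convert(value: float, from_code: str, to_code: str) -> float:
    """Convert a value between two units of the same dimension."""
    if not can_combine(from_code, to_code):
        raise ValueError(f"{from_code} and {to_code} have different dimensions")
    return value * UNITS[from_code].factor / UNITS[to_code].factor


print(convert(3.1, "[mi_i]", "m"))  # the "3.1 miles" reading: about 4989 metres
print(can_combine("m", "m2"))       # a length and an area: False
```

Recording each unit's dimension as an exponent, rather than treating the code as an opaque string, is what makes the compatibility check cheap; extending `length_exp` to a tuple of base-dimension exponents would cover mass, time and so on.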
If you choose to use a serialization scheme for expressing units of measure, you should select one that is well-known among your community of users.
It’s also worth noting that, if your format or vocabulary allows, you should include a human-readable label. For the simple case of displaying the data on a Web page, this removes the need to look up this information from the serialization scheme specification or vocabulary.
The trouble with the use of serialization schemes is that we can’t assume client applications understand the notation. We need some mechanism to indicate which serialization is being used — either so that application developers can find the specification and source some software (e.g. the ucum.js library) to process the unit strings, or so that the client application can map the notation to a well-known URI whose definition conforms to a data model that the application can understand.
There is no evidence of best practice here — nor is there consensus on which data model is best for describing units of measure. Possible approaches to identify the serialization scheme used include:
Provide this information in the data itself, e.g.:
Provide this information in the description of the API that provides access to your data; see [[DWBP]] Best Practice 25: Provide complete documentation for your API.
Convey this information in the HTTP response headers; e.g. using the profile Link Relation Type [[RFC6906]].
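The HTTP header option can be sketched in Python as follows: the server advertises the serialization scheme with a `profile` link relation [[RFC6906]], and the client looks for it. The profile URI and the function names are hypothetical, and the parser is deliberately minimal (it assumes commas never appear inside URIs or parameter values), so a production client should rely on a complete Link header parser.

```python
import re

# Hypothetical profile URI meaning "units of measure are given as UCUM codes".
UCUM_PROFILE = "https://example.org/profiles/ucum-units"


def profile_link(profile_uri: str) -> str:
    """Build a Link header value using the 'profile' relation type."""
    return f'<{profile_uri}>; rel="profile"'


def profiles_in(link_header: str) -> list[str]:
    """Return the targets of all links whose rel includes 'profile'.

    Minimal sketch: assumes commas only separate link values.
    """
    found = []
    for link_value in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", link_value)
        if not match:
            continue
        target, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;]+))', params)
        if rel:
            # Quoted and unquoted rel values land in different groups.
            rel_value = rel.group(1) if rel.group(1) is not None else rel.group(2)
            if "profile" in rel_value.split():
                found.append(target)
    return found


header = profile_link(UCUM_PROFILE) + ', <https://example.org/api-docs>; rel="describedby"'
print(profiles_in(header))  # ['https://example.org/profiles/ucum-units']
```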
In summary, if you think that your users will need to support automated processing of units of measure, then, in lieu of widespread best practice, it will likely be worth engaging with your user community to determine how best to meet their needs.
Looking to the future, the W3C Web of Things Interest Group has created a task force to address the challenges of semantic interoperability relating to units of measure, which may lead to the emergence of best practices that can be adopted by spatial data publishers.
Unlike administrative areas and other topographic features that have clearly defined boundaries, places often have ill-defined, fuzzy boundaries that are based on human perception of ‘place’; you can’t always define a boundary for a place. For example, Edinburgh, the named place published by Ordnance Survey, is described using only a notional point geometry; no information is provided about its geometric extent. Other examples of places with ill-defined, fuzzy geometries include the Sahara, the American West and Renaissance Italy. The relationships between places, with their ill-defined (or even absent) geometrical extents, defy description using the topological relationships that are computed mathematically from geometry.
Given the lack of existing best practice, we propose the use of a qualitative assertion based on human perceptions to relate places that are deemed to be the same: samePlaceAs.
Given that the notion of place concerns a social perspective, we consider it to be distinct from location, which is based on geometry. As a result, samePlaceAs can be used to assert the imprecise, social perceptions about the equality of places.
samePlaceAs does not overlap with the topological relationships described later in this best practice document that can be computed from geometry.
As with all assertions of an imprecise nature that lack formal semantics, samePlaceAs may have limited value for semantic reasoning. Exactly what constitutes the ‘same place’ will always be somewhat debatable. For example, is ancient Byzantium the same place as modern Istanbul? Is a historical hotel that was moved across the street to save it from demolition in a redevelopment scheme the same place that it used to be?
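As a sketch of how such a link might appear in data, the following Python builds a JSON-LD node that relates two descriptions of Edinburgh. Because samePlaceAs is only a proposal, the sketch puts it in a hypothetical `ex:` namespace rather than claiming it is part of [[SCHEMA-ORG]]; all URIs and the `add_same_place_as` helper are illustrative.

```python
import json

# Hypothetical namespace: samePlaceAs is only proposed, so it is not
# taken from schema.org here.
CONTEXT = {"schema": "https://schema.org/", "ex": "https://example.org/vocab#"}


def add_same_place_as(place: dict, other_id: str) -> dict:
    """Return a copy of a JSON-LD place node with an ex:samePlaceAs link added."""
    node = dict(place)  # shallow copy; the input node is left unchanged
    node["ex:samePlaceAs"] = list(place.get("ex:samePlaceAs", [])) + [{"@id": other_id}]
    return node


# Illustrative URIs: a named place described only by a notional point,
# linked to another dataset's description of what people perceive as the same place.
edinburgh = {
    "@id": "https://example.org/named-place/edinburgh",
    "@type": "schema:Place",
    "schema:name": "Edinburgh",
}
doc = {"@context": CONTEXT,
       **add_same_place_as(edinburgh, "https://example.org/gazetteer/edinburgh")}
print(json.dumps(doc, indent=2))
```

The link is a plain assertion: no geometries are compared, which reflects the qualitative, perception-based nature of the relation.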
[[SCHEMA-ORG]] would be a good home for this link relation type. The definition would be something like the following:
Used to relate two places that are perceived to be the same; the physical extents of the two places should be broadly comparable but need not be equal in a topological or geometric sense.
Values expected to be one of these types:
Used on these types:
However, the current definition of schema:Place is a little too general:
Entities that have a somewhat fixed, physical extension.
This definition includes anything with spatial extent (i.e. all Spatial Things); we would consider "my car keys" to be a Spatial Thing, but not a place.
A continuation of the SDW WG intends to formally define
This section gives two tables that aim to be helpful in selecting the right spatial data encoding in a given situation. There is not one most appropriate format: which format is best may depend on many things. The first table gives an overview of common spatial data formats; the second, an overview of common spatial data RDF [[RDF11-PRIMER]] vocabularies.
The first table is a matrix of the common formats, showing in general terms how well these formats help achieve goals such as discoverability, granularity etc.
Please note that all the listed formats are open and text-based.
|Usage||Representation of 0D-2D geometries, CRS and CRS transformation||Representation of Spatial Things and 0D-3D geometries. Comprehensive and supporting many use cases.||Representation of Spatial Things and 0D-3D geometries. Main focus on spatial data visualization and interaction||Representation of Spatial Things and 0D-2D geometries||Description of Spatial Things and geometries can be embedded by using mechanisms such as [[HTML-RDFa]], [[MICRODATA]], [[JSON-LD]], using vocabularies such as [[SCHEMA-ORG]]|
|Tool support||Widely supported in GIS tools. Supported by some Web libraries, usually converted to GeoJSON [[RFC7946]]. Supported by most triple stores.||Widely supported in GIS tools. Supported by some Web libraries, usually converted to GeoJSON [[RFC7946]], but not when the geometry is 3-dimensional (volumes). Supported only by triple stores supporting [[GeoSPARQL]].||Mainly supported by Earth browsers, such as Google Earth. Supported in some GIS tools.||Widely supported in Web libraries and mapping APIs.||Optimal for Web publication and discovery.|
|Link support||No||Via [[XLINK11]]||Via [[XLINK11]]||No||Yes|
|CRS support||Depends on the flavor — e.g., EWKT and [[GeoSPARQL]]'s WKT support arbitrary CRSs, and the latter defaults to WGS 84 long/lat (CRS84)||Any, and it can be explicitly specified (via attribute
||WGS 84 long/lat (CRS84) only||WGS 84 long/lat (CRS84) only||Depends on the vocabulary used — e.g., [[SCHEMA-ORG]] supports WGS 84 only|
|Axis order support||Any, but it cannot be explicitly specified — e.g., in [[SIMPLE-FEATURES]]'s WKT and EWKT it defaults to longitude/latitude, whereas in [[GeoSPARQL]]'s WKT it is determined by the CRS used||Determined by the CRS used||Longitude / latitude only, with optional altitude||Longitude / latitude only, with optional altitude||Depends on the vocabulary used — e.g., [[SCHEMA-ORG]] supports lat/long only|
|3D support||No||Yes||Yes||No||Depends on the vocabulary used — e.g., [[SCHEMA-ORG]] does not support 3D geometries|
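Because axis order is a frequent source of errors, the following Python sketch shows how one lat/long position would be written as a [[GeoSPARQL]] WKT literal, both relying on the default CRS84 (long/lat) and with an explicit EPSG:4326 CRS IRI, whose axis order is lat/long. The `wkt_point` function and the hard-coded set of lat-first CRSs are assumptions for illustration; a real implementation would take the axis order from the CRS definition.

```python
from typing import Optional

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
EPSG_4326 = "http://www.opengis.net/def/crs/EPSG/0/4326"

# Hypothetical, deliberately incomplete set of CRSs whose first axis is latitude.
LAT_FIRST_CRS = {EPSG_4326}


def wkt_point(lat: float, lon: float, crs: Optional[str] = None) -> str:
    """Return a GeoSPARQL-style WKT literal for a point.

    Without a CRS IRI the literal relies on the GeoSPARQL default (CRS84,
    long/lat). With a CRS IRI, the coordinates follow that CRS's axis order.
    """
    if crs is None:
        return f"POINT({lon} {lat})"
    first, second = (lat, lon) if crs in LAT_FIRST_CRS else (lon, lat)
    return f"<{crs}> POINT({first} {second})"


print(wkt_point(55.95, -3.19))             # POINT(-3.19 55.95)
print(wkt_point(55.95, -3.19, EPSG_4326))  # <http://www.opengis.net/def/crs/EPSG/0/4326> POINT(55.95 -3.19)
```

The same position (roughly central Edinburgh) produces two different coordinate orders, which is why a literal without its CRS is ambiguous to a consumer that does not know the convention in use.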
The following table compares common spatial data vocabularies and what you can do with them.
Additional vocabularies can be discovered from Linked Open Vocabularies (LOV), using search terms like 'location' and 'place', or the tags Geography, Geometry and Time.
|Description||Includes terms for describing location and temporal information, as classes
||A widely used vocabulary, although not an official standard, for specifying point coordinates in the WGS 84 datum.||Includes terms for describing postal addresses and 0D geometries (points).||
Vocabulary defined by the W3C Geospatial Incubator Group (GeoXG) for the representation of geospatial properties of Web resources.
On 28 March 2017, [[GeoRSS]] was proposed as a candidate OGC Community Standard.
|Designed for annotating Web pages with machine-readable metadata, it supports a number of classes and properties for specifying location information, including geometries. See for more information.||Official OGC standard, defining a set of terms and functions for modeling and querying spatial information. Coordinates are encoded by using WKT or [[GML]].||Defines a set of general terms for describing location information that can be extended based on domain-specific requirements. Covers geographical names, geometries, and postal addresses.||Reuses [[DCTERMS]], [[OWL-TIME]], and [[LOCN]] for describing location and temporal information, and it defines additional terms, namely,
|Properties to associate Spatial Things with geometries||-||
||Geometries are represented with the
||Geometries are represented with a literal encoding of point coordinates||
|CRS support||-||WGS 84 only||WGS 84 only||WGS 84 only||WGS 84 only||Any||Any (depends on how the geometry is represented)||Any (depends on how the geometry is represented)|
|Axis order support||-||lat/long only||lat/long only||lat/long only||lat/long only||Determined by the CRS used||Any (depends on how the geometry is represented)||Any (depends on how the geometry is represented)|
|0D support||-||lat/long coordinate pair (
||lat/long coordinate pair||lat/long coordinate pair||[[GML]], WKT||[[GML]], WKT, GeoJSON [[RFC7946]],
||[[GML]], WKT, GeoJSON [[RFC7946]]|
|1D and 2D support||-||-||-||lat/long coordinate pairs, separated by a comma||lat/long coordinate pairs, separated by a comma or a space||[[GML]], WKT||[[GML]], WKT, GeoJSON [[RFC7946]]||[[GML]], WKT, GeoJSON [[RFC7946]]|
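Several of the vocabularies in the table write coordinates as lat/long pairs, whereas GeoJSON [[RFC7946]] expects long/lat. The following Python sketch converts a flat list of lat/long numbers, separated by spaces and/or commas, into a GeoJSON LineString. The function name and the example coordinates are hypothetical, and the sketch assumes that the numbers strictly alternate latitude, longitude.

```python
def latlon_to_geojson_line(text: str) -> dict:
    """Convert 'lat long lat long ...' text to a GeoJSON LineString.

    Accepts spaces and/or commas as separators. GeoJSON positions are
    [longitude, latitude], so each pair is swapped.
    """
    numbers = [float(tok) for tok in text.replace(",", " ").split()]
    if len(numbers) < 4 or len(numbers) % 2 != 0:
        raise ValueError("expected at least two complete lat/long pairs")
    coordinates = [[numbers[i + 1], numbers[i]] for i in range(0, len(numbers), 2)]
    return {"type": "LineString", "coordinates": coordinates}


print(latlon_to_geojson_line("45.256 -110.45 46.46 -109.48"))
# {'type': 'LineString', 'coordinates': [[-110.45, 45.256], [-109.48, 46.46]]}
```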
The list below describes the main benefits of applying the Spatial Data on the Web Best Practices. The benefits are identical to those defined in [[DWBP]]. Each benefit represents an improvement in the way spatial datasets are made available on the Web.
The following table relates Best Practices and Benefits.
The figure below shows the benefits that data publishers will gain with adoption of the Best Practices.
The list below illustrates how the requirements defined in [[SDW-UCR]] are met by a combination of the best practices defined in this document (Spatial Data Best Practices) and those defined in [[DWBP]] (General Data Best Practices).
|Requirements||Spatial Data Best Practice||General Data Best Practice|
Absolute positional accuracy: The closeness of reported coordinate values to values accepted as or being true [[ISO-19159-2]].
Axis order: The order in which coordinates are presented. For example, some systems use (latitude, longitude) rather than (longitude, latitude). The latter is more similar to the mathematical convention of (x, y) ordering. The order used may differ from the order used to define the coordinate system.
Coordinate Reference System (CRS): A coordinate system to locate entities of interest with respect to an object using a datum [[ISO-19111]]. If the entities of interest and the object and datum are in the real world, the CRS is a Spatial Reference System (SRS). If the object is the Earth, the SRS is a Geo-Spatial Reference System (GRS). A GRS may be local, regional or global in scope. An example of a CRS that is not an SRS is the wavelength of a signal in the electromagnetic spectrum.
Coverage: A coverage is a function that describes characteristics of real-world phenomena that vary over space and/or time. Typical examples are temperature, elevation and precipitation. A coverage is typically represented as a data structure containing a set of such values, each associated with one of the elements in a spatial, temporal or spatiotemporal domain. Typical spatial domains are point sets (e.g. sensor locations), curve sets (e.g. contour lines), grids (e.g. orthoimages, elevation models), etc. A property whose value varies as a function of time may be represented as a temporal coverage or time-series [[ISO-19109]].
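To make the idea of a coverage as a function concrete, the following Python sketch models a tiny gridded coverage, such as an elevation model, and evaluates it at a position in its domain. The `GridCoverage` name and its layout (square cells, lower-left origin, rows stored from the bottom) are assumptions for illustration; real coverage encodings define these details themselves.

```python
from dataclasses import dataclass


@dataclass
class GridCoverage:
    """Hypothetical minimal 2D grid coverage: a function from (x, y) to a value.

    Assumed layout: square cells of side cell_size, origin at the lower-left
    corner of the grid, and values[row][col] with row 0 as the bottom row.
    """
    origin_x: float
    origin_y: float
    cell_size: float
    values: list[list[float]]

    def evaluate(self, x: float, y: float) -> float:
        """Return the value of the cell containing (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((y - self.origin_y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[row])):
            raise ValueError(f"({x}, {y}) is outside the coverage domain")
        return self.values[row][col]


# A 2 x 2 elevation grid with 10-unit cells (illustrative values).
elevation = GridCoverage(0.0, 0.0, 10.0, [[12.0, 15.0], [18.0, 21.0]])
print(elevation.evaluate(15.0, 5.0))  # 15.0
```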
Comma-Separated Values (CSV): A file format for tabular data in which each row is written on a separate line and each cell is separated from the next with a comma; see [[RFC4180]]. CSV is just one variety of tabular data; for more information refer to [[TABULAR-DATA-PRIMER]].
Datum: Parameter or set of parameters that define the position of the origin, the scale, and the orientation of a coordinate system [[ISO-19111]].
Dimension (geometry): In physics and mathematics, the dimension of a mathematical space (or object) is informally defined (see Wikipedia entry) as the minimum number of coordinates needed to specify any point within it. Thus, a point has no dimension (0D) as there is no inside, whereas a line has a dimension of one (1D) because only one coordinate is needed to specify a point along it – for example, the point at 5 on a number line. A surface such as a plane or the surface of a cylinder, torus or sphere has a dimension of two (2D) because two coordinates are needed to specify a point on it – for example, both a latitude and longitude are required to locate a point on the surface of a sphere. The inside of a cube, cylinder, torus or sphere is three-dimensional (3D) because three coordinates are needed to locate a point within these spaces. For a formal rigorous mathematical definition see the ISO definition [[ISO-19107]].
Discrete Global Grid System: A DGGS is a form of Earth reference that represents the Earth with a tessellation of nested cells, unlike its established counterpart, the coordinate reference system, which represents the Earth as a continuous lattice of points. Generally, a DGGS will exhaustively partition the globe in closely packed hierarchical tessellations, each cell representing a homogeneous value, with a unique identifier or indexing that allows for linear ordering, parent-child operations, and nearest neighbour algebraic operations.
Ellipsoid: An ellipsoid is a closed quadric surface that is a three-dimensional analogue of an ellipse. In geodesy, a reference ellipsoid is a mathematically defined surface that approximates the geoid.
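A reference ellipsoid is usually specified by two defining parameters. As a worked example, the following Python sketch derives the semi-minor axis b = a(1 − f) and the first eccentricity squared e² = f(2 − f) from the WGS 84 defining parameters, a = 6378137 m and 1/f = 298.257223563. The function name is our own.

```python
# WGS 84 defining parameters.
WGS84_A = 6378137.0          # semi-major axis, metres
WGS84_INV_F = 298.257223563  # inverse flattening


def derived_parameters(a: float, inv_f: float) -> tuple[float, float]:
    """Return (semi-minor axis b, first eccentricity squared e2) of an ellipsoid."""
    f = 1.0 / inv_f
    b = a * (1.0 - f)
    e2 = f * (2.0 - f)  # equivalent to (a**2 - b**2) / a**2
    return b, e2


b, e2 = derived_parameters(WGS84_A, WGS84_INV_F)
print(f"b = {b:.4f} m, e2 = {e2:.8f}")
```

The semi-minor axis comes out roughly 21 km shorter than the semi-major axis, a small flattening that nevertheless matters for accurate positioning.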
Extensible Markup Language (XML): A simple, very flexible text-based markup language derived from SGML (ISO 8879). It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable [[XML11]].
Extent: The area covered by something. Within this document, we always imply spatial extent; e.g. size or shape that may be expressed using coordinates.
Feature: Abstraction of real world phenomena. A digital representation of a real world entity or an abstraction of the real world. Examples of features include almost anything that can be placed in time and space, including desks, buildings, cities, trees, forest stands, ecosystems, delivery vehicles, snow removal routes, oil wells, oil pipelines, oil spills, and so on. The terms feature and object are often used synonymously [[ISO-19101-1-2014]].
Geocoding: Forward geocoding, often just referred to as geocoding, is the process of converting addresses into geographic coordinates. Reverse geocoding is the opposite process; converting geographic coordinates to addresses. See also the ISO definition [[ISO-19133]].
Geographic information (also geospatial data): Information concerning phenomena implicitly or explicitly associated with a location relative to the Earth. [[ISO-19101-1-2014]].
Geographic information system (GIS): An information system dealing with information concerning phenomena associated with locations relative to the Earth. [[ISO-19101-1-2014]].
Geohash: A specific geocoding system with a hierarchical spatial data structure which subdivides space into nested regions. Geohashes and some other geocoding systems offer properties like arbitrary precision and the possibility of repeatedly truncating characters from the end of the code to reduce its size and precision. As a consequence of the gradual precision degradation, nearby places will often (but not always) present similar prefixes. The longer a shared prefix is, the closer the two places are. Coordinate and address systems generally do not have this property. (Source: Wikipedia).
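The following Python sketch implements the usual geohash encoding: it repeatedly halves the longitude and latitude intervals, interleaving one bit from each (starting with longitude), and maps every five bits to a character of the geohash base-32 alphabet. Truncating the result yields a shorter, coarser code for a cell that contains the original one, which is the prefix property described above. The function name and default precision are our own choices.

```python
# Geohash base-32 alphabet (omits a, i, l and o).
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a lat/long position (degrees) as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        interval, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (interval[0] + interval[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            interval[0] = mid  # keep the upper half
        else:
            bits <<= 1
            interval[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every five bits become one character
            chars.append(GEOHASH_ALPHABET[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```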
Geoid: An equipotential surface where the gravitational field of the Earth has the same value at all locations. This surface is perpendicular to a plumb line at all points on the Earth's surface and is roughly equivalent to the mean sea level excluding the effects of winds and permanent currents such as the Gulf Stream.
Geometry: An ordered set of n-dimensional points in a given coordinate reference system; can be used to model the spatial extent or shape of a Spatial Thing.
Internet of Things (IoT): The network of physical objects or "things" embedded with electronics, software, sensors, and network connectivity, which enables these objects to be controlled remotely and to collect and exchange data.
Latitude: The angular distance north or south of the equator. Often abbreviated to Lat.
Link: A typed connection between two resources that are identified by Internationalized Resource Identifiers (IRIs) [[RFC3987]], and consists of: (i) a context IRI, (ii) a link relation type, (iii) a target IRI, and (iv) optionally, target attributes. Note that in the common case, the IRI will also be a URI [[RFC3986]], because many protocols (such as HTTP) do not support dereferencing IRIs [[RFC5988]].
Linked data: The term ‘Linked Data’ refers to an approach to publishing data that puts linking at the heart of the notion of data, and uses the linking technologies provided by the Web to enable the weaving of a global distributed database [[LDP-PRIMER]].
Longitude: The angular distance east or west of the prime meridian. Often abbreviated to Long.
Map Projection: A coordinate conversion from an ellipsoidal coordinate system to a plane, e.g. Transverse Mercator.
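Transverse Mercator is fairly involved, so as a simpler illustration of a conversion to a plane, the following Python sketch applies the spherical ("pseudo") Mercator formulas used by many Web maps (EPSG:3857): x = R·λ and y = R·ln tan(π/4 + φ/2), with R set to the WGS 84 semi-major axis. It is a sketch only: it treats the Earth as a sphere and rejects the poles, where y is unbounded; the function name is our own.

```python
import math

R = 6378137.0  # WGS 84 semi-major axis, used here as the sphere radius (metres)


def web_mercator(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project lat/long (degrees) to spherical Mercator x, y (metres)."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("the poles cannot be represented in Mercator")
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


print(web_mercator(55.95, -3.19))  # approximately Edinburgh
```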
Open-world assumption (OWA): In a formal system of logic used for knowledge representation, the open-world assumption asserts that the truth value of a statement may be true irrespective of whether or not it is known to be true. This assumption codifies the informal notion that in general no single agent or observer has complete knowledge. In essence, from the absence of a statement alone, a deductive reasoner cannot (and must not) infer that the statement is false. That is, a valid response to a logical query may be: true, false or unknown.
Projected Coordinate Reference System: A coordinate reference system derived from a two-dimensional geodetic coordinate reference system by applying a map projection.
Resource Description Framework (RDF): A directed, labeled graph data model for representing information in the Web. It may be serialized in several data formats such as N-Triples [[N-TRIPLES]], XML [[RDF-SYNTAX-GRAMMAR]], Terse Triple Language (“turtle” or TTL) [[TURTLE]] and [[JSON-LD]].
Semantic Web: The term “Semantic Web” refers to the World Wide Web Consortium's vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data.
SensorThings API: An open, geospatial-enabled and unified way to interconnect the Internet of Things (IoT) devices, data, and applications over the Web. [[SENSORTHINGS]].
Sensor Observation Service (SOS): A standardized HTTP interface allowing requests for observations across the Web using platform-independent calls. Sensor Observation Service [[SOS]].
SPARQL: A query language for RDF; it can be used to express queries across diverse data sources [[SPARQL11-OVERVIEW]].
Spatial data: Data describing anything with spatial extent; i.e. size, shape or position. In addition to describing things that are positioned relative to the Earth (also see geospatial data), spatial data may also describe things using other coordinate systems that are not related to position on the Earth, such as the size, shape and positions of cellular and sub-cellular Spatial Things described using the 2D or 3D Cartesian coordinate system of a specific tissue sample.
Spatial database: A spatial database, or geodatabase, is a database that is optimized to store and query data that represents objects defined in a geometric space. Most spatial databases allow representation of simple geometric objects such as points, lines and polygons and provide spatial query functions to determine spatial relationships (overlaps, touches, etc.).
Spatial Data Infrastructure (SDI): An ecosystem of geographic data, metadata, tools, applications, policies and users that are necessary to acquire, process, distribute, use, maintain, and preserve spatial data. Due to its nature (size, cost, number of interactors) an SDI is often government-related.
Spatial operator, spatial query function: Function or procedure that has at least one spatial parameter in its domain or range [[ISO-19107]].
Spatial relation, spatial relationship: Specifies how a Spatial Thing is located in space in relation to another Spatial Thing. Typically determined using a spatial operator.
Spatial thing: Anything with spatial extent (i.e. size, shape, or position); it is a combination of the real-world phenomenon and its abstraction (the feature). Examples are: people, places, or bowling balls.
This is different from the [[ISO-19107]] definition of a Spatial Object which is a geometry or a topology object.
Temporal thing: Anything with temporal extent, i.e. duration. e.g. the taking of a photograph, a scheduled meeting, a GPS time-stamped track-point. [[W3C-BASIC-GEO]]
Triple-store (or quadstore): A triple-store or RDF store is a purpose-built database for the storage and retrieval of RDF subject-predicate-object “triples” through semantic queries. Many implementations are actually “quad-stores” as they also hold the name of the graph within which a triple is stored.
Universe of discourse: view of the real or hypothetical world that includes everything of interest [[ISO-19101-1-2014]].
Web Coverage Service (WCS): A service offering multi-dimensional coverage data for access over the Internet. [[WCS]]
Web Feature Service (WFS): A standardized HTTP interface allowing requests for geographical features across the Web using platform-independent calls. [[WFS]].
Web Map Service (WMS): A standardized HTTP interface for requesting geo-registered map images from one or more distributed spatial databases. [[WMS]]
Web Map Tile Service (WMTS): A standardized HTTP interface for requesting tiled, geo-referenced map images from one or more distributed spatial databases. [[WMTS]]
Web Processing Service (WPS): An interface standard which provides rules for standardizing inputs and outputs (requests and responses) for invoking spatial processing services, such as polygon overlay, as a Web service. [[WPS]]
Well Known Text (WKT): A text mark-up language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems. (Sources: [[ISO-19162]], [[SIMPLE-FEATURES]], Wikipedia entry).
The editors gratefully acknowledge the contributions made to this document by all members of the working group; especially, the contributions received from those listed in the Contributors list.
This document would not have been possible without the tremendous efforts of the Data on the Web Working Group; their [[DWBP]] provides the essential underpinnings for our own work. Special thanks are due to Newton Calegari, Riccardo Albertoni, Annette Greiner, Antoine Isaac, and Eric Stephan.
The editors are also grateful for comments received from Ig Ibert Bittencourt, Marco Brattinga, Martin Desruisseaux, Neil McNaughton, Simeon Nedkov, James Passmore, Stefan Proell, Maik Riechert and Erik Wilde.
The editors also gratefully acknowledge the chairs of this Working Group: Ed Parsons and Kerry Taylor — and staff contacts Phil Archer and François Daoust.
A full change-log is available on GitHub
The document has been updated to take into account further support for spatial and temporal aspects added in the new version of the W3C Data Catalog Vocabulary (DCAT) [[VOCAB-DCAT-2]] and GeoDCAT-AP [[GeoDCAT-AP-20201223]].
locn:geometry has been replaced with the more specific property dcat:bbox, defined in [[VOCAB-DCAT-2]]. Moreover, the original GeoJSON datatype URI used in the example (corresponding to the GeoJSON IANA Media Type URL) has been replaced with geosparql:geoJSONLiteral, included in the draft of the new version of [[GeoSPARQL]] (see issues opengeospatial/ogc-geosparql/issues/1 and opengeospatial/ogc-geosparql/issues/48), and already adopted in [[GeoDCAT-AP-20201223]].
dcat:centroid) and bounding boxes (
:spatialAccuracy, which are available from the reference [[VOCAB-DQV]] examples in the relevant section.
Additional changes concern editorial fixes (typos, broken links, and styling). These included a fix to issue #1037, to ensure that all examples are numbered. As a result, the numbering of examples changed.
No major changes have been introduced since publication on 11 May 2017. Main updates were made in response to public reviews to clarify that the best practices do not cover advanced scenarios, e.g. involving critical decision making, in section 3.1 Spatial data, and to note the absence of scientific formats to encode spatial data on the Web in section A. Applicability of common formats to implementation of best practices.
The most obvious change to readers is that the best practices have been reordered with the intent to improve the readability of the document, and the empty stubs of best practices removed in the previous release are now gone. The fragment-identifiers for the best practices remain unchanged, but the numbers are different. The mapping (from old number to new) is as follows:
Two new sections and two new best practices were added:
Section 11. How to use these best practices (link to previous WD version) has been removed.
Significant updates to the following best practices:
Most of the other best practices received minor additions and improvements, without significant change to their contents.
Content was added to the How to test and Benefits sections of all Best Practices.
The Conclusions section was renamed "Gaps in current practice" and content added.
Section A. Applicability of common formats to implementation of best practices was updated; it now has one table listing spatial data formats and one listing spatial data vocabularies.
Section C. Cross reference of use case requirements against best practices was expanded to include cross-reference from both this document and [[DWBP]].
The Glossary was updated.
Plus minor, mostly editorial changes.
Significant updates to the following best practices:
The following best practices have been removed or merged into other best practices:
Section 14. Narrative — the Nieuwhaven flooding (link to previous WD version) has been removed.
Appendix B: Authoritative sources of geographic identifiers has been merged into Best Practice 14: Publish links between Spatial Things and related resources.
Significant updates to:
Significant updates to:
(further updates to these best practices are expected in the next WD release, circa end January 2017)
Plus minor changes that include adding a list of the most important best practices for data publishers that start from an existing SDI to section 9, and changing a few best practice titles to include the word spatial.
The document has undergone substantial changes since the first public working draft. Below are some of the changes made: