This document advises on best practices related to the publication and usage of spatial data on the Web; the use of Web technologies as they may be applied to location. The best practices are intended for practitioners, including Web developers and geospatial experts, and are compiled based on evidence of real-world application. These best practices suggest a significant change of emphasis from traditional Spatial Data Infrastructures by adopting a Linked Data approach. As location is often the common factor across multiple datasets, spatial data is an especially useful addition to the Linked Data cloud; the 5 Stars of Linked Data paradigm is promoted where relevant.

This Working Draft incorporates changes based on Working Group discussions during and since TPAC 2016 (see for details). This document is still very much a work in progress; we anticipate three further iterations of the Spatial Data on the Web Best Practices before the Working Group ends in June 2017. Our aim remains to provide actionable advice and guidance to practitioners (e.g. those directly publishing spatial data on the Web themselves, or those developing software tools to assist that publication) - which means that the omission of examples in many of the best practices will be resolved before final publication. We intend to refer to examples “in the wild” in an effort to provide evidence that each Best Practice is being applied.

Looking to future releases, the editors anticipate:

  1. Further consolidation of best practices (there is still some overlap with [[DWBP]] to be resolved).
  2. Best practices to be reordered to help readers understand where to prioritize their efforts when publishing spatial data on the Web - which will change the numbering of the best practices.
  3. Further development of the section How to use these best practices, which is intended to help a would-be spatial data publisher navigate through the best practices from here and [[DWBP]] by presenting them with questions they should consider and helping them to identify which best practices they should prioritize. That section is only about one-quarter complete and does not yet cover topics like the choice of data format or vocabulary.

Specifically, we ask reviewers to consider ISSUE 208 in Best Practice 7: Use globally unique persistent HTTP URIs for spatial things regarding the use of “indirect identifiers” (as discussed in [[WEBARCH]] section 2.2.3 Indirect Identification).

Changes made in this and previous releases are recorded in Appendix F - Changes since previous versions.

The editors would like to thank everyone for their feedback - and to encourage reviewers to continue this critique.

For OGC: This is a Public Draft of a document prepared by the Spatial Data on the Web Working Group (SDWWG) - a joint W3C-OGC project (see charter). The document is prepared following W3C conventions. The document is released at this time to solicit public comment.

Introduction

Increasing numbers of Web applications provide a means of accessing data. From simple visualizations to sophisticated interactive tools, there is a growing reliance on data. The open data movement has led to many national, regional and local governments publishing their data through portals. Scientific and cultural heritage data is increasingly published on the Web for reuse by others. Crowd-sourced and social media data are abundant on the Web. Sensors, connected devices and services from domains such as energy, transport, manufacturing and healthcare are increasingly integrated using the Web as a common data sharing platform.

The Data on the Web Best Practices [[DWBP]] provide a set of recommendations that are applicable to the publication of all types of data on the Web. Those best practices cover aspects including data formats, data access, data identifiers, metadata, licensing and provenance.

Location information, or spatial data, is often a common thread running through such data; describing how things are positioned relative to the Earth in terms of coordinates and/or topology.

Within this document our focus is the somewhat broader concern of spatial data; data that describes anything with spatial extent (i.e. size, shape or position) whether or not it is positioned relative to the Earth.

As with the challenges identified in [[DWBP]] for publishing data on the Web in general, there is a lack of consistency in how people publish spatial data, which means that the full potential of the Web as a data sharing platform is not being realized.

It is not that there is a lack of spatial data on the Web; the maps, satellite and street level images offered by search engines are familiar and there are many more examples of spatial data being used in Web applications.

However, the data that has been published is difficult to find and often problematic to access for non-specialist users. The key problems we are trying to solve in this document are discoverability, accessibility and interoperability. Our overarching goal is to enable spatial data to be integrated within the wider Web of data, providing standard patterns and solutions that help solve these problems.

Audience

Our goal in writing this best practice document is to support the practitioners who are responsible for publishing their spatial data on the Web or developing tools to make it easy for others to work with spatial data.

We expect readers to be familiar both with the fundamental concepts of the architecture of the Web [[WEBARCH]] and the generalized best practices related to the publication and usage of data on the Web [[DWBP]].

We aim to provide two primary pathways into these best practices:

  1. for those already familiar with publishing data on the Web who want to better exploit the spatial aspects of their data; and
  2. for those who publish spatial data through Spatial Data Infrastructures and want to better integrate that data within the wider Web ecosystem.

In each case, we aim to help them provide incremental value to their data through application of these best practices.

This document provides a wide range of examples that illustrate how these best practices may be applied using specific technologies. We do not expect readers to be familiar with all the technologies used herein; rather, we expect that readers can identify with the activities being undertaken in the various examples and, in doing so, find relevant technologies that they are already aware of or discover technologies that are new to them.

In this document we focus on the needs of data publishers and the developers that provide tools for them. That said, we recognize that value can only be gained from publishing spatial data when people use it! Although we do not directly address the needs of those users, we ask that data publishers and developers reading this document do not forget about them; moreover, that they always consider the needs of users when publishing spatial data or developing the supporting tools. All of our best practices are intended to provide guidance about publishing spatial data to improve ease of use. As we said above: the key problems we are trying to solve in this document are discoverability, accessibility and interoperability. We hope that the examples included in the urban flooding scenario will help illustrate how users may benefit from the application of these best practices.

Scope

Spatial data

All of the best practices described in [[DWBP]] are relevant to publication of spatial data on the Web. Some, such as [[DWBP]] Best Practice 4: Provide data license information, need no further elaboration in the context of spatial data. However, other best practices from [[DWBP]] are further refined in this document to provide more specific guidance for spatial data.

The best practices described below are intended to meet requirements derived from the scenarios in [[SDW-UCR]] that describe how spatial data is commonly published and used on the Web.

In line with the charter, this document provides advice on:

As stated in the charter, discussion of activities relating to rendering spatial data as maps is explicitly out of scope.

We extend [[DWBP]] to cover aspects specifically relating to spatial data, introducing new best practices only where necessary. In particular, we consider the individual resources, or Spatial Things, that are described within a dataset.

We consider a ‘programmable web’, formed by a network of connected services, products, data and devices, each doing what it is good at, to be the future of the Web. So whether we are talking about Big, Crawlable, Linked, Open, Closed, Small, Spatial or Structured Data, our starting point is that it needs to be machine-friendly and developer-friendly: making it as programmable as possible.

Best practice criteria

The best practices described in this document are compiled based on evidence of real-world application. Where the Working Group have identified issues that inhibit the use or interoperability of spatial data on the Web, yet no evidence of real-world application is available, the editors present these issues to the reader for consideration, along with any approaches recommended by the Working Group. Such recommendations will be clearly marked so that they are not confused with evidence-based best practice.

Devise a way to make best versus emerging practices clearly recognizable in this document.

The normative element of each best practice is the intended outcome. Possible implementations are suggested and, where appropriate, these recommend the use of a particular technology.

We intend these best practices to be durable; that is, to remain relevant for many years to come as the specific technologies change. However, in order to provide actionable guidance, i.e. to provide readers with the technical information they need to get their spatial data on the Web, we try to strike a balance between durable advice (that is necessarily general) and examples using currently available technologies that illustrate how these best practices can be implemented. We expect that readers will continue to be able to derive insight from the examples even when those specifically mentioned technologies are no longer in common usage, understanding that technology ‘y’ has replaced technology ‘x’.

Privacy considerations

There are many situations where the location of a person is very useful, from using a taxi hailing service to geocoding a selfie. Technology makes this location information easy to collect and share. However, spatial data has particular characteristics which make its use potentially more complex. For example, a single location of an anonymous tracked mobile phone may cause few privacy concerns, but the same phone tracked over a few days could provide enough information to make it possible to identify its user. As with all personally identifiable information, great care must be taken, as the collection, management and security of such information are subject to legal frameworks. We do not attempt to provide guidance on the legal aspects of storing potentially personally identifiable spatial information; expert legal advice should be obtained. In summary: legal and privacy considerations relating to spatial data are out of scope.

Best Practices Summary

This document contains a variety of best practices related to the publication and usage of spatial data on the Web. First, it continues with several more in-depth introductions on spatial things and geometry, coverages, spatial relations, coordinate reference systems, linked data, and Spatial Data Infrastructures. Then it describes how these best practices can be used, depending on your starting point and context. After that, the best practices themselves are described. They are about metadata, quality, versioning, identifiers, vocabularies, (API) access, linking, and large datasets.

The following best practices can be found in this document:

Spatial Things, Features and Geometry

In spatial data standards from the Open Geospatial Consortium (OGC) and the 19100 series of ISO geographic information standards from ISO/TC 211 the primary entity is the feature. [[ISO-19101]] defines a feature as an: “abstraction of real world phenomena”.

This terse definition is a little confusing, so let’s unpack it.

Firstly, it talks about “real world phenomena”; that’s everything from highways to helicopters, parking meters to postcode areas, water bodies to weather fronts and more. These can be physical things that you can touch (e.g. a phone box) or an abstract concept that has spatial extent (e.g. a postcode area). Features can even be fictional (e.g. “Dickensian London”) and may even lack any concrete location information such as the mythical Atlantis.

The key point is that these “features” are things that one talks about in the universe of discourse - which is defined in [[ISO-19101]] as the “view of the real or hypothetical world that includes everything of interest”.

Secondly, the definition of feature talks about “abstraction”. Take the example of Eddystone Lighthouse. A helicopter pilot might see it as a “vertical obstruction” and be interested in attributes such as its height and precise location, whereas a sailor may see it as a “maritime navigation aid” and need information about its light characteristic and general location. Depending on one’s set of concerns, only a subset of the attributes of a given “real world phenomenon” is relevant. In the case of Eddystone Lighthouse, we have defined two separate “abstractions”. As is common practice in many information modelling activities, the common sets of attributes for a given “abstraction” are used to define classes. In the parlance of [[ISO-19101]], such a class is known as a “feature type”.

Although the exact semantics differ a little, there is a good correlation between the concept of “feature type” as defined in spatial data standards and the concept of “class” defined in [[RDF-SCHEMA]]. The former is an information modelling construct that binds a fixed set of attributes to an identified resource, whereas the latter defines the set of all resources that share the same group of attributes.

When combined with the open-world assumption embraced by RDF Schema and the Web Ontology Language (OWL) [[OWL2-OVERVIEW]], the set-based approach to classes provides more flexibility when combining information from multiple sources. For example, the “Eddystone Lighthouse” resource can be seen as both a “vertical obstruction” and a “maritime navigation aid” as it meets the criteria for membership of both sets. Conversely, this flexibility makes it much more difficult to build software applications as there is no guarantee that an information resource will specify a given attribute. Web standards such as the Shapes Constraint Language [[SHACL]] are being defined to remedy this issue.
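To make the set-based view concrete, here is a minimal Python sketch (not a real RDF store and not SHACL) that holds statements as (subject, predicate, object) tuples. All class and property names prefixed `ex:` are hypothetical, and the helper functions are ours. The lighthouse is a member of both classes at once, yet nothing guarantees that the attributes an application expects for each class are present; the `missing_properties` check plays, in a greatly simplified way, the role that shape constraints are meant to play.

```python
# Statements as (subject, predicate, object) tuples.
# All "ex:" class and property names are hypothetical.
RDF_TYPE = "rdf:type"
LIGHTHOUSE = "http://d-nb.info/gnd/1067162240"

graph = {
    (LIGHTHOUSE, RDF_TYPE, "ex:VerticalObstruction"),
    (LIGHTHOUSE, RDF_TYPE, "ex:MaritimeNavigationAid"),
    (LIGHTHOUSE, "ex:focalHeightMetres", "41"),
}

# Properties a (hypothetical) application needs for members of each class.
REQUIRED = {
    "ex:VerticalObstruction": {"ex:height"},
    "ex:MaritimeNavigationAid": {"ex:lightCharacteristic"},
}


def types_of(subject, triples):
    """All classes the subject is asserted to be a member of."""
    return {o for s, p, o in triples if s == subject and p == RDF_TYPE}


def missing_properties(subject, triples):
    """For each class of the subject, the required properties it lacks.

    Under the open-world assumption a missing statement is unknown, not false,
    so this reports what an application cannot rely on, not an error in the data.
    """
    present = {p for s, p, o in triples if s == subject}
    report = {}
    for cls in types_of(subject, triples):
        gap = REQUIRED.get(cls, set()) - present
        if gap:
            report[cls] = gap
    return report


print(sorted(types_of(LIGHTHOUSE, graph)))
print(missing_properties(LIGHTHOUSE, graph))
```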

However, the term “feature” is also commonly used to mean a capability of a system, application or component. Also, in some domains and/or applications no distinction is made between "feature" and the corresponding real-world phenomenon.

To avoid confusion, we adopt the term “spatial thing” throughout the remainder of this best practice document. “Spatial thing” is defined in [[W3C-BASIC-GEO]] as “Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract areas like cubes”.

The concept of “spatial thing” is considered to include both "real-world phenomena" and their abstractions (e.g. “feature” as defined in [[ISO-19101]]). Furthermore, we treat it as inclusive of other commonly used definitions; e.g. Feature from [[NeoGeo]], described as “A geographical feature, capable of holding spatial relations”.

A spatial thing may move. We must take care not to oversimplify our concept of spatial thing by assuming that it is equivalent to definitions such as Location (from [[DCTERMS]]) or Place (from [[SCHEMA-ORG]]), which are respectively described as “A spatial region or named place” and "Entities that have a somewhat fixed, physical extension".

How do we ensure alignment with the terminology being used in the further development of GeoSPARQL? We expect a new spatial ontology to be published which will contain clear and unambiguous definitions for the terms used therein.

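Looking more closely, it is important to note that geometry is typically a property of a spatial thing. For example, the position of Eddystone Lighthouse can be given as a GeoJSON point geometry (GeoJSON lists longitude before latitude):

{
  "geometry": {
    "type": "Point",
    "coordinates": [-4.268, 50.184]
  }
}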

In actual fact, this is only one geometry that may be used to describe Eddystone Lighthouse. Other geometries might include a 2D polygon that defines the footprint of the lighthouse in a horizontal plane and a 3D solid describing the volumetric shape of the lighthouse.

Furthermore, these geometries may be subject to change due to, say, a resurvey of the lighthouse. In such a situation, the geometry object would be updated, but the spatial thing that we are talking about is still Eddystone Lighthouse. Following the best practices presented below, we use an HTTP URI to unambiguously identify Eddystone Lighthouse: http://d-nb.info/gnd/1067162240 (URI sourced from Deutsche Nationalbibliothek).

We say that the spatial thing is disjoint from the geometry object. The spatial thing, Eddystone Lighthouse (http://d-nb.info/gnd/1067162240), is the “real world phenomenon” about which we want to state facts (such as that its focal height is 41 meters above sea level) and link to other real world phenomena (for example, that it is located at Eddystone Rocks, Cornwall; another spatial thing, identified as http://sws.geonames.org/2650253 by GeoNames).
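As a minimal sketch of keeping the spatial thing distinct from its geometry, the Python snippet below builds a GeoJSON Feature [[RFC7946]] whose `id` is the Eddystone Lighthouse URI and whose geometry is just one member that can be swapped out after a resurvey. The helper `make_feature`, the property names `focalHeightMetres` and `locatedAt`, and the "resurveyed" coordinates are hypothetical and for illustration only.

```python
import json

# Identifiers quoted in the text above.
LIGHTHOUSE_URI = "http://d-nb.info/gnd/1067162240"
EDDYSTONE_ROCKS_URI = "http://sws.geonames.org/2650253"


def make_feature(thing_uri, geometry, properties):
    """GeoJSON Feature whose "id" identifies the spatial thing itself;
    the geometry is only one member of the Feature and can be replaced."""
    return {"type": "Feature", "id": thing_uri, "geometry": geometry, "properties": properties}


feature = make_feature(
    LIGHTHOUSE_URI,
    {"type": "Point", "coordinates": [-4.268, 50.184]},  # [longitude, latitude]
    {
        "name": "Eddystone Lighthouse",
        "focalHeightMetres": 41,           # hypothetical property name
        "locatedAt": EDDYSTONE_ROCKS_URI,  # hypothetical property name
    },
)

# After a (hypothetical) resurvey only the geometry object changes;
# these coordinates are made up for illustration.
feature["geometry"] = {"type": "Point", "coordinates": [-4.2681, 50.1842]}

assert feature["id"] == LIGHTHOUSE_URI  # the identity of the thing is unaffected
print(json.dumps(feature, indent=2))
```

Using the URI of the spatial thing as the Feature `id`, rather than an identifier for the geometry, is what lets other datasets keep linking to the lighthouse while its geometry is revised.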

Coverages: describing properties that vary with location (and time)

Many aspects of spatial things can be described with single-valued, static properties. However, in some applications it is more useful to describe the variation of property values in space and time. Such descriptions are formalized as coverages. Users of spatial information may employ both viewpoints.

So what is a coverage? As defined by [[ISO-19123]] it is simply a data structure that maps points in space and time to property values. For example, an aerial photograph can be thought of as a coverage that maps positions on the ground to colors. A river gauge maps points in time to flow values. A weather forecast maps points in space and time to values of temperature, wind speed, humidity and so forth. One way to think of a coverage is as a mathematical function, where data values are a function of coordinates in space and time.
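The "coverage as a function" view can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of [[ISO-19123]]: the domain is a list of sampling times at a hypothetical river gauge, the range is the corresponding flow values (invented for the example), and evaluating the coverage between samples uses linear interpolation, which is an assumption of this sketch rather than something the coverage model prescribes.

```python
from bisect import bisect_left
from datetime import datetime


class TimeSeriesCoverage:
    """Minimal coverage: a domain (sample times) paired with a range (values).

    evaluate(t) makes the coverage behave like a function f(t).
    Linear interpolation between samples is an assumption of this sketch.
    """

    def __init__(self, times, values):
        if not times or len(times) != len(values):
            raise ValueError("domain and range must be non-empty and the same length")
        if any(a >= b for a, b in zip(times, times[1:])):
            raise ValueError("sample times must be strictly increasing")
        self.times = list(times)
        self.values = list(values)

    def evaluate(self, t):
        if t < self.times[0] or t > self.times[-1]:
            raise ValueError("t lies outside the coverage domain")
        i = bisect_left(self.times, t)
        if self.times[i] == t:  # t is exactly a sampling time
            return self.values[i]
        t0, t1 = self.times[i - 1], self.times[i]
        v0, v1 = self.values[i - 1], self.values[i]
        frac = (t - t0) / (t1 - t0)  # timedelta / timedelta -> float
        return v0 + frac * (v1 - v0)


gauge = TimeSeriesCoverage(
    [datetime(2016, 1, 1, 0), datetime(2016, 1, 1, 1), datetime(2016, 1, 1, 2)],
    [12.0, 14.0, 13.0],  # hypothetical flow values in m^3/s
)
print(gauge.evaluate(datetime(2016, 1, 1, 0, 30)))  # 13.0
```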

Sometimes you’ll hear the word “coverage” used synonymously with “gridded data” or “raster data” but this isn’t really accurate. You can see from the above paragraph that non-gridded data (like a river gauge measurement) can also be modelled as coverages. Nevertheless, you will often find a bias toward gridded data in discussions (and software) that concern coverages.

A coverage is not itself a spatial thing. The definition above presents a coverage as a data construct - in which case, it does not exist in the real world. Accordingly, we might say in the hydrology example, where a river gauge measures flow values at regular sampling times, that the “river segment” (a spatial thing) has a property “flow rate” that is expressed as coverage data.

Spatial things and coverages may be related in several ways:

A coverage can be defined using three main pieces of information:

Usually, the most complex piece of information in the coverage is the definition of the domain. This can vary quite widely from coverage type to coverage type, as the list above shows. For this reason, coverages are often defined by the spatiotemporal geometry of their domain. You will hear people talking about “multidimensional grid coverages” or “time-series coverages” or “vertical profile coverages” for example.

Spatial relations

A spatial relation specifies how an object is located in space in relation to a reference object. Commonly used types of spatial relations are: topological, directional and distance relations.
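Each of these kinds of relation can be computed from coordinates. The following Python sketch assumes a spherical Earth (mean radius 6,371 km) and points given as longitude, latitude in decimal degrees; the function names and the simple bounding-box test (which ignores boxes crossing the 180° meridian) are illustrative choices, not taken from any particular library. Topological relations between arbitrary geometries are considerably richer than a point-in-box test.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical model is an assumption


def distance_m(lon1, lat1, lon2, lat2):
    """Distance relation: great-circle (haversine) distance in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lon1, lat1, lon2, lat2):
    """Directional relation: initial bearing from point 1 to point 2,
    in degrees clockwise from north (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360) % 360


def within_bbox(lon, lat, bbox):
    """Topological relation (simplified): is the point inside a
    (west, south, east, north) box? Does not handle antimeridian crossing."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


print(round(distance_m(0, 0, 0, 1)))  # one degree of latitude, about 111 km
print(bearing_deg(0, 0, 1, 0))        # due east
```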

Do we also need to talk about spatial relationships? And how they are related to spatial things and geometries?

Coordinate Reference Systems (CRS)

Introduction to CRS does not yet cover non-geographic cases.

Best Practice scope is "spatial data" - which includes non-geographic location (e.g. where things aren't positioned relative to the Earth). For example, we have a microscopy use case where the locations of cells are described.

One of the most fundamental aspects of publishing spatial data, data about location, is how to express and share the location in a consistent way. In almost all cases where you are publishing data for use by the wider Web community, the use of latitude and longitude coordinates (Lat and Long) is most appropriate. Latitude and longitude coordinates are global and offer a level of precision well suited to many applications: you can express a location to within a few metres, which is perfect for locating your favorite coffee shop, geocoding a photograph or capturing an augmented reality Pokémon hiding in your local park.
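How precise is "a few metres"? A rough calculation: on a spherical Earth of mean radius 6,371 km, one degree of latitude spans about 111 km, so each extra decimal place divides the ground distance by ten; east-west spacing also shrinks with the cosine of latitude. The Python sketch below tabulates this. The function name is ours and the spherical model is an approximation, not a geodetic calculation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; spherical approximation (assumption)
METRES_PER_DEGREE_LAT = EARTH_RADIUS_M * math.pi / 180  # about 111 km


def resolution_m(decimal_places, latitude=0.0):
    """Approximate ground distance represented by the last decimal place.

    Returns (north-south, east-west) resolution in metres. East-west
    resolution shrinks with cos(latitude) because meridians converge.
    """
    step_deg = 10 ** -decimal_places
    ns = step_deg * METRES_PER_DEGREE_LAT
    ew = ns * math.cos(math.radians(latitude))
    return ns, ew


for places in range(3, 7):
    ns, ew = resolution_m(places, latitude=51.5)
    print(f"{places} decimal places: ~{ns:.2f} m N-S, ~{ew:.2f} m E-W at latitude 51.5")
```

At five decimal places the step is roughly 1 m, which is usually ample for the applications mentioned above.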

As with everything to do with spatial data, things can get more complicated. One of the most common problems occurs because not all Coordinate Reference Systems (CRS) agree how to express latitude and longitude coordinates. Some CRS order the coordinates Lat/Long while others use Long/Lat; some use decimal degrees while others use degrees, minutes and seconds (dms). Axis order mistakes can mean the difference between, say, a position in the Netherlands or somewhere in Somalia, while encoding coordinates in decimal degrees when dms is expected can lead to positional errors on the kilometer scale.
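To see why mixing up dms and decimal degrees causes kilometre-scale errors, the Python sketch below converts dms to signed decimal degrees and measures what happens when a latitude written as 51°30'00" is mistakenly read as 51.30 decimal degrees. The function name is ours, and the figure of 111.195 km per degree of latitude assumes a spherical Earth.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; "S" and "W" give negative values.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# 51°30'00"N is 51.5 decimal degrees; reading it as 51.30 decimal degrees
# gives a latitude error of 0.2 degrees.
correct = dms_to_decimal(51, 30, 0, "N")
misread = 51.30
error_km = abs(correct - misread) * 111.195  # ~111.195 km per degree of latitude
print(f"{correct} vs {misread}: error of about {error_km:.0f} km")
```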

Therefore it is very important to provide explicit information to your users about how coordinates are encoded. For example, this snippet of results from the Google Geocoding API makes explicit which is the latitude and which is the longitude coordinate.

{
  "formatted_address": "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA",
  "geometry": {
    "location": {
      "lat": 37.4224764,
      "lng": -122.0842499
    },
    "location_type": "ROOFTOP",
    "viewport": {
      "northeast": {
        "lat": 37.4238253802915,
        "lng": -122.0829009197085
      },
      "southwest": {
        "lat": 37.4211274197085,
        "lng": -122.0855988802915
      }
    }
  }
}

Other mechanisms include using a data format that specifies how the coordinates are included (such as GeoJSON [[RFC7946]] where section 4. Coordinate Reference System specifies coordinate order of longitude and latitude using units of decimal degrees) or by having your data explicitly reference the CRS definition you're using. See Best Practice 17: State how coordinate values are encoded for more information.
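Converting between such conventions is where axis-order mistakes tend to creep in. The following is a minimal Python sketch, assuming input objects with explicit `lat` and `lng` keys like the Geocoding snippet above, that produces a GeoJSON Point with the longitude-first order required by [[RFC7946]]. The function name is hypothetical, and the range check only catches swaps that push latitude beyond ±90°, so it is a safeguard rather than a guarantee.

```python
import json


def latlng_to_geojson_point(location):
    """Convert an object with explicit "lat"/"lng" keys into a GeoJSON Point,
    whose coordinates must be ordered [longitude, latitude]."""
    lat, lng = float(location["lat"]), float(location["lng"])
    # Catches some, not all, swapped inputs (only |lat| > 90 is detectable).
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("latitude or longitude out of range")
    return {"type": "Point", "coordinates": [lng, lat]}


point = latlng_to_geojson_point({"lat": 37.4224764, "lng": -122.0842499})
print(json.dumps(point))  # {"type": "Point", "coordinates": [-122.0842499, 37.4224764]}
```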

Users of spatial data are often interested in the third dimension too: vertical elevation (or altitude). For most situations, we can consider elevation to be the vertical distance above (or below) mean sea level. The elevation is most often expressed in meters (but this can vary between CRS definitions) and is provided as a third value in a coordinate position.

The following is a little more technical; and in most cases this should only be for information.

Latitude, longitude and elevation measurements express a position on the surface of the Earth. But to define this position we need to state where we are making the measurements from (e.g. the equator, the prime meridian and the approximated surface of the Earth, or geoid) and consider the shape of the earth (a flattened sphere with lumps and bumps, but for convenient mathematical operations, usually approximated to an ellipsoid). This information is used to define the geodetic datum which provides the basis of every Coordinate Reference System.
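The role of the ellipsoid can be illustrated by converting latitude, longitude and height into Earth-centred, Earth-fixed (ECEF) Cartesian coordinates using the WGS 84 ellipsoid parameters (semi-major axis 6,378,137 m, inverse flattening 298.257223563). This Python sketch uses the standard closed-form conversion; the function name is ours. Note that the height here is measured above the ellipsoid, which differs from height above mean sea level (approximated by the geoid) by up to roughly 100 m depending on location.

```python
import math

# WGS 84 ellipsoid defining parameters.
WGS84_A = 6_378_137.0               # semi-major axis in metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h_m=0.0):
    """Latitude, longitude (degrees) and ellipsoidal height (metres)
    -> Earth-centred, Earth-fixed X, Y, Z in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime vertical radius of curvature: distance from the ellipsoid surface
    # to the polar axis, measured along the ellipsoid normal.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z


print(geodetic_to_ecef(50.184, -4.268))  # Eddystone Lighthouse, on the ellipsoid
```

At the equator the result lies 6,378,137 m from the Earth's centre, while at a pole it is only about 6,356,752 m away: that 21 km difference is the flattening which the ellipsoid captures and a simple sphere does not.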

The World Geodetic System 1984 (WGS 84) Coordinate Reference System is used in almost all cases where spatial data is published on the Web. Since this is also used by the GPS system, it's handy for all those mobile Apps!

Most people can stop reading now, but of course there are going to be a few cases where WGS 84 is not appropriate.

In many parts of the world location data has been collected using local coordinate systems that are specific to particular countries or regions. These local coordinate systems may use projected measurements defined on a flat, two-dimensional surface (which are easier to use for calculating distances than angular measurements and are essential when making topographic maps).

Users of spatial data should be aware that projected coordinate systems distort distances and angular measurements and accordingly affect how the true size of countries and other large-scale entities is perceived. CNN explore some of the challenges relating to map projections in their article What's the real size of Africa?.

So it may be that you have information in a projected CRS, rather than global latitude and longitude - what should you do? You can publish data as is in one of these many projected CRS, but you need to tell users which particular CRS is being used. A good directory of Coordinate Reference Systems is maintained by the International Association of Oil and Gas Producers: the EPSG Geodetic Parameter Dataset.

It is common for a CRS to be described by its EPSG code. For example, 2-dimensional WGS 84 (Lat/Long) is EPSG:4326, 3-dimensional WGS 84 (Lat/Long/Elevation) is EPSG:4979, Web-Mercator (a global projected CRS used in most Web-mapping applications) is EPSG:3857 and OSGB 1936 / British National Grid (a national projected CRS) is EPSG:27700.
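An EPSG code is helpful, but a resolvable URI makes the CRS reference unambiguous on the Web. The Python sketch below assumes the OGC definitions URI pattern `http://www.opengis.net/def/crs/EPSG/0/{code}` and builds a GeoSPARQL-style WKT literal (the CRS URI in angle brackets followed by the WKT geometry) for a 2D point. The table records the axis order that the EPSG dataset defines for each code named above; the helper names and the short labels are ours.

```python
# Assumed OGC definitions-server URI pattern for EPSG-registered CRSs.
OGC_EPSG_URI = "http://www.opengis.net/def/crs/EPSG/0/{code}"

# EPSG codes mentioned above, with the axis order defined by the EPSG dataset.
KNOWN_CRS = {
    4326: ("WGS 84", ("latitude", "longitude")),
    4979: ("WGS 84 (3D)", ("latitude", "longitude", "ellipsoidal height")),
    3857: ("WGS 84 / Pseudo-Mercator", ("easting", "northing")),
    27700: ("OSGB 1936 / British National Grid", ("easting", "northing")),
}


def crs_uri(code):
    """Resolvable URI for an EPSG code."""
    return OGC_EPSG_URI.format(code=code)


def wkt_point_literal(code, first, second):
    """GeoSPARQL-style WKT literal for a 2D point, prefixed with its CRS URI.

    The two values must follow the axis order of the CRS, as listed in KNOWN_CRS.
    """
    name, axes = KNOWN_CRS[code]  # KeyError for codes outside this small table
    if len(axes) != 2:
        raise ValueError(f"EPSG:{code} ({name}) is not a 2D CRS")
    return f"<{crs_uri(code)}> POINT({first} {second})"


print(KNOWN_CRS[27700][1])
print(wkt_point_literal(27700, 516076, 170953))
```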

Definitions of coordinate reference systems are available from the Open Geospatial Consortium CRS Register, Spatial Reference and EPSG.io (an open-source web service which simplifies discovery of coordinate reference systems utilized worldwide).

Alternatively, you can re-project your coordinates to WGS 84 using many available tools online. So, for example, the location at 516076, 170953 in British National Grid (EPSG:27700) coordinates is -0.331841, 51.425708 in WGS 84 Long/Lat. This conversion is a useful step as it makes your data more accessible to global users. So if you can do so, it is helpful to publish data in both local (projected) and global coordinates.
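Converting from British National Grid involves both a change of projection and a datum shift (from OSGB 1936 to WGS 84), which is best left to an established library such as PROJ. A simpler illustration of re-projection is the conversion between WGS 84 longitude/latitude and Web-Mercator (EPSG:3857), which uses the WGS 84 datum with spherical Mercator formulas. The Python sketch below implements the forward and inverse conversion; the function names are ours, and production code should rely on a maintained library.

```python
import math

R = 6_378_137.0  # WGS 84 semi-major axis, used as the sphere radius by EPSG:3857
# Latitude limit that makes the Web-Mercator world map square (about 85.0511 degrees).
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def lonlat_to_web_mercator(lon, lat):
    """WGS 84 longitude/latitude in degrees -> EPSG:3857 x/y in metres."""
    if abs(lat) > MAX_LAT:
        raise ValueError("latitude outside the Web-Mercator range")
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """EPSG:3857 x/y in metres -> WGS 84 longitude/latitude in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


# Round trip using the WGS 84 position quoted above.
x, y = lonlat_to_web_mercator(-0.331841, 51.425708)
lon, lat = web_mercator_to_lonlat(x, y)
print(f"x={x:.1f} m, y={y:.1f} m -> lon={lon:.6f}, lat={lat:.6f}")
```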

Re-projecting to a better known CRS is also a necessary step if you are publishing data in the form of engineering or Computer Aided Design (CAD) drawings of a new building or road layout for example. Usually these drawings are made using a very local coordinate reference system for the site itself, so the data will need to be reprojected to “fit” with existing data.

So we are now at the point where almost everyone publishing spatial data on the Web can stop reading. But for those with specific requirements concerning high precision locations, there are a few more topics that need to be mentioned.

If you need to be able to measure in terms of a few centimetres or less, then things are more complicated. With this level of precision you need to take into account both a more sophisticated model of the shape of the Earth and plate tectonics.

For these more complex use cases other reference systems with alternative geodetic datums are used. The geodetic datum can be thought of as the model of the Earth's surface over which the coordinate reference system is applied. Different datums use different models for the precise shape and size of the Earth in order to provide more accurate horizontal or vertical measurements at different positions on the globe (because depending on your location, different ellipsoids will provide a better approximation of the local Earth's surface - but this is at the expense of a poorer match elsewhere).

While WGS 84 provides a reasonable fit at all points on the Earth's surface, many other datums are defined for improved fit within a regional or national area. For example in Europe a system called ETRS89 (EPSG:4258) can be used instead of WGS 84, while in North America a similar system called NAD-83 (EPSG:4269) is used. So it might be that you have measurements made using these reference systems. Here the best practice is once more to be explicit in describing the CRS used, but also to be careful when re-projecting to different systems, as the required accuracy may be lost.

Finally another issue is that points on the surface of the earth are actually moving relative to the coordinate system, due to geologic processes. You may think this is of interest only to geologists, but when I tell you that Australia has moved around 1.5m since the framework was last reset 20 years ago, and remind you that we are entering the age of self-driving cars, then you will probably think again. Re-calculating the datum from time to time, or maybe continuously such as in the case of the dynamic New Zealand Geodetic Datum (NZGD2000), really does matter for some applications. See Best Practice 3: Choose the coordinate reference system to suit your user's applications for more information.
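Using the approximate figures quoted above (about 1.5 m over 20 years) and assuming uniform motion, the average drift rate works out as:

```latex
v \approx \frac{1.5\,\mathrm{m}}{20\,\mathrm{yr}} = 0.075\,\mathrm{m\,yr^{-1}} = 7.5\,\mathrm{cm\,yr^{-1}}
```

In other words, within a single year the offset between a coordinate referenced to a fixed datum and the ground beneath it already exceeds the "few centimetres" precision that high-accuracy applications require.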

Detailed discussion of coordinate reference system definitions themselves is beyond the scope of this best practice document. Should this topic be of interest, please refer to specialist documentation such as [[OGC-TOPIC-2]] / [[ISO-19111]].

Linked Data

The term ‘Linked Data’ refers to an approach to publishing data that puts linking at the heart of the notion of data, and uses the linking technologies provided by the Web to enable the weaving of a global distributed database. By naming real world entities - be they web resources, physical objects such as the Eiffel Tower, or even more abstract things such as relations or concepts - with URLs data can be published and linked in the same way web pages can. [[LDP-PRIMER]]

The 5-star scheme at 5 Star Data states:

★ make your stuff available on the Web (whatever format) under an open license

★★ make it available as structured data (e.g., Excel instead of image scan of a table)

★★★ make it available in a non-proprietary open format (e.g., CSV as well as of Excel)

★★★★ use URIs to denote things, so that people can point at your stuff

★★★★★ link your data to other data to provide context

We think that the concept of Linked Data is fundamental to the publishing of spatial data on the Web: it is the links that connect data together that are foundational to the Web of data.

These best practices promote a Linked Data approach.

Sources such as the Best Practices for Publishing Linked Data [[LD-BP]] assert a strong association between Linked Data and the Resource Description Framework (RDF) [[RDF11-PRIMER]]. Yet we believe that Linked Data requires only that the formats used to publish data support Web linking (see [[WEBARCH]] section 4.4 Hypertext). 5 Star Data (based on [[5STAR-LOD]]) asserts only that data formats be open and non-proprietary (★★★), and implies the need for data formats to support use of URIs as identifiers (★★★★) and Web linking (★★★★★).

Within this document we include examples that use RDF and related technologies such as triple stores and SPARQL because we see evidence of their use in real world applications that support Linked Data. However, we must make clear to readers that there is no requirement for all publishers of spatial data on the Web to embrace the wider suite of technologies associated with the Semantic Web; we recognize that in many cases, a Web developer has little or no interest in the toolchains associated with the Semantic Web because of the complexity they add to any Web-centric solution.

Although we think that Linked Data need not necessarily require the use of RDF, it is probably the most common representation. We note that [[JSON-LD]] provides a bridge between those worlds by providing a data format that is compatible with RDF but relies on standard JSON tooling.
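As a small illustration of that bridge, the Python sketch below builds a JSON-LD description of Eddystone Lighthouse using nothing but the standard `json` module. The use of schema.org terms (`Place`, `geo`, `GeoCoordinates`, `containedInPlace`) is an assumption made for this example, and choosing `containedInPlace` for the link to Eddystone Rocks is our own modelling decision; the two URIs are the ones introduced earlier in this document. An RDF-aware consumer can interpret the `@context` and `@id` keys as a graph, while a plain JSON consumer reads the same document as ordinary JSON.

```python
import json

# Minimal JSON-LD using schema.org terms (vocabulary choice is an assumption);
# the identifiers are the URIs used earlier in this document.
lighthouse = {
    "@context": "https://schema.org/",
    "@id": "http://d-nb.info/gnd/1067162240",
    "@type": "Place",
    "name": "Eddystone Lighthouse",
    "geo": {
        "@type": "GeoCoordinates",
        "latitude": 50.184,
        "longitude": -4.268,
    },
    # Web link to another spatial thing, identified by its GeoNames URI.
    "containedInPlace": {"@id": "http://sws.geonames.org/2650253"},
}

document = json.dumps(lighthouse, indent=2)
print(document)

# A plain JSON consumer needs no RDF tooling to read the same document.
parsed = json.loads(document)
print(parsed["geo"]["latitude"], parsed["geo"]["longitude"])
```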

Furthermore, as the examples in this document illustrate, we often see a ‘hybrid’ approach being used in real-world applications: using RDF to work with graphs of information that interlink resources, while relying on other technologies to query and process the spatial aspects of that information for performance reasons.

Why are traditional Spatial Data Infrastructures not enough?

Finding, accessing and using data disseminated through spatial data infrastructures (SDI) based on OGC web services is difficult for non-expert users. There are several reasons, including:

However, spatial data infrastructures are a key component of the broader spatial data ecosystem. Such infrastructures typically include workflows and tools related to the management and curation of spatial datasets, and provide mechanisms to support the rich set of capabilities required by the expert community. Our goal is to help spatial data publishers build on these foundations to enable the spatial data from SDIs to be fully integrated with the Web of data.

When your starting point is a spatial data infrastructure, you should at least read the following best practices. These provide the most important extra steps that should be taken in order to bring spatial data from spatial data infrastructures to the Web:

The rest of the best practices provide more detail on specific aspects of publishing spatial data on the Web, such as metadata, geometries, CRS information, versioned data, and so on.

How to use these best practices

Section is incomplete.

We estimate that this section covers only a quarter of the "spatial data publication pathway" that we are trying to help would-be spatial data publishers navigate. More material describing the full range of considerations when publishing spatial data on the Web will be added in the next public draft.

What are the starting points?

Preparations for publishing spatial data on the Web need to start somewhere. Typically, your spatial data will be in the following places:

  1. plain text documents; e.g. historical texts, government reports, blog posts etc.
  2. data files containing structured content or markup; e.g. geospatial vector data in Shapefile or [[GML]] format, statistical data in tabular CSV format or a spreadsheet, GPS data in GPX format with “waypoints” and “tracks”, satellite imagery in GeoTIFF, climate simulations in CF-NetCDF etc.
  3. a data repository; e.g. PostGIS (a spatially enabled relational database), Elasticsearch (a document-oriented NoSQL repository based on Apache Lucene), Apache Jena’s TDB (an RDF triple store)
  4. exposed via an existing API; including OGC-compliant web services such as WFS and WCS

If your spatial data is managed within a software system it is likely that you will be able to access that data through one or more of the methods identified above; as structured data from a bulk extract (e.g. a “data dump”), via direct access to the underpinning data repository or through a bespoke or standards-compliant API provided by the system.

As working with specific spatial data management systems is beyond the scope of this best practice document we will assume that one of the four methods identified above is your starting point.

Each of these starting points has its own challenges, but working with plain text documents can be particularly tricky as you will need to parse the natural language to identify the spatial things and their properties before you can proceed any further. Natural Language Processing (NLP) is also beyond the scope of this best practice document, so we will assume that you’ve already completed this step and have parsed any plain text documents into structured data records of some kind.

What are you talking about?

The Web is an information space in which the items of interest, referred to as resources, are identified by URIs ([[WEBARCH]] §1. Introduction). The spatial data you want to publish is one such resource. Depending on the nature of your spatial data, it may be a single dataset or a collection of datasets. [[VOCAB-DCAT]] provides a useful definition of dataset: “A collection of data, published or curated by a single agent, and available for access or download in one or more formats.”

Deciding whether your spatial data is a single dataset or not is somewhat arbitrary. To decide this, it is often useful to consider attributes such as the license under which the data will be made available, the refresh or publication schedules, the quality of the data and the governance regime applied in managing the data. Typically, all of these attributes should be consistent within a single dataset.

As a first step in publishing your spatial data on the Web, we need to stitch your data into the Web’s information space by assigning a URI to each dataset (see [[DWBP]] Best Practice 9: Use persistent URIs as identifiers of datasets). Furthermore, if you anticipate your data changing over time and you want users to be able to refer to a particular version of your dataset you should also consider assigning a URI to each version of the dataset (see [[DWBP]] Best Practice 11: Assign URIs to dataset versions and series).

[[DWBP]] section 8.6 Data Versioning provides further guidance on working with versioned resources: providing metadata to indicate the particular version of a given dataset resource (see [[DWBP]] Best Practice 7: Provide a version indicator) and enabling users to browse the history of your dataset (see [[DWBP]] Best Practice 8: Provide version history).

We also need to look inside the datasets at the resources described within your data. If you want these resources to be visible within the Web’s information space, by which we mean that others can refer to or talk about those resources, then they must also be assigned URIs (see [[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets). These URIs are like 'Web-scale foreign keys' that enable information from different sources to be stitched together.

In spatial data, our primary concern is always the spatial things; these are the things with spatial extent (i.e. size, shape, or position) that we talk about in our data - anything from physical things like people, places and post boxes to abstractions such as administrative areas. Spatial things should always be assigned URIs (see Best Practice 7: Use globally unique persistent HTTP URIs for spatial things) - potentially reusing existing URIs that are already in common usage. A common pattern used when assigning URIs to spatial things is to append the locally-scoped identifiers used within the dataset to a URI path within an internet DNS domain where one has administrative rights to publish content.

Depending on how you organize your data, it may also be helpful to give your geometry objects URIs. For example, you may want to reuse a line string when describing the boundaries of adjacent administrative areas, or you may need to serve geometry data from an alternate URL because property data and geometry data are managed in different systems. Essentially, if you want to refer to a resource on the Web, you need to assign a URI to it.
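A minimal sketch of this URI pattern, assuming a hypothetical base URI (`https://data.example.org/id/`) on a domain where you have publishing rights and a hypothetical `mint_uri` helper. It also shows the geometry case: two adjacent administrative areas refer to one shared boundary line string by its URI rather than each holding a copy.

```python
from urllib.parse import quote

# Hypothetical base URI on a DNS domain where you have rights to publish.
BASE_URI = "https://data.example.org/id/"


def mint_uri(kind: str, local_id: str) -> str:
    """Append a dataset's local identifier to a URI path for the given kind of resource."""
    # Percent-encode the identifier so characters such as spaces or "/" stay inside one path segment.
    return f"{BASE_URI}{kind}/{quote(str(local_id), safe='')}"


# Two adjacent administrative areas refer to the same boundary geometry by URI.
shared_boundary = mint_uri("geometry", "boundary-17")
areas = {
    mint_uri("area", "A1"): {"name": "Area One", "boundary": shared_boundary},
    mint_uri("area", "A2"): {"name": "Area Two", "boundary": shared_boundary},
}
for uri, area in areas.items():
    print(uri, "->", area["boundary"])
```

Whatever scheme you choose, it needs to stay stable over time so that the URIs remain persistent.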

Who is your audience?

Once you have determined the subjects of your spatial data, you should then consider your users - and the software tools, applications and capabilities they might have at their disposal.

Your objective should be to reduce the “friction” users face when working with your data by providing it in the form closest to what their chosen software environment supports.

It is likely that you will be able to identify your intended “community of use” - and on that basis discern how best to publish data for them. However, increasingly data is being repurposed to derive insight in ways that the original publisher had never foreseen. This “unanticipated re-use” can add significant value to your data (e.g. because you didn’t know that your data could be used that way!) but this introduces the challenge of working with a large set of unknown users, developers and devices.

So while you should always prioritize your known users when publishing spatial data on the Web (often, because they are your stakeholders and their happiness can lead to continued funding!), it will often reap dividends to “design for the masses”: providing your spatial data in a way that is most readily usable with the (geo)spatial JavaScript libraries commonly employed across the Web.

Things that you should consider when choosing how to publish your spatial data on the Web are described next …

Parse that!

For users to work with your data, software agents (a.k.a. the “machines”) need to be able to parse it - to resolve the serialized data into its component parts. You should make your data available in machine-readable, standardized data formats (see [[DWBP]] Best Practice 12: Use machine-readable standardized data formats); e.g. JSON [[RFC7159]], XML [[XML11]], CSV [[RFC4180]] and other tabular data formats, YAML [[YAML]], protocol-buffers [[PROTO3]] etc. According to the 5 Star Data [[5STAR-LOD]] scheme, using open and non-proprietary structured data formats yields a 3-star rating (★★★), so you’re well on your way to good practice.

Because Web applications are most often written in JavaScript, JSON is probably the most “frictionless” data format for Web developers. That said, other formats are reasonably simple to parse in JavaScript using widely available libraries. In some cases, there are even standards that define how this should be done (for example: [[CSV2JSON]]).
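As an illustration of how little effort such a conversion can take, the following sketch turns CSV rows into a JSON array using only Python's standard library. It is a hand-rolled conversion, not an implementation of [[CSV2JSON]], and it assumes hypothetical `lat` and `lon` columns holding decimal degrees (the station names and coordinates are made up).

```python
import csv
import io
import json

# Illustrative CSV with one spatial thing per row (hypothetical data).
CSV_TEXT = """id,name,lat,lon
1,Station North,52.3791,4.9003
2,Station South,52.3389,4.8730
"""


def csv_to_json(text: str) -> str:
    """Convert CSV rows to a JSON array of objects, casting coordinates to numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Without this cast, every CSV value would become a JSON string.
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        rows.append(row)
    return json.dumps(rows)


print(csv_to_json(CSV_TEXT))
```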

You should also consider whether there are any attributes of these machine-readable standardized data formats that offset a little inconvenience for your data user. For example, protocol-buffers [[PROTO3]] and CBOR [[RFC7049]] (“Concise Binary Object Representation”) provide a significantly more compact encoding than JSON. The inconvenience of having to use additional libraries to parse these binary formats can be offset by much faster load times.
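The size difference can be sketched without a protocol-buffers or CBOR library: the example below packs 1,000 synthetic coordinate pairs as little-endian 32-bit floats with Python's `struct` module and compares the byte count with the equivalent JSON text. The binary layout is a stand-in for a real binary format, not an example of [[PROTO3]] or CBOR.

```python
import json
import struct

# 1,000 synthetic longitude/latitude pairs (illustrative values only).
coords = [(4.8926 + i * 0.0001, 52.3731 + i * 0.0001) for i in range(1000)]

# Text encoding: a JSON array of [lon, lat] arrays, as UTF-8 bytes.
as_json = json.dumps(coords).encode("utf-8")

# Binary stand-in: each pair packed as two little-endian 32-bit floats.
as_binary = b"".join(struct.pack("<2f", lon, lat) for lon, lat in coords)

print(len(as_json), len(as_binary))  # the binary form is 8 bytes per pair
```

Note the trade-off in this sketch: 32-bit floats keep only about seven significant digits, so the width of a binary encoding should be chosen to match the precision of your data.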

Image formats such as JPEG 2000 [[JPEG2000]] and PNG [[PNG]] can also be coerced to carry data, providing 3 or 4 channels of 8-bit data values. This can be an attractive way to encode gridded coverage data values as it is highly compact, so long as no lossy compression is applied to the “image”: while lossy compression retains visual integrity, it can ruin your data integrity. Experience indicates that network providers often do apply compression to image formats, even if you don’t want that. The key point is to ensure that you choose formats that are unaffected by the transport network.
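The following sketch shows one way a gridded value could be packed into the three 8-bit channels of a pixel, and why lossy compression is so damaging. The `OFFSET` and `SCALE` parameters are hypothetical; real encodings define their own. Because the red channel carries the most significant byte, a change of a single level in red, invisible to the eye, shifts the decoded value by 65,536 × `SCALE`.

```python
# Hypothetical encoding parameters: a value v is stored as the integer
# round((v - OFFSET) / SCALE), giving 0.1-unit steps over a 24-bit range.
OFFSET = -10000.0
SCALE = 0.1


def encode_rgb(value: float) -> tuple[int, int, int]:
    """Pack one gridded value into the R, G and B bytes of a pixel."""
    n = round((value - OFFSET) / SCALE)
    if not 0 <= n < 2**24:
        raise ValueError("value outside the encodable range")
    return (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF


def decode_rgb(r: int, g: int, b: int) -> float:
    """Recover the value; reliable only if the pixel bytes reach the user unchanged."""
    return OFFSET + ((r << 16) | (g << 8) | b) * SCALE


pixel = encode_rgb(1234.5)
print(pixel, decode_rgb(*pixel))

# A one-level change in the red channel, as a lossy codec might introduce,
# shifts the decoded value by 65536 * SCALE.
r, g, b = pixel
print(decode_rgb(r + 1, g, b) - decode_rgb(r, g, b))
```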

When selecting the data format, make sure that your community of use have access to libraries or other software components required to work with that format. Let’s take [[GeoTIFF]] as an “anti-example”: it’s the de facto format for encoding geo-referenced imagery data - such as that available from satellites - but the lack of widely available libraries for working with it in a JavaScript application makes it unsuitable for publishing spatial data on the Web. Although a developer could write a byte-level parser, doing so puts an additional burden on any re-use.

The Best Practices

Spatial Metadata

[[DWBP]] provides best practices discussing the provision of metadata to support discovery and reuse of data (see [[DWBP]] section 8.2 Metadata for more details). Providing metadata at the dataset level supports a mode of discovery well aligned with the practices used in Spatial Data Infrastructure (SDI) where a user begins their search for spatial data by submitting a query to a catalog. Once the appropriate dataset has been located, the information provided by the catalog enables the user to find a service end-point from which to access the data itself - which may be as simple as providing a mechanism to download the entire dataset for local usage or may provide a rich API enabling users to request only the parts they need. The dataset-level metadata is used by the catalog to match the appropriate dataset(s) with the user's query.

This section includes best practices for including the spatial extent, CRS, and other spatial details of the dataset in the metadata. These are the extra metadata items needed to make spatial datasets both discoverable and usable. A third best practice in this section goes a step further in granularity: exposing spatial data on the web in such a way that individual entities or "granules" within a dataset can be discovered, evaluated, and utilized.

Include spatial metadata in dataset metadata

The description of datasets that have spatial features should include explicit metadata about their spatial extent, coverage, and representation.

This best practice extends [[DWBP]] Best Practice 2: Provide descriptive metadata.

Why

Since location is such a powerful organizing principle, it is usually necessary to specifically describe the spatial details and nature of a dataset in order to discover it as well as to determine its fitness for use. This information is used, for example, by SDI catalog services that offer spatial querying to find data, but also by users to understand the nature of the dataset. In some cases, for example when dealing with crowd-sourced data, provenance information (how the dataset came to be in its published form, and with what quality) is important as well. The first level of spatial description is the spatial extent of the dataset: the area of the world that the dataset describes. This often suffices for initial discovery, but further levels of description are needed to evaluate a dataset for use. These include the dataset's spatial coverage (continuity, resolution, properties) as well as the spatial representation or geometric model (for example, grid coverage, discrete coverage, point cloud, linear network). Dataset quality measures such as positional accuracy are also important for determining applicability. In the case of datasets whose spatial characteristics vary over their temporal coverage, spatial descriptions must include an explicit temporal aspect.

Intended Outcome

Dataset metadata should include the information necessary to enable spatial queries within catalog services such as those provided by SDIs.

Dataset metadata should also include the information required for a user to evaluate whether a spatial dataset is suitable for their intended application.

Possible Approach to Implementation

When publishing a dataset, provide as much spatial metadata as necessary, but at least the spatial extent, coverage, and representation. Other examples of spatial metadata include:

  • number of dimensions (1D, 2D, 3D, 4D)
  • spatial representation type (e.g. grid, vector, text table)
  • geometric property (e.g. boundary, region, centerline, centroid, field)
  • Coordinate Reference System(s) - refer to for an introduction to that topic
  • spatial resolution - see Best Practice 5: Describe the positional accuracy of spatial data
  • spatial significance of non-spatial properties (e.g. point value, interpolation, unit average, sum)

In Spatial Data Infrastructures the accepted standard for describing metadata is [[ISO-19115]] or profiles thereof.

To provide information about the spatial attributes of the dataset on the Web one can:

  • As shown in [[DWBP]] Best Practice 2: Provide descriptive metadata: Include the spatial coverage of the features described by the dataset using [[VOCAB-DCAT]] and a reference to a named place in a common vocabulary for geospatial semantics (e.g. GeoNames).
  • Again, use [[VOCAB-DCAT]], but instead of a reference to a named place, use a set of coordinates to specify the boundaries of the area either as a bounding box (add glossary ref) or a polygon.
  • Use the spatial extension of [[VOCAB-DCAT]], [[GeoDCAT-AP]], to specify spatial attributes that are not available in [[VOCAB-DCAT]]. [[GeoDCAT-AP]] provides an RDF syntax binding for the metadata elements defined in the core profile of [[ISO-19115]] and in the INSPIRE metadata schema [[INSPIRE-MD]].
  • Use geospatial ontologies (see W3C Geospatial Incubator Group (GeoXG)'s report) to describe the spatial data for the datasets.
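A minimal sketch of the second and third options above, assuming the dataset's extent is known as a longitude/latitude bounding box. It builds a JSON-LD description in which `dct:spatial` carries a WKT polygon (via `locn:geometry`, as used in [[GeoDCAT-AP]]) and, optionally, a reference to a named place. The dataset URI and bounding box are hypothetical, and the property choices should be checked against the current [[GeoDCAT-AP]] specification before use.

```python
import json


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a bounding box as a closed WKT polygon ring.

    Coordinates are written longitude first, matching the OGC CRS84 default
    assumed for GeoSPARQL WKT literals that carry no explicit CRS.
    """
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]  # first point repeated to close
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def spatial_metadata(dataset_uri, bbox, place_uri=None):
    """Describe a dataset's spatial extent as JSON-LD using DCAT and GeoDCAT-AP terms."""
    spatial = [{
        "@type": "dct:Location",
        "locn:geometry": {"@value": bbox_to_wkt(*bbox), "@type": "gsp:wktLiteral"},
    }]
    if place_uri is not None:
        spatial.append({"@id": place_uri})  # reference to a named place
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "locn": "http://www.w3.org/ns/locn#",
            "gsp": "http://www.opengis.net/ont/geosparql#",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:spatial": spatial,
    }


# Hypothetical dataset URI and bounding box (min lon, min lat, max lon, max lat).
record = spatial_metadata("https://data.example.org/dataset/buildings",
                          (4.73, 52.28, 5.07, 52.43))
print(json.dumps(record, indent=2))
```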

How to Test

Check if the spatial metadata for the dataset itself includes the overall features of the dataset in a human-readable format.

Check if the descriptive spatial metadata is available in a valid machine-readable format.

Evidence

Relevant requirements: R-Discoverability, R-Compatibility, R-BoundingBoxCentroid, R-Crawlability, R-SpatialMetadata and R-Provenance.

Benefits

  • Reuse
  • Discoverability

(to be deleted)

Why

Intended Outcome

Possible Approach to Implementation

How to Test

Evidence

Benefits

Choose the coordinate reference system to suit your user's applications

Consider your user's intended application when choosing the coordinate reference system(s) used to publish spatial data.

Why

A multitude of coordinate reference systems exist because there is no perfect solution to meet all requirements:

  1. The Earth is a complicated shape (neither spherical nor flat!):

    For each (Earth-based) coordinate reference system, the topographical surface of the Earth is approximated to a geodetic datum that is described using an ellipsoid. The trouble with approximation is that nothing is perfect everywhere, which means that compromise is inevitable. Some datums, like WGS 84, provide a reasonable (but not highly accurate) fit everywhere on the Earth, while other datums (such as the European Terrestrial Reference System 1989, as used by the ETRS89 geographic CRS, EPSG:4258) provide a better fit in a given region at the expense of accuracy elsewhere.

    Spatial data is often projected from the curved surface of the Earth onto a flat plane (e.g. a computer screen or a topographical map) to make it easier to compute distances between positions and calculate areas. There are many choices of projection (e.g. equirectangular, Mercator, stereographic, orthographic etc.), each of which is designed for particular tasks. As with datums, projections are often chosen to better support regional, national or local needs.

    It is also worth noting that as a living planet, the Earth continues to change its shape; for example, continental drift moves Australia north-eastwards several centimeters each year and New Zealand shifts in multiple directions. To retain accuracy, datums need to be adjusted from time to time, as is the case with the New Zealand Geodetic Datum (NZGD2000), which is frequently revised to take account of earth deformations.

  2. Sometimes we don't want to measure relative to the surface of the Earth at all:

    Spatial data such as descriptions of the built environment, geological surveys, satellite imagery, etc. are often captured and stored in an engineering coordinate reference system as measurements from a local datum. For example, X Y survey coordinates relative to a building corner, pixel positions within the image swath of a satellite camera, or distance along a linear feature from a fixed origin point.
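To make the point about projections concrete, the sketch below projects two points onto a flat plane with a simple equirectangular projection and measures the distance between them with Pythagoras, then compares the result with the great-circle distance on a sphere. It assumes a spherical Earth with a mean radius of 6,371,008.8 m, so it is an illustration rather than a geodetic computation: within a city the two distances agree to well under a meter, while between Amsterdam and New York they differ by hundreds of kilometers.

```python
import math

R = 6371008.8  # mean Earth radius in meters (spherical approximation)


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points on a sphere, in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))


def equirectangular(lon, lat, lat0):
    """Project to a flat plane, scaling longitude by cos(lat0) around a reference latitude."""
    x = R * math.radians(lon) * math.cos(math.radians(lat0))
    y = R * math.radians(lat)
    return x, y


def planar_distance(lon1, lat1, lon2, lat2):
    """Distance computed with plain Pythagoras after projecting both points."""
    lat0 = (lat1 + lat2) / 2
    x1, y1 = equirectangular(lon1, lat1, lat0)
    x2, y2 = equirectangular(lon2, lat2, lat0)
    return math.hypot(x2 - x1, y2 - y1)


# Two points in Amsterdam, under 2 km apart (illustrative coordinates).
print(haversine(4.88, 52.37, 4.90, 52.38), planar_distance(4.88, 52.37, 4.90, 52.38))
# Amsterdam to New York: the flat-plane shortcut breaks down at this scale.
print(haversine(4.9, 52.37, -74.0, 40.7), planar_distance(4.9, 52.37, -74.0, 40.7))
```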

Although it is possible to convert coordinates from one CRS to another, many users will be put off by the need to do so. Furthermore, each such transformation is an opportunity for errors to be introduced into the spatial data, especially where users have limited expertise with spatial data.

When publishing spatial data, it is best to spare users the need to transform spatial data between coordinate reference systems themselves by providing data in a form, or forms, which they can use directly. To determine which coordinate reference system(s) are needed, data publishers must consider the intended applications of their user community.

Intended Outcome

Spatial data is provided in a coordinate reference system, or systems, that are sensitive to the needs of users' intended applications.

The majority of a publisher's anticipated user community do not need to transform coordinate values prior to using the spatial data.

Possible Approach to Implementation

Whichever coordinate reference system is chosen for the publication of spatial data, it is imperative that that choice is made clear to users. Please refer to Best Practice 17: State how coordinate values are encoded for further details.

The first thing that publishers of spatial data need to do is consider their audience.

When publishing spatial data on the Web, the largest community of potential users will be unknown: anyone might find and use data published on the Web! To support this unanticipated reuse, we recommend always publishing your spatial data using a global coordinate reference system, which allows spatial data from multiple sources to be readily combined for display or computation. WGS 84 Lat/Long (EPSG:4326) or WGS 84 Lat/Long/Elevation (EPSG:4979) are good choices as many of the tools and applications used by Web developers are set up to use data from GPS-enabled mobile devices that all use WGS 84. Although Web Mercator (EPSG:3857) has global coverage, data publishers should be aware that its projection treats the Earth as a sphere and is therefore not true to the shape of the Earth, thereby introducing positional differences of up to 20 kilometers when compared with WGS 84.
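The spherical treatment behind Web Mercator is visible in its formulas. The sketch below projects WGS 84 longitude/latitude to EPSG:3857 using the spherical Mercator equations with a radius of 6,378,137 m (the WGS 84 semi-major axis), and rejects latitudes beyond about ±85.05°, where the projection is conventionally cut off so that the map is square. It is meant for understanding, not as a replacement for a maintained library such as PROJ.

```python
import math

# EPSG:3857 applies spherical Mercator formulas using the WGS 84 semi-major axis as radius.
RADIUS = 6378137.0
# Latitude at which the projected map becomes square (about 85.0511 degrees).
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Project WGS 84 longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    if abs(lat) > MAX_LAT:
        raise ValueError("latitude outside the Web Mercator range")
    x = RADIUS * math.radians(lon)
    y = RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(to_web_mercator(4.8926, 52.3731))
print(to_web_mercator(180.0, MAX_LAT))  # both coordinates are about 20,037,508 m
```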

Where considerations of the known user community (or communities) call for different coordinate reference systems, we recommend publishing spatial data in multiple representations: one for each of the prioritised coordinate reference systems. Clearly, the number of representations provided needs to be determined with respect to the associated effort. However, remember that a decision not to publish data in a priority CRS will result in each member of your user community needing to do that task - or them not using your data.

Common reasons for needing to publish in additional coordinate reference systems include:

  1. publication through government data portals that require use of a projected CRS defined by the national mapping agency - and similar legislative requirements;

    The Basisregistraties Adressen en Gebouwen (BAG), or Basic Registers for Addresses and Buildings, provided by Kadaster, publishes data in both OGC CRS84 (using the WGS 84 geodetic datum) and the Amersfoort / RD (EPSG:28992) coordinate reference systems.

    The INSPIRE Directive 2007/2/EC of the European Commission requires that the European Terrestrial Reference System 1989 ETRS89 (EPSG:4258) is used for the referencing of spatial data sets.

  2. applications such as defence and precision agriculture that require coordinates to be accurate to tens of centimeters or less, thereby requiring the use of a CRS with an alternative geodetic datum that provides a superior fit for the local or regional geographic area - noting that every CRS and datum should define the geographic area within which it is intended to be used;
  3. the need to support applications that work in a local frame of reference using an engineering CRS - such as in an urban environment, inside a building complex or using chainage along a survey line;
  4. avoiding computationally intensive reprojection of raster data within end-user applications; and
  5. the need to retain the integrity of raster data by publishing in its original projection, thereby avoiding modification of pixel values due to the reprojection.

Discussion of coordinate system transformations is beyond the scope of this best practice document: converting coordinates between CRSs that use different datums and/or projections can be very involved. This is especially true where elevation values are missing from the source data. For reference, EPSG guidelines say that in such cases reasonable assumptions are:

  • Height = 0 meters (i.e. we are standing on the surface of the ellipsoid); or
  • The height is given by a digital elevation model (i.e. we are standing on the surface of the planet).

That said, we note that a number of open source software implementations are available to help users do such conversions. These include: the Geospatial Data Abstraction Library (GDAL), the Cartographic Projections Library (PROJ.4), its associated JavaScript implementation (PROJ4.JS) and the Apache Spatial Information System Library (SIS).

How to Test

Spatial data is available in at least WGS 84 Lat/Long (EPSG:4326) or WGS 84 Lat/Long/Elevation (EPSG:4979).

Evidence

Relevant requirements: R-AvoidCoordinateTransformations, R-CoordinatePrecision.

Benefits

  • Comprehension

Make your spatial data indexable by search engines

Search engines should be able to crawl spatial data on the Web and index spatial things for direct discovery by users.

Why

In SDIs information about spatial datasets is published as authoritative metadata records and collated in Web-based catalogues. This approach causes a number of problems:

  1. the catalogues are often designed to primarily support expert users - people may not even be aware of their existence;
  2. once you have discovered a dataset that meets your needs and identified where it is available from, a second step is required to access the data itself - often requiring the use of unfamiliar protocols or complex API requests; and
  3. the data itself is not indexed - discovery relies on the metadata records that are often sparsely populated or out of date.

It is widely understood that search engines are the common starting point for people looking for content on the Web. By publishing spatial data in a way that enables their crawlers to index spatial datasets, including each spatial thing, the fidelity of search results should improve. Users will be able to search directly for specific entities rather than having to look for a dataset and then parse through it; e.g. to search for "Anne Frank’s House" (https://g.co/kg/m/02s5hd) rather than looking for a dataset about "Cultural Heritage in Amsterdam" and hoping that it contains a reference to what you’re interested in.

At present, spatial information is not widely exploited by search engines. However, by increasing the volume of spatial information presented to search engines, and the consistency with which it is provided, we expect search engines to begin offering spatial search functions. We already see evidence of this in the form of contextual search, such as prioritization of search results from nearby entities. In addition, search engines are beginning to offer more structured, custom searches that return only results that include certain [[SCHEMA-ORG]] types, like Dataset, Place or City.

Intended Outcome

Information about spatial datasets and things is indexed by search engines.

Users can find spatial things using common search engines.

Possible Approach to Implementation

In general, you need to:

  1. publish an HTML Web-page for the spatial dataset and each spatial thing that it describes; and
  2. make sure that those pages can be crawled.

The Web-page for the dataset is an entry-point for humans to browse and for the search engines to crawl your data. This landing page should provide descriptive metadata that helps users evaluate whether the dataset meets their needs (see Best Practice 1: Include spatial metadata in dataset metadata and [[DWBP]] Best Practice 2: Provide descriptive metadata), and may provide links to other service end-points, APIs or tools that will help a user work with the dataset. The landing page should be indexable by the search engines so that it can be discovered too!

To enable humans and Web-crawlers to find HTML pages for the spatial things, the "landing page" needs to include hyperlinks that can be followed. Where you have a larger collection of spatial things, you should support paging through the collection.
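Paging can be as simple as the following sketch, which computes the item range and the first, previous, next and last links for one page of a collection so that each HTML page can link onward. The `?page=N` query parameter, the base URL and the default page size of 100 are hypothetical choices.

```python
import math


def page_links(base_url: str, total_items: int, page: int, page_size: int = 100):
    """Return ((start, end), links) for one 1-based page of a collection."""
    last_page = max(1, math.ceil(total_items / page_size))
    if not 1 <= page <= last_page:
        raise ValueError("page out of range")
    start = (page - 1) * page_size
    end = min(start + page_size, total_items)
    links = {"first": f"{base_url}?page=1", "last": f"{base_url}?page={last_page}"}
    if page > 1:
        links["prev"] = f"{base_url}?page={page - 1}"
    if page < last_page:
        links["next"] = f"{base_url}?page={page + 1}"
    return (start, end), links


# Hypothetical collection of 250 spatial things.
item_range, nav = page_links("https://data.example.org/buildings", total_items=250, page=2)
print(item_range, nav)
```

Rendering these links as ordinary `<a href>` elements is what lets a crawler walk the whole collection.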

You may also consider using Sitemaps to direct the Web-crawler, noting that a single sitemap file is limited to 50,000 URLs, so larger datasets need to be split across several sitemap files listed in a sitemap index file.
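The sitemaps.org protocol limits each sitemap file to 50,000 URLs and lets a sitemap index file list many sitemap files. The sketch below splits a list of spatial-thing URLs accordingly; the `build_sitemaps` helper, the file names and the location where the files are served are hypothetical.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(urls, sitemap_base, max_urls=MAX_URLS):
    """Split URLs into sitemap files plus one index file; returns {filename: xml}."""
    files = {}
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        entries = "".join(f"<url><loc>{escape(u)}</loc></url>"
                          for u in urls[start:start + max_urls])
        files[f"sitemap-{n}.xml"] = f'{XML_DECL}<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
    # The index lists every sitemap file by its absolute URL.
    index = "".join(f"<sitemap><loc>{escape(sitemap_base + name)}</loc></sitemap>"
                    for name in files)
    files["sitemap-index.xml"] = f'{XML_DECL}<sitemapindex xmlns="{SITEMAP_NS}">{index}</sitemapindex>'
    return files


# Hypothetical spatial-thing URIs and sitemap location.
thing_urls = [f"https://data.example.org/id/building/{i}" for i in range(120_000)]
sitemaps = build_sitemaps(thing_urls, "https://data.example.org/sitemaps/")
print(sorted(sitemaps))
```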

For very large datasets paging through thousands of pages is not useful for a human either. Consider supporting filtering and/or organise the spatial things into subsets, as described in section 12.6 Spatial Data Access.

A pre-condition for this best practice is Best Practice 7: Use globally unique persistent HTTP URIs for spatial things as persistent identifiers are essential to support reliable indexing and linking. Traditionally spatial datasets have not been maintained with stable identifiers for spatial things, but to share spatial data on the Web stable identifiers are a must. Sharing spatial data is more than "just" making the dataset available on the Web.

Each Web-page can likely be generated programmatically from the data you hold about the spatial thing, either directly from the data or by using an API that makes the data available on the Web.

It is important to keep in mind that the HTML representations should not mainly be designed for the search engines, but they should present the data in a clear and understandable way to human users. The page about the spatial thing should be useful to a user and encourage others to link to the page when they share other information about the spatial thing. This typically will also improve the ranking of these pages in search results.

In addition to exposing the spatial data as linked HTML Web-pages, indexing by search engines can be further enhanced by incorporating a description of the spatial thing as structured markup (in particular [[MICRODATA]] or [[JSON-LD]] annotations using [[SCHEMA-ORG]]), as this enables the search engines to infer more about your resource. It is important to note that this is not only helpful to search engines, but also to other tools that want to understand more about the semantics of the resource, for example, its location.

In [[SCHEMA-ORG]], a spatial dataset is a Dataset and a spatial thing is in general a Place or an Event. For some types of spatial things, more specific sub-types exist, for example City or Mountain.

Location information about a spatial thing is typically provided using a geometry (GeoCoordinates or GeoShape) or a PostalAddress. [[SCHEMA-ORG]] coordinates are restricted to WGS 84 with longitude and latitude. Supported geometry types are points, line strings, polygons, boxes and circles.
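A minimal sketch of this markup: the helpers below describe a spatial thing as a schema.org `Place` with `GeoCoordinates` (WGS 84 latitude and longitude) and embed it as [[JSON-LD]] in a small HTML page. The helper names, URI and coordinates are hypothetical, and a real page would carry far more human-readable content.

```python
import json
from html import escape


def place_jsonld(uri: str, name: str, lat: float, lon: float) -> dict:
    """Describe a spatial thing as a schema.org Place with WGS 84 GeoCoordinates."""
    return {
        "@context": "https://schema.org",
        "@type": "Place",
        "@id": uri,
        "name": name,
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }


def html_page(place: dict) -> str:
    """Render a minimal HTML page for humans that also embeds the JSON-LD markup."""
    # Escape "<" inside the JSON so a value can never close the <script> element early.
    data = json.dumps(place).replace("<", "\\u003c")
    name = escape(place["name"])
    geo = place["geo"]
    return (
        "<!DOCTYPE html>\n<html><head>"
        f"<title>{name}</title>"
        f'<script type="application/ld+json">{data}</script>'
        f"</head><body><h1>{name}</h1>"
        f"<p>Latitude {geo['latitude']}, longitude {geo['longitude']}</p>"
        "</body></html>"
    )


# Hypothetical URI and illustrative coordinates.
print(html_page(place_jsonld("https://data.example.org/id/place/42", "Example Place", 52.3731, 4.8926)))
```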

Through the use of [[SCHEMA-ORG]] annotations, search engines and others can connect location information with other information, e.g. about the nature of the spatial thing, opening hours, contact details, etc.

The use of [[SCHEMA-ORG]] for spatial data is in its early days and has to be understood as an "emerging practice".

The Web-pages should also provide a mechanism to download data in the formats you decide to support. [[DWBP]] Best Practice 14: Provide data in multiple formats provides guidance.

Typically, multiple formats for a resource are supported using two mechanisms: HTTP content negotiation, and format-specific file extensions such as ".json", ".xml" or ".ttl" added to the resource URI. Content negotiation is the standard mechanism of HTTP, while the format-specific URIs enable the use of clickable links to the resource in a specific format.
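The following sketch shows how a server might combine the two mechanisms: an explicit file extension wins; otherwise the `Accept` header's quality values choose among the supported media types, falling back to HTML. The extension-to-media-type mapping is a hypothetical example, and the parser is deliberately simplified (it ignores wildcards such as `application/*` and parameters other than `q`).

```python
# Hypothetical mapping from format-specific file extensions to media types.
FORMATS = {
    ".html": "text/html",
    ".json": "application/json",
    ".jsonld": "application/ld+json",
    ".ttl": "text/turtle",
}


def choose_format(path: str, accept_header: str) -> str:
    """Pick the media type to serve for a request path and Accept header."""
    # 1. A format-specific extension in the URI takes precedence.
    for ext, media_type in FORMATS.items():
        if path.endswith(ext):
            return media_type
    # 2. Otherwise pick the supported media type with the highest q-value.
    best, best_q = "text/html", 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type in FORMATS.values() and q > best_q:
            best, best_q = media_type, q
    # 3. Fall back to HTML when nothing supported was requested.
    return best


print(choose_format("/id/place/42.ttl", "text/html"))
print(choose_format("/id/place/42", "text/turtle;q=0.9, application/ld+json"))
print(choose_format("/id/place/42", "image/png"))
```

When the response depends on the `Accept` header, the server should also send a `Vary: Accept` header so that caches do not serve one format to clients that asked for another.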

Search engines may also index resource representations in other formats than HTML.

In 2016, these topics were analysed in a testbed organised by Geonovum in the Netherlands. More details can be found in reports from the testbed: Spatial Data on the Web using the current SDI and Crawlable geospatial data using the ecosystem of the Web and Linked Data.

The use of [[SCHEMA-ORG]] for describing spatial information is continually evolving; spatial data publishers should familiarise themselves with current practices. A useful Introduction to Structured Data is provided in Google's developer portal.

How to Test

Using a Web browser,

  1. search for the landing page of your dataset, and
  2. check that you can browse to human-readable HTML pages for each spatial thing that the dataset describes.

Use the search engines' consoles (e.g. their webmaster tools) to monitor the progress of indexing your Web-pages and their structured data, and try to fix any errors they report.

Evidence

Relevant requirements: R-BoundingBoxCentroid, R-Crawlability, R-Discoverability, R-Linkability, R-MachineToMachine.

Benefits

  • Discoverability

Spatial Data Quality

[[DWBP]] provides a best practice discussing how the quality of data on the web should be described (see [[DWBP]] section 8.5 Data Quality for more details). This section is based on the Data Quality section from [[DWBP]] and adds a best practice specific to spatial data.

In the Spatial Metadata section we provided a Best Practice on how to deal with CRS in spatial data on the web. There is also a clear link between CRS and data quality, because the accuracy of spatial data depends in large part on the CRS used. This can be seen as conformance of data with a "standard", in this case a (spatial or temporal) reference system. This section describes how spatial data quality can be expressed using different vocabularies, and provides an example.

Describe the positional accuracy of spatial data

Accuracy and precision of spatial data should be specified in machine-interpretable and human-readable form.

Why

The amount of detail that is provided in spatial data and the resolution of the data can vary. No measurement system is infinitely precise and in some cases the spatial data can be intentionally generalized (e.g. merging entities, reducing the details, and aggregation of the data) [[Veregin]].

It is important to understand the difference between precision and accuracy. Seven decimal places of a degree of latitude correspond to about one centimeter. Whatever the precision of the specified coordinates, the accuracy of positioning on the Earth's actual surface using WGS84 will only approach about a meter horizontally and may have apparent errors of up to 100 meters vertically, because of assumptions about reference systems, tectonic plate movements and which definition of the Earth's 'surface' is used.
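The arithmetic behind the precision figure is easy to check. The sketch below assumes a spherical Earth with the IUGG mean radius of 6,371,008.8 m. On that assumption one degree of latitude is about 111.2 km, so seven decimal places resolve about 1.1 cm. On the WGS 84 ellipsoid, the length of a degree of latitude varies by about 1% between the equator and the poles. The result describes precision only and says nothing about accuracy.

```python
import math

MEAN_EARTH_RADIUS_M = 6_371_008.8  # IUGG mean radius; assumes a spherical Earth

def meters_per_decimal_place(places: int) -> float:
    """Ground distance spanned by one unit in the given decimal place of a latitude."""
    meters_per_degree = 2 * math.pi * MEAN_EARTH_RADIUS_M / 360  # roughly 111.2 km
    return meters_per_degree * 10 ** -places

for places in range(4, 8):
    print(f"{places} decimal places: {meters_per_decimal_place(places):.4f} m")
```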

Intended Outcome

When known, the resolution and precision of spatial data should be specified in a way that allows consumers of the data to be aware of the resolution and level of detail considered in the specifications.

Possible Approach to Implementation

Describe the accuracy of spatial data in a way that is understandable for humans.

In addition, describe the accuracy of spatial data in a machine-readable format. [[VOCAB-DQV]] is such a format. It is a vocabulary for describing data quality, including the details of quality metrics and measurements.

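We need some explanations for the approaches to describe positional (in)accuracy.

The following example states that a dataset conforms with a standard, in this case Commission Regulation (EU) No 1089/2010: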

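@prefix :     <http://example.org/> .   # example namespace (hypothetical)
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The dataset declares conformance with a standard.
:myDataset a dcat:Dataset ;
  dct:conformsTo <http://data.europa.eu/eli/reg/2010/1089/oj> .

<http://data.europa.eu/eli/reg/2010/1089/oj> a dct:Standard , foaf:Document ;
  dct:title "COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en ;
  dct:issued "2010-12-08"^^xsd:date .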

The following example shows how DQV can express the precision of a spatial dataset:

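@prefix :               <http://example.org/> .   # example namespace (hypothetical)
@prefix dcat:           <http://www.w3.org/ns/dcat#> .
@prefix dqv:            <http://www.w3.org/ns/dqv#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix skos:           <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .

:myDataset a dcat:Dataset ;
   dqv:hasQualityMeasurement :myDatasetPrecision, :myDatasetAccuracy .

# A spatial resolution of 1000 metres.
:myDatasetPrecision a dqv:QualityMeasurement ;
   dqv:isMeasurementOf :spatialResolutionAsDistance ;
   dqv:value "1000"^^xsd:decimal ;
   sdmx-attribute:unitMeasure <http://www.wurvoc.org/vocabularies/om-1.8/metre> .

# The metric being measured, classified under the DQV precision dimension.
:spatialResolutionAsDistance a dqv:Metric ;
    skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
    dqv:expectedDataType xsd:decimal ;
    dqv:inDimension dqv:precision .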

This example was taken from [[VOCAB-DQV]]. For more examples of expressing spatial data precision and accuracy, see the DQV section Express dataset precision and accuracy.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-QualityPerSample.

Benefits

  • Reuse
  • Trust

Spatial Data Versioning

Spatial things and their attributes can change over time. For example, a lake may grow or shrink due to changes in climate, water extraction or any number of reasons. For many applications, it is important that information about spatial things is kept up to date. When new information is available, the data publisher may make this available on the Web according to their update schedule and policies. [[DWBP]] section 8.6 Data Versioning and Best Practice 21: Provide data up to date provide directly applicable guidance.

When dealing with change to a spatial thing, you should consider its lifecycle; in particular, how much change is acceptable before a spatial thing can no longer be considered as the same resource. Consider Eddystone Lighthouse for example: the “Eddystone Light”, a maritime navigation aid, has existed in (more or less) the same place on Eddystone Rocks since 1698. A single HTTP URI (such as http://dbpedia.org/resource/Eddystone_Lighthouse) is used to identify “the lighthouse on Eddystone rocks” for all that period. The lighthouse's attributes (such as its focal height, visible range and light characteristic) have changed over that period, but we still consider it to be the same lighthouse. However, if our interest is historic buildings, we would identify the four different structures that have stood on that site as different spatial things, from Winstanley's Eddystone Lighthouse (the first incarnation) to Douglass' Eddystone Lighthouse (the fourth and current incarnation). Treating these structures as one thing undergoing incremental change since 1698 is not appropriate; each structure replaced the previous one, and so each should be assigned its own unique identifier. In summary, different things are important to different people!

Essentially, the decision to assign a new identifier in response to change depends on how domain experts think about the lifecycle of the spatial thing, which then manifests in a data modelling choice. [[DWBP]] section 8.9 Data Vocabularies and provide further guidance on the topic of data modelling; determining which concepts and relationships should be used to describe your area of interest.

Data publishers should not attempt to guess all the purposes for which someone might use or reference their data - ending up with a super-complex data model that tries to cover every possible use case. Instead, data publishers should try to help data consumers make informed decisions about the best way to use the data by providing good metadata. When it comes to spatial things, or any resource, that changes over time, it is important to provide metadata about the life cycle of those entities and the resources used to describe them. Given that information, data consumers can make considered choices about which resource they want to link to. [[DWBP]] section 8.2 Metadata provides useful guidance.

All that said, if you consider that the change affects the fundamental nature of the spatial thing, then you should assign a new identifier. See for more details. Otherwise, read on for guidance on how to describe properties that change over time.

Describe properties that change over time

Spatial data should include metadata that allows a user to determine the time period for which it is valid.

Why

Spatial things and their attributes change over time. Mostly, users are interested in current information. They need to be able to determine whether the published description of a spatial thing meets their needs. For example, is the published geographic extent of the City of Amsterdam relevant for a land-usage study of the nineteenth century? (Gemeentegeschiedenis.nl, "Municipality History", illustrates how the extent of Amsterdam has changed during the past 200 years, in HTML and GeoJSON). Where the information is available, a user may want to browse older versions of the published information to understand the nature of any changes or to find historical information.

Intended Outcome

Users are provided with the most recent version of information about a spatial thing and its attributes by default.

Users are able to determine the time period for which data is applicable.

If a version history of changes is available, users are able to browse through a set of changes to see how a spatial thing and its attributes have changed over time.

Possible Approach to Implementation

When publishing information about a spatial thing that is subject to change there are three main approaches to consider:

  1. simply updating the description of the spatial thing in response to a change;
  2. providing a series of immutable snapshots that describe the spatial thing at various points in its lifecycle; and
  3. capturing a time-series of data values within an attribute of the spatial thing.

Whichever approach is chosen, publishers of spatial data should consider how dataset metadata plays an important part in helping users determine whether a dataset is fit for their use. Particularly where the contents of a dataset change with time, statements about the (most recent) publication date, the frequency of update and the time period for which the dataset is relevant (i.e. temporal extent) should be provided. Please refer to [[DWBP]] section 8.2 Metadata for more details about dataset metadata.
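As an illustration of such dataset metadata, the following Python sketch builds a JSON-LD description of a dataset. It uses dct:modified for the date of the most recent update and dct:accrualPeriodicity for the frequency of update. The temporal extent is given by dct:temporal, with the dcat:startDate and dcat:endDate properties defined in DCAT version 2. The dataset URI and dates are hypothetical, and other vocabulary choices are possible. The frequency URI is taken from the Dublin Core Collection Description Frequency Vocabulary.

```python
import json
from datetime import date
from typing import Optional

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

def typed_date(d: date) -> dict:
    return {"@value": d.isoformat(), "@type": "xsd:date"}

def dataset_metadata(uri: str, modified: date, frequency: str,
                     start: date, end: Optional[date] = None) -> dict:
    """Describe when a dataset last changed, how often it changes and what period it covers."""
    period = {"@type": "dct:PeriodOfTime", "dcat:startDate": typed_date(start)}
    if end is not None:  # open-ended extent if no end date is known
        period["dcat:endDate"] = typed_date(end)
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:modified": typed_date(modified),           # most recent update
        "dct:accrualPeriodicity": {"@id": frequency},   # frequency of update
        "dct:temporal": period,                         # temporal extent
    }

meta = dataset_metadata("http://example.org/dataset/lakes", date(2017, 1, 9),
                        "http://purl.org/cld/freq/monthly", date(1850, 1, 1))
print(json.dumps(meta, indent=2))
```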

A description of the lifecycle of the spatial things (e.g. what triggers a change and whether those changes are versioned etc.) should also be provided in either the dataset's metadata, schema or specification. For example, the UK's Digital National Framework policy states that data publishers must provide these lifecycle rules.

Approach (1) is lightweight and should only be used where users have no need to access older descriptions of the spatial things. Data publishers simply replace the old description of the spatial thing with the amended description and keep users informed about updates by providing the appropriate metadata (e.g. when the data was changed). This may be achieved using dataset metadata (as outlined above) or by including the metadata attributes in the description of each spatial thing.

Where users are anticipated to need to understand how a spatial thing has changed over time, approaches (2) and (3) must be considered.

Approach (2) requires the data publisher to publish immutable resources that describe the spatial thing at specific points in time (i.e. "snapshots") and provide a mechanism for users to browse between those snapshots. Given that each snapshot of the spatial thing is published as a separate resource, this approach is suited to infrequent changes so that the number of snapshots does not become unwieldy.

The URI for the spatial thing, the base URI, should resolve to provide the current information and a link to its version history of snapshots. [[DWBP]] Best Practice 8: Provide version history describes how a version history may be implemented. Each snapshot resource within the version history must be uniquely identified; a common approach is to append a date/time stamp to the base URI as a version indicator. [[DWBP]] Best Practice 7: Provide a version indicator provides relevant guidance.
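The following is a minimal sketch of approach (2), assuming the hypothetical pattern {base URI}/{UTC time-stamp} for snapshot URIs. Each snapshot URI is minted once and never reused. The base URI would lead to the most recent snapshot, and a consumer can look up the snapshot that was current at a given instant. Time-stamps must be timezone-aware datetime objects.

```python
from bisect import bisect_right
from datetime import datetime, timezone

def snapshot_uri(base_uri: str, stamp: datetime) -> str:
    """Hypothetical pattern: base URI + "/" + UTC time-stamp as version indicator."""
    return f"{base_uri}/{stamp.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}"

class VersionHistory:
    """Immutable snapshots of one spatial thing, in the order they became valid."""

    def __init__(self, base_uri: str):
        self.base_uri = base_uri
        self.stamps = []  # timezone-aware datetimes, ascending

    def add(self, stamp: datetime) -> str:
        """Register a new snapshot and return its URI."""
        if self.stamps and stamp <= self.stamps[-1]:
            raise ValueError("snapshots must be added in chronological order")
        self.stamps.append(stamp)
        return snapshot_uri(self.base_uri, stamp)

    def current(self) -> str:
        """The most recent snapshot, which the base URI should lead to."""
        return snapshot_uri(self.base_uri, self.stamps[-1])

    def valid_at(self, when: datetime) -> str:
        """The snapshot that was current at the given instant."""
        i = bisect_right(self.stamps, when)
        if i == 0:
            raise LookupError("no snapshot existed at that time")
        return snapshot_uri(self.base_uri, self.stamps[i - 1])

history = VersionHistory("http://example.org/id/lake/42")
history.add(datetime(2010, 3, 1, tzinfo=timezone.utc))
history.add(datetime(2016, 9, 15, tzinfo=timezone.utc))
print(history.valid_at(datetime(2012, 1, 1, tzinfo=timezone.utc)))
```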

Approach (3) is suitable where a spatial thing has a small number of attributes that are frequently updated; for example, the GPS position of a runner, or data streamed from a sensor such as the water level reported by a stream gauge.

With this approach, the description of the spatial thing must include a property that contains a sequentially-ordered set of data-points, each of which defines a time-stamp and the values for the time-varying attribute(s). By definition, this property can be considered as a time-series coverage. Standard data encodings are available for time-series data, including: [[TIMESERIESML]] for [[GML]], plus [[COVERAGE-JSON]] and [[SENSORTHINGS]] for JSON.
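To make the structure of approach (3) concrete, the sketch below holds a time-varying attribute as a sequentially ordered list of data points, each with a time-stamp and attribute values, for a hypothetical stream gauge. It is an in-memory illustration only, not an encoding such as [[TIMESERIESML]] or [[COVERAGE-JSON]]. The step interpolation used by value_at (the latest value at or before the requested time) is an assumption that will not suit every kind of data.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DataPoint:
    time: datetime      # time-stamp of this data point
    values: dict        # values of the time-varying attribute(s)

@dataclass
class TimeSeries:
    """Sequentially ordered data points for one spatial thing."""
    points: list = field(default_factory=list)

    def append(self, point: DataPoint) -> None:
        # Keep the series ordered so that lookups can use binary search.
        if self.points and point.time <= self.points[-1].time:
            raise ValueError("data points must be appended in time order")
        self.points.append(point)

    def value_at(self, when: datetime, name: str):
        """Latest value of `name` at or before `when` (step interpolation)."""
        i = bisect_right([p.time for p in self.points], when)
        if i == 0:
            raise LookupError("no data point at or before that time")
        return self.points[i - 1].values[name]

# Hypothetical stream gauge with a water-level time series.
gauge = {"id": "http://example.org/id/gauge/7", "waterLevel": TimeSeries()}
t0 = datetime(2017, 1, 9, 12, 0, tzinfo=timezone.utc)
gauge["waterLevel"].append(DataPoint(t0, {"level_m": 1.20}))
gauge["waterLevel"].append(DataPoint(t0.replace(minute=15), {"level_m": 1.25}))
print(gauge["waterLevel"].value_at(t0.replace(minute=10), "level_m"))
```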

The OGC [[MOVING-FEATURES-XML]] and [[MOVING-FEATURES-CSV]] specifications follow the pattern described above. A trajectory element is used to describe the position of a spatial thing, and varying attributes (such as orientation or rotation) can be added alongside the tuples in the trajectory. However, there is limited evidence of adoption outside of Japan.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-MovingFeatures, R-Streamable

Benefits

  • Comprehension
  • Trust
  • Access

Spatial Data Identifiers

The primary topics of any spatial dataset are spatial things, each described by a set of attributes and usually at least one geometry. How your spatial data is structured will depend on the vocabulary or data model you use (see for further details on vocabulary choice). This will determine the types of entities that, along with the spatial things themselves, are important enough to be given identifiers so that statements can be made about them. Geometry objects are an example of an entity that is often assigned a unique identifier so that they can be referenced or reused.

To publish spatial data on the Web, we need to stitch the spatial things and their corresponding entities into the Web’s information space; contributing to the Web of data. First: [[WEBARCH]] Good Practice: Identify with URIs states that "agents should provide URIs as identifiers for resources". Second: the 5 Star Data scheme states: "★★★★ use URIs to denote things, so that people can point at your stuff".

[[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets provides directly applicable guidance. When identifying resources, it advises:

  1. Seek and reuse existing URIs, ensuring that the URIs are persistent and they are published by a trusted group or organization; or
  2. Create your own persistent URIs.

Furthermore, given the ubiquitous use of the Hypertext Transfer Protocol (HTTP) on the Web, we SHOULD use HTTP URIs to identify resources in spatial data.

We consider identifiers in the Web’s information space to be unaffected by the choice to serve HTTP content securely or not. For example, http://example.org/country/suriname and https://example.org/country/suriname both identify the same spatial thing - in this case the South American country of Suriname.
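A data consumer that merges data from several sources may want to apply this position explicitly. The hypothetical helper below builds a comparison key in which the http and https forms of a URI coincide. It is a sketch, not full URI normalization; for example, it does not handle explicit default ports or differences in percent-encoding.

```python
from urllib.parse import urlsplit, urlunsplit

def identifier_key(uri: str) -> str:
    """Comparison key in which http:// and https:// forms of a URI coincide."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme == "https":
        scheme = "http"  # same identifier, whether or not it is served securely
    # The authority is lowercased because host names are case-insensitive;
    # path, query and fragment are left untouched.
    return urlunsplit((scheme, parts.netloc.lower(), parts.path, parts.query, parts.fragment))

print(identifier_key("https://example.org/country/suriname"))
```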

Resources identified with HTTP URIs can be specified as the target of links within the Web’s global information space, enabling information from different sources to be related and combined. This is the fundamental basis of 5★ Linked Data: "★★★★★ link your data to other data to provide context".

Use globally unique persistent HTTP URIs for spatial things

Use stable HTTP URIs to identify spatial things, re-using commonly used URIs where they exist and it is appropriate to do so.

Why

The Web works with resources that are identified using HTTP URIs. We want spatial things to be first-class resources on the Web that we can make statements about and relate to other resources. To do this, spatial things need to be addressable resources in the Web’s global information space, which means they must be identified using HTTP URIs.

This is a fundamentally different data publication approach from what is typical today, where the dataset is (often) globally identified, but individual spatial things, or "features" in SDI parlance, are not - at least not with a persistent identifier.

The HTTP URIs used to identify spatial things need to be stable or persistent so that relationships that link them to other resources don’t break.

Intended Outcome

Spatial things become part of the Web’s global information space, enabling them to be linked with other spatial things and other resources, and for those links to be durable. In other words, spatial data becomes part of the Web of Data.

Possible Approach to Implementation

The Web of data is made up of subjects and objects; the things we talk about and the things we refer to. For example, we could say that Anne Frank's House (the subject) is within the Municipality of Amsterdam (the object). In RDF this looks like:

<https://g.co/kg/m/02s5hd> schema:containedInPlace <http://sws.geonames.org/2759793> .

When considering HTTP URIs for objects (i.e. the targets of our hyperlinks) it makes sense to reuse existing identifiers. After all, you are trying to stitch your spatial data into the Web so that we can "link your data to other data" and achieve a ★★★★★ rating! Organizations such as DBpedia, GeoNames and government mapping and cadastral authorities (that publish national registers of addresses, buildings, etc.) are good sources of stable, authoritative URIs. The steps described for discovering existing vocabularies [[LD-BP]] can be readily adapted to find more. For more details about how you might link to these authoritative identifiers, see .

However, HTTP URIs for subjects (i.e. the resources that we want to make statements about) can be trickier. If you are working purely with data then you can reuse existing URIs minted by other authorities for your subject URIs. But publishing spatial data on the Web means that the URIs for each spatial thing should resolve to Web pages or data resources that provide useful information (see ). An HTTP request will be directed to a host Web server, identified by the internet domain name (or IP address) in the requested URI. If you use a URI with an internet domain name where you have no control over how the Web server behaves, then there is no way for your statements to be included in the Web server's response.

To take control of how information about spatial things is presented, data publishers need to assign their subject spatial things HTTP URIs from an internet domain name where they have authority over how the Web server responds. Typically, this means minting new HTTP URIs. It is also worth considering that the use of a particular internet domain may reinforce the authority of the information served. For example, a URI for Anne Frank's House is: https://monumentenregister.cultureelerfgoed.nl/monuments?MonumentId=4296. The use of the internet domain registered to the Cultural Heritage Agency of the Netherlands gives the definition authenticity.

The need to control what information is provided about a given spatial thing means that it is not uncommon for a spatial thing to be identified by multiple HTTP URIs. The equality between two URIs that refer to the same resource can be stated using a property such as owl:sameAs. Care must always be taken when using owl:sameAs to determine that the two URIs actually refer to the same resource, rather than two resources that are similar. Warning: don't assert owl:sameAs unless you're sure it's true!

For more information about the types of properties that can be used to link between spatial things, and between spatial things and other resources, see .

When minting your own URIs, [[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets cites the advice from GS1's SmartSearch Implementation Guideline [[GS1]] which suggests that your URIs should include the type of resource that is being identified to help human readability. Also, given the need for the HTTP URIs for spatial things to be used throughout their lifetime (and perhaps beyond) you should give some thought to designing a URI that is persistent.

[[DWBP]] Best Practice 9: Use persistent URIs as identifiers of datasets cites the European Commission's Study on Persistent URIs [[PURI]] as a good source from which to gain insights about designing persistent URIs.

When an HTTP URI is resolved, the server will respond with a sequence of bytes: by its nature, HTTP can only serve information resources such as Web pages or JSON documents. Yet a spatial thing is actually a real or conceptual phenomenon - a lake is made from water not information! Using a single URI to refer to both the spatial thing and the page/document that describes the spatial thing introduces a URI collision. This can impose a cost in communication due to the effort required to resolve ambiguities. [[URLs-in-data]] has more to say on this subject, including recommending URI design patterns that enable differentiation between the spatial thing and the page/document that describes it.

However, in most cases using a single URI for both spatial thing and the page/document is simpler to implement and meets the expectations of most end-users. As stated in [[WEBARCH]] section 2.2.3 Indirect Identification, identifiers are commonly used in this way. There is no obligation to distinguish between the spatial thing and the page/document unless your application requires this.

While there is a cost to this conflation, problems can be mitigated by avoiding statements that confuse the spatial thing with the page/document, such as “Uluru is available in KML format”; e.g. <http://sws.geonames.org/7645281> dct:hasFormat ex:kml .

This statement is clearly not true; an ancient monolith covering more than 3 km² cannot be provided in XML!

There is a level of discomfort in the wider community (based on discussion with Platform Linked Data Nederland folks, amongst others) about whether this best practice should recommend "indirect identifiers" (where the spatial thing and the page/document share the same URI), while the TAG guidance (albeit from 2005) states that servers resolving the URI of a non-information resource (such as a spatial thing) should provide an HTTP 303 (See Other) response, referring the user agent to the corresponding information resource (i.e. the /id and /doc pattern that is in widespread use but often seems to confuse users and even some experts).

We broadly agreed during our discussion at TPAC 2016 that the use of indirect identifiers was acceptable. That said, we didn't record a resolution.

If we want to stick with the TAG guidance, we suggest removing the paragraph beginning

However, in most cases using a single URI for both spatial thing and the page/document [...]

and the following note, and replace with:

Dereferencing URIs for spatial things should result in an HTTP 303 (See Other) response that redirects the user agent to the corresponding page/document. This means that the spatial thing and the page/document MUST have different URIs. It is common to use /id as part of the URI for non-information resources, and /doc for the corresponding page/document.

That said, [[URLs-in-data]] provides other alternatives such as using a #id fragment.
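Both patterns can be sketched without committing to a particular Web framework. In the hypothetical resolve function below, a request for a path under /id/ (the spatial thing) receives a 303 See Other response whose Location header points at the matching path under /doc/ (the page/document). With the #id pattern no redirect is needed: user agents do not send the fragment in the request, so the document is simply the URI without its fragment. All paths and URIs are illustrative.

```python
from http import HTTPStatus
from urllib.parse import urldefrag

def resolve(path: str) -> tuple:
    """Return (status, headers) for a request path under the /id and /doc pattern."""
    if path.startswith("/id/"):
        # The spatial thing itself cannot be transferred: point to its description.
        return HTTPStatus.SEE_OTHER, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        # The page/document describing the spatial thing is served directly.
        return HTTPStatus.OK, {"Content-Type": "text/html; charset=utf-8"}
    return HTTPStatus.NOT_FOUND, {}

def document_for(thing_uri: str) -> str:
    """#id pattern: the describing document is the URI without its fragment."""
    return urldefrag(thing_uri).url

print(resolve("/id/lake/42"))
print(document_for("http://example.org/lake/42#id"))
```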

HTTP URIs for spatial things should not include any indication of the data format used to encode the page/document as this may change as your systems evolve. That said, you may wish to provide a set of complementary resources that specify a particular format as part of your content negotiation strategy. For example, the URI http://sws.geonames.org/7645281/about.rdf resolves to provide an RDF/XML encoding of the information about Uluru in the Northern Territory of Australia (http://sws.geonames.org/7645281).

[[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets notes that URIs can be long. You may need to define identifiers that are locally unique within your spatial dataset and provide a mechanism to programmatically convert each local identifier to a URI. For example, the Metadata Vocabulary for Tabular Data [[TABULAR-METADATA]] achieves this using URI Templates as described in [[RFC6570]].
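The following is a minimal sketch of such a conversion. It implements only level 1 (simple string) expansion from [[RFC6570]]. Each {name} in the template is replaced by the value of that variable, with every character outside the unreserved set percent-encoded, and undefined variables expand to an empty string. The template and local identifiers are hypothetical. [[TABULAR-METADATA]] uses URI templates in this way for properties such as aboutUrl.

```python
import re
from urllib.parse import quote

# {name} placeholders as used in level 1 templates (simplified variable-name syntax).
_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_.]+)\}")

def expand(template: str, variables: dict) -> str:
    """RFC 6570 level 1 expansion: substitute percent-encoded variable values."""
    def substitute(match) -> str:
        value = variables.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to the empty string
        # safe="" leaves only unreserved characters (ALPHA, DIGIT, "-._~") unencoded
        return quote(str(value), safe="")
    return _PLACEHOLDER.sub(substitute, template)

# Hypothetical template for lakes in a local dataset.
TEMPLATE = "http://example.org/id/lake/{lake_id}"
print(expand(TEMPLATE, {"lake_id": "42"}))
```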

It is also good practice to use a redirection service to hide complex and potentially changing service end-point URLs, such as those of a Web Feature Service (WFS), behind well-designed URIs. This means that users don’t need to be aware of the complexities of the API, or of changes in endpoint URIs or API versions, in order to request information about a particular spatial thing. For example, the URI http://data.example.org/aan/id/perceel/aan.2528 could be used as proxy for the WFS GetFeature request http://geodata.nationaalgeoregister.nl/aan/wfs?VERSION=2.0.0&SERVICE=WFS&REQUEST=GetFeature&featureID=aan.2528.
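In code, such a redirection amounts to a small mapping from the persistent URI to the current service request. The sketch below uses the URIs from the example in this paragraph. If the WFS endpoint or version changes, only the mapping needs to change, and the persistent URIs stay the same. Which redirect status the service returns (e.g. 302 or 303) is a deployment choice not covered here.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit

# Current service endpoint; may change without affecting the persistent URIs.
WFS_ENDPOINT = "http://geodata.nationaalgeoregister.nl/aan/wfs"
ID_PATH = "/aan/id/perceel/"

def wfs_target(persistent_uri: str) -> Optional[str]:
    """Map a persistent URI to the WFS GetFeature request that serves it, or None."""
    path = urlsplit(persistent_uri).path
    if not path.startswith(ID_PATH):
        return None
    query = urlencode({
        "VERSION": "2.0.0",
        "SERVICE": "WFS",
        "REQUEST": "GetFeature",
        "featureID": path[len(ID_PATH):],  # local identifier taken from the URI
    })
    return f"{WFS_ENDPOINT}?{query}"

print(wfs_target("http://data.example.org/aan/id/perceel/aan.2528"))
```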

Finally, while it is simple to use a query-pattern URL to serve information about a resource identified with a URI from a third-party internet domain, e.g. http://example.org/museums?q=http://sws.geonames.org/6618987, these URLs are unsuitable as persistent identifiers. More often than not, your intended users will dereference the "official" URI, e.g. http://sws.geonames.org/6618987. That said, this kind of search operation does provide a useful mechanism to find particular spatial things. See Best Practice 11: Expose spatial data through 'convenience APIs' for further details.

How to Test

Check that, within the data, spatial things such as countries, regions and people are referred to by HTTP URIs or by short identifiers that can be converted to HTTP URIs. Ideally the URIs should resolve; however, they have value as globally scoped variables whether they resolve or not.

Evidence

Relevant requirements: R-Linkability, R-GeoReferencedData, R-IndependenceOnReferenceSystems.

Benefits

  • ...

Spatial Data Vocabularies

In this document there is no section on formats for publishing spatial data on the web. The formats are basically the same as for publishing any other data on the web: XML, JSON, CSV, RDF, etc. Refer to [[DWBP]] section 8.6 Data Formats for more information and best practices.

That being said, it is important to publish your spatial data with clear semantics, i.e. to provide information about the contents of your data. The primary use case for this is where you have information about a collection of SpatialThings and want to publish precise information about their attributes and how they are inter-related. Another use case is the publication on the Web of a dataset that has a spatial component in a form that search engines will understand.

Depending on the format you use, the semantics may already be described in some form. For example, in GeoJSON [[RFC7946]] this description is present in the specification. When using JSON it is possible to add semantics using a JSON-LD @context object. For providing semantics to search engines, using [[SCHEMA-ORG]] is a good option, as explained in Best Practice 4: Make your spatial data indexable by search engines.
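As an illustration, the sketch below adds a JSON-LD @context to a plain JSON record so that its keys map to vocabulary terms. The key id becomes the identifier of the spatial thing, name maps to schema:name, and area_km2 maps to a hypothetical term in an example namespace. Interpreting the result as Linked Data requires a JSON-LD processor; this code only assembles the document.

```python
import json

# Plain JSON as a Web developer might publish it (hypothetical values).
record = {"id": "http://example.org/id/lake/42", "name": "Example Lake", "area_km2": 3.1}

# The @context maps each JSON key to a term with a published definition.
context = {
    "id": "@id",                                                     # identifier of the spatial thing
    "name": "http://schema.org/name",
    "area_km2": "http://example.org/vocab#areaInSquareKilometres",   # hypothetical term
}

document = {"@context": context, **record}
print(json.dumps(document, indent=2))
```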

In a linked data setting, the attributes of a spatial thing can be described using existing vocabularies, where each term has a published definition. [[DWBP]] Best Practice 15: Reuse vocabularies, preferably standardized ones recommends using terms from an established widely used vocabulary. If you can't find a suitable existing vocabulary term, you should create your own, and publish a clear definition for the new term (see [[LD-BP]]). We recommend that you link your own vocabulary to commonly used existing ones because this increases its usefulness. We provide the mapping between some commonly used spatial vocabularies.

We must avoid being overly focused on RDF.

The [[LD-BP]] reference makes this section very RDF dependent. Is there a need / justification for this? Are we saying that RDF is the only recommended way to publish data and models on the Web? Web developers may not care about RDF vocabularies and maybe they prefer a Swagger document (just to pick an example)?

To reduce the RDF focus, some text was added.

The current list of RDF vocabularies / OWL ontologies for spatial data being considered by the SDW WG is provided below. Some of these will be used in examples. Full details, including mapping between vocabularies, pointers about inconsistencies in vocabularies (if any are evident), and recommendations on avoiding their use where this may lead to confusion, will be published in a complementary NOTE: Comparison of geospatial vocabularies.

The NOTE will be concerned with helping data publishers choose the right spatial data format or vocabulary. It provides a methodology for making that choice. We do this rather than recommending one vocabulary because this recommendation would not be durable as vocabularies are released or amended.

Vocabularies can be discovered from Linked Open Vocabularies (LOV), using search terms like 'location' or the tags Place, Geography, Geometry and Time.

No attempts have yet been made to rank these vocabularies; e.g. in terms of expressiveness, adoption etc.

The motivation behind the ISA Programme Location Core Vocabulary was establishing a minimal core common to existing spatial vocabularies. However, experience suggests that such a minimal common core is not very useful, as one quickly needs to employ specific semantics to meet one's application needs.

Do we need a subclass of SpatialThing for entities that do not have a clearly defined spatial extent; or a property that expresses the fuzziness of the extent?

Describing location

Location information is often a common thread running through such data and can be an important 'hook' for finding information and for integrating different datasets. There are different ways of describing the location of spatial things. You can use and/or refer to the name of a well known named place, provide the location's coordinates as a geometry or describe it in relation to another location. These last two options are described in this section.

Provide geometries on the Web in a usable way

Geometry data should be expressed in a way that allows its publication and use on the Web.

Why

The geospatial, Linked Data, and Web communities use different geometry formats and tools, which reflect different requirements with respect to data complexity and manipulation.

When deciding how a geometry should be described, it is therefore necessary to take into account the intended uses and the related user communities, which may imply providing alternative geometry descriptions.

This best practice helps with choosing the right format for describing geometries, based on aspects like intended use(s), performance, and tool support. It also helps in deciding when to use literals rather than structured objects for geometric representations.

Since coordinate reference system and axis order are two of the factors determining how a geometry is described, this best practice is closely related to and , to which we refer the reader for more information.

Intended Outcome

The format chosen to express geometry data should:

  • Support the dimensionality of the geometry (from points - 0D - to volumes - 3D) - not all geometry formats support all dimensions;
  • Be supported by the software tools used within the data user community - the geospatial and Web communities use different tools, working with different geometry formats;
  • Keep geometry descriptions to a size that is convenient for the intended applications - Web applications typically do not use detailed geometries;
  • Support the coordinate reference system you need.

Ideally, to enable their widest re-use, geometries should be described with the geospatial, Linked Data and Web communities in mind. This may not always be feasible, but the objective should at least be to describe geometries (also) for Web consumption.

Possible Approach to Implementation

Steps to follow:

  • Identify the intended uses and applications. In particular, it is important to verify whether geometries need to be used in one or more of the following scenarios:
    • specific geospatial applications;
    • linked data applications;
    • Web consumption.
  • For each of the intended uses / applications, provide alternative descriptions of geometries where needed, taking into account:
    • The appropriate geometry dimensionality (0D - points, 1D - lines, 2D - surfaces, 3D - volumes). See for more information.
    • The appropriate coordinate reference system(s). See for more information.
    • The appropriate geometry encoding(s) / representation(s) - also taking into account the software tools that you anticipate your user community to employ. See for more information.
    • The appropriate level of complexity.
  • Where multiple representations are required, consider offering as many as you can - balancing the benefit of ease of use against the cost of the additional storage or additional processing if converting on-the-fly. See [[DWBP]] Best Practice 19: Use content negotiation for serving data available in multiple formats for more information.

    HTTP content negotiation only works for media type, character set, encoding and language. As a consequence, it cannot be used to select, from several representations that share the same media type, the one that conforms to a given "profile" (e.g. data model, complexity level, CRS): for example, to ask for GeoJSON features with "simple" geometries (compacted polygons or just points) rather than "complex" ones, or to ask for the representation that uses CRS84 rather than Amersfoort-RD.
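As a minimal sketch of what content negotiation can select, the Python below picks one of several representations of a single geometry resource from the client's Accept header. The media types and file names in AVAILABLE are illustrative assumptions, and the parsing is simplified: it ranks media ranges by q-value only and ignores the specificity precedence rules of HTTP. As the note above explains, it cannot choose between two representations with the same media type that differ only in CRS or level of detail.

```python
# Simplified server-side content negotiation over the HTTP Accept header.
# Assumption: one geometry resource is held in several representations,
# keyed by media type; the file names are illustrative only.
AVAILABLE = {
    "application/geo+json": "geometry.geojson",
    "application/gml+xml": "geometry.gml",
    "text/turtle": "geometry.ttl",
}


def parse_accept(header):
    """Return (media_range, q) pairs, highest q-value first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media, q))
    # sorted() is stable, so equally weighted ranges keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def negotiate(header, available=AVAILABLE):
    """Pick the best available media type, or None (HTTP 406 Not Acceptable)."""
    for media, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media == "*/*":
            return next(iter(available))
        if media.endswith("/*"):
            prefix = media[:-1]  # e.g. "application/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media in available:
            return media
    return None
```

For example, an Accept header of "text/turtle;q=0.5, application/geo+json" selects application/geo+json, while a header that matches nothing yields None, to which a server would typically respond with 406 Not Acceptable.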

It is important to note that the steps outlined above are interrelated. For instance, the dimensionality of a geometry determines the set of coordinate reference systems that can be used, as well as the geometry encodings / representations.

Another issue to take into account when choosing the geometry format is whether the axis order is unambiguous - i.e., whether the order of the coordinates is, e.g., longitude/latitude or latitude/longitude. This specific topic is covered by .

Multiple formats exist for representing geometries (some of them are listed in ). One of the issues to take into account when choosing the format(s) to support is whether to use literals or structured objects.

  • For geometry literals, several solutions are available, like Well-Known Text (WKT) representations, GeoHash and other geocoding representations. The alternative is to use structured geometry objects as is possible, for example, in [[GeoSPARQL]].
  • There are also several suitable binary data formats (e.g. Google's protocol buffers for vector tiling); however, some binary formats do not (effectively) work on the Web as there are no software tools for working with those formats from within a typical Web application; to work with data in such formats, you must first download the data and then work with it locally.
  • There are widespread practices for representing geometric data as Linked Data, such as using the [[W3C-BASIC-GEO]] (geo) properties w3cgeo:lat and w3cgeo:long, which are used extensively to describe w3cgeo:Point objects.
  • Concrete geometry types are available, such as those defined in the OpenGIS [[SIMPLE-FEATURES]] Specification, namely 0-dimensional Point and MultiPoint; 1-dimensional curve LineString and MultiLineString; 2-dimensional surface Polygon and MultiPolygon; and the heterogeneous GeometryCollection.
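To illustrate a geometry literal other than WKT, the sketch below implements the widely used Geohash encoding in Python: the longitude and latitude ranges are halved alternately (longitude first), and each group of 5 resulting bits becomes one base-32 character. A geohash names a rectangular cell rather than an exact point, so its length determines the precision; the decoder returns the bounds of that cell. Function names are illustrative.

```python
# Geohash: interleave longitude/latitude bisection bits (longitude first)
# and map every 5 bits to one character of this base-32 alphabet.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, even = [], 0, 0, True  # even bits refine longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_bbox(geohash):
    """Return (min_lat, min_lon, max_lat, max_lon) of the cell a geohash denotes."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    even = True
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_range if even else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return lat_range[0], lon_range[0], lat_range[1], lon_range[1]
```

For example, geohash_encode(42.6, -5.6, 5) returns "ezs42"; longer geohashes for the same point share that prefix and denote smaller cells.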

Currently, two reference geometry formats are widely used in the geospatial and Web communities respectively: [[GML]] and GeoJSON [[RFC7946]].

[[GML]] provides the ability to express any type of geometry, in any coordinate reference system, and up to 3 dimensions (from points to volumes).

On the other hand, GeoJSON supports only one coordinate reference system (CRS84 - i.e., WGS84 longitude/latitude), and geometries up to 2 dimensions (points, lines, surfaces).

In order to facilitate the use of geometry data on the Web, it is therefore desirable that [[GML]]-encoded geometries also be made available in GeoJSON, by applying not only the required coordinate reference system transformation but, if needed, also by simplifying the original geometry (e.g., by transforming a 3D geometry into a 2D one).
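A minimal sketch of the 3D-to-2D step, assuming the coordinates have already been transformed to CRS84 (a projection library would normally do that part) and handling an exterior ring only: the Python below drops the third ordinate, closes the ring if necessary, and orients it counterclockwise as the [[RFC7946]] right-hand rule expects, returning a GeoJSON Feature as a plain dict ready for json.dumps. Function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise closed rings."""
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2


def to_geojson_feature(ring_3d, properties=None):
    """Flatten a 3D exterior ring into a 2D GeoJSON Polygon Feature.

    Assumes the input is already in CRS84 (longitude, latitude, height);
    reprojection from other CRSs is out of scope for this sketch.
    """
    ring = [(x, y) for x, y, *_ in ring_3d]  # keep x/y, drop height
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # linear rings must be closed
    if signed_area(ring) < 0:
        ring.reverse()  # right-hand rule: exterior rings counterclockwise
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "properties": properties or {},
    }
```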

Another approach to publishing geometries on the Web is to embed them directly in Web pages. This is, for instance, the approach used by [[SCHEMA-ORG]], which defines a number of terms to specify them (see for more information).

Typically, this is used only for 0D-2D geometries (points, lines, surfaces). Detailed and complex geometries cannot be published with this methodology, so in this case too only a highly simplified representation of the original geometry can be published - e.g., the centroid and/or 2D bounding box.
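For embedding in Web pages, deriving the centroid and 2D bounding box from a polygon and expressing them with the schema.org GeoCoordinates and GeoShape types could be sketched as follows. It assumes a closed (longitude, latitude) ring and uses a planar, area-weighted centroid, which is only an approximation for large areas. Note that the schema.org box value lists latitude before longitude.

```python
def bbox(ring):
    """(min_lon, min_lat, max_lon, max_lat) of a list of (lon, lat) points."""
    lons, lats = [p[0] for p in ring], [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)


def centroid(ring):
    """Area-weighted centroid of a closed (lon, lat) ring.

    Planar approximation: adequate for small features such as buildings,
    not for large regions or rings crossing the antimeridian.
    """
    a = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a /= 2
    return cx / (6 * a), cy / (6 * a)


def schema_org_geo(ring):
    """schema.org centroid and bounding box for a Place (latitude first)."""
    min_lon, min_lat, max_lon, max_lat = bbox(ring)
    lon, lat = centroid(ring)
    return {
        "@context": "https://schema.org",
        "@type": "Place",
        "geo": [
            {"@type": "GeoCoordinates",
             "latitude": round(lat, 6), "longitude": round(lon, 6)},
            # GeoShape box: lower corner then upper corner, each "lat lon"
            {"@type": "GeoShape", "box": f"{min_lat} {min_lon} {max_lat} {max_lon}"},
        ],
    }
```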

Finally, RDF-based representations of geometries are used in the Linked Data community. This is achieved by using specific vocabularies, such as [[W3C-BASIC-GEO]] (only for points) or [[GeoSPARQL]] (for points, lines, surfaces) - see for more information.

These geometry representations are either stored with the related data, or are maintained separately, and possibly denoted with HTTP URIs (see ).

RDF representations of geometries can support most geometry types and dimensions (at least up to 2 dimensions), with any level of complexity, in any coordinate reference system. However, existing Semantic Web tools, such as triple stores, are currently not efficient enough to perform complex spatial queries, or spatial queries on complex geometries. In such cases, it is therefore preferable to maintain geometries separately, in software platforms designed for these specific tasks.

It is nonetheless desirable to make these geometries available for Web consumption, as described above for [[GML]]-encoded geometries - i.e., by also publishing simplified versions of them, either in GeoJSON or embedded in Web pages.

The following Turtle snippet shows the [[GeoDCAT-AP]] representation of the dataset in . Here the bounding box is provided in multiple literal encodings (WKT, [[GML]], GeoJSON), by using property locn:geometry [[LOCN]].

In the above example, the coordinate reference system used for the bounding box is CRS84 (equivalent to WGS84, but with axis order longitude/latitude), which is explicitly specified in the [[GML]] encoding via attribute @srsName, and by using the relevant HTTP URI from the OGC CRS registry. The coordinate reference system is not specified for the WKT encoding, since CRS84 is the default coordinate reference system for WKT in [[GeoSPARQL]], and therefore it can be omitted. The coordinate reference system is also not specified in the GeoJSON encoding, since CRS84 is the only supported coordinate reference system in GeoJSON [[RFC7946]].
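As a sketch of how such literals might be produced, the Python below generates WKT, GML and GeoJSON encodings of one CRS84 bounding box, all in longitude/latitude order. Only the GML envelope names its CRS, mirroring the explanation above; the gml namespace declaration is included so the fragment is well-formed on its own. The function name and output formatting are assumptions; in RDF, the WKT and GML strings would typically be typed as geosparql:wktLiteral and geosparql:gmlLiteral.

```python
import json

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
GML_NS = "http://www.opengis.net/gml/3.2"


def bbox_encodings(min_lon, min_lat, max_lon, max_lat):
    """Encode one CRS84 bounding box as WKT, GML and GeoJSON literals.

    All three use longitude/latitude order. The CRS is stated only in GML:
    CRS84 is the GeoSPARQL default for WKT and the only CRS in GeoJSON.
    """
    # Closed, counterclockwise exterior ring
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    wkt = "POLYGON((" + ",".join(f"{x} {y}" for x, y in ring) + "))"
    gml = (f'<gml:Envelope xmlns:gml="{GML_NS}" srsName="{CRS84}">'
           f"<gml:lowerCorner>{min_lon} {min_lat}</gml:lowerCorner>"
           f"<gml:upperCorner>{max_lon} {max_lat}</gml:upperCorner>"
           "</gml:Envelope>")
    geojson = json.dumps({"type": "Polygon",
                          "coordinates": [[[x, y] for x, y in ring]]})
    return {"wkt": wkt, "gml": gml, "geojson": geojson}
```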

Again with reference to , the following snippet shows the [[GML]] and the RDF representations of the entry in the BAG Dutch register concerning the building where Anne Frank's house is located. For the corresponding GeoJSON representation, see the relevant example in .

It is worth noting that the [[GML]] snippet above also includes the explicit specification of the axis order (via attribute @axisLabels) and of the number of dimensions of the geometry (via attribute @srsDimension).

The corresponding RDF representation is provided in the following Turtle snippet (taken from the BAG Linked Data service). NB: The RDF representation below has been complemented with additional properties (marked with # Added) for demonstration purposes.

The RDF representation above includes:

  • The detailed geometry of the building (geosparql:asWKT / pdok:asWKT-RD), in WKT and using multiple reference systems.
  • The bounding box (schema:box) and centroid (w3cgeo:lat and w3cgeo:long).

The different WKT encodings in the example show alternative ways of specifying the coordinate reference system used.

The two instances of property geosparql:asWKT follow the syntax recommended in [[GeoSPARQL]], where the specification of the coordinate reference system is required only if different from CRS84. By contrast, property pdok:asWKT-RD implies the use of a specific coordinate reference system, namely, EPSG:28992 ("Amersfoort / RD New"). The axis order used is determined here by the coordinate reference system, and in both cases it is longitude / latitude (more precisely, east/north for EPSG:28992). The coordinates for the bounding box and centroid, on the other hand, use WGS84, with axis order latitude / longitude.
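The CRS conventions described above can be captured in a small Python helper, sketched here under the assumption that WKT literals follow the [[GeoSPARQL]] pattern of an optional CRS IRI in angle brackets before the WKT text. The axis-order table covers only the CRSs discussed in this section and is illustrative, not a CRS registry lookup; the sample coordinates in any usage are made up.

```python
import re

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Axis order defined by a few CRSs discussed in this section (illustrative)
AXIS_ORDER = {
    CRS84: ("longitude", "latitude"),
    "http://www.opengis.net/def/crs/EPSG/0/4326": ("latitude", "longitude"),
    "http://www.opengis.net/def/crs/EPSG/0/28992": ("easting", "northing"),
}

# Optional "<crs-iri>" prefix, then the WKT text itself
_LITERAL = re.compile(r"^\s*(?:<([^>]+)>\s*)?(.+?)\s*$", re.DOTALL)


def parse_wkt_literal(literal):
    """Split a wktLiteral into (CRS IRI, WKT); CRS84 applies when no IRI is given."""
    match = _LITERAL.match(literal)
    if not match:
        raise ValueError("empty WKT literal")
    crs, wkt = match.groups()
    return crs or CRS84, wkt


def make_wkt_literal(wkt, crs=CRS84):
    """Build a wktLiteral, omitting the CRS IRI when it is the CRS84 default."""
    return wkt if crs == CRS84 else f"<{crs}> {wkt}"
```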

also shows how geometries for spatial things can be published as separate Web resources. This approach can be particularly suitable for giving access to huge geometries, consisting of hundreds of vertices (such as the detailed geometry of the boundaries of a geographical region), without attaching them to the relevant spatial things. Moreover, this allows the same geometry to be linked from (i.e., re-used by) different spatial things. Finally, mechanisms such as HTTP content negotiation can be used to provide access to different representations / encodings of the geometry ([[GML]], WKT, GeoJSON, etc.), thus addressing different use cases. (On this topic, see also ).

This section needs to be completed with:

  • Guidelines on how to provide geometries at different levels of precision and complexity, highlighting issues to be taken into account when simplifying geometries.
  • More details about how WKT is used.
  • An RDF example with embedded geometry
  • Clarification on the differences between “coordinate positions” as a literal and the “geometry” as literal; the latter includes additional metadata
  • Clarification on "formats" for geometries only (e.g., WKT) and those used also for features (e.g., GML, GeoJSON)

How to Test

Check if:

  1. Geometries are made available, where appropriate, in different formats and at different levels of complexity, taking into account their intended uses and their consumption on the Web.
  2. The chosen geometry descriptions comply with and .
  3. The alternative geometry descriptions, if any, are accessible via standard mechanisms, such as HTTP content negotiation.

Evidence

Relevant requirements: R-MultipleCRSs, R-BoundingBoxCentroid, R-Compressible, R-CRSDefinition, R-EncodingForVectorGeometry, R-IndependenceOnReferenceSystems, R-MachineToMachine, R-SpatialMetadata, R-3DSupport, R-TimeDependentCRS, R-TilingSupport.

Benefits

  • ...

Describe relative positioning

Provide a relative positioning capability in which one entity can be positioned relative to another entity.

Why

Geocentric coordinate reference systems describe position relative to the earth itself. It can also be valuable or even necessary to describe the position of an entity relative to a second entity. In some cases this is a navigation convenience, for example a tour kiosk might be described as being located between the Boston Common Frog Pond and the Park Street T entrance, or in one's lower left view when looking up at the Statehouse. In other cases of moving or generalized entities, it may be that the entity can only usefully be given a relative position. For example, a package is reported left on seat 32L1 on the #59 bus, or part number PRG5460 is always located at position (51, 73, 3) in Acme warehouses.

Intended Outcome

It should be possible to describe the location of an entity in relation to one or more other entities or places, instead of specifying its own geocentric position or geometry.

The relative positioning descriptions should be machine-interpretable and/or human-readable as required by the intended application. The positions and/or geometries of reference entities, if available, should be retrievable through their link relations.

Possible Approach to Implementation

Positioning of one entity (A) relative to another referenced entity (B) is a combination of two factors: the referencing target, and the means of relative positioning. "Geocentric" referencing targets the planet itself or at least a fixed point on it. "Allocentric" referencing targets another entity. "Egocentric" referencing targets a particular field of view of an observer or camera. Positioning can take the form of a complete coordinate reference system (e.g. engineering CRS), a qualitative relation such as "beside", or a quantitative relation such as "30m northwest".

Combinations of relative positioning means and references

| Referencing target | Engineering CRS | Qualitative relation | Quantitative relation |
| --- | --- | --- | --- |
| Geocentric | Coordinate position of A relative to a fixed earth datum | N/A | N/A |
| Allocentric | Coordinate position of A relative to a fixed, mobile, or generic entity B | A "next to" B | A "20m south" of B |
| Egocentric | Coordinate position of A within field of view B | A in "lower left corner" of field of view B | A "30 deg right of center" in field of view B |

  • Descriptions of the positions of entities as explicit links to target entities.
  • Semantic descriptions of the target entities and type of positioning.
  • Encodings of the specific entity relations, or of the relative coordinate positions in the case of engineering CRSs.
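The following Python sketch, with hypothetical field names, records the three ingredients listed above (a link to the target entity, the type of referencing, and the relation) and shows how a quantitative allocentric relation such as "20m south" of B could be turned into coordinates once B's position has been retrieved through its link. It uses the spherical destination-point formula, which is an approximation; qualitative relations such as "next to", and egocentric ones, deliberately return None.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


@dataclass
class RelativePosition:
    """Hypothetical record: entity A positioned relative to a linked entity B."""

    target: str                          # URI of the reference entity B
    referencing: str                     # "geocentric", "allocentric" or "egocentric"
    relation: str                        # e.g. "next to", "south of"
    distance_m: Optional[float] = None   # quantitative part, if any
    bearing_deg: Optional[float] = None  # clockwise from true north


def resolve(rel, target_lat, target_lon):
    """Estimate A's (lat, lon) from an allocentric distance-and-bearing relation.

    target_lat/target_lon are B's coordinates, assumed to have been retrieved
    by following rel.target. Qualitative or egocentric relations give None.
    """
    if rel.referencing != "allocentric" or rel.distance_m is None or rel.bearing_deg is None:
        return None
    phi1, lam1 = math.radians(target_lat), math.radians(target_lon)
    theta = math.radians(rel.bearing_deg)
    delta = rel.distance_m / EARTH_RADIUS_M  # angular distance in radians
    # Spherical "destination point" formula
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalise longitude to [-180, 180)
    return math.degrees(phi2), (math.degrees(lam2) + 540) % 360 - 180
```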

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-SamplingTopology.

Benefits

  • ...

Publishing data with clear semantics

In most cases, the effective use of information resources requires understanding thematic concepts in addition to the spatial ones; "spatial" is just one facet of the broader information space. For example, when the Dutch Fire Service responded to an incident at a day care center, they needed to evacuate the children. In this case, the second-closest alternative day care center was preferred because it was operated by the same organization as the one that was the subject of the incident, and they knew who all the children were.

This best practice document provides mechanisms for determining how places and locations are related - but determining the compatibility or validity of thematic data elements is beyond our scope; we're not attempting to solve the problem of different views on the same/similar resources.

That said, there is one aspect of thematic semantics that must be mentioned. The most important semantic statement you can make when publishing spatial data - or any data - is to specify the type of a resource. For spatial things, there are a number of types that define "spatialness" (see ). But you should also consider non-spatial aspects when designating the type of a spatial thing. For example, should a fire incident occur at Amsterdam Central railway station, it might seem sensible for the Municipal Fire Department to designate a type such as Building or Station (the Dutch Government Base Registry assigns both of these types to Amsterdam Central railway station, identified as https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209). However, the Fire Department are concerned with a fire incident - not the railway station itself. The fire incident is a spatial thing (it has spatial extent) but it is not the station; for example, the fire may spread to adjacent buildings. The Fire Department might therefore designate their spatial thing as having type FireIncident or similar. Advice on how to assign a persistent identifier to the fire incident is provided in Best Practice 7: Use globally unique persistent HTTP URIs for spatial things, and provides guidance on how one might relate the fire incident to other coincident spatial things such as Amsterdam Central railway station.

Thematic semantics are out of scope for this best practice document. For associated best practices, please refer to [[DWBP]] section 8.2 Metadata, Best Practice 3: Provide structural metadata; and [[DWBP]] section 8.9 Data Vocabularies, Best Practice 15: Reuse vocabularies, preferably standardized ones and Best Practice 16: Choose the right formalization level.

See also [[LD-BP]] Vocabularies.

Encoding spatial data

Represent spatial data in a way that matches the needs of the target audiences.

Why

Spatial data is used by a range of user communities, each with their own purposes, knowledge and preferred tools. Data publishers should consider which communities and purposes they want to serve and make appropriate choices for the approach to encoding data. In general terms, data usefulness is increased when it can be used for more purposes. This might involve providing data in several different formats. (See [[DWBP]] Best Practice 14: Provide data in multiple formats.)

Intended Outcome

Spatial data can be used easily and reliably by the target users.

Possible Approach to Implementation

A high-level objective of these best practices is to highlight approaches that data publishers can take to maximise the ease of use of their spatial data via the Web, and hence to present data in a way that meets the needs of as wide a range of users and applications as possible.

One way of classifying the applications of spatial data is as follows:

  1. Web pages for people to read about spatial things
  2. Web mapping or visualisation applications
  3. Data integration - combining spatial data with other data
  4. Spatial Data Infrastructures

Each of these has different needs: often it will be possible or desirable to support several of these application groups.

The main objective is to encode data in a way that recipients can easily decode and understand. To decide this, you need to consider which purpose(s) and which audience(s) you are aiming to serve, and the characteristics of the data that you want to share. For example:

  • the volume of data
  • how many spatial dimensions it covers (points, lines, areas, 3D)
  • what kind of area it covers (one building, a town, a whole country)
  • how frequently it changes
  • the level of spatial precision that exists in the data and the precision needed by users

1. Web pages for people to read about spatial things

In Best Practice 7 we recommend use of HTTP URIs as a way of assigning identifiers to spatial things. The data publisher should offer the ability to look up ('dereference') such a URI to find out useful information about that spatial thing in human readable form (as well as machine readable formats - see the discussion below on data integration). Each spatial thing therefore gets its own web page - in addition it might be useful to have web pages about groups of spatial things, but the 'page per thing' approach enables fine-grained linking of information.

To promote discovery of such web pages in search engines, each page should contain a clear text description of what it is, ideally in a way that distinguishes it from pages about other similar spatial things. Including metadata using the [[SCHEMA-ORG]] vocabulary, embedded as microdata, RDFa or as JSON-LD in the <head> section of the page can provide additional information to search engines to support more precise indexing. See Best Practice 4: Making data indexable by search engines for a more detailed discussion.

It is also very useful in such web pages to include links to descriptions of the spatial thing in other formats (typically machine-readable formats) as well as linking to related spatial things.
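A minimal sketch of such a page, assuming Python and a hypothetical dict describing the spatial thing: the [[SCHEMA-ORG]] description is embedded as JSON-LD in the <head> section, and <link rel="alternate"> elements point to the machine-readable representations. Escaping and layout are kept to the minimum needed for well-formed output.

```python
import html
import json


def render_page(thing):
    """Render a minimal HTML page about one spatial thing.

    `thing` is a plain dict with hypothetical keys: uri, name, description,
    lat, lon and alternates (a mapping of media type to URL).
    """
    jsonld = {
        "@context": "https://schema.org",
        "@type": "Place",
        "@id": thing["uri"],
        "name": thing["name"],
        "description": thing["description"],
        "geo": {"@type": "GeoCoordinates",
                "latitude": thing["lat"], "longitude": thing["lon"]},
    }
    # Escape "</" so the JSON cannot close the <script> element early
    script = json.dumps(jsonld, ensure_ascii=False).replace("</", "<\\/")
    # One <link rel="alternate"> per machine-readable representation
    links = "\n".join(
        f'  <link rel="alternate" type="{html.escape(media)}" href="{html.escape(url)}">'
        for media, url in thing["alternates"].items()
    )
    name, desc = html.escape(thing["name"]), html.escape(thing["description"])
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"  <title>{name}</title>\n"
        f'  <script type="application/ld+json">{script}</script>\n'
        f"{links}\n</head>\n<body>\n"
        f"  <h1>{name}</h1>\n  <p>{desc}</p>\n"
        "</body>\n</html>\n"
    )
```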

2. Web mapping or visualisation applications

A common application of spatial data on the Web is delivering map data in a tiled form, suitable for display in zoomable 'slippy maps'. The Open Geospatial Consortium Web Map Tile Service is an established standard for doing this. Other approaches in common use include MBTiles or 'Tile layers' in the Google Maps APIs.
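For XYZ-style slippy maps (the Web Mercator tile scheme popularised by OpenStreetMap, assumed here; a WMTS TileMatrixSet may define a different grid), the tile containing a point can be computed as in the Python sketch below. The URL template is hypothetical.

```python
import math

MAX_LAT = 85.0511287798  # latitude limit of the Web Mercator square


def tile_for(lat, lon, zoom):
    """Column/row of the XYZ tile containing a WGS84 point.

    Assumes 2**zoom x 2**zoom tiles in Web Mercator with the origin at the
    top-left (north-west) corner.
    """
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(x, n - 1), min(y, n - 1)  # keep lon = 180 / south edge inside the grid


def tile_url(template, lat, lon, zoom):
    """Fill a hypothetical '{z}/{x}/{y}' URL template for that tile."""
    x, y = tile_for(lat, lon, zoom)
    return template.format(z=zoom, x=x, y=y)
```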

Another frequent requirement is to draw markers or polygons on top of a web map. A typical approach is for the browser to display a base map, then separately retrieve data about the spatial things of interest, typically as GeoJSON or KML files, and then combine the two using appropriate JavaScript libraries. For applications involving boundary polygons of geographical areas, a common consideration is how to make this process efficient at different zoom levels. A high level of detail is appropriate when zoomed in, but when zoomed out a large number of areas may be visible, and delivering boundaries of all of those at full detail can lead to very large amounts of data and hence poor performance, so simplified lower resolution versions of polygons may be required.
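One common way to produce such lower-resolution versions is the Ramer-Douglas-Peucker algorithm, sketched below in Python. The tolerance_for_zoom rule (roughly one pixel of a 256-pixel tile, in degrees of longitude) is an assumed heuristic, and distances are computed in the plane of the coordinates. When simplifying polygon rings, a real implementation would also need to keep at least four positions and avoid self-intersections, which this sketch does not check.

```python
import math


def _distance_to_segment(p, a, b):
    """Planar distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment (e.g. closed ring)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker line simplification (recursive sketch)."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord between the two end points
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]  # all vertices close enough to the chord
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


def tolerance_for_zoom(zoom, tile_size=256):
    """Assumed heuristic: about one screen pixel, in degrees of longitude."""
    return 360.0 / (tile_size * 2 ** zoom)
```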

3. Data integration - combining spatial data with other data

Many important applications of spatial data involve combining it with other kinds of data: for example opening times of nearby supermarkets, or statistical information on the economy of a town. Often the Spatial Thing is at the centre of the data analysis process.

Other applications involve distinguishing or selecting spatial things according to their non-spatial characteristics: hospitals with an emergency department, or restaurants that serve Japanese food.

To enable such questions to be answered using data from different sources, it is important to describe spatial things using shared identifiers and vocabularies. This is described in [[DWBP]] Best Practice 10: Use persistent URIs as identifiers within datasets and [[DWBP]] Best Practice 15: Reuse vocabularies, preferably standardized ones.

From a spatial data perspective, the question of identifiers is discussed in Best Practice 7. How to relate a Spatial Thing to its geometry is described in Best Practice 8: Provide geometries on the Web in a usable way.

A common approach to encoding data to enable data integration is Linked Data and RDF. The spatial aspects of the data can either be included in the RDF data model, or the entity in question can link to an external web resource containing the geometry in one of the standard spatial data formats. Although RDF is well-suited to important aspects of best practice, including use of URIs as identifiers and re-use of vocabularies, other data formats are also consistent with this approach. Most spatial data formats allow attributes of an entity to be recorded alongside its geometry.

The publisher's choice of data model to represent the data will depend on what data is available and which audiences and purposes it seems most important to support. However, a reasonable general rule is that it is always useful to provide a label and a type for each entity in the data collection (see [[DWBP]] Best Practice 16: Choose the right formalization level).

A common way of specifying the location of a building is to use its postal address. Most spatial applications require an address to be turned into spatial coordinates, so that its location can be marked on a map, or compared with locations of other things, a process known as 'geocoding'. Although a publisher could leave this process of geocoding to the data user, ideally the publisher should take responsibility for this as they are in a better position to check the accuracy of the results. Different ways of specifying addresses can sometimes lead to errors in the geocoding process.

Common vocabularies for describing the address of a Spatial Thing include: schema.org, vCard and ISA Core Location (LOCN).

What3words is an example of a service that assigns an alternative kind of address to a location - in this case a sequence of three common words associated with a 3m by 3m square on the ground. It allows every location to be given such an address, and What3words also provides a means to relate the address to latitude and longitude coordinates. As with conventional addresses, converting to coordinates is necessary for many spatial data applications (e.g. to calculate the distance between points or whether a point is inside a region), but the conversion process is more reliable and precise.

4. Spatial Data Infrastructures

In the section 'Why are traditional Spatial Data Infrastructures not enough?' the limitations of this approach with respect to web-based data sharing were explained. Nonetheless this approach is a well-established and powerful way of distributing spatial data, based on open standards and suited to a community of expert users. It is thus one of the options a data publisher should consider when deciding how to encode their spatial data.

Balancing quality and cost

The four main classes of application above have a wide range of requirements. Supporting all of them may require a lot of effort and cost on the part of a data publisher. There are many aspects to the 'quality' of a spatial data publishing approach, but in general terms it relates to how well the data and the approach to data delivery meet the needs of the target audience. By choosing to concentrate on only some kinds of application, the publisher can keep costs down. Other factors to consider include performance (the speed with which data is delivered), timeliness of updates (which can be a significant consideration if the underlying data changes frequently), and software complexity or maintenance.

In many cases a mixture of technologies can be used together to find a good compromise between quality or performance and cost. The strengths of various approaches can be applied to the part of the publishing 'spectrum' that suits them best. For example, if using a Linked Data approach, one option is to keep all data in a triple store; but hybrid approaches are also possible, for example where geometrical information is stored and served from flat files, or where non-geometrical data and metadata are stored in a triple store and used to generate web pages and machine-readable descriptions of spatial things, while geometrical data is indexed by software such as Lucene Spatial, PostGIS or Elasticsearch. Use of shared web-accessible identifiers for spatial things can help support the interconnections between a range of diverse information systems.

How to Test

TO DO...

Evidence

Relevant requirements: TO DO

Benefits

  • TO DO...

Temporal aspects of spatial data

Temporal relationship types will be described here and be entered eventually as link relationship types into the IANA [[LINK-RELATION-TYPES]] registry, just like the spatial relationships.

In the same sense as with spatial data, temporal data can be fuzzy.

Retain section; point to where temporal data is discussed in detail elsewhere in this document.

Spatial Data Access

In recent years we have seen the widespread emergence of Web applications that use spatial data. Often these applications do not access all the spatial data they use via the Web. While there are good reasons for this, e.g. licensing restrictions, it is often the case, too, that the spatial data is not available via the Web at all, or only in ways that application developers find too complex to use, or with insufficient or unclear quality-of-service commitments.

[[DWBP]] provides best practices discussing access to data using Web infrastructure (see [[DWBP]] section 8.10 Data Access). This section provides additional insight for publishers of spatial data.

Making data available on the Web requires data publishers to provide some form of access to the data. There are numerous mechanisms available, each providing varying levels of utility and incurring differing levels of effort and cost to implement and maintain. Publishers of spatial data should make their data available on the Web using affordable mechanisms in order to ensure long-term, sustainable access to their data.

When determining the mechanism to be used to provide Web access to data, publishers need to assess utility against cost. In order of increasing usefulness and cost:

  1. Bulk-download or streaming of the entire or pre-defined subsets of a dataset
  2. Generalized spatial data access API
  3. Bespoke API designed to support a particular type of use

Let's take a closer look at these options.

The download of a dataset - or a pre-defined subset of it - via a single HTTP request is mainly covered by these [[DWBP]] best practices:

Providing bulk-download or streaming access to data is useful in any case and is relatively inexpensive to support as it relies on standard capabilities of web servers for datasets that may be published as downloadable files stored on a server. However, this option is more complex for frequently changing datasets or real-time data.

[[DWBP]] Best Practice 18: Provide Subsets for Large Datasets explains why providing subsets is important and how this could be implemented. Spatial datasets, particularly coverages such as satellite imagery, sensor measurement time-series and climate prediction data, are often very large. In these cases it is useful to provide subsets by having identifiers for conveniently sized subsets of large datasets that Web applications can work with.

Effectively, breaking up a large coverage into pre-defined lumps that you can access via HTTP GET requests is a very simple API.

When a subset is provided, this should include information about its relationship to the complete dataset. In HTML, this could be descriptive text, or it may be implicitly clear to humans from the way the subset is presented. In schema.org, it could be the schema:isPartOf property. In RDF, PROV-O could be used to describe the relationship between the subset and the complete dataset, as well as the mechanism used to derive the subset. In ISO 19115 metadata, the LI_Lineage element may be used for a similar purpose.
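As a sketch of pre-defined subsets with an explicit relationship to the whole dataset, the Python below splits a dataset's bounding box into a grid and describes each cell with schema.org, using isPartOf to link back to the dataset. The URI pattern /subset/{col}/{row} is hypothetical; any stable, dereferenceable scheme would do.

```python
def grid_subsets(dataset_uri, bbox, cols, rows):
    """Split a dataset's (min_lon, min_lat, max_lon, max_lat) box into subsets.

    Each subset gets its own (hypothetical) URI, so clients can fetch it with
    a plain HTTP GET, plus a schema.org isPartOf link to the whole dataset.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    width, height = (max_lon - min_lon) / cols, (max_lat - min_lat) / rows
    subsets = []
    for row in range(rows):
        for col in range(cols):
            lon0, lat0 = min_lon + col * width, min_lat + row * height
            subsets.append({
                "@context": "https://schema.org",
                "@type": "Dataset",
                "@id": f"{dataset_uri}/subset/{col}/{row}",
                "isPartOf": {"@id": dataset_uri},
                "spatialCoverage": {
                    "@type": "Place",
                    # schema.org box: lower corner then upper corner, "lat lon"
                    "geo": {"@type": "GeoShape",
                            "box": f"{lat0} {lon0} {lat0 + height} {lon0 + width}"},
                },
            })
    return subsets
```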

The use of APIs to access data is covered in [[DWBP]] by the following best practices:

For spatial data, SDIs have long been used to provide generalized access to spatial data via web services, typically using open standard specifications from the Open Geospatial Consortium (OGC). The main examples are Web Feature Service, Web Coverage Service, Sensor Observation Service, SensorThings or [[GeoSPARQL]] for access to data, and Web Map Service and Web Map Tile Service for access to data rendered as maps. With the exception of the Web Map Service, the OGC standards have not seen widespread adoption beyond the geospatial expert community.

In addition, commercial offerings for publishing spatial data on the Web often provide access via product-specific APIs, too. These APIs are typically not restricted to HTTP-based web service APIs in the sense of [[DWBP]] Best Practice 24: Use Web Standards as the foundation of APIs, but include APIs targeted at a specific programming language, for example, JavaScript.

The third option in the list above is included because sharing spatial data on the Web using the first two options (bulk download or generalized APIs) may not be sufficient to reach application developers. Reasons for this include:

A useful 'bespoke API', as in the third option, provides convenience to developers of the targeted applications, because the API designer has considered the needs those developers have when consuming the spatial data shared via the API.

Expose spatial data through 'convenience APIs'

If you have a specific type of application in mind for your data, tailor a spatial data access API to meet that goal.

Why

Providing access to spatial data via bulk download or generalized spatial data access APIs may be too complex for application developers with relatively simple requirements, if the spatial data or the API is complex to understand, or if the data is too large to handle in a Web application. Convenience APIs are tailored to meet a specific goal, enabling a user to engage with complex data structures using (a set of) simple queries, including spatial search.

Intended Outcome

The API provides a coherent set of queries and operations, including spatial ones, that help users get working with the data quickly to achieve common tasks. The API provides both machine-readable data and human-readable HTML markup. The human-readable markup also supports search engines' Web crawlers, enabling indexing of spatial data.

Possible Approach to Implementation

The API should

  • offer both machine readable data and human readable HTML that includes the structured metadata required by search engines seeking to index content (see Best Practice 4: Make your entity-level data indexable by search engines for details);
  • follow the architectural guidance of the [[DWBP]] Best Practice 24: Use Web Standards as the foundation of APIs and [[DWBP]] Best Practice 19: Use content negotiation for serving data available in multiple formats;
  • be well documented and easy to understand, both in terms of the options to access / filter the data and of the data structures that are returned (see [[DWBP]] Best Practice 25: Provide complete documentation for your API);
  • return data in chunks that are fit for use in Web applications and that form useful sets of information. For large datasets, this is related to [[DWBP]] Best Practice 18: Provide Subsets for Large Datasets; this may be achieved, for example, by filtering options that return appropriately sized subsets of the specific dataset or by supporting paging (returning larger subsets in pages with forward/backward links). For paging, some patterns have been established; see, for example, W3C Linked Data Platform Paging or Hydra pagination. At the other end of the spectrum, overly small pieces of data are inconvenient to use, too. Data should be packaged in lumps that are convenient to work with. An approach where very small, fine-grained units of information are published that require further HTTP requests to get the related information sufficient to determine context is not useful;
  • support queries for spatial things based on user needs. For spatial data, typical needs that should be considered are neighbourhood searches (e.g., "what is near me?" or "what is near this spatial thing?") and searching for things located in a specific area (e.g., an area shown as a map in an application). Users will often look for a particular spatial thing without knowning its identifier, too, in which case a fault-tolerant, free-text search on the name, label or other property may be useful.
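The query types listed above can be sketched in a few lines. The following is a minimal, in-memory illustration, not a reference implementation: the SpatialThing type, the example records (URIs and approximate coordinates) and the /things path with its lat, lon, radius_km, offset and limit parameters are all hypothetical. It shows a neighbourhood search, a bounding-box search, a fault-tolerant name search and simple paging with a 'next' link.

```python
# Minimal sketch of convenience-API queries over spatial things.
# Hypothetical throughout: SpatialThing, EXAMPLE_THINGS (URIs and approximate
# coordinates) and the /things path with its query parameters.
import difflib
import math
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class SpatialThing:
    uri: str
    name: str
    lat: float
    lon: float


EXAMPLE_THINGS = [
    SpatialThing("http://data.example.org/topo/ams/brug/Leliesluis", "Leliesluis", 52.37608, 4.88435),
    SpatialThing("http://data.example.org/poi/anne-frank-house", "Anne Frank House", 52.3752, 4.8840),
    SpatialThing("http://data.example.org/poi/westerkerk", "Westerkerk", 52.3745, 4.8840),
    SpatialThing("http://data.example.org/poi/dam-square", "Dam Square", 52.3731, 4.8932),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def near(things, lat, lon, radius_km):
    """Neighbourhood search ("what is near me?"): matches within radius_km, nearest first."""
    scored = sorted(((haversine_km(lat, lon, t.lat, t.lon), t) for t in things), key=lambda s: s[0])
    return [t for dist, t in scored if dist <= radius_km]


def within_bbox(things, min_lon, min_lat, max_lon, max_lat):
    """Area search, e.g. for the extent of a map view (ignores the antimeridian)."""
    return [t for t in things if min_lon <= t.lon <= max_lon and min_lat <= t.lat <= max_lat]


def by_name(things, text, cutoff=0.6):
    """Fault-tolerant name search based on difflib's similarity ratio."""
    matches = set(difflib.get_close_matches(text, [t.name for t in things], n=10, cutoff=cutoff))
    return [t for t in things if t.name in matches]


def page(results, base_url, params, offset=0, limit=2):
    """One page of results plus a 'next' link while more results remain."""
    body = {"items": [{"id": t.uri, "name": t.name} for t in results[offset:offset + limit]]}
    if offset + limit < len(results):
        body["next"] = base_url + "?" + urlencode({**params, "offset": offset + limit, "limit": limit})
    return body


if __name__ == "__main__":
    params = {"lat": 52.37590, "lon": 4.88452, "radius_km": 0.5}
    hits = near(EXAMPLE_THINGS, params["lat"], params["lon"], params["radius_km"])
    print(page(hits, "http://api.example.org/things", params))
```

A real API would back these queries with a spatial index in a database rather than a Python list, but the shape of the responses (small, paged, with navigable links) is the point being illustrated.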

In a White Paper about open geospatial APIs [[OGC-API-WP]], the Open Geospatial Consortium (OGC) has defined the concept of the "OGC API Essentials" - a set of items defined in OGC standards and other open standards that are reusable modules for use in geospatial APIs. The White Paper provides an initial list and many of the identified standards are mentioned in this document. Reuse of standardized building blocks improves consistency and interoperability across APIs. It is recommended to consider the OGC API Essentials when defining an API to access spatial data.

If the data is already published in a Spatial Data Infrastructure, there are basically two options to publish the data via an additional convenience API.

  1. Reuse your existing spatial data infrastructure

    A RESTful API can be created as a wrapper, proxy or shim layer around SDI services. This exposes the data behind the SDI's 'generalized APIs' through 'convenience APIs' that make it easier to use. For example, in the geospatial domain there are many WFS services providing spatial data. Content from a WFS service can be provided in this way as Linked Data, JSON or another Web-friendly format using simple, navigable resources. This approach is similar to the use of Z39.50 in the library community; that protocol is still used, but 'modern' Web sites and Web services are wrapped around it.

  2. Provide parallel web-friendly access to the data as an alternative

    A more effective route to providing an alternative 'Web friendly' access path to the spatial data may be to create a new, complementary service endpoint on top of the native storage of the dataset. Compared to the first option, this limits the load on your SDI, which may matter because expert users will continue to use the SDI's data access APIs for their complex data management tasks.
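The first option (a shim around existing SDI services) can be sketched as a translation from a simple bounding-box request into a WFS 2.0 GetFeature request, followed by reshaping of the response. This is a sketch under stated assumptions: the endpoint URL and feature type name are hypothetical, and the server is assumed to offer GeoJSON via outputFormat=application/json, which many WFS servers support but WFS 2.0 does not require. The key-value parameters used (service, version, request, typeNames, bbox, count, startIndex, outputFormat) are WFS 2.0 parameters.

```python
# Sketch of a convenience-API shim around a WFS 2.0 service.
# Assumptions: WFS_ENDPOINT and FEATURE_TYPE are hypothetical, and the server
# is assumed to support outputFormat=application/json (GeoJSON).
from urllib.parse import urlencode

WFS_ENDPOINT = "https://sdi.example.org/wfs"  # hypothetical
FEATURE_TYPE = "topo:Bridge"                  # hypothetical


def getfeature_url(min_lon, min_lat, max_lon, max_lat, limit=50, offset=0):
    """Build a WFS 2.0 key-value-pair GetFeature URL for a bounding box."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": FEATURE_TYPE,
        # The EPSG:4326 URN uses latitude/longitude axis order, so latitude comes first.
        "bbox": f"{min_lat},{min_lon},{max_lat},{max_lon},urn:ogc:def:crs:EPSG::4326",
        "count": limit,        # page size
        "startIndex": offset,  # paging offset
        "outputFormat": "application/json",
    }
    return WFS_ENDPOINT + "?" + urlencode(params)


def simplify(feature_collection):
    """Reduce a GeoJSON FeatureCollection to a short list of id/name/point items."""
    items = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        items.append({
            "id": feature.get("id"),
            "name": feature.get("properties", {}).get("name"),
            "point": geometry.get("coordinates") if geometry.get("type") == "Point" else None,
        })
    return items


if __name__ == "__main__":
    print(getfeature_url(4.880, 52.374, 4.886, 52.377))
```

Under the second option, the same simple request and response shapes could instead be served directly from the dataset's native storage, leaving the WFS untouched for expert users.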

How to Test

See the "How to test" sections in [[DWBP]] Best Practice 23: Make data available through an API, [[DWBP]] Best Practice 24: Use Web Standards as the foundation of APIs and [[DWBP]] Best Practice 25: Provide complete documentation for your API.

Evidence

Relevant requirements: R-Compatibility, R-LightweightAPI, R-ReferenceDataChunks.

Benefits

  • ...

Linking Spatial Data

Links, in whatever machine-readable form, are important. In the wider Web, it is links that enable the discovery of web pages: from user-agents following a hyperlink to find related information to search engines using links to prioritize and refine search results. This section is concerned with the creation and use of those links to support discovery of the SpatialThings described in spatial datasets.

For data to be on the Web, the resources it describes need to be connected, or linked, to other resources. The connectedness of data is one of the fundamentals of the Linked Data approach that these best practices build upon.

Like any other type of data, spatial data benefits greatly from linking when published on the Web. The widespread use of links within data is regarded as one of the most significant departures from contemporary practices used within SDIs. That is why this topic is included in these best practices.

[[DWBP]] identifies Linkability as one of the benefits gained from implementing the Data on the Web best practices (see [[DWBP]] section 8.7 Data Identifiers, in particular Best Practice 9: Use persistent URIs as identifiers of datasets and Best Practice 10: Use persistent URIs as identifiers within datasets). However, it does not discuss how to create the links that use those persistent URIs. This section of the document extends [[DWBP]] by providing a best practice about creating links between the resources described inside spatial datasets.

Discussion of links in these best practices is limited to simple links that relate exactly two resources: the source and the target. Complex links that relate an arbitrary number of resources, such as described in [[XLINK11]] section 5.1 Extended Links, are out of scope.

Publish links between spatial things and related resources

Bind spatial things into the Web of data using links to other resources, providing sufficient information for a user to determine whether the target resource specified in a link will be of use.

Why

The 5★ rating for Linked Open Data asserts that to achieve the fifth star you must "link your data to other data to provide context". The benefits for consumers and publishers of linking to other data are listed as:

  • You can discover more (related) data while consuming the data.
  • You can directly learn about the data schema.
  • You make your data discoverable.
  • You increase the value of your data.
  • Your own organization will gain the same benefits from the links as the [other] consumers.

Geography is often described as the "glue that binds Linked Data"; the links between spatial things - and between other resources and spatial things - describe how the world around us is structured and interrelated and form an important facet of the Web of Data.

Spatial relationships can often be derived mathematically from geometry, but this can be computationally expensive. Asserting topological relationships explicitly removes the need for data users to perform these geometry-based calculations. A useful secondary benefit is that such relationships are easier for humans to understand!
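As an illustration of "compute once, then assert", the sketch below uses a ray-casting point-in-polygon test to decide whether a point lies inside an area and, if so, emits a triple using the [[GeoSPARQL]] sfWithin property. It is a simplified sketch: ray casting does not reliably classify points lying exactly on the boundary, and production code would normally rely on a computational geometry library. The URIs and coordinates are hypothetical.

```python
# Sketch: derive a topological relationship once from geometry, then publish it
# as an explicit assertion. Ray casting does not handle points lying exactly on
# the boundary; the URIs and coordinates below are hypothetical.
SF_WITHIN = "http://www.opengis.net/ont/geosparql#sfWithin"


def point_in_polygon(x, y, ring):
    """Ray-casting test for a simple polygon given as (x, y) vertices (closing edge implied)."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Count crossings of a horizontal ray from (x, y) towards +x with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within_assertion(point_uri, x, y, area_uri, ring):
    """Return a Turtle statement asserting sfWithin, or None if the point is outside."""
    if point_in_polygon(x, y, ring):
        return f"<{point_uri}> <{SF_WITHIN}> <{area_uri}> ."
    return None


if __name__ == "__main__":
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    print(within_assertion("http://data.example.org/pt/1", 5.0, 5.0,
                           "http://data.example.org/area/1", square))
```

Once published, a consumer can follow or query the sfWithin statement directly instead of repeating the geometric test.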

One issue to which spatial data is more prone than data in other domains is non-unique naming. Different authorities and agencies seek to describe the world around them by publishing spatial data, and in doing so, each mint their own URIs (as recommended in Best Practice 7: Use globally unique persistent HTTP URIs for spatial things). Where spatial things are of common interest to multiple agents, it is almost inevitable that a given spatial thing will end up being identified with several URIs. Given necessary due diligence, multiple identifiers may be linked, thereby supporting conflation of multiple sets of information and yielding new perspectives on spatial things.

There is always a cost to traversal of a link, even if it is just a few milliseconds delay and the need to parse a few hundred or thousand bytes returned in response to an HTTP request. In many cases, such as when dealing with large datasets and complex queries, the costs incurred from traversing a link may be significant in terms of time and data volumes. Before a user or software agent decides to traverse a link, they should be able to determine whether acquisition of the target resource, or data about the target resource, will support their application goals. For example, what format can one expect the response in, what type of resource is the target and how is that target related to the source resource?

Links are by nature a directional relationship between source and target. Most often, a link is published within the dataset that describes the source resource specified in the link, enabling users to browse through information by traversing the links they find in documents (i.e. outbound links). While many links specify source and target resources that are described in the same dataset, it is commonplace, and encouraged by the 5★ rating, to link between datasets, thereby 'stitching' together the Web of data. In these situations, a link refers to some remote target resource. A dataset access API may interpret such links to enable a user to specify a well-known URI of a spatial thing they are interested in (for example from popular data repositories such as GeoNames, Wikidata or DBpedia) in order to search for related information (see Best Practice 11: Expose spatial data through 'convenience APIs' for more on APIs and search). But how does a user know of the existence of a link referring to their target spatial thing (and potentially identifying a related resource that is useful for their intended goal) when that link is published within a remote dataset (i.e. an inbound link)?

Making these inbound links discoverable makes the Web of data symmetric, i.e. both inbound and outbound links are visible to data users, who may then choose to traverse them. Links also provide a secondary benefit as citations, indicating some subjective trustworthiness of the data (e.g. "it's good enough quality for me to use"). Not only do such link-based citations convey the value of the dataset in a way that the original publisher can objectively quantify (and hence justify continuing to publish and maintain those datasets), but they can also provide a subjective indication of quality: similar to a search engine's page ranking algorithm, the larger the number of sources of inbound links, the greater the likelihood that a given dataset is of high quality.
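One way to make inbound links discoverable is for an aggregator (for example a search service or link index) to harvest (source, relation, target) links from many datasets and index them by target. The sketch below illustrates that idea; it is an assumption about how such a service might work, not a description of an existing one. The Edinburgh URIs come from the examples in this section, the school link is hypothetical, and treating a host name as "the dataset" is a simplification.

```python
# Sketch: index harvested links by target so that inbound links can be found.
# The Edinburgh URIs come from this section; the school link is hypothetical.
from collections import defaultdict
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EDINBURGH = "http://statistics.data.gov.uk/id/statistical-geography/S12000036"

HARVESTED = [
    ("http://statistics.gov.scot/id/statistical-geography/S12000036", OWL_SAME_AS, EDINBURGH),
    ("http://data.ordnancesurvey.co.uk/id/7000000000030505", OWL_SAME_AS, EDINBURGH),
    ("http://data.example.org/school/42", "http://data.example.org/def/locatedIn", EDINBURGH),
]


def index_by_target(links):
    """Map each target URI to the (source, relation) pairs that link to it."""
    inbound = defaultdict(list)
    for source, relation, target in links:
        inbound[target].append((source, relation))
    return inbound


def linking_datasets(inbound, target):
    """Distinct hosts publishing links to target: a crude, citation-like signal.
    Treating the host name as the dataset is a simplifying assumption."""
    return {urlparse(source).netloc for source, _ in inbound.get(target, [])}


if __name__ == "__main__":
    inbound = index_by_target(HARVESTED)
    for source, relation in inbound[EDINBURGH]:
        print(source, relation)
```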

Intended Outcome

Spatial things are related to other resources in the Web of data using links with appropriate semantics.

Links can be identified and traversed by humans and software agents.

Sufficient information is provided to help humans and software agents determine whether traversal of a given link meets their goals.

Humans and software agents can find resources that link to and from a given spatial thing.

Possible Approach to Implementation

Before we get into the details of linking spatial data, it's worth stating some ground rules that should be followed when linking any type of resource in the Web of data.

  1. Use formats that support Web linking (as defined in [[WEBARCH]] section 4.4 Hypertext)

    Earlier in this document () we explained that Linked Data requires only that the formats used to publish data support Web linking. In other words, linking spatial data does not automatically mean the use of RDF; links can also be created, for example, using [[GML]], HTML or JSON-LD. The two key points from [[WEBARCH]] are:

    • Good practice: Link identification — A [data format] specification SHOULD provide ways to identify links to other resources [...].
    • Good practice: Web linking — A [data format] specification SHOULD allow Web-wide linking, not just internal document linking.

    The examples used in this best practice illustrate some of the data formats and mechanisms that support Web linking.

  2. Follow the principles for 4★ — Linked [[WEB-DATA]]

    • Always use global identifiers when linking between documents, so that link identifiers can be taken out of context and shared globally.

      Obviously this principle is predicated on the use of global identifiers for resources. As such, we consider Best Practice 7: Use globally unique persistent HTTP URIs for spatial things as a prerequisite for linking.

    • Links should be typed (explicitly or implicitly), so that clients can decide which link to follow when they are traversing a web of interlinked resources to reach application goals.

      For example, HTTP Link headers (as defined in [[RFC5988]]) can use the IANA [[LINK-RELATION-TYPES]] registry to define the link type. According to the IANA registry, predecessor-version points to a resource containing the predecessor version in the version history (as defined in [[RFC5829]] "Link Relation Types for Simple Version Navigation between Web Resources").

      In simple links involving only two resources, the role, or type, of each resource is implicit and can be inferred from the link relation type. It can be useful to include other information to help users judge whether to follow a link, such as human-readable labels and hints about the target resource type. Of course, target resources and the links that refer to them are often maintained by different parties, so such hints should be treated as advisory rather than prescriptive; they may or may not turn out to be true. For example, [[RFC5988]] "Web Linking" defines a number of additional attributes, including hreflang, which hints at the language or languages in which the target resource is available; type, which indicates the expected media type; and title, which labels the link target so that it can be used as a human-readable identifier.

      Also note that [[DWBP]] Best Practice 19: Use content negotiation for serving data available in multiple formats recommends the use of content negotiation to help ensure that a user or software agent is provided with useful content when they traverse a link and resolve the target resource. However, HTTP Request headers are limited to specifying media-type, character set, encoding (e.g. for compression) and language. There is no mechanism to request that data is provided according to a particular data model or 'profile' (see [[RFC6906]] "profile" Link Relation Type), nor to request data in a particular coordinate reference system.

    • Make links as specific as possible. If the linked resource supports fragment identification, and the link logically should be to a fragment of the resource (and not just the resource as a whole), try to use fragment identifiers when possible.

      Being as specific as possible with links is important; e.g. refer to a particular spatial thing rather than the dataset in which that spatial thing is described. That said, we encourage publication of data about spatial things as independently resolvable resources (e.g. so that they can be accessed by search engines' Web crawlers, see Best Practice 4: Make your spatial data indexable by search engines), which means that fragment identifiers are usually not required.
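The guidance above on typed links and target hints can be illustrated from the client side: a user agent parses an HTTP Link header and inspects the relation type and the type, title and hreflang hints before deciding which link to follow. The parser below is a simplified sketch that handles the common `<uri>; param="value"` form; it is not a complete [[RFC5988]] implementation (for example, it ignores the extended `title*` encoding). The header value and URIs are hypothetical.

```python
# Simplified parser for HTTP Link header values (not a complete RFC 5988
# implementation). The header value and URIs below are hypothetical.
import re

# Parameter names cannot contain ';', ',', '=' or whitespace; values are either
# quoted strings or bare tokens.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,=\s]+\s*=\s*(?:"[^"]*"|[^;,]*))*)')
PARAM_RE = re.compile(r';\s*([^;,=\s]+)\s*=\s*(?:"([^"]*)"|([^;,]*))')

HEADER = ('<http://data.example.org/topo/ams/brug/Leliesluis/v1>; rel="predecessor-version"; '
          'type="text/turtle"; title="Leliesluis (2016 version)", '
          '<http://data.example.org/topo/ams/brug/Leliesluis.html>; rel="alternate"; '
          'type="text/html"; hreflang="nl"')


def parse_link_header(value):
    """Return one dict per link, e.g. {'href': ..., 'rel': ..., 'type': ...}."""
    links = []
    for match in LINK_RE.finditer(value):
        link = {"href": match.group(1)}
        for name, quoted, bare in PARAM_RE.findall(match.group(2)):
            link[name.lower()] = quoted if quoted else bare.strip()
        links.append(link)
    return links


def select(links, rel, media_type=None):
    """Links whose rel list contains rel and, optionally, whose type hint matches.
    The type hint is advisory: the server may still return something else."""
    return [link for link in links
            if rel in link.get("rel", "").split()
            and (media_type is None or link.get("type") == media_type)]


if __name__ == "__main__":
    for link in select(parse_link_header(HEADER), "predecessor-version", "text/turtle"):
        print(link["href"], link.get("title"))
```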

Now that we've set some ground rules for Web linking, we need to consider the types of relationship that might be used in spatial data.

  1. Synonyms and equality

    As described above, it is not uncommon for a spatial thing to be identified using more than one URI (also known as the "non-unique naming problem"). If you think that this is the case, the property owl:sameAs may be used to express this. However, caution is advised as owl:sameAs is an extremely strong statement; literally "these two URIs identify the same resource". As there is only one spatial thing, all the properties and attributes returned when resolving any of the equated URIs are considered to apply to that spatial thing. Given that spatial data is often published by different parties, each concerned with their own perspective, the equality of spatial things is often difficult to determine and depends heavily on the semantics involved.

    So the advice is: if in doubt, don't use owl:sameAs.

    By way of example, let's explore some data for Edinburgh.

    The City of Edinburgh Council Area (i.e. the geographical area that Edinburgh City Council is responsible for) is identified by the Office for National Statistics (the recognised national statistical institute of the UK) using their GSS code (a nine-character alphanumeric identifier) S12000036 and the URI http://statistics.data.gov.uk/id/statistical-geography/S12000036. At the same time, the devolved government in Scotland, operating under its own jurisdiction, retains the GSS code but uses the URI http://statistics.gov.scot/id/statistical-geography/S12000036. Furthermore, the Ordnance Survey maintain yet another URI for the City of Edinburgh Council Area as part of its 'Boundary Line' service that contains administrative and statistical geography areas in the UK: http://data.ordnancesurvey.co.uk/id/7000000000030505. Similarly, GeoNames identifies Edinburgh, a second-order administrative division, as http://sws.geonames.org/2650225. All of these URIs refer to the same spatial thing and are equated using owl:sameAs.

    @prefix owl:          <http://www.w3.org/2002/07/owl#> .
    @prefix scotgov-stat: <http://statistics.gov.scot/id/statistical-geography/> .
    @prefix ukgov-stat:   <http://statistics.data.gov.uk/id/statistical-geography/> .
    @prefix osuk:         <http://data.ordnancesurvey.co.uk/id/> .
    @prefix geonames:     <http://sws.geonames.org/> .
    
    scotgov-stat:S12000036 owl:sameAs ukgov-stat:S12000036 .
    osuk:7000000000030505 owl:sameAs ukgov-stat:S12000036 .
    geonames:2650225 owl:sameAs ukgov-stat:S12000036 .
                    

    Also note that in this [[TURTLE]] snippet one could easily include additional properties to help users determine whether the link is worth traversing, such as providing human-readable labels and specifying the type designated by each data publisher.
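    To see why owl:sameAs is such a strong statement, it helps to compute its consequences. Because owl:sameAs is symmetric and transitive, the URIs it connects form a single equivalence class, and every property asserted about any member applies to all of them. The sketch below groups the URIs from the Turtle example above using a union-find structure and then pools their properties; the property values in PROPERTIES are illustrative only and are not copied from the publishers' data.

```python
# Sketch of owl:sameAs consequences: equivalence classes via union-find, then a
# merged view of properties. SAME_AS repeats the Turtle example above;
# PROPERTIES holds illustrative values only.
SAME_AS = [
    ("scotgov-stat:S12000036", "ukgov-stat:S12000036"),
    ("osuk:7000000000030505", "ukgov-stat:S12000036"),
    ("geonames:2650225", "ukgov-stat:S12000036"),
]

PROPERTIES = {  # illustrative values
    "osuk:7000000000030505": {"rdf:type": "admingeo:District", "rdfs:label": "City of Edinburgh"},
    "geonames:2650225": {"rdfs:label": "Edinburgh"},
}


def same_as_classes(pairs):
    """Group URIs into equivalence classes with a union-find structure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


def merged_view(uri, classes, properties):
    """All property values that apply to uri once owl:sameAs is taken into account."""
    group = next((c for c in classes if uri in c), {uri})
    merged = {}
    for member in sorted(group):
        for key, value in properties.get(member, {}).items():
            merged.setdefault(key, set()).add(value)
    return merged


if __name__ == "__main__":
    classes = same_as_classes(SAME_AS)
    print(merged_view("ukgov-stat:S12000036", classes, PROPERTIES))
```

    Note that the merged view attaches both labels and the District type to every equated URI; if the equated resources were not in fact the same spatial thing, these pooled statements would be misleading.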

    In contrast, the resource identified by http://data.ordnancesurvey.co.uk/id/50kGazetteer/81103 defines Edinburgh as a named place of type city. This is not the same as the City of Edinburgh Council Area and therefore use of the owl:sameAs relationship is inappropriate.

    The mechanics of determining whether the information provided when resolving two or more URIs does indeed describe the same spatial thing is a complex topic in its own right and well beyond the scope of this best practice document. Tools such as OpenRefine and the Silk Linked Data Integration Framework are designed to work with, transform and integrate heterogeneous data sources. Their documentation may provide further insight regarding these challenges.

    The very strong semantics of the owl:sameAs property have led to a number of similar-sounding, but semantically weaker, properties being defined. These include:

    • http://www.bbc.co.uk/ontologies/coreconcepts/sameAs, defined by the BBC, whose description states that the property:

      Indicates that something is the same as something else, but in a way that is slightly weaker than owl:sameAs. Its purpose is to connect separate identities of the same thing, whilst keeping separation between the original statements of each.

    • and the widely used schema:sameAs defined by [[SCHEMA-ORG]] whose description states:

      URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Freebase page, or official website.

    The lack of equality between the City of Edinburgh [Administrative] Area and the named place Edinburgh spatial things illustrates a gap in best practice. Clearly, these two resources are strongly related, but it is not clear which property one would use to assert that relationship. Unlike administrative areas that have clearly defined boundaries, places often have ill-defined, fuzzy boundaries that are based on human perception of ‘place’; you can’t always define a boundary for a place. For example, Edinburgh the named place, published as part of Ordnance Survey’s 50K Gazetteer, is described using only a notional point geometry; no information is provided about the geometric extent. Other examples of places with ill-defined, fuzzy geometries include the American West and Renaissance Italy. The relationships between places, with their ill-defined (or even absent) geometrical extents, defy description using the topological relationships, such as those described below, which are computed mathematically from geometry.

    We propose the use of a qualitative assertion based on human perceptions to relate places that are deemed to be the same: samePlaceAs. The proposed definition to be published in [[SCHEMA-ORG]] is:

    Thing > Property > samePlaceAs

    Used to relate two places that are perceived to be the same; the physical extents of the two places should be broadly comparable but do not need to be equal in a topological or geometric sense.

    Values expected to be one of these types: Place

    Used on these types: Place

    Given that the notion of place concerns a social perspective, we consider it to be distinct from location which is based on geometry. As a result, samePlaceAs can be used to assert the imprecise, social perceptions about the equality of places. samePlaceAs does not overlap with the topological relationships described later in this best practice document that can be computed from geometry.

    As with all assertions of an imprecise nature that lack formal semantics, samePlaceAs may have limited value for semantic reasoning. Exactly what constitutes the ‘same place’ will always be somewhat debatable. For example, is ancient Byzantium the same place as modern Istanbul? Is a historical hotel that was moved across the street to save it from demolition in a redevelopment scheme the same place that it used to be?

  2. Spatial relationships

    Topological relationships between spatial things can be computed based on assessment of their geometry. [[GeoSPARQL]] defines families of topological relationships (based on the DE-9IM pattern) that, in mathematical terms, specify the spatial dimension of the intersections of the interiors, boundaries and exteriors of two geometric objects that may be 2-dimensional (e.g. area), 1-dimensional (e.g. linear) or 0-dimensional (e.g. point).

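    Most commonly used is the Simple Features relation family, described in [[SIMPLE-FEATURES]] section 6.1.15.3 Named spatial relationship predicates based on the DE-9IM. The eight named relationships, or spatial predicates, and their associated [[GeoSPARQL]] properties are:

    • equals: geosparql:sfEquals
    • disjoint: geosparql:sfDisjoint
    • intersects: geosparql:sfIntersects
    • touches: geosparql:sfTouches
    • within: geosparql:sfWithin
    • contains: geosparql:sfContains
    • overlaps: geosparql:sfOverlaps
    • crosses: geosparql:sfCrosses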

    We recommend use of the Simple Features relation families for describing topological relations between points, lines and areas. Further details are provided in [[GeoSPARQL]] section 7 Topology Vocabulary Extension.

    <script type="application/hal+json">
    {
      "ex:type-nl": "brug",
      "ex:type-en": "bridge",
      "ex:name": "Leliesluis",
      "_links": {
        "self": { "href" : "http://data.example.org/topo/ams/brug/Leliesluis" },
        "curies": [
          {
            "name": "geosparql",
            "href": "http://www.opengis.net/ont/geosparql#{rel}",
            "templated": true
          } , {
            "name": "ex",
            "href": "http://data.example.org/def/topo#{rel}",
            "templated": true
          }
        ],
        "geosparql:sfCrosses": { "href" : "http://data.example.org/topo/ams/kanaal/Prinsengracht" }
      },
      "_embedded": {
        "geosparql:sfCrosses": {
          "ex:type-nl": "kanaal",
          "ex:type-en": "canal",
          "ex:name": "Prinsengracht",
          "_links": {
            "self": { "href" : "http://data.example.org/topo/ams/kanaal/Prinsengracht" }
          }
        }
      }
    }
    </script>
                    

    The example above uses the Hypertext Application Language (HAL) conventions for expressing hyperlinks in JSON. It illustrates how one would use geosparql:sfCrosses to indicate that two linear features, a bridge and a canal, cross over each other.
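    A HAL client needs to expand a CURIE-prefixed link relation such as geosparql:sfCrosses into the full property URI before it can interpret the link. The sketch below does this using HAL's convention of templated "curies" links whose href contains a {rel} placeholder. The document it parses is an abbreviated copy of the HAL example above; error handling is omitted and the function name is illustrative.

```python
# Sketch: expand HAL CURIE link relations into full relation URIs.
# DOC is an abbreviated copy of the HAL example above.
import json

DOC = json.loads("""
{
  "ex:name": "Leliesluis",
  "_links": {
    "self": {"href": "http://data.example.org/topo/ams/brug/Leliesluis"},
    "curies": [
      {"name": "geosparql", "href": "http://www.opengis.net/ont/geosparql#{rel}", "templated": true},
      {"name": "ex", "href": "http://data.example.org/def/topo#{rel}", "templated": true}
    ],
    "geosparql:sfCrosses": {"href": "http://data.example.org/topo/ams/kanaal/Prinsengracht"}
  }
}
""")


def expand_relations(doc):
    """Return (source, relation URI, target) triples for the links in a HAL document."""
    links = doc.get("_links", {})
    source = links.get("self", {}).get("href")
    curies = {c["name"]: c["href"] for c in links.get("curies", [])}
    triples = []
    for rel, value in links.items():
        if rel in ("self", "curies"):
            continue
        prefix, sep, local = rel.partition(":")
        if sep and prefix in curies:
            rel_uri = curies[prefix].replace("{rel}", local)  # fill the {rel} template
        else:
            rel_uri = rel  # registered relation name or absolute URI
        for target in value if isinstance(value, list) else [value]:
            triples.append((source, rel_uri, target["href"]))
    return triples


if __name__ == "__main__":
    for triple in expand_relations(DOC):
        print(triple)
```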

    The spatial predicates specified in [[GeoSPARQL]] describe 2-dimensional topological relations. There is no evidence of common practice for describing 3-dimensional topological relationships.

    In addition to the mathematically precise spatial predicates described above, a number of vocabularies define similar relationships but without the formal mathematical underpinning. For example, [[SCHEMA-ORG]] defines a pair of basic containment relationships for use with schema:Place: schema:containsPlace and schema:containedInPlace.

    It is also commonplace to use spatial relationships to convey distance (e.g. at, nearby or far-away) and direction (e.g. left, inFrontOf, astern and below). However, we find no evidence of common vocabularies being used to express these relationships, perhaps because these relationships are often subjective and dependent on application context (e.g. the meaning of “near” will be quite different between an endurance cycling App and the App I use to find the Bluetooth tag attached to my house keys!).

    Two notable examples of distance relations are:

    • foaf:based_near which states "We do not say much about what 'near' means in this context; it is a 'rough and ready' concept."; and
    • geonames:nearby which simply states "A feature close to the reference feature".
    <script type="application/geo+json">
    {
      "id" : "http://sws.geonames.org/6618987/",
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [ [ ... ] ]
      },
      "properties": {
        "http://www.geonames.org/ontology#name": "Anne Frank's House",
        "http://www.geonames.org/ontology#nearby" : 
          [ "http://sws.geonames.org/6950949/",
            "http://sws.geonames.org/6951798/",
            "http://sws.geonames.org/6944503/",
            ... ]
      }
    }
    </script>
                    

    This example snippet, adapted to use [[RFC7946]] GeoJSON format, shows a list of features (e.g. Westerkerk, Homomonument and Westertoren) that are deemed 'nearby' Anne Frank's House according to GeoNames.

    The [[RFC7159]] JSON format provides only simple primitive types: string, number, boolean etc. The lack of a datatype for URIs means that they must be encoded as strings. As such, conventions (such as those defined in HAL) are required to tell applications that a given string value is a URI. However, [[RFC7946]] GeoJSON does not define any conventions for identifying URIs, and it forbids extending the fixed set of GeoJSON types; additional "foreign members" are permitted, but the specification does not define their semantics.

    To mitigate this, details about object types etc. included in data payload should be provided in the documentation for the API or service end-point from which the data is accessed. See [[DWBP]] Best Practice 25: Provide complete documentation for your API for further details.
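    A client can follow this advice by using the API documentation to decide which GeoJSON property values are links. In the sketch below, URI_PROPERTIES stands in for that (hypothetical) documentation, and FEATURE is an abbreviated copy of the GeoNames example above with the geometry omitted (a null geometry is allowed for an unlocated GeoJSON Feature). Only documented properties whose values parse as absolute http(s) URIs are treated as links.

```python
# Sketch: interpret documented GeoJSON properties as links.
# URI_PROPERTIES stands in for (hypothetical) API documentation.
import json
from urllib.parse import urlparse

URI_PROPERTIES = {"http://www.geonames.org/ontology#nearby"}

FEATURE = json.loads("""
{
  "id": "http://sws.geonames.org/6618987/",
  "type": "Feature",
  "geometry": null,
  "properties": {
    "http://www.geonames.org/ontology#name": "Anne Frank's House",
    "http://www.geonames.org/ontology#nearby": [
      "http://sws.geonames.org/6950949/",
      "http://sws.geonames.org/6951798/",
      "http://sws.geonames.org/6944503/"
    ]
  }
}
""")


def extract_links(feature, uri_properties):
    """Return (source, relation, target) triples for documented URI-valued properties."""
    links = []
    for key, value in feature.get("properties", {}).items():
        if key not in uri_properties:
            continue  # not documented as a link: treat as plain text
        for candidate in value if isinstance(value, list) else [value]:
            parsed = urlparse(candidate)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                links.append((feature["id"], key, candidate))
    return links


if __name__ == "__main__":
    for link in extract_links(FEATURE, URI_PROPERTIES):
        print(link)
```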

  3. Domain-specific relationships involving spatial things

    In addition to the spatial relationships that are applicable to a wide variety of domains, there are a huge number of cases where asserting a relationship between a spatial thing and another resource is useful. Clearly, enumerating all these cases is more than we can do here, but we can look at some of those that commonly occur.

    First, there are the properties used to describe relationships between spatial things in a gazetteer. These properties are often used in combination with spatial predicates to describe the relationship between administrative units. For example, Ordnance Survey define specific properties to describe the relationships between the administrative units used within the UK: county, district, ward, etc.

    @prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
    @prefix admingeo:  <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
    
    <http://data.ordnancesurvey.co.uk/id/7000000000030505>
      a admingeo:District ;
      rdfs:label "City of Edinburgh" ;
      admingeo:gssCode "S12000036" ;
      admingeo:ward
        <http://data.ordnancesurvey.co.uk/id/7000000000043412> , 
        <http://data.ordnancesurvey.co.uk/id/7000000000043415> , 
        <http://data.ordnancesurvey.co.uk/id/7000000000043411> ,
        ... ;
      geosparql:sfTouches
        <http://data.ordnancesurvey.co.uk/id/7000000000036552> , 
        <http://data.ordnancesurvey.co.uk/id/7000000000030509> , 
        <http://data.ordnancesurvey.co.uk/id/7000000000030634> , 
        <http://data.ordnancesurvey.co.uk/id/7000000000030632> ;
      ...
      .
                    

    The example snippet above, provided in [[TURTLE]] format, shows the relationships between the City of Edinburgh district and the electoral wards it contains. Also note the complementary use of geosparql:sfTouches to relate the City of Edinburgh to its adjacent districts: Midlothian, West Lothian etc.

    A second domain where relationships between spatial things and non-spatial resources occur is Earth observation. The example below, provided in [[GML]], relates a monitoring point at Deddington on the Nile River, Tasmania, to the sensor that is deployed there (using the sams:hostedProcedure property) and relates that monitoring point to the waterbody whose properties are being measured (using the sam:sampledFeature property). Here, the links are defined using [[XLINK11]].

    <wml2:MonitoringPoint gml:id="xsd-monitoring-point.example"
      xmlns:wml2="http://www.opengis.net/waterml/2.0"
      xmlns:gml="http://www.opengis.net/gml/3.2" 
      xmlns:sam="http://www.opengis.net/sampling/2.0"
      xmlns:sams="http://www.opengis.net/samplingSpatial/2.0"
      xmlns:xlink="http://www.w3.org/1999/xlink">
      <gml:description>Hydrological monitoring point for Nile river at 
        Deddington, South Esk catchment, Tasmania</gml:description>
      <gml:identifier codeSpace="http://www.example.com/">
        http://www.example.com/catchment/south-esk/mpoint/deddington
      </gml:identifier>
      <sam:sampledFeature xlink:href="http://sws.geonames.org/2155327/" 
        xlink:title="Nile river"/> 
      <sams:shape>
        <gml:Point gml:id="location_deddington">
          <gml:pos srsName="urn:ogc:def:crs:EPSG::4326">
            -41.814935 147.568517
          </gml:pos> 
        </gml:Point>
      </sams:shape>
      <sams:hostedProcedure>
        <wml2:ObservationProcess gml:id="sensor-4c40fd3acdbf">
          <wml2:processType xlink:href="http://www.opengis.net/def/waterml/2.0/processType/Sensor" 
            xlink:title="Sensor"/>
          <wml2:processReference xlink:href="http://www.example.com/sensor/00d97bbc-77ca-4b3d-91ca-4c40fd3acdbf/conf/1489405706" 
            xlink:title="Sensor configuration (updated:2017-03-13)"/>
        </wml2:ObservationProcess>
      </sams:hostedProcedure>
      ...
    </wml2:MonitoringPoint>
                    

    For further information about sensors, sampling, observations and measurements, please refer to [[OandM]] and also to [[VOCAB-SSN]].

    [[GML]] adopted the [[XLINK11]] standard to represent links between resources. At the time of adoption, XLink was the only W3C-endorsed standard mechanism for describing links between resources within XML documents. The Open Geospatial Consortium anticipated broad adoption of XLink over time and, with that adoption, provision of support within software tooling. While XML Schema, XPath, XSLT and XQuery etc. have seen good software support over the years, this never happened with XLink. The authors of [[GML]] note that, given the lack of widespread support, use of XLink within [[GML]] provided no significant advantage over and above use of a bespoke mechanism tailored to the needs of [[GML]].
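    Even without dedicated XLink tooling, generic XML tools can still find simple links. The sketch below uses Python's standard ElementTree module to collect every xlink:href (with its xlink:title, if any) from an abbreviated copy of the monitoring point example above. It handles only simple links; it ignores xlink:type and the extended links that are out of scope for these best practices.

```python
# Sketch: extract simple XLink links from GML with generic XML tooling.
# GML is an abbreviated copy of the monitoring point example above.
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

GML = """<wml2:MonitoringPoint gml:id="xsd-monitoring-point.example"
  xmlns:wml2="http://www.opengis.net/waterml/2.0"
  xmlns:gml="http://www.opengis.net/gml/3.2"
  xmlns:sam="http://www.opengis.net/sampling/2.0"
  xmlns:sams="http://www.opengis.net/samplingSpatial/2.0"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <sam:sampledFeature xlink:href="http://sws.geonames.org/2155327/" xlink:title="Nile river"/>
  <sams:hostedProcedure>
    <wml2:ObservationProcess gml:id="sensor-4c40fd3acdbf">
      <wml2:processType xlink:href="http://www.opengis.net/def/waterml/2.0/processType/Sensor"
        xlink:title="Sensor"/>
    </wml2:ObservationProcess>
  </sams:hostedProcedure>
</wml2:MonitoringPoint>"""


def xlinks(xml_text):
    """Return (element tag, href, title) for every element carrying xlink:href."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        href = element.get(f"{{{XLINK}}}href")  # attribute name in Clark notation
        if href is not None:
            found.append((element.tag, href, element.get(f"{{{XLINK}}}title")))
    return found


if __name__ == "__main__":
    for tag, href, title in xlinks(GML):
        print(tag, href, title)
```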

    Our final example of a domain-specific relationship concerns creative works. For example, one may want to indicate the location a social media message was sent from. In the example below, we assume that Maurits, a tourist in Amsterdam, wants to comment on his visit to Anne Frank's House. His social media App uses the [[GEOLOCATION-API]] to determine his location (Lat=52.37590 and Long=4.88452) and suggests several places that Maurits might choose from in order to geo-tag his message. Maurits wants people to know roughly where he is, so he chooses "Amsterdam-Centrum" and presses 'send'. The App encodes the message in [[SCHEMA-ORG]] and pushes the message to the server for distribution. The geo-information is provided using the schema:locationCreated property.

    <script type="application/ld+json">
    {
      "@context" : {
        "@vocab" : "http://schema.org/"
      },
      "@id" : "http://app.example.com/message/867a52e3-6687-4471-b1f2-c7561673552e",
      "@type" : "Message",
      "sender" : { "@type" : "Person", "name" : "Maurits" },
      "datePublished" : "2017-03-12",
      "locationCreated" : {
        "@id" : "https://g.co/kg/m/0gh6_3j",
        "@type" : "Place",
        "name" : "Amsterdam-Centrum"
      }
    }
    </script>
                    

    If Maurits had wanted to indicate that the subject of the photograph he took moments later was Leliesluis bridge, then the following [[SCHEMA-ORG]] markup and schema:mainEntity property could be used:

    <script type="application/ld+json">
    {
      "@context" : {
        "@vocab" : "http://schema.org/"
      },
      "@id" : "http://app.example.com/user/Maurits/photo/e35f1132-461e-4acb-8a76-a5d622a85958",
      "@type" : "Photograph",
      "creator" : { "@type" : "Person", "name" : "Maurits" },
      "datePublished" : "2017-03-12",
      "mainEntity" : {
        "@id" : "http://data.example.org/topo/ams/brug/Leliesluis",
        "@type" : "Bridge",
        "name" : "Leliesluis bridge",
        "geo" : {
          "@type" : "GeoCoordinates",
          "longitude" : "4.88435",
          "latitude" : "52.37608"
        }
      }
    }
    </script>
                    

Other than the specific guidance above about use of owl:sameAs, schema:samePlaceAs and the [[GeoSPARQL]] simple features relations, we are not recommending specific vocabularies for spatial linking. Instead, we hope to have introduced patterns that show the types of spatial linking that might be used and leave it to spatial data publishers to determine which specific vocabulary best suits their purpose. In this regard, [[DWBP]] section 8.9 Data Vocabularies and, in particular, [[DWBP]] Best Practice 15: Reuse vocabularies, preferably standardized ones are highly relevant.

Now that we know how to link and the relation types that might be used, it's time to consider what we should link to.

  1. Describe what you know — but stop there.

    Data publishers should assert the relationships that they know about and that they think will be of interest to their user community, whether through equality statements, spatial relations or domain-specific relations.

    Publishers should try to avoid making assumptions about what the user may or may not know. For example, users may lack the expertise or resources to calculate a topological relationship, or lack the domain knowledge to determine how two spatial things are related, if at all. As the data publisher, you are likely to be in a better position to make these judgements than the user, so help them out by making these relationships clear.

  2. Link to the spatial thing.

    The geometry description or extent of a spatial thing may be expressed using an object with its own URI. For example:

    @prefix rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix admingeo:  <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
    @prefix geom:      <http://data.ordnancesurvey.co.uk/ontology/geometry/> .
    
    <http://data.ordnancesurvey.co.uk/id/7000000000030505>
      a admingeo:District ;
      rdfs:label "City of Edinburgh" ;
      geom:extent <http://data.ordnancesurvey.co.uk/id/geometry/30505-11> .
    
    <http://data.ordnancesurvey.co.uk/id/geometry/30505-11>
      a geom:AbstractGeometry ;
      geom:asGML "<gml:MultiPolygon>...</gml:MultiPolygon>"^^rdf:XMLLiteral ;
      geom:hectares 27300.411 .
                    

    As can be seen in the example above, the geometry 30505-11 is an attribute of the City of Edinburgh. If your intent is to make a statement about, or refer to, the real-world entity, then make sure you link to the spatial thing rather than the geometry. Furthermore, note that the geometry record may be updated and re-published with a new identifier (for example, if the city boundary were resurveyed), which would then result in a broken link.

    Data publishers should also be aware of a common pattern used in the publication of Linked Data, where the spatial thing and the information resource that describes it are identified separately, often (but not always) using /id as part of the URI for the spatial thing and /doc for the corresponding page/document/record. When the URI for the spatial thing is resolved, an HTTP 303 (See Other) response is used to redirect the browser to the page/document/record URL. For example:

    While this disambiguation has its advantages, it often seems to confuse users (and even some experts). Be aware of this redirect pattern, and make sure you use the correct URI, i.e. the identifying one, especially if you're copying the URI from a browser's address bar, which usually ends up showing the page/document/record URL.
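    To make the 303 pattern concrete, here is a minimal Python sketch (standard library only) of how a client might check whether a URI it holds is the identifier or the document URL: request it without following redirects and look at the status code. The /id/ and /doc/ paths, DemoHandler and start_demo_server are hypothetical stand-ins for a real publisher, and a 200 response only suggests (it does not prove) that the URI names a document.

```python
import http.server
import threading
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib report redirects instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None lets the 3xx surface as an HTTPError


def check_identifier(uri):
    """GET `uri` without following redirects; return (status, Location).

    A 303 with a Location header suggests `uri` identifies the spatial thing
    itself; a 200 suggests it is the URL of the describing document.
    """
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(uri) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        err.close()
        return err.code, location


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical publisher: /id/... answers 303 -> /doc/..., /doc/... is the page."""

    def do_GET(self):
        if self.path.startswith("/id/"):
            self.send_response(303)
            self.send_header("Location", "/doc/" + self.path[len("/id/"):])
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"<html>description of the spatial thing</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Start DemoHandler on a free local port; return (server, base URL)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"
```

    Against a real publisher, you would call check_identifier on the URI copied from the address bar; if it returns 200, it may be worth looking for the identifying form of the URI before linking to it.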

  3. Link to spatial things from popular repositories.

    Linking with URIs from popular repositories may improve discoverability of your data. Not only does this provide users with better context by enabling them to browse the information published by the popular repository, it also helps relate your data with datasets from other parties who have also used those URIs as points of reference.

    There are many popular repositories containing sets of identifiers for spatial things; the following list suggests the primary sources worth checking:

    • GeoNames
    • Wikidata
    • DBpedia
    • National open spatial datasets, such as those made available by the UK and Dutch governments.

      Finding out which national open spatial datasets are available, and how they can be accessed, currently requires some insider knowledge, because these datasets are often not easily discoverable. Look for national data portals / geoportals such as Nationaal Georegister (Dutch national register of spatial datasets) or Dataportaal van de Nederlandse overheid (Dutch national governmental data portal).

    Once you've found well-known URIs for spatial things that you want to link to, proceed to create links using properties such as those described above — owl:sameAs (if you're careful!) and geosparql:sfWithin, or perhaps qualitative relationships like geonames:nearby or the proposed schema:samePlaceAs.

    However, don't try to make links to everything. It is not always feasible to link your spatial things to well-known resources. For example, if you were maintaining a registry of cultural heritage in Amsterdam, it would be reasonably simple to look up identifiers for the city's 50 or so museums and map these to your spatial things. But it would be a huge task for, say, a topographic mapping agency to cross-reference their entire catalogue of named places containing tens of thousands of spatial things with third-party resources (although in the spirit of crowd-sourcing, if someone else found those links useful, they may take on the task of relating the spatial things and publishing those relationships to the Web as a complementary resource!). In essence, you should only create the data that you have the resources to maintain.
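    Once suitable target URIs have been found, the links themselves are just triples. The following Python sketch, assuming a hypothetical local heritage register and a hand-checked table of targets, emits N-Triples statements only for things with a known counterpart and silently skips the rest, in the spirit of creating only the links you can maintain. Only the DBpedia URIs are real; the example.org URIs, THINGS and TARGETS are made up for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
GEO_SF_WITHIN = "http://www.opengis.net/ont/geosparql#sfWithin"


def ntriple(subject, predicate, obj):
    """Serialise one IRI-to-IRI link as an N-Triples statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def make_links(local_things, known_targets):
    """Emit links only for local things that have known counterparts.

    local_things:  local URI -> name (a hypothetical heritage register)
    known_targets: name -> list of (relation, target URI), e.g. built by
                   hand-checking candidates; unmatched names are skipped.
    """
    lines = []
    for uri, name in local_things.items():
        for relation, target in known_targets.get(name, []):
            lines.append(ntriple(uri, relation, target))
    return lines


# Hypothetical example data; only the DBpedia URIs are real.
THINGS = {
    "http://example.org/heritage/id/anne-frank-huis": "Anne Frank House",
    "http://example.org/heritage/id/unmatched-site": "Unmatched site",
}
TARGETS = {
    "Anne Frank House": [
        (OWL_SAME_AS, "http://dbpedia.org/resource/Anne_Frank_House"),
        (GEO_SF_WITHIN, "http://dbpedia.org/resource/Amsterdam"),
    ],
}

if __name__ == "__main__":
    print("\n".join(make_links(THINGS, TARGETS)))
```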

Finally, we need to consider how to bring symmetry to discovery of links. It's easy for a user to discover links when they are embedded in the document or data record they are working with — whether they are inbound or outbound links, all the information they need to decide whether to traverse them should be available. However, we need to consider the situation where another party publishes a dataset that refers to your spatial things. Or perhaps, a set of relationships between spatial things in two (or more) datasets has been developed by an agent that owns neither of the source datasets. How would a client application know that these links existed? To finish this best practice, we describe a number of mechanisms that might be used to help a user discover remotely published links.

  1. Help search engines find your links.

    Discovery is the staple of search engines. As with the document Web, links in the Web of data will be used to support and refine search algorithms.

    If your spatial data is indexable by the search engines, the links defined therein will be visible to them. Please refer to Best Practice 4: Make your spatial data indexable by search engines.

  2. Publish your links to a third party service.

    Because the links we define use global identifiers for both the source and target resources, they do not need any context to be useful (e.g. one doesn't need any prior knowledge to determine that the URI http://dbpedia.org/resource/Anne_Frank_House identifies Anne Frank's House). This means that our links can be published independently from the datasets where they were originally defined.

    The sameAs.org service provides an example of this in practice, albeit only for links using the owl:sameAs relation type. The HTTP GET request http://sameas.org/store/freebase/n3?uri=<http://rdf.freebase.com/ns/m.02s5hd> produces the following result in [[TURTLE]] syntax (served with the media type text/n3):

    HTTP/1.1 200 OK
    Connection:close
    Content-Length:864
    Content-Type:text/n3
    Date:Mon, 13 Mar 2017 16:12:04 GMT
    
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix dc:   <http://purl.org/dc/elements/1.1/> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    
    <http://sameas.org/rdf?uri=http%3A%2F%2Frdf.freebase.com%2Fns%2Fm.02s5hd>
      dc:creator "sameAs.org/store/freebase" ;
      dc:title "Co-references from sameAs.org/store/freebase for http://rdf.freebase.com/ns/m.02s5hd" ;
      foaf:primaryTopic "http://rdf.freebase.com/ns/m.02s5hd" ;
      dct:license <http://creativecommons.org/publicdomain/zero/1.0/> .
    
    <http://rdf.freebase.com/ns/m.02s5hd> owl:sameAs
        <http://dbpedia.org/resource/Anne_Frank_House> ,
        <http://rdf.freebase.com/ns/m.02s5hd> ,
        <http://rdf.freebase.com/ns/en.the_annexe> ,
        <http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000002c15ec> .
                    

    Sets of links can be published in any of the data formats that support Web linking. However, it is worth noting that tabular data, as described in [[TABULAR-DATA-PRIMER]], may provide a reasonably compact form while retaining the semantics associated with each of the link relation types used.

    While the sameAs.org service is a convenient mechanism for discovering alternate identifiers for a given resource, it is ad hoc. A client application would need to be configured to query an arbitrary number of known service end-points in order to discover links published across all the domains deemed of interest. As such, this kind of approach is only likely to be useful where the user community can be told which services to refer to.
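    As noted above, sets of links can also be published as tabular data. A minimal Python sketch, assuming a hypothetical three-column layout (source, relation, target) with full URIs in every cell, so that each row keeps its meaning when published apart from the datasets it connects. The single link is one of those returned by sameAs.org above. Mapping the columns to RDF would additionally need a tabular metadata file, which is not shown here.

```python
import csv
import io

# One link per row, with full URIs so each row keeps its meaning even when
# published separately from the datasets it connects.
LINKS = [
    ("http://rdf.freebase.com/ns/m.02s5hd",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Anne_Frank_House"),
]


def links_to_csv(links):
    """Serialise (source, relation, target) tuples as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    writer.writerows(links)
    return buf.getvalue()


def csv_to_links(text):
    """Parse CSV text produced by links_to_csv back into tuples."""
    return [(row["source"], row["relation"], row["target"])
            for row in csv.DictReader(io.StringIO(text))]
```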

  3. Publish metadata about collections of links in a standard way.

    A data publisher may consider publishing summary information about their datasets and the links defined in them using the Vocabulary of Interlinked Datasets [[VoID]].

    In [[VoID]], Linksets provide a summary description of the relationships between two datasets: they identify the source and target datasets and the link relation type(s) used, plus optional metadata such as URI templates for identifying participating resources and the number of links specified for each relation type; they may also describe technical features such as APIs through which the participating datasets can be accessed.

    [[VoID]] dataset and linkset descriptions may be published at /.well-known/void on each domain (using the .well-known URI pattern defined in [[RFC5785]]), making them easy to find. Applications could be configured to harvest [[VoID]] descriptions from the Internet domains of data publishers in their community, enabling them to build a searchable graph of relationships between datasets: a Data Network. Because this information is summarized at the set level, the data network graph is convenient to work with, allowing simple discovery of the numbers and relation types of both inbound and outbound links within a given dataset. Once the presence of interesting links within a dataset has been identified, the user would then work directly with the dataset in question (or the API through which it is accessed) to acquire detailed information about the specific links defined in that dataset.

    Interactions such as those described above would be quite intensive for a human using a browser. However, the [[VoID]] descriptions could be used to drive a software agent that hides much of the complexity from the user; for example, automatically harvesting individual links once a set-level relationship is considered interesting, and then allowing the user to traverse those links either forwards or backwards.
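    A minimal Python sketch of the two pieces a publisher or harvester would need: computing the /.well-known/void location for a domain, and writing a VoID Linkset description in Turtle using the VoID terms void:Linkset, void:subjectsTarget, void:objectsTarget, void:linkPredicate and void:triples. The example.org dataset URIs and the link count used with it are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

VOID = "http://rdfs.org/ns/void#"


def void_location(dataset_uri):
    """Return the /.well-known/void URL for the domain hosting dataset_uri."""
    parts = urlsplit(dataset_uri)
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


def linkset_turtle(linkset, subjects, objects, predicate, triples):
    """Describe one VoID Linkset (source, target, relation, count) in Turtle."""
    return "\n".join([
        f"@prefix void: <{VOID}> .",
        "",
        f"<{linkset}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects}> ;",
        f"    void:objectsTarget <{objects}> ;",
        f"    void:linkPredicate <{predicate}> ;",
        f"    void:triples {int(triples)} .",
    ])
```

    A harvester would fetch the document at void_location(...) for each publisher in its community and merge the Linkset descriptions it finds into its data network graph.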

    The use of [[VoID]] needs further discussion; is there enough evidence of its adoption in the wild? Also, the Triple Pattern Fragments server needs a bit more investigation to determine exactly how much the [[VoID]] descriptions are used.

How to Test

...

Evidence

Relevant requirements: R-Linkability, R-MachineToMachine, R-SpatialRelationships, R-SpatialOperators, R-Discoverability.

Benefits

  • ...

(to be deleted)

Why

Intended Outcome

Possible Approach to Implementation

How to Test

Evidence

Benefits

Other best practices

This section is a placeholder for best practices that were in the FPWD but have not yet been placed in the new doc structure. They may be removed, merged, or moved.

Removed - merged with BP10.

When someone looks up a URI for a SpatialThing, provide useful information, using the common representation formats

The best practices described in this document will incorporate practice from both Observations and Measurements [[OandM]] and W3C Semantic Sensor Network Ontology [[VOCAB-SSN]].

See also the W3C Generic Sensor API and the OGC SensorThings API. These are more about interacting with sensor devices.

State how coordinate values are encoded

Provide enough information for users to determine how coordinate values are encoded.

Why

The geometry of spatial things is described using coordinates; for example, latitude and longitude. Because coordinates describe a position relative to a datum (e.g. zero latitude is the equator and zero longitude is the prime meridian, often the Greenwich Meridian), it is important to understand the datum, the units used for the coordinates and the order in which the coordinate axes are defined; together, these make up the coordinate reference system (CRS). Spatial data is published in a wide variety of CRS. This variety can create confusion and inconsistencies in using and interpreting spatial data. Unless the CRS is known, errors are likely to be introduced when determining the position of a spatial thing on the Earth, and comparing or combining spatial data from different sources becomes extremely problematic.

Intended Outcome

Sufficient information is provided to enable coordinates to be related to the correct position, thereby enabling spatial data to be correctly interpreted by humans and software agents.

Spatial data from different sources can be combined without introducing unwarranted positional errors.

Possible Approach to Implementation

A user of spatial data will need to know:

  1. which coordinate value relates to which axis;
  2. what units are used for each coordinate; and
  3. what datum is used.

There is a predominant view that "I just need to use Lat and Long - and I'm done".

Although the vast majority of spatial data published on the Web uses WGS 84 Long/Lat (as used by GPS), we strongly recommend that spatial data is published with all the necessary information to interpret coordinate values. Even where the use of latitude and longitude angular measurements is obvious, the choice of datum and units of measurement have an impact. In particular, angular measurements appearing as floating point numbers are most likely to be in decimal degrees, but may be in radians or gons (also known as grads).

The problem is that assuming this "predominant view" leads to ambiguity. For example, many spatial data users work entirely with information provided in their national coordinate reference system (such as the Dutch Amersfoort / RD EPSG:28992 or OSGB 1936 / British National Grid EPSG:27700), for whom coordinates in WGS 84 Long/Lat (especially negative values) are utterly perplexing.

In practice, a publisher who does not document their CRS and presumes that latitude and longitude can be treated as Cartesian coordinates is often bailed out by fuzzy use cases and by software that takes care of projections. However, CRS and coordinate axis order ambiguity leads sooner or later to serious and avoidable errors, while ignorance of datums and map projections leads to broken applications. Furthermore, these practices will become less and less tenable as new applications such as Augmented Reality require higher data precision and accuracy.
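To illustrate why the unit must be stated, here is a minimal Python sketch that converts an angular value to decimal degrees only when the unit is given explicitly. The unit codes "deg", "rad" and "gon" are assumptions of this sketch, not terms from any vocabulary. Note how the latitude 52.37514, if mistakenly read as gons, becomes about 47.14 degrees: a shift of several hundred kilometres.

```python
import math


def to_decimal_degrees(value, unit):
    """Convert an angular coordinate to decimal degrees.

    `unit` must come from the publisher's documentation: "deg", "rad" or
    "gon" (400 gon make a full circle, so 1 gon = 0.9 degree). The function
    refuses to guess, because a bare float does not say which unit it is in.
    """
    if unit == "deg":
        return float(value)
    if unit == "rad":
        return math.degrees(value)
    if unit == "gon":
        return value * 0.9
    raise ValueError(f"unknown or missing angular unit: {unit!r}")
```

Raising an error for a missing unit, rather than silently assuming degrees, is a deliberate design choice: it pushes the question back to the publisher's metadata, which is where this best practice says the answer belongs.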

There are four common ways that this information can be provided:

  1. Provide each coordinate value with explicit labels and provide metadata to indicate what each label means.

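    # The empty prefix uses a hypothetical example namespace.
    @prefix :        <http://example.org/poi/> .
    @prefix w3cgeo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
    
    :myPointOfInterest a w3cgeo:SpatialThing ;
        dcterms:description "Anne Frank's House, Amsterdam." ;
        w3cgeo:lat "52.37514"^^xsd:float ;
        w3cgeo:long "4.88412"^^xsd:float .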
                  

    The labels (or terms) w3cgeo:lat and w3cgeo:long are provided by the [[W3C-BASIC-GEO]] vocabulary which states that it is:

    A vocabulary for representing latitude, longitude and altitude information in the WGS84 geodetic reference datum.

    The terms themselves (plus w3cgeo:alt) are defined with all the necessary information as follows:

    • lat: The WGS84 latitude of a Spatial Thing (decimal degrees).
    • long: The WGS84 longitude of a Spatial Thing (decimal degrees).
    • alt: The WGS84 altitude of a Spatial Thing (decimal meters above the local reference ellipsoid).
    <script type="application/ld+json">
    {
      "@context" : {
        "@vocab" : "http://schema.org/"
      },
      "@id" : "#myPointOfInterest",
      "@type" : "Place",
      "geo" : {
        "@type": "GeoCoordinates",
        "latitude": "52.37514",
        "longitude": "4.88412"
      }
    }
    </script>
                  

    In the example above, the labels latitude and longitude are defined in [[SCHEMA-ORG]], as indicated by the JSON-LD key @vocab. The associated definitions in [[SCHEMA-ORG]] are:

    • latitude: The latitude of a location. For example 37.42242 (WGS 84).
    • longitude: The longitude of a location. For example -122.08585 (WGS 84).

    The definitions provided in [[SCHEMA-ORG]] do not indicate the unit of measure. However, we have included this example because [[SCHEMA-ORG]] is very commonly used. The unit of measure used for latitude and longitude is decimal degrees; decimal meters are used for the remaining coordinate position property, elevation.

    The metadata for axis labels may also be provided in the documentation for an API from which the spatial data is accessed. For more information on documenting APIs, please refer to [[DWBP]] Best Practice 25: Provide complete documentation for your API.

  2. Use a data format that specifies axes, their order, datum and unit of measurement for coordinates.

    HTTP/1.1 200 OK
    Date: Sun, 05 Mar 2017 17:12:35 GMT
    Content-length: 543
    Connection: close
    Content-type: application/geo+json
    
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [ [4.884235, 52.375108], [4.884276, 52.375153], 
            [4.884257, 52.375159], [4.883981, 52.375254], 
            [4.883850, 52.375109], [4.883819, 52.375075], 
            [4.884104, 52.374979], [4.884143, 52.374965], 
            [4.884207, 52.375035], [4.884263, 52.375016], 
            [4.884320, 52.374996], [4.884255, 52.374926], 
            [4.884329, 52.374901], [4.884451, 52.375034], 
            [4.884235, 52.375108] ]
          ]
      },
      "properties": {
        "name": "Anne Frank's House"
      }
    }
                  

    The media type application/geo+json is used to designate that content is provided in GeoJSON format, as specified in [[RFC7946]].

    [[RFC7946]] Section 4. Coordinate Reference System provides all the necessary information to interpret the coordinates, stating that:

    The coordinate reference system for all GeoJSON coordinates is a geographic coordinate reference system, using the World Geodetic System 1984 (WGS 84) [WGS84] datum, with longitude and latitude units of decimal degrees. This is equivalent to the coordinate reference system identified by the Open Geospatial Consortium (OGC) URN urn:ogc:def:crs:OGC::CRS84. An OPTIONAL third-position element SHALL be the height in meters above or below the WGS 84 reference ellipsoid. In the absence of elevation values, applications sensitive to height or depth SHOULD interpret positions as being at local ground or sea level.
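    Because [[RFC7946]] fixes both the CRS and the axis order, a consumer can sanity-check GeoJSON without any further metadata. A minimal Python sketch, assuming the input is a single Feature with a Polygon geometry; the checks (ring closure, at least four positions per ring, coordinate ranges) follow the RFC, but the function name and messages are illustrative.

```python
import json


def check_polygon_feature(text):
    """Return a list of problems found in a GeoJSON Polygon Feature.

    RFC 7946 fixes positions as [longitude, latitude(, height)] in WGS 84
    decimal degrees, so ranges can be checked without any extra metadata.
    """
    geometry = json.loads(text)["geometry"]
    if geometry["type"] != "Polygon":
        return [f"expected Polygon, got {geometry['type']}"]
    problems = []
    for i, ring in enumerate(geometry["coordinates"]):
        if len(ring) < 4:
            problems.append(f"ring {i}: fewer than 4 positions")
            continue
        if ring[0] != ring[-1]:
            problems.append(f"ring {i}: first and last positions differ")
        for lon, lat, *_ in ring:  # any third value is height, ignored here
            if not -180 <= lon <= 180:
                problems.append(f"ring {i}: longitude {lon} out of range")
            if not -90 <= lat <= 90:
                problems.append(f"ring {i}: latitude {lat} out of range")
    return problems
```

    Range checks alone are not enough, though: for Anne Frank's House, a polygon with swapped axes still passes, because 4.88 and 52.37 are both valid latitudes and valid longitudes. Only the axis order declared by the format tells a consumer which value is which.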

    <script type="application/ld+json">
    {
      "@context" : {
        "@vocab" : "http://schema.org/"
      },
      "@id" : "#myPlaceOfInterest",
      "@type" : "Place",
      "name" : "Anne Frank's House",
      "geo" : {
        "@type": "GeoShape",
        "polygon": "52.375108,4.884235 52.375153,4.884276 52.375159,4.884257 52.375254,4.883981 52.375109,4.883850 52.375075,4.883819 52.374979,4.884104 52.374965,4.884143 52.375035,4.884207 52.375016,4.884263 52.374996,4.884320 52.374926,4.884255 52.374901,4.884329 52.375034,4.884451 52.375108,4.884235"
      }
    }
    </script>
                  

    The [[SCHEMA-ORG]] definition of GeoShape states:

    The geographic shape of a place. A GeoShape can be described using several properties whose values are based on latitude/longitude pairs. Either whitespace or commas can be used to separate latitude and longitude; whitespace should be used when writing a list of several such points.

    These two examples show why coordinate axis order matters: GeoJSON [[RFC7946]] uses Long/Lat, while [[SCHEMA-ORG]] uses Lat/Long. Getting the axis order wrong puts Anne Frank's House somewhere off the coast of Somalia rather than in the Netherlands!
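    A minimal Python sketch of converting between the two conventions, turning a [[SCHEMA-ORG]] GeoShape polygon string into a GeoJSON Polygon geometry. It assumes the layout used in the example above (a comma between latitude and longitude, whitespace between points); the function name is hypothetical.

```python
def geoshape_polygon_to_geojson(polygon):
    """Convert a schema.org GeoShape polygon string to a GeoJSON Polygon.

    Input points are "lat,long" pairs separated by whitespace (schema.org
    order); output positions are [long, lat] as required by RFC 7946.
    """
    ring = []
    for point in polygon.split():
        lat, lon = (float(v) for v in point.split(","))
        ring.append([lon, lat])  # swap: schema.org Lat/Long -> GeoJSON Long/Lat
    return {"type": "Polygon", "coordinates": [ring]}
```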

  3. State within the data itself which coordinate reference system is used.

    <gml:Polygon srsDimension="2" axisLabels="east north" 
                 srsName="http://www.opengis.net/def/crs/EPSG/0/28992">
      <gml:exterior>
        <gml:LinearRing>
          <gml:posList>
            120749.725 487589.422  120752.55  487594.375  120751.227 487595.129
            120732.539 487605.788  120723.505 487589.745  120721.387 487585.939
            120740.668 487575.07   120743.316 487573.589  120747.735 487581.337
            120751.564 487579.154  120755.411 487576.96   120750.935 487569.172
            120755.941 487566.288  120764.369 487581.066  120749.725 487589.422
          </gml:posList>
        </gml:LinearRing>
      </gml:exterior>
    </gml:Polygon>
                  

    The example above encodes the polygon for Anne Frank's House in [[GML]]. The XML attribute srsName (srs meaning "spatial reference system") refers to the Amersfoort / RD CRS (EPSG:28992) used in the Netherlands. Also note that additional useful information (srsDimension and axisLabels) is provided within the document for easy reference.
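    Because the CRS and axis labels travel with the geometry, a consumer can read them directly. A minimal Python sketch using xml.etree.ElementTree, assuming the GML 3.2 namespace (the fragment above omits the namespace declaration) and defaulting to two dimensions when srsDimension is absent; read_gml_polygon is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: GML 3.2 namespace; the fragment above omits the declaration.
NS = {"gml": "http://www.opengis.net/gml/3.2"}


def read_gml_polygon(xml_text):
    """Return (srsName, axis labels, exterior ring) from a gml:Polygon.

    The ring is a list of tuples in the axis order given by axisLabels.
    """
    polygon = ET.fromstring(xml_text)
    srs_name = polygon.get("srsName")
    # If srsDimension is absent it should come from the CRS; 2 is assumed here.
    dim = int(polygon.get("srsDimension", "2"))
    axes = polygon.get("axisLabels", "").split()
    pos_list = polygon.find("gml:exterior/gml:LinearRing/gml:posList", NS)
    values = [float(v) for v in pos_list.text.split()]
    ring = [tuple(values[i:i + dim]) for i in range(0, len(values), dim)]
    return srs_name, axes, ring
```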

    {
      "@context": {
        "geosparql" : "http://www.opengis.net/ont/geosparql#" ,
        "rdfs" : "http://www.w3.org/2000/01/rdf-schema#" ,
        "asWKT" : {
          "@id" : "http://www.opengis.net/ont/geosparql#asWKT" ,
          "@type" : "geosparql:wktLiteral"
        }
      } ,
      "@id" : "http://example.org/register/id/building/0363100012169587" ,
      "@type" : "http://www.opengis.net/ont/geosparql#Feature" ,
      "rdfs:label" : "Building 0363100012169587" ,
      "geosparql:hasGeometry": {
        "asWKT" : "<http://www.opengis.net/def/crs/EPSG/0/4326> POLYGON ((52.375108 4.884235, 52.375153 4.884276, 52.375159 4.884257, 52.375254 4.883981, 52.375109 4.883850, 52.375075 4.883819, 52.374979 4.884104, 52.374965 4.884143, 52.375035 4.884207, 52.375016 4.884263, 52.374996 4.884320, 52.374926 4.884255, 52.374901 4.884329, 52.375034 4.884451, 52.375108 4.884235))"
      }
    }
                  

    The "Well Known Text" (WKT) encoding, itself defined in [[SIMPLE-FEATURES]], is extended by [[GeoSPARQL]] to include designation of the coordinate reference system used. The example above encodes the polygon as a [[GeoSPARQL]] wktLiteral data type, designating the coordinate reference system as <http://www.opengis.net/def/crs/EPSG/0/4326> (EPSG:4326) - WGS 84 Lat/Long.

    When using the wktLiteral datatype specified in [[GeoSPARQL]], the coordinate reference system URI may be omitted. In such a case, WGS 84 Long/Lat (urn:ogc:def:crs:OGC::CRS84) is used. Please refer to [[GeoSPARQL]] Requirement 11 for more details.

    The Basisregistraties Adressen en Gebouwen (BAG, the Dutch "Basic Registers for Addresses and Buildings"), provided by Kadaster, uses this default behaviour. Anne Frank's House is identified using the URI http://bag.basisregistraties.overheid.nl/bag/id/pand/0363100012169587; HTML, JSON, TTL and XML representations are available.
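    A minimal Python sketch of how a consumer might apply this rule: split an optional CRS IRI off a wktLiteral, fall back to CRS84 when it is missing, and label the first position of a polygon with the correct axes. The axis-order table covers only the two CRSs used in this section and is hard-coded for illustration; a real client would derive axis order from the CRS definition.

```python
import re

# GeoSPARQL's default CRS for wktLiterals without a CRS IRI (WGS 84 long/lat).
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Axis order for the two CRSs used in this section only (illustrative).
AXIS_ORDER = {
    CRS84: ("long", "lat"),
    "http://www.opengis.net/def/crs/EPSG/0/4326": ("lat", "long"),
}

_CRS_PREFIX = re.compile(r"\s*<([^>]+)>\s*(.*)", re.DOTALL)
_FIRST_POS = re.compile(r"\(\(\s*(-?[\d.]+)\s+(-?[\d.]+)")


def split_wkt_literal(literal):
    """Split a wktLiteral into (CRS IRI, WKT), applying the CRS84 default."""
    match = _CRS_PREFIX.match(literal)
    if match:
        return match.group(1), match.group(2).strip()
    return CRS84, literal.strip()


def first_position(literal):
    """Label the first polygon position with its axes, e.g. {'lat': .., 'long': ..}."""
    crs, wkt = split_wkt_literal(literal)
    order = AXIS_ORDER.get(crs)
    if order is None:
        raise ValueError(f"axis order not known for {crs}")
    pos = _FIRST_POS.search(wkt)
    if pos is None:
        raise ValueError("no polygon position found")
    return dict(zip(order, (float(pos.group(1)), float(pos.group(2)))))
```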

  4. Describe the coordinate reference system in the dataset metadata.

    @prefix ex:      <http://data.example.org/datasets/> .
    @prefix dcat:    <http://www.w3.org/ns/dcat#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
    
    ex:ExampleDataset 
      a dcat:Dataset ;
      dcterms:conformsTo <http://www.opengis.net/def/crs/EPSG/0/32630> .
    
    <http://www.opengis.net/def/crs/EPSG/0/32630> 
      a dcterms:Standard, skos:Concept ;
      dcterms:type <http://inspire.ec.europa.eu/glossary/SpatialReferenceSystem> ;
      dcterms:identifier "http://www.opengis.net/def/crs/EPSG/0/32630"^^xsd:anyURI ;
      skos:prefLabel "WGS 84 / UTM zone 30N"@en ;
      skos:inScheme <http://www.opengis.net/def/crs/EPSG/0/> .
                  

    The example above illustrates how to describe the coordinate reference system used for a dataset within [[GeoDCAT-AP]] metadata. The conformsTo property from [[DCTERMS]] is used to assert the relationship between dataset and CRS in the same way that conformance with a standard is expressed in [[VOCAB-DQV]].

    For more information about dataset metadata, please refer to Best Practice 1: Include spatial metadata in dataset metadata.
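    A client reading such metadata typically needs the authority and code (e.g. EPSG 32630) to configure a projection library. A minimal Python sketch, assuming the http://www.opengis.net/def/crs/{authority}/{version}/{code} URI pattern used in the examples in this section; parse_ogc_crs_uri is a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_ogc_crs_uri(uri):
    """Split http://www.opengis.net/def/crs/{authority}/{version}/{code}.

    Returns (authority, version, code) as strings, e.g. ("EPSG", "0", "32630").
    """
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    if (parts.netloc != "www.opengis.net" or segments[:2] != ["def", "crs"]
            or len(segments) != 5):
        raise ValueError(f"not an OGC CRS URI: {uri}")
    _, _, authority, version, code = segments
    return authority, version, code
```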

    GID,On Street,Long,Lat,Species,Trim Cycle,Diameter at Breast Ht,Inventory Date,Comments,Protected
    1,ADDISON AV,-122.156485,37.440963,Celtis australis,Large Tree Routine Prune,11,10/18/2010,,
    2,EMERSON ST,-122.156749,37.440958,Liquidambar styraciflua,Large Tree Routine Prune,11,6/2/2010,,
    6,ADDISON AV,-122.156299,37.441151,Robinia pseudoacacia,Large Tree Routine Prune,29,6/1/2010,cavity or decay; trunk decay; codominant leaders; included bark; large leader or limb decay; previous failure root damage; root decay;  beware of BEES,YES
                  

    In this example (adapted from the City of Palo Alto tree operations database and published as tabular data and as an interactive map) the coordinate position of each tree is specified using separate columns (Long and Lat) as recommended in approach (1) above.

    As shown in the abridged tabular metadata document, the columns Long and Lat are mapped onto the definitions provided by [[W3C-BASIC-GEO]] to ensure that the meaning of the data values in those columns is clear:

    {
      "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
      "@id": "http://example.org/tree-ops-db",
      "url": "tree-ops-db.csv",
      "dc:title": "Tree Operations",
      ...
      "tableSchema": {
        "columns": [{
          "name": "GID",
          "titles": [
            "GID",
            "Generic Identifier"
          ],
          "dc:description": "An identifier for the operation on a tree.",
          "datatype": "string",
          "required": true, 
          "suppressOutput": true
        }, {
          "name": "on_street",
          "titles": "On Street",
          "dc:description": "The street that the tree is on.",
          "datatype": "string"
        }, {
          "name": "Long",
          "titles": "Longitude",
          "dc:description": "The WGS84 longitude of the tree (decimal degrees).",
          "propertyUrl": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
          "datatype": {
            "base": "number",
            "minimum": "-180",
            "maximum": "180"
          }
        }, {
          "name": "Lat",
          "titles": "Latitude",
          "propertyUrl": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
          "dc:description": "The WGS84 latitude of the tree (decimal degrees).",
          "datatype": {
            "base": "number",
            "minimum": "-90",
            "maximum": "90"
          }
        },
        ...
        "primaryKey": "GID",
        "aboutUrl": "http://example.org/tree-ops-ext#gid-{GID}"
      }
    }
                  

    Please refer to [[TABULAR-DATA-PRIMER]] section 6.2 How do you support geospatial data? for more details on working with geospatial content in tabular data.
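    With the ranges from the tabular metadata in hand, a consumer can flag coordinates that cannot be valid. A minimal Python sketch using the csv module; the first data row is taken from the tree data above, while row 99 is a made-up record with Long and Lat swapped, which the latitude range check catches because -122.15 is not a valid latitude.

```python
import csv
import io

# Valid ranges, as declared for the Long and Lat columns in the metadata above.
COLUMN_RANGES = {"Long": (-180.0, 180.0), "Lat": (-90.0, 90.0)}

SAMPLE = """\
GID,On Street,Long,Lat,Species,Trim Cycle,Diameter at Breast Ht,Inventory Date,Comments,Protected
1,ADDISON AV,-122.156485,37.440963,Celtis australis,Large Tree Routine Prune,11,10/18/2010,,
99,EMERSON ST,37.440958,-122.156749,Liquidambar styraciflua,Large Tree Routine Prune,11,6/2/2010,,
"""  # row 99 is hypothetical: Long and Lat have been swapped


def out_of_range(csv_text, ranges=COLUMN_RANGES):
    """Return (GID, column, value) for every coordinate outside its range."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, (low, high) in ranges.items():
            value = float(row[column])
            if not low <= value <= high:
                problems.append((row["GID"], column, value))
    return problems
```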

How to Test

For a given spatial data publication, users can find information about the coordinate axes, their order and unit of measurement, plus the datum used.

Evidence

Relevant requirements: R-DeterminableCRS, R-CRSDefinition, R-GeoreferencedData, R-LinkingCRS.

Conclusions

Applicability of common formats to implementation of best practices

The Spatial Data on the Web working group is working on recommendations about the use of formats for publishing spatial data on the web, specifically about selecting the most appropriate format. There may not be one most appropriate format: which format is best may depend on many things. This section gives two tables that both aim to be helpful in selecting the right format in a given situation. These tables may in future be merged or reworked in other ways.

The first table is a matrix of the common formats, showing in general terms how well these formats help achieve goals such as discoverability, granularity etc.

An attempt at a matrix of the common formats (GeoJSON, [[GML]], RDF, JSON-LD) and what you can or can't achieve with them. (source: @eparsons)
| Format | Openness | Binary/text | Usage | Discoverability | Granular links | CRS Support | Verbosity | Semantics vocab? | Streamable | 3D Support |
| ESRI Shape | Open'ish | Binary | Geometry only; attributes and metadata in linked DB files | Poor | In Theory? | Yes | Lightweight | No | No | Yes |
| GeoJSON [[RFC7946]] | Open | Text | Geometry and attributes inline array | Good ? | In Theory? | No | Lightweight | No | No | No |
| DXF | Proprietary | Binary | Geometry only; attributes and metadata in linked DB files | Poor | Poor | No | Lightweight | No | No | Yes |
| [[GML]] | Open | Text | Geometry and attributes inline or xlinked | Good ? | In Theory ? | Yes | Verbose | No | No | Yes |
| KML | Open | Text | Geometry and attributes inline or xlinked | Good ? | In Theory ? | No | Lightweight | No | Yes? | Yes |

The second table is much more detailed, listing the formats currently in common use for spatial data and scoring each format on many detailed aspects.

An attempt at a matrix of the formats for spatial data in current use and detailed aspects. (source: @portele)
Aspect | GML | GML-SF0 | JSON-LD | GeoSPARQL (vocabulary) | schema.org | GeoJSON | KML | GeoPackage | Shapefile | GeoServices / Esri JSON | Mapbox Vector Tiles
Governing body | OGC, ISO | OGC | W3C | OGC | Google, Microsoft, Yahoo, Yandex | IETF [[RFC7946]] (originally an informal group of authors) | OGC | OGC | Esri | Esri | Mapbox
Based on | XML | GML | JSON | RDF | HTML with RDFa, Microdata, JSON-LD | JSON | XML | SQLite, SF SQL | dBASE | JSON | Google Protocol Buffers
Requires authoring of a vocabulary/schema for my data (or use of existing ones) | Yes (using XML Schema) | Yes (using XML Schema) | Yes (using @context) | Yes (using RDF Schema) | No, schema.org specifies a vocabulary that should be used | No | No | Implicitly (SQLite tables) | Implicitly (dBASE table) | No | No
Supports reuse of third-party vocabularies for features and properties | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No
Supports extensions (geometry types, metadata, etc.) | Yes | No | Yes | Yes | Yes | No (under discussion in IETF) | Yes (rarely used except by Google) | Yes | No | No | No
Supports non-simple property values | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No
Supports multiple values per property | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No
Supports multiple geometries per feature | Yes | Yes | n/a | Yes | Yes (but probably not in practice?) | No | Yes | No | No | No | No
Support for coordinate reference systems | any | any | n/a | many | WGS84 latitude, longitude | WGS84 longitude, latitude with optional elevation | WGS84 longitude, latitude with optional elevation | many | many | many | WGS84 spherical Mercator projection
Support for non-linear interpolations in curves | Yes | Only arcs | n/a | Yes (using GML) | No | No | No | Yes, in an extension | No | No | No
Support for non-planar interpolations in surfaces | Yes | No | n/a | Yes (using GML) | No | No | No | No | No | No | No
Support for solids (3D) | Yes | Yes | n/a | Yes (using GML) | No | No | No | No | No | No | No
Feature in a feature collection document has URI (required for ★★★★) | Yes, via XML ID | Yes, via XML ID | Yes, via @id keyword | Yes | Yes, via HTML ID | No | Yes, via XML ID | No | No | No | No
Support for hyperlinks (required for ★★★★★) | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No
Media type | application/gml+xml | application/gml+xml with profile parameter | application/ld+json | application/rdf+xml, application/ld+json, etc. | text/html | application/geo+json (formerly application/vnd.geo+json) | application/vnd.google-earth.kml+xml, application/vnd.google-earth.kmz | - | - | - | -
Remarks | comprehensive and supports many use cases, but requires strong XML skills | simplified profile of GML | no support for spatial data; a GeoJSON-LD is under discussion | GeoSPARQL also specifies related extension functions for SPARQL; other spatial vocabularies exist, see ??? | schema.org markup is indexed by major search engines | supported by many mapping APIs | focused on visualization of and interaction with spatial data, typically in Earth browsers like Google Earth | used to support "native" access to spatial data across all enterprise and personal computing environments, including mobile devices | supported by almost all GIS | mainly used via the GeoServices REST API | used for sharing spatial data in tiles, mainly for display in maps
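The coordinate reference system row above hides a frequent source of errors: schema.org GeoCoordinates carries latitude and longitude as separate properties, whereas a GeoJSON [[RFC7946]] position is an array ordered longitude, latitude, then optional elevation. The following is a minimal sketch, assuming the input is a schema.org-style dictionary; the helper name `schema_org_to_geojson` and the example URI are hypothetical. The `id` is stored as a plain string, since GeoJSON does not give it URI semantics.

```python
import json


def schema_org_to_geojson(geo: dict, feature_id: str, name: str) -> dict:
    """Convert a schema.org-style GeoCoordinates dict into a GeoJSON Feature.

    schema.org keeps latitude and longitude as separate properties, while a
    GeoJSON (RFC 7946) position is an array ordered [longitude, latitude, elevation?].
    """
    position = [float(geo["longitude"]), float(geo["latitude"])]
    if "elevation" in geo:  # schema.org `elevation`, if present, becomes the third value
        position.append(float(geo["elevation"]))
    return {
        "type": "Feature",
        "id": feature_id,  # plain string; GeoJSON does not treat it as a URI
        "geometry": {"type": "Point", "coordinates": position},
        "properties": {"name": name},
    }


coords = {"@type": "GeoCoordinates", "latitude": 52.3676, "longitude": 4.9041}
feature = schema_org_to_geojson(coords, "https://example.org/id/amsterdam", "Amsterdam")
print(json.dumps(feature["geometry"]))  # {"type": "Point", "coordinates": [4.9041, 52.3676]}
```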

Authoritative sources of geographic identifiers

Content from Appendix B has been integrated into Best Practice 14: Publish links between spatial things and related resources. This stub appendix has been left so that sequential ordering of the appendices does not change in this draft.

Cross reference of use case requirements against best practices

Cross reference of requirements against best practices
UC Requirements | Best Practice

Glossary

Glossary section needs improving; see existing sources of definitions.

Consider adopting definitions from, or aligning definitions with the ISO/TC 211 Glossary - see linked data prototype.

For example, the Coverage definition is considered unclear and potentially inconsistent with the ISO definition of Coverage.

Need consistency in how we cite existing specifications.

Why do some references go to the glossary (e.g. WFS) and some to the references (e.g. SPARQL)? Maybe WFS etc. should be added to the references, too, and the text should include a link both to the glossary and the references?

Coverage: A coverage is a function that describes characteristics of real-world phenomena that vary over space and/or time. Typical examples are temperature, elevation and precipitation. A coverage is typically represented as a data structure containing a set of such values, each associated with one of the elements in a spatial, temporal or spatiotemporal domain. Typical spatial domains are point sets (e.g. sensor locations), curve sets (e.g. contour lines), grids (e.g. orthoimages, elevation models), etc. A property whose value varies as a function of time may be represented as a temporal coverage or time-series [[ISO-19109]] §8.8.
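To make the "coverage as a function" idea concrete, here is a minimal sketch of a grid coverage: the domain is a regular grid of square cells and the range is one value per cell. The class name `GridCoverage`, the choice of nearest-cell lookup (no interpolation) and the convention that row 0 lies at the grid origin with rows increasing in y are all illustrative assumptions; real raster formats often place row 0 at the top.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GridCoverage:
    """A minimal 2D grid coverage: a function from positions to values.

    The domain is a regular grid anchored at (origin_x, origin_y) with square
    cells of size cell_size; values[row][col] holds one value per cell, with
    row 0 at origin_y and rows increasing with y (an assumption of this sketch).
    """
    origin_x: float
    origin_y: float
    cell_size: float
    values: List[List[float]]

    def evaluate(self, x: float, y: float) -> float:
        """Return the value of the cell containing (x, y), without interpolation."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((y - self.origin_y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("position outside the coverage domain")
        return self.values[row][col]


elevation = GridCoverage(0.0, 0.0, 10.0, [[1.0, 2.0], [3.0, 4.0]])
print(elevation.evaluate(15.0, 5.0))  # 2.0
```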

Coordinate Reference System (CRS): A coordinate-based local, regional or global system used to locate geographical entities. Also known as Spatial Reference System. Compare with the ISO definition.

Dimension (geometry): [re-phrased from Wikipedia: https://en.wikipedia.org/wiki/Dimension] In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus a point has no dimension (0D), whereas a line has a dimension of one (1D) because only one coordinate is needed to specify a point on it – for example, the point at 5 on a number line. A surface such as a plane or the surface of a cylinder or sphere has a dimension of two (2D) because two coordinates are needed to specify a point on it – for example, both a latitude and longitude are required to locate a point on the surface of a sphere. The inside of a cube, a cylinder or a sphere is three-dimensional (3D) because three coordinates are needed to locate a point within these spaces. Compare with the ISO definition. [[ISO-19107]]

Ellipsoid: An ellipsoid is a closed quadric surface that is a three-dimensional analogue of an ellipse. In geodesy a reference ellipsoid is a mathematically defined surface that approximates the geoid.
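A reference ellipsoid is what allows latitude, longitude and height to be converted into three-dimensional Cartesian coordinates. The sketch below assumes the WGS 84 defining parameters (semi-major axis 6378137 m, inverse flattening 298.257223563) and applies the standard geodetic-to-Earth-centred (ECEF) conversion. The function name is hypothetical, and `h` is height above the ellipsoid, not above the geoid or mean sea level.

```python
import math

# WGS 84 reference ellipsoid defining parameters
WGS84_A = 6378137.0                 # semi-major axis (metres)
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h: float = 0.0):
    """Convert latitude/longitude (degrees) and ellipsoidal height (metres)
    into Earth-centred, Earth-fixed X, Y, Z coordinates (metres)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # prime vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h) * math.sin(lat)
    return x, y, z


print(geodetic_to_ecef(0.0, 0.0))  # (6378137.0, 0.0, 0.0)
```

At a pole the Z value equals the semi-minor axis a(1 - f), which is a quick sanity check on the formula.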

Extent: The area covered by something. Within this document we always mean spatial extent, i.e. size or shape that may be expressed using coordinates.

Feature: Abstraction of real world phenomena. Compare with the ISO definition. [[ISO-19101]] §4.11

Geocoding: Forward geocoding, often just referred to as geocoding, is the process of converting addresses into geographic coordinates. Reverse geocoding is the opposite process: converting geographic coordinates into addresses. Compare with the ISO definition.

Geohash: A geocoding system with a hierarchical spatial data structure which subdivides space into buckets. Geohashes offer properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision). As a consequence of the gradual precision degradation, nearby places will often (but not always) present similar prefixes. The longer a shared prefix is, the closer the two places are. (source: Wikipedia)
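The prefix property follows from how a geohash is built: each bit halves the remaining longitude or latitude interval (alternating, starting with longitude), and every five bits are written as one base-32 character. A minimal encoder sketch is shown below; the function name is illustrative, and points lying exactly on a bisection line are assigned to the upper half.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    even = True  # even-numbered bits refine longitude, odd-numbered bits latitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Truncating a longer geohash of the same point yields "ezs42" again, which is the gradual precision loss described above.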

Geographic information (also geospatial data): Information concerning phenomena implicitly or explicitly associated with a location relative to the Earth. Compare with the ISO definition. [[ISO-19101]] §4.16.

Geographic information system (GIS): An information system dealing with information concerning phenomena associated with location relative to the Earth. Compare with the ISO definition. [[ISO-19101]] §4.18

Geometry: An ordered set of n-dimensional points that can be used to model the spatial extent or shape of a spatial thing.

Geoid: An equipotential surface of the Earth's gravity field, i.e. a surface on which the gravity potential (the combined effect of gravitation and the Earth's rotation) has the same value at all locations. This surface is perpendicular to a plumb line at every point and is roughly equivalent to mean sea level, excluding the effects of winds and permanent currents such as the Gulf Stream.

Hypermedia: to be added

Internet of Things (IoT): The network of physical objects or "things" embedded with electronics, software, sensors, and network connectivity, which enables these objects to be controlled remotely and to collect and exchange data.

JavaScript Object Notation (JSON): A lightweight, text-based, language-independent data interchange format defined in [[RFC7159]]. It was derived from the ECMAScript Programming Language Standard. JSON defines a small set of formatting rules for the portable representation of structured data.

Latitude: The angular distance north or south of the equator. Often abbreviated to Lat.

Link: A typed connection between two resources that are identified by Internationalized Resource Identifiers (IRIs) [[RFC3987]]. It consists of: (i) a context IRI, (ii) a link relation type, (iii) a target IRI, and (iv) optionally, target attributes. Note that in the common case, the IRI will also be a URI [[RFC3986]], because many protocols (such as HTTP) do not support dereferencing IRIs [[RFC5988]].
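The four parts of a link map directly onto an HTTP `Link` header as defined in [[RFC5988]]: the target goes in angle brackets, the relation type in `rel`, target attributes such as `type` as further parameters, and the context IRI (normally the requested resource) can be stated explicitly with `anchor`. The sketch below is illustrative; the class name `WebLink` and the example URIs are hypothetical, and it assumes parameter values contain no double quotes (no escaping is done).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WebLink:
    """A typed link: context IRI, relation type, target IRI and target attributes."""
    context: str
    rel: str
    target: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def to_link_header(self) -> str:
        """Serialize as an HTTP Link header value, stating the context via 'anchor'.

        Assumes no value contains a double quote, so no escaping is applied.
        """
        params = [f'anchor="{self.context}"', f'rel="{self.rel}"']
        params += [f'{name}="{value}"' for name, value in self.attributes.items()]
        return f"<{self.target}>; " + "; ".join(params)


link = WebLink(
    context="https://example.org/id/amsterdam",
    rel="describedby",
    target="https://example.org/doc/amsterdam.geojson",
    attributes={"type": "application/geo+json"},
)
print(link.to_link_header())
```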

Linked data: The term ‘Linked Data’ refers to an approach to publishing data that puts linking at the heart of the notion of data, and uses the linking technologies provided by the Web to enable the weaving of a global distributed database [[LDP-PRIMER]].

Longitude: The angular distance east or west of the prime meridian. Often abbreviated to Long.

Open-world assumption (OWA): In a formal system of logic used for knowledge representation, the open-world assumption holds that a statement may be true irrespective of whether or not it is known to be true. This assumption codifies the informal notion that, in general, no single agent or observer has complete knowledge. In essence, a deductive reasoner cannot (and must not) infer that a statement is false merely because it is absent. (source: Wikipedia)

Resource Description Framework (RDF): A directed, labeled graph data model for representing information on the Web. It may be serialized in a number of formats, such as N-Triples [[N-TRIPLES]], RDF/XML [[RDF-SYNTAX-GRAMMAR]], the Terse RDF Triple Language ("Turtle" or TTL) [[TURTLE]] and JSON-LD [[JSON-LD]].

Semantic web: The term "Semantic Web" refers to the World Wide Web Consortium's (W3C) vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data.

SPARQL: A query language for RDF; it can be used to express queries across diverse data sources [[SPARQL11-OVERVIEW]].

Spatial data: Data describing anything with spatial extent; i.e. size, shape or position. In addition to describing things that are positioned relative to the Earth (also see geospatial data), spatial data may also describe things using other coordinate systems that are not related to position on the Earth, such as the size, shape and positions of cellular and sub-cellular features described using the 2D or 3D Cartesian coordinate system of a specific tissue sample.

Spatial database: A spatial database, or geodatabase, is a database that is optimized to store and query data that represents objects defined in a geometric space. Most spatial databases allow representation of simple geometric objects such as points, lines and polygons and provide functions to determine spatial relationships (overlaps, touches etc.).
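Spatial databases expose such relationship tests as functions (for example `ST_Contains` and `ST_Intersects` in PostGIS) and accelerate them with spatial indexes. The sketch below shows two of the underlying tests in plain Python: a ray-casting point-in-polygon check and an axis-aligned bounding-box overlap check, the kind of cheap filter an index applies before exact geometry tests. The function names are illustrative, and the result for points lying exactly on a polygon edge is not defined by this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray casting: count crossings of a rightward horizontal ray from pt with the ring's edges."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap; touching edges count as overlapping."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_polygon((5.0, 5.0), square), point_in_polygon((15.0, 5.0), square))  # True False
```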

Spatial Data Infrastructure (SDI): An ecosystem of geographic data, metadata, tools, applications, policies and users that are necessary to acquire, process, distribute, use, maintain, and preserve spatial data. Due to its nature (size, cost, number of stakeholders), an SDI is often government-related.

SensorThings API: An open, geospatial-enabled and unified way to interconnect the Internet of Things (IoT) devices, data, and applications over the Web. [[SensorThings]].

Sensor Observation Service (SOS): A standardized HTTP interface allowing requests for observations across the web using platform-independent calls. Sensor Observation Service [[SOS]].

Spatial thing: Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract regions like cubes. Compare with the ISO definition for Spatial Object. [[W3C-BASIC-GEO]]

Temporal thing: Anything with temporal extent, i.e. duration. e.g. the taking of a photograph, a scheduled meeting, a GPS time-stamped track-point [[W3C-BASIC-GEO]]

Triple-store (or quadstore): A triple-store or RDF store is a purpose-built database for the storage and retrieval of RDF subject-predicate-object “triples” through semantic queries. Many implementations are actually “quad-stores” as they also hold the name of the graph within which a triple is stored.
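The difference between a triple and a quad can be shown with a toy in-memory store: each statement carries a fourth element naming its graph, so the same triple may be held in two graphs and still be told apart. This is only an illustrative sketch, with a hypothetical class name and compact prefixed names as plain strings; real stores keep several indexes (e.g. by subject, by predicate) and answer SPARQL queries rather than scanning linearly.

```python
from typing import Iterator, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


class QuadStore:
    """Toy in-memory quad store; None in a pattern position acts as a wildcard."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def add(self, s: str, p: str, o: str, g: str) -> None:
        self._quads.add((s, p, o, g))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, g: Optional[str] = None) -> Iterator[Quad]:
        """Yield every stored quad that agrees with the non-None pattern positions."""
        pattern = (s, p, o, g)
        for quad in self._quads:
            if all(want is None or want == have for want, have in zip(pattern, quad)):
                yield quad


store = QuadStore()
store.add("ex:amsterdam", "rdfs:label", "Amsterdam", "ex:gazetteer")
store.add("ex:amsterdam", "rdfs:label", "Amsterdam", "ex:tourism")
# two quads, but only one distinct triple once the graph name is dropped
print(len(list(store.match(s="ex:amsterdam"))), len({q[:3] for q in store.match()}))  # 2 1
```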

Universe of discourse: View of the real or hypothetical world that includes everything of interest. Compare with the ISO definition. [[ISO-19101]] §4.29

Web Coverage Service (WCS): A service offering multi-dimensional coverage data for access over the Internet [[WCS]]

Web Feature Service (WFS): A standardized HTTP interface allowing requests for geographical features across the web using platform-independent calls. Web Feature Service [[WFS]].
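A WFS request can be made with a plain HTTP GET using key-value pair (KVP) parameters. The sketch below builds a WFS 2.0.0 `GetFeature` URL with a bounding-box filter and a result limit. The endpoint and feature type name are hypothetical, and the sketch assumes the bounding box is given in the axis order of the feature type's default CRS (for EPSG:4326 that is latitude first); check the server's GetCapabilities response before relying on either.

```python
from urllib.parse import urlencode


def wfs_get_feature_url(endpoint: str, type_names: str, bbox: tuple, count: int = 100) -> str:
    """Build a WFS 2.0.0 GetFeature request URL using KVP encoding.

    `bbox` is (min_axis1, min_axis2, max_axis1, max_axis2) in the axis order of
    the feature type's default CRS (an assumption of this sketch).
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_names,  # WFS 2.0 name; WFS 1.x used TYPENAME
        "COUNT": str(count),      # WFS 2.0 name; WFS 1.x used MAXFEATURES
        "BBOX": ",".join(str(v) for v in bbox),
    }
    return f"{endpoint}?{urlencode(params)}"


url = wfs_get_feature_url("https://example.org/wfs", "ex:buildings", (51.0, 3.0, 54.0, 7.0), count=50)
print(url)
```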

Web Map Service (WMS): A standardized HTTP interface for requesting geo-registered map images from one or more distributed spatial databases [[WMS]].
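A WMS map image is requested with a `GetMap` call. One detail that often trips up implementers is that WMS 1.3.0 follows the axis order of the CRS, so with `CRS=EPSG:4326` the `BBOX` is written latitude first. The sketch below takes the bounding box in longitude/latitude terms and reorders it; the endpoint and layer name are hypothetical, and it assumes the server advertises the layer, EPSG:4326 and PNG output in its capabilities.

```python
from urllib.parse import urlencode


def wms_get_map_url(endpoint: str, layer: str, min_lon: float, min_lat: float,
                    max_lon: float, max_lat: float, width: int = 512, height: int = 512) -> str:
    """Build a WMS 1.3.0 GetMap request URL for a WGS 84 (EPSG:4326) bounding box."""
    # WMS 1.3.0 uses the CRS's own axis order: EPSG:4326 is latitude, longitude
    bbox = f"{min_lat},{min_lon},{max_lat},{max_lon}"
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # required parameter; empty means the default style
        "CRS": "EPSG:4326",
        "BBOX": bbox,
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


print(wms_get_map_url("https://example.org/wms", "ex:landcover", 3.0, 51.0, 7.0, 54.0))
```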

Web Map Tile Service (WMTS): A standardized HTTP interface for requesting tiled, geo-referenced map images from one or more distributed spatial databases [[WMTS]].
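Unlike WMS, a WMTS client asks for a pre-rendered tile by its row and column in a tile matrix. The sketch below computes the tile containing a point for a Web Mercator tile matrix set with its origin at the top-left corner and 2^zoom tiles along each axis (the layout of the OGC well-known "GoogleMapsCompatible" set), then builds a KVP `GetTile` URL. The endpoint, layer, tile matrix set identifier, the `default` style and the assumption that tile matrix identifiers equal zoom levels are all hypothetical; the real identifiers come from the server's capabilities document.

```python
import math
from urllib.parse import urlencode


def tile_indices(lat: float, lon: float, zoom: int):
    """Return (row, col) of the Web Mercator tile containing (lat, lon) at `zoom`.

    Assumes a top-left origin and 2**zoom tiles per axis.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    col = int((lon + 180.0) / 360.0 * n)
    row = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # clamp: lon = 180 and latitudes beyond about 85.05 degrees fall outside the matrix
    return min(max(row, 0), n - 1), min(max(col, 0), n - 1)


def wmts_get_tile_url(endpoint: str, layer: str, tile_matrix_set: str,
                      lat: float, lon: float, zoom: int) -> str:
    """Build a WMTS 1.0.0 GetTile request (KVP encoding) for the tile containing a point."""
    row, col = tile_indices(lat, lon, zoom)
    params = {
        "SERVICE": "WMTS", "REQUEST": "GetTile", "VERSION": "1.0.0",
        "LAYER": layer, "STYLE": "default", "FORMAT": "image/png",
        "TILEMATRIXSET": tile_matrix_set,
        "TILEMATRIX": str(zoom),  # assumption: tile matrix identifiers are the zoom levels
        "TILEROW": row, "TILECOL": col,
    }
    return f"{endpoint}?{urlencode(params)}"


print(wmts_get_tile_url("https://example.org/wmts", "ex:basemap", "GoogleMapsCompatible", 0.0, 0.0, 1))
```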

Well Known Text (WKT): A text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems. (source: Wikipedia)
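WKT geometries such as `POINT (30 10)` or `LINESTRING (30 10, 10 30, 40 40)` map closely onto GeoJSON geometry objects. The sketch below converts just these two types (optionally with a `Z` coordinate) and rejects everything else. It assumes the WKT coordinates are already in longitude, latitude order, as with CRS84, the default for GeoSPARQL WKT literals; WKT in a latitude-first CRS would need its axes swapped before being used as GeoJSON. The function name is hypothetical, and a full WKT parser would also need polygons, multi-geometries and `EMPTY`.

```python
import re

# matches e.g. "POINT (30 10)", "POINT Z (1 2 3)", "LINESTRING (30 10, 10 30)"
_WKT = re.compile(r"^\s*(POINT|LINESTRING)\s*(?:Z\s*)?\(\s*(.*?)\s*\)\s*$", re.IGNORECASE)


def wkt_to_geojson(wkt: str) -> dict:
    """Convert a WKT POINT or LINESTRING into a GeoJSON geometry object."""
    m = _WKT.match(wkt)
    if not m:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    kind, body = m.group(1).upper(), m.group(2)
    # positions are comma-separated; coordinates within a position are space-separated
    positions = [[float(v) for v in pair.split()] for pair in body.split(",")]
    if kind == "POINT":
        return {"type": "Point", "coordinates": positions[0]}
    return {"type": "LineString", "coordinates": positions}


print(wkt_to_geojson("POINT (30 10)"))  # {'type': 'Point', 'coordinates': [30.0, 10.0]}
```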

Web Processing Service (WPS): An interface standard which provides rules for standardizing inputs and outputs (requests and responses) for invoking spatial processing services, such as polygon overlay, as a Web service [[WPS]].

Extensible Markup Language (XML): A simple, very flexible text-based markup language derived from SGML (ISO 8879). It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. [[XML11]]

Acknowledgments

The editors gratefully acknowledge the contributions made to this document by all members of the working group and the chairs: Kerry Taylor and Ed Parsons.

Changes since previous versions

A full change-log is available on GitHub

Changes since the first public working draft of 19 January 2016

The document has undergone substantial changes since the first public working draft. Below are some of the changes made:

  • Focusing the document to suit the needs of practitioners: those either publishing spatial data themselves, or developing software tools to support the publication of spatial data (see sections and )
  • Addition of new introductory material to explain the fundamentals of spatial data to readers (see sections , , , and )
  • Consolidation of the best practices from 30 down to 17 - based on merging duplicate or closely related best practices and focusing our scope only on spatial data concerns so that, for example, best practices relating to handling sensor data are removed (we expect these subjects to be included in future iterations of the Sensor Network deliverables of the working group)
  • Alignment of the remaining best practices with those from [[DWBP]] - including organizing them according to the same sub-section headings
  • As a consequence of the consolidation and [[DWBP]] alignment, the best practices are renumbered (although the fragment-identifiers remain unchanged from those used in the first public working draft) and the cross-reference to the Requirements from [[SDW-UCR]] (section ) has been updated
  • Addition of a new, partially complete section (see ) that is intended to help readers understand the steps they should take and the questions they should consider when publishing spatial data on the Web; referencing both the general [[DWBP]] best practices and those specific to spatial data described in this document
  • Improvements to the

Changes since working draft of 25 October 2016

Significant updates to:

(further updates to these best practices are expected in the next WD release, circa end January 2017)

Plus minor changes, including the addition to section 9 of a list of the most important best practices for data publishers who start from an existing SDI, and changes to a few best practice titles to include the word "spatial".

Changes since working draft of 5 January 2017

Significant updates to:

Also:

  • The BP summary has been moved and is now section 4 of the document;
  • Best Practice 17, "How to work with crowd-sourced observations" was removed; and
  • the Best Practices Template section, explaining the template used to describe Best Practices, was removed.