This specification defines WebVMT, the Web Video Map Tracks format, which is an enabling technology whose main use is for marking up external map track resources in connection with the HTML <track> element. WebVMT files provide map presentation and annotation synchronised to video content, including animation support, and more generally any form of geolocation data that is time-aligned with audio or video content.

This document is an explanatory specification, intended to communicate and develop the draft WebVMT format through discussion with user communities.

Use Cases

Example scenarios in which WebVMT can add significant value, with a list of identified benefits.

Coastguard/Mountain Rescue

A missing person is reported to the rescue services, who deploy a drone to search inaccessible areas of coastline or moorland for their target. The drone relays back a live video stream from its camera and geolocation data from its GPS receiver to a remote human operator who is piloting it.

As the search continues, the operator spots a target on the video feed and can instantly call up an electronic map, synchronised to the video, which has been automatically following the drone’s position and plotting its ground track. The display gives the operator immediate context for the video, and allows them to override the automatic map control and zoom in to pinpoint the target’s precise location from the features visible in the video and on the map/satellite view. They mark the location and then zoom out to assess the surrounding terrain and advise the recovery team of the best approach to the target. For example, the terrain may dictate very different approach routes depending on whether the person has twisted their ankle at the top of a cliff or has fallen and is lying at the bottom of the same cliff, though the co-ordinates are almost identical in both cases.

The operator has been able to make important decisions quickly, which may be life critical, and deploy recovery resources effectively.

Area Survey

A survey drone is equipped with a camera which records an image of the ground directly below it. The pilot is a remote human operator, tasked with surveying a defined area from a particular height in order to capture the required data.

As the survey progresses, geometric shapes are automatically added to the map to represent areas which have been included. Once the drone has finished its sweep, the operator can quickly confirm whether the required area has been completely covered. If any areas have been missed, the pilot can use the map to navigate and make additional passes to fill the gaps, before returning to base.

Adding map track (VMT) files to the survey archive provides a geospatial index to the videos. Because VMT files are small, a particular geographic location can be found more rapidly. Online video archives can be indexed more quickly using this web-friendly format.

The operator has been easily able to verify the quality of their own work and correct any errors, saving time and additional effort in redeployment. Video footage has been indexed by geolocation rapidly and in a search-engine-friendly format.

Outdoor Trails

An outdoor sportsperson, e.g. snowboarder or cyclist, is equipped with a helmet camera and/or mobile phone to record video footage and GPS data. They set off to find new challenges and practice their skills, e.g. off-piste or on mountain trails, and discover new routes and areas that they would like to explore in future, chatting to the camera as they go. Afterwards, they upload the video to share their experience with the online community, so others can quickly identify locations of particularly interesting sections of the featured trail. Using the synchronised map view in their browser, community members can easily see where they need to go in order to try it for themselves.

The operator has been able to fully engage in their sporting activity, without making any written notes, while simultaneously recording the details needed to guide others to the same locations. Their changing location over time can also be used to calculate speed and distance information, which can be displayed alongside the footage.

TV Sports Coverage

A TV production company is covering a sports event that takes place over a large area, e.g. rallying, road cycling or sailing, using a number of mobile video devices including competitor cams, e.g. dash cams or helmet cams, and drones to provide shots of inaccessible areas, e.g. remote terrain or over water.

Feeds from all the cameras are streamed to the production control room, where their geolocation data are combined on a map showing the locations of every competitor and camera, each labelled for easy identification. The live map enables the director to quickly choose the best shot and anticipate where and when to deploy their drone cameras to catch competitors at critical locations on the course as the competition develops in real time.

Multiple operators can function concurrently, both autonomously and under central direction. Mobile assets can be monitored and deployed from an operations centre to provide optimum coverage of the developing live event.

Proxy Explorer

Important details of a remote area have been captured on video. It is not possible to revisit the location for safety reasons or because it has physically changed in the intervening time. Footage can be retrospectively geotagged against a concurrent map to allow the viewer to better interpret and identify features seen in the footage. Explanatory annotations can be added to the video-map track to help future viewers' understanding and aggregate the collective analysis.

Multiple operators can contribute their observations to provide a group analysis, iteratively adding new details and discarding out-of-date information. Experts can offer insight about filmed locations, which would otherwise be inaccessible to them.

Treasure Hunt

A TV production company designs a new game show which involves competitors searching for targets across a wide area, with an operations centre remotely monitoring their progress and providing updates. Competitors are equipped with body-worn video or helmet cameras to relay footage of their view.

Geolocation context allows central operators to better understand the participants' actions and to remotely direct them more efficiently. Competitors' positions can be displayed to the TV audience on annotated 2D or 3D maps for clearer presentation.

Swarm Monitoring

A swarm of drones is deployed to perform a task, and their operations are monitored centrally. Geolocation details of the swarm are automatically collated and broadcast to the drone pilots, showing the locations of all the drones, each circled with a suitable safety zone to warn operators if two units find themselves flying in close proximity.

Pilots are safely able to operate either autonomously or under the direction of central control. Extra zonal information can be added to the operators' maps to show the outer perimeter of their operating area and warn of fixed aerial hazards, e.g. a radio mast, or transient hazards, e.g. a helicopter.

Crisis Response

Disaster strikes, e.g. a hurricane or tsunami, and emergency response teams are deployed to the affected area. However, it is difficult to verify which problems people are facing, what resources would help them and exactly where these events are occurring. Maps are unreliable as the infrastructure has been damaged, though people on the ground have the relevant knowledge if it could be reliably recorded and shared.

Anyone with a basic smartphone could video events with reliable geospatial data, as GPS receivers can operate without the need for a mobile phone signal by using satellite data, to accurately document the problems they face. Even if the cell network is not operational, this information can be physically delivered to crisis coordinators to notify them of the issues that need to be addressed, including an accurate location in a common format. Crisis events can be reliably recorded, knowledge can be shared and aggregated, and relief resources can be accurately targeted and deployed to the correct locations.

Fastest Lap

An amateur racer attempts to set a fastest lap time in a vehicle equipped with a dashcam. The film can be reviewed with details of location and vehicle telemetry to analyse the performance of the vehicle, and that of the driver, to identify where improvements can be made. The ideal racing line can be determined and measured against the actual line taken, and optimum braking points can be established by suggesting and testing corrections.

Driver's commentary can be used to identify performance issues along with vehicle audio response, e.g. engine tone and tyres, for comparison with telemetry data to rapidly iterate testing using community analysis tools. Results can be shared online in a common format to display metrics and to demonstrate veracity.

Police Evidence

A web-based police system is established to allow dashcam video evidence of driving offences to be submitted digitally by members of the public who have witnessed them. Detectives are able to identify the time and vehicles involved directly from the uploaded footage, and accurately determine the location at which the incident occurred from the digital metadata included.

Its ability to accept open format data also makes the system available to cyclists and pedestrians who can record video with location on their helmet cameras and smartphones respectively, providing wider access to the service beyond the dashcam community. Metadata, e.g. location, from different video manufacturers is often recorded in mutually-incompatible formats, but video-map track support enables synchronised location (and other) data to be extracted from recordings using manufacturers' or community tools, without affecting source video integrity, and submitted to the police system in a common format, significantly reducing development costs.

Officers have been able to identify incident locations quickly and accurately, without sacrificing evidence integrity. The online service has been made available to a wider audience of drivers, cyclists and pedestrians, without incurring additional development costs.

State of the Art

No standard format currently exists by which web browsers can synchronise geolocation data with video. Though many browser-supported formats exist to present the two data streams separately, e.g. MPEG for video and GPX for geolocation, there is no viable synchronisation mechanism for video playback time with geolocation information.

Current Solutions

Material Exchange Format (MXF) was developed by the Society of Motion Picture and Television Engineers (SMPTE) to synchronise metadata, including geolocation, with audio and video streams using a register of key-length-value (KLV) triples. The breadth of its scope has resulted in interoperability issues, as different vendors implement different parts of the standard, and has produced implementations from high-profile companies which are mutually incompatible. KLVs can also be embedded within MPEG files, though this does not address the synchronisation issue for other web video formats such as WebM.

Video camera manufacturers have taken various approaches, resulting in a number of non-standard solutions including embedding geolocation data within the MPEG metadata stream in disparate formats, e.g. GoPro Metadata Format (GPMF), or recording a separate geolocation file in a proprietary format alongside the associated video file. From a hardware perspective, few high-end cameras provide geotagging out of the box and most require an add-on device to support this feature.

Geospatial data are not currently accessible in the video Document Object Model (DOM) in HTML nor via video playback APIs in smartphones, e.g. Android, though their host devices are typically equipped with both a video camera and Global Navigation Satellite System (GNSS) receiver capable of capturing the required information.

In sharp contrast, still photos have a well-established geotagging standard called Exif, which was published by the Japan Electronic Industries Development Association (JEIDA) in 1995 and defines a metadata tag to embed geolocation data within TIFF and JPEG images. This is widely supported by manufacturers of photographic equipment and software worldwide, including low-end smartphones, making this feature cheap and accessible to the public.

Growing Requirements

Historically, there has been no requirement for a comparable video standard, but the urgency for such a standard is growing fast due to the emerging markets for 'mobile video devices,' e.g. drones, dashcams, body-worn video and helmet cameras, as well as the rise in high-quality video and geolocation support in the global smartphone market.

Accessible Standard Opportunity

Using current W3C recommendations, it is possible for a competent programmer to synchronise video-geolocation 'metadata' with a <video> element using a <track> child element. However, this is a non-trivial development task which requires an understanding of the video DOM events and JavaScript file handling, making it inaccessible to the vast majority of web users. In addition, video map tracks are a clearly identified metadata subclass, which could be isolated in a similar way to video text tracks.

Establishing a standard file format would allow interoperability and information sharing between the public, the emergency services, police and other mobile video device users, e.g. drone pilots, giving cheaper and easier access to this important source of information. If web browsers supported video geotagging natively using this file format, it would also become accessible to most web users. Current low-end smartphones already provide suitable hardware to concurrently capture video and geolocation streams, which would make this technology easily accessible to the general public, and encourage the user and developer communities to grow rapidly.

Proposed Solution

This proposal constitutes a lightweight markup language to synchronise video with geolocation data for display on electronic maps, e.g. OpenStreetMap. It offers presentational control of the map display, e.g. pan and zoom, and annotation to highlight map features to the viewer, e.g. markers and labels.

WebVMT (Web Video Map Tracks) format is intended for marking up external map track resources, and its main use is for files synchronising video content with an annotated map presentation. Ideas have been borrowed from existing W3C formats, including WebVTT's HTML binding and its block and cue structures, and SVG's approach to drawing and animation, in order to display output on an electronic map.

The format mimics WebVTT's structure and syntax for video synchronisation, with cue details listed in an accessible text-based file linked to the <video> DOM element by a child <track> element in an HTML document.

<!doctype html>
<html>
  <head>
    <title>WebVMT Basic Example</title>
  </head>
  <body>
    <!-- Video display -->
    <video controls width="640" height="360">
      <source src="video.mp4" type="video/mp4">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png?key=VALID_OSM_KEY">
      Your browser does not support the video tag.
    </video>
    <!-- Map display -->
    <div id="vmt-map" style="height: 360px; width:640px;"></div>
  </body>
</html>
      

The VMT (Video Map Track) format file, e.g. maptrack.vmt, contains the map cues associated with the video, e.g. video.mp4.

The meaning of the for and tileurl attributes for user agents is an open question. Initial solutions can be built using JavaScript, with existing map libraries such as Leaflet, though the vision is that user agents will handle map rendering natively in the longer term.

Map Cues

Map cues display their payload between a start time and an end time. The cue end time may be omitted to signify the end time of the video, which may be unknown in the case of streamed video.

Hello World

Here is a sample VMT file with a cue highlighting Tower Bridge in London on a static map.

WEBVMT

MEDIA
url:TowerBridge.mp4
mimetype:video/mp4

MAP
lat:51.506,lng:-0.076
rad:250

00:00:02.000 --> 00:00:05.000
{ "moveto":
  { "lat": 51.504362, "lng": -0.076153 }
}
{ "lineto":
  { "lat": 51.506646, "lng": -0.074651 }
}
          

Map Presentation

Cues also allow dynamic presentation to pan and zoom the map. This example focusses attention on the Tower of London.

Cues without end times are displayed until the end of the video.

WEBVMT

MEDIA
url:../movies/TowerOfLondon.webm
mimetype:video/webm

MAP
lat:51.162,lng:-0.143
rad:500

00:00:03.000 -->
{ "panto":
  { "lat": 51.508, "lng": -0.077, "end": "00:00:05.000" }
}

00:00:06.000 -->
{ "zoom":
  { "rad": 250 }
}
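
The cue timing lines used in these examples could be read with a sketch like the one below. It assumes the hh:mm:ss.ttt timestamp form shown in the examples; the function names parseTimestamp, parseCueTiming and isCueActive are hypothetical and are not defined by this specification. An omitted end time is represented as null, meaning the cue remains active until the end of the media.

```javascript
// Hypothetical helper: convert an "hh:mm:ss.ttt" timestamp to seconds.
// Assumes the timestamp form used in the examples of this document.
function parseTimestamp(text) {
  const match = /^(\d{2,}):([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(text);
  if (!match) {
    throw new Error(`Invalid timestamp: ${text}`);
  }
  const [, hours, minutes, seconds, millis] = match;
  return Number(hours) * 3600 + Number(minutes) * 60 +
    Number(seconds) + Number(millis) / 1000;
}

// Hypothetical helper: parse a cue timing line such as
// "00:00:02.000 --> 00:00:05.000" or "00:00:03.000 -->".
// The end time is null when it has been omitted.
function parseCueTiming(line) {
  const parts = line.split('-->');
  if (parts.length !== 2) {
    throw new Error(`Invalid cue timing line: ${line}`);
  }
  const start = parseTimestamp(parts[0].trim());
  const endText = parts[1].trim();
  return { start, end: endText === '' ? null : parseTimestamp(endText) };
}

// A cue with no end time stays active for the rest of the media,
// whose duration may be unknown when streaming.
function isCueActive(timing, currentTime) {
  return currentTime >= timing.start &&
    (timing.end === null || currentTime < timing.end);
}
```

Representing the missing end time as null, rather than substituting a duration, avoids needing to know the media length in advance.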
          

Comments

Comments are blocks that are preceded by a blank line, start with the word NOTE (followed by a space or newline), and end at the first blank line.

Comment Block

Comment block format is identical to WebVTT.

WEBVMT

NOTE Associated video

MEDIA
url:/home/myuser/movies/TowerLandmarks.ogg
mimetype:video/ogg

NOTE Map config

MAP
lat:51.506,lng:-0.076
rad:250

NOTE Tower Bridge

00:00:01.000 --> 00:00:05.000
{ "moveto":
  { "lat": 51.504362, "lng": -0.076153 }
}
{ "lineto":
  { "lat": 51.506646, "lng": -0.074651 }
}

NOTE City Hall

00:00:02.000 -->
{ "circle":
  { "lat": 51.504789, "lng": -0.078642, "rad": 20 }
}

NOTE Tower Of London
This line is also part of the comment

00:00:03.000 --> 00:00:04.000
{ "polygon":
  { "points":
    [ { "lat": 51.507193, "lng": -0.074844 },
      { "lat": 51.508756, "lng": -0.074716 },
      { "lat": 51.509036, "lng": -0.075638 },
      { "lat": 51.508929, "lng": -0.077162 },
      { "lat": 51.507727, "lng": -0.077848 },
      { "lat": 51.507220, "lng": -0.075767 }
    ]
  }
}
          

Styling

Display style is controlled by CSS, which may be embedded in HTML or within the VMT file.

CSS Style in HTML

In this example, an HTML page has a CSS style sheet in a <style> element that styles map cues for the video, e.g. drawing lines in red.

<!doctype html>
<html>
  <head>
    <title>WebVMT Style Example</title>
    <style>
      video::cue-map {
        stroke-color: red;
        stroke-opacity: 0.9;
      }
    </style>
  </head>
  <body>
    <video controls width="640" height="360">
      <source src="video.mp4" type="video/mp4">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="https://api2.ordnancesurvey.co.uk/mapping_api/v1/service/zxy/EPSG%3A3857/Outdoor%203857/{z}/{x}/{y}.png?key=VALID_OS_KEY">
      Your browser does not support the video tag.
    </video>
    <div id="vmt-map" style="height: 360px; width:640px;"></div>
  </body>
</html>
          

CSS Style Block

Style block format is similar to WebVTT.

CSS style sheets can also be embedded in WebVMT files themselves. Style blocks are placed after any headers but before the first cue, and start with the word STYLE.

Comment blocks can be interleaved with style blocks.

WEBVMT

MEDIA
url:http://example.com/movies/Greenwich.mp4
mimetype:video/mp4

MAP
lat:51.478,lng:-0.001
rad:100

STYLE
::cue-map {
  stroke-color: red;
}

NOTE Comments are allowed between style blocks

STYLE
::cue-map {
  stroke-opacity: 0.9;
}

NOTE Prime Meridian marker

00:00:00.000 -->
{ "moveto":
  { "lat":51.477901, "lng": -0.001466 }
}
{ "lineto":
  { "lat":51.477946, "lng": -0.001466 }
}

NOTE Style blocks may not appear after the first cue
          

Animation

Map annotations may be animated using an animate command, in a similar way to the <animate> element in SVG.

Paths have properties beyond those of other annotations, including a unique identifier and an animation which can be controlled separately for distance calculation purposes.

Animated Path

A path typically identifies a mobile camera's route, which is defined by a moveto command followed by a sequence of lineto commands, and may include multiple discrete segments separated by moveto commands.

A path may be assigned an identifier using the path attribute to discriminate between and uniquely identify multiple paths. Distinct paths may be styled in different ways, e.g. colour, with separate speed and distance calculations performed during playback.
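
As an illustration of the distance and speed calculations mentioned above, the following sketch sums great-circle distances along a list of path locations. It is not part of the format: the haversine formula and the mean Earth radius of 6371 km are assumptions chosen for simplicity (a spherical approximation of WGS84), and the names haversineDistance, pathLength and averageSpeed are hypothetical. The sample coordinates are the path locations of the London to Brighton example in this section.

```javascript
// Assumption: a spherical Earth with the mean radius in metres.
const EARTH_RADIUS_METRES = 6371000;

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance in metres between two { lat, lng } locations,
// using the haversine formula.
function haversineDistance(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) *
    Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_METRES * Math.asin(Math.sqrt(h));
}

// Total length in metres of one continuous path segment.
function pathLength(locations) {
  let total = 0;
  for (let i = 1; i < locations.length; i += 1) {
    total += haversineDistance(locations[i - 1], locations[i]);
  }
  return total;
}

// Average speed in metres per second, or null if no time has elapsed.
function averageSpeed(distanceMetres, startSeconds, endSeconds) {
  const elapsed = endSeconds - startSeconds;
  return elapsed > 0 ? distanceMetres / elapsed : null;
}

// Path "cam1": London Victoria, Gatwick Airport, Brighton.
const cam1 = [
  { lat: 51.494477, lng: -0.144753 },
  { lat: 51.155958, lng: -0.16089 },
  { lat: 50.830553, lng: -0.141706 },
];
console.log(pathLength(cam1));
```

Keeping one list of locations per path identifier would allow each path to be measured separately during playback.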

In this example, an animated path is traced from London to Brighton:

WEBVMT

NOTE Associated video

MEDIA
url:LondonBrighton.mp4
mimetype:video/mp4

NOTE Map config

MAP
lat:51.1618,lng:-0.1428
rad:30000

NOTE London overview

00:00:01.000 -->
{ "panto":
  { "lat": 51.4952, "lng": -0.1441 }
}

00:00:02.000 -->
{ "zoom":
  { "rad": 3000 }
}

NOTE From London Victoria...

00:00:03.000 -->
{ "panto":
  { "lat": 50.830553, "lng": -0.141706, "end": "00:00:25.000" }
}
{ "moveto":
  { "lat": 51.494477, "lng": -0.144753, "path": "cam1" }
}
{ "lineto":
  { "lat": 51.155958, "lng": -0.16089, "path": "cam1", "end": "00:00:10.000" }
}

NOTE ...via Gatwick Airport...

00:00:10.000 -->
{ "lineto":
  { "lat": 50.830553, "lng": -0.141706, "path": "cam1", "end": "00:00:25.000" }
}

NOTE ...to Brighton (at 00:00:25.000)

00:00:27.000 -->
{ "zoom":
  { "rad": 30000 }
}
          

Animated Annotation

This example tracks a drone with a circular 10-metre safety zone around it.

WEBVMT

NOTE Associated video

MEDIA
url:SafeDrone.mp4
mimetype:video/mp4

NOTE Map config

MAP
lat:51.0130,lng:-0.0015
rad:3000

NOTE Drone starts at (51.0130, -0.0015)

00:00:05.000 -->
{ "panto":
  { "lat": 51.0070, "lng": -0.0020, "end": "00:00:25.000" }
}
{ "moveto":
  { "lat": 51.0130, "lng": -0.0015, "path": "drone1" }
}
{ "lineto":
  { "lat": 51.0090, "lng": -0.0017, "path": "drone1",
    "end": "00:00:10.000" }
}

NOTE Safety zone

00:00:05.000 --> 00:00:10.000
{ "circle":
  { "lat": 51.0130, "lng": -0.0015, "rad": 10,
    "animate":
    [ { "name": "lat", "to": 51.0090, "end": "00:00:10.000" },
      { "name": "lng", "to": -0.0017, "end": "00:00:10.000" }
    ]
  }
}

NOTE Drone arrives at (51.0090, -0.0017)

00:00:10.000 -->
{ "lineto":
  { "lat": 51.0070, "lng": -0.0020, "path": "drone1", "end": "00:00:25.000" }
}
{ "circle":
  { "lat": 51.0090, "lng": -0.0017, "rad": 10,
    "animate":
    [ { "name": "lat", "to": 51.0070, "end": "00:00:25.000" },
      { "name": "lng", "to": -0.0020, "end": "00:00:25.000" }
    ]
  }
}

NOTE Drone ends at (51.0070, -0.0020)
          

YouTube Integration

Embedded YouTube content can be displayed using an <iframe> element, specifying the unique 11-character content identifier for the posted video, using the official YouTube IFrame API with the JavaScript API enabled.

Hello YouTube

A child <track> pseudo-element within the <iframe> links it with WebVMT using the same syntax as for the <video> DOM element.

<!doctype html>
<html>
  <head>
    <title>WebVMT YouTube Example</title>
  </head>
  <body>
    <!-- Video display -->
    <iframe src="http://www.youtube.com/embed/YOUTUBE_VIDEO_ID?enablejsapi=1" width="640" height="360" frameborder="0">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="mapbox://styles/mapbox/streets-v9">
    </iframe>
    <!-- Map display -->
    <div id="vmt-map" style="height: 360px; width:640px;"></div>
  </body>
</html>
          

Note that the <track> pseudo-element is actually replaced by the <iframe> content when the page is loaded.

The url in the MEDIA block should match the src attribute of the <iframe> element without the query.

WEBVMT

NOTE Associated YouTube video

MEDIA
url:http://www.youtube.com/embed/YOUTUBE_VIDEO_ID
mimetype:video/mp4
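
One way a script could check that a MEDIA url matches an <iframe> src without its query is sketched below, using the standard URL class. The function names mediaUrlFromIframeSrc and mediaMatchesIframe are hypothetical, and the sketch assumes that both values are absolute URLs.

```javascript
// Hypothetical helper: drop the query (and any fragment) from an absolute URL.
function mediaUrlFromIframeSrc(src) {
  const url = new URL(src);
  return url.origin + url.pathname;
}

// Hypothetical helper: compare a MEDIA url with an <iframe> src attribute.
function mediaMatchesIframe(mediaUrl, iframeSrc) {
  return mediaUrlFromIframeSrc(mediaUrl) === mediaUrlFromIframeSrc(iframeSrc);
}

console.log(mediaUrlFromIframeSrc(
  'http://www.youtube.com/embed/YOUTUBE_VIDEO_ID?enablejsapi=1'
));
```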
          

Data Model

The data model of WebVMT consists of four key elements: the linked media file, the video viewport, cues, and the map viewport. The linked media file contains audio or video data with which cues are synchronised. The video viewport is the rendering area for video output. Cues are containers consisting of a set of metadata lines. The map viewport is the rendering area for metadata output, for example graphical annotations overlaid on an online map.

Overview

The WebVMT file is a container file for chunks of data that are time-aligned with a video or audio resource. It can therefore be regarded as a serialisation format for time-aligned data.

A WebVMT file starts with a header and then contains a series of data blocks. If a data block has a start time, it is called a WebVMT cue. A comment is another kind of data block.
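
The block structure described above can be sketched as a simple splitter and classifier. This is an illustration under stated assumptions rather than a conforming parser: it assumes that blocks are separated by blank lines, that line endings are CRLF, CR or LF as in WebVTT, and that a block whose first line contains "-->" is a cue. The names splitBlocks, classifyBlock and classifyBlocks are hypothetical.

```javascript
// Split a WebVMT file body into blocks separated by one or more blank lines.
function splitBlocks(text) {
  return text
    .replace(/^\uFEFF/, '')   // drop an optional byte order mark
    .replace(/\r\n?/g, '\n')  // normalise CRLF and CR line endings to LF
    .split(/\n{2,}/)
    .filter((block) => block.trim() !== '');
}

// Classify a block by inspecting its first line.
function classifyBlock(block, index) {
  const firstLine = block.split('\n')[0];
  if (index === 0 && /^WEBVMT(?:[ \t].*)?$/.test(firstLine)) {
    return 'header';
  }
  if (/^NOTE(?:[ \t]|$)/.test(firstLine)) {
    return 'comment';
  }
  if (/^(?:MEDIA|MAP|STYLE)[ \t]*$/.test(firstLine)) {
    return firstLine.trim().toLowerCase();
  }
  if (firstLine.includes('-->')) {
    return 'cue'; // a data block with a start time
  }
  return 'unknown';
}

function classifyBlocks(text) {
  return splitBlocks(text).map((block, index) => ({
    kind: classifyBlock(block, index),
    block,
  }));
}
```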

A WebVMT file carries cues which are identified as metadata and specified in the kind attribute of the track element in the HTML specification.

The data kind of a WebVMT file is externally specified, such as in an HTML file’s track element. The environment is responsible for interpreting the data correctly.

A WebVMT cue is rendered as an overlay on top of the map viewport.

WebVMT Cues

A WebVMT cue is a text track cue with an optional end time that additionally consists of the following:

A cue text
The raw text of the cue which is interpreted as time-aligned metadata, and rules for its interpretation.

A WebVMT cue without an end time indicates that the cue end time is equal to the duration of the media timeline. The duration value may be unknown, for example during media streaming.

WebVMT Map

A WebVMT map is the map viewport and provides a rendering area for WebVMT cues.

A WebVMT map consists of:

A map center latitude
The latitude of the location at the center of the map.
A map center longitude
The longitude of the location at the center of the map.
A map zoom radius
The radius in metres of the minimum area visible from the center of the map.
A map object
The control interface object for the map.

The precise format of the map object control interface is implementation dependent, for example the OpenLayers API.
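
Many map libraries express the level of detail as an integer zoom level rather than a radius, so an implementation may need to convert the map zoom radius. The sketch below shows one possible conversion. It assumes Web Mercator tiles of 256 pixels, for which the ground resolution at the equator is about 156543.03392 metres per pixel at zoom level 0, and it takes the smaller viewport dimension in pixels as a parameter. The function name zoomLevelForRadius is hypothetical.

```javascript
// Assumption: Web Mercator tiling with 256-pixel tiles.
const EQUATOR_METRES_PER_PIXEL_AT_ZOOM_0 = 156543.03392;

// Largest integer zoom level at which a circle of the given radius,
// centred on the map center, still fits within the viewport.
// viewportPixels is the smaller of the viewport width and height.
function zoomLevelForRadius(lat, radiusMetres, viewportPixels) {
  const metresPerPixelNeeded = (2 * radiusMetres) / viewportPixels;
  const metresPerPixelAtZoom0 =
    EQUATOR_METRES_PER_PIXEL_AT_ZOOM_0 * Math.cos(lat * Math.PI / 180);
  // Resolution halves with each zoom level, hence the base-2 logarithm.
  return Math.floor(Math.log2(metresPerPixelAtZoom0 / metresPerPixelNeeded));
}

// Example: a 250 metre radius in a viewport 360 pixels high.
console.log(zoomLevelForRadius(51.506, 250, 360));
```

Rounding down keeps the whole radius visible, which matches the description of the radius as the minimum visible area.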

WebVMT Media

A WebVMT media is the linked media data with which WebVMT cues are synchronised, for example audio or video.

A WebVMT media enables a web crawler to rapidly search media metadata by providing sufficient information to construct a time-metadata index of the linked media file without opening it. Search engine data throughput is reduced as only matching media files selected by the user need be read, and non-matching media files are not accessed at all. Care should be taken to maintain WebVMT media details correctly, for example when a media file is renamed.

A WebVMT media consists of:

A media URL
The URL of the linked media file.

A null media URL indicates that no linked media file exists.

A media MIME type
The MIME type of the linked media file.

A null media MIME type indicates that no linked media file exists.

A media start time
The global time and date at which the linked media file begins.

The media start time allows multiple WebVMT files to be aggregated. A null media start time indicates that no start time is associated, for example in the case of an animation.

A media path
The path identifier which uniquely identifies the moving object capturing the linked media file.

A null media path indicates that no moving object is associated, for example when no linked media file exists.

WebVMT Command Structures

A WebVMT command is an instruction to display WebVMT metadata content.

A WebVMT command consists of one of the following components:

WebVMT commands are executed in order from first to last in the WebVMT file.
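
In the examples in this document, a cue payload is a sequence of JSON objects, each with a single key naming the command. Under that assumption, the commands in a payload could be extracted in file order as sketched below. The function name parseCommands is hypothetical, and the brace-matching approach is one possible technique, not a parsing rule of this specification.

```javascript
// Split a cue payload into its top-level JSON objects, in order.
// Braces inside string values are ignored when tracking nesting depth.
function parseCommands(payload) {
  const commands = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < payload.length; i += 1) {
    const ch = payload[i];
    if (inString) {
      if (escaped) {
        escaped = false;
      } else if (ch === '\\') {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      if (depth === 0) {
        start = i;
      }
      depth += 1;
    } else if (ch === '}') {
      depth -= 1;
      if (depth < 0) {
        throw new Error('Unbalanced braces in cue payload');
      }
      if (depth === 0) {
        const object = JSON.parse(payload.slice(start, i + 1));
        const [name] = Object.keys(object);
        commands.push({ name, args: object[name] });
      }
    }
  }
  if (depth !== 0) {
    throw new Error('Unbalanced braces in cue payload');
  }
  return commands;
}
```

Returning the commands as an ordered array preserves the first-to-last execution order.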

Map Controls

A WebVMT map control command controls map presentation.

A WebVMT map control command consists of one of the following components: a WebVMT pan or a WebVMT zoom.

A WebVMT pan is a command to set the center location of the map.

A WebVMT pan consists of:

A pan latitude
The latitude in degrees of the map center.
A pan longitude
The longitude in degrees of the map center.
A pan end time
The time at which the new map center is reached.
The pan end time may be defined as an absolute value, or calculated relative to the cue start time using a duration.

A WebVMT zoom is a command to set the level of detail of the map.

A WebVMT zoom consists of:

A zoom radius
The radius in metres of the minimum area visible from the center of the map.

Shape Annotations

A WebVMT shape annotation command annotates a shaped area to a map overlay.

A WebVMT shape annotation command consists of one of the following components: a WebVMT circle or a WebVMT polygon.

A WebVMT circle is a command to annotate a circular area to the map.

A WebVMT circle consists of:

A circle latitude
The latitude in degrees of the circle center.
A circle longitude
The longitude in degrees of the circle center.
A circle radius
The radius in metres of the circle.

A WebVMT polygon is a command to annotate a polygonal area to the map.

A WebVMT polygon consists of:

A list of WebVMT locations defining the polygon vertices.
Vertex locations are listed sequentially around the perimeter of the polygon. The last vertex should not repeat the value of the first, as this is implicit.

A WebVMT location consists of:

A location latitude
The latitude in degrees of the location.
A location longitude
The longitude in degrees of the location.

Location information is provided in terms of World Geodetic System coordinates, WGS84.
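
The implicit closure of a polygon matters when converting to formats that require an explicitly closed ring. As an illustration only, the sketch below converts a list of WebVMT locations to a GeoJSON Polygon geometry, which lists positions in longitude, latitude order and repeats the first position at the end of the ring. GeoJSON is not part of this specification, the function name polygonToGeoJson is hypothetical, and the winding order of the vertices is left unchanged.

```javascript
// Convert WebVMT polygon vertices ({ lat, lng }) to a GeoJSON Polygon.
// The ring is closed explicitly by repeating the first position.
function polygonToGeoJson(points) {
  if (points.length < 3) {
    throw new Error('A polygon needs at least three vertices');
  }
  const ring = points.map(({ lat, lng }) => [lng, lat]);
  ring.push([...ring[0]]);
  return { type: 'Polygon', coordinates: [ring] };
}
```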

Paths

A WebVMT path consists of all the path segments with the same unique path identifier.

A path segment describes the trajectory of the identified object moving through the mapped space.

A path segment consists of the following components, in the order given:

  1. One WebVMT move command;
  2. Zero or more WebVMT line commands.

A WebVMT move is a command to set the start location of the path segment.

A WebVMT move consists of:

A path identifier
The identifier shared by all the path segments in the WebVMT path.
By default, the path identifier is set to null.
A segment start latitude
The latitude in degrees of the path segment start.
A segment start longitude
The longitude in degrees of the path segment start.

A WebVMT line is a command to set the end location of the path segment.

A WebVMT line consists of:

A path identifier
The identifier shared by all the path segments in the WebVMT path.
By default, the path identifier is set to null.
A segment end latitude
The latitude in degrees of the path segment end.
A segment end longitude
The longitude in degrees of the path segment end.
A segment end time
The time at which the path segment end location is reached.
The segment end time may be defined as an absolute value, or calculated relative to the cue start time using a duration.

A WebVMT line is a straight line between two locations. The trajectory of the moving object can be linearly interpolated between the cue start time and the segment end time.
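
A minimal sketch of this interpolation follows. It interpolates latitude and longitude independently, which is an approximation that is reasonable for short segments, and clamps the result to the segment ends outside the time interval. The function name interpolateLocation is hypothetical, and times are assumed to be in seconds.

```javascript
// Linearly interpolate a moving object's location along a path segment.
// from and to are { lat, lng } locations; times are in seconds.
function interpolateLocation(from, to, startTime, endTime, time) {
  if (endTime <= startTime) {
    return { lat: to.lat, lng: to.lng };
  }
  const elapsed = (time - startTime) / (endTime - startTime);
  const fraction = Math.min(1, Math.max(0, elapsed)); // clamp to [0, 1]
  return {
    lat: from.lat + (to.lat - from.lat) * fraction,
    lng: from.lng + (to.lng - from.lng) * fraction,
  };
}

// Example: a segment starting at 5 seconds and ending at 10 seconds,
// sampled halfway through.
const midpoint = interpolateLocation(
  { lat: 51.0130, lng: -0.0015 },
  { lat: 51.0090, lng: -0.0017 },
  5, 10, 7.5
);
console.log(midpoint);
```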

Animations

A WebVMT animation changes an object attribute from a start value to an end value over a given time.

A WebVMT animation consists of:

An animation object
The parent object of the attribute.
An animation attribute
The attribute of the object to change.
An animation start value
The initial value of the attribute.
An animation end value
The final value of the attribute.
An animation start time
The time at which to begin changing the attribute.
An animation end time
The time at which to finish changing the attribute.
The animation end time may be defined as an absolute value, or calculated relative to the animation start time using a duration.

Syntax

WebVMT File Structure

A WebVMT file must consist of a WebVMT file body encoded as UTF-8 and labeled with the MIME type text/vmt.

A WebVMT file body consists of the following components, in the order given:

  1. An optional U+FEFF BYTE ORDER MARK (BOM) character.
  2. The string "WEBVMT" (U+0057 LATIN CAPITAL LETTER W, U+0045 LATIN CAPITAL LETTER E, U+0042 LATIN CAPITAL LETTER B, U+0056 LATIN CAPITAL LETTER V, U+004D LATIN CAPITAL LETTER M, U+0054 LATIN CAPITAL LETTER T).
  3. Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
  4. Two or more WebVMT line terminators to terminate the line with the file magic and separate it from the rest of the body.
  5. The following components, in any order, separated from each other by one or more WebVMT line terminators.
  6. Zero or more WebVMT line terminators.
  7. Zero or more WebVMT cue blocks and WebVMT comment blocks separated from each other by one or more WebVMT line terminators.
  8. Zero or more WebVMT line terminators.
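The first four components of the file body can be checked with a short Python sketch such as the one below. The function and constant names are hypothetical, and the sketch assumes that a line terminator is a CRLF pair, a single LF or a single CR, as in WebVTT.

```python
import re

# Optional BOM, the magic string, optional space or tab plus trailing text,
# then at least two line terminators. A CR followed by LF counts as one
# terminator, hence the negative lookahead on the lone CR alternative.
SIGNATURE = re.compile(
    r"\ufeff?WEBVMT(?:[ \t][^\r\n]*)?(?:\r\n|\n|\r(?!\n)){2,}"
)

def has_webvmt_signature(text):
    """Return True if the decoded file text starts with a valid signature."""
    return SIGNATURE.match(text) is not None
```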

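A WebVMT line terminator consists of one of the following:

  • A U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.
  • A single U+000A LINE FEED (LF) character.
  • A single U+000D CARRIAGE RETURN (CR) character.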

A WebVMT media definition block consists of the following components, in the order given:

  1. The string "MEDIA" (U+004D LATIN CAPITAL LETTER M, U+0045 LATIN CAPITAL LETTER E, U+0044 LATIN CAPITAL LETTER D, U+0049 LATIN CAPITAL LETTER I, U+0041 LATIN CAPITAL LETTER A).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVMT line terminator.
  4. A WebVMT media settings list.
  5. A WebVMT line terminator.

A WebVMT map initialisation block consists of the following components, in the order given:

  1. The string "MAP" (U+004D LATIN CAPITAL LETTER M, U+0041 LATIN CAPITAL LETTER A, U+0050 LATIN CAPITAL LETTER P).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVMT line terminator.
  4. A WebVMT map settings list.
  5. A WebVMT line terminator.

The WebVMT map initialisation block defines the state of the WebVMT map before any WebVMT cues are active.

A WebVMT style block consists of the following components, in the order given:

  1. The string "STYLE" (U+0053 LATIN CAPITAL LETTER S, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+004C LATIN CAPITAL LETTER L, U+0045 LATIN CAPITAL LETTER E).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVMT line terminator.
  4. Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator, except that the entire resulting string must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN). The string represents a CSS style sheet; the requirements given in the relevant CSS specifications apply.
  5. A WebVMT line terminator.

A WebVMT cue block consists of the following components, in the order given:

  1. Optionally, a WebVMT cue identifier followed by a WebVMT line terminator.
  2. WebVMT cue timings.
  3. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  4. A WebVMT line terminator.
  5. The WebVMT cue payload: WebVMT metadata text, which must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
  6. A WebVMT line terminator.

A WebVMT cue block corresponds to one piece of time-aligned data in the WebVMT file. The WebVMT cue payload is the data associated with the WebVMT cue.

A WebVMT cue identifier is any sequence of one or more characters not containing the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

A WebVMT cue identifier must be unique amongst all the WebVMT cue identifiers of all WebVMT cues of a WebVMT file.

A WebVMT cue identifier can be used to reference a specific cue, for example from script or CSS.
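The identifier rules above could be checked along the following lines. This is a sketch with hypothetical function names, not part of the format definition.

```python
def is_valid_identifier(identifier):
    """Check the character rules for a cue identifier."""
    return (
        len(identifier) > 0
        and "-->" not in identifier
        and "\n" not in identifier
        and "\r" not in identifier
    )

def find_duplicate_identifiers(identifiers):
    """Return identifiers that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for identifier in identifiers:
        if identifier in seen and identifier not in duplicates:
            duplicates.append(identifier)
        seen.add(identifier)
    return duplicates
```

A file satisfies the uniqueness requirement when `find_duplicate_identifiers` returns an empty list for its cue identifiers.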

The WebVMT cue timings part of a WebVMT cue block consists of the following components, in the order given:

  1. A WebVMT timestamp representing the start time offset of the cue. The time represented by this WebVMT timestamp must be greater than or equal to the start time offsets of all previous cues in the file.
  2. One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. The string "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
  4. One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  5. Optionally, a WebVMT timestamp representing the end time offset of the cue. The time represented by this WebVMT timestamp must be greater than or equal to the start time offset of the cue.

The WebVMT cue timings give the start and end offsets of the WebVMT cue block. Different cues can overlap. Cues are always listed ordered by their start time.
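The ordering constraints on cue timings can be illustrated with a small Python check. The function name and the representation of cues as (start, end) pairs in seconds are assumptions of this sketch; the end value is None where the optional end timestamp is omitted.

```python
def check_cue_timings(cues):
    """Return a list of problems for (start, end) pairs given in file order.

    Times are in seconds. end may be None because the end timestamp
    is optional.
    """
    problems = []
    latest_start = None
    for index, (start, end) in enumerate(cues):
        # Each start must be >= the start of every previous cue.
        if latest_start is not None and start < latest_start:
            problems.append(f"cue {index}: starts before an earlier cue")
        # An end time, when present, must be >= the cue's own start.
        if end is not None and end < start:
            problems.append(f"cue {index}: ends before it starts")
        latest_start = start if latest_start is None else max(latest_start, start)
    return problems
```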

A WebVMT timestamp consists of the following components, in the order given:

  1. Optionally (required if hours is non-zero):
    1. Two or more ASCII digits, representing the hours as a base ten integer.
    2. A U+003A COLON character (:).
  2. Two ASCII digits, representing the minutes as a base ten integer in the range 0 ≤ minutes ≤ 59.
  3. A U+003A COLON character (:).
  4. Two ASCII digits, representing the seconds as a base ten integer in the range 0 ≤ seconds ≤ 59.
  5. A U+002E FULL STOP character (.).
  6. Three ASCII digits, representing the thousandths of a second seconds-frac as a base ten integer.

A WebVMT timestamp is always interpreted relative to the current playback position of the media data with which the WebVMT file is to be synchronized.
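A timestamp in the form described above can be converted to seconds with a sketch like the following. The names are hypothetical; the pattern follows the component list directly, with optional hours of two or more digits.

```python
import re

# Optional hours (two or more digits), then mm:ss.ttt with minutes and
# seconds limited to 00-59. re.ASCII restricts \d to ASCII digits.
TIMESTAMP = re.compile(
    r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})", re.ASCII
)

def parse_timestamp(text):
    """Convert a timestamp such as "01:24.000" to seconds, or raise ValueError."""
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return (
        int(hours or 0) * 3600
        + int(minutes) * 60
        + int(seconds)
        + int(millis) / 1000
    )
```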

A WebVMT comment block consists of the following components, in the order given:

  1. The string "NOTE" (U+004E LATIN CAPITAL LETTER N, U+004F LATIN CAPITAL LETTER O, U+0054 LATIN CAPITAL LETTER T, U+0045 LATIN CAPITAL LETTER E).
  2. Optionally, the following components, in the order given:
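    1. Either:
      • A U+0020 SPACE character.
      • A U+0009 CHARACTER TABULATION (tab) character.
      • A WebVMT line terminator.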
    2. Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator, except that the entire resulting string must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
  3. A WebVMT line terminator.

A WebVMT comment block is ignored by the parser.

WebVMT Cue Payload

WebVMT metadata text consists of any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator. (In other words, any text that does not have two consecutive WebVMT line terminators and does not start or end with a WebVMT line terminator.)

The string represents a WebVMT command list.

WebVMT metadata text cues are only useful for scripted applications (e.g. using the metadata text track kind in an HTML text track).
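A scripted application might decode a cue payload as sketched below. The function name is hypothetical, and the sketch assumes that each line of the payload holds exactly one command encoded as a JSON object, as in the examples in this document.

```python
import json
import re

def parse_command_list(payload):
    """Split a cue payload into lines and decode each line as a JSON object."""
    commands = []
    # Split on CRLF, LF or CR only, so other characters stay inside the line.
    for line in re.split(r"\r\n|\n|\r", payload):
        command = json.loads(line)  # raises ValueError on malformed JSON
        if not isinstance(command, dict):
            raise ValueError(f"command is not a JSON object: {line!r}")
        commands.append(command)
    return commands
```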

WebVMT Media Settings

The WebVMT media settings list consists of zero or more of the following components, in any order, separated from each other by one or more U+0020 SPACE characters, U+0009 CHARACTER TABULATION (tab) characters, or WebVMT line terminators, except that the string must not contain two consecutive WebVMT line terminators. Each component must not be included more than once per WebVMT media settings list string.

A WebVMT media url setting consists of the following components, in the order given:

  1. The string "url".
  2. A U+003A COLON character (:).
  3. A valid URL.

For the purpose of resolving a URL in the MEDIA block of a WebVMT file, or any URLs in resources referenced from MEDIA blocks of a WebVMT file, if the URL’s scheme is not "data", then the user agent must act as if the URL failed to resolve. If the url value does not match the tileurl attribute of the HTML <track> element, then the tileurl value takes precedence.

A WebVMT media MIME type setting consists of the following components, in the order given:

  1. The string "mimetype".
  2. A U+003A COLON character (:).
  3. A valid MIME type.

A WebVMT media start time setting consists of the following components, in the order given:

  1. The string "starttime".
  2. A U+003A COLON character (:).
  3. A valid global date and time string.

A WebVMT media path setting consists of the following components, in the order given:

  1. The string "path".
  2. A U+003A COLON character (:).
  3. A WebVMT path identifier.
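The four media settings above share a "name:value" shape, which the following Python sketch parses into a dictionary. The function name is hypothetical. The sketch assumes that values contain no whitespace, and it does not check for two consecutive line terminators or validate the URL, MIME type or date syntax.

```python
MEDIA_KEYS = {"url", "mimetype", "starttime", "path"}

def parse_media_settings(text):
    """Parse a media settings list into a dict, rejecting repeated settings."""
    settings = {}
    for token in text.split():
        # Split on the first colon only, since values may contain colons.
        key, sep, value = token.partition(":")
        if not sep or key not in MEDIA_KEYS or not value:
            raise ValueError(f"unrecognised media setting: {token!r}")
        if key in settings:
            raise ValueError(f"duplicate media setting: {key}")
        settings[key] = value
    return settings
```

Splitting on the first colon keeps values such as a date and time string intact, because only the setting name is guaranteed to be colon-free.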

WebVMT Map Settings

The WebVMT map settings list consists of the following components, in any order, separated from each other by one or more U+0020 SPACE characters, U+0009 CHARACTER TABULATION (tab) characters, or WebVMT line terminators, except that the string must not contain two consecutive WebVMT line terminators. Each component must be included once per WebVMT map settings list string.

The WebVMT map settings list defines the WebVMT map state before the first cue.

A WebVMT map center setting consists of a WebVMT location setting.

A WebVMT map zoom setting consists of the following components, in the order given:

  1. The string "rad".
  2. A U+003A COLON character (:).
  3. One or more ASCII digits.
  4. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.

When interpreted as a number, the WebVMT map zoom setting represents the map zoom radius.

A WebVMT location setting consists of the following components, in any order, separated from each other by a U+002C COMMA character (,). Each component must be included once per WebVMT location setting string.

A WebVMT latitude setting consists of the following components, in the order given:

  1. The string "lat".
  2. A U+003A COLON character (:).
  3. A WebVMT latitude.

A WebVMT latitude consists of the following components, in the order given:

  1. Optionally, a U+002D HYPHEN-MINUS character (-).
  2. One or more ASCII digits.
  3. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.

When interpreted as a number, a WebVMT latitude must be in the range -90..+90.

A WebVMT longitude setting consists of the following components, in the order given:

  1. The string "lng".
  2. A U+003A COLON character (:).
  3. A WebVMT longitude.

A WebVMT longitude consists of the following components, in the order given:

  1. Optionally, a U+002D HYPHEN-MINUS character (-).
  2. One or more ASCII digits.
  3. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.

When interpreted as a number, a WebVMT longitude must be in the range -180..+180.
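The location setting syntax and range limits above can be combined in a sketch such as the following. The names are hypothetical; the number pattern follows the latitude and longitude component lists, so a leading digit is required and exponents are not accepted.

```python
import re

# Optional minus sign, one or more digits, optional fractional part.
NUMBER = r"-?[0-9]+(?:\.[0-9]+)?"
LOCATION = re.compile(rf"(lat|lng):({NUMBER}),(lat|lng):({NUMBER})")

def parse_location_setting(text):
    """Parse "lat:<n>,lng:<n>" (in either order) into a (lat, lng) tuple."""
    match = LOCATION.fullmatch(text)
    # Each component must appear exactly once, so reject lat,lat or lng,lng.
    if match is None or match.group(1) == match.group(3):
        raise ValueError(f"invalid location setting: {text!r}")
    values = {
        match.group(1): float(match.group(2)),
        match.group(3): float(match.group(4)),
    }
    if not -90 <= values["lat"] <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= values["lng"] <= 180:
        raise ValueError("longitude out of range")
    return (values["lat"], values["lng"])
```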

WebVMT Commands

A WebVMT command list consists of one or more of the following components in any order, separated from each other by a WebVMT line terminator:

WebVMT Map Commands

A WebVMT map control command consists of one of the following components:

A WebVMT pan command consists of a JSON text representing the following JSON object:

A WebVMT pan parameter list is a JSON object representing the following components in any order:

A WebVMT location attribute list consists of a JSON text representing a list of the following JSON values in any order, separated from each other by a U+002C COMMA character (,):

A WebVMT latitude attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "lat".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number.

When interpreted as a number, a WebVMT latitude attribute must be in the range -90..+90.

A WebVMT longitude attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "lng".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number.

When interpreted as a number, a WebVMT longitude attribute must be in the range -180..+180.

A WebVMT zoom command consists of a JSON text representing the following JSON object:

A WebVMT zoom parameter list is a JSON object representing the following component:

When interpreted as a number, the WebVMT radius attribute represents the map zoom radius.

A WebVMT radius attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "rad".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number greater than zero.
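The numeric constraints on the latitude, longitude and radius attributes can be checked on a decoded JSON object as sketched below. The function name is hypothetical, and requiring all three attributes together is an assumption made for illustration, modelled on the parameters of the circle command examples in this document.

```python
def has_valid_location_and_radius(params):
    """Check lat, lng and rad in a decoded JSON parameter object."""
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    lat, lng, rad = params.get("lat"), params.get("lng"), params.get("rad")
    return (
        is_number(lat) and -90 <= lat <= 90
        and is_number(lng) and -180 <= lng <= 180
        and is_number(rad) and rad > 0
    )
```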

WebVMT Shape Commands

A WebVMT shape annotation command consists of one of the following components:

A WebVMT circle command consists of a JSON text representing the following JSON object:

A WebVMT circle parameter list consists of a JSON object representing the following components in any order:

A WebVMT polygon command consists of a JSON text representing the following JSON object:

A WebVMT polygon parameter list consists of the following JSON object:

A WebVMT polygon points list consists of the following JSON object:

A WebVMT vertices list consists of a JSON array of three or more JSON objects each representing the following components in any order:

WebVMT Path Commands

A WebVMT path annotation command consists of one of the following components:

A WebVMT move command consists of a JSON text representing the following JSON object:

A WebVMT move parameter list is a JSON object representing the following components in any order:

A WebVMT path attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "path".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON string representing a WebVMT path identifier.

A WebVMT line command consists of a JSON text representing the following JSON object:

A WebVMT line parameter list consists of a JSON object representing the following components in any order:

A WebVMT path identifier is any sequence of one or more characters not containing the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

A WebVMT path identifier is a string which uniquely identifies a moving object in the WebVMT file, for example a camera.

WebVMT Animation Subcommand

A WebVMT animation subcommand consists of a JSON text representing the following JSON object:

The WebVMT animation subcommand refers to the attributes of its parent command. The parent command is the animation object.

A WebVMT animation parameter list consists of a JSON array of one or more WebVMT individual animation lists.

A WebVMT individual animation list consists of a JSON object consisting of the following components in any order:

A WebVMT name attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "name".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON string.

A WebVMT name attribute represents the name of the animation attribute.

A WebVMT target value attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "to".
  2. A U+003A COLON character (:).
  3. A JSON value.

The WebVMT target value attribute represents the animation end value.

A WebVMT end time attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "end".
  2. A U+003A COLON character (:).
  3. A JSON string representing a WebVMT timestamp.

A WebVMT end time attribute represents the animation end time.

A WebVMT duration attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "dur".
  2. A U+003A COLON character (:).
  3. A JSON string representing a WebVMT timestamp.

A WebVMT duration attribute represents the difference in value between the animation end time and the animation start time.

Properties Of Cue Sequences

WebVMT File Using Only Nested Cues

A WebVMT file whose cues all comply with the following rule is said to be a WebVMT file using only nested cues.

Given any two cues cue1 and cue2 with start and end time offsets (x1, y1) and (x2, y2) respectively:

  • either cue1 lies fully within cue2, i.e. x1 >= x2 and y1 <= y2;
  • or cue1 fully contains cue2, i.e. x1 <= x2 and y1 >= y2.

The following example matches this definition:

WEBVMT

NOTE Required non-cue blocks omitted for clarity

00:00.000 --> 01:24.000
{ "circle": { "lat": 0, "lng": 0, "rad": 2000 } }

00:00.000 --> 00:44.000
{ "moveto": { "lat": 0, "lng": 0, "path": "cam1" } }
{ "lineto": { "lat": 0.12, "lng": 0.34, "path": "cam1" } }

00:44.000 --> 01:19.000
{ "lineto": { "lat": 0.56, "lng": 0.78, "path": "cam1" } }

01:24.000 --> 05:00.000
{ "circle": { "lat": 0, "lng": 0, "rad": 30000 } }

01:35.000 --> 03:00.000
{ "moveto": { "lat": 0.87, "lng": 0.65, "path": "cam2" } }
{ "lineto": { "lat": 0.43, "lng": 0.21, "path": "cam2" } }

03:00.000 --> 05:00.000
{ "lineto": { "lat": 0, "lng": 0, "path": "cam2" } }
          

Notice how you can express the cues in this WebVMT file as a tree structure:

  • 2km Circle at (0, 0)
    • Line (0, 0) to (0.12, 0.34)
    • Line (0.12, 0.34) to (0.56, 0.78)
  • 30km Circle at (0, 0)
    • Line (0.87, 0.65) to (0.43, 0.21)
    • Line (0.43, 0.21) to (0, 0)

If the file has cues that can’t be expressed in this fashion, then they don’t match the definition of a WebVMT file using only nested cues. For example:

WEBVMT

NOTE Required non-cue blocks omitted for clarity

00:00.000 --> 01:00.000
{ "moveto": { "lat": 0.12, "lng": 0.34, "path": "cam3" } }
{ "lineto": { "lat": 0.56, "lng": 0.78, "path": "cam3" } }

00:30.000 --> 01:30.000
{ "moveto": { "lat": 0.87, "lng": 0.65, "path": "cam4" } }
{ "lineto": { "lat": 0.43, "lng": 0.21, "path": "cam4" } }
          

In this ninety-second example, the two cues partly overlap, with the first ending before the second ends and the second starting before the first ends. This therefore is not a WebVMT file using only nested cues.
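The nesting rule can be tested mechanically, as in the following Python sketch. The function name is hypothetical. Note one assumption: the first example above includes cues that do not overlap at all, such as 00:00.000 to 00:44.000 and 00:44.000 to 01:19.000, so this sketch also accepts pairs of cues that are disjoint in time, in addition to the two cases in the rule.

```python
def uses_only_nested_cues(cues):
    """Check (start, end) pairs, in seconds, for partially overlapping cues."""
    for i, (x1, y1) in enumerate(cues):
        for x2, y2 in cues[i + 1:]:
            within = x1 >= x2 and y1 <= y2      # cue1 lies fully within cue2
            contains = x1 <= x2 and y1 >= y2    # cue1 fully contains cue2
            disjoint = y1 <= x2 or y2 <= x1     # assumption: no overlap is fine
            if not (within or contains or disjoint):
                return False
    return True
```

With the cue times of the first example expressed in seconds, the function returns True. For the second example, with cues (0, 60) and (30, 90), it returns False.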

Known Issues

This section captures issues which have been identified, but are not fully documented in this explanatory specification.

As the specification develops, issues will be moved out of this section and documented elsewhere, until the section is no longer needed and can be removed.

Planned Features

This section lists potential features which have been identified during the development process, but have not yet matured to a full design specification.

Features which appear in this section warrant further investigation, but are not guaranteed to appear in the final specification.

Markers

An image linked to and displayed at an offset from a geolocation.

Labels

A text string linked to and displayed at an offset from a geolocation.

Tile Shortcuts

Shortcuts to popular tile URLs for easy access and to help avoid URL syntax errors.

Layers

Syntax to allow more than one layer of map tiles to be specified, e.g. 'map' and 'satellite' layers.

This should be functional, but remain lightweight.

Multiple APIs

The current tech demo is based on the Leaflet API, but should be broadened to support other web map APIs, e.g. OpenLayers.

A hot-swap feature would allow users to switch API on-the-fly to take advantage of the unique features supported by different APIs, e.g. Street View.

Camera Direction

Camera orientation may not match the direction of travel, or may be dynamic, e.g. for Augmented Reality.

Data Sync

Mobile video devices often collect additional data, e.g. drones can be used as sensor platforms and dashcams record vehicle telemetry such as speed and acceleration.

Allowing the option to embed XML data within cues would permit synchronisation of arbitrary data with location and video without compromising interoperability, in a similar fashion to the GPX <extensions> element.

Co-ordinate Reference Systems

Although WebVMT was originally conceived for Earth-based use, spatial data in other environments could be accommodated by specifying the co-ordinate reference system, for example locations on another planet such as Mars, or in an artificial environment such as a video game.

Planned Interfaces

This section lists interfaces which have been identified during the development process, but have not yet matured to a full design specification.

VMTCue Interface

Expose WebVMT Cues in the DOM API, based on the HTML5 DataCue interface. For example:

[Exposed=Window,
 Constructor(double startTime, double endTime, DOMString text),
 Constructor(double startTime, DOMString text)]
interface VMTCue : DataCue {
  attribute boolean noEndTime;
};

This is analogous to the VTTCue interface.

VMTMap Interface

Expose WebVMT Maps in the DOM API. For example:

[Exposed=Window,
 Constructor]
interface VMTMap {
  attribute double centerLat;
  attribute double centerLng;
  attribute double zoomRad;
  attribute object getMap;
};

This is analogous to the VTTRegion interface.