WebVMT: The Web Video Map Tracks Format

This specification defines WebVMT, the Web Video Map Tracks format, which is an enabling technology whose main use is for marking up external metadata track resources in connection with the HTML <track> element. WebVMT files provide map presentation, annotation and interpolation synchronized to web media content, and more generally any form of data that is time-aligned with audio or video content, including those from location-aware devices such as dashcams, drones and smartphones.

Use Cases

This section details example scenarios in which WebVMT can add significant value with identified benefits.

Coastguard/Mountain Rescue

A missing person is reported to the rescue services, who deploy a drone to search inaccessible areas of coastline or moorland for their target. The drone relays back a live video stream from its camera and geolocation data from its GPS receiver to a remote human operator who is piloting it.

As the search continues, the operator spots a target on the video feed and can instantly call up an electronic map, synchronized to the video, which has been automatically following the drone’s position and plotting its ground track. The display gives the operator immediate context for the video, and allows them to override the automatic map control and zoom in to pinpoint the target’s precise location from the features visible in the video and on the map/satellite view. They mark the location and then zoom out to assess the surrounding terrain and advise the recovery team of the best approach to the target. For example, the terrain may dictate very different approach routes if either the person has twisted their ankle at the top of a cliff, or has fallen and is lying at the bottom of the same cliff, though the co-ordinates are almost identical in both cases.

The operator has been able to make important decisions quickly, which may be life critical, and deploy recovery resources effectively.

Rapid decision making;
Effective resource deployment.

Area Survey

A survey drone is equipped with a camera which records an image of the ground directly below it. The pilot is a remote human operator, tasked with surveying a defined area from a particular height in order to capture the required data.

As the survey progresses, zones are automatically marked on the map to represent areas which have been included. Once the drone has finished its sweep, the operator can quickly confirm whether the required area has been completely covered. If any areas have been missed, the pilot can use the map to navigate and make additional passes to fill the gaps, before returning to base.

Adding WebVMT files to the survey archive provides a geospatial index to the videos, allowing a particular geographic location to be found more rapidly by virtue of their small file size in comparison to their linked media. Online video archives can be indexed more quickly using this web-friendly format.

The operator has been easily able to verify the quality of their own work and correct any errors, saving time and additional effort in redeployment. Video footage has been indexed by geolocation rapidly and in a search-engine-friendly format.

Autonomous quality assurance;
Cost saving;
Rapid archive indexing for search engines.

Outdoor Trails

An outdoor sportsperson, e.g. snowboarder or cyclist, is equipped with a helmet camera and/or mobile phone to record video footage and GPS data. They set off to find new challenges and practice their skills, e.g. off-piste or on mountain trails, and discover new routes and areas that they would like to explore in future, chatting to the camera as they go. Afterwards, they upload the video to share their experience with the online community, so others can quickly identify locations of particularly interesting sections of the featured trail. Using the synchronized map view in their browser, community members can easily see where they need to go in order to explore these places for themselves.

The operator has been able to fully engage in their sporting activity, without making any written notes, while simultaneously recording the details needed to guide others to the same locations. Their changing location over time can also be used to calculate speed and distance information, which can be displayed alongside the footage.

Non-invasive capture;
Information sharing;
Speed and distance calculation.

TV Sports Coverage

A TV production company is covering a sports event that takes place over a large area, e.g. rallying, road cycling or sailing, using a number of mobile video devices including competitor cams, e.g. dash cams or helmet cams, and drones to provide shots of inaccessible areas, e.g. remote terrain or over water.

Feeds from all the cameras are streamed to the production control room, where their geolocation data are combined on a map showing the locations of every competitor and camera, each labelled for easy identification. The live map enables the director to quickly choose the best shot and anticipate where and when to deploy their drone cameras to catch competitors at critical locations on the course as the competition develops in real time.

Multiple operators can function concurrently, both autonomously and under central direction. Mobile assets can be monitored and deployed from an operations centre to provide optimum coverage of the developing live event.

Multiple mobile video devices;
Real time asset management.

Proxy Explorer

Important details of a remote area have been captured on video. It is not possible to revisit the location for safety reasons or because it has physically changed in the intervening time. Footage can be retrospectively geotagged against a concurrent map to allow the viewer to better interpret and identify features seen in the footage. Explanatory annotations can be added to the WebVMT file to help future viewers' understanding and aggregate the collective analysis.

Multiple operators can contribute their observations to provide a group analysis, iteratively adding new details and discarding out-of-date information. Experts can offer insight about filmed locations, which would otherwise be inaccessible to them.

Remote analysis of inaccessible locations;
Knowledge aggregation and sharing for archive footage.

Treasure Hunt

A TV production company designs a new game show which involves competitors searching for targets across a wide area, with an operations centre remotely monitoring their progress and providing updates. Competitors are equipped with body-worn video or helmet cameras to relay footage of their view.

Geolocation context allows central operators to better understand the participants' actions and to remotely direct them more efficiently. Competitors' positions can be displayed to the TV audience on annotated 2D- or 3D-maps for clearer presentation.

Speed and distance calculation;
Knowledge aggregation and sharing for real-time footage.

Swarm Monitoring

A swarm of drones is deployed to perform a task, and their operations are monitored centrally. Geolocation details of the swarm are automatically collated and broadcast to the drone pilots, showing the locations of all the drones, each circled with a suitable safety zone to warn operators if two units are flying in close proximity.

Pilots are safely able to operate either autonomously or under the direction of central control. Extra zonal information can be added to the operators' maps to show the outer perimeter of their operating area and warn of fixed aerial hazards, e.g. a radio mast, or transient hazards, e.g. a helicopter.

Static and dynamic hazard indication;
Central swarm monitoring;
Autonomous swarm monitoring.

Crisis Response

Disaster strikes, e.g. hurricane or tsunami, and emergency response teams are deployed to the affected area. However, it is difficult to verify which problems people are facing, what resources would help them and exactly where these events are occurring. Maps are unreliable as the infrastructure has been damaged, though people on the ground have the relevant knowledge if it could be reliably recorded and shared.

Anyone with a basic smartphone could video events with reliable geospatial data, as GPS receivers can operate without the need for a mobile phone signal by using satellite data, to accurately document the problems they face. Even if the cell network is not operational, this information can be physically delivered to crisis coordinators to notify them of the issues that need to be addressed, including accurate location data in a common format. Response teams can quickly search archived video by location to verify latest updates with recent context. Crisis events can be reliably recorded, knowledge can be shared and aggregated, and relief resources can be accurately targeted and deployed to the correct locations.

Reliable dispersed information gathering and sharing;
Accurate resource deployment.

Police Evidence

A web-based police system is established to allow dashcam video evidence of driving offences to be submitted digitally by members of the public who have witnessed them. Detectives are able to identify the time and vehicles involved directly from the uploaded footage, and accurately determine the location at which the incident occurred from the digital timed metadata included.

The ability to accept open format data also makes the system available to cyclists and pedestrians who can record video with location on their helmet cameras and smartphones respectively, providing wider access to the service beyond the dashcam community. Metadata, e.g. location, from different video manufacturers is often recorded in mutually-incompatible formats, but WebVMT support enables synchronized location (and other) data to be extracted from recordings using manufacturers' or community tools, without affecting source video integrity, and submitted to the police system in a common format, significantly reducing development costs.

Officers have been able to identify incident locations quickly and accurately, without sacrificing evidence integrity. The online service has been made available to a wider audience of drivers, cyclists and pedestrians, without incurring additional development costs.

Accurate location with evidence integrity preserved;
Development costs reduced;
Service extended to a wider audience.

Area Monitoring

An area of interest is monitored operationally by a collection of different mobile video devices, e.g. drones, body-worn video, helicopter, etc. Video footage, possibly in different formats, is added to an archive with location (and other) metadata in a common format which forms a time-location index suitable for rapid parsing by a web crawler. Users can submit online queries to search by location and return a time-ordered sequence of video frame stills captured within a radial distance of the chosen location. Alternatively, sensor data can be searched, e.g. for high readings, to return matching geotagged video frames for further analysis.

Video archives can be quickly indexed using a common metadata format regardless of video encoding, e.g. MPEG, WebM, OGG, and video files are only accessed in case of a positive search result, which reduces bandwidth in comparison to embedded metadata. Linked files also allow different security permissions to be applied to the crawling and querying processes, so an AI algorithm can be authorised to read metadata without being able to access image content if there are security concerns over data privacy, e.g. illicit facial recognition.

Homogenized video metadata from disparate sources;
Reduced search bandwidth;
Structured security support;
Web search engine compatible.

Vehicle Collision

Dashcam footage is searched to automatically identify vehicle collisions from impact acceleration profiles recorded in video metadata. Dashcam manufacturers typically embed metadata in an unpublished format and provide a proprietary video player to allow users to display it. Exporting embedded metadata to a linked file in a web-friendly format enables searchable video archive data to be shared quickly and easily, without affecting evidence integrity, and to be accessed through a common web interface.

Vehicles can be automatically monitored using a low-cost dashcam and web-based tools to ensure that collisions are accurately recorded by drivers and that commercial vehicles remain safe and undamaged. Interoperability means that users are not limited to a particular brand and can share evidence with insurers and the police in a common format without damaging its integrity.

Accurate vehicle collision detection;
Common format for data sharing;
Web search support;
Evidence integrity preserved.

Golden Tutorial

Augmented reality (AR) software is used to control assets or view content in situ at a particular location. For example, nearby street lights can be switched off or on by a service engineer for maintenance purposes, or an architect can see how their structural design integrates with the surrounding landscape at its proposed location before any building work has started.

Video footage can be recorded with location, camera orientation and other metadata so AR overlays can be generated on demand. Such recordings can be used to demonstrate how AR content is displayed and controlled in order to educate users with a 'golden tutorial', to provide 'proof of action' as evidence of work done for auditing purposes, or to create example data for AR software testing and debugging.

Accurate AR video and data recording;
Improved AR software development.

Virtual Guide

A user triggers an audio track which provides guidance about the local area or instruction for a known object, e.g. a Web of Things (WoT) device at that location. The audio timeline is synchronized with events that can display AR content, control WoT devices and display points of interest on a map which provide guidance with real world context by highlighting places or objects of interest and showing possible actions.

Users can be guided by a virtual assistant through an area of interest or sequence of actions augmented with AR/VR and WoT devices to visualise events and by an annotated map or model to provide additional geospatial context. Greater insight is given to the user by showing detailed views of the location on a map or internal structure of the identified object using a virtual model.

Contextual guidance provided in situ;
Concurrent operation with AR/VR video;
Integration with Web of Things.

Proposed Solution

This proposal constitutes a lightweight markup language to synchronise video with geolocation data for display on electronic maps, such as OpenStreetMap. It offers presentational control of the map display, e.g. pan and zoom, and annotation to highlight map features to the viewer, e.g. paths and zones.

WebVMT (Web Video Map Tracks) format is intended for marking up external map track resources, and its main use is for files synchronising video content with an annotated map presentation. Ideas have been borrowed from existing W3C formats, including WebVTT's HTML binding and its block and cue structures, and SVG's approach to drawing, in order to display output on an electronic map.

The format mimics WebVTT's structure and syntax for media synchronisation, with cue details listed in an accessible text-based file linked to a <video> or <audio> DOM element by a child <track> element in an HTML document.

<!doctype html>
<html>
  <head>
    <title>WebVMT Basic Example</title>
  </head>
  <body>
    <!-- Video display -->
    <video controls width="640" height="360">
      <source src="video.mp4" type="video/mp4">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png?key=VALID_OSM_KEY">
      Your browser does not support the video tag.
    </video>
    <!-- Map display -->
    <div id="vmt-map" style="height: 360px; width:640px;"></div>
  </body>
</html>

The WebVMT format file, e.g. maptrack.vmt, contains the map cues associated with the video, e.g. video.mp4.

The meaning of the for and tileurl attributes for user agents is an open question. Initial solutions can be built using JavaScript, with existing map libraries such as Leaflet, though the vision is that future user agents will handle map rendering in the longer term.
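
As an illustration of what a script-based implementation has to do with a tileurl template, the sketch below fills in the {s}, {z}, {x} and {y} placeholders for the tile containing a given location. It assumes the widely used Web Mercator 'slippy map' tiling scheme, which WebVMT itself does not mandate; the function names, the subdomain default and the zoom level are illustrative only. Python is used for brevity, and a browser implementation would normally delegate this work to a map library such as Leaflet.

```python
import math

def tile_indices(lat, lng, zoom):
    """Return (x, y) tile indices for a WGS84 location.

    Assumes the common Web Mercator 'slippy map' scheme (an assumption,
    not something WebVMT specifies).
    """
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def expand_tile_url(template, lat, lng, zoom, subdomain="a"):
    """Substitute the {s}, {z}, {x} and {y} placeholders in a tile URL template."""
    x, y = tile_indices(lat, lng, zoom)
    return (template.replace("{s}", subdomain)
                    .replace("{z}", str(zoom))
                    .replace("{x}", str(x))
                    .replace("{y}", str(y)))

# Tile containing the map centre used in the Tower Bridge example.
url = expand_tile_url(
    "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png", 51.506, -0.076, 15)
```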

Map Cues

Map cues display their payload between a start time and an end time. The cue end time may be omitted to represent an unknown time.
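
The timing line of a cue can be read with a small parser. The following is a minimal sketch, assuming timestamps of the form hh:mm:ss.ttt as used in the examples in this document; it does not handle every timestamp variant (for instance negative cue times), and the function names are hypothetical.

```python
import re

# Timestamp form assumed from the examples: hh:mm:ss.ttt
TIMESTAMP = r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})"
CUE_TIMING = re.compile(rf"^{TIMESTAMP}\s+-->(?:\s+{TIMESTAMP})?\s*$")

def to_seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (strings of digits) to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0

def parse_cue_timing(line):
    """Return (start, end) in seconds; end is None when the end time is omitted."""
    match = CUE_TIMING.match(line)
    if match is None:
        raise ValueError(f"not a cue timing line: {line!r}")
    groups = match.groups()
    start = to_seconds(*groups[:4])
    end = to_seconds(*groups[4:]) if groups[4] is not None else None
    return start, end

bounded = parse_cue_timing("00:00:02.000 --> 00:00:05.000")
unbounded = parse_cue_timing("00:00:03.000 -->")
```

Returning None for a missing end time keeps the unknown end distinct from a zero-length cue, whose start and end times are equal.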

Hello World

Here is a sample WebVMT file with a cue highlighting Tower Bridge in London on a static map.

WEBVMT

MEDIA
url:TowerBridge.mp4
mime-type:video/mp4

MAP
lat:51.506 lng:-0.076
rad:250

00:00:02.000 --> 00:00:05.000
{ "move-to":
  { "lat": 51.504362, "lng": -0.076153 }
}
{ "line-to":
  { "lat": 51.506646, "lng": -0.074651 }
}

Map Presentation

Cues also allow dynamic presentation to pan and zoom the map. This example focusses attention on the Tower of London.

Cues without end times are displayed until the end of the video.

WEBVMT

MEDIA
url:../movies/TowerOfLondon.webm
mime-type:video/webm

MAP
lat:51.162 lng:-0.143
rad:20000

00:00:03.000 -->
{ "pan-to":
  { "lat": 51.508, "lng": -0.077, "end": "00:00:05.000" }
}

00:00:06.000 -->
{ "zoom":
  { "rad": 250 }
}

Comments

Comments are blocks that are preceded by a blank line, start with the word NOTE (followed by a space or newline), and end at the first blank line.

Comment Block

Comment block format is identical to WebVTT.

WEBVMT

NOTE Associated video

MEDIA
url:/home/myuser/movies/TowerLandmarks.ogg
mime-type:video/ogg

NOTE Map config

MAP
lat:51.506 lng:-0.076
rad:500

NOTE Tower Bridge

00:00:01.000 --> 00:00:05.000
{ "move-to":
  { "lat": 51.504362, "lng": -0.076153 }
}
{ "line-to":
  { "lat": 51.506646, "lng": -0.074651 }
}

NOTE City Hall

00:00:02.000 -->
{ "circle":
  { "lat": 51.504789, "lng": -0.078642, "rad": 20 }
}

NOTE Tower Of London
This line is also part of the comment

00:00:03.000 --> 00:00:04.000
{ "polygon":
  { "perim":
    [ { "lat": 51.507193, "lng": -0.074844 },
      { "lat": 51.508756, "lng": -0.074716 },
      { "lat": 51.509036, "lng": -0.075638 },
      { "lat": 51.508929, "lng": -0.077162 },
      { "lat": 51.507727, "lng": -0.077848 },
      { "lat": 51.507220, "lng": -0.075767 }
    ]
  }
}

Styling

Display style is controlled by CSS, which may be embedded in HTML or within the WebVMT file.

CSS Style in HTML

In this example, an HTML page has a CSS style sheet in a <style> element that styles map cues for the video, e.g. drawing lines in red.

<!doctype html>
<html>
  <head>
    <title>WebVMT Style Example</title>
    <style>
      video::cue {
        stroke: red;
        stroke-opacity: 0.9;
      }
    </style>
  </head>
  <body>
    <video controls width="640" height="360">
      <source src="video.mp4" type="video/mp4">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="https://api2.ordnancesurvey.co.uk/mapping_api/v1/service/zxy/EPSG%3A3857/Outdoor%203857/{z}/{x}/{y}.png?key=VALID_OS_KEY">
      Your browser does not support the video tag.
    </video>
    <div id="vmt-map" style="height: 360px; width:640px;"></div>
  </body>
</html>

CSS Style Block

Style block format is similar to WebVTT.

CSS style sheets can also be embedded within WebVMT files. Style blocks are placed after any headers but before the first cue, and start with the word STYLE.

Comment blocks can be interleaved with style blocks.

WEBVMT

MEDIA
url:http://example.com/movies/Greenwich.mp4
mime-type:video/mp4

MAP
lat:51.478 lng:-0.001
rad:50

STYLE
::cue {
  stroke: red;
}

NOTE Comments are allowed between style blocks

STYLE
::cue {
  stroke-opacity: 0.9;
}
/* Style blocks cannot use blank lines nor "dash dash greater than" */

NOTE Prime Meridian marker

00:00:00.000 -->
{ "move-to":
  { "lat":51.477901, "lng": -0.001466 }
}
{ "line-to":
  { "lat":51.477946, "lng": -0.001466 }
}

NOTE Style blocks may not appear after the first cue

Data Synchronization

Arbitrary data may be associated with a WebVMT cue using a sync command, in a similar fashion to the GPX <extensions> element.

WEBVMT

NOTE Associated video

MEDIA
url:Animals.mp4
mime-type:video/mp4

NOTE Map config

MAP
lat:51.1618 lng:-0.1428
rad:200

NOTE Cat, top left, after 5 secs until 25 secs

00:00:05.000 --> 00:00:25.000
{ "sync": { "type": "org.ogc.geoai.example", "data":
  { "animal": "cat", "frame-zone": "top-left" }
} }

NOTE Dog, mid right, after 10 secs until 40 secs

00:00:10.000 --> 00:00:40.000
{ "sync": { "type": "org.ogc.geoai.example", "data":
  { "animal": "dog", "frame-zone": "middle-right" }
} }
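
Once the sync commands have been extracted, the data can be queried by media time without opening the video. The sketch below is one possible approach, using the cue times and data from the example above; treating a cue as active from its start time up to, but not including, its end time is an assumption borrowed from how text track cues generally behave, and active_data is a hypothetical helper.

```python
# (start, end, data) tuples taken from the example cues; end is None if unbounded.
cues = [
    (5.0, 25.0, {"animal": "cat", "frame-zone": "top-left"}),
    (10.0, 40.0, {"animal": "dog", "frame-zone": "middle-right"}),
]

def active_data(cues, t):
    """Return the sync data of every cue active at media time t (seconds)."""
    return [data for start, end, data in cues
            if start <= t and (end is None or t < end)]

animals_at_12s = [d["animal"] for d in active_data(cues, 12.0)]
animals_at_30s = [d["animal"] for d in active_data(cues, 30.0)]
```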

Interpolation

Data values may be interpolated using an interpolation end time to record interpolation end values in a WebVMT cue.

Sensor data can be interpolated between sample points to provide intermediate values where necessary, while retaining the original source data sample values.

Three interpolation schemes are supported:

Step: the value remains constant until the next sample time, e.g. vehicle gear selection - see Step Interpolation;
Linear: the value is linearly interpolated to the next sample time, e.g. temperature - see Linear Interpolation;
Discrete: the value is only valid instantaneously at the sample time, e.g. headcount in a video frame - see Discrete Interpolation.
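
The three schemes can be summarised in a few lines of code. This is a minimal sketch of the arithmetic only, not of how a WebVMT parser selects a scheme; the function signature and the sample values are illustrative assumptions.

```python
def interpolate(scheme, t, t0, v0, t1=None, v1=None):
    """Value at time t for a sample v0 taken at t0.

    t1 is the next sample time and v1 the value at that time
    (v1 is only needed for linear interpolation).
    Returns None when no value is defined at t.
    """
    if scheme == "discrete":
        # Only valid at the sample instant itself.
        return v0 if t == t0 else None
    if t1 is None or not (t0 <= t < t1):
        return None  # outside the sample interval
    if scheme == "step":
        return v0
    if scheme == "linear":
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError(f"unknown scheme: {scheme}")

gear = interpolate("step", 5.0, 2.0, 4, 6.0)                     # constant over 2 s to 6 s
temperature = interpolate("linear", 5.0, 4.0, 14.0, 6.0, 16.0)   # halfway between 14 and 16
headcount = interpolate("discrete", 4.0, 4.0, 12)                # only at 4 s
```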

Step Interpolation

A stepwise-interpolated value, e.g. vehicle gear selection, remains constant until the next sample time.

WEBVMT

NOTE Required blocks omitted for clarity

NOTE Step interpolation of sensor1 data
     gear = 4 after 2 secs until 6 secs

00:00:02.000 --> 00:00:06.000
{ "sync":
  { "type": "org.webvmt.example1", "id": "sensor1", "data":
    { "gear": "4" }
  }
}

NOTE Step interpolation of sensor1 data
     gear = 5 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{ "sync":
  { "type": "org.webvmt.example1", "id": "sensor1", "data":
    { "gear": "5" }
  }
}

Linear Interpolation

A linearly-interpolated value, e.g. temperature, changes to a final value at the next sample time in direct proportion to the elapsed sample interval.

WEBVMT

NOTE Required blocks omitted for clarity

NOTE Linear interpolation of sensor2 data
     temperature = 14 -> 16 after 4 secs until 6 secs

00:00:04.000 --> 00:00:06.000
{ "sync":
  { "type": "org.webvmt.example2", "id": "sensor2", "data":
    { "temperature": "14"}
  }
}
{ "sync":
  { "type": "org.webvmt.example2", "id": "sensor2", "end": "00:00:06.000", "data":
    { "temperature": "16"}
  }
}

NOTE Linear interpolation of sensor2 data
     temperature = 16 -> 19 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{ "sync":
  { "type": "org.webvmt.example2", "id": "sensor2", "end": "00:00:09.000", "data":
    { "temperature": "19"}
  }
}

Discrete Interpolation

A discretely-interpolated value, e.g. headcount in a video frame, is only valid instantaneously at the sample time.

WEBVMT

NOTE Required blocks omitted for clarity

NOTE Discrete interpolation of sensor3 data
     headcount = 12 at 4 secs

00:00:04.000 --> 00:00:04.000
{ "sync":
  { "type": "org.webvmt.example3", "id": "sensor3", "data":
    { "headcount": "12" }
  }
}

NOTE Discrete interpolation of sensor3 data
     headcount = 34 at 6 secs

00:00:06.000 --> 00:00:06.000
{ "sync":
  { "type": "org.webvmt.example3", "id": "sensor3", "data":
    { "headcount": "34" }
  }
}

Live Stream Interpolation

Live streams can be recorded with interpolation using unbounded cues, i.e. a cue with an unknown end time.

In this example, the result is identical to the previous step interpolation example but without requiring knowledge of any future data values during the live capture process.

WEBVMT

NOTE Required blocks omitted for clarity

NOTE Step interpolation of live1 data
     gear = 4 after 4 secs until next update

00:00:04.000 -->
{ "sync":
  { "type": "org.webvmt.example1", "id": "live1", "data":
    { "gear": "4" }
  }
}

NOTE Step interpolation of live1 data
     gear = 5 after 6 secs until next update

00:00:06.000 -->
{ "sync":
  { "type": "org.webvmt.example1", "id": "live1", "data":
    { "gear": "5" }
  }
}

NOTE End (step) interpolation of live1 data
     gear = 5 at 9 secs

00:00:09.000 --> 00:00:09.000
{ "sync":
  { "type": "org.webvmt.example1", "id": "live1", "data":
    { "gear": "5" }
  }
}

In the next example, the result is identical to the previous linear interpolation example but without requiring knowledge of any future data values during the live capture process.

WEBVMT

NOTE Required blocks omitted for clarity

NOTE Linear interpolation of live2 data
     temperature = 14 after 4 secs until next update

00:00:04.000 -->
{ "sync":
  { "type": "org.webvmt.example2", "id": "live2", "data":
    { "temperature": "14" }
  }
}
{ "sync":
  { "type": "org.webvmt.example2", "id": "live2", "end": "00:00:06.000", "data":
    { "temperature": "16" }
  }
}

NOTE Linear interpolation of live2 data
     temperature = 16 after 6 secs until next update

00:00:06.000 -->
{ "sync":
  { "type": "org.webvmt.example2", "id": "live2", "end": "00:00:09.000", "data":
    { "temperature": "19" }
  }
}

NOTE End (linear) interpolation of live2 data
     temperature = 19 at 9 secs

00:00:09.000 --> 00:00:09.000
{ "sync":
  { "type": "org.webvmt.example2", "id": "live2", "data":
    { "temperature": "19" }
  }
}

Values cannot always be interpolated during capture because future data are unknown, e.g. for linear interpolation, but they can be interpolated correctly during subsequent playback, once the end values are known.

Path Interpolation

A WebVMT path describes the trajectory of a moving object which consists of a timed sequence of locations. The object's location may be interpolated between consecutive values in the sequence to calculate the distance travelled over time.

The path attribute may be set to identify an individual path. This allows a path:

to be styled with CSS, e.g. colour;
to be associated with speed and distance attributes during playback;
to be uniquely associated with the video footage.

In this example, an interpolated path is traced from London to Brighton:

WEBVMT

NOTE Associated video

MEDIA
url:LondonBrighton.mp4
mime-type:video/mp4
start-time:2018-02-19T12:34:56.789Z
path:cam-1

NOTE Map config

MAP
lat:51.1618 lng:-0.1428
rad:20000

NOTE London overview

00:00:01.000 -->
{ "pan-to":
  { "lat": 51.4952, "lng": -0.1441 }
}

00:00:02.000 -->
{ "zoom":
  { "rad": 10000 }
}

NOTE From London Victoria...

00:00:03.000 -->
{ "pan-to":
  { "lat": 50.830553, "lng": -0.141706, "end": "00:00:25.000" }
}
{ "move-to":
  { "lat": 51.494477, "lng": -0.144753, "path": "cam-1" }
}
{ "line-to":
  { "lat": 51.155958, "lng": -0.16089, "path": "cam-1", "end": "00:00:10.000" }
}

NOTE ...via Gatwick Airport...

00:00:10.000 -->
{ "line-to":
  { "lat": 50.830553, "lng": -0.141706, "path": "cam-1", "end": "00:00:25.000" }
}

NOTE ...to Brighton (at 00:00:25.000)

00:00:27.000 -->
{ "zoom":
  { "rad": 20000 }
}
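
Distance and speed along a path segment can be estimated from its end locations and times. The sketch below uses the haversine formula on a spherical Earth, which is an approximation to WGS84 chosen here for simplicity and not prescribed by WebVMT; the coordinates and times are those of the first segment in the example above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two locations in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# London Victoria to Gatwick Airport, from 3 s to 10 s of media time.
distance = haversine_m(51.494477, -0.144753, 51.155958, -0.16089)
speed = distance / (10.0 - 3.0)  # metres per second of media time
```

Note that the speed is measured against the media timeline, so it only equals the real-world speed if the footage plays at the rate at which it was captured.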

Zone Interpolation

A WebVMT zone describes an area or volume. The locations of a zone's vertices may be interpolated and the zone may be animated in this way.

The zone attribute may be set to identify an individual zone. This allows a zone:

to be styled with CSS, e.g. colour;
to be associated with a path;
to be uniquely associated with the video footage.

This example tracks a drone with a circular 10-meter safety zone around it.

WEBVMT

NOTE Associated video

MEDIA
url:SafeDrone.mp4
mime-type:video/mp4

NOTE Map config

MAP
lat:51.0130 lng:-0.0015
rad:1000

NOTE Drone starts at (51.0130, -0.0015)

00:00:05.000 -->
{ "pan-to":
  { "lat": 51.0070, "lng": -0.0020, "end": "00:00:25.000" }
}
{ "move-to":
  { "lat": 51.0130, "lng": -0.0015, "path": "drone-1" }
}
{ "line-to":
  { "lat": 51.0090, "lng": -0.0017, "path": "drone-1", "end": "00:00:10.000" }
}

NOTE Safety zone

00:00:05.000 --> 00:00:10.000
{ "circle":
  { "lat": 51.0130, "lng": -0.0015, "rad": 10, "zone": "safety-1" }
}
{ "circle":
  { "lat": 51.0090, "lng": -0.0017, "rad": 10, "zone": "safety-1", "end": "00:00:10.000" }
}

NOTE Drone arrives at (51.0090, -0.0017)

00:00:10.000 -->
{ "line-to":
  { "lat": 51.0070, "lng": -0.0020, "path": "drone-1", "end": "00:00:25.000" }
}
{ "circle":
  { "lat": 51.0070, "lng": -0.0020, "rad": 10, "zone": "safety-1", "end": "00:00:25.000" }
}

NOTE Drone ends at (51.0070, -0.0020)
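
For a circular zone, animation amounts to interpolating the circle's centre and radius between the two cue states. The following sketch assumes simple component-wise linear interpolation, which is adequate over the short distances in this example but is an assumption rather than a stated rule; interpolate_circle is a hypothetical helper.

```python
def interpolate_circle(start, end, t0, t1, t):
    """Linearly interpolate a circle's centre and radius at time t (t0 <= t <= t1)."""
    f = (t - t0) / (t1 - t0)
    return {key: start[key] + (end[key] - start[key]) * f
            for key in ("lat", "lng", "rad")}

# Safety zone states from the example, at 5 s and 10 s.
zone_start = {"lat": 51.0130, "lng": -0.0015, "rad": 10}
zone_end = {"lat": 51.0090, "lng": -0.0017, "rad": 10}

# Zone position halfway through the first leg of the flight.
zone_mid = interpolate_circle(zone_start, zone_end, 5.0, 10.0, 7.5)
```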

YouTube Integration

Embedded YouTube content can be displayed using an <iframe> element, specifying the unique 11-character content identifier for the posted video, using the official YouTube IFrame API with the JavaScript API enabled.

Hello YouTube

A child <track> pseudo-element within the <iframe> links it with WebVMT using the same syntax as for the <video> DOM element.

<!doctype html>
<html>
  <head>
    <title>WebVMT YouTube Example</title>
  </head>
  <body>
    <!-- Video display -->
    <iframe src="http://www.youtube.com/embed/YOUTUBE_VIDEO_ID?enablejsapi=1" width="640" height="360" frameborder="0">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="mapbox://styles/mapbox/streets-v9">
    </iframe>
    <!-- Map display -->
    <div id="vmt-map" style="height: 360px; width:640px;"></div>
  </body>
</html>

Note that the <track> pseudo-element is actually replaced by the <iframe> content when the page is loaded.

The url in the MEDIA block should match the src attribute of the <iframe> element without the query.

WEBVMT

NOTE Associated YouTube video

MEDIA
url:http://www.youtube.com/embed/YOUTUBE_VIDEO_ID
mime-type:video/mp4
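
Deriving the MEDIA url from the <iframe> src attribute is a matter of dropping the query string. A minimal sketch using the Python standard library follows; the helper name is hypothetical, and removing the fragment as well as the query is a choice made here for tidiness.

```python
from urllib.parse import urlsplit, urlunsplit

def media_url_from_src(src):
    """Return the iframe src with its query (and any fragment) removed."""
    parts = urlsplit(src)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

media_url = media_url_from_src(
    "http://www.youtube.com/embed/YOUTUBE_VIDEO_ID?enablejsapi=1")
```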

Data Model

The data model of WebVMT consists of four key elements: the linked media file, the video viewport, cues, and the map viewport. The linked media file contains audio or video data with which cues are synchronized. The video viewport is the rendering area for video output. Cues are containers consisting of a set of metadata lines. The map viewport is the rendering area for metadata output, for example graphical annotations overlaid on an online map.

Overview

The WebVMT file is a container file for chunks of data that are time-aligned with a video or audio resource. It can therefore be regarded as a serialisation format for time-aligned data.

A WebVMT file starts with a header and then contains a series of data blocks. If a data block has a start time, it is called a WebVMT cue. A comment is another kind of data block.
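
A first pass over a file can therefore separate it into blocks at blank lines and label each one. The sketch below is a rough illustration based on the block keywords that appear in the examples in this document (WEBVMT, NOTE, MEDIA, MAP, STYLE); it is not a conforming parser, and it assumes a cue block begins with its timing line.

```python
import re

# Block keywords as they appear in the examples in this document.
KEYWORDS = {"WEBVMT": "header", "NOTE": "comment", "MEDIA": "media",
            "MAP": "map", "STYLE": "style"}

def classify_blocks(text):
    """Return a list of (kind, block_text) pairs, split at blank lines."""
    result = []
    for block in re.split(r"\n\s*\n", text.strip()):
        first_line = block.lstrip().split("\n", 1)[0]
        words = first_line.split()
        if "-->" in first_line:
            kind = "cue"  # a data block with a start time
        elif words and words[0] in KEYWORDS:
            kind = KEYWORDS[words[0]]
        else:
            kind = "unknown"
        result.append((kind, block))
    return result

sample = """WEBVMT

NOTE Map config

MAP
lat:51.506 lng:-0.076
rad:500

00:00:02.000 -->
{ "circle":
  { "lat": 51.504789, "lng": -0.078642, "rad": 20 }
}
"""
kinds = [kind for kind, _ in classify_blocks(sample)]
```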

A WebVMT file carries cues which are identified as metadata and specified in the kind attribute of the track element in the HTML specification.

The data kind of a WebVMT file is externally specified, such as in an HTML file’s track element. The environment is responsible for interpreting the data correctly.

A WebVMT cue is rendered as an overlay on top of the map viewport.

WebVMT Cue

A WebVMT cue is a text track cue that additionally consists of the following:

A cue text: The raw text of the cue which is interpreted as time-aligned metadata, and rules for its interpretation.

A WebVMT cue without an end time indicates that the cue is an unbounded text track cue, for example during live streaming when the time of the next data sample is unknown or when the duration of the media is unknown.

A WebVMT cue with negative cue times maintains timing information prior to the start of the media, for example to preserve speed information for a WebVMT path.

WebVMT Location

A WebVMT location consists of:

A location latitude: The latitude in degrees of the location.
A location longitude: The longitude in degrees of the location.
A location altitude: Optionally, the altitude in meters of the location.

Location information is provided in terms of World Geodetic System coordinates, WGS84. Altitude is measured in meters above the WGS84 ellipsoid, and should not be confused with the height above mean sea level.
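
In code, a location maps naturally onto a small immutable record. The sketch below is one possible representation: the lat and lng names follow the attribute names used in the examples, while the alt name and the range checks are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Location:
    lat: float                   # degrees, WGS84
    lng: float                   # degrees, WGS84
    alt: Optional[float] = None  # metres above the WGS84 ellipsoid, if known

    def __post_init__(self):
        # Range limits are conventional for WGS84 degrees (assumed here).
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lng <= 180.0:
            raise ValueError(f"longitude out of range: {self.lng}")

tower_bridge = Location(51.504362, -0.076153)
```

Keeping altitude optional, rather than defaulting it to zero, avoids implying a height on the ellipsoid when none was recorded.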

WebVMT Map

A WebVMT map is the map viewport and provides a rendering area for WebVMT cues.

A WebVMT map consists of:

A map center location: The WebVMT location at the center of the map.
A map zoom radius: The radius in meters of the minimum area visible from the map center location.
A map interface object: The control interface object for the map.
A text track map: A WebVMT map associated with the text track. By default, the text track map is set to null.

WebVMT Media

A WebVMT media is metadata for the linked media with which WebVMT cues are synchronized, for example audio or video.

A WebVMT media enables a web crawler to rapidly search media metadata by providing sufficient information to construct a time-metadata index of the linked media file without opening it. Search engine data throughput is reduced as only matching media files selected by the user need be read, and non-matching media files are not accessed at all. Care should be taken to maintain WebVMT media details correctly, for example when a media file is renamed.

A WebVMT media consists of:

A media URL: The URL of the linked media file.
A media MIME type: The MIME type of the linked media file.
A media start time: The global time and date at which the linked media file begins.
A media path: The path identifier which uniquely identifies the moving object capturing the linked media file.

WebVMT Command Structures

A WebVMT command is an instruction to display WebVMT metadata content.

A WebVMT command consists of one of the following components:

A WebVMT map control command;
A WebVMT zone fragment command;
A WebVMT path fragment command;
A WebVMT synchronized data command.

WebVMT commands are executed in order from first to last in the WebVMT file.

Map Controls

A WebVMT map control command controls map presentation.

A WebVMT map control command consists of one of the following components:

A WebVMT pan command.
A WebVMT zoom command.

A WebVMT pan is a command to set the location of the map center.

A WebVMT pan consists of:

A pan location: The WebVMT location to which the map center location pans.
A pan start time: The time at which the map starts panning towards the pan location. The pan start time is set to the cue start time.
A pan end time: The time at which the map center location equals the pan location. The pan end time may be defined as an absolute value, or calculated relative to the pan start time using a duration.

A WebVMT zoom is a command to set the level of detail of the map.

A WebVMT zoom consists of:

A zoom radius: The radius in meters of the map zoom radius.

Zones

A WebVMT zone consists of all the WebVMT zone fragments with the same zone identifier.

A WebVMT zone fragment command consists of one of the following components:

A WebVMT circle command.
A WebVMT polygon command.

A WebVMT circle is a command to annotate the map with a circular area.

A WebVMT circle consists of:

A zone identifier: The identifier shared by all the WebVMT zone fragments in the WebVMT zone. By default, the zone identifier is set to null.
A circle location: The WebVMT location of the circle center.
A circle radius: The radius in meters of the circle.

A WebVMT polygon is a command to annotate the map with a polygonal area.

A WebVMT polygon consists of:

A zone identifier: The identifier shared by all the WebVMT zone fragments in the WebVMT zone. By default, the zone identifier is set to null.
A list of WebVMT locations defining the polygon vertices: Vertex locations are listed sequentially around the perimeter of the polygon. The last vertex should not repeat the value of the first, as this is implicit.

Paths

A WebVMT path consists of all the path segments with the same path identifier.

A path segment consists of a sequence of contiguous WebVMT path fragments that describe the trajectory of an object moving through the mapped space.

A WebVMT path may include non-contiguous path segments, but each path segment must contain a sequence of contiguous WebVMT path fragments.

A path segment consists of the following components, in the order given:

One WebVMT move command;
Zero or more WebVMT line commands.

A WebVMT path fragment command consists of one of the following components:

A WebVMT move command;
A WebVMT line command.

A WebVMT move command sets the start location of the first WebVMT path fragment in a path segment.

A WebVMT move consists of:

A path identifier: The identifier shared by all the path segments in the WebVMT path. By default, the path identifier is set to null.
A fragment start time: The time at which the WebVMT path fragment starts. The fragment start time is set to the cue start time.
A fragment start location: The WebVMT location at the fragment start time.

A WebVMT line command sets the end location of the WebVMT path fragment. The fragment start location is set by the preceding WebVMT path fragment in the WebVMT path.

A WebVMT line consists of:

A path identifier: The identifier shared by all the path segments in the WebVMT path. By default, the path identifier is set to null.
A fragment end time: The time at which the WebVMT path fragment ends. By default, the fragment end time is set to the cue end time. The fragment end time may be defined as an absolute value, or calculated relative to the fragment start time using a duration.
A fragment end location: The WebVMT location at the fragment end time.

A WebVMT line is a straight line from the start location to the end location. The location of the moving object can be linearly interpolated between the fragment start time and the fragment end time.
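As an informal illustration, the interpolation described above might be sketched as follows. The `Location` class and `interpolate_location` function are hypothetical names, not part of WebVMT. The sketch assumes that latitude and longitude can be interpolated directly in degrees, which ignores the antimeridian and the curvature of the Earth, and that times outside the fragment clamp to its endpoints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A WebVMT location: latitude/longitude in degrees, altitude in meters."""
    lat: float
    lng: float
    alt: Optional[float] = None


def interpolate_location(start: Location, end: Location,
                         start_time: float, end_time: float,
                         t: float) -> Location:
    """Linearly interpolate between two locations at time t (in seconds)."""
    if end_time <= start_time:
        # Zero-length fragment: treat the object as already at the end.
        return end
    # Assumption: clamp so that times outside the fragment map to its ends.
    fraction = min(1.0, max(0.0, (t - start_time) / (end_time - start_time)))

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * fraction

    # Altitude is only interpolated when both ends supply it.
    alt = None
    if start.alt is not None and end.alt is not None:
        alt = lerp(start.alt, end.alt)
    return Location(lerp(start.lat, end.lat), lerp(start.lng, end.lng), alt)
```

For example, halfway through a fragment from (0, 0) to (0.12, 0.34) the sketch yields a location close to (0.06, 0.17).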

Synchronized Data

A WebVMT synchronized data command synchronizes a sample from a data source with a WebVMT cue.

A WebVMT synchronized data command consists of:

A synchronized data object: An arbitrary object representing the raw sample from the data source.
A synchronized data type: An associated data type, e.g. org.geojson.
Optionally, a synchronized data identifier: An identifier shared by all samples from the same data source over time, e.g. for interpolation.
Optionally, a synchronized path identifier: A path identifier associated with the data source, e.g. a moving sensor.

Command Interpolation

A WebVMT interpolation describes how WebVMT command attributes change from a start value to an end value over a time interval.

A WebVMT interpolation consists of:

Interpolation attributes: The attribute data associated with a WebVMT command.
An interpolation start time: The time at which the interpolation starts. By default, the interpolation start time is set to the cue start time.
An interpolation start value: The interpolation start value is set to the value of the interpolation attributes at the interpolation start time.
An interpolation end time: The time at which the interpolation ends. The interpolation end time may be defined as an absolute value, or calculated relative to the interpolation start time using a duration.
An interpolation end value: The interpolation end value is set to the value of the interpolation attributes at the interpolation end time.
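The two ways of defining an interpolation end time can be illustrated with a short sketch. The function name `resolve_end_time` and its `default_end` parameter are hypothetical; the fallback used when neither an end time nor a duration is supplied is an assumption made for the example (for instance, a caller might pass the cue end time for a path fragment).

```python
from typing import Optional


def resolve_end_time(start: float,
                     end: Optional[float] = None,
                     duration: Optional[float] = None,
                     default_end: Optional[float] = None) -> Optional[float]:
    """Return the interpolation end time in seconds, or default_end."""
    if end is not None and duration is not None:
        raise ValueError("give an absolute end time or a duration, not both")
    if duration is not None:
        if duration < 0:
            raise ValueError("a duration cannot be negative")
        # Relative form: the end is measured from the start time.
        return start + duration
    if end is not None:
        if end < start:
            raise ValueError("the end time cannot precede the start time")
        # Absolute form: the end is used as given.
        return end
    # Assumption: neither form given, so fall back to the caller's default.
    return default_end
```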

Syntax

WebVMT File Structure

A WebVMT file must consist of a WebVMT file body encoded as UTF-8 and labeled with the MIME type text/vmt.

A WebVMT file body consists of the following components, in the order given:

An optional U+FEFF BYTE ORDER MARK (BOM) character.
The string "WEBVMT" (W U+0057 LATIN CAPITAL LETTER W, E U+0045 LATIN CAPITAL LETTER E, B U+0042 LATIN CAPITAL LETTER B, V U+0056 LATIN CAPITAL LETTER V, M U+004D LATIN CAPITAL LETTER M, T U+0054 LATIN CAPITAL LETTER T).
Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
Two or more WebVMT line terminators to terminate the line with the file magic and separate it from the rest of the body.
The following components, in any order, separated from each other by one or more WebVMT line terminators.

A WebVMT media metadata block.
A WebVMT map initialisation block.
Zero or more WebVMT style blocks and WebVMT comment blocks separated from each other by one or more WebVMT line terminators.

Zero or more WebVMT line terminators.
Zero or more WebVMT cue blocks and WebVMT comment blocks separated from each other by one or more WebVMT line terminators.
Zero or more WebVMT line terminators.

A WebVMT line terminator consists of one of the following:

A U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.
A single U+000A LINE FEED (LF) character.
A single U+000D CARRIAGE RETURN (CR) character.

A WebVMT media metadata block consists of the following components, in the order given:

The string "MEDIA" (M U+004D LATIN CAPITAL LETTER M, E U+0045 LATIN CAPITAL LETTER E, D U+0044 LATIN CAPITAL LETTER D, I U+0049 LATIN CAPITAL LETTER I, A U+0041 LATIN CAPITAL LETTER A).
Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
A WebVMT line terminator.
A WebVMT media settings list.
A WebVMT line terminator.

The WebVMT media metadata block provides hints about the linked media file for web crawlers and search engines.

A WebVMT map initialisation block consists of the following components, in the order given:

The string "MAP" (M U+004D LATIN CAPITAL LETTER M, A U+0041 LATIN CAPITAL LETTER A, P U+0050 LATIN CAPITAL LETTER P).
Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
A WebVMT line terminator.
A WebVMT map settings list.
A WebVMT line terminator.

The WebVMT map initialisation block defines the state of the WebVMT map before any WebVMT cues are active.

A WebVMT style block consists of the following components, in the order given:

The string "STYLE" (S U+0053 LATIN CAPITAL LETTER S, T U+0054 LATIN CAPITAL LETTER T, Y U+0059 LATIN CAPITAL LETTER Y, L U+004C LATIN CAPITAL LETTER L, E U+0045 LATIN CAPITAL LETTER E).
Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
A WebVMT line terminator.
Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator, except that the entire resulting string must not contain the substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN). The string represents a CSS style sheet; the requirements given in the relevant CSS specifications apply.
A WebVMT line terminator.

A WebVMT cue block consists of the following components, in the order given:

Optionally, a WebVMT cue identifier followed by a WebVMT line terminator.
WebVMT cue timings.
Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
A WebVMT line terminator.
The WebVMT cue payload consists of a WebVMT metadata text, but must not contain the substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN).
A WebVMT line terminator.

A WebVMT cue block corresponds to one piece of time-aligned data in the WebVMT file. The WebVMT cue payload is the data associated with the WebVMT cue.

A WebVMT cue identifier is any sequence of one or more characters not containing the substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

A WebVMT cue identifier must be unique amongst all the WebVMT cue identifiers of all WebVMT cues of a WebVMT file.

A WebVMT cue identifier can be used to identify a specific cue, for example from script or CSS.
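The identifier rules above can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical and how a real parser reports violations is not specified here.

```python
from typing import Iterable, List


def is_valid_identifier(identifier: str) -> bool:
    """Check the syntax rules for a WebVMT cue identifier."""
    return (len(identifier) > 0
            and "-->" not in identifier
            and "\n" not in identifier
            and "\r" not in identifier)


def find_duplicate_identifiers(identifiers: Iterable[str]) -> List[str]:
    """Return identifiers that occur more than once, in first-repeat order."""
    seen = set()
    duplicates: List[str] = []
    for identifier in identifiers:
        if identifier in seen and identifier not in duplicates:
            duplicates.append(identifier)
        seen.add(identifier)
    return duplicates
```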

The WebVMT cue timings part of a WebVMT cue block consists of the following components, in the order given:

A WebVMT timestamp representing the start time offset of the cue. The time represented by this WebVMT timestamp must be greater than or equal to the start time offsets of all previous cues in the file.
One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
The string "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN).
One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
Optionally, a WebVMT timestamp representing the end time offset of the cue. The time represented by this WebVMT timestamp must be greater than or equal to the start time offset of the cue.

The WebVMT cue timings give the start and end offsets of the WebVMT cue block. Different cues can overlap. Cues are always listed ordered by their start time.

A WebVMT timestamp consists of the following components, in the order given:

Optionally (required if hours is non-zero):
1. Two or more ASCII digits, representing the hours as a base ten integer.
2. A : U+003A COLON character.
Two ASCII digits, representing the minutes as a base ten integer in the range 0 ≤ minutes ≤ 59.
A : U+003A COLON character.
Two ASCII digits, representing the seconds as a base ten integer in the range 0 ≤ seconds ≤ 59.
A . U+002E FULL STOP character.
Three ASCII digits, representing the thousandths of a second seconds-frac as a base ten integer.

A WebVMT timestamp is always interpreted relative to the current playback position of the media data with which the WebVMT file is to be synchronized.
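The timestamp and cue timings syntax can be illustrated with regular expressions. This is a sketch under stated assumptions: the names `parse_timestamp` and `parse_cue_timings` are hypothetical, times are returned as seconds, only the components listed above are accepted (so no sign is handled), and the trailing whitespace permitted by the cue block syntax is tolerated.

```python
import re
from typing import Optional, Tuple

# Optional hours (two or more digits), then mm:ss.ttt with ASCII digits only.
_TIMESTAMP = re.compile(
    r"(?:([0-9]{2,}):)?([0-5][0-9]):([0-5][0-9])\.([0-9]{3})")
# start, whitespace, "-->", whitespace, optional end, optional whitespace.
_TIMINGS = re.compile(r"([^ \t]+)[ \t]+-->[ \t]+([^ \t]+)?[ \t]*")


def parse_timestamp(text: str) -> float:
    """Convert a timestamp such as '01:24.000' to seconds."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)


def parse_cue_timings(line: str) -> Tuple[float, Optional[float]]:
    """Return (start, end) in seconds; end is None for an unbounded cue."""
    match = _TIMINGS.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid cue timings line: {line!r}")
    start = parse_timestamp(match.group(1))
    end = parse_timestamp(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        raise ValueError("the end time must not precede the start time")
    return start, end
```

Note that the sketch follows the listed components literally, so whitespace is required after "-->" even when the end timestamp is omitted.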

A WebVMT comment block consists of the following components, in the order given:

The string "NOTE" (N U+004E LATIN CAPITAL LETTER N, O U+004F LATIN CAPITAL LETTER O, T U+0054 LATIN CAPITAL LETTER T, E U+0045 LATIN CAPITAL LETTER E).
Optionally, the following components, in the order given:
1. Either:
  - A U+0020 SPACE character or U+0009 CHARACTER TABULATION (tab) character.
  - A WebVMT line terminator.
2. Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator, except that the entire resulting string must not contain the substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN).
A WebVMT line terminator.

A WebVMT comment block is ignored by the parser.

WebVMT Cue Payload

WebVMT metadata text consists of any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator. (In other words, any text that does not have two consecutive WebVMT line terminators and does not start or end with a WebVMT line terminator.)

The string represents a WebVMT command list.

WebVMT metadata text cues are only useful for scripted applications (e.g. using the metadata text track kind in an HTML text track).

WebVMT Media Settings

The WebVMT media settings list consists of zero or more of the following components, in any order, separated from each other by one or more U+0020 SPACE characters, U+0009 CHARACTER TABULATION (tab) characters, or WebVMT line terminators, except that the string must not contain two consecutive WebVMT line terminators. Each component must not be included more than once per WebVMT media settings list string.

A WebVMT media url setting.
A WebVMT media MIME type setting.
A WebVMT media start time setting.
A WebVMT media path setting.

A WebVMT media url setting consists of the following components, in the order given:

The string "url".
A : U+003A COLON character.
A valid URL.

For the purpose of resolving a URL in the MEDIA block of a WebVMT file, or any URLs in resources referenced from MEDIA blocks of a WebVMT file, if the URL’s scheme is not "data", then the user agent must act as if the URL failed to resolve. If the url value does not match the src attribute of the HTML <track> element, then the src value takes precedence.

A WebVMT media MIME type setting consists of the following components, in the order given:

The string "mime-type".
A : U+003A COLON character.
A valid MIME type.

A WebVMT media start time setting consists of the following components, in the order given:

The string "start-time".
A : U+003A COLON character.
A valid global date and time string.

A WebVMT media start time setting should include millisecond data in order to allow the WebVMT file to be accurately synchronized with Coordinated Universal Time (UTC).

A WebVMT media path setting consists of the following components, in the order given:

The string "path".
A : U+003A COLON character.
A WebVMT path identifier.

WebVMT Map Settings

The WebVMT map settings list consists of the following components, in any order, separated from each other by one or more U+0020 SPACE characters, U+0009 CHARACTER TABULATION (tab) characters, or WebVMT line terminators, except that the string must not contain two consecutive WebVMT line terminators. Each required component must be included exactly once per WebVMT map settings list string, and the optional altitude setting at most once.

A WebVMT map center latitude setting.
A WebVMT map center longitude setting.
Optionally, a WebVMT map center altitude setting.
A WebVMT map zoom setting.

The WebVMT map settings list defines the WebVMT map state before the first cue is active.

A WebVMT map center latitude setting consists of a WebVMT latitude setting.

A WebVMT map center longitude setting consists of a WebVMT longitude setting.

A WebVMT map center altitude setting consists of a WebVMT altitude setting.

When interpreted as numbers, the WebVMT map center latitude setting, WebVMT map center longitude setting and WebVMT map center altitude setting values represent the map center location.

A WebVMT latitude setting consists of the following components, in the order given:

The string "lat".
A : U+003A COLON character.
A WebVMT latitude.

A WebVMT latitude consists of the following components, in the order given:

Optionally, a - U+002D HYPHEN-MINUS character.
One or more ASCII digits.
Optionally:
1. A . U+002E FULL STOP character.
2. One or more ASCII digits.

When interpreted as a number, a WebVMT latitude must be in the range -90..+90.

A WebVMT longitude setting consists of the following components, in the order given:

The string "lng".
A : U+003A COLON character.
A WebVMT longitude.

A WebVMT longitude consists of the following components, in the order given:

Optionally, a - U+002D HYPHEN-MINUS character.
One or more ASCII digits.
Optionally:
1. A . U+002E FULL STOP character.
2. One or more ASCII digits.

When interpreted as a number, a WebVMT longitude must be in the range -180..+180.

A WebVMT altitude setting consists of the following components, in the order given:

The string "alt".
A : U+003A COLON character.
A WebVMT altitude.

A WebVMT altitude consists of the following components, in the order given:

Optionally, a - U+002D HYPHEN-MINUS character.
One or more ASCII digits.
Optionally:
1. A . U+002E FULL STOP character.
2. One or more ASCII digits.

When interpreted as a number, a WebVMT altitude represents the height in meters above the WGS84 ellipsoid. Care should be taken not to confuse this with the height above mean sea level.

A WebVMT map zoom setting consists of the following components, in the order given:

The string "rad".
A : U+003A COLON character.
One or more ASCII digits.
Optionally:
1. A . U+002E FULL STOP character.
2. One or more ASCII digits.

When interpreted as a number, the WebVMT map zoom setting must be positive and represents the map zoom radius.
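As a non-normative illustration, a map settings list could be read as sketched here. The function name `parse_map_settings` is hypothetical, and raising an error on invalid input is an assumption made for the example rather than behaviour defined by this specification.

```python
import re
from typing import Dict

_NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
_RANGES = {"lat": (-90.0, 90.0), "lng": (-180.0, 180.0)}
_REQUIRED = ("lat", "lng", "rad")
_KNOWN = ("lat", "lng", "alt", "rad")


def parse_map_settings(text: str) -> Dict[str, float]:
    """Parse settings such as 'lat:51.5 lng:-0.1 rad:1000' into numbers."""
    settings: Dict[str, float] = {}
    # Settings are separated by spaces, tabs or line terminators.
    for token in re.findall(r"[^ \t\r\n]+", text):
        name, colon, value = token.partition(":")
        if not colon or name not in _KNOWN:
            raise ValueError(f"unknown map setting: {token!r}")
        if name in settings:
            raise ValueError(f"duplicate map setting: {name}")
        if not _NUMBER.fullmatch(value):
            raise ValueError(f"malformed number in {token!r}")
        number = float(value)
        low, high = _RANGES.get(name, (float("-inf"), float("inf")))
        if not low <= number <= high:
            raise ValueError(f"{name} is out of range: {number}")
        # The zoom radius syntax has no sign and its value must be positive.
        if name == "rad" and (value.startswith("-") or number <= 0):
            raise ValueError("rad must be positive")
        settings[name] = number
    missing = [name for name in _REQUIRED if name not in settings]
    if missing:
        raise ValueError(f"missing map settings: {missing}")
    return settings
```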

WebVMT Commands

A WebVMT command list consists of one or more of the following components in any order, separated from each other by a WebVMT line terminator:

A WebVMT map control command.
A WebVMT zone annotation command.
A WebVMT path annotation command.
A WebVMT synchronized data command.
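One possible way to split a cue payload into its commands is sketched here. It assumes that each command is a JSON object with one member named after the command, as in the examples in this document; the function name and the set of accepted command names are choices made for the illustration.

```python
import json
from typing import Any, Dict, List, Tuple

COMMAND_NAMES = {"pan-to", "zoom", "circle", "polygon",
                 "move-to", "line-to", "sync"}


def parse_command_list(payload: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (command name, parameters) pairs in file order."""
    decoder = json.JSONDecoder()
    commands: List[Tuple[str, Dict[str, Any]]] = []
    position = 0
    while True:
        # Skip the line terminators and spacing between JSON texts.
        while position < len(payload) and payload[position] in " \t\r\n":
            position += 1
        if position == len(payload):
            break
        # raw_decode parses one JSON value and reports where it ended.
        value, position = decoder.raw_decode(payload, position)
        if not isinstance(value, dict):
            raise ValueError("each command must be a JSON object")
        names = [key for key in value if key in COMMAND_NAMES]
        if len(names) != 1:
            raise ValueError("expected exactly one command name per object")
        commands.append((names[0], value[names[0]]))
    if not commands:
        raise ValueError("a command list needs at least one command")
    return commands
```

Because commands are returned in the order they appear, a caller can execute them from first to last.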

WebVMT Map Commands

A WebVMT map control command consists of one of the following components:

A WebVMT pan command.
A WebVMT zoom command.

A WebVMT pan command consists of a JSON text representing the following JSON object:

The JSON string "pan-to".
A JSON value consisting of a WebVMT pan parameter list.

A WebVMT pan parameter list is a JSON object representing the following components in any order:

A WebVMT pan latitude attribute.
A WebVMT pan longitude attribute.
Optionally, a WebVMT pan altitude attribute.
Optionally, one of the following components:
- A WebVMT pan end time attribute.
- A WebVMT pan duration attribute.

A WebVMT pan latitude attribute consists of a WebVMT latitude attribute.

A WebVMT pan longitude attribute consists of a WebVMT longitude attribute.

A WebVMT pan altitude attribute consists of a WebVMT altitude attribute.

A WebVMT pan end time attribute consists of a WebVMT end time attribute.

A WebVMT pan duration attribute consists of a WebVMT duration attribute.

A WebVMT zoom command consists of a JSON text representing the following JSON object:

The JSON string "zoom".
A JSON value consisting of a WebVMT zoom parameter list.

A WebVMT zoom parameter list is a JSON object representing the following component:

A WebVMT zoom radius attribute.

A WebVMT zoom radius attribute consists of a WebVMT radius attribute.

When interpreted as a number, the WebVMT zoom radius attribute value represents the map zoom radius.

A WebVMT radius attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "rad".
A : U+003A COLON character.
A JSON value consisting of a JSON number greater than zero.
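To show the resulting JSON shape, the pan and zoom commands could be generated as in this sketch. The helper names `pan_command` and `zoom_command` are hypothetical, and the `end` and `dur` arguments are assumed to be strings that are already in WebVMT timestamp format.

```python
import json
from typing import Optional


def pan_command(lat: float, lng: float, alt: Optional[float] = None,
                end: Optional[str] = None, dur: Optional[str] = None) -> str:
    """Serialize a pan command; end and dur are mutually exclusive."""
    if not -90 <= lat <= 90:
        raise ValueError("lat must be in the range -90..+90")
    if not -180 <= lng <= 180:
        raise ValueError("lng must be in the range -180..+180")
    if end is not None and dur is not None:
        raise ValueError("give an end time or a duration, not both")
    parameters = {"lat": lat, "lng": lng}
    # Optional attributes are only written when supplied.
    if alt is not None:
        parameters["alt"] = alt
    if end is not None:
        parameters["end"] = end
    if dur is not None:
        parameters["dur"] = dur
    return json.dumps({"pan-to": parameters})


def zoom_command(rad: float) -> str:
    """Serialize a zoom command with a radius in meters."""
    if rad <= 0:
        raise ValueError("rad must be greater than zero")
    return json.dumps({"zoom": {"rad": rad}})
```

For instance, `zoom_command(2000)` produces `{"zoom": {"rad": 2000}}`.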

WebVMT Zone Commands

A WebVMT zone annotation command consists of one of the following components:

A WebVMT circle command.
A WebVMT polygon command.

A WebVMT circle command consists of a JSON text representing the following JSON object:

The JSON string "circle".
A JSON value consisting of a WebVMT circle parameter list.

A WebVMT circle parameter list consists of a JSON object representing the following components in any order:

A WebVMT circle center latitude attribute.
A WebVMT circle center longitude attribute.
Optionally, a WebVMT circle center altitude attribute.
A WebVMT circle radius attribute.
Optionally, a WebVMT zone attribute.

A WebVMT circle center latitude attribute consists of a WebVMT latitude attribute.

A WebVMT circle center longitude attribute consists of a WebVMT longitude attribute.

A WebVMT circle center altitude attribute consists of a WebVMT altitude attribute.

A WebVMT circle radius attribute consists of a WebVMT radius attribute.

A WebVMT zone attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "zone".
A : U+003A COLON character.
A JSON value consisting of a JSON string representing a WebVMT zone identifier.

A WebVMT zone identifier is any sequence of one or more characters not containing the substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

A WebVMT zone identifier is a string which uniquely identifies a zone in the WebVMT file, for example a safety zone around a moving object.

A WebVMT polygon command consists of a JSON text representing the following JSON object:

The JSON string "polygon".
A JSON value consisting of a WebVMT polygon parameter list.
Optionally, a WebVMT zone attribute.

A WebVMT polygon parameter list consists of the following JSON object:

A WebVMT zone perimeter list.

A WebVMT zone perimeter list consists of the following JSON object:

The JSON string "perim".
A JSON value consisting of a WebVMT vertices list.

A WebVMT vertices list consists of a JSON array of three or more JSON objects each representing a WebVMT location attribute list.

A WebVMT location attribute list consists of a JSON text representing a list of the following JSON values in any order, separated from each other by a , U+002C COMMA character:

A WebVMT latitude attribute.
A WebVMT longitude attribute.
Optionally, a WebVMT altitude attribute.

A WebVMT latitude attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "lat".
A : U+003A COLON character.
A JSON value consisting of a JSON number.

When interpreted as a number, a WebVMT latitude attribute must be in the range -90..+90.

A WebVMT longitude attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "lng".
A : U+003A COLON character.
A JSON value consisting of a JSON number.

When interpreted as a number, a WebVMT longitude attribute must be in the range -180..+180.

A WebVMT altitude attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "alt".
A : U+003A COLON character.
A JSON value consisting of a JSON number.

When interpreted as a number, a WebVMT altitude attribute represents the height in meters above the WGS84 ellipsoid. Care should be taken not to confuse this with the height above mean sea level.
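The range rules for location attributes, together with the vertex rules for polygons, can be sketched as simple checks on decoded JSON values. The function names are hypothetical. Treating a repeated first vertex as a warning rather than an error is an assumption, chosen because the data model only advises against it.

```python
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_location(attributes: Dict[str, Any]) -> None:
    """Raise ValueError unless lat/lng (and alt, if present) are valid."""
    for name, limit in (("lat", 90), ("lng", 180)):
        value = attributes.get(name)
        if not _is_number(value):
            raise ValueError(f"{name} must be a JSON number")
        if not -limit <= value <= limit:
            raise ValueError(f"{name} must be in the range -{limit}..+{limit}")
    if "alt" in attributes and not _is_number(attributes["alt"]):
        raise ValueError("alt must be a JSON number")


def check_vertices(vertices: List[Dict[str, Any]]) -> List[str]:
    """Validate polygon vertices and return a list of warnings."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs three or more vertices")
    for vertex in vertices:
        validate_location(vertex)
    warnings: List[str] = []
    if vertices[0] == vertices[-1]:
        # The closing edge is implicit, so this vertex is redundant.
        warnings.append("last vertex repeats the first")
    return warnings
```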

WebVMT Path Commands

A WebVMT path annotation command consists of one of the following components:

A WebVMT move command.
A WebVMT line command.

A WebVMT move command consists of a JSON text representing the following JSON object:

The JSON string "move-to".
A JSON value consisting of a WebVMT move parameter list.

A WebVMT move parameter list is a JSON object representing the following components in any order:

A WebVMT fragment start latitude attribute.
A WebVMT fragment start longitude attribute.
Optionally, a WebVMT fragment start altitude attribute.
Optionally, a WebVMT path attribute.

A WebVMT fragment start latitude attribute consists of a WebVMT latitude attribute.

A WebVMT fragment start longitude attribute consists of a WebVMT longitude attribute.

A WebVMT fragment start altitude attribute consists of a WebVMT altitude attribute.

A WebVMT path attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "path".
A : U+003A COLON character.
A JSON value consisting of a JSON string representing a WebVMT path identifier.

A WebVMT path identifier is any sequence of one or more characters not containing the substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

A WebVMT path identifier is a string which uniquely identifies a moving object in the WebVMT file, for example a camera.

A WebVMT line command consists of a JSON text representing the following JSON object:

The JSON string "line-to".
A JSON value consisting of a WebVMT line parameter list.

A WebVMT line parameter list consists of a JSON object representing the following components in any order:

A WebVMT fragment end latitude attribute.
A WebVMT fragment end longitude attribute.
Optionally, a WebVMT fragment end altitude attribute.
Optionally, a WebVMT path attribute.
Optionally, one of the following components:
- A WebVMT fragment end time attribute.
- A WebVMT fragment duration attribute.

A WebVMT fragment end latitude attribute consists of a WebVMT latitude attribute.

A WebVMT fragment end longitude attribute consists of a WebVMT longitude attribute.

A WebVMT fragment end altitude attribute consists of a WebVMT altitude attribute.

A WebVMT fragment end time attribute consists of a WebVMT end time attribute.

A WebVMT fragment duration attribute consists of a WebVMT duration attribute.
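As an illustration of how move and line commands combine, the sketch here groups an ordered sequence of decoded commands into paths and path segments. The function name `build_paths` and the tuple-based representation are assumptions made for the example; timing and altitude are left out to keep it short.

```python
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Command = Tuple[str, Dict[str, Any]]


def build_paths(commands: List[Command]) -> Dict[Optional[str], List[List[Point]]]:
    """Map each path identifier to its segments, each a list of points."""
    paths: Dict[Optional[str], List[List[Point]]] = {}
    for name, parameters in commands:
        if name not in ("move-to", "line-to"):
            continue  # other commands do not affect paths
        # A missing identifier models the default null path identifier.
        path_id = parameters.get("path")
        point = (parameters["lat"], parameters["lng"])
        segments = paths.setdefault(path_id, [])
        if name == "move-to":
            # A move starts a new segment at the given location.
            segments.append([point])
        else:
            if not segments:
                raise ValueError("line-to needs a preceding move-to")
            # A line extends the current segment from its last point.
            segments[-1].append(point)
    return paths
```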

WebVMT Synchronization Command

A WebVMT synchronized data command consists of a JSON text representing the following JSON object:

The JSON string "sync".
A JSON value consisting of a WebVMT synchronized parameter list.

A WebVMT synchronized parameter list consists of a JSON object representing the following components in any order:

A WebVMT synchronized type attribute.
A WebVMT synchronized data attribute.
Optionally, a WebVMT synchronized identifier attribute.
Optionally, a WebVMT synchronized path attribute.

A WebVMT synchronized type attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "type".
A : U+003A COLON character.
A JSON string representing a synchronized data type.

A WebVMT synchronized data attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "data".
A : U+003A COLON character.
A JSON object representing a synchronized data object.

A WebVMT synchronized identifier attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "id".
A : U+003A COLON character.
A JSON string representing a synchronized data identifier.

A WebVMT synchronized path attribute consists of a WebVMT path attribute representing a synchronized path identifier.
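A decoded synchronized parameter list can be checked against the attribute types above with a short sketch. The function name `validate_sync` is hypothetical, and raising an error is simply one way to report a mismatch.

```python
from typing import Any, Dict


def validate_sync(parameters: Dict[str, Any]) -> None:
    """Raise ValueError unless the sync parameters have the listed types."""
    if not isinstance(parameters.get("type"), str):
        raise ValueError('"type" is required and must be a JSON string')
    if not isinstance(parameters.get("data"), dict):
        raise ValueError('"data" is required and must be a JSON object')
    # The identifier and path attributes are optional strings.
    for name in ("id", "path"):
        if name in parameters and not isinstance(parameters[name], str):
            raise ValueError(f'"{name}" must be a JSON string when present')
```

For example, a sample of type org.geojson could carry a GeoJSON Point in its "data" member together with a "path" naming the moving sensor.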

WebVMT Command Interpolation Attributes

A WebVMT end time attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "end".
A : U+003A COLON character.
A JSON string representing a WebVMT timestamp.

A WebVMT end time attribute represents the time at which an interpolation ends.

A WebVMT duration attribute consists of a JSON text consisting of the following components in the order given:

The JSON string "dur".
A : U+003A COLON character.
A JSON string representing a WebVMT timespan.

A WebVMT duration attribute represents the time interval for which an interpolation lasts.

A WebVMT timespan is the positive time offset between two WebVMT timestamps and is represented in WebVMT timestamp format.

Properties Of Cue Sequences

WebVMT File Using Only Nested Cues

A WebVMT file whose cues all comply with the following rule is said to be a WebVMT file using only nested cues.

Given any two cues cue1 and cue2 with start and end time offsets (x1, y1) and (x2, y2) respectively:

either cue1 lies fully within cue2, i.e. x1 >= x2 and y1 <= y2;
or cue1 fully contains cue2, i.e. x1 <= x2 and y1 >= y2.

The following example matches this definition:

WEBVMT

NOTE Required blocks omitted for clarity

00:00.000 --> 01:24.000
{ "circle": { "lat": 0, "lng": 0, "rad": 2000 } }

00:00.000 --> 00:44.000
{ "move-to": { "lat": 0, "lng": 0, "path": "cam-1" } }
{ "line-to": { "lat": 0.12, "lng": 0.34, "path": "cam-1" } }

00:44.000 --> 01:19.000
{ "line-to": { "lat": 0.56, "lng": 0.78, "path": "cam-1" } }

01:24.000 --> 05:00.000
{ "circle": { "lat": 0, "lng": 0, "rad": 30000 } }

01:35.000 --> 03:00.000
{ "move-to": { "lat": 0.87, "lng": 0.65, "path": "cam-2" } }
{ "line-to": { "lat": 0.43, "lng": 0.21, "path": "cam-2" } }

03:00.000 --> 05:00.000
{ "line-to": { "lat": 0, "lng": 0, "path": "cam-2" } }

Notice how you can express the cues in this WebVMT file as a tree structure:

2km Circle at (0, 0)
- Line (0, 0) to (0.12, 0.34)
- Line (0.12, 0.34) to (0.56, 0.78)
30km Circle at (0, 0)
- Line (0.87, 0.65) to (0.43, 0.21)
- Line (0.43, 0.21) to (0, 0)

If the file has cues that can’t be expressed in this fashion, then they don’t match the definition of a WebVMT file using only nested cues. For example:

WEBVMT

NOTE Required blocks omitted for clarity

00:00.000 --> 01:00.000
{ "move-to": { "lat": 0.12, "lng": 0.34, "path": "cam-3" } }
{ "line-to": { "lat": 0.56, "lng": 0.78, "path": "cam-3" } }

00:30.000 --> 01:30.000
{ "move-to": { "lat": 0.87, "lng": 0.65, "path": "cam-4" } }
{ "line-to": { "lat": 0.43, "lng": 0.21, "path": "cam-4" } }

In this ninety-second example, the two cues partly overlap, with the first ending before the second ends and the second starting before the first ends. This therefore is not a WebVMT file using only nested cues.
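The nesting rule can be tested mechanically, as this sketch suggests. The function name is hypothetical and cues are represented as (start, end) pairs in seconds. One assumption is labeled in the code: pairs of cues that do not overlap at all are accepted, since the first example contains consecutive sibling cues and is described as matching the definition. Unbounded cues are not handled.

```python
from itertools import combinations
from typing import Iterable, Tuple

Cue = Tuple[float, float]  # (start, end) time offsets in seconds


def uses_only_nested_cues(cues: Iterable[Cue]) -> bool:
    """Return True if no pair of cues partly overlaps."""
    for (x1, y1), (x2, y2) in combinations(list(cues), 2):
        within = x1 >= x2 and y1 <= y2    # cue1 lies fully within cue2
        contains = x1 <= x2 and y1 >= y2  # cue1 fully contains cue2
        # Assumption: cues that do not overlap in time are also accepted.
        disjoint = y1 <= x2 or y2 <= x1
        if not (within or contains or disjoint):
            return False
    return True
```

Applied to the two examples above, the first set of timings passes the check and the second, with cues at 0 to 60 seconds and 30 to 90 seconds, fails it.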

Parsing

WebVMT file parsing is similar to WebVTT parsing, though many of those steps can be skipped as WebVMT files are metadata files.

WebVMT File Parsing

A WebVMT parser, given an input byte stream, a text track list of cues |output|, and a collection of CSS style sheets |stylesheets|, must decode the byte stream using the UTF-8 decode algorithm, and then must parse the resulting string according to the WebVMT parser algorithm. This results in WebVMT cues being added to |output|, and CSS style sheets being added to |stylesheets|.

A WebVMT parser, specifically its conversion and parsing steps, is typically run asynchronously, with the input byte stream being updated incrementally as the resource is downloaded; this is called an incremental WebVMT parser.

A WebVMT parser verifies a file signature before parsing the provided byte stream. If the stream lacks this WebVMT file signature, then the parser aborts.

The WebVMT parser algorithm is as follows:

Let |input| be the string being parsed, after conversion to Unicode, and with the following transformations applied:
- Replace all U+0000 NULL characters by � U+FFFD REPLACEMENT CHARACTERs.
- Replace each U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair by a single U+000A LINE FEED (LF) character.
- Replace all remaining U+000D CARRIAGE RETURN (CR) characters by U+000A LINE FEED (LF) characters.
Let |position| be a pointer into |input|, initially pointing at the start of the string. In an incremental WebVMT parser, when this algorithm (or further algorithms that it uses) moves the |position| pointer, the user agent must wait until appropriate further characters from the byte stream have been added to |input| before moving the pointer, so that the algorithm never reads past the end of the |input| string. Once the byte stream has ended, and all characters have been added to |input|, then the |position| pointer may, when so instructed by the algorithms, be moved past the end of |input|.
Let |seen cue| be false.
If |input| is less than six characters long, then abort these steps. The file does not start with the correct WebVMT file signature and was therefore not successfully processed.
If |input| is exactly six characters long but does not exactly equal "WEBVMT", then abort these steps. The file does not start with the correct WebVMT file signature and was therefore not successfully processed.
If |input| is more than six characters long but the first six characters do not exactly equal "WEBVMT", or the seventh character is not a U+0020 SPACE character, a U+0009 CHARACTER TABULATION (tab) character, or a U+000A LINE FEED (LF) character, then abort these steps. The file does not start with the correct WebVMT file signature and was therefore not successfully processed.
Collect a sequence of code points that are not U+000A LINE FEED (LF) characters.
If |position| is past the end of |input|, then abort these steps. The file was successfully processed, but it contains no useful data and so no WebVMT cues were added to |output|.
The character indicated by |position| is a U+000A LINE FEED (LF) character. Advance |position| to the next character in |input|.
If |position| is past the end of |input|, then abort these steps. The file was successfully processed, but it contains no useful data and so no WebVMT cues were added to |output|.
Header: If the character indicated by |position| is not a U+000A LINE FEED (LF) character, then collect a WebVMT block with the |in header| flag set. Otherwise, advance |position| to the next character in |input|.
Collect a sequence of code points that are U+000A LINE FEED (LF) characters.
Let |map| be null.
Let |media metadata| be null.
Block loop: While |position| doesn’t point past the end of |input|:
1. Collect a WebVMT block, and let |block| be the returned value.
2. If |block| is a WebVMT cue, add |block| to the text track list of cues |output|.
3. Otherwise, if |block| is a CSS style sheet, add |block| to |stylesheets|.
4. Otherwise, if |block| is a WebVMT map object, let |map| be |block|.
5. Otherwise, if |block| is a WebVMT media object, let |media metadata| be |block|.
6. Collect a sequence of code points that are U+000A LINE FEED (LF) characters.
End: The file has ended. Abort these steps. The WebVMT parser has finished. The file was successfully processed.
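The input transformations and the three signature checks at the start of the algorithm above can be sketched as follows. This is an illustrative, non-incremental sketch that assumes the byte stream has already been decoded to a string. The function names `preprocess` and `has_signature` are hypothetical.

```python
def preprocess(text: str) -> str:
    """Apply the three input transformations, in the order given above."""
    text = text.replace("\u0000", "\ufffd")  # NULL -> REPLACEMENT CHARACTER
    text = text.replace("\r\n", "\n")        # CRLF pair -> single LF
    return text.replace("\r", "\n")          # remaining CR -> LF


def has_signature(text: str) -> bool:
    """Return True if preprocessed text starts with a valid WebVMT signature."""
    if not text.startswith("WEBVMT"):
        # Covers inputs shorter than six characters as well as mismatches.
        return False
    # Exactly "WEBVMT", or followed by SPACE, TAB or LF.
    return len(text) == 6 or text[6] in " \t\n"
```

Because carriage returns are normalized first, a file that begins with "WEBVMT" followed by CRLF passes the check once `preprocess` has been applied.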

When the algorithm above says to collect a WebVMT block, optionally with a flag |in header| set, the user agent must run the following steps:

Let |input|, |position| and |seen cue| be the same variables as those of the same name in the algorithm that invoked these steps.
Let |line count| be zero.
Let |previous position| be |position|.
Let |line| be the empty string.
Let |buffer| be the empty string.
Let |seen EOF| be false.
Let |seen arrow| be false.
Let |cue| be null.
Let |stylesheet| be null.
Let |map| be null.
Let |media metadata| be null.
Loop: Run these substeps in a loop:
1. Collect a sequence of code points that are not U+000A LINE FEED (LF) characters. Let |line| be those characters, if any.
2. Increment |line count| by 1.
3. If |position| is past the end of |input|, let |seen EOF| be true. Otherwise, the character indicated by |position| is a U+000A LINE FEED (LF) character; advance |position| to the next character in |input|.
4. If |line| contains the three-character substring "-->" (- U+002D HYPHEN-MINUS, - U+002D HYPHEN-MINUS, > U+003E GREATER-THAN SIGN), then run these substeps:
  1. If |in header| is not set and at least one of the following conditions is true:
    - |line count| is 1
    - |line count| is 2 and |seen arrow| is false
    ...then run these substeps:
    1. Let |seen arrow| be true.
    2. Let |previous position| be |position|.
    3. Cue creation: Let |cue| be a new WebVMT cue and initialize it as follows:
      1. Let |cue|'s text track cue identifier be |buffer|.
      2. Let |cue|'s text track cue pause-on-exit flag be false.
      3. Let |cue|'s cue text be the empty string.
    4. Collect WebVMT cue timings from |line| for |cue|. If that fails, let |cue| be null. Otherwise, let |buffer| be the empty string and let |seen cue| be true.
    Otherwise, let |position| be |previous position| and break out of loop.
5. Otherwise, if |line| is the empty string, break out of loop.
6. Otherwise, run these substeps:
  1. If |in header| is not set and |line count| is 2, run these substeps:
    1. If |seen cue| is false and |buffer| starts with the substring "STYLE" (S U+0053 LATIN CAPITAL LETTER S, T U+0054 LATIN CAPITAL LETTER T, Y U+0059 LATIN CAPITAL LETTER Y, L U+004C LATIN CAPITAL LETTER L, E U+0045 LATIN CAPITAL LETTER E), and the remaining characters in |buffer| (if any) are all ASCII whitespace, then run these substeps:
      1. Let |stylesheet| be the result of creating a CSS style sheet, with the following properties:
        - location: null
        - parent CSS style sheet: null
        - owner node: null
        - owner CSS rule: null
        - media: The empty string.
        - title: The empty string.
        - alternate flag: Unset.
        - origin-clean flag: Set.
      2. Let |buffer| be the empty string.
    2. Otherwise, if |seen cue| is false and |buffer| starts with the substring "MAP" (M U+004D LATIN CAPITAL LETTER M, A U+0041 LATIN CAPITAL LETTER A, P U+0050 LATIN CAPITAL LETTER P), and the remaining characters in |buffer| (if any) are all ASCII whitespace, then run these substeps:
      1. Map creation: Let |map| be a new WebVMT map.
      2. Let |buffer| be the empty string.
    3. Otherwise, if |seen cue| is false and |buffer| starts with the substring "MEDIA" (M U+004D LATIN CAPITAL LETTER M, E U+0045 LATIN CAPITAL LETTER E, D U+0044 LATIN CAPITAL LETTER D, I U+0049 LATIN CAPITAL LETTER I, A U+0041 LATIN CAPITAL LETTER A), and the remaining characters in |buffer| (if any) are all ASCII whitespace, then run these substeps:
      1. Media creation: Let |media metadata| be a new WebVMT media.
      2. Let |buffer| be the empty string.
  2. If |buffer| is not the empty string, append a U+000A LINE FEED (LF) character to |buffer|.
  3. Append |line| to |buffer|.
  4. Let |previous position| be |position|.
7. If |seen EOF| is true, break out of loop.
If |cue| is not null, let the cue text of |cue| be |buffer|, and return |cue|.
Otherwise, if |stylesheet| is not null, then parse a stylesheet from |buffer|. If it returned a list of rules, assign the list as |stylesheet|'s CSS rules; otherwise, set |stylesheet|'s CSS rules to an empty list. Finally, return |stylesheet|.
Otherwise, if |map| is not null, then collect WebVMT map settings from |buffer| using |map| for the results. Construct a WebVMT map object from |map|, and return it.
Otherwise, if |media metadata| is not null, then collect WebVMT media settings from |buffer| using |media metadata| for the results. Construct a WebVMT media object from |media metadata|, and return it.
Otherwise, return null.
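The test that decides whether a block is a style sheet, map or media block depends only on the first line held in |buffer| and on the |seen cue| flag. The following sketch isolates that test, assuming |in header| is not set and |line count| is 2. The function name and the returned labels are hypothetical, and the rest of the block collection loop is omitted.

```python
ASCII_WHITESPACE = "\t\n\f\r "

# Keyword -> label for the kind of block it introduces (labels are illustrative).
BLOCK_KEYWORDS = (("STYLE", "stylesheet"), ("MAP", "map"), ("MEDIA", "media"))


def classify_block_header(buffer: str, seen_cue: bool = False):
    """Return the block kind introduced by buffer, or None if it is not a header."""
    if seen_cue:
        # STYLE, MAP and MEDIA blocks are only recognized before the first cue.
        return None
    for keyword, kind in BLOCK_KEYWORDS:
        rest = buffer[len(keyword):]
        if buffer.startswith(keyword) and all(c in ASCII_WHITESPACE for c in rest):
            return kind
    return None
```

For example, `classify_block_header("MAP  ")` yields `"map"`, whereas `"MAPS"` or any header encountered after a cue yields `None`, so such a line is treated as ordinary block content.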

WebVMT Map Settings Parsing

When the WebVMT parser algorithm says to collect WebVMT map settings from a string |input| for a text track, the user agent must run the following algorithm.

A WebVMT map object is a conceptual construct to represent a WebVMT map that is used as a root node for WebVMT node objects. This algorithm returns a WebVMT map object.

Let |settings| be the result of splitting |input| on spaces.
For each token |setting| in the list |settings|, run the following substeps:
1. If |setting| does not contain a : U+003A COLON character, or if the first : U+003A COLON character in |setting| is either the first or last character of |setting|, then jump to the step labeled next setting.
2. Let |name| be the leading substring of |setting| up to and excluding the first : U+003A COLON character in that string.
3. Let |value| be the trailing substring of |setting| starting from the character immediately after the first : U+003A COLON character in that string.
4. Run the appropriate substeps that apply for the value of |name|, as follows:
5. Next setting: Continue to the next setting, if any.

WebVMT Media Settings Parsing

When the WebVMT parser algorithm says to collect WebVMT media settings from a string |input| for a text track, the user agent must run the following algorithm.

A WebVMT media object is a conceptual construct to represent a WebVMT media. This algorithm returns a WebVMT media object.

Let |settings| be the result of splitting |input| on spaces.
For each token |setting| in the list |settings|, run the following substeps:
1. If |setting| does not contain a : U+003A COLON character, or if the first : U+003A COLON character in |setting| is either the first or last character of |setting|, then jump to the step labeled next setting.
2. Let |name| be the leading substring of |setting| up to and excluding the first : U+003A COLON character in that string.
3. Let |value| be the trailing substring of |setting| starting from the character immediately after the first : U+003A COLON character in that string.
4. Run the appropriate substeps that apply for the value of |name|, as follows:
5. Next setting: Continue to the next setting, if any.
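The map and media settings algorithms above share the same tokenizing steps: split the input on spaces, then split each token at its first colon. The sketch below covers only those shared steps. It assumes that "splitting on spaces" means splitting on runs of ASCII whitespace, and the setting names in the usage example are illustrative only. The per-name substeps are not shown.

```python
import re

ASCII_WHITESPACE = "\t\n\f\r "


def split_settings(text: str):
    """Return (name, value) pairs for each well-formed name:value token in text."""
    pairs = []
    for setting in re.split(r"[\t\n\f\r ]+", text.strip(ASCII_WHITESPACE)):
        colon = setting.find(":")
        # Skip tokens with no colon, or whose first colon is the first or last character.
        if colon <= 0 or colon == len(setting) - 1:
            continue
        pairs.append((setting[:colon], setting[colon + 1:]))
    return pairs


# Illustrative input: setting names and values here are assumptions.
example = split_settings("lat:36.1\nlng:-115.1 rad:20000 bad :x y: url:http://a")
```

Only the first colon separates name from value, so a value may itself contain colons, as in the last token of the example.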

WebVMT Cue Timings Parsing

When the algorithm above says to collect WebVMT cue timings from a string |input| for a WebVMT cue |cue|, the user agent must run the following algorithm.

Let |input| be the string being parsed.
Let |position| be a pointer into |input|, initially pointing at the start of the string.
Skip whitespace.
Collect a WebVMT timestamp. If that algorithm fails, then abort these steps and return failure. Otherwise, let |cue|'s text track cue start time be the collected time.
Skip whitespace.
If the character at |position| is not a - U+002D HYPHEN-MINUS character then abort these steps and return failure. Otherwise, move |position| forwards one character.
If the character at |position| is not a - U+002D HYPHEN-MINUS character then abort these steps and return failure. Otherwise, move |position| forwards one character.
If the character at |position| is not a > U+003E GREATER-THAN SIGN character then abort these steps and return failure. Otherwise, move |position| forwards one character.
Skip whitespace.
If |position| is not past the end of |input| and the character at |position| is an ASCII digit, collect a WebVMT timestamp. If that algorithm fails, then abort these steps and return failure. Otherwise, let |cue|'s text track cue end time be the collected time.
Otherwise (|position| is past the end of |input| or the character at |position| is not an ASCII digit), let |cue|'s text track cue end time be the value positive Infinity.

When this specification says that a user agent is to collect a WebVMT timestamp, the user agent must run the following steps:

Let |input| and |position| be the same variables as those of the same name in the algorithm that invoked these steps.
Let |most significant units| be minutes.
If |position| is past the end of |input|, return an error and abort these steps.
If the character at |position| is a - U+002D HYPHEN-MINUS character, let |negative| be true and move |position| forwards one character. Otherwise, let |negative| be false.
If |position| is beyond the end of |input| or if the character indicated by |position| is not an ASCII digit, then return an error and abort these steps.
Collect a sequence of code points that are ASCII digits, and let |string| be the collected substring.
Interpret |string| as a base-ten integer. Let |value1| be that integer.
If |string| is not exactly two characters in length, or if |value1| is greater than 59, let |most significant units| be hours.
If |position| is beyond the end of |input| or if the character at |position| is not a : U+003A COLON character, then return an error and abort these steps. Otherwise, move |position| forwards one character.
Collect a sequence of code points that are ASCII digits, and let |string| be the collected substring.
If |string| is not exactly two characters in length, return an error and abort these steps.
Interpret |string| as a base-ten integer. Let |value2| be that integer.
If |most significant units| is hours, or if |position| is not beyond the end of |input| and the character at |position| is a : U+003A COLON character, run these substeps:
1. If |position| is beyond the end of |input| or if the character at |position| is not a : U+003A COLON character, then return an error and abort these steps. Otherwise, move |position| forwards one character.
2. Collect a sequence of code points that are ASCII digits, and let |string| be the collected substring.
3. If |string| is not exactly two characters in length, return an error and abort these steps.
4. Interpret |string| as a base-ten integer. Let |value3| be that integer.
Otherwise (if |most significant units| is not hours, and either |position| is beyond the end of |input|, or the character at |position| is not a : U+003A COLON character), let |value3| have the value of |value2|, then |value2| have the value of |value1|, then let |value1| equal zero.
If |position| is beyond the end of |input| or if the character at |position| is not a . U+002E FULL STOP character, then return an error and abort these steps. Otherwise, move |position| forwards one character.
Collect a sequence of code points that are ASCII digits, and let |string| be the collected substring.
If |string| is not exactly three characters in length, return an error and abort these steps.
Interpret |string| as a base-ten integer. Let |value4| be that integer.
If |value2| is greater than 59 or if |value3| is greater than 59, return an error and abort these steps.
Let |result| be |value1|×60×60 + |value2|×60 + |value3| + |value4|/1000.
If |negative| is true, let |result| be 0 - |result|.
Return |result|.
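The cue timings and timestamp algorithms above translate fairly directly into code. The following is a sketch under stated assumptions: failure is signalled by raising ValueError, times are returned in seconds as floats, and the function names are hypothetical.

```python
import math

DIGITS = "0123456789"
ASCII_WHITESPACE = "\t\n\f\r "


def _collect(s, pos, allowed):
    """Collect consecutive characters from s at pos that are in allowed."""
    start = pos
    while pos < len(s) and s[pos] in allowed:
        pos += 1
    return s[start:pos], pos


def _expect(s, pos, char):
    """Require char at pos and step over it, else fail."""
    if pos >= len(s) or s[pos] != char:
        raise ValueError("expected %r at offset %d" % (char, pos))
    return pos + 1


def collect_timestamp(s, pos=0):
    """Return (seconds, new position), or raise ValueError on failure."""
    negative = pos < len(s) and s[pos] == "-"
    if negative:
        pos += 1
    string, pos = _collect(s, pos, DIGITS)
    if not string:
        raise ValueError("expected a digit")
    value1 = int(string)
    hours = len(string) != 2 or value1 > 59   # most significant units are hours
    pos = _expect(s, pos, ":")
    string, pos = _collect(s, pos, DIGITS)
    if len(string) != 2:
        raise ValueError("expected two digits")
    value2 = int(string)
    if hours or (pos < len(s) and s[pos] == ":"):
        pos = _expect(s, pos, ":")
        string, pos = _collect(s, pos, DIGITS)
        if len(string) != 2:
            raise ValueError("expected two digits")
        value3 = int(string)
    else:
        # No hours component: shift minutes and seconds down one place.
        value1, value2, value3 = 0, value1, value2
    pos = _expect(s, pos, ".")
    string, pos = _collect(s, pos, DIGITS)
    if len(string) != 3:
        raise ValueError("expected three digits")
    if value2 > 59 or value3 > 59:
        raise ValueError("minutes or seconds out of range")
    result = value1 * 3600 + value2 * 60 + value3 + int(string) / 1000
    return (-result if negative else result), pos


def collect_cue_timings(line):
    """Return (start, end) in seconds; end is infinity when omitted."""
    _, pos = _collect(line, 0, ASCII_WHITESPACE)
    start, pos = collect_timestamp(line, pos)
    _, pos = _collect(line, pos, ASCII_WHITESPACE)
    for char in "-->":
        pos = _expect(line, pos, char)
    _, pos = _collect(line, pos, ASCII_WHITESPACE)
    if pos < len(line) and line[pos] in DIGITS:
        end, pos = collect_timestamp(line, pos)
    else:
        end = math.inf
    return start, end
```

For the cue line `01:35.000 --> 03:00.000` this returns `(95.0, 180.0)`, and a line with no end timestamp, such as `00:30.000 -->`, returns an end time of positive infinity, matching the final step of the cue timings algorithm.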