The W3C Internationalization Tag Set (ITS) 2.0, developed by the W3C MultilingualWeb-LT Working Group, enhances the foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 has much in common with its predecessor, ITS 1.0, but provides additional concepts designed to foster the automated creation and processing of multilingual Web content. ITS 2.0 focuses on HTML and on XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF) as well as the Natural Language Processing Interchange Format (NIF).
The W3C MultilingualWeb-LT Working Group received funding from the European Commission (project LT-Web) through the Seventh Framework Programme (FP7) in the area of Language Technologies (Grant Agreement No. 287815). As part of their activities, members of the Working Group and the LT-Web project created various implementations that exemplify how ITS 2.0 supports the integration of automated processing of human language into core Web technologies. These implementations and the corresponding usage scenarios are sketched in this document. Each section of the document comprises the following:
This document describes usage scenarios and related implementations for Internationalization Tag Set (ITS) 2.0. ITS 2.0 enhances the foundation to integrate both automated and manual processing of human language into core Web technologies.
The work described in this document receives funding from the European Commission (project MultilingualWeb-LT (LT-Web)) through the Seventh Framework Programme (FP7) in the area of Language Technologies (Grant Agreement No. 287815).
The W3C MultilingualWeb-LT Working Group received funding from the European Commission (project MultilingualWeb-LT (LT-Web)) through the Seventh Framework Programme (FP7) in the area of Language Technologies (Grant Agreement No. 287815). As part of their activities, project members and members of the Working Group compiled a list of usage scenarios that exemplify how ITS 2.0 integrates automated processing of human language into core Web technologies. These usage scenarios, and the implementations realized by the Working Group, are sketched in this document. The usage scenarios comprise information such as the following:
Benefits:
Tool: Okapi Framework (ENLASO).
Implementation status/issues:
Benefits:
Tool: Okapi Framework (ENLASO).
Implementation status/issues:
Benefits:
Tool: Okapi Framework (ENLASO).
Implementation status/issues:
Benefits:
Benefits:
Benefits:
Additional data category (not part of ITS 2.0):
Tools (developed by Linguaserve):
Implementation status:
Implementation issues:
Benefits:
Tools:
Implementation issues:
Implementation status/issues:
Benefits:
Benefits:
Benefits:
Implementer: TCD/UL, making use of MT components by Moravia and DCU, and of JSI Enrycher as the text analysis service.
This tool is based on an ITS-XLIFF mapping:
All ITS data categories listed above, as encoded by Okapi or TCD's CMS-LION, are covered. The demos in mid-March, however, mainly show consumption of the following: translate, term, text analysis, domain, localization note, provenance, and MT confidence. The demos involve:
Please note that links to the running software are currently accessible only to the SOLAS system. They should become public next week.
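To illustrate what consuming some of these data categories can look like, the following is a minimal sketch that collects local ITS 2.0 annotations from an HTML5 fragment. It is not taken from any of the demos. The function and class names are hypothetical, the sample markup is invented for illustration, and the sketch only looks at local attributes (`translate`, `its-term`, `its-loc-note`, `its-mt-confidence`); it does not handle inheritance or global rules.

```python
from html.parser import HTMLParser

# Local ITS 2.0 attributes in HTML5 that this sketch looks for (a small subset).
ITS_ATTRIBUTES = ("translate", "its-term", "its-loc-note", "its-mt-confidence")


class ItsAnnotationCollector(HTMLParser):
    """Hypothetical helper: records elements carrying any attribute listed above."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are already lowercased.
        found = {name: value for name, value in attrs if name in ITS_ATTRIBUTES}
        if found:
            self.annotations.append((tag, found))


def collect_its_annotations(html_text):
    """Return a list of (tag, {attribute: value}) tuples in document order."""
    collector = ItsAnnotationCollector()
    collector.feed(html_text)
    collector.close()
    return collector.annotations


# Invented sample fragment, for illustration only.
sample = (
    '<p its-loc-note="Keep the product name in English.">'
    'Open <span translate="no">Okapi</span> and look up the '
    '<span its-term="yes">translation memory</span>.</p>'
)
annotations = collect_its_annotations(sample)
for tag, found in annotations:
    print(tag, found)
```

A real consumer would also need to resolve inherited values and global rules as defined by ITS 2.0; this sketch deliberately leaves that out.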
Benefits:
Tool: Drupal Module for editing and viewing of ITS 2.0 markup (Cocomore AG)
Tool: Drupal Module to connect to TMGMT Translator Linguaserve (Cocomore AG)
Tool: Drupal Module to interact with TMGMT Workflow (Cocomore AG)
Tool: ITS 2.0 jQuery Plugin (Cocomore AG)
Localization interoperability can be enhanced by using ITS 2.0 together with other standards. In particular, the following standards provide additional opportunities:
Benefits:
Where available, and not already specified by explicit ITS provenance annotation, annotatorsRef was used to derive PROV-O agent details for specific activities, e.g. text analysis and terminology.
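As a rough illustration of this fallback, the sketch below splits an annotatorsRef value into its entries, relying on the ITS 2.0 convention that the value is a space-separated list of `data-category-identifier|IRI` pairs. The function names, the sample IRIs, and the layout of the resulting records are assumptions made for this example; the records are plain dictionaries, not PROV-O syntax, and do not reflect how the implementation described here actually builds its provenance data.

```python
def parse_annotators_ref(value):
    """Split an annotatorsRef value into {data category identifier: IRI}."""
    annotators = {}
    for entry in value.split():
        # Each entry has the form "identifier|IRI"; split on the first "|".
        category, separator, iri = entry.partition("|")
        if not separator or not category or not iri:
            raise ValueError(f"malformed annotatorsRef entry: {entry!r}")
        annotators[category] = iri
    return annotators


def to_agent_records(annotators):
    """Build simple agent records (hypothetical layout, not PROV-O syntax)."""
    return [
        {"activity": category, "agent": iri}
        for category, iri in sorted(annotators.items())
    ]


# Invented sample value; the IRIs are placeholders.
sample_value = (
    "text-analysis|http://example.org/ta-service "
    "terminology|http://example.org/term-tool"
)
records = to_agent_records(parse_annotators_ref(sample_value))
for record in records:
    print(record["activity"], "->", record["agent"])
```

Mapping such records onto actual PROV-O agents and activities is a separate step that depends on the provenance model in use.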
Details:
Benefits:
Implementation issues and need for discussion:
Benefits: The Web Service API can be integrated into automated language processing workflows, for instance machine translation, localization, terminology management, and many other tasks that may benefit from terminology annotation.
The implementation has reached Milestone 2 (Initial HTML5 term tagging with simple visualization). The implementation of Milestone 3 (Enhanced HTML5 term tagging with full visualization) is ongoing.
XML-based source content such as XLIFF files is usually provided to translators or reviewers as reduced and partially transformed text, without any information about local or global context and without support for rendering or visualizing the content itself or the metadata embedded in it. This has negative effects on the quality of the final output and on the productivity of human workers.
The usage scenario allows content and metadata to be rendered in a browser, so that they can be read easily and interactively as reference material. The rendering includes special visual cues and interaction possibilities (such as colour-coding and pop-ups that display metadata). It is based on auxiliary files in HTML5+ITS 2.0 (including JavaScript) that are generated from ITS-annotated source content in any of the supported formats (XML, XLIFF, HTML).
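The colour-coding idea can be sketched with CSS attribute selectors that match local ITS 2.0 markup in HTML5. The example below is only an assumption about how such cues could be added: it injects a style block into an existing HTML5 file and uses no JavaScript, whereas the implementation described here also relies on JavaScript. The function name, the colours, and the choice of selectors are hypothetical.

```python
import re

# Hypothetical stylesheet: colour-codes a few local ITS 2.0 attributes and
# shows the localization note next to the element while it is hovered.
ITS_STYLE = """<style>
[translate="no"] { background-color: #ffd6d6; }
[its-term="yes"] { background-color: #d6ecff; }
[its-loc-note] { border-bottom: 1px dotted #666; }
[its-loc-note]:hover::after {
  content: " [" attr(its-loc-note) "]";
  color: #666;
}
</style>"""


def add_visual_cues(html_text):
    """Insert the style block before </head>, or prepend it if there is no head."""
    match = re.search(r"</head\s*>", html_text, flags=re.IGNORECASE)
    if match is None:
        return ITS_STYLE + "\n" + html_text
    start = match.start()
    return html_text[:start] + ITS_STYLE + "\n" + html_text[start:]


# Invented sample page, for illustration only.
page = (
    "<html><head><title>Preview</title></head><body>"
    '<p its-loc-note="Check the tone.">Press <span translate="no">F1</span>.</p>'
    "</body></html>"
)
preview = add_visual_cues(page)
print(preview)
```

Keeping the cues in a stylesheet means the annotated content itself stays unchanged; richer pop-ups for other metadata would need scripting on top of this.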
Implementer: Logrus
Implementation status: A prototype will display the Translate, Localization Note, and Terminology data categories at the MultilingualWeb Workshop in March 2013.
ILO uses Okapi capabilities for XLIFF handling and will be available in April 2013. The use of ILO will be presented at the MultilingualWeb Workshop in March 2013. The results of ILO development will be made publicly available under the open license LGPL v3 (the same as LibreOffice).
Benefits:
Renat Bikmatov (Logrus), David Filip (University of Limerick), Leroy Finn (Trinity College Dublin), Karl Fritsche (Cocomore AG), Serge Gladkoff (Logrus), Declan Groves (Centre for Next Generation Localisation (CNGL), Dublin City University), Milan Karasek (Moravia), Jirka Kosek (University of Economics, Prague), Kevin Lew (Spartan Software), Dave Lewis (Trinity College Dublin), Fredrik Liden (ENLASO Corporation), Shaun McCane ((public) Invited expert), Sean Mooney (University of Limerick), Pablo Nieto Caride (Linguaserve), Pēteris Ņikiforovs (Tilde), David O'Carrol (University of Limerick), Philip O'Duffy (University of Limerick), Mauricio del Olmo (Linguaserve), Mārcis Pinnis (Tilde), Phil Ritchie (VistaTEC), Nieves Sande (German Research Center for Artificial Intelligence (DFKI) GmbH), Felix Sasaki (W3C Fellow), Yves Savourel (ENLASO Corporation), Sebastian Sklarß (]init[ Europe), Ankit Srivastava (Centre for Next Generation Localisation (CNGL), Dublin City University), Tadej Štajner (Jozef Stefan Institute), Chase Tingley (Spartan Software), Asanka Wasala (University of Limerick), Clemens Weins (Cocomore AG).