LSP Localization Chain Side Use Case Demonstration

From MultilingualWeb-LT EC Project Wiki

1 Use Case Description

This implementation demonstrates the use of several ITS 2.0 data categories in an internal LSP localization workflow. It is part of a broader showcase called CMS – Localization Chain Integration. The content is generated in a CMS on the language service client side, sent to the LSP translation server, processed in the LSP internal localization workflow, downloaded by the client side and imported into the CMS.
In this use case only the LSP side will be described.
The interchange format will be XHTML (export from/import to the client CMS).
The following entities take part in this use case: client CMS, web services client, LSP web services server, LSP internal localization workflow, ITS pre-production/post-production XHTML engine, LSP-based Translation Process Managers, CAT tool, LSP-based Translators and LSP-based Reviewers.

The data categories used and their benefits are the following:

  • Translate: ensures that pieces of content marked as not translatable are left untranslated.
  • Localization Note: provides more context to the Process Managers, LSP-based Translators and LSP-based Reviewers so that they can do a better localization job.
  • Domain: provides more information to the LSP-based Translators and LSP-based Reviewers. The internal workflow also uses this information to select the dictionaries that the LSP-based Translators and LSP-based Reviewers will use as support in the localization job. Lastly, it is used to store, classify and select the translation memories.
  • Language Information: expresses the language of a given piece of content. It is useful for selecting the LSP-based Translators and LSP-based Reviewers and for determining the nature of the job. It also adds contextual information and helps them decide whether a piece of content should be translated.
  • Allowed Characters: provides a way to check character restrictions on certain elements of a document, to guarantee that the translated documents work properly on the client side.
  • Storage Size: provides a way to check size limits on a document or on elements within a document, to guarantee that the translated documents work properly on the client side.
  • Provenance: records which LSP-based Translator and Reviewer, and which organization, did the job, so that issues can be traced. Also, if the same content is translated a second time, the system will propose the same Translator/Reviewer that did the job in the first place.
  • Readiness: tells the LSP-based Translation Process Managers when the content was ready to process, when the language service client wants the job to be completed, the priority of the job compared with other concurrent jobs, and which processes are needed. All of this has a direct impact on how they organize the localization job (milestones and dates) and arrange it with the LSP-based Translators and LSP-based Reviewers.

2 Use Case Implementation

The implementation of this use case involves the following components:

  • Linguaserve GBCC: B2B SOAP web services upon OFBiz. Internal Linguaserve development. Defines the operations and requests that the client side can make to the Linguaserve server, the most important being the request for translation/revision and the download of translated contents.
  • Linguaserve PLINT: Internal localization workflow upon OFBiz. Internal Linguaserve development. Allows the processing of the files sent by the client to the LSP-based Translation Process Managers. This workflow performs the necessary preprocessing and postprocessing of the ITS tags before and after the translation and revision tasks. Supports the following ITS 2.0 data categories: Translate, Localization Note, Domain, Language Information, Allowed Characters, Storage Size, Provenance and Readiness.
  • Linguaserve ITS pre-production/post-production XML engine: Java classes. Transforms the CMS Drupal XML into a CAT-oriented XML to make the translation/revision work in the CAT tool easier, and reconstructs the CMS XML from the translated version of the CAT-oriented XML. Supports the following ITS 2.0 data categories: Translate, Localization Note, Domain, Language Information, Allowed Characters, Storage Size, Provenance and Readiness.
  • STAR Transit XV (CAT tool): tool used by the LSP-based Translators and LSP-based Reviewers to do their work.


The detailed usage of each data category is:

  • Translate: global and local usage.
  • Localization Note: global and local usage.
  • Domain: global usage.
  • Language Information: local usage.
  • Allowed Characters: local usage.
  • Storage Size: local usage.
  • Provenance: local usage.
  • Readiness: global usage.

3 Use Case Demonstration

  • Status:
      • Connection between the CMS client side and the LSP server side tested and working.
      • Client CMS - LSP localization workflow roundtrip tests made in coordination with Cocomore with Drupal XHTML files.
      • LSP workflow integrated engine tested with Drupal XHTML files for processing the selected usage of the data categories.
      • Data category usage integration with the localization workflow finished.
      • Ongoing translation of client contents.
  • Demonstration: https://www-pre.linguaserve.net/las_demos/control/MLWLTWP3DemoEngine user: demos password: demosLingu@serve
  • ITS Data Categories: Translate, Localization Note, Domain, Language Information, Allowed Characters, Storage Size, Provenance and Readiness. The Readiness data category is an ITS 2.0 extension.

4 Interoperability Behaviour

4.1 Step 1: Preproduction process

Localization workflow interaction:

  • Localization Note: when the note is of type alert, send a notification to the project manager and show a tooltip in the workflow.
  • Language Information: quality check to ensure that the language of the source content matches the web service parameter.
  • Storage Size: quality check for the original content.
  • Readiness: control of the processes to be done. The availability and delivery dates are registered. Priority control.
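
The Language Information quality check above could look roughly like the following. This is a minimal sketch, not the Linguaserve implementation: the function name source_language_matches and the rule of comparing only the primary language subtag are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def source_language_matches(xhtml: str, expected_lang: str) -> bool:
    """Return True if the lang attribute on <body> matches the expected source language.

    Hypothetical helper: expected_lang stands in for the web service parameter.
    """
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return False
    declared = body.get("lang", "")
    # Assumption: only the primary subtag is compared, case-insensitively,
    # so "de" and "de-DE" are treated as the same source language.
    return declared.split("-")[0].lower() == expected_lang.split("-")[0].lower()


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    '<body id="36672" lang="de"><div>VDMA</div></body></html>'
)
print(source_language_matches(sample, "de"))
print(source_language_matches(sample, "fr"))
```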


The XHTML engine treats the data categories in the internal preproduction process as follows:

Preproduction process
Data category        | Global                                   | Local
Translate            | Omit selected not translatable contents. | A particular node could be not translatable. Block parts of the content marked as not translatable.
Localization Note    | Inform the translator.                   | Block and inform the translator.
Domain               | Inform the translator.                   | -
Language Information | -                                        | Inform the translator.
Storage Size         | -                                        | Inform the translator.
Readiness            | Inform the project manager.              | -

Example CMS XHTML source file:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsx="http://www.w3.org/2008/12/its-extensions" its:version="2.0">
 <head>
  <meta name="DC.subject" content="'Angewandte Wissenschaft', 'Unternehmen', 'Maschinenbau', 'Allgemein', 'Anlagenbau', 'Kunststoff- und Gummimaschinen', 'Technologien'"/>
  <script type="application/its+xml">
   <its:rules>
    <its:domainRule selector="/h:html/h:body" domainPointer="/h:html/h:head/h:meta[@name='DC.subject']/@content"/>
    <its:locNoteRule locNoteType="description" selector="/h:html/h:body">
     <its:locNote translate="no">Pressemitteilung</its:locNote>
    </its:locNoteRule>
    <itsx:readinessRule ready-at="21/01/2013 13:48:56:000 CET" priority="1/3" complete-by="19/02/2013 16:00:00:000 CET" ready-to-process="hTranslate, reviseQA, hReview, publish"/>
   </its:rules>
  </script>
 </head>
 <body id="36672" lang="de">
  <div id="36672-node_title" its-allowed-characters="[^&lt;&gt;]" its-storage-size="255">VDMA verstärkt den Kampf gegen Produktpiraterie</div>
  <div id="36672-body-0-value" its-allowed-characters=".">
   <p>17.05.2010 - Der Kampf gegen Produktpiraterie muss an vielen Fronten gleichzeitig geführt werden. Eine wichtige Maßnahme im
Kampf gegen die Verletzung von materiellem und geistigem Eigentum ist die juristische Unterstützung. Der <span translate="no">VDMA</span> setzt sich mit aller Vehemenz für den Schutz
seiner Mitgliedsunternehmen ein.</p>
   <p>„<span its-loc-note="Bitte korrekte sinngemäße Übersetzung mit Marc Wiesner absprechen." its-loc-note-type="description">Das Internet macht vieles
transparenter - beispielsweise auch die Verletzung von Schutzrechten</span>", betont <span translate="no">Marc Wiesner</span>, Experte für Produktpiraterie der Abteilung Recht im
<span translate="no">VDMA</span>. Viel schneller als früher bemerken es die Unternehmensvertreter heutzutage, wenn Produkte angeboten werden, die den eigenen täuschend ähnlich sind oder
illegale Nachahmungen darstellen. Das weltweite Datennetz hilft nicht nur beim Verkauf illegaler Waren, es bringt ebenso Rechtsverletzungen schnell und überall zutage.</p>
   <p>Der <span translate="no">VDMA</span> bietet seinen Mitgliedsunternehmen zusätzlich zu Publikationen und Informationen bei Veranstaltungen rechtliche Beratung speziell zu Verletzung
von <span lang="en">Know-how</span> und gewerblichen Schutzrechten an.</p>
  </div>
  <div id="36672-body-0-format" translate="no" its-allowed-characters=".">full_html</div>
 </body>
</html>

The result of this step is shown in the next step.

4.2 Step 2: Translation and revision

Localization workflow interaction:

  • Domain: automatic selection of CAT terminology and dictionaries. Selection of Translation Memories by domains.
  • Provenance: possibility to reassign the same translator/reviewer in new versions of the same content (based on identifiers). Inform the project manager.
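
The Domain values used for this selection come from the DC.subject meta element that the domain pointer in the source file refers to. The following is a minimal sketch of how they might be read and normalized; the function name extract_domains and the splitting and sorting rules are assumptions, chosen so that the output has the same form as the Domain reference node in the preprocessed file.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {"h": "http://www.w3.org/1999/xhtml"}


def extract_domains(xhtml: str) -> list:
    """Return the sorted domain names found in <meta name="DC.subject">.

    Hypothetical helper: assumes a comma-separated list of single-quoted
    values with no commas inside a value.
    """
    root = ET.fromstring(xhtml)
    meta = root.find("h:head/h:meta[@name='DC.subject']", NAMESPACES)
    if meta is None:
        return []
    raw = meta.get("content", "")
    domains = [part.strip().strip("'") for part in raw.split(",")]
    return sorted(d for d in domains if d)


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta name="DC.subject" content="'Angewandte Wissenschaft', 'Unternehmen', 'Maschinenbau', 'Allgemein', 'Anlagenbau', 'Kunststoff- und Gummimaschinen', 'Technologien'"/>
</head><body/></html>"""

print(", ".join(extract_domains(sample)))
```

A workflow could then use each returned name as a key when looking up dictionaries or translation memories; how that lookup is done is not described in this page.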


Example preprocessed CAT oriented XML file:

<?xml version="1.0" encoding="UTF-8"?>
<xlas version="2.0">
    <xlasUTrad codif="UTF-8" sourceFile="1214_193_node.xml" xmlGuid="193_node">
        <xlasRefTrad url=""><![CDATA[http://www.machines-for-plastics.com/kug/de/presse]]></xlasRefTrad>
        <xlasRefTrad it2dataCategory="Domain"><![CDATA[Allgemein, Angewandte Wissenschaft, Anlagenbau, Kunststoff- und Gummimaschinen, Maschinenbau, Technologien, Unternehmen]]></xlasRefTrad>
        <xlasRefTrad it2dataCategory="LocalizationNote"><![CDATA[Pressemitteilung]]></xlasRefTrad>
        <xlasTrad fieldName="node_title" xMax="255" xMaxOrig="48" xlasId="1"><![CDATA[VDMA verstärkt den Kampf gegen Produktpiraterie]]></xlasTrad>
        <xlasTrad fieldName="body-0-value" xlasId="2"><![CDATA[<p>17.05.2010 - Der Kampf gegen Produktpiraterie muss an vielen
Fronten gleichzeitig geführt werden. Eine wichtige Maßnahme im Kampf gegen die Verletzung von materiellem und geistigem Eigentum ist die juristische Unterstützung.
Der <xlasBloq xlasBloqId="1"><span translate="no" >VDMA</span></xlasBloq> setzt sich mit aller Vehemenz für den Schutz seiner Mitgliedsunternehmen ein.</p>
<p>„<span its-loc-note="Bitte korrekte sinngemäße Übersetzung mit Marc Wiesner absprechen." its-loc-note-type="description">Das Internet macht vieles
transparenter - beispielsweise auch die Verletzung von Schutzrechten</span>", betont <xlasBloq xlasBloqId="2"><span translate="no" >Marc Wiesner</span></xlasBloq>, 
Experte für Produktpiraterie der Abteilung Recht im <xlasBloq xlasBloqId="3"><span translate="no" >VDMA</span></xlasBloq>. Viel schneller als früher
bemerken es die Unternehmensvertreter heutzutage, wenn Produkte angeboten werden, die den eigenen täuschend ähnlich sind oder
illegale Nachahmungen darstellen. Das weltweite Datennetz hilft nicht nur beim Verkauf illegaler Waren, es bringt ebenso
Rechtsverletzungen schnell und überall zutage.</p>
<p>Der <xlasBloq xlasBloqId="4"><span translate="no" >VDMA</span></xlasBloq> bietet seinen Mitgliedsunternehmen zusätzlich zu Publikationen und Informationen
bei Veranstaltungen rechtliche Beratung speziell zu Verletzung von <span lang="en">Know-how</span> und gewerblichen Schutzrechten an.</p>
]]></xlasTrad>
    </xlasUTrad>
</xlas>

This file is imported into the CAT tool using an ad hoc filter, based on a filter for XML files with embedded HTML tags and modified to add the new special tags.
The content inside the xlasRefTrad nodes is reference content for the translators/reviewers. It is visible but blocked in the CAT tool.
The content inside the xlasTrad nodes is translatable content. The translators/reviewers can edit it, except for the standard HTML tags and the content between the special inline blocking tags (<xlasBloq>).
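
The inline blocking can be illustrated with a short sketch. It is not the actual Java engine: the function names block_untranslatable and unblock are hypothetical, and the regular expression assumes that translate="no" spans are not nested.

```python
import re

# Matches one non-nested <span ... translate="no" ...>...</span> element.
NO_TRANSLATE_SPAN = re.compile(
    r'<span\b[^>]*\btranslate="no"[^>]*>.*?</span>', re.DOTALL
)
# Matches opening and closing xlasBloq tags.
BLOCK_TAGS = re.compile(r"</?xlasBloq\b[^>]*>")


def block_untranslatable(fragment: str) -> str:
    """Wrap each translate="no" span in a numbered xlasBloq element."""
    counter = 0

    def wrap(match):
        nonlocal counter
        counter += 1
        return f'<xlasBloq xlasBloqId="{counter}">{match.group(0)}</xlasBloq>'

    return NO_TRANSLATE_SPAN.sub(wrap, fragment)


def unblock(fragment: str) -> str:
    """Remove the xlasBloq tags again, keeping the content they enclose."""
    return BLOCK_TAGS.sub("", fragment)


source = (
    'Der <span translate="no">VDMA</span> und '
    '<span translate="no">Marc Wiesner</span>, <span lang="en">Know-how</span>.'
)
blocked = block_untranslatable(source)
print(blocked)
print(unblock(blocked) == source)
```

Numbering the blocks lets each one be identified after translation, and removing the tags restores the original markup, which is the operation the postproduction step performs.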

The specific handling of each data category is the following:

  • Translate: The content of the XHTML div nodes with the local attribute translate="no" is not extracted. Additionally, in the translatable div nodes with HTML content, the preprocessing step adds xlasBloq tags before and after the pieces of content that are not translatable.
  • Localization Note: The content of the localization note (from the <its:locNote> node) is added in an xlasRefTrad node, blocked by the CAT filter but visible to the translators and reviewers, if it applies (selector attribute). Additionally, the contents of the its-loc-note attributes (local usage) are blocked by the CAT tool.
  • Domain: The content of the domain attribute (as established by the domain pointer "/h:html/h:head/h:meta[@name='DC.subject']/@content") is added in an xlasRefTrad node, blocked by the CAT filter but visible to the translators and reviewers, if it applies (selector attribute).
  • Language Information: The local lang attributes are blocked but visible to the translators and reviewers. Workflow usage: the source language information is obtained from the DB (originally a web service parameter), is always available to the LSP-based Translation Process Managers and is used to select the translators and reviewers.
  • Storage Size: The local its-storage-size attribute value is read and written to the xMax attribute of the xlasTrad node where it applies. The size of the original content is also calculated and written to the xMaxOrig attribute.
  • Readiness: Workflow usage: the priority information is obtained from the DB (originally a web service parameter) and is always available to the LSP-based Translation Process Managers. The expected completion date (complete-by parameter) is updated in the system DB in the preprocessing step.
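
The Storage Size values in the sample file can be reproduced with a short calculation. In the preprocessed file, xMaxOrig="48" equals the UTF-8 byte length of the 47-character German title, which suggests (the page does not state it) that sizes are counted in bytes of the UTF-8 encoding, the default storage encoding in ITS 2.0. The function names below are hypothetical.

```python
def storage_size(text: str, encoding: str = "utf-8") -> int:
    """Size of the text in bytes for the given encoding (assumed to be UTF-8)."""
    return len(text.encode(encoding))


def fits_storage(text: str, max_size: int, encoding: str = "utf-8") -> bool:
    """True if the encoded text does not exceed the its-storage-size limit."""
    return storage_size(text, encoding) <= max_size


# Title of the sample node; \u00e4 is the precomposed letter "ä".
title = "VDMA verst\u00e4rkt den Kampf gegen Produktpiraterie"
x_max = 255                       # from its-storage-size="255"
x_max_orig = storage_size(title)  # value written to xMaxOrig in this sketch

print(len(title), x_max_orig, fits_storage(title, x_max))
```

The same fits_storage check can be applied to the translated text in the postproduction step.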

4.3 Step 3: Postproduction process

Localization workflow interaction:

  • Domain: Storage and classification of Translation Memories by domains.
  • Readiness: control of processes to be done. Date control for availability and delivery.


The XHTML engine treats the data categories in the internal postproduction process as follows:

Postproduction process
Data category        | Global                                      | Local
Translate            | -                                           | Insert translation on the translatable nodes and undo blocking of parts of the content marked as not translatable.
Language Information | Update the lang attribute in the body node. | Update the language attributes in the translated contents.
Allowed Characters   | -                                           | Restriction compliance check.
Storage Size         | -                                           | Limitation compliance check.
Provenance           | -                                           | Add or update the data category attributes.
Readiness            | Update the data category node.              | -
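
The Allowed Characters compliance check can be sketched as follows. This is an illustration under assumptions, not the engine's code: it uses Python's re module as an approximation of the ITS 2.0 regular expression subset, lets "." match line breaks as well, and the function names are hypothetical.

```python
import re


def allowed_characters_ok(text: str, allowed: str) -> bool:
    """True if every character of text matches the its-allowed-characters value."""
    # The attribute value is a character class such as [^<>] or the wildcard ".".
    return re.fullmatch(f"(?:{allowed})*", text, re.DOTALL) is not None


def find_violations(text: str, allowed: str) -> list:
    """Return the characters of text that the restriction does not allow."""
    single = re.compile(allowed, re.DOTALL)
    return [ch for ch in text if not single.fullmatch(ch)]


title = "VDMA renforce la lutte contre le piratage des produits"
print(allowed_characters_ok(title, "[^<>]"))
print(find_violations("a <b> c", "[^<>]"))
```

Reporting the offending characters, rather than only a pass or fail result, makes it easier for a reviewer to correct the translation before delivery.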

Example translated and postprocessed CMS XHTML file:

<?xml version="1.0" encoding="UTF-8"?>
<html its:version="2.0" xmlns="http://www.w3.org/1999/xhtml" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
 <head>
  <meta content="'Allgemein', 'Angewandte Wissenschaft', 'Unternehmen', 'Maschinenbau', 'Anlagenbau', 'Kunststoff- und Gummimaschinen', 'Technologien'" name="DC.subject"/>
  <script type="application/its+xml">
   <its:rules>
    <its:domainRule domainPointer="/h:html/h:head/h:meta[@name='DC.subject']/@content" selector="/h:html/h:body"/>
    <its:locNoteRule locNoteType="description" selector="/h:html/h:body">
     <its:locNote translate="no">Pressemitteilung</its:locNote>
    </its:locNoteRule>
    <itsx:readinessRule complete-by="19/02/2013 16:00:00:000 CET" priority="1/3" ready-at="30/01/2013 17:46:27:744 CET" ready-to-process="hReview, publish"/>
   </its:rules>
  </script>
 </head>
 <body id="36672" its-org="Linguaserve" its-person="21686" its-rev-org="Linguaserve" its-rev-person="20697" lang="fr">
  <div id="36672-node_title" its-allowed-characters="[^&lt;&gt;]" its-storage-size="255">VDMA renforce la lutte contre le piratage des produits</div>
  <div id="36672-body-0-value" its-allowed-characters=".">
   <p>17/05/2010 - La lutte contre le piratage des produits doit être menée sur de nombreux fronts à la fois. L'assistance juridique est une mesure 
essentielle dans la lutte contre la violation de la propriété matérielle et intellectuelle. <span translate="no">VDMA</span> milite avec véhémence 
pour la protection de ses sociétés membres.</p>
   <p>« <span its-loc-note="Bitte korrekte sinngemäße Übersetzung mit Marc Wiesner absprechen." its-loc-note-type="description">Internet donne de 
la transparence à beaucoup de choses, dont notamment à la violation du droit d'auteur</span> », souligne <span translate="no">Marc Wiesner</span>, 
spécialiste en piratage des produits au sein du département juridique de
<span translate="no">VDMA</span>. De nos jours, les représentants des entreprises remarquent beaucoup plus rapidement qu'autrefois la 
ressemblance frappante entre les produits proposés et les leurs ou les contrefaçons illégales. Même si le réseau mondial de données aide à la vente de 
marchandises illégales, il met également au grand jour, rapidement et en tout lieu, des violations du droit.</p>
   <p><span translate="no">VDMA</span> propose également à ses sociétés membres un conseil juridique en matière de publications et 
d'informations diffusées lors de manifestations, particulièrement en matière de violation du <span lang="fr">savoir-faire</span> 
et des droits de propriété industrielle.</p>
  </div>
  <div id="36672-body-0-format" its-allowed-characters="." translate="no">full_html</div>
 </body>
</html>

This file is exported from the CAT tool and postprocessed in the internal localization workflow.
The specific handling of each data category is the following:

  • Translate: Insert the translation in the translatable XHTML nodes. Additionally, undo the blocking of the parts of the content marked as not translatable (remove the xlasBloq tags).
  • Language Information: Update the lang attribute in the body node, replacing the code of the source language with the code of the target language. Also update the local language attributes in the translated contents.
  • Allowed Characters: The postprocessing engine checks whether the content of the node fulfills the restriction indicated by the its-allowed-characters attribute.
  • Storage Size: The postprocessing engine checks whether the content of the node fulfills the restriction indicated by the its-storage-size attribute.
  • Provenance: The global data category tag is added or updated, showing the internal IDs of the translator (its-person) and the reviewer (its-rev-person) who did the job and the name of the LSP company (its-org and its-rev-org).
  • Readiness: The ready-to-process attribute is updated by deleting the processing steps already done and leaving only the next steps in the localization chain (hReview, publish). These tasks will be executed on the client CMS side: the import and publication of the translated content. The ready-at attribute is also updated with the current time stamp.
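
The Readiness update described in the last item can be sketched as below. The function names, the set of completed steps and the fixed "CET" zone label are assumptions for illustration; the time stamp layout is copied from the ready-at values in the sample files.

```python
from datetime import datetime


def remaining_steps(ready_to_process: str, done: set) -> str:
    """Drop the steps already done and keep the rest in their original order."""
    steps = [step.strip() for step in ready_to_process.split(",")]
    return ", ".join(step for step in steps if step and step not in done)


def format_ready_at(moment: datetime, zone_label: str = "CET") -> str:
    """Format a time stamp as dd/MM/yyyy HH:mm:ss:SSS followed by a zone label."""
    stamp = moment.strftime("%d/%m/%Y %H:%M:%S")
    millis = moment.microsecond // 1000
    return f"{stamp}:{millis:03d} {zone_label}"


before = "hTranslate, reviseQA, hReview, publish"
after = remaining_steps(before, {"hTranslate", "reviseQA"})
ready_at = format_ready_at(datetime(2013, 1, 30, 17, 46, 27, 744000))

print(after)
print(ready_at)
```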