Okapi Framework Changes Log - Jul-30-2010
Note: this document is common to both the okapi-lib distribution and
the okapi-apps distribution. The information pertaining to
applications other than Tikal is relevant only for the okapi-apps
distribution.
Changes from M7 to M8
- Installation:
- Added a distribution for the Windows 64-bit platform.
- Rainbow:
- Fixed the bug where the initial character of input file was truncated if
root had a final slash or backslash.
- Replaced the Line-Break conversion utility by the "Line-Break
conversion" pre-defined pipeline.
- Added the "Run Quality Check Session" command to the Tools menu.
- Fixed the issue #139 where a target SRX was required for segmentation in
"Translation Package Creation".
- CheckMate:
- Added CheckMate: a standalone application to run the quality checker.
- Translation resources:
- Added a first simple connector implementation for the TDA Search
services.
- Steps:
- Added the Term Extraction step.
- Added the Quality Check step, including support for
Language Tool Checker.
- Added the Line-Break Conversion step.
- Added the Image Modification step.
- Full-Width Conversion step:
- Added the option to convert the Squared Latin Abbreviations part of the CJK
Compatibility block to non-CJK.
- Added the option to convert some of the Letter-Like Symbols block to
simple character sequences.
- Format Conversion step:
- Added the output to Parallel Corpus Files (for example to use as input
for training MT systems)
- Added the option "Output only approved entries".
- Search and Replace step:
- Added support for \n, \r, \t, and \N in the replacement field when in
regex mode. Resolves issue #123.
- Filters:
- XML Filter:
- Added support for unique ID in pre-defined configuration for RESX files.
- Added the omitXMLDeclaration option to the parameters file.
- XMLStream Filter:
- Added new filter for streamed XML, e.g. to handle large documents.
- TTX Filter:
- Replaced ScoreInfo annotation by AltTranslation annotation.
- Added the option of escaping the character "greater-than" in output.
- Improved the support for overlapping TTX <df> tags.
- Trados RTF:
- Improved the RTF filter.
- Integrated it as Trados RTF filter (Reading mode only, and inline codes
only when represented with Trados styles). This filter
cannot be used for normal extract/merge operations, but is useable for any
function that requires only extraction.
- Table Filter:
- Fixed issue #138 where tab was not useable as separator in "csv" mode.
- Fixed issue #136 where a defined Record ID was not set properly.
- Fixed issue #137 where the Source column of Source was incorrectly set
- Libraries:
- Added getDefaultConfigurationFromExtension() to the filter
configuration mapper.
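As an illustration of the kind of folding described by the two new Full-Width Conversion step options above, here is a minimal Java sketch. It is not Okapi's implementation: it assumes plain Unicode NFKC normalization from java.text.Normalizer, which folds every compatibility character (a broader set than the two options listed), and the class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Okapi: folds compatibility characters
// into plain character sequences using Unicode NFKC normalization.
class CompatibilityFoldingSketch {

    // Returns the NFKC form of the text. Squared Latin abbreviations,
    // letter-like symbols and full-width letters are among the characters
    // that have compatibility mappings.
    static String toPlainSequence(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+338F SQUARE KG (CJK Compatibility block)
        System.out.println(toPlainSequence("\u338F"));
        // U+2121 TELEPHONE SIGN (Letterlike Symbols block)
        System.out.println(toPlainSequence("\u2121"));
        // U+FF21 U+FF22: full-width Latin capital letters A and B
        System.out.println(toPlainSequence("\uFF21\uFF22"));
    }
}
```

A dedicated step would normally restrict the mapping to the blocks selected in its options rather than normalizing the whole text as this sketch does.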
Changes from M6 to M7
- Rainbow:
- Fixed issue where output example was not updated when the top input
file was removed.
- Fixed issue where pipeline file was not written as UTF-8.
- Translation Package Creation:
- Fixed issue #132 where we generated segment <mrk> in XLIFF if the text
was pre-translated but not segmented.
- ID-Based Alignment:
- Implemented request #134: A TMX output can now be created for
un-aligned entries.
- Libraries:
- Changed the SVN structure to allow checking-out and building the
libraries separately from the UI and apps. To get the base libraries
only:
http://okapi.googlecode.com/svn/trunk/okapi. To get everything:
http://okapi.googlecode.com/svn/trunk.
- Changed the TextContainer class and refactored all dependencies. This
modification is a major code change.
- Added the setRootDirectory() method to the IQuery interface.
- Updated QueryManager to handle empty inline codes and inline codes
with references when leveraging fuzzy matches.
- Added spin-like input part to the generic editor.
- Fixed bug where platform type for "cocoa" was not handled, and
therefore Mac not detected in some occurrences.
- Added support for ftnsep, ftnsepc, aftnsep and aftnsepc control
words in the RTF parser, so any defined paragraph or character is
skipped.
- Added the following generic UI parts to the generic editor:
SpinInputPart and SeparatorPart.
- The ScoresAnnotation class has been deprecated; use the new
AltTranslationsAnnotation instead.
- Fixed help location issue for SRX editor (Ratel).
- Translation resources:
- Updated all the connectors for the IQuery change and implemented
${rootDir} for all the connectors using local files: SimpleTM and Pensieve.
- Apertium MT:
- Cross-Language Gateway MT services:
- Filters:
- OpenXML filter:
- Fixed an issue with open/closing group in some conditions.
- Fixed an issue with a case of text box resulting in hanging.
- XML filter:
- Added pre-defined configuration for WiX (Windows Installer XML)
Localization files.
- Improved handling of empty elements.
- XLIFF filter:
- Improved the reading of pre-segmented content, so the segment IDs are
now preserved instead of re-generated.
- Fixed parent-id for StartSubDocument event/resource.
- Implemented read-only property for build-num in <file> and extradata in
<trans-unit>.
- Improved support for segmentation choices in output. Now the filter can
remove, add or keep the segmentation for each trans-unit.
- Vignette filter:
- Fixed issue of 64K limit of blocks (due to Java DataOutputStream
writeUTF() limitation): added multi-chunks write/read function.
- Ruby on Rails YAML filter:
- Added support for Ruby on Rails YAML filter. It offers partial support of
YAML files.
- Versified Text Filter:
- Added support for filter on versified text documents.
- HTML filter:
- Fixed default configurations to extract ALT attribute of AREA elements.
- TMX filter:
- Fixed the bug where the option "escape greater-than characters" was not
working.
- Steps:
- Implemented ${rootDir} for the following steps: Format Conversion,
Generate SimpleTM, Segmentation, TM Import, Leveraging, Batch Translation.
- Segmentation step:
- Made copy of source into empty target an option.
- Added the option of verifying source and target segments match after
segmentation.
- Added the "Diff Leverage" step.
- Added the "External Command" step.
- Sentence Alignment step:
- Added support to use a single bilingual input file.
- Format Conversion step:
- Added the option to generate output files with automated extension.
- Text Modification step:
- Implemented request #100: An option to modify or not entries without text.
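The 64K limit noted for the Vignette filter in this section comes from DataOutputStream.writeUTF(), which stores the encoded length in two bytes and so cannot write more than 65535 bytes per call. The following is a minimal sketch of a multi-chunk write/read under that constraint. The class name, the method names and the chunk-count prefix are assumptions made for illustration; they do not describe the format the filter actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not Okapi code: writes and reads strings of any
// length by splitting them into chunks that writeUTF() can accept.
class ChunkedUtfSketch {

    // A char takes at most 3 bytes in the modified UTF-8 used by writeUTF(),
    // so 65535 / 3 = 21845 chars always fit in one call.
    static final int MAX_CHARS_PER_CHUNK = 65535 / 3;

    // Writes the number of chunks, then each chunk with writeUTF().
    static void writeLongString(DataOutputStream out, String text) throws IOException {
        int chunks = (text.length() + MAX_CHARS_PER_CHUNK - 1) / MAX_CHARS_PER_CHUNK;
        out.writeInt(chunks);
        for (int i = 0; i < chunks; i++) {
            int start = i * MAX_CHARS_PER_CHUNK;
            int end = Math.min(text.length(), start + MAX_CHARS_PER_CHUNK);
            out.writeUTF(text.substring(start, end));
        }
    }

    // Reads the chunk count, then concatenates that many readUTF() results.
    static String readLongString(DataInputStream in) throws IOException {
        int chunks = in.readInt();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks; i++) {
            sb.append(in.readUTF());
        }
        return sb.toString();
    }

    // Writes the text to an in-memory buffer and reads it back.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                writeLongString(out, text);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return readLongString(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 100000 chars: too long for a single writeUTF() call.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append('x');
        }
        String big = sb.toString();
        System.out.println(roundTrip(big).equals(big));
    }
}
```

Splitting on a fixed character count keeps the sketch simple; a chunk may end between the two halves of a surrogate pair, which is harmless here because writeUTF() encodes each char independently.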
Changes from M5 (0.5.1) to M6
- Installation:
- Updated the Macintosh distributions with application bundles for Rainbow
and Ratel.
- Changed the Macintosh distributions to GunZIP files to preserve executable
flag of the shell scripts.
- Rainbow:
- Translation Package Creation:
- Fixed the issue where pre-segmented RTF output was losing referents in
target.
- Fixed the deletion of the empty TMX files when the package is zipped.
- Added English-India in the locales list.
- Fixed bug where steps using 3 input lists for more than 3 input files
were getting null values instead of raw documents.
- Added support for plugins for steps, filters and parameters editors.
Just drop the JAR in the dropins folder.
- Updated the way the utilities menu is stored.
- Replaced the "URI Conversion" utility by a pre-defined pipeline using the
"URI Conversion" step.
- Tikal:
- Added support for plugins for filters and parameters editors. Just drop
the JAR in the dropins folder.
- Steps:
- Format Conversion step:
- Fixed the issue where monolingual segmented input was not output
properly in tab-delimited format.
- Added the "Desegmentation" step.
- Added the "URI Conversion" step.
- Added Import/Export functions to the dialog box of the "Search and
Replace" step
- Libraries:
- Changed QueryManager:
- Allow code changes in target for the non-segmented queries.
- Prevent exact matches from having the target codes "adjusted" from the
source.
- Added setReferentCopies() to GenericSkeletonWriter to allow correct
output for writers referring more than once to the referents (e.g. when
creating pre-segmented RTF with source and target).
- Moved lib-plugins to common.
- Translation resources:
- Added in SimpleTM an option for code content and order difference
between query and source text
- Filters:
- HTML filter:
- Added support for inline codes using regular expressions.
- Table filter:
- Fixed issue #124 where part of the copy of the file configuration was
dropped for TSV files when creating package for XLIFF.
- TTX Filter:
- Fixed issue #130 where empty TargetLanguage attributes were not updated
with the target language code.
- XML filter:
- Improved the pre-defined configuration for Android resources files.
- Fixed issue #128: help example for codeFinder: count=1 is now count.i=1.
Changes from M5 (0.5) to M5 (0.5.1)
- Rainbow:
- Translation Package Creation:
- Fixed the bug where the encoder manager for RTF output was not properly
set and caused some formats like HTML, TMX, etc. to have un-escaped
characters.
- Changed the RTF writer to allow other skeleton writers than
GenericSkeletonWriter.
- Replaced the Search and Replace utility by the "Search and Replace with
Filter" and the "Search and Replace without Filter" pre-defined pipelines.
- Replaced the Text Rewriting utility by the "Text Rewriting" pre-defined
pipeline.
- Tikal:
- Fixed the issue of not having the HTML filter mapped when using the
Vignette filter.
- Added support for accessing Microsoft MT engine (-ms option).
- Translation resources:
- Added a connector for Microsoft MT Web services (http://api.microsofttranslator.com/V1/SOAP.svc).
A Microsoft Bing AppID is needed to use it. You can obtain one at
http://www.bing.com/developers/appids.aspx.
- Google MT: made it consistent with the other connectors: when the result
is the same as the target, the result is now returned.
- SimpleTM: Made the feature "penalize exact matches when target has
different codes than the query" an option (default is true, backward
compatible).
- Libraries:
- Fixed issue with GenericSkeletonWriter and in-line codes in segmented
text unit that were outside any segment.
- Fixed issue with GenericFilterWriter output stream not nullified in
close() (causing for example no output using FilterEventsToRawDocument).
- Steps / Pipeline:
- Added MULTI_EVENT (new resource and Event) handling to pipeline.
- Changed step handlers to return Event by default.
- Fixed the parameters setting bug that prevented saving the parameters for
pre-defined pipelines from one session to the next.
- Leveraging step:
- Fixed the bug that prevented entering a TMX path.
- Made adding an MT! prefix to the TMX entries an option.
- Added an option to enable/disable the step.
- Search and Replace step: Improved the behavior of the dialog box for
add/edit item.
- Format Conversion step: Fixed bug where the table-delimited output was
not closed properly for "one output per input" use case.
- Added Text Modification step.
- Filters:
- PHP Content filter: Added UI for the localization directives options
(default behaviour is the same).
- OpenXML filter: Changed the parameters editor to use GridLayout instead
of BorderLayout.
- TMX filter: Fixed losing original line-breaks between <tu> when
re-writing.
- Vignette filter: Fixed bug of un-escaped and non-CDATA RTF output.
- Properties filter: Added the option "Convert \n and \t to line-break and
tab".
- Table filter:
- Fixed issue #119 where csv action "Exclude leading/trailing..." was not
updated properly in the parameters editor
- Fixed issue #118 where some csv cases were not extracted properly
- Installation:
- Updated licence information for third-party packages.
- Removed all the dependencies to swing2swt.
Changes from M4 to M5
- Libraries:
- Changed minimum requirement to Java 1.6 instead of Java 1.5.
- Removed distribution for Mac Carbon, added distribution for Mac
Cocoa-64-bit.
- Updated to Lucene 3.0.0
- Refactored Pensieve TM engine, added new API.
- Rainbow:
- Added the duration of the process in the log.
- Updated the UI of the Pipeline Edit / Execute facility to make the
panels of each step accessible without clicking.
- Replaced the utility "Generate SimpleTM Database" by the pre-defined
pipeline "Import Into Pensieve TM" (the previous utility's functionality is
still available using a custom pipeline).
- Replaced the utility "Export SimpleTM Database" by the pre-defined
pipeline "Convert File Format" (the previous utility's functionality is
still available using a custom pipeline).
- Fixed issue with Text Rewriting and empty <target> for XLIFF input.
- Replaced the utility "Translation Comparison" by the pre-defined
pipeline "Translation Comparison".
- Added the pre-defined pipeline "Create Translations in Batch Mode"
- Replaced the utility "XSL Transformation" by the pre-defined pipeline
"XSL Transformation".
- Replaced the utility "Used Characters Listing" by the pre-defined
pipeline "Used Characters Listing".
- Ratel:
- Fixed selection bug in UI.
- Updated the default segmentation rules.
- Steps:
- Added Batch Translation step (tested with ProMT and Apertium).
- Added Codes Removal step
- Added Leveraging step
- Completed initial Tokenization and Word-Count steps.
- Added the Sentence Alignment step.
- Translation resources:
- Fixed issue with score > 100 in Pensieve TM.
- Added NCR support for Apertium connector.
- Filters:
- In the Properties Filter: Added pre-defined configuration for Skype's
.lang format.
- In the RTF parser:
- Fixed the issue with \'HHc being read as \'HH\'HH in some cases.
- Added support for additional DBCS encodings.
- Added TTX Filter for Trados TagEditor documents (Beta).
- In the HTML Filter: Added pre-defined configuration for well-formed files, providing groups
and extra meta-data.
- In the XML Filter: Changed the ITS extension idPointer to idValue and
modified its behavior to allow ID values to be generated from the
expression, not just from the content pointed by the expression. The values
are backward compatible, but any reference to idPointer in existing
parameters files will have to be renamed to idValue.
- Added the Vignette Filter for Vignette export XML documents (Alpha)
- Added the Pensieve Filter for reading and writing Pensieve translation
memories.
Changes from M3 to M4
- Filters:
- XLIFF filter: Added property for target-language and option to add it.
Changed some of the language selection behaviors and set fall-back to ID
option to false.
- Fixed several bugs in the OpenXML filter (MS Office 2007 documents)
- The JSON Filter has been added, to support for example AJAX or Palm
WebOS applications.
- The PHP Content Filter has been added, to support PHP include files.
- Added default DITA configuration to the XML Filter.
- Fixed several issues with the TS, Table, TMX, and XLIFF filters.
- Added whiteSpaces ITS extension support in the XML Filter.
- Library, Translation resources:
- All the TM and MT connectors have been moved to the package
net.sf.okapi.connectors.
- Modified the OpenTran connector to use the REST interface instead of
RPC.
- Added the connector to the MyMemory server (http://mymemory.translated.net)
- Improved Google MT connector.
- Improved GlobalSight TM connector for inline codes, and adjusted it for
GS version 7.1.6.
- Added Pensieve TM engine and its connector.
- Added the connector for the open-source Apertium MT system web
service (http://wiki.apertium.org/wiki/Main_Page)
- Changed language identification from String to LocaleId objects
across the whole framework.
- Steps and Rainbow utilities:
- Added the SimpleTM2TMX step.
- Added Import and Export utilities for SimpleTM files.
- Continued improving the Tokenization and WordCount steps.
- Implemented an option to select the XSLT processor to use with the
XSL Transformation utility.
- Updated the Translation Package Creation utility to
select from several resources for the pre-translation options, and to
allow specifying a threshold instead of "exact match only".
- Updated the Text Rewriting utility to select from
several resources for the translation options.
- Added the FormatConversion step.
- Improved inline compatibility in projects generated for OmegaT.
- Tikal:
- Added support for accessing the MyMemory repository (-mm option).
- Corrected display of extended characters on the console for some
languages/platforms.
- Added threshold and max-hits options for TM query command (-opt option).
- Added a command to create PO files from any input (-2po command).
- Added a command to create TMX files from any input (-2tmx command).
- Added a command to create Table files from any input (-2tbl command).
- Added capability to query a Pensieve TM (-pen option).
- Added support for accessing GlobalSight TM servers (-gs option).
- Added support for accessing Apertium MT servers (-apertium option).
- Added segmentation and leveraging options for the extraction command.
- Added a command to import any file into a Pensieve TM (-imp command).
- Added a command to export a Pensieve TM to a TMX file (-exp command).
Changes from M2 to M3
- The build system has been completely redone and now uses Maven as its
main builder. This has resulted in several changes in the structure of the
Okapi classes, and in the way the files are distributed.
- Filters:
- Added the TS Filter (beta) for Qt translation files.
- Fixed handling of fuzzy flag for plural entries in the PO filter.
- Fixed handling of approved, state and coord properties in the XLIFF
Filter.
- Improved XML Filter:
- Improved rewriting of document type subset declaration.
- Added support for protecting custom entity references.
- Added support for ID defined using xml:id or the idPointer ITS extension
feature.
- Properties Filter:
- Changed the default configuration to always escape output.
- Added pre-defined configuration for non-escaped output.
- Fixed various issues in the OpenXML Filter (docx, pptx, etc.), and
PO Filter.
- Libraries:
- The Google MT connector has been enhanced to have the inline codes taken
into account, not simply pushed to the end of the text.
- Fixed one error in default segmentation rules.
- Added a connector component for the Translate Toolkit TM server.
- Added steps such as Word-count and Tokenizer.
- The command-line tool Tikal has been added.
- Rainbow (okapi-apps distribution only):
- Improved handling of un-approved translations in TMX generated
during a translation package creation.
- Added option to choose to merge only approved translations in
translation package post-processing.
Changes from M1 to M2
- Filters:
- The DTD Filter has been added.
- The PlainText Filter has been added.
- The Table Filter has been added.
- Several pre-defined filter configurations have been added or updated:
Mozilla-RDF, XML Android Strings, XML Java properties, RESX, Monolingual PO,
SRT (Sub-titles), plain-text lines, plain-text paragraphs, CSV, etc.
- The OpenXML Filter (DOCX, PPTX, XLSX files) has been improved and now
provides much inline code simplification.
- The definition of the parameters for the RegEx Filter has been modified
to allow the support of target text, ID, etc. This new format is not
compatible with the one of M1.
- Other filters (HTML, Properties, XLIFF, TMX, PO, and OpenDocument
filters) have been improved.
- Libraries:
- A new TM connector to query remote GlobalSight TM servers has been
added. (See the Java Example05 of the okapi-lib distribution for an
illustration on how to use this component).
- A connector to query the remote OpenTran server has been added. (See the
Java Example05 of the okapi-lib distribution for an illustration on how to
use this component).
- New RawDocument object model.
- The events mechanism has been augmented to work with batch items in the
pipeline.
- The encoding detection and handling of BOM has been modified in most
filters and utilities.
- The pipeline mechanism has been extensively re-written.
- Many steps for the pipeline have been created; they are experimental for
now.
- Rainbow:
- Ratel:
- Better preservation of comments in SRX files; and capability to add
comments from within Ratel.
- Uses the latest libraries.