The basic ideas for automatic translation are: remembering text that has already been translated, recognizing text fragments that have to be excluded from automatic translation, and offering a correction option after the machine translation.
In an earlier post I already outlined the principle of a self-learning translation.
There are different kinds of text which have to be translated:
- UI texts of the application
  - labels - text that describes input fields and other parts of the UI
  - placeholders - text that is shown inside an input field
  - options - text for select lists
  - messages - generated by the program, excluding text fragments that must not be translated
  - hints for the user, help text - these will be avoided, because the app should be self-explanatory
  - logging data - will not be translated
- user texts
  - application-specific data entered by users
  - diary entries - these can be translated, so friends and co-readers can be international
  - comments on diary entries
- translation at build time, that means:
  - analyze the source code and certain tables and files, extract the text to be translated, translate it and deploy the matchings for the translation (a small extraction sketch follows after this list)
  - translation is done with the translation database or by calling the translation API, followed by manual check and correction
- translation at runtime, that means:
  - analyze the text that goes to the UI
  - translate from the matchings (translation database) or call the translation API
  - store new translations for later check and correction
  - show the translated text in the UI
  - runtime translation can be used for UI data and for user data
- combined solution
  - first, translation at build time
  - then translation at runtime, if necessary
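For illustration, a minimal sketch of the build-time extraction step. It assumes UI texts are wrapped in a hypothetical t('...') marker in the JavaScript sources; the marker name, file layout and output format are my own assumptions, not a finished implementation:

```javascript
// Build-time extraction sketch (assumption: UI texts are wrapped in a
// hypothetical t('...') marker; a real build would use a proper parser).
const fs = require('fs');
const path = require('path');

function extractTexts(sourceDir) {
  const found = new Set();
  for (const file of fs.readdirSync(sourceDir)) {
    if (!file.endsWith('.js')) continue;
    const source = fs.readFileSync(path.join(sourceDir, file), 'utf8');
    for (const match of source.matchAll(/\bt\(\s*'([^']+)'\s*\)/g)) {
      found.add(match[1]);
    }
  }
  return [...found];
}

// Write the extracted strings into a table that the translation database
// or the translation API fills in later, followed by manual check.
const table = {};
for (const text of extractTexts('./src')) {
  table[text] = { source: text, translated: null, checked: false };
}
fs.writeFileSync('extracted.en.json', JSON.stringify(table, null, 2));
```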
Translation is delivered based on the preferred language in the user options, compared with the language flag in the application and in the data (texts). It is expected that a user writes in the language he has chosen in the options and that he wants to read the texts in that language. For better efficiency, two languages can be chosen for reading.
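A small sketch of that decision, assuming each text carries a language flag and the user options hold a primary and an optional secondary reading language (the names are illustrative):

```javascript
// Decide whether a text has to be translated for the current user.
// The userOptions shape and the lang flag on the text are assumptions.
function targetLanguageFor(text, userOptions) {
  const readable = [userOptions.primaryLanguage, userOptions.secondaryLanguage]
    .filter(Boolean);
  // The text is already in one of the reading languages: show it as it is.
  if (readable.includes(text.lang)) return null;
  // Otherwise translate into the primary reading language.
  return userOptions.primaryLanguage;
}

// Example: a German diary entry shown to a user who reads English and French.
targetLanguageFor({ lang: 'de', value: 'Guten Morgen' },
                  { primaryLanguage: 'en', secondaryLanguage: 'fr' }); // -> 'en'
```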
The following aspects are relevant:
- resource files can be used, but they are not my preference
- string-matching database
  - the key is language and string; the content is the target language and the translated string
- hashcode-matching database
  - for every text a hashcode is calculated
  - the key is language and hashcode
  - the content is the text, the target language and the translated text, as well as the hashcode of the translated text
- exclusion from translation
  - first the text is examined for exclusions
  - exclusions are replaced by placeholders; the placeholders are stored together with the excluded text fragments
  - the hashcode is calculated over the text with the placeholders
  - the target text is looked up
  - the placeholders in the target text are substituted with the stored text fragments
  - the final target text can be shown
  - that means: before the translation, the exclusions have to be marked in the original text
- hashcodes and exclusions with placeholders are a very efficient basis for translation, an approach proven for decades (see the sketch after this list)
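A minimal sketch of the hashcode matching together with the exclusion placeholders. The hash function, the placeholder syntax ({{0}}, {{1}}, ...) and the in-memory map standing in for the database are assumptions:

```javascript
const crypto = require('crypto');

// Hashcode of the source text (with placeholders already inserted) plus language.
function textHash(lang, text) {
  return crypto.createHash('sha256').update(lang + '|' + text).digest('hex');
}

// Replace fragments that must not be translated (names, numbers, code ...)
// by numbered placeholders and remember the original fragments.
function maskExclusions(text, exclusionPattern) {
  const fragments = [];
  const masked = text.replace(exclusionPattern, (fragment) => {
    fragments.push(fragment);
    return '{{' + (fragments.length - 1) + '}}';
  });
  return { masked, fragments };
}

// Put the excluded fragments back into the translated target text.
function unmaskExclusions(translatedText, fragments) {
  return translatedText.replace(/\{\{(\d+)\}\}/g, (_, i) => fragments[Number(i)]);
}

// In-memory stand-in for the hashcode-matching database:
// key = source language + hashcode, content = target language and translated text.
const hashDb = new Map();

function lookupTranslation(sourceLang, targetLang, text) {
  // the exclusion pattern (here: @names and numbers) is only an example
  const { masked, fragments } = maskExclusions(text, /@\w+|\d+/g);
  const entry = hashDb.get(sourceLang + ':' + textHash(sourceLang, masked));
  if (!entry || entry.targetLang !== targetLang) return null;
  return unmaskExclusions(entry.translatedText, fragments);
}
```

Because the hashcode is calculated over the masked text, the same message with a different name or number still maps to one database entry.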
There are various web service offerings for machine translation. Google and Bing surely are the big players, but there are also smaller companies like SYSTRAN with good solutions.
Here Bing translation based on AJAX will be the first choice, with very attractive pricing: a generous free quantity per month.
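For illustration, a hedged sketch of such a translation call; it assumes the current Microsoft Translator Text REST API (v3), so endpoint, version and auth headers should be checked against the official documentation:

```javascript
// Sketch of a machine-translation call (assumption: Microsoft Translator Text API v3;
// verify endpoint, api-version and auth headers against the current documentation).
async function machineTranslate(text, fromLang, toLang, subscriptionKey, region) {
  const url = 'https://api.cognitive.microsofttranslator.com/translate'
            + `?api-version=3.0&from=${fromLang}&to=${toLang}`;
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': subscriptionKey,
      'Ocp-Apim-Subscription-Region': region,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify([{ Text: text }])
  });
  const result = await response.json();
  return result[0].translations[0].text;
}
```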
The principle is as follows:
- first check whether a text has already been translated (remember function; see the sketch after this list)
  - check against the translation cache (in memory)
  - check against the translation database
- if not, the machine translation API is invoked
- the result is stored in the cache and in the database
- the result is flagged as "not yet controlled"
- a dialog is provided to check all machine translations
  - correction option for the translation
  - tagging option for the text fragments that should not be translated
  - repeat option for the machine translation
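A sketch of that remember function; the Maps stand in for the real cache and translation database, machineTranslate refers to the API-call sketch above, and API_KEY / API_REGION are assumed configuration values:

```javascript
// Remember function: cache -> database -> machine translation API.
const translationCache = new Map();   // in memory
const translationStore = new Map();   // stand-in for the translation database

async function translateText(text, from, to) {
  const key = `${from}:${to}:${text}`; // string-matching key: language + string

  // 1. check against the translation cache (in memory)
  if (translationCache.has(key)) return translationCache.get(key).text;

  // 2. check against the translation database
  if (translationStore.has(key)) {
    const entry = translationStore.get(key);
    translationCache.set(key, entry);
    return entry.text;
  }

  // 3. invoke the machine translation API, store the result in cache and
  //    database, flagged as "not yet controlled" for the later check dialog
  // (machineTranslate, API_KEY and API_REGION are assumptions, see above)
  const translated = await machineTranslate(text, from, to, API_KEY, API_REGION);
  const entry = { text: translated, controlled: false };
  translationStore.set(key, entry);
  translationCache.set(key, entry);
  return translated;
}
```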
Now a certain amount of programming has to be done, and standards like http://i18next.com/ are checked to make the development more efficient.
A detail concerns singular and plural, i.e. pluralization; jed seems to be a good approach for that. Professional offerings for translation, like http://locize.com/, must be mentioned, but for starting my non-commercial app Bing will be preferred.
I will also look into https://medium.com/@jamuhl/translate-my-website-please-732ddb622cba#.paw31btxw, and formatting must not be forgotten: http://formatjs.io/guides/ and especially http://formatjs.io/guides/message-syntax/
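As a small illustration of the pluralization problem that jed and the formatjs message syntax address, a sketch using the built-in Intl.PluralRules; the message table and its keys are my own assumption, not the API of either library:

```javascript
// Pick the right plural form with the built-in Intl.PluralRules
// (the message table and the '#' convention are illustrative).
const forms = {
  one:   '# diary entry',
  other: '# diary entries'
};

function formatEntries(count, locale = 'en') {
  const category = new Intl.PluralRules(locale).select(count); // 'one', 'other', ...
  return forms[category].replace('#', String(count));
}

formatEntries(1); // "1 diary entry"
formatEntries(5); // "5 diary entries"
```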
The quality of machine translation depends on the "machine friendliness" of the text. There are some rules that support machine translation, and they usually enhance the readability for humans too. Some recommendations from http://translation-blog.multilizer.com/guidelines-for-writing-text-that-machine-can-translate-better/:
- write short sentences
- write full, grammatically correct sentences
- use common vocabulary
- avoid words that have several meanings
These recommendations should also be followed by programmers who write help texts, messages or documentation.