The basic ideas for automatic translation are: remembering text that has already been translated, recognizing text fragments that have to be excluded from automatic translation, and offering a correction option after the machine translation.
In an earlier post I already outlined the principle of a self-learning translation.
There are different kinds of text which have to be translated:
- UI texts of the application
  - labels - text that describes input fields and other parts of the UI
  - placeholders - text that is shown inside an input field
  - options - text for select lists
  - messages - generated by the program, excluding text fragments that must not be translated
  - hints for the user, help text - these will be avoided, because the app should be self-explanatory
  - logging data - will not be translated
- user texts
  - application-specific data entered by users
  - diary entries - these can be translated, so friends and co-readers can be international
  - comments on diary entries
- translation at build time, that means:
  - analyze the source code and certain tables and files, extract the text to be translated, translate it and deploy the matchings for the translation (a small extraction sketch follows after this list)
  - translation is done with the translation database or by calling the translation API, followed by manual check and correction
- translation at runtime, that means:
  - analyze the text that goes to the UI
  - translate from the matchings (translation database) or call the translation API
  - store new translations for later check and correction
  - show the translated text in the UI
  - runtime translation can be used for UI data and for user data
- combined solution
  - first, translation at build time
  - then translation at runtime, if necessary
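For illustration, a minimal sketch of the build-time extraction step. It assumes UI texts are wrapped in a hypothetical t('...') marker in the JavaScript sources; the marker name, file layout and output format are my own assumptions, not a finished implementation:

```javascript
// Build-time extraction sketch (assumption: UI texts are wrapped in a
// hypothetical t('...') marker; a real build would use a proper parser).
const fs = require('fs');
const path = require('path');

function extractTexts(sourceDir) {
  const found = new Set();
  for (const file of fs.readdirSync(sourceDir)) {
    if (!file.endsWith('.js')) continue;
    const source = fs.readFileSync(path.join(sourceDir, file), 'utf8');
    for (const match of source.matchAll(/\bt\(\s*'([^']+)'\s*\)/g)) {
      found.add(match[1]);
    }
  }
  return [...found];
}

// Write the extracted strings into a table that the translation database
// or the translation API fills in later, followed by manual check.
const table = {};
for (const text of extractTexts('./src')) {
  table[text] = { source: text, translated: null, checked: false };
}
fs.writeFileSync('extracted.en.json', JSON.stringify(table, null, 2));
```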
Translation is delivered based on the preferred language in the user options, compared with the language flag in the application and in the data (texts). It is expected that a user writes in the language he has chosen in the options and that he wants to read the texts in that language. For better efficiency, two languages can be chosen for reading.
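A small sketch of that decision, assuming each text carries a language flag and the user options hold a primary and an optional secondary reading language (the names are illustrative):

```javascript
// Decide whether a text has to be translated for the current user.
// The userOptions shape and the lang flag on the text are assumptions.
function targetLanguageFor(text, userOptions) {
  const readable = [userOptions.primaryLanguage, userOptions.secondaryLanguage]
    .filter(Boolean);
  // The text is already in one of the reading languages: show it as it is.
  if (readable.includes(text.lang)) return null;
  // Otherwise translate into the primary reading language.
  return userOptions.primaryLanguage;
}

// Example: a German diary entry shown to a user who reads English and French.
targetLanguageFor({ lang: 'de', value: 'Guten Morgen' },
                  { primaryLanguage: 'en', secondaryLanguage: 'fr' }); // -> 'en'
```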
The following aspects are relevant:
- resource files can be used, but they are not my preference
- string-matching database
  - the key is language and string; the content is the target language and the translated string
- hashcode-matching database
  - for every text a hashcode is calculated
  - the key is language and hashcode
  - the content is the text, the target language and the translated text, as well as the hashcode of the translated text
- exclusion from translation
  - first the text is examined for exclusions
  - exclusions are replaced by placeholders; the placeholders are stored together with the excluded text fragments
  - the hashcode is calculated over the text with the placeholders
  - the target text is looked up
  - the placeholders in the target text are substituted with the stored text fragments
  - the final target text can be shown
  - that means: before the translation, the exclusions have to be marked in the original text
- hashcodes and exclusions with placeholders are a very efficient basis for translation, an approach proven for decades (see the sketch after this list)
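A minimal sketch of the hashcode matching together with the exclusion placeholders. The hash function, the placeholder syntax ({{0}}, {{1}}, ...) and the in-memory map standing in for the database are assumptions:

```javascript
const crypto = require('crypto');

// Hashcode of the source text (with placeholders already inserted) plus language.
function textHash(lang, text) {
  return crypto.createHash('sha256').update(lang + '|' + text).digest('hex');
}

// Replace fragments that must not be translated (names, numbers, code ...)
// by numbered placeholders and remember the original fragments.
function maskExclusions(text, exclusionPattern) {
  const fragments = [];
  const masked = text.replace(exclusionPattern, (fragment) => {
    fragments.push(fragment);
    return '{{' + (fragments.length - 1) + '}}';
  });
  return { masked, fragments };
}

// Put the excluded fragments back into the translated target text.
function unmaskExclusions(translatedText, fragments) {
  return translatedText.replace(/\{\{(\d+)\}\}/g, (_, i) => fragments[Number(i)]);
}

// In-memory stand-in for the hashcode-matching database:
// key = source language + hashcode, content = target language and translated text.
const hashDb = new Map();

function lookupTranslation(sourceLang, targetLang, text) {
  // the exclusion pattern (here: @names and numbers) is only an example
  const { masked, fragments } = maskExclusions(text, /@\w+|\d+/g);
  const entry = hashDb.get(sourceLang + ':' + textHash(sourceLang, masked));
  if (!entry || entry.targetLang !== targetLang) return null;
  return unmaskExclusions(entry.translatedText, fragments);
}
```

Because the hashcode is calculated over the masked text, the same message with a different name or number still maps to one database entry.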
There are various web service offerings for machine translation. Google and Bing surely are the big players, but there are also smaller companies like SYSTRAN with good solutions.
Here Bing translation based on AJAX will be the first choice, with very attractive pricing: a generous free quantity per month.
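For illustration, a hedged sketch of such a translation call; it assumes the current Microsoft Translator Text REST API (v3), so endpoint, version and auth headers should be checked against the official documentation:

```javascript
// Sketch of a machine-translation call (assumption: Microsoft Translator Text API v3;
// verify endpoint, api-version and auth headers against the current documentation).
async function machineTranslate(text, fromLang, toLang, subscriptionKey, region) {
  const url = 'https://api.cognitive.microsofttranslator.com/translate'
            + `?api-version=3.0&from=${fromLang}&to=${toLang}`;
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': subscriptionKey,
      'Ocp-Apim-Subscription-Region': region,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify([{ Text: text }])
  });
  const result = await response.json();
  return result[0].translations[0].text;
}
```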
The principle is as follows:
- first check whether a text has already been translated (remember function; see the sketch after this list)
  - check against the translation cache (in memory)
  - check against the translation database
- if not, the machine translation API is invoked
- the result is stored in the cache and in the database
- the result is flagged as "not yet controlled"
- a dialog is provided to check all machine translations
  - correction option for the translation
  - tagging option for the text fragments that should not be translated
  - repeat option for the machine translation
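A sketch of that remember function; the Maps stand in for the real cache and translation database, machineTranslate refers to the API-call sketch above, and API_KEY / API_REGION are assumed configuration values:

```javascript
// Remember function: cache -> database -> machine translation API.
const translationCache = new Map();   // in memory
const translationStore = new Map();   // stand-in for the translation database

async function translateText(text, from, to) {
  const key = `${from}:${to}:${text}`; // string-matching key: language + string

  // 1. check against the translation cache (in memory)
  if (translationCache.has(key)) return translationCache.get(key).text;

  // 2. check against the translation database
  if (translationStore.has(key)) {
    const entry = translationStore.get(key);
    translationCache.set(key, entry);
    return entry.text;
  }

  // 3. invoke the machine translation API, store the result in cache and
  //    database, flagged as "not yet controlled" for the later check dialog
  // (machineTranslate, API_KEY and API_REGION are assumptions, see above)
  const translated = await machineTranslate(text, from, to, API_KEY, API_REGION);
  const entry = { text: translated, controlled: false };
  translationStore.set(key, entry);
  translationCache.set(key, entry);
  return translated;
}
```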
Now a certain amount of programming has to be done, and standards like http://i18next.com/ are checked to make the development more efficient.
A detail concerns singular and plural, i.e. pluralization; jed seems to be a good approach for that. Professional offerings for translation, like http://locize.com/, must be mentioned, but for starting my non-commercial app Bing will be preferred.
I will also look into https://medium.com/@jamuhl/translate-my-website-please-732ddb622cba#.paw31btxw, and formatting must not be forgotten: http://formatjs.io/guides/ and especially http://formatjs.io/guides/message-syntax/
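As a small illustration of the pluralization problem that jed and the formatjs message syntax address, a sketch using the built-in Intl.PluralRules; the message table and its keys are my own assumption, not the API of either library:

```javascript
// Pick the right plural form with the built-in Intl.PluralRules
// (the message table and the '#' convention are illustrative).
const forms = {
  one:   '# diary entry',
  other: '# diary entries'
};

function formatEntries(count, locale = 'en') {
  const category = new Intl.PluralRules(locale).select(count); // 'one', 'other', ...
  return forms[category].replace('#', String(count));
}

formatEntries(1); // "1 diary entry"
formatEntries(5); // "5 diary entries"
```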
The quality of machine translation depends on the "machine friendliness" of the text. There are some rules that support machine translation, and they usually enhance the readability for humans too. Some recommendations from http://translation-blog.multilizer.com/guidelines-for-writing-text-that-machine-can-translate-better/:
- write short sentences
- write full, grammatically correct sentences
- use common vocabulary
- avoid words that have several meanings
These recommendations should also be followed by programmers who write help texts, messages or documentation.