Machine translation Introduction to Computational Linguistics – 25 April 2017.

Author: Enikő Fodor

1 Machine translation Introduction to Computational Linguistics – 25 April 2017

2 Introduction: Translation of full texts from the source language (SL) to the target language (TL). Related: Computer Aided Translation (CAT). A hard task: the content should stay the same, but the morphological and syntactic rules of the target language must be observed.

3 Quality: MT quality is currently well below that of human translation, BUT for some specific (delimited) topics it is sufficient: recipes, weather forecasts, technical descriptions. The most desired NLP application throughout the world: „to help those with disabilities”

4 The task of translation: „I just look up the words, I translate the sentence, and it’s done”. Let 1 sentence contain 15 words on average. Let 1 word have 3 meanings on average. Then 1 sentence can have 3^15 (about 14 million) different translations * permutations due to word order changes * insertion or deletion of words.
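
The size of this search space can be checked with a few lines of Python. This is only the arithmetic from the slide (15 words, 3 meanings each); the variable names are illustrative, and the count ignores the extra factors for reordering, insertion and deletion.

```python
# Figures taken from the slide: average sentence length and senses per word.
words_per_sentence = 15
senses_per_word = 3

# Each word is translated independently, so the choices multiply.
combinations = senses_per_word ** words_per_sentence

print(combinations)  # 14348907
```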

5 The Vauquois pyramid (levels from top to base): Interlingua; source language semantics / target language semantics; source language syntax / target language syntax; source language words / target language words.

6 Approaches to MT: Dictionary based MT; Syntactic transformation based MT; Semantic transformation based MT.

7 Dictionary based MT: Electronic dictionary or bilingual lists of words. Words or phrases in the SL are matched to words or phrases in the TL. Very basic solution. Problematic issues are frequent.

8 Problems in word-to-word translation: The word cannot be found in the dictionary (the SL word is left in the TL text). Too many senses in the dictionary (the most frequent one (?) is selected).
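
Both problems show up in a minimal word-to-word translator. The sketch below is an illustration, not a real system: `TOY_DICT` is a hypothetical English-to-Spanish dictionary, and the order of senses in each list is assumed to reflect frequency.

```python
# Hypothetical toy dictionary; senses are assumed to be listed most frequent first.
TOY_DICT = {
    "the": ["la", "el"],
    "green": ["verde"],
    "witch": ["bruja"],
    "not": ["no"],
}


def translate_word_by_word(sentence, dictionary):
    """Replace each token by its first listed sense; keep unknown tokens as they are."""
    output = []
    for token in sentence.split():
        senses = dictionary.get(token.lower())
        if senses:
            output.append(senses[0])   # "most frequent" sense, by assumption
        else:
            output.append(token)       # SL word left in the TL text
    return " ".join(output)


result = translate_word_by_word("Mary did not slap the green witch", TOY_DICT)
print(result)  # Mary did no slap la verde bruja
```

The unknown words stay in English, and "la verde bruja" keeps the English adjective-noun order, which is why word order needs syntactic information.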

9 Problems in word form-to-word form translation: The dictionary contains only one word form (e.g. Sg Nom for nouns). In agglutinative languages, words have many forms (HUN), so they cannot all be listed. Morphological analysis would be necessary for translation.

10 Problems in phrase-to-phrase translation: Syntactic parsing is needed for both TL and SL. Resolution of syntactic ambiguities. Syntactic parsers should be integrated.

11 Syntactic transformation based MT: A syntactic tree is transformed into a syntactic tree. Not only the leaves (words) are translated. The whole tree is transformed.

12 Transformation of a tree: 1. One node and one child are selected from the tree with some probability. 2. These nodes are reordered (change of order, insertion or deletion of other nodes). 3. Lexical units on the leaves are translated.

13 Example: Mary did not slap the green witch. / Maria no daba una bofetada a la bruja verde.
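
The reorder-then-translate idea can be sketched on the noun phrase "the green witch" from this example. This is a simplified, deterministic illustration rather than the probabilistic procedure on the previous slide: the tree encoding as `(label, children_or_word)` tuples, the single ADJ-N swap rule and the `LEXICON` entries are all assumptions made for the sketch.

```python
# A tree is a (label, content) tuple; content is a word (leaf) or a list of subtrees.
SOURCE_TREE = ("NP", [("DET", "the"), ("ADJ", "green"), ("N", "witch")])

# Hypothetical English-to-Spanish lexicon for the leaves.
LEXICON = {"the": "la", "green": "verde", "witch": "bruja"}


def reorder(tree):
    """Inside every NP, swap an adjective with the noun that directly follows it."""
    label, content = tree
    if isinstance(content, str):
        return tree
    children = [reorder(child) for child in content]
    if label == "NP":
        i = 0
        while i < len(children) - 1:
            if children[i][0] == "ADJ" and children[i + 1][0] == "N":
                children[i], children[i + 1] = children[i + 1], children[i]
                i += 2
            else:
                i += 1
    return (label, children)


def translate_leaves(tree, lexicon):
    """Translate the word on each leaf; unknown words are kept unchanged."""
    label, content = tree
    if isinstance(content, str):
        return (label, lexicon.get(content, content))
    return (label, [translate_leaves(child, lexicon) for child in content])


def leaves(tree):
    """Collect the words on the leaves from left to right."""
    _, content = tree
    if isinstance(content, str):
        return [content]
    words = []
    for child in content:
        words.extend(leaves(child))
    return words


target_tree = translate_leaves(reorder(SOURCE_TREE), LEXICON)
print(" ".join(leaves(target_tree)))  # la bruja verde
```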

14 Characteristics: Word order problems might be solved. Appropriate solutions for related languages. Complex transformation rules. Incorrect translations for structures that are different in SL and TL (movement + direction): La botella entró a la cueva flotando. The bottle floated into the cave. *The bottle entered the cave floating.

15 Semantic transformation: Semantic information is used during MT. Linguistic differences among languages can be neutralized. Simple form: applying semantic features (human, animate, male/female…).

16 SL information is translated into an intermediate language (interlingua). From the interlingua it is translated to the TL. Information is translated, not only syntax. Nowadays: English ~ interlingua.

17 Example: John gave Mary a book. Give(john; book; mary). Jean a donné un livre à Marie. János adott Marinak egy könyvet.
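
A minimal sketch of generating both target sentences from the one language-independent representation Give(john; book; mary). The `Give` type, the per-language `LEXICON` and the sentence templates are hypothetical; in particular, the lexicon stores forms that are already inflected (e.g. Hungarian "Marinak", "könyvet"), which a real system would have to produce by morphological generation.

```python
from typing import NamedTuple


class Give(NamedTuple):
    """Interlingua-style representation: Give(giver; thing; recipient)."""
    giver: str
    thing: str
    recipient: str


# Hypothetical lexicon with ready-made inflected forms for each concept.
LEXICON = {
    "fr": {"john": "Jean", "book": "un livre", "mary": "Marie"},
    "hu": {"john": "János", "book": "egy könyvet", "mary": "Marinak"},
}


def generate(event, lang):
    """Render the event in the target language with a fixed sentence template."""
    lex = LEXICON[lang]
    giver = lex[event.giver]
    thing = lex[event.thing]
    recipient = lex[event.recipient]
    if lang == "fr":
        return f"{giver} a donné {thing} à {recipient}."
    if lang == "hu":
        return f"{giver} adott {recipient} {thing}."
    raise ValueError(f"no generator for language: {lang}")


event = Give("john", "book", "mary")
print(generate(event, "fr"))  # Jean a donné un livre à Marie.
print(generate(event, "hu"))  # János adott Marinak egy könyvet.
```

Adding a new target language only needs a new lexicon and template, without touching the analysis of the source sentence, which is the main appeal of the interlingua design.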

18 Semantics based translation: Simpler rules. More precise translation. Based on theoretical semantics. Less experience. Difficult to construct for some language pairs.

19 Example based MT (EBMT): „when humans translate, they do not use transformational rules, instead, they follow previously seen patterns”. Translation database (parallel corpus). Translation units are identified. The most similar example is selected from the database.
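
The retrieval step ("the most similar example is selected") could look like the sketch below. It is an assumption-laden illustration: the two-entry `TRANSLATION_MEMORY` reuses sentence pairs that appear elsewhere in this deck, and similarity is measured with `difflib.SequenceMatcher` over word tokens, which is just one possible choice.

```python
from difflib import SequenceMatcher

# Toy translation database of (source, target) pairs.
TRANSLATION_MEMORY = [
    ("Her children go to the same school as mine.",
     "A gyerekei ugyanabba az iskolába járnak, mint az enyémek."),
    ("There are no biscuits left!", "Nincs több keksz!"),
]


def similarity(a, b):
    """Word-level similarity in [0, 1] between two sentences."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def most_similar_example(query, memory):
    """Return the (source, target) pair whose source is closest to the query."""
    return max(memory, key=lambda pair: similarity(query, pair[0]))


source, target = most_similar_example("There are no cookies left!", TRANSLATION_MEMORY)
print(source)
print(target)
```

A full EBMT system would go on to adapt the retrieved translation (here, replacing the word for "biscuits"); the sketch stops at retrieval.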

20 Methods: Example based: which word/phrase is most similar to the one in the database. Dictionary based: weak, may be good for related languages. (Syntactic) transfer based: most methods are based on this, acceptable solutions. Interlingua: translation to an intermediate language, „future work”.

21 Statistical MT: Each word has several possible translations. Select the most probable sequence of words. Translation model: bag-of-words translation. Language model: meaningful sentences. argmax_h P(h|e) = argmax_h P(h) * P(e|h)
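
A toy illustration of this decision rule, assuming that e is the given source sentence and h ranges over candidate translations. All probabilities below are invented for the sketch; a real system would estimate P(h) from monolingual text and P(e|h) from a parallel corpus, and would search a far larger candidate space.

```python
import math

source_sentence = "There are no biscuits left!"

# Invented language-model scores P(h): how natural is the candidate on its own?
LANGUAGE_MODEL = {
    "Kekszek nincsenek balra!": 0.0001,
    "Nincs több keksz!": 0.01,
}

# Invented translation-model scores P(e|h): how well do the words correspond?
TRANSLATION_MODEL = {
    "Kekszek nincsenek balra!": 0.2,
    "Nincs több keksz!": 0.05,
}


def score(candidate):
    """log P(h) + log P(e|h); logs avoid underflow when many factors are multiplied."""
    return math.log(LANGUAGE_MODEL[candidate]) + math.log(TRANSLATION_MODEL[candidate])


best = max(LANGUAGE_MODEL, key=score)
print(best)  # Nincs több keksz!
```

With these made-up numbers the word-for-word candidate wins on the translation model but loses overall, because the language model penalizes it.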

22 BLEU-score: Measuring the quality of translation. Human evaluation is very costly. For the sentences of the test set, there are some human translations available. The overlap of word 1-, 2-, 3- and 4-grams between the MT output and the human translations is measured.

23 Example. Source (HU): A Föld nem kör, hanem ellipszis alakú pályán kering a Nap körül. A CsE egységet korábban az ellipszis fél nagytengelyének hosszaként definiálták. Human translation 1: The Earth orbits around the Sun on an elliptic, not circular path. The AU was defined as half the length of the major axis of the ellipse. Human translation 2: The Earth revolves around the Sun not on a circular but on an elliptic orbit. The unit AU was earlier defined as the length of half of the major axis of the ellipse. Human translation 3: The Earth’s orbit around the Sun is not circular, but elliptical. Previously the Astronomical Unit (AU) was defined as the length of the semi-major axis of the said ellipse. MT output: The Earth is not circular , but elliptical orbit around the sun . The AU unit before the semi-major axis of the ellipse was defined as the length .
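
The n-gram comparison behind BLEU can be sketched as a clipped (modified) n-gram precision, applied here to the first sentence of the example. This is only one ingredient of BLEU: the full score also combines the four precisions with a geometric mean and applies a brevity penalty. The tokenizer and function names are assumptions made for the sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, each counted at most
    as often as it occurs in the single reference that contains it most."""
    cand = ngrams(tokenize(candidate), n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(tokenize(ref), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


references = [
    "The Earth orbits around the Sun on an elliptic, not circular path.",
    "The Earth revolves around the Sun not on a circular but on an elliptic orbit.",
    "The Earth’s orbit around the Sun is not circular, but elliptical.",
]
mt_output = "The Earth is not circular , but elliptical orbit around the sun ."

for n in range(1, 5):
    print(n, round(clipped_precision(mt_output, references, n), 3))
```

The precision drops as n grows: most single words of the MT output occur in some reference, but longer word sequences match less often, which is how the metric penalizes wrong word order.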

24 MT systems: Google Translate (http://translate.google.com/); Bing (http://www.bing.com/translator); HU-EN: MetaMorpho (www.webforditas.hu). Computer-aided translation (CAT) tools: intelligent dictionaries, lexical databases, translation memories, parallel corpora.

25 Funny examples: bányászszív [miner sucks]; bulvárszíndarab [boulevard colour] [piece]; gyertyamártás [candle sauce]; habképző [foam] [derivational suffix]; hajsütés [hair baking]; halálnem [death gender]; halmajonéz [dying mayonnaise]; hóhányás [snow] [vomit]; hóhullás [snow] [corpse]; hőképzés [heat] [training]; hőkiütés [heat] [knockout]; időjóslat [time] [prophecy]; + light verb construction: fényigeszerkezet.

26 Funny examples: Her children go to the same school as mine. A gyerekei bányaként járnak ugyanabba az iskolába. (MetaMorpho) Gyermekei megy ugyanaz iskola mint bánya. (InterTran) A gyerekei ugyanabba az iskolába járnak, mint az enyémek. (reference)

27 Funny examples: There are no biscuits left! Kekszek nincsenek balra! (MetaMorpho) Nincs kétszersültek bal! (InterTran) Nincs több keksz! (reference)