Saturday, October 27, 2007

Testing new Google translation technology

It has been announced that Google has changed its translation technology. The article Google, traducteur automatique explains the change and proposes some tests. I took an excerpt of this article, in French, and tried the new Google translation technology.
The excerpt was:
Contrairement à Systran, qui utilise une technologie à base de règles (morphologiques, syntaxiques, sémantiques, grammaticales, heuristiques, logiques, etc. etc.) appliquées à des vocabulaires et des grammaires définis, Google choisit une approche statistique [tout en collaborant avec le milieu universitaire], qui consiste à gaver les machines de milliards de mots de texte, ce qu'on appelle la linguistique de corpus (parallèles, alignés ou non), en associant des corpus (ou corpora pour les puristes) monolingues à des bi-textes (en prenant par exemple un site bilingue, ou tri- ou n-lingue, dont les textes sont segmentés puis alignés afin de fournir une mémoire de traduction) pour y appliquer ensuite des techniques d'apprentissage statistiques permettant de construire des modèles de traduction
Google's translation was:
Unlike Systran, which uses a technology-based rules (morphology, syntax, semantics, grammar, heuristics, logics, and so on. Etc.). Applied to grammar and vocabulary defined, Google chooses a statistical approach [while working with the academia], which consists of machines unheard of billions of words of text, the so-called corpus linguistics (parallel, or non-aligned), combining corpus (or corpora for the purists) to monolingual bi texts (eg taking a bilingual or tri - or n-lingue, whose texts are then segmented aligned to provide a translation memory) to implement then learning techniques can be used to construct statistical models of translation.
I suggested the following (human) translation on Google's translation page:
Unlike Systran, which uses a rule based technology (morphology, syntax, semantics, grammar, heuristics, logics, and so on, and so on) applied to defined grammars and vocabularies, Google chooses a statistical approach [while working with the academia], which consists in feeding machines with billions of words of text, also called corpus linguistics (parallel, or non-aligned), combining monolingual corpus (or corpora for the purists) to bi texts (eg taking a bilingual or tri - or n-lingual site, whose texts are then segmented and then aligned to provide a translation memory) then implementing statistical learning techniques in order to build translation templates.
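The excerpt describes the core of the statistical approach: feed the machine aligned bi-texts and let it learn word correspondences from co-occurrence statistics. The article does not say which model Google uses, but a minimal sketch of the idea, using the classic IBM Model 1 word-alignment algorithm on a tiny made-up parallel corpus, looks like this (the corpus and all names are illustrative, not Google's actual data or code):

```python
from collections import defaultdict

# Toy French -> English parallel corpus (illustrative only).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]

# IBM Model 1: start with uniform translation probabilities t(e|f),
# then refine them with expectation-maximization (EM).
f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}
t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

for _ in range(10):  # a few EM iterations suffice on this toy corpus
    count = defaultdict(float)  # expected counts of (e, f) co-occurrences
    total = defaultdict(float)  # expected counts of f
    for fs, es in corpus:
        for e in es:
            z = sum(t[(e, f)] for f in fs)  # normalize over source words
            for f in fs:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    # Re-estimate t(e|f) from the expected counts.
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

# Most probable English word for each French word.
for f in sorted(f_vocab):
    best = max(e_vocab, key=lambda e: t[(e, f)])
    print(f, "->", best)
```

Even on three sentence pairs, the statistics disambiguate the words: "maison" aligns to "house" because it also appears without "bleue", which is exactly the kind of signal billions of words of bi-text provide at scale.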
Will we ever get that kind of quality in automatic translation?
