Thursday, November 01, 2007

More tests on Google automatic translations and some comments

| | Google automatic translation | Human translation | Comments |
|---|---|---|---|
| Original text | une technologie à base de règles | | If the semantic context tells us that the human translation conveys the right meaning, and if the automatic translation produces a result that is correct in other contexts, Google finds results that are appropriate to the context. |
| Translation | A technology-based rules | A rules-based technology | |
| Google finds | 1250000 | 1250000 | |
| Automatic reverse translation | Une technologie à base de règles | Une technologie reposant sur des règles | If the semantic context is important for finding the right translation, automatic reverse translation can reduce the number of Google results. |
| Google finds | 2040000 | 304000 | |
| Original text | Google choisit une approche statistique, qui consiste à gaver les machines de milliards de mots de texte | | When the automatic translation is not good compared to a human translation, fewer results are found with Google. |
| Translation | Google chooses a statistical approach, which consists of machines unheard of billions of words of text | Google chooses a statistical approach, which consists in feeding machines with billions of words of text | |
| Google finds | 12800 | 41000 | |
| Automatic reverse translation | Google choisit une approche statistique, qui se compose de machines inouïe de milliards de mots de texte | Google choisit une approche statistique, qui consiste à nourrir les machines avec des milliards de mots de texte | Automatic reverse translation of a bad automatic translation yields results that are semantically less consistent. |
| Google finds | 465 | 12800 | |
| Original text | une approche statistique qui consiste à gaver les machines de milliards de mots de texte | | |
| Translation | A statistical approach, which is to force-feed the machines of billions of words of text | A statistical approach, which consists in stuffing the machines full of billions of words of text | |
| Google finds | 301 | 166000 | |
| Automatic reverse translation | Une approche statistique, qui consiste à nourrir de force les machines de milliards de mots de texte | Une approche statistique, qui consiste en pleine bourre les machines de milliards de mots de texte | |
| Google finds | 123000 | 21400 | |
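
The counts reported above are easier to compare as ratios. Here is a minimal sketch in Python that uses only the numbers from the table. The names `HitCounts`, `TESTS` and `ratio`, and the labels "Test 1" to "Test 3", are made up for illustration and are not part of any Google tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitCounts:
    """Google result counts for one test, copied from the table above."""
    label: str              # hypothetical label, not in the original table
    automatic: int          # hits for the Google automatic translation
    human: int              # hits for the human translation
    automatic_reverse: int  # hits for the reverse translation of the automatic one
    human_reverse: int      # hits for the reverse translation of the human one


TESTS = [
    HitCounts("Test 1", 1250000, 1250000, 2040000, 304000),
    HitCounts("Test 2", 12800, 41000, 465, 12800),
    HitCounts("Test 3", 301, 166000, 123000, 21400),
]


def ratio(human: int, automatic: int) -> float:
    """How many times more results the human wording gets than the automatic one."""
    return human / automatic


for t in TESTS:
    forward = ratio(t.human, t.automatic)
    reverse = ratio(t.human_reverse, t.automatic_reverse)
    print(f"{t.label}: human/automatic = {forward:.2f}, after reverse translation = {reverse:.2f}")
```

A ratio above 1 means Google finds more pages for the human wording than for the automatic one. A ratio below 1 means the opposite.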

Saturday, October 27, 2007

Testing the new Google translation technology

It has been announced that Google has changed its translation technology. This article, "Google, traducteur automatique", explains the change and proposes some tests. I took an excerpt of this article, in French, and tried the new Google translation technology.
The excerpt was:
Contrairement à Systran, qui utilise une technologie à base de règles (morphologiques, syntaxiques, sémantiques, grammaticales, heuristiques, logiques, etc. etc.) appliquées à des vocabulaires et des grammaires définis, Google choisit une approche statistique [tout en collaborant avec le milieu universitaire], qui consiste à gaver les machines de milliards de mots de texte, ce qu'on appelle la linguistique de corpus (parallèles, alignés ou non), en associant des corpus (ou corpora pour les puristes) monolingues à des bi-textes (en prenant par exemple un site bilingue, ou tri- ou n-lingue, dont les textes sont segmentés puis alignés afin de fournir une mémoire de traduction) pour y appliquer ensuite des techniques d'apprentissage statistiques permettant de construire des modèles de traduction
Google's translation was:
Unlike Systran, which uses a technology-based rules (morphology, syntax, semantics, grammar, heuristics, logics, and so on. Etc.). Applied to grammar and vocabulary defined, Google chooses a statistical approach [while working with the academia], which consists of machines unheard of billions of words of text, the so-called corpus linguistics (parallel, or non-aligned), combining corpus (or corpora for the purists) to monolingual bi texts (eg taking a bilingual or tri - or n-lingue, whose texts are then segmented aligned to provide a translation memory) to implement then learning techniques can be used to construct statistical models of translation.
I suggested the following (human) translation on Google's translation page:
Unlike Systran, which uses a rule based technology(morphology, syntax, semantics, grammar, heuristics, logics, and so on, and so on) applied to defined grammars and vocabularies, Google chooses a statistical approach [while working with the academia], which consists in feeding machines with billions of words of text, also called corpus linguistics (parallel, or non-aligned), combining monolingual corpus (or corpora for the purists) to bi texts (eg taking a bilingual or tri - or n-lingual site, whose texts are then segmented and then aligned to provide a translation memory) then implementing statistical learning techniques in order to build translation templates.
Will we ever get that kind of quality in automatic translation?

Friday, March 24, 2006

Adscriptor: Google et la présentation d'Eric Schmidt

Google is receiving a lot of attention these days after Eric Schmidt's presentation to analysts. This blog provides yet another display of Eric's presentation and notes. I commented: Google relies on advertising, and that is where Google's limit lies. Google promises to index everything. Well, not everything, just those things that can be advertised, and that is a very small fraction of all "worldwide contents". Google might therefore not be that important after all.

Tuesday, July 05, 2005

Translation memory: could it be shared across customers?

I just came across an article explaining how a translation memory is used by Argos, a company in the translation and localization business. Argos says it uses one of the well-known translation memory tools available on the market: SDLX, Trados, DejaVu, Transit. The company also says it uses translation memories on a client-by-client basis and does not share translation memories from one client to another. Argos mentions that using a translation memory can save up to 50% in translation costs. Imagine a translation memory were shared among customers; would that save more on translation? Has anyone tried sharing translation memories across customers yet? Can it be done? Is it worth doing?

Thursday, May 12, 2005

Is a Googleography like a bibliography?

I thought Googleography was a neologism I had just coined. To my surprise, it is already in use. At the time of this writing, Google lists googleography 15 times, with the meaning "performing a search using Google". Here is what I got:
  1. PerversionTracker: Apparently Useless Software: Googleography ... Googleography. A few interesting Google searches for which we unexpectedly appear on the first page:. version tracker (#4); rosy periwinkle (#2) ... perversiontracker.com/archives/000392.html - 16k - 10 May 2005 - Cached - Similar pages
  2. PerversionTracker: Apparently Useless Software Archives ... Googleography L'Astrologue 5.3 PerversionTracker Invited to REAL World Beetle Revenge Saturday Can You Spell . . . ? 1.1 RadicalSqueeze 1.0 ... perversiontracker.com/archives.html - 35k - Cached - Similar pages [ More results from perversiontracker.com ]
  3. [PDF] PREVIEW EDITION File Format: PDF/Adobe Acrobat - View as HTML ... The New ‘Googleography’ of Local Search. Is Changing the Advertising Map — Forever. Peter M. Zollman. 2. Page 3. info@aimgroup.com (407) 788-2780 ... www.classifiedintelligence.com/FileGallery_ Redirect.asp?UID=1091&FileID=17&FileName=17.pdf - Similar pages
  4. UUC Blacksburg: Carter Turner: Requiem for a Dream: Remembering ... ... the significance of anything in the world today is to subject it to the empirical test of googleography – otherwise known as the Google search. ... civic.bev.net/uufnrv/CT-040523.html - 16k - Cached - Similar pages
  5. Re: New member says WELL DONE ... here... but yeah this is a cool place with lots of fun stuff... like Googleography, Googlemania, and other assorted Tigger Classes... ... adultwebmasterhangout.com/ubbthreads/ printthread.php?Board=fun&main=22065&type=post - 8k - Supplemental Result - Cached - Similar pages
  6. New member says WELL DONE ... but yeah this is a cool place with lots of fun stuff... like Googleography, Googlemania, and other assorted Tigger Classes... -- Gay ... adultwebmasterhangout.com/ubbthreads/showflat. php?Cat=&Board=fun&Number=22042&page=8&view... - 49k - Supplemental Result - Cached - Similar pages [ More results from adultwebmasterhangout.com ]

In my view, the meaning of this word is closer to that of the word bibliography.

When you are looking for ideas, discussions, or facts in a library, you usually make a list of all the books where you found something interesting, supporting, or challenging in relation to your idea.

Then you make this list available, for example at the end of the book or paper you are writing. It is a way of proclaiming that you are honest in what you say, since others are saying the same thing or are opposing what you say.

Some may also pretend to be honest, and the bibliography is there to make the innocent believe they are honest.

A bibliography does not save you the effort of proving what you are saying, or the effort of verifying what is said by someone. It may be a clarification but may also add to the confusion.

So making a googleography of your findings is like making a bibliography but there is a bit more to it.

A good googleographical entry should contain the Google keywords that led you to the URLs you are citing: URLs vanish over time, Google keywords probably less so.

Furthermore, the documents returned for a given list of Google keywords change over time: how do you account for what Google found at the time you performed the search? What happens if Google now finds a different list of documents?
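
One way to picture such an entry is as a small record that keeps the keywords, the date of the search, and the URLs found on that date. This is only a sketch in Python. The class name `GoogleographyEntry`, its fields, and the `cite` method are my own assumptions, and the search date is assumed to be the date of this post.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GoogleographyEntry:
    """Hypothetical record for one entry of a googleography."""
    keywords: str      # the Google keywords that led to the URLs
    searched_on: date  # when the search was performed
    urls: list[str] = field(default_factory=list)  # what Google found on that date

    def cite(self) -> str:
        """Format the entry as a plain-text citation, keywords and date first."""
        lines = [f'Google search "{self.keywords}" ({self.searched_on.isoformat()})']
        lines += [f"  {i}. {url}" for i, url in enumerate(self.urls, start=1)]
        return "\n".join(lines)


entry = GoogleographyEntry(
    keywords="googleography",
    searched_on=date(2005, 5, 12),  # assumption: the date of this post
    urls=[
        "perversiontracker.com/archives/000392.html",
        "civic.bev.net/uufnrv/CT-040523.html",
    ],
)
print(entry.cite())
```

Keeping the date next to the keywords lets a reader run the same search later and compare the two lists of results.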

Click on googleography here to view what Google finds at the time you do it. Do you see the same links as above or something different? What are your conclusions regarding a Googleography?

ps: Click on googlography. That word is used too!

Friday, April 15, 2005

Are Chatter bots here to stay?

Chatter bots are robots capable of having a conversation with some-(human)-body. The attempt is not new, as related in this (French-language) page. Trying to make a computer speak (and answer) to a human is somewhat like trying to make that same computer translate from one human language to another: a serious attempt, lots of research, some success, results that raise further questions, and an effort to find out how human language works and why it seems so difficult to imitate with a machine. So, is solving the chatter bot problem like solving the translation problem: an impossible task? What do you think? What do we do until someone finds the solution?

Wednesday, April 13, 2005

Why MT (Machine Translation) can only help

You have surely tried machine translation services, such as Google language tools or Altavista Babel Fish Translation. If you have not, try one of the two above, and why not comment here about your findings? If your conclusion is:
Machine translation can help
then you are probably looking for a rough translation.
Microsoft also offers some kind of machine translation for its support pages. See for instance Windows Installer 3.1 is available and its French translation Windows Installer 3.1 est disponible. So Machine Translation helps but, as Microsoft wisely says above, "use at your own risk" of misunderstanding the real meaning of the original text. What good uses do you see for Machine Translation?