Peter Motte


Why I don’t understand the hype about translation with AI

Artificial Intelligence (AI) in translation is nothing new.

The craft of translation has always been influenced by ICT, because it's all about communication. Translators were among the first to use fax machines on a large scale. We were among the first to use PCs, even if it was only to make the delivered work look neat. And we were also among the first to install databases on PCs, so that we could use automated dictionaries and spelling checkers.

One of the first benefits I liked about translating with a computer was that we could preserve the layout of the original document. The days of numbering blocks of text to make clear which part of the translation related to which part of the original were gone. The client finally got a document he could incorporate into his workflow and production chain without much ado.

When computer memories became big and fast enough to store the source and target texts of previous translations, we could start reaping the benefits of our earlier work by automating what we had already been doing for decades: basing translations of revised documents on the translations of previous versions. But this time we didn't have to plough through huge stacks of material and retype it. The computer did that part of the work for us, and we could concentrate on the new and changed parts and deliver a better result faster.

The latest evolution was the introduction of computer aided translation, or CAT. Once those programs were in use, it was simply a matter of time before logic was implemented to compare source material with target material of various kinds, so that the program could help us amend the target text.

Inevitably, that led to increasing intelligence in those programs, and finally to what we call machine translation, or MT.

That’s why I don’t understand the hype about translating with AI. As a matter of fact, MT IS translating with AI. MT systems often don’t use traditional CAT tools as their foundation, but traditional CAT tools have implemented MT elements, making it possible to use MT to prepare the translator’s definitive work, to speed up the process and at the same time to avoid the mistakes purely automated systems make.

MT doesn’t work without MTPE: machine translation post-editing. The translator post-edits the texts produced by MT. That is also the way translation with AI works. The only difference I see between translation with AI and MTPE is the terminology: they are two different expressions for the same thing. Therefore the huge hype about AI in the field of translation is nothing more than a shift in terminology, and sometimes in providers: some say they use AI to draw clients, but they’re doing the same thing the translation business has been doing for at least ten years now. AI in translation is more about marketing and window dressing than about offering a new, more advanced way of working.

Translation: an introduction, part 9

9 The translator: human or machine?

9a Tools for translating

In the constantly evolving world of translation, tools have become essential for professional translators and amateurs alike. From dictionaries to translation memories and terminology databases, tools have considerably improved the efficiency and accuracy of translation work. They help translators find the right terms, maintain consistency and avoid errors.

9b The computer as a tool

The rise of computers has revolutionized the translation process. Translation software such as CAT tools (Computer Assisted Translation) offers features such as automatic segmentation of texts, fuzzy matching and suggestions based on previously translated material. This not only increases the speed of translation, but also allows translators to maintain consistency across large volumes of text.
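The fuzzy matching idea can be sketched in a few lines. This is a minimal illustration only, not how commercial CAT tools are implemented, and the translation-memory segments below are invented:

```python
from difflib import SequenceMatcher

# A toy translation memory: previously translated source/target pairs
# (the pairs are invented examples).
memory = {
    "Press the start button to begin.": "Druk op de startknop om te beginnen.",
    "Remove the cover before cleaning.": "Verwijder de afdekking voor het reinigen.",
}

def fuzzy_matches(segment, memory, threshold=0.75):
    """Return (score, source, target) tuples for memory entries similar to `segment`."""
    hits = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, reverse=True)

new_segment = "Press the stop button to begin."
for score, source, target in fuzzy_matches(new_segment, memory):
    print(f"{score:.0%} match: {source!r} -> {target!r}")
```

A real CAT tool reports such scores as match percentages and pre-fills the editor with the stored target text for the translator to amend.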

9c The computer as translator

With the rise of machine translation, the relationship between human translators and machines has changed. Modern translation engines, such as neural network models, use advanced algorithms and machine learning techniques to translate texts automatically. Although these systems can achieve impressive results, they still struggle with nuance, contextual interpretation and cultural sensitivity.

The discussion about the effectiveness of machine translation goes beyond the mere quality of the translations, however. Human translators bring unique skills and insights to the translation process that machines cannot easily reproduce. They understand the nuances of language, can pick up on cultural subtleties, and are able to interpret contextual meaning in a way machines cannot yet match.

The future of translation lies in a fusion of human expertise and advanced technology. While machines continue to evolve and produce better translations, the role of the human translator will probably shift towards tasks that require a deeper understanding of language and culture, such as editing, revision and creative translation.

Although machines are a valuable aid in the translation process, the human translator remains indispensable for delivering high-quality translations that do justice to the complexity and nuances of human communication. The translator of the future may be neither human nor machine, but rather a symbiotic combination of both. But the translator always has the last word.

Translation, its forms

Translation refers to the process of converting written or spoken words in one language into another language. For the sake of simplicity, we will treat them both as the same.

Translation can be done through various methods such as machine translation, human translation, or a combination of both.

Translation can also refer to converting non-linguistic forms of information, such as mathematical equations or musical notation, into other forms. But that doesn’t concern us here.

There are several forms of machine translation, including:

  • Rule-based machine translation (RBMT): This method uses a set of pre-defined grammar rules and dictionaries to translate text from one language to another.
  • Statistical machine translation (SMT): This method uses statistical models that are trained on large amounts of parallel text data to translate text.
  • Neural machine translation (NMT): This method uses neural networks to translate text. NMT models have been shown to produce translations that are often more accurate and natural-sounding than those produced by other forms of machine translation.
  • Hybrid machine translation: This method combines the strengths of different machine translation methods, such as RBMT and SMT, to produce more accurate translations.
  • Interactive machine translation (IMT): This form of machine translation allows the user to interact with the machine during the translation process, correcting and providing feedback on the translations produced.
  • Post-editing machine translation (PEMT): This form of machine translation allows a human translator to check and correct machine-generated translations.

There are several forms of human translation, including:

  • Professional translation: This form of translation is done by trained and experienced translators, who are often certified by professional organizations. Professional translations are usually done for official documents, legal contracts, and other important texts.
  • Community translation: This form of translation is done by volunteers or members of a community who translate text for the benefit of their community. Community translation is often used for non-profit or social impact projects.
  • Literary translation: This form of translation is done for literary works, such as novels, poems, and plays, to make them accessible to readers in other languages. Literary translators are often writers themselves, and pay particular attention to preserving the style and tone of the original text.
  • Simultaneous translation: This form of translation is done while the speaker is still speaking, which is often used in conferences, meetings and other events where multiple languages are spoken.
  • Consecutive translation: This form of translation is done after the speaker has finished speaking, which is often used in interviews, meetings and other events where multiple languages are spoken.
  • Audiovisual translation: This form of translation is done for audiovisual materials such as movies, TV shows, and video games; it includes subtitling, dubbing, and voice-over.

There are several combinations of machine translation and human translation, including:

  • Machine-assisted translation: This form of translation uses machine translation as a tool to help a human translator produce a more accurate and efficient translation. The translator can use the machine-generated translation as a starting point and then make any necessary corrections and improvements.
  • Post-editing machine translation (PEMT): This form of translation uses machine translation to generate a first draft of the translation, which is then reviewed and corrected by a human translator. This is often used for large-scale projects where speed and cost-effectiveness are important.
  • Hybrid machine translation: This form of translation combines the strengths of different machine translation methods, such as rule-based and statistical machine translation, to produce more accurate translations. The output of the machine translation is often reviewed and corrected by a human translator.
  • Interactive machine translation: This form of translation allows the user to interact with the machine during the translation process, providing feedback and corrections on the translations produced. The user can also provide additional information and context to the machine to improve the translation.
  • Human-in-the-loop machine translation: This form of translation uses machine translation as a tool to help a human translator, who can provide feedback and corrections to the machine, which will then adjust its output accordingly. This can improve the final translation and the efficiency of the process.

Will AI make translation an obsolete craft?


Sometimes people use Google Translate to understand a website. And some people think there are computer programs that spew out translations without any hassle. It makes translators look like old-fashioned craftsmen who at best have a workshop in a tourist centre or at an arts & crafts exhibition.

 

Does that image match reality?

 

As Artificial Intelligence is on the rise, some people proclaim the death of the translator within ten or maybe even five years.

 

But as a matter of fact, Artificial Intelligence is not something that will pop up all of a sudden. It has been influencing daily practice since the nineties, maybe even earlier.

Research into artificial translation, or machine translation, started as early as 1949, but as is often the case with IT, the name promised more than it delivered. Early applications did nothing more than automatically look up words in an automated dictionary.

 

Some historians claim that the idea of machine translation can be traced back to the 17th century, when in 1629 René Descartes proposed a universal language in which equivalent ideas in different tongues would share one symbol. But the field of machine translation proper first appeared in Warren Weaver’s ‘Memorandum on Translation’ of 1949. Research started in 1951 at MIT under Yehoshua Bar-Hillel, and in 1954 there was a surprising demonstration at Georgetown University, where the machine translation research team showed off its Georgetown-IBM experiment system. As computers’ power increased, so did the results of artificial translation. But real progress was rather slow, and after the ALPAC report of 1966 found that ten years of research had failed to fulfill expectations, funding was greatly reduced. However, in 1972 a report by the Director of Defense Research and Engineering (DDR&E) re-established the feasibility of large-scale MT, because of the success of the Logos MT system in translating military manuals into Vietnamese during that conflict. And so, once again, war drove progress (the irony is not lost on me).

 

So, considering that research started around 1949, probably prompted by the advent of computers during the Second World War, progress was actually very slow. The problem is whether a computer program can actually understand human language, and whether that understanding is necessary to be able to translate.

 

Some would argue “yes”, and they try to find the rules which govern human language. Interesting in that respect was transformational-generative grammar, or TGG. Its philosophy is that human beings have a set of rules in their heads which turns meaning into meaningful sentences. So an English speaker would have a rule which puts the verb immediately after the subject, whereas a Japanese speaker would have a rule putting the verb at the end of the sentence.
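The word-order rules just mentioned can be illustrated with a toy example. The mini “grammar” below is invented for illustration and has nothing like the power of a real TGG:

```python
# Toy illustration: the same subject-verb-object "deep structure" is
# linearized in different orders, as the per-language rules described
# above would do. The sentence and the order strings are invented.

def realize(deep, order):
    """Linearize a deep structure according to an order string like 'SVO'."""
    return " ".join(deep[slot] for slot in order)

deep = {"S": "the translator", "V": "reads", "O": "the text"}

print(realize(deep, "SVO"))  # English-like: "the translator reads the text"
print(realize(deep, "SOV"))  # Japanese-like: "the translator the text reads"
```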

 

The fact is, however, that you still have to make the computer program able to grasp the meaning of what it has to say. But it is not the translation program that builds up the message to be translated. The message is already given in the source text.

To a certain degree, that simplifies matters: the program only has to transform a message from a source text into a target text, where source and target contain the same content, encoded in different ways.

 

That is, of course, an idea which appeals to programmers. You take a source, use TGG to derive its inner structure or deep structure, and use the TGG of another language to build up a new surface structure. As simple as that.

 

It seems to be the most intelligent way to deal with artificial translation, but linguists themselves are not always sure about the rules one should put into a TGG. And anyway, TGG is meant to go from deep structure to surface structure, not the other way around. So that leaves us with the problem of analysing the source text: all TGG rules have to be “reversed” or “inverted”.

 

Although there are a lot of other ways to deal with automatic translation, not all of them could be applied from the very beginning. The advantage of a TGG-based translation system was the promise of using rules the way a human being processes language – or is thought to process language – thereby limiting the amount of memory needed. Rules, as in maths, provide a way to apply knowledge without a big knowledge base. Compare having to learn all multiplications from the table of 1 to the table of 10 with only having to know the rule that you add a number to itself as many times as you want to multiply it.
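The multiplication analogy can be made concrete. The point is only the contrast between stored knowledge and a rule:

```python
# A memorized table stores every answer; a rule derives any of them.
# Here the "rule" is repeated addition, as in the analogy above.

table = {(a, b): a * b for a in range(1, 11) for b in range(1, 11)}  # 100 stored facts

def multiply_by_rule(a, b):
    """Add a to itself b times -- one small rule instead of a big table."""
    total = 0
    for _ in range(b):
        total += a
    return total

assert multiply_by_rule(7, 8) == table[(7, 8)] == 56
assert multiply_by_rule(12, 34) == 408  # the rule also works beyond the stored table
```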

 

Most machine translation systems try to apply rules, but not all do so to the same degree. As a matter of fact, the terms ‘machine translation’, ‘automatic translation’, ‘artificial translation’ and so on are not interchangeable.

 

The main rule-based machine translation (RBMT) paradigms are further classified into three types: transfer-based machine translation, interlingual machine translation and dictionary-based machine translation.

 

RBMT involves more information about the linguistics of the source and target languages. The basic approach uses a parser for the structure of the source sentence and an analyzer for the source language, and then applies a generator to that information to produce the target sentence, with a transfer lexicon for the translation of the words.
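The parse, transfer and generate stages described above can be sketched as a toy pipeline. The three-word lexicon and the “parsing” are drastically simplified inventions; real RBMT systems use full grammars and large transfer lexicons:

```python
# Toy parse -> transfer -> generate pipeline in the spirit of the RBMT
# approach described above. The English-Dutch lexicon is invented.

LEXICON = {"the": "de", "translator": "vertaler", "works": "werkt"}

def analyze(sentence):
    """'Parse' the source: here reduced to lowercasing and tokenizing."""
    return sentence.lower().rstrip(".").split()

def transfer(tokens):
    """Map each source token through the transfer lexicon."""
    return [LEXICON.get(tok, tok) for tok in tokens]

def generate(tokens):
    """'Generate' the target sentence: rejoin, capitalize, punctuate."""
    return " ".join(tokens).capitalize() + "."

print(generate(transfer(analyze("The translator works."))))  # -> "De vertaler werkt."
```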

 

However, RBMT demands that everything be made explicit: orthographical variation and erroneous input must be made part of the source-language analyzer in order to cope with them, and lexical selection rules must be written for all instances of ambiguity. Adapting to new domains is in itself not that hard, as the core grammar is the same across domains, and domain-specific adjustment is limited to lexical selection. But, of course, that’s all from a theoretical point of view.

 

Another way is transfer-based machine translation. It creates a translation from an intermediate representation that simulates the meaning of the original sentence. Unlike interlingual MT, it depends partially on the language pair involved in the translation.

The third method, interlingual machine translation, is a kind of rule-based machine translation. The source language is transformed into an interlingua, a ‘language-neutral’ representation independent of any language, and the target language is then generated from the interlingua. One of the major advantages of this system is that the interlingua becomes more valuable as the number of target languages it can be turned into increases. However, the only interlingual machine translation system that has been made operational at the commercial level is the KANT system (Nyberg and Mitamura, 1992), which is designed to translate Caterpillar Technical English (CTE) into other languages.

Using Caterpillar texts had the advantage of an enormous load of already translated material, and CTE is rather limited in scope: it only has to deal with technical language for heavy mobile equipment. Using the system to translate other subject matter would be disastrous.

 

The dictionary-based system uses a method based on dictionary entries, which means the words are translated as they stand by a dictionary. It will be clear that a pure dictionary-based system can only give word-for-word translations, and therefore rather mediocre results – to put it mildly.
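Word-for-word lookup is easy to sketch, and the sketch immediately shows why the results are mediocre. The mini French-English dictionary below is invented:

```python
# Dictionary-based translation as described above: every word is replaced
# independently, so multi-word expressions and word order are lost.

FR_EN = {"la": "the", "pomme": "apple", "de": "of", "terre": "earth"}

def word_for_word(sentence, dictionary):
    return " ".join(dictionary.get(word, word) for word in sentence.lower().split())

print(word_for_word("la pomme de terre", FR_EN))
# -> "the apple of earth", where the correct translation is "the potato"
```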

 

Statistical machine translation (SMT) uses bilingual text corpora. Where such corpora are available, good results can be achieved translating similar texts, but such corpora are still rare for many language pairs. Google switched to a statistical translation method in October 2007. In 2005, Google had improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train its system, and translation accuracy improved. Google Translate and similar statistical translation programs work by detecting patterns in hundreds of millions of documents that have previously been translated by humans and making intelligent guesses based on the findings. Generally, the more human-translated documents available in a given language, the more likely it is that the translation will be of good quality. However, it turned out this is not always the case, rather to Google’s surprise. Newer approaches to statistical machine translation use a minimal corpus size and instead focus on deriving syntactic structure through pattern recognition, which puts a higher stress on artificial intelligence. SMT’s biggest downfalls include its dependence on huge amounts of parallel text, its problems with morphology-rich languages (especially when translating into such languages), and its inability to correct singleton errors. Which explains why Google was disappointed. Not to mention that a typical United Nations document deals with a limited set of subjects.
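The statistical idea of detecting patterns in previously translated documents and making guesses can be reduced to a caricature: count how a word was translated in a parallel corpus and pick the most frequent option. The corpus below is invented and absurdly small; real SMT works on phrases and millions of sentence pairs:

```python
from collections import Counter

# Caricature of SMT: choose the most frequent translation seen in a
# (toy, invented) parallel corpus of source/target pairs.

parallel_corpus = [
    ("bank", "oever"),  # river bank
    ("bank", "bank"),   # financial institution
    ("bank", "bank"),
    ("house", "huis"),
]

def most_likely_translation(word, corpus):
    counts = Counter(target for source, target in corpus if source == word)
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_translation("bank", parallel_corpus))  # -> "bank" (2 of 3 examples)
```

This also makes SMT’s weaknesses visible: without more context, the less frequent but sometimes correct “oever” is never chosen.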

 

Example-based machine translation is based on the idea of analogy. Here, too, the corpus contains texts that have already been translated. Given a sentence that is to be translated, sentences are selected from this corpus that contain similar sub-sentential components. The similar sentences are then used to translate the sub-sentential components of the original sentence into the target language, and these phrases are put together to form a complete translation.
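The analogy idea can be sketched as a similarity search over already-translated sentences. Here Python’s difflib stands in for the matching step, and the example pairs are invented; a real system works on sub-sentential fragments rather than whole sentences:

```python
from difflib import get_close_matches

# Example-based retrieval: find the most similar previously translated
# sentence and reuse its translation as raw material. Pairs are invented.

examples = {
    "Close the valve slowly.": "Sluit de klep langzaam.",
    "Open the valve slowly.": "Open de klep langzaam.",
    "Replace the filter monthly.": "Vervang het filter maandelijks.",
}

def closest_example(sentence, examples, cutoff=0.6):
    """Return the (source, target) pair whose source best matches `sentence`."""
    match = get_close_matches(sentence, examples.keys(), n=1, cutoff=cutoff)
    return (match[0], examples[match[0]]) if match else None

src, tgt = closest_example("Close the valve quickly.", examples)
print(f"Most similar example: {src!r}; start from {tgt!r}")
```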

Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies. Several MT organizations claim a hybrid approach that uses both rules and statistics.

 

And finally there is neural machine translation, a deep-learning-based approach.

But all these methods are in some way or other hampered by several problems: ambiguity in texts, non-standard speech, names of people, places, organizations and so on, and the continuous changes in language: what’s standard today might be substandard tomorrow, and vice versa.

 

In reality, all systems are in some way hybrid systems, because the output of the computer program always has to be checked by a human translator. Example-based machine translation is actually the most successful form of machine translation, because the program uses a big memory of previous translations to come up with suggestions, which the translator has to judge, change if necessary, and validate.

 

As mentioned above, forms of machine translation have a long history, and development was slow, hampered both by characteristics of human language (e.g. its well-known lack of sustained logic) and by technological problems such as processing speed and memory size.

The main reason computer translation seems to be on the up is that processing speed and memory size are gradually becoming less of a problem. It also means that the influx of all forms of automation has never produced a sudden boom in artificial translation.

 

It did, however, change the nature of the work of the translator. Translation turned more and more into proofreading and editing, away from pure translation. That was a rather slow evolution, and in all likelihood, it will remain so for a very long time.

 
