Domain Adaptation for Statistical Machine Translation

Recent years have seen the development of statistical approaches to machine translation. Nevertheless, the intrinsic variability of natural language affects the quality of statistical models. Studies have shown that in-domain corpora contain words that also occur in out-of-domain corpora (common words) as well as domain-specific words. This particularity can be handled with terminological resources such as bilingual lexicons. However, when the vocabulary differs between out-of-domain and in-domain data, the syntactic and semantic content may differ as well. In our work, we consider the task of domain adaptation for statistical machine translation along two major axes: bilingual lexicon acquisition and post-editing of machine translation outputs. We evaluate our approaches on the medical domain. The quality of automatic translations in the medical domain is improved, and the results are compared to other work in this field. Oracle evaluations suggest that further gains are still possible.

Additional Info

Field Value
Source https://theses.hal.science/tel-00879945
Author Rubino, Raphaël
Maintainer CCSD
Last Updated May 9, 2026, 03:44 (UTC)
Created May 9, 2026, 03:44 (UTC)
Identifier NNT: 2011AVIG0186
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Laboratoire Informatique d'Avignon (LIA) ; Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
creator Rubino, Raphaël
date 2011-11-30T00:00:00
harvest_object_id c4baceba-4f71-48fe-a7ed-f76a93d2258b
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2026-03-31T00:00:00
set_spec type:THESE