Domain Adaptation for Statistical Machine Translation

Recent years have seen the development of statistical approaches to machine translation. Nevertheless, the intrinsic variation of natural language affects the quality of statistical models. Studies have shown that in-domain corpora contain words that also occur in out-of-domain corpora (common words), but also contain domain-specific words. This particularity can be handled with terminological resources such as bilingual lexicons. However, if the vocabulary differs between out-of-domain and in-domain data, the syntactic and semantic content may also vary. In our work, we consider the task of domain adaptation for statistical machine translation along two major axes: bilingual lexicon acquisition and post-editing of machine translation outputs. We evaluate our approaches on the medical domain. The quality of automatic translations in the medical domain is improved, and the results are compared to other work in this field. Oracle evaluations tend to show that further gains are still possible.

Data and Resources

Domain Adaptation for Statistical Machine Translation (HTML)

Additional Info

Field	Value
Source	https://theses.hal.science/tel-00879945
Author	Rubino, Raphaël
Maintainer	CCSD
Last Updated	May 9, 2026, 03:44 (UTC)
Created	May 9, 2026, 03:44 (UTC)
Identifier	NNT: 2011AVIG0186
Language	fr
Rights	https://about.hal.science/hal-authorisation-v1/
contributor	Laboratoire Informatique d'Avignon (LIA) ; Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
creator	Rubino, Raphaël
date	2011-11-30T00:00:00
harvest_object_id	c4baceba-4f71-48fe-a7ed-f76a93d2258b
harvest_source_id	3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title	test moissonnage SELUNE
metadata_modified	2026-03-31T00:00:00
set_spec	type:THESE