A Syntax-Augmented Phrase-Based Statistical Machine Translation Model

Traditional Statistical Machine Translation models are not aware of linguistic structure. Thus, target lexical choices and word order are controlled only by surface-based statistics learned from the training corpus. However, knowledge of linguistic structure can be beneficial since it provides generic information compensating data sparsity. The purpose of our work is to study the impact of syntactic information while preserving the general framework of Phrase-Based SMT. First, we study the integration of syntactic information using a reranking approach. We define features measuring the similarity between the dependency structures of source and target sentences, as well as features of linguistic coherence of the target sentences. The importance of each feature is assessed by learning their weights through a Structured Perceptron Algorithm. The evaluation of several reranking models shows that these features often improve the quality of translations produced by the basic model, in terms of manual evaluations as opposed to automatic measures. Then, we propose different models in order to increase the quality and diversity of the search graph produced by the decoder, through filtering out uninteresting hypotheses based on the source syntactic structure. This is done either by learning limits on the phrase recordering, or by decomposing the source sentence in order to simplify the translation process. The initial evaluations of these models look promising.

Data and Resources

Additional Info

Field Value
Source https://theses.hal.science/tel-00996317
Author Nikoulina, Vassilina
Maintainer CCSD
Last Updated May 5, 2026, 10:15 (UTC)
Created May 5, 2026, 10:15 (UTC)
Identifier tel-00996317
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP) ; Laboratoire d'Informatique de Grenoble (LIG) ; Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)
creator Nikoulina, Vassilina
date 2010-03-19T00:00:00
harvest_object_id cfbc0d19-ca3d-4f71-bf70-dada76c0b5b0
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-09-27T00:00:00
set_spec type:THESE