Efficient large-context dependency parsing and correction with distributional lexical resources

This thesis explores ways to improve the accuracy and coverage of efficient statistical dependency parsing. We use transition-based parsing with models learned using Support Vector Machines (Cortes and Vapnik, 1995), and we carry out our experiments on French. Transition-based parsing is very fast because its underlying algorithms optimize attachment decisions locally, but each local decision sees only a limited syntactic context. Our first research thread is therefore to increase the syntactic context used. Starting from the arc-eager transition system (Nivre, 2008), we propose a variant that simultaneously considers multiple candidate governors for right-directed attachments. We also test parse correction, inspired by Hall and Novák (2005), which revises each attachment in a parse by considering multiple alternative governors in the local syntactic neighborhood. We find that multiple-candidate approaches slightly improve parsing accuracy overall, as well as for prepositional phrase attachment and coordination, two linguistic phenomena that exhibit high syntactic ambiguity. Our second research thread explores semi-supervised approaches for improving parsing accuracy and coverage. We test self-training within the journalistic domain as well as for adaptation to the medical domain, using a two-stage parsing approach based on that of McClosky et al. (2006). We then turn to lexical modeling over a large corpus: we model generalized lexical classes to reduce data sparseness, and prepositional phrase attachment preferences to improve disambiguation. We find that semi-supervised approaches can sometimes improve parsing accuracy and coverage without increasing time complexity.
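
As a point of reference for the arc-eager transition system mentioned in the abstract, the following is a minimal sketch of the standard four transitions (SHIFT, LEFT-ARC, RIGHT-ARC, REDUCE) in their usual textbook form. It is not the multiple-candidate variant proposed in the thesis, and it contains no SVM classifier: the transition sequence is supplied by hand. The names `Config`, `apply_transition`, and `parse`, the unlabeled arcs, and the example sentence "Le chat dort" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Parser state: token 0 is an artificial ROOT (illustrative convention)."""
    stack: list
    buffer: list
    heads: dict = field(default_factory=dict)  # dependent index -> governor index


def apply_transition(config, transition):
    """Apply one standard arc-eager transition in place (unlabeled arcs)."""
    stack, buffer, heads = config.stack, config.buffer, config.heads
    if transition == "SHIFT":
        # Move the next input token onto the stack.
        stack.append(buffer.pop(0))
    elif transition == "LEFT-ARC":
        # The stack top becomes a dependent of the next input token.
        dep = stack[-1]
        if dep == 0 or dep in heads:
            raise ValueError("LEFT-ARC needs a headless, non-ROOT stack top")
        heads[dep] = buffer[0]
        stack.pop()
    elif transition == "RIGHT-ARC":
        # The next input token becomes a dependent of the stack top,
        # then is pushed so it can still take its own right dependents.
        dep = buffer.pop(0)
        heads[dep] = stack[-1]
        stack.append(dep)
    elif transition == "REDUCE":
        # Pop a stack top that already has a governor.
        if stack[-1] not in heads:
            raise ValueError("REDUCE needs a stack top that has a governor")
        stack.pop()
    else:
        raise ValueError(f"unknown transition: {transition}")
    return config


def parse(n_tokens, transitions):
    """Run a given transition sequence over tokens 1..n_tokens."""
    config = Config(stack=[0], buffer=list(range(1, n_tokens + 1)))
    for transition in transitions:
        apply_transition(config, transition)
    return config.heads


# "Le chat dort": 1=Le, 2=chat, 3=dort (hand-written sequence, not a model's output).
heads = parse(3, ["SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC", "RIGHT-ARC"])
print(heads)
```

In a statistical parser, each transition would be chosen by a classifier from features of the current configuration. Because RIGHT-ARC attaches the next token to the single stack top, it commits to one governor at a time, which is the kind of local decision the multiple-candidate approaches described above aim to broaden.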

Additional Info

Field	Value
Source	https://theses.hal.science/tel-00860720
Author	Henestroza Anguiano, Enrique
Maintainer	CCSD
Last Updated	May 9, 2026, 19:03 (UTC)
Created	May 9, 2026, 19:03 (UTC)
Identifier	tel-00860720
Language	en
Rights	https://about.hal.science/hal-authorisation-v1/
contributor	Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing (ALPAGE) ; Inria Paris-Rocquencourt ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Paris Diderot - Paris 7 (UPD7)
creator	Henestroza Anguiano, Enrique
date	2013-06-27T00:00:00
harvest_object_id	18e9cfe1-cf94-452c-aae2-c9ec29c616c1
harvest_source_id	3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title	test moissonnage SELUNE
metadata_modified	2025-02-26T00:00:00
set_spec	type:THESE