Automatic sentence compression: towards abstract summarization

This dissertation presents a novel approach to automatic text summarization, one of the most challenging tasks in Natural Language Processing (NLP). To date, no summarization method produces summaries comparable in quality to those written by humans. Even many state-of-the-art approaches build the summary by selecting a subset of sentences from the original text. Since some of the selected sentences may still contain superfluous information, a finer-grained analysis is needed. We propose an automatic sentence compression method based on the elimination of intra-sentence discourse segments. Using a large manually annotated corpus, we obtained a linear model that predicts the probability of eliminating a segment on the basis of three simple criteria: informativity, grammaticality and compression rate. We discuss the difficulties of automatically assessing these criteria in documents and sentences, and we propose a solution based on existing techniques from the NLP literature, applied through two different algorithms that produce summaries with compressed sentences. Applied to documents in Spanish, our method produces high-quality results. Finally, we evaluate the produced summaries with a Turing test to determine whether human judges can distinguish human-produced summaries from machine-produced ones. This dissertation addresses several previously neglected aspects of NLP, namely the subjectivity of informativity, sentence compression in Spanish documents, and the evaluation of NLP systems using the Turing test.
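
The idea of scoring discourse segments with a linear model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Segment` fields, the `WEIGHTS` coefficients, the 0.5 threshold and the clipping to [0, 1] are hypothetical and are not the fitted model or the algorithms from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One discourse segment of a sentence, with hypothetical feature scores."""
    text: str
    informativity: float     # assumed in [0, 1]; higher means more informative
    grammaticality: float    # assumed in [0, 1]; how well the sentence reads without the segment
    compression_rate: float  # assumed in [0, 1]; share of the sentence removed with the segment


# Placeholder coefficients chosen for illustration only (not values from the thesis).
WEIGHTS = {
    "bias": 0.5,
    "informativity": -0.6,
    "grammaticality": 0.4,
    "compression_rate": -0.2,
}


def elimination_probability(seg: Segment) -> float:
    """Linear combination of the three criteria, clipped to [0, 1]."""
    score = (
        WEIGHTS["bias"]
        + WEIGHTS["informativity"] * seg.informativity
        + WEIGHTS["grammaticality"] * seg.grammaticality
        + WEIGHTS["compression_rate"] * seg.compression_rate
    )
    return min(1.0, max(0.0, score))


def compress(segments: list[Segment], threshold: float = 0.5) -> str:
    """Keep only the segments whose elimination probability is below the threshold."""
    kept = [s.text for s in segments if elimination_probability(s) < threshold]
    return " ".join(kept)


if __name__ == "__main__":
    sentence = [
        Segment("The committee approved the budget", 0.9, 0.2, 0.5),
        Segment("which had been debated for weeks", 0.2, 0.9, 0.4),
        Segment("on Tuesday.", 0.6, 0.8, 0.2),
    ]
    print(compress(sentence))
```

With these placeholder weights, a segment that carries little information and can be dropped without harming grammaticality receives a high elimination probability, while the main clause is kept. In practice the coefficients would be estimated from the annotated corpus rather than set by hand.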

Additional Info

Field Value
Source https://theses.hal.science/tel-00998924
Author Molina Villegas, Alejandro
Maintainer CCSD
Last Updated May 5, 2026, 09:51 (UTC)
Created May 5, 2026, 09:51 (UTC)
Identifier NNT: 2013AVIG0195
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Laboratoire Informatique d'Avignon (LIA) ; Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
creator Molina Villegas, Alejandro
date 2013-09-30T00:00:00
harvest_object_id 0c609c7f-7e1c-4c92-b647-738087c8d0c0
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2026-03-31T00:00:00
set_spec type:THESE