A partition model of vocabulary (Un modèle de partition du vocabulaire)

The model proposed here describes the vocabulary of a corpus as divided into two groups: a general vocabulary, used whatever the circumstances, and several local (or 'specialised') vocabularies, each used in only one part of the corpus. General words may appear anywhere in the text, and the growth of their number with corpus size can be estimated with Muller's formula. In this model, a partition parameter measures the relative weight of the two types of vocabulary, so its value gives an estimate of the lexical 'specialisation' of the text. The model has been applied to Racine's plays and to televised debates (Giscard vs Mitterrand, Chirac vs Fabius). It can also be used to measure vocabulary growth with corpus length, to locate stylistic changes, and to compare several texts in terms of lexical richness.
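The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions. It assumes the binomial approximation usually quoted for Muller's formula, in which the expected number of distinct words in a fraction u of a text is the sum over k of V_k(1 - (1 - u)^k), where V_k is the number of word types occurring exactly k times. As a hypothetical simplification, local words are assumed to accumulate in proportion to u, and the partition parameter p is treated as the share of local types, giving V(u) = p·u·V + (1 - p)·Muller(u). The function names and the least-squares estimator for p are illustrative inventions, not the authors' procedure.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency k to V_k, the number of word types occurring exactly k times."""
    return Counter(Counter(tokens).values())


def muller_vocabulary(spectrum, u):
    """Expected number of distinct types in a fraction u (0 <= u <= 1) of the text,
    using the binomial approximation sum_k V_k * (1 - (1 - u)**k)."""
    return sum(v_k * (1.0 - (1.0 - u) ** k) for k, v_k in spectrum.items())


def partition_vocabulary(spectrum, u, p):
    """Hypothetical mixture: a share p of the types is 'local' and assumed to grow
    linearly with u; the remaining general vocabulary follows Muller's formula.
    Both parts reach the full vocabulary V when u = 1."""
    total_types = sum(spectrum.values())
    return p * u * total_types + (1.0 - p) * muller_vocabulary(spectrum, u)


def estimate_partition(tokens, fractions):
    """Illustrative least-squares estimate of p, fitted to the vocabulary observed
    in the first fraction u of the text (text prefixes), clipped to [0, 1]."""
    spectrum = frequency_spectrum(tokens)
    total_types = sum(spectrum.values())
    num = den = 0.0
    for u in fractions:
        n = round(u * len(tokens))
        observed = len(set(tokens[:n]))
        m = muller_vocabulary(spectrum, u)
        a = observed - m          # observed departure from the Muller curve
        b = u * total_types - m   # departure predicted when p = 1
        num += a * b
        den += b * b
    p = num / den if den else 0.0
    return min(max(p, 0.0), 1.0)
```

Because the assumed model is linear in p, the fit has a closed form and needs no iterative optimiser. Under these assumptions, a value of p near 0 suggests that the vocabulary is spread evenly across the text, while a value near 1 suggests strongly segment-specific vocabulary; for example, a text whose two halves share no words yields p = 1 at u = 0.5.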

Additional Info

Field Value
Source Etudes sur la richesse et la structure lexicales
Author Hubert, Pierre; Labbé, Dominique
Maintainer CCSD
Last Updated June 3, 2026, 17:40 (UTC)
Created June 3, 2026, 17:40 (UTC)
Identifier hal-00758061
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Centre de Géosciences (GEOSCIENCES) ; Mines Paris - PSL (École nationale supérieure des mines de Paris) ; Université Paris Sciences et Lettres (PSL)
creator Hubert, Pierre
date 1988-06-03T00:00:00
harvest_object_id 7bbe3bf5-2fb6-45da-be77-b077cdf3b1f8
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE (SELUNE harvesting test)
metadata_modified 2026-02-07T00:00:00
set_spec type:COUV