Note sur l'approximation de la loi hypergéométrique par la formule de Muller

The argument developed here starts from the probability that a word is absent from a random sample drawn without replacement from a corpus whose complete frequency distribution is known. This probability underlies the formula proposed more than 20 years ago by C. Muller. Muller's formula is compared here with its equivalent in the hypergeometric model. Two studies were carried out: first, the computation of vocabulary growth in corpora; second, a comparison between Muller's values and the averages obtained by drawing a large number of random samples from several corpora. The results show that the formula is a good approximation of the hypergeometric law. The paper also emphasises the need to attach standard deviations to the computed values, since confidence levels have to be taken into account.
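
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the usual statement of Muller's formula, in which a word occurring i times in a corpus of N tokens is absent from a sample of n tokens with probability q^i, where q = 1 - n/N, and sets it against the exact hypergeometric probability C(N-i, n) / C(N, n). The function names and the small frequency spectrum are hypothetical and are not taken from the paper.

```python
from math import comb


def absent_prob_hypergeometric(freq, corpus_size, sample_size):
    """Exact probability that a word with `freq` occurrences is absent
    from a sample of `sample_size` tokens drawn without replacement."""
    return comb(corpus_size - freq, sample_size) / comb(corpus_size, sample_size)


def absent_prob_muller(freq, corpus_size, sample_size):
    """Approximation q**freq with q = 1 - n/N (assumed form of Muller's formula)."""
    q = 1 - sample_size / corpus_size
    return q ** freq


def expected_vocabulary(spectrum, sample_size, absent_prob):
    """Expected number of distinct words in the sample.

    `spectrum` maps a frequency i to the number of words V_i that occur
    exactly i times in the whole corpus.
    """
    corpus_size = sum(freq * count for freq, count in spectrum.items())
    return sum(
        count * (1 - absent_prob(freq, corpus_size, sample_size))
        for freq, count in spectrum.items()
    )


# Hypothetical spectrum, for illustration only: 50 words occur once,
# 20 occur twice, and so on (N = 160 tokens, V = 86 distinct words).
spectrum = {1: 50, 2: 20, 3: 10, 5: 4, 10: 2}
half = 80  # sample containing half of the corpus

v_muller = expected_vocabulary(spectrum, half, absent_prob_muller)
v_hyper = expected_vocabulary(spectrum, half, absent_prob_hypergeometric)
print(f"Muller: {v_muller:.3f}  hypergeometric: {v_hyper:.3f}")
```

For words that occur once the two probabilities coincide; for higher frequencies q^i is slightly larger than the hypergeometric value, so in this sketch the Muller estimate of the vocabulary falls marginally below the hypergeometric expectation. The sketch gives expected values only and does not compute the standard deviations that the abstract calls for.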

Additional Info

Field	Value
Source	Etudes sur la richesse et la structure lexicales
Author	Hubert, Pierre; Labbé, Dominique
Maintainer	CCSD
Last Updated	June 3, 2026, 18:09 (UTC)
Created	June 3, 2026, 18:09 (UTC)
Identifier	hal-00758060
Language	fr
Rights	https://about.hal.science/hal-authorisation-v1/
contributor	Centre de recherche sur l'administration, la ville et le territoire (CERAT) ; Université Pierre Mendès France - Grenoble 2 (UPMF)-Centre National de la Recherche Scientifique (CNRS)
creator	Hubert, Pierre
date	1988-06-03T00:00:00
harvest_object_id	28c24b52-c83f-4596-8692-e9488d327d04
harvest_source_id	3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title	test moissonnage SELUNE
metadata_modified	2025-09-27T00:00:00
set_spec	type:COUV