Combinatorial optimization for variable selection in high-dimensional regression: Application in animal genetics

Advances in high-throughput sequencing and genotyping technologies make it possible to measure large amounts of genomic information. This work, dedicated to animal genomic selection, aims to select a subset of relevant genetic markers for predicting a quantitative trait, in a setting where the number of genotyped animals is far smaller than the number of markers studied. The thesis first reviews the state of the art of existing methods for this problem. We then propose to address variable selection in high-dimensional regression by combining combinatorial optimization methods with statistical models. We begin by experimentally tuning two combinatorial optimization methods, iterated local search and a genetic algorithm, each combined with multiple linear regression, and we evaluate their relevance. In animal genomics, family relationships between animals are known and can be an important source of information. Because our approach is flexible, we adapt it to take these relationships into account through a mixed model. Moreover, overfitting is particularly acute in such data because of the large imbalance between the number of variables studied and the number of animals available, so we propose an improvement to our approach that reduces it. The proposed approaches are validated on data from the literature as well as on real data from Gènes Diffusion.
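
As an illustration of the general idea of pairing a combinatorial search with a regression model, the following is a minimal sketch of an iterated local search over marker subsets, scored by an ordinary least-squares fit. It is not the thesis's implementation: the BIC criterion, the single-flip neighbourhood, the acceptance rule, the `max_size` cap and all function names are assumptions chosen for brevity, and the data are simulated.

```python
import numpy as np

def bic_score(X, y, subset):
    """BIC of a least-squares fit on the chosen columns (assumed criterion; lower is better)."""
    n = len(y)
    cols = sorted(subset)
    # Design matrix: an intercept column plus the selected markers.
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(max(rss, 1e-12) / n) + A.shape[1] * np.log(n)

def local_search(X, y, subset, max_size):
    """First-improvement descent: add or remove one marker at a time."""
    best = bic_score(X, y, subset)
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            cand = subset ^ {j}          # flip marker j in or out
            if len(cand) > max_size:
                continue
            s = bic_score(X, y, cand)
            if s < best - 1e-9:
                subset, best, improved = cand, s, True
    return subset, best

def iterated_local_search(X, y, max_size=5, n_iter=10, n_flips=2, seed=0):
    """Alternate random perturbations with local search; keep the best subset seen."""
    rng = np.random.default_rng(seed)
    current, current_score = local_search(X, y, set(), max_size)
    best, best_score = set(current), current_score
    for _ in range(n_iter):
        # Perturbation: flip a few randomly chosen markers.
        flips = {int(j) for j in rng.choice(X.shape[1], size=n_flips, replace=False)}
        cand = current ^ flips
        while len(cand) > max_size:      # trim back to the size cap
            cand.pop()
        cand, score = local_search(X, y, cand, max_size)
        if score <= current_score:       # accept equal-or-better local optima
            current, current_score = cand, score
        if score < best_score:
            best, best_score = set(cand), score
    return best, best_score

# Simulated data (hypothetical): 50 animals, 80 markers coded 0/1/2,
# with only markers 2 and 7 affecting the trait.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(50, 80)).astype(float)
y = 2.0 * X[:, 2] - 3.0 * X[:, 7] + 0.1 * rng.standard_normal(50)
selected, score = iterated_local_search(X, y)
print(sorted(selected), round(float(score), 2))
```

The size cap keeps every least-squares fit well determined even though there are more markers than animals. A mixed model or a cross-validated criterion could replace `bic_score` without changing the search loop.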

Additional Info

Field Value
Source https://theses.hal.science/tel-00920205
Author Hamon, Julie
Maintainer CCSD
Last Updated May 7, 2026, 18:29 (UTC)
Created May 7, 2026, 18:29 (UTC)
Identifier tel-00920205
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Parallel Cooperative Multi-criteria Optimization (DOLPHIN) ; Laboratoire d'Informatique Fondamentale de Lille (LIFL) ; Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de l'Université de Lille ; Institut National de Recherche en Informatique et en Automatique (Inria)
creator Hamon, Julie
date 2013-11-26T00:00:00
harvest_object_id 77b6114a-58cb-48ea-9970-e9c01bb3175a
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-02-26T00:00:00
set_spec type:THESE