Computational analysis of cis-regulatory elements in drosophilae and mammalian genomes

Cellular differentiation and tissue specification depend in part on the establishment of specific transcriptional programs of gene expression. These programs result from the interpretation of genomic regulatory information by sequence-specific transcription factors (TFs). Decoding this information in sequenced genomes is a key issue.

In the first part, we study the interaction between TFs and the DNA sequences they bind to, called Transcription Factor Binding Sites (TFBSs). We use a Potts model inspired by spin glass physics, together with high-throughput binding data for a variety of Drosophila and mammalian TFs. With it, we show that TFBSs exhibit correlations among nucleotides, and that accounting for the contribution of these correlations to the binding energy greatly improves the prediction of genomic TFBSs.

We then present Imogene, an extension to mammalian genomes of a Bayesian, phylogeny-based algorithm previously applied to Drosophila regulation. The algorithm is designed to computationally identify the Cis-Regulatory Modules (CRMs) that control gene expression in a set of co-regulated genes. Starting from a small training set of CRMs in a reference species, and with no a priori knowledge of the factors acting in trans, it uses the over-representation and conservation of TFBSs across related species to predict putative regulatory elements as well as the genomic CRMs underlying co-regulation. We present several applications of this algorithm in both Drosophila and vertebrates. We also present an extension of the algorithm to pattern recognition, showing that CRMs with different expression patterns can be distinguished solely on the basis of their DNA motif content.

Finally, we apply these modeling tools to two real biological cases: trichome differentiation in Drosophila and skeletal muscle differentiation in the mouse. In both cases, predictions were experimentally validated in joint work with biology teams, and they point towards great flexibility in cis-regulatory processes.
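The pairwise (Potts) description of binding sites mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard parametrization that is not spelled out in this record: each position i of a site carries a field h_i(a) for nucleotide a, each pair of positions i < j carries a coupling J_ij(a, b), and the probability of a site s is taken as P(s) ∝ exp(Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j)). Setting every coupling to zero recovers an independent-site model akin to a position weight matrix (PWM). The function names (`potts_score`, `log_partition`, `potts_probability`) and the parameter values are hypothetical toy choices. In the work described above, the model is used together with high-throughput binding data; that fitting step is not reproduced here.

```python
import itertools
import math

# Nucleotide alphabet: q = 4 Potts states per position.
ALPHABET = "ACGT"
IDX = {base: k for k, base in enumerate(ALPHABET)}


def potts_score(seq, h, J):
    """Unnormalized log-probability of `seq` under a pairwise Potts model.

    h[i][a]          -- single-site field for state a at position i
    J[(i, j)][a][b]  -- coupling between state a at i and state b at j, i < j
    Position pairs missing from J have zero coupling, so J = {} reduces the
    model to an independent-site (PWM-like) additive score.
    """
    s = [IDX[base] for base in seq]
    score = sum(h[i][a] for i, a in enumerate(s))
    for i, j in itertools.combinations(range(len(s)), 2):
        if (i, j) in J:
            score += J[(i, j)][s[i]][s[j]]
    return score


def log_partition(length, h, J):
    """Exact log Z by enumerating all 4**length sequences (tiny sites only)."""
    weights = [
        math.exp(potts_score("".join(letters), h, J))
        for letters in itertools.product(ALPHABET, repeat=length)
    ]
    return math.log(math.fsum(weights))


def potts_probability(seq, h, J):
    """P(seq) = exp(score(seq)) / Z."""
    return math.exp(potts_score(seq, h, J) - log_partition(len(seq), h, J))


# Toy parameters (made up for illustration, not fitted to any binding data):
# flat fields, plus one coupling that rewards A...T or C...G at positions
# 0 and 2 of a 3-bp site.
L = 3
h = [[0.0] * 4 for _ in range(L)]
J = {(0, 2): [[0.0] * 4 for _ in range(4)]}
J[(0, 2)][IDX["A"]][IDX["T"]] = 1.5
J[(0, 2)][IDX["C"]][IDX["G"]] = 1.5

if __name__ == "__main__":
    for site in ("ACT", "ACG", "GCT"):
        pairwise = potts_score(site, h, J)
        independent = potts_score(site, h, {})  # same fields, couplings dropped
        print(site, pairwise, independent, round(potts_probability(site, h, J), 4))
```

With these toy parameters, ACT and ACG receive identical scores once the couplings are dropped, whereas the pairwise model makes ACT exp(1.5) times more probable than ACG. This dependency between non-adjacent positions is the kind of effect an additive, independent-site score cannot represent. Enumerating all sequences to compute Z is only feasible for very short sites; longer motifs would require approximate inference.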


Additional Info

Field Value
Source https://theses.hal.science/tel-00865159
Author Santolini, Marc
Maintainer CCSD
Last Updated May 9, 2026, 15:32 (UTC)
Created May 9, 2026, 15:32 (UTC)
Identifier tel-00865159
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Laboratoire de Physique Statistique de l'ENS (LPS) ; Fédération de recherche du Département de physique de l'Ecole Normale Supérieure - ENS Paris (FRDPENS) ; École normale supérieure - Paris (ENS-PSL) ; Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL) ; Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Université Paris Diderot - Paris 7 (UPD7)-Centre National de la Recherche Scientifique (CNRS)
creator Santolini, Marc
date 2013-09-19T00:00:00
harvest_object_id bea147c4-f4bf-409a-8deb-fb262d8d8a82
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2024-04-20T00:00:00
set_spec type:THESE