Statistical Approaches for Segmentation: Application to Genome Annotation

We propose to model the output of transcriptome sequencing technologies (RNA-Seq) with the negative binomial distribution and to build segmentation models suited to studying these data at different biological scales, as these technologies become a valuable tool for genome annotation, gene expression analysis, and new-transcript discovery. We develop a fast segmentation algorithm for analyzing whole-chromosome series, and we propose two methods for estimating the number of segments, a key quantity related to the number of genes expressed in the cell, whether those genes were identified in previous experiments or are newly discovered. Research on precise gene annotation, and in particular the comparison of transcription boundaries across individuals, naturally leads us to the statistical comparison of change-points in independent series. To address these questions, we build tools within a Bayesian segmentation framework, for which we are able to provide uncertainty measures. We illustrate our models, all implemented in R packages, on an RNA-Seq dataset from a yeast study and show, for instance, that intron boundaries are conserved across conditions while the beginnings and ends of transcripts are subject to differential splicing.
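
To make the idea of segmenting a count series under a negative binomial model concrete, here is a minimal Python sketch. It is not the thesis's fast algorithm or its R implementation. It is a plain quadratic dynamic program, written under several assumptions: the dispersion parameter `phi` is fixed and known, the number of segments is supplied by the user, and the function names `nb_segment_cost` and `segment_counts` are hypothetical. Each segment is scored by its negative binomial log-likelihood evaluated at the maximum-likelihood estimate of `p`, dropping the terms that do not depend on the segmentation.

```python
import math


def nb_segment_cost(total, length, phi):
    """Negative log-likelihood of one segment, up to an additive constant,
    under NB(phi, p) with p replaced by its MLE phi / (phi + mean).
    phi is assumed fixed and known (an assumption of this sketch)."""
    if total == 0:
        return 0.0  # p_hat = 1, so the p-dependent terms vanish
    mean = total / length
    p_hat = phi / (phi + mean)
    return -(length * phi * math.log(p_hat) + total * math.log(1.0 - p_hat))


def segment_counts(counts, n_segments, phi=1.0):
    """Exact segmentation of `counts` into `n_segments` contiguous segments
    by dynamic programming (O(n_segments * n^2) time).
    Returns (change_points, cost); a change point c means a new segment
    starts at index c."""
    n = len(counts)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(counts)")

    # Prefix sums give the total of counts[i:j] in O(1).
    prefix = [0]
    for y in counts:
        prefix.append(prefix[-1] + y)

    def cost(i, j):
        return nb_segment_cost(prefix[j] - prefix[i], j - i, phi)

    # best[k][j]: minimal cost of splitting counts[:j] into k segments.
    best = [[math.inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end to recover segment starts.
    change_points = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        change_points.append(j)
    change_points.reverse()
    return change_points, best[n_segments][n]


if __name__ == "__main__":
    # Toy series: low counts, a highly expressed region, low counts again.
    toy = [1, 0, 2, 1, 0, 1, 20, 25, 18, 22, 24, 1, 0, 2, 1, 0]
    print(segment_counts(toy, n_segments=3))
```

The quadratic cost of this sketch is what makes a faster algorithm necessary for whole-chromosome series, and choosing `n_segments` from the data is the separate model-selection problem the abstract refers to.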

Additional Info

Field Value
Source https://theses.hal.science/tel-00913851
Author Cleynen, Alice
Maintainer CCSD
Last Updated May 7, 2026, 23:13 (UTC)
Created May 7, 2026, 23:13 (UTC)
Identifier NNT: 2013PA112258
Language en
Rights https://about.hal.science/hal-authorisation-v1/
contributor Mathématiques et Informatique Appliquées (MIA-Paris) ; Institut National de la Recherche Agronomique (INRA)-AgroParisTech
creator Cleynen, Alice
date 2013-11-15T00:00:00
harvest_object_id abcb7d35-fbae-4470-a035-74004e3c8dcc
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2026-04-01T00:00:00
set_spec type:THESE