On Some Unsupervised Learning Problems for Highly Dependent Time Series

This thesis is devoted to the theoretical analysis of unsupervised learning problems involving highly dependent time series. Two fundamental problems are considered: change point estimation and time-series clustering. Both are studied in an extremely general framework, where the data are assumed to be generated by arbitrary, unknown stationary ergodic process distributions. This is one of the weakest assumptions in statistics: it is more general than the parametric and model-based settings, and it subsumes most of the nonparametric frameworks considered for this class of problems, which typically assume that each time series consists of independent and identically distributed observations or that it satisfies certain mixing conditions. For each of the considered problems, novel nonparametric methods are proposed and shown to be asymptotically consistent in this general framework. For change point estimation, asymptotic consistency refers to the algorithm's ability to produce change point estimates that are asymptotically arbitrarily close to the true change points. A clustering algorithm, on the other hand, is asymptotically consistent if the output clustering, restricted to each fixed batch of sequences, coincides with the target clustering from some time on. The proposed algorithms are shown to be efficiently implementable, and the theoretical results are complemented with experimental evaluations.

Statistical analysis in the stationary ergodic framework is extremely challenging. In general, rates of convergence (even of frequencies to their respective probabilities) are provably impossible to obtain for this class of processes. As a result, given a pair of samples generated independently by stationary ergodic process distributions, it is provably impossible to determine whether they were generated by the same process or by two different ones. This, in turn, implies that problems such as time-series clustering with an unknown number of clusters, or change point detection, cannot admit consistent solutions. A challenging task is therefore to discover problem formulations that do admit consistent solutions in this general framework.

The main contribution of this thesis is to demonstrate constructively that, despite these theoretical impossibility results, natural formulations of the considered problems exist which admit consistent solutions. Specifically, natural formulations of change point estimation and time-series clustering are proposed, and efficient algorithms are provided which are shown to be asymptotically consistent under the sole assumption that the process distributions are stationary ergodic. This includes a demonstration that the correct number of change points can be found without imposing stronger assumptions on the process distributions. It turns out that, in this formulation, the change point estimation problem can be reduced to time-series clustering. The results presented in this work lay the theoretical foundations for the analysis of sequential data in a broad range of real-world applications.
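
As a rough illustration of the clustering formulation with a known number of clusters, the sketch below groups discrete-valued sequences by comparing empirical frequencies of short patterns. It is a minimal sketch under stated assumptions and not the thesis's algorithm as given: the abstract does not specify a distance or a clustering procedure, so the function names, the 2^-m weights, the pattern-length cutoff `max_len`, and the farthest-point initialisation are all hypothetical choices made for this example.

```python
from collections import Counter
import random


def pattern_freqs(seq, m):
    """Empirical frequencies of all length-m windows in seq."""
    total = len(seq) - m + 1
    if total <= 0:
        return {}
    counts = Counter(tuple(seq[i:i + m]) for i in range(total))
    return {pattern: c / total for pattern, c in counts.items()}


def empirical_distance(x, y, max_len=3):
    """Weighted sum of frequency differences over patterns of length 1..max_len.

    Assumption: weights 2**-m and the cutoff max_len are illustrative choices.
    """
    d = 0.0
    for m in range(1, max_len + 1):
        fx, fy = pattern_freqs(x, m), pattern_freqs(y, m)
        diff = sum(abs(fx.get(p, 0.0) - fy.get(p, 0.0)) for p in set(fx) | set(fy))
        d += 2.0 ** -m * diff
    return d


def cluster(seqs, k, max_len=3):
    """Cluster sequences into k groups; k must be given in advance."""
    # Farthest-point initialisation: start from the first sequence, then
    # repeatedly pick the sequence farthest from all centres chosen so far.
    centers = [0]
    while len(centers) < k:
        far = max(
            range(len(seqs)),
            key=lambda i: min(
                empirical_distance(seqs[i], seqs[c], max_len) for c in centers
            ),
        )
        centers.append(far)
    # Assign every sequence to its nearest centre.
    return [
        min(range(k), key=lambda j: empirical_distance(s, seqs[centers[j]], max_len))
        for s in seqs
    ]


# Toy data: i.i.d. binary sources, a very special case of stationary ergodic processes.
rng = random.Random(0)


def bernoulli(p, n):
    return [1 if rng.random() < p else 0 for _ in range(n)]


seqs = [bernoulli(0.2, 2000), bernoulli(0.2, 2000),
        bernoulli(0.8, 2000), bernoulli(0.8, 2000)]
labels = cluster(seqs, k=2)
print(labels)
```

The number of clusters `k` is passed in explicitly, in line with the impossibility result stated above for the case where it is unknown. The toy sources are far apart, so this example says nothing about how the sketch behaves on strongly dependent data.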

Additional Info

Field Value
Source https://theses.hal.science/tel-00920184
Author Khaleghi, Azadeh
Maintainer CCSD
Last Updated May 7, 2026, 18:29 (UTC)
Created May 7, 2026, 18:29 (UTC)
Identifier tel-00920184
Language en
Rights https://about.hal.science/hal-authorisation-v1/
contributor Sequential Learning (SEQUEL) ; Laboratoire d'Informatique Fondamentale de Lille (LIFL) ; Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de l'Université de Lille ; Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Automatique, Génie Informatique et Signal (LAGIS) ; Université de Lille, Sciences et Technologies-Centrale Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Centre National de la Recherche Scientifique (CNRS)
creator Khaleghi, Azadeh
date 2013-11-18T00:00:00
harvest_object_id f76c8fec-cd27-4b1f-b8e4-7835c98bb975
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-08-12T00:00:00
set_spec type:THESE