Enhanced image and video representation for visual recognition

The subject of this thesis is about image and video representations for visual recognition. This thesis first focuses on image search, both for image and textual queries, and then considers the classification and the localization of actions in videos. In image retrieval, images similar to the query image are retrieved from a large dataset. On this front, we propose an asymmetric version of the Hamming Embedding method, where the comparison of query and database descriptors relies on a vector-to-binary code comparison. For image classification, where the task is to identify if an image contains any instance of the queried category, we propose a novel approach based on a match kernel between images, more specifically based on Hamming Embedding similarity. We also present an effective variant of the SIFT descriptor, which leads to a better classification accuracy. Action classification is improved by several methods to better employ the motion inherent to videos. This is done by dominant motion compensation, and by introducing a novel descriptor based on kinematic features of the visual flow. The last contribution is devoted to action localization, whose objective is to determine where and when the action of interest appears in the video. A selective sampling strategy produces 2D+t sequences of bounding boxes, which drastically reduces the candidate locations. The method advantageously exploits a criterion that takes in account how motion related to actions deviates from the background motion. We thoroughly evaluated all the proposed methods on real world images and videos from challenging benchmarks. Our methods outperform the previously published related state of the art and remains competitive with the subsequently proposed methods.

Data and Resources

Additional Info

Field Value
Source https://theses.hal.science/tel-00996793
Author Jain, Mihir
Maintainer CCSD
Last Updated May 5, 2026, 10:10 (UTC)
Created May 5, 2026, 10:10 (UTC)
Identifier tel-00996793
Language en
Rights https://about.hal.science/hal-authorisation-v1/
contributor Multimedia content-based indexing (TEXMEX) ; Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) ; Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) ; Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes) ; Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de l'Université de Rennes ; Institut National de Recherche en Informatique et en Automatique (Inria)
creator Jain, Mihir
date 2014-04-09T00:00:00
harvest_object_id f2b3d18e-f166-4bd1-ba4a-2e4a2a44b064
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-08-12T00:00:00
set_spec type:THESE