Semantic Description of Humans in Images

In the present thesis, we are interested in the semantic description of humans in images. We propose to describe humans with the help of (i) semantic attributes, e.g. male or female, wearing a tee-shirt; (ii) actions, e.g. riding a horse, running; and (iii) facial expressions, e.g. smiling, angry. First, we propose a new image representation to better exploit class-specific spatial information. The standard representation, i.e. spatial pyramids, has two shortcomings: it assumes that the distribution of spatial information (i) is uniform and (ii) is the same for all tasks. We address these shortcomings by learning the discriminative spatial information for a specific task. Further, we propose a model that adapts the spatial information to each image for a given task. This lends more flexibility to the model and allows for misalignments of discriminative regions, e.g. the legs may be at different positions in different images of the running class. Finally, we propose a new descriptor for facial expression analysis. We work in the space of intensity differences of local pixel neighborhoods, and propose to learn the quantization of this space and to use higher-order statistics of the difference vectors to obtain more expressive descriptors. We introduce a challenging dataset of human attributes containing 9344 human images, sourced from the internet, with annotations for 27 semantic attributes based on sex, pose, age and appearance/clothing. We validate the proposed methods on our dataset of human attributes as well as on publicly available datasets of human actions, fine-grained classification involving human actions, and facial expressions. We also report results on related computer vision datasets, for scene recognition, object image classification and texture categorization, to highlight the generality of our contributions.
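The standard spatial pyramid representation that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of that baseline only, not the learned or image-adaptive spatial models proposed in the thesis. The function name `spatial_pyramid`, the grid levels (1, 2, 4) and the input layout (keypoint positions plus pre-computed visual-word indices) are assumptions chosen for illustration, and the per-level weights and normalization often used in practice are omitted. Note that every level splits the image into equal cells and the same grid is applied whatever the task, which are exactly the two assumptions, (i) uniformity and (ii) task independence, that the thesis sets out to relax.

```python
import numpy as np


def spatial_pyramid(xy, words, width, height, vocab_size, levels=(1, 2, 4)):
    """Baseline spatial pyramid: visual-word histograms over uniform grids.

    Hypothetical helper for illustration.
    xy:    (N, 2) keypoint positions (x, y) in pixels.
    words: (N,) visual-word indices in [0, vocab_size).
    Returns the concatenation of the per-cell histograms of every level.
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    feats = []
    for g in levels:
        # Uniform g x g grid: all cells have the same size, for every task.
        col = np.minimum((xy[:, 0] * g // width).astype(int), g - 1)
        row = np.minimum((xy[:, 1] * g // height).astype(int), g - 1)
        cell = row * g + col
        # One vocab_size-bin histogram per cell, cells in row-major order.
        hist = np.zeros((g * g, vocab_size))
        np.add.at(hist, (cell, words), 1)
        feats.append(hist.ravel())
    return np.concatenate(feats)


# Three keypoints in a 100 x 100 image, with a 2-word vocabulary.
f = spatial_pyramid([[10, 10], [90, 90], [60, 20]], [0, 1, 1], 100, 100, 2)
print(f.shape)  # (42,) = 2 * (1 + 4 + 16)
```

Because the cell boundaries are fixed in advance, a discriminative region (say, the legs in a running image) contributes to different cells depending on where it happens to fall, which is the misalignment problem the adaptive model in the thesis addresses.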

Data and Resources

Semantic Description of Humans in Images (HTML)

Additional Info

Field	Value
Source	https://theses.hal.science/tel-00767699
Author	Sharma, Gaurav
Maintainer	CCSD
Last Updated	May 29, 2026, 21:34 (UTC)
Created	May 29, 2026, 21:34 (UTC)
Identifier	tel-00767699
Language	en
Rights	https://about.hal.science/hal-authorisation-v1/
contributor	Learning and recognition in vision (LEAR) ; Centre Inria de l'Université Grenoble Alpes ; Institut National de Recherche en Informatique et en Automatique (Inria) ; Laboratoire Jean Kuntzmann (LJK) ; Université Pierre Mendès France - Grenoble 2 (UPMF) ; Université Joseph Fourier - Grenoble 1 (UJF) ; Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP) ; Centre National de la Recherche Scientifique (CNRS)
creator	Sharma, Gaurav
date	2012-12-17T00:00:00
harvest_object_id	9759469a-d2a0-462d-91b6-5c71032c928a
harvest_source_id	3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title	test moissonnage SELUNE
metadata_modified	2025-09-27T00:00:00
set_spec	type:THESE