Visual words for pose computation

We address the problem of establishing point correspondences in images in order to compute camera pose with the Perspective-n-Point (PnP) algorithm. In an offline training stage, we use a set of training images to compute a 3D map, i.e. the 3D coordinates and visual characteristics of some of the points in the environment. Given a new test image, we use the 3D map to detect the 2D image coordinates of known 3D points and apply PnP to those 2D-3D correspondences. During the training stage, we cluster the SIFT descriptors extracted from the training images to obtain 2D-tracks of some of the 3D points in the environment. Each 2D-track consists of the 2D image coordinates of a single 3D point in different training images. Structure from Motion (SfM) is performed on these 2D-tracks to obtain the coordinates of the corresponding 3D points. During the test stage, the SIFT descriptors associated with the 2D-track of a 3D point are used to recognize that 3D point in a given image. The overall process is similar to the visual word framework used in different fields of computer vision: during training, visual words are formed through clustering, and during testing, 3D points are identified through visual word recognition. We experiment with different clustering schemes (k-means and mean-shift) and propose a novel scheme for visual word formation in the training stage. For visual word recognition in the test stage, we use different matching rules, including some popular supervised pattern classification methods. We evaluate these strategies in both stages. To achieve robustness against pose variation between training and test images, we explore different ways of incorporating SIFT descriptors extracted from synthetic views generated from the training images. We also propose an exact acceleration strategy for mean-shift computation.
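The test-stage idea described above (recognizing a 3D point from the descriptors of its 2D-track) can be illustrated with a minimal sketch. It is not the thesis's method: it assumes each visual word is summarized by the mean of its track descriptors, and it uses a nearest-centroid rule with a distance-ratio test as a stand-in matching rule. The function names, the `ratio` threshold, and the short toy descriptors (real SIFT descriptors have 128 dimensions) are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_visual_words(tracks):
    """Summarize each 2D-track by the centroid of its descriptors.

    tracks: dict mapping a 3D point id to the list of descriptors observed
    for that point in the training images (assumed representation).
    """
    return {pid: mean_vector(descs) for pid, descs in tracks.items()}


def recognize(descriptor, words, ratio=0.8):
    """Return the id of the nearest visual word, or None if ambiguous.

    Assumed matching rule: accept the nearest centroid only when it is
    clearly closer than the second nearest (distance-ratio test).
    """
    ranked = sorted((math.dist(descriptor, c), pid) for pid, c in words.items())
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][0] > ratio * ranked[1][0]:
        return None  # best and second-best are too similar to trust
    return ranked[0][1]


def match_image(keypoints, words, points_3d, ratio=0.8):
    """Build 2D-3D correspondences for one test image.

    keypoints: list of ((u, v), descriptor) pairs from the test image.
    points_3d: dict mapping a 3D point id to its (X, Y, Z) coordinates.
    """
    correspondences = []
    for xy, descriptor in keypoints:
        pid = recognize(descriptor, words, ratio)
        if pid is not None:
            correspondences.append((xy, points_3d[pid]))
    return correspondences
```

The list returned by `match_image` pairs image coordinates with map coordinates, which is the kind of input a PnP solver takes. Rejecting ambiguous matches is one simple way to limit wrong correspondences before pose computation.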

Additional Info

Field Value
Source https://theses.hal.science/tel-01749330
Author Bhat, Srikrishna
Maintainer CCSD
Last Updated May 14, 2026, 03:19 (UTC)
Created May 14, 2026, 03:19 (UTC)
Identifier NNT: 2013LORR0001
Language en
Rights https://about.hal.science/hal-authorisation-v1/
contributor Visual Augmentation of Complex Environments (MAGRIT) ; INRIA Lorraine ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)
creator Bhat, Srikrishna
date 2013-01-22T00:00:00
harvest_object_id d0b556ca-2ab8-4f3b-9d31-2a4556326884
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-11-04T00:00:00
set_spec type:THESE