Seeing allows animals and people alike to gather information from a distance, often with high spatial and temporal resolution. Machines have access to this rich pool of information thanks to their cameras. However, they still lack the software to process it, that is, to transform raw pixel values into useful information such as the nature, position, and function of the surrounding objects. This is one of the reasons why it remains difficult for them to navigate in an unknown environment and to interact with people and objects in an unplanned fashion. Designing such software raises many challenges. One of them is comparing two images, for instance in order to recognize that the current image is similar to another one that was previously seen and identified. A difficulty here is that the software cannot know a priori which parts of the two images match, and therefore which parts it should compare. This thesis tackles that problem and presents a set of algorithms to find correspondences between images, or in other words, to align them. The first proposed method matches parts of images in a coherent fashion, taking into account higher-order interactions between more than two of them. The second proposed algorithm successfully applies alignment techniques to discover the category of an object centered in an image. The third one is optimized for speed and tries to detect objects of a given category, which can be anywhere in an image.