Object recognition


27 Terms

1
New cards

Indexing local features problem

  • With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?

2
New cards

Inverted File Index idea

  • For text documents, an efficient way to find the pages on which a word occurs is an index

  • We want to find all images in which a feature occurs

  • To use this idea, need to map our features to “visual words”.

3
New cards

Indexing with visual words

Map high dimensional SIFT descriptors to tokens/words by quantising the feature space

  • Quantise via clustering, let cluster centres be prototype “words”

  • Determine which word to assign to each new image region by finding the closest cluster centre.
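
The quantisation step can be sketched as follows. This is a minimal illustration, assuming plain k-means (Lloyd's algorithm) as the clustering method and small random vectors standing in for real 128-dimensional SIFT descriptors; the function names `kmeans` and `assign_words` are hypothetical.

```python
import numpy as np

def assign_words(descriptors, centres):
    """Return the index of the closest cluster centre (visual word) per descriptor."""
    # Squared Euclidean distances between every descriptor and every centre: shape (n, k)
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if a cluster becomes empty
                centres[j] = members.mean(axis=0)
    return centres

# Toy data: two well-separated blobs standing in for SIFT descriptors
rng = np.random.default_rng(1)
descriptors = np.vstack([rng.normal(0, 0.1, (50, 8)),
                         rng.normal(5, 0.1, (50, 8))])
vocab = kmeans(descriptors, k=2)          # cluster centres are the prototype "words"
words = assign_words(descriptors, vocab)  # word id for each descriptor
```

A new image region is handled the same way: compute its descriptor and call `assign_words` against the stored vocabulary.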

4
New cards

Inverted file index

Database mapping from each visual word to the images it occurs in

  1. Extract words in query

  2. Inverted file index to find relevant frames

  3. Compare word counts
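
The three steps above can be sketched with a dictionary from word id to image ids. This is a rough illustration only: the scoring by histogram intersection is an assumption chosen for simplicity, and the names `build_inverted_index` and `query` are hypothetical.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words: dict mapping image id -> list of visual word ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, image_words, query_words):
    """Rank database images that share at least one word with the query."""
    # Step 2: look up candidate images through the inverted index
    candidates = set()
    for w in set(query_words):
        candidates |= index.get(w, set())
    # Step 3: compare word counts (histogram intersection, an assumed score)
    q = Counter(query_words)
    scores = {}
    for image_id in candidates:
        c = Counter(image_words[image_id])
        scores[image_id] = sum(min(q[w], c[w]) for w in q)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

database = {"a": [1, 1, 2, 7], "b": [3, 4, 4], "c": [2, 9]}
index = build_inverted_index(database)
ranking = query(index, database, [1, 2, 2])  # step 1 (word extraction) assumed done
```

Image "b" shares no word with the query, so it is never touched; that is the saving the index provides.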

5
New cards

Spatial verification

Need to check that the matched visual words are geometrically arranged in a compatible way between the two images

6
New cards

Spatial verification strategy

  • Generalised Hough Transform

    • Let each matched feature cast a vote on location, scale, orientation of model object

    • Verify parameters with enough votes
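
A simplified sketch of the voting idea follows. It assumes each feature carries a position, scale, and orientation, that the object pose is a 2D similarity transform, and that each match votes into a single coarse bin; practical systems usually also vote into neighbouring bins. Bin sizes and function names are illustrative assumptions.

```python
import math
from collections import defaultdict

def hough_votes(matches, xy_bin=20.0, scale_base=2.0, angle_bin=30.0):
    """matches: list of (model_feature, query_feature), each (x, y, scale, angle_deg).
    Returns a dict mapping pose bin -> indices of the matches that voted for it."""
    bins = defaultdict(list)
    for i, ((mx, my, ms, ma), (qx, qy, qs, qa)) in enumerate(matches):
        s = qs / ms                      # relative scale
        rot = (qa - ma) % 360.0          # relative orientation in degrees
        r = math.radians(rot)
        # Where the model origin would land in the query image under this match
        ox = qx - s * (mx * math.cos(r) - my * math.sin(r))
        oy = qy - s * (mx * math.sin(r) + my * math.cos(r))
        key = (math.floor(ox / xy_bin), math.floor(oy / xy_bin),
               round(math.log(s, scale_base)), math.floor(rot / angle_bin))
        bins[key].append(i)
    return bins

def verified_poses(bins, min_votes=3):
    """Keep only pose bins supported by enough matches."""
    return {k: v for k, v in bins.items() if len(v) >= min_votes}

# Three consistent matches (shift of 105, 55) plus one outlier
matches = [
    ((10.0, 10.0, 1.0, 0.0), (115.0, 65.0, 1.0, 0.0)),
    ((20.0, 10.0, 1.0, 0.0), (125.0, 65.0, 1.0, 0.0)),
    ((10.0, 30.0, 1.0, 0.0), (115.0, 85.0, 1.0, 0.0)),
    ((5.0, 5.0, 1.0, 0.0), (300.0, 300.0, 1.0, 0.0)),
]
verified = verified_poses(hough_votes(matches))
```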

7
New cards

Application of Inverted File Index: Video Google System

  1. Collect all words within query region

  2. Inverted file index to find relevant frames

  3. Compare word counts

  4. Spatial verification

8
New cards

Issues for formation of visual vocabulary

  • Sampling strategy: where to extract features?

  • What clustering/quantisation algorithm to use?

  • Unsupervised vs supervised

  • What corpus provides features (universal vocabulary)?

  • Vocabulary size, number of words

9
New cards

Sampling Strategies

  • Sparse, at interest points

    • For specific, textured objects, sparse sampling at interest points is more reliable.

  • Dense, uniformly

    • For object categorisation, dense sampling offers better coverage

  • Randomly

  • Multiple interest operators (e.g. Harris and LoG)

10
New cards

Object Categorisation: Task description

Given a small number of training images of a category, recognise previously unseen instances of that category and assign the correct label

11
New cards

Visual object categories - defined in humans?

Predominantly visually. There is evidence that humans usually start with basic-level categorisation before identification

12
New cards

Types of object categories

  • Functional categories (e.g. chairs, something you can sit on)

  • Ad-hoc categories (e.g. something you can find in an office environment)

13
New cards

Object recognition robustness:

  • Illumination

  • Object pose

  • Clutter

  • Occlusions

  • Intra-class appearance variation

  • Viewpoint

14
New cards

Bag of Words: Analogy to documents

Can classify a book based on seeing a group of words you expect to be characteristic of a certain class

Not just the occurrence of certain words but the co-occurrence.

15
New cards

Bag of visual words definition

Collection of independent visual words with a histogram representation.

  • Summarise entire image based on its distribution of word occurrences

  • Analogous to bag of words representation commonly used for documents
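
As a small illustration, the histogram can be built by counting how often each word id occurs. The sketch assumes word ids have already been assigned to the image's features; L1 normalisation is an optional assumption so that images with different numbers of features stay comparable.

```python
import numpy as np

def bow_histogram(word_ids, vocab_size, normalise=True):
    """Count occurrences of each visual word; optionally normalise to sum to 1."""
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    if normalise and hist.sum() > 0:
        hist /= hist.sum()
    return hist

# Four features in an image, quantised to words 0, 2, 2 and 3 of a 5-word vocabulary
hist = bow_histogram(np.array([0, 2, 2, 3]), vocab_size=5)
```

The result has one entry per vocabulary word regardless of how many features the image contains, and the order of the features does not matter.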

16
New cards

First step of training bag of words

Two ways:

  • Regular grid

  • Interest point detector

    • Use a state-of-the-art interest point detector

    • Represent each detected region with a SIFT descriptor

17
New cards

Comparing bags of words

Build up histograms of word activation - so any histogram comparison measure can be used here

E.g. can rank frames by the normalised scalar product between weighted occurrence counts
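
One possible reading of this ranking is sketched below. The choice of tf-idf as the weighting is an assumption (the card only says "weighted"), and the function names are hypothetical; the normalised scalar product is the cosine of the angle between the two weighted vectors.

```python
import numpy as np

def idf_weights(db_counts):
    """Inverse document frequency per word, from (n_images, vocab_size) counts."""
    db_counts = np.asarray(db_counts, dtype=float)
    df = (db_counts > 0).sum(axis=0)  # number of images containing each word
    return np.log(len(db_counts) / np.maximum(df, 1))

def weighted(counts, idf):
    """Term frequency times idf, one row per image."""
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    return tf * idf

def rank_by_normalised_scalar_product(query_counts, db_counts):
    idf = idf_weights(db_counts)
    q = weighted(query_counts, idf)[0]
    d = weighted(db_counts, idf)
    eps = 1e-12  # guards against division by zero for empty vectors
    scores = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + eps)
    return np.argsort(-scores), scores

db_counts = [[3, 1, 0], [0, 2, 2], [0, 0, 5]]
order, scores = rank_by_normalised_scalar_product([2, 1, 0], db_counts)
```

Here `order` lists database images from most to least similar to the query.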

18
New cards

Given bag-of-features representations of images from different classes, how do we train a model to distinguish them?

Extract features from the image
Assign each extracted feature a visual word label
Arrange the visual words into a fixed-size histogram that represents the image

Can then either use generative methods or discriminative methods

19
New cards

Discriminative methods

Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes (e.g. SVM)

Discriminative methods tend to have a boundary that gives a yes/no or a separation between classes.

20
New cards

Generative methods

Based on likelihood — “based on my image, what is the likelihood it contains a zebra?”

21
New cards

Discriminative method example: nearest neighbour classification

  • Assign input vector to one of two or more classes

  • Any decision rule divides input space into decision regions separated by decision boundaries

  • Assign label of nearest training data point to each test data point

K-nearest neighbour classification will find the k closest points from training data, and labels of k points “vote” to classify
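
A minimal sketch of k-nearest neighbour voting on bag-of-words histograms, assuming Euclidean distance (any histogram distance could be substituted) and brute-force search; the toy histograms and labels are made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, test_x, k=1):
    """Label each test vector by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    # Squared Euclidean distance from every test point to every training point
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in nearest:
        votes = Counter(train_y[i] for i in row)
        preds.append(votes.most_common(1)[0][0])
    return preds

train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["zebra", "zebra", "car", "car"]
preds = knn_classify(train_x, train_y, [[0.7, 0.3], [0.3, 0.7]], k=3)
```

With `k=1` this reduces to plain nearest neighbour classification.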

22
New cards

Nearest neighbour method pros

  • Simple to implement

  • Flexible to feature/distance choices

  • Naturally handles multi-class cases

  • Can do well in practice with enough representative data

23
New cards

Nearest neighbour method cons

  • Large search problem to find nearest neighbours

  • Storage of data

  • Must know we have a meaningful distance function

24
New cards

Generative method example: Naive Bayes Model

Assume each feature is conditionally independent given the class
Prior: incorporate prior knowledge about which classes are more likely (e.g. a certain class is more likely, given where we are)

Naive Bayes classifier assumes that visual words are conditionally independent given object class
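
Under the independence assumption, the class score is the log prior plus the sum of per-word log likelihoods weighted by the word counts. The sketch below assumes a multinomial model with add-one (Laplace) smoothing, which the card does not specify; function names and toy data are hypothetical.

```python
import numpy as np

def train_naive_bayes(histograms, labels, alpha=1.0):
    """histograms: (n_images, vocab_size) raw word counts; labels: class per image."""
    histograms = np.asarray(histograms, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Prior: fraction of training images in each class
    log_prior = np.log(np.array([(labels == c).mean() for c in classes]))
    # Likelihood: smoothed word frequencies per class (alpha is the smoothing constant)
    counts = np.array([histograms[labels == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_naive_bayes(model, histogram):
    """Pick the class maximising log prior + sum_w count(w) * log p(w | class)."""
    classes, log_prior, log_likelihood = model
    scores = log_prior + log_likelihood @ np.asarray(histogram, dtype=float)
    return classes[int(np.argmax(scores))]

model = train_naive_bayes([[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 4]],
                          ["zebra", "zebra", "car", "car"])
pred = predict_naive_bayes(model, [3, 1, 0])
```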

25
New cards

Bag-of-words pros

  • Flexible to geometry/deformations/viewpoint

  • Compact summary of image content

  • Provides vector representation of sets

  • Empirically good recognition results in practice

26
New cards

Methods to add spatial information to BoW

BoW is orderless

  • Visual ‘phrases’: frequently co-occurring words

  • Semi-local features: describe configuration, neighbourhood

  • Let position be part of each feature

  • Count bags of words only within sub-grids of an image

  • After matching, verify spatial consistency (e.g. look at neighbours—are they the same too?)
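
The sub-grid option can be sketched as one histogram per grid cell, concatenated into a single vector. This assumes each feature comes with pixel coordinates and a word id; the 2×2 grid and the function name `grid_bow` are illustrative choices.

```python
import numpy as np

def grid_bow(xy, word_ids, image_size, vocab_size, grid=(2, 2)):
    """Concatenate per-cell word histograms; cells are ordered row by row."""
    width, height = image_size
    cols, rows = grid
    xy = np.asarray(xy, dtype=float)
    word_ids = np.asarray(word_ids)
    # Cell column/row of each feature; clamp points lying on the far border
    cx = np.minimum((xy[:, 0] / width * cols).astype(int), cols - 1)
    cy = np.minimum((xy[:, 1] / height * rows).astype(int), rows - 1)
    hist = np.zeros((rows, cols, vocab_size))
    np.add.at(hist, (cy, cx, word_ids), 1)  # unbuffered add handles repeated indices
    return hist.reshape(-1)

features_xy = [(10, 10), (90, 10), (90, 90), (95, 95)]
features_words = [0, 1, 1, 2]
vec = grid_bow(features_xy, features_words, image_size=(100, 100), vocab_size=3)
```

Two images with the same words in different quadrants now produce different vectors, which a single whole-image histogram cannot distinguish.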

27
New cards

Bag-of-words cons

  • Basic model ignores geometry — must verify afterwards, or encode via features

  • Background and foreground mixed when bag covers whole image

  • Interest points or sampling: no guarantee of capturing object-level parts

  • Optimal vocabulary formation remains unclear.