CSE111 - Indexes

5.0(1)
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/51

flashcard set

Earn XP

Description and Tags

Quiz 4

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

52 Terms

1
New cards

Index

A data structure that makes it efficient to find those tuples as a binary search tree of (key, value) pairs, in which a key a (one of the values that attribute A may have) is associated with a "value" that is the set of locations of the tuples that have a in the component for attribute A

2
New cards

Index key

Attributes of the index when a distinction needs to be made. Can be any attribute or set of attributes, and need not be the key for the relation on which the index is built

3
New cards

Technology of implementing indexes on large relations

Central importance in the implementation of DBMS's

4
New cards

B-tree

A generalization of a balanced binary tree

5
New cards

When relations are very large...

It becomes expensive to scan all the tuples of a relation to find those (perhaps very few) tuples that match a given condition

6
New cards

Example of a syntax for creating an index on a certain attribute for a certain relation

Index Name: YearIndex

Attribute: year

Relation: Movies

CREATE INDEX YearIndex ON Movies(year)

7
New cards

What does this term mean?

8
New cards

What exactly does this mean?

CREATE INDEX YearIndex ON Movies(year)

SQL queries that specify a year may be executed by the SQL query processor in such a way that only those tuples of Movies with the specified year are ever examined; there is a resulting decrease in the time needed to answer the query

9
New cards

What does a DBMS allow us to do often?

Build a single index on multiple attributes. This type of index takes values for several attributes and efficiently finds the tuples with the given values for these attributes

10
New cards

For this declaration of an index on these two attributes: CREATE INDEX KeyIndex ON Movies(title, year);

11
New cards
12
New cards

For this declaration of an index on these two attributes: CREATE INDEX KeyIndex ON Movies(title, year)…
When we are given a title and a year, what will the index do?

Since (title, year) is a key, it follows that when we are given a title and year, we know the index will find only one tuple, and that will be the desired tuple

13
New cards

What if the key for the multiattribute index is really the concatenation of the attributes in some order?

Then we can even use this index to find all the tuples with the given value in the first of the attributes. For instance, if we were more likely to specify a title than a year for a movie, then we would prefer to order the attributes as (title, year) and vice versa

14
New cards

How to delete an index?

Use its name in a statement like: DROP INDEX YearIndex

15
New cards

What is required of the database designer when choosing which indexes to create?

Analyzing a trade-off. This choice is one of the principle factors that influence whether a database design gives acceptable performance.

16
New cards

One important factor to consider when choosing an attribute for an index

The existence of an index on an attribute may speed up greatly the execution of those queries in which a value, or range of values, is specified for that attribute, and may speed up joins involving that attribute as well

17
New cards

The second important factor to consider when choosing an attribute

Every index built for one or more attributes of some relation makes insertions, deletions, and updates to that relation more complex and time-consuming

18
New cards

What's something we need to know to understand how to choose indexes for a database?

Where the time is spent answering a query

19
New cards

What is required to examine even one tuple of a relation that's normally distributed among many pages of a disk?

The whole page must be brought into main memory

20
New cards

How much time does it cost to examine all the tuples on a page than to examine only one?

Little more time

21
New cards

One reason why the most useful index we can put on a relation is an index on its key

Queries in which a value for the key is specified are common. Thus, an index on the key will get used frequently

22
New cards

A second reason why the most useful index we can put on a relation is an index on its key

Since there is at most one tuple with a given key value, the index returns either nothing or one location for a tuple. Thus, at most one page must be retrieved to get that tuple into main memory (although there may be other pages that need to be retrieved to use the index itself).

23
New cards

What happens when an index is not on a key? Does it improve time efficiency?

It may or may not be able to improve the time spent retrieving from disk the tuples needed to answer a query

24
New cards

One situation in which an index can be effective, even if it is not on a key

If the attribute is almost a key; that is, relatively few tuples have a given value for that attribute. Even if each of the tuples with a given value is on a different page, we shall not have to retrieve many pages from disk

25
New cards

Another situation in which an index can be effective, even if it is not on a key

If the tuples are "clustered" on that attribute. We cluster a relation on an attribute by grouping the tuples with a common value for that attribute onto as few pages as possible. Then, even if there are many tuples, we shall not have to retrieve nearly as many pages as there are tuples.

26
New cards

Suppose the only index we have on Movies is one on year, and we want to answer the query:

27
New cards
28
New cards

What indexes could we create from this?
SELECT * FROM Movies WHERE year = 1990;

Suppose the tuples of Movies are clustered on year. Then we could use the index on year = 1990. In this case, the year index will be of great help. In comparison, an index on the combination of title and year would be of little help, no matter with attribute or attributes we used to cluster Movies.

29
New cards

What should we keep in mind if modifications are the most frequent action?

We should be very conservative about creating indexes.

30
New cards

Each modification on a relation R...

forces us to change any index on one or more of the modified attributes of R.

31
New cards

What must we read and write when keeping in mind that each modification on a relation R forces us to change any index on one or more of the modified attributes of R?

We must read and write not only the pages of R that are modified, but also read and write certain pages that hold the index.

32
New cards

Even when modifications are the dominant form of database action...

it may be an efficiency gain to create an index on a frequently used attribute.

33
New cards

Since some modification commands involve querying the database (e.g., an INSERT with a select-from-where subquery or a DELETE with a condition)...

one must be very careful how one estimates the relative frequency of modifications and queries

34
New cards

What is the typical relation stored over?

Many disk blocks (pages)

35
New cards

Principal cost of a query or modification

Often the number of pages that need to be brought to main memory

36
New cards

What about indexes that can save us a lot of time

Indexes that let us find a tuple without examining the entire relation

37
New cards

What costs disk accesses?

Accessing and modifying the indexes themselves

38
New cards

What’s the costs of Modification?

Since it requires one disk access to read a page and another disk access to write the changed page, it's about twice as expensive as accessing the index or the data in a query

39
New cards

What do we need to do to calculate the new value of an index?

We need to make assumptions about which queries and modifications are most likely to be performed on the database

40
New cards

What assumption can we make if we have a history of queries that we can use to get good information?

That the future will be like the past

41
New cards

We may know that the database supports a particular application or applications, and…

We can see in the code for those applications all the SQL queries and modifications that they will ever do.

42
New cards

Common query and modification forms

Can have variables in place of constants, but should otherwise look like real SQL statements

43
New cards

"Tuning" a database

A process that includes not only index selection, but the choice of many different parameters

44
New cards

Examples of tuning issues

The amount of main memory to allocate to various processes and the rate at which backups and checkpoints are made (to facilitate recovery from a crash)

45
New cards

First step on how the index-selection portion of tuning advisors work

Establish the query workload. Since a DBMS normally logs all operations anyway, we may be able to examine the log and find a set of representative queries and database modifications for the database at hand. Or it is possible that we know, from the application programs that use the database, what the typical queries will be

46
New cards

Second step on how the index-selection portion of tuning advisors work

The designer may be offered the opportunity to specify some constraints, e.g., indexes that must, or must not, be chosen

47
New cards

Third step on how the index-selection portion of tuning advisors work

The tuning advisor generates a set of possible candidate indexes, and evaluates each one. Typical queries are given to the query optimizer of the DBMS. The query optimizer has the ability to estimate the running times of these queries under the assumption that one particular set of indexes is available

48
New cards

Fourth step on how the index-selection portion of tuning advisors work

The index set resulting in the lowest cost for the given workload is suggested to the designer, or it is automatically created

49
New cards

Benefit

Improvement in average execution time of the query mix

50
New cards

Step one for a greedy approach to choosing indexes

With no indexes selected, evaluate the benefit of each of the candidate indexes. If at least one provides positive benefit (i.e., it reduces the average execution time of queries), then choose that index

51
New cards

Step two for a greedy approach to choosing indexes

Reevaluate the benefit of each of the remaining candidate indexes, assuming that the previously selected index is also available. Again, choose the index that provides the greatest benefit, assuming that benefit is positive

52
New cards

Step three for a greedy approach to choosing indexes

Repeat the evaluation of candidate indexes under the assumption that all previously selected indexes are available. Pick the index with maximum benefit, until no more positive benefits can be obtained