1/51
Quiz 4
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Index
A data structure that makes it efficient to find those tuples as a binary search tree of (key, value) pairs, in which a key a (one of the values that attribute A may have) is associated with a "value" that is the set of locations of the tuples that have a in the component for attribute A
Index key
Attributes of the index when a distinction needs to be made. Can be any attribute or set of attributes, and need not be the key for the relation on which the index is built
Technology of implementing indexes on large relations
Central importance in the implementation of DBMS's
B-tree
A generalization of a balanced binary tree
When relations are very large...
It becomes expensive to scan all the tuples of a relation to find those (perhaps very few) tuples that match a given condition
Example of a syntax for creating an index on a certain attribute for a certain relation
Index Name: YearIndex
Attribute: year
Relation: Movies
CREATE INDEX YearIndex ON Movies(year)
What does this term mean?
What exactly does this mean?
CREATE INDEX YearIndex ON Movies(year)
SQL queries that specify a year may be executed by the SQL query processor in such a way that only those tuples of Movies with the specified year are ever examined; there is a resulting decrease in the time needed to answer the query
What does a DBMS allow us to do often?
Build a single index on multiple attributes. This type of index takes values for several attributes and efficiently finds the tuples with the given values for these attributes
For this declaration of an index on these two attributes: CREATE INDEX KeyIndex ON Movies(title, year);
For this declaration of an index on these two attributes: CREATE INDEX KeyIndex ON Movies(title, year)…
When we are given a title and a year, what will the index do?
Since (title, year) is a key, it follows that when we are given a title and year, we know the index will find only one tuple, and that will be the desired tuple
What if the key for the multiattribute index is really the concatenation of the attributes in some order?
Then we can even use this index to find all the tuples with the given value in the first of the attributes. For instance, if we were more likely to specify a title than a year for a movie, then we would prefer to order the attributes as (title, year) and vice versa
How to delete an index?
Use its name in a statement like: DROP INDEX YearIndex
What is required of the database designer when choosing which indexes to create?
Analyzing a trade-off. This choice is one of the principle factors that influence whether a database design gives acceptable performance.
One important factor to consider when choosing an attribute for an index
The existence of an index on an attribute may speed up greatly the execution of those queries in which a value, or range of values, is specified for that attribute, and may speed up joins involving that attribute as well
The second important factor to consider when choosing an attribute
Every index built for one or more attributes of some relation makes insertions, deletions, and updates to that relation more complex and time-consuming
What's something we need to know to understand how to choose indexes for a database?
Where the time is spent answering a query
What is required to examine even one tuple of a relation that's normally distributed among many pages of a disk?
The whole page must be brought into main memory
How much time does it cost to examine all the tuples on a page than to examine only one?
Little more time
One reason why the most useful index we can put on a relation is an index on its key
Queries in which a value for the key is specified are common. Thus, an index on the key will get used frequently
A second reason why the most useful index we can put on a relation is an index on its key
Since there is at most one tuple with a given key value, the index returns either nothing or one location for a tuple. Thus, at most one page must be retrieved to get that tuple into main memory (although there may be other pages that need to be retrieved to use the index itself).
What happens when an index is not on a key? Does it improve time efficiency?
It may or may not be able to improve the time spent retrieving from disk the tuples needed to answer a query
One situation in which an index can be effective, even if it is not on a key
If the attribute is almost a key; that is, relatively few tuples have a given value for that attribute. Even if each of the tuples with a given value is on a different page, we shall not have to retrieve many pages from disk
Another situation in which an index can be effective, even if it is not on a key
If the tuples are "clustered" on that attribute. We cluster a relation on an attribute by grouping the tuples with a common value for that attribute onto as few pages as possible. Then, even if there are many tuples, we shall not have to retrieve nearly as many pages as there are tuples.
Suppose the only index we have on Movies is one on year, and we want to answer the query:
What indexes could we create from this?
SELECT * FROM Movies WHERE year = 1990;
Suppose the tuples of Movies are clustered on year. Then we could use the index on year = 1990. In this case, the year index will be of great help. In comparison, an index on the combination of title and year would be of little help, no matter with attribute or attributes we used to cluster Movies.
What should we keep in mind if modifications are the most frequent action?
We should be very conservative about creating indexes.
Each modification on a relation R...
forces us to change any index on one or more of the modified attributes of R.
What must we read and write when keeping in mind that each modification on a relation R forces us to change any index on one or more of the modified attributes of R?
We must read and write not only the pages of R that are modified, but also read and write certain pages that hold the index.
Even when modifications are the dominant form of database action...
it may be an efficiency gain to create an index on a frequently used attribute.
Since some modification commands involve querying the database (e.g., an INSERT with a select-from-where subquery or a DELETE with a condition)...
one must be very careful how one estimates the relative frequency of modifications and queries
What is the typical relation stored over?
Many disk blocks (pages)
Principal cost of a query or modification
Often the number of pages that need to be brought to main memory
What about indexes that can save us a lot of time
Indexes that let us find a tuple without examining the entire relation
What costs disk accesses?
Accessing and modifying the indexes themselves
What’s the costs of Modification?
Since it requires one disk access to read a page and another disk access to write the changed page, it's about twice as expensive as accessing the index or the data in a query
What do we need to do to calculate the new value of an index?
We need to make assumptions about which queries and modifications are most likely to be performed on the database
What assumption can we make if we have a history of queries that we can use to get good information?
That the future will be like the past
We may know that the database supports a particular application or applications, and…
We can see in the code for those applications all the SQL queries and modifications that they will ever do.
Common query and modification forms
Can have variables in place of constants, but should otherwise look like real SQL statements
"Tuning" a database
A process that includes not only index selection, but the choice of many different parameters
Examples of tuning issues
The amount of main memory to allocate to various processes and the rate at which backups and checkpoints are made (to facilitate recovery from a crash)
First step on how the index-selection portion of tuning advisors work
Establish the query workload. Since a DBMS normally logs all operations anyway, we may be able to examine the log and find a set of representative queries and database modifications for the database at hand. Or it is possible that we know, from the application programs that use the database, what the typical queries will be
Second step on how the index-selection portion of tuning advisors work
The designer may be offered the opportunity to specify some constraints, e.g., indexes that must, or must not, be chosen
Third step on how the index-selection portion of tuning advisors work
The tuning advisor generates a set of possible candidate indexes, and evaluates each one. Typical queries are given to the query optimizer of the DBMS. The query optimizer has the ability to estimate the running times of these queries under the assumption that one particular set of indexes is available
Fourth step on how the index-selection portion of tuning advisors work
The index set resulting in the lowest cost for the given workload is suggested to the designer, or it is automatically created
Benefit
Improvement in average execution time of the query mix
Step one for a greedy approach to choosing indexes
With no indexes selected, evaluate the benefit of each of the candidate indexes. If at least one provides positive benefit (i.e., it reduces the average execution time of queries), then choose that index
Step two for a greedy approach to choosing indexes
Reevaluate the benefit of each of the remaining candidate indexes, assuming that the previously selected index is also available. Again, choose the index that provides the greatest benefit, assuming that benefit is positive
Step three for a greedy approach to choosing indexes
Repeat the evaluation of candidate indexes under the assumption that all previously selected indexes are available. Pick the index with maximum benefit, until no more positive benefits can be obtained