Module 6 shortened

0.0(0)

Studied by 0 people

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/63

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

64 Terms

New cards

ANALYZE

A PostgreSQL command used to collect and store statistics about table contents. These statistics are crucial for the query planner to determine efficient execution plans.

New cards

Atomicity (across multiple sites)

A property in distributed databases ensuring that a transaction across several sites either commits completely at all sites or aborts completely at all sites; no partial completion is allowed.

New cards

B+ Tree

A dynamic, tree-structured index type that adapts gracefully to inserts and deletes. It is widely recommended and used by default in PostgreSQL for its efficiency in search operations.

New cards

Blocks/Pages

Fundamental units of data storage within a Database Management System (DBMS). Data is stored in these blocks within files.

New cards

Commit Protocols

Mechanisms used in distributed database systems to ensure atomicity for transactions executing across multiple sites.

New cards

Database Management System (DBMS)

Software responsible for storing, retrieving, and managing data efficiently and reliably.

New cards

EXPLAIN

A PostgreSQL command that shows the estimated execution plan and cost of a query without actually executing it, based on available statistics.

New cards

Hash Function

A mathematical function used in hash indexes that takes a key and maps it to a corresponding hash index or location.

New cards

Hash Table for Indexing

An index type that uses a hash function to map keys to specific storage locations, efficient for equality comparisons.

New cards

Index

A data structure that maps search keys to sets of rows within a database table, primarily used to speed up searches and updates.

New cards

ISAM (Indexed Sequential Access Method)

A static, tree-structured index type.

New cards

NULL Values

A special marker in SQL indicating that a data value does not exist in the database. Its recommended to avoid indexes on columns with a high number of NULL values.

New cards

Partial Indexes

Indexes in PostgreSQL that only cover a subset of rows in a table, based on a specified condition.

New cards

Primary Key

A column or set of columns in a database table that uniquely identifies each row in that table. PostgreSQL automatically indexes these.

New cards

REINDEX

"A PostgreSQL command used to rebuild an index, replacing the old copy. This can be beneficial for performance if an index becomes corrupted or ""bloated."""

New cards

Search Keys

The values used to look up information within an index.

New cards

Storage Overhead

The additional disk space and resources required to store and maintain an index, which is a trade-off for faster search performance.

New cards

Tree-Structured Indexes

A type of index built as a multi-level tree, where nodes of the tree are blocks, used to efficiently navigate and locate data.

New cards

Two-Phase Commit (2PC) Protocol

"A widely used commit protocol that ensures atomicity in distributed transactions by coordinating ""prepare"" messages and requiring unanimous agreement from all participating sites before committing."

New cards

Unique Constraint

A rule that ensures all values in a column or a set of columns are unique. PostgreSQL automatically creates indexes for unique constraint fields.

New cards

WHERE clause

A clause in SQL queries used to specify conditions for filtering records. Columns frequently used in WHERE clauses are good candidates for indexing.

New cards

What is the primary purpose of a database index, and what is its main drawback?

The primary purpose of a database index is to map search keys to sets of rows within a database table, making searches and updates faster. Its main drawback is the associated storage overhead.

New cards

Explain the concept of how a DBMS stores data in relation to blocks and indexes.

A DBMS stores data in files, which are composed of pages or blocks. An index can be created to point to the start of each block, allowing for efficient searching by directing the system to a specific block where the desired data is located.

New cards

Differentiate between ISAM and B+ Tree indexes, highlighting their key characteristic.

ISAM (Indexed Sequential Access Method) is a static tree-structured index, meaning its structure is fixed. In contrast, a B+ Tree is a dynamic tree-structured index that adjusts gracefully under inserts and deletes, making it adaptable to changing data.

New cards

When would a hash index perform well, and when would it not be recommended compared to a B+ Tree index?

Hash indexes perform well for equality comparisons using operators like = or

New cards

List three scenarios when you should generally consider using an index.

You should generally consider using an index for large tables, primary and foreign keys, and fields that are frequently used in WHERE clauses.

New cards

Describe two situations when it's advisable to avoid using indexes.

Indexes should generally be avoided when using small tables, or when tables have frequent, large batch update or insert operations. Additionally, they should be avoided on columns with a high number of NULL values.

New cards

How does PostgreSQL automatically handle index creation, and what is its default index type?

PostgreSQL automatically creates indexes for primary keys and unique constraint fields. Its default index type is B+ Tree.

New cards

Explain the difference between using EXPLAIN and EXPLAIN ANALYZE in PostgreSQL regarding query optimization.

EXPLAIN shows the estimated execution plan and cost of a query based on available statistics without actually executing it. EXPLAIN ANALYZE executes the query and shows the actual run time along with the execution plan, providing more precise performance insights.

New cards

What is the purpose of a commit protocol in a distributed database system, and what specific problem does it address?

The purpose of a commit protocol is to ensure atomicity across multiple sites in a distributed database system. It addresses the problem of ensuring that a transaction is either committed at all participating sites or aborted at all sites, preventing partial commits.

New cards

According to the source material, what is the recommendation regarding using indexes on columns with a high number of NULL values?

The source material states that indexes should generally be avoided on columns that contain a high number of NULL values.

New cards

Atomicity

Ensures that a transaction is all-or-nothing—either all operations succeed or none do.

New cards

Consistency

Guarantees that a transaction transforms the database from one valid state to another, maintaining all rules and constraints.

New cards

Isolation

Ensures that concurrent transactions do not interfere with each other and produce the same results as if run sequentially.

New cards

Durability

Once a transaction is committed, its changes are permanent, even in the event of a system crash.

New cards

Aggregation

SQL operations that summarize data, such as calculating sums, averages, counts, minimums, or maximums.

New cards

AVG

An aggregation function that calculates the average value of a set of numbers.

New cards

CASE Expression

A conditional expression in SQL used to define different outcomes based on specified conditions, often used to categorize data.

New cards

COUNT

An aggregation function that counts the number of rows or non-NULL values in a specified column.

New cards

Cross Join

A join operation that returns the Cartesian product of two tables, combining every row from the first table with every row from the second table.

New cards

Full Join (Full Outer Join)

An outer join that includes all rows from both the left and right tables, with NULL values for non-matching columns.

New cards

GROUP BY Clause

A SQL clause used to group rows that have the same values in specified columns into summary rows, typically used with aggregate functions.

New cards

HAVING Clause

A SQL clause used to filter groups of rows based on conditions involving aggregate functions, applied after the GROUP BY clause.

New cards

Inner Join

The most common type of join that returns only the rows that have matching values in specified columns from both tables.

New cards

Joins

SQL operations that merge rows from two or more tables based on specified conditions, combining data from related tables.

New cards

Left Join (Left Outer Join)

An outer join that includes all rows from the first (left) table, and only the matching rows from the second (right) table. Non-matching columns from the right table will contain NULL values.

New cards

MAX

An aggregation function that returns the maximum value in a set of numbers.

New cards

MIN

An aggregation function that returns the minimum value in a set of numbers.

New cards

Multiple Joins

The process of linking several tables in a single SQL query based on their related columns.

New cards

Natural Join

A type of inner join where no join condition is explicitly specified; it joins tables based on common columns with identical names.

New cards

Outer Joins

A category of joins that includes both matching and non-matching rows from the involved tables.

New cards

Right Join (Right Outer Join)

An outer join that includes all rows from the second (right) table, and only the matching rows from the first (left) table. Non-matching columns from the left table will contain NULL values.

New cards

Self Join

A join operation where a table is joined with itself, typically used to find relationships within the same table.

New cards

SUM

An aggregation function that calculates the sum of values in a numeric column.

New cards

What is the fundamental purpose of SQL joins, and how do they achieve this purpose?

SQL joins are fundamental operations that merge rows from two tables based on specified conditions. They achieve this by combining related data from multiple tables into a single result set, enabling comprehensive data retrieval.

New cards

Explain the key difference between an Inner Join and an Outer Join. Provide one scenario where an Outer Join would be more appropriate than an Inner Join.

The key difference is that Inner Joins return only matching rows from both tables, while Outer Joins include non-matching rows as well. An Outer Join would be more appropriate when you need to see all records from one or both tables, even if there isn't a corresponding match in the other table, for example, listing all departments even if they have no employees.

New cards

Describe the function of a Left Join. What values appear in the columns from the right table if there are no matching rows?

A Left Join includes all rows from the first (left) table, even if there are no matching rows in the second (right) table. If there's no match in the right table for a row in the left table, the columns from the right table will show NULL values.

New cards

When would you use a Self Join, and what common SQL technique is typically employed to facilitate it?

You would use a Self Join when a relationship exists within the same table, such as finding managers for employees where both are listed in the same employee table. To facilitate this, the table is typically aliased (e.g., employee AS e1, employee AS m1) to distinguish between its different roles in the join.

New cards

What is the primary role of aggregation in SQL, and what are the five main aggregation functions mentioned?

The primary role of aggregation in SQL is to summarize data. The five main aggregation functions mentioned are SUM, MIN, MAX, AVG, and COUNT.

New cards

Distinguish between COUNT(*) and COUNT(DISTINCT category). When would you prefer one over the other?

COUNT() counts all rows that match a condition, regardless of whether there are duplicate values. COUNT(DISTINCT category), however, counts only the unique values within the specified category. You would prefer COUNT(DISTINCT category) when you need to know the number of unique items or types, whereas COUNT() is used when you need the total number of records.

New cards

Explain the purpose of the GROUP BY clause. What is a crucial rule regarding the columns that can appear in the SELECT clause when GROUP BY is used?

The GROUP BY clause is used to group rows that have the same values in specified columns into a summary row, allowing aggregate functions to be applied to each group. A crucial rule is that when GROUP BY is used, the SELECT clause can only contain grouped attributes or aggregates.

New cards

Why was the HAVING clause introduced in SQL, and how does it differ from the WHERE clause in its application?

The HAVING clause was introduced in SQL because the WHERE clause cannot be used directly with aggregate functions. HAVING differs from WHERE in that it filters data on the group level after aggregation has occurred, whereas WHERE filters individual records before grouping.

New cards

Describe the general evaluation order for SQL queries involving FROM, WHERE, GROUP BY, and HAVING clauses.

The general evaluation order for SQL queries involving these clauses is: first, the FROM and WHERE clauses are computed to select and filter individual rows. Then, the rows are grouped by the attributes specified in GROUP BY. Finally, the HAVING clause is applied to filter these groups, and then the SELECT clause computes aggregates for the remaining groups.

New cards

How can the CASE expression be utilized with aggregation, and what benefit does this combination offer?

The CASE expression can be utilized within a query to categorize data based on conditions, and this newly categorized data can then be used with aggregation. This combination offers the benefit of allowing dynamic classification of data which can then be summarized, such as counting employees within specific salary levels that were defined by CASE.