1/63
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
ANALYZE
A PostgreSQL command used to collect and store statistics about table contents. These statistics are crucial for the query planner to determine efficient execution plans.
Atomicity (across multiple sites)
A property in distributed databases ensuring that a transaction across several sites either commits completely at all sites or aborts completely at all sites; no partial completion is allowed.
B+ Tree
A dynamic, tree-structured index type that adapts gracefully to inserts and deletes. It is widely recommended and used by default in PostgreSQL for its efficiency in search operations.
Blocks/Pages
Fundamental units of data storage within a Database Management System (DBMS). Data is stored in these blocks within files.
Commit Protocols
Mechanisms used in distributed database systems to ensure atomicity for transactions executing across multiple sites.
Database Management System (DBMS)
Software responsible for storing, retrieving, and managing data efficiently and reliably.
EXPLAIN
A PostgreSQL command that shows the estimated execution plan and cost of a query without actually executing it, based on available statistics.
Hash Function
A mathematical function used in hash indexes that takes a key and maps it to a corresponding hash index or location.
Hash Table for Indexing
An index type that uses a hash function to map keys to specific storage locations, efficient for equality comparisons.
Index
A data structure that maps search keys to sets of rows within a database table, primarily used to speed up searches and updates.
ISAM (Indexed Sequential Access Method)
A static, tree-structured index type.
NULL Values
A special marker in SQL indicating that a data value does not exist in the database. Its recommended to avoid indexes on columns with a high number of NULL values.
Partial Indexes
Indexes in PostgreSQL that only cover a subset of rows in a table, based on a specified condition.
Primary Key
A column or set of columns in a database table that uniquely identifies each row in that table. PostgreSQL automatically indexes these.
REINDEX
"A PostgreSQL command used to rebuild an index, replacing the old copy. This can be beneficial for performance if an index becomes corrupted or ""bloated."""
Search Keys
The values used to look up information within an index.
Storage Overhead
The additional disk space and resources required to store and maintain an index, which is a trade-off for faster search performance.
Tree-Structured Indexes
A type of index built as a multi-level tree, where nodes of the tree are blocks, used to efficiently navigate and locate data.
Two-Phase Commit (2PC) Protocol
"A widely used commit protocol that ensures atomicity in distributed transactions by coordinating ""prepare"" messages and requiring unanimous agreement from all participating sites before committing."
Unique Constraint
A rule that ensures all values in a column or a set of columns are unique. PostgreSQL automatically creates indexes for unique constraint fields.
WHERE clause
A clause in SQL queries used to specify conditions for filtering records. Columns frequently used in WHERE clauses are good candidates for indexing.
What is the primary purpose of a database index, and what is its main drawback?
The primary purpose of a database index is to map search keys to sets of rows within a database table, making searches and updates faster. Its main drawback is the associated storage overhead.
Explain the concept of how a DBMS stores data in relation to blocks and indexes.
A DBMS stores data in files, which are composed of pages or blocks. An index can be created to point to the start of each block, allowing for efficient searching by directing the system to a specific block where the desired data is located.
Differentiate between ISAM and B+ Tree indexes, highlighting their key characteristic.
ISAM (Indexed Sequential Access Method) is a static tree-structured index, meaning its structure is fixed. In contrast, a B+ Tree is a dynamic tree-structured index that adjusts gracefully under inserts and deletes, making it adaptable to changing data.
When would a hash index perform well, and when would it not be recommended compared to a B+ Tree index?
Hash indexes perform well for equality comparisons using operators like = or
List three scenarios when you should generally consider using an index.
You should generally consider using an index for large tables, primary and foreign keys, and fields that are frequently used in WHERE clauses.
Describe two situations when it's advisable to avoid using indexes.
Indexes should generally be avoided when using small tables, or when tables have frequent, large batch update or insert operations. Additionally, they should be avoided on columns with a high number of NULL values.
How does PostgreSQL automatically handle index creation, and what is its default index type?
PostgreSQL automatically creates indexes for primary keys and unique constraint fields. Its default index type is B+ Tree.
Explain the difference between using EXPLAIN and EXPLAIN ANALYZE in PostgreSQL regarding query optimization.
EXPLAIN shows the estimated execution plan and cost of a query based on available statistics without actually executing it. EXPLAIN ANALYZE executes the query and shows the actual run time along with the execution plan, providing more precise performance insights.
What is the purpose of a commit protocol in a distributed database system, and what specific problem does it address?
The purpose of a commit protocol is to ensure atomicity across multiple sites in a distributed database system. It addresses the problem of ensuring that a transaction is either committed at all participating sites or aborted at all sites, preventing partial commits.
According to the source material, what is the recommendation regarding using indexes on columns with a high number of NULL values?
The source material states that indexes should generally be avoided on columns that contain a high number of NULL values.
Atomicity
Ensures that a transaction is all-or-nothing—either all operations succeed or none do.
Consistency
Guarantees that a transaction transforms the database from one valid state to another, maintaining all rules and constraints.
Isolation
Ensures that concurrent transactions do not interfere with each other and produce the same results as if run sequentially.
Durability
Once a transaction is committed, its changes are permanent, even in the event of a system crash.
Aggregation
SQL operations that summarize data, such as calculating sums, averages, counts, minimums, or maximums.
AVG
An aggregation function that calculates the average value of a set of numbers.
CASE Expression
A conditional expression in SQL used to define different outcomes based on specified conditions, often used to categorize data.
COUNT
An aggregation function that counts the number of rows or non-NULL values in a specified column.
Cross Join
A join operation that returns the Cartesian product of two tables, combining every row from the first table with every row from the second table.
Full Join (Full Outer Join)
An outer join that includes all rows from both the left and right tables, with NULL values for non-matching columns.
GROUP BY Clause
A SQL clause used to group rows that have the same values in specified columns into summary rows, typically used with aggregate functions.
HAVING Clause
A SQL clause used to filter groups of rows based on conditions involving aggregate functions, applied after the GROUP BY clause.
Inner Join
The most common type of join that returns only the rows that have matching values in specified columns from both tables.
Joins
SQL operations that merge rows from two or more tables based on specified conditions, combining data from related tables.
Left Join (Left Outer Join)
An outer join that includes all rows from the first (left) table, and only the matching rows from the second (right) table. Non-matching columns from the right table will contain NULL values.
MAX
An aggregation function that returns the maximum value in a set of numbers.
MIN
An aggregation function that returns the minimum value in a set of numbers.
Multiple Joins
The process of linking several tables in a single SQL query based on their related columns.
Natural Join
A type of inner join where no join condition is explicitly specified; it joins tables based on common columns with identical names.
Outer Joins
A category of joins that includes both matching and non-matching rows from the involved tables.
Right Join (Right Outer Join)
An outer join that includes all rows from the second (right) table, and only the matching rows from the first (left) table. Non-matching columns from the left table will contain NULL values.
Self Join
A join operation where a table is joined with itself, typically used to find relationships within the same table.
SUM
An aggregation function that calculates the sum of values in a numeric column.
What is the fundamental purpose of SQL joins, and how do they achieve this purpose?
SQL joins are fundamental operations that merge rows from two tables based on specified conditions. They achieve this by combining related data from multiple tables into a single result set, enabling comprehensive data retrieval.
Explain the key difference between an Inner Join and an Outer Join. Provide one scenario where an Outer Join would be more appropriate than an Inner Join.
The key difference is that Inner Joins return only matching rows from both tables, while Outer Joins include non-matching rows as well. An Outer Join would be more appropriate when you need to see all records from one or both tables, even if there isn't a corresponding match in the other table, for example, listing all departments even if they have no employees.
Describe the function of a Left Join. What values appear in the columns from the right table if there are no matching rows?
A Left Join includes all rows from the first (left) table, even if there are no matching rows in the second (right) table. If there's no match in the right table for a row in the left table, the columns from the right table will show NULL values.
When would you use a Self Join, and what common SQL technique is typically employed to facilitate it?
You would use a Self Join when a relationship exists within the same table, such as finding managers for employees where both are listed in the same employee table. To facilitate this, the table is typically aliased (e.g., employee AS e1, employee AS m1) to distinguish between its different roles in the join.
What is the primary role of aggregation in SQL, and what are the five main aggregation functions mentioned?
The primary role of aggregation in SQL is to summarize data. The five main aggregation functions mentioned are SUM, MIN, MAX, AVG, and COUNT.
Distinguish between COUNT(*) and COUNT(DISTINCT category). When would you prefer one over the other?
COUNT() counts all rows that match a condition, regardless of whether there are duplicate values. COUNT(DISTINCT category), however, counts only the unique values within the specified category. You would prefer COUNT(DISTINCT category) when you need to know the number of unique items or types, whereas COUNT() is used when you need the total number of records.
Explain the purpose of the GROUP BY clause. What is a crucial rule regarding the columns that can appear in the SELECT clause when GROUP BY is used?
The GROUP BY clause is used to group rows that have the same values in specified columns into a summary row, allowing aggregate functions to be applied to each group. A crucial rule is that when GROUP BY is used, the SELECT clause can only contain grouped attributes or aggregates.
Why was the HAVING clause introduced in SQL, and how does it differ from the WHERE clause in its application?
The HAVING clause was introduced in SQL because the WHERE clause cannot be used directly with aggregate functions. HAVING differs from WHERE in that it filters data on the group level after aggregation has occurred, whereas WHERE filters individual records before grouping.
Describe the general evaluation order for SQL queries involving FROM, WHERE, GROUP BY, and HAVING clauses.
The general evaluation order for SQL queries involving these clauses is: first, the FROM and WHERE clauses are computed to select and filter individual rows. Then, the rows are grouped by the attributes specified in GROUP BY. Finally, the HAVING clause is applied to filter these groups, and then the SELECT clause computes aggregates for the remaining groups.
How can the CASE expression be utilized with aggregation, and what benefit does this combination offer?
The CASE expression can be utilized within a query to categorize data based on conditions, and this newly categorized data can then be used with aggregation. This combination offers the benefit of allowing dynamic classification of data which can then be summarized, such as counting employees within specific salary levels that were defined by CASE.