Binary Trees: A tree data structure with at most two children for each node, known as the left and right child.
Heaps: A specialized tree-based structure that satisfies the heap property.
A max-heap: The key of each node is greater than or equal to the keys of its children.
A min-heap: The key of each node is less than or equal to the keys of its children.
Sorting Techniques
Merge Sort: A divide and conquer algorithm that sorts the input array by recursively dividing it into halves and then merging the sorted halves.
Quick Sort: A highly efficient sorting algorithm that selects a 'pivot' and partitions the other elements into two sub-arrays according to whether they are less than or greater than the pivot.
Selection Sort: An in-place comparison sorting algorithm that divides the input list into two parts: a sorted and an unsorted. The smallest (or largest, depending on sorting order) element from the unsorted part is repeatedly moved to the sorted part.
Expected exam format: Given a sequence, demonstrate sorting technique progression.
Hashing Basics
Purpose of Hashing: Quick search in large scale data.
Example: Searching for movies featuring an actor.
Key Concept: The key is mapped to a numerical representation using hash functions, aiming for constant search time complexity.
Hash Function Definition: A transformation that generates a numerical representation of the key. Examples include:
$h(k) = a imes key + b$ (where $a$ and $b$ are constants)
$h(k) = key mod m$ (where $m$ is the size of the hash table)
Building Hash Tables
Structure: A hash table can be viewed as an array of buckets where data is stored based on hash values.
Insertion Process:
Apply hash function to calculate the index.
Insert the item into the calculated index.
Collision Issue: Two different keys resulting in the same index, leading to data loss unless resolved.
Collision Resolution Techniques
Separate Chaining:
Each bucket contains a linked list.
Allows storage of multiple items at the same index.
Example: If both actors have the same hash value, their data is stored in a linked list at that index.
Open Addressing:
No external lists for collisions; only use the original array.
Techniques include linear probing and double hashing (where a second hash function is used to resolve collisions).
Linear Probing: Incrementally checks for the next available slot if a collision occurs.
Double Hashing: Combines two hash functions to determine the next location if the first choice is full. Uses the formula:
h(k) = (h1(k) + i imes h2(k)) mod m where $i$ is the number of attempts.
Performance Considerations
Load Factor: The ratio of the number of entries to the total number of buckets. Aiming for a load factor less than 1 generally keeps operations efficient.
Cluster Phenomenon: A situation in open addressing where sequences of occupied buckets lead to increased search time.
Essential Performance Metric: Average number of probes taken to find or insert an item, influenced by load factor and clustering.
Practical Applications
Hash tables are commonly employed in databases for indexing large datasets due to their fast search capabilities.
Understanding the threshold of performance, specifically aiming for average-case constant time, is crucial for practical applications like databases, cache implementations, and other systems that require high speed and efficiency.
Study Tips for Exams
Familiarize with each sorting algorithm's logic and execution through examples.
Understand the application and formula of different hash functions and collision resolution techniques.
Be able to construct hash tables and perform operations like insertions and lookups by hand to solidify understanding in practical terms, especially for exam scenarios where coding is not necessary but logical progression is expected.