Data Structures Study Notes

Introduction to Data Structures

  • Binary Search Tree (BST): A specialized type of binary tree that organizes data in a hierarchical structure, allowing for efficient searching, insertion, and deletion operations. Each node has at most two children, and maintains specific ordering properties: all values in the left subtree are less than the node's value, and all values in the right subtree are greater. This ordering enables logarithmic time complexity for average-case operations.

  • Queues: A linear data structure that follows the First-In, First-Out (FIFO) principle, similar to a waiting line. Elements are added at one end (rear/tail) and removed from the other end (front/head).

  • Stacks: A linear data structure that follows the Last-In, First-Out (LIFO) principle, analogous to a stack of plates. Elements are added and removed from the same end, typically called the "top" of the stack.

Linear Data Structures
  • Linear Data Structures consist of collections of elements arranged in a sequential manner, forming a single level. This means each element has a unique predecessor (except the first) and a unique successor (except the last), defining a clear ordering. They are primarily used for sequential data access patterns.

    • Example Structures:

      • Arrays: Fixed-size, contiguous memory blocks allowing O(1)O(1) access to any element by index.

      • Linked Lists: Dynamic-size structures where elements (nodes) are linked via pointers, enabling efficient insertions and deletions at arbitrary positions but O(n)O(n) access time.

      • Stacks: LIFO principle, often implemented using arrays or linked lists.

      • Queues: FIFO principle, also often implemented using arrays or linked lists.

Queue Operations:
  • Operations in a queue include:

    • Enqueue (Insertion): Adds a new element to the rear (or tail) of the queue. If the queue is implemented with a fixed-size array, this operation might require checking for overflow.

    • Dequeue (Removal): Removes and returns the element from the front (or head) of the queue. This operation often involves advancing the head pointer. If the queue is empty, an underflow condition occurs.

    • Front/Peek: Returns the element at the front of the queue without removing it.

    • IsEmpty: Checks if the queue contains any elements.

    • IsFull: Checks if the queue has reached its maximum capacity (relevant for array-based implementations).

  • Searching in Queues: While not a primary operation that leverages queue-specific efficiency, it is common to search for an element's presence in a queue.

    • To search, one typically has to traverse through the queue elements from front to rear. This results in O(n)O(n) time complexity in the worst case, as every element might need to be checked. This is unlike stacks, where only the top element is directly accessible without disrupting the LIFO order.

Stack Operations:
  • Operations in a stack include:

    • Push (Insertion): Adds an element to the top of the stack. This operation increments the stack pointer and places the new element at the new top position.

    • Pop (Removal): Removes and returns the top element from the stack. This operation decrements the stack pointer. It adheres strictly to the Last In, First Out (LIFO) principle, meaning the last inserted element is the first one to be removed.

    • Peek/Top: Returns the element at the top of the stack without removing it.

    • IsEmpty: Checks if the stack contains any elements.

    • IsFull: Checks if the stack has reached its maximum capacity (relevant for array-based implementations).

  • Stack Behavior: Due to its Last In, First Out (LIFO) structure, you cannot directly check or access elements within the stack other than the top one without performing pop operations, which would alter the stack's state. Elements are accessed only at the "top."

Circular Queue Implementation
  • Next Operation: Moving the index in a circular manner is fundamental to efficiently manage queue operations within a fixed-size array without constantly shifting elements.

    • For an index i in an array of size n, the next position is calculated using the modulo operator:
      next(i)=(i+1)modn\text{next}(i) = (i + 1) \mod n
      This formula elegantly handles the "wrap-around" behavior: if i reaches n-1 (the last valid index), (n-1 + 1) \mod n = n \mod n = 0, bringing the index back to the beginning of the array. This allows the queue to utilize memory efficiently in a circular buffer fashion.

  • Efficiency: In terms of queue operations, both enqueue and dequeue are designed to run in constant time, O(1)O(1). This efficiency is achieved by merely updating pointers (head and tail) rather than physically moving data elements. Moving all indices down would lead to O(n)O(n) complexity, which circular queues specifically avoid.

Circular Array Implementation for Queues
  • Array-Based Implementation of queues utilizes a fixed-size array to store elements. Pointers (e.g., head for the front and tail for the rear) track the position of the first and last elements within this circular array.

  • Conditions for Queue Fullness:

    • A common condition for determining if a circular queue implemented with an array is full is when the next position where an element would be enqueued (i.e., next(tail)) is equal to the current head position. This indicates that adding another element would cause an overlap with the front of the queue, necessitating careful handling to distinguish a full queue from an empty one (e.g., using a count variable or leaving one slot empty).
      if  next(tail)=head\text{if} \; \text{next}(tail) = head

  • Conditions for Queue Emptiness: A queue is considered empty when head equals tail. When head and tail are at the same position, it signifies that there are no elements between them.

  • Handling Overflows and Underflows:

    • Overflow (Queue Full): When an enqueue operation is attempted on a full queue, options include:

      • Returning an error code or a boolean false.

      • Throwing an exception to signal the failure.

      • Resizing the queue dynamically by creating a new, larger array and copying existing elements, though this changes the operation's time complexity for that instance.

    • Underflow (Queue Empty): When a dequeue operation is attempted on an empty queue, sensible actions include:

      • Returning a special value (e.g., null or an indicator of no element).

      • Throwing an exception.

Inquiry and Dequeue Steps in Circular Queue
  1. Inquiry (Search Operation):

    • To inquire about the presence of an element, one must check positions from the head up to, but not including, the next(tail) in a circular fashion. This means iterating through the currently stored elements. Due to this traversal, an inquiry operation has a time complexity of O(k)O(k) where kk is the number of elements in the queue (worst case O(n)O(n) if the queue is full).

  2. Dequeue Operation:

    • This involves removing the element from the head position of the queue.

    • First, the element at array[head] is retrieved.

    • Then, the head index is advanced to the next position using the circular formula:
      head=next(head)\text{head} = \text{next}(head)

    • Finally, the dequeued element is returned. It's often good practice to "logically" remove the element or keep its memory address for potential future validation or auditing, although the physical data might remain in the array until overwritten.

Memory Management in Queue Implementations
  • When an element is dequeued, its data technically remains in the underlying array or memory location until it is overwritten by a new enqueue operation.

  • For sensitive data (e.g., passwords, personal identifiers), explicit steps to reset or overwrite the memory location with dummy values (e.g., zeros or random bits) might be necessary for security and privacy considerations, preventing potential information leakage or unauthorized access to stale data. However, for general data, this is often omitted for performance.

Time Complexity of Queue Operations
  • Time Complexity: Both the enqueue and dequeue operations in a standard circular queue (array-based or linked-list based) run in constant time, i.e., O(1)O(1).

    • This is because they only involve a few simple operations: updating a pointer (head or tail), reading/writing an element, and performing a modulo calculation. These steps do not depend on the number of elements currently in the queue, making them highly efficient.

    • This constant time efficiency holds true under normal conditions, primarily when the queue does not need to be resized (which would involve O(n)O(n) operations).

Tree Structures
Overview of Trees
  • Trees as Data Structures:

    • Trees are nonlinear data structures that represent hierarchical relationships between elements, called nodes. Unlike linear structures, nodes in a tree can have multiple children, creating a branching structure. They are composed of a set of nodes and a set of directed edges connecting them.

  • Key Node Types and Terminology:

    • Node: A fundamental unit of a tree containing data and links to its children.

    • Root: The topmost node in a tree. It is unique and serves as the starting point for any tree traversal. It has no parents (ancestors).

    • Parent: The direct predecessor of a node.

    • Child: The direct successor of a node.

    • Siblings: Nodes that share the same parent.

    • Leaf (External Node): A node that has no children. These are the "end" nodes of a tree's branches.

    • Internal Node: A node that has at least one child.

    • Edge: The link between a parent and a child node.

  • Path, Depth, and Height:

    • Path: A unique sequence of connected nodes from one node to another. A path from the root to a leaf represents a complete branch.

    • Depth of a Node: The length of the path from the root to that specific node. The root node has a depth of 00.

    • Height of a Node: The length of the longest path from that node to a leaf descendant. A leaf node has a height of 00.

    • Height of a Tree: The length of the longest path from the root to any leaf in the tree. This is equivalent to the height of the root node. It is measured as the number of edges.

Binary Trees
  • Definition of Binary Trees: A binary tree is a special type of tree where every node has at most two children. These children are strictly distinguished as the left child and the right child. This strict limit on children simplifies many tree algorithms.

  • Subtrees:

    • For any given node in a binary tree, its children (if they exist) are roots of new binary trees themselves. These are referred to as the left subtree (rooted at the left child) and the right subtree (rooted at the right child).

    • Each subtree recursively maintains the properties of a binary tree. This recursive definition is crucial for many tree-based algorithms.

Binary Search Trees (BST)
  • Definition: A Binary Search Tree (BST), also known as an ordered or sorted binary tree, is a binary tree that adheres to specific ordering rules for efficient data retrieval. For every node N in the tree:

    • All values in the left subtree of N are strictly less than the value of N.

    • All values in the right subtree of N are strictly greater than the value of N.

    • There are no duplicate values allowed (though variations exist for handling duplicates).

  • Core Operations and Their Efficiency:

    • Search Operation: To find a specific value, you start at the root.

      • If the target value is equal to the current node's value, the search is successful.

      • If the target value is less, you recursively search in the left subtree.

      • If the target value is greater, you recursively search in the right subtree.

      • This process eliminates half of the remaining tree at each step, leading to an average-case time complexity of O(logn)O(\log n). In the worst case (a skewed tree resembling a linked list), it degrades to O(n)O(n).

    • Insertion Operation: Similar to search, you traverse the tree to find the correct position to insert the new node, maintaining the BST property. New nodes are always inserted as leaf nodes. Average-case time complexity is O(logn)O(\log n).

    • Deletion Operation: More complex, as it requires re-arranging the tree to maintain its BST properties after a node is removed. There are cases for deleting a leaf node, a node with one child, and a node with two children. Average-case time complexity is O(logn)O(\log n).

Conclusion
  • Mastery of fundamental data structures such as arrays, linked lists, stacks, queues, and hierarchical structures like trees (with a focus on binary trees and Binary Search Trees) lays an essential foundation for efficient algorithm design, problem-solving, and robust software implementation.

  • Understanding the characteristics, operations, and time complexities of these structures is critical for optimizing performance and managing computational resources effectively.

  • Future discussions on advanced algorithms and data structures will invariably build upon these foundational concepts, requiring a deep appreciation for complexity management, structural integrity, and performance optimization in various computing contexts.