Main Operation: Search/lookup operation
Given a list, user gets a query and searches for it in the list.
Data Structures:
Two major groups:
Linear: Data arranged in a line (e.g., arrays)
Hierarchical: Data arranged in a way that some elements may be skipped (tree structures)
Example of a List: Contains words such as 'carefully', 'cabinet'.
For example, if we search for 'cat' in this list, we need to determine if it exists or not.
Storage Options:
Unsorted Array:
Worst-case time complexity for search is O(n), as it requires a linear search (sweeping from left to right).
Sorted Array:
After sorting, can perform binary search for better efficiency.
Worst-case time complexity for binary search is O(log n).
Binary Search Tree (BST):
If balanced, worst-case search time is O(log n). Not necessarily linear, depends on height of tree.
Heap Structure:
Searching in a heap has worst-case time complexity of O(n), as it lacks binary search properties and requires checking nodes.
Key Data Structure for Search:
AVL Tree:
Offers O(log n) time complexity for search in a balanced tree.
Operations include inspection at each node, guiding traversal based on value comparisons.
Tree Basics:
A tree consists of parent and child nodes.
Binary Tree: A tree where each node has at most two children.
Binary Search Tree: A binary tree that fulfills the property where left children are lesser and right children are greater.
AVL Tree: A specialized binary search tree that is balanced.
Trie (Prefix Tree):
A multi-way tree designed primarily for storing strings. Utilizes shared prefixes to improve space and search efficiency.
Nodes do not hold actual character data, rather, they represent paths based on character traversal.
Structure: Each node can have up to 26 child nodes (one for each letter of the English alphabet).
Terminal Node: Nodes indicating the end of a word are marked.
Insertion in Trie:
Traverse each letter of the input string.
Create paths as needed, marking terminal nodes for the completion of words.
Searching in Trie:
Similar to insertion; traverse using the string's characters.
Check if the path exists for each character. If paths for every character exist and the last node is marked as terminal, the word exists.
If a path does not exist while traversing, the word is absent.
Reason for Using Trie:
Space efficient when dealing with multiple words that share prefixes, allowing for quick retrieval and reduced redundancy in storage.
Space Complexity: Considerable due to potentially having 26 pointers per node.
Requires a trade-off analysis against simpler structures like binary search trees due to each node's overhead.
Implementation:
Typically implemented in C with linked representations.
A node structure has pointers for connections, and perhaps a flag for indicating terminal nodes.
Complexity Conclusions:
Insert operation complexity is O(m), where m is the length of the word being inserted.
Search operation maintains a similar complexity, resulting in efficient string operations in comparison to arrays.
Final Note: Keep in mind the balance and structure of trees to assess efficiency in storage and retrieval for different use cases.