MG

Data Structures and Trie

  • Main Operation: Search/lookup operation

    • Given a list, user gets a query and searches for it in the list.

  • Data Structures:

    • Two major groups:

      • Linear: Data arranged in a line (e.g., arrays)

      • Hierarchical: Data arranged in a way that some elements may be skipped (tree structures)

  • Example of a List: Contains words such as 'carefully', 'cabinet'.

    • For example, if we search for 'cat' in this list, we need to determine if it exists or not.

  • Storage Options:

    • Unsorted Array:

      • Worst-case time complexity for search is O(n), as it requires a linear search (sweeping from left to right).

    • Sorted Array:

      • After sorting, can perform binary search for better efficiency.

      • Worst-case time complexity for binary search is O(log n).

    • Binary Search Tree (BST):

      • If balanced, worst-case search time is O(log n). Not necessarily linear, depends on height of tree.

    • Heap Structure:

      • Searching in a heap has worst-case time complexity of O(n), as it lacks binary search properties and requires checking nodes.

  • Key Data Structure for Search:
    AVL Tree:

    • Offers O(log n) time complexity for search in a balanced tree.

    • Operations include inspection at each node, guiding traversal based on value comparisons.

  • Tree Basics:

    • A tree consists of parent and child nodes.

    • Binary Tree: A tree where each node has at most two children.

    • Binary Search Tree: A binary tree that fulfills the property where left children are lesser and right children are greater.

    • AVL Tree: A specialized binary search tree that is balanced.

  • Trie (Prefix Tree):

    • A multi-way tree designed primarily for storing strings. Utilizes shared prefixes to improve space and search efficiency.

    • Nodes do not hold actual character data, rather, they represent paths based on character traversal.

    • Structure: Each node can have up to 26 child nodes (one for each letter of the English alphabet).

    • Terminal Node: Nodes indicating the end of a word are marked.

  • Insertion in Trie:

    • Traverse each letter of the input string.

    • Create paths as needed, marking terminal nodes for the completion of words.

  • Searching in Trie:

    • Similar to insertion; traverse using the string's characters.

    • Check if the path exists for each character. If paths for every character exist and the last node is marked as terminal, the word exists.

    • If a path does not exist while traversing, the word is absent.

  • Reason for Using Trie:

    • Space efficient when dealing with multiple words that share prefixes, allowing for quick retrieval and reduced redundancy in storage.

  • Space Complexity: Considerable due to potentially having 26 pointers per node.

    • Requires a trade-off analysis against simpler structures like binary search trees due to each node's overhead.

  • Implementation:

    • Typically implemented in C with linked representations.

    • A node structure has pointers for connections, and perhaps a flag for indicating terminal nodes.

  • Complexity Conclusions:

    • Insert operation complexity is O(m), where m is the length of the word being inserted.

    • Search operation maintains a similar complexity, resulting in efficient string operations in comparison to arrays.

  • Final Note: Keep in mind the balance and structure of trees to assess efficiency in storage and retrieval for different use cases.