Theoretical Test Question Bank 2b - Google Docs
Document Overview
This document is a collaborative question bank for FIT1008/FIT2085 Theory Test A2b.
It outlines rules for contributing and maintaining the accuracy of the answers.
Rules for Contribution
No vandalism.
Answers should only be provided if the contributor is sure of their correctness.
All content must be in English.
Formatting and content errors should be edited.
Answers should be organized in bullet points where possible.
Do not delete anyone’s answer; extend answers/sources within the row itself.
Correct answers should be highlighted in green, those needing improvement in yellow, and incorrect answers in red.
Crowdfunded answers must have a traceable source.
Feedback should be provided in brackets after the answer.
Sources can include lecture slides, research papers, articles, or any external sources, clearly stated.
Expert-endorsed answers (TAs, PASS Leaders) should be marked with a star and the expert’s name and role.
Iterator
Definition and Functionality
An iterator is an object that allows traversal of a container without needing to know its underlying implementation.
Provides a unified way to access elements in a container.
Use Case Scenario
Iterators are helpful when iterating through a large sequence of items.
They are faster and more space-efficient compared to list-based approaches for large datasets.
Implementation in Python
The iterable class must override the
__iter__()method to return an associated Iterator object.The iterator object must override the
__iter__()method to return an object with a__next__()method (often itself).The
__next__()method must be overridden to return the next element and raise aStopIterationexception when the iterable is exhausted.The
__init__()method (constructor) must be defined to initialize variables.
Iterable
Definition
An iterable is an object that can be iterated through.
Implements the
__iter__()method, which returns an iterator.
Examples of Intrinsic Python Iterable Classes
Lists:
list()Tuples:
tuple()Dictionaries:
dict()Strings:
str()
Differences between Iterable and Iterator
Iterable: An object that can return an iterator.
Does not have the
__next__()method but can produce an iterator using the__iter__()method.Iterator: An object produced by calling the
__iter__()method of an iterable.Has both
__iter__()and__next__()methods, allowing traversal of elements one by one.The
list()function automatically callsnext()until aStopIterationerror is raised.
Making a Class Iterable in Python
Key Points
The iterable class must implement an
itermethod that returns an iterator, or implement a generator initer().Iterator objects must implement:
iter()method that returnsself.next()method that increments a marker, raisesStopIterationwhen done, and returns the object.init()method to initialize the marker.
Augmenting a Collection with an Iterator in Python
Additional Operators
To augment the collection, additional operators can be included:
Delete: Returns the item pointed to by the current marker and deletes the node.
Add: Adds an item right before the current marker.
Implementation Details
To create add and delete methods, it's necessary to maintain not only a current marker but also a previous marker.
Store the list, previous, and current markers, then create add and delete methods.
Iterator Signaling End of Collection in Python
Method
The iterator should raise a
StopIterationerror to inform Python to end the iteration through the collection.
Generator in Python
Definition
A generator is an iterable object created by a function containing a
yieldstatement.When called, it resumes from where it last left off, retaining the state and local variables of the function to yield subsequent values.
Key Differences from Regular Iterators
Unlike a regular iterator, a generator computes and stores only one value per
next()call.More memory-efficient than an iterator because it generates values on demand rather than storing them all in memory.
Using the yield Keyword to Create a Generator Function
Function Transformation
A function becomes a generator function when it includes the
yieldkeyword.A generator function returns a generator object that can yield multiple values one at a time as we iterate through it.
Resumption and State Retention
Each time the function is called, it resumes from where it last left off, retaining the state and local variables of the function to yield subsequent values.
Usefulness in Memory Efficiency
Generator functions are useful for handling large sets of data without consuming a large amount of memory.
Advantages of Using Generators over Lists or Other Iterable Objects in Python
Memory Efficiency
Generators produce values on the go and do not store the entire sequence in memory.
Ideal for large datasets or infinite sequences.
Lazy Evaluation
They compute values only as needed, enabling efficient processing without computing all values upfront.
Generates values one at a time as you loop over them.
Performance
Generators often outperform lists, especially for large datasets, by avoiding unnecessary computations.
Ease of Use
Simpler to write; it is just a regular function with a
yieldstatement.More concise and readable.
Statefulness
They maintain internal state between function calls, facilitating efficient processing without recomputing values.
Using a for Loop to Iterate Over an Iterator or Generator in Python
Example with Iterator
Given a class
NumberSerieswith__iter__and__next__methods defined:
NumberIterator = iter(NumberSeries) for number in NumberIterator: print(number) # Perform desired operationsExample with Generator
Define a function
next_number()that yields values:
def next_number(): for i in range(10): yield i generator_number = next_number() for number in generator_number: print(number) # Perform desired operationsPython recognizes the
yieldstatement, pauses at that statement, and returns a generator object.This generator object is iterable, so a
forloop can iterate through each element.
Python Generator Function Analysis
Functionality
The
mysteryfunction generates a sequence of integers from 0 ton-1.It skips the next two integers whenever the current integer is a multiple of 5.
Time Complexity
The time complexity is , where is the size of the input.
Even though some numbers are skipped, the complexity remains , since the amount of skipped numbers is proportional to .
Potential Use Cases
This can potentially be utilized to create integers for ID systems.
Alternate complexity analyses
Complexity is per run as we are taking the complexity between each call.
Linked Abstract Data Types (ADTs)
Complexity of Operations
Linked Stack:
Push:
Pop:
Linked Queue:
Append:
Serve:
Abstract Data Types (ADTs)
Summary
ADTs provide storage for a collection of data items and a set of operations to interact with the data, but do not provide any information on how storage is organized and how operations are implemented.
Example: A stack has operations like push, pop, is_empty, etc.
Advantages
Maintenance: Changes to the implementation of the ADT won’t affect the user’s code.
Reusability: Clear and standard way to define common data structures, making them easy to reuse across different projects.
Encapsulation and abstraction: Implementation details are hidden from the user, allowing them to use the data structure without understanding its internals.
Disadvantages
Efficiency: Access to the implementation of the ADT might allow good programmers to improve the efficiency performance.
Overhead in memory, which affects performance.
Stack-Based Calculator with Postfix Notation
Algorithm
Push numbers into the stack one by one.
When encountering an operator, pop two numbers from the stack and perform the calculation.
Example
Expression:
[1, 2, +, 3, 4, +, -]
Read 1. Push into the stack. Stack:
[1]Read 2. Push into the stack. Stack:
[1, 2]Read +. Pop 2 numbers from the stack. Calculation:
1 + 2 = 3. Push back into the stack. Stack:[3]Read 3. Push into the stack. Stack:
[3, 3]Read 4. Push into the stack. Stack:
[3, 3, 4]Read +. Pop 2 numbers from the stack. Calculation:
3 + 4 = 7. Push back into the stack. Stack:[3, 7]Read -. Pop 2 numbers from the stack. Calculation:
3 - 7 = -4. Push back into the stack. Stack:[-4]
Result: -4
The order of math is
{first popped item} - {second popped item}because of LIFO.
Linked ADTs vs. Array-Based Counterparts
Advantages of Linked ADTs
Dynamic resizing: Dynamic and efficient resizing.
LinkedLists can implement stacks with complexity for push and pop operations if the rear node is stored as an instance variable.
LinkedLists can implement queues with complexity for append and serve operations if the front and rear nodes are stored as an instance variable.
No shuffling required when an item is added/deleted.
Has unlimited size until there is no more memory left (hence no need to resize).
Disadvantages of Linked ADTs
Higher access complexity of elements: , as we would have to traverse through each node and the item relating to that node until we reach that index.
Utilizes more memory than an array given that the array is relatively full due to the overhead of storing the additional pointers required by the nodes.
Even if a linked list is sorted, we cannot utilize binary search as we don’t have random access to all elements.
Stack vs. Queue
Stack (LIFO)
Last-item in is the first-item taken out.
Appropriate to reverse the order of a certain list of items, e.g., function call stack, undo/redo.
Queue (FIFO)
First-item taken in is the first-item taken out.
More appropriate for tasks requiring sequential processing, e.g., orders at a restaurant.
Using a Stack to Check if a String of Parentheses Is Balanced
Algorithm
Iterate through each character in the string.
If you encounter an opening parenthesis (e.g.,
(,[,{), push it onto the stack.If you encounter a closing parenthesis (e.g.,
),],}):Check if the stack is not empty and if the top of the stack is the matching opening parenthesis.
If they match, pop the opening parenthesis from the stack.
If they don’t match or the stack is empty, the string is unbalanced.
After processing all characters, check if the stack is empty:
If the stack is empty, the string is balanced.
If the stack still contains unmatched opening parentheses, the string is unbalanced.
Balanced Example:
{[()]}Process
{: push onto the stack → Stack:[{]Process
[: push onto the stack → Stack:[{, []Process
(: push onto the stack → Stack:[{, [, (Process
): matches with(on top of stack → pop(→ Stack:[{, []Process
]: matches with[on top of stack → pop[→ Stack:[{]Process
}: matches with{on top of stack → pop{→ Stack:[]The stack is empty, so the string of parenthesis is balanced.
Unbalanced Example:
{[()]}Process
{: push onto the stack → Stack:[{]Process
[: push onto the stack → Stack:[{, []Process
(: push onto the stack → Stack:[{, [, (Process
]: mismatch with((top of the stack) → UnbalancedThe closing
]does not match the top of the stack(, so the string is unbalanced.
Time Complexity of Stack Operations in Linked Stack
Push(), Pop(), Peek(), and
__len__()The time complexity of
push()assuming that the rear node is stored as an instance variable for the Linked Stack, the best and worst-case complexity would be , as we would be simply connecting a Node to the rear of the Linked List that represents the stack.The complexity of the
pushoperation in an array-based stack is also .In an array-based stack of an unfixed length the complexity would be , as we would have to resize the list each time we add an element to the end of the list.
Likewise, the time complexity of
pop()assuming that the rear node is stored as an instance variable for the Linked Stack, would be , as we would simply remove the rear node and update it to be the previous element of the Linked List that represents the stack.In an array-based stack of an unfixed length this would also be O(1), if we utilise lazy deletion to mark the item as deleted.
The complexity of
peek()would also be , as we would simply get the item from the rear node.In an array-based stack of an unfixed length this would also be O(1), as we could simply store the index of the rearmost element as an instance variable (self.rear) and utilise that to return the rearmost element (self.array[self.rear], where self.array stores the information in the stack).
If the length of the linked list is stored as an instance variable (self.length) and it is updated as elements are removed and appended from the list,
__len__()would simply return self.length, making the complexity of__len__()to be , in terms of complexity.The same approach of updating the length of the list that represents the stack during appending and popping can be utilised for an array-based stack, making the complexity of the
__len__(), , as we would simply return self.length as the length of the stack.
Function Analysis
Functionality
The function recursively prints the reverse of the contents of the linked list
head.
Time Complexity
The best case would be , assuming that the input is None and the head node is None.
The worst case would be , where is the length of the linked list head, as there are recursive calls throughout this.
Mystery Function Analysis
Functionality
A node from a linked list is inputted.
A stack is used to store the elements in the linked list.
The elements of the popped values of the stack are compared to the values of the linked list.
This checks if the original order of values in the list is equal to the reverse of the order of values in the list.
Potential Applications
Check whether or not a string is a palindrome (by representing all chars in a linked list).
Time Complexity
Best and worst-case time complexity: , where is the amount of nodes until reaching the last element of the linked list to which the node
headbelongs.
Function Analysis - Finding the k-th Element from the End of a Linked List
Functionality
The function finds the k-th element from the end of the linked list where the node head is contained in.
Time Complexity
The best complexity is , when is less than the length of the list, and the pointer fast reaches the end before slow needs to traverse.
The best-case complexity and the worst-case complexity is where is the total number of nodes within the linked list where the node head is in.
Dictionary ADT
Main Property
A container that stores objects that are uniquely identified by a label or key.
Key Operations
Search: Retrieve the value of the desired key
Insert: Add new key-value pair into the collection
Delete: Remove the pair of the desired key
Update: Update value of the existing key
Applications
Databases use dictionaries to map unique keys to their associated records, enabling fast lookups.
Hash Function
Purpose
To map a key to a potential position in the hash table.
Properties
Should have a fixed-size output no matter what is inputted.
Should attempt to minimize collisions (where multiple keys are mapped to the same index).
Should map keys evenly across the hash table by taking into account the individual characteristics of the key.
The output should be between
[0, table size-1], where table size is the size of the Hash Table.Should be efficient in terms of time while not compromising on the suggested values above; ideally .
It is type-dependent; for example, a function that hashes strings will not work on an integer.
Defining a Hash Function
Create a mathematical algorithm/function that returns an index where a key along with its accompanying value could potentially exist.
Use a modular expression to ensure that the output of the potential index is always between .
Examples
Bad Hash Function (for strings):
python def hash(key): return self.key(ord[char[0]]) % (table_size - 1)Good Hash Function (for strings):
python def hash(key): prime_one = 31 prime_two = 97 a = prime_one value = 0 for char in key: value = (ord(char) + a * value) % (table_size - 1) a = (a * prime_two) % (table_size - 1) return value
Good vs. Bad Hash Functions Targeting String-Typed Keys
Good Hash Function
def good_hash(key):
value = 0
a = 37
for char in key:
value = (ord(char) + a * value) % table_size
a = a * HASH_BASE % (table_size - 1)
return value
Good Distribution: Spreads out hash values for different strings.
Order Sensitive: Produces different hash values for strings with different character positions (e.g.,
abcandcab).Minimizes Collisions: Reduces the likelihood of collisions through prime constants and multiplication strategy.
Bad Hash Function
def bad_hash(s):
return sum(ord(char) for char in s)
High Collision: Does not account for the position of each character, leading to many collisions (e.g.,
abcandcabhave the same hash value).Limited Hash Value Range: Constrained to the sum of ASCII values, resulting in a small number of unique hash values.
Collision Resolution
Definition of Collision
A collision occurs when the hash of a certain key is the same as the hash of another key.
Resolution Techniques
Open Addressing (Probing): Find a suitable alternate position for the new key by probing throughout the list.
Closed Addressing (Separate Chaining): Resolve conflicts by using a linked list to represent colliding keys.
Separate Chaining
Description
Initially, each cell of the hash table refers to a linked list.
If a key is hashed to a particular index , then we push it to a list data structure (commonly a linked list) in the index .
Collisions vs conflicts
Collisions
Collisions occur when multiple keys map to the same index (i.e. hash(key1) = hash(key2) between any pair of keys).
-While, conflicts occurConflicts occur when hash(key2) = hash(key1) when key1 is already in the array table and we are attempting to insert key1 key2. Whilst, conflicts occur when the hash position for the key has already been occupied by another key
Why It Is Used
Utilized to prevent conflicts caused by collisions within hash tables.
Implementation
If a key is hashed to a particular index , then we push it to a list data structure (commonly a linked list) in the index .
Data Structures
Array-based data structures can be utilized to hold the elements at a particular index.
However, the preferred choice for the data structure is a linked list due to its ability to resize itself infinitely until the system running the algorithm runs out of memory.
Operations for Hash Table with Separate Chaining
Search
To search for the key, use the hash function to find the position where the linked list which contains the key should be.
Use linear search to check if a node contains the item; if it does, then the value corresponding to the key will be returned.
Best-Case Complexity: , assuming the key is at the start of the linked list, where hash is the complexity of the hash function or assuming there is only one element in the linked list.
Worst-Case Complexity: , where is the length of the linked list, and the key-value pair being searched is at the rear of the linked list.
Add
Use the hash function to determine which list the key-value pair should be located in.
Use linear search to check if a node contains the item. If it does, then update the corresponding value.
If a node containing the key-value pair is not found, then append the (key, item) tuple to the rear of the list.
Best-Case Complexity: , assuming the key is at the start of the linked list, where hash is the complexity of the hash function, or assuming there is only one element in the linked list.
Worst-Case Complexity: , where is the length of the linked list, and the key-value pair being searched is at the rear of the linked list.
Delete
To search for the item we need to delete, use the hash function to find the position where the linked list which contains the key should be.
Use linear search to check if a node contains the item. If the key-value pair is found, then the node containing it is deleted from the linked list.
Best-Case Complexity: , assuming the key x is at the start of the linked list, where hash is the complexity of the hash function or assuming there is only one element in the linked list.
Worst-Case Complexity: , where is the length of the linked list. Use , in the worst-case scenario, the item is in the rear of the linked list, assuming that linear search is being used to check whether or not the key is in the array.
Hash Tables: Linear Probing Load Factor
Assumptions
Collisions occur if , where between any pair of 2 keys.
Conflicts
Conflicts occur when during the insertion of , the position has already been taken by .
Linear Probing
We utilize a hash function to find a potential position for a key. If that position is taken, we go onto the next and so on and so forth until we find an empty position we can insert our
(key, value)pair into.
Conflict Resolution
It is a form of conflict resolution within a hash table to deal with logical errors when , if is already in the hash table.
Implementation
For linear probing to work in an implementation, one needs double the table size of the elements which are being inserted, or needs to appropriately resize the hash table prior to inserting an item.
Collision Handling
Upon a conflict during insertion caused by collisions, we will either update (same key being inserted) or utilize the available empty space next to the hash position for the key to store the new item.
How Linear Probing Handles Conflicts Using the Complexity
Basic idea
When collision occurs, linear probing searches the next available empty slot by increasing the hash value by 1 continuously. If it reaches the end of the table, it wraps around to the beginning.
Implementation
(Assume that the new element does not exist in the hash table)
a. Hash function determines the initial index i of the key
b. If the slot at i is occupied, the next slot, i+1 is checked and this process continues until an empty slot is found.
-c. Once an empty slot is found, the new element is inserted at that position.
-d. Searching and deletion involves a similar process, which is probing through the table until the key is found.
Addition to Q3: A
Operations for Hash Table with Linear Probing
To search for the element mapped to a key:
It is needed to compute where the hash function maps the input key to. This checks if the key is there. If it is not there, we will continue to linearly probe the entire table with a step size of one until the item is found. If we reach the end of the hash table, but our search is not complete, we will wrap around in the hash table array and start searching from the beginning.
-Once the item is found, the value corresponding to the key will be returned.
Complexity
The complexity of search in the best-case scenario is O(hash), where hash is the complexity of the hash function assuming that the position found via the hash function is empty.
The worst case-complexity scenario of search is O(hash + n), where n is the table size of the hash table, if the entire hash table is searched. To add a key-value pair: -We will start linear probing from the hash position of the inputted key If the hash position is empty then we will add the key-value pair to it. If the key is found during search at a position, we will take that position and update its value. Else, we can take the first eligible position (empty) after the hash position of the inputted key and insert our
(key, value)pair there. Complexity -The complexity of add in the best-scenario is O(hash), where hash is the complexity mentioned above, and the position the hash function returns is empty/contains the same key we are attempting to update. -In the worst-case scenario the complexity should be O(hash+n), where n is the size of the table if the entire hash table is almost full and there is only one slot free. To delete a particular key value pair functionFirstly, compute where the hash function maps the input key to. We check if the key is there. If it is not there, we will continue to linearly probe around the entire table with a step size of one until the item is found. If we reach the end of the hash table, but our search is not complete, we will wrap around the hash table array and start searching from the beginning. Once the key to delete is found, it can be deleted through lazy deletion.
Complexity
-The complexity of delete should be O(hash) in the best scenario, where O(hash) is the complexity mentioned above and the position that the hash function returns is where the deleted item is located at.
-In the worst-case scenario, we would have O(hash +n) complexity if the entire hash table is searched, where n is the size of the table as mentioned above.
Linear Probing vs. Separate Chaining
Linear Probing
Utilizes probing throughout the existing hash table to find an appropriate position.
Separate Chaining
Utilizes a data structure to store the key-value pairs mapped to a particular position. This data structure can be referred to as a chain and is commonly a linked list.
Advantage of Linear Probing
When the hash table is relatively full, it takes less memory relative to separate chaining as it does not contain the memory overhead from the nodes of the linked list.
Disadvantage of Linear Probing
To add more elements, one needs to resize the hash table once it is full.
Advantage of Separate Chaining
Can handle an unlimited amount of collisions without the hash table needing to be resized as the linked list which holds all elements mapped to a particular position grows dynamically as we can simply add/delete nodes make it grow.
Another Disadvantage of Linear Probing
The creation of primary clusters slows down operations undermining the O(1) promise.
However, if clustering is minimal, the operations complexity is very close to O(1).
Collisions & Conflicts
Collisions:
Definition: Occur when the hash(key1) = hash(key2) for a particular hash function.
Conflicts:
Definition: Occur when hash(key2) = hash(key1) when we are trying to insert key2 in the hash table, but the same position is occupied by key1 (a previously inserted key).
Handling Techniques:
Linear Probing
We can sequentially probe through the hash table with a step size of 1 until an appropriate position for insertion is found.
An appropriate position is a position that has either been marked as empty (or as a sentinel if lazy deletion has been utilized within the hash table).
Separate Chaining
We will create an ADT each time we insert an element to a particular position that was previously empty.
Load factor
Load factor = number of elements in list/table size.
Affect the performance of below collision handling techniques
Linear Probing. -The higher the load factor is, the more keys would be mapped to the same index (hash collisions conflicts) as there will be fewer positions left to fill out the table relative to the amount of elements being added.
This will lead to clustering and will impact the average performance of the search, add, and delete functions
Separate Chaining.collisions conflicts would rise as there will be fewer positions to fill out the table. This may increase the average search time of the algorithm as the amount of comparisons have to make in an individual chain rise.
Clustering
clustering described as a phenomenon which occurs when multiple elements in a hash table are next to each other in a contiguous block.
This impacts the performance of the search, add, and delete functions in the hashtable as the higher the clustering of items is, the longer it can take to probe through the hash table to find a key-value pair
However, a similar phenomenon occurs when some chains which hold key-value pairs grow longer than others as more elements are added to the hash-table, and then it impacts the average performance of the search, add, and delete functions in the hash table.
Insertion Example - with Linear Probing and Hash Value
string Hash value
hello 104%13 = 0
world 119 % 13 = 2
python 112 %13 = 8
hash 104%13 = 0
table 116%13 = 12
programming 112 % 13 = 8
-Explanationhello is inserted at position 0 as it is free.world is inserted at position 2 as it is free.python is inserted at position 8 as it is free.hash is inserted at position 1, as position 0 is taken by helloprogramming cannot be sorted at position 8 as it is taken, so we go to position 9. Position 9 is free so programming goes at position 9
Array:
Array: ['watermelon', None, 'apple', 'grape', 'pear', 'peach', 'orange', 'kiwi', None, 'banana', None]
KEY HASH ACTUAL POSITION PROBE CHAIN
watermelon 110%11 = 0