A dictionary is like a list, but more general: a list is indexed by integer positions, while a dictionary is indexed by keys.
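Keys and Values: A dictionary maps keys to values. Keys can be of almost any type, provided they are hashable (strings, numbers, and tuples of such values qualify; lists do not), and values can be of any type. The association between a key and its value is a key-value pair.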
Creating a Dictionary: Use the dict() function or curly brackets {}.
Example:
eng2sp = dict() creates an empty dictionary.
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'} initializes the dictionary with three items.
Ordered Structure: As of Python 3.7, dictionaries maintain the order of key-value pairs as entered.
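A minimal sketch of creation and insertion order, assuming Python 3.7 or later. The extra 'four': 'cuatro' entry and the names empty and keys_in_order are added here for illustration only.

```python
# Two ways to start: an empty dictionary, and a literal with items.
empty = dict()
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# A new key-value pair goes at the end of the insertion order.
eng2sp['four'] = 'cuatro'  # illustrative entry, not from the notes

# Iterating over a dictionary yields its keys in insertion order.
keys_in_order = list(eng2sp)
print(keys_in_order)  # ['one', 'two', 'three', 'four']
```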
Value Lookup:
Access values using keys: eng2sp['two'] returns 'dos'.
Accessing a key that does not exist raises a KeyError.
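The lookup behaviour can be tried as follows. This is a small sketch; the names found and missing, and the choice to catch the error rather than let it stop the program, are assumptions made for the example.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Looking up an existing key returns its value.
found = eng2sp['two']
print(found)  # dos

# Looking up a key that is not present raises KeyError.
try:
    eng2sp['four']
except KeyError as err:
    missing = err.args[0]  # the key that was not found
    print('no entry for', missing)
```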
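Key Functions:
len(eng2sp) returns the number of key-value pairs.
'key' in dictionary checks whether a key is present; the in operator tests keys, not values.
To check whether something appears as a value, use dictionary.values(), for example 'uno' in eng2sp.values().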
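A short sketch of these operations on the eng2sp dictionary from earlier. The variable names are illustrative.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

n = len(eng2sp)                        # number of key-value pairs
has_key = 'one' in eng2sp              # True: 'one' is a key
value_as_key = 'uno' in eng2sp         # False: 'uno' is a value, not a key
has_value = 'uno' in eng2sp.values()   # True: checks the values instead

print(n, has_key, value_as_key, has_value)  # 3 True False True
```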
Efficient Search:
Lists: The in operator performs a linear search, so search time grows in proportion to the length of the list.
Dictionaries: The in operator uses a hash table, so a lookup takes about the same time on average no matter how many items the dictionary holds.
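To make the difference concrete, here is a rough sketch that counts how many elements a linear search examines. The helper linear_search_steps and the generated words are hypothetical; the sketch models what a list search does conceptually and does not measure real running time.

```python
def linear_search_steps(items, target):
    """Return how many elements are examined until target is found
    (or the length of items if it is never found)."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

# 1000 made-up words, stored both as a list and as dictionary keys.
words = ['w' + str(i) for i in range(1000)]
index = {word: True for word in words}

steps_first = linear_search_steps(words, 'w0')    # found immediately
steps_last = linear_search_steps(words, 'w999')   # every element examined
found_in_dict = 'w999' in index                   # hash lookup, no scan

print(steps_first, steps_last, found_in_dict)  # 1 1000 True
```

The work done by the list search depends on where the target sits, while the dictionary lookup does not scan the other keys.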
Implementing Frequency Counter: Use a dictionary to count occurrences of each letter in a string.
Initializing:
d = dict()
For each character c in the string, check whether it is already a key and update its count:
if c not in d:
    d[c] = 1
else:
    d[c] += 1
Result example (for the string 'brontosaurus'): {'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
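Put together, the counter might look like the sketch below. The function name histogram is an assumption, and the input 'brontosaurus' is chosen because it reproduces the result shown above.

```python
def histogram(word):
    """Count how many times each character appears in word."""
    d = dict()
    for c in word:
        if c not in d:
            d[c] = 1      # first time this character is seen
        else:
            d[c] += 1     # seen before: increment its counter
    return d

counts = histogram('brontosaurus')
print(counts)
# {'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
```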
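get() Method: Simplifies counting loops by avoiding key existence checks. d.get(c, 0) returns the value stored for c, or the default 0 if c is not yet a key:
d[c] = d.get(c, 0) + 1
This idiom replaces the four-line if/else with a single statement and is commonly used.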
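The same counter written with get() could look like this. It is a sketch; the function name histogram and the sample word 'banana' are chosen for illustration.

```python
def histogram(word):
    """Count characters using dict.get with a default of 0."""
    d = dict()
    for c in word:
        d[c] = d.get(c, 0) + 1
    return d

counts = histogram('banana')
print(counts)  # {'b': 1, 'a': 3, 'n': 2}

# get() only reads: asking for a missing key does not add it.
z_count = counts.get('z', 0)
print(z_count, 'z' in counts)  # 0 False
```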
Reading files to count occurrences is a common task. Use nested loops: the outer loop iterates over the lines of the file, and the inner loop iterates over the words in each line, incrementing each word's count.
Code example (fhand is an open file handle):
counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1
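The nested loops can be tried end to end without a file on disk. In this sketch io.StringIO stands in for the file handle and the three sample lines are made up; a real program would obtain fhand from open().

```python
import io

# Stand-in for an open file handle (assumption: sample text invented here).
fhand = io.StringIO('the quick fox\nthe lazy dog\nthe end\n')

counts = dict()
for line in fhand:                  # outer loop: one line at a time
    words = line.split()            # split the line on whitespace
    for word in words:              # inner loop: one word at a time
        counts[word] = counts.get(word, 0) + 1

print(counts)
# {'the': 3, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```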
Cleaning Input: Remove punctuation and normalize case using str.translate() and str.lower() so that, for example, 'Hello,' and 'hello' are counted as the same word.
The complete code cleans each line before splitting it into words (this requires import string):
line = line.translate(str.maketrans('', '', string.punctuation)).lower()
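A sketch of the cleaned-up counting loop follows. As before, io.StringIO and the sample text are stand-ins for a real file, and building the translation table once before the loop is a choice made for this example.

```python
import io
import string

# Stand-in for an open file handle (assumption: sample text invented here).
fhand = io.StringIO('Hello, world!\nhello World.\n')

# Translation table that deletes every ASCII punctuation character.
table = str.maketrans('', '', string.punctuation)

counts = dict()
for line in fhand:
    line = line.translate(table).lower()   # strip punctuation, lowercase
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1

print(counts)  # {'hello': 2, 'world': 2}
```

Without the cleaning step, 'Hello,', 'hello', 'world!' and 'World.' would each get a separate counter.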
Scale Down: Test with smaller data sets to isolate errors.
Summaries: Print summaries instead of entire datasets to identify anomalies.
Self-checks: Implement sanity checks to validate outputs.
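These three habits can be combined in a small sketch. The function count_words, the tiny sample input, and the particular check (the counters must add up to the number of words read) are all assumptions chosen for illustration.

```python
def count_words(lines):
    """Count words across lines, with a built-in sanity check."""
    counts = dict()
    total = 0
    for line in lines:
        words = line.split()
        total += len(words)
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    # Self-check: the counters must add up to the number of words read.
    assert sum(counts.values()) == total, 'counts do not add up'
    return counts

# Scale down: a tiny input whose correct answer is easy to verify by hand.
sample = ['a b a', 'b a']
counts = count_words(sample)

# Summary: report the size and the most frequent entry, not everything.
top_word = max(counts, key=counts.get)
print(len(counts), 'distinct words; most frequent:', top_word, counts[top_word])
# 2 distinct words; most frequent: a 3
```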
Dictionary: A mapping from keys to corresponding values.
Hashtable: The algorithm (and underlying data structure) used to implement Python dictionaries.
Histogram: A set of counters reflecting frequencies of items.
Exercise 1: Read words from a file and store them in a dictionary as keys.
Exercise 2: Count messages by the day of the week from email logs.
Exercise 3: Build a histogram of email counts from different addresses.
Exercise 4: Determine who sent the most emails in the dataset.
Exercise 5: Modify email counting to aggregate by domain instead of addresses.