PY4E - Python for Everybody
Understanding Dictionaries
A dictionary is like a list, but more general: list indices must be integers, while dictionary keys can be of (almost) any type.
Keys and Values: A dictionary maps keys to values. Keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of these; values can be of any type. Each association between a key and its value is called a key-value pair.
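The key-type rule can be checked directly. This minimal sketch uses the curly-bracket syntax described below; the particular keys are arbitrary examples. Immutable keys work, while a list (which is mutable and unhashable) is rejected with a TypeError.

```python
# Strings, numbers, and tuples are hashable, so they can all be keys.
d = {'one': 1, 2: 'two', (3, 4): 'tuple key'}
print(d[(3, 4)])  # tuple key

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    d[[5, 6]] = 'list key'
except TypeError as err:
    print('TypeError:', err)

print(len(d))  # 3: the failed assignment added nothing
```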
Creating a Dictionary: Use the dict() function or curly brackets {}. Examples:
eng2sp = dict() creates an empty dictionary.
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'} creates a dictionary initialized with three items.
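As a small sketch of the two creation styles, the example below builds the same English-to-Spanish mapping item by item with square-bracket assignment, then compares it with the literal form. It also shows that empty curly brackets produce an empty dictionary.

```python
# Start with an empty dictionary and add items one at a time.
eng2sp = dict()
eng2sp['one'] = 'uno'
eng2sp['two'] = 'dos'
eng2sp['three'] = 'tres'

# The literal form builds the same mapping in one step.
literal = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
print(eng2sp == literal)  # True

# Empty curly brackets also create an empty dictionary.
empty = {}
print(type(empty))  # <class 'dict'>
```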
Properties of Dictionaries
Ordered Structure: As of Python 3.7, dictionaries maintain the order of key-value pairs as entered.
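A quick way to see insertion order (a sketch; the keys are arbitrary): the keys come back in the order they were added, not sorted, and assigning a new value to an existing key does not move it.

```python
d = {}
d['zebra'] = 1
d['apple'] = 2
d['mango'] = 3
print(list(d))  # ['zebra', 'apple', 'mango'] (insertion order, not alphabetical)

# Updating the value of an existing key keeps its original position.
d['apple'] = 20
print(list(d))  # ['zebra', 'apple', 'mango']
```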
Value Lookup:
Access values using keys: eng2sp['two'] returns 'dos'. Attempting to access a key that does not exist raises a KeyError.
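The sketch below shows a successful lookup and a failed one. Catching KeyError is one way to handle a missing key; the get() method covered later is another.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
print(eng2sp['two'])  # dos

# 'four' is not a key, so the lookup raises KeyError.
try:
    print(eng2sp['four'])
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'four'
```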
Key Functions:
len(eng2sp) returns the number of key-value pairs.
'one' in eng2sp checks whether 'one' is a key; the in operator checks keys, not values.
To check for a value, use the values() method: 'uno' in eng2sp.values().
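These operations side by side, as a minimal sketch. Note that 'uno' is found among the values but not among the keys.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

print(len(eng2sp))      # 3
print('one' in eng2sp)  # True: 'one' is a key
print('uno' in eng2sp)  # False: in checks keys, not values

vals = eng2sp.values()
print('uno' in vals)    # True: 'uno' is a value
```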
Performance of Dictionaries
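Efficient Search:
Lists: The in operator uses a linear search, so search time grows with the length of the list.
Dictionaries: The in operator uses a hash table, so a lookup takes roughly the same time (on average) no matter how many items the dictionary holds, which makes dictionaries far more efficient for large collections.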
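One rough way to observe the difference is to time the same membership test on a list and on a dictionary holding the same keys. This is only a sketch: the size, the repetition count, and the choice of the last element (the worst case for a linear search) are arbitrary, and absolute timings vary by machine, but the dictionary lookup should be dramatically faster.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)  # keys 0..n-1, every value None
target = n - 1                    # last element: worst case for linear search

# Time 100 membership tests on each structure.
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)
print(f'list: {list_time:.4f} s, dict: {dict_time:.6f} s')
```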
Example: Counting Letters in a String
Implementing a Frequency Counter: Use a dictionary to count how many times each letter appears in a string, with letters as keys and counts as values.
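Initializing: d = dict() creates the empty dictionary that will hold the counts.
For each character, add it as a key with count 1 the first time it appears; otherwise increment its count:
word = 'brontosaurus'  # the string that produces the result below
d = dict()
for c in word:
    if c not in d:
        d[c] = 1   # first occurrence of c
    else:
        d[c] += 1  # c seen before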
Result example:
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
Using the get() Method
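The get() method takes a key and a default value: it returns the key's value if the key is present and the default otherwise. This removes the need for an explicit existence check in counting loops: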
d[c] = d.get(c, 0) + 1
This idiom enhances readability and is commonly used.
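Rewritten with get(), the letter counter from the previous example shrinks to a one-line loop body. In this sketch it is wrapped in a function named histogram (the name is just an illustrative choice), and it produces the same result as the if/else version.

```python
def histogram(s):
    """Return a dict mapping each character of s to its number of occurrences."""
    d = dict()
    for c in s:
        d[c] = d.get(c, 0) + 1  # 0 is the default when c is not yet a key
    return d

print(histogram('brontosaurus'))
# {'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
```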
Dictionaries with Files
Counting how often each word appears in a file is a common task. It uses nested loops:
The outer loop iterates over the lines of the file, and the inner loop iterates over the words in each line, updating each word's count.
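Code example:
fname = input('Enter the file name: ')
counts = dict()
with open(fname) as fhand:
    for line in fhand:              # outer loop: one line at a time
        words = line.split()
        for word in words:          # inner loop: one word at a time
            counts[word] = counts.get(word, 0) + 1
print(counts)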
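To try the nested loops without a file on disk, the sketch below substitutes io.StringIO for the open file handle; iterating over it yields lines just like iterating over a file. The three-line sample text is made up for illustration.

```python
import io

# io.StringIO stands in for an open file: iterating over it yields lines.
fhand = io.StringIO('the cat sat\non the mat\nthe end\n')

counts = dict()
for line in fhand:               # outer loop: one line at a time
    words = line.split()
    for word in words:           # inner loop: one word at a time
        counts[word] = counts.get(word, 0) + 1

print(counts)
# {'the': 3, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1, 'end': 1}
```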
Advanced Word Counting with Punctuation
Cleaning Input: Without cleaning, "soft" and "soft!", or "Who" and "who", are counted as different words. Remove punctuation with str.translate() and normalize case with str.lower() before splitting each line into words (string.punctuation requires importing the string module):
import string
line = line.translate(str.maketrans('', '', string.punctuation)).lower()
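Putting cleaning and counting together, here is a minimal sketch. The function name count_words is a hypothetical choice, and io.StringIO again stands in for a real file, holding two lines of sample text. The translation table is built once, outside the loop, since it is the same for every line.

```python
import io
import string

def count_words(fhand):
    """Count words in a file-like object, ignoring punctuation and case."""
    # Table that maps every punctuation character to None (i.e., deletes it).
    table = str.maketrans('', '', string.punctuation)
    counts = dict()
    for line in fhand:
        line = line.translate(table).lower()
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

sample = io.StringIO('But, soft! what light through yonder window breaks?\n'
                     'It is the east, and Juliet is the sun.\n')
result = count_words(sample)
print(result)
```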
Debugging Tips for Dictionaries
Scale Down: Test with smaller data sets to isolate errors.
Summaries: Print summaries instead of entire datasets to identify anomalies.
Self-checks: Implement sanity checks to validate outputs.
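As an illustration of self-checks and summaries, the sketch below applies two consistency checks to a letter histogram and prints a one-line summary instead of the whole dictionary. The specific checks are examples of the idea, not a fixed recipe: every character should be counted exactly once, and there should be one key per distinct character.

```python
def histogram(s):
    d = dict()
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

word = 'brontosaurus'
counts = histogram(word)

# Sanity checks: the counts add up to the length of the input,
# and the number of keys equals the number of distinct characters.
assert sum(counts.values()) == len(word)
assert len(counts) == len(set(word))

# Summary instead of the full dictionary. On ties, max() returns
# the first maximal key in iteration (insertion) order.
most_common = max(counts, key=counts.get)
print(len(counts), 'distinct letters; most frequent:', most_common, counts[most_common])
```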
Vocabulary
Dictionary: A mapping from keys to corresponding values.
Hashtable: The algorithm used to implement Python dictionaries.
Histogram: A set of counters reflecting frequencies of items.
Exercises
Exercise 1: Read words from a file and store them in a dictionary as keys.
Exercise 2: Count messages by the day of the week from email logs.
Exercise 3: Build a histogram of email counts from different addresses.
Exercise 4: Determine who sent the most emails in the dataset.
Exercise 5: Modify email counting to aggregate by domain instead of addresses.