PY4E - Python for Everybody

Understanding Dictionaries

  • A dictionary is like a list but more general: list indexes must be integers, while dictionary indexes (keys) can be almost any type.

  • Keys and Values: A dictionary maps keys to values. Keys must be hashable (in practice, immutable) objects such as strings, numbers, or tuples; values can be of any type. Each association of a key with a value is a key-value pair.

  • Creating a Dictionary: Use the dict() function or curly brackets {}.

    • Example:

      • eng2sp = dict() creates an empty dictionary.

      • eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'} initializes with items.
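
A minimal sketch of both ways to create a dictionary, plus what happens with an unhashable key. The names eng2sp_literal, points, and error_message are illustrative and do not come from the notes.

```python
# Start with an empty dictionary and add a key-value pair by assignment.
eng2sp = dict()
eng2sp['one'] = 'uno'

# Or write the items out in a literal.
eng2sp_literal = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Hashable objects such as tuples work as keys...
points = {(0, 0): 'origin'}

# ...but a mutable list does not: Python raises TypeError.
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    error_message = str(err)   # the message mentions an unhashable type

print(eng2sp)
print(eng2sp_literal)
print(error_message)
```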

Properties of Dictionaries

  • Ordered Structure: As of Python 3.7, dictionaries maintain the order of key-value pairs as entered.

  • Value Lookup:

    • Access values using keys: eng2sp['two'] returns 'dos'.

    • Attempting to access a non-existing key raises a KeyError.

  • Common Operations:

    • len(eng2sp) returns the number of entries.

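    • key in dictionary tests whether key is one of the dictionary's keys (not its values): 'one' in eng2sp returns True.

    • To test whether something appears as a value, use dictionary.values(): 'uno' in eng2sp.values() returns True.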
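
The properties above can be checked in a short session. This is a sketch that reuses the eng2sp dictionary from the notes; the key 'four' and the variable missing_key are made up for illustration.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

print(eng2sp['two'])               # dos
print(len(eng2sp))                 # 3
print('one' in eng2sp)             # True: 'one' is a key
print('uno' in eng2sp)             # False: in looks at keys, not values
print('uno' in eng2sp.values())    # True

# Looking up a key that is not present raises KeyError.
try:
    eng2sp['four']
except KeyError as err:
    missing_key = err.args[0]
print('missing key:', missing_key)

# Iterating yields keys in insertion order (Python 3.7 and later).
print(list(eng2sp))                # ['one', 'two', 'three']
```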

Performance of Dictionaries

  • Efficient Search:

    • Lists: The in operator performs a linear search, so search time grows in proportion to the length of the list.

    • Dictionaries: Python uses a hash table, so looking up a key takes about the same time on average no matter how many items the dictionary holds.
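
One rough way to observe the difference is to time the same membership test against a list and a dictionary. This is a sketch only: the size n, the repeat count, and the variable names are arbitrary choices, and the measured times depend on the machine, so no expected output is shown.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)   # same items, stored as dictionary keys

target = n - 1   # last element: the slowest case for a linear search

# Repeat each membership test 100 times and record the total time.
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)

print(f"list: {list_time:.6f} s   dict: {dict_time:.6f} s")
```

On typical hardware the dictionary lookup should be far faster, and the gap should widen as n grows.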

Example: Counting Letters in a String

  • Implementing Frequency Counter: Use a dictionary to count occurrences of each letter in a string.

    • Initializing:

      • d = dict()

    • For each character, check existence and update count:

      if c not in d:
          d[c] = 1
      else:
          d[c] += 1
  • Result for the string 'brontosaurus': {'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
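
Put together, the counter can be sketched as a complete loop. The input word 'brontosaurus' is an assumption here, chosen because its letter counts match the result shown above.

```python
word = 'brontosaurus'   # assumed input

d = dict()
for c in word:
    if c not in d:
        d[c] = 1        # first time this letter is seen
    else:
        d[c] += 1       # letter seen before: increment its counter

print(d)
# {'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
```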

Using the get() Method

  • d.get(key, default) returns the value stored under key if it exists and default otherwise, so the counting loop no longer needs an explicit existence check:

    • d[c] = d.get(c, 0) + 1

  • This idiom enhances readability and is commonly used.
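
A short sketch of how get() behaves and how it shortens the letter counter. The sample dictionary counts and the input word are made up for illustration.

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

print(counts.get('jan', 0))   # 100: key present, stored value returned
print(counts.get('tim', 0))   # 0: key absent, default returned
print('tim' in counts)        # False: get() does not add the key

# The four-line if/else from the letter counter becomes one line.
word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c, 0) + 1

print(d)
```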

Dictionaries with Files

  • A common use of dictionaries is counting how often each word occurs in a text file, using two nested loops:

    • The outer loop reads the file line by line; the inner loop goes through the words of each line and updates their counts.

  • Code example:

    counts = dict()
    for line in fhand:
        words = line.split()
        for word in words:
            counts[word] = counts.get(word, 0) + 1
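
To try the nested loops without a file on disk, the sketch below uses io.StringIO as a stand-in for an opened file handle. The two lines of text are invented; with a real file, fhand would come from open() instead.

```python
import io

# Stand-in for a file handle; iterating over it yields one line at a time.
fhand = io.StringIO("the cat sat\nthe cat ran\n")

counts = dict()
for line in fhand:                 # outer loop: one line per iteration
    words = line.split()           # split the line on whitespace
    for word in words:             # inner loop: one word per iteration
        counts[word] = counts.get(word, 0) + 1

print(counts)   # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```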

Advanced Word Counting with Punctuation

  • Cleaning Input: Remove punctuation and normalize case using str.translate() and str.lower() to avoid miscounts.

  • Each line is cleaned before it is split into words (this requires import string):

    line = line.translate(str.maketrans('', '', string.punctuation)).lower()
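
A sketch of the cleaning step inside the counting loop. The sample lines and the name remove_punct are illustrative; building the translation table once, outside the loop, is a small design choice rather than something the notes prescribe.

```python
import string

# Made-up sample input with mixed case and punctuation.
lines = ["But, soft! What light?", "what LIGHT through yonder window breaks."]

# Translation table that deletes every character in string.punctuation.
remove_punct = str.maketrans('', '', string.punctuation)

counts = dict()
for line in lines:
    line = line.translate(remove_punct).lower()   # strip punctuation, lowercase
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1

print(counts)
```

Without the cleaning step, 'What' and 'what' (or 'soft!' and 'soft') would be counted as different words.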

Debugging Tips for Dictionaries

  • Scale Down: Test with smaller data sets to isolate errors.

  • Summaries: Print summaries instead of entire datasets to identify anomalies.

  • Self-checks: Implement sanity checks to validate outputs.
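
The summary and self-check tips might look like this for a word-count dictionary. The sample data, the variable total_words, and the particular checks are assumptions chosen for illustration.

```python
# Made-up result of counting the words in "the cat sat the cat ran".
counts = {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
total_words = 6   # number of words that were read

# Summary: print the size and the three largest counts, not the whole dictionary.
print(len(counts), 'distinct words')
top_three = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:3]
print(top_three)

# Self-checks: the counts must add up to the number of words read,
# and no counter should be zero or negative.
assert sum(counts.values()) == total_words
assert all(n > 0 for n in counts.values())
```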

Vocabulary

  • Dictionary: A mapping from keys to corresponding values.

  • Hashtable: The data structure used to implement Python dictionaries.

  • Histogram: A set of counters reflecting frequencies of items.

Exercises

  • Exercise 1: Read words from a file and store them in a dictionary as keys.

  • Exercise 2: Count messages by the day of the week from email logs.

  • Exercise 3: Build a histogram of email counts from different addresses.

  • Exercise 4: Determine who sent the most emails in the dataset.

  • Exercise 5: Modify email counting to aggregate by domain instead of addresses.
