Dictionaries (dict)


For one of your practice problems in this lesson, you will implement a function that counts how many times each word appears in a file. It’s not clear how you could use a list or set to solve this problem, since you need to be able to answer the question: “for a given word, how many times have we seen it?”

A list seems like it’s more on the right track, but unfortunately, its indices have to be numbers! There is no way to tell a list that a word should be an index.

Introduction to dict

The last data structure we are going to learn in this lesson is called a dictionary (written as dict in Python). A dict is a very powerful data structure since it acts, in some sense, as a more generalized list. Essentially, a dict is much like a list, but it lets you use values of almost any type as the index (for example a str or an int; mutable types such as list are not allowed), while a list only allows integers from 0 to len - 1 as valid indices.

To create a dict in Python, you use the syntax in the following snippet. Note that dict supports the square-bracket notation for accessing a value, but now the index does not have to be a number. In fact, dict uses a different term for the index to reduce confusion with lists: we call the “index” of an entry in a dict its key. We describe a dict as a collection of key/value pairs, where each value is accessed via its key.

d = {'a': 1, 'b': 17, 47: 'scurvy'}
print(d)
# This makes a dictionary with the following keys/values:
#   The key 'a' is associated to the value 1
#   The key 'b' is associated to the value 17
#   The key  47 is associated to the value 'scurvy'

# You can get/set the value for a key using the square-bracket notation
print(d['b'])

d['dogs'] = 'cute'
print(d)

# If a key already exists in the dict, it will be overwritten if you set it
d['dogs'] = 'very adorable'
print(d)

The nice thing is you have a pretty solid understanding of how to use a dict already because you know how to use lists! The semantics of accessing/setting a value associated with a key are very similar to accessing/setting a value associated with an index in a list.

If you try to look up a key that is not in the dict, you will run into a KeyError, as shown in the following snippet. As a note, we also show how to make an empty dict with the syntax {} (just like an empty list is []).
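
To make the parallel concrete, here is a small sketch that performs the same read and update on a list and on a dict. The names scores_list and scores_dict and their contents are made up for this illustration.

```python
# Hypothetical example data, not part of the lesson
scores_list = [90, 85, 77]
scores_dict = {'alice': 90, 'bob': 85, 'carol': 77}

# Reading: a list uses an integer index, a dict uses a key
print(scores_list[1])      # 85
print(scores_dict['bob'])  # 85

# Updating uses the same square-bracket syntax in both cases
scores_list[1] = 88
scores_dict['bob'] = 88

# len works on both: number of elements vs. number of key/value pairs
print(len(scores_list))    # 3
print(len(scores_dict))    # 3
```

One difference worth keeping in mind: assigning to a key that is not yet in a dict adds a new key/value pair, while assigning to an index past the end of a list raises an IndexError.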

d = {}
d['dogs'] = 'very cute'
print(d['cats'])

To prevent this error, you can use the in keyword to see if a key is in a dict before trying to access it.

d = {}
d['dogs'] = 'very cute'
if 'cats' in d:
    print(d['cats'])
else:
    print('No cats!')
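
As an aside that goes slightly beyond this lesson, Python's dict also has a get method that returns a fallback value instead of raising a KeyError when the key is missing. A minimal sketch of the same check, using the same example dict:

```python
d = {}
d['dogs'] = 'very cute'

# get returns the value for the key if it is present,
# otherwise it returns the second argument
print(d.get('dogs', 'No dogs!'))  # very cute
print(d.get('cats', 'No cats!'))  # No cats!

# get only looks the key up; it does not add the missing key to the dict
print('cats' in d)                # False
```

The explicit in check is still the more general tool, since it lets you run entirely different code in each case.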

Example

Imagine we had a list of strings, and we wanted to find, for each starting letter, the sum of the lengths of the words that start with that letter. For example, with the list ['cats', 'dogs', 'deers'] we would report that the sum of the lengths of strings that start with 'c' is 4, while the sum of the lengths of strings that start with 'd' is 9. We will write a function called count_lengths to solve this problem. The function should take a list of words (all str), and we can assume none of them is the empty string.

This seems like a job for a dict where the keys are the first letters of the words, and the values are the sums of the lengths. Let’s try to write this function using only the things we have seen so far!

def count_lengths(words):
    counts = {}
    for word in words:
        first_letter = word[0]
        counts[first_letter] = counts[first_letter] + len(word)
    return counts

print(count_lengths(['cats', 'dogs', 'deers']))

We ran into an error! What happened?

It turns out we crashed on the first word in the list, 'cats'. We get the first letter 'c' and try to look up the value associated with the key 'c' in the dictionary when we evaluate counts[first_letter] + len(word). Remember though, if a key is not present, we get a KeyError, which is exactly what happened in this snippet.

To fix this, we need to introduce a common pattern when working with dicts. Whenever you are adding values to a dict, you commonly need to think about two cases:
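
The failing step can be reproduced in isolation. The sketch below uses try/except, which is not part of this lesson, only so that the error can be printed without stopping the program.

```python
counts = {}
try:
    # The right-hand side reads counts['c'] before anything has been stored there
    counts['c'] = counts['c'] + 4
except KeyError as err:
    print('KeyError for key:', err)  # KeyError for key: 'c'

# The assignment never happened, so the dict is still empty
print(counts)                        # {}
```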

  • This is the first time we have seen the key

  • We have seen the key before

Depending on which case you are in, you need to write different code, because in the first case the key is not yet present in the dict. We can easily fix this by introducing a check that uses in, but the pattern does look a bit odd at first. All of the added code is inside the loop, and is there to avoid getting this KeyError.

def count_lengths(words):
    counts = {}
    for word in words:
        first_letter = word[0]
        if first_letter in counts:
            counts[first_letter] = counts[first_letter] + len(word)
        else:
            counts[first_letter] = len(word)
    return counts

print(count_lengths(['cats', 'dogs', 'deers']))

To explain what this if-check is doing in English: if we haven’t seen the key before, we can just go ahead and set its value to the length of the current word. If we have seen it before, we take the current sum for this key and add on the length of the current word.
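
To see both cases in action, here is a sketch of the same function with one extra print that shows the dict after each word is processed. The name count_lengths_traced is made up for this illustration.

```python
def count_lengths_traced(words):
    # Same logic as count_lengths, plus a print after each word (illustration only)
    counts = {}
    for word in words:
        first_letter = word[0]
        if first_letter in counts:
            # We have seen this letter before: add to the running sum
            counts[first_letter] = counts[first_letter] + len(word)
        else:
            # First time we see this letter: start the sum at this word's length
            counts[first_letter] = len(word)
        print(word, counts)
    return counts

result = count_lengths_traced(['cats', 'dogs', 'deers'])
# cats {'c': 4}
# dogs {'c': 4, 'd': 4}
# deers {'c': 4, 'd': 9}
```

The words 'cats' and 'dogs' each take the else branch because their first letter is new, while 'deers' takes the if branch and adds 5 to the existing sum for 'd'.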

Note

A very common bug when you’re first working with dicts is forgetting this type of pattern to handle the first time you see a key.