Think Python for CS114

Chapter 25 Memoization

If you played with the fibonacci function you might have noticed that the bigger the argument you provide, the longer the function takes to run. Furthermore, the run time increases quickly.

To understand why, consider Figure 25.1, which shows the call graph for fibonacci with n=4:

Call graph.

Figure 25.1: Call graph.

A call graph shows a set of function frames, with lines connecting each frame to the frames of the functions it calls. At the top of the graph, fibonacci with n=4 calls fibonacci with n=3 and n=2. In turn, fibonacci with n=3 calls fibonacci with n=2 and n=1. And so on.

Count how many times fibonacci(0) and fibonacci(1) are called. This is an inefficient solution to the problem, and it gets worse as the argument gets bigger.

One solution is to keep track of values that have already been computed by storing them in a dictionary. A previously computed value that is stored for later use is called a memo. Here is a “memoized” version of fibonacci:

known = {0:0, 1:1}
def fibonacci(n: int) -> int:
    if n in known:
        return known[n]
    res = fibonacci(n-1) + fibonacci(n-2)
    known[n] = res
    return res

known is a dictionary that keeps track of the Fibonacci numbers we already know. It starts with two items: 0 maps to 0 and 1 maps to 1.

Whenever fibonacci is called, it checks known. If the result is already there, it can return immediately. Otherwise it has to compute the new value, add it to the dictionary, and return it.

If you run this version of fibonacci and compare it with the original, you will find that it is much faster.

25.1 Global variables

In the previous example, known is created outside the function, so it belongs to the special frame called __main__. Variables in __main__ are sometimes called global because they can be accessed from any function. Unlike local variables, which disappear when their function ends, global variables persist from one function call to the next.

It is common to use global variables for flags; that is, boolean variables that indicate (“flag”) whether a condition is true. For example, some programs use a flag named verbose to control the level of detail in the output:

verbose = True
def example1():
    if verbose:
        print('Running example1')

If you try to reassign a global variable, you might be surprised. The following example is supposed to keep track of whether the function has been called:

been_called = False
def example2():
    been_called = True         # WRONG

But if you run it you will see that the value of been_called doesn’t change. The problem is that example2 creates a new local variable named been_called. The local variable goes away when the function ends, and has no effect on the global variable.

To reassign a global variable inside a function you have to declare the global variable before you use it:

been_called = False
def example2():
    global been_called
    been_called = True

The global statement tells the interpreter something like, “In this function, when I say been_called, I mean the global variable; don’t create a local one.”

Here’s an example that tries to update a global variable:

count = 0
def example3():
    count = count + 1          # WRONG

If you run it you get:

UnboundLocalError: local variable 'count' referenced before assignment

Python assumes that count is local, and under that assumption you are reading it before writing it. The solution, again, is to declare count global.

def example3():
    global count
    count += 1

If a global variable refers to a mutable value, you can modify the value without declaring the variable:

known = {0:0, 1:1}
def example4():
    known[2] = 1

So you can add, remove and replace elements of a global list or dictionary, but if you want to reassign the variable, you have to declare it:

def example5():
    global known
    known = {}

Global variables can be useful, but if you have a lot of them, and you modify them frequently, they can make programs hard to debug.

25.2 Debugging

As you work with bigger datasets it can become unwieldy to debug by printing and checking the output by hand. Here are some suggestions for debugging large datasets:

Scale down the input:

If possible, reduce the size of the dataset. For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.

If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.

Check summaries and types:

Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.

Write self-checks:

Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “insane”.

Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.

Format the output:

Formatting debugging output can make it easier to spot an error. We saw an example in Section 10.1. Another tool you might find useful is the pprint module, which provides a pprint function that displays built-in types in a more human-readable format (pprint stands for “pretty print”).

Again, time you spend building scaffolding can reduce the time you spend debugging.

25.3 Glossary

mapping:

A relationship in which each element of one set corresponds to an element of another set.

dictionary:

A mapping from keys to their corresponding values.

key-value pair:

The representation of the mapping from a key to a value.

item:

In a dictionary, another name for a key-value pair.

key:

An object that appears in a dictionary as the first part of a key-value pair.

value:

An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word “value”.

implementation:

A way of performing a computation.

hashtable:

The algorithm used to implement Python dictionaries.

hash function:

A function used by a hashtable to compute the location for a key.

hashable:

A type that has a hash function. Immutable types like integers, floats and strings are hashable; mutable types like lists and dictionaries are not.

lookup:

A dictionary operation that takes a key and finds the corresponding value.

reverse lookup:

A dictionary operation that takes a value and finds one or more keys that map to it.

raise statement:

A statement that (deliberately) raises an exception.

singleton:

A list (or other sequence) with a single element.

call graph:

A diagram that shows every frame created during the execution of a program, with an arrow from each caller to each callee.

memo:

A computed value stored to avoid unnecessary future computation.

global variable:

A variable defined outside a function. Global variables can be accessed from any function.

global statement:

A statement that declares a variable name global.

flag:

A boolean variable used to indicate whether a condition is true.

declaration:

A statement like global that tells the interpreter something about a variable.

25.4 Exercises

Exercise 25.1.

Write a function that reads the words in words.txt and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.

If you did Exercise 21.10, you can compare the speed of this implementation with the list in operator and the bisection search.

Exercise 25.2.

Read the documentation of the dictionary method setdefault and use it to write a more concise version of invert_dict.

Exercise 25.3.

If you did Exercise 21.7, you already have a function named has_duplicates that takes a list as a parameter and returns True if there is any object that appears more than once in the list.

Use a dictionary to write a faster, simpler version of has_duplicates.

Exercise 25.4.

Two words are “rotate pairs” if you can rotate one of them and get the other (see rotate_word in Exercise 16.4).

Write a program that reads a wordlist and finds all the rotate pairs.

Exercise 25.5.

Here’s another Puzzler from Car Talk (http://www.cartalk.com/content/puzzlers):

This was sent in by a fellow named Dan O’Leary. He came upon a common one-syllable, five-letter word recently that has the following unique property. When you remove the first letter, the remaining letters form a homophone of the original word, that is a word that sounds exactly the same. Replace the first letter, that is, put it back and remove the second letter and the result is yet another homophone of the original word. And the question is, what’s the word?
Now I’m going to give you an example that doesn’t work. Let’s look at the five-letter word, ‘wrack.’ W-R-A-C-K, you know like to ‘wrack with pain.’ If I remove the first letter, I am left with a four-letter word, ’R-A-C-K.’ As in, ‘Holy cow, did you see the rack on that buck! It must have been a nine-pointer!’ It’s a perfect homophone. If you put the ‘w’ back, and remove the ‘r,’ instead, you’re left with the word, ‘wack,’ which is a real word, it’s just not a homophone of the other two words.
But there is, however, at least one word that Dan and we know of, which will yield two homophones if you remove either of the first two letters to make two, new four-letter words. The question is, what’s the word?

You can use the dictionary from Exercise 25.1 to check whether a string is in the word list.

To check whether two words are homophones, you can use the CMU Pronouncing Dictionary. You can download it from http://www.speech.cs.cmu.edu/cgi-bin/cmudict or from http://thinkpython2.com/code/c06d and you can also download http://thinkpython2.com/code/pronounce.py, which provides a function named read_dictionary that reads the pronouncing dictionary and returns a Python dictionary that maps from each word to a string that describes its primary pronunciation.

Write a program that lists all the words that solve the Puzzler.