If you played with the fibonacci function you might have noticed that the bigger the argument you provide, the longer the function takes to run. Furthermore, the run time increases quickly.
To understand why, consider Figure 25.1, which shows the call graph for fibonacci with n=4:
A call graph shows a set of function frames, with lines connecting each frame to the frames of the functions it calls. At the top of the graph, fibonacci with n=4 calls fibonacci with n=3 and n=2. In turn, fibonacci with n=3 calls fibonacci with n=2 and n=1. And so on.
Count how many times fibonacci(0) and fibonacci(1) are called. This is an inefficient solution to the problem, and it gets worse as the argument gets bigger.
One solution is to keep track of values that have already been computed by storing them in a dictionary. A previously computed value that is stored for later use is called a memo. Here is a “memoized” version of fibonacci:
known is a dictionary that keeps track of the Fibonacci numbers we already know. It starts with two items: 0 maps to 0 and 1 maps to 1.
Whenever fibonacci is called, it checks known. If the result is already there, it can return immediately. Otherwise it has to compute the new value, add it to the dictionary, and return it.
If you run this version of fibonacci and compare it with the original, you will find that it is much faster.
In the previous example, known is created outside the function,
so it belongs to the special frame called __main__
.
Variables in __main__
are sometimes called global
because they can be accessed from any function. Unlike local
variables, which disappear when their function ends, global variables
persist from one function call to the next.
It is common to use global variables for flags; that is, boolean variables that indicate (“flag”) whether a condition is true. For example, some programs use a flag named verbose to control the level of detail in the output:
If you try to reassign a global variable, you might be surprised. The following example is supposed to keep track of whether the function has been called:
But if you run it you will see that the value of been_called
doesn’t change. The problem is that example2 creates a new local
variable named been_called
. The local variable goes away when
the function ends, and has no effect on the global variable.
To reassign a global variable inside a function you have to declare the global variable before you use it:
The global statement tells the interpreter
something like, “In this function, when I say been_called
, I
mean the global variable; don’t create a local one.”
Here’s an example that tries to update a global variable:
If you run it you get:
Python assumes that count is local, and under that assumption you are reading it before writing it. The solution, again, is to declare count global.
If a global variable refers to a mutable value, you can modify the value without declaring the variable:
So you can add, remove and replace elements of a global list or dictionary, but if you want to reassign the variable, you have to declare it:
Global variables can be useful, but if you have a lot of them, and you modify them frequently, they can make programs hard to debug.
As you work with bigger datasets it can become unwieldy to debug by printing and checking the output by hand. Here are some suggestions for debugging large datasets:
If possible, reduce the size of the dataset. For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.
If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.
Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.
A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.
Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “insane”.
Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.
Formatting debugging output can make it easier to spot an error. We saw an example in Section 10.1. Another tool you might find useful is the pprint module, which provides a pprint function that displays built-in types in a more human-readable format (pprint stands for “pretty print”).
Again, time you spend building scaffolding can reduce the time you spend debugging.
A relationship in which each element of one set corresponds to an element of another set.
A mapping from keys to their corresponding values.
The representation of the mapping from a key to a value.
In a dictionary, another name for a key-value pair.
An object that appears in a dictionary as the first part of a key-value pair.
An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word “value”.
A way of performing a computation.
The algorithm used to implement Python dictionaries.
A function used by a hashtable to compute the location for a key.
A type that has a hash function. Immutable types like integers, floats and strings are hashable; mutable types like lists and dictionaries are not.
A dictionary operation that takes a key and finds the corresponding value.
A dictionary operation that takes a value and finds one or more keys that map to it.
A statement that (deliberately) raises an exception.
A list (or other sequence) with a single element.
A diagram that shows every frame created during the execution of a program, with an arrow from each caller to each callee.
A computed value stored to avoid unnecessary future computation.
A variable defined outside a function. Global variables can be accessed from any function.
A statement that declares a variable name global.
A boolean variable used to indicate whether a condition is true.
A statement like global that tells the interpreter something about a variable.
Write a function that reads the words in words.txt and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.
If you did Exercise 21.10, you can compare the speed of this implementation with the list in operator and the bisection search.
Read the documentation of the dictionary method setdefault
and use it to write a more concise version of invert_dict
.
If you did Exercise 21.7, you already have
a function named has_duplicates
that takes a list
as a parameter and returns True if there is any object
that appears more than once in the list.
Use a dictionary to write a faster, simpler version of
has_duplicates
.
Two words are “rotate pairs” if you can rotate one of them
and get the other (see rotate_word
in Exercise 16.4).
Write a program that reads a wordlist and finds all the rotate pairs.
Here’s another Puzzler from Car Talk (http://www.cartalk.com/content/puzzlers):
This was sent in by a fellow named Dan O’Leary. He came upon a common one-syllable, five-letter word recently that has the following unique property. When you remove the first letter, the remaining letters form a homophone of the original word, that is a word that sounds exactly the same. Replace the first letter, that is, put it back and remove the second letter and the result is yet another homophone of the original word. And the question is, what’s the word?
Now I’m going to give you an example that doesn’t work. Let’s look at the five-letter word, ‘wrack.’ W-R-A-C-K, you know like to ‘wrack with pain.’ If I remove the first letter, I am left with a four-letter word, ’R-A-C-K.’ As in, ‘Holy cow, did you see the rack on that buck! It must have been a nine-pointer!’ It’s a perfect homophone. If you put the ‘w’ back, and remove the ‘r,’ instead, you’re left with the word, ‘wack,’ which is a real word, it’s just not a homophone of the other two words.
But there is, however, at least one word that Dan and we know of, which will yield two homophones if you remove either of the first two letters to make two, new four-letter words. The question is, what’s the word?
You can use the dictionary from Exercise 25.1 to check whether a string is in the word list.
To check whether two words are homophones, you can use the CMU
Pronouncing Dictionary. You can download it from
http://www.speech.cs.cmu.edu/cgi-bin/cmudict or from
http://thinkpython2.com/code/c06d and you can also download
http://thinkpython2.com/code/pronounce.py, which provides a function
named read_dictionary
that reads the pronouncing dictionary and
returns a Python dictionary that maps from each word to a string that
describes its primary pronunciation.
Write a program that lists all the words that solve the Puzzler.