M08: More Lists
Hint: If possible, make your browser wide enough to place the slides and commentary side-by-side.
The slides on this page are not accessible (for screen readers, etc). The 1up and 3up slides are better.
Module Topics
The payroll is shown visually using two different diagramming notations. Both represent the same list. The payroll is a three element list. Each of those elements is a two-element list containing a name and a salary.
Refer back to tax-payable if you don’t remember it.
The payroll is shown visually using the two different diagramming notations. Both represent the same list. The payroll is a three element list. Each of those elements is a two-element list containing a name and a salary.
Recall the (listof X)
data definition:
;; A (listof X) is one of:
;; * empty
;; * (cons X (listof X))
A payroll is the same as a (listof X)
where X
is a two-element list that
we represent with (list Str Num)
. The
first element is a string and the second is a number. Because the data definition
has been made more specific, we give it a specific name: Payroll
.
Intuitively, an empty list is a Payroll. If you want a longer Payroll, cons the name of an employee and their salary onto a Payroll.
Note the new notation: (list Str Num)
. That is, list
can appear in data definitions
just like cons
and (sub1 n)
. It simply means a two-element list where the
first element is a string and the second is a number. In general, we use
(list X Y Z ...)
to indicate a fixed-length list with the given types.
We will often use fixed-length lists to group information that belongs together – like a name and salary. In M10 we’ll see another way to group information, structures.
Instead of writing this data definition we could write (listof (list Str Num))
in contracts. But that doesn’t help us derive the template (and therefore code that
consumes a Payroll) and a contract of Payroll -> TaxRoll
is more expressive
(easier to write and understand) than
(listof (list Str Num)) -> (listof (list Str Num))
Recap of (list ...)
vs. (listof ...)
:
- Use
(list ...)
for a fixed-length list. You know when you write the code exactly how many elements the list will have. Typically those items are all related to each other, like a person’s name and their salary. (list ...)
will have one type for each element in the list.- Use
(listof ...)
when the length of the list is unknown or changes as the program runs. The list might be empty or it might be very long or anything in between. (listof ...)
will have exactly one argument: the type that every element in the list will have. It might be a complicated type such as(list Str Num)
.(list ...)
and(listof ...)
are not interchangeable! One of them will be right and one will be wrong. (Well, it’s possible that they’re both wrong 😊.)
Recall that (first pr)
produces the first element of the payroll. That
first element is a two-element list with a name and salary. So (first (first pr))
produces the name while (second (first pr))
produces the salary. We could
also use (first (rest (first pr)))
.
This code is not very readable. A better
solution is to use well-named helper functions such as name
and salary
instead of first
and second
(sort of like structures!).
With a template in hand, we can now start to solve the actual problem of computing taxes for a payroll.
The slide shows the template after a number of steps in the design recipe:
- Writing the purpose
- Writing some examples (represented here by constants
test-payroll
andtest-taxes
; definitions aren’t shown) - Writing the header (including renaming
payroll-template
to something more problem-specific) and contract (note the use of the types we defined earlier,Payroll
andTaxRoll
) - Refining the purpose
The classic questions for a (listof X)
function apply here. Their answers direct how to fill in
the gaps in the template.
- What do you do in the base case?
There are no taxes owed if no one earned a salary. The contract says to produce a(listof X)
.empty
fits both observations. - How do you transform the first element on the list?
In this case, compute the taxes on the salary. Put that number together with the name into a two element list. - What value is produced by the recursive processing of the rest of the list?
A list of taxes owed by the employees on the rest of the list. - How do you combine the value produced in 2 with the value produced in 3?
Add the newly calculated taxes owed to the beginning of the list.
A really useful strategy is to break the problem into two parts:
- the part that handles the list and
- the part that handles one item from the list.
This makes clear the overall structure of the code is simply the (listof-X-template)
and allows one to focus on the distinct part of how one item is handled.
This is very similar to the “refined” (listof X) template discussed in M06.
What we’ve just seen – a list of fixed-length flat lists – is just the beginning. That’s as complex a list structure as we’ll see in this module. In later modules we’ll explore more complex arrangements.
The remainder of this module will explore uses for lists of lists and processing multiple lists or a list and a number.
Students often get confused by cons
and list
. Play with small
examples in DrRacket until you can accurately predict what each does.
An important aspect of the definition is that keys are unique . Given a key, we can look it up in a dictionary and get, at most, one value. We say “at most” because it’s possible the key isn’t there. But there won’t be two of them.
Values, on the other hand, may be duplicated: for example, each student has a unique student ID (key), but multiple students might have the same grade (value).
Sample Dictionaries:
Userid | Mark |
---|---|
asmith | 87 |
bjones | 64 |
cwang | 87 |
Name | Phone |
---|---|
A. Smith | 519-555-0123 |
B. Jones | 226-555-3210 |
C. Wang | 226-555-5678 |
lookup
is also commonly called find
or search
. add
is sometimes called insert
.
You might start to think about what the contracts for these functions should be. In particular, what do they produce?
Computer scientists have come up with an amazing number of ways to implement a dictionary. The big advantage of an association list is that it is simple to implement. The big disadvantage is being increasingly slow as the dictionary gets larger (has more entries).
Other methods of implementing dictionaries trade off complexity for some set of other advantages, typically speed. Some might be blazingly fast to search but slow to insert or delete; others might try to balance those operations in different ways.
We’ll explore at least one other dictionary implementation strategy, binary search trees , in M11.
We have three options for documenting this data type.
We could write the full data definition, as shown on the previous slide. This gives the maximum amount of help in writing the template and therefore the maximum help in writing the code. It also gives a name that is useful in writing contracts.
Because it’s a list of number-string pairs, we could use (listof (list Nat Str))
every place
we need the data type, principally in contracts. However, the term AL
or “association list”
helps the reader understand the intent of
the parameter is probably to look something up. The listof
format doesn’t convey
that meaning. It’s also surprising how quickly the keystrokes for (listof (list Nat Str))
add up when writing contracts. This form also ignores the requirement that keys are unique.
A middle ground that gives a meaningful name to use in contracts, includes the requires statement, but doesn’t help with the templates is to write
;; An AL (association list) is a `(listof (list Nat Str))`
;; Requires: each key (Nat) is unique
This is very similar to the payroll template. Like it, helper functions make the code more readable in the improved version.
Section 6.2 of the Style Guide says that accessor functions for fixed-length lists don’t need the design recipe.
This code is similar to the insert
function we wrote earlier in that the template
doesn’t tell the whole story. What both functions have in common is that there are two
base cases; two situations in which we stop.
For insert
the two stopping situations were (1) when we got to the end of the list and (2)
when the item we were inserting belonged at the beginning of the list.
For lookup
the two stopping situations are (1) when we get to the end of the list and (2)
when we find what we’re looking for.
In both cases the template was a good starting point even though the final form of the function didn’t match it really well.
We’ve introduced two concepts here:
(anyof ...)
indicates any of several different things. A contract of(anyof Sym Str) -> Num
means the function could consume a symbol or it could consume a string. Both work, but a number does not.- We’re now allowing values into our contracts. To say that
lookup-al
on the previous slide produces(anyof Str Bool)
is correct. If the key is found, it produces a string. If the key is not found, it produces a Boolean. Butlookup-al
never producestrue
, so it’s more accurate to say that it produces(anyof Str false)
. We now allow our contracts to include specific values.
Here is a slight revision of the countup-to
code we saw in the previous module. Instead
of counting to b
it counts until b
(that is, b
is not included).
;; (countup-until n b) produces an increasing list from n to b-1
;; Examples:
(check-expect (countup-until 6 9) (cons 6 (cons 7 (cons 8 empty))))
(check-expect (countup-until 9 9) empty)
;; countup-until: Int Int -> (listof Int)
(define (countup-until n b)
(cond [(>= n b) empty]
[else (cons n (countup-until (add1 n) b))]))
make-a-row
is identical except for:
- naming (function name, the counter is called
c
instead ofn
) - it has an extra parameter,
r
, that goes along for the ride - the counter,
c
, is transformed before beingcons
ed onto the list
make-a-row
constructs one row for the multiplication table by figuring out
what value each column should have.
generate-rows
is also very similar to countup-until
. This time the counter is called
r
. Again, we have one argument going along for the ride (nc
, the number of columns
that make-a-row
will produce for each row).
The big difference is that instead of just cons
ing n
onto the list we’re
making a whole row with make-a-row
and cons
ing that row/list onto the list.
The result is a list of lists.
Finally, mult-table
is just a wrapper function that calls generate-rows
with the
correct values.
We will return to this example two more times. The first time we will “bundle”
the two helper functions with mult-table
to make a tidier package. The second
time we will reduce these eight lines of code down to three lines.
In this section we’ll develop functions with two parameters, each of which is a list:
;; twolist-template: (listof X) (listof X) -> Any
(define (twolist-template lst1 lst2) ... )
Each of those two lists might be empty or not. So we might expect code similar to this:
(define (twolist-template lst1 lst2)
(cond [(and (empty? lst1) (empty? lst2)) ...]
[(and (empty? lst1) (cons? lst2)) ...]
[(and (cons? lst1) (empty? lst2)) ...]
[(and (cons? lst1) (cons? lst2)) ...]))
This is getting complicated fast! We can actually break it down into three cases:
- We only need to process one list; the other “goes along for the ride”.
- Both lists are the same length; one element from each is processed at each step.
- The general case: unequal lengths, process one or the other or both.
We’ll take these three cases in order of increasing complexity with a concrete example for each.
In the first case of processing two lists, we only actually process one. The second one just goes along for the ride. That means that of the four tests identified above, we only need to consider two: whether the first list is empty or not.
Video: Download m08.30_process_one_of_two
cons
, list
and append
are all different! But they are often confused:
(cons e lst)
is used to makelst
one element longer by addinge
at the beginning.(list e_1 e_2 ... e_n)
is used to construct a list with the n elements given.(append lst1 lst2)
produces a single list with all the elements inlst1
followed by all the elements inlst2
, in order.
Examples:
This | Produces |
---|---|
(cons (list 1 2) (list 3 4)) |
(list (list 1 2) 3 4) (a list of length 3) |
(list (list 1 2) (list 3 4)) |
(list (list 1 2) (list 3 4)) (a list of length 2) |
(append (list 1 2) (list 3 4)) |
(list 1 2 3 4) (a list of length 4) |
Save yourself some typing:
(check-expect (expand-each (list 12 13 'x)
(list 42 "zorkmids" 'Q))
(list (list 12 42 "zorkmids" 'Q)
(list 13 42 "zorkmids" 'Q)
(list 'x 42 "zorkmids" 'Q)))
In the second case of processing two lists, we only need to check one of the lists
for empty or non-empty – but for a different reason than the first case (e.g. my-append
).
Video: Download m08.40_lockstep
It seems reasonable that someone might need a predicate to check whether two lists
of numbers are equal or not. We’ll define “equal” to mean exactly the same
elements in the same order. (list 1 2 3)
and (list 2 3 4)
are obviously
not equal. (list 1 2 3)
and (list 1 2 3)
are equal.
(list 1 2 3)
and (list 1 3 2)
are not equal even though they have the same elements.
We also need to be able to compare lists with different lengths (which will always
return false
).
Handling unequal length lists leads to the same starting template code as
we had for merge
(shown in the slide). As with merge
we can’t write a complete
template because more problem-specific analysis is needed.
The observations on the previous slide lead directly to this code.
Some further simplifications are possible.
There is a very simple, elegant solution to equality for PrettyMuchAny
. If you happen
to find it, great! If not, don’t worry. We’ll see several helpful ideas about this in module 11.
Do not over-use equal?
. It should not be used for simple values when you know
the types of each argument. In that case, use the appropriate predicate such as
symbol=?
, string=?
, =
, etc. It’s more efficient and, if the types are not
correct Racket will give an error right away – a tremendous help in debugging.
It is reasonable to use equal?
for complex types such as lists and structures
(when we get there).
The fourth case is the most general where the lists may be of different lengths and may be processed at different rates. As a result, either one or both could be empty. We’ll need to test all four possibilities.
We’ve focusing for the moment on the second and third possibilities where exactly one list is empty. In the template, extract the first element and the rest of the list for each non-empty list.
When one list is empty, it could be that all we need is the non-empty list without change. No recursion is needed.
But it could be that the non-empty list needs further processing, leading to still more recursion.
A short video to introduce the merging problem:
Video: Download m08.50_merge
The first three examples are base cases: no further recursion is necessary because one of the consumed lists is the result. Note that either or both lists may be empty.
In the recursive cases, there is still more reasoning to be done.
Tests to get you started:
;; Base cases:
(check-expect (merge empty empty) empty)
(check-expect (merge empty (list 2 6 9)) (list 2 6 9))
(check-expect (merge (list 1 3) empty) (list 1 3))
;; Recursive cases:
(check-expect (merge (list 1 4) (list 2)) (list 1 2 4))
(check-expect (merge (list 3 4) (list 2)) (list 2 3 4))
If (first lon1)
is the smaller one, then we need the first of the natural recursions ealier.
If (first lon2)
is the smaller one, then we need the second of three natural recursions.
After understanding this trace, consider the code on the previous slide again. A number of transformations are possible that shorten the code, make it slightly more efficient for the computer, and may make it easier to understand.
First, note that in the first question/answer pair lon2
is empty
and the answer produced
could be replaced with lon2
. The first two question/answer pairs are then
[(and (empty? lon1) (empty? lon2)) lon2]
[(and (empty? lon1) (cons? lon2)) lon2]
and it becomes apparent that both produce lon2
regardless of
whether lon2
is empty or not. So the two pairs can be replaced with just
[(empty? lon1) lon2]
Second, the last question/answer pair can be replaced with else
. We then have [else (cond ...)]
. That’s
considered bad style and the question/answer pairs of the inner cond
can be promoted to the outer cond
.
Together, these transformation result in the following:
(define (merge lon1 lon2)
(cond [(empty? lon1) lon2] ; first two cases
[(empty? lon2) lon1] ; third case
[(< (first lon1) (first lon2))
(cons (first lon1) (merge (rest lon1) lon2))]
[else (cons (first lon2) (merge lon1 (rest lon2)))]))
Aside: Mergesort
merge
is a large
fraction of the MergeSort algorithm. This is not officially part of CS135; but just in case you’re
curious….
mergesort
consumes an unsorted (listof Num)
and produces a sorted list. merge
produces a sorted
list, too. If only we had a way to split the list that mergesort
consumes and sort them, we could
then merge them together to produce the entire sorted list.
We can split the list with keep-alternates
and drop-alternates
. keep-alternates
will keep every other element in a list, starting with the first. drop-alternates
will drop every other element (keeping the rest). We’ll cover them in Module 11
.
mergesort
can sort those shorter lists.
Putting it all together, we get:
;; mergesort: (listof Num) -> (listof Num)
(define (mergesort lst)
(cond [(empty? lst) empty]
[(empty? (rest lst)) lst]
[else (merge (mergesort (keep-alternates lst))
(mergesort (drop-alternates lst)))]))
Note that this is not simple recursion! That’s because the lists consumed by the recursive application
of mergesort
are not just one step closer to the base case. The code for merge
is simple recursion.
mergesort
is more complicated than insertion sort but on longer lists it is much faster than insertion sort.
Video: Download m08.60_at-least
Take a careful look at the contract. It consumes a list of anything. The lists
here all have elements of the same type (because that’s the most common) but the
lists could also have mixed type such as (list 1 'a 2 'b)
. Also note that the
thing we’re looking for is of type Any
– which may or may not match the type
that’s in the list.
In the tests, it’s always true that there are at least 0 examples of anything at all on a list – even if the list is empty. 0 is one of the base cases.
The steps that lead to this code:
- The first two cases, in which n is 0, both have a true result. They can
be collapsed into a single case,
[(zero? n) true]
- We collapse
(and (> n 0) (empty? lst))
to just(empty? lst)
because we’ve already established that n is not zero. - The test for
(and (> n 0) (cons? lst))
can be replaced withelse
. That makes it obvious that the nestedcond
can be promoted to the top levelcond
.