M08: More Lists

Hint: If possible, make your browser wide enough to place the slides and commentary side-by-side.

The slides on this page are not accessible (for screen readers, etc). The 1up and 3up slides are better.

Module Topics

Fixed-length Lists
Dictionaries
2D data
Processing two lists
Processing a list and a number

Fixed-length Lists

The payroll is shown visually using two different diagramming notations. Both represent the same list. The payroll is a three element list. Each of those elements is a two-element list containing a name and a salary.

Refer back to tax-payable if you don’t remember it.

The payroll is shown visually using the two different diagramming notations. Both represent the same list. The payroll is a three element list. Each of those elements is a two-element list containing a name and a salary.

Recall the (listof X) data definition:

;; A (listof X) is one of:
;; * empty
;; * (cons X (listof X))

A payroll is the same as a (listof X) where X is a two-element list that we represent with (list Str Num). The first element is a string and the second is a number. Because the data definition has been made more specific, we give it a specific name: Payroll.

Intuitively, an empty list is a Payroll. If you want a longer Payroll, cons the name of an employee and their salary onto a Payroll.

Note the new notation: (list Str Num). That is, list can appear in data definitions just like cons and (sub1 n). It simply means a two-element list where the first element is a string and the second is a number. In general, we use (list X Y Z ...) to indicate a fixed-length list with the given types.

We will often use fixed-length lists to group information that belongs together – like a name and salary. In M10 we’ll see another way to group information, structures.

Instead of writing this data definition we could write (listof (list Str Num)) in contracts. But that doesn’t help us derive the template (and therefore code that consumes a Payroll) and a contract of Payroll -> TaxRoll is more expressive (easier to write and understand) than

(listof (list Str Num)) -> (listof (list Str Num))

Recap of (list ...) vs. (listof ...):

Use (list ...) for a fixed-length list. You know when you write the code exactly how many elements the list will have. Typically those items are all related to each other, like a person’s name and their salary.
(list ...) will have one type for each element in the list.
Use (listof ...) when the length of the list is unknown or changes as the program runs. The list might be empty or it might be very long or anything in between.
(listof ...) will have exactly one argument: the type that every element in the list will have. It might be a complicated type such as (list Str Num).
(list ...) and (listof ...) are not interchangeable! One of them will be right and one will be wrong. (Well, it’s possible that they’re both wrong 😊.)

We’ll use the Payroll data definition to create a template to help write our payroll code.

Recall that (first pr) produces the first element of the payroll. That first element is a two-element list with a name and salary. So (first (first pr)) produces the name while (second (first pr)) produces the salary. We could also use (first (rest (first pr))).

This code is not very readable. A better solution is to use well-named helper functions such as name and salary instead of first and second (sort of like structures!).

With a template in hand, we can now start to solve the actual problem of computing taxes for a payroll.

The slide shows the template after a number of steps in the design recipe:

Writing the purpose
Writing some examples (represented here by constants test-payroll and test-taxes; definitions aren’t shown)
Writing the header (including renaming payroll-template to something more problem-specific) and contract (note the use of the types we defined earlier, Payroll and TaxRoll)
Refining the purpose

The classic questions for a (listof X) function apply here. Their answers direct how to fill in the gaps in the template.

What do you do in the base case?
There are no taxes owed if no one earned a salary. The contract says to produce a (listof X). empty fits both observations.
How do you transform the first element on the list?
In this case, compute the taxes on the salary. Put that number together with the name into a two element list.
What value is produced by the recursive processing of the rest of the list?
A list of taxes owed by the employees on the rest of the list.
How do you combine the value produced in 2 with the value produced in 3?
Add the newly calculated taxes owed to the beginning of the list.

A really useful strategy is to break the problem into two parts:

the part that handles the list and
the part that handles one item from the list.

This makes clear the overall structure of the code is simply the (listof-X-template) and allows one to focus on the distinct part of how one item is handled.

This is very similar to the “refined” (listof X) template discussed in M06.

What we’ve just seen – a list of fixed-length flat lists – is just the beginning. That’s as complex a list structure as we’ll see in this module. In later modules we’ll explore more complex arrangements.

The remainder of this module will explore uses for lists of lists and processing multiple lists or a list and a number.

Students often get confused by cons and list. Play with small examples in DrRacket until you can accurately predict what each does.

Dictionaries

An important aspect of the definition is that keys are unique . Given a key, we can look it up in a dictionary and get, at most, one value. We say “at most” because it’s possible the key isn’t there. But there won’t be two of them.

Values, on the other hand, may be duplicated: for example, each student has a unique student ID (key), but multiple students might have the same grade (value).

Sample Dictionaries:

Userid	Mark
asmith	87
bjones	64
cwang	87

Name	Phone
A. Smith	519-555-0123
B. Jones	226-555-3210
C. Wang	226-555-5678

lookup is also commonly called find or search. add is sometimes called insert.

You might start to think about what the contracts for these functions should be. In particular, what do they produce?

Computer scientists have come up with an amazing number of ways to implement a dictionary. The big advantage of an association list is that it is simple to implement. The big disadvantage is being increasingly slow as the dictionary gets larger (has more entries).

Other methods of implementing dictionaries trade off complexity for some set of other advantages, typically speed. Some might be blazingly fast to search but slow to insert or delete; others might try to balance those operations in different ways.

We’ll explore at least one other dictionary implementation strategy, binary search trees , in M11.

We have three options for documenting this data type.

We could write the full data definition, as shown on the previous slide. This gives the maximum amount of help in writing the template and therefore the maximum help in writing the code. It also gives a name that is useful in writing contracts.

Because it’s a list of number-string pairs, we could use (listof (list Nat Str)) every place we need the data type, principally in contracts. However, the term AL or “association list” helps the reader understand the intent of the parameter is probably to look something up. The listof format doesn’t convey that meaning. It’s also surprising how quickly the keystrokes for (listof (list Nat Str)) add up when writing contracts. This form also ignores the requirement that keys are unique.

A middle ground that gives a meaningful name to use in contracts, includes the requires statement, but doesn’t help with the templates is to write

;; An AL (association list) is a `(listof (list Nat Str))` 
;; Requires: each key (Nat) is unique

This is very similar to the payroll template. Like it, helper functions make the code more readable in the improved version.

Section 6.2 of the Style Guide says that accessor functions for fixed-length lists don’t need the design recipe.

This code is similar to the insert function we wrote earlier in that the template doesn’t tell the whole story. What both functions have in common is that there are two base cases; two situations in which we stop.

For insert the two stopping situations were (1) when we got to the end of the list and (2) when the item we were inserting belonged at the beginning of the list.

For lookup the two stopping situations are (1) when we get to the end of the list and (2) when we find what we’re looking for.

In both cases the template was a good starting point even though the final form of the function didn’t match it really well.

We’ve introduced two concepts here:

(anyof ...) indicates any of several different things. A contract of (anyof Sym Str) -> Num means the function could consume a symbol or it could consume a string. Both work, but a number does not.
We’re now allowing values into our contracts. To say that lookup-al on the previous slide produces (anyof Str Bool) is correct. If the key is found, it produces a string. If the key is not found, it produces a Boolean. But lookup-al never produces true, so it’s more accurate to say that it produces (anyof Str false). We now allow our contracts to include specific values.

2D data

These multiplication tables are a little unusual because they start with 0 rather than 1.

Here is a slight revision of the countup-to code we saw in the previous module. Instead of counting to b it counts until b (that is, b is not included).

;; (countup-until n b) produces an increasing list from n to b-1
;; Examples:
(check-expect (countup-until 6 9) (cons 6 (cons 7 (cons 8 empty))))
(check-expect (countup-until 9 9) empty)

;; countup-until: Int Int -> (listof Int)
(define (countup-until n b)
  (cond [(>= n b) empty]
        [else (cons n (countup-until (add1 n) b))]))

make-a-row is identical except for:

naming (function name, the counter is called c instead of n)
it has an extra parameter, r, that goes along for the ride
the counter, c, is transformed before being consed onto the list

make-a-row constructs one row for the multiplication table by figuring out what value each column should have.

generate-rows is also very similar to countup-until. This time the counter is called r. Again, we have one argument going along for the ride (nc, the number of columns that make-a-row will produce for each row).

The big difference is that instead of just consing n onto the list we’re making a whole row with make-a-row and consing that row/list onto the list. The result is a list of lists.

Finally, mult-table is just a wrapper function that calls generate-rows with the correct values.

We will return to this example two more times. The first time we will “bundle” the two helper functions with mult-table to make a tidier package. The second time we will reduce these eight lines of code down to three lines.

Processing two lists

In this section we’ll develop functions with two parameters, each of which is a list:

;; twolist-template: (listof X) (listof X) -> Any
(define (twolist-template lst1 lst2) ... )

Each of those two lists might be empty or not. So we might expect code similar to this:

(define (twolist-template lst1 lst2)
  (cond [(and (empty? lst1) (empty? lst2)) ...]
        [(and (empty? lst1) (cons? lst2)) ...]
        [(and (cons? lst1) (empty? lst2)) ...]
        [(and (cons? lst1) (cons? lst2)) ...]))

This is getting complicated fast! We can actually break it down into three cases:

We only need to process one list; the other “goes along for the ride”.
Both lists are the same length; one element from each is processed at each step.
The general case: unequal lengths, process one or the other or both.

We’ll take these three cases in order of increasing complexity with a concrete example for each.

In the first case of processing two lists, we only actually process one. The second one just goes along for the ride. That means that of the four tests identified above, we only need to consider two: whether the first list is empty or not.

Video: Download m08.30_process_one_of_two

cons, list and append are all different! But they are often confused:

(cons e lst) is used to make lst one element longer by adding e at the beginning.
(list e_1 e_2 ... e_n) is used to construct a list with the n elements given.
(append lst1 lst2) produces a single list with all the elements in lst1 followed by all the elements in lst2, in order.

Examples:

This	Produces
`(cons (list 1 2) (list 3 4))`	`(list (list 1 2) 3 4)` (a list of length 3)
`(list (list 1 2) (list 3 4))`	`(list (list 1 2) (list 3 4))` (a list of length 2)
`(append (list 1 2) (list 3 4))`	`(list 1 2 3 4)` (a list of length 4)

Save yourself some typing:

(check-expect (expand-each (list 12 13 'x)
                           (list 42 "zorkmids" 'Q))
              (list (list 12 42 "zorkmids" 'Q)
                    (list 13 42 "zorkmids" 'Q)
                    (list 'x 42 "zorkmids" 'Q)))

In the second case of processing two lists, we only need to check one of the lists for empty or non-empty – but for a different reason than the first case (e.g. my-append).

Video: Download m08.40_lockstep

It seems reasonable that someone might need a predicate to check whether two lists of numbers are equal or not. We’ll define “equal” to mean exactly the same elements in the same order. (list 1 2 3) and (list 2 3 4) are obviously not equal. (list 1 2 3) and (list 1 2 3) are equal.

(list 1 2 3) and (list 1 3 2) are not equal even though they have the same elements.

We also need to be able to compare lists with different lengths (which will always return false).

Handling unequal length lists leads to the same starting template code as we had for merge (shown in the slide). As with merge we can’t write a complete template because more problem-specific analysis is needed.

The observations on the previous slide lead directly to this code.

Some further simplifications are possible.

There is a very simple, elegant solution to equality for PrettyMuchAny. If you happen to find it, great! If not, don’t worry. We’ll see several helpful ideas about this in module 11.

Do not over-use equal?. It should not be used for simple values when you know the types of each argument. In that case, use the appropriate predicate such as symbol=?, string=?, =, etc. It’s more efficient and, if the types are not correct Racket will give an error right away – a tremendous help in debugging.

It is reasonable to use equal? for complex types such as lists and structures (when we get there).

The fourth case is the most general where the lists may be of different lengths and may be processed at different rates. As a result, either one or both could be empty. We’ll need to test all four possibilities.

We’ve focusing for the moment on the second and third possibilities where exactly one list is empty. In the template, extract the first element and the rest of the list for each non-empty list.

When one list is empty, it could be that all we need is the non-empty list without change. No recursion is needed.

But it could be that the non-empty list needs further processing, leading to still more recursion.

Fortunately, each of these moves us closer to a base case. That’s important!

A short video to introduce the merging problem:

Video: Download m08.50_merge

The first three examples are base cases: no further recursion is necessary because one of the consumed lists is the result. Note that either or both lists may be empty.

In the recursive cases, there is still more reasoning to be done.

Tests to get you started:

;; Base cases:
(check-expect (merge empty empty) empty)
(check-expect (merge empty (list 2 6 9)) (list 2 6 9))
(check-expect (merge (list 1 3) empty) (list 1 3))

;; Recursive cases:
(check-expect (merge (list 1 4) (list 2)) (list 1 2 4))
(check-expect (merge (list 3 4) (list 2)) (list 2 3 4))

If (first lon1) is the smaller one, then we need the first of the natural recursions ealier.

If (first lon2) is the smaller one, then we need the second of three natural recursions.

Here is the finished merge function.

After understanding this trace, consider the code on the previous slide again. A number of transformations are possible that shorten the code, make it slightly more efficient for the computer, and may make it easier to understand.

First, note that in the first question/answer pair lon2 is empty and the answer produced could be replaced with lon2. The first two question/answer pairs are then

[(and (empty? lon1) (empty? lon2)) lon2]
[(and (empty? lon1) (cons? lon2)) lon2]

and it becomes apparent that both produce lon2 regardless of whether lon2 is empty or not. So the two pairs can be replaced with just

[(empty? lon1) lon2]

Second, the last question/answer pair can be replaced with else. We then have [else (cond ...)]. That’s considered bad style and the question/answer pairs of the inner cond can be promoted to the outer cond. Together, these transformation result in the following:

(define (merge lon1 lon2)
  (cond [(empty? lon1) lon2] ; first two cases
        [(empty? lon2) lon1] ; third case
        [(< (first lon1) (first lon2))
         (cons (first lon1) (merge (rest lon1) lon2))]
        [else (cons (first lon2) (merge lon1 (rest lon2)))]))

Aside: Mergesort

merge is a large fraction of the MergeSort algorithm. This is not officially part of CS135; but just in case you’re curious….

mergesort consumes an unsorted (listof Num) and produces a sorted list. merge produces a sorted list, too. If only we had a way to split the list that mergesort consumes and sort them, we could then merge them together to produce the entire sorted list.

We can split the list with keep-alternates and drop-alternates. keep-alternates will keep every other element in a list, starting with the first. drop-alternates will drop every other element (keeping the rest). We’ll cover them in Module 11 .

mergesort can sort those shorter lists.

Putting it all together, we get:

;; mergesort: (listof Num) -> (listof Num)
(define (mergesort lst)
  (cond [(empty? lst) empty]
        [(empty? (rest lst)) lst]
        [else (merge (mergesort (keep-alternates lst))
                     (mergesort (drop-alternates lst)))]))

Note that this is not simple recursion! That’s because the lists consumed by the recursive application of mergesort are not just one step closer to the base case. The code for merge is simple recursion.

mergesort is more complicated than insertion sort but on longer lists it is much faster than insertion sort.

Processing a list and a number

Video: Download m08.60_at-least

Take a careful look at the contract. It consumes a list of anything. The lists here all have elements of the same type (because that’s the most common) but the lists could also have mixed type such as (list 1 'a 2 'b). Also note that the thing we’re looking for is of type Any – which may or may not match the type that’s in the list.

In the tests, it’s always true that there are at least 0 examples of anything at all on a list – even if the list is empty. 0 is one of the base cases.

The steps that lead to this code:

The first two cases, in which n is 0, both have a true result. They can be collapsed into a single case, [(zero? n) true]
We collapse (and (> n 0) (empty? lst)) to just (empty? lst) because we’ve already established that n is not zero.
The test for (and (> n 0) (cons? lst)) can be replaced with else. That makes it obvious that the nested cond can be promoted to the top level cond.