CS 106 Winter 2018
Lab 08: Text Processing
Question 1 Singles
The built-in function match() can be used to check if a string contains a pattern described by a regular expression. But its return type is a bit funny: it returns a String[], even though it would make more sense to expect true/false. We won't talk about the meaning of that string array in this course. Instead, write a short function hasMatch(). It takes the same arguments as match(): a String to search, and a String defining a regular expression. It returns a boolean that tells you whether the search string matches the regular expression. (The lecture slides explain how this is done.) The body of the function can be a single line of code.
Write a function chop() that takes a single String as input, and returns an array of two strings, consisting of the first half and second half of the passed-in string. For example, chop( "pangloss" ) would return the array { "pang", "loss" }. If the input has odd length, put the extra character in the first string; chop( "salad" ) would return the array { "sal", "ad" }. The body of the function should need no more than about five lines of code, with no loops.
Write a function associate() that takes two arguments: an array of strings, and an array of integers. The function returns a value of type IntDict in which every string in the first array has been used as a key, whose value is the corresponding element of the second array. That is, if the first array is { "cat", "dog", "fish" } and the second array is { 13, 87, 2 }, you'd return an IntDict that maps "cat" to 13, "dog" to 87, and "fish" to 2. You can assume that the two array arguments have the same length.
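To illustrate the idea of walking two parallel arrays in step, here is a minimal sketch in plain Java rather than Processing. It uses java.util.HashMap as a stand-in for IntDict, and the class name PairingSketch and the helper pairUp() are hypothetical names invented for this example. It assumes, as the exercise does, that both arrays have the same length.

```java
import java.util.HashMap;
import java.util.Map;

class PairingSketch {
    // Hypothetical helper: pairs names[i] with counts[i] in a map.
    // Assumes both arrays have the same length.
    static Map<String, Integer> pairUp(String[] names, int[] counts) {
        Map<String, Integer> result = new HashMap<>();
        // A single index i visits matching positions in both arrays.
        for (int i = 0; i < names.length; i++) {
            result.put(names[i], counts[i]);
        }
        return result;
    }
}
```

The key point is that one loop index serves both arrays. In a Processing sketch you would build an IntDict instead of a HashMap.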
Question 2 Word Play
In this exercise, you will practice processing text by solving more word puzzles like the ones demonstrated in class. These puzzles involve iterating over an array containing all the words in some long list, finding the ones that pass some sort of criterion or test. You will be given a sketch that has a user interface for displaying the solutions to these puzzles; your job is to write the tests themselves.
Download the starter code and unzip it. Open the WordPlay sketch. You will see that the sketch is divided into two tabs. The WordPlay tab sets up the sketch, loads in the word list, and builds the user interface. You do not need to change anything in that tab, apart from adding your name and student ID to the top. Everything else you do will happen in the Puzzles tab.
There are seven puzzles to solve. They are solved by calling the functions getWords_0() ... getWords_6(). Each function must return an array of Strings containing all the words that solve the puzzle. You do not need to call these functions yourself; they are called for you by the code in the WordPlay tab. You will find that getWords_0() is solved for you as an example. You must add code to the other six functions to solve those puzzles (look for comments marked TODO).
In getWords_1(), write code to find all words that start with "und" and end with "und". Example: "underground".
In getWords_2(), write code to find all words that contain three double letters in a row. Example: "bookkeeper", which contains "oo", "kk" and "ee", with nothing else in between. (Obviously, these words must all have at least six letters!) For this puzzle, you'll have to write an inner for loop that iterates over the letters in each word, extracting individual characters using the String class's charAt() method. Note that unlike with strings, you can use plain == on two char values to determine whether they're equal.
In getWords_3(), write code to find all words of six or more letters in which the letters are in strict alphabetical order within the word, with no repeats. Example: "almost", because "a" comes before "l", "l" comes before "m", and so on. As above, you'll need an inner loop, here comparing each letter to the next one in the word. Note that you can use < and > on two char values to determine whether one comes before or after the other in the alphabet.
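The char comparisons mentioned in the last two puzzles can be sketched on two simpler problems. This is plain Java, which works the same way inside a Processing sketch; the class name CharCompareSketch and both helper functions are hypothetical, and the code assumes the words are all lowercase. Neither helper solves a puzzle as stated, but each shows one of the comparisons inside the kind of loop you will need.

```java
class CharCompareSketch {
    // True if some letter is immediately followed by the same letter.
    static boolean hasDoubleLetter(String word) {
        // Stop one short of the end so that i + 1 is always a valid index.
        for (int i = 0; i < word.length() - 1; i++) {
            if (word.charAt(i) == word.charAt(i + 1)) {
                return true;
            }
        }
        return false;
    }

    // True if the first letter comes strictly before the last letter
    // in the alphabet (assuming lowercase letters).
    static boolean firstBeforeLast(String word) {
        if (word.length() == 0) {
            return false;
        }
        return word.charAt(0) < word.charAt(word.length() - 1);
    }
}
```

Note the loop bound in hasDoubleLetter(): because each step looks at position i + 1, the loop must end before the final character.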
In getWords_4(), write code to find all words of 14 or more letters in which no single letter occurs more than once anywhere in the word. So, for example, "undiscoverable" doesn't count because it contains two "e"s, but "undiscoverably" works.
There are a couple of different ways to solve this puzzle. The easiest is to first write a helper function that counts the number of times a given letter occurs in a word, using a for loop. That might start like this:
int getCount( String word, char letter ) {
  // Count how many times letter occurs in word
}
Now, in getWords_4(), first check that the current word has at least 14 letters. If so, check how many times each letter of the alphabet occurs in the word, by calling getCount(). If any letter returns a count of two or more, this word is invalid and should not be appended to the output array.
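Checking "each letter of the alphabet" is easiest with a for loop whose variable is a char. The following sketch shows that loop on a different problem: listing the letters that do not occur in a word at all. It is plain Java, the names AlphabetLoopSketch and missingLetters() are hypothetical, and it assumes lowercase input.

```java
class AlphabetLoopSketch {
    // Returns the lowercase letters that never appear in word, in order.
    static String missingLetters(String word) {
        String missing = "";
        // A char can be used as a loop counter: 'a', 'b', ..., 'z'.
        for (char c = 'a'; c <= 'z'; c++) {
            // indexOf() returns -1 when the character is not present.
            if (word.indexOf(c) == -1) {
                missing += c;
            }
        }
        return missing;
    }
}
```

The same loop header works for the puzzle above, with the loop body replaced by a call to your own getCount().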
In getWords_5(), write code to find all words with exactly eight letters where, if you take the first two letters of the word and move them to the end, you get another valid word. For example, if we move the "by" in "bypasser" to the end of the word we get "passerby", which is also in our word list. It will be useful to remember how to use the String class's substring() method.
Let's say you're given the word "pancakes". Shifting two letters gives you "ncakespa". You now need to check whether that's a word. Do not loop over the array words looking for "ncakespa". Instead, use the variable word_dict, defined in the WordPlay tab: specifically, check if it contains the test word as a key, as shown in class.
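The pattern of building a candidate string with substring() and then looking it up, instead of scanning the whole word array, can be sketched as follows. This is plain Java with a java.util.HashSet standing in for word_dict; the tiny word list, the class name LookupSketch, and the helper beheadedIsWord() are all hypothetical. It solves a different puzzle: does removing the first letter of a word leave another word?

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LookupSketch {
    // Hypothetical stand-in for the lab's word list.
    static final Set<String> WORDS =
        new HashSet<>(Arrays.asList("stone", "tone", "one", "pancakes"));

    // True if dropping the first letter of word leaves another listed word.
    static boolean beheadedIsWord(String word) {
        if (word.length() < 2) {
            return false;
        }
        // substring(1) keeps everything from index 1 to the end.
        String rest = word.substring(1);
        // One lookup replaces a loop over the entire word list.
        return WORDS.contains(rest);
    }
}
```

A lookup like this takes roughly the same time however long the word list is, whereas a loop over the array would repeat the full scan for every candidate word.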
In getWords_6(), write code to find all words in which the vowels "a", "e", "i", "o", "u" and "y" appear in the word in that order, with no other vowels. Example: "facetiously". The easiest way to solve this puzzle is with a regular expression. The good news is that the regular expression is provided for you! All you need to do is use it correctly. Use the built-in match() function with the current word and the regular expression. If match() returns a non-null value, then the pattern was found and the word is one of the solutions to the puzzle.
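To make the null versus non-null convention concrete, here is a plain Java sketch of a function that behaves the way match() is described above. It is a stand-in written with java.util.regex, not Processing's own code, and the class name MatchSketch is hypothetical. The pattern used in the usage note is also just an illustration, not the one provided in the starter code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSketch {
    // Stand-in for match(): returns null when the pattern is not found,
    // otherwise an array holding the matched text followed by any groups.
    static String[] match(String str, String regexp) {
        Matcher m = Pattern.compile(regexp).matcher(str);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }
}
```

For example, with the pattern "^[^aeiou]*$" (a word containing none of a, e, i, o, u), MatchSketch.match( "rhythm", "^[^aeiou]*$" ) returns a non-null array, while the same call on "salad" returns null. Comparing the result against null is all you need to decide whether a word passes.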
In case it's not obvious, your code must actually find the words that solve the puzzle. That is, you're not allowed to determine the solution words by some other means and return them explicitly in an array. Looked at another way, if we changed the file words.txt, your code would proceed to find new sets of solutions relative to the words in that new file.
Save your work in a sketch titled WordPlay.
Submission
When you are ready to submit, please follow these steps.
If necessary, review the Code Style Guide and use Processing's built-in auto format tool. You do not need to use the precise coding style outlined in the guide, but whatever style you use, your code must be clear, concise, consistent, and commented.
If necessary, review the How To Submit document for a reminder on how to submit to LEARN.
Make sure to include a comment at the top of all source files containing your name and student ID number.
Create a zip file called L08.zip containing the entire L08 folder and its subfolder WordPlay.
Upload L08.zip to LEARN. Remember that you can (and should!) submit as many times as you like. That way, if there's a catastrophe, you and the course staff will still have access to a recent version of your code.
If LEARN isn't working, and only if LEARN isn't working, please email your ZIP file to the course account (see the course home page for the address). In this case, you must mail your ZIP file before the deadline. Please use this only for emergencies, not "just in case". Submissions received after the deadline may receive feedback, but their marks will not count.