|
This assignment is to be done individually and handed in electronically (using submit) before 12:00pm on Tuesday, March 8. |
You are to write a Java program to parse XML data and create corresponding trees.
XML is an emerging standard for representing text. It is being adopted by many companies for representing documents, for interchanging data among applications, and for enabling applications to work in a Web environment.
A (simplified) well-formed XML document includes strings surrounded by matched opening and closing tags of the form
<TagName> ... </TagName>
where any identifier (string of alphanumeric characters starting with a letter) can be used as a tag name, as long as the name in the closing tag matches the name in the opening tag. Pairs of tags can be arbitrarily placed, including nested one within the other, as long as they are properly matched. One pair of tags must surround all the complete text. The two special symbols “<”: and “>” may not appear anywhere in the text except to mark a tag start or a tag end, respectively. The following XML data uses tag names a, b, c, d, and e.
<a>This is <b>some nested text</b> and we can have <c>text <d>within</d> the text</c> and <a>even use the same tags for nesting</a></a>
(Note that a document encoded in HTML is tagged similarly. View the "page source" of any HTML document to see the tags.)
Any well-formed XML document can be represented as a tree, in which the nested substructures are represented by nested subtrees. For the above XML data, the corresponding tree looks like this (where solid boxes represent tag names, and dashed boxes represent running text):

Note, however, that we need generalized trees rather than binary trees. Whereas
a non-empty binary tree has a left and right subtree, in a generalized tree a
non-empty tree has an ordered sequence of zero or more non-empty subtrees.
Start by implementing a class to represent generalized trees using the following specifications, in which the sequence of subtrees is stored in a queue:
public interface QueueTreeInterface extends NestingInterface
{
public void attachRightmost(Object newItem);
// Creates a new QueueTreeInterface with newItem as its root item and appends it as an additonal subtree
// pre: !isEmpty() and newItem is not null
// post: enqueues a new subtree that contains newItem as
// its root value and has no subtrees itself.
public void attachRightmostSubtree(QueueTreeInterface newTree);
// Appends a pre-formed tree as an additional subtree
// pre: - not this.isEmpty() and not newTree.isEmpty()
// - this is not nested within newTree
// - newTree is not nested within some other tree
// post: newTree is enqueued as an additional subtree.
// newTree itself is made empty.
public QueueTreeInterface detachLeftmostSubtree();
// Detaches the leftmost subtree
// pre: not this.isEmpty()
// post: the left-most subtree is dequeued and returned;
// if there are no subtrees, returns null
} // end QueueTreeInterface
Your implementation should be modelled after the recursive binary tree implementation discussed in class and given in tree.zip. Instead of declaring “leftTree” and “rightTree” of type BinaryTree, your class will declare a single variable “subtrees” of type QueueInterface. This makes sense if you consider the Queue Interface - you may only attach from the right (or the rear of the queue) and remove from the left (or the front of the queue). Represent an empty tree by setting “subtrees = null”.
Note:If you have a set of tags (let's say <a></a>) with no text in between, the QueueTree is not an empty tree. Instead, it is a non-empty tree whose root is <a> and whose instance variable, subtrees, is empty (i.e. an instantiated but empty Queue). You need to implement two constructors (one taking no parameters, and one taking a single object as parameter), the four methods from NestingInterface , the three additional methods for QueueTreeInterface, and the method toString(), as described below. You may optionally implement any other helper methods you choose.
For simplicity, we have provided a zip file of 'provided code' in provided.zip. This zip file includes all the source files needed for a working Queue (complete with exceptions).
We have also compiled for you a template DataFactory for you to use. Note that there are in fact two constuctors for your implementation of QueueTreeInterface. Please use these exact method signatures in your submitted DataFactory.java file
We will provide you with a “tokenizer” (in XMLTokenizer.java - Ready to be downloaded!) that reads an arbitrarily long string of characters from a file named xml.txt and separates it into “tokens”, where each token is either an opening tag (e.g., <a>), a closing tag (e.g., </a>), or a piece of running text with containing no tags (e.g., “This is “). Note that angle brackets (“<” and “>”) conveniently serve to delimit the tokens. The tokens will be put into a queue in the order that they appear in the text.
Next implement a “parser” that reads tokens from the token queue and builds the corresponding tree. Each time it encounters an opening tag the parser starts to build a new tree. A block of running text is appended as a child of the tree being constructed. You should distinguish a tagname from running text by storing angle brackets around the tagname in the label (as shown in the diagram above). Each time the parser encounters a closing tag, it appends the newly completed tree as a subtree of the parent tree being constructed (which should have a matching tagname as its root label). You will find this code easiest to write recursively.
Finally, implement the toString() method for your tree class using a pre-and-post-order traversal of the tree, as follows:
if the root label represents a tag,
1. print the root label in the form of an opening tag
2. visit each of the subtrees in order using a pre-and-post-order traversal
3. print the root label again, but in the form of a closing tag
otherwise (the root label represents running text) print the root label
Note that you will have to detach and then reattach subtrees during the traversal.
Your mainline should cause your program to repeatedly read in XML from the file xml.txt by calling the tokenizer, create the corresponding tree(s), and then print the tree(s) to the console using toString(), one tree per line.
|
Important Notes! You may only use the becker, java.lang and java.util packages. Contact the tutor before using any other packages. Not doing so may cause your program to fail compilation. For this and all subsequent programming assignments in CS124 you are not allowed to use package names. That is, your files cannot begin with something like package Assign03; This causes marking scripts to fail. If you make package names in this way and the marking scripts fail, you will be penalized. |
|
Design |
15% |
|
Documentation (including pre- and post-conditions) |
30% |
|
Correctness |
55% |
You can also view the marking scheme for Assignment 3.
For every file you submit, include your name and ID number. Submit:
BinaryTree.java,
BinaryTreeInterface.java, and
TreeException.java will not be accepted nor provided for
compilation. Furthermore:
NestingInterface.java,
NestingException.java,
Node.java,
QueueInterface.java,
QueueReferenceBased.java,
QueueException.java,
XMLTokenizer.java,
QueueTreeInterface.java should not be submitted, but these files
will be
provided when we compile your code.|
Note: Late assignments will NOT be accepted. |