Assignment #1: Abstractions and Interfaces
This assignment is to be done individually and handed in electronically before 11:00 p.m. on Thursday, April 20th.
Problem
You must write an XML application to convert suitably tagged documents into plain text.
Specifications
A (simplified) well-formed XML document has the following format:
<TagName> ... </TagName>
where any identifier (string of alphanumeric characters starting with a letter) can be used as a tag name, as long as the name in the closing tag matches the name in the opening tag. Pairs of tags can be arbitrarily placed, including nested one within the other, as long as they are properly matched.
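The matching rule amounts to a stack discipline: every closing tag must name the most recently opened tag that is still open. The following is a minimal sketch of that check, assuming tags carry no attributes and that an `<?...?>` header may be skipped; `TagBalanceChecker`, `isBalanced`, and `isIdentifier` are hypothetical names, not part of the specification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper illustrating the simplified well-formedness rule. */
class TagBalanceChecker {
    /**
     * Pre: text != null.
     * Post: returns true iff every tag name is an identifier and every closing
     *       tag matches the most recently opened tag that is still open.
     */
    static boolean isBalanced(String text) {
        Deque<String> open = new ArrayDeque<>(); // stack of currently open tag names
        int i = 0;
        while ((i = text.indexOf('<', i)) >= 0) {
            int end = text.indexOf('>', i);
            if (end < 0) return false;               // '<' with no matching '>'
            String tag = text.substring(i + 1, end);
            i = end + 1;
            if (tag.startsWith("?")) continue;       // skip an <?xml ...?> header
            boolean closing = tag.startsWith("/");
            String name = closing ? tag.substring(1) : tag;
            if (!isIdentifier(name)) return false;
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;                        // unmatched or mis-nested closing tag
            }
        }
        return open.isEmpty();                       // every opened tag was closed
    }

    /** Pre: s != null. Post: true iff s is a letter followed by letters or digits. */
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isLetter(s.charAt(0))) return false;
        for (int k = 1; k < s.length(); k++) {
            if (!Character.isLetterOrDigit(s.charAt(k))) return false;
        }
        return true;
    }
}
```

For example, `isBalanced("<A> <B> <C> </A>")` returns `false`, because `</A>` arrives while `C` is still open.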
(Note that a document encoded in HTML is a tagged text that is meant to be interpreted by a browser, such as the ones marketed by Netscape and Microsoft. View the "page source" of any HTML document to see the tags.)
Your program is an XML application that interprets certain tags as follows:
<d>...</d> declares a string. For example, the text
<d>Humpty Dumpty sat on a wall.</d>
<d>had a great f</d>
<d>All the king's horses,
and </d>
<d>all men</d><d>couldn't put</d>
declares 5 strings (one of which includes a newline character). Declared strings are numbered from 0 in the order in which they appear.
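<r>...</r> references a substring of a declared string: <s> holds the string number, <o> the offset of the first character, and <l> the length. For example, the text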
<r><s>1</s><o>7</o><l>6</l></r>
denotes the 6 characters starting from character number 7 (counting from 0) in declared string number 1, namely "reat f" if strings are declared as above.
<k>...</k> marks text that is kept exactly as written. For example, the text
<k>xact >>*/ text</k>
denotes the string "xact >>*/ text".
Given such a tagged document as input, your program should remove declarations of strings (tagged with <d>...</d>), replace references (tagged with <r>...</r>) by the substrings they denote, retain kept strings (<k>...</k>) in place, and remove all other text. For example, given the input
<?xml version="1.0"?>
<d>Humpty Dumpty sat on a
wall.</d>
<d>had a great f</d> What do you know?
<r><s>0</s><o>0</o><l>13</l></r><k>,</k>
<r><s>1</s><o>3</o><l>10</l></r><k>ool.
</k> <p>However,</p>
<d>All the king's horses,
and </d>
<d>all the king's men</d><d>couldn't
put</d>
<r><s>0</s><o>0</o><l>13</l></r>
<r><s>2</s><o>12</o><l>15</l></r>
<r><s>3</s><o>15</o><l>3</l></r>
<r><s>4</s><o>8</o><l>3</l></r>
<k>lled him up.</k> again.
would produce as output:
Humpty Dumpty, a great fool.
Humpty Dumpty's horses,
and men pulled
him up.
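The transformation can be sketched as a single left-to-right scan. This is a minimal sketch, not a reference solution: the class name `TagProcessor` is hypothetical, errors are reported by throwing `IllegalArgumentException` (an assumption; the specification only says errors must be reported), and for brevity the declared strings live in the processor's own `Vector` rather than behind a separate interface as the Implementation section requires.

```java
import java.util.Vector;

/**
 * Hypothetical single-pass processor for the <d>, <r>, and <k> tags.
 * Everything else (other tags, their contents, whitespace) is dropped.
 */
class TagProcessor {
    // Tags that are errors when they appear outside their proper construct.
    private static final String[] MISPLACED =
        {"</d>", "</k>", "</r>", "<s>", "</s>", "<o>", "</o>", "<l>", "</l>"};

    private final Vector<String> declared = new Vector<>();
    private String doc;
    private int pos; // index of the next unread character of doc

    /**
     * Pre: input != null.
     * Post: returns the plain-text output, or throws IllegalArgumentException
     *       if <d>, <r>, <s>, <o>, <l>, or <k> is used incorrectly.
     */
    String process(String input) {
        doc = input;
        pos = 0;
        declared.clear();
        StringBuilder out = new StringBuilder();
        while (pos < doc.length()) {
            if (doc.startsWith("<d>", pos)) {
                declared.add(enclosed("d"));   // declare a string; emit nothing
            } else if (doc.startsWith("<k>", pos)) {
                out.append(enclosed("k"));     // keep the text verbatim
            } else if (doc.startsWith("<r>", pos)) {
                out.append(reference());       // replace by the denoted substring
            } else {
                for (String tag : MISPLACED) {
                    if (doc.startsWith(tag, pos)) {
                        throw new IllegalArgumentException("misplaced " + tag + " at index " + pos);
                    }
                }
                pos++;                         // drop any other character
            }
        }
        return out.toString();
    }

    /** Pre: "<r>" is at pos. Post: returns the denoted substring; pos is just past "</r>". */
    private String reference() {
        expect("<r>");
        int s = number("s");
        int o = number("o");
        int l = number("l");
        expect("</r>");
        if (s >= declared.size()) {
            throw new IllegalArgumentException("string " + s + " was never declared");
        }
        String str = declared.get(s);
        if (o > str.length() || l > str.length() - o) { // written this way to avoid int overflow
            throw new IllegalArgumentException("reference runs past the end of string " + s);
        }
        return str.substring(o, o + l);
    }

    /** Pre: "<name>" is at pos. Post: returns the text before the next "</name>"; pos is just past it. */
    private String enclosed(String name) {
        int start = pos + name.length() + 2;
        int end = doc.indexOf("</" + name + ">", start);
        if (end < 0) {
            throw new IllegalArgumentException("missing </" + name + ">");
        }
        pos = end + name.length() + 3;
        return doc.substring(start, end);
    }

    /** Post: reads "<name>digits</name>" at pos and returns its value; pos is just past it. */
    private int number(String name) {
        if (!doc.startsWith("<" + name + ">", pos)) {
            throw new IllegalArgumentException("expected <" + name + "> at index " + pos);
        }
        String digits = enclosed(name);
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("<" + name + "> must hold a non-negative integer");
        }
        return Integer.parseInt(digits); // NumberFormatException (an IllegalArgumentException) on overflow
    }

    /** Post: if token is at pos, pos moves past it; otherwise throws IllegalArgumentException. */
    private void expect(String token) {
        if (!doc.startsWith(token, pos)) {
            throw new IllegalArgumentException("expected " + token + " at index " + pos);
        }
        pos += token.length();
    }
}
```

For instance, `new TagProcessor().process("<d>had a great f</d><r><s>0</s><o>7</o><l>6</l></r>")` returns `"reat f"`.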
Implementation
Your solution should include two Java interfaces: one to process XML text (find tags, interpret specific tag semantics such as declaring strings and referencing substrings, etc.) and the other to manage the collection of declared strings (saving them and extracting designated substrings).
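As a starting point, the two interfaces might look like the following minimal sketch. The names `XmlTextProcessor` and `DeclaredStrings` and their method signatures are hypothetical, not part of the specification; the Javadoc comments show one way to state the pre- and post-conditions the grading scheme asks for.

```java
/** Hypothetical interface for processing XML text: finding and interpreting tags. */
interface XmlTextProcessor {
    /**
     * Pre: document != null.
     * Post: returns the plain text described by the specification, or throws
     *       IllegalArgumentException if <d>, <r>, <s>, <o>, <l>, or <k> is misused.
     */
    String process(String document);
}

/** Hypothetical interface for the collection of declared strings. */
interface DeclaredStrings {
    /** Pre: s != null. Post: s is stored as string number size() - 1. */
    void declare(String s);

    /** Post: returns the number of strings declared so far. */
    int size();

    /**
     * Pre: 0 <= index < size(), offset >= 0, length >= 0, and
     *      offset + length <= length of string number index.
     * Post: returns the length characters of string number index starting at offset.
     */
    String substring(int index, int offset, int length);
}
```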
Your solution must also include (at least) two Java classes, one to implement each interface. The class that manages the collection of declared strings should use a Vector, since this makes it easy to add another string and to reference the ith string. (Read C.3.3 in Weiss' text to learn how to use Vectors.)
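A minimal sketch of a Vector-backed implementation of the hypothetical `DeclaredStrings` interface follows (the interface is repeated so the example compiles on its own; `VectorDeclaredStrings` is likewise a hypothetical name). Because a `Vector` keeps its elements in insertion order, declared string number i is simply element i, retrieved with `elementAt`.

```java
import java.util.Vector;

/** Hypothetical interface, repeated here so this sketch compiles on its own. */
interface DeclaredStrings {
    void declare(String s);
    int size();
    String substring(int index, int offset, int length);
}

/** Stores declared strings in a Vector, so string number i is element i. */
class VectorDeclaredStrings implements DeclaredStrings {
    private final Vector<String> strings = new Vector<>();

    /** Pre: s != null. Post: s is appended and becomes string number size() - 1. */
    public void declare(String s) {
        strings.addElement(s);
    }

    /** Post: returns the number of declared strings. */
    public int size() {
        return strings.size();
    }

    /**
     * Pre: none (arguments are checked here).
     * Post: returns the length characters of string number index starting at
     *       offset, or throws IllegalArgumentException if they do not exist.
     */
    public String substring(int index, int offset, int length) {
        if (index < 0 || index >= strings.size()) {
            throw new IllegalArgumentException("no declared string number " + index);
        }
        String s = strings.elementAt(index);
        if (offset < 0 || length < 0 || offset > s.length() || length > s.length() - offset) {
            throw new IllegalArgumentException("substring out of range for string " + index);
        }
        return s.substring(offset, offset + length);
    }
}
```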
Your program should report errors if the input is invalid, except that it need not check that ignored tags are properly matched and balanced. For example, you should check that the tags <d>, <r>, <s>, <o>, <l>, and <k> are all properly used, but you need not report an error if other tags are not matched properly (thus you need not report it as an error when given "<A> <B> <C> </A>" even though it is not valid XML).
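For the interpreted tags, one way to check that a reference is well formed is to match the whole `<r>` element against a regular expression. This sketch uses `java.util.regex` and assumes (the specification does not say so explicitly) that `<s>`, `<o>`, and `<l>` always appear in that order with no whitespace between them; `ReferenceParser` is a hypothetical name. It checks only the syntax: whether the string number and character range actually exist must still be checked against the declared strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical syntax check for a single <r> element. */
class ReferenceParser {
    // <s>, <o>, <l> in this order, each holding a non-negative decimal integer.
    private static final Pattern REFERENCE =
        Pattern.compile("<r><s>([0-9]+)</s><o>([0-9]+)</o><l>([0-9]+)</l></r>");

    /**
     * Pre: text != null.
     * Post: returns {s, o, l} if text is exactly one well-formed <r> element;
     *       otherwise throws IllegalArgumentException describing the problem.
     */
    static int[] parse(String text) {
        Matcher m = REFERENCE.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed reference: " + text);
        }
        int[] fields = new int[3];
        for (int g = 1; g <= 3; g++) {
            try {
                fields[g - 1] = Integer.parseInt(m.group(g));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("number too large in: " + text);
            }
        }
        return fields;
    }
}
```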
Grading
Design: 10%
Documentation (including pre- and post-conditions): 25%
Correctness: 40%
Testing: 25%
Comments
Pre- and post-conditions are necessary for all methods. Loop invariants are not necessary, but your code must contain sufficient comments to be easily understandable.
Your solution need not be as robust as the sample project, but it must meet the above specifications.
Note the emphasis on testing. Use the material from class, the notes, and the sample project as a guide to what constitutes proper and sufficient testing. Many of the testing marks will come from your test summary file, so be sure it is clear and complete. The sample project shows one acceptable way to set up such a file.
Electronic Submission
Be sure to submit:
Note: Late submissions will not be accepted.
Further Reading
The description of XML given here is quite simplified. You are not required to learn XML for this assignment, beyond the specifications given above. However, if you are interested in exploring this topic further, you may wish to refer to Robin Cover's SGML/XML Web page or to Tim Bray's annotated XML specification.