Assignment #1: Abstractions and Interfaces

This assignment is to be done individually and handed in electronically before 11:00 p.m. on Thursday, April 20th.

Problem

You must write an XML application to convert suitably tagged documents into plain text.

Specifications

A (simplified) well-formed XML document has the following format:

(Note that a document encoded in HTML is a tagged text that is meant to be interpreted by a browser, such as the ones marketed by Netscape and Microsoft. View the "page source" of any HTML document to see the tags.)

Your program is an XML application that interprets certain tags as follows:

Given such a tagged document as input, your program should remove declarations of strings (tagged with <d>...</d>), replace references (tagged with <r>...</r>) by the substrings they denote, retain kept strings (<k>...</k>) in place, and remove all other text. For example, given the input

<?xml version="1.0"?>
<d>Humpty Dumpty sat on a wall.</d>
<d>had a great f</d> What do you know?
<r><s>0</s><o>0</o><l>13</l></r><k>,</k>
<r><s>1</s><o>3</o><l>10</l></r><k>ool.
</k> <p>However,</p>
<d>All the king's horses,
and </d>
<d>all the king's men</d><d>couldn't put</d>
<r><s>0</s><o>0</o><l>13</l></r>
<r><s>2</s><o>12</o><l>15</l></r>
<r><s>3</s><o>15</o><l>3</l></r>
<r><s>4</s><o>8</o><l>3</l></r>
<k>lled him up.</k> again.

would produce as output:

Humpty Dumpty, a great fool.
Humpty Dumpty's horses,
and men pulled him up.
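
The following minimal sketch shows how one reference in the example can be resolved. It assumes, judging only from the example above, that <s> holds the zero-based index of a declared string, <o> a zero-based character offset into that string, and <l> the number of characters to take. The class and method names are hypothetical and are not part of the assignment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class ReferenceSketch {
    // Assumption: s = zero-based string index, o = zero-based offset,
    // l = number of characters (inferred from the example).
    static String resolve(List<String> declared, int s, int o, int l) {
        return declared.get(s).substring(o, o + l);
    }

    // Rebuilds the first output line of the example by hand.
    static String firstLine() {
        List<String> declared = Arrays.asList(
            "Humpty Dumpty sat on a wall.",   // declared string 0
            "had a great f");                 // declared string 1
        return resolve(declared, 0, 0, 13)    // "Humpty Dumpty"
             + ","                            // from <k>,</k>
             + resolve(declared, 1, 3, 10)    // " a great f"
             + "ool.\n";                      // from <k>ool.(newline)</k>
    }
}
```

Under these assumptions, firstLine() yields "Humpty Dumpty, a great fool." followed by a newline, which matches the first line of the expected output.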

Implementation

Your solution should include two Java interfaces: one to process XML text (find tags, interpret specific tag semantics such as declaring strings and referencing substrings, etc.) and the other to manage the collection of declared strings (saving them and extracting designated substrings).

Your solution must also include (at least) two Java classes, one to implement each interface. The class that manages the collection of declared strings should use a Vector, since this makes it easy to add another string and to reference the i-th string. (Read C.3.3 in Weiss' text to learn how to use Vectors.)
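
As a rough illustration of this structure, the sketch below pairs one possible interface for the string collection with a Vector-based class that implements it, and declares a second interface for the XML processing without implementing it. All names and method signatures here (StringStore, VectorStringStore, TaggedTextProcessor, declare, substring, toPlainText) are assumptions made for the sketch; the assignment does not prescribe them, and your own design may differ.

```java
import java.util.Vector;

// Hypothetical interface for the collection of declared strings.
interface StringStore {
    // Saves s and returns its zero-based index in the collection.
    int declare(String s);

    // Returns length characters of string number index, starting at offset.
    String substring(int index, int offset, int length);

    // Number of strings declared so far.
    int size();
}

// Hypothetical interface for the XML-processing side (declaration only).
interface TaggedTextProcessor {
    String toPlainText(String taggedText);
}

// One possible Vector-based implementation of StringStore.
final class VectorStringStore implements StringStore {
    private final Vector<String> strings = new Vector<String>();

    public int declare(String s) {
        strings.addElement(s);          // append to the end of the Vector
        return strings.size() - 1;      // index of the string just added
    }

    public String substring(int index, int offset, int length) {
        String s = strings.elementAt(index);   // the i-th declared string
        return s.substring(offset, offset + length);
    }

    public int size() {
        return strings.size();
    }
}
```

Keeping the store behind an interface means the processing class never touches the Vector directly, so the storage could later be swapped without changing the code that interprets tags.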

Your program should report errors if the input is invalid, except that it need not check that ignored tags are properly matched and balanced. For example, you should check that the tags <d>, <r>, <s>, <o>, <l>, and <k> are all used properly, but you need not report an error if other tags are mismatched. Thus the input "<A> <B> <C> </A>" need not be reported as an error, even though it is not valid XML.
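
One way to picture this rule is a stack-based scan that tracks only the six interpreted tags and skips every other tag. The sketch below checks matching of open and close tags only; it assumes tags carry no attributes, and it does not check which tags may appear inside which, or that <s>, <o>, and <l> hold valid numbers. The class name TagBalanceSketch and its check method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker; covers tag matching only, not full validation.
final class TagBalanceSketch {
    private static final Set<String> RECOGNIZED =
        new HashSet<String>(Arrays.asList("d", "r", "s", "o", "l", "k"));

    // Matches <name> and </name>; tags with attributes are not handled.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z]+)>");

    // Returns null if the recognized tags are matched, else an error message.
    static String check(String input) {
        Deque<String> open = new ArrayDeque<String>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            String name = m.group(2);
            if (!RECOGNIZED.contains(name)) {
                continue;                       // ignored tags are not checked
            }
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return "unexpected </" + name + "> at offset " + m.start();
            }
        }
        return open.isEmpty() ? null : "unclosed <" + open.peek() + ">";
    }
}
```

With this sketch, "<A> <B> <C> </A>" passes because none of its tags are recognized, while "<d>x</k>" and "<k>x" are both reported as errors.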

Grading

Design: 10%
Documentation (including pre- and post-conditions): 25%
Correctness: 40%
Testing: 25%

Comments

Pre- and post-conditions are necessary for all methods. Loop invariants are not necessary, but your code must contain sufficient comments to be easily understandable.
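
As an illustration of the kind of documentation meant here, the sketch below states a pre-condition and a post-condition in a method comment and rejects calls that violate the pre-condition. The class name ConditionSketch, the method slice, and the choice of IllegalArgumentException are assumptions made for this example; follow the conventions shown in class and in the sample project where they differ.

```java
// Hypothetical example of documenting pre- and post-conditions.
final class ConditionSketch {
    /**
     * Extracts part of a string.
     *
     * Pre:  text != null, offset >= 0, length >= 0,
     *       and offset + length <= text.length().
     * Post: returns the length characters of text that start at offset;
     *       text itself is unchanged.
     *
     * @throws IllegalArgumentException if the pre-condition does not hold
     */
    static String slice(String text, int offset, int length) {
        // Written as length > text.length() - offset to avoid int overflow.
        if (text == null || offset < 0 || length < 0
                || offset > text.length()
                || length > text.length() - offset) {
            throw new IllegalArgumentException("pre-condition violated");
        }
        return text.substring(offset, offset + length);
    }
}
```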

Your solution need not be as robust as the sample project, but it must meet the above specifications.

Note the emphasis on testing. Use the information from class, the notes, and the sample project as a guide to what constitutes proper and sufficient testing. Many of the testing marks will come from your test summary file, so be sure it is clear and complete. The sample project shows one acceptable way to set up such a file.

Electronic Submission

Be sure to submit:

Note: Late submissions will not be accepted.

Further Reading

The description of XML given here is quite simplified. You are not required to learn XML for this assignment, beyond the specifications given above. However, if you are interested in exploring this topic further, you may wish to refer to Robin Cover's SGML/XML Web page or to Tim Bray's annotated XML specification.