ITEC 325
2017spring
ibarland


DTD
ch06

These notes were influenced by XML Visual Quickstart Guide by Kevin Howard Goldberg, and by notes based on it by Jack Davis (jcdavis@radford.edu).

We've seen quick examples of some standardized file formats that use their own variant of XML: iTunes collections, .svg files, Word documents. We also saw a made-up set of tags about children, as well as the book's made-up set of tags about ancient wonders. Actually, even the “standardized” file formats started out as tags that somebody made up to describe the hierarchical information they wanted to represent. But if the tags are made up, who's to decide whether people are using them correctly?

Lecture video on YouTube, ID CIrx4XqX3ok (22m59s).

When defining a new XML language, you must specify its grammar: which tags are allowed, and which child tags each one may (or must) contain. Similarly for attributes: which attributes are required on certain tags (e.g., an “img” tag must have a “src” attribute and must otherwise be empty), and which values each attribute is allowed to take.
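As a minimal sketch of how such rules read in a DTD, here is a tiny document whose internal DTD says that a gallery holds zero or more `img` tags, each of which must be empty and must carry a `src` attribute. The `gallery` element, the `alt` attribute, and the file names are hypothetical, made up just for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!-- a gallery may contain zero or more img children, and nothing else -->
  <!ELEMENT gallery (img*)>
  <!-- an img may not contain any child tags or text -->
  <!ELEMENT img EMPTY>
  <!-- src is required; alt (hypothetical here) is optional -->
  <!ATTLIST img
            src CDATA #REQUIRED
            alt CDATA #IMPLIED>
]>
<gallery>
  <img src="pyramid.jpg" alt="Great Pyramid"/>
  <img src="lighthouse.jpg"/>
</gallery>
```

A validating parser would reject this document if some `img` lacked `src`, or if an `img` contained text. If libxml2 is installed, one way to check is `xmllint --noout --valid gallery.xml`.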

In fact, you can compare any XML document against its grammar to validate whether it conforms to the rules specified in that grammar (its schema). If an XML document is deemed valid, then its data is in the form the schema requires. (Of course, just as a syntax checker can validate a Java program's syntax but not its meaning, a schema won't validate an XML document's logical content.) This is less of a problem for XML documents (especially small ones) than for programs. For databases implemented as large XML files, people might build additional ad hoc tools to sanity-check the meaning of a document's contents: checking that a wonder's year-destroyed isn't earlier than its year-built, that links actually resolve, etc.
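For instance, the following sketch is perfectly valid against its DTD, even though the data is nonsense. (The element names, the wonder, and the years are hypothetical, not taken from the textbook's wonders file.) A DTD can only say that `year_built` holds text (`#PCDATA`); it can't require that text to be a number, let alone compare two years.

```xml
<?xml version="1.0"?>
<!DOCTYPE ancient_wonders [
  <!ELEMENT ancient_wonders (wonder*)>
  <!-- year_destroyed is optional, since some wonders still stand -->
  <!ELEMENT wonder (name, year_built, year_destroyed?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT year_built (#PCDATA)>
  <!ELEMENT year_destroyed (#PCDATA)>
]>
<ancient_wonders>
  <wonder>
    <name>Hypothetical Tower</name>
    <!-- valid per the DTD, but "destroyed" 140 years before it was built -->
    <year_built>1990</year_built>
    <year_destroyed>1850</year_destroyed>
  </wonder>
</ancient_wonders>
```

Catching this kind of error takes one of those ad hoc tools (or a richer schema language), not a DTD.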

There are two common formats for specifying XML grammars: DTDs and XML Schema. A DTD (“Document Type Definition”) is an older but still widely used system with a peculiar and limited syntax. However, DTDs are lightweight: compact, and easily understood with a little study. Since they are relatively simple and still widely used, studying them is a good first step toward understanding how XML tag sets are defined. A DTD is itself a plain-text document in its own (non-XML) syntax, and therefore does not begin with the standard XML declaration.

The three things a DTD specifies

An example file: the textbook's ch06-wonders.dtd, as referenced in ch06-wonders.xml (view its source; the reference is on line 3).
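Since the textbook's file isn't reproduced in these notes, here is a hypothetical miniature (not ch06-wonders.dtd itself) with one declaration of each of the three main kinds a DTD contains: an element declaration (`<!ELEMENT …>`), an attribute-list declaration (`<!ATTLIST …>`), and an entity declaration (`<!ENTITY …>`). The `location` element, the `id` attribute, and the `country` entity are made up for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE ancient_wonders [
  <!-- ELEMENT: which child tags (or text) each tag may contain -->
  <!ELEMENT ancient_wonders (wonder+)>
  <!ELEMENT wonder (name, location)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
  <!-- ATTLIST: which attributes a tag has, and whether they are required -->
  <!ATTLIST wonder id ID #REQUIRED>
  <!-- ENTITY: a named abbreviation, expanded wherever &country; appears -->
  <!ENTITY country "Egypt">
]>
<ancient_wonders>
  <wonder id="w1">
    <name>Great Pyramid of Giza</name>
    <location>Giza, &country;</location>
  </wonder>
</ancient_wonders>
```

A parser that reads the DTD expands `&country;`, so the `location` text becomes “Giza, Egypt”.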

Where to put the DTD file

You can put the DTD in an external file, or inside the XML document itself (much as CSS can live in a separate file or inside the HTML page).

  1. In-document, inside the same file as the XML:
    <!DOCTYPE ancient_wonders [
        <!ELEMENT ancient_wonders (wonder*)>
        <!ELEMENT wonder (name+, …)>
        …
    ]>
    
    <ancient_wonders>
        <wonder>
            <name>Great Pyramid of Ghiza</name>
            …
        </wonder>
    </ancient_wonders>    
                  
    This is the lightest-weight solution, and is suitable for the homework assignment.
  2. For an external DTD file on your own computer, use SYSTEM: start your XML file with a DOCTYPE specifying where the DTD is found, e.g. <!DOCTYPE ancient_wonders SYSTEM "wonders.dtd">. This is what ch06-wonders.xml (mentioned above) does (remember to view its source, line 3).
  3. An external DTD on the interwebs, using PUBLIC: <!DOCTYPE ancient_wonders PUBLIC "-//ibarland//DTD Archaeological Wonders 1.0//EN" "https://php.radford.edu/~itec325/2016spring-ibarland/Lectures/wonders.dtd">. The quoted string right after “PUBLIC” is a formal public identifier, which has its own syntax (-//owner//DTD description//language); the second string is the DTD's URL. (And: on your homework, prefer the SYSTEM technique to this one.)
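As a sketch of the SYSTEM approach, here is one hypothetical way to split an in-document example like the first one above into two files in the same directory. The `location` element is invented for illustration, since the original example elides everything beyond `name`. First, wonders.dtd, which holds only declarations:

```xml
<!-- wonders.dtd: declarations only; no XML declaration, no DOCTYPE -->
<!ELEMENT ancient_wonders (wonder*)>
<!ELEMENT wonder (name+, location)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT location (#PCDATA)>
```

Then the XML file, whose DOCTYPE names that file. A relative path like this is resolved against the location of the XML file itself.

```xml
<?xml version="1.0"?>
<!DOCTYPE ancient_wonders SYSTEM "wonders.dtd">
<ancient_wonders>
  <wonder>
    <name>Great Pyramid of Ghiza</name>
    <location>Egypt</location>
  </wonder>
</ancient_wonders>
```

The payoff of the split is reuse: any number of XML files can point at the same wonders.dtd.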

Defining your XML: Elements, Attributes, and Entities


Design Issues

Digression: tags vs. attributes

When specifying a height, which of the following do you like/dislike? Why?

  1.   <wonder height = "37 feet">…</wonder>
    
  2.   <wonder>
        <height>37 feet</height>
        
      </wonder>
    
  3.   <height>
        <measure units="meters">11.8</measure>
        <measure units="feet">37</measure>
      </height>
    
  4.   <height>
        <feet>37</feet>
        <meters>11.8</meters>
      </height>
    
  5.   <height measure="37">
        <units>feet</units>
      </height>
    
  6.   <height>
        <measure>37</measure>
        <units>feet</units>
      </height>
    
  7.   <height units="feet">
        <measure>37</measure>
      </height>
    
  8.   <height>
        37
        <units>feet</units>
      </height>
    
  9.   <height units="feet">
        37
      </height>
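To connect this design question back to DTDs: a DTD can restrict an attribute to a fixed list of values, but it can only say that an element holds text. Here is a minimal sketch of a DTD for option 7 above; the allowed `units` values (feet, meters) and the default are assumptions made for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE height [
  <!-- height must contain exactly one measure element -->
  <!ELEMENT height (measure)>
  <!-- units must be one of the listed words; if omitted, it defaults to "feet" -->
  <!ATTLIST height units (feet | meters) "feet">
  <!-- measure holds text; a DTD cannot require that text to be a number -->
  <!ELEMENT measure (#PCDATA)>
]>
<height units="meters">
  <measure>11.8</measure>
</height>
```

With this DTD, a validator would reject `units="furlongs"`. Under a straightforward DTD for option 6, where `units` is declared as `(#PCDATA)`, the same typo would pass, since the units are just text. That is one concrete point to weigh when choosing among the options.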
    

A common question, when designing an XML language, is when to use a (nested) tag, vs. an attribute on the tag. The rule-of-thumb is “data as tags; metadata as attributes”. For example, in XHTML, the img tag has the filename as an attribute, because the image is the data; the filename is information about how to find the (real) data. A couple of guidelines:

Design practice

Sample exercise:

  (a) Create a reasonable DTD for census records so that the following file would be valid: ch06-census.xml
  (b) Critique the strengths and weaknesses of how that file represents information. What changes would you make to represent census records?

Sample exercise: Come up with a DTD for flow charts, such as imgur.com/ECkYukd#.
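One possible starting point for the flow-chart exercise (a sketch, not the intended answer; every element and attribute name here is made up): give each box an `ID` attribute, and give each arrow `IDREF` attributes naming its endpoints. A validating parser checks that every `IDREF` value matches some `ID` in the same document, so this is one kind of “links actually resolve” check a DTD *can* perform.

```xml
<?xml version="1.0"?>
<!DOCTYPE flowchart [
  <!ELEMENT flowchart (node+, arrow*)>
  <!-- a node is one box in the chart; its id must be unique in the document -->
  <!ELEMENT node (#PCDATA)>
  <!ATTLIST node
            id    ID                               #REQUIRED
            shape (start | step | decision | end) "step">
  <!-- an arrow is empty; from and to must each match some existing ID -->
  <!ELEMENT arrow EMPTY>
  <!ATTLIST arrow
            from  IDREF #REQUIRED
            to    IDREF #REQUIRED
            label CDATA #IMPLIED>
]>
<flowchart>
  <node id="n1" shape="start">Begin</node>
  <node id="n2" shape="decision">Is it raining?</node>
  <node id="n3">Take umbrella</node>
  <node id="n4" shape="end">Go outside</node>
  <arrow from="n1" to="n2"/>
  <arrow from="n2" to="n3" label="yes"/>
  <arrow from="n2" to="n4" label="no"/>
  <arrow from="n3" to="n4"/>
</flowchart>
```

Note what this does not check: the DTD can't require that a decision node have exactly two outgoing arrows, or that the chart has only one start node. Those are the sort of meaning-level rules that again fall to ad hoc tools.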



©2017, Ian Barland, Radford University
Last modified 2017.Apr.22 (Sat)
Please mail any suggestions (incl. typos, broken links) to ibarland@radford.edu
Rendered by Racket.