Originally based on XML Visual Quickstart Guide by Kevin Howard Goldberg, and notes therefrom by Jack Davis (jcdavis@radford.edu).
In 1991, the first Web site was put online. Twenty years later (not yet quite an adult), in 2011.Mar, one report counted 100 million active websites (e.g. onsite.bloggspot.com/), which (extrapolating from 2005 estimates of ~275 pages/site) would mean nearly 30 billion (30,000 million) pages. Another site even claims 20 trillion pages (that's 20,000,000 million); this figure might be counting every dynamic page ever generated?
Most of the information on the web is made available through HTML. HTML's simplicity has helped fuel the popularity of the Web. However, faced with the Web's huge and growing quantity of information, HTML has shown real limitations. These limitations are evident when the task is to categorize, search, or perform other operations on information available on the web.
XML (“eXtensible Markup Language”) is a general-purpose grammar/specification for structured (hierarchical) data. An XML document can be thought of as a tree, where the leaves are strings and the internal nodes are elements (with their attributes).
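To make the tree picture concrete, here is a small sketch using Python's standard xml.etree.ElementTree module (our choice of tool; nothing in these notes depends on it). The document in DOC is made up for illustration, and tree_lines is a hypothetical helper that lists each element as an internal node and each non-blank text string as a leaf beneath it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """<book id="b1">
  <title>XML Visual Quickstart Guide</title>
  <author>
    <first>Kevin</first>
    <last>Goldberg</last>
  </author>
</book>"""

def tree_lines(elem, depth=0):
    """Return one line per element (internal node, shown with its attributes)
    and one line per non-blank text string (leaf). For brevity this ignores
    'tail' text, i.e. text that follows a child element inside its parent."""
    indent = "  " * depth
    lines = [f"{indent}<{elem.tag}> {elem.attrib}"]
    text = (elem.text or "").strip()
    if text:
        lines.append(f"{indent}  {text!r}")
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines

for line in tree_lines(ET.fromstring(DOC)):
    print(line)
```

Note that ElementTree stores each string as an element's .text rather than as a separate node object, but the shape is the same tree described above.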
XML is used to define (other) markup languages (tag sets), like XHTML, or iTunes libraries (see footnote 1). While HTML is designed specifically to display web pages, XML is used to structure and manage many sorts of information.
Here's a simple XML document: ch01-xml-example.xml (use your browser's “show source” to see the raw XML).
At first glance, XML doesn't look so different from HTML: it is populated with tags, attributes, and values. Notice, however, that the tags are different from HTML's, and in particular that the tags describe the contents they enclose. XML is also much stricter than HTML; its rules are described in more detail below.
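As a rough sketch of that strictness, the snippet below (assuming Python's standard xml.etree.ElementTree parser; is_well_formed is a hypothetical helper) checks a few HTML-style fragments alongside their strict-XML versions. HTML parsers are far more forgiving of the rejected forms.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a single XML element, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical HTML-style snippets, each followed by a strict-XML version.
samples = [
    "<p>line one<br>line two</p>",      # <br> is never closed
    "<p>line one<br/>line two</p>",     # empty element written as <br/>
    "<Title>Notes</title>",             # tag names are case-sensitive
    "<title>Notes</title>",
    "<img src=pic.png/>",               # attribute values must be quoted
    '<img src="pic.png"/>',
]

for s in samples:
    print(f"{is_well_formed(s)!s:5}  {s}")
```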
The XML describes information only;
no formatting/presentation is implied by the tag names.
We can use CSS to specify formatting details: see an XML document referencing a CSS stylesheet, and the referenced CSS.
Every custom markup language created using the XML specification must adhere to XML's underlying grammar. Custom markup languages created with XML are called “XML applications”: they are applications of XML, such as XSLT, RSS, SOAP, etc. Like HTML, XML can be written with any text editor (or a word processor that saves plain text). Many XML editors are also available (e.g. EditiX Lite), as are plugins such as nXML mode for Emacs.
XML documents, like HTML documents, are composed of tags and data. One big difference between the two is that the tags used in an XML document are invented by its author, to describe and structure the data. No presentation details are implied by an XML tag set.
Notice that tags must be properly nested (a key rule in XML): you must close the most recently opened tag that is still unclosed. Attributes can be specified within the open-tag.
Although browsers will handle improperly nested tags
(like
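The nesting rule amounts to keeping a stack of open tags: each close-tag must match the tag on top of the stack. The sketch below is a deliberately simplified toy (properly_nested and its regular expression are hypothetical, and it ignores comments, CDATA, and processing instructions); it cross-checks its verdict against Python's real ElementTree parser.

```python
import re
import xml.etree.ElementTree as ET

# Toy tag pattern: matches <name ...>, </name>, and <name .../>.
# It does NOT understand comments, CDATA, or processing instructions,
# so it is only meant for tiny illustrative snippets like the ones below.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")

def properly_nested(text):
    """Apply the 'close the nearest unclosed tag' rule using a stack."""
    stack = []
    for m in TAG.finditer(text):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue                  # <br/> opens and closes in one tag
        if not closing:
            stack.append(name)        # newly opened tag is now the nearest
        elif not stack or stack.pop() != name:
            return False              # closed a tag other than the nearest one
    return not stack                  # anything left over was never closed

good = "<p><b><i>bold italic</i></b></p>"
bad = "<p><b><i>bold italic</b></i></p>"   # </b> arrives while <i> is still open

for snippet in (good, bad):
    try:
        ET.fromstring(snippet)
        parser_ok = True
    except ET.ParseError:
        parser_ok = False
    print(properly_nested(snippet), parser_ok, snippet)
```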
XML uses the same building blocks as HTML: tags that define elements, the values of those elements, and attributes. An XML element is the most basic unit of an XML document; it can contain text, attributes, and other elements. Whitespace between adjacent tags (such as indentation) is usually insignificant to the information being represented, but it is important for human readability. (Whitespace within strings, or between tags and strings, is technically preserved, but may or may not actually be important to the information being represented.)
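Here is a minimal sketch of those building blocks, assuming Python's xml.etree.ElementTree and a made-up song element. Whether indentation is dropped depends on the tool: ElementTree, for one, keeps it in the tree and leaves it to the program to ignore.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: one attribute, two child elements, and indentation.
DOC = """<song rating="5">
  <artist>The  Example Band</artist>
  <year>2011</year>
</song>"""

song = ET.fromstring(DOC)
print(song.tag, song.attrib)            # song {'rating': '5'}
print([child.tag for child in song])    # ['artist', 'year']
print(repr(song[0].text))               # 'The  Example Band' (double space kept)
# The indentation between tags is not discarded by ElementTree: it is kept
# as the .text of <song> and the .tail of each child, for the program to ignore.
print(repr(song.text))                  # '\n  '
print(repr(song[0].tail))               # '\n  '
```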
Tags that begin with <? and end with ?> are called processing instructions. The <?xml?> declaration at the top of a document looks like one (strictly, the XML specification calls it the XML declaration rather than a processing instruction); it identifies the document as XML and commonly specifies the version and encoding. We have also seen the <?xml-stylesheet?> processing instruction, as well as (duh) the <?php?> processing instruction.
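Strictly speaking, the <?xml ...?> line is the XML declaration, not a processing instruction, and parsers treat it differently. The sketch below assumes Python's standard xml.dom.minidom module; the document, the notes.css file name, and the processing_instructions helper are all hypothetical. Only the <?xml-stylesheet?> line shows up as a processing-instruction node.

```python
from xml.dom import minidom

# Hypothetical document: an XML declaration, then a stylesheet PI.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="notes.css"?>
<note>Hello</note>"""

def processing_instructions(xml_text):
    """Return (target, data) pairs for the top-level processing instructions."""
    doc = minidom.parseString(xml_text)
    return [(node.target, node.data)
            for node in doc.childNodes
            if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

print(processing_instructions(DOC))
```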
Comments in XML look just like comments in XHTML (as in all XML tag sets):
<!-- here's the comment -->
Comments do not nest (arrrgh!),
though a comment can contain multiple lines.
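A quick sketch with Python's xml.etree.ElementTree (the note documents are made up) shows both points: a multi-line comment is fine and is simply dropped from the parsed tree by default, while an attempt to nest comments is rejected.

```python
import xml.etree.ElementTree as ET

ok = """<note>
  <!-- a comment
       can span several lines -->
  <to>everyone</to>
</note>"""

# The inner "<!--" contains "--", which XML forbids inside a comment,
# so a comment can never contain another comment.
nested = "<note><!-- outer <!-- inner --> still outer? --></note>"

root = ET.fromstring(ok)
print([child.tag for child in root])    # ['to'] (comments are dropped by default)

try:
    ET.fromstring(nested)
except ET.ParseError as err:
    print("rejected:", err)
```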
If you want to include a long string which you don't want processed as XML, you can use a special CDATA section, which begins with <![CDATA[ and ends with ]]>.
Example: ch01-cdata.xml
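Here is a separate, made-up example, assuming Python's xml.etree.ElementTree. Inside the CDATA section, characters like < and & are taken literally, so the <b> below is just text rather than a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the PHP-ish code inside the CDATA section contains
# '<' and '&', which elsewhere in XML would need escaping as &lt; and &amp;.
DOC = '<snippet><![CDATA[if ($a < $b && $b > 0) { echo "<b>hi</b>"; }]]></snippet>'

snippet = ET.fromstring(DOC)
print(snippet.text)     # the raw string, exactly as written inside CDATA
print(len(snippet))     # 0: the <b> inside CDATA is plain text, not an element

# When ElementTree writes the text back out, it escapes the special
# characters instead of re-creating the CDATA section.
print(ET.tostring(snippet, encoding="unicode"))
```

The CDATA wrapper is only a convenience for the author; once parsed, the text is the same as if every special character had been escaped by hand.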
CDATA sections do not nest (arrrgh!), since the first ]]> ends the section.
(CDATA sections are very reminiscent of PHP's “nowdoc” multi-line literals, which span from an opening “
1 On Mac, you can see your iTunes library at ~/Music/iTunes/iTunes Music Library.xml; if the file seems to be in a compressed binary format, use
©2011, Ian Barland, Radford University. Last modified 2011.Mar.30 (Wed).
Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu