ITEC 325
2011spring
ibarland


ch01
XML overview

Originally based on XML Visual Quickstart Guide by Kevin Howard Goldberg, and notes therefrom by Jack Davis (jcdavis@radford.edu).

Chapter 1 - XML

In 1991, the first Web site was put online. Twenty years later (not yet quite an adult), in 2011.Mar, one report counted 100 million active websites (e.g. onsite.bloggspot.com/), which (extrapolating from 2005 estimates of ~275 pages/site) would mean nearly 30 billion (30,000 million) pages. Another site even claims 20 trillion pages (that's 20,000,000 million); this might be an attempt to count the number of dynamic pages ever generated.

Most of the information available on the web is made available through HTML. HTML's simplicity has helped fuel the popularity of the Web. However, when faced with the Web's huge and growing quantity of information, HTML has shown real limitations. These limitations are evident when the task is to categorize, search, or perform other operations on web-available information.

XML (“eXtensible Markup Language”) is a general-purpose grammar/specification for structured (hierarchical) data. An XML document can be thought of as a tree, where the leaves are strings and the internal nodes are elements (tags, with their attributes).
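
To make the tree picture concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The document, its tag names (library, song, title, artist) and the helper outline are invented for illustration; they are not from any real tag set.

```python
import xml.etree.ElementTree as ET

# A made-up document; the tag names are invented purely for illustration.
SAMPLE = """<library>
  <song id="s1"><title>Song A</title><artist>Artist A</artist></song>
  <song id="s2"><title>Song B</title><artist>Artist B</artist></song>
</library>"""


def outline(element, depth=0):
    """Return one line per tree node: elements (with attributes) and string leaves."""
    lines = ["  " * depth + f"<{element.tag}> {element.attrib}"]
    text = (element.text or "").strip()
    if text:
        # A non-blank string directly inside this element is a leaf of the tree.
        lines.append("  " * (depth + 1) + repr(text))
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The indentation of the printed outline mirrors the nesting of the elements: each song sits under library, and each string sits under the title or artist element that contains it.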

XML is used to define (other) markup languages (tag sets), like XHTML, or iTunes libraries (see footnote 1). While HTML is designed specifically to display web pages, XML is used to structure and manage many sorts of information.

Every custom markup language created using the XML specification must adhere to XML's underlying grammar. Custom markup languages created with XML are called “XML applications”. In other words, these custom markup languages are applications of XML, such as XSLT, RSS, SOAP, etc. XML, like HTML, can be written using any text editor or word processor. There are also many dedicated XML editors (e.g. EditiX Lite), and plugins such as nxml mode for emacs.

XML documents, like HTML documents, are composed of tags and data. One big difference between the two is that the tags used by an XML document are invented by the author. They are used to describe and structure the data. There are no implied presentation details for XML tag sets.

XML document rules

XML uses the same building blocks as HTML: tags that define elements, values of those elements, and attributes. An XML element is the most basic unit of an XML document. It can contain text, attributes, and other elements. Whitespace between adjacent tags is usually ignored when the document is processed, but it is important for human readability. (Whitespace within strings, or between tags and strings, is technically preserved, but may or may not actually be important to the information being represented.)
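
As a small illustration of these building blocks, the sketch below (again using Python's standard xml.etree.ElementTree) pulls apart one element into its tag, attribute, and text. The price element and its currency attribute are hypothetical. Note that the whitespace inside the string is handed back untouched; it is up to the program to decide whether it matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document: a tag, one attribute, and a text value.
doc = '<price currency="USD">  19.99 </price>'

elem = ET.fromstring(doc)
tag = elem.tag                    # the element's name
currency = elem.get("currency")   # the value of the "currency" attribute
raw_text = elem.text              # the string, whitespace preserved exactly
value = float(raw_text.strip())   # here the program decides the padding is unimportant

print(tag, currency, repr(raw_text), value)
```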

Tags that begin with <? and end with ?> are called processing instructions. The <?xml?> declaration at the top of a document looks like a processing instruction (strictly speaking, the XML specification treats it separately); it tells the parser which XML version and character encoding the document uses, via its version and encoding attributes. We have also seen the <?xml-stylesheet?> processing instruction, as well as (duh) the <?php?> processing instruction.
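
One way to see processing instructions from a program's point of view is sketched below with Python's standard xml.dom.minidom module, which exposes each one as a node with a target (the name after <?) and its data (the rest). The document and the file name style.xsl are made up for this example.

```python
from xml.dom import minidom, Node

# A made-up document; "style.xsl" is a hypothetical file name.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<notes><note>hello</note></notes>"""

doc = minidom.parseString(DOC)

# Collect (target, data) for every processing instruction at the top level.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

print(pis)
print(doc.documentElement.tagName)
```

In this sketch only xml-stylesheet shows up in the list; the <?xml ...?> line is consumed by the parser itself rather than being reported as a node.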

Comments in XML are just like those in XHTML (as in all XML tag sets).
<!-- here's the comment -->
Comments do not nest (arrrgh!), though a comment can contain multiple lines.
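
The two comment rules above can be checked with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and both sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A comment may span several lines.
multi_line = """<a><!-- this comment
spans two lines --><b/></a>"""

# An attempt at nesting: the inner "<!--" puts "--" inside the outer comment,
# which XML does not allow.
nested = "<a><!-- outer <!-- inner --> still outer --><b/></a>"

print(is_well_formed(multi_line))
print(is_well_formed(nested))
```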

Entities


1. On a Mac, you can see your iTunes library at ~/Music/iTunes/iTunes Music Library.xml; if the file seems to be in a compressed binary format, use plutil -convert xml1 fileToConvert. Actually, iTunes stores everything as “dictionaries” (php: arrays): dictionaries-of-dictionaries-of-dictionaries, if you look closely.



©2011, Ian Barland, Radford University
Last modified 2011.Mar.30 (Wed)