home—lects—hws
D2L—breeze (snow day)
XML
what is it?
Course Overview
where we’ve been; where we’re going
In my mind, here's what we've talked about so far:
- client-server web model: What info travels between them; what choices do they each make?
- Unit-testing; writing functions that call helper functions;
functions that return strings-of-html.
(All happen to use php, but that's inconsequential).
- Forms: how the information is passed from client to server, in PHP
- Server-side validation:
What needs to be validated, and why;
how to structure the code well.
- Javascript and the DOM; modifying the DOM at run-time;
reinforcing good code design (via refactoring).
- Client-side validation:
What needs to be validated (slightly different than server-side), and why;
how to structure the code well (only slightly different than server-side).
Using built-in html attributes, for validation.
- Cookies and sessions: how they work; risks (and benefits) of third-party cookies.
- server-side database connectivity:
php functions for connecting;
review SQL injection;
prepared statements.
Issues re storing passwords.
The remainder of the lectures are about XML:
- What is XML; advantages as a format (aside: bitmap vs vector representations)
- XSLT (example of a domain-specific language … with several odd quirks.)
- xpath (a general technology/API for referring to nodesets; useful in many contexts)
- DTDs (an example of defining your syntax)
XML
What is XML?
- A1: “HTML, but you get to make up the tags”.
- A2: eXtensible Markup Language. (Whatever.)
- A3: A way of encoding a tree as a string.
(In particular, a tree with typed
nodes.)
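To make A3 concrete, here is a minimal sketch in Python (using the standard library's `xml.etree.ElementTree`) that parses a string into a tree and then walks it. The document and its tag names (`library`, `book`, `title`) are made up for illustration; each node's tag plays the role of its "type".

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are invented for this example.
doc = ("<library>"
       "<book year='1998'><title>XML Basics</title></book>"
       "<book year='2004'><title>Trees</title></book>"
       "</library>")

def outline(node, depth=0):
    """Return one line per node, indented by depth; the tag is the node's type."""
    lines = ["  " * depth + node.tag]
    for child in node:          # iterating an Element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc)       # string -> tree
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The string is just one way of writing the tree down; once parsed, the program works with nodes and children rather than with characters.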
In the 1980s, SGML was an obscure markup language.
Tim Berners-Lee based HTML on it in 1990,
and by the mid-90s HTML was widely known.
But early browsers, competing for market share, all worked hard to make sense of
ill-formed HTML pages (tag soup), which in turn led people to learn and write bad HTML.
As a result, the standardization process came to allow all sorts of awful shortcuts
(e.g. you don't need to put attributes in quotes, or close your p tags, etc.).
In reaction to the sorry state of HTML,
XML was designed as a restatement
(and narrowing) of SGML, with strict well-formedness rules.
As always: for your web pages, I recommend using all XML conventions when writing your HTML5.
Even though you technically don't need to (say) close your p tags in HTML5,
the XML requirements are all good ones, which help keep data clean and interoperable.
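As a rough illustration of why the stricter conventions help, the sketch below feeds two fragments to Python's standard XML parser. The helper name `is_well_formed` and both fragments are made up for this example; a browser's HTML parser would accept the sloppy one, but an XML parser rejects it outright.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Tag-soup style: unquoted attribute value, p tags never closed.
sloppy = "<div class=note><p>first<p>second</div>"
# Same content following XML conventions.
strict = '<div class="note"><p>first</p><p>second</p></div>'

print(is_well_formed(sloppy))
print(is_well_formed(strict))
```

Any tool that consumes the strict version can rely on a plain XML parser, instead of re-implementing a browser's error recovery.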
Examples of XML documents
Here are some specific examples:
- xhtml web documents (duh).
HTML's tags are designed specifically for marking up documents:
There are tags for paragraphs, ordered-lists, tables, etc.
- iTunes library-backup: see ~/Music/iTunes/iTunes Music Library.xml.
(If you modify that file,
it unfortunately doesn't change the iTunes data;
the .xml is merely provided by iTunes as “output” for other programs to be able to read.)
- .svg (“Scalable Vector Graphics”) files,
which are supported by all major browsers.
E.g. show-source this .svg file.
Note that because the information is not a bitmap, but rather
how-to-create-the-image,
it has the potential to scale particularly well.
- Microsoft Office documents: .docx files (etc.) are actually zip archives;
if you unzip one, you can see sub-folders, including the content as an xml file.
Example:
(view the spreadsheet's xml)
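The "zip file with XML inside" idea can be sketched without a real Office file. The code below builds a tiny stand-in archive in memory and reads the XML back out. It is not a valid .docx: the XML content is a simplified, hypothetical placeholder, and only the entry name `word/document.xml` mirrors where a real .docx keeps its main content.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a stand-in archive in memory (NOT a valid .docx; the XML is a
# simplified placeholder invented for this example).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml",
                     "<document><body><p>Hello</p></body></document>")

# Reading works the same way for a zip file on disk: list the entries,
# then hand one entry's bytes to an XML parser.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    root = ET.fromstring(archive.read("word/document.xml"))

paragraphs = [p.text for p in root.iter("p")]
print(names, paragraphs)
```

With an actual document you would pass its path to `zipfile.ZipFile` instead of `buffer`; the real XML uses namespaced tags, so the `root.iter("p")` lookup here would need adjusting.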
As an aside: vector graphics vs. bitmap graphics:
What are pros and cons of each?
For sound, compare:
Beethoven's Fifth on piano,
vs.
midi-with-simple-tones
(or
midi-with-good-tones)
There are also fusions of the two (rather, vector graphics where bitmaps are one primitive):
E.g. badger-badger-badger.swf.
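One way to start on the pros-and-cons question is to look at what scaling means for each representation. The sketch below is a toy model, not a real image format: the bitmap is a 2x2 grid of 0/1 values, the "vector image" is a single circle stored as three numbers, and all names are made up for this example.

```python
# A bitmap stores the pixels themselves; here a 2x2 image of 0/1 values.
bitmap = [[0, 1],
          [1, 0]]

def upscale_bitmap(pixels, factor):
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    return [[value for value in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]

# A vector image stores how to draw; here one circle as centre and radius.
circle = {"cx": 5.0, "cy": 5.0, "r": 2.5}

def scale_vector(shape, factor):
    """Scaling the instructions just multiplies the numbers."""
    return {key: value * factor for key, value in shape.items()}

big_bitmap = upscale_bitmap(bitmap, 2)   # four times as much data, no new detail
big_circle = scale_vector(circle, 2)     # same three numbers, still an exact circle
print(big_bitmap)
print(big_circle)
```

The enlarged bitmap is just bigger blocks of the same pixels, while the enlarged circle can be redrawn at full sharpness. On the other hand, the grid can hold any picture at all (a photograph, say), which the three-number description cannot.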
Keeping data in XML format
has several nice properties:
- It's a string-format which easily represents hierarchical information.
- It's human-readable (compared with a proprietary binary format often used otherwise).
- There are many nice tools for working with it:
  - syntax validators
  - triangular buttons for showing/hiding sub-trees
  - syntax highlighters
  - parsers (incl. sanitizing)
  - etc.
If you make your own information and store it as XML, you get all
these tools and libraries for free.
- You have a default format for serializing your information.
It's worth noting that other formats also achieve this, notably
json (reasonable)
and
yaml (not preferred: too many surprising exceptions).
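As a small sketch of "a default format for serializing", here is the same record written out as XML and as JSON using only Python's standard library. The record, its field names, and the helpers `to_xml` / `from_xml` are all made up for this example, and the XML layout (one child element per field) is just one of several reasonable choices.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are invented for this example.
song = {"title": "Badger", "seconds": 75}

def to_xml(record, tag):
    """Serialize a flat dict as one element with a child element per field."""
    root = ET.Element(tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse it back; note that every value comes back as a string."""
    return {child.tag: child.text for child in ET.fromstring(text)}

xml_text = to_xml(song, "song")
json_text = json.dumps(song)
print(xml_text)
print(json_text)
```

Both round-trip the structure, but JSON keeps the number as a number, whereas this XML encoding returns `"75"` as text unless you add your own conventions (or a schema) for types.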
This page licensed CC-BY 4.0 Ian Barland. Page last generated 2018.Apr.09 (Mon). Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu