SHAKSPER 2008: XML (eXtensible Markup Language)
From: Hardy M. Cook (editor@SHAKSPER.NET) Date: 05/21/08
The Shakespeare Conference: SHK 19.0308 Wednesday, 21 May 2008
[1] From: Gabriel Egan <mail@GabrielEgan.com>
Date: Tuesday, 20 May 2008 17:37:14 +0100
Subj: Re: SHK 19.0306 XML (eXtensible Markup Language)
[2] From: Michael Best <mbest1@uvic.ca>
Date: Tuesday, 20 May 2008 16:17:17 -0700
Subj: Re: SHK 19.0306 XML (eXtensible Markup Language)
[1]-----------------------------------------------------------------
From: Gabriel Egan <mail@GabrielEgan.com>
Date: Tuesday, 20 May 2008 17:37:14 +0100
Subject: 19.0306 XML (eXtensible Markup Language)
Comment: Re: SHK 19.0306 XML (eXtensible Markup Language)
Hardy Cook asks:
>So my first question is do Office 2007's files in
>XML save to "a format that will be reliably modified
>to work on any future system"? In other words, is there
>the same problem with Microsoft's XML standard
>as there is with its HTML standard?
Yes, the problem is almost precisely the same. Microsoft's
implementation of XML is crippled in most versions of its software. To
understand how, it's necessary to know a little about XML. Taking HTML
as the starting point, those who know HTML will agree that there are
predefined 'tags' that one can put around elements in the text. Thus, a
paragraph of text begins with a <p> tag and ends with a </p> tag, and an
italicized word begins with an <i> tag and ends with an </i> tag.
As well as defining the tags, the HTML standard defines certain rules
about the tags and the relationships between them. For example, tags are
in general embedded, one within another, like Russian dolls. If a word
'house' is to be both italicized and underlined, the tags must be paired
like this: <i><u>house</u></i>, and not overlapped like this:
<i><u>house</i></u>.
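
The Russian-doll rule can be checked mechanically. The following is only a minimal sketch in Python, not how any browser or HTML validator actually works: the function name is_properly_nested is made up for illustration, and it assumes every tag has a closing partner (so tags such as <br> are not handled).

```python
import re

def is_properly_nested(markup: str) -> bool:
    """Return True if each closing tag matches the most recently opened tag."""
    open_tags = []  # stack of tag names that are still open
    for slash, name in re.findall(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>", markup):
        name = name.lower()
        if not slash:
            open_tags.append(name)          # opening tag: push it
        elif not open_tags or open_tags.pop() != name:
            return False                    # closing tag does not match the innermost open tag
    return not open_tags                    # nothing may be left open at the end

nested = is_properly_nested("<i><u>house</u></i>")
overlapped = is_properly_nested("<i><u>house</i></u>")
print(nested, overlapped)
```

The stack is what makes this a nesting check: a closing tag is only accepted if it closes the innermost tag still open, which is exactly what the overlapped example violates.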
The definitions of the tags and the rules that govern their
relationships are built into the HTML standard; indeed, that's all HTML
is: the standard.
XML works the same way, except that rather than us all agreeing on the
tags and the rules beforehand, XML allows the user to define the tags
and the rules. Thus for any XML document there have to be two texts:
the document itself and the 'schema' that defines the tags and the
rules. Because XML is really just the standard for writing schemas, all
sorts of disparate kinds of data can be represented in XML. Once you've
defined the schema for, say, the representation of questions in a
multiple-choice online quiz, you've created a new tagging standard
rather like HTML, but one suited to your purpose. (Of course, this has
already been done and the result is QML, or Question Markup Language. If
your online quiz software is QML compliant, all quizzes written in
conformance with QML will work on your system. That's where the claim
of inter-operability comes in whenever people extol the virtues of XML.)
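
To make the pairing of document and rules concrete, here is a rough sketch in Python. Everything in it is invented for illustration: the tag names (quiz, question, prompt, choice) are not taken from QML, and the RULES table is a toy stand-in for a real schema language such as a DTD or XML Schema, which can express far more than which element may sit inside which.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: for each element, the child elements it may contain.
RULES = {
    "quiz": {"question"},
    "question": {"prompt", "choice"},
    "prompt": set(),
    "choice": set(),
}

def conforms(element, rules=RULES):
    """Return True if the element and all its descendants obey the rules table."""
    if element.tag not in rules:
        return False  # a tag the rules never defined
    for child in element:
        if child.tag not in rules[element.tag] or not conforms(child, rules):
            return False  # child not allowed here, or invalid further down
    return True

good = ET.fromstring(
    "<quiz><question>"
    "<prompt>Who wrote Hamlet?</prompt>"
    "<choice>Shakespeare</choice><choice>Marlowe</choice>"
    "</question></quiz>"
)
bad = ET.fromstring("<quiz><prompt>A prompt outside any question</prompt></quiz>")

good_ok = conforms(good)
bad_ok = conforms(bad)
print(good_ok, bad_ok)
```

Both documents are well-formed XML, but only the first follows the agreed rules. Any software that knows the same rules can rely on that, which is the basis of the inter-operability claim.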
The problem with Microsoft's implementation of XML is that you don't get
to write the schema of a Word document unless you buy the most expensive
variant ('Enterprise' or 'Professional' edition) of the software. All
ordinary users find that their '.docx' Word documents are written to a
predetermined schema supplied by Microsoft called WordML.
Unsurprisingly, it's execrable and works with nothing else: it was
designed merely as an embodiment of the proprietary format Microsoft was
already using for Word files (the '.doc' format). The point was to give
the appearance that Microsoft had gone over to an Open Standards
philosophy, while maintaining proprietary control.
Gabriel Egan
[2]-----------------------------------------------------------------
From: Michael Best <mbest1@uvic.ca>
Date: Tuesday, 20 May 2008 16:17:17 -0700
Subject: 19.0306 XML (eXtensible Markup Language)
Comment: Re: SHK 19.0306 XML (eXtensible Markup Language)
Hardy M. Cook wrote:
>My second question is to Michael: How are files for the
>Internet Shakespeare Editions encoded into XML? Are
>they encoded manually or do you use a program to
>perform the encoding? And if so, what is that program
>or what is the process?
This is an excellent and deceptively simple question. As Hardy will
realize, since he has been working on the poems for the Internet
Shakespeare Editions, Shakespeare's texts are complex, and our aims
ambitious. We aim to encode, in the old-spelling texts now on the site,
a great deal of information about both the semantic structure of the
plays (how they are divided into acts, scenes, speeches, and so on), and
about the physical structure of the books they were published in, with
their division of pages, columns, and physical lines. Normal XML does
not deal elegantly with this level of complexity, and has to privilege
one of these structures. Our response has been to encode the plays and
poems initially in an earlier, more flexible standard (SGML -- Standard
Generalized Markup Language), from which we generate separate XML files
for the different structures.
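
The idea of one source feeding several XML files can be sketched as follows. This is not the ISE software or its SGML encoding, neither of which is described here; it is a hypothetical illustration in Python in which each chunk of text records both the speech and the printed line it belongs to, and the sample speakers, line numbers, and words are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat source: (speaker, printed line number, text).
# Speech "A" runs over two printed lines; printed line 2 holds parts of two speeches,
# so the two structures overlap and cannot share one strictly nested tree.
CHUNKS = [
    ("A", 1, "one"),
    ("A", 2, "two"),
    ("B", 2, "three"),
]

def build_view(chunks, root_tag, group_tag, attr, key_index):
    """Group consecutive chunks that share a key into elements under one root."""
    root = ET.Element(root_tag)
    current_key = object()  # sentinel that equals no real key
    group = None
    for chunk in chunks:
        key = chunk[key_index]
        if key != current_key:
            # key changed: start a new speech (or line) element
            group = ET.SubElement(root, group_tag, {attr: str(key)})
            group.text = chunk[2]
            current_key = key
        else:
            group.text += " " + chunk[2]
    return root

semantic = build_view(CHUNKS, "scene", "speech", "speaker", 0)  # privileges speeches
physical = build_view(CHUNKS, "page", "line", "n", 1)           # privileges printed lines

print(ET.tostring(semantic, encoding="unicode"))
print(ET.tostring(physical, encoding="unicode"))
```

Each output is an ordinary, strictly nested XML tree, but each privileges a different structure: the first keeps speeches whole, the second keeps printed lines whole.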
Unfortunately there is as yet no program that simplifies the process of
encoding files of this kind. We have developed our own software to
generate the XML files, and use Oxygen -- a powerful XML editor -- to
work with them. Our general principle is to use Open Source software
where possible, because it adheres more closely to accepted standards
than much proprietary software. As Hardy comments, Microsoft software in
general fails to follow the standards set by the ISO (International
Organization for Standardization, www.iso.org); I have not looked deeply
at their XML, but I do know that it is very difficult to work with.
Perhaps others on the list will be able to respond more fully.
Cheers--
Michael
Coordinating Editor, Internet Shakespeare Editions
<http://internetshakespeare.uvic.ca/>
Department of English, University of Victoria
Victoria B.C. V8W 3W1, Canada.
_______________________________________________________________
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook, editor@shaksper.net
The S H A K S P E R Web Site <http://www.shaksper.net>
DISCLAIMER: Although SHAKSPER is a moderated discussion list, the
opinions expressed on it are the sole property of the poster, and the
editor assumes no responsibility for them.