SHAKSPER 2004: More about the Nameless Shakespeare
From: Hardy M. Cook (editor@shaksper.net) Date: 12/14/04
The Shakespeare Conference: SHK 15.2101 Tuesday, 14 December 2004
From: Martin Mueller <martinmueller@northwestern.edu>
Date: Monday, 13 Dec 2004 14:01:04 -0600
Subject: More about the Nameless Shakespeare

We have made some progress with The Nameless Shakespeare in the context of our WordHoard project. You may now download the texts from www.library.northwestern.edu/shakespeare. More usefully, you may order it from http://www.natcorp.ox.ac.uk/babyinfo.html as part of the XAIRA CD-ROM, which includes XAIRA, an XML-aware search engine, and the one million word sample of the British National Corpus. Alternatively, you could sign up as a beta tester for XAIRA (http://www.oucs.ox.ac.uk/rts/xaira/) and download the texts from our site.

I would also like to tell you about a new feature of the current interface to the Nameless Shakespeare at www.library.northwestern.edu. If you click on any line in the text, you are taken to a transcription of the relevant column in the Folio text, with the hit line marked in red. This means that for any reader of the modern text, information about the orthography and punctuation of the Folio is only a couple of seconds away. The transcriptions come to us courtesy of the Internet Shakespeare Editions.

The Nameless Shakespeare is a TEI-encoded, lemmatized, and morphosyntactically tagged text of the plays and poems of Shakespeare. It is based on a thorough revision of the Globe Shakespeare. It is a modern-spelling edition that tries to preserve the morphological and prosodic features of the Folio and Quarto source texts. The header document to the text files describes the editorial and tagging procedures in some detail.

The raw files of the Nameless Shakespeare are not meant to be 'human readable' texts.
Not much pleasure or wisdom can be got out of looking at something like

  <l part="N" id="sha-juc101001"><w wt="av" pos="av">Hence</w><c>!</c>
  <w wt="n" m="sg" pos="n">home</w><c>,</c>
  <w wt="pnp" m="2pl" pos="pnp">you</w>
  <w wt="aj" pos="aj">idle</w>
  <w le="creature" wt="n" m="pl" pos="n">creatures</w>
  <w wt="v" m="pr" pos="v">get</w>
  <w wt="pnp" m="2pl" pos="pnp">you</w>
  <w wt="av" pos="av">home</w><c>:</c>
  <lb n="JuC.6"/></l>

which is the fully encoded first line of Julius Caesar. If you look closely at this hideously verbose encoding you will notice that it spells out in tedious detail some very primitive facts that every minimally competent reader will bring to the task of decoding the words on the page. This does little good when you look at the text word by word. But with the right kind of search tool (such as XAIRA or the WordHoard tools we are developing) this information can serve as the point of departure for many stylistic inquiries.

The tagging of the Nameless Shakespeare was done automatically but went through several rounds of manual error checking. I believe that there is a residual error rate of 0.7%. An error rate this low is virtually negligible for any quantitative inquiry. On the other hand, it means that in a play of 20,000 words something is wrong with about 150 tags. I would like to get this error rate much closer to zero and will be grateful for any corrections. There is an error report form at www.library.northwestern.edu/shakespeare.

If there are volunteers who are attracted by the thought of chasing errors, I can provide them with Excel files that show a "verticalized" form of the text, in which you read downward row by row and see the morphosyntactic tag next to the word, with a special column for marking an error. Errors discovered in one play can be automatically related to the same errors occurring elsewhere. Thus a volunteer who finds 150 errors in one play is likely to correct 300-500 errors across the corpus.
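
For readers who want to see what a "verticalized" view of the encoding might look like, here is a minimal sketch in Python, assuming only the sample line quoted in this post. It is not the WordHoard or XAIRA tooling and not the layout of the actual Excel files. The function name `verticalize`, the choice of columns, and the rule that a word without an `le` attribute is its own lemma in lower case are all assumptions made for illustration; reading `pos` as part of speech and `m` as morphology is likewise a guess from the sample.

```python
import xml.etree.ElementTree as ET

# The encoded first line of Julius Caesar, as quoted in the post.
LINE = (
    '<l part="N" id="sha-juc101001">'
    '<w wt="av" pos="av">Hence</w><c>!</c> '
    '<w wt="n" m="sg" pos="n">home</w><c>,</c> '
    '<w wt="pnp" m="2pl" pos="pnp">you</w> '
    '<w wt="aj" pos="aj">idle</w> '
    '<w le="creature" wt="n" m="pl" pos="n">creatures</w> '
    '<w wt="v" m="pr" pos="v">get</w> '
    '<w wt="pnp" m="2pl" pos="pnp">you</w> '
    '<w wt="av" pos="av">home</w><c>:</c> '
    '<lb n="JuC.6"/></l>'
)


def verticalize(line_xml):
    """Return one row per <w> element: (line id, word, lemma, pos, morphology)."""
    line = ET.fromstring(line_xml)
    line_id = line.get("id", "")
    rows = []
    for w in line.iter("w"):
        form = w.text or ""
        # Assumption: when no "le" attribute is given, the lemma is taken
        # to be the lower-cased word form itself.
        lemma = w.get("le", form.lower())
        rows.append((line_id, form, lemma, w.get("pos", ""), w.get("m", "")))
    return rows


rows = verticalize(LINE)

if __name__ == "__main__":
    # Print tab-separated rows, one word per row, to be read downward.
    for row in rows:
        print("\t".join(row))
```

Each row puts the tag next to the word, so a proofreader scanning downward could add a further column to flag a wrong tag. Punctuation (`<c>`) and the line-break marker (`<lb/>`) are deliberately left out of this sketch.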
Error chasing of this kind is a boring but painless and effective way of doing a little philological good in the world. If you're interested, please write to me at martinmueller@northwestern.edu.

Martin Mueller
Professor of English and Classics
Department of English
Northwestern University
Evanston, Illinois 60208
martinmueller@northwestern.edu
847-864-3496

_______________________________________________________________
S H A K S P E R: The Global Shakespeare Discussion List
Hardy M. Cook, editor@shaksper.net
The S H A K S P E R Web Site <http://www.shaksper.net>

DISCLAIMER: Although SHAKSPER is a moderated discussion list, the opinions expressed on it are the sole property of the poster, and the editor assumes no responsibility for them.