Copyright Online, Incorporated Dec 1999

The movie Shakespeare in Love rekindled interest not only in the Bard's plays but also in the life of the playwright. Last April, Harper's published "The Ghost of Shakespeare," an article in which ten essayists debated the true author of the plays: William Shakespeare of Stratford or Edward de Vere, 17th Earl of Oxford.
Jonathan Bate, a professor of English at the University of Liverpool, noted that the first collected edition of Shakespeare's plays, the 1623 edition known as the First Folio, "was adorned with Martin Droeshout's famous woodcut of the dramatist, his forehead domed like the Globe, as if to gesture toward the name of his theater and the universality of his genius." Bate also noted that many parts of the Folio's front matter (the introductory text and commendatory poems) provide evidence that supports the Stratfordian position.
It's difficult to read Bate's essay without wanting to see the First Folio. Fortunately for literary students and scholars worldwide, it is now available on the Web through Early English Books Online (EEBO). Designed for academic libraries, EEBO offers images of the original pages of more than 96,000 historical literary works.
The database will create unprecedented research opportunities in many academic fields, and it might even cause us to reevaluate our knowledge of history. As one head librarian said, it will help "bring literature alive."
FILLING A DIGITAL VAULT
Literature, linguistics, history, religion, art, and music are just a few of the areas for which the Early English Books collection provides unique research opportunities. Shakespeare, Chaucer, Spenser, Malory, Bacon, Moore, Boyle, Newton, and Galileo are just a few of the authors represented.
Clive Hurst, Head of Rare Books & Printed Ephemera at Oxford University's Bodleian Library, pointed out that "the collection represents 80% of the total surviving record of the English-speaking world, from 1475 to 1700." He said it contains everything "from single-sheet ballads to the great Greek Chrysostom printed in eight folio volumes at Eton in 1610, from an Elizabethan bookplate to the King James Bible."
The collection figures prominently in the history of the company now known as Bell & Howell Information and Learning (formerly UMI). When the company started doing business in 1938, it offered only one product: the Early English Books series on microfilm. Images of the books' pages were captured from the originals in the British Museum.
Now, over 60 years later, the digitization of the microfilm marks a new beginning for the company: it's the first phase in the Digital Vault Initiative. The company is creating a virtual vault by digitizing hundreds of thousands of books, newspapers, and periodicals (over 5.5 billion pages) stored in three real vaults at the company's headquarters in Ann Arbor, Michigan. Scanning the material will continue for years, but titles from Early English Books Online became available through ProQuest Direct (http://www.umi.com/hp/Features/Dvault), the company's online subscription-based service, in December 1998.
EEBO includes all the works (more than 22 million pages) represented in the microfilm series Early English Books I & II, which include the titles listed in Pollard and Redgrave's bibliography, A Short-Title Catalogue of Books Printed in England, Scotland, & Ireland and of English Books Printed Abroad, 1475-1640, as well as Donald Wing's A Short-Title Catalogue of Books Printed in England, Scotland, Ireland, Wales, and British America, and of English Books Printed in Other Countries, 1641-1700.
To date, Bell & Howell Information and Learning has produced 187 units of the collection on microfilm and continues to release five more units every year. According to the company's Senior Product Manager Austin McLean, "Early English Books Online is a complement, not replacement, for the higher-resolution images that exist on microfilm."
EEBO also includes material from the Thomason Tracts microfilm collection, which is a compendium of broadsides on the English Civil War.
GOALS AND SPECIFICATIONS
McLean said the ultimate goal of EEBO is "to provide scholars with a single source for research on Early English Books, including bibliographic citations, full-page representations of all images, and ASCII-encoded text."
In 1997, the company's EEBO project team worked with librarians, library directors, and university faculty members to assess needs, determine features for the database, and investigate how digitized literature could enhance primary areas of scholarship.
The managers also studied other initiatives based on the digitization of microform images, including Project Open Book (http://www.library.yale.edu/preservation/pobweb.htm), which investigated the feasibility of digitizing 100,000 volumes, and the Internet Library of Early Journals (http://rsl.ox.ac.uk/ilej/), which scanned 18th and 19th century journals and put them online.
The research led the EEBO team to develop several specifications for the database:
* For each work, the bibliographic information (a MARC record) must be standardized and imported into a Fulcrum database.
The original records, compiled by Bell & Howell Information and Learning catalogers for more than 20 years, contained all the necessary information, but it needed to be reorganized. Titles and author names appeared in multiple fields, in slightly differing formats. The company's employees collapsed several fields and standardized the names.
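The cleanup described above can be sketched in Python. This is a minimal, hypothetical illustration, not the company's actual conversion routine: the field names (`author`, `author_alt`, `added_entry`), the sample record, and the normalization rules (collapse whitespace, strip trailing punctuation, drop duplicates) are all assumptions, and real MARC processing would work with numbered tags and subfields instead.

```python
import re

# Hypothetical field names standing in for the several places a name might appear;
# real MARC records use numbered tags and subfields, which are not modeled here.
NAME_FIELDS = ("author", "author_alt", "added_entry")


def normalize_name(raw: str) -> str:
    """Collapse runs of whitespace and strip trailing punctuation from a name."""
    text = re.sub(r"\s+", " ", raw).strip()
    return text.rstrip(" ,.;:")


def collapse_names(record: dict) -> dict:
    """Merge several name fields into one de-duplicated list, keeping first-seen order."""
    names: list[str] = []
    for field in NAME_FIELDS:
        value = record.get(field)
        if value:
            name = normalize_name(value)
            if name and name not in names:
                names.append(name)
    # Keep every non-name field unchanged and add the single collapsed field.
    cleaned = {key: val for key, val in record.items() if key not in NAME_FIELDS}
    cleaned["names"] = names
    return cleaned


# Hypothetical record with the same author entered in two slightly different formats.
record = {
    "title": "Mr. William Shakespeares comedies, histories, & tragedies",
    "author": "Shakespeare, William, ",
    "author_alt": "Shakespeare,  William.",
    "added_entry": "Jonson, Ben",
}
print(collapse_names(record)["names"])  # ['Shakespeare, William', 'Jonson, Ben']
```

The two variant spellings of Shakespeare's name reduce to one entry, which is the kind of result the standardization aimed for.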
* The images, which are linked to the bibliographic data, must be delivered quickly.
To meet this specification, the project team reviewed various scanning resolutions and decided to create and deliver the images, which are CCITT Group 4 TIFF files, at 400 dpi. This falls within the range of 75 to 600 dpi that other digitization projects have used and allows relatively high-quality images to be delivered at an acceptable speed.
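A little arithmetic shows why 400 dpi rather than 600 dpi helps delivery speed: pixel count grows with the square of the resolution. The sketch below assumes a hypothetical 8 x 10 inch page area and measures dpi against that area. It computes raw sizes at 1 bit per pixel (CCITT Group 4 is a bi-level format), so actual compressed file sizes will not scale exactly this way.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a page area of the given size (inches) scanned at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi)


def raw_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size at 1 bit per pixel, rounded up to whole bytes."""
    return (pixel_count(width_in, height_in, dpi) + 7) // 8


# Hypothetical 8 x 10 inch page area; the article does not give page dimensions.
for dpi in (75, 400, 600):
    print(dpi, pixel_count(8, 10, dpi), raw_bytes(8, 10, dpi))

# Share of the 600 dpi pixel count needed at 400 dpi: (400/600)**2, about 0.44
print(pixel_count(8, 10, 400) / pixel_count(8, 10, 600))
```

Under these assumptions, the area holds 12,800,000 pixels (about 1.6 MB raw) at 400 dpi versus 28,800,000 pixels (about 3.6 MB raw) at 600 dpi, so the lower setting moves well under half the data.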
For online viewing, the EEBO team decided to use the DjVu image-compression technique from AT&T Labs (http://djvu.research.att.com/home_mstr.htm). It provides several features that allow manipulation of the images. For example, users can zoom in and out and print pages one at a time. At this writing, however, the company is reevaluating the technology. "We're looking for the fastest delivery of images possible," McLean said. "DjVu seems very fast, but we're always evaluating new ways to deliver this content, since we have a flexibility that we didn't have with the microfilm. Also, we've had comments from librarians who say, 'Oh, no, not another plug-in.'"
Besides viewing the images online, EEBO users also can download them as PDF documents.
* The Web interface to the database must be suitable for novice users as well as seasoned scholars.
This specification led the EEBO team to create an interface that allows browsing by subjects such as history, literature, and religion, as well as searching by keyword, author, or title. EEBO also includes an advanced interface that makes it easy to build Boolean queries; limit a search to a specific source library, language, or collection; review search histories; and combine previous sets of results.
The results include brief bibliographic citations. After receiving them, researchers can choose to view more detailed citations, mark specific documents for printing or downloading, and sort results alphabetically, by relevance, or by date.
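The advanced-interface features described above, combining earlier result sets with Boolean operators and re-sorting citations, can be illustrated with ordinary set operations. This is a hypothetical sketch, not EEBO's search engine: the `Citation` fields, the document IDs, the relevance scores, and the operator names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doc_id: int
    title: str
    year: int
    relevance: float  # hypothetical score a search engine might assign


def combine(a: set[int], b: set[int], op: str) -> set[int]:
    """Combine two earlier result sets of document IDs with a Boolean operator."""
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "NOT":  # documents in a but not in b
        return a - b
    raise ValueError(f"unsupported operator: {op}")


# Each sort order maps to (key function, reverse flag).
SORT_ORDERS = {
    "alphabetical": (lambda c: c.title.lower(), False),
    "relevance": (lambda c: c.relevance, True),  # highest score first
    "date": (lambda c: c.year, False),  # oldest first
}


def sort_results(citations: list[Citation], order: str) -> list[Citation]:
    key, reverse = SORT_ORDERS[order]
    return sorted(citations, key=key, reverse=reverse)


# Hypothetical search history: set 1 from an author search, set 2 from a keyword search.
history = {1: {101, 102, 103}, 2: {102, 103, 104}}
set3 = combine(history[1], history[2], "AND")  # {102, 103}

catalog = {
    102: Citation(102, "Novum organum", 1620, 0.7),
    103: Citation(103, "Essayes", 1597, 0.9),
}
for c in sort_results([catalog[i] for i in set3], "date"):
    print(c.year, c.title)
```

Keeping each earlier search as a numbered set is one simple way to let users reuse their search history; the sorting step only reorders the brief citations and never changes which documents were found.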
* A partnership with libraries and universities must be established to foster the creation of text files to accompany the page images.
Bell & Howell Information and Learning, Oxford University, and the University of Michigan will launch the initial five-year phase of the text-encoding initiative early next year. BHIL, which is putting one-and-a-half million dollars into the project, is partnering with 150 libraries willing to invest $10,000 annually over the course of the five years. In return, the partners get not only access to the text collection, but also co-ownership of it.
"Partner libraries can mount some of the text or all of the text on their own university Web sites," McLean said, "or they can access the text through Early English Books Online. So their $50,000 investment gets them ownership of what we hope to be a $9-million pool of converted text." Partners also will be able to help select the materials that will be encoded first, and they will be able to play a role in establishing guidelines for the process.
Partnership is open to libraries with subscriptions to EEBO. An annual subscription costs about $12,500 if the university subscribes to Early English Books on microfilm. Perpetual access to the database costs a one-time fee of $93,750, plus annual maintenance charges. Worldwide, about 150 institutions have bought perpetual access or subscribed to the microfilm series.
For institutions that aren't subscribers, EEBO costs about $30,000 per year, or $300,000 for perpetual access. These are the prices for universities with a full-time enrollment over 7,500 and a doctoral program; lower pricing is available for smaller institutions.
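The text-encoding partnership figures quoted earlier are consistent with each other, assuming the $9-million pool counts only the partners' fees and BHIL's own contribution (the article does not break the total down): 150 partners paying $10,000 a year for five years contribute $7.5 million, and BHIL's $1.5 million brings the total to $9 million. A quick check:

```python
# Figures from the article; treating the pool as the sum of these two sources is an assumption.
PARTNERS = 150
ANNUAL_FEE = 10_000  # dollars per partner per year
YEARS = 5
BHIL_CONTRIBUTION = 1_500_000  # dollars

per_partner = ANNUAL_FEE * YEARS  # each partner's total investment
partner_total = PARTNERS * per_partner
pool = partner_total + BHIL_CONTRIBUTION

print(f"per partner: ${per_partner:,}")  # per partner: $50,000
print(f"all partners: ${partner_total:,}")  # all partners: $7,500,000
print(f"pool: ${pool:,}")  # pool: $9,000,000
```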
THE SCANNING PROCESS
Early English Books microfilm is stored on 1,000-foot reels. They contain second-generation negative masters, which are direct image copies of the original negatives. Originals were not used for EEBO so they wouldn't be subjected to wear and tear. Image degradation wasn't a concern, because the difference between the originals and the second-generation is undetectable at 400 dpi.
Bell & Howell Information and Learning employees began scanning the 96,000 titles in June 1998, and finished last summer. The process started in the company's Vault Duplication Department, where employees cleaned the film, visually inspected each frame for damage, and then delivered the rolls to the Scanning Division of the Xerography department. Five scanners dedicated to the project were in use twenty-four hours per day, seven days per week. Scanning operators were trained for two months so they could become familiar with the EEBO content as well as the production system.
| Subscribers to Early English Books Online can view images of Shakespeare's First Folio and 96,000 other historical works. |
The operators monitored the scanning process on an image-detect screen. A blue line crossing the center of it represented a scan field that sensed the microfilm and created a graph beneath the image window. Data on the graph represented the densities (number of pixels) of the images. Spikes appeared when the scan field crossed a page with a dense image such as an illustration. To create the best-quality image, the scanning operator had to manually set an average line between the peaks and valleys of the graph while avoiding spikes. This challenged the operators' skills because much of the Early English Books collection contains dense illustrations.
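The operators' manual judgment, an average line between peaks and valleys that ignores spikes, can be approximated in code. This sketch is hypothetical and is not the scanners' software: it treats any reading more than `k` median absolute deviations above the median as a spike (both the rule and the default `k = 3` are assumptions), then places the line midway between the highest and lowest remaining readings. The sample density values are invented for illustration.

```python
from statistics import median


def threshold_line(densities: list[float], k: float = 3.0) -> float:
    """Place a line midway between peaks and valleys, ignoring high spikes.

    A reading counts as a spike if it lies more than k median absolute
    deviations (MAD) above the median. At least half the readings are at or
    below the median, so the filtered list is never empty.
    """
    med = median(densities)
    mad = median([abs(d - med) for d in densities])
    typical = [d for d in densities if d <= med + k * mad]
    return (max(typical) + min(typical)) / 2


# Invented density profile; 240 stands in for a dense illustration.
profile = [40, 55, 42, 58, 45, 60, 240, 50, 44]
print(threshold_line(profile))  # 50.0
print((max(profile) + min(profile)) / 2)  # 140.0, where the spike would pull the line
```

The comparison in the last line shows why avoiding spikes mattered: a single dense illustration would otherwise drag the line far above the typical page densities.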
The images were scanned and inspected for quality, and illustrations were indexed. The company's research found that scholars considered the indexing of illustrations an especially important addition to the database.
The images were stored on a hard drive and, after 650 megabytes of data were compiled, written to a CD-ROM, which then was added to the EEBO CD-ROM jukebox towers, each of which holds 5,280 discs.
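These storage figures translate into a rough capacity per tower. The sketch assumes every disc is filled to the 650 megabytes stated in the article and keeps the article's megabytes without converting units; the 1,000,000 MB collection size in the example is hypothetical.

```python
import math

DISC_MB = 650  # data written to each CD-ROM, per the article
DISCS_PER_TOWER = 5_280


def tower_capacity_mb() -> int:
    """Total megabytes one fully loaded jukebox tower can hold."""
    return DISC_MB * DISCS_PER_TOWER


def discs_needed(total_mb: float) -> int:
    """Discs required if images are written out in 650 MB batches."""
    return math.ceil(total_mb / DISC_MB)


print(tower_capacity_mb())  # 3432000 (MB), about 3.4 million megabytes
print(discs_needed(1_000_000))  # 1539 discs for a hypothetical 1,000,000 MB
```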
Because EEBO was designed as an extension of the microfilm series, each scanned image represents one film exposure. In most cases, the exposure contains two printed pages because the books originally were filmed with their bindings intact. Therefore, the scanned images resemble the "open book" images on the microfilm.
RESHAPING RESEARCH
The University of Texas System libraries were the first customers for Early English Books Online. According to Dennis Dillon, head of Collections and Information Resources at the University of Texas at Austin, representatives of the fifteen libraries in the university system vote on the information resources they want to acquire, and they voted unanimously for Early English Books Online. "It's a first for us," Dillon noted. "It's the first unanimous vote we've had among all the institutions."
He said the database was so well received because it "will bring literature alive for students. One of the other library directors said English faculty members were in his office talking about how excited they were to have access to this resource. They plan to use it in classroom instruction because it will help the students understand the historical nature of the works."
He elaborated: "Having these resources with their archaic type styles and grammatical conventions, along with their more-primitive types of imagery, on the Web and readily available to today's students, will help bring the reality of man's history home to them, and make it easier for them to actually see the gradual nature of cultural improvements. They will be able to easily trace both the development of the conventions of the printed word and the development of western cultural ideas."
Dillon noted that EEBO "is a unique source," but it also "complements other literature resources, such as indices or full-text works devoted to later time periods. It fills a gap in our online literature coverage, and allows us to present a more complete online picture of the development of English literature."
Dillon also said the database will be popular with students because of the way they perform research today: "It is almost impossible to get students to use microfilm. They simply aren't in the library as much as they were in earlier generations, and they are less likely to stumble across classic works of literature or living pieces of history."
Oxford's Clive Hurst noted that "teaching will be transformed. A class can execute searches and see the results instantly, a far cry from calling up, say, fifteen microfilms and laboriously spooling through to find the text you want, or making arrangements with a special-collections librarian to show specific originals. Also, the database can be accessed from an office or study: you don't have to go to the library."
Both Hurst and Dillon also pointed out that increased access to the material will help scholars delve into unexplored bits of history. Dillon said the database brings Early English Books "to the front in a way they haven't been for several generations. Having these works on the Web makes them more available to scholars, who may once again mine them to find overlooked bits of our cultural heritage."
Hurst noted that the database may even reshape our view of history: "The ability to search for titles, imprints, and dates will mean that many books which haven't been read for hundreds of years will be consulted by scholars on this database. Our received views on the life of the 16th and 17th centuries will, in my opinion, have to be radically reassessed in the light of the evidence now readily accessible."
| An important goal of EEBO was to create a search interface suitable for novice users as well as seasoned scholars. |
| Much of the Early English Books collection is rich with illustrations. This is an image from a work by Galileo published in 1663. |
| Thomas Pack is a freelance writer who lives near Louisville, KY. Communications to the author should be addressed to ThomasPack@aol.com |