Wednesday, May 26, 2010

C.I.T.E. - The Infrastructure of the Homer Multitext (Part 1 - Introduction)

The Infrastructure of the Homer Multitext


     C · I · T · E


The Homer Multitext (HMT) is a project of the Center for Hellenic Studies of Harvard University (CHS). It is best described in the words of its editors, Casey Dué and Mary Ebbott:
“The Homer Multitext project, the first of its kind in Homeric studies, seeks to present the textual transmission of the Iliad and Odyssey in a historical framework. Such a framework is needed to account for the full reality of a complex medium of oral performance that underwent many changes over a long period of time. These changes, as reflected in the many texts of Homer, need to be understood in their many different historical contexts. The Homer Multitext provides ways to view these contexts both synchronically and diachronically.” (From the CHS website)
Dué and Ebbott, in collaboration with the Director of the CHS, Gregory Nagy, and the CHS’s Head of Publications, Leonard Muellner, initiated research toward this project with an eye to advancing particular arguments about the nature of Homeric poetry. But anyone interested in epic poetry, Greek poetry in general, or the intellectual history of the Greco-Roman world, the cultures that came into contact with it, and those that succeeded it, stands to profit from the project.

Overview

The HMT aims to collect, as comprehensively as possible, all of the sources for our knowledge of the Homeric epics, and to publish these online, freely accessible to any interested reader.

These sources include versions of the Iliad and Odyssey, and the surviving pieces of lesser-known epic poems born in the Greek Bronze Age. These versions may be fragments of papyrus found in the sands of Egypt or manuscripts produced under the Byzantine Emperors of Constantinople. These sources also include texts of later Greek and Roman writers who quote from Homer, writers such as Plato, Aristotle, Herodotus, and Thucydides. A particularly rich body of evidence comes from the writings of the literary scholars who worked in the Libraries of Alexandria and Pergamum; the works of these writers do not survive intact, but thousands of excerpts from them and references to them do survive, as comments written in the margins of manuscripts.

Dué and Ebbott are committed to providing the most useful access possible to these sources. This means offering texts of those sources in the original Greek and translated into modern languages where possible. It also means providing high-quality digital facsimiles of the actual manuscripts wherever possible.

It is impossible to overstate the value of digital facsimiles. The Greek and Latin texts that we can check out of libraries, or find online, are highly processed documents. Editors will compare different manuscripts of a work – which always differ – and produce a uniform text that is identical to no single medieval or ancient “witness” to the work. Responsible editors will provide notes explaining in what ways their edited text differs from particular manuscripts, but these notes – even the most meticulous – fall far short of providing the depth of information that can be gleaned from direct access to good images of the manuscripts themselves.

Scholarship based entirely on edited texts is fundamentally handicapped. However brilliant the scholars working from these texts may be, their insights will be limited by the absent editors of their source-texts, by their assumptions, and by the innumerable details that disappear on the journey from the hand-written manuscript, through generations of editions, to the shelves of the library. 

For the past century, scholars of Greece and Rome have been content for the most part to work from edited texts. There were justifiable reasons for this – practical, technological, and economic reasons. None of those justifications survived the turn of the 21st Century.

In addition to texts and images, other kinds of data might shed light on Homeric poetry: morphological and lexical data, lists of persons, geographic information (where is “Sandy Pylos” or “Horse-rolling Thessaly”? Is a reference to Thebes pointing to Seven-Gated Thebes, or to Hundred-Gated Thebes in Egypt?), and so forth.


The Challenge

Bringing these disparate materials online in a useful way posed a challenge. The collaborators on the HMT wanted an all-purpose infrastructure that would both support end-user applications for browsing, searching, and reading and make the raw data available for discovery and retrieval.

Some kind of digital library infrastructure was necessary, but the complexity of the anticipated contents of that library posed another problem. A digital library that contains highly diverse data, and that is expected to expand indefinitely, must be exposed through protocols that define requests and responses. Those requests and responses should allow discovery of contents, access to objects, retrieval of parts of objects – passages of texts, data elements, parts of images – and querying, manipulation, and other kinds of processing.
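
To make the idea of "protocols that define requests and responses" a little more concrete, here is a toy sketch in Python. It is not the C.I.T.E. protocol and not code from the HMT; the class name, the request names ("list-objects", "get-object", "get-part"), and the sample identifier are all invented for illustration. It only shows the shape of the idea: a small, fixed set of requests covering discovery, access to whole objects, and retrieval of parts, each answered with a predictable response.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLibrary:
    """In-memory stand-in for a digital library (hypothetical, not HMT code)."""

    # Maps an invented object identifier to an ordered list of its parts.
    objects: dict[str, list[str]] = field(default_factory=dict)

    def handle(self, request: dict) -> dict:
        """Answer one request; requests and responses are plain dicts."""
        kind = request.get("request")
        if kind not in {"list-objects", "get-object", "get-part"}:
            return {"status": "error", "message": f"unknown request: {kind}"}

        if kind == "list-objects":  # discovery of contents
            return {"status": "ok", "objects": sorted(self.objects)}

        object_id = request.get("id")
        if object_id not in self.objects:
            return {"status": "error", "message": f"unknown object: {object_id}"}
        parts = self.objects[object_id]

        if kind == "get-object":  # access to a whole object
            return {"status": "ok", "id": object_id, "parts": list(parts)}

        # kind == "get-part": retrieval of one part of an object
        index = request.get("part")
        if not isinstance(index, int) or not 0 <= index < len(parts):
            return {"status": "error", "message": f"no such part: {index}"}
        return {"status": "ok", "id": object_id, "part": parts[index]}


# Placeholder data, not real project content.
library = ToyLibrary({"demo-text": ["first passage", "second passage"]})
print(library.handle({"request": "list-objects"}))
print(library.handle({"request": "get-part", "id": "demo-text", "part": 1}))
```

The point of keeping the request vocabulary this small is that any application, not just one written by the library's creators, can rely on it.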

Because the data is highly varied and its possible uses potentially infinite, a protocol that tried to match that complexity would turn the infrastructure into, essentially, an end-user application: usable only by its creators, fragile and difficult to maintain, and increasingly vulnerable to obsolescence as time goes by.

Almost a decade of thinking and experimentation went into defining a generic, scalable protocol that enables scholarly access to and use of these materials in a networked environment, as simply as possible.
This was mainly the task of the HMT’s Project Architects, Neel Smith and me, Christopher Blackwell.

Our answer is C.I.T.E., that is, Collections, Indices, Texts, and Extensions.

This looks like four things, but it is really only three: texts, collections, and indices. In our conception of the requirements of the Homer Multitext, we have reduced scholarship to these three kinds of digital object, have defined protocols for working with each, and have working code that implements each.
In the next installments of this series of postings, I will describe each element in the C.I.T.E. architecture in some detail. Finally, I will describe how they can be brought together to build rich applications for scholarship.


A Final Note

Any discussion of a “generic infrastructure for scholarship” will inevitably sound like the beginning of an evangelical spiel about how everyone needs to adopt the speaker’s pet approach to data. That is not our intention here. 

Our dear friend, the late Professor Ross Scaife, was once playing advocatus diaboli as I was describing our protocol for texts. “How many other projects need to adopt this protocol for it to be useful?” My colleague Neel had the answer: “One, ours.”

We have developed C.I.T.E. because we needed something like it in order to do what we want to do with the history of Homeric texts. I am describing it here because it is the foundation for much of the ongoing research of the HMT team, which we will also document here, and it might be of interest to other scholars working on similar projects.

All computer code developed for the HMT is free and open-source; all data published by the project is open-content under a Creative Commons or similar license.


Next… Part 2 - Texts


