18.
Electronic Texts: Audiences and Purposes
Perry Willett
History
… today, even the most reluctant scholar has at least an indefinite notion of the computer's ability to store, manipulate and analyse natural language texts, whilst his less wary colleagues are likely to meet with a sympathetic reception if they approach their local computing centre with proposals for literary or linguistic research – some universities, indeed, have set up institutes and departments whose special task it is to facilitate work of this kind.
Wisbey, The Computer in Literary and Linguistic Research: Papers from a Cambridge Symposium, vii
Claims of the ubiquity of electronic text, such as this one published in 1971, may seem exaggerated for an era before personal computers, before standards for character or text encoding, before the World Wide Web, yet they have more or less come true. What is remarkable about this and other writing on electronic texts in the humanities published before 1995 is how well it predicted the uses of electronic text, before many people were even aware that text could be electronic. In evaluating the statements of computing humanists, it is important to remember that the World Wide Web arrived only in the mid-1990s: prior to its arrival, while a great deal was written on the uses of and audiences for electronic text, almost no one foresaw such a powerful tool for the wide distribution of electronic texts, or that wide distribution for a general reading public would become the most successful use made of electronic texts.
It is well documented that the history of electronic text is almost as long as the history of electronic computing itself. Vannevar Bush famously imagined vast libraries available via the new technology in 1945. Father Roberto Busa began his pioneering effort to create the Index Thomisticus, the monumental index to the works of St Thomas Aquinas, in 1946 (Busa 1950, 1992). Other pioneers adopted computers and advocated their use in literary and linguistic research, with electronic texts as the necessary first ingredient. The same early study quoted above divides the world of humanities computing into several categories, as shown in its table of contents:
1 Lexicographical, textual archives, and concordance making
2 Textual editing and attribution studies
3 Vocabulary studies and language learning
4 Stylistic analysis and automated poetry generation
This list encompasses all aspects of humanities scholarship, from literary analysis and author attribution studies, to scholarly editing, to rhetoric and language studies, to the creation of archives of electronic text. While automated poetry generation has yet to find its audience, the other categories seem remarkably prescient. Humanities computing now includes media in other formats such as digital images, audio and video, yet early studies viewed electronic text as the starting point. Electronic text has indeed become common in humanities research and publishing. Many other early studies (Lusignan and North 1977; Hockey 1980; Bailey 1982) enthusiastically describe the potential that computers and electronic text hold for literary studies, and describe similar audiences.
These scholars developed a vision of the importance of electronic texts for the humanities, and developed the standards by which they are created. Humanists, precisely because of their sophisticated understanding of text, have a central role in the development of standards driving the World Wide Web, with far-reaching implications for what can be done today on the Internet.
Still, the kind of environment described by Wisbey in 1971, with computing centers sympathetic to humanists' concerns, existed at only a handful of universities and research centers at that time. One was more likely to find scientists and social scientists at such computer centers than literary scholars or historians. Universities might have had at most one or two humanities professors interested in electronic texts, leaving computing humanists largely isolated, with annual conferences and specialist journals the only opportunities to discuss issues, ideas, and developments with like-minded colleagues.
Skepticism about the use of electronic texts in humanities research has a long history also, and not just among traditionalists. Despite the best efforts of computing humanists, electronic text remained the domain of a few specialists into the early 1990s. In 1993, Mark Olsen wrote:
[c]omputer processing of textual data in literary and historical research has expanded considerably since the 1960s. In spite of the growth of such applications, however, it would seem that computerized textual research has not had a significant influence on research in humanistic disciplines and that literature research has not been subject to the same shift in perspective that accompanied computer-assisted research in the social science oriented disciplines, such as history. … In spite of the investment of significant amounts of money and time in many projects, the role of electronic text in literary research remains surprisingly limited.
(Olsen 1993–4: 309)
He goes on to list several reasons to support this assertion. Most importantly, Olsen notes a distrust of computing methodology among humanists at large, and believes that most research involving computing is too narrowly focused on one or two authors. Olsen spoke not as a traditionalist, for he was (and remains) the director of the pioneering collection of electronic text, the American and French Research on the Treasury of the French Language (ARTFL). He describes the potential of harnessing computers to large collections of literature, and the ability to trace concepts and important keywords over time both within and among various authors' works, along with more sophisticated kinds of analysis. He notes that in at least some disciplines, in the era prior to the World Wide Web, a critical mass of primary texts was available, pointing to the Thesaurus Linguae Graecae (TLG) for classics and ARTFL for modern French. Yet he still concludes that the potential goes largely ignored. Alison Finch in 1995 displayed even more skepticism in her analysis of computing humanists, as she explored the heroic narratives invoked by literary researchers who use computers, while she claimed that these studies lacked importance to the larger field of literary studies.
Other commentators, less technologically savvy, see darker implications. The best known of these critics, Sven Birkerts in The Gutenberg Elegies (1994) and Nicholson Baker in Double Fold (2001), mourn the changes wrought by the development of electronic media, and fear that books, once decoupled from their physical presence, will lose their meaning and historical importance. Birkerts, also writing pre-World Wide Web, in particular fears that the digitization of books may lead to the "erosion of language" and a "flattening of historical perspectives" (Birkerts 1994: 128–9). He does not consider other possible outcomes, such as one in which general readers and scholars alike have a better sense of the concerns and ideas of peoples and historical periods with increased access to works otherwise available in only a few libraries. The development of digital collections does not require the destruction of books; instead, it may provoke more interest in their existence and provide different opportunities for their study through keyword and structured searching.
More nuanced critiques recognize the disadvantages of digital technology, while exploring its uses and potentials. As Bornstein and Tinkle noted:
We agree with the proposition that the shift from print to digital culture has analogies in scale and importance to the shift from manuscript to print culture beginning in the Renaissance. Further, the change enables us to consider more strikingly than ever before the diverse characteristics of manuscript, print, and electronic cultures. This is particularly important now that new electronic technologies are making possible the production, display, and transmission of texts in multiple forms that far exceed the more fixed capacity of traditional codex books.
(Bornstein and Tinkle 1998: 2)
Most of these studies were written prior to widespread popularization of the World Wide Web. As a tool, the Web does not solve all of the problems surrounding the creation, storage, delivery and display of electronic texts, but certainly simplifies many of them, particularly in comparison with the older technologies used for electronic text storage and delivery such as ftp, listserv, and Gopher. The early adopters, enthusiasts, and critics viewed electronic texts as the domain of scholars and researchers, and used electronic texts to assist in the traditional work of humanists as outlined in the table of contents reproduced above. They did not foresee the World Wide Web's capacity to reach a wider reading public.
Has the world of electronic texts changed with the ubiquity of the World Wide Web, or are these criticisms still accurate? Put another way, who is the audience for electronic texts today? A recent book on electronic text (Hockey 2000) provides a current view on their audience. Hockey lists these categories in her table of contents:
1 Concordance and Text Retrieval Programs
2 Literary Analysis
3 Linguistic Analysis
4 Stylometry and Attribution Studies
5 Textual Criticism and Electronic Editions
6 Dictionaries and Lexical Databases
On the whole, this list is not significantly different from the categories of uses and users in the book edited by Wisbey almost 30 years earlier. Hockey believes that "most of the present interest in electronic texts is focused on access" (Hockey 2000: 3), that is, the use of computers to store and deliver entire texts. In her view, access is the least interesting aspect of electronic texts, for it leaves largely unexploited their real power: the ability for texts to be searched and manipulated by computer programs.
What is Electronic Text?
Answering this simple question could involve textual and literary theory and their intersection with digital technology, as discussed elsewhere in this volume. I wish to focus instead on practical considerations: what forms do electronic texts in the humanities take?
The first type to consider is an electronic transcription of a literary text, in which characters, punctuation, and words are faithfully represented in a computer file, allowing for keyword or contextual searching. Just what is meant by a "faithful representation" of a printed book is at the crux of a long debate, again explored elsewhere in this volume (see chapters 16, 17, and 22, for discussions of theoretical issues, and chapter 32 for more practical matters). A transcription in the form of a computer text file is the most compact form for electronic text, an important consideration given the severe constraints on computer storage and bandwidth that still exist for some people. In addition, this form provides greater ease of manipulation, for searching, for editing, and has significant advantages for people with visual impairments. Another basic form is a digital image of a physical page, allowing readers to see a representation of the original appearance, of great importance for textual scholars. This once seemed impractical given the amount of storage necessary for the hundreds of image files needed for even one book. Now, with ever greater storage available even on desktop computers, this is much less of an issue (while bandwidth remains an issue). Some collections, such as those available through digital libraries at the Library of Congress, the University of Virginia, and the University of Michigan, combine the two forms.
Another category is encoded text. For a multitude of reasons, many scholars and editors believe that encoding schemes, such as that developed by the Text Encoding Initiative (TEI), provide the best and fullest representation of text in all its complexity, allowing the creator or editor an opportunity to encode the hierarchical structures and multitude of features found in text. In addition to the belief that explicit structural markup along with robust metadata will allow for greater longevity of electronic texts, those who choose the TEI use its many elements and features for sophisticated analysis and repurposing of electronic texts for electronic and print publishing. The people developing the TEI have used open standards such as SGML and XML, as well as providing extensive documentation to assist in the use of the standard, as a way of encouraging and promoting the preservation and interchange of e-texts.
Others, notably people associated with Project Gutenberg, distrust all encoding schemes, believing that the lack of encoding will better allow e-texts to survive changes in hardware, operating systems, and application software. No special software and little training are required for contributors to Project Gutenberg, and a volunteer ethos prevails. They will have over 10,000 titles at the end of 2003, but little can be said for the accuracy of the transcriptions, for there is no central editorial control. The lack of encoding means it would be impossible, for example, to separate notes from text, or to determine quickly where chapters begin and end, or to indicate highlighting and font shifts within the text. Still, Project Gutenberg's founder Michael Hart has tapped into a volunteer spirit that drives most open source projects.
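The practical difference between the two positions can be illustrated with a minimal sketch in Python. The fragment below is a simplified, TEI-like sample invented for illustration (it is not a complete or valid TEI document, and the function name is an assumption). It suggests how explicit markup lets a program locate chapter divisions, set notes apart from the running text, and identify highlighted words, none of which can be done reliably with an unencoded transcription.

```python
import xml.etree.ElementTree as ET

# A tiny, simplified TEI-like fragment (hypothetical sample text;
# not a complete or valid TEI document).
SAMPLE = """
<text>
  <div type="chapter" n="1">
    <head>Chapter I</head>
    <p>It was a <hi rend="italic">dark</hi> night.<note>Compare the 1850 edition.</note> The rain fell.</p>
  </div>
  <div type="chapter" n="2">
    <head>Chapter II</head>
    <p>Morning came.</p>
  </div>
</text>
"""


def text_without_notes(elem):
    """Return the text of an element, skipping the content of any <note>."""
    if elem.tag == "note":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(text_without_notes(child))
        # Text following a child element belongs to the parent's running text.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)

# Chapter boundaries are explicit in the markup.
chapter_heads = [div.findtext("head") for div in root.iter("div")
                 if div.get("type") == "chapter"]

# Notes can be pulled out, or left out, at will.
notes = [note.text for note in root.iter("note")]
first_paragraph = text_without_notes(root.find("./div/p"))

# Highlighting and font shifts are recorded rather than lost.
highlighted = [hi.text for hi in root.iter("hi")]

print(chapter_heads)
print(notes)
print(first_paragraph)
print(highlighted)
```

The cost of this flexibility is the one raised by the critics of encoding: someone has to add the markup, and software is needed to read it.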
Encoding and editorial standards remain expensive: the more encoding is added and the more editing performed, the higher the level of expertise required, whether about the original document, the technology for searching and display, or both. Projects such as the Women Writers Online project at Brown University and the Model Editions Partnership provide extensive documentation about editorial standards and practices. Other projects listed on the TEI Consortium website provide examples of equally detailed documentation and exacting editorial standards.
Some projects have avoided this additional cost by converting page images to text using optical character recognition (OCR) software, and allowing readers to perform keyword searches against the unedited and unencoded text files. The results are displayed as digital images of the original page, with the uncorrected text hidden. While imperfections impede precise and complete results from keyword searches, proponents believe that with adequate OCR accuracy, the results will still prove useful, and at a significantly lower overall cost to produce than highly accurate transcriptions. Some extremely large e-text collections with tens of thousands of volumes and millions of pages, such as the Making of America and American Memory collections, are based upon this idea, pioneered by John P. Wilkin, called "rough OCR" or "dirty OCR."
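The trade-off can be sketched in a few lines of Python. The page texts, file names, and function name below are invented for illustration and do not describe the software of any actual collection. The search runs against the hidden, uncorrected text and returns the page images to display, so a page is found only where the OCR happened to read the word correctly.

```python
import re

# Hypothetical data: uncorrected OCR text keyed by page-image file name.
# The misreadings ("hiftory", "c0lonies") imitate typical OCR errors.
OCR_PAGES = {
    "vol1_p001.tif": "The hiftory of the Amerlcan colonies begins",
    "vol1_p002.tif": "with the colonies of Virginia and Maffachusetts",
    "vol1_p003.tif": "trade in the c0lonies grew rapidly",
}


def search_pages(term, pages):
    """Return the page images whose hidden OCR text contains the term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in pages.items()
                  if pattern.search(text))


# The reader is shown page images; the uncorrected text is never displayed.
print(search_pages("colonies", OCR_PAGES))  # finds two of the three pages
print(search_pages("history", OCR_PAGES))   # finds none: the OCR read "hiftory"
```

In this toy example one relevant page is missed for "colonies" and the only page mentioning "history" is missed entirely, which is the kind of imprecision that proponents accept in exchange for lower cost.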
Several people have pursued the idea of creating an archive of works containing manuscripts and variant editions, meant for both sophisticated researchers and general readers. In this type of publication, researchers would have access to all manuscript versions, allowing them to trace the development of a work through its variants, while a general audience could use a preferred reading created by the editors and derived directly from the sources. This kind of archive, proposed for authors such as Yeats, Hardy, and others, would be a boon to both audiences, but has proven to be very difficult. Outside of a few notable exceptions such as the Canterbury Tales Project, this idea has rarely been realized, indicating the enormous commitment and difficult work required.
A very different category is hypertext. Works of this type are generally original and not representations of previously published works. Hypertext would seem to have great potential for expression, allowing for a multiplicity of narrative choices and making the reader an active participant in the reading experience. Hypertextual content is generally bound inextricably with hypertext software, making productions in this format even more ephemeral than other kinds of electronic text. Some hypertext authors, such as Michael Joyce and Stuart Moulthrop, recognize the impermanence inherent in the genre and incorporate it into their hyperfictions, but this fundamental aspect of its nature will make it difficult or impossible for libraries to collect hyperfictions for the long term. Eventually, even if the medium on which the hypertext is stored remains viable, the software on which it relies will no longer run. While early critics such as Robert Coover believed (with some hope) that hypertext would lead to "the end of books", others, such as Tim Parks, dismiss the genre as "hype."
Creating Electronic Text
Scholars, students, librarians, computing professionals, and general enthusiasts of all kinds create and publish electronic texts. Commercial publishers, notably those previously known for publishing microfilm collections, are digitizing and licensing access to ambitiously large collections of tens of thousands of volumes. A few projects, such as the University of Virginia's collaboration with Chadwyck-Healey to publish Early American Fiction, and the Text Creation Partnerships formed for Early English Books Online and the Evans Early American Imprints, are examples of partnerships between libraries and publishers.
How do humanities scholars and students approach the creation of electronic texts? One axis stretches from a traditional humanities approach, which takes time and great pains to create well-considered, well-documented, well-edited electronic works, to less rigorous efforts that rely on volunteers and enthusiasts. Examples of the former include the Women Writers Online project at Brown University; the archive of Henrik Ibsen's writings, sponsored by the National Library of Norway and hosted by the universities of Oslo and Bergen; and the archive of Sir Isaac Newton's manuscripts at Imperial College, London. The best-known examples of the latter are Project Gutenberg and Distributed Proofreaders. This difference in approaches mirrors various editorial approaches in the past century, with the same difference occurring between scholarly editions and reading editions, each with different purposes and audiences. For some scholars, the latter type suffices for classroom use or quick consultation, while for others, the extensive documentation and editorial standards of the former are of paramount importance.
Interestingly, skepticism may play a central role in either attitude. On the one hand, Project Gutenberg enthusiasts, most notably project leader Michael Hart, express skepticism about the added software and expertise needed to create and read encoded electronic texts. His concerns, forged by painful hardware and software changes beginning in the mainframe era, are founded in the belief that the lowest common denominator of encoding will ensure the largest readership and longevity of the texts. Others, equally concerned about longevity and the demands created by changing hardware and software, believe that encoding systems based on international standards such as SGML, XML, and Unicode allow for the best representation of complex textual structures and writing systems, and these formats will prove most useful for scholars and general readers, as well as being the most promising means for the long-term archiving of these works. It should be noted that even some Project Gutenberg texts are available in HTML, PDF, or Microsoft Ebook reader formats, presumably for the added functionality or characters available beyond plain ASCII. Still, each side views the other with suspicion.
High-cost equipment is not required for creating electronic text – it can be achieved with the simplest of computers and word processing software. Scanning equipment and OCR software continue to drop in price. However, our expectations of accuracy in printed texts are high – we are generally disappointed to find any errors, and certainly disappointed if we discover an error every 10 to 20 pages. An accurate transcription requires skill and concentration, either in keying it in or in proofreading, yet the standard acceptable accuracy rate used by many large e-text projects is 99.995 percent, or 1 error in every 20,000 characters. Given that an average page has 1,000–2,000 characters, this works out to 1 error every 10 to 20 pages. Spellcheckers, while useful in catching some misspellings, cannot catch typographic errors or inaccurate OCR that result in correctly spelled words, and they are also little help with dialect or creative misspellings that authors employ. Spellcheckers exist for only a limited number of languages as well, making them generally not useful for the work of correcting electronic texts. Instead, it is common for large projects to outsource the creation of electronic text to vendors. Many of these vendors use a technique whereby two or three typists work on the same text. The typists' work is collated, with any discrepancies between the versions being used to find errors. This method produces a high accuracy rate, because the chance of two or three typists making the same error in the same spot in a text is statistically low.
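Both the arithmetic and the collation technique described here can be illustrated with a short Python sketch. The function names are hypothetical, and the word-by-word comparison uses Python's standard difflib module as a stand-in; it is an assumption made for illustration and not a description of the tools any vendor actually uses.

```python
import difflib
from fractions import Fraction


def pages_per_error(accuracy_percent, chars_per_page):
    """Average number of pages between errors at a given accuracy rate."""
    # Exact arithmetic: 99.995 percent accuracy leaves 1 error in 20,000.
    error_rate = 1 - Fraction(accuracy_percent) / 100
    chars_per_error = 1 / error_rate
    return chars_per_error / chars_per_page


def find_discrepancies(version_a, version_b):
    """Collate two typists' transcriptions and list the words that differ."""
    words_a, words_b = version_a.split(), version_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [(" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


# At 99.995 percent accuracy, a page of 2,000 characters yields one error
# every 10 pages; a page of 1,000 characters, one every 20 pages.
print(pages_per_error("99.995", 2000))
print(pages_per_error("99.995", 1000))

# Two typists key the same line; only one of them slips.
typist_1 = "It was the best of times, it was the worst of times"
typist_2 = "It was the best of tirnes, it was the worst of times"
print(find_discrepancies(typist_1, typist_2))
```

Every discrepancy marks a spot where at least one typist must be wrong, so an editor needs to check only those spots against the source rather than proofread the whole text. An error escapes detection only if the typists make the identical mistake in the same place.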
Using Electronic Text
The use of online collections, even collections of relatively obscure writers, remains surprisingly high. As noted above, one group overlooked by most early computing humanists is the general reader, someone willing and interested to read something online, remarkable considering the relatively primitive displays, unsophisticated interfaces, and slow connections available. These general readers may have many different reasons for using electronic texts – they may lack access to research libraries; they may prefer to find and use books on their computers, in their homes or offices; they may live in countries without access to good collections in foreign languages; the books themselves may be fairly rare and physically available at only a handful of libraries. Whatever the motivation may be, these readers are finding and using electronic texts via the World Wide Web.
Jerome McGann, noted scholar, editor, theorist, and creator of the Dante Gabriel Rossetti Archives, places much more value on this aspect of access than does Hockey. A great change has swept through literary studies in the past twenty years, as scholars re-evaluate writers and works overlooked by previous generations of literary critics. The immediate challenge for anyone interested in re-evaluating a little-known text will be to find a copy, for it may be available in only a few libraries. For McGann, collections of electronic texts may be most valuable for holding works by those writers who are less well known, or who are considered minor. In analyzing the Chadwyck-Healey English Poetry Database, McGann explains it this way:
For research purposes the database grows less and less useful for those authors who would be regarded, by traditional measures, as the more or the most important writers. It's most useful for so-called minor writers. This paradox comes about for two reasons. On one hand, the poetical works of "minor" writers are often hard to obtain since they exist only in early editions, which are typically rare and can be quite expensive. By providing electronic texts of those hard-to-acquire books, "The English Poetry Database" supplies scholars with important primary materials. On the other hand, the policy of the Database is – wherever possible – to print from collected editions of the poets as such editions exist. The better-known the poet, the more likely there will be collected edition(s) …. The Database would have done much better to have printed first editions of most of its authors, or at any rate to have made its determinations about editions on scholastic rather than economic grounds. But it did not do this.
[…] Speaking for myself, I now use the Database in only two kinds of operation: as a vast concordance, and as an initial source for texts that we don't have in our library. In the latter case I still have to find a proper text of the work I am dealing with.
(McGann 1996: 382–3)
The initial hindrances to reading works by lesser-known writers, perhaps insurmountable in the past, can be much more easily overcome in this new medium. It should be noted that even such a digitally adept scholar as McGann prefers printed sources over electronic versions (or at least, still did in 1996). He used the online collection for discovery and research, but turned to the printed copy for verification and citation.
Many problems face a scholar wishing to use electronic texts for research. The immediate problem is in discovering just what is available. Traditionally, a researcher would check the library catalogue to discover whether a particular title or edition is available in the stacks. Given the growing number of online collections, it is impossible to be aware of all relevant sources for any given research topic, and even the specialized portals such as Voice of the Shuttle have fallen behind. As it stands now for most online projects, researchers must remember that a particular website has works by Charles Dickens or Margaret Oliphant to find electronic editions by these authors, for they may be found neither in online catalogues nor by using search tools such as Google. Libraries could have records for electronic texts in their online catalogues also and link directly to the electronic editions. However, few libraries have included records for all titles available in, for instance, the English Poetry Database from Chadwyck-Healey even if they have acquired the full-text database for their communities. This situation is worse for those texts that are freely available, for the job of discovering and evaluating electronic texts, and then creating and maintaining links to them, is overwhelming.
There is a great deal of interest in improving this situation. Libraries are beginning to include records for electronic texts in their online catalogues. Other developments, such as the Open Archives Initiative, could allow the discovery of the existence of electronic texts much more readily than at present. Methods are under development for searching across collections stored and maintained at different institutions, meaning that someone interested in nineteenth-century American history could perform one search that would be broadcast to the many sites with collections from this period, with results collected and presented in a single interface.
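The idea of a single search broadcast to many sites can be sketched briefly in Python. The collections, their holdings, and the function names below are invented stand-ins; a working system would send the query over a network to each institution rather than consult an in-memory list.

```python
# Hypothetical stand-ins for collections held at different institutions.
COLLECTIONS = {
    "Library A": ["Narrative of the Life of Frederick Douglass", "Walden"],
    "Library B": ["Uncle Tom's Cabin", "Life on the Mississippi"],
    "Library C": ["Leaves of Grass"],
}


def search_collection(titles, query):
    """Stand-in for one site's own search: match the query within titles."""
    needle = query.lower()
    return [title for title in titles if needle in title.lower()]


def federated_search(query, collections):
    """Send one query to every collection and merge the results."""
    results = []
    for site, titles in collections.items():
        for title in search_collection(titles, query):
            results.append((site, title))
    return sorted(results)


# One search, results from several sites presented in a single list.
for site, title in federated_search("life", COLLECTIONS):
    print(site, "|", title)
```

The researcher issues the query once and does not need to know in advance which institution holds which text.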
Another problem is the artificial divide that exists between those collections available from commercial publishers and those that are locally created. Generally, these two categories of materials use different interfaces and search systems. This distinction is completely irrelevant to researchers and students, but systems and interfaces that allow for searching both categories of materials simultaneously and seamlessly, while available, are still rare, due to the complexities of authentication and authorization.
Another type of divide exists as well. With the growing body of collections available from commercial publishers, the divide between the haves and have-nots in this area is growing. At a recent conference on nineteenth-century American literature, it was notable that graduate students at research universities had access to a wide range of commercially published electronic text collections, while many of their colleagues, recently graduated with first jobs at smaller institutions, did not. These untenured scholars may not need to travel to institutions to see original documents any more, but they will continue to need support to travel to institutions that have access to licensed collections of electronic texts. There is hope for these scholars, however, as libraries and museums digitize wider portions of their collections and make them publicly available.
A much more complex problem is the limited range of electronic texts that are available. A crazy patchwork quilt awaits any researcher or reader willing to use electronic texts, and as McGann points out, the selection of proper editions may not be given much thought. The situation resembles a land rush, as publishers, libraries, and individuals seek to publish significant collections. The number of freely available texts, from projects such as Making of America, to the Wright American Fiction project, to the University of Virginia E-Text Center, to the Library of Congress's American Memory, is growing at a phenomenal pace.
Commercial publishers are now digitizing large microfilm collections such as Pollard and Redgrave, and Wing (Early English Books, published by Bell and Howell/UMI), and Evans (Early American Imprints, by Readex), and Primary Source Media has recently announced its intention to digitize the massive Eighteenth Century collection. The English Poetry Database, first published in 1992 and one of the earliest efforts, used the New Cambridge Bibliography of English Literature, first published in 1966 based on an earlier edition from the 1940s, as the basis for inclusion. The collections listed above are based upon bibliographies begun in the 1930s. Those collections of early literature such as the Early English Books Online can claim comprehensive coverage of every known publication from Great Britain before 1700, but the closer these online collections come to the present time, the less inclusive they can be, given the exponential growth of publishing. Thus, large collections such as the English Poetry Database have to employ selection criteria. Critics have debated these traditional collections, known informally as the "canon", and argued for broader inclusion of women or authors in popular genres. These electronic collections, while very large and inclusive, reflect the values and selection criteria of fifty years ago or more.
Some libraries are developing their own digital collections, and McGann suggests it is no coincidence. He sees in literary studies two movements: a return to a "bibliographical center" and a "return to history":
The imperatives driving libraries and museums toward greater computerization are not the same as those that have brought the now well-known "return to history" in literary and humanities scholarship. Nevertheless, a convergence of the twain has come about, and now the two movements – the computerization of the archives, and the re-historicization of scholarship – are continually stimulating each other toward new ventures.
(McGann 1996: 380)
Even with this phenomenal growth, one realizes immediately the inadequacy of using these collections for comprehensive research. Scholars still cannot assume that their fields have an adequate collection of electronic texts available, nor can they assume that those collections that do exist will reflect current thinking about inclusion. McGann's observation of 1996 remains true: "the Net has not accumulated those bodies of content that we need if we are to do our work" (p. 382). The collection size of a research library numbers in millions of volumes. The size of the collective electronic text collections available through both commercial publishers and freely available websites probably exceeds 200,000, but not by much. Scholars, students, and librarians are learning that the collections will have to grow considerably in order to reliably meet the needs of a broad range of humanists.
As McGann states, these collections and editions were chosen in large part because they are in the public domain and free of copyright restrictions. The commercial publishers listed above started as microfilm publishers, and their microfilm collections were formed by the same principle. Copyright is the hidden force behind most electronic text collections. Very few electronic text collections, even those from commercial publishers, contain publications under copyright. This has three main effects on electronic collections. First and foremost, it means that most digital collections consist of authors who lived and published up to the twentieth century; the works of writers after that may still be under copyright, and therefore more difficult and perhaps expensive to copy and republish. Second, new editions of these pre-twentieth-century writers are generally excluded, with projects and publishers selecting those in the public domain. Finally, contemporary works of literary criticism, biography, and theory, that could provide needed context and interpretation to the primary literature, also remain largely excluded. The possibilities inherent in the medium, for providing a rich context for the study of primary literary texts and historical documents, have not yet been realized.
The effect of copyright is that researchers and students interested in twentieth-century and contemporary writing are largely prevented from using electronic text. A quick check of the online MLA Bibliography shows the number of articles published after 1962 dealing with twentieth-century authors is nearly double that of all other centuries combined, covering 1200–1900 CE. With the majority of researchers and students interested in writers and their work after 1900, it is no wonder that they may consider electronic text largely irrelevant to their studies.
In addition, many of the markers traditionally used by scholars to determine the merit of any given electronic text are missing. There may be no recognizable publisher, no editor, no preface or statement of editorial principles, which may cause scholars to shy away. They are left to their own devices to judge the value of many resources. On the other hand, in this unfamiliar terrain, there may be a tendency to trust the technology in the absence of such markers. Electronic text collections do not receive reviews as frequently as those published commercially, perhaps for some of these same reasons, although they may receive more use. Even in those reviews, one notes an uncritical trust on the part of reviewers. For instance, I assert on the website of the Victorian Women Writers Project (VWWP) that it is "devoted to highly accurate transcriptions" of the texts. I am alternately amused and alarmed to see that phrase quoted verbatim in reviews and websites that link to the VWWP (e.g., Burrows 1999: 155; Hanson 1998; McDermott 2001) without any test of its accuracy. Scholars who use these collections are generally appreciative of the effort required to create these online resources and reluctant to criticize, but one senses that these resources will not achieve wider acceptance until they are more rigorously and systematically reviewed.
It is difficult to find scholarly articles that cite electronic text collections as sources, or discuss the methodology of creating or using e-texts, outside of journals for computing humanists. Humanists have been slow to accept electronic texts for serious research, for the reasons explained above, particularly their scattershot availability, and the inadequate documentation of sources and editorial practices used in creating them. In an article taken from a major scholarly journal almost at random (Knowles 2001), the author investigates the use of the word "patriot" throughout the seventeenth century. He cites familiar authors such as Milton, Dryden, Bacon, Jonson, Sir Winston Churchill, as well as less familiar authors. Electronic text collections such as Chadwyck-Healey's English Poetry Database or ProQuest's Early English Books Online would be perfectly suited for this purpose – McGann's "vast concordance", with the ability to search across thousands of works, leading to both familiar and unfamiliar works containing the term. However, Knowles does not mention these, although it is possible that he used them behind the scenes. At any rate, it is rare to find them cited in scholarly articles or books, and thus their use and importance go unnoticed and untested. Scholars need to hear more from their peers about the use of these resources in their research.
Electronic texts have an important place in classroom instruction as well. Julia Flanders points to a survey conducted by Women Writers Online in which a majority of professors responded that they "were more likely to use electronic tools in their teaching than in their research" (Flanders 2001: 57). The same problems described above would apply, but she posits that for students, unfamiliarity is an opportunity rather than an obstacle, giving them the chance to interact directly with the texts independent of contextual information and prior interpretation. The form of interaction offered by e-texts, with the opportunity to quickly explore words and terms across a large body of works, is vastly different from that offered by print texts. This approach to literary studies, closely related to traditional philology developed in the nineteenth century, is more common in classical studies, and may account for the successful adoption of computers and digital resources among classicists and medievalists in comparison to those who study and teach later periods.
One assumption underlying much of the criticism of Olsen and Finch noted above is that the computer will in some way aid in literary criticism. Beginning with Father Busa, up to the creation of the World Wide Web, computer-aided analysis was the entire purpose of electronic texts, as there was no simple way for people to share their works. More recent books (Sutherland 1997; Hockey 2000) have shifted focus to the methods needed to create and publish electronic texts, leaving aside any discussion of their eventual use.
Still, Hockey and others point to the inadequacies of the tools available as another limitation to wider scholarly acceptance of electronic texts. Early electronic texts, such as the WordCruncher collection of texts, were published on CD-ROM along with tools for their use. While there are a multitude of problems associated with this kind of bundling, it certainly made it very easy to begin using the texts. Today, many collections of electronic texts contain tools for simple navigation and keyword searching, but little else. As Hockey notes (2000: 167), "[c]omplex tools are needed, but these tools must also be easy for the beginner to use", which explains in part why few exist. Another researcher, Jon K. Adams, writes:
Like many researchers, I have found the computer both a fascinating and frustrating tool. This fascination and frustration, at least in my case, seems to stem from a common source: we should be able to do much more with the computer in our research on literary texts than we actually do.
(Adams 2000: 171)
In some sense, while the World Wide Web has sparked development and distribution of electronic texts in numbers unthinkable before it, the tools available for using these texts are of lesser functionality than those available through the previous means of publishing, such as CD-ROMs. This is perhaps due to the lack of consensus about the uses of electronic texts, as well as the difficulty of creating generalized text analysis tools for use across a wide range of collections. The acceptance of more sophisticated automated analysis will remain limited until more sophisticated tools become more widely available. Until then, the most common activities will be fairly simple keyword searches, and text retrieval and discovery. These types of research can be powerful and important for many scholars, but do not begin to tap the potential of humanities computing.
Still, even noting their inadequacies, these collections are finding audiences. The E-text Center at the University of Virginia reports that their collections receive millions of uses every year. Making of America at the University of Michigan reports similar use. The Victorian Women Writers Project (VWWP), which contains about 200 works by lesser-known nineteenth-century British women writers, receives over 500,000 uses per year. The most popular works in the VWWP collection include such books as Caroline Norton's study English Laws for Women, Vernon Lee's gothic stories Hauntings, and Isabella Bird's travel writings. These writers are better known than most of the others in the collection. Nevertheless, the most heavily used work in the collection is Maud and Eliza Keary's Enchanted Tulips and Other Verse for Children, which receives thousands of uses per month. This is one indication that scholars are not the only audience for electronic text. In the end, the general reader, students, and elementary and secondary school teachers, particularly those in developing countries without access to good libraries, may be the largest and most eager audience for these works.
Electronic texts give humanists access to works previously difficult to find, both in terms of locating entire works, with the Internet as a distributed interconnected library, and in access to the terms and keywords within the works themselves, as a first step in analysis. As reliable and accurate collections grow, and as humanists come to understand their scope and limitations, the use of e-texts will become recognized as a standard first step in humanities research. As tools for the use of e-texts improve, humanists will integrate e-texts more deeply and broadly into their subsequent research steps. Until then, electronic texts will remain, in Jerome McGann's words, a "vast concordance" and library with great potential.
Bibliography
Adams, Jon K. (2000). Narrative Theory and the Executable Text. Journal of Literary Semantics 29, 3: 171–81.
American Memory. Washington, DC: Library of Congress. Accessed October 13, 2003. At http://memory.loc.gov.
ARTFL: Project for American and French Research on the Treasury of the French Language. Mark Olsen, ed. University of Chicago. Accessed October 13, 2003. At http://humanities.uchicago.edu/orgs/ARTFL.
Bailey, Richard W. (ed.) (1982). Computing in the Humanities: Papers from the Fifth International Conference on Computing in the Humanities. Amsterdam: North Holland.
Baker, Nicholson (2001). Double Fold: Libraries and the Assault on Paper. New York: Random House.
Birkerts, Sven (1994). The Gutenberg Elegies: The Fate of Reading in an Electronic Age. Boston: Faber and Faber.
Bornstein, George and Theresa Tinkle (1998). Introduction. In The Iconic Page in Manuscript, Print and Digital Culture, ed. Bornstein and Tinkle (pp. 2–6). Ann Arbor: University of Michigan Press.
Burrows, Toby (1999). The Text in the Machine: Electronic Texts in the Humanities. New York: Haworth Press.
Busa, Roberto (1950). Complete Index Verborum of Works of St Thomas. Speculum 25, 3: 424–5.
Busa, Roberto (1992). Half a Century of Literary Computing: Towards a "New" Philology. Literary and Linguistic Computing 7, 1: 69–72.
Bush, Vannevar (1945). As We May Think. The Atlantic Monthly (July): 101–8.
Canterbury Tales Project. Peter Robinson, ed. De Montfort University. Accessed October 13, 2003. At http://www.cta.dmu.ac.uk/projects/ctp/.
Coover, Robert (1992). The End of Books. The New York Times Book Review (June 21): 1, 4.
Early American Fiction. ProQuest/Chadwyck-Healey. Accessed October 13, 2003. At http://www.e-text.virginia.edu/eaf/.
Early English Books Online Text Creation Partnership. University of Michigan Library. Accessed October 13, 2003. At http://www.lib.umich.edu/eebo/.
Electronic Text Center. Alderman Library, University of Virginia. Accessed October 13, 2003. At http://e-text.virginia.edu.
Evans Early American Imprint Collection Text Creation Partnership. University of Michigan Library. Accessed October 13, 2003. At http://www.lib.umich.edu/evans/.
Finch, Alison (1995). The Imagery of a Myth: Computer-Assisted Research on Literature. Style 29, 4: 511–21.
Flanders, Julia (2001). Learning, Reading, and the Problem of Scale: Using Women Writers Online. Pedagogy 2, 1: 49–59.
Hanson, R. (1998). Review, Victorian Women Writers Project. Choice 35 (Supplement): 88.
Hockey, Susan (1980). A Guide to Computer Applications in the Humanities. Baltimore: Johns Hopkins University Press.
Hockey, Susan (2000). Electronic Texts in the Humanities: Principles and Practices. Oxford: Oxford University Press.
Knowles, Ronald (2001). The "All-Attoning Name": the Word "Patriot" in Seventeenth-century England. Modern Language Review 96, 3: 624–43.
Lusignan, Serge and John S. North (eds.) (1977). Computing in the Humanities: Proceedings of the Third International Conference on Computing in the Humanities. Waterloo, ON: University of Waterloo Press.
Making of America. Ann Arbor: University of Michigan. Accessed October 15, 2003. At http://moa.umdl.umich.edu.
McDermott, Irene (2001). Great Books Online. Searcher 9, 8: 71–7.
McGann, Jerome (1996). Radiant Textuality. Victorian Studies 39, 3: 379–90.
Model Editions Partnership. David Chestnut, Project Director, University of South Carolina. Accessed October 13, 2003. At http://mep.cla.sc.edu.
Olsen, Mark (1993–4). Signs, Symbols, and Discourses: A New Direction for Computer-aided Literature Studies. Computers and the Humanities 27, 5–6: 309–14.
Parks, Tim (2002). Tales Told by a Computer. New York Review of Books 49:16 (October 24): 49–51.
Schaffer, Talia (1999). Connoisseurship and Concealment in Sir Richard Calmady: Lucas Malet's Strategic Aestheticism. In Talia Schaffer and Kathy Alexis Psomiades (eds.), Women and British Aestheticism. Charlottesville: University Press of Virginia.
Sutherland, Kathryn (ed.) (1997). Electronic Text: Investigations in Method and Theory. New York: Oxford University Press.
Text Encoding Initiative. University of Oxford, Brown University, University of Virginia, University of Bergen. Accessed October 13, 2003. At http://www.tei-c.org.
Thesaurus Linguae Graecae (1999). CD-ROM. Irvine, CA: University of California, Irvine.
Victorian Women Writers Project. Perry Willett, ed. Indiana University. Accessed October 13, 2003. At http://www.indiana.edu/~letrs/vwwp.
Wilkin, John P. (1997). Just-in-time Conversion, Just-in-case Collections: Effectively Leveraging Rich Document Formats for the WWW. D-Lib Magazine (May). Accessed October 13, 2002. At http://www.dlib.org/dlib/may97/michigan/05pricewilkin.html.
Wisbey, R. A. (ed.) (1971). The Computer in Literary and Linguistic Research: Papers from a Cambridge Symposium. Cambridge: Cambridge University Press.
Women Writers Online Project. Julia Flanders, ed. Brown University. Accessed October 13, 2003. At http://www.wwp.brown.edu.
The WordCruncher Disk (1990). CD-ROM. Orem, Utah: Electronic Text Corporation.