Congratulations to Laura Mandell and the eMOP (Early Modern OCR Project) Initiative at Texas A&M on winning a 2-year, $734,000 grant from the Andrew W. Mellon Foundation. eMOP hopes to improve the “digitization, transcription, and preservation of early modern texts” by focusing on the recognized weaknesses of existing OCR technology in the creation of full-text databases like ECCO and EEBO. As they describe it,
The peculiarities of early printing technology make it difficult for Optical Character Recognition (OCR) software to discern discrete characters and, thus, to render readable digital output. By creating a database of early modern fonts, training the software that mechanically types page images (OCR) to read those typefaces, and creating crowd-sourced correction tools, eMOP promises to improve the quality of digital surrogates for early modern texts.
As I’ve been reflecting upon this project and its implications, I’ve been thinking about the dual imperatives in humanities scholarship of “discovery” and “preservation.” This project serves both.
If this project succeeds, it will allow everyday users of ECCO or EEBO to recognize the page images and search results they see as the product of a historical process, one that preceded and conditioned their own particular search process. Ideally, it will make that historical process more visible to users, in the same way that a scholarly edition makes the stages of composition of a work visible to readers. That is the preservationist side of the project.
The discovery side of this project comes from the possibility that a more expansive, reliable, and accessible set of texts could generate new questions, methods, and problems for scholars in 18th-century studies, and new ways of knowing and understanding this material.
Finally, the very welcome external support given to this project reminds me of the need in historical literary studies for what I’d call “digital infrastructures,” which Laura (reg. req.) and others have argued for as well. These are the “tools” that make possible new insights and new forms of evidence about our period, and they are of course more than mere tools, because they help introduce new concepts into our discussions of historical writing.
Even scholars who do not “do” DH (which is how I’d describe myself) depend upon this kind of infrastructural work, in the same way that we are dependent upon bibliographers, translators, reviewers and many others to gain access to others’ work. This is just another aspect of the collective nature of scholarship, which the shift towards digital resources seems to have accelerated. It makes sense, then, for us to understand where our infrastructures have come from, and to support scholarly efforts to improve and refine them in this way.
It will be exciting to see how this project plays out.