Archival Digitization and the Struggle to Create Useful Digital Reproductions | Archive and Public Culture Research Initiative

Posted on January 7, 2014
The past decade has fundamentally changed how archives provide access to historical records. Many archives now provide digital access to collections, have digitization on demand services, and have started to prioritize collections for digitization. Much of this digitization has been driven by funding bodies and a desire to increase accessibility to collections. But how has the digitization of archival records been received by historians, genealogists, and other patrons?

One of the initial challenges presented by the digital representation of archival sources is the need to preserve context. Original order and provenance are fundamental archival arrangement principles which help maintain context within collections. Original order allows for connections between records to be illuminated and provenance describes the origins of archival records. However, many archives have struggled to replicate the physical experience of browsing through an archival box in a digital environment. At times this challenge has resulted in the loss of context or an inability to determine original order online.

Some researchers are wary of treating the digital record as a true representation of the physical record. For example, an institution might make the decision to only scan the front of photographs for online consumption. Any notations on the rear of the photograph are then entered into a notes field. The notation is still preserved but it's not displayed in its original form. The transcription process used to enter the notation as metadata may include interpretations of handwriting, short forms, etc. The very process of transcription is subject to interpretation and human error.

Including a set of transcription standards on an archival website may help eliminate some of the worry about improper transcription or material being left out. However, many researchers prefer both a transcribed version and a digital copy of a handwritten archival record, which allows them to compare the transcription with the original.[1]

For archival sources that consist of machine-readable text, using OCR software can eliminate some transcription errors and allow the documents to be full-text searchable, which can be invaluable for researchers. That being said, OCR isn't perfect and errors tend to be most evident when OCR is used on older texts with irregular or faded typefaces. Correcting OCR errors can be extremely time-consuming for archival staff and is something many organizations simply don't have the time or staffing to devote to.[2]

Another common concern relating to the digitization of archival sources comes from the quality of the digital surrogates. If the digital copy is blurry or marked, is this a product of a poor scan or a true representation of the record?

How can the researcher be sure that all marginalia and annotations are represented in the digital copy? The original photograph above includes a notation that the photograph was taken by William Dunlop in Sault Ste Marie. But in the above representations of the photograph this note is missing. Including information about the scan resolution, condition of the original, notes, and any digital editing done can help alleviate these concerns. Similarly, including a border around the scan allows researchers to be sure that the entire document or photograph has been scanned and hasn't been cropped by someone during digitization.
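As a rough illustration of the kind of information listed above, the sketch below models a single scan's documentation as a small Python record. It is only one possible shape, not a standard: the class name `ScanRecord`, its field names, and the `missing_disclosures` check are all hypothetical and would need to be adapted to an institution's own metadata schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class ScanRecord:
    """Hypothetical per-scan documentation shown alongside a digital surrogate."""
    identifier: str                      # local identifier (assumed naming)
    resolution_dpi: int                  # scan resolution
    condition_of_original: str           # e.g. fading, tears, staining
    notes: str = ""                      # transcribed notations, marginalia
    digital_edits: List[str] = field(default_factory=list)   # any editing done
    border_included: bool = False        # was a border left around the item?
    sides_scanned: List[str] = field(default_factory=lambda: ["front"])

    def missing_disclosures(self) -> List[str]:
        """List gaps a researcher might reasonably ask about."""
        gaps = []
        if not self.condition_of_original.strip():
            gaps.append("condition of the original is not described")
        if not self.border_included:
            gaps.append("no border, so cropping cannot be ruled out")
        if "back" not in self.sides_scanned and not self.notes.strip():
            gaps.append("back not scanned and no notations recorded")
        return gaps


# Illustrative values only; they do not describe a real item.
record = ScanRecord(
    identifier="example-photo-001",
    resolution_dpi=600,
    condition_of_original="Good; slight fading at edges",
    notes="Notation on reverse transcribed by staff",
    border_included=True,
)

print(json.dumps(asdict(record), indent=2))
print(record.missing_disclosures())
```

Publishing a record like this next to each image gives researchers a way to judge whether a blur or mark comes from the scan or from the original, and whether anything was left out.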

How can archivists and other heritage professionals work with researchers and historians to make the best use of digitization? Explaining how records are digitized, transcribed, and presented online helps mitigate many concerns regarding the authenticity of reproductions. In cases where only selections of a collection have been digitized, archives should be clear that only some of the material is available online and explain how the remaining material can be accessed.

Many digitization workflows include spot checking, reviews of content before it is published online and other checks and balances. That being said, human error happens. The Art of Google Books project is an amusing example of small errors in mass digitization, namely images of scanner operators' hands found in digitized books. Pages can be missed in scanning and metadata can be inputted incorrectly.

Archivists often work with expert researchers to correct processing errors and improve collection descriptions. For example, many archives have identified photographs based on patron knowledge and corrected dates or attributions based on patron research. Digital archival platforms that allow easy communication between archival staff and researchers help facilitate this and can strengthen archival description.

Digitization of traditional archival material is something many archives and heritage institutions are working to integrate into their day-to-day practices. Understanding what patrons want and how they use existing digitized material is crucial to creating digitization programs which are effective and practical.

Krista McCracken is a Researcher/Curator at Algoma University's Shingwauk Residential Schools Centre. She is a co-editor at Activehistory.ca.

Source: Activehistory.ca website

Notes:

[1] Alexandra Chassanoff, 'Historians and the Use of Primary Source Materials in the Digital Age,' The American Archivist, vol. 76, no. 2. (Fall/Winter 2013), pp. 458-480.

[2] For those interested in the use of OCR in archives: Larisa K. Miller's 'All Text Considered: A Perspective on Mass Digitizing and Archival Processing' is a good example of a new take on OCR use in archives. Miller's article suggests a drastic shift in how machine-readable archival records are digitized and made available, by eliminating archival processing and finding aids and shifting to mass digitization and OCR.