Posted on March 14, 2011

Roger Layton
If you are currently engaged in digitization, or if you are planning to start a digitization project, then one of the most important questions you need to ask and answer, before you get started, is where you will house all of these digital objects and how they will be managed.

It is all too easy to spend bags of money on the digital capture itself, and then fail to look after the resulting digital objects properly. Once the project is completed, you will have a large number of these objects, which for most institutions will be digital photographs, and their combined size will be larger than anything else you have ever had to store, perhaps anywhere from 10 GB to 10 TB. This collection will be the largest single storage area anywhere in your institution.

You can keep these digital files on DVD or move them onto the file system of a desktop computer or a server, or copy them onto a repository service, but they must be stored somewhere, and you must know where they are, who is looking after them, and more importantly you must know what exactly you have stored. In essence, you need to have an inventory of your digital collection, just like you should by now also have a detailed inventory of your physical collections.
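As a minimal sketch of what such an inventory might look like, the Python below walks a folder of masters and records each file's relative path, size in bytes, and SHA-256 checksum, then writes the result to a CSV file. The function names, the CSV layout, and the choice of SHA-256 are illustrative assumptions rather than any standard; the checksum is what later lets you show that a file is exactly the one you inventoried.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """Record one entry per file under root: relative path, size and checksum."""
    inventory = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        inventory.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return inventory


def write_inventory(inventory: list[dict], out_file: Path) -> None:
    """Write the inventory as CSV; keep out_file outside root so it is not inventoried itself."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(inventory)
```

For example, `write_inventory(build_inventory(Path("masters")), Path("masters-inventory.csv"))` would inventory a hypothetical `masters` folder. A plain CSV file is deliberately chosen here because it can still be read without any special software many years from now.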

Let me present a few scenarios which highlight some of the associated risks:

(1) The person responsible for your digitization project leaves, and the masters are kept on their own computer, which is put into storage after they leave and, five years later, given away to a local school. A few years after that, you realize you only have the low-resolution images on your website, and not the high-resolution masters.

(2) You create your own repository using open-source software, but the software goes out of fashion and you need to migrate the collection, and there is no easy way to extract the images because of the file formats that were used and the way the files are embedded in the repository's internal file store.

(3) You write your whole collection onto DVDs and make two copies: one goes into your bank vault and the other into your institution's safe. After five years you check the copies in the safe and find that many of the DVDs cannot be read because the physical media have degraded, so many of the digital images on them are lost forever. You try the copies in the bank vault and the same happens.

(4) Your repository is created, but everyone uses it openly, accessing the images, making their own copies, and changing their resolution or size, and you then discover that you no longer know which files are the original versions and which are manipulated copies.

I like to explore scenarios of the future, as in the scenarios above. These are all drawn from my own experience of working with others, not only on heritage repositories but on digital records in general, and they represent only a few of the risks that arise from our need to store these records forever. What we need is a foolproof way to create and maintain a digital repository, expressed as a simple set of rules that can be followed, or as a list of do's and don'ts. The creation of a repository does not in itself open up a discourse related to history; it is very much a technical discussion about digital preservation that should not be clouded by other debates. However, some decisions must be made about where the repository is placed, and about any perception that this placement also implies control of content and limitations on access.

What is a Digital Repository?

A digital repository is a collection of digital objects, which are in most cases digital reproductions of physical archives, books, videos, recordings, photographs and museum artifacts. There is no limit to what can be stored so long as these are objects in digital form, and I have no doubt that the future will bring us a range of new technologies including new forms and formats. It is only recently that we have been able to produce 3D images that can be rotated by the user to see all sides, which is very useful for ceramics. The recent introduction of 3D movies holds the promise that this may become the norm in the future. What is important for the repository is that when these objects are stored they are all simply bits and bytes and stored in files and folders – they all look the same.

A number of important questions need to be raised at the start of building a new repository, especially if you intend it to be a “trusted” repository which can be used long into the future and continue to provide authentic results and outputs. These questions include the naming of the digital objects, how they are packaged with their metadata, what security controls are in place to prevent unauthorised access and usage, and the risks associated with the formats and media used for storage in terms of digital preservation.
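Naming and packaging can be made concrete with a small example. The Python sketch below derives an identifier from an institution code, an accession number, and a sequence number, copies the master into a destination folder under that name, and writes its descriptive metadata to a JSON sidecar file with the same identifier. The naming scheme, the JSON layout, and the function names are hypothetical illustrations, not an established standard; published packaging conventions such as BagIt exist and may be a better fit for a production repository.

```python
import json
import re
import shutil
from pathlib import Path

# Anything outside letters, digits, dots and hyphens is replaced, so that
# names survive any file system and any later migration.
UNSAFE = re.compile(r"[^A-Za-z0-9.-]")


def object_id(institution: str, accession: str, sequence: int) -> str:
    """Hypothetical scheme: <institution>_<accession>_<4-digit sequence>."""
    return f"{UNSAFE.sub('-', institution)}_{UNSAFE.sub('-', accession)}_{sequence:04d}"


def package(master: Path, dest: Path, institution: str, accession: str,
            sequence: int, metadata: dict) -> Path:
    """Copy a master into dest under its identifier and write a JSON metadata sidecar.

    The original file is left untouched. Returns the path of the sidecar.
    """
    ident = object_id(institution, accession, sequence)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / (ident + master.suffix.lower())
    shutil.copy2(master, target)  # copy2 also preserves file timestamps
    record = {"id": ident, "file": target.name, "original_name": master.name, **metadata}
    sidecar = dest / (ident + ".json")
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar
```

Copying rather than renaming leaves the file in the temporary store untouched until the packaged copy has been checked, and recording the original file name keeps the link back to the capture session.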

One important element of a successful repository is its ability to return the information it has stored when asked. In general, users will ask for information using a number of search terms to find the items of interest, and modern search engines tend to use full-text search, which is poorly suited to such repositories, where most objects are images with no text of their own to index. Rather, the existing methods of indexing used in libraries and archives should be extended through powerful vocabularies to enhance the semantic value of the metadata and hence provide faster and more accurate access to relevant items. These vocabularies should ideally be common property, established at the national level, so that every institution attaches the same meaning to the same term. The technologies to do this are in their infancy at present, and there is no clear worldwide standard or leader in semantic heritage repositories.
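To illustrate why a shared vocabulary helps, the Python sketch below maps non-preferred terms (for example "pottery") to a preferred term ("ceramics") and normalises both the query and each record's subject terms before matching. The vocabulary, the records, and the function names are hypothetical; in practice the vocabulary would come from a published thesaurus agreed across institutions, and real matching would be far richer than this exact-term comparison.

```python
# Hypothetical vocabulary: each preferred term lists the entry terms it is "used for".
VOCABULARY = {
    "ceramics": {"pottery", "earthenware", "stoneware"},
    "photographs": {"photos", "prints", "snapshots"},
}


def build_lookup(vocabulary: dict[str, set[str]]) -> dict[str, str]:
    """Map every term, preferred or not, to its preferred form."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup


def normalise(term: str, lookup: dict[str, str]) -> str:
    """Lower-case a term and replace it with its preferred form, if it has one."""
    key = term.strip().lower()
    return lookup.get(key, key)


def search(records: list[dict], query: list[str], lookup: dict[str, str]) -> list[str]:
    """Return ids of records whose subject terms cover every query term after normalisation."""
    wanted = {normalise(term, lookup) for term in query}
    return [
        record["id"]
        for record in records
        if wanted <= {normalise(subject, lookup) for subject in record["subjects"]}
    ]
```

With `records = [{"id": "A1", "subjects": ["Ceramics", "Bowls"]}, {"id": "A2", "subjects": ["Photographs"]}]`, the call `search(records, ["pottery"], build_lookup(VOCABULARY))` returns `["A1"]`, even though the word "pottery" appears nowhere in the records.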

How to Create a Digital Repository?

When you are commencing a digitization project you will be creating many digital objects, and you will need to store these as soon as they are created so that they are not lost or misplaced. As a result, you should have your own repository in which these can be stored, and in which they can remain while you work on these and gather metadata, or link these to existing inventories.

I recommend three types of repository. The first is a temporary repository in which every digital object is stored, including those you may later discard, since photographers commonly take many photographs but use only a few selected ones. This temporary repository is essentially a file store that allows you to begin organizing your resources. One important task within such a temporary store is to link the images and other objects to the physical items they depict, and this is best done early, because once you are confronted with a file store of thousands of digital images, starting the linking process becomes a complex task.

The second repository should be your own institutional trusted repository, into which the digital masters are placed for preservation and for access purposes. This must be a professionally-operated repository which aims to protect your investment and to meet your institutional digitization goals.

The final repository should be an external repository, one of the National Digital Repositories, whose goal is to preserve your digital content over the long-term, beyond the lifetime of your own institution and the people and skills within your institution. This is the equivalent of the National Archives or the National Library, which are expected to live forever. It may be that your own institution is itself a National Digital Repository, in which case the second and third repositories identified here can be combined.

Having identified three types of repository, it is now necessary to establish them, and if you are unable to build them yourself, it is important to partner with an organisation that can provide this service for you. These repositories can be built using a wide range of open-source and proprietary software and database tools, but in many cases these are not easy to implement or use, and there is a risk that the tools themselves may not provide for the long-term sustainability of your digital resources.

Where to Locate a Digital Repository?

With all of your digital objects stored in a single digital repository, it is important to choose the right place to locate it. In particular, storing it on the curator's desktop computer is not acceptable, and in general the server environments of larger institutions are not suitable unless accompanied by a range of well-run procedures for managing the repository.

One alternative is to locate the repository on the web, in what is called “cloud” storage, so that it is stored outside your institution and the storage is outsourced to a specialist provider. As an analogy, many institutions use storage companies, such as Metrofile or The Document Warehouse, to store their physical files and computer backup disks rather than keeping this material on-site.

I believe this web-based approach offers considerable benefit for smaller institutions, specifically those that have significant collections but insufficient budget to run and maintain their own digital repositories.

How to Manage a Digital Repository?

Every repository must be managed using a range of best practices to ensure that it is not lost and that it is protected as well as possible against all future risks. If you cannot ensure this level of management, then seriously consider whether you should be building a repository in the first place.

When we look at recent natural disasters, such as the earthquakes in New Zealand and Japan, it is evident that the damage can be so widespread that the losses may not be only physical but may also include major backup sites and repositories. It is possible to guard against all such disasters, but the more risks we have to consider, the more expensive this becomes.

The management of a repository should consider at least the following: how it is protected against losses from natural and man-made disasters, how it is secured against unauthorized access both physical and digital, how it is preserved over the long term against changes in media and format, how changes to the repository are managed using a version control structure, how it is able to prove that it is a trusted repository, how it adapts to changing storage technologies, and how it adapts to changing access requirements.
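Several items on this list, preservation against media failure and knowing which files are originals, depend on being able to detect when a file has changed. A common technique is a periodic fixity check: record a checksum for every master, then recompute and compare on a schedule. The Python sketch below assumes SHA-256 and a baseline held as a simple dictionary (in practice it would be stored with the collection's inventory, away from the files themselves), and reports files that are missing, changed, or unexpectedly added. The function names are hypothetical. It detects change but cannot repair it; recovery still depends on having an independent good copy.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    result = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large masters do not exhaust memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        result[path.relative_to(root).as_posix()] = digest.hexdigest()
    return result


def audit(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare a stored baseline with a fresh snapshot of the same collection."""
    return {
        # In the baseline but gone now: lost or unreadable files.
        "missing": sorted(baseline.keys() - current.keys()),
        # Present in both but with a different checksum: corrupted or edited.
        "changed": sorted(p for p in baseline.keys() & current.keys()
                          if baseline[p] != current[p]),
        # New files that were never registered as masters.
        "added": sorted(current.keys() - baseline.keys()),
    }
```

Run against each copy of the collection, an empty report for all three categories is evidence that nothing has changed since the baseline was recorded; any "changed" entry flags exactly the kind of silent damage or unrecorded manipulation described in the scenarios earlier.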

There is no doubt in my mind that building a repository is the hard part of a digitization project, and yet it is an essential output of such a project. Taking photographs and storing them, and even making them available to users, is quite simple, but developing trusted repositories in which to house these digital resources for the long term, and to provide secure and authorized access, is a challenge for all institutions, and in particular for those with smaller budgets.

Roger is an Archival Platform correspondent and an IT consultant specializing in digital heritage, and is the creator of the ETHER Initiative. Contact Roger at roger@rl.co.za or view his website www.ether.co.za