On March 3, 2009, the building housing the Cologne Municipal Archives collapsed, crushing or drowning 50,000 charters dating back to the Middle Ages, documents salvaged from buildings bombed in the Second World War, the private papers of Nobel Prize winner Heinrich Böll, and much more. Among the comments on news articles at the time were many along the lines of “Well, why wasn’t it all just digitized?” Digitization underpins The Commons, so we’ve asked just that question within the Commons community. Here’s one answer.
Why isn’t everything digitized yet?
Archivists hear this question a lot, usually in conjunction with “Can’t you just slap it on the scanner?” And the quick and dirty approach may work for a few things, or even a few hundred, but it doesn’t really scale. Say you’ve got a collection of 1,000 photographs — you can scan them, give them some sort of file names, and create a web page to display them. It kind of works if they’re all related in some way, you give them tags or let others tag them, and you have some sort of organized way of storing the masters. That’s the way a lot of archives started digitizing their collections.
But then you digitize a second collection, and a third, and a fourth, and suddenly you’ve got five or ten thousand images to manage. And maybe (hopefully) you’ve got a database with metadata that describes the images and the files. And in addition to the collections, which have some kind of internal logic, you’ve also got those miscellaneous things that you scanned for special projects or individual requests.
Brooklyn Museum's ScanLab
At that point, you’re thinking about implementing a digital asset management software system — expensive, time-consuming, but really necessary after you hit a certain volume. And you’re working with your IT staff to make the flow of images to your website happen automatically without custom coding and design, and to make it easy for people to find things. And, if you’re really thinking volume (which you’ll have to be, if your goal is to digitize everything), you’re adding staff and designing efficient workflows to move from analog to website smoothly.
OK, so this is all doable (I know, because we’ve been doing it). But what’s it going to take to get everything digitized? First of all, photographs and museum objects (where a lot of us started) are easy: each one is a unique item and, while there may well be relationships with other images (context, in archives speak), the interrelationships are less critical than for other archival materials.
Aerial Mosaics Case, OSU Archives
Think, for example, of a folder of letters. One letter is of interest, but you really need to read the entire flow of the correspondence, in chronological order, to get the most out of it. A letter may have multiple pages and it may have attachments like drawings or notes. The folder may be related to other folders, and you may need to know something about who the players are and why the files were created. This is all the standard business of archives and archivists, who have great ways of dealing with these things in the analog world, but translating that into digitized collections is difficult. Most software packages don’t deal well with the hierarchies (“this item is part of this larger thing which is part of this even larger group and they all share these common elements”) or with the interrelationships.
Archivists have been working on the issues for several years and have come up with ways to deal with “complex digital objects” — the METS standard, for example, lets you create XML “wrappers” around related digital files — but this is far from “slap it on the scanner.” We’re all getting very friendly with our technology folks … and if we don’t have them, we’re either spending lots of money on consultants, learning it ourselves, or just sticking to the simple stuff.
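To give a feel for what such a “wrapper” involves, here is a minimal sketch in Python that ties the page scans of one letter together in page order. The element and attribute names (fileSec, fileGrp, file, FLocat, structMap, div, fptr) come from METS, but the function name, file paths, and label are hypothetical, and the output has not been validated against the METS schema. A real record would also carry descriptive and administrative metadata.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified name in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_letter_wrapper(letter_label, page_files):
    """Hypothetical helper: wrap one letter's page scans in a METS-style record."""
    root = ET.Element(q(METS_NS, "mets"), {"LABEL": letter_label})

    # fileSec: an inventory of the digital files that make up the object.
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})

    # structMap: the hierarchy ("this page is part of this letter").
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    letter_div = ET.SubElement(
        struct_map, q(METS_NS, "div"), {"TYPE": "letter", "LABEL": letter_label}
    )

    for order, path in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(
            file_grp, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": "image/tiff"}
        )
        ET.SubElement(
            file_el,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): path},
        )
        # Each page div points back at its file, preserving reading order.
        page_div = ET.SubElement(
            letter_div, q(METS_NS, "div"), {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page_div, q(METS_NS, "fptr"), {"FILEID": file_id})

    return root


# Illustrative, made-up file names for a two-page letter.
wrapper = build_letter_wrapper(
    "Letter, two pages", ["masters/letter_p1.tif", "masters/letter_p2.tif"]
)
xml_text = ET.tostring(wrapper, encoding="unicode")
```

Even this toy version needs a decision about page order, file naming, and what counts as one “object,” which is exactly the work that a scanner alone doesn’t do.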
The scale of “getting everything digitized” is just mind-boggling. In our small archives at the Brooklyn Museum, we have about 1,600 feet of documents, photographs, negatives, and ledger books, in just about any analog format you can imagine, covering the Museum’s history from 1823 to the present. Here’s the math: at an estimated 3,000 documents per foot, that’s 4.8 million items. Even if you could scan, describe and process 30 per hour (highly unlikely), that’s 160,000 hours of work, or 20,000 eight-hour workdays.
If you saved a 20 MB master file (or even half that size) for each of those 4.8 million documents, you would need roughly 96 terabytes (or 48 at half the size). That’s serious storage and backup, not to speak of long-term management and preservation. And that’s just for a really, really small repository!
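For anyone who wants to run the same back-of-the-envelope math on their own holdings, here is a small Python sketch. The default figures are the estimates given above; the function name is made up for illustration, storage uses decimal units (1 TB = 1,000,000 MB), and the 250 working days per year is my own assumption, not a number from our archives.

```python
def estimate_digitization(
    feet_of_records=1_600,    # linear feet of material (from the text)
    docs_per_foot=3_000,      # estimated documents per foot (from the text)
    docs_per_hour=30,         # optimistic scan/describe/process rate (from the text)
    hours_per_workday=8,
    master_file_mb=20,        # size of one master file, in megabytes
    workdays_per_year=250,    # ASSUMPTION: not from the text
):
    """Rough workload and storage estimate for digitizing an entire archive."""
    items = feet_of_records * docs_per_foot
    hours = items / docs_per_hour
    workdays = hours / hours_per_workday
    return {
        "items": items,
        "hours": hours,
        "workdays": workdays,
        "staff_years": workdays / workdays_per_year,
        "storage_tb": items * master_file_mb / 1_000_000,
    }


estimate = estimate_digitization()
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

With the defaults this reproduces the 4.8 million items, 160,000 hours, and 20,000 workdays above; under the assumed 250-day working year, that is about 80 staff-years. Change any input to see how quickly the totals move.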
So, we all have to make choices. We look for the things that are most interesting to our community; that are most fragile and need a digital version to reduce handling; that relate to things we or others have already digitized; and (frankly) that funders are likely to support.
Deborah Wythe is Head of Digital Collections and Services at the Brooklyn Museum, in Brooklyn, New York.