Slap It on the Scanner

Posted by dwythe in Articles
On March 3, 2009, the building housing the Cologne Municipal Archives collapsed, crushing or drowning 50,000 charters dating back to the Middle Ages, documents salvaged from buildings bombed in the Second World War, the private papers of Nobel laureate Heinrich Böll, and much more. Among the comments on news articles at the time were many along the lines of “Well, why wasn’t it all just digitized?” Digitization underpins The Commons, so we’ve asked just that question within the Commons community. Here’s one answer.


Why isn’t everything digitized yet?

Epson V700, by Ryner12

Archivists hear this question a lot, usually in conjunction with “Can’t you just slap it on the scanner?” And the quick and dirty approach may work for a few things, or even a few hundred, but it doesn’t really scale. Say you’ve got a collection of 1,000 photographs — you can scan them, give them some sort of file names, and create a web page to display them. It kind of works if they’re all related in some way, you give them tags or let others tag them, and you have some sort of organized way of storing the masters. That’s the way a lot of archives started digitizing their collections.

But then you digitize a second collection, and a third, and a fourth, and suddenly you’ve got five or ten thousand images to manage. And maybe (hopefully) you’ve got a database with metadata that describes the images and the files. And in addition to the collections, which have some kind of internal logic, you’ve also got those miscellaneous things that you scanned for special projects or individual requests.

Brooklyn Museum's ScanLab

At that point, you’re thinking about implementing a digital asset management software system — expensive, time-consuming, but really necessary after you hit a certain volume. And you’re working with your IT staff to make the flow of images to your website happen automatically without custom coding and design, and to make it easy for people to find things. And, if you’re really thinking volume (which you’ll have to be, if your goal is to digitize everything), you’re adding staff and designing efficient workflows to move from analog to website smoothly.

OK, so this is all doable (I know, because we’ve been doing it). But what’s it going to take to get everything digitized? First of all, photographs and museum objects (where a lot of us started) are easy: each one is a unique item and, while there may well be relationships with other images (context, in archives speak), the interrelationships are less critical than for other archival materials.

Aerial Mosaics Case, OSU Archives

Think, for example, of a folder of letters. One letter is of interest, but you really need to read the entire flow of the correspondence, in chronological order, to get the most out of it. A letter may have multiple pages and it may have attachments like drawings or notes. The folder may be related to other folders and you may need to know something about who the players are and why the files were created. This is all the standard business of archives and archivists, who have great ways of dealing with these things in the analog world, but translating that into digitized collections is difficult. Most software packages don’t deal with the hierarchies (“this item is part of this larger thing which is part of this even larger group and they all share these common elements”) or interrelationships well at all.
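To make the hierarchy problem concrete, here is a minimal Python sketch of a "part of a larger thing" structure in which each level inherits the shared elements of the levels above it. The `Node` class, its method names, and the sample collection are all made up for illustration; this is not how any particular collections system works.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of a hypothetical archival hierarchy (collection, folder, letter, page)."""
    label: str
    metadata: dict = field(default_factory=dict)
    # The parent link is left out of repr/compare to avoid walking the tree in circles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so levels can be built top-down."""
        child.parent = self
        self.children.append(child)
        return child

    def effective_metadata(self) -> dict:
        """Merge metadata from the top level down; the nearest level wins on conflicts."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}

    def path(self) -> str:
        """Spell out the context: which larger things this item belongs to."""
        prefix = self.parent.path() + " > " if self.parent else ""
        return prefix + self.label


# Invented sample data, for illustration only.
collection = Node("Director's records", {"creator": "Office of the Director"})
folder = collection.add(Node("Correspondence, 1935", {"date": "1935"}))
letter = folder.add(Node("Letter, 3 March", {"date": "1935-03-03"}))
page = letter.add(Node("Page 2"))

print(page.path())
print(page.effective_metadata())
```

Even the lowly "Page 2" knows its creator and date, because it picks them up from the letter, folder, and collection above it. That inherited context is exactly what a flat list of image files loses.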

Archivists have been working on the issues for several years and have come up with ways to deal with “complex digital objects” — the METS standard, for example, lets you create XML “wrappers” around related digital files — but this is far from “slap it on the scanner.” We’re all getting very friendly with our technology folks … and if we don’t have them, we’re either spending lots of money on consultants, learning it ourselves, or just sticking to the simple stuff.
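For a rough idea of what such a wrapper looks like, the Python sketch below builds a heavily simplified METS-style document for a two-page letter: a file section listing the scans and a structural map putting the pages in order. The element and attribute names follow the METS schema as I understand it, but the output has not been validated against the schema, and the function name, file names, and labels are invented for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_wrapper(label: str, page_files: list) -> ET.Element:
    """Wrap a list of page scans (hypothetical file names) in one METS-style document."""
    mets = ET.Element(q(METS_NS, "mets"), {"LABEL": label})

    # File section: an inventory of the digital files.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})

    # Structural map: how those files fit together as one letter.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    letter = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "letter", "LABEL": label})

    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(
            file_grp, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": "image/tiff"}
        )
        ET.SubElement(
            file_el, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): href}
        )
        page = ET.SubElement(letter, q(METS_NS, "div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": file_id})

    return mets


wrapper = build_wrapper("Letter, 3 March 1935", ["letter_p1.tif", "letter_p2.tif"])
xml_text = ET.tostring(wrapper, encoding="unicode")
print(xml_text)
```

A real METS record would also carry descriptive and administrative metadata, which is where much of the describing and processing time goes.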

The scale of “getting everything digitized” is just mind-boggling. In our small archives at the Brooklyn Museum, we have about 1,600 feet of documents, photographs, negatives, and ledger books (just about any analog format you can imagine), covering the Museum’s history from 1823 to the present. Here’s the math: at an estimated 3,000 documents per foot, that’s 4.8 million items. Even if you could scan, describe and process 30 per hour (highly unlikely), that’s 160,000 hours of work, or 20,000 eight-hour workdays.
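The same back-of-the-envelope math as a few lines of Python, so you can plug in your own repository's numbers. The first four figures come from the estimate above; the 250 workdays per year used for the person-years line is my own assumption, not a figure from our shop.

```python
FEET_OF_RECORDS = 1_600      # linear feet in the archives
ITEMS_PER_FOOT = 3_000       # estimated documents per foot
ITEMS_PER_HOUR = 30          # optimistic scan-describe-process rate
HOURS_PER_WORKDAY = 8
WORKDAYS_PER_YEAR = 250      # assumption: roughly 50 five-day weeks

items = FEET_OF_RECORDS * ITEMS_PER_FOOT
hours = items / ITEMS_PER_HOUR
workdays = hours / HOURS_PER_WORKDAY
person_years = workdays / WORKDAYS_PER_YEAR

print(f"{items:,} items")                    # 4,800,000 items
print(f"{hours:,.0f} hours")                 # 160,000 hours
print(f"{workdays:,.0f} workdays")           # 20,000 workdays
print(f"{person_years:,.0f} person-years")   # 80 person-years
```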

If you saved a 20 MB master file for each of those 4.8 million documents, that’s roughly 96 terabytes (48 even at half that size): serious storage and backup, not to speak of long-term management and preservation. And that’s just for a really, really small repository!
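A quick sketch of the storage arithmetic, assuming the master size is in megabytes and using decimal units (1 TB = 1,000,000 MB). The function name is invented for the example, and the result covers a single copy of the masters only, before any backups or derivative files.

```python
def storage_terabytes(item_count: int, master_megabytes: float) -> float:
    """Total size of one copy of the master files, in decimal terabytes."""
    return item_count * master_megabytes / 1_000_000


ITEM_COUNT = 4_800_000  # from the estimate above

for size_mb in (20, 10):
    total_tb = storage_terabytes(ITEM_COUNT, size_mb)
    print(f"{size_mb} MB masters: {total_tb:,.0f} TB")
```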

So, we all have to make choices. We look for the things that are most interesting to our community; that are most fragile and need a digital version to reduce handling; that relate to other things we or others have digitized; and (frankly) that funders are likely to support.


Deborah Wythe is Head of Digital Collections and Services at the Brooklyn Museum, in Brooklyn, New York.

4 Responses to “Slap It on the Scanner”

  1. Preserving and digitizing our past can be a daunting project Says:

    [...] types of projects, I am beginning to understand how huge some of these undertakings can be.  Just read the blog from Deborah Wythe, who is Head of Digital Collections and Services at the Brooklyn Museum, in [...]

  2. Paula Bray Says:

    ‘We look for the things that are most interesting to our community’

    This resonates with me a lot at the moment. Deb is right. We can never digitise everything. So we make decisions on what to digitise. But are we making the right choices? Many institutions digitise content with no copyright restrictions, or around current exhibitions and projects, which is fine, but the representation of the collections can become skewed. We should also be looking towards a model of digitising what our audience wants and what has multiple outputs. This is something we (Powerhouse Museum) are learning from the Flickr community.

  3. Why isn’t everything digitized yet? « WITNESS Media Archive Says:

    [...] everything digitized yet? Jump to Comments A few weeks ago Indicommons featured an excellent blog post  by Deborah Wythe,  Head of Digital Collections and Services at the Brooklyn Museum.  She poses the question many [...]

  4. indicommons» Blog Archive » Brooklyn Museum focuses on copyright Says:

    [...] Wythe, Head of Digital Collections and Services (and a past contributor to Indicommons), writes in the Brooklyn’s blog about the complex decision-making and time-consuming [...]
