Problems With Archiving Electronic Materials May Make 20th Century "Worst-Documented" Period In History

Release Date: November 23, 1998

BUFFALO, N.Y. -- The electronic age has transformed nearly all fields of human endeavor and one of the conundrums in its wake -- how to preserve historic records -- turns out to be enormous and ironic.

Christopher Densmore, archivist at the University at Buffalo, speaks for an international network of archivists when he says that because of the explosion in information technologies, the late 20th century will be one of the worst-documented periods in history.

The problem, he says, is that preserving information produced and stored in digital form is far more difficult, time-consuming and expensive than saving documents on paper and microfiche. Continuing improvements in electronic media of all kinds have provoked legal, organizational and financial nightmares for archivists, librarians, museums and other information administrators.

"New technologies allow us to produce, alter and dispose of records and documents with unusual efficiency and facility," Densmore acknowledges.

He and his colleagues, however, have serious concerns about the stability, longevity and historical significance of computer-generated and electronically filed documents, whose preservation involves considerable expense and many complications.

They warn that however jacked-in you are or however many killer software applications come down the pike, if you want to ensure that your work product and process will be available to future scholars, it's a good idea to save them in hard copy.

It's impossible, of course, for paper copies to capture the nature of many complex, ephemeral, colorful, often-animated and scored electronic documents with their hypertext references and links to a daunting phalanx of Web sites around the world. This indicates the enormous difficulty of archiving such records for historical, legal and other purposes. Still, a prodigious effort has been launched by archivists to avoid future problems.

In defining the problem, experts say that as we increasingly digitize research, literature, journals, financial and tax records, legal documents, family photos and even those email love letters, we should be aware of two facts in particular.

"One," Densmore says, "is that digital information is extremely fragile. Little is known about the stability even of old technologies like magnetic tape, which lasts only about 10 years. Much less is known about the generations of disks (floppy or hard) and CDs that have evolved so far." It is known, however, that magnetic impulses deteriorate and that various coatings and physical materials used in these products degenerate at different rates under different conditions.

"Second," he adds, "little is known about how to retrieve information from the hundreds of different kinds of obsolete hardware and software that produced and now store millions of significant documents. They were, no doubt, stored this way with the assumption that they would be readily and permanently available for use, which turns out not to be the case. It is safe to assume that today's documents will be equally difficult to retrieve using tomorrow's hardware and software."

Citing a 1996 report on online electronic documents and distributed databases produced by the SUNY Office of Archives and Records Management, Densmore notes that electronic information systems are not inherently designed to serve as record-keeping systems in the archival sense of that term.

"No one knows how stable electronic files are or how long they'll last," he says. "Right now, the average book published on acid-free paper by university presses and stored in a library is expected to remain useable for 500 years. That's the archival standard for paper documents, photographs and microfilm. So material stored today in those formats will be available to our progeny in the year 2498.

"Contrast this with the stuff of floppies, which, with great care, might last until 2028 and, without care, only 10 years, or until 2008. Optical disks might survive intact until 2058," he says. "No one really knows for sure."
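The dates Densmore quotes follow from simple arithmetic on the release year, 1998. The short Python sketch below reproduces them. The 500-year and 10-year lifespans are stated in the text; the 30-year and 60-year figures are not stated directly and are inferred here from the quoted years 2028 and 2058. The names RELEASE_YEAR, LIFESPAN_YEARS and usable_until are illustrative only.

```python
# Rough sketch of the date arithmetic behind the quoted figures.
# Assumption: every lifespan is counted from 1998, the year of this release.
RELEASE_YEAR = 1998

# Lifespans in years. 500 and 10 are stated in the text; 30 and 60 are
# inferred from the quoted end years (2028 and 2058), not stated directly.
LIFESPAN_YEARS = {
    "acid-free paper, photographs, microfilm": 500,
    "floppy disk, with great care": 30,
    "floppy disk, without care": 10,
    "optical disk": 60,
}


def usable_until(stored_in: int = RELEASE_YEAR) -> dict:
    """Map each medium to the last year it might still be usable."""
    return {
        medium: stored_in + years
        for medium, years in LIFESPAN_YEARS.items()
    }


if __name__ == "__main__":
    for medium, year in usable_until().items():
        print(f"{medium}: about {year}")
```

As Densmore stresses, the figures for floppies and optical disks are guesses rather than measured lifespans, so the output is only as reliable as those estimates.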

As Jeff Rothenberg noted in Scientific American four years ago, today's CDs may last for 30 years and tomorrow's DVDs (the next generation of CDs) may last for 50. Even if such materials are stable for hundreds of years, however, they won't be readable unless the hardware and the operating system that produced them are available.

Given the variety of hardware and software programs that have been heralded and then discarded over the years, Rothenberg is describing a colossal retrieval headache.

Densmore agrees.

"Already, information produced by now-defunct software on retired hardware can't be read by today's computers," he says. "And it is very difficult to find an old computer that can read it for several reasons. Not only may the document be readable only by a specific generation of old computer, but by vintage software that may not be available any more.

"Of course, today's computers and programs will eventually be defunct themselves, soon enough, raising questions about the viability of documents being produced as I speak. On top of that, although most disks have their interface system on the disk itself, computer-operating systems also degenerate, so although they may look like they work, they may be useless."

Densmore acknowledges that technological changes in records production were an issue among archivists long before the dawn of the new electronic media.

"Archivists like to have the authentic original document in their collection for evidentiary reasons," Densmore points out.

He explains that a file kept by a particular office or official is the "official file" and, archivally speaking, ideally contains the original documents produced by that office or individual. Control of that original, official material can be lost through copying: the file may hold a copy rather than the original, or copies may have gone to everyone, so that everybody has the file or parts of it and no one may have the original documents.

"Years ago, mimeographing and photocopying, for instance, raised problems because they produced multiple copies of records, which later were found in the hands of many people. Now it's possible that all the copies are identical to the original," Densmore says. "It's also the case that the original or its 'copy' could be altered -- perhaps officially -- and copied again, making it very difficult to identify and authenticate the original document."

Today, he says, printers produce virtually identical copies, with none of the degeneration manifested by mimeos and Xerox copies, so it is almost impossible to identify the original document: every copy has the same characteristics.
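The same point holds for the electronic files themselves, and a few lines of Python can illustrate it. The sketch below is not a method described by Densmore. It assumes a made-up memo and uses SHA-256 from Python's standard hashlib module as one possible content fingerprint; the name fingerprint is illustrative.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of the content (an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical memo text, invented for this example.
original = b"Memo: approve the new science building."
copy = bytes(original)  # a digital copy carries exactly the same bytes
edited = original.replace(b"approve", b"postpone")  # a later alteration

# The copy cannot be told apart from the original by its content.
copies_match = fingerprint(original) == fingerprint(copy)

# Any edit, however small, produces a different fingerprint.
edit_detected = fingerprint(original) != fingerprint(edited)

if __name__ == "__main__":
    print("copy matches original:", copies_match)
    print("edit changes fingerprint:", edit_detected)
```

A fingerprint of this kind can show that two files differ, but it cannot say which one came first or who changed it, which is the authentication problem described above.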

"Also, because documents are frequently mailed electronically," he says, "the archivist may not know who received copies and was, therefore, in on a decision. The original item also may have been edited electronically, making it very difficult to document the process by which the decisions were made to change a curriculum, build a science building, promote a professor."

New issues that confront archivists are perplexing and difficult to resolve. One of them is the question of whether libraries should maintain a collection of equipment and operating systems that can read old electronic materials. This would be a formidable task.

Densmore cites another new problem as well. Today, photographs taken with digital cameras are stored on Zip disks or other media. The image files take up a great deal of storage space, so they are often discarded to make room for new digital photos, and the primary document is gone even though the image has been used. He warns that published versions of such photos are not nearly as reproducible as traditional negatives and that, even if the files are saved, the disks may not be readable in the not-so-distant future.

Even more difficult to deal with is the electronic manipulation of photographs that are then used to "document" an activity or person in a more attractive form.

"The result may be more appealing visually," Densmore admits, "but if you go around straightening or whitening teeth, moving trees around, changing hair color or adding characters to a scenario, altering bodies to conform to current standards of attractiveness, then you are no longer documenting fact, but producing an aesthetic, but inaccurate, document that may not reflect reality at all. That's fine for advertising, I suppose, but historians trying to keep a record need to know how accurate these things are."

Finally, in reviewing the scope of the changes that confront archivists in this regard, he says that the biggest problem may not be technology, planning or the availability of trained personnel, but the cost of these undertakings.

"It is an expensive and enormously complex task to maintain old hardware and software so library users in 2030 can read what a professor typed into his Mac Classic six years ago or into several incarnations of Dell PCs from 1994," he admits.

"It is also expensive to regularly migrate vast bodies of software to new generations of technology and impossible to maintain the depth of reference of the hypertext originals.

"So we'll have to carefully assess these costs and compare them to the costs of traditional methodologies as we set about to digitize the entire contents of a library, for instance, or accept archival materials in digital form, which will require expensive upkeep."

EDITOR'S NOTE: Christopher Densmore can be reached in the University Archives at the University at Buffalo at 716-645-2916. In addition to offering perspectives on the challenges of preserving electronic documents, Densmore can discuss specific steps that can be taken to preserve and maintain both institutional and personal electronic documents.
