Library goes online bit by bit







After digitizing about 1.5 million items from its special collections for the National
Digital Libraries program, the Library of Congress continues to process other items for
Web access at the rate of about 1 million files a year.


In spite of the experience gained in preserving old items by reducing them to bits,
each collection still presents new difficulties.


“There are a lot of practical considerations that make this not a cookie-cutter
operation,” said Jane Mandelbaum, who leads one of the library’s systems
development groups.


The library hired a number of contractors for National Digital Libraries projects. It
also assembled its own tools to produce, display and manage data, but no single platform
so far can handle the complex range of jobs. Different media types, physical conditions
and kinds of cataloging information make each job unique.


“When we started, there were no tools,” said Laura Campbell, the program
director. “My impression is, there are still not enormous numbers of ready-made
solutions.”


Each collection becomes a research project that involves testing new scanners, document
handlers and software, plus prototyping new ways of cataloging, indexing and tracking
information.


The National Digital Libraries program began with the American Memory pilot, which from 1990
to 1995 produced a number of CD-ROMs and videodiscs of Library of Congress materials. The
library demonstrated an early model for online access in 1994, and work on Web access began
in 1995. The results of the American Memory and National Digital Libraries programs appear
on the Web at http://memory.loc.gov/ammem.


Campbell said the special collections hold 80 million items—music, maps,
photographs, prints, manuscripts, typescripts, rare books, and video and audio recordings.


Books cannot be handled in the same way as manuscripts, and a brittle typescript cannot
be handled like a photo or poster. Further complicating the job are varying degrees of
detail and types of cataloging information.


For a digital object to be accessible, it must have metadata to describe it, further
data about its structure and more data about its management within the library system.
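

The article does not describe the library’s record format; purely as a hypothetical sketch in
Python, a single digital object might carry all three kinds of metadata in a record like the
one below, with every field name and value invented for illustration:

    # Hypothetical sketch: the three kinds of metadata a digital object needs.
    # Field names and values are invented for illustration; this is not the
    # Library of Congress' actual schema.
    digital_object = {
        # Descriptive metadata: what the item is, for search and display
        "descriptive": {
            "title": "Federal Theatre Project playbill",
            "date": "1936",
        },
        # Structural metadata: how the files fit together as one object
        "structural": {
            "pages": ["page-001.tif", "page-002.tif"],
        },
        # Administrative metadata: how the object is managed over time
        "administrative": {
            "persistent_id": "example:ftp-0001",  # placeholder identifier
            "scan_resolution_dpi": 300,
            "rights_status": "public domain",
        },
    }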


“We’re trying to develop a digital system that will track this material for
all time,” said Martha Anderson, a senior digital conversion specialist who
coordinates production.


Another goal is a permanent storage system, separate from the current one on Unix servers,
that can accommodate the persistent identifiers as well as the objects themselves. “This is
still very much in development,” Anderson said.


Most of the library’s digitized Web materials as well as the software servers
managing the objects reside on IBM Corp. RS/6000 servers running the AIX operating system.
On a separate RS/6000, the Internet gateway receives and translates Hypertext Transfer
Protocol object requests.


The library relies on a range of hardware and software to get the materials onto the
servers. “You can have a fancy, expensive scanner that scans only fancy, expensive
things,” Mandelbaum said.


In digitizing 2,000 images from its Federal Theatre Project, the library used a
Pro/3000 high-resolution scanner with a PS/2 workstation running the PISA95 scanning
application, all from IBM’s Digital Library product suite.


“That’s a very good scanner for its purpose,” Mandelbaum said. “You
want to use it for scanning manuscripts because a manuscript is usually many pages of
black and white, and there is lots of handling.”


A PS 3000 overhead scanner from Minolta Corp.’s Business Products Division of
Ramsey, N.J., scans the books, which cannot be opened all the way for flat scanning,
Anderson said.


The Federal Theatre Project test bed produced two images of each of 10,000 pages: a
preservation-quality image, in grayscale at eight bits per pixel or in color at 24 bits per
pixel, and a black-and-white, access-quality image at just one bit per pixel.
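

The article does not give file sizes, but the tradeoff between the two bit depths is easy to
estimate. As a rough illustration only, assuming an 8.5- by 11-inch page scanned at 300 dots
per inch and stored uncompressed (the library’s actual scanning parameters are not stated),
a short Python calculation shows the spread:

    # Rough uncompressed size of one scanned page at different bit depths.
    # Page dimensions and 300-dpi resolution are assumptions for illustration.
    width_in, height_in, dpi = 8.5, 11, 300
    pixels = int(width_in * dpi) * int(height_in * dpi)  # 2,550 x 3,300, about 8.4 million pixels

    for label, bits_per_pixel in [("access, bitonal", 1),
                                  ("preservation, grayscale", 8),
                                  ("preservation, color", 24)]:
        megabytes = pixels * bits_per_pixel / 8 / 1_000_000
        print(f"{label}: about {megabytes:.1f} MB uncompressed")

    # access, bitonal: about 1.1 MB uncompressed
    # preservation, grayscale: about 8.4 MB uncompressed
    # preservation, color: about 25.2 MB uncompressed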


The access-quality images were supposed to convey information without being complete
facsimiles, but they often failed to reproduce enough information to be useful.


One goal of the National Digital Libraries program is to develop an architecture for
digitizing and managing materials.


So far, the program has established the base requirements for a modular,
standards-based open system. But building it will take a while, Campbell said.  



About the Author

William Jackson is a Maryland-based freelance writer.
