Research zeroes in on handling large data sets


REDONDO BEACH, Calif. - Finding ways to deal with large data sets, archives and collections is attracting government research dollars.

At the National Conference on Digital Government Research last month, researchers presented papers on future technologies to improve the collection, organization and dissemination of large and distributed databases.

For example, Sherri Harms, a computer science professor at the University of Missouri, demonstrated an experimental data mining technique that could assimilate data from historical agriculture records, weather sensors throughout the world, crop models and other sources to come up with accurate local drought predictions.

Watch that data change

Her approach and other research projects receive grants from the National Science Foundation's Information Technology Research program, which sponsored the conference, dubbed DG.O.

According to Eduard Hovey, professor and co-director of the computational linguistics program at the University of Southern California, geospatial data is a hot research area. Viewing data changes over time is a particular challenge, he said.

"The brain organizes things temporally and spatially," but databases don't, he said.

Another topic of interest is how to gather large amounts of data in a short span of time.

"At tax time, the IRS has the need to upload millions of documents overnight; they need it today but not tomorrow," Hovey said. Several academic researchers are working on similar problems.

Eternal backups

Still another research focus, said Deborah Noble, external relations manager for the University of Southern California's Information Sciences Institute, is maintaining and migrating data that agencies must, by law, archive in perpetuity.

The San Diego Supercomputer Center at the University of California, San Diego, is working with the National Archives and Records Administration on the problem, which extends to government videos and Web sites.

This type of research is popular with Congress and the new administration, said Larry Brandt, program manager for NSF's Digital Government Research program. Two years ago, his program was funded at $2.2 million and this year at $8.4 million.

