Agency taxonomies are a tall order, experts say

An agency building an enterprisewide taxonomy should expect its design to contain more than a million categories, according to Claude Vogel, chief technology officer for search engine company Convera Corp. of Vienna, Va.

Vogel spoke yesterday in Washington at a Convera-hosted seminar on understanding and building taxonomies.

'People underestimate the magnitude of how big their taxonomies will be,' Vogel said, adding that commercial software, such as Convera's, can handle most, though not all, of the job.

Every federal agency must develop an overall taxonomy of its Web sites by the end of the year, as mandated by Section 207 of the 2002 E-Government Act. A taxonomy is a classification of documents and other information ordered in a single hierarchy. (See GCN's May 20, 2002, story.)

Taxonomies can be used to organize large numbers of electronic documents, said Rita Knox, a vice president and research director of the Stamford, Conn.-based IT analysis firm Gartner Inc. An advanced search engine can then use a taxonomy to point users more accurately to the appropriate content. Many search engines, such as those from Autonomy Inc. of San Francisco, Convera and Verity Inc. of Sunnyvale, Calif., can index documents in many different formats, such as Microsoft Word files, Excel spreadsheets and e-mails, into a taxonomy.

The trick to organizing all these documents, Knox said, is building a taxonomy around a central objective. Although IT workers are often tasked with this job, it is important to have domain or business experts work on the classifications and vocabulary, because they are more familiar with the context of the material, Knox said.

Knox said categorizing is the largest task in building a taxonomy, accounting for about 30 percent of the work. To cut development time, agencies can modify taxonomies already developed by other parties. She cited the site Taxonomy Warehouse as a good place to search for other taxonomies.

Once a taxonomy is formalized, documents can be indexed to it. A search tool such as Convera's can automatically populate the categories, though someone should check the results afterward to make sure the taxonomy adequately covers the topics, said Joshua Powers, the principal ontologist for Convera.
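To make the populate-then-check workflow concrete, here is a toy sketch in Python. It is not how Convera's or any other commercial engine classifies documents; it assumes simple keyword matching, and the names `Category`, `populate` and `empty_categories`, along with the sample categories and memos, are hypothetical. The review step Powers describes is approximated by listing documents that landed in no category and categories that received no documents, both signs that the taxonomy may not cover the material well.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical single-hierarchy taxonomy."""
    name: str
    keywords: set[str] = field(default_factory=set)
    children: list["Category"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def walk(node):
    """Yield every category in the tree, parent before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def populate(root, docs):
    """Assign each document to every category whose keywords appear in it.

    Toy keyword matching only (an assumption, not a vendor's method).
    Returns the IDs of documents that matched no category.
    """
    unassigned = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        matched = False
        for cat in walk(root):
            if cat.keywords & words:
                cat.documents.append(doc_id)
                matched = True
        if not matched:
            unassigned.append(doc_id)
    return unassigned


def empty_categories(root):
    """Categories with keywords but no documents: candidates for human review."""
    return [c.name for c in walk(root) if c.keywords and not c.documents]


# Hypothetical sample taxonomy and documents.
root = Category("Agency", children=[
    Category("Records", {"records", "archive"}),
    Category("Procurement", {"contract", "bid"}),
    Category("Facilities", {"building", "lease"}),
])
docs = {
    "memo1": "New records archive policy",
    "memo2": "Contract award notice",
    "memo3": "Pandemic telework guidance",
}

unassigned = populate(root, docs)
empty = empty_categories(root)
print("Documents with no category:", unassigned)
print("Categories with no documents:", empty)
```

Here `memo3` falls outside every category and `Facilities` stays empty, which is the kind of gap a human reviewer would then resolve by adding a category or revising keywords.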

In addition to taxonomies, ontologies can be used to refine the results, Vogel said. Ontologies are a higher-level ordering technique, one that maps how taxonomy categories relate to one another.

'Ontologies are for modeling, taxonomies are for indexing,' Vogel said.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.
