Need for hierarchies makes taxonomy an essential ingredient in content management
- By Joab Jackson
- Apr 23, 2004
A taxonomy is critical to running a large Web site with a content management system because the software requires that site documents and other information be classified in hierarchies.
Designing an accurate taxonomy also will save time and energy in complying with Section 207 of the E-Government Act of 2002, which requires each agency to develop an overall taxonomy of its sites by the end of this year.
Any agency building an enterprisewide taxonomy should anticipate more than a million categories, said Claude Vogel, chief technology officer for search engine vendor Convera Corp. of Vienna, Va.
'People underestimate how big their taxonomies will be,' Vogel said. Commercial software such as his company's Taxonomy Workbench can handle most, though not all, of the job, he said.
About 30 percent of the work of building a taxonomy is setting up the categories, said Rita Knox, a vice president and research director for Gartner Inc. of Stamford, Conn.
Although IT workers are often assigned to do this job, it is essential for administrative and domain experts to work on the classifications and vocabulary because they are more familiar with the material and its context, she said.
To cut development time, agencies can modify taxonomies already developed by others.
Knox cited the Taxonomy Warehouse, at www.taxonomywarehouse.com, as a good place to start.
Web content management systems help an agency's content providers encode each document in the taxonomy with its necessary metadata.
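As a rough illustration of what classifying documents in a hierarchy with metadata can look like, here is a minimal Python sketch. The `Category` class, the `tag_document` function, and the sample category names are hypothetical and are not drawn from any product mentioned in this article; a real content management system would store far richer metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical taxonomy tree."""
    name: str
    children: dict = field(default_factory=dict)

    def add_path(self, path):
        """Create nested categories for a path such as ['Benefits', 'Health']."""
        node = self
        for part in path:
            node = node.children.setdefault(part, Category(part))
        return node

    def has_path(self, path):
        """Return True if every category along the path already exists."""
        node = self
        for part in path:
            if part not in node.children:
                return False
            node = node.children[part]
        return True


def tag_document(taxonomy, title, path):
    """Build a metadata record, rejecting categories missing from the taxonomy."""
    if not taxonomy.has_path(path):
        raise ValueError(f"unknown category: {' > '.join(path)}")
    return {"title": title, "category": " > ".join(path)}


# Illustrative data only: the category names are made up.
root = Category("Agency")
root.add_path(["Benefits", "Health"])
record = tag_document(root, "Enrollment guide", ["Benefits", "Health"])
```

Rejecting unknown categories at tagging time is one way to keep content providers from drifting away from the agreed vocabulary.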
Another tool that aids in building a taxonomy is automated classification software, available from vendors such as Stratify Inc. of Mountain View, Calif., and Verity Inc. of Sunnyvale, Calif. These companies' tools can automatically generate a taxonomy from a set of documents.
'Statistically, the results can be pretty darn close to human levels,' said Ron Daniel, a consultant with Taxonomy Strategies of Concord, Calif. Daniel spoke at a recent conference on content management systems sponsored by the Digital Government Institute of Washington.
Classification software works best on documents that need careful scrutiny, because human workers reviewing such documents get bored and tend to make errors. The software's own errors tend to be fairly glaring, Daniel said, so a human editor should look over the results.
'The best practice is semi-automation, where a machine suggests something and people look at it to correct the errors,' Daniel said.
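The semi-automated workflow Daniel describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword-counting `suggest_category` function is a deliberately crude stand-in for commercial classification software, and the `review` callback, sample documents, and corrections are all hypothetical.

```python
def suggest_category(text, keywords_by_category):
    """Suggest the category whose keywords appear most often in the text."""
    words = text.lower().split()
    scores = {
        category: sum(words.count(keyword) for keyword in keywords)
        for category, keywords in keywords_by_category.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def classify_with_review(docs, keywords_by_category, review):
    """The machine suggests a category; the reviewer accepts or corrects it."""
    results = {}
    for doc_id, text in docs.items():
        suggestion, score = suggest_category(text, keywords_by_category)
        results[doc_id] = review(doc_id, suggestion, score)
    return results


# Illustrative data only.
keywords = {"Health": ["health", "medical"], "Grants": ["grant", "funding"]}
docs = {
    "doc1": "medical and health benefits enrollment",
    "doc2": "apply for a research grant with funding",
    "doc3": "health health clinic grant",  # the machine will guess Health
}

# Stand-in for a human editor who overrides one glaring error.
corrections = {"doc3": "Grants"}
results = classify_with_review(
    docs, keywords, lambda doc_id, suggestion, score: corrections.get(doc_id, suggestion)
)
```

The point of the design is that the reviewer only has to overrule the suggestions that are wrong rather than classify every document from scratch.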
Joab Jackson is the senior technology editor for Government Computer News.