White House: Web site doesn't steer clear of Iraq

The White House today dismissed charges that its Web site was deliberately guiding search engines away from pages about Iraq, saying its Web team was only trying to avoid duplication.

'It's ludicrous,' said White House spokesman Jimmy Orr, replying to a charge issued by a Democratic Party Web site.

Search engine spiders, which index content on the Internet, were directed away only from indexing duplicate pages, Orr said.

'All the material on the White House Web site is fully searchable by our search engine,' Orr said.

Orr was responding to a minor tempest arising from a Web page authored by Keith Spurgeon, a New York resident who works in the Internet industry.

On Oct. 24, Spurgeon noticed that the White House Web site carried instructions for search engines not to index certain White House Web pages about Iraq.

Internet search engines such as Google use spiders to crawl through Web sites and index the contents.

Frequently indexed sites often post a file, called ROBOTS.TXT, that instructs spiders not to index certain pages on that site. These files usually list pages containing scripts, file pointers and other content generally of more interest to computers than to human readers.
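A ROBOTS.TXT file is a plain-text list of "Disallow" rules that well-behaved spiders check before indexing a page. The sketch below, using Python's standard robots.txt parser, shows how such a rule blocks indexing of everything under a directory; the paths are illustrative examples, not the White House file's actual entries.

```python
from urllib.robotparser import RobotFileParser

# An illustrative ROBOTS.TXT fragment of the kind described above.
# "User-agent: *" means the rules apply to every spider.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search/
Disallow: /news/releases/iraq/text/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant spider tests each URL against the rules before fetching it.
blocked = parser.can_fetch("*", "http://www.example.gov/news/releases/iraq/text/speech.html")
allowed = parser.can_fetch("*", "http://www.example.gov/news/releases/index.html")
print(blocked)  # False -- the page falls under a Disallow rule
print(allowed)  # True -- no rule matches, so it may be indexed
```

Note that these rules are advisory: they keep cooperating search engines away, but they do not actually restrict access to the pages themselves.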

Spurgeon, however, said he saw that the White House's ROBOTS.TXT file listed 783 files or directories with the term 'Iraq' in their names, most of them leading to pages about the recent combat operations in that country.

Spurgeon had examined the ROBOTS.TXT file after noticing that the Google search engine, owned by Mountain View, Calif.-based Google Inc., had not indexed all of the White House's pages. He then found an earlier version of the White House ROBOTS.TXT file, dating from April 2003, with only 10 instances of the word 'Iraq.' Spurgeon did not speculate on why the White House disallowed these pages.

But other observers had no shortage of theories.

The Democratic National Committee Web log, linking to Spurgeon's site, accused the White House of historical revisionism. Google and other engines keep copies of the pages they index, so not allowing a search engine to cache a page means that fewer alternate copies of it will exist, making it easier for the White House to change a document without people noticing.

Dan Gillmor, a technology columnist for the San Jose Mercury News, speculated on his Web log: 'Perhaps the White House doesn't want to make it easy for people to compare its older statements about Iraq with current realities.'

The pages that were listed were duplicate pages, Orr said. Last summer, the White House set up a section of the site devoted to issues relating to Iraq at www.whitehouse.gov/infocus/iraq/index.html.

Although this section has a different look-and-feel from the rest of the White House site, it uses many documents that are also posted elsewhere on the site, such as press releases relating to the combat effort.

The ROBOTS.TXT file lists those documents that appear in multiple places on the site, Orr said. The staff wanted to reduce the number of duplicate items that someone would see by doing a search on the site.

Although agreeing that most of the pages are duplicates, Spurgeon maintains that the file also points to pages that are not duplicated elsewhere on the site.

'We've tried to eliminate redundancies on the site,' Orr said.

Orr oversees administration of the White House site, which has about 33,000 documents. A staff of 10 people manages the site, he said.


About the Author

Joab Jackson is the senior technology editor for Government Computer News.
