Web sites get results-oriented

Agencies are working with Google to boost rankings and increase traffic

PRIMO FINDS: NIH's Dennis Rodrigues says the goal is to boost the quality of search results rather than the quantity.

Rick Steele

When people search for federal information online, the vast majority reach first for search engines like Google or Yahoo.

Only 4 percent of visitors to www.nih.gov, for instance, got there by typing the URL into their browser's address line, according to a ComScore research study released last year. The rest arrived by typing nih.gov into a search engine, usually Google's, and then clicking on the results.

This has set up an interesting dynamic between search engine companies and the federal government. The feds want their sites to appear high on the list of results delivered. Google, Yahoo and the other search engines
want to have satisfied
searchers. The more content
that is searchable, the better,
and the happier everybody is.

'Users performing a search
think, 'I've been diagnosed
with cancer, and I need information.'
They don't think about
their information sources,' said
J.L. Needham, who represents
Google's public sector content
partnership. 'But if people
can't find something, they
blame it on Google, not the

To boost their rankings on
search lists, agencies have been
working with Google to develop
sitemaps, which are Extensible
Markup Language-based
lists of Web addresses that
point to database records.

A sitemap can take a couple
of forms, Needham said. At its
simplest, it can be a list of
URLs submitted through
Google's Webmaster Tools Web
site at www.google.com/webmasters/.
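
As a rough illustration of that simplest form, the sitemaps.org protocol accepts a plain text file with one full URL per line. The Python sketch below assumes that format; the function name build_text_sitemap and the example.gov addresses are hypothetical and do not come from any agency's actual setup.

```python
from urllib.parse import urlsplit


def build_text_sitemap(urls):
    """Return a plain-text sitemap body: one absolute URL per line."""
    lines = []
    seen = set()
    for url in urls:
        url = url.strip()
        parts = urlsplit(url)
        # Keep only full http(s) URLs; skip fragments and stray text.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        # Drop exact duplicates while preserving the original order.
        if url in seen:
            continue
        seen.add(url)
        lines.append(url)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical addresses pointing at individual database records.
    print(build_text_sitemap([
        "http://www.example.gov/records?id=1",
        "http://www.example.gov/records?id=2",
    ]), end="")
```

The resulting text would be saved to a file on the site and then submitted to the search engine.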

Much of the government's
information on the Web is uncrawlable,
Needham said.

'Some estimates are that as
much as 90 percent of government
information is not accessible
through Web search engines'
because it is embedded in databases,
Needham said. 'We estimate that
at about 50 percent.' A sitemap makes
this information visible to the
search engines.

Opting in

But does this request for
sitemaps put Google in the
tricky position of telling the federal
government what to do?

No, said Chris Sherman, who
is the executive editor of
Searchengineland.com. 'It's
voluntary. Web sites don't have
to do it,' he said. 'I don't think
any of the search engines are
dictating anything. Their concern
is to get as much content
as they can. As good as search
engines have become, there are
still some barriers.'

Most government Web sites
do quite well in search engine
rankings, he said. A sitemap
will boost a site's ranking if it
has a lot of the content stored
in databases. 'Databases are
tough for search engines to
crack,' he said.

Seeking content

Historically, search engines
have looked with suspicion on
content providers, Sherman
said. 'Now they're saying, 'We
want your content, and this is
how to get it.' '

The sitemap protocol is an industry
standard, supported by
Google, Yahoo and Microsoft.
The actual development of a
sitemap doesn't take much
more than a day or so.

And federal Webmasters
don't seem to mind complying
with the protocol. If anything,
it's a labor of love.

Setting up the sitemap for
www.plainlanguage.gov took
Miriam Vincent between eight
and 10 hours. Vincent is an attorney
at the Social Security
Administration, but she volunteers
time to the Plain Language
Action and Information
Network, an interagency working
group of federal employees
who promote the use of plain
language for all government
communications. Vincent describes
herself as the site's
Webwright (the -wright suffix
indicating a careful craftsman,
as in wheelwright or
shipwright), not its master.

Before Vincent instituted the
sitemap, a search in Google for
one of the site's specific examples
of plain language 'wouldn't
show up on the first page or
first two pages' of results, she
said. The site's examples of language,
both plain and obfuscating,
are some of its most popular
features, and they eluded
search engines.

Since Vincent implemented
the sitemap, she has seen some
increase in Web traffic.

Now when users type 'plain
language' into Google, plainlanguage.gov
is the first result.
Type in 'plain language' and
'engineer jargon,' and the site
is still the first result.

Vincent has to do a short
copy-paste step when she updates
the database, but some
other federal Web sites have
managed to automate the
process entirely, dynamically
generating an XML file, she said.
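
A minimal sketch of what such automation might look like, in Python: it reads record IDs from a database and writes one sitemap entry per record. The table name records, its columns, and the example.gov address are assumptions made for illustration, and this is not a description of how any of the agencies mentioned here built theirs. Only the urlset, url, loc and lastmod elements and the namespace come from the sitemaps.org protocol.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_from_records(conn, base_url):
    """Build sitemap XML with one <url> entry per database record.

    Assumes a hypothetical table records(id, updated), where updated
    is a date string such as 2008-01-15 or NULL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    rows = conn.execute("SELECT id, updated FROM records ORDER BY id")
    for record_id, updated in rows:
        url = ET.SubElement(urlset, "url")
        # Each entry points at the page that displays a single record.
        ET.SubElement(url, "loc").text = f"{base_url}?id={record_id}"
        if updated:
            ET.SubElement(url, "lastmod").text = updated
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Throwaway in-memory database standing in for a real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, updated TEXT)")
    conn.execute("INSERT INTO records VALUES (1, '2008-01-15')")
    print(sitemap_from_records(conn, "http://www.example.gov/record"))
```

Run on a schedule or whenever the database changes, a script along these lines would regenerate the file without any manual copy-paste step.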

It took the Energy Department's
Office of Scientific and
Technical Information 12
hours to create its sitemap
using the Google protocol, said
Walt Warnick, OSTI's director.
'We've spent more time talking
about what we did regarding
the sitemap protocol than
we did executing it,' Warnick said.

When osti.gov began offering
sitemaps several years ago, the
agency saw a huge increase in
traffic. 'The first day that
Yahoo offered up our material
for search, our traffic increased
so much that we could not keep
up with it,' Warnick said.

Everybody wins

Dennis Rodrigues, chief of the
online information branch for
the National Institutes of
Health, called the sitemap
project a win-win for federal
Web sites and search engines.
Rodrigues coordinates sites for
27 separate agencies under the
health agency's umbrella.

'I think a lot of the bread-and-butter
stuff agencies have
on the Web sites [was] already
carefully indexed,' Rodrigues
said. The bulk of searches sent
to NIH Web sites are for health
problems, such as cancer, diabetes
and heart disease. But it
would be harder for someone
looking for information on a
particular gene or protein, he
said. The information would be
buried in a database.

Rodrigues said developing
sitemaps is more about creating
'a better quality of the site's
index and covering all the disparate,
eclectic information.'
The goal of the project is to
boost the quality of search results,
rather than the quantity.

'As federal providers, we have
a lot of concern about whether
or not the public is going to be
able to find our information,
especially about health information,'
Rodrigues said. 'We
know with the ever-growing
volume of information on the
Web, it's easy to become lost in a sea of data.'

