John McCormick

Take a look at refined searching methods to rescue bandwidth

Big Web search engines waste a lot of everyone's time and bandwidth on irrelevant hits. That's because Yahoo, InfoSeek, HotBot, Excite, Lycos and others build gigantic, full-text indexes of Web page content that are ranked according to certain rules.

Some search sites list only pages that have been checked by a human, but in view of the million-plus new Web pages appearing daily, this approach has its limits. The ranking rules are heavily weighted toward the number of times a word or phrase appears on a page. That's why some sleazy pages repeat 'sex, sex, sex ... ' to snare a high but useless ranking on some search engines.
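To see why repetition pays off under that kind of rule, here is a minimal Python sketch of frequency-only scoring. It assumes a page's score is simply the total count of query words on the page. Real engines combined many more signals, and the function names and sample pages here are hypothetical.

```python
import re
from collections import Counter


def term_frequency_score(page_text: str, query: str) -> int:
    """Hypothetical rule: score = total occurrences of the query words on the page."""
    words = Counter(re.findall(r"[a-z0-9]+", page_text.lower()))
    return sum(words[term] for term in query.lower().split())


def rank_by_frequency(pages: dict[str, str], query: str) -> list[str]:
    """Return page names sorted from highest to lowest raw word count."""
    return sorted(pages, key=lambda name: term_frequency_score(pages[name], query), reverse=True)


pages = {
    "useful-review": "A review of supercomputer benchmarks for government labs.",
    "stuffed-page": "supercomputer supercomputer supercomputer supercomputer cheap deals",
}
print(rank_by_frequency(pages, "supercomputer"))  # ['stuffed-page', 'useful-review']
```

The stuffed page wins only because it repeats the word four times, which is exactly the weakness the column describes.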

Google.com's approach is better: Rank sites by the number of times their pages are cited by other pages. This hyperlink-based ranking often produces more relevant results.
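A minimal sketch of citation counting, assuming the link graph is already available as (source, target) pairs. Ignoring self-links and duplicate links is our own assumption, added so a page cannot vote for itself or vote twice. Google's actual PageRank formula also weights each link by the rank of the page it comes from, which this sketch does not attempt. Names and sample data are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank pages by how many distinct other pages link to them.

    links: (source_page, target_page) pairs. Self-links and repeated
    links are dropped (an assumption for this sketch).
    """
    unique = {(src, dst) for src, dst in links if src != dst}
    counts = Counter(dst for _, dst in unique)
    # Highest citation count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


links = [
    ("a.gov", "c.gov"), ("b.gov", "c.gov"), ("d.com", "c.gov"),
    ("a.gov", "b.gov"), ("spam.com", "spam.com"), ("a.gov", "c.gov"),
]
print(rank_by_inbound_links(links))  # [('c.gov', 3), ('b.gov', 1)]
```

The repeated a.gov link and the self-link from spam.com add nothing, so c.gov comes out on top with three citing pages.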

List and serve

The June 1999 Scientific American has an article on Web hypersearching that says IBM Corp. researchers are working on a refinement: Look at exactly which sites point to certain pages, then rank the pages according to number and quality of links.
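One widely published way to weigh the quality of links is Jon Kleinberg's hubs-and-authorities (HITS) iteration: a good 'authority' page is pointed to by good 'hub' pages, and a good hub points to good authorities. Treating it as IBM's exact method is an assumption. The sketch below, with a fixed number of iterations and hypothetical sample pages, only illustrates the idea.

```python
import math


def hubs_and_authorities(links: list[tuple[str, str]], iterations: int = 20):
    """Iteratively score pages as hubs and authorities from (source, target) links.

    Authority score: sum of the hub scores of the pages linking in.
    Hub score: sum of the authority scores of the pages linked to.
    Both score sets are rescaled to unit length each round so they stay bounded.
    """
    pages = {page for link in links for page in link}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalized(scores: dict[str, float]) -> dict[str, float]:
        length = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / length for p, v in scores.items()}

    for _ in range(iterations):
        auth = normalized({p: sum(hub[s] for s, t in links if t == p) for p in pages})
        hub = normalized({p: sum(auth[t] for s, t in links if s == p) for p in pages})
    return hub, auth


links = [
    ("portal.gov", "nasa.gov"), ("portal.gov", "noaa.gov"),
    ("list.gov", "nasa.gov"), ("list.gov", "noaa.gov"),
    ("random.com", "blog.com"),
]
hub, auth = hubs_and_authorities(links)
print(sorted(auth, key=lambda p: (-auth[p], p))[:2])  # ['nasa.gov', 'noaa.gov']
```

Both blog.com and nasa.gov receive links, but nasa.gov is cited by two pages that are themselves good hubs, so its authority score dominates while blog.com's fades toward zero.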

Web search strategy is relatively easy if users in your office regularly look for the same sort of information. In the long run, it pays to take the time to test every search engine for its relevance to your particular needs.

Once you have found the best site, teach your users how to compose queries. Most search engines support Boolean searches, so experiment a little. Putting the connective AND between key terms, enclosing a phrase in quotation marks, or prefixing a word with a plus sign to require it in every hit can greatly improve search quality.
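The effect of these operators can be illustrated with a toy matcher. This is a sketch under stated assumptions, not any engine's real syntax: quoted text must appear as an exact phrase, a +word must appear, words joined by AND must all appear, and any other word is optional. Conventions varied from engine to engine, so check each site's help page. The function names and sample pages are hypothetical.

```python
import re


def parse_query(query: str) -> tuple[list[str], list[str], list[str]]:
    """Split a query into required phrases, required words and optional words.

    Assumed syntax (hypothetical; real engines differed): "quoted text" is an exact
    phrase, +word is mandatory, and words on either side of AND are mandatory.
    """
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    tokens = re.sub(r'"[^"]+"', " ", query).split()
    required, optional = [], []
    for i, token in enumerate(tokens):
        if token.upper() == "AND":
            continue  # the connective itself is not a search term
        next_to_and = (i > 0 and tokens[i - 1].upper() == "AND") or (
            i + 1 < len(tokens) and tokens[i + 1].upper() == "AND"
        )
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif next_to_and:
            required.append(token.lower())
        else:
            optional.append(token.lower())
    return phrases, required, optional


def matches(query: str, text: str) -> bool:
    """True if the text contains every required phrase and every required word.

    Optional words would only affect ranking, so they are ignored here.
    """
    phrases, required, _ = parse_query(query)
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9]+", lowered))
    return all(p in lowered for p in phrases) and all(w in words for w in required)


pages = {
    "fair": "Super Computer Sale of used PCs at the Wisconsin State Fair",
    "lab": "Government supercomputer procurement for national labs",
}
for q in ['supercomputer AND government', '"super computer" +fair']:
    print(q, "->", [name for name, text in pages.items() if matches(q, text)])
```

Run against the two sample pages, the AND query keeps only the government procurement page, while the phrase query finds the state fair listing.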

If Web searches are a big part of your office's daily work, check out www.searchenginewatch.com/ for the latest developments. The site gives tips for getting a site listed on search engines. It publishes a free newsletter with site updates, and it provides technical reports and search technique tutorials.

How about commercial search sites that are not free? The Gov.Search venture recently piloted by the National Technical Information Service and Northern Light Technology LLC of Cambridge, Mass., indexes more than 4 million Web pages on 20,000 government servers.

Following NTIS' withdrawal from the venture last month, Northern Light planned to charge by the page, by the day or at a flat rate of $250 a year for use of the site at usgovsearch.northernlight.com/.

To see just how targeted the engine was, I searched last month on supercomputer and drew about 6,000 government site hits. Ranked second and third was a link to what appeared to be a State Department server advertising a sale of supercomputers.

Computer blues

As I'm still hunting for a computer that can run Microsoft Windows 98 as fast as my old PCs ran MS-DOS, I was interested. But the listing turned out to be for a 'Super Computer Sale' of used PCs at the Wisconsin State Fair last March.

Inspecting the link more closely, I found it wasn't a State Department server but a Wisconsin state server. Yes, a government site, but so much for relevance.

The same search words in the site's Special Collection section brought up a list of 11,000 sites that looked more interesting. But clicking on the top one took me to a Northern Light page that explained it would cost $2.95 to view the hit it described as 'very short (less than one page).'

I passed.

John McCormick, a free-lance writer and computer consultant, has been working with computers since the early 1960s. E-mail him at [email protected].

