Power User: Don't search for text with all the wrong tools

John McCormick

Whether you regularly dig through more than 15,000 documents you've authored or delve into years of records created by others, using Microsoft Office search tools is nothing short of torture.

It's bad enough that Word sometimes makes finding a particular document such an exercise in futility that you fear the onset of carpal tunnel syndrome. But the Search tool that drops down from the Start button can bring even a strong man to tears. Word is simply not well-suited for quickly locating information in text files.

And that's not all. If you're running Windows XP, especially an early version with no service packs installed, a search against all file types definitely won't yield the results you expect. Microsoft programmed the search tool to ignore many kinds of files. That's fine for the typical office worker but not for power users.

At minimum, you'll want to install all service packs so you can gain access to those files deemed irrelevant by the software maker. But a better approach is to get a real search engine.

My favorite is dtSearch from dtSearch Corp. of Bethesda, Md. You can download a trial version at www.dtsearch.com. This engine is extremely fast, searching gigabytes of text in seconds.

It first produces a concordance, or full-text index, of all your document files. You can restrict the locations and file types indexed. But if you want everything, dtSearch can handle word processor, spreadsheet, and database files as well as many others.

Searches are, of course, normally done on the index created by dtSearch, which can include local and network files as well as Web sites.
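For readers curious why an indexed search is so much faster than scanning every file, here is a minimal Python sketch of a full-text index with a Boolean AND search. It is not dtSearch's implementation, which the vendor does not expose here; the function names `build_index` and `search_all` and the sample file names are made up for illustration.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search_all(index, *words):
    """Boolean AND: names of documents that contain every given word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

# Hypothetical sample documents standing in for files on disk.
docs = {
    "memo.txt": "Budget memo for the records office",
    "notes.txt": "Meeting notes on the budget and travel",
    "plan.txt": "Travel plan for the spring conference",
}
index = build_index(docs)
print(search_all(index, "budget", "travel"))
```

The slow part, reading every document, happens once when the index is built. Each search after that is only a few set lookups, which is why the index pays for itself on a large collection.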

Besides all the usual Boolean tools, you can check boxes to enable stemming, which matches other grammatical forms of the keywords; phonic searches, which match words that sound like the search terms; and synonym searches.
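To give a feel for how a sound-alike search can work, here is a short Python sketch of the classic Soundex code, which reduces a word to a letter and three digits so that similar-sounding names collide. Soundex is my stand-in for illustration only; I am not claiming it is the algorithm behind dtSearch's phonic option, and the function name `soundex` is my own.

```python
# Consonant groups used by American Soundex; vowels, y, h and w get no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Return the four-character Soundex code for a word ('' if no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))
```

A phonic search along these lines would store the code for every indexed word and return any word whose code matches that of the search term.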

Particularly useful for poor spellers like me is the fuzzy search feature, which can find words that are slightly misspelled or otherwise close to the search term.
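A rough idea of fuzzy matching can be had with Python's standard `difflib` module, which scores how similar two strings are. This sketch only illustrates the concept and makes no claim about how dtSearch does it; the `fuzzy_lookup` helper, the 0.8 cutoff and the sample vocabulary are assumptions of mine.

```python
import difflib

def fuzzy_lookup(term, vocabulary, cutoff=0.8):
    """Return up to five indexed words that closely resemble the term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=5, cutoff=cutoff)

# Hypothetical list of words taken from an index.
vocabulary = ["concordance", "government", "search", "records", "engine"]

print(fuzzy_lookup("goverment", vocabulary))
```

Lowering the cutoff tolerates sloppier spelling at the cost of more false hits, much like turning up the fuzziness setting in a search tool.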

You can also search material not already indexed; it just takes longer. There are also the usual file name, date and size search options.

Of course, the biggest database of all is the Internet, and I'm always looking for interesting new online search engines.

The Hewlett-Packard Speechbot site, at speechbot.research.compaq.com, indexes nearly 18,000 hours of online video and audio clips available from sites across the Web. It uses speech recognition to generate transcripts, so it isn't perfect, but it's pretty darn good.

The FindArticles site, at www.findarticles.com, does just that, indexing about 3.5 million magazine articles. Created by LookSmart Ltd. of San Francisco, the site offers free full-text versions of articles from about 700 consumer and trade publications.

The Government Printing Office's search site, at www.gpoaccess.gov/multidb.html, is great if you need federal records.

How about country-specific search engines? Check out www.searchenginecolossus.com to find search engines based in certain countries or territories.

And then there's IncyWincy. Behind the modest-looking opening page, IncyWincy (Spider, get it?) indexes 46 million pages courtesy of the Open Directory Project. But www.incywincy.com also digs into a million Web search portals not indexed by other search engines.

I also maintain a modest help site at www.helpdotcom.com, and government users seem to like using my site for access to the invisible Web and specialized search engines.

John McCormick is a free-lance writer and computer consultant. E-mail him at [email protected].

