A sharp eye for details

 



Giving users near-instant access to precise information is a tall order, even within a specific application. But expand that requirement across all an agency's data, or even sources of related data, and the task could seem insurmountable. Yet that's the promise of enterprise search technology.

Say "search" and most people think of consumer sites such as Google, Yahoo and Microsoft MSN. These sites index a significant portion of the public Web, including not just HTML pages but documents, images and now even video.

But the demands on enterprise search engines are different. They need to reach not just into Web sites, but also network file systems, e-mail repositories, content and record management systems, and even databases and business application platforms. And while users can often tolerate Web search engines serving up piles of marginal results, enterprise users can't afford to spend the day clicking through such results.

"From our point of view, search is a means, part of an application or a business process," said Lubor Ptacek, director of software product marketing for storage and document management vendor EMC Corp. of Hopkinton, Mass.

As you might expect, government applications and business processes can be unique. So agencies' search platforms should meet a variety of important criteria. For one thing, the kind of information some government users look for can put much greater demands on search systems. Those searches may be for specific citations in much larger documents, or for the recurrence of terms in multiple documents in different languages, or for patterns of information such as phone and Social Security numbers. Moreover, the information may have associated security levels, requiring integrated access controls.
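
As a rough illustration of what integrated access controls mean for search, the Python sketch below filters results by a user's clearance before returning them. The level names, the `Document` class and the `search` function are all hypothetical and are not taken from any product mentioned in this article; real systems typically enforce the access rules of the source repositories themselves.

```python
from dataclasses import dataclass

# Hypothetical clearance ordering; agencies define their own levels.
LEVELS = {"public": 0, "sensitive": 1, "secret": 2}


@dataclass
class Document:
    title: str
    text: str
    level: str  # one of the keys in LEVELS


def search(docs, query, user_level):
    """Return titles of documents that match the query and that the user may see."""
    allowed = LEVELS[user_level]
    q = query.lower()
    return [
        d.title
        for d in docs
        # Check the security level first so restricted text is never matched.
        if LEVELS[d.level] <= allowed and q in d.text.lower()
    ]


DOCS = [
    Document("Press release", "Budget summary for the public", "public"),
    Document("Internal memo", "Budget details by program", "sensitive"),
    Document("Threat report", "Budget impact on operations", "secret"),
]

if __name__ == "__main__":
    print(search(DOCS, "budget", "sensitive"))
```

The point of the sketch is the ordering of the checks: the filter is applied inside the search step, so a user never sees even the title of a document above his or her level.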
Fortunately, evolving search technology can support most agency missions.

Enterprise search is rapidly encompassing a number of other technologies, ranging from ad-hoc database query tools to advanced pattern recognition and relationship analysis software. At the same time, search is being incorporated directly into an array of other applications. For example, there's software called Splunk, a search tool that can find patterns in system and error logs to help administrators pinpoint the source of an IT problem.

Enterprise search platforms generally fall into four categories: departmental-level systems intended for smaller sets of documents within a network or intranet; general-purpose enterprise search systems such as Google's Search Appliance; high-end, customizable systems for specialized types of search; and document management systems that can be adapted.

The lines between those groups are becoming blurry. Depending on the application, any or all of them may end up being part of a proposal. While document management companies are providing connectors to external search engines for certain tasks, search-engine vendors offer integration with content repositories generally considered document management turf.

Consider video search, for example, which has been the exclusive domain of specialized digital-asset management systems in the past. Fast Search & Transfer (FAST), an enterprise search vendor, offers the ability to search video footage based on text in closed-caption content, as well as voice-to-text translation, shape comparison, sound recognition and other criteria. FAST's technology is used not only by organizations managing their own content, but also by law enforcement agencies to ferret out child pornography and other illegal content.

Vendors of enterprise content management and records management systems, such as Waterloo, Ontario-based Open Text Corp. and EMC's Documentum division, approach enterprise search as a component of their applications.

"Really, enterprise search is about going across all those silos of information included in records management systems, as well as documents that fall outside of RM solutions today," said David Schubmehl, vice president for discovery products at Open Text.

Although not considered search companies themselves, Open Text and Documentum both rely on technology called "search federation": executing searches across multiple data and document repositories and providing those results along with search results from within the application or system users are accessing. They use software connectors to access data in repositories of other applications, or even external search engines and online data services.

"We have the ability to organize and classify the information in multiple repositories," Schubmehl said, "and provide some taxonomic way to browse it."

In fact, many search vendors offer federation. Vivisimo's implementation of search for FirstGov.gov [GCN.com/578], for example, relies not just on its own crawl of .gov sites, but also federates searches across the MSN Web search engine.

Google has also moved toward search federation in its Google Search Appliance with the introduction of OneBox (as in, one search query box), which provides access to external data sources such as those from Cisco, Cognos, Employease, Netsuite, Oracle, Salesforce.com and SAS.

"You can type a purchase order number into the search box and get information from Oracle Financials," said Matt Glotzbach, head of Google's enterprise products. "Or you can type in [a query] and get information back from a customer relationship management application."

Google can also integrate its search technology with an enterprise version of its Desktop Search tool.
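
In outline, federation means sending one query to several repositories and merging what comes back. The following Python sketch shows that idea under simplifying assumptions: the `make_source` and `federated_search` names, the in-memory repositories and the term-count scoring are invented for illustration and do not reflect how Open Text, Documentum, Vivisimo or Google implement it.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, docs):
    """Build a toy search function over one repository (a dict of title -> text)."""
    def search(query):
        q = query.lower()
        hits = []
        for title, text in docs.items():
            count = text.lower().count(q)  # crude relevance: number of occurrences
            if count:
                hits.append({"source": name, "title": title, "score": count})
        return hits
    return search


def federated_search(query, sources):
    """Send the query to every source in parallel and merge the hits by score."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda source: source(query), sources))
    merged = [hit for hits in result_lists for hit in hits]
    # Highest score first; ties broken by source and title for a stable order.
    merged.sort(key=lambda h: (-h["score"], h["source"], h["title"]))
    return merged


# Two hypothetical repositories standing in for a records system and a file share.
records = make_source("records", {"Retention policy": "Policy on records. Policy review due."})
files = make_source("file_share", {"Travel memo": "Travel policy update", "Lunch menu": "Soup"})

results = federated_search("policy", [records, files])

if __name__ == "__main__":
    for hit in results:
        print(hit["source"], hit["title"], hit["score"])
```

Different engines score results differently, so a production merge step would normalize or re-rank scores rather than compare them directly as this sketch does.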
Agencies could also pull in data from Google Earth Enterprise to overlay search results on a map, three-dimensional model or satellite image.

Progress Software's EasyAsk, another enterprise search tool, takes a different approach. It provides different types of query and navigation interfaces for different types of data.

"We give special emphasis to the universe of reports," said Dr. Larry Harris, vice president and general manager of Progress' EasyAsk division. "The conventional view of search, full-text indexing, is fine for unstructured data. But as you move toward more structured data, the methodology works less well: records in a database are treated as documents, which ignores the benefit of the structure they're stored in."

Just how accurate search platforms are is determined largely by the content "map" they have to work with. These maps are often generated around technologies such as taxonomies, metadata and entity extraction.

A taxonomy is a defined structure of content classification. Document and records management systems usually come with at least one predefined taxonomy, as well as tools for organizations to import or build their own taxonomies. Enterprise search engines can use those same taxonomies to categorize the content in enterprise file systems and other information repositories.

Getting information into a taxonomy automatically requires powerful content processing tools. Those tools depend on two sources of information: metadata, or data about the data, and actual information within a document itself, including identifiable names and concepts commonly referred to as "entities."

Entity extraction tools find blocks of information that match sets of defined entity types (the names of people or places, phone numbers, addresses, etc.) and create indexing information for the document or data.
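
For the pattern-based entity types, such as phone and Social Security numbers, the idea can be sketched with regular expressions in Python. This is a minimal illustration, not any vendor's extractor: the two patterns cover only simple U.S. formats, the function names and sample memos are made up, and recognizing names of people or places generally takes dictionaries or statistical models rather than a regex.

```python
import re

# Illustrative patterns only; real extractors handle many more formats.
ENTITY_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def extract_entities(text):
    """Return a dict mapping each entity type to the matches found in text."""
    found = {}
    for entity_type, pattern in ENTITY_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[entity_type] = matches
    return found


def build_index(documents):
    """Map (entity_type, value) to the sorted list of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for entity_type, values in extract_entities(text).items():
            for value in set(values):  # count each value once per document
                index.setdefault((entity_type, value), []).append(doc_id)
    return {key: sorted(ids) for key, ids in index.items()}


# Made-up sample documents using a fictional phone number and a well-known invalid SSN.
DOCUMENTS = {
    "memo-1": "Call 410-555-0123 about case 078-05-1120.",
    "memo-2": "Applicant SSN 078-05-1120 was verified.",
}
INDEX = build_index(DOCUMENTS)

if __name__ == "__main__":
    for key, doc_ids in sorted(INDEX.items()):
        print(key, doc_ids)
```

Looking up `("ssn", "078-05-1120")` in the resulting index returns both memos, which is the kind of indexing information a search engine can then use to answer an entity query without rescanning the text.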
A similar technique relies on rules-based processing to determine the proximity of words to each other, thus discovering concepts within information.

At the Homeland Security Department, officials use a FAST search engine along with a tool called Teragram Categorizer to automatically categorize and extract concept information from policy and strategy documents in its Homeland Security Digital Library.

"We have 20 Boolean operators for classification," said Dr. Yves Schabes, co-founder of Teragram Corp. of Cambridge, Mass. "You can group things based on concepts. For example, if you want to define a rule that recognizes publicly traded companies, if more than one is found then the item is about 'business.'"

Keep in mind, however, when you build an enterprise search platform that the concepts and entities within an information source can be more important to many searchers than how they fall into a predefined taxonomy.

"Too often enterprise search deployments focus too much on what data should be indexed in the system and not on how people find information," said Bob Tennant, CEO of Recommind Inc. of San Francisco. "For example, people often look for information based on a concept, without even knowing the right key words. Once they have found information that fits the context of their search, they may need to dig deeper, exploring more than one possible angle. Finally, they often need to relate the specific information they find with other organizational information in order to act."

And that's part of the search quandary. One size does not fit all. You must sit down with business process stakeholders to learn how people work. Looking for intelligence data in foreign-language content? You'll need specialized entity extraction and text analytics software. Just want out-of-the-box search for your departmental LAN? Google's appliance or Coveo's downloadable Windows-based search product might be enough. And there are specialized search tools for legal discovery, intellectual-property policing and nearly every other imaginable content analysis task.
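
A classification rule like the one Schabes describes, where finding more than one publicly traded company marks an item as about "business," can be sketched in a few lines of Python. This is not Teragram's rule syntax; the company list and the `companies_mentioned` and `categorize` functions are hypothetical stand-ins for a maintained entity list and a real rules engine.

```python
import re

# Hypothetical entity list; a real system would maintain a far larger one.
PUBLIC_COMPANIES = ("EMC", "Google", "Microsoft", "Oracle")


def companies_mentioned(text):
    """Return the set of listed companies that appear as whole words in text."""
    return {
        name
        for name in PUBLIC_COMPANIES
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    }


def categorize(text):
    """Apply one rule: more than one publicly traded company means 'business'."""
    categories = set()
    if len(companies_mentioned(text)) > 1:
        categories.add("business")
    return categories


if __name__ == "__main__":
    print(categorize("Google and Oracle announced a partnership."))
    print(categorize("Google released a new tool."))
```

A fuller rule set would combine several such tests with Boolean and proximity operators, but the shape is the same: extracted entities go in, category labels come out.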

CHOICES FOUND: Search appliances from Thunderstone Software and Google.

Google OneBox


S. Michael Gallagher is a freelance writer based in Maryland.
