A sentiment-al education: Text analytics comes of age
By Stephen Swoyer
Nov 04, 2013
Getting started with text analytics can seem daunting, if not mystifying.
A new report from TDWI Research aims to make it less so, describing applicable text analytic use cases as well as strategies for developing and implementing a text analytic program.
This means taking text analytics beyond its core use cases of sentiment and customer experience analysis. According to Fern Halper, research director for advanced analytics at TDWI Research, text analytics is increasingly used for applications other than these bread-and-butter use cases.
"Text analytics is being used across industries in numerous ways, including customer-focused solutions such as voice of the customer, churn analysis, and fraud detection," writes Halper, author of How to Gain Insight from Text, the latest release in TDWI Research's "Checklist Reports" series. TDWI is owned by 1105 Media, parent company of GCN.
"Many early adopters have used the technology to better understand customer experience, and this is still one of the most popular use cases," she acknowledges, noting that "text analytics is also being used in other areas such as risk analysis, warranty analysis and medical research."
With this in mind, how does one get started? The good news is that text analytics depends less on specialized software and expertise than it used to. For one thing, most business intelligence (BI) vendors ship limited text analytic features with their tools. Microsoft, for example, exposes a wizard-driven front end for its SQL Server Analysis Services (namely, SQL Server Data Tools, or SSDT) that automates the steps of selecting and preparing data sources, including semi-structured text sources.
Most other BI platforms, including SAP BusinessObjects, IBM Cognos, WebFOCUS from Information Builders, MicroStrategy, Oracle Business Intelligence Enterprise Edition, QlikView, SAS and Tableau, incorporate (limited) self-service text-analytic features, too. Stepping up to text analytics doesn't have to entail a huge commitment or capital outlay, such as purchasing a solution from a specialty vendor like SAS or IBM and hiring the requisite (and typically costly) talent to use it.
There's a difference between stepping up to text analytics — such as by using text in one-off projects or as a component of BI/analytic discovery — and developing a mature text analytic program.
Halper's report addresses the latter. She outlines a pragmatic approach for building a text analytic practice from the ground up. "It generally makes sense to pick an initial problem that has relatively high visibility and where it is fairly easy to get at the data. If possible, it should be a quick win that uses a proof of concept," she writes.
Halper points out that the selection of a high-visibility problem — preferably one with the promise of tangible ROI — "will earn a seat at the executive table, which can help to keep momentum high." The proof of concept is important, she explains, because it "ensure[s] that the technology you're using works with your specific data."
Depending on their needs, adopters must distinguish between general-purpose text analytics and built-for-purpose text analytics products, Halper says, citing the surfeit of available sentiment analysis and customer experience improvement offerings.
"Another factor to consider as part of the business case is whether the solution is multi-purpose," she writes. "For example, there are numerous products on the market that use text analytics to gain insight into social media to understand customer opinions and sentiment. It is important to think beyond the first use case and consider your options wisely: i.e., point solutions versus more robust, integrated solutions."
Halper's report addresses a total of nine checklist items, including the importance of proactively determining data access, timeliness and security requirements; the role of data visualization in text analytics; the use of sentiment analysis; and more advanced uses of text analytics.
She also considers the problem of accurately identifying so-called "text features" for extraction. These consist of entities, such as the names of persons, companies or products; geographical locations; dates or times; themes, such as important phrases or words/concepts that occur or co-occur with one another; and concepts, such as words or phrases that have semantic significance.
"The goal is to accurately extract the entities, concepts, themes and sentiment in which you are interested," Halper writes, explaining that different text analytic tools address this problem in different ways: "A vendor might include only a dictionary, list of names, or synonym list. Another might support hierarchical taxonomies to better organize information. The disadvantage of any purely list-based or taxonomic solution is that you're limited to finding what's in the list."
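To make the limitation Halper describes concrete, here is a minimal Python sketch of purely list-based entity extraction. It is an illustration only, not the approach of any vendor named in this article: the `GAZETTEER` dictionary, its labels and the `extract_entities` helper are all hypothetical names invented for the example.

```python
import re

# Hypothetical lookup list (gazetteer). Terms and labels are illustrative
# assumptions, not taken from any real product.
GAZETTEER = {
    "sql server": "PRODUCT",
    "tableau": "PRODUCT",
    "tdwi research": "COMPANY",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, label) pairs for each gazetteer term found in text."""
    found = []
    for term, label in gazetteer.items():
        # Word boundaries keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), label))
    # Order results by position in the text, then drop the offset.
    return [(surface, label) for _, surface, label in sorted(found)]


sample = "TDWI Research compared Tableau with QlikView."
entities = extract_entities(sample)
print(entities)
```

Run against the sample sentence, the sketch picks up "TDWI Research" and "Tableau" but says nothing about "QlikView," because that name is not in the list. That gap is the kind of blind spot a purely list-based tool leaves.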
In this respect, she concludes, text analytic technologies are getting both more sophisticated and more usable. "Some vendors now incorporate statistical models based on machine learning into their solutions to help users extract features that were not preconfigured. Vendors that provide models often pre-train them so users don't need to do anything but simply use the model. Some vendors provide hybrid approaches — statistical and rules-based — which provide the benefits of collection investigation combined with the specificity that comes from linguistic rules."
You can download a copy of Halper's report here. (A short registration is required if you are downloading a free TDWI report for the first time.)