The trouble with self-service business intelligence
Easy to say isn't the same as easy to do.
Self-service analytics sounds like a great idea -- empower employees to extract insights from organizational data without involving data scientists or data mining experts. Once the IT department sets up the data sources, business users can access the data via a user-friendly interface and generate reports on their own.
What’s not so easy is balancing information governance with maximum agility for hundreds or thousands of agency workers.
Not-so-simple setup
The price organizations pay for self-service convenience is that someone must first load data into a box somewhere -- be it a database-like server or a business analyst's desktop.
This might sound straightforward enough, but to oversimplify what's involved here is to ignore just why data integration (DI) is such a complex and often frustrating discipline. Integrating data isn't as simple as downloading a tool such as Tableau, installing it on the desktop, and going out and grabbing data from SAP or PeopleSoft.
Somehow, some way, someone has to build the plumbing. Even if it were easy to obtain and configure access -- to get the appropriate credentials, to configure a connection to an SAP system and to begin siphoning data -- it isn't at all easy to join data.
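To make the point about joins concrete, here is a minimal sketch in Python of one common snag: two extracts that describe the same customers but format the key differently. The record layouts, field names (`customer_id`, `cust_no`) and values are hypothetical and are not taken from SAP, PeopleSoft or any real system.

```python
# Hypothetical extracts: field names and values are illustrative only.
erp_customers = [
    {"customer_id": "000123", "name": "Acme Corp"},
    {"customer_id": "000456", "name": "Globex"},
]
hr_accounts = [
    {"cust_no": 123, "owner": "J. Smith"},
    {"cust_no": 789, "owner": "A. Jones"},
]


def normalize_key(value):
    """Reduce an ID to one canonical form: a string without leading zeros."""
    return str(value).strip().lstrip("0")


def inner_join(left, left_key, right, right_key, normalize=None):
    """Join two lists of dicts on a key, optionally normalizing the key first."""
    norm = normalize or (lambda v: v)
    # Index the right-hand rows by their (possibly normalized) key.
    index = {}
    for row in right:
        index.setdefault(norm(row[right_key]), []).append(row)
    # Emit one merged row per matching pair.
    joined = []
    for row in left:
        for match in index.get(norm(row[left_key]), []):
            joined.append({**row, **match})
    return joined


naive = inner_join(erp_customers, "customer_id", hr_accounts, "cust_no")
cleaned = inner_join(erp_customers, "customer_id", hr_accounts, "cust_no", normalize_key)
print(len(naive), len(cleaned))  # prints "0 1"
```

The naive join silently returns nothing because "000123" and 123 never compare equal. Someone has to know that both values identify the same customer and encode that rule, which is the kind of prep work the paragraph above describes.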
"There is a big myth in self-service which is that you can just somehow open a big database and the users are going to go make sense out of it," said Philip Russom, director of research for data management at TDWI, a sister site to GCN. "Typically, there has to be a lot of prep work to get something in place. It doesn't matter what kind of self-service it is. IT, data people, whoever -- technical people -- have to do a fair amount of prep work to get at it and make it available."
This is where self-service discovery breaks down -- and why self-service data prep technologies (such as those marketed by Alteryx, Paxata, Trifacta and others) have received so much attention.
Traditionally, the only way to join data when using self-service tools such as Qlik or Tableau was to use the vendor's back-end scripting facility. Unlike the easy-to-use front-end offerings, these tools were much more complex, meaning analysts needed both business and technical expertise to use them effectively.
The challenge for vendors is to transform the features they've traditionally exposed as part of the end-user-oriented self-service experience into enterprise-grade services that emphasize reuse, repeatability and manageability.
Another issue is that self-service tools are designed primarily for looking at or transforming data -- not for managing it. In the self-service model, there's nothing analogous to the logical metadata layer that (for better or worse) is a fixture of enterprise business intelligence tools.
Self-serving consumers (or the IT departments charged with serving them) must resort to kludges to enforce consistent metadata definitions or to create the equivalent of a single, authoritative view of business information. This makes it extremely challenging to roll out and support self-service discovery or data prep at enterprise scale.
Until recently, self-service vendors often downplayed the value of portable, standardized business definitions. That said, some vendors such as Qlik have taken metadata management far more seriously for far longer. Established vendors (IBM, Microsoft, MicroStrategy, Oracle, SAP, and SAS Institute, among others) have introduced self-service discovery tools that complement -- and for all intents and purposes integrate with -- their enterprise BI offerings.
Self-service vendors have been consistently dinged by Forrester Research, Gartner, and other industry watchers for their shortcomings with respect to metadata management and other enterprise amenities. Consequently, this is an issue to which they've started to pay a lot more attention.
In the classic self-service discovery model, individual users create point-to-point data flow pipes between source and (server or desktop) target. This can and does result in upstream performance problems. Three or four self-serving users, working individually, could schedule batch extracts from the same upstream system at the same time -- bringing that system to its knees.
Data warehouse architecture, although far from perfect, was conceived with just this kind of problem in mind. It consolidates information into a single system and staggers batch extraction jobs to mitigate the impact on upstream data sources.
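The staggering idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any particular warehouse scheduler works: the function name `stagger_extracts`, the 30-minute slot length, the user and source names, and the start time are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stagger_extracts(requests, window_start, slot_minutes=30):
    """Give each job against the same source its own start slot.

    requests: list of (user, source) pairs, in submission order.
    Returns a list of (user, source, start_time) tuples.
    """
    next_slot = defaultdict(int)  # source -> slots already handed out
    schedule = []
    for user, source in requests:
        offset = timedelta(minutes=slot_minutes * next_slot[source])
        next_slot[source] += 1
        schedule.append((user, source, window_start + offset))
    return schedule


# Three users want the same upstream system; a fourth wants a different one.
requests = [("ana", "erp"), ("ben", "erp"), ("cho", "erp"), ("dee", "hr")]
schedule = stagger_extracts(requests, datetime(2024, 1, 1, 1, 0))
for user, source, start in schedule:
    print(user, source, start.strftime("%H:%M"))
```

The three "erp" jobs land at 01:00, 01:30 and 02:00 instead of all at once, while the "hr" job is free to start at 01:00. The catch is that this only works if some central component sees every request, which point-to-point self-service pipes do not provide.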
Over time, self-service practices -- discovery and data prep -- will likely converge on something like a data warehouse (a central repository for persisting, managing and reusing data visualizations, data flows, and so on) to address this problem.
There's also a sense in which self-service tools -- discovery tools, in particular -- are a new spin on spreadmarts, or desktop-based spreadsheets that were used for data organization, storage and dissemination.
Even though self-service discovery tools presume a high degree of information sharing and collaboration, most of the same issues remain. For example, how do you know which numbers came from which calculations, and which calculations came from which workbook? You don't, because each metric is redone in each workbook.
In the self-service model, as with spreadmarts, there's no scalable, reliable, built-in way to control for data provenance, to track data lineage, or even to ensure that common business definitions ("customer," "product") mean the same thing from workbook to workbook.
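As a rough illustration of what a shared definition with basic lineage could look like, the Python sketch below keeps one registry of metric definitions and stamps every computed value with its source and the workbook that asked for it. Everything here is hypothetical (`MetricDefinition`, `REGISTRY`, `register`, `evaluate`, the `crm.orders` source and the sample rows); it does not represent any vendor's metadata layer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    source: str  # where the input rows are expected to come from
    compute: Callable[[List[dict]], float]


REGISTRY: Dict[str, MetricDefinition] = {}


def register(defn: MetricDefinition) -> None:
    """Allow exactly one definition per metric name."""
    if defn.name in REGISTRY:
        raise ValueError(f"metric {defn.name!r} is already defined")
    REGISTRY[defn.name] = defn


def evaluate(name: str, rows: List[dict], workbook: str) -> dict:
    """Compute a registered metric and record where the number came from."""
    defn = REGISTRY[name]
    return {
        "metric": name,
        "value": defn.compute(rows),
        "source": defn.source,
        "workbook": workbook,
    }


register(MetricDefinition(
    name="active_customers",
    source="crm.orders",
    # Count distinct customers that have at least one active row.
    compute=lambda rows: float(
        len({r["customer"] for r in rows if r["status"] == "active"})
    ),
))

rows = [
    {"customer": "acme", "status": "active"},
    {"customer": "acme", "status": "active"},
    {"customer": "globex", "status": "closed"},
]
result = evaluate("active_customers", rows, workbook="q3_review")
print(result["value"], result["source"], result["workbook"])
```

Because every workbook calls the same registered definition, "active customers" cannot quietly mean two different things, and each number carries enough context to be traced back to its origin.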
An absence of visibility and auditability is a problem with self-service in all of its incarnations. It's something that self-service vendors are just beginning to address. There's a whole host of other potential kinks -- especially of the regulatory kind -- that will need to be worked out, too.
If history is any indication, the industry (vendors, integrators, customers, regulators and other interested parties) will coalesce around a mix of technological, process and behavioral interventions. These interventions should -- at the very least -- improve these key measures: data integration, metadata management and auditability.
A longer version of this article first appeared on TDWI, a sister site to GCN.