What's holding back Hadoop?

Hadoop -- the open-source, distributed programming framework that relies on parallel processing to store and analyze both structured and unstructured data -- has been the talk of big data for several years now. And while a recent survey of IT, business intelligence and data warehousing leaders found that 60 percent will have Hadoop in production by 2016, deployment remains a daunting task.

TDWI -- which, like GCN, is owned by 1105 Media -- polled data management professionals in both the public and private sectors, who reported that inadequate staff expertise and the lack of a clear business case topped their list of barriers to implementation:

Barriers to implementation (respondents who checked each category)

Inadequate skills or difficulty of finding skilled staff: 42%
Lack of compelling business case: 31%
Lack of business sponsorship: 29%
Lack of data governance: 29%
Security for Hadoop data: 29%
Lack of metadata management: 28%
Excessive hand coding required of Hadoop: 27%
Cost of staffing Hadoop admin/development: 25%
Cost of implementing a new technology: 22%
Difficulty of architecting big data analytic system: 22%
Immature support for ANSI-standard SQL: 19%
Interoperability with existing systems or tools: 19%
Software tools are few and immature: 19%
Enterprise-class manageability: 17%
Not enough information on how to get started: 16%
Slow pace of hand-coded development: 16%
Cannot make big data usable for end users: 13%
Handling data in real time: 13%
Existing user-defined DW architecture: 12%
Poor quality of Hadoop data: 11%
Software tools need higher-level language support: 10%
Hadoop's high operational expenses: 9%
Enterprise-class availability: 9%
Other: 2%

The respondents did, however, see a wide range of uses to justify the deployment efforts, including:

HDFS applications (respondents who checked each category)

Complementary extension of a data warehouse: 46%
Data exploration and discovery: 46%
Data staging for data warehousing and data integration: 39%
Data lake: 36%
Queryable archive for non-traditional data: 36%
Computational platform and sandbox for analytics: 33%
Enterprise data hub (for both new and traditional data): 28%
Business intelligence (reporting, dashboards): 27%
Queryable archive for traditional enterprise data: 19%
Operational data store (ODS): 17%
Repository for content, records management: 17%
Operational application support (apps on Hadoop data): 11%
Don't know: 3%
Other: 1%

And just 6 percent said Hadoop deployments were not in their organization's plans at all:

[Chart: When do you expect to have HDFS in production?]


The full report, which also includes best practices and implementation trends, is available here.

About the Authors

Troy K. Schneider is editor-in-chief of FCW and GCN.

Prior to joining 1105 Media in 2012, Schneider was the New America Foundation’s Director of Media & Technology, and before that was Managing Director for Electronic Publishing at the Atlantic Media Company. The founding editor of NationalJournal.com, Schneider also helped launch the political site PoliticsNow.com in the mid-1990s, and worked on the earliest online efforts of the Los Angeles Times and Newsday. He began his career in print journalism, and has written for a wide range of publications, including The New York Times, WashingtonPost.com, Slate, Politico, National Journal, Governing, and many of the other titles listed above.

Schneider is a graduate of Indiana University, where his emphases were journalism, business and religious studies.

Click here for previous articles by Schneider, or connect with him on Twitter: @troyschneider.


Jonathan Lutton is an FCW editorial fellow. Connect with him at jlutton@fcw.com
