Data analytics

How to make big data more useful, reliable – and fast

Government IT managers are looking for tools that make it easier to identify meaningful patterns and statistical trends in far-flung data sets. At the same time, these tools must work well with other technologies to help analysts make decisions in real time.

Splunk, a firm that offers tools for collecting and analyzing machine data generated by back-end IT systems, is looking to address these concerns by bringing real-time operational intelligence to big data storage and batch processing. 

Enter Splunk Enterprise, software that collects, indexes and harnesses fast-moving machine data generated by an organization's applications, servers and devices, whether they are physical, virtual or in the cloud. The software also helps administrators troubleshoot application problems and investigate security incidents quickly, so organizations can head off service degradation or outages.
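
For administrators comfortable with scripting, that indexed machine data can also be queried programmatically. The following is a minimal sketch using Splunk's Python SDK (splunklib); the hostname, credentials and index name are placeholders for illustration, not values drawn from the article:

    # Minimal sketch: run a one-shot search against Splunk Enterprise with splunklib.
    # Hostname, credentials and the "web_logs" index are placeholders.
    import splunklib.client as client
    import splunklib.results as results

    # Connect to the Splunk management port (8089 by default).
    service = client.connect(
        host="splunk.example.gov",   # placeholder host
        port=8089,
        username="admin",            # placeholder credentials
        password="changeme",
    )

    # Count HTTP 5xx errors by host over the last hour.
    query = "search index=web_logs status>=500 earliest=-1h | stats count by host"
    stream = service.jobs.oneshot(query)

    # Parse the streamed results and print each row.
    for item in results.ResultsReader(stream):
        if isinstance(item, dict):
            print(item.get("host"), item.get("count"))

Because the data is indexed as it arrives, the same search can be saved and rerun continuously rather than waiting on a batch job.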

Splunk’s Hadoop Connect integrates and moves data easily between Splunk Enterprise and Hadoop, software that has become a mainstay of big data analytics. Users can send events from Splunk to Hadoop for long-term archiving and batch data science analytics; conversely, data already in Hadoop can be pulled into Splunk for analysis, all without writing code.
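
To illustrate the Hadoop side of that exchange, the sketch below reads events that have been exported to HDFS, using the third-party Python hdfs (HdfsCLI) package over WebHDFS. The NameNode URL and export path are hypothetical, and the actual file layout depends on how the export is configured:

    # Minimal sketch: browse Splunk-exported event files sitting in HDFS via WebHDFS.
    # The NameNode URL, user and export directory below are placeholders.
    from hdfs import InsecureClient

    client = InsecureClient("http://namenode.example.gov:50070", user="hdfs")

    export_dir = "/splunk/exports/web_logs"  # hypothetical export directory
    for name in client.list(export_dir):
        # Read each exported file and print the first few raw event lines.
        with client.read(export_dir + "/" + name, encoding="utf-8") as reader:
            for i, line in enumerate(reader.read().splitlines()):
                print(line)
                if i >= 4:
                    break
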

Additionally, the company unveiled the Splunk App for HadoopOps, an application that provides real-time monitoring and analysis of the health and performance of the entire Hadoop environment from one interface. While existing Hadoop monitoring tools focus only on the Hadoop layer, the Splunk App for HadoopOps encompasses all layers of the infrastructure, including Hadoop, the network, switches, racks, the operating system and the database, Splunk officials said.

So how might this work in a real-world situation at a government agency?

A security administrator who needs real-time visibility across the entire enterprise and every device can use Splunk to detect anomalies immediately, because the software collects data as it is being generated, according to Sanjay Mehta, Splunk’s vice president of product marketing.

Federal users are looking forward to capabilities that let them solve problems faster through the integration of Splunk and Hadoop, said Bill Cull, vice president of Splunk’s public-sector business, who noted that many of Splunk’s federal users are in the intelligence community.

Both Splunk Hadoop Connect and the Splunk App for HadoopOps are available free to Splunk Enterprise users via the company’s app store. Splunk runs on all major platforms, including Linux, Unix and Microsoft Windows, Mehta said. The software is built on a distributed architecture, which lets users share workloads across machines via parallel processing. It runs in virtualized and cloud environments and can even manage multi-tenant cloud environments, Mehta said.

"Splunk has taken a methodological approach to defining its co-existence with Hadoop," said Matt Aslett, research manager for data management and analytics with 451 Research.  Splunk Hadoop Connect not only integrates with Hadoop but also interacts with it while Splunk App for HadoopOps monitors cluster resources beyond Hadoop itself.  This offers users a single platform for managing and analyzing data in both environments, he said.

Reader Comments

Tue, Dec 11, 2012

An important distinction regarding Splunk and Splunk's Hadoop Connect is that non-technical users can install and use Splunk immediately without requiring developers or the lag time of a development/deployment schedule. Both technical and non-technical users can use Splunk's Hadoop Connect to browse and analyze Hadoop data without programming. Splunk can be deployed in both small and Enterprise architectures, and there is full SDK and API development support as well. Since the above comment seems to be a commercial for HPCC, I will add my disclaimer: #IWorkForSplunk.

Wed, Nov 7, 2012, H. M., United States

Good article Rutrell. We are seeing a lot of activity around trying to make Hadoop a complete solution. Another option to consider is HPCC Systems from LexisNexis Risk Solutions as an efficient tool for helping organizations overcome the challenges that arose with the era of big data. The recent release of their HPCC/Hadoop data integration connector allows for read/write of data to and from HDFS and HPCC, which can enable several opportunities to leverage HPCC components from within existing Hadoop clusters. Their built-in analytics libraries for Machine Learning and BI integration provide a complete end to end solution for ETL, Data Mining and Reporting. For more info visit: hpccsystems.com

