With the Right Tools, Big Data Provides Great Insight

Big data—the vast amount of structured and unstructured data from e-mail, machine sensors, social media, images, web logs and transaction records—continues to grow unabated. As their data stores grow, agencies are looking for better ways to store, manage and leverage that data for new and actionable insights.

With tools like object storage and big data analytics, both available on the SEWP V contract, agencies are beginning to realize the potential of those massive data stores. According to a MeriTalk survey, 72 percent of federal agencies are already using big data to improve mission outcomes. Over the next three to five years, agencies expect to rely on it even more to maximize productivity, root out inefficiencies, identify areas for improvement, and improve the experience of citizens and stakeholders using government services.

Before agencies can begin mining big data for insights, they must find effective methods to store both structured and unstructured data. Many experts recommend object storage because of its scalability, performance and availability. With object storage, each data item is stored as a discrete object that bundles the data with descriptive metadata and a unique identifier, rather than as blocks or as files in a folder hierarchy. This flat structure makes it easier to locate and process individual data items at scale. The NASA SEWP V contract offers many leading object storage systems from vendors including Dell EMC, Hitachi Data Systems, HPE, IBM and NetApp.
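To make the object model concrete, here is a minimal in-memory sketch of an object store in Python. The `ObjectStore` class, its `put`/`get` methods, and the content-addressed IDs are illustrative assumptions, not the API of any product mentioned above; production systems distribute objects across many nodes and expose them over HTTP.

```python
import hashlib

class ObjectStore:
    """A minimal in-memory sketch of an object store (illustrative only).

    Real object storage systems spread objects across clusters of nodes;
    here a dict stands in for the flat, folder-free namespace.
    """

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        # A hash of the content serves as the unique object ID,
        # a common pattern in object stores (assumed here for simplicity).
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        # Retrieve data and its metadata together in one lookup.
        return self._objects[object_id]

store = ObjectStore()
oid = store.put(b"sensor reading: 42.7",
                {"source": "machine-sensor", "type": "telemetry"})
data, meta = store.get(oid)
```

The key point is that metadata travels with the data itself, so unstructured items like images or sensor dumps can be retrieved and filtered without imposing a rigid file hierarchy.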

With data storage under control, the next step is selecting the right analytics tools. Done right, big data analytics is extremely powerful. According to a recent survey, the top uses of big data analytics in government are cybersecurity analytics, predictive analytics for forecasting and pattern recognition, gaining operational efficiency, financial data management and analysis, logistics management, and fraud detection.

Through the SEWP V contract, agencies at all levels of government have access to leading big data analytics solutions from vendors like Splunk, Microsoft, IBM and SAP. Splunk’s big data analytics platform helps agencies collect, index, search, analyze, and visualize all their data in one place, drawn from a huge variety of sources across geographies, data centers and cloud infrastructures. SAP Vora works hand-in-hand with the SAP HANA in-memory relational database and integrates with Hadoop. This combination of tools can perform advanced analytics including predictive analytics, spatial data processing, text analytics, text search, streaming analytics, and graph data processing.
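The collect-index-search pattern behind platforms like Splunk can be illustrated with a toy inverted index. This is a generic teaching sketch, not Splunk’s implementation or API; the sample `events` and the function names are invented for illustration.

```python
from collections import defaultdict

def build_index(events):
    """Map each term to the IDs of events containing it (an inverted index)."""
    index = defaultdict(set)
    for event_id, text in events.items():
        for term in text.lower().split():
            index[term].add(event_id)
    return index

def search(index, *terms):
    """Return event IDs matching all terms (a simple AND search)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical log events standing in for collected machine data.
events = {
    1: "failed login from host alpha",
    2: "successful login from host beta",
    3: "failed password attempt on alpha",
}
index = build_index(events)
hits = search(index, "failed", "alpha")  # matches events 1 and 3
```

Indexing up front is what lets such platforms answer ad hoc searches quickly across billions of events instead of rescanning raw data each time.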

Microsoft has taken a slightly different approach to big data analytics. Through a partnership with Hortonworks, Microsoft’s entry, HDInsight, is built with open source components and is fully compatible with Apache Hadoop. HDInsight is a cloud-hosted service built for agencies using Azure clusters to run the Hortonworks Data Platform (HDP). Microsoft has other big data analytics offerings as well, including HDP for Windows and the Microsoft Analytics Platform System, which lets users query and combine Hadoop with on-premises relational data.

IBM has a full array of big data analytics engines. One interesting choice is Watson Analytics, a data analysis and visualization solution that aims to simplify query building for non-data scientists. It uses automated predictive analytics to help users build analytics queries in natural language. It also features guided exploration and data discovery, helping users find answers to their questions and gain insight without specialized skills.

Most big data analytics solutions are available both on-premises and in the cloud. SAP’s cloud platform for big data services, for example, offers Hadoop- and Spark-based processing in the cloud. This allows for compute bursting when necessary, along with job monitoring, support and security. For most agencies, hosting big data and analytics solutions in the cloud makes a lot of sense. It is an effective way to gain the flexibility and scalability required with fast-growing, diverse data sets.