Big data best practices from down under
- By Mark Pomerleau
- Jan 22, 2015
Private companies have become adept at using and prioritizing big data to better serve their interests and customers. Big data analytics gives organizations insights into their customers’ transactions, searches and purchase behavior, among other things. Government, while late to the game, is likewise turning to big data to make services easier for citizens and more organized for personnel.
The Australian Public Service Better Practice Guide for Big Data addresses the key issues agencies face as they grow their big data and analytics capabilities in order to find, as the report puts it, "new solutions for enhanced evidence-based policy research, improved service delivery and increased efficiency."
The Australian government wishes to improve big data competence in several areas:
- Identifying business requirements for big data.
- Developing capabilities for cloud computing and the infrastructure needed to support it.
- Identifying high-value datasets.
- Advising on the government's use of third-party datasets.
- Promoting privacy designs and privacy impact assessments.
In addition, the guide aims to improve competence in data project management and the responsible use of data analytics.
In terms of implementing big data, the Best Practice Guide discusses infrastructure requirements for storage and processing that agencies should take into account before full implementation.
Because scalability is critical to a big data system, the guide suggests estimating storage requirements by determining:
- Extensibility – or the ability to extend platforms without limiting overall capacity.
- Performance criteria – or the number of users, the nature of the queries, the relevant storage configurations, the speed of the channels to and from the appliances and the fault tolerance.
- Compatibility requirements – or determining if the appliance needs to be coupled with production systems or existing data stores, or if it is a standalone development environment.
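The guide stops short of a worked example, but the storage-estimation step above can be sketched as a back-of-envelope calculation. All figures below are hypothetical assumptions for illustration, not numbers from the guide:

```python
# Hypothetical back-of-envelope storage estimate for a big data platform.
daily_ingest_gb = 500          # raw data landed per day (assumed)
retention_days = 365 * 3       # keep three years of data online (assumed)
replication_factor = 3         # e.g. triple replication for fault tolerance
annual_growth = 0.20           # expected 20% year-on-year growth (assumed)
headroom = 1.25                # 25% free space kept for performance

# Raw data held online, in terabytes.
raw_tb = daily_ingest_gb * retention_days / 1024

# Apply replication, one year of growth, and operating headroom
# to arrive at the capacity that actually has to be provisioned.
provisioned_tb = raw_tb * replication_factor * (1 + annual_growth) * headroom

print(f"Raw data online: {raw_tb:.1f} TB")
print(f"Provisioned capacity: {provisioned_tb:.1f} TB")
```

Even a rough model like this makes the guide's extensibility point concrete: the provisioned figure is several multiples of the raw data, so a platform sized only for today's raw volume will run out of room quickly.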
The guide asserts that agencies must consider what kind of infrastructure will best deliver the analytics they need. Among the several options are:
- Grid computing, which spreads processing across a grid of processors. It is well suited to parallel computations that run independently, without communication between processors.
- Cloud computing, which suits applications with uncertain demand for processing and storage, or those not yet tightly integrated with in-house applications or processes.
- Supercomputing, which applies specialized infrastructure to highly interdependent computations, typically climate modeling or time-series correlations.
- In-database processing, which speeds analytics by moving the computation to the data rather than the data to the computation. The reduced data movement shortens run times, making it well suited to data discovery and exploration.
- In-memory processing, which uses cached data held in RAM to reduce query response time and enable real-time analytics.
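The in-memory option described above can be illustrated with a minimal Python sketch, using `functools.lru_cache` to stand in for a RAM-resident cache. The dataset and the slow `scan_storage` query are hypothetical placeholders for a disk- or network-bound lookup:

```python
import time
from functools import lru_cache

def scan_storage(region: str) -> int:
    """Hypothetical slow query standing in for a disk/network-bound scan."""
    time.sleep(0.1)  # simulate I/O latency
    data = {"north": 120, "south": 340, "east": 205, "west": 98}
    return data[region]

@lru_cache(maxsize=1024)
def cached_scan(region: str) -> int:
    # First call pays the I/O cost; repeat calls are served from memory.
    return scan_storage(region)

start = time.perf_counter()
cached_scan("north")                  # cold: hits backing storage
cold = time.perf_counter() - start

start = time.perf_counter()
cached_scan("north")                  # warm: served from the in-memory cache
warm = time.perf_counter() - start

print(f"cold: {cold*1000:.1f} ms, warm: {warm*1000:.3f} ms")
```

The warm call returns orders of magnitude faster than the cold one, which is the same trade a dedicated in-memory analytics platform makes at scale: spend RAM to remove storage latency from the query path.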
In addition to the Australian Public Service’s Better Practice Guide for Big Data, IDG Government (Australia) has developed a road map to help integrate management of big data into various programs. IDG offers five steps to help clarify terms and concepts for easy integration.
Among its recommendations, IDG suggests:
- Clarifying the distinction between big data and conventional data management.
- Assessing how big data and analytics can improve service delivery.
- Analyzing available resources to better predict events accurately.
- Examining privacy and security implications, as implementing big data strategies can come at a cost to accuracy.
- Learning about de-identification capabilities before implementing large object-based storage repositories.
Mark Pomerleau is a former editorial fellow with GCN and Defense Systems.