Big data applications at NSF

Convergence of technologies will spur big data projects in 2013

The maturity and convergence of four technologies will help government decision-makers derive more value from their big data projects in 2013, predicts Chris Biow, public sector CTO at MarkLogic, a developer of databases for big data applications.

“Cloud computing, Hadoop and NoSQL databases are the three game-changing technologies that are being applied to big data,” Biow said. “I think this is the year that government agencies get their hands around what each of them can and cannot do.”

Semantic technology is the fourth discipline to add to the equation. It can be used to extract facts from structured and unstructured data and to handle relationships between data in a more flexible way than traditional relational databases, Biow said during an interview with GCN.
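One common way to picture the flexible handling of relationships described here is to store each extracted fact as a subject-predicate-object triple. The following is only a minimal sketch in plain Python, not MarkLogic's or any other product's API; the vendor and agency names, the predicates and the helper functions `add_fact` and `match` are all hypothetical.

```python
# Hypothetical facts, each stored as a (subject, predicate, object) triple.
# All identifiers below are made up for illustration.
triples = {
    ("vendor_17", "receives_payment_from", "agency_a"),
    ("vendor_17", "shares_address_with", "vendor_42"),
    ("vendor_42", "receives_payment_from", "agency_b"),
}


def add_fact(store, subject, predicate, obj):
    """Add a new fact; a new kind of relationship needs no schema change."""
    store.add((subject, predicate, obj))


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the given fields; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# A relationship type that did not exist before is simply added as a fact.
add_fact(triples, "vendor_42", "flagged_by", "agency_c")

# Everything known about one entity, regardless of relationship type.
print(match(triples, subject="vendor_42"))
```

In a relational design, a new relationship type such as the hypothetical `flagged_by` would typically mean a new table or column; in this sketch it is just one more triple.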

Big data is defined as a volume, velocity and variety of data that exceed an organization’s storage or compute capacity for accurate and timely decision-making. Public sector agencies worked for years on complex analytic projects in many domains before the term big data came along. What has changed, according to industry experts, is that the cost of computing has come down, unlocking capabilities for agencies to analyze and find hidden value in data.

Cloud computing has enhanced the delivery of big data. The cloud “gives you a flexible way to get the computing resources you need,” Biow said. But it doesn’t answer the question of what to apply those resources to, he added. That is where Hadoop comes in: it distributes bulk-type tasks, not ones that require real-time responses, he said.

Hadoop is an open-source software framework that supports data-intensive distributed applications. Hadoop takes care of compute-intensive, non-real-time requirements and handles file-type storage, he said. However, if users are interested in the storage of big data, which tends to be unstructured, NoSQL databases are better positioned to handle that type of data.
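The batch-style work described here is usually expressed in Hadoop through the MapReduce programming model. The sketch below imitates that model in a single Python process to show its shape; it does not use Hadoop or its API, and the payment records and the function names `map_phase`, `shuffle`, `reduce_phase` and `run_batch` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input records; in a real cluster these would be spread
# across many files and machines.
records = [
    {"agency": "A", "amount": 100},
    {"agency": "B", "amount": 50},
    {"agency": "A", "amount": 25},
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    yield record["agency"], record["amount"]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_batch(all_records):
    """Run the whole job over the full data set and return totals per key."""
    pairs = (pair for record in all_records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(run_batch(records))
```

The job reads every record before it produces an answer, which is why this style suits bulk analysis rather than an immediate response to a single user request.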

NoSQL databases are not going to take over all big data since there are still some big data applications that fit into relational databases, Biow noted.  “But the 80 percent that is unstructured, poly-structured or changing structure, that is the domain of NoSQL databases,” he explained.

“This is the year that government agencies will see that Hadoop handles distribution of computing tasks well, but doesn’t give real-time, immediate response to the user that most of the applications actually require,” he said, adding that agencies will see that NoSQL databases come into play here, giving responses back to the user in real time. 

Hadoop is a complementary platform on top of which NoSQL databases will run. Cloud infrastructures, upon which Hadoop can reside, give users the flexibility to get computing resources when they need them. “So each of them have their own layer and fit together. This is the year where they come into standard practice,” Biow said.

The major theme underlying all big data applications will be information sharing, he said. Whether the applications are applied by the intelligence, defense and security communities; by agencies to ferret out waste, fraud and abuse; or by agencies seeking to make government operations more transparent, the broad theme is the sharing of information. For example, different data sets that help in the detection of waste, fraud and abuse can be analyzed and shared across agencies with diverse missions, such as those involved in the management of finances and taxes, health care benefits and immigration.

With the implementation of any technology there are challenges agency IT managers and decision-makers must be aware of, Biow said.

In general, agency managers must ensure that, in this era of declining budgets, they still invest in innovation. Big data techniques have enormous cost-saving potential, but if budgets are locked into maintaining existing systems, IT managers won’t have the flexibility to innovate and reduce the operation and maintenance costs of existing resources, Biow warned.

Each of the four technologies also comes with its own challenges. For example:

  • Hadoop: The biggest problem is overestimation of what it can do. Agency managers must realize that Hadoop is suited for compute-intensive batch processing, not for handling real-time requirements.
  • NoSQL databases: The forte of most of these next-generation databases is enterprise robustness as well as the ability to work on top of Hadoop. Getting reliability and continuity out of them is essential. As with the previous evolution of relational databases, users must carefully consider the reliability of the database -- much will depend on the data and applications. “You can’t pick the technology first and assume it is going to solve your problem. You have to fit the two,” Biow said.
  • Cloud:  Cloud is the answer to agile provisioning of computing resources. Netflix, for example, uses Amazon Web Services cloud infrastructure to meet peak holiday demands for watching movies online. However, even if cloud providers can stand up 1,000 servers when an agency needs them, how do agency procurement departments handle that agile provisioning with agile expenditure of money? That is a challenge for government, Biow said.
  • Semantic technology: Semantics is a technology that has been the victim of overselling. “Expectations have been so high for so many years that there is disillusionment with it, and I am predicting it will be a player in big data problems this year,” Biow said. But it is still in the early stages of fruition.

In all these areas, agency managers must require that their projects return mission value quickly, Biow said. Managers must get value from their technology and projects in a matter of weeks or a few months. Not everything will be accomplished during this period, but managers should expect to see some value.

“If in a matter of some months you are not seeing any mission value, it is probably time to pull the plug on that project,”  Biow advised. “Don’t rely on promises that in a year or two something is really going to happen.”

About the Author

Rutrell Yasin is a freelance technology writer for GCN.

