Searching for standards in big data
Ensuring the interoperability of big data and the tools that go with it will be a prime goal.
The promise of big data lies not just in the analytical riches that can be mined from separate databases but also in what can be derived from searching across the many different kinds of data held by different organizations. That, in turn, will require a set of standards unique to big data.
One of the things that is changing attitudes about managing data is that organizations are realizing there is a compelling need to share data with one another, said Leslie Collica, deputy chief of the Information Access Division at the National Institute of Standards and Technology.
“Data sharing is a large issue, a game changer,” she said. “We need to be able to share and have access to disparate types of data, and you can’t expect to just take one piece of data from a database and be able to process that in another system.”
Developing data formatting standards that would allow for the labeling and indexing of the larger datasets involved in big data — and learning how to manage them — is an area NIST will likely get involved in, she said.
“Are we going to need new tools, how are the tools going to manage the different data types, and how are those tools going to work on different platforms?” Collica asked. “All of the interoperability issues will need to be looked at.”
The problem with sharing across various big data repositories is that the metadata used to describe the data in one repository often differs from that used in another. Without standards that can bridge those metadata, or without some kind of analytics metadata layer that spans different repositories, developers will have a hard time fashioning applications that can use the data.
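To make the metadata-bridging problem concrete, here is a minimal sketch in Python. All field names and records are hypothetical: it assumes two repositories that label the same concepts differently and maps both onto a shared vocabulary, the role a bridging standard or metadata layer would play.

```python
# Hypothetical crosswalk: each repository's local field names are
# mapped onto one shared schema so records become comparable.
REPO_A_MAP = {"title": "title", "created": "date", "creator": "author"}
REPO_B_MAP = {"doc_name": "title", "timestamp": "date", "owner": "author"}

def normalize(record, field_map):
    """Translate a repository-specific record into the shared schema."""
    return {shared: record[local]
            for local, shared in field_map.items()
            if local in record}

# The same document as described by two different repositories.
record_a = {"title": "Budget", "created": "2012-05-01", "creator": "OMB"}
record_b = {"doc_name": "Budget", "timestamp": "2012-05-01", "owner": "OMB"}

# After normalization, the two descriptions agree.
assert normalize(record_a, REPO_A_MAP) == normalize(record_b, REPO_B_MAP)
```

In practice this mapping is exactly what is missing: each agency would have to negotiate such a crosswalk bilaterally unless a common metadata standard defines the shared vocabulary for everyone.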
Some people believe the big data process will facilitate those standards. As new databases are built around the use of Hadoop, they will inevitably drive standards because there will be an expectation and a demand that things seamlessly interoperate between the databases, said Dale Wickizer, NetApp Inc.’s chief technology officer for the U.S. Public Sector.
“Standards are the things that help people through the terrifying aspects of new and emerging markets,” he said. “The expectation in the enterprise is that things will interoperate, so that will drive the need for standards in big data.”
There is a strong consensus on the need to standardize on various technologies, agreed Matthew Martin, a solutions architect at Merlin International. Things are moving quickly, but there's still a lot of work to be done, and it's hard to say when standardization will happen.
“Government definitely prefers [to have standards], but it can’t wait forever because it’s faced with serious data problems now,” he said. “Agencies may have to move forward anyway [with big data] because they have a mission to meet.”
The urgency is certainly there, Collica said. NIST is already dealing with much larger datasets in its search and retrieval work and is finding it harder to reach certain types of data and get them shared, she said. However, providing answers is not something NIST can do alone.
“It will need to be multi-agency work because people still have a fairly tight hold on their own data,” she said. “To get realistic data and make people more likely to share them, we need to find ways to rely on the data we get from other places and, once they’re available, know that they still have integrity.”
Because of that, Collica said, it’s hard to predict when the push to develop big data standards will bear fruit.
“It’s going to take a very large community to look at this,” she said. “It isn’t a simple solution.”