Lawrence Livermore explores the shape of data, expanding query-free analytics
- By William Jackson
- Jan 09, 2014
Lawrence Livermore National Laboratory is taking advantage of federally funded research into topological data analysis (TDA) to find new ways of extracting and using information from data sets that are too large and complex to yield to traditional analytical techniques.
The lab is collaborating with Ayasdi Inc., a commercial spin-off of Stanford University research funded by the Defense Advanced Research Projects Agency and the National Science Foundation. Ayasdi’s Insight Discovery platform is a software suite already being used by private- and public-sector organizations, including the intelligence community, to glean insights from large and varied data collections.
“Big data challenges are a part of our mission,” said Anantha Krishnan, director of the lab’s Office of Mission Innovation.
The lab uses high-performance computing for modeling and simulation in areas of energy, climate change, biological defense and national security. “For many years the lab has had to rely on homegrown technology,” Krishnan said. “We have developed our own set of data analysis tools and modeling and simulation tools.”
But the lab also is looking at commercial tools that have emerged as big data has become a mainstream subject in IT. “Our sense is that topological data analysis could be a big contributor to the things we do,” Krishnan said.
Topology is a branch of mathematics dating to the 18th century that studies shapes. In the 21st century its reach has been extended beyond physical shapes and surfaces to the very large, high-dimensional data sets that make up what is called big data. Data has shape, and shape has meaning, said Krishnan. The lab’s work with Ayasdi, announced in November, is an effort to extract that meaning.
“We are going through the evaluation phase now,” said Krishnan. “Our hope is that in the next few months the value will become clear.”
The challenge in working with big data is not just volume. Big data is more than small data made large, said Ben Mann, Ayasdi’s vice president of federal operations. “Big data, done right, is completely different.”
How it works
Applied topology generally assumes that what is being studied exists in a metric space, one in which the distance between any two points can be measured. In ordinary three-dimensional space, such distances are used to infer features or relationships in tasks ranging from computer graphics to statistics. With TDA, points from more complex data sets are placed in a multidimensional space, and relationships are identified from the distances between them.
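A toy Python sketch of the distance idea: each record becomes a point with five numeric attributes, Euclidean distance is measured between every pair, and pairs closer than a chosen radius are treated as related. The sample points and the radius `eps` are made up for illustration; this is not code from the lab or Ayasdi.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Map each index pair (i, j), with i < j, to the Euclidean distance between the points."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def neighbor_pairs(points, eps):
    """Return the index pairs whose distance is at most eps (a simple proximity relation)."""
    return sorted(pair for pair, d in pairwise_distances(points).items() if d <= eps)


# Five hypothetical records, each with five numeric attributes (5-dimensional points).
points = [
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.1, 0.0, 0.0),
    (0.0, 0.2, 0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0, 5.0, 5.0),
    (5.1, 5.0, 4.9, 5.0, 5.0),
]

print(neighbor_pairs(points, eps=0.5))  # → [(0, 1), (0, 2), (1, 2), (3, 4)]
```

No picture of five-dimensional space is needed: the two groups of records fall out of the distances alone.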
“The fundamental idea is that topological methods act as a geometric approach to pattern or shape recognition within data,” says a September 2013 article in the journal Science co-authored by Ayasdi CEO Gurjeet Singh. It allows “exploration of the data, without first having to formulate a query or hypothesis.”
That is, researchers can find things they did not know they were looking for. For instance, in a database of billions upon billions of phone records, scientists could make sense of who was talking to whom. TDA could reveal such patterns across multiple databases without anyone first having to query for specific relationships.
At a high level the concept is simple. But it is difficult for people living in a 3D world to make sense of the n-dimensional space in which data lives, Ayasdi’s Mann said. “It is very hard for anyone to picture in his mind what that complicated shape is.”
Another difficulty is picturing relationships not just within a data set, but between data sets that have differing formats. TDA can identify and display shapes based only on the notion of distance between points regardless of the specific dimensional framework of the data set.
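The property described here, that only a notion of distance is needed, can be illustrated with a small Python sketch: a grouping routine that receives items plus a distance function and never inspects the items directly. The function names, the Jaccard distance on sets of tags, the sample records and the threshold `eps` are illustrative assumptions; this is generic single-linkage grouping, not Ayasdi's code.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connected_groups(items: Sequence[T], dist: Callable[[T, T], float], eps: float) -> list[list[int]]:
    """Group item indices so that items within eps of each other, directly or through
    a chain of neighbors, land in the same group. Only `dist` is consulted, so the
    items can be vectors, sets, strings or any other type."""
    parent = list(range(len(items)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


# Hypothetical records with no shared numeric layout: sets of tags.
records = [
    frozenset({"fever", "cough"}),
    frozenset({"fever", "cough", "fatigue"}),
    frozenset({"rash", "itch"}),
    frozenset({"rash", "itch", "swelling"}),
]
print(connected_groups(records, jaccard_distance, eps=0.5))  # → [[0, 1], [2, 3]]

# The same routine on numeric points, this time with Euclidean distance.
print(connected_groups([(0.0,), (0.3,), (9.0,)], math.dist, eps=0.5))  # → [[0, 1], [2]]
```

Swapping the distance function is the only change needed to move between data formats, which is the point the paragraph makes.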
The software developed for the Insight Discovery platform analyzes the data to produce multidimensional shapes, then uses algorithms to extract the relationships those shapes reveal. The platform does not query the databases. “We let the data speak to us and illustrate features we might not have been looking for,” Mann said.
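One published technique from the Stanford group, the Mapper construction co-authored by Gurjeet Singh, gives a flavor of how data can be turned into a shape. Whether, and how closely, the Insight Discovery platform follows it is an assumption here. The simplified Python sketch below uses a lens (filter) function to summarize each point as one number, covers that number's range with overlapping intervals, clusters the points in each interval by distance, and joins clusters that share points. The parameter names and defaults are illustrative choices, not the platform's.

```python
import math
from itertools import combinations


def _clusters(points, members, eps):
    """Single-linkage clusters among `members`: points within eps, chained, group together."""
    remaining = set(members)
    clusters = []
    while remaining:
        seed = min(remaining)  # deterministic starting point
        remaining.remove(seed)
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.4, eps=0.6):
    """Simplified Mapper: nodes are clusters of points, edges join clusters sharing a point.
    `overlap` is the fraction of an interval's width added on each side of it."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # avoid zero width if all lens values are equal
    pad = overlap * width
    nodes = []
    for k in range(n_intervals):
        start, end = lo + k * width - pad, lo + (k + 1) * width + pad
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(frozenset(c) for c in _clusters(points, members, eps))
    edges = sorted(
        (a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]
    )
    return nodes, edges


# Twelve points sampled from a circle; the lens is the x coordinate.
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12)) for k in range(12)]
nodes, edges = mapper_graph(circle, lens=lambda p: p[0])
print([sorted(n) for n in nodes])  # → [[4, 5, 6, 7, 8], [2, 3, 4], [8, 9, 10], [0, 1, 2, 10, 11]]
print(edges)                       # → [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The four clusters form a single cycle (left arc, top arc, right arc, bottom arc), so the loop in the data survives as a loop in the graph. No query about "circles" was ever posed; the shape emerges from distances and the lens alone.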
Although the practical use of topology in big data analysis is new, the roots of TDA date back to research begun in the 1970s at Stanford University. In 2003 the university received $10 million from DARPA and NSF to develop TDA into a practical tool, and Ayasdi was founded in 2008 to commercialize software developed from that research.
Government uses of TDA
“We plan to be ubiquitous,” Mann said. The software is being used today in the pharmaceutical, energy and financial services sectors, as well as in government agencies including the Agriculture Department, where it has been used to study E. coli bacteria; the Office of the Director of National Intelligence’s Intelligence Advanced Research Projects Activity; and the National Institute of Allergy and Infectious Diseases.
The resources required to use the software depend on the amount of data being analyzed. It can be run on a laptop computer for small jobs, while some agencies with a tremendous amount of data use supercomputers. “Our software is massively scalable,” Mann said.
Ayasdi offers its TDA platform as a cloud-based service, although intelligence agencies prefer to license the software and keep their data in their own clouds, Mann said.
Mann called the Lawrence Livermore partnership “a huge opportunity for us,” because of the technical expertise of the lab’s staff, its access to many types of data and its advanced computing power. The lab, in Livermore, Calif., is home to two of the fastest supercomputers in the world. Its Sequoia was ranked third fastest at 17.2 petaflops (quadrillion floating-point operations per second) in the most recent TOP500 listing, released in November 2013, and Vulcan was in ninth place at 4.3 petaflops.
The collaboration arose from personal connections between the lab and people at DARPA familiar with the research, Lawrence Livermore’s Krishnan said. “Many of us became convinced this could have a major impact on what we’re trying to do. One of the things we saw immediately was its ability to go into complex, heterogeneous data sets and extract patterns in a way that is query free.”
An agreement was reached in February 2013. The software is being made available to Lawrence Livermore researchers who are looking at ways to use it. So far, “feedback has been positive,” Krishnan said.
Lawrence Livermore is a national leader in modeling, simulation and big data computing, working in areas ranging from climate change to national security. One of the principal areas in which Krishnan hopes to bring the Ayasdi platform to bear is public health, a growing area that fits into the lab’s biodefense portfolio.
“Biodefense has been a mission of the lab for 20 years now,” Krishnan said. In that time the field has moved beyond simply deploying biological detection devices and into the clinical space. Now biodefense is entwined with public health. The large volume of personal medical data being gathered by public health agencies is “a very rich target for us.”
The lab is at the forefront of bioinformatics, the storage and analysis of biological data, and hopes to apply topological data analysis to that research.
“We’re talking about terabytes of data or probably more,” Krishnan said. “If you want to get a handle on the global problem, you are looking at a pretty big data challenge.”