The big data challenge: How to improve time-to-insight
- By Ray Falcione
- Apr 17, 2020
Government agencies are looking for new ways to combine their silos of information into a single view to help them make better decisions, reduce costs and improve time-to-value. But this is a huge challenge when the most valuable databases can be difficult, if not impossible, to join together. This kind of segregation is expensive and inhibits good decision-making and integrated insights.
When critical questions need answering, the typical approach is to analyze these silos of data individually. This can be acceptable if results aren’t needed for hours or days. But today, and for government agencies in particular, that’s rarely the case. In fact, data can become stale even before a query is complete.
Slow, disjointed data analysis presents two types of difficulties. One is the time requirement; in many situations the quality of a decision degrades quickly when insight is delayed. There is a big difference between an answer that arrives in minutes and one that arrives in half a second. Those small time deltas can make an immense difference when critical services -- and even lives -- are on the line.
The other challenge is the curiosity factor. In conventional analytics, the freedom to be curious is time consuming and costly. Simply asking a question may require a large effort to prepare the data and reserve the hardware. Yet if decision-makers can ask one question, get the answer, and then ask three more questions based on that first answer without having to pay a time penalty, they have the opportunity to be significantly more curious.
Agencies typically overlook this benefit until they’ve experienced it. Without speed and agility, agencies can end up in a rut, asking the same stale questions over and over again because the time/value equation never improves.
A new era in analytics
Better technology, however, is beginning to break down silos and significantly bend the time curve. New platforms based on parallel processing architectures deliver insights at the speed of curiosity. When CPUs and GPUs are able to ingest and interrogate billions of lines of data per second, using familiar interfaces including JDBC, SQL and Kafka, the possibilities are transformative not only in terms of speed, but inquiry as well.
With parallel processing it’s possible to reset the analytics paradigm by merging and integrating conventional data analytics with spatiotemporal analysis. The result is a visualized, map-based analytics environment that provides democratized insight to everyone, regardless of their technical background.
Parallel processing removes the need for traditional data manipulation workarounds like pre-indexing, down-sampling and pre-aggregation. It combines this power with visualization capabilities that allow users to not only view data geographically, but also drill down, pivot and cross-filter in real time, regardless of scale or data size.
What’s more, datasets -- even joined internal and external datasets -- can be cross-indexed for greater understanding. Comparing data from multiple sources allows users to assess what is happening not just within their organization, but also with other entities and even the world at large.
For governmental agencies, the possibilities are remarkable. For example, the Federal Aviation Administration’s ADS-B (Automatic Dependent Surveillance—Broadcast) data, when viewed alone, shows the location of all aircraft flying in U.S. airspace at a given time. Because this is a time series, the scale is enormous, with a row being generated by each plane every five seconds.
Currently this ADS-B data allows a curious analyst to ask, “Show me which planes are X distance off their typical flight path.” With a data platform built to take advantage of a parallel processing architecture, users can cut the time that query takes to milliseconds. Then, by adding real-time weather data, they can quickly discern which aircraft are off their normal flight path due to an incoming thunderstorm, versus those affected by other causes.
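To make the shape of that question concrete, here is a minimal Python sketch of the logic, not of any particular platform. Everything in it is an assumption made for illustration: the flight IDs, coordinates, storm location and thresholds are invented, the "typical flight path" is reduced to a short list of waypoints, and deviation is approximated as the distance to the nearest waypoint. A real system would run the equivalent as a spatial join over billions of rows rather than a Python loop.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def off_path_flights(positions, typical_paths, threshold_km):
    """Return {flight_id: deviation_km} for flights beyond the threshold.

    Deviation is approximated as the distance from the current position
    to the nearest waypoint of that flight's typical path.
    """
    flagged = {}
    for flight_id, (lat, lon) in positions.items():
        waypoints = typical_paths.get(flight_id)
        if not waypoints:
            continue  # no baseline path to compare against
        deviation = min(haversine_km(lat, lon, wlat, wlon)
                        for wlat, wlon in waypoints)
        if deviation > threshold_km:
            flagged[flight_id] = deviation
    return flagged


def classify_deviations(positions, typical_paths, storm_cells,
                        threshold_km=50.0, storm_radius_km=100.0):
    """Label each off-path flight by whether a storm cell is nearby.

    Proximity to a storm only suggests a weather cause; it does not prove one.
    """
    labels = {}
    for flight_id in off_path_flights(positions, typical_paths, threshold_km):
        lat, lon = positions[flight_id]
        near_storm = any(
            haversine_km(lat, lon, slat, slon) <= storm_radius_km
            for slat, slon in storm_cells
        )
        labels[flight_id] = "possible weather" if near_storm else "other"
    return labels


# Hypothetical sample data: latest position per flight, baseline waypoints,
# and one storm cell. None of these values come from real ADS-B or weather feeds.
positions = {
    "AAL100": (39.0, -95.0),
    "UAL200": (41.0, -88.0),
    "DAL300": (33.0, -84.0),
}
typical_paths = {
    "AAL100": [(39.0, -95.1), (39.2, -94.0)],
    "UAL200": [(42.5, -88.0)],
    "DAL300": [(33.0, -86.0)],
}
storm_cells = [(41.3, -88.2)]

result = classify_deviations(positions, typical_paths, storm_cells)
print(result)
```

With this made-up data, the first flight stays within the threshold and is not reported, while the other two are flagged and separated by whether the storm cell is close by. The two thresholds are the "X distance" from the analyst's question and can be changed freely, which is exactly the kind of follow-up question that becomes cheap when each query returns in milliseconds.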
It’s also worth noting that both ADS-B and weather data are publicly available. When public sources like these can be instantly compared against private or internal information, the effect on decision-making can be substantial.
Benefits across government
Parallel processor-based analytics platforms support more than just faster querying. They are also able to render hundreds of millions of points, lines and polygons in mere milliseconds, then make them filterable and interactive. This kind of geospatial interrogation has never before been achievable.
In the past, analysts had to either down-sample data to accommodate the limited power of their platform or purposely look at only a small portion of a map. If health officials studying the COVID-19 pandemic, for instance, wanted to track patterns-of-life for those who were geolocated with an infected person, they were stuck with a terrible choice: look at one small area of a rendering, or wait for hours. Furthermore, if analysts wanted to measure levels of social distancing, contact tracing and travel/activity restrictions over specific geographies, then correlate these findings with the rate of spread of COVID-19 in those regions to predict future spread based on movement behavior, those results simply weren’t readily available.
Today, however, they are, thanks to several analytics and data companies working in task forces with non-profit organizations, private industry, academic institutions and the federal government. Their efforts are supporting health officials looking at data in new ways to help address the immediate needs of the global community.
Advanced analytics also makes it possible to democratize the pursuit of curiosity and insight in every corner of an organization. No longer are people outside the data team locked into a limited selection of static pie charts or stale histograms; now those same decision-makers can explore geospatially mapped, cross-filtered data on their own. Such agility not only opens data to end-users in a self-serve environment, but also frees data scientists and analysts to pursue the deeper, predictive, machine learning work they were probably hired to do.
When an agency’s collections of data are finally connected, deeper insights are possible; and when the time it takes to get those insights is no longer an issue, better decision-making becomes the norm. Today, everyone can access the right data from the right sources, explore them in the right way and obtain answers never before possible -- at the time those answers are needed. Silos may still be necessary -- but with the new analytics technologies available, their contents are sprouting fields of knowledge like never before.
Ray Falcione is vice president of U.S. federal business at OmniSci.