Stream of consciousness

Complex-event processing engines can make sense of data flowing in from many sources

Current with Events

CEP can be used to track shipments and other event-driven systems

Where could complex-event processing work better than standard database analysis? During the May JavaOne conference in San Francisco, Thomas Bernhardt offered one answer: tracking shipments of valuable items.

Bernhardt is head of a project to develop complex-event processing software called Esper. He said the software could easily keep track of warehouses of items bearing radio frequency identification tags.

Say you are in the military, overseeing warehouses full of ammunition. How do you keep track of all the items, especially as they move from building to building? How could you set up an alert to tell you if a box 'fell off the truck' or was 'lost in transit,' in the parlance of yore? RFID readers would detect a box of ammunition leaving or entering a building. But what back-end software could interpret and coordinate these messages? In a busy environment, their sheer numbers would quickly overwhelm a traditional database trying to log them.

Nor would standard database analytic software necessarily be able to coordinate all these messages into something meaningful. If a box of ammunition were moved from one building to another, how could the manager tell it had made the journey successfully, short of manually following the progress of each RFID number? Online analytical processing may be good for parsing multidimensional datasets, but it doesn't do well at making sense of data elements in terms of when each was logged.

'It is hard to deal with temporal streams of data,' Bernhardt said.

Bernhardt characterized systems such as an RFID-equipped warehouse as having, or needing, an event-driven architecture. These systems are built around detection of and reaction to events. Timing and aggregation are essential, and difficult. 'It [is] very hard to construct an event-driven application manually,' he said.

In the industry, the basic processing for this sort of environment is called event stream processing (ESP). The idea is to pull the right events out of a constant stream of data points and aggregate them in a meaningful way. For instance, someone might want to calculate the average price of a stock over the past 30 ticks of trading, which involves constantly recalculating the average of numbers that are being constantly updated.
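The rolling recalculation described above can be sketched in a few lines of Python. This is a toy illustration of the ESP idea, not Esper code; the class name and tick values are invented for the example:

```python
from collections import deque

class TickWindow:
    """Rolling average over the most recent N ticks, recomputed on every event."""

    def __init__(self, size=30):
        self.size = size
        self.ticks = deque()
        self.total = 0.0

    def on_tick(self, price):
        # Add the new tick, evict the oldest once the window is full,
        # and return the freshly recomputed average.
        self.ticks.append(price)
        self.total += price
        if len(self.ticks) > self.size:
            self.total -= self.ticks.popleft()
        return self.total / len(self.ticks)

window = TickWindow(size=30)
averages = [window.on_tick(p) for p in [10.0, 11.0, 12.0]]
```

The point is that the average is produced as each event arrives, rather than by querying a table of stored ticks after the fact.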

But even ESP isn't strong enough to handle more-complex jobs, such as watching RFID tags as they move about. Here is where CEP comes in. CEP is the advanced analytical detection of patterns in a series of events. A CEP function specifically looks for complex behaviors, such as two successive gains in stock price of one company followed immediately by a drop in the price of another company's stock.
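The stock-price pattern just described, two successive gains in one symbol followed by a drop in another, can be sketched as a small state machine in Python. This is an illustrative sketch, not how a CEP engine implements it; the event tuples and symbol names are invented:

```python
def detect_pattern(events):
    """Scan a stream of (symbol, price) ticks for two successive price
    gains in stock 'A' followed by a price drop in stock 'B'.
    Returns the index of the tick completing the pattern, or None."""
    last = {}      # most recent price seen per symbol
    a_gains = 0    # count of consecutive gains observed for A
    for i, (sym, price) in enumerate(events):
        if sym == "A":
            if sym in last and price > last[sym]:
                a_gains += 1
            else:
                a_gains = 0
        elif sym == "B":
            if a_gains >= 2 and sym in last and price < last[sym]:
                return i
            a_gains = 0  # a non-dropping B tick breaks the sequence
        last[sym] = price
    return None
```

A CEP engine expresses the same condition declaratively as a pattern query and evaluates it continuously as ticks arrive.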

Bernhardt's open-source Esper software can do both ESP and CEP. In benchmarks, Esper has been able to process and make sense of about 110,000 events per second. The Esper container works in either a Java Enterprise Edition application server or the Microsoft .Net framework.

What Esper does is run all the messages it receives through a set of user-defined queries. Instead of committing these messages to a database and then analyzing them, Esper analyzes the data as it comes across the wire. As soon as one of the query engines detects a set of messages that meets a predefined set of conditions, Esper alerts the user or another program.

Users describe the events they are looking for in Esper's Event Query Language, an SQL-like language with a number of unique characteristics for identifying data on the fly. EQL can do many things SQL can't. For instance, EQL can define a range of parameters as a unique set. You can define a warehouse based on the geographic coordinates within that warehouse, so you know any items identified at one of those coordinates are in the warehouse. Although SQL lets you specify a certain point in time, EQL lets you specify a period, or window, of time. So you can easily compile a list of items that have entered a warehouse within the last 30 seconds, 30 minutes or some other window. And, like SQL, it can bring in additional sets of data for comparison through the join function. You could join data from two database tables to watch how materials are flowing from one location to another.

Using this function, Bernhardt said, you could write a query to watch when a large number of items moves from one warehouse to the next and send an alert when one or more of the items that left one location fail to reach their destination. CEP thus solves a problem with an ease other tools can hardly match.
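The lost-in-transit check maps naturally onto a time window over departure and arrival events. A minimal Python sketch, assuming a hypothetical event shape of (timestamp, tag_id, reader) where the reader is either 'depart' or 'arrive' (a CEP engine would express this as a windowed query rather than a loop):

```python
def find_lost_items(events, timeout=30.0):
    """Flag RFID tags that departed one location but produced no arrival
    read at the destination within `timeout` seconds."""
    departed = {}  # tag_id -> departure timestamp, still awaiting arrival
    lost = []
    for ts, tag, reader in sorted(events):
        if reader == "depart":
            departed[tag] = ts
        elif reader == "arrive" and tag in departed:
            del departed[tag]  # journey completed
        # Any outstanding departure older than the timeout is flagged.
        for t, t0 in list(departed.items()):
            if ts - t0 > timeout:
                lost.append(t)
                del departed[t]
    return lost
```

In the warehouse scenario, each flagged tag would trigger an alert to the manager rather than being appended to a list.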

To see Bernhardt's JavaOne presentation, go to GCN.com/806. To download Esper, go to esper.codehaus.org.

- Joab Jackson

An emerging class of software that can monitor data bombarding an organization's information technology infrastructure from multiple sources is making inroads into the government sector.

Known as complex-event processing, or CEP, the software can detect patterns in intricate situations from multiple sources, giving analysts a deeper understanding of their business processes and events.

First used to analyze trading transactions on Wall Street, CEP engines are being applied to other areas, such as intelligence and surveillance, battlefield command and control, and network monitoring, industry experts say.

That's why In-Q-Tel, the CIA's venture-capital arm and technology incubator, is interested in the technology. The company made a strategic investment in StreamBase Systems, a developer of CEP technology, in February.

'We're seeing the emergence of devices that are flooding businesses with events,' said Troy Pearsall, executive vice president of technology transfer at In-Q-Tel.
'These devices range from your standard network monitoring devices to radio frequency identification tags to sensor networks that are really upping the ante in terms of the volume of data being flowed into organizations,' he said.

Traditional databases and their emphasis on historical data don't give business users and analysts the level of analysis they need to make decisions on the fly, Pearsall said. 'Users want to take action more quickly, and that leads to the need for complex event processing. You want to make decisions as events are flowing into your organizations, not after the fact.'

In-Q-Tel chose StreamBase, a four-year-old company, because of its strong development environment, he said. 'They have a real strong graphical and text-based development environment that allows mere mortals to stand up applications very quickly.'

StreamBase also can connect to many sources of information, and the company's CEP engine can integrate historical data and real-time data to answer questions, Pearsall said.

CEP is an emerging market, and, as in the business intelligence and standard database markets, there will be a lot of complementary technologies coming out in the next few years, he said.

'Certainly, we're always looking for opportunities,' Pearsall said about the possibility of investing in more CEP vendors.

'There are three attributes that really define complex-event processing: high data volumes, instant response and complex analytics,' said Bill Hobbib, vice president of marketing at StreamBase. 'If you have all three together, it's a complex-event processing problem.'

The StreamBase software platform handles high-volume data. The company is working with users in the areas of intelligence and surveillance, intrusion detection, network monitoring and battlefield command and control, he said.

StreamBase runs on Microsoft Windows, Sun Microsystems Solaris and Linux platforms. 'So, it runs fast on commodity platforms,' Hobbib said.

In June, the company released a new version of its software platform, StreamBase 5.0, which speeds the development and deployment of real-time applications. The new release offers built-in support for IBM's DB2 data server, WebSphere Front Office and xSeries hardware.

Other new features include out-of-the box application development frameworks, end-to-end application development, expanded support for advanced data types, flexible pattern matching, enterprise security and integration with a broad range of market data, and messaging infrastructure systems.

Flexible pattern matching is significant, and StreamBase plans to expand more into this area, Hobbib said.

For example, with pattern matching, a network administrator might say, 'If a network event occurred, and its characteristics look like X, we're going to consider this person an intruder and kick him out of our network.' But with flexible pattern matching, network analysts can say, 'If A occurred followed by B and then C, this is a particular sequence of events or patterns, and this is really significant.' Or they can say, 'A occurred, and we think it is suspicious, but because it wasn't followed by B and C, then maybe it's not so suspicious that we need to give it the highest level of alert. It can be given a middle level,' Hobbib said.
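Hobbib's graduated-alert idea, where a partial match earns a lower severity than a full sequence, can be sketched in Python. This is a toy illustration of flexible pattern matching, not StreamBase's actual rule language; the event labels and severity names are invented:

```python
def classify(observed):
    """Score a sequence of event types against the pattern A -> B -> C.
    A full match is a high-severity alert; a partial match (A seen, but
    not followed by the rest) gets a middle level; otherwise no alert."""
    matched = 0
    for ev in observed:
        # Advance through the pattern whenever the next expected step appears.
        if matched < 3 and ev == "ABC"[matched]:
            matched += 1
    if matched == 3:
        return "high"
    if matched >= 1:
        return "medium"
    return "low"
```

The rigid approach Hobbib contrasts this with would be a single if-test on one event's characteristics, with no memory of what came before.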

This concept of looking for patterns in real-time events is applicable to the battlefield, intelligence and network monitoring, he said.

Another direction for the company is support for binary large objects, or BLOBS, which would aid battlefield command, control, intelligence and surveillance. This involves working with partners to extract features from video images and audio files, Hobbib said.

Other areas of expansion include the ability to process advanced data types and enterprise security tightly integrated with Lightweight Directory Access Protocol and authentication systems.

Teaming for CEP

Two other companies, ANTs Software and Coral8, have teamed on a next-generation CEP environment. ANTs' high-performance SQL database management system will work in conjunction with Coral8's CEP engine, company officials said. The combination of ANTs Data Server and the Coral8 Engine will boost application performance in the CEP arena, said John Morell, director of product marketing at Coral8.

'It's important to recognize that you just don't process and generate events and throw them out into the ether,' he said. 'When these high-level events happen, you need to be able to store them somewhere, and sometimes in multiple places.' Users need to be able to access data to get at all the information relevant to an event.

'No matter which way you store it, you need a high-speed storage system like the ANTs database that can keep up with all of the events that are flagged through the system,' Morell said. 'You're talking hundreds of thousands, sometimes half a million, events per second.'

'That's what we are doing for Coral8: the ability to do a high number of inserts per second into our database,' said Cesar Rojas, ANTs' director of marketing.
The Coral8 application rapidly presents data, but the ANTs database can easily keep up because of its ability to do as many as one million inserts per second, he said. Also, ANTs lets users quickly process data, he added.

ANTs is no stranger to the defense market. The high-performance database has been used in conjunction with the Navy's DDG 1000 Zumwalt-class destroyer project, said Patrick Moore, an ANTs vice president.

In that capacity, the product has integrated with real-time extensions to Red Hat Linux developed by IBM and IBM's service-oriented product, Real Time WebSphere, to support mission-critical ship operations, he said.

Old school, new school

Another CEP engine used in government is Progress Software's Apama. One could call Apama the godfather of complex-event processing.

The Apama algorithmic engine was introduced in 1999 by two researchers from Cambridge University with some assistance from developers at Stanford University and the California Institute of Technology, said Mark Palmer, vice president and general manager at Progress' Apama division.

Apama was one of three vendors that sold complex-event processing engines at the time, he said. The company and product were acquired by Progress Software in April 2005.
Apama's origins are in the capital trading market, but government is a growing sector.

'We've worked in government applications with radar feeds of intelligence information that come from surveillance planes,' he said. That information is transmitted back to a central, event-driven architecture environment that analyzes it and distributes it to appropriate analysts for defense purposes.

The Progress Apama Event Processing Platform is a complete environment for creating applications that monitor event streams, detect event patterns and take action. The platform includes Business Activity Monitoring capabilities and a sophisticated CEP language, company officials said.

Truviso, formerly Amalgamated Insight and a new player in the CEP arena, uses standard SQL for querying multiple heterogeneous data streams, said Mike Trigg, co-founder and executive vice president of marketing and business development at Truviso.

Building on experience

'Compared to some of the earlier products in this space, we think taking a standard approach using SQL, a language with 30-plus years of proven work behind it in the relational database world, is an important part of what we are doing differently,' he said.

Three main things differentiate the company's CEP product from others, he said. The core engine has an adaptive query processor, which lets the product support a wide set of SQL queries, such as user-defined functions, user-defined aggregates and subqueries. Some vendors that have taken an SQL approach have not included that capability, Trigg said.

The adaptive query processor also allows the product to run thousands of concurrent queries against incoming streams of data. This lets Truviso perform sophisticated analytics on the fly, he said.

Another critical function is visualization capabilities. The product has a full Web-based user dashboard, which is critical for helping users understand what's going on with the data they are analyzing, Trigg said.

The third piece is a full database embedded within the product, not just hooked on the side. The company took the PostgreSQL open-source database and added Truviso's stream-processing capabilities to the Postgres engine to give users both streaming and more traditional relational database capabilities. In the same engine, they can run queries over streams and tables, and they can do caching and archiving, Trigg said.

'In the real-world use cases that we see, you're not just analyzing incoming streams, but you want to compare those streams to aggregates, historical information, trends and averages,' he said.
