Data stream management systems and architectures

M. A. Hammad, T. M. Ghanem, W. G. Aref, A. K. Elmagarmid, M. F. Mokbel

Research output: Chapter in Book/Report/Conference proceedingChapterpeer-review

4 Citations (Scopus)

Abstract

A growing number of applications in several disciplines - sensor networks, Global Positioning Systems (GPSs), Internet, retail companies, and the phone network industry, for example - deal with a new and challenging form of data; namely data streams. In these applications, data evolve over time in an unpredictable and bursty arrival fashion, representing streams of sensor-measured values, locations of moving objects, network traffic, sales transactions or phone call records. In contrast to conventional data management applications, a major requirement of streaming applications is to continuously monitor and possibly react to new, interesting events at the input streams. Since input streams are unbounded, stream applications may specify that user queries should process only recent input data (e.g., data items that arrive within the last hour). This type of queries is termed time-based sliding window queries or SWQs for short. This chapter describes our efforts to address the research challenges that arose while building Nile, a prototype data stream management system. Nile focuses on executing sliding-window queries over data streams and provides an operational platform that facilitates experimental data stream management research. Specifically, Nile includes novel technologies that address the following fundamental research challenges. • Correctness of SWQ. SWQ introduces a new semantic. For example, in contrast to a SQL query that processes fixed multi-sets of input data and produces a fixed multi-set of output data. SWQ processes continuous flows of input data streams and produces a continuous flow of output data stream. Therefore, in order to judge new algorithms of evaluating SWQ, a correctness criterion must be in place. One approach to define a correct execution of SWQ is to map the outputs from SWQ at specific time instants to the outputs from repeatedly executing a SQL query with specific properties (described in Sect. 5.3). For now, we will refer to this SQL query as the time-instant, equivalent query or Qc for short. • Efficient Execution of SWQ. SWQs introduce significant challenges to conventional query processors for the following primary reason: with a high input rate, SWQ will encounter significant processing overhead to produce a new result as a new data item arrives. Nile adopts the correctness view of SWQ as repeated evaluations of Qc. Therefore, Nile progressively builds on the inputs, intermediate data structures, and output of Qc at one time instant to generate inputs, intermediate data structures, and output of Qc at the following time instant. At the query level, Nile includes two approaches to progressively evaluate SWQs; namely the time probing approach and the negative tuple approach. Moreover, the following research goals were addressed using Nile as the underlying system. • Evaluating Multi-way Stream Window Joins. The multi-way join is a common operation in several streaming applications. When combined with the windowed execution, the continuous evaluation of the multi-way window join (or W-join for short) needs to instantly react to new tuples as well as expired tuples. The research in W-join uses the Nile progressive evaluation approach and features novel algorithms to maintain the join states (i.e., the hash tables maintained by the join operation). • Scalable Execution of Multiple SWQs. The data stream processing architecture involves potentially large numbers of SWQs. The availability of a collection of standing queries in stream database systems raises the potential for aggressively sharing the processing required by multiple queries.Window joins are at the core of emerging architectures for continuous query processing over data streams. A naive approach would treat identical joins with different window constraints as having different signatures and would execute them separately. In this research we formulate the problem of shared window join processing and describe scheduling algorithms that prioritize such shared execution to reduce the average response time per query while preserving their original semantics. This chapter is structured as follows. Section 5.2 presents the Nile representations of streams and SWQs. The SWQ correctness criterion is presented in Sect. 5.3. Section 5.4 describes the progressive evaluation of SWQ at the operator and the query-plan levels. In Sects. 5.5 and 5.6 we present the multi-way stream window join and the shared execution of multiple window joins over data streams, respectively. The related work is presented in Sect. 5.7 and we conclude the chapter with a summary in Sect. 5.8.

Original languageEnglish
Title of host publicationLearning from Data Streams
Subtitle of host publicationProcessing Techniques in Sensor Networks
PublisherSpringer Berlin Heidelberg
Pages51-71
Number of pages21
ISBN (Print)9783540736783
DOIs
Publication statusPublished - 2007
Externally publishedYes

Fingerprint

Dive into the research topics of 'Data stream management systems and architectures'. Together they form a unique fingerprint.

Cite this