Research on Continuous Query Processing Techniques over Data Streams

 

Over the last few years, a number of researchers have turned their attention to data stream management, which differs from conventional database management. The data stream management system (DSMS), a new type of data management system, has become one of the most active research areas in the data engineering field. Research projects such as Aurora, Niagara, Stream, and Telegraph have made great progress in this area. However, the efforts made in this area in our country are still limited. To meet the varied requirements of a DSMS, this project will focus on its core technique, continuous query processing, and develop the three key techniques it needs, which are described as follows:

 

1. Continuous Query Processing of Relational Data

For relational data, as in conventional database query processing, the main issue is query optimization. The features new to a DSMS include continuous changes in the distributions and arrival rates of query and data streams, as well as the system overload problem. Accordingly, over the three years this project will study, in turn, the scalability of continuous query processing, its adaptability, and the load shedding problem.

 

2. Monitoring of Query and Data Streams

Statistics on changes in the query and data streams can be used for query optimization. Because the data stream environment is dynamic, more efficient data analysis techniques are needed to capture current changes in the query and data streams. Accordingly, over the three years this project will study, in turn, pattern mining on query streams, statistics approximation on data streams, and burst detection on data streams.

 

3. Continuous Query Processing of Sequential Data

Since current DSMSs do not support queries on sequence data, this project will study issues related to three types of sequence data. In the first year, we will study content filtering on multi-valued streams, such as network packets and radio music. In the second year, we will focus on content filtering on multi-attributed streams, such as video films. Finally, we will consider the integration of multiple streams, where both types of sequence data are involved, and address related issues such as how to build an efficient index for all queries on different streams.

 

This project will investigate continuous query processing technology and develop the data mining techniques it requires. It aims to compete with the best research teams in this area. Moreover, it proposes novel techniques for continuous query processing on sequence data. The research results are expected to open the area to more applications and to contribute greatly to the technological and academic advancement of our country.