Mining sequential patterns in large databases is an important research topic. The main
challenge of mining sequential patterns is the high processing cost due to the
large amount of data. In this paper, we propose a new strategy called DIrect
Sequence Comparison (abbreviated as DISC), which can find frequent sequences
without having to compute the support counts of non-frequent sequences. The main
difference between the DISC strategy and previous work lies in how non-frequent
sequences are pruned. Previous approaches rely on the anti-monotone
property and prune non-frequent sequences according to the frequent
sequences of shorter length. In contrast, the DISC strategy prunes
non-frequent sequences according to other sequences of the same length.
Moreover, we summarize three strategies used in previous work and design an
efficient algorithm, called DISC-all, that takes advantage of all four
strategies. The experimental results show that the DISC-all algorithm
outperforms the PrefixSpan algorithm on mining frequent sequences in large
databases. In addition, we analyze these strategies to design a dynamic
version of our algorithm, which achieves much better performance.
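
For illustration, the anti-monotone pruning used in previous work can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions, not the DISC-all algorithm: each sequence is treated as a plain list of single items, and the names `is_subsequence`, `support`, and `has_infrequent_subpattern`, together with the toy database and threshold, are hypothetical.

```python
from typing import List

def is_subsequence(pattern: List[str], sequence: List[str]) -> bool:
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # 'item in it' advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)

def support(pattern: List[str], database: List[List[str]]) -> int:
    """Count the database sequences that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))

def has_infrequent_subpattern(candidate: List[str],
                              database: List[List[str]],
                              min_support: int) -> bool:
    """Anti-monotone check: a candidate can be discarded without counting
    its own support if any pattern obtained by dropping one item from it
    is non-frequent."""
    for i in range(len(candidate)):
        shorter = candidate[:i] + candidate[i + 1:]
        if shorter and support(shorter, database) < min_support:
            return True
    return False

# Toy database (assumed for illustration only).
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b"],
]
min_support = 3

# <a, b> is contained in only 2 sequences, so it is non-frequent ...
print(support(["a", "b"], database))                                   # 2
# ... and every longer candidate containing it can be pruned.
print(has_infrequent_subpattern(["a", "b", "c"], database, min_support))  # True
# <a, c> has only frequent shorter patterns, so its support must still be
# counted, even though it turns out to be non-frequent.
print(has_infrequent_subpattern(["a", "c"], database, min_support))    # False
print(support(["a", "c"], database))                                   # 2
```

The last case shows the cost that the abstract refers to: pruning by shorter frequent sequences still leaves non-frequent candidates whose support counts have to be computed, which is what the DISC strategy aims to avoid.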