Mining Frequent Sequences by Direct Sequence Comparison: The Dynamic DISC-all Algorithm

Mining sequential patterns in large databases has been an important research topic. The main challenge is the high processing cost caused by the large amount of data. In this paper, we propose a novel strategy that finds all frequent sequences without computing the support counts of non-frequent sequences. Previous work prunes candidate sequences according to frequent sequences of shorter length. In contrast, our strategy prunes candidate sequences according to non-frequent sequences of the same length. We summarize three strategies commonly used in previous work and design an efficient algorithm that takes full advantage of all four strategies. The experimental results show that our algorithm outperforms previous approaches to mining frequent sequences in large databases.
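To make the terms "support count" and "frequent sequence" concrete, the following is a minimal sketch of the conventional support computation that the proposed strategy aims to avoid for non-frequent sequences. It is not the DISC-all algorithm itself. The representation (a sequence as a list of itemsets), the function names `contains`, `support`, and `is_frequent`, and the toy database are all assumptions made for illustration.

```python
from typing import FrozenSet, List

# Assumed representation: a sequence is an ordered list of itemsets.
Itemset = FrozenSet[str]
Sequence = List[Itemset]


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each itemset of the pattern must be a subset of a distinct itemset
    of the sequence, with the order preserved. Greedy left-to-right
    matching is sufficient for this containment test.
    """
    pos = 0  # index of the next pattern itemset to match
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(database: List[Sequence], pattern: Sequence) -> int:
    """Count the sequences in the database that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


def is_frequent(database: List[Sequence], pattern: Sequence, min_support: int) -> bool:
    """A pattern is frequent if its support count reaches the threshold."""
    return support(database, pattern) >= min_support


# Hypothetical toy database of four customer sequences.
db: List[Sequence] = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("ab"), frozenset("c")],
    [frozenset("b"), frozenset("a"), frozenset("cd")],
    [frozenset("c"), frozenset("a")],
]

pattern_a_then_c: Sequence = [frozenset("a"), frozenset("c")]
pattern_ab_together: Sequence = [frozenset("ab")]
```

With a minimum support count of 2, `pattern_a_then_c` is contained in three of the four sequences and is therefore frequent, while `pattern_ab_together` is contained in only one and is not. Every call to `support` scans the whole database, which is the cost that motivates avoiding support counting for non-frequent candidates.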