A New Approach for Sensitive Rule Hiding Avoiding Side Effects

Data mining is useful in many applications, but the misuse of mining techniques may expose sensitive information. Researchers have recently worked on hiding sensitive association rules. However, hiding can cause two undesired side effects: non-sensitive rules may be falsely hidden, and non-existing rules may be falsely generated. In this paper, we present a novel approach that modifies database contents to hide sensitive rules while avoiding these side effects. We classify the modifications according to the contents of the transactions to be modified and the modification methods. Each class of modifications is related to a set of sensitive rules, a set of non-sensitive rules, and a set of non-existing rules. Given a class of modifications and its three sets, we estimate the minimum number of transactions that must be modified to hide the sensitive rules, and the maximum number of transactions that can be modified without hiding non-sensitive rules or generating non-existing rules. From these two numbers, we determine the number of transactions to modify for each class of modifications. We design mechanisms for efficient rule hiding, and the experimental results indicate that the approach scales well with the database size. We evaluate effectiveness with respect to our chief concerns, i.e., hiding sensitive rules and avoiding the side effects. In most cases, all the sensitive rules are hidden without generating non-existing rules. On the other hand, the correlation among rules makes it impossible to avoid hiding non-sensitive rules in some cases. We perform experiments on various settings of the sensitive rules and report the observed effect on the number of non-sensitive rules falsely hidden.
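The minimum/maximum estimate described above can be illustrated with a minimal sketch. It is not the paper's actual classification or estimation procedure. It assumes the standard support/confidence definition of a strong rule, a single modification type (removing a consequent item from transactions that support the rule, so the antecedent count is unchanged), and the worst case in which every modified transaction also supports the non-sensitive rule. The function names and the counts in the usage lines are hypothetical.

```python
from fractions import Fraction


def is_strong(c_rule, c_antecedent, n, min_sup, min_conf):
    """True if a rule with these transaction counts meets both thresholds."""
    if c_antecedent == 0:
        return False
    return (Fraction(c_rule, n) >= min_sup
            and Fraction(c_rule, c_antecedent) >= min_conf)


def min_mods_to_hide(c_rule, c_antecedent, n, min_sup, min_conf):
    """Fewest supporting transactions to modify so the rule stops being strong.

    Assumption: each modification removes a consequent item, lowering the
    rule's count by one and leaving the antecedent count unchanged.
    """
    for k in range(c_rule + 1):
        if not is_strong(c_rule - k, c_antecedent, n, min_sup, min_conf):
            return k
    return None  # only reachable if both thresholds are zero


def max_mods_keeping(c_rule, c_antecedent, n, min_sup, min_conf):
    """Most supporting transactions that can be modified while the rule stays strong."""
    k = 0
    while k < c_rule and is_strong(c_rule - (k + 1), c_antecedent, n,
                                   min_sup, min_conf):
        k += 1
    return k


# Hypothetical counts: 10 transactions, min support 3/10, min confidence 3/5.
N, MIN_SUP, MIN_CONF = 10, Fraction(3, 10), Fraction(3, 5)

need = min_mods_to_hide(5, 6, N, MIN_SUP, MIN_CONF)       # sensitive rule
room_a = max_mods_keeping(7, 8, N, MIN_SUP, MIN_CONF)     # non-sensitive rule A
room_b = max_mods_keeping(6, 7, N, MIN_SUP, MIN_CONF)     # non-sensitive rule B

print(need, room_a, room_b)
print("A survives:", need <= room_a, "| B survives:", need <= room_b)
```

With these hypothetical counts, the sensitive rule needs two modifications. Rule A tolerates two, so it survives, while rule B tolerates only one, so in the worst case it is falsely hidden. This is the kind of conflict among correlated rules that the abstract says cannot always be avoided.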