关键词:
privacy preserving data publishing
k-anonymity
quasi-identifiers
attributes graph
cut-vertex
摘要:
The rapid development of data publishing and information access technology bring a growing number of problems in privacy leakage. In order to avoid linking attacks happened between attributes, K-anonymity model was proposed and become the most widely used in privacy preserving data publishing. Identification of quasi-identifiers (QIs) is one of the primary problems which will directly affect the effectiveness of K-anonymity method. However, most of the existing methods ignored this problem or just choose QIs empirically. These will greatly reduce the validity of K-anonymity method as well as the utility of anonymous data. In this paper, we study the problem of finding QIs for privacy preserving data publishing method based on K-anonymity model. Firstly, we analyze the roles of QIs from the view of independence of sets, and define it as a collection of attributes that can separate sensitive attributes from the other non-sensitive attributes. Then, we propose a construction method for attribute graph based on relationship matrix, which can represent potential connectivity of publishing data, published data and external knowledge. Finally, we put forward an identification algorithm for QIs based on the concept of cut-vertex, which is aiming to find the necessary and minimum QIs. The proposed algorithm is useful to avoid inconvenience and inaccuracy caused by artificial partition of QIs, and can be applied in data publishing situations with multiple sensitive attributes after some extension. Experiments and analysis show that the proposed identification algorithm has better partition ability and lower computational complexity. Therefore, it has good practical value in the application environment of publishing of big data.