Keywords:
machine learning
feature selection
interaction mutual information
classification
Abstract:
Feature selection algorithms based on information theory usually consider only a single label when measuring the classification information provided by a candidate feature. They overlook the varied correlations between a candidate feature and pairs of labels, which can lead them to underestimate the feature's importance. To address this issue, a novel pairwise multi-label feature selection algorithm based on interaction mutual information (Pairwise multi-label feature selection based on interaction mutual information, IPFS) is proposed. Specifically, IPFS assigns each label pair its own weight derived from interaction mutual information and uses these weights to measure the total classification information that a candidate feature provides for the two labels, so that the feature's importance is evaluated more accurately. It then selects the feature subset according to the maximum-relevance minimum-redundancy principle. To verify its effectiveness, IPFS is compared with eight other state-of-the-art feature selection algorithms on 12 diverse datasets. The experimental results show that IPFS significantly outperforms the other algorithms on every evaluation metric considered.
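The information quantities the abstract refers to can be illustrated with a minimal Python sketch. It is not the IPFS scoring function, whose weighting formula the abstract does not give; it only computes the standard mutual information, joint mutual information, and interaction information for discrete variables. The function names and the toy XOR data are assumptions made for illustration, and the sign convention I(F;Y1;Y2) = I(F;Y1) + I(F;Y2) - I(F;Y1,Y2) is one of two in common use.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mutual_info(f, y1, y2):
    """I(F;Y1,Y2) = H(F) + H(Y1,Y2) - H(F,Y1,Y2)."""
    return entropy(f) + entropy(y1, y2) - entropy(f, y1, y2)


def interaction_info(f, y1, y2):
    """I(F;Y1;Y2) = I(F;Y1) + I(F;Y2) - I(F;Y1,Y2) (assumed sign convention)."""
    return mutual_info(f, y1) + mutual_info(f, y2) - joint_mutual_info(f, y1, y2)


# Hypothetical toy data: two independent binary labels and a feature
# that equals their XOR.
y1 = [0, 0, 1, 1]
y2 = [0, 1, 0, 1]
f = [0, 1, 1, 0]

print(mutual_info(f, y1))            # information about label 1 alone
print(mutual_info(f, y2))            # information about label 2 alone
print(joint_mutual_info(f, y1, y2))  # information about the label pair
print(interaction_info(f, y1, y2))   # negative value signals synergy
```

In this toy case the feature carries no information about either label on its own but one full bit about the pair, which is the kind of underestimation that single-label evaluation can produce.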