Keywords:
Credit scoring
Class imbalance
Ensemble learning
LightGBM
Abstract:
Credit scoring, as a core component of financial risk management and decision making, is crucial to the sound operation and market competitiveness of financial institutions. However, class imbalance is pervasive in credit data: the number of default samples is far smaller than the number of non-default samples. This complicates the construction of credit scoring models and easily biases them toward the majority class while ignoring the minority class, reducing both predictive accuracy and generalization ability. To address this problem, the proposed GLOF-BFL-LightGBM model adopts a staged optimization strategy. First, since anomalous samples further aggravate the effect of class imbalance and reduce model robustness, this study introduces the Gaussian-weighted Local Outlier Factor (GLOF) technique to identify and remove potentially anomalous samples, purifying the dataset and improving model stability. Second, to strengthen the model's ability to recognize the minority class, the Focal Loss function is used to reduce the influence of majority-class samples on training, and Bayesian optimization is applied to automatically search for the optimal Focal Loss parameters so as to achieve the best class-imbalance learning performance. To verify the effectiveness of the model, experiments are conducted on four credit datasets from the UCI repository, and the GLOF-BFL-LightGBM model is compared with a range of baseline models, including traditional classification methods and conventional ensemble learning models. The experimental results show that GLOF-BFL-LightGBM outperforms the comparison models on key metrics such as AUC and the KS statistic, effectively improving the accuracy and generalization ability of credit scoring and providing a reliable tool for personal credit risk assessment.
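The abstract describes a staged pipeline: outlier filtering with GLOF, a LightGBM classifier trained with a Focal Loss objective whose parameters are tuned by Bayesian optimization, and evaluation by AUC and KS. The sketch below only illustrates how such a pipeline could be wired together; it is not the paper's implementation. It substitutes scikit-learn's standard LocalOutlierFactor for the paper's Gaussian-weighted GLOF variant, fixes the Focal Loss parameters (alpha, gamma) instead of searching them with Bayesian optimization, and uses synthetic imbalanced data in place of the UCI credit datasets.

```python
# Minimal sketch of an LOF-filtering + focal-loss-LightGBM pipeline.
# Assumptions: plain LOF stands in for GLOF; (alpha, gamma) are fixed
# instead of Bayesian-optimized; data are synthetic, not the UCI sets.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic imbalanced "credit" data (roughly 10% defaults).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

# Stage 1: remove potential outliers from the training set (LOF placeholder
# for the Gaussian-weighted GLOF described in the paper).
inlier_mask = LocalOutlierFactor(n_neighbors=20).fit_predict(X_tr) == 1
X_tr, y_tr = X_tr[inlier_mask], y_tr[inlier_mask]

# Stage 2: Focal Loss as a custom LightGBM objective.
ALPHA, GAMMA = 0.25, 2.0  # the paper tunes these via Bayesian optimization


def focal_loss(z, y, alpha=ALPHA, gamma=GAMMA):
    """Per-sample focal loss as a function of the raw score z."""
    p = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-9, 1 - 1e-9)
    return -(alpha * y * (1 - p) ** gamma * np.log(p)
             + (1 - alpha) * (1 - y) * p ** gamma * np.log(1 - p))


def focal_objective(y_true, y_pred):
    """Gradient/Hessian via central differences (numerical approximation)."""
    eps = 1e-3
    up = focal_loss(y_pred + eps, y_true)
    mid = focal_loss(y_pred, y_true)
    down = focal_loss(y_pred - eps, y_true)
    grad = (up - down) / (2 * eps)
    hess = np.maximum((up - 2 * mid + down) / eps ** 2, 1e-6)  # keep curvature positive
    return grad, hess


clf = lgb.LGBMClassifier(objective=focal_objective,
                         n_estimators=300, learning_rate=0.05)
clf.fit(X_tr, y_tr)

# Evaluation: AUC and KS (max TPR - FPR) on sigmoid-transformed raw scores.
raw = clf.predict(X_te, raw_score=True)
proba = 1.0 / (1.0 + np.exp(-raw))
fpr, tpr, _ = roc_curve(y_te, proba)
print("AUC:", roc_auc_score(y_te, proba), "KS:", (tpr - fpr).max())
```

In a fuller version of this sketch, the fixed (ALPHA, GAMMA) pair would be replaced by an outer Bayesian-optimization loop that re-fits the model and scores it on a validation fold for each candidate parameter setting.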