Keywords:
Text classification
Pre-trained models
Graph neural networks
Abstract:
Text classification, a core task in natural language processing, aims to automatically assign text data to predefined categories. The BertGCN model combines the strengths of BERT and GCN, enabling it to handle both textual and graph-structured data effectively. However, it still shows limitations on complex text classification tasks. BERT uses absolute position encoding to represent each word's position in a sequence, which does not capture the relative relationships between words well; moreover, because BERT entangles a word's content information with its position information, the model may struggle to distinguish these two kinds of information. To overcome these limitations, we propose the DeGraph-Net model, which improves text classification by incorporating DeBERTa. DeBERTa uses relative position encoding, which better represents the relative positional relationships between words, and it processes content information and position information separately, avoiding confusion between the two and improving classification accuracy. Experimental results show that DeGraph-Net achieves significant performance improvements on three benchmark text classification datasets, validating its effectiveness on complex text classification tasks.
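As background for how BertGCN "combines the advantages of both BERT and GCN", the published BertGCN formulation linearly interpolates the class distributions predicted by the two branches, Z = λ·softmax(Z_GCN) + (1−λ)·softmax(Z_BERT). The sketch below illustrates only this interpolation step with toy logits; the array values and the weight λ = 0.7 are illustrative stand-ins, not outputs of any trained model.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy logits for 4 documents over 3 classes; in BertGCN these would come
# from a BERT classification head and a GCN over the document-word graph.
logits_bert = np.array([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.1, 0.2, 1.8],
                        [1.0, 1.1, 0.2]])
logits_gcn  = np.array([[1.5, 0.3, 0.2],
                        [0.1, 2.0, 0.3],
                        [0.4, 0.1, 1.2],
                        [0.2, 1.6, 0.3]])

lam = 0.7  # interpolation weight, a tunable hyperparameter in BertGCN
pred = lam * softmax(logits_gcn) + (1 - lam) * softmax(logits_bert)
labels = pred.argmax(axis=1)  # final class per document
```

Because both branches emit proper distributions before mixing, each row of `pred` still sums to 1, and λ trades off how much the graph view overrides the sequence view.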
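The "separate processing of content and position" claimed for DeBERTa refers to its disentangled attention: each token is represented by a content vector and a relative-position vector with separate projections, and the attention logit sums content-to-content, content-to-position, and position-to-content terms, scaled by √(3d). A minimal single-head numpy sketch, with random matrices standing in for learned weights and toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 8, 4                    # sequence length, head dim, max relative distance

H = rng.normal(size=(n, d))          # token content states (toy values)
P = rng.normal(size=(2 * k, d))      # relative-position embeddings

# Separate content and position projections (random stand-ins for learned weights).
Wq_c, Wk_c = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wq_r, Wk_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))

Qc, Kc = H @ Wq_c, H @ Wk_c          # content queries / keys
Qr, Kr = P @ Wq_r, P @ Wk_r          # position queries / keys

# Relative distance, clipped into [0, 2k): delta[i, j] = clip(i - j + k, 0, 2k - 1).
i, j = np.arange(n)[:, None], np.arange(n)[None, :]
delta = np.clip(i - j + k, 0, 2 * k - 1)

c2c = Qc @ Kc.T                                          # content-to-content
c2p = np.take_along_axis(Qc @ Kr.T, delta, axis=1)       # content-to-position
# position-to-content indexes with delta(j, i), hence the final transpose:
p2c = np.take_along_axis(Kc @ Qr.T, delta, axis=1).T

A = (c2c + c2p + p2c) / np.sqrt(3 * d)                   # attention logits
attn = np.exp(A - A.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)                  # row-wise softmax
```

Because every pairwise score depends on the clipped offset i − j rather than absolute indices, the same pattern transfers across positions, which is the property the abstract contrasts with BERT's absolute position encoding.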