
Detailed Information

Leveraging large language models and embedding representations for enhanced word similarity computation (indexed in SCI-EXPANDED)

Document type: Journal article

Title: Leveraging large language models and embedding representations for enhanced word similarity computation

Authors: Peng, XiaoHong[1]; Jiang, Hongbin[1]; Chen, Jing[1]; Liu, MingXin[2]; Chen, Xiao[3]

Affiliations: [1] Guangdong Ocean Univ, Coll Math & Comp Sci, Zhanjiang 524088, Peoples R China; [2] Guangdong Ocean Univ, Coll Elect & Informat Engn, Zhanjiang 524088, Peoples R China; [3] Hebei Normal Univ Sci & Technol, Marine Sci Res Ctr, Qinhuangdao 066004, Peoples R China

Year: 2025

Volume: 16

Issue: 1

Journal: SCIENTIFIC REPORTS

Indexed in: SCI-EXPANDED (Accession No.: WOS:001660889600009); WOS

Funding: This research was supported by the National Natural Science Foundation of China (No. 62172352, No. 61871465, No. 42306218); the Department of Education Ocean Ranch Equipment Information and Intelligent Innovation Team Project (No. 2023KCXTD016); the Natural Science Foundation of Hebei Province (No. 2022203028, No. 2023407003); the Guangdong Ocean University Research Fund Project (No. 060302102304); and the Key Research Project of Guangdong Province's Colleges and Universities - Project in the Key Field of New Generation Electronic Information (Semiconductor) (No. 2025ZDZX1007).

Language: English

Keywords: Word similarity; Large language models; Semantic enhancement; Semantic embedding; Computational framework

Abstract: Current mainstream methods for computing word similarity often struggle to precisely capture the fine-grained semantics of words across different contexts. In particular, generative semantic representations typically suffer from issues such as part-of-speech bias, semantic ambiguity, redundant exemplars, and informational redundancy, all of which compromise the accuracy of similarity measurements. To address these problems, this paper proposes WSLE, a word similarity computation framework integrating the semantic generation capabilities of large language models (LLMs) with embedding-based vector representations. First, WSLE addresses four common challenges encountered when generating semantic representations with LLMs: part-of-speech bias, redundant exemplars, semantic ambiguity, and informational redundancy. By applying constraints to lexical items, grammatical categories, semantic descriptions, and prompt length, WSLE effectively mitigates these issues, enabling LLMs to generate coherent, precise, and contextually rich semantic representations. Second, these generated semantic representations are transformed into high-dimensional vector embeddings via a deep semantic embedding module, facilitating quantitative assessment of semantic similarity between words. Finally, the effectiveness of WSLE is rigorously evaluated using Pearson's correlation coefficient (r) and Spearman's rank correlation coefficient (rho). Experimental results on benchmark datasets, including RG65, MC30, YP130, and MED38, demonstrate that the proposed WSLE framework significantly outperforms existing similarity computation methods, exhibiting notable advantages in accuracy and robustness for word similarity measurement tasks.
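The evaluation protocol described in the abstract (scoring embedding-based similarity for word pairs against human gold ratings with Pearson's r and Spearman's rho) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding vectors and gold ratings below are invented toy data, and WSLE's actual pipeline derives its embeddings from LLM-generated semantic representations rather than fixed vectors.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson on the ranks (no tie correction)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical embeddings for a handful of words (toy 3-d vectors).
emb = {
    "car":   [0.90, 0.10, 0.00],
    "auto":  [0.85, 0.15, 0.05],
    "fruit": [0.10, 0.90, 0.20],
    "apple": [0.20, 0.80, 0.30],
    "noon":  [0.00, 0.10, 0.95],
}
# Word pairs with invented human similarity ratings (gold standard).
pairs = [("car", "auto"), ("fruit", "apple"), ("car", "fruit"), ("noon", "apple")]
gold = [3.9, 3.2, 0.5, 0.1]
model = [cosine_similarity(emb[a], emb[b]) for a, b in pairs]
print(round(pearson_r(gold, model), 3), round(spearman_rho(gold, model), 3))
```

On benchmarks such as RG65 or MC30, `pairs` and `gold` would come from the dataset's human-annotated pair list, and the reported r and rho measure how closely the model's similarity ranking tracks human judgment.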

