
Leveraging large language models and embedding representations for enhanced word similarity computation

Document type: Journal article

English title: Leveraging large language models and embedding representations for enhanced word similarity computation

Authors: Peng X.; Jiang H.; Chen J.; Liu M.; Chen X.

Affiliations: [1] College of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, 524088, China; [2] College of Electronic and Information Engineering, Guangdong Ocean University, Zhanjiang, 524088, China; [3] Marine Science Research Center, Hebei Normal University of Science and Technology, Qinhuangdao, 066004, China

Year: 2026

Volume: 16

Issue: 1

Journal: Scientific Reports

Indexed in: Scopus (accession number: 2-s2.0-105027279010)

Language: English

Keywords: Computational framework; Large language models; Semantic embedding; Semantic enhancement; Word similarity

Abstract: Current mainstream methods for computing word similarity often struggle to precisely capture the fine-grained semantics of words across different contexts. In particular, generative semantic representations typically suffer from issues such as part-of-speech bias, semantic ambiguity, redundant exemplars, and informational redundancy, all of which compromise the accuracy of similarity measurements. To address these problems, this paper proposes WSLE, a word similarity computation framework integrating the semantic generation capabilities of large language models (LLMs) with embedding-based vector representations. First, WSLE addresses four common challenges encountered in generating semantic representations using LLMs: part-of-speech bias, redundant exemplars, semantic ambiguity, and informational redundancy. By applying constraints to lexical items, grammatical categories, semantic descriptions, and prompt length, WSLE effectively mitigates these issues, thus enabling LLMs to generate coherent, precise, and contextually rich semantic representations. Second, these generated semantic representations are transformed into high-dimensional vector embeddings via a deep semantic embedding module, facilitating quantitative assessment of semantic similarity between words. Finally, the effectiveness of WSLE is rigorously evaluated through analyses based on Pearson's correlation coefficient (r) and Spearman's rank correlation coefficient (ρ). Experimental results on benchmark datasets, including RG65, MC30, YP130, and MED38, demonstrate that the proposed WSLE framework significantly outperforms existing similarity computation methods, exhibiting notable advantages in accuracy and robustness for word similarity measurement tasks. © The Author(s) 2025.
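
Illustrative sketch (not part of the catalog record and not the authors' implementation): the abstract describes turning generated semantic descriptions into vectors, scoring word pairs by vector similarity, and evaluating against human judgments with Pearson's r and Spearman's ρ. The Python below shows only that last, generic scoring-and-evaluation step. It assumes cosine similarity as the vector comparison, which the abstract does not specify, and all function names, vectors, and ratings are hypothetical toy values, not data from RG65, MC30, YP130, or MED38.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson's correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy inputs: each entry is a pair of made-up embedding vectors
# standing in for the embeddings of two words' generated descriptions.
toy_pairs = [
    ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
    ([0.1, 0.9, 0.2], [0.7, 0.1, 0.3]),
    ([0.0, 0.2, 0.9], [0.9, 0.1, 0.0]),
]
# Hypothetical human similarity ratings for the same three pairs.
toy_human_ratings = [3.8, 1.9, 0.4]

predicted = [cosine_similarity(u, v) for u, v in toy_pairs]
r = pearson(predicted, toy_human_ratings)
rho = spearman(predicted, toy_human_ratings)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

With real data, `toy_pairs` would be replaced by embeddings of the generated descriptions and `toy_human_ratings` by the benchmark's gold scores; higher r and ρ indicate closer agreement with human judgments.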
