Details
HCAM-CL: A Novel Method Integrating a Hierarchical Cross-Attention Mechanism with CNN-LSTM for Hierarchical Image Classification (indexed in SCI-EXPANDED)
Document type: Journal article
English title: HCAM-CL: A Novel Method Integrating a Hierarchical Cross-Attention Mechanism with CNN-LSTM for Hierarchical Image Classification
Authors: Su, Jing[1]; Liang, Jianmin[1]; Zhu, Jiayi[1]; Li, Yongjiang[1]
Affiliation: [1]Guangdong Ocean Univ, Sch Math & Comp, Zhanjiang 524088, Peoples R China
Year: 2024
Volume: 16
Issue: 9
Journal: SYMMETRY-BASEL
Indexed in: SCI-EXPANDED (accession no. WOS:001323331300001), Scopus (accession no. 2-s2.0-85205121249), WOS
Funding: This research was supported by a special grant from the program for scientific research start-up funds of Guangdong Ocean University under Grant No. 060302102303, the Industry-University-Research Innovation Fund Project of the Science and Technology Development Center of the Ministry of Education under Grant No. 2020QT13, the Ministry of Education's Industry-University-Research Collaborative Education Project under Grant No. 239920011, and the National College Students Innovation and Entrepreneurship Training Program under Grant No. 010403102309.
Language: English
Keywords: hierarchical image classification; cross-attention mechanism; CNN-LSTM
Abstract: Deep learning networks have yielded promising insights in the field of image classification. However, the hierarchical image classification (HIC) task, which involves assigning multiple, hierarchically organized labels to each image, presents a notable challenge. In response to this complexity, we developed a novel framework (HCAM-CL), which integrates a hierarchical cross-attention mechanism with a CNN-LSTM architecture for the HIC task. The HCAM-CL model effectively identifies the relevance between images and their corresponding labels while also being attuned to learning the hierarchical inter-dependencies among labels. Our versatile model is designed to manage both fixed-length and variable-length classification pathways within the hierarchy. In the HCAM-CL model, the CNN module is responsible for the essential task of extracting image features. The hierarchical cross-attention mechanism vertically aligns these features with hierarchical levels, uniformly weighing the importance of different spatial regions. Ultimately, the LSTM module is strategically utilized to generate predictive outcomes by treating HIC as a sequence generation challenge. Extensive experimental evaluations on CIFAR-10, CIFAR-100, and design patent image datasets demonstrate that our HCAM-CL framework consistently outperforms other state-of-the-art methods in hierarchical image classification.