Details

Towards multimodal underwater object detection: A bidirectional feature recomposition network and visual-sonar dataset (indexed in SCI-EXPANDED and EI)

Document type: Journal article

English title: Towards multimodal underwater object detection: A bidirectional feature recomposition network and visual-sonar dataset

Authors: Wu, Yujie[1]; Wang, Wenling[2]; Lin, Cong[3]; Hou, Mingxin[4]; Liu, Mingxin[3,5]

Affiliations: [1] Guangdong Ocean Univ, Coll Naval Architecture & Shipping, Zhanjiang 524088, Peoples R China; [2] Hainan Univ, Coll Informat & Commun Engn, Haikou 570100, Peoples R China; [3] Guangdong Ocean Univ, Sch Elect & Informat Engn, Zhanjiang 524088, Peoples R China; [4] Guangdong Ocean Univ, Sch Mech Engn, Zhanjiang 524088, Peoples R China; [5] China Sea Marine Ranching, Guangdong Prov Key Lab Intelligent Equipment South, Zhanjiang 524088, Peoples R China

Year: 2026

Volume: 316

Journal title: EXPERT SYSTEMS WITH APPLICATIONS

Indexed in: SCI-EXPANDED (accession no. WOS:001707829600001), EI (accession no. 20261120273646), WOS

Funding: This work was supported in part by the National Natural Science Foundation of China (62171143), Natural Science Foundation of Guangdong Province (2025A1515011356 and 2025A1515012901), the Stable Supporting Fund of sonar Science and Technology Laboratory (JCKYS2024604SSJS00301), the program for scientific research start-up funds of Guangdong Ocean University (060302112405), Guangdong Provincial University Innovation Team (2023KCXTD016) and Undergraduate Innovation Team Project of Guangdong Ocean University (CXTD2024011 and JDTD2024003).

Language: English

Keywords: Underwater object detection; Multimodal feature fusion; Visual-sonar images; Attention mechanism

Abstract: The Remotely Operated Vehicle (ROV) is a critical platform for marine resource exploration and biodiversity surveys, primarily relying on optical and sonar sensors for underwater perception. However, unimodal approaches are limited by the inherent constraints of optical or sonar sensors, resulting in restricted perceptual capabilities. While multimodal information fusion can achieve complementary advantages, the heterogeneous nature of visual-sonar image modalities makes feature alignment and fusion particularly challenging. To address these challenges, this paper proposes an underwater visual-sonar attention fusion detection network (VSAFDet). This network fully leverages feature information from both visual and sonar images to enhance object detection performance in underwater visual images. Specifically, a visual-sonar adaptive fusion (VSAF) module is proposed to tackle severe spatial misalignment via an attention-based feature reorganization mechanism, while achieving effective fusion of global contextual information through bidirectional channel weight modulation. Subsequently, we design a multimodal feature extraction and recognition network that employs dual-stream multi-scale feature extraction to achieve object detection in visual images assisted by sonar images. Furthermore, we establish an underwater multimodal data acquisition system and construct a spatiotemporally synchronized underwater multimodal object detection dataset (UMOD), providing a benchmark platform for researching and evaluating underwater visual-sonar multimodal feature fusion technologies. Extensive quantitative and qualitative experiments on the UMOD dataset demonstrate that VSAFDet effectively utilizes cross-modal complementary advantages, significantly improving underwater object detection capability compared to existing mainstream methods. The UMOD dataset can be found at https://github.com/UM-Research-Ng/UMOD-dataset.
