Detailed Information
Adaptive predefined-time specific performance control for underactuated multi-AUVs: An edge computing-based optimized RL method (indexed in SCI-EXPANDED and EI) | Cited by: 4
Document type: Journal article
Title: Adaptive predefined-time specific performance control for underactuated multi-AUVs: An edge computing-based optimized RL method
Authors: Liu, Haitao [1,2,3]; Feng, Zhijian [1,2]; Tian, Xuehong [1,2,3]; Mai, Qingqun [1,2,3]
Affiliations: [1] Guangdong Ocean Univ, Sch Mech Engn, Zhanjiang 524088, Peoples R China; [2] Guangdong Ocean Univ, Shenzhen Inst, Shenzhen 518120, Peoples R China; [3] Guangdong Engn Technol Res Ctr Ocean Equipment & M, Zhanjiang 524088, Peoples R China
Year: 2025
Volume: 318
Journal: OCEAN ENGINEERING
Indexed in: SCI-EXPANDED (Accession No. WOS:001392899400001); EI (Accession No. 20245117561544); Scopus (Accession No. 2-s2.0-85212423857)
Funding: This work was supported by the Guangdong Basic and Applied Basic Research Foundation [grant number 2024A1515011345], the Key Project of the Department of Education of Guangdong Province [grant number 2023ZDZX1005], the Shenzhen Science and Technology Program [grant number JCYJ20220530162014033], the National Natural Science Foundation of China [grant number 62171143], and the Science and Technology Planning Project of Zhanjiang City [grant numbers 2021A05023 and 2021E05012].
Language: English
Keywords: Reinforcement learning; Predefined-time control; Edge computing; Flexibility performance function; Underactuated AUV
Abstract: In this paper, an optimal control strategy based on predefined-time adaptive dynamic programming, with a dual-layer structure consisting of a trajectory analysis rule layer and an optimal control rule layer, is proposed to solve the 3D trajectory tracking problem of underactuated multi-AUV formation control. In the trajectory analysis layer, an edge computing-based, event-triggered, fully distributed adaptive state compensator is developed to estimate the state information of the virtual leader and thereby address the consensus problem of formation control. In addition, event triggering via edge computing reduces the communication burden. In the optimal control rule layer, to avoid the complexity of the reinforcement learning (RL) gradient-descent algorithm based on the square of the HJB equation, a solution via a positive function equivalent to the HJB equation is first derived to reduce the complexity of the algorithm. Then, an optimized actor-critic neural network (NN) structure is combined with dynamic programming to ensure that each subsystem achieves the optimal solution. A full-state constraint is imposed on the error system to increase the convergence speed and accuracy of the control system, and a new boundary performance function is designed to address the brittleness of conventional performance constraints and improve the system's ability to cope with unexpected situations. The convergence of all signals within a globally predefined time is proven theoretically, and the effectiveness of the control system is demonstrated via simulation experiments.
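The optimal control rule layer described in the abstract follows the general actor-critic adaptive dynamic programming pattern: a critic network approximates the value function so that the HJB residual is driven toward zero, and an actor network approximates the resulting optimal control. The minimal Python sketch below illustrates only that general pattern; the toy double-integrator dynamics, the RBF features, the quadratic cost weights, the learning rates, and the exploration reset are all illustrative assumptions and do not reproduce the paper's predefined-time design or its positive-function reformulation of the HJB equation.

```python
# Minimal actor-critic adaptive dynamic programming sketch (illustration only).
# Assumptions not taken from the paper: toy double-integrator dynamics, RBF
# features, quadratic cost, learning rates, and the exploration reset.
import numpy as np

rng = np.random.default_rng(0)

n = 2                      # assumed state dimension (one tracking-error subsystem)
m = 25                     # number of RBF centers shared by critic and actor
centers = rng.uniform(-1.0, 1.0, size=(m, n))
width = 0.5

def phi(x):
    """RBF feature vector phi(x)."""
    d = centers - x
    return np.exp(-np.sum(d * d, axis=1) / (2.0 * width ** 2))

def dphi(x, eps=1e-4):
    """Numerical Jacobian d(phi)/dx, shape (m, n)."""
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (phi(x + e) - phi(x - e)) / (2.0 * eps)
    return J

W_c = np.zeros(m)          # critic weights: V(x) ~= W_c @ phi(x)
W_a = np.zeros(m)          # actor weights:  u(x) ~= W_a @ phi(x)
lr_c, lr_a = 0.5, 0.2      # assumed learning rates
Q = np.eye(n)              # assumed state cost weight
R = 1.0                    # assumed control cost weight
dt = 0.01

def dynamics(x, u):
    """Assumed control-affine toy plant: xdot = f(x) + g(x) * u."""
    return np.array([x[1], u]), np.array([0.0, 1.0])

x = np.array([0.8, -0.3])
for _ in range(2000):
    p, J = phi(x), dphi(x)
    u = float(W_a @ p)
    xdot, g = dynamics(x, u)

    # HJB residual: running cost + dV/dx . xdot, driven toward zero
    cost = x @ Q @ x + R * u * u
    grad_c = J @ xdot                      # d(residual)/d(W_c)
    delta = cost + W_c @ grad_c

    # Critic: normalized gradient step on 0.5 * delta^2
    W_c -= lr_c * delta * grad_c / (1.0 + grad_c @ grad_c)

    # Actor: push u toward the HJB-minimizing control -0.5 * R^{-1} * g^T dV/dx
    u_star = -0.5 / R * g @ (J.T @ W_c)
    W_a -= lr_a * (u - u_star) * p

    x = x + dt * xdot
    if np.linalg.norm(x) > 2.0:            # exploration reset to stay on the RBF grid
        x = rng.uniform(-1.0, 1.0, size=n)
```

Per the abstract, the paper replaces the squared-residual gradient-descent step shown here with an equivalent positive-function formulation of the HJB equation to reduce algorithmic complexity, and runs the actor-critic networks per subsystem within the predefined-time framework.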