Graduate student: | 李安民 Akbar, Ilham |
---|---|
Thesis title: | 針對單一和多智能體人形機器人之創新雙演員近端策略優化算法 A Novel Dual-Actor Proximal Policy Optimization Algorithm for Single and Multi-Agent Humanoid Robot |
Advisors: | 包傑奇 Jacky Baltes; 薩義德 Saeed Saeedvand |
Oral defense committee: | 李祖聖 Li, Tzuu-Hseng; 王偉彥 Wang, Wei-yen; 包傑奇 Jacky Baltes; 薩義德 Saeed Saeedvand |
Oral defense date: | 2024/07/01 |
Degree: | Master |
Department: | 電機工程學系 Department of Electrical Engineering |
Publication year: | 2024 |
Academic year of graduation: | 112 |
Language: | English |
Number of pages: | 68 |
Keywords (English): | DA-PPO, IDA-PPO, Single Agent, Multi Agent, reinforcement learning, cooperative tasks, humanoid robots, robotic navigation |
DOI URL: | http://doi.org/10.6345/NTNU202400949 |
Thesis type: | Academic thesis |
Single-agent and multi-agent systems are integral to reinforcement learning in dynamic environments for advanced humanoid robotic applications. This thesis introduces the Dual-Actor Proximal Policy Optimization (DA-PPO) algorithm and its extension, Independent Dual-Actor Proximal Policy Optimization (IDA-PPO), designed for robotic navigation and cooperative tasks using the ROBOTIS-OP3 humanoid robot. The study validates DA-PPO and IDA-PPO across various scenarios and demonstrates significant improvements in both single-agent and multi-agent environments. DA-PPO excels in robotic navigation and movement tasks, outperforming established reinforcement learning methods in both complex environments and basic walking tasks. This success is attributed to its innovative architecture, its efficient use of hardware resources such as the NVIDIA GeForce RTX 3050, and an effective reward function strategy. IDA-PPO, with its decentralized training and dual-actor policy network, achieves higher mean rewards and faster learning than IPPO and MAPPO: it is 5.49 times faster than MAPPO and 8.22 times faster than IPPO, highlighting its efficiency and adaptability in multi-agent tasks. These findings underscore the importance of algorithmic innovation and hardware capabilities in advancing robotic performance, positioning DA-PPO and IDA-PPO as significant advancements in robotic learning.
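The abstract does not describe how the two actors in DA-PPO are combined, so the following is only a minimal sketch of the standard PPO clipped surrogate objective that DA-PPO and IDA-PPO build on by name. It is not the thesis's implementation. The function name `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions and do not come from the thesis.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (hypothetical default, not from the thesis).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # One sample whose probability doubled under the new policy, with a
    # positive advantage: the clipping caps how much credit it receives.
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update. A dual-actor variant would presumably apply an objective of this kind to each actor, but that is an assumption and would need to be checked against the thesis itself.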