| Field | Value |
|---|---|
| Graduate student | 林昱維 Lin, Yu-Wei |
| Thesis title | 基於影像到動作轉換之未知環境下目標物件夾取策略 Image-to-Action Translations-Based Target Grasp Strategy Using Reinforcement Learning in Uncertain Environment |
| Advisor | 王偉彥 Wang, Wei-Yen |
| Oral defense committee | 王偉彥 Wang, Wei-Yen; 蘇順豐 Su, Shun-Feng; 呂成凱 Lu, Cheng-Kai |
| Oral defense date | 2023/07/11 |
| Degree | Master |
| Department | 電機工程學系 Department of Electrical Engineering |
| Year of publication | 2023 |
| Academic year of graduation | 111 (ROC calendar, 2022-2023) |
| Language | Chinese |
| Number of pages | 51 |
| Keywords | Deep reinforcement learning, target grasp strategy, proximal policy optimization (PPO), image-to-action translations |
| DOI URL | http://doi.org/10.6345/NTNU202301269 |
| Document type | Academic thesis |
The main objective of this thesis is to use only RGB images to enable a robotic arm to grasp a static or dynamic target without any 3D position information. The advantages of the proposed method include a class of general control strategies for various types of robotic arms in uncertain environments, image-to-action translations that autonomously generate the corresponding degree-of-freedom action commands, and no requirement for the target's position. First, the YOLO (You Only Look Once) algorithm performs image segmentation, dividing each RGB image into meaningful objects or regions. A convolutional neural network (CNN) model is then trained with the proximal policy optimization (PPO) algorithm; the RGB images, masked to keep only the robotic arm and the target, and the rotation amounts of the motors are the inputs and outputs of the CNN model, respectively. To avoid mechanical damage from the robotic arm colliding with objects, deep reinforcement learning training is carried out in the Gazebo simulation environment. Finally, experimental results demonstrate the effectiveness of the proposed strategy.
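The pipeline described above (YOLO-masked RGB observations in, per-joint rotation commands out, trained with PPO in a Gazebo simulation) can be outlined in code. Below is a minimal sketch, assuming a hypothetical Gymnasium environment named `GazeboArmGraspEnv` that wraps the simulated arm; the environment name, image size, joint count, and reward logic are illustrative assumptions rather than details taken from the thesis, and Stable-Baselines3's PPO with its built-in CNN policy is used as one possible realization.

```python
# Minimal sketch of the described training setup (not the thesis's actual code).
# Assumptions: a hypothetical Gymnasium environment "GazeboArmGraspEnv" wrapping
# the Gazebo-simulated arm, an 84x84 YOLO-masked RGB observation, and a 6-joint
# action vector of motor rotation amounts. Reward and reset logic are stubs.
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO


class GazeboArmGraspEnv(gym.Env):
    """Observation: RGB image keeping only the arm and target (uint8, HxWx3).
    Action: normalized rotation amount for each of the arm's motors."""

    def __init__(self):
        super().__init__()
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(6,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Real setup: reset the Gazebo world, grab a camera frame, run YOLO,
        # and black out everything except the arm and the target.
        obs = np.zeros(self.observation_space.shape, dtype=np.uint8)
        return obs, {}

    def step(self, action):
        # Real setup: send the rotation commands to the joint controllers,
        # capture and mask the next frame, and reward progress toward a grasp.
        obs = np.zeros(self.observation_space.shape, dtype=np.uint8)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}


# PPO with a CNN feature extractor: image in, joint rotations out.
env = GazeboArmGraspEnv()
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
model.save("ppo_grasp_policy")
```

In the actual system, the reset and step stubs would communicate with Gazebo (for example over ROS topics) and run the YOLO detector on each camera frame before handing the masked image to the policy.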