Graduate Student: 林昱維 Lin, Yu-Wei
Thesis Title: 基於影像到動作轉換之未知環境下目標物件夾取策略 (Image-to-Action Translations-Based Target Grasp Strategy Using Reinforcement Learning in Uncertain Environment)
Advisor: 王偉彥 Wang, Wei-Yen
Oral Defense Committee: 王偉彥 Wang, Wei-Yen; 蘇順豐 Su, Shun-Feng; 呂成凱 Lu, Cheng-Kai
Oral Defense Date: 2023/07/11
Degree: Master
Department: Department of Electrical Engineering (電機工程學系)
Year of Publication: 2023
Graduating Academic Year: 111 (ROC calendar)
Language: Chinese
Pages: 51
Keywords (Chinese): 深度增強式學習、目標物件夾取策略、近端策略最佳化、影像到動作的轉換
Keywords (English): Deep reinforcement learning, target grasp strategy, proximal policy optimization (PPO), image-to-action translations
DOI: http://doi.org/10.6345/NTNU202301269
Document Type: Academic thesis
The main objective of this thesis is to use only RGB images to enable a robotic arm to grasp a static or dynamic target without any related 3D position information. The advantages of the proposed method include a class of general control strategies for various types of robotic arms in uncertain environments, image-to-action translations that autonomously generate the corresponding degree-of-freedom action commands, and no requirement for the target position. First, the YOLO (You Only Look Once) algorithm performs image segmentation, dividing each RGB image into meaningful objects and regions. A convolutional neural network (CNN) model is then trained with the proximal policy optimization (PPO) algorithm: its inputs are RGB images that retain only the robotic arm and the target, and its outputs are the rotation amounts of the motors. To avoid damaging the mechanism through collisions between the robotic arm and objects, the deep reinforcement learning training is carried out in the Gazebo simulation environment. Finally, experimental results demonstrate the effectiveness of the proposed strategy.
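As a concrete illustration of the training setup described above, the following is a minimal sketch, assuming a Python workflow with Gymnasium and Stable-Baselines3 rather than the thesis's actual code: a hypothetical `GraspEnv` class stands in for the Gazebo-based simulation, its observations are segmented RGB images, and its actions are continuous motor rotation amounts. The image size, joint count, and reward are illustrative placeholders, not the settings used in the thesis.

```python
# Minimal sketch (not the author's code): training an image-to-action grasping
# policy with PPO from Stable-Baselines3. The environment below is a
# hypothetical stand-in for the Gazebo-based simulation.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class GraspEnv(gym.Env):
    """Hypothetical grasping environment: segmented RGB image in, joint rotations out."""

    def __init__(self, image_shape=(96, 96, 3), num_joints=6):
        super().__init__()
        # Observation: an RGB image keeping only the robotic arm and the target.
        self.observation_space = spaces.Box(0, 255, shape=image_shape, dtype=np.uint8)
        # Action: continuous rotation amounts for each motor (normalized).
        self.action_space = spaces.Box(-1.0, 1.0, shape=(num_joints,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # In the real system this image would come from the camera plus YOLO segmentation.
        obs = self.observation_space.sample()
        return obs, {}

    def step(self, action):
        # Placeholder: apply `action` to the simulated arm (e.g. via ROS/Gazebo),
        # then compute a reward from the gripper-to-target relation.
        obs = self.observation_space.sample()
        reward = 0.0          # illustrative only
        terminated = False    # e.g. target grasped
        truncated = False     # e.g. step limit reached
        return obs, reward, terminated, truncated, {}


if __name__ == "__main__":
    env = GraspEnv()
    # "CnnPolicy" provides a convolutional feature extractor, matching the CNN role
    # described in the abstract.
    model = PPO("CnnPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)
    model.save("ppo_grasp_policy")
```

In a real setup, `reset` and `step` would exchange images and joint commands with the Gazebo simulation instead of sampling random observations, and the reward would encode progress toward a successful grasp.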