Graduate student: | 隋嘉銘 Sue, Chia-Ming |
---|---|
Thesis title: | Advanced Deep Learning-based Humanoid Robot Visual Localization With High Motion Noise (改良深度學習的人形機器人於高動態雜訊之視覺定位) |
Advisor: | 包傑奇 Jacky Baltes |
Oral examination committee: | 包傑奇 Jacky Baltes; 林政宏 Lin, Cheng-Hung; 杜國洋 Tu, Kuo-Yang |
Oral defense date: | 2024/07/01 |
Degree: | Master |
Department: | Department of Electrical Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 (ROC calendar, i.e. academic year 2023-2024) |
Language: | English |
Number of pages: | 80 |
Keywords: | Humanoid Robot, Deep Learning, Robot Localization, Visual Odometry |
DOI URL: | http://doi.org/10.6345/NTNU202400933 |
Thesis type: | Academic thesis |
Several visual SLAM methods have been proposed that use cameras or other optical sensors to navigate and understand the environment. For example, ORB-SLAM is a complete SLAM system that includes visual odometry, tracking, and loop closure detection. ORB-SLAM relies solely on feature detection in images from a monocular camera, so when it runs on a humanoid robot, whose walking motion shakes the camera, it suffers from severe motion blur.
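To illustrate why motion blur hurts a feature-based front end such as ORB-SLAM, the sketch below models horizontal camera shake as a 1-D box (linear motion) blur and compares image contrast before and after blurring. ORB builds on FAST corners, which test for sharp intensity differences, so weaker gradients mean fewer and less stable keypoints. The synthetic stripe image, the 7-pixel blur length, the contrast threshold, and all function names are assumptions for illustration, not data or code from the thesis.

```python
def motion_blur_row(row, length):
    """Horizontal box (linear motion) blur of one image row.

    Each output pixel is the mean of `length` neighbours; indices past the
    border are clamped to the nearest edge pixel.
    """
    n = len(row)
    half = length // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, length - half):
            acc += row[min(max(i + k, 0), n - 1)]
        out.append(acc / length)
    return out


def motion_blur(image, length):
    """Apply the horizontal blur to every row (camera shake along x)."""
    return [motion_blur_row(row, length) for row in image]


def max_horizontal_gradient(image):
    """Largest absolute intensity jump between horizontally adjacent pixels."""
    return max(abs(r[i + 1] - r[i]) for r in image for i in range(len(r) - 1))


def count_strong_edges(image, threshold):
    """Count adjacent pixel pairs whose contrast exceeds `threshold`.

    A crude stand-in for the intensity test a FAST/ORB detector applies:
    fewer strong edges means fewer candidate keypoints.
    """
    return sum(
        1 for r in image for i in range(len(r) - 1) if abs(r[i + 1] - r[i]) > threshold
    )


if __name__ == "__main__":
    # Synthetic 4x16 image of vertical stripes, 4 px wide (0 = black, 255 = white).
    sharp = [[255.0 if (x // 4) % 2 else 0.0 for x in range(16)] for _ in range(4)]
    blurred = motion_blur(sharp, length=7)  # assumed 7-px shake length
    print(max_horizontal_gradient(sharp), count_strong_edges(sharp, 50))
    print(round(max_horizontal_gradient(blurred), 2), count_strong_edges(blurred, 50))
```

With a box blur of length L, neighbouring output pixels differ by at most 1/L of the original step, so every sharp 255-level edge in this example drops below the threshold and no "strong" edges remain.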
Deep learning has been shown to be effective for robust, real-time monocular image relocalization. Deep-learning-based visual localization uses a convolutional neural network to learn (regress) the 6-DoF camera pose directly from an image. It is more robust to difficult lighting and motion conditions. However, a drawback of deep learning methods for visual localization is that they require large datasets with accurate labels.
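For reference, one widely used formulation of CNN 6-DoF pose regression is the PoseNet loss (Kendall et al., 2015), which adds the Euclidean position error to a β-weighted error between unit quaternions. The sketch below evaluates a loss of that form for a single prediction in plain Python. It is an assumption for illustration, not necessarily the loss used in this thesis; the network that produces the predictions is omitted, and the function names and β value are hypothetical.

```python
import math


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("quaternion must be non-zero")
    return [c / n for c in q]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pose_regression_loss(t_pred, q_pred, t_true, q_true, beta):
    """PoseNet-style loss for one image.

    t_*: 3-D camera positions; q_*: orientation quaternions (w, x, y, z).
    The raw network quaternion q_pred is normalized before comparison;
    q_true is assumed to be unit length already. beta trades position
    error against orientation error and has to be tuned per scene.
    Note: q and -q encode the same rotation; this sketch does not handle that.
    """
    position_error = l2(t_pred, t_true)
    orientation_error = l2(normalize(q_pred), q_true)
    return position_error + beta * orientation_error


if __name__ == "__main__":
    # Hypothetical prediction vs. label for a single image.
    loss = pose_regression_loss(
        t_pred=[0.9, 2.1, 0.0], q_pred=[2.0, 0.0, 0.0, 0.1],
        t_true=[1.0, 2.0, 0.0], q_true=[1.0, 0.0, 0.0, 0.0],
        beta=100.0,  # illustrative value, not taken from the thesis
    )
    print(f"loss = {loss:.3f}")
```

Normalizing the predicted quaternion lets the network output an unconstrained 4-vector while the loss still compares valid rotations.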
This thesis also proposes methods for labeling visual localization data and for augmenting the datasets used to train visual localization. Our labels describe the camera pose on a 2D plane (x-axis, y-axis, orientation), and the network regresses this pose. The results show that the deep learning approach can overcome the motion blur problem: compared with our previous system, the visual localization method reduces the maximum error by 31.73% and the average error by 55.18%.
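The 2D labels (x, y, orientation) can be sketched as follows. A common trick, assumed here rather than taken from the thesis, is to regress the heading as (cos θ, sin θ) so that angles near -π and +π are not treated as far apart, and to measure heading error with wrap-around. The last helper shows how a percentage reduction such as the reported 31.73% (maximum error) or 55.18% (average error) would be computed, assuming those figures are relative to the previous system's error. All function names and numbers below are hypothetical.

```python
import math


def encode_pose(x, y, theta):
    """Planar pose label (x, y, theta in radians) -> regression target.

    theta is stored as (cos, sin) so that headings just below +pi and just
    above -pi, which are physically close, also get close targets.
    """
    return [x, y, math.cos(theta), math.sin(theta)]


def decode_pose(target):
    """Network output [x, y, c, s] -> (x, y, theta); atan2 tolerates unnormalized (c, s)."""
    x, y, c, s = target
    return x, y, math.atan2(s, c)


def angle_error(a, b):
    """Smallest absolute difference between two headings, in [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def pose_errors(pred, true):
    """(position error, heading error) between two (x, y, theta) poses."""
    position = math.hypot(pred[0] - true[0], pred[1] - true[1])
    return position, angle_error(pred[2], true[2])


def relative_reduction(old_error, new_error):
    """Percentage by which new_error improves on old_error."""
    return 100.0 * (old_error - new_error) / old_error


if __name__ == "__main__":
    label = encode_pose(1.0, 2.0, math.pi - 0.05)
    pred = decode_pose([1.1, 2.0, -1.0, -0.04])  # hypothetical network output
    print(pose_errors(pred, decode_pose(label)))
    print(relative_reduction(0.50, 0.30))  # hypothetical errors, not thesis data
```

In the usage example the predicted heading is just past -π while the label is just below +π; a naive subtraction would report an error of nearly 2π, whereas `angle_error` correctly reports a small difference.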