Graduate student: | Huang, Erh-Hsu (黃而旭) |
---|---|
Thesis title: | FPGA-Based Stereo Vision using Improved Scale Invariant Feature Transform Algorithm for SLAM Systems |
Advisor: | Kuo, Chien-Hung (郭建宏) |
Degree: | Master |
Department: | Department of Electrical Engineering |
Year of publication: | 2020 |
Academic year of graduation: | 108 (ROC academic year, 2019-2020) |
Language: | Chinese |
Pages: | 76 |
Keywords: | Stereo Vision, Image Recognition, Scale-Invariant Feature Transform (SIFT), Feature Matching, Field-Programmable Gate Array (FPGA) |
DOI URL: | http://doi.org/10.6345/NTNU202001181 |
Thesis type: | Academic thesis |
This thesis designs and implements a stereo vision image recognition system based on the scale-invariant feature transform (SIFT), accelerated by a field-programmable gate array (FPGA) hardware circuit. The system can be applied to simultaneous localization and mapping (SLAM) to improve the image matching and map building that a vision-based robot needs for autonomous navigation. In an unknown environment, the robot compares each captured frame in real time with high computational efficiency and matches the feature points common to the left and right images of the stereo camera. It then uses the geometry of the stereo camera to compute the distance from each feature point to the camera, achieving accurate image matching and distance estimation.
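Assuming the left and right images are rectified, the distance computation described above reduces to triangulation from disparity: a feature at column u_left in the left image and u_right in the right image has disparity d = u_left - u_right, and its depth is Z = f·B/d, where f is the focal length in pixels and B is the baseline. The following is a minimal Python sketch of that relation, not the thesis's hardware implementation; the `StereoRig` class, the `triangulate` function, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StereoRig:
    """Pinhole parameters of a rectified stereo pair (all values are assumptions)."""
    focal_px: float    # focal length in pixels, shared by both cameras after rectification
    baseline_m: float  # distance between the two optical centres, in metres
    cx: float          # principal point, x coordinate (pixels)
    cy: float          # principal point, y coordinate (pixels)


def triangulate(rig: StereoRig, u_left: float, v_left: float,
                u_right: float) -> Optional[Tuple[float, float, float]]:
    """Return camera-frame (X, Y, Z) in metres for a feature matched on the same row.

    Disparity d = u_left - u_right; depth Z = f * B / d.
    Returns None when the disparity is not positive (point at infinity or a bad match).
    """
    disparity = u_left - u_right
    if disparity <= 0:
        return None
    z = rig.focal_px * rig.baseline_m / disparity
    # Back-project the left-image pixel through the pinhole model at depth z.
    x = (u_left - rig.cx) * z / rig.focal_px
    y = (v_left - rig.cy) * z / rig.focal_px
    return x, y, z


# Hypothetical rig with the principal point at the centre of a 640x370 image.
rig = StereoRig(focal_px=700.0, baseline_m=0.5, cx=320.0, cy=185.0)
print(triangulate(rig, 340.0, 185.0, 305.0)[2])  # disparity 35 px -> z = 10.0 m
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small d) than for near ones, which is one reason accurate stereo matching is central to distance estimation.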
This thesis proposes a new gradient calculation method and a method for reducing the dimension of the feature descriptor, which together substantially reduce the hardware usage of SIFT and speed up computation. It also proposes a stereo matching method that takes stereo images from the KITTI dataset as input and uses epipolar geometry together with a limited search range to perform stereo matching and compute depth. The design runs on an Altera DE2i-150 board at an operating frequency of 50 MHz; the input images are 640×370 regions cropped from the center of KITTI stereo images. For a 640×480 input image, the SIFT module alone achieves a frame rate of 205 fps and uses 54,911 logic elements. For 640×370 input images, the complete stereo SIFT image recognition system achieves a frame rate of 181 fps and uses 140,303 logic elements.
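The stereo matching step relies on two constraints: in a rectified pair, the epipolar line of a left-image point is the same row of the right image, and the disparity is bounded, so only right-image keypoints inside a limited range need to be compared. The Python sketch below illustrates that idea under stated assumptions rather than reproducing the thesis's circuit: keypoints are `(x, y, descriptor)` tuples, descriptors are compared with the sum of absolute differences (a common hardware-friendly choice), and a ratio test in the style of Lowe's SIFT matching rejects ambiguous matches. The names `stereo_match` and `sad` and the default values of `row_tol`, `max_disp`, and `ratio` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical keypoint layout: (x, y, descriptor).
Keypoint = Tuple[float, float, Sequence[float]]


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two descriptors of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))


def stereo_match(left: List[Keypoint], right: List[Keypoint],
                 row_tol: float = 1.0, max_disp: float = 64.0,
                 ratio: float = 0.8) -> List[Tuple[int, int, float]]:
    """Match left keypoints to right keypoints in a rectified stereo pair.

    Returns (left_index, right_index, disparity) triples.
    """
    matches = []
    for i, (xl, yl, dl) in enumerate(left):
        best_j: Optional[int] = None
        best = second = float("inf")
        for j, (xr, yr, dr) in enumerate(right):
            # Epipolar constraint: after rectification the match lies on (nearly) the same row.
            if abs(yl - yr) > row_tol:
                continue
            # Limited search range: the right-image point lies to the left, within max_disp.
            disp = xl - xr
            if not (0 < disp <= max_disp):
                continue
            d = sad(dl, dr)
            if d < best:
                best, second, best_j = d, best, j
            elif d < second:
                second = d
        # Ratio test: keep the best candidate only if it clearly beats the runner-up
        # (a lone candidate always passes because second stays infinite).
        if best_j is not None and best < ratio * second:
            matches.append((i, best_j, xl - right[best_j][0]))
    return matches


left = [(100.0, 50.0, [1, 2, 3]), (200.0, 80.0, [9, 9, 9])]
right = [(90.0, 50.4, [1, 2, 4]), (60.0, 50.0, [1, 2, 3]), (195.0, 120.0, [9, 9, 9])]
print(stereo_match(left, right))  # -> [(0, 1, 40.0)]
```

Restricting the search to one row band and a bounded disparity window cuts the number of descriptor comparisons from every left-right pair to a small candidate set, which is what makes this kind of matching attractive for a pipelined hardware design. The returned disparity is the quantity needed for depth computation (Z = f·B/d).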