| Field | Value |
|---|---|
| Graduate Student | 黃朝慶 (HUANG, Chao-Ching) |
| Thesis Title | 自動樂譜辨識與打擊樂機器人系統 (Automatic Music Score Recognition and Robotic Percussion System) |
| Advisors | 王偉彥 (Wang, Wei-Yen); 蔣欣翰 (Chiang, Hsin-Han) |
| Degree | Master |
| Department | 電機工程學系 (Department of Electrical Engineering) |
| Year of Publication | 2020 |
| Academic Year | 108 |
| Language | Chinese |
| Pages | 93 |
| Keywords (Chinese) | 樂譜辨識、Delta機械手臂、深度學習、影像處理 |
| Keywords (English) | music score recognition, delta robot, deep learning, digital image processing |
| DOI | http://doi.org/10.6345/NTNU202001197 |
| Document Type | Academic thesis |
Optical music recognition (OMR) systems perform image recognition on pictures of music scores, in which notes record pitch and duration information. After extensive research and experimentation, the recognition of high-resolution scanned scores has reached a mature state. However, camera-based score recognition still needs further study because camera images suffer from varying illumination, perspective distortion, and blur; we therefore made a first attempt at applying deep learning architectures to a camera-based music score recognition system. First, we apply the Hough line-detection algorithm to automatically detect the score in a live camera feed and locate the boundary of each staff row, since we focus only on the note information inside the staves. Next, to detect, segment, and recognize each note, every individual staff row is fed into a YOLOv3 detection model based on Darknet-53, which currently classifies notes into six categories: whole note, half note, quarter note, eighth note, quarter rest, and half rest. The notes detected by YOLOv3 are then sorted by their positions in the score and passed to a convolutional neural network (CNN) that recognizes pitch; at present, eleven pitch classes from C3 to F4 are supported. Finally, the system communicates with a Delta robot arm over an RS-232 serial link to play the instrument. By restricting recognition to the Hough-detected staff areas, the system avoids noise from lyrics and drawings and thus reduces recognition errors. Moreover, the proposed system needs only a webcam picture of a music score as input: it automatically detects the staff areas, displays the pitch and duration of each note on the interface, and drives the robotic arm to perform the piece.
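The staff-detection step described above can be sketched with OpenCV's probabilistic Hough transform: binarize the score image, keep near-horizontal line segments, merge duplicate detections of the same physical line, and group the merged lines into rows of five. This is a minimal illustration, not the thesis's implementation; the function name and all thresholds are assumptions.

```python
import cv2
import numpy as np

def find_staff_rows(gray, angle_tol_deg=2.0, merge_gap=4):
    """Return the (top_y, bottom_y) extent of each detected five-line staff."""
    # Binarize so the dark staff lines become foreground for the Hough transform.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=gray.shape[1] // 2, maxLineGap=20)
    ys = []
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
            if angle < angle_tol_deg:  # keep near-horizontal segments only
                ys.append((y1 + y2) // 2)
    # Merge multiple detections of the same physical line, then group by five.
    merged = []
    for y in sorted(ys):
        if not merged or y - merged[-1] > merge_gap:
            merged.append(y)
    return [(merged[i], merged[i + 4]) for i in range(0, len(merged) - 4, 5)]
```

Working on the staff row's y-range rather than the whole page is what lets the later stages ignore lyrics and decorations outside the staves.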
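The sorting step between detection and pitch recognition is simple but essential: a YOLO-style detector returns bounding boxes in arbitrary order, so the notes must be put into reading order before playback. The sketch below assumes each detection carries the index of the staff row it was found in and the left x-coordinate of its box; these field names are illustrative, not from the thesis.

```python
def order_notes(detections):
    """Sort detections top-to-bottom by staff row, then left-to-right within a row."""
    return sorted(detections, key=lambda d: (d["row"], d["x"]))
```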
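The final playback step, sending recognized notes to the Delta arm over RS-232, could look like the following pyserial sketch. The port settings, the ASCII command format (`"<pitch>,<duration_ms>\n"`), and the duration table (quarter note = 500 ms, i.e. roughly 120 BPM) are all assumptions for illustration; the thesis's actual serial protocol is not specified in the abstract.

```python
# Assumed duration mapping for the six recognized note/rest classes.
NOTE_DURATION_MS = {"whole": 2000, "half": 1000, "quarter": 500,
                    "eighth": 250, "half_rest": 1000, "quarter_rest": 500}

def encode_note(pitch, note_type):
    """Encode one note as an ASCII command line (hypothetical format)."""
    return f"{pitch},{NOTE_DURATION_MS[note_type]}\n".encode("ascii")

def play_score(notes, port="/dev/ttyUSB0", baudrate=115200):
    """notes: iterable of (pitch, note_type) pairs, e.g. [("C4", "quarter")]."""
    import serial  # pyserial; imported here so the sketch loads without hardware
    with serial.Serial(port, baudrate, timeout=1) as link:
        for pitch, note_type in notes:
            link.write(encode_note(pitch, note_type))
            link.readline()  # wait for the arm's acknowledgement line
```

Blocking on an acknowledgement after each note keeps the host and the arm in step, at the cost of serializing playback on the slower of the two.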