Graduate student: | 林彥榕 Lin, Yen-Jung |
---|---|
Thesis title: | 用於陪伴型機器人之輕量化深度學習音樂情緒辨識模型 Lightweight Deep Learning Music Emotion Recognition Model for Companion Robots |
Advisor: | 呂成凱 Lu, Cheng-Kai |
Oral defense committee: | 呂成凱 Lu, Cheng-Kai; 林承鴻 Lin, Cheng-Hung; 連中岳 Lien, Chung-Yueh |
Oral defense date: | 2024/07/15 |
Degree: | Master |
Department: | Department of Electrical Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | Chinese |
Number of pages: | 84 |
Keywords: | Deep Learning, Convolutional Neural Networks, Music Emotion Recognition, Lightweight Models, Companion Robots |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202401378 |
Thesis type: | Academic thesis |
To address the loneliness that many elderly people experience in today's aging society, this study proposes a music emotion recognition model for the companion robot Zenbo Junior II. Although music emotion recognition has been widely studied, none of the existing work offers a lightweight architecture that can run on Zenbo Junior II. The proposed method replaces the commonly used 2D-CNN with a one-dimensional convolutional neural network (1D-CNN) and adds a Gated Recurrent Unit (GRU) so that the model better accounts for the temporal continuity of audio features. After training, the model is saved and deployed on Zenbo Junior II: the emotions recognized in another study are first mapped to four emotion categories, and music is then played to regulate the user's emotion. On the PMEmo dataset, the proposed model scores 0.04 for Valence and 0.038 for Arousal, the best among the compared models. It has only 0.721M parameters and requires only 9.303M floating-point operations (FLOPs), far fewer than the compared models. Its operational intensity is the closest to the optimal operating point of Zenbo Junior II, and inference on a piece of music takes only 229 milliseconds, so the emotion of the music can be recognized in real time. These results indicate that the study delivers a lightweight, high-performing model that runs on Zenbo Junior II.
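The abstract's argument for a lighter model rests on parameter counts and on the ratio of computation to memory traffic. The sketch below is only an illustration of that reasoning, not the thesis's architecture or measurement method: the layer sizes are hypothetical, the GRU count assumes one bias vector per gate (some frameworks keep two), and the intensity estimate assumes float32 weights and ignores activation traffic. Only the figures 0.721M parameters and 9.303M FLOPs come from the abstract.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Weights of a 1D convolution: one length-k kernel per (input, output) channel pair."""
    return c_in * c_out * k + (c_out if bias else 0)


def conv2d_params(c_in, c_out, k_h, k_w, bias=True):
    """Weights of a 2D convolution: one k_h x k_w kernel per (input, output) channel pair."""
    return c_in * c_out * k_h * k_w + (c_out if bias else 0)


def gru_params(input_size, hidden_size):
    """Single-layer GRU: three gate blocks (update, reset, candidate), each with
    input weights, recurrent weights, and one bias vector (assumed convention)."""
    per_gate = input_size * hidden_size + hidden_size * hidden_size + hidden_size
    return 3 * per_gate


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved


# Hypothetical layer sizes, chosen only to compare kernel shapes.
p_1d = conv1d_params(64, 128, 3)
p_2d = conv2d_params(64, 128, 3, 3)
p_gru = gru_params(128, 64)

# Rough estimate from the abstract's totals, counting only float32 weight reads.
weight_bytes = 0.721e6 * 4
rough_intensity = arithmetic_intensity(9.303e6, weight_bytes)

print(p_1d, p_2d, p_gru, rough_intensity)
```

With the same channel counts and kernel width, the 2D layer carries roughly three times the weights of the 1D layer, which is the kind of saving the 1D-CNN substitution targets. The intensity figure here is a crude estimate under the stated assumptions and should not be read as the value reported in the thesis.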