Graduate student: | 俞大智 William Ta-Chih Yu |
---|---|
Thesis title: | 音樂資料庫中重複及循序特徵探勘之研究 Mining Repeating and Sequential Patterns from Music Databases |
Advisor: | 柯佳伶 |
Degree: | 碩士 Master |
Department: | 資訊教育研究所 Graduate Institute of Information and Computer Education |
Academic year of graduation: | 87 (ROC calendar, 1998–1999) |
Language: | Chinese |
Number of pages: | 65 |
Keywords (Chinese): | 資料探勘, 重複特徵, 循序特徵 |
Keywords (English): | data mining, repeating pattern, sequential pattern |
Thesis type: | Academic thesis |
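This thesis addresses two main problems in music data mining: mining the repeating patterns within a single piece of music, and mining the sequential patterns that occur in common across multiple pieces. A repeating pattern is a sequence of notes that recurs within a piece and can represent its main melody. The common sequential patterns shared by multiple pieces fall into two kinds, large (frequently occurring) sequential patterns and large non-continuous sequential patterns, which capture the melodic characteristics the pieces have in common. Both kinds of patterns are important attributes of music data and can serve as key features for music retrieval or classification.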
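To mine repeating patterns, we propose the bit index stream representation and use this storage representation to design an algorithm that mines repeating patterns efficiently. The idea is to generate candidate note streams incrementally by adding one note at a time, and then to verify each candidate's number of occurrences through operations on bit index streams. To mine sequential patterns, we propose the appearance bit stream representation, which is used together with the bit index representation to pre-filter candidate sequential patterns and produce the large sequential patterns. A pattern join algorithm then mines the large non-continuous sequential patterns from combinations of the large sequential patterns.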
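The abstract gives no implementation details, so the following Python sketch is only one possible reading of the bit-index idea, not the thesis's MRP algorithm itself. It assumes that each distinct note gets an integer bitmask whose bit `i` is set when that note occurs at position `i`, that a candidate is extended by one note at a time, and that the occurrence count is the number of set bits after an AND with a shifted mask. The function names, the `min_count` parameter, and the counting of overlapping occurrences are all illustrative assumptions.

```python
def build_bit_index(notes):
    """Map each distinct note to an int whose bit i is set when notes[i] is that note."""
    index = {}
    for i, note in enumerate(notes):
        index[note] = index.get(note, 0) | (1 << i)
    return index


def popcount(bits):
    """Number of set bits, i.e. number of start positions recorded in the mask."""
    return bin(bits).count("1")


def mine_repeating_patterns(notes, min_count=2):
    """Return {pattern: count} for contiguous note sequences occurring >= min_count times.

    Hypothetical sketch: overlapping occurrences are counted separately.
    """
    index = build_bit_index(notes)
    # Length-1 candidates: bit i set means the pattern starts at position i.
    current = {(n,): bits for n, bits in index.items() if popcount(bits) >= min_count}
    result = {}
    length = 1
    while current:
        for pattern, bits in current.items():
            result[pattern] = popcount(bits)
        extended = {}
        for pattern, bits in current.items():
            for note, note_bits in index.items():
                # A pattern starting at i extends by `note` when notes[i + length] == note;
                # shifting the note's mask right by `length` aligns it with the start bits.
                starts = bits & (note_bits >> length)
                if popcount(starts) >= min_count:
                    extended[pattern + (note,)] = starts
        current = extended
        length += 1
    return result


patterns = mine_repeating_patterns(list("CDECDEG"))
for pattern, count in sorted(patterns.items(), key=lambda item: (len(item[0]), item[0])):
    print("".join(pattern), count)
```

In this toy melody the sequence C D E occurs twice, so it and all of its contiguous sub-sequences are reported, while G, which occurs once, is not.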
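To evaluate the efficiency of the repeating-pattern mining algorithm, we compare its running time and space requirements with those of two related algorithms, using Taiwanese pop music data and computer-generated random data. The experimental results confirm that, on both the real music data and the random data, the proposed algorithm requires less memory during execution and runs faster. For mining the common sequential patterns of multiple pieces, the experimental results show that the proposed algorithm correctly finds all large sequential patterns and all large non-continuous sequential patterns.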
This thesis studies two research issues in music data mining. The first is mining repeating patterns in a single piece of music, and the second is mining common sequential patterns in a set of pieces. The common sequential patterns can be further classified into two kinds: large sequential patterns and large non-continuous sequential patterns. These patterns represent important features of music and can be used for music data retrieval or music classification. The MRP algorithm is proposed for mining repeating patterns. Its basic idea is to add one note at a time to produce candidate note streams. The bit index stream representation is designed to count the frequency of a note stream efficiently in order to verify the repeating patterns. In addition, the appearance bit stream representation is proposed to work together with the bit index stream representation for mining the large sequential patterns. By applying the pattern join algorithm, the large non-continuous sequential patterns are then found from combinations of the large sequential patterns. Finally, the experimental results show that the proposed repeating-pattern algorithm outperforms two related algorithms in both running time and memory usage.
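How an appearance bit stream might cooperate with per-piece bit indexes can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than the thesis's algorithm: it assumes that a large sequential pattern is a contiguous note sequence found in at least `min_support` pieces, that each note has an appearance mask with bit `k` set when piece `k` contains it, and that the AND of appearance masks is used as a cheap upper bound before the bit indexes verify a candidate. All names are hypothetical, and the pattern join step for non-continuous patterns is not sketched.

```python
def bit_index(notes):
    """Per-piece index: note -> int with bit i set when notes[i] is that note."""
    index = {}
    for i, note in enumerate(notes):
        index[note] = index.get(note, 0) | (1 << i)
    return index


def popcount(bits):
    return bin(bits).count("1")


def contains(index, pattern):
    """True when `pattern` occurs contiguously in the piece described by `index`."""
    starts = index.get(pattern[0], 0)
    for offset, note in enumerate(pattern[1:], start=1):
        starts &= index.get(note, 0) >> offset
        if not starts:
            return False
    return starts != 0


def mine_large_sequential_patterns(pieces, min_support):
    """Return {pattern: number of pieces containing it} for patterns meeting min_support."""
    indexes = [bit_index(piece) for piece in pieces]
    # Appearance mask: bit k set when piece k contains the note at least once.
    appearance = {}
    for k, index in enumerate(indexes):
        for note in index:
            appearance[note] = appearance.get(note, 0) | (1 << k)
    large_notes = [n for n, mask in appearance.items() if popcount(mask) >= min_support]
    current = {(n,): appearance[n] for n in large_notes}
    result = {}
    while current:
        result.update({p: popcount(mask) for p, mask in current.items()})
        extended = {}
        for pattern, piece_mask in current.items():
            for note in large_notes:
                # Pre-filter: only pieces holding both the pattern and the new note can match.
                candidates = piece_mask & appearance[note]
                if popcount(candidates) < min_support:
                    continue
                candidate = pattern + (note,)
                verified = 0
                for k, index in enumerate(indexes):
                    if (candidates >> k) & 1 and contains(index, candidate):
                        verified |= 1 << k
                if popcount(verified) >= min_support:
                    extended[candidate] = verified
        current = extended
    return result


pieces = [list("CDEG"), list("ACDE"), list("CDAB")]
large = mine_large_sequential_patterns(pieces, min_support=2)
print(large[("C", "D")], large[("C", "D", "E")])
```

The pre-filter never discards a true pattern, because a piece that contains a candidate must contain every note in it; it only avoids verifying candidates in pieces that cannot match.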