簡易檢索 / 詳目顯示

研究生: 龔毓婷
Yu-Ting, Kung
論文名稱: 以位元序列為基礎探勘容錯重複樣式之研究
An Efficient Approach for Mining Fault-Tolerant Repeating Patterns based on Bit Sequences
指導教授: 柯佳伶
Koh, Jia-Ling
學位類別: 碩士
Master
系所名稱: 資訊教育研究所
Graduate Institute of Information and Computer Education
論文出版年: 2004
畢業學年度: 92
語文別: 英文
論文頁數: 83
中文關鍵詞: 重複樣式容錯探勘位元序列前K個樣式探勘資料序列
英文關鍵詞: Repeating Patterns, Fault-Tolerant Mining, Bit Sequences, Top-K patterns Mining, Data Sequence
論文種類: 學術論文
相關次數: 點閱:178下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 本論文提出一個有效率的方法對資料序列探勘出前K個非顯然且滿足最小長度限制的容錯重複樣式。我們擴展出現位元序列的表示法,設計出容錯出現位元序列,在考慮有插入或刪除錯誤的容錯情況下,用來表示候選樣式在資料序列中的出現位置。本論文提出二個演算法,分別命名為TFTRP-Mine (Top-K non-trivial FT-RPs Mining)及RE-TFTRP-Mine (REfinement of TFTRP-Mine)。兩個演算法皆根據論文中歸納出的遞迴公式,可系統性地計算出一個樣式的容錯出現位元序列,因而很有效率地得到每個侯選樣式的容錯出現次數。此外,RE-TFTRP-Mine演算法採用額外兩個技巧來砍除搜尋空間以加快探勘效率。由實驗結果得知,當K及min_len的值較小時,RE-TFTRP-Mine比TFTRP-Mine有較好的執行效率;而由實際的樂曲資料之實驗顯示,當探勘過程中有考慮容錯比對時,可以找出更多重要且隱藏的重複樣式。

    An efficient way of mining top-K non-trivial fault-tolerant repeating patterns (FT-RPs in short) with length no less than min_len for data sequences is proposed in this thesis. By extending the idea of appearing bit sequences, fault-tolerant appearing bit sequences are defined to represent the locations where candidate patterns appear in a data sequence with insertion/deletion errors allowed. Two algorithms, named TFTRP-Mine (Top-K non-trivial FT-RPs Mining) and RE-TFTRP-Mine (REfinement of TFTRP-Mine), respectively, are proposed. Both of two algorithms use the recursive formulas to obtain fault-tolerant appearing bit sequences of a pattern systematically and then the fault-tolerant frequency of each candidate pattern could be counted quickly. Besides, RE-TFTRP-Mine adopts two additional strategies to prune the searching space in order to increase the mining efficiency. From experiment results, we can know that RE-TFTRP-Mine outperforms TFTRP-Mine algorithm when K and min_len are small. In addition, when adopting fault tolerant mining, more important and implicit repeating patterns could be found for music objects.

    TABLE OF CONTENTS LIST of TABLES..vi LIST of FIGURES.vii CHAPTER 1 Introduction.1 1.1 Motivation..1 1.2 Literature Review.3 1.2.1 Repeating Patterns Mining.3 1.2.2 Periodic Patterns Mining5 1.2.3 Fault-Tolerant Mining6 1.2.4 Top-k Closed Patterns Mining..8 1.3 Our Approach.10 1.4 Organization....11 CHAPTER 2 Preliminaries...12 2.1 Basic Terms...12 CHAPTER 3 Bit Sequence Representation 17 3.1 Appearing Bit Sequences ..17 3.2 Appearing Bit Sequences of Insertion Fault Tolerance ..20 3.3 Appearing Bit Sequences of Deletion Fault Tolerance 31 CHAPTER 4 Mining Top-K Non-Trivial FT-RPs with min_length Constraints.43 4.1 TFTRP-Mine Algorithm.43 4.2 RE-TFTRP-Mine Algorithm.57 CHAPTER 5 Performance Study.67 5.1 Data Generator67 5.2 Experiment Evaluation68 5.2.1 Performance Evaluation on Execution Time..68 5.2.2 Performance Evaluation on Effectiveness..77 5.3 Conclusions of Experiment Results .78 CHAPTER 6 Conclusion and Feature Works ..81 BIBLIOGRAPHY.82 LIST OF TABLES Table 3.1: The bit index table of “ABCDABCACDEEABCCDEAC”.18 Table 4.1: The bit index table of “ABCDABCACDEEABCCDEAC”.46 Table 5.1: Major Parameters for Data Generator67 LIST OF FIGURES Figure 3.1: Example of the relationship between and .22 Figure 3.2: Example of the way of getting recursively ..27 Figure 3.3: Modification of Figure 3.2.30 Figure 3.4: Example of the relationship between and .33 Figure 3.5: Example of the way of getting recursively..39 Figure 3.6: Example of getting .41 Figure 4.1: ..48 Figure 4.2: ..48 Figure 4.3: Current result in Minlen set (1).49 Figure 4.4: ..49 Figure 4.5: Current result in Minlen set (2).50 Figure 4.6: The searching tree of all candidates starting with “A” 52 Figure 4.7: The current result of the temporal output set after all candidates starting with “A” are traversed .52 Figure 4.8: The searching tree of all candidates starting with “B” 53 Figure 4.9: The current results of Minlen and temporal output set after all candidates starting with “B” are traversed 53 Figure 4.10(a): The searching tree of all candidates starting with “C”.54 Figure 4.10(b): The current results of Minlen and temporal output set after all candidates in Figure 4.10(a) are traversed .54 Figure 4.11(a): The searching tree of all candidates starting with “D”.55 Figure 4.11(b): The current results of Minlen and temporal output set after all candidates in Figure 4.11(a) are traversed .55 Figure 4.12(a): The searching tree of all candidates starting with “E”.55 Figure 4.12(b): The current results of Minlen and temporal output set after all candidates in Figure 4.12(a) are traversed .56 Figure 4.13: The result of mining .56 Figure 4.14: All candidates with length less or equal to min_len = 2 .61 Figure 4.15(a): Candidates generated from “AC” by appending one data item .62 Figure 4.15(b): The contents in Minlen_Heap and temporary Top-K set (1) .62 Figure 4.16(a): Candidates generated from “CD” by appending one data item 63 Figure 4.16(b): The contents in Minlen_Heap and temporary Top-K set (2) 64 Figure 4.17(a): Candidates generated from “CDA” by appending one data item .65 Figure 4.17(b): The contents in Minlen_Heap and temporary Top-K set (3) 65 Figure 4.18: The contents in Minlen_Heap and temporary Top-K set (4) .65 Figure 5.1: The execution time and the number of generated candidate patterns versus the size of data sequences 69 Figure 5.2: The execution time and the number of generated candidate patterns versus the number of various data items in the data sequence.71 Figure 5.3: The execution time and the number of generated candidate patterns versus the setting value of insertion fault tolerance 72 Figure 5.4: The execution time and the number of generated candidate patterns versus the setting value of min_length constraint .74 Figure 5.5: The execution time and the number of generated candidate patterns versus the setting value of K .76 Figure 5.6: The found FT-RPs when mining without fault tolerance versus those of mining with insertion fault tolerance79 Figure 5.7: The found FT-RPs when mining without fault tolerance versus those of mining with deletion fault tolerance 80

    Bibliography
    [1] C. C. Liu, J. L. Hsu, and A. L. P. Chen, “Efficient Theme and Non-Trivial Repeating Pattern Discovering in Music Databases,” in Proceedings of the 15th International Conference on Data Engineering (ICDE’99), 1999.
    [2] C. C. Liu, J. L. Hsu, and A. L. P. Chen, “An Approximate String Matching Algorithm for Content-Based Music Data Retrieval,” in Proceedings of the International Conference on Multimedia Computing and System (ICMCS’99), 1999.
    [3] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential Pattern Mining using A Bitmap Representation,” in Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD’02), 2002.
    [4] J. Han, G. Dong, and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Database,” in Proceedings of the 1999 IEEE International Conference on Data Engineering (ICDE’99), 1999.
    [5] J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K Frequent Closed Patterns without Minimum Support”, in Proceedings of 2002 International Conference on Data Mining (ICDM'02), 2002.
    [6] J. L. Hsu, C. C. Liu, and A. L.P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” in Proceedings of the Seventh International Conference on Information and Knowledge Management (ACM CIKM’98), 1998.
    [7] J. L. Koh and W. D. C. Yu, “Efficient Feature Mining in Music Objects,” Lecture Notes in Computer Science: DEXA’01: Database and Expert Systems Applications, pp. 221-231, Springer-Verlag, 2001. (EI)
    [8] J. L. Koh and W. D. C. Yu, “Mining Repeating and Sequential Patterns from Music Databases.” Technique Report in Department of Information and Computer Education, National Taiwan Normal University, 1999.
    [9] J. L. Koh and P. W. Yo, “An Efficient Approach for Mining Fault-Tolerant Frequent Itemsets based on Bit Sequences,” Technique Report in Department of Information and Computer Education, National Taiwan Normal University, 2003.
    [10] J. Pei, A. K.H. Tung, and J. Han, “Fault-Tolerant Frequent Pattern Mining: Problem and Challenges,” in Proceedings of ACM-SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD’01), 2001.
    [11] J. Yang, W. Wang, and P. Yu, “Infominer: mining surprising periodic patterns,” in Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD’01), 2001.
    [12] P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K Closed Sequential Patterns”, in Proceedings of 2003 International Conference on Data Mining (ICDM'03), 2003.
    [13] S.-S Wang and S,-Y. Lee, “Mining Fault-Tolerant Frequent Patterns In Large Database,” in Proceedings of Workshop on Software Engineering and Database Systems, International Computer Symposium, Taiwan, 2002.

    QR CODE